torchsig.datasets.dataset_metadata.DatasetMetadata¶
- class torchsig.datasets.dataset_metadata.DatasetMetadata(num_iq_samples_dataset: int, fft_size: int, impairment_level: int, num_signals_max: int, sample_rate: float = 10000000.0, num_signals_min: int = 0, num_signals_distribution: ndarray | List[float] | None = None, snr_db_min: float = 0.0, snr_db_max: float = 50.0, signal_duration_min: float | None = None, signal_duration_max: float | None = None, signal_bandwidth_min: float | None = None, signal_bandwidth_max: float | None = None, signal_center_freq_min: float | None = None, signal_center_freq_max: float | None = None, transforms: list = [], target_transforms: list = [], class_list: List[str] | None = None, class_distribution: ndarray | List[float] | None = None, num_samples: int | None = None, dataset_type: str = 'None', **kwargs)[source]¶
Bases: Seedable
Dataset Metadata. Contains useful information about the dataset.
Maintains the metadata for the parameters of the datasets, such as sample rate. The class holds all of the high level information about the dataset that the signals, impairments and other processes will require. Parameters that are common to all signals will be stored in the dataset metadata. For example, all signal generation requires a common and consistent sampling rate reference.
This class is needed at almost every level of the DSP. Rather than passing around multiple variables or a dict, or using globals, this class is defined once and passed as a parameter.
This class stores metadata related to the dataset, including parameters related to signal generation, transforms, dataset path, and sample distribution. It also handles the verification of dataset settings and ensures that the configuration is valid for the dataset creation process.
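The design choice described above, passing one metadata object instead of many loose variables, can be sketched as follows. This is only an illustration: `MiniMetadata` and `sample_duration_seconds` are hypothetical names that are not part of TorchSig, and the real class carries many more fields plus validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniMetadata:
    """Hypothetical stand-in for a shared metadata object (not TorchSig's class)."""
    sample_rate: float
    num_iq_samples_dataset: int
    fft_size: int


def sample_duration_seconds(meta: MiniMetadata) -> float:
    """A DSP helper reads the common parameters from the single shared object."""
    return meta.num_iq_samples_dataset / meta.sample_rate


meta = MiniMetadata(sample_rate=10e6, num_iq_samples_dataset=4096, fft_size=512)
duration = sample_duration_seconds(meta)
```

Because every helper receives the same object, all of them agree on a single sampling rate reference.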
Methods
Add parent Seedable object and set up RNGs accordingly
get_distribution
Gets second seed, usually used to seed both torch and numpy generators with slightly different seeds
Seed number generators with given seed.
Initialize torch and numpy number generators, and update its children.
Converts the dataset metadata into a dictionary format.
Update numpy and torch number generators with parent seed
Attributes
Signal modulation class distribution for dataset generation.
Signal modulation class list for dataset.
The maximum possible bandwidth for a signal
The minimum possible bandwidth for a signal
The maximum center frequency for a signal
The minimum center frequency for a signal
The maximum duration in samples possible within the dataset
The minimum duration in samples possible within the dataset
The maximum duration possible within the dataset
The minimum duration possible within the dataset
Type of dataset.
The maximum frequency associated with the FFT
The minimum frequency associated with the FFT
Frequency resolution of the spectrogram
The size of FFT (number of bins) to be used in spectrogram.
The stride of input samples in FFT (number of samples)
Maximum representable frequency
Minimum representable frequency
Level of signal impairments to apply to signals (0-2)
Impairment signal and dataset transforms
Reference noise power (dB) for the dataset
Length of I/Q array per sample in dataset.
Getter for the number of samples in the dataset.
Probabilities for each value in num_signals_range.
Max number of signals in each sample in the dataset
Minimum number of signals in each sample in the dataset.
Range of num_signals values that can be generated for a sample.
Sample rate for the dataset.
Defines the maximum bandwidth for a signal in the dataset. Must be within the boundary provided by dataset_bandwidth_max().
Defines the minimum bandwidth for a signal in the dataset. Must be within the boundary provided by dataset_bandwidth_min().
Defines the maximum center frequency boundary for a signal.
Defines the minimum center frequency boundary for a signal.
The maximum duration in samples for a signal
The minimum duration in samples for a signal
Getter for the maximum signal duration.
Getter for the minimum signal duration.
Maximum SNR in dB for signals in dataset
Minimum SNR in dB for signals in dataset
Target Transform to apply.
Transforms to perform on signal data (after signal impairments).
- __init__(num_iq_samples_dataset: int, fft_size: int, impairment_level: int, num_signals_max: int, sample_rate: float = 10000000.0, num_signals_min: int = 0, num_signals_distribution: ndarray | List[float] | None = None, snr_db_min: float = 0.0, snr_db_max: float = 50.0, signal_duration_min: float | None = None, signal_duration_max: float | None = None, signal_bandwidth_min: float | None = None, signal_bandwidth_max: float | None = None, signal_center_freq_min: float | None = None, signal_center_freq_max: float | None = None, transforms: list = [], target_transforms: list = [], class_list: List[str] | None = None, class_distribution: ndarray | List[float] | None = None, num_samples: int | None = None, dataset_type: str = 'None', **kwargs)[source]¶
Initializes Dataset Metadata
- Parameters:
num_iq_samples_dataset (int) – Length of I/Q array in dataset.
fft_size (int) – Size of FFT (number of bins) to be used in spectrogram.
impairment_level (int) – Signal impairment level.
sample_rate (float, optional) – Sample rate for dataset. Defaults to 10e6.
num_signals_min (int, optional) – Minimum number of signals per sample. Defaults to 0.
num_signals_max (int) – Maximum number of signals per sample in dataset.
num_signals_distribution (np.ndarray | List[float], optional) – Probability to generate sample with N signals for each value in [num_signals_min, num_signals_max]. Defaults to None (uniform).
snr_db_min (float, optional) – Minimum SNR of signals to generate. Defaults to 0.0.
snr_db_max (float, optional) – Maximum SNR of signals to generate. Defaults to 50.0.
signal_duration_min (float, optional) – Minimum duration of signal. Defaults to None.
signal_duration_max (float, optional) – Maximum duration of signal. Defaults to None.
signal_bandwidth_min (float, optional) – Minimum bandwidth of the signal. Defaults to None.
signal_bandwidth_max (float, optional) – Maximum bandwidth of the signal. Defaults to None.
signal_center_freq_min (float, optional) – Minimum center frequency of the signal. Defaults to None.
signal_center_freq_max (float, optional) – Maximum center frequency of the signal. Defaults to None.
transforms (list) – Transforms to apply. Defaults to [].
target_transforms (list) – List of Target Transforms to apply. Defaults to [].
class_list (List[str], optional) – Signal class name list. Defaults to TorchSigSignalLists.all_signals.
class_distribution (np.ndarray | List[float], optional) – Probabilities for each class in class_list. Defaults to None (uniform).
num_samples (int, optional) – Dataset size. Set to None for an infinite dataset. Defaults to None.
dataset_type (str, optional) – Dataset type name. Defaults to “None”.
- Raises:
ValueError – If any of the provided parameters are invalid or incompatible.
- to_dict() → Dict[str, Any][source]¶
Converts the dataset metadata into a dictionary format.
This method organizes various metadata fields related to the dataset into categories such as general dataset information, signal generation parameters, and dataset writing information.
- Returns:
A dictionary representation of the dataset metadata.
- Return type:
Dict[str, Any]
- property dataset_center_freq_max: float¶
The maximum center frequency for a signal
The maximum is a boundary condition such that the center frequency will not alias across the upper sampling rate boundary.
The calculation backs off by a small epsilon so that center_freq_max is never equal to sample_rate/2. This avoids the aliasing condition, because -sample_rate/2 and sample_rate/2 represent the same frequency.
- Returns:
maximum center frequency boundary for signal
- Return type:
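The epsilon back-off described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TorchSig's implementation: the function names, the relative back-off `REL_EPSILON`, and the lower bound of exactly `-sample_rate / 2` are all assumed for the example.

```python
REL_EPSILON = 1e-9  # assumed relative back-off; the library's epsilon may differ


def center_freq_min(sample_rate: float) -> float:
    """Assumed lower bound: the lower edge of the sampled band."""
    return -sample_rate / 2


def center_freq_max(sample_rate: float) -> float:
    """Upper bound kept strictly below sample_rate / 2 to avoid the alias of -sample_rate / 2."""
    return (sample_rate / 2) * (1 - REL_EPSILON)


lower = center_freq_min(10e6)
upper = center_freq_max(10e6)
```

With a 10 MHz sample rate, `upper` sits just under 5 MHz, so it can never coincide with the frequency that wraps around to `lower`.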
- property dataset_duration_max: float¶
The maximum duration possible within the dataset
The maximum is a boundary condition such that the signal duration will not exceed the total time duration of the dataset.
- Returns:
maximum duration for a signal
- Return type:
- property dataset_duration_min: float¶
The minimum duration possible within the dataset
The minimum is a boundary condition such that the signal duration will not be less than a specified minimum.
- Returns:
minimum duration for a signal
- Return type:
- property dataset_duration_in_samples_max: float¶
The maximum duration in samples possible within the dataset
The maximum is a boundary condition such that the signal duration in number of samples will not exceed the total number of samples within the dataset.
- Returns:
maximum duration for a signal in number of samples
- Return type:
- property dataset_duration_in_samples_min: float¶
The minimum duration in samples possible within the dataset
The minimum is a boundary condition such that the signal duration in number of samples will not be less than a specified minimum.
- Returns:
minimum duration for a signal in number of samples
- Return type:
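The duration properties come in two forms, a time duration and a duration in samples. A minimal sketch of the conversion between them follows, assuming the time form is expressed in seconds (the source does not state the unit). Both helper names are hypothetical.

```python
def samples_to_duration(num_samples: int, sample_rate: float) -> float:
    """Convert a length in samples to a duration (assumed to be seconds)."""
    return num_samples / sample_rate


def duration_to_samples(duration: float, sample_rate: float) -> int:
    """Convert a duration (assumed to be seconds) to the nearest whole number of samples."""
    return int(round(duration * sample_rate))


# Upper bound implied by a 4096-sample I/Q array at a 10 MHz sample rate.
max_duration = samples_to_duration(4096, 10e6)
max_samples = duration_to_samples(max_duration, 10e6)
```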
- property dataset_center_freq_min: float¶
The minimum center frequency for a signal
The minimum is a boundary condition such that the center frequency will not alias across the lower sampling rate boundary.
- Returns:
minimum center frequency boundary for signal
- Return type:
- property dataset_bandwidth_min: float¶
The minimum possible bandwidth for a signal
Provides a boundary for the minimum bandwidth of a signal. The narrowest possible signal is a tone, whose bandwidth is sample rate / number of samples.
- Returns:
the minimum bandwidth for a signal
- Return type:
- property dataset_bandwidth_max: float¶
The maximum possible bandwidth for a signal
Provides a boundary for the maximum bandwidth of a signal, which is the sampling rate.
- Returns:
the maximum bandwidth for a signal
- Return type:
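The two bandwidth boundaries above reduce to simple arithmetic. The sketch below uses hypothetical helper names and is not the library's code; it only restates the formulas given in the descriptions.

```python
def bandwidth_min(sample_rate: float, num_iq_samples: int) -> float:
    """Bandwidth of a tone: sample rate divided by the number of samples."""
    return sample_rate / num_iq_samples


def bandwidth_max(sample_rate: float) -> float:
    """The widest possible signal occupies the full sampling rate."""
    return sample_rate


bw_min = bandwidth_min(10e6, 4096)
bw_max = bandwidth_max(10e6)
```

For a 10 MHz sample rate and 4096 samples, the minimum bandwidth is about 2.44 kHz and the maximum is 10 MHz.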
- property signal_center_freq_min: None¶
Defines the minimum center frequency boundary for a signal. Must be within the boundary provided by dataset_center_freq_min().
- Returns:
minimum center frequency for signal
- Return type:
- property signal_center_freq_max: None¶
Defines the maximum center frequency boundary for a signal. Must be within the boundary provided by dataset_center_freq_max().
- Returns:
maximum center frequency for signal
- Return type:
- property signal_bandwidth_min: float¶
Defines the minimum bandwidth for a signal in the dataset. Must be within the boundary provided by dataset_bandwidth_min().
- Returns:
minimum bandwidth for a signal
- Return type:
- property signal_bandwidth_max: float¶
Defines the maximum bandwidth for a signal in the dataset. Must be within the boundary provided by dataset_bandwidth_max().
- Returns:
maximum bandwidth for a signal
- Return type:
- property signal_duration_in_samples_max: int¶
The maximum duration in samples for a signal
Provides a maximum duration for a signal in number of samples.
- Returns:
the maximum duration in samples for a signal
- Return type:
- property signal_duration_in_samples_min: int¶
The minimum duration in samples for a signal
Provides a minimum duration for a signal in number of samples.
- Returns:
the minimum duration in samples for a signal
- Return type:
- property num_iq_samples_dataset: int¶
Length of I/Q array per sample in dataset.
Returns the number of IQ samples of the dataset. This is the length of the array that contains the IQ samples.
- Returns:
number of IQ samples
- Return type:
- property sample_rate: float¶
Sample rate for the dataset.
Returns the sampling rate associated with the IQ samples of the dataset
- Returns:
sample rate
- Return type:
- property num_signals_max: int¶
Max number of signals in each sample in the dataset
Returns the maximum number of distinct signals in a sample of the wideband dataset.
- Returns:
max number of signals
- Return type:
- property num_signals_min: int¶
Minimum number of signals in each sample in the dataset.
- Returns:
min number of signals
- Return type:
- property num_signals_range: List[int]¶
Range of num_signals values that can be generated for a sample.
- Returns:
List of num_signals possibilities.
- Return type:
List[int]
- property num_signals_distribution: List[float]¶
Probabilities for each value in num_signals_range.
- Returns:
Probabilities that a sample is generated with N signals, for each N in num_signals_range.
- Return type:
List[float]
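The relationship between num_signals_range and num_signals_distribution can be sketched as follows. This is an illustration under assumptions rather than TorchSig's code: the helper names are hypothetical, the range is taken as inclusive of both ends (as the parameter description suggests), and the default is a uniform distribution.

```python
import random
from typing import List


def signals_range(num_signals_min: int, num_signals_max: int) -> List[int]:
    """All possible signal counts, inclusive of both bounds (assumed)."""
    return list(range(num_signals_min, num_signals_max + 1))


def uniform_distribution(values: List[int]) -> List[float]:
    """Default used when no distribution is supplied: equal probability per value."""
    return [1.0 / len(values)] * len(values)


values = signals_range(0, 3)
probs = uniform_distribution(values)

# Draw the number of signals for one sample using a seeded generator.
rng = random.Random(0)
num_signals = rng.choices(values, weights=probs, k=1)[0]
```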
- property transforms: list¶
Transforms to perform on signal data (after signal impairments).
- Returns:
Transform to apply to data.
- Return type:
- property impairment_level: int¶
Level of signal impairments to apply to signals (0-2)
- Returns:
Impairment level.
- Return type:
- property impairments: Impairments¶
Impairment signal and dataset transforms
- Returns:
Transforms or impairments
- Return type:
- property class_list: List[str]¶
Signal modulation class list for dataset.
- Returns:
List of signal modulation class names
- Return type:
List[str]
- property class_distribution: ndarray | List[str]¶
Signal modulation class distribution for dataset generation.
- Returns:
List of class probabilities.
- Return type:
np.ndarray | List[str]
- property num_samples: int¶
Getter for the number of samples in the dataset.
This property returns the number of samples that the dataset is configured to have. If the value is set to None, it indicates that the number of samples is considered infinite.
- Returns:
The number of samples in the dataset, or a representation of infinite samples if set to None.
- Return type:
- property noise_power_db: float¶
Reference noise power (dB) for the dataset
The noise power is a common reference used for all signal generation in order to establish accurate SNR calculations. It is given in decibels. The PSD estimate of the AWGN is calculated such that its average across all frequency bins equals noise_power_db.
- Returns:
noise power in dB
- Return type:
- property snr_db_min: float¶
Minimum SNR in dB for signals in dataset
Signals within the dataset will be assigned a signal to noise ratio (SNR), across a range defined by a minimum and maximum value. snr_db_min is the low end of the SNR range.
- Returns:
minimum SNR in dB
- Return type:
- property snr_db_max: float¶
Maximum SNR in dB for signals in dataset
Signals within the dataset will be assigned a signal to noise ratio (SNR), across a range defined by a minimum and maximum value. snr_db_max is the high end of the SNR range.
- Returns:
maximum SNR in dB
- Return type:
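The way an SNR range and a reference noise power combine can be sketched as below. The source does not say how the SNR is drawn from the range, so the uniform draw here is an assumption, and all function names are hypothetical.

```python
import random


def draw_snr_db(rng: random.Random, snr_db_min: float, snr_db_max: float) -> float:
    """Pick an SNR inside the configured range (uniform draw is an assumption)."""
    return rng.uniform(snr_db_min, snr_db_max)


def signal_power_db(noise_power_db: float, snr_db: float) -> float:
    """SNR in dB is signal power minus noise power, so signal = noise + SNR."""
    return noise_power_db + snr_db


def db_to_linear(power_db: float) -> float:
    """Convert a power in dB to a linear power ratio."""
    return 10 ** (power_db / 10)


rng = random.Random(0)
snr_db = draw_snr_db(rng, 0.0, 50.0)
power_db = signal_power_db(0.0, snr_db)
```

Keeping the noise power fixed as the common reference means each signal's power follows directly from its assigned SNR.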
- property signal_duration_max: float¶
Getter for the maximum signal duration.
- Returns:
The maximum of the signal duration.
- Return type:
- property signal_duration_min: float¶
Getter for the minimum signal duration.
- Returns:
The minimum of the signal duration.
- Return type:
- property fft_size: int¶
The size of FFT (number of bins) to be used in spectrogram.
The FFT size used to compute the spectrogram for the wideband dataset.
- Returns:
FFT size
- Return type:
- property fft_stride: int¶
The stride of input samples in FFT (number of samples)
The FFT stride controls the distance in samples between successive FFTs. A smaller FFT stride means more averaging between FFTs; a larger stride means less averaging between FFTs. fft_stride = fft_size means there is no overlap of samples between the current and next FFT. fft_stride = fft_size/2 means there is 50% overlap between the input samples of the current and next FFT.
- Returns:
FFT stride
- Return type:
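The overlap figures quoted above follow from the ratio of stride to FFT size. The sketch below uses hypothetical helper names and assumes that a trailing partial window is dropped rather than padded; the library may handle the tail differently.

```python
def overlap_fraction(fft_size: int, fft_stride: int) -> float:
    """Fraction of input samples shared by two successive FFTs."""
    return 1.0 - fft_stride / fft_size


def num_fft_frames(num_samples: int, fft_size: int, fft_stride: int) -> int:
    """Number of full FFT windows that fit (assumes no padding of a partial window)."""
    if num_samples < fft_size:
        return 0
    return (num_samples - fft_size) // fft_stride + 1


no_overlap = overlap_fraction(512, 512)
half_overlap = overlap_fraction(512, 256)
frames_no_overlap = num_fft_frames(4096, 512, 512)
frames_half_overlap = num_fft_frames(4096, 512, 256)
```

Halving the stride roughly doubles the number of FFTs computed over the same samples, which is where the extra averaging comes from.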
- property fft_frequency_resolution: float¶
Frequency resolution of the spectrogram
The frequency resolution, or resolution bandwidth, of the FFT.
- Returns:
frequency resolution
- Return type:
- property fft_frequency_min: float¶
The minimum frequency associated with the FFT
Defines the smallest frequency within the FFT of the spectrogram. The FFT has discrete bins and therefore each bin has an associated frequency. This frequency is associated with the 0th bin or left-most frequency bin.
- Returns:
minimum FFT frequency
- Return type:
- property fft_frequency_max: float¶
The maximum frequency associated with the FFT
Defines the largest frequency within the FFT of the spectrogram. The FFT has discrete bins and therefore each bin has an associated frequency. This frequency is associated with the N-1’th bin or right-most frequency bin.
- Returns:
maximum FFT frequency
- Return type:
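The three FFT frequency properties above can be related in one short sketch. It assumes a centered (fftshift-style) bin layout in which bin 0 sits at -sample_rate/2, which matches the "left-most bin" wording but is not confirmed by the source. The function names are hypothetical.

```python
from typing import List


def fft_resolution(sample_rate: float, fft_size: int) -> float:
    """Width of one FFT bin (the resolution bandwidth)."""
    return sample_rate / fft_size


def fft_bin_frequencies(sample_rate: float, fft_size: int) -> List[float]:
    """Frequency of each bin, assuming bin 0 is at -sample_rate / 2."""
    resolution = fft_resolution(sample_rate, fft_size)
    return [-sample_rate / 2 + k * resolution for k in range(fft_size)]


freqs = fft_bin_frequencies(1024.0, 8)
freq_min = freqs[0]    # bin 0, the left-most bin
freq_max = freqs[-1]   # bin N-1, the right-most bin
```

Under this layout the highest bin is one resolution step below sample_rate/2, so the maximum FFT frequency never reaches the upper Nyquist edge.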
- property frequency_min: float¶
Minimum representable frequency
Boundary edge for testing the lower Nyquist sampling boundary.
- Returns:
minimum frequency
- Return type:
- property frequency_max: float¶
Maximum representable frequency
Boundary edge for testing the upper Nyquist sampling boundary. Due to the circular nature of the frequency domain, both -fs/2 and fs/2 represent the boundary, therefore an epsilon value is used to back off the upper edge slightly.
- Returns:
maximum frequency
- Return type: