Datasets¶

There are two main types of datasets: torchsig.datasets.datasets.NewTorchSigDataset and torchsig.datasets.datasets.StaticTorchSigDataset, referred to below as NewDataset and StaticDataset.

NewDataset and its counterparts torchsig.datasets.narrowband.NewNarrowband and torchsig.datasets.wideband.NewWideband generate synthetic data in memory (infinitely). Samples are not saved after being returned, and previous samples are inaccessible.

To then save a dataset to disk, use a torchsig.utils.writer.DatasetCreator, which accepts a NewDataset object.

StaticDataset classes (torchsig.datasets.narrowband.StaticNarrowband and torchsig.datasets.wideband.StaticWideband) load a saved dataset from disk. Samples can be accessed in any order, and previously generated samples remain accessible.

Note: If a NewDataset is written to disk with no transforms and no target transforms, it is considered raw; otherwise, it is considered processed. When a raw dataset is loaded back in using a StaticDataset object, users can define transforms and target transforms to be applied. When a processed dataset is loaded back in, users cannot define any transforms or target transforms.
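The raw/processed rule in the note can be sketched in a few lines of plain Python. This is only an illustration of the rule as stated, not the library's implementation; the helper names is_raw and check_load_time_transforms are hypothetical.

```python
from typing import Sequence


def is_raw(transforms: Sequence, target_transforms: Sequence) -> bool:
    """A dataset counts as raw only if nothing was applied before writing."""
    return len(transforms) == 0 and len(target_transforms) == 0


def check_load_time_transforms(
    written_raw: bool, transforms: Sequence, target_transforms: Sequence
) -> None:
    """Hypothetical check: a processed dataset accepts no new transforms."""
    if not written_raw and (transforms or target_transforms):
        raise ValueError("processed datasets cannot take new transforms")


# A dataset written without transforms is raw, so load-time transforms are fine.
raw_flag = is_raw([], [])
check_load_time_transforms(raw_flag, [abs], [])
```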

Base Classes ¶

TorchSig Datasets ¶

Dataset Base Classes for creation and static loading.

class torchsig.datasets.datasets.TorchsigIterableDataset(dataset_metadata: DatasetMetadata | str | dict, **kwargs)[source]¶

Bases: IterableDataset, Seedable

Creates a new TorchSig dataset that generates data infinitely unless num_samples inside dataset_metadata is defined.

This base class provides the functionality to generate signals and write them to disk if necessary. The dataset will continue to generate samples infinitely unless a num_samples value is defined in the dataset_metadata.

reset()[source]¶: Resets the dataset to its initial state.

property dataset_metadata¶

Returns the dataset metadata.

Returns:: The dataset metadata.
Return type:: DatasetMetadata
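The "infinite unless num_samples is defined" behavior can be pictured with a plain Python iterator. This is a rough sketch of the idea only; sample_indices is a hypothetical helper and does not reflect how the class generates signals internally.

```python
import itertools
from typing import Iterator, Optional


def sample_indices(num_samples: Optional[int] = None) -> Iterator[int]:
    """Yield sample indices forever, or stop after num_samples if it is set."""
    counter = itertools.count()
    if num_samples is None:
        return counter
    return itertools.islice(counter, num_samples)


# Unbounded stream: the caller decides when to stop.
first_three = list(itertools.islice(sample_indices(), 3))
# Bounded stream: iteration ends on its own.
bounded = list(sample_indices(num_samples=5))
```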

class torchsig.datasets.datasets.NewTorchSigDataset(dataset_metadata: DatasetMetadata | str | dict, **kwargs)[source]¶

Bases: Dataset, Seedable

Creates a new TorchSig dataset that generates data infinitely unless num_samples inside dataset_metadata is defined.

This base class provides the functionality to generate signals and write them to disk if necessary. The dataset will continue to generate samples infinitely unless a num_samples value is defined in the dataset_metadata.

reset()[source]¶: Resets the dataset to its initial state.

property dataset_metadata¶

Returns the dataset metadata.

Returns:: The dataset metadata.
Return type:: DatasetMetadata

class torchsig.datasets.datasets.StaticTorchSigDataset(root: str, impairment_level: int, dataset_type: str, transforms: list = [], target_transforms: list = [], file_handler_class: TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, train: bool = None)[source]¶

Bases: Dataset

Static Dataset class, which loads pre-generated data from a directory.

This class assumes that the dataset has already been generated and saved to disk using a subclass of NewTorchSigDataset. It allows loading raw or processed data from disk for inference or analysis.

Parameters:

root (str) – The root directory where the dataset is stored.
impairment_level (int) – Defines impairment level 0, 1, 2.
dataset_type (str) – Type of the dataset, either “narrowband” or “wideband”.
transforms (list, optional) – Transforms to apply to the data (default: []).
target_transforms (list, optional) – Target transforms to apply (default: []).
file_handler_class (TorchSigFileHandler, optional) – Class used for reading the dataset (default: ZarrFileHandler).

Dataset Metadata ¶

Dataset Metadata class for Narrowband and Wideband

class torchsig.datasets.dataset_metadata.DatasetMetadata(num_iq_samples_dataset: int, fft_size: int, impairment_level: int, num_signals_max: int, sample_rate: float = 10000000.0, num_signals_min: int = 0, num_signals_distribution: ndarray | List[float] | None = None, snr_db_min: float = 0.0, snr_db_max: float = 50.0, signal_duration_min: float | None = None, signal_duration_max: float | None = None, signal_bandwidth_min: float | None = None, signal_bandwidth_max: float | None = None, signal_center_freq_min: float | None = None, signal_center_freq_max: float | None = None, transforms: list = [], target_transforms: list = [], class_list: List[str] | None = None, class_distribution: ndarray | List[float] | None = None, num_samples: int | None = None, dataset_type: str = 'None', **kwargs)[source]¶

Bases: Seedable

Dataset Metadata. Contains useful information about the dataset.

Maintains the metadata for the parameters of the datasets, such as sample rate. The class holds all of the high level information about the dataset that the signals, impairments and other processes will require. Parameters that are common to all signals will be stored in the dataset metadata. For example, all signal generation requires a common and consistent sampling rate reference.

This class is needed at almost every level of the DSP. Therefore, rather than passing around multiple variables or a dict, or using globals, this class is defined and passed as a parameter.

This class stores metadata related to the dataset, including parameters related to signal generation, transforms, dataset path, and sample distribution. It also handles the verification of dataset settings and ensures that the configuration is valid for the dataset creation process.

minimum_params: List[str] = ['num_iq_samples_dataset', 'fft_size', 'num_signals_max']¶
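One way to read minimum_params is as the list of keys a configuration must provide. The sketch below checks a plain dict against that list; missing_params is a hypothetical helper, and the library may validate its inputs differently.

```python
from typing import Any, Dict, List

# Copied from the attribute above.
MINIMUM_PARAMS: List[str] = ["num_iq_samples_dataset", "fft_size", "num_signals_max"]


def missing_params(config: Dict[str, Any], required: List[str] = MINIMUM_PARAMS) -> List[str]:
    """Return the required parameter names that the config does not define."""
    return [name for name in required if name not in config]


config = {"num_iq_samples_dataset": 4096, "fft_size": 64}
missing = missing_params(config)
```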

to_dict() → Dict[str, Any][source]¶

Converts the dataset metadata into a dictionary format.

This method organizes various metadata fields related to the dataset into categories such as general dataset information, signal generation parameters, and dataset writing information.

Returns:: A dictionary representation of the dataset metadata.
Return type:: Dict[str, Any]

property dataset_center_freq_max: float¶

The maximum center frequency for a signal

The maximum is a boundary condition such that the center frequency will not alias across the upper sampling rate boundary.

The calculation includes a small epsilon such that the center_freq_max is never equal to sample_rate/2 to avoid the aliasing condition because -sample_rate/2 is equivalent to sample_rate/2.

Returns:: maximum center frequency boundary for signal
Return type:: float
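The reason for backing off from sample_rate/2 can be shown numerically. In the sketch below the "epsilon" is taken as one floating-point step below sample_rate/2, which is an assumption made for illustration (the library's actual epsilon is not stated here), and wrap_frequency is a hypothetical helper.

```python
import math


def center_freq_max(sample_rate: float) -> float:
    """Largest float strictly below sample_rate / 2 (one possible epsilon)."""
    return math.nextafter(sample_rate / 2, -math.inf)


def wrap_frequency(freq: float, sample_rate: float) -> float:
    """Map a frequency into the interval [-sample_rate/2, sample_rate/2)."""
    return (freq + sample_rate / 2) % sample_rate - sample_rate / 2


sample_rate = 10e6
f_max = center_freq_max(sample_rate)
# A center frequency of exactly +sample_rate/2 aliases to the lower edge.
aliased = wrap_frequency(sample_rate / 2, sample_rate)
```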

property dataset_duration_max: float¶

The maximum duration possible within the dataset

The maximum is a boundary condition such that the signal duration will not exceed the total time duration of the dataset.

Returns:: maximum duration for a signal
Return type:: float

property dataset_duration_min: float¶

The minimum duration possible within the dataset

The minimum is a boundary condition such that the signal duration will not be less than a specified minimum.

Returns:: minimum duration for a signal
Return type:: float

property dataset_duration_in_samples_max: float¶

The maximum duration in samples possible within the dataset

The maximum is a boundary condition such that the signal duration in number of samples will not exceed the total number of samples within the dataset.

Returns:: maximum duration for a signal in number of samples
Return type:: float

property dataset_duration_in_samples_min: float¶

The minimum duration in samples possible within the dataset

The minimum is a boundary condition such that the signal duration in number of samples will not be less than a specified minimum.

Returns:: minimum duration for a signal in number of samples
Return type:: float
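The duration properties come in two units, seconds and samples, which are related through the sample rate. The sketch below assumes the total dataset duration is num_iq_samples_dataset / sample_rate; both helper names are hypothetical.

```python
def dataset_duration_seconds(num_iq_samples: int, sample_rate: float) -> float:
    """Time spanned by num_iq_samples at the given sample rate."""
    return num_iq_samples / sample_rate


def duration_to_samples(duration_seconds: float, sample_rate: float) -> int:
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(duration_seconds * sample_rate))


duration_max = dataset_duration_seconds(4096, 10e6)   # seconds
samples_max = duration_to_samples(duration_max, 10e6)  # back to samples
```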

property dataset_center_freq_min: float¶

The minimum center frequency for a signal

The minimum is a boundary condition such that the center frequency will not alias across the lower sampling rate boundary.

Returns:: minimum center frequency boundary for signal
Return type:: float

property dataset_bandwidth_min: float¶

The minimum possible bandwidth for a signal

Provides a boundary for the minimum bandwidth of a signal, which is the bandwidth of a tone, which is sample rate / number of samples.

Returns:: the minimum bandwidth for a signal
Return type:: float

property dataset_bandwidth_max: float¶

The maximum possible bandwidth for a signal

Provides a boundary for the maximum bandwidth of a signal, which is the sampling rate.

Returns:: the maximum bandwidth for a signal
Return type:: float
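The two bandwidth boundaries follow directly from the sample rate and the number of samples. The sketch below assumes "number of samples" means num_iq_samples_dataset; bandwidth_bounds is a hypothetical helper.

```python
from typing import Tuple


def bandwidth_bounds(sample_rate: float, num_iq_samples: int) -> Tuple[float, float]:
    """Return (minimum, maximum) bandwidth as described above."""
    bw_min = sample_rate / num_iq_samples  # bandwidth of a single tone
    bw_max = sample_rate                   # the full sampled band
    return bw_min, bw_max


bw_min, bw_max = bandwidth_bounds(sample_rate=10e6, num_iq_samples=4096)
```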

property signal_center_freq_min: None¶

Defines the minimum center frequency boundary for a signal. Must be within the boundary provided by dataset_center_freq_min().

Returns:: minimum center frequency for signal
Return type:: float

property signal_center_freq_max: None¶

Defines the maximum center frequency boundary for a signal. Must be within the boundary provided by dataset_center_freq_max().

Returns:: maximum center frequency for signal
Return type:: float

property signal_bandwidth_min: float¶

Defines the minimum bandwidth for a signal in the dataset. Must be within the boundary provided by dataset_bandwidth_min().

Returns:: minimum bandwidth for a signal
Return type:: float

property signal_bandwidth_max: float¶

Defines the maximum bandwidth for a signal in the dataset. Must be within the boundary provided by dataset_bandwidth_max().

Returns:: maximum bandwidth for a signal
Return type:: float

property signal_duration_in_samples_max: int¶

The maximum duration in samples for a signal

Provides a maximum duration for a signal in number of samples.

Returns:: the maximum duration in samples for a signal
Return type:: float

property signal_duration_in_samples_min: int¶

The minimum duration in samples for a signal

Provides a minimum duration for a signal in number of samples.

Returns:: the minimum duration in samples for a signal
Return type:: float

property num_iq_samples_dataset: int¶

Length of I/Q array per sample in dataset.

Returns the number of IQ samples in the dataset, which is the length of the array that contains the IQ samples.

Returns:: number of IQ samples
Return type:: int

property sample_rate: float¶

Sample rate for the dataset.

Returns the sampling rate associated with the IQ samples of the dataset

Returns:: sample rate
Return type:: float

property num_signals_max: int¶

Max number of signals in each sample in the dataset

Returns the maximum number of distinct signals that a sample in the dataset can contain.

Returns:: max number of signals
Return type:: int

property num_signals_min: int¶

Minimum number of signals in each sample in the dataset.

Returns:: min number of signals
Return type:: int

property num_signals_range: List[int]¶

Range of num_signals values that can be generated for a sample.

Returns:: List of num_signals possibilities.
Return type:: List[int]

property num_signals_distribution: List[float]¶

Probabilities for each value in num_signals_range.

Returns:: Probabilities that a sample is generated with N signals, one per value in num_signals_range.
Return type:: List[float]
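How num_signals_range and num_signals_distribution fit together can be sketched with the standard library. The inclusive range and the uniform fallback when no distribution is given are assumptions, and both helper names are hypothetical.

```python
import random
from typing import List, Optional


def num_signals_range(num_signals_min: int, num_signals_max: int) -> List[int]:
    """All possible signal counts, assuming both ends are inclusive."""
    return list(range(num_signals_min, num_signals_max + 1))


def draw_num_signals(
    values: List[int],
    distribution: Optional[List[float]] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a signal count; one probability per entry in values."""
    rng = rng or random.Random()
    if distribution is None:
        return rng.choice(values)  # assumed uniform fallback
    return rng.choices(values, weights=distribution, k=1)[0]


values = num_signals_range(0, 3)
count = draw_num_signals(values, [0.1, 0.2, 0.3, 0.4], random.Random(0))
```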

property transforms: list¶

Transforms to perform on signal data (after signal impairments).

Returns:: Transform to apply to data.
Return type:: Transform

property target_transforms: list¶

Target Transform to apply.

Returns:: Target transforms to apply.
Return type:: TargetTransform

property impairment_level: int¶

Level of signal impairments to apply to signals (0-2)

Returns:: Impairment level.
Return type:: int

property impairments: Impairments¶

Impairment signal and dataset transforms

Returns:: Transforms or impairments
Return type:: Impairments

property class_list: List[str]¶

Signal modulation class list for dataset.

Returns:: List of signal modulation class names
Return type:: List[str]

property class_distribution: ndarray | List[str]¶

Signal modulation class distribution for dataset generation.

Returns:: List of class probabilities.
Return type:: np.ndarray | List[str]

property num_samples: int¶

Getter for the number of samples in the dataset.

This property returns the number of samples that the dataset is configured to have. If the value is set to None, it indicates that the number of samples is considered infinite.

Returns:: The number of samples in the dataset, or a representation of infinite samples if set to None.
Return type:: int

property dataset_type: str¶

Type of dataset.

Returns:: Dataset type name
Return type:: str

property noise_power_db: float¶

Reference noise power (dB) for the dataset

The noise power is a common reference to be used for all signal generation in order to establish accurate SNR calculations. The noise power is given in decibels. The PSD estimate of the AWGN is calculated such that its average across all frequency bins equals noise_power_db.

Returns:: noise power in dB
Return type:: float

property snr_db_min: float¶

Minimum SNR in dB for signals in dataset

Signals within the dataset will be assigned a signal to noise ratio (SNR), across a range defined by a minimum and maximum value. snr_db_min is the low end of the SNR range.

Returns:: minimum SNR in dB
Return type:: float

property snr_db_max: float¶

Maximum SNR in dB for signals in dataset

Signals within the dataset will be assigned a signal to noise ratio (SNR), across a range defined by a minimum and maximum value. snr_db_max is the high end of the SNR range.

Returns:: maximum SNR in dB
Return type:: float
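As an illustration of the SNR range, the sketch below draws an SNR between snr_db_min and snr_db_max and converts it to a linear power ratio. The uniform draw is an assumption (the text only says the SNR falls within the range), and the helper names are hypothetical.

```python
import random


def draw_snr_db(snr_db_min: float, snr_db_max: float, rng: random.Random) -> float:
    """Pick an SNR in dB within the configured range (uniform is assumed)."""
    return rng.uniform(snr_db_min, snr_db_max)


def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (value_db / 10.0)


rng = random.Random(0)
snr_db = draw_snr_db(0.0, 50.0, rng)  # defaults from the signature above
snr_linear = db_to_linear(snr_db)
```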

property signal_duration_max: float¶

Getter for the maximum signal duration.

Returns:: The maximum of the signal duration.
Return type:: float

property signal_duration_min: float¶

Getter for the minimum signal duration.

Returns:: The minimum of the signal duration.
Return type:: float

property fft_size: int¶

The size of FFT (number of bins) to be used in spectrogram.

The FFT size used to compute the spectrogram for the wideband dataset.

Returns:: FFT size
Return type:: int

property fft_stride: int¶

The stride of input samples in FFT (number of samples)

The FFT stride controls the distance in samples between successive FFTs. A smaller FFT stride means more averaging between FFTs; a larger stride means less averaging between FFTs. fft_stride = fft_size means there is no overlap of samples between the current and next FFT. fft_stride = fft_size/2 means there is 50% overlap between the input samples of the current and next FFT.

Returns:: FFT stride
Return type:: int
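The relationship between stride and overlap can be worked through with two small helpers. The frame count below assumes no padding at the end of the IQ array, which may not match the library's spectrogram; both function names are hypothetical.

```python
def fft_overlap_fraction(fft_size: int, fft_stride: int) -> float:
    """Fraction of input samples shared by two successive FFTs."""
    return 1.0 - fft_stride / fft_size


def num_fft_frames(num_iq_samples: int, fft_size: int, fft_stride: int) -> int:
    """Number of full FFT windows that fit, assuming no padding."""
    if num_iq_samples < fft_size:
        return 0
    return 1 + (num_iq_samples - fft_size) // fft_stride


no_overlap = fft_overlap_fraction(fft_size=64, fft_stride=64)
half_overlap = fft_overlap_fraction(fft_size=64, fft_stride=32)
frames = num_fft_frames(num_iq_samples=4096, fft_size=64, fft_stride=32)
```

Halving the stride roughly doubles the number of FFT frames computed over the same IQ array.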

property fft_frequency_resolution: float¶

Frequency resolution of the spectrogram

The frequency resolution, or resolution bandwidth, of the FFT.

Returns:: frequency resolution
Return type:: float

property fft_frequency_min: float¶

The minimum frequency associated with the FFT

Defines the smallest frequency within the FFT of the spectrogram. The FFT has discrete bins and therefore each bin has an associated frequency. This frequency is associated with the 0th bin or left-most frequency bin.

Returns:: minimum FFT frequency
Return type:: float

property fft_frequency_max: float¶

The maximum frequency associated with the FFT

Defines the largest frequency within the FFT of the spectrogram. The FFT has discrete bins and therefore each bin has an associated frequency. This frequency is associated with the N-1’th bin or right-most frequency bin.

Returns:: maximum FFT frequency
Return type:: float
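The three FFT frequency properties can be related through the bin layout. The sketch below assumes a zero-centered layout in which the left-most bin sits at -sample_rate/2; the library's exact convention may differ, and fft_bin_frequencies is a hypothetical helper.

```python
from typing import List


def fft_bin_frequencies(sample_rate: float, fft_size: int) -> List[float]:
    """Frequency of each bin, bin 0 being the left-most (most negative)."""
    resolution = sample_rate / fft_size  # frequency resolution per bin
    return [(k - fft_size // 2) * resolution for k in range(fft_size)]


freqs = fft_bin_frequencies(sample_rate=10e6, fft_size=8)
resolution = freqs[1] - freqs[0]
freq_min, freq_max = freqs[0], freqs[-1]  # 0th and (N-1)'th bins
```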

property frequency_min: float¶

Minimum representable frequency

Boundary edge for testing the lower Nyquist sampling boundary.

Returns:: minimum frequency
Return type:: float

property frequency_max: float¶

Maximum representable frequency

Boundary edge for testing the upper Nyquist sampling boundary. Due to the circular nature of the frequency domain, both -fs/2 and fs/2 represent the boundary, therefore an epsilon value is used to back off the upper edge slightly.

Returns:: maximum frequency
Return type:: float

class torchsig.datasets.dataset_metadata.NarrowbandMetadata(num_iq_samples_dataset: int, fft_size: int, impairment_level: int, sample_rate: float = 10000000.0, num_signals_min: int | None = None, num_signals_distribution: ndarray | List[float] | None = None, snr_db_min: float = 0.0, snr_db_max: float = 50.0, signal_duration_min: float | None = None, signal_duration_max: float | None = None, signal_bandwidth_min: float | None = None, signal_bandwidth_max: float | None = None, signal_center_freq_min: float | None = None, signal_center_freq_max: float | None = None, transforms: list = [], target_transforms: list = [], class_list: List[str] = ['ook', '4ask', '8ask', '16ask', '32ask', '64ask', '2fsk', '2gfsk', '2msk', '2gmsk', '4fsk', '4gfsk', '4msk', '4gmsk', '8fsk', '8gfsk', '8msk', '8gmsk', '16fsk', '16gfsk', '16msk', '16gmsk', 'bpsk', 'qpsk', '8psk', '16psk', '32psk', '64psk', '16qam', '32qam', '32qam_cross', '64qam', '128qam_cross', '256qam', '512qam_cross', '1024qam', 'ofdm-64', 'ofdm-72', 'ofdm-128', 'ofdm-180', 'ofdm-256', 'ofdm-300', 'ofdm-512', 'ofdm-600', 'ofdm-900', 'ofdm-1024', 'ofdm-1200', 'ofdm-2048', 'fm', 'am-dsb-sc', 'am-dsb', 'am-lsb', 'am-usb', 'lfm_data', 'lfm_radar', 'chirpss', 'tone'], class_distribution=None, num_samples: int | None = None, **kwargs)[source]¶

Bases: DatasetMetadata

Narrowband Dataset Metadata Class

This class encapsulates the metadata for a narrowband dataset, extending the base DatasetMetadata class. It provides useful information about the dataset such as the number of samples, the sample rate, the FFT size, the impairment level, and signal-related parameters. Additionally, it handles specific properties for narrowband signals, such as oversampling rates and center frequency offset (CFO) error percentage.

minimum_params¶

List of minimum required parameters for the narrowband dataset.

Type:: List[str]

minimum_params: List[str] = ['num_iq_samples_dataset', 'fft_size', 'impairment_level']¶

class torchsig.datasets.dataset_metadata.WidebandMetadata(num_iq_samples_dataset: int, fft_size: int, impairment_level: int, num_signals_max: int, sample_rate: float = 100000000.0, num_signals_min: int | None = None, num_signals_distribution: ndarray | List[float] | None = None, snr_db_min: float = 0.0, snr_db_max: float = 50.0, signal_duration_min: float | None = None, signal_duration_max: float | None = None, signal_bandwidth_min: float | None = None, signal_bandwidth_max: float | None = None, signal_center_freq_min: float | None = None, signal_center_freq_max: float | None = None, transforms: list = [], target_transforms: list = [], class_list: List[str] = ['ook', '4ask', '8ask', '16ask', '32ask', '64ask', '2fsk', '2gfsk', '2msk', '2gmsk', '4fsk', '4gfsk', '4msk', '4gmsk', '8fsk', '8gfsk', '8msk', '8gmsk', '16fsk', '16gfsk', '16msk', '16gmsk', 'bpsk', 'qpsk', '8psk', '16psk', '32psk', '64psk', '16qam', '32qam', '32qam_cross', '64qam', '128qam_cross', '256qam', '512qam_cross', '1024qam', 'ofdm-64', 'ofdm-72', 'ofdm-128', 'ofdm-180', 'ofdm-256', 'ofdm-300', 'ofdm-512', 'ofdm-600', 'ofdm-900', 'ofdm-1024', 'ofdm-1200', 'ofdm-2048', 'fm', 'am-dsb-sc', 'am-dsb', 'am-lsb', 'am-usb', 'lfm_data', 'lfm_radar', 'chirpss', 'tone'], class_distribution=None, num_samples: int | None = None, **kwargs)[source]¶

Bases: DatasetMetadata

Wideband Dataset Metadata Class

This class encapsulates all useful metadata for a wideband dataset, extending the DatasetMetadata class. It adds functionality to manage the FFT size used to compute the spectrogram, along with additional parameters specific to wideband signals like bandwidth, center frequency, and impairments.

minimum_params¶

List of the minimum parameters required for the dataset.

Type:: List[str]

minimum_params: List[str] = ['num_iq_samples_dataset', 'fft_size', 'num_signals_max', 'impairment_level']¶

Narrowband ¶

NarrowbandMetadata and NewNarrowband Class

class torchsig.datasets.narrowband.NewNarrowband(dataset_metadata: DatasetMetadata | str | dict, **kwargs)[source]¶

Bases: NewTorchSigDataset

Creates a Narrowband dataset.

This class is responsible for creating the Narrowband dataset, which includes the dataset metadata and signal impairments.

Parameters:

dataset_metadata (DatasetMetadata | str | dict) – Metadata for the Narrowband dataset. This can be a DatasetMetadata object, a string (path to the metadata file), or a dictionary.
**kwargs – Additional keyword arguments passed to the parent class (NewTorchSigDataset).

class torchsig.datasets.narrowband.StaticNarrowband(root: str, impairment_level: int, transforms: list = [], target_transforms: list = [], file_handler_class: ~torchsig.utils.file_handlers.base_handler.TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, train: bool | None = None, **kwargs)[source]¶

Bases: StaticTorchSigDataset

Loads and provides access to a pre-generated Narrowband dataset.

This class allows for loading a narrowband dataset stored on disk, with the ability to apply transformations to the data and target labels. The dataset can be accessed in raw or impaired form.

Parameters:

root (str) – The root directory where the dataset is stored.
impairment_level (int) – Defines impairment level 0, 1, 2.
transforms (list, optional) – A transformation to apply to the data. Defaults to [].
target_transforms (list, optional) – A transformation to apply to the targets. Defaults to [].
file_handler_class (TorchSigFileHandler, optional) – The file handler class for reading the dataset. Defaults to ZarrFileHandler.
**kwargs – Additional keyword arguments passed to the parent class (StaticTorchSigDataset).

Wideband ¶

WidebandMetadata and NewWideband Class

class torchsig.datasets.wideband.NewWideband(dataset_metadata: DatasetMetadata | str | dict, **kwargs)[source]¶

Bases: NewTorchSigDataset

Creates a Wideband dataset.

This class is responsible for creating a Wideband dataset, including the metadata and any transformations needed.

Parameters:

dataset_metadata (DatasetMetadata | str | dict) – Metadata for the Wideband dataset. This can be a DatasetMetadata object, a string (path to the metadata file), or a dictionary.
**kwargs – Additional keyword arguments passed to the parent class (NewTorchSigDataset).

class torchsig.datasets.wideband.StaticWideband(root: str, impairment_level: int, transforms: list = [], target_transforms: list = [], file_handler_class: ~torchsig.utils.file_handlers.base_handler.TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, train: bool | None = None, **kwargs)[source]¶

Bases: StaticTorchSigDataset

Loads and provides access to a pre-generated Wideband dataset.

This class allows loading a pre-generated Wideband dataset from disk, and includes options for applying transformations to both the data and target labels. The dataset can be accessed in raw or impaired form, depending on the flags set.

Parameters:

root (str) – The root directory where the dataset is stored.
impairment_level (int) – Defines impairment level 0, 1, 2.
transforms (list, optional) – A transformation to apply to the data. Defaults to [].
target_transforms (list, optional) – A transformation to apply to the targets. Defaults to [].
file_handler_class (TorchSigFileHandler, optional) – The file handler class for reading the dataset. Defaults to ZarrFileHandler.
**kwargs – Additional keyword arguments passed to the parent class (StaticTorchSigDataset).

Datamodules ¶

PyTorch Lightning DataModules for Narrowband and Wideband

Learn More: https://lightning.ai/docs/pytorch/stable/data/datamodule.html

If the dataset does not exist at root, a new dataset is created and written to disk. If the dataset does exist, it is simply loaded back in.

class torchsig.datasets.datamodules.TorchSigDataModule(root: str, dataset: str, train_metadata: ~torchsig.datasets.dataset_metadata.DatasetMetadata | str | dict, val_metadata: ~torchsig.datasets.dataset_metadata.DatasetMetadata | str | dict, batch_size: int = 1, num_workers: int = 1, collate_fn: ~typing.Callable | None = None, create_batch_size: int = 8, create_num_workers: int = 4, file_handler: ~torchsig.utils.file_handlers.base_handler.TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, overwrite: bool = False, transforms: list = [], target_transforms: list = [])[source]¶

Bases: LightningDataModule

PyTorch Lightning DataModule for managing TorchSig datasets.

Parameters:

root (str) – The root directory where datasets are stored or created.
dataset (str) – The name of the dataset (either ‘narrowband’ or ‘wideband’).
train_metadata (DatasetMetadata | str | dict) – Metadata for the training dataset.
val_metadata (DatasetMetadata | str | dict) – Metadata for the validation dataset.
batch_size (int, optional) – The batch size for data loading. Defaults to 1.
num_workers (int, optional) – The number of worker processes for data loading. Defaults to 1.
collate_fn (Callable, optional) – A function to collate data into batches.
create_batch_size (int, optional) – The batch size used during dataset creation. Defaults to 8.
create_num_workers (int, optional) – The number of workers used during dataset creation. Defaults to 4.
file_handler (TorchSigFileHandler, optional) – The file handler for managing data storage. Defaults to ZarrFileHandler.
overwrite (bool, optional) – Overwrites data on disk. Defaults to False.
transforms (list, optional) – A list of transformations to apply to the input data. Defaults to an empty list.
target_transforms (list, optional) – A list of transformations to apply to the target labels. Defaults to an empty list.

train: self.static_dataset_class | None¶

val: self.static_dataset_class | None¶

test: self.static_dataset_class | None¶

prepare_data() → None[source]¶

Prepares the dataset by creating new datasets if they do not exist on disk. The datasets are created using the DatasetCreator class.

If the dataset already exists on disk, it is loaded back into memory.
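The create-or-load decision can be sketched with the standard library alone. The marker file name dataset_info.json and the prepare_dataset helper are hypothetical stand-ins; the actual on-disk layout is determined by the file handler and DatasetCreator.

```python
import json
import tempfile
from pathlib import Path


def prepare_dataset(root: Path, overwrite: bool = False) -> str:
    """Create a placeholder dataset if missing (or overwriting), else reuse it."""
    marker = root / "dataset_info.json"  # hypothetical marker file
    if marker.exists() and not overwrite:
        return "loaded"
    root.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"num_samples": 10}))
    return "created"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "narrowband"
    first = prepare_dataset(root)                   # nothing on disk yet
    second = prepare_dataset(root)                  # found on disk
    third = prepare_dataset(root, overwrite=True)   # forced regeneration
```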

setup(stage: str = 'train') → None[source]¶

Sets up the train and validation datasets for the given stage.

Parameters:: stage (str, optional) – The stage of the DataModule, typically ‘train’ or ‘test’. Defaults to ‘train’.

train_dataloader() → DataLoader[source]¶

Returns the DataLoader for the training dataset.

Returns:: A PyTorch DataLoader for the training dataset.
Return type:: DataLoader

val_dataloader() → DataLoader[source]¶

Returns the DataLoader for the validation dataset.

Returns:: A PyTorch DataLoader for the validation dataset.
Return type:: DataLoader

trainer: pl.Trainer | None¶

prepare_data_per_node: bool¶

allow_zero_length_dataloader_with_multiple_devices: bool¶

class torchsig.datasets.datamodules.NarrowbandDataModule(root: str, dataset_metadata: ~torchsig.datasets.dataset_metadata.NarrowbandMetadata | str | dict, num_samples_train: int, num_samples_val: int | None = None, batch_size: int = 1, num_workers: int = 1, collate_fn: ~typing.Callable | None = None, create_batch_size: int = 8, create_num_workers: int = 4, file_handler: ~torchsig.utils.file_handlers.base_handler.TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, overwrite: bool = False, transforms: ~torchsig.transforms.base_transforms.Transform | ~typing.List[~typing.Callable | ~torchsig.transforms.base_transforms.Transform] = [], target_transforms: ~torchsig.transforms.target_transforms.TargetTransform | ~typing.List[~typing.Callable | ~torchsig.transforms.target_transforms.TargetTransform] = [])[source]¶

Bases: TorchSigDataModule

DataModule for creating and managing narrowband datasets.

Parameters:

root (str) – The root directory where datasets are stored or created.
dataset_metadata (NarrowbandMetadata | str | dict) – Metadata for the narrowband dataset.
num_samples_train (int) – The number of training samples.
num_samples_val (int, optional) – The number of validation samples. Defaults to 10% of training samples if not provided.
batch_size (int, optional) – The batch size for data loading. Defaults to 1.
num_workers (int, optional) – The number of worker processes for data loading. Defaults to 1.
collate_fn (Callable, optional) – A function to collate data into batches.
create_batch_size (int, optional) – The batch size used during dataset creation. Defaults to 8.
create_num_workers (int, optional) – The number of workers used during dataset creation. Defaults to 4.
file_handler (TorchSigFileHandler, optional) – The file handler for managing data storage. Defaults to ZarrFileHandler.
overwrite (bool, optional) – Overwrites data on disk. Defaults to False.
transforms (Transform | List[Callable | Transform], optional) – A list of transformations to apply to the input data.
target_transforms (TargetTransform | List[Callable | TargetTransform], optional) – A list of transformations to apply to the target labels.
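The default for num_samples_val described in the parameter list can be sketched as follows. Rounding down with integer division is an assumption, and resolve_num_samples_val is a hypothetical helper rather than part of the DataModule.

```python
from typing import Optional


def resolve_num_samples_val(
    num_samples_train: int, num_samples_val: Optional[int] = None
) -> int:
    """Use the given validation size, or fall back to 10% of training."""
    if num_samples_val is None:
        return num_samples_train // 10  # assumed to round down
    return num_samples_val


default_val = resolve_num_samples_val(1000)
explicit_val = resolve_num_samples_val(1000, num_samples_val=250)
```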

train: self.static_dataset_class | None¶

val: self.static_dataset_class | None¶

test: self.static_dataset_class | None¶

trainer: pl.Trainer | None¶

prepare_data_per_node: bool¶

allow_zero_length_dataloader_with_multiple_devices: bool¶

class torchsig.datasets.datamodules.WidebandDataModule(root: str, dataset_metadata: ~torchsig.datasets.dataset_metadata.WidebandMetadata | str | dict, num_samples_train: int, num_samples_val: int | None = None, batch_size: int = 1, num_workers: int = 1, collate_fn: ~typing.Callable | None = None, create_batch_size: int = 8, create_num_workers: int = 4, file_handler: ~torchsig.utils.file_handlers.base_handler.TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, overwrite: bool = False, transforms: ~torchsig.transforms.base_transforms.Transform | ~typing.List[~typing.Callable | ~torchsig.transforms.base_transforms.Transform] = [], target_transforms: ~torchsig.transforms.target_transforms.TargetTransform | ~typing.List[~typing.Callable | ~torchsig.transforms.target_transforms.TargetTransform] = [])[source]¶

Bases: TorchSigDataModule

DataModule for creating and managing wideband datasets.

Parameters:

root (str) – The root directory where datasets are stored or created.
dataset_metadata (WidebandMetadata | str | dict) – Metadata for the wideband dataset.
num_samples_train (int) – The number of training samples.
num_samples_val (int, optional) – The number of validation samples. Defaults to 10% of training samples if not provided.
batch_size (int, optional) – The batch size for data loading. Defaults to 1.
num_workers (int, optional) – The number of worker processes for data loading. Defaults to 1.
collate_fn (Callable, optional) – A function to collate data into batches.
create_batch_size (int, optional) – The batch size used during dataset creation. Defaults to 8.
create_num_workers (int, optional) – The number of workers used during dataset creation. Defaults to 4.
file_handler (TorchSigFileHandler, optional) – The file handler for managing data storage. Defaults to ZarrFileHandler.
overwrite (bool, optional) – Overwrites data on disk. Defaults to False.
transforms (Transform | List[Callable | Transform], optional) – A list of transformations to apply to the input data.
target_transforms (TargetTransform | List[Callable | TargetTransform], optional) – A list of transformations to apply to the target labels.

train: self.static_dataset_class | None¶

val: self.static_dataset_class | None¶

test: self.static_dataset_class | None¶

trainer: pl.Trainer | None¶

prepare_data_per_node: bool¶

allow_zero_length_dataloader_with_multiple_devices: bool¶

class torchsig.datasets.datamodules.OfficialTorchSigDataModdule(root: str, dataset: str, impairment_level: int, batch_size: int = 1, num_workers: int = 1, collate_fn: ~typing.Callable | None = None, create_batch_size: int = 8, create_num_workers: int = 4, file_handler: ~torchsig.utils.file_handlers.base_handler.TorchSigFileHandler = <class 'torchsig.utils.file_handlers.zarr.ZarrFileHandler'>, transforms: ~torchsig.transforms.base_transforms.Transform | ~typing.List[~typing.Callable | ~torchsig.transforms.base_transforms.Transform] = [], target_transforms: ~torchsig.transforms.target_transforms.TargetTransform | ~typing.List[~typing.Callable | ~torchsig.transforms.target_transforms.TargetTransform] = [])[source]¶

Bases: TorchSigDataModule

A PyTorch Lightning DataModule for official TorchSig datasets.

This class manages dataset metadata, configuration, and data loading for datasets that use an official configuration rather than custom metadata. It initializes the train and validation metadata from the dataset type and impairment level.

Parameters:

root (str) – Root directory where the dataset is stored.
dataset (str) – Name of the dataset.
impairment_level (int) – Defines the impairment level of the dataset.
batch_size (int, optional) – Batch size for the dataloaders. Default is 1.
num_workers (int, optional) – Number of workers for data loading. Default is 1.
collate_fn (Callable, optional) – Function to merge a list of samples into a batch. Default is None.
create_batch_size (int, optional) – Batch size used during dataset creation. Default is 8.
create_num_workers (int, optional) – Number of workers used during dataset creation. Default is 4.
file_handler (TorchSigFileHandler, optional) – File handler used to read/write dataset. Default is ZarrFileHandler.
transforms (Transform | List[Callable | Transform], optional) – List of transforms applied to dataset. Default is empty list.
target_transforms (TargetTransform | List[Callable | TargetTransform], optional) – List of transforms applied to targets. Default is empty list.
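
The parameters above might be assembled as in the following sketch. The helper names, the root path "./datasets", and the impairment level 0 are illustrative assumptions; only the keyword names and default values come from the documented signature. The TorchSig import is deferred into build_official_datamodule so that the rest of the sketch runs without the library installed.

```python
from typing import Any, Dict

# Defaults copied from the documented signature of OfficialTorchSigDataModdule.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "batch_size": 1,
    "num_workers": 1,
    "collate_fn": None,
    "create_batch_size": 8,
    "create_num_workers": 4,
}


def official_datamodule_kwargs(
    root: str, dataset: str, impairment_level: int, **overrides: Any
) -> Dict[str, Any]:
    """Hypothetical helper: merge the required arguments with the defaults."""
    kwargs: Dict[str, Any] = {
        "root": root,
        "dataset": dataset,
        "impairment_level": impairment_level,
    }
    kwargs.update(DOCUMENTED_DEFAULTS)
    kwargs.update(overrides)  # caller-supplied values take precedence
    return kwargs


def build_official_datamodule(**kwargs: Any):
    """Hypothetical helper: construct the datamodule (requires TorchSig)."""
    # Imported lazily so the sketch can be run without TorchSig installed.
    from torchsig.datasets.datamodules import OfficialTorchSigDataModdule

    return OfficialTorchSigDataModdule(**kwargs)


kwargs = official_datamodule_kwargs(
    root="./datasets",      # illustrative path
    dataset="narrowband",   # dataset type named elsewhere in this reference
    impairment_level=0,     # illustrative value; valid levels are not listed here
    batch_size=32,
)
```

With TorchSig installed, build_official_datamodule(**kwargs) would then return the datamodule.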

train: self.static_dataset_class | None¶

val: self.static_dataset_class | None¶

test: self.static_dataset_class | None¶

trainer: pl.Trainer | None¶

prepare_data_per_node: bool¶

allow_zero_length_dataloader_with_multiple_devices: bool¶

class torchsig.datasets.datamodules.OfficialNarrowbandDataModule(root: str, impairment_level: int, batch_size: int = 1, num_workers: int = 1, collate_fn: Callable | None = None, create_batch_size: int = 8, create_num_workers: int = 4, file_handler: TorchSigFileHandler = ZarrFileHandler, transforms: list = [], target_transforms: list = [])[source]¶

Bases: OfficialTorchSigDataModdule

A DataModule for the official Narrowband dataset.

This class extends OfficialTorchSigDataModdule and sets the dataset type to ‘narrowband’. It initializes the necessary parameters for the dataset and loads the train and validation metadata accordingly.

Parameters:

root (str) – Root directory where the dataset is stored.
impairment_level (int) – Defines the impairment level of the dataset.
batch_size (int, optional) – Batch size for the dataloaders. Default is 1.
num_workers (int, optional) – Number of workers for data loading. Default is 1.
collate_fn (Callable, optional) – Function to merge a list of samples into a batch. Default is None.
create_batch_size (int, optional) – Batch size used during dataset creation. Default is 8.
create_num_workers (int, optional) – Number of workers used during dataset creation. Default is 4.
file_handler (TorchSigFileHandler, optional) – File handler used to read/write dataset. Default is ZarrFileHandler.
transforms (list, optional) – List of transforms applied to dataset. Default is empty list.
target_transforms (list, optional) – List of transforms applied to targets. Default is empty list.
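
The relationship between this class and its base, where the subclass fixes the dataset type and forwards everything else, can be sketched with stand-in classes. Both classes below are hypothetical illustrations of the pattern and are not the TorchSig implementation.

```python
class OfficialDataModuleSketch:
    """Hypothetical stand-in for the official base datamodule."""

    def __init__(
        self,
        root: str,
        dataset: str,
        impairment_level: int,
        batch_size: int = 1,
        num_workers: int = 1,
    ) -> None:
        self.root = root
        self.dataset = dataset
        self.impairment_level = impairment_level
        self.batch_size = batch_size
        self.num_workers = num_workers


class NarrowbandDataModuleSketch(OfficialDataModuleSketch):
    """Hypothetical stand-in: same arguments, minus the dataset name."""

    def __init__(self, root: str, impairment_level: int, **kwargs) -> None:
        # The subclass pins the dataset type so callers cannot mix it up.
        super().__init__(
            root=root,
            dataset="narrowband",
            impairment_level=impairment_level,
            **kwargs,
        )


narrowband = NarrowbandDataModuleSketch(
    root="./datasets", impairment_level=0, batch_size=16
)
```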

train: self.static_dataset_class | None¶

val: self.static_dataset_class | None¶

test: self.static_dataset_class | None¶

trainer: pl.Trainer | None¶

prepare_data_per_node: bool¶

allow_zero_length_dataloader_with_multiple_devices: bool¶

class torchsig.datasets.datamodules.OfficialWidebandDataModule(root: str, impairment_level: int, batch_size: int = 1, num_workers: int = 1, collate_fn: Callable | None = None, create_batch_size: int = 8, create_num_workers: int = 4, file_handler: TorchSigFileHandler = ZarrFileHandler, transforms: list = [], target_transforms: list = [])[source]¶

Bases: OfficialTorchSigDataModdule

A DataModule for the official Wideband dataset.

This class extends OfficialTorchSigDataModdule and sets the dataset type to ‘wideband’. It initializes the necessary parameters for the dataset and loads the train and validation metadata accordingly.

Parameters:

root (str) – Root directory where the dataset is stored.
impairment_level (int) – Defines the impairment level of the dataset.
batch_size (int, optional) – Batch size for the dataloaders. Default is 1.
num_workers (int, optional) – Number of workers for data loading. Default is 1.
collate_fn (Callable, optional) – Function to merge a list of samples into a batch. Default is None.
create_batch_size (int, optional) – Batch size used during dataset creation. Default is 8.
create_num_workers (int, optional) – Number of workers used during dataset creation. Default is 4.
file_handler (TorchSigFileHandler, optional) – File handler used to read/write dataset. Default is ZarrFileHandler.
transforms (list, optional) – List of transforms applied to dataset. Default is empty list.
target_transforms (list, optional) – List of transforms applied to targets. Default is empty list.
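
As an illustration of the collate_fn parameter, the sketch below merges a list of samples into a batch while keeping each sample's targets as a separate list. It assumes each sample is a (data, targets) pair and that the number of targets may differ between samples; neither assumption is confirmed by this reference. The function name and the example labels are hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def collate_keep_targets(
    batch: Sequence[Tuple[Any, Any]]
) -> Tuple[List[Any], List[Any]]:
    """Hypothetical collate function: group data and targets without stacking.

    Leaving the targets as a list of per-sample entries avoids failures
    when samples carry different numbers of targets.
    """
    if not batch:
        return [], []
    data, targets = zip(*batch)  # transpose [(d0, t0), (d1, t1)] into two tuples
    return list(data), list(targets)


# Two illustrative samples with one and two targets respectively.
example_batch = [
    ([0.1, 0.2], ["bpsk"]),
    ([0.3, 0.4], ["qpsk", "ofdm"]),
]
data, targets = collate_keep_targets(example_batch)
```

Such a function could be passed as collate_fn=collate_keep_targets when constructing the datamodule; converting the data list to a tensor is left to the caller.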

train: self.static_dataset_class | None¶

val: self.static_dataset_class | None¶

test: self.static_dataset_class | None¶

trainer: pl.Trainer | None¶

prepare_data_per_node: bool¶

allow_zero_length_dataloader_with_multiple_devices: bool¶
