torchsig.utils.dataset_summarizer.DatasetSummary

class torchsig.utils.dataset_summarizer.DatasetSummary(root: str, n_bins: int | dict[str, int] = 50)

Bases: object

Summary statistics for a StaticTorchSigDataset.

Reads every sample once, extracts per-signal metadata from each sample's Signal.component_signals, and bins the results with numpy.histogram.

dataset_length

Number of samples in the dataset.

class_counts

Counter mapping each class_name to its number of occurrences.

histograms

Dict of metric_key -> (counts, bin_edges) from np.histogram.

num_signals_per_sample

Array of signal counts per sample.
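The following is a minimal sketch of how a single pass over the samples could produce values shaped like the four attributes above. It does not use torchsig: plain dicts stand in for the per-signal metadata, and the field names "class_name" and "snr_db" as well as the "snr_db" histogram key are illustrative assumptions, not the library's confirmed metric names.

```python
from collections import Counter

import numpy as np

# Stand-in samples: each sample is a list of per-signal metadata dicts.
# The field names ("class_name", "snr_db") are illustrative assumptions.
samples = [
    [{"class_name": "bpsk", "snr_db": 0.0}, {"class_name": "qpsk", "snr_db": 1.0}],
    [{"class_name": "qpsk", "snr_db": 1.0}],
    [],  # a sample that contains no signals
    [
        {"class_name": "bpsk", "snr_db": 2.0},
        {"class_name": "qpsk", "snr_db": 3.0},
        {"class_name": "ofdm", "snr_db": 4.0},
    ],
]

class_counts = Counter()
snr_values = []
signals_per_sample = []

# Single pass: every sample is visited exactly once.
for signals in samples:
    signals_per_sample.append(len(signals))
    for meta in signals:
        class_counts[meta["class_name"]] += 1
        snr_values.append(meta["snr_db"])

dataset_length = len(samples)
num_signals_per_sample = np.asarray(signals_per_sample)

# Each histogram entry is the (counts, bin_edges) pair returned by np.histogram.
histograms = {"snr_db": np.histogram(snr_values, bins=4)}

counts, bin_edges = histograms["snr_db"]
# There is always one more edge than there are bins.
bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
```

Because each histogram is stored as raw (counts, bin_edges), derived quantities such as bin centers or normalized frequencies can be computed afterwards without re-reading the dataset.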

Methods

from_dataset

Create a summary from an already-loaded dataset.

plot

Plot summary histograms.

__init__(root: str, n_bins: int | dict[str, int] = 50) -> None

Summarize a static dataset on disk.

Parameters:
  • root – Path to the dataset directory (containing data.h5).

  • n_bins – Number of histogram bins. Either a single int applied to all metrics, or a dict mapping metric names to bin counts. Defaults to 50.
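As a usage sketch for the two n_bins forms, the hypothetical helper below (summarize_from_disk is not part of torchsig) forwards either an int or a dict to the constructor. The dataset path and the "snr_db" metric name in the example calls are placeholders; the valid metric names are whatever the library defines.

```python
from typing import Dict, Union


def summarize_from_disk(root: str, n_bins: Union[int, Dict[str, int]] = 50):
    """Hypothetical helper: summarize the dataset directory that contains data.h5."""
    # Imported lazily so this sketch can be loaded without torchsig installed.
    from torchsig.utils.dataset_summarizer import DatasetSummary

    # An int applies the same bin count to every metric;
    # a dict maps metric names to individual bin counts.
    return DatasetSummary(root, n_bins=n_bins)


# Example calls (the path and the "snr_db" metric name are placeholders):
# summary = summarize_from_disk("path/to/dataset")
# summary = summarize_from_disk("path/to/dataset", n_bins={"snr_db": 20})
# print(summary.dataset_length)
```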

classmethod from_dataset(dataset: StaticTorchSigDataset, n_bins: int | dict[str, int] = 50) -> DatasetSummary

Create a summary from an already-loaded dataset.

The dataset must return Signal objects (i.e. target_labels=None).

Parameters:
  • dataset – A loaded StaticTorchSigDataset.

  • n_bins – Number of histogram bins (int or per-metric dict).

Returns:

A populated DatasetSummary.
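A short sketch of calling from_dataset on a dataset that is already in memory. The helper name summarize_loaded and the report layout are hypothetical; the sketch assumes the caller has already constructed a StaticTorchSigDataset with target_labels=None, and it reads only the documented dataset_length and class_counts attributes.

```python
def summarize_loaded(dataset, n_bins=50):
    """Hypothetical helper: summarize an already-loaded StaticTorchSigDataset."""
    # Imported lazily so this sketch can be loaded without torchsig installed.
    from torchsig.utils.dataset_summarizer import DatasetSummary

    # The dataset must yield Signal objects, i.e. it was built with target_labels=None.
    summary = DatasetSummary.from_dataset(dataset, n_bins=n_bins)

    # class_counts is a Counter, so most_common() gives the largest classes first.
    report = {
        "dataset_length": summary.dataset_length,
        "most_common_classes": summary.class_counts.most_common(3),
    }
    return summary, report
```

Using from_dataset avoids opening the dataset a second time when it has already been loaded for training or inspection.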

plot(metrics: list[str] | None = None, max_cols: int = 2, width_per_plot: int = 15, height_per_plot: int = 10, round_labels: int = 2, save_path: str | None = None)

Plot summary histograms.

Parameters:
  • metrics – Which metrics to plot (keys of _PLOT_CONFIG). Defaults to all available metrics.

  • max_cols – Maximum subplot columns per row.

  • width_per_plot – Width in inches per subplot.

  • height_per_plot – Height in inches per subplot.

  • round_labels – Decimal places for bin-edge labels.

  • save_path – If provided, save the figure to this path.

Returns:

The matplotlib (fig, axes) tuple.
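A minimal sketch of calling plot with non-default layout options. The helper name save_summary_plot and the specific sizes are illustrative choices, not recommendations from the library; summary is assumed to be a populated DatasetSummary.

```python
def save_summary_plot(summary, save_path, metrics=None):
    """Hypothetical helper: plot a compact grid of histograms and save it."""
    fig, axes = summary.plot(
        metrics=metrics,     # None plots all available metrics
        max_cols=3,          # at most three subplots per row
        width_per_plot=6,    # inches per subplot, smaller than the default 15
        height_per_plot=4,   # inches per subplot, smaller than the default 10
        round_labels=1,      # one decimal place on bin-edge labels
        save_path=save_path,
    )
    return fig, axes
```

The returned fig and axes can be adjusted further with regular matplotlib calls after plot returns.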