torchsig.utils.dataset_summarizer.DatasetSummary
- class torchsig.utils.dataset_summarizer.DatasetSummary(root: str, n_bins: int | dict[str, int] = 50)
Bases: object
Summary statistics for a StaticTorchSigDataset.
Reads every sample once, extracts per-signal metadata from Signal.component_signals, and histograms the results with numpy.
- dataset_length
Number of samples in the dataset.
- class_counts
Counter mapping class_name -> occurrences.
- histograms
Dict of metric_key -> (counts, bin_edges) from np.histogram.
- num_signals_per_sample
Array of signal counts per sample.
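To make the shape of these attributes concrete, here is a minimal sketch built from stand-in data rather than a real dataset. The per-signal values, the class names, and the metric key `snr_db` are invented for illustration. Only the `Counter` mapping and the `(counts, bin_edges)` layout returned by `np.histogram` follow the descriptions above; how the library computes them internally is not shown here.

```python
from collections import Counter

import numpy as np

# Hypothetical per-sample metadata: each inner list holds one sample's signals
# as (class_name, snr_db) pairs. This is stand-in data, not TorchSig output.
samples = [
    [("bpsk", 10.0), ("qpsk", 20.0)],
    [("bpsk", 30.0)],
    [],
    [("ofdm", 40.0), ("qpsk", 0.0), ("bpsk", 15.0)],
]

# Number of samples in the (stand-in) dataset.
dataset_length = len(samples)

# Counter mapping class_name -> occurrences across all signals.
class_counts = Counter(name for sample in samples for name, _ in sample)

# Array of signal counts per sample; empty samples contribute 0.
num_signals_per_sample = np.array([len(sample) for sample in samples])

# One histogram per metric, stored as the (counts, bin_edges) pair that
# np.histogram returns. bin_edges always has one more entry than counts.
snr_values = np.array([snr for sample in samples for _, snr in sample])
counts, bin_edges = np.histogram(snr_values, bins=4)
histograms = {"snr_db": (counts, bin_edges)}
```

Because `bin_edges` has one more element than `counts`, bin `i` spans `bin_edges[i]` to `bin_edges[i + 1]`.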
Methods
- from_dataset: Create a summary from an already-loaded dataset.
- plot: Plot summary histograms.
- __init__(root: str, n_bins: int | dict[str, int] = 50) -> None
Summarize a static dataset on disk.
- Parameters:
root – Path to the dataset directory (containing data.h5).
n_bins – Number of histogram bins. Either a single int applied to all metrics, or a dict mapping metric names to bin counts. Defaults to 50.
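The two accepted forms of `n_bins` can be illustrated with a small sketch. The helper `resolve_bins` and the metric names below are hypothetical and are not part of TorchSig. Falling back to 50 for a metric missing from the dict is also an assumption, since the documentation does not say how missing keys are handled.

```python
from typing import Dict, Union

DEFAULT_BINS = 50  # matches the documented default for n_bins


def resolve_bins(metric: str, n_bins: Union[int, Dict[str, int]] = DEFAULT_BINS) -> int:
    """Return the bin count for one metric (hypothetical helper).

    An int applies to every metric; a dict is looked up per metric name.
    The fallback for metrics absent from the dict is an assumption.
    """
    if isinstance(n_bins, dict):
        return n_bins.get(metric, DEFAULT_BINS)
    return n_bins


# A single int is applied to all metrics.
uniform = resolve_bins("snr_db", 25)

# A dict sets the bin count per metric name ("snr_db" is a made-up key).
per_metric = resolve_bins("snr_db", {"snr_db": 10})
```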
- classmethod from_dataset(dataset: StaticTorchSigDataset, n_bins: int | dict[str, int] = 50) -> DatasetSummary
Create a summary from an already-loaded dataset.
The dataset must return Signal objects (i.e. target_labels=None).
- Parameters:
dataset – A loaded StaticTorchSigDataset.
n_bins – Number of histogram bins (int or per-metric dict).
- Returns:
A populated DatasetSummary.
- plot(metrics: list[str] | None = None, max_cols: int = 2, width_per_plot: int = 15, height_per_plot: int = 10, round_labels: int = 2, save_path: str | None = None)
Plot summary histograms.
- Parameters:
metrics – Which metrics to plot (keys of _PLOT_CONFIG). Defaults to all available metrics.
max_cols – Maximum subplot columns per row.
width_per_plot – Width in inches per subplot.
height_per_plot – Height in inches per subplot.
round_labels – Decimal places for bin-edge labels.
save_path – If provided, save the figure to this path.
- Returns:
The matplotlib (fig, axes) tuple.
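Putting the pieces together, a typical workflow might look like the sketch below. It uses only the constructor, attributes, and `plot` arguments documented on this page. The wrapper function `summarize_and_plot` and the default file name `summary.png` are illustrative choices, and the dataset path is something the caller must supply.

```python
def summarize_and_plot(root: str, save_path: str = "summary.png"):
    """Summarize a static dataset on disk and save its histograms.

    `root` must point to a dataset directory containing data.h5.
    This wrapper is illustrative and not part of TorchSig.
    """
    # Imported lazily so that defining this helper does not require TorchSig.
    from torchsig.utils.dataset_summarizer import DatasetSummary

    # Reads every sample once and histograms each metric with 50 bins.
    summary = DatasetSummary(root, n_bins=50)

    print("samples:", summary.dataset_length)
    # class_counts is documented as a Counter, so most_common is available.
    print("most common classes:", summary.class_counts.most_common(5))

    # Plot all available metrics, two subplots per row, and save the figure.
    fig, axes = summary.plot(max_cols=2, save_path=save_path)
    return summary, fig, axes
```

When the dataset is already loaded in memory, `DatasetSummary.from_dataset(dataset)` can replace the constructor call, provided the dataset returns Signal objects.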