Data module

OutlierDataset(root, dataset, outliers[, ...])

OutlierDataset for combining instances from (known) training and (unknown) outlier data.

DataWrapper(root, base_dataset, get_classes, ...)

DataWrapper for base datasets.

configure_oneclass_division(base_dataset, ...)

Method for obtaining configurations for OSR model evaluation using Holdout protocol (both KKC and UUC from single dataset) for One-class classification.

configure_division(base_dataset, repeats[, ...])

Method for obtaining configurations for OSR model evaluation using Holdout protocol (both KKC and UUC from single dataset).

get_train_test(base_dataset, kkc_indexes, ...)

Method for obtaining Cross-validation folds using Holdout protocol (both KKC and UUC from single dataset).

configure_division_outlier(base_dataset, ...)

Method for obtaining configurations for OSR model evaluation using Outlier protocol.

get_train_test_outlier(base_dataset, ...[, ...])

Method for obtaining Cross-validation folds using Outlier protocol (KKC obtained from base_dataset and UUC from outlier_dataset).

class torchosr.data.DataWrapper(root: str, base_dataset, get_classes, known_classes, return_only_known, indexes='all', onehot=False, onehot_num_classes=None)

Bases: VisionDataset

DataWrapper for base datasets.

Parameters:
  • root (string) – Data directory.

  • base_dataset (VisionDataset) – Base dataset implementing the __getitem__ method.

  • indexes (List) – Indexes of wrapped dataset objects that will be returned.

  • get_classes (List) – Considered class indexes from base dataset (known + unknown).

  • known_classes (List) – Considered known class indexes from base dataset.

  • return_only_known (boolean) – If True, returns only known-class instances (for training). If False, assigns a new class index (equal to the number of classes) to unknown samples and returns them together with the known data (for testing).

  • onehot (boolean) – If True, labels are one-hot encoded.

  • onehot_num_classes (int) – Number of classes for one-hot encoding (in case outlier data is generated for testing).
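The following pure-Python sketch illustrates how the DataWrapper parameters above could interact: filtering by get_classes and indexes, re-indexing known classes, and either dropping or relabelling unknowns. It does not call torchosr, and the function names wrap_labels and one_hot are hypothetical. It assumes that known classes are re-indexed 0..k-1 in the order given in known_classes, and that the "new class index" for unknowns is k. The real class may differ on both points.

```python
from typing import List, Optional, Sequence, Tuple, Union

Label = Union[int, List[int]]


def one_hot(label: int, num_classes: int) -> List[int]:
    """Return a list of length num_classes with a single 1 at position `label`."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


def wrap_labels(
    labels: Sequence[int],
    get_classes: Sequence[int],
    known_classes: Sequence[int],
    return_only_known: bool,
    indexes: Union[str, Sequence[int]] = "all",
    onehot: bool = False,
    onehot_num_classes: Optional[int] = None,
) -> List[Tuple[int, Label]]:
    """Hypothetical sketch of DataWrapper's filtering and relabelling.

    Returns (position in the base dataset, new label) pairs.
    """
    # Assumption: known classes are re-indexed 0..k-1 in the order given.
    remap = {c: i for i, c in enumerate(known_classes)}
    # Assumption: the "new class index" for unknowns is k = len(known_classes).
    unknown_index = len(known_classes)
    if onehot_num_classes is None:
        onehot_num_classes = unknown_index if return_only_known else unknown_index + 1
    positions = range(len(labels)) if indexes == "all" else indexes
    considered = set(get_classes)
    out: List[Tuple[int, Label]] = []
    for pos in positions:
        y = labels[pos]
        if y not in considered:
            continue  # class is neither known nor unknown in this configuration
        if y in remap:
            new_y = remap[y]
        elif return_only_known:
            continue  # training split: unknown-class samples are dropped
        else:
            new_y = unknown_index  # testing split: sample is marked as unknown
        out.append((pos, one_hot(new_y, onehot_num_classes) if onehot else new_y))
    return out
```

For example, with labels [0, 3, 5, 7, 3, 9], get_classes [0, 3, 5, 7] and known_classes [3, 5], the training view keeps only the samples labelled 3 and 5. The testing view additionally returns the samples labelled 0 and 7 under the extra index 2.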

class torchosr.data.OutlierDataset(root: str, dataset, outliers, shuffle: bool = True, random_state: int | None = None, unknown_label: int | None = None, onehot=False, onehot_num_classes=None)

Bases: VisionDataset

OutlierDataset for combining instances from (known) training and (unknown) outlier data.

Parameters:
  • root (string) – Data directory.

  • dataset (VisionDataset) – Dataset of known-class testing examples.

  • outliers (VisionDataset) – Dataset of unknown-class testing examples, which will be labeled as unknowns.

  • shuffle (boolean) – If True, the final data will be shuffled.

  • random_state (int) – Random state (for shuffle).

  • unknown_label (int) – Label with which the unknowns will be marked.

  • onehot (boolean) – If True, labels are one-hot encoded.

  • onehot_num_classes (int) – Number of classes for one-hot encoding (in case outlier data is generated for testing).
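The combination that OutlierDataset describes can be sketched in plain Python as follows. Known samples keep their labels, and every outlier receives unknown_label in place of its original label. The result is then optionally shuffled with a seeded generator. The function name combine_with_outliers is hypothetical. The default used when unknown_label is None (one past the largest known label) is an assumption, not documented torchosr behaviour.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


def combine_with_outliers(
    known: Sequence[Tuple[Any, int]],
    outliers: Sequence[Tuple[Any, int]],
    shuffle: bool = True,
    random_state: Optional[int] = None,
    unknown_label: Optional[int] = None,
) -> List[Tuple[Any, int]]:
    """Hypothetical sketch of what OutlierDataset combines.

    Known samples keep their labels; every outlier sample's original label is
    discarded and replaced by `unknown_label`.
    """
    if unknown_label is None:
        # Assumption: default to one past the largest known label.
        unknown_label = max((y for _, y in known), default=-1) + 1
    data = list(known) + [(x, unknown_label) for x, _ in outliers]
    if shuffle:
        # A seeded generator makes the shuffle reproducible for a given random_state.
        random.Random(random_state).shuffle(data)
    return data
```

Passing the same random_state twice yields the same order, which is the usual reason to expose such a parameter alongside shuffle.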

torchosr.data.configure_division(base_dataset, repeats, n_openness=None, seed=None, min_known_classes=2)

Method for obtaining configurations for OSR model evaluation using Holdout protocol (both KKC and UUC from single dataset).

Parameters:
  • base_dataset (VisionDataset) – Base dataset

  • repeats (int) – Number of random selections of classes for a single openness (KKC/UUC class cardinality)

  • n_openness (int) – Number of KKC/UUC class cardinalities to generate. If None, all possible configurations are returned.

  • seed (int) – Random state

  • min_known_classes (int) – Minimum number of known classes

Return type:

List

Returns:

List of dataset configurations, each containing sets of KKC and UUC, together with their openness.
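A sketch of how Holdout-protocol configurations and their openness could be produced is shown below. It uses the openness definition of Scheirer et al., with training classes = KKC and testing classes = KKC + UUC. Whether torchosr uses exactly this formula is an assumption. The function sketch_configure_division is hypothetical and works on a plain list of class labels. It also assumes that UUC takes all classes not drawn into KKC, and that n_openness picks that many KKC cardinalities at random.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def openness(n_kkc: int, n_uuc: int) -> float:
    """Openness after Scheirer et al.: training classes = KKC, testing = KKC + UUC."""
    n_train, n_test = n_kkc, n_kkc + n_uuc
    return 1 - math.sqrt(2 * n_train / (n_train + n_test))


def sketch_configure_division(
    classes: Sequence[int],
    repeats: int,
    n_openness: Optional[int] = None,
    seed: Optional[int] = None,
    min_known_classes: int = 2,
) -> List[Tuple[List[int], List[int], float]]:
    """Hypothetical sketch: (KKC, UUC, openness) triples from a single dataset."""
    rng = random.Random(seed)
    # Every KKC cardinality that still leaves at least one class for UUC.
    sizes = list(range(min_known_classes, len(classes)))
    if n_openness is not None:
        # Assumption: choose n_openness cardinalities at random.
        sizes = sorted(rng.sample(sizes, n_openness))
    configs = []
    for k in sizes:
        for _ in range(repeats):
            kkc = sorted(rng.sample(list(classes), k))
            # Assumption: all remaining classes become UUC.
            uuc = [c for c in classes if c not in kkc]
            configs.append((kkc, uuc, openness(len(kkc), len(uuc))))
    return configs
```

As a worked value, 2 known and 8 unknown classes give 1 - sqrt(4/12) ≈ 0.423. With no unknown classes, openness is 0.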

torchosr.data.configure_division_outlier(base_dataset, outlier_dataset, repeats, n_openness=None, seed=None, min_known_classes=2)

Method for obtaining configurations for OSR model evaluation using Outlier protocol. KKC come from base_dataset, UUC from outlier_dataset.

Parameters:
  • base_dataset (VisionDataset) – Dataset describing KKC instances

  • outlier_dataset (VisionDataset) – Dataset describing UUC instances

  • repeats (int) – Number of random selections of classes for a single openness (KKC/UUC class cardinality)

  • n_openness (int) – Number of KKC/UUC class cardinalities to generate

  • seed (int) – Random state

  • min_known_classes (int) – Minimum number of known classes

Return type:

List

Returns:

List of dataset configurations, each containing sets of KKC and UUC, together with their openness.
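Under the Outlier protocol, KKC are drawn from the base dataset's classes and UUC from the outlier dataset's classes. The sketch below shows one way such configurations could be enumerated. The function name is hypothetical, and treating every (|KKC|, |UUC|) pair as a candidate cardinality is an assumption, not torchosr's documented behaviour. Openness follows the Scheirer et al. definition, rewritten as 1 - sqrt(2k / (2k + u)) for k known and u unknown classes.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def openness(n_kkc: int, n_uuc: int) -> float:
    """Scheirer et al. openness with n_kkc training and n_kkc + n_uuc testing classes."""
    return 1 - math.sqrt(2 * n_kkc / (2 * n_kkc + n_uuc))


def sketch_configure_division_outlier(
    base_classes: Sequence[int],
    outlier_classes: Sequence[int],
    repeats: int,
    n_openness: Optional[int] = None,
    seed: Optional[int] = None,
    min_known_classes: int = 2,
) -> List[Tuple[List[int], List[int], float]]:
    """Hypothetical sketch: KKC from base classes, UUC from outlier classes."""
    rng = random.Random(seed)
    # Assumption: every (|KKC|, |UUC|) pair is a candidate cardinality.
    pairs = [
        (k, u)
        for k in range(min_known_classes, len(base_classes) + 1)
        for u in range(1, len(outlier_classes) + 1)
    ]
    if n_openness is not None:
        pairs = sorted(rng.sample(pairs, n_openness))
    configs = []
    for k, u in pairs:
        for _ in range(repeats):
            kkc = sorted(rng.sample(list(base_classes), k))  # drawn from base_dataset
            uuc = sorted(rng.sample(list(outlier_classes), u))  # drawn from outlier_dataset
            configs.append((kkc, uuc, openness(k, u)))
    return configs
```

Because the two label sets come from different datasets, the same integer may appear in both KKC and UUC. Each list only identifies classes within its own dataset.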

torchosr.data.configure_oneclass_division(base_dataset, repeats, n_openness=None, seed=None)

Method for obtaining configurations for OSR model evaluation using Holdout protocol (both KKC and UUC from single dataset) for One-class classification. The set of KKC always contains a single class.

Parameters:
  • base_dataset (VisionDataset) – Base dataset

  • repeats (int) – Number of random selections of classes for a single openness (KKC/UUC class cardinality)

  • n_openness (int) – Number of KKC/UUC class cardinalities to generate. If None, all possible configurations are returned.

  • seed (int) – Random state

Return type:

List

Returns:

List of dataset configurations, each containing sets of KKC and UUC, together with their openness.
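For the one-class variant, the only fixed point in the documentation is that |KKC| = 1. The hypothetical sketch below assumes that the UUC cardinality ranges from 1 to the number of remaining classes. It draws the single known class and the unknown classes at random and reuses the Scheirer et al. openness definition, which is also an assumption about torchosr. With one known class, openness reduces to 1 - sqrt(2 / (2 + u)) and grows with the number of unknown classes u.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def openness(n_kkc: int, n_uuc: int) -> float:
    """Scheirer et al. openness with n_kkc training and n_kkc + n_uuc testing classes."""
    return 1 - math.sqrt(2 * n_kkc / (2 * n_kkc + n_uuc))


def sketch_configure_oneclass_division(
    classes: Sequence[int],
    repeats: int,
    n_openness: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[Tuple[List[int], List[int], float]]:
    """Hypothetical sketch: one known class, a varying number of unknown classes."""
    rng = random.Random(seed)
    # Assumption: UUC cardinality ranges over 1 .. len(classes) - 1.
    sizes = list(range(1, len(classes)))
    if n_openness is not None:
        sizes = sorted(rng.sample(sizes, n_openness))
    configs = []
    for u in sizes:
        for _ in range(repeats):
            kkc = [rng.choice(list(classes))]  # the single known class
            rest = [c for c in classes if c != kkc[0]]
            uuc = sorted(rng.sample(rest, u))
            configs.append((kkc, uuc, openness(1, u)))
    return configs
```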

torchosr.data.get_train_test(base_dataset, kkc_indexes, uuc_indexes, root, tunning, fold, seed=1410, n_folds=5)

Method for obtaining Cross-validation folds using Holdout protocol (both KKC and UUC from single dataset).

Parameters:
  • base_dataset (VisionDataset) – Base dataset

  • kkc_indexes (List) – List of labels constituting Known Classes

  • uuc_indexes (List) – List of labels constituting Unknown Classes

  • root (string) – Datasets folder

  • tunning (boolean) – Flag. If True, splits off 10% of the data for tuning; otherwise, splits off 90% of the data.

  • fold (int) – Fold index

  • n_folds (int) – Number of folds

  • seed (int) – Random state

Return type:

List

Returns:

Train dataset, Test dataset
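The index-level sketch below shows one plausible reading of how get_train_test's parameters fit together. It is not the library's code, and sketch_get_train_test is a hypothetical name operating on a plain list of labels. It assumes a seeded 10% / 90% split is made first, with tunning choosing which part is used. That part is then cut into n_folds plain (non-stratified) folds, and fold `fold` serves as the test split. Training keeps only KKC samples relabelled 0..k-1, while the test split also contains UUC samples under the extra label k.

```python
import random
from typing import List, Sequence, Tuple


def sketch_get_train_test(
    labels: Sequence[int],
    kkc_indexes: Sequence[int],
    uuc_indexes: Sequence[int],
    tunning: bool,
    fold: int,
    seed: int = 1410,
    n_folds: int = 5,
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Hypothetical sketch returning (sample index, new label) pairs for train and test."""
    considered = set(kkc_indexes) | set(uuc_indexes)
    idx = [i for i, y in enumerate(labels) if y in considered]
    random.Random(seed).shuffle(idx)
    # Assumption: the first 10% of the shuffled indices form the tuning part.
    cut = len(idx) // 10
    subset = idx[:cut] if tunning else idx[cut:]
    folds = [subset[i::n_folds] for i in range(n_folds)]
    test_idx = folds[fold]
    train_idx = [i for j, f in enumerate(folds) if j != fold for i in f]
    kkc_map = {c: i for i, c in enumerate(kkc_indexes)}
    unknown = len(kkc_indexes)
    # Training: only KKC samples, relabelled 0..k-1.
    train = sorted((i, kkc_map[labels[i]]) for i in train_idx if labels[i] in kkc_map)
    # Testing: KKC relabelled as above, every UUC sample gets the extra label k.
    test = sorted((i, kkc_map.get(labels[i], unknown)) for i in test_idx)
    return train, test
```

Iterating fold over range(n_folds) with the same seed visits every sample of the chosen part exactly once as a test sample. A real implementation might additionally stratify the folds by class.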

torchosr.data.get_train_test_outlier(base_dataset, outlier_dataset, kkc_indexes, uuc_indexes, root, tunning, fold, seed=1410, n_folds=5)

Method for obtaining Cross-validation folds using Outlier protocol (KKC obtained from base_dataset and UUC from outlier_dataset).

Parameters:
  • base_dataset (VisionDataset) – Dataset describing KKC instances

  • outlier_dataset (VisionDataset) – Dataset describing UUC instances

  • kkc_indexes (List) – List of labels constituting Known Classes

  • uuc_indexes (List) – List of labels constituting Unknown Classes

  • root (string) – Datasets folder

  • tunning (boolean) – Flag. If True, splits off 10% of the data for tuning; otherwise, splits off 90% of the data.

  • fold (int) – Fold index

  • n_folds (int) – Number of folds

  • seed (int) – Random state

Return type:

List

Returns:

Train dataset, Test dataset
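A hypothetical sketch of the Outlier-protocol split follows. It assumes the same seeded 10% / 90% split and fold logic is applied separately to the base dataset (restricted to KKC) and to the outlier dataset (restricted to UUC). It also assumes that outlier samples appear only in the test split, labelled with the extra index k. None of this is confirmed torchosr behaviour. Samples are tagged with their source so that indices from the two datasets cannot be confused.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int, int]  # (source dataset, index in that dataset, new label)


def sketch_get_train_test_outlier(
    base_labels: Sequence[int],
    outlier_labels: Sequence[int],
    kkc_indexes: Sequence[int],
    uuc_indexes: Sequence[int],
    tunning: bool,
    fold: int,
    seed: int = 1410,
    n_folds: int = 5,
) -> Tuple[List[Sample], List[Sample]]:
    """Hypothetical sketch: KKC from base_labels, UUC from outlier_labels."""

    def split(labels: Sequence[int], keep: set) -> Tuple[List[int], List[int]]:
        idx = [i for i, y in enumerate(labels) if y in keep]
        random.Random(seed).shuffle(idx)
        cut = len(idx) // 10  # assumption: first 10% is the tuning part
        subset = idx[:cut] if tunning else idx[cut:]
        folds = [subset[i::n_folds] for i in range(n_folds)]
        train = [i for j, f in enumerate(folds) if j != fold for i in f]
        return train, folds[fold]

    kkc_map = {c: i for i, c in enumerate(kkc_indexes)}
    unknown = len(kkc_indexes)
    base_train, base_test = split(base_labels, set(kkc_indexes))
    _, outlier_test = split(outlier_labels, set(uuc_indexes))  # outliers only used for testing
    train = sorted(("base", i, kkc_map[base_labels[i]]) for i in base_train)
    test = sorted(
        [("base", i, kkc_map[base_labels[i]]) for i in base_test]
        + [("outlier", i, unknown) for i in outlier_test]
    )
    return train, test
```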