src_dir = 'test'
fnames = ['spectra-features-smp.npy', 'spectra-wavenumbers-smp.npy',
          'depth-order-smp.npy', 'target-smp.npy',
          'tax-order-lu-smp.pkl', 'spectra-id-smp.npy']
X, X_names, depth_order, y, tax_lookup, X_id = load_kssl(src_dir, fnames=fnames)

transforms = [select_y, select_tax_order, select_X, log_transform_y]
data = X, y, X_id, depth_order
X, y, X_id, depth_order = compose(*transforms)(data)
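The `compose` helper is not defined in this snippet; a minimal sketch consistent with its use here (chaining the listed transforms left to right over the data tuple) could look like this — the toy transforms are hypothetical, for illustration only:

```python
def compose(*funcs):
    """Chain single-argument callables left to right:
    compose(f, g)(data) applies f first, then g to f's result."""
    def composed(data):
        for f in funcs:
            data = f(data)
        return data
    return composed

# Hypothetical toy transforms to show the chaining order:
double = lambda x: x * 2
inc = lambda x: x + 1
result = compose(double, inc)(3)  # (3 * 2) + 1 = 7
```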
PyTorch data loaders and transforms
Loaders & datasets
SpectralDataset
SpectralDataset (X, y, tax_order, transform=None)
An abstract class representing a :class:`Dataset`.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite :meth:`__len__`, which is expected to return the size of the dataset by many :class:`~torch.utils.data.Sampler` implementations and the default options of :class:`~torch.utils.data.DataLoader`.

.. note:: :class:`~torch.utils.data.DataLoader` by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
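For illustration, a minimal map-style dataset following the contract above might look like the sketch below. The class name and fields are hypothetical — this is not the library's `SpectralDataset`, just the `__getitem__`/`__len__` protocol in its simplest form:

```python
import torch
from torch.utils.data import Dataset

class ToySpectra(Dataset):
    """Hypothetical map-style dataset: implements __getitem__ and __len__."""
    def __init__(self, X, y, transform=None):
        self.X, self.y = X, y
        self.transform = transform

    def __len__(self):
        # Number of samples; used by samplers and DataLoader defaults.
        return len(self.X)

    def __getitem__(self, idx):
        # Fetch one (features, target) pair, applying the optional transform.
        x = self.X[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.y[idx]

ds = ToySpectra(torch.randn(4, 10), torch.zeros(4))
x, y = ds[0]
```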
DataLoaders
DataLoaders (*args, transform=None, batch_size=32)
Convert numpy arrays to PyTorch data loaders (generators).

Args:
    *args: one or many tuples as ((X_train, y_train, tax_order), (X_test, y_test, tax_order))
    transform: callable class (class)

Returns: (training_generator, validation_generator)
Transforms
SNV_transform
SNV_transform ()
Initialize self. See help(type(self)) for accurate signature.
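The docstring above is just Python's default `__init__` signature; as a reminder, SNV (Standard Normal Variate) centers and scales each spectrum by its own mean and standard deviation. A standalone sketch of the operation — not this library's implementation — assuming a 2-D array of shape (n_samples, n_wavenumbers):

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: per-spectrum centering and scaling."""
    mean = spectra.mean(axis=1, keepdims=True)  # mean of each spectrum
    std = spectra.std(axis=1, keepdims=True)    # std of each spectrum
    return (spectra - mean) / std

out = snv(np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 30.0]]))
# Each row of `out` now has mean ~0 and std ~1.
```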
Noop
Noop ()
Initialize self. See help(type(self)) for accurate signature.
Example of use
Load and preprocess data
Train/test split
X_train, X_test, y_train, y_test, tax_order_train, tax_order_test = train_test_split(
    X, y, depth_order[:, 1], test_size=0.1, random_state=42)

X_train, X_valid, y_train, y_valid, tax_order_train, tax_order_valid = train_test_split(
    X_train, y_train, tax_order_train, test_size=0.1, random_state=42)
Create the generators
dls = DataLoaders((X_train, y_train, tax_order_train),
                  (X_valid, y_valid, tax_order_valid),
                  (X_test, y_test, tax_order_test), transform=SNV_transform())

training_generator, validation_generator, test_generator = dls.loaders()
Iterate over mini-batches of data (features, targets):
for features, target, tax in training_generator:
print(f'Batch of features (spectra): {features.shape}')
print(f'Batch of targets: {target.shape}')
print(f'Batch of Soil taxonomy orders id: {tax.shape}')
Batch of features (spectra): torch.Size([32, 1, 1764])
Batch of targets: torch.Size([32, 1])
Batch of Soil taxonomy orders id: torch.Size([32, 1])
Batch of features (spectra): torch.Size([31, 1, 1764])
Batch of targets: torch.Size([31, 1])
Batch of Soil taxonomy orders id: torch.Size([31, 1])
for features, target, _ in validation_generator:
print(f'Batch of features (spectra): {features.shape}')
print(f'Batch of targets: {target.shape}')
Batch of features (spectra): torch.Size([8, 1, 1764])
Batch of targets: torch.Size([8, 1])
for features, target, _ in test_generator:
print(f'Batch of features (spectra): {features.shape}')
print(f'Batch of targets: {target.shape}')
Batch of features (spectra): torch.Size([8, 1, 1764])
Batch of targets: torch.Size([8, 1])