src_dir = 'test'
fnames = ['spectra-features-smp.npy', 'spectra-wavenumbers-smp.npy',
'depth-order-smp.npy', 'target-smp.npy',
'tax-order-lu-smp.pkl', 'spectra-id-smp.npy']
X, X_names, depth_order, y, tax_lookup, X_id = load_kssl(src_dir, fnames=fnames)
transforms = [select_y, select_tax_order, select_X, log_transform_y]
data = X, y, X_id, depth_order
X, y, X_id, depth_order = compose(*transforms)(data)

PyTorch data loaders and transforms
Loaders & datasets
SpectralDataset
SpectralDataset (X, y, tax_order, transform=None)
An abstract class representing a Dataset.
All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__, supporting fetching a data sample for a given key. Subclasses may also optionally overwrite __len__, which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.
Note: DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
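As a concrete illustration of the map-style protocol described above, here is a minimal, hypothetical dataset class (a simplified sketch using plain numpy arrays, not the library's actual SpectralDataset implementation):

```python
import numpy as np

class SimpleSpectralDataset:
    """Minimal map-style dataset: __getitem__ fetches one sample by key,
    __len__ reports the dataset size (used by samplers and DataLoader)."""
    def __init__(self, X, y, tax_order, transform=None):
        self.X, self.y, self.tax_order = X, y, tax_order
        self.transform = transform  # optional callable applied to each spectrum

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = self.X[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.y[idx], self.tax_order[idx]

# Usage: 10 spectra of 1764 wavenumbers each
X = np.random.rand(10, 1764)
y = np.random.rand(10, 1)
tax = np.zeros((10, 1))
ds = SimpleSpectralDataset(X, y, tax)
x0, y0, t0 = ds[0]  # one (spectrum, target, taxonomic order) sample
```

Because it implements `__getitem__` and `__len__`, such a class can be wrapped directly by a PyTorch DataLoader.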
DataLoaders
DataLoaders (*args, transform=None, batch_size=32)
Convert numpy arrays to PyTorch data loaders (generators).
Args:
    *args: one or more tuples, such as ((X_train, y_train, tax_order), (X_test, y_test, tax_order))
    transform: callable transform class
Returns: (training_generator, validation_generator)
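A simplified sketch of the kind of conversion described above, with plain Python generators standing in for PyTorch DataLoader objects (the `make_loader` helper and its behavior are illustrative assumptions, not the library's code):

```python
import numpy as np

def make_loader(X, y, tax, batch_size=32, transform=None):
    """Yield (X, y, tax) mini-batches from numpy arrays; a simplified
    stand-in for a PyTorch DataLoader built from one input tuple."""
    def gen():
        for i in range(0, len(X), batch_size):
            xb = X[i:i + batch_size]
            if transform is not None:
                xb = transform(xb)  # e.g. an SNV-style preprocessing step
            yield xb, y[i:i + batch_size], tax[i:i + batch_size]
    return gen()

# One generator per (X, y, tax_order) tuple, mirroring the DataLoaders API
loader = make_loader(np.zeros((63, 1764)), np.zeros((63, 1)), np.zeros((63, 1)))
batch_counts = [xb.shape[0] for xb, yb, tb in loader]  # [32, 31]
```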
Transforms
SNV_transform
SNV_transform ()
Standard Normal Variate (SNV) transform: scales each spectrum to zero mean and unit standard deviation.
Noop
Noop ()
Identity transform: returns its input unchanged.
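A minimal functional sketch of what these two transforms compute (the standard SNV formula; the library classes wrap the same idea as callables, but these helper names are illustrative):

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: center each spectrum by its own mean and
    scale by its own standard deviation (along the last axis)."""
    mean = spectra.mean(axis=-1, keepdims=True)
    std = spectra.std(axis=-1, keepdims=True)
    return (spectra - mean) / std

def noop(spectra):
    """Identity transform: returns its input unchanged."""
    return spectra

batch = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 30.0]])
out = snv(batch)  # each row now has mean 0 and std 1
```

SNV normalizes each spectrum independently, which removes multiplicative scatter effects between samples.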
Example of use
Load and preprocess data
Train/test split
data = train_test_split(X, y, depth_order[:, 1], test_size=0.1, random_state=42)
X_train, X_test, y_train, y_test, tax_order_train, tax_order_test = data
data = train_test_split(X_train, y_train, tax_order_train, test_size=0.1, random_state=42)
X_train, X_valid, y_train, y_valid, tax_order_train, tax_order_valid = data

Create the generators
dls = DataLoaders((X_train, y_train, tax_order_train),
(X_valid, y_valid, tax_order_valid),
(X_test, y_test, tax_order_test), transform=SNV_transform())
training_generator, validation_generator, test_generator = dls.loaders()

Iterate over data (features, targets) mini-batches
for features, target, tax in training_generator:
    print(f'Batch of features (spectra): {features.shape}')
    print(f'Batch of targets: {target.shape}')
    print(f'Batch of Soil taxonomy orders id: {tax.shape}')

Batch of features (spectra): torch.Size([32, 1, 1764])
Batch of targets: torch.Size([32, 1])
Batch of Soil taxonomy orders id: torch.Size([32, 1])
Batch of features (spectra): torch.Size([31, 1, 1764])
Batch of targets: torch.Size([31, 1])
Batch of Soil taxonomy orders id: torch.Size([31, 1])
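The first batch has 32 samples and the last only 31, suggesting a training set of 63 samples whose remainder batch is kept rather than dropped (PyTorch's `drop_last=False` default behavior). A quick sketch of the batch-size arithmetic, under that assumption:

```python
def batch_sizes(n, batch_size, drop_last=False):
    """Sizes of the mini-batches a loader yields over n samples."""
    sizes = [min(batch_size, n - i) for i in range(0, n, batch_size)]
    if drop_last and sizes and sizes[-1] < batch_size:
        sizes.pop()  # discard the incomplete final batch
    return sizes

batch_sizes(63, 32)                  # [32, 31]
batch_sizes(63, 32, drop_last=True)  # [32]
```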
for features, target, _ in validation_generator:
    print(f'Batch of features (spectra): {features.shape}')
    print(f'Batch of targets: {target.shape}')

Batch of features (spectra): torch.Size([8, 1, 1764])
Batch of targets: torch.Size([8, 1])
for features, target, _ in test_generator:
    print(f'Batch of features (spectra): {features.shape}')
    print(f'Batch of targets: {target.shape}')

Batch of features (spectra): torch.Size([8, 1, 1764])
Batch of targets: torch.Size([8, 1])