src_dir = 'test'
fnames = ['spectra-features-smp.npy', 'spectra-wavenumbers-smp.npy',
          'depth-order-smp.npy', 'target-smp.npy',
          'tax-order-lu-smp.pkl', 'spectra-id-smp.npy']
X, X_names, depth_order, y, tax_lookup, X_id = load_kssl(src_dir, fnames=fnames)

transforms = [select_y, select_tax_order, select_X, log_transform_y]
data = X, y, X_id, depth_order
X, y, X_id, depth_order = compose(*transforms)(data)
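The `compose` helper is not defined in this snippet; a minimal sketch consistent with its use here (chaining the listed transforms left to right over the data tuple) could look like this — the toy transforms are hypothetical, for illustration only:

```python
def compose(*funcs):
    """Chain single-argument callables left to right:
    compose(f, g)(data) applies f first, then g to f's result."""
    def composed(data):
        for f in funcs:
            data = f(data)
        return data
    return composed

# Hypothetical toy transforms to show the chaining order:
double = lambda x: x * 2
inc = lambda x: x + 1
result = compose(double, inc)(3)  # (3 * 2) + 1 = 7
```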
PyTorch data loaders and transforms
Loaders & datasets
SpectralDataset
SpectralDataset (X, y, tax_order, transform=None)
An abstract class representing a :class:`Dataset`.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite :meth:`__getitem__`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite :meth:`__len__`, which is expected to return the size of the dataset by many :class:`~torch.utils.data.Sampler` implementations and the default options of :class:`~torch.utils.data.DataLoader`.

.. note:: :class:`~torch.utils.data.DataLoader` by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
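For illustration, a minimal map-style dataset following the contract above might look like the sketch below. The class name and fields are hypothetical — this is not the library's `SpectralDataset`, just the `__getitem__`/`__len__` protocol in its simplest form:

```python
import torch
from torch.utils.data import Dataset

class ToySpectra(Dataset):
    """Hypothetical map-style dataset: implements __getitem__ and __len__."""
    def __init__(self, X, y, transform=None):
        self.X, self.y = X, y
        self.transform = transform

    def __len__(self):
        # Number of samples; used by samplers and DataLoader defaults.
        return len(self.X)

    def __getitem__(self, idx):
        # Fetch one (features, target) pair, applying the optional transform.
        x = self.X[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.y[idx]

ds = ToySpectra(torch.randn(4, 10), torch.zeros(4))
x, y = ds[0]
```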
DataLoaders
DataLoaders (*args, transform=None, batch_size=32)
Convert numpy arrays to PyTorch data loaders (generators).

Args:
    *args: one or many tuples as ((X_train, y_train, tax_order), (X_test, y_test, tax_order))
    transform: callable class (class)

Returns: (training_generator, validation_generator)
Transforms
SNV_transform
SNV_transform ()
Initialize self. See help(type(self)) for accurate signature.
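The docstring above is just Python's default `__init__` signature; as a reminder, SNV (Standard Normal Variate) centers and scales each spectrum by its own mean and standard deviation. A standalone sketch of the operation — not this library's implementation — assuming a 2-D array of shape (n_samples, n_wavenumbers):

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: per-spectrum centering and scaling."""
    mean = spectra.mean(axis=1, keepdims=True)  # mean of each spectrum
    std = spectra.std(axis=1, keepdims=True)    # std of each spectrum
    return (spectra - mean) / std

out = snv(np.array([[1.0, 2.0, 3.0],
                    [10.0, 20.0, 30.0]]))
# Each row of `out` now has mean ~0 and std ~1.
```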
Noop
Noop ()
Initialize self. See help(type(self)) for accurate signature.
Example of use
Load and preprocess data
Train/test split
X_train, X_test, y_train, y_test, tax_order_train, tax_order_test = train_test_split(
    X, y, depth_order[:, 1], test_size=0.1, random_state=42)

X_train, X_valid, y_train, y_valid, tax_order_train, tax_order_valid = train_test_split(
    X_train, y_train, tax_order_train, test_size=0.1, random_state=42)
Create the generators
dls = DataLoaders((X_train, y_train, tax_order_train),
                  (X_valid, y_valid, tax_order_valid),
                  (X_test, y_test, tax_order_test), transform=SNV_transform())

training_generator, validation_generator, test_generator = dls.loaders()
Iterate over mini-batches of data (features, targets):
for features, target, tax in training_generator:
print(f'Batch of features (spectra): {features.shape}')
print(f'Batch of targets: {target.shape}')
print(f'Batch of Soil taxonomy orders id: {tax.shape}')
Batch of features (spectra): torch.Size([32, 1, 1764])
Batch of targets: torch.Size([32, 1])
Batch of Soil taxonomy orders id: torch.Size([32, 1])
Batch of features (spectra): torch.Size([31, 1, 1764])
Batch of targets: torch.Size([31, 1])
Batch of Soil taxonomy orders id: torch.Size([31, 1])
for features, target, _ in validation_generator:
print(f'Batch of features (spectra): {features.shape}')
print(f'Batch of targets: {target.shape}')
Batch of features (spectra): torch.Size([8, 1, 1764])
Batch of targets: torch.Size([8, 1])
for features, target, _ in test_generator:
print(f'Batch of features (spectra): {features.shape}')
print(f'Batch of targets: {target.shape}')
Batch of features (spectra): torch.Size([8, 1, 1764])
Batch of targets: torch.Size([8, 1])