Mirzai
Prediction of Exchangeable Potassium in Soil through Mid-Infrared Spectroscopy and Deep Learning: from Prediction to Explainability, Albinet et al., 2022
The mirzai Python Package, the present documentation and associated notebooks ensure the reproducibility of the above-mentioned scientific paper.
Paper with code
Baseline model (PLSR):
Convolutional Neural Network (CNN):
PLSR vs. CNN figures:
Interpretability
Setup
Getting the data
A zipped archive of the data used in this study are available for download at the following link: https://drive.google.com/drive/folders/1VGfrBexMPCFvoUa1VW26n-zO9v5WHeFh?usp=sharing
In a local environment
The preferred way it to use Mamba. Mamba is a fast, robust, and cross-platform package manager.
To install the required dependency and proper Python version:
- Clone
git clone git@github.com:franckalbinet/mirzai.git
or download the https://github.com/franckalbinet/mirzai into your local environement - In
mirzai/
root folder, execute the following Mamba commandmamba env create -f environment.yml
Here below the content of mirzai/environment.yml
file listing required Python version and packages:
name: mirzai
channels:
- conda-forge
- fastchan
- pytorch
dependencies:
- python=3.8
- nbdev
- jupyterlab
- numpy
- scipy
- matplotlib=3.5.1
- scikit-learn
- pytorch
- torchvision=0.12.0
- tqdm
- captum
Then activate the Python environement generated:
mamba activate mirzai
And finally launch
jupyter notebook
In Google Colab
Google Colab has been used to perform the experiments described in the above mentioned paper. The main advantage of Colab is to give access to a GPU (Graphical Processing Unit) which allows to train Deep Learning model in a fair amount of time. Please refer to Colab FAQ for further information.
Each notebook listed above includes a link to load it in the Google Colab environment. When clicking on those links, the notebook will get loaded in Google Colab.
Then, once open in Colab, you will need:
- To mount Google drive to access the data uploaded
- To install the
mirzai
Python package
These two steps are already included on top of each notebook and will be executed if on Colab:
if 'google.colab' in str(get_ipython()):
from google.colab import drive
drive.mount('/content/drive', force_remount=False)
!pip install mirzai
To locate the Google Drive folder where you uploaded the data, follow the 3 steps shown below:
- Click the “Files” icon in the Colab left panel then click on “MyDrive”
- Navigate to the directory containing the data then click on the ” 3 vertical dots” icon
- Click on “Copy path” to copy the full path (for instance
/content/drive/MyDrive/research/predict-k-mirs-dl/data
)
Last, when a GPU is required (e.g when training the Convolutional Neural Network or computing the GradientShap values), change runtime type in Colab
top menu: Runtime ▶ Change runtime type ▶ and select GPU
in the “Hardware accelerator” select box.
Acknowledgements
This work was carried out in the context of the IAEA funded Coordinated Research Project (CRP D1.50.19) titled “Remediation of Radioactive Contaminated Agricultural Land”, under IAEA Technical Contract n°23685.
We also thank Richard Ferguson from Kellogg Soil Survey Laboratory for providing access to the USDA MIR soil spectra library and the r equired training sessions for its operation.
Others
The name mirzai
comes from Mid-InfraRed Spectroscopy with AI but also is a way to pay tribute to Nino Ferrer’s song “Mirza”