NetCDF2CSV

A data pipeline that converts MARIS NetCDF files into MARIS Standard OpenRefine CSV format.

This module converts NetCDF files into CSV files that follow the MARIS Standard OpenRefine format. While MARISCO has replaced OpenRefine in the data cleaning and preparation pipeline, the MARIS master database still requires input files to conform to this CSV format specification. The conversion is performed using the marisco library.

Tip

For new MARIS users, please refer to the field definitions for detailed information about MARIS fields.

Dependencies

Required packages and internal modules for data format transformations.

from pathlib import Path

import pandas as pd
from IPython.display import display, Markdown
# Callbacks and helpers used below (ExtractNetcdfContents, Transformer,
# Enums, lut_path, cfg, ...) are provided by the marisco library.

Configuration and File Paths

fname_in = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
fname_out = fname_in.with_suffix('.csv')
output_format = 'openrefine_csv'

Data Loading

Load data from standardized MARIS NetCDF files using ExtractNetcdfContents. The NetCDF files follow CF conventions and include standardized variable names and metadata according to MARIS specifications.

contents = ExtractNetcdfContents(fname_in)

Show the dictionary of dataframes extracted from the NetCDF file.

contents.dfs.keys()
dict_keys(['BIOTA', 'SEAWATER', 'SEDIMENT'])

Show an example of the DataFrame extracted from the NetCDF file.

with pd.option_context('display.max_columns', None):
    display(contents.dfs['SEAWATER'].head())
LON LAT SMP_DEPTH TOT_DEPTH TIME SMP_ID NUCLIDE VALUE UNIT UNC DL FILT
0 29.333300 60.083302 0.0 NaN 1337731200 0 33 5.300000 1 1.696 1 0
1 29.333300 60.083302 29.0 NaN 1337731200 1 33 19.900000 1 3.980 1 0
2 23.150000 59.433300 0.0 NaN 1339891200 2 33 25.500000 1 5.100 1 0
3 27.983299 60.250000 0.0 NaN 1337817600 3 33 17.000000 1 4.930 1 0
4 27.983299 60.250000 39.0 NaN 1337817600 4 33 22.200001 1 3.996 1 0

Show an example of the dictionary of enums extracted from the NetCDF file as a DataFrame.

grp='SEAWATER'
print(f'Variables in {grp} group: {contents.enum_dicts[grp].keys()}')
var='nuclide'
with pd.option_context('display.max_columns', None):
    display(pd.DataFrame.from_dict(contents.enum_dicts[grp][var], orient='index').T)
Variables in SEAWATER group: dict_keys(['nuclide', 'unit', 'dl', 'filt'])
NOT APPLICABLE NOT AVAILABLE h3 be7 c14 k40 cr51 mn54 co57 co58 co60 zn65 sr89 sr90 zr95 nb95 tc99 ru103 ru106 rh106 ag106m ag108 ag108m ag110m sb124 sb125 te129m i129 i131 cs127 cs134 cs137 ba140 la140 ce141 ce144 pm147 eu154 eu155 pb210 pb212 pb214 bi207 bi211 bi214 po210 rn220 rn222 ra223 ra224 ra225 ra226 ra228 ac228 th227 th228 th232 th234 pa234 u234 u235 u238 np237 np239 pu238 pu239 pu240 pu241 am240 am241 cm242 cm243 cm244 cs134_137_tot pu239_240_tot pu239_240_iii_iv_tot pu239_240_v_vi_tot cm243_244_tot pu238_pu239_240_tot_ratio am241_pu239_240_tot_ratio cs137_134_ratio cd109 eu152 fe59 gd153 ir192 pu238_240_tot rb86 sc46 sn113 sn117m tl208 mo99 tc99m ru105 te129 te132 i132 i135 cs136 tbeta talpha i133 th230 pa231 u236 ag111 in116m te123m sb127 ba133 ce139 tl201 hg203 na22 pa234m am243 se75 sr85 y88 ce140 bi212 u236_238_ratio i125 ba137m u232 pa233 ru106_rh106_tot tu tbeta40k fe55 ce144_pr144_tot pu240_pu239_ratio u233 pu239_242_tot ac227
0 -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 28 29 30 31 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 116 117 122 123 124 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144

Show the global attributes extracted from the NetCDF file.

print("First few attributes from global attributes:", list(contents.global_attrs.items())[:5])
First few attributes from global attributes: [('id', '26VMZZ2Q'), ('title', 'Environmental database - Helsinki Commission Monitoring of Radioactive Substances'), ('summary', 'MORS Environment database has been used to collate data resulting from monitoring of environmental radioactivity in the Baltic Sea based on HELCOM Recommendation 26/3.\n\nThe database is structured according to HELCOM Guidelines on Monitoring of Radioactive Substances (https://www.helcom.fi/wp-content/uploads/2019/08/Guidelines-for-Monitoring-of-Radioactive-Substances.pdf), which specifies reporting format, database structure, data types and obligatory parameters used for reporting data under Recommendation 26/3.\n\nThe database is updated and quality assured annually by HELCOM MORS EG.'), ('keywords', 'oceanography, Earth Science > Oceans > Ocean Chemistry> Radionuclides, Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure, Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments, Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes, Earth Science > Oceans > Water Quality > Ocean Contaminants, Earth Science > Biological Classification > Animals/Vertebrates > Fish, Earth Science > Biosphere > Ecosystems > Marine Ecosystems, Earth Science > Biological Classification > Animals/Invertebrates > Mollusks, Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans, Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)'), ('history', 'TBD')]

Show the custom maps extracted from the NetCDF file.

grp='SEAWATER'
print(f'Custom maps in {grp} group: {contents.custom_maps[grp].keys()}')
with pd.option_context('display.max_columns', None):
    display(pd.DataFrame.from_dict(contents.custom_maps[grp], orient='index'))
Custom maps in SEAWATER group: dict_keys([])

Validate NetCDF Enumerations

Verify that enumerated values in the NetCDF file match current MARIS lookup tables.

FEEDBACK TO DATA PROVIDERS

The enumeration validation process is a diagnostic step that identifies inconsistencies between NetCDF enumerations and MARIS lookup tables. While this validation does not modify the dataset, it generates detailed feedback about any mismatches or undefined values.
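As a sketch of the idea (the `h3` and `cs137` codes match the HELCOM NUCLIDE enums shown later in this notebook; the undefined entry is hypothetical), the validation amounts to comparing each NetCDF enum dictionary against the corresponding MARIS lookup table:

```python
# Compare a NetCDF enum dictionary against the corresponding MARIS
# lookup table; any key missing from the LUT, or mapped to a
# different code, is flagged for feedback.
nc_enum = {'h3': 1, 'cs137': 33, 'unknown_nuclide': 999}  # from the NetCDF
maris_enum = {'h3': 1, 'cs137': 33}                       # from the MARIS LUT
mismatches = {k: v for k, v in nc_enum.items()
              if maris_enum.get(k) != v}
print(mismatches)  # {'unknown_nuclide': 999}
```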


source

ValidateEnumsCB

 ValidateEnumsCB (contents, maris_enums, verbose=False)

Validate enumeration mappings between NetCDF file and MARIS lookup tables.

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data= contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        ValidateEnumsCB(
            contents = contents,
            maris_enums=Enums(lut_src_dir=lut_path())
        ),
    ]
)
tfm()
print('\n')

Remove Non-Compatible Variables

The [RemoveNonCompatibleVariablesCB](https://franckalbinet.github.io/marisco/api/netcdf2csv.html#removenoncompatiblevariablescb) callback filters out variables from the NetCDF format that are not listed in the VARS configuration.
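A minimal sketch of the filtering logic (the abridged whitelist here is hypothetical; the real callback uses the full CSV_VARS mapping):

```python
import pandas as pd

# Keep only columns whose names appear among the VARS mapping keys
# (abridged for this sketch) and report what gets dropped.
csv_vars = {'LON': 'longitude', 'LAT': 'latitude', 'VALUE': 'activity'}
df = pd.DataFrame({'LON': [29.3], 'LAT': [60.1], 'VALUE': [5.3], 'BIO_GROUP': [4]})
dropped = [c for c in df.columns if c not in csv_vars]
df = df.drop(columns=dropped)
print(dropped, list(df.columns))  # ['BIO_GROUP'] ['LON', 'LAT', 'VALUE']
```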


source

RemoveNonCompatibleVariablesCB

 RemoveNonCompatibleVariablesCB (vars:Dict[str,str]={'LON': 'longitude',
                                 'LAT': 'latitude', 'SMP_DEPTH':
                                 'sampdepth', 'TOT_DEPTH': 'totdepth',
                                 'TIME': 'begperiod', 'AREA': 'area',
                                 'NUCLIDE': 'nuclide_id', 'VALUE':
                                 'activity', 'UNIT': 'unit_id', 'UNC':
                                 'uncertaint', 'DL': 'detection', 'FILT':
                                 'filtered', 'COUNT_MET': 'counmet_id',
                                 'SAMP_MET': 'sampmet_id', 'PREP_MET':
                                 'prepmet_id', 'VOL': 'volume', 'SAL':
                                 'salinity', 'TEMP': 'temperatur',
                                 'SPECIES': 'species_id', 'BODY_PART':
                                 'bodypar_id', 'SED_TYPE': 'sedtype_id',
                                 'TOP': 'sliceup', 'BOTTOM': 'slicedown',
                                 'DRYWT': 'drywt', 'WETWT': 'wetwt',
                                 'PERCENTWT': 'percentwt', 'LAB':
                                 'lab_id', 'PROFILE_ID': 'profile_id',
                                 'SAMPLE_TYPE': 'samptype_id',
                                 'TAXONNAME': 'taxonname', 'TAXONREPNAME':
                                 'taxonrepname', 'TAXONRANK': 'taxonrank',
                                 'TAXONDB': 'taxondb', 'TAXONDBID':
                                 'taxondb_id', 'TAXONDBURL':
                                 'taxondb_url', 'REF_ID': 'ref_id',
                                 'SMP_ID': 'samplabcode'},
                                 verbose:bool=False)

Remove variables not listed in VARS configuration.

Type Default Details
vars Dict {‘LON’: ‘longitude’, ‘LAT’: ‘latitude’, ‘SMP_DEPTH’: ‘sampdepth’, ‘TOT_DEPTH’: ‘totdepth’, ‘TIME’: ‘begperiod’, ‘AREA’: ‘area’, ‘NUCLIDE’: ‘nuclide_id’, ‘VALUE’: ‘activity’, ‘UNIT’: ‘unit_id’, ‘UNC’: ‘uncertaint’, ‘DL’: ‘detection’, ‘FILT’: ‘filtered’, ‘COUNT_MET’: ‘counmet_id’, ‘SAMP_MET’: ‘sampmet_id’, ‘PREP_MET’: ‘prepmet_id’, ‘VOL’: ‘volume’, ‘SAL’: ‘salinity’, ‘TEMP’: ‘temperatur’, ‘SPECIES’: ‘species_id’, ‘BODY_PART’: ‘bodypar_id’, ‘SED_TYPE’: ‘sedtype_id’, ‘TOP’: ‘sliceup’, ‘BOTTOM’: ‘slicedown’, ‘DRYWT’: ‘drywt’, ‘WETWT’: ‘wetwt’, ‘PERCENTWT’: ‘percentwt’, ‘LAB’: ‘lab_id’, ‘PROFILE_ID’: ‘profile_id’, ‘SAMPLE_TYPE’: ‘samptype_id’, ‘TAXONNAME’: ‘taxonname’, ‘TAXONREPNAME’: ‘taxonrepname’, ‘TAXONRANK’: ‘taxonrank’, ‘TAXONDB’: ‘taxondb’, ‘TAXONDBID’: ‘taxondb_id’, ‘TAXONDBURL’: ‘taxondb_url’, ‘REF_ID’: ‘ref_id’, ‘SMP_ID’: ‘samplabcode’} Dictionary mapping OR vars to NC vars
verbose bool False
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS, verbose=True),
    ]
)
tfm()
print('\n')
Removing variables that are not compatible with vars provided. 
Removing BIO_GROUP from BIOTA dataset.

Add Taxon Information


source

get_taxon_info_lut

 get_taxon_info_lut (maris_lut:str, key_names:dict={'Taxonname':
                     'TAXONNAME', 'Taxonrank': 'TAXONRANK', 'TaxonDB':
                     'TAXONDB', 'TaxonDBID': 'TAXONDBID', 'TaxonDBURL':
                     'TAXONDBURL'})

Create lookup dictionary for taxon information from MARIS species lookup table.


source

AddTaxonInformationCB

 AddTaxonInformationCB (fn_lut:Callable=<function <lambda>>,
                        verbose:bool=False)

Add taxon information to BIOTA group based on species lookup table.

Type Default Details
fn_lut Callable Function that returns taxon lookup dictionary
verbose bool False
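The lookup itself can be sketched as a dict-of-dicts keyed by species id. The id 99 → Gadus morhua pairing matches the HELCOM data in this notebook; the rest is a simplified assumption about the real LUT structure:

```python
import pandas as pd

# Hypothetical excerpt of the taxon lookup: species id -> taxon fields.
lut = {99: {'TAXONNAME': 'Gadus morhua', 'TAXONRANK': 'species'}}
df = pd.DataFrame({'SPECIES': [99, 99]})
for col in ('TAXONNAME', 'TAXONRANK'):
    df[col] = df['SPECIES'].map(lambda sid: lut[sid][col])
print(df['TAXONNAME'].unique())  # ['Gadus morhua']
```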
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        AddTaxonInformationCB(
            fn_lut=lut_taxon
        ),
    ]
)

tfm()
print(tfm.dfs['BIOTA'][['TAXONNAME','TAXONRANK','TAXONDB','TAXONDBID','TAXONDBURL']])
               TAXONNAME TAXONRANK   TAXONDB TAXONDBID  \
0           Gadus morhua   species  Wikidata   Q199788   
1           Gadus morhua   species  Wikidata   Q199788   
2           Gadus morhua   species  Wikidata   Q199788   
3           Gadus morhua   species  Wikidata   Q199788   
4           Gadus morhua   species  Wikidata   Q199788   
...                  ...       ...       ...       ...   
16089  Fucus vesiculosus   species  Wikidata   Q754755   
16090  Fucus vesiculosus   species  Wikidata   Q754755   
16091     Mytilus edulis   species  Wikidata    Q27855   
16092     Mytilus edulis   species  Wikidata    Q27855   
16093     Mytilus edulis   species  Wikidata    Q27855   

                                  TAXONDBURL  
0      https://www.wikidata.org/wiki/Q199788  
1      https://www.wikidata.org/wiki/Q199788  
2      https://www.wikidata.org/wiki/Q199788  
3      https://www.wikidata.org/wiki/Q199788  
4      https://www.wikidata.org/wiki/Q199788  
...                                      ...  
16089  https://www.wikidata.org/wiki/Q754755  
16090  https://www.wikidata.org/wiki/Q754755  
16091   https://www.wikidata.org/wiki/Q27855  
16092   https://www.wikidata.org/wiki/Q27855  
16093   https://www.wikidata.org/wiki/Q27855  

[16094 rows x 5 columns]

Standardize Time

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        DecodeTimeCB(),
    ]
)

tfm()

print(tfm.dfs['BIOTA']['TIME'])
0       2012-09-23
1       2012-09-23
2       2012-09-23
3       2012-09-23
4       2012-09-23
           ...    
16089   2022-05-10
16090   2022-05-10
16091   2022-09-15
16092   2022-09-15
16093   2022-09-15
Name: TIME, Length: 16094, dtype: datetime64[ns]
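Under the hood, TIME is stored as Unix epoch seconds; a minimal sketch of the conversion (not DecodeTimeCB's actual implementation):

```python
import pandas as pd

# Convert Unix epoch seconds (as stored in the NetCDF TIME variable)
# to datetimes; values taken from the SEAWATER example earlier.
time_s = pd.Series([1337731200, 1339891200], name='TIME')
decoded = pd.to_datetime(time_s, unit='s')
print(decoded.dt.date.tolist())  # [datetime.date(2012, 5, 23), datetime.date(2012, 6, 17)]
```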

Add Sample Type ID

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        AddSampleTypeIdColumnCB(),
    ]
)

tfm()
print(tfm.dfs['SEAWATER']['SAMPLE_TYPE'].unique())
print(tfm.dfs['BIOTA']['SAMPLE_TYPE'].unique())
print(tfm.dfs['SEDIMENT']['SAMPLE_TYPE'].unique())
[1]
[2]
[3]
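Conceptually, the callback tags each group with a constant sample-type code (1 = seawater, 2 = biota, 3 = sediment, per the output above); a minimal sketch, not the actual implementation:

```python
import pandas as pd

# Constant sample-type code per group, matching the output above.
sample_types = {'SEAWATER': 1, 'BIOTA': 2, 'SEDIMENT': 3}
dfs = {grp: pd.DataFrame({'VALUE': [5.3, 19.9]}) for grp in sample_types}
for grp, df in dfs.items():
    df['SAMPLE_TYPE'] = sample_types[grp]
print(dfs['BIOTA']['SAMPLE_TYPE'].unique())  # [2]
```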

Add Reference ID

Include the ref_id (i.e., the Zotero archive location). The AddZoteroArchiveLocationCB callback looks up the Zotero archive location using the Zotero key stored as id in the global attributes of the MARIS NetCDF file.

contents.global_attrs['id']
'26VMZZ2Q'

source

AddZoteroArchiveLocationCB

 AddZoteroArchiveLocationCB (attrs:str, cfg:dict)

Fetch and append ‘Loc. in Archive’ from Zotero to DataFrame.

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        AddZoteroArchiveLocationCB(contents.global_attrs, cfg=cfg()),
    ]
)
tfm()
print(tfm.dfs['SEAWATER']['REF_ID'].unique())
[100]
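The payload shape below is a simplified assumption about the Zotero item returned for key 26VMZZ2Q; the real callback fetches the item via the Zotero API and reads its 'Loc. in Archive' field:

```python
# Hypothetical Zotero item payload; the real callback fetches this via
# the Zotero API using the NetCDF global attribute `id`.
item = {'key': '26VMZZ2Q', 'data': {'archiveLocation': '100'}}
ref_id = int(item['data']['archiveLocation'])
print(ref_id)  # 100
```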

Remap encoded custom maps

NetCDF variables can store custom mappings as attributes, which provide a way to map between encoded values and their human-readable representations. The RemapCustomMapsCB callback handles this conversion process.
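As a sketch of the idea (the lab codes below are hypothetical placeholders, not real MARIS values), remapping amounts to applying the stored map to the encoded column:

```python
import pandas as pd

# Encoded SMP_ID column plus a custom map as it might be stored in a
# NetCDF variable attribute; decoding is a plain pandas map.
smp_id = pd.Series([0, 1, 2], name='SMP_ID')
custom_map = {0: 'LAB-001', 1: 'LAB-002', 2: 'LAB-003'}  # hypothetical codes
print(smp_id.map(custom_map).tolist())  # ['LAB-001', 'LAB-002', 'LAB-003']
```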


source

RemapCustomMapsCB

 RemapCustomMapsCB (verbose:bool=False)

Remap encoded custom maps to decoded values.

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        RemapCustomMapsCB(verbose=True),
    ]
)
tfm()
print('Example of remapped custom maps:')
for grp in tfm.dfs:
    print(grp)
    print(tfm.dfs[grp]['SMP_ID'].head())
Example of remapped custom maps:
BIOTA
0    0
1    0
2    0
3    0
4    1
Name: SMP_ID, dtype: uint64
SEAWATER
0    0
1    1
2    2
3    3
4    4
Name: SMP_ID, dtype: uint64
SEDIMENT
0    0
1    1
2    2
3    3
4    4
Name: SMP_ID, dtype: uint64

Remap to OpenRefine-specific mappings

FEEDBACK FOR NEXT VERSION

[To be further clarified]

The current approach of remapping to OR-specific mappings should be reconsidered. Since we already use MARISCO lookup tables to create the enums in NetCDF, it would be beneficial to extend their use to the OpenRefine data format as well. By doing so, we could eliminate the need for OpenRefine-specific mappings, streamlining the data transformation process. Let's review the lookup tables used to create the enums for NetCDF:

enums = Enums(lut_src_dir=lut_path())
print(f'DL enums: {enums.types["DL"]}')
print(f'FILT enums: {enums.types["FILT"]}')
DL enums: {'Not applicable': -1, 'Not available': 0, 'Detected value': 1, 'Detection limit': 2, 'Not detected': 3, 'Derived': 4}
FILT enums: {'Not applicable': -1, 'Not available': 0, 'Yes': 1, 'No': 2}

For the detection limit lookup table (LUT), as shown below, the values required for the OpenRefine CSV format are listed under the ‘name’ column, whereas the enums utilize the ‘name_sanitized’ column. Additionally, for the filtered LUT, also shown below, the values do not align consistently with the OpenRefine CSV format, which uses (Y, N, NA).

dl_lut = pd.read_excel(detection_limit_lut_path())
dl_lut
id name name_sanitized
0 -1 Not applicable Not applicable
1 0 Not Available Not available
2 1 = Detected value
3 2 < Detection limit
4 3 ND Not detected
5 4 DE Derived
filtered_lut = pd.read_excel(filtered_lut_path())
filtered_lut
id name
0 -1 Not applicable
1 0 Not available
2 1 Yes
3 2 No
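If the LUTs were used directly, the id-to-OpenRefine-value mapping could be derived from the 'name' column rather than hard-coded; a sketch using the detection-limit LUT shown above:

```python
import pandas as pd

# Reconstruct the detection-limit LUT shown above and build an
# id -> name mapping from it.
dl_lut = pd.DataFrame({
    'id':   [-1, 0, 1, 2, 3, 4],
    'name': ['Not applicable', 'Not Available', '=', '<', 'ND', 'DE'],
})
dl_map = dict(zip(dl_lut['id'], dl_lut['name']))
print(dl_map[1], dl_map[2], dl_map[3])  # = < ND
```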

We will create OpenRefine-specific mappings for the detection limit and filtered data:

RemapToORSpecificMappingsCB remaps the values of the detection limit and filtered data to the OpenRefine CSV format.


source

RemapToORSpecificMappingsCB

 RemapToORSpecificMappingsCB (or_mappings:Dict[str,Dict]={'DL': {0: 'ND',
                              1: '=', 2: '<'}, 'FILT': {0: 'NA', 1: 'Y',
                              2: 'N'}},
                              output_format:str='openrefine_csv',
                              verbose:bool=False)

Convert values using OR mappings if columns exist in dataframe.

Type Default Details
or_mappings Dict {‘DL’: {0: ‘ND’, 1: ‘=’, 2: ‘<’}, ‘FILT’: {0: ‘NA’, 1: ‘Y’, 2: ‘N’}} Dictionary of column mappings,
output_format str openrefine_csv
verbose bool False
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    data= contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        RemapToORSpecificMappingsCB(),
    ]
)

tfm()

# Default OR mappings, matching RemapToORSpecificMappingsCB's signature
or_mappings = {'DL': {0: 'ND', 1: '=', 2: '<'}, 'FILT': {0: 'NA', 1: 'Y', 2: 'N'}}

# Loop through each group in the 'dfs' dictionary
for group_name, df in tfm.dfs.items():
    # Check if the group dataframe contains any of the columns specified in or_mappings.keys()
    relevant_columns = [col for col in or_mappings.keys() if col in df.columns]
    if relevant_columns:
        # Print the unique values from the relevant columns
        print(f"\nUnique values in {group_name} for columns {relevant_columns}:")
        for col in relevant_columns:
            print(f"{col}: {df[col].unique()}")
    else:
        print(f"No relevant columns found in {group_name} based on or_mappings keys.")

Unique values in BIOTA for columns ['DL']:
DL: ['<' '=' 'ND']

Unique values in SEAWATER for columns ['DL', 'FILT']:
DL: ['=' '<' 'ND']
FILT: ['NA' 'N' 'Y']

Unique values in SEDIMENT for columns ['DL']:
DL: ['=' '<' 'ND']

Remap to CSV data type format

CSV_DTYPES (defined in configs.ipynb) defines a state for each variable that has an associated lookup table (i.e., enums). The state is either ‘decoded’ or ‘encoded’. Let's review the variable states as a DataFrame:

with pd.option_context('display.max_columns', None, 'display.max_colwidth', None):
    display(pd.DataFrame.from_dict(CSV_DTYPES, orient='index').T)
AREA NUCLIDE UNIT DL FILT COUNT_MET SAMP_MET PREP_MET SPECIES BODY_PART SED_TYPE LAB
state decoded encoded encoded decoded decoded encoded encoded encoded encoded encoded encoded encoded
enums = Enums(lut_src_dir=lut_path())
enums.types.keys()
dict_keys(['AREA', 'BIO_GROUP', 'BODY_PART', 'COUNT_MET', 'DL', 'FILT', 'NUCLIDE', 'PREP_MET', 'SAMP_MET', 'SED_TYPE', 'SPECIES', 'UNIT', 'LAB'])
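In this context, ‘encoded’ keeps the integer code while ‘decoded’ replaces it with the lookup-table name; a sketch using the DL enums shown earlier:

```python
# Invert the DL enum (name -> code) so integer codes can be decoded
# back to names.
dl_enum = {'Not applicable': -1, 'Not available': 0, 'Detected value': 1,
           'Detection limit': 2, 'Not detected': 3, 'Derived': 4}
code_to_name = {v: k for k, v in dl_enum.items()}
print([code_to_name[c] for c in [1, 2, 3]])
# ['Detected value', 'Detection limit', 'Not detected']
```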

source

get_excluded_enums

 get_excluded_enums (output_format:str='openrefine_csv')

Get excluded enums based on output format.


source

DataFormatConversionCB

 DataFormatConversionCB (dtypes:Dict, excluded_mappings:Callable=<function
                         get_excluded_enums>,
                         output_format:str='openrefine_csv',
                         verbose:bool=False)

A callback to convert DataFrame enum values between encoded and decoded formats based on specified settings.

Type Default Details
dtypes Dict Dictionary defining data types and states for each lookup table
excluded_mappings Callable get_excluded_enums Dictionary of columns to exclude from conversion
output_format str openrefine_csv
verbose bool False Flag for verbose output
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS, verbose=True),
        DataFormatConversionCB(
            dtypes=CSV_DTYPES,
            excluded_mappings = get_excluded_enums,
            output_format='openrefine_csv',
            verbose=True
        ),
    ]
)
tfm()
Removing variables that are not compatible with vars provided. 
Removing BIO_GROUP from BIOTA dataset.
Loaded enums: dict_keys(['AREA', 'BIO_GROUP', 'BODY_PART', 'COUNT_MET', 'DL', 'FILT', 'NUCLIDE', 'PREP_MET', 'SAMP_MET', 'SED_TYPE', 'SPECIES', 'UNIT', 'LAB'])
{'BIOTA':              LON        LAT  SMP_DEPTH        TIME  SMP_ID  NUCLIDE  \
 0      12.316667  54.283333        NaN  1348358400       0       31   
 1      12.316667  54.283333        NaN  1348358400       0        4   
 2      12.316667  54.283333        NaN  1348358400       0        9   
 3      12.316667  54.283333        NaN  1348358400       0       33   
 4      12.316667  54.283333        NaN  1348358400       1       31   
 ...          ...        ...        ...         ...     ...      ...   
 16089  21.395000  61.241501        2.0  1652140800    4789       33   
 16090  21.395000  61.241501        2.0  1652140800    4789        9   
 16091  21.385000  61.343334        NaN  1663200000    4790        4   
 16092  21.385000  61.343334        NaN  1663200000    4790       33   
 16093  21.385000  61.343334        NaN  1663200000    4790       12   
 
             VALUE  UNIT       UNC  DL  SPECIES  BODY_PART       DRYWT  WETWT  \
 0        0.010140     5       NaN   2       99         52  174.934433  948.0   
 1      135.300003     5  4.830210   1       99         52  174.934433  948.0   
 2        0.013980     5       NaN   2       99         52  174.934433  948.0   
 3        4.338000     5  0.150962   1       99         52  174.934433  948.0   
 4        0.009614     5       NaN   2       99         52  177.935120  964.0   
 ...           ...   ...       ...  ..      ...        ...         ...    ...   
 16089   13.700000     4  0.520600   1       96         55         NaN    NaN   
 16090    0.500000     4  0.045500   1       96         55         NaN    NaN   
 16091   50.700001     4  4.106700   1      129          1         NaN    NaN   
 16092    0.880000     4  0.140800   1      129          1         NaN    NaN   
 16093    6.600000     4  0.349800   1      129          1         NaN    NaN   
 
        PERCENTWT  
 0        0.18453  
 1        0.18453  
 2        0.18453  
 3        0.18453  
 4        0.18458  
 ...          ...  
 16089        NaN  
 16090        NaN  
 16091        NaN  
 16092        NaN  
 16093        NaN  
 
 [16094 rows x 15 columns],
 'SEAWATER':              LON        LAT  SMP_DEPTH  TOT_DEPTH        TIME  SMP_ID  \
 0      29.333300  60.083302        0.0        NaN  1337731200       0   
 1      29.333300  60.083302       29.0        NaN  1337731200       1   
 2      23.150000  59.433300        0.0        NaN  1339891200       2   
 3      27.983299  60.250000        0.0        NaN  1337817600       3   
 4      27.983299  60.250000       39.0        NaN  1337817600       4   
 ...          ...        ...        ...        ...         ...     ...   
 21468  13.499833  54.600334        0.0       47.0  1686441600    9724   
 21469  13.499833  54.600334       45.0       47.0  1686441600    9725   
 21470  14.200833  54.600334        0.0       11.0  1686614400    9731   
 21471  14.665500  54.600334        0.0       20.0  1686614400    9732   
 21472  14.330000  54.600334        0.0       17.0  1686614400    9734   
 
        NUCLIDE       VALUE  UNIT        UNC  DL  FILT  
 0           33    5.300000     1   1.696000   1     0  
 1           33   19.900000     1   3.980000   1     0  
 2           33   25.500000     1   5.100000   1     0  
 3           33   17.000000     1   4.930000   1     0  
 4           33   22.200001     1   3.996000   1     0  
 ...        ...         ...   ...        ...  ..   ...  
 21468        1  702.838074     1  51.276207   1     0  
 21469        1  725.855713     1  52.686260   1     0  
 21470        1  648.992920     1  48.154419   1     0  
 21471        1  627.178406     1  46.245316   1     0  
 21472        1  605.715088     1  45.691143   1     0  
 
 [21473 rows x 12 columns],
 'SEDIMENT':              LON        LAT  TOT_DEPTH        TIME  SMP_ID  NUCLIDE  \
 0      27.799999  60.466667       25.0  1337904000       0       33   
 1      27.799999  60.466667       25.0  1337904000       1       33   
 2      27.799999  60.466667       25.0  1337904000       2       33   
 3      27.799999  60.466667       25.0  1337904000       3       33   
 4      27.799999  60.466667       25.0  1337904000       4       33   
 ...          ...        ...        ...         ...     ...      ...   
 70444  15.537800  54.617832       62.0  1654646400   14121       67   
 70445  15.537800  54.617832       62.0  1654646400   14121       77   
 70446  15.537800  54.617832       62.0  1654646400   14122        4   
 70447  15.537800  54.617832       62.0  1654646400   14122       33   
 70448  15.537800  54.617832       62.0  1654646400   14122       77   
 
              VALUE  UNIT         UNC  DL  SED_TYPE   TOP  BOTTOM  PERCENTWT  
 0      1200.000000     3  240.000000   1         0  15.0    20.0        NaN  
 1       250.000000     3   50.000000   1         0  20.0    25.0        NaN  
 2       140.000000     3   29.400000   1         0  25.0    30.0        NaN  
 3        79.000000     3   15.800000   1         0  30.0    35.0        NaN  
 4        29.000000     3    6.960000   1         0  35.0    40.0        NaN  
 ...            ...   ...         ...  ..       ...   ...     ...        ...  
 70444     0.044000     2    0.015312   1        10  15.0    17.0   0.257642  
 70445     2.500000     2    0.185000   1        10  15.0    17.0   0.257642  
 70446  5873.000000     2  164.444000   1        10  17.0    19.0   0.263965  
 70447    21.200001     2    2.162400   1        10  17.0    19.0   0.263965  
 70448     0.370000     2    0.048100   1        10  17.0    19.0   0.263965  
 
 [70449 rows x 14 columns]}

Review all callbacks

contents = ExtractNetcdfContents(fname_in)
output_format = 'openrefine_csv'
tfm = Transformer(
    data=contents.dfs,
    custom_maps=contents.custom_maps,
    cbs=[
        ValidateEnumsCB(
            contents = contents,
            maris_enums=Enums(lut_src_dir=lut_path())
        ),
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS) ,
        RemapCustomMapsCB(),
        RemapToORSpecificMappingsCB(output_format=output_format),
        AddTaxonInformationCB(
            fn_lut=lut_taxon
        ),
        DecodeTimeCB(),
        AddSampleTypeIdColumnCB(),
        AddZoteroArchiveLocationCB(contents.global_attrs, cfg=cfg()),
        DataFormatConversionCB(
            dtypes=CSV_DTYPES,
            excluded_mappings = get_excluded_enums,
            output_format=output_format,
        ) 
        ]
)
tfm()
for grp in ['SEAWATER', 'BIOTA']:
    display(Markdown(f"<b>Head of the transformed `{grp}` DataFrame:</b>"))
    with pd.option_context('display.max_rows', None):
        display(tfm.dfs[grp].head())

Head of the transformed SEAWATER DataFrame:

LON LAT SMP_DEPTH TOT_DEPTH TIME SMP_ID NUCLIDE VALUE UNIT UNC DL FILT SAMPLE_TYPE REF_ID
0 29.333300 60.083302 0.0 NaN 2012-05-23 0 33 5.300000 1 1.696 = NA 1 100
1 29.333300 60.083302 29.0 NaN 2012-05-23 1 33 19.900000 1 3.980 = NA 1 100
2 23.150000 59.433300 0.0 NaN 2012-06-17 2 33 25.500000 1 5.100 = NA 1 100
3 27.983299 60.250000 0.0 NaN 2012-05-24 3 33 17.000000 1 4.930 = NA 1 100
4 27.983299 60.250000 39.0 NaN 2012-05-24 4 33 22.200001 1 3.996 = NA 1 100

Head of the transformed BIOTA DataFrame:

LON LAT SMP_DEPTH TIME SMP_ID NUCLIDE VALUE UNIT UNC DL ... DRYWT WETWT PERCENTWT TAXONNAME TAXONRANK TAXONDB TAXONDBID TAXONDBURL SAMPLE_TYPE REF_ID
0 12.316667 54.283333 NaN 2012-09-23 0 31 0.010140 5 NaN < ... 174.934433 948.0 0.18453 Gadus morhua species Wikidata Q199788 https://www.wikidata.org/wiki/Q199788 2 100
1 12.316667 54.283333 NaN 2012-09-23 0 4 135.300003 5 4.830210 = ... 174.934433 948.0 0.18453 Gadus morhua species Wikidata Q199788 https://www.wikidata.org/wiki/Q199788 2 100
2 12.316667 54.283333 NaN 2012-09-23 0 9 0.013980 5 NaN < ... 174.934433 948.0 0.18453 Gadus morhua species Wikidata Q199788 https://www.wikidata.org/wiki/Q199788 2 100
3 12.316667 54.283333 NaN 2012-09-23 0 33 4.338000 5 0.150962 = ... 174.934433 948.0 0.18453 Gadus morhua species Wikidata Q199788 https://www.wikidata.org/wiki/Q199788 2 100
4 12.316667 54.283333 NaN 2012-09-23 1 31 0.009614 5 NaN < ... 177.935120 964.0 0.18458 Gadus morhua species Wikidata Q199788 https://www.wikidata.org/wiki/Q199788 2 100

5 rows × 22 columns

Decode


source

decode

 decode (fname_in:str, dest_out:str|None=None,
         output_format:str='openrefine_csv',
         remap_vars:Dict[str,str]={'LON': 'longitude', 'LAT': 'latitude',
         'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME':
         'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE':
         'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL':
         'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id',
         'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL':
         'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES':
         'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE':
         'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT':
         'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB':
         'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE':
         'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME':
         'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb',
         'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID':
         'ref_id', 'SMP_ID': 'samplabcode'},
         remap_dtypes:Dict[str,str]={'AREA': {'state': 'decoded'},
         'NUCLIDE': {'state': 'encoded'}, 'UNIT': {'state': 'encoded'},
         'DL': {'state': 'decoded'}, 'FILT': {'state': 'decoded'},
         'COUNT_MET': {'state': 'encoded'}, 'SAMP_MET': {'state':
         'encoded'}, 'PREP_MET': {'state': 'encoded'}, 'SPECIES':
         {'state': 'encoded'}, 'BODY_PART': {'state': 'encoded'},
         'SED_TYPE': {'state': 'encoded'}, 'LAB': {'state': 'encoded'}},
         verbose:bool=False, **kwargs)

Decode data from NetCDF.

Type Default Details
fname_in str Input file name
dest_out str | None None Output file name (optional)
output_format str openrefine_csv
remap_vars Dict {‘LON’: ‘longitude’, ‘LAT’: ‘latitude’, ‘SMP_DEPTH’: ‘sampdepth’, ‘TOT_DEPTH’: ‘totdepth’, ‘TIME’: ‘begperiod’, ‘AREA’: ‘area’, ‘NUCLIDE’: ‘nuclide_id’, ‘VALUE’: ‘activity’, ‘UNIT’: ‘unit_id’, ‘UNC’: ‘uncertaint’, ‘DL’: ‘detection’, ‘FILT’: ‘filtered’, ‘COUNT_MET’: ‘counmet_id’, ‘SAMP_MET’: ‘sampmet_id’, ‘PREP_MET’: ‘prepmet_id’, ‘VOL’: ‘volume’, ‘SAL’: ‘salinity’, ‘TEMP’: ‘temperatur’, ‘SPECIES’: ‘species_id’, ‘BODY_PART’: ‘bodypar_id’, ‘SED_TYPE’: ‘sedtype_id’, ‘TOP’: ‘sliceup’, ‘BOTTOM’: ‘slicedown’, ‘DRYWT’: ‘drywt’, ‘WETWT’: ‘wetwt’, ‘PERCENTWT’: ‘percentwt’, ‘LAB’: ‘lab_id’, ‘PROFILE_ID’: ‘profile_id’, ‘SAMPLE_TYPE’: ‘samptype_id’, ‘TAXONNAME’: ‘taxonname’, ‘TAXONREPNAME’: ‘taxonrepname’, ‘TAXONRANK’: ‘taxonrank’, ‘TAXONDB’: ‘taxondb’, ‘TAXONDBID’: ‘taxondb_id’, ‘TAXONDBURL’: ‘taxondb_url’, ‘REF_ID’: ‘ref_id’, ‘SMP_ID’: ‘samplabcode’}
remap_dtypes Dict {‘AREA’: {‘state’: ‘decoded’}, ‘NUCLIDE’: {‘state’: ‘encoded’}, ‘UNIT’: {‘state’: ‘encoded’}, ‘DL’: {‘state’: ‘decoded’}, ‘FILT’: {‘state’: ‘decoded’}, ‘COUNT_MET’: {‘state’: ‘encoded’}, ‘SAMP_MET’: {‘state’: ‘encoded’}, ‘PREP_MET’: {‘state’: ‘encoded’}, ‘SPECIES’: {‘state’: ‘encoded’}, ‘BODY_PART’: {‘state’: ‘encoded’}, ‘SED_TYPE’: {‘state’: ‘encoded’}, ‘LAB’: {‘state’: ‘encoded’}}
verbose bool False
kwargs VAR_KEYWORD
Returns None Additional arguments
fname = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
decode(fname_in=fname, dest_out=fname.with_suffix(''), output_format='openrefine_csv')