Data format transformation

This handler transforms MARIS data from NetCDF to CSV, specifically the MARIS Standard Open-Refine CSV format, while preserving data integrity. It implements a modular transformation pipeline in which each processing step is a callback, so individual steps are easy to add or replace.
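The callback-based design described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the handler's actual code: `Pipeline`, `drop_empty_columns`, and `add_sample_type` are hypothetical names, and the only assumption carried over from this page is that the data is held as a dictionary of dataframes keyed by sample type.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class Pipeline:
    """Hypothetical runner: applies each callback, in order, to a dict of dataframes."""
    dfs: dict
    callbacks: list = field(default_factory=list)

    def run(self):
        # Work on copies so the input dataframes are left untouched.
        dfs = {name: df.copy() for name, df in self.dfs.items()}
        for callback in self.callbacks:
            dfs = callback(dfs)
        return dfs


def drop_empty_columns(dfs):
    """Hypothetical step: drop columns that contain only missing values."""
    return {name: df.dropna(axis=1, how="all") for name, df in dfs.items()}


def add_sample_type(dfs):
    """Hypothetical step: record the group name in a SAMPLE_TYPE column."""
    return {name: df.assign(SAMPLE_TYPE=name) for name, df in dfs.items()}


dfs = {
    "SEAWATER": pd.DataFrame(
        {"VALUE": [5.3, 19.9], "TOT_DEPTH": [float("nan"), float("nan")]}
    )
}
result = Pipeline(dfs, [drop_empty_columns, add_sample_type]).run()
print(list(result["SEAWATER"].columns))
```

Because each step takes and returns the same dictionary shape, a step can be inserted, removed, or reordered by editing the callback list alone.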

Tip

New MARIS users should refer to the field definitions for detailed information about MARIS fields.

Dependencies

Required packages and internal modules for data format transformations

from pathlib import Path
from IPython.display import display, Markdown

Configuration and File Paths

fname_in = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
fname_out = fname_in.with_suffix('.csv')
output_format = 'openrefine_csv'
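As a quick check of the path handling above, the snippet below shows what `with_suffix` produces for this input. It only uses the standard-library `pathlib` module; the `same_dir` variable is introduced here for illustration.

```python
from pathlib import Path

fname_in = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
fname_out = fname_in.with_suffix('.csv')

# with_suffix swaps only the final extension; the directory and stem are kept.
print(fname_out.name)

# The output path therefore sits next to the input file.
same_dir = fname_out.parent == fname_in.parent
```

No file is touched at this point: both names are plain path objects, so the derivation works whether or not the NetCDF file exists yet.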

Data Loading

Load data from standardized MARIS NetCDF files using ExtractNetcdfContents. The NetCDF files follow CF conventions and include standardized variable names and metadata according to MARIS specifications.

contents = ExtractNetcdfContents(fname_in)

Show the dictionary of dataframes extracted from the NetCDF file.

contents.dfs
{'BIOTA':              LON        LAT  SMP_DEPTH        TIME  NUCLIDE       VALUE  UNIT  \
 0      12.316667  54.283333        NaN  1348358400       31    0.010140     5   
 1      12.316667  54.283333        NaN  1348358400        4  135.300003     5   
 2      12.316667  54.283333        NaN  1348358400        9    0.013980     5   
 3      12.316667  54.283333        NaN  1348358400       33    4.338000     5   
 4      12.316667  54.283333        NaN  1348358400       31    0.009614     5   
 ...          ...        ...        ...         ...      ...         ...   ...   
 16089  21.395000  61.241501        2.0  1652140800       33   13.700000     4   
 16090  21.395000  61.241501        2.0  1652140800        9    0.500000     4   
 16091  21.385000  61.343334        NaN  1663200000        4   50.700001     4   
 16092  21.385000  61.343334        NaN  1663200000       33    0.880000     4   
 16093  21.385000  61.343334        NaN  1663200000       12    6.600000     4   
 
             UNC  DL  BIO_GROUP  SPECIES  BODY_PART       DRYWT  WETWT  \
 0           NaN   2          4       99         52  174.934433  948.0   
 1      4.830210   1          4       99         52  174.934433  948.0   
 2           NaN   2          4       99         52  174.934433  948.0   
 3      0.150962   1          4       99         52  174.934433  948.0   
 4           NaN   2          4       99         52  177.935120  964.0   
 ...         ...  ..        ...      ...        ...         ...    ...   
 16089  0.520600   1         11       96         55         NaN    NaN   
 16090  0.045500   1         11       96         55         NaN    NaN   
 16091  4.106700   1         14      129          1         NaN    NaN   
 16092  0.140800   1         14      129          1         NaN    NaN   
 16093  0.349800   1         14      129          1         NaN    NaN   
 
        PERCENTWT  
 0        0.18453  
 1        0.18453  
 2        0.18453  
 3        0.18453  
 4        0.18458  
 ...          ...  
 16089        NaN  
 16090        NaN  
 16091        NaN  
 16092        NaN  
 16093        NaN  
 
 [16094 rows x 15 columns],
 'SEAWATER':              LON        LAT  SMP_DEPTH  TOT_DEPTH        TIME  NUCLIDE  \
 0      29.333300  60.083302        0.0        NaN  1337731200       33   
 1      29.333300  60.083302       29.0        NaN  1337731200       33   
 2      23.150000  59.433300        0.0        NaN  1339891200       33   
 3      27.983299  60.250000        0.0        NaN  1337817600       33   
 4      27.983299  60.250000       39.0        NaN  1337817600       33   
 ...          ...        ...        ...        ...         ...      ...   
 21468  13.499833  54.600334        0.0       47.0  1686441600        1   
 21469  13.499833  54.600334       45.0       47.0  1686441600        1   
 21470  14.200833  54.600334        0.0       11.0  1686614400        1   
 21471  14.665500  54.600334        0.0       20.0  1686614400        1   
 21472  14.330000  54.600334        0.0       17.0  1686614400        1   
 
             VALUE  UNIT        UNC  DL  FILT  
 0        5.300000     1   1.696000   1     0  
 1       19.900000     1   3.980000   1     0  
 2       25.500000     1   5.100000   1     0  
 3       17.000000     1   4.930000   1     0  
 4       22.200001     1   3.996000   1     0  
 ...           ...   ...        ...  ..   ...  
 21468  702.838074     1  51.276207   1     0  
 21469  725.855713     1  52.686260   1     0  
 21470  648.992920     1  48.154419   1     0  
 21471  627.178406     1  46.245316   1     0  
 21472  605.715088     1  45.691143   1     0  
 
 [21473 rows x 11 columns],
 'SEDIMENT':              LON        LAT  TOT_DEPTH        TIME  NUCLIDE        VALUE  \
 0      27.799999  60.466667       25.0  1337904000       33  1200.000000   
 1      27.799999  60.466667       25.0  1337904000       33   250.000000   
 2      27.799999  60.466667       25.0  1337904000       33   140.000000   
 3      27.799999  60.466667       25.0  1337904000       33    79.000000   
 4      27.799999  60.466667       25.0  1337904000       33    29.000000   
 ...          ...        ...        ...         ...      ...          ...   
 70444  15.537800  54.617832       62.0  1654646400       67     0.044000   
 70445  15.537800  54.617832       62.0  1654646400       77     2.500000   
 70446  15.537800  54.617832       62.0  1654646400        4  5873.000000   
 70447  15.537800  54.617832       62.0  1654646400       33    21.200001   
 70448  15.537800  54.617832       62.0  1654646400       77     0.370000   
 
        UNIT         UNC  DL  SED_TYPE   TOP  BOTTOM  PERCENTWT  
 0         4  240.000000   1         0  15.0    20.0        NaN  
 1         4   50.000000   1         0  20.0    25.0        NaN  
 2         4   29.400000   1         0  25.0    30.0        NaN  
 3         4   15.800000   1         0  30.0    35.0        NaN  
 4         4    6.960000   1         0  35.0    40.0        NaN  
 ...     ...         ...  ..       ...   ...     ...        ...  
 70444     4    0.015312   1        10  15.0    17.0   0.257642  
 70445     4    0.185000   1        10  15.0    17.0   0.257642  
 70446     4  164.444000   1        10  17.0    19.0   0.263965  
 70447     4    2.162400   1        10  17.0    19.0   0.263965  
 70448     4    0.048100   1        10  17.0    19.0   0.263965  
 
 [70449 rows x 13 columns]}
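The dataframes above store most fields as integer codes, and `TIME` as large integers. The sketch below shows one way to make a few rows human-readable. It rests on two assumptions that should be checked against the file's metadata: that `TIME` is expressed in seconds since 1970-01-01, and that the codes correspond to the label-to-code mappings exposed by `contents.enum_dicts`. The rows and mapping entries are copied from the outputs on this page; `invert_enum` is a hypothetical helper.

```python
import pandas as pd

# Three rows copied from the BIOTA dataframe shown above.
df = pd.DataFrame({
    "TIME": [1348358400, 1348358400, 1652140800],
    "NUCLIDE": [31, 4, 33],
    "VALUE": [0.010140, 135.300003, 13.700000],
    "UNIT": [5, 5, 4],
})

# Small subsets of the label -> code mappings found in contents.enum_dicts.
nuclide_enum = {"k40": "4", "co60": "9", "cs134": "31", "cs137": "33"}
unit_enum = {"Bq per kgd": "4", "Bq per kgw": "5"}


def invert_enum(enum):
    """Hypothetical helper: turn {label: 'code'} into {code: label} with integer keys."""
    return {int(code): label for label, code in enum.items()}


decoded = df.assign(
    # Assumption: TIME is seconds since the Unix epoch.
    TIME=pd.to_datetime(df["TIME"], unit="s"),
    NUCLIDE=df["NUCLIDE"].map(invert_enum(nuclide_enum)),
    UNIT=df["UNIT"].map(invert_enum(unit_enum)),
)
print(decoded)
```

Codes missing from the lookup would come back as NaN from `Series.map`, which makes incomplete mappings easy to spot.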

Show the dictionary of enums extracted from the NetCDF file.

contents.enum_dicts
{'BIOTA': {'nuclide': {'NOT APPLICABLE': '-1',
   'NOT AVAILABLE': '0',
   'h3': '1',
   'be7': '2',
   'c14': '3',
   'k40': '4',
   'cr51': '5',
   'mn54': '6',
   'co57': '7',
   'co58': '8',
   'co60': '9',
   'zn65': '10',
   'sr89': '11',
   'sr90': '12',
   'zr95': '13',
   'nb95': '14',
   'tc99': '15',
   'ru103': '16',
   'ru106': '17',
   'rh106': '18',
   'ag106m': '19',
   'ag108': '20',
   'ag108m': '21',
   'ag110m': '22',
   'sb124': '23',
   'sb125': '24',
   'te129m': '25',
   'i129': '28',
   'i131': '29',
   'cs127': '30',
   'cs134': '31',
   'cs137': '33',
   'ba140': '34',
   'la140': '35',
   'ce141': '36',
   'ce144': '37',
   'pm147': '38',
   'eu154': '39',
   'eu155': '40',
   'pb210': '41',
   'pb212': '42',
   'pb214': '43',
   'bi207': '44',
   'bi211': '45',
   'bi214': '46',
   'po210': '47',
   'rn220': '48',
   'rn222': '49',
   'ra223': '50',
   'ra224': '51',
   'ra225': '52',
   'ra226': '53',
   'ra228': '54',
   'ac228': '55',
   'th227': '56',
   'th228': '57',
   'th232': '59',
   'th234': '60',
   'pa234': '61',
   'u234': '62',
   'u235': '63',
   'u238': '64',
   'np237': '65',
   'np239': '66',
   'pu238': '67',
   'pu239': '68',
   'pu240': '69',
   'pu241': '70',
   'am240': '71',
   'am241': '72',
   'cm242': '73',
   'cm243': '74',
   'cm244': '75',
   'cs134_137_tot': '76',
   'pu239_240_tot': '77',
   'pu239_240_iii_iv_tot': '78',
   'pu239_240_v_vi_tot': '79',
   'cm243_244_tot': '80',
   'pu238_pu239_240_tot_ratio': '81',
   'am241_pu239_240_tot_ratio': '82',
   'cs137_134_ratio': '83',
   'cd109': '84',
   'eu152': '85',
   'fe59': '86',
   'gd153': '87',
   'ir192': '88',
   'pu238_240_tot': '89',
   'rb86': '90',
   'sc46': '91',
   'sn113': '92',
   'sn117m': '93',
   'tl208': '94',
   'mo99': '95',
   'tc99m': '96',
   'ru105': '97',
   'te129': '98',
   'te132': '99',
   'i132': '100',
   'i135': '101',
   'cs136': '102',
   'tbeta': '103',
   'talpha': '104',
   'i133': '105',
   'th230': '106',
   'pa231': '107',
   'u236': '108',
   'ag111': '109',
   'in116m': '110',
   'te123m': '111',
   'sb127': '112',
   'ba133': '113',
   'ce139': '114',
   'tl201': '116',
   'hg203': '117',
   'na22': '122',
   'pa234m': '123',
   'am243': '124',
   'se75': '126',
   'sr85': '127',
   'y88': '128',
   'ce140': '129',
   'bi212': '130',
   'u236_238_ratio': '131',
   'i125': '132',
   'ba137m': '133',
   'u232': '134',
   'pa233': '135',
   'ru106_rh106_tot': '136',
   'tu': '137',
   'tbeta40k': '138',
   'fe55': '139',
   'ce144_pr144_tot': '140',
   'pu240_pu239_ratio': '141',
   'u233': '142',
   'pu239_242_tot': '143',
   'ac227': '144'},
  'unit': {'Not applicable': '-1',
   'NOT AVAILABLE': '0',
   'Bq per m3': '1',
   'Bq per m2': '2',
   'Bq per kg': '3',
   'Bq per kgd': '4',
   'Bq per kgw': '5',
   'kg per kg': '6',
   'TU': '7',
   'DELTA per mill': '8',
   'atom per kg': '9',
   'atom per kgd': '10',
   'atom per kgw': '11',
   'atom per l': '12',
   'Bq per kgC': '13'},
  'dl': {'Not applicable': '-1',
   'Not available': '0',
   'Detected value': '1',
   'Detection limit': '2',
   'Not detected': '3',
   'Derived': '4'},
  'bio_group': {'Not applicable': '-1',
   'Not available': '0',
   'Birds': '1',
   'Crustaceans': '2',
   'Echinoderms': '3',
   'Fish': '4',
   'Mammals': '5',
   'Molluscs': '6',
   'Others': '7',
   'Plankton': '8',
   'Polychaete worms': '9',
   'Reptile': '10',
   'Seaweeds and plants': '11',
   'Cephalopods': '12',
   'Gastropods': '13',
   'Bivalves': '14'},
  'species': {'NOT AVAILABLE': '0',
   'Aristeus antennatus': '1',
   'Apostichopus': '2',
   'Saccharina japonica var religiosa': '3',
   'Siganus fuscescens': '4',
   'Alpheus dentipes': '5',
   'Hexagrammos agrammus': '6',
   'Ditrema temminckii': '7',
   'Parapristipoma trilineatum': '8',
   'Scombrops boops': '9',
   'Pseudopleuronectes schrenki': '10',
   'Desmarestia ligulata': '11',
   'Saccharina japonica': '12',
   'Neodilsea yendoana': '13',
   'Costaria costata': '14',
   'Sargassum yezoense': '15',
   'Acanthephyra pelagica': '16',
   'Sargassum ringgoldianum': '17',
   'Acanthephyra quadrispinosa': '18',
   'Sargassum thunbergii': '19',
   'Sargassum patens': '20',
   'Asterias rubens': '21',
   'Sargassum miyabei': '22',
   'Homarus gammarus': '23',
   'Acanthephyra stylorostratis': '24',
   'Acanthocybium solandri': '25',
   'Acanthopagrus bifasciatus': '26',
   'Acanthophora muscoides': '27',
   'Acanthophora spicifera': '28',
   'Acanthurus triostegus': '29',
   'Actinopterygii': '30',
   'Adamussium colbecki': '31',
   'Ahnfeltiopsis densa': '32',
   'Alepes melanoptera': '33',
   'Ampharetidae': '34',
   'Anchoviella lepidentostole': '35',
   'Anguillidae': '36',
   'Aphroditidae': '37',
   'Arnoglossus': '38',
   'Aurigequula fasciata': '39',
   'Balaenoptera musculus': '40',
   'Balaenoptera physalus': '41',
   'Balistes': '42',
   'Beryciformes': '43',
   'Bryopsis maxima': '44',
   'Callinectes sp': '45',
   'Callorhinus ursinus': '46',
   'Carassius auratus auratus': '47',
   'Carcharhinus sorrah': '48',
   'Caridae': '49',
   'Clupea harengus': '50',
   'Cathorops spixii': '51',
   'Caulerpa racemosa': '52',
   'Caulerpa scalpelliformis': '53',
   'Caulerpa sertularioides': '54',
   'Cellana radiata': '55',
   'Coscinasterias tenuispina': '56',
   'Centroceras clavulatum': '57',
   'Centropomus parallelus': '58',
   'Crangon crangon': '59',
   'Ceramium diaphanum': '60',
   'Ceramium rubrum': '61',
   'Chaenocephalus aceratus': '62',
   'Chaetodipterus faber': '63',
   'Chaetomorpha antennina': '64',
   'Chaetomorpha linoides': '65',
   'Chelidonichthys kumu': '66',
   'Chelon ramada': '67',
   'Chiloscyllium': '68',
   'Chionodraco hamatus': '69',
   'Chlamys islandica': '70',
   'Chlorophyta': '71',
   'Chondrichthyes': '72',
   'Chrysaora': '73',
   'Cladophora nitellopsis': '74',
   'Cladophora vagabunda': '75',
   'Cladophoropsis membranacea': '76',
   'Clupea': '77',
   'Coccotylus truncatus': '78',
   'Codium fragile': '79',
   'Crassostrea': '80',
   'Cynoscion acoupa': '81',
   'Cynoscion jamaicensis': '82',
   'Cynoscion leiarchus': '83',
   'Engraulis encrasicolus': '84',
   'Cypselurus agoo agoo': '85',
   'Cystophora cristata': '86',
   'Cystoseira barbata': '87',
   'Cystoseira crinita': '88',
   'Decapodiformes': '89',
   'Decapterus russelli': '90',
   'Decapterus scombrinus': '91',
   'Delphinapterus leucas': '92',
   'Delphinus capensis': '93',
   'Diapterus rhombeus': '94',
   'Dicentrarchus punctatus': '95',
   'Fucus vesiculosus': '96',
   'Funchalia woodwardi': '97',
   'Ecklonia bicyclis': '98',
   'Gadus morhua': '99',
   'Ecklonia kurome': '100',
   'Gennadas elegans': '101',
   'Eisenia arborea': '102',
   'Encrasicholina devisi': '103',
   'Enteromorpha': '104',
   'Enteromorpha flexuosa': '105',
   'Enteromorpha intestinalis': '106',
   'Epinephelinae': '107',
   'Epinephelus diacanthus': '108',
   'Exocoetidae': '109',
   'Saccharina latissima': '110',
   'Gracilaria corticata': '111',
   'Ligur ensiferus': '112',
   'Gracilaria debilis': '113',
   'Gracilaria edulis': '114',
   'Gracilariales': '115',
   'Grateloupia elliptica': '116',
   'Grateloupia filicina': '117',
   'Lysmata seticaudata': '118',
   'Gymnogongrus griffithsiae': '119',
   'Mya arenaria': '120',
   'Halichoerus grypus': '121',
   'Macoma balthica': '122',
   'Marthasterias glacialis': '123',
   'Halimeda macroloba': '124',
   'Harengula clupeola': '125',
   'Harpagifer antarcticus': '126',
   'Hemifusus ternatanus': '127',
   'Hemiramphus brasiliensis': '128',
   'Mytilus edulis': '129',
   'Metapenaeus affinis': '130',
   'Heteroscleromorpha': '131',
   'Heterosigma akashiwo': '132',
   'Hilsa ilisha': '133',
   'Metapenaeus monoceros': '134',
   'Metapenaeus stebbingi': '135',
   'Holothuria': '136',
   'Hoplobrotula armata': '137',
   'Hypnea musciformis': '138',
   'Merlangius merlangus': '139',
   'Iridaea cordata': '140',
   'Jania rubens': '141',
   'Meganyctiphanes norvegica': '142',
   'Johnius glaucus': '143',
   'Kappaphycus': '144',
   'Kappaphycus alvarezii': '145',
   'Laevistrombus canarium': '146',
   'Lagenodelphis hosei': '147',
   'Lambia': '148',
   'Laminaria japonica': '149',
   'Laminaria longissima': '150',
   'Larimus breviceps': '151',
   'Laurencia papillosa': '152',
   'Leiognathidae': '153',
   'Leiognathus dussumieri': '154',
   'Lepidochelys olivacea': '155',
   'Leptonychotes weddellii': '156',
   'Limanda yokohamae': '157',
   'Nephrops norvegicus': '158',
   'Neuston': '159',
   'Littoraria undulata': '160',
   'Loligo vulgaris': '161',
   'Lumbrineridae': '162',
   'Lutjanus fulviflamma': '163',
   'Marginisporum aberrans': '164',
   'Megalaspis cordyla': '165',
   'Octopus vulgaris': '166',
   'Menticirrhus americanus': '167',
   'Mesoplodon densirostris': '168',
   'Palaemon longirostris': '169',
   'Metapenaeus brevicornis': '170',
   'Pasiphaea multidentata': '171',
   'Pasiphaea sivado': '172',
   'Parapenaeopsis stylifera': '173',
   'Miichthys miiuy': '174',
   'Mirounga leonina': '175',
   'Brachidontes striatulus': '176',
   'Monodon monoceros': '177',
   'Mugil platanus': '178',
   'Penaeus semisulcatus': '179',
   'Mullus barbatus': '180',
   'Mycteroperca rubra': '181',
   'Philocheras echinulatus': '182',
   'Myelophycus simplex': '183',
   'Mytilus coruscus': '184',
   'Penaeus indicus': '185',
   'Natator depressus': '186',
   'Pandalus jordani': '187',
   'Melicertus kerathurus': '188',
   'Parapenaeus longirostris': '189',
   'Plesionika': '190',
   'Platichthys flesus': '191',
   'Pleuronectes platessa': '192',
   'Nematopalaemon tenuipes': '193',
   'Nematoscelis difficilis': '194',
   'Nemipterus': '195',
   'Aegaeon lacazei': '196',
   'Nephtyidae': '197',
   'Nereididae': '198',
   'Netuma bilineata': '199',
   'Nibea maculata': '200',
   'Oceana serrulata': '201',
   'Palaemon serratus': '202',
   'Ocypode': '203',
   'Odobenus rosmarus': '204',
   'Ogcocephalus vespertilio': '205',
   'Oligoplites saurus': '206',
   'Onuphidae': '207',
   'Opheliidae': '208',
   'Opisthonema oglinum': '209',
   'Opisthopterus tardoore': '210',
   'Orientomysis mitsukurii': '211',
   'Otolithes cuvieri': '212',
   'Padina pavonica': '213',
   'Padina tetrastromatica': '214',
   'Padina vickersiae': '215',
   'Pagellus affinis': '216',
   'Pagophilus groenlandicus': '217',
   'Paguroidea': '218',
   'Pagurus': '219',
   'Systellaspis debilis': '220',
   'Sergestes': '221',
   'Sergestes arcticus': '222',
   'Pampus argenteus': '223',
   'Sergestes arachnipodus': '224',
   'Sergestes henseni': '225',
   'Sergestes prehensilis': '226',
   'Sergestes robustus': '227',
   'Pangasius pangasius': '228',
   'Panulirus homarus': '229',
   'Paracentrotus lividus': '230',
   'Pasiphaea sp': '231',
   'Pectinariidae': '232',
   'Penaeus': '233',
   'Phoca vitulina': '234',
   'Photopectoralis bindus': '235',
   'Phyllospadix iwatensis': '236',
   'Plectorhinchus mediterraneus': '237',
   'Pleuronectes mochigarei': '238',
   'Pleuronectes obscurus': '239',
   'Plocamium brasiliense': '240',
   'Polynemus paradiseus': '241',
   'Polysiphonia': '242',
   'Sprattus sprattus': '243',
   'Scomber scombrus': '244',
   'Polysiphonia fucoides': '245',
   'Gonostomatidae': '246',
   'Perca fluviatilis': '247',
   'Pomadasys crocro': '248',
   'Porphyra tenera': '249',
   'Potamogeton pectinatus': '250',
   'Priacanthus hamrur': '251',
   'Pseudorhombus malayanus': '252',
   'Pterocladiella capillacea': '253',
   'Pusa caspica': '254',
   'Pusa sibirica': '255',
   'Pylaiella littoralis': '256',
   'Sabellidae': '257',
   'Salangichthys ishikawae': '258',
   'Sarconema filiforme': '259',
   'Sardinella albella': '260',
   'Sardinella brasiliensis': '261',
   'Sardinops melanostictus': '262',
   'Sargassum cymosum': '263',
   'Sargassum linearifolium': '264',
   'Sargassum micracanthum': '265',
   'Xiphias gladius': '266',
   'Sargassum novae hollandiae': '267',
   'Sargassum oligocystum': '268',
   'Esox lucius': '269',
   'Limanda limanda': '270',
   'Abramis brama': '271',
   'Anguilla anguilla': '272',
   'Arctica islandica': '273',
   'Cerastoderma edule': '274',
   'Cyprinus carpio': '275',
   'Echinodermata': '276',
   'Fish larvae': '277',
   'Myoxocephalus scorpius': '278',
   'Osmerus eperlanus': '279',
   'Plankton': '280',
   'Scophthalmus maximus': '281',
   'Rhodophyta': '282',
   'Rutilus rutilus': '283',
   'Saduria entomon': '284',
   'Sander lucioperca': '285',
   'Gasterosteus aculeatus': '286',
   'Zoarces viviparus': '287',
   'Gymnocephalus cernua': '288',
   'Furcellaria lumbricalis': '289',
   'Cladophora glomerata': '290',
   'Lateolabrax japonicus': '291',
   'Okamejei kenojei': '292',
   'Sebastes pachycephalus': '293',
   'Squalus acanthias': '294',
   'Gadus macrocephalus': '295',
   'Paralichthys olivaceus': '296',
   'Ovalipes punctatus': '297',
   'Pseudopleuronectes yokohamae': '298',
   'Hemitripterus villosus': '299',
   'Clidoderma asperrimum': '300',
   'Microstomus achne': '301',
   'Lepidotrigla microptera': '302',
   'Hexagrammos otakii': '303',
   'Kareius bicoloratus': '304',
   'Pleuronichthys cornutus': '305',
   'Enteroctopus dofleini': '306',
   'Ammodytes personatus': '307',
   'Lophius litulon': '308',
   'Eopsetta grigorjewi': '309',
   'Takifugu porphyreus': '310',
   'Loliolus japonica': '311',
   'Sepia andreana': '312',
   'Sebastes cheni': '313',
   'Portunus trituberculatus': '314',
   'Sebastes schlegelii': '315',
   'Pennahia argentata': '316',
   'Platichthys stellatus': '317',
   'Gadus chalcogrammus': '318',
   'Chelidonichthys spinosus': '319',
   'Conger myriaster': '320',
   'Heterololigo bleekeri': '321',
   'Stichaeus grigorjewi': '322',
   'Pseudopleuronectes herzensteini': '323',
   'Octopus conispadiceus': '324',
   'Hippoglossoides dubius': '325',
   'Cleisthenes pinetorum': '326',
   'Glyptocephalus stelleri': '327',
   'Tanakius kitaharae': '328',
   'Nibea mitsukurii': '329',
   'Dasyatis matsubarai': '330',
   'Verasper moseri': '331',
   'Hemitrygon akajei': '332',
   'Triakis scyllium': '333',
   'Trachurus japonicus': '334',
   'Zeus faber': '335',
   'Pagrus major': '336',
   'Acanthopagrus schlegelii': '337',
   'Dentex tumifrons': '338',
   'Mustelus manazo': '339',
   'Seriola quinqueradiata': '340',
   'Hyperoglyphe japonica': '341',
   'Carcharhinus': '342',
   'Platycephalus': '343',
   'Scomber japonicus': '344',
   'Squatina japonica': '345',
   'Alopias pelagicus': '346',
   'Zenopsis nebulosa': '347',
   'Cynoglossus joyneri': '348',
   'Verasper variegatus': '349',
   'Oncorhynchus keta': '350',
   'Physiculus japonicus': '351',
   'Oplegnathus punctatus': '352',
   'Arothron hispidus': '353',
   'Stereolepis doederleini': '354',
   'Takifugu snyderi': '355',
   'Scomber australasicus': '356',
   'Liparis tanakae': '357',
   'Thamnaconus modestus': '358',
   'Gnathophis nystromi': '359',
   'Sebastes oblongus': '360',
   'Sebastiscus marmoratus': '361',
   'Takifugu pardalis': '362',
   'Mugil cephalus': '363',
   'Ditrema temminckii temminckii': '364',
   'Konosirus punctatus': '365',
   'Tribolodon brandtii': '366',
   'Oncorhynchus masou': '367',
   'Aluterus monoceros': '368',
   'Todarodes pacificus': '369',
   'Myoxocephalus stelleri': '370',
   'Myliobatis tobijei': '371',
   'Scyliorhinus torazame': '372',
   'Lophiomus setigerus': '373',
   'Heterodontus japonicus': '374',
   'Sebastes vulpes': '375',
   'Paraplagusia japonica': '376',
   'Ostrea edulis': '377',
   'Melanogrammus aeglefinus': '378',
   'Pollachius virens': '379',
   'Pollachius pollachius': '380',
   'Sebastes marinus': '381',
   'Anarhichas minor': '382',
   'Anarhichas denticulatus': '383',
   'Reinhardtius hippoglossoides': '384',
   'Trisopterus esmarkii': '385',
   'Micromesistius poutassou': '386',
   'Coryphaenoides rupestris': '387',
   'Argentina silus': '388',
   'Salmo salar': '389',
   'Sebastes viviparus': '390',
   'Buccinum undatum': '391',
   'Fucus serratus': '392',
   'Merluccius merluccius': '393',
   'Littorina littorea': '394',
   'Fucus': '395',
   'Rhodymenia': '396',
   'Solea solea': '397',
   'Trachurus trachurus': '398',
   'Eutrigla gurnardus': '399',
   'Pelvetia canaliculata': '400',
   'Ascophyllum nodosum': '401',
   'Mallotus villosus': '402',
   'Pecten maximus': '403',
   'Hippoglossoides platessoides': '404',
   'Sebastes mentella': '405',
   'Modiolus modiolus': '406',
   'Boreogadus saida': '407',
   'Sepia': '408',
   'Gadus': '409',
   'Sardina pilchardus': '410',
   'Pleuronectiformes': '411',
   'Molva molva': '412',
   'Patella': '413',
   'Crassostrea gigas': '414',
   'Dasyatis pastinaca': '415',
   'Lophius piscatorius': '416',
   'Porphyra umbilicalis': '417',
   'Patella vulgata': '418',
   'Brosme brosme': '419',
   'Glyptocephalus cynoglossus': '420',
   'Galeus melastomus': '421',
   'Chimaera monstrosa': '422',
   'Etmopterus spinax': '423',
   'Dicentrarchus labrax': '424',
   'Osilinus lineatus': '425',
   'Hippoglossus hippoglossus': '426',
   'Cyclopterus lumpus': '427',
   'Molva dypterygia': '428',
   'Microstomus kitt': '429',
   'Fucus distichus': '430',
   'Tapes': '431',
   'Sebastes norvegicus': '432',
   'Phycis blennoides': '433',
   'Fucus spiralis': '434',
   'Laminaria digitata': '435',
   'Dipturus batis': '436',
   'Anarhichas lupus': '437',
   'Lumpenus lampretaeformis': '438',
   'Lycodes vahlii': '439',
   'Argentina sphyraena': '440',
   'Trisopterus minutus': '441',
   'Thunnus': '442',
   'Hyperoplus lanceolatus': '443',
   'Gaidropsarus argentatus': '444',
   'Engraulis japonicus': '445',
   'Mytilus galloprovincialis': '446',
   'Undaria pinnatifida': '447',
   'Chlorophthalmus albatrossis': '448',
   'Sargassum fusiforme': '449',
   'Eisenia bicyclis': '450',
   'Spisula sachalinensis': '451',
   'Strongylocentrotus nudus': '452',
   'Haliotis discus hannai': '453',
   'Dexistes rikuzenius': '454',
   'Ruditapes philippinarum': '455',
   'Apostichopus japonicus': '456',
   'Pterothrissus gissu': '457',
   'Helicolenus hilgendorfii': '458',
   'Buccinum isaotakii': '459',
   'Neptunea intersculpta': '460',
   'Apostichopus nigripunctatus': '461',
   'Sebastes thompsoni': '462',
   'Oratosquilla oratoria': '463',
   'Oncorhynchus kisutch': '464',
   'Erimacrus isenbeckii': '465',
   'Sillago japonica': '466',
   'Trachysalambria curvirostris': '467',
   'Mytilus unguiculatus': '468',
   'Crassostrea nippona': '469',
   'Laminariales': '470',
   'Uroteuthis edulis': '471',
   'Takifugu poecilonotus': '472',
   'Neptunea arthritica': '473',
   'Katsuwonus pelamis': '474',
   'Doederleinia berycoides': '475',
   'Metapenaeopsis dalei': '476',
   'Seriola dumerili': '477',
   'Pseudorhombus pentophthalmus': '478',
   'Stephanolepis cirrhifer': '479',
   'Cookeolus japonicus': '480',
   'Panulirus japonicus': '481',
   'Thunnus orientalis': '482',
   'Halocynthia roretzi': '483',
   'Etrumeus sadina': '484',
   'Cololabis saira': '485',
   'Coryphaena hippurus': '486',
   'Sarda orientalis': '487',
   'Octopus ocellatus': '488',
   'Sardinops sagax': '489',
   'Sphyraena pinguis': '490',
   'Sebastes ventricosus': '491',
   'Occella iburia': '492',
   'Glossanodon semifasciatus': '493',
   'Mizuhopecten yessoensis': '494',
   'Neosalangichthys ishikawae': '495',
   'Bothrocara tanakae': '496',
   'Malacocottus zonurus': '497',
   'Coelorinchus macrochir': '498',
   'Neptunea constricta': '499',
   'Beringius polynematicus': '500',
   'Sebastes nivosus': '501',
   'Pandalus eous': '502',
   'Synaphobranchus kaupii': '503',
   'Sebastolobus macrochir': '504',
   'Marsupenaeus japonicus': '505',
   'Japelion hirasei': '506',
   'Pleurogrammus azonus': '507',
   'Monostroma nitidum': '508',
   'Atheresthes evermanni': '509',
   'Takifugu rubripes': '510',
   'Chionoecetes opilio': '511',
   'Pandalopsis coccinata': '512',
   'Chionoecetes japonicus': '513',
   'Sebastes matsubarae': '514',
   'Scombrops gilberti': '515',
   'Hyporhamphus sajori': '516',
   'Trichiurus lepturus': '517',
   'Alcichthys elongatus': '518',
   'Volutharpa perryi': '519',
   'Mercenaria stimpsoni': '520',
   'Berryteuthis magister': '521',
   'Aptocyclus ventricosus': '522',
   'Euphausia pacifica': '523',
   'Salangichthys microdon': '524',
   'Telmessus acutidens': '525',
   'Ceratophyllum demersum': '526',
   'Pandalus nipponensis': '527',
   'Sebastes owstoni': '528',
   'Cociella crocodilus': '529',
   'Conger japonicus': '530',
   'Sardinella zunasi': '531',
   'Cheilopogon pinnatibarbatus japonicus': '532',
   'Oplegnathus fasciatus': '533',
   'Macridiscus aequilatera': '534',
   'Repomucenus ornatipinnis': '535',
   'Clupea pallasii': '536',
   'Scorpaena neglecta': '537',
   'Scomberomorus niphonius': '538',
   'Leucopsarion petersii': '539',
   'Sebastes scythropus': '540',
   'Strongylura anastomella': '541',
   'Laemonema longipes': '542',
   'Fusitriton oregonensis': '543',
   'Japelion pericochlion': '544',
   'Sebastes steindachneri': '545',
   'Auxis rochei': '546',
   'Lobotes surinamensis': '547',
   'Auxis thazard': '548',
   'Chlorophthalmus borealis': '549',
   'Etelis coruscans': '550',
   'Sebastes inermis': '551',
   'Cynoglossus interruptus': '552',
   'Erilepis zonifer': '553',
   'Tridentiger obscurus': '554',
   'Caranx sexfasciatus': '555',
   'Thunnus thynnus': '556',
   'Takifugu stictonotus': '557',
   'Euthynnus affinis': '558',
   'Synagrops japonicus': '559',
   'Okamejei schmidti': '560',
   'Suggrundus meerdervoortii': '561',
   'Sebastes baramenuke': '562',
   'Pleurogrammus monopterygius': '563',
   'Decapterus maruadsi': '564',
   'Girella punctata': '565',
   'Sphyraena japonica': '566',
   'Ommastrephes bartramii': '567',
   'Sepiella japonica': '568',
   'Sepioteuthis lessoniana': '569',
   'Eucleoteuthis luminosa': '570',
   'Gloiopeltis furcata': '571',
   'Macrobrachium nipponense': '572',
   'Sepia kobiensis': '573',
   'Eriocheir japonica': '574',
   'Magallana nippona': '575',
   'Meretrix lusoria': '576',
   'Chondrus ocellatus': '577',
   'Chondrus elatus': '578',
   'Gloiopeltis': '579',
   'Holothuroidea': '580',
   'Corbicula japonica': '581',
   'Sunetta menstrualis': '582',
   'Pseudorhombus cinnamoneus': '583',
   'Takifugu niphobles': '584',
   'Lagocephalus gloveri': '585',
   'Beryx splendens': '586',
   'Parastichopus nigripunctatus': '587',
   'Venerupis philippinarum': '588',
   'Haliotis': '589',
   'Liparis agassizii': '590',
   'Seriola lalandi': '591',
   'Niphon spinosus': '592',
   'Pleuronichthys japonicus': '593',
   'Sergia lucens': '594',
   'Sphoeroides pachygaster': '595',
   'Coryphaenoides acrolepis': '596',
   'Pseudopleuronectes obscurus': '597',
   'Pyropia yezoensis': '598',
   'Isurus oxyrinchus': '599',
   'Sargassum fulvellum': '600',
   'Prionace glauca': '601',
   'Kajikia audax': '602',
   'Thunnus albacares': '603',
   'Thunnus alalunga': '604',
   'Thunnus obesus': '605',
   'Lamna ditropis': '606',
   'Glyptocidaris crenularis': '607',
   'Asterias amurensis': '608',
   'Sepiida': '609',
   'Congridae': '610',
   'Takifugu': '611',
   'Sargassum horneri': '612',
   'Haliotis discus': '613',
   'Pleuronectidae': '614',
   'Acanthogobius flavimanus': '615',
   'Acanthogobius lactipes': '616',
   'Pholis nebulosa': '617',
   'Hemigrapsus penicillatus': '618',
   'Palaemon paucidens': '619',
   'Mysidae': '620',
   'Zostera marina': '621',
   'Ulva pertusa': '622',
   'Gobiidae': '623',
   'Atherinidae': '624',
   'Tribolodon': '625',
   'Alpheus': '626',
   'Polychaeta': '627',
   'Sebastes': '628',
   'Charybdis japonica': '629',
   'Hemigrapsus': '630',
   'Favonigobius gymnauchen': '631',
   'Palaemon': '632',
   'Planiliza haematocheila': '633',
   'Palaemonidae': '634',
   'Pholis crassispina': '635',
   'Laminaria': '636',
   'Distolasterias nipon': '637',
   'Lophiiformes': '638',
   'Alpheus brevicristatus': '639',
   'Undaria undariodes': '640',
   'Neomysis awatschensis': '641',
   'Alpheidae': '642',
   'Macrobrachium': '643',
   'Hediste': '644',
   'Gymnogobius breunigii': '645',
   'Luidia quinaria': '646',
   'Rhizoprionodon acutus': '647',
   'Carangoides equula': '648',
   'Carcinoplax longimana': '649',
   'Anomura': '650',
   'Spatangoida': '651',
   'Plesiobatis daviesi': '652',
   'Eusphyra blochii': '653',
   'Ruditapes variegata': '654',
   'Sinonovacula constricta': '655',
   'Penaeus monodon': '656',
   'Litopenaeus vannamei': '657',
   'Solenocera crassicornis': '658',
   'Stomatopoda': '659',
   'Teuthida': '660',
   'Octopus': '661',
   'Larimichthys polyactis': '662',
   'Scomberomorini': '663',
   'Channa argus': '664',
   'Ranina ranina': '665',
   'Lates calcarifer': '666',
   'Scomberomorus commerson': '667',
   'Lutjanus malabaricus': '668',
   'Thenus parindicus': '669',
   'Amusium pleuronectes': '670',
   'Loligo': '671',
   'Plectropomus leopardus': '672',
   'Sillago ciliata': '673',
   'Scylla serrata': '674',
   'Pinctada maxima': '675',
   'Lutjanus argentimaculatus': '676',
   'Protonibea diacanthus': '677',
   'Polydactylus macrochir': '678',
   'Rachycentron canadum': '679',
   'Ibacus peronii': '680',
   'Arripis trutta': '681',
   'Sarda australis': '682',
   'Seriola hippos': '683',
   'Choerodon schoenleinii': '684',
   'Panulirus ornatus': '685',
   'Neotrygon kuhlii': '686',
   'Lethrinus nebulosus': '687',
   'Parupeneus multifasciatus': '688',
   'Saccostrea cucullata': '689',
   'Lutjanus sebae': '690',
   'Thunnus maccoyii': '691',
   'Acanthopagrus butcheri': '692',
   'Lambis lambis': '693',
   'Gerres subfasciatus': '694',
   'Zooplankton': '695',
   'Phytoplankton': '696',
   'Rapana venosa': '697',
   'Scapharca inaequivalvis': '698',
   'Ulva intestinalis': '699',
   'Ulva linza': '700',
   'Ceramium virgatum': '701',
   'Gayralia oxysperma': '702',
   'Vertebrata fucoides': '703',
   'Stuckenia pectinata': '704',
   'Rochia nilotica': '705',
   'Ctenochaetus striatus': '706',
   'Serranidae': '707',
   'Turbo setosus': '708',
   'Pandalidae': '709',
   'Gymnosarda unicolor': '710',
   'Epinephelini': '711',
   'Pisces': '712',
   'Liza klunzingeri': '713',
   'Acanthopagrus latus': '714',
   'Liza subviridis': '715',
   'Sparidentex hasta': '716',
   'Otolithes ruber': '717',
   'Crenidens crenidens': '718',
   'Ensis': '719',
   'Gastropoda': '720',
   'Euheterodonta': '721',
   'Scomber': '722',
   'Theragra chalcogramma': '723',
   'Engraulidae': '724',
   'Ostreidae': '725',
   'Phaeophyceae': '726',
   'Porphyra': '727',
   'Ulva reticulata': '728',
   'Perna viridis': '729',
   'Fenneropenaeus indicus': '730',
   'Merluccius': '731',
   'Soleidae': '732',
   'Mugilidae': '733',
   'Marine algae': '734',
   'Scarus rivulatus': '735',
   'Scarus coeruleus': '736',
   'Sardinella fimbriata': '737',
   'Dussumieria acuta': '738',
   'Lutjanus kasmira': '739',
   'Lutjanus rivulatus': '740',
   'Lutjanus bohar': '741',
   'Priacanthus blochii': '742',
   'Pelates quadrilineatus': '743',
   'Epinephelus fasciatus': '744',
   'Upeneus vittatus': '745',
   'Lethrinus laticaudis': '746',
   'Lethrinus lentjan': '747',
   'Lethrinus microdon': '748',
   'Sphyraena barracuda': '749',
   'Alectis indica': '750',
   'Epinephelus latifasciatus': '751',
   'Nemipterus japonicus': '752',
   'Raconda russeliana': '753',
   'Lactarius lactarius': '754',
   'Aetomylaeus bovinus': '755',
   'Pennahia anea': '756',
   'Leiognathus fasciatus': '757',
   'Sardinella longiceps': '758',
   'Tenualosa ilisha': '759',
   'Pellona ditchela': '760',
   'Stolephorus indicus': '761',
   'Setipinna breviceps': '762',
   'Rastrelliger kanagurta': '763',
   'Chanos chanos': '764',
   'Lepturacanthus savala': '765',
   'Epinephelus niveatus': '766',
   'Lutjanus johnii': '767',
   'Carangoides malabaricus': '768',
   'Ablennes hians': '769',
   'Chirocentrus dorab': '770',
   'Scomberomorus cavalla': '771',
   'Scomberomorus semifasciatus': '772',
   'Scomberomorus guttatus': '773',
   'Etrumeus teres': '774',
   'Spondyliosoma cantharus': '775',
   'Brama brama': '776',
   'Dasyatis zugei': '777',
   'Harpadon nehereus': '778',
   'Carcharhinus melanopterus': '779',
   'Penaeus plebejus': '780',
   'Sepia officinalis': '781',
   'Johnius dussumieri': '782',
   'Lutjanus campechanus': '783',
   'Ruditapes decussatus': '784',
   'Carcinus aestuarii': '785',
   'Squilla mantis': '786',
   'Epinephelus polyphekadion': '787',
   'Lutjanus gibbus': '788',
   'Lethrinus mahsena': '789',
   'Epinephelus chlorostigma': '790',
   'Carangoides bajad': '791',
   'Aethaloperca rogaa': '792',
   'Atule mate': '793',
   'Macolor niger': '794',
   'Carangoides fulvoguttatus': '795',
   'Plectropomus areolatus': '796',
   'Cephalopholis argus': '797',
   'Cephalopholis': '798',
   'Scarus sordidus': '799',
   'Scomberomorus tritor': '800',
   'Triaenodon obesus': '801',
   'Pomadasys commersonnii': '802',
   'Monotaxis grandoculis': '803',
   'Plectropomus maculatus': '804',
   'Trachinotus blochii': '805',
   'Pristipomoides filamentosus': '806',
   'Acanthurus gahhm': '807',
   'Acanthurus sohal': '808',
   'Siganus argenteus': '809',
   'Naso unicornis': '810',
   'Chanos': '811',
   'Oedalechilus labiosus': '812',
   'Plectorhinchus gaterinus': '813',
   'Mercenaria mercenaria': '814',
   'Mytilus': '815',
   'Turbo cornutus': '816',
   'Decapoda': '817',
   'Sphyraena': '818',
   'Arius maculatus': '819',
   'Penaeus merguiensis': '820',
   'Tegillarca granosa': '821',
   'Mullus barbatus barbatus': '822',
   'Chamelea gallina': '823',
   'Metanephrops thomsoni': '824',
   'Magallana gigas': '825',
   'Branchiostegus japonicus': '826',
   'Cephalopoda': '827',
   'Lutjanidae': '828',
   'Lethrinidae': '829',
   'Sphyraena argentea': '830',
   'Chirocentrus nudus': '831',
   'Trachinotus': '832',
   'Mugil auratus': '833',
   'Euthynnus alletteratus': '834',
   'Sparus aurata': '835',
   'Pagrus caeruleostictus': '836',
   'Scorpaena scrofa': '837',
   'Pagellus erythrinus': '838',
   'Epinephelus aeneus': '839',
   'Dentex maroccanus': '840',
   'Caranx rhonchus': '841',
   'Sardinella': '842',
   'Siganus': '843',
   'Solea': '844',
   'Diplodus sargus': '845',
   'Lithognathus mormyrus': '846',
   'Oblada melanura': '847',
   'Siganus rivulatus': '848',
   'Chelon labrosus': '849',
   'Cynoscion microlepidotus': '850',
   'Genypterus brasiliensis': '851',
   'Myoxocephalus polyacanthocephalus': '852',
   'Hexagrammos lagocephalus': '853',
   'Hexagrammos decagrammus': '854',
   'Sebastes ciliatus': '855',
   'Lepidopsetta polyxystra': '856',
   'Clupeiformes': '857',
   'Gadidae': '858',
   'Brachyura': '859',
   'Dasyatis': '860',
   'Carcharias': '861',
   'Saurida': '862',
   'Upeneus': '863',
   'Cynoglossus': '864',
   'Scomberomorus': '865',
   'Terapon': '866',
   'Leiognathus': '867',
   'Terapontidae': '868',
   'Caranx': '869',
   'Diplodus': '870',
   'Plectorhinchus flavomaculatus': '871',
   'Salmonidae': '872',
   'Mollusca': '873',
   'Boops boops': '874',
   'Sarpa salpa': '875',
   'Pagellus acarne': '876',
   'Spicara smaris': '877',
   'Diplodus vulgaris': '878',
   'Chelidonichthys lucerna': '879',
   'Sarda sarda': '880',
   'Serranus cabrilla': '881',
   'Diplodus annularis': '882',
   'Pagrus pagrus': '883',
   'Alosa fallax': '884',
   'Belone belone': '885',
   'Dentex dentex': '886',
   'Sphyraena viridensis': '887',
   'Trisopterus capelanus': '888',
   'Arnoglossus laterna': '889',
   'Procambarus clarkii': '890',
   'Nemadactylus macropterus': '891',
   'Pagrus auratus': '892',
   'Jasus edwardsii': '893',
   'Perna canaliculus': '894',
   'Pseudophycis bachus': '895',
   'Haliotis iris': '896',
   'Hoplostethus atlanticus': '897',
   'Rhombosolea leporina': '898',
   'Zygochlamys delicatula': '899',
   'Galeorhinus galeus': '900',
   'Parapercis colias': '901',
   'Tiostrea chilensis': '902',
   'Genypterus blacodes': '903',
   'Evechinus chloroticus': '904',
   'Austrovenus stutchburyi': '905',
   'Micromesistius australis': '906',
   'Macruronus novaezelandiae': '907',
   'Nototodarus': '908',
   'Perna perna': '909',
   'Sepia pharaonis': '910',
   'Turbo bruneus': '911',
   'Portunus sanguinolentus': '912',
   'Charybdis natator': '913',
   'Charybdis lucifera': '914',
   'Panulirus argus': '915',
   'Ethmalosa fimbriata': '916',
   'Sardinella brachysoma': '917',
   'Thryssa mystax': '918',
   'Plicofollis dussumieri': '919',
   'Nibea soldado': '920',
   'Epinephelus melanostigma': '921',
   'Megalops cyprinoides': '922',
   'Decapterus macarellus': '923',
   'Drepane punctata': '924',
   'Sillago sihama': '925',
   'Tylosurus crocodilus crocodilus': '926',
   'Saurida tumbil': '927',
   'Cynoglossus macrostomus': '928',
   'Parupeneus indicus': '929',
   'Synechogobius hasta': '930',
   'Busycotypus canaliculatus': '931',
   'Pampus cinereus': '932',
   'Pomadasys kaakan': '933',
   'Epinephelus coioides': '934',
   'Sepiella inermis': '935',
   'Uroteuthis duvauceli': '936',
   'Stomatella auricula': '937',
   'Cerithium scabridum': '938',
   'Marcia recens': '939',
   'Circe intermedia': '940',
   'Marcia opima': '941',
   'Fulvia fragile': '942',
   'Charybdis feriatus': '943',
   'Charybdis annulata': '944',
   'Atergatis integerrimus': '945',
   'Matuta lunaris': '946',
   'Calappa lophos': '947',
   'Uca annulipes': '948',
   'Chlamys varia': '949',
   'Cololabis adocetus': '950',
   'Seriola lalandi dorsalis': '951',
   'Brunneifusus ternatanus': '952',
   'Metapenaeus joyneri': '953',
   'Epinephelus tauvina': '954',
   'Coilia dussumieri': '955',
   'Carcharhinus dussumieri': '956',
   'Upeneus tragula': '957',
   'Sartoriana spinigera': '958',
   'Lamellidens marginalis': '959',
   'Polydactylus sextarius': '960',
   'Johnius macrorhynus': '961',
   'Hexanematichthys sagor': '962',
   'Sargassum swartzii': '963',
   'Argyrops spinifer': '964',
   'Synodus intermedius': '965',
   'Muraenesox cinereus': '966',
   'Carangoides armatus': '967',
   'Eleutheronema tetradactylum': '968',
   'Mustelus mosis': '969',
   'Nemipterus bipunctatus': '970',
   'Lutjanus quinquelineatus': '971',
   'Platycephalus indicus': '972',
   'Rhabdosargus haffara': '973',
   'Argyrops filamentosus': '974',
   'Brachirus orientalis': '975',
   'Mene maculata': '976',
   'Hemiramphus marginatus': '977',
   'Encrasicholina heteroloba': '978',
   'Trachinotus africanus': '979',
   'Bramidae': '980',
   'Escualosa thoracata': '981',
   'Sepia arabica': '982',
   'Scatophagus argus': '983',
   'Parastromateus niger': '984',
   'Planiliza subviridis': '985',
   'Labeo rohita': '986',
   'Oreochromis niloticus': '987',
   'Cardiidae': '988',
   'Sargassum angustifolium': '989',
   'Pomacea bridgesii': '990',
   'Sebastes fasciatus': '991',
   'Batoidea': '992',
   'Urophycis chuss': '993',
   'Dalatias licha': '994',
   'Trisopterus luscus': '995',
   'Scyliorhinus canicula': '996',
   'Ruvettus pretiosus': '997',
   'Aphanopus carbo': '998',
   'Alepocephalus bairdii': '999',
   ...},
  'body_part': {'Not applicable': '-1',
   'Not available': '0',
   'Whole animal': '1',
   'Whole animal eviscerated': '2',
   'Whole animal eviscerated without head': '3',
   'Flesh with bones': '4',
   'Blood': '5',
   'Skeleton': '6',
   'Bones': '7',
   'Exoskeleton': '8',
   'Endoskeleton': '9',
   'Shells': '10',
   'Molt': '11',
   'Skin': '12',
   'Head': '13',
   'Tooth': '14',
   'Otolith': '15',
   'Fins': '16',
   'Faecal pellet': '17',
   'Byssus': '18',
   'Soft parts': '19',
   'Viscera': '20',
   'Stomach': '21',
   'Hepatopancreas': '22',
   'Digestive gland': '23',
   'Pyloric caeca': '24',
   'Liver': '25',
   'Intestine': '26',
   'Kidney': '27',
   'Spleen': '28',
   'Brain': '29',
   'Eye': '30',
   'Fat': '31',
   'Heart': '32',
   'Branchial heart': '33',
   'Muscle': '34',
   'Mantle': '35',
   'Gills': '36',
   'Gonad': '37',
   'Ovary': '38',
   'Testes': '39',
   'Whole plant': '40',
   'Flower': '41',
   'Leaf': '42',
   'Old leaf': '43',
   'Young leaf': '44',
   'Leaf upper part': '45',
   'Leaf lower part': '46',
   'Scales': '47',
   'Root rhizome': '48',
   'Whole macro alga': '49',
   'Phytoplankton': '50',
   'Thallus': '51',
   'Flesh without bones': '52',
   'Stomach and intestine': '53',
   'Whole haptophytic plants': '54',
   'Loose drifting plants': '55',
   'Growing tips': '56',
   'Upper parts of plants': '57',
   'Lower parts of plants': '58',
   'Shells carapace': '59',
   'Flesh with scales': '60'}},
 'SEAWATER': {'nuclide': {'NOT APPLICABLE': '-1',
   'NOT AVAILABLE': '0',
   'h3': '1',
   'be7': '2',
   'c14': '3',
   'k40': '4',
   'cr51': '5',
   'mn54': '6',
   'co57': '7',
   'co58': '8',
   'co60': '9',
   'zn65': '10',
   'sr89': '11',
   'sr90': '12',
   'zr95': '13',
   'nb95': '14',
   'tc99': '15',
   'ru103': '16',
   'ru106': '17',
   'rh106': '18',
   'ag106m': '19',
   'ag108': '20',
   'ag108m': '21',
   'ag110m': '22',
   'sb124': '23',
   'sb125': '24',
   'te129m': '25',
   'i129': '28',
   'i131': '29',
   'cs127': '30',
   'cs134': '31',
   'cs137': '33',
   'ba140': '34',
   'la140': '35',
   'ce141': '36',
   'ce144': '37',
   'pm147': '38',
   'eu154': '39',
   'eu155': '40',
   'pb210': '41',
   'pb212': '42',
   'pb214': '43',
   'bi207': '44',
   'bi211': '45',
   'bi214': '46',
   'po210': '47',
   'rn220': '48',
   'rn222': '49',
   'ra223': '50',
   'ra224': '51',
   'ra225': '52',
   'ra226': '53',
   'ra228': '54',
   'ac228': '55',
   'th227': '56',
   'th228': '57',
   'th232': '59',
   'th234': '60',
   'pa234': '61',
   'u234': '62',
   'u235': '63',
   'u238': '64',
   'np237': '65',
   'np239': '66',
   'pu238': '67',
   'pu239': '68',
   'pu240': '69',
   'pu241': '70',
   'am240': '71',
   'am241': '72',
   'cm242': '73',
   'cm243': '74',
   'cm244': '75',
   'cs134_137_tot': '76',
   'pu239_240_tot': '77',
   'pu239_240_iii_iv_tot': '78',
   'pu239_240_v_vi_tot': '79',
   'cm243_244_tot': '80',
   'pu238_pu239_240_tot_ratio': '81',
   'am241_pu239_240_tot_ratio': '82',
   'cs137_134_ratio': '83',
   'cd109': '84',
   'eu152': '85',
   'fe59': '86',
   'gd153': '87',
   'ir192': '88',
   'pu238_240_tot': '89',
   'rb86': '90',
   'sc46': '91',
   'sn113': '92',
   'sn117m': '93',
   'tl208': '94',
   'mo99': '95',
   'tc99m': '96',
   'ru105': '97',
   'te129': '98',
   'te132': '99',
   'i132': '100',
   'i135': '101',
   'cs136': '102',
   'tbeta': '103',
   'talpha': '104',
   'i133': '105',
   'th230': '106',
   'pa231': '107',
   'u236': '108',
   'ag111': '109',
   'in116m': '110',
   'te123m': '111',
   'sb127': '112',
   'ba133': '113',
   'ce139': '114',
   'tl201': '116',
   'hg203': '117',
   'na22': '122',
   'pa234m': '123',
   'am243': '124',
   'se75': '126',
   'sr85': '127',
   'y88': '128',
   'ce140': '129',
   'bi212': '130',
   'u236_238_ratio': '131',
   'i125': '132',
   'ba137m': '133',
   'u232': '134',
   'pa233': '135',
   'ru106_rh106_tot': '136',
   'tu': '137',
   'tbeta40k': '138',
   'fe55': '139',
   'ce144_pr144_tot': '140',
   'pu240_pu239_ratio': '141',
   'u233': '142',
   'pu239_242_tot': '143',
   'ac227': '144'},
  'unit': {'Not applicable': '-1',
   'NOT AVAILABLE': '0',
   'Bq per m3': '1',
   'Bq per m2': '2',
   'Bq per kg': '3',
   'Bq per kgd': '4',
   'Bq per kgw': '5',
   'kg per kg': '6',
   'TU': '7',
   'DELTA per mill': '8',
   'atom per kg': '9',
   'atom per kgd': '10',
   'atom per kgw': '11',
   'atom per l': '12',
   'Bq per kgC': '13'},
  'dl': {'Not applicable': '-1',
   'Not available': '0',
   'Detected value': '1',
   'Detection limit': '2',
   'Not detected': '3',
   'Derived': '4'},
  'filt': {'Not applicable': '-1',
   'Not available': '0',
   'Yes': '1',
   'No': '2'}},
 'SEDIMENT': {'nuclide': {'NOT APPLICABLE': '-1',
   'NOT AVAILABLE': '0',
   'h3': '1',
   'be7': '2',
   'c14': '3',
   'k40': '4',
   'cr51': '5',
   'mn54': '6',
   'co57': '7',
   'co58': '8',
   'co60': '9',
   'zn65': '10',
   'sr89': '11',
   'sr90': '12',
   'zr95': '13',
   'nb95': '14',
   'tc99': '15',
   'ru103': '16',
   'ru106': '17',
   'rh106': '18',
   'ag106m': '19',
   'ag108': '20',
   'ag108m': '21',
   'ag110m': '22',
   'sb124': '23',
   'sb125': '24',
   'te129m': '25',
   'i129': '28',
   'i131': '29',
   'cs127': '30',
   'cs134': '31',
   'cs137': '33',
   'ba140': '34',
   'la140': '35',
   'ce141': '36',
   'ce144': '37',
   'pm147': '38',
   'eu154': '39',
   'eu155': '40',
   'pb210': '41',
   'pb212': '42',
   'pb214': '43',
   'bi207': '44',
   'bi211': '45',
   'bi214': '46',
   'po210': '47',
   'rn220': '48',
   'rn222': '49',
   'ra223': '50',
   'ra224': '51',
   'ra225': '52',
   'ra226': '53',
   'ra228': '54',
   'ac228': '55',
   'th227': '56',
   'th228': '57',
   'th232': '59',
   'th234': '60',
   'pa234': '61',
   'u234': '62',
   'u235': '63',
   'u238': '64',
   'np237': '65',
   'np239': '66',
   'pu238': '67',
   'pu239': '68',
   'pu240': '69',
   'pu241': '70',
   'am240': '71',
   'am241': '72',
   'cm242': '73',
   'cm243': '74',
   'cm244': '75',
   'cs134_137_tot': '76',
   'pu239_240_tot': '77',
   'pu239_240_iii_iv_tot': '78',
   'pu239_240_v_vi_tot': '79',
   'cm243_244_tot': '80',
   'pu238_pu239_240_tot_ratio': '81',
   'am241_pu239_240_tot_ratio': '82',
   'cs137_134_ratio': '83',
   'cd109': '84',
   'eu152': '85',
   'fe59': '86',
   'gd153': '87',
   'ir192': '88',
   'pu238_240_tot': '89',
   'rb86': '90',
   'sc46': '91',
   'sn113': '92',
   'sn117m': '93',
   'tl208': '94',
   'mo99': '95',
   'tc99m': '96',
   'ru105': '97',
   'te129': '98',
   'te132': '99',
   'i132': '100',
   'i135': '101',
   'cs136': '102',
   'tbeta': '103',
   'talpha': '104',
   'i133': '105',
   'th230': '106',
   'pa231': '107',
   'u236': '108',
   'ag111': '109',
   'in116m': '110',
   'te123m': '111',
   'sb127': '112',
   'ba133': '113',
   'ce139': '114',
   'tl201': '116',
   'hg203': '117',
   'na22': '122',
   'pa234m': '123',
   'am243': '124',
   'se75': '126',
   'sr85': '127',
   'y88': '128',
   'ce140': '129',
   'bi212': '130',
   'u236_238_ratio': '131',
   'i125': '132',
   'ba137m': '133',
   'u232': '134',
   'pa233': '135',
   'ru106_rh106_tot': '136',
   'tu': '137',
   'tbeta40k': '138',
   'fe55': '139',
   'ce144_pr144_tot': '140',
   'pu240_pu239_ratio': '141',
   'u233': '142',
   'pu239_242_tot': '143',
   'ac227': '144'},
  'unit': {'Not applicable': '-1',
   'NOT AVAILABLE': '0',
   'Bq per m3': '1',
   'Bq per m2': '2',
   'Bq per kg': '3',
   'Bq per kgd': '4',
   'Bq per kgw': '5',
   'kg per kg': '6',
   'TU': '7',
   'DELTA per mill': '8',
   'atom per kg': '9',
   'atom per kgd': '10',
   'atom per kgw': '11',
   'atom per l': '12',
   'Bq per kgC': '13'},
  'dl': {'Not applicable': '-1',
   'Not available': '0',
   'Detected value': '1',
   'Detection limit': '2',
   'Not detected': '3',
   'Derived': '4'},
  'sed_type': {'Not applicable': '-1',
   'Not available': '0',
   'Clay': '1',
   'Gravel': '2',
   'Marsh': '3',
   'Mud': '4',
   'Muddy sand': '5',
   'Sand': '6',
   'Fine sand': '7',
   'Sandy mud': '8',
   'Pebby sand': '9',
   'Silt and clay': '10',
   'Silt and gravel': '11',
   'Silt': '12',
   'Silty sand': '13',
   'Sludge': '14',
   'Turf': '15',
   'Very coarse sand': '16',
   'Coarse sand': '17',
   'Medium sand': '18',
   'Very fine sand': '19',
   'Coarse silt': '20',
   'Medium silt': '21',
   'Fine silt': '22',
   'Very fine silt': '23',
   'Calcareous': '24',
   'Glacial': '25',
   'Soft': '26',
   'Sulphidic': '27',
   'Fe Mg concretions': '28',
   'Sand and gravel': '29',
   'Pure sand': '30',
   'Sand and fine sand': '31',
   'Sand and clay': '32',
   'Sand and mud': '33',
   'Fine sand and gravel': '34',
   'Fine sand and sand': '35',
   'Pure fine sand': '36',
   'Fine sand and silt': '37',
   'Fine sand and clay': '38',
   'Fine sand and mud': '39',
   'Silt and sand': '40',
   'Silt and fine sand': '41',
   'Pure silt': '42',
   'Silt and mud': '43',
   'Clay and gravel': '44',
   'Clay and sand': '45',
   'Clay and fine sand': '46',
   'Pure clay': '47',
   'Clay and silt': '48',
   'Clay and mud': '49',
   'Glacial clay': '50',
   'Soft clay': '51',
   'Sulphidic clay': '52',
   'Clay and Fe Mg concretions': '53',
   'Mud and gravel': '54',
   'Mud and sand': '55',
   'Mud and fine sand': '56',
   'Mud and clay': '57',
   'Pure mud': '58',
   'Soft mud': '59',
   'Sulphidic mud': '60',
   'Mud and Fe Mg concretions': '61',
   'Sand and silt': '62'}}}
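
The enumerations above map labels to ids stored as strings, whereas the dataframes hold the same ids as integers. A minimal sketch of decoding ids back to labels follows. It assumes only a small excerpt of the dictionary shown above; `enums_excerpt` and `invert_enum` are hypothetical names introduced for illustration and are not part of the library.

```python
# Small excerpt of the enum dictionary shown above (labels -> ids as strings).
# `enums_excerpt` is a hypothetical name used only for this sketch.
enums_excerpt = {
    'SEAWATER': {
        'unit': {'Bq per m3': '1', 'Bq per kgd': '4', 'Bq per kgw': '5'},
        'dl': {'Detected value': '1', 'Detection limit': '2'},
    }
}


def invert_enum(enum: dict[str, str]) -> dict[int, str]:
    """Return an id -> label mapping with integer keys."""
    return {int(code): label for label, code in enum.items()}


unit_labels = invert_enum(enums_excerpt['SEAWATER']['unit'])
dl_labels = invert_enum(enums_excerpt['SEAWATER']['dl'])

# Decode integer codes such as those found in the UNIT and DL columns.
print(unit_labels[1])  # Bq per m3
print(dl_labels[2])    # Detection limit
```

Casting the ids to `int` is a design choice of this sketch: it lets the inverted mapping be used directly on the integer-coded columns.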

Show the global attributes extracted from the NetCDF file.

contents.global_attrs
{'id': '26VMZZ2Q',
 'title': 'Environmental database - Helsinki Commission Monitoring of Radioactive Substances',
 'summary': 'MORS Environment database has been used to collate data resulting from monitoring of environmental radioactivity in the Baltic Sea based on HELCOM Recommendation 26/3.\n\nThe database is structured according to HELCOM Guidelines on Monitoring of Radioactive Substances (https://www.helcom.fi/wp-content/uploads/2019/08/Guidelines-for-Monitoring-of-Radioactive-Substances.pdf), which specifies reporting format, database structure, data types and obligatory parameters used for reporting data under Recommendation 26/3.\n\nThe database is updated and quality assured annually by HELCOM MORS EG.',
 'keywords': 'oceanography, Earth Science > Oceans > Ocean Chemistry> Radionuclides, Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure, Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments, Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes, Earth Science > Oceans > Water Quality > Ocean Contaminants, Earth Science > Biological Classification > Animals/Vertebrates > Fish, Earth Science > Biosphere > Ecosystems > Marine Ecosystems, Earth Science > Biological Classification > Animals/Invertebrates > Mollusks, Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans, Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)',
 'history': 'TBD',
 'keywords_vocabulary': 'GCMD Science Keywords',
 'keywords_vocabulary_url': 'https://gcmd.earthdata.nasa.gov/static/kms/',
 'record': 'TBD',
 'featureType': 'TBD',
 'cdm_data_type': 'TBD',
 'Conventions': 'CF-1.10 ACDD-1.3',
 'publisher_name': 'Paul MCGINNITY, Iolanda OSVATH, Florence DESCROIX-COMANDUCCI',
 'publisher_email': 'p.mc-ginnity@iaea.org, i.osvath@iaea.org, F.Descroix-Comanducci@iaea.org',
 'publisher_url': 'https://maris.iaea.org',
 'publisher_institution': 'International Atomic Energy Agency - IAEA',
 'creator_name': '[{"creatorType": "author", "name": "HELCOM MORS"}]',
 'institution': 'TBD',
 'metadata_link': 'TBD',
 'creator_email': 'TBD',
 'creator_url': 'TBD',
 'references': 'TBD',
 'license': 'Without prejudice to the applicable Terms and Conditions (https://nucleus.iaea.org/Pages/Others/Disclaimer.aspx), I hereby agree that any use of the data will contain appropriate acknowledgement of the data source(s) and the IAEA Marine Radioactivity Information System (MARIS).',
 'comment': 'TBD',
 'geospatial_lat_min': '31.17',
 'geospatial_lon_min': '9.6333',
 'geospatial_lat_max': '65.75',
 'geospatial_lon_max': '53.5',
 'geospatial_vertical_min': '0.0',
 'geospatial_vertical_max': '437.0',
 'geospatial_bounds': 'POLYGON ((9.6333 53.5, 31.17 53.5, 31.17 65.75, 9.6333 65.75, 9.6333 53.5))',
 'geospatial_bounds_crs': 'EPSG:4326',
 'time_coverage_start': '1984-01-10T00:00:00',
 'time_coverage_end': '2023-11-30T00:00:00',
 'local_time_zone': 'TBD',
 'date_created': 'TBD',
 'date_modified': 'TBD',
 'publisher_postprocess_logs': "Convert 'nuclide' column values to lowercase, strip spaces, and store in 'NUCLIDE' column., Remap data provider nuclide names to standardized MARIS nuclide names., Standardize time format across all dataframes., Encode time as seconds since epoch., Separate sediment entries into distinct rows for Bq/kg and Bq/m² measurements., Sanitize measurement values by removing blanks and standardizing to use the `VALUE` column., Convert from relative error to standard uncertainty., Set the `unit` id column in the DataFrames based on a lookup table., Remap value type to MARIS format., Remap values from 'rubin' to 'SPECIES' for groups: BIOTA., Remap values from 'tissue' to 'BODY_PART' for groups: BIOTA., Remap values from 'SPECIES' to 'BIO_GROUP' for groups: BIOTA., Lookup sediment id using lookup table., Lookup filt value in dataframe using the lookup table., Ensure depth values are floats and add 'SMP_DEPTH' and 'TOT_DEPTH' columns., Remap Sediment slice top and bottom to MARIS format., Lookup dry-wet ratio and format for MARIS., Get geographical coordinates from columns expressed in degrees decimal format or from columns in degrees/minutes decimal format where degrees decimal format is missing or zero., Drop rows with invalid longitude & latitude values. Convert `,` separator to `.` separator."}
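
All global attributes are returned as strings, including numbers, timestamps and the JSON-encoded `creator_name`. The sketch below shows one way to convert a few of them to native Python types. It works on a hand-copied subset of the attributes above rather than on `contents.global_attrs` itself, and the variable names are illustrative only.

```python
import json
from datetime import datetime

# A few global attributes copied from the output above; all values are strings.
global_attrs = {
    'creator_name': '[{"creatorType": "author", "name": "HELCOM MORS"}]',
    'geospatial_vertical_max': '437.0',
    'time_coverage_start': '1984-01-10T00:00:00',
    'time_coverage_end': '2023-11-30T00:00:00',
}

creators = json.loads(global_attrs['creator_name'])          # list of dicts
max_depth = float(global_attrs['geospatial_vertical_max'])   # numeric depth
start = datetime.fromisoformat(global_attrs['time_coverage_start'])
end = datetime.fromisoformat(global_attrs['time_coverage_end'])

print(creators[0]['name'])  # HELCOM MORS
print(max_depth)            # 437.0
print(start.year, end.year) # 1984 2023
```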

Validate NetCDF Enumerations

Verify that enumerated values in the NetCDF file match current MARIS lookup tables.

Tip

FEEDBACK TO DATA PROVIDER: The enumeration validation process is a diagnostic step that identifies inconsistencies between NetCDF enumerations and MARIS lookup tables. While this validation does not modify the dataset, it generates detailed feedback about any mismatches or undefined values.
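
The idea behind such a diagnostic check can be sketched as a comparison of two label-to-id dictionaries. This is not the `ValidateEnumsCB` implementation: `find_enum_mismatches` is a hypothetical helper, `lut_dl` is an invented lookup table with deliberate differences, and comparing ids as strings is an assumption based on the enum dictionaries shown earlier.

```python
def find_enum_mismatches(nc_enum: dict[str, str],
                         lut_enum: dict[str, str]) -> dict[str, tuple]:
    """Return labels whose id is missing from, or different in, the lookup table."""
    mismatches = {}
    for label, nc_id in nc_enum.items():
        lut_id = lut_enum.get(label)  # None when the label is not defined
        if lut_id != nc_id:
            mismatches[label] = (nc_id, lut_id)
    return mismatches


# Enumeration as stored in the NetCDF file (the 'dl' values shown earlier).
nc_dl = {'Not applicable': '-1', 'Not available': '0', 'Detected value': '1',
         'Detection limit': '2', 'Not detected': '3', 'Derived': '4'}

# Hypothetical lookup table in which one id changed and one label is missing.
lut_dl = {'Not applicable': '-1', 'Not available': '0', 'Detected value': '1',
          'Detection limit': '2', 'Not detected': '5'}

# The inputs are only read, never modified: the result is a report.
report = find_enum_mismatches(nc_dl, lut_dl)
for label, (nc_id, lut_id) in report.items():
    print(f'{label!r}: NetCDF id {nc_id}, lookup id {lut_id}')
```

Returning a report instead of raising an error mirrors the diagnostic nature of the step: the dataset passes through unchanged and the mismatches can be forwarded to the data provider.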


ValidateEnumsCB

 ValidateEnumsCB (contents, maris_enums, verbose=False)

Validate enumeration mappings between NetCDF file and MARIS lookup tables.

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        ValidateEnumsCB(
            contents = contents,
            maris_enums=Enums(lut_src_dir=lut_path())
        ),
    ]
)
tfm()
{'BIOTA':              LON        LAT  SMP_DEPTH        TIME  NUCLIDE       VALUE  UNIT  \
 0      12.316667  54.283333        NaN  1348358400       31    0.010140     5   
 1      12.316667  54.283333        NaN  1348358400        4  135.300003     5   
 2      12.316667  54.283333        NaN  1348358400        9    0.013980     5   
 3      12.316667  54.283333        NaN  1348358400       33    4.338000     5   
 4      12.316667  54.283333        NaN  1348358400       31    0.009614     5   
 ...          ...        ...        ...         ...      ...         ...   ...   
 16089  21.395000  61.241501        2.0  1652140800       33   13.700000     4   
 16090  21.395000  61.241501        2.0  1652140800        9    0.500000     4   
 16091  21.385000  61.343334        NaN  1663200000        4   50.700001     4   
 16092  21.385000  61.343334        NaN  1663200000       33    0.880000     4   
 16093  21.385000  61.343334        NaN  1663200000       12    6.600000     4   
 
             UNC  DL  BIO_GROUP  SPECIES  BODY_PART       DRYWT  WETWT  \
 0           NaN   2          4       99         52  174.934433  948.0   
 1      4.830210   1          4       99         52  174.934433  948.0   
 2           NaN   2          4       99         52  174.934433  948.0   
 3      0.150962   1          4       99         52  174.934433  948.0   
 4           NaN   2          4       99         52  177.935120  964.0   
 ...         ...  ..        ...      ...        ...         ...    ...   
 16089  0.520600   1         11       96         55         NaN    NaN   
 16090  0.045500   1         11       96         55         NaN    NaN   
 16091  4.106700   1         14      129          1         NaN    NaN   
 16092  0.140800   1         14      129          1         NaN    NaN   
 16093  0.349800   1         14      129          1         NaN    NaN   
 
        PERCENTWT  
 0        0.18453  
 1        0.18453  
 2        0.18453  
 3        0.18453  
 4        0.18458  
 ...          ...  
 16089        NaN  
 16090        NaN  
 16091        NaN  
 16092        NaN  
 16093        NaN  
 
 [16094 rows x 15 columns],
 'SEAWATER':              LON        LAT  SMP_DEPTH  TOT_DEPTH        TIME  NUCLIDE  \
 0      29.333300  60.083302        0.0        NaN  1337731200       33   
 1      29.333300  60.083302       29.0        NaN  1337731200       33   
 2      23.150000  59.433300        0.0        NaN  1339891200       33   
 3      27.983299  60.250000        0.0        NaN  1337817600       33   
 4      27.983299  60.250000       39.0        NaN  1337817600       33   
 ...          ...        ...        ...        ...         ...      ...   
 21468  13.499833  54.600334        0.0       47.0  1686441600        1   
 21469  13.499833  54.600334       45.0       47.0  1686441600        1   
 21470  14.200833  54.600334        0.0       11.0  1686614400        1   
 21471  14.665500  54.600334        0.0       20.0  1686614400        1   
 21472  14.330000  54.600334        0.0       17.0  1686614400        1   
 
             VALUE  UNIT        UNC  DL  FILT  
 0        5.300000     1   1.696000   1     0  
 1       19.900000     1   3.980000   1     0  
 2       25.500000     1   5.100000   1     0  
 3       17.000000     1   4.930000   1     0  
 4       22.200001     1   3.996000   1     0  
 ...           ...   ...        ...  ..   ...  
 21468  702.838074     1  51.276207   1     0  
 21469  725.855713     1  52.686260   1     0  
 21470  648.992920     1  48.154419   1     0  
 21471  627.178406     1  46.245316   1     0  
 21472  605.715088     1  45.691143   1     0  
 
 [21473 rows x 11 columns],
 'SEDIMENT':              LON        LAT  TOT_DEPTH        TIME  NUCLIDE        VALUE  \
 0      27.799999  60.466667       25.0  1337904000       33  1200.000000   
 1      27.799999  60.466667       25.0  1337904000       33   250.000000   
 2      27.799999  60.466667       25.0  1337904000       33   140.000000   
 3      27.799999  60.466667       25.0  1337904000       33    79.000000   
 4      27.799999  60.466667       25.0  1337904000       33    29.000000   
 ...          ...        ...        ...         ...      ...          ...   
 70444  15.537800  54.617832       62.0  1654646400       67     0.044000   
 70445  15.537800  54.617832       62.0  1654646400       77     2.500000   
 70446  15.537800  54.617832       62.0  1654646400        4  5873.000000   
 70447  15.537800  54.617832       62.0  1654646400       33    21.200001   
 70448  15.537800  54.617832       62.0  1654646400       77     0.370000   
 
        UNIT         UNC  DL  SED_TYPE   TOP  BOTTOM  PERCENTWT  
 0         4  240.000000   1         0  15.0    20.0        NaN  
 1         4   50.000000   1         0  20.0    25.0        NaN  
 2         4   29.400000   1         0  25.0    30.0        NaN  
 3         4   15.800000   1         0  30.0    35.0        NaN  
 4         4    6.960000   1         0  35.0    40.0        NaN  
 ...     ...         ...  ..       ...   ...     ...        ...  
 70444     4    0.015312   1        10  15.0    17.0   0.257642  
 70445     4    0.185000   1        10  15.0    17.0   0.257642  
 70446     4  164.444000   1        10  17.0    19.0   0.263965  
 70447     4    2.162400   1        10  17.0    19.0   0.263965  
 70448     4    0.048100   1        10  17.0    19.0   0.263965  
 
 [70449 rows x 13 columns]}

Remove Non Compatible Columns

The [RemoveNonCompatibleVariablesCB](https://franckalbinet.github.io/marisco/handlers/data_format_transformation.html#removenoncompatiblevariablescb) callback filters out variables from the NetCDF format that are not listed in the VARS configuration.


RemoveNonCompatibleVariablesCB

 RemoveNonCompatibleVariablesCB (vars:Dict[str,str]={'LON': 'longitude',
                                 'LAT': 'latitude', 'SMP_DEPTH':
                                 'sampdepth', 'TOT_DEPTH': 'totdepth',
                                 'TIME': 'begperiod', 'AREA': 'area',
                                 'NUCLIDE': 'nuclide_id', 'VALUE':
                                 'activity', 'UNIT': 'unit_id', 'UNC':
                                 'uncertaint', 'DL': 'detection', 'FILT':
                                 'filtered', 'COUNT_MET': 'counmet_id',
                                 'SAMP_MET': 'sampmet_id', 'PREP_MET':
                                 'prepmet_id', 'VOL': 'volume', 'SAL':
                                 'salinity', 'TEMP': 'temperatur',
                                 'SPECIES': 'species_id', 'BODY_PART':
                                 'bodypar_id', 'SED_TYPE': 'sedtype_id',
                                 'TOP': 'sliceup', 'BOTTOM': 'slicedown',
                                 'DRYWT': 'drywt', 'WETWT': 'wetwt',
                                 'PERCENTWT': 'percentwt', 'LAB':
                                 'lab_id', 'PROFILE_ID': 'profile_id',
                                 'SAMPLE_TYPE': 'samptype_id',
                                 'TAXONNAME': 'taxonname', 'TAXONREPNAME':
                                 'taxonrepname', 'TAXONRANK': 'taxonrank',
                                 'TAXONDB': 'taxondb', 'TAXONDBID':
                                 'taxondb_id', 'TAXONDBURL':
                                 'taxondb_url', 'REF_ID': 'ref_id',
                                 'SMP_ID': 'samplelabcode'},
                                 verbose:bool=False)

Remove variables not listed in VARS configuration.

| Parameter | Type | Default | Details |
|---|---|---|---|
| vars | Dict | {'LON': 'longitude', 'LAT': 'latitude', 'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME': 'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE': 'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL': 'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id', 'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL': 'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES': 'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE': 'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT': 'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB': 'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE': 'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME': 'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb', 'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID': 'ref_id', 'SMP_ID': 'samplelabcode'} | Dictionary mapping NetCDF variable names to OpenRefine variable names |
| verbose | bool | False | |

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS, verbose=True),
    ]
)
tfm()
print('\n')
Removing variables that are not compatible with vars provided. 
Removing BIO_GROUP from BIOTA dataset.
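
As a rough illustration of what this filtering amounts to, the sketch below drops every column whose name is not a key of a mapping. It is not the marisco implementation: the helper name drop_unlisted_columns, the shortened toy_vars mapping, and the toy data are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the VARS mapping (NetCDF name -> CSV name).
toy_vars = {'LON': 'longitude', 'LAT': 'latitude', 'VALUE': 'activity'}

def drop_unlisted_columns(dfs, vars_map, verbose=False):
    """Return each group's DataFrame restricted to the columns listed in vars_map."""
    out = {}
    for group, df in dfs.items():
        unlisted = [col for col in df.columns if col not in vars_map]
        if verbose:
            for col in unlisted:
                print(f'Removing {col} from {group} dataset.')
        # drop() returns a new DataFrame, so the input is left untouched.
        out[group] = df.drop(columns=unlisted)
    return out

# Toy data: BIO_GROUP is not a key of toy_vars, so it is dropped.
toy_dfs = {
    'BIOTA': pd.DataFrame({'LON': [12.3], 'LAT': [54.2], 'VALUE': [0.01], 'BIO_GROUP': [4]})
}
filtered = drop_unlisted_columns(toy_dfs, toy_vars, verbose=True)
```

Returning new DataFrames rather than mutating in place keeps the original extracted contents available for comparison, which is a design choice of this sketch rather than a documented property of the callback.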

Add Taxon Information



get_taxon_info_lut

 get_taxon_info_lut (maris_lut:str, key_names:dict={'Taxonname':
                     'TAXONNAME', 'Taxonrank': 'TAXONRANK', 'TaxonDB':
                     'TAXONDB', 'TaxonDBID': 'TAXONDBID', 'TaxonDBURL':
                     'TAXONDBURL'})

Create lookup dictionary for taxon information from MARIS species lookup table.
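
To make the idea of such a lookup concrete, here is a minimal sketch that turns a species table into a dictionary keyed by species id. The exact structure returned by get_taxon_info_lut is not shown on this page, so the nested layout, the helper name build_taxon_lut, and the id column name species_id are assumptions; the single row reuses values that appear in the BIOTA output further down.

```python
import pandas as pd

# Column renaming taken from the key_names default in the signature above.
key_names = {'Taxonname': 'TAXONNAME', 'Taxonrank': 'TAXONRANK', 'TaxonDB': 'TAXONDB',
             'TaxonDBID': 'TAXONDBID', 'TaxonDBURL': 'TAXONDBURL'}

def build_taxon_lut(species_table, key_names, id_col='species_id'):
    """Map each species id to a dict of taxon fields renamed via key_names."""
    renamed = species_table.set_index(id_col)[list(key_names)].rename(columns=key_names)
    return renamed.to_dict(orient='index')

# One-row toy table; the id column name 'species_id' is an assumption.
toy_species = pd.DataFrame({
    'species_id': [99],
    'Taxonname': ['Gadus morhua'],
    'Taxonrank': ['species'],
    'TaxonDB': ['Wikidata'],
    'TaxonDBID': ['Q199788'],
    'TaxonDBURL': ['https://www.wikidata.org/wiki/Q199788'],
})
toy_lut = build_taxon_lut(toy_species, key_names)
```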



AddTaxonInformationCB

 AddTaxonInformationCB (fn_lut:Callable=<function <lambda>>,
                        verbose:bool=False)

Add taxon information to BIOTA group based on species lookup table.

| Parameter | Type | Default | Details |
|---|---|---|---|
| fn_lut | Callable | | Function that returns taxon lookup dictionary |
| verbose | bool | False | |

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        AddTaxonInformationCB(
            fn_lut=lut_taxon
        ),
    ]
)

tfm()
print(tfm.dfs['BIOTA'][['TAXONNAME','TAXONRANK','TAXONDB','TAXONDBID','TAXONDBURL']])
               TAXONNAME TAXONRANK   TAXONDB TAXONDBID  \
0           Gadus morhua   species  Wikidata   Q199788   
1           Gadus morhua   species  Wikidata   Q199788   
2           Gadus morhua   species  Wikidata   Q199788   
3           Gadus morhua   species  Wikidata   Q199788   
4           Gadus morhua   species  Wikidata   Q199788   
...                  ...       ...       ...       ...   
16089  Fucus vesiculosus   species  Wikidata   Q754755   
16090  Fucus vesiculosus   species  Wikidata   Q754755   
16091     Mytilus edulis   species  Wikidata    Q27855   
16092     Mytilus edulis   species  Wikidata    Q27855   
16093     Mytilus edulis   species  Wikidata    Q27855   

                                  TAXONDBURL  
0      https://www.wikidata.org/wiki/Q199788  
1      https://www.wikidata.org/wiki/Q199788  
2      https://www.wikidata.org/wiki/Q199788  
3      https://www.wikidata.org/wiki/Q199788  
4      https://www.wikidata.org/wiki/Q199788  
...                                      ...  
16089  https://www.wikidata.org/wiki/Q754755  
16090  https://www.wikidata.org/wiki/Q754755  
16091   https://www.wikidata.org/wiki/Q27855  
16092   https://www.wikidata.org/wiki/Q27855  
16093   https://www.wikidata.org/wiki/Q27855  

[16094 rows x 5 columns]
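
The enrichment step can be pictured as a join between the SPECIES column and a lookup keyed by species id. The following sketch assumes such an id-keyed lookup; the helper add_taxon_columns and the toy lookup are hypothetical and only mirror the kind of output printed above.

```python
import pandas as pd

# Toy lookup keyed by species id (structure assumed for this sketch).
toy_lut = {99: {'TAXONNAME': 'Gadus morhua', 'TAXONRANK': 'species', 'TAXONDBID': 'Q199788'}}

def add_taxon_columns(df, lut, id_col='SPECIES'):
    """Append one column per taxon field; ids missing from lut yield missing values."""
    taxon = pd.DataFrame([lut.get(sid, {}) for sid in df[id_col]], index=df.index)
    return pd.concat([df, taxon], axis=1)

toy_biota = pd.DataFrame({'SPECIES': [99, 99], 'NUCLIDE': [31, 4]})
enriched = add_taxon_columns(toy_biota, toy_lut)
```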

Standardize Time

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        DecodeTimeCB(),
    ]
)

tfm()

print(tfm.dfs['BIOTA']['TIME'])
0       2012-09-23
1       2012-09-23
2       2012-09-23
3       2012-09-23
4       2012-09-23
           ...    
16089   2022-05-10
16090   2022-05-10
16091   2022-09-15
16092   2022-09-15
16093   2022-09-15
Name: TIME, Length: 16094, dtype: datetime64[ns]
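
Comparing the encoded TIME values in the extracted data with the dates printed above suggests that the NetCDF file stores time as seconds since 1970-01-01. Under that assumption, the decoding can be reproduced with plain pandas; this is a sketch of the conversion, not necessarily how DecodeTimeCB is written.

```python
import pandas as pd

# Encoded TIME values as they appear in the BIOTA group.
# Assumption: the unit is seconds since 1970-01-01 (Unix epoch).
encoded = pd.Series([1348358400, 1652140800, 1663200000], name='TIME')

# Convert the integer offsets to datetime64 values.
decoded = pd.to_datetime(encoded, unit='s')
```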

Add Sample Type ID

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        AddSampleTypeIdColumnCB(),
    ]
)

tfm()
print(tfm.dfs['SEAWATER']['SAMPLE_TYPE'].unique())
print(tfm.dfs['BIOTA']['SAMPLE_TYPE'].unique())
print(tfm.dfs['SEDIMENT']['SAMPLE_TYPE'].unique())
[1]
[2]
[3]
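
Judging from the output above, each group receives a single constant identifier (1 for SEAWATER, 2 for BIOTA, 3 for SEDIMENT). A minimal sketch of that behaviour follows; the helper add_sample_type and the toy data are assumptions, and the mapping is simply read off the printed values.

```python
import pandas as pd

# Group-to-id mapping read off the output above.
sample_type_ids = {'SEAWATER': 1, 'BIOTA': 2, 'SEDIMENT': 3}

def add_sample_type(dfs, ids):
    """Return copies of the group DataFrames with a constant SAMPLE_TYPE column."""
    return {group: df.assign(SAMPLE_TYPE=ids[group]) for group, df in dfs.items()}

toy_dfs = {
    'SEAWATER': pd.DataFrame({'VALUE': [5.3, 19.9]}),
    'BIOTA': pd.DataFrame({'VALUE': [0.01014]}),
}
typed = add_sample_type(toy_dfs, sample_type_ids)
```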

Add Reference ID

Include the ref_id (i.e., Zotero Archive Location). The AddZoteroArchiveLocationCB callback looks up the Zotero Archive Location using the Zotero key stored as id in the global attributes of the MARIS NetCDF file.

contents.global_attrs['id']
'26VMZZ2Q'


AddZoteroArchiveLocationCB

 AddZoteroArchiveLocationCB (attrs:str, cfg:dict)

Fetch and append ‘Loc. in Archive’ from Zotero to DataFrame.

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        AddZoteroArchiveLocationCB(contents.global_attrs, cfg=cfg()),
    ]
)
tfm()
print(tfm.dfs['SEAWATER']['REF_ID'].unique())
[100]
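
The lookup itself needs network access to Zotero, so the sketch below replaces it with a local dictionary to show only the data flow: read the key from the global attributes, resolve it to an archive location, and write that value into every group. The helper add_ref_id, the fake_zotero dictionary, and the integer cast are assumptions of this example, not the marisco code.

```python
import pandas as pd

def add_ref_id(dfs, global_attrs, fetch_archive_location):
    """Resolve global_attrs['id'] to an archive location and store it as REF_ID."""
    # Assumption: the archive location comes back as text and is cast to int.
    ref_id = int(fetch_archive_location(global_attrs['id']))
    return {group: df.assign(REF_ID=ref_id) for group, df in dfs.items()}

# Stand-in for the Zotero request: a local dict instead of a network call.
fake_zotero = {'26VMZZ2Q': '100'}

toy_dfs = {'SEAWATER': pd.DataFrame({'VALUE': [5.3, 19.9]})}
with_ref = add_ref_id(toy_dfs, {'id': '26VMZZ2Q'}, fake_zotero.__getitem__)
```

Passing the fetch function as an argument keeps the column logic testable without credentials or connectivity.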

Remap to OpenRefine-specific mappings

Tip

FEEDBACK FOR NEXT VERSION: The current approach of remapping to OR-specific mappings should be reconsidered. We already use MARISCO lookup tables to create the enums for NetCDF, so extending their use to the OpenRefine data format would remove the need for OpenRefine-specific mappings and streamline the transformation. Let's review the lookup tables used to create the enums for NetCDF:

enums = Enums(lut_src_dir=lut_path())
print(f'DL enums: {enums.types["DL"]}')
print(f'FILT enums: {enums.types["FILT"]}')
DL enums: {'Not applicable': -1, 'Not available': 0, 'Detected value': 1, 'Detection limit': 2, 'Not detected': 3, 'Derived': 4}
FILT enums: {'Not applicable': -1, 'Not available': 0, 'Yes': 1, 'No': 2}

For the detection limit lookup table (LUT), as shown below, the values required for the OpenRefine CSV format are listed under the ‘name’ column, whereas the enums utilize the ‘name_sanitized’ column. Additionally, for the filtered LUT, also shown below, the values do not align consistently with the OpenRefine CSV format, which uses (Y, N, NA).

dl_lut = pd.read_excel(detection_limit_lut_path())
dl_lut

|   | id | name | name_sanitized |
|---|---|---|---|
| 0 | -1 | Not applicable | Not applicable |
| 1 | 0 | Not Available | Not available |
| 2 | 1 | = | Detected value |
| 3 | 2 | < | Detection limit |
| 4 | 3 | ND | Not detected |
| 5 | 4 | DE | Derived |

filtered_lut = pd.read_excel(filtered_lut_path())
filtered_lut

|   | id | name |
|---|---|---|
| 0 | -1 | Not applicable |
| 1 | 0 | Not available |
| 2 | 1 | Yes |
| 3 | 2 | No |

We therefore create OpenRefine-specific mappings for the detection limit and filtered data. The RemapToORSpecificMappingsCB callback remaps the values of these two columns to the OpenRefine CSV format.



RemapToORSpecificMappingsCB

 RemapToORSpecificMappingsCB (or_mappings:Dict[str,Dict]={'DL': {0: 'ND',
                              1: '=', 2: '<'}, 'FILT': {0: 'NA', 1: 'Y',
                              2: 'N'}},
                              output_format:str='openrefine_csv',
                              verbose:bool=False)

Convert values using OR mappings if columns exist in dataframe.

| Parameter | Type | Default | Details |
|---|---|---|---|
| or_mappings | Dict | {'DL': {0: 'ND', 1: '=', 2: '<'}, 'FILT': {0: 'NA', 1: 'Y', 2: 'N'}} | Dictionary of column mappings |
| output_format | str | openrefine_csv | |
| verbose | bool | False | |

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemapToORSpecificMappingsCB(),
    ]
)

tfm()

# Loop through each group in the 'dfs' dictionary
for group_name, df in tfm.dfs.items():
    # Check if the group dataframe contains any of the columns specified in or_mappings.keys()
    relevant_columns = [col for col in or_mappings.keys() if col in df.columns]
    if relevant_columns:
        # Print the unique values from the relevant columns
        print(f"\nUnique values in {group_name} for columns {relevant_columns}:")
        for col in relevant_columns:
            print(f"{col}: {df[col].unique()}")
    else:
        print(f"No relevant columns found in {group_name} based on or_mappings keys.")

Unique values in BIOTA for columns ['DL']:
DL: ['<' '=' 'ND']

Unique values in SEAWATER for columns ['DL', 'FILT']:
DL: ['=' '<' 'ND']
FILT: ['NA' 'N' 'Y']

Unique values in SEDIMENT for columns ['DL']:
DL: ['=' '<' 'ND']

Remap to CSV data type format

CSV_DTYPES (defined in configs.ipynb) assigns a state to each variable that has a lookup table (i.e. an enum). The state is either ‘decoded’ or ‘encoded’. Let's review the variable states as a DataFrame:

with pd.option_context('display.max_columns', None, 'display.max_colwidth', None):
    display(pd.DataFrame.from_dict(CSV_DTYPES, orient='index').T)
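
|   | AREA | NUCLIDE | UNIT | DL | FILT | COUNT_MET | SAMP_MET | PREP_MET | SPECIES | BODY_PART | SED_TYPE | LAB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| state | decoded | encoded | encoded | decoded | decoded | encoded | encoded | encoded | encoded | encoded | encoded | encoded |
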
Tip

FEEDBACK FOR NEXT VERSION: Should we use the enums in the NetCDF file or the enums in the Marisco package? While they are currently the same, inconsistencies might arise over time. I chose to use the enums in the Marisco package because small changes to the enum descriptions can be easily implemented there, ensuring those updates are reflected in the CSV output.

enums=Enums(lut_src_dir=lut_path())
enums.types.keys()
dict_keys(['AREA', 'BIO_GROUP', 'BODY_PART', 'COUNT_MET', 'DL', 'FILT', 'NUCLIDE', 'PREP_MET', 'SAMP_MET', 'SED_TYPE', 'SPECIES', 'UNIT', 'LAB'])


get_excluded_enums

 get_excluded_enums (output_format:str='openrefine_csv')

Get excluded enums based on output format.



DataFormatConversionCB

 DataFormatConversionCB (dtypes:Dict, excluded_mappings:Callable=<function
                         get_excluded_enums>,
                         output_format:str='openrefine_csv',
                         verbose:bool=False)

A callback to convert DataFrame enum values between encoded and decoded formats based on specified settings.

| Parameter | Type | Default | Details |
|---|---|---|---|
| dtypes | Dict | | Dictionary defining data types and states for each lookup table |
| excluded_mappings | Callable | get_excluded_enums | Callable returning the dictionary of columns to exclude from conversion |
| output_format | str | openrefine_csv | |
| verbose | bool | False | Flag for verbose output |

contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS, verbose=True),
        DataFormatConversionCB(
            dtypes=CSV_DTYPES,
            excluded_mappings = get_excluded_enums,
            output_format='openrefine_csv',
            verbose=True
        ),
    ]
)
tfm()
print('\n')
Removing variables that are not compatible with vars provided. 
Removing BIO_GROUP from BIOTA dataset.
Loaded enums: dict_keys(['AREA', 'BIO_GROUP', 'BODY_PART', 'COUNT_MET', 'DL', 'FILT', 'NUCLIDE', 'PREP_MET', 'SAMP_MET', 'SED_TYPE', 'SPECIES', 'UNIT', 'LAB'])
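
To illustrate what moving a column from the ‘encoded’ to the ‘decoded’ state involves, the sketch below inverts an enum (name to code) and maps the codes back to names for columns flagged as decoded. It reuses the FILT enum printed earlier purely as an example. In the actual OpenRefine output FILT appears to be handled by the OR-specific mappings instead, and the helper convert_states and the toy inputs are assumptions.

```python
import pandas as pd

# FILT enum as printed earlier on this page (name -> code); used here for illustration only.
filt_enum = {'Not applicable': -1, 'Not available': 0, 'Yes': 1, 'No': 2}
toy_enums = {'FILT': filt_enum}
toy_dtypes = {'FILT': {'state': 'decoded'}, 'NUCLIDE': {'state': 'encoded'}}

def convert_states(df, dtypes, enums):
    """Decode columns whose target state is 'decoded'; leave 'encoded' columns as codes."""
    out = df.copy()
    for col, spec in dtypes.items():
        if col in out.columns and spec['state'] == 'decoded' and col in enums:
            # Invert the enum so that integer codes map back to their names.
            code_to_name = {code: name for name, code in enums[col].items()}
            out[col] = out[col].map(code_to_name)
    return out

toy = pd.DataFrame({'FILT': [1, 2, 0], 'NUCLIDE': [33, 33, 31]})
converted = convert_states(toy, toy_dtypes, toy_enums)
```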

Review all callbacks

contents = ExtractNetcdfContents(fname_in)
output_format = 'openrefine_csv'
tfm = Transformer(
    contents.dfs,
    cbs=[
        ValidateEnumsCB(
            contents=contents,
            maris_enums=Enums(lut_src_dir=lut_path())
        ),
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS),
        RemapToORSpecificMappingsCB(output_format=output_format),
        AddTaxonInformationCB(
            fn_lut=lut_taxon
        ),
        DecodeTimeCB(),
        AddSampleTypeIdColumnCB(),
        AddZoteroArchiveLocationCB(contents.global_attrs, cfg=cfg()),
        DataFormatConversionCB(
            dtypes=CSV_DTYPES,
            excluded_mappings=get_excluded_enums,
            output_format=output_format,
        ),
    ]
)
tfm()
for grp in ['SEAWATER', 'BIOTA']:
    display(Markdown(f"<b>Head of the transformed `{grp}` DataFrame:</b>"))
    with pd.option_context('display.max_rows', None):
        display(tfm.dfs[grp].head())

Head of the transformed SEAWATER DataFrame:

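|   | LON | LAT | SMP_DEPTH | TOT_DEPTH | TIME | NUCLIDE | VALUE | UNIT | UNC | DL | FILT | SAMPLE_TYPE | REF_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29.333300 | 60.083302 | 0.0 | NaN | 2012-05-23 | 33 | 5.300000 | 1 | 1.696 | = | NA | 1 | 100 |
| 1 | 29.333300 | 60.083302 | 29.0 | NaN | 2012-05-23 | 33 | 19.900000 | 1 | 3.980 | = | NA | 1 | 100 |
| 2 | 23.150000 | 59.433300 | 0.0 | NaN | 2012-06-17 | 33 | 25.500000 | 1 | 5.100 | = | NA | 1 | 100 |
| 3 | 27.983299 | 60.250000 | 0.0 | NaN | 2012-05-24 | 33 | 17.000000 | 1 | 4.930 | = | NA | 1 | 100 |
| 4 | 27.983299 | 60.250000 | 39.0 | NaN | 2012-05-24 | 33 | 22.200001 | 1 | 3.996 | = | NA | 1 | 100 |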

Head of the transformed BIOTA DataFrame:

|   | LON | LAT | SMP_DEPTH | TIME | NUCLIDE | VALUE | UNIT | UNC | DL | SPECIES | ... | DRYWT | WETWT | PERCENTWT | TAXONNAME | TAXONRANK | TAXONDB | TAXONDBID | TAXONDBURL | SAMPLE_TYPE | REF_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 31 | 0.010140 | 5 | NaN | < | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
| 1 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 4 | 135.300003 | 5 | 4.830210 | = | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
| 2 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 9 | 0.013980 | 5 | NaN | < | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
| 3 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 33 | 4.338000 | 5 | 0.150962 | = | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
| 4 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 31 | 0.009614 | 5 | NaN | < | 99 | ... | 177.935120 | 964.0 | 0.18458 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |

5 rows × 21 columns
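
The pattern used throughout this page, a runner that applies a list of callbacks in order to a dictionary of DataFrames, can be sketched in a few lines. MiniTransformer and the two callbacks below are hypothetical stand-ins written for this example; they are not marisco's Transformer or its callbacks and only show why the order of the list matters.

```python
import pandas as pd

class MiniTransformer:
    """Toy stand-in for a callback runner; not marisco's Transformer."""
    def __init__(self, dfs, cbs):
        # Work on copies so the caller's DataFrames are left untouched.
        self.dfs = {group: df.copy() for group, df in dfs.items()}
        self.cbs = cbs

    def __call__(self):
        # Each callback receives the runner and updates self.dfs.
        for cb in self.cbs:
            cb(self)
        return self.dfs

class KeepColumnsCB:
    """Hypothetical callback: keep only the listed columns that are present."""
    def __init__(self, keep):
        self.keep = keep

    def __call__(self, tfm):
        for group, df in tfm.dfs.items():
            kept = [col for col in df.columns if col in self.keep]
            tfm.dfs[group] = df[kept].copy()

class AddConstantCB:
    """Hypothetical callback: write one constant column into every group."""
    def __init__(self, name, value):
        self.name, self.value = name, value

    def __call__(self, tfm):
        for df in tfm.dfs.values():
            df[self.name] = self.value

toy_dfs = {'SEAWATER': pd.DataFrame({'VALUE': [5.3], 'EXTRA': [0]})}
mini = MiniTransformer(
    toy_dfs,
    cbs=[KeepColumnsCB(['VALUE', 'REF_ID']), AddConstantCB('REF_ID', 100)],
)
result = mini()
```

Because each step is a small object with a single responsibility, steps can be tested in isolation (as the earlier cells on this page do) and then combined.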

Decode



decode

 decode (fname_in:str, dest_out:str|None=None,
         output_format:str='openrefine_csv',
         remap_vars:Dict[str,str]={'LON': 'longitude', 'LAT': 'latitude',
         'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME':
         'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE':
         'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL':
         'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id',
         'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL':
         'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES':
         'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE':
         'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT':
         'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB':
         'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE':
         'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME':
         'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb',
         'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID':
         'ref_id', 'SMP_ID': 'samplelabcode'},
         remap_dtypes:Dict[str,str]={'AREA': {'state': 'decoded'},
         'NUCLIDE': {'state': 'encoded'}, 'UNIT': {'state': 'encoded'},
         'DL': {'state': 'decoded'}, 'FILT': {'state': 'decoded'},
         'COUNT_MET': {'state': 'encoded'}, 'SAMP_MET': {'state':
         'encoded'}, 'PREP_MET': {'state': 'encoded'}, 'SPECIES':
         {'state': 'encoded'}, 'BODY_PART': {'state': 'encoded'},
         'SED_TYPE': {'state': 'encoded'}, 'LAB': {'state': 'encoded'}},
         verbose:bool=False, **kwargs)

Decode data from NetCDF.

| Parameter | Type | Default | Details |
|---|---|---|---|
| fname_in | str | | Input file name |
| dest_out | str \| None | None | Output file name (optional) |
| output_format | str | openrefine_csv | |
| remap_vars | Dict | {'LON': 'longitude', 'LAT': 'latitude', 'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME': 'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE': 'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL': 'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id', 'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL': 'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES': 'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE': 'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT': 'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB': 'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE': 'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME': 'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb', 'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID': 'ref_id', 'SMP_ID': 'samplelabcode'} | |
| remap_dtypes | Dict | {'AREA': {'state': 'decoded'}, 'NUCLIDE': {'state': 'encoded'}, 'UNIT': {'state': 'encoded'}, 'DL': {'state': 'decoded'}, 'FILT': {'state': 'decoded'}, 'COUNT_MET': {'state': 'encoded'}, 'SAMP_MET': {'state': 'encoded'}, 'PREP_MET': {'state': 'encoded'}, 'SPECIES': {'state': 'encoded'}, 'BODY_PART': {'state': 'encoded'}, 'SED_TYPE': {'state': 'encoded'}, 'LAB': {'state': 'encoded'}} | |
| verbose | bool | False | |
| kwargs | | | Additional arguments |
| Returns | None | | |

fname = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
decode(fname_in=fname, dest_out=fname.with_suffix(''), output_format='openrefine_csv')
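
The final step of such a conversion, renaming the NetCDF column names to their CSV counterparts and writing one file per group, might look roughly like the sketch below. How decode actually names its output files is not stated on this page, so the `<dest>_<GROUP>.csv` pattern, the helper write_group_csvs, and the shortened toy_vars mapping are assumptions of this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

def write_group_csvs(dfs, dest, remap_vars):
    """Rename columns via remap_vars and write one CSV per group; return the paths."""
    dest = Path(dest)
    paths = []
    for group, df in dfs.items():
        # File naming '<dest>_<GROUP>.csv' is an assumption of this sketch.
        path = dest.parent / f'{dest.name}_{group}.csv'
        df.rename(columns=remap_vars).to_csv(path, index=False)
        paths.append(path)
    return paths

# Shortened, hypothetical subset of the remap_vars default shown above.
toy_vars = {'LON': 'longitude', 'LAT': 'latitude', 'VALUE': 'activity'}
toy_dfs = {'SEAWATER': pd.DataFrame({'LON': [29.3333], 'LAT': [60.0833], 'VALUE': [5.3]})}

with tempfile.TemporaryDirectory() as tmp:
    written = write_group_csvs(toy_dfs, Path(tmp) / '100-HELCOM-MORS-2024', toy_vars)
    # Read the header line back to confirm the renamed column names.
    header = written[0].read_text().splitlines()[0]
```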