from IPython.display import display, Markdown
Data format transformation
A data pipeline handler that transforms MARIS data from NetCDF to CSV. The primary focus is on converting NetCDF data into MARIS Standard Open-Refine CSV format while preserving data integrity. This handler implements a modular transformation pipeline using callbacks for each processing step, ensuring flexibility and extensibility in data handling.
New MARIS users can refer to the field definitions for detailed information about MARIS fields.
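To make the idea of a callback-based pipeline concrete, here is a minimal, standard-library sketch. It is not the handler's actual implementation: `Pipeline`, `rename_columns` and `drop_rows_missing` are hypothetical names, rows are plain dictionaries rather than dataframes, and the target column names are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def rename_columns(mapping: Dict[str, str]) -> Step:
    """Build a step that renames keys according to `mapping` (hypothetical helper)."""
    def step(rows: List[Row]) -> List[Row]:
        return [{mapping.get(key, key): value for key, value in row.items()} for row in rows]
    return step


def drop_rows_missing(column: str) -> Step:
    """Build a step that drops rows whose `column` is None (hypothetical helper)."""
    def step(rows: List[Row]) -> List[Row]:
        return [row for row in rows if row.get(column) is not None]
    return step


@dataclass
class Pipeline:
    """Apply each callback in order, feeding the output of one into the next."""
    steps: List[Step] = field(default_factory=list)

    def __call__(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step(rows)
        return rows


# Two toy rows; the second has no measured value and should be dropped.
rows = [
    {"LON": 12.316667, "LAT": 54.283333, "VALUE": 0.01014},
    {"LON": 21.385, "LAT": 61.343334, "VALUE": None},
]

pipeline = Pipeline([
    rename_columns({"LON": "longitude", "LAT": "latitude"}),  # illustrative names only
    drop_rows_missing("VALUE"),
])
result = pipeline(rows)
print(result)
```

Because each step is an independent callable, steps can be added, removed or reordered without touching the others, which is the flexibility the description above refers to.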
Dependencies
Required packages and internal modules for data format transformations
Configuration and File Paths
fname_in = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
fname_out = fname_in.with_suffix('.csv')
output_format = 'openrefine_csv'
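As a quick check of how the output path relates to the input path, the following sketch uses only `pathlib`. It assumes the intent is to write the CSV next to the NetCDF file under the same base name; the variable names mirror the configuration above.

```python
from pathlib import Path

fname_in = Path('../../_data/output/100-HELCOM-MORS-2024.nc')

# with_suffix swaps only the final extension; the directory and stem are kept.
fname_out = fname_in.with_suffix('.csv')

print(fname_out.name)
print(fname_out.parent == fname_in.parent)
```

Under that assumption, the output file sits in the same directory as the input and differs only in its extension.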
Data Loading
Load data from standardized MARIS NetCDF files using ExtractNetcdfContents. The NetCDF files follow CF conventions and include standardized variable names and metadata according to MARIS specifications.
contents = ExtractNetcdfContents(fname_in)
Show the dictionary of dataframes extracted from the NetCDF file.
contents.dfs
{'BIOTA': LON LAT SMP_DEPTH TIME NUCLIDE VALUE UNIT \
0 12.316667 54.283333 NaN 1348358400 31 0.010140 5
1 12.316667 54.283333 NaN 1348358400 4 135.300003 5
2 12.316667 54.283333 NaN 1348358400 9 0.013980 5
3 12.316667 54.283333 NaN 1348358400 33 4.338000 5
4 12.316667 54.283333 NaN 1348358400 31 0.009614 5
... ... ... ... ... ... ... ...
16089 21.395000 61.241501 2.0 1652140800 33 13.700000 4
16090 21.395000 61.241501 2.0 1652140800 9 0.500000 4
16091 21.385000 61.343334 NaN 1663200000 4 50.700001 4
16092 21.385000 61.343334 NaN 1663200000 33 0.880000 4
16093 21.385000 61.343334 NaN 1663200000 12 6.600000 4
UNC DL BIO_GROUP SPECIES BODY_PART DRYWT WETWT \
0 NaN 2 4 99 52 174.934433 948.0
1 4.830210 1 4 99 52 174.934433 948.0
2 NaN 2 4 99 52 174.934433 948.0
3 0.150962 1 4 99 52 174.934433 948.0
4 NaN 2 4 99 52 177.935120 964.0
... ... .. ... ... ... ... ...
16089 0.520600 1 11 96 55 NaN NaN
16090 0.045500 1 11 96 55 NaN NaN
16091 4.106700 1 14 129 1 NaN NaN
16092 0.140800 1 14 129 1 NaN NaN
16093 0.349800 1 14 129 1 NaN NaN
PERCENTWT
0 0.18453
1 0.18453
2 0.18453
3 0.18453
4 0.18458
... ...
16089 NaN
16090 NaN
16091 NaN
16092 NaN
16093 NaN
[16094 rows x 15 columns],
'SEAWATER': LON LAT SMP_DEPTH TOT_DEPTH TIME NUCLIDE \
0 29.333300 60.083302 0.0 NaN 1337731200 33
1 29.333300 60.083302 29.0 NaN 1337731200 33
2 23.150000 59.433300 0.0 NaN 1339891200 33
3 27.983299 60.250000 0.0 NaN 1337817600 33
4 27.983299 60.250000 39.0 NaN 1337817600 33
... ... ... ... ... ... ...
21468 13.499833 54.600334 0.0 47.0 1686441600 1
21469 13.499833 54.600334 45.0 47.0 1686441600 1
21470 14.200833 54.600334 0.0 11.0 1686614400 1
21471 14.665500 54.600334 0.0 20.0 1686614400 1
21472 14.330000 54.600334 0.0 17.0 1686614400 1
VALUE UNIT UNC DL FILT
0 5.300000 1 1.696000 1 0
1 19.900000 1 3.980000 1 0
2 25.500000 1 5.100000 1 0
3 17.000000 1 4.930000 1 0
4 22.200001 1 3.996000 1 0
... ... ... ... .. ...
21468 702.838074 1 51.276207 1 0
21469 725.855713 1 52.686260 1 0
21470 648.992920 1 48.154419 1 0
21471 627.178406 1 46.245316 1 0
21472 605.715088 1 45.691143 1 0
[21473 rows x 11 columns],
'SEDIMENT': LON LAT TOT_DEPTH TIME NUCLIDE VALUE \
0 27.799999 60.466667 25.0 1337904000 33 1200.000000
1 27.799999 60.466667 25.0 1337904000 33 250.000000
2 27.799999 60.466667 25.0 1337904000 33 140.000000
3 27.799999 60.466667 25.0 1337904000 33 79.000000
4 27.799999 60.466667 25.0 1337904000 33 29.000000
... ... ... ... ... ... ...
70444 15.537800 54.617832 62.0 1654646400 67 0.044000
70445 15.537800 54.617832 62.0 1654646400 77 2.500000
70446 15.537800 54.617832 62.0 1654646400 4 5873.000000
70447 15.537800 54.617832 62.0 1654646400 33 21.200001
70448 15.537800 54.617832 62.0 1654646400 77 0.370000
UNIT UNC DL SED_TYPE TOP BOTTOM PERCENTWT
0 4 240.000000 1 0 15.0 20.0 NaN
1 4 50.000000 1 0 20.0 25.0 NaN
2 4 29.400000 1 0 25.0 30.0 NaN
3 4 15.800000 1 0 30.0 35.0 NaN
4 4 6.960000 1 0 35.0 40.0 NaN
... ... ... .. ... ... ... ...
70444 4 0.015312 1 10 15.0 17.0 0.257642
70445 4 0.185000 1 10 15.0 17.0 0.257642
70446 4 164.444000 1 10 17.0 19.0 0.263965
70447 4 2.162400 1 10 17.0 19.0 0.263965
70448 4 0.048100 1 10 17.0 19.0 0.263965
[70449 rows x 13 columns]}
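Columns such as NUCLIDE and UNIT hold integer codes rather than labels. The enum dictionaries shown next in this notebook map each label to its code as a string, so decoding a row amounts to inverting that mapping. The sketch below copies a few BIOTA entries by hand; `invert_enum` is a hypothetical helper, not part of the handler.

```python
from typing import Dict

# Excerpts of the BIOTA label -> code mappings, copied from the enum output.
nuclide_enum = {'k40': '4', 'co60': '9', 'sr90': '12', 'cs134': '31', 'cs137': '33'}
unit_enum = {'Bq per m3': '1', 'Bq per kgd': '4', 'Bq per kgw': '5'}


def invert_enum(enum: Dict[str, str]) -> Dict[int, str]:
    """Turn a label -> code-string mapping into an integer code -> label lookup."""
    return {int(code): label for label, code in enum.items()}


nuclide_lut = invert_enum(nuclide_enum)
unit_lut = invert_enum(unit_enum)

# Selected fields from the first BIOTA row shown above.
row = {'NUCLIDE': 31, 'UNIT': 5, 'VALUE': 0.010140}

decoded = {
    **row,
    'NUCLIDE': nuclide_lut[row['NUCLIDE']],
    'UNIT': unit_lut[row['UNIT']],
}
print(decoded)
```

The first BIOTA record therefore reads as a cs134 measurement expressed in Bq per kgw.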
Show the dictionary of enums extracted from the NetCDF file.
contents.enum_dicts
{'BIOTA': {'nuclide': {'NOT APPLICABLE': '-1',
'NOT AVAILABLE': '0',
'h3': '1',
'be7': '2',
'c14': '3',
'k40': '4',
'cr51': '5',
'mn54': '6',
'co57': '7',
'co58': '8',
'co60': '9',
'zn65': '10',
'sr89': '11',
'sr90': '12',
'zr95': '13',
'nb95': '14',
'tc99': '15',
'ru103': '16',
'ru106': '17',
'rh106': '18',
'ag106m': '19',
'ag108': '20',
'ag108m': '21',
'ag110m': '22',
'sb124': '23',
'sb125': '24',
'te129m': '25',
'i129': '28',
'i131': '29',
'cs127': '30',
'cs134': '31',
'cs137': '33',
'ba140': '34',
'la140': '35',
'ce141': '36',
'ce144': '37',
'pm147': '38',
'eu154': '39',
'eu155': '40',
'pb210': '41',
'pb212': '42',
'pb214': '43',
'bi207': '44',
'bi211': '45',
'bi214': '46',
'po210': '47',
'rn220': '48',
'rn222': '49',
'ra223': '50',
'ra224': '51',
'ra225': '52',
'ra226': '53',
'ra228': '54',
'ac228': '55',
'th227': '56',
'th228': '57',
'th232': '59',
'th234': '60',
'pa234': '61',
'u234': '62',
'u235': '63',
'u238': '64',
'np237': '65',
'np239': '66',
'pu238': '67',
'pu239': '68',
'pu240': '69',
'pu241': '70',
'am240': '71',
'am241': '72',
'cm242': '73',
'cm243': '74',
'cm244': '75',
'cs134_137_tot': '76',
'pu239_240_tot': '77',
'pu239_240_iii_iv_tot': '78',
'pu239_240_v_vi_tot': '79',
'cm243_244_tot': '80',
'pu238_pu239_240_tot_ratio': '81',
'am241_pu239_240_tot_ratio': '82',
'cs137_134_ratio': '83',
'cd109': '84',
'eu152': '85',
'fe59': '86',
'gd153': '87',
'ir192': '88',
'pu238_240_tot': '89',
'rb86': '90',
'sc46': '91',
'sn113': '92',
'sn117m': '93',
'tl208': '94',
'mo99': '95',
'tc99m': '96',
'ru105': '97',
'te129': '98',
'te132': '99',
'i132': '100',
'i135': '101',
'cs136': '102',
'tbeta': '103',
'talpha': '104',
'i133': '105',
'th230': '106',
'pa231': '107',
'u236': '108',
'ag111': '109',
'in116m': '110',
'te123m': '111',
'sb127': '112',
'ba133': '113',
'ce139': '114',
'tl201': '116',
'hg203': '117',
'na22': '122',
'pa234m': '123',
'am243': '124',
'se75': '126',
'sr85': '127',
'y88': '128',
'ce140': '129',
'bi212': '130',
'u236_238_ratio': '131',
'i125': '132',
'ba137m': '133',
'u232': '134',
'pa233': '135',
'ru106_rh106_tot': '136',
'tu': '137',
'tbeta40k': '138',
'fe55': '139',
'ce144_pr144_tot': '140',
'pu240_pu239_ratio': '141',
'u233': '142',
'pu239_242_tot': '143',
'ac227': '144'},
'unit': {'Not applicable': '-1',
'NOT AVAILABLE': '0',
'Bq per m3': '1',
'Bq per m2': '2',
'Bq per kg': '3',
'Bq per kgd': '4',
'Bq per kgw': '5',
'kg per kg': '6',
'TU': '7',
'DELTA per mill': '8',
'atom per kg': '9',
'atom per kgd': '10',
'atom per kgw': '11',
'atom per l': '12',
'Bq per kgC': '13'},
'dl': {'Not applicable': '-1',
'Not available': '0',
'Detected value': '1',
'Detection limit': '2',
'Not detected': '3',
'Derived': '4'},
'bio_group': {'Not applicable': '-1',
'Not available': '0',
'Birds': '1',
'Crustaceans': '2',
'Echinoderms': '3',
'Fish': '4',
'Mammals': '5',
'Molluscs': '6',
'Others': '7',
'Plankton': '8',
'Polychaete worms': '9',
'Reptile': '10',
'Seaweeds and plants': '11',
'Cephalopods': '12',
'Gastropods': '13',
'Bivalves': '14'},
'species': {'NOT AVAILABLE': '0',
'Aristeus antennatus': '1',
'Apostichopus': '2',
'Saccharina japonica var religiosa': '3',
'Siganus fuscescens': '4',
'Alpheus dentipes': '5',
'Hexagrammos agrammus': '6',
'Ditrema temminckii': '7',
'Parapristipoma trilineatum': '8',
'Scombrops boops': '9',
'Pseudopleuronectes schrenki': '10',
'Desmarestia ligulata': '11',
'Saccharina japonica': '12',
'Neodilsea yendoana': '13',
'Costaria costata': '14',
'Sargassum yezoense': '15',
'Acanthephyra pelagica': '16',
'Sargassum ringgoldianum': '17',
'Acanthephyra quadrispinosa': '18',
'Sargassum thunbergii': '19',
'Sargassum patens': '20',
'Asterias rubens': '21',
'Sargassum miyabei': '22',
'Homarus gammarus': '23',
'Acanthephyra stylorostratis': '24',
'Acanthocybium solandri': '25',
'Acanthopagrus bifasciatus': '26',
'Acanthophora muscoides': '27',
'Acanthophora spicifera': '28',
'Acanthurus triostegus': '29',
'Actinopterygii': '30',
'Adamussium colbecki': '31',
'Ahnfeltiopsis densa': '32',
'Alepes melanoptera': '33',
'Ampharetidae': '34',
'Anchoviella lepidentostole': '35',
'Anguillidae': '36',
'Aphroditidae': '37',
'Arnoglossus': '38',
'Aurigequula fasciata': '39',
'Balaenoptera musculus': '40',
'Balaenoptera physalus': '41',
'Balistes': '42',
'Beryciformes': '43',
'Bryopsis maxima': '44',
'Callinectes sp': '45',
'Callorhinus ursinus': '46',
'Carassius auratus auratus': '47',
'Carcharhinus sorrah': '48',
'Caridae': '49',
'Clupea harengus': '50',
'Cathorops spixii': '51',
'Caulerpa racemosa': '52',
'Caulerpa scalpelliformis': '53',
'Caulerpa sertularioides': '54',
'Cellana radiata': '55',
'Coscinasterias tenuispina': '56',
'Centroceras clavulatum': '57',
'Centropomus parallelus': '58',
'Crangon crangon': '59',
'Ceramium diaphanum': '60',
'Ceramium rubrum': '61',
'Chaenocephalus aceratus': '62',
'Chaetodipterus faber': '63',
'Chaetomorpha antennina': '64',
'Chaetomorpha linoides': '65',
'Chelidonichthys kumu': '66',
'Chelon ramada': '67',
'Chiloscyllium': '68',
'Chionodraco hamatus': '69',
'Chlamys islandica': '70',
'Chlorophyta': '71',
'Chondrichthyes': '72',
'Chrysaora': '73',
'Cladophora nitellopsis': '74',
'Cladophora vagabunda': '75',
'Cladophoropsis membranacea': '76',
'Clupea': '77',
'Coccotylus truncatus': '78',
'Codium fragile': '79',
'Crassostrea': '80',
'Cynoscion acoupa': '81',
'Cynoscion jamaicensis': '82',
'Cynoscion leiarchus': '83',
'Engraulis encrasicolus': '84',
'Cypselurus agoo agoo': '85',
'Cystophora cristata': '86',
'Cystoseira barbata': '87',
'Cystoseira crinita': '88',
'Decapodiformes': '89',
'Decapterus russelli': '90',
'Decapterus scombrinus': '91',
'Delphinapterus leucas': '92',
'Delphinus capensis': '93',
'Diapterus rhombeus': '94',
'Dicentrarchus punctatus': '95',
'Fucus vesiculosus': '96',
'Funchalia woodwardi': '97',
'Ecklonia bicyclis': '98',
'Gadus morhua': '99',
'Ecklonia kurome': '100',
'Gennadas elegans': '101',
'Eisenia arborea': '102',
'Encrasicholina devisi': '103',
'Enteromorpha': '104',
'Enteromorpha flexuosa': '105',
'Enteromorpha intestinalis': '106',
'Epinephelinae': '107',
'Epinephelus diacanthus': '108',
'Exocoetidae': '109',
'Saccharina latissima': '110',
'Gracilaria corticata': '111',
'Ligur ensiferus': '112',
'Gracilaria debilis': '113',
'Gracilaria edulis': '114',
'Gracilariales': '115',
'Grateloupia elliptica': '116',
'Grateloupia filicina': '117',
'Lysmata seticaudata': '118',
'Gymnogongrus griffithsiae': '119',
'Mya arenaria': '120',
'Halichoerus grypus': '121',
'Macoma balthica': '122',
'Marthasterias glacialis': '123',
'Halimeda macroloba': '124',
'Harengula clupeola': '125',
'Harpagifer antarcticus': '126',
'Hemifusus ternatanus': '127',
'Hemiramphus brasiliensis': '128',
'Mytilus edulis': '129',
'Metapenaeus affinis': '130',
'Heteroscleromorpha': '131',
'Heterosigma akashiwo': '132',
'Hilsa ilisha': '133',
'Metapenaeus monoceros': '134',
'Metapenaeus stebbingi': '135',
'Holothuria': '136',
'Hoplobrotula armata': '137',
'Hypnea musciformis': '138',
'Merlangius merlangus': '139',
'Iridaea cordata': '140',
'Jania rubens': '141',
'Meganyctiphanes norvegica': '142',
'Johnius glaucus': '143',
'Kappaphycus': '144',
'Kappaphycus alvarezii': '145',
'Laevistrombus canarium': '146',
'Lagenodelphis hosei': '147',
'Lambia': '148',
'Laminaria japonica': '149',
'Laminaria longissima': '150',
'Larimus breviceps': '151',
'Laurencia papillosa': '152',
'Leiognathidae': '153',
'Leiognathus dussumieri': '154',
'Lepidochelys olivacea': '155',
'Leptonychotes weddellii': '156',
'Limanda yokohamae': '157',
'Nephrops norvegicus': '158',
'Neuston': '159',
'Littoraria undulata': '160',
'Loligo vulgaris': '161',
'Lumbrineridae': '162',
'Lutjanus fulviflamma': '163',
'Marginisporum aberrans': '164',
'Megalaspis cordyla': '165',
'Octopus vulgaris': '166',
'Menticirrhus americanus': '167',
'Mesoplodon densirostris': '168',
'Palaemon longirostris': '169',
'Metapenaeus brevicornis': '170',
'Pasiphaea multidentata': '171',
'Pasiphaea sivado': '172',
'Parapenaeopsis stylifera': '173',
'Miichthys miiuy': '174',
'Mirounga leonina': '175',
'Brachidontes striatulus': '176',
'Monodon monoceros': '177',
'Mugil platanus': '178',
'Penaeus semisulcatus': '179',
'Mullus barbatus': '180',
'Mycteroperca rubra': '181',
'Philocheras echinulatus': '182',
'Myelophycus simplex': '183',
'Mytilus coruscus': '184',
'Penaeus indicus': '185',
'Natator depressus': '186',
'Pandalus jordani': '187',
'Melicertus kerathurus': '188',
'Parapenaeus longirostris': '189',
'Plesionika': '190',
'Platichthys flesus': '191',
'Pleuronectes platessa': '192',
'Nematopalaemon tenuipes': '193',
'Nematoscelis difficilis': '194',
'Nemipterus': '195',
'Aegaeon lacazei': '196',
'Nephtyidae': '197',
'Nereididae': '198',
'Netuma bilineata': '199',
'Nibea maculata': '200',
'Oceana serrulata': '201',
'Palaemon serratus': '202',
'Ocypode': '203',
'Odobenus rosmarus': '204',
'Ogcocephalus vespertilio': '205',
'Oligoplites saurus': '206',
'Onuphidae': '207',
'Opheliidae': '208',
'Opisthonema oglinum': '209',
'Opisthopterus tardoore': '210',
'Orientomysis mitsukurii': '211',
'Otolithes cuvieri': '212',
'Padina pavonica': '213',
'Padina tetrastromatica': '214',
'Padina vickersiae': '215',
'Pagellus affinis': '216',
'Pagophilus groenlandicus': '217',
'Paguroidea': '218',
'Pagurus': '219',
'Systellaspis debilis': '220',
'Sergestes': '221',
'Sergestes arcticus': '222',
'Pampus argenteus': '223',
'Sergestes arachnipodus': '224',
'Sergestes henseni': '225',
'Sergestes prehensilis': '226',
'Sergestes robustus': '227',
'Pangasius pangasius': '228',
'Panulirus homarus': '229',
'Paracentrotus lividus': '230',
'Pasiphaea sp': '231',
'Pectinariidae': '232',
'Penaeus': '233',
'Phoca vitulina': '234',
'Photopectoralis bindus': '235',
'Phyllospadix iwatensis': '236',
'Plectorhinchus mediterraneus': '237',
'Pleuronectes mochigarei': '238',
'Pleuronectes obscurus': '239',
'Plocamium brasiliense': '240',
'Polynemus paradiseus': '241',
'Polysiphonia': '242',
'Sprattus sprattus': '243',
'Scomber scombrus': '244',
'Polysiphonia fucoides': '245',
'Gonostomatidae': '246',
'Perca fluviatilis': '247',
'Pomadasys crocro': '248',
'Porphyra tenera': '249',
'Potamogeton pectinatus': '250',
'Priacanthus hamrur': '251',
'Pseudorhombus malayanus': '252',
'Pterocladiella capillacea': '253',
'Pusa caspica': '254',
'Pusa sibirica': '255',
'Pylaiella littoralis': '256',
'Sabellidae': '257',
'Salangichthys ishikawae': '258',
'Sarconema filiforme': '259',
'Sardinella albella': '260',
'Sardinella brasiliensis': '261',
'Sardinops melanostictus': '262',
'Sargassum cymosum': '263',
'Sargassum linearifolium': '264',
'Sargassum micracanthum': '265',
'Xiphias gladius': '266',
'Sargassum novae hollandiae': '267',
'Sargassum oligocystum': '268',
'Esox lucius': '269',
'Limanda limanda': '270',
'Abramis brama': '271',
'Anguilla anguilla': '272',
'Arctica islandica': '273',
'Cerastoderma edule': '274',
'Cyprinus carpio': '275',
'Echinodermata': '276',
'Fish larvae': '277',
'Myoxocephalus scorpius': '278',
'Osmerus eperlanus': '279',
'Plankton': '280',
'Scophthalmus maximus': '281',
'Rhodophyta': '282',
'Rutilus rutilus': '283',
'Saduria entomon': '284',
'Sander lucioperca': '285',
'Gasterosteus aculeatus': '286',
'Zoarces viviparus': '287',
'Gymnocephalus cernua': '288',
'Furcellaria lumbricalis': '289',
'Cladophora glomerata': '290',
'Lateolabrax japonicus': '291',
'Okamejei kenojei': '292',
'Sebastes pachycephalus': '293',
'Squalus acanthias': '294',
'Gadus macrocephalus': '295',
'Paralichthys olivaceus': '296',
'Ovalipes punctatus': '297',
'Pseudopleuronectes yokohamae': '298',
'Hemitripterus villosus': '299',
'Clidoderma asperrimum': '300',
'Microstomus achne': '301',
'Lepidotrigla microptera': '302',
'Hexagrammos otakii': '303',
'Kareius bicoloratus': '304',
'Pleuronichthys cornutus': '305',
'Enteroctopus dofleini': '306',
'Ammodytes personatus': '307',
'Lophius litulon': '308',
'Eopsetta grigorjewi': '309',
'Takifugu porphyreus': '310',
'Loliolus japonica': '311',
'Sepia andreana': '312',
'Sebastes cheni': '313',
'Portunus trituberculatus': '314',
'Sebastes schlegelii': '315',
'Pennahia argentata': '316',
'Platichthys stellatus': '317',
'Gadus chalcogrammus': '318',
'Chelidonichthys spinosus': '319',
'Conger myriaster': '320',
'Heterololigo bleekeri': '321',
'Stichaeus grigorjewi': '322',
'Pseudopleuronectes herzensteini': '323',
'Octopus conispadiceus': '324',
'Hippoglossoides dubius': '325',
'Cleisthenes pinetorum': '326',
'Glyptocephalus stelleri': '327',
'Tanakius kitaharae': '328',
'Nibea mitsukurii': '329',
'Dasyatis matsubarai': '330',
'Verasper moseri': '331',
'Hemitrygon akajei': '332',
'Triakis scyllium': '333',
'Trachurus japonicus': '334',
'Zeus faber': '335',
'Pagrus major': '336',
'Acanthopagrus schlegelii': '337',
'Dentex tumifrons': '338',
'Mustelus manazo': '339',
'Seriola quinqueradiata': '340',
'Hyperoglyphe japonica': '341',
'Carcharhinus': '342',
'Platycephalus': '343',
'Scomber japonicus': '344',
'Squatina japonica': '345',
'Alopias pelagicus': '346',
'Zenopsis nebulosa': '347',
'Cynoglossus joyneri': '348',
'Verasper variegatus': '349',
'Oncorhynchus keta': '350',
'Physiculus japonicus': '351',
'Oplegnathus punctatus': '352',
'Arothron hispidus': '353',
'Stereolepis doederleini': '354',
'Takifugu snyderi': '355',
'Scomber australasicus': '356',
'Liparis tanakae': '357',
'Thamnaconus modestus': '358',
'Gnathophis nystromi': '359',
'Sebastes oblongus': '360',
'Sebastiscus marmoratus': '361',
'Takifugu pardalis': '362',
'Mugil cephalus': '363',
'Ditrema temminckii temminckii': '364',
'Konosirus punctatus': '365',
'Tribolodon brandtii': '366',
'Oncorhynchus masou': '367',
'Aluterus monoceros': '368',
'Todarodes pacificus': '369',
'Myoxocephalus stelleri': '370',
'Myliobatis tobijei': '371',
'Scyliorhinus torazame': '372',
'Lophiomus setigerus': '373',
'Heterodontus japonicus': '374',
'Sebastes vulpes': '375',
'Paraplagusia japonica': '376',
'Ostrea edulis': '377',
'Melanogrammus aeglefinus': '378',
'Pollachius virens': '379',
'Pollachius pollachius': '380',
'Sebastes marinus': '381',
'Anarhichas minor': '382',
'Anarhichas denticulatus': '383',
'Reinhardtius hippoglossoides': '384',
'Trisopterus esmarkii': '385',
'Micromesistius poutassou': '386',
'Coryphaenoides rupestris': '387',
'Argentina silus': '388',
'Salmo salar': '389',
'Sebastes viviparus': '390',
'Buccinum undatum': '391',
'Fucus serratus': '392',
'Merluccius merluccius': '393',
'Littorina littorea': '394',
'Fucus': '395',
'Rhodymenia': '396',
'Solea solea': '397',
'Trachurus trachurus': '398',
'Eutrigla gurnardus': '399',
'Pelvetia canaliculata': '400',
'Ascophyllum nodosum': '401',
'Mallotus villosus': '402',
'Pecten maximus': '403',
'Hippoglossoides platessoides': '404',
'Sebastes mentella': '405',
'Modiolus modiolus': '406',
'Boreogadus saida': '407',
'Sepia': '408',
'Gadus': '409',
'Sardina pilchardus': '410',
'Pleuronectiformes': '411',
'Molva molva': '412',
'Patella': '413',
'Crassostrea gigas': '414',
'Dasyatis pastinaca': '415',
'Lophius piscatorius': '416',
'Porphyra umbilicalis': '417',
'Patella vulgata': '418',
'Brosme brosme': '419',
'Glyptocephalus cynoglossus': '420',
'Galeus melastomus': '421',
'Chimaera monstrosa': '422',
'Etmopterus spinax': '423',
'Dicentrarchus labrax': '424',
'Osilinus lineatus': '425',
'Hippoglossus hippoglossus': '426',
'Cyclopterus lumpus': '427',
'Molva dypterygia': '428',
'Microstomus kitt': '429',
'Fucus distichus': '430',
'Tapes': '431',
'Sebastes norvegicus': '432',
'Phycis blennoides': '433',
'Fucus spiralis': '434',
'Laminaria digitata': '435',
'Dipturus batis': '436',
'Anarhichas lupus': '437',
'Lumpenus lampretaeformis': '438',
'Lycodes vahlii': '439',
'Argentina sphyraena': '440',
'Trisopterus minutus': '441',
'Thunnus': '442',
'Hyperoplus lanceolatus': '443',
'Gaidropsarus argentatus': '444',
'Engraulis japonicus': '445',
'Mytilus galloprovincialis': '446',
'Undaria pinnatifida': '447',
'Chlorophthalmus albatrossis': '448',
'Sargassum fusiforme': '449',
'Eisenia bicyclis': '450',
'Spisula sachalinensis': '451',
'Strongylocentrotus nudus': '452',
'Haliotis discus hannai': '453',
'Dexistes rikuzenius': '454',
'Ruditapes philippinarum': '455',
'Apostichopus japonicus': '456',
'Pterothrissus gissu': '457',
'Helicolenus hilgendorfii': '458',
'Buccinum isaotakii': '459',
'Neptunea intersculpta': '460',
'Apostichopus nigripunctatus': '461',
'Sebastes thompsoni': '462',
'Oratosquilla oratoria': '463',
'Oncorhynchus kisutch': '464',
'Erimacrus isenbeckii': '465',
'Sillago japonica': '466',
'Trachysalambria curvirostris': '467',
'Mytilus unguiculatus': '468',
'Crassostrea nippona': '469',
'Laminariales': '470',
'Uroteuthis edulis': '471',
'Takifugu poecilonotus': '472',
'Neptunea arthritica': '473',
'Katsuwonus pelamis': '474',
'Doederleinia berycoides': '475',
'Metapenaeopsis dalei': '476',
'Seriola dumerili': '477',
'Pseudorhombus pentophthalmus': '478',
'Stephanolepis cirrhifer': '479',
'Cookeolus japonicus': '480',
'Panulirus japonicus': '481',
'Thunnus orientalis': '482',
'Halocynthia roretzi': '483',
'Etrumeus sadina': '484',
'Cololabis saira': '485',
'Coryphaena hippurus': '486',
'Sarda orientalis': '487',
'Octopus ocellatus': '488',
'Sardinops sagax': '489',
'Sphyraena pinguis': '490',
'Sebastes ventricosus': '491',
'Occella iburia': '492',
'Glossanodon semifasciatus': '493',
'Mizuhopecten yessoensis': '494',
'Neosalangichthys ishikawae': '495',
'Bothrocara tanakae': '496',
'Malacocottus zonurus': '497',
'Coelorinchus macrochir': '498',
'Neptunea constricta': '499',
'Beringius polynematicus': '500',
'Sebastes nivosus': '501',
'Pandalus eous': '502',
'Synaphobranchus kaupii': '503',
'Sebastolobus macrochir': '504',
'Marsupenaeus japonicus': '505',
'Japelion hirasei': '506',
'Pleurogrammus azonus': '507',
'Monostroma nitidum': '508',
'Atheresthes evermanni': '509',
'Takifugu rubripes': '510',
'Chionoecetes opilio': '511',
'Pandalopsis coccinata': '512',
'Chionoecetes japonicus': '513',
'Sebastes matsubarae': '514',
'Scombrops gilberti': '515',
'Hyporhamphus sajori': '516',
'Trichiurus lepturus': '517',
'Alcichthys elongatus': '518',
'Volutharpa perryi': '519',
'Mercenaria stimpsoni': '520',
'Berryteuthis magister': '521',
'Aptocyclus ventricosus': '522',
'Euphausia pacifica': '523',
'Salangichthys microdon': '524',
'Telmessus acutidens': '525',
'Ceratophyllum demersum': '526',
'Pandalus nipponensis': '527',
'Sebastes owstoni': '528',
'Cociella crocodilus': '529',
'Conger japonicus': '530',
'Sardinella zunasi': '531',
'Cheilopogon pinnatibarbatus japonicus': '532',
'Oplegnathus fasciatus': '533',
'Macridiscus aequilatera': '534',
'Repomucenus ornatipinnis': '535',
'Clupea pallasii': '536',
'Scorpaena neglecta': '537',
'Scomberomorus niphonius': '538',
'Leucopsarion petersii': '539',
'Sebastes scythropus': '540',
'Strongylura anastomella': '541',
'Laemonema longipes': '542',
'Fusitriton oregonensis': '543',
'Japelion pericochlion': '544',
'Sebastes steindachneri': '545',
'Auxis rochei': '546',
'Lobotes surinamensis': '547',
'Auxis thazard': '548',
'Chlorophthalmus borealis': '549',
'Etelis coruscans': '550',
'Sebastes inermis': '551',
'Cynoglossus interruptus': '552',
'Erilepis zonifer': '553',
'Tridentiger obscurus': '554',
'Caranx sexfasciatus': '555',
'Thunnus thynnus': '556',
'Takifugu stictonotus': '557',
'Euthynnus affinis': '558',
'Synagrops japonicus': '559',
'Okamejei schmidti': '560',
'Suggrundus meerdervoortii': '561',
'Sebastes baramenuke': '562',
'Pleurogrammus monopterygius': '563',
'Decapterus maruadsi': '564',
'Girella punctata': '565',
'Sphyraena japonica': '566',
'Ommastrephes bartramii': '567',
'Sepiella japonica': '568',
'Sepioteuthis lessoniana': '569',
'Eucleoteuthis luminosa': '570',
'Gloiopeltis furcata': '571',
'Macrobrachium nipponense': '572',
'Sepia kobiensis': '573',
'Eriocheir japonica': '574',
'Magallana nippona': '575',
'Meretrix lusoria': '576',
'Chondrus ocellatus': '577',
'Chondrus elatus': '578',
'Gloiopeltis': '579',
'Holothuroidea': '580',
'Corbicula japonica': '581',
'Sunetta menstrualis': '582',
'Pseudorhombus cinnamoneus': '583',
'Takifugu niphobles': '584',
'Lagocephalus gloveri': '585',
'Beryx splendens': '586',
'Parastichopus nigripunctatus': '587',
'Venerupis philippinarum': '588',
'Haliotis': '589',
'Liparis agassizii': '590',
'Seriola lalandi': '591',
'Niphon spinosus': '592',
'Pleuronichthys japonicus': '593',
'Sergia lucens': '594',
'Sphoeroides pachygaster': '595',
'Coryphaenoides acrolepis': '596',
'Pseudopleuronectes obscurus': '597',
'Pyropia yezoensis': '598',
'Isurus oxyrinchus': '599',
'Sargassum fulvellum': '600',
'Prionace glauca': '601',
'Kajikia audax': '602',
'Thunnus albacares': '603',
'Thunnus alalunga': '604',
'Thunnus obesus': '605',
'Lamna ditropis': '606',
'Glyptocidaris crenularis': '607',
'Asterias amurensis': '608',
'Sepiida': '609',
'Congridae': '610',
'Takifugu': '611',
'Sargassum horneri': '612',
'Haliotis discus': '613',
'Pleuronectidae': '614',
'Acanthogobius flavimanus': '615',
'Acanthogobius lactipes': '616',
'Pholis nebulosa': '617',
'Hemigrapsus penicillatus': '618',
'Palaemon paucidens': '619',
'Mysidae': '620',
'Zostera marina': '621',
'Ulva pertusa': '622',
'Gobiidae': '623',
'Atherinidae': '624',
'Tribolodon': '625',
'Alpheus': '626',
'Polychaeta': '627',
'Sebastes': '628',
'Charybdis japonica': '629',
'Hemigrapsus': '630',
'Favonigobius gymnauchen': '631',
'Palaemon': '632',
'Planiliza haematocheila': '633',
'Palaemonidae': '634',
'Pholis crassispina': '635',
'Laminaria': '636',
'Distolasterias nipon': '637',
'Lophiiformes': '638',
'Alpheus brevicristatus': '639',
'Undaria undariodes': '640',
'Neomysis awatschensis': '641',
'Alpheidae': '642',
'Macrobrachium': '643',
'Hediste': '644',
'Gymnogobius breunigii': '645',
'Luidia quinaria': '646',
'Rhizoprionodon acutus': '647',
'Carangoides equula': '648',
'Carcinoplax longimana': '649',
'Anomura': '650',
'Spatangoida': '651',
'Plesiobatis daviesi': '652',
'Eusphyra blochii': '653',
'Ruditapes variegata': '654',
'Sinonovacula constricta': '655',
'Penaeus monodon': '656',
'Litopenaeus vannamei': '657',
'Solenocera crassicornis': '658',
'Stomatopoda': '659',
'Teuthida': '660',
'Octopus': '661',
'Larimichthys polyactis': '662',
'Scomberomorini': '663',
'Channa argus': '664',
'Ranina ranina': '665',
'Lates calcarifer': '666',
'Scomberomorus commerson': '667',
'Lutjanus malabaricus': '668',
'Thenus parindicus': '669',
'Amusium pleuronectes': '670',
'Loligo': '671',
'Plectropomus leopardus': '672',
'Sillago ciliata': '673',
'Scylla serrata': '674',
'Pinctada maxima': '675',
'Lutjanus argentimaculatus': '676',
'Protonibea diacanthus': '677',
'Polydactylus macrochir': '678',
'Rachycentron canadum': '679',
'Ibacus peronii': '680',
'Arripis trutta': '681',
'Sarda australis': '682',
'Seriola hippos': '683',
'Choerodon schoenleinii': '684',
'Panulirus ornatus': '685',
'Neotrygon kuhlii': '686',
'Lethrinus nebulosus': '687',
'Parupeneus multifasciatus': '688',
'Saccostrea cucullata': '689',
'Lutjanus sebae': '690',
'Thunnus maccoyii': '691',
'Acanthopagrus butcheri': '692',
'Lambis lambis': '693',
'Gerres subfasciatus': '694',
'Zooplankton': '695',
'Phytoplankton': '696',
'Rapana venosa': '697',
'Scapharca inaequivalvis': '698',
'Ulva intestinalis': '699',
'Ulva linza': '700',
'Ceramium virgatum': '701',
'Gayralia oxysperma': '702',
'Vertebrata fucoides': '703',
'Stuckenia pectinata': '704',
'Rochia nilotica': '705',
'Ctenochaetus striatus': '706',
'Serranidae': '707',
'Turbo setosus': '708',
'Pandalidae': '709',
'Gymnosarda unicolor': '710',
'Epinephelini': '711',
'Pisces': '712',
'Liza klunzingeri': '713',
'Acanthopagrus latus': '714',
'Liza subviridis': '715',
'Sparidentex hasta': '716',
'Otolithes ruber': '717',
'Crenidens crenidens': '718',
'Ensis': '719',
'Gastropoda': '720',
'Euheterodonta': '721',
'Scomber': '722',
'Theragra chalcogramma': '723',
'Engraulidae': '724',
'Ostreidae': '725',
'Phaeophyceae': '726',
'Porphyra': '727',
'Ulva reticulata': '728',
'Perna viridis': '729',
'Fenneropenaeus indicus': '730',
'Merluccius': '731',
'Soleidae': '732',
'Mugilidae': '733',
'Marine algae': '734',
'Scarus rivulatus': '735',
'Scarus coeruleus': '736',
'Sardinella fimbriata': '737',
'Dussumieria acuta': '738',
'Lutjanus kasmira': '739',
'Lutjanus rivulatus': '740',
'Lutjanus bohar': '741',
'Priacanthus blochii': '742',
'Pelates quadrilineatus': '743',
'Epinephelus fasciatus': '744',
'Upeneus vittatus': '745',
'Lethrinus laticaudis': '746',
'Lethrinus lentjan': '747',
'Lethrinus microdon': '748',
'Sphyraena barracuda': '749',
'Alectis indica': '750',
'Epinephelus latifasciatus': '751',
'Nemipterus japonicus': '752',
'Raconda russeliana': '753',
'Lactarius lactarius': '754',
'Aetomylaeus bovinus': '755',
'Pennahia anea': '756',
'Leiognathus fasciatus': '757',
'Sardinella longiceps': '758',
'Tenualosa ilisha': '759',
'Pellona ditchela': '760',
'Stolephorus indicus': '761',
'Setipinna breviceps': '762',
'Rastrelliger kanagurta': '763',
'Chanos chanos': '764',
'Lepturacanthus savala': '765',
'Epinephelus niveatus': '766',
'Lutjanus johnii': '767',
'Carangoides malabaricus': '768',
'Ablennes hians': '769',
'Chirocentrus dorab': '770',
'Scomberomorus cavalla': '771',
'Scomberomorus semifasciatus': '772',
'Scomberomorus guttatus': '773',
'Etrumeus teres': '774',
'Spondyliosoma cantharus': '775',
'Brama brama': '776',
'Dasyatis zugei': '777',
'Harpadon nehereus': '778',
'Carcharhinus melanopterus': '779',
'Penaeus plebejus': '780',
'Sepia officinalis': '781',
'Johnius dussumieri': '782',
'Lutjanus campechanus': '783',
'Ruditapes decussatus': '784',
'Carcinus aestuarii': '785',
'Squilla mantis': '786',
'Epinephelus polyphekadion': '787',
'Lutjanus gibbus': '788',
'Lethrinus mahsena': '789',
'Epinephelus chlorostigma': '790',
'Carangoides bajad': '791',
'Aethaloperca rogaa': '792',
'Atule mate': '793',
'Macolor niger': '794',
'Carangoides fulvoguttatus': '795',
'Plectropomus areolatus': '796',
'Cephalopholis argus': '797',
'Cephalopholis': '798',
'Scarus sordidus': '799',
'Scomberomorus tritor': '800',
'Triaenodon obesus': '801',
'Pomadasys commersonnii': '802',
'Monotaxis grandoculis': '803',
'Plectropomus maculatus': '804',
'Trachinotus blochii': '805',
'Pristipomoides filamentosus': '806',
'Acanthurus gahhm': '807',
'Acanthurus sohal': '808',
'Siganus argenteus': '809',
'Naso unicornis': '810',
'Chanos': '811',
'Oedalechilus labiosus': '812',
'Plectorhinchus gaterinus': '813',
'Mercenaria mercenaria': '814',
'Mytilus': '815',
'Turbo cornutus': '816',
'Decapoda': '817',
'Sphyraena': '818',
'Arius maculatus': '819',
'Penaeus merguiensis': '820',
'Tegillarca granosa': '821',
'Mullus barbatus barbatus': '822',
'Chamelea gallina': '823',
'Metanephrops thomsoni': '824',
'Magallana gigas': '825',
'Branchiostegus japonicus': '826',
'Cephalopoda': '827',
'Lutjanidae': '828',
'Lethrinidae': '829',
'Sphyraena argentea': '830',
'Chirocentrus nudus': '831',
'Trachinotus': '832',
'Mugil auratus': '833',
'Euthynnus alletteratus': '834',
'Sparus aurata': '835',
'Pagrus caeruleostictus': '836',
'Scorpaena scrofa': '837',
'Pagellus erythrinus': '838',
'Epinephelus aeneus': '839',
'Dentex maroccanus': '840',
'Caranx rhonchus': '841',
'Sardinella': '842',
'Siganus': '843',
'Solea': '844',
'Diplodus sargus': '845',
'Lithognathus mormyrus': '846',
'Oblada melanura': '847',
'Siganus rivulatus': '848',
'Chelon labrosus': '849',
'Cynoscion microlepidotus': '850',
'Genypterus brasiliensis': '851',
'Myoxocephalus polyacanthocephalus': '852',
'Hexagrammos lagocephalus': '853',
'Hexagrammos decagrammus': '854',
'Sebastes ciliatus': '855',
'Lepidopsetta polyxystra': '856',
'Clupeiformes': '857',
'Gadidae': '858',
'Brachyura': '859',
'Dasyatis': '860',
'Carcharias': '861',
'Saurida': '862',
'Upeneus': '863',
'Cynoglossus': '864',
'Scomberomorus': '865',
'Terapon': '866',
'Leiognathus': '867',
'Terapontidae': '868',
'Caranx': '869',
'Diplodus': '870',
'Plectorhinchus flavomaculatus': '871',
'Salmonidae': '872',
'Mollusca': '873',
'Boops boops': '874',
'Sarpa salpa': '875',
'Pagellus acarne': '876',
'Spicara smaris': '877',
'Diplodus vulgaris': '878',
'Chelidonichthys lucerna': '879',
'Sarda sarda': '880',
'Serranus cabrilla': '881',
'Diplodus annularis': '882',
'Pagrus pagrus': '883',
'Alosa fallax': '884',
'Belone belone': '885',
'Dentex dentex': '886',
'Sphyraena viridensis': '887',
'Trisopterus capelanus': '888',
'Arnoglossus laterna': '889',
'Procambarus clarkii': '890',
'Nemadactylus macropterus': '891',
'Pagrus auratus': '892',
'Jasus edwardsii': '893',
'Perna canaliculus': '894',
'Pseudophycis bachus': '895',
'Haliotis iris': '896',
'Hoplostethus atlanticus': '897',
'Rhombosolea leporina': '898',
'Zygochlamys delicatula': '899',
'Galeorhinus galeus': '900',
'Parapercis colias': '901',
'Tiostrea chilensis': '902',
'Genypterus blacodes': '903',
'Evechinus chloroticus': '904',
'Austrovenus stutchburyi': '905',
'Micromesistius australis': '906',
'Macruronus novaezelandiae': '907',
'Nototodarus': '908',
'Perna perna': '909',
'Sepia pharaonis': '910',
'Turbo bruneus': '911',
'Portunus sanguinolentus': '912',
'Charybdis natator': '913',
'Charybdis lucifera': '914',
'Panulirus argus': '915',
'Ethmalosa fimbriata': '916',
'Sardinella brachysoma': '917',
'Thryssa mystax': '918',
'Plicofollis dussumieri': '919',
'Nibea soldado': '920',
'Epinephelus melanostigma': '921',
'Megalops cyprinoides': '922',
'Decapterus macarellus': '923',
'Drepane punctata': '924',
'Sillago sihama': '925',
'Tylosurus crocodilus crocodilus': '926',
'Saurida tumbil': '927',
'Cynoglossus macrostomus': '928',
'Parupeneus indicus': '929',
'Synechogobius hasta': '930',
'Busycotypus canaliculatus': '931',
'Pampus cinereus': '932',
'Pomadasys kaakan': '933',
'Epinephelus coioides': '934',
'Sepiella inermis': '935',
'Uroteuthis duvauceli': '936',
'Stomatella auricula': '937',
'Cerithium scabridum': '938',
'Marcia recens': '939',
'Circe intermedia': '940',
'Marcia opima': '941',
'Fulvia fragile': '942',
'Charybdis feriatus': '943',
'Charybdis annulata': '944',
'Atergatis integerrimus': '945',
'Matuta lunaris': '946',
'Calappa lophos': '947',
'Uca annulipes': '948',
'Chlamys varia': '949',
'Cololabis adocetus': '950',
'Seriola lalandi dorsalis': '951',
'Brunneifusus ternatanus': '952',
'Metapenaeus joyneri': '953',
'Epinephelus tauvina': '954',
'Coilia dussumieri': '955',
'Carcharhinus dussumieri': '956',
'Upeneus tragula': '957',
'Sartoriana spinigera': '958',
'Lamellidens marginalis': '959',
'Polydactylus sextarius': '960',
'Johnius macrorhynus': '961',
'Hexanematichthys sagor': '962',
'Sargassum swartzii': '963',
'Argyrops spinifer': '964',
'Synodus intermedius': '965',
'Muraenesox cinereus': '966',
'Carangoides armatus': '967',
'Eleutheronema tetradactylum': '968',
'Mustelus mosis': '969',
'Nemipterus bipunctatus': '970',
'Lutjanus quinquelineatus': '971',
'Platycephalus indicus': '972',
'Rhabdosargus haffara': '973',
'Argyrops filamentosus': '974',
'Brachirus orientalis': '975',
'Mene maculata': '976',
'Hemiramphus marginatus': '977',
'Encrasicholina heteroloba': '978',
'Trachinotus africanus': '979',
'Bramidae': '980',
'Escualosa thoracata': '981',
'Sepia arabica': '982',
'Scatophagus argus': '983',
'Parastromateus niger': '984',
'Planiliza subviridis': '985',
'Labeo rohita': '986',
'Oreochromis niloticus': '987',
'Cardiidae': '988',
'Sargassum angustifolium': '989',
'Pomacea bridgesii': '990',
'Sebastes fasciatus': '991',
'Batoidea': '992',
'Urophycis chuss': '993',
'Dalatias licha': '994',
'Trisopterus luscus': '995',
'Scyliorhinus canicula': '996',
'Ruvettus pretiosus': '997',
'Aphanopus carbo': '998',
'Alepocephalus bairdii': '999',
...},
'body_part': {'Not applicable': '-1',
'Not available': '0',
'Whole animal': '1',
'Whole animal eviscerated': '2',
'Whole animal eviscerated without head': '3',
'Flesh with bones': '4',
'Blood': '5',
'Skeleton': '6',
'Bones': '7',
'Exoskeleton': '8',
'Endoskeleton': '9',
'Shells': '10',
'Molt': '11',
'Skin': '12',
'Head': '13',
'Tooth': '14',
'Otolith': '15',
'Fins': '16',
'Faecal pellet': '17',
'Byssus': '18',
'Soft parts': '19',
'Viscera': '20',
'Stomach': '21',
'Hepatopancreas': '22',
'Digestive gland': '23',
'Pyloric caeca': '24',
'Liver': '25',
'Intestine': '26',
'Kidney': '27',
'Spleen': '28',
'Brain': '29',
'Eye': '30',
'Fat': '31',
'Heart': '32',
'Branchial heart': '33',
'Muscle': '34',
'Mantle': '35',
'Gills': '36',
'Gonad': '37',
'Ovary': '38',
'Testes': '39',
'Whole plant': '40',
'Flower': '41',
'Leaf': '42',
'Old leaf': '43',
'Young leaf': '44',
'Leaf upper part': '45',
'Leaf lower part': '46',
'Scales': '47',
'Root rhizome': '48',
'Whole macro alga': '49',
'Phytoplankton': '50',
'Thallus': '51',
'Flesh without bones': '52',
'Stomach and intestine': '53',
'Whole haptophytic plants': '54',
'Loose drifting plants': '55',
'Growing tips': '56',
'Upper parts of plants': '57',
'Lower parts of plants': '58',
'Shells carapace': '59',
'Flesh with scales': '60'}},
'SEAWATER': {'nuclide': {'NOT APPLICABLE': '-1',
'NOT AVAILABLE': '0',
'h3': '1',
'be7': '2',
'c14': '3',
'k40': '4',
'cr51': '5',
'mn54': '6',
'co57': '7',
'co58': '8',
'co60': '9',
'zn65': '10',
'sr89': '11',
'sr90': '12',
'zr95': '13',
'nb95': '14',
'tc99': '15',
'ru103': '16',
'ru106': '17',
'rh106': '18',
'ag106m': '19',
'ag108': '20',
'ag108m': '21',
'ag110m': '22',
'sb124': '23',
'sb125': '24',
'te129m': '25',
'i129': '28',
'i131': '29',
'cs127': '30',
'cs134': '31',
'cs137': '33',
'ba140': '34',
'la140': '35',
'ce141': '36',
'ce144': '37',
'pm147': '38',
'eu154': '39',
'eu155': '40',
'pb210': '41',
'pb212': '42',
'pb214': '43',
'bi207': '44',
'bi211': '45',
'bi214': '46',
'po210': '47',
'rn220': '48',
'rn222': '49',
'ra223': '50',
'ra224': '51',
'ra225': '52',
'ra226': '53',
'ra228': '54',
'ac228': '55',
'th227': '56',
'th228': '57',
'th232': '59',
'th234': '60',
'pa234': '61',
'u234': '62',
'u235': '63',
'u238': '64',
'np237': '65',
'np239': '66',
'pu238': '67',
'pu239': '68',
'pu240': '69',
'pu241': '70',
'am240': '71',
'am241': '72',
'cm242': '73',
'cm243': '74',
'cm244': '75',
'cs134_137_tot': '76',
'pu239_240_tot': '77',
'pu239_240_iii_iv_tot': '78',
'pu239_240_v_vi_tot': '79',
'cm243_244_tot': '80',
'pu238_pu239_240_tot_ratio': '81',
'am241_pu239_240_tot_ratio': '82',
'cs137_134_ratio': '83',
'cd109': '84',
'eu152': '85',
'fe59': '86',
'gd153': '87',
'ir192': '88',
'pu238_240_tot': '89',
'rb86': '90',
'sc46': '91',
'sn113': '92',
'sn117m': '93',
'tl208': '94',
'mo99': '95',
'tc99m': '96',
'ru105': '97',
'te129': '98',
'te132': '99',
'i132': '100',
'i135': '101',
'cs136': '102',
'tbeta': '103',
'talpha': '104',
'i133': '105',
'th230': '106',
'pa231': '107',
'u236': '108',
'ag111': '109',
'in116m': '110',
'te123m': '111',
'sb127': '112',
'ba133': '113',
'ce139': '114',
'tl201': '116',
'hg203': '117',
'na22': '122',
'pa234m': '123',
'am243': '124',
'se75': '126',
'sr85': '127',
'y88': '128',
'ce140': '129',
'bi212': '130',
'u236_238_ratio': '131',
'i125': '132',
'ba137m': '133',
'u232': '134',
'pa233': '135',
'ru106_rh106_tot': '136',
'tu': '137',
'tbeta40k': '138',
'fe55': '139',
'ce144_pr144_tot': '140',
'pu240_pu239_ratio': '141',
'u233': '142',
'pu239_242_tot': '143',
'ac227': '144'},
'unit': {'Not applicable': '-1',
'NOT AVAILABLE': '0',
'Bq per m3': '1',
'Bq per m2': '2',
'Bq per kg': '3',
'Bq per kgd': '4',
'Bq per kgw': '5',
'kg per kg': '6',
'TU': '7',
'DELTA per mill': '8',
'atom per kg': '9',
'atom per kgd': '10',
'atom per kgw': '11',
'atom per l': '12',
'Bq per kgC': '13'},
'dl': {'Not applicable': '-1',
'Not available': '0',
'Detected value': '1',
'Detection limit': '2',
'Not detected': '3',
'Derived': '4'},
'filt': {'Not applicable': '-1',
'Not available': '0',
'Yes': '1',
'No': '2'}},
'SEDIMENT': {'nuclide': {'NOT APPLICABLE': '-1',
'NOT AVAILABLE': '0',
'h3': '1',
'be7': '2',
'c14': '3',
'k40': '4',
'cr51': '5',
'mn54': '6',
'co57': '7',
'co58': '8',
'co60': '9',
'zn65': '10',
'sr89': '11',
'sr90': '12',
'zr95': '13',
'nb95': '14',
'tc99': '15',
'ru103': '16',
'ru106': '17',
'rh106': '18',
'ag106m': '19',
'ag108': '20',
'ag108m': '21',
'ag110m': '22',
'sb124': '23',
'sb125': '24',
'te129m': '25',
'i129': '28',
'i131': '29',
'cs127': '30',
'cs134': '31',
'cs137': '33',
'ba140': '34',
'la140': '35',
'ce141': '36',
'ce144': '37',
'pm147': '38',
'eu154': '39',
'eu155': '40',
'pb210': '41',
'pb212': '42',
'pb214': '43',
'bi207': '44',
'bi211': '45',
'bi214': '46',
'po210': '47',
'rn220': '48',
'rn222': '49',
'ra223': '50',
'ra224': '51',
'ra225': '52',
'ra226': '53',
'ra228': '54',
'ac228': '55',
'th227': '56',
'th228': '57',
'th232': '59',
'th234': '60',
'pa234': '61',
'u234': '62',
'u235': '63',
'u238': '64',
'np237': '65',
'np239': '66',
'pu238': '67',
'pu239': '68',
'pu240': '69',
'pu241': '70',
'am240': '71',
'am241': '72',
'cm242': '73',
'cm243': '74',
'cm244': '75',
'cs134_137_tot': '76',
'pu239_240_tot': '77',
'pu239_240_iii_iv_tot': '78',
'pu239_240_v_vi_tot': '79',
'cm243_244_tot': '80',
'pu238_pu239_240_tot_ratio': '81',
'am241_pu239_240_tot_ratio': '82',
'cs137_134_ratio': '83',
'cd109': '84',
'eu152': '85',
'fe59': '86',
'gd153': '87',
'ir192': '88',
'pu238_240_tot': '89',
'rb86': '90',
'sc46': '91',
'sn113': '92',
'sn117m': '93',
'tl208': '94',
'mo99': '95',
'tc99m': '96',
'ru105': '97',
'te129': '98',
'te132': '99',
'i132': '100',
'i135': '101',
'cs136': '102',
'tbeta': '103',
'talpha': '104',
'i133': '105',
'th230': '106',
'pa231': '107',
'u236': '108',
'ag111': '109',
'in116m': '110',
'te123m': '111',
'sb127': '112',
'ba133': '113',
'ce139': '114',
'tl201': '116',
'hg203': '117',
'na22': '122',
'pa234m': '123',
'am243': '124',
'se75': '126',
'sr85': '127',
'y88': '128',
'ce140': '129',
'bi212': '130',
'u236_238_ratio': '131',
'i125': '132',
'ba137m': '133',
'u232': '134',
'pa233': '135',
'ru106_rh106_tot': '136',
'tu': '137',
'tbeta40k': '138',
'fe55': '139',
'ce144_pr144_tot': '140',
'pu240_pu239_ratio': '141',
'u233': '142',
'pu239_242_tot': '143',
'ac227': '144'},
'unit': {'Not applicable': '-1',
'NOT AVAILABLE': '0',
'Bq per m3': '1',
'Bq per m2': '2',
'Bq per kg': '3',
'Bq per kgd': '4',
'Bq per kgw': '5',
'kg per kg': '6',
'TU': '7',
'DELTA per mill': '8',
'atom per kg': '9',
'atom per kgd': '10',
'atom per kgw': '11',
'atom per l': '12',
'Bq per kgC': '13'},
'dl': {'Not applicable': '-1',
'Not available': '0',
'Detected value': '1',
'Detection limit': '2',
'Not detected': '3',
'Derived': '4'},
'sed_type': {'Not applicable': '-1',
'Not available': '0',
'Clay': '1',
'Gravel': '2',
'Marsh': '3',
'Mud': '4',
'Muddy sand': '5',
'Sand': '6',
'Fine sand': '7',
'Sandy mud': '8',
'Pebby sand': '9',
'Silt and clay': '10',
'Silt and gravel': '11',
'Silt': '12',
'Silty sand': '13',
'Sludge': '14',
'Turf': '15',
'Very coarse sand': '16',
'Coarse sand': '17',
'Medium sand': '18',
'Very fine sand': '19',
'Coarse silt': '20',
'Medium silt': '21',
'Fine silt': '22',
'Very fine silt': '23',
'Calcareous': '24',
'Glacial': '25',
'Soft': '26',
'Sulphidic': '27',
'Fe Mg concretions': '28',
'Sand and gravel': '29',
'Pure sand': '30',
'Sand and fine sand': '31',
'Sand and clay': '32',
'Sand and mud': '33',
'Fine sand and gravel': '34',
'Fine sand and sand': '35',
'Pure fine sand': '36',
'Fine sand and silt': '37',
'Fine sand and clay': '38',
'Fine sand and mud': '39',
'Silt and sand': '40',
'Silt and fine sand': '41',
'Pure silt': '42',
'Silt and mud': '43',
'Clay and gravel': '44',
'Clay and sand': '45',
'Clay and fine sand': '46',
'Pure clay': '47',
'Clay and silt': '48',
'Clay and mud': '49',
'Glacial clay': '50',
'Soft clay': '51',
'Sulphidic clay': '52',
'Clay and Fe Mg concretions': '53',
'Mud and gravel': '54',
'Mud and sand': '55',
'Mud and fine sand': '56',
'Mud and clay': '57',
'Pure mud': '58',
'Soft mud': '59',
'Sulphidic mud': '60',
'Mud and Fe Mg concretions': '61',
'Sand and silt': '62'}}}
Show the global attributes extracted from the NetCDF file.
contents.global_attrs
{'id': '26VMZZ2Q',
'title': 'Environmental database - Helsinki Commission Monitoring of Radioactive Substances',
'summary': 'MORS Environment database has been used to collate data resulting from monitoring of environmental radioactivity in the Baltic Sea based on HELCOM Recommendation 26/3.\n\nThe database is structured according to HELCOM Guidelines on Monitoring of Radioactive Substances (https://www.helcom.fi/wp-content/uploads/2019/08/Guidelines-for-Monitoring-of-Radioactive-Substances.pdf), which specifies reporting format, database structure, data types and obligatory parameters used for reporting data under Recommendation 26/3.\n\nThe database is updated and quality assured annually by HELCOM MORS EG.',
'keywords': 'oceanography, Earth Science > Oceans > Ocean Chemistry> Radionuclides, Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure, Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments, Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes, Earth Science > Oceans > Water Quality > Ocean Contaminants, Earth Science > Biological Classification > Animals/Vertebrates > Fish, Earth Science > Biosphere > Ecosystems > Marine Ecosystems, Earth Science > Biological Classification > Animals/Invertebrates > Mollusks, Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans, Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)',
'history': 'TBD',
'keywords_vocabulary': 'GCMD Science Keywords',
'keywords_vocabulary_url': 'https://gcmd.earthdata.nasa.gov/static/kms/',
'record': 'TBD',
'featureType': 'TBD',
'cdm_data_type': 'TBD',
'Conventions': 'CF-1.10 ACDD-1.3',
'publisher_name': 'Paul MCGINNITY, Iolanda OSVATH, Florence DESCROIX-COMANDUCCI',
'publisher_email': 'p.mc-ginnity@iaea.org, i.osvath@iaea.org, F.Descroix-Comanducci@iaea.org',
'publisher_url': 'https://maris.iaea.org',
'publisher_institution': 'International Atomic Energy Agency - IAEA',
'creator_name': '[{"creatorType": "author", "name": "HELCOM MORS"}]',
'institution': 'TBD',
'metadata_link': 'TBD',
'creator_email': 'TBD',
'creator_url': 'TBD',
'references': 'TBD',
'license': 'Without prejudice to the applicable Terms and Conditions (https://nucleus.iaea.org/Pages/Others/Disclaimer.aspx), I hereby agree that any use of the data will contain appropriate acknowledgement of the data source(s) and the IAEA Marine Radioactivity Information System (MARIS).',
'comment': 'TBD',
'geospatial_lat_min': '31.17',
'geospatial_lon_min': '9.6333',
'geospatial_lat_max': '65.75',
'geospatial_lon_max': '53.5',
'geospatial_vertical_min': '0.0',
'geospatial_vertical_max': '437.0',
'geospatial_bounds': 'POLYGON ((9.6333 53.5, 31.17 53.5, 31.17 65.75, 9.6333 65.75, 9.6333 53.5))',
'geospatial_bounds_crs': 'EPSG:4326',
'time_coverage_start': '1984-01-10T00:00:00',
'time_coverage_end': '2023-11-30T00:00:00',
'local_time_zone': 'TBD',
'date_created': 'TBD',
'date_modified': 'TBD',
'publisher_postprocess_logs': "Convert 'nuclide' column values to lowercase, strip spaces, and store in 'NUCLIDE' column., Remap data provider nuclide names to standardized MARIS nuclide names., Standardize time format across all dataframes., Encode time as seconds since epoch., Separate sediment entries into distinct rows for Bq/kg and Bq/m² measurements., Sanitize measurement values by removing blanks and standardizing to use the `VALUE` column., Convert from relative error to standard uncertainty., Set the `unit` id column in the DataFrames based on a lookup table., Remap value type to MARIS format., Remap values from 'rubin' to 'SPECIES' for groups: BIOTA., Remap values from 'tissue' to 'BODY_PART' for groups: BIOTA., Remap values from 'SPECIES' to 'BIO_GROUP' for groups: BIOTA., Lookup sediment id using lookup table., Lookup filt value in dataframe using the lookup table., Ensure depth values are floats and add 'SMP_DEPTH' and 'TOT_DEPTH' columns., Remap Sediment slice top and bottom to MARIS format., Lookup dry-wet ratio and format for MARIS., Get geographical coordinates from columns expressed in degrees decimal format or from columns in degrees/minutes decimal format where degrees decimal format is missing or zero., Drop rows with invalid longitude & latitude values. Convert `,` separator to `.` separator."}
Validate NetCDF Enumerations
Verify that enumerated values in the NetCDF file match current MARIS lookup tables.
FEEDBACK TO DATA PROVIDER: The enumeration validation process is a diagnostic step that identifies inconsistencies between NetCDF enumerations and MARIS lookup tables. While this validation does not modify the dataset, it generates detailed feedback about any mismatches or undefined values.
ValidateEnumsCB
ValidateEnumsCB (contents, maris_enums, verbose=False)
Validate enumeration mappings between NetCDF file and MARIS lookup tables.
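The comparison behind this validation can be sketched in plain Python. This is an illustration only, not the `ValidateEnumsCB` implementation: the helper name `find_enum_mismatches` and the toy entry `'Old label'` are hypothetical. It assumes the NetCDF enums store ids as strings (as in the dictionary shown above) while the lookup tables store integers.

```python
from typing import Dict, List, Union

def find_enum_mismatches(
    nc_enum: Dict[str, Union[str, int]],
    lut_enum: Dict[str, Union[str, int]],
) -> List[str]:
    """Return one message per entry that differs between the two enums."""
    messages = []
    for name, nc_id in nc_enum.items():
        if name not in lut_enum:
            # Label exists in the NetCDF file but not in the lookup table
            messages.append(f"'{name}' (id {nc_id}) is missing from the lookup table")
        elif int(lut_enum[name]) != int(nc_id):
            # Same label, different id
            messages.append(
                f"'{name}' has id {nc_id} in NetCDF but {lut_enum[name]} in the lookup table"
            )
    return messages

# Toy data: 'Old label' is a made-up entry used to trigger a mismatch
nc_dl = {'Not applicable': '-1', 'Not available': '0', 'Detected value': '1', 'Old label': '9'}
lut_dl = {'Not applicable': -1, 'Not available': 0, 'Detected value': 1}

mismatches = find_enum_mismatches(nc_dl, lut_dl)
for message in mismatches:
    print(message)
```

Like the callback, the sketch only reports differences and leaves the data untouched.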
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        ValidateEnumsCB(
            contents=contents,
            maris_enums=Enums(lut_src_dir=lut_path())
        ),
    ]
)
tfm()
{'BIOTA': LON LAT SMP_DEPTH TIME NUCLIDE VALUE UNIT \
0 12.316667 54.283333 NaN 1348358400 31 0.010140 5
1 12.316667 54.283333 NaN 1348358400 4 135.300003 5
2 12.316667 54.283333 NaN 1348358400 9 0.013980 5
3 12.316667 54.283333 NaN 1348358400 33 4.338000 5
4 12.316667 54.283333 NaN 1348358400 31 0.009614 5
... ... ... ... ... ... ... ...
16089 21.395000 61.241501 2.0 1652140800 33 13.700000 4
16090 21.395000 61.241501 2.0 1652140800 9 0.500000 4
16091 21.385000 61.343334 NaN 1663200000 4 50.700001 4
16092 21.385000 61.343334 NaN 1663200000 33 0.880000 4
16093 21.385000 61.343334 NaN 1663200000 12 6.600000 4
UNC DL BIO_GROUP SPECIES BODY_PART DRYWT WETWT \
0 NaN 2 4 99 52 174.934433 948.0
1 4.830210 1 4 99 52 174.934433 948.0
2 NaN 2 4 99 52 174.934433 948.0
3 0.150962 1 4 99 52 174.934433 948.0
4 NaN 2 4 99 52 177.935120 964.0
... ... .. ... ... ... ... ...
16089 0.520600 1 11 96 55 NaN NaN
16090 0.045500 1 11 96 55 NaN NaN
16091 4.106700 1 14 129 1 NaN NaN
16092 0.140800 1 14 129 1 NaN NaN
16093 0.349800 1 14 129 1 NaN NaN
PERCENTWT
0 0.18453
1 0.18453
2 0.18453
3 0.18453
4 0.18458
... ...
16089 NaN
16090 NaN
16091 NaN
16092 NaN
16093 NaN
[16094 rows x 15 columns],
'SEAWATER': LON LAT SMP_DEPTH TOT_DEPTH TIME NUCLIDE \
0 29.333300 60.083302 0.0 NaN 1337731200 33
1 29.333300 60.083302 29.0 NaN 1337731200 33
2 23.150000 59.433300 0.0 NaN 1339891200 33
3 27.983299 60.250000 0.0 NaN 1337817600 33
4 27.983299 60.250000 39.0 NaN 1337817600 33
... ... ... ... ... ... ...
21468 13.499833 54.600334 0.0 47.0 1686441600 1
21469 13.499833 54.600334 45.0 47.0 1686441600 1
21470 14.200833 54.600334 0.0 11.0 1686614400 1
21471 14.665500 54.600334 0.0 20.0 1686614400 1
21472 14.330000 54.600334 0.0 17.0 1686614400 1
VALUE UNIT UNC DL FILT
0 5.300000 1 1.696000 1 0
1 19.900000 1 3.980000 1 0
2 25.500000 1 5.100000 1 0
3 17.000000 1 4.930000 1 0
4 22.200001 1 3.996000 1 0
... ... ... ... .. ...
21468 702.838074 1 51.276207 1 0
21469 725.855713 1 52.686260 1 0
21470 648.992920 1 48.154419 1 0
21471 627.178406 1 46.245316 1 0
21472 605.715088 1 45.691143 1 0
[21473 rows x 11 columns],
'SEDIMENT': LON LAT TOT_DEPTH TIME NUCLIDE VALUE \
0 27.799999 60.466667 25.0 1337904000 33 1200.000000
1 27.799999 60.466667 25.0 1337904000 33 250.000000
2 27.799999 60.466667 25.0 1337904000 33 140.000000
3 27.799999 60.466667 25.0 1337904000 33 79.000000
4 27.799999 60.466667 25.0 1337904000 33 29.000000
... ... ... ... ... ... ...
70444 15.537800 54.617832 62.0 1654646400 67 0.044000
70445 15.537800 54.617832 62.0 1654646400 77 2.500000
70446 15.537800 54.617832 62.0 1654646400 4 5873.000000
70447 15.537800 54.617832 62.0 1654646400 33 21.200001
70448 15.537800 54.617832 62.0 1654646400 77 0.370000
UNIT UNC DL SED_TYPE TOP BOTTOM PERCENTWT
0 4 240.000000 1 0 15.0 20.0 NaN
1 4 50.000000 1 0 20.0 25.0 NaN
2 4 29.400000 1 0 25.0 30.0 NaN
3 4 15.800000 1 0 30.0 35.0 NaN
4 4 6.960000 1 0 35.0 40.0 NaN
... ... ... .. ... ... ... ...
70444 4 0.015312 1 10 15.0 17.0 0.257642
70445 4 0.185000 1 10 15.0 17.0 0.257642
70446 4 164.444000 1 10 17.0 19.0 0.263965
70447 4 2.162400 1 10 17.0 19.0 0.263965
70448 4 0.048100 1 10 17.0 19.0 0.263965
[70449 rows x 13 columns]}
Remove Non Compatible Columns
The [RemoveNonCompatibleVariablesCB](https://franckalbinet.github.io/marisco/handlers/data_format_transformation.html#removenoncompatiblevariablescb) callback filters out variables from the NetCDF format that are not listed in the VARS configuration.
RemoveNonCompatibleVariablesCB
RemoveNonCompatibleVariablesCB (vars:Dict[str,str]={'LON': 'longitude', 'LAT': 'latitude', 'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME': 'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE': 'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL': 'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id', 'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL': 'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES': 'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE': 'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT': 'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB': 'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE': 'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME': 'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb', 'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID': 'ref_id', 'SMP_ID': 'samplelabcode'}, verbose:bool=False)
Remove variables not listed in VARS configuration.
| Type | Default | Details |
---|---|---|---|
vars | Dict | {'LON': 'longitude', 'LAT': 'latitude', 'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME': 'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE': 'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL': 'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id', 'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL': 'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES': 'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE': 'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT': 'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB': 'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE': 'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME': 'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb', 'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID': 'ref_id', 'SMP_ID': 'samplelabcode'} | Dictionary mapping OR vars to NC vars |
verbose | bool | False | |
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS, verbose=True),
    ]
)
tfm()
print('\n')
Removing variables that are not compatible with vars provided.
Removing BIO_GROUP from BIOTA dataset.
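The filtering step can be pictured with plain pandas. The sketch below is an illustration rather than the callback's actual code: `drop_unlisted_columns` is a hypothetical helper, and `allowed` is a shortened stand-in for the VARS dictionary.

```python
import pandas as pd

def drop_unlisted_columns(df: pd.DataFrame, allowed: dict):
    """Drop every column whose name is not a key of `allowed`."""
    to_drop = [col for col in df.columns if col not in allowed]
    return df.drop(columns=to_drop), to_drop

# Toy BIOTA frame; BIO_GROUP is absent from the allowed variables
biota = pd.DataFrame({
    'LON': [12.316667],
    'LAT': [54.283333],
    'BIO_GROUP': [4],
    'SPECIES': [99],
})
allowed = {'LON': 'longitude', 'LAT': 'latitude', 'SPECIES': 'species_id'}

cleaned, removed = drop_unlisted_columns(biota, allowed)
print(removed)
print(list(cleaned.columns))
```

Only the dictionary keys matter for filtering in this sketch; the values are the corresponding OpenRefine column names.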
Add Taxon Information
get_taxon_info_lut
get_taxon_info_lut (maris_lut:str, key_names:dict={'Taxonname': 'TAXONNAME', 'Taxonrank': 'TAXONRANK', 'TaxonDB': 'TAXONDB', 'TaxonDBID': 'TAXONDBID', 'TaxonDBURL': 'TAXONDBURL'})
Create lookup dictionary for taxon information from MARIS species lookup table.
AddTaxonInformationCB
AddTaxonInformationCB (fn_lut:Callable=<function <lambda>>, verbose:bool=False)
Add taxon information to BIOTA group based on species lookup table.
| Type | Default | Details |
---|---|---|---|
fn_lut | Callable | <lambda> | Function that returns taxon lookup dictionary |
verbose | bool | False | |
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        AddTaxonInformationCB(
            fn_lut=lut_taxon
        ),
    ]
)
tfm()
print(tfm.dfs['BIOTA'][['TAXONNAME','TAXONRANK','TAXONDB','TAXONDBID','TAXONDBURL']])
TAXONNAME TAXONRANK TAXONDB TAXONDBID \
0 Gadus morhua species Wikidata Q199788
1 Gadus morhua species Wikidata Q199788
2 Gadus morhua species Wikidata Q199788
3 Gadus morhua species Wikidata Q199788
4 Gadus morhua species Wikidata Q199788
... ... ... ... ...
16089 Fucus vesiculosus species Wikidata Q754755
16090 Fucus vesiculosus species Wikidata Q754755
16091 Mytilus edulis species Wikidata Q27855
16092 Mytilus edulis species Wikidata Q27855
16093 Mytilus edulis species Wikidata Q27855
TAXONDBURL
0 https://www.wikidata.org/wiki/Q199788
1 https://www.wikidata.org/wiki/Q199788
2 https://www.wikidata.org/wiki/Q199788
3 https://www.wikidata.org/wiki/Q199788
4 https://www.wikidata.org/wiki/Q199788
... ...
16089 https://www.wikidata.org/wiki/Q754755
16090 https://www.wikidata.org/wiki/Q754755
16091 https://www.wikidata.org/wiki/Q27855
16092 https://www.wikidata.org/wiki/Q27855
16093 https://www.wikidata.org/wiki/Q27855
[16094 rows x 5 columns]
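One way to picture this enrichment is as a join between the SPECIES id and a lookup keyed by that id. The sketch below assumes a lookup shaped as `{species_id: {column: value}}`, which may differ from what `get_taxon_info_lut` actually returns. The single entry reuses the values shown in the output above for species id 99, and the id 12345 is a made-up unknown species.

```python
import pandas as pd

# Assumed lookup shape: species id -> taxon fields
taxon_lut = {
    99: {
        'TAXONNAME': 'Gadus morhua',
        'TAXONRANK': 'species',
        'TAXONDB': 'Wikidata',
        'TAXONDBID': 'Q199788',
        'TAXONDBURL': 'https://www.wikidata.org/wiki/Q199788',
    },
}

# Toy BIOTA frame; 12345 is a hypothetical id with no lookup entry
biota = pd.DataFrame({'SPECIES': [99, 99, 12345]})

# Turn the lookup into a frame indexed by species id, then join on SPECIES
taxon_cols = pd.DataFrame.from_dict(taxon_lut, orient='index')
enriched = biota.join(taxon_cols, on='SPECIES')
print(enriched[['SPECIES', 'TAXONNAME', 'TAXONDBID']])
```

Rows whose species id has no lookup entry end up with missing taxon fields in this sketch.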
Standardize Time
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        DecodeTimeCB(),
    ]
)
tfm()
print(tfm.dfs['BIOTA']['TIME'])
0 2012-09-23
1 2012-09-23
2 2012-09-23
3 2012-09-23
4 2012-09-23
...
16089 2022-05-10
16090 2022-05-10
16091 2022-09-15
16092 2022-09-15
16093 2022-09-15
Name: TIME, Length: 16094, dtype: datetime64[ns]
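The NetCDF TIME values are encoded as seconds since the epoch, so the decoding shown above can be reproduced with `pandas.to_datetime`. This is a minimal sketch of the conversion, not necessarily how `DecodeTimeCB` is implemented. The three values are taken from the BIOTA data shown earlier.

```python
import pandas as pd

# Encoded TIME values (seconds since 1970-01-01) from the BIOTA group
time_encoded = pd.Series([1348358400, 1652140800, 1663200000], name='TIME')

# Interpret the integers as seconds since the epoch
time_decoded = pd.to_datetime(time_encoded, unit='s')
print(time_decoded)
```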
Add Sample Type ID
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        AddSampleTypeIdColumnCB(),
    ]
)
tfm()
print(tfm.dfs['SEAWATER']['SAMPLE_TYPE'].unique())
print(tfm.dfs['BIOTA']['SAMPLE_TYPE'].unique())
print(tfm.dfs['SEDIMENT']['SAMPLE_TYPE'].unique())
[1]
[2]
[3]
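The output above suggests a fixed id per group (SEAWATER 1, BIOTA 2, SEDIMENT 3). A minimal sketch of adding such a column follows. The dictionary name `SAMPLE_TYPE_IDS` is hypothetical, and the callback may obtain these ids differently.

```python
import pandas as pd

# Ids inferred from the output above; the name of this mapping is an assumption
SAMPLE_TYPE_IDS = {'SEAWATER': 1, 'BIOTA': 2, 'SEDIMENT': 3}

# Toy stand-in for the dictionary of dataframes
dfs = {
    'SEAWATER': pd.DataFrame({'VALUE': [5.3, 19.9]}),
    'BIOTA': pd.DataFrame({'VALUE': [0.01014]}),
    'SEDIMENT': pd.DataFrame({'VALUE': [1200.0, 250.0]}),
}

# Broadcast the group's id to every row of that group
for group, df in dfs.items():
    df['SAMPLE_TYPE'] = SAMPLE_TYPE_IDS[group]

for group, df in dfs.items():
    print(group, df['SAMPLE_TYPE'].unique())
```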
Add Reference ID
Include the ref_id (i.e., Zotero Archive Location). The AddZoteroArchiveLocationCB callback looks up the Zotero Archive Location based on the Zotero key defined as id in the global attributes of the MARIS NetCDF file.
contents.global_attrs['id']
'26VMZZ2Q'
AddZoteroArchiveLocationCB
AddZoteroArchiveLocationCB (attrs:str, cfg:dict)
Fetch and append ‘Loc. in Archive’ from Zotero to DataFrame.
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        AddZoteroArchiveLocationCB(contents.global_attrs, cfg=cfg()),
    ]
)
tfm()
print(tfm.dfs['SEAWATER']['REF_ID'].unique())
[100]
Remap to Open Refine specific mappings
FEEDBACK FOR NEXT VERSION: The current approach of remapping to OpenRefine-specific mappings should be reconsidered. Since we already use the MARISCO lookup tables to create the NetCDF enums, extending their use to the OpenRefine data format would remove the need for OpenRefine-specific mappings and streamline the transformation. Let's review the lookup tables used to create the enums for NetCDF:
enums = Enums(lut_src_dir=lut_path())
print(f'DL enums: {enums.types["DL"]}')
print(f'FILT enums: {enums.types["FILT"]}')
DL enums: {'Not applicable': -1, 'Not available': 0, 'Detected value': 1, 'Detection limit': 2, 'Not detected': 3, 'Derived': 4}
FILT enums: {'Not applicable': -1, 'Not available': 0, 'Yes': 1, 'No': 2}
In the detection limit lookup table (LUT), shown below, the values required for the OpenRefine CSV format are listed in the ‘name’ column, whereas the enums use the ‘name_sanitized’ column. The values in the filtered LUT, also shown below, do not match the OpenRefine CSV format either, which uses Y, N and NA.
dl_lut = pd.read_excel(detection_limit_lut_path())
dl_lut
| id | name | name_sanitized |
---|---|---|---|
0 | -1 | Not applicable | Not applicable |
1 | 0 | Not Available | Not available |
2 | 1 | = | Detected value |
3 | 2 | < | Detection limit |
4 | 3 | ND | Not detected |
5 | 4 | DE | Derived |
filtered_lut = pd.read_excel(filtered_lut_path())
filtered_lut
| id | name |
---|---|---|
0 | -1 | Not applicable |
1 | 0 | Not available |
2 | 1 | Yes |
3 | 2 | No |
We will create OpenRefine specific mappings for the detection limit and filtered data:
RemapToORSpecificMappingsCB remaps the values of the detection limit and filtered data to the OpenRefine CSV format.
RemapToORSpecificMappingsCB
RemapToORSpecificMappingsCB (or_mappings:Dict[str,Dict]={'DL': {0: 'ND', 1: '=', 2: '<'}, 'FILT': {0: 'NA', 1: 'Y', 2: 'N'}}, output_format:str='openrefine_csv', verbose:bool=False)
Convert values using OR mappings if columns exist in dataframe.
| Type | Default | Details |
---|---|---|---|
or_mappings | Dict | {'DL': {0: 'ND', 1: '=', 2: '<'}, 'FILT': {0: 'NA', 1: 'Y', 2: 'N'}} | Dictionary of column mappings |
output_format | str | openrefine_csv | |
verbose | bool | False | |
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemapToORSpecificMappingsCB(),
    ]
)
tfm()
# Loop through each group in the 'dfs' dictionary
for group_name, df in tfm.dfs.items():
    # Check if the group dataframe contains any of the columns specified in or_mappings.keys()
    relevant_columns = [col for col in or_mappings.keys() if col in df.columns]
    if relevant_columns:
        # Print the unique values from the relevant columns
        print(f"\nUnique values in {group_name} for columns {relevant_columns}:")
        for col in relevant_columns:
            print(f"{col}: {df[col].unique()}")
    else:
        print(f"No relevant columns found in {group_name} based on or_mappings keys.")
Unique values in BIOTA for columns ['DL']:
DL: ['<' '=' 'ND']
Unique values in SEAWATER for columns ['DL', 'FILT']:
DL: ['=' '<' 'ND']
FILT: ['NA' 'N' 'Y']
Unique values in SEDIMENT for columns ['DL']:
DL: ['=' '<' 'ND']
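The remapping itself amounts to a per-column dictionary lookup. The sketch below uses the default `or_mappings` from the signature above together with `pandas.Series.map`. It is an illustration on toy data, and the callback's internals may differ.

```python
import pandas as pd

# Default mappings taken from the RemapToORSpecificMappingsCB signature
or_mappings = {
    'DL': {0: 'ND', 1: '=', 2: '<'},
    'FILT': {0: 'NA', 1: 'Y', 2: 'N'},
}

# Toy SEAWATER frame with encoded DL and FILT ids
seawater = pd.DataFrame({
    'VALUE': [5.3, 19.9, 25.5],
    'DL': [1, 2, 0],
    'FILT': [0, 2, 1],
})

remapped = seawater.copy()
for col, mapping in or_mappings.items():
    # Only remap columns that exist in this group (e.g. BIOTA has no FILT)
    if col in remapped.columns:
        remapped[col] = remapped[col].map(mapping)

print(remapped)
```

With `Series.map`, any id missing from the mapping becomes NaN, so in this sketch the mapping needs to cover every id present in the data.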
Remap to CSV data type format
CSV_DTYPES (defined in configs.ipynb) defines a state for each variable that contains a lookup table (i.e. enums). The state is either ‘decoded’ or ‘encoded’. Let's review the variable states as a DataFrame:
with pd.option_context('display.max_columns', None, 'display.max_colwidth', None):
    display(pd.DataFrame.from_dict(CSV_DTYPES, orient='index').T)
| AREA | NUCLIDE | UNIT | DL | FILT | COUNT_MET | SAMP_MET | PREP_MET | SPECIES | BODY_PART | SED_TYPE | LAB |
---|---|---|---|---|---|---|---|---|---|---|---|---|
state | decoded | encoded | encoded | decoded | decoded | encoded | encoded | encoded | encoded | encoded | encoded | encoded |
FEEDBACK FOR NEXT VERSION: Should we use the enums in the NetCDF file or the enums in the Marisco package? While they are currently the same, inconsistencies might arise over time. I chose to use the enums in the Marisco package because small changes to the enum descriptions can be easily implemented there, ensuring those updates are reflected in the CSV output.
enums = Enums(lut_src_dir=lut_path())
enums.types.keys()
dict_keys(['AREA', 'BIO_GROUP', 'BODY_PART', 'COUNT_MET', 'DL', 'FILT', 'NUCLIDE', 'PREP_MET', 'SAMP_MET', 'SED_TYPE', 'SPECIES', 'UNIT', 'LAB'])
get_excluded_enums
get_excluded_enums (output_format:str='openrefine_csv')
Get excluded enums based on output format.
DataFormatConversionCB
DataFormatConversionCB (dtypes:Dict, excluded_mappings:Callable=<function get_excluded_enums>, output_format:str='openrefine_csv', verbose:bool=False)
A callback to convert DataFrame enum values between encoded and decoded formats based on specified settings.
| Type | Default | Details |
---|---|---|---|
dtypes | Dict | | Dictionary defining data types and states for each lookup table |
excluded_mappings | Callable | get_excluded_enums | Dictionary of columns to exclude from conversion |
output_format | str | openrefine_csv | |
verbose | bool | False | Flag for verbose output |
contents = ExtractNetcdfContents(fname_in)
tfm = Transformer(
    contents.dfs,
    cbs=[
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS, verbose=True),
        DataFormatConversionCB(
            dtypes=CSV_DTYPES,
            excluded_mappings=get_excluded_enums,
            output_format='openrefine_csv',
            verbose=True
        ),
    ]
)
tfm()
print('\n')
Removing variables that are not compatible with vars provided.
Removing BIO_GROUP from BIOTA dataset.
Loaded enums: dict_keys(['AREA', 'BIO_GROUP', 'BODY_PART', 'COUNT_MET', 'DL', 'FILT', 'NUCLIDE', 'PREP_MET', 'SAMP_MET', 'SED_TYPE', 'SPECIES', 'UNIT', 'LAB'])
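To make the ‘decoded’ versus ‘encoded’ distinction concrete, here is a rough sketch of what such a conversion could look like. It assumes that ‘encoded’ columns keep their integer ids and ‘decoded’ columns are replaced by the enum names, and that excluded columns are skipped. The helper `convert_enum_columns` is hypothetical, the FILT state is used purely for illustration, and `DataFormatConversionCB` may work differently.

```python
import pandas as pd

def convert_enum_columns(df, dtypes, enums, excluded=()):
    """Replace ids by names for columns whose state is 'decoded'."""
    out = df.copy()
    for col, spec in dtypes.items():
        if col not in out.columns or col in excluded:
            continue
        if spec['state'] == 'decoded':
            # Enums map name -> id, so invert them to decode ids
            id_to_name = {v: k for k, v in enums[col].items()}
            out[col] = out[col].map(id_to_name)
        # 'encoded' columns are left as integer ids
    return out

# Enum values copied from the lookups shown earlier in this page
enums = {
    'FILT': {'Not applicable': -1, 'Not available': 0, 'Yes': 1, 'No': 2},
    'UNIT': {'Bq per m3': 1, 'Bq per m2': 2, 'Bq per kg': 3},
}
# Same shape as CSV_DTYPES: one state per variable
dtypes = {'FILT': {'state': 'decoded'}, 'UNIT': {'state': 'encoded'}}

seawater = pd.DataFrame({'FILT': [1, 2, 0], 'UNIT': [1, 1, 1]})
converted = convert_enum_columns(seawater, dtypes, enums)
skipped = convert_enum_columns(seawater, dtypes, enums, excluded=('FILT',))
print(converted)
```

Passing a column in `excluded` leaves it untouched, which is how columns already handled by an earlier step could be protected in this sketch.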
Review all callbacks
contents = ExtractNetcdfContents(fname_in)
output_format = 'openrefine_csv'
tfm = Transformer(
    contents.dfs,
    cbs=[
        ValidateEnumsCB(
            contents=contents,
            maris_enums=Enums(lut_src_dir=lut_path())
        ),
        RemoveNonCompatibleVariablesCB(vars=CSV_VARS),
        RemapToORSpecificMappingsCB(output_format=output_format),
        AddTaxonInformationCB(
            fn_lut=lut_taxon
        ),
        DecodeTimeCB(),
        AddSampleTypeIdColumnCB(),
        AddZoteroArchiveLocationCB(contents.global_attrs, cfg=cfg()),
        DataFormatConversionCB(
            dtypes=CSV_DTYPES,
            excluded_mappings=get_excluded_enums,
            output_format=output_format,
        )
    ]
)
tfm()
for grp in ['SEAWATER', 'BIOTA']:
    display(Markdown(f"<b>Head of the transformed `{grp}` DataFrame:</b>"))
    with pd.option_context('display.max_rows', None):
        display(tfm.dfs[grp].head())
Head of the transformed SEAWATER DataFrame:
| | LON | LAT | SMP_DEPTH | TOT_DEPTH | TIME | NUCLIDE | VALUE | UNIT | UNC | DL | FILT | SAMPLE_TYPE | REF_ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 29.333300 | 60.083302 | 0.0 | NaN | 2012-05-23 | 33 | 5.300000 | 1 | 1.696 | = | NA | 1 | 100 |
1 | 29.333300 | 60.083302 | 29.0 | NaN | 2012-05-23 | 33 | 19.900000 | 1 | 3.980 | = | NA | 1 | 100 |
2 | 23.150000 | 59.433300 | 0.0 | NaN | 2012-06-17 | 33 | 25.500000 | 1 | 5.100 | = | NA | 1 | 100 |
3 | 27.983299 | 60.250000 | 0.0 | NaN | 2012-05-24 | 33 | 17.000000 | 1 | 4.930 | = | NA | 1 | 100 |
4 | 27.983299 | 60.250000 | 39.0 | NaN | 2012-05-24 | 33 | 22.200001 | 1 | 3.996 | = | NA | 1 | 100 |
Head of the transformed BIOTA DataFrame:
| | LON | LAT | SMP_DEPTH | TIME | NUCLIDE | VALUE | UNIT | UNC | DL | SPECIES | ... | DRYWT | WETWT | PERCENTWT | TAXONNAME | TAXONRANK | TAXONDB | TAXONDBID | TAXONDBURL | SAMPLE_TYPE | REF_ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 31 | 0.010140 | 5 | NaN | < | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
1 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 4 | 135.300003 | 5 | 4.830210 | = | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
2 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 9 | 0.013980 | 5 | NaN | < | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
3 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 33 | 4.338000 | 5 | 0.150962 | = | 99 | ... | 174.934433 | 948.0 | 0.18453 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
4 | 12.316667 | 54.283333 | NaN | 2012-09-23 | 31 | 0.009614 | 5 | NaN | < | 99 | ... | 177.935120 | 964.0 | 0.18458 | Gadus morhua | species | Wikidata | Q199788 | https://www.wikidata.org/wiki/Q199788 | 2 | 100 |
5 rows × 21 columns
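In the transformed tables, `TIME` appears as a calendar date, whereas the raw NetCDF content holds integers such as 1348358400. The sketch below shows one way to do that conversion with pandas, assuming the integers are seconds since the Unix epoch (UTC). It is not the `DecodeTimeCB` implementation, only an illustration of the step.

```python
import pandas as pd

# Raw TIME values as they appear in the extracted BIOTA DataFrame
time_s = pd.Series([1348358400, 1652140800])

# Interpret the integers as seconds since 1970-01-01 and format as ISO dates
dates = pd.to_datetime(time_s, unit='s').dt.strftime('%Y-%m-%d')
print(dates.tolist())
```

The first value decodes to 2012-09-23, which matches the `TIME` column of the first BIOTA rows shown above.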
Decode
decode
decode (fname_in:str, dest_out:str|None=None, output_format:str='openrefine_csv', remap_vars:Dict[str,str]={'LON': 'longitude', 'LAT': 'latitude', 'SMP_DEPTH': 'sampdepth', 'TOT_DEPTH': 'totdepth', 'TIME': 'begperiod', 'AREA': 'area', 'NUCLIDE': 'nuclide_id', 'VALUE': 'activity', 'UNIT': 'unit_id', 'UNC': 'uncertaint', 'DL': 'detection', 'FILT': 'filtered', 'COUNT_MET': 'counmet_id', 'SAMP_MET': 'sampmet_id', 'PREP_MET': 'prepmet_id', 'VOL': 'volume', 'SAL': 'salinity', 'TEMP': 'temperatur', 'SPECIES': 'species_id', 'BODY_PART': 'bodypar_id', 'SED_TYPE': 'sedtype_id', 'TOP': 'sliceup', 'BOTTOM': 'slicedown', 'DRYWT': 'drywt', 'WETWT': 'wetwt', 'PERCENTWT': 'percentwt', 'LAB': 'lab_id', 'PROFILE_ID': 'profile_id', 'SAMPLE_TYPE': 'samptype_id', 'TAXONNAME': 'taxonname', 'TAXONREPNAME': 'taxonrepname', 'TAXONRANK': 'taxonrank', 'TAXONDB': 'taxondb', 'TAXONDBID': 'taxondb_id', 'TAXONDBURL': 'taxondb_url', 'REF_ID': 'ref_id', 'SMP_ID': 'samplelabcode'}, remap_dtypes:Dict[str,str]={'AREA': {'state': 'decoded'}, 'NUCLIDE': {'state': 'encoded'}, 'UNIT': {'state': 'encoded'}, 'DL': {'state': 'decoded'}, 'FILT': {'state': 'decoded'}, 'COUNT_MET': {'state': 'encoded'}, 'SAMP_MET': {'state': 'encoded'}, 'PREP_MET': {'state': 'encoded'}, 'SPECIES': {'state': 'encoded'}, 'BODY_PART': {'state': 'encoded'}, 'SED_TYPE': {'state': 'encoded'}, 'LAB': {'state': 'encoded'}}, verbose:bool=False, **kwargs)
Decode data from NetCDF.
| | Type | Default | Details |
---|---|---|---|
fname_in | str | | Input file name |
dest_out | str \| None | None | Output file name (optional) |
output_format | str | openrefine_csv | |
remap_vars | Dict | {‘LON’: ‘longitude’, ‘LAT’: ‘latitude’, ‘SMP_DEPTH’: ‘sampdepth’, ‘TOT_DEPTH’: ‘totdepth’, ‘TIME’: ‘begperiod’, ‘AREA’: ‘area’, ‘NUCLIDE’: ‘nuclide_id’, ‘VALUE’: ‘activity’, ‘UNIT’: ‘unit_id’, ‘UNC’: ‘uncertaint’, ‘DL’: ‘detection’, ‘FILT’: ‘filtered’, ‘COUNT_MET’: ‘counmet_id’, ‘SAMP_MET’: ‘sampmet_id’, ‘PREP_MET’: ‘prepmet_id’, ‘VOL’: ‘volume’, ‘SAL’: ‘salinity’, ‘TEMP’: ‘temperatur’, ‘SPECIES’: ‘species_id’, ‘BODY_PART’: ‘bodypar_id’, ‘SED_TYPE’: ‘sedtype_id’, ‘TOP’: ‘sliceup’, ‘BOTTOM’: ‘slicedown’, ‘DRYWT’: ‘drywt’, ‘WETWT’: ‘wetwt’, ‘PERCENTWT’: ‘percentwt’, ‘LAB’: ‘lab_id’, ‘PROFILE_ID’: ‘profile_id’, ‘SAMPLE_TYPE’: ‘samptype_id’, ‘TAXONNAME’: ‘taxonname’, ‘TAXONREPNAME’: ‘taxonrepname’, ‘TAXONRANK’: ‘taxonrank’, ‘TAXONDB’: ‘taxondb’, ‘TAXONDBID’: ‘taxondb_id’, ‘TAXONDBURL’: ‘taxondb_url’, ‘REF_ID’: ‘ref_id’, ‘SMP_ID’: ‘samplelabcode’} | |
remap_dtypes | Dict | {‘AREA’: {‘state’: ‘decoded’}, ‘NUCLIDE’: {‘state’: ‘encoded’}, ‘UNIT’: {‘state’: ‘encoded’}, ‘DL’: {‘state’: ‘decoded’}, ‘FILT’: {‘state’: ‘decoded’}, ‘COUNT_MET’: {‘state’: ‘encoded’}, ‘SAMP_MET’: {‘state’: ‘encoded’}, ‘PREP_MET’: {‘state’: ‘encoded’}, ‘SPECIES’: {‘state’: ‘encoded’}, ‘BODY_PART’: {‘state’: ‘encoded’}, ‘SED_TYPE’: {‘state’: ‘encoded’}, ‘LAB’: {‘state’: ‘encoded’}} | |
verbose | bool | False | |
kwargs | | | Additional arguments |
Returns | None | | |
fname = Path('../../_data/output/100-HELCOM-MORS-2024.nc')
decode(fname_in=fname, dest_out=fname.with_suffix(''), output_format='openrefine_csv')
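The `remap_vars` argument maps NetCDF variable names to the column names expected in the CSV. A minimal sketch of that renaming step is given below, using a subset of the default mapping from the signature above and a one-row DataFrame with illustrative values. How `decode` names and writes its output files is not shown here, so an in-memory buffer stands in for the destination.

```python
import io
import pandas as pd

# Subset of the `remap_vars` default shown in the `decode` signature
remap_vars = {'LON': 'longitude', 'LAT': 'latitude', 'TIME': 'begperiod',
              'VALUE': 'activity', 'UNC': 'uncertaint'}

# One illustrative row using the NetCDF variable names
df = pd.DataFrame({'LON': [29.3333], 'LAT': [60.0833], 'TIME': ['2012-05-23'],
                   'VALUE': [5.3], 'UNC': [1.696]})

renamed = df.rename(columns=remap_vars)

buf = io.StringIO()  # in-memory stand-in for the output file
renamed.to_csv(buf, index=False)
print(buf.getvalue())
```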