from IPython.display import display, Markdown
OSPAR
This data pipeline, known as a “handler” in Marisco terminology, is designed to clean, standardize, and encode OSPAR data into NetCDF format. The handler processes raw OSPAR data, applying various transformations and lookups to align it with MARIS data standards.
Key functions of this handler:
- Cleans and normalizes raw OSPAR data
- Applies standardized nomenclature and units
- Encodes the processed data into NetCDF format compatible with MARIS requirements
This handler is a crucial component in the Marisco data processing workflow, ensuring OSPAR data is properly integrated into the MARIS database.
For new MARIS users, please refer to Understanding MARIS Data Formats (NetCDF and Open Refine) for detailed information.
This notebook aims to be an instance of Literate Programming: a narrative in which code snippets are interspersed with explanations. When a function or a class needs to be exported to a dedicated Python module (in our case marisco/handlers/ospar.py), the code snippet is added to the module using #| export, as provided by the wonderful nbdev library.
Configuration and File Paths
The handler requires several configuration parameters:
1. src_dir: Path to the maris-crawlers folder containing the OSPAR data in CSV format
2. fname_out_nc: Output path and filename for the NetCDF file (relative paths supported)
3. zotero_key: Key for retrieving dataset attributes from Zotero
FEEDBACK FOR NEXT VERSION: Update src_dir to use Franck’s repository.
Exported source
src_dir = 'https://raw.githubusercontent.com/niallmurphy93/maris-crawlers/refs/heads/main/data/processed/OSPAR'
fname_out_nc = '../../_data/output/191-OSPAR-2024.nc'
zotero_key = 'LQRA4MMK'  # OSPAR MORS zotero key
Load data
OSPAR data is provided as a zipped Microsoft Access database. To facilitate easier access and integration, we process this dataset and convert it into .csv files. These processed files are then made available in the maris-crawlers repository on GitHub. Once converted, the dataset is in a format that is readily compatible with the marisco data pipeline, ensuring seamless data handling and analysis.
read_csv
read_csv (file_name, dir='https://raw.githubusercontent.com/niallmurphy93/maris-crawlers/refs/heads/main/data/processed/OSPAR')
Exported source
default_smp_types = {
    'Biota': 'BIOTA',
    'Seawater': 'SEAWATER',
}
Exported source
def read_csv(file_name, dir=src_dir):
    file_path = f'{dir}/{file_name}'
    return pd.read_csv(file_path)
load_data
load_data (src_url:str, smp_types:dict={'Biota': 'BIOTA', 'Seawater': 'SEAWATER'}, use_cache:bool=False, save_to_cache:bool=False, verbose:bool=False)
Load OSPAR data and return the data in a dictionary of dataframes with the dictionary key as the sample type.
Exported source
def load_data(src_url: str,
              smp_types: dict = default_smp_types,
              use_cache: bool = False,
              save_to_cache: bool = False,
              verbose: bool = False) -> Dict[str, pd.DataFrame]:
    "Load OSPAR data and return the data in a dictionary of dataframes with the dictionary key as the sample type."

    def safe_file_path(url: str) -> str:
        """Safely encode spaces in a URL."""
        return url.replace(" ", "%20")

    def get_file_path(dir_path: str, file_prefix: str) -> str:
        """Construct the full file path based on directory and file prefix."""
        file_path = f"{dir_path}/{file_prefix} data.csv"
        return safe_file_path(file_path) if not use_cache else file_path

    def load_and_process_csv(file_path: str) -> pd.DataFrame:
        """Load a CSV file and process it."""
        if use_cache and not Path(file_path).exists():
            if verbose:
                print(f"{file_path} not found in cache.")
            return pd.DataFrame()

        if verbose:
            start_time = time.time()

        try:
            df = pd.read_csv(file_path)
            df.columns = df.columns.str.lower()
            if verbose:
                print(f"Data loaded from {file_path} in {time.time() - start_time:.2f} seconds.")
            return df
        except Exception as e:
            if verbose:
                print(f"Failed to load {file_path}: {e}")
            return pd.DataFrame()

    def save_to_cache_dir(df: pd.DataFrame, file_prefix: str):
        """Save the DataFrame to the cache directory."""
        cache_dir = cache_path()
        cache_file_path = f"{cache_dir}/{file_prefix} data.csv"
        df.to_csv(cache_file_path, index=False)
        if verbose:
            print(f"Data saved to cache at {cache_file_path}")

    data = {}
    for file_prefix, smp_type in smp_types.items():
        dir_path = cache_path() if use_cache else src_url
        file_path = get_file_path(dir_path, file_prefix)
        df = load_and_process_csv(file_path)

        if save_to_cache and not df.empty:
            save_to_cache_dir(df, file_prefix)

        data[smp_type] = df

    return data
load_data(src_dir, save_to_cache=True, verbose=True)
Data loaded from https://raw.githubusercontent.com/niallmurphy93/maris-crawlers/refs/heads/main/data/processed/OSPAR/Biota%20data.csv in 0.43 seconds.
Data saved to cache at /home/niallmurphy93/.marisco/cache/Biota data.csv
Data loaded from https://raw.githubusercontent.com/niallmurphy93/maris-crawlers/refs/heads/main/data/processed/OSPAR/Seawater%20data.csv in 0.39 seconds.
Data saved to cache at /home/niallmurphy93/.marisco/cache/Seawater data.csv
{'BIOTA': id contracting party rsc sub-division station id \
0 1 Belgium 8 Kloosterzande-Schelde
1 2 Belgium 8 Kloosterzande-Schelde
2 3 Belgium 8 Kloosterzande-Schelde
3 4 Belgium 8 Kloosterzande-Schelde
4 5 Belgium 8 Kloosterzande-Schelde
... ... ... ... ...
15946 98058 Sweden 12 Ringhals (R22)
15947 98059 Sweden 12 Ringhals (R23)
15948 98060 Sweden 11 SW7
15949 98061 Sweden 11 SW6a
15950 98062 Sweden 12 Ringhals (R25)
sample id latd latm lats latdir longd ... sampling date \
0 DA 17531 51 23.0 36.0 N 4 ... 03/03/10 00:00:00
1 DA 17534 51 23.0 36.0 N 4 ... 06/14/10 00:00:00
2 DA 17537 51 23.0 36.0 N 4 ... 09/27/10 00:00:00
3 DA 17540 51 23.0 36.0 N 4 ... 12/08/10 00:00:00
4 DA 17531 51 23.0 36.0 N 4 ... 03/03/10 00:00:00
... ... ... ... ... ... ... ... ...
15946 NaN 57 15.0 9.0 N 12 ... 08/09/22 00:00:00
15947 NaN 57 18.0 23.0 N 12 ... 09/23/22 00:00:00
15948 NaN 58 36.0 12.0 N 11 ... 11/07/22 00:00:00
15949 NaN 57 18.0 9.0 N 11 ... 09/20/22 00:00:00
15950 NaN 57 20.0 7.0 N 12 ... 09/02/22 00:00:00
nuclide value type activity or mda uncertainty unit \
0 137Cs < 0.326416 NaN Bq/kg f.w.
1 137Cs < 0.442704 NaN Bq/kg f.w.
2 137Cs < 0.412989 NaN Bq/kg f.w.
3 137Cs < 0.202768 NaN Bq/kg f.w.
4 226Ra < 0.652833 NaN Bq/kg f.w.
... ... ... ... ... ...
15946 137Cs = 0.384000 0.024192 Bq/kg f.w.
15947 137Cs = 0.456000 0.024168 Bq/kg f.w.
15948 137Cs = 0.122000 0.062000 Bq/kg f.w.
15949 137Cs < 0.310000 NaN Bq/kg f.w.
15950 137Cs = 0.306000 0.014382 Bq/kg f.w.
data provider measurement comment \
0 SCK•CEN NaN
1 SCK•CEN NaN
2 SCK•CEN NaN
3 SCK•CEN NaN
4 SCK•CEN NaN
... ... ...
15946 Swedish Radiation Safety Authority NaN
15947 Swedish Radiation Safety Authority NaN
15948 Swedish Radiation Safety Authority NaN
15949 Swedish Radiation Safety Authority NaN
15950 Swedish Radiation Safety Authority NaN
sample comment reference comment
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
... ... ...
15946 converted from dw to fw NaN
15947 converted from dw to fw NaN
15948 NaN NaN
15949 NaN NaN
15950 converted from dw to fw NaN
[15951 rows x 27 columns],
'SEAWATER': id contracting party rsc sub-division station id sample id \
0 1 Belgium 8.0 Belgica-W01 WNZ 01
1 2 Belgium 8.0 Belgica-W02 WNZ 02
2 3 Belgium 8.0 Belgica-W03 WNZ 03
3 4 Belgium 8.0 Belgica-W04 WNZ 04
4 5 Belgium 8.0 Belgica-W05 WNZ 05
... ... ... ... ... ...
19188 120364 Ireland 4.0 N2 NaN
19189 120365 Ireland 4.0 N3 NaN
19190 120366 Ireland 4.0 N8 NaN
19191 120367 Ireland 4.0 N9 NaN
19192 120368 Ireland 4.0 N10 NaN
latd latm lats latdir longd ... sampling date nuclide \
0 51 22.0 31.0 N 3 ... 01/27/10 00:00:00 137Cs
1 51 13.0 25.0 N 2 ... 01/27/10 00:00:00 137Cs
2 51 11.0 4.0 N 2 ... 01/27/10 00:00:00 137Cs
3 51 25.0 13.0 N 3 ... 01/27/10 00:00:00 137Cs
4 51 24.0 58.0 N 2 ... 01/26/10 00:00:00 137Cs
... ... ... ... ... ... ... ... ...
19188 53 36.0 0.0 N 5 ... NaN NaN
19189 53 44.0 0.0 N 5 ... NaN NaN
19190 53 39.0 0.0 N 5 ... NaN NaN
19191 53 53.0 0.0 N 5 ... NaN NaN
19192 53 52.0 0.0 N 5 ... NaN NaN
value type activity or mda uncertainty unit data provider \
0 < 0.20 NaN Bq/l SCK•CEN
1 < 0.27 NaN Bq/l SCK•CEN
2 < 0.26 NaN Bq/l SCK•CEN
3 < 0.25 NaN Bq/l SCK•CEN
4 < 0.20 NaN Bq/l SCK•CEN
... ... ... ... ... ...
19188 NaN NaN NaN NaN NaN
19189 NaN NaN NaN NaN NaN
19190 NaN NaN NaN NaN NaN
19191 NaN NaN NaN NaN NaN
19192 NaN NaN NaN NaN NaN
measurement comment sample comment \
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
... ... ...
19188 2021 data The Irish Navy attempted a few times to collec...
19189 2021 data The Irish Navy attempted a few times to collec...
19190 2021 data The Irish Navy attempted a few times to collec...
19191 2021 data The Irish Navy attempted a few times to collec...
19192 2021 data The Irish Navy attempted a few times to collec...
reference comment
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
... ...
19188 NaN
19189 NaN
19190 NaN
19191 NaN
19192 NaN
[19193 rows x 25 columns]}
Nuclide Name Normalization
FEEDBACK FOR NEXT VERSION: In the lookup at nuc_lut_path, do we need nc_name? We used nc_name when we were pivoting the table from long to wide format. Should we remove it?
We are standardizing the nuclide names in the OSPAR dataset to align with the standardized names provided in the MARISCO lookup table. The lookup process utilizes three key columns:
- nuclide_id: This serves as a unique identifier for each nuclide.
- nuclide: Represents the standardized name of the nuclide as per our conventions.
- nc_name: Denotes the corresponding name used in NetCDF files.
Below, we will examine the structure and contents of the lookup table:
nuc_lut_df = pd.read_excel(nuc_lut_path())
nuc_lut_df.head()
nuclide_id | nuclide | atomicnb | massnb | nusymbol | half_life | hl_unit | nc_name | |
---|---|---|---|---|---|---|---|---|
0 | -1 | NOT APPLICABLE | NaN | NaN | NaN | NaN | NaN | NOT APPLICABLE |
1 | 0 | NOT AVAILABLE | 0.0 | 0.0 | 0 | 0.00 | - | NOT AVAILABLE |
2 | 1 | TRITIUM | 1.0 | 3.0 | 3H | 12.35 | Y | h3 |
3 | 2 | BERYLLIUM | 4.0 | 7.0 | 7Be | 53.30 | D | be7 |
4 | 3 | CARBON | 6.0 | 14.0 | 14C | 5730.00 | Y | c14 |
OSPAR reports the measured nuclide in the nuclide column. However, as shown below, the nuclide names are not standardized.
dfs = load_data(src_dir, use_cache=True, verbose=True)
df = get_unique_across_dfs(dfs, 'nuclide', as_df=True)
df.T
Data loaded from /home/niallmurphy93/.marisco/cache/Biota data.csv in 0.05 seconds.
Data loaded from /home/niallmurphy93/.marisco/cache/Seawater data.csv in 0.04 seconds.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
value | 210Po | 239,240Pu | 228Ra | Cs-137 | 3H | 99Tc | 226Ra | 239, 240 Pu | NaN | CS-137 | 137Cs | 137Cs | 210Po | 99Tc | 238Pu | 210Pb | 241Am | 99Tc |
Lower & strip nuclide names
To streamline the process of standardizing nuclide data, we employ the LowerStripNameCB callback. This function is applied to each DataFrame within our dictionary of DataFrames. Specifically, LowerStripNameCB simplifies the nuclide names by converting them to lowercase and removing any leading or trailing whitespace.
dfs = load_data(src_dir, use_cache=True, verbose=True)
tfm = Transformer(dfs, cbs=[LowerStripNameCB(col_src='nuclide', col_dst='nuclide')])
dfs_output = tfm()
for key, df in dfs_output.items():
    print(f'{key} nuclides: ')
    print(df['nuclide'].unique())
Data loaded from /home/niallmurphy93/.marisco/cache/Biota data.csv in 0.04 seconds.
Data loaded from /home/niallmurphy93/.marisco/cache/Seawater data.csv in 0.04 seconds.
BIOTA nuclides:
['137cs' '226ra' '228ra' '239,240pu' '99tc' '210po' '210pb' '3h' 'cs-137'
'238pu' '239, 240 pu' '241am']
SEAWATER nuclides:
['137cs' '239,240pu' '226ra' '228ra' '99tc' '3h' '210po' '210pb' nan]
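As a rough illustration of what lowercasing and stripping does to a column, here is a minimal pandas sketch. It is not the LowerStripNameCB implementation: the helper name `lower_strip` and the sample values are assumptions made for this example only.

```python
import pandas as pd

def lower_strip(df: pd.DataFrame, col_src: str, col_dst: str) -> pd.DataFrame:
    """Return a copy of df where col_dst holds the lowercased, stripped col_src."""
    out = df.copy()
    # The .str accessor skips missing values instead of raising on them
    out[col_dst] = out[col_src].str.lower().str.strip()
    return out

# Hypothetical sample mimicking the inconsistent spellings seen above
sample = pd.DataFrame({'nuclide': ['Cs-137', ' 137Cs ', 'CS-137', None]})
cleaned = lower_strip(sample, col_src='nuclide', col_dst='nuclide')
print(cleaned['nuclide'].tolist())
```

Note that this step only removes case and whitespace differences: 'cs-137' and '137cs' remain distinct and still need to be remapped.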
Remap nuclide names to MARIS data formats
FEEDBACK TO DATA PROVIDER: The nuclide column has inconsistent naming. E.g.:
- Cs-137, 137Cs or CS-137
- 239, 240 pu or 239,240 pu
- ra-226 and 226ra
See below:
get_unique_across_dfs(dfs, col_name='nuclide', as_df=True).T
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
value | 210Po | 239,240Pu | 228Ra | Cs-137 | 3H | 99Tc | 226Ra | 239, 240 Pu | NaN | CS-137 | 137Cs | 137Cs | 210Po | 99Tc | 238Pu | 210Pb | 241Am | 99Tc |
Next, we map nuclide names used by OSPAR to the MARIS standard nuclide names.
Remapping data provider nomenclatures to MARIS standards is a recurrent operation and is done in a semi-automated manner according to the following pattern:
- Inspect data provider nomenclature;
- Match automatically against MARIS nomenclature (using a fuzzy matching algorithm);
- Fix potential mismatches;
- Apply the lookup table to the dataframe.
We will refer to this process as IMFA (Inspect, Match, Fix, Apply).
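To make the "Match" step more concrete, the sketch below scores candidate names with a plain Levenshtein edit distance and keeps the closest one. This is an assumption about how such a score can behave (0 means identical, higher means more edits), not a description of how Remapper is implemented; `levenshtein`, `best_match` and the short list of MARIS names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca by cb
        prev = curr
    return prev[-1]

def best_match(name: str, candidates: list) -> tuple:
    """Return (candidate, score) for the candidate with the smallest distance."""
    scored = [(c, levenshtein(name, c)) for c in candidates]
    return min(scored, key=lambda pair: pair[1])

# Hypothetical subset of standardized names
maris_names = ['h3', 'be7', 'c14', 'cs137', 'tc99']
print(best_match('cs-137', maris_names))
```

A low score does not guarantee a correct match, which is why the "Fix" step reviews the proposals before they are applied.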
Let’s now create an instance of a fuzzy matching algorithm Remapper. This instance will align the nuclide names from the OSPAR dataset with the MARIS standard nuclide names, as defined in the lookup table located at nuc_lut_path and previously shown as nuc_lut_df.
remapper = Remapper(provider_lut_df=get_unique_across_dfs(dfs_output, col_name='nuclide', as_df=True),
                    maris_lut_fn=nuc_lut_path,
                    maris_col_id='nuclide_id',
                    maris_col_name='nc_name',
                    provider_col_to_match='value',
                    provider_col_key='value',
                    fname_cache='nuclides_ospar.pkl')
Now, we can automatically match the OSPAR nuclide names to the MARIS standard. The match_score column helps us evaluate the results.
remapper.generate_lookup_table(as_df=True)
remapper.select_match(match_score_threshold=1, verbose=True)
Processing: 0%| | 0/13 [00:00<?, ?it/s]Processing: 100%|██████████| 13/13 [00:00<00:00, 50.27it/s]
1 entries matched the criteria, while 12 entries had a match score of 1 or higher.
matched_maris_name | source_name | match_score | |
---|---|---|---|
source_key | |||
239, 240 pu | pu240 | 239, 240 pu | 8 |
239,240pu | pu240 | 239,240pu | 6 |
228ra | u235 | 228ra | 4 |
137cs | i133 | 137cs | 4 |
210pb | ru106 | 210pb | 4 |
241am | pu241 | 241am | 4 |
226ra | u234 | 226ra | 4 |
210po | ru106 | 210po | 4 |
238pu | u238 | 238pu | 3 |
99tc | tu | 99tc | 3 |
3h | tu | 3h | 2 |
cs-137 | cs137 | cs-137 | 1 |
We now manually review the unmatched nuclide names and construct a dictionary to map them to the MARIS standard.
The dictionary fixes_nuclide_names applies manual corrections to the nuclide names before the remapping process begins. The generate_lookup_table function constructs a lookup table for this purpose and includes an overwrite parameter, set to True by default. When activated, this parameter enables the function to update the existing cache with a new pickle file containing the updated lookup table. We are now prepared to test the remapping process.
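The contents of fixes_nuclide_names are not shown in this notebook, so the following is only a hedged sketch of the idea: a plain dictionary of provider name to MARIS nc_name, consulted before falling back to the automatic match. The dictionary `fixes_sketch` and the helper `resolve_name` are hypothetical; the name pairs are taken from the matching and remapping outputs printed in this notebook.

```python
# Hypothetical manual corrections: provider name -> MARIS nc_name
fixes_sketch = {
    '137cs': 'cs137',
    '3h': 'h3',
    '99tc': 'tc99',
    '239, 240 pu': 'pu239_240_tot',
}

def resolve_name(source_name: str, auto_match: str, fixes: dict) -> str:
    """Prefer a manual fix over the automatic match when one is registered."""
    return fixes.get(source_name, auto_match)

# The fuzzy matcher proposed 'i133' for '137cs'; the manual fix overrides it
print(resolve_name('137cs', 'i133', fixes_sketch))
# This sketch registers no fix for 'cs-137', so the automatic match is kept
print(resolve_name('cs-137', 'cs137', fixes_sketch))
```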
remapper.generate_lookup_table(as_df=True, fixes=fixes_nuclide_names)
fc.test_eq(len(remapper.select_match(match_score_threshold=1)), 0)
Processing: 0%| | 0/13 [00:00<?, ?it/s]Processing: 100%|██████████| 13/13 [00:00<00:00, 48.19it/s]
To view all remapped nuclides in a DataFrame, set the match_score_threshold to 0 and enable as_df. Disabling as_df provides a more detailed response that includes the matched_id. This matched_id serves as the unique integer key in the lookup table, establishing a one-to-one relationship between each integer and the standardized MARIS nuclide names.
as_df = False
remapper.generate_lookup_table(as_df=as_df, fixes=fixes_nuclide_names)
matches = remapper.select_match(match_score_threshold=0, verbose=True)
if as_df:
    display(matches.T)
else:
    print(matches)
Processing: 0%| | 0/13 [00:00<?, ?it/s]Processing: 100%|██████████| 13/13 [00:00<00:00, 52.95it/s]
0 entries matched the criteria, while 13 entries had a match score of 0 or higher.
{'228ra': Match(matched_id=54, matched_maris_name='ra228', source_name='228ra', match_score=0), '3h': Match(matched_id=1, matched_maris_name='h3', source_name='3h', match_score=0), '137cs': Match(matched_id=33, matched_maris_name='cs137', source_name='137cs', match_score=0), 'cs-137': Match(matched_id=33, matched_maris_name='cs137', source_name='cs-137', match_score=0), nan: Match(matched_id=-1, matched_maris_name='Unknown', source_name=nan, match_score=0), '238pu': Match(matched_id=67, matched_maris_name='pu238', source_name='238pu', match_score=0), '210pb': Match(matched_id=41, matched_maris_name='pb210', source_name='210pb', match_score=0), '241am': Match(matched_id=72, matched_maris_name='am241', source_name='241am', match_score=0), '226ra': Match(matched_id=53, matched_maris_name='ra226', source_name='226ra', match_score=0), '99tc': Match(matched_id=15, matched_maris_name='tc99', source_name='99tc', match_score=0), '210po': Match(matched_id=47, matched_maris_name='po210', source_name='210po', match_score=0), '239,240pu': Match(matched_id=77, matched_maris_name='pu239_240_tot', source_name='239,240pu', match_score=0), '239, 240 pu': Match(matched_id=77, matched_maris_name='pu239_240_tot', source_name='239, 240 pu', match_score=0)}
The nuclide names have been successfully remapped. We now create a callback named RemapNuclideNameCB to translate the OSPAR dataset’s nuclide names into the standard nuclide_ids used by MARIS. This callback employs the lut_nuclides lambda function, which provides the required lookup table. Note that the overwrite=False parameter is specified in the Remapper constructor of the lut_nuclides lambda function to utilize the cached version.
RemapNuclideNameCB
RemapNuclideNameCB (fn_lut:Callable, col_name:str)
Remap data provider nuclide names to standardized MARIS nuclide names.
Type | Details | |
---|---|---|
fn_lut | Callable | Function that returns the lookup table dictionary |
col_name | str | Column name to remap |
Let’s see it in action, along with the LowerStripNameCB callback:
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    LowerStripNameCB(col_src='nuclide', col_dst='nuclide'),
    RemapNuclideNameCB(lut_nuclides, col_name='nuclide')
    ])
dfs_out = tfm()

# For instance
for key in dfs_out.keys():
    print(f'Unique nuclide_ids for {key} NUCLIDE column: ', dfs_out[key]['NUCLIDE'].unique())
Unique nuclide_ids for BIOTA NUCLIDE column: [33 53 54 77 15 47 41 1 67 72]
Unique nuclide_ids for SEAWATER NUCLIDE column: [33 77 53 54 15 1 47 41 -1]
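The effect of the remapping can be approximated with a dictionary lookup on a pandas Series. The sketch below is an assumption-laden illustration, not the RemapNuclideNameCB code: `lut_sketch` and `remap_to_id` are hypothetical names, and the ids (33, 1, 15, and -1 for missing names) are copied from the outputs above.

```python
import pandas as pd

# Hypothetical lookup: cleaned provider name -> MARIS nuclide_id
lut_sketch = {'137cs': 33, 'cs-137': 33, '3h': 1, '99tc': 15}

def remap_to_id(names: pd.Series, lut: dict, missing_id: int = -1) -> pd.Series:
    """Map names to integer ids; names absent from the lookup get missing_id."""
    return names.map(lut).fillna(missing_id).astype(int)

names = pd.Series(['137cs', 'cs-137', '3h', None])
nuclide_ids = remap_to_id(names, lut_sketch)
print(nuclide_ids.tolist())
```

Both spellings of caesium-137 collapse onto the same id, which is the point of the remapping.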
Standardize Time
FEEDBACK TO DATA PROVIDER: ‘NaN’ values found for sampling date column in the SEAWATER dataset.
dfs = load_data(src_dir, use_cache=True)

for key in dfs.keys():
    if dfs[key]['sampling date'].isnull().sum() > 0:
        print(f"NaN values found for 'sampling date' in {key} dataset. A total of {dfs[key]['sampling date'].isnull().sum()} NaN values found.")
        print(f'Example:')
        with pd.option_context('display.max_columns', None):
            display(dfs[key][dfs[key]['sampling date'].isnull()].head(2))
    else:
        print(f"No NaN values found for 'sampling date' in {key} dataset.")
No NaN values found for 'sampling date' in BIOTA dataset.
NaN values found for 'sampling date' in SEAWATER dataset. A total of 10 NaN values found.
Example:
id | contracting party | rsc sub-division | station id | sample id | latd | latm | lats | latdir | longd | longm | longs | longdir | sample type | sampling depth | sampling date | nuclide | value type | activity or mda | uncertainty | unit | data provider | measurement comment | sample comment | reference comment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14776 | 97948 | Sweden | 11.0 | SW7 | 1 | 58 | 36.0 | 12.0 | N | 11 | 14.0 | 42.0 | E | WATER | 1.0 | NaN | 3H | NaN | NaN | NaN | Bq/l | Swedish Radiation Safety Authority | no 3H this year due to broken LSC | NaN | NaN |
14780 | 97952 | Sweden | 12.0 | Ringhals (R35) | 7 | 57 | 14.0 | 5.0 | N | 11 | 56.0 | 8.0 | E | WATER | 1.0 | NaN | 3H | NaN | NaN | NaN | Bq/l | Swedish Radiation Safety Authority | no 3H this year due to broken LSC | NaN | NaN |
We create a callback that parses the date-time strings in the dictionary of DataFrames (format %m/%d/%y %H:%M:%S) into datetime objects and, in the process, handles missing dates and times.
ParseTimeCB
ParseTimeCB (col_src:dict={'BIOTA': 'sampling date', 'SEAWATER': 'sampling date'}, col_dst:str='TIME', format:str='%m/%d/%y %H:%M:%S')
Parse the time format in the dataframe and check for inconsistencies.
Type | Default | Details | |
---|---|---|---|
col_src | dict | {‘BIOTA’: ‘sampling date’, ‘SEAWATER’: ‘sampling date’} | Column name to remap |
col_dst | str | TIME | Column name to remap |
format | str | %m/%d/%y %H:%M:%S | Time format |
Apply the transformer for callback ParseTimeCB.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    ParseTimeCB(),
    CompareDfsAndTfmCB(dfs)])
tfm()

display(Markdown("<b> Row Count Comparison Before and After Transformation:</b>"))
with pd.option_context('display.max_rows', None):
    display(pd.DataFrame.from_dict(tfm.compare_stats))

display(Markdown("<b> Example of parsed time column:</b>"))
with pd.option_context('display.max_rows', None):
    display(tfm.dfs['SEAWATER']['TIME'].head(2))
10 invalid rows found in group 'SEAWATER' during time parsing callback (ParseTimeCB).
Row Count Comparison Before and After Transformation:
BIOTA | SEAWATER | |
---|---|---|
Number of rows in original dataframes (dfs): | 15951 | 19193 |
Number of rows in transformed dataframes (tfm.dfs): | 15951 | 19183 |
Number of rows removed (tfm.dfs_removed): | 0 | 10 |
Example of parsed time column:
0 2010-01-27
1 2010-01-27
Name: TIME, dtype: datetime64[ns]
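For readers who want to see the parsing logic in isolation, here is a minimal pandas sketch of the same idea: parse with an explicit format, coerce unparseable entries to NaT, and separate them from the valid rows. The function `parse_time` and the sample rows are hypothetical and do not reproduce ParseTimeCB itself.

```python
import pandas as pd

def parse_time(df: pd.DataFrame, col_src: str = 'sampling date',
               col_dst: str = 'TIME', fmt: str = '%m/%d/%y %H:%M:%S'):
    """Parse col_src into datetimes; return (valid rows, rows that failed to parse)."""
    out = df.copy()
    # errors='coerce' turns missing or malformed dates into NaT instead of raising
    out[col_dst] = pd.to_datetime(out[col_src], format=fmt, errors='coerce')
    invalid = out[col_dst].isna()
    return out[~invalid], out[invalid]

sample = pd.DataFrame({'sampling date': ['01/27/10 00:00:00', None, '09/02/22 00:00:00']})
kept, removed = parse_time(sample)
print(kept['TIME'].dt.strftime('%Y-%m-%d').tolist())
print(f'{len(removed)} invalid row(s) set aside')
```

Keeping the rejected rows in a separate frame, rather than silently dropping them, makes it possible to report them back to the data provider.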
The NetCDF time format requires the time to be encoded as the number of milliseconds since a time of origin. In our case the time of origin is 1970-01-01, as indicated in the CONFIGS['units']['time'] dictionary in configs.ipynb.
EncodeTimeCB transforms the datetime object from ParseTimeCB into the MARIS NetCDF time format.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    ParseTimeCB(),
    EncodeTimeCB(),
    CompareDfsAndTfmCB(dfs)
    ])
tfm()

display(Markdown("<b> Row Count Comparison Before and After Transformation:</b>"))
with pd.option_context('display.max_rows', None):
    display(pd.DataFrame.from_dict(tfm.compare_stats))
10 invalid rows found in group 'SEAWATER' during time parsing callback (ParseTimeCB).
Row Count Comparison Before and After Transformation:
BIOTA | SEAWATER | |
---|---|---|
Number of rows in original dataframes (dfs): | 15951 | 19193 |
Number of rows in transformed dataframes (tfm.dfs): | 15951 | 19183 |
Number of rows removed (tfm.dfs_removed): | 0 | 10 |
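Encoding a datetime as a count of time units since an origin can be sketched as follows. This is not the EncodeTimeCB implementation: `encode_time` is a hypothetical helper, and the default unit of milliseconds simply mirrors the wording above. The unit that actually applies is the one defined in the marisco configuration.

```python
import pandas as pd

def encode_time(times: pd.Series, unit: str = 'ms',
                origin: str = '1970-01-01') -> pd.Series:
    """Encode datetimes as whole numbers of `unit` elapsed since `origin`."""
    elapsed = times - pd.Timestamp(origin)
    # Floor-dividing a timedelta by one unit yields an integer count of units
    return elapsed // pd.Timedelta(1, unit=unit)

times = pd.Series(pd.to_datetime(['1970-01-02', '2010-01-27']))
encoded = encode_time(times)
print(encoded.tolist())
```

Passing `unit='s'` to the same helper would give seconds instead, so the sketch stays valid whichever unit the configuration specifies.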
Sanitize value
We create a callback, SanitizeValueCB, to consolidate measurement values into a single column named VALUE and remove any NaN entries.
SanitizeValueCB
SanitizeValueCB (value_col:dict={'BIOTA': 'activity or mda', 'SEAWATER': 'activity or mda'})
Sanitize value by removing blank entries and populating value column.
Type | Default | Details | |
---|---|---|---|
value_col | dict | {‘BIOTA’: ‘activity or mda’, ‘SEAWATER’: ‘activity or mda’} | Column name to sanitize |
Exported source
value_cols = {'BIOTA': 'activity or mda', 'SEAWATER': 'activity or mda'}
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    SanitizeValueCB(),
    CompareDfsAndTfmCB(dfs)])
tfm()

display(Markdown("<b> Example of VALUE column:</b>"))
with pd.option_context('display.max_rows', None):
    display(tfm.dfs['SEAWATER'][['VALUE']].head())

display(Markdown("<b> Row Count Comparison Before and After Transformation:</b>"))
with pd.option_context('display.max_rows', None):
    display(pd.DataFrame.from_dict(tfm.compare_stats))

display(Markdown("<b> Example of removed data:</b>"))
with pd.option_context('display.max_columns', None):
    display(tfm.dfs_removed['SEAWATER'].head(2))
10 invalid rows found in group 'SEAWATER' during sanitize value callback.
Example of VALUE column:
VALUE | |
---|---|
0 | 0.20 |
1 | 0.27 |
2 | 0.26 |
3 | 0.25 |
4 | 0.20 |
Row Count Comparison Before and After Transformation:
BIOTA | SEAWATER | |
---|---|---|
Number of rows in original dataframes (dfs): | 15951 | 19193 |
Number of rows in transformed dataframes (tfm.dfs): | 15951 | 19183 |
Number of rows removed (tfm.dfs_removed): | 0 | 10 |
Example of removed data:
id | contracting party | rsc sub-division | station id | sample id | latd | latm | lats | latdir | longd | longm | longs | longdir | sample type | sampling depth | sampling date | nuclide | value type | activity or mda | uncertainty | unit | data provider | measurement comment | sample comment | reference comment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
14776 | 97948 | Sweden | 11.0 | SW7 | 1 | 58 | 36.0 | 12.0 | N | 11 | 14.0 | 42.0 | E | WATER | 1.0 | NaN | 3H | NaN | NaN | NaN | Bq/l | Swedish Radiation Safety Authority | no 3H this year due to broken LSC | NaN | NaN |
14780 | 97952 | Sweden | 12.0 | Ringhals (R35) | 7 | 57 | 14.0 | 5.0 | N | 11 | 56.0 | 8.0 | E | WATER | 1.0 | NaN | 3H | NaN | NaN | NaN | Bq/l | Swedish Radiation Safety Authority | no 3H this year due to broken LSC | NaN | NaN |
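The sanitizing step boils down to copying the measurement into a VALUE column and setting aside rows where it is missing. The following is a minimal sketch under that assumption; `sanitize_value` and the sample data are hypothetical and are not taken from the SanitizeValueCB source.

```python
import pandas as pd

def sanitize_value(df: pd.DataFrame, value_col: str = 'activity or mda'):
    """Populate VALUE from value_col; return (rows with a value, rows without)."""
    out = df.copy()
    out['VALUE'] = out[value_col]
    missing = out['VALUE'].isna()
    return out[~missing], out[missing]

sample = pd.DataFrame({'nuclide': ['137Cs', '3H', '137Cs'],
                       'activity or mda': [0.20, None, 0.27]})
kept, removed = sanitize_value(sample)
print(kept['VALUE'].tolist())
print(removed['nuclide'].tolist())
```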
Normalize uncertainty
We create a callback, NormalizeUncCB, to standardize the uncertainty value to the MARIS format. For each sample type in the OSPAR dataset, the reported uncertainty is given as an expanded uncertainty with a coverage factor 𝑘=2. For further details, refer to the OSPAR reporting guidelines. In MARIS the uncertainty values are reported as standard uncertainty with a coverage factor 𝑘=1.
The NormalizeUncCB callback normalizes the uncertainty using the following lambda function:
NormalizeUncCB
NormalizeUncCB (col_unc:dict={'BIOTA': 'uncertainty', 'SEAWATER': 'uncertainty'}, fn_convert_unc:Callable=<function <lambda>>)
Normalize uncertainty values in DataFrames.
Type | Default | Details | |
---|---|---|---|
col_unc | dict | {‘BIOTA’: ‘uncertainty’, ‘SEAWATER’: ‘uncertainty’} | Column name to normalize |
fn_convert_unc | Callable | Function correcting coverage factor |
Exported source
unc_cols = {'BIOTA': 'uncertainty', 'SEAWATER': 'uncertainty'}
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    SanitizeValueCB(),
    NormalizeUncCB()
    ])
tfm()

display(Markdown("<b> Example of VALUE and UNC columns:</b>"))
for grp in ['SEAWATER', 'BIOTA']:
    print(f'\n{grp}:')
    print(tfm.dfs[grp][['VALUE', 'UNC']])
10 invalid rows found in group 'SEAWATER' during sanitize value callback.
Example of VALUE and UNC columns:
SEAWATER:
VALUE UNC
0 0.200000 NaN
1 0.270000 NaN
2 0.260000 NaN
3 0.250000 NaN
4 0.200000 NaN
... ... ...
19183 0.000005 2.600000e-07
19184 6.152000 3.076000e-01
19185 0.005390 1.078000e-03
19186 0.001420 2.840000e-04
19187 6.078000 3.039000e-01
[19183 rows x 2 columns]
BIOTA:
VALUE UNC
0 0.326416 NaN
1 0.442704 NaN
2 0.412989 NaN
3 0.202768 NaN
4 0.652833 NaN
... ... ...
15946 0.384000 0.012096
15947 0.456000 0.012084
15948 0.122000 0.031000
15949 0.310000 NaN
15950 0.306000 0.007191
[15951 rows x 2 columns]
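Converting an expanded uncertainty (𝑘=2) to a standard uncertainty (𝑘=1) amounts to dividing by the coverage factor, which is consistent with the BIOTA output above (for example 0.024192 becomes 0.012096). The sketch below illustrates that arithmetic; `to_standard_unc` is a hypothetical helper, not the function used by NormalizeUncCB.

```python
import pandas as pd

def to_standard_unc(expanded: pd.Series, k: float = 2.0) -> pd.Series:
    """Convert expanded uncertainty with coverage factor k to standard uncertainty."""
    return expanded / k

# Values taken from the last BIOTA rows shown earlier; None stands for a missing uncertainty
expanded = pd.Series([0.024192, None, 0.062000])
standard = to_standard_unc(expanded)
print(standard.tolist())
```

Missing uncertainties stay missing after the division, so no extra handling is needed for rows reported as detection limits.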
FEEDBACK TO DATA PROVIDER: The SEAWATER dataset includes instances where the uncertainty values significantly exceed the corresponding measurement values. While such occurrences are not inherently erroneous, they merit attention and may warrant further verification.
To demonstrate instances where the uncertainty significantly surpasses the measurement values, we will initially compute the ‘relative uncertainty’ as a percentage for the seawater dataset.
dfs = load_data(src_dir, use_cache=True)
for grp in ['SEAWATER', 'BIOTA']:
    tfm.dfs[grp]['relative_uncertainty'] = (
        # Divide 'uncertainty' by 'value'
        (tfm.dfs[grp]['uncertainty'] / tfm.dfs[grp]['activity or mda'])
        # Multiply by 100 to convert to percentage
        * 100)
Now we will retrieve all rows where the relative uncertainty exceeds 100% for the seawater dataset.
threshold = 100
grp = 'SEAWATER'
cols_to_show = ['id', 'contracting party', 'nuclide', 'value type', 'activity or mda', 'uncertainty', 'unit', 'relative_uncertainty']
df = tfm.dfs[grp][cols_to_show][tfm.dfs[grp]['relative_uncertainty'] > threshold]

print(f'Number of rows where relative uncertainty is greater than {threshold}%: \n {df.shape[0]} \n')

display(Markdown(f"<b> Example of data with relative uncertainty greater than {threshold}%:</b>"))
with pd.option_context('display.max_rows', None):
    display(df.head())
Number of rows where relative uncertainty is greater than 100%:
95
Example of data with relative uncertainty greater than 100%:
id | contracting party | nuclide | value type | activity or mda | uncertainty | unit | relative_uncertainty | |
---|---|---|---|---|---|---|---|---|
969 | 11075 | United Kingdom | 137Cs | = | 0.0028 | 0.3276 | Bq/l | 11700.0 |
971 | 11077 | United Kingdom | 137Cs | = | 0.0029 | 0.3364 | Bq/l | 11600.0 |
973 | 11079 | United Kingdom | 137Cs | = | 0.0025 | 0.3325 | Bq/l | 13300.0 |
975 | 11081 | United Kingdom | 137Cs | = | 0.0025 | 0.3450 | Bq/l | 13800.0 |
977 | 11083 | United Kingdom | 137Cs | = | 0.0038 | 0.3344 | Bq/l | 8800.0 |
FEEDBACK TO DATA PROVIDER: The BIOTA dataset includes instances where the uncertainty values significantly exceed the corresponding measurement values. While such occurrences are not inherently erroneous, they merit attention and may warrant further verification.
Now we will retrieve all rows where the relative uncertainty exceeds 100% for the biota dataset.
threshold = 100
grp = 'BIOTA'
cols_to_show = ['id', 'contracting party', 'nuclide', 'value type', 'activity or mda', 'uncertainty', 'unit', 'relative_uncertainty']
df = tfm.dfs[grp][cols_to_show][tfm.dfs[grp]['relative_uncertainty'] > threshold]

print(f'Number of rows where relative uncertainty is greater than {threshold}%: \n {df.shape[0]} \n')

display(Markdown(f"<b> Example of data with relative uncertainty greater than {threshold}%:</b>"))
with pd.option_context('display.max_rows', None):
    display(df.head())
Number of rows where relative uncertainty is greater than 100%:
100
Example of data with relative uncertainty greater than 100%:
id | contracting party | nuclide | value type | activity or mda | uncertainty | unit | relative_uncertainty | |
---|---|---|---|---|---|---|---|---|
249 | 3101 | Norway | 137Cs | = | 0.0500 | 0.1000 | Bq/kg f.w. | 200.000000 |
306 | 3158 | Norway | 137Cs | = | 0.1500 | 0.1600 | Bq/kg f.w. | 106.666667 |
775 | 8152 | Norway | 137Cs | = | 0.0340 | 0.0500 | Bq/kg f.w. | 147.058824 |
788 | 8165 | Norway | 137Cs | = | 0.0300 | 0.0500 | Bq/kg f.w. | 166.666667 |
1839 | 19571 | Belgium | 239,240Pu | = | 0.0074 | 0.0093 | Bq/kg f.w. | 125.675676 |
Remap units
FEEDBACK TO DATA PROVIDER: The Unit column contains NaN values for the SEAWATER dataset, as shown below.
number_rows_to_show = 2
df = dfs['SEAWATER'][dfs['SEAWATER']['unit'].isnull()]
print(f'Number of rows with NaN in unit column: \n {df.shape[0]} \n')

display(Markdown(f"<b> Example of data with NaN in unit column:</b>"))
with pd.option_context('display.max_columns', None):
    display(df.head(number_rows_to_show))
Number of rows with NaN in unit column:
8
Example of data with NaN in unit column:
id | contracting party | rsc sub-division | station id | sample id | latd | latm | lats | latdir | longd | longm | longs | longdir | sample type | sampling depth | sampling date | nuclide | value type | activity or mda | uncertainty | unit | data provider | measurement comment | sample comment | reference comment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
16161 | 120369 | Ireland | 1.0 | Salthill | NaN | 53 | 15.0 | 40.0 | N | 9 | 4.0 | 15.0 | W | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2021 data | Woodstown (County Waterford) and Salthill (Cou... | NaN |
16162 | 120370 | Ireland | 1.0 | Woodstown | NaN | 52 | 11.0 | 55.0 | N | 6 | 58.0 | 47.0 | W | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Let’s inspect the unique units used by OSPAR:
get_unique_across_dfs(dfs, col_name='unit', as_df=True)
index | value | |
---|---|---|
0 | 0 | NaN |
1 | 1 | Bq/l |
2 | 2 | Bq/kg f.w. |
3 | 3 | BQ/L |
4 | 4 | Bq/L |
FEEDBACK TO DATA PROVIDER: Standardizing the units would simplify data processing, as the units are not consistent across the dataset. For example, BQ/L, Bq/l, and Bq/L are used interchangeably.
We will establish unit renaming rules for the OSPAR dataset:
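A minimal sketch of what such rules could look like, assuming each spelling listed above maps to a MARIS unit ID. The IDs 1 and 5 are taken from the transformed output shown later in this section; the dictionary contents are an illustration, not the exported source.

```python
# Hypothetical renaming rules: every observed spelling maps to a MARIS unit ID.
# IDs 1 (volume-based) and 5 (fresh-weight) are inferred from the outputs in this
# section; treat the mapping as an assumption.
renaming_unit_rules = {
    'Bq/l': 1,
    'Bq/L': 1,
    'BQ/L': 1,
    'Bq/kg f.w.': 5,
}

# Non-NaN spellings reported by get_unique_across_dfs above.
observed_units = ['Bq/l', 'Bq/kg f.w.', 'BQ/L', 'Bq/L']

# Any spelling missing from the rules would surface here.
unmapped = [u for u in observed_units if u not in renaming_unit_rules]
print(unmapped)
```

An empty list confirms that every non-NaN spelling is covered; NaN values still need a default, which the callback described next takes care of.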
Now we will create a callback, RemapUnitCB, to remap the units in the dataframes. For the SEAWATER dataset, we will set a default unit of Bq/l.
RemapUnitCB
RemapUnitCB (lut:Dict[str,str], default_units:Dict[str,str]={'SEAWATER': 'Bq/l', 'BIOTA': 'Bq/kg f.w.'}, verbose:bool=False)
Callback to update DataFrame ‘UNIT’ columns based on a lookup table.
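The core of this remapping can be sketched in plain pandas. The helper below is hypothetical (`remap_units` is not part of marisco) and assumes the lookup maps unit strings to MARIS unit IDs: missing units are first filled with the group default, then mapped.

```python
import pandas as pd

def remap_units(df, lut, default_unit):
    """Fill missing units with a default, then map unit strings to IDs."""
    out = df.copy()
    out['UNIT'] = out['unit'].fillna(default_unit).map(lut)
    return out

# Assumed rules; the IDs mirror the outputs shown in this section.
lut = {'Bq/l': 1, 'Bq/L': 1, 'BQ/L': 1, 'Bq/kg f.w.': 5}

seawater = pd.DataFrame({'unit': ['Bq/l', 'BQ/L', None, 'Bq/L']})
result = remap_units(seawater, lut, default_unit='Bq/l')
print(result['UNIT'].tolist())
```

Filling before mapping keeps the default in the provider's vocabulary, so a single lookup table handles both reported and defaulted units.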
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    SanitizeValueCB(),  # Remove blank value entries (also removes NaN values in Unit column)
    RemapUnitCB(renaming_unit_rules, verbose=True),
    CompareDfsAndTfmCB(dfs)
])
tfm()

display(Markdown("<b> Row Count Comparison Before and After Transformation:</b>"))
with pd.option_context('display.max_rows', None):
    display(pd.DataFrame.from_dict(tfm.compare_stats))

print('Unique Unit values:')
for grp in ['BIOTA', 'SEAWATER']:
    print(f"{grp}: {tfm.dfs[grp]['UNIT'].unique()}")
10 invalid rows found in group 'SEAWATER' during sanitize value callback.
Row Count Comparison Before and After Transformation:
BIOTA | SEAWATER | |
---|---|---|
Number of rows in original dataframes (dfs): | 15951 | 19193 |
Number of rows in transformed dataframes (tfm.dfs): | 15951 | 19183 |
Number of rows removed (tfm.dfs_removed): | 0 | 10 |
Unique Unit values:
BIOTA: [5]
SEAWATER: [1]
Remap detection limit
FEEDBACK TO DATA PROVIDER: The Value type column contains numerous NaN entries.
# Count the number of NaN entries in the 'value type' column for 'SEAWATER'
na_count_seawater = dfs['SEAWATER']['value type'].isnull().sum()
print(f"Number of NaN 'Value type' entries in 'SEAWATER': {na_count_seawater}")
# Count the number of NaN entries in the 'value type' column for 'BIOTA'
na_count_biota = dfs['BIOTA']['value type'].isnull().sum()
print(f"Number of NaN 'Value type' entries in 'BIOTA': {na_count_biota}")
Number of NaN 'Value type' entries in 'SEAWATER': 64
Number of NaN 'Value type' entries in 'BIOTA': 23
In the OSPAR dataset, the detection limit is denoted by < in the Value type column. When the Value type is <, the Activity or MDA column specifies the detection limit. Conversely, when the Value type is =, the Activity or MDA column holds an actual measurement. Let’s review the entries in the Value type column for the OSPAR dataset:
for grp in dfs.keys():
    print(f'{grp}:')
    print(tfm.dfs[grp]['value type'].unique())
BIOTA:
['<' '=' nan]
SEAWATER:
['<' '=' nan]
In MARIS, detection limits are encoded as follows:
pd.read_excel(detection_limit_lut_path())
id | name | name_sanitized | |
---|---|---|---|
0 | -1 | Not applicable | Not applicable |
1 | 0 | Not Available | Not available |
2 | 1 | = | Detected value |
3 | 2 | < | Detection limit |
4 | 3 | ND | Not detected |
5 | 4 | DE | Derived |
We can create a lambda function to retrieve the MARIS lookup table.
We can define the columns of interest in both the SEAWATER and BIOTA DataFrames for the detection limit column.
We now create a callback, RemapDetectionLimitCB, to remap OSPAR detection limit values to MARIS-formatted values using the lookup table. Since the dataset contains NaN entries in the detection limit column, the callback sets the detection limit to ‘=’ when both the value and uncertainty columns are present and the current detection limit value is not among the lookup keys.
RemapDetectionLimitCB
RemapDetectionLimitCB (coi:dict, fn_lut:Callable)
Remap detection limit values to MARIS format using a lookup table.
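The rule described above can be sketched as follows. This is an illustration in plain pandas, not the exported callback: the function name, its parameters, and the fallback to ‘Not Available’ (ID 0) for rows that stay unresolved are assumptions.

```python
import pandas as pd

# Assumed lookup: OSPAR symbols to MARIS detection-limit IDs (see the table above).
lut_dl = {'=': 1, '<': 2}
NOT_AVAILABLE = 0  # MARIS ID for 'Not Available'

def remap_detection_limit(df, val_col, unc_col, dl_col, lut):
    out = df.copy()
    known = out[dl_col].isin(list(lut))
    has_measurement = out[val_col].notna() & out[unc_col].notna()
    # Unknown value type but value and uncertainty present: treat as a measurement.
    out.loc[~known & has_measurement, dl_col] = '='
    # Anything still unknown falls back to 'Not Available' (an assumption).
    out['DL'] = out[dl_col].map(lut).fillna(NOT_AVAILABLE).astype(int)
    return out

sample = pd.DataFrame({
    'value type': ['<', '=', None, None],
    'activity or mda': [0.5, 1.2, 0.8, 0.3],
    'uncertainty': [None, 0.1, 0.2, None],
})
result = remap_detection_limit(sample, 'activity or mda', 'uncertainty', 'value type', lut_dl)
print(result['DL'].tolist())
```

The third row has no value type but carries both a value and an uncertainty, so it is treated as a measurement; the fourth row lacks an uncertainty and stays unresolved.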
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    SanitizeValueCB(),
    NormalizeUncCB(),
    RemapUnitCB(renaming_unit_rules, verbose=True),
    RemapDetectionLimitCB(coi_dl, lut_dl)])
tfm()
for grp in ['BIOTA', 'SEAWATER']:
    print(f"{grp}: {tfm.dfs[grp]['DL'].unique()}")
10 invalid rows found in group 'SEAWATER' during sanitize value callback.
BIOTA: [2 1]
SEAWATER: [2 1]
Remap Biota species
The OSPAR dataset contains biota species information in the Species column of the biota DataFrame. To ensure consistency with MARIS standards, it is necessary to remap these species names. We will employ the same approach used for standardizing nuclide names: IMFA (Inspect, Match, Fix, Apply).
We first inspect the unique Species values of the OSPAR Biota dataset:
dfs = load_data(src_dir, use_cache=True)
with pd.option_context('display.max_columns', None):
    display(get_unique_across_dfs(dfs, col_name='species', as_df=True).T)
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 |
value | Ostrea edulis | CHIMAERA MONSTROSA | Brosme brosme | Cyclopterus lumpus | EUTRIGLA GURNARDUS | Dicentrarchus labrax | Pelvetia canaliculata | GALEUS MELASTOMUS | Glyptocephalus cynoglossus | Reinhardtius hippoglossoides | Pecten maximus | NUCELLA LAPILLUS | Littorina littorea | Trisopterus minutus | Unknown | Gadus morhua | Sebastes viviparus | Anarhichas minor | GLYPTOCEPHALUS CYNOGLOSSUS | MOLVA MOLVA | Platichthys flesus | CLUPEA HARENGUS | Hyperoplus lanceolatus | DIPTURUS BATIS | CRASSOSTREA GIGAS | OSTREA EDULIS | Galeus melastomus | Eutrigla gurnardus | Mytilus edulis | Anguilla anguilla | MONODONTA LINEATA | Trachurus trachurus | ASCOPHYLLUM NODOSUM | PLATICHTHYS FLESUS | Lycodes vahlii | RHODYMENIA PSEUDOPALAMATA & PALMARIA PALMATA | LIMANDA LIMANDA | ASCOPHYLLUN NODOSUM | Tapes sp. | Scomber scombrus | Microstomus kitt | MICROMESISTIUS POUTASSOU | RHODYMENIA spp | Ostrea Edulis | SCOPHTHALMUS RHOMBUS | SCOMBER SCOMBRUS | Cerastoderma edule | ETMOPTERUS SPINAX | Squalus acanthias | NaN | RAJIDAE/BATOIDEA | HIPPOGLOSSUS HIPPOGLOSSUS | MYTILUS EDULIS | MERLANGIUS MERLANGUS | LAMINARIA DIGITATA | SOLEA SOLEA (S.VULGARIS) | PATELLA VULGATA | Melanogrammus aeglefinus | Limanda limanda | Solea solea (S.vulgaris) | Thunnus thynnus | GADUS MORHUA | Clupea Harengus | Sebastes norvegicus | Capros aper | Gadiculus argenteus | Merlangius merlangus | SEBASTES MARINUS | Penaeus vannamei | Patella sp. | BROSME BROSME | TRACHURUS TRACHURUS | Argentina sphyraena | Gadus morhua | Anarhichas denticulatus | Mixture of green, red and brown algae | Clupea harengus | FUCUS SPP. | LITTORINA LITTOREA | Phycis blennoides | Argentina silus | BUCCINUM UNDATUM | Merluccius merluccius | Fucus sp. | Rhodymenia spp. 
| Melanogrammus aeglefinus | Homarus gammarus | Fucus serratus | FUCUS SPIRALIS | Gaidropsarus argenteus | Hippoglossus hippoglossus | PLEURONECTES PLATESSA | Clupea harengus | MERLANGUIS MERLANGUIS | Pollachius virens | MERLUCCIUS MERLUCCIUS | Trisopterus esmarkii | Pleuronectiformes [order] | PALMARIA PALMATA | PECTEN MAXIMUS | Sprattus sprattus | Sepia spp. | Pollachius pollachius | SPRATTUS SPRATTUS | Ascophyllum nodosum | Mytilus Edulis | Pleuronectes platessa | RAJA DIPTURUS BATIS | PECTINIDAE | Raja montagui | Salmo salar | Fucus Vesiculosus | REINHARDTIUS HIPPOGLOSSOIDES | HIPPOGLOSSOIDES PLATESSOIDES | Sebastes vivipares | Limanda Limanda | Sebastes marinus | Gadus Morhua | Gadus sp. | Merlangius Merlangus | Fucus distichus | PORPHYRA UMBILICALIS | CYCLOPTERUS LUMPUS | MELANOGRAMMUS AEGLEFINUS | Sebastes Mentella | Coryphaenoides rupestris | Pleuronectes platessa | Hippoglossoides platessoides | Crassostrea gigas | PELVETIA CANALICULATA | Lumpenus lampretaeformis | unknown | Thunnus sp. | Boreogadus Saida | Trisopterus esmarki | OSILINUS LINEATUS | Molva molva | MOLVA DYPTERYGIA | Buccinum undatum | Sebastes mentella | MERLUCCIUS MERLUCCIUS | Sardina pilchardus | PLUERONECTES PLATESSA | SALMO SALAR | Lophius piscatorius | CERASTODERMA (CARDIUM) EDULE | Anarhichas lupus | Dasyatis pastinaca | FUCUS spp | BOREOGADUS SAIDA | Boreogadus saida | Modiolus modiolus | FUCUS SERRATUS | Nephrops norvegicus | FUCUS VESICULOSUS | Cerastoderma (Cardium) Edule | POLLACHIUS VIRENS | SEBASTES MENTELLA | DICENTRARCHUS (MORONE) LABRAX | ANARHICHAS LUPUS | Flatfish | Fucus vesiculosus | Micromesistius poutassou | PATELLA | Phoca vitulina | Mallotus villosus | Gadiculus argenteus thori |
We attempt to match the OSPAR species column to the species column of the MARIS nomenclature using the Remapper. First, we initialize the Remapper:
remapper = Remapper(provider_lut_df=get_unique_across_dfs(dfs, col_name='species', as_df=True),
                    maris_lut_fn=species_lut_path,
                    maris_col_id='species_id',
                    maris_col_name='species',
                    provider_col_to_match='value',
                    provider_col_key='value',
                    fname_cache='species_ospar.pkl')
Next, we perform the matching and generate a lookup table that includes the match score; lower scores indicate closer matches:
remapper.generate_lookup_table(as_df=True)
with pd.option_context('display.max_columns', None):
    display(remapper.select_match(match_score_threshold=1, verbose=True).T)
Processing: 0%| | 0/167 [00:00<?, ?it/s]Processing: 100%|██████████| 167/167 [00:22<00:00, 7.30it/s]
129 entries matched the criteria, while 38 entries had a match score of 1 or higher.
source_key | RHODYMENIA PSEUDOPALAMATA & PALMARIA PALMATA | Mixture of green, red and brown algae | SOLEA SOLEA (S.VULGARIS) | Solea solea (S.vulgaris) | Cerastoderma (Cardium) Edule | CERASTODERMA (CARDIUM) EDULE | DICENTRARCHUS (MORONE) LABRAX | Pleuronectiformes [order] | RAJIDAE/BATOIDEA | PALMARIA PALMATA | MONODONTA LINEATA | Gadiculus argenteus | Unknown | unknown | RAJA DIPTURUS BATIS | Sepia spp. | Flatfish | Rhodymenia spp. | FUCUS SPP. | Gadus sp. | Thunnus sp. | Tapes sp. | FUCUS spp | Fucus sp. | Patella sp. | RHODYMENIA spp | MERLANGUIS MERLANGUIS | PLUERONECTES PLATESSA | Gaidropsarus argenteus | Melanogrammus aeglefinus | Pleuronectes platessa | Trisopterus esmarki | Hippoglossus hippoglossus | Sebastes vivipares | MERLUCCIUS MERLUCCIUS | Gadus morhua | ASCOPHYLLUN NODOSUM | Clupea harengus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
matched_maris_name | Lomentaria catenata | Mercenaria mercenaria | Loligo vulgaris | Loligo vulgaris | Cerastoderma edule | Cerastoderma edule | Dicentrarchus labrax | Pleuronectiformes | Batoidea | Alaria marginata | Monodonta labio | Pampus argenteus | Undaria | Undaria | Dipturus batis | Sepia | Lambia | Rhodymenia | Fucus | Penaeus sp. | Thunnus | Tapes | Fucus | Fucus | Patella | Rhodymenia | Merlangius merlangus | Pleuronectes platessa | Gaidropsarus argentatus | Melanogrammus aeglefinus | Pleuronectes platessa | Trisopterus esmarkii | Hippoglossus hippoglossus | Sebastes viviparus | Merluccius merluccius | Gadus morhua | Ascophyllum nodosum | Clupea harengus |
source_name | RHODYMENIA PSEUDOPALAMATA & PALMARIA PALMATA | Mixture of green, red and brown algae | SOLEA SOLEA (S.VULGARIS) | Solea solea (S.vulgaris) | Cerastoderma (Cardium) Edule | CERASTODERMA (CARDIUM) EDULE | DICENTRARCHUS (MORONE) LABRAX | Pleuronectiformes [order] | RAJIDAE/BATOIDEA | PALMARIA PALMATA | MONODONTA LINEATA | Gadiculus argenteus | Unknown | unknown | RAJA DIPTURUS BATIS | Sepia spp. | Flatfish | Rhodymenia spp. | FUCUS SPP. | Gadus sp. | Thunnus sp. | Tapes sp. | FUCUS spp | Fucus sp. | Patella sp. | RHODYMENIA spp | MERLANGUIS MERLANGUIS | PLUERONECTES PLATESSA | Gaidropsarus argenteus | Melanogrammus aeglefinus | Pleuronectes platessa | Trisopterus esmarki | Hippoglossus hippoglossus | Sebastes vivipares | MERLUCCIUS MERLUCCIUS | Gadus morhua | ASCOPHYLLUN NODOSUM | Clupea harengus |
match_score | 31 | 26 | 12 | 12 | 10 | 10 | 9 | 8 | 8 | 7 | 6 | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
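The scores in the table above are consistent with a case-insensitive edit (Levenshtein) distance between the provider name and the matched MARIS name. The sketch below makes that reading concrete; it is an inference from the displayed values, not a statement about how the Remapper computes its score, and both function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def match_score(source: str, candidate: str) -> int:
    # Compare case-insensitively, since OSPAR mixes upper- and lower-case names.
    return levenshtein(source.lower(), candidate.lower())

print(match_score('MERLANGUIS MERLANGUIS', 'Merlangius merlangus'))
print(match_score('PLUERONECTES PLATESSA', 'Pleuronectes platessa'))
print(match_score('Trisopterus esmarki', 'Trisopterus esmarkii'))
```

Under this reading, small scores flag spelling variants that can be accepted as they are, while large scores point to entries that need a manual fix.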
Below, we fix the entries that are not properly matched by the Remapper:
We can now review the remapping results, incorporating the adjustments from the fixes_biota_species dictionary:
remapper.generate_lookup_table(fixes=fixes_biota_species)
with pd.option_context('display.max_columns', None):
    display(remapper.select_match(match_score_threshold=1, verbose=True).T)
Processing: 0%| | 0/167 [00:00<?, ?it/s]Processing: 100%|██████████| 167/167 [00:22<00:00, 7.37it/s]
139 entries matched the criteria, while 28 entries had a match score of 1 or higher.
source_key | Cerastoderma (Cardium) Edule | CERASTODERMA (CARDIUM) EDULE | DICENTRARCHUS (MORONE) LABRAX | Pleuronectiformes [order] | Gadiculus argenteus | MONODONTA LINEATA | FUCUS SPP. | Rhodymenia spp. | RAJA DIPTURUS BATIS | Sepia spp. | Tapes sp. | RHODYMENIA spp | FUCUS spp | Patella sp. | Fucus sp. | Thunnus sp. | MERLANGUIS MERLANGUIS | PLUERONECTES PLATESSA | Gaidropsarus argenteus | MERLUCCIUS MERLUCCIUS | Clupea harengus | Sebastes vivipares | Pleuronectes platessa | Hippoglossus hippoglossus | Trisopterus esmarki | Melanogrammus aeglefinus | ASCOPHYLLUN NODOSUM | Gadus morhua |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
matched_maris_name | Cerastoderma edule | Cerastoderma edule | Dicentrarchus labrax | Pleuronectiformes | Pampus argenteus | Monodonta labio | Fucus | Rhodymenia | Dipturus batis | Sepia | Tapes | Rhodymenia | Fucus | Patella | Fucus | Thunnus | Merlangius merlangus | Pleuronectes platessa | Gaidropsarus argentatus | Merluccius merluccius | Clupea harengus | Sebastes viviparus | Pleuronectes platessa | Hippoglossus hippoglossus | Trisopterus esmarkii | Melanogrammus aeglefinus | Ascophyllum nodosum | Gadus morhua |
source_name | Cerastoderma (Cardium) Edule | CERASTODERMA (CARDIUM) EDULE | DICENTRARCHUS (MORONE) LABRAX | Pleuronectiformes [order] | Gadiculus argenteus | MONODONTA LINEATA | FUCUS SPP. | Rhodymenia spp. | RAJA DIPTURUS BATIS | Sepia spp. | Tapes sp. | RHODYMENIA spp | FUCUS spp | Patella sp. | Fucus sp. | Thunnus sp. | MERLANGUIS MERLANGUIS | PLUERONECTES PLATESSA | Gaidropsarus argenteus | MERLUCCIUS MERLUCCIUS | Clupea harengus | Sebastes vivipares | Pleuronectes platessa | Hippoglossus hippoglossus | Trisopterus esmarki | Melanogrammus aeglefinus | ASCOPHYLLUN NODOSUM | Gadus morhua |
match_score | 10 | 10 | 9 | 8 | 6 | 6 | 5 | 5 | 5 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
On visual inspection, the remaining imperfectly matched entries appear acceptable. We can now define a Remapper lambda function that instantiates the Remapper and returns the corrected lookup table.
Putting it all together, we now apply the RemapCB callback to our data. This process adds a SPECIES column to our BIOTA dataframe, which contains the standardized species IDs.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    RemapCB(fn_lut=lut_biota, col_remap='SPECIES', col_src='species', dest_grps='BIOTA')
])
tfm()['BIOTA']['SPECIES'].unique()
array([ 377, 129, 96, 0, 192, 99, 50, 378, 270, 379, 380,
381, 382, 383, 384, 385, 244, 386, 387, 388, 389, 390,
391, 392, 393, 394, 395, 396, 274, 397, 398, 243, 399,
400, 401, 402, 403, 404, 405, 406, 407, 191, 139, 408,
410, 412, 413, 272, 414, 415, 416, 417, 418, 419, 420,
421, 422, 423, 424, 425, 426, 427, 428, 411, 429, 430,
431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441,
442, 443, 444, 294, 1684, 1610, 1609, 1605, 1608, 23, 1606,
234, 556, 1701, 1752, 158, 223])
Enhance Species Data Using Biological group.
The Biological group column in the OSPAR dataset provides valuable insights related to species. We will leverage this information to enrich the SPECIES column. To achieve this, we will employ the generic RemapCB callback to create an enhanced_species column. Subsequently, this enhanced_species column will be used to further enrich the SPECIES column.
First we inspect the unique values in the biological group column.
get_unique_across_dfs(dfs, col_name='biological group', as_df=True)
index | value | |
---|---|---|
0 | 0 | fish |
1 | 1 | MOLLUSCS |
2 | 2 | Seaweeds |
3 | 3 | FISH |
4 | 4 | seaweed |
5 | 5 | Seaweed |
6 | 6 | molluscs |
7 | 7 | SEAWEED |
8 | 8 | Fish |
9 | 9 | Molluscs |
We will remap the data in the biological group column to the species column of the MARIS nomenclature, again using a Remapper object:
remapper = Remapper(provider_lut_df=get_unique_across_dfs(dfs, col_name='biological group', as_df=True),
                    maris_lut_fn=species_lut_path,
                    maris_col_id='species_id',
                    maris_col_name='species',
                    provider_col_to_match='value',
                    provider_col_key='value',
                    fname_cache='enhance_species_ospar.pkl')
As before, we inspect the matches.
remapper.generate_lookup_table(as_df=True)
remapper.select_match(match_score_threshold=1)
Processing: 100%|██████████| 10/10 [00:01<00:00, 8.26it/s]
matched_maris_name | source_name | match_score | |
---|---|---|---|
source_key | |||
fish | Fucus | fish | 4 |
FISH | Fucus | FISH | 4 |
Fish | Fucus | Fish | 4 |
MOLLUSCS | Mollusca | MOLLUSCS | 1 |
Seaweeds | Seaweed | Seaweeds | 1 |
molluscs | Mollusca | molluscs | 1 |
Molluscs | Mollusca | Molluscs | 1 |
We can see that some entries require manual fixes; for example, fish is matched to Fucus, a genus of seaweed.
Now we will apply the manual fixes to the lookup table and review.
remapper.generate_lookup_table(fixes=fixes_enhanced_biota_species)
remapper.select_match(match_score_threshold=1)
Processing: 0%| | 0/10 [00:00<?, ?it/s]Processing: 100%|██████████| 10/10 [00:01<00:00, 6.69it/s]
matched_maris_name | source_name | match_score | |
---|---|---|---|
source_key | |||
MOLLUSCS | Mollusca | MOLLUSCS | 1 |
Seaweeds | Seaweed | Seaweeds | 1 |
molluscs | Mollusca | molluscs | 1 |
Molluscs | Mollusca | Molluscs | 1 |
On visual inspection, the remaining imperfectly matched entries appear acceptable. We can now define a Remapper lambda function that instantiates the Remapper and returns the corrected lookup table.
Now we can apply RemapCB, which adds an enhanced_species column to our BIOTA DataFrame.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    RemapCB(fn_lut=lut_biota_enhanced, col_remap='enhanced_species', col_src='biological group', dest_grps='BIOTA')
])
tfm()['BIOTA']['enhanced_species'].unique()
array([ 873, 1059, 712])
With the enhanced_species column, we can enrich the SPECIES column. Where no SPECIES match exists, we use the value from the enhanced_species column, provided that value is valid.
EnhanceSpeciesCB
EnhanceSpeciesCB ()
Enhance the ‘SPECIES’ column using the ‘enhanced_species’ column if conditions are met.
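A minimal sketch of this fallback in plain pandas. It assumes that an unmatched species is encoded as 0 (the value that disappears from the SPECIES output once the callback is applied) and that -1 and 0 are the invalid enhanced_species values; the function name and both conventions are assumptions, and the IDs below are sample values reused from the outputs above.

```python
import pandas as pd

def enhance_species(df, unmatched_id=0, invalid_ids=(-1, 0)):
    """Replace unmatched SPECIES entries with a valid enhanced_species value."""
    out = df.copy()
    unmatched = out['SPECIES'] == unmatched_id
    valid = out['enhanced_species'].notna() & ~out['enhanced_species'].isin(list(invalid_ids))
    mask = unmatched & valid
    out.loc[mask, 'SPECIES'] = out.loc[mask, 'enhanced_species']
    return out

biota = pd.DataFrame({
    'SPECIES': [377, 0, 0],
    'enhanced_species': [712, 712, 0],
})
enhanced = enhance_species(biota)
print(enhanced['SPECIES'].tolist())
```

Rows that already have a species match are left untouched, and a row with neither a species match nor a valid group keeps its unmatched code.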
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    RemapCB(fn_lut=lut_biota, col_remap='SPECIES', col_src='species', dest_grps='BIOTA'),
    RemapCB(fn_lut=lut_biota_enhanced, col_remap='enhanced_species', col_src='biological group', dest_grps='BIOTA'),
    EnhanceSpeciesCB()
])
tfm()['BIOTA']['SPECIES'].unique()
array([ 377, 129, 96, 712, 192, 99, 50, 378, 270, 379, 380,
381, 382, 383, 384, 385, 244, 386, 387, 388, 389, 390,
391, 392, 393, 394, 395, 396, 274, 161, 398, 243, 399,
400, 401, 402, 403, 404, 405, 406, 407, 1379, 191, 139,
408, 1299, 410, 148, 412, 413, 272, 414, 415, 416, 417,
418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428,
411, 429, 430, 431, 814, 432, 433, 434, 435, 436, 437,
438, 439, 440, 441, 442, 443, 444, 294, 992, 1426, 1684,
1610, 1609, 1605, 1608, 23, 1606, 234, 556, 1701, 1752, 1104,
158, 223])
All entries are matched for the SPECIES column.
Remap Biota tissues
The OSPAR dataset includes entries where the Body Part is labeled as whole. However, the MARIS data standard requires a more specific distinction for the body_part field, differentiating between Whole animal and Whole plant. Fortunately, the OSPAR dataset provides a Biological group field that allows us to make this distinction.
To address this discrepancy and ensure compatibility with MARIS standards, we will:
1. Create a temporary column body_part_temp that combines information from both Body Part and Biological group.
2. Use this temporary column to perform the lookup using our Remapper object.
Let’s create the temporary column, body_part_temp, that combines Body Part and Biological group.
AddBodypartTempCB
AddBodypartTempCB ()
Add a temporary column with the body part and biological group combined.
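A sketch of how such a combined column could be built in plain pandas. The function name and the lower-cased source column names ('body part', 'biological group') are assumptions. The whitespace stripping is also an addition: the duplicated values in the output below suggest the exported callback does not strip.

```python
import pandas as pd

def add_bodypart_temp(df, part_col='body part', group_col='biological group'):
    """Concatenate body part and biological group into one lower-cased key."""
    out = df.copy()
    out['body_part_temp'] = (
        out[part_col].str.strip().str.lower()
        + ' '
        + out[group_col].str.strip().str.lower()
    )
    return out

biota = pd.DataFrame({
    'body part': ['WHOLE', 'Whole', 'Muscle '],
    'biological group': ['Seaweed', 'FISH', 'fish'],
})
combined = add_bodypart_temp(biota)
print(combined['body_part_temp'].tolist())
```

Lower-casing collapses case variants such as FISH, Fish, and fish into a single key, which keeps the lookup table small.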
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    AddBodypartTempCB(),
])
dfs_test = tfm()
dfs_test['BIOTA']['body_part_temp'].unique()
array(['whole animal molluscs', 'whole plant seaweed', 'whole fish fish',
'flesh without bones fish', 'whole animal fish', 'muscle fish',
'head fish', 'soft parts molluscs', 'growing tips seaweed',
'soft parts fish', 'unknown fish', 'flesh without bone fish',
'flesh fish', 'flesh with scales fish', 'liver fish',
'flesh without bones seaweed', 'whole fish',
'flesh without bones molluscs', 'whole seaweed',
'whole plant seaweeds', 'whole fish', 'whole without head fish',
'mix of muscle and whole fish without liver fish',
'whole fisk fish', 'muscle fish', 'cod medallion fish',
'tail and claws fish'], dtype=object)
To align the body_part_temp column with the bodypar column in the MARIS nomenclature, we will use the Remapper. However, since the OSPAR dataset lacks a predefined lookup table for the body_part column, we must first create one. This is accomplished by extracting unique values from the body_part_temp column.
get_unique_across_dfs(dfs_test, col_name='body_part_temp', as_df=True).head()
index | value | |
---|---|---|
0 | 0 | whole without head fish |
1 | 1 | soft parts fish |
2 | 2 | flesh without bones molluscs |
3 | 3 | growing tips seaweed |
4 | 4 | muscle fish |
We can now remap the body_part_temp column to the bodypar column in the MARIS nomenclature using the Remapper. Subsequently, we will inspect the results:
remapper = Remapper(provider_lut_df=get_unique_across_dfs(dfs_test, col_name='body_part_temp', as_df=True),
                    maris_lut_fn=bodyparts_lut_path,
                    maris_col_id='bodypar_id',
                    maris_col_name='bodypar',
                    provider_col_to_match='value',
                    provider_col_key='value',
                    fname_cache='tissues_ospar.pkl'
                    )

remapper.generate_lookup_table(as_df=True)
with pd.option_context('display.max_columns', None):
    display(remapper.select_match(match_score_threshold=0, verbose=True).T)
Processing: 0%| | 0/27 [00:00<?, ?it/s]Processing: 100%|██████████| 27/27 [00:00<00:00, 102.11it/s]
0 entries matched the criteria, while 27 entries had a match score of 0 or higher.
source_key | mix of muscle and whole fish without liver fish | whole without head fish | cod medallion fish | tail and claws fish | unknown fish | whole fisk fish | whole fish fish | whole plant seaweeds | whole animal molluscs | soft parts molluscs | flesh without bones molluscs | whole plant seaweed | flesh without bones seaweed | growing tips seaweed | flesh fish | whole seaweed | muscle fish | liver fish | flesh without bones fish | muscle fish | whole fish | head fish | whole animal fish | soft parts fish | whole fish | flesh with scales fish | flesh without bone fish |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
matched_maris_name | Flesh without bones | Flesh without bones | Old leaf | Stomach and intestine | Growing tips | Whole animal | Whole animal | Whole plant | Whole animal | Soft parts | Flesh without bones | Whole plant | Flesh without bones | Growing tips | Shells | Whole plant | Muscle | Liver | Flesh without bones | Muscle | Whole animal | Head | Whole animal | Soft parts | Whole animal | Flesh with scales | Flesh without bones |
source_name | mix of muscle and whole fish without liver fish | whole without head fish | cod medallion fish | tail and claws fish | unknown fish | whole fisk fish | whole fish fish | whole plant seaweeds | whole animal molluscs | soft parts molluscs | flesh without bones molluscs | whole plant seaweed | flesh without bones seaweed | growing tips seaweed | flesh fish | whole seaweed | muscle fish | liver fish | flesh without bones fish | muscle fish | whole fish | head fish | whole animal fish | soft parts fish | whole fish | flesh with scales fish | flesh without bone fish |
match_score | 31 | 13 | 13 | 13 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 8 | 8 | 8 | 7 | 7 | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
Many of the lookup entries are sufficient for our needs. However, for values that are not matched correctly, we can use the fixes_biota_tissues dictionary to apply manual corrections. First we will create the dictionary.
Now we will generate the lookup table and apply the manual fixes defined in the fixes_biota_tissues dictionary.
remapper.generate_lookup_table(fixes=fixes_biota_tissues)
with pd.option_context('display.max_columns', None):
    display(remapper.select_match(match_score_threshold=1, verbose=True).T)
Processing: 0%| | 0/27 [00:00<?, ?it/s]Processing: 100%|██████████| 27/27 [00:00<00:00, 94.75it/s]
1 entries matched the criteria, while 26 entries had a match score of 1 or higher.
source_key | whole animal molluscs | flesh without bones molluscs | whole fisk fish | soft parts molluscs | whole fish fish | whole plant seaweeds | growing tips seaweed | whole plant seaweed | whole seaweed | muscle fish | flesh without bones fish | liver fish | whole animal fish | flesh with scales fish | soft parts fish | whole fish | head fish | whole fish | muscle fish | flesh without bone fish | flesh without bones seaweed | mix of muscle and whole fish without liver fish | tail and claws fish | unknown fish | cod medallion fish | whole without head fish |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
matched_maris_name | Whole animal | Flesh without bones | Whole animal | Soft parts | Whole animal | Whole plant | Growing tips | Whole plant | Whole plant | Muscle | Flesh without bones | Liver | Whole animal | Flesh with scales | Soft parts | Whole animal | Head | Whole animal | Muscle | Flesh without bones | (Not available) | (Not available) | (Not available) | (Not available) | (Not available) | (Not available) |
source_name | whole animal molluscs | flesh without bones molluscs | whole fisk fish | soft parts molluscs | whole fish fish | whole plant seaweeds | growing tips seaweed | whole plant seaweed | whole seaweed | muscle fish | flesh without bones fish | liver fish | whole animal fish | flesh with scales fish | soft parts fish | whole fish | head fish | whole fish | muscle fish | flesh without bone fish | flesh without bones seaweed | mix of muscle and whole fish without liver fish | tail and claws fish | unknown fish | cod medallion fish | whole without head fish |
match_score | 9 | 9 | 9 | 9 | 9 | 9 | 8 | 8 | 7 | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 2 | 2 | 2 | 2 | 2 | 2 |
At this stage, the majority of entries have been successfully matched to the MARIS nomenclature. Entries that remain unmatched are appropriately marked as ‘not available’. We are now ready to proceed with the final remapping process. We will define a lambda function to instantiate the Remapper, which will then generate and return the corrected lookup table.
Putting it all together, we now apply the RemapCB callback. This process results in the addition of a BODY_PART column to our BIOTA DataFrame.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    AddBodypartTempCB(),
    RemapCB(fn_lut=lut_bodyparts, col_remap='BODY_PART', col_src='body_part_temp', dest_grps='BIOTA')
])
tfm()
tfm.dfs['BIOTA']['BODY_PART'].unique()
array([ 1, 40, 52, 34, 13, 19, 56, 0, 4, 60, 25])
Remap biogroup
The MARIS species lookup table contains a biogroup_id column that associates each species with its corresponding biogroup. We will leverage this relationship to create a BIO_GROUP column in the BIOTA DataFrame.
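The idea can be sketched with a small stand-in table. The rows below are placeholders rather than real MARIS entries, and `get_biogroup_lut` is a hypothetical helper standing in for the lookup function passed to RemapCB in the next cell.

```python
import pandas as pd

# Placeholder rows standing in for the MARIS species lookup table
# (the real table provides species_id and biogroup_id columns).
species_lut = pd.DataFrame({
    'species_id': [10, 20, 30],
    'biogroup_id': [4, 11, 4],
})

def get_biogroup_lut():
    """Return a {species_id: biogroup_id} dictionary built from the species table."""
    return dict(zip(species_lut['species_id'], species_lut['biogroup_id']))

biota = pd.DataFrame({'SPECIES': [20, 10, 30]})
biota['BIO_GROUP'] = biota['SPECIES'].map(get_biogroup_lut())
print(biota['BIO_GROUP'].tolist())
```

Because the lookup is keyed on the species ID, it has to run after the SPECIES column has been created and enhanced.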
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    RemapCB(fn_lut=lut_biota, col_remap='SPECIES', col_src='species', dest_grps='BIOTA'),
    RemapCB(fn_lut=lut_biota_enhanced, col_remap='enhanced_species', col_src='biological group', dest_grps='BIOTA'),
    EnhanceSpeciesCB(),
    RemapCB(fn_lut=lut_biogroup_from_biota, col_remap='BIO_GROUP', col_src='SPECIES', dest_grps='BIOTA')
])

print(tfm()['BIOTA']['BIO_GROUP'].unique())
[14 11 4 13 12 2 5]
Add Sample ID
The OSPAR dataset includes an ID column, which we will use to create a SMP_ID column.
AddSampleIdCB
AddSampleIdCB ()
Include a SMP_ID column from the ID column of OSPAR
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    AddSampleIdCB(),
    CompareDfsAndTfmCB(dfs)
])
tfm()
for grp in ['BIOTA', 'SEAWATER']:
    print(f"{grp}: {tfm.dfs[grp]['SMP_ID'].unique()}")
print(pd.DataFrame.from_dict(tfm.compare_stats), '\n')
BIOTA: [ 1 2 3 ... 98060 98061 98062]
SEAWATER: [ 1 2 3 ... 120366 120367 120368]
BIOTA SEAWATER
Number of rows in original dataframes (dfs): 15951 19193
Number of rows in transformed dataframes (tfm.d... 15951 19193
Number of rows removed (tfm.dfs_removed): 0 0
Add depth
The OSPAR dataset features a Sampling depth column specifically for the SEAWATER dataset. In this section, we will develop a callback to integrate the sampling depth, denoted as SMP_DEPTH, into the MARIS dataset.
AddDepthCB
AddDepthCB ()
Ensure depth values are floats and add ‘SMP_DEPTH’ columns.
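A minimal sketch of such a conversion in plain pandas. The function name and the lower-cased 'sampling depth' source column are assumptions, and coercing non-numeric entries to NaN is an illustrative choice rather than the callback's documented behaviour.

```python
import pandas as pd

def add_depth(df, src_col='sampling depth'):
    """Add a float SMP_DEPTH column when the source column is present."""
    out = df.copy()
    if src_col in out.columns:
        # Non-numeric entries become NaN instead of raising an error.
        out['SMP_DEPTH'] = pd.to_numeric(out[src_col], errors='coerce').astype(float)
    return out

seawater = pd.DataFrame({'sampling depth': ['3', 21, None, 'surface']})
with_depth = add_depth(seawater)
print(with_depth['SMP_DEPTH'].tolist())
```

Guarding on the column name lets the same function run over groups that have no depth information, such as BIOTA, without adding an empty column.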
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    AddDepthCB()
])
tfm()
for grp in tfm.dfs.keys():
    if 'SMP_DEPTH' in tfm.dfs[grp].columns:
        print(f'{grp}:', tfm.dfs[grp][['SMP_DEPTH']].drop_duplicates())
SEAWATER: SMP_DEPTH
0 3.0
80 2.0
81 21.0
85 31.0
87 32.0
... ...
16022 71.0
16023 66.0
16025 81.0
16385 1660.0
16389 1500.0
[134 rows x 1 columns]
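The float conversion can be sketched as follows. This is a minimal illustration under stated assumptions rather than the AddDepthCB code: the sample values are invented, and coercing unparseable entries to NaN is one possible policy, not necessarily the one marisco applies.

```python
import pandas as pd

# Hypothetical SEAWATER excerpt with depth stored as mixed text.
seawater = pd.DataFrame({'Sampling depth': ['3', '2.0', None, 'surface']})

# Coerce to float; entries that cannot be parsed become NaN.
seawater['SMP_DEPTH'] = pd.to_numeric(
    seawater['Sampling depth'], errors='coerce'
).astype(float)
```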
Standardize Coordinates
The OSPAR dataset offers coordinates in degrees, minutes, and seconds (DMS). The following callback is designed to convert DMS to decimal degrees.
ConvertLonLatCB
ConvertLonLatCB ()
Convert Coordinates to decimal degrees (DDD.DDDDD°).
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    ConvertLonLatCB()
    ])
tfm()

with pd.option_context('display.max_columns', None):
    display(tfm.dfs['SEAWATER'][['LAT','latd', 'latm', 'lats', 'LON', 'latdir', 'longd', 'longm','longs', 'longdir']])
| | LAT | latd | latm | lats | LON | latdir | longd | longm | longs | longdir |
---|---|---|---|---|---|---|---|---|---|---|
0 | 51.375278 | 51 | 22.0 | 31.0 | 3.188056 | N | 3 | 11.0 | 17.0 | E |
1 | 51.223611 | 51 | 13.0 | 25.0 | 2.859444 | N | 2 | 51.0 | 34.0 | E |
2 | 51.184444 | 51 | 11.0 | 4.0 | 2.713611 | N | 2 | 42.0 | 49.0 | E |
3 | 51.420278 | 51 | 25.0 | 13.0 | 3.262222 | N | 3 | 15.0 | 44.0 | E |
4 | 51.416111 | 51 | 24.0 | 58.0 | 2.809722 | N | 2 | 48.0 | 35.0 | E |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
19188 | 53.600000 | 53 | 36.0 | 0.0 | -5.933333 | N | 5 | 56.0 | 0.0 | W |
19189 | 53.733333 | 53 | 44.0 | 0.0 | -5.416667 | N | 5 | 25.0 | 0.0 | W |
19190 | 53.650000 | 53 | 39.0 | 0.0 | -5.233333 | N | 5 | 14.0 | 0.0 | W |
19191 | 53.883333 | 53 | 53.0 | 0.0 | -5.550000 | N | 5 | 33.0 | 0.0 | W |
19192 | 53.866667 | 53 | 52.0 | 0.0 | -5.883333 | N | 5 | 53.0 | 0.0 | W |
19193 rows × 10 columns
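The conversion follows the usual formula: decimal = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The function below is a hypothetical helper written for illustration, not the ConvertLonLatCB implementation; the inputs are taken from the first and the 19188th rows of the table above.

```python
def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative in decimal degrees.
    return -decimal if direction.upper() in ('S', 'W') else decimal

lat = dms_to_decimal(51, 22.0, 31.0, 'N')
lon = dms_to_decimal(5, 56.0, 0.0, 'W')
print(round(lat, 6), round(lon, 6))  # 51.375278 -5.933333
```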
SanitizeLonLatCB drops rows where both longitude and latitude equal 0 or where the longitude or latitude value is unrealistic. It also converts the "," separator in longitude and latitude values to a "." separator.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    ConvertLonLatCB(),
    SanitizeLonLatCB(),
    CompareDfsAndTfmCB(dfs)
    ])
tfm()

display(Markdown("<b> Row Count Comparison Before and After Transformation:</b>"))
with pd.option_context('display.max_rows', None):
    display(pd.DataFrame.from_dict(tfm.compare_stats))

with pd.option_context('display.max_columns', None):
    display(tfm.dfs['SEAWATER'][['LAT','LON']])
Row Count Comparison Before and After Transformation:
| | BIOTA | SEAWATER |
---|---|---|
Number of rows in original dataframes (dfs): | 15951 | 19193 |
Number of rows in transformed dataframes (tfm.dfs): | 15951 | 19193 |
Number of rows removed (tfm.dfs_removed): | 0 | 0 |
| | LAT | LON |
---|---|---|
0 | 51.375278 | 3.188056 |
1 | 51.223611 | 2.859444 |
2 | 51.184444 | 2.713611 |
3 | 51.420278 | 3.262222 |
4 | 51.416111 | 2.809722 |
... | ... | ... |
19188 | 53.600000 | -5.933333 |
19189 | 53.733333 | -5.416667 |
19190 | 53.650000 | -5.233333 |
19191 | 53.883333 | -5.550000 |
19192 | 53.866667 | -5.883333 |
19193 rows × 2 columns
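The sanitizing rules can be sketched as below. This is a hypothetical re-implementation for illustration only: the text does not define "unrealistic", so the sketch assumes it means outside the valid ranges of [-90, 90] for latitude and [-180, 180] for longitude, and the demo values are invented.

```python
import pandas as pd

def sanitize_lon_lat(df, lat_col='LAT', lon_col='LON'):
    """Normalize separators, then drop (0, 0) and out-of-range coordinates."""
    out = df.copy()
    for col in (lat_col, lon_col):
        # Turn "51,375"-style strings into "51.375" before converting to float.
        as_text = out[col].astype(str).str.replace(',', '.', regex=False)
        out[col] = pd.to_numeric(as_text, errors='coerce')
    both_zero = (out[lat_col] == 0) & (out[lon_col] == 0)
    # Assumed meaning of "unrealistic": outside the valid coordinate ranges.
    in_range = out[lat_col].between(-90, 90) & out[lon_col].between(-180, 180)
    return out[in_range & ~both_zero]

demo = pd.DataFrame({
    'LAT': ['51,375', 0, 95.0, 53.6],
    'LON': ['3,188', 0, 10.0, -5.93],
})
clean = sanitize_lon_lat(demo)
```

In this demo the second row is dropped because both coordinates are 0 and the third because its latitude exceeds 90.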
Review all callbacks
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    LowerStripNameCB(col_src='nuclide', col_dst='nuclide'),
    RemapNuclideNameCB(lut_nuclides, col_name='nuclide'),
    ParseTimeCB(),
    EncodeTimeCB(),
    SanitizeValueCB(),
    NormalizeUncCB(),
    RemapUnitCB(renaming_unit_rules),
    RemapDetectionLimitCB(coi_dl, lut_dl),
    RemapCB(fn_lut=lut_biota, col_remap='SPECIES', col_src='species', dest_grps='BIOTA'),
    RemapCB(fn_lut=lut_biota_enhanced, col_remap='enhanced_species', col_src='biological group', dest_grps='BIOTA'),
    EnhanceSpeciesCB(),
    AddBodypartTempCB(),
    RemapCB(fn_lut=lut_bodyparts, col_remap='BODY_PART', col_src='body_part_temp' , dest_grps='BIOTA'),
    AddSampleIdCB(),
    AddDepthCB(),
    ConvertLonLatCB(),
    SanitizeLonLatCB(),
    CompareDfsAndTfmCB(dfs)
    ])
tfm()
print(pd.DataFrame.from_dict(tfm.compare_stats) , '\n')
10 invalid rows found in group 'SEAWATER' during time parsing callback (ParseTimeCB).
BIOTA SEAWATER
Number of rows in original dataframes (dfs): 15951 19193
Number of rows in transformed dataframes (tfm.d... 15951 19183
Number of rows removed (tfm.dfs_removed): 0 10
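A row-count comparison like the one printed above could be computed along these lines. This is a sketch under assumptions, not the CompareDfsAndTfmCB code: the function name, the row labels, and the demo data are hypothetical, and removed rows are identified here by index labels missing after the transformation.

```python
import pandas as pd

def compare_row_counts(dfs_before, dfs_after):
    """Summarize, per group, how many rows survived a transformation."""
    stats = {}
    for grp, before in dfs_before.items():
        after = dfs_after[grp]
        stats[grp] = {
            'rows before': len(before),
            'rows after': len(after),
            # Index labels present before but missing afterwards.
            'rows removed': len(before.index.difference(after.index)),
        }
    # Groups become columns, as in the printed table above.
    return pd.DataFrame.from_dict(stats)

before = {'SEAWATER': pd.DataFrame({'value': [1.0, 2.0, 3.0]})}
after = {'SEAWATER': before['SEAWATER'].drop(index=[1])}
stats = compare_row_counts(before, after)
```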
Example change logs
Review the change logs for the NetCDF encoding.
dfs = load_data(src_dir, use_cache=True)
tfm = Transformer(dfs, cbs=[
    LowerStripNameCB(col_src='nuclide', col_dst='nuclide'),
    RemapNuclideNameCB(lut_nuclides, col_name='nuclide'),
    ParseTimeCB(),
    EncodeTimeCB(),
    SanitizeValueCB(),
    NormalizeUncCB(),
    RemapUnitCB(renaming_unit_rules),
    RemapDetectionLimitCB(coi_dl, lut_dl),
    RemapCB(fn_lut=lut_biota, col_remap='SPECIES', col_src='species', dest_grps='BIOTA'),
    RemapCB(fn_lut=lut_biota_enhanced, col_remap='enhanced_species', col_src='biological group', dest_grps='BIOTA'),
    EnhanceSpeciesCB(),
    AddBodypartTempCB(),
    RemapCB(fn_lut=lut_bodyparts, col_remap='BODY_PART', col_src='body_part_temp' , dest_grps='BIOTA'),
    AddSampleIdCB(),
    AddDepthCB(),
    ConvertLonLatCB(),
    SanitizeLonLatCB(),
    ])
# Transform
tfm()
# Check transformation logs
tfm.logs
10 invalid rows found in group 'SEAWATER' during time parsing callback (ParseTimeCB).
["Convert 'nuclide' column values to lowercase, strip spaces, and store in 'nuclide' column.",
'Remap data provider nuclide names to standardized MARIS nuclide names.',
'Parse the time format in the dataframe and check for inconsistencies.',
'Encode time as seconds since epoch.',
'Sanitize value by removing blank entries and populating `value` column.',
'Normalize uncertainty values in DataFrames.',
"Callback to update DataFrame 'UNIT' columns based on a lookup table.",
'Remap detection limit values to MARIS format using a lookup table.',
"Remap values from 'species' to 'SPECIES' for groups: BIOTA.",
"Remap values from 'biological group' to 'enhanced_species' for groups: BIOTA.",
"Enhance the 'SPECIES' column using the 'enhanced_species' column if conditions are met.",
'Add a temporary column with the body part and biological group combined.',
"Remap values from 'body_part_temp' to 'BODY_PART' for groups: BIOTA.",
'Include a SMP_ID column from the ID column of OSPAR',
"Ensure depth values are floats and add 'SMP_DEPTH' columns.",
'Convert Coordinates to decimal degrees (DDD.DDDDD°).',
'Drop rows with invalid longitude & latitude values. Convert `,` separator to `.` separator.']
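Each log entry above reads like a one-line description of the callback that produced it. One way such a log could be assembled is sketched below, assuming (the text does not confirm this) that entries are taken from each callback's docstring. The demo callbacks and the run_with_logs helper are hypothetical and unrelated to marisco's Transformer.

```python
class LowerStripDemoCB:
    "Convert names to lowercase and strip spaces."
    def __call__(self, records):
        return [r.strip().lower() for r in records]

class DropEmptyDemoCB:
    "Drop empty entries."
    def __call__(self, records):
        return [r for r in records if r]

def run_with_logs(records, cbs):
    """Apply callbacks in order and record each callback's docstring."""
    logs = []
    for cb in cbs:
        records = cb(records)
        logs.append(type(cb).__doc__)
    return records, logs

records, logs = run_with_logs(
    [' Cs137 ', '', 'H3'],
    [LowerStripDemoCB(), DropEmptyDemoCB()],
)
# A single string, convenient for storing as one file attribute.
joined = ', '.join(logs)
```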
Feed global attributes
get_attrs
get_attrs (tfm:marisco.callbacks.Transformer, zotero_key:str, kw:list=['oceanography', 'Earth Science > Oceans > Ocean Chemistry> Radionuclides', 'Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure', 'Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments', 'Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes', 'Earth Science > Oceans > Water Quality > Ocean Contaminants', 'Earth Science > Biological Classification > Animals/Vertebrates > Fish', 'Earth Science > Biosphere > Ecosystems > Marine Ecosystems', 'Earth Science > Biological Classification > Animals/Invertebrates > Mollusks', 'Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans', 'Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)'])
Retrieve all global attributes.
| | Type | Default | Details |
---|---|---|---|
tfm | Transformer | | Transformer object |
zotero_key | str | | Zotero dataset record key |
kw | list | [‘oceanography’, ‘Earth Science > Oceans > Ocean Chemistry> Radionuclides’, ‘Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure’, ‘Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments’, ‘Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes’, ‘Earth Science > Oceans > Water Quality > Ocean Contaminants’, ‘Earth Science > Biological Classification > Animals/Vertebrates > Fish’, ‘Earth Science > Biosphere > Ecosystems > Marine Ecosystems’, ‘Earth Science > Biological Classification > Animals/Invertebrates > Mollusks’, ‘Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans’, ‘Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)’] | List of keywords |
Returns | dict | | Global attributes |
get_attrs(tfm, zotero_key=zotero_key, kw=kw)
{'geospatial_lat_min': '49.43222222222222',
'geospatial_lat_max': '81.26805555555555',
'geospatial_lon_min': '-58.23166666666667',
'geospatial_lon_max': '36.181666666666665',
'geospatial_bounds': 'POLYGON ((-58.23166666666667 36.181666666666665, 49.43222222222222 36.181666666666665, 49.43222222222222 81.26805555555555, -58.23166666666667 81.26805555555555, -58.23166666666667 36.181666666666665))',
'geospatial_vertical_max': '1850.0',
'geospatial_vertical_min': '0.0',
'time_coverage_start': '1995-01-01T00:00:00',
'time_coverage_end': '2022-12-31T00:00:00',
'id': 'LQRA4MMK',
'title': 'OSPAR Environmental Monitoring of Radioactive Substances',
'summary': '',
'creator_name': '[{"creatorType": "author", "firstName": "", "lastName": "OSPAR Comission\'s Radioactive Substances Committee (RSC)"}]',
'keywords': 'oceanography, Earth Science > Oceans > Ocean Chemistry> Radionuclides, Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure, Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments, Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes, Earth Science > Oceans > Water Quality > Ocean Contaminants, Earth Science > Biological Classification > Animals/Vertebrates > Fish, Earth Science > Biosphere > Ecosystems > Marine Ecosystems, Earth Science > Biological Classification > Animals/Invertebrates > Mollusks, Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans, Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)',
'publisher_postprocess_logs': "Convert 'nuclide' column values to lowercase, strip spaces, and store in 'nuclide' column., Remap data provider nuclide names to standardized MARIS nuclide names., Parse the time format in the dataframe and check for inconsistencies., Encode time as seconds since epoch., Sanitize value by removing blank entries and populating `value` column., Normalize uncertainty values in DataFrames., Callback to update DataFrame 'UNIT' columns based on a lookup table., Remap detection limit values to MARIS format using a lookup table., Remap values from 'species' to 'SPECIES' for groups: BIOTA., Remap values from 'biological group' to 'enhanced_species' for groups: BIOTA., Enhance the 'SPECIES' column using the 'enhanced_species' column if conditions are met., Add a temporary column with the body part and biological group combined., Remap values from 'body_part_temp' to 'BODY_PART' for groups: BIOTA., Include a SMP_ID column from the ID column of OSPAR, Ensure depth values are floats and add 'SMP_DEPTH' columns., Convert Coordinates to decimal degrees (DDD.DDDDD°)., Drop rows with invalid longitude & latitude values. Convert `,` separator to `.` separator."}
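The spatial and temporal entries in this dictionary are summary statistics over all groups. The sketch below shows how such values might be derived; it is not the get_attrs implementation. It assumes hypothetical LAT, LON, and datetime-typed TIME columns, uses invented sample rows, and covers only a subset of the attributes.

```python
import pandas as pd

def geo_time_attrs(dfs):
    """Derive a few ACDD-style coverage attributes from all groups combined."""
    all_rows = pd.concat(list(dfs.values()), ignore_index=True)
    fmt = '%Y-%m-%dT%H:%M:%S'
    return {
        'geospatial_lat_min': str(all_rows['LAT'].min()),
        'geospatial_lat_max': str(all_rows['LAT'].max()),
        'geospatial_lon_min': str(all_rows['LON'].min()),
        'geospatial_lon_max': str(all_rows['LON'].max()),
        'time_coverage_start': all_rows['TIME'].min().strftime(fmt),
        'time_coverage_end': all_rows['TIME'].max().strftime(fmt),
    }

# Hypothetical miniature groups.
demo_dfs = {
    'BIOTA': pd.DataFrame({
        'LAT': [49.5], 'LON': [-5.25],
        'TIME': pd.to_datetime(['1995-01-01']),
    }),
    'SEAWATER': pd.DataFrame({
        'LAT': [81.25], 'LON': [36.5],
        'TIME': pd.to_datetime(['2022-12-31']),
    }),
}
attrs = geo_time_attrs(demo_dfs)
```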
Encoding NetCDF
encode
encode (fname_out_nc:str, **kwargs)
Encode data to NetCDF.
| | Type | Details |
---|---|---|
fname_out_nc | str | Output file name |
kwargs | | Additional arguments |
Returns | None | |
encode(fname_out_nc, verbose=False)
10 invalid rows found in group 'SEAWATER' during time parsing callback (ParseTimeCB).
NetCDF Review
First, let's review the global attributes of the NetCDF file:
contents = ExtractNetcdfContents(fname_out_nc)
print(contents.global_attrs)
{'id': 'LQRA4MMK', 'title': 'OSPAR Environmental Monitoring of Radioactive Substances', 'summary': '', 'keywords': 'oceanography, Earth Science > Oceans > Ocean Chemistry> Radionuclides, Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure, Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments, Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes, Earth Science > Oceans > Water Quality > Ocean Contaminants, Earth Science > Biological Classification > Animals/Vertebrates > Fish, Earth Science > Biosphere > Ecosystems > Marine Ecosystems, Earth Science > Biological Classification > Animals/Invertebrates > Mollusks, Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans, Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)', 'history': 'TBD', 'keywords_vocabulary': 'GCMD Science Keywords', 'keywords_vocabulary_url': 'https://gcmd.earthdata.nasa.gov/static/kms/', 'record': 'TBD', 'featureType': 'TBD', 'cdm_data_type': 'TBD', 'Conventions': 'CF-1.10 ACDD-1.3', 'publisher_name': 'Paul MCGINNITY, Iolanda OSVATH, Florence DESCROIX-COMANDUCCI', 'publisher_email': 'p.mc-ginnity@iaea.org, i.osvath@iaea.org, F.Descroix-Comanducci@iaea.org', 'publisher_url': 'https://maris.iaea.org', 'publisher_institution': 'International Atomic Energy Agency - IAEA', 'creator_name': '[{"creatorType": "author", "firstName": "", "lastName": "OSPAR Comission\'s Radioactive Substances Committee (RSC)"}]', 'institution': 'TBD', 'metadata_link': 'TBD', 'creator_email': 'TBD', 'creator_url': 'TBD', 'references': 'TBD', 'license': 'Without prejudice to the applicable Terms and Conditions (https://nucleus.iaea.org/Pages/Others/Disclaimer.aspx), I hereby agree that any use of the data will contain appropriate acknowledgement of the data source(s) and the IAEA Marine Radioactivity Information System (MARIS).', 'comment': 'TBD', 
'geospatial_lat_min': '49.43222222222222', 'geospatial_lon_min': '-58.23166666666667', 'geospatial_lat_max': '81.26805555555555', 'geospatial_lon_max': '36.181666666666665', 'geospatial_vertical_min': '0.0', 'geospatial_vertical_max': '1850.0', 'geospatial_bounds': 'POLYGON ((-58.23166666666667 36.181666666666665, 49.43222222222222 36.181666666666665, 49.43222222222222 81.26805555555555, -58.23166666666667 81.26805555555555, -58.23166666666667 36.181666666666665))', 'geospatial_bounds_crs': 'EPSG:4326', 'time_coverage_start': '1995-01-01T00:00:00', 'time_coverage_end': '2022-12-31T00:00:00', 'local_time_zone': 'TBD', 'date_created': 'TBD', 'date_modified': 'TBD', 'publisher_postprocess_logs': "Convert 'nuclide' column values to lowercase, strip spaces, and store in 'nuclide' column., Remap data provider nuclide names to standardized MARIS nuclide names., Parse the time format in the dataframe and check for inconsistencies., Encode time as seconds since epoch., Sanitize value by removing blank entries and populating `value` column., Normalize uncertainty values in DataFrames., Callback to update DataFrame 'UNIT' columns based on a lookup table., Remap detection limit values to MARIS format using a lookup table., Remap values from 'species' to 'SPECIES' for groups: BIOTA., Remap values from 'biological group' to 'enhanced_species' for groups: BIOTA., Enhance the 'SPECIES' column using the 'enhanced_species' column if conditions are met., Add a temporary column with the body part and biological group combined., Remap values from 'body_part_temp' to 'BODY_PART' for groups: BIOTA., Include a SMP_ID column from the ID column of OSPAR, Ensure depth values are floats and add 'SMP_DEPTH' columns., Convert Coordinates to decimal degrees (DDD.DDDDD°)., Drop rows with invalid longitude & latitude values. Convert `,` separator to `.` separator."}
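Several attributes in the output above still hold the placeholder 'TBD' or an empty string. A quick way to list them is sketched below; the pending_attrs helper is hypothetical, and the sample dictionary is a small excerpt of the attributes printed above.

```python
def pending_attrs(global_attrs, placeholder='TBD'):
    """Return the names of attributes that are empty or still a placeholder."""
    return sorted(
        name for name, value in global_attrs.items()
        if value == placeholder or value == ''
    )

# Excerpt of the global attributes shown above.
sample_attrs = {
    'id': 'LQRA4MMK',
    'summary': '',
    'history': 'TBD',
    'record': 'TBD',
    'Conventions': 'CF-1.10 ACDD-1.3',
}
pending = pending_attrs(sample_attrs)
print(pending)  # ['history', 'record', 'summary']
```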
Review the publisher_postprocess_logs.
print(contents.global_attrs['publisher_postprocess_logs'])
Convert 'nuclide' column values to lowercase, strip spaces, and store in 'nuclide' column., Remap data provider nuclide names to standardized MARIS nuclide names., Parse the time format in the dataframe and check for inconsistencies., Encode time as seconds since epoch., Sanitize value by removing blank entries and populating `value` column., Normalize uncertainty values in DataFrames., Callback to update DataFrame 'UNIT' columns based on a lookup table., Remap detection limit values to MARIS format using a lookup table., Remap values from 'species' to 'SPECIES' for groups: BIOTA., Remap values from 'biological group' to 'enhanced_species' for groups: BIOTA., Enhance the 'SPECIES' column using the 'enhanced_species' column if conditions are met., Add a temporary column with the body part and biological group combined., Remap values from 'body_part_temp' to 'BODY_PART' for groups: BIOTA., Include a SMP_ID column from the ID column of OSPAR, Ensure depth values are floats and add 'SMP_DEPTH' columns., Convert Coordinates to decimal degrees (DDD.DDDDD°)., Drop rows with invalid longitude & latitude values. Convert `,` separator to `.` separator.
Now let's review the enums of the groups in the NetCDF file:
print(contents.enum_dicts)
{'BIOTA': {'nuclide': {'NOT APPLICABLE': '-1', 'NOT AVAILABLE': '0', 'h3': '1', 'be7': '2', 'c14': '3', 'k40': '4', 'cr51': '5', 'mn54': '6', 'co57': '7', 'co58': '8', 'co60': '9', 'zn65': '10', 'sr89': '11', 'sr90': '12', 'zr95': '13', 'nb95': '14', 'tc99': '15', 'ru103': '16', 'ru106': '17', 'rh106': '18', 'ag106m': '19', 'ag108': '20', 'ag108m': '21', 'ag110m': '22', 'sb124': '23', 'sb125': '24', 'te129m': '25', 'i129': '28', 'i131': '29', 'cs127': '30', 'cs134': '31', 'cs137': '33', 'ba140': '34', 'la140': '35', 'ce141': '36', 'ce144': '37', 'pm147': '38', 'eu154': '39', 'eu155': '40', 'pb210': '41', 'pb212': '42', 'pb214': '43', 'bi207': '44', 'bi211': '45', 'bi214': '46', 'po210': '47', 'rn220': '48', 'rn222': '49', 'ra223': '50', 'ra224': '51', 'ra225': '52', 'ra226': '53', 'ra228': '54', 'ac228': '55', 'th227': '56', 'th228': '57', 'th232': '59', 'th234': '60', 'pa234': '61', 'u234': '62', 'u235': '63', 'u238': '64', 'np237': '65', 'np239': '66', 'pu238': '67', 'pu239': '68', 'pu240': '69', 'pu241': '70', 'am240': '71', 'am241': '72', 'cm242': '73', 'cm243': '74', 'cm244': '75', 'cs134_137_tot': '76', 'pu239_240_tot': '77', 'pu239_240_iii_iv_tot': '78', 'pu239_240_v_vi_tot': '79', 'cm243_244_tot': '80', 'pu238_pu239_240_tot_ratio': '81', 'am241_pu239_240_tot_ratio': '82', 'cs137_134_ratio': '83', 'cd109': '84', 'eu152': '85', 'fe59': '86', 'gd153': '87', 'ir192': '88', 'pu238_240_tot': '89', 'rb86': '90', 'sc46': '91', 'sn113': '92', 'sn117m': '93', 'tl208': '94', 'mo99': '95', 'tc99m': '96', 'ru105': '97', 'te129': '98', 'te132': '99', 'i132': '100', 'i135': '101', 'cs136': '102', 'tbeta': '103', 'talpha': '104', 'i133': '105', 'th230': '106', 'pa231': '107', 'u236': '108', 'ag111': '109', 'in116m': '110', 'te123m': '111', 'sb127': '112', 'ba133': '113', 'ce139': '114', 'tl201': '116', 'hg203': '117', 'na22': '122', 'pa234m': '123', 'am243': '124', 'se75': '126', 'sr85': '127', 'y88': '128', 'ce140': '129', 'bi212': '130', 'u236_238_ratio': '131', 'i125': 
'132', 'ba137m': '133', 'u232': '134', 'pa233': '135', 'ru106_rh106_tot': '136', 'tu': '137', 'tbeta40k': '138', 'fe55': '139', 'ce144_pr144_tot': '140', 'pu240_pu239_ratio': '141', 'u233': '142', 'pu239_242_tot': '143', 'ac227': '144'}, 'unit': {'Not applicable': '-1', 'NOT AVAILABLE': '0', 'Bq per m3': '1', 'Bq per m2': '2', 'Bq per kg': '3', 'Bq per kgd': '4', 'Bq per kgw': '5', 'kg per kg': '6', 'TU': '7', 'DELTA per mill': '8', 'atom per kg': '9', 'atom per kgd': '10', 'atom per kgw': '11', 'atom per l': '12', 'Bq per kgC': '13'}, 'dl': {'Not applicable': '-1', 'Not available': '0', 'Detected value': '1', 'Detection limit': '2', 'Not detected': '3', 'Derived': '4'}, 'species': {'NOT AVAILABLE': '0', 'Aristeus antennatus': '1', 'Apostichopus': '2', 'Saccharina japonica var religiosa': '3', 'Siganus fuscescens': '4', 'Alpheus dentipes': '5', 'Hexagrammos agrammus': '6', 'Ditrema temminckii': '7', 'Parapristipoma trilineatum': '8', 'Scombrops boops': '9', 'Pseudopleuronectes schrenki': '10', 'Desmarestia ligulata': '11', 'Saccharina japonica': '12', 'Neodilsea yendoana': '13', 'Costaria costata': '14', 'Sargassum yezoense': '15', 'Acanthephyra pelagica': '16', 'Sargassum ringgoldianum': '17', 'Acanthephyra quadrispinosa': '18', 'Sargassum thunbergii': '19', 'Sargassum patens': '20', 'Asterias rubens': '21', 'Sargassum miyabei': '22', 'Homarus gammarus': '23', 'Acanthephyra stylorostratis': '24', 'Acanthocybium solandri': '25', 'Acanthopagrus bifasciatus': '26', 'Acanthophora muscoides': '27', 'Acanthophora spicifera': '28', 'Acanthurus triostegus': '29', 'Actinopterygii': '30', 'Adamussium colbecki': '31', 'Ahnfeltiopsis densa': '32', 'Alepes melanoptera': '33', 'Ampharetidae': '34', 'Anchoviella lepidentostole': '35', 'Anguillidae': '36', 'Aphroditidae': '37', 'Arnoglossus': '38', 'Aurigequula fasciata': '39', 'Balaenoptera musculus': '40', 'Balaenoptera physalus': '41', 'Balistes': '42', 'Beryciformes': '43', 'Bryopsis maxima': '44', 'Callinectes sp': '45', 
'Callorhinus ursinus': '46', 'Carassius auratus auratus': '47', 'Carcharhinus sorrah': '48', 'Caridae': '49', 'Clupea harengus': '50', 'Cathorops spixii': '51', 'Caulerpa racemosa': '52', 'Caulerpa scalpelliformis': '53', 'Caulerpa sertularioides': '54', 'Cellana radiata': '55', 'Coscinasterias tenuispina': '56', 'Centroceras clavulatum': '57', 'Centropomus parallelus': '58', 'Crangon crangon': '59', 'Ceramium diaphanum': '60', 'Ceramium rubrum': '61', 'Chaenocephalus aceratus': '62', 'Chaetodipterus faber': '63', 'Chaetomorpha antennina': '64', 'Chaetomorpha linoides': '65', 'Chelidonichthys kumu': '66', 'Chelon ramada': '67', 'Chiloscyllium': '68', 'Chionodraco hamatus': '69', 'Chlamys islandica': '70', 'Chlorophyta': '71', 'Chondrichthyes': '72', 'Chrysaora': '73', 'Cladophora nitellopsis': '74', 'Cladophora vagabunda': '75', 'Cladophoropsis membranacea': '76', 'Clupea': '77', 'Coccotylus truncatus': '78', 'Codium fragile': '79', 'Crassostrea': '80', 'Cynoscion acoupa': '81', 'Cynoscion jamaicensis': '82', 'Cynoscion leiarchus': '83', 'Engraulis encrasicolus': '84', 'Cypselurus agoo agoo': '85', 'Cystophora cristata': '86', 'Cystoseira barbata': '87', 'Cystoseira crinita': '88', 'Decapodiformes': '89', 'Decapterus russelli': '90', 'Decapterus scombrinus': '91', 'Delphinapterus leucas': '92', 'Delphinus capensis': '93', 'Diapterus rhombeus': '94', 'Dicentrarchus punctatus': '95', 'Fucus vesiculosus': '96', 'Funchalia woodwardi': '97', 'Ecklonia bicyclis': '98', 'Gadus morhua': '99', 'Ecklonia kurome': '100', 'Gennadas elegans': '101', 'Eisenia arborea': '102', 'Encrasicholina devisi': '103', 'Enteromorpha': '104', 'Enteromorpha flexuosa': '105', 'Enteromorpha intestinalis': '106', 'Epinephelinae': '107', 'Epinephelus diacanthus': '108', 'Exocoetidae': '109', 'Saccharina latissima': '110', 'Gracilaria corticata': '111', 'Ligur ensiferus': '112', 'Gracilaria debilis': '113', 'Gracilaria edulis': '114', 'Gracilariales': '115', 'Grateloupia elliptica': '116', 
'Grateloupia filicina': '117', 'Lysmata seticaudata': '118', 'Gymnogongrus griffithsiae': '119', 'Mya arenaria': '120', 'Halichoerus grypus': '121', 'Macoma balthica': '122', 'Marthasterias glacialis': '123', 'Halimeda macroloba': '124', 'Harengula clupeola': '125', 'Harpagifer antarcticus': '126', 'Hemifusus ternatanus': '127', 'Hemiramphus brasiliensis': '128', 'Mytilus edulis': '129', 'Metapenaeus affinis': '130', 'Heteroscleromorpha': '131', 'Heterosigma akashiwo': '132', 'Hilsa ilisha': '133', 'Metapenaeus monoceros': '134', 'Metapenaeus stebbingi': '135', 'Holothuria': '136', 'Hoplobrotula armata': '137', 'Hypnea musciformis': '138', 'Merlangius merlangus': '139', 'Iridaea cordata': '140', 'Jania rubens': '141', 'Meganyctiphanes norvegica': '142', 'Johnius glaucus': '143', 'Kappaphycus': '144', 'Kappaphycus alvarezii': '145', 'Laevistrombus canarium': '146', 'Lagenodelphis hosei': '147', 'Lambia': '148', 'Laminaria japonica': '149', 'Laminaria longissima': '150', 'Larimus breviceps': '151', 'Laurencia papillosa': '152', 'Leiognathidae': '153', 'Leiognathus dussumieri': '154', 'Lepidochelys olivacea': '155', 'Leptonychotes weddellii': '156', 'Limanda yokohamae': '157', 'Nephrops norvegicus': '158', 'Neuston': '159', 'Littoraria undulata': '160', 'Loligo vulgaris': '161', 'Lumbrineridae': '162', 'Lutjanus fulviflamma': '163', 'Marginisporum aberrans': '164', 'Megalaspis cordyla': '165', 'Octopus vulgaris': '166', 'Menticirrhus americanus': '167', 'Mesoplodon densirostris': '168', 'Palaemon longirostris': '169', 'Metapenaeus brevicornis': '170', 'Pasiphaea multidentata': '171', 'Pasiphaea sivado': '172', 'Parapenaeopsis stylifera': '173', 'Miichthys miiuy': '174', 'Mirounga leonina': '175', 'Brachidontes striatulus': '176', 'Monodon monoceros': '177', 'Mugil platanus': '178', 'Penaeus semisulcatus': '179', 'Mullus barbatus': '180', 'Mycteroperca rubra': '181', 'Philocheras echinulatus': '182', 'Myelophycus simplex': '183', 'Mytilus coruscus': '184', 'Penaeus 
indicus': '185', 'Natator depressus': '186', 'Pandalus jordani': '187', 'Melicertus kerathurus': '188', 'Parapenaeus longirostris': '189', 'Plesionika': '190', 'Platichthys flesus': '191', 'Pleuronectes platessa': '192', 'Nematopalaemon tenuipes': '193', 'Nematoscelis difficilis': '194', 'Nemipterus': '195', 'Aegaeon lacazei': '196', 'Nephtyidae': '197', 'Nereididae': '198', 'Netuma bilineata': '199', 'Nibea maculata': '200', 'Oceana serrulata': '201', 'Palaemon serratus': '202', 'Ocypode': '203', 'Odobenus rosmarus': '204', 'Ogcocephalus vespertilio': '205', 'Oligoplites saurus': '206', 'Onuphidae': '207', 'Opheliidae': '208', 'Opisthonema oglinum': '209', 'Opisthopterus tardoore': '210', 'Orientomysis mitsukurii': '211', 'Otolithes cuvieri': '212', 'Padina pavonica': '213', 'Padina tetrastromatica': '214', 'Padina vickersiae': '215', 'Pagellus affinis': '216', 'Pagophilus groenlandicus': '217', 'Paguroidea': '218', 'Pagurus': '219', 'Systellaspis debilis': '220', 'Sergestes': '221', 'Sergestes arcticus': '222', 'Pampus argenteus': '223', 'Sergestes arachnipodus': '224', 'Sergestes henseni': '225', 'Sergestes prehensilis': '226', 'Sergestes robustus': '227', 'Pangasius pangasius': '228', 'Panulirus homarus': '229', 'Paracentrotus lividus': '230', 'Pasiphaea sp': '231', 'Pectinariidae': '232', 'Penaeus': '233', 'Phoca vitulina': '234', 'Photopectoralis bindus': '235', 'Phyllospadix iwatensis': '236', 'Plectorhinchus mediterraneus': '237', 'Pleuronectes mochigarei': '238', 'Pleuronectes obscurus': '239', 'Plocamium brasiliense': '240', 'Polynemus paradiseus': '241', 'Polysiphonia': '242', 'Sprattus sprattus': '243', 'Scomber scombrus': '244', 'Polysiphonia fucoides': '245', 'Gonostomatidae': '246', 'Perca fluviatilis': '247', 'Pomadasys crocro': '248', 'Porphyra tenera': '249', 'Potamogeton pectinatus': '250', 'Priacanthus hamrur': '251', 'Pseudorhombus malayanus': '252', 'Pterocladiella capillacea': '253', 'Pusa caspica': '254', 'Pusa sibirica': '255', 'Pylaiella 
littoralis': '256', 'Sabellidae': '257', 'Salangichthys ishikawae': '258', 'Sarconema filiforme': '259', 'Sardinella albella': '260', 'Sardinella brasiliensis': '261', 'Sardinops melanostictus': '262', 'Sargassum cymosum': '263', 'Sargassum linearifolium': '264', 'Sargassum micracanthum': '265', 'Xiphias gladius': '266', 'Sargassum novae hollandiae': '267', 'Sargassum oligocystum': '268', 'Esox lucius': '269', 'Limanda limanda': '270', 'Abramis brama': '271', 'Anguilla anguilla': '272', 'Arctica islandica': '273', 'Cerastoderma edule': '274', 'Cyprinus carpio': '275', 'Echinodermata': '276', 'Fish larvae': '277', 'Myoxocephalus scorpius': '278', 'Osmerus eperlanus': '279', 'Plankton': '280', 'Scophthalmus maximus': '281', 'Rhodophyta': '282', 'Rutilus rutilus': '283', 'Saduria entomon': '284', 'Sander lucioperca': '285', 'Gasterosteus aculeatus': '286', 'Zoarces viviparus': '287', 'Gymnocephalus cernua': '288', 'Furcellaria lumbricalis': '289', 'Cladophora glomerata': '290', 'Lateolabrax japonicus': '291', 'Okamejei kenojei': '292', 'Sebastes pachycephalus': '293', 'Squalus acanthias': '294', 'Gadus macrocephalus': '295', 'Paralichthys olivaceus': '296', 'Ovalipes punctatus': '297', 'Pseudopleuronectes yokohamae': '298', 'Hemitripterus villosus': '299', 'Clidoderma asperrimum': '300', 'Microstomus achne': '301', 'Lepidotrigla microptera': '302', 'Hexagrammos otakii': '303', 'Kareius bicoloratus': '304', 'Pleuronichthys cornutus': '305', 'Enteroctopus dofleini': '306', 'Ammodytes personatus': '307', 'Lophius litulon': '308', 'Eopsetta grigorjewi': '309', 'Takifugu porphyreus': '310', 'Loliolus japonica': '311', 'Sepia andreana': '312', 'Sebastes cheni': '313', 'Portunus trituberculatus': '314', 'Sebastes schlegelii': '315', 'Pennahia argentata': '316', 'Platichthys stellatus': '317', 'Gadus chalcogrammus': '318', 'Chelidonichthys spinosus': '319', 'Conger myriaster': '320', 'Heterololigo bleekeri': '321', 'Stichaeus grigorjewi': '322', 'Pseudopleuronectes 
herzensteini': '323', 'Octopus conispadiceus': '324', 'Hippoglossoides dubius': '325', 'Cleisthenes pinetorum': '326', 'Glyptocephalus stelleri': '327', 'Tanakius kitaharae': '328', 'Nibea mitsukurii': '329', 'Dasyatis matsubarai': '330', 'Verasper moseri': '331', 'Hemitrygon akajei': '332', 'Triakis scyllium': '333', 'Trachurus japonicus': '334', 'Zeus faber': '335', 'Pagrus major': '336', 'Acanthopagrus schlegelii': '337', 'Dentex tumifrons': '338', 'Mustelus manazo': '339', 'Seriola quinqueradiata': '340', 'Hyperoglyphe japonica': '341', 'Carcharhinus': '342', 'Platycephalus': '343', 'Scomber japonicus': '344', 'Squatina japonica': '345', 'Alopias pelagicus': '346', 'Zenopsis nebulosa': '347', 'Cynoglossus joyneri': '348', 'Verasper variegatus': '349', 'Oncorhynchus keta': '350', 'Physiculus japonicus': '351', 'Oplegnathus punctatus': '352', 'Arothron hispidus': '353', 'Stereolepis doederleini': '354', 'Takifugu snyderi': '355', 'Scomber australasicus': '356', 'Liparis tanakae': '357', 'Thamnaconus modestus': '358', 'Gnathophis nystromi': '359', 'Sebastes oblongus': '360', 'Sebastiscus marmoratus': '361', 'Takifugu pardalis': '362', 'Mugil cephalus': '363', 'Ditrema temminckii temminckii': '364', 'Konosirus punctatus': '365', 'Tribolodon brandtii': '366', 'Oncorhynchus masou': '367', 'Aluterus monoceros': '368', 'Todarodes pacificus': '369', 'Myoxocephalus stelleri': '370', 'Myliobatis tobijei': '371', 'Scyliorhinus torazame': '372', 'Lophiomus setigerus': '373', 'Heterodontus japonicus': '374', 'Sebastes vulpes': '375', 'Paraplagusia japonica': '376', 'Ostrea edulis': '377', 'Melanogrammus aeglefinus': '378', 'Pollachius virens': '379', 'Pollachius pollachius': '380', 'Sebastes marinus': '381', 'Anarhichas minor': '382', 'Anarhichas denticulatus': '383', 'Reinhardtius hippoglossoides': '384', 'Trisopterus esmarkii': '385', 'Micromesistius poutassou': '386', 'Coryphaenoides rupestris': '387', 'Argentina silus': '388', 'Salmo salar': '389', 'Sebastes viviparus': 
'390', 'Buccinum undatum': '391', 'Fucus serratus': '392', 'Merluccius merluccius': '393', 'Littorina littorea': '394', 'Fucus': '395', 'Rhodymenia': '396', 'Solea solea': '397', 'Trachurus trachurus': '398', 'Eutrigla gurnardus': '399', 'Pelvetia canaliculata': '400', 'Ascophyllum nodosum': '401', 'Mallotus villosus': '402', 'Pecten maximus': '403', 'Hippoglossoides platessoides': '404', 'Sebastes mentella': '405', 'Modiolus modiolus': '406', 'Boreogadus saida': '407', 'Sepia': '408', 'Gadus': '409', 'Sardina pilchardus': '410', 'Pleuronectiformes': '411', 'Molva molva': '412', 'Patella': '413', 'Crassostrea gigas': '414', 'Dasyatis pastinaca': '415', 'Lophius piscatorius': '416', 'Porphyra umbilicalis': '417', 'Patella vulgata': '418', 'Brosme brosme': '419', 'Glyptocephalus cynoglossus': '420', 'Galeus melastomus': '421', 'Chimaera monstrosa': '422', 'Etmopterus spinax': '423', 'Dicentrarchus labrax': '424', 'Osilinus lineatus': '425', 'Hippoglossus hippoglossus': '426', 'Cyclopterus lumpus': '427', 'Molva dypterygia': '428', 'Microstomus kitt': '429', 'Fucus distichus': '430', 'Tapes': '431', 'Sebastes norvegicus': '432', 'Phycis blennoides': '433', 'Fucus spiralis': '434', 'Laminaria digitata': '435', 'Dipturus batis': '436', 'Anarhichas lupus': '437', 'Lumpenus lampretaeformis': '438', 'Lycodes vahlii': '439', 'Argentina sphyraena': '440', 'Trisopterus minutus': '441', 'Thunnus': '442', 'Hyperoplus lanceolatus': '443', 'Gaidropsarus argentatus': '444', 'Engraulis japonicus': '445', 'Mytilus galloprovincialis': '446', 'Undaria pinnatifida': '447', 'Chlorophthalmus albatrossis': '448', 'Sargassum fusiforme': '449', 'Eisenia bicyclis': '450', 'Spisula sachalinensis': '451', 'Strongylocentrotus nudus': '452', 'Haliotis discus hannai': '453', 'Dexistes rikuzenius': '454', 'Ruditapes philippinarum': '455', 'Apostichopus japonicus': '456', 'Pterothrissus gissu': '457', 'Helicolenus hilgendorfii': '458', 'Buccinum isaotakii': '459', 'Neptunea intersculpta': '460', 
'Apostichopus nigripunctatus': '461', 'Sebastes thompsoni': '462', 'Oratosquilla oratoria': '463', 'Oncorhynchus kisutch': '464', 'Erimacrus isenbeckii': '465', 'Sillago japonica': '466', 'Trachysalambria curvirostris': '467', 'Mytilus unguiculatus': '468', 'Crassostrea nippona': '469', 'Laminariales': '470', 'Uroteuthis edulis': '471', 'Takifugu poecilonotus': '472', 'Neptunea arthritica': '473', 'Katsuwonus pelamis': '474', 'Doederleinia berycoides': '475', 'Metapenaeopsis dalei': '476', 'Seriola dumerili': '477', 'Pseudorhombus pentophthalmus': '478', 'Stephanolepis cirrhifer': '479', 'Cookeolus japonicus': '480', 'Panulirus japonicus': '481', 'Thunnus orientalis': '482', 'Halocynthia roretzi': '483', 'Etrumeus sadina': '484', 'Cololabis saira': '485', 'Coryphaena hippurus': '486', 'Sarda orientalis': '487', 'Octopus ocellatus': '488', 'Sardinops sagax': '489', 'Sphyraena pinguis': '490', 'Sebastes ventricosus': '491', 'Occella iburia': '492', 'Glossanodon semifasciatus': '493', 'Mizuhopecten yessoensis': '494', 'Neosalangichthys ishikawae': '495', 'Bothrocara tanakae': '496', 'Malacocottus zonurus': '497', 'Coelorinchus macrochir': '498', 'Neptunea constricta': '499', 'Beringius polynematicus': '500', 'Sebastes nivosus': '501', 'Pandalus eous': '502', 'Synaphobranchus kaupii': '503', 'Sebastolobus macrochir': '504', 'Marsupenaeus japonicus': '505', 'Japelion hirasei': '506', 'Pleurogrammus azonus': '507', 'Monostroma nitidum': '508', 'Atheresthes evermanni': '509', 'Takifugu rubripes': '510', 'Chionoecetes opilio': '511', 'Pandalopsis coccinata': '512', 'Chionoecetes japonicus': '513', 'Sebastes matsubarae': '514', 'Scombrops gilberti': '515', 'Hyporhamphus sajori': '516', 'Trichiurus lepturus': '517', 'Alcichthys elongatus': '518', 'Volutharpa perryi': '519', 'Mercenaria stimpsoni': '520', 'Berryteuthis magister': '521', 'Aptocyclus ventricosus': '522', 'Euphausia pacifica': '523', 'Salangichthys microdon': '524', 'Telmessus acutidens': '525', 'Ceratophyllum 
demersum': '526', 'Pandalus nipponensis': '527', 'Sebastes owstoni': '528', 'Cociella crocodilus': '529', 'Conger japonicus': '530', 'Sardinella zunasi': '531', 'Cheilopogon pinnatibarbatus japonicus': '532', 'Oplegnathus fasciatus': '533', 'Macridiscus aequilatera': '534', 'Repomucenus ornatipinnis': '535', 'Clupea pallasii': '536', 'Scorpaena neglecta': '537', 'Scomberomorus niphonius': '538', 'Leucopsarion petersii': '539', 'Sebastes scythropus': '540', 'Strongylura anastomella': '541', 'Laemonema longipes': '542', 'Fusitriton oregonensis': '543', 'Japelion pericochlion': '544', 'Sebastes steindachneri': '545', 'Auxis rochei': '546', 'Lobotes surinamensis': '547', 'Auxis thazard': '548', 'Chlorophthalmus borealis': '549', 'Etelis coruscans': '550', 'Sebastes inermis': '551', 'Cynoglossus interruptus': '552', 'Erilepis zonifer': '553', 'Tridentiger obscurus': '554', 'Caranx sexfasciatus': '555', 'Thunnus thynnus': '556', 'Takifugu stictonotus': '557', 'Euthynnus affinis': '558', 'Synagrops japonicus': '559', 'Okamejei schmidti': '560', 'Suggrundus meerdervoortii': '561', 'Sebastes baramenuke': '562', 'Pleurogrammus monopterygius': '563', 'Decapterus maruadsi': '564', 'Girella punctata': '565', 'Sphyraena japonica': '566', 'Ommastrephes bartramii': '567', 'Sepiella japonica': '568', 'Sepioteuthis lessoniana': '569', 'Eucleoteuthis luminosa': '570', 'Gloiopeltis furcata': '571', 'Macrobrachium nipponense': '572', 'Sepia kobiensis': '573', 'Eriocheir japonica': '574', 'Magallana nippona': '575', 'Meretrix lusoria': '576', 'Chondrus ocellatus': '577', 'Chondrus elatus': '578', 'Gloiopeltis': '579', 'Holothuroidea': '580', 'Corbicula japonica': '581', 'Sunetta menstrualis': '582', 'Pseudorhombus cinnamoneus': '583', 'Takifugu niphobles': '584', 'Lagocephalus gloveri': '585', 'Beryx splendens': '586', 'Parastichopus nigripunctatus': '587', 'Venerupis philippinarum': '588', 'Haliotis': '589', 'Liparis agassizii': '590', 'Seriola lalandi': '591', 'Niphon spinosus': 
'592', 'Pleuronichthys japonicus': '593', 'Sergia lucens': '594', 'Sphoeroides pachygaster': '595', 'Coryphaenoides acrolepis': '596', 'Pseudopleuronectes obscurus': '597', 'Pyropia yezoensis': '598', 'Isurus oxyrinchus': '599', 'Sargassum fulvellum': '600', 'Prionace glauca': '601', 'Kajikia audax': '602', 'Thunnus albacares': '603', 'Thunnus alalunga': '604', 'Thunnus obesus': '605', 'Lamna ditropis': '606', 'Glyptocidaris crenularis': '607', 'Asterias amurensis': '608', 'Sepiida': '609', 'Congridae': '610', 'Takifugu': '611', 'Sargassum horneri': '612', 'Haliotis discus': '613', 'Pleuronectidae': '614', 'Acanthogobius flavimanus': '615', 'Acanthogobius lactipes': '616', 'Pholis nebulosa': '617', 'Hemigrapsus penicillatus': '618', 'Palaemon paucidens': '619', 'Mysidae': '620', 'Zostera marina': '621', 'Ulva pertusa': '622', 'Gobiidae': '623', 'Atherinidae': '624', 'Tribolodon': '625', 'Alpheus': '626', 'Polychaeta': '627', 'Sebastes': '628', 'Charybdis japonica': '629', 'Hemigrapsus': '630', 'Favonigobius gymnauchen': '631', 'Palaemon': '632', 'Planiliza haematocheila': '633', 'Palaemonidae': '634', 'Pholis crassispina': '635', 'Laminaria': '636', 'Distolasterias nipon': '637', 'Lophiiformes': '638', 'Alpheus brevicristatus': '639', 'Undaria undariodes': '640', 'Neomysis awatschensis': '641', 'Alpheidae': '642', 'Macrobrachium': '643', 'Hediste': '644', 'Gymnogobius breunigii': '645', 'Luidia quinaria': '646', 'Rhizoprionodon acutus': '647', 'Carangoides equula': '648', 'Carcinoplax longimana': '649', 'Anomura': '650', 'Spatangoida': '651', 'Plesiobatis daviesi': '652', 'Eusphyra blochii': '653', 'Ruditapes variegata': '654', 'Sinonovacula constricta': '655', 'Penaeus monodon': '656', 'Litopenaeus vannamei': '657', 'Solenocera crassicornis': '658', 'Stomatopoda': '659', 'Teuthida': '660', 'Octopus': '661', 'Larimichthys polyactis': '662', 'Scomberomorini': '663', 'Channa argus': '664', 'Ranina ranina': '665', 'Lates calcarifer': '666', 'Scomberomorus commerson': 
'667', 'Lutjanus malabaricus': '668', 'Thenus parindicus': '669', 'Amusium pleuronectes': '670', 'Loligo': '671', 'Plectropomus leopardus': '672', 'Sillago ciliata': '673', 'Scylla serrata': '674', 'Pinctada maxima': '675', 'Lutjanus argentimaculatus': '676', 'Protonibea diacanthus': '677', 'Polydactylus macrochir': '678', 'Rachycentron canadum': '679', 'Ibacus peronii': '680', 'Arripis trutta': '681', 'Sarda australis': '682', 'Seriola hippos': '683', 'Choerodon schoenleinii': '684', 'Panulirus ornatus': '685', 'Neotrygon kuhlii': '686', 'Lethrinus nebulosus': '687', 'Parupeneus multifasciatus': '688', 'Saccostrea cucullata': '689', 'Lutjanus sebae': '690', 'Thunnus maccoyii': '691', 'Acanthopagrus butcheri': '692', 'Lambis lambis': '693', 'Gerres subfasciatus': '694', 'Zooplankton': '695', 'Phytoplankton': '696', 'Rapana venosa': '697', 'Scapharca inaequivalvis': '698', 'Ulva intestinalis': '699', 'Ulva linza': '700', 'Ceramium virgatum': '701', 'Gayralia oxysperma': '702', 'Vertebrata fucoides': '703', 'Stuckenia pectinata': '704', 'Rochia nilotica': '705', 'Ctenochaetus striatus': '706', 'Serranidae': '707', 'Turbo setosus': '708', 'Pandalidae': '709', 'Gymnosarda unicolor': '710', 'Epinephelini': '711', 'Pisces': '712', 'Liza klunzingeri': '713', 'Acanthopagrus latus': '714', 'Liza subviridis': '715', 'Sparidentex hasta': '716', 'Otolithes ruber': '717', 'Crenidens crenidens': '718', 'Ensis': '719', 'Gastropoda': '720', 'Euheterodonta': '721', 'Scomber': '722', 'Theragra chalcogramma': '723', 'Engraulidae': '724', 'Ostreidae': '725', 'Phaeophyceae': '726', 'Porphyra': '727', 'Ulva reticulata': '728', 'Perna viridis': '729', 'Fenneropenaeus indicus': '730', 'Merluccius': '731', 'Soleidae': '732', 'Mugilidae': '733', 'Marine algae': '734', 'Scarus rivulatus': '735', 'Scarus coeruleus': '736', 'Sardinella fimbriata': '737', 'Dussumieria acuta': '738', 'Lutjanus kasmira': '739', 'Lutjanus rivulatus': '740', 'Lutjanus bohar': '741', 'Priacanthus blochii': '742', 
'Pelates quadrilineatus': '743', 'Epinephelus fasciatus': '744', 'Upeneus vittatus': '745', 'Lethrinus laticaudis': '746', 'Lethrinus lentjan': '747', 'Lethrinus microdon': '748', 'Sphyraena barracuda': '749', 'Alectis indica': '750', 'Epinephelus latifasciatus': '751', 'Nemipterus japonicus': '752', 'Raconda russeliana': '753', 'Lactarius lactarius': '754', 'Aetomylaeus bovinus': '755', 'Pennahia anea': '756', 'Leiognathus fasciatus': '757', 'Sardinella longiceps': '758', 'Tenualosa ilisha': '759', 'Pellona ditchela': '760', 'Stolephorus indicus': '761', 'Setipinna breviceps': '762', 'Rastrelliger kanagurta': '763', 'Chanos chanos': '764', 'Lepturacanthus savala': '765', 'Epinephelus niveatus': '766', 'Lutjanus johnii': '767', 'Carangoides malabaricus': '768', 'Ablennes hians': '769', 'Chirocentrus dorab': '770', 'Scomberomorus cavalla': '771', 'Scomberomorus semifasciatus': '772', 'Scomberomorus guttatus': '773', 'Etrumeus teres': '774', 'Spondyliosoma cantharus': '775', 'Brama brama': '776', 'Dasyatis zugei': '777', 'Harpadon nehereus': '778', 'Carcharhinus melanopterus': '779', 'Penaeus plebejus': '780', 'Sepia officinalis': '781', 'Johnius dussumieri': '782', 'Lutjanus campechanus': '783', 'Ruditapes decussatus': '784', 'Carcinus aestuarii': '785', 'Squilla mantis': '786', 'Epinephelus polyphekadion': '787', 'Lutjanus gibbus': '788', 'Lethrinus mahsena': '789', 'Epinephelus chlorostigma': '790', 'Carangoides bajad': '791', 'Aethaloperca rogaa': '792', 'Atule mate': '793', 'Macolor niger': '794', 'Carangoides fulvoguttatus': '795', 'Plectropomus areolatus': '796', 'Cephalopholis argus': '797', 'Cephalopholis': '798', 'Scarus sordidus': '799', 'Scomberomorus tritor': '800', 'Triaenodon obesus': '801', 'Pomadasys commersonnii': '802', 'Monotaxis grandoculis': '803', 'Plectropomus maculatus': '804', 'Trachinotus blochii': '805', 'Pristipomoides filamentosus': '806', 'Acanthurus gahhm': '807', 'Acanthurus sohal': '808', 'Siganus argenteus': '809', 'Naso unicornis': 
'810', 'Chanos': '811', 'Oedalechilus labiosus': '812', 'Plectorhinchus gaterinus': '813', 'Mercenaria mercenaria': '814', 'Mytilus': '815', 'Turbo cornutus': '816', 'Decapoda': '817', 'Sphyraena': '818', 'Arius maculatus': '819', 'Penaeus merguiensis': '820', 'Tegillarca granosa': '821', 'Mullus barbatus barbatus': '822', 'Chamelea gallina': '823', 'Metanephrops thomsoni': '824', 'Magallana gigas': '825', 'Branchiostegus japonicus': '826', 'Cephalopoda': '827', 'Lutjanidae': '828', 'Lethrinidae': '829', 'Sphyraena argentea': '830', 'Chirocentrus nudus': '831', 'Trachinotus': '832', 'Mugil auratus': '833', 'Euthynnus alletteratus': '834', 'Sparus aurata': '835', 'Pagrus caeruleostictus': '836', 'Scorpaena scrofa': '837', 'Pagellus erythrinus': '838', 'Epinephelus aeneus': '839', 'Dentex maroccanus': '840', 'Caranx rhonchus': '841', 'Sardinella': '842', 'Siganus': '843', 'Solea': '844', 'Diplodus sargus': '845', 'Lithognathus mormyrus': '846', 'Oblada melanura': '847', 'Siganus rivulatus': '848', 'Chelon labrosus': '849', 'Cynoscion microlepidotus': '850', 'Genypterus brasiliensis': '851', 'Myoxocephalus polyacanthocephalus': '852', 'Hexagrammos lagocephalus': '853', 'Hexagrammos decagrammus': '854', 'Sebastes ciliatus': '855', 'Lepidopsetta polyxystra': '856', 'Clupeiformes': '857', 'Gadidae': '858', 'Brachyura': '859', 'Dasyatis': '860', 'Carcharias': '861', 'Saurida': '862', 'Upeneus': '863', 'Cynoglossus': '864', 'Scomberomorus': '865', 'Terapon': '866', 'Leiognathus': '867', 'Terapontidae': '868', 'Caranx': '869', 'Diplodus': '870', 'Plectorhinchus flavomaculatus': '871', 'Salmonidae': '872', 'Mollusca': '873', 'Boops boops': '874', 'Sarpa salpa': '875', 'Pagellus acarne': '876', 'Spicara smaris': '877', 'Diplodus vulgaris': '878', 'Chelidonichthys lucerna': '879', 'Sarda sarda': '880', 'Serranus cabrilla': '881', 'Diplodus annularis': '882', 'Pagrus pagrus': '883', 'Alosa fallax': '884', 'Belone belone': '885', 'Dentex dentex': '886', 'Sphyraena viridensis': 
'887', 'Trisopterus capelanus': '888', 'Arnoglossus laterna': '889', 'Procambarus clarkii': '890', 'Nemadactylus macropterus': '891', 'Pagrus auratus': '892', 'Jasus edwardsii': '893', 'Perna canaliculus': '894', 'Pseudophycis bachus': '895', 'Haliotis iris': '896', 'Hoplostethus atlanticus': '897', 'Rhombosolea leporina': '898', 'Zygochlamys delicatula': '899', 'Galeorhinus galeus': '900', 'Parapercis colias': '901', 'Tiostrea chilensis': '902', 'Genypterus blacodes': '903', 'Evechinus chloroticus': '904', 'Austrovenus stutchburyi': '905', 'Micromesistius australis': '906', 'Macruronus novaezelandiae': '907', 'Nototodarus': '908', 'Perna perna': '909', 'Sepia pharaonis': '910', 'Turbo bruneus': '911', 'Portunus sanguinolentus': '912', 'Charybdis natator': '913', 'Charybdis lucifera': '914', 'Panulirus argus': '915', 'Ethmalosa fimbriata': '916', 'Sardinella brachysoma': '917', 'Thryssa mystax': '918', 'Plicofollis dussumieri': '919', 'Nibea soldado': '920', 'Epinephelus melanostigma': '921', 'Megalops cyprinoides': '922', 'Decapterus macarellus': '923', 'Drepane punctata': '924', 'Sillago sihama': '925', 'Tylosurus crocodilus crocodilus': '926', 'Saurida tumbil': '927', 'Cynoglossus macrostomus': '928', 'Parupeneus indicus': '929', 'Synechogobius hasta': '930', 'Busycotypus canaliculatus': '931', 'Pampus cinereus': '932', 'Pomadasys kaakan': '933', 'Epinephelus coioides': '934', 'Sepiella inermis': '935', 'Uroteuthis duvauceli': '936', 'Stomatella auricula': '937', 'Cerithium scabridum': '938', 'Marcia recens': '939', 'Circe intermedia': '940', 'Marcia opima': '941', 'Fulvia fragile': '942', 'Charybdis feriatus': '943', 'Charybdis annulata': '944', 'Atergatis integerrimus': '945', 'Matuta lunaris': '946', 'Calappa lophos': '947', 'Uca annulipes': '948', 'Chlamys varia': '949', 'Cololabis adocetus': '950', 'Seriola lalandi dorsalis': '951', 'Brunneifusus ternatanus': '952', 'Metapenaeus joyneri': '953', 'Epinephelus tauvina': '954', 'Coilia dussumieri': '955', 
'Carcharhinus dussumieri': '956', 'Upeneus tragula': '957', 'Sartoriana spinigera': '958', 'Lamellidens marginalis': '959', 'Polydactylus sextarius': '960', 'Johnius macrorhynus': '961', 'Hexanematichthys sagor': '962', 'Sargassum swartzii': '963', 'Argyrops spinifer': '964', 'Synodus intermedius': '965', 'Muraenesox cinereus': '966', 'Carangoides armatus': '967', 'Eleutheronema tetradactylum': '968', 'Mustelus mosis': '969', 'Nemipterus bipunctatus': '970', 'Lutjanus quinquelineatus': '971', 'Platycephalus indicus': '972', 'Rhabdosargus haffara': '973', 'Argyrops filamentosus': '974', 'Brachirus orientalis': '975', 'Mene maculata': '976', 'Hemiramphus marginatus': '977', 'Encrasicholina heteroloba': '978', 'Trachinotus africanus': '979', 'Bramidae': '980', 'Escualosa thoracata': '981', 'Sepia arabica': '982', 'Scatophagus argus': '983', 'Parastromateus niger': '984', 'Planiliza subviridis': '985', 'Labeo rohita': '986', 'Oreochromis niloticus': '987', 'Cardiidae': '988', 'Sargassum angustifolium': '989', 'Pomacea bridgesii': '990', 'Sebastes fasciatus': '991', 'Batoidea': '992', 'Urophycis chuss': '993', 'Dalatias licha': '994', 'Trisopterus luscus': '995', 'Scyliorhinus canicula': '996', 'Ruvettus pretiosus': '997', 'Aphanopus carbo': '998', 'Alepocephalus bairdii': '999', 'Centroscymnus coelolepis': '1000', 'Loligo forbesii': '1001', 'Lutjanus cyanopterus': '1002', 'Mugil liza': '1003', 'Micropogonias furnieri': '1004', 'Balistes capriscus': '1005', 'Haemulidae': '1006', 'Stenotomus caprinus': '1007', 'Hemanthias leptus': '1008', 'Micropogonias undulatus': '1009', 'Cynoscion nebulosus': '1010', 'Rhomboplites aurorubens': '1011', 'Bothidae': '1012', 'Pogonias cromis': '1013', 'Lutjanus synagris': '1014', 'Netuma thalassina': '1015', 'Sillaginopsis panijus': '1016', 'Leptomelanosoma indicum': '1017', 'Therapon': '1018', 'Pterotolithus maculatus': '1019', 'Ilisha filigera': '1020', 'Hilsa kelee': '1021', 'Pampus chinensis': '1022', 'Palaemon styliferus': '1023', 
'Argyrosomus regius': '1024', 'Lutjanus': '1025', 'Sciades': '1026', 'Mullus': '1027', 'Albula vulpes': '1028', 'Selar crumenophthalmus': '1029', 'Centropomus': '1030', 'Sardinella aurita': '1031', 'Harengula humeralis': '1032', 'Diapterus auratus': '1033', 'Gerres cinereus': '1034', 'Haemulon parra': '1035', 'Ocyurus chrysurus': '1036', 'Sphyraena guachancho': '1037', 'Anoplopoma fimbria': '1038', 'Nerita versicolor': '1039', 'Bulla striata': '1040', 'Melongena melongena': '1041', 'Trachycardium muricatum': '1042', 'Isognomon alatus': '1043', 'Brachidontes exustus': '1044', 'Crassostrea virginica': '1045', 'Protothaca granulata': '1046', 'Cittarium pica': '1047', 'Penaeus schmitti': '1048', 'Penaeus notialis': '1049', 'Callinectes sapidus': '1050', 'Callinectes danae': '1051', 'Dasyatidae': '1052', 'Caridea': '1053', 'Nephropidae': '1054', 'Sparus': '1055', 'Sargassum boveanum': '1056', 'Haliotis tuberculata': '1057', 'Littorinidae': '1058', 'Seaweed': '1059', 'Echinoidea': '1060', 'Ostreida': '1061', 'Donax trunculus': '1062', 'Scrobicularia plana': '1063', 'Venus verrucosa': '1064', 'Solen marginatus': '1065', 'Testudines': '1066', 'Mullidae': '1067', 'Amphipoda': '1068', 'Cystosphaera jacquinotii': '1069', 'Daption capense': '1070', 'Desmarestia anceps': '1071', 'Himantothallus grandifolius': '1072', 'Mirounga': '1073', 'Nacella concinna': '1074', 'Notothenia coriiceps': '1075', 'Pygoscelis antarcticus': '1076', 'Pygoscelis papua': '1077', 'Oncorhynchus gorbuscha': '1078', 'Oncorhynchus mykiss': '1079', 'Oncorhynchus nerka': '1080', 'Oncorhynchus tshawytscha': '1081', 'Erignathus barbatus': '1082', 'Pusa hispida': '1083', 'Hippoglossus stenolepis': '1084', 'Squalus suckleyi': '1085', 'Sargassum': '1086', 'Codium': '1087', 'Membranoptera alata': '1088', 'Dictyota dichotoma': '1089', 'Plocamium cartilagineum': '1090', 'Galatea paradoxa': '1091', 'Crassostrea tulipa': '1092', 'Macrobrachium sp': '1093', 'Portunus': '1094', 'Tympanotonos fuscatus': '1095', 'Thais': 
'1096', 'Bivalvia': '1097', 'Cynoglossus senegalensis': '1098', 'Carlarius heudelotii': '1099', 'Fontitrygon margarita': '1100', 'Chrysichthys nigrodigitatus': '1101', 'Acanthephyra purpurea': '1102', 'Actinauge abyssorum': '1103', 'Alaria marginata': '1104', 'Anadara transversa': '1105', 'Anthomedusae': '1106', 'Archosargus probatocephalus': '1107', 'Argyropelecus aculeatus': '1108', 'Ariopsis felis': '1109', 'Astrometis sertulifera': '1110', 'Astropecten': '1111', 'Atherina breviceps': '1112', 'Atolla': '1113', 'Aulacomya atra': '1114', 'Auxis rochei rochei': '1115', 'Auxis thazard thazard': '1116', 'Avicennia marina': '1117', 'Balaena mysticetus': '1118', 'Balaenoptera acutorostrata': '1119', 'Balanus': '1120', 'Berardius bairdii': '1121', 'Beroe': '1122', 'Boopsoidea inornata': '1123', 'Calanoida': '1124', 'Calanus finmarchicus finmarchicus': '1125', 'Callorhinchus milii': '1126', 'Cepphus columba': '1127', 'Cladonia rangiferina': '1128', 'Clinus superciliosus': '1129', 'Codium tomentosum': '1130', 'Copepoda': '1131', 'Coregonus autumnalis': '1132', 'Coregonus nasus': '1133', 'Coregonus sardinella': '1134', 'Coryphaenoides armatus': '1135', 'Coryphoblennius galerita': '1136', 'Creseis sp': '1137', 'Crinoidea': '1138', 'Crossota': '1139', 'Cryptochiton stelleri': '1140', 'Delphinus delphis': '1141', 'Diacria': '1142', 'Dichistius capensis': '1143', 'Dosinia alta': '1144', 'Dugong dugon': '1145', 'Electrona risso': '1146', 'Engraulis capensis': '1147', 'Ensis siliqua': '1148', 'Eryonidae': '1149', 'Eualaria fistulosa': '1150', 'Eupasiphae gilesii': '1151', 'Euphausiacea': '1152', 'Euphausiidae': '1153', 'Eurypharynx pelecanoides': '1154', 'Eurythenes gryllus': '1155', 'Euthynnus lineatus': '1156', 'Fratercula cirrhata': '1157', 'Galeichthys feliceps': '1158', 'Gelidium corneum': '1159', 'Gibbula umbilicalis': '1160', 'Gnathophausia ingens': '1161', 'Gonatus fabricii': '1162', 'Haliaeetus leucocephalus': '1163', 'Haliclona': '1164', 'Halodule uninervis': '1165', 
'Hemilepidotus': '1166', 'Hemilepidotus jordani': '1167', 'Heterocarpus ensifer': '1168', 'Heterodontus portusjacksoni': '1169', 'Hippasteria phrygiana': '1170', 'Homola barbata': '1171', 'Hyperoodon planifrons': '1172', 'Hypleurochilus geminatus': '1173', 'Invertebrata': '1174', 'Isognomon bicolor': '1175', 'Isopoda': '1176', 'Kogia breviceps': '1177', 'Labrus bergylta': '1178', 'Lagenorhynchus obliquidens': '1179', 'Lampris guttatus': '1180', 'Larus glaucescens': '1181', 'Leander serratus': '1182', 'Libinia emarginata': '1183', 'Lichia amia': '1184', 'Lipophrys pholis': '1185', 'Lipophrys trigloides': '1186', 'Lithognathus lithognathus': '1187', 'Lithophaga aristata': '1188', 'Lobianchia gemellarii': '1189', 'Loliginidae': '1190', 'Loligo reynaudii': '1191', 'Lophius budegassa': '1192', 'Magallana angulata': '1193', 'Majoidea': '1194', 'Megachasma pelagios': '1195', 'Megaptera novaeangliae': '1196', 'Menippe mercenaria': '1197', 'Mesoplodon carlhubbsi': '1198', 'Mesoplodon stejnegeri': '1199', 'Microstomus pacificus': '1200', 'Morone saxatilis': '1201', 'Mullus surmuletus': '1202', 'Mycteroperca xenarcha': '1203', 'Myliobatis australis': '1204', 'Mysida': '1205', 'Mytilus californianus': '1206', 'Mytilus trossulus': '1207', 'Nephasoma Nephasoma flagriferum': '1208', 'Nudibranchia': '1209', 'Odobenus rosmarus divergens': '1210', 'Ommastrephidae': '1211', 'Ophiomusa lymani': '1212', 'Ophiothrix lineata': '1213', 'Orcinus orca': '1214', 'Ostracoda': '1215', 'Pagellus bogaraveo': '1216', 'Pandalus borealis': '1217', 'Paphies subtriangulata': '1218', 'Parabrotula': '1219', 'Paracalanus': '1220', 'Patella aspera': '1221', 'Periphylla': '1222', 'Phocoena phocoena': '1223', 'Phocoenoides dalli': '1224', 'Phronima': '1225', 'Physeter macrocephalus': '1226', 'Pinctada radiata': '1227', 'Plesionika edwardsii': '1228', 'Pododesmus macrochisma': '1229', 'Pomatomus saltatrix': '1230', 'Portunus pelagicus': '1231', 'Praunus': '1232', 'Pyrosoma': '1233', 'Rangifer tarandus': 
'1234', 'Rhabdosargus globiceps': '1235', 'Saccorhiza polyschides': '1236', 'Sagitta': '1237', 'Salpa': '1238', 'Salvelinus alpinus': '1239', 'Salvelinus malma': '1240', 'Sarda chiliensis': '1241', 'Sargassum aquifolium': '1242', 'Scalibregmatidae': '1243', 'Sebastes alutus': '1244', 'Sebastes melanops': '1245', 'Seriola dorsalis': '1246', 'Serranus scriba': '1247', 'Sigmops bathyphilus': '1248', 'Silicula fragilis': '1249', 'Sipunculidae': '1250', 'Somateria mollissima': '1251', 'Somateria spectabilis': '1252', 'Sparodon durbanensis': '1253', 'Spicara maena': '1254', 'Squatina australis': '1255', 'Striostrea margaritacea': '1256', 'Stromateus fiatola': '1257', 'Strongylocentrotus polyacanthus': '1258', 'Taractichthys steindachneri': '1259', 'Tectura scutum': '1260', 'Tegula viridula': '1261', 'Thais haemastoma': '1262', 'Thegrefg': '1263', 'Themisto': '1264', 'Thunnus tonggol': '1265', 'Trachurus picturatus': '1266', 'Trachurus symmetricus': '1267', 'Trygonorrhina fasciata': '1268', 'Ulva lactuca': '1269', 'Ursus maritimus': '1270', 'Vampyroteuthis infernalis': '1271', 'Ziphius cavirostris': '1272', 'Alepes kleinii': '1273', 'Alepes vari': '1274', 'Decapterus macrosoma': '1275', 'Lutjanus madras': '1276', 'Lutjanus russellii': '1277', 'Rastrelliger brachysoma': '1278', 'Rastrelliger faughni': '1279', 'Selar boops': '1280', 'Selaroides leptolepis': '1281', 'Sphyraena obtusata': '1282', 'Geloina expansa': '1283', 'Caesio erythrogaster': '1284', 'Euristhmus microceps': '1285', 'Pomacanthus annularis': '1286', 'Scylla': '1287', 'Plotosus lineatus': '1288', 'Prionotus stephanophrys': '1289', 'Trachurus murphyi': '1290', 'Dosidicus gigas': '1291', 'Sarda chiliensis chiliensis': '1292', 'Cynoscion analis': '1293', 'Merluccius gayi peruanus': '1294', 'Brotula ordwayi': '1295', 'Loligo gahi': '1296', 'Merluccius gayi': '1297', 'Ophichthus remiger': '1298', 'Penaeus sp': '1299', 'Trachinotus paitensis': '1300', 'Cheilopogon heterurus': '1301', 'Engraulis ringens': '1302', 
'Sciaena deliciosa': '1303', 'Isacia conceptionis': '1304', 'Odontesthes regia': '1305', 'Bodianus diplotaenia': '1306', 'Concholepas concholepas': '1307', 'Diplectrum conceptione': '1308', 'Genypterus maculatus': '1309', 'Labrisomus philippii': '1310', 'Paralabrax humeralis': '1311', 'Prionotus horrens': '1312', 'Dasyatis akajei': '1313', 'Arctoscopus japonicus': '1314', 'Sepia esculenta': '1315', 'Bothrocara hollandi': '1316', 'Cynoglossidae': '1317', 'Lepidotrigla': '1318', 'Lepidotrigla alata': '1319', 'Octopus sinensis': '1320', 'Rhabdosargus sarba': '1321', 'Lophiidae': '1322', 'Muraenesox': '1323', 'Physiculus maximowiczi': '1324', 'Pleuronectoidei': '1325', 'Sciaenidae': '1326', 'Triglidae': '1327', 'Atherina presbyter': '1328', 'Bentheogennema intermedia': '1329', 'Benthesicymidae': '1330', 'Benthesicymus': '1331', 'Buccinum striatissimum': '1332', 'Callinectes': '1333', 'Cancer pagurus': '1334', 'Chaetognatha': '1335', 'Chama macerophylla': '1336', 'Cirripedia': '1337', 'Cyclosalpa': '1338', 'Cymopolia barbata': '1339', 'Cynoscion': '1340', 'Cystoseira amentacea': '1341', 'Ectocarpus siliculosus': '1342', 'Ellisolandia elongata': '1343', 'Enteromorpha linza': '1344', 'Euphausia superba': '1345', 'Gaidropsarus mediterraneus': '1346', 'Gennadas valens': '1347', 'Globicephala': '1348', 'Haliptilon virgatum': '1349', 'Halocynthia aurantium': '1350', 'Heliocidaris crassispina': '1351', 'Hymenodora gracilis': '1352', 'Lagodon rhomboides': '1353', 'Lepas Anatifa anatifera': '1354', 'Lobophora variegata': '1355', 'Macrocystis pyrifera': '1356', 'Maculabatis gerrardi': '1357', 'Nemacystus decipiens': '1358', 'Neptunea polycostata': '1359', 'Padina pavonia': '1360', 'Penaeidae': '1361', 'Petricolinae': '1362', 'Polynemidae': '1363', 'Pristipomoides aquilonaris': '1364', 'Pyropia fallax': '1365', 'Radiolaria': '1366', 'Salpidae': '1367', 'Sardinops melanosticta': '1368', 'Sargassum vulgare': '1369', 'Sciaena umbra': '1370', 'Scorpaena porcus': '1371', 'Sergestidae': 
'1372', 'Sicyonia brevirostris': '1373', 'Sphaerococcus coronopifolius': '1374', 'Stenella coeruleoalba': '1375', 'Stichopus japonicus': '1376', 'Thalia democratica': '1377', 'Themisto gaudichaudii': '1378', 'Undaria': '1379', 'Analipus japonicus': '1380', 'Sargassum yamadae': '1381', 'Ahnfeltiopsis paradoxa': '1382', 'Scytosiphon lomentaria': '1383', 'Chondria crassicaulis': '1384', 'Grateloupia lanceolata': '1385', 'Colpomenia sinuosa': '1386', 'Chondrus giganteus': '1387', 'Sargassum muticum': '1388', 'Ulva prolifera': '1389', 'Petalonia fascia': '1390', 'Balanus roseus': '1391', 'Chaetomorpha moniligera': '1392', 'Lomentaria hakodatensis': '1393', 'Neodilsea longissima': '1394', 'Polyopes affinis': '1395', 'Schizymenia dubyi': '1396', 'Dictyopteris pacifica': '1397', 'Ahnfeltiopsis flabelliformis': '1398', 'Bangia fuscopurpurea': '1399', 'Calliarthron': '1400', 'Cladophora': '1401', 'Cladophora albida': '1402', 'Dasya sessilis': '1403', 'Delesseria serrulata': '1404', 'Ecklonia cava': '1405', 'Gelidium elegans': '1406', 'Grateloupia turuturu': '1407', 'Hypnea asiatica': '1408', 'Mazzaella japonica': '1409', 'Pachydictyon coriaceum': '1410', 'Padina arborescens': '1411', 'Pterosiphonia pinnulata': '1412', 'Alatocladia yessoensis': '1413', 'Bryopsis plumosa': '1414', 'Ceramium kondoi': '1415', 'Chondracanthus intermedius': '1416', 'Codium contractum': '1417', 'Codium lucasii': '1418', 'Corallina pilulifera': '1419', 'Dictyopteris undulata': '1420', 'Gastroclonium pacificum': '1421', 'Gelidium amansii': '1422', 'Grateloupia sparsa': '1423', 'Laurencia okamurae': '1424', 'Leathesia marina': '1425', 'Lomentaria catenata': '1426', 'Meristotheca papulosa': '1427', 'Sargassum confusum': '1428', 'Sargassum siliquastrum': '1429', 'Tinocladia crassa': '1430', 'Saccharina yendoana': '1431', 'Thalassiophyllum clathrus': '1432', 'Mytilida': '1433', 'Pteriomorphia': '1434', 'Conger': '1435', 'Scyliorhinidae': '1436', 'Labrus': '1437', 'Algae': '1438', 'Necora puber': '1439', 
'Anguilla': '1440', 'Rajidae': '1441', 'Buccinidae': '1442', 'Crustacea': '1443', 'Green algae': '1444', 'Ammodytes japonicus': '1445', 'Evynnis tumifrons': '1446', 'Gnathophis nystromi nystromi': '1447', 'Loligo bleekeri': '1448', 'Platichthys bicoloratus': '1449', 'Limanda punctatissima': '1450', 'Loliolus Nipponololigo japonica': '1451', 'Acanthopagrus schlegelii schlegelii': '1452', 'Sepiolina': '1453', 'Gelidium': '1454', 'Atrina pectinata': '1455', 'Echinocardium cordatum': '1456', 'Lamnidae': '1457', 'Meretrix lamarckii': '1458', 'Noctiluca scintillans': '1459', 'Philine argentata': '1460', 'Sergestes lucens': '1461', 'Corbicula sandai': '1462', 'Ulva': '1463', 'Actiniaria': '1464', 'Ctenopharyngodon idella': '1465', 'Ophiuroidea': '1466', 'Scomberoides lysan': '1467', 'Scomberoides tol': '1468', 'Sebastolobus': '1469', 'Selachimorpha': '1470', 'Selene setapinnis': '1471', 'Selene vomer': '1472', 'Sepia elliptica': '1473', 'Sergestes sp': '1474', 'Setipinna taty': '1475', 'Siganus canaliculatus': '1476', 'Sigmops gracile': '1477', 'Solenocera sp': '1478', 'Sparidae': '1479', 'Spermatophytina': '1480', 'Sphoeroides testudineus': '1481', 'Sphyraena jello': '1482', 'Spyridia hypnoides': '1483', 'Squaliformes': '1484', 'Squillidae': '1485', 'Stegophiura sladeni': '1486', 'Stenella longirostris': '1487', 'Stenobrachius leucopsarus': '1488', 'Sternaspidae': '1489', 'Stoechospermum polypodioides': '1490', 'Stolephorus commersonnii': '1491', 'Stromateus cinereus': '1492', 'Stromateus niger': '1493', 'Stromateus sinensis': '1494', 'Synidotea': '1495', 'Takifugu vermicularis': '1496', 'Telatrygon zugei': '1497', 'Terapon jarbua': '1498', 'Terebellidae': '1499', 'Thryssa dussumieri': '1500', 'Thunnini': '1501', 'Tibia curta': '1502', 'Tonna dolium': '1503', 'Trachinus draco': '1504', 'Trematomus bernacchii': '1505', 'Tridacna': '1506', 'Trinectes paulistanus': '1507', 'Trochus radiatus': '1508', 'Turbinaria': '1509', 'Tursiops truncatus': '1510', 'Ucides': '1511', 
'Ulva compressa': '1512', 'Ulva fasciata': '1513', 'Ulva flexuosa': '1514', 'Ulva rigida': '1515', 'Upeneus taeniopterus': '1516', 'Upogebiidae': '1517', 'Uroteuthis Photololigo edulis': '1518', 'Valoniopsis pachynema': '1519', 'Veneridae': '1520', 'Venus foveolata': '1521', 'Vertebrata': '1522', 'Volutharpa ampullacea perryi': '1523', 'Zannichellia palustris': '1524', 'Zeus japonicus': '1525', 'Favites': '1526', 'Gadiformes': '1527', 'Gafrarium dispar': '1528', 'Galaxaura frutescens': '1529', 'Gelidium crinale': '1530', 'Genidens genidens': '1531', 'Girella elevata': '1532', 'Girella tricuspidata': '1533', 'Dentex hypselosomus': '1534', 'Saurida elongata': '1535', 'Pseudolabrus eoethinus': '1536', 'Atrobucca nibe': '1537', 'Diagramma pictum': '1538', 'Sepia lycidas': '1539', 'Plectorhinchus cinctus': '1540', 'Metapenaeopsis acclivis': '1541', 'Metapenaeopsis barbata': '1542', 'Nibea albiflora': '1543', 'Girella leonina': '1544', 'Sphyraenidae': '1545', 'Parapercis pulchella': '1546', 'Parapercis sexfasciata': '1547', 'Thysanoteuthis rhombus': '1548', 'Lepidotrigla kishinouyi': '1549', 'Cystoseira': '1550', 'Padina': '1551', 'Halimeda': '1552', 'Pacifastacus leniusculus': '1553', 'Salmo trutta': '1554', 'Chondrus crispus': '1555', 'Ictalurus punctatus': '1556', 'Acanthurus': '1557', 'Scombridae': '1558', 'Leukoma staminea': '1559', 'Trochidae': '1560', 'Protonibea': '1562', 'Anchoa compressa': '1563', 'Ensis magnus': '1564', 'Bolinus brandaris': '1565', 'Lutjanus notatus': '1566', 'Lethrinus olivaceus': '1567', 'Carassius auratus': '1569', 'Mugil': '1570', 'Gobius': '1571', 'Lajonkairia lajonkairii': '1572', 'Chrysophrys auratus': '1573', 'Galeorhinus australis': '1574', 'Nototodarus sloanii gouldi': '1575', 'Tylosurus crocodilus': '1576', 'Acanthogobius hasta': '1577', 'Penaeus chinensis': '1578', 'Ruditapes variegatus': '1579', 'Marcia marmorata': '1580', 'Rachycentron': '1581', 'Scomber kanagurta': '1582', 'Arius': '1583', 'Panulirus versicolor': '1584', 
'Tilapia zillii': '1585', 'Schizoporella errata': '1586', 'Phallusia nigra': '1587', 'Physeter catodon': '1588', 'Salmo trutta trutta': '1589', 'Tachysurus thalassinus': '1590', 'Sillago domina': '1591', 'Otolithus argenteus': '1592', 'Trichiurus haumela': '1593', 'Otolithes maculata': '1594', 'Hilsa kanagurta': '1595', 'Oreochromis mossambicus': '1596', 'Siluriformes': '1597', 'Theodoxus euxinus': '1598', 'Formio niger': '1599', 'Rastrelliger': '1600', 'Nephasoma flagriferum': '1601', 'Ophiomusium lymani': '1602', 'Nematonurus armatus': '1603', 'Thalamitoides spinigera': '1604', 'Capros aper': '1605', 'Gadiculus argenteus thori': '1606', 'Phorcus lineatus': '1607', 'Penaeus vannamei': '1608', 'Raja montagui': '1609', 'Scophthalmus rhombus': '1610', 'Crambe maritima': '1611', 'Fucus ceranoides': '1612', 'Maja squinado': '1613', 'Salicornia europaea': '1614', 'Aequipecten opercularis': '1615', 'Galathea squamifera': '1616', 'Cynoglossus semilaevis': '1617', 'Loliolus beka': '1619', 'Octopus variabilis': '1620', 'Abudefduf sexfasciatus': '1621', 'Acanthurus blochii': '1622', 'Achillea millefolium': '1623', 'Alaria crassifolia': '1624', 'Albulidae': '1625', 'Ammodytes': '1626', 'Anadara satowi': '1627', 'Argyrosomus japonicus': '1628', 'Ascidiacea': '1629', 'Aulopiformes': '1630', 'Babylonia japonica': '1631', 'Babylonia kirana': '1632', 'Bathylagidae': '1633', 'Beryx decadactylus': '1634', 'Branchiostegus': '1635', 'Buccinum': '1636', 'Caesio lunaris': '1637', 'Callionymus curvicornis': '1638', 'Campylaephora hypnaeoides': '1639', 'Cetoscarus ocellatus': '1640', 'Charonia tritonis': '1641', 'Chelon haematocheilus': '1642', 'Chlorurus sordidus': '1643', 'Choerodon azurio': '1644', 'Chromis notata': '1645', 'Cladosiphon okamuranus': '1646', 'Cociella punctata': '1647', 'Coryphaena': '1648', 'Cyclina sinensis': '1649', 'Cymbacephalus beauforti': '1650', 'Dendrobranchiata': '1651', 'Digenea simplex': '1652', 'Ditrema viride': '1653', 'Enteromorpha prolifera': '1654', 
'Epinephelus': '1655', 'Epinephelus akaara': '1656', 'Epinephelus awoara': '1657', 'Etelis carbunculus': '1658', 'Fistularia commersonii': '1659', 'Fulvia mutica': '1660', 'Fusinus colus': '1661', 'Gafrarium tumidum': '1662', 'Gelidiaceae': '1663', 'Girella cyanea': '1664', 'Girella mezina': '1665', 'Goniistius zonatus': '1666', 'Gracilaria': '1667', 'Gymnocranius euanus': '1668', 'Heikeopsis japonica': '1669', 'Hemitrygon': '1670', 'Hippoglossoides pinetorum': '1671', 'Holothuria atra': '1672', 'Holothuria leucospilota': '1673', 'Idiosepiidae': '1674', 'Inegocia japonica': '1675', 'Inimicus didactylus': '1676', 'Ishige': '1677', 'Lagocephalus spadiceus': '1678', 'Lambis truncata': '1679', 'Leiognathus equula': '1680', 'Lethrinus xanthochilus': '1681', 'Lutjanus erythropterus': '1682', 'Lutjanus semicinctus': '1683', 'Monodonta labio': '1684', 'Monostroma kuroshiense': '1685', 'Mulloidichthys flavolineatus': '1686', 'Mulloidichthys vanicolensis': '1687', 'Muraenesocidae': '1688', 'Myagropsis myagroides': '1689', 'Mytilisepta virgata': '1690', 'Naso brevirostris': '1691', 'Nematalosa japonica': '1692', 'Nemipterus virgatus': '1693', 'Nipponacmea': '1694', 'Nuchequula nuchalis': '1695', 'Octopus cyanea': '1696', 'Panopea generosa': '1697', 'Paralichthys': '1698', 'Paralithodes camtschaticus': '1699', 'Parascolopsis inermis': '1700', 'Pectinidae': '1701', 'Pentapodus aureofasciatus': '1702', 'Pinctada fucata': '1703', 'Pitar citrinus': '1704', 'Platycephalidae': '1705', 'Plecoglossus altivelis': '1706', 'Pleuronectes herzensteini': '1707', 'Priacanthus macracanthus': '1708', 'Pristipomoides': '1709', 'Psenopsis anomala': '1710', 'Pseudobalistes fuscus': '1711', 'Pseudocaranx dentex': '1712', 'Pseudolabrus sieboldi': '1713', 'Pseudorhombus arsius': '1714', 'Pterocaesio chrysozona': '1715', 'Rhynchopelates oxyrhynchus': '1716', 'Ryukyupercis gushikeni': '1717', 'Saccostrea echinata': '1718', 'Sargassum hemiphyllum': '1719', 'Sargassum piluliferum': '1720', 'Saurida 
micropectoralis': '1721', 'Saurida undosquamis': '1722', 'Saurida wanieso': '1723', 'Scarus forsteni': '1724', 'Scarus ghobban': '1725', 'Scarus ovifrons': '1726', 'Scarus rubroviolaceus': '1727', 'Scyphozoa': '1728', 'Sebastes iracundus': '1729', 'Semicossyphus reticulatus': '1730', 'Sepia latimanus': '1731', 'Siganus guttatus': '1732', 'Siganus luridus': '1733', 'Sphaerotrichia divaricata': '1734', 'Sphyrnidae': '1735', 'Spondylus regius': '1736', 'Spratelloides gracilis': '1737', 'Sthenoteuthis oualaniensis': '1738', 'Tetraodontidae': '1739', 'Trichiurus lepturus japonicus': '1740', 'Tridacna crocea': '1741', 'Turbo argyrostomus': '1742', 'Tylosurus pacificus': '1743', 'Ulvophyceae': '1744', 'Upeneus japonicus': '1745', 'Upeneus moluccensis': '1746', 'Uranoscopus japonicus': '1747', 'Anguilliformes': '1748', 'Crithmum maritimum': '1749', 'Littorina': '1750', 'Nucella lapillus': '1752', 'Scyliorhinus stellaris': '1753', 'Annelida': '1754', 'Aphrodita aculeata': '1755', 'Callionymus lyra': '1756', 'Urticina felina': '1757', 'Gebiidea': '1758', 'Bonellia viridis': '1759', 'Alcyonium glomeratum': '1760'}, 'body_part': {'Not applicable': '-1', 'Not available': '0', 'Whole animal': '1', 'Whole animal eviscerated': '2', 'Whole animal eviscerated without head': '3', 'Flesh with bones': '4', 'Blood': '5', 'Skeleton': '6', 'Bones': '7', 'Exoskeleton': '8', 'Endoskeleton': '9', 'Shells': '10', 'Molt': '11', 'Skin': '12', 'Head': '13', 'Tooth': '14', 'Otolith': '15', 'Fins': '16', 'Faecal pellet': '17', 'Byssus': '18', 'Soft parts': '19', 'Viscera': '20', 'Stomach': '21', 'Hepatopancreas': '22', 'Digestive gland': '23', 'Pyloric caeca': '24', 'Liver': '25', 'Intestine': '26', 'Kidney': '27', 'Spleen': '28', 'Brain': '29', 'Eye': '30', 'Fat': '31', 'Heart': '32', 'Branchial heart': '33', 'Muscle': '34', 'Mantle': '35', 'Gills': '36', 'Gonad': '37', 'Ovary': '38', 'Testes': '39', 'Whole plant': '40', 'Flower': '41', 'Leaf': '42', 'Old leaf': '43', 'Young leaf': '44', 'Leaf 
upper part': '45', 'Leaf lower part': '46', 'Scales': '47', 'Root rhizome': '48', 'Whole macro alga': '49', 'Phytoplankton': '50', 'Thallus': '51', 'Flesh without bones': '52', 'Stomach and intestine': '53', 'Whole haptophytic plants': '54', 'Loose drifting plants': '55', 'Growing tips': '56', 'Upper parts of plants': '57', 'Lower parts of plants': '58', 'Shells carapace': '59', 'Flesh with scales': '60'}}, 'SEAWATER': {'nuclide': {'NOT APPLICABLE': '-1', 'NOT AVAILABLE': '0', 'h3': '1', 'be7': '2', 'c14': '3', 'k40': '4', 'cr51': '5', 'mn54': '6', 'co57': '7', 'co58': '8', 'co60': '9', 'zn65': '10', 'sr89': '11', 'sr90': '12', 'zr95': '13', 'nb95': '14', 'tc99': '15', 'ru103': '16', 'ru106': '17', 'rh106': '18', 'ag106m': '19', 'ag108': '20', 'ag108m': '21', 'ag110m': '22', 'sb124': '23', 'sb125': '24', 'te129m': '25', 'i129': '28', 'i131': '29', 'cs127': '30', 'cs134': '31', 'cs137': '33', 'ba140': '34', 'la140': '35', 'ce141': '36', 'ce144': '37', 'pm147': '38', 'eu154': '39', 'eu155': '40', 'pb210': '41', 'pb212': '42', 'pb214': '43', 'bi207': '44', 'bi211': '45', 'bi214': '46', 'po210': '47', 'rn220': '48', 'rn222': '49', 'ra223': '50', 'ra224': '51', 'ra225': '52', 'ra226': '53', 'ra228': '54', 'ac228': '55', 'th227': '56', 'th228': '57', 'th232': '59', 'th234': '60', 'pa234': '61', 'u234': '62', 'u235': '63', 'u238': '64', 'np237': '65', 'np239': '66', 'pu238': '67', 'pu239': '68', 'pu240': '69', 'pu241': '70', 'am240': '71', 'am241': '72', 'cm242': '73', 'cm243': '74', 'cm244': '75', 'cs134_137_tot': '76', 'pu239_240_tot': '77', 'pu239_240_iii_iv_tot': '78', 'pu239_240_v_vi_tot': '79', 'cm243_244_tot': '80', 'pu238_pu239_240_tot_ratio': '81', 'am241_pu239_240_tot_ratio': '82', 'cs137_134_ratio': '83', 'cd109': '84', 'eu152': '85', 'fe59': '86', 'gd153': '87', 'ir192': '88', 'pu238_240_tot': '89', 'rb86': '90', 'sc46': '91', 'sn113': '92', 'sn117m': '93', 'tl208': '94', 'mo99': '95', 'tc99m': '96', 'ru105': '97', 'te129': '98', 'te132': '99', 'i132': '100', 
'i135': '101', 'cs136': '102', 'tbeta': '103', 'talpha': '104', 'i133': '105', 'th230': '106', 'pa231': '107', 'u236': '108', 'ag111': '109', 'in116m': '110', 'te123m': '111', 'sb127': '112', 'ba133': '113', 'ce139': '114', 'tl201': '116', 'hg203': '117', 'na22': '122', 'pa234m': '123', 'am243': '124', 'se75': '126', 'sr85': '127', 'y88': '128', 'ce140': '129', 'bi212': '130', 'u236_238_ratio': '131', 'i125': '132', 'ba137m': '133', 'u232': '134', 'pa233': '135', 'ru106_rh106_tot': '136', 'tu': '137', 'tbeta40k': '138', 'fe55': '139', 'ce144_pr144_tot': '140', 'pu240_pu239_ratio': '141', 'u233': '142', 'pu239_242_tot': '143', 'ac227': '144'}, 'unit': {'Not applicable': '-1', 'NOT AVAILABLE': '0', 'Bq per m3': '1', 'Bq per m2': '2', 'Bq per kg': '3', 'Bq per kgd': '4', 'Bq per kgw': '5', 'kg per kg': '6', 'TU': '7', 'DELTA per mill': '8', 'atom per kg': '9', 'atom per kgd': '10', 'atom per kgw': '11', 'atom per l': '12', 'Bq per kgC': '13'}, 'dl': {'Not applicable': '-1', 'Not available': '0', 'Detected value': '1', 'Detection limit': '2', 'Not detected': '3', 'Derived': '4'}}}
Let's review the data in the NetCDF file:
dfs = contents.dfs
dfs
{'BIOTA': LON LAT TIME SMP_ID NUCLIDE VALUE UNIT \
0 4.031111 51.393333 1267574400 1 33 0.326416 5
1 4.031111 51.393333 1276473600 2 33 0.442704 5
2 4.031111 51.393333 1285545600 3 33 0.412989 5
3 4.031111 51.393333 1291766400 4 33 0.202768 5
4 4.031111 51.393333 1267574400 5 53 0.652833 5
... ... ... ... ... ... ... ...
15946 12.087778 57.252499 1660003200 98058 33 0.384000 5
15947 12.107500 57.306389 1663891200 98059 33 0.456000 5
15948 11.245000 58.603333 1667779200 98060 33 0.122000 5
15949 11.905278 57.302502 1663632000 98061 33 0.310000 5
15950 12.076667 57.335278 1662076800 98062 33 0.306000 5
UNC DL SPECIES BODY_PART
0 NaN 2 377 1
1 NaN 2 377 1
2 NaN 2 377 1
3 NaN 2 377 1
4 NaN 2 377 1
... ... .. ... ...
15946 0.012096 1 272 52
15947 0.012084 1 272 52
15948 0.031000 1 129 19
15949 NaN 2 129 19
15950 0.007191 1 96 40
[15951 rows x 11 columns],
'SEAWATER': LON LAT SMP_DEPTH TIME SMP_ID NUCLIDE VALUE \
0 3.188056 51.375278 3.0 1264550400 1 33 0.200000
1 2.859444 51.223610 3.0 1264550400 2 33 0.270000
2 2.713611 51.184444 3.0 1264550400 3 33 0.260000
3 3.262222 51.420277 3.0 1264550400 4 33 0.250000
4 2.809722 51.416111 3.0 1264464000 5 33 0.200000
... ... ... ... ... ... ... ...
19178 4.615278 52.831944 1.0 1573649640 97102 77 0.000005
19179 3.565556 51.411945 1.0 1575977820 96936 1 6.152000
19180 3.565556 51.411945 1.0 1575977820 96949 53 0.005390
19181 3.565556 51.411945 1.0 1575977820 96962 54 0.001420
19182 3.493889 51.719444 1.0 1576680180 96982 1 6.078000
UNIT UNC DL
0 1 NaN 2
1 1 NaN 2
2 1 NaN 2
3 1 NaN 2
4 1 NaN 2
... ... ... ..
19178 1 2.600000e-07 1
19179 1 3.076000e-01 1
19180 1 1.078000e-03 1
19181 1 2.840000e-04 1
19182 1 3.039000e-01 1
[19183 rows x 10 columns]}
Let's review the biota data:
nc_dfs_biota = dfs['BIOTA']
nc_dfs_biota
| | LON | LAT | TIME | SMP_ID | NUCLIDE | VALUE | UNIT | UNC | DL | SPECIES | BODY_PART |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 4.031111 | 51.393333 | 1267574400 | 1 | 33 | 0.326416 | 5 | NaN | 2 | 377 | 1 |
1 | 4.031111 | 51.393333 | 1276473600 | 2 | 33 | 0.442704 | 5 | NaN | 2 | 377 | 1 |
2 | 4.031111 | 51.393333 | 1285545600 | 3 | 33 | 0.412989 | 5 | NaN | 2 | 377 | 1 |
3 | 4.031111 | 51.393333 | 1291766400 | 4 | 33 | 0.202768 | 5 | NaN | 2 | 377 | 1 |
4 | 4.031111 | 51.393333 | 1267574400 | 5 | 53 | 0.652833 | 5 | NaN | 2 | 377 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15946 | 12.087778 | 57.252499 | 1660003200 | 98058 | 33 | 0.384000 | 5 | 0.012096 | 1 | 272 | 52 |
15947 | 12.107500 | 57.306389 | 1663891200 | 98059 | 33 | 0.456000 | 5 | 0.012084 | 1 | 272 | 52 |
15948 | 11.245000 | 58.603333 | 1667779200 | 98060 | 33 | 0.122000 | 5 | 0.031000 | 1 | 129 | 19 |
15949 | 11.905278 | 57.302502 | 1663632000 | 98061 | 33 | 0.310000 | 5 | NaN | 2 | 129 | 19 |
15950 | 12.076667 | 57.335278 | 1662076800 | 98062 | 33 | 0.306000 | 5 | 0.007191 | 1 | 96 | 40 |
15951 rows × 11 columns
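The integer columns in this table are codes from the enumeration lookups printed earlier. As a minimal sketch, the excerpt below copies a few label-to-code pairs from that output and inverts them to turn codes back into readable labels. The names `nuclide_lut`, `unit_lut`, `dl_lut`, `body_part_lut`, and `invert` are hypothetical and not part of marisco. The nuclide, unit, and dl pairs are taken from the SEAWATER lookup shown above and are assumed here to be identical for BIOTA.

```python
# Excerpts of the lookups printed above (label -> code, codes stored as strings).
# Assumption: the nuclide, unit and dl codes are shared by BIOTA and SEAWATER.
nuclide_lut = {'h3': '1', 'cs137': '33', 'ra226': '53', 'ra228': '54'}
unit_lut = {'Bq per m3': '1', 'Bq per kgw': '5'}
dl_lut = {'Detected value': '1', 'Detection limit': '2'}
body_part_lut = {'Whole animal': '1', 'Soft parts': '19', 'Whole plant': '40'}


def invert(lut):
    """Return a code -> label mapping with integer keys."""
    return {int(code): label for label, code in lut.items()}


# One decoder per encoded column.
decoders = {
    'NUCLIDE': invert(nuclide_lut),
    'UNIT': invert(unit_lut),
    'DL': invert(dl_lut),
    'BODY_PART': invert(body_part_lut),
}

# Encoded values of the first BIOTA row shown in the table above.
row = {'NUCLIDE': 33, 'UNIT': 5, 'DL': 2, 'BODY_PART': 1}

# Replace each code with its label.
decoded = {col: decoders[col][code] for col, code in row.items()}
print(decoded)
```

With a pandas DataFrame, the same inverted dictionaries could be passed to `Series.map` column by column.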
Let's review the seawater data:
nc_dfs_seawater = dfs['SEAWATER']
nc_dfs_seawater
| | LON | LAT | SMP_DEPTH | TIME | SMP_ID | NUCLIDE | VALUE | UNIT | UNC | DL |
---|---|---|---|---|---|---|---|---|---|---|
0 | 3.188056 | 51.375278 | 3.0 | 1264550400 | 1 | 33 | 0.200000 | 1 | NaN | 2 |
1 | 2.859444 | 51.223610 | 3.0 | 1264550400 | 2 | 33 | 0.270000 | 1 | NaN | 2 |
2 | 2.713611 | 51.184444 | 3.0 | 1264550400 | 3 | 33 | 0.260000 | 1 | NaN | 2 |
3 | 3.262222 | 51.420277 | 3.0 | 1264550400 | 4 | 33 | 0.250000 | 1 | NaN | 2 |
4 | 2.809722 | 51.416111 | 3.0 | 1264464000 | 5 | 33 | 0.200000 | 1 | NaN | 2 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
19178 | 4.615278 | 52.831944 | 1.0 | 1573649640 | 97102 | 77 | 0.000005 | 1 | 2.600000e-07 | 1 |
19179 | 3.565556 | 51.411945 | 1.0 | 1575977820 | 96936 | 1 | 6.152000 | 1 | 3.076000e-01 | 1 |
19180 | 3.565556 | 51.411945 | 1.0 | 1575977820 | 96949 | 53 | 0.005390 | 1 | 1.078000e-03 | 1 |
19181 | 3.565556 | 51.411945 | 1.0 | 1575977820 | 96962 | 54 | 0.001420 | 1 | 2.840000e-04 | 1 |
19182 | 3.493889 | 51.719444 | 1.0 | 1576680180 | 96982 | 1 | 6.078000 | 1 | 3.039000e-01 | 1 |
19183 rows × 10 columns
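The `TIME` values above look like Unix timestamps. The sketch below converts two of them to calendar dates, assuming `TIME` is expressed in seconds since 1970-01-01 UTC. That is an assumption consistent with the values shown; the `units` attribute of the time variable in the NetCDF file remains the authoritative definition. The helper name `to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(seconds):
    """Convert seconds since 1970-01-01 (assumed epoch) to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# TIME values of the first and last SEAWATER rows shown above.
first = to_utc(1264550400)
last = to_utc(1576680180)

print(first.isoformat())
print(last.isoformat())
```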
Data Format Conversion
The MARIS data processing workflow involves two key steps:
- NetCDF to Standardized CSV Compatible with OpenRefine Pipeline
  - Convert standardized NetCDF files to CSV formats compatible with OpenRefine using the NetCDFDecoder.
  - Preserve data integrity and variable relationships.
  - Maintain standardized nomenclature and units.
- Database Integration
  - Process the converted CSV files using OpenRefine.
  - Apply data cleaning and standardization rules.
  - Export validated data to the MARIS master database.
This section focuses on the first step: converting NetCDF files to a format suitable for OpenRefine processing using the NetCDFDecoder class.
decode(fname_in=fname_out_nc, verbose=True)
Saved BIOTA to ../../_data/output/191-OSPAR-2024_BIOTA.csv
Saved SEAWATER to ../../_data/output/191-OSPAR-2024_SEAWATER.csv
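The two messages above suggest that each group is written next to the NetCDF file, with the group name appended to the file stem. The sketch below reproduces that naming pattern only. It is not marisco's implementation, and the helper name `csv_path_for` is hypothetical.

```python
from pathlib import Path


def csv_path_for(fname_nc, group):
    """Build '<stem>_<GROUP>.csv' in the same folder as the NetCDF file."""
    p = Path(fname_nc)
    return p.with_name(f"{p.stem}_{group}.csv")


# Same output file name as configured in fname_out_nc.
fname_out_nc = '../../_data/output/191-OSPAR-2024.nc'
paths = {group: csv_path_for(fname_out_nc, group) for group in ('BIOTA', 'SEAWATER')}

for group, path in paths.items():
    print(f"{group}: {path.as_posix()}")
```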