Callbacks to populate NetCDF global attributes.

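Covers four concerns: the global attributes feeder, spatial and temporal coverage, bibliographic metadata (Zotero and INIS), and static global attributes.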


Global attributes feeder

GlobAttrsFeeder follows the same callback pattern as marisco.callbacks.Transformer: it takes a list of Callback objects, runs them in order, and collects their results into an attrs dict. Each callback below contributes one piece of the NetCDF global metadata.



GlobAttrsFeeder


def GlobAttrsFeeder(
    dfs:Dict, # Dictionary of NetCDF group DataFrames
    cbs:List=None, # Callbacks
    logs:List=None, # List of preprocessing steps taken
)->None:

Produce NetCDF global attributes as specified by the callbacks.
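The callback pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not marisco's actual implementation: the `Callback` base class, `SketchFeeder`, `SourceCB`, and `GroupCountCB` are hypothetical stand-ins introduced only for this example.

```python
from typing import Dict, List, Optional


class Callback:
    """Hypothetical base class: a callback receives the feeder and mutates it."""
    def __call__(self, obj) -> None:
        raise NotImplementedError


class SourceCB(Callback):
    """Hypothetical callback contributing a single `source` attribute."""
    def __init__(self, source: str):
        self.source = source

    def __call__(self, obj) -> None:
        obj.attrs["source"] = self.source


class GroupCountCB(Callback):
    """Hypothetical callback deriving an attribute from the group dict."""
    def __call__(self, obj) -> None:
        obj.attrs["n_groups"] = str(len(obj.dfs or {}))


class SketchFeeder:
    """Run callbacks in order and return the attrs dict they filled."""
    def __init__(self, dfs: Optional[Dict], cbs: Optional[List[Callback]] = None):
        self.dfs = dfs
        self.cbs = cbs or []
        self.attrs: Dict[str, str] = {}

    def __call__(self) -> Dict[str, str]:
        for cb in self.cbs:
            cb(self)  # each callback writes its own keys into self.attrs
        return self.attrs


attrs = SketchFeeder({"SEAWATER": [], "BIOTA": []},
                     cbs=[SourceCB("demo"), GroupCountCB()])()
```

Because every callback only touches `obj.attrs`, callbacks can be added, removed, or reordered without changing the feeder itself.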

Spatial and temporal coverage



BboxCB


def BboxCB():

Compute dataset geographical bounding box
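A bounding-box computation of this kind might look like the sketch below. It is written in plain Python rather than against DataFrames, the function name `bbox_attrs` is hypothetical, and it makes the simplifying assumption that the data does not need special handling at the antimeridian. The attribute names follow those shown in the usage output of this page.

```python
from typing import Dict, Iterable


def bbox_attrs(lons: Iterable[float], lats: Iterable[float]) -> Dict[str, str]:
    """Return bounding-box attributes for paired longitudes and latitudes."""
    lons, lats = list(lons), list(lats)
    lon_min, lon_max = min(lons), max(lons)
    lat_min, lat_max = min(lats), max(lats)
    # WKT coordinates are written as "x y", i.e. "longitude latitude".
    ring = [(lon_min, lat_min), (lon_max, lat_min), (lon_max, lat_max),
            (lon_min, lat_max), (lon_min, lat_min)]
    wkt = "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"
    return {
        "geospatial_lat_min": str(lat_min),
        "geospatial_lat_max": str(lat_max),
        "geospatial_lon_min": str(lon_min),
        "geospatial_lon_max": str(lon_max),
        "geospatial_bounds": wkt,
    }


box = bbox_attrs(lons=[10.5, 30.25, 21.0], lats=[54.0, 65.75, 60.5])
```

Keeping the longitude/latitude order explicit matters here: WKT puts longitude first, while the `geospatial_lat_*` and `geospatial_lon_*` attributes are named per axis.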



DepthRangeCB


def DepthRangeCB(
    depth_col:str='SMP_DEPTH', # Column name for sampling depth values
):

Compute minimum and maximum depth values
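As a rough sketch of the depth-range logic, assuming depths arrive as a flat sequence that may contain missing values (the function name `depth_range_attrs` and the choice to return an empty dict when no depth is available are assumptions of this example, not confirmed behaviour of `DepthRangeCB`):

```python
import math
from typing import Dict, Iterable, Optional


def depth_range_attrs(depths: Iterable[Optional[float]]) -> Dict[str, str]:
    """Return vertical-extent attributes, ignoring missing depths."""
    valid = [d for d in depths if d is not None and not math.isnan(d)]
    if not valid:
        return {}  # nothing to report when no depth values are available
    return {
        "geospatial_vertical_min": str(min(valid)),
        "geospatial_vertical_max": str(max(valid)),
    }


depth_attrs = depth_range_attrs([0.5, None, 120.0, float("nan"), 5815.3])
```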



TimeRangeCB


def TimeRangeCB(
    time_col:str='TIME', # Column name for time values
    fn_time_unit:Callable=get_time_units, # Function returning the NetCDF time unit string
):

Compute time coverage start and end dates
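The conversion from encoded time values to coverage dates can be illustrated as below. The sketch assumes times are stored as integer seconds since 1970-01-01 UTC; in the library the actual reference is given by the unit string that `fn_time_unit` returns, so both the `EPOCH` constant and the `time_range_attrs` name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable

# Assumed reference epoch; the real unit string comes from `fn_time_unit`.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_iso(seconds: int) -> str:
    """Format seconds since EPOCH as an ISO 8601 date-time without offset."""
    return (EPOCH + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%S")


def time_range_attrs(seconds: Iterable[int]) -> Dict[str, str]:
    """Return coverage start and end for times given in seconds since EPOCH."""
    seconds = list(seconds)
    return {
        "time_coverage_start": to_iso(min(seconds)),
        "time_coverage_end": to_iso(max(seconds)),
    }


time_attrs = time_range_attrs([86400, 0, 1_000_000_000])
```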

Bibliographic metadata

Every curated dataset in MARIS needs bibliographic metadata (title, abstract, creators, DOI, …) stored as global attributes in the NetCDF file produced by each dataset handler (e.g. the Geotraces handler). MARIS previously managed this metadata through the IAEA’s Zotero group library, but the IAEA’s own INIS bibliographic database is now the target single source of truth; migration is ongoing.

Both ZoteroCB and InisCB below produce the same core set of global attributes (id, title, summary, creator_name), making them interchangeable in a handler’s callback pipeline. InisCB additionally injects references (DOI) and metadata_link (record URL). The attribute set is not locked; additional bibliographic fields can be added as future needs arise.

Zotero

Bibliographic metadata for each dataset has so far been managed in the IAEA’s Zotero group library. ZoteroClient is a lightweight client that fetches individual records; ZoteroCB wraps it in a callback that populates the id, title, summary, and creator_name global attributes.



ZoteroClient


def ZoteroClient(
    item_id:str, # Zotero item key to retrieve
    lib_id:str, # Zotero library ID
    api_key:str, # Zotero API key
)->None:

Zotero API client to fetch a bibliographic record.

Read-only properties (all return str): title, summary, creator_name (JSON-encoded list) — derived from the fetched Zotero record.

item = ZoteroClient('26VMZZ2Q', ZOTERO_LIB_ID, os.getenv('ZOTERO_API_KEY'))
test_eq(item.title, 'Environmental database - Helsinki Commission Monitoring of Radioactive Substances')
test_eq(item.summary[:30], 'MORS Environment database has ')
creators = json.loads(item.creator_name)
test_eq(len(creators), 1)
test_eq(creators[0]['creatorType'], 'author')
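To make the shape of these properties concrete, here is a small offline sketch. It assumes a Zotero item payload with a `data` object holding `title`, `abstractNote`, and `creators`; the class name `ZoteroRecordSketch` is hypothetical and no network call is made. The list of creators is JSON-encoded, presumably because a NetCDF global attribute cannot hold a nested structure.

```python
import json
from typing import Any, Dict


class ZoteroRecordSketch:
    """Hypothetical wrapper exposing string properties over a Zotero item payload."""
    def __init__(self, item: Dict[str, Any]):
        self.data = item.get("data", {})

    @property
    def title(self) -> str:
        return self.data.get("title", "")

    @property
    def summary(self) -> str:
        return self.data.get("abstractNote", "")

    @property
    def creator_name(self) -> str:
        # Encode the list of creators as one JSON string.
        return json.dumps(self.data.get("creators", []))


payload = {"data": {"title": "Demo dataset",
                    "abstractNote": "A short abstract.",
                    "creators": [{"creatorType": "author", "name": "HELCOM MORS"}]}}
record = ZoteroRecordSketch(payload)
creators = json.loads(record.creator_name)  # decode back into a list of dicts
```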


ZoteroCB


def ZoteroCB(
    itemId, # Zotero item key to retrieve
):

Populate global attributes from Zotero bibliographic metadata.

attrs = GlobAttrsFeeder(None, cbs=[
    ZoteroCB('26VMZZ2Q')
    ])()
    
test_eq(attrs['id'], '26VMZZ2Q')
test_eq(attrs['title'], 'Environmental database - Helsinki Commission Monitoring of Radioactive Substances')
attrs = GlobAttrsFeeder(None, cbs=[
    ZoteroCB('3W354SQG')
    ])()
    
test_eq(attrs['id'], '3W354SQG')
attrs = GlobAttrsFeeder(None, cbs=[
    ZoteroCB('x')
    ])()
    
test_eq(attrs, {})
Item x does not exist in Zotero library

INIS

Bibliographic metadata can also be fetched from the IAEA’s InvenioRDM-based INIS API. INISClient is a lightweight client to fetch individual records; InisCB wraps it into a callback that populates id, title, summary, and creator_name global attributes, and can also inject references and metadata_link from the record’s DOI and web URL.



fetch_inis


def fetch_inis(
    inis_id:str, # INIS record identifier (e.g. 'vq0ha-86k24')
    base_url:str='https://inis.iaea.org/api/records', # API base URL
)->dict: # Raw INIS record payload

Fetch an INIS record from the InvenioRDM API via curl.

INIS_QA_API = "https://inis-qa.iaea.org/api/records"
INIS_API = "https://inis.iaea.org/api/records"


find_curl


def find_curl()->str:

Return path to curl, or raise FileNotFoundError.
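One way to fetch a record through curl is sketched below. The helper names (`find_curl_sketch`, `build_curl_cmd`, `fetch_record_sketch`), the chosen curl flags, the `Accept` header, and the decision to return an empty dict on failure are all assumptions of this example rather than the library's confirmed behaviour.

```python
import json
import shutil
import subprocess
from typing import List


def find_curl_sketch() -> str:
    """Return the path to curl, or raise FileNotFoundError."""
    path = shutil.which("curl")
    if path is None:
        raise FileNotFoundError("curl executable not found on PATH")
    return path


def build_curl_cmd(curl: str, inis_id: str,
                   base_url: str = "https://inis.iaea.org/api/records") -> List[str]:
    """Build the argument list for fetching one record as JSON."""
    # -s: silent, -f: fail on HTTP errors, -L: follow redirects
    return [curl, "-sfL", "-H", "Accept: application/json", f"{base_url}/{inis_id}"]


def fetch_record_sketch(inis_id: str) -> dict:
    """Fetch a record; return {} when curl fails (e.g. unknown identifier)."""
    cmd = build_curl_cmd(find_curl_sketch(), inis_id)
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return json.loads(proc.stdout) if proc.returncode == 0 and proc.stdout else {}


cmd = build_curl_cmd("curl", "5smfm-0a377")  # no network access at this point
```

Passing the command as a list (rather than a shell string) avoids quoting problems and shell injection through the record identifier.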



INISClient


def INISClient(
    inis_id:str, # INIS record identifier (e.g. 'vq0ha-86k24')
    base_url:str='https://inis.iaea.org/api/records', # API base URL
)->None:

Retrieve INIS metadata from the InvenioRDM API.

Read-only properties (all return str): title, summary, doi, creator_name (JSON-encoded list), url — derived from the fetched INIS record; return '' when the record does not exist.

inis = INISClient('5smfm-0a377')
test_eq(inis.title, 'The GEOTRACES Intermediate Data Product 2017')
test_eq(inis.exist(), True)
test_eq(inis.summary[:30], 'The GEOTRACES Intermediate Dat')
test_eq(inis.doi, '10.1016/j.chemgeo.2018.05.040')
# Test a record without DOI (g7wwp-fcc77 is a test record with no DOI) on QA instance
no_doi = INISClient('g7wwp-fcc77', base_url=INIS_QA_API)
test_eq(no_doi.doi, '')
test_eq(no_doi.exist(), True)
creators = json.loads(inis.creator_name)
test_eq(len(creators), 286)
test_eq(creators[0]['person_or_org']['family_name'], 'Schlitzer')
test_eq(inis.url, 'https://inis.iaea.org/records/5smfm-0a377')
# Test non-existent record
nonexistent = INISClient('this-does-not-exist')
test_eq(nonexistent.exist(), False)
test_eq(nonexistent.title, '')
test_eq(nonexistent.doi, '')
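The "return '' when the record does not exist" behaviour can be sketched with a small nested-lookup helper. This is an offline illustration: `dig` and `InisRecordSketch` are hypothetical names, and the key paths used (`metadata.title`, `pids.doi.identifier`) are assumptions about the InvenioRDM record layout.

```python
from typing import Any, Dict


def dig(record: Dict[str, Any], *keys: str, default: Any = "") -> Any:
    """Follow nested keys, returning `default` as soon as one is missing."""
    node: Any = record
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


class InisRecordSketch:
    """Hypothetical read-only view over a raw record payload ({} if not found)."""
    def __init__(self, record: Dict[str, Any]):
        self.record = record

    def exist(self) -> bool:
        return bool(self.record)

    @property
    def title(self) -> str:
        return dig(self.record, "metadata", "title")

    @property
    def doi(self) -> str:
        return dig(self.record, "pids", "doi", "identifier")


found = InisRecordSketch({"metadata": {"title": "Demo record"}})  # record without DOI
missing = InisRecordSketch({})  # record that does not exist
```

With this design a record lacking a DOI and a record that does not exist both yield `''` for `doi`, and `exist()` is what distinguishes the two cases.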


InisCB


def InisCB(
    inis_id:str, # INIS record identifier to retrieve
    base_url:str='https://inis.iaea.org/api/records', # API base URL
):

Populate global attributes from INIS metadata.

# Integration test: InisCB fills correct attrs
class AttrSink:
    def __init__(self): self.attrs = {}

sink = AttrSink()
InisCB('5smfm-0a377')(sink)
test_eq(sink.attrs['id'], '5smfm-0a377')
test_eq(sink.attrs['title'], 'The GEOTRACES Intermediate Data Product 2017')
test_eq(sink.attrs['references'], '10.1016/j.chemgeo.2018.05.040')
test_eq(sink.attrs['metadata_link'], 'https://inis.iaea.org/records/5smfm-0a377')
creators = json.loads(sink.attrs['creator_name'])
test_eq(len(creators), 286)
test_eq(set(sink.attrs.keys()), {'id', 'title', 'summary', 'creator_name', 'references', 'metadata_link'})
# Test InisCB handles non-existent records gracefully
sink2 = AttrSink()
InisCB('this-does-not-exist')(sink2)
test_eq(sink2.attrs, {})

Static global attributes



KeyValuePairCB


def KeyValuePairCB(
    k, # NetCDF global attribute key name
    v, # NetCDF global attribute value
):

Add a single key-value pair as a NetCDF global attribute.

For static global attributes that don’t derive from data — like a keywords string or a publisher name — KeyValuePairCB wraps a simple key-value pair into a callback, keeping the interface uniform.
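A callback of this kind needs very little code. The sketch below uses hypothetical names (`KeyValuePairSketch`, and an `AttrSink` stand-in for the feeder) and is meant only to illustrate the idea, not to reproduce the library's class.

```python
class AttrSink:
    """Stand-in for the feeder: anything exposing an `attrs` dict."""
    def __init__(self):
        self.attrs = {}


class KeyValuePairSketch:
    """Hypothetical callback writing one fixed key-value pair."""
    def __init__(self, k: str, v: str):
        self.k, self.v = k, v

    def __call__(self, obj) -> None:
        obj.attrs[self.k] = self.v


sink = AttrSink()
static_cbs = [
    KeyValuePairSketch("publisher_name", "Demo Publisher"),
    KeyValuePairSketch("keywords", ", ".join(["oceanography", "radionuclides"])),
]
for cb in static_cbs:
    cb(sink)  # same call signature as the data-derived callbacks
```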

Usage

dfs = pd.read_pickle('../files/pkl/dfs_test.pkl')
kw = ['oceanography', 'Earth Science > Oceans > Ocean Chemistry> Radionuclides',
      'Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure',
      'Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments',
      'Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes',
      'Earth Science > Oceans > Water Quality > Ocean Contaminants',
      'Earth Science > Biological Classification > Animals/Vertebrates > Fish',
      'Earth Science > Biosphere > Ecosystems > Marine Ecosystems',
      'Earth Science > Biological Classification > Animals/Invertebrates > Mollusks',
      'Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans',
      'Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)']
feed = GlobAttrsFeeder(dfs, cbs=[
    BboxCB(),
    DepthRangeCB(),
    TimeRangeCB(),
    ZoteroCB('26VMZZ2Q'),
    KeyValuePairCB('keywords', ', '.join(kw))
    ])
attrs = feed()
attrs
{'geospatial_lat_min': '-70.5744',
 'geospatial_lat_max': '89.9905',
 'geospatial_lon_min': '-180.0',
 'geospatial_lon_max': '179.9986',
 'geospatial_bounds': 'POLYGON ((-180 -70.5744, 179.9986 -70.5744, 179.9986 89.9905, -180 89.9905, -180 -70.5744))',
 'geospatial_vertical_max': '5815.3',
 'geospatial_vertical_min': '0.5',
 'time_coverage_start': '2007-07-30T10:37:19',
 'time_coverage_end': '2018-11-22T07:33:10',
 'id': '26VMZZ2Q',
 'title': 'Environmental database - Helsinki Commission Monitoring of Radioactive Substances',
 'summary': 'MORS Environment database has been used to collate data resulting from monitoring of environmental radioactivity in the Baltic Sea based on HELCOM Recommendation 26/3.\n\nThe database is structured according to HELCOM Guidelines on Monitoring of Radioactive Substances (https://www.helcom.fi/wp-content/uploads/2019/08/Guidelines-for-Monitoring-of-Radioactive-Substances.pdf), which specifies reporting format, database structure, data types and obligatory parameters used for reporting data under Recommendation 26/3.\n\nThe database is updated and quality assured annually by HELCOM MORS EG.',
 'creator_name': '[{"creatorType": "author", "name": "HELCOM MORS"}]',
 'keywords': 'oceanography, Earth Science > Oceans > Ocean Chemistry> Radionuclides, Earth Science > Human Dimensions > Environmental Impacts > Nuclear Radiation Exposure, Earth Science > Oceans > Ocean Chemistry > Ocean Tracers, Earth Science > Oceans > Marine Sediments, Earth Science > Oceans > Ocean Chemistry, Earth Science > Oceans > Sea Ice > Isotopes, Earth Science > Oceans > Water Quality > Ocean Contaminants, Earth Science > Biological Classification > Animals/Vertebrates > Fish, Earth Science > Biosphere > Ecosystems > Marine Ecosystems, Earth Science > Biological Classification > Animals/Invertebrates > Mollusks, Earth Science > Biological Classification > Animals/Invertebrates > Arthropods > Crustaceans, Earth Science > Biological Classification > Plants > Macroalgae (Seaweeds)'}