Improving the Inventory, Discoverability, and Delivery of Oceanographic Data at the National Centers for Environmental Information

Research Topic: Climate Data & Information Records/Scientific Data Stewardship
Task Leader: Alexey Mishonov
CICS Scientist: James Reagan
Sponsor: NESDIS NCEI
Published Date: 9/26/2017
AMJR_BEDI_16tn

2017 ANNUAL REPORT

Background

The goal of the Big Earth Data Initiative (BEDI) project is to improve the discoverability and delivery of datasets developed or archived at the National Centers for Environmental Information (NCEI) through NCEI’s geoportal and THREDDS server. These include (1) ocean profile datasets, (2) ocean surface datasets, and (3) ocean model datasets.

1.       Ocean profile datasets:

NCEI maintains the World Ocean Database (WOD), the most comprehensive quality-controlled database of ocean profile data. Two other important ocean profile datasets available at NCEI are the Global Temperature and Salinity Profile Program (GTSPP) and the Argo data program. GTSPP provides near-real-time data; the Argo program uses autonomous profiling floats and is recognized for its excellent quality control and delivery system. One of the important goals of the Big Earth Data Initiative (BEDI) project is to generate International Organization for Standardization (ISO) metadata files for all three datasets, thereby making these data more discoverable and accessible to researchers. In time, this should also make the data easier to aggregate. To extend the approach to other data types, ISO metadata files will also be created for Global Ocean Current Data (GOCD), another NCEI product.

2.       Ocean surface dataset:

The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) is a foundational dataset of in situ marine meteorological and ocean surface variables spanning more than 300 years, from the 1600s to the present. ICOADS is vital to research into global climate change. Among many other uses, it is the base dataset for the Extended Reconstruction of Sea Surface Temperature (ERSST), which is used to estimate historical changes in SST and to monitor current SST trends. ICOADS data in IMMA format are served from an online system at the National Center for Atmospheric Research (NCAR). However, IMMA is an ASCII format that does not readily lend itself to machine-to-machine transfer or to the NCEI granule discovery tools. The data therefore needed to be converted to NetCDF format, archived at NCEI, and then served through the NCEI geoportal and THREDDS server.

3.       Ocean model datasets:

CO-OPS Operational Forecast Systems (OFS) provide NOAA’s capability to produce operational guidance on water levels and currents in the coastal ocean and Great Lakes in support of maritime navigation. The heart of each OFS is a state-of-the-art hydrodynamic model, run every 6 hours to provide up-to-date guidance. OFS are currently implemented in 13 geographic domains (see https://tidesandcurrents.noaa.gov/models.html), with plans to expand to 15 within 3 years and to full CONUS coverage within 7 years. In aggregate, the current 13 systems produce approximately 150 gigabytes of output daily. CO-OPS currently retains the model output for one to two years and makes it available via a THREDDS Data Server (TDS), after which it is overwritten and lost to future use. At this time no OFS output is archived; however, CO-OPS has submitted a request to NCEI for basic archival services (file-level access, tape storage). While this is an encouraging step toward preserving these important data, access to the archived data will be cumbersome. The proposed work builds on the archiving commitment already in progress.
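
To give a sense of scale for the retention window described above, the short sketch below estimates the volume of OFS output held at any one time. It assumes, beyond what the report states, that output stays constant at 150 GB per day, that a year has 365 days, and that 1 TB equals 1000 GB; the function name is illustrative only.

```python
# Rough storage estimate for retained OFS output.
# Assumptions (not from the report): constant 150 GB/day,
# 365-day years, and decimal units (1 TB = 1000 GB).

DAILY_OUTPUT_GB = 150  # approximate aggregate for the 13 current systems


def retained_volume_tb(years: float, daily_gb: float = DAILY_OUTPUT_GB) -> float:
    """Return the approximate volume, in terabytes, produced over `years`."""
    return daily_gb * 365 * years / 1000


if __name__ == "__main__":
    # The report gives a retention period of one to two years.
    for years in (1, 2):
        print(f"{years} year(s): about {retained_volume_tb(years):.1f} TB")
```

Under these assumptions, the one-to-two-year retention window corresponds to roughly 55 to 110 TB of model output.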

Accomplishments

1.       Ocean profile datasets:

After joining the BEDI project in October 2016, Beauchamp familiarized himself with the World Ocean Database (WOD), completed the NOAA IT Security Awareness course, and enrolled in a Metadata Basics overview seminar taught by Kathy Martinolich of NOAA/NCEI. He then began using an updated isolite metadata template and existing WOD software to create metadata files for WOD ragged-array format NetCDF files. Several iterations of the template files and some minor changes to the software were necessary to make the metadata files valid. These files were then made available to another team member, Yuanjie Li, so that they could be indexed for the NCEI granule geoportal.

An existing script for processing GTSPP profiles was adapted, and a significant number of metadata files (covering the years 1985 – 2008) were generated. As these files are successfully indexed by Yuanjie Li, the remaining GTSPP metadata files will be processed. While the WOD and GTSPP metadata files were being indexed, Beauchamp confirmed that they could be discovered via the granule geoportal and provided feedback to Yuanjie Li whenever issues arose. The indexing for WOD and GTSPP is ongoing.

Argo profile data are archived as single-cast NetCDF files. FORTRAN code and a script were developed to extract the necessary information from the NetCDF files and insert it into a template file to generate Argo metadata files, which will eventually be indexed. These Argo metadata files are not yet in their final form; some minor revisions are still necessary.
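
The extract-and-fill approach described above can be sketched as follows. This is a minimal illustration in Python, not the project's FORTRAN code: the template is a heavily simplified, hypothetical stand-in for an ISO metadata template, the placeholder names are invented, and the attribute values are made up rather than read from a real NetCDF file.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical, heavily simplified stand-in for an ISO metadata template.
# The element and placeholder names are illustrative, not the real template.
METADATA_TEMPLATE = Template(
    "<metadata>\n"
    "  <title>$title</title>\n"
    '  <point lat="$lat" lon="$lon"/>\n'
    "  <time>$time</time>\n"
    "</metadata>\n"
)


def render_metadata(attrs: dict) -> str:
    """Fill the template with values extracted from one single-cast file.

    Values are XML-escaped first; a missing placeholder raises KeyError so
    that an incomplete record is never written silently.
    """
    safe = {key: escape(str(value), {'"': "&quot;"}) for key, value in attrs.items()}
    return METADATA_TEMPLATE.substitute(safe)


if __name__ == "__main__":
    # Made-up values standing in for attributes read from a NetCDF file.
    example = {
        "title": "Example float cast (hypothetical)",
        "lat": -12.5,
        "lon": 145.25,
        "time": "2016-10-01T00:00:00Z",
    }
    print(render_metadata(example))
```

Failing on a missing value, rather than leaving a placeholder in the output, is one way to reduce the number of validation iterations that template-based metadata tends to need.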

2.       Ocean surface dataset:

Wang has created ISO metadata for ICOADS observational data and has developed a CF-compliant NetCDF format for the same data. This will allow discovery and delivery of ICOADS data through the NCEI granule discovery tools and THREDDS server, either independently or in conjunction with WOD data. Delivering these two foundational datasets together will allow researchers to access the entire array of marine meteorological, ocean surface, and subsurface ocean data through one mechanism. It should be noted that ICOADS Release 3.0 (available since about June 2016) has more than 455 million “granules” (i.e., individual “marine reports” in ICOADS terminology) compared to 13.9 million for WOD, so the work will entail scaling up the WOD system by more than an order of magnitude (roughly 30-fold). Wang is working with the NCEI data ingest team (John Relph) to automate the ingest and archiving of the ICOADS NetCDF files into the NCEI Archive Management System. An ISO metadata template for ICOADS data has also been developed, and Wang will work with Li to finalize the metadata files.

3.       Ocean model datasets:

Wang has set up a test THREDDS website on the THREDDS server at NCEI-NC to provide access to the CO-OPS OFS model data at NCEI. He also set up catalogs for THREDDS time aggregation for both structured and unstructured models. This will provide advanced web-service access to the CO-OPS OFS model data over the long term.
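
As an illustration of what a time aggregation involves, the sketch below builds an NcML aggregation element of type joinExisting, which concatenates files along an existing dimension. The report does not say which aggregation type or directory layout was actually used, so the directory path, the dimension name "time", and the function name are all assumptions; a real THREDDS catalog would embed such an element inside a dataset entry.

```python
import xml.etree.ElementTree as ET

# Namespace used by NcML documents.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_time_aggregation(scan_dir: str, suffix: str = ".nc",
                           time_dim: str = "time") -> str:
    """Return NcML that joins every matching file in `scan_dir` along `time_dim`.

    `scan_dir` and `time_dim` are hypothetical; adjust to the real layout.
    """
    ET.register_namespace("", NCML_NS)  # serialize NcML as the default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    aggregation = ET.SubElement(
        root,
        f"{{{NCML_NS}}}aggregation",
        {"dimName": time_dim, "type": "joinExisting"},
    )
    # <scan> tells the server to pick up all files with this suffix.
    ET.SubElement(
        aggregation,
        f"{{{NCML_NS}}}scan",
        {"location": scan_dir, "suffix": suffix},
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical directory holding the output of one OFS domain.
    print(build_time_aggregation("/data/ofs/example_domain/"))
```

Because OFS models are rerun every 6 hours, successive files may overlap in forecast time; a simple joinExisting aggregation does not resolve such overlaps, so this sketch covers only the non-overlapping case.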

Planned work

1.       Ocean profile datasets:

·         Continue working with Yuanjie Li to generate WOD and GTSPP metadata, make it available for indexing, and confirm that the indexing is complete and successful.

·         Complete the procedure (FORTRAN code, template, and script) for generating Argo metadata files, then make the generated files available for indexing.

·         Develop a procedure for creating metadata files for Global Ocean Current Data (GOCD), then generate these files and make them available for indexing.

·         Explore the use of Elasticsearch for granule search and of ERDDAP as an aggregation tool.

2.       Ocean surface dataset:

·         Complete the ICOADS NetCDF automation setup

·         Finalize the ICOADS ISO metadata template and create an ISO metadata file for each granule in ICOADS

·         Test and implement granule geoportal searches, the THREDDS server, and NCEI Data Access and OneStop data delivery

·         Maintain the ICOADS data automation and solve any issues encountered

3.       Ocean model datasets:

·         Apply the THREDDS catalogs to production when CO-OPS OFS data are available at NCEI-NC

·         Apply the time aggregation to the data

Products

(1) The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) in NetCDF format

(2) THREDDS server for the CO-OPS OFS model data with time aggregation capability

(3) ISO metadata for GTSPP data (years 1985 – 2008)

(4) ISO metadata for WOD (9 of 11 instrument types completed)

(5) Preliminary method developed to generate ISO metadata from Argo profiles
