cmiputil.esgfsearch module

Search CMIP6 datasets via the ESGF RESTful API and get OPeNDAP URLs and other information about the datasets found.

Basic Usage

The typical flow of searching for and downloading a CMIP6 dataset from ESGF is as follows:

  1. create an esgfsearch.ESGFSearch instance,

  2. do search via the doSearch() method,

  3. search results are stored in the datainfo attribute, which is a list of esgfdatainfo.ESGFDataInfo instances (one element per search result),

  4. open the dataset URLs with your favorite library, such as xarray, siphon or netCDF4.

All dataset URLs found are stored in the data_urls attribute.

Example

>>> from cmiputil import esgfsearch
>>> import xarray as xr
>>> params = {'source_id': 'MIROC6',
...           'experiment_id': 'historical',
...           'variable_id': 'tas',
...           'variant_label': 'r1i1p1f1'}
>>> es = esgfsearch.ESGFSearch()
>>> es.doSearch(params)

After doSearch() in the example above, es.data_urls is set as below:

['http://esgf-data2.diasjp.net/thredds/dodsC/CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.tas.20181212.aggregation.1']

You can open datasets from these URLs with any library you like, for example:

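ds = []
for url in es.data_urls:
    if isinstance(url, list):
        # not aggregated: one dataset is a list of per-file URLs
        ds.append(xr.open_mfdataset(url, decode_times=False, combine='by_coords'))
    else:
        # aggregated: one dataset is a single URL
        ds.append(xr.open_dataset(url, decode_times=False))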

“Aggregated”

One feature of OPeNDAP is that a multi-file dataset can be accessed as a single aggregated file. To get aggregated datasets, set aggregate to True in the config file (see below); set it to False to get the URLs of the individual files instead.

If you choose not to use aggregation, netCDF4 (and the libraries that use it as a backend) can still open multiple files as a single dataset, as shown in the example above.

Config File

This module reads a config file with the following sections:

  • [cmiputil]

    cmip6_data_dir (str):

    the root of local data store (described below).

  • [ESGFSearch]

    search_service (str):

    the base URL of the search service at an ESGF Index Node

    aggregate (bool):

    retrieve OPeNDAP aggregated datasets or not

  • [ESGFSearch.keywords] : keyword parameters of RESTful API

  • [ESGFSearch.facets] : facet parameters of RESTful API
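
As a rough illustration of the layout described above, the following sketch builds such a file with Python's standard configparser and reads it back. It assumes INI syntax; the module's own config reader may differ in detail (see the config module). The section and option names come from the list above, while the values (including the /data root) are placeholders.

```python
import configparser
import io

# Section and option names follow the list above; the values are placeholders.
conf = configparser.ConfigParser()
conf.read_dict({
    "cmiputil": {"cmip6_data_dir": "/data"},
    "ESGFSearch": {
        "search_service": "http://esgf-node.llnl.gov/esg-search/",
        "aggregate": "True",
    },
    "ESGFSearch.keywords": {"replica": "false", "latest": "true"},
    "ESGFSearch.facets": {"table_id": "Amon"},
})

# Render the config as text, in the form it would take on disk.
buf = io.StringIO()
conf.write(buf)
config_text = buf.getvalue()
print(config_text)

# Read it back; getboolean() converts the string form of 'aggregate'.
parsed = configparser.ConfigParser()
parsed.read_string(config_text)
aggregate = parsed.getboolean("ESGFSearch", "aggregate")
```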

Warning

Currently the format, limit, and type keywords are not configurable. Even if you specify them in your config file, they will be overridden.
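
The effect of this warning can be pictured as a merge in which the fixed keywords are applied last. The sketch below is only an assumption about the precedence, not the module's actual code; effective_keywords is a hypothetical helper, and the two dicts copy the module-level defaults documented on this page.

```python
# Values copied from the module-level defaults documented on this page.
keywords_default = {"latest": "true", "replica": "false"}
keywords_non_configurable = {
    "format": "application/solr+json",
    "limit": 10000,
    "type": "Dataset",
}


def effective_keywords(user_keywords):
    """Hypothetical helper: defaults first, then user settings, then fixed values."""
    merged = dict(keywords_default)
    merged.update(user_keywords)
    # Applied last, so user-supplied format/limit/type are silently replaced.
    merged.update(keywords_non_configurable)
    return merged


# A user tries to change 'limit' (ignored) and 'replica' (honoured).
result = effective_keywords({"limit": 10, "replica": "true"})
```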

Local files

This module assumes that local data files are stored in a DRS-compliant directory structure. See the drs module for details of the DRS. If you use synda install to download and replicate CMIP6 data files from ESGF, the files are stored in this way.

doSearch() also searches for local files corresponding to the search results and sets the local_files property, so that you can use local files instead of downloading them.

Do not forget to set the base_dir attribute, or cmip6_data_dir in the config file, to the root of this directory structure.

After doSearch() in the example above, es.local_files is set as below if the files exist:

[[PosixPath('/data/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/Amon/tas/gn/v20181212/tas_Amon_MIROC6_historical_r1i1p1f1_gn_185001-194912.nc'),
  PosixPath('/data/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/Amon/tas/gn/v20181212/tas_Amon_MIROC6_historical_r1i1p1f1_gn_195001-201412.nc')]]
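
To make the directory layout concrete, the sketch below joins the CMIP6 DRS directory components under a root directory. It illustrates the structure only and is not how this module locates files; drs_directory is a hypothetical helper, and the attribute values are taken from the example paths above, with /data assumed as the root.

```python
from pathlib import PurePosixPath

# Order of the directory components in the CMIP6 DRS.
DRS_DIR_ORDER = (
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
)


def drs_directory(base_dir, attrs):
    """Hypothetical helper: build the DRS directory for one dataset."""
    return PurePosixPath(base_dir).joinpath(*(attrs[k] for k in DRS_DIR_ORDER))


# Here member_id is identical to the variant_label used in the search example.
attrs = {
    "mip_era": "CMIP6", "activity_id": "CMIP", "institution_id": "MIROC",
    "source_id": "MIROC6", "experiment_id": "historical",
    "member_id": "r1i1p1f1", "table_id": "Amon", "variable_id": "tas",
    "grid_label": "gn", "version": "v20181212",
}
path = drs_directory("/data", attrs)
```
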
exception esgfsearch.NotFoundError

Bases: Exception

class esgfsearch.ESGFSearch(conffile='')

Bases: object

Search CMIP6 datasets via the ESGF RESTful API and get OPeNDAP URLs and other information about the datasets found.

If conffile is None, no config file is read and a blank instance is created. To read only the default config files, set conffile="". See the config module for details.

Parameters

conffile (path-like) – configuration file

conf

config.Conf instance

datainfo

list of esgfdatainfo.ESGFDataInfo instances

search_service

search service for RESTful API, e.g., http://esgf-node.llnl.gov/esg-search/

service_type

service type for RESTful API. Currently only search is allowed.

aggregate

get aggregated URLs if True

Type

bool

params

dict for keyword parameters and facet parameters for RESTful API

base_dir

base(root) path for local data directory structure

Type

str

doSearch(params=None, base_url=None)

Do search via ESGF RESTful API.

Search results are stored in the datainfo attribute as a list of esgfdatainfo.ESGFDataInfo instances.

If the aggregate attribute is True, this method obtains the URLs of aggregated datasets; otherwise it obtains the URLs of all files listed in the catalog.

All retrieved OPeNDAP URLs can be accessed via the data_urls property.

Parameters
  • params (dict) – keyword parameters and facet parameters.

  • base_url – base URL of the ESGF search service.

Raises

NotFoundError – raised if no catalog is found.

Returns

None

If base_url is not None, it overrides the combination of the search_service and service_type attributes.

params is merged into the params attribute using the update() method of the Python dict.
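
The two rules above can be sketched as follows. This is an assumption about how a request URL might be composed, not the module's actual implementation: build_query_url is a hypothetical helper, the stored parameters are example values, and the sketch copies the stored dict rather than updating it in place.

```python
from urllib.parse import urlencode

# Defaults documented on this page.
search_service = "http://esgf-node.llnl.gov/esg-search/"
service_type = "search"


def build_query_url(stored_params, params=None, base_url=None):
    """Hypothetical helper: merge parameters and compose the request URL."""
    merged = dict(stored_params)
    if params is not None:
        merged.update(params)  # new values win, as with dict.update()
    # base_url, when given, replaces search_service + service_type.
    base = base_url if base_url is not None else search_service + service_type
    return base + "?" + urlencode(merged)


stored = {"table_id": "Amon", "latest": "true"}
url = build_query_url(stored, {"source_id": "MIROC6", "table_id": "day"})
print(url)
```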

property cat_urls

Obtained catalog URLs

Type

list(str)

property data_urls

URLs of each dataset.

If aggregate is False, each dataset consists of multiple data files, so the type is list(list(str)).

Type

list(str) or list(list(str))
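
Because the element type depends on aggregate, code that consumes data_urls may find it convenient to normalize both shapes first. The following is a minimal sketch; as_url_groups is a hypothetical helper, not part of this module, and the example.invalid URLs are placeholders.

```python
def as_url_groups(data_urls):
    """Hypothetical helper: return one list of URLs per dataset."""
    return [[u] if isinstance(u, str) else list(u) for u in data_urls]


# Aggregated: each dataset is a single URL string.
aggregated = ["http://example.invalid/dodsC/dataset.aggregation.1"]
# Not aggregated: each dataset is a list of per-file URLs.
per_file = [[
    "http://example.invalid/dodsC/file_185001-194912.nc",
    "http://example.invalid/dodsC/file_195001-201412.nc",
]]

groups_a = as_url_groups(aggregated)
groups_b = as_url_groups(per_file)
```

After normalization, every dataset can be handled by the same loop regardless of the aggregate setting.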

property local_files

Paths of existing local files corresponding to the search results.

Type

list(str) or list(list(str))

esgfsearch.getDefaultConf()

Return default config values as a dict.

Intended to be called before writeConf() in config.

Example

>>> from cmiputil import esgfsearch, config
>>> conf = config.Conf(None)   #  to create blank config
>>> conf.setCommonSection()
>>> d = esgfsearch.getDefaultConf()
>>> conf.read_dict(d)
>>> conf.writeConf('/tmp/cmiputil.conf', overwrite=True)
esgfsearch.facets_default = {'table_id': 'Amon'}

Default facets for RESTful API.

esgfsearch.keywords_default = {'latest': 'true', 'replica': 'false'}

Default keywords for RESTful API.

esgfsearch.keywords_non_configurable = {'format': 'application/solr+json', 'limit': 10000, 'type': 'Dataset'}

Keywords not configurable for RESTful API.

esgfsearch.search_service_default = 'http://esgf-node.llnl.gov/esg-search/'

Default search service URL

esgfsearch.service_type_default = 'search'

Default service type. Not configurable.