cmiputil.esgfsearch module¶
Search CMIP6 datasets via the ESGF RESTful API, and get OPeNDAP URLs and other information about the datasets found.
Basic Usage¶
The typical flow for searching and downloading CMIP6 datasets from ESGF is as follows:
1. Create an esgfsearch.ESGFSearch instance.
2. Do a search via the doSearch() method.
3. Search results are set as the datainfo attribute, which is a list of esgfdatainfo.ESGFDataInfo instances. One element corresponds to one search result.
4. Open the dataset URLs as your favorite datatype, such as xarray, siphon or netCDF4.
All dataset URLs found are stored as the data_urls attribute.
Example
>>> from cmiputil import esgfsearch
>>> import xarray as xr
>>> params = {'source_id': 'MIROC6',
... 'experiment_id': 'historical',
... 'variable_id': 'tas',
... 'variant_label': 'r1i1p1f1'}
>>> es = esgfsearch.ESGFSearch()
>>> es.doSearch(params)
In the example above, after doSearch(), es.data_urls is set as below:
['http://esgf-data2.diasjp.net/thredds/dodsC/CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn.tas.20181212.aggregation.1']
You can open these URLs as any kind of dataset, for example:
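ds = []
for url in es.data_urls:
    if isinstance(url, list):
        # A list of per-file URLs: open them together as one dataset.
        ds.append(xr.open_mfdataset(url, decode_times=False, combine='by_coords'))
    else:
        # A single (e.g. aggregated) URL.
        ds.append(xr.open_dataset(url, decode_times=False))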
“Aggregated”¶
One feature of OPeNDAP is that a multi-file dataset can be accessed
as a single aggregated file. If you prefer to get aggregated
datasets, set aggregate to True in the config file (see below);
otherwise set it to False.
If you choose not to use aggregation, netCDF4 (and datatypes that use it as a backend) can still open multiple files as a single dataset, as shown in the example above.
Config File¶
This module reads the following sections of the config file:
[cmiputil]
    cmip6_data_dir (str): the root of the local data store (described below).
[ESGFSearch]
    search_service (str): the base URL of the search service at an ESGF Index Node.
    aggregate (bool): whether to retrieve OPeNDAP aggregated datasets.
[ESGFSearch.keywords]: keyword parameters of the RESTful API.
[ESGFSearch.facets]: facet parameters of the RESTful API.
Warning
Currently the format, limit and type keywords are not configurable. Even if you specify them in your config file, they will be overridden.
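As a rough illustration of the layout described above, the following sketch parses such a config with the standard configparser module. The section and option names come from the list above; the value /data and the limit = 10 entry are placeholders, and the final dict merge is only an assumption about how the non-configurable keywords take precedence, not the module's actual code.

```python
import configparser

# Placeholder values; section and option names follow the list above.
conf_text = """
[cmiputil]
cmip6_data_dir = /data

[ESGFSearch]
search_service = http://esgf-node.llnl.gov/esg-search/
aggregate = True

[ESGFSearch.keywords]
latest = true
replica = false
limit = 10

[ESGFSearch.facets]
table_id = Amon
"""

conf = configparser.ConfigParser()
conf.read_string(conf_text)

aggregate = conf.getboolean('ESGFSearch', 'aggregate')
keywords = dict(conf['ESGFSearch.keywords'])
facets = dict(conf['ESGFSearch.facets'])

# Assumption: non-configurable keywords win over anything in the config.
non_configurable = {'format': 'application/solr+json',
                    'limit': 10000,
                    'type': 'Dataset'}
params = {**keywords, **facets, **non_configurable}
print(params['limit'])
```

Here the limit given in the config file is replaced by the non-configurable value, which is the behavior the warning describes.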
Local files¶
This module assumes that local data files are stored in a DRS-compliant
directory structure. See the drs module for the details
of DRS. If you use synda install to download and replicate
CMIP6 data files from ESGF, files are stored in this way.
doSearch() also searches for local files corresponding to the
search results and sets the local_files property so that you can
use local files instead of downloading them.
Do not forget to set the base_dir attribute, or cmip6_data_dir
in the config file, to the root of this directory structure.
After doSearch() in the example above, es.local_files is set as below if the files exist:
[[PosixPath('/data/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/Amon/tas/gn/v20181212/tas_Amon_MIROC6_historical_r1i1p1f1_gn_185001-194912.nc'),
PosixPath('/data/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/Amon/tas/gn/v20181212/tas_Amon_MIROC6_historical_r1i1p1f1_gn_195001-201412.nc')]]
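To make the directory layout concrete, here is a minimal sketch of how the directory in the output above relates to the dotted dataset name seen in the OPeNDAP URL. The helper drs_dir is hypothetical and written only for this illustration; the drs module may compose paths differently.

```python
from pathlib import PurePosixPath


def drs_dir(base_dir, dataset_id, version):
    """Hypothetical helper: map a dotted dataset ID to a DRS directory."""
    return PurePosixPath(base_dir).joinpath(*dataset_id.split('.'), version)


base_dir = '/data'  # plays the role of base_dir / cmip6_data_dir
dataset_id = 'CMIP6.CMIP.MIROC.MIROC6.historical.r1i1p1f1.Amon.tas.gn'
directory = drs_dir(base_dir, dataset_id, 'v20181212')
print(directory)
```

On a real file system, something like sorted(pathlib.Path(directory).glob('*.nc')) would then list the data files under that directory.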
-
class esgfsearch.ESGFSearch(conffile='')[source]¶
Bases: object
Search CMIP6 datasets via the ESGF RESTful API, and get OPeNDAP URLs and other information about the datasets found.
If conffile is None, no config file is read and a blank instance is created. If you want only the default config files, set conffile="". See the config module for details.
- Parameters
conffile (path-like) – config file
-
conf¶ config.Conf instance
-
datainfo¶ list of esgfdatainfo.ESGFDataInfo instances
-
search_service¶ search service for the RESTful API, e.g., http://esgf-node.llnl.gov/esg-search/
-
service_type¶ service type for the RESTful API. Currently only search is allowed.
-
params¶ dict of keyword parameters and facet parameters for the RESTful API
-
doSearch(params=None, base_url=None)[source]¶ Do a search via the ESGF RESTful API.
Search results are stored in the datainfo attribute as a list of esgfdatainfo.ESGFDataInfo instances.
If the aggregate attribute is True, this method obtains URLs of aggregated datasets; otherwise it obtains URLs of all files listed in the catalog.
All retrieved OPeNDAP URLs can be accessed via the data_urls attribute.
- Parameters
params (dict) – keyword parameters and facet parameters.
base_url – base URL of the ESGF search service.
- Raises
NotFoundError – raised if no catalog is found.
- Returns
None
If base_url is not None, it overrides the search_service + service_type attributes.
params is used to update the params attribute (via the update() method of Python dict).
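The parameter handling described above can be sketched with the standard library alone. This is not the method's actual implementation, which is not shown here; it only illustrates the dict update and the base_url override, using the default values documented in this module and the search parameters from the Basic Usage example.

```python
from urllib.parse import urlencode, urljoin

search_service = 'http://esgf-node.llnl.gov/esg-search/'
service_type = 'search'

# Start from the documented defaults (keywords and facets).
params = {'latest': 'true', 'replica': 'false', 'table_id': 'Amon'}

# The caller's params update the stored ones via dict.update().
params.update({'source_id': 'MIROC6',
               'experiment_id': 'historical',
               'variable_id': 'tas',
               'variant_label': 'r1i1p1f1'})

# Assumption: non-configurable keywords are applied last.
params.update({'format': 'application/solr+json',
               'limit': 10000,
               'type': 'Dataset'})

# If base_url is given, it replaces search_service + service_type.
base_url = None
endpoint = base_url or urljoin(search_service, service_type)
url = endpoint + '?' + urlencode(params)
print(url)
```

Passing a non-None base_url in this sketch swaps the endpoint while leaving the query parameters untouched.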
-
esgfsearch.getDefaultConf()[source]¶ Return default config values as a dict.
Intended to be called before writeConf() in config.
Example
>>> from cmiputil import esgfsearch, config
>>> conf = config.Conf(None)  # to create a blank config
>>> conf.setCommonSection()
>>> d = esgfsearch.getDefaultConf()
>>> conf.read_dict(d)
>>> conf.writeConf('/tmp/cmiputil.conf', overwrite=True)
-
esgfsearch.facets_default= {'table_id': 'Amon'}¶ Default facets for RESTful API.
-
esgfsearch.keywords_default= {'latest': 'true', 'replica': 'false'}¶ Default keywords for RESTful API.
-
esgfsearch.keywords_non_configurable= {'format': 'application/solr+json', 'limit': 10000, 'type': 'Dataset'}¶ Keywords not configurable for RESTful API.
-
esgfsearch.search_service_default= 'http://esgf-node.llnl.gov/esg-search/'¶ Default search service URL
-
esgfsearch.service_type_default= 'search'¶ Default service type. Not configurable.