cmiputil.drs module
CMIP6 Data Reference Syntax (DRS).
(Excerpt from http://goo.gl/v1drZl)
File name template:
A DRS-compliant filename consists of several global attributes as follows:
filename = <variable_id>
_<table_id>
_<source_id>
_<experiment_id>
_<member_id>
_<grid_label>
[_<time_range>].nc
For time-invariant fields, the last segment (<time_range>) above is omitted.
All strings appearing in the file name are constructed using only the following characters: a-z, A-Z, 0-9, and the hyphen (“-”), except that the hyphen must not appear in <variable_id>. Underscores are prohibited throughout except as shown in the template.
The <member_id> is constructed from the <sub_experiment_id> and <variant_label> using the following algorithm:
if <sub_experiment_id> == "none"
<member_id> = <variant_label>
else
<member_id> = <sub_experiment_id>-<variant_label>
endif
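The <member_id> rule above is easy to mirror in code. The following Python sketch builds a <member_id> and splits one back into its parts. The function names make_member_id and split_member_id are hypothetical and are not part of cmiputil. The sketch assumes that a <variant_label> (e.g. r1i1p1f1) never contains a hyphen.

```python
from typing import Tuple


def make_member_id(variant_label: str, sub_experiment_id: str = "none") -> str:
    """Build <member_id> from <sub_experiment_id> and <variant_label>."""
    if sub_experiment_id == "none":
        return variant_label
    return f"{sub_experiment_id}-{variant_label}"


def split_member_id(member_id: str) -> Tuple[str, str]:
    """Return (sub_experiment_id, variant_label) for a <member_id>.

    Assumes <variant_label> contains no hyphen, so the last hyphen,
    if any, separates the two parts.
    """
    sub_experiment_id, sep, variant_label = member_id.rpartition("-")
    if not sep:
        # No hyphen at all: there was no sub-experiment.
        return "none", member_id
    return sub_experiment_id, variant_label
```

For example, make_member_id("r2i1p1f1", "s1960") gives "s1960-r2i1p1f1", as in the filename example below.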
The <time_range> is a string generated consistent with the following:
if frequency == "fx" then
<time_range>=""
else
<time_range> = N1-N2
endif
where N1 and N2 are integers of the form yyyy[MM[dd[hh[mm[ss]]]]][<suffix>]
(expressed as a string, where yyyy, MM, dd, hh, mm and
ss are integer year, month, day, hour, minute, and second, respectively),
where <suffix> is defined as follows:
if the variable identified by variable_id has a time dimension with a “climatology” attribute then
<suffix> = "-clim"
else
<suffix> = ""
endif
and where the precision of the time_range strings is determined by the <frequency> global attribute.
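As a minimal sketch, a <time_range> can be formatted from two datetime objects. The helper make_time_range and its ndigits parameter are hypothetical. ndigits is the number of leading digits of yyyyMMddhhmmss to keep. The caller decides which precision goes with which <frequency>, and the "-clim" suffix is not handled here.

```python
from datetime import datetime


def _stamp(t: datetime, ndigits: int) -> str:
    # Build yyyyMMddhhmmss by hand: strftime('%Y') does not zero-pad
    # years below 1000 on every platform.
    full = (f"{t.year:04d}{t.month:02d}{t.day:02d}"
            f"{t.hour:02d}{t.minute:02d}{t.second:02d}")
    return full[:ndigits]


def make_time_range(frequency: str, start: datetime, end: datetime,
                    ndigits: int) -> str:
    """Return 'N1-N2', or '' for time-invariant ('fx') fields."""
    if frequency == "fx":
        return ""
    if ndigits not in (4, 6, 8, 10, 12, 14):
        raise ValueError(f"unsupported precision: {ndigits}")
    return f"{_stamp(start, ndigits)}-{_stamp(end, ndigits)}"
```

For monthly data stored with yyyyMM precision, make_time_range("mon", datetime(1960, 1, 16), datetime(1999, 12, 16), 6) returns "196001-199912".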
Example when there is no <sub_experiment_id>:
tas_Amon_GFDL-CM4_historical_r1i1p1f1_gn_196001-199912.nc
Example with a <sub_experiment_id>:
pr_day_CNRM-CM6-1_dcppA-hindcast_s1960-r2i1p1f1_gn_198001-198412.nc
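The file name template can be checked with a regular expression. In this sketch, FILENAME_RE and parse_filename are hypothetical names and are not cmiputil API. The pattern enforces the character rules above, including the ban on hyphens in <variable_id>. It treats an optional seventh segment as <time_range> without checking its digits.

```python
import re
from typing import Dict

_SEG = r"[A-Za-z0-9-]+"  # allowed characters, hyphen included
FILENAME_RE = re.compile(
    r"(?P<variable_id>[A-Za-z0-9]+)"    # no hyphen in <variable_id>
    rf"_(?P<table_id>{_SEG})"
    rf"_(?P<source_id>{_SEG})"
    rf"_(?P<experiment_id>{_SEG})"
    rf"_(?P<member_id>{_SEG})"
    rf"_(?P<grid_label>{_SEG})"
    rf"(?:_(?P<time_range>{_SEG}))?"    # omitted for time-invariant fields
    r"\.nc"
)


def parse_filename(name: str) -> Dict[str, str]:
    """Split a DRS filename into its components (no vocabulary check)."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f'not a DRS filename: "{name}"')
    # Drop <time_range> when the file has none.
    return {k: v for k, v in m.groupdict().items() if v is not None}
```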
Directory structure template:
A DRS-compliant directory structure consists of several global attributes as follows:
Directory structure = <mip_era>/
<activity_id>/
<institution_id>/
<source_id>/
<experiment_id>/
<member_id>/
<table_id>/
<variable_id>/
<grid_label>/
<version>
Note:
<version> has the form “vYYYYMMDD” (e.g., “v20160314”), indicating a representative date for the version. Note that files contained in a single <version> subdirectory at the end of the directory path should represent all the available time-samples reported from the simulation; a time-series can be split across several files, but all the files must be found in the same subdirectory. This implies that <version> will not generally be the actual date that all files in the subdirectory were written or published.
If multiple activities are listed in the <activity_id> global attribute, the first one is used in the directory structure.
Example when there is no <sub_experiment_id>:
CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/1pctCO2/r1i1p1f1/Amon/tas/gn/v20150322
Example with a <sub_experiment_id>:
CMIP6/DCPP/CNRM-CERFACS/CNRM-CM6-1/dcppA-hindcast/s1960-r2i1p1f3/day/pr/gn/v20160215
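A directory name can be assembled from a dict of attributes in template order. In this sketch, make_dirname is a hypothetical helper. It assumes that <member_id> has already been built. It also assumes that multiple activities in <activity_id> are separated by whitespace, and it keeps only the first one.

```python
from pathlib import PurePosixPath
from typing import Dict

# Directory levels in template order.
DIRNAME_ATTRIBS = ("mip_era", "activity_id", "institution_id", "source_id",
                   "experiment_id", "member_id", "table_id", "variable_id",
                   "grid_label", "version")


def make_dirname(attrs: Dict[str, str]) -> PurePosixPath:
    missing = [a for a in DIRNAME_ATTRIBS if a not in attrs]
    if missing:
        raise KeyError(f"missing attributes: {missing}")
    parts = []
    for a in DIRNAME_ATTRIBS:
        value = attrs[a]
        if a == "activity_id":
            # Assumption: several activities are whitespace-separated;
            # only the first one is used.
            value = value.split()[0]
        parts.append(value)
    return PurePosixPath(*parts)
```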
class drs.DRS(file=None, filename=None, dirname=None, do_sanitize=True, **kw)
Bases: object
Class for CMIP6 DRS.
This class holds the attributes needed to construct a file name or directory name that is valid for the CMIP6 DRS (Data Reference Syntax). See above and http://goo.gl/v1drZl for details about the DRS and the CMIP6 global attributes.
Instance member variables of this class are:
activity_id, experiment_id, grid_label, institution_id, mip_era, source_id, sub_experiment_id, table_id, time_range, variable_id, variant_label, version, member_id
Note that member_id cannot be set directly. It is constructed from sub_experiment_id (optional) and variant_label by the property member_id(). Use the class members requiredAttribs, filenameAttribs, filenameAttribsOptional, and dirnameAttribs to find the attributes needed to set up this class and to build a DRS-valid filename or dirname.
Note
Each attribute a managed by this class is in one of the following states:
hasattr(self, a) is False: not set explicitly
self.a == None: not set explicitly
self.a == '*': set as-is (not implemented yet)
type(self.a) == list: multiple values for brace expansion
- Parameters
If file is given, it must be a valid CMIP6 netCDF file, and attributes in that file are read and set.
Else if filename is given, it must be a valid DRS filename, and attributes are set from the components of that name.
Else if dirname is given, it must be a valid DRS directory name, and attributes are set from the components of that name.
Otherwise, attributes are set from the **kw dict.
If do_sanitize is True, invalid attribute values are removed; otherwise they are set as-is. You can also sanitize later via doSanitize().
Examples
>>> drs.DRS(filename='tas_Amon_MIROC6_piControl_r1i1p1f1_gn_320001-329912.nc')
DRS(experiment_id='piControl', grid_label='gn', mip_era='CMIP6', source_id='MIROC6', table_id='Amon', time_range='320001-329912', variable_id='tas', variant_label='r1i1p1f1')
>>> drs.DRS(dirname='/data/CMIP6/CMIP/MIROC/MIROC6/piControl/r1i1p1f1/Amon/tas/gn/v20181212/')
DRS(activity_id='CMIP', experiment_id='piControl', grid_label='gn', institution_id='MIROC', mip_era='CMIP6', source_id='MIROC6', table_id='Amon', variable_id='tas', variant_label='r1i1p1f1', version='v20181212')
With and without sanitizing;
>>> attrs = {k:v for k,v in drs.sample_attrs.items()}
>>> attrs['table_id'] = 'INVALID'
>>> d = drs.DRS(**attrs)
>>> d.table_id
Traceback (most recent call last):
...
AttributeError: 'DRS' object has no attribute 'table_id'
>>> d = drs.DRS(**attrs, do_sanitize=False)
>>> d.table_id
'INVALID'
dirName(prefix=None, allow_asterisk=True)
Construct a directory name by DRS from the DRS instance members.
If allow_asterisk is True, invalid or missing attributes are replaced by '*' in the result. If you want a glob/brace-expanded list, use dirNameList() instead.
- Parameters
prefix (Path-like) – prepended to the result path.
allow_asterisk (bool) – whether the result may contain '*'.
- Raises
AttributeError – if any attributes are missing or invalid and allow_asterisk=False.
- Returns
Path-like – directory name
Examples
Usual case;
>>> str(drs.DRS(**drs.sample_attrs).dirName())
'CMIP6/CMIP/MIROC/MIROC6/piControl/r1i1p1f1/Amon/tas/gn/v20181212'
With sub_experiment_id;
>>> str(drs.DRS(**drs.sample_attrs_w_subexp).dirName())
'CMIP6/DCPP/IPSL/IPSL-CM6A-LR/dcppC-atl-pacemaker/s1950-r1i1p1f1/Amon/rsdscs/gr/v20190110'
Invalid value for valid attribute;
>>> attrs = {k:v for k,v in drs.sample_attrs.items()}
>>> attrs['table_id'] = 'invalid'
>>> str(drs.DRS(**attrs).dirName())
'CMIP6/CMIP/MIROC/MIROC6/piControl/r1i1p1f1/*/tas/gn/v20181212'
>>> str(drs.DRS(**attrs).dirName(allow_asterisk=False))
Traceback (most recent call last):
...
AttributeError: 'DRS' object has no attribute 'table_id'
Missing attributes;
>>> attrs = {k:v for k,v in drs.sample_attrs.items()}
>>> del attrs['experiment_id']
>>> str(drs.DRS(**attrs).dirName(prefix='/data/'))
'/data/CMIP6/CMIP/MIROC/MIROC6/*/r1i1p1f1/Amon/tas/gn/v20181212'
>>> str(drs.DRS(**attrs).dirName(prefix='/data/', allow_asterisk=False))
Traceback (most recent call last):
...
AttributeError: 'DRS' object has no attribute 'experiment_id'
Multiple values;
>>> attrs = {k: v for k, v in drs.sample_attrs.items()}
>>> attrs.update({'experiment_id':'amip, piControl'})
>>> str(drs.DRS(**attrs).dirName())
'CMIP6/CMIP/MIROC/MIROC6/{amip,piControl}/r1i1p1f1/Amon/tas/gn/v20181212'
dirNameList(prefix=None)
Return the list of directory names obtained by expanding the asterisks and/or braces in the directory name constructed from the DRS instance members.
- Parameters
prefix (path-like) – dirname to prepend.
- Returns
list of path-like – directory names
Note
Non-existent directories are omitted.
Examples
>>> attrs = {k: v for k, v in drs.sample_attrs.items()}
>>> attrs.update({'experiment_id':'amip, piControl'})
>>> del attrs['version']
>>> str(drs.DRS(**attrs).dirName())
'CMIP6/CMIP/MIROC/MIROC6/{amip,piControl}/r1i1p1f1/Amon/tas/gn/*'
>>> res = drs.DRS(**attrs).dirNameList(prefix='/data')
>>> ref = [Path('/data/CMIP6/CMIP/MIROC/MIROC6/amip/r1i1p1f1/Amon/tas/gn/v20181214'),
...        Path('/data/CMIP6/CMIP/MIROC/MIROC6/piControl/r1i1p1f1/Amon/tas/gn/v20181212')]
>>> print(ref == res)
True
The last example returns [] if the expanded directories do not exist.
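The expansion done by dirNameList() can be approximated with the standard library. Python's glob does not understand braces, so this sketch first expands simple {a,b,...} groups, then globs each pattern and keeps only the directories that exist. When expanding, it strips spaces from each alternative, so that 'amip, piControl' works. The names expand_braces and existing_dirs are hypothetical helpers, not the cmiputil implementation.

```python
import glob
import re
from pathlib import Path
from typing import List

_BRACE = re.compile(r"\{([^{}]*)\}")  # a brace group with no nested braces


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b,...} groups, left to right, into plain glob patterns."""
    m = _BRACE.search(pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    result: List[str] = []
    for alt in m.group(1).split(","):
        result.extend(expand_braces(head + alt.strip() + tail))
    return result


def existing_dirs(pattern: str) -> List[Path]:
    """Return the sorted existing directories matching the pattern."""
    found = set()
    for p in expand_braces(pattern):
        for hit in glob.glob(p):
            if Path(hit).is_dir():
                found.add(Path(hit))
    return sorted(found)
```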
doSanitize(silent=True)
Sanitize the instance.
That is, remove invalid values for valid attributes.
- Parameters
silent (bool) – whether to work silently.
- Returns
nothing
Examples
>>> d = drs.DRS(**drs.sample_attrs)
>>> d.activity_id = 'InvalidMIP'
>>> hasattr(d, 'activity_id')
True
>>> d.doSanitize()
>>> hasattr(d, 'activity_id')
False
In the case above, you should use d.set(activity_id='...') instead of setting the attribute directly. See set().
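fileName(prefix=None, w_time_range=True, allow_asterisk=True)
Construct a filename from the current instance member attributes.
- Parameters
prefix (path-like) – prepended to the result path.
w_time_range (bool) – whether to include the <time_range> part.
allow_asterisk (bool) – whether the result may contain '*'.
- Raises
AttributeError – if any attributes are missing or invalid and allow_asterisk=False.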
- Returns
path-like – filename
Note
By definition, whether the <time_range> part is included depends on whether the <frequency> attribute is ‘fx’. This method uses <table_id> in place of <frequency> (for time-invariant fields both are ‘fx’), so if self.table_id == 'fx', w_time_range is forced to False. If self.table_id is '*' or has multiple values and you want the <time_range> part omitted, set w_time_range=False explicitly.
Examples
Usual case;
>>> str(drs.DRS(**drs.sample_attrs).fileName())
'tas_Amon_MIROC6_piControl_r1i1p1f1_gn_320001-329912.nc'
With sub_experiment_id;
>>> str(drs.DRS(**drs.sample_attrs_w_subexp).fileName())
'rsdscs_Amon_IPSL-CM6A-LR_dcppC-atl-pacemaker_s1950-r1i1p1f1_gr_192001-201412.nc'
No time_range;
>>> str(drs.DRS(**drs.sample_attrs_no_time_range).fileName(w_time_range=False))
'areacella_fx_MIROC6_historical_r1i1p1f1_gn.nc'
With prefix;
>>> prefix=Path('/data/CMIP6/')
>>> str(drs.DRS(**drs.sample_attrs).fileName(prefix))
'/data/CMIP6/tas_Amon_MIROC6_piControl_r1i1p1f1_gn_320001-329912.nc'
Invalid value for valid attribute;
>>> attrs = {k: v for k, v in drs.sample_attrs.items()}
>>> attrs.update({'table_id': 'invalid'})
>>> str(drs.DRS(**attrs).fileName())
'tas_*_MIROC6_piControl_r1i1p1f1_gn_320001-329912.nc'
>>> str(drs.DRS(**attrs).fileName(allow_asterisk=False))
Traceback (most recent call last):
...
AttributeError: 'DRS' object has no attribute 'table_id'
Missing attributes;
>>> attrs = {k: v for k, v in drs.sample_attrs.items()}
>>> del attrs['time_range']
>>> str(drs.DRS(**attrs).fileName())
'tas_Amon_MIROC6_piControl_r1i1p1f1_gn_*.nc'
>>> str(drs.DRS(**attrs).fileName(allow_asterisk=False))
Traceback (most recent call last):
...
AttributeError: 'DRS' object has no attribute 'time_range'
Multiple values;
>>> attrs = {k: v for k, v in drs.sample_attrs.items()}
>>> attrs.update({'experiment_id':'amip, piControl'})
>>> str(drs.DRS(**attrs).fileName())
'tas_Amon_MIROC6_{amip,piControl}_r1i1p1f1_gn_320001-329912.nc'
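The behaviour shown in these examples can be summarized in a small sketch: a missing value becomes '*', multiple values become a brace group, and table 'fx' drops <time_range>. make_filename is a hypothetical function and not the cmiputil implementation. It takes a plain dict that already contains member_id, and it does no validation against the CMIP6 vocabularies.

```python
from typing import Dict, List, Union

FILENAME_ATTRIBS = ("variable_id", "table_id", "source_id",
                    "experiment_id", "member_id", "grid_label")
Value = Union[str, List[str], None]


def _part(value: Value) -> str:
    if not value:
        return "*"  # missing (None or empty) becomes a wildcard
    if isinstance(value, list):
        # Several values become a brace group for later expansion.
        return value[0] if len(value) == 1 else "{" + ",".join(value) + "}"
    return value


def make_filename(attrs: Dict[str, Value], w_time_range: bool = True) -> str:
    if attrs.get("table_id") == "fx":
        w_time_range = False  # time-invariant field: no <time_range>
    parts = [_part(attrs.get(a)) for a in FILENAME_ATTRIBS]
    if w_time_range:
        parts.append(_part(attrs.get("time_range")))
    return "_".join(parts) + ".nc"
```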
fileNameList(prefix=None)
Return a list of filenames obtained by expanding the ‘*’ and/or braces in the filename constructed from the instance member attributes.
- Returns
list of str – filenames
Examples
>>> attrs = {k: v for k, v in drs.sample_attrs.items()}
>>> attrs.update({'experiment_id':'amip, piControl'})
>>> del attrs['time_range']
>>> str(drs.DRS(**attrs).fileName())
'tas_Amon_MIROC6_{amip,piControl}_r1i1p1f1_gn_*.nc'
>>> dlist = drs.DRS(**attrs).fileNameList()  # doctest: +SKIP
>>> [str(d) for d in dlist]  # doctest: +SKIP
['tas_Amon_MIROC6_amip_r1i1p1f1_gn_*.nc', 'tas_Amon_MIROC6_piControl_r1i1p1f1_gn_*.nc']
The last example returns [] if the expanded files do not exist.
getAttribs()
Return the current instance attributes listed in requiredAttribs, together with their values.
- Returns
dict – attribute-value pairs.
getAttrsFromGA(file)
Obtain the required attributes from the global attributes defined in a valid netCDF file.
- Parameters
file (str or path-like?) – filename of a valid netCDF file.
- Returns
dict – whose keys are from
DRS.requiredAttribs.
isValid(silent=True)
Check whether the attributes are valid for the DRS.
- Parameters
silent (bool) – no message even if something is invalid.
- Returns
bool – whether all attributes are valid.
Examples
>>> d = drs.DRS(**drs.sample_attrs)
>>> d.isValid()
True
>>> d.activity_id = 'InvalidMIP'
>>> d.isValid()
False
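isValidPath(path, directory=False, separated=False)
Check whether the given path is DRS compliant.
path may be a URL obtained by the ESGF search function; see cmiputil.esgfsearch for details.
- Parameters
path (str or path-like) – path or URL to check.
directory (bool) – treat path as a directory name only.
separated (bool) – return per-attribute results for the filename and the directory name separately.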
- Returns
bool or tuple of dict – valid or not (see below)
If separated is True, return a tuple of two dicts: the first is for the filename and the second for the directory name. Each dict maps an attribute name to whether its value is valid. If directory is True, the first element is {'all': True}.
Examples
>>> url = ('http://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/DCPP/'
...        'IPSL/IPSL-CM6A-LR/dcppC-pac-pacemaker/s1920-r1i1p1f1/'
...        'Amon/rsdscs/gr/v20190110/rsdscs_Amon_IPSL-CM6A-LR_'
...        'dcppC-pac-pacemaker_s1920-r1i1p1f1_gr_192001-201412.nc')
>>> drs.DRS().isValidPath(url)
True
>>> drs.DRS().isValidPath(url, separated=True)
({'experiment_id': True, 'grid_label': True, 'source_id': True, 'sub_experiment_id': True, 'table_id': True, 'time_range': True, 'variable_id': True, 'variant_label': True}, {'activity_id': True, 'experiment_id': True, 'grid_label': True, 'institution_id': True, 'mip_era': True, 'source_id': True, 'sub_experiment_id': True, 'table_id': True, 'variable_id': True, 'variant_label': True, 'version': True})
>>> url = ('http://vesg.ipsl.upmc.fr/thredds/fileServer/cmip6/DCPP/'
...        'IPSL/IPSL-CM6A-LR/dcppC-pac-pacemaker/s1920-r1i1p1f1/'
...        'Amon/rsdscs/gr/v20190110')
>>> drs.DRS().isValidPath(url)
False
>>> drs.DRS().isValidPath(url, directory=True)
True
isValidValueForAttr(value, attr)
Check whether value is valid for the attribute attr.
- Parameters
value (str) – value to check; may contain '*'.
attr (str) – attribute name.
- Raises
AttributeError – raises when attr is invalid for DRS.
- Returns
bool – whether value is valid for the attribute attr
Examples
>>> d = drs.DRS()
>>> d.isValidValueForAttr('Amon', 'table_id')
True
>>> d.isValidValueForAttr('Invalid', 'source_id')
False
>>> d.isValidValueForAttr('piControl', 'experiment_id')
True
>>> d.isValidValueForAttr('piControl', 'experiments_id')
Traceback (most recent call last):
...
AttributeError: ('Invalid Attribute for DRS:', 'experiments_id')
>>> d.isValidValueForAttr('*', 'institution_id')
True
>>> d.isValidValueForAttr('MIROC*', 'source_id')
True
set(do_sanitize=True, **argv)
Set instance attributes whose names are in requiredAttribs.
In argv,
missing attributes are left unset/untouched,
attributes with invalid values are sanitized via doSanitize() if do_sanitize=True,
unnecessary attributes are ignored.
Each attribute is checked by isValidValueForAttr() before it is set.
- Parameters
argv (dict) – attribute/value pairs
do_sanitize (bool) – remove invalid values via
doSanitize()
- Returns
nothing
Examples
>>> d = drs.DRS(**drs.sample_attrs)
>>> d
DRS(activity_id='CMIP', experiment_id='piControl', grid_label='gn', institution_id='MIROC', mip_era='CMIP6', source_id='MIROC6', table_id='Amon', time_range='320001-329912', variable_id='tas', variant_label='r1i1p1f1', version='v20181212')
>>> d.set(experiment_id='amip')
>>> d.experiment_id
'amip'
>>> d.set(experiment_id='invalid_experiment')
>>> d.experiment_id
Traceback (most recent call last):
...
AttributeError: 'DRS' object has no attribute 'experiment_id'
In the last example, the invalid value for experiment_id is sanitized because do_sanitize=True by default.
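The filtering rules of set() can be sketched as follows. Record, REQUIRED_ATTRIBS and the tiny CV dict are hypothetical stand-ins for illustration. The real class validates against the full CMIP6 controlled vocabularies through isValidValueForAttr(), not against a hard-coded dict.

```python
REQUIRED_ATTRIBS = ("activity_id", "experiment_id", "grid_label",
                    "institution_id", "mip_era", "source_id",
                    "sub_experiment_id", "table_id", "time_range",
                    "variable_id", "variant_label", "version")

# Hypothetical, heavily truncated controlled vocabulary.
CV = {
    "experiment_id": {"amip", "historical", "piControl"},
    "table_id": {"Amon", "day", "fx"},
}


class Record:
    """Hypothetical stand-in for drs.DRS, showing only the set() rules."""

    def set(self, do_sanitize: bool = True, **argv: str) -> None:
        for key, value in argv.items():
            if key not in REQUIRED_ATTRIBS:
                continue  # unnecessary attributes are ignored
            setattr(self, key, value)
        # Attributes absent from argv are left untouched.
        if do_sanitize:
            self.sanitize()

    def sanitize(self) -> None:
        # Remove every attribute whose value is not in the vocabulary.
        for key, allowed in CV.items():
            if hasattr(self, key) and getattr(self, key) not in allowed:
                delattr(self, key)
```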
splitDirName(dname, validate=False)
Split a directory name into attributes for DRS.
If validate=False, dname is only split, so no error is raised for invalid values as long as dname has at least as many components as a DRS-valid directory name. Set validate=True, or check the values yourself with isValidValueForAttr().
- Parameters
dname (path-like) – directory name
validate (bool) – validate the resulting attribute/value pair
- Returns
dict – attributes and their values
Note
Instance members are left untouched; pass the result of this method to set().
Examples
>>> dname = 'CMIP6/CMIP/MIROC/MIROC6/piControl/r1i1p1f1/Amon/tas/gn/v20181212'
>>> drs.DRS().splitDirName(dname)
{'activity_id': 'CMIP', 'experiment_id': 'piControl', 'grid_label': 'gn', 'institution_id': 'MIROC', 'mip_era': 'CMIP6', 'source_id': 'MIROC6', 'table_id': 'Amon', 'variable_id': 'tas', 'variant_label': 'r1i1p1f1', 'version': 'v20181212', 'prefix': ''}
With prefix;
>>> dname = ('/work/data/CMIP6/CMIP6/CMIP/MIROC/MIROC6/piControl/r1i1p1f1/Amon/tas/gn/v20181212')
>>> drs.DRS().splitDirName(dname)
{'activity_id': 'CMIP', 'experiment_id': 'piControl', 'grid_label': 'gn', 'institution_id': 'MIROC', 'mip_era': 'CMIP6', 'source_id': 'MIROC6', 'table_id': 'Amon', 'variable_id': 'tas', 'variant_label': 'r1i1p1f1', 'version': 'v20181212', 'prefix': '/work/data/CMIP6'}
Invalid case;
>>> dname = 'Some/Invalid/Path'
>>> drs.DRS().splitDirName(dname)
Traceback (most recent call last):
...
ValueError: Invalid dirname: "Some/Invalid/Path"
>>> dname = 'Some/Invalid/but/has/occasionally/the/same/number/of/component/'
>>> drs.DRS().splitDirName(dname)
{'activity_id': 'Invalid', 'experiment_id': 'occasionally', 'grid_label': 'of', 'institution_id': 'but', 'mip_era': 'Some', 'source_id': 'has', 'table_id': 'same', 'variable_id': 'number', 'variant_label': 'the', 'version': 'component', 'prefix': ''}
>>> drs.DRS().splitDirName(dname, validate=True)
Traceback (most recent call last):
...
ValueError: "Invalid" is invalid for <activity_id>
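Without validation, splitting amounts to two steps. Take the last ten path components in template order, and treat everything before them as the prefix. The hypothetical split_dirname below sketches this with pathlib. It also splits <member_id> on its last hyphen, assuming <variant_label> contains none, and it performs no validation.

```python
from pathlib import PurePosixPath
from typing import Dict

DIRNAME_ATTRIBS = ("mip_era", "activity_id", "institution_id", "source_id",
                   "experiment_id", "member_id", "table_id", "variable_id",
                   "grid_label", "version")


def split_dirname(dname: str) -> Dict[str, str]:
    parts = PurePosixPath(dname).parts  # a trailing slash is dropped
    n = len(DIRNAME_ATTRIBS)
    if len(parts) < n:
        raise ValueError(f'Invalid dirname: "{dname}"')
    head, tail = parts[:-n], parts[-n:]
    result = dict(zip(DIRNAME_ATTRIBS, tail))
    # Split <member_id> into [<sub_experiment_id>-]<variant_label>.
    sub_exp, sep, variant = result.pop("member_id").rpartition("-")
    if sep:
        result["sub_experiment_id"] = sub_exp
    result["variant_label"] = variant
    result["prefix"] = str(PurePosixPath(*head)) if head else ""
    return result
```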
splitFileName(fname, validate=False)
Split a filename into attributes for DRS.
If validate=False, fname is only split, so no error is raised for invalid values as long as fname has the same number of components as a DRS-valid filename. Set validate=True, or check the values yourself with isValidValueForAttr().
- Parameters
fname (Path-like) – filename
validate (bool) – validate the resulting attribute/value pair
- Raises
ValueError – if fname is invalid for DRS.
- Returns
dict – attributes and their values
Note
Instance members are left untouched; pass the result of this method to set().
Examples
>>> fname = "tas_Amon_MIROC6_piControl_r1i1p1f1_gn_320001-329912.nc"
>>> drs.DRS().splitFileName(fname)
{'experiment_id': 'piControl', 'grid_label': 'gn', 'source_id': 'MIROC6', 'table_id': 'Amon', 'time_range': '320001-329912', 'variable_id': 'tas', 'variant_label': 'r1i1p1f1'}
>>> fname='invalid_very_long_file_name.nc'
>>> drs.DRS().splitFileName(fname)
Traceback (most recent call last):
...
ValueError: not follow the name template: "invalid_very_long_file_name.nc"
>>> fname='invalid_but_same_length_with_drs.nc'
>>> drs.DRS().splitFileName(fname)
{'experiment_id': 'length', 'grid_label': 'drs', 'source_id': 'same', 'table_id': 'but', 'variable_id': 'invalid', 'variant_label': 'with'}
>>> drs.DRS().splitFileName(fname, validate=True)
Traceback (most recent call last):
...
ValueError: "length" is invalid for <experiment_id>
dirnameAttribs = ('mip_era', 'activity_id', 'institution_id', 'source_id', 'experiment_id', 'member_id', 'table_id', 'variable_id', 'grid_label', 'version')
Attributes necessary to construct a dirname.
filenameAttribs = ('variable_id', 'table_id', 'source_id', 'experiment_id', 'member_id', 'grid_label')
Attributes necessary to construct a filename.
filenameAttribsOptional = ('time_range',)
Optional attributes for constructing a filename.
property member_id
Getter for the attribute <member_id>.
See the definition of this attribute in cmiputil.drs.
requiredAttribs = ('activity_id', 'experiment_id', 'grid_label', 'institution_id', 'mip_era', 'source_id', 'sub_experiment_id', 'table_id', 'time_range', 'variable_id', 'variant_label', 'version')
Attributes managed in this class.