cmiputil.dds module

Module to parse DDS (Dataset Descriptor Structure) used in OPeNDAP.

DDS

For the definition of DDS, see the OPeNDAP User Guide. This module uses a modified notation for the DDS syntax, as follows:

declarations := list(declaration)
declaration := Var | Struct
Struct := stype { declarations } (name | name arr)
stype := Dataset|Structure|Sequence|Grid
Grid := Grid { ARRAY: declaration MAPS: declarations } (name | name arr)
Var := btype (name | name arr)
btype := Byte|Int32|UInt32|Float64|String|Url| …
arr := [integer] | [name = integer]

As the syntax above shows, a Struct can contain other Structs recursively, so a DDS forms a tree structure. The root of the tree must be a single “Dataset”.

In this module, each element of the syntax above is implemented as a class.
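
To make the recursive structure concrete, here is a minimal standalone sketch. It does not use the module's classes: the `Leaf` and `Node` dataclasses and the `walk` helper are hypothetical stand-ins for Var and Struct, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Union


@dataclass
class Leaf:
    """Hypothetical stand-in for a Var: a name and a base type."""
    name: str
    btype: str


@dataclass
class Node:
    """Hypothetical stand-in for a Struct: a name, an stype, and children."""
    name: str
    stype: str
    decl: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def walk(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield the dotted path of every leaf below node, depth first."""
    for key, child in node.decl.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(child, Node):
            # A Struct-like node: descend into its own declarations.
            yield from walk(child, path)
        else:
            yield path


# The root must be a single Dataset; other nodes may nest freely.
root = Node("data", "Dataset", {
    "catalog_number": Leaf("catalog_number", "Int32"),
    "station": Node("station", "Sequence", {
        "time": Leaf("time", "Int32"),
        "location": Node("location", "Structure", {
            "latitude": Leaf("latitude", "Float64"),
            "longitude": Leaf("longitude", "Float64"),
        }),
    }),
})
paths = list(walk(root))
```

Each yielded path corresponds to one chain of dot-notation accesses from the root down to a leaf.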

Basic Usage

The text form of a DDS can be obtained from, for example, ESGFDataInfo.getDDS(). Pass it to parse_dataset() to get the tree structure. The root of the tree is a Dataset instance, and you can access the nodes and leaves of the tree by dot notation (see also the ‘Example’ section below):

ds = parse_dataset(text=sample1)
ds.tas  # Grid('tas', array=Var('tas', ...), maps={'time':..., 'lat':..., 'lon':...})
ds.tas.array.arr[0]  # Arr('time', 8412)

Example

>>> sample1 = '''
... Dataset {
...     Float64 lat[lat = 160];
...     Float64 lat_bnds[lat = 160][bnds = 2];
...     Float64 lon[lon = 320];
...     Float64 lon_bnds[lon = 320][bnds = 2];
...     Float64 height;
...     Float64 time[time = 8412];
...     Float64 time_bnds[time = 8412][bnds = 2];
...     Grid {
...      ARRAY:
...         Float32 tas[time = 8412][lat = 160][lon = 320];
...      MAPS:
...         Float64 time[time = 8412];
...         Float64 lat[lat = 160];
...         Float64 lon[lon = 320];
...     } tas;
... } CMIP6.CMIP.MRI.MRI-ESM2-0.piControl.r1i1p1f1.Amon.tas.gn.tas.20190222.aggregation.1;'''
>>> sample1_struct = Dataset(
...    'CMIP6.CMIP.MRI.MRI-ESM2-0.piControl.r1i1p1f1.Amon.tas.gn.tas.20190222.aggregation.1',
...    {
...        'lat':
...        Var('lat', 'Float64', arr=[Arr('lat', 160)]),
...        'lat_bnds':
...        Var('lat_bnds', 'Float64', arr=[Arr('lat', 160),
...                                        Arr('bnds', 2)]),
...        'lon':
...        Var('lon', 'Float64', arr=[Arr('lon', 320)]),
...        'lon_bnds':
...        Var('lon_bnds', 'Float64', arr=[Arr('lon', 320),
...                                        Arr('bnds', 2)]),
...        'height':
...        Var('height', 'Float64'),
...        'time':
...        Var('time', 'Float64', arr=[Arr('time', 8412)]),
...        'time_bnds':
...        Var('time_bnds', 'Float64', arr=[Arr('time', 8412),
...                                         Arr('bnds', 2)]),
...        'tas':
...        Grid('tas',
...             array=Var(
...                 'tas',
...                 'Float32',
...                 arr=[Arr('time', 8412),
...                      Arr('lat', 160),
...                      Arr('lon', 320)]),
...             maps={
...                 'time': Var('time', 'Float64', arr=[Arr('time', 8412)]),
...                 'lat': Var('lat', 'Float64', arr=[Arr('lat', 160)]),
...                 'lon': Var('lon', 'Float64', arr=[Arr('lon', 320)])
...             })
...    })
>>> sample1_struct == parse_dataset(sample1)
True
>>> from cmiputil import dds
>>> sample2 = '''
... Dataset {
...   Int32 catalog_number;
...   Sequence {
...     String experimenter;
...     Int32 time;
...     Structure {
...       Float64 latitude;
...       Float64 longitude;
...     } location;
...     Sequence {
...       Float64 depth;
...       Float64 salinity;
...       Float64 oxygen;
...       Float64 temperature;
...     } cast;
...   } station;
... } data;
... '''
>>> sample2_struct = Dataset(
...     'data', {
...         'catalog_number':
...         Var('catalog_number', 'Int32'),
...         'station':
...         Sequence(
...             'station', {
...                 'experimenter':
...                 Var('experimenter', 'String'),
...                 'time':
...                 Var('time', 'Int32'),
...                 'location':
...                 Structure(
...                     'location', {
...                         'latitude': Var('latitude', 'Float64'),
...                         'longitude': Var('longitude', 'Float64')
...                     }),
...                 'cast':
...                 Sequence(
...                     'cast', {
...                         'depth': Var('depth', 'Float64'),
...                         'salinity': Var('salinity', 'Float64'),
...                         'oxygen': Var('oxygen', 'Float64'),
...                         'temperature': Var('temperature', 'Float64')
...                     })
...             })
...     })
>>> sample2_struct == parse_dataset(sample2)
True
>>> sample3 = '''
... Dataset {
...     Structure {
...         Float64 lat;
...         Float64 lon;
...     } location;
...     Structure {
...         Int32 minutes;
...         Int32 day;
...         Int32 year;
...     } time;
...     Float64 depth[500];
...     Float64 temperature[500];
... } xbt-station;
... '''
>>> sample3_struct = Dataset(
...     'xbt-station', {
...         'location':
...         Structure('location', {
...             'lat': Var('lat', 'Float64'),
...             'lon': Var('lon', 'Float64')
...         }),
...         'time':
...         Structure(
...             'time', {
...                 'minutes': Var('minutes', 'Int32'),
...                 'day': Var('day', 'Int32'),
...                 'year': Var('year', 'Int32')
...             }),
...         'depth':
...         Var('depth', 'Float64', arr=[Arr('', 500)]),
...         'temperature':
...         Var('temperature', 'Float64', arr=[Arr('', 500)])
...     })
>>> sample3_struct == parse_dataset(sample3)
True
class dds.Arr(name='', val=None, text=None)[source]

Bases: object

Class for arr.

arr := [integer] | [name = integer]

As a text form:

text = '[time = 8412]'
text = '[500]'

Example

>>> text = '[lat = 160];'
>>> Arr(text=text)
Arr('lat', 160)
>>> text = '[500];'
>>> Arr(text=text)
Arr('', 500)
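
The two text forms of arr can be recognised with a single regular expression. The following is a minimal sketch under that assumption, not the module's own parser: `ARR_PATTERN`, `parse_arr_text` and `format_arr` are hypothetical names, and plain tuples stand in for Arr instances.

```python
import re
from typing import Tuple

# Matches "[500]" or "[time = 8412]"; the "name =" part is optional.
ARR_PATTERN = re.compile(r"\[\s*(?:(\w+)\s*=\s*)?(\d+)\s*\]")


def parse_arr_text(text: str) -> Tuple[str, int]:
    """Return (name, val) for the first arr found in text (hypothetical helper)."""
    m = ARR_PATTERN.search(text)
    if m is None:
        raise ValueError(f"no arr found in {text!r}")
    # An anonymous dimension such as "[500]" gets an empty name.
    return (m.group(1) or "", int(m.group(2)))


def format_arr(name: str, val: int) -> str:
    """Return the text form of an arr, omitting the name if it is empty."""
    return f"[{name} = {val}]" if name else f"[{val}]"
```

In this sketch, parsing and formatting are inverses for well-formed input, which mirrors the relation between parse() and text_formatted().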
name

name

Type

str

val

integer

Type

int

parse(text)[source]
text_formatted(indent=None, linebreak=None)[source]

Text form of arr.

indent and linebreak are dummy arguments here.

property text
class dds.BType[source]

Bases: enum.Enum

Values for Var.btype.

Byte = 'Byte'
Float32 = 'Float32'
Float64 = 'Float64'
Int16 = 'Int16'
Int32 = 'Int32'
String = 'String'
UInt32 = 'UInt32'
Url = 'Url'
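
Var documents its btype parameter as "str or BType" and raises TypeError for invalid values. One way such normalisation could work with a string-valued Enum is sketched below. The `BaseType` enum (a cut-down copy of a few members) and the `to_base_type` function are hypothetical stand-ins, not the module's code.

```python
from enum import Enum
from typing import Union


class BaseType(Enum):
    """Hypothetical stand-in mirroring a few of the BType members."""
    Byte = "Byte"
    Int32 = "Int32"
    Float64 = "Float64"


def to_base_type(value: Union[str, BaseType]) -> BaseType:
    """Accept a member or its string value; raise TypeError otherwise."""
    if isinstance(value, BaseType):
        return value
    if isinstance(value, str):
        try:
            # Enum lookup by value: BaseType("Float64") is BaseType.Float64.
            return BaseType(value)
        except ValueError:
            raise TypeError(f"invalid base type: {value!r}") from None
    raise TypeError(f"invalid base type: {value!r}")
```

Because each member's value equals its name, the string that appears in the DDS text maps directly to a member.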
class dds.Dataset(name='', decl=None, text=None)[source]

Bases: dds.Struct

Class for Dataset.

See Struct.

stype = 'Dataset'
class dds.Decl(name='')[source]

Bases: object

Class for declaration, that is, base class for Var and Struct. No need to use this class explicitly.

declaration := Var | Struct
text_formatted(indent=None, linebreak=True)[source]
class dds.Decls[source]

Bases: dict

Class for declarations.

declarations := list(declaration)

In this module, declarations are expressed as a dict, not a list. At this point, this class is just an alias for dict.

class dds.Grid(name='', array=None, maps=None, text=None)[source]

Bases: dds.Struct

Class for Grid.

Grid := Grid { ARRAY: declaration MAPS: declarations } (name | name arr)
name

name

Type

str

stype

stype

Type

SType

array

ARRAY declaration

Type

Decl

maps

MAPS declarations

Type

Decls

Examples

>>> text = '''
...     Grid {
...      ARRAY:
...         Float32 tas[time = 8412][lat = 160][lon = 320];
...      MAPS:
...         Float64 time[time = 8412];
...         Float64 lat[lat = 160];
...         Float64 lon[lon = 320];
...     } tas;'''
>>> Grid(text=text)
Grid('tas', array=Var('tas', 'Float32', arr=[Arr('time', 8412), Arr('lat', 160), Arr('lon', 320)]), maps={'time': Var('time', 'Float64', arr=[Arr('time', 8412)]), 'lat': Var('lat', 'Float64', arr=[Arr('lat', 160)]), 'lon': Var('lon', 'Float64', arr=[Arr('lon', 320)])})
Parameters
  • name (str) – name

  • stype (str or SType) – stype

  • array (Decl) – ARRAY declaration

  • maps (Decls) – MAPS declarations

  • text (str) – text to be parsed.

If text is not None, other attributes are overridden by the result of parse().

parse(text)[source]

Parse text to construct Grid.

text_formatted(indent=4, linebreak=True)[source]

Return formatted text.

stype = 'Grid'
property text

Text to construct this instance.

class dds.SType[source]

Bases: enum.Enum

Values for Struct.stype

Dataset = 'Dataset'
Grid = 'Grid'
Sequence = 'Sequence'
Structure = 'Structure'
class dds.Sequence(name='', decl=None, text=None)[source]

Bases: dds.Struct

Class for Sequence.

See Struct.

Examples

>>> text = '''
...     Sequence {
...       Float64 depth;
...       Float64 salinity;
...       Float64 oxygen;
...       Float64 temperature;
...     } cast;'''
>>> Sequence(text=text)
Sequence('cast', {'depth': Var('depth', 'Float64'), 'salinity': Var('salinity', 'Float64'), 'oxygen': Var('oxygen', 'Float64'), 'temperature': Var('temperature', 'Float64')})
stype = 'Sequence'
class dds.Struct(name='', decl=None, text=None)[source]

Bases: dds.Decl

Class for struct, that is, base class for Structure, Sequence, Grid and Dataset. Do not use this directly.

Struct := stype { declarations } (name | name arr)
stype := Dataset|Structure|Sequence|Grid

You can access the items of self.decl as if they were attributes of the instance, via dot notation.

Examples

>>> text = '''
... Sequence {
...   Float64 depth;
...     Float64 salinity;
...     Float64 oxygen;
...     Float64 temperature;
...   } cast;'''
>>> s = Sequence(text=text)
>>> s.salinity
Var('salinity', 'Float64')
>>> text = '''
... Dataset {
...   Int32 catalog_number;
...   Sequence {
...     String experimenter;
...     Int32 time;
...     Structure {
...       Float64 latitude;
...       Float64 longitude;
...     } location;
...   } station;
... } data;'''
>>> d = parse_dataset(text)
>>> d.station.location.latitude
Var('latitude', 'Float64')
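
Dot-notation access of this kind is commonly built on `__getattr__`, which Python calls only when normal attribute lookup fails. The sketch below shows that general technique. `DotNode` is a hypothetical class, and whether Struct is implemented this way is an assumption.

```python
class DotNode:
    """Hypothetical container exposing the entries of a dict as attributes."""

    def __init__(self, name, decl=None):
        self.name = name
        self.decl = dict(decl or {})

    def __getattr__(self, key):
        # Reached only if `key` is not a regular attribute of the instance.
        try:
            return self.__dict__["decl"][key]
        except KeyError:
            raise AttributeError(key) from None


# Nested containers allow chained access, as in d.station.location.latitude.
location = DotNode("location", {"latitude": "Float64", "longitude": "Float64"})
station = DotNode("station", {"location": location})
d = DotNode("data", {"station": station})
```

In this sketch, real attributes such as `name` take precedence over entries of the same name in `decl`, which remain reachable through `d.decl[...]`.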
name

name

Type

str

stype

stype

Type

SType

decl

declarations

Type

Decls

Parameters
  • name (str) – name

  • decl (str or Decls) – declarations

  • text (str) – text to be parsed.

If text is not None, other attributes are overridden by the result of parse() or left untouched.

parse(text)[source]

Parse text to construct Struct.

If the given text is not valid for the subclass, the instance is left as a ‘null’ instance.

text_formatted(indent=4, linebreak=True)[source]

Return formatted text.
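
How nested declarations might be laid out for a given indent width is sketched below. This is an illustration under assumptions: `format_block` is a hypothetical function, plain dicts and tuples stand in for Decls and Struct, and the exact whitespace produced by the module's text_formatted() may differ.

```python
from typing import Dict, List, Tuple, Union

# A child is either a base type name (leaf) or a (stype, declarations) pair.
Child = Union[str, Tuple[str, dict]]


def format_block(stype: str, name: str, decl: Dict[str, Child],
                 indent: int = 4, level: int = 0) -> List[str]:
    """Return the lines of 'stype { ... } name;' indented by nesting level."""
    pad = " " * (indent * level)
    lines = [f"{pad}{stype} {{"]
    for key, child in decl.items():
        if isinstance(child, tuple):
            # Nested struct: recurse one level deeper.
            lines.extend(format_block(child[0], key, child[1], indent, level + 1))
        else:
            lines.append(f"{pad}{' ' * indent}{child} {key};")
    lines.append(f"{pad}}} {name};")
    return lines


formatted = format_block("Dataset", "data", {
    "catalog_number": "Int32",
    "location": ("Structure", {"lat": "Float64"}),
})
```

Joining the resulting lines with newline characters gives a multi-line text form. Joining them with spaces instead gives a single-line form, roughly what a linebreak switch would control.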

stype = None
property text

Text to construct this instance.

class dds.Structure(name='', decl=None, text=None)[source]

Bases: dds.Struct

Class for Structure.

See Struct.

stype = 'Structure'
class dds.Var(name='', btype=None, arr=None, text=None)[source]

Bases: dds.Decl

Class for Var.

Var := basetype (name | name arr)
name

name

Type

str

btype

basetype

Type

BType

arr

array-decl

Type

list(Arr)

Parameters
  • name (str) – name

  • btype (str or BType) – basetype

  • arr (Arr or list(Arr)) – array-decl

  • text (str) – text to be parsed

Raises

TypeError – if btype or arr is invalid

If text is not None, other attributes are overridden by the result of parse().

parse(text)[source]

Parse text to construct Var.

text_formatted(indent=None, linebreak=None)[source]

Formatted text expression of this instance.

indent and linebreak are dummy arguments here.

property text

Text to construct this instance.

dds.check_braces_matching(text)[source]

Check whether the braces ({ and }) in the given text match.

Raises ValueError if they do not match.

Examples

>>> text = 'Dataset{varline} hoge'
>>> check_braces_matching(text)  # True
>>> text = 'Struct{ Sequence{Var} fuga }} hoge'
>>> check_braces_matching(text)
Traceback (most recent call last):
    ...
ValueError: braces do not match: too many right braces: 1 more.
>>> text = 'Struct{ Sequence{{Var} fuga } hoge'
>>> check_braces_matching(text)
Traceback (most recent call last):
    ...
ValueError: braces do not match: too many left braces: 1 more.
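
A brace check of this kind can be done in one pass with a depth counter. The sketch below shows that general technique. `check_balanced` is a hypothetical function, its error messages are its own, and it is not claimed to reproduce how check_braces_matching() works internally.

```python
def check_balanced(text: str) -> None:
    """Raise ValueError if '{' and '}' in text are not balanced."""
    depth = 0
    for pos, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                # A right brace appeared with no open left brace to close.
                raise ValueError(f"unexpected right brace at position {pos}")
    if depth > 0:
        raise ValueError(f"{depth} left brace(s) never closed")
```

Unlike a plain comparison of brace counts, the depth counter also rejects text such as `'} {'`, where the counts agree but the order is wrong.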
dds.parse_arrdecls(text)[source]

Parse text containing multiple Arr definitions and return a list of them.

dds.parse_dataset(text)[source]

Parse the top-level dataset.

dataset := Dataset { declarations } name;

dds.parse_declarations(text)[source]

Return Decls, dict of {name: Decl} parsed from text.

dds.pop_struct(text)[source]

Pop one Struct-derived instance parsed from the first part of text, return it and the rest of text.
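
The "pop" step relies on finding where the first struct ends, which means locating the semicolon that follows its balanced closing brace. A minimal sketch of that splitting step follows. `split_first_block` is a hypothetical helper that returns raw text pieces rather than a parsed instance.

```python
from typing import Tuple


def split_first_block(text: str) -> Tuple[str, str]:
    """Split text just after the ';' ending the first balanced {...} block."""
    depth = 0
    seen_open = False
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
            seen_open = True
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("right brace without matching left brace")
        elif ch == ";" and seen_open and depth == 0:
            # Semicolons inside the block sit at depth > 0 and are skipped.
            return text[: i + 1].strip(), text[i + 1:].strip()
    raise ValueError("no complete block found")
```

For example, applied to `'Structure { Float64 lat; } location; Float64 depth[500];'` it separates the Structure text from the trailing variable line, leaving the latter for the next parsing step.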

dds.pop_varline(text)[source]

Pop one Var instance parsed from the first part of text, return it and the rest of the text.
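
A variable line has the shape `btype name[arr]...;`, so the first one in a text can be taken off with a regular expression, and its arr parts collected with `findall`. The sketch below assumes that approach. `VARLINE`, `ARR` and `split_varline` are hypothetical names, and tuples stand in for Var and Arr instances.

```python
import re
from typing import List, Tuple

# btype, then name, then zero or more "[...]" groups, ended by ";".
VARLINE = re.compile(r"\s*(\w+)\s+([\w.-]+)\s*((?:\[[^\]]*\]\s*)*);")
ARR = re.compile(r"\[\s*(?:(\w+)\s*=\s*)?(\d+)\s*\]")

VarTuple = Tuple[str, str, List[Tuple[str, int]]]


def split_varline(text: str) -> Tuple[VarTuple, str]:
    """Return ((btype, name, arrs), rest) for the first variable line."""
    m = VARLINE.match(text)
    if m is None:
        raise ValueError(f"text does not start with a variable line: {text!r}")
    btype, name, arrs_text = m.groups()
    # findall yields ('', '500') for an anonymous dimension like "[500]".
    arrs = [(n, int(v)) for n, v in ARR.findall(arrs_text)]
    return (btype, name, arrs), text[m.end():]
```

Calling the function repeatedly on the returned rest consumes a run of variable lines one at a time.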