intake-erddap Python API

intake-erddap catalog

class intake_erddap.erddap_cat.ERDDAPCatalog(*args, **kwargs)[source]

Makes data sources out of all datasets on the given ERDDAP service.

Parameters:
  • server (str) –

    URL to the ERDDAP service. Example: "https://coastwatch.pfeg.noaa.gov/erddap"

    Note

    Do not include a trailing slash.

  • bbox (tuple of 4 floats, optional) – For explicit geographic search queries, pass a tuple of four floats in the bbox argument. The bounding box parameters are (min_lon, min_lat, max_lon, max_lat).

  • standard_names (list of str, optional) – For explicit search queries for datasets containing a given standard_name, use this argument. Example: ["air_temperature", "air_pressure"].

  • variable_names (list of str, optional) – For explicit search queries for datasets containing a variable with a given name. This can be useful when the client knows a particular variable name, or when a naming convention is used for which there is no CF standard name.

  • start_time (datetime, optional) – For explicit search queries for datasets that contain data after start_time.

  • end_time (datetime, optional) – For explicit search queries for datasets that contain data before end_time.

  • search_for (list of str, optional) – For explicit search queries for datasets that contain any of the terms specified in this keyword argument.

  • kwargs_search (dict, optional) –

    Keyword arguments to input to search on the server before making the catalog. Options are:

    • to search by bounding box: include all of min_lon, max_lon, min_lat, max_lat (int or float). Longitudes must be between -180 and +180.

    • to search within a datetime range: include both min_time and max_time as interpretable datetime strings, e.g., "2021-1-1".

    • to search using a textual keyword: include search_for as either a string or a list of strings. Multiple values will be searched individually and combined in the final catalog results.

  • category_search (list or tuple, optional) – Use this to narrow the search by ERDDAP category. The syntax is (category, key), e.g. ("standard_name", "temp"). category is the ERDDAP category for filtering results; good choices for selecting variables are "standard_name" and "variableName". key is the custom_criteria key to narrow the search by; it will be matched to the category results using the custom_criteria that must be set up or input by the user with cf-pandas. Currently only a single key can be matched at a time.

  • use_source_constraints (bool, default True) – Any relevant search parameter defined in kwargs_search will be passed to the source objects as constraints.

  • protocol (str, default "tabledap") – One of the two supported ERDDAP Data Access Protocols: "griddap" or "tabledap". "tabledap" will present tabular datasets using pandas, while "griddap" will use xarray.

  • metadata (dict, optional) – Extra metadata for the intake catalog.

  • query_type (str, default "union") – Specifies how the catalog should apply the query parameters. Choices are "union" or "intersection". If the query_type is set to "intersection", then the set of results will be the intersection of each individual query made to ERDDAP. This is equivalent to a logical AND of the results. If the value is "union" then the results will be the union of each resulting dataset. This is equivalent to a logical OR.

  • mask_failed_qartod (bool, default False) – WARNING ALPHA FEATURE. If True and *_qc_agg columns associated with data columns are available, data values associated with QARTOD flags other than 1 and 2 will be set to NaN. Has not been thoroughly tested.

  • dropna (bool, default False) – WARNING ALPHA FEATURE. If True, rows with NaN values in data columns will be dropped from the DataFrame. Has not been thoroughly tested.

  • cache_kwargs (dict, optional) – WARNING ALPHA FEATURE. To store the data you access in a local cache, use this keyword to pass a dictionary of cache options. The cache is set up using fsspec’s simple cache. An example configuration is cache_kwargs=dict(cache_storage="/tmp/fnames/", same_names=True).

search_url

If a search is performed on the ERDDAP server, the search url is saved as an attribute.

Type:

str

server

The base URL of the ERDDAP instance.

Type:

str
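
Examples

Catalogs are normally created by searching an ERDDAP server. A minimal sketch (the server URL and the search values below are illustrative choices, not defaults):

>>> from datetime import datetime
>>> from intake_erddap.erddap_cat import ERDDAPCatalog
>>> cat = ERDDAPCatalog(
...     server="https://erddap.sensors.axds.co/erddap",
...     standard_names=["air_temperature"],
...     bbox=(-124.0, 40.0, -120.0, 46.0),
...     start_time=datetime(2022, 1, 1),
... )
>>> list(cat)  # dataset IDs matching the search

The same kind of search can also be expressed through kwargs_search:

>>> cat = ERDDAPCatalog(
...     server="https://erddap.sensors.axds.co/erddap",
...     kwargs_search={
...         "min_lon": -124.0, "max_lon": -120.0,
...         "min_lat": 40.0, "max_lat": 46.0,
...         "min_time": "2022-1-1", "max_time": "2022-6-1",
...         "search_for": ["temperature"],
...     },
... )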

Methods

get_client()

Return an initialized ERDDAP Client.

get_search_urls()

Return the search URLs used in generating the catalog.

get_client() → ERDDAP[source]

Return an initialized ERDDAP Client.

get_search_urls() → List[str][source]

Return the search URLs used in generating the catalog.
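
For example, after constructing a catalog as in the sketch above, the search URLs that were issued can be inspected:

>>> cat.get_search_urls()  # URLs of the ERDDAP searches used to build this catalog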

intake-erddap source

class intake_erddap.erddap.ERDDAPSource(*args, **kwargs)[source]

ERDDAP Source (Base Class). This class represents the abstract base class for an intake data source object for ERDDAP. Clients should use either TableDAPSource or GridDAPSource.

Parameters:
  • dataset_id (str) – The unique datasetID value returned from ERDDAP.

  • protocol (str) – Either ‘griddap’ or ‘tabledap’.

  • variables (list of str) – A list of variables to retrieve from the dataset.

  • constraints (dict) – The query constraints to apply to TableDAP requests.

  • metadata (dict) – Additional metadata to include with the source.

  • erddap_client (class, optional) – The client object to use for connections to ERDDAP. Must conform to the erddapy.ERDDAP interface.

  • http_client (class, optional) – The client object to use for HTTP requests. Must conform to the requests interface.

  • open_kwargs (dict, optional) – Keyword arguments to pass on to the open function (e.g. e.to_pandas for a DataFrame). For example, {"parse_dates": True}.

Note

Caches entire dataframe in memory.

Methods

get_client()

Return an initialized ERDDAP Client.

get_client() → ERDDAP[source]

Return an initialized ERDDAP Client.

class intake_erddap.erddap.TableDAPSource(*args, **kwargs)[source]

Creates a Data Source for an ERDDAP TableDAP Dataset.

Parameters:
  • server (str) –

    URL to the ERDDAP service. Example: "https://coastwatch.pfeg.noaa.gov/erddap"

    Note

    Do not include a trailing slash.

  • dataset_id (str) – The dataset identifier from ERDDAP.

  • variables (list of str, optional) – A list of variables to retrieve from the dataset.

  • constraints (dict, optional) – A mapping of conditions and constraints. Example: {"time>=": "2022-01-02T12:00:00Z", "lon>": -140, "lon<": 0}

  • metadata (dict, optional) – Additional metadata to include with the source passed from the catalog.

  • erddap_client (type, optional) – A class that implements an interface like erddapy’s ERDDAP class. The source will rely on this client to interface with ERDDAP for most requests.

  • http_client (module or object, optional) – An object or module that implements an HTTP client similar to the requests interface. The source will use this object to make HTTP requests to ERDDAP in some cases.

  • mask_failed_qartod (bool, default False) – WARNING ALPHA FEATURE. If True and *_qc_agg columns associated with data columns are available, data values associated with QARTOD flags other than 1 and 2 will be set to NaN. Has not been thoroughly tested.

  • dropna (bool, default False) – WARNING ALPHA FEATURE. If True, rows with NaN values in data columns will be dropped from the DataFrame. Has not been thoroughly tested.

  • cache_kwargs (dict, optional) – WARNING ALPHA FEATURE. To store the data you access in a local cache, use this keyword to pass a dictionary of cache options. The cache is set up using fsspec’s simple cache. An example configuration is cache_kwargs=dict(cache_storage="/tmp/fnames/", same_names=True).

Examples

Sources are normally returned from a catalog object, but a source can be instantiated directly:

>>> source = TableDAPSource("https://erddap.sensors.axds.co/erddap",
... "gov_usgs_waterdata_441759103261203")

Getting a pandas DataFrame from the source:

>>> df = source.read()

Once the dataset object has been instantiated, the dataset’s full metadata is available in the source.

>>> source.metadata
{'info_url': 'https://erddap.sensors.axds.co/erddap/info/gov_usgs_waterdata_404513098181201...',
'catalog_dir': '',
'variables': {'time': {'_CoordinateAxisType': 'Time',
'actual_range': [1430828100.0, 1668079800.0],
'axis': 'T',
'ioos_category': 'Time',
'long_name': 'Time',
'standard_name': 'time',
'time_origin': '01-JAN-1970 00:00:00',
'units': 'seconds since 1970-01-01T00:00:00Z'},
    ...
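
Variables and constraints can also be supplied to limit what is requested from the server. A sketch (the variable name and constraint value here are illustrative and may not be present in the dataset above):

>>> source = TableDAPSource(
...     "https://erddap.sensors.axds.co/erddap",
...     "gov_usgs_waterdata_441759103261203",
...     variables=["time", "sea_water_temperature"],
...     constraints={"time>=": "2022-01-02T12:00:00Z"},
... )
>>> df = source.read()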

Methods

read()

Return the dataframe from ERDDAP

read_chunked()

Return iterator over container fragments of data source

read_partition(i)

Return a part of the data corresponding to i-th partition.

read() → DataFrame[source]

Return the dataframe from ERDDAP

read_chunked()

Return iterator over container fragments of data source

read_partition(i)

Return a part of the data corresponding to i-th partition.

By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.

class intake_erddap.erddap.GridDAPSource(*args, **kwargs)[source]

Creates a Data Source for an ERDDAP GridDAP Dataset.

Parameters:
  • server (str) –

    URL to the ERDDAP service. Example: "https://coastwatch.pfeg.noaa.gov/erddap"

    Note

    Do not include a trailing slash.

  • dataset_id (str) – The dataset identifier from ERDDAP.

  • constraints (dict, optional) – A mapping of conditions and constraints.

  • chunks (None or int or dict or str, optional) – If chunks is provided, it is used to load the new dataset into dask arrays. chunks=-1 loads the dataset with dask using a single chunk for all arrays. chunks={} loads the dataset with dask using engine preferred chunks if exposed by the backend, otherwise with a single chunk for all arrays. chunks='auto' will use dask auto chunking, taking into account the engine preferred chunks. See dask chunking for more details.

  • xarray_kwargs (dict, optional) – Arguments to be passed to the xarray open_dataset function.

Examples

Sources are normally returned from a catalog object, but a source can be instantiated directly:

>>> source = GridDAPSource("https://coastwatch.pfeg.noaa.gov/erddap", "charmForecast1day",
... chunks={"time": 1})

Getting an xarray dataset from the source object:

>>> ds = source.to_dask()

Once the dataset object has been instantiated, the dataset’s full metadata is available in the source.

>>> source.metadata
{'catalog_dir': '',
'dims': {'time': 1182, 'latitude': 391, 'longitude': 351},
'data_vars': {'pseudo_nitzschia': ['time', 'latitude', 'longitude'],
'particulate_domoic': ['time', 'latitude', 'longitude'],
'cellular_domoic': ['time', 'latitude', 'longitude'],
'chla_filled': ['time', 'latitude', 'longitude'],
'r555_filled': ['time', 'latitude', 'longitude'],
'r488_filled': ['time', 'latitude', 'longitude']},
'coords': ('time', 'latitude', 'longitude'),
'acknowledgement':
    ...

Warning

The read() method will raise a NotImplementedError because the standard intake interface reads the result entirely into memory. For gridded datasets this should not be allowed: reading the entire dataset into memory can overwhelm the server, get the client blacklisted, and potentially crash the client by exhausting available system memory. If a client truly wants to load the entire dataset into memory, the client can invoke the method ds.load() on the Dataset object.
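
A common pattern is therefore to subset the lazy dataset and load only the selection into memory. A sketch using one of the variables shown in the metadata above (the time value is illustrative):

>>> ds = source.to_dask()
>>> subset = ds["chla_filled"].sel(time="2022-06-01", method="nearest")
>>> subset.load()  # loads only the selected slice into memory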

Methods

close()

Close open descriptors.

read_chunked()

Return an xarray dataset (optionally chunked).

read_partition(i)

Fetch one chunk of the array for a variable.

to_dask()

Return an xarray dataset (optionally chunked).

close()[source]

Close open descriptors.

read_chunked() → Dataset[source]

Return an xarray dataset (optionally chunked).

read_partition(i: Tuple[str, ...]) → ArrayLike[source]

Fetch one chunk of the array for a variable.

to_dask() → Dataset[source]

Return an xarray dataset (optionally chunked).