User Guide
Querying
A catalog can be generated by passing your desired query parameters directly
with the kwargs_search keyword argument. This dictionary is passed to
erddapy:
import intake_erddap

# server_url is the URL of the ERDDAP server to query; define it before running.
search = {
    "min_lon": -180,
    "max_lon": -156,
    "min_lat": 50,
    "max_lat": 66,
    "min_time": "2021-04-01",
    "max_time": "2021-04-02",
}
cat = intake_erddap.ERDDAPCatalogReader(server_url, kwargs_search=search)
The same query can also be specified using the constructor keyword arguments:
from datetime import datetime

cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    bbox=(-180., 50., -156., 66.),
    start_time=datetime(2021, 4, 1),
    end_time=datetime(2021, 4, 2),
)
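As a rough illustration of how the two forms correspond, the sketch below converts constructor-style arguments into a kwargs_search-style dictionary. The helper bbox_to_search is hypothetical and not part of intake_erddap. It assumes the bbox ordering (min_lon, min_lat, max_lon, max_lat) implied by the two examples above, and it writes times as YYYY-MM-DD strings, which may differ from what the library does internally.

```python
from datetime import datetime


def bbox_to_search(bbox, start_time, end_time):
    """Hypothetical helper: build a kwargs_search-style dict.

    Assumes bbox is ordered (min_lon, min_lat, max_lon, max_lat).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return {
        "min_lon": min_lon,
        "max_lon": max_lon,
        "min_lat": min_lat,
        "max_lat": max_lat,
        # Times are formatted as date strings, as in the kwargs_search example.
        "min_time": start_time.strftime("%Y-%m-%d"),
        "max_time": end_time.strftime("%Y-%m-%d"),
    }


search = bbox_to_search(
    (-180.0, 50.0, -156.0, 66.0),
    datetime(2021, 4, 1),
    datetime(2021, 4, 2),
)
print(search)
```

Writing the mapping out this way makes it easy to check that both calls describe the same region and time window.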
The catalog supports querying for datasets that contain a variable with a
particular CF Standard Name. Clients can specify the standard name queries
with either the kwargs_search keyword argument, or the standard_names
keyword argument:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    kwargs_search={
        "standard_name": "air_temperature",
    },
)
or:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    standard_names=["air_temperature"],
)
Multiple standard names can be queried, which returns all datasets containing at least one of the queried standard names:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    standard_names=["air_temperature", "air_pressure"],
)
In cases where standard names are not sufficient, clients can query using the variable name as it appears in ERDDAP:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    variable_names=["Pair", "temp"],
)
Lastly, ERDDAP offers a plaintext search option. Clients can query for datasets containing a plaintext search term:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    search_for=["ioos", "aoos", "NOAA"],
)
This can also be useful if you know the name of the station or stations you want to make a catalog from:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    search_for=["aoos_204"],
)
Querying with AND
Sometimes, clients may want to find only datasets that match all of the query
terms exactly. This can be achieved with the query_type keyword argument:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    standard_names=["air_temperature", "air_pressure"],
    query_type="intersection",
)
This will return only datasets that have both air_temperature and
air_pressure as standard names associated with variables.
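The difference between the two query modes can be sketched with plain Python sets. The dataset IDs and the match function below are hypothetical, and the name "union" for the match-any mode is an assumption; the sketch only illustrates the semantics described above and does not contact an ERDDAP server.

```python
# Hypothetical dataset IDs mapped to the standard names of their variables.
datasets = {
    "station_a": {"air_temperature", "air_pressure"},
    "station_b": {"air_temperature"},
    "station_c": {"sea_water_temperature"},
}


def match(datasets, standard_names, query_type="union"):
    """Return the sorted IDs of datasets matching the standard names.

    "union" (assumed name) keeps datasets with at least one of the names;
    "intersection" keeps only datasets that have all of them.
    """
    wanted = set(standard_names)
    if query_type == "intersection":
        return sorted(k for k, names in datasets.items() if wanted <= names)
    return sorted(k for k, names in datasets.items() if wanted & names)


names = ["air_temperature", "air_pressure"]
print(match(datasets, names))
print(match(datasets, names, query_type="intersection"))
```

With these inputs, the match-any query keeps station_a and station_b, while the intersection query keeps only station_a.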
Constraints
Set use_source_constraints=True to apply any relevant kwargs_search parameters as constraints when the data itself is requested. For example, a start_time is passed on so that the returned data is limited in time to start at start_time:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    bbox=(-180., 50., -156., 66.),
    start_time=datetime(2021, 4, 1),
    end_time=datetime(2021, 4, 2),
    use_source_constraints=True,
)
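To make the idea concrete, the sketch below turns the time bounds of a kwargs_search-style dictionary into ERDDAP-style constraint pairs, using the "time>=" and "time<=" key style that erddapy accepts for constraints. The function search_to_constraints is hypothetical, and the exact constraints that intake_erddap generates may differ.

```python
def search_to_constraints(search):
    """Hypothetical helper: map time bounds to ERDDAP-style constraints."""
    mapping = {
        "min_time": "time>=",  # keep data at or after the start time
        "max_time": "time<=",  # keep data at or before the end time
    }
    return {mapping[k]: v for k, v in search.items() if k in mapping}


constraints = search_to_constraints(
    {
        "min_lon": -180,
        "max_lon": -156,
        "min_time": "2021-04-01",
        "max_time": "2021-04-02",
    }
)
print(constraints)
```

Only the time bounds are translated here; whether other search parameters are forwarded as constraints is not covered by this sketch.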
Dropping bad values
Use the dropna option to drop rows with NaN values in the data columns:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    dropna=True,
)
Note that this is an alpha feature: it uses its own logic to distinguish data columns from coordinates and axes when deciding which columns to check for NaN values. It has not been thoroughly tested.
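The kind of logic involved can be sketched in plain Python. The example below is not the library's implementation: it represents rows as dictionaries rather than a DataFrame, assumes that time, latitude, longitude, and z are the coordinate columns, and assumes a row is dropped when any of its data columns is NaN.

```python
import math

# Assumed coordinate columns; everything else is treated as a data column.
COORDINATE_COLUMNS = {"time", "latitude", "longitude", "z"}


def drop_nan_rows(rows):
    """Drop rows whose data columns contain NaN, ignoring coordinates."""
    kept = []
    for row in rows:
        data_values = [v for k, v in row.items() if k not in COORDINATE_COLUMNS]
        if any(isinstance(v, float) and math.isnan(v) for v in data_values):
            continue
        kept.append(row)
    return kept


rows = [
    {"time": "2021-04-01T00:00:00Z", "z": 0.0, "air_temperature": 3.2},
    {"time": "2021-04-01T01:00:00Z", "z": 0.0, "air_temperature": math.nan},
    {"time": "2021-04-01T02:00:00Z", "z": math.nan, "air_temperature": 4.1},
]
clean = drop_nan_rows(rows)
print([row["time"] for row in clean])
```

The second row is dropped because its data value is NaN, while the third row is kept because the NaN is in a coordinate column.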
Selecting which columns of data to return
Use the variables option to select which columns of data to return. This is useful when you only need a subset of the data columns:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    variables=["sea_water_temperature"],
)
The variables time, latitude, longitude, and z are always returned in addition to the requested ones.
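A minimal sketch of the resulting column list follows. The function columns_to_return is hypothetical, and the column order shown is an assumption rather than a guarantee of the library.

```python
# Columns that are returned regardless of the variables option.
ALWAYS_RETURNED = ["time", "latitude", "longitude", "z"]


def columns_to_return(variables):
    """Hypothetical helper: combine fixed columns with requested ones."""
    extra = [v for v in variables if v not in ALWAYS_RETURNED]
    return ALWAYS_RETURNED + extra


columns = columns_to_return(["sea_water_temperature"])
print(columns)
```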
Mask due to quality flags
If mask_failed_qartod=True and *_qc_agg columns associated with the data columns are available, data values associated with QARTOD flags other than 1 and 2 will be replaced with NaN. This feature has not been thoroughly tested.
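The masking rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: rows are dictionaries rather than a DataFrame, the function mask_by_qartod is hypothetical, and the aggregate flag column is assumed to be named after the data column with a _qc_agg suffix.

```python
import math

# Flags 1 and 2 are the ones that are kept; any other flag is masked.
GOOD_FLAGS = {1, 2}


def mask_by_qartod(rows, data_column):
    """Return copies of rows with flagged values replaced by NaN."""
    qc_column = f"{data_column}_qc_agg"
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if qc_column in row and row[qc_column] not in GOOD_FLAGS:
            new_row[data_column] = math.nan
        masked.append(new_row)
    return masked


rows = [
    {"air_temperature": 3.2, "air_temperature_qc_agg": 1},
    {"air_temperature": 99.9, "air_temperature_qc_agg": 4},
]
masked = mask_by_qartod(rows, "air_temperature")
print(masked)
```

The first value is kept because its flag is 1, and the second is replaced with NaN because its flag is 4.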
Simple caching
You can use simple caching through fsspec by passing cache_kwargs such as the following:
cat = intake_erddap.ERDDAPCatalogReader(
    server=server_url,
    cache_kwargs=dict(cache_storage="/tmp/fnames/", same_names=True),
)
This caches the data locally in the /tmp/fnames/ directory so that it does not have to be downloaded again next time. The same_names option is useful if you want the cached files to keep the same names as the data source for clarity.