Easy selection of user defined catalogs #245
Comments
@marc-white @rbeucher what are your opinions on this?
Sure, we could add a placeholder for user-built catalogs. What about adding a temporary source location to the main catalog at runtime? I’m thinking of our discussion with the Ocean team today: it sounds like their new experiments will have intake-ESM catalogs. It would be good to integrate these dynamically into the main catalog. What do you think?
The main issue I see is that we have to 'pre-can' all of the information about the user's potential catalog - what guarantee do we have that this catalog information will match whatever the user comes up with? Secondly, how would a user build their catalog? Would we need to provide updates to the existing Thirdly, is this actually necessary to do within the
I'm not sure this is technically feasible - the catalog content is read from the
I was envisaging a situation where the user would populate
I think we would just leave it up to the user to build their catalog however they see fit - e.g. modifying
Yeah, this is a really good point. Perhaps it would be better to direct users to load custom catalogs with
I think it might actually be plausible to do this - I think we would just have to update the intake dataframe catalog driver to support multi-file catalogs. This would look something like

    sources:
      access_nri:
        args:
          columns_with_iterables:
          - model
          - realm
          - frequency
          - variable
          mode: r
          name_column: name
          path:
          - /g/data/xp65/public/apps/access-nri-intake-catalog/{{version}}/metacatalog.csv
          - $MY_EPHEMERAL_CATALOG.csv
          yaml_column: yaml
        description: ACCESS-NRI intake catalog
        driver: intake_dataframe_catalog.core.DfFileCatalog
        metadata:
          storage: gdata/fs38+gdata/oi10+gdata/tm70
          version: '{{version}}'
        parameters:
          version:
            default: v0.1.3
            description: Catalog version
            type: str

Probably it would be quite a bit more involved than that to actually implement, but I think it should be doable.
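As a rough illustration of the multi-file idea, the sketch below reads several CSV metacatalogs and merges their rows, skipping any file that does not exist (e.g. when a user has no ephemeral catalog). The helper `read_metacatalogs` and the `origin` column are hypothetical names for this example only; this is not how `intake_dataframe_catalog` currently works, just one way the merge step might look.

```python
import csv
from pathlib import Path


def read_metacatalogs(paths):
    """Read several CSV metacatalogs and return their rows as one list.

    Hypothetical helper: it only illustrates the multi-file idea and is
    not part of intake_dataframe_catalog.
    """
    rows = []
    for path in paths:
        path = Path(path)
        if not path.is_file():
            # A missing user catalog is skipped rather than treated as an error.
            continue
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Record which file each entry came from, so the shared
                # and user-built entries stay distinguishable after merging.
                row["origin"] = str(path)
                rows.append(row)
    return rows
```

Keeping the source file alongside each row would be one way to avoid confusing the released catalog entries with user-built ones once they sit in the same table.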
Yup, this is what I had in mind.
Yeah, absolutely.
I think that approach would minimize confusion between the 'real' catalog and the user's own Frankenstein's monster version, especially once users start sharing with each other (see below).
Yeah, that's an excellent point.
Is your feature request related to a problem? Please describe.
This builds on the solution to #191 in #243.
With the changes introduced by #243, users are able to build & query their own catalogs by placing a catalog file in $HOME/.access_nri_intake_catalog/catalog.yaml, and this will be preferentially loaded over the default catalog at /g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml.
This default catalog looks something like (NB. using the old version numbering)
and a user defined catalog will look something like
where $DIR is a directory the user has placed their metacatalog in.
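The "preferentially loaded" behaviour can be pictured with a minimal sketch like the one below. The function name `resolve_catalog_path` is hypothetical, and the actual lookup implemented in #243 may well differ; the two paths are the ones quoted in this issue.

```python
from pathlib import Path

# Paths as quoted in this issue.
USER_CATALOG = Path.home() / ".access_nri_intake_catalog" / "catalog.yaml"
DEFAULT_CATALOG = Path(
    "/g/data/xp65/public/apps/access-nri-intake-catalog/catalog.yaml"
)


def resolve_catalog_path(user_path=USER_CATALOG, default_path=DEFAULT_CATALOG):
    """Return the user's catalog file if it exists, else the default one.

    Hypothetical helper, for illustration only.
    """
    user_path = Path(user_path)
    return user_path if user_path.is_file() else Path(default_path)
```

With a rule like this, only one of the two catalogs is ever visible at a time, which is the limitation this issue is about.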
These changes represent a big step forward in terms of the user's ability to use bespoke catalogs. However, the architecture of intake is such that presently, if a user wished to compare catalogs/data obtained from catalogs, it would be necessary to:
$ mv ~/.access_nri_intake_catalog/catalog.yaml ~/._access_nri_intake_catalog/catalog.yaml
This might create issues for users who wish to compare their custom catalog with the default catalog, and it can be made easier.
Describe the feature you'd like
Intake allows a single catalog to describe multiple sources: i.e., the two catalogs above could be combined as
This would then allow the user to perform the following operations:
Doing so requires an additional entry point for the user_def catalog, & so we would additionally require the following changes in pyproject.toml:
and in src/access_nri_intake/data/__init__.py:
Entry points are created at package build time & fixed, so realistically we would probably have to limit users to a single user defined catalog, unless we figure out a way to do some black magic to circumvent that limitation.
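To see why this is a limitation, it can help to look at what is registered in a given environment. The sketch below lists entry point names using the standard library's `importlib.metadata`. The group name `intake.catalogs` is an assumption here (I believe it is the group intake reads, but check it against `pyproject.toml`), and `registered_catalog_names` is a hypothetical helper.

```python
from importlib.metadata import entry_points


def registered_catalog_names(group="intake.catalogs"):
    """Return the sorted names of entry points registered under `group`.

    The default group name is an assumption, not confirmed by this issue.
    """
    try:
        # Python 3.10+ supports selecting a group by keyword.
        eps = entry_points(group=group)
    except TypeError:
        # Older Pythons return a dict-like mapping of group -> entry points.
        eps = entry_points().get(group, [])
    return sorted(ep.name for ep in eps)
```

Because this list comes from installed package metadata, a user cannot add a new name to it at runtime without installing something, which is why a single fixed `user_def` slot seems like the realistic option.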
Describe alternatives you've considered
Leave as is - this might be an unnecessary addition.
Additional context
See #244 for sample implementation.