add a dtype to support fsspec MSs #209

Closed
o-smirnov opened this issue Jan 31, 2024 · 7 comments

@o-smirnov
Member

Since dask-ms apps can use an S3 backend for their MSs, the current MS dtype is not quite adequate. Introduce a new type that is fsspec-aware. @JSKenyon @sjperkins do you have an example of how to query fsspec?

@o-smirnov o-smirnov self-assigned this Jan 31, 2024
@JSKenyon
Collaborator

JSKenyon commented Jan 31, 2024

I think it is used as follows in dask-ms: https://github.com/ratt-ru/dask-ms/blob/a0043fba3eae3eabdbdd6e2fb1f22abf7d762dbb/daskms/fsspec_store.py#L17

Edit: Your use may actually be simpler as you probably don't need to know whether it is zarr, parquet or casa table backed.

@o-smirnov
Member Author

Just as a note to self before I forget: the reason this matters (as opposed to just making the MS name input a plain string) is that the singularity backend needs to know which directories will be accessed, so that they can be bound inside the container. For MSs nested under the CWD, this doesn't matter since the CWD is always bound. The problem arises when the MS is somewhere else in the directory hierarchy.
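To make the binding concern concrete, here is a minimal sketch of the check, not the actual backend code. The helper name `bind_dir_for` is hypothetical, and binding the MS's parent directory (rather than the MS directory itself) is an assumption made for illustration.

```python
import posixpath  # containers are Linux, so use POSIX path rules explicitly


def bind_dir_for(ms_path, cwd):
    """Return the directory to bind for ms_path, or None if the CWD covers it.

    Hypothetical helper: cwd must be an absolute path. Symlinks are not
    resolved here; a real backend would probably need to resolve them.
    """
    cwd = posixpath.normpath(cwd)
    # join() keeps ms_path as-is when it is already absolute
    full = posixpath.normpath(posixpath.join(cwd, ms_path))
    if posixpath.commonpath([cwd, full]) == cwd:
        return None  # nested under the CWD, which is always bound
    # Assumption: bind the parent directory that contains the MS
    return posixpath.dirname(full)
```

With `cwd="/work"`, a relative `obs/wsrt.ms` needs no extra bind, while `/data/obs/wsrt.ms` would need `/data/obs`.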

@o-smirnov
Member Author

fsspec looks overly complicated for what I need, so I'd rather not add the extra dependency. All I need to know is whether a given string is a dask-ms URL or a path to a local file.

A simple regex will do. I just need to know what the possibilities to match are. Hence, question for @JSKenyon @sjperkins, is it true that all dask-ms URLs look like foo::bar://baz or bar://baz?

@sjperkins
Collaborator

This is probably a reasonable subset:

  • /path/to/wsrt.ms
  • file://path/to/wsrt.ms
  • s3://host.address/path/to/wsrt.zarr
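The forms listed above, together with the `foo::bar://baz` shape asked about earlier, can be told apart with a small regex. This is a sketch under assumptions: the `foo::` prefix is the unconfirmed guess from the question, the function name `classify` is hypothetical, and treating `file://` as local is a choice made here for the binding use case.

```python
import re

# Optional "prefix::" (unconfirmed form from the question above),
# then a scheme (letter followed by letters, digits, "+", "." or "-"),
# then "://".
_URL_RE = re.compile(
    r"^(?:(?P<prefix>[A-Za-z0-9_]+)::)?"
    r"(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://"
)


def classify(ms):
    """Return "local" for plain paths and file:// URLs, else "remote"."""
    m = _URL_RE.match(ms)
    if m is None:
        return "local"  # no scheme at all: a plain filesystem path
    if m.group("scheme").lower() == "file":
        return "local"  # explicit file scheme still points at local disk
    return "remote"
```

Only strings classified as local would need their directories considered for binding; remote URLs can be passed through untouched.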

@o-smirnov
Member Author

Thanks. Finally, what's a good name for this dtype? MSX? DaskMS? DMS?

@o-smirnov o-smirnov added this to the Release 2.0 milestone Feb 13, 2024
@sjperkins
Collaborator

I think the above are fairly generic URL schemes. I wouldn't say they're dask-ms specific. Would a url dtype work?

Thinking about this a bit more, perhaps uri would be better than url as it references both local and remote datasets.
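As an illustration of how generic these forms are, Python's standard `urllib.parse.urlsplit` already splits all three examples from this thread without any dask-ms or fsspec knowledge. The wrapper name `describe` is hypothetical and only there to keep the sketch short.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Return (scheme, netloc, path) for a URI or a plain local path."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


# A plain path has an empty scheme and netloc
print(describe("/path/to/wsrt.ms"))
# With two slashes, "path" is parsed as the host part of a file URI
print(describe("file://path/to/wsrt.ms"))
# The S3 example splits into host and object path
print(describe("s3://host.address/path/to/wsrt.zarr"))
```

One thing this shows is that `file://path/to/wsrt.ms` parses with `path` as the network location, so a uri dtype may want to decide explicitly how to treat the two-slash and three-slash `file:` forms.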


3 participants