Flesh out actual tests #6

Open
rabernat opened this issue May 14, 2021 · 0 comments
@nbren12 got us started in #2 with some basic tests: https://github.com/pangeo-data/pangeo-integration-tests/blob/main/test_gcs.py

These need to be expanded to cover the use cases discussed in #1, which I quote here:

Some questions we need to resolve to move forward with this idea, and my initial responses, are:

  • What are the workflows we want to test?
    • Write random data to cloud
    • Read back data
    • Copy data
    • Rechunk data
  • How big does the test need to be in order to be realistic?
    • My sense: > 100 GB
  • Which combinations of libraries do we want to include?
    • dask (w/o xarray)
    • xarray (via dask)
    • gcsfs
    • s3fs
    • adlfs
    • rechunker
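The four workflows listed above can be sketched without any cloud dependency. This is a toy stand-in, not the proposed test suite: a local temporary directory plays the role of a bucket, plain files play the role of chunks, and every function name (`write_random`, `read_back`, `rechunk`, `run_workflows`) is hypothetical. A real test would go through dask or xarray and one of gcsfs, s3fs, or adlfs, and would be several orders of magnitude larger.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def write_random(store: Path, n_chunks: int, chunk_bytes: int) -> str:
    """Write n_chunks files of random bytes; return a SHA-256 of all the data."""
    store.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    for i in range(n_chunks):
        block = os.urandom(chunk_bytes)
        (store / f"{i:06d}.bin").write_bytes(block)
        digest.update(block)
    return digest.hexdigest()


def read_back(store: Path) -> bytes:
    """Concatenate every chunk in name order."""
    return b"".join(p.read_bytes() for p in sorted(store.glob("*.bin")))


def rechunk(src: Path, dst: Path, chunk_bytes: int) -> None:
    """Rewrite the data in src using a different chunk size."""
    data = read_back(src)  # fine for a toy; a real rechunk has to stream
    dst.mkdir(parents=True, exist_ok=True)
    for i, start in enumerate(range(0, len(data), chunk_bytes)):
        (dst / f"{i:06d}.bin").write_bytes(data[start:start + chunk_bytes])


def run_workflows(n_chunks: int = 8, chunk_bytes: int = 1024) -> bool:
    """Write, copy, rechunk, then check that all three stores read back identically."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        expected = write_random(root / "a", n_chunks, chunk_bytes)
        shutil.copytree(root / "a", root / "b")            # copy
        rechunk(root / "b", root / "c", chunk_bytes * 2)   # rechunk
        return all(
            hashlib.sha256(read_back(root / name)).hexdigest() == expected
            for name in ("a", "b", "c")
        )
```

The point of the checksum comparison is that each workflow gets a pass/fail signal that does not depend on how the data happens to be chunked, which is the property a cloud version of these tests would also need.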

Also, my experience is that coarse-graining operations tend to cause many problems. Unlike a time mean, the output will be too large for memory, but the data reduction is enough that the input and output chunk sizes should differ.

Generally, I think "writes" are less robust than "reads", but the latter are more frequently used by this community.

How big does the test need to be in order to be realistic?

I think so, but I need a better sense of the error rate per GB or per HTTP request.
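To make the link between test size and error rate concrete, here is a back-of-the-envelope sketch. It assumes each chunk costs one HTTP request and that requests fail independently with a fixed probability; both are simplifications, and the 100 MB chunk size and the 1e-4 failure rate used in the comments are made-up placeholders, not measurements.

```python
import math


def n_requests(total_bytes: int, chunk_bytes: int) -> int:
    """Number of requests if every chunk costs exactly one request (assumption)."""
    return -(-total_bytes // chunk_bytes)  # integer ceiling division


def p_any_failure(p_request: float, n: int) -> float:
    """Probability that at least one of n independent requests fails."""
    return 1.0 - (1.0 - p_request) ** n


def requests_for_detection(p_request: float, target: float = 0.95) -> int:
    """Smallest n for which p_any_failure(p_request, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_request))


# Placeholder numbers: a 100 GB test written as 100 MB chunks.
requests = n_requests(100 * 10**9, 100 * 10**6)
chance = p_any_failure(1e-4, requests)
```

Under these assumptions a 100 GB test issues about a thousand requests, so it would only surface a 1-in-10,000 failure rate roughly one run in ten. Whether 100 GB is "big enough" therefore depends heavily on the error rate, which is the number still missing.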

A related issue is how we want to handle Dask. Ideally we would parametrize the tests over different Dask schedulers, including distributed schedulers (see #4 (comment)). Noah mentioned Apache Beam in #1. Do we want to include Beam here?
