Refactor integration tests to remove random collection sampling #749
base: main
```python
"page_num": 1,
"page_size": 100,
"sort_key[]": "-usage_score",
},
```
I just copied this out of my browser dev tools' network tab after running a similar query in the Earthdata Search client. I'm sure we can run an equivalent query with earthaccess.
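For reference, here is a minimal sketch of the same query sent directly to CMR's public collection search endpoint (`https://cmr.earthdata.nasa.gov/search/collections.json`) using only the standard library. This is not earthaccess's API. The `page_num`, `page_size`, and `sort_key[]` values are the ones copied from Earthdata Search; scoping the query with CMR's `provider` parameter and the function names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_popular_collections_url(provider: str, page_size: int = 100) -> str:
    """Build a CMR collection search URL sorted by descending usage score.

    Hypothetical helper: the paging and sort parameters mirror the ones
    copied from Earthdata Search; the `provider` filter is an assumption.
    """
    params = {
        "provider": provider,
        "page_num": 1,
        "page_size": page_size,
        # A leading "-" requests descending order.
        "sort_key[]": "-usage_score",
    }
    return f"{CMR_COLLECTIONS_URL}?{urllib.parse.urlencode(params)}"


def fetch_popular_collections(provider: str, page_size: int = 100) -> dict:
    """Fetch and decode the JSON response (requires network access)."""
    url = build_popular_collections_url(provider, page_size)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

`urlencode` percent-encodes the brackets in `sort_key[]`, which the server decodes back to the original key.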
Worked on this with @itcarroll during hack day. Notes: #755
We considered how useful random sampling tests are. We don't think integration tests should sample randomly, especially since they run on every PR. We could, for example, run them on a cron job and generate reports, but that seems like overkill when we have a community to help us identify datasets and connect with the right support channel if there's an issue with the provider. We may still consider a cron job for other tasks, for example, recalculating the most popular datasets monthly.
We decided to hardcode a small number of collections and expand the list as we go. Other things, like random tests on a cron or updating the list of popular datasets on a cron, can be addressed separately.
@betolink will take on work to update …, and @mfisher87 will continue working on …
We will convert the .txt files to .csv files and add a boolean field for "does the collection have a EULA?", and then we'll use that field to mark those tests as …
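A minimal sketch of reading such a CSV, assuming hypothetical column names `concept_id` and `has_eula` with `true`/`false` values; the actual file layout isn't settled in this thread. Unexpected flag values raise an error rather than silently defaulting, so a typo in the file can't quietly change which tests are treated as EULA-gated.

```python
import csv
import io
from typing import NamedTuple


class CollectionRow(NamedTuple):
    concept_id: str
    has_eula: bool


def parse_collections_csv(text: str) -> list[CollectionRow]:
    """Parse CSV text with hypothetical `concept_id` and `has_eula` columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        flag = record["has_eula"].strip().lower()
        # Reject anything other than an explicit true/false.
        if flag not in {"true", "false"}:
            raise ValueError(f"unexpected has_eula value: {record['has_eula']!r}")
        rows.append(CollectionRow(record["concept_id"].strip(), flag == "true"))
    return rows
```

The `has_eula` flag on each row can then drive how the corresponding parametrized test case is marked.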
Two major milestones:
Thanks to @DeanHenze and @Sherwin-14 for collaborating on this at today's hackathon!
```diff
@@ -244,6 +244,9 @@ def _repr_html_(self) -> str:
         granule_html_repr = _repr_granule_html(self)
         return granule_html_repr
 
+    def __hash__(self) -> int:
+        return hash(self["meta"]["concept-id"])
```
@betolink @chuckwondo This seems reasonable to me, but please validate me :)
Thinking about it for like 5 minutes, this is obviously a bad idea. This class subclasses `dict`, so its contents (including the concept-id) can change after it has been hashed. We'd need to implement something like a `frozendict`.
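To make the concern concrete: `dict` sets `__hash__` to `None` precisely because its contents can change. The sketch below uses a hypothetical stand-in class (not earthaccess's actual granule class) to show a granule going missing from a set after it is mutated. It then shows one possible alternative, a hypothetical read-only `Mapping` that captures the concept-id once and uses it for both equality and hashing. Defining granule identity by concept-id is a design assumption here.

```python
import copy
from collections.abc import Iterator, Mapping
from typing import Any


class HashableDictGranule(dict):
    """Hypothetical stand-in: a dict subclass hashed by concept-id."""

    def __hash__(self) -> int:
        return hash(self["meta"]["concept-id"])


def mutation_breaks_lookup() -> bool:
    """Return True when a mutated granule can no longer be found in a set."""
    granule = HashableDictGranule(meta={"concept-id": "G1-PROV"})
    seen = {granule}
    # The set filed this object under the old hash; mutation changes the hash.
    granule["meta"]["concept-id"] = "G2-PROV"
    return granule not in seen


class FrozenGranule(Mapping):
    """Hypothetical read-only mapping whose identity is its concept-id.

    Top-level keys cannot be reassigned. Nested values are still plain
    dicts, but equality and hashing no longer depend on them.
    """

    def __init__(self, data: Mapping[str, Any]) -> None:
        # Deep-copy so later changes to the caller's dict cannot leak in.
        self._data = copy.deepcopy(dict(data))
        # Capture the identity once; __eq__ and __hash__ never re-read it.
        self._concept_id = self._data["meta"]["concept-id"]

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FrozenGranule):
            return NotImplemented
        return self._concept_id == other._concept_id

    def __hash__(self) -> int:
        return hash(self._concept_id)
```

Keeping `__eq__` and `__hash__` tied to the same captured value preserves the rule that equal objects must hash equal.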
Also still TODO: run generate.py in GitHub Actions (GHA) on a monthly or quarterly cron and automatically open a PR with the changes to the top collections?
If we want to determine whether a collection has a EULA, this example was provided:
The metadata …
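The example itself isn't preserved here, so this is only a minimal sketch under an explicit assumption: that a collection's UMM-C record lists EULA identifiers under `UseConstraints.EULAIdentifiers` (newer UMM-C schema versions define such a field; verify against the schema version the provider actually publishes). The function name `has_eula` is hypothetical.

```python
from typing import Any


def has_eula(umm_collection: dict[str, Any]) -> bool:
    """Return True if a UMM-C record lists any EULA identifiers.

    Assumes the field path UseConstraints.EULAIdentifiers; records that
    lack the field are treated as having no EULA.
    """
    use_constraints = umm_collection.get("UseConstraints") or {}
    if not isinstance(use_constraints, dict):
        return False
    # An empty or missing list counts as "no EULA".
    return bool(use_constraints.get("EULAIdentifiers"))
```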
Looks like part of this may be related to the EULA work in this issue.
Resolves #215
cc @betolink: I'm just getting started on this. I have some sample code that generates an ordered list of the 100 most popular collections for a given provider.
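A minimal sketch of the ordering step, assuming the input is CMR's JSON search response (`feed.entry`, where each entry's `id` is the collection concept-id) from a query already sorted by `-usage_score`. The function names and the one-ID-per-line output are assumptions, not the actual sample code mentioned above.

```python
from typing import Any


def top_collection_ids(payload: dict[str, Any], limit: int = 100) -> list[str]:
    """Return up to `limit` concept-ids in the order CMR returned them."""
    entries = payload.get("feed", {}).get("entry", [])
    # CMR has already sorted the entries, so keep its order and truncate.
    return [entry["id"] for entry in entries[:limit]]


def render_collection_list(concept_ids: list[str]) -> str:
    """Render one concept-id per line, e.g. for a checked-in list file."""
    return "".join(f"{cid}\n" for cid in concept_ids)
```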
Before I continue, I would like input!
Next steps may be: