This repository has been archived by the owner on Aug 4, 2023. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Refactor Metropolitan Museum of Art to use ProviderDataIngester #674
Refactor Metropolitan Museum of Art to use ProviderDataIngester #674
Changes from 8 commits
c7e1903
eca69f5
f1988ce
62a02ff
0de8ae7
db7a861
c42c8bc
75aaf10
16d51e0
ae99654
6ee7f87
f431932
67f6673
8ac2941
b54f605
bc55683
5cca07b
bb98bdd
57ec40e
1c2e0f5
e441b79
e78b6d9
8bec026
248b1a1
90d3727
3658dc4
1dcc93d
a1670e5
3f7765e
17d0aff
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be great to go into the docstring!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're comfortable with it, I think it'd be okay to leave this at the 1 second default rather than bumping it up to 3. We run this DAG every day and haven't encountered any issues or rate limiting. I'd love to keep your comments in the docstring though, just so we know if we do encounter rate limiting going forward we have a known way to address it.
Also, I'll make an issue for changing the delay from a private to a public attribute. There are definitely cases where we'll want to bump it up or adjust it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet, I made https://github.com/WordPress/openverse-catalog/issues/687 as well!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't incremented, so this will always be the number of ids retrieved in that batch, is that right? Would it make more sense to add to the total each time, or else change the wording of the log message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The met museum returns a json with two things: a list of object ids, and the length of that list. The additional API calls are just to get the details about any given object ID, not to get another list of object IDs. So, I think it's ok to just take this from the source once for each dag run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is reporting once for every batch, indicating the number of ids in that batch as opposed to the total number ingested by the run. I see it multiple times in the logs for a particular dagrun. Is that the intention, or am I wrong about what that number means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh... Maybe I'm confused about what a batch means and how it's functioning here. My sense is that there should only be one batch per dag-run. I'll take another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully this enforces the one batch per dagrun thing! 3f7765e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool :) What do you think about also reporting the final number at the end of ingestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but this and other methods that don't access
self
can be declared static.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, what are the benefits of making that declaration? The main one I'm familiar with is being able to call them from outside a specific instance of the class. Would that make testing more efficient in this case? Are there other benefits I'm not thinking of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's usually a good practice to make functions that you know won't need to access or modify the class's state via
self
,staticmethod
s. This makes the scope clear to folks reading the function, can make testing easier in certain cases (since you don't need to instantiate a class in order to test that specific function), and prevents cases where a function that seems like it should be static but is not explicitly set as such has access toself
and can thus modify class state (either intentionally or unintentionally).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Madison usually says that it's better to use a
set
when checking within
:)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 Yes, that makes total sense, I'll fix it tonight. Though hopefully for a static list of two distinct items it wouldn't make a huge difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, you know me too well @obulat 😁 And you're right @rwidom, it won't make a huge difference here but it's a good habit to get into because it can have significant performance benefits when there are tons of iterations happening!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it actually does make a noticeable difference! Who knew? ~7%!