Add "cost" hook in tap base #348
Comments
Just noting that I'm going to work on this.
@laurentS - Sounds great! Are you okay with a simple log line being printed, or were you thinking about emitting the result in another, machine-readable form? In theory, Singer metrics could be used (in addition to or instead of the proposed human-readable print) and would be easier for machines to parse: https://hub.meltano.com/singer/spec#metrics
Another approach is to add it to STATE, but since STATE is cumulative across all executions, I'm not sure that's a great fit. (Happy to discuss though.)
@laurentS - The one spec consideration is to perhaps introduce a domain label for the cost and to make this support So, there could be a
@aaronsteers about cost, I did indeed start with a cost function that can return the cost along arbitrary dimensions ("domains", to reuse your word), in the form of a dict that each tap developer can define. For instance
About logging the result, I haven't figured out the best way to do it yet. My thinking is that a single summary value at the end of the tap run would be more useful than one line per API call, ideally broken down per stream. I'd love to print out the result like a metric, in JSON format, so that users can parse it automatically if they want to. Singer metrics look like a nice way to achieve this, but maybe a human-readable log line is enough to get started, and we can always improve on it if there's a need?
@laurentS - sounds good to me! 👍
Closed by #704
Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/350
Originally created by @laurentS on 2022-03-20 17:19:25
Summary
As a user of a tap, I would like to know how much API "cost" was caused by a tap run.
For instance, the GitHub API has per-hour usage limits, where one REST API call costs 1, while a GraphQL API call has a cost that depends on the number of nodes returned.
Other APIs might charge per call.
At the end of a tap run, I would like to know how much of the resource the tap consumed.
Proposed benefits
With tap-github running in production, we have found it hard to track the reasons behind sudden surges in "quota exceeded" errors. Being able to identify which runs use what would help in understanding the cause of such issues.
For billable APIs/resources, such a feature could also help track actual dollar costs.
Making these values retrievable would allow tracking them in monitoring systems, etc...
Proposal details
As it's not really possible to define how this cost is calculated at the SDK level, it would be great if the SDK provided some method that a tap could override to calculate and accumulate said cost, something like:
This could be called by the SDK after each request returns (as the cost might depend on the content of the response). The SDK would simply keep a sum of all these results, and at the end of the run, the tap would at a minimum log a line like:
or possibly export this value in the state as another metric?
Again, the final method could simply be a no-op by default that each tap can override to implement appropriate behaviour.
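To make the proposal above more concrete, here is a minimal, self-contained sketch of what such a hook could look like. It does not use the real SDK classes: CostTrackingStream, calculate_sync_cost, update_sync_costs, log_sync_costs, and the dict-shaped response with its api and node_count fields are all hypothetical names chosen for illustration, and the GraphQL cost rule is a deliberate simplification of the per-node pricing described in the summary.

```python
import json
import logging
from collections import defaultdict
from typing import Any, Dict

logger = logging.getLogger("tap-example")


class CostTrackingStream:
    """Stand-in for an SDK stream base class (hypothetical, not the real SDK API)."""

    name = "example"

    def __init__(self) -> None:
        # Running totals for this stream, keyed by cost domain (e.g. "rest").
        self.sync_costs: Dict[str, int] = defaultdict(int)

    def calculate_sync_cost(self, response: Dict[str, Any]) -> Dict[str, int]:
        """Return the cost of a single request; a no-op by default."""
        return {}

    def update_sync_costs(self, response: Dict[str, Any]) -> None:
        """Would be called by the SDK after each request returns."""
        for domain, cost in self.calculate_sync_cost(response).items():
            self.sync_costs[domain] += cost

    def log_sync_costs(self) -> str:
        """Log one summary line at the end of the run and return it."""
        totals = json.dumps(dict(self.sync_costs), sort_keys=True)
        line = f"Total sync costs for stream {self.name}: {totals}"
        logger.info(line)
        return line


class GitHubLikeStream(CostTrackingStream):
    """Example override: REST calls cost 1, GraphQL calls cost one unit per node."""

    name = "issues"

    def calculate_sync_cost(self, response: Dict[str, Any]) -> Dict[str, int]:
        # "api" and "node_count" are assumed fields of this toy response dict.
        if response.get("api") == "graphql":
            return {"graphql": int(response.get("node_count", 0))}
        return {"rest": 1}
```

In this sketch, a tap that does not override calculate_sync_cost accumulates nothing, which matches the idea of a default no-op. The totals are serialized as JSON inside the log line so that they stay human-readable but can still be parsed if needed.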
Best reasons not to build
By default, these methods can simply do a pass, and therefore have only negligible performance impact. There is no behaviour change related to this, unless the result is exported in state, which might cause issues downstream depending on the target used.