
[EPIC] Source Google Analytics to Beta #10938

Closed
19 tasks done
sherifnada opened this issue Mar 8, 2022 · 13 comments
Comments

sherifnada (Contributor) commented Mar 8, 2022

The main purpose of this issue is to track release progress for the connector.

Per the checklist:

Acceptance criteria
  • Create a checklist document that should be placed on the Google Drive
  • Checklist filled out by the engineering team
  • Collect and list all tech debt-related issues in the section below
  • Create/Revise MLP document and place it here
  • Move checklist to the unblockers team
  • Reviewed/Updated by unblockers team
  • Move checklist to the product team for review
  • Reviewed/Updated by product team
  • Update connector documentation according to the comments in the checklist
  • Certification completed
@lazebnyi lazebnyi moved this to Prioritized for scoping in GL Roadmap Mar 23, 2022
@lazebnyi lazebnyi changed the title [EPIC] Google Analytics to Beta [EPIC] Source Google Analytics to Beta Mar 23, 2022
@oustynova oustynova moved this from Prioritized for scoping to Scoping in progress in GL Roadmap Mar 28, 2022
@oustynova oustynova moved this from Scoping in progress to Implementation in progress in GL Roadmap Mar 29, 2022
davydov-d added a commit that referenced this issue Apr 7, 2022
davydov-d added a commit that referenced this issue Apr 7, 2022
davydov-d added a commit that referenced this issue Apr 13, 2022
* #10938 fix docs and spec

* #10938 add pr

* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
davydov-d added a commit that referenced this issue Apr 19, 2022
davydov-d added a commit that referenced this issue Apr 19, 2022
davydov-d added a commit that referenced this issue Apr 20, 2022
* #10938 doc and specs minor fixes

* #10938 add changelog item

* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
misteryeo (Contributor) commented:

@davydov-d Reviewed the checklist again - didn't see a Docs PR to review, can you share?

ChristopheDuong (Contributor) commented:

@sherifnada @misteryeo , should this issue #12013 be included in this epic too?

davydov-d (Collaborator) commented:

> @davydov-d Reviewed the checklist again - didn't see a Docs PR to review, can you share?

@misteryeo Sure, but it's already merged. I wasn't aware you review PRs as well:
https://github.com/airbytehq/airbyte/pull/12150/files

misteryeo (Contributor) commented Apr 22, 2022

> @misteryeo Sure, but it's merged already.. I was not aware you review PRs as well
> https://github.com/airbytehq/airbyte/pull/12150/files

@davydov-d Oh I only asked because we were discussing the 'Window in days' field and whether 'data chunk' should be used. On that field specifically, I'm still unclear what exactly 'Window in days' means even with the description - what does 'stream slice' mean? Are there external docs that we can link to so we can make this clearer?

> @sherifnada @misteryeo , should this issue #12013 be included in this epic too?

Yes, thanks for bringing this up @ChristopheDuong - was reviewing Thalia's response. Saw there's some chatter happening over there but @davydov-d let's make sure to implement this as part of this verification process.

davydov-d (Collaborator) commented:

> @misteryeo Sure, but it's merged already.. I was not aware you review PRs as well
> https://github.com/airbytehq/airbyte/pull/12150/files
>
> @davydov-d Oh I only asked because we were discussing the 'Window in days' field and whether 'data chunk' should be used. On that field specifically, I'm still unclear what exactly 'Window in days' means even with the description - what does 'stream slice' mean? Are there external docs that we can link to so we can make this clearer?

No, "stream slice" is our internal technical term. As a rule, under the hood the connector makes a series of requests to a resource, each of which retrieves a batch/chunk/slice of data during the sync. The window in days defines a date range filter for every request. For example, when start_date = 2020-01-01 and window_in_days = 30, the request filters will look like:
2020-01-01..2020-01-30
2020-01-31..2020-02-29
2020-03-01..2020-03-30
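That slicing behaviour can be sketched in Python. This is a hypothetical illustration of the description above, not the connector's actual implementation; the function name is made up:

```python
from datetime import date, timedelta

def date_slices(start_date, end_date, window_in_days):
    """Yield (start, end) date ranges covering window_in_days days each.

    Hypothetical helper illustrating the slicing behaviour described
    above; not the connector's real code.
    """
    cursor = start_date
    while cursor <= end_date:
        # Each slice spans window_in_days days, clamped to end_date.
        slice_end = min(cursor + timedelta(days=window_in_days - 1), end_date)
        yield cursor, slice_end
        cursor = slice_end + timedelta(days=1)

# start_date = 2020-01-01, window_in_days = 30:
for start, end in date_slices(date(2020, 1, 1), date(2020, 3, 30), 30):
    print(f"{start}..{end}")
# 2020-01-01..2020-01-30
# 2020-01-31..2020-02-29
# 2020-03-01..2020-03-30
```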
How would you like this parameter to be described? I guess it should be something laconic, not as long as what I've written here ☝️

> @sherifnada @misteryeo , should this issue #12013 be included in this epic too?
>
> Yes, thanks for bringing this up @ChristopheDuong - was reviewing Thalia's response. Saw there's some chatter happening over there but @davydov-d let's make sure to implement this as part of this verification process.

ok

misteryeo (Contributor) commented:

And to confirm @davydov-d - there's no external documentation referencing this field? This is an Airbyte created field?

davydov-d (Collaborator) commented:

> And to confirm @davydov-d - there's no external documentation referencing this field? This is an Airbyte created field?

Well, we can reference this one: https://developers.google.com/analytics/devguides/reporting/core/v4/rest/v4/DateRange
window_in_days is the number of days between startDate and endDate.
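For context, the Core Reporting API v4 takes that range as a dateRanges object inside each report request. A minimal sketch of what one slice's request body might look like (the viewId and metric are placeholders, not values from this issue):

```python
# Hypothetical sketch of a single report request covering one 30-day
# slice, using the dateRanges shape from the DateRange doc above.
report_request = {
    "viewId": "12345678",  # placeholder Google Analytics view ID
    "dateRanges": [{"startDate": "2020-01-01", "endDate": "2020-01-30"}],
    "metrics": [{"expression": "ga:sessions"}],
}
```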

misteryeo (Contributor) commented Apr 26, 2022

@davydov-d but the field itself is describing sampling - correct? https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-analytics-v4/source_google_analytics_v4/source.py#L25

So what the field is doing is determining how big a sample of data a user would want to sync each time, based on a range of days from the start_date. The larger this day range, the larger the sample and the slower it will be, but the more accurate it will be. Does this seem correct?

Clarifying because this is opposite of what's currently described in the field description today:

> Window in days (Optional) - The amount of days each stream slice would consist of beginning from start_date. Bigger the value - faster the fetch. (Min=1, as for a Day; Max=364, as for a Year). (e.g. 30, 60, 90, 120, 200, 364)

davydov-d (Collaborator) commented:

> @davydov-d but the field itself, is describing sampling - correct? https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-google-analytics-v4/source_google_analytics_v4/source.py#L25
>
> So what the field is doing, is determining how big of a sample size of data a user would want to sync each time based on a range of days from the start_date. The larger this day range, the larger the sample and the slower it will be but the more accurate it will be. Does this seem correct?
>
> Clarifying because this is opposite of what's currently described in the field description today:
>
> Window in days (Optional) - The amount of days each stream slice would consist of beginning from start_date. Bigger the value - faster the fetch. (Min=1, as for a Day; Max=364, as for a Year). (e.g. 30, 60, 90, 120, 200, 364)

Well, yes but no :)
According to the doc, the sampling level is defined by the samplingLevel parameter, which we do not expose in the input config (should we?), so the DEFAULT level is used, meaning speed and accuracy of the response are balanced. The response speed depends on both accuracy and the date range. So, in case we have a large date range, we can speed up the response time by decreasing the date range, leaving the accuracy approximately the same. On the other hand, it also means increasing the number of requests. In other words, the window_in_days field indirectly affects sampling (only after it kicks in), but that's not the primary goal of this field.

I'm not so sure about "Bigger the value - faster the fetch". Bigger the value means a slower single response but fewer requests. I'm not confident about the total speed of the whole sync, so I would omit this phrase.
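If we did decide to expose it, samplingLevel would just be another per-request field. A rough sketch of how it could be threaded through, assuming a hypothetical build_request helper (the API's documented values are DEFAULT, SMALL, and LARGE):

```python
# Hypothetical: how samplingLevel could be added to the request body
# if it were exposed in the connector config. DEFAULT balances speed
# and accuracy; SMALL favours speed, LARGE favours accuracy.
def build_request(view_id, start, end, sampling_level="DEFAULT"):
    return {
        "viewId": view_id,
        "dateRanges": [{"startDate": start, "endDate": end}],
        "metrics": [{"expression": "ga:sessions"}],
        "samplingLevel": sampling_level,
    }
```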

misteryeo (Contributor) commented:

Gotcha, let's update the description to the following - does that resonate with you @davydov-d?

> Data request window (Optional) - The amount of data batched by the number of days. The bigger the value, the bigger the batch size and the fewer API requests made.

davydov-d (Collaborator) commented Apr 27, 2022

> Gotcha, let's update the description to - does that resonate with you @davydov-d:
>
> Data request window (Optional) - The amount of data batched by the number of days. The bigger the value, the bigger the batch size and the lower the API requests made.

Sounds good.
@misteryeo, have a look please: #12385
Are we done with the checklist here?

sherifnada (Contributor, Author) commented:

@davydov-d yes, please consider the checklist done.


@davydov-d davydov-d moved this from Implementation in progress to In review (Airbyte) in GL Roadmap May 3, 2022
@davydov-d davydov-d moved this from In review (Airbyte) to Done in GL Roadmap May 3, 2022
suhomud pushed a commit that referenced this issue May 23, 2022
* #10938 doc and specs minor fixes

* #10938 add changelog item

* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
Projects: GL Roadmap - Status: Done
Development: no branches or pull requests
6 participants