Support for specifying multiple input prefixes in Aggregation Service CreateJob requests: Feedback Requested #76

Open
ghanekaromkar opened this issue Oct 3, 2024 · 1 comment
Labels
feedback requested Feedback Requested from customers

Comments

@ghanekaromkar
Contributor

Problem

The aggregation service team has heard feedback (#67) from our partners about the difficulties they face in submitting jobs to the aggregation service. These difficulties stem from the API's rigid input prefix requirements, which might not fit the data organization pattern an adtech has chosen.

Currently, the aggregation service's CreateJob API accepts a single value for the parameter input_data_blob_prefix. This parameter refers to the input prefix under which the aggregatable reports are stored. The aggregation job includes all the reports stored under this prefix hierarchy when generating the summary report.
This is a problem when only a subset of the reports stored under a prefix is intended to be used in an aggregation job, yet that prefix is the only common prefix for all such reports. An example of such a situation can be found in issue #67.
The current workaround (suggested here) is for adtechs to reorganize their reports so that a given input prefix contains only the reports intended for a given aggregation job. When an adtech’s data organization pattern differs from its querying pattern, the adtech must either copy or move the reports to meet this requirement. This is error-prone and time-consuming and, when reports are copied, also leads to higher storage costs.
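
To make the over-inclusion concrete, here is a minimal sketch. It assumes that reports are selected by plain string-prefix matching on blob names, which this issue does not spell out; the blob names and the helper select_by_prefix are hypothetical and exist only for illustration.

```python
from typing import Iterable, List


def select_by_prefix(blob_names: Iterable[str], prefix: str) -> List[str]:
    """Return the blob names that start with `prefix` (assumed matching rule)."""
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical blob names, organized by hour under a single day.
blobs = [
    "my-month/my-day/hour-00/report-a.avro",
    "my-month/my-day/hour-01/report-b.avro",
    "my-month/my-day/hour-02/report-c.avro",
]

# Suppose the job should only cover hour-00 and hour-02.
wanted = [blobs[0], blobs[2]]

# The only prefix that covers both wanted hours also covers hour-01.
selected = select_by_prefix(blobs, "my-month/my-day/")
extra = [name for name in selected if name not in wanted]
print(extra)
```

Under that assumption, the hour-01 report ends up in the job even though it was not wanted, which is why reports currently have to be copied or moved.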

Proposal

To address this problem, we are proposing the following changes to the CreateJob API.

  • Introduction of a new field, input_data_blob_prefix_list, which would accept a list of input prefixes under which the aggregatable reports for a job are stored. The aggregation worker would read all reports stored under each of the prefixes in the list and include their contributions in the generated summary report.
  • This field would accept a list with a maximum of 50 entries. This number can be increased in the future based on adtech feedback.
  • [Backwards compatibility]
    • We would be introducing this field in a backwards compatible way. This will be an optional field in the current version of the API.
    • Exactly one of the two fields input_data_blob_prefix and input_data_blob_prefix_list would be required to be specified in the CreateJob request.
    • Users of aggregation service who do not see the need to specify multiple input prefixes can continue using the field input_data_blob_prefix.
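
The rules above could be checked on the client side before a request is submitted. The following is only a sketch of that check, not the service's actual validation: the function name resolve_input_prefixes and the constant MAX_PREFIXES are hypothetical, and rejecting an empty list is an assumption, since the proposal does not say how an empty list would be treated.

```python
from typing import Any, Dict, List

MAX_PREFIXES = 50  # limit stated in the proposal; may be raised later


def resolve_input_prefixes(request: Dict[str, Any]) -> List[str]:
    """Return the prefixes a request refers to, enforcing the proposed rules."""
    single = request.get("input_data_blob_prefix")
    many = request.get("input_data_blob_prefix_list")

    # Exactly one of the two fields must be present.
    if (single is None) == (many is None):
        raise ValueError(
            "Specify exactly one of input_data_blob_prefix "
            "and input_data_blob_prefix_list"
        )

    # Existing behaviour: a single prefix.
    if single is not None:
        return [single]

    # Assumption: an empty or non-list value is rejected.
    if not isinstance(many, list) or not many:
        raise ValueError("input_data_blob_prefix_list must be a non-empty list")

    if len(many) > MAX_PREFIXES:
        raise ValueError(
            f"input_data_blob_prefix_list accepts at most {MAX_PREFIXES} entries"
        )
    return list(many)
```

A request that uses only input_data_blob_prefix passes through unchanged, which mirrors the backwards compatibility described above.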

API changes

Current CreateJob API request payload schema

{
    // other fields of CreateJob request

    "input_data_bucket_name": "my-bucket",
    "input_data_blob_prefix": "my-month/my-day/",
    "job_parameters": {
        // fields inside this json object
    }
}

Proposed CreateJob API request payload schema

{
    // other fields of CreateJob request

    "input_data_bucket_name": "my-bucket",
    "input_data_blob_prefix": "my-month/my-day", // should be absent if input_data_blob_prefix_list is provided
    "input_data_blob_prefix_list": ["my-month/my-day/hour-00",
                                    "my-month/my-day/hour-01",
                                    "my-month/my-day/hour-02",
                                    "my-month/my-day/hour-03",
                                    "my-month/my-day/hour-04"], // optional field
    "job_parameters": {
        // fields inside this json object
    }
}
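
As an illustration of how a client might assemble the prefix-related part of such a payload, here is a short sketch. The helpers hourly_prefixes and build_create_job_fields are hypothetical, the hour-NN naming simply follows the example above, and the other CreateJob fields (including job_parameters) are left out.

```python
import json
from typing import Dict, Iterable, List


def hourly_prefixes(day_prefix: str, hours: Iterable[int]) -> List[str]:
    """Build one prefix per hour, e.g. 'my-month/my-day/hour-03'."""
    return [f"{day_prefix}/hour-{hour:02d}" for hour in hours]


def build_create_job_fields(bucket: str, prefixes: Iterable[str]) -> Dict[str, object]:
    """Return only the input-related fields of a CreateJob request.

    input_data_blob_prefix is deliberately omitted because the proposal
    requires exactly one of the two prefix fields.
    """
    return {
        "input_data_bucket_name": bucket,
        "input_data_blob_prefix_list": list(prefixes),
    }


payload = build_create_job_fields(
    "my-bucket", hourly_prefixes("my-month/my-day", range(5))
)
print(json.dumps(payload, indent=4))
```

Generating the list programmatically also makes it easy to check its length against the proposed limit of 50 before sending the request.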

Feedback request

If you have any feedback on the above proposal, please let us know by responding to this issue.

We would really appreciate your feedback on these API changes. In particular:

  1. Would adtechs find this feature useful?
  2. We're proposing a limit of 50 on the number of input prefixes. Do adtechs find this limit sufficient for their use cases?

@ruclohani added the "feedback requested" label on Oct 3, 2024

@wojtek-rybak

Thank you for sharing the proposal. Here are my thoughts:

  1. Usefulness of the feature for adtechs: This feature would be very valuable for RTB House. It would give us more flexibility in specifying data input, enabling us to implement more sophisticated aggregation patterns.

  2. Limit of 50 input prefixes: The limit seems sufficient.


3 participants