Support for specifying multiple input prefixes in Aggregation Service CreateJob requests: Feedback Requested #76

ghanekaromkar · 2024-10-03T21:24:53Z

Problem

The Aggregation Service team has heard feedback (#67) from our partners about the difficulties they face in submitting jobs to the Aggregation Service. These difficulties stem from the API's rigid input prefix requirement, which might not fit the data organization pattern chosen by an adtech.

Currently, the Aggregation Service's CreateJob API accepts a single value for the parameter input_data_blob_prefix. This parameter refers to the input prefix where the aggregatable reports are stored. The aggregation job includes all the reports stored under this prefix hierarchy when generating the summary report.
This is a problem when only a subset of the reports stored under a prefix is intended to be used in an aggregation job, yet that prefix is the only common prefix for all such reports. An example of such a situation can be found in issue #67.
The current workaround for this issue (suggested here) is for adtechs to reorganize their reports so that a given input prefix contains only the reports intended to be aggregated in the given aggregation job. When an adtech's data organization pattern differs from its querying pattern, it needs to either copy the reports or move them around to meet this requirement. This is error-prone and time-consuming and, in the case of report copying, also leads to higher storage costs.
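
To make the problem above concrete, here is a minimal sketch. It assumes plain string-prefix matching on object keys, as object stores typically provide; the key names, the .avro file names and the select_by_prefix helper are hypothetical and not part of the Aggregation Service.

```python
import os

# Hypothetical object keys in the input bucket (illustrative layout only).
keys = [
    "my-month/my-day/hour-00/report-1.avro",
    "my-month/my-day/hour-01/report-2.avro",
    "my-month/my-day/hour-02/report-3.avro",
    "my-month/my-day/hour-03/report-4.avro",
]


def select_by_prefix(all_keys, prefix):
    """Return every key that starts with the given prefix."""
    return [key for key in all_keys if key.startswith(prefix)]


# Suppose a job should aggregate only hours 00 and 02.
wanted = [keys[0], keys[2]]

# The longest prefix shared by the wanted reports (character-wise).
common = os.path.commonprefix(wanted)

# With a single prefix, the job picks up everything under it,
# including the reports for hours 01 and 03 that were not wanted.
selected = select_by_prefix(keys, common)
unwanted = [key for key in selected if key not in wanted]
print(common)
print(unwanted)
```

In this layout no single prefix selects exactly the two wanted hours, which is why reports currently have to be copied or moved before the job is submitted.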

Proposal

To address this problem, we are proposing the following changes to the CreateJob API.

Introduction of a new field, input_data_blob_prefix_list, which would accept a list of input prefixes under which the aggregatable reports for a job are stored. The aggregation worker would read all reports stored under each of the prefixes provided in the list and include their contributions in the generated summary report.
This field would accept a list with a maximum size of 50 entries. This number can be increased in the future based on adtech feedback.
[Backwards compatibility]
- We would be introducing this field in a backwards compatible way. This will be an optional field in the current version of the API.
- Exactly one of the two fields input_data_blob_prefix and input_data_blob_prefix_list must be specified in the CreateJob request.
- Users of aggregation service who do not see the need to specify multiple input prefixes can continue using the field input_data_blob_prefix.
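
The rules proposed above could be checked on the client side before a request is submitted. The following is only a sketch of those rules as stated in this issue, not the service's actual validation. The function name validate_input_prefixes is hypothetical, and how an empty list would be treated is not specified here, so the sketch does not check for it.

```python
from typing import List

MAX_PREFIXES = 50  # limit proposed in this issue; may be raised later


def validate_input_prefixes(request: dict) -> List[str]:
    """Return the prefixes a job would read, or raise ValueError.

    Enforces the two proposed rules: exactly one of the two fields is
    present, and the list holds at most MAX_PREFIXES entries.
    """
    has_single = "input_data_blob_prefix" in request
    has_list = "input_data_blob_prefix_list" in request

    # Both present or both absent violates the "exactly one" rule.
    if has_single == has_list:
        raise ValueError(
            "specify exactly one of input_data_blob_prefix "
            "and input_data_blob_prefix_list"
        )

    if has_single:
        return [request["input_data_blob_prefix"]]

    prefixes = request["input_data_blob_prefix_list"]
    if len(prefixes) > MAX_PREFIXES:
        raise ValueError(
            f"input_data_blob_prefix_list has {len(prefixes)} entries; "
            f"the proposed maximum is {MAX_PREFIXES}"
        )
    return list(prefixes)
```

A request that keeps using only input_data_blob_prefix passes unchanged, which matches the backwards compatibility goal.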

API changes

Current CreateJob API request payload schema

{
    // other fields of CreateJob request

    "input_data_bucket_name": "my-bucket",
    "input_data_blob_prefix": "my-month/my-day/",
    "job_parameters": {
          // fields inside this json object
     }
}

Proposed CreateJob API request payload schema

{
    // other fields of CreateJob request

    "input_data_bucket_name": "my-bucket",
    "input_data_blob_prefix": "my-month/my-day", // should be absent if input_data_blob_prefix_list is provided
    "input_data_blob_prefix_list": ["my-month/my-day/hour-00",
                                    "my-month/my-day/hour-01",
                                    "my-month/my-day/hour-02",
                                    "my-month/my-day/hour-03",
                                    "my-month/my-day/hour-04"], // optional field
     "job_parameters": {
         // fields inside this json object
     }
}
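
As an illustration of how a client might assemble the proposed payload, the sketch below builds the hourly prefix list from the example above. The helper build_create_job_request is hypothetical, the other CreateJob fields are omitted, and job_parameters is left empty as a placeholder.

```python
import json


def build_create_job_request(bucket, day_prefix, hours):
    """Build the input-related part of a CreateJob request payload.

    Only input_data_blob_prefix_list is set; input_data_blob_prefix is
    left out because the two fields are mutually exclusive.
    """
    prefixes = [f"{day_prefix}/hour-{hour:02d}" for hour in hours]
    return {
        # other fields of the CreateJob request are omitted in this sketch
        "input_data_bucket_name": bucket,
        "input_data_blob_prefix_list": prefixes,
        "job_parameters": {},  # placeholder; real jobs set fields here
    }


request = build_create_job_request("my-bucket", "my-month/my-day", range(5))
print(json.dumps(request, indent=4))
```

Selecting a non-contiguous set of hours only requires passing a different iterable, for example [0, 2, 4], without moving any reports.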

Feedback request

If you have any feedback on the above proposal, please let us know by responding to this issue.

We would really appreciate your feedback on these API changes. In particular:

Would adtechs find this feature useful?
We're proposing a limit of 50 on the number of input prefixes. Do adtechs find this limit sufficient for their use cases?

wojtek-rybak · 2024-10-05T15:31:44Z

Thank you for sharing the proposal. Here are my thoughts:

Usefulness of the feature for adtechs: This feature would be very valuable for RTB House. It would give us more flexibility in specifying data input, enabling us to implement more sophisticated aggregation patterns.
Limit of 50 input prefixes: The limit seems sufficient.

ruclohani added the "feedback requested" label Oct 3, 2024

ghanekaromkar mentioned this issue Oct 4, 2024

Allow multiple prefixes in aggregation jobs #67

Open
