Problem
The aggregation service team has heard feedback (#67) from our partners about the difficulties they face in submitting jobs to the aggregation service. These difficulties stem from the API's rigid input prefix requirements, which might not work well with the data organization pattern chosen by an adtech.
Currently, aggregation service's CreateJob API accepts a single value for the parameter input_data_blob_prefix. This parameter refers to the input prefix where the aggregatable reports are stored. The aggregation job includes all the reports stored under this prefix hierarchy when generating the summary report.
This seems to be a problem in situations when only a subset of the reports stored under a prefix are intended to be used in an aggregation job, yet the said prefix is the only common prefix for all such reports. An example of such a situation can be found in issue #67.
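The limitation can be illustrated with a small sketch. It assumes that prefix selection behaves like a plain string-prefix match over blob names, as is typical for cloud object stores; the blob names and the `select_by_prefix` helper are hypothetical and are not part of the aggregation service.

```python
from os.path import commonprefix

# Hypothetical blob names: one report file per hour of a single day.
blobs = [f"my-month/my-day/hour-{h:02d}/report.avro" for h in range(24)]


def select_by_prefix(names, prefix):
    """Return the names that start with `prefix` (assumed matching rule)."""
    return [name for name in names if name.startswith(prefix)]


# The job should only aggregate the reports for hours 00 through 04.
wanted_hours = {f"hour-{h:02d}" for h in range(5)}
wanted = [name for name in blobs if name.split("/")[2] in wanted_hours]

# The longest prefix shared by every wanted report...
shared = commonprefix(wanted)

# ...also matches reports that were never meant to be part of the job.
selected = select_by_prefix(blobs, shared)
unwanted = [name for name in selected if name not in wanted]

print(f"common prefix: {shared}")
print(f"wanted: {len(wanted)}, selected: {len(selected)}, unwanted: {len(unwanted)}")
```

Under these assumptions, no single prefix selects exactly the five wanted hours, which is the situation described above.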
The current workaround for this issue (suggested here) is for adtechs to reorganize their reports so that a given input prefix contains only the reports intended for the given aggregation job. When an adtech's data organization pattern differs from its querying pattern, the adtech has to either copy the reports or move them around to meet this requirement. This is error-prone and time-consuming, and copying reports also leads to higher storage costs.
Proposal
To address this problem, we are proposing the following changes to the CreateJob API.
Introduction of a new field input_data_blob_prefix_list, which would accept a list of input prefixes under which the aggregatable reports for a job are stored. The aggregation worker would read all reports stored under each of the prefixes in the list and include their contributions in the generated summary report.
This field would accept a list with a maximum size of 50 entries. This number can be increased in the future based on adtech feedback.
[Backwards compatibility]
We would be introducing this field in a backwards compatible way. This will be an optional field in the current version of the API.
Exactly one of the two fields input_data_blob_prefix and input_data_blob_prefix_list would be required in the CreateJob request.
Users of aggregation service who do not see the need to specify multiple input prefixes can continue using the field input_data_blob_prefix.
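The rules above (exactly one of the two fields, and at most 50 list entries) could be checked on the client side before submitting a job. The following is a minimal sketch, not the service's actual validation logic; the function name `validate_input_prefix_fields` is hypothetical, and rejecting an empty list is an assumption that the proposal does not state.

```python
MAX_PREFIXES = 50  # limit stated in the proposal; may be raised later


def validate_input_prefix_fields(request: dict) -> list:
    """Return the list of input prefixes for a CreateJob request payload.

    Raises ValueError unless exactly one of the two prefix fields is present.
    """
    has_single = "input_data_blob_prefix" in request
    has_list = "input_data_blob_prefix_list" in request

    # Exactly one of the two fields must be specified.
    if has_single == has_list:
        raise ValueError(
            "specify exactly one of input_data_blob_prefix "
            "and input_data_blob_prefix_list"
        )

    if has_single:
        return [request["input_data_blob_prefix"]]

    prefixes = request["input_data_blob_prefix_list"]
    # Assumption: an empty or non-list value is treated as invalid.
    if not isinstance(prefixes, list) or not prefixes:
        raise ValueError("input_data_blob_prefix_list must be a non-empty list")
    if len(prefixes) > MAX_PREFIXES:
        raise ValueError(f"at most {MAX_PREFIXES} input prefixes are allowed")
    return list(prefixes)
```

A request that still uses only input_data_blob_prefix passes through unchanged as a one-element list, which mirrors the backwards-compatible behavior described above.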
API changes
Current CreateJob API request payload schema
{
// other fields of CreateJob request
"input_data_bucket_name": "my-bucket",
"input_data_blob_prefix": "my-month/my-day/",
"job_parameters": {
// fields inside this json object
}
}
Proposed CreateJob API request payload schema
{
// other fields of CreateJob request
"input_data_bucket_name": "my-bucket",
"input_data_blob_prefix": "my-month/my-day", // must be absent if input_data_blob_prefix_list is provided
"input_data_blob_prefix_list": ["my-month/my-day/hour-00",
"my-month/my-day/hour-01",
"my-month/my-day/hour-02",
"my-month/my-day/hour-03",
"my-month/my-day/hour-04"], // optional field
"job_parameters": {
// fields inside this json object
}
}
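As an illustration of how a client might build such a payload, here is a minimal sketch that generates the hourly prefixes from the example above. The helper `build_create_job_request` is hypothetical, the other CreateJob fields are omitted, and `job_parameters` is left empty because its contents are not shown in the proposal.

```python
import json


def build_create_job_request(bucket, day_prefix, hours):
    """Build a partial CreateJob payload that uses the proposed list field."""
    prefixes = [f"{day_prefix}/hour-{hour:02d}" for hour in hours]
    return {
        # Other fields of the CreateJob request are omitted in this sketch.
        "input_data_bucket_name": bucket,
        # input_data_blob_prefix is deliberately left out, since exactly
        # one of the two prefix fields may be specified.
        "input_data_blob_prefix_list": prefixes,
        "job_parameters": {},
    }


request = build_create_job_request("my-bucket", "my-month/my-day", range(5))
print(json.dumps(request, indent=2))
```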
Feedback request
If you have any feedback on the above proposal, please let us know by responding to this issue.
We would really appreciate your feedback on these API changes. In particular:
Would adtechs find this feature useful?
We're proposing a limit of 50 on the number of input prefixes. Do adtechs find this limit sufficient for their use cases?
Thank you for sharing the proposal. Here are my thoughts:
Usefulness of the feature for adtechs: This feature would be very valuable for RTB House. It would give us more flexibility in specifying data input, enabling us to implement more sophisticated aggregation patterns.
Limit of 50 input prefixes: The limit seems sufficient.