Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proto definition for simulator synthetic data specification #1037

Merged
merged 6 commits into from
Jun 6, 2023

Conversation

jcorilla
Copy link
Contributor

No description provided.

@wfa-reviewable
Copy link

This change is Reviewable

@jcorilla jcorilla force-pushed the jcorilla-synthetic-data-proto branch 2 times, most recently from 782cc12 to da04efd Compare May 30, 2023 21:57
@jcorilla jcorilla marked this pull request as ready for review May 30, 2023 22:02
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @jcorilla and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 34 at r1 (raw file):

  int64 begin = 1;
  int64 end = 2;
}

nit: maybe take a cue from Kotlin and use the same naming pattern as ClosedRange

Suggestion:

message VidRange {
  int64 start = 1;
  int64 end_inclusive = 2;
}

src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 40 at r1 (raw file):

  VidInterval vid_interval = 1;

  // For Age, Gender, Social Grade ...

The summary fragment should actually describe the field. e.g. "Set of fields that come from population data."


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 43 at r1 (raw file):

  // These need to be assigned at the VID level here. When generating a
  // SimulatorSyntheticData you need to import the necessary EventTemplate.
  repeated google.protobuf.FieldDescriptorProto population_fields = 2;

Why did we go with FieldDescriptorProto instead of CEL expression (or basically any string syntax for field traversal, e.g. person.age_group)? I don't think there was a tradeoff discussion.

With FieldDescriptorProto, you have the benefit of being really specific and typed when it comes to access and you don't need to do any parsing/evaluating the expression. The downside is that you're sticking likely repeated data into the API. You still need to do some kind of reflective traversal to find the field. There's also no way to distinguish multiple instances of a field (e.g. if you have more than one field using the same message type and you're trying to target a subfield of that message). The other benefit to using a string is that you can use a Map for the values with the string as the key (rather than correlated lists).

It's fine if we're okay with these tradeoffs, I just want to call them out. Note that I wouldn't bother using the CEL library if we were using string expressions since the parsing of an expression that just evaluates to a field value is trivial.

@kungfucraig

Code quote:

google.protobuf.FieldDescriptorProto

src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 64 at r1 (raw file):

// Each Synthetic event group spec indicates which VID intervals are reached.
message SyntheticEventGroupSpec {
  string property_under_measurement = 1;

What is this supposed to be? A DataProvider resource name?

@kungfucraig kungfucraig self-requested a review May 31, 2023 20:24
Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @jcorilla, @SanjayVas, and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 25 at r1 (raw file):

option java_multiple_files = true;

message SimulatorSyntheticDataSpec {

Please add a comment that describes this structure.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 34 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

nit: maybe take a cue from Kotlin and use the same naming pattern as ClosedRange

Though I'd much prefer Interval, it seems we've been using Range for this.

We should use half-open ranges [begin, end).


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 36 at r1 (raw file):

}

// The specification of the entire VID space.

Maybe say: "The specification of a synthetic virtual population"


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 43 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

Why did we go with FieldDescriptorProto instead of CEL expression (or basically any string syntax for field traversal, e.g. person.age_group)? I don't think there was a tradeoff discussion.

With FieldDescriptorProto, you have the benefit of being really specific and typed when it comes to access and you don't need to do any parsing/evaluating the expression. The downside is that you're sticking likely repeated data into the API. You still need to do some kind of reflective traversal to find the field. There's also no way to distinguish multiple instances of a field (e.g. if you have more than one field using the same message type and you're trying to target a subfield of that message). The other benefit to using a string is that you can use a Map for the values with the string as the key (rather than correlated lists).

It's fine if we're okay with these tradeoffs, I just want to call them out. Note that I wouldn't bother using the CEL library if we were using string expressions since the parsing of an expression that just evaluates to a field value is trivial.

@kungfucraig

From the point of view of the Simulator, having the Field Descriptors will force the proto to be compiled in, however from the point of view of a downstream EG client it doesn't help much. And maybe it's worse because you can't readily tie a field descriptor back to it's origin (afaik).

It seems like it would be better to go with the strings and make it clear that anyone who wants to work with this structure is expected to agree out of band on the event template messages.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 45 at r1 (raw file):

  repeated google.protobuf.FieldDescriptorProto population_fields = 2;

  // For Device, Location, Duration ...

Same issue here as with population fields that Sanjay mentioned.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 64 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

What is this supposed to be? A DataProvider resource name?

It's basically a short description string that identifies the fake media property we are measuring. Dynamically populating it with the simulator name could be reasonable.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 67 at r1 (raw file):

  string description = 2;

  // The VIDs reached with their frequency non-population attributes.

This doesn't make sense.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 83 at r1 (raw file):

  }

  message DateSpec {

Please add a description for this.

@jcorilla jcorilla force-pushed the jcorilla-synthetic-data-proto branch from da04efd to 052c4eb Compare June 1, 2023 17:24
Copy link
Contributor Author

@jcorilla jcorilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @kungfucraig, @SanjayVas, and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 25 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Please add a comment that describes this structure.

Done.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 34 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Though I'd much prefer Interval, it seems we've been using Range for this.

We should use half-open ranges [begin, end).

Switching to using half open range.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 36 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Maybe say: "The specification of a synthetic virtual population"

Done.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 40 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

The summary fragment should actually describe the field. e.g. "Set of fields that come from population data."

Done.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 43 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

From the point of view of the Simulator, having the Field Descriptors will force the proto to be compiled in, however from the point of view of a downstream EG client it doesn't help much. And maybe it's worse because you can't readily tie a field descriptor back to it's origin (afaik).

It seems like it would be better to go with the strings and make it clear that anyone who wants to work with this structure is expected to agree out of band on the event template messages.

Switching to using Strings instead of Field Descriptors.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 45 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Same issue here as with population fields that Sanjay mentioned.

Done.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 67 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

This doesn't make sense.

Done.


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 83 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

Please add a description for this.

Done.

@SanjayVas SanjayVas requested a review from kungfucraig June 1, 2023 18:07
Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @jcorilla, @kungfucraig, and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 34 at r1 (raw file):

Previously, jcorilla wrote…

Switching to using half open range.

I'd still recommend using end_exclusive for the name of the end field to be explicit.

See Kotlin's OpenEndRange


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 64 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

It's basically a short description string that identifies the fake media property we are measuring. Dynamically populating it with the simulator name could be reasonable.

I'm not sure I'm seeing a value in this given that this whole spec will go inside of an EventGroup which is a child of a DataProvider. (Each EDP simulator instance is bound to a single DataProvider).

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jcorilla, @SanjayVas, and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 64 at r1 (raw file):

Previously, SanjayVas (Sanjay Vasandani) wrote…

I'm not sure I'm seeing a value in this given that this whole spec will go inside of an EventGroup which is a child of a DataProvider. (Each EDP simulator instance is bound to a single DataProvider).

I'm good dropping it.

Copy link
Contributor Author

@jcorilla jcorilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 2 unresolved discussions (waiting on @kungfucraig, @SanjayVas, and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 64 at r1 (raw file):

Previously, kungfucraig (Craig Wright) wrote…

I'm good dropping it.

Done.

Copy link
Member

@SanjayVas SanjayVas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @stevenwarejones)

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jcorilla)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 89 at r4 (raw file):

  message DateSpec {
    // Date the VIDs were reached.
    google.type.Date date = 1;

can we make this a date range?

@jcorilla jcorilla force-pushed the jcorilla-synthetic-data-proto branch from 9f2e384 to f54f107 Compare June 5, 2023 19:01
Copy link
Contributor Author

@jcorilla jcorilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 1 unresolved discussion (waiting on @SanjayVas and @stevenwarejones)


src/main/proto/wfa/measurement/api/v2alpha/event_group_metadata/testing/simulator_synthetic_data_spec.proto line 89 at r4 (raw file):

Previously, stevenwarejones (Steven Ware Jones) wrote…

can we make this a date range?

Done.

Copy link
Collaborator

@stevenwarejones stevenwarejones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jcorilla)

@jcorilla jcorilla force-pushed the jcorilla-synthetic-data-proto branch from f54f107 to b915ed7 Compare June 6, 2023 16:28
Copy link
Contributor Author

@jcorilla jcorilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jcorilla)

Copy link
Contributor Author

@jcorilla jcorilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kungfucraig Could you take another look at the change in date spec from singular date -> date range? I believe you also have a block on the VidInterval discussion.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jcorilla)

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jcorilla)

Copy link
Member

@kungfucraig kungfucraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jcorilla)

@jcorilla jcorilla force-pushed the jcorilla-synthetic-data-proto branch from b915ed7 to 3f78075 Compare June 6, 2023 19:48
@jcorilla jcorilla merged commit 0a0c60f into main Jun 6, 2023
@jcorilla jcorilla deleted the jcorilla-synthetic-data-proto branch June 6, 2023 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants