Add events to enrichment settings file (Closes #806) #807
base: master
111a587 to 6b119da
Thanks @szareiangm. I think this can be slightly improved.
- The current method is not granular enough. We can filter entities like page_view, struct, and unstruct, but cannot filter only particular self-describing (unstruct) events (e.g. only iglu:com.statusgator/status_change/jsonschema/1-0-0).
- To fix this, we list these entities not as strings but as a union type (e.g. using oneOf) of a plain event_type (page_view, struct, etc.) and unstruct + iglu URI.
- Event names can be listed in an enum, as we have a fixed set of them: https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-common-enrich/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/enrichments/EventEnrichments.scala#L166-L172
"enabled": {
  "type": "boolean"
},
"events" : {
Tabs are used everywhere in the config, so the indentation is a little bit broken.
So do you mean that I should create a data structure definition that can be
I ended up with this:
"events" : {
  "type" : [ "array", "null" ],
  "items" : {
    "anyOf": [
      {
        "type": "object",
        "properties": {
          "eventType": {
            "type": "string",
            "enum": ["struct", "unstruct",
              "ad_impression", "transaction", "transaction_item", "page_view", "page_ping"]
          }
        },
        "required": ["eventType"],
        "additionalProperties": false
      },
      {
        "type": "object",
        "properties": {
          "schema": {
            "type": "string",
            "pattern": "^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\\*)-((?:0|[1-9][0-9]*)|\\*)-((?:0|[1-9][0-9]*)|\\*)$"
          }
        },
        "required": ["schema"],
        "additionalProperties": false
      }
    ]
  },
  "description": "Run this enrichment for the event names in the list. Use null to disable"
},
Does it follow your points? @chuwy
Hey @szareiangm! Yes, that's exactly right. Except that I'd use Otherwise 👍
What if someone wants to have the enrichment for
That would mean that they need to have both objects in the config:
[
  {"eventType": "page_view"},
  {"schema": "iglu:com.acme/user_registration/jsonschema/1-0-2"}
]
Any particular event cannot be
By the way, I'd go even further and make my above statement more explicit in the config by making the unstruct event schema look like the following:
{
  "type": "object",
  "properties": {
    "eventType": {
      "type": "string",
      "enum": ["unstruct"]
    },
    "schema": {
      "type": "string",
      "pattern": "^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\\*)-((?:0|[1-9][0-9]*)|\\*)-((?:0|[1-9][0-9]*)|\\*)$"
    }
  },
  "required": ["eventType", "schema"],
  "additionalProperties": false
}
It means that
No, this is still the same array, which can contain multiple objects that are each either a "classic" event OR an unstruct event: a list of [Classic enum | Unstruct schema], or (in shortened format):
List(
  Classic("page_view"),
  Unstruct("iglu:com.acme/user_registration/jsonschema/1-0-2"),
  Classic("ad_impression"),
  Unstruct("iglu:com.acme/foo/jsonschema/1-0-2")
)
Oh, and one more point. Unstruct events' schemas must be in schema "criterion format" (
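To make the list-of-alternatives idea above concrete, here is a minimal, standard-library-only Scala sketch of how such a list could decide whether an enrichment applies to an event. All names (`EventFilterSketch`, `Classic`, `Unstruct`, `satisfies`, `enabledFor`) are hypothetical and not part of Snowplow or Iglu, and the wildcard handling assumes that a criterion is an iglu URI whose three version parts may each be `*`.

```scala
// Hypothetical sketch; not the Snowplow implementation.
object EventFilterSketch {
  sealed trait EventFilter
  // A "classic" event, identified only by its event name (page_view, struct, ...).
  final case class Classic(eventType: String) extends EventFilter
  // An unstruct event, identified by an iglu URI whose version parts may be "*".
  final case class Unstruct(criterion: String) extends EventFilter

  // True when a concrete iglu URI satisfies a criterion such as
  // "iglu:com.acme/foo/jsonschema/1-*-*" (assumed wildcard semantics).
  def satisfies(criterion: String, uri: String): Boolean = {
    val c = criterion.stripPrefix("iglu:").split("/")
    val u = uri.stripPrefix("iglu:").split("/")
    if (c.length != 4 || u.length != 4) false
    else {
      // vendor, name and format must match exactly
      val sameSchema = c.take(3).toList == u.take(3).toList
      val wanted = c(3).split("-")
      val actual = u(3).split("-")
      // each version part matches if it is equal or the criterion has a wildcard
      val sameVersion = wanted.length == 3 && actual.length == 3 &&
        wanted.zip(actual).forall { case (w, a) => w == "*" || w == a }
      sameSchema && sameVersion
    }
  }

  // schemaUri is expected to be set only for unstruct events.
  def enabledFor(filters: List[EventFilter], eventType: String, schemaUri: Option[String]): Boolean =
    filters.exists {
      case Classic(t)  => t == eventType
      case Unstruct(c) => eventType == "unstruct" && schemaUri.exists(satisfies(c, _))
    }
}
```

With a list such as `List(Classic("page_view"), Unstruct("iglu:com.acme/foo/jsonschema/1-*-*"))`, `enabledFor` returns true for a page view and for any 1-x-y version of the `com.acme/foo` event, and false for everything else.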
Is it close to your idea if I add * to the regex for your last point?
"events" : {
  "type" : [ "array", "null" ],
  "items" : {
    "oneOf": [
      {
        "type": "object",
        "properties": {
          "eventType": {
            "type": "string",
            "enum": ["struct",
              "ad_impression", "transaction", "transaction_item", "page_view", "page_ping"]
          }
        },
        "required": ["eventType"],
        "additionalProperties": false
      },
      {
        "type": "object",
        "properties": {
          "eventType": {
            "type": "string",
            "enum": ["unstruct"]
          },
          "schema": {
            "type": "string",
            "pattern": "^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\\*)-((?:0|[1-9][0-9]*)|\\*)-((?:0|[1-9][0-9]*)|\\*)$"
          }
        },
        "required": ["eventType", "schema"],
        "additionalProperties": false
      }
    ]
  },
  "description": "Run this enrichment for the event names in the list. Use null to disable"
}
Yep, thank you @szareiangm. Regex seems to be valid.
Yay!
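For anyone who wants to sanity-check the pattern locally, here is a small sketch that compiles the same expression with Scala's standard-library regex support and tries a few URIs. The object and method names (`PatternCheck`, `accepts`) are hypothetical, and the JVM regex engine may differ in edge cases from whichever engine the JSON Schema validator uses, so treat it as a quick check rather than proof.

```scala
// Hypothetical helper for trying the schema's "pattern" against sample URIs.
object PatternCheck {
  // Same expression as in the JSON Schema; JSON's "\\*" becomes "\*" in a raw Scala string.
  val IgluCriterion =
    """^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\*)-((?:0|[1-9][0-9]*)|\*)-((?:0|[1-9][0-9]*)|\*)$""".r

  // True when the whole string matches the pattern.
  def accepts(uri: String): Boolean =
    IgluCriterion.pattern.matcher(uri).matches()
}
```

Under this pattern, a fully specified version such as `1-0-2` and wildcard versions such as `1-*-*` are both accepted, while a model of `0` or a missing `iglu:` prefix is rejected.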
6b119da to ac1690a
I updated my PR with the new changes.
LGTM. Thanks!
@chuwy I have a question. Converting this schema to a case class needs a serializer (
Hey @szareiangm. We don't yet use any
sealed trait EventSkip
object EventSkip {
  case class ClassicEvent(eventType: String) extends EventSkip // would be nice to have eventType as an enum-like hierarchy
  // The raw iglu URI is kept as a String here; turning it into a SchemaKey is left out of this rough example
  case class UnstructEvent(schema: String) extends EventSkip
  def parse(json: JValue): Either[String, EventSkip] =
    json match {
      case JObject(fields) =>
        val obj = fields.toMap
        (obj.get("eventType"), obj.get("schema")) match {
          // Field values are JValues, so match on JString rather than on bare strings
          case (Some(JString("unstruct")), Some(JString(schema))) =>
            Right(UnstructEvent(schema))
          case (Some(JString(eventType)), None) =>
            Right(ClassicEvent(eventType))
          case _ =>
            Left(s"Object ${compact(json)} cannot be deserialized to EventSkip")
        }
      case _ =>
        Left(s"${compact(json)} is not a JSON object")
    }
}
This is a very rough example. Ideally, it should be a serialization format.
@szareiangm has signed the Software Grant and Corporate Contributor License Agreement
Added events to all of the enrichments.