Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add events to enrichment settings file (Closes #806) #807

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

szareiangm
Copy link

Added events to all of the enrichments.

@szareiangm szareiangm force-pushed the master-optional-enrichments branch 2 times, most recently from 111a587 to 6b119da Compare July 12, 2018 01:47
Copy link
Contributor

@chuwy chuwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @szareiangm. I think this can be slightly improved.

  1. Current method is not granular enough. We can filter entities like page_view, struct, unstruct, but cannot filter only some particular self-describing (unstruct) events (e.g. only iglu:com.statusgator/status_change/jsonschema/1-0-0)
  2. In order to fix this, we list these entities not as strings, but as union type (e.g. using oneOf) of plain event_type (page_view, struct etc) and unstruct + iglu URI.
  3. Event names can be listed in enum as we have fixed set of them: https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-common-enrich/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/enrichments/EventEnrichments.scala#L166-L172

"enabled": {
"type": "boolean"
},
"events" : {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tabs used everywhere in the config, so indentation is broken a little bit.

@szareiangm
Copy link
Author

So do you mean that I should create a data structure definition that can be event_type (page_view, struct etc) and unstruct + iglu UR? And then I would have a list of this data structure?

@szareiangm
Copy link
Author

I ended up with this:

"events" : {
    "type" : [ "array", "null" ],
    "items" : {
        "anyOf": [
            {
                "type": "object",
                "properties": {
                    "eventType": {
                        "type": "string",
                        "enum": ["struct", "unstruct",
                                  "ad_impression", "transaction", "transaction_item", "page_view","page_ping"]
                    }
                },
                "required": ["eventType"],
                "additionalProperties": false
            },
            {
                "type": "object",
                "properties": {
                    "schema": {
                        "type": "string",
                        "pattern": "^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\\*)-((?:0|[1-9][0-9]*)|\\*)-((?:0|[1-9][0-9]*)|\\*)$"
                    }
                },
                "required": ["schema"],
                "additionalProperties": false
            }
        ]
    },
    "description": "Run this enrichment for the event names in the list. Use null to disable"
},

Does it follow your points? @chuwy

@chuwy
Copy link
Contributor

chuwy commented Jul 13, 2018

Hey @szareiangm! Yes, that's exactly right. Except that I'd use oneOf instead of anyOf as anyOf means we can use something like {"eventType": "ad_impression", "schema": "iglu:com.acme/event/jsonschema/1-0-2"}, which does not make much sense.

Otherwise 👍

@szareiangm
Copy link
Author

What if someone wants to have the enrichment for page_view and a custom unstruct event for user_registration. if we use oneOf we should have the iglu:... for page_view too? @chuwy

@chuwy
Copy link
Contributor

chuwy commented Jul 13, 2018

That would mean that then need to have both objects in config:

[
  {"eventType": "page_view"},
  {"schema": "iglu:com.acme/user_registration/jsonschema/1-0-2"}
]

Any particular event cannot be page_view and user_registration ("classic" event and unstruct event) at the same time.

@chuwy
Copy link
Contributor

chuwy commented Jul 13, 2018

By the way, I'd go even further and make my above statement more explicit in config by making unstruct event schema look like following:

{
    "type": "object",
    "properties": {
        "eventType": {
            "type": "string",
            "enum": ["unstruct"]
        },
        "schema": {
            "type": "string",
            "pattern": "^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\\*)-((?:0|[1-9][0-9]*)|\\*)-((?:0|[1-9][0-9]*)|\\*)$"
        }
    },
    "required": ["eventType", "schema"],
    "additionalProperties": false
}

It means that schema is applicable only when eventType is unstruct. On the other hand we would also be able to filter only unstruct events by not specifying any schema.

@chuwy
Copy link
Contributor

chuwy commented Jul 13, 2018

No, this is still same array that can contain multiple objects which are either "classic" event OR unstruct event.

List of [Classic enum | Unstruct schema]

or (shortened format):

List(
  Classic("page_view"),
  Unstruct("iglu:com.acme/user_registration/jsonschema/1-0-2"),
  Classic("ad_impression"),
  Unstruct("iglu:com.acme/foo/jsonschema/1-0-2")
)

Oh, and one more point. Unstruct event's schemas must be in schema "crititerion format" (iglu:com.acme/user_registration/jsonschema/1-*-*). Otherwise user will have to add each new schema manually, which is very error-prone.

@szareiangm
Copy link
Author

Is it close to your idea if I add * for the regex for your last point?

        "events" : {
            "type" : [ "array", "null" ],
            "items" : {
                "oneOf": [
                    {
                        "type": "object",
                        "properties": {
                            "eventType": {
                                "type": "string",
                                "enum": ["struct",
                                          "ad_impression", "transaction", "transaction_item", "page_view","page_ping"]
                            }
                        },
                        "required": ["eventType"],
                        "additionalProperties": false
                    },
                    {
                        "type": "object",
                        "properties": {
                            "eventType": {
                                "type": "string",
                                "enum": ["unstruct"]
                            },
                            "schema": {
                                "type": "string",
                                "pattern": "^iglu:([a-zA-Z0-9-_.]+)/([a-zA-Z0-9-_]+)/([a-zA-Z0-9-_]+)/([1-9][0-9]*|\\*)-((?:0|[1-9][0-9]*)|\\*)-((?:0|[1-9][0-9]*)|\\*)$"
                            }
                        },
                        "required": ["eventType", "schema"],
                        "additionalProperties": false
                    }
                ]
            },
            "description": "Run this enrichment for the event names in the list. Use null to disable"
        }

@chuwy
Copy link
Contributor

chuwy commented Jul 13, 2018

Yep, thank you @szareiangm. Regex seems to be valid.

@szareiangm
Copy link
Author

Yay!
I saw the \\* now. I will change the files to reflect our discussion in a few minutes.

@szareiangm
Copy link
Author

I updated my PR with the new changes.

Copy link
Contributor

@chuwy chuwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@szareiangm
Copy link
Author

@chuwy I have a question. Converting this schema to a case class needs a serializer (oneOf). The only similar code that I could find was in schemaddl library. Do you have any ideas how to do serialize this without writing too much duplicate code or importing a library for just one line?

@chuwy
Copy link
Contributor

chuwy commented Jul 17, 2018

Hey @szareiangm. We don't yet use any jsonschema-to-case class serializers and usually write these decoders manually. E.g. in this case it would be something like:

sealed trait EventSkip
object EventSkip {
  case class ClassicEvent(eventType: String) extends EventSkip    // would be nice to have eventType as a enum-like hierarchy
  case class UnstructEvent(schema: SchemaKey) extends EventSkip

  def parse(json: JValue): Either[String, EventSkip] =
    json match {
      case JObject(fields) =>
        (fields.toMap.get("eventType"), fields.toMap.get("schema")) match {
          case (Some("unstruct"), Some(schema)) =>
            Right(UnstructEvent(schema))
          case (Some(eventType), None) =>
            Right(ClassicEvent(eventType))
          case _ =>
            Left("Object ${compact(json)} cannot be deserialized to EventSkip")
        }
      }
}

This is very rough example. Ideally, it should be a serialization format.

@snowplowcla
Copy link

@szareiangm has signed the Software Grant and Corporate Contributor License Agreement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants