Consider checking the type of a bad row before parsing directly with the right decoder #35

benjben · 2020-04-06T15:22:45Z

At the moment we use .as[BadRow] to parse a Json into its bad row case class.
This relies on ADT encoding and decoding.
Given the number of classes and fields in the bad rows, it might be more performant to first check the schema of the bad row (in the "schema" field) and then use the right Decoder directly, e.g.

schema match {
  case Schemas.EnrichmentFailures.toSchemaUri =>
    json.as[EnrichmentFailures]
  ...
}

chuwy · 2020-04-06T15:42:38Z

Yup, that's a good idea. But I believe it is a duplicate of #28.

chuwy · 2020-04-06T15:44:39Z

At the same time, I think it would be better to leave Decoder[BadRow] as it is now, because .as[A] decoding implies that the structure is compatible with A, while what we can really decode is SelfDescribingData[BadRow].

benjben · 2020-04-06T20:31:13Z

> But I believe it is duplicated ticket

Arf it is indeed, sorry.

> because .as[A] decoding implies that structure is compatible with A, while what we really can decode it SelfDescribingData[BadRow]

Good point. So we would need a parse function as you suggested in #28:

def parse(json: Json): Either[String, BadRow] =
  for {
    sdj <- SelfDescribingData
      .parse(json)
      .leftMap(e => s"Cannot parse ${json.noSpaces} as self-describing JSON, ${e.code}")
    br <- sdj.schema match {
      case Schemas.EnrichmentFailures.toSchemaUri =>
        sdj.data.as[EnrichmentFailures].leftMap(_.getMessage)
      case Schemas.SchemaViolations.toSchemaUri =>
        sdj.data.as[SchemaViolations].leftMap(_.getMessage)
      ...
    }
  } yield br

We should also add one to parse directly from a String.
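
As a rough illustration of the idea (schema-first dispatch plus a String entry point), here is a minimal sketch that uses only circe. Everything in it is an assumption made for the example: the two case classes are simplified stand-ins with a single hypothetical `message` field, the `com.example` schema URIs are placeholders for whatever `Schemas` actually exposes, and the `schema` and `data` fields are read directly from the JSON instead of going through `SelfDescribingData.parse`.

```scala
import io.circe.{Decoder, Json}
import io.circe.parser

// Simplified stand-ins for the real bad row classes (hypothetical fields).
sealed trait BadRow
final case class EnrichmentFailures(message: String) extends BadRow
final case class SchemaViolations(message: String) extends BadRow

object BadRowParser {
  // Placeholder schema URIs, assumed for this sketch only.
  val EnrichmentFailuresUri = "iglu:com.example/enrichment_failures/jsonschema/1-0-0"
  val SchemaViolationsUri   = "iglu:com.example/schema_violations/jsonschema/1-0-0"

  implicit val enrichmentFailuresDecoder: Decoder[EnrichmentFailures] =
    Decoder.forProduct1("message")(EnrichmentFailures.apply)
  implicit val schemaViolationsDecoder: Decoder[SchemaViolations] =
    Decoder.forProduct1("message")(SchemaViolations.apply)

  // Read the "schema" field first, then run only the matching decoder on "data".
  def parseJson(json: Json): Either[String, BadRow] = {
    val cursor = json.hcursor
    for {
      schema <- cursor.get[String]("schema").left.map(_.getMessage)
      data   <- cursor.get[Json]("data").left.map(_.getMessage)
      badRow <- schema match {
        case EnrichmentFailuresUri => data.as[EnrichmentFailures].left.map(_.getMessage)
        case SchemaViolationsUri   => data.as[SchemaViolations].left.map(_.getMessage)
        case other                 => Left(s"Unknown bad row schema: $other")
      }
    } yield badRow
  }

  // Entry point for raw strings: parse to Json, then reuse parseJson.
  def parseString(raw: String): Either[String, BadRow] =
    parser.parse(raw).left.map(_.message).flatMap(parseJson)
}
```

With this shape, an unknown schema or malformed input comes back as a `Left` with a message, and a payload only ever goes through the one decoder selected by its schema URI.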

benjben added the enhancement and question labels Apr 6, 2020

benjben closed this as completed Apr 6, 2020

chuwy added the duplicate label Apr 7, 2020
