Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Avro schema from schema registry if it is missing #951

Conversation

irajhedayati
Copy link
Contributor

@irajhedayati irajhedayati commented Sep 22, 2019

As mentioned #894, secor fails if

  • secor.file.reader.writer.factory=com.pinterest.secor.io.impl.AvroParquetFileReaderWriterFactory
  • kafka.useTimestamp=true

The writer is going to ask SecorSchemaRegistryClient for Avro schema for the writer. But that only has the schema in the cache if there has been a call for decodeMessage. However, if we use the message timestamp, it won't need to decode the message.

I updated the code to query the schema registry in order to update the cache. It uses the standard subject name for a Kafka topic as "kafka_topic-value" to query the schema.

public Schema getSchema(String topic) {
Schema schema = schemas.get(topic);
if (schema == null) {
throw new IllegalStateException("Avro schema not found for topic " + topic);
try {
SchemaMetadata schemaMetadata = schemaRegistryClient.getLatestSchemaMetadata(topic + "-value");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd to me. Who defines that convention: 'topic_name-value'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically, there are three strategies configurable in Kafka to work with Schema Registry explained here. The default one which most of the deployments use is "TopicNameStrategy". In this strategy

  • The subject name for the value is "topic-value"
  • The subject name for the key is "key-value"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is getSchema() only interested in value schema (or key schema as well)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only value schema. Key is only an internal Kafka thing and is not gonna be stored in the Avro/AvroParquet files

@HenryCaiHaiying HenryCaiHaiying merged commit da6d8b9 into pinterest:master Sep 23, 2019
@HenryCaiHaiying
Copy link
Contributor

Looks good.

@irajhedayati irajhedayati deleted the retrieve-avro-schema-from-schemaregistry branch September 23, 2019 23:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants