[Question] How do you use Confluent Schema Registry to deserialize Avro from Kafka? #592
-
I'm working on a proof of concept using Kafka, Confluent Schema Registry, and .NET for Apache Spark. I'm able to read Avro-serialized bytes from Kafka and print them to the console. Now I'm trying to integrate the Schema Registry to make sense of the bytes. I'm including the full program below. What I can't figure out is exactly how to invoke the `from_avro` function.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Confluent.SchemaRegistry;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace ClickStreamConsumer
{
    class Program
    {
        private static ISchemaRegistryClient _schemaRegistry;

        static async Task Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.Error.WriteLine(
                    "Usage: ClickEventConsumer " +
                    "<bootstrap-servers> <schema-registry-url> <topic>");
                Environment.Exit(1);
            }

            string bootstrapServers = args[0];
            string schemaRegistryUrl = args[1];
            string topic = args[2];

            var schemaRegistryParams = new Dictionary<string, string>
            {
                { "schema.registry.url", schemaRegistryUrl }
            };
            _schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryParams);
            var subject = _schemaRegistry.ConstructValueSubjectName(topic);
            var schema = await _schemaRegistry.GetLatestSchemaAsync(subject);

            var spark = SparkSession
                .Builder()
                .AppName("ClickEventConsumer")
                .GetOrCreate();

            var avro = spark
                .ReadStream()
                .Format("kafka")
                .Option("kafka.bootstrap.servers", bootstrapServers)
                .Option("subscribe", topic)
                .Option("startingOffsets", "earliest")
                .Load();

            // How do I pass in the Avro schema?
            // Is there a better way to do this?
            var something = avro
                .Select(CallUDF("from_avro", avro["value"]));

            var console = something
                .WriteStream()
                .Format("console")
                .Start();

            console.AwaitTermination();
            spark.Stop();
        }
    }
}
```
Replies: 14 comments
-
`from_avro` is a Scala-side function, so you don't need to create a UDF. Unfortunately, that function is not currently exposed in .NET for Apache Spark. Meanwhile, you can try the following:

```csharp
private static Column FromAvro(Column data, string jsonFormatSchema)
{
    return new Column(
        (JvmObjectReference)SparkEnvironment.JvmBridge.CallStaticJavaMethod(
            "org.apache.spark.sql.avro",
            "from_avro",
            data,
            jsonFormatSchema));
}
```

And make sure to add https://github.com/aelij/IgnoresAccessChecksToGenerator to your project so you can access the internal classes.
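With that helper in place, usage would look something like this (`schemaJson` here is just a stand-in for the Avro schema as a JSON string):

```csharp
// Decode the Kafka value column with the FromAvro interop helper above.
var decoded = df.Select(FromAvro(df["value"], schemaJson).As("value"));
```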
-
Thanks, this seems to work, but now I've uncovered other errors.

Not sure where this exception is coming from. My hunch is that it might have something to do with the fact that the first bytes of each message are a magic byte followed by the four-byte ID of the schema in the Schema Registry, which can throw off deserialization when using a vanilla Avro deserializer. Is there a way to do the deserialization in .NET rather than in Scala? If I could get the bytes in .NET, then I could instantiate a Confluent Avro deserializer myself.
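For reference, the Confluent wire format is one magic byte (0x00) followed by a big-endian four-byte schema ID, and only then the Avro payload. A minimal sketch of stripping that framing before handing the bytes to a plain Avro decoder could look like this (`StripConfluentHeader` is an illustrative helper name, not part of any library):

```csharp
// Sketch: remove the Confluent framing (magic byte + 4-byte schema ID),
// leaving only the raw Avro-encoded bytes.
private static byte[] StripConfluentHeader(byte[] payload)
{
    const int HeaderLength = 5; // 1 magic byte + 4-byte schema ID
    if (payload == null || payload.Length < HeaderLength || payload[0] != 0x00)
    {
        throw new InvalidOperationException("Not a Confluent-framed Avro payload.");
    }

    var avroBytes = new byte[payload.Length - HeaderLength];
    Array.Copy(payload, HeaderLength, avroBytes, 0, avroBytes.Length);
    return avroBytes;
}
```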
-
Which line of the code above is causing this exception?
-
I introduced the code you recommended, so presumably it's coming from there.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Confluent.SchemaRegistry;
using Microsoft.Spark.Interop;
using Microsoft.Spark.Interop.Ipc;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace ClickStreamConsumer
{
    class Program
    {
        private static ISchemaRegistryClient _schemaRegistry;

        static async Task Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.Error.WriteLine(
                    "Usage: ClickEventConsumer " +
                    "<bootstrap-servers> <schema-registry-url> <topic>");
                Environment.Exit(1);
            }

            string bootstrapServers = args[0];
            string schemaRegistryUrl = args[1];
            string topic = args[2];

            var schemaRegistryParams = new Dictionary<string, string>
            {
                { "schema.registry.url", schemaRegistryUrl }
            };
            _schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryParams);
            var subject = _schemaRegistry.ConstructValueSubjectName(topic);
            var schema = await _schemaRegistry.GetLatestSchemaAsync(subject);
            Console.WriteLine(schema.SchemaString);

            var spark = SparkSession
                .Builder()
                .AppName("ClickEventConsumer")
                .GetOrCreate();

            var avro = spark
                .ReadStream()
                .Format("kafka")
                .Option("kafka.bootstrap.servers", bootstrapServers)
                .Option("subscribe", topic)
                .Option("startingOffsets", "earliest")
                .Load();

            var something = avro
                .Select(FromAvro(avro["key"], "\"string\"").As("key"),
                        FromAvro(avro["value"], schema.SchemaString).As("value"));

            var console = something
                .WriteStream()
                .Format("console")
                .Start();

            console.AwaitTermination();
            spark.Stop();
        }

        private static Column FromAvro(Column avro, string schema)
        {
            if (avro is null)
            {
                throw new ArgumentNullException(nameof(avro));
            }

            if (string.IsNullOrWhiteSpace(schema))
            {
                throw new ArgumentNullException(nameof(schema));
            }

            return new Column(
                (JvmObjectReference)SparkEnvironment.JvmBridge.CallStaticJavaMethod(
                    "org.apache.spark.sql.avro",
                    "from_avro",
                    avro,
                    schema));
        }
    }
}
```
-
This documentation seems to indicate that I can just pass the schema registry URL into the `from_avro` function:

```csharp
using System;
using Microsoft.Spark.Interop;
using Microsoft.Spark.Interop.Ipc;
using Microsoft.Spark.Sql;

namespace ClickStreamConsumer
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 3)
            {
                Console.Error.WriteLine(
                    "Usage: ClickEventConsumer " +
                    "<bootstrap-servers> <schema-registry-url> <topic>");
                Environment.Exit(1);
            }

            string bootstrapServers = args[0];
            string schemaRegistryUrl = args[1];
            string topic = args[2];

            var spark = SparkSession
                .Builder()
                .AppName("ClickEventConsumer")
                .GetOrCreate();

            var avro = spark
                .ReadStream()
                .Format("kafka")
                .Option("kafka.bootstrap.servers", bootstrapServers)
                .Option("subscribe", topic)
                .Option("startingOffsets", "earliest")
                .Load();

            var something = avro
                .Select(FromAvro(avro["value"], topic + "-value", schemaRegistryUrl).As("value"));

            var console = something
                .WriteStream()
                .Format("console")
                .Start();

            console.AwaitTermination();
            spark.Stop();
        }

        private static Column FromAvro(Column avro, string subject, string schemaRegistryUrl)
        {
            return new Column(
                (JvmObjectReference)SparkEnvironment.JvmBridge.CallStaticJavaMethod(
                    "org.apache.spark.sql.avro",
                    "from_avro",
                    avro,
                    subject,
                    schemaRegistryUrl));
        }
    }
}
```

However, when I do that I just get more cryptic exceptions.
-
I see. Looks like this other version of `from_avro` is only available in Databricks Runtime, not in open-source Spark.
-
I want to be able to do exactly what this is doing, but ideally in .NET instead of Scala.
-
For the repro, how are you running Kafka and the Schema Registry?
-
It's all running locally using Docker Compose. Here's the repo I'm working on. There are a bunch of things in here that are a work in progress. To see the error you're interested in, clone the repo and do the following.
-
Hmm, is it possible to simplify the repro steps so that they don't involve Kafka, etc.? For example, read a file and call `from_avro` on its contents. Also, can you share the full log for the run?
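Something along these lines might work as a Kafka-free repro (a sketch, assuming Spark 3.0+'s `binaryFile` data source, the `FromAvro` helper from earlier, and a hypothetical directory of files that each hold one raw Avro payload):

```csharp
// Sketch: read raw payload files (one record per file) and decode the
// "content" column with the FromAvro interop helper defined earlier.
var df = spark
    .Read()
    .Format("binaryFile")
    .Load("/path/to/avro-payloads");

df.Select(FromAvro(df["content"], schemaJson).As("value")).Show();
```

Here `schemaJson` stands in for the Avro schema as a JSON string.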
-
The Confluent Platform is part of the repro steps. Deserializing Confluent Avro data is slightly different from deserializing vanilla Avro data: regular Avro is just the raw bytes, but Confluent Avro prepends a magic byte and the four-byte ID of the schema in the Schema Registry. So I suspect that even if we figure out how to use `from_avro`, it will still trip over that framing; one possible workaround is sketched below.
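A sketch of one possible workaround (assuming Microsoft.Spark's UDFs accept `byte[]` columns, and reusing the hypothetical `StripConfluentHeader` helper from earlier): strip the five-byte header in a .NET UDF first, then hand the remaining bytes to `from_avro`.

```csharp
// Sketch: strip the Confluent framing in .NET, then decode the remaining
// bytes with the Scala-side from_avro via the FromAvro interop helper.
Func<Column, Column> stripHeader = Udf<byte[], byte[]>(StripConfluentHeader);

var decoded = avro.Select(
    FromAvro(stripHeader(avro["value"]), schema.SchemaString).As("value"));
```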
Full Logs
-
Do you need to fix this?
-
Ah, sort of. The Schema Registry is still spinning up when my app tries to look up the schema. Docker Compose isn't the best at managing dependencies, and my Spark app isn't super robust yet. Just let the Spark app fall down and start it again.
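To make that lookup more robust, a minimal retry sketch (a hypothetical `GetLatestSchemaWithRetryAsync` wrapper around the Confluent client used above, where `GetLatestSchemaAsync` returns a `Schema`; not a library API) could look like this:

```csharp
// Sketch: retry the Schema Registry lookup with a fixed delay while the
// registry container is still starting up, instead of crashing immediately.
private static async Task<Schema> GetLatestSchemaWithRetryAsync(
    ISchemaRegistryClient client, string subject, int maxAttempts = 10)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await client.GetLatestSchemaAsync(subject);
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Assume a transient startup failure; wait and try again.
            await Task.Delay(TimeSpan.FromSeconds(3));
        }
    }
}
```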