[FEATURE REQUEST]: Support for sql-spark-connector #611
Replies: 9 comments
-
@rrekapalli this should just work. Make sure to pass the jar file for the connector when you run `spark-submit`.
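A minimal sketch of what passing the connector jar might look like. The Maven coordinates, jar versions, and app name below are assumptions for illustration, not taken from this thread; check the connector's README for the coordinates matching your Spark and Scala versions.

```shell
# Sketch: submit a .NET for Spark app with the SQL Spark connector pulled in
# via --packages. Coordinates/versions are assumptions; adjust for your setup.
spark-submit \
  --packages com.microsoft.azure:spark-mssql-connector_2.12:1.2.0 \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  microsoft-spark-3-1_2.12-2.1.1.jar \
  dotnet MySparkApp.dll
```

Using `--packages` (rather than `--jars` with a single local jar) lets Spark's Ivy resolver fetch the connector's transitive dependencies as well.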
-
Hi @imback82, thank you for the quick reply. I tried to write to the database with the below `spark-submit` command but got the following error: `20/08/05 16:26:52 ERROR Executor: Exception in task 6.0 in stage 17.0 (TID 1016)`. Appreciate it if you could throw some light on this.
-
Looks like you are missing dependencies. Can you try with the following?
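When a lone connector jar fails with missing classes, a common fix is to let `spark-submit` resolve the connector and the SQL Server JDBC driver from Maven so their transitive dependencies come along. A hedged sketch (the coordinates and versions are assumptions, not from this thread):

```shell
# Sketch: pull both the connector and the Microsoft JDBC driver from Maven
# Central so Ivy resolves their transitive dependencies too. Versions are
# assumptions; match them to your Spark/Scala/SQL Server setup.
spark-submit \
  --packages com.microsoft.azure:spark-mssql-connector_2.12:1.2.0,com.microsoft.sqlserver:mssql-jdbc:8.2.2.jre8 \
  <rest of your spark-submit arguments>
```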
-
Still no luck! When I tried with `azure-sqldb-spark:1.0.1.jar` I got dependency errors, so I tried the following `spark-submit` command instead. Output error message:
Ivy Default Cache set to: C:\Users\rajar.ivy2\cache
:: problems summary ::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
-
Also, tried the following `spark-submit` command with no luck. The error message:
Ivy Default Cache set to: C:\Users\rajar.ivy2\cache
-
My intention in using this driver is to speed up large-volume data transfer between Spark and SQL Server. I would appreciate it if you could suggest an alternative (`DataStreamWriter` or something) if this is not the correct approach (we can't replace SQL Server for our use case).
-
Can you create an issue in https://github.com/Azure/azure-sqldb-spark for the dependency issue? You should be able to repro it there. I am not familiar with benchmarks for different SQL connectors. @rapoth, do you happen to know? Basically, you want to write the results to SQL Server, right?
-
Sure, I'll create an issue there. Thank you. Also, that's correct: I am trying to both read and write a large volume of data to/from SQL Server. I was able to do that successfully with the MSSQL JDBC driver (`mssql-jdbc-8.2.2.jre8.jar`). However, since the data is large, it was taking a long time to write. I came across this connector, which supports bulk operations, while looking for a better alternative. @rapoth, could you please suggest the best alternative/approach for large data reads and writes from SQL Server using .NET for Spark? Thanks.
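For reference, this is roughly what the bulk-copy write path looks like through the connector from PySpark (the connector is documented for Scala and Python). A hedged sketch: the server, database, table, credentials, and input path below are all placeholders, and it assumes the connector jar is already on the classpath.

```python
# Sketch: write a DataFrame via the sql-spark-connector data source
# ("com.microsoft.sqlserver.jdbc.spark"), which uses SQL Server bulk copy
# instead of row-by-row INSERTs. All connection values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bulk-write-sketch").getOrCreate()
df = spark.read.parquet("/data/input")  # placeholder input

(df.write
   .format("com.microsoft.sqlserver.jdbc.spark")
   .mode("append")
   .option("url", "jdbc:sqlserver://myserver:1433;databaseName=mydb")
   .option("dbtable", "dbo.MyTable")
   .option("user", "username")
   .option("password", "password")
   .option("tableLock", "true")    # connector option: bulk insert under a table lock
   .option("batchsize", "100000")  # tune for your workload
   .save())
```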
-
@rrekapalli, I know it has been 4 years, but did you ever figure out how to use the sql-spark-connector and how to do bulk inserts with it? We have also noticed that dotnet spark using the default `com.microsoft.sqlserver.jdbc.SQLServerDriver` driver inserts one row at a time, which is incredibly inefficient. It seems that dotnet spark may have lost support from Microsoft in general, so maybe it would be best to stop using it, but I still have hope lol.
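For what it's worth, even without the connector, the stock Spark JDBC sink can send rows in batches rather than one `INSERT` per row via the standard `batchsize` option. A hedged sketch with placeholder connection details, given a DataFrame `df`:

```python
# Sketch: batched inserts over plain JDBC (no connector) using Spark's
# standard "batchsize" JDBC writer option. Connection values are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-batch-sketch").getOrCreate()
df = spark.read.parquet("/data/input")  # placeholder input

(df.write
   .format("jdbc")
   .mode("append")
   .option("url", "jdbc:sqlserver://myserver:1433;databaseName=mydb")
   .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
   .option("dbtable", "dbo.MyTable")
   .option("user", "username")
   .option("password", "password")
   .option("batchsize", "10000")  # Spark's default is 1000
   .save())
```

This won't match the connector's bulk-copy throughput, but it is usually a large improvement over row-at-a-time inserts.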
-
Can we expect support for sql-spark-connector (https://github.com/microsoft/sql-spark-connector)? Currently it's only available for Scala and Python. It would be a great addition if it were directly implemented in this library.