Support streaming ingest #222
Same issue as #217.
@epa095, as a workaround, if this is OK for your specific case, you can set the ingestion batching policy at the database level.
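For context, a minimal sketch of this workaround, assuming the database-level policy is changed with a control command run through the azure-kusto-java data client from Scala (class and package names may differ between client versions); the cluster URL, database name, credentials, and policy values are placeholders:

```scala
import com.microsoft.azure.kusto.data.{Client, ClientFactory, ConnectionStringBuilder}

// Placeholder cluster URL, database name, and AAD app credentials.
val csb = ConnectionStringBuilder.createWithAadApplicationCredentials(
  "https://mycluster.westeurope.kusto.windows.net",
  "<appId>", "<appKey>", "<tenantId>")
val client: Client = ClientFactory.createClient(csb)

// Tighten the database-level ingestion batching policy so queued ingestion
// seals a batch after at most 10 seconds instead of the ~5 minute default.
// The numeric limits below are illustrative, not recommendations.
val command =
  """.alter database MyDatabase policy ingestionbatching @'{"MaximumBatchingTimeSpan": "00:00:10", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'"""
client.execute("MyDatabase", command)
```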
@yogilad: I have changed the policy at the database level, both enabling the streaming ingestion policy and setting the ingestion batching policy's MaximumBatchingTimeSpan to 10 seconds. By setting KustoSinkOptions.KUSTO_SPARK_INGESTION_PROPERTIES_JSON.[flushImmediately] to true, I get the following timings from Spark using this sink:

    "durationMs" : {
      "addBatch" : 4826,
      "getBatch" : 32,
      "latestOffset" : 30,
      "queryPlanning" : 5,
      "triggerExecution" : 5093,
      "walCommit" : 111
    },

So writing the batch takes roughly 4.5 seconds. That is of course much better than 5 minutes, so great :-D BUT I am still under the impression that streaming ingestion is a completely different ingestion mode for Kusto, and that we should be able to expect "latency of less than a second [...]". And I see that the Python Kusto client has special handling for streaming. So, does this connector support writing to the streaming ingestion endpoint? If it does, what is the "magic toggle"? Or is it just kind of "automagically" enabled if I enable streaming ingestion on the database and set
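For reference, a minimal sketch of the write path being timed above, assuming a Spark Structured Streaming job writing through this connector. The cluster, database, table, and credential values are placeholders; the option names come from the connector's KustoSinkOptions and may differ between connector versions. flushImmediately is passed via the KUSTO_SPARK_INGESTION_PROPERTIES_JSON option mentioned above:

```scala
import com.microsoft.kusto.spark.datasink.KustoSinkOptions
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("kusto-streaming-sink").getOrCreate()

// Placeholder source; any streaming DataFrame whose schema matches the target table works here.
val events = spark.readStream.format("rate").load()

// Ingestion properties handed to the sink; flushImmediately asks the service
// not to aggregate this upload with others before ingesting it.
val ingestionProps = """{"flushImmediately": true}"""

val query = events.writeStream
  .format("com.microsoft.kusto.spark.datasink.KustoSinkProvider")
  .option(KustoSinkOptions.KUSTO_CLUSTER, "<cluster-name>")           // placeholder
  .option(KustoSinkOptions.KUSTO_DATABASE, "<database-name>")         // placeholder
  .option(KustoSinkOptions.KUSTO_TABLE, "<table-name>")               // placeholder
  .option(KustoSinkOptions.KUSTO_AAD_APP_ID, "<appId>")               // placeholder
  .option(KustoSinkOptions.KUSTO_AAD_APP_SECRET, "<appKey>")          // placeholder
  .option(KustoSinkOptions.KUSTO_AAD_AUTHORITY_ID, "<tenantId>")      // placeholder
  .option(KustoSinkOptions.KUSTO_SPARK_INGESTION_PROPERTIES_JSON, ingestionProps)
  .option("checkpointLocation", "/tmp/kusto-checkpoint")
  .trigger(Trigger.ProcessingTime("5 seconds"))
  .start()
```

As the maintainers explain below, this path still goes through queued (batched) ingestion, which is why the batching policy and flushImmediately set the latency floor observed here.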
The Spark Connector does not currently support streaming ingestion, though we may add this in the future. Thanks for the suggestion!
Just for clarification, there are other solutions that could be used here: you can create the blobs yourself and set up an Event Grid connection to the cluster with streaming ingest.
The current implementation is now in good shape for streaming support, but the argument for not implementing it is to avoid incorrect usage with Spark streaming.
Hi @ohadbitt, can you explain what you mean by "wrong usage with Spark streaming"? And regarding the new version: if I understood the documentation correctly, with table ingestion batching policies you get a minimum latency of 10 seconds. This is not sufficient for time-critical streaming use cases (like ours), so it would be great to have a way to use the streaming ingestion feature of Data Explorer directly through the Spark connector.
I am looking into how low a latency I can get between Spark and ADX, and I see that one way to get lower latency is to enable the streaming ingestion policy on both the ADX cluster and the database/table. But after enabling it, I still see that it takes 5 minutes for my batches from Spark (through this connector) to arrive.
The documentation only mentions the ingestion batching policy, which makes me think that maybe this connector does not support streaming ingestion into ADX? If so, it would be very nice (and natural) if it started supporting it. Maybe this new feature in azure-kusto-python makes it easier?
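For reference, a minimal sketch of those enablement steps as control commands run through the azure-kusto-java data client from Scala (class and package names may differ between client versions; cluster, database, table, and credential values are placeholders). Enabling streaming ingestion on the cluster itself is a separate step done through the Azure portal or an ARM template:

```scala
import com.microsoft.azure.kusto.data.{ClientFactory, ConnectionStringBuilder}

// Placeholder cluster URL, database/table names, and AAD app credentials.
val csb = ConnectionStringBuilder.createWithAadApplicationCredentials(
  "https://mycluster.westeurope.kusto.windows.net",
  "<appId>", "<appKey>", "<tenantId>")
val client = ClientFactory.createClient(csb)

// Opt the database and table in to streaming ingestion. The cluster-level
// streaming ingestion setting must already be turned on (portal/ARM).
client.execute("MyDatabase", ".alter database MyDatabase policy streamingingestion enable")
client.execute("MyDatabase", ".alter table MyTable policy streamingingestion enable")
```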