Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add managed streaming ingest #344

Merged
merged 60 commits into from
Dec 19, 2021
Merged

Add managed streaming ingest #344

merged 60 commits into from
Dec 19, 2021

Conversation

AsafMah
Copy link
Collaborator

@AsafMah AsafMah commented Oct 31, 2021

This PR Adds ManagedStreaming ingest client.
The behavior is modeled after the c# version in regards to control, parameters and retries.

Features:
-Return values from ingestions that indicate the ingestion type
-Adding a special client request id to ManagedStreamingIngestClient
-Adding a custom client request id to Streaming and ManagedStreamingIngestClient

  • Parsing kusto api errors with KustoApiError/KustoMultiApiError
  • Arranging ingest clients in an hirearchy, to share code and interface
  • Adding "ingest_from_stream" functionallity to non-streaming ingest client

Bug Fixes:

  • Correcting "Missing Mapping" error to work with all types of ingest clients and all types of formats
  • Not compressing binary formats in all ingestions, instead of just streaming
  • Adding missing auth methods to ValidKeywords.parse (to help copying the connection string)
  • Properly close clients on tests

Old description:

This release adds ManagedStreamingIngestClient to python.
ManagedStreamingIngestClient is a client that tries to ingest via Streaming Ingestion, and falls back to Queued after multiple retries or other conditions.


This seemingly simple feature turned out to be more complex since it required feature-parity, re-designing and some bug fixing.

Here are the other additions that this PR adds in order to support ManagedStreamingIngestClient,
Let me know if you want to extract some of them to their own PRs:

Features:
- Parsing kusto api errors with KustoApiError/KustoMultiApiError
- Arranging obtaniers in an hirearchy, to share code and interface
- Adding "ingest_from_stream" functionallity to non-streaming ingest client

Bug Fixes:
- Correcting "Missing Mapping" error to work with all types of ingest clients and all types of formats
- Not compressing binary formats in all ingestions, instead of just streaming
- Adding missing auth methods to ValidKeywords.parse (to help copying the connection string)
- Properly close clients on tests


Open questions (see TODOs in code):
- What to do when the stream isn't seekable?  
The options are:
    1. Raise an exception saying we don't support non-seekable streams
    2. Read the stream into a list and wrap with BytesIO (might use lots of memory)
    3. Send it directly to queued ingest  

    What other SDKs do?
    * C# - No special handling at all, it will try to call seek and throw an exception when not possible
    * Java - Will try to seek, and when it fails it will send a custom user-friendly exception
    * Node - There isn't a concept of seekable streams in node, so we always save it to an array, and retry from that

    What is the correct thing to do here (and maybe change the other sdks to act this way too?)
- What to do on "Kusto.DataNode.Exceptions.StreamingIngestionRequestException"?  
  This exception is raised when the table doesn't support streaming ingestions at all.  
  This is technically a permanent error, but in ManagedStreamingIngestions we can fallback to queued.  
  I implemented this fallback here, but it might not be "correct".
  Other SDKS do not handle this case, and we may want to raise the error for the user.

- Reasons - I added here reasons as to why the ingestion fell back to queued (for example, "Streaming ingestion exceeded maximum retry count, defaulting to queued ingestion").  
These currently don't exist in other sdks, and here are "just strings".   
Do we want them to be in a pretty enum so the user can check against them, or do we not want this support?  
Should we add them to other clients?

- Sleep - In the original C# sdk, we always sleep with backoff on a transient streaming failure.
    In Java we skipped this (don't remember why, think because it's sync), and in node it's there (node is async).  
    It's also here in python, but you can optionally change it or turn it off.  
    We probably want a unified behavior for all of the SDKs, but the question is what.

- Max Retries - Same as sleep, only this one is only configurable in node, we need to decide if we want to accept this from the user.

ArielYehezkely
ArielYehezkely previously approved these changes Dec 6, 2021
ArielYehezkely
ArielYehezkely previously approved these changes Dec 19, 2021
@AsafMah AsafMah merged commit ea79960 into master Dec 19, 2021
@AsafMah AsafMah deleted the add-managed-streaming branch April 24, 2022 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants