-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Dremio integration #135
Conversation
sassmith
commented
Aug 15, 2024
- Add support for Dremio integration
- use Pyarrow FlightClient to execute and fetch results
- Create a description object from the Pyarrow FlightStreamReader schema to return to the collector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks great Sam, left a few minor comments, let me know if you have any questions/comments
"rowcount": len(records), | ||
} | ||
|
||
def _set_description(self, schema: Schema) -> List[List[Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we name this method _get_dbapi_description
? the "_set" part is confusing to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think you can convert this method to return List[Tuple]
, see below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good on the rename and List[Tuple] 👍
|
||
def __init__(self, credentials: Optional[Dict], **kwargs: Any): | ||
super().__init__(connection_type="dremio") | ||
if not credentials or _ATTR_CONNECT_ARGS not in credentials: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connect_args
is intended to be used when we're passing everything in connect_args to a connect method like connect(**connect_args)
, and the idea is that we can pass new parameters without having to update the agent code
in this case it seems you're getting only host
and token
from connect_args, so no need to use it, you can check the code for Looker
where we're getting client_id
and client_secret
directly from credentials
Also, in this case, if flight.connect
accepts more parameters than host, even when we're not using them now, might be good to keep connect_args and use flight.connect(**connect_args)
and specify token
in a separate property in credentials.
makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes total sense! implemented those changes
return "unknown" | ||
|
||
return [ | ||
[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can return a tuple here using ()
instead of []
which is more dbapi compliant and will be converted automatically to a tuple for serialization anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, left a minor comment
if not credentials or _ATTR_CONNECT_ARGS not in credentials: | ||
raise ValueError( | ||
f"Dremio agent client requires {_ATTR_CONNECT_ARGS} in credentials" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if token
is required maybe we should add a validation here too?