-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MongoDB/PyMongo: Amalgamate PyMongo driver to use CrateDB as backend #83
Conversation
27d19ad
to
be8f833
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #83 +/- ##
===========================================
+ Coverage 69.35% 82.01% +12.65%
===========================================
Files 73 81 +8
Lines 2826 3247 +421
===========================================
+ Hits 1960 2663 +703
+ Misses 866 584 -282
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
187a3d3
to
c7f724c
Compare
## Synopsis | ||
```python | ||
import pymongo | ||
from cratedb_toolkit.adapter.pymongo import PyMongoCrateDbAdapter | ||
|
||
with PyMongoCrateDbAdapter(dburi="crate://crate@localhost:4200"): | ||
client = pymongo.MongoClient("localhost", 27017) | ||
collection = client.foo.bar | ||
|
||
inserted_id = collection.insert_one({"author": "Mike", "text": "My first blog post!"}).inserted_id | ||
print(inserted_id) | ||
|
||
document = collection.find_one({"author": "Mike"}) | ||
print(document) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the "naming things" aspect: I am still not sure whether to say PyMongoCrateDbAdapter
or PyMongoCrateDBAdapter
. What do you think is better?
On another recent occasion, we used the name CrateDBTestAdapter
. So, if we agree to always use CrateDB with capitalized DB, this spot will need to be adjusted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I like PyMongoCrateDBAdapter
slightly better, contrary to what I've initially written down. But iirc, @matriv has a different stance on this. Maybe because he is used to Java naming conventions, being somewhat stricter on this topic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep let's go with DB
both captitals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Updated with 6d2730a.
## Examples | ||
|
||
To inspect the capabilities of the driver adapter, there is a basic program at | ||
[pymongo_adapter.py], and test cases at [test_pymongo.py]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with 592442e.
d95cedb
to
592442e
Compare
Setting the CrateDB column policy to `dynamic` means that new columns can be added without needing to explicitly change the table definition by running corresponding `ALTER TABLE` statements. -- https://cratedb.com/docs/crate/reference/en/latest/general/ddl/column-policy.html#dynamic
2b76fe7
to
c925151
Compare
def patch_types_map(): | ||
""" | ||
Register missing timestamp data type. | ||
""" | ||
# TODO: Submit patch to `crate-python`. | ||
TYPES_MAP["timestamp without time zone"] = sqltypes.TIMESTAMP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch should be submitted to sqlalchemy-cratedb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a corresponding note to the backlog.
cratedb_toolkit/util/pandas.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those patches might also be submitted to sqlalchemy-cratedb
in one way or another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a corresponding note to the backlog.
def adjust_sqlalchemy(self): | ||
""" | ||
Configure CrateDB SQLAlchemy dialect. | ||
|
||
Setting the CrateDB column policy to `dynamic` means that new columns | ||
can be added without needing to explicitly change the table definition | ||
by running corresponding `ALTER TABLE` statements. | ||
|
||
https://cratedb.com/docs/crate/reference/en/latest/general/ddl/column-policy.html#dynamic | ||
""" | ||
# 1. Patch data types for CrateDB dialect. | ||
# TODO: Upstream to `sqlalchemy-cratedb`. | ||
patch_types_map() | ||
|
||
# 2. Prepare pandas. | ||
# TODO: Provide unpatching hook. | ||
# TODO: Use `with table_kwargs(...)`. | ||
from cratedb_toolkit.util.pandas import patch_pandas_sqltable_with_dialect_parameters | ||
|
||
patch_pandas_sqltable_with_dialect_parameters(table_kwargs={"crate_column_policy": "'dynamic'"}) | ||
patch_pandas_sqltable_with_extended_mapping() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be generalized / upstreamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a corresponding note to the backlog.
def insert_returning_id(pd_table, conn, keys, data_iter): | ||
""" | ||
Use CrateDB's "bulk operations" endpoint as a fast path for pandas' and Dask's `to_sql()` [1] method. | ||
|
||
The idea is to break out of SQLAlchemy, compile the insert statement, and use the raw | ||
DBAPI connection client, in order to be able to amend the SQL statement, adding a | ||
`RETURNING _id` clause. | ||
|
||
The vanilla implementation, used by SQLAlchemy, is:: | ||
|
||
data = [dict(zip(keys, row)) for row in data_iter] | ||
conn.execute(pd_table.table.insert(), data) | ||
""" | ||
nonlocal object_id_cratedb | ||
|
||
# Compile SQL statement and materialize batch. | ||
sql = str(pd_table.table.insert().compile(bind=conn)) | ||
data = list(data_iter) | ||
|
||
# Invoke amended insert operation, returning the record | ||
# identifier as surrogate to MongoDB's `ObjectId`. | ||
cursor = conn._dbapi_connection.cursor() | ||
cursor.execute(sql=sql + " RETURNING _id", parameters=data[0]) | ||
outcome = cursor.fetchone() | ||
object_id_cratedb = outcome[0] | ||
cursor.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to be upstreamed like sqlalchemy-cratedb
's insert_bulk
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a corresponding note to the backlog.
With corresponding improvements, the amalgamated PyMongo driver can now run 95% of the MongoDB "getting started" tutorial successfully.
It needs to balance SQLAlchemy 1.x vs. 2.x throughout the toolkit test cases, because JessiQL still uses SQLAlchemy 1.x.
Use `PyMongoCrateDBAdapter`, with capitalized "DB".
About
This patch amalgamates the PyMongo driver to talk to CrateDB instead of a MongoDB server.
Documentation
Preview: https://cratedb-toolkit--83.org.readthedocs.build/adapter/pymongo.html
Synopsis
Status
It can run 95% of the MongoDB "getting started" tutorial successfully.
-- https://pymongo.readthedocs.io/en/stable/tutorial.html
-- test_pymongo.py
Backlog