I have some data saved in .xlsx documents (~ 10,500 rows, 4.3 MB in total) that I need to ingest into an on-disk SurrealDB instance. This is my current method:
The last await statement takes 1 minute to complete on its own (I'm using VS Code / Jupyter, which displays the runtimes for each cell).
The Problem/Question: this seems pretty slow to me. Is there a better way to do this in Python (trying to get around the xy problem here)? Or is SurrealDB / SurrealDB.py just not that fast?
Describe the solution
If the solution for this kind of problem is batching with db.query for fewer round trips, then it would be nice to include some kind of batching functionality in db.create instead. The create function could accept an array of dicts and a kwarg specifying the batch size.
If that's not the issue, then I don't know.
Alternative methods
Unknown.
SurrealDB version
1.5.1 for linux on x86_64
surrealdb.py version
surrealdb 0.3.2 for linux on x86_64 using Python 3.10.12
I did try using db.query to send the data all at once, but the query was too large. I haven't tried breaking up that large command yet, but that might also be a good option.
If that does end up working, it might be nice to include some kind of batching functionality in the insert function itself (i.e. maybe allow arrays to be passed to insert). That's the feature request part of this issue.
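The "break up that large command" idea could be sketched roughly as below. This is only an illustration, not part of surrealdb.py: `chunked`, `insert_in_batches`, and the `send_batch` callable are hypothetical names, and `send_batch` stands in for whatever coroutine actually sends one batch to the database (for example a wrapper around `db.query`). The default batch size of 500 is an arbitrary assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterator, Sequence

Row = dict[str, Any]


def chunked(rows: Sequence[Row], batch_size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of `rows` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


async def insert_in_batches(
    send_batch: Callable[[Sequence[Row]], Awaitable[Any]],  # hypothetical sender
    rows: Sequence[Row],
    batch_size: int = 500,  # arbitrary default, tune for the payload size
) -> int:
    """Send `rows` one batch at a time and return the number of round trips."""
    round_trips = 0
    for batch in chunked(rows, batch_size):
        await send_batch(batch)  # one request per batch instead of one per row
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    async def print_batch(batch: Sequence[Row]) -> None:
        # Stand-in for a real database call; it only reports the batch size.
        print(f"would send {len(batch)} rows")

    demo_rows = [{"n": i} for i in range(10)]
    asyncio.run(insert_in_batches(print_batch, demo_rows, batch_size=4))
```

With roughly 10,500 rows and a batch size of 500, this would make 21 requests rather than one per row. Whether that actually removes the bottleneck would still need to be measured.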
You can use asyncio.gather to launch all the requests at the same time (see the asyncio docs).
Another approach would be to create a huge string with all the queries, and send it all at once, maybe in a transaction?
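A minimal sketch of the "one big string in a transaction" idea follows. `build_transaction` is a hypothetical helper, not a surrealdb.py function. It assumes that the JSON produced by `json.dumps` is acceptable as a SurrealQL object literal, which should hold for plain strings, numbers, booleans, lists, and nested dicts but has not been checked for other types. The table name is validated because it is interpolated directly into the query.

```python
import json
from typing import Any, Iterable


def build_transaction(table: str, rows: Iterable[dict[str, Any]]) -> str:
    """Return one SurrealQL string that creates every row inside a transaction."""
    if not table.isidentifier():
        # The table name is pasted into the query text, so reject anything odd.
        raise ValueError(f"unsafe table name: {table!r}")
    statements = ["BEGIN TRANSACTION;"]
    for row in rows:
        # Assumption: json.dumps output is valid as a SurrealQL object literal.
        statements.append(f"CREATE {table} CONTENT {json.dumps(row)};")
    statements.append("COMMIT TRANSACTION;")
    return "\n".join(statements)
```

The resulting string could then be passed to `db.query`. Since a single query holding every row was reported as too large, `rows` would probably need to be a slice of the full data set rather than all of it.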
I was trying the gather() for batch querying some data that was too big for a single request:
tasks = [db.query(f"select * from {id}") for id in ids]
out = await asyncio.gather(*tasks)
However, I'm getting a RuntimeError: cannot call recv while another coroutine is already waiting for the next message -- I haven't dug into it yet, but maybe someone here knows what the issue might be.
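One possible explanation, offered as an assumption rather than a confirmed diagnosis: all the gathered `db.query` calls share a single websocket connection, and the underlying websocket library refuses to let two coroutines wait on `recv` at once. If so, a workaround is to keep `asyncio.gather` but guard the connection with an `asyncio.Lock`. In the sketch below, `run_serialized` is a hypothetical helper and `run_query` stands in for `db.query`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def run_serialized(
    run_query: Callable[[str], Awaitable[Any]],  # e.g. db.query (assumption)
    queries: Sequence[str],
) -> list[Any]:
    """Run queries via gather while only one uses the connection at a time."""
    lock = asyncio.Lock()

    async def guarded(query: str) -> Any:
        async with lock:  # only one send/recv pair in flight on the shared socket
            return await run_query(query)

    # gather returns the results in the same order as `queries`.
    return await asyncio.gather(*(guarded(q) for q in queries))
```

Because the lock serializes the calls, this should avoid the error but is unlikely to be faster than a plain `for` loop with `await`. Real concurrency would presumably need several independent connections.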