@adriangb there are many reasons but I'll try to give a couple examples:
I'm trying to understand where the time goes when writing, which seems to take ~500 ms for me even with small files and a table history of only ~50 commits.
My test code looks as follows:
I'm running a proxy on localhost:8000 so that I can see every request made. For the last write only (the one that's timed), I see:

That's a surprising number of requests. I don't understand why it's necessary to re-read the entire commit history: the table should already be up to date, and at most a single list operation should confirm that. I also don't understand why it's reading data from Parquet files before writing.
All in all, I would have thought that only 4 or 5 requests would be needed:
_delta_log to get the last commit id.

Am I misunderstanding the Delta Lake protocol? Should I be using something lower-level?
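For comparison, here is a minimal sketch of the append path described above. It assumes a local directory stands in for the object store, so each filesystem call roughly corresponds to one request. This is not delta-rs's (or any library's) implementation: the names latest_version and append_commit are hypothetical, the data file is placeholder bytes rather than real Parquet, and the sketch skips work a real writer has to do. In particular, a real writer must check the table's protocol and schema (the protocol and metaData actions). Those live in the log, possibly in a Parquet checkpoint inside _delta_log plus the JSON commits after it. Exclusive create (open mode "x") plays the role of put-if-absent, which is what stops two writers from both claiming the same version.

```python
import json
import os
import time
import uuid

LOG_DIR = "_delta_log"


def latest_version(table_dir: str) -> int:
    """Return the highest JSON commit version from one listing of _delta_log (-1 if none)."""
    versions = [
        int(name[:-5])  # "00000000000000000042.json" -> 42
        for name in os.listdir(os.path.join(table_dir, LOG_DIR))
        if name.endswith(".json") and name[:-5].isdigit()
    ]
    return max(versions, default=-1)


def append_commit(table_dir: str, data: bytes) -> int:
    """Hypothetical simplified append: write one data file, then publish it in a new commit.

    `data` stands in for an already-encoded Parquet file; protocol/schema
    checks, conflict detection and checkpoints are deliberately skipped.
    """
    # ~1 LIST request on an object store: find the next version number.
    version = latest_version(table_dir) + 1

    # ~1 PUT: write the data file under a unique name (not yet visible to readers).
    data_name = f"part-{uuid.uuid4()}.parquet"
    with open(os.path.join(table_dir, data_name), "wb") as f:
        f.write(data)

    now_ms = int(time.time() * 1000)
    actions = [
        {"add": {
            "path": data_name,
            "partitionValues": {},
            "size": len(data),
            "modificationTime": now_ms,
            "dataChange": True,
        }},
        {"commitInfo": {"operation": "WRITE", "timestamp": now_ms}},
    ]

    # ~1 conditional PUT: create the commit file only if it does not exist yet.
    # If another writer already took this version, open() raises FileExistsError.
    commit_path = os.path.join(table_dir, LOG_DIR, f"{version:020d}.json")
    with open(commit_path, "x") as f:
        f.write("\n".join(json.dumps(a) for a in actions) + "\n")
    return version
```

If the exclusive create fails, the data file written in the previous step is left orphaned; a real writer would re-read the newer commits, check them for conflicts, and retry with the next version number.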