Warning spam: Unexpected field name numRecords for remove action #614

Closed
Tom-Newton opened this issue May 30, 2022 · 3 comments · Fixed by #651
Labels
bug Something isn't working

Comments

Tom-Newton commented May 30, 2022

Environment

Delta-rs version: 0.5.7

Binding: Python

Environment:

  • Cloud provider: Azure
  • OS: Ubuntu 18.04
  • Other: Python 3.8

Bug

What happened:
We get an enormous flood of warning logs when opening the table, for example:

[2022-05-30T18:54:37Z WARN  deltalake::action] Unexpected field name `numRecords` for remove action: Row { fields: [("path", Str("part-00031-24527240-103f-4850-b36a-96523433d62d-c000.snappy.parquet")), ("deletionTimestamp", Long(1653935751443)), ("dataChange", Bool(false)), ("extendedFileMetadata", Bool(true)), ("partitionValues", MapInternal(Map { entries: [] })), ("size", Long(1444)), ("tags", Null), ("numRecords", Null)] }

It looks like I get one warning log for every remove action in the transaction log of my table.

What you expected to happen:
The table opens without logging these warnings, which I'm pretty sure are unimportant.

How to reproduce it:

  1. Use delta-spark 1.2.1
  2. Create a test delta table and apply some random modifications until you've made enough transactions to create a checkpoint.
  3. Try to create a deltalake.DeltaTable object from that table.
  4. You should see a bunch of warnings like the one I included above.

Python script I created to reproduce:
reproduce_deltalake_warnings.zip
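
The attached script is not reproduced here. As a rough sketch of the steps above, assuming a local Spark session with delta-spark 1.2.1 and the default checkpoint interval of 10 commits, something like the following should produce a checkpoint that contains remove actions. The function name, the temporary path, and the commit count are illustrative, and repeated overwrites are only one assumed way of generating remove actions.

```python
import tempfile

# Delta writes a checkpoint every 10 commits by default.
DEFAULT_CHECKPOINT_INTERVAL = 10


def reproduce_warnings(num_commits: int = DEFAULT_CHECKPOINT_INTERVAL + 2) -> None:
    """Build a small Delta table with Spark, then open it with deltalake."""
    # Imported lazily so this module loads even where Spark is not installed.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession
    from deltalake import DeltaTable

    builder = (
        SparkSession.builder.appName("numRecords-repro")
        .master("local[1]")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config(
            "spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        )
    )
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    path = tempfile.mkdtemp()  # illustrative location for the test table
    for i in range(num_commits):
        # Each overwrite removes the previous files, so remove actions pile up.
        spark.range(i, i + 5).write.format("delta").mode("overwrite").save(path)
    spark.stop()

    # Opening the table reads the checkpoint; the warnings should appear here.
    DeltaTable(path)
```

Calling `reproduce_warnings()` in an environment with both packages installed should be enough to check whether the warnings appear.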

More details:
I think a newer version of delta-spark introduced the numRecords field, so I guess delta-rs needs to be updated to handle it. It would be really nice to have a fix for this 🙂. For large tables with long history retention, the volume of warnings is pretty unmanageable.
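
To illustrate the kind of handling meant here, the following is a minimal Python sketch, not delta-rs's actual Rust code. It treats `numRecords` as a known optional field and warns at most once per unknown field name rather than once per remove action. The `RemoveAction` class, `parse_remove` function, and field set are hypothetical and are taken only from the log line above.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Set

# Field names seen in the warning above; numRecords is accepted as optional.
KNOWN_REMOVE_FIELDS = {
    "path", "deletionTimestamp", "dataChange", "extendedFileMetadata",
    "partitionValues", "size", "tags", "numRecords",
}


@dataclass
class RemoveAction:
    path: str
    deletion_timestamp: Optional[int] = None
    data_change: bool = True
    extended_file_metadata: Optional[bool] = None
    partition_values: Dict[str, Optional[str]] = field(default_factory=dict)
    size: Optional[int] = None
    tags: Optional[Dict[str, str]] = None
    num_records: Optional[int] = None


# Unknown field names already reported, so each is warned about only once.
_already_warned: Set[str] = set()


def parse_remove(row: Dict[str, Any]) -> RemoveAction:
    """Parse one remove action, tolerating fields this reader does not know."""
    for name in row:
        if name not in KNOWN_REMOVE_FIELDS and name not in _already_warned:
            _already_warned.add(name)
            warnings.warn(f"Unexpected field name `{name}` for remove action")
    return RemoveAction(
        path=row["path"],
        deletion_timestamp=row.get("deletionTimestamp"),
        data_change=row.get("dataChange", True),
        extended_file_metadata=row.get("extendedFileMetadata"),
        partition_values=row.get("partitionValues") or {},
        size=row.get("size"),
        tags=row.get("tags"),
        num_records=row.get("numRecords"),
    )


# The row from the warning above now parses without any warning.
action = parse_remove({
    "path": "part-00031-24527240-103f-4850-b36a-96523433d62d-c000.snappy.parquet",
    "deletionTimestamp": 1653935751443,
    "dataChange": False,
    "extendedFileMetadata": True,
    "partitionValues": {},
    "size": 1444,
    "tags": None,
    "numRecords": None,
})
```

Deduplicating by field name keeps the log readable if some future writer adds yet another field, which is the situation this issue describes.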

I have put a hack in my code which sets the env var RUST_LOG="error" to work around this for now.
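
A minimal sketch of that workaround, assuming the Rust logger reads `RUST_LOG` when the `deltalake` module is first imported, so the variable has to be set before the import. The helper name and the table path are hypothetical.

```python
import os


def silence_delta_rs_warnings(level: str = "error") -> None:
    """Set RUST_LOG so that only messages at `level` or above are logged."""
    os.environ["RUST_LOG"] = level


silence_delta_rs_warnings()

# Import deltalake only after the variable is set (assumption: the logger
# is configured from the environment at import time).
# from deltalake import DeltaTable
# table = DeltaTable("path/to/table")  # hypothetical path
```

Note that this also hides any other warnings from delta-rs, so it is a stopgap rather than a fix.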

houqp commented May 30, 2022

Looks like this was introduced in delta-io/delta@6f39630.

@kamcheungting-db, @scottsand-db should we update the official spec to reflect this change?

@scottsand-db

Hi @houqp - yes, that would be great. Want to make a PR to update https://github.com/delta-io/delta/blob/master/PROTOCOL.md?

@Tom-Newton
Contributor Author

I'm pretty keen to get this sorted, so I'm going to have a go at making a PR for it. I've never used Rust before, so we'll see how it goes...
