Process fails without error message when executing merge #2310

Closed
cesar-vermeulen opened this issue Mar 21, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@cesar-vermeulen

cesar-vermeulen commented Mar 21, 2024

Environment

Delta-rs version:
0.16.1
Binding:
Python
Environment:

  • Cloud provider: Azure
  • OS: Docker image on kubernetes, running Debian GNU/Linux 11 (bullseye)

Bug

What happened:

While executing the following code, the pod crashes without any error message:

  from deltalake import DeltaTable

  # Open the existing target table and define the merge against the pandas DataFrame `df`
  dt = DeltaTable(full_path)
  merge_definition = dt.merge(
      source=df,
      predicate=construct_merge_statement_predicates(primary_keys),
      source_alias="source",
      target_alias="target",
  )

  # Add the update/insert clauses enabled in the write configuration
  if write_method.merge_mode["update_when_matched"]:
      merge_definition = merge_definition.when_matched_update_all()
  if write_method.merge_mode["insert_when_not_matched"]:
      merge_definition = merge_definition.when_not_matched_insert_all()
  LOGGER.info("Starting merge")
  # The pod crashes during this call, before "Done merging" is logged
  merge_execution = merge_definition.execute()
  LOGGER.info("Done merging")

The code reaches the merge statement, but the process then crashes after only 10 seconds without emitting any error message in the Python console.
This table exists in 2 schemas in our source database, and both versions are converted to the same pandas schema. The code runs successfully on a daily basis for one table but fails for the other, although both have roughly the same number of rows (source ~1.5m rows, target ~20k rows). Both update-when-matched and insert-when-not-matched are set to true in both scenarios.

It appears to me that the process crashes in the Rust backend, but I have no idea how to debug this further.

What you expected to happen:
The merge executes successfully.
How to reproduce it:

/
More details:

  • There are no concurrent writes
  • First 6 merges (in this case) have succeeded
  • Pandas dataframes are being used
@cesar-vermeulen cesar-vermeulen added the bug Something isn't working label Mar 21, 2024
@ion-elgreco
Collaborator

@cesar-vermeulen are you not simply running out of memory (OOM)?

Set the environment variable RUST_LOG=debug and check the logs for anything suspicious.
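
As a rough illustration of that suggestion, the sketch below sets RUST_LOG from Python and reports the peak memory of the process, which can help confirm or rule out an out-of-memory kill. It assumes the variable has to be set before `deltalake` is imported so the Rust logger picks it up; `peak_rss_mib` is a hypothetical helper, not part of delta-rs, and it relies on the Unix-only `resource` module.

```python
import os
import resource
import sys

# Assumption: set RUST_LOG before `import deltalake`, since the Rust side
# most likely reads it once when the extension module is initialised.
os.environ.setdefault("RUST_LOG", "debug")


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB (hypothetical helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Log this right before the merge and compare it with the pod's memory limit.
    print(f"Peak RSS so far: {peak_rss_mib():.1f} MiB")
```

If the value logged just before `execute()` is already close to the container's memory limit, the silent crash is likely the kernel or Kubernetes killing the pod; in that case the pod status should also show an OOMKilled reason.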

@ion-elgreco ion-elgreco closed this as not planned Mar 21, 2024