TableMerger: when_matched_delete() fails when column names contain special characters #2438
Labels: bug
Comments
I forgot to add the error message:
---------------------------------------------------------------------------
DeltaError Traceback (most recent call last)
Cell In[6], line 16
6 dt = DeltaTable("tmp")
7 new_data = pa.table({"x": [2, 3]})
9 (
10 dt.merge(
11 source=new_data,
12 predicate='target.x = source.x',
13 source_alias='source',
14 target_alias='target')
15 .when_matched_delete()
---> 16 .execute()
17 )
File c:\...\.venv\Lib\site-packages\deltalake\table.py:1778, in TableMerger.execute(self)
1772 def execute(self) -> Dict[str, Any]:
1773 """Executes `MERGE` with the previously provided settings in Rust with Apache Datafusion query engine.
1774
1775 Returns:
1776 Dict: metrics
1777 """
-> 1778 metrics = self.table._table.merge_execute(
1779 source=self.source,
1780 predicate=self.predicate,
1781 source_alias=self.source_alias,
1782 target_alias=self.target_alias,
1783 safe_cast=self.safe_cast,
1784 writer_properties=self.writer_properties._to_dict()
1785 if self.writer_properties
1786 else None,
1787 custom_metadata=self.custom_metadata,
1788 matched_update_updates=self.matched_update_updates,
1789 matched_update_predicate=self.matched_update_predicate,
1790 matched_delete_predicate=self.matched_delete_predicate,
1791 matched_delete_all=self.matched_delete_all,
1792 not_matched_insert_updates=self.not_matched_insert_updates,
1793 not_matched_insert_predicate=self.not_matched_insert_predicate,
1794 not_matched_by_source_update_updates=self.not_matched_by_source_update_updates,
1795 not_matched_by_source_update_predicate=self.not_matched_by_source_update_predicate,
1796 not_matched_by_source_delete_predicate=self.not_matched_by_source_delete_predicate,
1797 not_matched_by_source_delete_all=self.not_matched_by_source_delete_all,
1798 )
1799 self.table.update_incremental()
1800 return json.loads(metrics)
DeltaError: Generic DeltaTable error: Schema error: No field named __delta_rs_c_y. Valid fields are source.x, __delta_rs_source, target.x, target."y--1", target.__delta_rs_path, __delta_rs_target, __delta_rs_operation, __delta_rs_c_x, "__delta_rs_c_y--1", __delta_rs_delete, __delta_rs_target_insert, __delta_rs_target_update, __delta_rs_target_delete, __delta_rs_target_copy.
Blajda pushed a commit that referenced this issue on Apr 23, 2024:
…2441)
# Description
@Blajda I don't think `from_qualified_name_ignore_case` was needed here, since the delta_fields don't have relation information; they are just the column names. `from_qualified_name_ignore_case` tries to parse `__delta_rs_c_y--1` and turns it into `__delta_rs_c_y`, while `from_name` just keeps the column name as-is, which is preferred.
# Related Issue(s)
- closes #2438
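One plausible reading of the commit message, consistent with the error `No field named __delta_rs_c_y`, is that parsing the name as a SQL identifier treats `--` as the start of a SQL line comment and silently drops everything after it. The following is a simplified toy model of that behaviour, not DataFusion's or delta-rs's actual code: `parse_qualified_name_sketch` and `from_name_sketch` are hypothetical stand-ins for `from_qualified_name_ignore_case` and `from_name`, and the "fall back to the raw name if parsing fails" rule is an assumption chosen to match the report that "y-1" works.

```python
import re

# Hypothetical toy model; NOT the DataFusion / delta-rs implementation.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_qualified_name_sketch(flat_name: str) -> list[str]:
    """Parse a possibly qualified name (e.g. 'target.x') like a SQL tokenizer would."""
    # In SQL, '--' starts a line comment, so a tokenizer discards the rest.
    code = flat_name.split("--", 1)[0]
    # '.' separates a relation qualifier from the column name.
    parts = code.split(".")
    # Assumption: if any part is not a plain identifier (e.g. 'y-1'),
    # parsing fails and the raw name is used unchanged.
    if not all(IDENT.fullmatch(p) for p in parts):
        return [flat_name]
    return parts


def from_name_sketch(name: str) -> list[str]:
    """Keep the column name verbatim, with no parsing at all."""
    return [name]


print(parse_qualified_name_sketch("__delta_rs_c_y--1"))  # ['__delta_rs_c_y']
print(parse_qualified_name_sketch("__delta_rs_c_y-1"))   # ['__delta_rs_c_y-1']
print(from_name_sketch("__delta_rs_c_y--1"))             # ['__delta_rs_c_y--1']
```

In this model the `--` suffix is lost only when the name goes through the SQL-style parser; keeping internal column names verbatim, as the commit does by switching to `from_name`, sidesteps the problem because those names are never meant to be read as SQL.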
Wow, that was fast! Thanks!
Environment
Delta-rs version: 0.16.4
Binding: python
Environment:
Bug
What happened:
At our company we are currently evaluating Delta Lake for our sensor data, and delta-rs provides nearly all the functionality we need. However, for reasons outside my control, our column names often contain two consecutive dashes, e.g. "y--1" (a single dash, as in "y-1", works). We need to be able to delete data from the Delta table, but doing so with the TableMerger fails with the schema error shown in the traceback above.
The documentation says the following:
However, `when_matched_delete()` has no argument for specifying columns whose names contain special characters.
What you expected to happen:
I would expect the matching rows to simply be deleted, even when column names contain special characters.
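To make the expected result concrete, here is a toy, pure-Python model of `MERGE ... WHEN MATCHED THEN DELETE` on column-oriented data, using the `x` join key and source values from the traceback. The target data, the values in `y--1`, and the helper name `merge_delete_matched` are hypothetical (the original reproduction data is not shown), and the model ignores SQL NULL semantics. It only illustrates that the name `y--1` plays no part in deciding which rows are deleted.

```python
def merge_delete_matched(
    target: dict[str, list], source: dict[str, list], key: str
) -> dict[str, list]:
    """Toy model of MERGE ... ON target.key = source.key WHEN MATCHED THEN DELETE.

    Rows of `target` whose `key` value appears in `source[key]` are dropped;
    every other column, whatever its name, is carried through unchanged.
    Note: Python treats None == None as a match, unlike SQL NULL.
    """
    source_keys = set(source[key])
    # Indices of target rows that have no match in the source.
    keep = [i for i, v in enumerate(target[key]) if v not in source_keys]
    return {name: [col[i] for i in keep] for name, col in target.items()}


# Hypothetical target table; only the source values come from the issue.
target = {"x": [1, 2, 3], "y--1": [10.0, 20.0, 30.0]}
source = {"x": [2, 3]}
print(merge_delete_matched(target, source, "x"))  # {'x': [1], 'y--1': [10.0]}
```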
I would be happy to attempt a fix (in Rust, if needed), but I would need some guidance along the way.
How to reproduce it: