No snapshot or version 0 found, perhaps /Users/watsy0007/resources/test_table/ is an empty dir? #2016

Closed
watsy0007 opened this issue Jan 2, 2024 · 4 comments · Fixed by #2044
Labels
binding/python Issues for the Python package bug Something isn't working

Comments

@watsy0007

Environment

Delta-rs version:
0.17.0

Binding:
python-v0.15.0

Environment:

  • OS: macOS

Bug

What happened:
After I executed the vacuum function, the JSON commit files for versions 0 through 99 were deleted from the _delta_log directory.
Later, when I re-ran dt = DeltaTable(local_path), I got the error in the issue title.

What you expected to happen:
The local table loads successfully.

How to reproduce it:
As described under "What happened" above.

More details:

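Jupyter notebook code:

```python
import deltalake
import duckdb
import pyarrow
import pandas as pd

local_price_path = '/Users/watsy0007/resources/test_table'
price_table = deltalake.DeltaTable(local_price_path)
price_ds = price_table.to_pyarrow_dataset()

## compact and vacuum
# compact() rewrites small data files into larger ones and records an OPTIMIZE commit
price_table.optimize.compact()
# vacuum() defaults to dry_run=True, so this call only lists files it would delete
price_table.vacuum()
```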

The JSON log entry for the last operation is as follows:

```json
{"remove":{"path":"month=202312/part-00001-cc59364c-2097-4f4c-860c-6e2b97c1573e-c000.zstd.parquet","dataChange":false,"deletionTimestamp":1703810161493,"partitionValues":{"month":"202312"},"size":2521730}}
{"remove":{"path":"month=202312/141-d770dcc2-8008-4991-a19f-1a6a39f8f96f-0.parquet","dataChange":false,"deletionTimestamp":1703810161493,"partitionValues":{"month":"202312"},"size":178963}}
{"add":{"path":"month=202312/part-00001-a2d5ee29-980a-44e1-85b2-3fcfd73c8923-c000.zstd.parquet","partitionValues":{"month":"202312"},"size":2630580,"modificationTime":1703810162294,"dataChange":false,"stats":"{\"numRecords\":188485,\"minValues\":{\"close\":\"0.00010004621161297134\",\"unique_id\":\"0chain\",\"price_dt\":\"2023-12-01T00:00:00Z\",\"source\":\"coingecko\",\"price\":3.5876668360643876e-17,\"created_at\":\"2023-12-01T00:01:03Z\",\"date_ts\":1701388800,\"vs_currency\":\"usd\",\"coin_id\":0},\"maxValues\":{\"price_dt\":\"2023-12-28T23:53:46Z\",\"created_at\":\"2023-12-29T00:35:51.812Z\",\"source\":\"exchangerate\",\"vs_currency\":\"usd\",\"close\":\"997.2183639248549\",\"coin_id\":2890,\"unique_id\":\"zynecoin\",\"price\":88545.6567696104,\"date_ts\":1703807626},\"nullCount\":{\"close\":0,\"coin_id\":0,\"created_at\":0,\"vs_currency\":0,\"price\":0,\"source\":0,\"unique_id\":0,\"date_ts\":0,\"price_dt\":0}}","tags":null,"deletionVector":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
{"commitInfo":{"timestamp":1703810162349,"operation":"OPTIMIZE","operationParameters":{"targetSize":"104857600"},"readVersion":141,"operationMetrics":{"filesAdded":{"avg":2630580.0,"max":2630580,"min":2630580,"totalFiles":1,"totalSize":2630580},"filesRemoved":{"avg":1350346.5,"max":2521730,"min":178963,"totalFiles":2,"totalSize":2700693},"numBatches":24,"numFilesAdded":1,"numFilesRemoved":2,"partitionsOptimized":0,"preserveInsertionOrder":true,"totalConsideredFiles":2,"totalFilesSkipped":0},"clientVersion":"delta-rs.0.17.0"}}
```
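A Delta commit file like the one above is newline-delimited JSON: one action per line. As a hedged illustration, the stdlib-only sketch below tallies those actions; `summarize_commit` is a hypothetical helper, not part of deltalake, and the sample is a trimmed copy of this commit with stats, timestamps and metrics dropped. On it, the removed sizes sum to 2,700,693 bytes, matching `filesRemoved.totalSize`, and every action carries `dataChange: false`, which is what a compaction that only rewrites data files should produce. Nothing in an OPTIMIZE commit deletes log entries.

```python
import json


def summarize_commit(ndjson: str) -> dict:
    """Tally the actions in one Delta commit file (one JSON action per line)."""
    summary = {
        "add": 0,
        "remove": 0,
        "bytes_added": 0,
        "bytes_removed": 0,
        "operation": None,
        "data_change": False,
    }
    for line in ndjson.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        action = json.loads(line)
        if "add" in action:
            summary["add"] += 1
            summary["bytes_added"] += action["add"]["size"]
            summary["data_change"] |= action["add"]["dataChange"]
        elif "remove" in action:
            summary["remove"] += 1
            # "size" is optional on remove actions, so default to 0
            summary["bytes_removed"] += action["remove"].get("size", 0)
            summary["data_change"] |= action["remove"]["dataChange"]
        elif "commitInfo" in action:
            summary["operation"] = action["commitInfo"].get("operation")
    return summary


# Trimmed copy of the commit above: stats, timestamps and metrics dropped.
sample = "\n".join([
    '{"remove":{"path":"month=202312/part-00001-cc59364c-2097-4f4c-860c-6e2b97c1573e-c000.zstd.parquet","dataChange":false,"size":2521730}}',
    '{"remove":{"path":"month=202312/141-d770dcc2-8008-4991-a19f-1a6a39f8f96f-0.parquet","dataChange":false,"size":178963}}',
    '{"add":{"path":"month=202312/part-00001-a2d5ee29-980a-44e1-85b2-3fcfd73c8923-c000.zstd.parquet","size":2630580,"dataChange":false}}',
    '{"commitInfo":{"operation":"OPTIMIZE","readVersion":141}}',
])
summary = summarize_commit(sample)
print(summary)
```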
watsy0007 added the bug label on Jan 2, 2024
rtyler added the binding/python label on Jan 2, 2024
@rtyler
Member

rtyler commented Jan 2, 2024

@watsy0007 is there a checkpoint file in the _delta_log directory? Removing all those JSON files is... interesting
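The checkpoint question matters because of how a reader rebuilds a snapshot: it starts from the newest checkpoint, or from version 0 when there is none, and then needs every later commit file. A stdlib-only sketch of that check, assuming the standard `_delta_log` naming (20-digit zero-padded versions); `can_load_latest` and `versions` are hypothetical helpers, not part of deltalake, and the sketch ignores `_last_checkpoint` and does not verify that every part of a multi-part checkpoint is present. The usage simulates an illustrative log whose commits 0 to 99 are gone.

```python
import os
import re
import tempfile

# Commit files are named by a 20-digit, zero-padded version number.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Single-part "<v>.checkpoint.parquet" or multi-part
# "<v>.checkpoint.<part>.<num_parts>.parquet" checkpoints.
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def versions(names, pattern):
    """Return the set of versions whose file names match `pattern`."""
    found = set()
    for name in names:
        m = pattern.match(name)
        if m:
            found.add(int(m.group(1)))
    return found


def can_load_latest(log_dir: str) -> bool:
    """Rough check that the newest version can be rebuilt from `log_dir`."""
    names = os.listdir(log_dir)
    commits = versions(names, COMMIT_RE)
    checkpoints = versions(names, CHECKPOINT_RE)
    if not commits and not checkpoints:
        return False  # empty log: nothing to load
    latest = max(commits | checkpoints)
    # Replay starts after the newest checkpoint, or at version 0 without one.
    start = max(checkpoints) + 1 if checkpoints else 0
    # Every commit from the start point up to the latest must still exist.
    return all(v in commits for v in range(start, latest + 1))


# Usage: a hypothetical log in which commits 0-99 were deleted.
with tempfile.TemporaryDirectory() as log_dir:
    for v in range(100, 142):
        open(os.path.join(log_dir, f"{v:020d}.json"), "w").close()
    without_checkpoint = can_load_latest(log_dir)
    open(os.path.join(log_dir, f"{140:020d}.checkpoint.parquet"), "w").close()
    with_checkpoint = can_load_latest(log_dir)

print(without_checkpoint, with_checkpoint)
```

Without a checkpoint the missing version 0 makes the log unreadable, which is the situation the issue title's error describes; a checkpoint at or after the gap would have let replay skip the deleted commits.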

@watsy0007
Author

@rtyler No, and that is also the part that confused me when I read the documentation.
Is it because I was using an early version, maybe python-v0.12.0?
By the way, is there any other way to recover the data now?

@Blajda
Collaborator

Blajda commented Jan 6, 2024

@watsy0007 did you ever execute cleanup_metadata on the table? Vacuum and Compact should not delete any metadata files.

ion-elgreco pushed a commit that referenced this issue Jan 7, 2024
#2044)

# Description
When metadata cleanup is executed on a delta table without checkpoints
it will corrupt the table and prevent further loading. This is a high
risk for people who use delta-rs since our writers do not automatically
create checkpoints.

# Related Issue(s)
- closes #2016
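The fix description above implies a simple safety rule: a log file may only be deleted when a newer checkpoint makes it unnecessary for replay, and with no checkpoint nothing may go. The stdlib-only sketch below illustrates that rule; it is not the code from #2044, `log_version` and `deletable_log_files` are hypothetical helpers, and it skips the log retention period check a real cleanup would also apply. It conservatively keeps the commit file at the checkpoint's own version.

```python
import re

COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def log_version(name):
    """Version of a commit or checkpoint file name, or None for anything else."""
    m = COMMIT_RE.match(name) or CHECKPOINT_RE.match(name)
    return int(m.group(1)) if m else None


def deletable_log_files(names):
    """Choose _delta_log entries that cleanup could remove without breaking loads.

    Only files older than the newest checkpoint qualify. With no checkpoint,
    readers must replay from version 0, so nothing may be deleted.
    """
    checkpoints = [int(m.group(1)) for m in map(CHECKPOINT_RE.match, names) if m]
    if not checkpoints:
        return []
    newest = max(checkpoints)
    doomed = []
    for name in names:
        v = log_version(name)
        if v is not None and v < newest:
            doomed.append(name)
    return sorted(doomed)


# Usage: twelve commits, first without and then with a checkpoint at version 10.
names = [f"{v:020d}.json" for v in range(12)]
no_checkpoint = deletable_log_files(names)
names.append(f"{10:020d}.checkpoint.parquet")
with_checkpoint = deletable_log_files(names)
print(no_checkpoint, len(with_checkpoint))
```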
@watsy0007
Author

@Blajda Sorry, I forgot.
It may have been executed during the initial testing phase, after writing data.
I checked the data write times, and the first write happened on October 8th. The delta-rs version is 0.15.0.

r3stl355 pushed a commit to r3stl355/delta-rs that referenced this issue Jan 10, 2024

3 participants