Rename `data_sequence_number` to `sequence_number` #893

Fokko · 2024-07-04T18:49:20Z

Feature Request / Improvement

It looks like a misnamed field slipped in:

{
    "status": 1,
    "snapshot_id": {
        "long": 898025966831056900
    },
    "data_sequence_number": null,
    "file_sequence_number": null,
    "data_file": {
        "content": 0,
        "file_path": "/tmp/some.db/tablev2/data/00000-0-93717a88-1cea-4e3d-a69a-00ce3d087822.parquet",
        "file_format": "PARQUET",
        "partition": {},
        "record_count": 3,
        "file_size_in_bytes": 5459,
        "column_sizes": { ... },
        "value_counts": { ... },
        "null_value_counts": { ... },
        "nan_value_counts": { ... },
        "lower_bounds": { ... },
        "upper_bounds": { ... },
        "key_metadata": null,
        "split_offsets": {
            "array": [
                4
            ]
        },
        "equality_ids": null,
        "sort_order_id": null
    }
}

This should be `sequence_number`.

Luckily this still worked due to Iceberg's field-id based lookup, but it would be good to get this cleaned up.
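To illustrate why the wrong name was harmless on read, here is a minimal sketch of ID-based field resolution. It is a toy model, not PyIceberg's actual reader: `resolve_by_id`, `WRITER_FIELDS`, and `READER_FIELDS` are hypothetical names, and the value `5` in the sample record is made up so the lookup is visible. The field IDs follow the `manifest_entry` layout from the Iceberg spec.

```python
from typing import Any, Dict, List, Tuple

# (field_id, name) pairs as written into the manifest file.
WRITER_FIELDS: List[Tuple[int, str]] = [
    (0, "status"),
    (1, "snapshot_id"),
    (3, "data_sequence_number"),  # misnamed, but the field ID is correct
    (4, "file_sequence_number"),
]

# (field_id, name) pairs the reader expects.
READER_FIELDS: List[Tuple[int, str]] = [
    (0, "status"),
    (1, "snapshot_id"),
    (3, "sequence_number"),
    (4, "file_sequence_number"),
]


def resolve_by_id(
    record: Dict[str, Any],
    writer_fields: List[Tuple[int, str]],
    reader_fields: List[Tuple[int, str]],
) -> Dict[str, Any]:
    """Project a written record onto the reader's names, matching on field ID."""
    writer_name_by_id = {field_id: name for field_id, name in writer_fields}
    resolved: Dict[str, Any] = {}
    for field_id, reader_name in reader_fields:
        writer_name = writer_name_by_id.get(field_id)
        # A field the writer never wrote resolves to None.
        resolved[reader_name] = record.get(writer_name) if writer_name is not None else None
    return resolved


# Illustrative record; the sequence number value is invented for the example.
written = {
    "status": 1,
    "snapshot_id": 898025966831056900,
    "data_sequence_number": 5,
    "file_sequence_number": None,
}
entry = resolve_by_id(written, WRITER_FIELDS, READER_FIELDS)
```

Because the match is on ID 3 rather than on the name, `entry` exposes the value under `sequence_number` even though the file stored it as `data_sequence_number`.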

Relevant code:

iceberg-python/pyiceberg/manifest.py

Line 380 in a8d3f17

NestedField(3, "data_sequence_number", LongType(), required=False),


kevinjqliu · 2024-07-04T22:26:35Z

Is there a way on the Java/Spark side to turn metadata information into JSON? With #535, perhaps we can compare the two JSON results and check for mismatches like this one.

soumya-ghosh · 2024-07-05T16:52:34Z

@Fokko I would like to take a shot at this one.

Fokko · 2024-07-05T16:55:55Z

@soumya-ghosh Feel free to take a stab at it, and let me know if you run into anything.

Fokko · 2024-07-05T16:57:00Z

> Is there a way on the Java/Spark side to turn metadata information into JSON? With #535, perhaps we can compare the two JSON results and check for mismatches like this one.

That would be an interesting idea. We could take the PySpark schema, turn it into an Iceberg schema, and compare the two (or just compare the Avro schemas).
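A rough sketch of the Avro-schema comparison, assuming both schemas are already available as parsed Avro record dicts whose fields carry the `field-id` property that Iceberg writes. The helper names and the trimmed sample schemas are hypothetical, and only top-level fields are checked:

```python
from typing import Any, Dict, List, Tuple


def names_by_field_id(avro_record: Dict[str, Any]) -> Dict[int, str]:
    """Map each top-level field's Iceberg field ID to its Avro field name."""
    return {
        field["field-id"]: field["name"]
        for field in avro_record["fields"]
        if "field-id" in field
    }


def find_name_mismatches(
    reference: Dict[str, Any], candidate: Dict[str, Any]
) -> List[Tuple[int, str, str]]:
    """Return (field_id, reference_name, candidate_name) where names differ."""
    ref = names_by_field_id(reference)
    cand = names_by_field_id(candidate)
    shared_ids = sorted(ref.keys() & cand.keys())
    return [(fid, ref[fid], cand[fid]) for fid in shared_ids if ref[fid] != cand[fid]]


# Trimmed, hand-written schemas for illustration only.
reference_schema = {
    "type": "record",
    "name": "manifest_entry",
    "fields": [
        {"name": "status", "type": "int", "field-id": 0},
        {"name": "snapshot_id", "type": ["null", "long"], "field-id": 1},
        {"name": "sequence_number", "type": ["null", "long"], "field-id": 3},
        {"name": "file_sequence_number", "type": ["null", "long"], "field-id": 4},
    ],
}
candidate_schema = {
    "type": "record",
    "name": "manifest_entry",
    "fields": [
        {"name": "status", "type": "int", "field-id": 0},
        {"name": "snapshot_id", "type": ["null", "long"], "field-id": 1},
        {"name": "data_sequence_number", "type": ["null", "long"], "field-id": 3},
        {"name": "file_sequence_number", "type": ["null", "long"], "field-id": 4},
    ],
}

mismatches = find_name_mismatches(reference_schema, candidate_schema)
```

Joining on field ID rather than position keeps the check independent of field order; a fuller version would also recurse into nested records such as `data_file`.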

soumya-ghosh · 2024-07-06T22:07:18Z

@Fokko the PR #900 is ready for review.

Fokko added this to the PyIceberg 0.7.0 release milestone Jul 4, 2024

Fokko mentioned this issue Jul 4, 2024

Support merge manifests on writes (MergeAppend) #363

Merged

kevinjqliu added the good first issue Good for newcomers label Jul 4, 2024

Fokko assigned soumya-ghosh Jul 5, 2024

soumya-ghosh mentioned this issue Jul 6, 2024

Rename data_sequence_number to sequence_number in ManifestEntry #900

Merged

HonahX linked a pull request Jul 11, 2024 that will close this issue

Rename data_sequence_number to sequence_number in ManifestEntry #900

Merged

HonahX closed this as completed Jul 11, 2024
