Initial version of simple FileReader/Writer #516

robert3005 · 2024-07-24T22:22:01Z

The reader currently assumes Chunked(Column(Chunked)) layouts. This will be expanded in follow-ups.
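Below is a simplified sketch of that nesting. It assumes a hypothetical three-variant Layout enum; the real vortex-serde types carry more information. It also assumes that the innermost chunked layouts hold only flat byte ranges, as the FlatLayout::new(begin, end) calls in the writer diff suggest. The is_supported helper is illustrative only and is not part of the PR.

```rust
/// Hypothetical, simplified layout tree (not the real vortex-serde types).
#[allow(dead_code)]
#[derive(Debug)]
enum Layout {
    /// A contiguous byte range [begin, end) in the file.
    Flat { begin: u64, end: u64 },
    /// A sequence of child layouts, e.g. one per chunk.
    Chunked(Vec<Layout>),
    /// One child layout per struct column.
    Column(Vec<Layout>),
}

/// Returns true if `layout` has the Chunked(Column(Chunked)) shape:
/// a chunked root whose children are column layouts, each of whose
/// children is a chunked layout made only of flat byte ranges.
fn is_supported(layout: &Layout) -> bool {
    match layout {
        Layout::Chunked(children) => children.iter().all(|child| match child {
            Layout::Column(columns) => columns.iter().all(|column| match column {
                Layout::Chunked(leaves) => leaves
                    .iter()
                    .all(|leaf| matches!(leaf, Layout::Flat { .. })),
                _ => false,
            }),
            _ => false,
        }),
        _ => false,
    }
}
```

Any other shape, such as a bare flat root, falls outside what this initial reader handles.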

bench-vortex/src/reader.rs

robert3005 · 2024-07-24T22:27:52Z

vortex-serde/src/file/file_writer.rs

+            let len = chunk.byte_offsets.len() - 1;
+            let byte_counts = chunk
+                .byte_offsets
+                .iter()
+                .skip(1)
+                .zip(chunk.byte_offsets.iter())
+                .map(|(a, b)| a - b)
+                .collect_vec();
+
+            chunks.extend(
+                chunk
+                    .byte_offsets
+                    .iter()
+                    .zip(chunk.byte_offsets.iter().skip(1))
+                    .map(|(begin, end)| Layout::Flat(FlatLayout::new(*begin, *end))),
+            );
+            let row_counts = chunk
+                .row_offsets
+                .iter()
+                .skip(1)
+                .zip(chunk.row_offsets.iter())
+                .map(|(a, b)| a - b)
+                .collect_vec();
+            chunk.byte_offsets.truncate(len);
+            chunk.row_offsets.truncate(len);
+
+            let metadata_array = StructArray::try_new(
+                [
+                    "byte_offset".into(),
+                    "byte_count".into(),
+                    "row_offset".into(),
+                    "row_count".into(),
+                ]
+                .into(),
+                vec![
+                    chunk.byte_offsets.into_array(),
+                    byte_counts.into_array(),
+                    chunk.row_offsets.into_array(),
+                    row_counts.into_array(),
+                ],
+                len,
+                Validity::NonNullable,
+            )?;


Everything in this logic beyond extending chunks with the byte offsets is highly suspect. We can't reuse these tables in ChunkedArrayReader, which is a bad thing. I wonder whether we need to store multiple metadata tables, since the offset tables are going to be one row longer than the stats tables. Alternatively, the byte/row offsets could live in the layouts and these tables would be pure metadata.
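The off-by-one mentioned above shows up clearly in a minimal sketch. It assumes offsets are stored as a plain slice of u64 chunk boundaries, and the helper names are hypothetical rather than the vortex-serde API. With n chunks there are n + 1 offsets but only n byte ranges, n counts and n start offsets, which is why the diff has to truncate before building the metadata table.

```rust
/// Per-chunk sizes from boundary offsets: n + 1 offsets yield n counts.
fn counts_from_offsets(offsets: &[u64]) -> Vec<u64> {
    offsets.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Per-chunk (begin, end) ranges, analogous to the FlatLayout::new(begin, end)
/// calls in the diff.
fn ranges_from_offsets(offsets: &[u64]) -> Vec<(u64, u64)> {
    offsets.windows(2).map(|w| (w[0], w[1])).collect()
}

/// Metadata columns with one row per chunk: the start offsets (the final
/// boundary is dropped, like the truncate(len) call) and the counts.
fn metadata_columns(offsets: &[u64]) -> (Vec<u64>, Vec<u64>) {
    let len = offsets.len().saturating_sub(1);
    (offsets[..len].to_vec(), counts_from_offsets(offsets))
}
```

If the ranges lived only in the layouts, the metadata table could drop the offset column entirely and keep one row per chunk without any truncation.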

vortex-serde/src/file/file_reader.rs

vortex-serde/src/file/file_writer.rs

vortex-serde/src/io/read.rs

vortex-serde/src/file/footer.rs

vortex-serde/src/file/file_reader.rs

lwwmanning · 2024-07-25T16:57:36Z

Note that the new vortex_serde tests appear to take a LOOONG time under Miri. It's probably worth adding #[cfg_attr(miri, ignore)] to those (and maybe making "fast" variants for Miri to run).
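Here is a minimal sketch of that suggestion. It uses a hypothetical roundtrip_sum function as a stand-in for the real serde round-trip work, and iteration_count is likewise illustrative. The full-size test is skipped under Miri via #[cfg_attr(miri, ignore)]. The fast variant still runs there, with its workload scaled down through cfg!(miri), which is true only when Miri is interpreting the code.

```rust
/// Hypothetical workload size: small under Miri, full-size otherwise.
fn iteration_count() -> usize {
    if cfg!(miri) { 16 } else { 10_000 }
}

/// Hypothetical stand-in for an expensive round-trip computation.
fn roundtrip_sum(n: usize) -> u64 {
    (0..n as u64).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Full-size test: ignored when running under Miri because it is slow there.
    #[test]
    #[cfg_attr(miri, ignore)]
    fn roundtrip_large() {
        assert_eq!(roundtrip_sum(10_000), 49_995_000);
    }

    // "Fast" variant that Miri still runs, using a reduced workload.
    #[test]
    fn roundtrip_small() {
        let n = iteration_count();
        assert_eq!(roundtrip_sum(n), (n as u64) * (n as u64 - 1) / 2);
    }
}
```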

robert3005 commented Jul 24, 2024

bench-vortex/src/reader.rs (outdated, resolved)

robert3005 commented Jul 24, 2024

bench-vortex/src/reader.rs (outdated, resolved)

robert3005 commented Jul 24, 2024

robert3005 and others added 26 commits July 25, 2024 00:05

315df41 Add vortex file format
e6c4818 less
b47f95c bug fix and minimal reader
d449c0f st
e1a0184 dtype reader
99770c3 .
b1224df some more things
9f9a519 basic reader that works
655aea6 should be able to read multiple batches
35f7def chunked arrays
09e7862 Slightly nicer code
fe06109 cosmetic changes
93c02aa unaligned
2b1f4dd something
516a30c less
cd95cab moves
6a0ba24 less
b48e860 refactor
10f61f5 minor changes
8f45a6b more
17538df fixes
d9a560f less
ad71f73 less
e834679 less
5a30f7a unwind
4d0ac6b fix

robert3005 force-pushed the rk/metadata branch from ac7e640 to 4d0ac6b on July 24, 2024 23:16

AdamGS added 2 commits July 25, 2024 10:39

b2ba79e cr note
bac53d5 fix lint

a10y reviewed Jul 25, 2024

AdamGS and others added 3 commits July 25, 2024 16:35

7fb2a31 minor change
5ec5a44 .
868d069 Merge remote-tracking branch 'origin/develop' into rk/metadata

robert3005 and others added 4 commits July 29, 2024 11:41

9a04327 asserts
8175c2c move len around
8349398 More things and some refactoring
370a9d0 projection work and basic test

AdamGS force-pushed the rk/metadata branch from 0579b49 to 370a9d0 on July 29, 2024 14:29

robert3005 and others added 11 commits July 29, 2024 15:51

4accbe3 typo
0f4a5bc mask
3baa4e7 ignore
ced449a metadatas are first in chunked layouts
476fc48 .
d6daeb8 ignore
c36f6c2 read
7436ab3 Fix metadata layout bug
8b169f6 some CR notes
0c903d4 less
f3db79a magic

AdamGS approved these changes Jul 29, 2024

AdamGS enabled auto-merge (squash) July 29, 2024 17:23

AdamGS disabled auto-merge July 29, 2024 17:23

AdamGS enabled auto-merge (squash) July 29, 2024 17:23

1f34b42 fmt

AdamGS merged commit 24dca82 into develop Jul 29, 2024
3 checks passed

AdamGS deleted the rk/metadata branch July 29, 2024 17:30
