[IO] Improved IO with support for reading data from compressed files #308

probberechts · 2024-04-12T12:16:48Z

It is common practice to store data as compressed files to reduce storage requirements. With this PR, you no longer need to decompress a file before loading its data with kloppy.

from kloppy import statsperform

dataset = statsperform.load(
    raw_data="ma25_tracking.txt.gz",
    meta_data="ma1_metadata.xml.gz",
)

Whether a file is compressed is derived from its extension. The ".gz", ".xz" and ".bz2" extensions are currently supported.
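For illustration, extension-based detection can be sketched with the standard library alone. This is not the code in kloppy/io.py; the helper name `open_maybe_compressed` and the `_OPENERS` table are hypothetical and only show the general idea.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical lookup table: file extension -> standard-library opener.
# The actual table used inside kloppy may look different.
_OPENERS = {
    ".gz": gzip.open,
    ".xz": lzma.open,
    ".bz2": bz2.open,
}


def open_maybe_compressed(path, mode="rb"):
    """Open `path`, decompressing transparently based on its extension.

    Files with an unrecognised extension are opened with the builtin `open`.
    """
    opener = _OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode)
```

Under these assumptions, `open_maybe_compressed("ma25_tracking.txt.gz")` would return a file object that yields the decompressed bytes, while a plain ".txt" file would be opened unchanged.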

- Add support for opening a gzip, bzip or lzma-compressed file.
- Additional tests for io.open_as_file function

kloppy/io.py

koenvo · 2024-04-19T14:53:44Z

This should also work for non-local files, right? Like https://some-url.com/file.xml.gz
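As a rough sketch of what handling the remote case involves: the extension has to be taken from the path component of the URL (so that query strings do not interfere), and the downloaded bytes are then decompressed in memory. The helpers `compression_suffix` and `decompress_payload` below are hypothetical names for illustration, not kloppy functions, and no network access is performed.

```python
import bz2
import gzip
import lzma
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical table: extension -> one-shot decompression function.
_DECOMPRESSORS = {
    ".gz": gzip.decompress,
    ".xz": lzma.decompress,
    ".bz2": bz2.decompress,
}


def compression_suffix(uri):
    """Return the compression extension of a local path or URL, or None."""
    # urlparse strips the query string and fragment from the path component.
    suffix = PurePosixPath(urlparse(uri).path).suffix.lower()
    return suffix if suffix in _DECOMPRESSORS else None


def decompress_payload(uri, payload):
    """Decompress bytes fetched from `uri` when its extension calls for it."""
    suffix = compression_suffix(uri)
    return _DECOMPRESSORS[suffix](payload) if suffix else payload
```

With this sketch, bytes downloaded from a URL ending in ".xml.gz" would be passed through `gzip.decompress`, while bytes from a URL ending in ".xml" would be returned unchanged.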

koenvo · 2024-05-27T15:14:05Z

Can you please merge master in to make sure the tests run again?

probberechts · 2024-05-27T21:28:22Z

I couldn't get boto (to mock an S3 bucket) to work on GitHub Actions. In the most recent version, there is this bug, and for older versions I can't figure out a set of version constraints between s3fs and boto that works on every Python version. Hence, I propose disabling these tests until the bug is fixed.

I also recently found the xopen library for opening compressed files. We could use it as a more efficient and robust replacement for the _open method that I implemented. Do you think it is worth adding another dependency? It could also be an optional one.
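To make the optional-dependency idea concrete, here is a minimal sketch that uses xopen when it is installed and falls back to the standard library otherwise. The wrapper name `open_compressed` and the fallback table are hypothetical; this is not how kloppy's _open method is written.

```python
import bz2
import gzip
import lzma
import os
from pathlib import Path

try:
    # Optional dependency: only used when it is installed.
    from xopen import xopen
except ImportError:
    xopen = None

# Hypothetical fallback table for when xopen is unavailable.
_STDLIB_OPENERS = {".gz": gzip.open, ".xz": lzma.open, ".bz2": bz2.open}


def open_compressed(path, mode="rb"):
    """Open a possibly compressed file, preferring xopen when available."""
    if xopen is not None:
        return xopen(os.fspath(path), mode=mode)
    # Fallback: choose a standard-library opener based on the extension.
    opener = _STDLIB_OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode)
```

The trade-off in this sketch is that behaviour stays the same for users without xopen, at the cost of maintaining two code paths.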

kloppy/io.py

koenvo · 2024-06-19T19:32:03Z

Thanks Pieter, great work!

feat(io): Allow reading data from compressed file

0c53883

- Add support for opening a gzip, bzip or lzma-compressed file.
- Additional tests for io.open_as_file function

koenvo reviewed Apr 19, 2024

kloppy/io.py (outdated, resolved)

probberechts force-pushed the feat/load-gzip branch from 01c6c3a to c17764b Compare April 26, 2024 13:52

fix(io): Allow compression for remote files

f3052d8

probberechts force-pushed the feat/load-gzip branch from c17764b to f3052d8 Compare April 26, 2024 13:55

JanVanHaaren assigned koenvo May 27, 2024

probberechts added 3 commits May 27, 2024 21:37

Merge branch 'master' into feat/load-gzip

5bdfa2f

test(io): fix s3 fixture for Python3.11

ccf1cb3

chore: linting

9825a7b

probberechts force-pushed the feat/load-gzip branch 2 times, most recently from 827ebad to a6a288a Compare May 27, 2024 20:39

test(io): resolve dependency conflicts

0db0ac9

probberechts force-pushed the feat/load-gzip branch from a6a288a to 0db0ac9 Compare May 27, 2024 21:10

test(io): capitulation 🏳️?

cb05ae1

koenvo reviewed Jun 3, 2024

kloppy/io.py (outdated, resolved)

koenvo reviewed Jun 3, 2024

kloppy/io.py (outdated, resolved)

probberechts force-pushed the feat/load-gzip branch from 86eb9d8 to f7b82d6 Compare June 18, 2024 19:24

probberechts changed the title ~~[IO] Allow reading data from compressed file~~ [IO] Improved IO with support for reading data from compressed files Jun 18, 2024

probberechts force-pushed the feat/load-gzip branch from f7b82d6 to 7dcc9f6 Compare June 18, 2024 19:32

more io improvements and bugfixes

454a245

probberechts force-pushed the feat/load-gzip branch from 7dcc9f6 to 454a245 Compare June 18, 2024 19:33

probberechts requested a review from koenvo June 18, 2024 19:38

koenvo merged commit a3ca3f3 into PySport:master Jun 19, 2024
19 checks passed

koenvo added this to the 3.15 milestone Jun 19, 2024

probberechts deleted the feat/load-gzip branch June 20, 2024 11:01
