Override default implementation of `skip(long)` method #18977

findinpath · 2023-09-08T20:30:45Z

Description

Delegate the implementation of the `skip(long)` method to `stream`. This ensures that the default implementation from `java.io.InputStream` is not used. The default implementation reads and discards bytes through `read()`, which causes actual reads from the underlying input stream even though `skip` is just a logical operation.
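For illustration, a minimal sketch of the delegation described above. `DelegatingInputStream` and its `readCalls` counter are hypothetical names, not the actual `HdfsTrinoInputStream` code, and a `ByteArrayInputStream` stands in for the HDFS stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper used only to illustrate the idea; not the real HdfsTrinoInputStream.
class DelegatingInputStream extends InputStream {
    private final InputStream stream;
    int readCalls; // number of read calls made through this wrapper

    DelegatingInputStream(InputStream stream) {
        this.stream = stream;
    }

    @Override
    public int read() throws IOException {
        readCalls++;
        return stream.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        readCalls++;
        return stream.read(b, off, len);
    }

    // Without this override, InputStream.skip(long) would read and discard
    // bytes through read(byte[], int, int) instead of moving the position.
    @Override
    public long skip(long n) throws IOException {
        return stream.skip(n);
    }

    public static void main(String[] args) throws IOException {
        DelegatingInputStream in =
                new DelegatingInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        long skipped = in.skip(3);
        System.out.println("skipped=" + skipped + ", readCalls=" + in.readCalls);
    }
}
```

With the override in place, the demo skips three bytes without any read call going through the wrapper. Removing the `skip` override makes the counter nonzero, which is the behavior this change avoids.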

Additional context and related issues

Fixes #18976

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Hive
* Avoid reading unrelated data to the text line reader splits. ({issue}`issuenumber`)

lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsTrinoInputStream.java

lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3InputStream.java

electrum · 2023-09-08T22:20:34Z

Where are we using skip() rather than seek() for large sizes? The new S3 implementation (and possibly Azure) will also need to implement this efficiently.

lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3InputStream.java

electrum · 2023-09-08T23:05:06Z

Changes requested for the S3InputStream change.

To answer my previous question: I found that TextLineReader and SequenceFileReader use skipNBytes(), which we implement for S3InputStream and AzureInputStream. HdfsTrinoInputStream does not override skipNBytes(), so it uses the default implementation, which calls skip(). That is why we need the change here to delegate correctly.
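A small sketch of why overriding `skip(long)` is enough for `skipNBytes()` callers, assuming Java 12 or later (where `InputStream.skipNBytes` exists). `SkipCountingInputStream` and its counters are hypothetical names for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stream that overrides skip(long) but not skipNBytes(long).
class SkipCountingInputStream extends InputStream {
    private final InputStream stream;
    int skipCalls; // number of skip(long) calls that reached this class
    int readCalls; // number of read() calls that reached this class

    SkipCountingInputStream(InputStream stream) {
        this.stream = stream;
    }

    @Override
    public int read() throws IOException {
        readCalls++;
        return stream.read();
    }

    @Override
    public long skip(long n) throws IOException {
        skipCalls++;
        return stream.skip(n);
    }

    public static void main(String[] args) throws IOException {
        SkipCountingInputStream in =
                new SkipCountingInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        // The inherited skipNBytes(long) is used here; it routes through skip(long).
        in.skipNBytes(3);
        System.out.println("skipCalls=" + in.skipCalls + ", readCalls=" + in.readCalls);
    }
}
```

Because the inherited `skipNBytes` goes through the overridden `skip`, the wrapped stream only moves its position and no bytes are read as long as `skip` can satisfy the request.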

findinpath · 2023-09-09T04:28:58Z

> I found that TextLineReader and SequenceFileReader use skipNBytes()

See here the hierarchy of calls

lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsTrinoInputStream.java

Delegate the implementation of the `skip(long)` method to `stream`. This ensures that the default implementation from `java.io.InputStream` is not used. The default implementation reads and discards bytes through `read()`, which causes actual reads from the underlying input stream even though `skip` is just a logical operation.

Co-authored-by: James Petty <[email protected]>

findinpath · 2023-09-11T08:32:22Z

Removed the changes in S3InputStream because they were unrelated to this change.
Thank you @electrum for raising this aspect.

findepi · 2023-09-11T09:49:58Z

/test-with-secrets sha=3af6109e7dc6a9f286f84c6c84b66bcf342c4967

github-actions · 2023-09-11T12:52:45Z

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/6145626334

cla-bot bot added the cla-signed label Sep 8, 2023

findinpath requested review from electrum, findepi and pettyjamesm September 8, 2023 20:30

findinpath self-assigned this Sep 8, 2023

pettyjamesm reviewed Sep 8, 2023

lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsTrinoInputStream.java (outdated)

findinpath force-pushed the findinpath/hdfs-trino-input-stream-skip branch from 4dbfc5c to 2473ea3 Compare September 8, 2023 20:39

findinpath requested a review from pettyjamesm September 8, 2023 20:41

pettyjamesm approved these changes Sep 8, 2023

findinpath force-pushed the findinpath/hdfs-trino-input-stream-skip branch from 2473ea3 to 822102c Compare September 8, 2023 21:11

findinpath requested a review from pettyjamesm September 8, 2023 21:11

findinpath force-pushed the findinpath/hdfs-trino-input-stream-skip branch 2 times, most recently from cc8fb2d to 8db9715 Compare September 8, 2023 21:41

findinpath commented Sep 8, 2023

lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3InputStream.java (outdated)

electrum requested changes Sep 8, 2023

lib/trino-filesystem-s3/src/main/java/io/trino/filesystem/s3/S3InputStream.java (outdated)

findinpath force-pushed the findinpath/hdfs-trino-input-stream-skip branch from 8db9715 to 3862260 Compare September 9, 2023 04:53

findinpath requested a review from electrum September 9, 2023 04:53

findinpath force-pushed the findinpath/hdfs-trino-input-stream-skip branch from 3862260 to 87ddc4e Compare September 9, 2023 04:57

findinpath commented Sep 9, 2023

lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsTrinoInputStream.java

findinpath force-pushed the findinpath/hdfs-trino-input-stream-skip branch from 87ddc4e to 3af6109 Compare September 11, 2023 08:31

findinpath mentioned this pull request Sep 11, 2023

findinpath Override default implementation of skip(long) method findinpath/trino#15

Closed

electrum approved these changes Sep 11, 2023

electrum merged commit d06a3e0 into trinodb:master Sep 11, 2023
59 checks passed

github-actions bot added this to the 427 milestone Sep 11, 2023

colebow mentioned this pull request Sep 12, 2023

Add Trino 427 release notes #19023

Merged
