Fix fastparquet tests to work with HDFS #9583
Merged
Fixes #9545.
This commit fixes the fastparquet tests to run on Spark clusters where fs.default.name does not point to the local filesystem.

Before this commit, the fastparquet tests assumed that the parquet files generated for the tests were written to the local filesystem and could be read by both fastparquet and Spark from the same location. However, this fails when run against clusters whose default filesystem is HDFS, because fastparquet can only read from the local filesystem.

This commit changes the tests as follows:
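- For tests where the data is written by Spark, the data is copied to the local filesystem before reading through fastparquet.
- For tests where the data is written by fastparquet, the data is copied to the default Hadoop filesystem before reading through Spark.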
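
The copy step described above could be sketched roughly as follows. This is not the code from this PR: the helper names (`copy_from_local_cmd`, `copy_to_local_cmd`, `run`) are hypothetical, and shelling out to the `hadoop fs` CLI is an assumption made for illustration; the actual tests may well go through the Hadoop FileSystem API instead.

```python
import subprocess
from typing import List


def copy_from_local_cmd(local_path: str, hadoop_path: str) -> List[str]:
    """Build the command that uploads a local path to the default Hadoop filesystem.

    Hypothetical helper: used when fastparquet wrote the data locally and
    Spark needs to read it from the cluster's default filesystem.
    """
    return ["hadoop", "fs", "-copyFromLocal", local_path, hadoop_path]


def copy_to_local_cmd(hadoop_path: str, local_path: str) -> List[str]:
    """Build the command that downloads a path from the default Hadoop filesystem.

    Hypothetical helper: used when Spark wrote the data and fastparquet,
    which only reads local files, needs a local copy.
    """
    return ["hadoop", "fs", "-copyToLocal", hadoop_path, local_path]


def run(cmd: List[str]) -> None:
    """Run a copy command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

A test that writes with fastparquet would call something like `run(copy_from_local_cmd(local_file, hadoop_file))` before the Spark read; a test that writes with Spark would call `run(copy_to_local_cmd(hadoop_file, local_file))` before the fastparquet read. When the default filesystem is already local, both copies are between local paths, so the same test flow should work on either kind of cluster.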