Hadoop FileSystem Java Class Wrapper

Typed Python wrappers for Hadoop FileSystem class family.

Installation

You can install this package from PyPI on any Hadoop or Spark runtime:

pip install hadoop-fs-wrapper

Select a version that matches the Hadoop and Spark versions you are using:

Hadoop version / Spark version	Compatible hadoop-fs-wrapper version
3.2.x / 3.2.x	0.4.x
3.3.x / 3.3.x	0.4.x, 0.5.x
3.3.x / 3.4.x	0.6.x
3.5.x / 3.5.x	0.7.x
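
The table above can also be expressed as a small lookup, which may be handy when pinning the dependency from a build script. This is only a sketch: `COMPATIBLE_SERIES` and `wrapper_requirements` are hypothetical names that are not part of hadoop-fs-wrapper, and the lookup assumes the Spark minor version alone is enough to pick a row.

```python
from typing import Dict, List

# Spark minor version -> compatible hadoop-fs-wrapper release series,
# transcribed from the compatibility table above (hypothetical helper data).
COMPATIBLE_SERIES: Dict[str, List[str]] = {
    "3.2": ["0.4"],
    "3.3": ["0.4", "0.5"],
    "3.4": ["0.6"],
    "3.5": ["0.7"],
}


def wrapper_requirements(spark_version: str) -> List[str]:
    """Return pip requirement strings compatible with the given Spark version."""
    # Reduce a full version such as "3.4.1" to its minor series "3.4".
    minor = ".".join(spark_version.split(".")[:2])
    if minor not in COMPATIBLE_SERIES:
        raise ValueError(f"No known hadoop-fs-wrapper release for Spark {spark_version}")
    return [f"hadoop-fs-wrapper=={series}.*" for series in COMPATIBLE_SERIES[minor]]


# wrapper_requirements("3.4.1") -> ["hadoop-fs-wrapper==0.6.*"]
```

Any returned string can be passed to pip directly, for example `pip install "hadoop-fs-wrapper==0.6.*"`.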

Usage

A common use case is accessing the Hadoop FileSystem from a Spark session object:

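from pyspark.sql import SparkSession
from hadoop_fs_wrapper.wrappers.file_system import FileSystem

# Reuse the active Spark session, or create one if none exists yet
spark_session = SparkSession.builder.getOrCreate()
file_system = FileSystem.from_spark_session(spark=spark_session)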

Then, for example, one can check whether any files exist under a specified path:

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
    Checks whether a glob pattern resolves to at least one existing path
    :param file_system: hadoop-fs-wrapper FileSystem
    :param path: glob pattern, e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: True if the pattern resolves to existing paths, otherwise False
    """
    return len(file_system.glob_status(path)) > 0
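
Because the helper above relies only on `glob_status`, it can be exercised without a cluster. The sketch below is one possible way to do that: `SupportsGlobStatus` and `StubFileSystem` are hypothetical stand-ins that are not part of hadoop-fs-wrapper, and `fnmatch` only approximates Hadoop glob semantics.

```python
import fnmatch
from typing import List, Protocol


class SupportsGlobStatus(Protocol):
    """Hypothetical protocol: any object exposing glob_status(path)."""

    def glob_status(self, path: str) -> list: ...


class StubFileSystem:
    """Hypothetical in-memory stand-in for FileSystem, for local tests only."""

    def __init__(self, known_paths: List[str]) -> None:
        self._known_paths = known_paths

    def glob_status(self, path: str) -> List[str]:
        # Approximation: unlike Hadoop globs, fnmatch lets '*' match '/' too.
        return [p for p in self._known_paths if fnmatch.fnmatchcase(p, path)]


def is_valid_source_path(file_system: SupportsGlobStatus, path: str) -> bool:
    """Same logic as the helper above, typed against the stub protocol."""
    return len(file_system.glob_status(path)) > 0


stub = StubFileSystem([
    "file:///data/in/part-0000.csv",
    "file:///data/in/part-0001.csv",
])
has_csv = is_valid_source_path(stub, "file:///data/in/part*.csv")
has_json = is_valid_source_path(stub, "file:///data/in/part*.json")
```

With a real Spark session, the `FileSystem` obtained from `FileSystem.from_spark_session` would be passed in place of the stub.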

Contribution

Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you require is not yet wrapped, please open an issue or create a PR.

All changes are tested against Spark 3.4 running in local mode.
