This library provides elegant functions to manage HDFS filesystems and cloud object stores.
To use it, add the dependency to your sbt build:
libraryDependencies += "com.brayanjules" %% "hio" % "0.0.1"
For authentication, you can use environment variables or provide an XML configuration file:
- Config File
  - To set up a configuration file, follow the official Hadoop documentation.
  - Set the environment variable CONFIG_PATH to the path where the configuration file is stored.
  - S3 example:

<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>AWS access key ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>AWS secret key</value>
  </property>
</configuration>

- Environment Variables
  - AWS
    - AWS_ACCESS_KEY_ID
    - AWS_SECRET_ACCESS_KEY
  - Azure Object Store / Data Lake
    - AZURE_TENANT_ID
    - AZURE_CLIENT_ID
    - AZURE_CLIENT_SECRET
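
As a quick pre-flight check, the sketch below reports which of the environment variables listed above are missing. It is only an illustration: CredentialCheck and its missing method are hypothetical helpers, not part of hio, and the provider labels "AWS" and "Azure" are just map keys chosen for this example.

```scala
object CredentialCheck {
  // Variable names taken from the lists above, grouped by provider.
  // The grouping keys ("AWS", "Azure") are labels for this example only.
  val required: Map[String, Seq[String]] = Map(
    "AWS"   -> Seq("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "Azure" -> Seq("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
  )

  /** Returns the variables for `provider` that are unset or empty in `env`. */
  def missing(provider: String, env: Map[String, String] = sys.env): Seq[String] =
    required
      .getOrElse(provider, Seq.empty)
      .filterNot(name => env.get(name).exists(_.nonEmpty))
}
```

Calling CredentialCheck.missing("AWS") before the first hio call gives an empty sequence when both AWS variables are set, and the names of the unset ones otherwise.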
The mkdir function creates a folder with the given name in a filesystem or object store. The folder name must follow the naming rules of the provider.
val root = hio.Path("s3a://bucket_name")
hio.mkdir(root / "path/to/folder")
The ls function searches the given folder and returns the paths of every file and folder within it. It also supports wildcard searches.
val wd = hio.Path("s3a://bucket_name/path/to/folders")
hio.ls(wd)
Or, to search with a wildcard:
val wd = hio.Path("s3a://bucket_name/path/to/folders")
hio.ls.withWildCard(wd / "*.txt")
The function returns an array of strings containing the paths, for example:
ArraySeq(
"s3a://bucket_name/path/to/folders/file_1.txt",
"s3a://bucket_name/path/to/folders/file_2.txt",
"s3a://bucket_name/path/to/folders/new_folder")
Note that if you call the wildcard function with only a directory, you will not get the files and folders within it. It returns only the given folder, e.g.:
val wd = hio.Path("s3a://bucket_name/path/to/folders")
hio.ls.withWildCard(wd)
returns
ArraySeq("s3a://bucket_name/path/to/folders")
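
Because ls returns plain path strings, the result can also be narrowed on the client side. A minimal sketch, assuming the result is handled as a Seq[String]; LsFilter and bySuffix are hypothetical names for this example, not hio functions:

```scala
object LsFilter {
  /** Keeps only the paths whose last segment (the file or folder name) ends with `suffix`. */
  def bySuffix(paths: Seq[String], suffix: String): Seq[String] =
    paths.filter { p =>
      // Take everything after the final '/', or the whole string if there is none.
      val name = p.substring(p.lastIndexOf('/') + 1)
      name.endsWith(suffix)
    }
}
```

Applied to the sample listing above with the suffix ".txt", this keeps file_1.txt and file_2.txt and drops new_folder.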
The remove function permanently deletes files or folders from a filesystem or object store. It is also possible to recursively delete sub-folders and files using remove.all, e.g.:
val filePath = hio.Path("s3a://bucket_name/path/to/file")
hio.remove(filePath)
Or, to delete recursively:
val folderPath = hio.Path("s3a://bucket_name/path/to/folder")
hio.remove.all(folderPath)
The copy function copies the files in the source folder to the destination folder. It can also be used with a wildcard via copy.withWildCard, e.g.:
val src = hio.Path("s3a://bucket_name/path/to/src")
val dest = hio.Path("s3a://bucket_name/path/to/dest")
hio.copy(src,dest)
Or, with a wildcard:
val src = hio.Path("s3a://bucket_name/path/to/src/*.parquet")
val dest = hio.Path("s3a://bucket_name/path/to/dest")
hio.copy.withWildCard(src,dest)
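
Before running a wildcard copy, it can help to preview which file names a pattern would select. The sketch below uses the JDK's java.nio glob matcher as a stand-in. Whatever matching hio applies internally may differ from java.nio glob syntax in edge cases, and GlobPreview is a hypothetical helper, not part of the library.

```scala
import java.nio.file.{FileSystems, Paths}

object GlobPreview {
  /** Returns the file names that match `pattern` under java.nio glob rules. */
  def matching(names: Seq[String], pattern: String): Seq[String] = {
    // "glob:" selects the glob syntax of the default filesystem's PathMatcher.
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + pattern)
    names.filter(name => matcher.matches(Paths.get(name)))
  }
}
```

For instance, GlobPreview.matching(Seq("a.parquet", "b.csv"), "*.parquet") keeps only a.parquet.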
The move function copies the files in the source folder to the destination folder and then removes them from the source folder. It can also be used with a wildcard via move.withWildCard, e.g.:
val src = hio.Path("s3a://bucket_name/path/to/src")
val dest = hio.Path("s3a://bucket_name/path/to/dest")
hio.move(src,dest)
Or, with a wildcard:
val src = hio.Path("s3a://bucket_name/path/to/src/*.parquet")
val dest = hio.Path("s3a://bucket_name/path/to/dest")
hio.move.withWildCard(src,dest)
The write function creates a file in a filesystem from an array of bytes or a string. The destination folder must already exist before the file is created.
val fileContentInStr =
"""
|name,lastname,age
|Maria,Willis,36
|Benito,Jackson,28
|""".stripMargin
val wd = hio.Path("s3a://bucket_name/path/to/folders/data.csv")
hio.write(wd,fileContentInStr)
The read function reads a file from the filesystem or object store and returns its contents as a byte array or a string.
val wd = hio.Path("s3a://bucket_name/path/to/folders")
hio.read(wd / "file_1.txt")
Or, to get the contents as a string directly:
val wd = hio.Path("s3a://bucket_name/path/to/folders")
hio.read.string(wd / "file_1.txt")
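
The string returned by read.string can then be processed like any other Scala string. As an illustration, the sketch below splits simple CSV text, such as the data.csv content from the write example, into rows keyed by column name. CsvRows is a hypothetical helper, not part of hio, and it does not handle quoted or escaped fields.

```scala
object CsvRows {
  /** Parses simple CSV text into one Map per data row, keyed by the header columns. */
  def parse(text: String): Seq[Map[String, String]] = {
    // Drop blank lines and surrounding whitespace (this also strips '\r').
    val lines = text.split("\n").map(_.trim).filter(_.nonEmpty).toList
    lines match {
      case header :: rows =>
        val columns = header.split(",").toList
        // Pair each column name with the value at the same position.
        rows.map(row => columns.zip(row.split(",").toList).toMap)
      case Nil =>
        Nil
    }
  }
}
```

With the sample content, CsvRows.parse yields two rows, and the first row maps "name" to "Maria".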
We welcome contributions to this project. To contribute, check out our CONTRIBUTING.md file.
- SBT 1.8.2
- Java 8
- Scala 2.12.12
To compile, run
sbt compile
To test, run
sbt test
To generate artifacts, run
sbt package