Use ResourceContexts.jl for resource handling #12

Merged 5 commits on May 28, 2021

2 changes: 2 additions & 0 deletions Project.toml
@@ -5,6 +5,7 @@ version = "0.2.2"

[deps]
AbstractTrees = "1520ce14-60c1-5f80-bbc7-55ef81b5835c"
ResourceContexts = "8d208092-d35c-4dd3-a0d7-8325f9cce6b4"
REPL = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"
ReplMaker = "b873ce64-0db9-51f5-a568-4457d8e49576"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
@@ -15,6 +16,7 @@ UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
AbstractTrees = "0.3"
ReplMaker = "0.2"
TOML = "1"
ResourceContexts = "0.1"
julia = "1.5"

[extras]
2 changes: 2 additions & 0 deletions docs/dev.jl
@@ -0,0 +1,2 @@
using LiveServer
servedocs(doc_env=true, foldername=@__DIR__)
51 changes: 39 additions & 12 deletions docs/src/tutorial.md
@@ -10,7 +10,8 @@ end
DocTestFilters = [
r"(?<=Project: \[).*$",
r"path =.*",
r"@.*"
r"@.*",
r"(?<=IOStream\().*",
]
```

@@ -85,17 +86,42 @@ path = ".../DataSets/docs/src/data/file.txt"

## Loading Data

To load data, call the `open()` function on the `DataSet` and pass the desired
Julia type which will be returned. For example, to read the dataset named
`"a_text_file"` as a `String`,
You can call `open()` on a DataSet to inspect the data inside. `open()` will
return the [`Blob`](@ref) and [`BlobTree`](@ref) types for local files and
directories on disk. For example,

```jldoctest
julia> open(dataset("a_text_file"))
📄 @ .../DataSets/docs/src/data/file.txt

julia> open(dataset("a_tree_example"))
📂 Tree @ .../DataSets/docs/src/data/csvset
📄 1.csv
📄 2.csv
```

Use the form `open(T, dataset)` to read the data as a specific type. `Blob`
data can be opened as `String`, `IO`, or `Vector{UInt8}`, depending on your
needs:

```jldoctest
julia> io = open(IO, dataset("a_text_file"))
IOStream(<file .../DataSets/docs/src/data/file.txt>)

julia> read(io, String)
"Hello world!\n"

julia> buf = open(Vector{UInt8}, dataset("a_text_file"));

julia> String(buf)
"Hello world!\n"

julia> open(String, dataset("a_text_file"))
"Hello world!\n"
```

It's also possible to open this data as an `IO` stream, in which case the do
block form should be used:
To ensure the dataset is closed again in a timely way (freeing any resources
such as file handles), you should use the scoped form, for example:

```jldoctest
julia> open(IO, dataset("a_text_file")) do io
@@ -106,10 +132,11 @@ julia> open(IO, dataset("a_text_file")) do io
content = "Hello world!\n"
```
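
As a rough illustration of why the tutorial recommends the scoped form, the difference can be sketched with plain `Base` file I/O. This is not the DataSets.jl API: the temporary file, its contents, and the `leaked` reference are assumptions made up for the example.

```julia
# Illustration only: Base file I/O on a temporary file, not DataSets.jl.
path, tmpio = mktemp()
write(tmpio, "Hello world!\n")
close(tmpio)

# Unscoped form: the caller holds the handle and is responsible for closing it.
io = open(path)
unscoped = read(io, String)
close(io)

# Scoped form: the handle is closed when the do block returns,
# even if the body throws an exception.
leaked = Ref{IO}()          # keeps a reference so we can inspect the stream afterwards
scoped = open(path) do io
    leaked[] = io
    read(io, String)
end

@assert unscoped == scoped
@assert !isopen(leaked[])   # the stream was closed when the block exited
rm(path)
```

In the scoped form the lifetime of the file handle is tied to the block, so cleanup does not depend on the caller remembering `close()` or on the garbage collector running a finalizer.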

Let's also inspect the tree example using the tree data type
[`BlobTree`](@ref). Such data trees can be indexed with path components to get
at the file [`Blob`](@ref)s inside, which in turn can be `open`ed to retrieve
the data.
Let's look at some tree-like data which is represented on local disk as a
folder or directory. Tree data is opened in Julia as the [`BlobTree`](@ref)
type and can be indexed with path components to get at the file [`Blob`](@ref)s
inside. In turn, we can `open()` one of the file blobs and look at the data
contained within.

```jldoctest
julia> tree = open(BlobTree, dataset("a_tree_example"))
@@ -118,9 +145,9 @@ julia> tree = open(BlobTree, dataset("a_tree_example"))
📄 2.csv

julia> tree["1.csv"]
📄 1.csv @ .../DataSets/test/data/csvset
📄 1.csv @ .../DataSets/docs/src/data/csvset

julia> Text(open(String, tree["1.csv"]))
julia> open(String, tree["1.csv"]) |> Text
Name,Age
"Aaron",23
"Harry",42
49 changes: 43 additions & 6 deletions src/BlobTree.jl
@@ -127,7 +127,7 @@ mapped into the program as an `IO` byte stream, or interpreted as a `String`.

Blobs can be arranged into hierarchies "directories" via the `BlobTree` type.
"""
struct Blob{Root}
mutable struct Blob{Root}
root::Root
path::RelPath
end
@@ -148,7 +148,7 @@ function AbstractTrees.printnode(io::IO, file::Blob)
print(io, "📄 ", basename(file))
end

# Opening as Vector{UInt8} or as String uses IO interface
# Opening as Vector{UInt8} or as String defers to IO interface
function Base.open(f::Function, ::Type{Vector{UInt8}}, file::Blob)
open(IO, file.root, file.path) do io
f(read(io)) # TODO: use Mmap?
@@ -174,9 +174,42 @@ function Base.open(f::Function, ::Type{T}, file::Blob; kws...) where {T}
open(f, T, file.root, file.path; kws...)
end

# Unscoped form of open
function Base.open(::Type{T}, file::Blob; kws...) where {T}
open(identity, T, file; kws...)
# ResourceContexts.jl - based versions of the above.

@! function Base.open(::Type{Vector{UInt8}}, file::Blob)
@context begin
# TODO: use Mmap?
read(@! open(IO, file.root, file.path))
end
end

@! function Base.open(::Type{String}, file::Blob)
@context begin
read(@!(open(IO, file.root, file.path)), String)
end
end

# Default open-type for Blob is IO
@! function Base.open(file::Blob; kws...)
@! open(IO, file.root, file.path; kws...)
end

# Opening Blob as itself is trivial
@! function Base.open(::Type{Blob}, file::Blob)
file
end

# open with other types T defers to the underlying storage system
@! function Base.open(::Type{T}, file::Blob; kws...) where {T}
@! open(T, file.root, file.path; kws...)
end

# Unscoped form of open for Blob
function Base.open(::Type{T}, blob::Blob; kws...) where {T}
@context begin
result = @! open(T, blob; kws...)
@! ResourceContexts.detach_context_cleanup(result)
end
end

# read() is also supported for `Blob`s
@@ -230,7 +263,7 @@ julia> tree[path"csvset"]
📄 2.csv
```
"""
struct BlobTree{Root} <: AbstractBlobTree
mutable struct BlobTree{Root} <: AbstractBlobTree
root::Root
path::RelPath
end
@@ -315,4 +348,8 @@ function Base.open(f::Function, ::Type{BlobTree}, tree::BlobTree)
f(tree)
end

@! function Base.open(::Type{BlobTree}, tree::BlobTree)
tree
end

# Base.open(::Type{T}, file::Blob; kws...) where {T} = open(identity, T, file.root, file.path; kws...)
61 changes: 31 additions & 30 deletions src/DataSets.jl
@@ -3,6 +3,7 @@ module DataSets
using UUIDs
using TOML
using SHA
using ResourceContexts

export DataSet, dataset, @datafunc, @datarun
export Blob, BlobTree, newfile, newdir
@@ -508,6 +509,10 @@ function add_storage_driver((name,opener)::Pair)
_drivers[name] = opener
end

#-------------------------------------------------------------------------------
# Functions for opening datasets

# do-block form of open()
function Base.open(f::Function, as_type, dataset::DataSet)
storage_config = dataset.storage
driver = _drivers[storage_config["driver"]]
@@ -516,43 +521,39 @@ function Base.open(f::Function, as_type, dataset::DataSet)
end
end

# For convenience, this non-scoped open() just returns the data handle as
# opened. See check_scoped_open for a way to help users avoid errors when using
# this (ie, if `identity` is not a valid argument to open() because resources
# would be closed before it returns).
#
# FIXME: Consider removing this. It should likely be replaced with `load()`, in
# analogy to FileIO.jl's load operation:
# * `load()` is "load the entire file into memory as such-and-such type"
# * `open()` is "open this resource, and run some function while it's open"
Base.open(as_type, conf::DataSet) = open(identity, as_type, conf)

"""
check_scoped_open(func, as_type)
# Contexts-based form of open()
@! function Base.open(dataset::DataSet)
storage_config = dataset.storage
driver = _drivers[storage_config["driver"]]
# Use `enter_do` because drivers don't yet use the ResourceContexts.jl mechanism
(storage,) = @! enter_do(driver, storage_config, dataset)
storage
end

Call `check_scoped_open(func, as_type)` in your implementation of `open(func,
as_type, data)` if you clean up or `close()` resources by the time `open()`
returns.
@! function Base.open(as_type, dataset::DataSet)
storage = @! open(dataset)
@! open(as_type, storage)
end

That is, if the unscoped form `use(open(AsType, data))` is invalid and the
following scoped form required:
# TODO:
# Consider making a distinction between open() and load().

```
open(AsType, data) do x
use(x)
# Finalizer-based version of open()
function Base.open(dataset::DataSet)
@context begin
result = @! open(dataset)
@! ResourceContexts.detach_context_cleanup(result)
end
end
```

The dichotomy of resource handling techniques in `open()` is due to an
unresolved language design problem of how resource handling and cleanup should
work (see https://github.com/JuliaLang/julia/issues/7721).
"""
check_scoped_open(func, as_type) = nothing

function check_scoped_open(func::typeof(identity), as_type)
throw(ArgumentError("You must use the scoped form `open(your_function, AsType, data)` to open as type $as_type"))
function Base.open(as_type, dataset::DataSet)
@context begin
result = @! open(as_type, dataset)
@! ResourceContexts.detach_context_cleanup(result)
end
end

#-------------------------------------------------------------------------------
# Application entry points
include("entrypoint.jl")

12 changes: 8 additions & 4 deletions src/filesystem.jl
@@ -29,13 +29,17 @@ Base.read(root::AbstractFileSystemRoot, path::RelPath) where {T} =

Base.summary(io::IO, root::AbstractFileSystemRoot) = print(io, sys_abspath(root))

function Base.open(f::Function, ::Type{IO}, root::AbstractFileSystemRoot, path;
write=false, read=!write, kws...)
function Base.open(f::Function, as_type::Type{IO}, root::AbstractFileSystemRoot, path;
kws...)
@context f(@! open(as_type, root, path; kws...))
end

@! function Base.open(::Type{IO}, root::AbstractFileSystemRoot, path;
write=false, read=!write, kws...)
if !iswriteable(root) && write
error("Error writing file at read-only path $path")
end
check_scoped_open(f, IO)
open(f, sys_abspath(root, path); read=read, write=write, kws...)
@! open(sys_abspath(root, path); read=read, write=write, kws...)
end

function Base.mkdir(root::AbstractFileSystemRoot, path::RelPath; kws...)
36 changes: 32 additions & 4 deletions test/runtests.jl
@@ -1,6 +1,7 @@
using DataSets
using Test
using UUIDs
using ResourceContexts

using DataSets: FileSystemRoot

@@ -37,21 +38,48 @@ end
@test ds.uuid == UUID("b498f769-a7f6-4f67-8d74-40b770398f26")
end

@testset "open() for DataSet" begin
proj = DataSets.load_project("Data.toml")

text_data = dataset(proj, "a_text_file")
@test open(text_data) isa Blob
@test read(open(text_data), String) == "Hello world!\n"
@context begin
@test read(@!(open(text_data)), String) == "Hello world!\n"
end

tree_data = dataset(proj, "a_tree_example")
@test open(tree_data) isa BlobTree
@context begin
@test @!(open(tree_data)) isa BlobTree
tree = @! open(tree_data)
@test readdir(tree) == ["1.csv", "2.csv"]
end
end

#-------------------------------------------------------------------------------
@testset "open() functions" begin
@testset "open() for Blob and BlobTree" begin
blob = Blob(FileSystemRoot("data/file.txt"))
@test open(identity, String, blob) == "Hello world!\n"
@test String(open(identity, Vector{UInt8}, blob)) == "Hello world!\n"
@test open(io->read(io,String), IO, blob) == "Hello world!\n"
@test open(io->read(io,String), IO, blob) == "Hello world!\n"
@test open(identity, Blob, blob) === blob
# Unscoped form for types which support it.
# Unscoped forms
@test open(String, blob) == "Hello world!\n"
@test String(open(Vector{UInt8}, blob)) == "Hello world!\n"
@test_throws ArgumentError("You must use the scoped form `open(your_function, AsType, data)` to open as type IO") open(IO, blob)
@test read(open(IO, blob), String) == "Hello world!\n"

tree = BlobTree(FileSystemRoot("data"))
@test open(identity, BlobTree, tree) === tree

# Context-based forms
@context begin
@test @!(open(String, blob)) == "Hello world!\n"
@test String(@! open(Vector{UInt8}, blob)) == "Hello world!\n"
@test read(@!(open(IO, blob)), String) == "Hello world!\n"
@test @!(open(Blob, blob)) === blob
@test @!(open(BlobTree, tree)) === tree
end
end

#-------------------------------------------------------------------------------