-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the TOML stdlib in code loading #36018
Conversation
Does this make Pkg operations faster? Thinking of #27414 (comment) |
If you mean loading packages, then I don't think it was bottlenecked on reading the TOML files, even though it did it quite inefficiently. |
Any measurements on how this affects code loading speed? One thought I’d had for that was to specify what values you’d like to extract from a file and only return that data instead of constructing the whole data structure. So if you’re just looking for a single value, it could be done with no allocation at all. |
Yeah, it is possible, but the time to fully parse my pretty big v1.4 Manifest.toml file is ~300 us (including reading it from disc). Compared to the cost of actually loading a package that is negligible. I don't think this really needs to be optimized further. |
I was thinking about the issue of parsing the same file over and over. But maybe caching is better for that. |
I think it is good to completely avoid opening and reading through the file which a cache should help with. Also, it is simpler in the sense that you don't need a second version of the parser that does something more akin lazy parsing. |
3f9ecfd
to
741cff8
Compare
base/toml_parser.jl
Outdated
# [a.b.c.d] doesn't "define" the table [a] | ||
# so keys can later be added to [a], therefore | ||
# we need to keep track of what tables are | ||
# actualyl "defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# actualyl "defined | |
# actually defined |
base/toml_parser.jl
Outdated
@eval macro $(Symbol("try"))(expr) | ||
:( | ||
v = $(esc(expr)); | ||
v isa $ParserError && return v; | ||
v; | ||
) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eval macro $(Symbol("try"))(expr) | |
:( | |
v = $(esc(expr)); | |
v isa $ParserError && return v; | |
v; | |
) | |
end | |
@eval macro $(:var"try")(expr) | |
return quote | |
v = $(esc(expr)) | |
v isa ParserError && return v | |
v | |
end | |
end |
base/toml_parser.jl
Outdated
function recurse_dict!(l::Parser, d::Dict, dotted_keys::AbstractVector{String}, check=true)::Err{TOMLDict} | ||
for i in 1:length(dotted_keys) | ||
key = dotted_keys[i] | ||
d = get!(() -> TOMLDict(), d, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d = get!(() -> TOMLDict(), d, key) | |
d = get!(TOMLDict, d, key) |
base/toml_parser.jl
Outdated
|
||
isvalid_hex(c::Char) = isdigit(c) || ('a' <= c <= 'f') || ('A' <= c <= 'F') | ||
isvalid_oct(c::Char) = '0' <= c <= '7' | ||
isvalid_binary(c::Char) = '0' <= c <= '2' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isvalid_binary(c::Char) = '0' <= c <= '2' | |
isvalid_binary(c::Char) = '0' <= c <= '1' |
base/toml_parser.jl
Outdated
contains_underscore |= read_underscore | ||
end | ||
if !ok_end_value(peek(l)) | ||
error() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: message is required
base/toml_parser.jl
Outdated
subs = take_substring(l) | ||
# Need to pass a AbstractString to `parse` so materialize it in case it | ||
# contains underscore. | ||
# vvvvvvv <- this looksl like a dude flipping the bird |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# vvvvvvv <- this looksl like a dude flipping the bird |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:(
base/toml_parser.jl
Outdated
e isa Base.OverflowError && return(ParserError(ErrOverflowError)) | ||
error("internal parser error: did not correctly discredit $(repr(s)) as an int") | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end | |
return v | |
end |
base/toml_parser.jl
Outdated
set_marker!(l) | ||
@try accept_two(l, isdigit) | ||
day = parse_int(l, false) | ||
# Verify the real range in the constructor below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't true--as stated at the top of the file, no validity checking is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is true when you have the Dates stdlib available:
julia> Date(1989, 02, 29)
ERROR: ArgumentError: Day: 29 out of range (1:28)
base/toml_parser.jl
Outdated
if ok_end_value(peek(l)) | ||
if (read_space = accept(l, ' ')) | ||
if !isdigit(peek(l)) | ||
return Date(year, month, day) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you intend for this to throw errors (whereas everything else returns them)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a try catch here as well.
base/toml_parser.jl
Outdated
accept_two(l, f::F) where {F} = accept_n(l, 2, f) || return(ParserError(ErrParsingDateTime)) | ||
function parse_datetime(l)::Err{Union{DateTime, Date}} | ||
# Year has already been eaten when we reach here | ||
year = parse_int(l, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
year = parse_int(l, false) | |
year = parse_int(l, false)::Int |
base/loading.jl
Outdated
entry = first(d[name]) | ||
if found_name | ||
uuid = get(entry, "uuid", nothing) | ||
return PkgId(UUID(uuid), name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ERROR: MethodError: Cannot
convert an object of type Nothing to an object of type UInt128
base/loading.jl
Outdated
error("expected a single entry for $(repr(name)) in $(repr(project_file))") | ||
end | ||
entry = first(d[name]) | ||
if found_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always true here?
elseif deps isa Dict | ||
for (dep, uuid) in deps | ||
if dep === name | ||
return PkgId(UUID(uuid), name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uuid
might not be of the proper type if the file is malformed
base/loading.jl
Outdated
return nothing | ||
found_where || return nothing | ||
found_name || return PkgId(name) | ||
# Only here is deps was not a dict which mean we have a unique name for the dep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Only here is deps was not a dict which mean we have a unique name for the dep | |
# Only here if deps was not a dict which mean we have a unique name for the dep |
base/toml_parser.jl
Outdated
# In case we do not have the Dates stdlib available | ||
# we parse DateTime into these internal structs, | ||
# note that these do not do any argument checking | ||
struct Date | ||
year::Int | ||
month::Int | ||
day::Int | ||
end | ||
struct Time | ||
hour::Int | ||
minute::Int | ||
second::Int | ||
ms::Int | ||
end | ||
struct DateTime | ||
date::Date | ||
time::Time | ||
end | ||
DateTime(y, m, d, h, mi, s, ms) = | ||
DateTime(Date(y,m,d), Time(h, mi, s, ms)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be so much better if we could either use delayed definitions here (Requires.jl?) or make these a parser error (unsupported) rather than re-implementing incompatible structs with the same names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't make it a parser error (we need to support arbitrary TOML files) and we want code loading to work without Dates. So this was the best I could come up with.
Added the docs and tests labels since the new toml_parser.jl file will need both. Otherwise, most looks pretty straight-forward here and should be in pretty good shape to merge. |
base/loading.jl
Outdated
p::TOML.Parser | ||
d::Dict{String, Dict{String, Any}} | ||
end | ||
TOMLCache() = TOMLCache(TOML.Parser(), Dict()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't matter, but maybe:
TOMLCache() = TOMLCache(TOML.Parser(), Dict()) | |
TOMLCache() = TOMLCache(TOML.Parser(), Dict{String,Dict{String,Any}}()) |
?
Thanks for pointing out the better alternative. It seems to be identical to the difference between x::T = y and x = y::T I'll switch to the style you demonstrated for future commits. |
b1d6fc8
to
240a190
Compare
I've rebased this and changed the target branch to https://github.com/JuliaLang/julia/tree/kc/load_toml for a simpler review. |
@timholy If you want to do some invalidation sanity checks, you could use this branch together with JuliaLang/Pkg.jl#1984 and see how things look. |
240a190
to
47119ee
Compare
47119ee
to
031448d
Compare
This is ready to go but it breaks Revise and I don't want to break people's workflow so I'll wait a bit with merging. It should be a fairly small change to https://github.com/timholy/Revise.jl/blob/8336b10a5e7d0e4b1bb0726cf426469c8f866d2e/src/pkgs.jl#L459-L488 @timholy. I didn't really know what the function does so I had a hard time updating it myself. |
… Base TOML parser
031448d
to
c601fbf
Compare
Revise has been updated to deal with this PR now. |
… Base TOML parser (JuliaLang#36018)
… Base TOML parser (JuliaLang#36018)
The current way TOML files are parsed in Base during code loading has a few issues:
Therefore, it would make sense to try:
This PR starts this process by putting a newly written TOML parser into Base and re-implements code loading to take advantage of it. The parser is available as a package with proper API, docs, tests, etc (https://github.com/KristofferC/TOMLX.jl) but the only thing this PR adds is the parser part (with a more low-level API) since that is what is needed in Base and I wanted to make this PR have as little "noise" as possible.
It has a significant number of tests (https://github.com/KristofferC/TOMLX.jl/blob/master/test/readme.jl, https://github.com/KristofferC/TOMLX.jl/blob/master/test/toml_test.jl) and I have benchmarked it pretty thoroughly.
For this PR, I think it makes sense to mostly comment on the code loading parts (@StefanKarpinski for review). For the TOML parsing itself, perhaps it makes sense to focus that discussion to the https://github.com/KristofferC/TOMLX.jl/ repo?.
After this step is done, we update Pkg to start using the parser here for it's parsing.
Then we can try to finally make a TOML stdlib and have Pkg depend on that. This will remove the last bundled dependency for Pkg.
There is a TODO in this PR and that is that I right now use a global cache object to store the
filename => Dict
mapping in code loading which gets empited inBase.require
. This should be moved to a function local cache that gets created inBase.require
and threaded through all functions. I wanted to put up this PR anyway for people to get a chance to review things as soon as possible.One possible question is why this parser is not just the one that currently lives in the
ext
folder in Pkg. The reason for this is that theext
- parser (forked from https://github.com/JuliaLang/TOML.jl) was written a quite long time ago for an old spec of TOML and requires some significant updates. I was also a bit unhappy with the performance of it (it allocates scratch buffers a bit too much and makes copies of strings etc) so I felt a clean implementation might lead to a better end result.Fixes #27414.