Hh globstar #38
Thank you for the PR. Unfortunately, #19 is about adding support for If you want to fix #19, you need to write an updated algorithm for the glob readdir functions in this file that can successfully traverse extra directory levels when encountering a |
Oh, I hadn't seen that implementation of @oxinabox. My implementation is indeed almost identical. One way of supporting globstar syntax for
function globstar(g::Union{AbstractString,Glob.GlobMatch}, prefix::AbstractString = "";
        relative::Union{Bool, Nothing} = nothing,
        topdown::Bool = true,
        follow_symlinks::Bool = true,
        onerror::Union{Function, Nothing} = nothing
    )
    g = Glob.GlobMatch(g)
    relative === nothing && (relative = isempty(prefix))
    isempty(prefix) && (prefix = pwd())
    fn = FilenameMatch(join([p isa AbstractString ? p : p.pattern for p in g.pattern], "/"), PATHNAME)
    matches = String[]
    for (root, dirs, files) in walkdir(prefix; topdown, follow_symlinks, onerror)
        for file in files
            file = joinpath(root, file)
            relfile = relpath(file, prefix)
            relpattern = Sys.iswindows() ? replace(relfile, '\\' => '/') : relfile
            occursin(fn, relpattern) && push!(matches, relative ? relfile : file)
        end
    end
    matches
end
With this definition you get
julia> globstar(glob"**/*.jl")
9-element Vector{String}:
"atest.jl"
"btest.jl"
"ctest.jl"
"noskip\\test1.jl"
"noskip\\test2.jl"
"noskip\\test3.jl"
"noskip\\hh\\test3.jl"
"skip\\test.jl"
"skip\\test2.jl"
julia> globstar(glob"**/*.jl", ".")
9-element Vector{String}:
".\\atest.jl"
".\\btest.jl"
".\\ctest.jl"
".\\noskip\\test1.jl"
".\\noskip\\test2.jl"
".\\noskip\\test3.jl"
".\\noskip\\hh\\test3.jl"
".\\skip\\test.jl"
".\\skip\\test2.jl"
julia> globstar(glob"**/*.jl", pwd())
9-element Vector{String}:
"C:\\Users\\hh\\test\\atest.jl"
"C:\\Users\\hh\\test\\btest.jl"
"C:\\Users\\hh\\test\\ctest.jl"
"C:\\Users\\hh\\test\\noskip\\test1.jl"
"C:\\Users\\hh\\test\\noskip\\test2.jl"
"C:\\Users\\hh\\test\\noskip\\test3.jl"
"C:\\Users\\hh\\test\\noskip\\hh\\test3.jl"
"C:\\Users\\hh\\test\\skip\\test.jl"
"C:\\Users\\hh\\test\\skip\\test2.jl"
Edit: fixed rootdir |
The option |
P.S.: the above implementation only works with the current PR in place. |
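For readers who do not have the PR branch checked out, the same "walk everything, then filter" idea can be sketched with Base Julia alone. This is not the code from the PR or from Glob.jl: `globstar_regex` and `simple_globstar` are hypothetical names, and the translation only handles `*`, `?`, and `**/`, with none of the other fnmatch features.

```julia
# Hypothetical helper (not part of Glob.jl): translate a glob containing
# `*`, `?`, and `**/` into an anchored Regex over '/'-separated paths.
function globstar_regex(pattern::AbstractString)
    io = IOBuffer()
    print(io, '^')
    i = firstindex(pattern)
    while i <= lastindex(pattern)
        c = pattern[i]
        if startswith(SubString(pattern, i), "**/")
            print(io, "(?:[^/]+/)*")      # zero or more whole directory levels
            i += 3                        # "**/" is three ASCII bytes
        elseif c == '*'
            print(io, "[^/]*")            # any run of characters within one level
            i = nextind(pattern, i)
        elseif c == '?'
            print(io, "[^/]")             # exactly one character within one level
            i = nextind(pattern, i)
        else
            c in "\\.+()[]{}^\$|" && print(io, '\\')   # escape regex metacharacters
            print(io, c)
            i = nextind(pattern, i)
        end
    end
    print(io, '$')
    return Regex(String(take!(io)))
end

# Hypothetical helper: list all files under `root` whose relative path matches.
function simple_globstar(pattern::AbstractString, root::AbstractString = ".")
    rx = globstar_regex(pattern)
    matches = String[]
    for (dir, _, files) in walkdir(root)
        for file in files
            rel = relpath(joinpath(dir, file), root)
            Sys.iswindows() && (rel = replace(rel, '\\' => '/'))
            occursin(rx, rel) && push!(matches, rel)
        end
    end
    return sort!(matches)
end
```

Like the function above, this visits every file under `root` before filtering, so it trades the directory pruning of the existing `glob` for simplicity.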
Yes, that seems somewhat right. It is less flexible than the current implementation (which avoids making any assumptions about the contents of what makes up a path and permits arbitrary matches in the sequence including regexes), but otherwise somewhat accurate to it |
One could define this function for handling string args and otherwise test whether the pattern vector contains "**" and doesn't contain regexes and only then call this function, otherwise keep the old version in place for compatibility. |
There's a severe bug in the approach of my |
Your code seems actually pretty close to right, since you do need to first generate a list of all candidates and then filter out the ones that don't match. To maintain the rest of the matching ability, you just need to replace |
It seemed very close, but it actually wasn't, because with multiple globstars at the latest the function would always fail. So I reverted the changes and instead wrapped the main part of the function in a while loop which restarts the matching process with a new directory. Multiple levels of globstars are supported by using an array of globstarmatches. I also added directory matching and a fileonly mode. Finally, I added a bunch of tests and hope that all edge cases are covered. There is one point I observed when matching the old way:
julia> glob(["a", r".", "c"])
4-element Vector{String}:
"a\\.b\\c"
"a\\.c\\c"
"a\\b\\c"
"a\\c\\c"
So it seems that one would have to set start and end markers in order to only match a single character for the middle directory. |
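The output above is consistent with how `occursin` treats an unanchored `Regex` in Base Julia: it succeeds if the pattern matches anywhere in the string. A small illustration, independent of Glob.jl, of what adding start and end markers changes:

```julia
# Unanchored: r"." matches as soon as any single character is found,
# so multi-character directory names such as ".b" also pass.
unanchored = r"."
# Anchored: the whole name must consist of exactly one character.
anchored = r"^.$"

names = [".b", ".c", "b", "c"]

all_hits    = filter(n -> occursin(unanchored, n), names)
single_hits = filter(n -> occursin(anchored, n), names)
```

Here `all_hits` keeps all four names, while `single_hits` keeps only `"b"` and `"c"`.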
I think I got everything sorted out by now. |
Some final thought perhaps:
|
@vtjnash I continued thinking about my statement about multiple globstars. I thought that I had found an example that would not pass without my array queue, but I couldn't reproduce it. Mathematically, I now think it is not necessary to retry with a new position of the first wildcard once the second one is reached. So not queuing saves a lot of computing. Therefore, I went back to simple indexing, as you did with star, and all tests pass. |
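The claim that only the most recent wildcard needs a backtrack position is the usual argument for greedy wildcard matching. A minimal sketch of that idea on path components follows. It is not the code from this PR: `match_components` is a hypothetical name, and components are compared by plain equality here, where the PR would use an fnmatch call per component.

```julia
# Hypothetical helper: `pat` and `path` are vectors of path components.
# "**" matches zero or more components; every other entry must be equal.
# Only the position of the most recent "**" is remembered for backtracking.
function match_components(pat::AbstractVector{<:AbstractString},
                          path::AbstractVector{<:AbstractString})
    p, s = 1, 1
    star_p, star_s = 0, 0    # index of the last "**" and the path index it started at
    while s <= length(path)
        if p <= length(pat) && pat[p] == "**"
            star_p, star_s = p, s          # remember this "**", try matching zero components
            p += 1
        elseif p <= length(pat) && pat[p] == path[s]
            p += 1
            s += 1
        elseif star_p != 0
            star_s += 1                    # let the last "**" absorb one more component
            p, s = star_p + 1, star_s
        else
            return false
        end
    end
    # Trailing "**" entries may match nothing.
    while p <= length(pat) && pat[p] == "**"
        p += 1
    end
    return p > length(pat)
end

ok  = match_components(["**", "a", "**", "b"], ["x", "a", "y", "a", "b"])
bad = match_components(["**", "a", "**", "b"], ["a", "b", "c"])
```

In this sketch `ok` is `true` and `bad` is `false`. Once a later `**` is reached, any earlier one never needs to be revisited, because the later `**` can absorb whatever the earlier one would have given up.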
I just want to apologize that I have barely any time for development for the next couple weeks, but I should be fully back in July. Originally my expectation was that if someone wanted fancier options, they could construct the object manually, instead of using the macro to split it. |
@vtjnash understood, thanks for the feedback. I'm still not completely satisfied with the current status for various reasons
If ever you find the time for thinking about these points, just drop a note here |
Any news on this one? |
Yeah, I am finally able to get back to this. Thank you for your patience and persistence! I think we should break this down into a couple different PRs for easier discussion and review:
Would you mind making a new PR with just the fnmatch improvements? We can keep this open until everything is merged, but it would help me with thinking through reviewing that it handles https://man.freebsd.org/cgi/man.cgi?query=zshexpn&sektion=1 Looks like the patterns to detect are
This sounds like the expected behavior of using fnmatch without the FNM_PATHNAME flag. Is there an additional distinction?
I don't entirely know why I did that. Sounds like a bug actually, as I meant for it to generate separate fnmatch calls for each directory, such that each component of the glob was split on |
That sounds like a good proposal. I will come up with a globstar PR for fnmatch asap. |
Close in favour of #39 |
This PR addresses #19