[FEAT]: Throw error for invalid ** usage outside folder segments (e.g. /tmp/**.csv) #3100
Changes from 3 commits
80fbfb3
3d0a066
7481286
d1a12ba
93301f6
37b91dd
b1202eb
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -404,6 +404,29 @@ pub async fn glob( | |
}; | ||
let glob = glob.as_str(); | ||
|
||
// We need to do some validation on the glob pattern before compiling it, since the globset crate is very permissive | ||
// and will happily compile patterns that don't make sense without throwing an error. | ||
fn verify_glob(glob: &str) -> super::Result<()> { | ||
// Catch for cases like `s3://bucket/path/**.txt` | ||
// NOTE: "\**" is a valid pattern that matches a literal `*`, followed by anything, so we need to only capture cases where `**` is not preceded by a backslash | ||
let re = regex::Regex::new(r"(?:[^\\]|^)\*\*").unwrap(); | ||
|
||
for segment in glob.split(GLOB_DELIMITER) { | ||
if re.is_match(segment) && segment != "**" { | ||
return Err(super::Error::InvalidArgument { | ||
msg: format!( | ||
"Invalid usage of '**' in glob pattern. The '**' wildcard must occupy an entire path segment and be surrounded by '{}' characters. Found invalid usage in '{}'.", | ||
Could we be more helpful with the error message as well? Would love to add a suggestion here for the user to do this glob path instead:
Good point. I've slightly rewritten the regex to process the full path instead of splitting it into delimited segments, and it now gives suggestions in this manner: is this the behaviour you expect?
|
||
GLOB_DELIMITER, glob | ||
), | ||
}); | ||
} | ||
} | ||
|
||
Ok(()) | ||
} | ||
|
||
verify_glob(glob)?; | ||
|
||
let glob_fragments = to_glob_fragments(glob)?; | ||
let full_glob_matcher = GlobBuilder::new(glob) | ||
.literal_separator(true) | ||
|
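The review thread on the error message asks for a suggested replacement pattern. The thread does not show the suggestion the PR ended up producing, so the following is only a sketch of one possible rewrite rule. The `suggest_glob` and `first_unescaped_double_star` helpers are hypothetical, the delimiter is assumed to be `/`, and the scan uses only the standard library in place of the `regex` crate.

```rust
// Assumption: the delimiter is "/"; the diff only refers to it as GLOB_DELIMITER.
const GLOB_DELIMITER: &str = "/";

/// Byte offset of the first `**` in `segment` that is not preceded by a backslash.
fn first_unescaped_double_star(segment: &str) -> Option<usize> {
    let bytes = segment.as_bytes();
    (0..bytes.len().saturating_sub(1)).find(|&i| {
        bytes[i] == b'*' && bytes[i + 1] == b'*' && (i == 0 || bytes[i - 1] != b'\\')
    })
}

/// Hypothetical helper: proposes a glob in which the first misplaced `**` of each
/// segment is moved into its own path segment. Returns `None` if nothing changes.
fn suggest_glob(glob: &str) -> Option<String> {
    let mut changed = false;
    let segments: Vec<String> = glob
        .split(GLOB_DELIMITER)
        .map(|segment| match first_unescaped_double_star(segment) {
            Some(i) if segment != "**" => {
                changed = true;
                // `i` points at an ASCII `*`, so both slices fall on char boundaries.
                let (prefix, suffix) = (&segment[..i], &segment[i + 2..]);
                let mut parts = Vec::new();
                if !prefix.is_empty() {
                    parts.push(format!("{}*", prefix));
                }
                parts.push("**".to_string());
                if !suffix.is_empty() {
                    parts.push(format!("*{}", suffix));
                }
                parts.join(GLOB_DELIMITER)
            }
            _ => segment.to_string(),
        })
        .collect();
    changed.then(|| segments.join(GLOB_DELIMITER))
}
```

Under this rule, `suggest_glob("/tmp/**.csv")` yields `/tmp/**/*.csv`, which could be appended to the `InvalidArgument` message. The rewritten pattern is a hint for the user and is not guaranteed to match exactly the files they had in mind.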
Nit: we can move this out of the function, and just run Rust unit-tests!
Ah, makes a lot of sense.
I wasn't familiar with how Rust unit tests worked, so I have moved the testing logic for the `verify_glob` function to `object_store_glob.rs`.
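As a rough illustration of that suggestion, here is a minimal sketch of `verify_glob` lifted to module level with a `#[cfg(test)]` module next to it. It is not the code that landed in `object_store_glob.rs`. It assumes `GLOB_DELIMITER` is `/`, returns a plain `String` error in place of `super::Error::InvalidArgument`, and replaces the regex with an equivalent standard-library scan (the `has_unescaped_double_star` helper is hypothetical) so the example has no external dependencies.

```rust
// Assumption: the delimiter is "/"; the diff only refers to it as GLOB_DELIMITER.
const GLOB_DELIMITER: &str = "/";

/// True if `segment` contains a `**` that is not preceded by a backslash.
/// Std-only stand-in for the regex `(?:[^\\]|^)\*\*` used in the diff.
fn has_unescaped_double_star(segment: &str) -> bool {
    let bytes = segment.as_bytes();
    bytes.windows(2).enumerate().any(|(i, pair)| {
        pair[0] == b'*' && pair[1] == b'*' && (i == 0 || bytes[i - 1] != b'\\')
    })
}

/// Module-level validation, so unit tests can call it directly.
/// Simplified error type: the PR returns `super::Error::InvalidArgument`.
pub fn verify_glob(glob: &str) -> Result<(), String> {
    for segment in glob.split(GLOB_DELIMITER) {
        if has_unescaped_double_star(segment) && segment != "**" {
            return Err(format!(
                "Invalid usage of '**' in glob pattern. The '**' wildcard must occupy an entire path segment and be surrounded by '{}' characters. Found invalid usage in '{}'.",
                GLOB_DELIMITER, glob
            ));
        }
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::verify_glob;

    #[test]
    fn accepts_double_star_as_whole_segment() {
        assert!(verify_glob("s3://bucket/path/**/*.txt").is_ok());
    }

    #[test]
    fn rejects_double_star_inside_segment() {
        assert!(verify_glob("s3://bucket/path/**.txt").is_err());
        assert!(verify_glob("/tmp/**.csv").is_err());
    }

    #[test]
    fn accepts_escaped_star_followed_by_star() {
        // `\**` is a literal `*` followed by a single-star wildcard.
        assert!(verify_glob(r"/tmp/\**.csv").is_ok());
    }
}
```

With the function at module level, `cargo test` exercises the validation directly, without going through the async `glob` function or an object store.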