-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.airflowignore does not use 'glob' as .gitignore for example #21392
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
I thin changing to glob would not be feasible, as glob and regexp are not compatible. But better error message would be nice indeed. |
Another think to add could be to add the list of ignored files to the logs (With some reasonable frequency) with information that they were matched by the regexp (that would cover the case where regexp woudl not fail, but you would end up in not ignoring the files that you want). I marked it as "good first issue" I think this one might be a good one for some "fresh" contribution. |
Yeah, if we wanted to change the format of this file to more closely match what people expect (which I agree is confusing) -- i.e. (Of the two I'd prefer a new name, but I don't know what I'd call the file) |
The comment would be also not good, because people would not know they have to add it :( . Same story as now. I think we are a bit stuck with regexp until |
One option might be to add a config option to specify the file syntax, which would default to
May be something I haven't considered though. |
@potiuk thoughts on the configuration suggestion? I think supporting the Unix glob-style is reasonable, since it would match user expectations from other tools much more closely:
These all support a similar (identical?) syntax, so we'd be in good company. If we think it's worth pursuing it's probably something I can take on. |
Yeah. That looks like a good idea to add configuration option. That's a clear path for deprecation as well - we coul add a deprecation for regexp now and swap the default in Airflow 3. |
A new configuration parameter "CORE_IGNORE_FILE_SYNTAX" is added to allow patterns in .airflowignore files to be interpreted as either regular expressions (the default) or glob expressions as found in .gitignore files. This allows users to use patterns they will be familiar with from tools such as git, helm and docker. Glob expressions support wildcard matches ("*", "?") within a directory as well as character classes ("[0-9]"). In addition, zero or more directories can be matched using "**". Patterns can be negated by prefixing a "!" at the beginning of the pattern. The "fnmatch" library in core Python does not produce patterns that are fully compliant with the kind of patterns that users will be used to from gitignore or dockerignore files, so the globs are parsed using the pathspec package from PyPI. To aid with debugging ignorefile patterns a more helpful error message is emitted in the logs for invalid patterns, which are now skipped rather than causing a hard-to-read scheduler stack trace.
A new configuration parameter "CORE_IGNORE_FILE_SYNTAX" is added to allow patterns in .airflowignore files to be interpreted as either regular expressions (the default) or glob expressions as found in .gitignore files. This allows users to use patterns they will be familiar with from tools such as git, helm and docker. Glob expressions support wildcard matches ("*", "?") within a directory as well as character classes ("[0-9]"). In addition, zero or more directories can be matched using "**". Patterns can be negated by prefixing a "!" at the beginning of the pattern. The "fnmatch" library in core Python does not produce patterns that are fully compliant with the kind of patterns that users will be used to from gitignore or dockerignore files, so the globs are parsed using the pathspec package from PyPI. To aid with debugging ignorefile patterns a more helpful error message is emitted in the logs for invalid patterns, which are now skipped rather than causing a hard-to-read scheduler stack trace. closes: #21392
Description
Hi,
It would be nice of the file
.airflowignore
also used 'glob' as.gitignore
for example. In my case when I used*_test.py
instead of.*_test.py
, I could not see the dags and also got errors when I usedairflow dags list
. The error message was not easy to understand. Without the help of @potiuk, I was lost.An example is shown below:
Traceback (most recent call last): File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap self.run() File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 370, in _run_processor_manager processor_manager.start() File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 610, in start return self._run_parsing_loop() File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 620, in _run_parsing_loop self._refresh_dag_dir() File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 755, in _refresh_dag_dir self._file_paths = list_py_file_paths(self._dag_directory) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 169, in list_py_file_paths file_paths.extend(find_dag_file_paths(directory, safe_mode)) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 187, in find_dag_file_paths for file_path in find_path_from_directory(str(directory), ".airflowignore"): File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 115, in find_path_from_directory patterns += [re.compile(line) for line in lines_no_comments if line] File "/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/file.py", line 115, in <listcomp> patterns += [re.compile(line) for line in lines_no_comments if line] File "/usr/local/lib/python3.8/re.py", line 252, in compile return _compile(pattern, flags) File "/usr/local/lib/python3.8/re.py", line 304, in _compile p = sre_compile.compile(pattern, flags) File "/usr/local/lib/python3.8/sre_compile.py", line 764, in compile p = sre_parse.parse(p, flags) File "/usr/local/lib/python3.8/sre_parse.py", line 948, in parse p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0) File "/usr/local/lib/python3.8/sre_parse.py", line 443, in _parse_sub itemsappend(_parse(source, state, verbose, nested + 1, File "/usr/local/lib/python3.8/sre_parse.py", line 668, in _parse raise source.error("nothing to repeat", re.error: nothing to repeat at position 0
Use case/motivation
No response
Related issues
No response
Are you willing to submit a PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: