#16037 Templated requirements.txt in Python operators #17349

Merged on Jan 7, 2022 (21 commits into apache:main)


@rounakdatta (Contributor) commented Jul 30, 2021

Closes: #16037
This pull request addresses the changes discussed in the aforementioned issue.

  • Added new unit tests
  • This change is not breaking
  • Updated the documentation of the operator

Changes:

  • Support for passing a requirements.txt file in the requirements field of the PythonVirtualenvOperator.
  • The file can have any name ending in .txt and can be Jinja-templated.

Note: template_searchpath must be set appropriately when the template file is stored in an arbitrary location.
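The resolution step can be modelled roughly as follows: a string value ending in .txt is treated as a template file name, looked up in each template_searchpath directory in turn, read, and rendered, while a list is used as-is. This is a stdlib-only sketch under stated assumptions, not Airflow's implementation: `resolve_requirements` and `TEMPLATE_EXT` are hypothetical, and `string.Template` (`${name}` placeholders) stands in for Jinja.

```python
import os
import string
from typing import List, Mapping, Sequence, Union

# Assumption: file extensions that mark a string value as a template file name.
TEMPLATE_EXT = (".txt",)


def resolve_requirements(
    requirements: Union[str, Sequence[str]],
    searchpath: Sequence[str],
    context: Mapping[str, str],
) -> List[str]:
    """Turn a requirements value into pip requirement lines (hypothetical helper)."""
    if isinstance(requirements, str) and requirements.endswith(TEMPLATE_EXT):
        # Look the file up in each search path directory; the first match wins.
        for directory in searchpath:
            candidate = os.path.join(directory, requirements)
            if os.path.isfile(candidate):
                with open(candidate, encoding="utf-8") as fh:
                    # string.Template stands in for Jinja rendering in this sketch.
                    rendered = string.Template(fh.read()).substitute(context)
                break
        else:
            raise FileNotFoundError(f"{requirements!r} not found in {list(searchpath)}")
        lines = rendered.splitlines()
    elif isinstance(requirements, str):
        # Any other string is treated as requirements.txt contents.
        lines = requirements.splitlines()
    else:
        # Backwards-compatible case: an explicit list of requirement specifiers.
        lines = list(requirements)
    # Drop blank lines and whole-line comments, which pip ignores in requirements files.
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]
```

In a DAG, the equivalent would be passing `requirements="requirements.txt"` with the DAG's template_searchpath pointing at the directory that holds the file, as the note above describes.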

@boring-cyborg bot added the area:core-operators label (Operators, Sensors and hooks within Core Airflow) Jul 30, 2021
@github-actions commented:

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions bot added the full tests needed label (We need to run full set of tests for this PR to merge) Jul 30, 2021
@rounakdatta requested a review from uranusjr August 25, 2021 13:17
@potiuk (Member) left a comment

I think we need to keep prepare_virtualenv backwards compatible.

@@ -87,14 +87,15 @@ def prepare_virtualenv(
:param system_site_packages: Whether to include system_site_packages in your virtualenv.
See virtualenv documentation for more information.
:type system_site_packages: bool
:param requirements: List of additional python packages
:type requirements: List[str]
:param requirements: Path to the requirements.txt file

For backwards compatibility, I think we should still handle the case where requirements is a List[str]. It makes the code a bit more complex, but I think it is needed.

I propose keeping the old requirements parameter for List[str] only, adding a new requirements_file_path parameter, and checking that only one of the two is passed (acting accordingly). Naming a path to a requirements file "requirements" is very ambiguous.
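A minimal sketch of the proposed check, assuming the old `requirements` list keeps its meaning and a separate `requirements_file_path` is added. `build_pip_args` is a hypothetical helper, not the merged code; it only shows the "exactly one source" rule and how each source could map onto `pip install` arguments.

```python
from typing import List, Optional


def build_pip_args(
    requirements: Optional[List[str]] = None,
    requirements_file_path: Optional[str] = None,
) -> List[str]:
    """Return `pip install` arguments from exactly one source (hypothetical helper)."""
    if requirements is not None and requirements_file_path is not None:
        # Ambiguous call: refuse rather than guess which source wins.
        raise ValueError("Pass either requirements or requirements_file_path, not both")
    if requirements_file_path is not None:
        # pip's -r flag installs everything listed in the given file.
        return ["-r", requirements_file_path]
    # Old behaviour: a plain list of specifiers (possibly empty).
    return list(requirements or [])
```

Failing fast on conflicting arguments keeps the ambiguity out of the rest of the function.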

@rounakdatta (Contributor Author) replied:

Addressed! 😃

@uranusjr closed this Aug 29, 2021
@uranusjr reopened this Aug 29, 2021
@uranusjr closed this Aug 30, 2021
@uranusjr reopened this Aug 30, 2021
@uranusjr (Member) commented:

Come on CI

@rounakdatta (Contributor Author) commented Aug 30, 2021

Looks like one of the tests is stuck. Shall I push an empty commit, @uranusjr?

@uranusjr (Member) commented:

No need, I just re-triggered them.

@rounakdatta (Contributor Author) commented:

One check is failing: the apache-airflow[devel-ci] package fails to install because of a dependency version incompatibility. I believe this wasn't introduced by my change 😬

@potiuk (Member) commented Aug 30, 2021

I think you need to rebase: that was the celery-5 change that got merged recently, and it conflicts unless you rebase onto the latest main.

@kaxil requested review from uranusjr and removed request for ryanahamilton, dimberman, bbovenzi, jhtimmins and vikramkoka December 9, 2021 13:38
@kaxil (Member) commented Dec 9, 2021

Tests are failing:

=================================== FAILURES ===================================
  __________________ TestPythonVirtualenvOperator.test_add_dill __________________
  
  self = <tests.operators.test_python.TestPythonVirtualenvOperator testMethod=test_add_dill>
  
      def test_add_dill(self):
          def f():
              import dill  # noqa: F401
              import lazy_object_proxy  # noqa: F401
      
  >       self._run_as_operator(f, use_dill=True, system_site_packages=False)
  
  tests/operators/test_python.py:750: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  tests/operators/test_python.py:742: in _run_as_operator
      task.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE)
  airflow/utils/session.py:69: in wrapper
      return func(*args, session=session, **kwargs)
  airflow/models/baseoperator.py:1348: in run
      session=session,
  airflow/utils/session.py:66: in wrapper
      return func(*args, **kwargs)
  airflow/models/taskinstance.py:1636: in run
      session=session,
  airflow/utils/session.py:66: in wrapper
      return func(*args, **kwargs)
  airflow/models/taskinstance.py:1331: in _run_raw_task
      self._execute_task_with_callbacks(context)
  airflow/models/taskinstance.py:1457: in _execute_task_with_callbacks
      result = self._execute_task(context, self.task)
  airflow/models/taskinstance.py:1513: in _execute_task
      result = execute_callable(context=context)
  airflow/operators/python.py:408: in execute
      return super().execute(context=serializable_context)
  airflow/operators/python.py:181: in execute
      return_value = self.execute_callable()
  airflow/operators/python.py:460: in execute_callable
      string_args_filename,

This module is a part of setuptools, and not guaranteed to be installed
into a new virtual environment.

Fortunately funcsigs exposes the __version__ attribute at top level, so
we don't really need to read package metadata to test it.
@uranusjr (Member) commented Jan 6, 2022

Those test_add_dill tests shouldn't import lazy_object_proxy; it is not guaranteed to be injected into the virtualenv.

@uranusjr force-pushed the 16037-support-requirements-txt branch 2 times, most recently from e2d2abf to cf53aa7 January 6, 2022 06:47
@uranusjr (Member) left a comment

I committed and pushed the previous review suggestions directly to the branch.

@uranusjr changed the title "#16037 Add support for passing templated requirements.txt in PythonVirtualenvOperator" to "#16037 Templated requirements.txt in Python operators" Jan 7, 2022
@uranusjr merged commit b597cea into apache:main Jan 7, 2022
@boring-cyborg bot commented Jan 7, 2022

Awesome work, congrats on your first merged pull request!

@jedcunningham added the type:new-feature label (Changelog: New Features) Mar 4, 2022
@basjacobs93 commented Jun 9, 2022

This does not seem to work properly when requirements is a string containing the path to a requirements file. If I pass the path to a requirements.txt file as requirements, I get the following error (pathname obfuscated):

ERROR: Invalid requirement: '<pathname>/requirements.txt' (from line 1 of /tmp/venvbcsbi3im/requirements.txt)
Hint: It looks like a path. The path does exist. The argument you provided (<pathname>/requirements.txt) appears to be a requirements file. If that is the case, use the '-r' flag to install the packages specified within it.

First, the requirements parameter (in this case a file path) is stored in self.requirements:

self.requirements = requirements

Then this file path is assigned to the requirements_file_contents variable:

requirements_file_contents = self.requirements

Next, this file path (rather than its contents) is written to the f'{tmp_dir}/requirements.txt' file:

file.write(requirements_file_contents)

Indeed, if I pass requirements="-r <pathname>/requirements.txt", it works correctly, since pip treats a -r line inside a requirements file as a reference to another requirements file.
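The reported flow can be reproduced in isolation. This stdlib-only sketch uses a hypothetical `write_temp_requirements` helper that mirrors the three quoted steps; it shows that the temporary file ends up containing the path string itself, while the `-r` workaround produces a line that pip follows as a nested requirements file.

```python
import os
import tempfile


def write_temp_requirements(tmp_dir: str, requirements: str) -> str:
    """Mirror the reported steps: the string is written verbatim (hypothetical helper)."""
    requirements_file_contents = requirements  # a file path in the bug report, not its contents
    path = os.path.join(tmp_dir, "requirements.txt")
    with open(path, "w", encoding="utf-8") as file:
        file.write(requirements_file_contents)
    return path


def read_text(path: str) -> str:
    with open(path, encoding="utf-8") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as user_dir, tempfile.TemporaryDirectory() as tmp_dir:
    user_reqs = os.path.join(user_dir, "requirements.txt")
    with open(user_reqs, "w", encoding="utf-8") as fh:
        fh.write("dill\n")
    # Bare path: the only "requirement" pip would see is the path itself.
    bare = read_text(write_temp_requirements(tmp_dir, user_reqs))
    # Workaround from the report: a -r line makes pip follow the nested file.
    workaround = read_text(write_temp_requirements(tmp_dir, f"-r {user_reqs}"))
```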

I think it would be best to change

self.requirements = requirements

to

with open(requirements, "r") as file:
    self.requirements = file.readlines()

@uranusjr (Member) commented Jun 9, 2022

A pull request would be much appreciated.

@basjacobs93 commented:

See PR #24368. The contents of the requirements file are now stored in self.requirements and then written to the temporary requirements file that is passed to pip.
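A minimal sketch of the fixed behaviour, assuming the operator receives a path to an existing requirements file. `build_install_command` is hypothetical and is not the code from PR #24368; it copies the file's contents (not its path) into the temporary directory and returns, without running it, a `pip install -r` command for a given interpreter.

```python
from pathlib import Path
from typing import List


def build_install_command(python_bin: str, tmp_dir: str, requirements_path: str) -> List[str]:
    """Copy the user's requirements into tmp_dir and return a pip command (hypothetical)."""
    # Read the file the user pointed at, so pip sees requirement lines rather than a path.
    contents = Path(requirements_path).read_text(encoding="utf-8")
    target = Path(tmp_dir) / "requirements.txt"
    target.write_text(contents, encoding="utf-8")
    # Standard pip usage: install every requirement listed in the file.
    return [python_bin, "-m", "pip", "install", "-r", str(target)]
```

Returning the command instead of executing it keeps the sketch side-effect free apart from the file copy; the real operator installs into the virtualenv's own interpreter.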

Labels: area:core-operators, full tests needed, type:new-feature
Issues closed by this pull request: allow using requirments.txt in PythonVirtualEnvOperator
8 participants