Implement layout="zip" for Lambda/GCF, deprecating lambdex (Cherry-pick of #19076) (#19120)

This fixes #18879 by allowing the `python_awslambda` and
`python_google_cloud_function` FaaS artefacts to be generated in
"simple" format, using the `pex3 venv create --layout=flat-zipped`
functionality recently added in PEX 2.1.135
(https://github.com/pantsbuild/pex/releases/tag/v2.1.135).

This format is just: put everything at the top-level. For instance, the
zip contains `cowsay/__init__.py` etc., rather than
`.deps/cowsay-....whl`. This avoids the need to do the dynamic PEX
initialisation/venv creation.
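
As a rough illustration of the difference between the two layouts, the sketch below checks whether a zip's entry names look "flat". The helper name `looks_flat` and the example entry names are hypothetical and are not part of Pants or PEX; the only assumption taken from the description above is that PEX internals live under `.deps/` and `.bootstrap/`.

```python
import io
import zipfile


def looks_flat(names):
    """Return True if no entry lives under a PEX-internal directory (hypothetical check)."""
    pex_internals = (".deps/", ".bootstrap/")
    return not any(name.startswith(pex_internals) for name in names)


# Build a tiny in-memory archive that mimics the flat layout described above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("cowsay/__init__.py", "")
    zf.writestr("lambda_function.py", "from foo.bar.hello_world import handler as handler")

with zipfile.ZipFile(buf) as zf:
    flat_names = zf.namelist()

print(looks_flat(flat_names))
# Entry names in the style of a PEX-based package (made up for illustration).
print(looks_flat([".deps/cowsay-x.whl/cowsay/__init__.py", ".bootstrap/pex/__init__.py"]))
```

The same check could be pointed at a real package by passing `ZipFile(path).namelist()`.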

This shifts the dynamic dependency computation/extraction/layout from
run-time to build-time, relying on the FaaS environment to be generally
consistent. It shouldn't change what actually happens after
initialisation. This can:

- reduce cold-starts noticeably: for instance, some of our lambdas spend
1s doing PEX/Lambdex start up.
- reduce package size somewhat (the PEX `.bootstrap/` folder seems to be
about 2MB uncompressed, ~1MB compressed).
- increase build times.
 
For instance, for one Python 3.9 Lambda in our codebase:

| metric | before | after |
|---|---|---|
| init time on cold start | 2.3-2.5s | 1.3-1.4s (-1s) |
| compressed size |  24.6MB | 23.8MB (-0.8MB) |
| uncompressed size | 117.8MB | 115.8MB (-2.0MB) |
| PEX-construction build time | ~5s | ~5s |
| PEX-postprocessing build time | 0.14s | 4.8s |

(The PEX-postprocessing time metric is specifically the time to run the
`Setting up handler` (lambdex) or `Build python_awslambda` (`pex3 venv
create`) process, computed by running `pants --keep-sandboxes=always
package ...` for each layout, and then `hyperfine -r3 -w1
path/to/first/__run.sh path/to/second/__run.sh`. This _doesn't_ include
the time to construct the input PEX, which is the same for both.)

---

This functionality is driven by adding a new value to the
`[lambdex].layout` option added in #19074. In #19074 (targeted for
2.17), the option defaults to `lambdex` (retaining the current code
paths). This PR adds the new value `zip`, which keys into the
functionality above, and keeps `lambdex` as the default for 2.17 (see
the phasing table below). I've tried to keep the non-lambdex
implementation generally separate from the lambdex one, rather than
reusing all of the code that happens to be common currently, because
it'd make sense to deprecate/remove the lambdex functionality, and thus
I feel it's best for this new functionality to be mostly a fresh start.

This PR's commits can be reviewed independently. 

I _think_ this is an acceptable MVP for this functionality, but there
are various bits of follow-up:

- add a warning about `files` being loaded into these packages, which
has been temporarily lost (#19027)
- adjust documentation #19067
- other improvements like #18195 and #18880 
- improve performance, e.g. potentially `pex3 venv create ...` could use
the lock file and sources to directly compute the appropriate files,
without having to materialise a normal pex first

This is a re-doing of #19022 with a simpler approach to deprecation, as
discussed in
#19074 (comment)
and
#19032 (comment).
The phasing will be:

| release | supports lambdex? | supports zip? | default layout | deprecation warnings |
|---|---|---|---|---|
| 2.17 (this PR) | ✅ | ✅ | lambdex | if `layout = "lambdex"` is implicit, tell people to set it: recommend `zip`, but allow `lambdex` if they have to |
| 2.18 | ✅ | ✅ | zip | if `layout = "lambdex"` is set at all, tell people to remove it and switch to `zip` |
| 2.19 | ❌ | ✅ | zip | none, migration over (or maybe just about removing the `[lambdex]` section entirely) |
huonw authored May 23, 2023
1 parent c4acc98 commit 3453187
Showing 11 changed files with 687 additions and 54 deletions.
39 changes: 30 additions & 9 deletions src/python/pants/backend/awslambda/python/rules.py
@@ -12,8 +12,13 @@
     PythonAwsLambdaIncludeRequirements,
     PythonAwsLambdaRuntime,
 )
-from pants.backend.python.util_rules import pex_from_targets
-from pants.backend.python.util_rules.faas import BuildLambdexRequest, PythonFaaSCompletePlatforms
+from pants.backend.python.subsystems.lambdex import Lambdex, LambdexLayout
+from pants.backend.python.util_rules.faas import (
+    BuildLambdexRequest,
+    BuildPythonFaaSRequest,
+    PythonFaaSCompletePlatforms,
+)
+from pants.backend.python.util_rules.faas import rules as faas_rules
 from pants.core.goals.package import BuiltPackage, OutputPathField, PackageFieldSet
 from pants.core.util_rules.environments import EnvironmentField
 from pants.engine.rules import Get, collect_rules, rule
@@ -38,22 +43,38 @@ class PythonAwsLambdaFieldSet(PackageFieldSet):
 @rule(desc="Create Python AWS Lambda", level=LogLevel.DEBUG)
 async def package_python_awslambda(
     field_set: PythonAwsLambdaFieldSet,
+    lambdex: Lambdex,
 ) -> BuiltPackage:
+    if lambdex.layout is LambdexLayout.LAMBDEX:
+        return await Get(
+            BuiltPackage,
+            BuildLambdexRequest(
+                address=field_set.address,
+                target_name=PythonAWSLambda.alias,
+                complete_platforms=field_set.complete_platforms,
+                runtime=field_set.runtime,
+                handler=field_set.handler,
+                output_path=field_set.output_path,
+                include_requirements=field_set.include_requirements.value,
+                script_handler=None,
+                script_module=None,
+                # The AWS-facing handler function is always lambdex_handler.handler, which is the
+                # wrapper injected by lambdex that manages invocation of the actual handler.
+                handler_log_message="lambdex_handler.handler",
+            ),
+        )
+
     return await Get(
         BuiltPackage,
-        BuildLambdexRequest(
+        BuildPythonFaaSRequest(
             address=field_set.address,
             target_name=PythonAWSLambda.alias,
             complete_platforms=field_set.complete_platforms,
             runtime=field_set.runtime,
             handler=field_set.handler,
             output_path=field_set.output_path,
             include_requirements=field_set.include_requirements.value,
-            script_handler=None,
-            script_module=None,
-            # The AWS-facing handler function is always lambdex_handler.handler, which is the
-            # wrapper injected by lambdex that manages invocation of the actual handler.
-            handler_log_message="lambdex_handler.handler",
+            reexported_handler_module=PythonAwsLambdaHandlerField.reexported_handler_module,
         ),
     )

@@ -62,5 +83,5 @@ def rules():
     return [
         *collect_rules(),
         UnionRule(PackageFieldSet, PythonAwsLambdaFieldSet),
-        *pex_from_targets.rules(),
+        *faas_rules(),
     ]
69 changes: 67 additions & 2 deletions src/python/pants/backend/awslambda/python/rules_test.py
@@ -122,7 +122,7 @@ def complete_platform(rule_runner: PythonRuleRunner) -> bytes:
     "major_minor_interpreter",
     all_major_minor_python_versions(Lambdex.default_interpreter_constraints),
 )
-def test_create_hello_world_lambda(
+def test_create_hello_world_lambda_with_lambdex(
     rule_runner: PythonRuleRunner, major_minor_interpreter: str, complete_platform: str, caplog
 ) -> None:
     rule_runner.write_files(
@@ -197,7 +197,7 @@ def handler(event, context):
     ), "Using include_requirements=False should exclude third-party deps"


-def test_warn_files_targets(rule_runner: PythonRuleRunner, caplog) -> None:
+def test_warn_files_targets_with_lambdex(rule_runner: PythonRuleRunner, caplog) -> None:
     rule_runner.write_files(
         {
             "assets/f.txt": "",
@@ -257,3 +257,68 @@ def handler(event, context):
     assert "assets/f.txt:files" in caplog.text
     assert "assets:relocated" in caplog.text
     assert "assets:resources" not in caplog.text
+
+
+def test_create_hello_world_lambda(rule_runner: PythonRuleRunner) -> None:
+    rule_runner.write_files(
+        {
+            "src/python/foo/bar/hello_world.py": dedent(
+                """
+                import mureq
+                def handler(event, context):
+                    print('Hello, World!')
+                """
+            ),
+            "src/python/foo/bar/BUILD": dedent(
+                """
+                python_requirement(name="mureq", requirements=["mureq==0.2"])
+                python_sources()
+                python_awslambda(
+                    name='lambda',
+                    handler='foo.bar.hello_world:handler',
+                    runtime="python3.7",
+                )
+                python_awslambda(
+                    name='slimlambda',
+                    include_requirements=False,
+                    handler='foo.bar.hello_world:handler',
+                    runtime="python3.7",
+                )
+                """
+            ),
+        }
+    )
+
+    zip_file_relpath, content = create_python_awslambda(
+        rule_runner,
+        Address("src/python/foo/bar", target_name="lambda"),
+        expected_extra_log_lines=(" Handler: lambda_function.handler",),
+        extra_args=["--lambdex-layout=zip"],
+    )
+    assert "src.python.foo.bar/lambda.zip" == zip_file_relpath
+
+    zipfile = ZipFile(BytesIO(content))
+    names = set(zipfile.namelist())
+    assert "mureq/__init__.py" in names
+    assert "foo/bar/hello_world.py" in names
+    assert (
+        zipfile.read("lambda_function.py") == b"from foo.bar.hello_world import handler as handler"
+    )
+
+    zip_file_relpath, content = create_python_awslambda(
+        rule_runner,
+        Address("src/python/foo/bar", target_name="slimlambda"),
+        expected_extra_log_lines=(" Handler: lambda_function.handler",),
+        extra_args=["--lambdex-layout=zip"],
+    )
+    assert "src.python.foo.bar/slimlambda.zip" == zip_file_relpath
+
+    zipfile = ZipFile(BytesIO(content))
+    names = set(zipfile.namelist())
+    assert "mureq/__init__.py" not in names
+    assert "foo/bar/hello_world.py" in names
+    assert (
+        zipfile.read("lambda_function.py") == b"from foo.bar.hello_world import handler as handler"
+    )
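
The test above asserts that the package contains a one-line stub module re-exporting the user's handler. A minimal sketch of how such a stub could be derived from a `module:function` handler string follows. The function `reexport_stub` is hypothetical and is not the Pants implementation; it assumes the handler is given as a dotted module path, not a file path.

```python
def reexport_stub(handler: str) -> bytes:
    """Build the contents of the re-export module for a `module:function` handler.

    Hypothetical helper for illustration; assumes a dotted module path before the colon.
    """
    module, _, func = handler.partition(":")
    if not module or not func:
        raise ValueError(f"expected `module:function`, got {handler!r}")
    return f"from {module} import {func} as handler".encode()


stub = reexport_stub("foo.bar.hello_world:handler")
print(stub.decode())
```

Written to `lambda_function.py` (AWS) or `main.py` (GCF), a stub of this shape gives the cloud provider a fixed handler name regardless of where the real function lives.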
9 changes: 9 additions & 0 deletions src/python/pants/backend/awslambda/python/target_types.py
@@ -29,11 +29,20 @@


 class PythonAwsLambdaHandlerField(PythonFaaSHandlerField):
+    # This doesn't matter (just needs to be fixed), but is the default name used by the AWS
+    # console when creating a Python lambda, so is as good as any
+    # https://docs.aws.amazon.com/lambda/latest/dg/python-handler.html
+    reexported_handler_module = "lambda_function"
+
     help = help_text(
         f"""
         Entry point to the AWS Lambda handler.

         {PythonFaaSHandlerField.help}
+
+        This is re-exported at `{reexported_handler_module}.handler` in the resulting package to be
+        used as the configured handler of the Lambda in AWS. It can also be accessed under its
+        source-root-relative module path, for example: `path.to.module.handler_func`.
         """
     )

52 changes: 37 additions & 15 deletions src/python/pants/backend/google_cloud_function/python/rules.py
@@ -12,8 +12,13 @@
     PythonGoogleCloudFunctionRuntime,
     PythonGoogleCloudFunctionType,
 )
-from pants.backend.python.util_rules import pex_from_targets
-from pants.backend.python.util_rules.faas import BuildLambdexRequest, PythonFaaSCompletePlatforms
+from pants.backend.python.subsystems.lambdex import Lambdex, LambdexLayout
+from pants.backend.python.util_rules.faas import (
+    BuildLambdexRequest,
+    BuildPythonFaaSRequest,
+    PythonFaaSCompletePlatforms,
+)
+from pants.backend.python.util_rules.faas import rules as faas_rules
 from pants.core.goals.package import BuiltPackage, OutputPathField, PackageFieldSet
 from pants.core.util_rules.environments import EnvironmentField
 from pants.engine.rules import Get, collect_rules, rule
@@ -38,28 +43,45 @@ class PythonGoogleCloudFunctionFieldSet(PackageFieldSet):
 @rule(desc="Create Python Google Cloud Function", level=LogLevel.DEBUG)
 async def package_python_google_cloud_function(
     field_set: PythonGoogleCloudFunctionFieldSet,
+    lambdex: Lambdex,
 ) -> BuiltPackage:
+    if lambdex.layout is LambdexLayout.LAMBDEX:
+        return await Get(
+            BuiltPackage,
+            BuildLambdexRequest(
+                address=field_set.address,
+                target_name=PythonGoogleCloudFunction.alias,
+                complete_platforms=field_set.complete_platforms,
+                runtime=field_set.runtime,
+                handler=field_set.handler,
+                output_path=field_set.output_path,
+                include_requirements=True,
+                # The GCP-facing handler function is always `main.handler` (We pass `-M main.py -H handler` to
+                # Lambdex to ensure this), which is the wrapper injected by Lambdex that manages invocation of
+                # the actual user-supplied handler function. This arrangement works well since GCF assumes the
+                # handler function is housed in `main.py` in the root of the zip (you can re-direct this by
+                # setting a `GOOGLE_FUNCTION_SOURCE` Google Cloud build environment variable; e.g.:
+                # `gcloud functions deploy {--build-env-vars-file,--set-build-env-vars}`, but it's non-trivial
+                # to do this right or with intended effect) and the handler name you configure GCF with is just
+                # the unqualified function name, which we log here.
+                script_handler="handler",
+                script_module="main.py",
+                handler_log_message="handler",
+            ),
+        )
+
     return await Get(
         BuiltPackage,
-        BuildLambdexRequest(
+        BuildPythonFaaSRequest(
             address=field_set.address,
             target_name=PythonGoogleCloudFunction.alias,
             complete_platforms=field_set.complete_platforms,
             runtime=field_set.runtime,
             handler=field_set.handler,
             output_path=field_set.output_path,
             include_requirements=True,
-            # The GCP-facing handler function is always `main.handler` (We pass `-M main.py -H handler` to
-            # Lambdex to ensure this), which is the wrapper injected by Lambdex that manages invocation of
-            # the actual user-supplied handler function. This arrangement works well since GCF assumes the
-            # handler function is housed in `main.py` in the root of the zip (you can re-direct this by
-            # setting a `GOOGLE_FUNCTION_SOURCE` Google Cloud build environment variable; e.g.:
-            # `gcloud functions deploy {--build-env-vars-file,--set-build-env-vars}`, but it's non-trivial
-            # to do this right or with intended effect) and the handler name you configure GCF with is just
-            # the unqualified function name, which we log here.
-            script_handler="handler",
-            script_module="main.py",
-            handler_log_message="handler",
+            reexported_handler_module=PythonGoogleCloudFunctionHandlerField.reexported_handler_module,
+            log_only_reexported_handler_func=True,
         ),
     )

@@ -68,5 +90,5 @@ def rules():
     return [
         *collect_rules(),
         UnionRule(PackageFieldSet, PythonGoogleCloudFunctionFieldSet),
-        *pex_from_targets.rules(),
+        *faas_rules(),
     ]
Original file line number Diff line number Diff line change
@@ -130,7 +130,7 @@ def complete_platform(rule_runner: PythonRuleRunner) -> bytes:
     "major_minor_interpreter",
     all_major_minor_python_versions(Lambdex.default_interpreter_constraints),
 )
-def test_create_hello_world_lambda(
+def test_create_hello_world_lambda_with_lambdex(
     rule_runner: PythonRuleRunner, major_minor_interpreter: str, complete_platform: str, caplog
 ) -> None:
     rule_runner.write_files(
@@ -243,3 +243,45 @@ def handler(event, context):
     assert "assets/f.txt:files" in caplog.text
     assert "assets:relocated" in caplog.text
     assert "assets:resources" not in caplog.text
+
+
+def test_create_hello_world_gcf(rule_runner: PythonRuleRunner) -> None:
+    rule_runner.write_files(
+        {
+            "src/python/foo/bar/hello_world.py": dedent(
+                """
+                import mureq
+                def handler(event, context):
+                    print('Hello, World!')
+                """
+            ),
+            "src/python/foo/bar/BUILD": dedent(
+                """
+                python_requirement(name="mureq", requirements=["mureq==0.2"])
+                python_sources()
+                python_google_cloud_function(
+                    name='gcf',
+                    handler='foo.bar.hello_world:handler',
+                    runtime="python37",
+                    type='event',
+                )
+                """
+            ),
+        }
+    )
+
+    zip_file_relpath, content = create_python_google_cloud_function(
+        rule_runner,
+        Address("src/python/foo/bar", target_name="gcf"),
+        expected_extra_log_lines=(" Handler: handler",),
+        extra_args=["--lambdex-layout=zip"],
+    )
+    assert "src.python.foo.bar/gcf.zip" == zip_file_relpath
+
+    zipfile = ZipFile(BytesIO(content))
+    names = set(zipfile.namelist())
+    assert "mureq/__init__.py" in names
+    assert "foo/bar/hello_world.py" in names
+    assert zipfile.read("main.py") == b"from foo.bar.hello_world import handler as handler"
Original file line number Diff line number Diff line change
@@ -29,11 +29,19 @@


 class PythonGoogleCloudFunctionHandlerField(PythonFaaSHandlerField):
+    # GCP requires "Your main file must be named main.py"
+    # https://cloud.google.com/functions/docs/writing#directory-structure-python
+    reexported_handler_module = "main"
+
     help = help_text(
         f"""
         Entry point to the Google Cloud Function handler.

         {PythonFaaSHandlerField.help}
+
+        This is re-exported at `{reexported_handler_module}.handler` in the resulting package to
+        be used as the configured handler of the Google Cloud Function in GCP. It can also be accessed
+        under its source-root-relative module path, for example: `path.to.module.handler_func`.
         """
     )

48 changes: 48 additions & 0 deletions src/python/pants/backend/python/subsystems/lambdex.py
@@ -1,9 +1,19 @@
 # Copyright 2019 Pants project contributors (see CONTRIBUTORS.md).
 # Licensed under the Apache License, Version 2.0 (see LICENSE).

+from enum import Enum
+
 from pants.backend.python.subsystems.python_tool_base import LockfileRules, PythonToolBase
 from pants.backend.python.target_types import ConsoleScript
+from pants.base.deprecated import warn_or_error
 from pants.engine.rules import collect_rules
+from pants.option.option_types import EnumOption
+from pants.util.strutil import softwrap


+class LambdexLayout(Enum):
+    LAMBDEX = "lambdex"
+    ZIP = "zip"
+
+
class Lambdex(PythonToolBase):
@@ -20,6 +30,44 @@ class Lambdex(PythonToolBase):
     default_lockfile_resource = ("pants.backend.python.subsystems", "lambdex.lock")
     lockfile_rules_type = LockfileRules.SIMPLE

+    layout = EnumOption(
+        default=LambdexLayout.LAMBDEX,
+        help=softwrap(
+            """
+            Explicitly control the layout used for `python_awslambda` and
+            `python_google_cloud_function` targets. This option exists for the transition from
+            Lambdex-based layout to the plain zip layout, as recommended by cloud vendors.
+            """
+        ),
+    )
+
+    def warn_for_layout(self, target_alias: str) -> None:
+        if self.options.is_default("layout"):
+            lambda_message = (
+                " (you will need to also update the handlers configured in the cloud from `lambdex_handler.handler` to `lambda_function.handler`)"
+                if target_alias == "python_awslambda"
+                else ""
+            )
+
+            warn_or_error(
+                "2.19.0.dev0",
+                f"using the Lambdex layout for `{target_alias}` targets",
+                softwrap(
+                    f"""
+                    Set the `[lambdex].layout` option explicitly to `zip` (recommended) or `lambdex`
+                    (compatibility), in `pants.toml`. Recommended: set to `zip` to opt-in to the new
+                    layout recommended by cloud vendors{lambda_message}:
+
+                        [lambdex]
+                        layout = "zip"
+
+                    You can also explicitly set `layout = "lambdex"` to silence this warning and
+                    continue using the Lambdex-based layout in this release of Pants. This layout
+                    will disappear in future.
+                    """
+                ),
+            )
+

 def rules():
     return collect_rules()