Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run integration tests on wrapped Forwarder (captures metrics) #329

Merged
merged 13 commits into from
Aug 19, 2020

Conversation

nhinsch
Copy link
Contributor

@nhinsch nhinsch commented Aug 18, 2020

What does this PR do?

Right now we run integration tests on the Forwarder without the Datadog Lambda wrapper. This is a problem because the wrapper contains logic to flush metrics from a background thread at the end of each function invocation. Without this, the metrics do not appear in the integration test snapshots as they are not sent during the actual function invocation.

By running the integration tests against the wrapped Forwarder, we can capture metrics in our snapshots, including enhanced metrics from the wrapper. We need to remove the specific metric values from the snapshots, however, as they can vary between test runs.

Motivation

Capturing calls to the metrics intake API will improve our integration test coverage. We also may be losing metrics data in production because of this issue.

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)
  • This PR passes the unit tests

@nhinsch nhinsch changed the title Ngh/integration tests 2 Capture Metrics API calls in integration tests Aug 18, 2020
@nhinsch nhinsch changed the title Capture Metrics API calls in integration tests Flush metrics in every invocation; Capture Metrics API calls in integration tests Aug 19, 2020
@nhinsch nhinsch changed the title Flush metrics in every invocation; Capture Metrics API calls in integration tests Flush metrics in every invocation Aug 19, 2020
@@ -12,7 +12,7 @@

class RecorderHandler(BaseHTTPRequestHandler):
def __init__(self, request, client_address, server):
super().__init__(request, client_address, server)()
super().__init__(request, client_address, server)
Copy link
Contributor Author

@nhinsch nhinsch Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was raising exceptions in the integration tests, so I cleaned it up

@@ -5,6 +5,3 @@ ARG forwarder

# Add the code into /var/task (will unzip files)
ADD $forwarder /var/task/

ENV DD_API_KEY "fake-api-key"
ENV DD_SITE "datadog.com"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these to docker-compose to be with the other env vars

@@ -635,6 +635,10 @@ def forward_metrics(metrics):
if logger.isEnabledFor(logging.DEBUG):
logger.debug(f"Forwarded metric: {json.dumps(metric)}")

# Force the background thread to send metrics
# If this does not happen before the function ends, the metrics may not be sent
lambda_stats.flush()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do flush the metrics (and it blocks) after executing the handler function https://github.com/DataDog/datadog-lambda-python/blob/a1139b4940229821d29a6fd4c20cf06d03e81de4/datadog_lambda/wrapper.py#L141. Perhaps you can add some debugging statements around that line or in the wrapper code to see what's going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The integration tests were being run against the unwrapped handler 🤬

I'll fix this in the morning!

@nhinsch nhinsch changed the title Flush metrics in every invocation Run integration tests on wrapped Forwarder (captures metrics) Aug 19, 2020
@nhinsch nhinsch merged commit 8bd7342 into master Aug 19, 2020
@nhinsch nhinsch deleted the ngh/integration-tests-2 branch August 19, 2020 16:02
mpuaplacester added a commit to Placester/dd-aws-lambda-functions that referenced this pull request Sep 29, 2020
* Increasing parsing priority for aws/rds source

* Add dms to list of recognized log sources

* Update python and checkout action to the latest

* Add integration test note to template

* Document how to find the installed forwarder Lambda

* Fixes DataDog#207 by allowing CloudWatch Logs & S3 to invoke the Forwarder

* Add aws_account tag to logs and enhanced metrics

* Update snapshots for new tags

* Remove unused kwarg

* Bump version from 3.3.0 to 3.4.0

* Update README.md

* Fixed DataDog#230 as cfn-lint now passes with no errors

* Update "extractResourceId" function

This extends "extractResourceId" function to extract "source" tag from applications that are deployed at Tenant level.

The existing behavior is to discard all resourceId that don't begin with "/subscriptions/" thus it doesn't handle Tenant level resourceId that begin with "/tenants/".

Adds a else if block to handle resourceIds with Tenant level deployemnts(Like Active Directory)

I have kept the source and tag extraction logic similar to what currently exists.

* Add cfn-lint

* Fix integration test snapshots

* Bump version from 3.4.0 to 3.5.0

* Exclude unuseful files from zip

* Instructions for safe deletion

* Add manual installation steps

* Terraform installation and permissions

* Apply suggestions from code review

Co-Authored-By: Stephen Pinkerton <[email protected]>

* Use latest cfn-lint

* Fix tests

* Change enhanced tag parsing rules

* Update comment

* Only remove colons from tag keys

* Fix typo in comment

* Blank out forwarder version in snapshot tests

* Remove camel case parsing rules

* Bump version to 3.6.0

* Remove empty comments

* Bump version from 3.5.0 to 3.6.0

* Disallow empty value for required fields

* feat: Enable supplying DdApiKeySecret directly

- Works around indefinite reapply changes when using terraform to deploy
cloudformation with NoEcho parameters
(hashicorp/terraform-provider-aws#55)

* Preserve mandatory DdApiKey

- Make DdApiKeySecretArn an Advanced option

* Ensure empty tags can be sanitized

We have a few lambdas that have tags with empty values, and that was causing an exception
to be thrown during the sanitization function.  This fixes that behavior, with tests.

* update mocha to remove vuln of minimist version

* Bump layer versions

* Update cloudwatch snapshots

* Bump forwarder version

* Update layer version

* Remove bad character

* Bump version from 3.6.0 to 3.7.0

* Refresh tags when new lambda arn is encountered

* check that log message is a string for enhanced metrcis submission

* Fix typo

* Bump version from 3.7.0 to 3.8.0

* add jvanbrunschot changes to update to python 3 and use the new lambda layer

* Add tool for building release bundle with datadog_lambda dep

* Make release process work with new zip

* reverting to use of Stats class and updating it to python3

* Style changes

* Removing lambda layer imports

* Removing lambda layer call

* Move trace forwarder into repository

* Update trace forwarder deps

* Add trace_forwarder as package to forwarder

* Fix integration tests

* Add integration tests to workflow

* Fix formatting

* Fix bad substitution error

* Updating docs to reflect rds support of python 3

* Fix github actions error in script

* Add copyright disclaimers

* Run trace forwarder tests in CI

* Bump version from 3.8.0 to 3.9.0

* Remove gov cloud references

* Re-add note about disablng enhanced metrics

* Add private link to variables to template.yml

* Fix template errors

* Store lowercase ARN keys in tags cache

* Updating active_directory to activedirectory

Azure AD source changed from azure.active_directory to azure.activedirectory to match other azure source convention.

* Bump version from 3.9.0 to 3.10.0

* Fix lambda_function private link errors

* Update readme with private link details

* Add notes to the update instructions

* Fix a typo

Co-authored-by: Stephen Firrincieli <[email protected]>

* Bump version from 3.10.0 to 3.11.0

* Update Terraform code to ignore parameter change

Without this change `terraform apply` would keep trying to apply changes to the Cloudformation Stack even though its already been deployed.

* Updating if condition for tenant

to (resourceId.length > 4 && resourceId[4])

* Update PULL_REQUEST_TEMPLATE.md

* Make debug logging more verbose

* [aws forwader] Update readme with info on capabilities used

* Bump version from 3.11.0 to 3.12.0

* Update PULL_REQUEST_TEMPLATE.md

* [aws] add tag for memorysize

* Revert "Add memorysize tag to enhanced lambda metrics"

* add memorysize tag to enhanced metrics

* fixed formatting

* fixed formatting with black

* add init duration metric and coldstart tags

* add additional comment

* put logging level back to info

* update unittest for new tag and metric

* include unittest in lambda checks

* fix lambda check

* update PR template with unit test

* Ensure the forwarder can be installed to us-gov AWS accounts

* updated optional regex patterns

* Bump version from 3.12.0 to 3.13.0

* update tag logic and test

* remove enhanced lambda redundancies

* updated metric list

* fix typo

* fix comment

* update comment

* update unittest and CI

* add boto3 for unittest

* add region env variable for boto

* try setting AWS region for boto

* set aws region in test file

* set env variable in lambdacheck instead

* add timed out enhanced metric

* reset log level to info

* update enhanced lambda timed out metric name

* update to timeouts and count

* fix typo

* update timed out variable name

* fix variable name

* Update truncation behavior (DataDog#281)

* Change private link URL (DataDog#282)

* Bump version from 3.13.0 to 3.14.0

* Document AWS Forwarder CloudFormation params

* Update troubleshooting, remove TODOs.

* Move installation_alternatives

* Copy edited

* Ignore changes actually leads to a problem upon update, the masked value **** will be used instead

* Update template verbiage

* tcp connection

* Add out_of_memory enhanced metric (DataDog#287)

* Update installation_alternatives.md

* Update installation_alternatives.md

* Batch traces before making API calls (DataDog#291)

* Bump version from 3.14.0 to 3.15.0

* [azure event hub forwarder] Convert strings to json logs (DataDog#292)

* initial commit to update event hub forwader

* cleanup

* added another test and a few fixes

* update tests

* Batch traces by env instead of tags (DataDog#296)

* Parallelize GitHub Workflows (DataDog#297)

* Run integration tests with Python 3.8 (DataDog#298)

* Refactor parse_event_source to prevent mis-identification of sources (DataDog#299)

* Refactor parse_event_source and add tests for it

* Fix typo

* Add unit test

* Bump version from 3.15.0 to 3.16.0

* Use us-west-2 for installation test (DataDog#300)

* Remove code to support for Python 2.7 (DataDog#302)

* Refactor settings into a separate file (DataDog#301)

* Update README.md

* Override service metadata field with service tag (DataDog#305)

* Bump version from 3.16.0 to 3.16.1

* [documentation] updates to prepare to single source this into the documentation site (DataDog#304)

* removes alternatives page to be added to the main readme

* edits and reorg of the main readme

* Update aws/logs_monitoring/README.md

* Update aws/logs_monitoring/README.md

Co-authored-by: Alex <[email protected]>

* Use unique variable name for Logger instance (DataDog#307)

`log` is used in `for log in logs` loops where its use is more justified.

Also this fixes a bug in `enhanced_lambda_metrics.py:parse_and_submit_enhanced_metrics()` related to the naming conflict.

Signed-off-by: Eugene Glotov <[email protected]>

* adding decryption fix (DataDog#309)

* Updated CloudWatch deploy permissions (DataDog#313)

Adds new permissions for the IAM policy, without which the AWS CloudFormations deployment fails with "ForwarderZipsBucket, CREATE_FAILED, API: s3:SetBucketEncryption Access Denied" and "ForwarderZipsBucket, CREATE_FAILED, API: s3:PutPublicAccessBlock Access Denied"

* Update lodash version (DataDog#314)

* [AWS log forwarder] Override service tag for meta in trace (DataDog#316)

* override service tag for meta

* add test to override the meta key as well

* Bump version from 3.16.1 to 3.16.2

* [azure] Refactor and convert Azure eventhub log forwarder to use HTTP (DataDog#315)

* initial commit for http change

* updates

* pin lodash version

* lint

* update lodash version

* use promises and finally working retries

* Revert "use promises and finally working retries"

This reverts commit 4b9a9f5.

* fix it all and get it workng

* remove lodash changes for another pr

* update test

* Clarify PrivateLink Endpoints in README (DataDog#317)

* Enable Trace Forwarding via AWS PrivateLink (DataDog#318)

* Bump version from 3.16.2 to 3.16.3

* fix link, spelling (DataDog#320)

* Fix version number incrementing (DataDog#321)

* Update integration tests to use raw JSON data (DataDog#319)

* Merge ddtags (DataDog#326)

* Bump version from 3.16.3 to 3.16.4

* Fix version number (DataDog#328)

* Run integration tests on wrapped Forwarder (captures metrics) (DataDog#329)

* Add --skip-forwarder-build option to integration tests (DataDog#330)

* [azure log forwarder] Update lint command and lint blobs monitoring (DataDog#331)

* lint test file as well

* lint blobs monitoring

* Improve integration tests (DataDog#333)

* Fix integration tests --update option (DataDog#334)

* Translate binary data type to be human readable (DataDog#332)

* Handle binary data type

* Add tests and handle malformed logs

* Lint files

* Adjust for comments

* David.kim/updated lambda forwarder (DataDog#327)

* Add first pass of telemetry to log forwarder

* Updated lambda forwarder with telemetry

* Update lambda_function.py

Revise comment

* Update lambda_function.py

Remove DD_FORWARD_TRACES which is not a real env var

* Linted with black

* Refactored telemetry to fire from forward_* functions and removed counter

* Ran black linter

* Fix local/global variable issue

* Fix local/global variable issue and run black linter

* Refactored to use set_forwarder_telemetry_tags and add telemetry tags to enhanced metrics

* Finalize log forwarder telemetry and update integration test snapshots

* Run black linter on lambda_function and enhanced_lambda_metrics

* Address final comments and update integration tests

* Address final comments and update snapshots

* Add final tweaks

* Update error handling for event_type and update snapshots

* Add error handling to event_type and remove duplicate tags in ehanced_lambda_metrics

* Forwarder: Additional target lambdas (DataDog#335)

* Add integration test

* more changes to improve test script

* other script fixes

* fix format

* make integration tests optinally deploy lambda

* fix int tests in ci

* update integration tests

* add template changes to take multiple functions

* lint formatting

* add installation test verify to pull request template

* Update aws/logs_monitoring/README.md

Co-authored-by: Tian Chu <[email protected]>

* cr feedback

* add unit tests and make int tests default without the external lambda

Co-authored-by: Tian Chu <[email protected]>

* Bump version from 3.16.4 to 3.17.0

* Update README.md (DataDog#336)

* Update README with logs dual shipping info (DataDog#338)

* update README.md

* Update aws/logs_monitoring/README.md

Co-authored-by: Tian Chu <[email protected]>

Co-authored-by: Tian Chu <[email protected]>

* Fix flakiness in integration tests (DataDog#339)

* Fix access to S3 within VPC (DataDog#340)

* Support sending data from VPC through a proxy (DataDog#342)

Support sending data through a proxy

* Bump version from 3.17.0 to 3.18.0

* Fix forwarder template to work in AWS China (DataDog#347)

Co-authored-by: stroem <[email protected]>

* Use KMS for S3 bucket encryption (DataDog#349)

* Bump version from 3.18.0 to 3.18.1

* [azure] Make azure log forwarder tests cleaner and easier to understand (DataDog#350)

* update azure tests

* add and update binary tests

* lint

* rename and cleanup

* fix setUp

* Handle malformed metrics (DataDog#352)

* handle malformed metrics

* fix tests

* fix formatting

* Bump version from 3.18.1 to 3.18.2

* [sls-661] Use s3 for shared tags cache (DataDog#343)

* Use s3 for tags cache

* Update unit tests

* Update snapshot for integration test

* Case when s3 cache is expired

* Set ddfetchlambdatags to true by default

* Update label

* Move ddfetchlambdatags out of experimental

* Allow additional tags for internal metrics

* Add unit test for client error case

* New line at EOF

* Patch aws boto3 pagination calls

* Import env vars from settings

* Script to run unittests (sets env vars)

* Add env vars to github workflow

* Try quotes

* Update description for DDFetchLambdaTags

* Fix env var name

* Lambda function for cache test

* Remove lambda functions after external lambdas test

* Integration test for s3 cache

* Update events with cache lambda name

* Update snapshots

* Run cache integration test by default

* Remove guids from snapshots

* Update snapshots

* Turn cache tests off by default

* Snapshots for when cache tests are on

* Snapshots for when cache test is off

* Undo lambda function name change

* Enable locking for S3 cache

* Add try catch for cache lock creation

* Update unit tests

* Update snapshot for cache integration test

* Fix concat issue

* Fix operand type issue

* Add deletion permission for s3 bucket

* Group DdFetchLambdaTags under advanced

* Add debug logs

* Bump version from 3.18.2 to 3.19.0

* Update README.md

* Fix manual install anchor link reference. (DataDog#311)

* Fix a typo

* Add mention of undocumented SNS support (DataDog#286)

Had to check the source code to confirm this was supported

https://github.com/DataDog/datadog-serverless-functions/blob/e9180b1bbdd9ccef14d4c498d47d2901c28bbea5/aws/logs_monitoring/lambda_function.py#L624

There is sort of/kind of a reference to this, but it's worded in a very strange way. Logging isn't supported, but it is, by picking an ARN that you "want to use".  I'm assuming what it really means, is select *the* datadog logs forwarder, you set up here<link to cloudformation forwarder documentation>

* change runtime to 3.7 (DataDog#353)

* Forwarder: add KMS permission to get log from Encrypted S3 Bucket (DataDog#337)

* feat: add KMS permission to get log from Encrypted S3 Bucket

* fix: remove none necessary permissiong

If you use a customer managed CMK, the Forwarder only needs kms:Decrypt for downloading, typically CloudTrail SSE require CMK and this permission will work for the case.
if you use the AWS managed CMK (i.e., key aws/s3), no extra permissions are required for the Forwarder to download the object.

* Add support for S3 events triggered via SNS (DataDog#351)

* Add support for S3 events triggered via SNS
- When S3 events are configured to push to SNS topics
- The DD Lambda Forwarder subscribes to the SNS topic
- Then in DD console the event shows up as SNS with SNS event data being logged rather than the S3 log being ingested
- This change extracts the S3 event from the SNS topic and then extract and ingests the S3 log from it

* removing redundant checks

Co-authored-by: Tian Chu <[email protected]>

* simpler validations using try-catch

Co-authored-by: Tian Chu <[email protected]>

Co-authored-by: Tian Chu <[email protected]>

* Bump version from 3.19.0 to 3.20.0

* Group TagsCacheTTLSeconds under Advanced (DataDog#357)

* [azure] Add new source types and other cleanup (DataDog#354)

* inital commit for new sources

* update tests and replace allSettled

* make empty string checking a bit more clear

* handle trailing slash

Co-authored-by: Ryan Nixon <[email protected]>
Co-authored-by: Alex Burgel <[email protected]>
Co-authored-by: chenrui <[email protected]>
Co-authored-by: DarcyRaynerDD <[email protected]>
Co-authored-by: Darcy Rayner <[email protected]>
Co-authored-by: Tian Chu <[email protected]>
Co-authored-by: Philipp Hoffmann <[email protected]>
Co-authored-by: Stephen Firrincieli <[email protected]>
Co-authored-by: Greg Kaestle <[email protected]>
Co-authored-by: anshumgargdd <[email protected]>
Co-authored-by: Stephen Pinkerton <[email protected]>
Co-authored-by: Jim Park <[email protected]>
Co-authored-by: William Richard <[email protected]>
Co-authored-by: Garner Fox McCloud <[email protected]>
Co-authored-by: Andrew Zakordonets <[email protected]>
Co-authored-by: Jonathan VanBriesen <[email protected]>
Co-authored-by: Andre Marcelo-Tanner <[email protected]>
Co-authored-by: Jordan Storms <[email protected]>
Co-authored-by: Nicolas Hinsch <[email protected]>
Co-authored-by: chris.agocs <[email protected]>
Co-authored-by: Christopher Agocs <[email protected]>
Co-authored-by: Claudia D'Adamo <[email protected]>
Co-authored-by: Kaylyn <[email protected]>
Co-authored-by: Alex <[email protected]>
Co-authored-by: Eugene Glotov <[email protected]>
Co-authored-by: Lim Wei Chiang <[email protected]>
Co-authored-by: Sergio Prada <[email protected]>
Co-authored-by: Kari Halsted <[email protected]>
Co-authored-by: Ricky Thomas <[email protected]>
Co-authored-by: David D. Kim <[email protected]>
Co-authored-by: stroem <[email protected]>
Co-authored-by: stroem <[email protected]>
Co-authored-by: Harvinder Ghotra <[email protected]>
Co-authored-by: David E. Weekly <[email protected]>
Co-authored-by: Ryan Romanchuk <[email protected]>
Co-authored-by: Ikiru Yoshizaki <[email protected]>
Co-authored-by: Sagar Khanna <[email protected]>
Co-authored-by: Mike Pua <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants