New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

bug-1889185: implement stage submitter and fake collector #6617

Merged

willkg merged 3 commits into main from willkg-bug-1889185-submitter

May 21, 2024

Contributor

willkg commented May 15, 2024 •

edited

Loading

This implements a stage submitter that pulls crash ids from QUEUE and crash data from STORAGE, packages that up into a crash report payload, and submits it to STAGE_SUBMITTER_DESTINATIONS.

This creates a make runsubmitter system that runs the stage submitter and a fake collector allowing you to use bin/process_crash.sh to set everything up and publish a crash id to the standard queue which the submitter will see and submit a crash to the fake collector. The fake collector will parse the payload and log headers, annotations, and files.

This adds stage submitter tests.

To test:

make build
make setup
make runservices
make runsubmitter
in another terminal, run make shell and then you can use bin/process_crash.sh to download crash data, put it in storage, and publish the crash id to the standard queue
1. if it's sampled, then you'll see a socorro.submitter.sample incr metric
2. if it's accepted, then you'll see a socorro.submitter.accept incr metric and you'll see the headers, annotations, and files logged by the fake collector

Light documentation is provided in docs/services/stage_submitter.rst.

willkg requested a review from a team as a code owner

May 15, 2024 21:15

willkg commented

View reviewed changes

bin/test.sh

@@ @@ -54,6 +54,7 @@ echo ">>> run tests" @@
               # Run socorro tests
               "${PYTEST}"
+              CLOUD_PROVIDER=GCP "${PYTEST}" -m gcp

Contributor Author

willkg May 15, 2024

The submitter tests use queue_helper and storage_helper, so we need to do AWS and GCP tests now for socorro tests.

willkg commented

View reviewed changes

docker/config/local_dev.env

@@ @@ -103,6 +103,10 @@ SESSION_COOKIE_SECURE=False @@
               # -----------
               SENTRY_PORT=8090
+              # stage_submitter
+              # ---------------
+              STAGE_SUBMITTER_DESTINATIONS=http://fakecollector:8000/submit|20

Contributor Author

willkg May 15, 2024

This sets up a single destination sampled at 20%.

willkg force-pushed the willkg-bug-1889185-submitter branch from 1155fc2 to 04fad15 Compare

May 15, 2024 21:18

willkg commented

View reviewed changes

fakecollector/collector.py

+              from werkzeug.exceptions import HTTPException, MethodNotAllowed
+              from werkzeug.formparser import parse_form_data
+              from werkzeug.routing import Map, Rule
+              from werkzeug.wrappers import Request, Response

Contributor Author

willkg May 15, 2024

The only web framework we have available is Django. I didn't want to use Django for the fake collector because that involves a whole bunch of additional set up for the db I didn't want to do.

Flask is an ergonomic shell on top of werkzeug. We don't need templates, localizations, extensions, etc so we can use raw werkzeug to build a very minimal API for /submit.

willkg commented

View reviewed changes

fakecollector/collector.py

+                      content_encoding = request.headers.get("Content-Encoding", "")
+                      self.logger.info("content_encoding: %s", content_encoding)
+                      stream, form, files = parse_form_data(request.environ, max_content_length=content_length)

Contributor Author

willkg May 15, 2024

This worked surprisingly well.

Even so, I want to see how parse_form_data deals with some of the malformed data issues we deal with in Antenna. Depending on that, switching Antenna from Falcon (which uses the deprecated-to-be-removed cgi module) to Flask seems viable.

willkg commented

View reviewed changes

socorro/stage_submitter/submitter.py

		]


		def remove_collector_keys(raw_crash):

Contributor Author

willkg May 15, 2024

Large portions of this file were copied verbatim from socorro-submitter repository where they've been running for years.

willkg commented

View reviewed changes

socorro/stage_submitter/submitter.py

		return repr(thing).encode("utf-8")


		def multipart_encode(raw_crash, dumps, payload_type, payload_compressed):

Contributor Author

willkg May 15, 2024

This is copied from socorro-submitter.

willkg commented

View reviewed changes

socorro/stage_submitter/submitter.py

+                          if dest.sample >= 100 or random.randint(0, 100) > dest.sample
+                      ]
+                  def process(self, crash):

Contributor Author

willkg May 15, 2024

The structure of this was copied from the socorro-submitter.

willkg commented

View reviewed changes

socorro/stage_submitter/submitter.py Outdated Show resolved Hide resolved

willkg commented

View reviewed changes

socorro/stage_submitter/submitter.py


		self.shutdown()

		def run_once(self):

Contributor Author

willkg May 15, 2024

run_once is used in tests.

willkg commented

View reviewed changes

socorro/stage_submitter/submitter.py Outdated Show resolved Hide resolved

willkg commented

View reviewed changes

socorro/tests/stage_submitter/test_submitter.py

		@@ -0,0 +1,594 @@
		# This Source Code Form is subject to the terms of the Mozilla Public

Contributor Author

willkg May 15, 2024

The tests were lifted from socorro-submitter and then adjusted to work with the SubmitterApp instead of the socorro-submitter lambda function.

willkg requested a review from relud

May 15, 2024 21:44

willkg added 2 commits

May 17, 2024 21:49


          bug-1889185: implement stage submitter and fake collector

bcd3a20

This implements a stage submitter that pulls crash ids from QUEUE and
crash data from STORAGE, packages that up into a crash report payload,
and submits it to STAGE_SUBMITTER_DESTINATIONS.

This creates a `make runsubmitter` system that runs the stage submitter
and a fake collector allowing you to use `bin/process_crash.sh` to
set everything up and publish a crash id to the standard queue which the
submitter will see and submit a crash to the fake collector. The fake
collector will parse the payload and log headers, annotations, and
files.

This adds stage submitter tests.


          bug-1889185: fixes based on review comments

d269f85

willkg force-pushed the willkg-bug-1889185-submitter branch from 4c006d9 to d269f85 Compare

May 18, 2024 01:49

relud approved these changes

View reviewed changes

Member

relud left a comment

r+wc

pytest.ini

Comment on lines +5 to 7

+              addopts = -rsxX --tb=native -p no:django -m 'not gcp'
               norecursedirs = .git docs config docker __pycache__
               testpaths = socorro/

Member

relud May 20, 2024

pretty sure marker definitions need to be copied from webapp/pytest.ini as well:

Suggested change

      
            addopts = -rsxX --tb=native -p no:django -m 'not gcp'
          
            norecursedirs = .git docs config docker __pycache__
          
            testpaths = socorro/
          
            addopts = -rsxX --tb=native -p no:django -m 'not gcp'
          
            norecursedirs = .git docs config docker __pycache__
          
            testpaths = socorro/
          
            markers =
          
                aws: tests that require aws backends to be configured in the environment. this is the default.
          
                gcp: tests that require gcp backends to be configured in the environment. skipped unless explicitly requested.

Contributor Author

willkg May 20, 2024

I'll add that. I'm not sure how it works currently without those markers.

socorro/tests/stage_submitter/test_submitter.py Outdated

Comment on lines 27 to 28

		def generate_s3_key(kind, crash_id):
		"""Generates the key in S3 for this object kind

Member

relud May 20, 2024

this isn't just used for s3, i suggest we rename it:

Suggested change

      
            def generate_s3_key(kind, crash_id):
          
                """Generates the key in S3 for this object kind
          
            def generate_storage_key(kind, crash_id):
          
                """Generates the key in cloud storage for this object kind

Contributor Author

willkg May 20, 2024

Whoops--this is a copy/paste fail.


          bug-1889185: fix pytest markers and generalize generate_storage_key

7d00a3d

willkg merged commit 00d064a into main

2 checks passed

Contributor Author

willkg commented May 21, 2024

Thank you!

willkg deleted the willkg-bug-1889185-submitter branch

May 21, 2024 12:28

relud mentioned this pull request

bug 1889156: script to load processed crashes into Elasticsearch #6609

Merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet