This repository has been archived by the owner on Aug 13, 2019. It is now read-only.

Write SQS consumer looking for buildhub.json files #465

Closed
peterbe opened this issue May 14, 2018 · 7 comments
@peterbe
Contributor

peterbe commented May 14, 2018

A new synchronous daemon script that consumes an SQS queue (ARN from an env var), looking only for messages where os.path.basename(uri) == 'buildhub.json'. For each match it consumes the message, validates the file, inserts it into Kinto, and deletes the message.

See tokenserver code for example.
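A minimal sketch of such a consumer, assuming boto3 and a hypothetical handle_record callback for the validate-and-insert-into-Kinto step (all names here are illustrative, not the actual implementation):

```python
import json
import os.path


def is_buildhub_record(key):
    """True when the S3 key points at a buildhub.json file."""
    return os.path.basename(key) == "buildhub.json"


def consume_forever(queue_url, handle_record):
    """Long-poll the queue; hand matching records to handle_record,
    then delete each message so it is not redelivered."""
    import boto3  # imported lazily; assumed available where this runs

    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url, WaitTimeSeconds=20, MaxNumberOfMessages=10
        )
        for message in resp.get("Messages", []):
            for record in json.loads(message["Body"]).get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                if is_buildhub_record(key):
                    handle_record(bucket, key)  # validate + insert into Kinto
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
```

Deleting only after handle_record returns means a crash mid-processing leaves the message on the queue for redelivery.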

@mostlygeek
Contributor

I created an S3 bucket and SQS queue to test that the consumer works. It is in the dev IAM, so @peterbe your dev creds should give you full access to it.

region: us-west-2
s3 bucket: buildhub-sqs-test
sqs url: https://us-west-2.queue.amazonaws.com/927034868273/buildhub-s3-events

I uploaded a few random files into S3 to make sure it works. It does. :)

Example of CLI access:

$ aws sqs receive-message --queue-url="https://us-west-2.queue.amazonaws.com/927034868273/buildhub-s3-events" --visibility-timeout=5 --max-number-of-messages=10

{
    "Messages": [
        {
            "MessageId": "d769d208-1263-470c-888a-42d1e5ec3285",
            "ReceiptHandle": "AQEBHZ9dWPP17YcpriZwxxPPKUX4rCUz2blVUB91KkINnGg+Ad1nv0d2nbdnGD8dK11oSO5CEZYiUh5eSRorUYiudb8sinGtet/w7o6DIZTw2anxdJ2twdliYL6bk+eZ+U6jjR2m96EoJOsjKhCT3fyYdLhswakJjl0yYdTGxnIphw++5868YznVLPXOC0R998S/0g3i7XSuQMyxWBrgGfEey7mgOnr1HUsI6n/LjBMhdS7aqljX9Oshy9gGBk1wqf8jnjsjqzQfAaPQLkW/himzHZI/5DZyKte3BKnbtM2IXsbecp41GiSn440fKZoIq29QQNcjTjJc7cjzYl7tcdQA2dkg/m3ndjl07Y977OWZ+NHWKo/oUOjn35CFqtiizg//WM91LvsTPKWxuoAUVNKkHA==",
            "MD5OfBody": "88e29bd0e797a051d4d17e3a56d90111",
            "Body": "{\"Records\":[{\"eventVersion\":\"2.0\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"us-west-2\",\"eventTime\":\"2018-05-14T19:53:41.462Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAI66EYHAJEFYA5TOD4\"},\"requestParameters\":{\"sourceIPAddress\":\"96.48.231.33\"},\"responseElements\":{\"x-amz-request-id\":\"E7369128C119B2B0\",\"x-amz-id-2\":\"VvDunUZcxNAdTZ2Pqdx2DIazPH8937l37fiURskXB4gphz0z3beCi9D+gkpKNbSLVIL+flLSiBA=\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"CreateObjectEvents\",\"bucket\":{\"name\":\"buildhub-sqs-test\",\"ownerIdentity\":{\"principalId\":\"A2KAVF94WGHDQ6\"},\"arn\":\"arn:aws:s3:::buildhub-sqs-test\"},\"object\":{\"key\":\"addi-frozen-cake.jpg\",\"size\":160075,\"eTag\":\"b01ab1f58eb4e4f43a31d5beb0d4b479\",\"sequencer\":\"005AF9E94560D90006\"}}}]}"
        },
        {
            "MessageId": "c902e689-41da-4ca8-a0aa-c8b8ceb64984",
            "ReceiptHandle": "AQEBNFXFGMVlOucHYqLUM+viowMFXbJzqS+yJ+kKt2Nf5cEXmCcKEa/tiR6SL7+LGEiZLf4hgydwa4GHeW+GdKA+/JiyZrwvash9Kjanc6YdMsx5cc8OdWRvsKea0E0cC2dEld9ajsC103q7xw6ldVyJ14WxrpVab77QEVb3vcq51kuTq4ZI5FCVcxPZg1mCGW8Esty/z8kWMUyFlwmPqyKGZLtz3QYO2+PUC2W8rDX6X3JCbQFDj/fwdZ4EVbmoAYpwiwJUgPQZi/vc4DTglC5Qq1Ot8yDaZiJs3uI08j2dbaWdIAtAmCBOIDPFrd5sjjcqnYuK+UsKkuEU3ccqW4s/iBiaJ9ji5GJp56s7JARfNMXFXe4xailgh43HjL04AICIN6uL12gigHbPs+dkJPOuJA==",
            "MD5OfBody": "9aa865a6bc42b2e6d9f65bcabb44e36e",
            "Body": "{\"Records\":[{\"eventVersion\":\"2.0\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"us-west-2\",\"eventTime\":\"2018-05-14T19:53:51.368Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAI66EYHAJEFYA5TOD4\"},\"requestParameters\":{\"sourceIPAddress\":\"96.48.231.33\"},\"responseElements\":{\"x-amz-request-id\":\"320F9A6DF6D2838A\",\"x-amz-id-2\":\"avk3pH3bcycstgl8UA9fDOGViSt1rAw3oer4uYuQBN2+IcRjub1TUzGv9+RxqWHPDqFebIJcmEU=\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"CreateObjectEvents\",\"bucket\":{\"name\":\"buildhub-sqs-test\",\"ownerIdentity\":{\"principalId\":\"A2KAVF94WGHDQ6\"},\"arn\":\"arn:aws:s3:::buildhub-sqs-test\"},\"object\":{\"key\":\"autoscaling.png\",\"size\":75441,\"eTag\":\"15a369287f713326ffa941f7590b03de\",\"sequencer\":\"005AF9E94F494A3885\"}}}]}"
        }
    ]
}
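Note that the interesting part of each message is the JSON-encoded Body field. A small helper (illustrative, not part of the project) can pull out the bucket/key pairs:

```python
import json


def extract_s3_objects(message_body):
    """Return (bucket, key) pairs from one SQS message's Body JSON."""
    records = json.loads(message_body).get("Records", [])
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in records
        if r.get("eventSource") == "aws:s3"
    ]
```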

@mostlygeek
Contributor

mostlygeek commented May 14, 2018

The Consumer should access the S3 buckets directly and avoid going through the CDN. We can ask ops to make sure the EC2 box running the Consumer has the right IAM permissions for:

  • s3:GetObject
  • sqs:ReceiveMessage
  • sqs:DeleteMessage*

Also, the Consumer should be idempotent: if the same buildhub.json message is processed twice, the Kinto results will remain the same. Regular SQS queues have an "at least once" delivery guarantee, so messages can get duplicated.
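One common way to get that idempotency (a sketch, not necessarily what the consumer ends up doing) is to derive the Kinto record id deterministically from the S3 location, so a create-or-replace PUT from a duplicated message just rewrites the same record:

```python
import uuid


def record_id(bucket, key):
    """Stable, deterministic record id for an S3 object. Processing the
    same buildhub.json message twice yields the same id, so an
    upsert-style PUT into Kinto leaves the results unchanged."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "s3://%s/%s" % (bucket, key)))
```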

@mostlygeek
Contributor

I created a random S3 file generator: https://github.com/mostlygeek/s3-file-maker

You can use it by running go run main.go -num 100000 in the background. The SQS queue should fill up with messages. This also makes it easy to get an idea of the latency between an S3 file being created and the message becoming visible on the queue.
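That latency can be estimated directly from the eventTime field in each S3 record, compared against the time the message was received (helper name is illustrative):

```python
from datetime import datetime, timezone


def event_latency_seconds(event_time, received_at=None):
    """Seconds between an S3 record's eventTime (e.g.
    '2018-05-14T19:53:41.462Z') and when we received the message."""
    sent = datetime.strptime(event_time, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
        tzinfo=timezone.utc
    )
    received_at = received_at or datetime.now(timezone.utc)
    return (received_at - sent).total_seconds()
```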

@peterbe
Contributor Author

peterbe commented May 17, 2018

Consumer should access the S3 buckets directly

...and ...

[right permissions to] S3:getObject

Why did you mention that? Surely SQS:ReceiveMessage and SQS:DeleteMessage* suffice (although I don't know what the asterisk means there). If a consumer, in Python, sits and consumes this, it'll get JSON blobs. It then inserts those and that's the end of that. At that point, it ultimately doesn't need to care or know where the JSON blobs come from.

Mind you, if we still need the backfill (which I suspect we do) we'll probably do that by extracting the manifests from the S3 bucket anonymously. So I guess the ask is to set up this bucket to be publicly readable just like the one we use for Mozilla builds.

@peterbe
Contributor Author

peterbe commented May 17, 2018

Ah! I see. The message doesn't contain the body. Of course. Silly me. I still have to go and pick it up... by its key and bucket name.
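So the fetch step looks roughly like this (a sketch assuming boto3; the function names here are illustrative):

```python
import json


def parse_manifest(raw_bytes):
    """Decode downloaded buildhub.json bytes into a dict."""
    return json.loads(raw_bytes.decode("utf-8"))


def fetch_buildhub_json(bucket, key):
    """Download the buildhub.json an SQS message points at; the message
    only carries the bucket name and key, not the file contents."""
    import boto3  # imported lazily; assumed available in the consumer env

    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return parse_manifest(obj["Body"].read())
```

This is also why the consumer box needs s3:GetObject in addition to the SQS permissions.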

@mostlygeek
Contributor

Mind you, if we still need the backfill (which I suspect we do) we'll probably do that by extracting the manifests from the S3 bucket anonymously. So I guess the ask is to set up this bucket to be publicly readable just like the one we use for Mozilla builds.

  1. We will still need the backfill. I outlined the plan for backfilling in Architecture for buildhub.json #437.
  2. The bucket is already public read. Confirmed with ops.

@peterbe
Contributor Author

peterbe commented Jun 15, 2018

@peterbe peterbe closed this as completed Jun 15, 2018