
Configure API to connect to OpenSearch in AWS #2327

Closed
chouinar opened this issue Oct 1, 2024 · 1 comment · Fixed by #2450, #2452 or #2624
Labels: topic: backend, topic: infra

Comments


chouinar commented Oct 1, 2024

Summary

Placeholder: we should be able to follow https://opensearch.org/docs/latest/clients/python-low-level/#connecting-to-amazon-opensearch-serverless, which is already implemented, but the env vars aren't configured.

Roughly we'll need to configure:

  • host
  • port
  • Anything special regarding the AWS auth if different from example
  • Slight adjustments based on serverless vs server-provisioned
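The linked guide uses the opensearch-py client; the sketch below shows one way those settings could be assembled from the environment. The env var names (`SEARCH_ENDPOINT`, `SEARCH_PORT`, etc.) and the helper itself are illustrative assumptions, not the final Terraform-managed names:

```python
import os


def build_opensearch_config(env=os.environ):
    """Assemble OpenSearch client settings from environment variables.

    The variable names used here are illustrative assumptions, not the
    final names configured in Terraform.
    """
    serverless = env.get("SEARCH_SERVERLESS", "false") == "true"
    return {
        "host": env.get("SEARCH_ENDPOINT", "localhost"),
        "port": int(env.get("SEARCH_PORT", "9200")),
        "use_ssl": env.get("SEARCH_USE_SSL", "true") == "true",
        # Serverless collections sign requests for the "aoss" service,
        # while server-provisioned domains use "es" -- one of the
        # "slight adjustments" called out above.
        "aws_service": "aoss" if serverless else "es",
    }


# Example: a provisioned (non-serverless) dev cluster
cfg = build_opensearch_config({"SEARCH_ENDPOINT": "search-dev.example.com"})
```

The AWS auth piece (signing requests with credentials from the ECS task role) would layer on top of this, which is the open question discussed below.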

Acceptance criteria

No response


chouinar commented Oct 1, 2024

@coilysiren - the above link is what I'm aware of as far as connecting to AWS OpenSearch. Most env vars are pretty uneventful, but the credentials might require some sort of configuration on the ECS task itself? Not sure how that works.

@coilysiren coilysiren mentioned this issue Oct 1, 2024
@mxk0 mxk0 moved this from Icebox to Todo in Simpler.Grants.gov Product Backlog Oct 6, 2024
@mxk0 mxk0 added topic: infra, topic: backend and removed infra, refinement labels Oct 6, 2024
@chouinar chouinar self-assigned this Oct 10, 2024
@chouinar chouinar moved this from Todo to In Progress in Simpler.Grants.gov Product Backlog Oct 15, 2024
chouinar added a commit that referenced this issue Oct 15, 2024
## Summary
Fixes #2327

### Time to review: __5 mins__

## Changes proposed
Modified configuration to call OpenSearch with our current auth setup

Adjusted the names of several env vars to match the Terraform setup

Adjusted the number of records we upsert per batch to OpenSearch from 5k
to 1k
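The 5k-to-1k batch change can be sketched as a simple chunking helper (the helper name and constant are illustrative, not the actual code from this PR):

```python
from itertools import islice

BATCH_SIZE = 1_000  # lowered from 5_000 to keep each bulk request smaller


def iter_batches(records, batch_size=BATCH_SIZE):
    """Yield successive lists of at most batch_size records for bulk upsert."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# 16,488 records (the count from the dev run below) -> 17 bulk requests,
# the last one partial
batches = list(iter_batches(range(16_488)))
```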

## Context for reviewers
We will do another pass on auth later. Getting proper auth working in
OpenSearch itself requires adjusting some other infra setup, so for now
we're logging in with the cluster's own user/pass. Since those changes
will take a while to set up, I've commented out the AWS auth approach we
will use later.

## Additional information
In dev I ran the load job and queried the API, and it all functions.
Queries go to the search index as expected and responses look good. Not
sure about caching/performance, but I was getting 100-200ms for requests
to the API.

For future reference, a decent way to query this in Logs Insights:
```
filter msg = 'end request'
| filter request.path = '/v1/opportunities/search'
| stats avg(response.time_ms), min(response.time_ms), max(response.time_ms) by bin(5m)
```
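For anyone reproducing that aggregation offline, here is a rough Python equivalent of the 5-minute binning. The event shape (a dict with `created` epoch seconds and a `response.time_ms` value) is an assumption mirroring the JSON log structure shown below:

```python
from collections import defaultdict

BIN_SECONDS = 5 * 60  # matches bin(5m) in the Logs Insights query


def bin_response_times(events):
    """Group request events into 5-minute bins and summarize time_ms.

    Each event is assumed to be a dict with 'created' (epoch seconds)
    and 'response.time_ms' keys.
    """
    bins = defaultdict(list)
    for event in events:
        # Floor the timestamp to the start of its 5-minute window
        bucket = int(event["created"]) // BIN_SECONDS * BIN_SECONDS
        bins[bucket].append(event["response.time_ms"])
    return {
        bucket: {"avg": sum(v) / len(v), "min": min(v), "max": max(v)}
        for bucket, v in bins.items()
    }


stats = bin_response_times([
    {"created": 1729014600.0, "response.time_ms": 120},
    {"created": 1729014650.0, "response.time_ms": 180},
])
```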

Here is the final big log statement from running the job in dev:
```json
{
    "name": "src.task.task",
    "msg": "Completed %s in %s seconds",
    "args": [
        "LoadOpportunitiesToIndex",
        73.459
    ],
    "levelname": "INFO",
    "levelno": 20,
    "pathname": "/api/src/task/task.py",
    "filename": "task.py",
    "module": "task",
    "lineno": 48,
    "funcName": "run",
    "created": 1729014690.4964094,
    "msecs": 496,
    "relativeCreated": 80289.82591629028,
    "thread": 140513021029248,
    "threadName": "MainThread",
    "processName": "MainProcess",
    "process": 7,
    "taskName": null,
    "index_name": "opportunity-index-2024-10-15_13-50-17",
    "records_loaded": 16488,
    "task_duration_sec": 73.459,
    "app.name": "src.app",
    "environment": "dev",
    "task_name": "load-opportunity-data-opensearch",
    "task_uuid": "c237cc23-57e9-42b4-bb8d-3b69593cb470",
    "aws.ecs.task_name": "api-dev",
    "aws.ecs.task_id": "5d95f47fd9da4d359bf74134328e1947",
    "aws.ecs.task_definition": "api-dev:199",
    "aws.cloudwatch.log_group": "service/api-dev",
    "aws.cloudwatch.log_stream": "api-dev/api-dev/5d95f47fd9da4d359bf74134328e1947",
    "aws.step_function.id": null,
    "message": "Completed LoadOpportunitiesToIndex in 73.459 seconds"
}
```
coilysiren added a commit that referenced this issue Oct 17, 2024
Summary

Fixes #2327

## Changes

Adjust the OpenSearch policy to actually allow access, and to restrict
access in a production-ready way.

## Context

Opensearch debugging in 3 parts:

Part 1:
https://betagrantsgov.slack.com/archives/C05TSL64VUH/p1728587166738419

Part 2:
https://betagrantsgov.slack.com/archives/C05TSL64VUH/p1728663661259359

Part 3:
https://betagrantsgov.slack.com/archives/C05TSL64VUH/p1729002442864019

## Follow-up

#2472
coilysiren added a commit that referenced this issue Oct 29, 2024
…vel (#2624)

## Summary

Fixes #2327

### Time to review: __1 mins__

## Context for reviewers

<img width="930" alt="image"
src="https://github.com/user-attachments/assets/a5e14dcc-8ab9-4283-83aa-850526a1a4ce">

^ I ran into the resource limit above when deploying prod OpenSearch,

so I changed the config so that there is only one
`aws_cloudwatch_log_resource_policy` resource

## Testing

I've already deployed this to dev, and to the account level. The diff
applied just fine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment