Prod Release 02/04/2024 #623

darunrs · 2024-03-27T21:44:02Z

feat: Instrument Runner Service (feat: Instrument Runner Service #602)
Support WHERE col IN (...) in context.db.table.select and delete (Support WHERE col IN (...) in context.db.table.select and delete #606)
feat: Include indexer name in context db build failure warning (feat: Include indexer name in context db build failure warning #611)
Cache provisioning status (Cache provisioning status #607)
Fix ESLint on DmlHandler (Fix ESLint on DmlHandler #612)
fix: Substitution 'node-sql-parser' with a forked version until Apri 1st (fix: Substitution 'node-sql-parser' with a forked version until April 1st (Next Release) #597)
feat: Add pgBouncer to QueryApi (feat: Add pgBouncer to QueryApi #615)
feat: Expose near-lake-primitives to VM (feat: Expose near-lake-primitives to VM #613)

Runner is lacking instrumentation. It is responsible for many things and it's become hard to understand what tasks contribute to the overall latency of an indexer. In addition, we are now at a point where we need to drive down latencies to facilitate new * indexer use cases such as access keys. I've chosen to instrument Runner with OpenTelemetry. Tracing generally requires 3 items: An instrumented service, a trace collector, and a trace visualizer. The service is responsible for collecting and transmitting trace data to the collector. The collector should be able to receive trace data with little fuss to prevent performance impacts to the instrumented service. The collector then processes the trace data and transmits the processed data to the visualizer. The visualizer visualizes trace data and allows for filtering on traces. The benefit of OpenTelemetry over other options like Zipkin and Jaeger is that GCP already supports ingesting OpenTelemetry data. As such, we don't need to provision a collector ourselves, and can instead leverage GCP's existing collector & visualizer Tracing service. For local development, traces can be output to console, a Zipkin all-in-one container or to GCP (Requires Cloud Trace Agent role and specifying project ID). This is done by simply initializing the NodeSDK differently. In addition, we do not want to enable traces in prod yet, so by not specifying any exporter. This creates a No-Op Trace Exporter which won't attempt to record traces. No code changes were made changing code execution path. All tests pass with no changes, aside from having to replace snapshots due to changes in tabbing of mutation strings. I have manually verified mutation strings are still the same by stripping whitespace and checking against original.

Add support for `context.db.Table.select({column_name: ['a.near', 'b.near']})`. The same support is added for `delete`. Frontend support is added. I also improved parameter naming to reflect SQL statements like `where`, `values` and `set`.

context.db build failures are just logged instead of blocking to allow complex schemas but with only graphql calls available. However, these logs are repeatedly output and not tagged with an indexer name. This adds an indexer name to the log to aid debugging.

Checking provisioning status through Hasura takes 70-100ms on Dev. This PR caches the provisioning status inside of `Provisioner` and does make extra requests to Hasura on every run.

… 1st (Next Release) (#597) A temporary change in our codebase. Replacing the usage of node-sql-parser with kevin-node-sql-parser until April 1st. The reason for this substitution is that the official release of node-sql-parser lacks a version release for the additional SQL statements required for our current project needs. In the interim, this forked version addresses these shortcomings and allows us to incorporate the required SQL statements. Please note that this is a temporary measure, and we plan to revert to the official node-sql-parser version after April 1st, once the required features are officially available. See last comment for details

QueryApi has experienced issues with Postgres connections since the introduction of * indexers due to how QueryApi creates these connections through Application level connection pools. Since we can't use one pool for all workers, I've introduced PgBouncer as a Middleware to serve as an additional connection pooler in front of the DB.

- Upgrade `near-lake-primitives` to `0.2.0`, which includes `borsh` - Expose entire `near-lake-primitives` library to VM via `primitives`, e.g. borsh can be accessed via `primitives.borsh.fromBorsh()`

This PR expands provisioning to also schedule the cron jobs for adding/deleting log partitions. It assumes: 1. The `cron` database exists and has `pg_cron` enabled (near/near-ops#1665) 2. The `__logs` table exists and has the partition functions defined (#608) In relation to this flow, the high-level steps are: 1. Use an admin connection to the `cron` database to grant the required access to the user 2. Use a user connection to the `cron` database to schedule the jobs The cron job is executed under the user which schedules the job, therefore the user _must_ schedule the job as they are the only ones who have access to their schemas. If the admin were to schedule the job the job itself would fail as it doesn't have the required access. Merging this before 2. is fine, the jobs will just fail, but should start to succeed after it has been implemented.

This PR adds a very basic integration test for `Indexer`. It uses `testcontainers` to stand up both `postgres` and `hasura` so that `Indexer` can talk to real components rather than mocks. The test uses `Indexer` directly, which means S3/Redis are still somewhat mocked/ignored. We can add those in later if need be. This is essentially just the scaffolding for integration testing which can be expanded over time. The suite includes only 1 very basic test, which if successful should provide a fair amount of confidence that things are working as expected. The flow includes: provisioning, writing data to Postgres, and then asserting its existence via GraphQL. All errors bubble up from `Indexer` so this test should catch most problems. This PR points to #625, as so I could test the `pg_cron` flow via this integration test :)

Indexer schemas can have quoted or unquoted table & column names. However, QueryApi always quotes table names and does not quote column names during SQL query construction for context.db. This is because the AST generated form parsing the schema does not include if the identifier was quoted or not. However, recent updates tot he parsing library has added this functionality in. I've updated QueryApi to quote both the table name and the column names if they were quoted originally, and leave them unquoted otherwise. In addition, I've replaced kevin-node-sql-parser back with the original package now that the 5.0 update has released. I've also added a typscript examples folder for convenience, as well as a script to clear the local postgres database.

DmlHandler was using port number handed to it by Hasura, which is 5432. We want it to use 6432 which is the port specified by the env variable. 6432 points to pgBouncer.

The admin/cron connection is currently hard-coded to the `cron` database, but this needs to be configurable so that we can use the default DB (`postgres`) locally. Additionally, this PR combines the `pgbouncer` / `pg_cron` init scripts and uses the combined output in both docker compose and integration tests.

darunrs · 2024-04-02T22:58:35Z

Changes to be applied after merge of PR: https://github.com/near/near-ops/pull/1677

This reverts commit b96e269.

Reverts #623

darunrs and others added 8 commits March 15, 2024 13:56

Cache provisioning status (#607)

053483e

Checking provisioning status through Hasura takes 70-100ms on Dev. This PR caches the provisioning status inside of `Provisioner` and does make extra requests to Hasura on every run.

Fix ESLint on DmlHandler (#612)

d4337a3

feat: Expose near-lake-primitives to VM (#613)

41ccc6a

- Upgrade `near-lake-primitives` to `0.2.0`, which includes `borsh` - Expose entire `near-lake-primitives` library to VM via `primitives`, e.g. borsh can be accessed via `primitives.borsh.fromBorsh()`

darunrs marked this pull request as ready for review March 27, 2024 21:44

darunrs requested a review from a team as a code owner March 27, 2024 21:44

darunrs changed the title ~~Prod Release 01/04/2024~~ Prod Release 02/04/2024 Mar 28, 2024

morgsmccauley and others added 5 commits March 30, 2024 09:04

fix: DmlHandler using wrong port (#630)

9810b0b

DmlHandler was using port number handed to it by Hasura, which is 5432. We want it to use 6432 which is the port specified by the env variable. 6432 points to pgBouncer.

chore: Disable new logs table while still in development (#632)

4e1ac2f

darunrs merged commit b96e269 into stable Apr 3, 2024
9 checks passed

darunrs added a commit that referenced this pull request Apr 3, 2024

Revert "Prod Release 02/04/2024 (#623)"

95b96a1

This reverts commit b96e269.

darunrs mentioned this pull request Apr 3, 2024

Revert "Prod Release 02/04/2024" #633

Merged

darunrs added a commit that referenced this pull request Apr 3, 2024

Revert "Prod Release 02/04/2024" (#633)

bd64f58

Reverts #623

darunrs mentioned this pull request Apr 3, 2024

Investigate Production Issues and complete Prod Release #635

Closed

morgsmccauley added a commit that referenced this pull request Apr 22, 2024

Merge pull request #623 from near/main

88ea4f1

morgsmccauley added a commit that referenced this pull request Apr 22, 2024

Merge pull request #623 from near/main

211383c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prod Release 02/04/2024 #623

Prod Release 02/04/2024 #623

darunrs commented Mar 27, 2024 •

edited

Loading

darunrs commented Apr 2, 2024

Prod Release 02/04/2024 #623

Prod Release 02/04/2024 #623

Conversation

darunrs commented Mar 27, 2024 • edited Loading

darunrs commented Apr 2, 2024

darunrs commented Mar 27, 2024 •

edited

Loading