Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Substitution 'node-sql-parser' with a forked version until April 1st (Next Release) #597

Merged
merged 12 commits into from
Mar 20, 2024

Conversation

Kevin101Zhang
Copy link
Contributor

@Kevin101Zhang Kevin101Zhang commented Mar 6, 2024

A temporary change in our codebase. Replacing the usage of node-sql-parser with kevin-node-sql-parser until April 1st. The reason for this substitution is that the official release of node-sql-parser lacks a version release for the additional SQL statements required for our current project needs. In the interim, this forked version addresses these shortcomings and allows us to incorporate the required SQL statements. Please note that this is a temporary measure, and we plan to revert to the official node-sql-parser version after April 1st, once the required features are officially available.

See last comment for details

@Kevin101Zhang Kevin101Zhang requested a review from a team as a code owner March 6, 2024 23:40
@Kevin101Zhang Kevin101Zhang marked this pull request as draft March 6, 2024 23:40
@Kevin101Zhang Kevin101Zhang requested a review from darunrs March 6, 2024 23:42
@Kevin101Zhang Kevin101Zhang linked an issue Mar 10, 2024 that may be closed by this pull request
@Kevin101Zhang Kevin101Zhang marked this pull request as ready for review March 10, 2024 21:39
Copy link
Collaborator

@darunrs darunrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see to update accepted with few code changes necessary. I left some comments to remove testing changes. Seeing as we have another issue related to partitions, can you open an issue in node-sql-parser to support CREATE TABLE access_keys_hash_p0 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 0);? Hopefully, the author can put support for that in before the march release.

I would say neither of the suggested Runner approaches are ones we would want to work with. Trying to isolate create table statements is hard because more things than just create table impact type gen. Case in point CREATE TYPE. We also have alter table. And anything which sets restrictions on columns or primary keys. And probably many more. We expose ourselves to more edge cases. In that, that was my initial approach to case sensitive columns before scrapping it in favor of asking the author to just support quotes. It involves messing around with regex A LOT which is just a whole can of worms we'd like to avoid when possible. Especially against human generated strings like the schema.

Speaking of quotes, I believe that's the next low hanging fruit here before you fully drive on logs table epic.

I noticed you went ahead with letting the code default the any type for the created types. As I mentioned earlier, this is fine for now. Perhaps in the future we can support some commonly used types like enum.

frontend/src/utils/pgSchemaTypeGen.js Outdated Show resolved Hide resolved
frontend/src/utils/pgSchemaTypeGen.js Outdated Show resolved Hide resolved
runner/src/indexer/indexer.ts Outdated Show resolved Hide resolved
@darunrs
Copy link
Collaborator

darunrs commented Mar 11, 2024

Oh one other thing, can you rename the PR? It's very amgibuous as it is. It doesn't communicate what is being migrated. And it also doesn't explain that we're substituting a package in the short term. It should be more likefix: Use forked node-sql-parser and the desc explains what that means in more detail (e.g. forked because planned release is too late).

I'd also clean up the PR description (Which becomes the commit message). It's super wordy, which won't help whoever might reads this in the future. The PR description should describe what changes you made, why they were necessary (The issue being solved), and any tech debt incurred or planned future actions relevant to the changes.

In this case, substituting the package, testing to ensure some particular schema/sql is supported, explaining what isn't supported still (Partition) and any planned work related to the fix (partition support, leveraging quote data in AST for case sensitive schema support).

@Kevin101Zhang
Copy link
Contributor Author

Frontend Changes:

Changing our version of node-sql-parser from 4.10 -> 4.18.1 (kevin-node-sql-parser) resulted in the library generating a more verbose AST. As a result, our types generation stopped working. This video showcases successful type generation after migrating to the latest version using kevin-node-sql-parser for a basic ddl.

https://www.loom.com/share/63a05346c08c409387c6c3571c8a67d9

Changes to the AS after bumping versions using a diffchecker
Screenshot 2024-03-06 at 6 41 15 PM

However when trying a more complex ddl. The parser supports and respects Types but does not respect partitions syntax and throws an error when attempting to astify.

Runner Changes:

Runner uses node-sql-parser in the index.ts file.

When substituting node-sql-parser for kevin-node-sql-parser to runner which is the next upgrade when node-sql-parser is updated (expect end of this March), buildDatabaseContext will succeed for basic examples but for more complex DDL statements they fail for tables with an expected error message of the following:

Caught error when generating context.db methods. Building no functions. You can still use other context object methods. Expected "--", "/*", ":=", "=", "FOR", "TO", or [ \t\n\r] but "B" found. (Same error thrown on the frontend)

After a briefly tracking the error, the error occurs in getTableNames specifically in the astify line.
let schemaSyntaxTree = this.deps.parser.astify(schema, { database: 'Postgresql' });

Provided with this SQL below

CREATE TYPE access_key_permission_kind AS ENUM ('FULL_ACCESS', 'FUNCTION_CALL');
CREATE TABLE
  access_keys (
    public_key text NOT NULL,
    account_id text NOT NULL,
    created_by_receipt_id text NULL,
    deleted_by_receipt_id text NULL,
    permission_kind access_key_permission_kind NOT NULL,
    last_update_block_height numeric(20) NOT NULL,
    CONSTRAINT access_keys_pk PRIMARY KEY (public_key, account_id)
  );
PARTITION BY HASH (public_key); 
CREATE INDEX access_keys_account_id_idx ON access_keys USING btree (account_id);
CREATE INDEX access_keys_last_update_block_height_idx ON access_keys USING btree (last_update_block_height);
CREATE INDEX access_keys_public_key_idx ON access_keys USING btree (public_key);

Due to the ticket here. node-sql-parser does yet not support full declarative partitions.

ex: PARTITION BY HASH (public_key); CREATE TABLE access_keys_hash_p0 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 0);

As a result, The behavior of the sql above (Note the line PARTITION BY HASH (public_key)) will be using the table inheritance where it doesn't create separate physical tables for each partition; instead, it distributes data within the main table based on the hash value of the specified column.

compared to

CREATE TYPE access_key_permission_kind AS ENUM ('FULL_ACCESS', 'FUNCTION_CALL');
CREATE TABLE
  access_keys (
    public_key text NOT NULL,
    account_id text NOT NULL,
    created_by_receipt_id text NULL,
    deleted_by_receipt_id text NULL,
    permission_kind access_key_permission_kind NOT NULL,
    last_update_block_height numeric(20) NOT NULL,
    CONSTRAINT access_keys_pk PRIMARY KEY (public_key, account_id)
  );
PARTITION BY HASH (public_key); 
CREATE INDEX access_keys_account_id_idx ON access_keys USING btree (account_id);
CREATE INDEX access_keys_last_update_block_height_idx ON access_keys USING btree (last_update_block_height);
CREATE INDEX access_keys_public_key_idx ON access_keys USING btree (public_key);

CREATE TABLE access_keys_hash_p0 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 0);
CREATE TABLE access_keys_hash_p1 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 1);
CREATE TABLE access_keys_hash_p2 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 2);
CREATE TABLE access_keys_hash_p3 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 3);
CREATE TABLE access_keys_hash_p4 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 4);
CREATE TABLE access_keys_hash_p5 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 5);
CREATE TABLE access_keys_hash_p6 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 6);
CREATE TABLE access_keys_hash_p7 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 7);
CREATE TABLE access_keys_hash_p8 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 8);
CREATE TABLE access_keys_hash_p9 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 9);

In these statements there are physical creations of tables access_keys_hash_p0 - access_keys_hash_p9 based off the partition strategy.

Currently: The parser's astify does not support the Partition by Hash (see additional ticket link above).
I believe the intended behavior of what we want is the following and the steps we can take are the following:

For FE:
Blocked: Same issue as the initial just one step forward - Astify now respects Types and the use of Types but does not support Partitions.

For Runner:

  1. When the SQL involves partition by hash without specifying physical partitions of the table ex. CREATE TABLE access_keys_hash_p0 PARTITION OF access_keys FOR VALUES WITH (modulus 10, remainder 0); We can ignore PARTITION BY HASH as it has no effect on building the context for the tablenames (since there are no physical partitions taking place) We can still successfully generation table contexts after ignoring.
  2. If the SQL does involve both partition by hash and create table ... partiton of .. for values with... (Physical creation of partition tables). We Can treat them as individual table creations for the purpose of creating table contexts (insert, select, update, upsert, delete).

SUMMARY

  • Successfully upgraded node-sql-parser from 4.10 to 4.18.1 (kevin-node-sql-parser) in the fe and be.
  • Successful type generation and table generation for basic DDL; however, issues arise with more complex DDL statements.

Next Steps for Runner:

  • Consider ignoring PARTITION BY HASH when it doesn't involve physical creation of partition tables.
  • Evaluate treating SQL with both PARTITION BY HASH and physical partition table creation individually for creating table contexts.

In General:

  • Support partitions syntax for node-sql-parser astify

@Kevin101Zhang Kevin101Zhang changed the title fix: migrated from 4.10 -> 4.18.1 fix: Substitution node-sql-parser with kevin-node-sql-parser until April 1st (Next Release) Mar 11, 2024
@Kevin101Zhang Kevin101Zhang changed the title fix: Substitution node-sql-parser with kevin-node-sql-parser until April 1st (Next Release) fix: Substitution 'node-sql-parser' with a forked version until April 1st (Next Release) Mar 11, 2024
@Kevin101Zhang Kevin101Zhang requested a review from darunrs March 11, 2024 21:05
@darunrs
Copy link
Collaborator

darunrs commented Mar 11, 2024

Nice job with the changes to the PR Description. It's a good step in the right direction. You don't need to make it that formal either. Just consider that if someone new to the team had to look at this PR for whatever reason with little context. Write the description to, at a glance, describe what the PR does and the decision making going into it.

frontend/src/utils/pgSchemaTypeGen.js Outdated Show resolved Hide resolved
frontend/src/utils/pgSchemaTypeGen.js Outdated Show resolved Hide resolved
Copy link
Collaborator

@darunrs darunrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor Author

@Kevin101Zhang Kevin101Zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved conflicts

@Kevin101Zhang Kevin101Zhang merged commit 1bccff1 into main Mar 20, 2024
5 checks passed
@Kevin101Zhang Kevin101Zhang deleted the 537-support-extra-ddl-statements branch March 20, 2024 18:48
@darunrs darunrs mentioned this pull request Mar 28, 2024
darunrs pushed a commit that referenced this pull request Apr 3, 2024
- feat: Instrument Runner Service (#602)
- Support WHERE col IN (...) in context.db.table.select and delete
(#606)
- feat: Include indexer name in context db build failure warning (#611)
- Cache provisioning status (#607)
- Fix ESLint on DmlHandler (#612)
- fix: Substitution 'node-sql-parser' with a forked version until Apri
1st (#597)
- feat: Add pgBouncer to QueryApi (#615)
- feat: Expose near-lake-primitives to VM (#613)

---------

Co-authored-by: Pavel Kudinov <[email protected]>
Co-authored-by: Pavel Kudinov <[email protected]>
Co-authored-by: Kevin Zhang <[email protected]>
Co-authored-by: Morgan McCauley <[email protected]>
This was referenced Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support extra DDL statements
3 participants