Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes the pg_dump issues with create functions #980

Merged
merged 2 commits into from
Sep 16, 2024
Merged

Conversation

charithabandi
Copy link
Contributor

No description provided.

@@ -233,8 +233,28 @@ func pgDump(ctx context.Context, dbName, dbUser, dbPass, dbHost, dbPort string,
Name: fmt.Sprintf("validator-%d", validatorCount),
})
validatorCount++
} else if inFunctionBlock {
if strings.HasPrefix(line, "$$;") {
Copy link
Member

@jchappelow jchappelow Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to find documentation about this:

  • can there be other functions that end in something other than $$; (use a different delimiter)? like functions that happen to be in the dump that we didn't define for procedures. We might scan all the way to the end, or just too far (until the next function end).
  • Why prefix vs suffix vs equals (ignoring leading/trailing space) or even contains?
  • we often define functions ending with $$ LANGUAGE plpgsql; but I guess that gets simplified / stripped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@jchappelow jchappelow Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately postgresql will error (and psql hangs expecting more input) if trying to do this:

CREATE FUNCTION test_func() RETURNS text AS $$
BEGIN
    RETURN 'This string contains $$ inside the body';
END;
$$ LANGUAGE plpgsql;

The only way to do that is to change the delimiter ($$) to something else:

CREATE FUNCTION test_func() RETURNS text AS $delim$
BEGIN
    RETURN 'This string contains $$ inside the body';
END;
$delim$ LANGUAGE plpgsql;

When we are crafting the CREATE FUNCTION in kwil, we use $$, so the function body cannot contain "$$" anywhere.


The remaining question I have is will pg_dump include any other system created functions that might use a different delimiter. (It creates a bunch of selects and sets to get things "right" like locale, so who knows if it's also going to create a function.) Note that in addition to dollar delimiting, the function body could be in single quotes, but that seems rare.

I think this will work for now, but we should document this pretty clearly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// create_user creates a user in the database.
// It is assigned a unique uuid.
// MUTABLE
procedure create_user($name text, $age int) public {
    // we will generate a uuid from the txid
    notice('stuff $$ adsf');
    INSERT INTO users (id, name, age, address)
    VALUES (uuid_generate_v5('985b93a4-2045-44d6-bde4-442a4e498bc6'::uuid, @txid), $name, $age, @caller);
}

on deploy:

failed to deploy database, code = 4294967295, log = "ERROR: syntax error at or near \"adsf\" (SQLSTATE 42601)"

} else {
if line == "" || strings.TrimSpace(line) == "" { // Skip empty lines
if strings.HasPrefix(line, "CREATE FUNCTION") {
Copy link
Member

@jchappelow jchappelow Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd miss CREATE OR REPLACE FUNCTION, right?
We use that for creating error and notice and probably other things.

Also in general, I think we should trim space (leading and trailing) before matching specific text

Copy link
Member

@jchappelow jchappelow Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

procedure create_user($name text, $age int) public {
    notice('stuff $$; create function adsf() returns text AS $asd$ select 1::int8; ');
    // malice here... how?
    notice('$asd$;');
}

Can someone get creative and create an arbitrary valid function like this @brennanjl ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. #981 fixes this. Since this edge case exists on preview right now, I think we should keep it since it is a breaking consensus change, and only put the fix on v0.9.

@brennanjl
Copy link
Collaborator

brennanjl commented Sep 16, 2024

As Jon points out, there are some edge cases here that are pretty unavoidable. This fix might work for now (since TSN is probably not using schemas with $$; as string literals), but it is a very attack-able system. IMO, we either need to:

  • Use the full dump, including the removed SETs and SELECTs. I did a test dump and none of the SETs or SELECTs seemed like a big deal, but I only tested one of them, and am unsure if a different DB would have different SETs and SELECTs.
  • Use a tokenizer that allows us to understand the context in which tokens are used. This can probably be done with some very sophisticated regexp, but it could be done pretty easily with ANTLR as well.

@charithabandi charithabandi force-pushed the statesync-snaps branch 2 times, most recently from df34999 to b0fb2c3 Compare September 16, 2024 16:13
@charithabandi charithabandi force-pushed the statesync-snaps branch 2 times, most recently from 5b22fc4 to 9ffb52a Compare September 16, 2024 17:59
@brennanjl
Copy link
Collaborator

Still reviewing, but is there any way to consolidate the sanitization logic between the two pgdump calls? It seems like they are very similar. If not, nbd, it just feels weird to have them be so similar and be duplicated.

@brennanjl
Copy link
Collaborator

Discussed in-person with @charithabandi but just for completeness:

We are going to go ahead with replicating the internal tables as-is. This means:

  1. Internal tables cannot be changed in a patch version.
  2. Users will get error warnings during replication since it will fail to create the internal tables (which already exist).

@brennanjl brennanjl merged commit 51cfe56 into main Sep 16, 2024
2 checks passed
@brennanjl brennanjl deleted the statesync-snaps branch September 16, 2024 20:57
brennanjl pushed a commit that referenced this pull request Sep 16, 2024
* Fixes the pg_dump issues with create functions

* Fail fast and warn users if statesync is enabled on the already initialized db
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants