Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide how to configure postgres table/function auto-discovery #512

Closed
nyurik opened this issue Dec 12, 2022 · 14 comments
Closed

Decide how to configure postgres table/function auto-discovery #512

nyurik opened this issue Dec 12, 2022 · 14 comments
Assignees
Labels
config Relates to Martin configuration enhancement relnotes

Comments

@nyurik
Copy link
Member

nyurik commented Dec 12, 2022

The upcoming #511 allows multiple postgres connections. Also, the recently merged #510 adds support for per-schema filtering when automatically configuring table and function sources. Yet, we don't have a way to configure them yet.


Proposed schema:

# Configure some tables/functions, plus use an extensive discovery system configuration
postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    functions:
        src1:
            schema: public
            function: my_function
            maxzoom: 10
    tables:
        src2:
            schema: public
            table: my_table
    discover:
        # Only search the given PostgreSQL schemas, for both functions and tables.
        # Set them up with maxzoom 10.
        # TBD: Naming: discover, auto, configure, auto-config, ...?
        functions: true
        tables: true
        schemas: ["public", "my-schema"]
        maxzoom: 10
        # How should the new table and function sources be named
        table_naming: {schema}.{table}.{column}
        function:naming: {function}

This is how currently users can automatically configure all available tables and functions in all schemas:

postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    # no additional configuration for tables or funcs

With this config, users enable just one table as a src1 source, and disable auto-discovery

postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    tables:
        src1:
            schema: public
            table: my_table
            srid: 4326
            geometry_column: geom
    functions:
        src2:
            schema: public
            function: my_function

Need feedback

Beyond the above, we may want to configure the following scenarios:

  • Neither tables nor functions are configured (first example above), but the users want to auto-discover just the functions, but not tables. Or just the tables.
  • Same, but autodiscovery should be done only in a given list of schemas (per martin config expose only specific schema #47 request)
  • Allow pre-configure some tables and additionally enable autodiscovery -- not certain if this is needed or not?

Other configuration ideas

See the primary idea at the top. Most of these other ideas apply to both tables and functions, but for brevity only one is mentioned.

# Configure some functions, plus expose all other functions too
postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    functions:
        src1:
            schema: public
            function: my_function
            maxzoom: 10
    discover_functions: true
# Configure some functions, plus expose all other functions that are listed in the given schemas
postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    functions:
        src1:
            schema: public
            function: my_function
            maxzoom: 10
    discover_functions:
        - public
        - some-schema

P.S. I also renamed connection_string into connection (will support older format for backward compat too)

@BigData-YC

This comment was marked as off-topic.

@nyurik

This comment was marked as off-topic.

@BigData-YC

This comment was marked as off-topic.

@nyurik

This comment was marked as off-topic.

@nyurik nyurik added enhancement config Relates to Martin configuration labels Dec 13, 2022
@stepankuzmin
Copy link
Collaborator

We can use this format to have more granular control over public schemas, tables, and functions, and keep it less verbose

# Expose schemas, tables, and functions automatically
auto_publish:
    # Expose all tables and functions from the following schemas.
    # Use "*" to expose all available sources.
    # Extends the `from_schemas` lists in the `tables` and `functions` below
    schemas: ["public", "my_schema"]
    tables:
        # Format string to build table source id.
        id_format: "{schema}.{function}.{column}"
        # Expose tables from the schemas list.
        # Use "*" to expose all available tables.
        from_schemas: ["public", "my_tables_schema"]
    functions:
        # Format string to build function source id
        id_format: "{schema}.{function}.{column}"
        # Expose functions from the schemas list.
        # Use "*" to expose all available functions.
        from_schemas: ["public", "my_functions_schema"]

Case 1

Expose all tables and functions from all schemas

auto_publish:
    schemas: "*"
    tables:
        id_format: "{schema}.{table}.{column}"
    functions:
        id_format: "{schema}.{function}.{column}"

This is the same as

auto_publish:
    tables:
        id_format: "{schema}.{table}.{column}"
        from_schemas: "*"
    functions:
        id_format: "{schema}.{function}.{column}"
        from_schemas: "*"

Case 2

Expose tables from all schemas

auto_publish:
    tables:
        id_format: "{schema}.{table}.{column}"
        from_schemas: "*"

Case 3

Expose functions from the "my-functions" schema

auto_publish:
    functions:
        id_format: "{schema}.{function}.{column}"
        from_schemas: ["my-functions"]

Case 4

Expose all tables and functions from the "public" schema, additionally expose all tables from the "my-tables" schema and all functions from the "my-functions" schema

auto_publish:
    schemas: ["public"]
    tables:
        id_format: "{schema}.{table}.{column}"
        from_schemas: ["my-tables"]
    functions:
        id_format: "{schema}.{function}.{column}"
        from_schemas: ["my-functions"]

@nyurik
Copy link
Member Author

nyurik commented Dec 25, 2022

I like it. A few things:

  • I think we should call it from_schemas everywhere, and not just inside the tables/functions - for consistency
  • I don't want to introduce any special characters like * just yet if we can avoid it. But that does introduce a problem that if from_schemas is missing in the root, does it mean that both tables and functions will be searched in all schemas, or does it mean that functions will not be searched if tables from_schemas is present?
  • to turn off auto publish, do we just set either auto_publish or tables/functions to false?

@stepankuzmin
Copy link
Collaborator

  • Sure, from_schemas in the root looks good 👍
  • I'd prefer enabling auto publish for all schemas somehow explicitly, but we can drop the "*" symbol. In case when there is no from_schemas in the root, I'd expect Martin to search for tables and functions only in schemas listed in from_schemas under corresponding keys. But I'm unsure what is the best way to enable auto publish for all schemas in such situation without explicitly listing all schemas or introducing some boolean flag for that, e.g., from_all_schemas: true.
  • I'd suggest that auto publish feature is disabled if user omits the auto_publish key. Same way with the tables and functions, e.g., if user omits tables key, we should disable auto publish for tables.

@nyurik
Copy link
Member Author

nyurik commented Dec 27, 2022

The easiest path to solve it IMO is:

  • If there is no auto_publish, auto-discovery of sources is disabled if either tables: or functions: exist (on the postgres object).
  • If auto_publish: <foo> exists, it is enabled. The <foo> can be either an object or a boolean. Only boolean false disables it.
  • If auto_publish is an object, and it has functions: <foo>, only functions will be auto-published, unless <foo> is false. If it is false, and tables does not exist, tables becomes true

Here's the list of all variants that enable auto source discovery for tables. Functions is similar.

auto_publish: true
---
auto_publish: {}
---
auto_publish:
    tables: true
---
auto_publish:
    tables: {}
---
# this is the weirdest one, but makes sense - if functions are explicitly off, than tables are implicitly on
auto_publish:
    functions: false

@nyurik
Copy link
Member Author

nyurik commented Dec 27, 2022

P.S. from_schemas will be additive -- if root postgres object has a list of schemas, and tables also has a list of schemas, they will be compbined.

@stepankuzmin
Copy link
Collaborator

I prefer to avoid implicit auto-publish if it's possible. I'd expect auto_publish to be explicit about exposing tables and functions, i.e., more preventive, so users should be explicit on what they want to expose.

Mixing booleans and objects are kinda weird, but

  1. If we consider "false" as a default value (i.e., if the key is absent, it's false by default)
  2. Fix the id_format as "{schema}.{table}.{column}" and "{schema}.{function}.{column}"

It would allow us to simplify the config a bit more:

# Expose all tables and functions from all schemas
auto_publish: true

# Same as above
auto_publish:
    tables: true
    functions: true
# Disable auto-publish (same as omitting the `auto_publish` key)
auto_publish: false
# Only expose all tables from all schemas
auto_publish:
    tables: true

# Same as above
auto_publish:
    tables: true
    functions: false
# Only expose all functions from all schemas
auto_publish:
    functions: true

auto_publish:
    tables: false
    functions: true
# Expose all tables and functions from the "public" schema and
# additionally, expose tables from the "my-tables" schema
# and functions from the "my-functions" schema
auto_publish:
    schemas: ["public"]
    tables: ["my-tables"]
    functions: ["my-functions"]
# Only expose tables from the "my-tables" schema
# and functions from the "my-functions" schema
auto_publish:
    tables: ["my-tables"]
    functions: ["my-functions"]
# Only expose tables from the "my-tables" schema
auto_publish:
    tables: ["my-tables"]
# Only expose functions from the "my-functions" schema
auto_publish:
    functions: ["my-functions"]

@nyurik
Copy link
Member Author

nyurik commented Dec 29, 2022

I like most of the above, except the list variant:

auto_publish:
    tables: ["my-tables"]
    functions: ["my-functions"]

I think it is non-obvious that tables: and functions: can have a list of strings - they could mean a list of schemas, or they could be a glob wildcard, e.g. public.table_*, etc. So I would only allow them to be either a boolean or an object for now. We can always increase the number of variants later. Bool = enable/disable, object = details:

...
    #[serde(skip_serializing_if = "Option::is_none")]
    pub auto_publish: Option<BoolOrObject<PgCfgPublish>>,
}

#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
pub struct PgCfgPublish {
    pub from_schema: Option<OneOrMany<String>>,
    pub tables: Option<BoolOrObject<PgCfgPublishType>>,
    pub functions: Option<BoolOrObject<PgCfgPublishType>>,
}

#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
pub struct PgCfgPublishType {
    pub from_schema: Option<OneOrMany<String>>,
    pub id_format: Option<String>,
}

Lastly, with the bool, the main issue i have is this scenario -- what does it mean? The best way here is to act on what the user "meant", not on the exact following of the internal logic.

auto_publish:
     from_schema: public
     functions: false

The user here clearly meant "allow all tables from the schema public, but not the functions". If we simply ignore functions: false, and treat it the same as auto_publish: false, we may be "technically" correct, but it would cause a lot of confusion with the users. So better handle it in a true Rust fashion - ease of use :)

nyurik added a commit to nyurik/martin that referenced this issue Dec 29, 2022
* NEW: support for maplibre#512 - pg table/function auto-discovery
  * can filter schemas
  * can use patterns like `{schema}.{table}.{column}` and `{schema}.{function}`
* NEW: add `calc_bounds` bool flag to allow disabling of the bounds computation
* reworked integration tests to use yaml
nyurik added a commit to nyurik/martin that referenced this issue Dec 30, 2022
* NEW: support for maplibre#512 - pg table/function auto-discovery
  * can filter schemas
  * can use patterns like `{schema}.{table}.{column}` and `{schema}.{function}`
* NEW: add `calc_bounds` bool flag to allow disabling of the bounds computation
* reworked integration tests to use yaml
@stepankuzmin
Copy link
Collaborator

stepankuzmin commented Dec 31, 2022

Okay, let's agree on "Bool = enable/disable, object = details" :)

However, I'd still prefer the user to be explicit about exposing tables in the example below. So it'd be safer when the user commented out/removed tables: true, expecting it to disable auto-publishing tables, so there'd be no unintentional publishing, which might lead to security issues.

auto_publish:
     from_schema: public
     functions: false

@nyurik
Copy link
Member Author

nyurik commented Dec 31, 2022

I think that would be extremely confusing - if I gave that specific configuration, my expectation is that tables are enabled - otherwise we have a whole configuration section that has no affect - not a good approach. This case below is so rare and weird, I would not take it into account

auto_publish:
    from_schema: public
    # tables: true
    functions: false

nyurik added a commit that referenced this issue Jan 3, 2023
* NEW: support for #512 - pg table/function auto-discovery
  * can filter schemas
* can use patterns like `{schema}.{table}.{column}` and
`{schema}.{function}`
* NEW: add `disable_bounds` bool flag to allow disabling of the bounds
computation
* reworked integration tests to use yaml
@nyurik nyurik added the relnotes label Jan 3, 2023
@nyurik
Copy link
Member Author

nyurik commented Jan 3, 2023

added in #546

@nyurik nyurik closed this as completed Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
config Relates to Martin configuration enhancement relnotes
Projects
None yet
Development

No branches or pull requests

3 participants