Decide how to configure postgres table/function auto-discovery #512

nyurik · 2022-12-12T01:47:17Z

The upcoming #511 allows multiple postgres connections. Also, the recently merged #510 adds support for per-schema filtering when automatically configuring table and function sources. Yet, we don't have a way to configure them yet.

Proposed schema:

# Configure some tables/functions, plus use an extensive discovery system configuration
postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    functions:
        src1:
            schema: public
            function: my_function
            maxzoom: 10
    tables:
        src2:
            schema: public
            table: my_table
    discover:
        # Only search the given PostgreSQL schemas, for both functions and tables.
        # Set them up with maxzoom 10.
        # TBD: Naming: discover, auto, configure, auto-config, ...?
        functions: true
        tables: true
        schemas: ["public", "my-schema"]
        maxzoom: 10
        # How should the new table and function sources be named
        table_naming: {schema}.{table}.{column}
        function:naming: {function}

This is how currently users can automatically configure all available tables and functions in all schemas:

postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    # no additional configuration for tables or funcs

With this config, users enable just one table as a src1 source, and disable auto-discovery

postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    tables:
        src1:
            schema: public
            table: my_table
            srid: 4326
            geometry_column: geom
    functions:
        src2:
            schema: public
            function: my_function

Need feedback

Beyond the above, we may want to configure the following scenarios:

Neither tables nor functions are configured (first example above), but the users want to auto-discover just the functions, but not tables. Or just the tables.
Same, but autodiscovery should be done only in a given list of schemas (per martin config expose only specific schema #47 request)
Allow pre-configure some tables and additionally enable autodiscovery -- not certain if this is needed or not?

Other configuration ideas

See the primary idea at the top. Most of these other ideas apply to both tables and functions, but for brevity only one is mentioned.

# Configure some functions, plus expose all other functions too
postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    functions:
        src1:
            schema: public
            function: my_function
            maxzoom: 10
    discover_functions: true

# Configure some functions, plus expose all other functions that are listed in the given schemas
postgres:
    connection: 'postgres://postgres@localhost:5432/db'
    functions:
        src1:
            schema: public
            function: my_function
            maxzoom: 10
    discover_functions:
        - public
        - some-schema

P.S. I also renamed connection_string into connection (will support older format for backward compat too)

The text was updated successfully, but these errors were encountered:

stepankuzmin · 2022-12-25T08:52:30Z

We can use this format to have more granular control over public schemas, tables, and functions, and keep it less verbose

# Expose schemas, tables, and functions automatically
auto_publish:
    # Expose all tables and functions from the following schemas.
    # Use "*" to expose all available sources.
    # Extends the `from_schemas` lists in the `tables` and `functions` below
    schemas: ["public", "my_schema"]
    tables:
        # Format string to build table source id.
        id_format: "{schema}.{function}.{column}"
        # Expose tables from the schemas list.
        # Use "*" to expose all available tables.
        from_schemas: ["public", "my_tables_schema"]
    functions:
        # Format string to build function source id
        id_format: "{schema}.{function}.{column}"
        # Expose functions from the schemas list.
        # Use "*" to expose all available functions.
        from_schemas: ["public", "my_functions_schema"]

Case 1

Expose all tables and functions from all schemas

auto_publish:
    schemas: "*"
    tables:
        id_format: "{schema}.{table}.{column}"
    functions:
        id_format: "{schema}.{function}.{column}"

This is the same as

auto_publish:
    tables:
        id_format: "{schema}.{table}.{column}"
        from_schemas: "*"
    functions:
        id_format: "{schema}.{function}.{column}"
        from_schemas: "*"

Case 2

Expose tables from all schemas

auto_publish:
    tables:
        id_format: "{schema}.{table}.{column}"
        from_schemas: "*"

Case 3

Expose functions from the "my-functions" schema

auto_publish:
    functions:
        id_format: "{schema}.{function}.{column}"
        from_schemas: ["my-functions"]

Case 4

Expose all tables and functions from the "public" schema, additionally expose all tables from the "my-tables" schema and all functions from the "my-functions" schema

auto_publish:
    schemas: ["public"]
    tables:
        id_format: "{schema}.{table}.{column}"
        from_schemas: ["my-tables"]
    functions:
        id_format: "{schema}.{function}.{column}"
        from_schemas: ["my-functions"]

nyurik · 2022-12-25T10:03:58Z

I like it. A few things:

I think we should call it from_schemas everywhere, and not just inside the tables/functions - for consistency
I don't want to introduce any special characters like * just yet if we can avoid it. But that does introduce a problem that if from_schemas is missing in the root, does it mean that both tables and functions will be searched in all schemas, or does it mean that functions will not be searched if tables from_schemas is present?
to turn off auto publish, do we just set either auto_publish or tables/functions to false?

stepankuzmin · 2022-12-27T12:18:27Z

Sure, from_schemas in the root looks good 👍
I'd prefer enabling auto publish for all schemas somehow explicitly, but we can drop the "*" symbol. In case when there is no from_schemas in the root, I'd expect Martin to search for tables and functions only in schemas listed in from_schemas under corresponding keys. But I'm unsure what is the best way to enable auto publish for all schemas in such situation without explicitly listing all schemas or introducing some boolean flag for that, e.g., from_all_schemas: true.
I'd suggest that auto publish feature is disabled if user omits the auto_publish key. Same way with the tables and functions, e.g., if user omits tables key, we should disable auto publish for tables.

nyurik · 2022-12-27T20:05:43Z

The easiest path to solve it IMO is:

If there is no auto_publish, auto-discovery of sources is disabled if either tables: or functions: exist (on the postgres object).
If auto_publish: <foo> exists, it is enabled. The <foo> can be either an object or a boolean. Only boolean false disables it.
If auto_publish is an object, and it has functions: <foo>, only functions will be auto-published, unless <foo> is false. If it is false, and tables does not exist, tables becomes true

Here's the list of all variants that enable auto source discovery for tables. Functions is similar.

auto_publish: true
---
auto_publish: {}
---
auto_publish:
    tables: true
---
auto_publish:
    tables: {}
---
# this is the weirdest one, but makes sense - if functions are explicitly off, than tables are implicitly on
auto_publish:
    functions: false

nyurik · 2022-12-27T20:13:34Z

P.S. from_schemas will be additive -- if root postgres object has a list of schemas, and tables also has a list of schemas, they will be compbined.

stepankuzmin · 2022-12-28T12:57:10Z

I prefer to avoid implicit auto-publish if it's possible. I'd expect auto_publish to be explicit about exposing tables and functions, i.e., more preventive, so users should be explicit on what they want to expose.

Mixing booleans and objects are kinda weird, but

If we consider "false" as a default value (i.e., if the key is absent, it's false by default)
Fix the id_format as "{schema}.{table}.{column}" and "{schema}.{function}.{column}"

It would allow us to simplify the config a bit more:

# Expose all tables and functions from all schemas
auto_publish: true

# Same as above
auto_publish:
    tables: true
    functions: true

# Disable auto-publish (same as omitting the `auto_publish` key)
auto_publish: false

# Only expose all tables from all schemas
auto_publish:
    tables: true

# Same as above
auto_publish:
    tables: true
    functions: false

# Only expose all functions from all schemas
auto_publish:
    functions: true

auto_publish:
    tables: false
    functions: true

# Expose all tables and functions from the "public" schema and
# additionally, expose tables from the "my-tables" schema
# and functions from the "my-functions" schema
auto_publish:
    schemas: ["public"]
    tables: ["my-tables"]
    functions: ["my-functions"]

# Only expose tables from the "my-tables" schema
# and functions from the "my-functions" schema
auto_publish:
    tables: ["my-tables"]
    functions: ["my-functions"]

# Only expose tables from the "my-tables" schema
auto_publish:
    tables: ["my-tables"]

# Only expose functions from the "my-functions" schema
auto_publish:
    functions: ["my-functions"]

nyurik · 2022-12-29T02:52:34Z

I like most of the above, except the list variant:

auto_publish:
    tables: ["my-tables"]
    functions: ["my-functions"]

I think it is non-obvious that tables: and functions: can have a list of strings - they could mean a list of schemas, or they could be a glob wildcard, e.g. public.table_*, etc. So I would only allow them to be either a boolean or an object for now. We can always increase the number of variants later. Bool = enable/disable, object = details:

...
    #[serde(skip_serializing_if = "Option::is_none")]
    pub auto_publish: Option<BoolOrObject<PgCfgPublish>>,
}

#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
pub struct PgCfgPublish {
    pub from_schema: Option<OneOrMany<String>>,
    pub tables: Option<BoolOrObject<PgCfgPublishType>>,
    pub functions: Option<BoolOrObject<PgCfgPublishType>>,
}

#[derive(Clone, Debug, Default, PartialEq, Serialize, Deserialize)]
pub struct PgCfgPublishType {
    pub from_schema: Option<OneOrMany<String>>,
    pub id_format: Option<String>,
}

Lastly, with the bool, the main issue i have is this scenario -- what does it mean? The best way here is to act on what the user "meant", not on the exact following of the internal logic.

auto_publish:
     from_schema: public
     functions: false

The user here clearly meant "allow all tables from the schema public, but not the functions". If we simply ignore functions: false, and treat it the same as auto_publish: false, we may be "technically" correct, but it would cause a lot of confusion with the users. So better handle it in a true Rust fashion - ease of use :)

* NEW: support for maplibre#512 - pg table/function auto-discovery * can filter schemas * can use patterns like `{schema}.{table}.{column}` and `{schema}.{function}` * NEW: add `calc_bounds` bool flag to allow disabling of the bounds computation * reworked integration tests to use yaml

stepankuzmin · 2022-12-31T11:51:34Z

Okay, let's agree on "Bool = enable/disable, object = details" :)

However, I'd still prefer the user to be explicit about exposing tables in the example below. So it'd be safer when the user commented out/removed tables: true, expecting it to disable auto-publishing tables, so there'd be no unintentional publishing, which might lead to security issues.

auto_publish:
     from_schema: public
     functions: false

nyurik · 2022-12-31T17:59:58Z

I think that would be extremely confusing - if I gave that specific configuration, my expectation is that tables are enabled - otherwise we have a whole configuration section that has no affect - not a good approach. This case below is so rare and weird, I would not take it into account

auto_publish:
    from_schema: public
    # tables: true
    functions: false

* NEW: support for #512 - pg table/function auto-discovery * can filter schemas * can use patterns like `{schema}.{table}.{column}` and `{schema}.{function}` * NEW: add `disable_bounds` bool flag to allow disabling of the bounds computation * reworked integration tests to use yaml

nyurik · 2023-01-03T16:15:34Z

added in #546

nyurik assigned stepankuzmin and nyurik Dec 12, 2022

This was referenced Dec 12, 2022

martin config expose only specific schema #47

Closed

Decide how to configure file-based sources like mbtiles and pmtiles #513

Closed

This comment was marked as off-topic.

Sign in to view

nyurik added enhancement config Relates to Martin configuration labels Dec 13, 2022

nyurik mentioned this issue Dec 23, 2022

Why don't you take the front part？？ #538

Closed

nyurik mentioned this issue Dec 29, 2022

Implement postgres auto-publish #546

Merged

nyurik added the relnotes label Jan 3, 2023

nyurik closed this as completed Jan 3, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decide how to configure postgres table/function auto-discovery #512

Decide how to configure postgres table/function auto-discovery #512

nyurik commented Dec 12, 2022 •

edited

Loading

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

stepankuzmin commented Dec 25, 2022

nyurik commented Dec 25, 2022

stepankuzmin commented Dec 27, 2022

nyurik commented Dec 27, 2022 •

edited

Loading

nyurik commented Dec 27, 2022

stepankuzmin commented Dec 28, 2022

nyurik commented Dec 29, 2022 •

edited

Loading

stepankuzmin commented Dec 31, 2022 •

edited

Loading

nyurik commented Dec 31, 2022

nyurik commented Jan 3, 2023

Decide how to configure postgres table/function auto-discovery #512

Decide how to configure postgres table/function auto-discovery #512

Comments

nyurik commented Dec 12, 2022 • edited Loading

Need feedback

Other configuration ideas

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

stepankuzmin commented Dec 25, 2022

Case 1

Case 2

Case 3

Case 4

nyurik commented Dec 25, 2022

stepankuzmin commented Dec 27, 2022

nyurik commented Dec 27, 2022 • edited Loading

nyurik commented Dec 27, 2022

stepankuzmin commented Dec 28, 2022

nyurik commented Dec 29, 2022 • edited Loading

stepankuzmin commented Dec 31, 2022 • edited Loading

nyurik commented Dec 31, 2022

nyurik commented Jan 3, 2023

nyurik commented Dec 12, 2022 •

edited

Loading

nyurik commented Dec 27, 2022 •

edited

Loading

nyurik commented Dec 29, 2022 •

edited

Loading

stepankuzmin commented Dec 31, 2022 •

edited

Loading