Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specifying an id column with auto_publish #682

Closed
3 tasks
nyurik opened this issue May 26, 2023 Discussed in #680 · 5 comments · Fixed by #790
Closed
3 tasks

Specifying an id column with auto_publish #682

nyurik opened this issue May 26, 2023 Discussed in #680 · 5 comments · Fixed by #790
Labels

Comments

@nyurik
Copy link
Member

nyurik commented May 26, 2023

Auto-discovered PG tables may have some columns that should be exposed as a feature ID rather than just being a regular feature property. I propose to add an id_column param as shown below.

postgres:
  connection_string: ${DATABASE_URL}
  auto_publish:
      tables:
        # Set a custom source ID based on the table name
        id_format: 'table.{schema}.{table}.{column}'
        # A table column to use as the feature ID
        # If a table has no column with this name, `id_column` will not be set for that table.
        # We could even allow a list of strings here - the first found column will be treated as a feature ID.
        id_column: 'id'

Deliverables

  • allow id_column string param to specify the default id column if it exists. If it does not exist, log an info message. If the column exists, but it is NOT an integer, log a warning. If ID column is found, it will NOT be exported as a property, only as the ID.
  • Optional: possibly allow id_column to be a list of strings, so if a table has any of the specified columns, use it.
  • Optional: consider renaming id_format into source_id_format, with backwards compatibility

Discussed in #680

@Binabh
Copy link
Contributor

Binabh commented Jul 30, 2023

@nyurik I am giving this a try. Can you have a quick look at draft PR and confirm if I am in right direction? Also, any hint on adding warnings if column not found/not of type integer would be helpful. I have implemented a generic info currently.

I am learning Rust and this is my first contribution using Rust. So, I would be happy to take feedback on code quality also.
Thank You!

nyurik added a commit that referenced this issue Aug 3, 2023
* rename configuration `auto_publish.tables.id_format` and
`auto_publish.functions.id_format` fields from `id_format` to
`source_id_format` fields. The `id_format` will continue to be supported
(read) from the configuration, but it will be auto-converted to the new
name on save. It is an error to have both in the same config file.
  * The rename was discussed in #682

* internal refactorings: consolidate PG-related utilities, rename a few
vars, move PG errors to their own file.

This is partially made due to #790 (thanks @Binabh!) - and should be
merged before that to make that PR easier.
nyurik added a commit that referenced this issue Aug 13, 2023
Resolves #682 

- [x] Get id_column string from config.yaml and use for id column
- [x] Support for list of strings
- [x] Add info/warnings if column is not there or is of wrong type
- [x] if column for the feature ID is found, remove it from properties
(see inline comment)
- [x] cleanup logging messages
- [x] need more tests to catch other edge cases

---------

Co-authored-by: Yuri Astrakhan <[email protected]>
@saneth
Copy link

saneth commented Jun 20, 2024

Hello,

Why support only integer types as id ?
According to the feature standard they can be strings :

If a Feature has a commonly used identifier, that identifier
SHOULD be included as a member of the Feature object with the name
"id", and the value of this member is either a JSON string or
number.

@nyurik
Copy link
Member Author

nyurik commented Jun 20, 2024

Could you give a link to the source? Thx

@saneth
Copy link

saneth commented Jun 20, 2024

Thanks for answering. Sure ! I have been looking here :

https://geojson.org/
https://datatracker.ietf.org/doc/html/rfc7946#section-3.2

@nyurik
Copy link
Member Author

nyurik commented Jun 20, 2024

ah, yes, geojson has a different definition than MVT specification:

A feature MAY contain an id field. If a feature has an id field, the value of the id SHOULD be unique among the features of the parent layer.

Which is defined here

        message Feature {
                optional uint64 id = 1 [ default = 0 ];  // <--- integer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants