Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sql metadata converter #862

Closed
wants to merge 14 commits into from

Conversation

ezwelty
Copy link

@ezwelty ezwelty commented Jun 14, 2021


I extracted the existing frictionless<> sqlalchemy metadata conversion of plugins.sql.SqlStorage (which requires a database connection) into a public API which does not require a database connection. All tests pass, including those for postgresql and mysql.

This draft creates a new SqlConverter class which requires as optional inputs a sqlalchemy.Dialect and a plugins.sql.SqlDialect. Example usage below:

from frictionless import Package, Resource, Schema, Field
from frictionless.plugins.sql import SqlConverter, SqlDialect
from sqlalchemy.dialects import postgresql

package = Package(
  name='package',
  resources=[
    Resource(
      name='resource',
      schema=Schema(
        fields=[
          Field(name='id', type='integer')
        ]
      )
    )
  ]
)
converter = SqlConverter(postgresql.dialect(), SqlDialect(namespace='public'))
converter.resource_to_sql_table(package.resources[0])
# Table('resource', MetaData(), Column('id', Integer(), table=<resource>), schema='public'
converter.package_to_sql_metadata(package)
# MetaData()

Please preserve this line to notify @roll (lead of this repository)

@roll
Copy link
Member

roll commented Jun 15, 2021

@ezwelty
Great! It will take some time for me to get onto it because I want to dive deep but I think it already looks very promising

@roll
Copy link
Member

roll commented Jun 15, 2021

Also, I think we can have similar converter for CKAN using https://github.com/frictionlessdata/frictionless-ckan-mapper

@roll
Copy link
Member

roll commented Jun 16, 2021

@ezwelty
I stayed a little bit longer on this today and I think it looks really. I might update the API slightly after you finish the PR but, in general, I think it's the way to go 👍

@ezwelty
Copy link
Author

ezwelty commented Jun 19, 2021

@roll Glad you like it. But I wonder if I should also extract a new method field_to_sql_column (and its reverse) from resource_to_sql_table? Realistically, the data type conversion will become more complex in the future as support for more features are added.

Playing around, I notice that:

  • A single quote in a field.constraints.pattern will need to be escaped, since it currently leads to invalid SQL.

And I believe the column type conversions would be improved by:

  • Using VARCHAR(n) for a string with maxLength (instead of TEXT + CHECK)
  • Using CHAR(n) for a string with a minLength + maxLength that are equal

Should I start (a) separate issue for these, and (b) address them in separate pull requests or in this current pull request?

@roll
Copy link
Member

roll commented Jun 26, 2021

@ezwelty
Sorry for the slow reply. I would start with package.to_sql_metadata and what it needs and then we can build on top. If I got you correctly. I would not decompose it too much as there is always a catch like relations etc

Feel free to fix it in the PR or create an issue; any would be helpful 👍

@roll
Copy link
Member

roll commented Jun 26, 2021

I'm partially available this and next week so I'm cc @alexkreidler if he has ideas on this one (it will require some diving in)

@roll roll added the feature New functionality label Jan 10, 2022
@ezwelty
Copy link
Author

ezwelty commented Sep 22, 2022

@roll Swinging back around to this old pull request.

Package.to_sql_metadata() would be as simple as the following, but unfortunately as the code is structured, this throws us into circular import hell.

  • plugins.sql.SqlStorage: Depends on plugins.sql.SqlDialect, plugins.sql.SqlConverter, and Package
  • plugins.sql.SqlConverter: Depends on plugins.sql.SqlDialect, Package (for sql_metadata_to_package(), although this could be moved to Package.from_sql_metadata()), and Resource (for sql_table_to_resource(), although this could be moved to Resource.from_sql_table())
  • Package.to_sql_metadata(): Depends on plugins.sql.SqlConverter

I think a fix might look like this:

  • Move SqlDialect and SqlConverter to a separate module from SqlStorage.
  • Drop SqlConverter dependencies on Package and Resource (e.g. by making sql_metadata_to_package() and sql_table_to_resource() return dict descriptors rather than class instances)

Thoughts?

from .plugins.sql import SqlConverter

# ...

    def to_sql_metadata(self, *, sadialect=None, dialect=None):
        """Convert package to SQLAlchemy MetaData

        Parameters:
            sadialect (any): SQLAlchemy dialect.
                For example, `sqlalchemy.dialects.postgresql.dialect()`.
                See https://docs.sqlalchemy.org/en/latest/dialects.
            dialect (dict): SQL dialect. The following attributes are relevant:

                - namespace (str): SQL schema name for all tables.
                    See https://docs.sqlalchemy.org/en/latest/core/metadata.html#specifying-the-schema-name.
                - prefix (str): prefix for all table names

        Returns:
            sqlalchemy.MetaData
        """
        converter = SqlConverter(sadialect=sadialect, dialect=dialect)
        return converter.package_to_sql_metadata(self)

@roll
Copy link
Member

roll commented Oct 10, 2022

Hi @ezwelty!

Since the PR had been introduced we add a new concept called Manager (i.e. CkanManager/GithubManager/etc) which has to highly simplify the code of the old legacy Storage API (to replace the Storage API).

If this work is not urgent I would suggest us to wait the migration from SqlStorage to SqlManager (we will be on soon) and I guess it will resolve this issue too because it will be way easier to re-use the needed code.

WDYT?

@ezwelty
Copy link
Author

ezwelty commented Oct 24, 2022

@roll I think that sounds great and I'm fine with waiting until the migration to Manager is completed.

@roll roll self-assigned this Nov 18, 2022
@roll roll closed this in #1304 Nov 18, 2022
@roll roll removed their assignment Nov 18, 2022
@roll
Copy link
Member

roll commented Nov 18, 2022

Hi @ezwelty,

I'm releasing [email protected] that includes formats.sql.SqlMapper -- can you please try it?

@ezwelty
Copy link
Author

ezwelty commented Nov 22, 2022

@roll I agree that this looks much cleaner. A few comments on the API:

  • Why is it schema <> table and not resource <> table? It seems strange that SqlMapper.read_schema() reads a sa.Table (which has a name) and returns a Schema (that doesn't).
  • Why does SqlMapper.read_field() take a type and SqlMapper.write_field() return a type? It seems these should read/write sa.Column (with logic currently in read/write_schema, or be renamed to *_field_type.
  • SqlMapper() should not require an sa.engine.Engine. All you need is the dialect, which can be as simple as a str:
import sqlalchemy as sa

dialect: str = 'postgresql'
dialect_obj = sa.dialects.registry.load(dialect)
  • Do you plan to transfer over the bug fixes and improvements I made on this branch?

@roll
Copy link
Member

roll commented Nov 23, 2022

Thanks @ezwelty!

I'll be incorporating your feedback soon. Also, sorry I missed the bug fixes from this PR, I'll transfer them too

@roll
Copy link
Member

roll commented Jan 5, 2023

I've created a new issue for further work - #1365

@roll
Copy link
Member

roll commented Feb 17, 2023

Thanks a lot @ezwelty,

I'm releasing [email protected] with almost all the changes and feedback incorporated. Sorry, I didn't manage to merge the PR, but I added a credentials comment.

Although I changed every other mapping after your suggestions, I kept Schema <-> Table mapping because other Frictionless mappers map schema, and sa.Table is just a useful container for basically Schema <-> Columns/Constrains mapping

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Public API for datapackage <> sql metadata conversion
2 participants