Database Connection Library #1351

Closed
6 of 8 tasks
sylwiabr opened this issue Dec 13, 2020 · 1 comment · Fixed by #1565
@sylwiabr
Member

sylwiabr commented Dec 13, 2020

Summary

In industry it is very common to store data for later analysis in various kinds of databases. It is therefore important for Enso to be able to connect to databases as part of workflows, using them as sources of data that is then processed and visualised.

Value

Enso will be able to connect to a variety of commonly-used databases.

Specification

  • Create a new library included as part of the standard library set called something akin to Database.
  • Determine how to build an API that is consistent with the Enso dataframes API. It should allow use of a database seamlessly as the backing store (no need to load the contents into memory where possible) for a dataframe.
  • Determine whether the library should build SQL queries itself, or whether it should be backed by JDBC. As part of this, determine whether the PostgreSQL JDBC driver generates SQL code as an output.
  • The API should abstract the operations on the database as a DSL on the Enso node.
  • The API should be designed in a fluent fashion, such that a query is built through use of an in-Enso DSL, and then executed on demand (e.g. when a visualisation is shown or a result requested). The query should be executed lazily instead of eagerly.
  • The API should be native to Enso, and not expose implementation details that have an impedance mismatch with Enso's semantics.
  • The library should be designed in an extensible manner, allowing it to function with multiple databases. Initially we only intend to support PostgreSQL, but the more easily it can be extended to other databases (e.g. Snowflake, MySQL, SQLite), the better.
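
The fluent, lazily executed API asked for in the list above could look roughly like the following. This is a minimal Python sketch, not Enso code and not the library's actual design: `LazyQuery`, `select`, `where`, `to_sql`, and `fetch` are hypothetical names, and Python's built-in `sqlite3` module stands in for a PostgreSQL backend purely so the example is runnable. Each fluent call only returns a new query description; SQL reaches the database only when `fetch` is called.

```python
import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical immutable query description; nothing runs until fetch()."""

    table: str
    columns: tuple = ("*",)
    filters: tuple = ()  # SQL condition fragments using '?' placeholders
    params: tuple = ()   # values bound to those placeholders, in order

    def select(self, *columns):
        # Returns a new description; the database is not touched.
        return replace(self, columns=columns)

    def where(self, condition, *params):
        # Conditions accumulate and are later joined with AND.
        return replace(
            self,
            filters=self.filters + (condition,),
            params=self.params + params,
        )

    def to_sql(self):
        # Assumption: table and column names are trusted identifiers.
        # A real library would have to quote or validate them.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({f})" for f in self.filters)
        return sql

    def fetch(self, connection):
        # The only place where the query is actually executed.
        return connection.execute(self.to_sql(), self.params).fetchall()


# Usage: build the query first, execute it only when a result is requested.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 25), ("north", 40)],
)

query = LazyQuery("sales").select("region", "amount").where("amount > ?", 20)
print(query.to_sql())  # SELECT region, amount FROM sales WHERE (amount > ?)
print(sorted(query.fetch(conn)))  # [('north', 40), ('south', 25)]
```

Keeping the query description immutable means intermediate queries can be shared and extended without affecting each other, which suits a node-based workflow where one node's output feeds several downstream nodes.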

Acceptance Criteria & Test Cases

@iamrecursion iamrecursion changed the title Library for creating and processing databases Database Connection Library Dec 14, 2020
@iamrecursion iamrecursion modified the milestones: Sprint 2021-01-04, Sprint 2021-01-18 Jan 6, 2021
@iamrecursion iamrecursion modified the milestones: Sprint 2021-01-04, Sprint 2021-01-18 Jan 18, 2021
@iamrecursion iamrecursion modified the milestones: Sprint 2021-01-18, Sprint 2021-02-01 Jan 29, 2021
@iamrecursion iamrecursion modified the milestones: Sprint 2021-02-01, Sprint 2020-02-15 Feb 12, 2021
@iamrecursion iamrecursion removed this from the Sprint 2020-02-15 milestone Feb 26, 2021
@radeusgd
Member

Writing here to not forget to discuss this:

I've noticed that we are currently not handling an edge case for join in both the Table and Database libraries:

We use suffixes to disambiguate duplicate column names when joining two tables. But what if both tables contain a column named A, our suffixes are _left and _right, and one of the tables already contains A_left too? I think we will get an inconsistent state, with the table containing two columns with equal names.

@kustosz how do you think we should handle this? The most straightforward approach for me is to just detect these situations and issue an error asking the user to rename the columns. Any other semi-automatic solution is, in my opinion, likely to confuse users.
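
To make the edge case concrete, here is a minimal Python sketch of the "detect and issue an error" option. It is an illustration only, not the Table or Database library code: the function name `joined_column_names` and its parameters are hypothetical, and it assumes that only columns present in both tables receive a suffix.

```python
def joined_column_names(left, right, left_suffix="_left", right_suffix="_right"):
    """Compute the column names of a join, failing if any would clash.

    Assumption: only names that occur in both inputs get a suffix.
    """
    shared = set(left) & set(right)
    result = [name + left_suffix if name in shared else name for name in left]
    result += [name + right_suffix if name in shared else name for name in right]

    # Collect every name that appears more than once after suffixing.
    seen = set()
    duplicates = []
    for name in result:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)

    if duplicates:
        raise ValueError(
            "Join would produce duplicate column names "
            f"{duplicates}; rename the columns or choose different suffixes."
        )
    return result


# The ordinary case: the shared column A is disambiguated.
print(joined_column_names(["A", "B"], ["A", "C"]))
# ['A_left', 'B', 'A_right', 'C']

# The edge case: the left table already has A_left, so suffixing A clashes.
try:
    joined_column_names(["A", "A_left"], ["A"])
except ValueError as error:
    print(error)
```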
