-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added JDBC-integration #451
Conversation
@koperagen @Jolanrensen @ermolenkodev please have a look to be familiar with the code/changes. |
Awesome job! I'll leave some small comments here and there for now and play around with it a bit :) |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Outdated
Show resolved
Hide resolved
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Outdated
Show resolved
Hide resolved
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Outdated
Show resolved
Hide resolved
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/Jdbc.kt
Show resolved
Hide resolved
core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/annotations/ImportDataSchema.kt
Show resolved
Hide resolved
dataframe-jdbc/src/test/kotlin/org/jetbrains/kotlinx/dataframe/io/imdbTest.kt
Show resolved
Hide resolved
@@ -372,6 +375,127 @@ class SchemaGeneratorPluginIntegrationTest : AbstractDataFramePluginIntegrationT | |||
result.task(":build")?.outcome shouldBe TaskOutcome.SUCCESS | |||
} | |||
|
|||
@Test | |||
// TODO: test is broken |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a dataframe version mismatch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what do you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the gradle plugin uses the bootstrap version of dataframe, which does not yet include your added functions (like readSqlTable
etc.). I remember from working on OpenAPI and adding support to the Gradle plugin that I continuously had to publishToMavenLocal because the Gradle plugin cannot use the source files directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@koperagen am I correct?
...frame-gradle-plugin/src/main/kotlin/org/jetbrains/dataframe/gradle/GenerateDataSchemaTask.kt
Outdated
Show resolved
Hide resolved
* @param [tableName] the name of the table to read data from. | ||
* @return the DataFrame containing the data from the SQL table. | ||
*/ | ||
public fun DataFrame.Companion.readSqlTable(dbConfig: DatabaseConfiguration, tableName: String): AnyFrame { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be possible to define the DatabaseConfiguration
in a little DSL as well, instead of a class instance? like:
DataFrame.readSqlTable("tableName") {
url = ""
user = ""
password = ""
}
Would be a bit more Dataframe-like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data class is easier to reuse imo. Like, i tried to experiment with SQLite db: read multiple tables, made multiple readSqlTable
calls. With data class i created a variable. With DSL it would be more tiresome, especially in notebook where there're no refactorings (?) like extract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easier to reuse yes, but not easier to type when calling the function. Maybe we could both support a DatabaseConfiguration and just arguments in the function call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, for now, a lot of functions, we tracked all the ideas here or I created a separate issues for that.
I want to give users ability to taste it with the status experimental to handle more cases
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Show resolved
Hide resolved
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/readJdbc.kt
Show resolved
Hide resolved
* @param [dbConfig] the database configuration to connect to the database, including URL, user, and password. | ||
* @return a list of [AnyFrame] objects representing the non-system tables from the database. | ||
*/ | ||
public fun DataFrame.Companion.readAllTables(dbConfig: DatabaseConfiguration): List<AnyFrame> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readAllSqlTables maybe? This is defined on the entire DataFrame.Companion scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this doesn't return the names of the tables right? may be better to return a map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another suggestion for the future: better notebooks support. We can generate a type safe wrapper for this map. From user perspective it'll be:
val tables = DataFrame.readAllTables(...)
next cell (each table has schema):
tables.employee
tables.tracks
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explored API after proposed changes and found that for meta information better to add separate methods or use methods returning a data schema
When I changed it to map, it overcomplicated a path to the dataframes
But I also thought, that it's great to have a field name for dataframes and one global space of all created dataframes to search between them by name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One global space of all created dataframes would break the functional aspect of Kotlin DataFrame. They are immutable and any modification makes a new instance.
We could have a DataFrame Collection of some kind, but that would not be much different than a Map, or be able to name DataFrames.
Naming a DataFrame is not that far-fetched, since essentially, a DataFrame and ColumnGroup are the same and ColumnGroups can be named too. That said, currently variable names are used to refer to a DataFrame. There would be a mismatch when people would write: val dataFrameName = dataFrameOf(...) named "otherDataFrameName"
.
Maybe in the meantime, you could return a DataFrame containing a "name: ValueColumn" and "df: FrameColumn" column, which, especially in notebooks, would be a clear way for users to access multiple dataframes using a single one.
Thanks for the initiative! We are using Trino: Would it be possible to make the DbType not sealed? |
@PaulWoitaschek thanks for the feedback, I will think about making nor sealed |
For inspiration you can also check how exposed did it: |
dataframe-jdbc/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/db/MariaDb.kt
Show resolved
Hide resolved
|
||
override fun toColumnSchema(tableColumnMetadata: TableColumnMetadata): ColumnSchema { | ||
return when (tableColumnMetadata.sqlType) { | ||
"CHARACTER", "CHAR" -> ColumnSchema.Value(typeOf<String>()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about nullability? If i understand correctly, all rs.get*
methods might return nulls since SQL tables can store them. Thus, if your table has nulls and you use a generated schema, code will throw an NPE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean all the column values with NULL or justa few NULLs in the column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean all the column values with NULL or justa few NULLs in the column?
Enhanced SQL Database Integration via JDBC API
This PR marks a significant advancement in our project by introducing seamless integration with SQL databases through the JDBC API, addressing issues #212 and #124.
Key Features:
This enhancement empowers our framework to read data from SQL databases with remarkable flexibility, offering the following reading regimes:
Each reading regime allows for either utilizing a provided database connection or establishing a connection internally if the database configuration details are available. Furthermore, every method within these regimes offers two variants: with and without result limits.
Schema Information:
In addition to data retrieval, this enhancement provides the capability to obtain schema information for the results. You can retrieve the schema of individual tables, ResultSet objects, SQL queries, or even acquire schemas for all non-system tables within the database.
Supported Databases:
This implementation currently supports the following databases:
Please note that while PostgreSQL is supported, certain data types may not be fully accommodated yet.
To gain a comprehensive understanding of this new functionality, I strongly recommend consulting the documentation.
Note: The plugin support is a work in progress and not yet complete; kindly disregard this aspect for now. Your feedback and contributions are highly valued as we continue refining this feature.