
OpenAPI/Swagger JSON type schema support + many small fixes I came across #173

Merged — 90 commits merged into master from new-open-api, Nov 25, 2022

Conversation

@Jolanrensen Jolanrensen (Collaborator) commented Sep 30, 2022

As described in #142

This PR contains a custom OpenAPI -> DataFrame Marker conversion, implemented in both the Gradle and KSP plugins.
There are also a lot of tests, but I'm not yet 100% confident I've caught every edge case (there are many...).
Docs are updated too.
Let me know if there are any major things (minor ones are welcome too) that need changing, or if you have more testing ideas.

Edit:
Since I changed so much, I'll provide a small overview of the changes I made; it might make reviewing easier. Going through the changed files:

  • Linting changes (sorry, but improves readability)
  • Docs for OpenAPI and JsonOptions
  • replaced all occurrences of "splitted" with "split" (correct English)
  • updated kotlinx-datetime and made it an api() dependency for Jupyter etc.
  • ImportDataSchema gained jsonOptions with typeClashTactic and keyValuePaths
  • starting with kdoc in some functions (more will come later)
  • convertTo
    • convertIf in the DSL: far more powerful than convert(KType, KType). It lets you decide via a condition function whether a certain conversion should apply, and it provides fromType and toSchema in a ConverterScope. convert still has priority over convertIf.
    • convertTo can now handle any empty df (0 rows or 0 columns)
    • manual converters now work for any type of target column, not just value columns. User conversions run before the df conversion, making some previously impossible conversions possible.
    • value columns of DataRows can become column groups, as can value columns containing only nulls
    • value columns of DataFrames can become frame columns
    • absent columns can be created iff the target is a nullable value column, a DataRow<Something?> for a group column, or a frame column
  • containsNoData() helper function (created that once but don't use it anymore, might still be useful? opinion needed)
  • enums can implement DataSchemaEnum to control how they are (en/de)coded from/to datasets instead of using just their name
  • code generation can now generate enums and type aliases too
  • if a generated interface contains a reference to another @DataSchema, the type will no longer be wrapped in a DataRow, since that is unnecessary
  • DefaultReadDfMethod now provides actual Marker instead of just a name
  • nullability helpers for FieldType
  • bugfix for MarkersExtractor regarding nullableProperties
  • contains in BaseColumn is now operator fun
  • Bugfix for concat of DataFrames without columns (rows must still count up, else schema conversions will break)
  • String + Number can become Comparable; this was always possible but was order dependent. It now always gives the same result thanks to a bugfix in commonParents() and commonType()
  • ISO_DATE_TIME support in parser from string
  • CodeGenerator.Companion.urlReader is now split into urlDfReader and urlCodeGenReader where the former is used to generate a dataframe from the data and types from that, while the latter directly generates types (for openapi)
  • Similar split for SupportedFormat into SupportedDataFrameFormat and SupportedCodeGenerationFormat
  • createColumn()
    • bugfix for empty iterables always producing column groups; the type is now guessed, or a value column is created
    • bugfix for iterables containing only nulls always producing a frame column; the type is now guessed, or a value column is created
    • can now create a column group from an iterable of DataRows too
  • small fix in printing newlines of data schemas
  • createEmptyColumn and createEmptyDataFrame variants with numberOfRows
  • ColumnSchema, aside from type, now also has contentType, extra type info for group and frame columns (instead of just DataRow<*> or DataFrame<*>). Very useful for the new convertIf method. intersectSchema also tries to merge these contentTypes if possible; extractSchema will make them Any?.
  • toSnakeCase() helper function
  • json reader:
    • distinction in guess.kt between normal json and openapi json
    • typeClashTactics ANY_COLUMNS and ARRAY_AND_VALUE_COLUMNS (original and default)
    • keyValuePaths: objects at the given JsonPaths will be read as DataFrame<KeyValueProperty>
  • openapi:
    • can decode JSON/YAML OpenAPI 3.0 type specifications and produce a CodeWithConverter with all DataSchema interfaces, type aliases, and enums, as well as readJson functions which automatically fill in keyValuePaths and other necessary conversions
    • objects with just additionalProperties become key/value dataframes
    • objects with both properties and additionalProperties ignore additionalProperties
  • importDataSchema() function for Jupyter for SupportedCodeGenerationFormats like openapi
  • DSL for JupyterConfiguration
  • bugfix for overload resolution ambiguity in DataFrame.get(vararg IntRange)
  • examples for openapi and keyvalue, both in jupyter and normal kt files
  • JsonOptions in KSP and Gradle plugin
  • tests
  Edit 2:
  • values in columns are now only "listified" if suggestedType is given as a list, not automatically anymore. Also adds an optional listifyValues argument to guessValueType() and baseType()
  • all other references to baseType are now handled by commonType
  • added an acceptsSample function to SupportedFormat instances to solve JSON/OpenAPI clashes
  • openapi in new module
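One of the bullets above mentions DataSchemaEnum for controlling how enums are encoded and decoded. A minimal hedged sketch of the idea — the import path, the enum, its values, and the property name are illustrative assumptions, not code from this PR:

```kotlin
// Hedged sketch: DataSchemaEnum (introduced in this PR) lets an enum
// control the value it is (en/de)coded with in a dataset, instead of
// relying on its .name. All names below are assumptions for illustration.
import org.jetbrains.kotlinx.dataframe.api.DataSchemaEnum

enum class Status(override val value: String) : DataSchemaEnum {
    IN_PROGRESS("in-progress"), // stored/read as "in-progress", not "IN_PROGRESS"
    DONE("done"),
}
```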

@Jolanrensen (Collaborator, Author):

okay, I looked over additionalProperties... https://swagger.io/docs/specification/data-models/dictionaries/
Let's see how I can add that


if (from.isSubtypeOf(to)) try {
    return (this as DataColumnInternal<*>).changeType(to.withNullability(hasNulls()))
} catch (e: UnsupportedOperationException) { /* */
Collaborator:

do you remember why?

Collaborator (Author):

Yes, I believe this fixed a bug where converting from ColumnGroupImpl or FrameColumnImpl to something would crash because it doesn't have the changeType option. However, convertTo can now support conversions for any type of column, so the exception is caught here to give any user-given conversions a try first, otherwise, a TypeConverterNotFoundException is thrown.

}

fun getConverter(from: KType, to: KType): Converter? {
return converters[from.withNullability(false) to to.withNullability(false)]
override fun convertIf(
Collaborator:

Do you have a representative example for this API? Honestly, I can't easily understand the recursive list thing :(

Collaborator (Author):

For convertIf? Sure! It's not that difficult. convert.with can only match exact types, which is a problem when you cannot define exact types for everything you want to catch. Say you want to write a converter that converts all numbers to strings:

convert<Number>.with { it.toString() }

This wouldn't work, since typeOf<Double>() != typeOf<Number>(). That's where convertIf comes in handy. With this function you can manufacture any kind of converter you like, using the full capabilities of Kotlin reflection:

convertIf({ fromType, toSchema -> 
    fromType.withNullability(false).isSubtypeOf(typeOf<Number>()) &&
        toSchema.type.withNullability(false) == typeOf<String>()
}) {
    it.toString()
}

For OpenApi I needed to provide a converter for DataRow -> Any. However, these converters only work on top-level types, so if a DataRow -> Any conversion needs to happen inside a List<List<List<DataFrame<*>>>>, we need my recursive function, which makes convertIf accept any of these types and convert it accordingly.
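To make the example above concrete, here is a hedged sketch of how such a conditional converter could be plugged into convertTo. The schema, column name, and `df` value are made up for illustration; only the convertIf signature follows the snippet discussed above:

```kotlin
// Sketch, assuming the convertTo/convertIf DSL from this PR.
// `Target`, `amount`, and `df` are hypothetical.
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.full.withNullability
import kotlin.reflect.typeOf

@DataSchema
interface Target {
    val amount: String // the source data holds a Double here
}

val converted = df.convertTo<Target> {
    // applies to any Number-typed source column whose target type is String,
    // regardless of whether the runtime type is Double, Int, etc.
    convertIf({ fromType, toSchema ->
        fromType.withNullability(false).isSubtypeOf(typeOf<Number>()) &&
            toSchema.type.withNullability(false) == typeOf<String>()
    }) { it.toString() }
}
```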

}

/** Parse and read OpenApi specification to [DataSchema] interfaces. */
public fun readOpenApi(
Collaborator:

why not make DataFrame.Companion receiver?

Collaborator (Author):

Because you don't create a DataFrame ;)

import kotlin.reflect.full.isSubtypeOf
import kotlin.reflect.jvm.jvmErasure
import kotlin.reflect.typeOf

Collaborator:

wdyt about splitting this file? also, at some point we started to put formats in different modules

Collaborator (Author):

might definitely be a good idea

Collaborator (Author):

Okay, splitting off OpenAPI works up until the part where I decide whether to read a file as JSON or as OpenAPI. Need to think of something clever or universal to make that work elegantly :)
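The acceptsSample idea from the change list above is one way this JSON-vs-OpenAPI ambiguity could be resolved: peek at a small parsed sample and only claim the OpenAPI reader when the root object carries a version marker. A hedged sketch of such a heuristic — the function name and logic are assumptions, not the PR's actual implementation; it uses kotlinx.serialization:

```kotlin
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonObject

// Heuristic sketch: an OpenAPI 3.0 document is JSON whose root object has
// an "openapi" (or legacy "swagger") version field; plain data JSON
// normally does not. Mirrors the acceptsSample idea, not actual code.
fun looksLikeOpenApi(sample: String): Boolean {
    val root = runCatching { Json.parseToJsonElement(sample) }.getOrNull()
    return root is JsonObject && ("openapi" in root || "swagger" in root)
}
```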

@Jolanrensen Jolanrensen merged commit 0089ed3 into master Nov 25, 2022
@Jolanrensen Jolanrensen deleted the new-open-api branch November 25, 2022 16:26
Labels: bug (Something isn't working), enhancement (New feature or request)
2 participants