Extracted subchapters (#391)
* Splitting the page and add to the navigation bar

* Updated the pages

* Extracted one subchapter
zaleslaw authored Jun 8, 2023
1 parent 8a36774 commit db85e02
Showing 12 changed files with 344 additions and 238 deletions.
8 changes: 7 additions & 1 deletion docs/StardustDocs/d.tree
@@ -16,7 +16,13 @@
</toc-element>
<toc-element topic="hierarchical.md"/>
<toc-element topic="schemas.md">
<toc-element topic="gradle.md"/>
<toc-element topic="schemasGradle.md"/>
<toc-element topic="schemasJupyter.md"/>
<toc-element topic="schemasInheritance.md"/>
<toc-element topic="schemasCustom.md"/>
<toc-element topic="schemasExternalJupyter.md"/>
<toc-element topic="schemasImportOpenApiGradle.md"/>
<toc-element topic="schemasImportOpenApiJupyter.md"/>
</toc-element>
</toc-element>
<toc-element topic="installation.md"/>
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/collectionsInterop.md
@@ -69,7 +69,7 @@ val df2 = df.add("c") { a + b }

<tip>

To enable extension properties generation you should use [dataframe plugin](gradle.md)
To enable extension properties generation you should use [dataframe plugin](schemasGradle.md)
for Gradle or [Kotlin jupyter kernel](installation.md)

</tip>
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -32,7 +32,7 @@ In notebooks, extension properties are generated for [`DataSchema`](schemas.md)
instance after REPL line execution.
After that [`DataFrame`](DataFrame.md) variable is typed with its own [`DataSchema`](schemas.md), so only valid extension properties corresponding to actual columns in DataFrame will be allowed by the compiler and suggested by completion.

Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](gradle.md#configuration).
Extension properties can be generated in IntelliJ IDEA using the [Kotlin Dataframe Gradle plugin](schemasGradle.md#configuration).

<warning>
In notebooks generated properties won't appear and be updated until the cell has been executed. It often means that you have to introduce new variable frequently to sync extension properties with actual schema
2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/installation.md
@@ -156,7 +156,7 @@ tasks.withType(org.jetbrains.kotlin.gradle.tasks.KotlinCompile).configureEach {
</tabs>

Note that it's better to use the same version for a library and plugin to avoid unpredictable errors.
After plugin configuration you can try it out with [example](gradle.md#annotation-processing).
After plugin configuration you can try it out with [example](schemasGradle.md#annotation-processing).

### Custom configuration

253 changes: 19 additions & 234 deletions docs/StardustDocs/topics/schemas.md
@@ -11,248 +11,33 @@ It ignores order of columns in [`DataFrame`](DataFrame.md), but tracks column hi

In Jupyter environment compile-time [`DataFrame`](DataFrame.md) schema is synchronized with real-time data after every cell execution.

In IDEA projects, you can use the [Gradle plugin](gradle.md#configuration) to extract schema from the dataset
In IDEA projects, you can use the [Gradle plugin](schemasGradle.md#configuration) to extract schema from the dataset
and generate extension properties.

## DataSchema workflow in Jupyter

After execution of cell
## Popular use cases with Data Schemas

<!---FUN createDfNullable-->
Here's a list of the most popular use cases with Data Schemas.

```kotlin
val df = dataFrameOf("name", "age")(
"Alice", 15,
"Bob", null
)
```
* [**Data Schemas in Gradle projects**](schemasGradle.md) <br/>
If you are developing a server application and building it with Gradle.

<!---END-->
* [**DataSchema workflow in Jupyter**](schemasJupyter.md) <br/>
If you prefer Notebooks.

the following actions take place:
* [**Schema inheritance**](schemasInheritance.md) <br/>
It's worth knowing how to reuse Data Schemas generated earlier.

1. Columns in `df` are analyzed to extract data schema
2. Empty interface with [`DataSchema`](schema.md) annotation is generated:
* [**Custom Data Schemas**](schemasCustom.md) <br/>
Sometimes it is necessary to create your own schema.

```kotlin
@DataSchema
interface DataFrameType
```
* [**Use external Data Schemas in Jupyter**](schemasExternalJupyter.md) <br/>
Sometimes it is convenient to extract reusable code from Jupyter Notebook into the Kotlin JVM library.
Schema interfaces should also be extracted if this code uses Custom Data Schemas.

3. Extension properties for this [`DataSchema`](schema.md) are generated:
* [**Import OpenAPI Schemas in Gradle project**](schemasImportOpenApiGradle.md) <br/>
When you need to load data from an endpoint described by an OpenAPI schema.

```kotlin
val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> @JvmName("DataFrameType_age") get() = this["age"] as DataColumn<Int?>
val DataRow<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int?
val ColumnsContainer<DataFrameType>.name: DataColumn<String> @JvmName("DataFrameType_name") get() = this["name"] as DataColumn<String>
val DataRow<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String
```

Every column produces two extension properties:

* Property for `ColumnsContainer<DataFrameType>` returns column
* Property for `DataRow<DataFrameType>` returns cell value

4. `df` variable is typed by schema interface:

```kotlin
val temp = df
```

```kotlin
val df = temp.cast<DataFrameType>()
```

> _Note that the object instance remains the same after casting. See [cast](cast.md)._

To log all these additional code executions, use the cell magic:

```
%trackExecution -all
```

## Schema inheritance

To reduce the amount of generated code, previously generated [`DataSchema`](schema.md) interfaces are reused and only new
properties are introduced.

Let's filter out all `null` values from `age` column and add one more column of type `Boolean`:

```kotlin
val filtered = df.filter { age != null }.add("isAdult") { age!! > 18 }
```

New schema interface for `filtered` variable will be derived from previously generated `DataFrameType`:

```kotlin
@DataSchema
interface DataFrameType1 : DataFrameType
```

Extension properties for data access are generated only for new and overridden members of the `DataFrameType1` interface:

```kotlin
val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int>
val DataRow<DataFrameType1>.age: Int get() = this["age"] as Int
val ColumnsContainer<DataFrameType1>.isAdult: DataColumn<Boolean> get() = this["isAdult"] as DataColumn<Boolean>
val DataRow<DataFrameType1>.isAdult: Boolean get() = this["isAdult"] as Boolean
```

Then variable `filtered` is cast to new interface:

```kotlin
val temp = filtered
```

```kotlin
val filtered = temp.cast<DataFrameType1>()
```

## Custom data schemas

You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent [`DataFrame`](DataFrame.md) with
specific set of columns:

```kotlin
@DataSchema
interface Person {
val name: String
val age: Int
}
```

After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be
generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md):

```kotlin
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
fun DataFrame<Person>.adults() = filter { age > 18 }
```

In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema:

<!---FUN extendedDf-->

```kotlin
val df = dataFrameOf("name", "age", "weight")(
"Merton, Alice", 15, 60.0,
"Marley, Bob", 20, 73.5
)
```

<!---END-->

Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it:

```kotlin
@DataSchema(isOpen = false)
interface DataFrameType : Person

val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
```

Although `df` has an additional column `weight`, the previously defined functions for `DataFrame<Person>` still work for it:

<!---FUN splitNameWorks-->

```kotlin
df.splitName()
```

<!---END-->

```text
firstName lastName age weight
Merton Alice 15 60.000
Marley Bob 20 73.500
```

<!---FUN adultsWorks-->

```kotlin
df.adults()
```

<!---END-->

```text
name age weight
Marley, Bob 20 73.5
```

In a JVM project, you have to [cast](cast.md) the [`DataFrame`](DataFrame.md) explicitly to the target interface:

```kotlin
df.cast<Person>().splitName()
```

## Use external data schemas in Jupyter

Sometimes it is convenient to extract reusable code from a Jupyter notebook into a Kotlin JVM library. If this code
uses [Custom data schemas](#custom-data-schemas), the schema interfaces should also be extracted. To enable support
for them in Jupyter, register them in the
library [integration class](https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md) with the `useSchema`
function:

```kotlin
@DataSchema
interface Person {
val name: String
val age: Int
}

fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 }

@JupyterLibrary
internal class Integration : JupyterIntegration() {

override fun Builder.onLoaded() {
onLoaded {
useSchema<Person>()
}
}
}
```

After loading this library into a Jupyter notebook, schema interfaces for all [`DataFrame`](DataFrame.md) variables that match the `Person`
schema will derive from `Person`:

<!---FUN createDf-->

```kotlin
val df = dataFrameOf("name", "age")(
"Alice", 15,
"Bob", 20
)
```

<!---END-->

Now `df` is assignable to `DataFrame<Person>` and `countAdults` is available:

```kotlin
df.countAdults()
```

## Import Data Schemas, e.g. from OpenAPI, in Jupyter

Similar to [importing OpenAPI data schemas in Gradle projects](gradle.md#openapi-schemas), you can also
do this in Jupyter notebooks. There is only a slight difference in notation:

Import the schema using any path (`String`), `URL`, or `File`:

```kotlin
val PetStore = importDataSchema("https://petstore3.swagger.io/api/v3/openapi.json")
```

From the next cell onwards, you can call, for example:

```kotlin
val df = PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
```

So, very similar indeed!

(Note: The type of `PetStore` will be generated as `PetStoreDataSchema`, but this doesn't affect the way you can use
it.)
* [**Import Data Schemas, e.g. from OpenAPI, in Jupyter**](schemasImportOpenApiJupyter.md) <br/>
Similar to [importing OpenAPI Data Schemas in Gradle projects](schemasImportOpenApiGradle.md),
you can also do this in Jupyter Notebooks.
80 changes: 80 additions & 0 deletions docs/StardustDocs/topics/schemasCustom.md
@@ -0,0 +1,80 @@
[//]: # (title: Custom Data Schemas)

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Schemas-->

You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent [`DataFrame`](DataFrame.md) with
specific set of columns:

```kotlin
@DataSchema
interface Person {
val name: String
val age: Int
}
```

After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be
generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md):

```kotlin
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
fun DataFrame<Person>.adults() = filter { age > 18 }
```

In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema:

<!---FUN extendedDf-->

```kotlin
val df = dataFrameOf("name", "age", "weight")(
"Merton, Alice", 15, 60.0,
"Marley, Bob", 20, 73.5
)
```

<!---END-->

Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it:

```kotlin
@DataSchema(isOpen = false)
interface DataFrameType : Person

val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
```

Although `df` has an additional column `weight`, the previously defined functions for `DataFrame<Person>` still work for it:

<!---FUN splitNameWorks-->

```kotlin
df.splitName()
```

<!---END-->

```text
firstName lastName age weight
Merton Alice 15 60.000
Marley Bob 20 73.500
```

<!---FUN adultsWorks-->

```kotlin
df.adults()
```

<!---END-->

```text
name age weight
Marley, Bob 20 73.5
```

In a JVM project, you have to [cast](cast.md) the [`DataFrame`](DataFrame.md) explicitly to the target interface:

```kotlin
df.cast<Person>().splitName()
```
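
The explicit cast can be tried out end to end with a minimal sketch like the one below. It assumes the `org.jetbrains.kotlinx:dataframe` library is on the classpath but the Gradle plugin is not applied, so the generated `age` extension property is unavailable. For that reason, `adults` is rewritten here with a string column accessor. This variant and the `main` function are illustrative only and are not part of the library:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.rowsCount

@DataSchema
interface Person {
    val name: String
    val age: Int
}

// Illustrative variant of `adults`: it reads the column by name,
// so it compiles without the generated extension properties.
fun DataFrame<Person>.adults() = filter { "age"<Int>() > 18 }

fun main() {
    // The schema of `df` is not known at compile time in a plain JVM project.
    val df = dataFrameOf("name", "age", "weight")(
        "Merton, Alice", 15, 60.0,
        "Marley, Bob", 20, 73.5
    )

    // `cast` only changes the compile-time type; the extra `weight` column is kept.
    val adults = df.cast<Person>().adults()
    println(adults.rowsCount())
}
```

With the Gradle plugin applied, the string accessor can be replaced by the generated `age` property, as in the definitions above.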