v0.5.0

Switched to Apache 2 license
Minor changes

v0.4.0

Note: this is not an exhaustive list of the changes, only the most important ones. See the Towards Data Science article for more details: https://towardsdatascience.com/data-transformations-in-scala-with-gallia-version-0-4-0-is-out-f0b8df3e48f3

Metaschema: see MetaSchema.scala and example usage in MetaSchemaTest.scala
Union Types: limited support
- See union_types.md
- See code: Info.scala
- See usage: UnionTypeTest.scala (shows typical usage, hints, and fuseToUnion/fissionFromUnion functionalities)
Additional Basic Types:
- Enums: see EnumTest.scala
- Binary data: see UncommonTypesTest.scala
- Temporal data (LocalDateTime, ...): see TimeTest.scala

I/O:

Input:
- New data-class-based construct, e.g.:
```
 import gallia._

 case class Foo(s: String, i: Int)
 aobjFromCaseClass(Foo("hello", 3)).[...]
```
- Homogenization of the "tax mechanism" for JSON/Table streaming (e.g. automatic Int conversion); see GsonToGalliaData.scala
Output:
- Improved support for "naked" value(s) (gallia.heads.HeadV) output, see HeadVTest.scala
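To make the "automatic Int conversion" part of the tax mechanism concrete, here is a minimal plain-Scala sketch. It assumes, hypothetically, that the JSON parser (for instance Gson with its default settings) hands back every number as a Double and that a per-field schema says which fields are Int; `NumberTax`, `toInt`, and `conform` are made-up names, not Gallia's GsonToGalliaData code.

```scala
// Hypothetical sketch only; not Gallia's GsonToGalliaData implementation.
// Assumption: the parser yields every JSON number as a Double.
object NumberTax {

  /** Narrows a parsed JSON number (a Double) back to an Int.
    * Fails if the value is NaN, fractional, or outside the Int range. */
  def toInt(value: Double): Int = {
    require(value == math.floor(value), s"not a whole number: $value")
    require(value >= Int.MinValue && value <= Int.MaxValue, s"out of Int range: $value")
    value.toInt
  }

  /** Converts every Double whose field is declared "Int" in the (hypothetical)
    * schema; all other fields pass through untouched. */
  def conform(record: Map[String, Any], schema: Map[String, String]): Map[String, Any] =
    record.transform {
      case (key, d: Double) if schema.get(key).contains("Int") => toInt(d)
      case (_, other)                                          => other
    }
}
```

For example, `NumberTax.conform(Map("i" -> 3.0, "d" -> 2.5), Map("i" -> "Int", "d" -> "Double"))` turns `3.0` into the Int `3` and leaves `2.5` as a Double. Failing on fractional input, rather than truncating, is a design choice of this sketch.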

Both:

Apache Avro: Added support to read/write Avro files; usage:

```
 // libraryDependencies += "io.github.galliaproject" %% "gallia-avro" % "0.4.0"
 import gallia.avro._
 "./episodes.avro".streamAvro().[...]
 [...].writeAvro("/tmp/foo.avro")
```

Apache Parquet: Added support to read/write Parquet files (via Avro); usage:

```
 // libraryDependencies += "io.github.galliaproject" %% "gallia-parquet" % "0.4.0"
 import gallia.parquet._
 "./episodes.parquet".streamParquet().[...]
 [...].writeParquet("/tmp/foo.parq")
```

Integration:
- Python: Experimental integration with Python (and soon R), via the excellent ScalaPy by Shadaj Laddad
  - Pandas: see ScalaPyPandasTest.scala
  - Seaborn (for visualization): see GalliaVizTest.scala#L37
- Java: See the GalliaJava.scala gist (an early attempt); progress is mostly hindered by a Scala bug that would make using Gallia from Java prohibitively verbose, at least in its current state
Transformations:
- transformDataClass: see DataClassesTest.scala
- cotransformViaDataClass: see DataClassesTest.scala
- custom .aggregateBy: see AggregatingTest.scala
- custom .reduce: see ReducingTest.scala
- .removeIf/.setDefault "if value for": see RemoveIfTest.scala
- .dropWhile/.takeWhile: see FilterByTest.scala
- .zipSameSize: see MergingTest.scala
- .unpivotOne: see UnpivotTest.scala
- .asNewKeys[SomeEnum]: see DeserializeTest.scala
- filterBy(X).isPresent/isMissing: see FilterByTest.scala
- filterOutEmptyLines: see FilterByTest.scala
- .custom mechanism improvements: see CustomTest.scala
- Renamed operations (behavior unchanged):
  - zen -> thn: see ForXTest.scala
  - untuplify -> deserialize: see DeserializeTest.scala
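One plausible reading of .zipSameSize is a positional zip that refuses inputs of different lengths. The sketch below illustrates that reading on plain Scala collections of map-based entities; `ZipSameSize`, `Entity`, and the right-hand-wins merge rule are assumptions for illustration, and MergingTest.scala remains the reference for the real operation.

```scala
// Hypothetical sketch of zip-same-size semantics on plain Scala collections;
// not Gallia's implementation (see MergingTest.scala for the real operation).
object ZipSameSize {
  type Entity = Map[String, Any]

  /** Pairs entities positionally and merges each pair into one entity.
    * Unlike the standard `zip`, which silently truncates to the shorter side,
    * this fails when the two sides differ in length. On a key clash the
    * right-hand value wins (an arbitrary choice made for this sketch). */
  def zipSameSize(left: Seq[Entity], right: Seq[Entity]): Seq[Entity] = {
    require(left.size == right.size, s"size mismatch: ${left.size} vs ${right.size}")
    left.zip(right).map { case (l, r) => l ++ r }
  }
}
```

The explicit size check is the point of the "same size" variant: a mismatch usually signals a data problem that a truncating zip would hide.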
Optimizations:
- Experimental code for memory optimization of dense entities (basically class Obg9(size: Int, data: Array[Any])), along with some example operations: see code at Obg9.scala and usage at Obg9Test.scala
- "Spilling" mechanism optimizations: see GalliaSpilling.scala
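The dense-entity idea (roughly `class Obg9(size: Int, data: Array[Any])`) can be illustrated with a small sketch: field names are stored once in a shared schema, and each entity keeps only an array of values, so many entities with the same shape do not each carry their own keys. `DenseSchema` and `DenseObj` are made-up names; the actual code is in Obg9.scala.

```scala
// Hypothetical illustration of the dense-entity idea: field names live once in
// a shared schema and each entity keeps only an array of values.
// DenseSchema and DenseObj are made-up names; the real code is in Obg9.scala.
final class DenseSchema(val keys: Array[String]) {
  private val index: Map[String, Int] = keys.zipWithIndex.toMap

  /** Position of a field in every entity sharing this schema. */
  def indexOf(key: String): Int =
    index.getOrElse(key, throw new NoSuchElementException(s"no such field: $key"))
}

final class DenseObj(val schema: DenseSchema, val data: Array[Any]) {
  require(data.length == schema.keys.length, "expected one value per schema key")

  def size: Int = data.length

  def get(key: String): Any = data(schema.indexOf(key))

  /** Copy with one value replaced; the schema is shared, not copied. */
  def updated(key: String, value: Any): DenseObj = {
    val copy = data.clone()
    copy(schema.indexOf(key)) = value
    new DenseObj(schema, copy)
  }

  /** Conversion to the map-based representation, e.g. for printing. */
  def toMap: Map[String, Any] = schema.keys.zip(data).toMap
}
```

The trade-off in this sketch is that field lookup goes through the schema's index, in exchange for storing keys once per schema rather than once per entity.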
Execution DAG improvements:
- For Iterator mode (gallia.streamer.IteratorStreamer):
  - Forking: see IteratorStreamer.scala#fork()
  - Data regeneration (via closure): see IteratorStreamer.scala#from()
- Added support for more combinations: HeadV -> HeadO (used for instance by the likes of ReducingTest.scala#dressUp()), HeadV -> HeadZ, and (HeadV, HeadV) -> HeadV
- More tests for edge cases: see e.g. GraphTest.scala#diamond()
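Iterator mode needs forking and regeneration because a Scala Iterator can be consumed only once, yet a DAG may have several branches reading the same upstream data. The sketch below contrasts the two approaches in plain Scala. `IteratorFork` and `Regenerable` are hypothetical names, not Gallia's IteratorStreamer; only `Iterator.duplicate` is a real standard-library method.

```scala
// Hypothetical sketch, not Gallia's IteratorStreamer: two ways to let several
// downstream branches read the same single-pass Scala Iterator.
object IteratorFork {

  /** Forking: the standard library's Iterator.duplicate. Elements read by one
    * branch but not yet by the other are buffered in memory. */
  def fork[A](it: Iterator[A]): (Iterator[A], Iterator[A]) = it.duplicate

  /** Regeneration: keep a closure that rebuilds the iterator from its source,
    * so each consumer gets a fresh pass and nothing is buffered. */
  final class Regenerable[A](generate: () => Iterator[A]) {
    def iterator(): Iterator[A] = generate()

    /** Lazily composes a transformation; it runs again on every regeneration. */
    def map[B](f: A => B): Regenerable[B] = new Regenerable[B](() => generate().map(f))
  }
}
```

Forking trades memory for a single read of the source, while regeneration trades repeated reads (re-opening a file, say) for constant memory; which one fits depends on how far apart the branches consume the data.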

v0.3.X

Not tracked here, see commits

v0.2.X

Not tracked here, see commits
