- Switched to Apache 2 license
- Minor changes
Note: this is not an exhaustive list of the changes, only the most important ones. See the Towards Data Science article for more details: https://towardsdatascience.com/data-transformations-in-scala-with-gallia-version-0-4-0-is-out-f0b8df3e48f3
- Metaschema: see MetaSchema.scala and example usage in MetaSchemaTest.scala
- Union Types: limited support
- See union_types.md
- See code: Info.scala
- See usage: UnionTypeTest.scala (shows typical usage, hints, and `fuseToUnion`/`fissionFromUnion` functionalities)
- Additional Basic Types:
- Enums: see EnumTest.scala
- Binary data: see UncommonTypesTest.scala
- Temporal data (LocalDateTime, ...): see TimeTest.scala
- I/O:
- Input:
- New data-class based construct, e.g.:
`case class Foo(s: String, i: Int)`
`aobjFromCaseClass(Foo("hello", 3)).[...]`
- Homogenization of the "tax mechanism" for JSON/Table streaming (e.g. automatic Int conversion), see GsonToGalliaData.scala
- Output:
- Improved support for "naked" value(s) (`gallia.heads.HeadV`) output, see HeadVTest.scala
- Both:
- Apache Avro: added support to read/write Avro files; usage:
`// libraryDependencies += "io.github.galliaproject" %% "gallia-avro" % "0.4.0"`
`import gallia.avro._`
`"./episodes.avro".streamAvro().[...]`
`[...].writeAvro("/tmp/foo.avro")`
- Apache Parquet: added support to read/write Parquet files (via Avro); usage:
`// libraryDependencies += "io.github.galliaproject" %% "gallia-parquet" % "0.4.0"`
`import gallia.parquet._`
`"./episodes.avro".streamParquet().[...]`
`[...].writeParquet("/tmp/foo.parq")`
- Integration:
- Python: Experimentation with Python integration (and soon R), via the excellent ScalaPy by Shadaj Laddad
- Pandas: see ScalaPyPandasTest.scala
- Seaborn (for visualization): see GalliaVizTest.scala#L37
- Java: see the GalliaJava.scala gist (early attempt). It is mostly hindered by this Scala bug, which would make it prohibitively verbose to use Gallia from Java, at least as it stands
- Transformations:
- `transformDataClass`: see DataClassesTest.scala
- `cotransformViaDataClass`: see DataClassesTest.scala
- custom `.aggregateBy`: see AggregatingTest.scala
- custom `.reduce`: see ReducingTest.scala
- `.removeIf`/`.setDefault` "if value for": see RemoveIfTest.scala
- `.dropWhile`/`.takeWhile`: see FilterByTest.scala
- `.zipSameSize`: see MergingTest.scala
- `.unpivotOne`: see UnpivotTest.scala
- `.asNewKeys[SomeEnum]`: see DeserializeTest.scala
- `filterBy(X).isPresent/isMissing`: see FilterByTest.scala
- `filterOutEmptyLines`: see FilterByTest.scala
- `.custom` mechanism improvements: see CustomTest.scala
- Renamed operations (unchanged behaviors):
- `zen` -> `thn`: see ForXTest.scala
- `untuplify` -> `deserialize`: see DeserializeTest.scala
- Optimizations:
- Experimental code for memory optimization of dense entities (basically `class Obg9(size: Int, data: Array[Any])`), along with some example operations: see code at Obg9.scala and usage at Obg9Test.scala
- "Spilling" mechanism optimizations: see GalliaSpilling.scala
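
To illustrate the general idea behind a dense entity shaped like `(size: Int, data: Array[Any])`, here is a minimal sketch that keeps values in a positional array and the key names in one shared schema. It is not the actual Obg9 code: `DenseSchema`, `DenseEntity`, `get` and `updated` are hypothetical names chosen for illustration, and only the Scala standard library is used.

```scala
// Hypothetical sketch, NOT Gallia's actual Obg9 implementation.
// A dense entity stores only values; key names live in one shared schema,
// so per-entity memory is a single array rather than a key/value map.
final case class DenseSchema(keys: Vector[String]) {
  private val index: Map[String, Int] = keys.zipWithIndex.toMap

  // position of a key in every entity's value array
  def indexOf(key: String): Int = index(key)
}

final class DenseEntity(val size: Int, val data: Array[Any]) {
  require(data.length == size, "expected exactly one slot per key")

  // positional lookup through the shared schema
  def get(schema: DenseSchema, key: String): Any =
    data(schema.indexOf(key))

  // non-destructive update: copies the value array, reuses the schema
  def updated(schema: DenseSchema, key: String, value: Any): DenseEntity = {
    val copy = data.clone()
    copy(schema.indexOf(key)) = value
    new DenseEntity(size, copy)
  }
}

object DenseEntityDemo {
  val schema = DenseSchema(Vector("s", "i"))
  val row1   = new DenseEntity(2, Array[Any]("hello", 3))
  val row2   = row1.updated(schema, "i", 4) // row1 is left untouched
}
```

The design choice sketched here is that many entities can share one `DenseSchema`, which may reduce memory overhead when every entity has the same keys; whether Obg9 works this way should be checked against Obg9.scala.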
- Execution DAG improvements:
- For Iterator mode (`gallia.streamer.IteratorStreamer`):
- Forking: see IteratorStreamer.scala#fork()
- Data regeneration (via closure): see IteratorStreamer.scala#from()
- Added support for more combinations: `HeadV -> HeadO`, `HeadV -> HeadZ`, `(HeadV, HeadV) -> HeadV`, and `HeadV -> HeadO` for instance for the likes of ReducingTest.scala#dressUp()
- More tests for edge cases: see e.g. GraphTest.scala#diamond()
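
The forking and closure-based regeneration mentioned above for Iterator mode address the fact that a Scala `Iterator` can only be traversed once. The sketch below shows both ideas with the standard library only. It does not reproduce the code in IteratorStreamer.scala: `Regenerable`, `fork` and `IteratorSketch` are hypothetical names used for illustration.

```scala
// Hypothetical sketch using only the Scala standard library;
// it does NOT reproduce IteratorStreamer's actual fork()/from() code.
object IteratorSketch {

  // Regeneration via closure: hold a factory instead of the iterator itself,
  // so each consumer can request a fresh pass over the data.
  final class Regenerable[A](gen: () => Iterator[A]) {
    def iterator: Iterator[A] = gen()
  }

  // Forking: split one single-use iterator into two independent iterators.
  // Iterator#duplicate buffers the elements that only one side has consumed.
  def fork[A](it: Iterator[A]): (Iterator[A], Iterator[A]) = it.duplicate

  val source = new Regenerable(() => Iterator(1, 2, 3))
  val sum    = source.iterator.sum // first pass over the data
  val max    = source.iterator.max // second pass, regenerated by the closure

  val (left, right) = fork(Iterator("a", "b", "c"))
  val forked        = (left.toList, right.toList)
}
```

Regeneration trades recomputation for memory (nothing is buffered), whereas forking with `duplicate` buffers whatever one branch has read ahead of the other; which trade-off Gallia makes in each case is best confirmed in IteratorStreamer.scala.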
Not tracked here, see commits