Changelog

  • master
  • v0.4.1 (2017-05-27)
    • retries for failed partitions
    • improve pysparkling.streaming.DStream
    • updates to docs
  • v0.4.0 (2017-03-11)
    • major addition: pysparkling.streaming
    • updates to RDD.sample()
    • reorganized scripts and tests
    • added RDD.partitionBy()
    • minor updates to pysparkling.fileio
  • v0.3.23 (2016-08-06)
    • small improvements to fileio and better documentation
  • v0.3.22 (2016-06-18)
    • reimplement RDD.groupByKey()
    • clean up of docstrings
  • v0.3.21 (2016-05-31)
    • faster text file reading by using io.TextIOWrapper for decoding
  • v0.3.20 (2016-05-01)
    • Google Storage file system (using gs://)
    • dependencies: requests and boto are not optional anymore
    • aggregateByKey() and foldByKey() return RDDs
    • Python 3: use sys.maxsize instead of sys.maxint
    • flake8 linting
  • v0.3.19 (2016-03-06)
    • removed use of itertools.tee() and replaced it with clear ownership of partitions and partition data
    • replace some remaining uses of str() with format()
    • bugfix for RDD.groupByKey() and RDD.reduceByKey() for non-hashable values by @pganssle
    • small updates to docs and their build process
  • v0.3.18 (2016-02-13)
    • bring docs and GitHub releases back in sync
    • ... many updates.
  • v0.2.28 (2015-07-03)
    • implement RDD.sortBy() and RDD.sortByKey()
    • additional unit tests
  • v0.2.24 (2015-06-16)
    • replace dill with cloudpickle in docs and tests
    • add tests with pypy and pypy3
  • v0.2.23 (2015-06-15)
    • added RDD.randomSplit()
    • saveAsTextFile() saves a single file if there is only one partition (instead of breaking it out into partitions)
  • v0.2.22 (2015-06-12)
    • added Context.wholeTextFiles()
    • improved RDD.first() and RDD.take(n)
    • added fileio.TextFile
  • v0.2.21 (2015-06-07)
    • added docstrings and created Sphinx documentation
    • implemented allowLocal in Context.runJob()
  • v0.2.19 (2015-06-04)
  • v0.2.16 (2015-05-31)
    • add values(), union(), zip(), zipWithUniqueId(), toLocalIterator()
    • improve aggregate() and fold()
    • add stats(), sampleStdev(), sampleVariance(), stdev(), variance()
    • make cache() and persist() do something useful
    • better partitioning in parallelize()
    • logo
    • fix foreach()
  • v0.2.10 (2015-05-27)
    • fix fileio.codec import
    • support http://
  • v0.2.8 (2015-05-26)
    • parallelized text file reading (and made it lazy)
    • parallelized take() and takeSample() so that they only compute the required data partitions
    • add example: access Human Microbiome Project
  • v0.2.6 (2015-05-21)
    • factor out fileio.fs and fileio.codec modules
    • merge WholeFile into File
    • improved handling of compressed files (backwards incompatible)
    • fileio interface changed to dump() and load() methods. Added make_public() for S3.
    • factor file-related operations into fileio submodule
  • v0.2.2 (2015-05-18)
    • compressions: .gz, .bz2
  • v0.2.0 (2015-05-17)
    • proper handling of partitions
    • custom serializers, deserializers (for functions and data separately)
    • more tests for parallelization options
    • distributed jobs now execute a chain of map() operations on the workers without sending intermediate results back to the master
    • a few more methods for RDDs implemented
  • v0.1.1 (2015-05-12)
    • implemented a few more RDD methods
    • changed handling of context in RDD
  • v0.1.0 (2015-05-09)