Releases: snowplow/snowplow
Release 92 Maiden Castle (2017-09-11)
Improving EmrEtlRunner
EmrEtlRunner
- Release lock in case of no-op (#3396)
- Treat archive_enriched and archive_shredded as separate steps (#3401)
- Do not pass --skip shred to RDB Loader when skipping RDB Shredder (#3403)
- If RDB Loader step hangs and is cancelled, logs are not retrieved (#3399)
- Ensure appropriate log level for RDB logs (#3369)
- Unlink downloaded RDB logs (#3363)
- Do not try to download non-existent RDB loader log files (#3405)
- Rescue the intermittent RestClient::SSLCertificateNotVerified error (#2572)
- Pass GZIP compression argument to S3DistCp as "gz" not "gzip" (#3415)
- Update rdb_loader version in config.yml.sample to 0.13.0 (#3418)
- Bump to 0.28.0 (#3404)
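The "Release lock in case of no-op" item above can be illustrated with a minimal file-based lock. This is a sketch only, not EmrEtlRunner's implementation: `file_lock`, `LockHeldError` and `run_pipeline` are hypothetical names, and the Consul-based variant is not shown. The point is that the release sits in a `finally` clause, so an early return for a no-op run still frees the lock.

```python
import os
import tempfile
from contextlib import contextmanager


class LockHeldError(Exception):
    """Raised when another run already holds the lock (hypothetical name)."""


@contextmanager
def file_lock(path):
    # O_CREAT | O_EXCL makes creation atomic: the call fails if the file exists.
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeldError(path)
    os.close(fd)
    try:
        yield
    finally:
        # Always release, including when the body returns early (the no-op case).
        os.remove(path)


def run_pipeline(lock_path, pending_files):
    """Hypothetical run: reports a no-op when there is nothing to process."""
    with file_lock(lock_path):
        if not pending_files:
            return "no-op"
        return "processed %d file(s)" % len(pending_files)
```

A second caller that tries to take the lock while it is held gets `LockHeldError` instead of starting a concurrent run.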
Documentation
- Fix broken links in storage/postgres's README.md (#3390)
RDB Shredder and Loader
- Moved to https://github.com/snowplow/snowplow-rdb-loader (#3393 and #3398)
Release 91 Stonehenge (2017-08-17)
EmrEtlRunner robustness.
EmrEtlRunner
- Use S3DistCp not Sluice for staging step (#276)
- Add an S3DistCp step for the _SUCCESS file produced by RDB Shredder (#3137)
- Add step to delete raw events from HDFS before shredding (#2545)
- Use S3DistCp to move raw files from S3 to HDFS for all collector formats (#3136)
- Add file- and Consul-based locking mechanism (#3352)
- Move current behavior into a run command (#3104)
- Add lint command which validates Iglu resolver and enrichments (#1946)
- Add backend for a generate command (#3105)
- Add --resume-from option (#3128)
- Remove support for --start and --end flags (#3132)
- Remove support for --process-enrich and --process-shred flags (#3365)
- Handle run= sub-folders if resuming from shred (#2693)
- Add "ongoing run" message on exit with return code 4 (#3129)
- Add "no logs to process" message on exit with return code 3 (#2644)
- Retrieve RDB loader logs only when it failed or the entire run was successful (#3361)
- Bump rspec to 3.5.0 (#3116)
- Bump to 0.27.0 (#3358)
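The two dedicated return codes listed above (3 for "no logs to process", 4 for "ongoing run") let a wrapper script tell benign exits from real failures. The following sketch shows one way a scheduler-side wrapper might branch on them. The function name, the action labels, and the choice to treat both codes as non-fatal are assumptions, not behaviour documented in these notes.

```python
# Return codes taken from the release notes above.
NO_LOGS_TO_PROCESS = 3  # "no logs to process" (#2644)
ONGOING_RUN = 4         # "ongoing run" (#3129)


def next_action(return_code):
    """Map an EmrEtlRunner return code to a wrapper action (labels are hypothetical)."""
    if return_code == 0:
        return "proceed"        # run finished; continue with downstream steps
    if return_code == NO_LOGS_TO_PROCESS:
        return "nothing-to-do"  # assumption: no data is not an error
    if return_code == ONGOING_RUN:
        return "retry-later"    # assumption: another run is in progress
    return "fail"               # any other non-zero code is treated as a failure
```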
Release 90 Lascaux (2017-07-26)
StorageLoader reboot.
Common
- Update CI/CD to push S3 artifacts to all regional Hosted Assets buckets (#3242)
- Add CI/CD to deploy RDB Loader to Snowplow Hosted Assets (#3025)
- No longer bundle StorageLoader in Bintray download (#3024)
Event Manifest Populator
EmrEtlRunner
- Make targets loading consistent with enrichments (#3268)
- Expose arbitrary EMR configuration options (#3255)
- Add maximizeResourceAllocation option to EMR cluster configuration (#3253)
- Move max attempts configuration to EMR cluster configuration (#3246)
- Use Elasticity to specify Thrift-specific configuration (#3252)
- Bump elasticity version to 6.0.12 (#3249)
- Remove storage.download from config.yml.sample (#3265)
- Add rdb_loader to config.yml.sample (#3266)
- Add S3DistCp step to move enriched and shredded files to archive (#1777)
- Add RDB Loader step for each target (#3121)
- Bump to 0.26.0 (#3254)
RDB Loader
- Remove StorageLoader (#3026)
- Accept storage target JSONs on command-line (#3022)
- Rewrite StorageLoader in Scala, removing file archiving step (#3023)
- Fix eventual consistency problem (#3113)
- Load all runs from shredded, not just the first run found (#2962)
- Remove compupdate step (#3178)
- Add logging around database load, analyze and vacuum (#2935)
- Use Redshift-specific driver to connect to Redshift (#1830)
Storage
- Replace example Redshift storage target configuration with 2-0-0 (#3281)
Trackers
- Java Tracker: bump git submodule to 0.8.2 (#3260)
- Ruby Tracker: bump git submodule to 0.6.1 (#3264)
- .NET Tracker: bump git submodule to 1.0.2 (#3258)
- Python Tracker: bump git submodule to 0.8.0 (#3263)
- Golang Tracker: bump git submodule to 1.1.0 (#3259)
- Node.js Tracker: bump git submodule to 0.3.0 (#3262)
- Android Tracker: bump git submodule to 0.6.2 (#3257)
- JavaScript Tracker: bump git submodule to 2.8.0 (#3261)
Release 89 Plain of Jars (2017-06-12)
Ports the Snowplow batch pipeline to Spark.
Documentation
- Fix incorrect hyphen underlining for R88 (#3198)
Common
- Refactor CI/CD deploy scripts into one (#3100)
- Update CI/CD to deploy Spark Enrich (#3069)
- Refactor CI/CD is release tag scripts into one (#3101)
- Update CI/CD to deploy RDB Shredder (#3038)
- Fix travis build due to the changes to the precise image (#3210)
- Build local Scala Common Enrich before publishing Kinesis-related artifacts (#3220)
- Add Sonatype credentials to .travis.yml (#3217)
- Bump Scala to 2.11 in .travis.yml (#3227)
Scala Common Enrich
- Bump to 0.25.0 (#3089)
- Bump scala-iglu-client to 0.5.0 (#3092)
- Remove scala-util (#3054)
- Get rid of deprecated erasure method calls (#3008)
- Bump scalaz to 7.0.9 (#3055)
- Bump scalding-args to 0.13.0 (#3058)
- Bump specs2 to 2.3.13 (#3059)
- Bump scalaz-specs2 to 0.2 (#3060)
- Bump scala-forex to 0.5.0 (#3057)
- Bump sbt to 0.13.13 (#3056)
- Bump Scala to 2.11.11 (#3007)
- Add Scala 2.11 cross-building (#3061)
- Make EnrichedEvent Serializable (#3081)
- Fix failing WeatherEnrichmentSpec expectation (#3205)
- Remove ScalazArgs (#3209)
- Upgrade to Java 8 (#3212)
- Add CI/CD (#3216)
Spark Enrich
- Bump to 1.9.0 (#3072)
- Rename from Scala Hadoop Enrich (#3064)
- Change the package from hadoop to spark (#3076)
- Bump sbt-assembly to 0.14.3 (#3078)
- Bump SBT to 0.13.13 (#3065)
- Port from Scalding to Spark (#3067)
- Bump scala-common-enrich to 0.25 (#3096)
- Bump Scalaz to 7.0.9 (#3097)
- Bump iglu-scala-client to 0.5.0 (#3098)
- Bump specs2-core to 2.3.13 (#3099)
- Bump Scala version to 2.11 (#3070)
- Upgrade to Java 8 (#2381)
- Fix SqlQueryEnrichmentCfLinesSpec (#3224)
- Fix CurrencyConversionTransactionSpec (#3225)
- Run the unit tests systematically in Travis (#3228)
EmrEtlRunner
- Bump to 0.25.0 (#3039)
- Update to run Spark Enrich instead of Scala Hadoop Enrich (#3066)
- Update to run RDB Shredder instead of Scala Hadoop Shred (#3033)
- Add ability to run Spark jobs (#641)
- Replace hadoop_shred in config.yml.sample with rdb_shredder (#3035)
- Bump elasticity version to 6.0.11 (#3053)
- Use the Scalding step provided by Elasticity (#3052)
- Replace hadoop_enrich in config.yml.sample with spark_enrich (#3068)
- Bump AMI version in example config to 5.5.0 (#3207)
RDB Shredder
- Bump to 0.12.0 (#3042)
- Rename from Scala Hadoop Shred (#3031)
- Move from 3-enrich to 4-storage (#3032)
- Change the package to storage from enrich (#3036)
- Port from Scalding to Spark (#3034)
- Bump scala-common-enrich to 0.25 (#3091)
- Bump iglu-scala-client to 0.5.0 (#3090)
- Bump specs2-core to 2.3.13 (#3093)
- Bump Scala version to 2.11 (#3071)
- Upgrade to Java 8 (#3213)
- Run the unit tests systematically in Travis (#3229)
StorageLoader
Release 88 Angkor Wat (2017-04-27)
Introduces event de-duplication across different pipeline runs, powered by DynamoDB, along with an important refactoring of the batch pipeline configuration
Documentation
- Fix incorrect release date for R87 (#3126)
Common
- Update copyright years in README (#3148)
- Add CI/CD for EmrEtlRunner and StorageLoader (#3102)
- Add CI/CD for Event Manifest Populator (#3170)
- Add AWS staging credentials to .travis.yml (#3114)
- Update script to sync ap-northeast-2 (Seoul) Snowplow Hosted Assets bucket (#3160)
- Update READMEs' Markdown in accordance with CommonMark (#3157)
Event Manifest Populator
- Add Spark job to backpopulate DynamoDB duplicate storage (#3158)
Scala Common Enrich
Scala Hadoop Shred
- Bump to 0.11.0 (#3041)
- Bump sbt-assembly to 0.14.4 (#3140)
- Bump SBT to 0.13.13 (#2972)
- Remove explicit jackson-databind dependency (#3138)
- Add cross-batch natural deduplication (#2999)
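The cross-batch natural deduplication item above can be pictured as a manifest lookup keyed on event ID and fingerprint. The sketch below uses an in-memory dictionary as a stand-in for the DynamoDB table mentioned in the release summary. The class, the function, the key layout and the run-id bookkeeping are all assumptions made for illustration, not the shredder's actual schema or code.

```python
class InMemoryManifest:
    """Stand-in for the DynamoDB table: (event_id, fingerprint) -> first run seen."""

    def __init__(self):
        self._seen = {}

    def put_if_absent(self, key, run_id):
        """Record the key for run_id unless already present; return the owning run."""
        return self._seen.setdefault(key, run_id)


def drop_cross_batch_duplicates(events, run_id, manifest):
    """Keep only events whose (event_id, fingerprint) was not seen in another run."""
    kept = []
    for event in events:
        key = (event["event_id"], event["event_fingerprint"])
        owner = manifest.put_if_absent(key, run_id)
        # Keeping events owned by the current run means that re-processing the
        # same run does not drop everything. Duplicates inside a single batch
        # are out of scope for this sketch.
        if owner == run_id:
            kept.append(event)
    return kept
```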
Storage
- Add example storage target configuration JSONs (#2990)
StorageLoader
- Bump to 0.10.0 (#3109)
- Remove Northern Virginia endpoint for Postgres load (#3143)
- Handle return code of 4 for EmrEtlRunner in snowplow-runner-and-loader.sh (#3139)
- Use storage target JSONs instead of targets section in config.yml (#2992)
- Replace table configuration property with schema (#2458)
EmrEtlRunner
- Bump to 0.24.0 (#3040)
- Update hadoop_shred version in config.yml.sample to 0.11.0 (#3197)
- Add script to convert config.yml targets section into JSON format (#3135)
- Remove targets section from config.yml.sample (#2989)
- No longer use sources property when loading Elasticsearch (#2993)
- Use storage target JSONs instead of targets section in config.yml (#2991)
Release 87 Chichen Itza
New features, stability enhancements and performance improvements for EmrEtlRunner and StorageLoader. As of this release EmrEtlRunner lets you specify EBS volumes for your Hadoop worker nodes; meanwhile StorageLoader now writes to a dedicated manifest table to record each load
EmrEtlRunner
- Bump to 0.23.0 (#2960)
- Bump JRuby version to 9.1.6.0 (#3050)
- Bump Elasticity to 6.0.10 (#3013)
- Remove AnonIpHash from contracts.rb (#2523)
- Remove UnmatchedLzoFilesError check (#2740)
- Use S3DistCp not Sluice for archive_raw step (#1977)
- Add warning about the array of in buckets in config.yml (#2462)
- Add dedicated return code of 4 for DirectoryNotEmptyError (#2546)
- Add support for specifying EBS for Hadoop workers (#2950)
- Add example EBS configuration to config.yml.sample (#3012)
- Catch Elasticity ThrottlingExceptions while waiting for EMR (#3028)
- Catch Elasticity ArgumentErrors while waiting for EMR (#3027)
StorageLoader
- Bump to 0.9.0 (#2961)
- Bump JRuby version to 9.1.6.0 (#3051)
- Fix typo in S3Tasks.download_events (#2888)
- Update manifest table as part of Redshift load transaction (#2280)
Redshift
- Added manifest table (#2265)
Release 86 Petra
Brings in-batch synthetic deduplication and data-modeling improvements.
Common
- Add AWS credentials to .travis.yml (#2963)
- Add CI/CD for Scala Hadoop Enrich (#2982)
- Add CI/CD for Scala Hadoop Shred (#2928)
- Migrate Hadoop Event Recovery deployment to Release Manager (#2983)
- Remove short-hostname addon from travis.yml (#2674)
- Update script to sync us-east-2 (Ohio) Snowplow Hosted Assets bucket (#2986)
- Update script to sync ca-central-1 (Montreal) Snowplow Hosted Assets bucket (#3004)
- Update script to sync eu-west-2 (London) Snowplow Hosted Assets bucket (#3005)
- Use AWS environment variables to sync Snowplow Hosted Assets buckets (#2985)
Scala Hadoop Shred
- Bump to 0.10.0 (#2979)
- Add general top-level exception handling (#2071)
- Get the CustomPartitionSourceTest working with Hadoop 2.4 (#1960)
- Fix omitted string interpolation (#2562)
- Deduplicate event_ids with different event_fingerprints (synthetic duplicates) (#24)
- Stop catching fatal errors (#1456)
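The synthetic-duplicate item above concerns events that share an event_id but carry different event_fingerprints. One possible reading, sketched below, is to give each such event a fresh ID while remembering the original one. The `original_event_id` field and the function name are hypothetical, and the release notes do not state how the shredder records the original ID.

```python
import uuid
from collections import defaultdict


def reassign_synthetic_duplicates(events, new_id=lambda: str(uuid.uuid4())):
    """Give a fresh event_id to events whose ID is shared by differing fingerprints."""
    # First pass: collect the distinct fingerprints seen for each event_id.
    fingerprints = defaultdict(set)
    for event in events:
        fingerprints[event["event_id"]].add(event["event_fingerprint"])

    # Second pass: rewrite only the IDs that collide across fingerprints.
    result = []
    for event in events:
        if len(fingerprints[event["event_id"]]) > 1:
            # Copy the event so that the caller's input is left untouched.
            event = dict(event, original_event_id=event["event_id"], event_id=new_id())
        result.append(event)
    return result
```

Events with the same ID and the same fingerprint (natural duplicates) are deliberately left alone here, since they call for dropping rather than re-identifying.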
Data Modeling
- Add drill fields to web block (#2956)
- Resolve issues with web model (#2954)
- Restrict table scan on deduplication queries (#2929)
- Add web model (#2925)
- Delete example models (#2836)
- Remove outdated recipes (#2626)
EmrEtlRunner
- Update hadoop_shred version in config.yml.sample to 0.10.0 (#3003)
Release 85 Metamorphosis
One of our hackathon projects at our Berlin company away-week: initial Kafka support for Snowplow
Scala Stream Collector
- Scala Stream Collector: bump to 0.9.0 (#2936)
- Scala Stream Collector: add Kafka sink (#2937)
- Scala Stream Collector: update config.hocon.sample to support Kafka (#2943)
- Scala Stream Collector: move sink.kinesis.buffer to sink.buffer in config.hocon.sample (#2938)
Stream Enrich
Release 84 Steller's Sea Eagle
Brings support for Elasticsearch 2.x to the Kinesis Elasticsearch Sink for both Transport and HTTP clients
Common
- Common: standardise sbt-assembly settings (#2900)
- Common: refactor Kinesis release CI/CD (#2887)
- Common: update script to sync ap-south-1 (Mumbai) Snowplow Hosted Assets bucket (#2903)
Scala Stream Collector
- Scala Stream Collector: bump to 0.8.0 (#2886)
- Scala Stream Collector: add scala_ into artifact filename in Bintray (#2843)
- Scala Stream Collector: use nuid query parameter value to set the 3rd party network id cookie (#2512)
- Scala Stream Collector: configurable cookie path (#2528)
- Scala Stream Collector: call Config.resolve() to resolve environment variables in hocon (#2879)
Stream Enrich
- Stream Enrich: bump to 0.9.0 (#2728)
- Stream Enrich: bump Scala Tracker to 0.3.0 (#2898)
- Stream Enrich: bump Scala Common Enrich to 0.24.0 (#2729)
- Stream Enrich: tolerate trailing slashes for paths in IP Lookups Enrichment configuration (#2744)
- Stream Enrich: call Config.resolve() to resolve environment variables in hocon (#2878)
Kinesis Elasticsearch Sink
- Kinesis Elasticsearch Sink: bump to 0.8.0 (#2885)
- Kinesis Elasticsearch Sink: bump Scala Tracker to 0.3.0 (#2899)
- Kinesis Elasticsearch Sink: allow parametrized timeouts for jest client (#2897)
- Kinesis Elasticsearch Sink: does not take into account buffer configurations (#2895)
- Kinesis Elasticsearch Sink: error messages are not helpful (#2896)
- Kinesis Elasticsearch Sink: ensure field names do not contain any dots (#2894)
- Kinesis Elasticsearch Sink: add support for Elasticsearch 2.x (#2525)
- Kinesis Elasticsearch Sink: call Config.resolve() to resolve environment variables in hocon (#2880)
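The "ensure field names do not contain any dots" item above amounts to rewriting keys before a document is indexed. A minimal sketch follows, assuming dots are replaced with underscores. The replacement character and the function name are assumptions, and the sink itself is not written in Python.

```python
def replace_dots_in_keys(value, replacement="_"):
    """Recursively rewrite dictionary keys so that none of them contains a dot."""
    if isinstance(value, dict):
        return {
            key.replace(".", replacement): replace_dots_in_keys(child, replacement)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [replace_dots_in_keys(item, replacement) for item in value]
    # Scalars (strings, numbers, None) are returned unchanged, dots included.
    return value
```

Note that two keys such as `a.b` and `a_b` would collide after rewriting. A production implementation would need a policy for that case.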
Redshift
Release 83 Bald Eagle
Introduces our powerful new SQL Query Enrichment, long-awaited support for the EU Frankfurt AWS region, plus POST support for our Iglu webhook adapter
Scala Tracker
- Bump git submodule to 0.3.0 (#2726)
ActionScript 3.0 Tracker
- Bump git submodule to 0.3.0 (#2727)
Scala Common Enrich
Scala Hadoop Enrich
- Bump to 1.8.0 (#2716)
- Bump Scala Common Enrich to 0.24.0 (#2717)
- Add test for SQL Query Enrichment (#2718)
- Make resolver config in JobSpecHelpers injectable (#2825)
EmrEtlRunner
- Bump to 0.22.0 (#2784)
- Bump Ruby version to 2.2.3 (#2869)
- Bump Sluice to 0.4.0 (#1708)
- Bump Contracts to 0.9 (#2789)
- Rebuild Gemfile.lock (#2872)
- Add version recognition of currently installed commons-codec (#2735)
- Update snowplow-ami4-bootstrap.sh to take optional commons-codec version argument (#2713)
- Fix bug with double compression in shred step if enrich skipped (#2586)
- Pass GZIP compression argument to S3DistCp as "gz" not "gzip" (#2679)
- Update hadoop_enrich version in config.yml.sample to 1.8.0 (#2756)
- Replace deprecated Dir.exists? with Dir.exist? (#2799)
- Fix contract for fatal_with (#2810)
- Use region-specific Snowplow Hosted Assets buckets (#2813)
- Disable contract on build_fix_filenames due to Contracts issue #238 (#2828)
Storage
- Add Kinesis S3 git submodule (#2706)
StorageLoader
- Bump to 0.8.0 (#2785)
- Bump Ruby version to 2.2.3 (#2870)
- Bump Sluice to 0.4.0 (#2786)
- Bump Contracts to 0.9 (#2790)
- Add explicit mime-types dependency (#2805)
- Rebuild Gemfile.lock (#2871)
- Use Northern Virginia endpoint not global endpoint for us-east-1 (#2748)
- Replace module_function everywhere with self (#2801)
- Fix broken contracts (#2461)
- Write JSON path for com.amazon.aws.lambda/s3_notification_event (#2590)
- Write JSON path for com.snowplowanalytics.snowplow/application_foreground/jsonschema/1-0-0 (#2857)
- Write JSON path for com.snowplowanalytics.snowplow/application_background/jsonschema/1-0-0 (#2856)
- Write JSON path for com.snowplowanalytics.snowplow/application_error/jsonschema/1-0-0 (#2855)
Redshift
- Add Redshift DDL for com.snowplowanalytics.snowplow/application_foreground/jsonschema/1-0-0 (#2854)
- Add Redshift DDL for com.snowplowanalytics.snowplow/application_background/jsonschema/1-0-0 (#2853)
- Add Redshift DDL for com.snowplowanalytics.snowplow/application_error/jsonschema/1-0-0 (#2852)
- Add Redshift DDL for com.amazon.aws.lambda/s3_notification_event/jsonschema/1-0-0 (#2589)