Support PowerTrack v2
Laurent Farcy committed Aug 10, 2016
1 parent 827ff9e commit 0f2e7b2
Showing 14 changed files with 280 additions and 102 deletions.
2 changes: 1 addition & 1 deletion Gemfile.lock
@@ -1,7 +1,7 @@
PATH
remote: .
specs:
powertrack (1.1.1)
powertrack (1.2.0)
em-http-request (~> 1.1)
eventmachine (~> 1.0)
exponential-backoff (~> 0.0.2)
6 changes: 6 additions & 0 deletions History.txt
@@ -1,3 +1,9 @@
v1.2.0
------

* Support PowerTrack v2
- Rule validator and Replay v2 not supported yet

v1.1.1
------

32 changes: 29 additions & 3 deletions README.md
@@ -102,14 +102,14 @@ Backfill is a feature provided by GNIP to avoid losing activities when being
disconnected. It automatically resends the messages sent on the stream for the
last 5 minutes when reconnecting.

Provide a (numerical) client id as the last (but optional) argument of the
PowerTrack::Stream constructor to enable this feature.
Provide a (numerical) client id by setting the ```:client_id``` option when
building a ```PowerTrack::Stream``` object to enable this feature.

## Replay

Replay is a feature provided by GNIP to recover lost activities over the last
5 days. The Replay stream lives alongside the realtime stream and is activated
by setting the ```:replay``` option to true when building a ```PowerTrack::Stream```
by setting the ```:replay``` option to ```true``` when building a ```PowerTrack::Stream```
object.

Once Replay is activated, you use the stream as previously, starting by
@@ -128,6 +128,32 @@ replaying the same timeframe again and again when GNIP is unstable.
All the errors that come from PowerTrack are defined through an ad-hoc exception
class hierarchy. See ```lib/powertrack/errors.rb```.

## PowerTrack v2

The library provides early support for PowerTrack API version 2. Please read
[PowerTrack API v2](http://support.gnip.com/apis/powertrack2.0/index.html) and
the [Migration Guide](http://support.gnip.com/apis/powertrack2.0/transition.html)
for details about this new major release.

Set the ```:v2``` option to ```true``` when building a ```PowerTrack::Stream```
object to enable this feature. The library uses v1 by default.

Everything should work the same for v2 as for v1, except:

o ```PowerTrack::Stream.add_rule``` and ```PowerTrack::Stream.delete_rule```
return a status instead of nil
o The Backfill feature is configured by the ```:backfill_minutes``` option passed
to the ```PowerTrack::Stream.track``` method instead of passing a ```:client_id```
option to the ```PowerTrack::Stream``` initializer (which is simply ignored
when v2 is turned on). The new option specifies a number of minutes of backfill
data to receive.
o The Replay feature still uses v1 even if you explicitly turn v2 on. Support
for [Replay v2](http://support.gnip.com/apis/replay2.0/api_reference.html) is
planned but not scheduled yet.

Finally, PowerTrack v2 has a new endpoint for rule validation that is not
supported by this library yet.
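
The interaction between the ```:v2``` and ```:replay``` options described above can be sketched as follows. This is only an illustration: ```effective_api_version``` is a hypothetical helper, not a method of the gem, and the warning text is made up.

```ruby
# Hypothetical helper, not part of the powertrack gem. It illustrates how
# the :v2 and :replay options described in the README interact.
def effective_api_version(options = {})
  v2 = options[:v2] == true
  replay = options[:replay] == true

  if v2 && replay
    # Replay v2 is not supported yet, so Replay keeps using v1.
    warn 'Replay is not supported with PowerTrack v2; falling back to v1'
    return 1
  end

  # The library uses v1 unless v2 is explicitly requested.
  v2 ? 2 : 1
end

default_version = effective_api_version
v2_version = effective_api_version(v2: true)
replay_version = effective_api_version(v2: true, replay: true)
```

With these assumptions, only the second call selects v2; the third falls back to v1 and emits a warning on stderr.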

## Credits

The ```powertrack``` gem heavily relies on *EventMachine* and the *em-http-request*
28 changes: 24 additions & 4 deletions TODO.md
@@ -60,7 +60,7 @@ See [Data format](http://support.gnip.com/sources/twitter/data_format.html)
* _[DONE]_ Support Original output format
* _[DONE]_ Support Activity Stream output format
* _[DONE]_ Support raw format
*

* _[OUT]_ Manage retweets.
See [Identifying and Understanding retweets](http://support.gnip.com/articles/identifying-and-understanding-retweets.html)

@@ -71,13 +71,33 @@ See [Managing disconnections](http://support.gnip.com/articles/disconnections-ex
* _[DONE]_ Reconnect after disconnect. See
[Disconnections & Reconnecting](http://support.gnip.com/apis/consuming_streaming_data.html#Disconnections)
* _[DONE]_ Reconnect using an exponential backoff pattern.
* _[DONE]_ Support Backfill
* Support Replay
* _[DONE]_ Support Backfill (v1)
* _[DONE]_ Support Replay (v1)
* Reconnect when there's a GNIP server issue signaled by the 503 HTTP response status

## Other features

* _[DONE]_ Support test and development streams
* _[DONE]_ Support Replay mode (5-days back history)
* Support status dashboard
* Support Historical Powertrack
* Support Historical PowerTrack

## PowerTrack v2
See [Migration Guide](http://support.gnip.com/apis/powertrack2.0/transition.html)
and [PowerTrack API v2](http://support.gnip.com/apis/powertrack2.0/index.html).

* _[DONE]_ Support both v1 and v2 with the same interface/class
* _[DONE]_ Support new endpoint URLs
* Support rule validator
* Support new operators and quoted tweet filtering.
Double-check with tests that the gem does not prevent their usage
* _[DONE]_ Support new backfill behavior
* _[DONE]_ Support fixed backfill period used at first connection
* _[DONE]_ Support fixed backfill period used at each reconnect
* Support dynamic backfill period at each reconnect, calibrated according to
the number of minutes the stream was disconnected. Emit a warning if the
stream was disconnected more than 5 minutes (tweets were probably lost)
* _[DONE]_ Use HTTP POST verb (instead of DELETE) for rule deletions
* _[DONE]_ Fall back to v1 when Replay mode wants to use v2. Emit a warning.
* Support Replay v2
[Replay API 2.0 Reference](http://support.gnip.com/apis/replay2.0/api_reference.html)
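
The open item about a dynamic backfill period could look roughly like the sketch below. It is not implemented in this commit; ```backfill_minutes_for``` and ```MAX_BACKFILL_MINUTES``` are hypothetical names, and requesting at least 1 minute is an assumption made for the example.

```ruby
# The 5-minute limit comes from the TODO item above.
MAX_BACKFILL_MINUTES = 5

# Hypothetical helper: derives the number of backfill minutes to request
# from the time the stream spent disconnected.
def backfill_minutes_for(disconnected_at, reconnected_at)
  # Time - Time yields seconds as a Float; round up to whole minutes.
  elapsed_minutes = ((reconnected_at - disconnected_at) / 60.0).ceil

  if elapsed_minutes > MAX_BACKFILL_MINUTES
    warn "Stream was disconnected for #{elapsed_minutes} minutes; " \
         'tweets were probably lost'
  end

  # Assumption: always ask for at least 1 minute, never more than the limit.
  elapsed_minutes.clamp(1, MAX_BACKFILL_MINUTES)
end

start = Time.at(0)
short_outage = backfill_minutes_for(start, start + 90)
long_outage = backfill_minutes_for(start, start + 600)
```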
30 changes: 23 additions & 7 deletions lib/powertrack/rules/rule.rb
@@ -19,17 +19,32 @@ class Rule
# The maximum number of negative terms in a single rule value
MAX_NEGATIVE_TERMS = 50

attr_reader :value, :tag, :error

# Builds a new rule based on a value and an optional tag.
# The default rule features
DEFAULT_RULE_FEATURES = {
# no id by default
id: nil,
# no tag by default
tag: nil,
# long determined by value length
long: nil
}.freeze

attr_reader :value, :id, :tag, :error

# Builds a new rule based on a value and some optional features
# (:id, :tag, :long).
#
# By default, the constructor assesses if it's a long rule or not
# based on the length of the value. But the 'long' feature can be
# explicitly specified with the third parameter.
def initialize(value, tag=nil, long=nil)
# explicitly specified with the :long feature.
def initialize(value, features=nil)
@value = value || ''
@tag = tag
features = DEFAULT_RULE_FEATURES.merge(features || {})
@tag = features[:tag]
@id = features[:id]
# check if long is a boolean
@long = long == !!long ? long : @value.size > MAX_STD_RULE_VALUE_LENGTH
_long = features[:long]
@long = _long == !!_long ? _long : @value.size > MAX_STD_RULE_VALUE_LENGTH
@error = nil
end

@@ -70,6 +85,7 @@ def to_json(options={})
def to_hash
res = {:value => @value}
res[:tag] = @tag unless @tag.nil?
res[:id] = @id unless @id.nil?
res
end
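
A standalone sketch of the new features-hash constructor may help when reviewing these hunks. ```RuleSketch``` is a hypothetical stand-in for ```PowerTrack::Rule```, the ```long?``` reader exists only for the example, and the 1024 threshold is an assumed value since the real constant is not shown in this diff.

```ruby
# Hypothetical stand-in for PowerTrack::Rule, reduced to the changed parts.
class RuleSketch
  # Assumed value for illustration; the real constant is outside this diff.
  MAX_STD_RULE_VALUE_LENGTH = 1024

  DEFAULT_RULE_FEATURES = { id: nil, tag: nil, long: nil }.freeze

  attr_reader :value, :id, :tag

  def initialize(value, features = nil)
    @value = value || ''
    # merge returns a new hash, so the frozen defaults are never mutated.
    features = DEFAULT_RULE_FEATURES.merge(features || {})
    @tag = features[:tag]
    @id = features[:id]
    long = features[:long]
    # x == !!x holds only for true and false, so nil (or any other
    # non-boolean) falls through to the length-based check.
    @long = long == !!long ? long : @value.size > MAX_STD_RULE_VALUE_LENGTH
  end

  def long?
    @long
  end

  # Only features that are actually set end up in the hash.
  def to_hash
    res = { value: @value }
    res[:tag] = @tag unless @tag.nil?
    res[:id] = @id unless @id.nil?
    res
  end
end

tagged = RuleSketch.new('coke', tag: 'soda')
forced = RuleSketch.new('coke', long: true, id: 42)
```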

4 changes: 2 additions & 2 deletions lib/powertrack/rules/string_extension.rb
@@ -3,7 +3,7 @@
# Extend core String class with a rule transformer
class String
# Returns a PowerTrack::Rule instance based on the value of the string.
def to_pwtk_rule(tag=nil, long=nil)
PowerTrack::Rule.new(self, tag, long)
def to_pwtk_rule(features=nil)
PowerTrack::Rule.new(self, features)
end
end
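
This signature change affects callers: positional arguments such as ```'coke'.to_pwtk_rule('soda', false)``` become a features hash such as ```'coke'.to_pwtk_rule(tag: 'soda', long: false)```. The sketch below mimics the new shape without loading the gem; ```SketchRule``` and ```to_sketch_rule``` are hypothetical names used only here.

```ruby
# Hypothetical stand-in for PowerTrack::Rule, reduced to what the example needs.
SketchRule = Struct.new(:value, :features)

class String
  # Mirrors the new signature: a single, optional features hash.
  def to_sketch_rule(features = nil)
    SketchRule.new(self, features || {})
  end
end

rule = 'coke OR pepsi'.to_sketch_rule(tag: 'soda', long: false)
bare = 'coke'.to_sketch_rule
```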
2 changes: 1 addition & 1 deletion lib/powertrack/streaming/data_buffer.rb
@@ -5,7 +5,7 @@ module PowerTrack
class DataBuffer

# The pattern used by GNIP PowerTrack to delimit a single message.
MESSAGE_PATTERN = /^([^\r]*)\r\n/m
MESSAGE_PATTERN = /^([^\r]*)\r\n/m.freeze

# Builds a new data buffer.
def initialize
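
For reviewers unfamiliar with the buffer, here is a minimal sketch of how a pattern like ```MESSAGE_PATTERN``` can split a stream on ```\r\n```. ```extract_messages``` is a hypothetical helper, not the gem's ```DataBuffer``` API, and the JSON fragments are made-up sample data.

```ruby
MESSAGE_PATTERN = /^([^\r]*)\r\n/m.freeze

# Hypothetical helper: removes every complete message from buffer
# (mutating it) and leaves any trailing partial message in place.
def extract_messages(buffer)
  messages = []
  while (match = MESSAGE_PATTERN.match(buffer))
    messages << match[1]
    # Drop everything up to and including the \r\n delimiter.
    buffer.slice!(0, match.end(0))
  end
  messages
end

# Unary plus yields a mutable string even under frozen_string_literal.
buffer = +"{\"id\":1}\r\n{\"id\":2}\r\n{\"id\":"
messages = extract_messages(buffer)
```

The incomplete third message stays in the buffer until the rest of it arrives.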
2 changes: 1 addition & 1 deletion lib/powertrack/streaming/retrier.rb
@@ -22,7 +22,7 @@ class Retrier
max_elapsed_time: DEFAULT_MAX_ELAPSED_TIME,
multiplier: DEFAULT_INTERVAL_MULTIPLIER,
randomize_factor: DEFAULT_RANDOMIZE_FACTOR
}
}.freeze

# Builds a retrier that will retry a maximum retries number of times.
def initialize(max_retries, options=nil)
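
The effect of freezing the defaults hash can be sketched as below. The option values are assumptions for illustration (the real ```DEFAULT_*``` constants are outside this hunk), and ```retrier_options``` is a hypothetical helper.

```ruby
# Assumed values for illustration only.
DEFAULT_OPTIONS = {
  max_elapsed_time: 30.0,
  multiplier: 1.5,
  randomize_factor: 0.25
}.freeze

# Hypothetical helper: combines caller options with the frozen defaults.
# merge builds a new hash, so freezing the defaults does not get in the way.
def retrier_options(options = nil)
  DEFAULT_OPTIONS.merge(options || {})
end

custom = retrier_options(multiplier: 2.0)

mutated = true
begin
  DEFAULT_OPTIONS[:multiplier] = 3.0
rescue RuntimeError
  # FrozenError on recent Rubies, plain RuntimeError on older ones.
  mutated = false
end
```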