first pass at fixing #4
josephlewis42 committed Oct 10, 2019
1 parent 3b5d560 commit 1bb7517
Showing 7 changed files with 37 additions and 7 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,7 @@
+## 0.12.0
+
+- Added support for `file_prefix` option for server-side filtering
+
## 0.11.0

- Change gzip file detection to use mime type instead of extension
@@ -10,7 +14,7 @@
## 0.9.0

- Initial release
-- File inclusion/exclusion by 
+- File inclusion/exclusion by
  - regex
  - processed database
  - metadata key
13 changes: 13 additions & 0 deletions docs/index.asciidoc
@@ -160,6 +160,7 @@ This plugin supports the following configuration options plus the <<plugins-{typ
| <<plugins-{type}s-{plugin}-json_key_file>> |<<path,path>>|No
| <<plugins-{type}s-{plugin}-interval>> |<<number,number>>|No
| <<plugins-{type}s-{plugin}-file_matches>> |<<string,string>>|No
+| <<plugins-{type}s-{plugin}-file_prefix>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-file_exclude>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-metadata_key>> |<<string,string>>|No
| <<plugins-{type}s-{plugin}-processed_db_path>> |<<path,path>>|No
@@ -206,6 +207,18 @@ The number of seconds between looking for new files in your bucket.
A regex pattern to filter files. Only files with names matching this pattern will be considered.
All files match by default.

+[id="plugins-{type}s-{plugin}-file_prefix"]
+===== `file_prefix`
+
+added[0.12.0]
+
+* Value type is <<string,string>>
+* Default is: `""`
+
+A prefix filter applied server-side. Only files whose names start with this prefix
+will be fetched from Cloud Storage. This can be useful if all the files you want to
+process are in a particular folder and you want to reduce network traffic.
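
For illustration, a minimal pipeline input block using the new option might look like the following (the bucket name, key path, and prefix are made up for the example):

```
input {
  google_cloud_storage {
    bucket_id     => "my-logs-bucket"
    json_key_file => "/home/user/key.json"
    interval      => 60
    file_prefix   => "production/app-logs/"
  }
}
```

With this configuration only objects under `production/app-logs/` are listed at all, so the client-side `file_matches`/`file_exclude` regexes have less to scan.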

[id="plugins-{type}s-{plugin}-file_exclude"]
===== `file_exclude`

15 changes: 13 additions & 2 deletions lib/logstash/inputs/cloud_storage/client.rb
@@ -10,16 +10,27 @@ module Inputs
  module CloudStorage
    # Client provides all the required transport and authentication setup for the plugin.
    class Client
-     def initialize(bucket, json_key_path, logger)
+     def initialize(bucket, json_key_path, logger, blob_prefix='')
        @logger = logger
        @bucket = bucket
+       @blob_prefix = blob_prefix

        # create client
        @storage = initialize_storage json_key_path
      end

+     java_import 'com.google.cloud.storage.Storage'
      def list_blobs
-       @storage.list(@bucket).iterateAll().each do |blobname|
+       # NOTE: there is the option to filter which fields are returned by
+       # the call. If we find the bandwidth overhead is too much it would be
+       # possible (but tedious) to filter the returned fields to just those
+       # that this plugin uses.
+       filter = []
+       if @blob_prefix != ''
+         filter = [Storage::BlobListOption.prefix(@blob_prefix)]
+       end
+
+       @storage.list(@bucket, filter.to_java).iterateAll().each do |blobname|

> @indera-shsp commented (Oct 10, 2019): @mc0 I think this is where the client passes the option to the server, so the filtering happens server-side.

          yield LogStash::Inputs::CloudStorage::BlobAdapter.new(blobname)
        end
      rescue Java::ComGoogleCloudStorage::StorageException => e
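The prefix handling above runs under JRuby against the Java storage client, but the option-building step can be sketched in plain Ruby. In this sketch a `[:prefix, value]` pair stands in for the Java `Storage::BlobListOption` object, which is not available outside JRuby:

```ruby
# Plain-Ruby sketch of the option-building step in list_blobs above.
# The real code builds [Storage::BlobListOption.prefix(prefix)]; here a
# symbolic [:prefix, value] pair stands in for the Java object.
def blob_list_options(blob_prefix)
  return [] if blob_prefix == ''  # empty prefix: no server-side filter
  [[:prefix, blob_prefix]]
end
```

An empty prefix (the default) sends no filter option at all, so listing behaviour is unchanged for existing users of the plugin.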
3 changes: 2 additions & 1 deletion lib/logstash/inputs/google_cloud_storage.rb
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@ class LogStash::Inputs::GoogleCloudStorage < LogStash::Inputs::Base
  config :interval, :validate => :number, :default => 60

  # Inclusion/Exclusion Criteria
+ config :file_prefix, :validate => :string, :default => ''
  config :file_matches, :validate => :string, :default => '.*\\.log(\\.gz)?'
  config :file_exclude, :validate => :string, :default => '^$'
  config :metadata_key, :validate => :string, :default => 'x-goog-meta-ls-gcs-input'
@@ -39,7 +40,7 @@ class LogStash::Inputs::GoogleCloudStorage < LogStash::Inputs::Base
  def register
    FileUtils.mkdir_p(@temp_directory) unless Dir.exist?(@temp_directory)

-   @client = LogStash::Inputs::CloudStorage::Client.new(@bucket_id, @json_key_file, @logger)
+   @client = LogStash::Inputs::CloudStorage::Client.new(@bucket_id, @json_key_file, @logger, @file_prefix)

    if @processed_db_path.nil?
      ls_data = LogStash::SETTINGS.get_value('path.data')
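As a quick sanity check on the defaults in the config block above, the inclusion and exclusion regexes can be exercised in plain Ruby. The combination rule shown here (include when `file_matches` matches and `file_exclude` does not) is an assumption about how the plugin applies them, not code quoted from the plugin:

```ruby
# Default filter values from the config block above.
FILE_MATCHES = Regexp.new('.*\.log(\.gz)?')  # default file_matches
FILE_EXCLUDE = Regexp.new('^$')              # default file_exclude

# Assumed combination rule: a file is considered when its name matches
# file_matches and does not match file_exclude.
def considered?(name)
  !!(name =~ FILE_MATCHES) && !(name =~ FILE_EXCLUDE)
end
```

So with the defaults, `app.log` and `app.log.gz` are picked up while `data.csv` is skipped, and the `^$` exclusion only ever rejects an empty name.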
2 changes: 1 addition & 1 deletion logstash-input-google_cloud_storage.gemspec
@@ -1,6 +1,6 @@
Gem::Specification.new do |s|
  s.name = 'logstash-input-google_cloud_storage'
-  s.version = '0.11.0'
+  s.version = '0.12.0'
  s.licenses = ['Apache-2.0']
  s.summary = 'Plugin to import log data from Google Cloud Storage (GCS).'
  s.description = 'This gem is a Logstash plugin required to be installed on top of the '\
2 changes: 1 addition & 1 deletion spec/inputs/cloud_storage/client_spec.rb
@@ -13,7 +13,7 @@

    it 'does not throw an error when initializing' do
      key_file = ::File.join('spec', 'fixtures', 'credentials.json')
-     LogStash::Inputs::CloudStorage::Client.new('my-bucket', key_file, logger)
+     LogStash::Inputs::CloudStorage::Client.new('my-bucket', key_file, logger, 'prefix')
    end
  end
end
3 changes: 2 additions & 1 deletion spec/inputs/google_cloud_storage_spec.rb
@@ -22,7 +22,8 @@
      'processed_db_path' => processed_db_dir,
      'temp_directory' => download_dir,
      'delete' => true,
-     'unpack_gzip' => false
+     'unpack_gzip' => false,
+     'file_prefix' => '/some/prefix/here'
    }
  }

