
Automatic merge from master to 5.x branch #3554

Merged

Conversation

@tsg tsg commented Feb 7, 2017

Minor conflicts solved in the docs.

Please do a merge commit on this one, don't squash.

ruflin and others added 30 commits January 20, 2017 12:05
Now that 5.x is released, we no longer need the configs for the 1.x release in the docs.
Provide match.Matcher and match.ExactMatcher, which use regular expressions for
the matching use case only.

The matchers compile a regular expression into a Matcher, which only provides
the Match functionality. This gives us a chance to optimize/replace some common
cases used for matching:
- replace capture-groups by non-capturing groups
- remove leading/trailing `.*` expressions (Match already searches for
  sub-string matching the regex)
- replace simple literal searches with `==`, `strings.Contains`, and
  `strings.HasPrefix`
- replace regex for alternative literals (e.g. `DEBUG|INFO|ERROR`) with
  strings.Contains over set of literals
- optimized empty-lines checks

If the input regular expression cannot be reduced to one of these simple cases,
regexp.Regexp will be used.

The `ExactMatcher` will embed `<regex>` into `^<regex>$` by default.

Note: Matcher currently does not split mixed cases, e.g. `abc.*def` or
`abc.def` will still fall back to regexp.Regexp.
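The literal-detection idea above can be sketched with Go's `regexp/syntax` package. This is a hypothetical simplification for illustration, not the actual libbeat `match` implementation; `isLiteral` is an assumed helper name:

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"strings"
)

// isLiteral reports whether the pattern reduces to a plain literal, in
// which case matching can be done with strings.Contains instead of the
// full regexp engine. (Sketch only; the real match.Matcher also handles
// prefixes, alternations of literals, empty-line checks, etc.)
func isLiteral(pattern string) (string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", false
	}
	re = re.Simplify()
	if re.Op == syntax.OpLiteral {
		return string(re.Rune), true
	}
	return "", false
}

func main() {
	if lit, ok := isLiteral("DEBUG"); ok {
		fmt.Println(strings.Contains("level=DEBUG msg=x", lit)) // true
	}
}
```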
This should simplify the implementation of MetricSets based on HTTP.
* Load Filebeat modules pipelines on -setup

This adds the `-setup` CLI flag, which, for now, makes Filebeat load the
pipelines at startup. In case Elasticsearch is not available when
Filebeat is started with `-setup`, Filebeat will exit with an error.

This also exposes an Elasticsearch client from the output.

* Use an interface instead of the ES client
This PR allows prospectors to be reloaded dynamically. It works the same way as module reloading in Metricbeat.

**Refactoring**

* LoadStates was separated from NewProspector. The reason is that after New, only the ID is needed, while setting up states requires more computation. So this can be done in a second step, once all the validations are done.
* Only allow a prospector to start when all its states are set to Finished. If not, LoadStates returns an error. This prevents a prospector from starting before a harvester has finished with a file. The prospector will be picked up again during the next reloading phase.
* Extract ReloadConfig to libbeat

**Limitations**

This implementation currently has some limitations. These are not new in Filebeat, but they require more care as configurations change more often.

* Two prospectors on one file: it is possible that two prospectors pick up the same file because they define overlapping patterns. As a consequence, two harvesters may run on the same file, which can lead to duplicates and unpredictable behaviour. The risk is minimized in that a prospector does not start as long as a state it takes care of is not Finished, but it can still happen that a Finished state is picked up while it is also managed by another prospector. The user must ensure that prospector paths do not overlap. This problem can potentially be solved in the future with a global harvester registry.

**Notes**

* In a later PR, more refactoring and unification of the reloading should happen.
This changes the Makefile to run coverage-report after fix-permissions. This allows coverage-report to fully traverse the contents of the build dir.
Uses an intermediary `_meta/module.generated/` folder which is created on
`make collect`. This gives us the opportunity to select which files are
needed to be deployed in the package (currently everything but the `_meta` and
`test` folders).

The intermediary folder also allows us to avoid providing a custom `install-home`
target in Filebeat.

Part of elastic#3159.
This removes the dependency on cookiecutter for generating a beat based on Metricbeat.

Further changes:

* Unify handling of beat_path in the generator scripts
* Merge generator script for both beats types
* Move the metricbeat generator out of the metricset directory, as it generates a beat; generating a metricset is only a follow-up call.
* Enable metricbeat modules by default so there is something to get started with.
This adds a new "pipeline" configuration option to the prospector, which
can be used to set the Elasticsearch Ingest Node pipeline from the prospector
config.

While this was already possible by using format strings in the `pipeline` config
from the output, this makes the configuration simpler in many cases and the
mechanism is needed for the Filebeat modules.

Part of elastic#3159.
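A hedged sketch of what such a prospector-level setting could look like; the paths and pipeline name here are illustrative, not taken from the PR:

```
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/access.log
  pipeline: nginx-access-pipeline
```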
The TCP protocol generator is Packetbeat-specific. Similar to the module and metricset generators, it belongs inside the beat.

* The generator was migrated from cookiecutter to a Python script to avoid an additional dependency.
* A makefile target was added to simplify the generation

In the future, collect should fetch all protocols and add them to the imports automatically, the same as for Metricbeat. In addition, it should be possible to use the global generator to create a Packetbeat "shell" to put one's own protocols into.
Configuration looks like this:

```
output.elasticsearch.headers:
  X-My-Header: Contents of the header
```

To use from the CLI:

```
metricbeat -E "output.elasticsearch.headers.X-test=Test value"
```

It's not possible to set the same header name more than once with different values,
but it is possible to separate header values with a comma, which has the same meaning
as per the RFC.

Closes elastic#1768.
The list of files that are owned by root after running the build in docker extends outside of the build dir so just check everything in the repo. These are the files that this change will fix.

```
./filebeat/tests/system/test_load.pyc
./filebeat/tests/system/test_harvester.pyc
./filebeat/tests/system/filebeat.pyc
./filebeat/tests/system/test_fields.pyc
./filebeat/tests/system/test_multiline.pyc
./filebeat/tests/system/test_shutdown.pyc
./filebeat/tests/system/test_processors.pyc
./filebeat/tests/system/test_registrar.pyc
./filebeat/tests/system/test_modules.pyc
./filebeat/tests/system/test_publisher.pyc
./filebeat/tests/system/test_prospector.pyc
./filebeat/tests/system/test_reload.pyc
./filebeat/tests/system/test_json.pyc
./filebeat/tests/system/test_crawler.pyc
./filebeat/tests/system/test_migration.pyc
./filebeat/tests/files/logs/nasa-50k.log
./filebeat/data
find: `./filebeat/data': Permission denied
./filebeat/filebeat.test
./metricbeat/tests/system/test_kafka.pyc
./metricbeat/tests/system/test_base.pyc
./metricbeat/tests/system/metricbeat.pyc
./metricbeat/tests/system/test_docker.pyc
./metricbeat/tests/system/test_processors.pyc
./metricbeat/tests/system/test_mongodb.pyc
./metricbeat/tests/system/test_haproxy.pyc
./metricbeat/tests/system/test_postgresql.pyc
./metricbeat/tests/system/test_config.pyc
./metricbeat/tests/system/test_zookeeper.pyc
./metricbeat/tests/system/test_redis.pyc
./metricbeat/tests/system/test_reload.pyc
./metricbeat/tests/system/test_prometheus.pyc
./metricbeat/tests/system/test_system.pyc
./metricbeat/tests/system/test_apache.pyc
./metricbeat/tests/system/test_mysql.pyc
./metricbeat/metricbeat.test
./metricbeat/data
find: `./metricbeat/data': Permission denied
./libbeat/dashboards/import_dashboards
./libbeat/tests/system/test_base.pyc
./libbeat/tests/system/beat/beat.pyc
./libbeat/tests/system/beat/__init__.pyc
./libbeat/tests/system/base.pyc
./libbeat/tests/system/test_dashboard.pyc
./libbeat/libbeat.test
./libbeat/data
find: `./libbeat/data': Permission denied
```
* Add docker environment for integration and system testing
* Add system test file to check for correct docs.
* Brings docs in line with generated output
* Update data.json
* Remove hostname fields as already part of metricset
* Apply schema instead of manual conversion
* Rename pool.pool to pool.name
* Remove separate http client as not needed anymore

This is a follow up PR for elastic#3415
* Update data.json with latest data
* Clean up variable naming
* Add system test to also check docs
* Introduce healthcheck in dockerfile to allow automated testing
* Update CHANGELOG
* Rename wrongly named file
elastic#3427)

Expose some simple metrics collection via `libbeat/monitoring` package.

Changes/Features:
- 30s metrics snapshot is now based on `libbeat/monitoring`, adding:
  - support for bool/float/string variables
  - keys are sorted before being printed
- The package manages a hierarchical registry of known KPIs (names are split
  on `.`), with optional support for registering created KPIs with the expvar
  package (metrics registered via expvar cannot be removed).
- All registered metrics must support the `monitoring.Var` interface, providing
  a `Visit` method for reporting metric values. The `monitoring.Visitor`
  explicitly limits the types of values that can be reported, so no `interface{}`
  is used, simplifying and unifying the reporting/collecting of metrics.
- Registry provides a `Do` method to iterate over all variables, with names
  'flattened'; thanks to the limitations imposed by `monitoring.Var/Visitor`,
  the reported values can only be int64, float64, string, bool, or untyped nil.
- Having a registry lowers the chance of typos, e.g.:

```
var (
    metrics   = monitoring.Default.NewRegistry("libbeat.outputs.logstash")
    bytesSend = metrics.NewInt("bytes_send")
    // ...
)
```

- The package adds support for dynamic removal of metrics.
- It provides an adapter for go-metrics (e.g. to collect stats from the kafka
  output) with selective whitelisting/renaming, for collecting stats we would
  otherwise have no access to.
This is a follow-up for elastic#3405, to avoid having a nil moduleRegistry.
Also adds a unit test for that case.
Now that prospectors have a pipeline config option, use it in the
filebeat modules. This gets rid of the requirement to have a weird
pipeline option in the output, and removes one hack from the fields.

Part of elastic#3159.
This will allow using the protocol generator and directly creating the correct imports.
* Use matchers in processor conditionals

* Filebeat include/exclude lines/files using match.Matcher

* Update filebeat multiline to use match.Matcher

* Metricbeat system module whitelist using new string matcher

* Remove `match` conditional

* Update changelog
tsg and others added 28 commits February 3, 2017 14:43
This adds the Beat version to the pipeline ID, which means that if
we change the pipeline between versions, the new version will be used
automatically. It also means that one can run different versions of the
same Beat and the pipelines won't override each other. The pipelines
are loaded automatically on the Beat start.

Part of elastic#3159.
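The versioned-naming idea can be sketched like this; the function name and exact format string are assumptions for illustration, not necessarily what Filebeat uses:

```go
package main

import "fmt"

// pipelineID composes an Ingest Node pipeline ID that includes the Beat
// version, so pipelines from different Beat versions do not override
// each other. (Hypothetical sketch of the scheme described above.)
func pipelineID(beat, version, module, fileset string) string {
	return fmt.Sprintf("%s-%s-%s-%s", beat, version, module, fileset)
}

func main() {
	fmt.Println(pipelineID("filebeat", "5.3.0", "nginx", "access"))
	// prints "filebeat-5.3.0-nginx-access"
}
```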
* Introduce Runner interface and wrappers
* Remove duplicated code in filebeat and metricbeat
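A minimal sketch of what such a Runner abstraction might look like; the interface shape and helper names are assumptions, not the actual libbeat API:

```go
package main

import "fmt"

// Runner is a hypothetical rendering of the shared interface described
// above: anything that can be started and stopped can be managed by the
// same reloading machinery in Filebeat and Metricbeat.
type Runner interface {
	Start()
	Stop()
}

// stopAll stops every managed runner, e.g. before applying a reloaded
// configuration. (Sketch only; the real wrappers do more bookkeeping.)
func stopAll(runners []Runner) {
	for _, r := range runners {
		r.Stop()
	}
}

// fakeRunner tracks its lifecycle so the behaviour is observable.
type fakeRunner struct {
	name    string
	running bool
}

func (f *fakeRunner) Start() { f.running = true }
func (f *fakeRunner) Stop()  { f.running = false }

func main() {
	rs := []Runner{&fakeRunner{name: "prospector-1"}, &fakeRunner{name: "module-a"}}
	for _, r := range rs {
		r.Start()
	}
	stopAll(rs)
	fmt.Println("all stopped:", !rs[0].(*fakeRunner).running) // all stopped: true
}
```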
Done for:
- nginx
- apache

MySQL seems to use the Windows event log by default.

Syslog doesn't really exist on Windows. Currently no paths are defined, which
results in an error when starting Filebeat.

Part of elastic#3159.
On Linux cgroup metric collection will be enabled by default. The feature is no longer labeled as experimental. The configuration option has been renamed from `cgroups` to `process.cgroups.enabled`.
We use dynamic fields in some places, and date detection is enabled by default, so if a Beat sends a dynamic string field that looks like a date, the field mapping will be `date`. This means that all subsequent data using the same field name must also be a date, or a field mapping exception will occur. Disabling date detection makes the generated index mappings more predictable.

Closes elastic#3389
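In an Elasticsearch index template this corresponds to setting `date_detection` to `false` in the mappings. A sketch, with illustrative template and index names (the `_default_` mapping was the 5.x-era convention):

```
{
  "template": "filebeat-*",
  "mappings": {
    "_default_": {
      "date_detection": false
    }
  }
}
```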
…#3526)

* Move all docker image specific configs into the modules by using a Dockerfile for each module
* Move environment variables to module and link it from the composer file
* Remove duplicated code for `wait_for` function

This change will make it possible to partially auto generate the compose file and it brings the environment config closer to the module itself.
* Example for 'packetbeat.runopts' is now valid YAML.

* Log output now correctly dereferences UID, GID variables.
Configuration reloading -> beta[]
Filebeat symlinks -> GA
Filebeat harvester_limit -> GA
Filebeat publish_async -> experimental & deprecated
Using environment variables in the Configuration -> GA
Metricbeat ceph -> beta[]
Metricbeat couchbase -> beta[]
Metricbeat docker -> beta[]
Metricbeat haproxy -> GA
Metricbeat kafka -> beta[]
Metricbeat php_fpm -> beta[]
Metricbeat prometheus -> beta[]

This also cleans up the CHANGELOG.
This replaces the `fields.source_type` hack with a `fileset` object that
resembles the Metricbeat `metricset` object.

The implementation still uses a hack: it adds two hidden options to the prospector
config, but it's the lesser evil IMHO.

Part of elastic#3159.
Currently `close_timeout` does not apply when the output is blocked. This PR changes the behavior of `close_timeout` to also close the file handler when the output is blocked.

It is important to note that this closes the file handler but NOT the harvester. This matters because closing the harvester requires a state update to set `state.Finished=true`. If that did not happen and the harvester were closed, processing would not continue when the output becomes available again.

Previously, the internal state of a harvester was updated when the event was created. This could lead to a missing event if the event was not sent but the state update went through. This is now prevented by overwriting the internal state only after the event was successfully sent.

The done channels of the prospector and harvester are renamed to make it more obvious which one belongs to what: h.done -> h.prospectorDone, h.harvestDone -> h.done. As the harvester channel is closed by the `stop` method in all cases, `h.done` is sufficient in most places.

This PR does not solve the problem of reloading and stopping a harvester mentioned in elastic#3511 (comment). That will be done in a follow-up PR.
* Fix typo spacial to spatial. Change field names couchbase.node.couch.*.actual_disk_size.* to couchbase.node.couch.*.disk_size.* for consistency.
* Add system tests to verify docs
* Update data.json
In the most recent version of the Docker image, the config path seems to have changed.
* Configuration files must not be writable by other users
This PR adds enforcement of ownership and file permissions on configuration files. Any configuration file must be owned by the same user that the Beat is running as and the file must not be writable by anyone other than the owner.

This strict permission checking is limited to platforms with POSIX file permissions. The DACLs used by Windows are not checked at this time.

The check can be disabled on the CLI with `-strict.perms=false` or by setting env var `BEAT_STRICT_PERMS=false`.

* Update jenkins_ci to fix umask on git clone
During some testing with Filebeat I realised that when a relative path glob is put into the Filebeat config, both the event and the registry state will contain the relative path. In most cases this should not be an issue, and so far no issues have been reported.

For the state itself this is not an issue, as states are compared based on inode/device. It could become an issue on restart if a config was changed from a relative to an absolute path and the prospector does not detect that the state belongs to the same prospector. This could also have an effect when migrating to this solution: old states could be left over in the registry file. But that requires that someone was using relative paths before, which was never recommended.
Add docs for -once command line option
@tsg tsg added the review label Feb 7, 2017
@monicasarbu monicasarbu merged commit 61ef81c into elastic:5.x Feb 7, 2017