
Automatic merge from master to 5.x branch #3554

Merged

Conversation

@tsg tsg commented Feb 7, 2017

Minor conflicts solved in the docs.

Please do a merge commit on this one, don't squash.

ruflin and others added 30 commits January 20, 2017 12:05
Now that 5.x is released, we no longer need the configs for the 1.x release in the docs.
Provide match.Matcher and match.ExactMatcher, which use regular expressions for
the matching use case only.

The matchers compile a regular expression into a Matcher, which only provides
the Match functionality. This gives us a chance to optimize/replace some common
cases used for matching:
- replace capture-groups by non-capturing groups
- remove leading/trailing `.*` expressions (Match already searches for
  sub-string matching the regex)
- replace simple literal searches with `==`, `strings.Contains`, and
  `strings.HasPrefix`
- replace regex for alternative literals (e.g. `DEBUG|INFO|ERROR`) with
  strings.Contains over set of literals
- optimized empty-lines checks

If the input regular expression cannot be reduced to one of these simple cases,
regexp.Regexp will be used.

The `ExactMatcher` will embed `<regex>` into `^<regex>$` by default.

Note: Matcher currently does not split mixed cases, e.g. `abc.*def` or
`abc.def` will still fall back to regexp.Regexp.
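The literal-detection idea above can be sketched with Go's `regexp/syntax` package. This is a hypothetical simplification for illustration, not the actual libbeat `match` implementation; `isLiteral` is an assumed helper name:

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"strings"
)

// isLiteral reports whether the pattern reduces to a plain literal, in
// which case matching can be done with strings.Contains instead of the
// full regexp engine. (Sketch only; the real match.Matcher also handles
// prefixes, alternations of literals, empty-line checks, etc.)
func isLiteral(pattern string) (string, bool) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", false
	}
	re = re.Simplify()
	if re.Op == syntax.OpLiteral {
		return string(re.Rune), true
	}
	return "", false
}

func main() {
	if lit, ok := isLiteral("DEBUG"); ok {
		fmt.Println(strings.Contains("level=DEBUG msg=x", lit)) // true
	}
}
```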
This should simplify the implementation of MetricSets based on HTTP.
* Load Filebeat modules pipelines on -setup

This adds the `-setup` CLI flag, which, for now, makes Filebeat load the
pipelines at startup. In case Elasticsearch is not available when
Filebeat is started with `-setup`, Filebeat will exit with an error.

This also exposes an Elasticsearch client from the output.

* Use an interface instead of the ES client
This PR allows prospectors to be reloaded dynamically. It works the same way as module reloading in Metricbeat.

**Refactoring**

* LoadStates was separated from NewProspector. The reason is that after New, only the ID is needed, while setting up states requires more computation. So this can be done in a second step, once all the validations are done.
* Only allow a prospector to start when all its states are set to Finished. If not, LoadStates returns an error. This prevents a prospector from starting before a harvester has finished with a file. The prospector will be picked up again during the next reloading phase.
* Extract ReloadConfig to libbeat

**Limitations**

This implementation currently has some limitations. These are not new in Filebeat, but they require more care as configurations change more often.

* Two prospectors on one file: it is possible that two prospectors pick up the same file because they define overlapping patterns. As a consequence, two harvesters may run on the same file, which can lead to duplicates and unpredictable behaviour. The risk is minimized in that a prospector does not start as long as a state it takes care of is not Finished, but it can still happen that a Finished state is picked up while it is also managed by another prospector. The user must ensure that prospector paths do not overlap. This problem can potentially be solved in the future with a global harvester registry.

**Notes**

* In a later PR, more refactoring and unification of the reloading should happen.
This changes the Makefile to run coverage-report after fix-permissions. This allows coverage-report to fully traverse the contents of the build dir.
Uses an intermediary `_meta/module.generated/` folder which is created on
`make collect`. This gives us the opportunity to select which files are
needed to be deployed in the package (currently everything but the `_meta` and
`test` folders).

The intermediary folder also allows us to avoid providing a custom `install-home`
target in Filebeat.

Part of elastic#3159.
This removes the dependency on cookiecutter for generating a beat based on Metricbeat.

Further changes:

* Unify handling of beat_path in the generator scripts
* Merge generator script for both beats types
* Move the metricbeat generator out of the metricset directory, as it generates a beat; generating a metricset is only a follow-up call.
* Enable metricbeat modules by default so there is something to get started with.
This adds a new "pipeline" configuration option to the prospector, which
can be used to set the Elasticsearch Ingest Node pipeline from the prospector
config.

While this was already possible by using format strings in the `pipeline` config
from the output, this makes the configuration simpler in many cases and the
mechanism is needed for the Filebeat modules.

Part of elastic#3159.
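A hedged sketch of what such a prospector-level setting could look like; the paths and pipeline name here are illustrative, not taken from the PR:

```
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/nginx/access.log
  pipeline: nginx-access-pipeline
```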
The TCP protocol generator is Packetbeat-specific. Similar to the module and metricset generators, it belongs inside the beat.

* The generator was migrated from cookiecutter to a Python script to avoid an additional dependency.
* A makefile target was added to simplify the generation

In the future, collect should fetch all protocols and add them to the imports automatically, the same as for Metricbeat. In addition, it should be possible to use the global generator to create a Packetbeat "shell" to put one's own protocols into.
Configuration looks like this:

```
output.elasticsearch.headers:
  X-My-Header: Contents of the header
```

To use from the CLI:

```
metricbeat -E "output.elasticsearch.headers.X-test=Test value"
```

It's not possible to set the same header name more than once with different values,
but it is possible to separate header values with a comma, which has the same meaning
as per the RFC.

Closes elastic#1768.
The list of files that are owned by root after running the build in docker extends outside of the build dir so just check everything in the repo. These are the files that this change will fix.

```
./filebeat/tests/system/test_load.pyc
./filebeat/tests/system/test_harvester.pyc
./filebeat/tests/system/filebeat.pyc
./filebeat/tests/system/test_fields.pyc
./filebeat/tests/system/test_multiline.pyc
./filebeat/tests/system/test_shutdown.pyc
./filebeat/tests/system/test_processors.pyc
./filebeat/tests/system/test_registrar.pyc
./filebeat/tests/system/test_modules.pyc
./filebeat/tests/system/test_publisher.pyc
./filebeat/tests/system/test_prospector.pyc
./filebeat/tests/system/test_reload.pyc
./filebeat/tests/system/test_json.pyc
./filebeat/tests/system/test_crawler.pyc
./filebeat/tests/system/test_migration.pyc
./filebeat/tests/files/logs/nasa-50k.log
./filebeat/data
find: `./filebeat/data': Permission denied
./filebeat/filebeat.test
./metricbeat/tests/system/test_kafka.pyc
./metricbeat/tests/system/test_base.pyc
./metricbeat/tests/system/metricbeat.pyc
./metricbeat/tests/system/test_docker.pyc
./metricbeat/tests/system/test_processors.pyc
./metricbeat/tests/system/test_mongodb.pyc
./metricbeat/tests/system/test_haproxy.pyc
./metricbeat/tests/system/test_postgresql.pyc
./metricbeat/tests/system/test_config.pyc
./metricbeat/tests/system/test_zookeeper.pyc
./metricbeat/tests/system/test_redis.pyc
./metricbeat/tests/system/test_reload.pyc
./metricbeat/tests/system/test_prometheus.pyc
./metricbeat/tests/system/test_system.pyc
./metricbeat/tests/system/test_apache.pyc
./metricbeat/tests/system/test_mysql.pyc
./metricbeat/metricbeat.test
./metricbeat/data
find: `./metricbeat/data': Permission denied
./libbeat/dashboards/import_dashboards
./libbeat/tests/system/test_base.pyc
./libbeat/tests/system/beat/beat.pyc
./libbeat/tests/system/beat/__init__.pyc
./libbeat/tests/system/base.pyc
./libbeat/tests/system/test_dashboard.pyc
./libbeat/libbeat.test
./libbeat/data
find: `./libbeat/data': Permission denied
```
* Add docker environment for integration and system testing
* Add system test file to check for correct docs.
* Brings docs in line with generated output
* Update data.json
* Remove hostname fields as already part of metricset
* Apply schema instead of manual conversion
* Rename pool.pool to pool.name
* Remove separate http client as not needed anymore

This is a follow up PR for elastic#3415
* Update data.json with latest data
* Clean up variable naming
* Add system test to also check docs
* Introduce healthcheck in dockerfile to allow automated testing
* Update CHANGELOG
* Rename wrongly named file
elastic#3427)

Expose some simple metrics collection via `libbeat/monitoring` package.

Changes/Features:
- 30s metrics snapshot is now based on `libbeat/monitoring`, adding:
  - support for bool/float/string variables
  - keys are sorted before being printed
- The package manages a hierarchical registry of known KPIs (names are split
  on `.`), with optional support for registering created KPIs with the expvar
  package (metrics registered via expvar cannot be removed).
- All registered metrics must support the `monitoring.Var` interface, providing
  a `Visit` method for reporting metric values. The `monitoring.Visitor`
  explicitly limits the types of values that can be reported, so no `interface{}`
  is used, simplifying and unifying the reporting/collecting of metrics.
- Registry provides a `Do` method to iterate over all variables, with names
  'flattened'; thanks to the limitations imposed by `monitoring.Var/Visitor`,
  the reported values can only be int64, float64, string, bool, or untyped nil.
- Having a registry lowers the chance of typos, e.g.:

```
var (
    metrics   = monitoring.Default.NewRegistry("libbeat.outputs.logstash")
    bytesSend = metrics.NewInt("bytes_send")
    // ...
)
```

- The package adds support for dynamic removal of metrics.
- It provides an adapter for go-metrics (e.g. to collect stats from the kafka
  output) with selective whitelisting/renaming, for collecting stats we would
  otherwise have no access to.
This is a follow-up for elastic#3405, to avoid having a nil moduleRegistry.
Also adds a unit test for that case.
Now that prospectors have a pipeline config option, use it in the
filebeat modules. This gets rid of the requirement to have a weird
pipeline option in the output, and removes one hack from the fields.

Part of elastic#3159.
This will allow using the protocol generator and directly creating the correct imports.
* Use matchers in processor conditionals

* Filebeat include/exclude lines/files using match.Matcher

* Update filebeat multiline to use match.Matcher

* Metricbeat system module whitelist using new string matcher

* Remove `match` conditional

* Update changelog
tsg and others added 28 commits February 3, 2017 14:43
This adds the Beat version to the pipeline ID, which means that if
we change the pipeline between versions, the new version will be used
automatically. It also means that one can run different versions of the
same Beat and the pipelines won't override each other. The pipelines
are loaded automatically on the Beat start.

Part of elastic#3159.
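The versioned-naming idea can be sketched like this; the function name and exact format string are assumptions for illustration, not necessarily what Filebeat uses:

```go
package main

import "fmt"

// pipelineID composes an Ingest Node pipeline ID that includes the Beat
// version, so pipelines from different Beat versions do not override
// each other. (Hypothetical sketch of the scheme described above.)
func pipelineID(beat, version, module, fileset string) string {
	return fmt.Sprintf("%s-%s-%s-%s", beat, version, module, fileset)
}

func main() {
	fmt.Println(pipelineID("filebeat", "5.3.0", "nginx", "access"))
	// prints "filebeat-5.3.0-nginx-access"
}
```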
* Introduce Runner interface and wrappers
* Remove duplicated code in filebeat and metricbeat
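A minimal sketch of what such a Runner abstraction might look like; the interface shape and helper names are assumptions, not the actual libbeat API:

```go
package main

import "fmt"

// Runner is a hypothetical rendering of the shared interface described
// above: anything that can be started and stopped can be managed by the
// same reloading machinery in Filebeat and Metricbeat.
type Runner interface {
	Start()
	Stop()
}

// stopAll stops every managed runner, e.g. before applying a reloaded
// configuration. (Sketch only; the real wrappers do more bookkeeping.)
func stopAll(runners []Runner) {
	for _, r := range runners {
		r.Stop()
	}
}

// fakeRunner tracks its lifecycle so the behaviour is observable.
type fakeRunner struct {
	name    string
	running bool
}

func (f *fakeRunner) Start() { f.running = true }
func (f *fakeRunner) Stop()  { f.running = false }

func main() {
	rs := []Runner{&fakeRunner{name: "prospector-1"}, &fakeRunner{name: "module-a"}}
	for _, r := range rs {
		r.Start()
	}
	stopAll(rs)
	fmt.Println("all stopped:", !rs[0].(*fakeRunner).running) // all stopped: true
}
```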
Done for:
- nginx
- apache

MySQL seems to use the Windows event log by default.

Syslog doesn't really exist on Windows. Currently no paths are defined, which
results in an error when starting Filebeat.

Part of elastic#3159.
On Linux cgroup metric collection will be enabled by default. The feature is no longer labeled as experimental. The configuration option has been renamed from `cgroups` to `process.cgroups.enabled`.
We use dynamic fields in some places, and date detection is enabled by default, so if a Beat sends a dynamic string field that looks like a date, the field mapping will be `date`. This means that all subsequent data using the same field name must also be a date, or a field mapping exception will occur. Disabling date detection makes the generated index mappings more predictable.

Closes elastic#3389
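In an Elasticsearch index template this corresponds to setting `date_detection` to `false` in the mappings. A sketch, with illustrative template and index names (the `_default_` mapping was the 5.x-era convention):

```
{
  "template": "filebeat-*",
  "mappings": {
    "_default_": {
      "date_detection": false
    }
  }
}
```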
…#3526)

* Move all docker image specific configs into the modules by using a Dockerfile for each module
* Move environment variables to module and link it from the composer file
* Remove duplicated code for `wait_for` function

This change will make it possible to partially auto generate the compose file and it brings the environment config closer to the module itself.
* Example for 'packetbeat.runopts' is now valid YAML.

* Log output now correctly dereferences UID, GID variables.
Configuration reloading -> beta[]
Filebeat symlinks -> GA
Filebeat harvester_limit -> GA
Filebeat publish_async -> experimental & deprecated
Using environment variables in the Configuration -> GA
Metricbeat ceph -> beta[]
Metricbeat couchbase -> beta[]
Metricbeat docker -> beta[]
Metricbeat haproxy -> GA
Metricbeat kafka -> beta[]
Metricbeat php_fpm -> beta[]
Metricbeat prometheus -> beta[]

This also cleans up the CHANGELOG.
This replaces the `fields.source_type` hack with a `fileset` object that
resembles the Metricbeat `metricset` object.

The implementation still uses a hack: it adds two hidden options to the prospector
config, but it's the lesser evil IMHO.

Part of elastic#3159.
Currently `close_timeout` does not apply when the output is blocked. This PR changes the behavior of `close_timeout` to also close the file handler when the output is blocked.

It is important to note that this closes the file handler but NOT the harvester. This matters because closing the harvester requires a state update to set `state.Finished=true`. If that did not happen and the harvester were closed, processing would not continue when the output becomes available again.

Previously, the internal state of a harvester was updated when the event was created. This could lead to a missing event if the event was not sent but the state update went through. This is now prevented by overwriting the internal state only after the event was successfully sent.

The done channels of the prospector and harvester are renamed to make it more obvious which one belongs to what: h.done -> h.prospectorDone, h.harvestDone -> h.done. As the harvester channel is closed by the `stop` method in all cases, `h.done` is sufficient in most places.

This PR does not solve the problem of reloading and stopping a harvester mentioned in elastic#3511 (comment). That will be done in a follow-up PR.
* Fix typo spacial to spatial. Change field names couchbase.node.couch.*.actual_disk_size.* to couchbase.node.couch.*.disk_size.* for consistency.
* Add system tests to verify docs
* Update data.json
In the most recent version of the Docker image, the config path seems to have changed.
* Configuration files must not be writable by other users
This PR adds enforcement of ownership and file permissions on configuration files. Any configuration file must be owned by the same user that the Beat is running as and the file must not be writable by anyone other than the owner.

This strict permission checking is limited to platforms with POSIX file permissions. The DACLs used by Windows are not checked at this time.

The check can be disabled on the CLI with `-strict.perms=false` or by setting env var `BEAT_STRICT_PERMS=false`.

* Update jenkins_ci to fix umask on git clone
During some testing with Filebeat I realised that when a relative path glob is put into the Filebeat config, both the event and the registry state will contain the relative path. In most cases this should not be an issue, and so far no issues have been reported.

For the state itself this is not an issue, as states are compared based on inode/device. It could become an issue on restart if a config was changed from a relative to an absolute path and the prospector does not detect that the state belongs to the same prospector. This could also have an effect when migrating to this solution: old states could be left over in the registry file. But that requires that someone was using relative paths before, which was never recommended.
Add docs for -once command line option
@tsg tsg added the review label Feb 7, 2017
@monicasarbu monicasarbu merged commit 61ef81c into elastic:5.x Feb 7, 2017