I have a process that posts measurements to Telegraf, each with a timestamp taken from the data being processed. Since it usually works on (almost) real-time data, the timestamps are close to the current time. Occasionally, however, it is asked to reprocess old data, and the measurements are sent again with their original timestamps.
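For context, a write from such a process could look like the sketch below: each point is stamped with the time of the data it describes, not the time it is sent. It assumes InfluxDB line protocol with nanosecond precision (matching `precision=ns` in the write URL used in the reproduction steps); the helper, measurement and field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_line_protocol(measurement: str, fields: dict, data_time: datetime) -> str:
    """Format one point, stamped with the time of the data (not the send time).

    Hypothetical helper: no escaping is done, so names must not contain
    spaces, commas or equals signs. `data_time` must be timezone-aware.
    """
    # Integer nanoseconds since the epoch, computed without float rounding.
    ns = (data_time - EPOCH) // timedelta(microseconds=1) * 1000
    # Every field is written as a float value.
    field_str = ",".join(f"{key}={float(val)}" for key, val in sorted(fields.items()))
    return f"{measurement} {field_str} {ns}"

# A point reprocessed today still carries its original timestamp:
print(to_line_protocol("processed_data", {"value": 1.5},
                       datetime(2019, 1, 1, tzinfo=timezone.utc)))
# processed_data value=1.5 1546300800000000000
```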
If the database has a default retention policy, all metrics in the Chronograf dashboard are delayed by a few minutes while older data is being reprocessed (the exact delay seems to vary between environments). Even after the process stops emitting past events, the dashboard keeps lagging for a while before returning to normal.
Setting the retention policy is essential to reproduce the issue. Points older than the retention period cause partial writes at the InfluxDB level, and Telegraf then seems confused: it appears to hold back other measurements as well, even those produced by a different input plugin. However, I have not seen any missing measurement values once it returns to normal.
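To see which reprocessed points trigger the partial write, the rule can be modelled roughly as below. This is a simplified sketch, assuming InfluxDB 1.x drops points older than `now - duration` for the target policy; the 4-week duration comes from the `realtime` policy created in the setup steps, and `beyond_retention` is a hypothetical name, not InfluxDB code.

```python
from datetime import datetime, timedelta, timezone

# DURATION 4w from the CREATE RETENTION POLICY statement in the setup steps.
RETENTION = timedelta(weeks=4)

def beyond_retention(point_time: datetime, now: datetime,
                     retention: timedelta = RETENTION) -> bool:
    """Simplified model: a point older than now - retention is dropped,
    which InfluxDB reports as a partial write for the batch."""
    return point_time < now - retention

now = datetime.now(timezone.utc)
print(beyond_retention(now - timedelta(days=1), now))   # False: kept
print(beyond_retention(now - timedelta(weeks=5), now))  # True: dropped
```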
Use the Telegraf configuration provided in the attachment.
Start the environment using: docker-compose up
Create a default retention policy on the telegraf database: docker-compose run influxdb-cli -execute 'CREATE RETENTION POLICY realtime ON telegraf DURATION 4w REPLICATION 1 DEFAULT;'
Open Chronograf in a browser (localhost:8888), go to the host list, and open the "system" dashboard for your host.
Set the refresh interval to "Every 10s" and the time range to "Past 15 minutes".
Wait a few minutes for data points to be collected.
Check that the collected data is up to date (for example, hover over the CPU usage chart and compare the tooltip time with the current time).
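As an alternative to the tooltip, the freshness of the CPU data can be checked through the InfluxDB 1.x `/query` HTTP API. This is a sketch under assumptions: InfluxDB is reachable on its default port 8086 (the compose file may map it differently), and the `cpu` measurement has a `usage_idle` field, as produced by Telegraf's cpu input. The function names are illustrative.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: InfluxDB 1.x HTTP API on its default port.
QUERY_URL = "http://localhost:8086/query"

def parse_last_time(body: dict) -> int:
    """Extract the timestamp of the single row returned by a last() query.

    Expected shape: {"results": [{"series": [{"columns": ["time", "last"],
    "values": [[<ns>, <value>]]}]}]}
    """
    return body["results"][0]["series"][0]["values"][0][0]

def latest_cpu_ns(url: str = QUERY_URL) -> int:
    params = urllib.parse.urlencode({
        "db": "telegraf",
        "epoch": "ns",  # integer nanosecond timestamps instead of RFC3339 strings
        "q": 'SELECT last("usage_idle") FROM "cpu"',
    })
    with urllib.request.urlopen(f"{url}?{params}") as resp:
        return parse_last_time(json.load(resp))

def lag_seconds(last_ns: int, now_ns: Optional[int] = None) -> float:
    """How far (in seconds) the newest stored point is behind the current time."""
    if now_ns is None:
        now_ns = time.time_ns()
    return (now_ns - last_ns) / 1e9
```

Before reproducing, `lag_seconds(latest_cpu_ns())` should stay small, on the order of the collection and flush intervals; while the issue occurs it would be expected to grow to several minutes.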
Then start reproducing the issue:
Post a few events with timestamps older than the retention policy allows: curl -i -XPOST "http://localhost:8186/write?db=telegraf&precision=ns" --data-binary "@test.txt"
Wait 1 or 2 minutes and confirm that most of the measurements no longer reach the dashboard. You should see a gap on almost all charts (at least on those that refresh their X axis).
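The actual `test.txt` is in the attachment and is not reproduced here. A file with a similar effect could be generated with the sketch below, which writes points eight weeks old, well past the 4-week policy. The measurement `reprocessed`, tag `source=replay`, field `value` and the helper names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def old_points(n: int, age: timedelta, now: datetime) -> List[str]:
    """Build n line-protocol points stamped `age` before `now`, one second apart."""
    lines = []
    for i in range(n):
        t = now - age - timedelta(seconds=i)
        ns = (t - EPOCH) // timedelta(microseconds=1) * 1000
        # Hypothetical measurement/tag/field; the "i" suffix marks an integer value.
        lines.append(f"reprocessed,source=replay value={i}i {ns}")
    return lines

def write_test_file(path: str = "test.txt", n: int = 10,
                    age: timedelta = timedelta(weeks=8)) -> None:
    # Eight weeks is well beyond the 4w "realtime" retention policy.
    pts = old_points(n, age, datetime.now(timezone.utc))
    with open(path, "w") as f:
        f.write("\n".join(pts) + "\n")
```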
If it does not reproduce for you, try posting several times (5 times, 1 or 2 seconds apart, seems to be enough for me).
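The repeated posts can be scripted roughly as follows. This sketch reuses the write URL from the curl command; `post_repeatedly` and `http_post` are illustrative names, and the `post` parameter only exists so the sender can be swapped out. Like curl's `--data-binary`, it sends the file body unchanged.

```python
import time
import urllib.request
from typing import Callable, List

# Telegraf write endpoint taken from the curl command above.
WRITE_URL = "http://localhost:8186/write?db=telegraf&precision=ns"

def http_post(url: str, body: bytes) -> int:
    """POST the raw line-protocol body and return the HTTP status."""
    req = urllib.request.Request(url, data=body, method="POST")
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return resp.status

def post_repeatedly(body: bytes, times: int = 5, interval: float = 1.5,
                    post: Callable[[str, bytes], int] = http_post,
                    url: str = WRITE_URL) -> List[int]:
    """Send the same batch several times, `interval` seconds apart."""
    statuses = []
    for i in range(times):
        statuses.append(post(url, body))
        if i < times - 1:  # no need to wait after the last post
            time.sleep(interval)
    return statuses
```

For example, `post_repeatedly(pathlib.Path("test.txt").read_bytes())` (with `import pathlib`) posts the file five times, 1.5 seconds apart.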
The Telegraf logs should show something along the lines of:
E! InfluxDB Output Error: Response Error: Status Code [400], expected [204], [partial write: points beyond retention policy dropped=xx]
E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
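The `xx` in the first message is the number of points dropped from that batch. When watching the logs during the reproduction, a small helper like the one below could extract that count from each line. It is an illustrative sketch; the pattern is copied from the message quoted above, and `dropped_points` is a hypothetical name.

```python
import re
from typing import Optional

# Pattern taken from the Telegraf error message quoted above.
PARTIAL_WRITE = re.compile(r"partial write: points beyond retention policy dropped=(\d+)")

def dropped_points(log_line: str) -> Optional[int]:
    """Return the number of dropped points reported in a log line, or None."""
    match = PARTIAL_WRITE.search(log_line)
    return int(match.group(1)) if match else None
```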
Expected behaviour: no delay at all (or close to none) for unrelated metrics, especially those coming from other plugins.
Actual behaviour: no new metric values are available for a variable period (at least 4-5 minutes, sometimes much longer).
Environment: Docker (docker-compose) on Linux.
issue_telegraf.zip