Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewriting index with tsm very slow when restarting server #4952

Closed
cheribral opened this issue Dec 2, 2015 · 3 comments
Closed

rewriting index with tsm very slow when restarting server #4952

cheribral opened this issue Dec 2, 2015 · 3 comments
Labels

Comments

@cheribral
Copy link

Running from git, we've noticed the server just "disappear" after a while while the process is still running. This isn't a big deal since we know we are in for some of that, but something possibly worth mentioning is that the restarts have been taking an incredibly long time. During the restart the server is mostly idle with only a couple CPUs busy. I see not much happening on the disks at all. As you can see here, there are quite large chunks of time while it rewrites index files.

Dec 02 02:26:48 influxdb influxd[21949]: [tsm1] 2015/12/02 02:26:48 rewriting index file /influxdb/data/telegraf/3months/26741/0000839.tsm1 with /influxdb/data/telegraf/3months/26741/0000841.tsm1
Dec 02 02:28:55 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:28:55 /influxdb/data/telegraf/3months/26741 flush to index took 2m7.084940928s
Dec 02 02:28:56 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:28:56 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
Dec 02 02:28:56 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:28:56 tsm1 WAL writing to /influxdb/data/telegraf/3months/28299
Dec 02 02:28:57 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:28:57 startup flush of /influxdb/data/telegraf/3months/28299 with 82450 keys and 286496 total values of 4588678 bytes
Dec 02 02:28:58 influxdb influxd[21949]: [tsm1] 2015/12/02 02:28:58 rewriting index file /influxdb/data/telegraf/3months/28299/0000830.tsm1 with /influxdb/data/telegraf/3months/28299/0000832.tsm1
Dec 02 02:31:07 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:31:07 /influxdb/data/telegraf/3months/28299 flush to index took 2m9.970747255s
Dec 02 02:31:08 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:31:08 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
Dec 02 02:31:08 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:31:08 tsm1 WAL writing to /influxdb/data/telegraf/3months/29856
Dec 02 02:31:09 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:31:09 startup flush of /influxdb/data/telegraf/3months/29856 with 82836 keys and 393553 total values of 6303262 bytes
Dec 02 02:31:11 influxdb influxd[21949]: [tsm1] 2015/12/02 02:31:11 rewriting index file /influxdb/data/telegraf/3months/29856/0000489.tsm1 with /influxdb/data/telegraf/3months/29856/0000491.tsm1
Dec 02 02:32:31 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:32:31 /influxdb/data/telegraf/3months/29856 flush to index took 1m21.486255322s
Dec 02 02:32:37 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:32:37 tsm1 WAL starting with 5242880 flush memory size threshold and 104857600 max memory size threshold
Dec 02 02:32:37 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:32:37 tsm1 WAL writing to /influxdb/data/telegraf/3months/31349
Dec 02 02:32:39 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:32:39 startup flush of /influxdb/data/telegraf/3months/31349 with 82816 keys and 490279 total values of 7852805 bytes
Dec 02 02:32:41 influxdb influxd[21949]: [tsm1] 2015/12/02 02:32:41 writing new index file /influxdb/data/telegraf/3months/31349/0000576.tsm1
Dec 02 02:32:41 influxdb influxd[21949]: [tsm1] 2015/12/02 02:32:41 rewriting index file /influxdb/data/telegraf/3months/31349/0000574.tsm1 with /influxdb/data/telegraf/3months/31349/0000577.tsm1
Dec 02 02:32:41 influxdb influxd[21949]: [tsm1] 2015/12/02 02:32:41 rewriting index file /influxdb/data/telegraf/3months/31349/0000573.tsm1 with /influxdb/data/telegraf/3months/31349/0000578.tsm1
Dec 02 02:34:12 influxdb influxd[21949]: [tsm1wal] 2015/12/02 02:34:12 /influxdb/data/telegraf/3months/31349 flush to index took 1m32.614097171s
Dec 02 02:34:12 influxdb influxd[21949]: [tsm1] 2015/12/02 02:34:12 Starting compaction in /influxdb/data/telegraf/3months/31349 of 8 files to new file /influxdb/data/telegraf/3months/31349/0000579.tsm1

This continues for a while, and in the end it takes about 15 - 20 minutes to start the server. I'm guessing that since the server is mostly idle, something isn't quite right. If you need any more information, please let me know.

Cheers

@beckettsean
Copy link
Contributor

@cheribral what version are you running?

@cheribral
Copy link
Author

I have been running from git, but this morning I noticed that it tripped over one of the files because it was corrupt, so perhaps there was something else going on. The box was mistakenly bounced by support a few days ago, so it is possible that the corruption came from that. I've wiped that database, and am starting again. After letting it run for a day or two, I will see if the problem comes back.

If I can cause it to happen again, I'll update the ticket.

@otoolep
Copy link
Contributor

otoolep commented Dec 9, 2015

tsm1 has been significantly re-written in the 0.9.6 release. Even with hundreds of gigabytes, I now see start-up times of less than a minute.

Please try the latest software.

https://influxdata.com/downloads/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants