InfluxDB 0.10.x performance unusable with less than 30 days history #5856
@fxstein Can you run influx_inspect and share the output? Do you have some examples of the kind of data you are writing? If you have some sample writes in line protocol format, that would be really useful. The slow startup is a known issue with loading the in-memory index. If you are able to build the code, I'd be interested to know if #5372 helps.
Looks like homebrew does not install influx_inspect - any pointer as to how to install it? Here are example data feeds, in addition to collectd stats from a few machines - for example, solar PV sensor data from SMA inverters. Have not tested the load improvements, but from watching top and memory consumption, nothing but cutting the memory footprint will help. The last startup swapped more than 500GB of I/O in and out. It seems that for about 16GB of data covering 4 weeks, the memory footprint is well in excess of 20GB of RAM, with the server only having 16GB total.
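For illustration, here is a minimal sketch of what one such IoT write could look like using the influxdb-python client; the measurement, tag, and field names are made up, not taken from the actual SMA feed, and the comment shows the rough line protocol equivalent that was asked about.

```python
# Hypothetical sketch of a single inverter write; all names are illustrative only.
# Rough line protocol equivalent:
#   sma_inverter,inverter=wr5k-01,site=home power_w=2750,yield_kwh=18.4 <ns timestamp>
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="iot")

point = {
    "measurement": "sma_inverter",      # one measurement per device type
    "tags": {
        "inverter": "wr5k-01",          # low-cardinality identifiers only
        "site": "home",
    },
    "fields": {
        "power_w": 2750.0,              # numeric readings belong in fields
        "yield_kwh": 18.4,
    },
    "time": "2016-02-25T18:00:00Z",
}

client.write_points([point])
```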
@fxstein If you have
Based on the code you have referenced, it looks like how you are using tags may be creating the memory issue. Here are a few things I see that could be causing problems:
If you convert the date-based tags to fields, that will probably help. Removing the very high-cardinality tag values should also help. I really need to see the report generated by influx_inspect.
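A sketch of the schema change being suggested, using the influxdb-python client; the measurement, tag, and field names are hypothetical stand-ins for the kind of data described in this thread.

```python
# Sketch of the tag-to-field conversion; all names here are assumptions.
from influxdb import InfluxDBClient

client = InfluxDBClient(database="iot")

# Problematic: date/time values stored as tags. Every new tag value creates a
# brand-new series and grows the in-memory index.
bad_point = {
    "measurement": "mfi_sensor",
    "tags": {
        "sensor": "outlet-3",
        "last_update": "2016-02-25T18:00:05Z",  # changes every few seconds
        "report_date": "2016-02-25",            # date-based tag
    },
    "fields": {"power_w": 12.5},
}

# Better: keep only stable, low-cardinality identifiers as tags and move the
# constantly changing values into fields (string fields are allowed).
good_point = {
    "measurement": "mfi_sensor",
    "tags": {"sensor": "outlet-3"},
    "fields": {
        "power_w": 12.5,
        "last_update": "2016-02-25T18:00:05Z",
        "report_date": "2016-02-25",
    },
}

client.write_points([good_point])
```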
@jwilder, please, is this documented in the FAQ or guidelines? I found a note on series cardinality. The rest is sort of "new", e.g. large tag keys or values.
@jwilder Thank you for the insights. Sorry for my slow response - just came back from a business trip. Going to work on this over the weekend, really want to get to the bottom of this. I have wondered how to best use tags and in which cases. Especially with new IoT devices and APIs it's sometimes totally unknown what data you will receive over time. As a default I have taken anything that is numeric as a field and anything that comes in as a string as a tag - which might create exactly the problem you describe. Having said that, is there any way to change an existing time series, or do I have to start over if I remove certain tags and turn them into fields? I am going to get you the output of influx_inspect.
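As an aside, a minimal sketch of that "numbers become fields, strings become tags" default, and a safer variant that only promotes a known whitelist of keys to tags; the function names and whitelist below are hypothetical.

```python
# The risky default described above: any string value becomes a tag.
def naive_split(reading: dict):
    tags = {k: v for k, v in reading.items() if isinstance(v, str)}
    fields = {k: v for k, v in reading.items() if not isinstance(v, str)}
    return tags, fields

# Safer: only promote known, low-cardinality keys to tags; unknown strings
# (timestamps, UUIDs, status messages) stay as fields.
TAG_WHITELIST = {"device", "site", "sensor"}   # assumed key names

def safer_split(reading: dict):
    tags = {k: str(v) for k, v in reading.items() if k in TAG_WHITELIST}
    fields = {k: v for k, v in reading.items() if k not in TAG_WHITELIST}
    return tags, fields
```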
@jwilder I have inspect running right now, but it might take a while given that the server is OOM most of the time. The header of the inspect output is already pointing at the problem you suggested: high cardinality of tags leading to tons of series.
Need to find a way to eliminate tags that don't make sense. I think the main culprit is the Ubiquiti mFi data, which has timestamps for the last update of a sensor that might change every few seconds.
@jwilder OK, this is definitely the tag cardinality problem you suggested. Within minutes I had a 2+GB inspect output file. Not gonna upload any part of it, as I can see what is happening with some of the tags. Most of my series have some high-cardinality tags, and the worst offender is definitely the mFi sensor data, with 3 timestamps that update every few seconds for every individual sensor. Now for the cleanup: can I simply drop the tags I want to eliminate and free up memory by doing so, or do I need to reload all of that data?
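For a sense of scale, a back-of-envelope calculation with assumed numbers (the sensor count and update rate are guesses, not figures from this thread): when timestamp-like tags change on every update, essentially every write gets a unique tag set and therefore its own series.

```python
# Rough series-count estimate; sensor count and update rate are assumptions.
sensors = 50                     # assumed number of mFi sensors
update_interval_s = 5            # assumed update rate of the timestamp tags
days = 28

updates_per_sensor = days * 24 * 3600 // update_interval_s   # 483,840
series = sensors * updates_per_sensor
print(series)                    # ~24 million series for a single measurement
```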
@fxstein The data is split up by series; the series key is the measurement name plus the tag set (keys and values). I do not believe there is currently a means of combining series by "dropping" tags. I think you'll have to re-load your data and make those tags fields instead.
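A rough sketch of what such a re-load could look like with the influxdb-python client, assuming the old points can be re-read from the original source or an export; the DEMOTE set, helper names, and target database are all assumptions.

```python
# Hypothetical re-load helper: demote the offending tag keys to fields while
# re-writing the points into a fresh, already-created database.
from influxdb import InfluxDBClient

DEMOTE = {"last_update", "report_date"}      # tags to turn into fields (assumed)

def demote_tags(point: dict) -> dict:
    tags = dict(point.get("tags", {}))
    fields = dict(point.get("fields", {}))
    for key in list(tags):
        if key in DEMOTE:
            fields[key] = tags.pop(key)      # tag value becomes a string field
    return {**point, "tags": tags, "fields": fields}

def reload(points, target_db="iot_v2", batch=5000):
    """points: iterable of old-style dicts with 'measurement', 'tags', 'fields', 'time'."""
    client = InfluxDBClient(database=target_db)
    buf = []
    for p in points:
        buf.append(demote_tags(p))
        if len(buf) >= batch:                # write in batches to keep memory low
            client.write_points(buf)
            buf = []
    if buf:
        client.write_points(buf)
```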
Thanks, I was afraid that would be the answer. It makes the whole schemaless, late-binding, "you don't need to design a database" promise a moot point. It means that if you don't know your data upfront and you make a mistake, you are screwed and have to start over. We have learned the same from the likes of MongoDB and so many other non-relational approaches. This always becomes the Achilles heel of such solutions.
I agree with @fxstein. Unfortunately InfluxDB has this weak spot with tags. The approach @desa laid out in #3445 (comment) could be of some help in maintaining evolving tags, though I would prefer to see InfluxDB support tags/metadata as a first-class citizen, treating them as another time series, albeit a low-frequency one (stays constant most of the time but can change at any time), rather than as "additional tagged-along data".
Sorry to say, but InfluxDB has degraded to a point that makes it unusable for me. I was running pre-0.9 versions last year with up to a year of data; since my move to 0.9.x (a disaster) and now 0.10.x (just as bad, if not worse), I cannot handle 28 days of IoT data with less than 1 million events per day.
I hope I am just doing something very wrong - so any help would be very much appreciated.
Dedicated Mac Mini with 16GB of RAM, Dual SSD.
The biggest issue since 0.9.x is that memory consumption goes up linearly with the amount of history, and exceeds the physical storage size of all the data. Whether you query it or not is irrelevant.
Prior to 0.9 I was running 0.8.8 and could keep a year of the same data on the same machine. Multi-month queries would take a while, but startups and loads never experienced any issues.
Now, even without a single query running, the memory footprint grows and grows until the server swaps to death - as in TBs of swap per day for about 16GB of total data in the db.
As has been mentioned, startup times explode and can take multiple hours, but that is not even the worst issue. Every nn minutes the database goes through a cycle where it does the same thing for multiple minutes - running out of all memory and refusing any new writes.
Writes slow down after a few minutes, to the point where single writes can take 9-10+ seconds instead of microseconds/milliseconds.
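As an aside, a small sketch for quantifying that slowdown against a local instance (database and measurement names are placeholders, not from the original setup):

```python
# Time individual writes to see how far latency drifts from the normal us/ms range.
import time
from influxdb import InfluxDBClient

client = InfluxDBClient(database="iot")   # placeholder database name

for i in range(10):
    start = time.time()
    client.write_points([{
        "measurement": "write_latency_probe",   # throwaway measurement
        "fields": {"value": float(i)},
    }])
    print("write took %.3fs" % (time.time() - start))
    time.sleep(1)
```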
Here is the one database in the filesystem I am working with:
Here is a reboot after idling the server, shutting it down, and starting it without any writes hitting the server.
Parallel loading is definitely not the problem and will not improve this, since the culprit is total memory exhaustion that makes a clean boot take hours.