Service silently crashes after start #954
I have the same issue.
After a restart the instance doesn't open any TCP ports. However, it keeps writing to the log:
Debian 7 (wheezy) amd64. InfluxDB 0.8.2, installed from the deb file.
I am also having this problem on a Debian Wheezy install. Things were running fine for about 3 days, then this morning I noticed the graphs had stopped updating and port 8083 was unavailable. I couldn't find anything in the logs and tried updating the package, but I'd rather not lose my data. I was storing mostly small text blurbs with sentiment analysis data.
Same behaviour. I can see the open ports in the netstat output, but can't connect to them.
What version of InfluxDB are you guys running?
I'm not sure which version I was using before, but it was the latest deb package from last week, and then I upgraded to the latest deb again yesterday, which didn't help. I've only been running Influx for a few days, so this is a serious disappointment. It's understandable, though. Hopefully this is a simple fix that is just hard to see.
Here is the full output of /opt/influxdb/shared/log.txt:
I was running 0.8.1 when it experienced the hang-up. I've updated to 0.8.2 in hopes that it would solve the problem, but it did not.
Are you using a single node or a cluster?
Single node.
I am using a single node; I just started learning Influx. @vsviridov Are you running Debian also? I thought maybe I had overloaded InfluxDB with requests and it ran out of memory, or it didn't like some of the text being sent (the strings should have been safe... except I just remembered I forgot to escape " as \", so a lot of broken strings could have been sent to Influx). I am logging IRC messages, and my load average dropped significantly around 7 am, about the time people start waking up, which is when I imagine my Influx server went down.
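As an aside on the escaping worry above: with the 0.8 HTTP API the safest route is to let a JSON serializer build the payload rather than concatenating strings by hand. A minimal sketch with curl; the database name irclogs, the series name, and the default root/root credentials are made up for illustration:

# Write one point whose text field contains double quotes; the JSON payload
# already carries the \" escaping, so InfluxDB receives a valid string.
curl -X POST 'http://localhost:8086/db/irclogs/series?u=root&p=root' \
  -d '[{"name": "irc_messages", "columns": ["text"], "points": [["he said \"hello\""]]}]'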
How many shards do you guys have? You should be able to see all the shards by ls-ing the data directory.
I was using Grafana when it went down.
Can you also answer the other questions?
selby@know:~$ ls /opt/influxdb/shared/data/db/shard_db_v2
00001 00012 00023 00034 00045 00056 00067 00078 00089 00100 00111 00122 00133 00144 00155 00166 00177 00188 00199 00210 00221 00232 00243 00254 00265 00276 00287 00298 00309
00002 00013 00024 00035 00046 00057 00068 00079 00090 00101 00112 00123 00134 00145 00156 00167 00178 00189 00200 00211 00222 00233 00244 00255 00266 00277 00288 00299 00310
00003 00014 00025 00036 00047 00058 00069 00080 00091 00102 00113 00124 00135 00146 00157 00168 00179 00190 00201 00212 00223 00234 00245 00256 00267 00278 00289 00300 00311
00004 00015 00026 00037 00048 00059 00070 00081 00092 00103 00114 00125 00136 00147 00158 00169 00180 00191 00202 00213 00224 00235 00246 00257 00268 00279 00290 00301 00312
00005 00016 00027 00038 00049 00060 00071 00082 00093 00104 00115 00126 00137 00148 00159 00170 00181 00192 00203 00214 00225 00236 00247 00258 00269 00280 00291 00302 00313
00006 00017 00028 00039 00050 00061 00072 00083 00094 00105 00116 00127 00138 00149 00160 00171 00182 00193 00204 00215 00226 00237 00248 00259 00270 00281 00292 00303 00314
00007 00018 00029 00040 00051 00062 00073 00084 00095 00106 00117 00128 00139 00150 00161 00172 00183 00194 00205 00216 00227 00238 00249 00260 00271 00282 00293 00304 00315
00008 00019 00030 00041 00052 00063 00074 00085 00096 00107 00118 00129 00140 00151 00162 00173 00184 00195 00206 00217 00228 00239 00250 00261 00272 00283 00294 00305 00316
00009 00020 00031 00042 00053 00064 00075 00086 00097 00108 00119 00130 00141 00152 00163 00174 00185 00196 00207 00218 00229 00240 00251 00262 00273 00284 00295 00306 00317
00010 00021 00032 00043 00054 00065 00076 00087 00098 00109 00120 00131 00142 00153 00164 00175 00186 00197 00208 00219 00230 00241 00252 00263 00274 00285 00296 00307 00318
00011 00022 00033 00044 00055 00066 00077 00088 00099 00110 00121 00132 00143 00154 00165 00176 00187 00198 00209 00220 00231 00242 00253 00264 00275 00286 00297 00308
I don't think I've changed anything from the sample config.
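For anyone comparing numbers: the entries above run from 00001 to 00318, so a quick count of the shard folders (same path as in the listing) should report 318:

selby@know:~$ ls /opt/influxdb/shared/data/db/shard_db_v2 | wc -l
318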
Should I try to compile master or another branch?
No, master doesn't have a fix for this issue. What's the output of
We're writing performance stats every 5 seconds, and a Grafana instance is also open. The size on disk was around 256 MB, so it's very little. At the time it first stopped responding there was no indication of OOM or of running out of disk space. We are running on CentOS 6.4 (x64), with 268 shard folders and all default settings on creation.
Here are the limits for the currently running instance:
selby@know:~$ cat /proc/$(pidof influxdb)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16001                16001                processes
Max open files            1024                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       16001                16001                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
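Worth highlighting in that output: the "Max open files" soft limit is only 1024. A standard way to see how many file descriptors the running process actually holds, and therefore how close it is to that ceiling:

selby@know:~$ ls /proc/$(pidof influxdb)/fd | wc -l
selby@know:~$ grep 'open files' /proc/$(pidof influxdb)/limits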
I have the same problem on Ubuntu 14. I have uninstalled, cleaned up, and reinstalled. It works for a few days, but eventually stops responding. If I restart InfluxDB, it dies by itself within a few seconds. I have not used the database very much, only a few tests. After installation, I inserted some data (approx. 1.5 million points), and I've made a few queries using Grafana and the InfluxDB admin. Both the web interface (port 8083) and the API (port 8086) become unresponsive. The API responds with error messages such as:
Internal Error: runtime error: invalid memory address or nil pointer dereference
Example query:
vagrant@ubuntu-14:~$ service influxdb status
vagrant@ubuntu-14:~$ sudo service influxdb restart
5 s later:
Tail of log file:
This is the data I inserted into InfluxDB and MySQL:
InfluxDB:
The same data: 108 MB becomes 2 GB in InfluxDB...?
I think this problem is caused by a change in Grafana that triggered unexpected behavior in InfluxDB. Grafana is currently storing dashboards in InfluxDB, which caused InfluxDB to create a massive number of shards that aren't actually being used. Combined with the low open-files limit and the lack of any limit on open shards in your configuration, the process ran out of open files. We will try to address this issue sometime today. To work around it, you can limit the number of open shards in the config file, bump the open-files limit, or delete the shards that aren't being used.
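A rough sketch of the two non-destructive workarounds mentioned above. The shell part is ordinary Linux; the [leveldb] keys reflect my reading of the 0.8-era config.sample.toml, so treat the exact names and values as assumptions and verify them against your own config file:

# Quick fix (may need adapting to your init setup): raise the open-files
# limit in a root shell and restart the service from that same shell.
sudo -s
ulimit -n 65536
service influxdb restart

# Config route (assumed 0.8-era keys -- check your config.toml):
#   [leveldb]
#   max-open-files  = 40     # LevelDB file handles per open shard
#   max-open-shards = 100    # cap on shards kept open at once (0 = no limit)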
I guess I could move the Grafana config back to Elasticsearch.
You can try that, but as I said before, the shards are already created. You need to use one of the workarounds mentioned earlier.
Can someone post the shard information in JSON?
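If it helps anyone gather that information: my recollection is that the 0.8 cluster admin API exposes shard metadata as JSON at /cluster/shards. Both the path and the default root/root credentials below are assumptions, so check the docs for your exact version:

curl "http://localhost:8086/cluster/shards?u=root&p=root"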
Sorry, I can't restart InfluxDB. It crashes immediately.
Then try to bump the open-files limit of the process or limit the number of shards.
I had to delete the database. I also tried using the lev node app to read the LevelDB files, but it wasn't able to open them.
I looked at the URL. Grafana did create a ridiculous number of shards. But this time I've put the Grafana config into a separate namespace, so I guess if I delete them it should not affect the main database.
What I pushed to master will stop the automatic creation of shards. That said, if you are already suffering from this problem, you can work around it by doing the following to start InfluxDB and prevent it from crashing:
In order to get rid of the extra shards, you can do the following:
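The exact steps that were posted here didn't survive, but one blunt alternative that others in this thread ended up using is to drop the database Grafana stores its dashboards in, since that does not touch the main data. A sketch against the 0.8 HTTP API; the database name grafana and the root/root credentials are assumptions, and you will lose the stored dashboards unless you export them first:

# Drop the dashboard database that accumulated the unused shards.
curl -X DELETE "http://localhost:8086/db/grafana?u=root&p=root"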
I can confirm the issue is due to Grafana creating too many shards.
Some indication of that in the logs would be nice to have.
It is clear in the log that the process can't open any more files. Normally the process shouldn't die with some random error, but that sometimes happened because we didn't check the error returned from a method call; this was fixed in 78f8c39.
The file limits are still going to be a problem even after the number of shards from the Grafana DB is brought down. At least that's what I see from @selbyk's file limits. The soft and hard limits should be set to infinity.
@pauldix: How do I set the soft and hard file limits? |
There's a good post on how to change the system-wide max number of open files (which is a hard limit set for the entire system) as well as the user-level limits @pauldix mentioned above. Please don't set the
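For reference, the two knobs that kind of post usually covers are the per-user limit and the system-wide ceiling; a sketch using the usual Debian/CentOS locations, assuming the daemon runs as the influxdb user:

# Per-user limit (applies on the next service restart / login):
#   /etc/security/limits.conf
#   influxdb  soft  nofile  65536
#   influxdb  hard  nofile  65536

# System-wide ceiling on open files:
sysctl fs.file-max                                 # show the current value
echo 'fs.file-max = 200000' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p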
@jvshahid: The other question is, does increasing the file limit actually solve the problem, or does it merely postpone the crash? I followed the instructions in this post:
32k is still too low. You should be able to set it much higher. Where did
@pauldix: I might have jumped to conclusions there. I thought they were addressing the same problem in the post I referred to. Do you have any recommendation on a minimum level?
You should set it to infinity, unless there's some compelling reason to do otherwise.
I'm on Ubuntu 14, and from what I can find there is a maximum limit.
This was causing InfluxDB to create a new shard in the grafana db every ten minutes. Also, we talked about getting rid of this feature a while ago, so here we go.
Fix #954
Conflicts: cluster/cluster_configuration.go
Hello friends, I have a single InfluxDB node with over 170 GB of data. Things were working well, but today it crashed suddenly. After restarting with "service influxdb restart", the status sometimes shows "influxdb Process is not running [ FAILED ]". When I look at the log file it shows the following error. Please help me, friends; I can't access ports 8083 and 8086.
panic: not ordered: 712 1455942607000000000 >= 1455942607000000000
goroutine 676 [running]:
Sorry friends, the issue was rectified by my colleague (Thomas Kurz) with a simple version update, and InfluxDB is up again. My InfluxDB was beta 1.0 and it has been upgraded to beta 1.3; I wasn't aware of that. It's working like a charm again. Thanks for the bug fixes.
The same issue is coming up again in the "beta 1.3" version as well; I should find another way to fix it. I thought everything was OK, but the error persists. Below is the output of /var/log/influxdb/influxd.log:
panic: not ordered: 793 1455352203000000000 >= 1455352203000000000
goroutine 2026 [running]:
Same issue, all of a sudden too. Not sure why. It's been working for a long time (>2 years). Limits:
Shards:
Queries:
Any ideas? I turned off Heka and restarted Influx, and it's still a no-go. It silently dies after about 30 seconds. Changed logging to
Today we've experienced a situation where the database stopped responding to HTTP queries on the admin interface.
After a restart it stopped responding to all HTTP requests and crashes occasionally.
If I restart it with clean data folder it starts normally.
There's nothing in the log that points to any potential issues.
Please advise.