Problem with increasing connections while getting metrics from Elasticsearch in client mode #452

eresgie · 2015-12-17T13:45:15Z

Hello.

I'm using Telegraf (ver. 0.2.4) to monitor few Elasticsearch (ver. 1.7.3) instances. One of them is configured in client mode (has no data, essentially it's a load balancer). When I'm starting Telegraf, metrics are gathered and send to Influx database. But Telegraf process keeps opening network connections to Elasticsearch, never closing them. It keeps increasing until limit of open files is reached.

22994 is PID of Telegraf process:

# ls /proc/22994/fd | wc -l
128

In 2 hours number of open files is reaching limit, causing:

Dec 17 12:07:30 es-lb-0 telegraf: 2015/12/17 12:07:30 Error in plugin [net]: error getting net io info: open /proc/net/dev: too many open files
Dec 17 12:07:30 es-lb-0 telegraf: panic: runtime error: index out of range

Then Telegraf restarts (or clear active connections) or sometimes even stops getting metrics.

When Elasticsearch plugin is not loaded, everything is ok.
I've tried all possible Elasticsearch plugin configuration options (local = true/false), without any positive results.
Number of open files is increasing in Elasticsearch process, too, so I'm sure it's connected.

Process file descriptors (output shortened for brevity):

# ls -al /proc/22994/fd
[...]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 90 -> socket:[8634053]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:14 91 -> socket:[8634093]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 92 -> socket:[8634632]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 93 -> socket:[8634743]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 94 -> socket:[8634671]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 95 -> socket:[8634776]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 96 -> socket:[8635496]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 97 -> socket:[8635534]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:17 98 -> socket:[8635002]
lrwx------. 1 telegraf telegraf 64 Dec 17 13:16 99 -> socket:[8635163]

I can help with further testing, for now my solution is to remove Elasticsearch plugin from Telegraf.

The text was updated successfully, but these errors were encountered:

zstyblik · 2015-12-17T14:34:59Z

My guess is this is caused by not calling r.Body.Close() and not reading response body. I don't see anything like that in code of the plugin.

The client must close the response body when finished with it:

resp, err := http.Get("http://example.com/")
if err != nil {
    // handle error
}
defer resp.Body.Close()
body, err := ioutil.ReadAll(resp.Body)

I had the similar issue in Python with requests.

sparrc · 2015-12-17T17:53:34Z

thanks for the report, if @zstyblik is correct then that is the issue, we should be always be closing the response body.

zstyblik · 2015-12-17T18:26:25Z

@sparrc - I don't know architecture of Telegraf, so it's a "guess" based on reading through the Elasticsearch plugin. But I doubt such functionality, read body of response and close connection, is implemented somewhere inside. Anyway, that part is copy-paste from documentation of net/http + my own experience from Python land :)

sparrc · 2015-12-17T18:46:47Z

I think you're right, we should be closing this: https://github.com/influxdb/telegraf/blob/master/plugins/elasticsearch/elasticsearch.go#L201

zstyblik · 2015-12-17T18:55:57Z

@sparrc indeed :) It could be something like ...

    r, err := e.client.Get(url)
    if err != nil {
        return err
    }
    defer r.Body.Close()
    if r.StatusCode != http.StatusOK {
        ioutil.ReadAll(r.Body)
        return fmt.Errorf("elasticsearch: API responded with status-code %d, expected %d", r.StatusCode, http.StatusOK)
    }
    if err = json.NewDecoder(r.Body).Decode(v); err != nil {
        return err
    }
    return nil

? And import ioutil or use another Reader, I guess.

sparrc · 2015-12-17T19:00:52Z

why does it require the ioutil.ReadAll? I think only the defer is necessary

zstyblik · 2015-12-17T19:04:33Z

Because you're supposed to read the body of the response even if you don't care about it. Whether it's just a good practice or optional, that I don't know. I'm always doing it regardless :)

jipperinbham · 2015-12-17T20:08:05Z

@sparrc I was going to work on this and #430 but wasn't sure if I should create a branch from master or 0.2.4 given all the changes going on with master

sparrc · 2015-12-17T20:27:14Z

@jipperinbham do it on master, the changes I'm making are on the 0.3.0 branch and will remain there until sometime in January

jipperinbham · 2015-12-17T20:48:55Z

@zstyblik re: whether or not the body should be read, from what I've read you are right in that the underlying transport will not reuse the connection if Body.Close() is called without first being read.

In this situation, I tend to think it's ok to not reuse the underlying connection because we received something other than HTTP/1.1 200 OK and would want to setup the connection again.

thoughts?

zstyblik · 2015-12-18T05:34:20Z

@jipperinbham sorry, but I don't know that well. I'd expect this to be related to connection pool rather than connection being reused, resp. another request being sent over the exactly the same and already open connection and freeing up that connection early. shrug

sparrc added the bug unexpected problem or unintended behavior label Dec 17, 2015

jipperinbham mentioned this issue Dec 17, 2015

close r.Body, remove network metrics, updated other sections as needed #454

Closed

sparrc closed this as completed in 97ebcc2 Dec 17, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Problem with increasing connections while getting metrics from Elasticsearch in client mode #452

Problem with increasing connections while getting metrics from Elasticsearch in client mode #452

eresgie commented Dec 17, 2015

zstyblik commented Dec 17, 2015

sparrc commented Dec 17, 2015

zstyblik commented Dec 17, 2015

sparrc commented Dec 17, 2015

zstyblik commented Dec 17, 2015

sparrc commented Dec 17, 2015

zstyblik commented Dec 17, 2015

jipperinbham commented Dec 17, 2015

sparrc commented Dec 17, 2015

jipperinbham commented Dec 17, 2015

zstyblik commented Dec 18, 2015

Problem with increasing connections while getting metrics from Elasticsearch in client mode #452

Problem with increasing connections while getting metrics from Elasticsearch in client mode #452

Comments

eresgie commented Dec 17, 2015

zstyblik commented Dec 17, 2015

sparrc commented Dec 17, 2015

zstyblik commented Dec 17, 2015

sparrc commented Dec 17, 2015

zstyblik commented Dec 17, 2015

sparrc commented Dec 17, 2015

zstyblik commented Dec 17, 2015

jipperinbham commented Dec 17, 2015

sparrc commented Dec 17, 2015

jipperinbham commented Dec 17, 2015

zstyblik commented Dec 18, 2015