memory leak in 0.8.2 #941

Closed
prune998 opened this issue Sep 16, 2014 · 49 comments

@prune998

I still have a memory leak in InfluxDB 0.8.2.

[attached image]

Servers are VMware hosts running Ubuntu 14.04 LTS:
Linux poplar 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It is a two-node cluster on dedicated servers (nothing else is running except admin tools like logstash and diamond, which use little CPU or RAM).
The database has a one-day shard with replication = 2 and split = 1.

Here is the InfluxDB process info:

ps auxwww | grep infl
belisar+ 851041 12.5 13.3 1574544 540452 ?      Sl   Sep15 121:19 /opt/data/apps/influxdb/current/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml

Limits on the process:

cat /proc/851041/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    10240000             bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31450                31450                processes
Max open files            64000                64000                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31450                31450                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Process mapping:

cat /proc/851041/maps
00400000-01000000 r-xp 00000000 00:21 138854                             /opt/data/apps/influxdb/versions/0.8.2/influxdb
01000000-01027000 rw-p 00c00000 00:21 138854                             /opt/data/apps/influxdb/versions/0.8.2/influxdb
01027000-0105d000 rw-p 00000000 00:00 0
013af000-0219c000 rw-p 00000000 00:00 0                                  [heap]
c000000000-c00006e000 rw-p 00000000 00:00 0
c206494000-c2236c0000 rw-p 00000000 00:00 0
7fdf14000000-7fdf14287000 rw-p 00000000 00:00 0
7fdf14287000-7fdf18000000 ---p 00000000 00:00 0
7fdf1c000000-7fdf1ca64000 rw-p 00000000 00:00 0
7fdf1ca64000-7fdf20000000 ---p 00000000 00:00 0
7fdf20000000-7fdf20c50000 rw-p 00000000 00:00 0
7fdf20c50000-7fdf24000000 ---p 00000000 00:00 0
7fdf24000000-7fdf24bac000 rw-p 00000000 00:00 0
7fdf24bac000-7fdf28000000 ---p 00000000 00:00 0
7fdf28000000-7fdf2893d000 rw-p 00000000 00:00 0
7fdf2893d000-7fdf2c000000 ---p 00000000 00:00 0
7fdf2c000000-7fdf2c30f000 rw-p 00000000 00:00 0
7fdf2c30f000-7fdf30000000 ---p 00000000 00:00 0
7fdf30000000-7fdf300bf000 rw-p 00000000 00:00 0
7fdf300bf000-7fdf34000000 ---p 00000000 00:00 0
7fdf34000000-7fdf343ba000 rw-p 00000000 00:00 0
7fdf343ba000-7fdf38000000 ---p 00000000 00:00 0
7fdf39a46000-7fdf39ffc000 rw-p 00000000 00:00 0
7fdf39ffc000-7fdf39ffd000 ---p 00000000 00:00 0
7fdf39ffd000-7fdf3a7fd000 rw-p 00000000 00:00 0                          [stack:851598]
7fdf3a7fd000-7fdf3a7fe000 ---p 00000000 00:00 0
7fdf3a7fe000-7fdf3affe000 rw-p 00000000 00:00 0                          [stack:851412]
7fdf3affe000-7fdf3afff000 ---p 00000000 00:00 0
7fdf3afff000-7fdf3b7ff000 rw-p 00000000 00:00 0
7fdf3b7ff000-7fdf3b800000 ---p 00000000 00:00 0
7fdf3b800000-7fdf3c9ac000 rw-p 00000000 00:00 0                          [stack:851331]
7fdf3c9ac000-7fdf40000000 ---p 00000000 00:00 0
7fdf40000000-7fdf4076c000 rw-p 00000000 00:00 0
7fdf4076c000-7fdf44000000 ---p 00000000 00:00 0
7fdf44000000-7fdf444d4000 rw-p 00000000 00:00 0
7fdf444d4000-7fdf48000000 ---p 00000000 00:00 0
7fdf4801c000-7fdf4823c000 rw-p 00000000 00:00 0
7fdf48277000-7fdf483d8000 rw-p 00000000 00:00 0
7fdf483d8000-7fdf483d9000 ---p 00000000 00:00 0
7fdf483d9000-7fdf48bd9000 rw-p 00000000 00:00 0                          [stack:851330]
7fdf48bd9000-7fdf48bda000 ---p 00000000 00:00 0
7fdf48bda000-7fdf493da000 rw-p 00000000 00:00 0                          [stack:851329]
7fdf493da000-7fdf493db000 ---p 00000000 00:00 0
7fdf493db000-7fdf49bdb000 rw-p 00000000 00:00 0                          [stack:851328]
7fdf49bdb000-7fdf49bdc000 ---p 00000000 00:00 0
7fdf49bdc000-7fdf4a3dc000 rw-p 00000000 00:00 0                          [stack:851048]
7fdf4a3dc000-7fdf4a3dd000 ---p 00000000 00:00 0
7fdf4a3dd000-7fdf4abdd000 rw-p 00000000 00:00 0                          [stack:851047]
7fdf4abdd000-7fdf4abf4000 r-xp 00000000 fc:00 275675                     /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdf4abf4000-7fdf4adf4000 ---p 00017000 fc:00 275675                     /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdf4adf4000-7fdf4adf5000 r--p 00017000 fc:00 275675                     /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdf4adf5000-7fdf4adf6000 rw-p 00018000 fc:00 275675                     /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdf4adf6000-7fdf4adf8000 rw-p 00000000 00:00 0
7fdf4adf8000-7fdf4adfd000 r-xp 00000000 fc:00 275664                     /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdf4adfd000-7fdf4affc000 ---p 00005000 fc:00 275664                     /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdf4affc000-7fdf4affd000 r--p 00004000 fc:00 275664                     /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdf4affd000-7fdf4affe000 rw-p 00005000 fc:00 275664                     /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdf4affe000-7fdf4afff000 ---p 00000000 00:00 0
7fdf4afff000-7fdf4b7ff000 rw-p 00000000 00:00 0                          [stack:851046]
7fdf4b7ff000-7fdf4b800000 ---p 00000000 00:00 0
7fdf4b800000-7fdf4c0c8000 rw-p 00000000 00:00 0                          [stack:851045]
7fdf4c0c8000-7fdf50000000 ---p 00000000 00:00 0
7fdf50000000-7fdf50021000 rw-p 00000000 00:00 0
7fdf50021000-7fdf54000000 ---p 00000000 00:00 0
7fdf54000000-7fdf54021000 rw-p 00000000 00:00 0
7fdf54021000-7fdf58000000 ---p 00000000 00:00 0
7fdf58005000-7fdf58025000 rw-p 00000000 00:00 0
7fdf5802c000-7fdf580ec000 rw-p 00000000 00:00 0
7fdf580ec000-7fdf580f7000 r-xp 00000000 fc:00 275661                     /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdf580f7000-7fdf582f6000 ---p 0000b000 fc:00 275661                     /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdf582f6000-7fdf582f7000 r--p 0000a000 fc:00 275661                     /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdf582f7000-7fdf582f8000 rw-p 0000b000 fc:00 275661                     /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdf582f8000-7fdf58318000 rw-p 00000000 00:00 0                          [stack:851043]
7fdf58318000-7fdf58319000 ---p 00000000 00:00 0
7fdf58319000-7fdf58b19000 rw-p 00000000 00:00 0
7fdf58b19000-7fdf58b1a000 ---p 00000000 00:00 0
7fdf58b1a000-7fdf5941a000 rw-p 00000000 00:00 0
7fdf5941a000-7fdf5941b000 ---p 00000000 00:00 0
7fdf5941b000-7fdf59c1b000 rw-p 00000000 00:00 0                          [stack:851042]
7fdf59c1b000-7fdf59dd7000 r-xp 00000000 fc:00 275659                     /lib/x86_64-linux-gnu/libc-2.19.so
7fdf59dd7000-7fdf59fd6000 ---p 001bc000 fc:00 275659                     /lib/x86_64-linux-gnu/libc-2.19.so
7fdf59fd6000-7fdf59fda000 r--p 001bb000 fc:00 275659                     /lib/x86_64-linux-gnu/libc-2.19.so
7fdf59fda000-7fdf59fdc000 rw-p 001bf000 fc:00 275659                     /lib/x86_64-linux-gnu/libc-2.19.so
7fdf59fdc000-7fdf59fe1000 rw-p 00000000 00:00 0
7fdf59fe1000-7fdf59ff7000 r-xp 00000000 fc:00 259939                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdf59ff7000-7fdf5a1f6000 ---p 00016000 fc:00 259939                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdf5a1f6000-7fdf5a1f7000 rw-p 00015000 fc:00 259939                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7fdf5a1f7000-7fdf5a210000 r-xp 00000000 fc:00 275676                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdf5a210000-7fdf5a40f000 ---p 00019000 fc:00 275676                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdf5a40f000-7fdf5a410000 r--p 00018000 fc:00 275676                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdf5a410000-7fdf5a411000 rw-p 00019000 fc:00 275676                     /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdf5a411000-7fdf5a415000 rw-p 00000000 00:00 0
7fdf5a415000-7fdf5a424000 r-xp 00000000 fc:00 259980                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fdf5a424000-7fdf5a623000 ---p 0000f000 fc:00 259980                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fdf5a623000-7fdf5a624000 r--p 0000e000 fc:00 259980                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fdf5a624000-7fdf5a625000 rw-p 0000f000 fc:00 259980                     /lib/x86_64-linux-gnu/libbz2.so.1.0.4
7fdf5a625000-7fdf5a63d000 r-xp 00000000 fc:00 259674                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fdf5a63d000-7fdf5a83c000 ---p 00018000 fc:00 259674                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fdf5a83c000-7fdf5a83d000 r--p 00017000 fc:00 259674                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fdf5a83d000-7fdf5a83e000 rw-p 00018000 fc:00 259674                     /lib/x86_64-linux-gnu/libz.so.1.2.8
7fdf5a83e000-7fdf5a943000 r-xp 00000000 fc:00 275662                     /lib/x86_64-linux-gnu/libm-2.19.so
7fdf5a943000-7fdf5ab42000 ---p 00105000 fc:00 275662                     /lib/x86_64-linux-gnu/libm-2.19.so
7fdf5ab42000-7fdf5ab43000 r--p 00104000 fc:00 275662                     /lib/x86_64-linux-gnu/libm-2.19.so
7fdf5ab43000-7fdf5ab44000 rw-p 00105000 fc:00 275662                     /lib/x86_64-linux-gnu/libm-2.19.so
7fdf5ab44000-7fdf5ab4b000 r-xp 00000000 fc:00 275677                     /lib/x86_64-linux-gnu/librt-2.19.so
7fdf5ab4b000-7fdf5ad4a000 ---p 00007000 fc:00 275677                     /lib/x86_64-linux-gnu/librt-2.19.so
7fdf5ad4a000-7fdf5ad4b000 r--p 00006000 fc:00 275677                     /lib/x86_64-linux-gnu/librt-2.19.so
7fdf5ad4b000-7fdf5ad4c000 rw-p 00007000 fc:00 275677                     /lib/x86_64-linux-gnu/librt-2.19.so
7fdf5ad4c000-7fdf5ad6f000 r-xp 00000000 fc:00 275658                     /lib/x86_64-linux-gnu/ld-2.19.so
7fdf5ad8a000-7fdf5af0c000 rw-p 00000000 00:00 0                          [stack:851044]
7fdf5af1c000-7fdf5af62000 rw-p 00000000 00:00 0
7fdf5af63000-7fdf5af6e000 rw-p 00000000 00:00 0
7fdf5af6e000-7fdf5af6f000 r--p 00022000 fc:00 275658                     /lib/x86_64-linux-gnu/ld-2.19.so
7fdf5af6f000-7fdf5af70000 rw-p 00023000 fc:00 275658                     /lib/x86_64-linux-gnu/ld-2.19.so
7fdf5af70000-7fdf5af71000 rw-p 00000000 00:00 0
7fff98598000-7fff985b9000 rw-p 00000000 00:00 0                          [stack]
7fff985f5000-7fff985f7000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]


Global mem info:

cat /proc/meminfo
MemTotal:        4048272 kB
MemFree:          235096 kB
Buffers:           33284 kB
Cached:          2913560 kB
SwapCached:        20944 kB
Active:          2443780 kB
Inactive:        1238768 kB
Active(anon):     384684 kB
Inactive(anon):   351076 kB
Active(file):    2059096 kB
Inactive(file):   887692 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1000444 kB
SwapFree:         893720 kB
Dirty:              1132 kB
Writeback:             0 kB
AnonPages:        728136 kB
Mapped:            21720 kB
Shmem:                48 kB
Slab:              70076 kB
SReclaimable:      54644 kB
SUnreclaim:        15432 kB
KernelStack:        1472 kB
PageTables:         6256 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     3024580 kB
Committed_AS:    1127708 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      282128 kB
VmallocChunk:   34359411292 kB
HardwareCorrupted:     0 kB
AnonHugePages:    258048 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       69568 kB
DirectMap2M:     4124672 kB

Let me know if I can provide more info... or tell me how to take a memory dump if that's what you need.
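
In the meantime, a minimal sketch of what I assume is the way to grab a heap profile: pull it from the Go pprof HTTP endpoint on the API port (8086 here) and open it against the binary:

curl -s -o /tmp/influxdb-heap.pprof http://localhost:8086/debug/pprof/heap
go tool pprof ./influxdb /tmp/influxdb-heap.pprof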

@prune998

Well, memory usage is not that high on the first server, but on the second server it is much higher...

./procstat 877524
                 pid: 877524
               tcomm: (influxdb)
               state: S
                ppid: 1
                pgid: 877516
                 sid: 877516
              tty_nr: 0
            tty_pgrp: -1
               flags: 1077960960
             min_flt: 2654611
            cmin_flt: 0
             maj_flt: 322
            cmaj_flt: 0
               utime: 14848.000000
               stime: 2680.250000
              cutime: 0.000000
              cstime: 0.000000
            priority: 20
                nice: 0
         num_threads: 20
       it_real_value: 0.000000
          start_time: 09.15 22:10 (63952.85s)
               vsize: 2818584576
                 rss: 345328
              rsslim: 9223372036854775807
          start_code: 4194304
            end_code: 16774412
         start_stack: 140735617385872
                 esp: 139664039706272
                 eip: 5314051
             pending: 0000000000000000
             blocked: 0000000000000000
              sigign: 0000000000000001
            sigcatch: 000000007fc1fefe
               wchan: 9223372036854775807
               zero1: 0
               zero2: 0
         exit_signal: 0000000000000011
                 cpu: 1
         rt_priority: 0
              policy: 0

@prune998

Here is the pmap output for the process:

877524:   /opt/data/apps/influxdb/current/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml
0000000000400000  12288K r-x-- influxdb
0000000001000000    156K rw--- influxdb
0000000001027000    216K rw---   [ anon ]
0000000002de1000  29392K rw---   [ anon ]
000000c000000000   1396K rw---   [ anon ]
000000c2028e8000 1516128K rw---   [ anon ]
00007f05c8000000    384K rw---   [ anon ]
00007f05c8060000  65152K -----   [ anon ]
00007f05cc000000  26840K rw---   [ anon ]
00007f05cda36000  38696K -----   [ anon ]
00007f05d0bc9000   1408K rw---   [ anon ]
00007f05d0d29000      4K -----   [ anon ]
00007f05d0d2a000   8192K rw---   [ anon ]
00007f05d152a000      4K -----   [ anon ]
00007f05d152b000   8192K rw---   [ anon ]
00007f05d1d2b000      4K -----   [ anon ]
00007f05d1d2c000   8192K rw---   [ anon ]
00007f05d252c000      4K -----   [ anon ]
00007f05d252d000   9984K rw---   [ anon ]
00007f05d2eed000      4K -----   [ anon ]
00007f05d2eee000  29120K rw---   [ anon ]
00007f05d4b5e000  53896K -----   [ anon ]
00007f05d8000000   9804K rw---   [ anon ]
00007f05d8993000  55732K -----   [ anon ]
00007f05dc000000  14760K rw---   [ anon ]
00007f05dce6a000  50776K -----   [ anon ]
00007f05e0000000  11804K rw---   [ anon ]
00007f05e0b87000  53732K -----   [ anon ]
00007f05e4000000   5812K rw---   [ anon ]
00007f05e45ad000  59724K -----   [ anon ]
00007f05e8000000    624K rw---   [ anon ]
00007f05e809c000  64912K -----   [ anon ]
00007f05ec000000  21852K rw---   [ anon ]
00007f05ed557000  43684K -----   [ anon ]
00007f05f0000000  15060K rw---   [ anon ]
00007f05f0eb5000  50476K -----   [ anon ]
00007f05f4008000   3328K rw---   [ anon ]
00007f05f4350000   3260K rw---   [ anon ]
00007f05f469e000    364K rw---   [ anon ]
00007f05f46f9000      4K -----   [ anon ]
00007f05f46fa000   9216K rw---   [ anon ]
00007f05f4ffa000      4K -----   [ anon ]
00007f05f4ffb000   8192K rw---   [ anon ]
00007f05f57fb000      4K -----   [ anon ]
00007f05f57fc000   8192K rw---   [ anon ]
00007f05f5ffc000      4K -----   [ anon ]
00007f05f5ffd000   8192K rw---   [ anon ]
00007f05f67fd000      4K -----   [ anon ]
00007f05f67fe000   8192K rw---   [ anon ]
00007f05f6ffe000      4K -----   [ anon ]
00007f05f6fff000   8192K rw---   [ anon ]
00007f05f77ff000      4K -----   [ anon ]
00007f05f7800000   8324K rw---   [ anon ]
00007f05f8021000  65404K -----   [ anon ]
00007f05fc000000   9588K rw---   [ anon ]
00007f05fc95d000  55948K -----   [ anon ]
00007f0600000000    132K rw---   [ anon ]
00007f0600021000  65404K -----   [ anon ]
00007f0604000000   6716K rw---   [ anon ]
00007f060468f000  58820K -----   [ anon ]
00007f0608000000   1580K rw---   [ anon ]
00007f060818b000  63956K -----   [ anon ]
00007f060c015000    512K rw---   [ anon ]
00007f060c0a2000    272K rw---   [ anon ]
00007f060c0f9000   1792K rw---   [ anon ]
00007f060c2c5000    936K rw---   [ anon ]
00007f060c3b3000   2944K rw---   [ anon ]
00007f060c693000     92K r-x-- libresolv-2.19.so
00007f060c6aa000   2048K ----- libresolv-2.19.so
00007f060c8aa000      4K r---- libresolv-2.19.so
00007f060c8ab000      4K rw--- libresolv-2.19.so
00007f060c8ac000      8K rw---   [ anon ]
00007f060c8ae000     20K r-x-- libnss_dns-2.19.so
00007f060c8b3000   2044K ----- libnss_dns-2.19.so
00007f060cab2000      4K r---- libnss_dns-2.19.so
00007f060cab3000      4K rw--- libnss_dns-2.19.so
00007f060cab4000     44K r-x-- libnss_files-2.19.so
00007f060cabf000   2044K ----- libnss_files-2.19.so
00007f060ccbe000      4K r---- libnss_files-2.19.so
00007f060ccbf000      4K rw--- libnss_files-2.19.so
00007f060ccc0000    128K rw---   [ anon ]
00007f060cce0000      4K -----   [ anon ]
00007f060cce1000   8192K rw---   [ anon ]
00007f060d4e1000      4K -----   [ anon ]
00007f060d4e2000   8192K rw---   [ anon ]
00007f060dce2000      4K -----   [ anon ]
00007f060dce3000   8320K rw---   [ anon ]
00007f060e503000      4K -----   [ anon ]
00007f060e504000   8192K rw---   [ anon ]
00007f060ed04000      4K -----   [ anon ]
00007f060ed05000   8192K rw---   [ anon ]
00007f060f505000      4K -----   [ anon ]
00007f060f506000   9216K rw---   [ anon ]
00007f060fe06000      4K -----   [ anon ]
00007f060fe07000   8192K rw---   [ anon ]
00007f0610607000   1776K r-x-- libc-2.19.so
00007f06107c3000   2044K ----- libc-2.19.so
00007f06109c2000     16K r---- libc-2.19.so
00007f06109c6000      8K rw--- libc-2.19.so
00007f06109c8000     20K rw---   [ anon ]
00007f06109cd000     88K r-x-- libgcc_s.so.1
00007f06109e3000   2044K ----- libgcc_s.so.1
00007f0610be2000      4K rw--- libgcc_s.so.1
00007f0610be3000    100K r-x-- libpthread-2.19.so
00007f0610bfc000   2044K ----- libpthread-2.19.so
00007f0610dfb000      4K r---- libpthread-2.19.so
00007f0610dfc000      4K rw--- libpthread-2.19.so
00007f0610dfd000     16K rw---   [ anon ]
00007f0610e01000     60K r-x-- libbz2.so.1.0.4
00007f0610e10000   2044K ----- libbz2.so.1.0.4
00007f061100f000      4K r---- libbz2.so.1.0.4
00007f0611010000      4K rw--- libbz2.so.1.0.4
00007f0611011000     96K r-x-- libz.so.1.2.8
00007f0611029000   2044K ----- libz.so.1.2.8
00007f0611228000      4K r---- libz.so.1.2.8
00007f0611229000      4K rw--- libz.so.1.2.8
00007f061122a000   1044K r-x-- libm-2.19.so
00007f061132f000   2044K ----- libm-2.19.so
00007f061152e000      4K r---- libm-2.19.so
00007f061152f000      4K rw--- libm-2.19.so
00007f0611530000     28K r-x-- librt-2.19.so
00007f0611537000   2044K ----- librt-2.19.so
00007f0611736000      4K r---- librt-2.19.so
00007f0611737000      4K rw--- librt-2.19.so
00007f0611738000    140K r-x-- ld-2.19.so
00007f0611776000   1544K rw---   [ anon ]
00007f0611903000    300K rw---   [ anon ]
00007f061194f000     44K rw---   [ anon ]
00007f061195a000      4K r---- ld-2.19.so
00007f061195b000      4K rw--- ld-2.19.so
00007f061195c000      4K rw---   [ anon ]
00007fff90794000    132K rw---   [ stack ]
00007fff907fe000      8K r-x--   [ anon ]
ffffffffff600000      4K r-x--   [ anon ]
 total          2752652K

As you can see, there is a 1.5G anon block near the start... I'll run the same command in an hour to see which part grows...
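
A rough sketch of how I plan to compare two snapshots taken an hour apart (assuming the pid is still 877524):

pmap -x 877524 > /tmp/pmap.before
sleep 3600
pmap -x 877524 > /tmp/pmap.after
# per-mapping Kbytes/RSS changes show which regions grew
diff /tmp/pmap.before /tmp/pmap.after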

@prune998

After a day or so, the second server, which acts only as a replica, is using 1.8G of resident RAM:

4375:   /opt/data/apps/influxdb/current/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml
Address           Kbytes     RSS   Dirty Mode  Mapping
0000000000400000   12288    4268       0 r-x-- influxdb
0000000001000000     156      68      36 rw--- influxdb
0000000001027000     216     128     128 rw---   [ anon ]
0000000001b66000    2656    2456    2456 rw---   [ anon ]
000000c000000000    1700    1696    1692 rw---   [ anon ]
000000c2015dc000 1847696 1784940 1778028 rw---   [ anon ]
...
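
The 000000c... mappings are the Go heap arena, so watching just those (next to the total RSS) should show whether the growth is in the Go heap rather than in cgo/RocksDB allocations. A quick sketch, assuming the pid stays 4375:

# Go heap arena mappings only
pmap -x 4375 | grep '^000000c'
# total resident memory, for comparison
grep VmRSS /proc/4375/status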

@prune998

Got help from jvshahid debugging with pprof; here are some results:

/opt/data/apps/go/bin/go tool pprof ./influxdb http://localhost:8086/debug/pprof/heap
Fetching /pprof/heap profile from localhost:8086 to
  /tmp/PR3I2kCbpd
Wrote profile to /tmp/PR3I2kCbpd
Adjusting heap profiles for 1-in-524288 sampling rate
Welcome to pprof!  For help, type 'help'.
(pprof) top100 -cum
Total: 174.3 MB
     0.0   0.0%   0.0%    173.8  99.7% runtime.gosched0
     0.0   0.0%   0.0%    107.0  61.4% github.com/influxdb/influxdb/coordinator.(*RaftServer).startRaft
     0.0   0.0%   0.0%    107.0  61.4% github.com/influxdb/influxdb/coordinator.func.006
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.(*Decoder).Decode
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.(*Decoder).DecodeValue
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.(*Decoder).decodeMap
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.(*Decoder).decodeStruct
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.(*Decoder).decodeValue
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.decodeIntoValue
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.func.002
     0.0   0.0%   0.0%    106.5  61.1% encoding/gob.func.004
     0.0   0.0%   0.0%    106.5  61.1% github.com/influxdb/influxdb/_vendor/raft.(*server).LoadSnapshot
     0.0   0.0%   0.0%    106.5  61.1% github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).Recovery
     0.0   0.0%   0.0%     68.5  39.3% reflect.Value.SetMapIndex
    68.5  39.3%  39.3%     68.5  39.3% reflect.mapassign
     0.0   0.0%  39.3%     57.3  32.9% encoding/gob.(*Encoder).Encode
     0.0   0.0%  39.3%     57.3  32.9% encoding/gob.(*Encoder).EncodeValue
     0.0   0.0%  39.3%     57.3  32.9% github.com/influxdb/influxdb/_vendor/raft.(*server).TakeSnapshot
     0.0   0.0%  39.3%     57.3  32.9% github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).Save
     0.0   0.0%  39.3%     57.3  32.9% github.com/influxdb/influxdb/coordinator.(*RaftServer).CompactLog
     0.0   0.0%  39.3%     57.3  32.9% github.com/influxdb/influxdb/coordinator.(*RaftServer).ForceLogCompaction
     0.0   0.0%  39.3%     45.2  25.9% bytes.(*Buffer).grow
    45.2  25.9%  65.2%     45.2  25.9% bytes.makeSlice
     0.0   0.0%  65.2%     29.1  16.7% encoding/gob.(*Encoder).encode
     0.0   0.0%  65.2%     29.1  16.7% encoding/gob.(*Encoder).encodeMap
     0.0   0.0%  65.2%     29.1  16.7% encoding/gob.(*Encoder).encodeStruct
     0.0   0.0%  65.2%     29.1  16.7% encoding/gob.encodeReflectValue
     0.0   0.0%  65.2%     29.1  16.7% encoding/gob.func.015
     0.0   0.0%  65.2%     29.1  16.7% encoding/gob.func.016
     0.0   0.0%  65.2%     28.2  16.2% bytes.(*Buffer).Write
     0.0   0.0%  65.2%     28.2  16.2% encoding/gob.(*Encoder).writeMessage
    25.5  14.6%  79.9%     25.5  14.6% encoding/gob.decString
     0.0   0.0%  79.9%     17.0   9.7% bytes.(*Buffer).WriteString
     0.0   0.0%  79.9%     17.0   9.7% encoding/gob.encString
     8.6   4.9%  84.8%     12.1   6.9% reflect.Value.MapKeys
     9.5   5.5%  90.2%      9.5   5.5% makemap_c
     0.0   0.0%  90.2%      9.5   5.5% reflect.MakeMap
     0.0   0.0%  90.2%      9.5   5.5% reflect.makemap
     6.5   3.7%  94.0%      6.5   3.7% reflect.unsafe_New
     0.0   0.0%  94.0%      5.5   3.2% code.google.com/p/gogoprotobuf/proto.Unmarshal
     0.0   0.0%  94.0%      5.5   3.2% code.google.com/p/gogoprotobuf/proto.UnmarshalMerge
     5.5   3.2%  97.1%      5.5   3.2% github.com/influxdb/influxdb/_vendor/raft/protobuf.(*LogEntry).Unmarshal
     0.0   0.0%  97.1%      5.0   2.9% github.com/gorilla/mux.(*Router).ServeHTTP
     0.0   0.0%  97.1%      5.0   2.9% github.com/influxdb/influxdb/_vendor/raft.(*AppendEntriesRequest).Decode
     0.0   0.0%  97.1%      5.0   2.9% github.com/influxdb/influxdb/_vendor/raft.func.001
     0.0   0.0%  97.1%      5.0   2.9% github.com/influxdb/influxdb/_vendor/raft/protobuf.(*AppendEntriesRequest).Unmarshal
     0.0   0.0%  97.1%      5.0   2.9% net/http.(*conn).serve
     0.0   0.0%  97.1%      5.0   2.9% net/http.HandlerFunc.ServeHTTP
     0.0   0.0%  97.1%      5.0   2.9% net/http.serverHandler.ServeHTTP
     0.0   0.0%  97.1%      3.0   1.7% encoding/gob.allocValue
     0.0   0.0%  97.1%      3.0   1.7% reflect.New
     2.0   1.1%  98.3%      2.0   1.1% github.com/influxdb/influxdb/coordinator.(*ProtobufClient).readResponses
     2.0   1.1%  99.4%      2.0   1.1% github.com/influxdb/influxdb/coordinator.(*ProtobufServer).handleConnection
     0.0   0.0%  99.4%      2.0   1.1% github.com/influxdb/influxdb/coordinator.func.003
     0.0   0.0%  99.4%      0.5   0.3% _rt0_go
     0.5   0.3%  99.7%      0.5   0.3% allocg
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.(*Log).open
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.(*LogEntry).Decode
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.(*server).Init
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.(*server).Start
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.(*server).followerLoop
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.(*server).loop
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.afterBetween
     0.0   0.0%  99.7%      0.5   0.3% github.com/influxdb/influxdb/_vendor/raft.func.007
     0.0   0.0%  99.7%      0.5   0.3% mcommoninit
     0.0   0.0%  99.7%      0.5   0.3% runtime.malg
     0.0   0.0%  99.7%      0.5   0.3% runtime.mpreinit
     0.0   0.0%  99.7%      0.5   0.3% runtime.schedinit
     0.0   0.0%  99.7%      0.5   0.3% time.After
     0.5   0.3% 100.0%      0.5   0.3% time.NewTimer

@prune998

Here is the result after some time:

(pprof) top20 --cum
Total: 184.1 MB
     0.0   0.0%   0.0%    183.6  99.7% runtime.gosched0
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.(*Decoder).Decode
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.(*Decoder).DecodeValue
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.(*Decoder).decodeMap
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.(*Decoder).decodeStruct
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.(*Decoder).decodeValue
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.decodeIntoValue
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.func.002
     0.0   0.0%   0.0%    106.5  57.8% encoding/gob.func.004
     0.0   0.0%   0.0%    106.5  57.8% github.com/influxdb/influxdb/_vendor/raft.(*server).LoadSnapshot
     0.0   0.0%   0.0%    106.5  57.8% github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).Recovery
     0.0   0.0%   0.0%    106.5  57.8% github.com/influxdb/influxdb/coordinator.(*RaftServer).startRaft
     0.0   0.0%   0.0%    106.5  57.8% github.com/influxdb/influxdb/coordinator.func.006
     0.0   0.0%   0.0%     68.5  37.2% reflect.Value.SetMapIndex
    68.5  37.2%  37.2%     68.5  37.2% reflect.mapassign
     0.0   0.0%  37.2%     38.1  20.7% encoding/gob.(*Encoder).Encode
     0.0   0.0%  37.2%     38.1  20.7% encoding/gob.(*Encoder).EncodeValue
     0.0   0.0%  37.2%     38.1  20.7% github.com/influxdb/influxdb/_vendor/raft.(*server).TakeSnapshot
     0.0   0.0%  37.2%     38.1  20.7% github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).Save
     0.0   0.0%  37.2%     38.1  20.7% github.com/influxdb/influxdb/coordinator.(*RaftServer).CompactLog

The process memory consumption seen from the OS is around 1G, with a really big chunk of 800M...

As proposed, I'm going to run with even more debugging:

GODEBUG="gctrace=1" HEAP_PROFILE_MMAP=true ./influxdb --stdout --profile /opt/data/influxdb/profile/influxdb -pidfile /opt/data/apps/influxdb/shared/influxdb.pid -config /opt/data/influxdb/shared/config.toml  2>&1 | tee /opt/data/influxdb/profile/influxdb.log
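
Once it has run for a while, I assume the GC trace lines can be pulled from the captured log, and any heap profiler dumps written under the --profile prefix inspected with pprof (a sketch; the exact dump file name is assumed):

# GC trace lines (gctrace=1 writes them to stderr, which tee captures above)
grep '^gc' /opt/data/influxdb/profile/influxdb.log
# heap profiler dumps, if any are written under the --profile prefix
ls /opt/data/influxdb/profile/
go tool pprof ./influxdb /opt/data/influxdb/profile/influxdb.0001.heap   # file name assumed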

@prune998

The Influx process is stuck at 200% CPU. Trying gdb to dump all thread stacks with:

thread apply all bt
Thread 15 (Thread 0x7f4d964be700 (LWP 33226)):
#0  runtime.futex () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x00000000004f75b7 in runtime.futexsleep () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/os_linux.c:49
#2  0x00000000012fc8b8 in runtime.sched ()
#3  0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7f4d95bbd700 (LWP 33227)):
#0  0x00007f4d97f72093 in sys_futex (t=0x7f4d95bbcaa0, v=2, o=128, a=0x7f4d981a8f68 <heap_lock>) at ./src/base/linux_syscall_support.h:1929
#1  base::internal::SpinLockDelay (w=w@entry=0x7f4d981a8f68 <heap_lock>, value=2, loop=loop@entry=24854) at ./src/base/spinlock_linux-inl.h:87
#2  0x00007f4d97f71f87 in SpinLock::SlowLock (this=this@entry=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.cc:132
#3  0x00007f4d97f6e484 in Lock (this=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.h:70
#4  SpinLockHolder (l=0x7f4d981a8f68 <heap_lock>, this=<synthetic pointer>) at src/base/spinlock.h:135
#5  RecordAlloc (skip_count=0, bytes=12, ptr=0x2ce1780) at src/heap-profiler.cc:317
#6  NewHook (ptr=0x2ce1780, size=12) at src/heap-profiler.cc:339
#7  0x00007f4d97f69682 in MallocHook::InvokeNewHookSlow (p=p@entry=0x2ce1780, s=s@entry=12) at src/malloc_hook.cc:525
#8  0x00007f4d97f7431b in InvokeNewHook (s=12, p=0x2ce1780) at src/malloc_hook-inl.h:161
#9  tc_malloc (size=size@entry=12) at src/tcmalloc.cc:1564
#10 0x00000000004cf96c in x_cgo_malloc (p=0x7f4d915be388) at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/cgo/gcc_util.c:16
#11 0x0000000000516511 in runtime.asmcgocall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:705
#12 0x000000000811b680 in ?? ()
#13 0x0000000000506ef7 in selparkcommit () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/chan.goc:631
#14 0x000000c20811a900 in ?? ()
#15 0x00000000004fcc6e in park0 () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1410
#16 0x000000c20811ab40 in ?? ()
#17 0x00007f4d915be328 in ?? ()
#18 0x000000c20811a900 in ?? ()
#19 0x0000000000514dfb in runtime.mcall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:181
#20 0x000000c20811ab40 in ?? ()
#21 0x000000c208012900 in ?? ()
#22 0x00000000004cfa1c in crosscall_amd64 () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/cgo/gcc_amd64.S:35
#23 0x00007f4d95bbd700 in ?? ()
#24 0x00007f4d95bbd9c0 in ?? ()
#25 0x0000000000000000 in ?? ()

Thread 13 (Thread 0x7f4d953bc700 (LWP 33228)):
#0  runtime.futex () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x00000000004f75b7 in runtime.futexsleep () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/os_linux.c:49
#2  0x00000000012fc500 in text/template.zero ()
#3  0x0000000000000000 in ?? ()

Thread 12 (Thread 0x7f4d94bbb700 (LWP 33229)):
#0  0x00007f4d97f72093 in sys_futex (t=0x7f4d94bbaab0, v=2, o=128, a=0x7f4d981a8f68 <heap_lock>) at ./src/base/linux_syscall_support.h:1929
#1  base::internal::SpinLockDelay (w=w@entry=0x7f4d981a8f68 <heap_lock>, value=2, loop=loop@entry=68594) at ./src/base/spinlock_linux-inl.h:87
#2  0x00007f4d97f71f87 in SpinLock::SlowLock (this=this@entry=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.cc:132
#3  0x00007f4d97f6e484 in Lock (this=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.h:70
#4  SpinLockHolder (l=0x7f4d981a8f68 <heap_lock>, this=<synthetic pointer>) at src/base/spinlock.h:135
#5  RecordAlloc (skip_count=0, bytes=32, ptr=0x2ead8a0) at src/heap-profiler.cc:317
#6  NewHook (ptr=0x2ead8a0, size=32) at src/heap-profiler.cc:339
#7  0x00007f4d97f69682 in MallocHook::InvokeNewHookSlow (p=p@entry=0x2ead8a0, s=s@entry=32) at src/malloc_hook.cc:525
#8  0x00007f4d97f7431b in InvokeNewHook (s=32, p=0x2ead8a0) at src/malloc_hook-inl.h:161
#9  tc_malloc (size=size@entry=32) at src/tcmalloc.cc:1564
#10 0x00000000004cf9be in x_cgo_thread_start (arg=0x7f4d981ffe88) at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/cgo/gcc_util.c:39
#11 0x0000000000516511 in runtime.asmcgocall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:705
#12 0x000000c2165a6000 in ?? ()
#13 0x0000000000506ef7 in selparkcommit () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/chan.goc:631
#14 0x00000000004fd8c8 in mstackalloc () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1755
#15 0x000000c2080b4268 in ?? ()
#16 0x000000c200008000 in ?? ()
#17 0x00007f4d981ffe70 in ?? ()
#18 0x000000c2080b4240 in ?? ()
#19 0x0000000000514dfb in runtime.mcall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:181
#20 0x000000c2080b4240 in ?? ()
#21 0x000000c208012900 in ?? ()
#22 0x00000000004cfa1c in crosscall_amd64 () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/cgo/gcc_amd64.S:35
#23 0x00007f4d94bbb700 in ?? ()
#24 0x00007f4d94bbb9c0 in ?? ()
#25 0x0000000000000000 in ?? ()

Thread 11 (Thread 0x7f4d943ba700 (LWP 33230)):
#0  runtime.futex () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x00000000004f75b7 in runtime.futexsleep () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/os_linux.c:49
#2  0x00007f4d92e08008 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 10 (Thread 0x7f4d93919700 (LWP 33231)):
#0  runtime.futex () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x00000000004f75b7 in runtime.futexsleep () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/os_linux.c:49
#2  0x000000c208099bf0 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f4d92de7700 (LWP 33232)):
#0  runtime.futex () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x00000000004f75b7 in runtime.futexsleep () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/os_linux.c:49
#2  0x00007f4d9820bf60 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 8 (Thread 0x7f4d925c6700 (LWP 33233)):
#0  runtime.futex () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x00000000004f7622 in runtime.futexsleep () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/os_linux.c:55
#2  0x00000000012fc8a8 in runtime.sched ()
#3  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f4d91dc5700 (LWP 33234)):
#0  0x00007f4d97f72093 in sys_futex (t=0x7f4d91dc4a60, v=2, o=128, a=0x7f4d981a8f68 <heap_lock>) at ./src/base/linux_syscall_support.h:1929
#1  base::internal::SpinLockDelay (w=w@entry=0x7f4d981a8f68 <heap_lock>, value=2, loop=loop@entry=26689) at ./src/base/spinlock_linux-inl.h:87
#2  0x00007f4d97f71f87 in SpinLock::SlowLock (this=this@entry=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.cc:132
#3  0x00007f4d97f6e484 in Lock (this=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.h:70
#4  SpinLockHolder (l=0x7f4d981a8f68 <heap_lock>, this=<synthetic pointer>) at src/base/spinlock.h:135
#5  RecordAlloc (skip_count=0, bytes=8, ptr=0x2cda488) at src/heap-profiler.cc:317
#6  NewHook (ptr=0x2cda488, size=8) at src/heap-profiler.cc:339
#7  0x00007f4d97f69682 in MallocHook::InvokeNewHookSlow (p=p@entry=0x2cda488, s=s@entry=8) at src/malloc_hook.cc:525
#8  0x00007f4d97f75c9b in InvokeNewHook (s=8, p=0x2cda488) at src/malloc_hook-inl.h:161
#9  tc_new (size=size@entry=8) at src/tcmalloc.cc:1607
#10 0x00000000008f1b6a in rocksdb_writebatch_create () at db/c.cc:861
#11 0x00000000004e519b in _cgo_e19cb4f8be60_Cfunc_rocksdb_writebatch_create ()
#12 0x0000000000516511 in runtime.asmcgocall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:705
#13 0x00007f4d9838eef8 in ?? ()
#14 0x0000000000000002 in ?? ()
#15 0x000000c2194c0900 in ?? ()
#16 0x00000000004fcc6e in park0 () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1410
#17 0x000000c20811a6c0 in ?? ()
#18 0x00007f4d8da0b860 in ?? ()
#19 0x000000c2194c0900 in ?? ()
#20 0x0000000000514dfb in runtime.mcall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:181
#21 0x000000c20811a6c0 in ?? ()
#22 0x000000c208012900 in ?? ()
#23 0x00000000004cfa1c in crosscall_amd64 () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/cgo/gcc_amd64.S:35
#24 0x00007f4d91dc5700 in ?? ()
#25 0x00007f4d91dc59c0 in ?? ()
#26 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f4d90d32700 (LWP 33235)):
#0  runtime.osyield () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:343
#1  0x00000000004fb865 in lockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:886
#2  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#3  0x000000c21e83d680 in ?? ()
#4  0x00000000004fb8ec in unlockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:907
#5  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#6  0x000000c21660c000 in ?? ()
#7  0x00000000004fb5f7 in runtime.needm () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:759
#8  0x000000c21660c000 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f4d90531700 (LWP 33236)):
#0  runtime.osyield () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:343
#1  0x00000000004fb865 in lockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:886
#2  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#3  0x000000c21e83d680 in ?? ()
#4  0x00000000004fb8ec in unlockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:907
#5  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#6  0x000000c21660c000 in ?? ()
#7  0x00000000004fb5f7 in runtime.needm () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:759
#8  0x000000c21660c000 in ?? ()
#9  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f4d8fd30700 (LWP 33237)):
#0  runtime.osyield () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:343
#1  0x00000000004fb865 in lockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:886
#2  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#3  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f4d8f52f700 (LWP 33238)):
#0  runtime.osyield () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:343
#1  0x00000000004fb865 in lockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:886
#2  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#3  0x00007f4d8f52c6b0 in ?? ()
#4  0xffffffffffffffff in ?? ()
#5  0x0000000000000001 in ?? ()
#6  0x00007f4d8f52c788 in ?? ()
#7  0x00000000004fb5f7 in runtime.needm () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:759
#8  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f4d8d2b0700 (LWP 33257)):
#0  runtime.osyield () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/sys_linux_amd64.s:343
#1  0x00000000004fb865 in lockextra () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:886
#2  0x00000000012fb4b0 in runtime.externalthreadhandlerp ()
#3  0x0000000002d41400 in ?? ()
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f4d983bd880 (LWP 33225)):
#0  0x00007f4d97f72093 in sys_futex (t=0x7fffdcd28560, v=2, o=128, a=0x7f4d981a8f68 <heap_lock>) at ./src/base/linux_syscall_support.h:1929
#1  base::internal::SpinLockDelay (w=w@entry=0x7f4d981a8f68 <heap_lock>, value=2, loop=loop@entry=34060) at ./src/base/spinlock_linux-inl.h:87
#2  0x00007f4d97f71f87 in SpinLock::SlowLock (this=this@entry=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.cc:132
#3  0x00007f4d97f6e484 in Lock (this=0x7f4d981a8f68 <heap_lock>) at src/base/spinlock.h:70
#4  SpinLockHolder (l=0x7f4d981a8f68 <heap_lock>, this=<synthetic pointer>) at src/base/spinlock.h:135
#5  RecordAlloc (skip_count=0, bytes=8, ptr=0x2cda480) at src/heap-profiler.cc:317
#6  NewHook (ptr=0x2cda480, size=8) at src/heap-profiler.cc:339
#7  0x00007f4d97f69682 in MallocHook::InvokeNewHookSlow (p=p@entry=0x2cda480, s=s@entry=8) at src/malloc_hook.cc:525
#8  0x00007f4d97f75c9b in InvokeNewHook (s=8, p=0x2cda480) at src/malloc_hook-inl.h:161
#9  tc_new (size=size@entry=8) at src/tcmalloc.cc:1607
#10 0x00000000008f1b6a in rocksdb_writebatch_create () at db/c.cc:861
#11 0x00000000004e519b in _cgo_e19cb4f8be60_Cfunc_rocksdb_writebatch_create ()
#12 0x0000000000516511 in runtime.asmcgocall () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:705
#13 0x0000000000000002 in ?? ()
#14 0x000000c20811ad80 in ?? ()
#15 0x00000000004fcc6e in park0 () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1410
#16 0x000000c20811a6c0 in ?? ()
#17 0x00007f4d9838eef8 in ?? ()
#18 0x00007f4d915a2860 in ?? ()
#19 0x000000c20811ad80 in ?? ()
#20 0x000000c20811a6c0 in ?? ()
#21 0x00000000012fce28 in runtime.g0 ()
#22 0x0000000000514d07 in _rt0_go () at /home/jvshahid/.gvm/gos/go1.3.1/src/pkg/runtime/asm_amd64.s:103
#23 0x00007fffdcd28978 in ?? ()
#24 0x0000000000000008 in ?? ()
#25 0x00007fffdcd28978 in ?? ()
#26 0x0000000000000000 in ?? ()

@prune998

The profile of the debug run (which ended with a lock) is on Dropbox:
https://www.dropbox.com/s/zwfoxzt7i9957py/profile-prune.tar.gz?dl=0

@prune998

I had a better run, with memory usage going up to 1+ GB.
The process finally crashed by itself, so you have a complete view.

https://dl.dropboxusercontent.com/u/1965631/profile-prune-2.tar.gz

@prune998

As requested on IRC, here are the RocksDB LOG files from both servers.
The replica server is the one with the profile files above.

https://dl.dropboxusercontent.com/u/1965631/rocksdb-logs-prune-master.tar.gz
https://dl.dropboxusercontent.com/u/1965631/rocksdb-logs-prune-replica.tar.gz

@prune998

The leader process (also the one receiving reads and writes) crashed with an out-of-memory error (not the kernel OOM killer):

fatal error: runtime: out of memory

goroutine 51 [running]:
runtime.throw(0x100d557)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/panic.c:520 +0x69 fp=0xc212f6f1c0 sp=0xc212f6f1a8
runtime.SysMap(0xc2f3280000, 0x5300000, 0x507c00, 0x1042e78)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/mem_linux.c:147 +0x93 fp=0xc212f6f1f0 sp=0xc212f6f1c0
runtime.MHeap_SysAlloc(0x104ee60, 0x5300000)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/malloc.goc:616 +0x15b fp=0xc212f6f248 sp=0xc212f6f1f0
MHeap_Grow(0x104ee60, 0x2980)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/mheap.c:319 +0x5d fp=0xc212f6f288 sp=0xc212f6f248
MHeap_AllocLocked(0x104ee60, 0x2980, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/mheap.c:222 +0x379 fp=0xc212f6f2c8 sp=0xc212f6f288
runtime.MHeap_Alloc(0x104ee60, 0x2980, 0x10100000000)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/mheap.c:178 +0x7b fp=0xc212f6f2f0 sp=0xc212f6f2c8
largealloc(0xc200000001, 0xc212f6f3a0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/malloc.goc:224 +0xa2 fp=0xc212f6f338 sp=0xc212f6f2f0
runtime.mallocgc(0x52ffc00, 0xae0981, 0x1)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/malloc.goc:169 +0xb6 fp=0xc212f6f3a0 sp=0xc212f6f338
cnew(0xae0980, 0x52ffc00, 0x1)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/malloc.goc:836 +0xc1 fp=0xc212f6f3c0 sp=0xc212f6f3a0
runtime.cnewarray(0xae0980, 0x52ffc00)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/malloc.goc:849 +0x3a fp=0xc212f6f3e0 sp=0xc212f6f3c0
makeslice1(0xac9080, 0x52ffc00, 0x52ffc00, 0xc212f6f440)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/slice.goc:55 +0x4d fp=0xc212f6f3f8 sp=0xc212f6f3e0
runtime.makeslice(0xac9080, 0x52ffc00, 0x52ffc00, 0x0, 0x52ffc00, 0x52ffc00)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/slice.goc:36 +0xb3 fp=0xc212f6f428 sp=0xc212f6f3f8
bytes.makeSlice(0x52ffc00, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/bytes/buffer.go:191 +0x64 fp=0xc212f6f460 sp=0xc212f6f428
bytes.(*Buffer).grow(0xc29393a4d0, 0x400, 0x400)
        /root/.gvm/gos/go1.3.1/src/pkg/bytes/buffer.go:99 +0x204 fp=0xc212f6f500 sp=0xc212f6f460
bytes.(*Buffer).Write(0xc29393a4d0, 0xc274274038, 0x400, 0x400, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/bytes/buffer.go:127 +0x56 fp=0xc212f6f568 sp=0xc212f6f500
encoding/base64.(*encoder).Write(0xc274274000, 0xc2f0fa7a00, 0x445236, 0x445a3a, 0x1f1fa00, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/base64/base64.go:165 +0x447 fp=0xc212f6f620 sp=0xc212f6f568
encoding/json.encodeByteSlice(0xc29393a4d0, 0xac9080, 0xc29a0f3068, 0x0, 0x176, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:646 +0x386 fp=0xc212f6f708 sp=0xc212f6f620
encoding/json.(*structEncoder).encode(0xc20c119e90, 0xc29393a4d0, 0xc0bcc0, 0xc29a0f3040, 0x0, 0x196, 0x1028600)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:582 +0x2fd fp=0xc212f6f8d0 sp=0xc212f6f708
encoding/json.*structEncoder.(encoding/json.encode)·fm(0xc29393a4d0, 0xc0bcc0, 0xc29a0f3040, 0x0, 0x196, 0xc29a0f3000)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:596 +0x62 fp=0xc212f6f910 sp=0xc212f6f8d0
encoding/json.(*ptrEncoder).encode(0xc212e3d1f0, 0xc29393a4d0, 0xbcd840, 0xc29a0f3040, 0x0, 0x160, 0x100)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:704 +0x128 fp=0xc212f6f980 sp=0xc212f6f910
encoding/json.*ptrEncoder.(encoding/json.encode)·fm(0xc29393a4d0, 0xbcd840, 0xc29a0f3040, 0x0, 0x160, 0xc29a0f3000)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:709 +0x62 fp=0xc212f6f9c0 sp=0xc212f6f980
encoding/json.(*encodeState).reflectValue(0xc29393a4d0, 0xbcd840, 0xc29a0f3040, 0x0, 0x160)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:297 +0x85 fp=0xc212f6f9f8 sp=0xc212f6f9c0
encoding/json.(*encodeState).marshal(0xc29393a4d0, 0xbcd840, 0xc29a0f3040, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:268 +0xe9 fp=0xc212f6fa60 sp=0xc212f6f9f8
encoding/json.Marshal(0xbcd840, 0xc29a0f3040, 0x0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/encoding/json/encode.go:133 +0x9c fp=0xc212f6fac0 sp=0xc212f6fa60
github.com/influxdb/influxdb/_vendor/raft.(*Snapshot).save(0xc29a0f3040, 0x0, 0x0)
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/snapshot.go:64 +0xff fp=0xc212f6fbc0 sp=0xc212f6fac0
github.com/influxdb/influxdb/_vendor/raft.(*server).saveSnapshot(0xc20807cb40, 0x0, 0x0)
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/server.go:1239 +0xf0 fp=0xc212f6fc20 sp=0xc212f6fbc0
github.com/influxdb/influxdb/_vendor/raft.(*server).TakeSnapshot(0xc20807cb40, 0x0, 0x0)
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/server.go:1219 +0x76a fp=0xc212f6fe10 sp=0xc212f6fc20
github.com/influxdb/influxdb/coordinator.(*RaftServer).ForceLogCompaction(0xc2080f00b0, 0x0, 0x0)
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/raft_server.go:334 +0x6a fp=0xc212f6feb0 sp=0xc212f6fe10
github.com/influxdb/influxdb/coordinator.(*RaftServer).CompactLog(0xc2080f00b0)
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/raft_server.go:359 +0x2cf fp=0xc212f6ffa0 sp=0xc212f6feb0
runtime.goexit()
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1445 fp=0xc212f6ffa8 sp=0xc212f6ffa0
created by github.com/influxdb/influxdb/coordinator.(*RaftServer).startRaft
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/raft_server.go:401 +0x487
goroutine 16 [IO wait]:
net.runtime_pollWait(0x7f65487f63c0, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc211145b10, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc211145b10, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc211145ab0, 0xd93670, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc210be33d0, 0x8, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc210be33d0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:244 +0x4b
net/http.(*Server).Serve(0xc20faf6d20, 0x7f65487f5788, 0xc210be33d0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1698 +0x91
github.com/influxdb/influxdb/api/http.(*HttpServer).serveListener(0xc2080f0160, 0x7f65487f5788, 0xc210be33d0, 0xc210be33d8)
        /root/gocodez/src/github.com/influxdb/influxdb/api/http/api.go:210 +0xb9
github.com/influxdb/influxdb/api/http.(*HttpServer).Serve(0xc2080f0160, 0x7f65487f5788, 0xc210be33d0)
        /root/gocodez/src/github.com/influxdb/influxdb/api/http/api.go:180 +0xf55
github.com/influxdb/influxdb/api/http.(*HttpServer).ListenAndServe(0xc2080f0160)
        /root/gocodez/src/github.com/influxdb/influxdb/api/http/api.go:94 +0x1a4
github.com/influxdb/influxdb/server.(*Server).ListenAndServe(0xc208042480, 0x0, 0x0)
        /root/gocodez/src/github.com/influxdb/influxdb/server/server.go:191 +0x9b9
main.main()
        /root/gocodez/src/github.com/influxdb/influxdb/daemon/influxd.go:166 +0xec7

goroutine 19 [finalizer wait, 1663 minutes]:
runtime.park(0x4f7b20, 0x1026920, 0x1010ee9)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x1026920, 0x1010ee9)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall, 4822 minutes]:
os/signal.loop()
        /root/.gvm/gos/go1.3.1/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
        /root/.gvm/gos/go1.3.1/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 21 [chan receive, 4822 minutes]:
code.google.com/p/log4go.ConsoleLogWriter.run(0xc20806e000, 0x7f654c46a440, 0xc20802c008)
        /root/gocodez/src/code.google.com/p/log4go/termlog.go:27 +0x79
created by code.google.com/p/log4go.NewConsoleLogWriter
        /root/gocodez/src/code.google.com/p/log4go/termlog.go:19 +0x68

goroutine 17 [syscall, 4822 minutes]:
runtime.goexit()
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/proc.c:1445

goroutine 22 [select]:
code.google.com/p/log4go.func·002()
        /root/gocodez/src/code.google.com/p/log4go/filelog.go:84 +0x8ac
created by code.google.com/p/log4go.NewFileLogWriter
        /root/gocodez/src/code.google.com/p/log4go/filelog.go:116 +0x2c4

goroutine 23 [chan receive, 1651 minutes]:
github.com/influxdb/influxdb/wal.(*WAL).processEntries(0xc2080bc620)
        /root/gocodez/src/github.com/influxdb/influxdb/wal/wal.go:252 +0x64
created by github.com/influxdb/influxdb/wal.NewWAL
        /root/gocodez/src/github.com/influxdb/influxdb/wal/wal.go:103 +0xa53
goroutine 24 [sleep, 1 minutes]:
time.Sleep(0x8bb2c97000)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/time.goc:39 +0x31
github.com/influxdb/influxdb/cluster.func·001()
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/cluster_configuration.go:170 +0x3d
created by github.com/influxdb/influxdb/cluster.(*ClusterConfiguration).CreateFutureShardsAutomaticallyBeforeTimeComes
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/cluster_configuration.go:177 +0x63

goroutine 25 [chan receive, 4822 minutes]:
main.waitForSignals(0x7f65487f5708, 0xc208042480)
        /root/gocodez/src/github.com/influxdb/influxdb/daemon/null_profiler.go:23 +0x14c
created by main.startProfiler
        /root/gocodez/src/github.com/influxdb/influxdb/daemon/null_profiler.go:15 +0x4b

goroutine 26 [IO wait, 2052 minutes]:
net.runtime_pollWait(0x7f65487f6730, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc2087d4060, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc2087d4060, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc2087d4000, 0xd93670, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc20802c030, 0x8, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc20802c030, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:244 +0x4b
net/http.(*Server).Serve(0xc208004360, 0x7f65487f5788, 0xc20802c030, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1698 +0x91
github.com/influxdb/influxdb/coordinator.func·008()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/raft_server.go:579 +0x3a
created by github.com/influxdb/influxdb/coordinator.(*RaftServer).Serve
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/raft_server.go:583 +0x516

goroutine 53 [select, 1651 minutes]:
github.com/influxdb/influxdb/cluster.(*WriteBuffer).handleWrites(0xc2105fc8c0)
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/write_buffer.go:75 +0xd3
created by github.com/influxdb/influxdb/cluster.NewWriteBuffer
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/write_buffer.go:43 +0x286

goroutine 59 [IO wait, 4820 minutes]:
net.runtime_pollWait(0x7f65487f6520, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc21110ced0, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc21110ced0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc21110ce70, 0xd93670, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc212645c50, 0x18, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc212645c50, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:244 +0x4b
github.com/influxdb/influxdb/api/graphite.(*Server).Serve(0xc2086c6870, 0x7f65487f5788, 0xc212645c50)
        /root/gocodez/src/github.com/influxdb/influxdb/api/graphite/api.go:118 +0x3b
github.com/influxdb/influxdb/api/graphite.(*Server).ListenAndServe(0xc2086c6870)
        /root/gocodez/src/github.com/influxdb/influxdb/api/graphite/api.go:113 +0x26d
created by github.com/influxdb/influxdb/server.(*Server).ListenAndServe
        /root/gocodez/src/github.com/influxdb/influxdb/server/server.go:149 +0xfdc
goroutine 29 [runnable]:
time.Sleep(0x5f5e100)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/time.goc:39 +0x31
github.com/influxdb/influxdb/cluster.(*WriteBuffer).write(0xc20c4477a0, 0xc255cd1300)
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/write_buffer.go:105 +0x54c
github.com/influxdb/influxdb/cluster.(*WriteBuffer).handleWrites(0xc20c4477a0)
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/write_buffer.go:79 +0xc0
created by github.com/influxdb/influxdb/cluster.NewWriteBuffer
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/write_buffer.go:43 +0x286

goroutine 30 [sleep]:
time.Sleep(0xbebc200)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/time.goc:39 +0x31
github.com/influxdb/influxdb/coordinator.(*ProtobufClient).readResponses(0xc20803b040)
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/protobuf_client.go:167 +0x12e
github.com/influxdb/influxdb/coordinator.func·005()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/protobuf_client.go:68 +0x3f
created by github.com/influxdb/influxdb/coordinator.(*ProtobufClient).connect
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/protobuf_client.go:69 +0x96

goroutine 31 [sleep]:
time.Sleep(0xdf8475800)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/time.goc:39 +0x31
github.com/influxdb/influxdb/coordinator.(*ProtobufClient).peridicallySweepTimedOutRequests(0xc20803b040)
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/protobuf_client.go:244 +0x37
created by github.com/influxdb/influxdb/coordinator.(*ProtobufClient).connect
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/protobuf_client.go:70 +0xb1

goroutine 49 [sleep]:
time.Sleep(0x2540be400)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/time.goc:39 +0x31
github.com/influxdb/influxdb/cluster.(*ClusterServer).handleHeartbeatError(0xc2080423f0, 0x7f654c4661b0, 0xc238ed5560)
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/cluster_server.go:212 +0x1b8
github.com/influxdb/influxdb/cluster.(*ClusterServer).heartbeat(0xc2080423f0)
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/cluster_server.go:166 +0x14f
created by github.com/influxdb/influxdb/cluster.(*ClusterServer).StartHeartbeat
        /root/gocodez/src/github.com/influxdb/influxdb/cluster/cluster_server.go:91 +0x5a

goroutine 52 [IO wait, 2317 minutes]:
net.runtime_pollWait(0x7f65487f6680, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc2105fc990, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc2105fc990, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc2105fc930, 0xd93670, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc2104ad658, 0x18, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc2104ad658, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:244 +0x4b
github.com/influxdb/influxdb/coordinator.(*ProtobufServer).ListenAndServe(0xc2082afd00)
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/protobuf_server.go:64 +0x1c7
created by github.com/influxdb/influxdb/server.(*Server).ListenAndServe
        /root/gocodez/src/github.com/influxdb/influxdb/server/server.go:122 +0x2a2

goroutine 50 [select]:
github.com/influxdb/influxdb/_vendor/raft.(*server).candidateLoop(0xc20807cb40)
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/server.go:779 +0xa09
github.com/influxdb/influxdb/_vendor/raft.(*server).loop(0xc20807cb40)
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/server.go:607 +0x2b0
github.com/influxdb/influxdb/_vendor/raft.func·007()
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/server.go:470 +0x5d
created by github.com/influxdb/influxdb/_vendor/raft.(*server).Start
        /root/gocodez/src/github.com/influxdb/influxdb/_vendor/raft/server.go:471 +0x3b8
goroutine 58 [IO wait, 4821 minutes]:
net.runtime_pollWait(0x7f65487f6470, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc21110cf40, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc21110cf40, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc21110cee0, 0xd93670, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc212645c58, 0xc2105ceba8, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc212645c58, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:244 +0x4b
net/http.(*Server).Serve(0xc20ec185a0, 0x7f65487f5788, 0xc212645c58, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1698 +0x91
net/http.Serve(0x7f65487f5788, 0xc212645c58, 0x7f653840f610, 0xc20c014bb0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1576 +0x7c
github.com/influxdb/influxdb/admin.(*HttpServer).ListenAndServe(0xc2082ae000)
        /root/gocodez/src/github.com/influxdb/influxdb/admin/http_server.go:35 +0x1ea
created by github.com/influxdb/influxdb/server.(*Server).ListenAndServe
        /root/gocodez/src/github.com/influxdb/influxdb/server/server.go:136 +0x556

goroutine 60 [chan receive, 501 minutes]:
github.com/influxdb/influxdb/server.(*Server).startReportingLoop(0xc208042480, 0xc20be2a4e0)
        /root/gocodez/src/github.com/influxdb/influxdb/server/server.go:203 +0xc9
created by github.com/influxdb/influxdb/server.(*Server).ListenAndServe
        /root/gocodez/src/github.com/influxdb/influxdb/server/server.go:184 +0x8aa

goroutine 41 [select]:
github.com/influxdb/influxdb/api/graphite.(*Server).committer(0xc2086c6870)
        /root/gocodez/src/github.com/influxdb/influxdb/api/graphite/api.go:211 +0x575
created by github.com/influxdb/influxdb/api/graphite.(*Server).ListenAndServe
        /root/gocodez/src/github.com/influxdb/influxdb/api/graphite/api.go:112 +0x241

goroutine 4566123 [chan receive, 198 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4585645 [chan receive, 181 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4752222 [chan receive, 33 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 61 [IO wait]:
net.runtime_pollWait(0x7f65487f6260, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc212cf1f00, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc212cf1f00, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).accept(0xc212cf1ea0, 0xd93670, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:419 +0x343
net.(*TCPListener).AcceptTCP(0xc210be36b8, 0xc228bf0120, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:234 +0x5d
net.(*TCPListener).Accept(0xc210be36b8, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/tcpsock_posix.go:244 +0x4b
crypto/tls.(*listener).Accept(0xc209c195c0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/crypto/tls/tls.go:46 +0x6a
net/http.(*Server).Serve(0xc210d38240, 0x7f653840fe18, 0xc209c195c0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1698 +0x91
github.com/influxdb/influxdb/api/http.(*HttpServer).serveListener(0xc2080f0160, 0x7f653840fe18, 0xc209c195c0, 0xc210be33d8)
        /root/gocodez/src/github.com/influxdb/influxdb/api/http/api.go:210 +0xb9
github.com/influxdb/influxdb/api/http.(*HttpServer).startSsl(0xc2080f0160, 0xc210be33d8)
        /root/gocodez/src/github.com/influxdb/influxdb/api/http/api.go:205 +0x402
created by github.com/influxdb/influxdb/api/http.(*HttpServer).Serve
        /root/gocodez/src/github.com/influxdb/influxdb/api/http/api.go:179 +0xf27

goroutine 4589020 [chan receive, 178 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4692949 [chan receive, 85 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4584005 [chan receive, 182 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4643403 [chan receive, 129 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4616432 [chan receive, 153 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4695759 [chan receive, 83 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4633246 [chan receive, 138 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4580101 [chan receive, 186 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4775957 [chan receive, 11 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4787779 [chan receive, 1 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4650693 [chan receive, 123 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4618133 [chan receive, 152 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4789607 [IO wait]:
net.runtime_pollWait(0x7f65158e0938, 0x72, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/runtime/netpoll.goc:146 +0x66
net.(*pollDesc).Wait(0xc23e4a1aa0, 0x72, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:84 +0x46
net.(*pollDesc).WaitRead(0xc23e4a1aa0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_poll_runtime.go:89 +0x42
net.(*netFD).Read(0xc23e4a1a40, 0xc21ed01000, 0x1000, 0x1000, 0x0, 0x7f654c46a2b8, 0xb)
        /root/.gvm/gos/go1.3.1/src/pkg/net/fd_unix.go:242 +0x34c
net.(*conn).Read(0xc254273fc8, 0xc21ed01000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/net.go:122 +0xe7
net/http.(*liveSwitchReader).Read(0xc20caa6328, 0xc21ed01000, 0x1000, 0x1000, 0x702f5b, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:206 +0xaf
io.(*LimitedReader).Read(0xc2860588e0, 0xc21ed01000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/io/io.go:399 +0xd0
bufio.(*Reader).fill(0xc232168180)
        /root/.gvm/gos/go1.3.1/src/pkg/bufio/bufio.go:97 +0x1b3
bufio.(*Reader).ReadSlice(0xc232168180, 0xc22c68680a, 0x0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/bufio/bufio.go:298 +0x22c
bufio.(*Reader).ReadLine(0xc232168180, 0x0, 0x0, 0x0, 0x506300, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/bufio/bufio.go:326 +0x69
net/textproto.(*Reader).readLineSlice(0xc28dc055f0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/textproto/reader.go:55 +0x9d
net/textproto.(*Reader).ReadLine(0xc28dc055f0, 0x0, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/textproto/reader.go:36 +0x4e
net/http.ReadRequest(0xc232168180, 0xc22c686ea0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/request.go:556 +0xc7
net/http.(*conn).readRequest(0xc20caa6300, 0x0, 0x0, 0x0)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:577 +0x276
net/http.(*conn).serve(0xc20caa6300)
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1132 +0x61e
created by net/http.(*Server).Serve
        /root/.gvm/gos/go1.3.1/src/pkg/net/http/server.go:1721 +0x313
goroutine 4627079 [chan receive, 144 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4559461 [chan receive, 204 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4762404 [chan receive, 24 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4629345 [chan receive, 142 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4613643 [chan receive, 156 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4595181 [chan receive, 172 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4588473 [chan receive, 178 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4573399 [chan receive, 192 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4701430 [chan receive, 78 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4622054 [chan receive, 148 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4774257 [chan receive, 13 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf
goroutine 4627106 [chan receive, 144 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4682258 [chan receive, 95 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4634382 [chan receive, 137 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4619251 [chan receive, 151 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4764663 [chan receive, 22 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4617567 [chan receive, 152 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4594626 [chan receive, 173 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4577860 [chan receive, 188 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4628230 [chan receive, 143 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4635521 [chan receive, 136 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4770879 [chan receive, 16 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4587347 [chan receive, 179 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4600773 [chan receive, 167 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4572217 [chan receive, 193 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4571713 [chan receive, 193 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4768610 [chan receive, 18 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4663087 [chan receive, 112 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4661958 [chan receive, 113 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4585110 [chan receive, 181 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4594604 [chan receive, 173 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4775958 [chan receive, 11 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

goroutine 4580652 [chan receive, 185 minutes]:
github.com/influxdb/influxdb/coordinator.func·002()
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:317 +0x6a
created by github.com/influxdb/influxdb/coordinator.(*CoordinatorImpl).getShardsAndProcessor
        /root/gocodez/src/github.com/influxdb/influxdb/coordinator/coordinator.go:330 +0x3bf

Well, this is not the complete message, it's TOO big, but I think you got the point: lots of goroutines, some in "chan receive" and some in "IO wait".

@prune998
Copy link
Author

In total that is 73 goroutines in IO wait and 2148 in chan receive.
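For what it's worth, a trace like this (thousands of goroutines all parked at the same chan receive line) usually points at the generic pattern sketched below: a goroutine is started per request and blocks forever on a channel that is never signalled, so the count, and whatever its closure holds, only grows. Illustrative only, not the actual coordinator code:

package main

import "time"

// handleRequest models the leak pattern: one goroutine per request, parked on
// a channel receive that may never be satisfied.
func handleRequest(done chan struct{}) {
  go func() {
    <-done // if nothing ever closes or sends on done, this goroutine leaks
  }()
}

func main() {
  for i := 0; i < 2000; i++ {
    handleRequest(make(chan struct{})) // nobody ever signals these channels
  }
  time.Sleep(time.Second) // ~2000 goroutines are now parked in "chan receive"
}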

@prune998
Copy link
Author

Re-installed the cluster on 4 nodes, 2 physical servers and 2 VMware hosts. Ubuntu 14.04, dedicated, InfluxDB 0.8.2.
Still the same memory consumption. The most heavily hit servers (physical, with 32G of memory) are around 2.7G of memory used and growing.
The less heavily hit servers (replicas, on VMware) are around 1.9G of memory, also growing.

I have provided everything I could, here and on IRC. Everything requested, including logs from a debug binary and LOG files. I'm still waiting for an answer on your side.

Could you please, at least, change the status from "more information" to "acknowledged and working on this huge memory problem influx is having"?

The last thing I could do, which will not be easy, is give ssh access to someone so they could have a look themselves...

@maoe
Copy link

maoe commented Sep 26, 2014

I'm now seeing similar behavior here. We have three nodes in a cluster and they had been working fine until this morning. Today, one of the nodes suddenly started growing in memory usage and keeps growing so far, although the other two nodes look normal. We're using 0.8.2.

@localhots
Copy link

We have a single node running 0.8.2 which started leaking memory right after upgrading it from 0.7.3.

@jvshahid
Copy link
Contributor

Anyone on this thread who can provide a script to reproduce this issue would be a big help. I tried to reproduce it a few days ago with no luck, then had to look at other bugs and cut a release, which distracted me. I'll keep trying to reproduce the issue today. @localhots, let me know if you can provide any help, since you're using one node only and that will make it easier to isolate the problem and repro it locally.

@jvshahid
Copy link
Contributor

What are the sizes of your databases as reported by the fs ?

@localhots
Copy link

Database size is 23GB. What kind of information would help you?

@tuukkamustonen
Copy link

Also experiencing a memory leak in 0.8.2 that results in a database crash. I'm throwing only ~100-200 metrics per minute at the database and after a day or so it has eaten most of its 4GB of RAM (single-node setup). With less memory it would crash earlier.

DB size is 313M but the issues started way before that.

I'm not using the database for anything exotic, simply 2 shard spaces (one for metrics, one for grafana). Roughly 30 shards total (due to grafana/grafana#663).

Running it on top of EC2 (Amazon Linux 2014.03). The machine does not have swap associated.

@dgnorton
Copy link
Contributor

Here is an app that can be used to stress influxdb. Still need to add a few things to it but it's working as is. https://github.com/dgnorton/influxdb_stress

@prune998
Copy link
Author

prune998 commented Oct 2, 2014

Modifying the stress app to only write new series showed huge memory consumption.
You can test a modified version at: https://github.com/prune998/influxdb_stress

If you stop the app before the OOM error, the server stays up with the memory still used, never giving it back, even if you don't write to the series anymore...

This is easily reproducible, with only one client and a batch size of 100 series:

./influxdb_stress -b=100 -c=1 -i=30 -s=100

As these series only contain one value and are never accessed afterwards, it's not a write contention.
Starting a fresh server took around 70M of RAM.
Inserting around 700k series, which is what I have in production, lifted the memory usage to 1.2G.
I tried stopping and restarting influx: 500M were used before any query.
After a "list series", 780M were used.
After a second "list series", 823M were used.
After a few "list series" it stabilized at around 1.23G used... and I'm still reading/writing nothing...

A "list series" also locks the process at 100% CPU.

I'm going to write a write/read script to add values to existing series and see how memory changes over time, but you already have something to look at now...
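For anyone who wants to reproduce without building the stress tool, this is roughly what the run above boils down to (a sketch only, not the actual tool's code). It assumes a database named "stress" already exists, the default HTTP port 8086 and the default root/root credentials, so adjust to your setup:

package main

import (
  "bytes"
  "fmt"
  "net/http"
)

func main() {
  // 0.8 write API: POST /db/<db>/series with a JSON array of series objects.
  url := "http://localhost:8086/db/stress/series?u=root&p=root"
  for batch := 0; batch < 7000; batch++ { // ~700k brand-new series, 100 per batch
    var buf bytes.Buffer
    buf.WriteString("[")
    for i := 0; i < 100; i++ {
      if i > 0 {
        buf.WriteString(",")
      }
      // every series is new and holds a single point, mirroring the test above
      fmt.Fprintf(&buf, `{"name":"stress.series.%d.%d","columns":["value"],"points":[[1]]}`, batch, i)
    }
    buf.WriteString("]")
    resp, err := http.Post(url, "application/json", &buf)
    if err != nil {
      fmt.Println("write failed:", err)
      return
    }
    resp.Body.Close()
  }
}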

@maoe
Copy link

maoe commented Oct 3, 2014

I've just confirmed that I can reproduce the leak @prune998 described above using ./influxdb_stress -b=100 -c=1 -i=30 -s=100 on a single node. Resident memory of the influxdb process grows linearly over time. Apparently this happens regardless of write-batch-size. There were no reads during the test.

As for the other leak on a cluster node I mentioned earlier, it happens even when there are no writes but there are some reads on the node. I haven't looked into it closely yet, though.

@maoe
Copy link

maoe commented Oct 3, 2014

I was able to reproduce the read-only memory leak consistently with this Haskell script a little while ago, but now I can't reproduce the issue after porting the script to Ruby, for some reason. Probably this leak only happens occasionally.

When I was able to reproduce it, the behavior was as follows:

  1. The insertions didn't blow up the memory usage.
  2. The q2 queries didn't either.
  3. The q1 queries gave me Couldn't look up columns errors, and during this phase influxdb's resident memory increased.
  4. The subsequent q2 queries kept blowing up the memory usage.

If I remember correctly the memory usage was increasing faster than 1MB/s. Once I restarted the influxdb process, q2 queries stopped blowing up the usage.

Unfortunately I cannot reproduce the behavior now with either of the scripts, so I guess there must be some other trigger needed to produce the issue.

EDIT:

I've been using influxdb v0.8.3 on Ubuntu 12.04 throughout the tests and the client libraries are the latest ones.

@tuukkamustonen
Copy link

@jvshahid We also don't introduce that many series per day (maybe ~50 on average and ~200 at maximum). I also remember the DB crashing over the weekend, when no new series were introduced at all.

@sanga
Copy link
Contributor

sanga commented Oct 7, 2014

Has this always been the case or is it a regression? I'm seeing something like this locally but haven't really had time to investigate. I was just thinking that if it's a regression, the quickest way to find the problem is probably a git bisect combined with the "stress script" mentioned above.

@jvshahid
Copy link
Contributor

jvshahid commented Oct 7, 2014

Can you guys report the number of series that you have so far and the memory usage you're seeing? Also, running the following command and capturing the output while the memory usage increases would be great. Please make sure you stop any load on the process and leave it for a while (e.g. 10 mins) before killing it, so that the go runtime returns any unused memory to the os. Also make sure you don't kill -9 the process, otherwise the profile data will not be generated:

GODEBUG='gctrace=1' HEAP_PROFILE_MMAP=true ./influxdb --stdout --profile /tmp/influxdb.profile 2>&1 | tee /tmp/influxdb.stdout

You should use the binary found here https://s3.amazonaws.com/get.influxdb.org/influxdb that has profiling enabled. We are interested in the following files:

  1. /tmp/influxdb.stdout
  2. /tmp/influxdb.profile.mem
  3. /tmp/influxdb.profile.*.heap


@jvshahid
Copy link
Contributor

jvshahid commented Oct 7, 2014

The command from the last comment is

GODEBUG='gctrace=1' HEAP_PROFILE_MMAP=true ./influxdb --stdout --profile /tmp/influxdb.profile 2>&1 | tee /tmp/influxdb.stdout

@perqa
Copy link

perqa commented Oct 8, 2014

Following the instructions, when issuing the given command I get the error "flag provided but not defined: -profile", and InfluxDB does not start. See below:

sudo GODEBUG='gctrace=1' HEAP_PROFILE_MMAP=true ./influxdb --stdout --profile /tmp/influxdb.profile 2>&1 | tee /tmp/influxdb.stdout

gc1(1): 3+2+733+2 us, 0 -> 0 MB, 18 (19-1) objects, 0/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc2(1): 3+2+212+2 us, 0 -> 0 MB, 236 (237-1) objects, 13/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
gc3(1): 4+3+548+2 us, 0 -> 0 MB, 951 (1013-62) objects, 35/0/0 sweeps, 0(0) handoff, 0(0) steal, 0/0/0 yields
flag provided but not defined: -profile
Usage of ./influxdb:
-config="config.sample.toml": Config file
-hostname="": Override the hostname, the hostname config option will be overridden
-pidfile="": the pid file
-protobuf-port=0: Override the protobuf port, the protobuf_port config option will be overridden
-raft-port=0: Override the raft port, the raft.port config option will be overridden
-repair-ldb=false: set to true to repair the leveldb files
-reset-root=false: Reset root password
-stdout=false: Log to stdout overriding the configuration
-syslog="": Log to syslog facility overriding the configuration
-v=false: Get version number

@pkittenis
Copy link

Perhaps a hard memory usage limit read from configuration is the way to go here. Reading series names into memory is fine, but with no limit on how much memory series names can take, and no total limit, memory usage will surely run out of control given enough time.

There should be a way to set a hard limit on how much memory influxdb will use, without having to guess by adjusting write-buffer-size and write-batch-size.
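Just to illustrate the idea (purely a sketch of a hypothetical knob, nothing influxdb exposes today): a configured limit could be compared against the Go runtime's own heap statistics and used to shed load before the OOM killer steps in.

package main

import (
  "log"
  "runtime"
  "time"
)

// watchMemory is hypothetical: it checks the heap against a configured limit
// every 10s and calls overLimit so the server can e.g. reject writes.
func watchMemory(limitBytes uint64, overLimit func()) {
  for range time.Tick(10 * time.Second) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    if m.HeapAlloc > limitBytes {
      log.Printf("heap %d bytes over configured limit %d", m.HeapAlloc, limitBytes)
      overLimit()
    }
  }
}

func main() {
  go watchMemory(2<<30, func() { /* shed load instead of OOMing */ })
  select {} // stand-in for the rest of the server
}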

In case this info is useful, we are running a four node cluster here with two replicas, and over a couple of days both replicas died with OOM while the two non-replicas are still going strong with no suspect memory usage.

We can also easily replicate the memory issue by running the stress tool that @prune998 posted earlier, as well as our own graphite metric-creating test tool, which I can provide if needed.

Profiling can be done locally with either of these tools.

@jvshahid
Copy link
Contributor

jvshahid commented Oct 8, 2014

@perqa sorry about that, i think i forgot to enable profiling when i built that binary. I updated the binary to have profiling enabled you can use the same link i posted earlier.

@dgnorton
Copy link
Contributor

dgnorton commented Oct 9, 2014

https://github.com/dgnorton/influxdb_stress ... added an option to have readers (clients executing queries). The queries aren't configurable from the command line or a file yet, but you can edit the queryTemplates array and rebuild for now. Also added a --reset-db command line option, which will delete the database if it already exists and recreate it. That used to be the default behavior, which made it impossible to run more than one instance of influxdb_stress. Run with -h to get the full list of command line options.
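For example, an extra entry in the queryTemplates array could look like the snippet below (the {{randSeries}} / {{randAggregate}} placeholders are the ones that show up in the copies of the array posted later in this thread; the query itself is only an illustration):

var queryTemplates = []string{
  "list series",
  "select {{randAggregate}} from {{randSeries}} where time > now() - 1h",
}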

@perqa
Copy link

perqa commented Oct 15, 2014

@jvshahid: Thanks for the update. I tried the new binary, but now I get a different error message:

./influxdb: error while loading shared libraries: libtcmalloc.so.4: cannot open shared object file: No such file or directory

I'm running Ubuntu 14 on Vagrant.

@jvshahid
Copy link
Contributor

You may need to run sudo apt-get install libgoogle-perftools*

@perqa
Copy link

perqa commented Oct 15, 2014

Yes, indeed, as well as sudo apt-get -f install. I also had to modify the start command by adding
-config="/opt/influxdb/shared/config.toml" to it. But finally it ran. :-)

The log files are available at
https://www.dropbox.com/sh/0giitjc31kwc4j7/AAANaadiRo0hJzvSXlbLXwdja?dl=0

@jvshahid
Copy link
Contributor

@perqa The memory profile shows very low memory usage, are you sure these profile files are from the right run ?

@perqa
Copy link

perqa commented Oct 16, 2014

It's the one and only profiling run I've done... so it must be the right one. What I did during the session was log in to the web interface on port 8083, navigate to the right database and issue one query:
select * from arne1_mean1h where time > '2013-09-01' and time < '2014-09-01',
where arne1_mean1h is a continuous query, defined as
select mean(value) from arne1 where time > '2013-09-01' and time < '2014-09-01' group by time(1h) into arne1_mean1h.
This query takes approx 8-9s to execute with the "normal" binary, and at least as long with the profiling-enabled binary.

@jvshahid
Copy link
Contributor

That's not the point, you have to trigger the same memory leak behavior in order for the profile to be relevant and useful. Did you see the memory footprint of InfluxDB increase?


@toddboom toddboom added this to the 0.9.0 milestone Nov 26, 2014
@imcom
Copy link

imcom commented Jan 6, 2015

Excuse me @toddboom, I just noticed you added this thread to the 0.9.0 milestone; does that mean this is not going to be fixed in the 0.8.x releases? I am running 0.8.3 in production because of the aggregation function errors (which are fixed in 0.8.8, I assume), but as far as I know you are taking quite a different approach starting from 0.9.0, so I don't think I will migrate to 0.9.x in the near future :( So what would be the cause, or is there any workaround to reduce or free the memory used by influxdb other than restarting the instance?

@imcom
Copy link

imcom commented Jan 7, 2015

Hi @toddboom @jvshahid, I can confirm one situation that causes huge memory consumption/leaking. I am using the stress tool provided by @dgnorton, with 10 writers and 10 readers; the write batch is 3000 series per second per writer and there are 30000 series in total. Two instances run as a cluster on two Ubuntu 12.04 precise servers. The LRU cache for leveldb is 500m and the other configs are pretty much the defaults from the installation. I noticed the memory only leaks when I have queries like the ones below:

var queryTemplates = []string{
  "list series",
  "select * from {{randSeries}} where time > now() - 30m",
  "select {{randAggregate}} from {{randSeries}}",
  "select derivative(value) from {{randSeries}} where time > now() - 5m group by time(1m)",
  "select derivative(value) from {{randSeries}} where time > now() - 15m group by time(5m)",
  "select derivative(value) from {{randSeries}} where time > now() - 30m group by time(10m)",
  "select derivative(value) from /^_minute.writer1*/ where time > now() - 30m group by time(1m)",
}

I am inclined to think that the last query,

select derivative(value) from /^_minute.writer1*/ where time > now() - 30m group by time(1m)

is the troublemaker here; I also have similar queries in my production setup, which will eventually use up all the memory on the server.

The query mentioned above involves 3000 series in the db, and once the query gets executed things start going south... I can see memory usage ever increasing.

In a nutshell, what I've seen is that whenever InfluxDB gets stuck on a query involving a large number of series, it triggers the memory exhaustion. So would the idea be to manually control the query size to avoid the memory issue?

Also, the split setting in the shard config has an impact on the following servers, I assume.
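As a concrete form of the "manually control the query size" workaround, the single regex query could be replaced by the same derivative query issued over small, explicit batches of series. The sketch below makes several assumptions (database name "stress", root/root credentials, made-up series names) and just walks the list one series at a time:

package main

import (
  "fmt"
  "net/http"
  "net/url"
)

// queryInSmallBatches runs the derivative query one series at a time instead
// of one regex matching thousands of series at once.
func queryInSmallBatches(names []string) {
  for _, name := range names {
    q := fmt.Sprintf("select derivative(value) from %q where time > now() - 30m group by time(1m)", name)
    u := "http://localhost:8086/db/stress/series?u=root&p=root&q=" + url.QueryEscape(q)
    resp, err := http.Get(u)
    if err != nil {
      fmt.Println("query failed:", err)
      return
    }
    resp.Body.Close() // results ignored in this sketch
  }
}

func main() {
  queryInSmallBatches([]string{"_minute.writer1.metric1", "_minute.writer1.metric2"})
}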

@imcom
Copy link

imcom commented Jan 7, 2015

Hi @jvshahid, could you post instructions on how to build an influxdb binary with profiling enabled? The version you posted above was compiled against GLIBC_2.17, which I do not have on Ubuntu 12.04, so I will have to build the binary on my own in my environment.
Thanks in advance.

@toddboom
Copy link
Contributor

Current testing of v0.9.0 shows that this is no longer an issue, so we're closing it out.
