Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Server runtime error when execute Watch command #358

Closed
marouane-kech opened this issue Dec 2, 2013 · 19 comments
Closed

Server runtime error when execute Watch command #358

marouane-kech opened this issue Dec 2, 2013 · 19 comments
Labels

Comments

@marouane-kech
Copy link

I try to test the wath command with ( curl -L http://127.0.0.1:4001/v2/keys/message?wait=true ) and I do the same test with python-client. I got the same result:
the etcd instance crash with exeption :

panic serving 127.0.0.1:32907: runtime error: invalid memory address or nil pointer dereference
goroutine 758 [running]:
net/http.func·007()
/usr/local/go/src/pkg/net/http/server.go:1022 +0x9f
sync/atomic.AddUint64()
/usr/local/go/src/pkg/sync/atomic/asm_386.s:69 +0xc
github.com/coreos/etcd/store.(_watcherHub).watch(0x1868b4c0, 0x20186080, 0x8, 0x0, 0x23c, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/store/watcher_hub.go:71 +0x165
github.com/coreos/etcd/store.(_store).Watch(0x1868e480, 0x20186080, 0x8, 0x82a0400, 0x0, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/store/store.go:295 +0x151
github.com/coreos/etcd/server/v2.GetHandler(0x1e53be20, 0x1d2a3c80, 0x1d254310, 0x1ec206f0, 0x18691540, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/server/v2/get_handler.go:49 +0x455
github.com/coreos/etcd/server.func·003(0x1e53be20, 0x1d2a3c80, 0x1d254310, 0x0, 0x84be0e0, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/server/server.go:141 +0x66
github.com/coreos/etcd/server.func·004(0x1e53be20, 0x1d2a3c80, 0x1d254310)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/server/server.go:162 +0x23b
net/http.HandlerFunc.ServeHTTP(0x1868bda0, 0x1e53be20, 0x1d2a3c80, 0x1d254310)
/usr/local/go/src/pkg/net/http/server.go:1149 +0x3c
github.com/gorilla/mux.(_Router).ServeHTTP(0x18666540, 0x1e53be20, 0x1d2a3c80, 0x1d254310)
/home/tprojadmin/etcd/src/github.com/gorilla/mux/mux.go:90 +0x193
net/http.serverHandler.ServeHTTP(0x18691540, 0x1e53be20, 0x1d2a3c80, 0x1d254310)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x121
net/http.(_conn).serve(0x1f3f5190)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x653
created by net/http.(_Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x242
^Ctprojadmin@tproj-devt-04:~/etcd$ ./etcd -data-dir node0 -name node0
[etcd] Dec 2 15:41:49.280 INFO | etcd server [name node0, listen on 127.0.0.1:4001, advertised url http://127.0.0.1:4001]
[etcd] Dec 2 15:42:26.081 INFO | URLs: / node0 (http://127.0.0.1:7001,http://127.0.0.1:7002,http://127.0.0.1:7003)
[etcd] Dec 2 15:42:26.082 WARNING | the entire cluster is down! this peer will restart the cluster.
[etcd] Dec 2 15:42:26.082 INFO | raft server [name node0, listen on 127.0.0.1:7001, advertised url http://127.0.0.1:7001]
2013/12/02 15:43:32 http: panic serving 127.0.0.1:55893: runtime error: invalid memory address or nil pointer dereference
goroutine 781 [running]:
net/http.func·007()
/usr/local/go/src/pkg/net/http/server.go:1022 +0x9f
sync/atomic.AddUint64()
/usr/local/go/src/pkg/sync/atomic/asm_386.s:69 +0xc
github.com/coreos/etcd/store.(_watcherHub).watch(0x1868b4c0, 0x20eb8030, 0x8, 0x0, 0x23c, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/store/watcher_hub.go:71 +0x165
github.com/coreos/etcd/store.(_store).Watch(0x1868e480, 0x20eb8030, 0x8, 0x82a0400, 0x0, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/store/store.go:295 +0x151
github.com/coreos/etcd/server/v2.GetHandler(0x1dc65640, 0x1b1138c0, 0x1aa0dd90, 0x1e48fe10, 0x18691540, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/server/v2/get_handler.go:49 +0x455
github.com/coreos/etcd/server.func·003(0x1dc65640, 0x1b1138c0, 0x1aa0dd90, 0x0, 0x84be0e0, ...)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/server/server.go:141 +0x66
github.com/coreos/etcd/server.func·004(0x1dc65640, 0x1b1138c0, 0x1aa0dd90)
/home/tprojadmin/etcd/src/github.com/coreos/etcd/server/server.go:162 +0x23b
net/http.HandlerFunc.ServeHTTP(0x1868bda0, 0x1dc65640, 0x1b1138c0, 0x1aa0dd90)
/usr/local/go/src/pkg/net/http/server.go:1149 +0x3c
github.com/gorilla/mux.(_Router).ServeHTTP(0x18666540, 0x1dc65640, 0x1b1138c0, 0x1aa0dd90)
/home/tprojadmin/etcd/src/github.com/gorilla/mux/mux.go:90 +0x193
net/http.serverHandler.ServeHTTP(0x18691540, 0x1dc65640, 0x1b1138c0, 0x1aa0dd90)
/usr/local/go/src/pkg/net/http/server.go:1517 +0x121
net/http.(_conn).serve(0x1f3d91e0)
/usr/local/go/src/pkg/net/http/server.go:1096 +0x653
created by net/http.(_Server).Serve
/usr/local/go/src/pkg/net/http/server.go:1564 +0x242

Some one have a solution please ?

@xiang90
Copy link
Contributor

xiang90 commented Dec 2, 2013

@marouane-kech Hmm... Are you on a 32bit machine? If it is, then seems like a "bug" in golang. I cannot reproduce your problem on my current machine. But I will find a 32bit machine to give a try.

We have a struct like

type watcherHub struct {
    watchers     map[string]*list.List
    count        int64 // current number of watchers.
    EventHistory *EventHistory
}

When you access count via 64bit operation, it panics.
I guess the problem is that the struct is not 64bit aligned on your machine.
Can you try to move count to the first like

type watcherHub struct {
    count        int64 // current number of watchers.
    watchers     map[string]*list.List
    EventHistory *EventHistory
}

@xiang90
Copy link
Contributor

xiang90 commented Dec 2, 2013

@marouane-kech Oh. I found a similar issue on golang forum. https://code.google.com/p/go/issues/detail?id=6404. I am going to fix it now.

@philips
Copy link
Contributor

philips commented Dec 2, 2013

@xiangli-cmu Is this already fixed in go1.2 maybe?

On Mon, Dec 2, 2013 at 8:13 AM, Xiang Li [email protected] wrote:

@marouane-kech https://github.com/marouane-kech Oh. I found a similar
issue on golang forum. https://code.google.com/p/go/issues/detail?id=6404.
I am going to fix it now.


Reply to this email directly or view it on GitHubhttps://github.com//issues/358#issuecomment-29630859
.

@marouane-kech
Copy link
Author

@xiangli-cmu Yes I have a 32bit machine, I do the change like you said in watcherHub struct (file watcher_hub.go) but I still have the same problem.
do you find a solution to this yet ? How can I fix it ?

Thank you.

@xiang90
Copy link
Contributor

xiang90 commented Dec 4, 2013

@marouane-kech I will check it on a 32bit machine.

@marouane-kech
Copy link
Author

Good morning @xiangli-cmu Have you some news about watcher function on a 32bit machine.
Thank you.

@farcaller
Copy link

I'm hitting this one as well. That's one of the most amazing bugs in golang I've seem so far, and I see lots of those recently.

It's possible to reorder some fields accessed by atomic where that's required. A possible fix could be seen at farcaller@1257e1c. If this is an acceptable solution for etcd, I can clean that up and send a PR.

@philips
Copy link
Contributor

philips commented Mar 17, 2014

@farcaller Is there an upstream bug for this alignment problem?

@neunhoef
Copy link

This is to remind you of this issue. We are hit by this as well. Upstream says
"On both ARM and x86-32, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically." and has the issue status set to "Unfortunate".
I would consider this as a bug in etcd on 32bit ARM and x86-32.
To fix it one would have to make sure that all 64bit values that must be changed atomically are 8byte aligned. farcaller's patch might achieve this.

@jsteemann
Copy link

Also hitting this bug on i386. As @farcaller's patch is not yet included in current master, is there any hope this is going to be fixed / worked around in etcd soon?

@philips
Copy link
Contributor

philips commented Apr 24, 2014

@jsteemann @neunhoef @farcaller Since you have the hardware can one of you figure out the alignment of the structs that makes this work for you and send a patch? I am fine taking a patch as long as it doesn't have any functional changes. I will also have to rely on you all to test releases so we don't regress.

Thanks.

@farcaller
Copy link

Based on my discussion at go-nuts, there are no plans to make atomic safer. So far, the safest way is:

  • make sure that any integer accessible by atomic is first in the struct;
  • if more than one such integer is required, they must be ordered from longest (in number of bits) to shortest; alternatively, always use int64 for atomics.

@yichengq
Copy link
Contributor

@farcaller I like your patch!
I think this is the best way to fix it for now.
For third party lib, we may need to send separate PR to upstream one.
But we could merge the watch hub one if it works. :)

@jsteemann
Copy link

@unihorn @farcaller: I have tried farcaller's patch (farcaller@1257e1c) with ETCD 0.3 on a 32 bit Ubuntu and it worked fine.
With patch, the crashes when using watches do not seem to occur anymore.
I have also tried the patch on 64 bit Linuxes, and it did not do any harm there either.

I'd love to see farcaller's merged being merged into ETCD.

@yichengq
Copy link
Contributor

yichengq commented May 9, 2014

The patch covers three repos, and I think it could only be a workaround for watch command.
I have grep int64 in the repo, and found out tens of lines. If I want to use this solution fully, I have to make changes on about five structs, which break their original sequences. And other changes should be applied to third_party repo.
Is this acceptable? @xiangli-cmu @philips @bmizerany

@bmizerany
Copy link
Contributor

This is a big change since it is going to touch a ton in third_party, which we don't own and don't have a good way to know what that is going to affect, I think.

@jsteemann
Copy link

If this is issue is not going to be fixed, I think there should at least be some statement in ETCD's README.md like ETCD works on 64 bit operating systems only. Support for 32 bit operating systems may be added in the future.
Just to save others from trouble.

What do you think?

@pyramids
Copy link

+1

@kelseyhightower
Copy link
Contributor

etcd does not support 32bit systems. We have updated our README to state the fact. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

10 participants