Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alpha crashes with fault address in groupby.go #3642

Closed
jarifibrahim opened this issue Jul 9, 2019 · 6 comments
Closed

Alpha crashes with fault address in groupby.go #3642

jarifibrahim opened this issue Jul 9, 2019 · 6 comments
Labels
area/facets Issues related to face handling, querying, etc. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it.

Comments

@jarifibrahim
Copy link
Contributor

jarifibrahim commented Jul 9, 2019

What version of Dgraph are you using?

727b246

Have you tried reproducing the issue with latest release?

Yes

What is the hardware spec (RAM, OS)?

N/A

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Run zero using
dgraph zero -w /home/ibrahim/m2_drive/flock_backup/zero1/zw -o 0 --idx=1 --my=localhost:5080 --replicas=1 --logtostderr -v=2 --bindall
  1. Run alpha using
dgraph alpha -p /home/ibrahim/m2_drive/flock_backup/alpha1/p -w /home/ibrahim/m2_drive/flock_backup/alpha1/w -o 100 --my=localhost:7180 --lru_mb=1024 --zero=localhost:5080 --logtostderr -v=2
  1. Apply the following diff on flock agent's main.go
diff --git a/main.go b/main.go
index c2cf6ec..91e8da0 100644
--- a/main.go
+++ b/main.go
@@ -52,8 +52,8 @@ id_str: string @index(exact) @upsert .
 created_at: dateTime @index(hour) .
 hashtags: [string] @index(exact) .
 
-author: uid @count @reverse .
-mention: uid @reverse .
+author: [uid] @count @reverse .
+mention: [uid] @reverse .
 `
 
        cDgraphTweetQuery = `
  1. Run flock agent
go run main.go -a ":9180"
  1. Apply the following to flock client's main.go (without this change some of the queries in flock were failing)
diff --git a/client/main.go b/client/main.go
index c1eab19..5913475 100644
--- a/client/main.go
+++ b/client/main.go
@@ -547,7 +547,7 @@ type querySix struct {
 
 func (q *querySix) getParams(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
@@ -619,7 +619,7 @@ type querySeven struct {
 
 func (q *querySeven) getParams(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
@@ -693,7 +693,7 @@ type queryEight struct {
 
 func (q *queryEight) runQuery(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
@@ -787,7 +787,7 @@ type queryNine struct {
 
 func (q *queryNine) getParams(dgr *dgo.Dgraph) error {
        // we subtract 41 hours because that's the latest data we get from twitter
-       curTime := time.Now().Add(-41 * time.Hour)
+       curTime := time.Now().Add(-100 * time.Hour)
 
        query := fmt.Sprintf(`
 {
  1. Run flock client
go run main.go -a ":9180"

Expected behaviour and actual result.

After a few seconds alpha crashes. Here's the stack trace of alpha https://gist.github.com/jarifibrahim/d517efefd7f7c334a9bb50552041cc11

@jarifibrahim jarifibrahim added area/facets Issues related to face handling, querying, etc. kind/bug Something is broken. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate/work on it. labels Jul 9, 2019
@mangalaman93 mangalaman93 removed their assignment Jul 12, 2019
@mangalaman93 mangalaman93 added priority/P1 Serious issue that requires eventual attention (can wait a bit) and removed priority/P2 Somehow important but would not block a release. labels Jul 12, 2019
@martinmr
Copy link
Contributor

@jarifibrahim Can you check if the following change to groupby.go in line 225 helps?

Change from

ul := child.uidMatrix[i]

to

if i >= len(child.uidMatrix) {
  continue
}
ul := child.uidMatrix[i]

I think that would stop the crash but I am not sure if it's the right solution, or why uidMatrix does not have enough entries.

@jarifibrahim
Copy link
Contributor Author

@martinmr The change stopped the crash.

@martinmr
Copy link
Contributor

ok.still not sure of the root cause but i'll open a PR with the fix. We have similar checks in our codebase so this should be fine.

@martinmr
Copy link
Contributor

My bad, the crash is in the next line while trying to access ul.Uids. So ul is nil. I am guessing the error is not happening every time so it looked like my original suggestion fixed it. For now, I'll open a PR to check that ul is non-nil.

martinmr added a commit that referenced this issue Jul 15, 2019
For some reason, there are nil values stored in the uidMatrix, which
results in a segmentation fault when trying to access the Uids field.
The root cause of this issue is still unclear but this change adds an
additional check to prevent the segmentation fault.
@martinmr
Copy link
Contributor

#3670

martinmr added a commit that referenced this issue Jul 15, 2019
For some reason, there are nil values stored in the uidMatrix, which
results in a segmentation fault when trying to access the Uids field.
The root cause of this issue is still unclear but this change adds an
additional check to prevent the segmentation fault.
@martinmr
Copy link
Contributor

Fix has been submitted so closing this for now. Feel free to reopen if you happen to see it again but the original segfault should be gone now.

dna2github pushed a commit to dna2fork/dgraph that referenced this issue Jul 19, 2019
For some reason, there are nil values stored in the uidMatrix, which
results in a segmentation fault when trying to access the Uids field.
The root cause of this issue is still unclear but this change adds an
additional check to prevent the segmentation fault.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/facets Issues related to face handling, querying, etc. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it.
Development

No branches or pull requests

3 participants