Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash while performing redisClusterAsyncFree() #222

Closed
rahulKrishnaM opened this issue May 21, 2024 · 3 comments · Fixed by #225
Closed

Crash while performing redisClusterAsyncFree() #222

rahulKrishnaM opened this issue May 21, 2024 · 3 comments · Fixed by #225

Comments

@rahulKrishnaM
Copy link

rahulKrishnaM commented May 21, 2024

Hi, came across this crash during one of the runs.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007e36c41fe3a2 in selectNode (nodes=<optimized out>)
[Current thread is 1 (Thread 0x7e36b50b3a40 (LWP 21))]
(gdb) bt
#0  0x00007e36c41fe3a2 in selectNode (nodes=<optimized out>)
#1  updateSlotMapAsync (acc=acc@entry=0x58a8403266c0)
#2  0x00007e36c41fe598 in clusterSlotsReplyCallback (ac=<optimized out>, r=0x0, privdata=0x58a8403266c0)
#3  0x00007e36c420ce79 in __redisRunCallback (reply=0x0, cb=0x7fffa203cfc0, ac=0x58a84f4e8780)
#4  __redisAsyncFree (ac=0x58a84f4e8780)
#5  0x00007e36c420d645 in redisAsyncFree (ac=<optimized out>)
#6  0x00007e36c41f9e90 in freeRedisClusterNode (node=0x58a87df71bc0)
#7  0x00007e36c41f7c56 in _dictClear (ht=0x58a85c542180)
#8  dictRelease (ht=0x58a85c542180)
#9  0x00007e36c41fbfc6 in redisClusterFree (cc=0x58a84218b300)
#10 0x00007e36c41ff555 in redisClusterAsyncFree (acc=0x58a8403266c0)
### layers below called from application while shutting down.

Looks like the execution was at this line number when the crash happened

if (nodeIsConnected(node)) {

I am suspecting the reason for SEGV could be the below code area:

dictFreeEntryVal(ht, he);

where, even though we are freeing up the memory pointed by redisClusterNode* or dictEntry, we are not unlinking it from the table. Hence it gets looked up later on(during selectNode()), which would be a dangling pointer.

        if ((he = ht->table[i]) == NULL)
            continue;
        while (he) {
            nextHe = he->next;
            dictFreeEntryKey(ht, he);
            dictFreeEntryVal(ht, he);      // memory released, but entry not set to NULL, so ht->table[]->val holds a dangling pointer ?
            hi_free(he);                   // memory released, but entry not set to NULL, so ht->table[] holds a dangling pointer ?
            ht->used--;
            he = nextHe;
        }

And also, other thing that looks odd is the library trying to reattempt discovery when the redisClusterAsyncContext object is getting deleted. Maybe we should prevent that path altogether, in case of redisClusterAsyncFree() where it's about to tear everything down anyways.

cc: @bjosv

@rahulKrishnaM
Copy link
Author

I don't see that address being explicitly set to 0 after memory is freed. Do you agree with the analysis?

@rahulKrishnaM
Copy link
Author

rahulKrishnaM commented May 29, 2024

Posting some more thoughts,
selectNode() is trying to do a dictNext(). So, if we don't set the he object to NULL once we free it inside _dictClear(), any later dictNext() access might still think the entry as valid and try to process that (i.e when the execution is still inside _dictClear()).

@rahulKrishnaM
Copy link
Author

maybe something in lines of:

    while (he) {
        nextHe = he->next;
   +    ht->tabl[i] = nextHe;
        dictFreeEntryKey(ht, he);
        dictFreeEntryVal(ht, he);
        hi_free(he);
        ht->used--;
        he = nextHe;
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant