Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go-redis/v8 nil shard in the shards map #2126

Closed
alemrtv opened this issue Jun 21, 2022 · 2 comments · Fixed by #2153
Closed

go-redis/v8 nil shard in the shards map #2126

alemrtv opened this issue Jun 21, 2022 · 2 comments · Fixed by #2153
Labels

Comments

@alemrtv
Copy link

alemrtv commented Jun 21, 2022

Expected Behavior

no panic due to nil pointer dereference after retrieving a nil shard from the shards map

Current Behavior

There is no check if a shard is not nil before returning it.

It might cause panic due to nil pointer dereference here

Possible Solution

Check if shard is not nil before returning it.

Steps to Reproduce

We have a proprietary code which wraps go-redis/go and sometimes recreates the Ring structure (and hence repopulating the shards map), all while holding the mutex.
Despite it, sometimes after the ring is recreated, we get panic in the place linked above because the shard retrieved by the name is nil.

Possible Implementation

func (c *ringShards) GetByName(shardName string) (*ringShard, error) {
	if shardName == "" {
		return c.Random()
	}

	c.mu.RLock()
	shard := c.shards[shardName]
	c.mu.RUnlock()

	if shard == nil {
		return nil, fmt.Errorf("a shard named %q is nil", shardName)
	}

	return shard, nil
}
@vmihailenco
Copy link
Collaborator

c.shards should not contain nil shards. Do you have an idea what conditions cause go-redis to have a nil shard in the map?

@alemrtv
Copy link
Author

alemrtv commented Jun 22, 2022

We use a structure S which (simplified) looks like this:

import "github.com/go-redis/redis/v8"
...
type S struct {
  mu sync.RWMutex
  ring *redis.Ring
  ...
}

`*redis.Ring` is the one from `package redis` in go-redis/redis@v8.11.4.

And function f which (simplified) does something like this:

func (s *S) f() {
  s.mu.Lock()
  defer s.mu.Unlock()
  ...
  // here we change opts.Addrs map
  var newOpts redis.RingOptions = ... 
  // here the shards map gets changed and we substitute the old ring with the new one
  prev, s.ring = s.ring, redis.NewRing(&newOpts) the//new one
  ...
  defer func() {
    if prev != nil {
      go func() {
        if err := prev.Close(); err != nil {
          log(err)
        }
      }()    
    }
  }()
}

At some random points of time we call f() to reinitialize S in order to change opts.Addrs. We take the lock while doing this to avoid any race conditions.

However, as I can see in the source code, between the moment when the hash is calculated and the moment when we retrieve a shard by it a bit later, there is no lock held.

I suppose that we might have a situation, when things happen in the following order:

  1. The hash is calculated in generalProcessPipeline()
  2. Our s.f() is called, it does its work and changes the shards map
  3. GetByName() tries to retrieve a shard with an invalid hash and gets a shard without checking if it is nil. This is where the nil pointer dereference happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants