Data Race on metadata.annotations for Managed Resources #560

Closed
ulucinar opened this issue Sep 25, 2023 · 0 comments · Fixed by #561
Labels: bug (Something isn't working)

Comments

@ulucinar (Contributor)

What happened?

While running scale tests with @sergenyalcin, using 10K MRs against the official AWS provider with the no-fork external client implementation (provider image index.docker.io/ulucinar/provider-aws-iam:v0.39.0-iamnofork), we observed pod restarts caused by a concurrent map iteration and a concurrent write to the same map.

The relevant provider logs are as follows:

...
fatal error: concurrent map iteration and map write

goroutine 4049 [running]:
k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).DeepCopyInto(0xc0095551a0, 0xc0059dbc20)
	k8s.io/apimachinery@…/pkg/apis/meta/v1/zz_generated.deepcopy.go:689 +0x387
github.com/upbound/provider-aws/apis/iam/v1beta1.(*UserPolicyAttachment).DeepCopyInto(0xc009555180, 0xc0059dbc00)
	github.com/upbound/provider-aws/apis/iam/v1beta1/zz_generated.deepcopy.go:4472 +0xc5
github.com/upbound/provider-aws/apis/iam/v1beta1.(*UserPolicyAttachment).DeepCopy(...)
	github.com/upbound/provider-aws/apis/iam/v1beta1/zz_generated.deepcopy.go:4483
github.com/upbound/provider-aws/apis/iam/v1beta1.(*UserPolicyAttachment).DeepCopyObject(0xc009555180?)
	github.com/upbound/provider-aws/apis/iam/v1beta1/zz_generated.deepcopy.go:4489 +0x45
sigs.k8s.io/controller-runtime/pkg/cache/internal.(*CacheReader).Get(0xc003756710, {0xc00801cb40?, 0x10d784cc?}, {{0x0?, 0x10d0d195?}, {0xc008794be0?, 0xcf9e550?}}, {0x12303160, 0xc0059dba40}, {0x0, ...})
	sigs.k8s.io/controller-runtime@…/pkg/cache/internal/cache_reader.go:88 +0x145
sigs.k8s.io/controller-runtime/pkg/cache.(*informerCache).Get(0xc000c43090, {0x122be3f0, 0xc00801cb40}, {{0x0?, 0x40bd05?}, {0xc008794be0?, 0x10ba70e0?}}, {0x12303160?, 0xc0059dba40?}, {0x0, ...})
	sigs.k8s.io/controller-runtime@…/pkg/cache/informer_cache.go:70 +0x1fd
sigs.k8s.io/controller-runtime/pkg/client.(*client).Get(0xc000db0870, {0x122be3f0, 0xc00801cb40}, {{0x0?, 0x0?}, {0xc008794be0?, 0xc0059dafc0?}}, {0x12303160?, 0xc0059dba40?}, {0x0, ...})
	sigs.k8s.io/controller-runtime@…/pkg/client/client.go:341 +0x4c4
github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile(0xc0000712c0, {0x122be428, 0xc00801cab0}, {{{0x0, 0x0}, {0xc008794be0, 0x10}}})
	github.com/crossplane/crossplane-runtime@…/pkg/reconciler/managed/reconciler.go:704 +0x279
github.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile(0xc005c958b0, {0x122be428, 0xc00801cab0}, {{{0x0?, 0x0?}, {0xc008794be0?, 0x40e127?}}})
	github.com/crossplane/crossplane-runtime@…/pkg/ratelimiter/reconciler.go:54 +0x16b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x122be428?, {0x122be428?, 0xc00801cab0?}, {{{0x0?, 0xdb06400?}, {0xc008794be0?, 0x12295fb8?}}})
	sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:118 +0xc8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc005c91ae0, {0x122be380, 0xc0002d2050}, {0xf74c720?, 0xc00b402a60?})
	sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:314 +0x377
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc005c91ae0, {0x122be380, 0xc0002d2050})
	sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:265 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:226 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:222 +0x587
...
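The fatal error fires inside the generated deep-copy code, which ranges over the annotations map while copying it. As a rough sketch (simplified from what k8s.io/apimachinery's generated ObjectMeta.DeepCopyInto does for the Annotations field, not the verbatim generated code):

// deepCopyAnnotations mirrors, in simplified form, the map-copy loop in the
// generated ObjectMeta.DeepCopyInto. Ranging over the source map is the
// "concurrent map iteration" side of the crash; the conflicting write comes
// from elsewhere, as the race report below shows.
func deepCopyAnnotations(in map[string]string) map[string]string {
	if in == nil {
		return nil
	}
	out := make(map[string]string, len(in))
	for key, val := range in { // fatal if another goroutine writes to in
		out[key] = val
	}
	return out
}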

We were able to locate the writing site by running the provider with the Go race detector enabled, setting --max-reconcile-rate to 1000 and --poll to 10s:

...
==================
WARNING: DATA RACE
Write at 0x00c05f748540 by goroutine 19444:
  runtime.mapassign_faststr()
      /Users/alper/.goenv/versions/1.19.1/src/runtime/map_faststr.go:203 +0x42c
  github.com/crossplane/crossplane-runtime/pkg/resource.AnnotationChangedPredicate.Update()
      /Users/alper/data/workspaces/go/pkg/mod/github.com/crossplane/crossplane-runtime@…/pkg/resource/predicates.go:212 +0x268
  github.com/crossplane/crossplane-runtime/pkg/resource.(*AnnotationChangedPredicate).Update()
      <autogenerated>:1 +0xa8
  sigs.k8s.io/controller-runtime/pkg/predicate.or.Update()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/predicate/predicate.go:300 +0xd4
  sigs.k8s.io/controller-runtime/pkg/predicate.(*or).Update()
      <autogenerated>:1 +0x30
  sigs.k8s.io/controller-runtime/pkg/internal/source.(*EventHandler).OnUpdate()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/source/event_handler.go:113 +0x578
  sigs.k8s.io/controller-runtime/pkg/internal/source.(*EventHandler).OnUpdate-fm()
      <autogenerated>:1 +0x54
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/client-go@…/tools/cache/controller.go:250 +0x6c
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x20
  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/client-go@…/tools/cache/shared_informer.go:971 +0x1b8
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/backoff.go:226 +0x48
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/backoff.go:227 +0x94
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/backoff.go:204 +0x110
  k8s.io/apimachinery/pkg/util/wait.Until()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/backoff.go:161 +0x6c
  k8s.io/client-go/tools/cache.(*processorListener).run()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/client-go@…/tools/cache/shared_informer.go:967 +0x18
  k8s.io/client-go/tools/cache.(*processorListener).run-fm()
      <autogenerated>:1 +0x34
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/apimachinery@…/pkg/util/wait/wait.go:72 +0x6c

Previous read at 0x00c05f748540 by goroutine 25544:
  k8s.io/apimachinery/pkg/apis/meta/v1.(*ObjectMeta).DeepCopyInto()
      /Users/alper/data/workspaces/go/pkg/mod/k8s.io/apimachinery@…/pkg/apis/meta/v1/zz_generated.deepcopy.go:688 +0x4ac
  github.com/upbound/provider-aws/apis/iam/v1beta1.(*UserPolicyAttachment).DeepCopyInto()
      /Users/alper/data/workspaces/github.com/ulucinar/upbound/provider-aws/apis/iam/v1beta1/zz_generated.deepcopy.go:4472 +0x134
  github.com/upbound/provider-aws/apis/iam/v1beta1.(*UserPolicyAttachment).DeepCopy()
      /Users/alper/data/workspaces/github.com/ulucinar/upbound/provider-aws/apis/iam/v1beta1/zz_generated.deepcopy.go:4483 +0x4c
  github.com/upbound/provider-aws/apis/iam/v1beta1.(*UserPolicyAttachment).DeepCopyObject()
      /Users/alper/data/workspaces/github.com/ulucinar/upbound/provider-aws/apis/iam/v1beta1/zz_generated.deepcopy.go:4489 +0x30
  sigs.k8s.io/controller-runtime/pkg/cache/internal.(*CacheReader).Get()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/cache/internal/cache_reader.go:88 +0x220
  sigs.k8s.io/controller-runtime/pkg/cache.(*informerCache).Get()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/cache/informer_cache.go:70 +0x17c
  sigs.k8s.io/controller-runtime/pkg/client.(*client).Get()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/client/client.go:341 +0x348
  github.com/crossplane/crossplane-runtime/pkg/reconciler/managed.(*Reconciler).Reconcile()
      /Users/alper/data/workspaces/go/pkg/mod/github.com/crossplane/crossplane-runtime@…/pkg/reconciler/managed/reconciler.go:704 +0x2a4
  github.com/crossplane/crossplane-runtime/pkg/ratelimiter.(*Reconciler).Reconcile()
      /Users/alper/data/workspaces/go/pkg/mod/github.com/crossplane/crossplane-runtime@…/pkg/ratelimiter/reconciler.go:54 +0x16c
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:118 +0xe4
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:314 +0x3b8
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:265 +0x240
  sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      /Users/alper/data/workspaces/go/pkg/mod/sigs.k8s.io/controller-runtime@…/pkg/internal/controller/controller.go:226 +0x98

It looks like resource.AnnotationChangedPredicate's Update implementation modifies the annotations of objects held in the shared informer cache. We had better keep the cache items intact: objects served from the informer cache are shared across goroutines, so they must be treated as read-only and deep-copied before any mutation.
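For illustration, here is a minimal sketch of the problematic pattern and a cache-safe alternative. The annotation key and predicate types below are hypothetical stand-ins rather than the actual crossplane-runtime code; the point is only that GetAnnotations() returns the map backing the cached object, so any in-place mutation of it races with concurrent DeepCopy calls:

package predicates

import (
	"reflect"

	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// hypotheticalIgnoredKey stands in for an annotation a predicate might want
// to exclude from its comparison; it is illustrative only.
const hypotheticalIgnoredKey = "example.org/ignored"

// RacyAnnotationChanged mutates the maps returned by GetAnnotations(), which
// back the objects stored in the shared informer cache. This is the "map
// write" side of the race reported above.
type RacyAnnotationChanged struct{ predicate.Funcs }

func (RacyAnnotationChanged) Update(e event.UpdateEvent) bool {
	oldA, newA := e.ObjectOld.GetAnnotations(), e.ObjectNew.GetAnnotations()
	delete(oldA, hypotheticalIgnoredKey) // mutates the cached object!
	delete(newA, hypotheticalIgnoredKey) // mutates the cached object!
	return !reflect.DeepEqual(oldA, newA)
}

// SafeAnnotationChanged performs the same comparison on local copies,
// leaving the cached objects intact.
type SafeAnnotationChanged struct{ predicate.Funcs }

func (SafeAnnotationChanged) Update(e event.UpdateEvent) bool {
	filter := func(in map[string]string) map[string]string {
		out := make(map[string]string, len(in))
		for k, v := range in {
			if k != hypotheticalIgnoredKey {
				out[k] = v
			}
		}
		return out
	}
	return !reflect.DeepEqual(
		filter(e.ObjectOld.GetAnnotations()),
		filter(e.ObjectNew.GetAnnotations()),
	)
}

Copying before comparing costs one small allocation per event, which is negligible next to a data race that crashes the provider; the general rule is that anything read from the informer cache is shared and must never be mutated in place.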

How can we reproduce it?

Run the provider built with the Go race detector against a large fleet of MRs (we used 10K), with --max-reconcile-rate set to 1000 and --poll set to 10s, as described above.

What environment did it happen in?

Crossplane version: UXP 1.13.2-up.2
Official AWS provider: index.docker.io/ulucinar/provider-aws-iam:v0.39.0-iamnofork
