Fix/datarace when concurrent save the same file #836
Codecov Report
@@ Coverage Diff @@
## master #836 +/- ##
==========================================
+ Coverage 52.94% 52.98% +0.03%
==========================================
Files 261 261
Lines 18999 19005 +6
==========================================
+ Hits 10059 10069 +10
+ Misses 7384 7376 -8
- Partials 1556 1560 +4
@lonng PTAL
@@ -25,6 +28,13 @@ func SaveFileWithBackup(path string, data []byte, backupDir string) error {
		return errors.Errorf("%s is directory", path)
	}

	if _, ok := fileLocks[path]; !ok {
fileLocks itself is not thread-safe, so I think the data race still exists. For example:
# assume fileLocks is empty initially
# thread 1 starts
if _, ok := fileLocks["/path/a"]; !ok { # ok is false because fileLocks is empty
# thread 1 is suspended after the !ok check but before fileLocks[path] = &sync.Mutex{}
# thread 2 starts
if _, ok := fileLocks["/path/a"]; !ok { # ok is false because fileLocks is empty and thread 1 has not assigned the mutex yet
fileLocks["/path/a"] = &sync.Mutex{} # assigns a new mutex
}
fileLocks["/path/a"].Lock()
# thread 2 enters the critical section
# thread 1 resumes
fileLocks["/path/a"] = &sync.Mutex{} # overwrites the mutex that thread 2 is holding
fileLocks["/path/a"].Lock() # succeeds because thread 1 replaced fileLocks["/path/a"] with a fresh mutex
# thread 1 enters the critical section
In this case, both threads are inside the same critical section at the same time.
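One way to close the window described above is to guard the map with its own mutex, so that the lookup and the insertion happen as a single step. The following is a minimal sketch, not the code merged in this PR: the `pathLocks` type and its `get` method are hypothetical names introduced for illustration. Unsynchronized concurrent writes to a Go map are also a data race in their own right, which the outer mutex addresses as well.

```go
package main

import (
	"fmt"
	"sync"
)

// pathLocks is a hypothetical registry of per-path mutexes.
type pathLocks struct {
	mu    sync.Mutex             // guards the map itself
	locks map[string]*sync.Mutex // one mutex per file path
}

func newPathLocks() *pathLocks {
	return &pathLocks{locks: make(map[string]*sync.Mutex)}
}

// get returns the mutex for path, creating it at most once.
// The check and the assignment run under p.mu, so two goroutines
// can never both observe "missing" and install different mutexes.
func (p *pathLocks) get(path string) *sync.Mutex {
	p.mu.Lock()
	defer p.mu.Unlock()
	l, ok := p.locks[path]
	if !ok {
		l = &sync.Mutex{}
		p.locks[path] = l
	}
	return l
}

// countSerialized starts n goroutines that each increment a plain
// counter while holding the lock for the same path.
func countSerialized(n int) int {
	pl := newPathLocks()
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l := pl.get("/path/a")
			l.Lock()
			defer l.Unlock()
			counter++ // safe only because every goroutine shares one mutex
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countSerialized(100))
}
```

Because `get` always hands back the same `*sync.Mutex` for a given path, the second goroutine in the scenario above would block on `Lock()` instead of entering the critical section.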
Thanks for the suggestion. Sorry, I forgot that the map needs to be locked for concurrent access.
@@ -10,7 +10,7 @@ GOARCH := $(if $(GOARCH),$(GOARCH),amd64)
 GOENV := GO111MODULE=on CGO_ENABLED=0 GOOS=$(GOOS) GOARCH=$(GOARCH)
 GO := $(GOENV) go
 GOBUILD := $(GO) build $(BUILD_FLAG)
-GOTEST := GO111MODULE=on CGO_ENABLED=1 $(GO) test -p 3
+GOTEST := GO111MODULE=on CGO_ENABLED=1 go test -p 3
It's better to revert this change.
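I want to run the tests with -race, which requires cgo to be enabled. Setting GO111MODULE=on CGO_ENABLED=1 before $(GO) has no effect, because $(GO) expands to $(GOENV) go and the later CGO_ENABLED=0 from GOENV overrides the earlier value. For example: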
$ make race
TIUP_HOME=/root/.go/src/github.com/pingcap/tiup/tests/tiup GO111MODULE=on CGO_ENABLED=1 GO111MODULE=on CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go test -p 3 -race ./... || { $(tools/bin/failpoint-ctl disable); exit 1; }
go test: -race requires cgo; enable cgo by setting CGO_ENABLED=1
make: *** [Makefile:97: race] Error 1
Rest LGTM
4e67051 to 96895d0
LGTM
What problem does this PR solve?
In some cases, saving the same file concurrently can fail because of a race condition. For example, when deploying a cluster with TLS enabled, the ca.crt file is saved multiple times in parallel; in my local environment, this left ca.crt empty.
What is changed and how it works?
Introduce a simple lock to ensure that writes to the same file happen sequentially.
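As a rough illustration of the idea, the sketch below serializes saves with a single package-level mutex and then writes the same file from several goroutines. It is an assumption-laden stand-in, not the PR's implementation: `saveMu`, `saveFile`, and `concurrentSave` are hypothetical names, and the backup handling of `SaveFileWithBackup` is left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// saveMu is a hypothetical package-level lock that serializes every save.
var saveMu sync.Mutex

// saveFile is a simplified stand-in for the save routine (no backup logic).
// Only one goroutine at a time can be inside os.WriteFile.
func saveFile(path string, data []byte) error {
	saveMu.Lock()
	defer saveMu.Unlock()
	return os.WriteFile(path, data, 0o644)
}

// concurrentSave writes the same content to one file from n goroutines
// and returns what is on disk once all of them have finished.
func concurrentSave(n int, content string) (string, error) {
	dir, err := os.MkdirTemp("", "save-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "ca.crt")

	var wg sync.WaitGroup
	errs := make(chan error, n) // buffered so no writer blocks on send
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			errs <- saveFile(path, []byte(content))
		}()
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return "", err
		}
	}

	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := concurrentSave(8, "dummy certificate")
	fmt.Println(got, err)
}
```

A single global lock is the simplest option, but it also serializes saves to unrelated paths; keying the lock by path avoids that at the cost of having to protect the lookup structure itself.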
Check List
Tests
Code changes
Side effects
Related changes
Release notes: