object/put: fix concurrent PUT data corruption
If the ants pool is busy and cannot accept a task, an early `return` without
`wg.Wait()` makes `iterateNodesForObject` return while other routines inside
the pool are still running. From that point on, the buffers used for binary
replication may be reused while those routines are still using them. Instead,
wait for the WaitGroup and keep trying the other nodes; this can also increase
the rate of successful PUTs under high load. Closes #2978.

Signed-off-by: Pavel Karpy <[email protected]>
carpawell committed Nov 23, 2024
1 parent 339b4cb commit a63db8a
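The hazard described in the commit message can be illustrated with a minimal Go sketch. It assumes the replication buffer comes from a `sync.Pool` and goes back to it once the PUT path returns; `bufPool`, `sendToNode`, `replicate` and `inFlight` are hypothetical names, not neofs-node code. The point is the ordering: the buffer is released only after `wg.Wait()` confirms that no goroutine still reads it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// bufPool is a hypothetical pool of replication buffers: once a buffer is
// put back, another PUT may take it and overwrite its contents.
var bufPool = sync.Pool{New: func() any { b := make([]byte, 0, 1024); return &b }}

// inFlight counts senders that may still read a buffer (demonstration only).
var inFlight atomic.Int32

// sendToNode is a hypothetical stand-in for binary replication to one node.
func sendToNode(node int, payload []byte) error {
	_ = payload[0] // pretend to read the shared buffer
	return nil
}

// replicate fans one shared buffer out to all nodes and returns the number
// of successful sends. The buffer goes back to the pool only after wg.Wait().
func replicate(payload string, nodes []int) int {
	bp := bufPool.Get().(*[]byte)
	buf := append((*bp)[:0], payload...)

	var (
		wg sync.WaitGroup
		ok atomic.Int32
	)
	for _, n := range nodes {
		wg.Add(1)
		inFlight.Add(1)
		go func(n int) {
			defer wg.Done()
			defer inFlight.Add(-1) // runs before wg.Done (deferred calls are LIFO)
			if sendToNode(n, buf) == nil {
				ok.Add(1)
			}
		}(n)
	}

	wg.Wait() // every reader of buf has finished
	*bp = buf
	bufPool.Put(bp) // safe: no goroutine references buf any more
	return int(ok.Load())
}

func main() {
	fmt.Println(replicate("object payload", []int{1, 2, 3}), inFlight.Load())
}
```

Running it prints `3 0`: all three sends succeeded and no sender was still active when the buffer went back to the pool. Returning (and releasing the buffer) before `wg.Wait()` would allow the kind of reuse the commit describes.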
Showing 2 changed files with 2 additions and 1 deletion.
CHANGELOG.md (1 addition, 0 deletions)
@@ -35,6 +35,7 @@ attribute, which is used for container domain name in NNS contracts (#2954)
- Panic in event listener related to inability to switch RPC node (#2970)
- Non-container nodes never check placement policy on PUT, SEARCH requests (#3014)
- If shards are overloaded with PUT requests, operation is not skipped but waits for 30 seconds (#2871)
+- Data corruption if PUT is done too concurrently (#2978)

### Changed
- `ObjectService`'s `Put` RPC handler caches up to 10K lists of per-object sorted container nodes (#2901)
pkg/services/object/put/distributed.go (1 addition, 1 deletion)
@@ -331,7 +331,7 @@ func (x placementIterator) iterateNodesForObject(obj oid.ID, f func(nodeDesc) er
 				if e, _ := lastRespErr.Load().(error); e != nil {
 					err = fmt.Errorf("%w (last node error: %w)", err, e)
 				}
-				return errIncompletePut{singleErr: err}
+				wg.Wait()
 			}
 		}
 		wg.Wait()
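A simplified Go sketch of the patched error branch follows. It assumes a non-blocking pool that rejects a task when all workers are busy (the commit refers to an ants pool); `tinyPool`, `errPoolBusy`, `putToNodes` and `running` are hypothetical stand-ins, not ants or neofs-node APIs. When `Submit` fails, the loop waits for the jobs already running and then continues with the next node instead of returning, so `buf` is never reused while a job still reads it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errPoolBusy is a hypothetical "pool overloaded" error.
var errPoolBusy = errors.New("worker pool is busy")

// running counts jobs currently reading the buffer (demonstration only).
var running atomic.Int32

// tinyPool is a hypothetical non-blocking pool with a fixed number of slots.
type tinyPool struct{ slots chan struct{} }

func newTinyPool(size int) *tinyPool {
	return &tinyPool{slots: make(chan struct{}, size)}
}

// Submit starts task if a slot is free and fails immediately otherwise.
func (p *tinyPool) Submit(task func()) error {
	select {
	case p.slots <- struct{}{}:
		go func() {
			defer func() { <-p.slots }() // free the slot once task returns
			task()
		}()
		return nil
	default:
		return errPoolBusy
	}
}

// putToNodes sends the shared buf to every node through the pool and
// reports how many jobs ran and how many were rejected by the pool.
func putToNodes(p *tinyPool, buf []byte, nodes []int) (ok, failed int) {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for _, n := range nodes {
		idx := n % len(buf)
		wg.Add(1)
		err := p.Submit(func() {
			defer wg.Done()
			running.Add(1)
			defer running.Add(-1) // runs before wg.Done (deferred calls are LIFO)
			_ = buf[idx]          // stands in for sending buf to node n
			mu.Lock()
			ok++
			mu.Unlock()
		})
		if err != nil {
			wg.Done() // the rejected job never started
			failed++
			// Returning here would leave running jobs reading buf. Waiting
			// first keeps buf safe and frees slots for the next nodes.
			wg.Wait()
		}
	}
	wg.Wait() // every job has finished before buf can be reused
	return ok, failed
}

func main() {
	ok, failed := putToNodes(newTinyPool(2), []byte("object payload"), []int{0, 1, 2, 3, 4, 5})
	fmt.Println(ok+failed, running.Load())
}
```

This prints `6 0` regardless of scheduling: every node is counted as either sent or rejected, and no job is still running when `putToNodes` returns. How many nodes are rejected depends on timing and pool size; with enough slots, none are.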