
[BUG] limiting_resource_adaptor isn't actually thread safe #868

Closed
jrhemstad opened this issue Sep 13, 2021 · 2 comments · Fixed by #869
Labels: bug (Something isn't working), good first issue (Good for newcomers)

jrhemstad (Contributor) commented Sep 13, 2021

Describe the bug

The limiting_resource_adaptor wraps an upstream resource and limits the amount of memory that can be allocated from it. If that limit would be exceeded, it throws rmm::bad_alloc.

An attempt was made to make this adaptor thread safe by using an atomic for the internal allocated_bytes_ counter that tracks how much memory has been allocated from the upstream. However, an atomic counter alone is not sufficient to guarantee the limit is enforced. This is the logic of allocating from the upstream:

  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  {
    void* p = nullptr;

    std::size_t proposed_size = rmm::detail::align_up(bytes, allocation_alignment_);
    if (proposed_size + allocated_bytes_ <= allocation_limit_) {
      p = upstream_->allocate(bytes, stream);
      allocated_bytes_ += proposed_size;
    } else {
      throw rmm::bad_alloc{"Exceeded memory limit"};
    }

    return p;
  }

This line:

if (proposed_size + allocated_bytes_ <= allocation_limit_)

is equivalent to:

auto const tmp = allocated_bytes_.load(std::memory_order_seq_cst);
if (proposed_size + tmp <= allocation_limit_)

Imagine at time t0 that allocation_limit_ is 2GB and allocated_bytes_ is 1GB.

At t0, two threads i and j each attempt to allocate 1GB. They concurrently load allocated_bytes_, both observe 1GB, and so the check if (proposed_size + allocated_bytes_ <= allocation_limit_) passes for both.

Both i and j then attempt to allocate 1GB from the upstream. If both succeed (depending on the state of the upstream), the total allocated from the upstream is 3GB, surpassing the 2GB limit.
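
For concreteness, here is a minimal reproduction sketch (hypothetical; it assumes a device with more than 3GB free and that limiting_resource_adaptor is constructed from an upstream pointer and a byte limit):

#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/limiting_resource_adaptor.hpp>

#include <cstdio>
#include <thread>

int main()
{
  constexpr std::size_t GiB = std::size_t{1} << 30;
  rmm::mr::cuda_memory_resource upstream;
  // 2GB limit, then consume 1GB to reach the state at t0
  rmm::mr::limiting_resource_adaptor<rmm::mr::cuda_memory_resource> limited{&upstream, 2 * GiB};
  limited.allocate(GiB);  // intentionally not freed in this sketch

  auto racer = [&limited] {
    try {
      // Both threads can load allocated_bytes_ == 1GB and pass the limit check
      std::printf("allocated 1GB at %p\n", limited.allocate(GiB));
    } catch (...) {
      std::printf("allocation rejected\n");
    }
  };
  std::thread i{racer};
  std::thread j{racer};
  i.join();
  j.join();
  // If both threads print an address, 3GB was handed out under a 2GB limit.
  return 0;
}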

Solution

A trivial fix is to simply document that this adaptor is not thread safe and require wrapping it with rmm::mr::thread_safe_resource_adaptor.
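
For example (a sketch; thread_safe_resource_adaptor serializes calls to the wrapped resource with an internal mutex):

#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/limiting_resource_adaptor.hpp>
#include <rmm/mr/device/thread_safe_resource_adaptor.hpp>

rmm::mr::cuda_memory_resource upstream;
rmm::mr::limiting_resource_adaptor<rmm::mr::cuda_memory_resource> limited{&upstream,
                                                                          std::size_t{2} << 30};
// Every allocate/deallocate on `safe` takes the outer mutex first, so the
// limit check and counter update inside `limited` can no longer interleave.
rmm::mr::thread_safe_resource_adaptor<decltype(limited)> safe{&limited};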

Similarly trivial, we could add a lock internally to limiting_resource_adaptor, as sketched below.
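
A hedged sketch of that option (mutex_ is a hypothetical std::mutex member; with the lock held, allocated_bytes_ no longer needs to be atomic):

  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  {
    std::lock_guard<std::mutex> lock(mutex_);  // hypothetical member, guards check + update
    std::size_t proposed_size = rmm::detail::align_up(bytes, allocation_alignment_);
    if (proposed_size + allocated_bytes_ > allocation_limit_) {
      throw rmm::bad_alloc{"Exceeded memory limit"};
    }
    void* p = upstream_->allocate(bytes, stream);
    allocated_bytes_ += proposed_size;
    return p;
  }

do_deallocate would take the same lock before decrementing allocated_bytes_.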

A bit more involved, but still straightforward, is to improve the allocation logic using the atomic itself:

  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  {
    auto const proposed_size = rmm::detail::align_up(bytes, allocation_alignment_);
    // Eagerly reserve the bytes before touching the upstream resource
    auto const old = allocated_bytes_.fetch_add(proposed_size);
    if (old + proposed_size <= allocation_limit_) {
      return upstream_->allocate(bytes, stream);
    }
    // Over the limit: revert the reservation and fail
    allocated_bytes_.fetch_sub(proposed_size);
    throw rmm::bad_alloc{"Exceeded memory limit"};
  }

  void do_deallocate(void* p, std::size_t bytes, cuda_stream_view stream) override
  {
    std::size_t allocated_size = rmm::detail::align_up(bytes, allocation_alignment_);
    upstream_->deallocate(p, bytes, stream);
    allocated_bytes_.fetch_sub(allocated_size);
  }

In this "eager" algorithm, a thread will first atomically increment the allocated_bytes_ tracking variable before allocating from the upstream. In this way, two threads can never simultaneously observe a value of allocated_bytes_ that would allow them both to succeed even if the sum of their allocations would exceed the limit. If the old + proposed_size exceeds the limit, then you revert the previous fetch_add with a fetch_sub.

jrhemstad added the bug and good first issue labels Sep 13, 2021
abellina commented:

@rongou fyi

rongou (Contributor) commented Sep 13, 2021

I can take a look.

rongou self-assigned this Sep 13, 2021
rapids-bot closed this as completed in #869 Sep 14, 2021
rapids-bot pushed a commit that referenced this issue Sep 14, 2021