-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] FileFormat::GetReaderAsync hangs when ARROW_IO_THREADS is set to 1 #37487
Comments
It's worth noting that this only happens on |
Hmm let me take a look. Perhaps something is already use ioExecutor and re-allocating sub-task in ioExecutor, which might causing deadlock and hang forever |
I've found out the reason. But I'm too tired today, I'd like to fix it tomorrow. Or you can try to fix this and at me for review. Change Future<std::shared_ptr<Buffer>> BufferReader::ReadAsync(const IOContext& ctx,
int64_t position,
int64_t nbytes) {
return DeferNotOk(ctx.executor()->Submit([this, position, nbytes]() {
return DoReadAt(position, nbytes);
}));
} (The origin impl uses a blocking read so test cannot find problem here, it's just a hacking to reproduce the problem) The IO Thread would wait for it: Future<std::optional<int64_t>> ParquetFileFormat::CountRows(
const std::shared_ptr<FileFragment>& file, compute::Expression predicate,
const std::shared_ptr<ScanOptions>& options) {
auto parquet_file = checked_pointer_cast<ParquetFileFragment>(file);
if (parquet_file->metadata()) {
ARROW_ASSIGN_OR_RAISE(auto maybe_count,
parquet_file->TryCountRows(std::move(predicate)));
return Future<std::optional<int64_t>>::MakeFinished(maybe_count);
} else {
return DeferNotOk(options->io_context.executor()->Submit(
[parquet_file, predicate]() -> Result<std::optional<int64_t>> {
RETURN_NOT_OK(parquet_file->EnsureCompleteMetadata()); // <- here, it submit a task, and will wait for it in parquet_file->EnsureCompleteMetadata, causing deadlock
return parquet_file->TryCountRows(predicate);
}));
}
} |
Also cc @westonpace @pitrou , I'd like to implement a blocking read in |
I've implement a draft fixing here: #37514 I'm not sure it can solve all deadlock problem, and I'm a bit tried these days so test might be added later, you can try it yourself. |
…GetReader` (#37514) ### Rationale for this change As #37487 says. When thread cnt == 1, the thread might blocking in `ParquetFileFormat::GetReaderAsync`, that's because: 1. `ParquetFileFormat::CountRows` would call `EnsureCompleteMetadata` in `io_executor` 2. `EnsureCompleteMetadata` call `ParquetFileFormat::GetReader`, which dispatch real request to async mode 3. `async` is executed in `io_executor`. 1/3 in same fix-sized executor, causing deadlock. ### What changes are included in this PR? Implement sync `ParquetFileFormat::GetReader`. ### Are these changes tested? Currently not ### Are there any user-facing changes? Bugfix * Closes: #37487 Authored-by: mwish <[email protected]> Signed-off-by: Benjamin Kietzman <[email protected]>
…rmat::GetReader` (apache#37514) ### Rationale for this change As apache#37487 says. When thread cnt == 1, the thread might blocking in `ParquetFileFormat::GetReaderAsync`, that's because: 1. `ParquetFileFormat::CountRows` would call `EnsureCompleteMetadata` in `io_executor` 2. `EnsureCompleteMetadata` call `ParquetFileFormat::GetReader`, which dispatch real request to async mode 3. `async` is executed in `io_executor`. 1/3 in same fix-sized executor, causing deadlock. ### What changes are included in this PR? Implement sync `ParquetFileFormat::GetReader`. ### Are these changes tested? Currently not ### Are there any user-facing changes? Bugfix * Closes: apache#37487 Authored-by: mwish <[email protected]> Signed-off-by: Benjamin Kietzman <[email protected]>
…rmat::GetReader` (apache#37514) ### Rationale for this change As apache#37487 says. When thread cnt == 1, the thread might blocking in `ParquetFileFormat::GetReaderAsync`, that's because: 1. `ParquetFileFormat::CountRows` would call `EnsureCompleteMetadata` in `io_executor` 2. `EnsureCompleteMetadata` call `ParquetFileFormat::GetReader`, which dispatch real request to async mode 3. `async` is executed in `io_executor`. 1/3 in same fix-sized executor, causing deadlock. ### What changes are included in this PR? Implement sync `ParquetFileFormat::GetReader`. ### Are these changes tested? Currently not ### Are there any user-facing changes? Bugfix * Closes: apache#37487 Authored-by: mwish <[email protected]> Signed-off-by: Benjamin Kietzman <[email protected]>
Describe the bug, including details regarding any error messages, version, and platform.
ARROW_IO_THREADS=1 python test.py
hangs forever. I traced this and it's stuck inGetReader
.It works normally if I use any
ARROW_IO_THREADS
larger than 1. I reproduced this on mac with version13.0.0
and linux with version12.0.0
.Component(s)
Python
The text was updated successfully, but these errors were encountered: