Integration Test is failing on master branch #1398

Closed
alamb opened this issue Mar 4, 2022 · 13 comments · Fixed by #1402
Labels
arrow Changes to the arrow crate bug

Comments

@alamb
Contributor

alamb commented Mar 4, 2022

Describe the bug

The "Integration Test" CI test is failing on master. For example:
https://github.com/apache/arrow-rs/runs/5421632152?check_suite_focus=true

It appears to have started in c947027 (though I don't think that PR had anything to do with it)

To Reproduce
Run integration test on master

Expected behavior
I expect it to pass

Additional context
The integration test pulls from https://github.com/apache/arrow so perhaps something upstream changed.

I think we need to sort out what is wrong with this test prior to the 10.0.0 release

@alamb alamb added bug arrow Changes to the arrow crate labels Mar 4, 2022
@xudong963
Member

Maybe related to hyperium/tonic#887

@xudong963
Member

#864 (comment)

When I updated tonic to 0.6, I found that the integration test also failed.

So I guess the problem is in tonic, @alamb

@alamb
Contributor Author

alamb commented Mar 4, 2022

Thanks @xudong963 👍

@alamb alamb self-assigned this Mar 4, 2022
@alamb
Contributor Author

alamb commented Mar 4, 2022

I do not yet know what is going on, but wanted to post on my progress (I don't have all that much more time to spend today on this)

Last good run @ Mar 3 9:04 AM EST: 258f828

First failed run @ Mar 3 12:47 PM EST: c947027

These commits and PRs from arrow appear to fall in that time range, but are not obviously related to the tests

The error appears to be happening with C++ serving and rust requesting in two scenarios:

Testing file interval_mdn
Testing file duration

I am working to reproduce the error locally so I can debug further, but as I am not super familiar with the integration testing setup, it is slow going.

@jorgecarleitao
Member

fwiw arrow2 is also failing on those two tests and we did not change anything in tonic, flight, IPC etc that could have caused this. I am investigating it as well.

@alamb
Contributor Author

alamb commented Mar 4, 2022

FYI @jorgecarleitao

I have been able to reproduce this locally.

Here are some notes I have in case that is helpful:

# check out arrow (next to the arrow-rs checkout)
# install archery:
cd arrow
pip install -e "dev/archery[docker]"
# link arrow-rs to arrow/rust
ln -s ../arrow-rs rust
# build cpp binaries (ninja has to run inside the build directory)
cd cpp
mkdir build
cd build
cmake -DARROW_BUILD_INTEGRATION=ON -DARROW_FLIGHT=ON --preset ninja-debug-minimal ..
ninja

Then

# build rust (from arrow/cpp/build, go back to the arrow-rs checkout):
cd ../../../arrow-rs
cargo build --all

Now, from the arrow directory, run the tests:

archery integration --with-cpp=true --with-rust=true

Run individual test:

(arrow_dev) alamb@MacBook-Pro-2:~/Software/arrow/cpp/build$ /Users/alamb/Software/arrow/cpp/build/debug/flight-test-integration-server -port 49153

Run in rust:
/Users/alamb/Software/arrow/rust/target/debug/flight-test-integration-client --host localhost --port=49153 --path /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/arrow-integration-v7xz5285/generated_dictionary_unsigned.json

Repro with this:
cd /Users/alamb/Software/arrow-rs && RUST_LOG=debug RUST_BACKTRACE=1 CARGO_TARGET_DIR=/Users/alamb/Software/df-target cargo run --bin flight-test-integration-client -- --host localhost --port=49153 --path /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/arrow-integration-v7xz5285/generated_dictionary_unsigned.json

Results in this
E0304 16:33:43.055262000 123145352572928 hpack_parser.cc:1240]         Error parsing metadata: error=invalid value key=:scheme value=grpc

@alamb
Contributor Author

alamb commented Mar 4, 2022

The output from rust is:

Error: Status { code: Unknown, message: "transport error", source: Some(tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: Reset(StreamId(1), INTERNAL_ERROR, Remote) }))) }

The output from C++ is

E0304 16:47:21.317255000 123145352572928 hpack_parser.cc:1240]         Error parsing metadata: error=invalid value key=:scheme value=grpc

Which seems to suggest some sort of bug in parsing headers or something
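
For scanning a longer server log for this kind of failure, a minimal sketch could pull the offending header out of each line. It is not part of the integration tooling: the function name `parse_metadata_error` is hypothetical, and the pattern is inferred only from the single log line quoted above, so other gRPC versions may format the message differently.

```python
import re

# Pattern inferred from the one log line quoted above (an assumption);
# other gRPC versions may word this message differently.
METADATA_ERROR = re.compile(
    r"Error parsing metadata: error=(?P<error>.+?) key=(?P<key>\S+) value=(?P<value>\S+)"
)


def parse_metadata_error(line):
    """Return (error, key, value) if the line is a metadata parse error, else None."""
    match = METADATA_ERROR.search(line)
    if match is None:
        return None
    return match.group("error"), match.group("key"), match.group("value")


line = (
    "E0304 16:47:21.317255000 123145352572928 hpack_parser.cc:1240]"
    "         Error parsing metadata: error=invalid value key=:scheme value=grpc"
)
print(parse_metadata_error(line))  # ('invalid value', ':scheme', 'grpc')
```

Here the key `:scheme` with value `grpc` is the part the C++ server rejects.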

However, since other tests run, maybe it is something related to the contents?

Given how it came on suddenly, smells like something related to a change in a dependency, but I haven't found one yet

@matthewmturner
Contributor

Apologies if I've misunderstood any part of this or if I don't understand the full integration testing process, but given that both arrow and arrow2 started failing, I am focusing on two scenarios:

  1. A dependency that both arrow and arrow2 use caused the issue. @alamb and @jorgecarleitao are each looking into this, so I consider this "covered" in the sense that it's being looked into.
  2. An issue from C++ (content or dependency) that has impacted both. I know an email was sent to the dev mailing list mentioning the integration test failure, but it's not clear to me whether anything beyond that has been raised on that side. Should we be raising something there?

@alamb
Contributor Author

alamb commented Mar 4, 2022

@matthewmturner -- I agree with your assessment.

Interestingly, the integration tests appear to fail in https://github.com/apache/arrow/ as well, with the same errors:

https://github.com/apache/arrow/runs/5427260660?check_suite_focus=true

I am looking into that as well

@alamb
Contributor Author

alamb commented Mar 4, 2022

I tried backing out apache/arrow@2462492 (the first build that failed in arrow) from my local environment and it did not help

@alamb
Contributor Author

alamb commented Mar 4, 2022

Even when I use arrow at apache/arrow@e314d8d, which had a passing Integration test, my local test setup still fails.

This suggests that the cause is perhaps some change in third-party dependencies (either Rust or C++)
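
One way to narrow down a suspected Rust dependency change is to compare the locked crate versions from a passing build and a failing build. The sketch below assumes you have a `Cargo.lock` saved from each build, which this thread does not confirm. The helper names are hypothetical and the lock file fragments are made up for illustration.

```python
import re

# Matches the name/version pair at the start of each [[package]] entry
# in a Cargo.lock file.
PACKAGE = re.compile(r'\[\[package\]\]\s*name = "([^"]+)"\s*version = "([^"]+)"')


def locked_versions(lock_text):
    """Map crate name -> set of locked versions (a crate can appear more than once)."""
    versions = {}
    for name, version in PACKAGE.findall(lock_text):
        versions.setdefault(name, set()).add(version)
    return versions


def changed_crates(good_lock_text, bad_lock_text):
    """Return {crate: (versions_in_good, versions_in_bad)} for crates that differ."""
    good = locked_versions(good_lock_text)
    bad = locked_versions(bad_lock_text)
    return {
        name: (sorted(good.get(name, ())), sorted(bad.get(name, ())))
        for name in sorted(good.keys() | bad.keys())
        if good.get(name) != bad.get(name)
    }


# Made-up lock file fragments, for illustration only.
good = (
    '[[package]]\nname = "example-a"\nversion = "1.0.0"\n\n'
    '[[package]]\nname = "example-b"\nversion = "0.3.1"\n'
)
bad = (
    '[[package]]\nname = "example-a"\nversion = "1.0.0"\n\n'
    '[[package]]\nname = "example-b"\nversion = "0.3.2"\n'
)
print(changed_crates(good, bad))  # {'example-b': (['0.3.1'], ['0.3.2'])}
```

Crates that appear in only one of the two files are reported too, with an empty list on the side where they are missing.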

@lidavidm
Member

lidavidm commented Mar 4, 2022

I believe the :scheme pseudo-header is supposed to be 'http' or 'https', not 'grpc': https://grpc.github.io/grpc/cpp/md_doc__p_r_o_t_o_c_o_l-_h_t_t_p2.html

It sounds like Tonic is sending :scheme = grpc for some reason, then gRPC in C++ is sending a RST_STREAM back to Rust.
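
To make the rule concrete, here is a small sketch of the check and of one possible client-side adjustment. It is not the change made in #1402, and it does not use tonic's or gRPC's API. The helper names and the example `grpc+tcp://` URI are illustrative assumptions. It only encodes the point above, that `:scheme` is expected to be `http` or `https`.

```python
from urllib.parse import urlsplit

# Per the comment above, :scheme is expected to be "http" or "https".
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_scheme(scheme):
    """True if the value would be acceptable as the :scheme pseudo-header."""
    return scheme in ALLOWED_SCHEMES


def with_http_scheme(uri, use_tls=False):
    """Rewrite a URI with a non-HTTP scheme to use http (or https if use_tls)."""
    parts = urlsplit(uri)
    if is_valid_scheme(parts.scheme):
        return uri
    return parts._replace(scheme="https" if use_tls else "http").geturl()


print(is_valid_scheme("grpc"))                         # False
print(with_http_scheme("grpc+tcp://localhost:49153"))  # http://localhost:49153
```

A client that derives `:scheme` from the URI it was given would then send `http` instead of a value the server rejects.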

@alamb
Contributor Author

alamb commented Mar 5, 2022

Thanks @lidavidm -- that was a great hint! I think I have a workaround now in #1402, and we'll see if that works
