Mysql failure while dropping a future in the middle of reading or writing to the stream. #563
Comments
It might be related to me connecting to MariaDB instead of MySQL. The panic is in
pub(crate) async fn recv_packet(&mut self) -> Result<Packet<Bytes>, Error> {
at
if payload[0] == 0xff
When I add the following code just before the if:
if payload.is_empty() {
    return Err(
        Error::Protocol("Empty package".to_string())
    );
}
I get the following error:
So perhaps ping is just supposed to return 0 bytes, and the right fix is to change the if to
if payload.get(0) == Some(&0xff) {
However, many other places after this assume the payload is non-empty, so I assume they would also have to be changed. |
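The difference between indexing and a checked lookup can be sketched on its own. This is only an illustration: `PacketError` and `check_payload` are hypothetical names made up for the sketch, not part of sqlx.

```rust
/// Hypothetical error type for this sketch; sqlx's real `Error` is richer.
#[derive(Debug, PartialEq)]
enum PacketError {
    /// The payload had zero bytes, so there is no first byte to inspect.
    Empty,
    /// The payload starts with the 0xff marker.
    ServerError,
}

/// Inspect the first payload byte without indexing, so an empty slice
/// yields an error value instead of an out-of-bounds panic.
fn check_payload(payload: &[u8]) -> Result<u8, PacketError> {
    match payload.first() {
        None => Err(PacketError::Empty),
        Some(&0xff) => Err(PacketError::ServerError),
        Some(&tag) => Ok(tag),
    }
}
```

With `payload[0]` an empty slice panics; with `first()` (or `get(0)`) the caller gets a value it can turn into a protocol error.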
What version of MariaDB are you using? Is it possible for you to put together a small example that showcases the problem? It's impossible that an empty payload is supposed to be returned. There must be something else afoot. |
I am using |
It is also somewhat odd that it only happens when I run multiple queries at the same time. |
I have done a bit of printf tracing (antialize@53d55dd). Below is a trace of the interesting bits:
user@server:~$ grep "packet_size: 0" log
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,660 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @6 sequence_id: 0, packet_size: 0, payload.len(): 0, header.len(): 0
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @4 sequence_id: 0, packet_size: 0, payload.len(): 0, header.len(): 0
user@server:~$ grep "@4" log | tail -n 10
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,337 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @4
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,337 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @4 sequence_id: 26, packet_size: 27, payload.len(): 27, header.len(): 0
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,337 INFO [sqlx_core::mysql::connection::stream] mysql write_packet @4
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,337 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @4
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection] ping:0 @4
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection] ping:1 @4
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection::stream] wait_until_ready:1 @4
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection::stream] wait_until_ready:4 @4
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @4
Jul 26 19:07:29 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:29,076 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @4 sequence_id: 0, packet_size: 0, payload.len(): 0, header.len(): 0
user@server:~$ grep "@6" log | tail -n 10
Jul 26 19:07:27 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:27,935 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @6
Jul 26 19:07:27 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:27,935 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @6 sequence_id: 26, packet_size: 27, payload.len(): 27, header.len(): 0
Jul 26 19:07:27 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:27,935 INFO [sqlx_core::mysql::connection::stream] mysql write_packet @6
Jul 26 19:07:27 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:27,935 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @6
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,488 INFO [sqlx_core::mysql::connection] ping:0 @6
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,488 INFO [sqlx_core::mysql::connection] ping:1 @6
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,488 INFO [sqlx_core::mysql::connection::stream] wait_until_ready:1 @6
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,488 INFO [sqlx_core::mysql::connection::stream] wait_until_ready:4 @6
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,488 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @6
Jul 26 19:07:28 ws5 a3c66e9325e0[30076]: 2020-07-26 19:07:28,660 INFO [sqlx_core::mysql::connection::stream] mysql recv_packet:1 @6 sequence_id: 0, packet_size: 0, payload.len(): 0, header.len(): 0
It seems that both streams receive a zero size package in |
I tried digging in and reproducing this, but I can't. I came up with the test below, which from the description sounds like it should trigger the problem, but everything works fine. I tried it on all supported versions of MariaDB and MySQL.
#[sqlx_macros::test]
async fn it_can_run_many_queries_at_once() -> anyhow::Result<()> {
    let pool = pool::<MySql>().await?; // 10 max connections
    let mut handles = Vec::new();
    // Queue 100 queries against the 10-connection pool.
    for i in 0..100 {
        let pool = pool.clone();
        handles.push(async move {
            let val: i32 = sqlx::query_scalar("SELECT ?, SLEEP(1)").bind(i).fetch_one(&pool).await?;
            assert_eq!(val, i);
            Ok::<_, anyhow::Error>(())
        });
    }
    // Drive all of them concurrently and fail on the first error.
    future::try_join_all(handles).await?;
    Ok(())
} |
Thanks for looking at it. I will try to blind my data and minimize my failing example. |
The issue turned out to be somewhat unrelated to what I thought.
I have tried wrapping all the queries in |
I have been looking into making the code support dropping the future on the floor in the middle of a query. However, I do not think this can be done without making a lot of copies of data, so for efficiency reasons I think that would be a no-go. It would probably make sense to say that drops are not allowed. If that is the case, it would probably make sense to panic on the drop. Something like:
struct DropGuard {}
impl DropGuard {
    // Disarm the guard without running its Drop impl.
    fn release(self) { std::mem::forget(self); }
}
impl Drop for DropGuard {
    fn drop(&mut self) {
        panic!("I should not be dropped")
    }
}
....
pub(crate) async fn recv_packet_inner(&mut self) -> Result<Packet<Bytes>, Error> {...}
pub(crate) async fn recv_packet(&mut self) -> Result<Packet<Bytes>, Error> {
    let g = DropGuard {};
    let a = self.recv_packet_inner().await;
    g.release();
    a
}
This should not be too costly. If I am not mistaken, the cost is a single match over the state (index) of the generated future at drop time. |
Really appreciate the dive here. This now makes perfect sense to me. Very busy at the moment so can't dive in but I will within a day to fix it. Basically we need to make sure the stream is properly exhausted before we run the ping command. |
The reading should be relatively easy to fix, as you say: you can just keep a counter of how many bytes to skip within the stream. I think the hard problem is dropping the future in the middle of sending a command to the database. Here you cannot just keep a counter of how many bytes you have yet to send; you have to send the right bytes for the command. |
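A rough sketch of the skip-counter idea for the read side, under loud assumptions: `SkippingReader`, `abandon`, and `read_fresh` are hypothetical names, the code is synchronous and uses only `std::io::Read`, and it ignores the case where the skip itself is interrupted. It is not how sqlx's buffered stream is written.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that remembers how many bytes of an abandoned
/// packet are still unread and discards them before the next real read.
struct SkippingReader<R> {
    inner: R,
    pending_skip: usize,
}

impl<R: Read> SkippingReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, pending_skip: 0 }
    }

    /// Record that the caller gave up with `n` bytes of the current
    /// packet still sitting unread on the wire.
    fn abandon(&mut self, n: usize) {
        self.pending_skip += n;
    }

    /// Throw away any leftover bytes, then fill `buf` with fresh data.
    fn read_fresh(&mut self, buf: &mut [u8]) -> io::Result<()> {
        let mut scratch = [0u8; 64];
        while self.pending_skip > 0 {
            let want = self.pending_skip.min(scratch.len());
            self.inner.read_exact(&mut scratch[..want])?;
            self.pending_skip -= want;
        }
        self.inner.read_exact(buf)
    }
}
```

Nothing comparable works for writes: a half-sent command cannot be skipped, only completed with exactly the bytes that were left.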
Okay so this is not what I was thinking. We're talking about futures being dropped as a way to cancel the request. Hmmm.. 🤔 I'm going to sit and think about this for a bit. I don't think SQLx is properly following Rust and allowing futures to be dropped. |
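To make "dropping a future as cancellation" concrete, here is a small std-only sketch. Everything in it is hypothetical (`send_command`, `YieldOnce`, `NoopWake`, and the string log standing in for a socket); it only shows that a future polled once and then dropped leaves its work half done.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Future that is pending on the first poll and ready on the second.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Hypothetical two-step "command": the log stands in for the socket.
async fn send_command(log: &mut Vec<&'static str>) {
    log.push("header sent");
    YieldOnce(false).await; // suspension point, e.g. the socket is not writable yet
    log.push("body sent");
}

/// Poll the command once, then drop it, as a timeout or select would.
fn poll_once_then_drop() -> Vec<&'static str> {
    let mut log = Vec::new();
    {
        let mut fut = Box::pin(send_command(&mut log));
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        assert!(fut.as_mut().poll(&mut cx).is_pending());
        // `fut` is dropped here; the second half never runs.
    }
    log
}
```

After `poll_once_then_drop()` the log contains only the first step, which is the same shape as a connection left with a half-written or half-read packet.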
We're hitting this on our actix-web app too. Our very crude workaround is to pass down the connection string to each new actix thread and connect then, that way it failing doesn't bring down the whole system. Obviously that eliminates 95% of why we'd have a pool in the first place, but it's a necessary kludge for now. |
We now have a reasonable way to reproduce this with #593 |
We've found that it's possible to see a panic after sqlx/sqlx-core/src/mysql/connection/stream.rs Line 147 in 6a70780
Bytes of size 0, so [0] is out of bounds and causes a panic.
The only way I can see that happening is if the let packet_size = header.get_uint_le(3) as usize; because the read buffer will always be set to the size it's asked to read: sqlx/sqlx-core/src/io/buf_stream.rs Line 115 in 6a70780
Should I open a PR that throws a new error if the packet size is 0? Or is a packet size of 0 okay, and we should simply not do this if it is 0? |
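For reference, a MySQL packet header is 4 bytes: a 3-byte little-endian payload length followed by a 1-byte sequence id. The sketch below parses that layout with the standard library only and shows the "return an error on size 0" option. `parse_header` and `nonempty_len` are hypothetical names, and rejecting a zero length assumes an empty payload is never valid at this point in the exchange, which the thread has not settled.

```rust
/// Split a 4-byte MySQL packet header into (payload length, sequence id).
/// The length is the first three bytes, little-endian.
fn parse_header(header: [u8; 4]) -> (usize, u8) {
    let len = u32::from_le_bytes([header[0], header[1], header[2], 0]) as usize;
    (len, header[3])
}

/// Hypothetical check: treat a zero-length payload as a protocol error
/// instead of letting later code index into an empty buffer.
fn nonempty_len(header: [u8; 4]) -> Result<usize, String> {
    match parse_header(header) {
        (0, seq) => Err(format!("empty packet (sequence id {})", seq)),
        (len, _) => Ok(len),
    }
}
```

An all-zero header parses to length 0 and sequence id 0, which matches the `sequence_id: 0, packet_size: 0` lines in the trace earlier in the thread.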
Has anything gone in that might help with this? It's still causing headaches for us -- even a minute of outage on our live system today. |
I think the best course of action would be to drop the connection from the pool, if a query/query result is dropped before it is done. This is much easier than somehow trying to cancel the query and recover the connection, though it is much less performant. I can try to create a PR to implement this, if the maintainers would be willing to merge something with that approach. |
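One way to picture this approach, as a minimal sketch rather than a proposal for the actual patch: set a flag before a query touches the wire and clear it only when the query completes, so a future dropped in between leaves the flag set. `Conn`, `Pool`, `mid_query`, and the method names are all hypothetical stand-ins, not sqlx types.

```rust
/// Hypothetical stand-in for a pooled connection.
struct Conn {
    id: u32,
    /// True while a query is in flight; stays true if the query is abandoned.
    mid_query: bool,
}

impl Conn {
    fn begin_query(&mut self) {
        self.mid_query = true;
    }
    fn end_query(&mut self) {
        self.mid_query = false;
    }
}

/// Hypothetical pool that only takes back connections in a clean state.
struct Pool {
    idle: Vec<Conn>,
    closed: u32,
}

impl Pool {
    fn release(&mut self, conn: Conn) {
        if conn.mid_query {
            // Stream state is unknown: close instead of reusing.
            self.closed += 1;
        } else {
            self.idle.push(conn);
        }
    }
}
```

The cost is a reconnect for every cancelled query, but a connection with leftover bytes on its stream is never handed to the next caller.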
@antialize it's perhaps less performant in the individual case, but the simpler recovery means all the callers will have an easier time recovering too. Today's outage was from 100% CPU, where the many sqlx panics had it thrashing as the whole conn pool sat there picking up connections and immediately erroring out. It doesn't make sense to me to live with site-crashingly poor performance because there's a theoretical better implementation that won't be implemented in months, when an easier (and at least better for my use case) approach can go in immediately. |
I'm hitting this as well; although I'm not using |
@antialize We'd very much appreciate a PR that fixed this in the short term in the way you're describing. |
I have added a pull request #915. This does not fix the issue by dropping the connection as I first proposed, instead it makes |
First of all, thank you for the great work. |
As I was writing this response, I realized that the work-around described here was never implemented. For some reason I thought it was, and merged. But reading back up I see that wasn't done ( nothing on you @antialize , I should have followed up on this ). I think the reason I thought it was merged was #918. I'll see if we can get this investigated in the next couple days. @kraigher Do you have In the Lastly, I haven't merged the other PR @antialize because it only addresses a slice of the problem. It's my fault for not communicating this better but a simple guide is look at every |
I do not have |
Can anyone that's experienced this problem and can reproduce it, try #1025 and see how it works ? |
I have implemented a web backend using sqlx. When running requests one at a time, everything works fine, but when multiple requests run concurrently, it causes a panic within the MySQL part of sqlx.
thread 'tokio-runtime-worker' panicked at 'index out of bounds: the len is 0 but the index is 0', /home/jakobt/.cargo/git/checkouts/sqlx-f05f33ba4f5c3036/eba6f39/sqlx-core/src/mysql/connection/stream.rs:152:12
I currently cannot compile against master, so the bug is reported against antialize@908279f, which is just head of master with an additional symbol exposed.