Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash when network is disconnected #134

Open
aykevl opened this issue Nov 7, 2024 · 2 comments
Open

Crash when network is disconnected #134

aykevl opened this issue Nov 7, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@aykevl
Copy link
Contributor

aykevl commented Nov 7, 2024

I got this crash, I think when the network connection got disconnected:

ERRO[755131] did not receive last pong from dealer, 60s passed 
ERRO[755136] failed receiving dealer message               error="failed to get reader: WebSocket closed: sent close frame: status = StatusServiceRestart and reason = \"\""
ERRO[755161] did not receive last pong from dealer, 90s passed 
ERRO[755161] did not receive last pong ack from accesspoint, 173s passed 
ERRO[755161] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:51278->34.158.1.133:443: use of closed network connection"
WARN[755181] failed to connect to AP ap-gew4.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:60895->10.65.42.1:53: i/o timeout"
ERRO[755191] did not receive last pong from dealer, 120s passed 
WARN[755201] failed to connect to AP ap-guc3.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-guc3.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:54788->10.65.42.1:53: i/o timeout"
ERRO[755221] did not receive last pong from dealer, 150s passed 
WARN[755221] failed to connect to AP ap-gew1.spotify.com:443, retrying with a different AP  error="dial tcp: lookup ap-gew1.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:56707->10.65.42.1:53: i/o timeout"
WARN[755241] failed to connect to AP ap-gae2.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gae2.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:46551->10.65.42.1:53: i/o timeout"
ERRO[755251] did not receive last pong from dealer, 180s passed 
WARN[755256] failed fetching new endpoint for accesspoint  error="failed fetching apresolve URL: Get \"https://apresolve.spotify.com/?type=accesspoint\": dial tcp: lookup apresolve.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:46939->10.65.42.1:53: i/o timeout"
WARN[755271] failed to connect to AP ap-gew4.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com: i/o timeout"
ERRO[755281] did not receive last pong ack from accesspoint, 293s passed 
ERRO[755281] did not receive last pong from dealer, 210s passed 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc87ed8]

goroutine 15 [running]:
github.com/devgianlu/go-librespot/ap.(*Accesspoint).pongAckTicker(0x4000471880)
        /home/ayke/src/spotify/go-librespot/ap/ap.go:347 +0x1e8
created by github.com/devgianlu/go-librespot/ap.(*Accesspoint).startReceiving.func1 in goroutine 1
        /home/ayke/src/spotify/go-librespot/ap/ap.go:250 +0x10c

And this crash, probably the same bug, when the network definitely got disconnected (I think there was some maintenance on the router).

ERRO[937664] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:34414->104.199.65.9:443: read: connection reset by peer"                                                                                                            
INFO[937665] authenticated AP as 0bzabjufeuoh2wmg0n912ftgw                                                                                                                                                                                                                                
ERRO[938626] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:43240->104.199.241.202:80: read: connection reset by peer"                                                                                                          
INFO[938626] authenticated AP as 0bzabjufeuoh2wmg0n912ftgw                                                                                                                                                                                                                                
ERRO[952286] failed receiving packet                       error="failed reading packet header: EOF"                                                                                                                                                                                      
INFO[952287] authenticated AP as 0bzabjufeuoh2wmg0n912ftgw                                                                                                                                                                                                                                
ERRO[958771] did not receive last pong from dealer, 60s passed                                                                                                                                                                                                                            
ERRO[958776] failed receiving dealer message               error="failed to get reader: WebSocket closed: sent close frame: status = StatusServiceRestart and reason = \"\""                                                                                                              
ERRO[958801] did not receive last pong ack from accesspoint, 154s passed                                                                                                                                                                                                                  
ERRO[958801] did not receive last pong from dealer, 90s passed                                                                                                                                                                                                                            
ERRO[958801] failed receiving packet                       error="failed reading packet header: read tcp 10.65.42.112:39610->34.158.1.133:443: use of closed network connection"                                                                                                          
WARN[958821] failed to connect to AP ap-gew4.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:53441->10.65.42.1:53: i/o timeout"                                                                         
ERRO[958831] did not receive last pong from dealer, 120s passed                                                                                                                                                                                                                           
WARN[958841] failed to connect to AP ap-guc3.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-guc3.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:50588->10.65.42.1:53: i/o timeout"                                                                       
ERRO[958861] did not receive last pong from dealer, 150s passed                                                                                                                                                                                                                           
WARN[958861] failed to connect to AP ap-gue1.spotify.com:443, retrying with a different AP  error="dial tcp: lookup ap-gue1.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:47574->10.65.42.1:53: i/o timeout"                                                                        
WARN[958881] failed to connect to AP ap-gae2.spotify.com:80, retrying with a different AP  error="dial tcp: lookup ap-gae2.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:40195->10.65.42.1:53: i/o timeout"                                                                         
ERRO[958891] did not receive last pong from dealer, 180s passed                                                                                                                                                                                                                           
WARN[958901] failed fetching new endpoint for accesspoint  error="failed fetching apresolve URL: Get \"https://apresolve.spotify.com/?type=accesspoint\": dial tcp: lookup apresolve.spotify.com on 10.65.42.1:53: read udp 10.65.42.112:40473->10.65.42.1:53: i/o timeout"               
WARN[958911] failed to connect to AP ap-gew4.spotify.com:4070, retrying with a different AP  error="dial tcp: lookup ap-gew4.spotify.com: i/o timeout"                                                                                                                                    
ERRO[958921] did not receive last pong ack from accesspoint, 274s passed                                                                                                                                                                                                                  
ERRO[958921] did not receive last pong from dealer, 210s passed                                                                                                                                                                                                                           
panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                                                                                                                   
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc87ed8]

goroutine 78 [running]:
github.com/devgianlu/go-librespot/ap.(*Accesspoint).pongAckTicker(0x4000454000)
        /home/ayke/src/spotify/go-librespot/ap/ap.go:347 +0x1e8
created by github.com/devgianlu/go-librespot/ap.(*Accesspoint).startReceiving.func1 in goroutine 1
        /home/ayke/src/spotify/go-librespot/ap/ap.go:250 +0x10c

I suspect the issue is here:

ap.conn, err = proxy.Dial(ctx, "tcp", addr)

This is ap.init, which is called from ap.connect, which is called from ap.reconnect. Looks like it sets ap.conn to nil when it can't connect resulting in the nil pointer dereference.

(Also, I think ap.connMu should have been used inside pongAckTimer?)


I should also say it's quite impressive that the daemon had been running crash-free for over 10 days!

@devgianlu devgianlu added the bug Something isn't working label Nov 7, 2024
devgianlu added a commit that referenced this issue Nov 9, 2024
This is a similar fix to 0b32df7. See #134
@devgianlu
Copy link
Owner

Also, I think ap.connMu should have been used inside pongAckTimer?

I have mixed feelings about this: yes it should be held for reading, but at the same time if we end up waiting because it is held for writing and then call close on it we are probably closing a connection that has just been created. Citing my own comment:

go-librespot/ap/ap.go

Lines 53 to 56 in 0950205

// connMu is held for writing when performing reconnection and for reading mainly when accessing welcome
// or sending packets. If it's not held, a valid connection (and APWelcome) is available. Be careful not to deadlock
// anything with this.
connMu sync.RWMutex


Main problem should be fixed with 0950205, same issue as 0b32df7.

@aykevl
Copy link
Contributor Author

aykevl commented Nov 9, 2024

I have mixed feelings about this: yes it should be held for reading, but at the same time if we end up waiting because it is held for writing and then call close on it we are probably closing a connection that has just been created.

It's still a data race, which is basically undefined behavior (though it might work well enough in practice since it's a pointer read, which will normally not be torn).

Maybe I'll take a stab at fixing this at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants