| Summary: | sshd does not seem to terminate despite ClientAlive[Interval|CountMax] when a process is polling a remote forwarding channel | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Portable OpenSSH | Reporter: | willchan | ||||||
| Component: | sshd | Assignee: | Assigned to nobody <unassigned-bugs> | ||||||
| Status: | CLOSED FIXED | ||||||||
| Severity: | normal | CC: | djm, dtucker | ||||||
| Priority: | P5 | ||||||||
| Version: | 6.7p1 | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Bug Depends on: | |||||||||
| Bug Blocks: | 2698 | ||||||||
Description
willchan
2017-08-09 14:06:07 AEST
(In reply to willchan from comment #0)
> I should note at this point that the client is running OpenSSH_7.2p2
> and the server is running OpenSSH_6.7p1.

I'd suggest trying 7.5p1; there was a keepalive bug (#2252) fixed in 7.3.

OK, thanks! I'll try testing it the next time I get my team together to reproduce this situation, and I'll get back to you on that one.

What did you think about my questions around this code snippet, which I linked earlier from wait_until_can_do_something()?

    /* Wait for something to happen, or the timeout to expire. */
    ret = select((*maxfdp)+1, *readsetp, *writesetp, NULL, tvp);

At a very quick glance, it looks possible that if the client connection is dead (so the client's descriptor in readsetp never becomes ready), the ClientAliveInterval timeout (tvp) may never be hit if something in writesetp always becomes ready before the ClientAliveInterval expires. In my situation, a monitoring service polling a remote forwarding channel's server port at an interval shorter than ClientAliveInterval might conceivably trigger this.

Created attachment 3029
keep track of the last time we heard from the client and trigger client_alive_check()

(In reply to willchan from comment #2)
> What did you think about my questions around this code snippet which
> I linked earlier from wait_until_can_do_something()?

I think you're right; the select won't time out, so client_alive_check() won't be triggered. Attached is an untested patch which might help...

Cool, thanks! I glanced briefly at the patch and it looks like it'll definitely help. My one minor nit is that it could also update the select timeout. It's OK as is, but it means that client_alive_check() may be called, in the worst case, up to just under a full ClientAliveInterval later than it should be. The worst case is when writesetp becomes ready exactly when (last_client_time + options.client_alive_interval == monotime()), so it fails the (last_client_time + options.client_alive_interval < monotime()) check, and the next select() call then times out only after a full ClientAliveInterval. That's a nit; this fixes the bulk of the issue AFAICT. Thanks.

Created attachment 3030
keep track of the last time we heard from the client and trigger client_alive_check()
I came up with the following to reproduce:
1) make sure you've got an inetd with the discard service enabled.
2) sshd -o ClientAliveInterval=3 -o ClientAliveCountMax=3 -p 2022
3) ssh -p 2022 -R 1234:localhost:9 localhost
4) while sleep 1; do echo foo; done | nc localhost 1234
5) pkill -STOP -u $USER -x ssh
-current does indeed hang. I found that my first patch kills the connection too early: once the last_client_time check fires, it will fire again on the very next pass through the loop, so last_client_time needs to be reset when that happens. With that change it works more or less as expected.
I'm not super concerned about the potential timing inaccuracy you mention as we're looking at redoing the select code to use something that allows a bit more flexibility and is easier to reason about.
I have committed a variant of this patch, and it will be in the 7.6 release. Thanks for the report and analysis!

Thanks for the quick turnaround! Much appreciated.

Close all resolved bugs after release of OpenSSH 7.7.