Bug 2573 - dead sessions cannot be closed with ~.
Summary: dead sessions cannot be closed with ~.
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh (show other bugs)
Version: 7.1p2
Hardware: Other Linux
: P5 normal
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_9_1
  Show dependency treegraph
 
Reported: 2016-05-26 13:27 AEST by Christoph Anton Mitterer
Modified: 2022-10-04 21:58 AEDT (History)
2 users (show)

See Also:


Attachments
keep processing and buffering input during rekeying (by djm) (1.52 KB, patch)
2021-08-06 14:35 AEST, Darren Tucker
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Christoph Anton Mitterer 2016-05-26 13:27:02 AEST
Hey.

This is may be the client-side counterpart to bug #2572.
Versions, distro, etc., all as there, i.e. I also think it may have started with 6.9 (but I may be wrong her).


Every now and then, when some connections get stuck (it e.g. happens sometimes, but not always, when I reboot the server, and Debian's ssh systemd units do not yet properly close ssh connections before they kill networking, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=751636), ~. doesn't work anymore to terminate the session/client.

Similarly, but perhaps another issue comes happens during this https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=770135 ... see also its duplicate reports there):
When dbus is restarted, something gets apparently wrong with systemd-logind... and the logins with ssh are considerably delayed (until they finally succeed when something in systemd or so times out).
Now this is a systemd/dbus/whatever problem,... BUT... during the server hangs, because of that bug,... the client side hangs either.
Neither ~. nor Ctrl-C work then (even though no remote shell login has happened yet)


Amongst other, my ssh_config has:
TCPKeepAlive	no
ServerAliveInterval	15
ServerAliveCountMax	8

But right now, I cannot really tell if the client also seems to fail enforcing the Alive, as the server does (Bug 2572)... will wait for the 8*15s period next time it happens.


Any ideas?

Chris.
Comment 1 Damien Miller 2016-07-08 14:21:13 AEST
What version ssh are you using? Is it really 3.7.1p2?
Comment 2 Darren Tucker 2016-07-20 11:05:23 AEST
(In reply to Damien Miller from comment #1)
> What version ssh are you using? Is it really 3.7.1p2?

"I also think it may have started with 6.9" so no, but we still don't know exactly which version it is.  Christoph?
Comment 3 Christoph Anton Mitterer 2016-07-20 11:25:04 AEST
Hey.

Sorry, for the delay, somehow missed the notification from Damien's comment.

Uhm right now I'm in 7.2p2, and it still happens there from time to time.
As I wrote in the comment, it may have occurred since 6.9 but I'm all but sure about this.
The "3.7.1p2" was bogus, no idea how I managed to accidentally select this O:-)

Same as in #2572, I have rekeying enabled on the client side.... so perhaps it's the same as there?

Cheers.
Comment 4 Christoph Anton Mitterer 2017-08-16 10:44:09 AEST
An update... this still happens from time to time in 7.5p1 ...

But I don't know how to properly reproduce it. Usually it happens when the network went down ... but not always.
Comment 5 Darren Tucker 2017-08-16 11:04:46 AEST
A suggestion for a possible reproduction case: find the sshd handling a given connection on the server and kill -STOP it.

Note that ~ is only recognised after a carriage return, are you always hitting enter before ~?

Thinking about it, if the input channel buffer was full that might stop ssh reading input and thus processing escapes.  What kind of activity were you doing in the cases where you observed it?

Also, if you do happen to observe it, stracing the hung ssh might help. (i'd expect it to be blocked in select() but I could be wrong).
Comment 6 Christoph Anton Mitterer 2017-08-18 10:29:45 AEST
(In reply to Darren Tucker from comment #5)
> A suggestion for a possible reproduction case: find the sshd
> handling a given connection on the server and kill -STOP it.

Maybe I found another way to reproduce:
I started
1) ssh -vv somehost
2) while true; do sleep 1 ; dmesg ; done #though that may be totally unnecessary, I just thought it may fill some buffers on the other side (though that shouldn't be the input buffer you referred to?)
3) put the client computer into suspend to RAM, when I just came back an hour or so later, the connection was frozen and <return>~. didn't work.



> Note that ~ is only recognised after a carriage return, are you
> always hitting enter before ~?

Sure... I always did.

> Thinking about it, if the input channel buffer was full that might
> stop ssh reading input and thus processing escapes.  What kind of
> activity were you doing in the cases where you observed it?
> 
> Also, if you do happen to observe it, stracing the hung ssh might
> help. (i'd expect it to be blocked in select() but I could be wrong).

see below


Some details that may help you:

1)
$ ssh -vv kronecker
OpenSSH_7.5p1 Debian-5, OpenSSL 1.0.2l  25 May 2017
debug1: Reading configuration data /home/calestyo/.ssh/config
debug1: /home/calestyo/.ssh/config line 22: Applying options for kronecker
debug1: /home/calestyo/.ssh/config line 145: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 6: Applying options for *
debug1: /etc/ssh/ssh_config line 7: Deprecated option "useroaming"
/etc/ssh/ssh_config line 141: Unsupported option "rsaauthentication"
debug1: Control socket "/home/calestyo/.ssh/channel-mux/heisenberg_root@kronecker:22" does not exist
debug2: resolving "kronecker" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to kronecker [<snip snap>] port 22.
debug1: Connection established.
debug1: identity file /home/calestyo/.ssh/id_ed25519 type 4
debug1: key_load_public: No such file or directory
debug1: identity file /home/calestyo/.ssh/id_ed25519-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/calestyo/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/calestyo/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/calestyo/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/calestyo/.ssh/id_rsa-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.5p1 Debian-5
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.5p1 Debian-5
debug1: match: OpenSSH_7.5p1 Debian-5 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to kronecker:22 as 'root'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,ext-info-c
debug2: host key algorithms: ssh-ed25519,ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp521,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp384,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp256-cert-v01@openssh.com,ssh-rsa,ssh-rsa-cert-v01@openssh.com
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: MACs ctos: hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com
debug2: MACs stoc: hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos: 
debug2: languages stoc: 
debug2: first_kex_follows 0 
debug2: reserved 0 
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256
debug2: host key algorithms: ssh-ed25519,ecdsa-sha2-nistp521,ssh-rsa,rsa-sha2-512,rsa-sha2-256
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: MACs ctos: hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com
debug2: MACs stoc: hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com
debug2: compression ctos: none
debug2: compression stoc: none
debug2: languages ctos: 
debug2: languages stoc: 
debug2: first_kex_follows 0 
debug2: reserved 0 
debug1: kex: algorithm: curve25519-sha256@libssh.org
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ssh-ed25519 SHA256:<snip snap>
debug1: Host 'kronecker' is known and matches the ED25519 host key.
debug1: Found key in /etc/ssh/ssh_known_hosts:18
debug2: set_newkeys: mode 1
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey after 134217728 blocks
debug2: key: /home/calestyo/.ssh/id_ed25519 (<snip snap>)
debug2: key: /home/calestyo/.ssh/id_ecdsa ((nil))
debug2: key: /home/calestyo/.ssh/id_rsa (<snip snap>)
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering ED25519 public key: /home/calestyo/.ssh/id_ed25519
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-ed25519 blen 51
debug2: input_userauth_pk_ok: fp SHA256:<snip snap>
debug1: Authentication succeeded (publickey).
Authenticated to kronecker ([<snip snap>]:22).
debug1: channel 0: new [client-session]
debug2: channel 0: send open
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: network
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug2: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug2: channel 0: request pty-req confirm 1
debug1: Sending environment.
debug1: Sending env LANG = en_DE.UTF-8
debug2: channel 0: request env confirm 0
debug2: channel 0: request shell confirm 1
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel_input_status_confirm: type 99 id 0
debug2: PTY allocation request accepted on channel 0
debug2: channel 0: rcvd adjust 2097152
debug2: channel_input_status_confirm: type 99 id 0
debug2: shell request accepted on channel 0
Linux kronecker 4.12.0-1-amd64 #1 SMP Debian 4.12.6-1 (2017-08-12) x86_64

You have new mail.
Last login: Fri Aug 18 02:21:09 2017 from <snip snap>
root@kronecker:~# 

=> then I went into suspend, after resume (the session was already hanging here and couldn't be ~. closed anymore):
debug1: enqueue packet: 94
debug1: SSH2_MSG_KEXINIT sent
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress
debug1: enqueue packet: 80
debug1: rekeying in progress


And you were right:
calestyo@heisenberg:~$ ps ax | grep ssh
 3642 ?        Ss     0:00 /usr/bin/ssh-agent cinnamon-session-cinnamon
15730 pts/6    S+     0:00 ssh -vv kronecker
16661 pts/4    S+     0:00 grep --color=auto ssh
calestyo@heisenberg:~$ strace -p 15730
strace: Process 15730 attached
select(7, [3], [], NULL, {tv_sec=7, tv_usec=854645}) = 0 (Timeout)
clock_gettime(CLOCK_BOOTTIME, {tv_sec=24266, tv_nsec=97525160}) = 0
write(2, "debug1: enqueue packet: 80\r\n", 28) = 28
write(2, "debug1: rekeying in progress\r\n", 30) = 30
clock_gettime(CLOCK_BOOTTIME, {tv_sec=24266, tv_nsec=97622715}) = 0
select(7, [3], [], NULL, {tv_sec=15, tv_usec=0}

) = 0 (Timeout)
clock_gettime(CLOCK_BOOTTIME, {tv_sec=24281, tv_nsec=112709952}) = 0
write(2, "debug1: enqueue packet: 80\r\n", 28) = 28
write(2, "debug1: rekeying in progress\r\n", 30) = 30
clock_gettime(CLOCK_BOOTTIME, {tv_sec=24281, tv_nsec=112808125}) = 0
select(7, [3], [], NULL, {tv_sec=15, tv_usec=0}^Cstrace: Process 15730 detached
 <detached ...>


Hope that helps,
Chris.
Comment 7 Darren Tucker 2017-08-18 11:45:08 AEST
(In reply to Christoph Anton Mitterer from comment #6)
[...]

This is the clue: it's stuck in rekeying:

I was able to reproduce with:
$ ssh -o rekeylimit="1G 15s" -vvv server
$ sudo kill -STOP [sshd_pid]

> debug1: SSH2_MSG_KEXINIT sent
> debug1: rekeying in progress

[...]
> select(7, [3], [], NULL, {tv_sec=15, tv_usec=0}

FD#3 is almost certainly the socket to the server, and stdin is not checked.

The reason for this is in client_wait_until_can_do_something() -> channel_prepare_select():

	if (!rekeying)
		channel_handler(channel_pre, *readsetp, *writesetp,
		    minwait_secs);

so while it's rekeying it won't process stdin and thus doesn't look for escape sequences.  I'm not sure what do to about it, though.

In the mean time, you could mitigate by increasing or turning of rekeying on the affected connection.
Comment 8 Christoph Anton Mitterer 2017-08-18 11:49:58 AEST
>In the mean time, you could mitigate by increasing
>or turning of rekeying on the affected connection.

Well it's really not a big deal, just reported it, though you guys can fix it sooner or later :-)

If there's anything left I can do, just tell!
Comment 9 Christoph Anton Mitterer 2021-06-29 05:46:07 AEST
Hey.

Anything new with respect to this?

Actually I think this bug can cause accidental problems, consider the following:

- one has such a hanging ssh connection (not knowing it)
- one copy&pastes a command into that terminal like
  rm -rf something
- one notices the session is dead, tries to kill it with "~.", which doesn't work
- one kills the ssh proccess

but what happens now is, that all stuff that was entered before is then on the shell's stdin (i.e. the shell from which ssh was invoked).
So instead of a remote rm -rf something, this is done locally... which might be bad of course.


Cheers,
Chris
Comment 10 Darren Tucker 2021-08-06 14:33:31 AEST
Damien has come up with a potential solution, I'm just running it through the regress tests now.

BTW an slightly easier testcase: start an sshd in debug mode (/path/to/sshd -ddd -p 2222), ssh into it with rekeying (ssh -o rekeylimit="1G 15s" -vvv localhost), CTRL-Z the sshd then wait until the client tries to rekey.
Comment 11 Darren Tucker 2021-08-06 14:35:35 AEST
Created attachment 3541 [details]
keep processing and buffering input during rekeying (by djm)
Comment 12 Darren Tucker 2021-08-12 11:18:41 AEST
It's quite late in the release cycle, and because this requires changes in the channels code we want to put this in early for maximum chance to catch problems so we're deferring this.
Comment 13 Damien Miller 2022-04-20 15:10:08 AEST
Something similar has been committed and should be in openssh-9.1

commit fec014785de198b9a325d1b94e324bb958c5fe7b
Author: djm@openbsd.org <djm@openbsd.org>
Date:   Wed Apr 20 04:19:11 2022 +0000

    upstream: Try to continue running local I/O for channels in state
    
    OPEN during SSH transport rekeying. The most visible benefit is that it
    should make ~-escapes work in the client (e.g. to exit) if the connection
    happened to have stalled during a rekey event. Based work by and ok dtucker@
    
    OpenBSD-Commit-ID: a66e8f254e92edd4ce09c9f750883ec8f1ea5f45
Comment 14 Damien Miller 2022-10-04 21:58:04 AEDT
Closing bugs from OpenSSH 9.1 release cycle