Bug 3531 - Ssh will not exit when it receives SIGTERM before calling poll in client_wait_until_can_do_something until some events happen.
Summary: Ssh will not exit when it receives SIGTERM before calling poll in client_wait...
Status: NEW
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh (show other bugs)
Version: 9.1p1
Hardware: Other Linux
: P5 enhancement
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-02-01 13:16 AEDT by renmingshuai
Modified: 2023-02-14 15:53 AEDT (History)
2 users (show)

See Also:


Attachments
myexpect.sh (638 bytes, text/x-sh)
2023-02-14 13:56 AEDT, renmingshuai
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description renmingshuai 2023-02-01 13:16:41 AEDT
In general, ssh will call poll in client_wait_until_can_do_something to wait for poll events with setting timeout to -1. When ssh receives SIGTERM before poll, it will not exit as expected until some events happen or receiving new signals.

client_loop
    client_wait_until_can_do_something
                                 <--------- SIGTERM 
                                 -> signal_handler()
        poll() // wait for events and never exit as expected

In my opinion, when SIGTERM is received, ssh should not continue to call the poll(), or call poll() with setting timeout to -1.
Comment 1 renmingshuai 2023-02-01 13:37:10 AEDT
I had this problem while using my expect script. My expect script call sftp for transferring files. When exit because of some error, it sent SIGTERM to the sftp process, which sent a signal to its child process, ssh. Ssh did not exit as expected and will never exit unless killing it.
Comment 2 Damien Miller 2023-02-10 15:02:55 AEDT
I think we need to convert the client to use ppoll() as we did the server, though now looking at the server I'm not 100% convinced we're doing the right thing there wrt SIGTERM and friends either.

Cc'ing dtucker@ as he's more recent on this than I am
Comment 3 renmingshuai 2023-02-14 13:56:56 AEDT
Created attachment 3673 [details]
myexpect.sh
Comment 4 renmingshuai 2023-02-14 13:58:18 AEDT
Maybe I didn't describe my problem clearly. Using ppoll instead doesn't seem to solve my problem.

As described above, My expect script would send a SIGTERM signal to the sftp process to kill it, when it exits because of some errors. The sftp process would also send a SIGTERM to its child process, ssh, to kill it after receiving the signal. However, if the ssh process, for example, received the signal after set_control_persist_exit_time()  and just before call poll() in client_wait_until_can_do_something(), it would call signal_handler() to handle the SIGTERM first and then sleep due to call poll() or ppoll() instead of exitting due to receive SIGTERM. In that case, the ssh process would never exit unless killing it by hand, becase it's parent, my expect script, had exited and would never send SIGTERM to it again.

client_loop
    client_wait_until_can_do_something
        set_control_persist_exit_time(ssh);
                                 <- receive SIGTERM 
                                 -> call signal_handler()
        poll() // sleep and never exit unless killing it by hand

In my opinion, when SIGTERM is received, ssh should not continue to call the poll(), or call poll() with setting timeout to -1.

The following describes the reproduction.

1. To make it easier to reproduce this problem, I added sleep(5) before calling poll().
```
# git diff
diff --git a/clientloop.c b/clientloop.c
@@ -560,6 +560,7 @@ client_wait_until_can_do_something(struct ssh *ssh, struct pollfd **pfdp,
                    ssh_packet_get_rekey_timeout(ssh));
        }

+       sleep(5);
        ret = poll(*pfdp, *npfd_activep, ptimeout_get_ms(&timeout));

        if (ret == -1) {
```

2. Run strace -Tttyvs 1024 ./myexpect.sh and myexpect.sh exited because of some error.
```
15:16:16.575781 openat(AT_FDCWD, "/dev/null", O_RDONLY) = 0</dev/null> <0.000023>
15:16:16.575862 fcntl(0</dev/null>, F_SETFD, FD_CLOEXEC) = 0 <0.000021>
15:16:16.575984 close(4</dev/null>)     = 0 <0.000026>
15:16:16.576064 close(3</dev/null>)     = 0 <0.000025>
15:16:16.576145 close(2</dev/null>)     = 0 <0.000026>
15:16:16.576229 close(0</dev/null>)     = 0 <0.000025>
15:16:16.576310 write(6<pipe:[427008753]>, "q", 1) = 1 <0.000380>
15:16:16.576776 close(-1)               = -1 EBADF (错误的文件描述符) <0.000026>
15:16:16.576927 exit_group(0)           = ?
15:16:16.577487 +++ exited with 0 +++
```
3. While myexpect.sh was running, strace its grandchild process,ssh. The ssh process still called poll() and then sleep even if the SIGTERM signal is received.
```
15:16:16.573303 nanosleep({tv_sec=5, tv_nsec=0}, {tv_sec=4, tv_nsec=998917430}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal) <0.001154>
15:16:16.574512 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1678659, si_uid=0} ---
15:16:16.574541 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call) <0.000021>
15:16:16.574726 ppoll([{fd=3<socket:[427009847]>, events=POLLIN}, {fd=3<socket:[427009847]>, events=0}, {fd=4<socket:[427011086]>, events=POLLIN}], 3, NULL, NULL, 0
```

4. When myexpect script exited, its child sftp and the child of sftp, ssh, are still exists. They would never exit unless being killed by hand because its parent has exited.
```
# ps -ef|grep myexpect
root     1680657 1462737  0 15:17 pts/3    00:00:00 grep --color=auto myexpect
[root@localhost ~]# ps -ef|grep sftp
root     1678659       1  0 15:14 ?        00:00:00 /home/communities/openssh-portable/sftp rms@10.137.16.214
root     1678663 1678659  0 15:14 ?        00:00:00 /usr/local/bin/ssh -oForwardX11 no -oPermitLocalCommand no -oClearAllForwardings yes -oForwardAgent no -l rms -s -- 10.137.16.214 sft  p
root     1680721 1462737  0 15:17 pts/3    00:00:00 grep --color=auto sftp
```
The attachment is myexpect.sh.
Sorry to interrupt again.
Comment 5 Damien Miller 2023-02-14 15:53:24 AEDT
That sort of race condition is exactly what ppoll() is designed to avoid. 

We can test quit_pending with signals blocked and have ppoll(3) atomically unblock them, so it would instantly be interrupted if a SIGTERM came in before ppoll() started. Have a look at how serverloop.c does it.