Bug 3312 - Behavior change in OpenSSH8.1 compared to OpenSSH7.5
Summary: Behavior change in OpenSSH8.1 compared to OpenSSH7.5
Status: CLOSED WORKSFORME
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 8.1p1
Hardware: PPC AIX
: P5 major
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-18 00:26 AEST by Mayank Sharma
Modified: 2022-02-25 13:57 AEDT (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mayank Sharma 2021-05-18 00:26:46 AEST
Hi

I recently build and install Openssh8.1 and I found out some strange behavior compared to OpenSSH 7.5. 
I want to get your opinion on this change in behavior between 7.5 and 8.1 OpenSSH.

Problem title:

8.1 OpenSSH waits for command run in background

Problem description:

Using ssh to run a command in the background will cause the ssh session to stay open until the command is done executing, if using OpenSSH 8.1. For example:

ssh -n user@hostname 'nohup /tmp/dosleep > /dev/null & > /dev/null'

 
...if /tmp/dosleep has a sleep of 30 seconds, this session will stay open for 30 seconds. At earlier levels of OpenSSH, the ssh session would exit immediately since the command is being run in the background.

Analysis so far:
This test is run using this script:

#!/bin/ksh
exec 1>/tmp/dosleep.out
sleep 10
lslpp -L

 
Run:

# time (ssh -n root@tele1 'nohup /tmp/dosleep > /dev/null & > /dev/null')

...output will show it paused for 10 seconds:

real    0m11.30s
user    0m0.12s
sys     0m0.00s

...whereas using OpenSSH 7.5 version of sshd will show more like "real    0m0.94s" as the ssh session is allowed to exit immediately.




Redirecting stderr to null will allow it to exit immediately even when using 8.X:
# time (ssh -n root@tele1 'nohup /tmp/dosleep > /dev/null 2>&1 & > /dev/null')

real    0m0.94s
user    0m0.12s
sys     0m0.00s

 

sshd debug will show that we pause here:

debug2: notify_done: reading
debug2: channel 0: read<=0 rfd 13 len 0
debug2: channel 0: read failed
debug2: channel 0: chan_shutdown_read (i0 o3 sock -1 wfd 13 efd 15 [read])
debug2: channel 0: input open -> drain
debug2: channel 0: ibuf_empty delayed efd 15/(0)
<pause here>


In file, channels.c (OpenSSH8.1), If I do the following change, OpenSSH8.1 behaves same as what we saw in ssh 7.5 starts working.

	if ((len = sshbuf_len(c->input)) == 0) {
		if (c->istate == CHAN_INPUT_WAIT_DRAIN) {
	-		/*
	-		 * input-buffer is empty and read-socket shutdown:
	-		 * tell peer, that we will not send more data:
	-		 * send IEOF.
	-		 * hack for extended data: delay EOF if EFD still
	-		 * in use.
	-		 */
	-		if (CHANNEL_EFD_INPUT_ACTIVE(c))
	-			debug2("channel %d: "
	-			    "ibuf_empty delayed efd %d/(%zu)",
	-			    c->self, c->efd, sshbuf_len(c->extended));
	-		else
	-			chan_ibuf_empty(ssh, c);
	+		chan_ibuf_empty(ssh, c);
		}
		return;
	}


It seems to think we have an input - an 'extended file descriptor / EFD" - active still that we are waiting on input from.  efd 15 is opened for the stderr of the command.

This might mean we still think we have input to read from stderr - since we see that, by redirecting stderr to stdout, the problem does not happen.

It says this is a 'hack for extended data', which is a bit worrying. 
This happens whether the program being run is a ksh shell script or a compiled C program.

Testing by adding a circumvention and calling chan_ibuf_empty even if we do hit that CHANNEL_EFD_INPUT_ACTIVE condition will avoid the problem, although that would undo the 'hack' that is in place.

It would appear the logic is not correct that leads us to wait on this file descriptor since we are running the command with nohup and in the background. Even if something in the script were to write to stderr after our sleep has happened, we shouldn't be displaying that on the client side, since the nohup ought to prevent things from writing to the screen.

But then, if I change my test script to this:

#!/bin/ksh
exec 1>/tmp/dosleep.out
sleep 10
lslpp -L
echo "this is going to stderr" >&2

 
...and run this:

# time (ssh -n root@tele1 'nohup /tmp/dosleep >/dev/null & > /dev/null')

...the client side actually sees what the script is writing to stderr:

 
<10 second pause happens here, then we see: >

this is going to stderr
real    0m11.47s
user    0m0.12s
sys     0m0.00s

 
I wouldn't have expected we'd see that since we are nohup'ing it.

Testing the same way with 7.5 sshd, we do not see the 'this is going to stderr' output.

What do you think, is this change in behavior a defect?

Thanks
Mayank
Comment 1 Damien Miller 2021-05-18 11:18:03 AEST
sshd has no information of what is running in the user session except:

1) SIGCHILD/waitpid() notifications of process exit
2) File descriptors closing

sshd does see the notifications of process exit for your example. E.g. if you run in verbose mode (ssh -vvv ...) then you'll see:

> debug1: client_input_channel_req: channel 0 rtype exit-status reply 0

But the child process' stderr file descriptor is still open from ssh'd perspective and sshd has no way of knowing whether important information could be coming until the child process closes it.

In your case of running nohup - sshd will see that stderr is still open and wait for it to close before finalising the session.

> if something in the script were to write to stderr after our sleep has
> happened, we shouldn't be displaying that on the client side, since the
> nohup ought to prevent things from writing to the screen.

This is not entirely correct - nohup will prevent subprocesses from writing to a *TTY*. There is no TTY involved in your example above because ssh does not request one when a command is specified on the command-line.

If you change your example to request a TTY then it should exit immediately:

ssh -tn user@hostname 'nohup /tmp/dosleep > /dev/null &'
Comment 2 Damien Miller 2022-02-25 13:57:56 AEDT
closing bugs resolved before openssh-8.9