Hi, This is a bit of an edge-case (and I've already found a workaround using `exec tee`), but since updating to the latest release I've had 100 % CPU usage on a script, which you can replicate with `ssh <host> 2> >({exec 1>&2})`. (I'm actually redirecting stderr to a shell function that parses it for a while then gives up on it by doing `exec 1>&2`). After a few seconds, the ssh process starts using 100 % CPU. Using GDB, I consistently get the following trace: <SNIP> #1 0x0000556af8a302e1 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:39 #2 client_wait_until_can_do_something (conn_out_readyp=<synthetic pointer>, conn_in_readyp=<synthetic pointer>, rekeying=<optimized out>, npfd_activep=0x7ffe8c32d07c, npfd_allocp=0x7ffe8c32d078, pfdp=0x7ffe8c32d080, ssh=0x556af964bf80) at clientloop.c:575 <SNIP> I'm guessing this is linked to OpenBSD-Commit a77e16a667d5b194dcdb3b76308b8bba7fa7239c "upstream: convert ssh, sshd mainloops from select() to poll();".
I can't get this to replicate. What's the full command you're using? `ssh <host> 2> >({exec 1>&2})` doesn't include a command that makes any output
Created attachment 3581 [details] avoid polling fds we're not ready to service events on I did get it to reproduce, I was just holding it wrong. Thanks for the report and the reproduction instructions. Here's my analysis and a fix is attached: > I think the sequence of operations is something like: > > 1. Channel opens, stderr is attached > 2. Stderr goes away > 3. We poll with pfd[stderr].events including POLLOUT > 4. We get back POLLHUP (POLLERR on Linux) > 5. channel_handle_efd_write() sees sshbuf_len(c->extended)==0, returns > 6. GOTO 3 forever > > One problem is that there is no way to propagate a POLLHUP condition > back to a writable channel when there is no output pending. > > A more fundamental problem is step 3, where we unconditionally included > the fd in the poll array, regardless of whether the channel code had any > intention of attempting a write() later. When I was doing the conversion > from select(), I kept the fds in there because it made the matching of > pfd entries and channels easier but this was probably a mistake. > > The patch corrects the mistake (hopefully). It will only > set up a pollfd entry if IO was requested for the fd, and as a > consequence, should avoid spurious events. It also forced me to make > the pollfd bookkeeping less brittle :) > > PS. an alternate approach would be to leave the pollfd entries in the > array and find a way to handle POLLERR/POLLHUP events. I have a diff > that implements this approach too, by treating such a condition on > a channel that wasn't ready to write something as an automatic write > failure. I shelved this in favour of this approach.
*** Bug 3411 has been marked as a duplicate of this bug. ***
This has been fixed in git head and will be in the openssh-9.0 release that is due very soon. commit d6556de1db0822c76ba2745cf5c097d9472adf7c Author: djm@openbsd.org <djm@openbsd.org> Date: Wed Mar 30 21:10:25 2022 +0000 upstream: fix poll() spin when a channel's output fd closes without data in the channel buffer. Introduce more exact packing of channel fds into the pollfd array. fixes bz3405 and bz3411; ok deraadt@ markus@ OpenBSD-Commit-ID: 06740737849c9047785622ad5d472cb6a3907d10
closing bug resolved during openssh-9.0 release cycle