Bug 3483 - closefrom() calls in sshd closes the file descriptors associated with Intel QAT crypto engine
Summary: closefrom() calls in sshd closes the file descriptors associated with Intel Q...
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 9.1p1
Hardware: Other Windows 10
: P5 enhancement
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_9_2
  Show dependency treegraph
 
Reported: 2022-10-13 10:47 AEDT by Joel Schuetze
Modified: 2023-03-17 13:42 AEDT (History)
2 users (show)

See Also:


Attachments
Requested sshd.c patch (616 bytes, patch)
2022-10-13 10:47 AEDT, Joel Schuetze
djm: ok+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Joel Schuetze 2022-10-13 10:47:08 AEDT
Created attachment 3616 [details]
Requested sshd.c patch

This issue is seen while using Intel QAT Crypto OpenSSL Engine https://github.com/intel/QAT_Engine as a default OpenSSL engine.

Instead of using the OpenSSL library and CPU to process it, the user can configure to offload all OpenSSL crypto operations to the QAT engine including the operations from ssh and sshd applications.

As part of the QAT engine, there will be a number of file descriptors opened related to the internal memory allocator and other processes.

All the file descriptors related to QAT are opened as part of OPENSSL_init_crypto() which gets invoked from the seed_rng() function in the main() of sshd application code.

But the closefrom() call in the application after seed_rng() closes every file descriptor including the ones opened by QAT. This causes an issue in an inability to ssh into the system which uses QAT engine for default OpenSSL. The file descriptors are very much required for the QAT to process the ssh crypto request that is coming into the system.

We understand the need for closefrom() calls that they are related to Control persist feature.

The same issue was present during ssh out of the system using the QAT engine in older versions of the ssh application before OpenSSH 8.7.

But the issue has been fixed in the ssh application in the latest OpenSSH versions with this commit https://github.com/openssh/openssh-portable/commit/c9f7bba2e6f70b7ac1f5ea190d890cb5162ce127 

The closefrom() call is moved before seed_rng() call because of a similar issue faced by tcmalloc().

A similar fix is required for sshd application. Please find attached the patch which helps to resolve the issue.
Comment 1 Darren Tucker 2022-10-13 23:27:07 AEDT
Looks reasonable, there's no RNG ops in the code between where it was and where it moves to.
Comment 2 Darren Tucker 2022-11-09 09:44:03 AEDT
Applied, thanks.  It will be in 9.2p1.
Comment 3 Darren Tucker 2022-11-09 20:34:45 AEDT
BTW this caused a test breakage in the reexec test when built against OpenSSL 1.1.1 only. 1.1.1a and up is fine.  The specific thing that doesn't work is the fallback path when, eg, the sshd binary has been removed while it's still running, so it's a fairly esoteric case:

https://github.com/openssh/openssh-portable/actions/runs/3423783333/jobs/5705690743#step:11:854

I wanted to understand the reason for this, and I believe the reason is a bug in 1.1.1's RNG fixed in this commit:

https://github.com/openssl/openssl/commit/abf58ed3191dcd3a7c6b296b1494bd7fd336e253

My theory is that OpenSSL opens descriptors to the random devices earlier than it should, sshd closes that descriptor and ends up reusing it for its own purposes, then blows up when seed_rng ends up trying to seed from this reused descriptor.

I'll skip this specific test on that specific OpenSSL version.
Comment 4 Damien Miller 2023-03-17 13:42:09 AEDT
OpenSSH 9.3 has been released. Close resolved bugs