The way PAM is implemented in OpenSSH makes pam_set_data unusable for passing data between PAM stacks. This is causing issues with multiple PAM modules: - with pam_zfs_key it precludes mounting encrypted home dirs - with pam_krb5 it precludes storing TGT in user cache and registering ticket renewal with ktkt_warnd Current OpenSSH code flow with respect to PAM looks as follows. (monitor) | fork --------------------------(privsep child) | | pam_start | | | fork ----- (authentication 'thread') | | | | | pam_authenticate | | | | | pam_acct_mgmt | | | | | pam_chauthtok | | | | | X | | | | X | pam_setcred | pam_open_session | fork ------ (authenticated child) | | | setreuid(100) | | | fork ------- (future shell) | | | | | exec(bash) | | | | | X | | | X | pam_close_session | pam_setcred | pam_end | X The problem is, that pam_authenticate and pam_acct_mgmt is called in a separate auxiliary process. Any data stored using pam_set_data and any other state information stored by those two functions are lost when the auxiliary process exits (with exceptions like environment variables, which are sent over between the processes). (Opening as a new bug as oppossed to appending to ancient-history-burdened #688.)
Created attachment 2796 [details] Switch roles of monitor and auxiliar PAM process in PAM authentication. The proposed patch fixes this by switching the roles of the monitor and the auxiliary process when doing PAM authentication. In the new code the monitor will be the one calling pam_authenticate and pam_acct_mgmt (eventually blocking and calling callbacks), whereas the other process (callback child) will be sending messages to the client (either directly or through privsep child). The resulting code flow looks like this: (monitor) | fork --------------------------(privsep child) | | pam_start | | | fork ------- (callback child) | | | | pam_authenticate | | | | | pam_acct_mgmt | <--msgs--> | | | | pam_chauthtok | | | | | | X | | | | X | pam_setcred | pam_open_session | fork ------ (authenticated child) | | | setreuid(100) | | | fork ------- (future shell) | | | | | exec(bash) | | | | | X | | | X | pam_close_session | pam_setcred | pam_end | X When the monitor is done calling PAM functions, it sends the callback child a message relieving it from the duties. In response, the callback child sends back any state information required and exits. After that, the monitor resumes communication with the client. When privilege separation is off, passing state from the callback child back to the monitor is more tricky. The callback child has been communicating directly with the ssh client for PAM prompts and responses. When the monitor tries the resume the communication with the client, the packet state has already diverged. The child has sent several messages in between, and the client is expecting different sequence number, different state of crypto counters and so on. As a result, the child hangs up upon the first packet from the monitor after the exit of the callback child. To solve that, in the no privsep case, the callback child serializes the packet state before it exits, sends it over to the monitor, which restores the packet state. Now all the internal state matches and the ssh client does not hang up anymore. There is one corner case, where this does not work. When privilege separation has been turned off (not recommended, on by default) and compression has been set to 'zlib' (not recommended, 'zlib@openssh.com' by default), passing the packet state from the callback child back to the monitor fails with a compression state error. In that case, the current patch prints an error message and sets compression from 'yes' to 'delayed'. Making it work even with 'early' compression is possible, but would require keeping compression state in memory shared between the monitor and the callback child. I believe supporting this undesired configuration is not worth the damage it would do to the code layers abstraction and the risks. The proposed patch also removes the code for UNSUPPORTED_POSIX_THREADS_HACK. Now that all PAM calls take place in a single process space, there is no need for thread support. As a nit, substring 'thread' has been removed from symbol names too.
Created attachment 2807 [details] Updated patch - Switch roles of monitor and auxiliar PAM process in PAM authentication Fixed two regressions in the original patch: - last pam error msg was not shown - not propagated through loginmsg var - with compression=delayed connection broke after entering wrong password in the first attempt
I'm going to look at this, however I think it will be after we remove the code in auth-pam.c that supports SSHv1 TIS auth, now that the SSHv1 server code has been removed. (In reply to Tomas Kuthan from comment #2) > Making it work even with 'early' compression is possible That's now fixed because "early compression was removed :-) https://anongit.mindrot.org/openssh.git/commit/monitor_mm.c?id=0082fba4efdd492f765ed4c53f0d0fbd3bdbdf7f Does this work with UsePrivilegeSeparation=no and/or direct root logins?
Created attachment 2882 [details] Updated for 7.3 - Switch roles of monitor and auxiliar PAM process in PAM authentication This is the current version of the patch as used in Solaris with OpenSSH 7.3 updated to apply on current HEAD.
Thank you for looking at the patch. Following are the combinations that I tested: privsep(on,off) compression(on,off,delayed) authenticationmethods - combination password(correct,wrong-correct,wrong-wrong-wrong) ptty(with,without) pam_modules_fail(auth,acct,setcred,chauthtok,open_session) user(valid,invalid) account_state(OK,pw_expired,account_expired) (besides testing with the affected modules pam_krb5 and pam_zfs_key and with pam_chatty - a module I wrote, that prompts too much.) Disclaimer: I ran the tests three times before, but I didn't test the patch after I updated it to apply to HEAD just now. Please, let me know if there is anything I can do to help.
I don't see any code change that fixes this, why was this marked as RESOLVED FIX ? I understand if it is to be CLOSED and not resolved but it certainly doesn't appear to be fixed by a code change.