| Summary: | Race condition in ssh-agent AUTH_CONNECTION | | |
|---|---|---|---|
| Product: | Portable OpenSSH | Reporter: | noodle10000 |
| Component: | ssh-agent | Assignee: | Assigned to nobody <unassigned-bugs> |
| Status: | CLOSED FIXED | | |
| Severity: | normal | CC: | djm, focht, ohannet |
| Priority: | P2 | Keywords: | patch |
| Version: | 5.2p1 | | |
| Hardware: | ix86 | | |
| OS: | Linux | | |
| Bug Depends on: | 1254 | | |
| Bug Blocks: | 1626 | | |
| Attachments: | | | |

Description
noodle10000
2009-08-19 06:27:26 AEST
Created attachment 1670 [details]
fall back to select() on read/write interruptions
Could you try to reproduce the problem with this patch applied?
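
The contents of attachment 1670 are not shown in this report, so the following is only a minimal sketch of what "fall back to select() on read/write interruptions" could look like, not the actual patch. It assumes a hypothetical helper, `classify_io()`, that decides what to do with the result of a `read()` or `write()` call: transient conditions (`EINTR`, `EAGAIN`, `EWOULDBLOCK`) send the caller back to the `select()` loop rather than closing the connection.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome of one read()/write() attempt on an agent socket. */
enum io_action {
    IO_DATA,        /* bytes were transferred; process them */
    IO_RETRY_LATER, /* transient condition; go back to select() */
    IO_CLOSE        /* EOF or hard error; close the socket */
};

/*
 * Classify the return value `n` of read()/write() together with the
 * errno value `err` captured immediately after the call.
 */
static enum io_action
classify_io(ssize_t n, int err)
{
    if (n > 0)
        return IO_DATA;
    if (n == -1 &&
        (err == EINTR || err == EAGAIN || err == EWOULDBLOCK))
        return IO_RETRY_LATER;
    return IO_CLOSE; /* n == 0 (EOF) or any other error */
}
```

A caller would invoke `classify_io(len, errno)` right after `read()` and simply `continue` to the next socket on `IO_RETRY_LATER`, leaving the descriptor registered for the next `select()` pass.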
... and here is a theory on how it occurs: on a heavily loaded ssh-agent, we can create a new socket in the ssh-agent.c:after_select() loop, via the AUTH_SOCKET case calling new_socket(). This might increase sockets_alloc past the value it had when execution entered after_select(). The for() loop in after_select() can therefore progress into sockets that did not exist when select() and, critically, prepare_select() were called. prepare_select() sizes and clears the fd_sets that select() subsequently populates and after_select() tests. So a new AUTH_CONNECTION socket whose creation increments sockets_alloc can cause after_select() to test past the end of the allocated fd_sets and might (depending on what it finds) treat the new sockets as ready for reading.
Created attachment 1671 [details]
fix the root cause of the problem too
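
The theory above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code from attachment 1671: `slot_table`, `ready_set`, `add_slot()`, `scan_live_bound()` and `scan_snapshot()` are hypothetical stand-ins for the agent's socket table, the fd_sets sized by prepare_select(), new_socket(), and two variants of the after_select() loop. The first variant re-reads the table size on every iteration and so walks into a slot the readiness set was never sized for; the second snapshots the size before the loop, which is one possible remedy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical stand-in for the agent's socket table. */
struct slot_table {
    size_t alloc;                 /* plays the role of sockets_alloc */
    bool is_listener[MAX_SLOTS];  /* true for the AUTH_SOCKET-like slot */
};

/* Hypothetical stand-in for the fd_sets sized by prepare_select(). */
struct ready_set {
    size_t prepared;              /* slot count the set was sized for */
    bool ready[MAX_SLOTS];
};

/* Grow the table by one connection slot, as new_socket() might. */
static void
add_slot(struct slot_table *t)
{
    assert(t->alloc < MAX_SLOTS);
    t->is_listener[t->alloc] = false;
    t->alloc++;
}

/*
 * Racy variant: the loop bound is re-read each iteration, so slots
 * added during the scan are visited too. Returns how many visited
 * slots lay beyond what the readiness set was prepared for. Real code
 * would test unprepared fd_set memory at that point.
 */
static size_t
scan_live_bound(struct slot_table *t, const struct ready_set *r)
{
    size_t stale = 0;

    for (size_t i = 0; i < t->alloc; i++) {
        if (i >= r->prepared) {
            stale++;
            continue;
        }
        if (r->ready[i] && t->is_listener[i])
            add_slot(t);
    }
    return stale;
}

/*
 * Snapshot variant: the bound is captured before the loop, so slots
 * created during the scan wait for the next prepare/select cycle.
 * Assumes r->prepared >= t->alloc on entry. Returns slots examined.
 */
static size_t
scan_snapshot(struct slot_table *t, const struct ready_set *r)
{
    size_t examined = 0;
    size_t orig_alloc = t->alloc;

    for (size_t i = 0; i < orig_alloc; i++) {
        assert(i < r->prepared);
        examined++;
        if (r->ready[i] && t->is_listener[i])
            add_slot(t);
    }
    return examined;
}
```

With a two-slot table whose listener slot is ready, the racy scan visits one unprepared slot after the table grows, while the snapshot scan examines only the two original slots and leaves the new one for the next cycle.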
Patch applied to the ssh-agent.c in openssh-5.2p1 (RCS revision 1.159). I have now successfully run our scripts against 6000 hosts for the first time, so it appears to have solved the issue. I will be soak-testing over the next 48 hours and will update after that. (and thanks for the very quick response!)
Have you been able to reproduce the problem with patch #1671 applied?
(In reply to comment #5)
> Have you been able to reproduce the problem with patch #1671 applied?

We've not had any further problems with ssh-agent since applying #1671 - looks like it's fixed. Thanks!
Patch applied. This will be in openssh-5.4.
Mass move of RESOLVED bugs to CLOSED now that 5.3 is out.
*** Bug 1135 has been marked as a duplicate of this bug. ***
With the release of 5.4p1, this bug is now considered closed.