Bug 1254 - Race condition in ssh-agent AUTH_CONNECTION
Status: CLOSED INVALID
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh-agent
Version: 4.4p1
Hardware: ix86 FreeBSD
Importance: P2 normal
Assignee: Assigned to nobody
URL:
Keywords: patch
Depends on:
Blocks: 1633
Reported: 2006-10-25 06:22 AEST by Omar W. Hannet
Modified: 2010-03-26 10:51 AEDT

See Also:


Attachments
Adds a sleep when socket reads fail with EAGAIN (602 bytes, patch)
2006-10-25 06:25 AEST, Omar W. Hannet

Description Omar W. Hannet 2006-10-25 06:22:59 AEST
In function after_select(), case AUTH_CONNECTION, the do-loop which handles socket reads will peg my CPU at close to 100% when errno is EAGAIN.

I'm running FreeBSD 6.2 pre-release, with OpenSSH built from the ports collection (security/openssh-portable).

The problem only occurs for me while running an automation script that sends commands through ssh to about a hundred servers at a time, and I have not been able to identify which server triggers it.  The bottom line is that the read fails with errno EAGAIN, and keeps failing in a very tight loop until a timeout eventually occurs.

My work-around was to introduce a tiny sleep before the continue statement in that loop, which is apparently enough to allow some data to become available for reading, and makes the problem go away.

I will attach my work-around as a patch, realizing that usleep() is probably not available on all platforms.
Comment 1 Omar W. Hannet 2006-10-25 06:25:39 AEST
Created attachment 1203
Adds a sleep when socket reads fail with EAGAIN
Comment 2 Damien Miller 2008-01-20 11:50:11 AEDT
This patch does not look correct - select() should guarantee that the socket has data ready to be read, so EAGAIN should not occur.

Even if it does, falling back through to select() again to wait would be the correct behaviour.
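
The fall-through behaviour described above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the ssh-agent code: the names set_nonblocking, try_read, read_when_ready and the READ_* result codes are hypothetical. The idea is that a read returning EAGAIN is not retried in place; control returns to select(), which blocks without consuming CPU until the descriptor is readable again.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not taken from ssh-agent. */
enum { READ_EOF = 0, READ_ERROR = -1, READ_RETRY = -2 };

/* Put a descriptor into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Attempt a single read. On EAGAIN or EINTR, report READ_RETRY rather
 * than looping, so the caller can go back to select().
 */
static ssize_t
try_read(int fd, char *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n == -1 && (errno == EAGAIN || errno == EINTR))
		return READ_RETRY;
	return n;	/* > 0 bytes read, 0 = EOF, -1 = error */
}

/* Block in select() until fd is readable, then read; wait again on retry. */
static ssize_t
read_when_ready(int fd, char *buf, size_t len)
{
	fd_set readset;
	ssize_t n;

	for (;;) {
		FD_ZERO(&readset);
		FD_SET(fd, &readset);
		if (select(fd + 1, &readset, NULL, NULL, NULL) == -1) {
			if (errno == EINTR)
				continue;
			return READ_ERROR;
		}
		n = try_read(fd, buf, len);
		if (n != READ_RETRY)
			return n;
	}
}
```

In a real event loop the select() call would cover every open descriptor at once; a single descriptor is used here only to keep the sketch short.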
Comment 3 Damien Miller 2008-06-13 13:23:39 AEST
I can't reproduce this at all, can you try a recent ssh-agent (ideally from 5.0p1) to see if the behaviour persists?
Comment 4 Damien Miller 2009-07-31 10:19:00 AEST
1 year with no followup == bug closed
Comment 5 Damien Miller 2009-10-06 15:03:02 AEDT
Mass move of RESOLVED bugs to CLOSED now that 5.3 is out.
Comment 6 Darren Tucker 2010-03-26 10:51:54 AEDT
With the release of 5.4p1, this bug is now considered closed.