Bug 581 - SFTP "ls" listings never end
Summary: SFTP "ls" listings never end
Status: CLOSED WONTFIX
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sftp-server (show other bugs)
Version: -current
Hardware: Alpha OSF/1
: P2 major
Assignee: OpenSSH Bugzilla mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-05-31 20:52 AEST by Lawrence D'Oliveiro
Modified: 2004-04-14 12:24 AEST (History)
0 users

See Also:


Attachments
Fixes the problem (726 bytes, patch)
2003-06-04 07:24 AEST, Lawrence D'Oliveiro
no flags Details | Diff
Fixes the problem (1.42 KB, patch)
2003-06-04 17:11 AEST, Lawrence D'Oliveiro
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Lawrence D'Oliveiro 2003-05-31 20:52:16 AEST
When the OpenSSH SFTP server running on Compaq/HP Tru64 UNIX 5.1 
is sent an "ls" command for a directory on an NFS-mounted volume, it 
either returns an endlessly-repeating directory listing, or seems to hang 
without returning anything.

The problem has to do with the semantics of the readdir call, as used in 
the process_readdir routine in sftp-server.c. readdir returns NULL to 
indicate the end of the directory listing. However, process_readdir may 
then later call readdir again on the same directory handle. For a local 
volume, it will continue returning NULL. However, for an NFS-mounted 
volume, after returning NULL once, on Tru64 it will start returning the 
entire directory listing again.

Solution: add a separate Boolean flag to the Handle structure. This flag is 
initialized to false when a new directory context is created (it is not needed 
for file contexts). It is set to true when a readdir call returns NULL. Once it 
is true, no subsequent calls to readdir are to be made on that context; 
instead the next call to process_readdir will immediately return a 
SSH2_FX_EOF status.
Comment 1 Damien Miller 2003-06-01 00:53:25 AEST
How is this a bug in sftp and not a bug in Tru64?
Comment 2 Ben Lindstrom 2003-06-01 05:11:17 AEST
What client?

As far as I can see your solution would not work.  Since we open a FD handle 
on the server, process on that handle, then close/remove that handle.

The only non-freed case is if the handle passed to it is not a HANDLE_DIR or 
HANDLE_FILE or the handle is invalid.  Which then returns an errno of ENOENT.

So unless your hitting some odd memory corruption.  I'm not seeing how you are 
getting this behavior unless the client your using is not RFC legal.
Comment 3 Darren Tucker 2003-06-03 23:27:14 AEST
Err, if this bug is innapropriate, blame me.  I suggested that if you 
squinted and read the posix spec [1] at an angle in bad light the behaviour of 
Tru64's readdir might not be non-compliant.

It says it must return "an ordered sequence of all the directory entries" which 
it apparently does, and return a NULL at the end, which it apparently does.

Ben's point about the client is a good one though.  What is it?

[1] http://www.opengroup.org/onlinepubs/007904975/functions/readdir.html
Comment 4 Lawrence D'Oliveiro 2003-06-04 07:24:57 AEST
Created attachment 319 [details]
Fixes the problem

The client used was OpenSSH 3.6.1ps SFTP, though I expect the problem would
happen with any client.

The bug is triggered because each time process_readdir is called, it collects
up to 100 entries from readdir, or until readdir returns NULL. If it collected
any entries at all, it doesn't remember which case it encountered. So it will
get called again to return more entries, so it calls readdir again, which can
start returning the entire directory listing again. And so on, and so on.
Comment 5 Damien Miller 2003-06-04 08:19:15 AEST
Could you please attach that in unified "diff -u" format? These are much easier
to read.
Comment 6 Lawrence D'Oliveiro 2003-06-04 08:44:52 AEST
Sorry, no -u option on Tru64 diff:

diff -u sftp-server.c*
diff: illegal option -- u
Usage: diff [-bitw] [-c |-C Lines|-D String|-e|-f|-h|-n] File1 File2
       diff [-bilrstw] [-c |-C Lines|-e|-f|-h|-n] [-S File] Directory1 Directory2
Comment 7 Damien Miller 2003-06-04 09:05:00 AEST
Try diff -c then
Comment 8 Ben Lindstrom 2003-06-04 09:22:41 AEST
There are two aspects to this bug.

1. Tru64's kernel is broken. =)  After readdir() returns NULL the first time 
it should return NULL every other time.  Something Tru64 does not do when it 
is access files via NFS.

2. sftp-server depends on this behavior.

the Posix definition just states "the end of the directory is encountered, a 
null pointer shall be returned and errno is not changed."  which to me implies 
it should not rewind() the DIR* structure.  Which is what seems to be 
happening.

You need to really report this bug to Compaq/HP.  Before I consider any work 
arounds I would perfer to know that this has not been resolved in a service 
pack.
Comment 9 Lawrence D'Oliveiro 2003-06-04 17:11:53 AEST
Created attachment 320 [details]
Fixes the problem

OK, here's a diff -u done from a Linux box.
Comment 10 Lawrence D'Oliveiro 2003-06-04 17:31:27 AEST
Which way is Tru64's kernel "broken":

1) It doesn't do it the way Linux does (true enough), or
2) It doesn't conform to the official spec (not so clear).

Darren Tucker has previously referenced the spec at <http://
www.opengroup.org/onlinepubs/007904975/functions/readdir.html>. The 
relevant paragraph is the following:

The readdir() function shall return a pointer to a structure representing the 
directory entry at the current position in the directory stream specified by 
the argument dirp, and position the directory stream at the next entry. It 
shall return a null pointer upon reaching the end of the directory stream. 
The structure dirent defined in the <dirent.h> header describes a directory 
entry.

Now, what does this spec say about what happens when you call readdir 
again _after_ it has returned a null pointer? It's not clear at all. There 
doesn't seem to be any prerequisite attached to the post-condition " ...and 
position the directory stream at the next entry". So what is the "next" entry 
after the end of the directory stream? The Tru64 interpretation (start over 
from the beginning) seems as reasonable as anything else.

Remember the robustness principle <ftp://ftp.isi.edu/in-notes/rfc791.txt>: 
"be liberal in what you accept, conservative in what you send."
Comment 11 Damien Miller 2003-06-04 17:58:06 AEST
Have you checked whether this is a reported bug with DEC/Compaq/HP?

This *is* a bug - Neither BSD, Linux, Sun nor even Tru64 for non-NFS mounts
behave this way. While SUSv3 and Darren's reference don't mention what happens
after the initial NULL is returned, the behaviour to rewinddir() back to the
beginning of the directory is so utterly unexpected and bogus that it should not
need to be specified against.

I'd consider including a workaround if you can point to system documentation
describing this behaviour as being intended or if DEC/Compaq/HP don't accept
this as a bug.
Comment 12 Damien Miller 2003-08-26 14:03:24 AEST
no followup == no bug
Comment 13 Lawrence D'Oliveiro 2003-08-27 10:49:41 AEST
If you are claiming to offer "portable" code, then it is not really acceptable 
to use certain vendors' implementations as having definitive semantics, 
and others as not. You should go by the specification. And if the 
specification is unclear, you should not rely on that behaviour.
Comment 14 Damien Miller 2003-08-27 11:20:56 AEST
You have not provided one iota of evidence that this behaviour is not an OS bug.
Given that a single platform is the only one exhibiting the problem, and then
only for one specific file system, then it is almost certainly an OS bug. 

I'm willing be be convinced otherwise but, as I mentioned two months ago, the
onus is on you to provide evidence. Please don't reopen the bug unless you are
willing to do so.
Comment 15 Lawrence D'Oliveiro 2003-08-27 11:29:54 AEST
>You have not provided one iota of evidence that this behaviour is
>not an OS bug.

We have given you the specification. Is that not the right specification? Is 
there some more official specification somewhere else that we're not 
aware of?

You have not provided one iota of evidence that the Tru64/NFS behaviour 
is contrary to the specification. Your code does not work correctly with this 
behaviour. Therefore it is your code that has the bug.
Comment 16 Damien Miller 2004-04-14 12:24:19 AEST
Mass change of RESOLVED bugs to CLOSED