Bug 2192 - scp output alignment bug with UTF-8/multibyte sequences
Status: NEW
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: scp
Version: 6.4p1
Hardware: Other Linux
Importance: P5 enhancement
Assignee: Assigned to nobody
Reported: 2013-12-29 11:54 AEDT by Vincent Lefevre
Modified: 2020-08-07 18:14 AEST

Attachments
screenshot (1.23 KB, image/png)
2016-08-08 23:59 AEST, Vincent Lefevre

Description Vincent Lefevre 2013-12-29 11:54:13 AEDT
In UTF-8 locales:

$ scp é z localhost:/tmp
é                                            100%    0     0.0KB/s   00:00    
z                                             100%    0     0.0KB/s   00:00    

It seems that scp counts "é" as two characters because it is encoded as two bytes in UTF-8.

I originally reported this bug against Debian in 2007[*], and it is still present.

[*] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=407088
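
A minimal sketch of the suspected cause, in Python rather than scp's C: the byte length of "é" is 2, while it occupies one terminal column. The helper name display_width is hypothetical, and its width rule (skip combining marks, count East Asian wide/fullwidth characters as 2) is only an approximation of what a terminal does; it is not how scp computes widths.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns that text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks take no column of their own
        # East Asian wide/fullwidth characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


name = "\u00e9"                           # "é" as a single precomposed code point
byte_len = len(name.encode("utf-8"))      # bytes on the wire or on disk
cols = display_width(name)                # columns on screen

print(byte_len, cols)
```

Padding the file name to a fixed byte count therefore leaves the "é" line one column short compared with the "z" line.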
Comment 1 Darren Tucker 2016-07-20 16:42:55 AEST
Ingo Schwarze just did some work on the progressmeter and its support for utf8:

https://anongit.mindrot.org/openssh.git/log/?qt=author&q=schwarze@openbsd.org+&showmsg=1

The work is already in the snapshots http://www.mindrot.org/openssh_snap/ and will be in the 7.3 release.

Does that fix the problem you're reporting?
Comment 2 Vincent Lefevre 2016-08-08 23:53:48 AEST
(In reply to Darren Tucker from comment #1)
> Does that fix the problem you're reporting?

I've tried Debian's openssh-client 1:7.3p1-1, and the problem is still there:

cventin:~> scp é z localhost:/tmp
Connected to cventin (from 127.0.0.1)
é                                            100%    0     0.0KB/s   00:00
z                                             100%  486     1.3MB/s   00:00
Comment 3 Vincent Lefevre 2016-08-08 23:59:15 AEST
Created attachment 2860 [details]
screenshot

Note: there's currently a problem with bugzilla, which interprets my UTF-8 characters as ISO-8859-1. I've attached a screenshot.
Comment 4 Vincent Lefevre 2016-08-09 00:03:11 AEST
(In reply to Vincent Lefevre from comment #3)
> Created attachment 2860 [details]
> screenshot
> 
> Note: there's currently a problem with bugzilla, which interprets my
> UTF-8 characters as ISO-8859-1. I've attached a screenshot.

The PNG file has been corrupted in the upload, so that it isn't visible either!
Comment 5 Vincent Lefevre 2017-07-28 18:04:50 AEST
It appears that my comments have been modified! "Ã©" should appear as "é" (because of this, the alignment issue is no longer visible). Without the accent over the "e" (in case it gets corrupted again), that would give:

$ scp e z localhost:/tmp
e                                            100%    0     0.0KB/s   00:00
z                                             100%    0     0.0KB/s   00:00

showing the alignment issue.
Comment 6 Vincent Lefevre 2017-07-28 18:11:28 AEST
That's wrong again. It seems to be a bug in the current Bugzilla, which does a spurious ISO-8859-1 to UTF-8 transformation, while the input is already in UTF-8 (note: I've checked that the page is interpreted as UTF-8 by my browser, and the double encoding is also visible when downloading the HTML source).
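
The double encoding described above can be reproduced with a short Python sketch. This only illustrates the general mechanism (UTF-8 bytes misread as ISO-8859-1 and encoded to UTF-8 a second time); it is an assumption that Bugzilla's transformation behaves exactly like this.

```python
original = "\u00e9"                         # "é"
utf8_bytes = original.encode("utf-8")       # the bytes the browser sent

# The bytes are wrongly interpreted as ISO-8859-1: each byte becomes one character.
misread = utf8_bytes.decode("iso-8859-1")

# The misread text is then encoded to UTF-8 again, giving four bytes instead of two.
double_encoded = misread.encode("utf-8")

# Reversing both steps recovers the original character.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")

print(utf8_bytes, misread, double_encoded, repaired)
```

Because the misread text is two characters long, a line containing it happens to look aligned, which hides the original problem.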
Comment 7 Damien Miller 2020-08-07 14:12:39 AEST
Are you able to replicate this with a recent OpenSSH? There have been quite a few fixes in this area since 7.x.
Comment 8 Darren Tucker 2020-08-07 18:14:22 AEST
Yeah, it's still there. It's kinda hard to follow what progressmeter is doing when it composes the status line into a single buffer; we should probably put the component parts into their own dynamically allocated buffers and compose the final line with a single asmprintf.
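
A rough Python sketch of that idea, not OpenSSH code: keep the file name and the statistics as separate strings, measure each in terminal columns, and join them once at the end. The names char_width, display_width and compose_status_line are hypothetical, and the width rule is an approximation.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate column width of one character (assumed rule, not scp's)."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    return sum(char_width(ch) for ch in text)


def compose_status_line(filename: str, stats: str, columns: int = 80) -> str:
    """Join separately built parts, padding by columns instead of bytes."""
    stats_cols = display_width(stats)
    room = columns - stats_cols - 1          # keep at least one space before stats
    shown, used = "", 0
    for ch in filename:                      # truncate the name by columns
        w = char_width(ch)
        if used + w > room:
            break
        shown += ch
        used += w
    return shown + " " * (columns - used - stats_cols) + stats


stats = "100%    0     0.0KB/s   00:00"
line_e = compose_status_line("\u00e9", stats)
line_z = compose_status_line("z", stats)
print(line_e)
print(line_z)
```

With the padding computed from columns, the "100%" field starts at the same position on both lines.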