Bug 3419 - regular expression patterns in Host directive
Status: NEW
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: Miscellaneous
Version: 9.0p1
Hardware: Other All
Importance: P5 enhancement
Assignee: Assigned to nobody
Reported: 2022-04-10 09:47 AEST by Christoph Anton Mitterer
Modified: 2022-05-16 02:10 AEST

Description Christoph Anton Mitterer 2022-04-10 09:47:20 AEST
Hey.

It would be really nice if the Host directive (not sure whether it would make sense for Match, too) could match not only the current patterns (which basically have "just" '*', '?', "[…]" and '!') but also general regular expressions.

The idea is to be able to specify hostnames more precisely. E.g. right now, in order to match:
server1.example.org
server2.example.org
server3.example.org
one can do:
a) server*.example.org
b) server?.example.org
c) server[1-3].example.org

(a) and (b) are not really exact (as they'd also match e.g. serverX.example.org). (c) is difficult if one has more digits (servers 1-45) and really wants to match only those.
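
For the "servers 1-45" case, an extended regular expression can express the numeric range exactly. A minimal sketch, using grep -E purely for illustration; is_server_1_to_45 is a hypothetical helper name and nothing here is an existing ssh_config feature:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for server1 .. server45 under example.org.
is_server_1_to_45() {
    # Alternatives cover 1-9, then 10-39, then 40-45.
    printf '%s\n' "$1" | grep -Eq '^server([1-9]|[1-3][0-9]|4[0-5])\.example\.org$'
}

for h in server1.example.org server45.example.org server46.example.org serverX.example.org; do
    if is_server_1_to_45 "$h"; then
        echo "$h: match"
    else
        echo "$h: no match"
    fi
done
```

Only the first two names match; server46 and serverX are rejected, which the wildcard forms (a) and (b) cannot guarantee.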

The above is still a very simple example of course, but with regular expressions one could do much nicer things like matching:
(www\.)?[^.]+\.(intranet|public)\.example\.(com|biz)

Or (at least if PCRE were chosen as RE language) one could do exclusions (i.e. all but a certain pattern).


So the first motivation for such a feature is "simply" to better select only the actually desired hostnames (which * and ? may not always help with).


And I guess that level is what this enhancement idea is about.


Possible future ideas:

However, this could be made even more useful if the matched subexpressions could be used via back-references in the sub-directives of a given Host block.
IIRC Apache httpd can do such things in its <(Location|Directory|Files)Match> blocks.

I think of use-cases like:
Host ^[^.]+\.(intranet|public)\.example\.com$
     IdentityFile ~/.ssh/example.com_\1_id_rsa

where different keys are used with different groups of hosts.

The above is obviously something one could also do now (just with a few more configuration lines).


One could, however, think of extending the syntax even further:

We at the university run many servers for the LHC Computing Grid.
These are all named like server<n>.example.org
We buy new ones every 2-3 years, and these are then all the same (so e.g. n = 1-34 are one model and 35-53 another or so).

These have service processors (like HP's iLO or Dell's iDRAC) whose SSH we use for serial console access. Typically their SSH is not well maintained and after a while upgrades stop, so they don't support modern algorithms (e.g. just RSA keys, but not ED25519) or need things like:
        KexAlgorithms                   +diffie-hellman-group-exchange-sha256
        HostkeyAlgorithms               +ssh-rsa
        PubkeyAcceptedAlgorithms        +ssh-rsa


If one now had something like an <If> directive that matches on TOKENS or the above back-references, one could do things like:

Host ^server(\d+)\.service\.example\.com$
     ProxyJump ...
     User ...
     If %1 >= 1 AND %1 <= 34
     {
        KexAlgorithms                   +diffie-hellman-group-exchange-sha256
        HostkeyAlgorithms               +ssh-rsa
        PubkeyAcceptedAlgorithms        +ssh-rsa
     }
     If %1 >= 35 AND %1 <= 52
     {
         # possible other non default settings
     }
The backref would be handled like a TOKEN.

Obviously, one would then need a powerful If directive with many operators, perhaps again with regexp string matching and perhaps also with conversion of the matched strings to integers... or operators like "in".
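
Until something like that exists, the numeric split can presumably be approximated with Match exec and a small helper. A rough sketch, assuming a POSIX sh; server_in_range is a hypothetical name, not an OpenSSH feature:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 is server<n>.service.example.com
# with lo <= n <= hi, where lo and hi are given as $2 and $3.
server_in_range() {
    host=$1 lo=$2 hi=$3
    # Extract the number; n stays empty if the name does not match at all.
    n=$(printf '%s\n' "$host" |
        sed -n 's/^server\([0-9][0-9]*\)\.service\.example\.com$/\1/p')
    [ -n "$n" ] && [ "$n" -ge "$lo" ] && [ "$n" -le "$hi" ]
}

server_in_range server12.service.example.com 1 34 && echo "old model"
server_in_range server40.service.example.com 35 52 && echo "new model"
```

Each range would then need its own Match exec block calling the helper with '%h', which is exactly the verbosity an If directive would avoid.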


Not saying that this would really be worth all the (considerable) effort... it would be just something to keep in mind for possible future developments.


Cheers,
Chris.
Comment 1 Darren Tucker 2022-04-11 10:18:16 AEST
(In reply to Christoph Anton Mitterer from comment #0)
[...]
> The above is still a very simple example of course, but with regular
> expressions one could do much nicer things like matching:
> (www\.)?[^.]+\.(intranet|public)\.example\.(com|biz)

You can already do this via Match Exec, e.g.:

$ cat /tmp/ssh_config
Match Exec "egrep '^[^.]+\.(intranet|public)\.example\.com$' <<<'%h'"
	Hostname matched
Host *
	Hostname notmatched

$ ssh -G -F /tmp/ssh_config foo.public.example.com | awk '$1=="hostname"'
hostname matched

$ ssh -G -F /tmp/ssh_config somehost | awk '$1=="hostname"'
hostname notmatched

You can use any pattern matching language you have tooling for, not just regular expressions.
Comment 2 Christoph Anton Mitterer 2022-05-16 02:10:34 AEST
Nice, though the syntax is a bit ugly ;-)

But AFAIU, this would only work if the user's shell is bash, as it uses the non-standard <<<, right?


And it gives some ugly errors if the user accidentally has a ' in the hostname.
In principle one could even imagine this causing accidental local execution of an intended remote command:

It's a bit contrived of course, but consider something like:
intended:
ssh -G  "foo.public.example.com" "'; echo 'foo' >&2'" | awk '$1=="hostname"'

written by accident:
ssh -G  "foo.public.example.com'; echo 'foo' >&2'" | awk '$1=="hostname"'
that actually prints:
foo
hostname matched

Now replace echo 'foo' with 'rm -rf /'.

But of course it's clear that the same could just as well happen without using Match exec at all... so it's not really an issue I think.




With %h, AFAIU, one really gets the same behaviour as with Host <pattern>, i.e. after any substitutions via the Hostname or CanonicalizeHostname options, right?
Could that be added to the description of %h? It already says for %n that it's the one from the command line.

I could provide a patch if it helps you.


Since you've left the issue open... do you still consider this? Or is the Match exec solution the way to go?
Because if the latter, it would be nice if one could perhaps add that as an example somewhere in the config.
Ideally with non-bash-specific code; I guess printf '%s' '%h' | egrep ... should do the job, too?!
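
A minimal sketch of that bash-independent variant, written as a plain sh function so it can be tried outside ssh. It assumes grep -E (the POSIX spelling of egrep) is available; host_matches is a hypothetical helper name and the pattern is the one from comment 1:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 matches the pattern from comment 1.
host_matches() {
    # printf replaces the bash-only <<< here-string; -q suppresses output,
    # so only the exit status is used.
    printf '%s\n' "$1" | grep -Eq '^[^.]+\.(intranet|public)\.example\.com$'
}

host_matches foo.public.example.com && echo matched
host_matches somehost || echo notmatched
```

Inside ssh_config the same pipeline would presumably need the percent sign doubled, i.e. printf '%%s' '%h' | grep -Eq ..., since ssh_config(5) lists %% as the literal-percent token. That part is an assumption and not tested against Match exec here.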

One subtle remaining issue is perhaps that this solution means that the values of %-escapes appear in the process list.
I mean, there is none like %p with p being the password, but a user might still not want others to see e.g. the true %h, which may have been obfuscated by using a fake name on the command line and having ssh_config substitute the real one.
But again, only a very subtle thing, as usually there are other means to find out that for another user.




Cheers,
Chris.