
JSE's Blog

Jonah's blog is here

Lighttpd, regex, rewrites, and you

- Posted in Web/Technology by

So mempler (join our discord server if you don't know him!) is working on a currently undisclosed project on one of my servers and wanted to do some rewriting to a php file.

I've never done much with rewrites in anything. I never really cared about ugly URLs in anything I ran in PHP, so I just didn't bother. On the occasions I needed to run something that did need them, someone always provided the rules to me in a .htaccess file, but I want to use lighttpd. It's just more efficient for what we want. Fight me.

Anyway he wanted something like example.com/path/here to rewrite to example.com/index.php?path=WhateverCameAfterSlashHere as long as no ? appears in the URL (more on that later).

So that is pretty easy. Just do something like "([^?]*)" => "index.php?path=$1"

So what does this do? Well, I used this place and the lighttpd documentation to figure this out. Basically, a * repeats whatever comes right before it, zero or more times. Quoting autohotkey.com:

An asterisk matches zero or more of the preceding character, class, or subpattern. For example, a* matches ab and aaab. It also matches at the very beginning of any string that contains no "a" at all.

Anything in the parentheses is a regex group for lighttpd, so whatever it matches is provided as $1. Quoting the lighttpd documentation:

If the matched regex contains groups in parentheses, $1..$9 in the replacement refer to the captured text in the matching group "$1" meaning the first group, "$2" the second, and so on.

More on that later when we want to have a second group.
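To see the group capture in action outside of lighttpd, here is a minimal Python sketch. It assumes Python's re module behaves like lighttpd's PCRE for this simple pattern, and rewrite_single is a made-up helper name that only imitates what the rule does, not how lighttpd implements it.

```python
import re

# The single-group pattern from the rule above.
PATTERN = re.compile(r"([^?]*)")

def rewrite_single(url):
    """Imitate "([^?]*)" => "index.php?path=$1" (hypothetical helper)."""
    match = PATTERN.search(url)
    # group(1) is the text captured by the first pair of parentheses,
    # which is what lighttpd would substitute for $1.
    return "index.php?path=" + match.group(1)

print(rewrite_single("/path/here"))  # index.php?path=/path/here
```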

I also wanted it to stop at a ?. Eventually that will make it look like you're really accessing an index.php at the path in the URL and passing it its first parameter, which of course ends up in $_GET. The path is a parameter too, but unsuspecting people won't know that.

With the [^?] before the * we're telling regex to match everything EXCEPT for a ?.

Again, to quote autohotkey.com:

Matches any single character that is not in the class. For example, [^/]* matches zero or more occurrences of any character that is not a forward-slash, such as http://. Similarly, [^0-9xyz] matches any single character that isn't a digit and isn't the letter x, y, or z.
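A quick way to convince yourself of the difference is to compare [^?]* with a plain .* on the same URL. This is a small Python sketch (Python's re rather than lighttpd's PCRE, which I'd expect to agree here); leading_match and the sample URL are just names made up for the example.

```python
import re

def leading_match(pattern, text):
    """Return whatever the pattern matches at the start of text."""
    return re.match(pattern, text).group(0)

url = "/path/here?page=2"  # sample URL, invented for illustration

print(leading_match(r"[^?]*", url))  # /path/here
print(leading_match(r".*", url))     # /path/here?page=2
```

The negated class stops right before the first ?, while .* happily runs through it to the end of the URL.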

Now we also want to handle other parameters. This is where our second group comes in. Of course, we already included the ? for the path parameter in the rewrite rule where $1 is passed. So whatever people put after a ? in the URL should actually end up after an & in the rewritten URL, and the ? itself shouldn't be passed to the requested PHP file. I did this:

url.rewrite = (
                "([^?|&]*)\?(.*)" => "index.php?path=$1&$2"
)

As you can see, since the ? in regex actually means something ("A question mark matches zero or one of the preceding character[...]"), I need to escape it with a backslash, which isn't a foreign concept to anyone who has dabbled in scripting and programming even a little bit. So \? means: expect an actual ? character in the URL. Then I made a second group that will match literally everything with the .*. That's fine though, he can handle it in the PHP code for whatever he wants to do. It does mean any other parameters in the URL have to be separated by an actual & character and not something else, but that's fine, that's nothing for lighttpd to handle.

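You'll also notice that my first group now excludes more than just a ?: I added a | and an & inside the brackets. I meant the | as OR (also not a foreign concept for most), but inside square brackets it's just a literal | character, so the class really means "anything except ?, | or &". Either way the & is excluded, so people can't specify additional parameters after an & sign before the ?.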
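To check what that first group really stops at, here is a small Python sketch. It uses Python's re and only looks at what the group captures from the start of the URL; captured is a made-up helper and the sample URLs are invented.

```python
import re

# Just the first group from the rule, on its own.
FIRST_GROUP = re.compile(r"([^?|&]*)")

def captured(url):
    """Return what the first group grabs from the start of url."""
    return FIRST_GROUP.match(url).group(1)

print(captured("/path/here?page=2"))  # /path/here
print(captured("/path&x=1?page=2"))   # /path
print(captured("/a|b?page=2"))        # /a
```

Every character listed inside the brackets ends the capture, and that includes the | itself.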

That leaves us with: url.rewrite = ( "([^?|&]*)\?(.*)" => "index.php?path=$1&$2" )

Now the problem is that sometimes he doesn't want to specify any other parameters, so there's no ? in the URL at all. If you try the rewrite rule I provided with a URL like that, you get a 404 thrown at you because the rule has no match.
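Here is that behaviour as a rough Python sketch. It assumes lighttpd searches for the pattern anywhere in the URL and replaces the whole URL with the filled-in target, which is how I understand it; rewrite_two_groups is a hypothetical helper, not anything lighttpd provides.

```python
import re

# The two-group rule: a path, a literal ?, then everything else.
RULE = re.compile(r"([^?|&]*)\?(.*)")

def rewrite_two_groups(url):
    """Imitate the rule; None means "no match, URL left alone"."""
    match = RULE.search(url)
    if match is None:
        return None
    path, extra = match.groups()
    return f"index.php?path={path}&{extra}"

print(rewrite_two_groups("/path/here?page=2"))  # index.php?path=/path/here&page=2
print(rewrite_two_groups("/path/here"))         # None
```

Without a ? in the URL the \? can never be satisfied, so nothing gets rewritten and the server goes looking for a file that doesn't exist.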

So initially I decided to do:

url.rewrite = (
                "([^?|&]*)" => "index.php?path=$1",
                "([^?|&]*)\?(.*)" => "index.php?path=$1&$2"
)

Now, as you might expect if you're not crazy like me, that won't work. Parameters in the URL are never passed to the PHP file, because the first rule I specified always applies (and of course its group stops at a ? or an & character, so everything after that is dropped)!

The fix is simple. Just reverse them like so:

url.rewrite = (
                "([^?|&]*)\?(.*)" => "index.php?path=$1&$2",
                "([^?|&]*)" => "index.php?path=$1"
)

This way if a ? exists in the URL then of course, the first rewrite rule applies since it's the first that appears. Otherwise, the second will.
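The effect of the ordering can be sketched in Python. This assumes the rules are tried top to bottom and the first one that matches wins, as described above, and that Python's re agrees with lighttpd's PCRE on these patterns. The rewrite helper and the rule names are made up for the example.

```python
import re

def rewrite(rules, url):
    """Apply the first matching (pattern, target) rule; None if none match."""
    for pattern, target in rules:
        match = re.search(pattern, url)
        if match:
            # Fill $1..$9 in the target with the captured groups.
            return re.sub(r"\$(\d)",
                          lambda ref: match.group(int(ref.group(1))),
                          target)
    return None

catch_all = (r"([^?|&]*)", "index.php?path=$1")
with_query = (r"([^?|&]*)\?(.*)", "index.php?path=$1&$2")

# Catch-all first: the extra parameters are lost.
print(rewrite([catch_all, with_query], "/path/here?page=2"))
# index.php?path=/path/here

# Query rule first: both kinds of URL come out right.
print(rewrite([with_query, catch_all], "/path/here?page=2"))
# index.php?path=/path/here&page=2
print(rewrite([with_query, catch_all], "/path/here"))
# index.php?path=/path/here
```

The catch-all can match any URL at all, so it has to go last or the more specific rule never gets a turn.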

Thanks for going along with me on my regex learning journey. I know this is probably simple for most people, but it took me a while to figure out as I've never done it before (aside from fixing my htmly rewrites for lighttpd a while ago, which doesn't count!)