mod_rewrite Essential Tips for Beginners

No web admin wants URLs with .htm’s, .php’s, and least of all long ?id=zqw6&ref=who%20cares query strings. SEO aside, it just looks so 2005. Fortunately most any CMS or framework has support for pretty URLs these days, whether WordPress’s permalinks or Laravel’s intuitive and flexible routing. But for developers working on legacy sites, the only choice may be to implement pretty URLs the old-fashioned way – via Apache’s mod_rewrite. That was the situation I recently found myself in, and here are a few tricks I learned that may help those in the same situation.

Note that this article isn’t meant to be a comprehensive intro to mod_rewrite (for which I recommend this article by Added Bytes) or a complete round-up of its functionality (for which there’s the Apache documentation). It simply covers a few scenarios that took some research and experimentation to figure out, and that I think others are likely to run into.

Point a Pretty URL to its Source Location

Let’s begin with the most basic scenario. Say we’re running a movie website, and for the movie index we want a pretty URL to point to a PHP page – without the source PHP page’s location showing up in the browser address bar.

http://myfilmsite.com/movies
points to
http://myfilmsite.com/movies-index.php

The following rule would do it:
RewriteRule  ^movies$  /movies-index.php  [NC,L]

The first expression matches the URL path /movies, while the second one points that URL internally to the path /movies-index.php. (In a .htaccess file the pattern is matched against the path without its leading slash, which is why it reads ^movies$ rather than ^/movies$.) Note that ^ and $ are regular expression characters that designate the beginning and end of a string, respectively. Without those characters, any path with the word “movies” in it – say top-movies-2014 – would match the rule, which is not what we want. (It’s a good idea to be familiar with regular expressions before delving into mod_rewrite.)

The NC flag makes the match case-insensitive, while the L flag designates that if the rule matches, no further rules are processed in the current pass.
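For context, here is a minimal sketch of the file this rule might live in. It assumes the rules sit in a .htaccess file in the site’s document root and that the server is configured to allow rewrite directives there; neither detail is stated above, so adjust to your own setup. The one certain requirement is that the rewrite engine must be switched on, since it is off by default.

```apache
# .htaccess in the site's document root (assumed location)
# The rewrite engine is off by default and must be enabled once per file
RewriteEngine On

# /movies is served internally by /movies-index.php;
# the browser's address bar keeps showing /movies
RewriteRule  ^movies$  /movies-index.php  [NC,L]
```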

Important note about the L flag: Using the [L] flag in a rule does not necessarily mean it’ll be the last rule processed for that URL, since after a rewrite or redirect the .htaccess file and its rules may wind up being processed again from the top. The general gist is that the L flag is usually needed in rewrite rules, but you still need to account for conflicting rules. (For the lowdown check the L flag docs.)

Redirect an Old URL to a New One

While we’re tidying up our URL structure we might want to redirect the old index page URL to its pretty new one:

http://myfilmsite.com/movies-index.php
redirects to
http://myfilmsite.com/movies

The rewrite rule:
RewriteRule  ^movies-index\.php  /movies  [R=301,NC,L]

The focal point here is the R=301 flag. It tells the server to redirect the first location to the second one, returning a 301 (Moved Permanently) HTTP response code. In this case users will actually see the new location in their browser address bars.

Since we’re dealing with regular expressions, where the period is a special character that matches any character, the backslash in movies-index\.php escapes it so that it matches only a literal period.
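One practical wrinkle worth sketching: browsers tend to cache 301 responses, so a mistake in a permanent redirect can keep haunting you after the rule is fixed. A cautious approach, offered here as a suggestion rather than a requirement, is to test with a temporary 302 first and switch to 301 once the behavior is confirmed. This variant also adds a $ to the pattern so that only movies-index.php itself matches, not a longer name that merely starts with it.

```apache
# While testing: a temporary redirect, which browsers do not cache permanently
RewriteRule  ^movies-index\.php$  /movies  [R=302,NC,L]

# Once verified, swap in the permanent version instead:
# RewriteRule  ^movies-index\.php$  /movies  [R=301,NC,L]
```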

Point a Pretty URL to its Source Location, and Redirect Away from the Source

Use the above two rules in tandem as is, and you’ll encounter one of the great joys of URL rewriting, the infinite redirect loop. The problem is that myfilmsite.com/movies points to myfilmsite.com/movies-index.php, which then redirects back to the original URL, which then points back to the second, which then redirects and… yeah. This is where the RewriteCond (rewrite condition) directive comes into play. To put it all together:

RewriteRule  ^movies$  /movies-index.php  [NC,L]
RewriteCond  %{THE_REQUEST}  movies-index\.php  [NC]
RewriteRule  ^movies-index\.php  /movies  [R=301,NC,L]

RewriteCond ensures that the RewriteRule following it will be processed only if its condition is met. %{THE_REQUEST} holds the original request line sent by the browser, and it does not change when a URL is rewritten internally. So, IF the address requested by the browser contains movies-index.php, the subsequent rule which redirects to the pretty URL will be processed. Otherwise (for example when the first RewriteRule is pointing to movies-index.php internally), it will be ignored. Infinite loop problem solved.
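If the server runs Apache 2.4 or later, there is another way to break the loop, sketched here as an alternative and not as a replacement for the condition-based approach. The END flag (added in Apache 2.3.9) stops the current pass like L does, and additionally prevents any further rewrite processing of that request in .htaccess context. That the site runs a recent enough Apache is an assumption; on older versions this flag is not available.

```apache
# Assumes Apache 2.4+ (the END flag does not exist in 2.2)
# After this internal rewrite, no further rewrite rules run for the request
RewriteRule  ^movies$  /movies-index.php  [NC,END]

# Only requests that arrive as /movies-index.php get this far
RewriteRule  ^movies-index\.php$  /movies  [R=301,NC,L]
```

The trade-off is portability: the RewriteCond version works on older Apache releases as well.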

Important note about %{THE_REQUEST}: As stated in the Apache docs, it gets “the full HTTP request line sent by the browser to the server” – NOT just the URL. So in the rewrite condition above, prepending the string with the caret symbol (^movies-index\.php) wouldn’t work, because the full HTTP request is actually something like “GET /movies-index.php HTTP/1.1”. To avoid this problem I just refrain from using the beginning-of-string and end-of-string special characters ^ and $ in rewrite conditions having %{THE_REQUEST}.
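If you do want a tighter match, the condition can be anchored as long as the pattern accounts for the whole request line. The following is a sketch, assuming the site is served from the document root so that the path in the request line begins with /movies-index.php.

```apache
# Request line looks like: GET /movies-index.php HTTP/1.1
# ^[A-Z]+  the HTTP method (GET, POST, ...)
# \s       the space between the method and the path
# [?\s]    the path must end here: either a query string or the next space follows
RewriteCond  %{THE_REQUEST}  ^[A-Z]+\s/movies-index\.php[?\s]  [NC]
RewriteRule  ^movies-index\.php$  /movies  [R=301,NC,L]
```

This avoids accidental matches on longer paths that merely contain movies-index.php, at the cost of a less readable pattern.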

Include Query String Parameters

A common scenario in URL rewriting is to have segments of the URL represent query string parameters for the page it points to.

http://myfilmsite.com/movies/looper
points to
http://myfilmsite.com/show-movie.php?slug=looper

The rewrite rule:
RewriteRule  ^movies/(.+)  /show-movie.php?slug=$1  [NC,L]

The first expression matches URL paths that begin with movies/ and captures whatever comes after the slash for later reference. The second expression points that URL internally to the path /show-movie.php?slug=$1, where $1 is replaced with the text that was captured by the (.+) pattern.
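Note that (.+) is greedy and captures everything after movies/, slashes included, so a request for movies/looper/cast would pass looper/cast as the slug. If slugs are single path segments (an assumption about this hypothetical site), a stricter pattern might look like this:

```apache
# [^/]+ captures one path segment only; /?$ allows an optional trailing slash
# movies/looper       -> /show-movie.php?slug=looper
# movies/looper/      -> /show-movie.php?slug=looper
# movies/looper/cast  -> no match, rule is skipped
RewriteRule  ^movies/([^/]+)/?$  /show-movie.php?slug=$1  [NC,L]
```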

The rule to redirect the old URL with query string parameters to the new, pretty one is more complicated.

RewriteCond  %{THE_REQUEST}  show-movie\.php  [NC]
RewriteCond  %{QUERY_STRING}  slug=([^&]+)  [NC]
RewriteRule  ^show-movie\.php  /movies/%1?  [R=301,NC,L]

Here we have two rewrite conditions that apply to the same rewrite rule, and both must be met for the rule to run. The first condition is just a safeguard against infinite looping, as in our earlier example. The second condition deals with the query string. It says that if the URL has the parameter slug= in the query string, the rule that follows will be applied. Finally, the rewrite rule itself redirects to /movies/%1, with the %1 replaced by whatever text was captured by ([^&]+) for the slug parameter. (The [^&] part of the regular expression is to prevent capturing any extraneous parameters, so that for example show-movie.php?slug=looper&year=2012 will get just “looper” for the slug.)
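One edge case: because the query string pattern is unanchored, slug= would also match inside a longer parameter name such as a hypothetical movieslug=. If that could occur on your site, a slightly stricter sketch requires slug to start the query string or follow an ampersand. The group is non-capturing, so the slug is still available as %1.

```apache
RewriteCond  %{THE_REQUEST}  show-movie\.php  [NC]
# (?:^|&) means: start of the query string, or right after an "&"
RewriteCond  %{QUERY_STRING}  (?:^|&)slug=([^&]+)  [NC]
RewriteRule  ^show-movie\.php  /movies/%1?  [R=301,NC,L]
```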

An important difference from earlier examples: patterns matched in rewrite rules are back-referenced with the $ character, while patterns matched in rewrite conditions are back-referenced with %.
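To see both kinds of back-reference side by side, here is an illustrative rule that uses $1 and %1 together. The year parameter and its four-digit format are assumptions made up for this example; they are not part of the site described above.

```apache
# %1 comes from the RewriteCond capture (a four-digit year in the query string)
RewriteCond  %{QUERY_STRING}  (?:^|&)year=(\d{4})(?:&|$)
# $1 comes from the RewriteRule capture (the slug in the path)
# movies/looper?year=2012  ->  /show-movie.php?slug=looper&year=2012
RewriteRule  ^movies/([^/]+)$  /show-movie.php?slug=$1&year=%1  [NC,L]
```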

One final thing to note is the ? at the end of the /movies/%1? in our rewrite rule. This tells the server not to append the query string from the original URL when redirecting.  So in the case above:

http://myfilmsite.com/show-movie.php?slug=looper
redirects to
http://myfilmsite.com/movies/looper (using question mark at end of rule)
http://myfilmsite.com/movies/looper?slug=looper (not using it)
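On Apache 2.4.0 and later, the same effect can be spelled out with the QSD (query string discard) flag in place of the trailing question mark. This is a sketch that assumes a 2.4+ server; on older versions the trailing ? remains the way to go.

```apache
RewriteCond  %{THE_REQUEST}  show-movie\.php  [NC]
RewriteCond  %{QUERY_STRING}  slug=([^&]+)  [NC]
# QSD drops the original query string, so no trailing "?" is needed
RewriteRule  ^show-movie\.php  /movies/%1  [R=301,NC,L,QSD]
```

Some find the flag easier to read than a lone ? at the end of the substitution, which is easy to overlook.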

Re_write Away

These examples are just the tip of the iceberg when it comes to URL rewriting, but they should help with a few common scenarios. For more on mod_rewrite be sure to check out the links below, and feel free to post here with any questions or comments!