This article was last updated Wednesday, 6 July 2005. View a newer version at my site, hardanswers.net/hotlink-protection

Hotlink Prevention

“Hotlinking” (sometimes called “Link Hijacking”) is the practice of linking directly to a file or resource on a site rather than to the page the site owner intended, and it is a common problem. The most common example is when someone embeds an image in their own page but, rather than hosting the image themselves, links directly to the copy on your site. People who visit their site are then viewing your image and using your bandwidth, without even knowing that it comes from your site.

There are two main methods of preventing “hotlinking”, and both have flaws. The most common involves checking referrers. Most browsers send a “referrer” (spelled “Referer” in the HTTP header itself): the address of the page the user was viewing directly before requesting yours, which is generally the page containing the link they followed. For images embedded in a page, it is the address of the page they are embedded in. The check is simple: if the referrer is a page on your site, you allow access to files such as images, and if not, you block the request. The main problem is that you generally need to allow blank referrers, because some people don’t send referrers at all, and an address typed directly into a browser has no referrer. Some more paranoid people also send fake referrers, in the belief that referrers are some type of privacy risk.
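The referrer check described above can be sketched in a few lines of PHP. This is only an illustration under my own assumptions: the function name and the idea of accepting subdomains such as www are mine, and the host name you pass in is whatever your own site uses.

```php
<?php
// Sketch of a referrer check. Function name and subdomain rule are assumptions.
function referrer_allowed(?string $referrer, string $ownHost): bool
{
    // Blank referrers are allowed: direct visits and some browsers send none.
    if ($referrer === null || $referrer === '') {
        return true;
    }
    $host = parse_url($referrer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed referrer, no host part
    }
    $host = strtolower($host);
    $ownHost = strtolower($ownHost);
    $suffix = '.' . $ownHost;
    // Accept the bare host and any subdomain of it, such as www.
    return $host === $ownHost || substr($host, -strlen($suffix)) === $suffix;
}

// Typical call inside a request:
// referrer_allowed($_SERVER['HTTP_REFERER'] ?? null, 'nedmartin.org');
```

Because blank referrers pass, this stops casual hotlinking only. Anyone who strips or fakes the header gets through.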

The other, more complex way, which I will detail here, involves the use of sessions. It is almost infallible, although it still has its flaws.

The logic behind this idea is simple: when a user first visits your site, you start a session. A session is a simple method of maintaining information about a specific user across a period of time. Unfortunately, HTTP itself has no way of doing this, since each HTTP request is generally independent of every other, so a server-side method is required. Here we use PHP’s built-in session handling.

Having started a session when the user first visited your site, you simply check, every time that user attempts to access a specific type of file such as an image, whether or not that user has a valid session. An exception is made for certain robots, such as Google’s Googlebot. For these we simply check that their user agent matches a list of known robots’ user agents.
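The robot exception could look something like the sketch below. The list contents and the function name are placeholders of mine; only Googlebot is named in this article, so add whichever robots you want to let through.

```php
<?php
// Hypothetical list of user-agent fragments for robots allowed without a session.
const KNOWN_ROBOTS = ['Googlebot'];

// True if the user agent contains any of the known robot names.
function is_known_robot(string $userAgent, array $robots = KNOWN_ROBOTS): bool
{
    foreach ($robots as $robot) {
        // Case-insensitive substring match against the full user-agent string.
        if (stripos($userAgent, $robot) !== false) {
            return true;
        }
    }
    return false;
}
```

Bear in mind that a user agent is just a header the client chooses to send, so anyone can claim to be Googlebot and get past this check.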

We use Apache’s .htaccess and mod_rewrite to rewrite all requests for specific file types, in this case jpg, png, gif, swf and mp3. This means that when a user requests, for example, nedmartin.org/picture.jpg, this is internally rewritten to nedmartin.org/hotlink.php?file=picture.jpg. The .htaccess code used to do this is below:

.htaccess
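The original listing is not reproduced in this copy of the article. As a rough sketch of the kind of rule described, assuming mod_rewrite is enabled, the .htaccess file sits in the site's root directory, and hotlink.php sits alongside it:

```apache
# Sketch only: assumes mod_rewrite is available and .htaccess overrides are allowed.
RewriteEngine On

# Send requests for the protected file types through hotlink.php.
# NC makes the match case-insensitive; L stops further rewriting.
RewriteRule ^(.+\.(jpg|png|gif|swf|mp3))$ hotlink.php?file=$1 [NC,L]
```

With this in place, a request for picture.jpg is handled internally as hotlink.php?file=picture.jpg, and the address the visitor sees does not change.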

Having rewritten all requests for specific resources to go through your hotlink.php, and assuming you are actually using PHP, you will need to create the hotlink.php file. There’s one small issue with this method. By default, PHP will send headers to ensure that dynamically generated content is not cached. This is not what you want when you’re actually sending non-dynamic content, such as images, so we have to modify the headers sent. The commented code is as follows:

hotlink.php
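The original listing is not reproduced in this copy of the article, so what follows is my own minimal sketch of such a script, not the original code. The session marker key “hotlink_ok”, the function names, the one-day cache lifetime and the decision to serve files relative to the script's own directory are all assumptions.

```php
<?php
// hotlink.php: a sketch under stated assumptions, not the original listing.

// File types named in the article, mapped to the MIME types to send.
const PROTECTED_TYPES = [
    'jpg' => 'image/jpeg',
    'png' => 'image/png',
    'gif' => 'image/gif',
    'swf' => 'application/x-shockwave-flash',
    'mp3' => 'audio/mpeg',
];

// Return the MIME type for a protected file, or null if the type is not served.
function protected_mime_type(string $path): ?string
{
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return PROTECTED_TYPES[$ext] ?? null;
}

// Resolve the requested name inside $baseDir; null if missing or outside it.
function resolve_protected_file(string $baseDir, string $requested): ?string
{
    $base = realpath($baseDir);
    $full = realpath($baseDir . DIRECTORY_SEPARATOR . $requested);
    if ($base === false || $full === false || !is_file($full)) {
        return null;
    }
    // Reject paths such as ../secret.jpg that escape the base directory.
    if (strncmp($full, $base . DIRECTORY_SEPARATOR, strlen($base) + 1) !== 0) {
        return null;
    }
    return $full;
}

function serve_protected_file(): void
{
    // Turn off PHP's automatic no-cache headers before starting the session.
    session_cache_limiter('');
    session_start();

    // 'hotlink_ok' is a hypothetical marker that your pages would set.
    $hasSession = !empty($_SESSION['hotlink_ok']);
    $isRobot = stripos($_SERVER['HTTP_USER_AGENT'] ?? '', 'Googlebot') !== false;
    session_write_close(); // release the session lock before sending the file
    if (!$hasSession && !$isRobot) {
        http_response_code(403);
        return;
    }

    $requested = $_GET['file'] ?? '';
    $path = is_string($requested) ? resolve_protected_file(__DIR__, $requested) : null;
    $mime = $path === null ? null : protected_mime_type($path);
    if ($path === null || $mime === null) {
        http_response_code(404);
        return;
    }

    header('Content-Type: ' . $mime);
    header('Content-Length: ' . filesize($path));
    header('Cache-Control: private, max-age=86400'); // one day, chosen arbitrarily
    readfile($path);
}

// Only serve when run by the web server, so the functions can be tested from the CLI.
if (PHP_SAPI !== 'cli') {
    serve_protected_file();
}
```

The path check matters because the file name comes straight from the request. Without it, a crafted name could make the script hand out files from outside your site's directory.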

The other thing you must do is ensure that a session is created when a user first visits your site. This is surprisingly simple if you made good design decisions when creating your site and can easily add a piece of code to every page: just call “session_start()” at the top of every page on your site, before any output is sent.
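As a minimal sketch, the per-page code might be no more than this. The marker key “hotlink_ok” is a name I have made up for illustration; whatever script checks the session needs to look for the same key.

```php
<?php
// Must run before any output, because the session cookie is sent as an HTTP header.
session_start();

// Hypothetical marker showing that this visitor has loaded a real page on the site.
$_SESSION['hotlink_ok'] = true;
```

If your pages already share a common include file, that is the natural place to put it.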

There’s, unfortunately, one major flaw with all of this. By default, PHP will attempt to store session information, generally in the form of a large pseudo-random number, in a cookie. Cookies have received some bad press, so some people now consider them a privacy risk and block them. That belief is mistaken: a session cookie is not a privacy risk. It is a simple method of ensuring that session data is propagated across different pages, and you should never block it.

However, because people do block cookies, PHP has a secondary method that it uses should cookies fail: it adds “?PHPSESSID=bigPseudoRandomNumber” to every link in your page. This works well, because when a user clicks on a link, PHP can gather the session information from it. Unfortunately, it has one major drawback. Search engines such as Google don’t support cookies, so when they visit your site they will see every link with a large random number appended, and on their next visit they will see every link with a different random number appended. This will probably lead them to decide that your links have all changed, and your site will drop into the nether regions of search-engine-land, never to be seen again. If this is likely to be a problem for your site, you should use the following code to force PHP to use cookies only. Be aware, though, that when using cookies only, anyone who refuses, or doesn’t support, cookies will not be able to view the protected resources on your site.

To force PHP to use cookie-only session handling, either add the line “session.use_only_cookies=1” to a php.ini file in the main root directory, or place “ini_set('session.use_only_cookies',1);” directly before the call to “session_start()”.
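Put together, the start of a page might look like the sketch below. The second setting, session.use_trans_sid, is my own addition: it is the PHP option that controls whether session IDs are appended to links, and turning it off explicitly seems a sensible companion to the first. Recent PHP versions already default to cookie-only sessions, so on a modern installation this may simply confirm what is already in effect.

```php
<?php
// Both settings must be changed before the session starts.
ini_set('session.use_only_cookies', '1'); // accept session IDs from cookies only
ini_set('session.use_trans_sid', '0');    // assumption: also stop PHP appending IDs to links

session_start();
```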


Last updated Tuesday, 20 November 2012