PHP - Detecting Spam Requests With Raw Analytics

If you own a website, chances are you have some sort of analytics software installed such as Google Analytics to report on traffic. But Google Analytics doesn't report on spiders/bots/referral spammers that crawl your site using your bandwidth and resources. In this article, I discuss the approach I took to try and block these spiders and spammers.

First, some background. A few weeks ago I noticed a high number of referrals appearing in my Google Analytics. The referral host names didn't seem related to the contents of my website, and a quick web search revealed that these referrals were indeed spam. Other website owners were having the same problem. Below are just a few hosts that have plagued my website with their ghost referrals.

Please do not visit any of these sites.


To understand what was going on, I decided to start monitoring the details of each request. After capturing data for a 24-hour period, I went through the requests and noticed not only a high number of spam referrals that Google Analytics didn't report on, but also a very high number of search bots.
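As a rough illustration of that kind of monitoring, here is a minimal sketch that records a few fields from each request as one JSON line. The function names, the chosen fields and the log file path are my own assumptions for this example; they are not taken from the Raw Analytics application itself.

```php
<?php
// Hypothetical helper: picks out the request details worth recording.
// Pass in $_SERVER (or a test array with the same keys).
function raw_analytics_entry(array $server): array
{
    $referrer = $server['HTTP_REFERER'] ?? '';

    return [
        // ISO 8601 timestamp in UTC
        'time'          => gmdate('c', (int) ($server['REQUEST_TIME'] ?? time())),
        'ip'            => $server['REMOTE_ADDR'] ?? '',
        'uri'           => $server['REQUEST_URI'] ?? '',
        // parse_url() returns null or false when there is no usable host,
        // so the cast turns those cases into an empty string
        'referrer_host' => $referrer !== '' ? (string) parse_url($referrer, PHP_URL_HOST) : '',
        'user_agent'    => $server['HTTP_USER_AGENT'] ?? '',
    ];
}

// Hypothetical helper: appends one entry to a log file as a JSON line.
function raw_analytics_log(array $entry, string $file): void
{
    file_put_contents($file, json_encode($entry) . PHP_EOL, FILE_APPEND | LOCK_EX);
}
```

Calling `raw_analytics_log(raw_analytics_entry($_SERVER), __DIR__ . '/requests.log');` near the top of each page would then build up a plain log to review after a day or so. A database table would work just as well; a flat file simply keeps the sketch short.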

To make matters worse, one of my articles was spammed by various hosts. Again, this did not appear in Google Analytics. Further investigation revealed that most of these referral hosts shared the same IP address, and each of these hosts would request a page up to three times in one visit.
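One way to spot that pattern is to group the logged requests by IP address and list the distinct referral hosts seen from each. The sketch below assumes each logged entry is an array with `ip` and `referrer_host` keys, and that "shared the same IP address" refers to the client address recorded for the request; both the key names and the function name are illustrative.

```php
<?php
// Hypothetical helper: returns the IPs that sent more than one distinct
// referral host, mapped to the list of those hosts.
function referrer_hosts_by_ip(array $entries): array
{
    $grouped = [];
    foreach ($entries as $entry) {
        if (($entry['referrer_host'] ?? '') === '') {
            continue; // direct visits carry no referrer
        }
        // Using the host as an array key removes duplicates for free
        $grouped[$entry['ip']][$entry['referrer_host']] = true;
    }

    $shared = [];
    foreach ($grouped as $ip => $hosts) {
        if (count($hosts) > 1) {
            $shared[$ip] = array_keys($hosts);
        }
    }

    return $shared;
}
```

An IP that turns up here with a handful of unrelated referral hosts is a good candidate for a block list.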

My initial thought was to add some entries to my .htaccess file that would redirect these hosts elsewhere, but the problem was that I would need to know which referral hosts to redirect, and that meant monitoring HTTP requests. Eventually I decided to develop a simple Raw Analytics PHP control panel that would allow me to view recent requests and take action if needed. I also created a block list that allowed me to block a request based on IP address, host name and user-agent.
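To give an idea of how a block list check along those lines might look, here is a minimal sketch. The array layout, the `is_blocked()` name and the sample entries are all assumptions made for this example. I have also assumed that "host name" means the host of the referring URL, and that user-agents are matched as case-insensitive substrings.

```php
<?php
// Hypothetical block list; the entries are placeholders, not real offenders.
$blockList = [
    'ips'         => ['203.0.113.5'],
    'hosts'       => ['spam.example'],
    'user_agents' => ['BadBot'],
];

// Returns true when the request matches any entry in the block list.
// Pass in $_SERVER (or a test array with the same keys).
function is_blocked(array $server, array $blockList): bool
{
    // 1. Exact match on the client IP address
    $ip = $server['REMOTE_ADDR'] ?? '';
    if (in_array($ip, $blockList['ips'], true)) {
        return true;
    }

    // 2. Exact match on the referrer's host name (compared in lower case)
    $referrer = $server['HTTP_REFERER'] ?? '';
    $host = $referrer !== '' ? strtolower((string) parse_url($referrer, PHP_URL_HOST)) : '';
    if ($host !== '' && in_array($host, $blockList['hosts'], true)) {
        return true;
    }

    // 3. Case-insensitive substring match on the user-agent
    $agent = $server['HTTP_USER_AGENT'] ?? '';
    foreach ($blockList['user_agents'] as $needle) {
        if ($needle !== '' && stripos($agent, $needle) !== false) {
            return true;
        }
    }

    return false;
}
```

At the top of a page you could then check `is_blocked($_SERVER, $blockList)` and, if it returns true, send `http_response_code(403)` and exit before any real work is done. Keep in mind that the referrer and user-agent headers are supplied by the client and can be faked, so this only stops the lazier offenders.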

Within 24 hours of setting up the block list, I noticed a huge reduction in crawlers and spammy referrals, and after several weeks of using it, my bandwidth usage has dropped. Of course, I still get the occasional unwanted crawler or referrer.

You can download the application from the link below.


