Robots, web crawlers, search spiders, auto-refreshers: whatever label you give them, software uses the Internet almost as much as humans do. And, much like humans, the nature of the robots isn’t always clear.
On one hand, you have the benign, beneficial robots. These are bots like Google’s crawlers, which trawl your site looking for links and indexing pages, calculating your position and feeding your data into the index.
On the other hand, you have the spambots, which look for comment fields and web forums without anti-spam security. They take up residence on these unmoderated communication channels and repeatedly post their messages, advertising some hacker’s affiliate links.
On a third hand, you have the invisible robots, the bots designed to give a page views, hits and maybe even affiliate link clicks, without actually giving any benefit to the owner of the site. They don’t post anything, and if you don’t check your analytics, you might never know they exist.
If you had a fourth hand, you might consider some sort of Internet-based learning intelligence, but thankfully, humans only have three hands.
Wait a second…
The Problem with Bots
Take the best and worst examples. You have the Google bot on one side, and a bot designed to refresh a page on the other. They both visit your site, increasing your analytics hits and traffic. They both have very low times on page. They both may refresh the page or crawl across links. The only difference, really, is that the Googlebot is doing something beneficial, feeding data back to the index. The refresh bot is just inflating your traffic statistics and potentially putting your affiliate program in jeopardy.
Unless the bot identifies itself, you don’t have much to go on in knowing what is good and what is bad. Thankfully, Google does exactly that. Spammers might, or might not, but known spam bots can be blocked as easily as unknown bots. So how do you go about blocking them?
Blocking Unwanted Bot Traffic: Method One
The first method, henceforth known as Method One, is to use the .htaccess file to block the bots that visit your site. This method is easy and common, but it has one major drawback: it can only block bots whose user agent string gives them away as known spambots. If the bot identifies itself as a legitimate user or as a benign bot, you won’t block it with this method. Additionally, your site needs to be hosted on an Apache web server; .htaccess is an Apache feature.
To implement .htaccess blocking, you need to know the names of the bots – how they identify themselves – or their IP addresses. IP blocking is tricky, because if a bot is using a proxy, you might end up blocking some legitimate traffic. If you have one particularly persistent bot coming from the same IP every time, it’s okay to block that IP.
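Essentially, you open your root .htaccess file and add some code. For example:
- RewriteEngine On
- RewriteCond %{HTTP_USER_AGENT} ^GrabNet
- RewriteRule .* - [F,L]
Those three lines will block the GrabNet bot from accessing your site. The RewriteCond line matches any user agent that starts with GrabNet, and the RewriteRule line answers every matching request with a 403 Forbidden error; without it, the condition does nothing. To add more than one bot, you append [OR] to the end of every bot line except the last and add another line, keeping the RewriteRule after the final condition, like this:
- RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
- RewriteCond %{HTTP_USER_AGENT} ^JetCar
- RewriteRule .* - [F,L]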
You can see an example of a robust bot-blocking list here. You can copy and paste that entire block and use it, or you can monitor your traffic, identify bots when they appear, and add them to your .htaccess block list individually.
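As a list like that grows, a long chain of [OR] lines gets harder to maintain. One alternative, shown here as a minimal sketch rather than the only way to do it, is to fold the names into a single pattern. GrabNet and JetCar are just the example names used above, and the [NC] flag (ignore case) is an addition you may or may not want:

```apache
# Sketch only: GrabNet and JetCar are placeholder bot names from this article.
RewriteEngine On
# Match either name at the start of the User-Agent header, ignoring case.
RewriteCond %{HTTP_USER_AGENT} ^(GrabNet|JetCar) [NC]
# Refuse matching requests with 403 Forbidden and stop processing rules.
RewriteRule .* - [F,L]
```

To add a bot, put another name inside the parentheses, separated by a pipe character.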
Blocking Unwanted Bot Traffic: Method Two
The second viable method for blocking bots again uses your .htaccess file, but blocks by IP address instead. The code for this looks like:
- Order Deny,Allow
- Deny from 127.0.0.1
The 127.0.0.1 example is not an IP you should ever block; it’s the loopback address, which every machine uses to refer to itself. Replace it with the IP address of a bot you want to block.
To block more than one, just add another Deny line. You do not need an [OR] entry or any other addition.
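For illustration, a block list with several entries might look like the sketch below. The addresses are placeholders taken from ranges reserved for documentation, not real bots, and the last line assumes you want to block a whole range, which Deny accepts in CIDR notation:

```apache
# Placeholder addresses only; replace them with addresses from your own logs.
Order Deny,Allow
# A single address.
Deny from 192.0.2.10
# A second single address.
Deny from 192.0.2.44
# A whole range in CIDR notation (every address from 198.51.100.0 to 198.51.100.255).
Deny from 198.51.100.0/24
```

Be careful with ranges, since they make it easier to block legitimate visitors by accident. Also note that Order and Deny are the Apache 2.2 directives; on Apache 2.4 they only work through the mod_access_compat module, and the native form is a RequireAll block containing "Require all granted" followed by lines such as "Require not ip 192.0.2.10".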
You can also use the User Agent string to block bots, as seen here.
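One common way to combine user agent matching with the Deny syntax is Apache’s mod_setenvif module, which tags matching requests with an environment variable that Deny can then act on. This is a minimal sketch assuming Apache 2.2-style access control; GrabNet and JetCar are stand-in names, and bad_bot is an arbitrary variable name chosen for the example:

```apache
# Tag requests whose User-Agent starts with a listed name (case-insensitive).
# bad_bot is an arbitrary variable name chosen for this sketch.
SetEnvIfNoCase User-Agent "^GrabNet" bad_bot
SetEnvIfNoCase User-Agent "^JetCar" bad_bot

# Allow everyone by default, then refuse any request tagged above.
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

Each additional bot is one more SetEnvIfNoCase line; the three access lines at the bottom stay the same.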
Blocking Unwanted Bot Traffic: WordPress Edition
WordPress is a little unique in handling bad bot traffic. You still edit your .htaccess file, but you will use slightly different code. For a good example, check out this support thread. You can, of course, also use WordPress anti-spam plugins like Akismet.
Dealing with Faulty Analytics
That all allows you to block bad traffic from reaching your site once you’ve identified it. Before then, however, and moving forward, you will always have to deal with bot traffic on some level. This means that your Google Analytics data will always have a margin for error. How can you minimize this error and make sure your data is reporting just real humans and not spambots?
First, realize that Google Analytics already ignores most bot traffic on its own. If you see an unexpected surge of traffic, it could be legitimate traffic rather than bot traffic. If you haven’t done anything to warrant a surge, you can look for bots.
In Google Analytics, visit the Traffic Sources section and look at all traffic. Look at the traffic before and after the surge began; where are your referrals coming from? Are they referrals from a major website or social network? If so, the traffic is very likely legitimate. If the traffic is mostly coming from direct traffic, however, it’s probably bot traffic. It’s unlikely that you suddenly got hundreds or thousands of users to bookmark and check your site directly every day.
Once you have an idea of when the surge happened, check the landing page statistics for that traffic. What does the average visit duration look like? If it’s extremely low, you have bots. Fumigate your servers.
To block bots from appearing in your data, you will need to make an exclude filter. You can exclude traffic by domain or by IP address; filtering by IP is the better option. This will filter your data moving forward. Unfortunately, there is nothing you can do to edit older data.
The post How to Block Unwanted Spam Bot Website Traffic appeared first on Growtraffic Blog.