It’s impossible to stop fake traffic from hitting your website. There are thousands of bots crawling the web at any given moment, many of them using shared IPs or the IPs of important referrers. Try to block them and you’ll end up blocking some legitimate traffic too, and the bots rotate their IPs anyway. You can’t block spammers without blocking legit users, so you have to learn to deal with it.
Thankfully, fake traffic isn’t detrimental to your site in general. If it’s all targeting your ads and clicking on them, you might be removed from AdWords or another ad program for click fraud, but generally bots aren’t going to do that. Spammers don’t go around trying to kill the accounts of other users. The only place that fake traffic typically hurts you is within Google Analytics.
Why is this? Google Analytics tracks a lot of data about users who visit your site. This allows you to get an idea of where your traffic is located, how it’s performing, and how you can improve it overall. If you’re recording a lot of data about fake users, though, you end up with incorrect metrics. Your bounce rate will look higher, your time on site will look lower, your demographics will be skewed, and your referrers will be incorrect.
“Bad traffic” comes in many forms, some more malicious than others. Different forms require different solutions, so let’s take a look at what’s going on.
There are many different kinds of bots. Technically, any visit from a piece of software without a human involved is a bot. A script that visits a page, pulls a piece of data, and leaves is a bot. Google’s web indexers are bots. They’re not all good, and they’re not all bad, so banning them across the board isn’t a good idea. Of course, it’s also impossible to ban them completely.
On the good side, you have bots like the Googlebot. These are at worst unimportant, and at best actively beneficial to you. Google’s search crawlers are good, because without them your content would never enter the search results, which would leave your business stuck with nowhere to go.
Bad bots ignore robots.txt. They don’t necessarily execute scripts, but some of them do, because they want to see your page as it is displayed to a user, or need scripts to accomplish their goals. Some just want to scrape your content. Some want to register accounts or leave spam comments.
Some bad bots even fall under the heading of a botnet. A botnet is a network of bots controlled by one person, often numbering in the thousands of machines. Botnets are typically made up of computers infected by a virus. The owner of the botnet can issue a command, and every infected, internet-connected computer will execute it. Botnets are the most common source of DDoS attacks.
There are also some bots, called referral bots, that can send data directly to your Google Analytics without ever actually visiting your site. They can do this because they know how HTTP works and how to send hits straight to Google’s data collection endpoint, bypassing your pages entirely. This is how you end up with fake referrers in your site analytics.
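To make the mechanism concrete, here is a minimal sketch of how such a hit can be constructed against the legacy Universal Analytics Measurement Protocol. The endpoint and parameter names are real, but the tracking ID and referrer domain are made-up examples, and the sketch only builds the URL rather than sending anything:

```python
from urllib.parse import urlencode

# Sketch of a Measurement Protocol "pageview" hit. A referral bot never
# loads your pages; it just fires a payload like this at Google's
# collection endpoint with a guessed tracking ID.
def build_fake_hit(tracking_id, spam_referrer):
    payload = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the guessed property ID, e.g. UA-9876543-1
        "cid": "555",        # arbitrary client ID
        "t": "pageview",     # hit type
        "dp": "/",           # page the "visit" claims to be on
        "dr": spam_referrer, # the forged referrer that lands in your reports
    }
    return "https://www.google-analytics.com/collect?" + urlencode(payload)

print(build_fake_hit("UA-9876543-1", "http://spam-example.com"))
```

Nothing in that request proves a browser ever rendered your site, which is exactly why this loophole is so hard for Google to close.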
Referral spam is especially dangerous. The sites that show up in your analytics typically show up because they have linked to you and someone has followed that link. With referral spam, no such link exists. Why would a spammer do this? They want to see if you’re paying attention to your analytics. If you are, you’ll be curious as to why this site is linking to you, and you’ll click to visit it. The problem is, it doesn’t really exist. It just redirects you to an affiliate shopping cart, or a spam site of some description, or even to a malware download or malicious script execution page.
Not only is it dangerous to have these as referrers in your analytics; it’s dangerous to click them. If you don’t know how to handle malicious sites, you could end up infected with something, which can compromise all sorts of data, from your site to your bank accounts.
Google knows and understands referral spam. They’re working on a solution, but it’s a very complex problem and it’s difficult to filter and block without breaking analytics for legitimate users. In the meantime, it’s up to us webmasters to work out a solution that works well enough.
A Trick for New Websites
This first trick works for new websites, but it won’t work for established sites because it would change your analytics history completely. Historical data is very important, so you don’t want to cut off all data. This method works because of how Google Analytics itself works.
When you register a Google Analytics account, you are assigned a tracking ID. You can run multiple websites – up to 50 – with one Google Analytics account, though. These are differentiated by a number at the end of the ID. For example, your base ID might be UA-9876543-1. When you track analytics on your site, the code uses that ID.
Almost everyone running Google Analytics uses one account for their site, and their site is a -1 ID. Referral spammers target random IDs with -1, because it’s by far the most common.
You can circumvent this by using another ID number. All you need to do is make another property. Make two or three, and then use the last one for your website. Sure, you’re “wasting” the first couple, but they’re decoys. Spammers send referrers to UA-9876543-1, but you aren’t using that ID. Your actual site ID would be UA-9876543-3.
Like I said, this isn’t a solution for established sites, because changing your ID changes your analytics data. You won’t be able to access both the old and new data on the same report, because they would count as different websites.
Avoid the Referral Exclusion List
There’s a “referral exclusion list” in Google Analytics, and at first glance it sounds like exactly the kind of thing you want. You don’t want certain referrers in your reports, so exclude them, right?
Well, unfortunately, that’s not how the tool works. What it actually does is discard the referrer data while keeping everything else. The visit, the keywords, the time spent on site, the bounce: all of that is kept. With the referrer stripped out, the visit simply shows up as direct traffic instead.
All you’re doing, really, is moving the data from one category to another. Worse, you’re moving it from an easily filtered location to a location where it’s impossible to tell the difference between a good and a bad visit. You can see this in action here.
There’s no good way to block this traffic, because it’s not real traffic and you can’t block what you can’t predict. If I were going to pick a number between 1 and 1,000,000 and you had 10 guesses to pick it first, could you do it? Maybe, but probably not. That’s what trying to block this kind of traffic is like: guessing, or just picking it up after the fact.
Rather than try to purge it from your Analytics – impossible – or block it before it happens – also impossible – the best thing you can do is implement filters to keep it out of your reports. This makes your reports more accurate, but does nothing to stop the spam itself. It’s just a fact of life until Google removes the loophole that allows referrer spam to exist.
There are a ton of different sites that could be showing up in your referrer report. Semalt, Buttons-for-website, Darodar, Anticrawler, and a whole lot more all fall into this category. If you visit them, you are redirected to malicious content, which is why I’m not even typing out the TLD or linking to them.
Many of these sites are parked domains now, of course. Web hosts don’t take kindly to spam on their servers, but there’s only so much policing they can do. Spammers, meanwhile, just register dozens of other domains and repeat the process for as long as it’s profitable for them to do so.
The first type of filter you want to implement is the filter that removes “ghost referrals” or fake referrals from visits that don’t exist. These are the referrers that don’t actually visit your site and have 100% bounce rate because of it.
- Visit Google Analytics and click to select the right account and property for your site. Click to create a new view.
- Create a list of the valid hostnames that should show up. This will include URLs like example.com, blog.example.com, support.example.com, and something like translate.googleusercontent.com, which is the referrer for anyone viewing your site through Google Translate. If you’re not sure what hostnames to include, click Audience – Technology – Network and view the report with hostname as the primary dimension. This will show you the hostnames you want to use.
- Format the hostnames into a single line. Each dot should be escaped with a backslash, and each name should be separated from the next with a | (the regular-expression “or”). You will end up with a list that looks like example\.com|blog\.example\.com|support\.example\.com
- Build a new segment according to these instructions to test your filter without applying a permanent filter to your report. Remember, filters permanently change the data recorded in a view, which is why you test with a segment first.
- Apply your new segment to your network report with hostname as the primary dimension. This should filter out all data that isn’t included in your filter list. This will test the filter for you and will help you make sure your data is still accurate, and that you’re not missing something.
- Create a new view and apply the filter list you made in the filter pattern. Make sure the filter is a custom filter set to “include” with the field as hostname.
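The hostname list in step 3 is an ordinary regular expression, and Google Analytics filter patterns are partial-match, so it’s worth sanity-checking your pattern before saving the filter. Here’s a quick check using the example hostnames from the steps above (substitute your own list):

```python
import re

# Example include-filter pattern: valid hostnames with dots escaped,
# joined by | for alternation, exactly as built in step 3.
VALID_HOSTS = re.compile(r"example\.com|blog\.example\.com|support\.example\.com")

# GA filter patterns are partial-match regular expressions, so re.search
# mirrors how the include filter evaluates a hostname.
def passes_filter(hostname):
    return bool(VALID_HOSTS.search(hostname))

print(passes_filter("blog.example.com"))  # → True, legitimate subdomain kept
print(passes_filter("semalt.com"))        # → False, ghost-referral host dropped
```

If a hostname you expect to keep prints False here, add it to the pattern before you create the view.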
The new view will show your data moving forward without the spam referrals in it. This will be much more accurate than an unfiltered set, assuming you didn’t leave out an important source of actual traffic.
The second type of filter targets bots that actually visit your site and inject referrer spam that way. These also show 100% bounce, but they are less efficient, because they have to actually load your page and your analytics script to work.
This is a more traditional filter. You make a list of the spam hostnames you see in your analytics. Go to Acquisition – All Traffic – Referrals and add a secondary dimension as hostname. This will take some manual pruning. Go through the source list and identify sources where the source itself is a spam domain and the hostname is your domain. If the hostname is another domain, don’t worry; that data will be filtered by the first filter we created.
For example, a filter list might look like semalt\.com|buttons-for-website\.com to filter out those two spam domains. You will have to identify any spam domains you want filtered, but make sure they actually are spam domains. The problem with this kind of filter is the potential for misuse. If you filter a legitimate site, it kills a potentially valuable link. The catch is that verifying a domain means visiting it, and that’s exactly what the spammer wants. I recommend doing your checking in an incognito tab with adblock and noscript enabled, cranked up to maximum security.
Follow the same steps to create and test the filter as above, but this time set it to the “exclude” type rather than include, and use the field “Referral” instead of hostname. Put your list in the filter pattern and save to test.
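The exclude pattern works the same way as the include pattern, just inverted: a match means the hit gets dropped. A quick check with the two example spam domains from above (again, swap in your own list):

```python
import re

# Example exclude-filter pattern built from known spam sources,
# dots escaped and entries joined by | just like the include list.
SPAM_SOURCES = re.compile(r"semalt\.com|buttons-for-website\.com")

# The exclude filter drops any hit whose source matches the pattern.
def is_spam_source(source):
    return bool(SPAM_SOURCES.search(source))

print(is_spam_source("buttons-for-website.com"))  # → True, filtered out
print(is_spam_source("news.ycombinator.com"))     # → False, kept
```

Run any referrer you’re unsure about through a check like this before adding it, so you don’t accidentally exclude a legitimate source.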
The biggest problem with this filter, unfortunately, is that it only works as long as you keep up with it. As spam domains are dropped and others come up, your filter becomes out of date. You need to keep an eye out for new spam domains and add them to your filter, to keep it operating at peak efficiency.
This is a filter that chases the problem, but you can never catch up to it. You can’t add spam domains before they hit you. Or can you? Take a look at the top of this post. The folks over at Analytics Edge keep some filters up to date every month or so, and you can use them to pre-emptively filter spammers before they hit you.
Blocking out these spammers before they hit you is a matter for .htaccess blocking. That’s an entirely different deal, though, and it only works if your server is running Apache. If you meet those criteria, and are comfortable creating conditions in .htaccess, you can follow these steps to add spammers from your filter list to the block list as well.
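As a sketch, an .htaccess rule set that blocks known spam crawlers by their Referer header might look like the following, using the same two example domains as the filter above (substitute your own list). Note that this only helps against bots that actually request your pages; ghost referrals never touch your server, so .htaccess can’t stop those.

```apache
# Requires Apache with mod_rewrite enabled.
RewriteEngine On
# Match a spam domain in the Referer header, case-insensitively.
RewriteCond %{HTTP_REFERER} semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} buttons-for-website\.com [NC]
# Return 403 Forbidden for matching requests.
RewriteRule .* - [F]
```

Keep this list in sync with your Analytics filter list so the two defenses cover the same domains.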
When all is said and done, you should have a much cleaner, more spam-free Google Analytics report. You have to keep up with it, of course, but as long as you do so, you’re good to go.