My website is a fun project for me. I just love working on it and watching it grow day by day.
Speaking of watching it grow, I am also someone who actually reads web server logs just to see what's going on. One thing I am particularly interested in is the HTTP referrers. After all, which website owner would not want to know where his visitors are coming from? Curiously, some of mine seem to be coming from porn websites, "make money fast" blogs and other shady places.
Being linked from an adult site seems rather odd. Even more so when, according to the HTTP referrer, that link supposedly lives on their front page and points directly to mine.
Something like this just begs to be investigated, especially since it's also a nice excuse for having to look at boobs. So, when I took a look at the boobs page (source) in question, I found not only that the expected link was missing, but also that it would not have made any sense for it to be there in the first place. There is certainly something fishy about this!
Of course, as everybody knows, an HTTP referrer can easily be spoofed, and a lot of people even do it to protect their online privacy. That's fair and square. But why would someone who is concerned about privacy lie by stating that he came from an adult website? That seems rather counterproductive and illogical.
OK, time for a closer look at the web server's log. What browser were those suspicious visitors using? Some identified themselves as "\xef\xbb\xbfMozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:220.127.116.11) Gecko/20060728 Firefox/18.104.22.168". The escape sequence at the beginning of the string looks quite strange: those three bytes are a UTF-8 byte order mark, which has no business being in a user agent. It's certainly not from an official release of Mozilla Firefox. Also, only 231 bytes were transferred and no other files were requested. This just smells like some kind of bot that only does HTTP HEAD requests and tries to mask itself as Firefox.
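If you want to check your own logs for the same visitor, here is a minimal sketch in Python. It assumes the log is in the common "combined" format and that the server writes the stray bytes as the literal text \xef\xbb\xbf, the way they showed up in my log. The regular expression, the function name and the two sample lines are made up for illustration, so adjust them to whatever your server actually writes.

```python
import re

# Assumed layout (combined log format):
# ip ident user [time] "request" status bytes "referrer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# The UTF-8 byte order mark as it appears in the log: escaped, literal text.
ESCAPED_BOM = r"\xef\xbb\xbf"


def suspicious_hits(lines):
    """Return the parsed fields of every request whose user agent starts with the escaped BOM."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("agent").startswith(ESCAPED_BOM):
            hits.append(match.groupdict())
    return hits


# Invented sample lines, using documentation-only IP addresses and hostnames.
sample = [
    r'203.0.113.7 - - [01/Jan/2007:12:00:00 +0000] "HEAD / HTTP/1.1" 200 231 "http://shady.example/" "\xef\xbb\xbfMozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Gecko/20060728"',
    r'198.51.100.9 - - [01/Jan/2007:12:00:05 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux)"',
]

for hit in suspicious_hits(sample):
    print(hit["ip"], hit["request"], hit["referrer"])
```

To run it against a real log, pass an open file instead of the sample list, e.g. `suspicious_hits(open("access.log"))`. The pattern is deliberately simple and will skip lines whose fields contain escaped double quotes.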
Maybe Google knows a bit more about this user agent? Wow! Some 5k hits and most of them being unprotected web server statistics or web server logs! Some of these access logs even show whole series of successive requests from the same IP, claiming to have been referred by a different site each time.
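That last pattern, one IP claiming a different referring site on every request, is easy to look for in your own log. The following is only a rough sketch: it assumes you have already pulled (IP, referrer) pairs out of the log, and both the function name and the threshold of three distinct hosts are arbitrary choices of mine, not anything official.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def referrer_hoppers(records, min_hosts=3):
    """Map each IP to the referrer hosts it claimed, keeping only IPs with at least min_hosts distinct hosts."""
    hosts_by_ip = defaultdict(set)
    for ip, referrer in records:
        host = urlsplit(referrer).hostname  # None for "-" or an empty referrer
        if host:
            hosts_by_ip[ip].add(host)
    return {
        ip: sorted(hosts)
        for ip, hosts in hosts_by_ip.items()
        if len(hosts) >= min_hosts
    }


# Invented sample data, using documentation-only IP addresses and hostnames.
records = [
    ("203.0.113.7", "http://a.example/"),
    ("203.0.113.7", "http://b.example/"),
    ("203.0.113.7", "http://c.example/"),
    ("198.51.100.9", "-"),
    ("198.51.100.9", "http://a.example/page"),
]

print(referrer_hoppers(records))
```

A real visitor can of course arrive from several different sites over time, so treat the result as a list of candidates to look at, not as proof.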
Seems like some blackhat SEO is trying to collect backlinks big time by having a spider crawl the entire web in the hope of eventually hitting a website that (unintentionally) publishes its statistics in HTML format, with the referrers turned into followed links. A mistake that is apparently easily made by inexperienced webmasters in three simple steps:
- Have directory listing enabled in the web server.
- Have webalizer (or some similar tool) dump its statistics in HTML format to some subdirectory below the document root, so they can conveniently be accessed from a web browser, but don't password protect that directory.
- Buy a used and indexed domain. Have it point at the web server before adding an index.html file, giving Google a chance to discover the statistics directory.
If you happen to have been hit by this scheme as well, I'd suggest taking immediate steps to remove your statistics from the Google index.
Sites that promote themselves by cheating eventually end up being considered a "bad neighborhood" by search engines. Linking to such a bad neighborhood can negatively affect the ranking of your own website.