The Webcrawling Robot Swarm War is On!

Submitted by
Style Pass
2021-07-07 11:30:10


I have recently become more aggressive about blocking webcrawling robots from accessing my website, for two reasons. First, I self-host my website on a webserver in my physical possession, and I have a limited choice of residential Internet connection plans. I do not want to pay a higher price for a commercial Internet connection with more bandwidth. After all, I am a Cheapskate. Or, Ich bin ein Cheapskate. As a result, I have only a small amount of upload bandwidth. I would like to reserve that bandwidth for people who want to read my articles, not for companies making money by gathering data from my website.

My second reason for blocking robots is that I want to accurately count the number of real people who are reading my articles. I do that with my own PHP page-view-counting script that is called from the bottom of each article as it is viewed by a visitor to my website. Blocking robots prevents my script from incrementing page counters when real people are not involved. If I cannot block all robots, I want to at least be able to identify them to prevent my script from counting robot-generated page views as being from real people. I occasionally use an offline web analytics tool that can--to an extent--separate real people from robots, but I would like to have an accurate count all the time.
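To make the counting idea concrete, here is a minimal sketch of a page-view counter that skips requests that look like robots. It is not my PHP script; it is a Python illustration, and the names `BOT_PATTERN`, `looks_like_robot`, and `record_view`, the substring list, and the example page path are all assumptions made up for this example. It only catches robots that identify themselves in the User-Agent header, so a robot that pretends to be an ordinary browser would still be counted.

```python
import re
from collections import Counter

# Hypothetical pattern: substrings that many self-identifying crawlers
# include in their User-Agent header. A real list would need to be longer.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def looks_like_robot(user_agent):
    """Return True when the User-Agent is missing or matches BOT_PATTERN."""
    if not user_agent:
        # Assumption: a request with no User-Agent is treated as a robot.
        return True
    return bool(BOT_PATTERN.search(user_agent))


def record_view(counters, page, user_agent):
    """Increment the counter for `page` only if the request does not look like a robot."""
    if looks_like_robot(user_agent):
        return False
    counters[page] += 1
    return True


page_views = Counter()

# A request with a browser-style User-Agent is counted.
record_view(page_views, "/articles/example.html",
            "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0")

# A self-identified crawler is not counted.
record_view(page_views, "/articles/example.html",
            "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
```

After these two calls, the counter for the example page holds one view, because only the browser-style request was counted.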
