The state of web spidering for spam emails

Since there haven't been any new notable organizations spamming the blog comments here in the last couple of weeks (although there have been quite a number of repeat offenders), I'm going to pass along some observations about the good ol' fashioned email variety of spam.

I have a long history of fighting spam emails. I won't get into the details, but I was on board with all the mainstream anti-spam efforts from the early days. Then something happened in 2005, which I also won't get into, that made me realize that a lot of the anti-spam people weren't much more than power-hungry goons who themselves were more interested in profiting from spam than eliminating it.

At the time, my main address was getting 5000+ spam/day. So I did the Impossibly Stupid thing of taking a step back from all the filtering and blacklisting and reporting and all the other machinery that has been thrown at the problem of email spam, asking myself instead what I, as a lone individual, could do to keep my inbox clean.

I now get maybe 1 spam/week. I do no filtering. I maintain no blacklists. My server only sees that one email, so there isn't much need to do anything fancy after the fact. I'm not really hiding my contact info, either. I even provide a clickable email link on web pages without a hint of JavaScript obfuscation. I did think up and use some new techniques to choke the flow, but this post isn't really about detailing them.

Instead, because my inbox was essentially cleared of spam for over a year, I decided to start an experiment. In November of 2007, I put up a web page on the corporate site that had a unique email address link in the clear. The question in my mind at the time was "Is web spidering even done to collect email addresses anymore?"

You see, the trickle of spam I was getting in my "real" email was easily traced by my new techniques. 99% of it was from Usenet posts; if I started using an invalid address there I'd essentially get no spam at all. Nothing was coming in via any web site I was on, but I wasn't certain if that was directly due to things I had done. So I put up an email address free and clear. Very retro!
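For readers wondering how spam can be traced back to where an address was picked up, here is a minimal Python sketch of one common approach: publish a different tagged address in each place, then look up the recipient of any spam that arrives. This is not necessarily the technique used here, which the post deliberately leaves undescribed. The function names, the `+tag` address format, the `secret` value, and the `example.com` addresses are all assumptions made for illustration.

```python
import hashlib


def tagged_address(local, domain, source, secret="change-me"):
    """Return an address unique to `source` (e.g. 'usenet' or a page path).

    Hypothetical helper: the tag is a short hash, so the address does not
    reveal the source label to whoever harvests it.
    """
    digest = hashlib.sha256(f"{secret}:{source}".encode()).hexdigest()[:8]
    return f"{local}+{digest}@{domain}".lower()


def build_lookup(local, domain, sources, secret="change-me"):
    """Map every published address back to the place it was published."""
    return {tagged_address(local, domain, s, secret): s for s in sources}


def trace(recipient, lookup):
    """Return where a spammed address was published, or 'unknown'."""
    return lookup.get(recipient.lower(), "unknown")


# Assumed example sources: one Usenet signature and one buried web page.
SOURCES = ["usenet", "web:/about/team/contact"]
LOOKUP = build_lookup("me", "example.com", SOURCES)
```

Calling `trace()` with the recipient of an incoming spam then says which publication leaked it, provided the mail server accepts tagged addresses like these, which is itself an assumption about the setup.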

Just now, 2 years and 2 months later, I saw the first spam to that email address. A couple more came in after that, and I'm going to keep an eye on whether or not the volume starts to take off. But the conclusion is pretty clear: there seems to be very little need to make people browsing your web site jump through a lot of hoops to get your email, because that doesn't appear to be how spammers are finding you these days.

Caveats abound, to be sure, but that's what I'm seeing. One obvious factor is that the web has gotten so large that it no longer makes sense to spider it all just to scrape out a new email address or two. It may be that they just hit the index page of most sites (my test email address was buried at least 3 clicks deep). I'm even going to test that out by making my main contact to the right a clear email address. Stay tuned for progress reports. If they're not even doing that small amount of spidering, you might have a long wait . . .
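To make the "3 clicks deep" guess concrete, here is a rough Python sketch of a depth-limited harvester run against a tiny in-memory site. It is only a toy model of what a spammer's spider might do, not a description of any real one. The `SITE` pages, the `harvest` function, and the `test@example.com` address are invented for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect mailto: addresses and ordinary links from anchor tags."""

    def __init__(self):
        super().__init__()
        self.emails = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query part.
            self.emails.add(href[len("mailto:"):].split("?")[0])
        elif href:
            self.links.append(href)


def harvest(pages, start, max_depth):
    """Breadth-first crawl of `pages` (path -> HTML), at most `max_depth` clicks from `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = set()
    while queue:
        path, depth = queue.popleft()
        parser = LinkParser()
        parser.feed(pages.get(path, ""))
        found |= parser.emails
        if depth < max_depth:
            for link in parser.links:
                if link in pages and link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return found


# Hypothetical site with the test address buried three clicks from the index.
SITE = {
    "/": '<a href="/about">About</a>',
    "/about": '<a href="/about/team">Team</a>',
    "/about/team": '<a href="/about/team/contact">Contact</a>',
    "/about/team/contact": '<a href="mailto:test@example.com">Email</a>',
}
```

With this layout, `harvest(SITE, "/", 0)` looks only at the index page and finds nothing, while `harvest(SITE, "/", 3)` reaches the contact page and picks up the address. A spider that stops at the index would behave like the first call.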