When we heard that Google had unleashed a new algorithm in the United States to battle content farms, we were cautiously optimistic. Content farms, which bet they can make more money on advertisements than they spend producing very low-quality stories, had come to dominate the Internet's long tail.
But I've had my doubts that Google's machines could weed out these content farms. What signals would allow them to distinguish between high- and low-quality writing, especially considering that humans are only decent at it?
Luckily, Google has gifted us a chance to do some side-by-side comparisons because they're rolling out the new-and-improved algorithm in the United States first. So, we did two searches for the phrase "drywall dust," figuring it was just random enough. One we executed in the standard way, presumably using the new algorithm, and the other we routed through a proxy server that made it look like we were coming from India, presumably using the old algorithm.
And I have to say: Wow, the new algorithm yielded far superior results.
Granted, this is just one search for "drywall dust," but if this is even remotely indicative of how well the new algorithm works, we're all going to be tremendously impressed. The search via India led to seven sites that were producing low-quality or aggregated content, a photo of someone covered in dust, and a blog about an individual's remodel. The new algorithm search yielded very different results. Not only were there fewer content farms, but two specialty sites and five fora made the list, as well as a Centers for Disease Control page on the dangers of drywall dust. Having clicked through all 20 links, I can assure you that the information delivered by the new algorithm is much, much better.
Let us know if you have similar experiences with other searches. We've been trying out other strings and the pattern appears to hold. We're seeing fewer content farms and more fora and news websites. For example, check out: "is botox safe" with the old algorithm and the new algorithm. In the latter, I counted five pages from what most would call respectable news sources. In the former, only three made the cut.