Testing Google's New Algorithm: It Really Is Better

When we heard that Google had unleashed a new algorithm in the United States to battle content farms, we were cautiously optimistic. Content farms, which bet they can make more money from advertisements than they spend producing very low-quality stories, had come to dominate the Internet's long tail.

But I've had my doubts that Google's machines could weed out these content farms. What signals would allow them to distinguish between high- and low-quality writing, especially considering that humans are only decent at it?
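
To make the question concrete, here is a toy Python sketch of the sort of shallow signals a machine could compute. These heuristics are pure guesswork on my part, for illustration only; Google has not disclosed what its classifier actually measures.

    import re

    # Toy "content quality" signals. Illustrative guesses only; not
    # Google's actual (undisclosed) features.
    def quality_signals(html_text, visible_text):
        words = visible_text.split()
        word_count = len(words)
        # Farms often recycle phrasing, so vocabulary diversity runs low.
        unique_ratio = len(set(w.lower() for w in words)) / max(word_count, 1)
        # Ad-heavy pages tend to carry many links per word of prose.
        link_count = len(re.findall(r"<a\s", html_text, flags=re.IGNORECASE))
        links_per_100_words = 100.0 * link_count / max(word_count, 1)
        return {
            "word_count": word_count,
            "unique_word_ratio": unique_ratio,
            "links_per_100_words": links_per_100_words,
        }

Any one of these is trivially gamed, which is exactly why the problem is hard.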

Luckily, Google has gifted us a chance to do some side-by-side comparisons because they're rolling out the new-and-improved algorithm in the United States first. So, we did two searches for the phrase "drywall dust," figuring it was just random enough. One we executed in the standard way, presumably using the new algorithm, and the other we routed through a proxy server that made it look like we were coming from India, presumably using the old algorithm.
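
The mechanics are simple enough to sketch. Something like the following Python, using the requests library, reproduces the two queries; the proxy address below is a placeholder for any HTTP proxy located in India, and the specific proxy we used is beside the point.

    import requests

    QUERY = "drywall dust"
    URL = "http://www.google.com/search"
    HEADERS = {"User-Agent": "Mozilla/5.0"}

    # Standard request: served from our actual (U.S.) location,
    # presumably by the new algorithm.
    new_results = requests.get(URL, params={"q": QUERY}, headers=HEADERS)

    # Same request routed through an India-based proxy (placeholder
    # address), presumably still served by the old algorithm.
    proxies = {"http": "http://203.0.113.10:8080",
               "https": "http://203.0.113.10:8080"}
    old_results = requests.get(URL, params={"q": QUERY},
                               headers=HEADERS, proxies=proxies)

    # Compare the two result pages however you like from here.
    print(new_results.url, old_results.url)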

And I have to say: Wow, the new algorithm yielded far superior results.

Granted, this is just one search for "drywall dust," but if this is even remotely indicative of how well the new algorithm works, we're all going to be tremendously impressed. The search via India led to seven sites that were producing low-quality or aggregated content, a photo of someone covered in dust, and a blog about an individual's remodel. The new algorithm search yielded very different results. Not only were there fewer content farms, but two specialty sites and five fora made the list, as well as a Centers for Disease Control page on the dangers of drywall dust. Having clicked through all 20 links, I can assure you that the information delivered by the new algorithm is much, much better.

[Image: side-by-side comparison of the two sets of search results]

Let us know if you have a similar experience with other searches. We've been trying out other strings, and the pattern appears to hold. We're seeing fewer content farms and more fora and news websites. For example, check out: "is botox safe" with the old algorithm and the new algorithm. In the latter, I counted five pages from what most would call respectable news sources. In the former, only three made the cut.

Google Changes Its Algorithm

Chatter blizzard! There is a flurry of commentary about Google’s change to cope with outfits that generate content to attract traffic, get a high Google ranking, and deliver information to users! You can read the Google explanation in “Finding More High-Quality Sites in Search” and learn about the tweaks. I found this passage interesting:

We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.

Google faces increasing scrutiny for its display of content from some European Web sites. In fact, one of the companies affected has filed an antitrust complaint against Google. You can read about the 1PlusV matter and the legal information site EJustice at this link (at least for a while; news has a tendency to disappear these days).


Why did I find this passage interesting?

Well, it seems that when Google makes a fix, some sites go up or down in the results list. That is interesting because, as I understand the 1PlusV issue, the site arbitrarily disappeared and then reappeared. On one hand, Google’s framing suggests the rankings shift algorithmically, without human intervention. On the other hand, if 1PlusV is correct, human intervention works pretty well.

Which is it? An algorithm that adapts, or a human or two doing their thing, independently or as the fingers of a committee?

I don’t know. My interest in how Google indexes Web sites diminished when I realized that Google’s results had been deteriorating over the last few years. Now my queries are fairly specialized, and most of the information I need appears in third-party sources. Google’s index, for me, is useful, but it is now just another click in a series of services I must use to locate information.

Google Cloud Connect: A Microsoft Office Plugin for Syncing with Google Docs

Google Cloud Connect is a Microsoft Office plugin released today by Google. It has been available to testers since November, but it is now generally available. It syncs a user's Office documents with their Google Docs and adds a toolbar for sharing documents right inside Office. We've been asking for offline access to Google Docs for years now, and this is a step in that direction.

Google Cloud Connect is available for Windows XP, Windows Vista and Windows 7. Office 2003, 2007 and 2010 are all supported.

Several other services sync Google Docs with offline folders, including Gladinet, Insync, Memeo, Offisync and Syncplicity. We looked at some of these here.

Google has slowly but surely been turning Google Docs into the mythical Gdrive, and we've been tracking that progression.

Google also announced today its 90-Day Appsperience program, a way for those curious about Google Apps to try it out for a "nominal fee" for 90 days.

 

Google Search Results Get More Social

Google is taking its biggest step yet toward making search results more social.

Though Google remains many people’s front door to the Web, people have increasingly been turning to social networking sites like Facebook and Twitter for shopping tips, reading recommendations or travel information from people they know. Google said Thursday that its search results would now incorporate much more of that information.

“Relevance isn’t just about pages — it’s also about relationships,” Mike Cassidy, a Google product management director, and Matthew Kulick, a product manager, wrote in a company blog post announcing the new features.

Google has had a version of social search since 2009. People could link their Google profiles to LinkedIn and Twitter, for instance, and posts their friends had created would show up at the bottom of search results. But only a small percentage of people did this, and the chances that one of your LinkedIn contacts has written a blog post on a city you’re planning to visit are relatively slim.

Now, links to posts from friends across the Web, including on Twitter, YouTube and Flickr, will be incorporated into search results, not hidden at the bottom of the page, with a note and a picture telling you the post is from your friend. So if you are thinking about traveling to a beach in Mexico that a friend has visited, a link to her blog could be a top result.

Google will also let you know if a friend of yours has shared a particular link on the Web. This is a big change, because before, Google would only highlight material that acquaintances actually created.
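
Mechanically, the annotation step is easy to picture. Here is a minimal Python sketch, with invented data structures, of how publicly shared links might be matched against a result list; Google has not published its actual implementation.

    # Invented example: annotate ranked results with the friends who
    # publicly shared each link. Not Google's real code or data model.
    def annotate_results(results, shares_by_friend):
        # Invert the mapping: url -> list of friends who shared it.
        shared_by = {}
        for friend, urls in shares_by_friend.items():
            for url in urls:
                shared_by.setdefault(url, []).append(friend)

        annotated = []
        for url in results:
            friends = shared_by.get(url, [])
            note = " (shared by " + ", ".join(friends) + ")" if friends else ""
            annotated.append(url + note)
        return annotated

    results = ["http://example.com/mexico-beaches", "http://example.org/hotels"]
    shares = {"Ana": ["http://example.com/mexico-beaches"]}
    print("\n".join(annotate_results(results, shares)))

The interesting ranking question, which this sketch sidesteps, is how much a friend’s share should lift a result, not just how to label it.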

You might be more likely to read an essay on a topic related to your job if a professional contact on Twitter shared it, for instance. That is a point that many Web publishers, including The Huffington Post and Forbes.com, have taken to heart.

Finally, Google users will be able to privately link their social networking accounts to their Google profiles. Before, those connections were made public, which might have discouraged some users. People will see social results only if they are logged in to their Google accounts and have connected their social networking accounts.

Notably, there is no mention of Facebook in Google’s announcement, though the company blog post says social results will appear only “if someone you’re connected to has publicly shared a link.” Facebook posts are generally private, and Facebook has made it difficult for Google to import social information, as several Google executives have complained in the past.

The Syndication Hustle

I was just reading a blog post by John Heithaus, Chief Marketing Officer of MRIS, titled “Reality Check Ahead: Data Mining and the Implications for Real Estate Professionals.” He does a great job of outlining the implications syndication has for the real estate business.

Unfortunately, it seems nobody cares.

To me, and others, it’s clear that the risks of syndication far outweigh the rewards. Yet brokers continue to sign agreements they never read, with fine print they never see. Granted, there are some best practices to follow, such as making sure the syndicator’s site has much less information than yours and making sure you understand what rights to the data you are giving away.

But with sites like Facebook and Google, people have become accustomed to surrendering their personal/business data for “free” products and services.

The frustration is that many MLS professionals understand the dangers of listing syndication but are powerless to persuade their board members to stop, look and listen. Bob Hale, at the recent Inman Connect conference, did an excellent job of listing the battles the MLS/real estate industry has lost in recent years, citing “agent ratings” as the latest defeat. And if Bob Hale can’t get anything done, can anyone?

 

Reality Check Ahead: Data Mining and the Implications for Real Estate Professionals

MLS is a 100-year-old institution that expertly aggregates and houses most, if not all, of real estate’s most critical data. Today, our data is being leveraged, sourced, scraped, licensed and syndicated by a grand assortment of players, partners and members. It’s being utilized in ways never imagined just a decade ago. Or, for that matter, six months ago.

The result: a plethora of competitive, strategic, financial and security-based issues have surfaced that challenge every MLS, as well as every single one of our members/customers.

I think about this all the time. During my recent visit with my son KB – a college junior – he told me about how Google recently came to his campus offering everyone free email, voice mail, Docs (to replace MS Office) and data storage – an impressive list of free services for all.

I asked him why this publicly traded company would give away its products for free. Despite his soaring IQ and studies in information systems technology, he couldn’t come up with an answer.

Searching Google on my laptop, I presented KB with the following Google customer email (September 2009) that read: “We wanted to let you know about some important changes … in a few weeks, documents, spreadsheets and presentations that have been explicitly published outside your organization and are linked to or from a public website will be crawled and indexed, which means they can appear in search results you see on Google.com and other search engines.” Note: once data is available in Google searches, Google’s business model calls for selling advertising around those search results.

Bear in mind this refers to published docs and not those labeled as private – a setting within Google Docs of which not all users are aware.

I also presented him with the specific EULA (End-User License Agreement) language stating that a user grants a “perpetual, irrevocable, royalty free license to the content for certain purposes (republication, publication, adaptation, distribution), extending to the provision of syndicated services and to use such content in provision of those services.”

 

I recounted for KB how back in March of 2010, we learned in the national news that: “A confidential, seven-page Google Inc. “vision statement” shows the information-age giant is in a deep round of soul-searching over a basic question: How far should it go in profiting from its crown jewels—the vast trove of data it possesses about people’s activities?”

[Chart: consumer concern over online behavioral tracking. Source: Wall Street Journal, August 10, 2010]

The chart shows that nearly 85% of respondents are concerned about advertisers tracking their online behavior.

Then, a Wall Street Journal article titled “What They Know” was published, discussing how companies are developing “digital fingerprint” technology to track our use of individual computers, mobile devices and TV set-top boxes so they can sell the data to advertisers. It appears that each device broadcasts a unique identification number that computer servers recognize; that identifier can then be stored in a database and later analyzed for monetization. The accompanying 3-minute video is a must-see!
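
To make the idea concrete, here is a minimal Python sketch of how such a fingerprint could be derived by hashing a handful of device attributes into a stable identifier. The attribute list is illustrative; real fingerprinting systems draw on far more signals.

    import hashlib

    # Hash a device's observable attributes into a stable ID that a
    # server can store and match on later visits. Illustrative only.
    def device_fingerprint(attributes):
        # Sort keys so the same attributes always hash identically.
        canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    fp = device_fingerprint({
        "user_agent": "Mozilla/5.0 (Windows NT 6.1)",
        "screen": "1366x768",
        "timezone": "UTC-5",
        "fonts": "Arial,Calibri,Times",
    })
    print(fp)  # a stable 16-hex-character ID for this device profile

No single attribute identifies you, but the combination very often does.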

By the way, they call this practice “Human Barcoding.” KB began to squirm. As we all should.

 

Data. Security. And real estate

So what do the “innovative” data mining and monetization methods now in use by Google and others mean for real estate – specifically, for the data aggregated by an MLS and then shared around the globe?

We all must first grasp what happens to listing data when it’s collected and syndicated into “the cloud,” as well as the human and transactional interactions that follow from start to finish (and beyond, actually).

Second, we need to understand how business intelligence and analytics are being applied to the data generated by real estate transactions today. If the data is being monetized without the knowledge and permission of its rightful owner, then agreements potentially need to be negotiated (or renegotiated) and modified to get in step with today’s (and tomorrow’s) inevitable ways of doing business. I’m not in any way opposed to data mining per se; the issue at hand is fair compensation for the data on which it is based.

Here’s why the latest developments regarding Google (and others) are vitally important:

 

  • The world of leveraging digital information is changing very rapidly. As businesses push harder and deeper in their quest to monetize data, information, bits/bytes and mouse clicks, we must establish clear and informed consent on who exactly owns the data, who should control it and how it should be monetized. Protecting OUR “crown jewels”, if you will.
  • What do you know about “Human Barcoding”? It’s time for industry leaders to research this new phenomenon and begin to establish the basis for an industry position as it pertains to residential real estate.
  • How do we, as an industry, determine the real value of data beyond the property-centric context? As true business intelligence and data mining progress in our industry, we need “comps” to build upon to derive a valuation model.
  • What exactly is the MLS’s role? Are we the “stewards” of the data (on behalf of our customers) that emanates from the property record and the subsequent transaction and electronic interactions between all the parties connected to it?  How should the MLS industry confront the challenge?

We all certainly remember when the national consumer portals planted their flag(s) on this industry and, by association, MLS territory. Their rationale then was that they would help drive “eyeballs” and traffic to the inventory. Indeed they have. But, looking back, it all came with a pretty steep price tag.

For example, referral fees were subsequently replaced with advertising revenues that more often than not started chipping away at the edges of the broker’s affiliated business models (mortgage, insurance, etc.). Now, as a result, the margins of the business are perilously thin from a broker’s perspective.

The MLS has its roots in a business created to facilitate the fair distribution of commissions and compensation among brokers. It’s safe to say, dear Toto, that we are not in Kansas anymore. Given a digital landscape where value can be derived in so many unique ways, and where the motives of others working to increase the value of the asset are potentially suspect, it’s critical that we convene right now to assert an intellectual lead on what is happening here, or at least make the conscious decision to step aside.

I’m sure there are many other questions and reasons why this is “mission critical” to us. But what I’ve offered, with the help of several really smart folks in the industry, provides a good starting point. We welcome all industry commentators on this topic. Thanks in advance for sharing ….

John L. Heithaus Chief Marketing Officer, MRIS (john.heithaus@mris.net)

P.S. – a “tip of the hat” to Greg Robertson of Vendor Alley for starting us on this path with his excellent post “Inside Trulia’s Boiler Room”*. I also benefited mightily from the comments of David Charron of MRIS, Marilyn Wilson of the WAV Group and Marc Davison of 1000watt Consulting, and I extend my appreciation to them for sharing their perspectives.

* After this story ran, the YouTube video interview with a Trulia staffer was made “private” and is now inaccessible. Vendor Alley’s analysis of the video provides an excellent overview of the situation.