Testing Google's New Algorithm: It Really Is Better

When we heard that Google had unleashed a new algorithm in the United States to battle content farms, we were cautiously optimistic. Content farms, which bet they can make more money on advertisements than they spend producing very low-quality stories, had come to dominate the Internet's long tail.

But I've had my doubts that Google's machines could weed out these content farms. What signals would allow them to distinguish between high- and low-quality writing? Especially considering that humans are only decent at it.

Luckily, Google has gifted us a chance to do some side-by-side comparisons because they're rolling out the new-and-improved algorithm in the United States first. So, we did two searches for the phrase "drywall dust," figuring it was just random enough. One we executed in the standard way, presumably using the new algorithm, and the other we routed through a proxy server that made it look like we were coming from India, presumably using the old algorithm.
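For anyone who wants to try the same experiment, here is a minimal sketch of the comparison, assuming you have access to an HTTP proxy in a country that is still on the old algorithm. The proxy address below is a placeholder, and parsing the result pages is left to the reader.

```python
# Rough sketch of the comparison described above: run the same query twice,
# once directly (new algorithm) and once through a proxy in a country still
# on the old algorithm, then compare the domains that come back.
# The proxy address is a placeholder, not a real server.
import requests

QUERY = "drywall dust"
PROXY_IN_INDIA = {
    "http": "http://proxy.example.in:8080",   # hypothetical proxy
    "https": "http://proxy.example.in:8080",
}

def fetch_results_page(query, proxies=None):
    """Fetch a raw results page; inspect the result URLs by hand or with an HTML parser."""
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": query},
        proxies=proxies,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text

new_algo_page = fetch_results_page(QUERY)                   # routed normally
old_algo_page = fetch_results_page(QUERY, PROXY_IN_INDIA)   # routed via India
```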

And I have to say: Wow, the new algorithm yielded far superior results.

Granted, this is just one search for "drywall dust," but if this is even remotely indicative of how well the new algorithm works, we're all going to be tremendously impressed. The search via India led to seven sites that were producing low-quality or aggregated content, a photo of someone covered in dust, and a blog about an individual's remodel. The new algorithm search yielded very different results. Not only were there fewer content farms, but two specialty sites and five fora made the list, as well as a Centers for Disease Control page on the dangers of drywall dust. Having clicked through all 20 links, I can assure you that the information delivered by the new algorithm is much, much better.

[Image: side-by-side comparison of the two sets of search results]

Let us know if you have a similar experience with other searches. We've been trying out other strings, and the pattern appears to hold. We're seeing fewer content farms and more fora and news websites. For example, check out "is botox safe" with the old algorithm and the new algorithm. In the latter, I counted five pages from what most would call respectable news sources. In the former, only three made the cut.

Google Changes Its Algorithm

Chatter blizzard! There is a flurry of commentary about Google’s change to cope with outfits that generate content to attract traffic, get a high Google ranking, and deliver information to users! You can read the Google explanation in “Finding More High-Quality Sites in Search” and learn about the tweaks. I found this passage interesting:

We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.

Google faces increasing scrutiny for its display of content from some European Web sites. In fact, one of the companies affected has filed an antitrust complaint against Google. You can read about the 1PlusV matter and the legal information site EJustice at this link (at least for a while; news has a tendency to disappear these days).

[Image source: http://www.mentalamusement.com/our%20store/poker/casino_accessories.htm]

Why did I find this passage interesting?

Well, it seems that when Google makes a fix, some sites go up or down in the results list. Interesting, because as I understand the 1PlusV issue, the site arbitrarily disappeared and then reappeared. On one hand, human intervention doesn’t work very well. On the other hand, if 1PlusV is correct, human intervention does work pretty well.

Which is it? An algorithm that adapts, or a human or two doing their thing independently or as the fingers of a committee?

I don’t know. My interest in how Google indexed Web sites diminished when I realized that Google results were deteriorating over the last few years. Now my queries are fairly specialized, and most of the information I need appears in third-party sources. Google’s index, for me, is useful, but it is now just another click in a series of services I must use to locate information.

Google Search Results Get More Social

Google is taking its biggest step yet toward making search results more social.

Though Google remains many people’s front door to the Web, people have increasingly been turning to social networking sites like Facebook and Twitter for shopping tips, reading recommendations or travel information from people they know. Google said Thursday that its search results would now incorporate much more of that information.

“Relevance isn’t just about pages — it’s also about relationships,” Mike Cassidy, a Google product management director, and Matthew Kulick, a product manager, wrote in a company blog post announcing the new features.

Google has had a version of social search since 2009. People could link their Google profiles to LinkedIn and Twitter, for instance, and posts their friends had created would show up at the bottom of search results. But only a small percentage of people did this, and the chances that one of your LinkedIn contacts has written a blog post on a city you’re planning to visit are relatively slim.

Now, links to posts from friends across the Web, including on Twitter, YouTube and Flickr, will be incorporated into search results, not hidden at the bottom of the page, with a note and a picture telling you the post is from your friend. So if you are thinking about traveling to a beach in Mexico that a friend has visited, a link to her blog could be a top result.

Google will also let you know if a friend of yours has shared a particular link on the Web. This is a big change, because before, Google would only highlight material that acquaintances actually created.
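Purely as an illustration of the behavior described here (this is not Google’s implementation, and the data is invented), surfacing a friend’s share amounts to annotating any result whose URL appears in a map of links your connected contacts have publicly shared:

```python
# Toy illustration only: given a map of URL -> contacts who publicly shared
# it, attach a "shared by" note to matching search results.
friend_shares = {
    "http://example.com/mexico-beach-trip": ["Alice"],
    "http://example.com/botox-explainer": ["Bob", "Carol"],
}

def annotate(result_urls, shares):
    """Return each result URL with a note naming the contacts who shared it."""
    annotated = []
    for url in result_urls:
        who = shares.get(url, [])
        note = f" (shared by {', '.join(who)})" if who else ""
        annotated.append(url + note)
    return annotated

results = ["http://example.com/mexico-beach-trip",
           "http://example.com/unrelated-page"]
for line in annotate(results, friend_shares):
    print(line)
```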

You might be more likely to read an essay on a topic related to your job if a professional contact on Twitter shared it, for instance. That is a point that many Web publishers, including The Huffington Post and Forbes.com, have taken to heart.

Finally, Google users will be able to privately link their social networking accounts to their Google profiles. Before, those connections were made public, which might have discouraged some users. People will see social results only if they are logged in to their Google accounts and have connected their social networking accounts.

Notably, there is no mention of Facebook in Google’s announcement, though the company blog post says social results will appear only “if someone you’re connected to has publicly shared a link.” Facebook posts are generally private, and Facebook has made it difficult for Google to import social information, as several Google executives have complained in the past.

MLS Tesseract: A Listing Syndication Discussion

There has been a great deal of talk and writing lately about MLS/broker listing syndication. This isn't surprising, considering that one of the leading syndication providers, Threewide, was recently acquired by Move, Inc., operator of Realtor.com. At the same time, initiatives like those led by industry veterans Bud Fogel and Mike Meyers and by LPS's Ira Luntz (himself a veteran of syndication) suggest that a new model could be in the offing. Among commentators who have taken up the issue are MRIS Chief Marketing Officer John Heithaus, in an often-discussed post; Victor Lund of the WAV Group (you need to be an Inman subscriber to read that one); and Rob Hahn.

Elizabeth and I wrote a longish whitepaper on syndication back in 2008, which is still available on our firm's web site. That paper, written just before Elizabeth and I formed our firm together, focused on the role of the MLS and how MLSs should approach operational and legal issues; it assumed that MLSs would want to do syndication because at least some brokers wanted syndication. Most of the concerns we expressed then remain unaddressed in the industry. The current debate, it seems to me, is about whether brokers and MLSs should be doing syndication at all. It revisits the key assumption of our 2008 whitepaper.

I'd like to expend some effort and thought on this topic in the next few posts. This first one will define what I mean by "listing syndication," in order to distinguish it from other forms of listing data distribution and licensing; and it will discuss some reasons MLSs get involved in syndication. In the next post, we'll consider ways that MLSs get involved in syndication and some problems and issues. Then we'll take a look at the key underlying question: should brokers be sending listings to all these places in the first place, and what role should MLSs play in that decision?

“Syndication” defined

There is no official definition of “listing syndication.” There is no Platonic universal or form corresponding to listing syndication. So, we just need a practical definition that provides some scope to what we are talking about. For this summary, I will use the following definition: “listing syndication is the distribution in bulk of active real estate listings (listings currently available for sale), by or on behalf of the listing agent or listing broker, to sites that will advertise them on the web to consumers.”

We include each of the following things in this definition:

  • Distribution of listings by MLS through a listing syndicator, such as ThreeWide (ListHub) or Point2, to advertising sites.
  • Direct distribution of listings by MLS to advertising sites (including local newspaper web sites and national sites).
  • Distribution by a listing broker via a data feed (whether broker-internal or created by MLS on the broker’s behalf) to advertising sites.
  • Use by a broker or agent of a service that offers to take a bulk data feed and then distribute the listings to advertising sites.

I usually call web sites that advertise real estate listings “aggregators” (we used "commercial distributors" in the 2008 report). I usually refer to the recipients of data through syndication as “syndication channels.” As a result, a site like Zillow.com is both an aggregator and a syndication channel. I do not consider the following to be syndication (at least for purposes of this discussion), though they share some characteristics with it:

  • Services where agents manually load in their own listings, like vFlyer and Postlets. These sites generally do not take bulk data feeds.
  • Data licensing to RPR or CoreLogic under their current proposals. They are using off-market listings, in addition to active ones, and the applications they use them for are not advertising. Sending data to Move, Inc. for its Find application is not syndication because it includes off-markets; but see the discussion relating to that below.
  • A “back-office data feed” from MLS to the listing brokerage. A back-office feed often includes all the MLS listing data (from all brokers) and comes with a license for the brokerage to use the data internally for the core purposes of MLS and the freedom to use its own listings any way it pleases. Many MLSs provide such feeds to their participants to facilitate brokerage business activities. Thus, though MLS’s action here is not syndication, the brokerage might turn around and engage in syndication itself.

Why only active listings? We do not include off-market listings (listing records relating to properties not currently for sale) in our definition of syndication for two reasons:

  1. MLSs perceive the off-market listings as something different. Most MLSs recognize that very recent off-market activity in the MLS provides a very valuable resource, one not available elsewhere. Thus very few MLSs distribute off-market listings through typical syndication channels.
  2. Brokers perceive the off-market listings as something different. Brokers want their active listings advertised (and their sellers want it, too). But brokers rarely perceive value in having their off-market listings distributed. We are not acquainted with any brokerage firm that distributes its off-market listing data.

The value MLSs bring to syndication

Almost from the beginning of “listings on the Internet,” people have asked what role the MLS should play in getting listings out there. The short answer is efficiency:

  1. MLS already has all the listing content in one database. Every listing broker has already paid for that database to be created and maintained, and every such database has the capability to export listing data. In theory, at least, it should always be cheaper for MLS to ship brokers’ listings to a channel, because it requires fewer steps. The alternative is for MLS to supply each broker a data feed of its own listings; then each broker would have to set up a feed to each channel (or at least set up a feed to a syndicator who could reach the channels). That’s a lot more data feeds, IT staff hours, etc. (A rough sketch of this single-feed model appears after this list.)
  2. Many MLSs and traditional syndicators offer brokers ease of use. Syndicators like Point2 offer brokers a dashboard where they can click on the channels they want to receive their listings and click off the ones they don’t. In theory, this does not require the broker to perform research and due diligence on each channel; the syndicator or MLS has theoretically done that before presenting the option on the dashboard. (In practice, this may not be happening.)
  3. Syndication through MLS or a syndicator may give listing brokers more leverage. If a channel is getting the listings from many brokers in MLS through a data feed from MLS, the MLS may have leverage with the channel to get it to behave properly. If the MLS cuts off the data feed, the channel loses listings from all the brokers. Similarly, if a syndicator cuts off a feed to a channel, the channel loses the feed for all the MLSs working with the syndicator. A single broker, by comparison, usually does not have the volume of listings to exert leverage on the channels. Note that some channels (like Google, before it decided to stop accepting listings) did not necessarily react to that leverage anyway. Note also that just because MLSs and syndicators have this leverage, that does not mean they have actually used it (I'll discuss this in another post).
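Here is a rough sketch of the single-feed model described in the first point above. The record fields, listings and channel names are invented for illustration and are not drawn from any real MLS system.

```python
# Minimal sketch of the efficiency argument: the MLS holds every broker's
# listings in one database, so it can export one active-listings feed and
# reuse it for every channel, instead of each broker building its own feeds.
# All field names and data below are made up.
from dataclasses import dataclass

@dataclass
class Listing:
    mls_id: str
    broker: str
    address: str
    status: str  # "active", "pending", "sold", ...

MLS_DATABASE = [
    Listing("100", "Acme Realty", "12 Elm St", "active"),
    Listing("101", "Acme Realty", "9 Oak Ave", "sold"),
    Listing("102", "Blue Door Brokerage", "4 Pine Rd", "active"),
]

CHANNELS = ["zillow", "local-newspaper-site", "national-portal"]

def build_syndication_feed(listings):
    """Only active listings go out; off-market records stay home, as discussed above."""
    return [listing for listing in listings if listing.status == "active"]

# One export, reused for every channel the brokers have opted into.
feed = build_syndication_feed(MLS_DATABASE)
for channel in CHANNELS:
    print(f"send {len(feed)} active listings to {channel}")
```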

The resourceful broker problem

One important fact about syndication is what we call the “resourceful broker problem.” It’s not really a problem at all; it’s just competition. If an MLS does not syndicate listings on behalf of its brokers, some of the brokers will assume the costs and work associated with syndicating their own listings. This will give those brokers a competitive advantage in the market. Note that we don’t call this the “large broker problem.” Though the large brokers in markets are also often resourceful brokers, in many cases, smaller brokers also find the means to be resourceful. The MLS is confronted with its age-old problem, almost its nemesis: Choose between (a) delivering services at the lowest common denominator, drawing complaints from some brokers that MLS should be doing more to deliver efficiencies to all brokers; and (b) delivering efficient services to all brokers, drawing complaints from resourceful brokers that MLS is “leveling the playing field.”

Neither of these arguments is wholly right or wrong. But they appear in some form whenever MLSs consider offering services like syndication. The intensity of feeling about which path the MLS should take varies a great deal from MLS to MLS and often within the board room of a single MLS.

So, we've stipulated a definition for "syndication"; should we be including other things, or perhaps excluding something I've included here? And we've discussed why MLSs often believe they should be involved. I'm curious what your thoughts are about my efficiency arguments there. Next time, we'll consider some ways that MLSs do, and don't, syndicate. Following that, I'd like to spend a little time considering where brokers should be sending their listings and whether the MLS should be deciding for them.

Zillow Changes Back To The Old Map Search

A few weeks ago, we updated the home shopping search experience on Zillow with the intent to give our users better shopping tools to find the home of their dreams. We added a more usable map area, added keyword search to the list of filters, and also changed the format of the list so that the map stays in view when you scroll through the homes.

No one has ever accused us of being complacent; we are constantly working to improve the Zillow site, and we’re certainly willing to take risks and learn from our mistakes.

 

Firefox Takes Unusual Approach In Unveiling ‘Do Not Track’ Option

For all the talk among policymakers and the press about online privacy, it still isn’t clear how much average consumers are even aware of online ad tracking. Firefox, the browser of choice for a third of all internet users, is apparently looking to change that. The beta of the latest version of Firefox trumpets the new “Do Not Track” feature prominently, listing it, in large font, as the very first item on the “What’s New in Firefox 4” page. The move could increase the pressure on other browser companies, as well as advertisers, to beef up their own privacy options.

Mozilla announced months ago that it would put a Do Not Track option in the new version of Firefox—so in that sense, the release of the beta version isn’t a surprise. But what is unexpected is the headline “Opt Out of Ad Tracking” splashed across the company’s upgrade page.
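The upgrade page doesn’t spell out the mechanics, but the Do Not Track approach Mozilla has been discussing boils down to an extra HTTP request header that the browser sends with every request and that ad servers are asked to honor (the convention that later emerged is a DNT: 1 header). The sketch below, with a hypothetical ad-server handler, is illustrative only:

```python
# Sketch of the idea behind "Do Not Track": the browser adds a header to each
# request, and it is entirely up to the receiving server (here, a hypothetical
# ad-server handler) to honor it. Header name follows the later DNT convention.
def handle_ad_request(headers):
    """Return an ad response, skipping behavioral tracking if DNT is set."""
    do_not_track = headers.get("DNT") == "1"
    if do_not_track:
        return {"ad": "generic ad", "tracking_cookie": None}
    return {"ad": "targeted ad", "tracking_cookie": "uid=placeholder"}

# A browser with the option enabled would send something like:
print(handle_ad_request({"DNT": "1"}))  # generic ad, no tracking cookie
print(handle_ad_request({}))            # targeted ad, with tracking cookie
```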

[Screenshot of the Firefox upgrade page: “What’s New in Firefox 4 Beta,” with “Opt Out of Ad Tracking” listed first]

Reality Check Ahead: Data Mining and the Implications for Real Estate Professionals

MLS is a 100-year-old institution that expertly aggregates and houses most, if not all, of real estate’s most critical data. Today, our data is being leveraged, sourced, scraped, licensed and syndicated by a grand assortment of players, partners and members. It’s being utilized in ways never imagined just a decade ago. Or, for that matter, six months ago.

The result: a plethora of competitive, strategic, financial and security-based issues have surfaced that challenge every MLS, as well as every single one of our members/customers.

I think about this all the time. During my recent visit with my son KB – a college junior – he told me about how Google recently came to his campus offering everyone free email, voice mail, Docs (to replace MS Office) and data storage – an impressive list of free services for all.

I asked him why this publicly traded company would give away its products for free. Despite his soaring IQ and studies in information systems technology, he couldn’t come up with an answer.

Searching Google on my laptop, I presented KB with the following Google customer email (September 2009), which read: “We wanted to let you know about some important changes … in a few weeks, documents, spreadsheets and presentations that have been explicitly published outside your organization and are linked to or from a public website will be crawled and indexed, which means they can appear in search results you see on Google.com and other search engines.” Note: once data is available in Google searches, their business model calls for selling advertising around that search result.

Bear in mind this refers to published docs and not those labeled as private, a setting within Google Docs of which not all users are aware.

I also presented him with the specific EULA (End-User Licensing Agreement) language that states how a user grants a “perpetual, irrevocable, royalty free license to the content for certain purposes (republication, publication, adaptation, distribution), extending to the provision of syndicated services and to use such content in provision of those services.”

 

I recounted for KB how back in March of 2010, we learned in the national news that: “A confidential, seven-page Google Inc. “vision statement” shows the information-age giant is in a deep round of soul-searching over a basic question: How far should it go in profiting from its crown jewels—the vast trove of data it possesses about people’s activities?”

[Chart omitted. Source: Wall Street Journal, August 10, 2010]

The chart shows that nearly 85% of respondents are concerned about advertisers tracking their online behavior.

Then, a Wall Street Journal article titled “What They Know” was published, which discusses how companies are developing ‘digital fingerprint’ technology to track our use of individual computers, mobile devices and TV set-top boxes so they can sell the data to advertisers. It appears that each device broadcasts a unique identification number that computer servers recognize; that identifier can be stored in a database and later analyzed for monetization. This 3-minute video is a must-see!
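To make the mechanism a bit more concrete: one common flavor of device fingerprinting, roughly speaking, combines attributes a device exposes on every request into a stable identifier that can be stored and matched on later visits. The sketch below is purely illustrative, not any vendor’s actual method, and the attribute names are made up.

```python
# Illustrative only: combine attributes a device exposes on every request
# into one stable identifier, and keep a running count of visits per device.
import hashlib

def fingerprint(attributes):
    """Hash a sorted set of device/browser attributes into one short identifier."""
    blob = "|".join(f"{key}={value}" for key, value in sorted(attributes.items()))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]

seen_devices = {}  # fingerprint -> visit count (the "database" in the article)

visit = {"user_agent": "ExampleBrowser/4.0", "screen": "1280x800",
         "timezone": "UTC-5", "fonts": "Arial,Verdana"}

fp = fingerprint(visit)
seen_devices[fp] = seen_devices.get(fp, 0) + 1
print(fp, seen_devices[fp])
```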

By the way, they call this practice “Human Barcoding.” KB began to squirm. As we all should.

 

Data. Security. And real estate

So what do the “innovative” data mining and monetization methods now in use by Google and others mean for real estate, specifically the data aggregated by an MLS and then shared around the globe?

We all must first grasp what happens to listing data when it’s collected and syndicated into “the cloud”, as well as the human transaction interactions that follow from start to finish (and beyond, actually).

Second, we need to understand how business intelligence and analytics are being applied to the data generated by real estate transactions today. If the data is being monetized without the knowledge and permission of its rightful owner, then agreements potentially need to be negotiated (or renegotiated) and modified to get in step with today’s (and tomorrow’s) inevitable ways of doing business. I’m not in any way opposed to data mining per se; the issue at hand is fair compensation for the data on which it is based.

Here’s why the latest developments regarding Google (and others) are vitally important:

 

  • The world of leveraging digital information is changing very rapidly. As businesses push harder and deeper in their quest to monetize data, information, bits/bytes and mouse clicks, we must establish clear and informed consent on who exactly owns the data, who should control it and how it should be monetized. Protecting OUR “crown jewels”, if you will.
  • What do you know about “Human Barcoding”? It’s time for industry leaders to research this new phenomenon and begin to establish the basis for an industry position as it pertains to residential real estate.
  • How do we, as an industry, determine the real value of data beyond the property-centric context? As true business intelligence and data mining progress in our industry, we need “comps” to build upon to derive a valuation model.
  • What exactly is the MLS’s role? Are we the “stewards” of the data (on behalf of our customers) that emanates from the property record and the subsequent transaction and electronic interactions between all the parties connected to it?  How should the MLS industry confront the challenge?

We all certainly remember when the national consumer portals planted their flag(s) on this industry and, by association, MLS territory. Their rationale then was that they would help drive “eyeballs” and traffic to the inventory. Indeed they have. But, looking back, it all came with a pretty steep price tag.

For example, referral fees were subsequently replaced with advertising revenues that more often than not started chipping away at the edges of the broker’s affiliated business models (mortgage, insurance, etc.). Now, as a result, the margins of the business are perilously thin from a broker’s perspective.

The roots of the MLS lie in a business created to facilitate a fair distribution of commissions and compensation among brokers. It’s safe to say, dear Toto, that we are not in Kansas anymore. Given a digital landscape where value can be derived in so many unique ways, and given that the motives of others seeking to increase the value of the asset are potentially suspect, it’s critical that we convene right now to assert an intellectual lead on what is happening here, or at least make the conscious decision to step aside.

I’m sure there are many other questions and reasons why this is “mission critical” to us. But what I’ve offered, with the help of several really smart folks in the industry, provides a good starting point. We welcome all industry commentators on this topic. Thanks in advance for sharing ….

John L. Heithaus, Chief Marketing Officer, MRIS (john.heithaus@mris.net)

P.S. – a “tip of the hat” to Greg Robertson of Vendor Alley for starting us on this path with his excellent post “Inside Trulia’s Boiler Room”*. I also benefited mightily from the comments of David Charron of MRIS, Marilyn Wilson of the WAV Group and Marc Davison of 1000watt Consulting, and I extend my appreciation to them for sharing their perspectives.

* After this story ran, the YouTube video interview with a Trulia staffer was made “private” and is now inaccessible. Vendor Alley’s analysis of the video provides an excellent overview of the situation.