Testing Google's New Algorithm: It Really Is Better

When we heard that Google had unleashed a new algorithm in the United States to battle content farms, we were cautiously optimistic. Content farms, which bet they can make more money on advertisements than they spend producing very low-quality stories, had come to dominate the Internet's long tail.

But I've had my doubts that Google's machines could weed out these content farms. What signals would allow them to distinguish between high- and low-quality writing, especially when even humans are only decent at it?

Luckily, Google has given us a chance to do some side-by-side comparisons, because it is rolling out the new-and-improved algorithm in the United States first. So, we ran two searches for the phrase "drywall dust," figuring it was just random enough. One we executed in the standard way, presumably using the new algorithm; the other we routed through a proxy server that made it look like we were coming from India, presumably using the old algorithm.
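If you want to replicate this kind of side-by-side comparison, here is a minimal sketch in Python. The proxy address is a placeholder (not the one we used), and this only builds the request rather than issuing it; treat it as an illustration of the setup, not our exact procedure:

```python
import urllib.parse

def build_search_request(query, proxy=None):
    """Build the search URL and proxy mapping for a Google query.

    `proxy` is a hypothetical HTTP proxy address (e.g., one located in
    India) used to receive results from a different regional rollout.
    """
    url = "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})
    proxies = {"https": proxy} if proxy else {}
    return url, proxies

# Direct request: presumably served by the new algorithm (U.S. rollout).
us_url, us_proxies = build_search_request("drywall dust")

# Proxied request: presumably served by the old algorithm.
in_url, in_proxies = build_search_request(
    "drywall dust", proxy="http://203.0.113.7:8080"  # placeholder address
)
```

With the `requests` library, `requests.get(url, proxies=proxies)` would then fetch each result page for comparison.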

And I have to say: Wow, the new algorithm yielded far superior results.

Granted, this is just one search for "drywall dust," but if this is even remotely indicative of how well the new algorithm works, we're all going to be tremendously impressed. The search via India led to seven sites producing low-quality or aggregated content, a photo of someone covered in dust, and a blog about an individual's remodel. The new algorithm yielded very different results. Not only were there fewer content farms, but two specialty sites and five fora made the list, as well as a Centers for Disease Control page on the dangers of drywall dust. Having clicked through all 20 links, I can assure you that the information delivered by the new algorithm is much, much better.

[Image: side-by-side comparison of the two sets of search results]

Let us know if you have a similar experience with other searches. We've been trying out other strings, and the pattern appears to hold: we're seeing fewer content farms and more fora and news websites. For example, check out "is botox safe" with the old algorithm and the new algorithm. In the latter, I counted five pages from what most would call respectable news sources. In the former, only three made the cut.

Google Changes Its Algorithm

Chatter blizzard! There is a flurry of commentary about Google’s change to cope with outfits that generate content to attract traffic, get a high Google ranking, and deliver information to users! You can read the Google explanation in “Finding More High-Quality Sites in Search” and learn about the tweaks. I found this passage interesting:

We can’t make a major improvement without affecting rankings for many sites. It has to be that some sites will go up and some will go down. Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.

Google faces increasing scrutiny for its display of content from some European Web sites. In fact, one of the companies affected has filed an antitrust complaint against Google. You can read about the 1PlusV matter and the legal information site EJustice at this link (at least for a while; news has a tendency to disappear these days).

Image source: http://www.mentalamusement.com/our%20store/poker/casino_accessories.htm

Why did I find this passage interesting?

Well, it seems that when Google makes a fix, some sites go up and some go down in the results list. Interesting, because as I understand the 1PlusV issue, the site arbitrarily disappeared and then reappeared. On one hand, Google's framing suggests that human intervention plays little role. On the other hand, if 1PlusV is correct, human intervention works pretty well.

Which is it? An algorithm that adapts, or a human or two doing their thing, independently or as the fingers of a committee?

I don’t know. My interest in how Google indexes Web sites diminished when I realized that Google results had been deteriorating over the last few years. Now my queries are fairly specialized, and most of the information I need appears in third-party sources. Google’s index, for me, is useful, but it is now just another click in a series of services I must use to locate information.

On Syndication: Is A MLS A Data Repository, or An Exchange?

This is an exchange that happens to throw off data

A current discussion within the MLS and tech vendor industry is around the issue of listing syndication. This post by Brian Larson, and the discussions therein, is a pretty good summation of the thinking on the part of MLS executives, vendors, and consultants. As Victor Lund of the WAV Group, a leader in the world of MLS consulting, notes in the comments:

Syndication is absolutely a nightmare on many levels – the control of the data quality is gone – leaving behind dregs like duplicate data, false data, reproductized data, resold data, loss of ownership by brokers, loss of copyright by MLSs, reduction in the quality of curated listing content – yadda, yadda.

For what it’s worth, I agree with Victor 100%… if the MLS is a data-collection company, like, say, NPD Group, which collects retail data from thousands of point-of-sale systems. In that case, the practice of syndication is a nightmare, and a disaster.

I believe, however, that there is a real question as to whether the subscribers to the MLS, the brokers and agents who actually create the data that constitutes the valuable intellectual property at question, see things that way. Most working real estate brokers and agents I know think of the MLS as a way to advertise properties for sale (let’s stick strictly with listing brokers/agents for now). I don’t believe that they think of what they’re doing, when they’re at the MLS screen entering data, as anything other than putting in information to get a house sold.

The popular and oft-heard response to this line of reasoning is, “Well, it’s both, Rob”. (Shortly followed by or preceded by, “You’re so black-and-white; the world is shades of grey, son!”) It is true that I tend towards black-and-white thinking, even if I recognize that in the real world of implementation, sometimes you have to tolerate shades of gray. But that is because without clarity in thought, effectiveness in action is impossible.

Another way to think about it is from a prioritization standpoint. Fine, a MLS is both a data repository and an exchange. Which is its primary identity, and which is the secondary? Consequences follow from the answer.

If the MLS Is A Data Repository

Let’s say that the answer is that a MLS is a data repository. It may have begun as a way for brokers to cooperate in getting a house sold, but in this day and age of the Internet and sophisticated data analytics, the primary purpose of a MLS is to provide clean, accurate, timely property data to real estate professionals, consumers, and other users of real estate data.

Certain consequences follow this definition of the MLS.

  1. Syndication must be eliminated, except in cases where the MLS can make a reasonable business decision to do syndication, under its licensing terms, with varying degrees of control dependent on compensation.
  2. In fact, if the MLS is primarily a data repository, its membership agreements probably should spell out that it will be the exclusive provider of listings data, and that the listing brokers and agents will surrender their rights to send the same intellectual property to a different source. If I contract to write columns for AOL, I cannot then send that same column to Yahoo, unless our agreement says I can. The same analysis must apply to listings entered into the system by brokers and agents.
  3. Intellectual property rights, sharing of those rights, and various mutual licensing arrangements must be clarified and agreed upon by all participants, including the real estate agent who is actually doing all of the data entry. At a minimum, if the MLS is a data repository, and its subscribers are paying to create the valuable intellectual property that is being deposited into the repository, then some accommodation has to be reached between the MLS and the subscribers as to if, when, and how those content creators ought to be compensated for their efforts.
  4. The data that is being entered, aggregated, and re-sold/licensed needs to be examined for more than what it is today. There are hundreds of data fields in a listing that go unfilled because they’re not particularly relevant for attracting buyers to the property. But maybe those fields — like soil type, distance to power lines, etc. — are very relevant for sophisticated users of the data.
  5. IDX must be put back on the table for discussion.
  6. There has to be a discussion about the equal treatment that listing brokers and buying brokers receive in the MLS. The former creates IP that will be leveraged and monetized; the latter does not.
  7. A real discussion has to be held as to the minimum useful geography if a MLS is to be thought of primarily as a data repository. Real estate may be local, but data is not particularly useful unless it’s at a certain size. It’s impossible to do trend analysis on four closed sales in one zip code.

Any or all of these things can be modified, tweaked, or changed based on how important the secondary purpose of advertising a home for sale is to the MLS and its subscribers. But if the MLS is widely understood by all stakeholders and participants to be a data repository, the relationship between the brokers, agents, Associations, and the MLS will likely need to be renegotiated.

If the MLS Is An Exchange

If, on the other hand, all of the stakeholders and participants understand the MLS to be an exchange, created for the primary purpose of selling a home, then other consequences follow.

  1. Whatever value the property data might hold, that value is subordinate to the primary value of advertising a home for sale. Syndication must not only continue, but be expanded, and the propriety of charging licensing fees and other revenues at the cost of wider advertising distribution must be examined.
  2. The whole concept of data accuracy and data integrity has to be understood in the context of advertising a property for sale, rather than the context of third party users such as government agencies, banks, and academics.
  3. MLS rules and practices should be re-examined in light of the clarified understanding of the MLS as an exchange facilitating the sale of a home.
  4. MLS products and services that do not advance the primary goal of advertising homes for sale need to be validated by the leadership and by the subscriber membership.
  5. There has to be a discussion not of minimum geography, but of maximum geography. It isn’t logical to believe that if the MLS is merely an exchange, and real estate is local, then a super-regional MLS could serve the advertising function as effectively as a hyperlocal one.

Again, these consequences can be modified, tweaked, altered, and so forth based on the particular MLS’s stakeholders deciding how much to be influenced by the secondary function of data services.

My Take On The Issue

My personal take, after laying out the issues, is that a MLS is first and foremost an exchange, created for the primary purpose of advertising homes for sale. The exchange activities happen to throw off extremely valuable intellectual property as a byproduct: accurate, timely, and comprehensive data on real estate activities. To the extent that the activity generates valuable assets — much like how fertilizer is often a byproduct of raising cattle — the MLS should attempt to control and monetize those assets. However, like any other byproduct, one would not impinge on the primary purpose for the sake of the secondary.

The whole syndication debate is far more complex and far more detailed, of course. And as I’ve mentioned, in the real world of implementation, there are going to be some grey areas. But I believe much of the confusion in the industry today around the issue stems from the fact that the big assumption has not been adequately communicated, discussed, or accepted by some of the main stakeholders: brokers and agents. Debate and settle the big question, and the details can be resolved using the clear understanding of primary vs. secondary purposes.

MLS Tesseract: A Listing Syndication Discussion

There has been a great deal of talk and writing lately about MLS/broker listing syndication. This isn't surprising considering one of the leading syndication providers, Threewide, was recently acquired by Move, Inc., operator of Realtor.com. At the same time, initiatives like those led by industry veterans Bud Fogel and Mike Meyers and by LPS's Ira Luntz (himself a veteran of syndication) suggest that a new model could be in the offing. Among commentators who have taken up the issue are MRIS Chief Marketing Officer John Heithaus, in an often-discussed post; Victor Lund of the WAV Group (you need to be an Inman subscriber to read that one); and Rob Hahn.

Elizabeth and I wrote a longish whitepaper on syndication back in 2008, which is still available on our firm web site. That paper, written just before Elizabeth and I formed our firm together, focused on the role of MLS and how MLSs should approach operational and legal issues; it assumed that MLSs would want to do syndication because at least some brokers wanted syndication. Most of the concerns we expressed then remain unaddressed in the industry. The current debate, it seems to me, is about whether brokers and MLSs should be doing syndication at all. It takes up the key assumption in our 2008 whitepaper.

I'd like to expend some effort and thought on this topic in the next few posts. This first one will define what I mean by "listing syndication," in order to distinguish it from other forms of listing data distribution and licensing; and it will discuss some reasons MLSs get involved in syndication. In the next post, we'll consider ways that MLSs get involved in syndication and some problems and issues. Then we'll take a look at the key underlying question: should brokers be sending listings to all these places in the first place, and what role should MLSs play in that decision?

“Syndication” defined

There is no official definition of “listing syndication.” There is no Platonic universal or form corresponding to listing syndication. So, we just need a practical definition that provides some scope to what we are talking about. For this summary, I will use the following definition: “listing syndication is the distribution in bulk of active real estate listings (listings currently available for sale), by or on behalf of the listing agent or listing broker, to sites that will advertise them on the web to consumers.”

We include each of the following things in this definition:

  • Distribution of listings by MLS through a listing syndicator, such as ThreeWide (ListHub) or Point2, to advertising sites.
  • Direct distribution of listings by MLS to advertising sites (including local newspaper web sites and national sites).
  • Distribution by a listing broker via a data feed (whether broker-internal or created by MLS on the broker’s behalf) to advertising sites.
  • Use by a broker or agent of a service that offers to take a bulk data feed and then distribute the listings to advertising sites.

I usually call web sites that advertise real estate listings “aggregators” (we used "commercial distributors" in the 2008 report). I usually refer to the recipients of data through syndication as “syndication channels.” As a result, a site like Zillow.com is both an aggregator and a syndication channel. I do not consider the following to be syndication (at least for purposes of this discussion), though they share some characteristics with it:

  • Services where agents manually load their own listings, like vFlyer and Postlets. These sites generally do not take bulk data feeds.
  • Data licensing to RPR or CoreLogic under their current proposals. They are using off-market listings, in addition to active ones, and the applications they use them for are not advertising. Sending data to Move, Inc. for its Find application is not syndication because it includes off-markets; but see the discussion relating to that below.
  • A “back-office data feed” from MLS to the listing brokerage. A back-office feed often includes all the MLS listing data (from all brokers) and comes with a license for the brokerage to use the data internally for the core purposes of MLS and the freedom to use its own listings any way it pleases. Many MLSs provide such feeds to their participants to facilitate brokerage business activities. Thus, though MLS’s action here is not syndication, the brokerage might turn around and engage in syndication itself.

Why only active listings? We do not include off-market listings (listing records relating to properties not currently for sale) in our definition of syndication for two reasons:

  1. MLSs perceive the off-market listings as something different. Most MLSs recognize that very recent off-market activity in the MLS provides a very valuable resource, one not available elsewhere. Thus very few MLSs distribute off-market listings through typical syndication channels.
  2. Brokers perceive the off-market listings as something different. Brokers want their active listings advertised (and their sellers want it, too). But brokers rarely perceive value in having their off-market listings distributed. We are not acquainted with any brokerage firm that distributes its off-market listing data.

The value MLSs bring to syndication

Almost from the beginning of “listings on the Internet,” people have asked what role the MLS should play in getting listings out there. The short answer is efficiency:

  1. MLS already has all the listing content in one database. Every listing broker has already paid for that database to be created and maintained; and every such database has the capability to export listing data. In theory, at least, it should always be cheaper for MLS to ship brokers’ listings to a channel, because it requires fewer steps. In the alternative, MLS would supply each broker a data feed of its own listings, then each broker would have to set up a feed to each channel (or at least set up a feed to a syndicator who could reach the channels). That’s a lot more data feeds, IT staff hours, etc.
  2. Many MLSs and traditional syndicators offer brokers ease of use. Syndicators like Point2 offer brokers a dashboard where they can click on the channels they want to receive their listings and click off the ones they don’t want to receive them. In theory, this does not require the broker to perform research and due diligence on each channel; the syndicator or MLS has theoretically done that before presenting the option on the dashboard. (In practice, this may not be happening.)
  3. Syndication through MLS or a syndicator may give listing brokers more leverage. If a channel is getting the listings from many brokers in MLS through a data feed from MLS, the MLS may have leverage with the channel to get it to behave properly. If the MLS cuts off the data feed, the channel loses listings from all the brokers. Similarly, if a syndicator cuts off a feed to a channel, the channel loses the feed for all the MLSs working with the syndicator. A single broker, by comparison, usually does not have the volume of listings to exert leverage on the channels. Note that some channels (like Google, before it decided to stop accepting listings), did not necessarily react to that leverage anyway. Note also that just because MLSs and syndicators have this leverage, that does not mean they have actually used it (I'll discuss this in another post).
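The feed-count arithmetic behind the first efficiency argument is worth making explicit. A quick sketch with made-up numbers; the broker and channel counts are illustrative assumptions, not data from any actual MLS:

```python
brokers = 500    # brokerages participating in a hypothetical MLS
channels = 10    # advertising sites (syndication channels)

# Without MLS involvement: each broker maintains its own feed to each channel.
feeds_broker_by_broker = brokers * channels

# With MLS-mediated syndication: one consolidated feed per channel.
feeds_via_mls = channels

print(feeds_broker_by_broker, feeds_via_mls)  # prints: 5000 10
```

The gap only widens as broker or channel counts grow, which is the sense in which consolidating feeds at the MLS "requires fewer steps."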

The resourceful broker problem

One important fact about syndication is what we call the “resourceful broker problem.” It’s not really a problem at all; it’s just competition. If an MLS does not syndicate listings on behalf of its brokers, some of the brokers will assume the costs and work associated with syndicating their own listings. This will give those brokers a competitive advantage in the market. Note that we don’t call this the “large broker problem.” Though the large brokers in markets are also often resourceful brokers, in many cases, smaller brokers also find the means to be resourceful. The MLS is confronted with its age-old problem, almost its nemesis: Choose between (a) delivering services at the lowest common denominator, drawing complaints from some brokers that MLS should be doing more to deliver efficiencies to all brokers; and (b) delivering efficient services to all brokers, drawing complaints from resourceful brokers that MLS is “leveling the playing field.”

Neither of these arguments is wholly right or wrong. But they appear in some form whenever MLSs consider offering services like syndication. The intensity of feeling about which path the MLS should take varies a great deal from MLS to MLS and often within the board room of a single MLS.

So, we've stipulated a definition for "syndication"; should we be including other things, or perhaps excluding something I've included here? And we've discussed why MLSs often believe they should be involved. I'm curious what your thoughts are about my efficiency arguments there. Next time, we'll consider some ways that MLSs do, and don't, syndicate. Following that, I'd like to spend a little time considering where brokers should be sending their listings and whether the MLS should be deciding for them.