
Thursday, December 3, 2009

Google States Case for Online News in WSJ

Original Article: Google has created a new web crawler specifically for Google News. Publishers who do not want Google News to index their content can now control that more easily. The same goes for publishers who don't wish to cut out indexing completely, but want to restrict certain elements of their content from being indexed.

Google offers this new crawler at a time when its relationship with online news is a heavy focus of discussion throughout the industry, with the FTC's meeting of the media minds taking place. Earlier this week, Google announced some changes to how it handles paid content (by offering a five-article limit for the "first click free" plan). Now the company appears to be extending its olive branch further to concerned publishers (whether or not that will be enough is another discussion).

In the past, publishers have been able to block Google from content via robots.txt and the Robots Exclusion Protocol (REP). They have also been able to keep content out of Google News while staying in Google Search, by using a contact form provided by Google. Now, Google is making it so publishers don't even have to contact them.

"Now, with the news-specific crawler, if a publisher wants to opt out of Google News, they don't even have to contact us - they can put instructions just for user-agent Googlebot-News in the same robots.txt file they have today," says Google News Senior Business Product Manager Josh Cohen. "In addition, once this change is fully in place, it will allow publishers to do more than just allow/disallow access to Google News. They'll also be able to apply the full range of REP directives just to Google News. Want to block images from Google News, but not from Web Search? Go ahead. Want to include snippets in Google News, but not in Web Search? Feel free. All this will soon be possible with the same standard protocol that is REP."
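To make Cohen's description concrete, here is a minimal sketch of what a news-only opt-out might look like and how it plays out per crawler. The robots.txt contents, the example.com URLs, and the `allowed` helper are illustrative assumptions, not taken from Google's announcement. The check uses Python's standard-library robots.txt parser, which only approximates how Google's own crawlers interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google News entirely, while every
# other crawler (including regular Googlebot) is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/2009/12/story.html"  # made-up URL
    for agent in ("Googlebot-News", "Googlebot"):
        print(agent, "may fetch story:", allowed(agent, story))
```

Under these assumptions, the story is blocked for Googlebot-News but still fetchable by Googlebot, which matches the "out of Google News, still in Web Search" scenario described in the quote.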

"While this means even more control for publishers, the effect of opting out of News is the same as it's always been," says Cohen. "It means that content won't be in Google News or in the parts of Google that are powered by the News index. For example, if a publisher opts out of Google News, but stays in Web Search, their content will still show up as natural web search results, but they won't appear in the block of news results that sometimes shows up in Web Search, called Universal search, since those come from the Google News index."

Cohen says Google News users shouldn't notice any difference in their experience with the service. It will be interesting to see the reaction from disgruntled publishers, and whether or not this will make any significant difference in how they view Google News.