Wednesday, July 1, 2009

Site Architecture and SEO – file/page issues

Source: Bing.com: Search engine optimization (SEO) has three fundamental pillars upon which successful optimization campaigns are run. Like a three-legged stool, take one away, and the whole thing fails to work. The SEO pillars include: content (which we initially discussed in Are you content with your content?), links (which we covered in Links: the good, the bad, and the ugly, Part 1 and Part 2), and last but not least, site architecture. You can have great content and a plethora of high quality inbound links from authority sites, but if your site’s structure is flawed or broken, then it will still not achieve the optimal page rank you desire from search engines.

The search engine web crawler (also known as a robot or, more simply, a bot) is the key to website architecture issues (Bing uses MSNBot). Think of the bot as a headless web browser, one that does not display what it sees, but instead interprets the HTML code it finds on a webpage and sends the content it discovers back to the search engine database so that it can be analyzed and indexed. You can even equate the bot to a very simple user. If you target your site’s content to be readable by that simple user (serving as a lowest common denominator), then more sophisticated users (running browsers like Internet Explorer 8 or Firefox 3) will most certainly keep up. Using that analogy, doing SEO for the bot is very much a usability effort.

If you care about your website being found in search (and I presume you do if you’re reading this column!), you’ll want to help the crawler do its job. Or at the very minimum, you should remove any obstacles under your control that can get in its way. The more efficiently the search engine bot crawls your site, the higher the likelihood that more of its content will end up in the index. And that, my friend, is how you show up in the search engine results pages (SERPs).

With site architecture issues for SEO, there’s a ton of material to cover. So much so, in fact, that I need to break up this subject into a multi-part series of blog posts. I’ve broken them down into subsets of issues that pertain to: HTML files (pages), URLs and links, and on-page content. I even plan a special post devoted solely to <head> tag optimizations for SEO.

So let’s kick off this multi-part series of posts with a look at SEO site architecture issues and solutions related to files and pages.

Use descriptive file and directory names

Any time you can use descriptive text to represent your content, your site will be better off. This even goes for file and directory names. Besides being far more user friendly for end users to remember, the strategic use of keywords in file and directory names further reinforces the relevance of those pages.

And while you’re examining the names of files and directories, avoid using underscores as word separators. Use hyphens instead. This syntax will help the bot to properly parse the long name you use into individual words instead of having it treated as the equivalent of a meaningless superlongkeyword.
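
For example (these paths are hypothetical), a hyphenated name parses into its individual keywords, while an underscored name gets treated as one long token:

```
Better:  /affordable-blue-widgets/pricing-guide.htm
Weaker:  /affordable_blue_widgets/pricing_guide.htm
```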

Limit directory depth

Bots don’t crawl endlessly, searching every possible nook and cranny of every website (unless you are an important authority site, where it may probe deeper than usual). For the rest of us, though, creating a deep directory structure will likely mean the bot never gets to your deepest content. To alleviate this possibility, make your site’s directory structure shallow, no deeper than four child directories from the root.

Limit physical page file size

Keep your individual webpage files down under 150 KB each. Anything bigger than that and the bot may abandon the page after a partial crawl or skip crawling the page entirely.
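
As a rough check, you can scan a local copy of your site for pages over the 150 KB guideline. Here’s a sketch (the directory and file names are fixtures created just for the demo):

```shell
#!/bin/sh
# Sketch: flag HTML files over the 150 KB guideline in a local copy of a site.
# The directory and files below are demo fixtures, not real site content.
SITE_ROOT="$(mktemp -d)"
dd if=/dev/zero of="$SITE_ROOT/big-page.html" bs=1024 count=200 2>/dev/null   # 200 KB
dd if=/dev/zero of="$SITE_ROOT/small-page.html" bs=1024 count=20 2>/dev/null  # 20 KB

# -size +150k matches files strictly larger than 150 KB
find "$SITE_ROOT" -type f -name '*.html' -size +150k -exec basename {} \;
```

Point the `find` command at your own site root to get a list of pages worth trimming or splitting up.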

Externalize on-page JavaScript and CSS code

If your pages use JavaScript and/or Cascading Style Sheets (CSS), make sure that content is not inline within the HTML page. Search bots want to see the <body> tag content as quickly as possible. If your pages are filled with script and CSS code, you run the risk of making the pages too long to be effectively crawled. In fact, ensure that the <body> tag starts within the first 100 KB of the page’s source code; otherwise, the bot may not crawl the page at all.

Moving JavaScript and CSS code out of your pages into external files offers additional advantages beyond just shortening your webpage files. Because external files are separate from the content they modify, they can be shared by multiple pages simultaneously. Externalizing this content also simplifies code maintenance.

Follow these examples on how to reference external JavaScript and CSS code in your HTML pages.
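
For instance (the file names and paths here are placeholders), external script and style sheet references in the page’s <head> look like this:

```html
<head>
  <title>Affordable Blue Widgets</title>
  <!-- External CSS file (path is a placeholder) -->
  <link rel="stylesheet" type="text/css" href="/CSS/site-styles.css" />
  <!-- External JavaScript file (path is a placeholder) -->
  <script type="text/javascript" src="/Scripts/site-code.js"></script>
</head>
```

Each page that needs the same styles or scripts simply repeats these two references instead of carrying its own inline copy.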

A few notes to consider. External file references are not supported in really old browser versions, such as Netscape Navigator 2.x and Microsoft Internet Explorer 3.x. But if the users of such old browsers are not your target audience, the benefits of externalizing this code will far outweigh that potential audience loss. I also recommend storing your external code files separately from your HTML code, such as in /Scripts and /CSS directories. This helps keep website elements organized, and you can then easily use your robots.txt file to block bot access to all of your code files (after all, sometimes scripts handle business confidential data, so preventing the indexing of those files might be a wise idea!).
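
A minimal robots.txt along those lines (the /Scripts and /CSS directory names match the convention suggested above, and are assumptions about your layout):

```
# robots.txt - block crawler access to the external code directories
User-agent: *
Disallow: /Scripts/
Disallow: /CSS/
```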

Use 301 redirects for moved pages

When you move your site to a new domain or change folder and/or file names within your site, don’t lose all of your previously earned site ranking “link juice.” Search engines are quite literal in that the same pages on different domains or the same content using different file names are regarded as duplicates. Search engines also attribute rank to pages. Search engines have no way of knowing when you intend new page URLs to be considered updates of your old page URLs. So what do you do? Use an automatic redirect to manage this for you.

Automatic redirects are set up on your web server. If you don’t have direct access to your web server, ask your administrators to set this up for you. Otherwise, you’ll need to do a bit of research. First you need to know which type of HTTP redirect status code you need. Unless your move is very temporary (in which case you’ll want to use a 302 redirect), use a 301 redirect for permanently moved pages. A 301 tells the search engine that the page has moved to a new location and that the new page is not a duplicate of the old page, but instead IS the old page at a new location. Thus when the bot attempts to crawl your old page location, it’ll be redirected to the new location, gather the new page’s content, and apply any changes to the page’s existing rank standing.

To learn how to set this up, you’ll first need to know which web server software is running your site. Once you know that, click either Windows Server Internet Information Server (IIS) or Apache HTTP Server to learn how you can set up 301 redirects on your website.
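
As a sketch for Apache (domains and file names are placeholders), a 301 can be set up in an .htaccess file on the server hosting the old URLs:

```apache
# Single page that moved (placeholder paths):
Redirect 301 /old-page.htm http://www.example.com/new-page.htm

# Entire site moved to a new domain (placeholder domain):
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
```

IIS offers the equivalent through its management console or configuration files; consult the linked documentation for the exact steps on your server version.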

Avoid JavaScript or meta refresh redirects

Technically you can also do page redirects with JavaScript or meta “refresh” tags. However, these are not recommended methods for accomplishing this task while still achieving optimal SEO results. These methods were heavily abused in the past to hijack users away from the content they wanted and onto web spam they didn’t want. As a result, search engines take a dim view of these redirect techniques. To do the job right, to preserve your link juice, and to continue your good standing with search engines, use 301 redirects instead.
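
For reference, this is the meta refresh pattern to recognize and avoid (the URL is a placeholder); it passes along none of your page’s earned rank:

```html
<!-- Avoid: meta refresh redirect; use a server-side 301 instead -->
<meta http-equiv="refresh" content="0; url=http://www.example.com/new-page.htm" />
```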

Implement custom 404 pages

When a user makes a mistake while typing your URL into the address bar of their browser, or an inbound link contains a typo, the typical website pops up a generic HTTP 404 File Not Found error page. The most common end user response to that error message is to abandon that website. If that user had come to your website and, despite the error, you actually had the information they were seeking, that’s a lost business opportunity.

Instead of letting users go away thinking your site is broken, make an attempt to help them find what they want by showing a custom 404 page. Your page should look like the other page designs on your site, include an acknowledgment that the page the user was looking for doesn’t exist, and offer a link to your site’s home page and, more importantly, access to either a site-wide search or an HTML-based sitemap page. At a minimum, make sure your site’s navigation tools are present, enabling the user to search for their content of interest before they leave.

Implementing a custom 404 page is dependent upon which web server you are using: For users of Windows Server IIS, check out the new Bing Web Page Error Toolkit. Otherwise, browse the 404 information for Apache HTTP Server.
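
For Apache, for instance, pointing 404 errors at a custom page is a one-line directive (the page path is a placeholder):

```apache
# httpd.conf or .htaccess: serve a custom page for 404 errors
ErrorDocument 404 /errors/custom-404.htm
```
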

Other crawler traps

The search engine bot doesn’t see the Web as you and I do. As such, there are several other page-related issues that can “trap” the bot, preventing it from seeing all of the content you intend to have indexed. For example, there are many page types that the bot doesn’t handle very well. If you use frames on your website (does anyone still use frames?), the bot will only see the frame page elements as individual pages. Thus, when it wants to see how each page interrelates with other pages on your site, frame element pages are usually poor performers. This is because frame pages usually separate content from navigation. Content pages often become islands of isolated text that nothing links to directly. And with no links to them, they might never get found. But even if the bot finds the frame’s navigation pane page, there’s no context to the links. This is pretty bad in terms of search engine relevance ranking.

Other types of pages that can trip up search engine bots include forms (there’s typically no useful content on a form page) and authentication pages (bots can’t execute authentication schemes, so they are blocked from seeing all of the pages behind the authentication gateway). Pages that require either session IDs or cookies to be accessed are similar to authentication pages in that the bot’s inability to generate session IDs or accept cookies blocks it from accessing content requiring such tracking measures.

To keep the search engine bot from going places that might trip it up, see the following information about the “noindex” attribute to prevent indexing of whole pages.
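
The “noindex” attribute goes in a robots meta tag in the page’s <head>, like so:

```html
<!-- Tell search engine bots not to index this page's content -->
<meta name="robots" content="noindex" />
```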

We’re only getting started here on site architecture issues. There’s plenty more to come. If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. See you soon…

– Rick DeJarnette, Bing Webmaster Center

Monday, June 29, 2009

15 Nifty SEO Google Alert Tips

You may know that you can get the latest news headline links using Google alerts.
Simply go to http://www.news.google.com and put in a search for something you want to know more about.

For instance, I may want to get updates on news about "search engine marketing". After you get the results on that page, scroll down to the bottom. In the middle you will see “New! Get the latest news on search engine marketing with Google Alerts.” Click the link to go to the Alerts page.

On the Alerts page you can tell Google how often you want to receive the alerts (I always choose "once a day") and to which email account you want them sent (some people have many email accounts to choose from). Then hit the "Create Alert" button and you will start receiving alerts for the term you searched. Easy enough, unless you are lazy like me. See, I never thought to investigate the "Type" of search result I was looking for, so I was getting just news. I could also have been getting blog, web, video, and groups alerts. You also have the option to receive "Comprehensive" alerts. Now I select that option. You can subscribe to alerts in multiple languages.

You can receive up to 1,000 alerts. Woot!

Here are some ideas about how you can use Google Alerts.

1. Monitor your competitors - new products, ideas, financial changes - competitive intelligence.
2. Monitor your customers and prospects - it would be nice to send them a card when they do something newsworthy.
3. Track your name and your business name - put quotes around the phrases, like "Joe Jones" or "Pete's Pies" - what are people saying about you or your company in the blogs?
4. On the "Advanced Search" page you can narrow your search by geographical location, date, and other parameters.
5. Track news about new software releases or version upgrades.
6. Local news - track the subject and the newspaper.
7. Want to know when someone links to your website or blog? Search link:myblogname.com
8. Authors - get ideas for a new article.
9. Niches - more ideas and what is happening in your niche.
10. Job seekers - think of the many ways to use this to learn more about the job market.
11. Want to know when a new page from your blog is included in Google? Type in a unique line from your article.
12. cache: - see what a page looked like earlier - cache:sitename.com
13. site: - get results from just one website.
14. related: - what does Google think is related to the site? - related:www.sitename.com
15. inurl: - search for terms in page URLs - inurl:seo

Leave your good Google Alert tips in the comments.

Bing Webmaster Tools

Source: bing.com: Use the Webmaster Tools to troubleshoot the crawling and indexing of your site, submit sitemaps and view statistics about your sites. Get data on how many pages of your site have been indexed, backlinks, inbound links and keyword performance.

To submit your site to Bing:

To request that Bing crawl your site, submit your site’s domain to http://www.bing.com/docs/submit.aspx.

To submit your Sitemap:

To submit an XML-based Sitemap for your site, copy and paste the below URL into the address bar of your browser–be sure to change “www.YourWebAddress.com” to your domain name–and then press ENTER:

http://www.bing.com/webmaster/ping.aspx?sitemap=www.YourWebAddress.com/sitemap.xml
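
The same ping can be scripted. Here’s a sketch that builds the URL for a given domain (www.example.com is a placeholder; the actual fetch is commented out so nothing is sent):

```shell
#!/bin/sh
# Build the Bing sitemap ping URL for a domain (placeholder domain)
DOMAIN="www.example.com"
PING_URL="http://www.bing.com/webmaster/ping.aspx?sitemap=${DOMAIN}/sitemap.xml"
echo "$PING_URL"
# To actually submit the ping, fetch the URL, for example:
# curl -s "$PING_URL" > /dev/null
```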

Get the SEO Toolkit: The IIS Search Engine Optimization Toolkit helps improve a Web site’s relevance in search results

More>>http://www.bing.com/toolbox/webmasters/default.aspx

New tools for webmasters in the Bing Toolbox

Today we’re really excited to announce the arrival of the Bing Toolbox, a new portal for all you Bing webmasters, publishers, developers, and advertisers out there. The Toolbox is an organized set of tools for the entire Bing community, plus links to our Webmaster and Developer community blogs and forums.

Thursday, June 25, 2009

Matt Cutts Answers Questions About Directories and Ranking

As you may know, Google’s Matt Cutts frequently answers questions from Google users on the Google Webmaster Central YouTube channel. There are a couple recent ones in which he addresses questions about directories and how they contribute to a site’s rankings.

The first question is:

Will Google consider Yahoo! Directory and BOTW (Best of the Web) as sources of paid links? If no, why is this different from another site that sells links?


When Google looks at whether or not a directory is useful to users, Google looks at:

- What is the value-add?

- Do they go out and find entries on their own or do they only wait for people to come to them?

- How much do they charge?

- What editorial service is being provided for the fee being charged?

“If a directory takes $50 and every single person who ever applies in the directory automatically gets in for that $50, there’s not as much editorial oversight as something like the Yahoo! Directory, where people do get rejected,” says Cutts. “So if there is no editorial value-add there, then that is much closer to paid links.”

The second question is:

We sell a software product, and there are 100s of software download directories on the web of varying quality. Could submitting our product to all of them hurt our rankings or domain trust/authority?