Gooyah.net Search Engine Blog: June 2006

Written By: Trevor A. Winchell - Gooyah Search Founder

One of the questions readers and clients most frequently email to StepForth Placement's SEO staff revolves around how websites can best be optimized to meet the algorithmic needs of each of the four major search engines: Google, Yahoo, MSN and Ask.

The more things change, the more they stay the same. Though there have been sweeping changes in the organic search engine landscape over the past six months, the fundamental ways search engines operate remain the same.

This question, or variants on it, reflects a shared notion among some webmasters that SEO-driven placements at one search engine might come at the expense of high rankings across the other search engines. As the thinking goes, the techniques used to make a well optimized website rank well at Google might somehow prevent that same site from achieving high rankings at Yahoo, MSN and/or Ask. Alternatively, webmasters and advertisers who already have great placements at Google but not at the others appear wary of sacrificing their Google rankings in pursuit of higher placements on Yahoo, MSN or Ask.

The differences in how each engine works appear to be causing a bit of confusion among webmasters and search marketers, especially regarding how to optimize well for all four at the same time.

Techniques that work on one engine might not work as well on another. In some extreme cases, techniques that work brilliantly with old school engines like MSN and Ask, and even with the invigorated Yahoo, are akin to a kiss of death on Google.

There is one search engine friendly site design and optimization philosophy that works almost every time: good content, smart networking, and persistence over time. A well constructed website, or one that has been treated by a good search engine optimizer, should be able to rank well on all major search engines, provided that site has useful, relevant information to express.

Questions about ranking well on all four engines bring up some of the basic differences between the major search engines. In light of so much change in the sector over the past few months, a look at what search engines examine, and how they do it, seems in order.

There are a lot of differences between the major search engines but, by and large, they all gather information the same way. Each major search engine uses its own spider agent, namely Googlebot (Google), Slurp (Yahoo/Inktomi), Teoma (Ask.com) and MSNbot (MSN), which finds information by following links from document to document across the web (an updated list of spiders is kept on Wikipedia). Spiders are designed to revisit sites on a semi-regular basis, though they hit the index (or home) page more often than other pages, and they tend to dig deeper for changes to internal documents when the index page has changed. This allows the engines to maintain rapidly updating versions of the web, or parts of the web, in separate proprietary databases.
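The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration, not how any real spider is implemented: the LINKS dictionary is a made-up, in-memory stand-in for a small website, and the page names are hypothetical.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
LINKS = {
    "index.html": ["about.html", "products.html"],
    "about.html": ["index.html"],
    "products.html": ["widget.html", "index.html"],
    "widget.html": [],
    "orphan.html": ["index.html"],  # no other page links to this one
}

def crawl(start, links):
    """Return pages in the order a link-following spider would discover them."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

discovered = crawl("index.html", LINKS)
print(discovered)
```

In this sketch orphan.html is never discovered, because no page links to it. That is the practical reason a properly interlinked site matters to a spider.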

Each search database has its own characteristics and most importantly, each engine has its own algorithms for sorting and ranking web documents.

Getting information into those databases is the first stage of SEO. The site needs to be constructed (or reconstructed) in such a way as to allow search spiders to easily read and absorb the information and content it contains.

Assuming realistic expectations and goal setting are already part of the equation, the success or failure of any multi-engine optimization campaign is dependent on the type of site being marketed, as much as it depends on methods and techniques used to market it. If the ultimate goal is strong search engine placements across all major search engines, a few compromises in style might be a temporary necessity in order to expose the great content and reap the rewards of multiple rankings.

Before building a site, it helps to have a working knowledge of the major on-site and off-site elements each search engine looks at when examining and evaluating a site and its contents.

All search engines examine two overarching areas when ranking a web document or site, known as "on-page" and "off-page". As the names indicate, search engines examine factors and elements that occur on the document or site in question, as well as factors and elements occurring on other documents and sites related by links or by topical theme.

While the search algorithms of each engine might differ in the number of factors found on or off page and the overall importance of those factors, they all examine generally similar sets of data when deciding which should rank where in relation to whatever search-queries are entered.

For example, Google loves links, as do Yahoo, MSN and, to a lesser degree, Ask. MSN and Ask are considered to be old school search engines, allowing simpler SEO techniques to work quite well, as they still do with Yahoo.

On-page factors are generally found in one of four areas: Titles, Tags, Text and Structure. Off-page elements tend to involve links, locality, search-user behaviours and the performance of competing sites.

Here is a thumbnail breakdown of the most important factors each search engine considers, laid out roughly in order of importance.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Google: Incoming Links, On-page SEO, Site Design Spiderability, User analytics, Outgoing links, Inclusion in other Google indexes, Document Histories

Yahoo: On-page SEO, Links and Link Patterns, Site Design, User analytics, Inclusion in other Yahoo indexes, Document Footprints

MSN: On-page SEO, Site Design and Structure, Spiderability

Ask: On-page SEO, Site Design, Site Structure and Spiderability

----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Because Google drives approximately 50% of all organic search traffic, SEOs, webmasters, and search advertisers tend to be most concerned with Google placements. When planning a search optimization campaign, whether for a new site or in the redevelopment of an existing site, building around Google's needs is the most logical path. It is also a smart way to find your way into the other search engines. Though each of the rival engines wants to present the best possible results, Google's algorithms account for quality scoring to a deeper degree than the others do. In other words, if your site meets Google's various tests, it will likely meet those of the other engines.

Google puts enormous weight on its evaluation of the network of links leading to and from every web document in its index. Most, if not all, documents found in Google's index got there because Google's spider, Googlebot, found them by following an inbound link. Because its ranking algorithm is so heavily link dependent, Google is frequently forced to tinker with how it evaluates links, a process that generates a score known as PageRank. The basic wisdom on links says that incoming links from topically relevant sites are beneficial, while those placed purely to get a better ranking at Google are not. Google also examines links on a document or site that point towards other sites, in order to gauge whether a webmaster is trying to game it by participating in link-networking schemes. To one degree or another, the three other major search engines do this as well, though MSN and Ask are not known for using link analysis as a weighty measure of site or document relevancy. Yahoo most certainly does. Link analysis is used to determine the seriousness and credibility of a web document by comparing it with other documents it is associated with.
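To give a feel for how a link-based score can be computed, here is a minimal sketch of the simplified, textbook form of PageRank (repeatedly passing each page's score along its outgoing links). It is not Google's production algorithm, which is not public. The four-page link graph, the damping value of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores along links.

    `links` maps each page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: a, b and d all link to c; c links back to a.
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most incoming links (c) ends up with the highest score, and the page it links to (a) inherits much of that score. This is the intuition behind valuing links from well-linked pages.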

Once a document exists in a search engine database, several on-page factors are examined. The engines tend to examine several elements of any particular document and the site it is associated with, including title, meta tags (in some cases), body text and other content, and internal site structure.
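As a rough illustration of the on-page elements listed above, the following sketch pulls the title, named meta tags and visible body text out of an HTML document using Python's standard html.parser module. The OnPageExtractor class and the sample page are hypothetical. Real engines examine far more than this, and how they weigh each element is not public.

```python
from html.parser import HTMLParser

class OnPageExtractor(HTMLParser):
    """Collects the title, named meta tags and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

# Hypothetical sample page.
SAMPLE = """<html><head><title>Blue Widgets</title>
<meta name="description" content="Hand-made blue widgets.">
</head><body><h1>Blue Widgets</h1><p>We sell widgets.</p>
<script>var x = 1;</script></body></html>"""

extractor = OnPageExtractor()
extractor.feed(SAMPLE)
body_text = " ".join(extractor.text_parts)
print(extractor.title, extractor.meta, body_text)
```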

Written By: Trevor A. Winchell - Gooyah Search Founder

Prospective and new clients often ask us about Googlebot, for example how to know whether it has been to their site. We've therefore put together this FAQ to answer some of the more common questions.

1. How do I get listed in Google?
2. What is the name of The Google Spider?
3. What's the difference between Deepbot and Freshbot?
4. How do I know if Freshbot and/or Deepbot have visited my site?
5. What's the actual name for Googlebot showing in the logs?
6. How do I know if my site has been spidered by Google?
7. How do I know which spider (Freshbot or Deepbot) has visited my site if my log analysis reports don't tell me?
8. Googlebot has been to my pages, so what happens next?
9. How do I know if my site has been indexed by Google?
10. It has been a few days since Googlebot visited but I'm still not showing up in the results pages - Why?
11. My site uses dynamic pages. How can I get it indexed in Google?
12. What if my site is unavailable when Googlebot visits?
13. What other issues will cause my site to not be indexed by Google?
14. Can I block Googlebot from indexing my site?

All the answers to the questions above are listed below.

1. How do I get listed in Google?
Go to this page and request that your site be added. We recommend only submitting the index of a properly linked site and letting Google find your pages on its own. If for some reason you feel that Google will not index your site properly, then we recommend submitting your sitemap page as well.

2. What is the name of The Google Spider?
Google calls its spider "Googlebot". Googlebot comes in two flavors: Deepbot and Freshbot.

3. What's the difference between Deepbot and Freshbot?
Both spiders have specific tasks, as you may have guessed by their names. Freshbot is the spider you hope to see more often. It looks for fresh content on a site. It is not uncommon for Freshbot to visit a site many times a day. Deepbot is responsible for the really deep crawling. Generally when you see search results change, it is because Deepbot has been active. Deepbot is responsible for thoroughly crawling a site and attempting to build a complete picture, or matrix, of the site; how the site is interlinked and how its navigation affects its usability. Deepbot uses data gathered by Freshbot as well as its own results to build a picture of your site.

4. How do I know if Freshbot and/or Deepbot have visited my site?
Generally you can tell by the spider's IP address. Deepbot uses IPs that start with 216, while Freshbot uses IPs that start with 64. In other words, a Deepbot IP would resemble 216.239.45.4 while a Freshbot IP could include 64.208.32.4.

5. What's the actual name for Googlebot showing in the logs?
Googlebot/2.1 (+http://www.googlebot.com/bot.html). This appears for both Freshbot and Deepbot.

6. How do I know if my site has been spidered by Google?
The easiest way is to do a search for Googlebot (http://www.googlebot.com/bot.html) in your logfile. It may not appear in the "spiders" section of your log analysis tool, however, as it tends to emulate a browser instead. Therefore, if you are using a log analysis tool like WebTrends, look in the "browsers" section of the report.

7. How do I know which spider (Freshbot or Deepbot) has visited my site if my log analysis reports don't tell me?
You will likely need to look at the raw log files to see which spider visited. Search the file for "Googlebot", then look at the IP address on each matching line. It will be in the ranges listed above.
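The manual check described in this answer can be sketched in Python. The sketch assumes each log line starts with the visitor's IP address (as in the common Apache log formats) and applies the IP prefixes quoted in answer 4. Those prefixes are this FAQ's rule of thumb, not an official Google specification, and the sample log lines are made up.

```python
def classify_googlebot(ip):
    """Apply the rule of thumb from answer 4: 216.x is Deepbot, 64.x is Freshbot."""
    first_octet = ip.split(".")[0]
    if first_octet == "216":
        return "Deepbot"
    if first_octet == "64":
        return "Freshbot"
    return "unknown"

def googlebot_visits(log_lines):
    """Return (ip, spider) pairs for every log line that mentions Googlebot."""
    visits = []
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        ip = line.split()[0]  # assumes the IP address is the first field
        visits.append((ip, classify_googlebot(ip)))
    return visits

# Hypothetical raw log lines in a combined-log style.
SAMPLE_LOG = [
    '216.239.45.4 - - [20/Jun/2006:10:15:32 -0700] "GET /index.html HTTP/1.0" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '64.208.32.4 - - [20/Jun/2006:11:02:10 -0700] "GET /news.html HTTP/1.0" '
    '200 2048 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '10.0.0.7 - - [20/Jun/2006:11:05:44 -0700] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

visits = googlebot_visits(SAMPLE_LOG)
for ip, spider in visits:
    print(ip, spider)
```

To run it against a real log, pass the open file object in place of SAMPLE_LOG. If your server writes a different log layout, adjust the line that extracts the IP address.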

8. Googlebot has been to my pages, so what happens next?
Generally, you can expect your site to start showing up in the search results in a short time. However, Google makes no guarantees about when, or even if, your site will be included in the index.

9. How do I know if my site has been indexed by Google?
If you have noticed Googlebot appearing in your log files, the easiest way to see if you have been indexed by Google is to perform a search for your site. Simply search for your site (e.g. www.mysite.com or mysite.com) and see if your pages show up.

10. It has been a few days since Googlebot visited but I'm still not showing up in the results pages - Why?
Generally, it takes more than a few days to show up in the index. We recommend waiting at least one month, as Google usually updates its index within that timeframe. If it has been more than two months, there may be other issues affecting your site's ability to be indexed. As mentioned above, Google makes no guarantees about when, or even if, your site will show up in the search results pages. Go to this page on the Google site to see reasons why your site may not be indexed. Aside from Google deciding not to list your site, other issues could have kept it from being indexed.

11. My site uses dynamic pages. How can I get it indexed in Google?
Google does index dynamic sites on its own. The problem will be that it won't rank them highly. If all you are concerned with is getting into the index, then you are ok. If you want to rank well for key phrases though, you should consider alternatives other than a dynamic URL system to display your site.
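If you want to audit which of your URLs count as dynamic, a short script can flag them. The sketch below assumes "dynamic" means the URL carries a query string (the part after the question mark). That is our working definition for this example, not a rule published by Google, and the URLs are hypothetical.

```python
from urllib.parse import urlparse

def is_dynamic(url):
    """Treat a URL as dynamic if it carries a query string (an assumption)."""
    return bool(urlparse(url).query)

# Hypothetical URLs from one site.
urls = [
    "http://www.mysite.com/products/blue-widget.html",
    "http://www.mysite.com/product.php?id=42&cat=7",
    "http://www.mysite.com/about.html",
]

dynamic_urls = [url for url in urls if is_dynamic(url)]
print(dynamic_urls)
```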

12. What if my site is unavailable when Googlebot visits?
Generally, both Deepbot and Freshbot will make repeated attempts to access your site before moving on. Therefore, it's recommended that your site be available for a majority of the time. If you were indexed in Google, then were removed because your site was unavailable, we recommend waiting at least a month to see if you get reindexed. Many times Google will remove an inaccessible site, to keep its results relevant, then will reinstate the site when the site is available again.

13. What other issues will cause my site to not be indexed by Google?
Many things besides downtime can cause Googlebot to exclude your site from the current crawl and index. Server issues can affect ranking, and design issues and other problems can make the site difficult to index.

14. Can I block Googlebot from indexing my site?
We do not recommend this unless you thoroughly understand how to write the file involved, but you may feel the need to block all or part of your site from spiders. Through the use of a file called robots.txt you can exclude specific files and folders from being included in the index. You can even block your whole site from being indexed, so you should only employ this file when you are sure you have it configured properly.
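One way to check a robots.txt file before relying on it is to test it with Python's standard urllib.robotparser module. The rules below are a made-up example that keeps Googlebot out of a hypothetical /private/ folder and leaves everything else open. The site name and paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block Googlebot from /private/ only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether a given spider may fetch a given URL under these rules.
home_allowed = parser.can_fetch("Googlebot", "http://www.mysite.com/index.html")
private_allowed = parser.can_fetch("Googlebot", "http://www.mysite.com/private/report.html")
print(home_allowed, private_allowed)
```

Note that a single line reading "Disallow: /" under a User-agent entry would block the whole site for that spider, which is why the file deserves a test like this before it goes live.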

If you have any other questions, feel free to email me at trevor@gooyah.net.

Sunday, June 25, 2006

Can My Site Rank Well On All Four Major Engines?

Tuesday, June 20, 2006

Google and Googlebot Information CodeWords
