An interview with Matt Cutts of Google was published this morning, and it reveals many indexing techniques that SEO professionals may or may not be aware of. A question that often comes up with sites is: are my pages being indexed? Typically, all pages and content of a site are indexed in some form, albeit sometimes at a slower pace. With PageRank, however, not all indexing is created equal. Low PageRank across many of a site's pages can reduce the amount of time or effort Google spends crawling it, and other poor SEO practices, such as duplicate content, may contribute to this. For those of you who are curious, some points taken from the interview include:
• The crawling and indexing team at Google wants to see all content, but it won’t crawl every page of a website.
• No indexing cap exists for a site. This means that if a site has 1,000 pages, all of them can be indexed at some point, although crawl rates vary from site to site.
• The number of pages Google crawls on a site is roughly proportional to the site’s PageRank.
• Duplicate content has a cost. If two out of every three pages crawled are duplicates, the total number of pages crawled is lowered and, thus, PageRank is also lowered.
• Links to pages with duplicate content don’t help increase PageRank.
• A loss of PageRank may occur through 301 redirects.
• If a site has a large number of low-PageRank pages, Google may not crawl the entire site. This is often why brand-new websites take longer than established ones to have their SEO changes indexed.
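To make the proportionality and duplicate-content points above more concrete, here is a rough Python sketch. Google does not publish how it schedules crawling, so this is only a toy model: the function names, the idea of a fixed "crawl budget", and the strictly linear split are all assumptions made for illustration, not anything stated in the interview.

```python
def allocate_crawl_budget(total_budget, pagerank_by_site):
    """Split a hypothetical crawl budget across sites in proportion to PageRank.

    Assumption: a strictly linear split. The real relationship is not public.
    """
    total_pagerank = sum(pagerank_by_site.values())
    if total_pagerank <= 0:
        raise ValueError("total PageRank must be positive")
    return {
        site: int(total_budget * pagerank / total_pagerank)
        for site, pagerank in pagerank_by_site.items()
    }


def unique_pages_crawled(pages_crawled, duplicate_ratio):
    """Estimate how many crawled pages are unique, given a share of duplicates.

    Assumption: duplicates are spread evenly through the crawl.
    """
    if not 0.0 <= duplicate_ratio <= 1.0:
        raise ValueError("duplicate_ratio must be between 0 and 1")
    return round(pages_crawled * (1.0 - duplicate_ratio))


# Three hypothetical sites sharing a budget of 1,000 page fetches.
budget = allocate_crawl_budget(1000, {"site_a": 6, "site_b": 3, "site_c": 1})

# If two out of three pages on site_b are duplicates, only a third are unique.
unique_b = unique_pages_crawled(budget["site_b"], 2 / 3)
```

In this toy model, site_b receives 300 fetches but only about 100 of them land on unique content, which is one way to picture why duplicate pages waste crawling effort.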
Although this interview doesn’t give the full picture of Google’s indexing efforts, it does touch on a few significant points and supports many current ideas about how Google’s indexing methods compare with those of its peers.
