I just saw a video of a talk that Matt Cutts, head of Google's anti web spam team, gave at WordCamp San Francisco 2009. I can only recommend listening to it (sadly, audio only), since it is as entertaining as it is informative and covers a lot of topics potentially relevant to bloggers.
Among other things, Matt also addresses Google Page Rank and gives away one of its applications.
Ask 10 website owners what they think about PR and you'll probably get 12 different opinions. Views typically range from "obsolete and worthless" to "an absolute must have", along with the usual speculation about how PR does or does not affect search engine result pages (SERPs) and the potential for blog marketing.
Matt did not actually say anything about PR and SERP positioning. He did, however, make the definitive statement that Page Rank correlates with crawl rate (slide 14 / about 5:00 in the video). According to him, a site with a higher Page Rank will be visited more often. This just calls for comparing the crawl rates of sites with different ranks.
For my comparison, I have chosen two sites: onyxbits.de, which currently has a PR of 5, and awo-dillenburg.de, which has a PR of 3. These two meet a number of criteria needed to avoid comparing apples with oranges:
- Both are hosted on the same web server, so they are equally accessible to the crawler. Also, logs rotate in the same way and at the same time.
- Both belong to the same TLD. This should rule out any country-specific preferences Google might or might not have. The fact that awo-dillenburg.de is in German, while onyxbits.de is in English, should not matter: the crawler cannot figure this out without downloading the content first.
- Both sites are powered by the Drupal CMS. They have a similar setup and link structure, allowing for the same "click paths".
- Both sites update slowly. This should reduce the effect of new content attracting the Googlebot and making it check back more often.
- There is guaranteed to be at least one entire level of difference in PR between the two sites, even if they are borderline (5.0 vs. 3.9).
With the two sites chosen, it is time to consider methodology.
I intend to extract the raw data for my analysis directly from the Apache access logs. Logs are rotated at midnight by logrotate(8) and old logs are archived for 52 days.
I am interested in how often the Googlebot visits each site per day. However, since the two sites do not have the same number of pages, I cannot simply count the total number of hits per day per site: more pages mean more hits, which would skew the result. Instead, I have to choose one page from each site to represent the whole site. The front page is a natural pick for this. Drupal ensures that it is the most central page by linking to it from every other page. Picking the front page also simplifies the task of writing a shell script for data extraction.
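The counting step described above can be sketched in a few lines. The sketch is in Python rather than shell, and it is not the script used for this post. It assumes that the logs use Apache's "combined" format, that the front page is requested as `/`, and that the crawler identifies itself with "Googlebot" in its user agent string. The log lines in `sample_log` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. The post does not state
# which LogFormat the server actually uses.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_front_page_crawls(lines, front_page="/", bot_marker="Googlebot"):
    """Map each day (e.g. '21/Aug/2009') to the bot's front-page hits."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a combined-format entry, skip it
        if match.group("path") != front_page:
            continue  # only the front page represents the site
        if bot_marker in match.group("agent"):
            hits[match.group("day")] += 1
    return hits


# Made-up log lines, for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (X11; U; Linux i686; en-US) Firefox/3.5"
sample_log = [
    '192.0.2.1 - - [21/Aug/2009:03:12:45 +0200] "GET / HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT,
    '192.0.2.1 - - [21/Aug/2009:03:12:50 +0200] "GET /node/1 HTTP/1.1" 200 7300 "-" "%s"' % GOOGLEBOT,
    '192.0.2.7 - - [21/Aug/2009:09:30:00 +0200] "GET / HTTP/1.1" 200 5120 "-" "%s"' % BROWSER,
    '192.0.2.1 - - [20/Aug/2009:22:01:10 +0200] "GET / HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT,
]

daily_hits = count_front_page_crawls(sample_log)
for day, count in daily_hits.items():
    print(day, count)
```

Matching on the user agent string alone trusts whatever the client claims to be, so other bots posing as the Googlebot would be counted as well.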
Once I have my 52 samples extracted, I can use gnuplot for visualizing them.
With the plan laid out, what are the results?
The graph below illustrates the number of hits per day to the front page of each site for the past 52 days, starting with yesterday (08/21/2009) as day 1. The curve labeled "Site A" represents awo-dillenburg.de with a PR of 3, while "Site B" refers to onyxbits.de with a PR of 5.
Plotting was done using gnuplot 4.2 patchlevel 2. Click here to download the sample data and settings file.

According to the graph, the average number of crawls per day was 0.87 for PR3 and 3.27 for PR5. Of course, when interpreting these numbers, it should be kept in mind that the PR difference was probably not exactly 2 (5-3), but lies somewhere between 1.1 (5.0-3.9) and 2.9 (5.9-3.0).
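The arithmetic behind these numbers can be spelled out as a small sketch. It assumes that the displayed PR is the real value with the decimals cut off, and that the real value has one decimal place, as in the 5.0 vs. 3.9 example. The function name `pr_difference_bounds` is made up for illustration.

```python
def pr_difference_bounds(higher, lower, step=0.1):
    """Smallest and largest real PR difference for two displayed PR values.

    Assumption: a displayed PR of n hides a real value between
    n and n + 1 - step (for example 3.0 to 3.9).
    """
    smallest = round(higher - (lower + 1 - step), 1)
    largest = round((higher + 1 - step) - lower, 1)
    return smallest, largest


# Averages taken from the text above.
mean_crawls_pr3 = 0.87
mean_crawls_pr5 = 3.27

bounds = pr_difference_bounds(5, 3)
ratio = mean_crawls_pr5 / mean_crawls_pr3

print("PR difference lies between", bounds[0], "and", bounds[1])
print("PR5 front page was crawled about", round(ratio, 1), "times as often")
```

In other words, the front page of the PR5 site was crawled a little under four times as often as that of the PR3 site, for a real PR difference somewhere between one and three levels.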
Conclusion: Page Rank correlates with crawl rate, at least for these two sites. A higher PR means being crawled more often, resulting in new content being indexed faster. When time to market matters, Page Rank matters as well.