October 5th, 2007


Google webmaster verification

Google's webmaster tools show you how Google sees your site: who links to it, which search terms lead to it, and so on. Very handy.

They verify site ownership by asking you to make the URL http://your-site/long-token.html valid. Then they fetch both "long-token.html" and "notfound_long-token.html"; if the former returns 200 and the latter returns 404, you are granted access to the site's data in the webmaster tools.

I wondered why they had the "notfound" bit, but then I stumbled across the Hacker News page-not-found page: it gives a 200 instead of a 404. If Google didn't also check for a 404, any website with a similar "every URL is real" policy would be accessible to anyone via the webmaster tools.
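The check described above, and the reason for the second fetch, can be sketched roughly as follows. This is only an illustration of the logic as I understand it, not Google's actual code: the names `owns_site`, `http_status`, and `status_of` are made up for this example, and the URL layout is taken from the description above.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        # Network failures (URLError) are deliberately left to propagate.
        return err.code


def owns_site(status_of: Callable[[str], int], site: str, token: str) -> bool:
    """True only if the token page exists and its 'notfound_' twin does not."""
    base = site.rstrip("/")
    real = status_of(f"{base}/{token}.html")
    bogus = status_of(f"{base}/notfound_{token}.html")
    return real == 200 and bogus == 404
```

Against a live site this would be called as `owns_site(http_status, "http://your-site", "long-token")`. Passing `status_of` as a parameter makes the point easy to see without a network: a site that answers 200 to everything passes the first fetch but fails the second, so it cannot be claimed by a stranger.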

It seems to me that responding with 200 for every possible page is a problem waiting to happen. For example, someone could publish links to long, random URLs on the site, then wait for search engines to start crawling them. Since none of those pages would ever come back as missing, they'd be checked over and over again.
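Here is a toy model of that scenario. It assumes a deliberately naive crawler that re-fetches every URL it knows about each round and forgets a URL only when it gets a 404; real crawlers are far smarter than this, and the `recrawl` function is invented purely for illustration.

```python
from typing import Callable, Iterable, Set, Tuple


def recrawl(
    frontier: Iterable[str],
    status_of: Callable[[str], int],
    rounds: int,
) -> Tuple[int, Set[str]]:
    """Re-fetch every known URL each round, dropping those that return 404.

    Returns the total number of fetches made and the URLs still being tracked.
    """
    known = set(frontier)
    fetches = 0
    for _ in range(rounds):
        survivors = set()
        for url in known:
            fetches += 1
            if status_of(url) != 404:
                survivors.add(url)
        known = survivors
    return fetches, known


# 100 junk links planted by a third party (hypothetical paths).
junk = {f"/junk{i}" for i in range(100)}

always_200 = recrawl(junk, lambda url: 200, rounds=5)
proper_404 = recrawl(junk, lambda url: 404, rounds=5)
```

With a site that returns 200 for everything, all 100 junk URLs are fetched in each of the 5 rounds and are still being tracked at the end. With a site that returns 404, each is fetched once and then dropped.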