I need the URLs for these 142,000 blogs
27 September 2007
Meg Tsiamis’ latest post links to a page showing that Blogger, the US-based, Google-owned service, hosts 142,000 Australian blogs.
I need a script that will extract the URLs of these blogs from the profile pages behind that link. I’m willing to pay a developer who can write and run a script to do it. Email me at anthony @ blogs .com.au for details and to discuss it further.
4 comments
It appears to be *only* 137,073 bloggers with an AU profile (at this point in time).
http://www.blogger.com/profile-find.g?t=l&loc0=AU&start=137070
Sounds like a fun exercise though. But…
Whip up the code to download listing and profile pages — 1 hour
Download and scan through 140,000+ pages — more than 8 days at 5 sec/page
Offend the Google God and get penalised — priceless
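The arithmetic behind that estimate can be checked with a rough sketch like the one below. It only builds the listing URLs and computes the crawl time; it does not fetch anything. The URL pattern and the 5 sec/page delay come from the comment above, while the page size of 10 is an assumption (inferred from the last `start` value being 137070) and the function names are purely illustrative.

```python
from urllib.parse import urlencode

BASE = "http://www.blogger.com/profile-find.g"  # URL quoted in the comment
TOTAL_PROFILES = 137_073                        # count quoted in the comment
PAGE_SIZE = 10          # ASSUMPTION: results per listing page, not confirmed
SECONDS_PER_PAGE = 5    # polite delay quoted in the comment


def listing_urls(total=TOTAL_PROFILES, page_size=PAGE_SIZE):
    """Yield one listing URL per page of AU profiles (hypothetical helper)."""
    for start in range(0, total, page_size):
        yield BASE + "?" + urlencode({"t": "l", "loc0": "AU", "start": start})


def crawl_days(pages, seconds_per_page=SECONDS_PER_PAGE):
    """Days needed to download `pages` pages one at a time."""
    return pages * seconds_per_page / 86_400  # 86,400 seconds in a day


if __name__ == "__main__":
    urls = list(listing_urls())
    print(len(urls), "listing pages, last one:", urls[-1])
    print(f"{crawl_days(140_000):.1f} days for 140,000 pages")
```

At 5 seconds per page, 140,000 pages come to 700,000 seconds, which is just over 8 days, so the estimate holds before any retries or failures are counted.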
Hi Scott, and thanks for commenting. I take your point, so I just checked their robots.txt file, and they actually disallow spidering of that page. Personally, I don’t see the load on their site as being any different to what Google themselves do when indexing public web sites, databases, books, etc. Google are in a win-win-win-win situation if I have the list of URLs. Their blogspot sites get more exposure. blogs.com.au, which is powered by Google Co-op, becomes a better resource that generates more search requests, which earns Google more advertising money, and because I haven’t entered an AdSense code, they keep it all. On top of this, I’d have to upgrade to the Business Edition of Google Co-op to cope with the number of URLs. Alas, the disallow in robots.txt has ruined my plans for world domination. Damn you, Google, and damn you all!
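For anyone wanting to repeat that robots.txt check from a script, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt text in it is a made-up sample, not a copy of Blogger’s actual file, and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: a hypothetical robots.txt that blocks the profile listing page.
# The real file may be worded differently.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /profile-find.g
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    listing = "http://www.blogger.com/profile-find.g?t=l&loc0=AU&start=0"
    print(is_allowed(SAMPLE_ROBOTS_TXT, listing))  # blocked by the sample rule
```

To test against a live site rather than a sample string, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before `can_fetch()` is called.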
Hi,
I’ve got a list of 145,793 Australian Blogger blogs at
http://theaustralianindex.com/?s=blogger
I haven’t done much filtering on the list yet, so it probably contains a large number of spam and abandoned blogs. At a rough estimate, over half haven’t been updated in the last year.
Let me know if you want the list in another format.
Hi Anthony. I’ve sent you an email about a script that I wrote that I believe is able to get what you want. Did you get my email? Cheers.