A look at the Australian blogosphere by blogs.com.au

I need the URLs for these 142,000 blogs

Meg Tsiamis’ latest post points to a link that shows the US-based, Google-owned Blogger service is hosting 142,000 Australian blogs.

I need a script that will extract the URLs of these blogs from the profile pages on that link. I’m willing to pay a developer who can run a script that does it. Email me at anthony @ blogs .com.au for details and to discuss it further.

4 comments

1 Scott Yang — 09.28.07 at 10:33 am

It appears to be *only* 137,073 bloggers with an AU profile (at this point in time).

http://www.blogger.com/profile-find.g?t=l&loc0=AU&start=137070

Sounds like a fun exercise though. But…

Whip up the code to download listing and profile pages — 1 hour
Download and scan through 140,000+ pages — more than 8 days at 5 sec/page
Offend the Google God and get penalised — priceless
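
The download estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `crawl_days` is made up for illustration, and the 5 seconds per page is the polite delay quoted in the comment, not a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def crawl_days(pages: int, seconds_per_page: float) -> float:
    """Return how many days a strictly sequential crawl would take."""
    return pages * seconds_per_page / SECONDS_PER_DAY


# 140,000 pages at one page every 5 seconds, as in the comment above.
days = crawl_days(140_000, 5)
print(f"{days:.1f} days")
```

At one request every 5 seconds, 140,000 pages come to 700,000 seconds, which is a little over 8 days and matches the figure in the comment.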

2 Anthony — 09.29.07 at 11:10 am

Hi Scott, and thanks for commenting. I take your point, so I just checked their robots.txt file, and they actually disallow spidering of that page. Personally, I don’t see the load on their site as being any different from what Google themselves impose when indexing public web sites, databases, books, etc. Google are in a win-win-win-win situation if I have the list of URLs: their Blogspot sites get more exposure; blogs.com.au, which is powered by Google Co-op, becomes a better resource that generates more search requests and earns Google more advertising money; because I haven’t entered an AdSense code, they keep it all; and on top of this I’d have to upgrade to the Business Edition of Google Co-op to cope with the number of URLs. Alas, the disallow in robots.txt has ruined my plans for world domination. Damn you, Google, and damn you all!
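
The robots.txt check described above could be sketched with Python's standard `urllib.robotparser` module. The rules in `SAMPLE_ROBOTS_TXT` are an assumption made up for illustration, since the actual contents of Blogger's robots.txt are not quoted in this thread, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, assumed for illustration only.
# The real file is not quoted in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /profile-find.g
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The listing URL from the first comment.
listing_url = "http://www.blogger.com/profile-find.g?t=l&loc0=AU&start=137070"
print(is_allowed(SAMPLE_ROBOTS_TXT, listing_url))
```

Parsing the text directly keeps the sketch offline. A real check would instead point the parser at the live file with `set_url()` and `read()` before requesting any page.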

3 Australian Index — 10.03.07 at 6:17 pm

Hi,

I’ve got a list of 145,793 Australian Blogger blogs at

http://theaustralianindex.com/?s=blogger

I haven’t done much filtering on the list yet, so it probably contains a large number of spam and abandoned blogs. As a rough estimate, over half haven’t been updated in the last year.

Let me know if you want the list in another format.

4 Antonio — 10.03.07 at 8:46 pm

Hi Anthony. I’ve sent you an email about a script that I wrote that I believe is able to get what you want. Did you get my email? Cheers.
