KDnuggets : News : 2003 : n22 : item3 < PREVIOUS | NEXT >

Features

From: Pablo Mayrgundter
Date: 12 Nov 2003
Subject: Internet Archive approached 300 TB in 2002

Regarding your item on the largest commercial databases (KDnuggets News 03:21, Item27):

Last summer I talked with Brewster Kahle at the Internet Archive (www.archive.org) about their setup. At that time the archive was somewhere around 300 TB.

The specs at the time: many cheap Linux boxes, each with four 160 GB drives (the largest available at the time, I think). The filesystem in use was ext2, with perhaps RAID 2, and all of the archiving and querying was done with standard GNU tools, e.g. find, grep, sort, and gawk/sed. Mr. Kahle said that they'd tried other approaches (e.g. the TerraServer technologies from MS, via Jim Gray of Microsoft Research), but nothing else scaled to that level as effectively. They found some problems in some of the tools, but submitted patches that are now in the common distributions.
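
For readers unfamiliar with the find/grep/sort style of querying mentioned above, the following is a minimal sketch of the same idea in Python. It is not the Archive's actual tooling: the function names, the `.idx` suffix in the usage note, and the one-record-per-line file layout are all assumptions made for illustration.

```python
import os
import re
from typing import Iterable, Iterator, List


def find_files(root: str, suffix: str) -> Iterator[str]:
    """Roughly `find root -name '*suffix'`: yield paths of matching files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def grep_lines(paths: Iterable[str], pattern: str) -> Iterator[str]:
    """Roughly `grep -h pattern files...`: yield lines matching a regex."""
    regex = re.compile(pattern)
    for path in paths:
        # errors="replace" keeps the scan going over badly encoded files.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if regex.search(line):
                    yield line.rstrip("\n")


def query(root: str, suffix: str, pattern: str) -> List[str]:
    """Roughly `find ... | xargs grep -h ... | sort` as one call.

    Hypothetical helper: sorts by code point, similar to `LC_ALL=C sort`,
    and holds all matching lines in memory, unlike GNU sort.
    """
    return sorted(grep_lines(find_files(root, suffix), pattern))
```

A call such as `query("/data", ".idx", r"example\.org")` would return every matching line under `/data` in sorted order. The generators stream files one line at a time, which is the property that makes the pipeline approach workable on plain files spread across many disks; only the final sort needs the whole result set.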

Pablo Mayrgundter
Director of Applications and Services
www.reeltwo.com



Copyright © 2003 KDnuggets.