Poster: JTW Date: May 12, 2004 4:42pm
Forum: petabox Subject: Re: Selective powering of large petasites

As you’ve been saying, once you go beyond a couple of terabytes of data, most of it is rarely accessed again, and in some cases never again. This is one of the problems I see with very large databases. We have massive servers running 24/7 where only about 1% of the stored data is used, and 97% of that data was added in the last 3-6 months. But because of the database software we’re running (and a management decision), all the data is stored in one large database that is always powered on. Now to the part that might be of interest to you: we also store reports long-term.

What I’ve been looking into is a system with “on demand wake up” functionality for these reports. The “computers”, and more importantly their hard drives, spend most of their time in a “sleep mode”, i.e. actually powered off, and a network signal to the BIOS brings them back to life when required. For large archives this could save thousands in power costs, reduce heat problems, and should cut down hard drive failure rates. From a management point of view, if it’s possible to figure out in advance what is going to be accessed least, that data can be placed in this long-term storage system, while the more highly demanded data stays in an always-on subsystem.
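For anyone who wants to try the “network signal to the BIOS” part, here is a minimal sketch assuming the boxes support standard Wake-on-LAN and have it enabled in the BIOS and on the network card. The magic packet format (6 bytes of 0xFF followed by the target MAC address repeated 16 times) is the standard one. The function names, the broadcast address, and UDP port 9 are my own choices for illustration, not anything from the setup described above.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling `wake("00:11:22:33:44:55")` with a storage box's real MAC address would send the packet. Whether the box actually powers up depends on its BIOS and NIC settings, so this only covers the sending side.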

From a topology point of view everything seems to be online 24/7, but in actuality it’s the requests for data that determine which systems are currently powered up. I’m on the prowl to see if anybody else is doing this before I invest time in building our own solution for feasibility testing with off-the-shelf components and Linux. Initially it would be 4 boxes with a single 100GB drive each (400GB of data in total) and a master controller. The controller will mount all the subsystems over NFS or Samba, depending on which OS works best for hibernation / suspend modes. The idea is that the other computers power up only when someone accesses data in those subdirectories on the master controller. Of course there needs to be some controlling program that knows the location of all the data you have, meaning you can’t just let users browse the network looking for files, as the systems would end up starting and stopping every couple of minutes.
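To make the controlling-program idea concrete, here is a rough sketch of what I have in mind, not a finished design. Everything in it is hypothetical: the class name `MasterController`, the catalog layout (file path mapped to box name), the `power_on` / `power_off` hooks, and the 180-second idle timeout, which is only a placeholder. Lookups are answered from the catalog without waking anything, and a box stays up for a while after its last access so it doesn't start and stop every couple of minutes.

```python
import time
from typing import Callable, Dict, List, Optional


class MasterController:
    """Tracks which storage box holds each file and which boxes are powered on."""

    def __init__(
        self,
        catalog: Dict[str, str],
        idle_timeout: float = 180.0,
        clock: Callable[[], float] = time.monotonic,
        power_on: Callable[[str], None] = lambda node: None,
        power_off: Callable[[str], None] = lambda node: None,
    ) -> None:
        self.catalog = catalog            # file path -> box name (hypothetical layout)
        self.idle_timeout = idle_timeout  # seconds a box stays up after its last access
        self.clock = clock
        self.power_on = power_on          # hook, e.g. send a wake-up signal
        self.power_off = power_off        # hook, e.g. ask the box to shut down
        self.last_access: Dict[str, float] = {}  # only powered-on boxes appear here

    def locate(self, path: str) -> Optional[str]:
        """Answer 'where is this file?' from the catalog alone, waking nothing."""
        return self.catalog.get(path)

    def request(self, path: str) -> str:
        """Wake the box holding `path` if it is off, and record the access."""
        node = self.catalog[path]  # KeyError for paths not in the catalog
        if node not in self.last_access:
            self.power_on(node)
        self.last_access[node] = self.clock()
        return node

    def sweep(self) -> List[str]:
        """Power off every box idle for at least `idle_timeout`; return their names."""
        now = self.clock()
        idle = [n for n, t in self.last_access.items() if now - t >= self.idle_timeout]
        for node in idle:
            self.power_off(node)
            del self.last_access[node]
        return idle
```

In use, `request()` would be called whenever someone opens a file through the controller, and `sweep()` would run periodically (say once a minute). The mounting over NFS or Samba is left out entirely here.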

Reply

Poster: brewster Date: May 12, 2004 5:32pm
Forum: petabox Subject: Re: Selective powering of large petasites

The Library of Alexandria has a copy of much of the web collection of the Internet Archive. They run their systems with a “sleep after 3 minutes of inactivity” setting. They report it works fine. In a separate test, Bruce Baumgart found that it takes 9-10 seconds to spin a disk back up.

We have not done a large-scale test of this approach, but it sounds promising for many applications.

A petabox with spun-down disks would save half the power.