home :: features :: article

Simple logic for smart data protection

Parity checks could meet the exploding demand for digital storage.

People who want to make the digital revolution more energy efficient, says Tiffany Jing Li, might benefit from a visit to the U.S. Library of Congress.

The world’s largest library has assembled 29 million books and periodicals, 15 million recordings and photographs, and 5 million maps in its 210-year history.

If that sounds impressive, says Li, consider this: The world today generates an equivalent amount of new digital information every 15 minutes.

Whether the videos, photos and tunes taking their place in cyberspace are as worthy of preservation as congressional tomes is a matter of conjecture.

Tiffany Jing Li’s erasure codes promise to reduce the need for heavily redundant data replacement schemes.

But there is little doubt, says Li, an associate professor of electrical and computer engineering, that the explosion of digital data is exacting a steep energy price tag.

“The proliferation of information has created an exponential demand for digital storage,” says Li. “This requires the availability of massive data centers that are becoming notorious energy hogs.”

A PC or laptop by itself generates an inconsequential amount of heat, says Li. But when hundreds or thousands join forces to back up data at a bank, corporation or military base, cooling is essential. In 2006, the U.S. burned through 61 billion kilowatt-hours, at a cost of $4.5 billion, to power and cool data centers. The U.S. Environmental Protection Agency predicts consumption will double by 2011.

Much of that energy, says Li, goes to provide redundant data protection.

“If one disk fails, you need to replace it and restore the data from its duplication. If a second disk fails during the restoration, you need a third disk with the data backed up and protected. Replicating in triplicate enables you to support two concurrent failures.

“All three disks have to be operating simultaneously – collecting and filing data, incorporating updates, using energy. so when you measure energy usage, you are always multiplying by at least three.

“If you can reduce your storage disks without sacrificing reliability or robustness, you can cut costs significantly.”

Government and industry have sought to make data centers more efficient by arranging servers more optimally, improving air flow and developing better lighting and cooling systems.

Li, who has an EAGER (EArly Concept Grants for Exploratory Research) grant from NSF's Office of Cyberinfrastructure, proposes to cut energy consumption in data centers by making data storage itself more efficient. She designs erasure codes that restore lost data when storage disks fail, thus reducing the need for heavily redundant replacement schemes.

An efficient erasure code, says Li, enables one storage disk to protect multiple disks of data with parity checks that rely on a simple logical operation known as the “exclusive or” (XOR), or “modulo-2 addition.” Data is stored as a binary sequence. A parity check adds the sequences for a and b, rounding sums to 0 for even numbers and 1 for odd numbers.

“If you know any two numbers, you can derive the missing number,” says Li. for example, using a parity check, a c disk can protect disk a and disk b through bit-by-bit XOR operation.

“The c disk is equal to a modulo-2 plus b. It is a function of both. If the a disk fails, you subtract b from c to retrieve and restore the content of a. If the b disk fails, you do the reverse. If the c disk fails, you recompute using data from a and b. A single parity disk thus simultaneously backs up two data disks.

While the idea sounds attractive, erasure codes are difficult to design, implement and verify in large-scale data storage systems, says Li.

“In case of data changes, you have to recalculate the related parity checks. With one-for-one replication, it’s easy to verify whether or not you’ve correctly backed up or restored data.

“Erasure codes must be designed just right. The optimal code has to be computationally simple. It has to provide maximum protection against erasures with a minimal number of parity checks.

“In the end, it’s not just a storage network but also a computing network that is needed.”

An optimal erasure code, says Li, is like a hypothetical situation in which a person is given five keys to open three locked doors.

“If you lose two of the keys,” she says, “the remaining three will still open all three doors. It doesn’t matter which keys you lose. “The same is true when you back up three disks of original data with two parity check disks such that any one disk can replace any other. With an optimal code consisting of a + b + c plus f + g, you can derive the data on two failed disks as long as three remain. Any added disk can replace any original disk, but the bits have to work harder.”

The 3 + 2 replacement scheme can be expanded to 5 + 2, 7 + 2 and beyond, says Li.

“Our goal is to design optimal erasure coding schemes that are robust, simple and space-efficient and can replace the replication method of storing data.”

Li draws an analogy to baseball. While today’s specialized relief pitchers might be called on to face just one batter, a Babe Ruth able to strike out the side and then hit the game-winning homer is a superior model for the versatility required of bits of data in an erasure code backup scheme.

Li and her students have successfully designed optimal k + 2 erasure codes for any integer k and for odd values of k. These codes have the smallest possible computational complexity promised by theory. Li’s group has performed theoretical analyses and run computer simulations and will work with data centers to check the codes against real-world failure frequencies and protection requirements.

“Each data center is different. You have to observe it for years, and you have to provide protection for all possible contingencies.

“We’re investigating the possibility and efficiency of a layered approach, with codes of different erasure-correcting capabilities coexisting in the same center to provide local, regional and global protection.”

A parity check for energy use. Enlarge this image.