Corral: 100 Diverse Big Data Collections

Keeping large scientific datasets for long periods of time requires special-purpose technologies and expertise. That is the purpose of the Corral big data repository, which is celebrating the addition of its 100th unique scientific research collection.

Texas Advanced Computing Center (TACC), Oct 11, 2013.

The world is full of fascinating scientific data, from wind maps to water tables to brain databases. All of that information must be stored somewhere and cared for if it is to stand the test of time, just as documents in physical archives are.

Dropbox, Amazon or a university server might be sufficient to house small to medium-sized digital collections, or for short-term storage, but to maintain massive (hundreds of gigabytes to petabyte-sized) datasets for years or decades, special-purpose technologies and expertise are required.

Corral Big Data repository

Corral, the large-scale data repository at the Texas Advanced Computing Center (TACC), came online in 2009 to support the storing and sharing of research, data and results. The system just achieved a milestone: Corral now hosts 100 unique scientific research collections, ranging from measurements of Earth's gravity field to whale songs to mass spectrometry data. Its usage is growing, too: total usage has grown by 10% per month over the last six months, and Corral recently crossed the one-petabyte mark in total data stored.

Corral complements the existing suite of TACC resources for data-intensive computing, including the Ranch tape archive (more than 100 petabytes), Stampede, the newest petascale supercomputer (more than 15 petabytes of dedicated storage), and a scalable global file system (20 petabytes) released to users in fall 2013.

... At six petabytes - or six million gigabytes - and growing, Corral stores collections that can't live anywhere else because of their scale and complexity. Corral also makes it easy to share data, control access to information, and analyze large datasets. Connected to TACC's other advanced computing systems via a high-speed network, Corral is a critical part of the end-to-end research workflow for scientists.
