Gigaom.com, By Brett Sheppard, Aug 1, 2010.
Many Fortune 500 and mid-size enterprises are funding Hadoop test/dev projects for Big Data analytics, but question how to integrate Hadoop into their standard enterprise architecture. For example, Joe Cunningham, head of technology strategy and innovation at credit card giant Visa, told the audience at last year's Hadoop World that he would like to see Hadoop evolve from an alpha/beta environment into mainstream use for transaction analysis, but has concerns about integration and operations management.
What Big Data analytics has lacked is an equivalent of the LAMP stack (Linux, Apache HTTP Server, MySQL and PHP). Fortunately, a LAMP-like stack is emerging for Big Data aggregation, processing and analytics. It includes:
- Hadoop Distributed File System (HDFS) for storage
- MapReduce for distributed processing of large data sets on compute clusters
- Apache HBase for fast read/write access to tabular data [Editor: HBase is the Hadoop database]
- Apache Hive for SQL-like queries on large data sets, plus a columnar storage layout through the RCFile format
- Flume for log file and streaming data collection, along with Sqoop for database imports
- JDBC and ODBC drivers to allow tools written for relational databases to access data stored in Hive
- Hue (Hadoop User Experience) for user interfaces
- Apache Pig for dataflow and parallel computations
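
To make the MapReduce layer of the stack above more concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps, using a word count as the example. It is plain Python and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel across machines and handle the grouping between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a distributed framework this grouping happens between the map
    and reduce phases; here it is done in a single process.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data analytics", "Big Data processing"]
    print(word_count(sample))
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be spread across many machines, which is the property that lets the pattern scale to large data sets on compute clusters.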