What is Hadoop?
About Hadoop®
Apache™ Hadoop® is an open source
software project that enables the distributed processing of large data sets
across clusters of commodity servers.
It is designed to scale up from a single
server to thousands of machines, with a very high degree of fault tolerance.
Rather than relying on high-end
hardware, the resiliency of these clusters comes from the software’s ability to
detect and handle failures at the application layer.
Apache Hadoop has two main subprojects:
· HDFS – A file system that spans all the nodes in a Hadoop cluster for data storage. It links together the file systems on many local nodes to make them into one big file system. HDFS assumes nodes will fail, so it achieves reliability by replicating data across multiple nodes (a short code sketch follows this list).
· MapReduce – The framework that understands and assigns work to the nodes in the cluster.
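To make the storage model concrete, here is a minimal sketch of writing a file to HDFS through Hadoop's Java FileSystem API. The NameNode address hdfs://namenode:8020, the path /user/demo/hello.txt, and the replication factor of 3 are assumptions for illustration only; adjust them for your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; point this at your own cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        // Ask HDFS to keep three copies of each block (a common default).
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/hello.txt");

        // Write a file; the client sees one logical file even though
        // HDFS stores it as blocks replicated across DataNodes.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeUTF("One logical file, stored as replicated blocks across the cluster");
        }

        // Report how many copies of this file's blocks HDFS maintains.
        short replication = fs.getFileStatus(file).getReplication();
        System.out.println("Replication factor: " + replication);
        fs.close();
    }
}

Although the program sees a single file at a single path, HDFS stores it as blocks replicated across several DataNodes, which is where the reliability described above comes from.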
Hadoop is supplemented by an ecosystem of Apache projects, such as Pig, Hive and ZooKeeper, that extend the value of Hadoop and improve its usability.
So what’s the big deal?
Hadoop changes the economics and the dynamics of large-scale computing. Its impact can be boiled down to four salient characteristics.
Hadoop enables a computing solution that is:
· Scalable – New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top (see the sample job after this list).
· Cost effective – Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
· Flexible – Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide.
· Fault tolerant – When you lose a node, the system redirects work to another node holding a copy of the data and continues processing without missing a beat.
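To show how little a job has to change as a cluster grows, here is a minimal sketch of the classic MapReduce word count written against Hadoop's Java mapreduce API. The class names (WordCount, TokenMapper, SumReducer) and the input and output paths passed in as arguments are illustrative assumptions; the point is that the same code runs unchanged on one node or a thousand, with the framework handling distribution and retries.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emit (word, 1) for every token in each input line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output locations on HDFS are supplied as arguments.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In practice a class like this would be packaged into a JAR and submitted with the hadoop jar command, reading from and writing to paths on HDFS; adding nodes to the cluster changes nothing in the code.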
Think Hadoop is right for you?
Eighty percent of the world's data is unstructured, and most businesses don't even attempt to use this data to their advantage. Imagine if you could afford to keep all the data generated by your business. Imagine if you had a way to analyze that data.
IBM InfoSphere
BigInsights brings the power of Hadoop to the
enterprise. With built-in analytics, extensive integration capabilities and the
reliability, security and support that you require, IBM can help put your big
data to work for you.
InfoSphere BigInsights Quick Start Edition, the latest addition to the InfoSphere BigInsights family, is a free, downloadable, non-production version.
With InfoSphere
BigInsights Quick Start, you get access to hands-on learning through a set of
tutorials designed to guide you through your Hadoop experience. Plus, there is no data capacity or time limitation, so you can experiment with large data sets and explore different use cases on your own timeframe.