Hadoop and Big Data
Doug Cutting, Cloudera's Chief Architect, helped create Apache Hadoop out of necessity as data from the web exploded and grew far beyond the ability of traditional systems to handle it.
Hadoop was initially inspired by papers published by Google outlining its approach to handling an avalanche of data, and it has since become the de facto standard for storing, processing, and analyzing hundreds of terabytes, and even petabytes, of data.
Apache Hadoop is 100% open
source, and pioneered a fundamentally new way of storing and processing data.
Instead of relying on
expensive, proprietary hardware and different systems to store and process
data, Hadoop enables distributed parallel processing of huge amounts of data
across inexpensive, industry-standard servers that both store and process the
data, and can scale without limits.
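To make that model concrete, here is a minimal sketch of the classic MapReduce word count, written against Hadoop's Java API (this example is not from the article; the class names and paths are illustrative). Each map task runs on a node that already stores a block of the input, and the reduce tasks aggregate the partial counts the mappers emit:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map tasks run in parallel on the nodes that already hold the data blocks.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);  // emit (word, 1) for each token
      }
    }
  }

  // Reduce tasks aggregate the partial counts shuffled in from every mapper.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // pre-aggregate on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the computation ships to the data rather than the data to the computation, adding servers grows storage and processing capacity together.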
With Hadoop, no data is too big.
And in today’s
hyper-connected world where more and more data is being created every day,
Hadoop’s breakthrough advantages mean that businesses and organizations can now
find value in data that was recently considered useless.
Reveal Insight From All Types of Data, From All Types of Systems
Hadoop can handle all types of data from disparate systems: structured, unstructured, log files, pictures, audio files, communications records, email, just about anything you can think of, regardless of its native format.
Even when different types of
data have been stored in unrelated systems, you can dump it all into your
Hadoop cluster with no prior need for a schema.
In other words, you don't need to know how you intend to query your data before you store it; Hadoop lets you decide later, and over time the data can reveal questions you never even thought to ask.
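As a sketch of that schema-on-read idea (the file paths and the tab-separated layout are assumptions for illustration, not details from the article), the HDFS Java API lets you ingest a raw file untouched and impose structure only when you finally read it:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemaOnRead {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Ingest: dump the raw log file into HDFS as-is -- no schema,
    // no transformation, no prior decision about future queries.
    fs.copyFromLocalFile(new Path("/var/log/app/events.log"),
                         new Path("/data/raw/events.log"));

    // Read (possibly months later): only now do we decide how to parse,
    // here treating each line as tab-separated fields.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(new Path("/data/raw/events.log"))))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String[] fields = line.split("\t");  // schema applied at read time
        System.out.println(fields[0]);       // e.g. the timestamp column
      }
    }
  }
}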
By making all of your data usable, not just what's in your databases, Hadoop lets you see relationships that were hidden before and reveal answers that have always been just out of reach.
You can start making more
decisions based on hard data instead of hunches and look at complete data sets,
not just samples.
Redefine the Economics of Data: Keep Everything, Forever, Online
In addition, Hadoop’s cost
advantages over legacy systems redefine the economics of data.
Legacy systems, while fine for certain workloads, simply were not engineered with the needs of Big Data in mind, and they are far too expensive to be used as general-purpose platforms for today's largest data sets.
One of Hadoop's cost advantages is that, because it relies on an internally redundant data structure and is deployed on industry-standard servers rather than expensive, specialized data storage systems, you can afford to store data that was previously not viable to keep.
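That internal redundancy comes from the way HDFS replicates each block across several commodity nodes, and the replication factor is an ordinary configuration property. A minimal hdfs-site.xml sketch (the value of 3 is HDFS's usual default, shown here purely as an illustration):

<configuration>
  <!-- Number of copies HDFS keeps of every block; losing one cheap
       server costs nothing because the other replicas survive. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>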
And we all know that once data is on tape, it's essentially the same as if it had been deleted: accessible only in extreme circumstances.
Enterprises that build their Big Data infrastructure around Cloudera can afford to store literally all the data in their organization, and keep it all online for real-time interactive querying, business intelligence, analysis, and visualization.
Restructure Your Thinking: Make Big Data the Lifeblood of Your Enterprise
With data growing so rapidly, and with unstructured data now accounting for 90% of the data generated today, the time has come for enterprises to re-evaluate their approach to data storage, management, and analytics.
Legacy systems will remain necessary for specific high-value, low-volume workloads, and they complement the use of Hadoop, optimizing the data management structure in your organization by putting the right Big Data workloads in the right systems.
The cost-effectiveness, scalability and
streamlined architectures of Hadoop will make the technology more and more
attractive.
In fact, the need for Hadoop is no longer a question. The only question now is how best to take advantage of it, and the enterprise-proven answer is Cloudera.