BIG DATA
- bits
1. Big data is too big for one title to tackle.
2. We need to build big data teams.
3. There are five steps to build such a
team and it all starts with breaking down the big data talent needs of the
company.
4. The four talent needs are:
a. Business analysis
b. Analytics expertise
c. Data technology expertise
d. Visualization expertise
5. Once the organization has determined its talent needs, it can proceed to the next four steps, which involve:
- Evaluating the internal talent pool
- Filling your talent gap from other sources
- Cross-training to improve and enable your team
- Empowering your team by giving them freedom
6. Big Data Techniques
a. Business Intelligence (BI)/Online Analytical Processing (OLAP)
b. Cluster Analysis
c. Data Mining
d. Predictive Modeling
e. SQL
f. A/B Testing
g. Crowdsourcing
h. Textual Analysis
i. Sentiment Analysis
j. Network Analysis
7. Big Data Vendors
Leading BI tools:
a. Microsoft SQL Server Analysis and Reporting Services
b. SAP BusinessObjects
c. Oracle Business Intelligence
d. IBM Cognos/SPSS
e. SAS
f. MicroStrategy
g. QlikTech
h. TIBCO Spotfire
8. Four Big Data strategies:
· Performance Management
· Data Exploration
· Social Analytics
· Decision Science
9. With respect to future trends in the Big Data field, the following practices are starting to emerge:
a. Integrate multiple Big Data strategies.
b. Build a Big Data capability.
c. Be proactive and create a Big Data policy.
10. Big Data and provides real-world case studies and expert advice to help organisations on their journey. Windows manages the basic functions of a PC and its software,
11. Cloudera’s technology helps companies
break data into digestible chunks that can be spread across relatively cheap
computers.
12. Cloudera is essentially trying to build a type of operating system, à la Windows, for examining huge stockpiles of information.
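To make the "digestible chunks spread across relatively cheap computers" idea concrete, here is a minimal sketch in plain Python. It is not Cloudera's or Hadoop's actual code: the function names, the tiny chunk size, the node names, and the round-robin placement with two replicas are all assumptions chosen for illustration.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut the input into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_chunks(num_chunks: int, nodes: list, replicas: int = 2) -> dict:
    """Place every chunk on `replicas` distinct nodes, round-robin.

    Hypothetical placement policy: real systems also weigh free disk
    space, rack layout, and so on.
    """
    if not 0 < replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    return {
        chunk_id: [nodes[(chunk_id + r) % len(nodes)] for r in range(replicas)]
        for chunk_id in range(num_chunks)
    }


if __name__ == "__main__":
    chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
    placement = assign_chunks(len(chunks), ["node1", "node2", "node3"])
    for chunk_id, chunk in enumerate(chunks):
        print(chunk_id, chunk, placement[chunk_id])
```

Keeping each chunk on more than one machine is what lets cheap hardware be used: if one computer fails, another still holds a copy of the chunk.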
15. Strapped for storage? Before you start randomly deleting stuff, find out which apps are consuming the most space.
16. Your only real option is to free up additional space by deleting apps and data.
17. The solution lies a few steps inside iOS Settings, which can show you exactly what's using your storage, from most to least. Here's how to get there:
19. Hadoop and its components make up a specialized computing system for big data.
20. I was going through the book and actively trying to link the different pieces, such as HDFS, MapReduce, Hadoop, Pig, Hive, Jaql, ZooKeeper, and Flume.
21. What Is a Computing System?
A Computing System (CS) comprises many components:
- Storage system to store the data submitted via input
(E.g. Hard Disks)
- Input devices which produce data streams
(E.g. Keyboard, Sensors)
- Output devices
(E.g. Screen)
- Operating System for managing the show for users and hardware
(E.g. Windows, Mac)
- Machine Language aka Machine Instruction Set
(E.g. Intel SSE, Intel MMX, Intel VT-x)
- High Level Languages for writing apps and scripts
(E.g. C, C++, Java, Python)
- Application and System software to do the user-defined tasks, as well as to manage the high-level system activities
(E.g. MS Word, Photoshop, CCleaner, Antivirus, Disk Defrag)
20. Hadoop Ecosystem and Computing System
The CS and the Hadoop ecosystem components have a lot in common:
- Storage in a CS is similar to the Hadoop Distributed File System (HDFS). HDFS is a distributed storage system, and in both HDFS and a CS the way data is physically stored is quite different from the way we view it.
- Apache Flume is the equivalent of a CS input device. Flume routes data into HDFS; it can be viewed as log data being stored continuously in a file without any user intervention.
- Hadoop is like an operating system: it manages the show for the user and also manages the resources. Just as an OS has components such as resource managers, a kernel, and file systems, Hadoop has components such as:
1. Hadoop Core
2. HDFS
3. Hadoop YARN
4. Hadoop MapReduce
- The MapReduce framework is like the machine instruction set.
- Pig, Hive, and Jaql are high-level languages, the way C, Java, and Python are in a CS. Commands in these languages are converted into corresponding MapReduce jobs.
- Mahout, HBase, Cassandra, Ambari, and ZooKeeper are the equivalents of application and system software, running atop Hadoop.
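To see what a "MapReduce job" looks like once a high-level command has been converted, here is a minimal word-count sketch in plain Python. It only imitates the three phases (map, shuffle, reduce) on a single machine; it does not use the Hadoop API, and the function names are assumptions made up for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big teams"]))
```

In the analogy above, a Pig or Hive statement plays the role of a line of C or Java, and functions like `map_phase` and `reduce_phase` play the role of the low-level instructions it is turned into.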
22. YARN to Spin Hadoop into Big Data Operating System
23. SQL in Hadoop via YARN is a part of the core of this metamorphosis.
24. One of these fundamental trends that is changing the picture is enterprises viewing “big data” as “all their data,” not just specific, narrow aspects of it.
25. Tools and other capabilities have been designed and implemented to address these potential limitations of Hadoop, including vendor tools such as Platfora, as well as well-known projects such as Hive, Pig, and HBase.
26. The YARN project is about opening up the entire framework for use cases that were previously not possible.
27. By managing the resource requests across a cluster, YARN turns Hadoop from a single-application system into a multi-application operating system.
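The idea of one component managing resource requests for many applications can be sketched as a toy in plain Python. This is not the YARN API: `Node`, `ToyResourceManager`, the first-fit policy, and the memory figures are all hypothetical, chosen only to show several kinds of applications sharing one cluster.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this machine


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource negotiator."""
    nodes: List[Node]
    allocations: List[Tuple[str, str, int]] = field(default_factory=list)

    def request(self, app: str, memory_mb: int) -> Optional[str]:
        """Grant the request on the first node with enough free memory.

        Returns the node name, or None when no node can host it.
        """
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                self.allocations.append((app, node.name, memory_mb))
                return node.name
        return None


if __name__ == "__main__":
    rm = ToyResourceManager([Node("node1", 1024), Node("node2", 2048)])
    # Different kinds of applications ask the same manager for resources.
    for app, mb in [("mapreduce-job", 1024), ("stream-processor", 1536), ("ml-training", 1024)]:
        print(app, "->", rm.request(app, mb))
```

The manager never looks at what the application does, only at what it asks for, which is why non-MapReduce workloads can share the same cluster.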
28. YARN, which they say is an acronym for “Yet Another Resource Negotiator,”
29. YARN can run applications that do not follow the MapReduce model.
30. This opens up Hadoop to a whole new paradigm of usage.
31. This means everything from machine learning, to real-time event processing, data modeling and more.
32. So while Hadoop has been virtually synonymous with MapReduce
33. Intel is looking to solve software gaps with on-chip accelerators and cores.
34. Intel, increasingly customising server chips for customers, is now tuning chips for workloads in Big Data.
35. Outside of the silicon, Intel is focusing on providing the right software tools for data centers.
36. Hadoop was the starting point, and now Intel is looking closely at analytics, Kasabian said.
37. Our feature warns that while technology is obviously key when it comes to Big Data, we shouldn’t underestimate the human factor.
38. It’s not just about trying to anticipate how big an opportunity Big Data will be for you in 2013 and beyond.
39. EMC Updates OS for Big Data Storage, File Sharing Platforms
40. The company is slated to release the new version of OneFS later this year.
42. EMC Isilon OneFS Operating System:
a. Designed for Big Data
b. Simplicity, speed, and scalability
c. Robust security and protection
d. Operational flexibility
43. Cloudera: An Operating System for Big Data
45. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
46. "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
47. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks.
49. Examples include Big Science, RFID, sensor networks, social networks, big social data analysis (due to the social data revolution), Internet documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.
50. Pig was developed by Yahoo! and, just like Hive, has been made fully open source.
51. Hive is a “SQL-like” bridge that allows conventional BI applications to run queries against a Hadoop cluster.
52. Big Data and cloud computing go hand-in-hand.