Wednesday 11 September 2013

527. BIG DATA - bits



1.       Big data is too big for one job title to tackle.
2.       We need to build big data teams.
3.       There are five steps to building such a team, and it all starts with breaking down the big data talent needs of the company.
4.       The four talent needs are:
a.       Business analysis
b.       Analytics expertise
c.       Data technology expertise
d.       Visualization expertise
5.       Once the organization has determined its talent needs, it can proceed to the next four steps, which involve:
    1. Evaluating the internal talent pool
    2. Filling your talent gap from other sources
    3. Cross-training to improve and enable your team
    4. Empowering your team by giving them freedom
6.       Big Data techniques
a.       Business Intelligence (BI)/Online Analytical Processing (OLAP)
b.       Cluster Analysis
c.       Data Mining
d.       Predictive Modeling
e.       SQL
f.       A/B Testing
g.       Crowdsourcing
h.       Textual Analysis
i.       Sentiment Analysis
j.       Network Analysis

7.       Big Data vendors (leading BI tools):
a.       Microsoft SQL Server Analysis and Reporting Services
b.       SAP BusinessObjects
c.       Oracle Business Intelligence
d.       IBM Cognos/SPSS
e.       SAS
f.       MicroStrategy
g.       QlikTech
h.       TIBCO Spotfire

8.       Four Big Data strategies

·         Performance Management

·         Data Exploration

·         Social Analytics

·         Decision Science

9.      With respect to future trends in the Big Data field, the following practices are starting to emerge:

a.      Integrate multiple Big Data strategies.

b.      Build a Big Data capability.

c.       Be proactive and create a Big Data policy.


10.    Big Data and provides real-world case studies and expert advice to help organisations on their journey.
11.     Windows manages the basic functions of a PC and its software; Cloudera’s technology helps companies break data into digestible chunks that can be spread across relatively cheap computers.
12.    Cloudera is essentially trying to build a type of operating system, à la Windows, for examining huge stockpiles of information.
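
The idea of breaking data into chunks and spreading them across machines can be sketched in a few lines of Python. This is only an illustration and not Cloudera's or Hadoop's actual code: the function names, the tiny chunk size, the node names, and the round-robin placement are all assumptions made for the example.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut the data into fixed-size chunks; the last chunk may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_round_robin(chunks: List[bytes], nodes: List[str]) -> Dict[str, List[int]]:
    """Map each (hypothetical) node name to the indices of the chunks it holds."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for index in range(len(chunks)):
        # Chunk 0 goes to the first node, chunk 1 to the second, and so on.
        placement[nodes[index % len(nodes)]].append(index)
    return placement


if __name__ == "__main__":
    data = b"abcdefghijklmnopqrstuvwxyz"
    chunks = split_into_chunks(data, chunk_size=8)
    print(assign_round_robin(chunks, ["node-a", "node-b", "node-c"]))
```

A real distributed file system would also keep several copies of every chunk on different machines so that losing one cheap computer does not lose data; that part is left out here to keep the sketch short.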
15.  Strapped for storage? Before you start randomly deleting stuff, find out which apps are consuming the most space.
16.  Your only real option is to free up additional space by deleting apps and data.
17.  The solution lies a few steps inside iOS Settings, which can show you exactly what's using your storage -- from most to least. Here's how to get there:
19.  Hadoop and its different components make up a specialized computing system for big data.
20.  I was going through the book and actively trying to link the different pieces, like HDFS, MapReduce, Hadoop, Pig, Hive, Jaql, ZooKeeper, and Flume.

21.  What is a Computing System?

A Computing System (CS) comprises many components. The different components are
  1. Storage system to store the data submitted via input
(E.g. Hard Disks)
  2. Input devices, which produce data streams
(E.g. Keyboard, Sensors)
  3. Output devices
(E.g. Screen)
  4. Operating System for managing the show for users and hardware
(E.g. Windows, Mac)
  5. Machine Language, aka Machine Instruction Set
(E.g. Intel SSE, Intel MMX, Intel VT-x)
  6. High Level Languages for writing apps and scripts
(E.g. C, C++, Java, Python)
  7. Application and System software to do the user-defined tasks, as well as to manage the high-level system activities
(E.g. MS Word, Photoshop, CCleaner, Antivirus, Disk Defrag)

20.  Hadoop Ecosystem and Computing System

The CS and the different Hadoop ecosystem components have a lot in common.
  1. Storage in a CS is similar to the Hadoop Distributed File System (HDFS). HDFS is a distributed storage system; in both HDFS and a CS, the way data is physically stored is quite different from the way we view it.
  2. Apache Flume is the equivalent of a CS input device. Flume routes data into HDFS; it can be pictured as log data being continuously written to a file without any user intervention.
  3. Hadoop is like the Operating System, which manages the show for the user and also manages the resources.
Just as an OS has many components, like resource managers, a kernel, and file systems, Hadoop has different components like
1.      Hadoop Core,
2.      HDFS,
3.      Hadoop YARN,
4.      Hadoop MapReduce
  4. The MapReduce framework is like the Machine Instruction Set.
  5. Pig, Hive, and Jaql are the High Level Languages, the way we have C, Java, and Python in a CS. Commands in these languages are converted into corresponding MapReduce jobs.
  6. Mahout, HBase, Cassandra, Ambari, and ZooKeeper are the equivalents of the various Application and System software, running atop Hadoop.
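
To make the "instruction set" analogy a little more concrete, here is a minimal, single-process sketch of the map, shuffle, and reduce phases that a high-level query is eventually broken into. It is plain Python and not the Hadoop API; the function names and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases one after another in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big team"]))
```

On a real cluster the map and reduce calls would run in parallel on many machines and the shuffle would move data over the network; the sketch only shows the shape of the computation.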

22.  YARN to Spin Hadoop into Big Data Operating System

23.  SQL in Hadoop via YARN is a part of the core of this metamorphosis.
24.  One of the fundamental trends changing the picture is enterprises viewing “big data” as “all their data,” not just specific, narrow aspects of it.
25.  Tools and other capabilities have been designed and implemented to address these potential limitations of Hadoop, including vendor tools such as Platfora, as well as well-known projects such as Hive, Pig, and HBase.
26.  The YARN project is about opening up the entire framework for use cases that were previously not possible.
27.  “By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.”
28.  YARN is said to be an acronym for “Yet Another Resource Negotiator.”
29.  YARN can run applications that do not follow the MapReduce model.
30.  This opens up Hadoop to a whole new paradigm of usage.
31.  This means everything from machine learning, to real-time event processing, data modeling and more.
32.  So while Hadoop has been virtually synonymous with MapReduce
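
The notes above on YARN managing resource requests across a cluster can be pictured with a toy resource negotiator. This is a hedged sketch only: the class name `ToyResourceManager`, its methods, and the memory-only accounting are invented for illustration and do not reflect YARN's real API or scheduling policies.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource negotiator."""

    total_memory_mb: int
    allocations: Dict[str, int] = field(default_factory=dict)

    @property
    def free_memory_mb(self) -> int:
        """Memory not yet granted to any application."""
        return self.total_memory_mb - sum(self.allocations.values())

    def request(self, app_name: str, memory_mb: int) -> bool:
        """Grant the request if enough memory is free; otherwise refuse it."""
        if memory_mb <= 0:
            raise ValueError("memory_mb must be positive")
        if memory_mb > self.free_memory_mb:
            return False
        self.allocations[app_name] = self.allocations.get(app_name, 0) + memory_mb
        return True

    def release(self, app_name: str) -> None:
        """Return everything an application holds to the shared pool."""
        self.allocations.pop(app_name, None)


if __name__ == "__main__":
    rm = ToyResourceManager(total_memory_mb=8192)
    print(rm.request("mapreduce-job", 4096))     # a classic batch job
    print(rm.request("stream-processor", 3072))  # a non-MapReduce application
    print(rm.request("ml-training", 2048))       # refused until memory is freed
    rm.release("mapreduce-job")
    print(rm.request("ml-training", 2048))
```

The point of the sketch is that the negotiator does not care what kind of application is asking, which is what lets several different workloads share one cluster.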
33.    Intel is looking to solve software gaps with on-chip accelerators and cores.
34.   Intel, increasingly customising server chips for customers, is now tuning chips for workloads in Big Data.
35.   Outside of the silicon, Intel is focusing on providing the right software tools for data centers.
36.   Hadoop was the starting point, and now Intel is looking closely at analytics, Kasabian said.
37.    Our feature warns that while technology is obviously key when it comes to Big Data, we shouldn’t underestimate the human factor.
38.   It’s not just about trying to anticipate how big an opportunity Big Data will be for you in 2013 and beyond.

39.   EMC Updates OS for Big Data Storage, File Sharing Platforms

40.  The company is slated to release the new version of OneFS later this year.
42.  EMC Isilon OneFS Operating System:

a.       DESIGNED FOR BIG DATA

b.      SIMPLICITY, SPEED, AND SCALABILITY

c.       ROBUST SECURITY AND PROTECTION

d.      OPERATIONAL FLEXIBILITY

43.  Cloudera: An Operating System for Big Data

44.  Cloudera’s technology helps companies break data into digestible chunks that can be spread across relatively cheap computers.
45.  The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
46.  Analysis of large data sets can be used to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
47.  Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks.
48.  Every day, 2.5 quintillion (2.5×10^18) bytes of data were created.
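
To get a feel for the size of 2.5 quintillion bytes per day, the figure can be converted into more familiar units. The short sketch below assumes decimal (SI) units, where one exabyte is 10^18 bytes and one terabyte is 10^12 bytes; the constant and function names are made up for the example.

```python
BYTES_PER_DAY = 2.5e18           # the daily figure quoted in the notes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
EXABYTE = 10 ** 18               # decimal (SI) exabyte, an assumption
TERABYTE = 10 ** 12              # decimal (SI) terabyte, an assumption


def daily_volume_in_exabytes(bytes_per_day: float = BYTES_PER_DAY) -> float:
    """Express the daily volume in exabytes."""
    return bytes_per_day / EXABYTE


def average_rate_tb_per_second(bytes_per_day: float = BYTES_PER_DAY) -> float:
    """Average creation rate in terabytes per second over a full day."""
    return bytes_per_day / SECONDS_PER_DAY / TERABYTE


if __name__ == "__main__":
    print(f"{daily_volume_in_exabytes():.1f} EB per day")
    print(f"{average_rate_tb_per_second():.1f} TB per second on average")
```

In other words, the quoted figure works out to 2.5 exabytes a day, or roughly 29 terabytes every second on average.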
49.  Examples include Big Science, RFID, sensor networks, social networks, big social data analysis (due to the social data revolution), Internet documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.
50.  Pig was developed by Yahoo! and, just like Hive, has been made fully open source.
51.  Hive is a “SQL-like” bridge that allows conventional BI applications to run queries against a Hadoop cluster.
52.  Big Data and cloud computing go hand-in-hand.





