Wednesday, 11 September 2013

528. BIG DATA SUMMARY


BIG DATA SUMMARY

Enterprises increasingly view “big data” as “all their data,” not just specific, narrow aspects of it.

Big Data is a synthesis, or compendium, of many technologies.

It is very useful to learners who have a few years of experience.

Big Data – In built
To name a few
1.      Many applications
2.      A lot of space
3.      Many strategies
4.      Many wonder tools
5.      Many technologies
6.      Many talent needs
7.      Many operating systems, etc.

Big Data is multidimensional.

Multidimensional data can be represented as tensors and handled by tensor-based computation.
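
As a minimal sketch of what tensor-based computation can mean here, the Python below stores made-up sensor readings as a three-dimensional nested list (sensor × day × hour) and collapses one dimension. The data, the axis names, and the helper functions `shape` and `sum_over_hours` are all assumptions for illustration; real systems would typically use a dedicated array library.

```python
# Hypothetical readings indexed by (sensor, day, hour): a rank-3 tensor
# stored as nested lists. The values are made up for illustration.
readings = [
    [[1.0, 2.0], [3.0, 4.0]],   # sensor 0: day 0, day 1
    [[5.0, 6.0], [7.0, 8.0]],   # sensor 1: day 0, day 1
]


def shape(tensor):
    """Return the size of each dimension of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def sum_over_hours(tensor):
    """Collapse the last axis (hour), leaving a sensor x day matrix."""
    return [[sum(day) for day in sensor] for sensor in tensor]


tensor_shape = shape(readings)            # one entry per dimension
daily_totals = sum_over_hours(readings)   # one total per sensor per day
```

The point of the sketch is only that each extra dimension of the data becomes one more axis of the same structure, and operations are expressed per axis.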

Big Data – Technologies

1]    Hadoop
It is the ecosystem of Big Data.

2]    HDFS
It is the Hadoop Distributed File System, a distributed storage system.

3]    MapReduce
It plays the role of a machine instruction set.

4]    Pig
It is a high-level language.

5]    Hive
It is a high-level language.

6]    Jaql
It is a high-level language.

7]    WibiData

8]    Platfora

9]    Flume

10]  ZooKeeper


Big Data – Applications
Once we know the various applications, we really come to appreciate it.
1.      Internet documents
2.      Astronomy
3.      Atmosphere
4.      Genomics
5.      Biogeochemical
6.      Biological
7.      Military surveillance
8.      Medical records
9.      Photography
10.  Video
11.  E-commerce
12.  Sensor networks
13.  Spot business trends
14.  Quality of research
15.  Prevent diseases
16.  Legal citations
17.  Combat crime
18.  Road traffic
19.  Cameras
20.  Identification
21.  Wireless sensor networks
etc.

Big Data – Space
Storage is provided by EMC [Isilon OneFS OS].

1.      Skytree
2.      Big Data in the cloud

2.5 quintillion (2.5 × 10^18) bytes of data are created every day.
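
To put the daily figure in perspective, the arithmetic can be sketched in Python. This assumes the figure means 2.5 × 10^18 bytes and uses SI units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes); the variable names are illustrative only.

```python
# Assumption: "2.5 quintillion" means 2.5 x 10^18 bytes, in SI (decimal) units.
BYTES_PER_DAY = 25 * 10**17
SECONDS_PER_DAY = 24 * 60 * 60

BYTES_PER_EXABYTE = 10**18
BYTES_PER_TERABYTE = 10**12

# Daily volume expressed in exabytes.
exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE

# Average creation rate expressed in terabytes per second.
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / BYTES_PER_TERABYTE

print(f"{exabytes_per_day} EB per day")
print(f"about {terabytes_per_second:.1f} TB per second")
```

In other words, the figure corresponds to 2.5 exabytes a day, or roughly 29 terabytes every second on average.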

Big Data – Techniques
1.      Business intelligence
2.      Cluster analysis
3.      Data mining
4.      Predictive modelling
5.      SQL
6.      A/B testing
7.      Crowdsourcing
8.      Textual analysis
9.      Sentiment analysis
10.  Network analysis

Big Data – Operating Systems
1]         Cloudera
Cloudera’s technology breaks data into digestible chunks that can be spread across relatively cheap computers.

2]         Hadoop
It is the ecosystem of Big Data.
Just as an OS has a resource manager, a kernel, and a file system, Hadoop contains
1.      Hadoop Core
2.      HDFS
3.      Hadoop YARN
4.      Hadoop MapReduce

3]         YARN
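YARN stands for “Yet Another Resource Negotiator.”
It opens up use cases that were previously not possible.
By managing resource requests across a cluster, it turns Hadoop from a single-application system into a multi-application OS.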

4]         EMC
Isilon OneFS OS, designed for Big Data. It offers
Speed
Scalability
Security
Protection
EMC is updating this OS for its Big Data storage and file-sharing platforms.

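5]         à la Windows
Cloudera is essentially trying to build a type of operating system, à la Windows, for examining huge stockpiles of information.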
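
The YARN entry above says that managing resource requests across a cluster is what lets several applications share Hadoop. The Python below is a toy model of that idea only: the class `ToyResourceManager`, its methods, the application names, and the memory figures are all hypothetical, and none of this is the actual YARN API.

```python
class ToyResourceManager:
    """Toy model of a cluster-wide resource negotiator (not the YARN API)."""

    def __init__(self, total_memory_gb):
        self.free_memory_gb = total_memory_gb
        self.allocations = {}  # application name -> memory granted (GB)

    def request(self, app_name, memory_gb):
        """Grant the request if enough memory is free; otherwise refuse."""
        if memory_gb > self.free_memory_gb:
            return False
        self.free_memory_gb -= memory_gb
        self.allocations[app_name] = self.allocations.get(app_name, 0) + memory_gb
        return True

    def release(self, app_name):
        """Return all memory held by an application to the shared pool."""
        self.free_memory_gb += self.allocations.pop(app_name, 0)


# Three different kinds of application compete for one 64 GB cluster.
manager = ToyResourceManager(total_memory_gb=64)
granted_batch = manager.request("mapreduce-batch-job", 32)
granted_sql = manager.request("sql-query-engine", 24)
granted_ml = manager.request("machine-learning-job", 16)  # refused: only 8 GB free

# Once the batch job finishes, its memory goes back to the pool.
manager.release("mapreduce-batch-job")
granted_ml_retry = manager.request("machine-learning-job", 16)
```

Because one component arbitrates every request, applications that do not follow the MapReduce model can run side by side with those that do.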


Big Data - vendor tools
1.      Leading business intelligence (BI) tools
2.      Microsoft SQL Server Analysis and Reporting Services
3.      SAP BusinessObjects
4.      Oracle Business Intelligence
5.      IBM Cognos
6.      SAS
7.      MicroStrategy
8.      QlikTech
9.      TIBCO Spotfire
10.  Platfora

Big Data – Strategies
1.      Performance Management
2.      Data exploration
3.      Social analytics
4.      Decision science

Big Data – Talent needs / Potentiality
1.      Evaluate the internal talent pool
2.      Filling your talent gap from other sources
3.      Cross training
4.      Empower your team

Big Data - iOS settings
iPod
iPad
iPhone
iOS
apps
data
Memory
storage




527. BIG DATA - bits


BIG DATA - bits

1.        Big Data is too big for one job title to tackle.
2.       We need to build Big Data teams.
3.       There are five steps to building such a team, and it all starts with breaking down the Big Data talent needs of the company.
4.      The four talent needs are
a.      Business analysis
b.      Analytics expertise
c.       Data technology expertise
d.      Visualization expertise
5.       Once the organization has determined its talent needs, it can proceed to the next four steps, which involve:
    1. Evaluating the internal talent pool
    2. Filling your talent gap from other sources
    3. Cross-training to improve and enable your team
    4. Empowering your team by giving it freedom
6.      Big Data techniques
a.       Business Intelligence (BI)/Online Analytical Processing (OLAP)
b.      Cluster Analysis
c.       Data Mining
d.      Predictive Modeling
e.       SQL
f.       A/B Testing
g.      Crowdsourcing
h.      Textual Analysis
i.        Sentiment Analysis
j.        Network analysis

7.      Big Data vendors
a.       Leading BI tools
b.      Microsoft SQL Server Analysis and Reporting Services
c.       SAP BusinessObjects
d.      Oracle Business Intelligence
e.       IBM Cognos/SPSS
f.       SAS
g.      MicroStrategy
h.      QlikTech
i.        TIBCO Spotfire

8.       Four Big Data strategies

·         Performance Management

·         Data Exploration

·         Social Analytics

·         Decision Science

9.      With respect to future trends in the Big Data field, the following practices are starting to emerge:

a.      Integrating multiple big data strategies.

b.      Building a Big Data capability.

c.       Being proactive and creating a Big Data policy.


10.    Big Data and provides real-world case studies and expert advice to help organisations on their journey. Windows manages the basic functions of a PC and its software.
11.     Cloudera’s technology helps companies break data into digestible chunks that can be spread across relatively cheap computers.
12.    Cloudera is essentially trying to build a type of operating system, à la Windows, for examining huge stockpiles of information.
15.  Strapped for storage? Before you start randomly deleting stuff, find out which apps are consuming the most space.
16.  Your only real option is to free up additional space by deleting apps and data.
17.  The solution lies a few steps inside iOS Settings, which can show you exactly what's using your storage -- from most to least. Here's how to get there:
19.  Hadoop and its different components make up a specialized computing system for Big Data.
20.  I was going through the book and actively trying to link the different pieces, such as HDFS, MapReduce, Hadoop, Pig, Hive, Jaql, ZooKeeper, and Flume.

21.  What is a Computing System?

A Computing System (CS) comprises many components. The different components are
  1. Storage system to store the data submitted via input
(E.g. hard disks)
  2. Input devices, which produce data streams
(E.g. keyboard, sensors)
  3. Output devices
(E.g. screen)
  4. Operating system for managing the show for users and hardware
(E.g. Windows, Mac OS X)
  5. Machine language, aka machine instruction set
(E.g. Intel SSE, Intel MMX, Intel VT-x)
  6. High-level languages for writing apps and scripts
(E.g. C, C++, Java, Python)
  7. Application and system software to carry out user-defined tasks and manage high-level system activities
(E.g. MS Word, Photoshop, CCleaner, antivirus, disk defragmenter)

20.  Hadoop Ecosystem and Computing System

The CS and the different Hadoop ecosystem components have many similarities.
  1. Storage in a CS is similar to the Hadoop Distributed File System (HDFS). HDFS is a distributed storage system, and in both HDFS and a CS the way data is actually stored is totally different from how we view it.
  2. Apache Flume is the equivalent of a CS input device. Flume routes data into HDFS. It can be viewed as log data continuously being stored in a file without any user intervention.
  3. Hadoop is like an operating system, which manages the show for the user as well as managing the resources.
Just as an OS has many components, such as resource managers, a kernel, and file systems, Hadoop has different components, such as
1.      Hadoop Core,
2.      HDFS,
3.      Hadoop YARN,
4.      Hadoop MapReduce
  4. The MapReduce framework is like a machine instruction set.

  5. Pig, Hive, and Jaql are high-level languages, the way we have
C, Java, and Python in a CS.
The commands in these languages are converted into corresponding MapReduce jobs.
  6. Mahout, HBase, Cassandra, Ambari, and ZooKeeper are the equivalents of application and system software, running atop Hadoop.
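
The two ideas above, breaking data into chunks and expressing work as MapReduce jobs, can be sketched in plain Python. This is a single-machine illustration under stated assumptions, not Hadoop code: the sample lines and the helper names `split_into_chunks`, `map_chunk`, and `reduce_counts` are hypothetical, and in a real cluster each chunk would be processed on a different machine.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, chunk_size):
    """Break the input into fixed-size chunks, as a cluster would spread them out."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: count words within one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Made-up input standing in for a large data set.
lines = [
    "big data is all their data",
    "hadoop stores big data",
    "pig and hive run on hadoop",
]

chunks = split_into_chunks(lines, chunk_size=2)
partial_counts = [map_chunk(chunk) for chunk in chunks]  # could run in parallel
word_counts = reduce(reduce_counts, partial_counts, Counter())
```

A high-level language such as Pig or Hive spares the user from writing the map and reduce functions by hand; the query is translated into jobs of roughly this shape.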

22.  YARN to Spin Hadoop into Big Data Operating System

23.  SQL in Hadoop via YARN is a part of the core of this metamorphosis.
24.  One of these fundamental trends that is changing the picture is enterprises viewing “big data” as “all their data,” – not just specific, narrow aspects of it.
25.  Tools and other capabilities have been designed and implemented to address these potential limitations of Hadoop, including vendor tools such as Platfora, as well as well-known projects such as Hive, Pig, and HBase.
26.  The YARN project is about opening up the entire framework for use cases that were previously not possible.
27.  By managing the resource requests across a cluster, YARN turns Hadoop from a single-application system into a multi-application operating system.
28.  YARN is said to be an acronym for “Yet Another Resource Negotiator.”
29.  YARN can run applications that do not follow the MapReduce model.
30.  This opens up Hadoop to a whole new paradigm of usage.
31.  This means everything from machine learning, to real-time event processing, data modeling and more.
32.  So while Hadoop has been virtually synonymous with MapReduce
33.    Intel is looking to solve software gaps with on-chip accelerators and cores.
34.   Intel, increasingly customising server chips for customers, is now tuning chips for workloads in Big Data.
35.   Outside of the silicon, Intel is focusing on providing the right software tools for data centers.
36.   Hadoop was the starting point, and now Intel is looking closely at analytics, Kasabian said.
37.    Our feature warns that while technology is obviously key when it comes to Big Data, we shouldn’t underestimate the human factor.
38.   It’s not just about trying to anticipate how big an opportunity Big Data will be for you in 2013 and beyond.

39.   EMC Updates OS for Big Data Storage, File Sharing Platforms

40.  The company is slated to release the new version of OneFS later this year.
42.  EMC Isilon OneFS Operating System:

a.       DESIGNED FOR BIG DATA

b.      SIMPLICITY, SPEED, AND SCALABILITY

c.       ROBUST SECURITY AND PROTECTION

d.      OPERATIONAL FLEXIBILITY

43.  Cloudera: An Operating System for Big Data

44.  Cloudera’s technology helps companies break data into digestible chunks that can be spread across relatively cheap computers.
45.  The challenges of Big Data include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
46.  Analysis of large data sets can help "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."
47.  Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks.
48.  Every day, 2.5 quintillion (2.5×10^18) bytes of data were created.
49.  Examples include Big Science, RFID, sensor networks, social networks, big social data analysis (due to the social data revolution), Internet documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.
50.  Pig was developed by Yahoo! and, just like Hive, has been made fully open source.
51.  Hive is a “SQL-like” bridge that allows conventional BI applications to run queries against a Hadoop cluster.
52.  Big Data and cloud computing go hand-in-hand.






647. PRESENTATION SKILLS MBA I - II

PRESENTATION  SKILLS MBA   I - II There are many types of presentations.                    1.       written,        story, manual...