BIG DATA SUMMARY

Enterprises increasingly view Big Data as "all their data", not just specific, narrow aspects of it.

Big Data is a synthesis, or compendium, of many technologies.

It is very useful to learners who have a few years of experience.

Big Data – In built

To name a few

1. Many applications

2. A lot of space

3. Many strategies

4. Many wonder tools

5. Many technologies

6. Many talent needs

7. Many Operating Systems etc.

It brings many dimensions together in one.

It is handled by tensor-based computation.

Big Data – Technologies

1] Hadoop

It is the ecosystem of Big Data.

2] HDFS

3] MapReduce

It plays the role of a machine instruction set.

4] PIG

It is a high level language.

5] HIVE

It is a high level language.

6] Jaql

It is a high level language.

7] WibiData

8] PLATFORA

9] FLUME

10] ZooKeeper

Big Data – Applications

Once we know the various applications, we really come to love it.

1. Internet documents

2. Astronomy

3. Atmosphere

4. Genomics

5. Biogeochemical

6. Biological

7. Military surveillance

8. Medical records

9. Photography

10. Video

11. e-commerce

12. sensor networks

13. Spot business trends

14. Quality of research

15. Prevent diseases

16. Legal citations

17. Combat crime

18. Road traffic

19. Cameras

20. Identification

21. Wireless sensor networks

etc.

Big Data – Space

Space is provided by EMC [Isilon OneFS OS]

1. Skytree

2. BigData in cloud

2.5 quintillion (2.5 × 10¹⁸) bytes of data are created every day.

Big Data – Techniques

1. Business intelligence

2. Cluster analysis

3. Data mining

4. Predictive modelling

5. SQL

6. A/B testing

7. Crowdsourcing

8. Textual analysis

9. Sentiment analysis

10. Network analysis

Big Data – Operating Systems

1] Cloudera

Cloudera’s technology breaks data into chunks.

2] Hadoop

It is the ecosystem of Big Data.

This contains

1. Resource Manager

2. Kernel

3. File system

3] YARN

Yet Another Resource Negotiator.

It made previously impossible use cases possible.

It turned Hadoop into a multi-application OS.

4] EMC

Isilon OneFS OS.

Speed,

Scalability

Security

Protection

It updates Big Data storage.

5] à la Windows

It examines huge stockpiles of information.

Big Data - vendor tools

1. Leading business intelligence tools

2. Microsoft SQL Server

3. SAP BusinessObjects

4. Oracle Business Intelligence

5. IBM Cognos

6. SAS

7. MicroStrategy

8. QlikTech

9. TIBCO Spotfire

10. Platfora

Big Data – Strategies

1. Performance Management

2. Data exploration

3. Social analytics

4. Decision science

Big Data – Talent needs / Potentiality

1. Evaluate the internal talent pool

2. Fill your talent gap from other sources

3. Cross training

4. Empower your team

Big Data - iOS settings

iPod

iPad

iPhone

iOS

apps

data

Memory

storage

BIG DATA - bits

1. Big data is too big for one title to tackle.

2. We need to build big data teams

3. There are five steps to build such a team and it all starts with breaking down the big data talent needs of the company.

4. The four talent needs are

a. Business analysis

b. Analytics expertise

c. Data technology expertise

d. Visualization expertise

5. Once the organization has determined its talent needs, it can then proceed to the next four steps, which involve:

Evaluating the internal talent pool
Filling your talent gap from other sources
Cross-training to improve and enable your team
Empower your team by giving them freedom

6. BIGDATA Technique

a. Business Intelligence (BI)/Online Analytical Processing (OLAP):

b. Cluster Analysis

c. Data Mining

d. Predictive Modeling

e. SQL:

f. A/B Testing

g. Crowdsourcing:

h. Textual Analysis

i. Sentiment Analysis

j. Network analysis

7. BIGDATA Vendor

a. Leading BI Tools:

b. Microsoft SQL Server Analysis and Reporting Services

c. SAP BusinessObjects

d. Oracle Business Intelligence

e. IBM Cognos/SPSS

f. SAS

g. Microstrategy

h. QlikTech

i. TIBCO Spotfire

8. Four Big Data strategies

· Performance Management

· Data Exploration

· Social Analytics

· Decision Science

9. With respect to future trends in the Big Data field, the following practices are starting to emerge:

a. Integrating multiple big data strategies.

b. Build a Big Data capability.

c. Be proactive and create a Big Data policy.

10. Big Data and provides real-world case studies and expert advice to help organisations on their journey. Windows manages the basic functions of a PC and its software,

11. Cloudera’s technology helps companies break data into digestible chunks that can be spread across relatively cheap computers.

12. Cloudera is essentially trying to build a type of operating system, à la Windows, for examining huge stockpiles of information.
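
The idea of breaking data into chunks and spreading them across cheap machines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the node names, and the round-robin placement are all hypothetical and are not Cloudera's or Hadoop's actual mechanism.

```python
from typing import Dict, List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_round_robin(chunks: List[bytes], nodes: List[str]) -> Dict[str, List[bytes]]:
    """Spread chunks over the given (hypothetical) node names in round-robin order."""
    placement: Dict[str, List[bytes]] = {node: [] for node in nodes}
    for index, chunk in enumerate(chunks):
        placement[nodes[index % len(nodes)]].append(chunk)
    return placement


data = b"0123456789" * 10  # 100 bytes of illustrative data
chunks = split_into_chunks(data, chunk_size=32)
placement = assign_round_robin(chunks, ["node-a", "node-b", "node-c"])
for node, held in placement.items():
    print(node, [len(c) for c in held])
```

A real system would also replicate each chunk on several machines so that losing one cheap computer does not lose data; that part is left out here to keep the sketch short.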

15. Strapped for storage? Before you start randomly deleting stuff, find out which apps are consuming the most space.

16. Your only real option is to free up additional space by deleting apps and data.

17. The solution lies a few steps inside iOS Settings, which can show you exactly what's using your storage -- from most to least. Here's how to get there:

19. Hadoop and the different components make a specialized computing system for big data.

20. I was going through the book, and actively trying to link the different pieces like HDFS, MapReduce, Hadoop, Pig, Hive, Jaql, ZooKeeper, and Flume.

21. What is a Computing System?

A Computing System (CS) is made up of many components. The different components are:

Storage system to store the data submitted via Input

(E.g. Hard Disks)

Input devices which produce data streams

(E.g. Keyboard, Sensors)

Output devices

(E.g. Screen)

Operating System for managing the show for users and hardware

(E.g. Windows, Mac)

Machine Language aka Machine Instruction Set

(E.g. Intel SSE, Intel MMX, Intel VT-X)

High Level Languages for writing apps and scripts

(E.g. C, C++, Java, Python)

Application and system software to carry out user-defined tasks, as well as to manage high-level system activities.

(E.g. MS Word, Photoshop, C Cleaner, Antivirus, Disk Defrag)

20. Hadoop Ecosystem and Compute System

The CS and the different Hadoop ecosystem components have many similarities.

Storage in a CS is similar to the Hadoop Distributed File System (HDFS). HDFS is a distributed storage system, and the way data is actually stored in HDFS or a CS is totally different from how we view that data.
Apache Flume is the equivalent of a CS input device. Flume routes data into HDFS. Flume can be viewed as log data being stored continuously in a file without any user intervention.
Hadoop is like an operating system: it manages the show for the user and also manages the resources.

Just as an OS has many components, such as resource managers, a kernel, and file systems, Hadoop has different components, such as

1. Hadoop Core,

2. HDFS,

3. Hadoop YARN,

4. Hadoop Map Reduce

The MapReduce framework is like the machine instruction set.

Pig, Hive, and Jaql are high-level languages, just as C, Java, and Python are in a CS.

Commands in these languages are converted into corresponding MapReduce jobs.
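
To make the "instruction set" idea concrete, here is a minimal word-count sketch of the map, shuffle, and reduce phases in plain Python. It runs on one machine and does not use Hadoop at all; the function names are my own, chosen only to show the shape of the job that a high-level language would be translated into.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over the input lines."""
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

On a real cluster the map and reduce calls would run in parallel on different machines, and the shuffle would move data between them over the network.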

Mahout, HBase, Cassandra, Ambari, and ZooKeeper are the equivalents of the various application and system software, running atop Hadoop.

22. YARN to Spin Hadoop into Big Data Operating System

23. SQL in Hadoop via YARN is a part of the core of this metamorphosis.

24. One of these fundamental trends that is changing the picture is enterprises viewing “big data” as “all their data,” – not just specific, narrow aspects of it.

25. Tools and other capabilities have been designed and implemented to address these potential limitations of Hadoop, including vendor tools such as Platfora, as well as well-known projects such as Hive, Pig, and HBase.

26. the YARN project is about opening up the entire framework for use cases that were previously not possible

27. By managing the resource requests across a cluster, YARN turns Hadoop from a single-application system to a multi-application operating system.
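
A toy model may help to picture what "managing the resource requests across a cluster" means. The class below is a hypothetical stand-in written for this note; it is not YARN's API, and it tracks only memory, whereas a real resource manager handles much more.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyResourceManager:
    """Hypothetical cluster-wide resource manager (illustration only, not YARN)."""
    total_memory_mb: int
    allocations: Dict[str, int] = field(default_factory=dict)

    @property
    def free_memory_mb(self) -> int:
        """Memory not currently held by any application."""
        return self.total_memory_mb - sum(self.allocations.values())

    def request(self, app_name: str, memory_mb: int) -> bool:
        """Grant the request only if enough memory is still free."""
        if memory_mb <= 0 or memory_mb > self.free_memory_mb:
            return False
        self.allocations[app_name] = self.allocations.get(app_name, 0) + memory_mb
        return True

    def release(self, app_name: str) -> None:
        """Return everything the application holds to the shared pool."""
        self.allocations.pop(app_name, None)


rm = ToyResourceManager(total_memory_mb=8192)
granted = [
    rm.request("mapreduce-job", 4096),
    rm.request("sql-query", 2048),
    rm.request("ml-training", 4096),  # rejected: only 2048 MB is free
]
rm.release("mapreduce-job")
retry = rm.request("ml-training", 4096)  # fits once the first job has released
print(granted, retry, rm.free_memory_mb)
```

The point of the sketch is that several different kinds of application share one pool of resources, instead of the whole cluster belonging to a single MapReduce job.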

28. YARN, which they say is an acronym for “Yet Another Resource Negotiator,”

29. YARN can run applications that do not follow the MapReduce model

30. This opens up Hadoop to a whole new paradigm of usage.

31. This means everything from machine learning, to real-time event processing, data modeling and more.

32. So while Hadoop has been virtually synonymous with MapReduce

33. Intel is looking to solve software gaps with on-chip accelerators and cores

34. Intel, increasingly customising server chips for customers, is now tuning chips for workloads in Big Data.

35. Outside of the silicon, Intel is focusing on providing the right software tools for data centers.

36. Hadoop was the starting point, and now Intel is looking closely at analytics, Kasabian said.

37. Our feature warns that while technology is obviously key when it comes to Big Data, we shouldn’t underestimate the human factor.

38. It’s not just about trying to anticipate how big an opportunity Big Data will be for you in 2013 and beyond.

39. EMC Updates OS for Big Data Storage, File Sharing Platforms

40. The company is slated to release the new version of OneFS later this year.

42. EMC Isilon OneFS Operating System:

a. DESIGNED FOR BIG DATA

b. SIMPLICITY, SPEED, AND SCALABILITY

c. ROBUST SECURITY AND PROTECTION

d. OPERATIONAL FLEXIBILITY

43. Cloudera: An Operating System for BigData

44. Cloudera’s technology helps companies break data into digestible chunks that can be spread across relatively cheap computers.

45. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization

46. "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

47. Data sets grow in size in part because they are increasingly being gathered by ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification readers, and wireless sensor networks.

48. every day 2.5 quintillion (2.5×10¹⁸) bytes of data were created.
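
To get a feel for this figure, the short calculation below converts it into exabytes per day and terabytes per second. It assumes decimal units (1 EB = 10¹⁸ bytes, 1 TB = 10¹² bytes) and takes the quoted number at face value.

```python
BYTES_PER_DAY = 2.5e18          # figure quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Decimal units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

That works out to 2.5 EB per day, or roughly 29 TB every second.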

49. Examples include Big Science, RFID, sensor networks, social networks, big social data analysis (due to the social data revolution), Internet documents, Internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, forecasting drive times for new home buyers, medical records, photography archives, video archives, and large-scale e-commerce.

50. PIG was developed by Yahoo!, and, just like Hive, has also been made fully open source.

51. Hive is a “SQL-like” bridge that allows conventional BI applications to run queries against a Hadoop cluster.
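
The kind of query a BI application would send looks like ordinary SQL. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, since no Hadoop cluster is assumed here; the table and column names are made up, and HiveQL differs from SQLite's dialect in its details.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("IN", 120), ("US", 80), ("IN", 30), ("UK", 50)],
)

# A typical BI-style aggregate: total views per country, largest first.
rows = conn.execute(
    "SELECT country, SUM(views) AS total "
    "FROM page_views GROUP BY country ORDER BY total DESC"
).fetchall()
conn.close()
print(rows)
```

With Hive, a query of this shape would be run against data stored in the cluster rather than against a local database file.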

52. Big Data and cloud computing go hand-in-hand.

jignasa

Wednesday, 11 September 2013
