• Non ci sono risultati.

http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/BigData_2015_2x.pdf Partially based on “Big Data: Hype or Hallelujah?” by Elena Baralis

N/A
N/A
Protected

Academic year: 2021

Condividi "http://dbdmg.polito.it/wordpress/wp-content/uploads/2010/12/BigData_2015_2x.pdf Partially based on “Big Data: Hype or Hallelujah?” by Elena Baralis"

Copied!
20
0
0

Testo completo

(1)
(2)
(3)

February 2010

Google detected flu

outbreak two weeks ahead of CDC data (Centers for Disease Control and

Prevention – U.S.A)

Based on the analysis of Google search queries

(4)

February 2010

Google detected flu

outbreak two weeks ahead of CDC data (Centers for Disease Control and

Prevention – U.S.A)

Based on the analysis of Google search queries

Nowcasting

(5)

Internet live stats

http://www.internetlivestats.com/

(6)

User Generated Content (Web & Mobile)

E.g., Facebook, Instagram, Yelp, TripAdvisor, Twitter, YouTube

Health and scientific computing

(7)

Log files

Web server log files, machine system log files

Internet Of Things (IoT)

Sensor networks, RFIDs, smart meters

(8)

Crowdsourcing Sensing

Computing

Map data

Real time traffic info

Travel time forecast/nowcast

(9)

Many different definitions

“Data whose scale, diversity and complexity

require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it”

(10)

Many different definitions

“Data whose scale, diversity and complexity

require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it”

(11)

Many different definitions

“Data whose scale, diversity and complexity

require new architectures, techniques, algorithms and analytics to manage it and extract value and hidden knowledge from it”

(12)

The 3Vs of big data

Volume: scale of data

Variety: different forms of data

Velocity: analysis of streaming data

… but also

Veracity: uncertainty of data

Value: exploit information provided by data

(13)

Volume

Data volume increases exponentially over time

44x increase from 2009 to 2020

Digital data 35 ZB in 2020

(14)

Variety

Various formats, types and structures

Numerical data, image data, audio, video, text, time series

A single application may generate many different formats

Heterogeneous data

Complex data integration problem

(15)

Velocity

Fast data generation rate

Streaming data

Very fast data processing to ensure timeliness

(16)

Veracity

Data quality

(17)

Value

Translate data into business advantage

(18)

Technology and infrastructure

New architectures, programming paradigms and techniques are needed

Data management and analysis

New emphasis on “data”

Data science

(19)

Processors process data

Hard drives store data

We need to transfer data from the disk to the

processor

(20)

Transfer the processing power to the data

Multiple distributed disks

Each one holding a portion of a large dataset

Process in parallel different file portions from

different disks

Riferimenti

Documenti correlati

Considering only the statistics related to May 2018, the application must select only the pairs (VSID, hour) for which the average of CPUUsage% for that pair is

Specifically, given a date and a sensor, the sensor measured an “unbalanced daily wind speed” during that date if that sensor, in that specific date, is characterized by

Considering only the failures related to year 2017, the application must select the identifiers of the data centers (DataCenterIDs) affected by at least 365

Considering only the faults related to the first 6 months of year 2015 (from January 2015 to June 2015) , the application must select the production plants (PlantIDs) associated

Considering only the prices in year 2017, the application must compute for each stock the number of dates associated with a daily price variation greater than 10

Considering only the prices in year 2014, the application must compute for each stock the number of dates associated with a daily price variation less than 5

(2 points) Consider an input HDFS folder logsFolder containing the files log2017.txt and log2018.txt. Suppose to execute a map-only application, based on MapReduce, that

Suppose that the following MapReduce program, which computes the maximum temperature value for each input date, is executed by providing the folder.. “inputData”