• Non ci sono risultati.

Critical dates analysis

N/A
N/A
Protected

Academic year: 2021

Condividi "Critical dates analysis"

Copied!
14
0
0

Testo completo

(1)
(2)

Critical dates analysis

Input: a textual csv file containing the daily value of PM10 for a set of sensors

Each line of the files has the following format sensorId,date,PM10 value (μg/m3 )\n

Output: an HDFS file containing one line for each sensor

Each line contains a sensorId and the list of dates with a PM10 values greater than 50 for that sensor

Also the sensors which have never been associated with a PM10 values greater than 50 must be included in the result (with an empty set)

(3)

Input file

Output

s1,2016-01-01,20.5 s2,2016-01-01,30.1 s1,2016-01-02,60.2 s2,2016-01-02,20.4 s1,2016-01-03,55.5 s2,2016-01-03,52.5 s3,2016-01-03,12.5

(s1, [2016-01-02, 2016-01-03]) (s2, [2016-01-03])

(s3, [])

(4)

From each input reading

if PM10>50 -> Return (sensorId, date)

if PM10<=50 -> Return (sensorId, null)

Group together values by key

Return (sensorId, [list of values])

One pair per distinct sensorId is returned

Remove null from the lists of values

(5)

N: number of input lines

S: number of distinct sensors

M: number of input lines with PM10>50

Sc: number of distinct sensors associated with PM10>50 at least on time

M<N

Sc<S

S<<N

(6)

textFile(..) N strings mapTopPair(…) N pairs

groupByKey() N pairs are sent on the

network.

S pairs are created.

mapValues(…) S values are updated

(7)

Select only readings with PM10>50

From each of the

selected input readings

Return (sensorId, date)

Group together values by key

Return (sensorId, [list of dates])

One pair per distinct sensorId is returned

From each input reading

Return sensorId

Remove duplicates

Apply subtract to remove the sensorIds that are

associated with PM10>50 at least one time

For each of the selected sensorIds

Return (sensorIds, [ ]), where [ ] is the empy list

(8)

N: number of input lines

S: number of distinct sensors

M: number of input lines with PM10>50

Sc: number of distinct sensors associated with PM10>50 at least on time

NPart: number of partitions

M<N

Sc<S

S<<N

(9)

textFile(..) N strings

filter(..) M<N strings are selected mapTopPair(…) M pairs

groupByKey() M pairs are sent on the

network.

Sc<S pairs are created.

(10)

textFile(..) N strings

map(..) N strings

distinct(..) S x NPart values are sent on the network.

S values are returned.

subtract(… .keys() ) Sc values are sent on the network

mapToPair() S-Sc values are returned

(11)

union(..) S pairs are returned

(12)

N: number of input lines

S: number of distinct sensors

M: number of input lines with PM10>50

Sc: number of distinct sensors associated at least on time with PM10>50

NPart: number of partitions

M<N

Sc<S

S<<N

(13)

Solution V1 sends on the network

N pairs (sensorId, value) on the network

Solution V2 sends on the network

M pairs (sensorId, value) +

S x NPart sensorIds +

Sc sensorIds

(14)

S<<N

If we suppose that PM10>50 is a rare event

M<<N

Hence, Solution V2 sends less data on the network than Solution V1 if M<<N

if M<<N => N > M + S x NPart + Sc

Riferimenti

Documenti correlati

Edited by: Giampaolo Papi, Azienda Unità Sanitaria Locale di Modena, Italy Reviewed by: Zubair Wahid Baloch, University of Pennsylvania, United States Pasqualino Malandrino,

Nella prima cercherò di offrire un quadro delle lotte intorno ai limiti e al contenuto della cittadinanza culturale europea, focalizzando l’attenzione su due approcci

Figure 14: Distribution of Y -vorticity and in-plane velocity vectors in the mixing channel cross-section at Y = −1.3 predicted with (a) Fluent and (b) Nek5000.. However, the flow

PARTITION (ROWID) Maps rows to query servers based on the partitioning of a table or index using the rowid of the row to UPDATE/DELETE.. PARTITION (KEY) Maps rows to query

▪ Each line of the output file contains the number of days with a PM10 values greater than 50 for a sensor s and the sensorId of sensor s.  Consider only the sensors associated

The fact that Benjamin grasps the essential aspect of modern dwelling in that intérieur (as opposed to the working environment) which distinguishes the late

The analysis of data from the selected background stations shows that in the Polish cities the highest daily PM 10 concentrations are observed in the evening and late

Then I suggest that the actors who could be advancing a value shift towards sustainability are the states and political parties, international organizations,