Spark provides specific actions for RDD containing numerical values (integers or floats)

Academic year: 2021

Condividi "Spark provides specific actions for RDD containing numerical values (integers or floats)"

Copied!
6
0
0

Testo completo

(1)
(2)

Spark provides specific actions for RDD containing numerical values (integers or floats)

RDDs of numbers can be created by using the standard methods:

▪ parallelize

▪ transformations that return an RDD of numbers

The following specific actions are also available on this type of RDD:

sum(), mean(), stdev(), variance(), max(), min()
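For intuition, the same six statistics can be computed locally on a plain Python list standing in for the RDD's contents (the variable names below are just for illustration). Note that Spark's stdev() and variance() use the population formulas, i.e., they divide by the number of elements N; small rounding differences with respect to Spark's distributed computation are possible.

```python
import statistics

# Local list standing in for the contents of a numeric RDD
values = [1.5, 3.5, 2.0]

# Local equivalents of the numeric RDD actions
total = sum(values)                       # sum()
mean = statistics.mean(values)            # mean()
variance = statistics.pvariance(values)   # variance() (population: divides by N)
stdev = statistics.pstdev(values)         # stdev() (population standard deviation)
maximum = max(values)                     # max()
minimum = min(values)                     # min()

print(total, round(mean, 4), round(stdev, 4),
      round(variance, 4), maximum, minimum)
```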


All the examples reported in the following are applied to inputRDD, an RDD containing the following double values:

[1.5, 3.5, 2.0]


Action      Purpose                                                   Example               Result
sum()       Return the sum over the values of the inputRDD            inputRDD.sum()        7.0
mean()      Return the mean value                                     inputRDD.mean()       2.3333
stdev()     Return the standard deviation computed over the           inputRDD.stdev()      0.8498
            values of the inputRDD
variance()  Return the variance computed over the values of           inputRDD.variance()   0.7223
            the inputRDD
max()       Return the maximum value                                  inputRDD.max()        3.5
min()       Return the minimum value                                  inputRDD.min()        1.5


Create an RDD containing the following float values

[1.5, 3.5, 2.0]

Print on the standard output the following statistics

sum, mean, standard deviation, variance, maximum value, and minimum value


# Create an RDD containing a list of float values
inputRDD = sc.parallelize([1.5, 3.5, 2.0])

# Compute the statistics of interest and print them on
# the standard output
print("sum:", inputRDD.sum())
print("mean:", inputRDD.mean())
print("stdev:", inputRDD.stdev())
print("variance:", inputRDD.variance())
print("max:", inputRDD.max())
print("min:", inputRDD.min())
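A detail worth knowing: stdev() and variance() use the population formulas (dividing by the number of elements N), while the RDD API also offers sampleStdev() and sampleVariance(), which divide by N-1. A local sketch of the difference using Python's statistics module (the list mirrors the contents of inputRDD):

```python
import statistics

values = [1.5, 3.5, 2.0]

# Population formulas (divide by N) -- what stdev()/variance() compute
print("population stdev:   ", round(statistics.pstdev(values), 4))
print("population variance:", round(statistics.pvariance(values), 4))

# Sample formulas (divide by N-1) -- what sampleStdev()/sampleVariance() compute
print("sample stdev:       ", round(statistics.stdev(values), 4))
print("sample variance:    ", round(statistics.variance(values), 4))
```

On three values the two conventions differ noticeably; the gap shrinks as the number of elements grows.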
