• Non ci sono risultati.

Spark programs are executed (submitted) by using the spark-submit command

N/A
N/A
Protected

Academic year: 2021

Condividi "Spark programs are executed (submitted) by using the spark-submit command"

Copied!
12
0
0

Testo completo

(1)
(2)

 Spark programs are executed (submitted) by using the spark-submit command

 It is a command line program

 It is characterized by a set of parameters

▪ E.g., the name of the jar file containing all the classes of the Spark application we want to execute

▪ The name of the Driver class

▪ The parameters of the Spark application

▪ etc.

(3)

 spark-submit has also two parameters that are used to specify where the application is executed

--master option

▪ Specify which environment/scheduler is used to execute the application

▪ spark://host:port The spark scheduler is used

▪ mesos://host:port The memos scheduler is used

▪ yarn The YARN scheduler (i.e., the one of Hadoop)

▪ local The application is executed exclusively on

the local PC

(4)

--deploy-mode option

▪ Specify where the Driver is launched/executed

▪ client The driver is launched locally (in the

“local” PC executing spark-submit)

▪ cluster The driver is launched on one node of the

cluster

(5)
(6)

In cluster mode

• The Spark driver runs in the ApplicationMaster on a cluster node.

• The cluster nodes are used also to store RDDs and execute transformations and actions on the RDDs

• A single process in a YARN container is responsible for both driving the application and requesting resources from YARN.

• The resources (memory and CPU) of the client that launches the application are not used.

(7)
(8)

In client mode

• The Spark driver runs on the host where the job is submitted (i.e., the resources of the client are used to execute the Driver)

• The cluster nodes are used to store RDDs and execute transformations and actions on the RDDs

• The ApplicationMaster is responsible only for requesting executor containers from YARN.

(9)

 Spark-submit allows specifying

 The number of executors

▪ --num-executors NUM

▪ Default value: NUM=2 executors

 The number of cores per executor

▪ --executor-cores NUM

▪ Default value: NUM=1 core

 Main memory per executor

▪ --executor-memory MEM

▪ Default value: MEM=1GB

 The maximum values of these parameters are

limited by the configuration of the cluster

(10)

 Spark-submit allows specifying

 The number of cores for the driver

▪ --driver-cores NUM

▪ Default value: NUM=1 core

 Main memory for the driver

▪ --driver-memory MEM

▪ Default value: MEM=1GB

 Also the maximum values of these parameters

are limited by the configuration of the cluster

(11)

 The following command submits a Spark application on a Hadoop cluster

spark-submit --class it.polito.spark.DriverMyApplication -- deploy-mode cluster --master yarn MyApplication.jar

arguments

 It executes/submits the application

it.polito.spark.DriverMyApplication contained in MyApplication.jar

 The application is executed on a Hadoop cluster based on the YARN scheduler

▪ Also the Driver is executed in a node of cluster

(12)

 The following command submits a Spark application on a local PC

spark-submit --class it.polito.spark.DriverMyApplication -- deploy-mode client --master local MyApplication.jar

arguments

 It executes/submits the application

it.polito.spark.DriverMyApplication contained in MyApplication.jar

 The application is completely executed on the local PC

▪ Both Driver and Executors

▪ Hadoop is not needed in this case

Riferimenti

Documenti correlati

nonostante queste analogie, altre caratteristiche morfologiche, quali le dimensioni e forma delle squame dell’epicalice, dei petali, e delle antere, per- mettono

 The DataFrameReader class (the same we used for reading a json file and store it in a DataFrame) provides other methods to read many standard (textual) formats and read data

// By default, the MLlib decision tree algorithm considers only the content // of the features attribute to predict the value of the label attribute. // The other attributes are

// Extract frequent itemsets and association rules by invoking the fit // method of FPGrowth on the input data. FPGrowthModel itemsetsAndRulesModel

 If a stock satisfies the conditions multiple times in the same batch, return the stockId only one time for each

 The DataFrameReader class (the same we used for reading a json file and store it in a DataFrame) provides other methods to read many standard (textual) formats and read data

▪ For each record, the value of the new column is the original string associated with the value of the input numerical column.  Has also all the columns of the

Both the amplitude of phase 2 and the duration of phase 3 of the isotonic velocity transient following the force step are reduced by the rise in [Pi] (Fig. 3A) are reproduced by