WEKA REFERENCE COMMANDS

Data Filtering

ACTION: Replace missing values with the mean (numeric fields) or the mode (nominal fields)
PATH: Preprocess → Unsupervised → Attribute → ReplaceMissingValues

ACTION: Compute the outliers
PATH: Preprocess → Unsupervised → Attribute → InterquartileRange

ACTION: Remove the outliers
PATH: Compute the outliers (see previous), then select the Outlier field, then Preprocess → Unsupervised → Instance → RemoveWithValues, click on properties and set nominalIndices to last. Finally remove the Outlier column.

ACTION: Resample the dataset
PATH: Preprocess → Supervised → Instance → Resample, click on properties and set the sampleSizePercent

ACTION: Select the attributes
PATH: Preprocess → Supervised → Attribute → AttributeSelection, click on properties and select the matching algorithm

ACTION: Principal Component Analysis
PATH: Preprocess → Unsupervised → Attribute → PrincipalComponents
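
As a rough illustration of what the first three filtering actions compute, here is a minimal plain-Java sketch. It does not use the WEKA API. The class and method names (FilterSketch, replaceMissingWithMean, removeOutliers) are hypothetical, NaN stands in for a missing value, only the numeric (mean) case is shown, and both the quartile convention and the factor 1.5 are illustrative assumptions rather than the filter's actual settings.

```java
import java.util.Arrays;

final class FilterSketch {

    // Replace NaN entries (used here to mark missing values) with the mean
    // of the observed values in the same column.
    static double[] replaceMissingWithMean(double[] column) {
        double sum = 0.0;
        int count = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        double mean = count == 0 ? Double.NaN : sum / count;
        double[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = mean;
            }
        }
        return out;
    }

    // Quantile by linear interpolation between order statistics
    // (one of several common conventions; assumes a non-empty sorted array).
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Keep only the values inside [Q1 - factor*IQR, Q3 + factor*IQR].
    static double[] removeOutliers(double[] column, double factor) {
        double[] sorted = column.clone();
        Arrays.sort(sorted);
        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        double iqr = q3 - q1;
        final double low = q1 - factor * iqr;
        final double high = q3 + factor * iqr;
        return Arrays.stream(column).filter(v -> v >= low && v <= high).toArray();
    }

    public static void main(String[] args) {
        double[] filled = replaceMissingWithMean(new double[] {1.0, Double.NaN, 3.0});
        double[] cleaned = removeOutliers(new double[] {1, 2, 3, 4, 5, 100}, 1.5);
        System.out.println(Arrays.toString(filled));
        System.out.println(Arrays.toString(cleaned));
    }
}
```

In the GUI the two outlier steps are separate: InterquartileRange only flags the rows, and RemoveWithValues then drops the flagged ones. The sketch merges both into removeOutliers for brevity.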

Data Analysis (CLASSIFICATION)

ALGORITHM: Zero Algorithm (just choose the most frequent category, no learning)
PATH: Classifiers → Rules → ZeroR

ALGORITHM: Bayesian algorithm (assuming independence among fields)
PATH: Classifiers → Bayes → NaiveBayes

ALGORITHM: Bayesian algorithm (estimating dependencies among fields)
PATH: Classifiers → Bayes → BayesNet

ALGORITHM: KNearestNeighbour (simple geometric)
PATH: Classifiers → Lazy → IBk (select properties → KNN in order to set the number of neighbours)

ALGORITHM: Support vector machines (geometric, divide the dataset into regions via hyperplanes)
PATH: Classifiers → Functions → SMO

ALGORITHM: Decision Tree based (build a flowchart tree-like structure)
PATH: Classifiers → Trees → RandomTree

ALGORITHM: Decision Tree based (build multiple flowchart tree-like structures and then take the average decision)
PATH: Classifiers → Trees → RandomForest

ALGORITHM: Neural Network based (multi layer network)
PATH: Classifiers → Functions → MultilayerPerceptron

ALGORITHM: Decision Table (build a list of IF-THEN-ELSE rules)
PATH: Classifiers → Rules → DecisionTable
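
To make the role of the KNN property more concrete, the following plain-Java sketch shows an unweighted k-nearest-neighbour vote with Euclidean distance. It is not the IBk implementation and does not call the WEKA API. The names KnnSketch and classify are hypothetical, and the tie-breaking rule (the first label to reach the highest count wins) is an assumption made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class KnnSketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }

    // Return the most frequent label among the k training points closest to the query.
    static String classify(double[][] points, String[] labels, double[] query, int k) {
        Integer[] order = new Integer[points.length];
        for (int n = 0; n < order.length; n++) {
            order[n] = n;
        }
        // Sort training indices by increasing distance to the query.
        Arrays.sort(order, (i, j) ->
                Double.compare(distance(points[i], query), distance(points[j], query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (best == null || count > votes.get(best)) {
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {1, 0}, {5, 5}, {6, 5}};
        String[] labels = {"a", "a", "a", "b", "b"};
        System.out.println(classify(points, labels, new double[] {5.5, 5.0}, 1));
        System.out.println(classify(points, labels, new double[] {5.5, 5.0}, 5));
    }
}
```

With k equal to the number of training rows, this unweighted vote simply returns the most frequent category, which is the answer ZeroR gives for every instance. Small values of k follow the local neighbourhood instead.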

METACLASSIFIERS

ALGORITHM: Voting Technique (apply multiple classifiers in parallel, and then return the most voted category)
PATH: Classifiers → Meta → Vote (and then click on properties → classifiers in order to select the list of ML algorithms)

ALGORITHM: Boosting Technique (train a sequence of classifiers, each one concentrating on the instances the previous ones misclassified, and then return a weighted combination of the individual outputs)
PATH: Classifiers → Meta → AdaBoostM1 (and then click on properties → classifier in order to select the base ML algorithm)
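
The combination step of the two techniques above can be sketched in plain Java as follows. This is only an illustration of majority and weighted voting, not the WEKA Vote or AdaBoostM1 code. The names VoteSketch, weightedVote and majorityVote are hypothetical, classifiers are modelled as simple functions, and the weights are supplied by hand; the sketch does not show how boosting trains its classifiers or derives their weights.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class VoteSketch {

    // Each classifier adds its weight to the category it predicts;
    // the category with the highest total wins (first one wins on a tie).
    static <X> String weightedVote(List<Function<X, String>> classifiers,
                                   double[] weights, X instance) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (int i = 0; i < classifiers.size(); i++) {
            String predicted = classifiers.get(i).apply(instance);
            scores.merge(predicted, weights[i], Double::sum);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    // Plain majority voting is the special case where every weight is 1.
    static <X> String majorityVote(List<Function<X, String>> classifiers, X instance) {
        double[] ones = new double[classifiers.size()];
        Arrays.fill(ones, 1.0);
        return weightedVote(classifiers, ones, instance);
    }

    public static void main(String[] args) {
        // Three toy threshold classifiers (hypothetical stand-ins for real models).
        List<Function<Double, String>> classifiers = List.<Function<Double, String>>of(
                x -> x > 1 ? "high" : "low",
                x -> x > 2 ? "high" : "low",
                x -> x > 3 ? "high" : "low");
        System.out.println(majorityVote(classifiers, 2.5));
        System.out.println(weightedVote(classifiers, new double[] {0.2, 0.2, 0.9}, 2.5));
    }
}
```

In the example, two of the three toy classifiers answer "high" for 2.5, so the majority vote returns "high". Once the third classifier carries a weight of 0.9 against 0.2 for each of the others, its "low" answer wins the weighted vote.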

Data Analysis (CLUSTERING)

ALGORITHM: Simple K-Means
PATH: Clusterers → SimpleKMeans, click on properties → numClusters in order to set the number of groups to create.
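
For readers who want to see what the number of clusters controls, here is a minimal plain-Java sketch of the standard k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not the SimpleKMeans implementation and does not use the WEKA API. The names KMeansSketch and cluster are hypothetical, and the initial centroids are passed in explicitly as an assumption of this example, so the number of rows in that array plays the role of numClusters.

```java
import java.util.Arrays;

final class KMeansSketch {

    // Squared Euclidean distance (enough for comparing which centroid is nearer).
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    // Returns the cluster index of each point; the centroids array is updated in place.
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach every point to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (sqDist(points[p], centroids[c]) < sqDist(points[p], centroids[nearest])) {
                        nearest = c;
                    }
                }
                if (assignment[p] != nearest) {
                    assignment[p] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }
            // Update step: move each centroid to the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[points[0].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] != c) {
                        continue;
                    }
                    for (int d = 0; d < sum.length; d++) {
                        sum[d] += points[p][d];
                    }
                    count++;
                }
                if (count == 0) {
                    continue; // leave the centroid of an empty cluster where it is
                }
                for (int d = 0; d < sum.length; d++) {
                    centroids[c][d] = sum[d] / count;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        double[][] centroids = {{0, 0}, {10, 10}}; // two groups
        System.out.println(Arrays.toString(cluster(points, centroids, 100)));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```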