WEKA REFERENCE COMMANDS
Data Filtering
| ACTION | PATH |
|---|---|
| Replace missing values with the mode (nominal fields) or mean (numeric fields) | Preprocess → Unsupervised → Attributes → ReplaceMissingValues |
| Compute the outliers | Preprocess → Unsupervised → Attributes → InterquartileRange |
| Remove the outliers | Compute the outliers (see previous), then select the Outlier field, then Preprocess → Unsupervised → Instances → RemoveWithValues; click on properties, set attributeIndex to the Outlier field's index and nominalIndices to last. Finally, remove the Outlier column. |
| Resample the dataset | Preprocess → Supervised → Instances → Resample; click on properties and set sampleSizePercent |
| Select the attributes | Preprocess → Supervised → Attributes → AttributeSelection; click on properties and select the evaluator and search method |
| Principal Component Analysis | Preprocess → Unsupervised → Attributes → PrincipalComponents |
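
To make the first three filtering actions concrete, here is a minimal plain-Python sketch of what they compute: filling gaps with the mean or mode, flagging values outside the interquartile fences, and dropping the flagged rows. It is an illustration only, not Weka's implementation; the function names, the sample data, the `factor=1.5` fence multiplier and the quartile method are all assumptions chosen for the example.

```python
from statistics import mean, mode, quantiles

def replace_missing(column):
    """Fill None entries with the mean (numeric column) or the mode (otherwise)."""
    present = [v for v in column if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in present)
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in column]

def flag_outliers(values, factor=1.5):
    """Return True for each value outside [Q1 - factor*IQR, Q3 + factor*IQR].

    factor=1.5 is an assumed multiplier for this sketch, not a Weka default.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v < low or v > high for v in values]

# Hypothetical columns with one missing value each
ages = replace_missing([20, 22, None, 24, 26, 90])
colours = replace_missing(["red", None, "blue", "red"])

# Flag the outliers, then keep only the rows that were not flagged
flags = flag_outliers(ages)
kept = [a for a, is_outlier in zip(ages, flags) if not is_outlier]
```

Note that the flag column is computed first and used for row removal afterwards, which mirrors the two-step "compute, then remove" procedure in the table.
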
Data Analysis (CLASSIFICATION)
| ALGORITHM | PATH |
|---|---|
| Zero algorithm (just choose the most frequent category, no learning) | Classifiers → Rules → ZeroR |
| Bayesian algorithm (assuming independence among fields) | Classifiers → Bayes → NaiveBayes |
| Bayesian algorithm (modelling dependencies among fields) | Classifiers → Bayes → BayesNet |
| K-nearest neighbours (simple geometric) | Classifiers → Lazy → IBk (select properties → KNN to set the number of neighbours) |
| Support vector machines (geometric; divide the dataset into regions via hyperplanes) | Classifiers → Functions → SMO |
| Decision tree based (build a flowchart-like tree structure) | Classifiers → Trees → RandomTree |
| Decision tree based (build multiple flowchart-like tree structures and then take the average decision) | Classifiers → Trees → RandomForest |
| Neural network based (multilayer network) | Classifiers → Functions → MultilayerPerceptron |
| Decision table (build a list of IF-THEN-ELSE rules) | Classifiers → Rules → DecisionTable |
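
As a rough illustration of the two simplest entries in the table, the sketch below implements a most-frequent-category baseline (the idea behind ZeroR) and a k-nearest-neighbours vote (the idea behind IBk) in plain Python. It is not Weka's code: the function names and the toy data are assumptions, and the distance used is plain Euclidean distance.

```python
from collections import Counter
from math import dist

def zero_r(train_labels):
    """Baseline: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

def knn_predict(train_points, train_labels, query, k=3):
    """Predict by majority vote among the k training points closest to query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-dimensional training set with two categories
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "a", "b", "b"]

baseline = zero_r(labels)                              # ignores the query entirely
near_b = knn_predict(points, labels, (5.1, 5.0), k=3)  # looks at the neighbours
```

The baseline is useful as a reference score: a learned classifier that cannot beat it has not extracted anything from the fields.
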
METACLASSIFIERS

| ALGORITHM | PATH |
|---|---|
| Voting technique (apply multiple classifiers in parallel, and then return the most voted category) | Classifiers → Meta → Vote (then click on properties → classifiers to select the list of ML algorithms) |
| Boosting technique (train a sequence of classifiers, each giving more weight to the instances the previous ones misclassified, and then return a weighted combination of the individual outputs) | Classifiers → Meta → AdaBoostM1 (then click on properties → classifier to select the base ML algorithm) |
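
The difference between the two combination rules can be sketched in a few lines of plain Python. The example below shows only the final combination step (a majority vote versus a weighted vote); it does not train anything, and in particular it does not reproduce how AdaBoostM1 derives its weights. The three rule-based classifiers, the weights and the instance are hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(classifiers, instance):
    """Return the category predicted by the largest number of classifiers."""
    votes = [clf(instance) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

def weighted_vote(classifiers, weights, instance):
    """Return the category with the largest total weight behind it."""
    totals = defaultdict(float)
    for clf, weight in zip(classifiers, weights):
        totals[clf(instance)] += weight
    return max(totals, key=totals.get)

# Hypothetical hand-written classifiers standing in for trained models
def by_length(x):
    return "long" if x["length"] > 5 else "short"

def by_width(x):
    return "long" if x["width"] > 2 else "short"

def always_short(x):
    return "short"

models = [by_length, by_width, always_short]
instance = {"length": 7, "width": 3}

plain = majority_vote(models, instance)                        # two votes against one
weighted = weighted_vote(models, [1.0, 1.0, 3.0], instance)    # the heavy model wins
```

With equal say, the two "long" votes win; once the third model carries a weight of 3.0 against a combined 2.0, the outcome flips to "short".
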
Data Analysis (CLUSTERING)
| ALGORITHM | PATH |
|---|---|
| Simple K-Means | Clusterers → SimpleKMeans; click on properties → numClusters to set the number of groups to create |
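
For readers who want to see what the number of groups controls, here is a minimal plain-Python sketch of the standard k-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not Weka's SimpleKMeans; the function name, the fixed iteration count and the choice of the first `k` points as starting centroids are assumptions made to keep the example deterministic.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Assumption for this sketch: start from the first k points
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it is
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    assignments = [min(range(k), key=lambda i: dist(p, centroids[i]))
                   for p in points]
    return centroids, assignments

# Hypothetical data: two visibly separate groups
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
```

Here `k` plays the role of numClusters: it fixes in advance how many groups the algorithm must produce, whether or not the data naturally contains that many.
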