Data cleaning results - Politecnico di Torino Dipartimento di Ingegneria Meccanica e Aerospazia

In this section, the principal results of the cleaning will be explained following the steps described in section 3.7.

In the first part, the feature selection results are shown. The step-by-step outcomes are presented only for the gross movement pattern, whereas the final result of the redundant feature removal is shown for every movement pattern.

In the second part, the results of the removal of noisy instances are presented only for the gross movement pattern; the other patterns give similar results.

4.4.1 Selected features

The first step of the feature selection algorithm proposed in this work is the estimation of the feature importance. Figure 4.6 shows the bar chart of the importance estimates for the gross movement cluster of movement patterns. According to these estimates, most of the features proposed in section 3.5.4 carry essential information for the bradykinesia prediction.
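To make the idea of a Relief-style importance estimate concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a simplified Relief variant (one nearest hit and one nearest miss per instance, Manhattan distance on min-max scaled features) rather than the full ReliefF algorithm; the function name `relief_importance` and the toy data are hypothetical.

```python
def relief_importance(X, y):
    """Simplified Relief weights: one nearest hit and one nearest miss per instance."""
    n, d = len(X), len(X[0])
    # Min-max scale every feature so that per-feature differences are comparable.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    Z = [[(row[j] - lo[j]) / span[j] for j in range(d)] for row in X]

    def dist(a, b):
        # Manhattan distance between two scaled instances.
        return sum(abs(p - q) for p, q in zip(a, b))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda k: dist(Z[i], Z[k]))    # nearest same-class instance
        m = min(misses, key=lambda k: dist(Z[i], Z[k]))  # nearest other-class instance
        for j in range(d):
            # A feature gains weight when it differs on the miss and agrees on the hit.
            w[j] += (abs(Z[i][j] - Z[m][j]) - abs(Z[i][j] - Z[h][j])) / n
    return w


# Toy data (invented): feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_importance(X, y)
```

In this toy case the discriminative feature receives a positive weight and the noisy one a negative weight, which is why an importance axis for this family of methods can extend below zero.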

The second result of the procedure is the correlation matrix presented in figure 4.7.

A significant number of features are highly correlated, hence the redundant ones must be removed. The final result of the redundant feature removal is shown in figure 4.8, where a black spot in the correlation matrix means that the corresponding feature has been discarded.
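The combination of importance estimates and correlation matrix can be illustrated with a small sketch. It assumes a greedy rule (visit features from most to least important and keep one only if its absolute Pearson correlation with every feature already kept is below a threshold); both the rule and the 0.9 threshold are assumptions made for illustration, and the procedure actually used is the one described in section 3.7. The feature names come from this work, while the values and importance scores are invented toy numbers.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def drop_redundant(columns, importance, threshold=0.9):
    """Return the retained feature names, most important first.

    columns:    dict mapping feature name -> list of values
    importance: dict mapping feature name -> importance score
    threshold:  assumed cut-off on |correlation| (hypothetical value)
    """
    kept = []
    for name in sorted(columns, key=importance.get, reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values: std_mgn is almost a rescaled copy of rms_mgn, so it is redundant.
columns = {
    "rms_mgn": [1.0, 2.0, 3.0, 4.0, 5.0],
    "std_mgn": [2.0, 4.0, 6.0, 8.0, 10.5],
    "zero_cross_X": [3.0, 1.0, 4.0, 1.0, 5.0],
}
importance = {"rms_mgn": 0.02, "std_mgn": 0.01, "zero_cross_X": 0.005}
retained = drop_redundant(columns, importance)
```

With this rule, the less important member of each highly correlated pair is the one that gets discarded.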

[Plot: Upper gross movements ReliefF Feature Importance. Axes: Features; Predictor Importance Estimates (ticks from -0.01 to 0.035).]

Figure 4.6: Feature importance estimates computed using ReliefF on the gross movement cluster. The outcome of the algorithm shows that the most important features for predicting the bradykinesia severity are the time since medication intake, the segment velocity features, and the entropy features. Conversely, the least important features for the bradykinesia estimates are the cross-correlation time lags and the dominant frequency.

[Plot: Upper gross movements Feature correlation matrix. Both axes list the features; colour scale from -1 to 1.]

Figure 4.7: Feature correlation matrix of the gross movement cluster. Most features are highly correlated with each other, which justifies the removal of the redundant ones. A few features, however, are only weakly correlated with the rest, such as the energy ratio, the features derived from the channel cross-correlation, the zero-crossing rate, and the medication intake.

In addition, a general overview of the results for the different movement patterns is given in figure 4.9, where the features are ranked according to their importance.

The features selected in all the clusters are invariant with respect to the movement, whereas the features selected less often depend on the movement pattern.

Figure 4.8: Feature correlation matrix of the gross movement pattern after the removal of redundant predictors. The black squares are the removed features; in general, some correlation remains among the features, but the retained ones carry most of the information for the bradykinesia estimates.

Figure 4.9: Retained features of the different movement patterns. The movement clusters have some features in common, such as the dominant frequency, the auto-correlation range, and the zero crossing, whereas other features can detect bradykinesia only in particular movement patterns.

4.4.2 Data cleaning

The results are shown qualitatively using t-SNE projections; the numerical results are described in section 4.7.1.

Figure 4.10 illustrates the projection of the gross movement cluster before the removal of noisy and outlier data points. The class separation is clearly poor.

Then, the result of the k-means clustering algorithm is shown in figure 4.11, where the clusters should correspond to the classes of bradykinesia severity.

Figure 4.10: t-SNE projection of the gross movement cluster before the cleaning. The overlap among the classes is severe and the variability is still high.

Figure 4.11: k-means clustering outcome of the gross movement pattern.

Finally, in figure 4.12 the projection after the cleaning is illustrated, showing an enhancement of the class separation compared to figure 4.10.
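As a rough illustration of how a k-means outcome can be used to flag noisy instances, the sketch below keeps only the instances whose class label agrees with the majority label of the cluster they fall into. This agreement rule is an assumption made for the example; the cleaning procedure actually applied is the one described in section 3.7. The helper names `kmeans` and `clean`, the fixed initial centroids, and the toy points are all hypothetical.

```python
from collections import Counter


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [pt for pt, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return assign


def clean(points, labels, init_centroids):
    """Return the indices of instances whose label matches their cluster majority."""
    assign = kmeans(points, init_centroids)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [i for i, (l, a) in enumerate(zip(labels, assign)) if l == majority[a]]


# Toy data (invented): the last point is labelled 1 but lies among class-0 points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (0.15, 0.2)]
labels = [0, 0, 0, 1, 1, 1, 1]
kept = clean(points, labels, init_centroids=[points[0], points[3]])
```

Here the mislabelled point ends up in the cluster dominated by class 0 and is therefore dropped, while the six consistent instances are kept.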
