
UNIVERSITÀ DI PISA

DOTTORATO DI RICERCA IN INGEGNERIA DELL'INFORMAZIONE

SMART TECHNOLOGIES EMPOWERING ASSISTIVE SYSTEMS FOR PEOPLE WITH DISABILITIES

DOCTORAL THESIS

Author

Davide Mulfari

Tutor

Prof. Luca Fanucci

Reviewers

Prof. Fabio Salice
Dr. Stuart Cunningham

The Coordinator of the PhD Program

Prof. Fulvio Gini

Pisa, Academic Year 2018 - 2019, XXXII Cycle


"The majority of people with disabilities in the world have an extremely difficult time with everyday survival, let alone productive employment and personal fulfilment." Stephen William Hawking

"Finally someone will understand me, don’t care if it will be a machine" DM


Acknowledgements

The author wishes to thank all the people who contributed to the research activity presented in this thesis, in particular Mimmo, Nino, Filippo, Christian, Sergio, Davide, Carmela, Tania, Mario, Wiola, Graziella, Tommaso, Gabriele, Gianluca, Marco, Alessandra, Giuseppe, Frank, Salvatore, Valeria, Arcangelo, Maurizio, Emilia.

A special thank you to my supervisor, Prof. Luca Fanucci, for providing guidance and feedback throughout this research.

Many thanks to Prof. Marco Luise for supporting the PhD program with passion and competence.

Finally, thank you to Prof. Fulvio Gini for his patience during the completion of this dissertation.


Summary

In the framework of assistive technologies for people with disabilities, this dissertation aims to investigate the potential benefits of smart technologies to achieve open source and low cost solutions. Recent advances in the fields of the Internet of Things, artificial intelligence and machine learning have led to the development of solutions supporting the interaction with computing devices in smart spaces, such as smart cities and homes. We propose to analyse a possible synergy among such heterogeneous technologies with the aim of addressing open challenges in assistive technology research and application. More specifically, we focus our attention on three separate scenarios, including:

• Isolated word recognition system for users with dysarthria and other speech disorders. This research uses machine learning to achieve a speaker dependent isolated word recognition tool intended for people with speech disorders, in particular for those with dysarthria, i.e., a neuromotor speech impairment associated with physical disabilities. Within the field of automatic speech recognition (ASR), today's standard approaches and popular voice assistant solutions perform poorly on atypical speech processing, so users with disabilities, especially those with reduced motor skills and dysarthria, cannot benefit from voice-driven services in many application scenarios, such as smart home automation. To address these issues, a custom software solution has been prototyped using a speaker dependent approach: it recognizes predefined spoken words from users with speech impairments who shared their speech samples in order to contribute to the training of the speech recognition model. Early results are promising; we also present a use case for voice command recognition in a simple smart home scenario.

• Computer vision aid for people with visual impairments. This application benefits from open source deep learning techniques and models in the field of artificial intelligence to achieve a wearable computer vision (CV) system intended for people who are blind or visually impaired. An initial prototype has been implemented using low cost pieces of hardware, with the purpose of classifying, in real time, objects in the user's surrounding space. To this aim, visual data are captured by a camera located on the user's glasses and managed by a single board computer for on board processing, while the developed software relies on an off-line text-to-speech functionality to provide its user with an audio description of the processed stills' content.

• Internet of Things technologies to support the human interaction with smart computing devices for users with severe motor disabilities. Achieving alternative human computer interfaces is critical to support people with reduced motor skills who cannot use traditional input interfaces to control smart computing devices. In light of this, IoT and smart sensors may serve as assistive technology that allows users to control plenty of connected devices by means of a small number of input commands. Among these devices there is much video conferencing equipment, such as Internet Protocol networked cameras. Specifically, we propose the initial development of a personalized human computer interface for a motorized pan tilt zoom camera. Several application scenarios may be imagined for this system: it may be exploited to support distance learning activities for remote students with disabilities, as discussed in our use case.


Sommario

This thesis aims to analyse the potential technological impact of so-called "smart technologies" in the assistive technology field, through the implementation of low cost prototypes that exploit open source hardware and software components. In recent years, technological innovation in the fields of the Internet of Things (IoT), artificial intelligence and machine learning has enabled the development of new "intelligent" systems and devices to facilitate human-machine interaction in smart spaces, such as smart cities and smart homes. This thesis analyses and identifies some possible synergies among these technologies, in order to address open issues within the current assistive technology landscape.

The application scenarios identified for this analysis are:

• Speech recognition system for people with dysarthria and other speech disabilities.

This research exploits machine learning technologies to build a speaker dependent system able to recognize a limited set of voice commands uttered by people with speech disabilities, in particular dysarthria, i.e., a neuromotor speech disorder often associated with physical disabilities, even severe ones. Current automatic speech recognition (ASR) technology proves ineffective for dysarthric speech: standard ASR approaches show limits in processing the voice signal, and voice assistant systems, in particular those based on cloud services, are currently ineffective in the presence of voices affected by disability. As a consequence, many people with disabilities, in particular those with dysarthria and reduced motor skills, cannot currently benefit from voice recognition services to facilitate the interaction with computing devices in several application contexts, such as the smart home. To provide a technological answer to this need, this thesis implements a personalized ASR system based on a speaker dependent approach, with the goal of recognizing voice commands uttered by the same people who contributed to training the system with their own speech samples. Early experimental tests have shown positive results. Moreover, the developed speech recognition system has been integrated into a simple smart home scenario in order to show its potential in a real application case.

• Computer vision system for people with visual disabilities.

This research exploits deep learning technologies and open source computational models to build a wearable computer vision system for blind people. It has been prototyped using low cost hardware components, with the goal of classifying, in real time, the objects in the space surrounding the person. The prototype acquires frames from a camera placed on the user's glasses and processes their content by exploiting an embedded system. At the end of this recognition process, the computer vision system provides the user with an audio description of the detected objects through an integrated text to speech service.

• IoT systems as aids and human-computer interfaces for people with severe motor disabilities.

This research studies the potential of the Internet of Things and so-called "smart sensors" for building alternative human-computer interfaces, in order to support users with motor disabilities who cannot access computing devices using standard input tools. In this context, we analyse how smart sensors and IoT systems can be exploited as assistive technology tools to control various electronic devices that accept a limited number of input commands. An example of such devices is video conferencing systems, such as motorized cameras accessible over the Internet. A first prototype of an alternative interface for the remote control of a camera with pan/tilt functionality has therefore been developed and embedded in a case study aimed at supporting distance learning activities for students with disabilities.


List of publications

International Journals

1. Bruneo, D., Distefano, S., Giacobbe, M., Minnolo, A. L., Longo, F., Merlino, G., Mulfari, D., ... & Puliafito, C. (2019). An IoT service ecosystem for smart cities: The #SmartME project. Internet of Things, 5, 12-33.

International Conferences with Peer Review

1. Mulfari, D., Meoni, G., Marini, M., & Fanucci, L. (2019). A Machine Learning Assistive Solution for Users with Dysarthria. Technology and Disability, 31, S137-S146.

2. Mulfari, D., Meoni, G., & Fanucci, L. (2018, November). Machine Learning in Assistive Technology: a Solution for People with Dysarthria. Proceedings of the 4th EAI International Conference on Smart Objects and Technologies for Social Good (pp. 308-309). ACM.

3. Mulfari, D., & Mulfari, S. (2018, November). Assistive Technologies to Support Distance Learning for Students with Disabilities. Proceedings of the 4th EAI International Conference on Smart Objects and Technologies for Social Good (pp. 312-313). ACM.

4. Mulfari, D., Meoni, G., Marini, M., & Fanucci, L. (2018, July). Towards a Deep Learning Based ASR System for Users with Dysarthria. International Conference on Computers Helping People with Special Needs (pp. 554-557). Springer, Cham.

5. Mulfari, D. (2018, April). A TensorFlow-based Assistive Technology System for Users with Visual Impairments. Proceedings of the Internet of Accessible Things (p. 11). ACM.

6. Mulfari, D., Palla, A., & Fanucci, L. (2017). Using TensorFlow to Design Assistive Technologies for People with Visual Impairments. 11th International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing (pp. 110-116). IADIS.


7. Mulfari, D., Palla, A., & Fanucci, L. (2017). Embedded Systems and TensorFlow Frameworks as Assistive Technology Solutions. Studies in Health Technology and Informatics, 242, 396-400.

Others

1. Mulfari, D., Meoni, G., Marini, M., & Fanucci, L. (2019). A Machine Learning Automatic Speech Recognition Platform for Users with Dysarthria. University Booth, DATE 2019 Conference.

2. Mulfari, D. & Fanucci, L. (2019, April). Machine Learning come Tecnologia Assistiva per Persone con Disartria. Conferenza Italiana sulla Comunicazione Aumentativa e Alternativa, 2019

3. Mulfari D., (2017). TerzOcchio project presentation. Shaping the Future Of Pediatrics Conference.


Contents

1 Introduction 3

1.1 Disability and technology . . . 3

1.2 Thesis motivation and aim . . . 5

1.3 Research question . . . 7

1.4 Thesis organisation . . . 7

2 Advances in Assistive Technologies 8

2.1 Assistive Technology . . . 8

2.2 Recent trends in AT research . . . 9

2.3 Thesis contribution . . . 11

3 Machine learning application for users with speech disorders in smart home scenarios 13

3.1 Speech disorders and dysarthria . . . 14

3.2 Issues in speech recognition . . . 15

3.3 Related work . . . 16

3.4 Proposed approach . . . 19

3.4.1 Convolutional neural network model for isolated word recognition . . . 20

3.5 Software platform for collecting speech samples . . . 24

3.5.1 CapisciAMe desktop application . . . 25

3.5.2 CapisciAMe mobile application . . . 28

3.5.3 Discussion about the CapisciAMe app usage . . . 39

3.6 Experimental results . . . 42

3.6.1 Model training . . . 42

3.6.2 Participants . . . 43

3.6.3 Recordings . . . 44

3.6.4 Objectives . . . 45

3.6.5 Results and discussion . . . 46

3.7 Assistive application scenarios . . . 57

3.7.1 Smart home interaction . . . 57

3.8 Research limitations . . . 60


4 Machine learning application for users with visual impairments in smart city scenarios 62

4.1 Motivation . . . 62

4.2 Related work . . . 64

4.3 Proposed approach . . . 66

4.3.1 System description . . . 66

4.3.2 Application in smart city scenarios . . . 67

4.4 Experimental results . . . 71

4.5 Research limitations . . . 72

5 IoT technologies supporting the interaction with smart devices for people with severe motor disabilities 74

5.1 IoT in assistive technology . . . 75

5.2 Proposed application . . . 77

5.3 Related work . . . 78

5.4 System overview . . . 79

5.4.1 Human computer interface description . . . 79

6 Conclusion 83

6.1 Concluding remarks . . . 83

6.2 Future research . . . 85

A Smart Technology Overview 87

A.1 Internet Of Things . . . 88

A.2 Artificial Intelligence . . . 88

A.2.1 Machine Learning . . . 89

A.2.2 Deep Learning and Convolutional Neural Networks . . . 92

A.2.3 TensorFlow framework . . . 93

A.3 Smart spaces . . . 94

A.3.1 Smart city . . . 94

A.3.2 Smart home . . . 96

Bibliography 99


List of Figures

1.1 Percentage of people with disabilities in the world [108]. . . 4

3.1 CNN architecture for isolated word recognition. . . 21

3.2 Example of a speech signal in time domain: the volume keyword. . . . 22

3.3 Example of a spectrogram: the volume keyword. . . 22

3.4 The cnn-trad-fpool3 model viewed as a TensorFlow graph. . . . 23

3.5 A desktop microphone connected to the smart phone running the CapisciAMe app. . . 24

3.6 Login screen for the CapisciAMe desktop application. . . 25

3.7 Registration form. . . 25

3.8 CapisciAMe application: settings for configuring training session. . . . 27

3.9 A computer's screenshot: the user (on the right) uses a desktop microphone and speaks aloud the single word suggested by the software (on the left). . . 27

3.10 CapisciAMe desktop application: your recordings manager. . . 27

3.11 CapisciAMe app installations per month. . . 28

3.12 Login screen for the CapisciAMe app. . . 29

3.13 Nickname registration screen for the CapisciAMe app. . . 29

3.14 App home screen. . . 30

3.15 App home menu. . . 30

3.16 App settings screen. . . 31

3.17 Select how many times the user is required to pronounce each keyword in the training session. . . 31

3.18 Example of keywords selection for training. . . 32

3.19 Example of the app screenshot during a training session. . . 32

3.20 Example of My recordings screen. . . 33

3.21 Options to manage speech samples. . . 33

3.22 Uploading audio data to contribute speech model training. . . 34

3.23 Interface for standard recognition mode. . . 35

3.24 In standard recognition mode, the app listens for the user's spoken command. . . 35

3.25 Example of a keyword recognized in standard recognition mode. . . 36

3.26 Example of a keyword not recognized in standard recognition mode. . . 36

3.27 In supervised recognition mode, the app prompts you to utter the displayed keyword. . . 37

3.28 Example of a keyword not recognized in supervised recognition mode. . . 37

3.29 Example of a keyword recognized in supervised recognition mode. . . 38

3.30 Block diagram of an alternative input control system for CapisciAMe: the control box acts as an interface between external sensors and our app. . . 39

3.31 Examples of assistive technology input devices. . . 40

3.32 A user records his own speech samples using the internal hands-free smartphone's microphone. . . 40

3.33 A user with a wearable microphone to use the app. . . 41

3.34 A user records his own speech samples using a desktop microphone paired with the smartphone. . . 41

3.35 M01 user's recognition status: correct predictions (%) in keywords recognition considering three settings of the Global speech model. . . 48

3.36 M02 user's recognition status: correct predictions in keywords recognition considering three settings of the global speech model. . . 50

3.37 M03 user's recognition status: correct predictions (%) in keywords spotting considering three settings of the global speech model. . . 52

3.38 Global mode 20 speech model configuration: accuracy in keywords recognition (%) considering the selected users. . . 53

3.39 Global mode 20 speech model configuration: mean accuracy for each keyword (%). Error bars show the minimum and the maximum accuracy level per class. . . 53

3.40 Global mode 30 speech model configuration: accuracy in keywords recognition (%) considering the selected users. . . 54

3.41 Global mode 30 speech model configuration: mean accuracy for each keyword (%). Error bars show the minimum and the maximum accuracy level per class. . . 54

3.42 Global mode 40 speech model configuration: accuracy in keywords recognition (%) considering the selected users. . . 55

3.43 Global mode 40 speech model configuration: mean accuracy for each keyword (%). Error bars show the minimum and the maximum accuracy level per class. . . 55

3.44 Table summarizing our results. The accuracy level in keywords recognition is expressed in percentage, considering speech model configurations and settings. . . 56

3.45 Accuracy level (%) in keywords recognition per user, by considering three global speech model settings. . . 56

3.46 User speaks on a smart phone app and interacts vocally with smart devices, thanks to the OpenHAB framework deployed on a single board computer. . . 58

3.47 Main hardware components for the real smart home scenario. . . 58

3.48 User interacts with a smart plug via personalized speech commands processed by our app. . . 59

3.49 Get the CapisciAMe App on Google Play Store. . . 61


4.1 AiPoly vision app recognizes an object. . . 63

4.2 The vision sensor is mounted on the user’s glasses. . . 64

4.3 Hardware components of the computer vision system. . . 66

4.4 Overview of the image classification task with TensorFlow. . . 67

4.5 Artwork recognition task. . . 68

4.6 By using the embedded computer vision system, a person with visual impairments gets an audio description of a painting according to her preferences and expertise. . . 69

4.7 In a smart city scenario (Messina), the system helps its user to get an audio description of a monument according to her preferences and expertise. . . 70

4.8 The system classifies correctly 47 objects by considering 50 pictures. . . 71

4.9 The objects classification process related to the confidence score. . . 72

5.1 Reference scenario for the UGA project [101]. . . 76

5.2 Low cost hardware components for the UGA project [100]. . . 76

5.3 Reference scenario. The control box as an interface between sensors and PTZ IP camera. . . 78

5.4 A mobile device acts as control box and it interacts with the IP camera over the Internet. . . 80

5.5 Graphical user interface for the “four areas” app. . . 81

5.6 The inertial motion unit is the small device attached to the user's glasses in order to achieve a head-driven PTZ camera. . . 81

5.7 An inertial motion sensor is located on the user's hand. The sensor's data are processed by an embedded system board that controls the IP camera. . . 82

A.1 Example of a convolutional neural network architecture. . . 93

A.2 The 4-layer model for smart cities. . . 95

A.3 OpenHAB general architecture [63]. . . 97


List of Tables

3.1 Structure of the cnn-trad-fpool3 model . . . 21

3.2 Comparison between CapisciAMe versions in terms of advantages and disadvantages . . . 26

3.3 Hardware and software configuration of the training environment. . . 42

3.4 Parameters and values for the speech model training procedure . . . 43

3.5 Characteristics of participants . . . 43

3.6 Age ranges of participants . . . 44

3.7 Dysarthria severity level of the participants . . . 44

3.8 Ten participants divided into four user groups . . . 44

3.9 Properties of audio files . . . 44

3.10 Confusion matrix: M01 user’s recognition status by considering the model trained on a maximum of 20 examples per keyword from every contributor (Global:mode20). . . 46

3.11 Confusion matrix: M01 user’s recognition status by considering the model trained on a maximum of 30 examples per keyword from every contributor (Global:mode30). . . 47

3.12 Confusion matrix: M01 user’s recognition status by considering the model trained on a maximum of 40 examples per keyword from every contributor (Global:mode40). . . 47

3.13 M01 user’s recognition status: accuracy level (%) for each keyword and its mean value, by considering three settings of the speech model. . . . 47

3.14 Confusion matrix: M02 user’s recognition status by considering the model trained on a maximum of 20 examples per keyword from every contributor. (Global:mode20). . . 48

3.15 Confusion matrix: M02 user’s recognition status by considering the model trained on a maximum of 30 examples per keyword from every contributor (Global:mode30). . . 49

3.16 Confusion matrix: M02 user's recognition status by considering the model trained on a maximum of 40 examples per keyword from every contributor (Global:mode40). . . 49


3.17 M02 user's recognition status: accuracy level (%) for each keyword and its mean value, by considering three settings of the Global speech model. . . 49

3.18 Confusion matrix: M03 user's recognition status by considering the model trained on a maximum of 20 examples per keyword from every contributor (Global:mode20). . . 50

3.19 Confusion matrix: M03 user's recognition status by considering the model trained on a maximum of 30 examples per keyword from every contributor (Global:mode30). . . 51

3.20 Confusion matrix: M03 user's recognition status by considering the model trained on a maximum of 40 examples per keyword from every contributor (Global:mode40). . . 51

3.21 M03 user's recognition status: accuracy level (%) for each keyword and its mean value, by considering three settings of the Global speech model. . . 51


Foreword

I am Davide Mulfari and I have spastic dystonic quadriplegia resulting from infantile cerebral palsy. Assistive technologies play a critical role in my everyday life. When I was 3 years old, an engineer from the Ausilioteca AIAS in Bologna suggested the most suitable aid for me, namely a keyguard. This simple aid, placed on top of a standard computer keyboard (in conjunction with the ability to emulate mouse actions by means of the numeric keypad), allowed (and still allows) me to interact with computers, to study, to develop software and to work as a computer engineer.

Severe dysarthria is associated with my motor disabilities. It prevents me from using today's automatic speech recognition (ASR) systems and voice-controlled devices. Over the years, I have tried multiple computer voice recognition software solutions, e.g., various Dragon releases, many times, but all of my attempts regularly failed at the training process. This situation was very frustrating for me. Recent advancements in automatic speech recognition do not help me break down these technological barriers. Virtual assistant services, such as Google Assistant, Amazon Alexa and Apple Siri, do not work at all on my speech, so I cannot rely on this technology to promote my autonomy and safety in my home. Smart speakers are currently not accessible to me: they force me to repeat speech commands many times without any response. In short, the speech recognition systems available on the consumer market function poorly for me.

In the field of assistive technologies, automatic speech recognition in the presence of dysarthria is an open challenge. Many researchers, primarily at the University of Sheffield, are investigating very promising approaches towards ASR for users with dysarthria, considering English as the main language. However, such research is currently far from concrete technological aids that I can obtain.

During my PhD program in Information Engineering at the University of Pisa, I have investigated the usage of machine learning as assistive technology. In particular, during the first year, I studied convolutional neural network models to build computer vision systems for visually impaired people. In subsequent years, my tutor, Prof. Fanucci, suggested that I apply similar machine learning based approaches to automatic speech recognition for Italian native speakers with dysarthria, like me. In this thesis, I present my initial results.

According to a speaker dependent method, considering isolated word recognition tasks within a limited vocabulary, I have observed a good accuracy level in keyword recognition (greater than 80%) on my personal speech. Consequently, I plan to extend this methodology to other people with speech disorders who wish to share their speech samples and thereby enrich my speech model. This task may be difficult to accomplish, but I am ready to work with strong determination in order to pursue my goals.

Davide Mulfari


CHAPTER 1

Introduction

1.1 Disability and technology

Over the last decades, the progressive advancement of Information and Communication Technologies (ICT) has allowed us to benefit rapidly from new classes of innovative devices and services at decreasing costs. Within such scenarios, people with disabilities are often marginalised by the newest information technologies that are increasingly woven into our everyday life. In particular, in the field of assistive technologies, only a few applications designed for users with disabilities take advantage of technological improvements in ICT within a certain period of time. This is due to multiple reasons.

Disability represents a complex concept: it is a collective term for impairments, activity limitations and participation restrictions, referring to the negative aspects of the interaction between an individual (with a health condition) and that individual's contextual factors (environmental and personal factors) [86]. Recent reports suggest that more than one billion people (about 15.3% of the total population) in the world live with some form of disability, and about 2.9% of them experience severe impairments, as described in figure 1.1 [108]. In the years ahead, disability will be an even greater concern because its prevalence is on the rise, owing to ageing populations, the higher risk of disability in older people, and the global increase in chronic health conditions such as diabetes, cardiovascular disease, cancer and mental health disorders [108]. Nowadays, the framework of disabilities includes a heterogeneous set of conditions and users' personal needs. Multiple types of disability exist and, in terms of demands, each person with a disability is unique, with particular requirements that current technology tries to fulfil. Consequently, many ICT based assistive solutions that may really support users with disabilities are required to be flexible enough to adapt to various disability conditions; at the same time, such aids require a heavy personalisation process according to the specific end user's needs and preferences. Indeed, assistive devices have to be selected through a multi-disciplinary analysis based on the specific disability, the function to be restored or augmented, the user's wishes, the environment in which the device will be used and its cost [46].

Figure 1.1: Percentage of people with disabilities in the world [108].

Considering this, since the overall number of people with disabilities is constantly increasing, the global assistive technology market is soaring, because users need aids in order to enhance or maintain their functional capabilities. On the other hand, the extreme necessity for high aid personalisation makes the market very fragmented, with an extremely reduced number of consumers for each product. As a consequence, the assistive products generally available on the market feature outdated information technologies at high prices.

Similar issues pertain to the field of assistive technology research. In fact, this is a somewhat marginalised research area; nevertheless, many researchers study the potential benefits of new emerging ICT trends (for example, in the framework of computer science and engineering) towards the design of new assistive technology solutions for people with a disability. Within this research area, in the present dissertation we investigate smart technologies, including machine learning and the Internet of Things, as enablers of customizable AT solutions. We explore how these smart systems can be made flexible enough to adapt to their users with disabilities through a personalisation process. In this sense, a modern machine learning approach, characterized by the ability to adjust intelligent systems from real world examples and experience, may be exploited to predict and classify states from users with disabilities. Several applications may benefit from this feature: deep neural networks can be trained to detect keywords within speech from a speaker with dysarthria, as discussed in chapter 3, and to help a visually impaired person get audio information about his/her surrounding space, as presented in chapter 4. In the depicted scenarios, current Internet of Things technology plays a critical role thanks to the availability of low cost, performant smart sensors and embedded systems. Therefore this may serve as an encouragement towards the design of innovative, open source assistive systems for people with disabilities.

1.2 Thesis motivation and aim

Smart Technologies is a collective term for a heterogeneous set of ICT trends and paradigms that have enabled the definition of intelligent environments (such as smart cities) over the last few years. Such spaces integrate plenty of the newest information and communication technologies in order to improve the quality of life of citizens and the services available to them.

Recent advancements in smart technologies, especially in the field of the Internet of Things, have led to the development of many applications in different scenarios involving distributed computing capabilities and smart sensors. In the context of assistive technologies, the potential impact of the newest ICT trends may raise new digital barriers for people with disabilities or, on the contrary, the growth of smart technology may represent an encouragement towards the development of affordable, inclusive assistive solutions. Therefore, designing assistive systems in conjunction with the newest ICT trends may be of critical importance to support those with a disability living in smart spaces. For these reasons, in this work we investigate the synergy among smart technologies with the aim of empowering assistive technology in inclusive and intelligent application scenarios. Within this research area, the present dissertation aims to explore the potential benefits of smart technologies towards the development of personalized assistive technology solutions for people with disabilities. Three separate case studies have been analysed within the framework of artificial intelligence, machine learning and the Internet of Things.

The presented applications exploit open source hardware and software and low cost components, with the aim of addressing open challenges for current assistive technology, including:

i. Isolated word recognition system for people with speech disorders, particularly in the presence of dysarthria, with the aim of supporting natural speech interaction with computers and appliances in a personalized smart home context.

ii. Computer vision system for users who are blind or visually impaired, aimed at supporting the awareness of the user's surrounding environment, considering a smart city scenario.

iii. Human computer interfaces based on smart sensors and IoT technologies, especially designed for people with severe motor disabilities.

With regard to the first topic, we investigate the impact of machine learning approaches towards the design of isolated word recognition systems intended for users with speech impairments, especially for those with dysarthria, i.e., a neuromotor speech disorder occurring with severe motor disabilities and leading to a very low intelligibility of the user's speech. In recent years, despite the relevant improvement of technology in the field of automatic speech recognition, voice assistant platforms available on smart computing systems, e.g. virtual assistant services, perform poorly on dysarthric speech processing tasks, as highlighted in my own personal experience. This is mainly due to the extreme variability of the articulatory output, particularly in the presence of dysarthria. In assistive technology scenarios, atypical speech recognition is an important challenge in order to enable an alternative access method to many smart computing services (e.g., in a smart home automation context), since many users, especially those with dysarthria and reduced motor skills, are currently marginalised by a new wave of speech technology which is not robust to their voices. In order to address these issues, we investigate the use of machine learning in conjunction with convolutional neural networks to implement a speaker dependent solution. More specifically, we focus on isolated word recognition tasks with the aim of recognizing predefined voice commands within dysarthric speech. The proposed analysis relies on spectrograms, i.e., visual representations of a sound, fed to a convolutional neural network speech model. Such a method is not tied to a specific language and is applicable to many different speakers: collecting audio contributions from people with dysarthria is of critical importance in order to enrich our speech dataset for model training. For these reasons, within our PhD research, specialized software applications have been created with the purpose of enabling users with speech impairments to share their voice samples while they utter selected keywords. In this way, those with dysarthria may really contribute to empowering the speech model. Additionally, early results highlight the soundness of the proposed approach, considering the accuracy level in single word recognition. Several application scenarios may benefit from the future development of our solution. An interesting use case is an accessible, speech enabled smart home control, obtained by integrating our recognizer with an open source smart home software framework. With this solution, users with speech disabilities can use personalized voice commands to perform basic actions (such as controlling plugs or the TV) in an inclusive smart home.
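As a concrete illustration of this pipeline, the following minimal sketch (in Python with TensorFlow; the vocabulary size, frame parameters and layer shapes are illustrative assumptions, not the exact configuration used in this thesis) turns one second of 16 kHz audio into a spectrogram and feeds it to a small convolutional classifier with frequency-only pooling, in the spirit of the cnn-trad-fpool3 topology:

    import tensorflow as tf

    NUM_KEYWORDS = 10  # illustrative vocabulary size

    def to_spectrogram(waveform):
        # Short-time Fourier transform: 1 s of 16 kHz audio becomes a
        # time-frequency image that a CNN can process like a picture.
        stft = tf.signal.stft(waveform, frame_length=400, frame_step=160)
        spectrogram = tf.abs(stft)
        return spectrogram[..., tf.newaxis]  # add a channel axis

    def build_keyword_model(input_shape):
        # Two convolutional layers, with max pooling applied along the
        # frequency axis only, then dense layers and a softmax over keywords.
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=input_shape),
            tf.keras.layers.Conv2D(64, (20, 8), activation="relu"),
            tf.keras.layers.MaxPooling2D(pool_size=(1, 3)),
            tf.keras.layers.Conv2D(64, (10, 4), activation="relu"),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(NUM_KEYWORDS, activation="softmax"),
        ])

    waveform = tf.zeros([16000])  # placeholder: one second of recorded audio
    spec = to_spectrogram(waveform)
    model = build_keyword_model(spec.shape)
    scores = model(spec[tf.newaxis, ...])  # per-keyword probabilities

Training such a model then reduces to fitting it on labelled spectrograms computed from the collected speech samples.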

In the framework of artificial intelligence applied to the assistive technology field, we also present a different application: an initial prototype of a computer vision system intended to recognize objects in the user's surroundings and to provide its user with an audio description of the detected things. This aid is mainly intended for people with visual impairments and its core is currently based on TensorFlow, an open source machine learning framework released by Google. We have deployed our prototype on a single board computer with a wearable camera, and we have employed open source convolutional neural network models with promising results in the field of pattern classification. Moreover, we have also investigated the achievement of personalized image classifiers based on convolutional neural networks processed by TensorFlow running on our edge computing platform. In this way, we provide the computer vision system with the functionality of recognizing specific assets within a precise space, for example artworks in a museum. We have applied our system in smart city scenarios towards inclusive and accessible tourism initiatives.
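To give an idea of the on-board processing step, the sketch below (Python, assuming OpenCV for camera capture, a TensorFlow Lite image classifier with float input, and the pyttsx3 off-line text-to-speech engine; the model and label file names are hypothetical) captures one still, classifies it and speaks the top prediction aloud:

    import cv2                # camera capture
    import numpy as np
    import pyttsx3            # off-line text-to-speech engine
    import tensorflow as tf

    # Hypothetical file names for the classifier and its label list.
    interpreter = tf.lite.Interpreter(model_path="classifier.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    labels = [line.strip() for line in open("labels.txt")]

    tts = pyttsx3.init()
    camera = cv2.VideoCapture(0)  # the camera worn on the user's glasses

    ret, frame = camera.read()
    if ret:
        # Resize the still to the network input size and classify it on board.
        height, width = int(inp["shape"][1]), int(inp["shape"][2])
        img = cv2.resize(frame, (width, height)).astype(np.float32) / 255.0
        interpreter.set_tensor(inp["index"], img[np.newaxis, ...])
        interpreter.invoke()
        scores = interpreter.get_tensor(out["index"])[0]
        best = int(np.argmax(scores))
        # Speak the prediction aloud only when the confidence is high enough.
        if scores[best] > 0.5:
            tts.say(f"I see a {labels[best]}")
            tts.runAndWait()

On the real device such a step would run in a loop; the confidence threshold keeps the audio channel free of unreliable guesses.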

Finally, in the last part of this thesis, we investigate the use of Internet of Things (IoT) technology to support the human interaction with smart computing devices for users with severe motor disabilities. Indeed, in many cases, achieving alternative human computer interfaces is critical to support people with reduced motor skills who cannot use traditional input devices to control smart computing systems. In light of this, IoT and smart sensors may serve as assistive technology that allows users to control plenty of connected devices by means of a small number of input commands. Among these devices there is much video conferencing equipment, such as Internet Protocol networked cameras. In this context, we propose the initial development of a personalized human computer interface for a motorized pan tilt zoom camera. Several application scenarios may be imagined for this system: it can support distance learning activities for remote students with disabilities, as shown in the depicted scenario.

1.3 Research question

The present work investigates a possible synergy among smart technologies to address specific open challenges within the field of assistive technology for people with disabilities. We consider three different application scenarios: i) isolated word recognition in the presence of speech disorders; ii) interaction with the surrounding environment for users with visual impairments; iii) interaction with smart devices based on IoT and intelligent sensors for those with severe motor disabilities. In each scenario, we evaluate the benefits of a synergy between recent information technology trends towards the development of personalized prototypes.

Within each area of interest, a shared research question can be summarized as follows: how can the recent advancement of smart technology lead to improvements for assistive purposes?

1.4 Thesis organisation

This thesis is organised as follows.

Chapter 2 presents some recent trends in assistive technology research and summarizes the thesis contributions.

Chapter 3 presents our research, in the field of artificial intelligence and machine learning, aimed at developing a speaker dependent isolated word recognition system intended for users with a speech disorder, particularly dysarthria.

Chapter 4 investigates a deep learning based approach towards the development of a wearable computer vision aid for people with visual impairments that exploits open source technologies and affordable hardware components.

Chapter 5 describes the design of alternative IoT based human computer interfaces mainly intended for people with severe motor impairments, and presents an application aimed at supporting distance learning for remote students with disabilities.

Finally, chapter 6 concludes the thesis and gives suggestions for possible future improvements.


CHAPTER 2

Advances in Assistive Technologies

In the framework of assistive technologies for people with disabilities, many application scenarios benefit from a synergy between information technology trends, rather than taking advantage of a specific advancement in a single area of concern. This chapter explores some recent trends in current assistive technology research and summarizes our thesis contribution.

2.1 Assistive Technology

Assistive Technology (AT) is an umbrella term for a heterogeneous set of technologies and strategies used by individuals with disabilities in order to perform functions that might otherwise be difficult or impossible. AT solutions come in many forms: they can include mobility devices such as walkers and wheelchairs, as well as hardware, software, and peripherals that assist people with disabilities in accessing computers or other information technologies.

Therefore, several application scenarios benefit from assistive technologies. For instance, people with limited hand function may use a keyboard with a keyguard, large keys or a special mouse to operate a computer; people who are blind may use software that reads text on the screen in a computer generated voice; people with low vision may use software that enlarges screen content; people who are deaf may use a text telephone; and people with speech impairments may use a device that speaks out loud as they enter text via a keyboard.

In light of this, according to the ISO 9999:2016 document, an assistive product is any tool, especially produced or generally available, used by or for people with disabilities:

• for participation;


• to protect, support, train, measure or substitute for body functions, structure or activities, or

• to prevent impairments, activity limitations or participation restrictions.

Due to the advancement of computer technologies, several ATs have become more sophisticated in recent years [37]. Nowadays, the combination of ICT and assistive technology offers new opportunities for everyone, but these opportunities are especially significant for users with a disability, who use assistive technology for their daily activities to a greater extent than people in general. At present, assistive technology means that disabled end users are able to participate in all aspects of social life on more equal terms than ever before. It is vital that people are able to benefit on an equal basis from the rapid development of ICT, to enable them to partake in an inclusive and barrier free information society.

Within this framework, research in the assistive technology domain plays a critical role with the aim of bringing the power of technology to those who need it the most, as discussed in the rest of this chapter.

2.2 Recent trends in AT research

This section aims to analyse some of the recent and evolving areas of concern in current assistive technology research. Many researchers study the latest advancements in information and communication technologies in order to empower and realise novel assistive systems for persons with a disability. Specifically, considerable progress is mainly linked to a valuable synergy between actual IT trends, rather than benefiting from advances in a limited area of interest. For instance, Internet of Things (IoT) and Artificial Intelligence (AI) technologies have worked together in many business and other areas for quite some time. In fact, IoT collects huge amounts of data, and AI, in particular machine learning, represents an efficient tool to make sense of such data. AI is the engine that performs analysis, processes the data, and makes decisions based on it; AI enables understanding patterns and therefore helps to make more informed decisions [77]. The use of machine learning, along with big data, has opened new opportunities in IoT, empowering multiple application scenarios such as e-health, transportation, robotics, industry, and automation. A significant strength of combined AI-IoT solutions relies on the availability of open source machine learning frameworks, such as TensorFlow, which suit well the hardware requirements of a wide range of low cost mobile devices and low power embedded systems. This may be of critical importance to introduce open source, low price solutions empowering assistive systems for people with disabilities.

Considering the domain of robotics, this synergism between AI and IoT has led to the creation of robots able to interface with humans and their surrounding environment. Indeed, robots are IoT powered devices in themselves, since they contain multiple sensors and actuators along with AI functions that help them continuously learn and adapt over time [49]. Assistive robotics aims to promote the well-being and independence of persons with disabilities. Robots may assist users in a wide range of tasks at home (especially in terms of activities of daily living), so ongoing research includes household robots and rehabilitation robots, among others. Here, interdisciplinarity is required to achieve the final goal, integrating multiple research areas and a heterogeneous set of technologies. Indeed, the robots have to adapt their behaviours to the new routines and needs of the users, which is currently an important open task [143]. To meet this demand, artificial intelligence and machine learning algorithms must be developed and deployed in these systems, since the robots cannot be programmed in advance to react to every possible circumstance that might occur during interactions [91]. In this field, socially assistive robots play a significant role because their main task is the interaction with human individuals. Over the last few years, these techniques have become popular for the treatment and diagnosis of autism: research in this field has reported an increase in user therapy acceptance and improvements in social skills [112]. Preliminary results with preschool children with autism speech disorders highlight that interacting with a humanoid robot may facilitate engagement and goal achievement in educational activities [41]. Yang et al. [153] exploit face and emotion recognition to make a Pepper robot adapt a story to the mood of the children. In [69], the researchers work on a technique for face recognition using a humanoid NAO robot to track the faces of children with autism and measure their concentration during social interaction. Following this line, in [107], the authors propose several activities through the interaction with a Pepper robot, receiving feedback by measuring the users' smiles. Within the depicted scenarios, socially assistive robots must be developed with a certain level of autonomy in order to carry out the treatments. This autonomy is directly correlated with the specific manipulator's level of intelligence in adapting to the environment and to the end user's responses. This is where machine learning comes in, providing solutions to the problems these systems must address, such as eye tracking, gesture recognition or automatic speech recognition [91]. A different application field for assistive robotics concerns intelligent arms to support people with reduced motor skills, especially wheelchair users. RIMEDIO [109] is an autonomous robotic arm that can perform simple tasks, e.g., knocking on a door or pressing an elevator button. Giuffrida et al. present a low-cost manipulator realizing only simple tasks and controllable by three different graphical human machine interfaces [50]. The latter are empowered by a You Only Look Once (YOLO) v2 convolutional neural network that analyses the video stream generated by a camera placed on the robotic arm end-effector and recognizes the objects with which the user can interact. The PARLOMA project, developed at the University of Pisa, intends to create a robotic system to allow remote communication between two deafblind users, between a deafblind person and a deaf person, and between a deafblind person and a hearing person with knowledge of sign language. At the time of writing, this innovative system is in its beta phase; it is mainly intended to transfer tactile sign language messages remotely and in real time, so that a deafblind recipient is able to understand the message conveyed by the signer, i.e., the speaker. The signer communicates using a regular keyboard or signing in front of a low-cost depth camera. Input information is digitally processed, encrypted and transmitted reliably over the web. Everywhere in the world, data are received, decrypted and provided to the final deafblind user via a haptic interface based on anthropomorphic robotic hands and arms (or to many users, if different hands are connected to the web).

Different AI driven applications for assistive technology purposes can be found in the recent literature [87] [117] [9]. More specifically, deep learning strategies have been employed in different assistive technology applications for users with visual disabilities [85]. Convolutional neural networks (CNNs) are exploited to improve performance in autonomous navigation [45]. Nair et al. [106] present an indoor positioning and navigation system that guides a user from point A to point B indoors with high accuracy. In [89], the authors focus on the recognition of road barriers and on providing CNN based assistance within specific scenarios where visually impaired people travel in daily environments, such as residential or working areas, proposing a lightweight convolutional neural network named KrNet. Following this line, in the present dissertation, as discussed in chapter 4, we work on deep learning strategies and CNN models in order to achieve an open source, low cost, wearable computer vision system for people who are blind or visually impaired. More specifically, we consider the image classification task and, thanks to transfer learning procedures, an existing CNN model has been retrained to identify assets within a custom dataset. This aid can help users with visual disabilities to recognize objects in a specific environment, such as a museum.
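As an indication of how such a retraining procedure can look in practice, the following sketch (Python with the TensorFlow Keras API; the directory name, backbone choice and training parameters are illustrative assumptions) freezes an ImageNet-pretrained feature extractor and trains only a new classification head on a custom photo collection, e.g., one folder of photos per artwork:

    import tensorflow as tf

    # Hypothetical dataset layout: museum_photos/<artwork_name>/*.jpg
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "museum_photos", image_size=(224, 224), batch_size=16)
    num_classes = len(train_ds.class_names)

    # Reuse a pretrained backbone and keep its weights frozen.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet range
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, epochs=5)

Because only the final dense layer is trained, a personalized classifier of this kind can be produced from a relatively small custom dataset in a short time.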

In the framework of AT, many different machine learning applications can be found, for example brain computer interfaces [137], rehabilitative devices [95] and others [136]. More recent developments have focused on using multisensor fusion approaches and deep learning to develop assistive technology. This is essentially motivated by the relatively improved scalability of these learning architectures as opposed to the traditional feature extraction and classification pipeline, as discussed in [138]. Nowadays, recent advancements in these AI related technologies show several issues of fairness for people with disabilities. For instance, there are cases in which mainstream AI technologies (e.g. automatic speech recognition) have not performed well for users with speech disorders (such as dysarthria [103]) and for those who are deaf or hard of hearing [77]. An important aspect of supervised AI based systems is that, unlike rule-based algorithms, they depend on the availability of a wide dataset for training the machine learning model. Thus, a critical issue is whether the training data (especially if it includes data from people, e.g., faces, voices, etc.) includes representation from a diverse group of people. For instance, in personal experience, we have found that automatic speech recognition tools are not optimal to process voice commands from speakers with dysarthria: as a result, these users face communication problems that can lead to social exclusion, while they are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical voices [61]. To address these issues, many researchers investigate machine learning approaches [25] [104]. Within this research direction, in the present dissertation we propose to train a deep learning model for predicting the presence of predefined speech commands (i.e., single words) within atypical speech, and we show a real world application in a basic smart home scenario.

2.3 Thesis contribution

As motivated in the previous section, many AT solutions can benefit from a synergism of the latest information and communication technologies, e.g., the combination of IoT and AI techniques. This scenario is also empowered by open source software frameworks that can be used to analyze data produced by IoT devices. Within this research area, the present thesis aims to investigate a possible conjunction of such smart technologies to achieve open source and low cost solutions empowering assistive services for people with disabilities.

Specifically, in chapter 3 we exploit machine learning technology towards the development of an isolated word recognition system intended for users with a speech disorder, in particular for those with dysarthria, i.e., a common neuromotor speech impairment associated with severe physical disabilities. In the presence of such disabilities, automatic speech recognition is nowadays an open challenge, since standard voice recognition approaches and popular voice driven services are ineffective on atypical speech processing tasks. This is mainly due to the extreme variability of the articulatory output, as in moderate and severe dysarthria conditions. Consequently, many users with speech disorders and reduced motor skills cannot rely on their natural speech to interact with computer systems; at the same time, they cannot use voice driven interfaces to control connected devices in many application scenarios, e.g., smart home automation, as highlighted in my own personal experience. In order to address these issues, we focus our attention on isolated word recognition in the presence of dysarthria, following a deep learning based method in conjunction with an existing convolutional neural network model to build an open source aid for users with dysarthria. However, our approach requires enough data availability for the training of the model, so our major contribution regards the acquisition of speech samples from end users. To this aim, a mobile app has been released to allow those with speech disorders to collect their audio contributions easily. This early solution is based on the Google TensorFlow framework and requires no cloud services for the voice command recognition task, which executes on an edge computing device, such as a smart phone. Initial results have considered just a small number of involved users; however, as motivated in chapter 3, our analysis shows the soundness of our approach (considering the limited number of participants in our experiments) and gives us interesting perspectives for possible future research directions.

In chapter 4, we present an open source, low cost computer vision aid for people with visual impairments. This application brings together AI and IoT: it exploits open source deep learning techniques (such as transfer learning) and models in the field of artificial intelligence to achieve a wearable computer vision system intended for people who are blind or visually impaired. An initial prototype has been implemented using low cost pieces of hardware, with the purpose of classifying, in real time, objects in the user's surrounding space. To this aim, visual data are captured by a camera located on the user's glasses and managed by a single board computer for on board processing, while the developed software relies on an off-line text-to-speech functionality to provide its user with a description of the processed stills' content.

Finally, in chapter 5, we use IoT technology to prototype an alternative human computer interface: this is critical to support people with reduced motor skills who cannot use traditional input devices to control smart computer-based systems and mobile devices. In light of this, IoT and smart sensors may serve as assistive technology that allows users to control plenty of connected devices by means of a small number of input commands. Among these devices there is much video conferencing equipment, such as Internet Protocol networked cameras. We propose the initial development of a personalized human computer interface for a motorized pan tilt zoom camera. Several application scenarios may be imagined for this system: for example, it can support distance learning activities for remote students with disabilities, as motivated in our use case.
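A minimal sketch of this idea follows (Python; the camera address, endpoint and parameter names are hypothetical, since real devices expose vendor-specific CGI or ONVIF interfaces). The point is that the whole interface collapses to a handful of discrete commands, which any low-bandwidth input device, such as a single switch or a head-motion sensor, can generate:

    import requests

    # Hypothetical HTTP control endpoint of the motorized camera.
    CAMERA = "http://192.168.1.50/ptz"

    # Four discrete user inputs mapped to pan/tilt actions.
    COMMANDS = {"up": "tilt_up", "down": "tilt_down",
                "left": "pan_left", "right": "pan_right"}

    def move(direction: str) -> None:
        # Translate one of four user inputs into a camera movement request.
        requests.get(CAMERA, params={"action": COMMANDS[direction]})

    move("left")  # e.g. triggered by a head movement to the left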


CHAPTER 3

Machine learning application for users with speech disorders in smart home scenarios

In this chapter, we investigate the impact of machine learning approaches towards the design of a speaker dependent isolated word recognition system intended for users with speech impairments, especially for those with dysarthria, i.e., a neuromotor speech impairment associated with severe physical disabilities. Within such a scenario, automatic speech recognition (ASR) is nowadays an open challenge, since standard voice recognition approaches and popular voice driven services are ineffective on atypical speech processing tasks. This is mainly due to the extreme variability of the articulatory output, as in moderate and severe dysarthria conditions. Consequently, many users with speech disorders and reduced motor skills cannot rely on their natural speech to interact with computer systems; at the same time, they cannot use voice driven interfaces to control connected devices in many application scenarios, e.g., smart home automation, as highlighted in my own personal experience.

In order to address these issues, we focus our attention on isolated word recognition in the presence of dysarthria, exploiting deep learning technology in conjunction with an existing convolutional neural network model to build a tailored assistive system for users with dysarthria. However, a machine learning approach requires sufficient data for the training of the model, so one of the major activities concerns the acquisition of speech samples from end users. To this purpose, the main contribution of the present thesis is the release of a mobile application (app) allowing those with speech disorders to collect their audio contributions voluntarily, whenever they wish. Our app is currently available for free; it uses the Google TensorFlow framework and also comes with a dedicated section for the real time word recognition task. This process requires no cloud services and executes on an edge computing device, such as a smartphone.
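To make the training pipeline concrete, the following is a minimal sketch, in the spirit of the TensorFlow speech commands example, of how such a speaker dependent model can be defined. It assumes the collected samples are 1-second, 16 kHz mono WAV clips organised per word; the vocabulary, tensor shapes and function names are illustrative assumptions, not the actual code of our app.

import tensorflow as tf

# Hypothetical personalised vocabulary (illustrative only).
COMMANDS = ["luce", "porta", "musica", "aiuto"]

def wav_to_spectrogram(wav_path):
    # Decode a 1-second, 16 kHz mono WAV clip and compute the
    # log-magnitude spectrogram used as the CNN input "image".
    audio = tf.io.read_file(wav_path)
    waveform, _ = tf.audio.decode_wav(audio, desired_channels=1,
                                      desired_samples=16000)
    waveform = tf.squeeze(waveform, axis=-1)
    stft = tf.signal.stft(waveform, frame_length=255, frame_step=128)
    spectrogram = tf.math.log(tf.abs(stft) + 1e-6)
    return spectrogram[..., tf.newaxis]  # add channel axis: (124, 128, 1)

def build_model(input_shape=(124, 128, 1), num_words=len(COMMANDS)):
    # Small convolutional classifier over spectrograms.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_words),  # logits over the vocabulary
    ])

model = build_model()
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True),
              metrics=["accuracy"])
# model.fit(...) would then run on (spectrogram, label) pairs
# built from the user's own recordings.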



Recognizing even a small set of isolated speech commands from a person with dysarthria and severe physical disabilities may act as an enabler for effective environmental control within a personalized smart home context. For these reasons, we propose to integrate our software solution with an open source home automation platform, such as OpenHAB, as sketched below.
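As an illustration of this integration, the short sketch below forwards a recognised word to OpenHAB through an MQTT broker, relying on OpenHAB's MQTT binding to map topics onto home items. The broker address, topics, payloads and word mapping are placeholders that would have to match the actual OpenHAB configuration.

import paho.mqtt.publish as publish

# Hypothetical mapping from personalised spoken words to
# (MQTT topic, command) pairs configured in OpenHAB.
WORD_TO_COMMAND = {
    "luce": ("home/livingroom/light", "ON"),
    "porta": ("home/entrance/door", "OPEN"),
}

def dispatch(recognised_word, broker="192.168.1.10"):
    # Publish the command associated with the recognised word, if any.
    if recognised_word in WORD_TO_COMMAND:
        topic, payload = WORD_TO_COMMAND[recognised_word]
        publish.single(topic, payload, hostname=broker)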

Initial experimental results, obtained with a small number of users with speech impairments, show the validity of our approach and suggest interesting perspectives for possible future research directions.

3.1 Speech disorders and dysarthria

Speech disorders is a collective term for a wide range of abilities and differences related to an impairment of the articulation of speech sounds, fluency and/or voice [8]. As a result, the user's speech is atypical in volume, rate, or quality, which may affect the ability of the talker to be understood.

Within the framework of speech disorders, dysarthria is one of the most common conditions: it can result from congenital conditions or can be acquired at any age as the result of neurologic injury or disease. Congenital dysarthric speech, i.e., the speech produced by people with dysarthria, is often caused by some sort of asphyxiation of the brain, inhibiting normal development of the speech-motor areas. Cerebral palsy is among the most common causes of such speech impairments, affecting approximately 2 in 1000 live births in Europe [74], more than 90% of whom remain dysarthric throughout adulthood. Dysarthria may also result from traumatic causes, including cerebrovascular stroke, which affects approximately 1% of the population aged 45 to 64 and 5% of those aged 65 and over, with the severity of the impairment varying with the amount of cerebral damage. In addition, other sources of dysarthria include multiple sclerosis, Parkinson's disease [40], myasthenia gravis and amyotrophic lateral sclerosis (ALS). ALS is a progressive neurodegenerative disorder that affects, among other things, speech production; according to recent data, around 450,000 people are living with ALS worldwide [52], dysarthria occurs in more than 80% of ALS patients, and it may cause major disability [145].

In most cases, there is a strong relation between motor impairments and the presence of dysarthric speech [82]. With reference to the clinical literature, dysarthria is defined as a complex set of neuromotor speech impairments resulting from abnormalities in the strength, speed, range, steadiness, tone or accuracy of movements required for control of the respiratory, phonatory, resonatory, articulatory and prosodic aspects of speech production [43]. The responsible pathophysiologic disturbances are due to central or peripheral nervous system abnormalities and most often reflect weakness, spasticity, incoordination, involuntary movements or variable muscle tone [79]. Multiple types of dysarthria exist [36], as reported below:

• Flaccid: it is due to weakness in muscles supplied by cranial or spinal nerves that innervate respiratory, laryngeal, or articulatory structures.

• Spastic: it is usually associated with bilateral disorders of the upper motor neuron system. The user's speech is characterized by strained phonation, imprecise placement of the articulators, incomplete consonant closure, and reduced voice onset time distinctions between voiced and unvoiced stops [114].



• Ataxic: it is associated with disorders of the cerebellar control circuit.

• Hypokinetic and hyperkinetic: both are associated with disorders of the basal ganglia control circuit.

• Unilateral upper motor neuron: it is related to unilateral disorders of the upper motor neuron system, most commonly resulting from stroke affecting upper motor neuron pathways.

• Mixed: it reflects a combination of two or more of the single dysarthria types. This condition occurs more frequently than any single dysarthria type in many clinical settings. In fact, some diseases are associated only with a specific mix; for example, the combination of flaccid and spastic dysarthria is mainly related to ALS [80].

3.2 Issues in speech recognition

Dysarthria and the related speech disabilities lead to very low intelligibility of the user's speech; as a result, this kind of speech is difficult to understand for both humans (especially unfamiliar listeners) and machines. This is mainly due to the extreme variability of the articulatory output: the speech of a person with dysarthria may vary considerably depending on the time of day, the level of stress, the level of fatigue and the presence or absence of several environmental factors. Moreover, substantial variability exists among speakers with dysarthria because of differences in severity level and in the involvement of the various aspects of the speech production system [67].

Automatic speech recognition, i.e., the process by which a computer is able to recognize and act upon spoken language or utterances, represents an open challenge in the assistive technology field because standard approaches are ineffective for those with such disabilities, particularly for people with severe dysarthria. These users therefore face communication problems that can lead to social exclusion, while they are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical voice [61]. Recent advances in voice recognition and natural language processing have led to the integration of online automatic speech recognition into our smartphones, smart watches, smart home devices, and smart speakers. Research conducted by the Pew Research Center estimated that approximately 46% of U.S. adults use a voice assistant [73]; additionally, other reports found that approximately 43 million American adults own a smart speaker [73]. At present, people use this technology to perform a multitude of everyday tasks in their homes, such as playing the news, setting timers, home automation, and more. However, the aforementioned voice-driven services are not currently accessible to all users. ASR solutions available in voice assistant services are not optimal for atypical speech processing, so many users (approximately 9.4 million adults in the United States have trouble using their voices [73]) cannot benefit from such technologies as a means to interact with computing devices in many application scenarios, e.g., smart environment interaction. In such contexts, speech interaction with intelligent platforms (for example, smart speakers) may be of critical importance for those with speech disabilities and reduced motor skills; however, they face severe difficulties in accessing smart appliance services via speech commands, as highlighted by my own personal experience.



Recently, although advances in information and communication technologies have allowed the design of extremely efficient solutions for speech recognition, specific aid systems for users with speech disabilities do not exist on the market. According to the recent literature and research [126], two methods are used for speech recognition in the presence of disabilities: deep signal analysis of the voice and machine learning algorithms. While the former methods do not provide reliable results for people with dysarthria, the machine learning approach, using deep neural networks [144], seems more promising, even if an effective implementation for such users has yet to be demonstrated [104].

To address the aforementioned issues, in the present chapter we propose machine learning based approaches in conjunction with convolutional neural networks to build a speaker dependent isolated word recognition system for users with dysarthria. Our effort is to train a deep learning model for predicting the presence of predefined speech commands (i.e., single words) within atypical speech. The proposed approach requires sufficient data for the training of the model, so one of the major activities concerns the acquisition of speech samples from end users. To this purpose, the main contribution of the present work is the release of a mobile application (app) allowing those with speech disorders to collect their audio contributions easily. The proposed methodology does not aim to be innovative with respect to the current state of the art in automatic speech recognition technology; rather, we intend to highlight how a speaker dependent approach can, in the case of dysarthria and motor impairments, serve to recognize a small number of predefined utterances, allowing a person with a disability to interact with smart appliances via personalised speech commands.
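For the on-device recognition step, a minimal sketch of how a trained model, once converted to TensorFlow Lite, could classify a single clip on a smartphone-class device is given below; the model file name and the label list are placeholders for illustration, not the app's actual code.

import numpy as np
import tensorflow as tf

# Load the (hypothetical) converted model for edge inference.
interpreter = tf.lite.Interpreter(model_path="commands.tflite")
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

LABELS = ["luce", "porta", "musica", "aiuto"]  # placeholder vocabulary

def classify(spectrogram):
    # spectrogram: float32 array shaped like the model input, computed
    # from a 1-second clip exactly as during training.
    interpreter.set_tensor(input_detail["index"],
                           spectrogram[np.newaxis, ...].astype(np.float32))
    interpreter.invoke()
    logits = interpreter.get_tensor(output_detail["index"])[0]
    return LABELS[int(np.argmax(logits))]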

3.3 Related work

In this section, we review the literature on speech recognition in the presence of speech impairments, selecting work focused on supporting the interaction with computer systems and smart devices for those with such disabilities. Several reports suggest that there is an ever-growing need to improve human-to-machine interaction for these groups, with the purpose of promoting autonomy, wellbeing and independence [134].

In the framework of assistive technologies, many researchers have used speech technology (and, in general, ASR strategies) as a solution for accessing computers and traditional desktop environments; here, as an example, users can compose documents through dictation applications [132]. A primary reason for using a voice recognition software package exists when users with atypical speech have additional motor impairments that preclude or severely limit their use of standard computer input devices [67], e.g., keyboard, mouse or touch screen. In this sense, ASR has acted over recent years as an alternative human-computer interface to improve accessibility in mainstream operating systems and to interact with mobile devices, e.g., smartphones. In [59], Hawley presents an overview of the suitability and performance of speech recognition for computer access by people with disabilities, including people with dysarthria. The author shows that, given adequate time, training, and support, commercial speech recognition platforms for personal computers are often appropriate for people with no, mild, or moderate speech disabilities. People with dysarthria achieve



lower recognition rates, but speech recognition can still be a useful input method for some individuals; however, natural speech as a means of controlling a wide range of electronic devices is more troublesome, especially for people with severe communication needs and dysarthria. In subsequent years, to overcome these kinds of issues, researchers investigated new methods and proposed tailored ASR systems [73] for dysarthric speech, e.g., using hidden Markov models (HMMs) [58] [115], HMM based techniques [133] or articulatory dynamic Bayes networks [125].

Because of its potential to inform our understanding of the neural control of speech, and because its prevalence in frequently occurring neurological diseases is high and its functional effects are significant, dysarthria draws considerable attention from both clinicians and researchers [80] [139] [3]. In particular, there has been significant progress in Clinical Applications of Speech Technology (CAST) within a variety of settings. CloudCAST is a research project aimed at providing a route to bring automatic learning and speech recognition technology to professionals who deal with speech problems [54], such as therapists, pathologists, AT experts and teachers, creating a self-sufficient community working on that complex system. The project aims to achieve this by building an Internet-based, free resource: a set of software tools for personalized speech recognition and speech therapy [35].

Several articles highlight the issues of conceiving a VIVOCA (Voice Input Voice Output Communication Aid) for users with severe speech disorders and dysarthria [15] [130] [110]. The idea behind the VIVOCA concept is the development of a portable device (possibly wheelchair-mounted or body-worn) that translates the speech of a person with dysarthria [62] into clear speech output (synthesised or recorded). In [60] Hawley et al. present a VIVOCA solution as a form of augmentative and alternative communication (AAC) device for people with severe speech disabilities: the researchers applied statistical ASR techniques, based on HMMs, to the speech of severely dysarthric speakers to produce speaker dependent recognition models, introducing a custom methodology for building small vocabulary, speaker-dependent personal recognizers with reduced amounts of training data. With this approach, accurate recognition of severely dysarthric speech has been shown to be feasible for relatively small vocabularies [61]. Building on the VIVOCA experience, in [18] the authors present VocaTempo, a tablet-based app aimed at children and young people.

In many cases, a synergy between assistive and home automation technologies [99] can help people with motor and speech disabilities. For instance, in [34] Malavasi et al. propose a possible integration between the CloudCAST platform and traditional home automation solutions, using mobile and low cost devices. Considering a smart home scenario in the Bologna Ausilioteca Living Lab, the authors discuss how to bring together OpenHAB, i.e., an open source home automation bus, IoT technologies and custom speech recognition services to build a prototype of a voice-controlled smart home for people with dysarthria. However, these authors do not present results, due to the absence of a speech recognition model for Italian dysarthric users. In this area, Hawley et al. have developed a limited vocabulary, speaker dependent speech recognition application [61] with greater tolerance to speech variability, coupled with a computerised training package which assists dysarthric speakers in improving the consistency of their vocalisations and provides more data for recogniser training; in this work, an interface for a speech-controlled environmental control system is also described [61].
