
Dereverberation with binaural cue preservation for hearing aids





POLITECNICO DI MILANO

Corso di Laurea Magistrale in Ingegneria Informatica
Dipartimento di Elettronica, Informazione e Bioingegneria

DEREVERBERATION WITH BINAURAL CUE PRESERVATION FOR HEARING AIDS

International Audio Laboratories Erlangen

Relatore: Prof. Dr. Marco Tagliasacchi

Correlatore: Prof. Dr. ir. Emanuël A. P. Habets

Tesi di Laurea di

Matteo Torcoli, matricola 787272


Erfüll davon dein Herz so groß es ist,

Und wenn du ganz in dem Gefühle seelig bist,

Nenn es dann, wie du willst:

Nenn’s Glück! Herz! Liebe! Gott!

Ich habe keinen Namen

Dafür. Gefühl ist alles,

Name ist Schall und Rauch,

Umnebelnd Himmelsglut.


RINGRAZIAMENTI

Il documento che state leggendo costituisce la Tesi di Laurea di Matteo Torcoli per la Laurea Magistrale in Ingegneria Informatica, curriculum in Sound and Music Engineering, Politecnico di Milano, campus di Como. Esso descrive il lavoro di ricerca condotto presso gli International Audio Laboratories Erlangen¹, un ente congiunto dell'istituto Fraunhofer IIS² e della Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)³, ubicato a Erlangen, Germania, durante il periodo tra ottobre 2013 e aprile 2014. In questa sede, lo studio è stato supervisionato dal Prof. Dr. ir. Emanuël A.P. Habets e dal Dipl. Ing. Sebastian Braun, mantenendosi in contatto con il Prof. Dr. Ing. Marco Tagliasacchi del Politecnico di Milano. L'apporto di queste tre persone a tale lavoro è stato di fondamentale importanza e la loro straordinaria competenza, disponibilità e gentilezza sono motivo di orgoglio. Degno di nota anche l'aiuto del Dipl. Ing. Daniel Marquardt (Università di Oldenburg). Sincera gratitudine va a tutti loro ed anche a Viviam Reyes, Dimitrios Kosmidis, Leo McCormack e Marco Baratelli.

Inoltre e soprattutto l'autore intende ringraziare caldamente mamma, papà e tutta la sua famiglia per l'essenziale ed amorevole supporto, morale ed economico, ed i suoi amici di una vita per essere sempre vicini anche nella distanza.

1 International Audio Laboratories Erlangen: http://www.audiolabs-erlangen.de
2 Fraunhofer IIS: http://www.iis.fraunhofer.de
3 FAU: http://www.fau.eu/


My supervisors, my friends, and my family are awesome.

ACKNOWLEDGMENTS

The document you are reading constitutes the Master's Thesis of Matteo Torcoli for the Master of Science in Engineering of Computing Systems, curriculum in Sound and Music Engineering, Politecnico di Milano, Como campus. It describes research conducted at the International Audio Laboratories Erlangen¹, a joint institution of Fraunhofer IIS² and Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)³, located in Erlangen, Germany, between October 2013 and April 2014. There, the study was supervised by Prof. Dr. ir. Emanuël A.P. Habets and Dipl. Ing. Sebastian Braun, in contact with Prof. Dr. Ing. Marco Tagliasacchi of the Politecnico di Milano. The contributions of these three persons to this work were of fundamental importance, and their extraordinary competence, availability, and kindness are reasons to be proud. The help of Dipl. Ing. Daniel Marquardt (University of Oldenburg) also deserves mention. They all deserve sincere gratitude, which extends as well to Viviam Reyes, Dimitrios Kosmidis, Leo McCormack, and Marco Baratelli.

Moreover, and especially, the author would like to warmly thank his parents and all his family for their essential and loving support, both moral and economic, and his lifelong friends for always being close, even at a distance.

1 International Audio Laboratories Erlangen: http://www.audiolabs-erlangen.de
2 Fraunhofer IIS: http://www.iis.fraunhofer.de
3 FAU: http://www.fau.eu/


CONTENTS

Acronyms
List of Figures
List of Tables
Sommario
Abstract

1 Introduction
   1.1 Room Reverberation
   1.2 Hearing Loss and Hearing Aids
   1.3 Structure of the Thesis

2 Binaural Room Impulse Response Generator for Hearing Aids
   2.1 The Image Method
   2.2 Simulating Binaural Room Impulse Responses
   2.3 Binaural Room Impulse Responses for Hearing Aids
       2.3.1 Angle Dependent Reflection Coefficient
       2.3.2 The Oldenburg Database
   2.4 Conclusions

3 Dereverberation by Spatial Filtering for Hearing Aids
   3.1 Configuration and Notation
   3.2 Binaural Spatial Filtering
       3.2.1 Binaural Multichannel Wiener Filter
       3.2.2 Speech Distortion Weighted MWF
       3.2.3 Binaural Cues and Measures
       3.2.4 Interaural Transfer Function Preservation MWF
       3.2.5 Interaural Coherence Preservation MWF
   3.3 Diffuse PSD Estimator
   3.4 Diffuse Coherence Matrix
       3.4.1 Based on Free-field Transfer Functions
       3.4.2 Head-related
       3.4.3 Comparison

4 Performance Measures
   4.1 Speech Quality Measures
       4.1.1 Signal-to-Interference Ratio
       4.1.2 Perceptual Evaluation of Speech Quality
       4.1.3 Speech to Reverberation Modulation energy Ratio
   4.2 Log-spectral Distance
   4.3 Directivity Index

5 Simulation and Evaluation
   5.1 Room Setup and Parameters
   5.2 Implementation Details
   5.3 Diffuse PSD Estimates Evaluation
   5.4 Binaural Spatial Filtering without IC Preservation
   5.5 Binaural Spatial Filtering with IC Preservation
   5.6 Evaluation in Noisy Environments
   5.7 A Real-world Scenario

6 Future Work and Conclusions
   6.1 Open Issues and Future Work
   6.2 Conclusions


ACRONYMS

BRIR      Binaural Room Impulse Response
BRTF      Binaural Room Transfer Function
BTE       Behind-The-Ear
DF        Directivity Factor
DI        Directivity Index
DOA       Direction Of Arrival
HRIR      Head-Related Impulse Response
HRTF      Head-Related Transfer Function
IC        Interaural Coherence
ICP       Interaural Coherence Preservation
ILD       Interaural Level Difference
IR        Impulse Response
iSTFT     inverse Short-Time Fourier Transform
ITD       Interaural Time Difference
ITF       Interaural Transfer Function
LTI       Linear Time-Invariant
MMSE      Minimum Mean Square Error
MSC       Magnitude Squared Coherence
MVDR      Minimum Variance Distortionless Response
MWF       Multichannel Wiener Filter
MWF-ICP   Interaural Coherence Preservation MWF
MWF-ITF   Interaural Transfer Function Preservation MWF
PDF       Probability Density Function
PESQ      Perceptual Evaluation of Speech Quality
PSD       Power Spectral Density
RIR       Room Impulse Response
RTF       Room Transfer Function
SDW-MWF   Speech Distortion Weighted MWF
SIR       Signal-to-Interference Ratio
SMIR      Spherical Microphone array Impulse Response
SNR       Signal-to-Noise Ratio
SRMR      Speech to Reverberation Modulation energy Ratio
SSS       Single Speech Source
STFT      Short-Time Fourier Transform
TF        Transfer Function
WF        Wiener Filter


LIST OF FIGURES

Figure 1: Room reverberation as multi-path propagation.
Figure 2: Schematic RIR.
Figure 3: Main modern types of hearing aids.
Figure 4: According to [41], just a small portion of hearing impaired people wears hearing aids.
Figure 5: Two-dimensional room schematic for a source S, a receiver R, and mirror sources.
Figure 6: If hL(t) and hR(t) are the HRIRs, xL(t) is obtained by convolving the anechoic x(t) with hL(t).
Figure 7: BTE hearing aid and in-ear microphone mounted on a dummy head, as done for the Oldenburg database.
Figure 8: The TF of the loudspeaker used during the measurements of the Oldenburg database is plotted in blue. The equalizer used in the proposed BRIR generator is plotted in red.
Figure 9: The HRTF corresponding to point hE (at azimuth θ and elevation φ) is obtained by bilinear interpolation from the measured data points hA, hB, hC, and hD.
Figure 10: Two channels of the generated BRIRs and BRTFs for the situation described by Fig. 11. Listing 1 shows the related source code.
Figure 11: Room setup of the generation of Fig. 10: the red cross depicts the source, while the blue circle stands for the listener. The line coming out of the circle can be thought of as the nose of the listener, representing his looking direction.
Figure 12: Schematic of the structure of the proposed generator of BRIRs for hearing aids.
Figure 13: Binaural processing configuration.

Figure 14: Schematic RIR, showing the main temporal structure of the sound field at the considered room point. After the direct sound (i.e., x(k)), the first strong reflections occur sporadically; later their temporal density increases rapidly, so that the different reflections become essentially indistinguishable. At the same time, however, the reflections carry less and less energy. All these reflections form the reverberation, which is usually divided into early and late reverberation.
Figure 15: Hearing aids positioned on a sphere modeling the human head, shown from above (left) and from the right side (right). In both cases the sphere is "looking" towards the reader's right.
Figure 16: Free-field spherical coherence Γ^free_diff,sph (blue), modified sinc coherence Γ^mod_diff,sph (green), and head-related spherical spatial coherence Γ^HRTF_diff,sph of the rigid sphere head model (black), between the central microphone on the left side and either the central one on the right side (solid line) or the front one on the left side (dashed line). Microphone positions as in Fig. 15.
Figure 17: Free-field Γ^free_diff,cyl (blue) and head-related Γ^HRTF_diff,cyl exploiting either the sphere head model (black) or measured Head-Related Transfer Functions (HRTFs) (green). Coherence between the central microphone on the left side and either the central one on the other side (solid line) or the front one on the same side (dashed line). Microphone positions as in Fig. 15.
Figure 18: Head-related Γ^HRTF_diff,cyl of the rigid sphere head model (dashed lines) or of measured HRTFs (solid lines). Colors distinguish between sensors as explained in the legend. Note that the blue and purple dashed lines almost overlap.
Figure 19: Intrusive and non-intrusive types of quality assessment. Non-intrusive algorithms do not have access to the reference signal. Examples of treated measures are given in parentheses.

Figure 20: Simulated room setup, first scenario. The red cross depicts the source, while the blue circle stands for the listener. The line coming out of the circle can be thought of as the nose of the listener, representing his looking direction. Both azimuth and elevation of the sound source are equal to 0°. Distance of 2 meters between the listener and the source. Reverberation time T60 = 0.8 seconds.
Figure 21: Simulated room setup, second scenario. Azimuth is equal to 50° and the distance is 1.5 meters. The other characteristics are the same as above.
Figure 22: Spectrogram of the direct speech component at a reference microphone. Power values in dB.
Figure 23: Noisy reverberant signal spectrogram.
Figure 24: Spectrogram of the Oracle PSD of the reverberant component (φ_R^oracle) for the 0° azimuth scenario. Power values in dB.
Figure 25: Maximum likelihood PSD estimate of the reverberant component (φ̂_R^lim).
Figure 26: LSD(k) for the 0° azimuth scenario. Distance between φ_R^oracle and either φ̂_R^lim (blue line) or φ_R^min (magenta).
Figure 27: Underestimation error LSD_under(k).
Figure 28: Overestimation error LSD_over(k).
Figure 29: LSD(k) for the 50° azimuth scenario. Distance between φ_R^oracle and either φ̂_R^lim (blue line) or φ_R^min (magenta).
Figure 30: Underestimation error LSD_under(k).
Figure 31: Overestimation error LSD_over(k).
Figure 32: MSC of the interference between the reference microphones averaged over time: at the input (black line); at the output of the typical (rank-1) MWF (red line); and as measured from dummy HRTFs (blue line). 0° azimuth scenario.
Figure 33: MSC of the interference between the reference microphones averaged over time: colors as above. 50° azimuth scenario.

Figure 34: Joint PDFs of ITD and ILD of the input at the reference microphones (top-left), the input interference at the reference microphones (top-right), the total output (bottom-left), and the output interference (bottom-right). Rank-1 MWF. Critical band centered at 0.5 kHz. 0° azimuth scenario. The binaural cues of the interference at the output are dramatically distorted.
Figure 35: Joint PDFs of ITD and ILD: order of the signals as above. Critical band centered at 2 kHz. 0° azimuth scenario.
Figure 36: Joint PDF of ITD and ILD of the input direct component. Critical band centered at 0.5 kHz. 0° azimuth scenario.
Figure 37: MSC of the interference between the reference microphones averaged over time: at the input (black line); at the output of the full-rank MWF (red line); and as measured from dummy HRTFs (blue line). 0° azimuth scenario. The problem becomes less prominent.
Figure 38: MSC of the interference between the reference microphones averaged over time: colors as above. 50° azimuth scenario.
Figure 39: Joint PDFs of ITD and ILD of the input at the reference microphones (top-left), the input interference at the reference microphones (top-right), the total output (bottom-left), and the output interference (bottom-right). Full-rank MWF. Critical band centered at 0.5 kHz. 0° azimuth scenario. The problem is less prominent.
Figure 40: Joint PDFs of ITD and ILD: order of the signals as above. Critical band centered at 2 kHz. 0° azimuth scenario.
Figure 41: DI of the full-rank (red line) and the rank-1 (blue line) MWF, computed for the 0° azimuth scenario and using the Oracle PSD matrix.
Figure 42: MSC of the interference between the reference microphones averaged over time, calculated before the iSTFT, i.e., in the minimization domain. µ_IC = 1. The MSC at the output of the MWF-ICP that uses the Oracle PSD matrix (green line) almost completely overlies the input MSC (black line). 0° azimuth scenario.

Figure 43: MSC of the interference between the reference microphones averaged over time, calculated before the iSTFT, i.e., in the minimization domain. µ_IC = 100. The MSC at the output of the MWF-ICP that uses the Oracle PSD matrix (green line) completely overlies the input MSC (black line). 0° azimuth scenario.
Figure 44: MSC of the interference between the reference microphones averaged over time, calculated after the iSTFT, i.e., in the perceptual domain. µ_IC = 1.
Figure 45: MSC of the interference between the reference microphones averaged over time, calculated after the iSTFT, i.e., in the perceptual domain. µ_IC = 100.
Figure 46: Joint PDFs of ITD and ILD (calculated after the iSTFT) of the input at the reference microphones (top-left), the input interference at the reference microphones (top-right), the total output (bottom-left), and the output interference (bottom-right). MWF-ICP with µ_IC = 1 and Oracle PSD. Critical band centered at 0.5 kHz. 0° azimuth scenario.
Figure 47: Joint PDFs of ITD and ILD (calculated after the iSTFT): order of the signals as above. MWF-ICP with µ_IC = 1 and Oracle PSD. Critical band centered at 2 kHz. 0° azimuth scenario. The binaural cues of the interference at the output are preserved.
Figure 48: Joint PDFs of ITD and ILD (calculated after the iSTFT) of the input at the reference microphones (top-left), the input interference at the reference microphones (top-right), the total output (bottom-left), and the output interference (bottom-right). MWF-ICP with µ_IC = 100 and Oracle PSD. Critical band centered at 0.5 kHz. 0° azimuth scenario.
Figure 49: Joint PDFs of ITD and ILD (calculated after the iSTFT): order of the signals as above. MWF-ICP with µ_IC = 100 and Oracle PSD. Critical band centered at 2 kHz. 0° azimuth scenario. The binaural cues of the interference at the output are preserved.

Figure 50: Top: spectrogram of the input signal at one reference microphone. Below, absolute interference IC error of (from top to bottom): the typical MWF; the MWF-ICP using the interference PSD estimator; the MWF-ICP using the Oracle PSD. Values calculated after the iSTFT, µ_IC = 1, 0° azimuth scenario. Since the IC can assume values between 1 and −1, the absolute IC error ranges from 0 to 2.
Figure 51: Spectrogram of the input signal at one reference microphone and absolute interference IC errors. 50° azimuth scenario.
Figure 52: Spectrogram of the complete noisy reverberant signal at a reference microphone. Power values in dB. SNR_in = 20 dB.
Figure 53: MSC of the interference between the reference microphones averaged over time, calculated before the iSTFT, i.e., in the minimization domain. SNR_in = 20 dB and µ_IC = 100.
Figure 54: MSC of the interference between the reference microphones averaged over time, calculated after the iSTFT, i.e., in the perceptual domain. SNR_in = 20 dB and µ_IC = 100.
Figure 55: Joint PDF of ITD and ILD (calculated after the iSTFT) of the input at the reference microphones (top-left) and the input interference at the reference microphones (top-right); the total output of the typical MWF without ICP (center-left) and the output interference of the same filter (center-right); the total output of the MWF-ICP with µ_IC = 100 (bottom-left) and the output interference of the same filter (bottom-right). Oracle PSD matrix. Critical band centered at 0.5 kHz. SNR_in = 20 dB.
Figure 56: Top: spectrogram of the input signal at one reference microphone. Below, absolute interference IC error of (from top to bottom): the typical MWF; the MWF-ICP using the interference PSD estimator; the MWF-ICP using the Oracle PSD. Values calculated after the iSTFT; µ_IC = 100 and SNR_in = 20 dB. Since the IC can assume values between 1 and −1, the absolute IC error ranges from 0 to 2.
Figure 57: Spectrogram of the complete noisy reverberant signal at a reference microphone. Power values in dB. Cafeteria scenario.

Figure 58: MSC of the interference between the reference microphones averaged over time, calculated before the iSTFT, i.e., in the minimization domain. Cafeteria scenario.
Figure 59: MSC of the interference between the reference microphones averaged over time, calculated after the iSTFT, i.e., in the perceptual domain. Cafeteria scenario.
Figure 60: Joint PDFs of ITD and ILD (calculated after the iSTFT) of the input at the reference microphones (top-left) and the input interference at the reference microphones (top-right); the total output of the typical MWF without ICP (center-left) and the output interference of the same filter (center-right); the total output of the MWF-ICP with µ_IC = 100 (bottom-left) and the output interference of the same filter (bottom-right). Oracle PSD matrix. Critical band centered at 0.5 kHz. Cafeteria scenario.
Figure 61: Joint PDFs of ITD and ILD (calculated after the iSTFT) of the input at the reference microphones (top-left) and the input interference at the reference microphones (top-right); the total output of the typical MWF without ICP (center-left) and the output interference of the same filter (center-right); the total output of the MWF-ICP with µ_IC = 100 (bottom-left) and the output interference of the same filter (bottom-right). Oracle PSD matrix. Critical band centered at 2 kHz. Cafeteria scenario.

LIST OF TABLES

Table 1: Typical effective resistivity values, expressed in kPa·s/m². A more complete list and description can be found in [17].
Table 2: Interference suppression performance of the SDW-MWF-SSS filter (µ_SDW = 1) applied to the generated signals, comparing the use of the Oracle PSD matrix with the maximum likelihood estimator. Results are shown for the 0° azimuth scenario. The Perceptual Evaluation of Speech Quality (PESQ) score of the unprocessed noisy reverberant signal was 2.05. The left channel is evaluated.
Table 3: Comparison between the MWF-SSS (rank-1) and the full-rank MWF. The objective performance measures are shown for the 0° azimuth scenario. The left channel is evaluated.
Table 4: Objective measures averaged between the left and right channels. Oracle PSD. 0° azimuth scenario. The PESQ score of the unprocessed noisy reverberant signal was 2.07. Increasing µ_IC slightly reduces interference suppression.
Table 5: Objective measures averaged between the left and right channels. PSD estimator. 0° azimuth scenario.
Table 6: Objective measures averaged between the left and right channels. SNR_in = 20 dB, µ_SDW = 1 and µ_IC = 100. The PESQ score of the unprocessed noisy reverberant signal was 2.07.
Table 7: Objective measures averaged between the left and right channels. Cafeteria scenario. The PESQ score of the unprocessed noisy reverberant signal was 2.46.


SOMMARIO

L'ipoacusia è un indebolimento del sistema uditivo. Si tratta di una delle patologie neurosensoriali più diffuse: coinvolge il 35%-50% della popolazione d'età superiore a 65 anni e più di 80 milioni di persone in Europa. Molti di questi soggetti preferiscono non usare apparecchi acustici, alcuni per motivi legati a fenomeni di stigmatizzazione o ai costi; ma molti scelgono di non servirsene a causa di limitazioni tecniche.

La qualità e l'intelligibilità del parlato possono essere peggiorate dal rumore di sottofondo così come anche dal riverbero. Oggi esistono tecniche di filtraggio spaziale - che possono essere applicate agli apparecchi acustici - in grado di ridurre significativamente rumore e riverbero. Se da un lato gli algoritmi monoaurali possono far perdere le stimolazioni binaurali, che contengono informazioni cruciali sulla scena acustica che circonda l'ascoltatore, dall'altro gli algoritmi binaurali consentono la preservazione di queste fondamentali stimolazioni. A questo proposito la comunità scientifica si è concentrata sulla preservazione delle differenze interaurali di intensità e tempo, ma finora nessun lavoro si è occupato della preservazione della coerenza interaurale (IC) del riverbero; quest'ultimo può essere modellato come una componente sonora diffusa altamente tempo-variante. Se la IC dei suoni diffusi è distorta, le componenti diffuse possono essere percepite come direzionali: fenomeno fortemente indesiderato.

Il punto di partenza consiste nella realizzazione di un simulatore di risposte all'impulso in una stanza, come fossero ricevute da apparecchi acustici. La simulazione si basa sul metodo delle sorgenti immagine e su risposte all'impulso misurate in ambiente anecoico. Successivamente si sviluppa una tecnica di filtraggio spaziale per ridurre congiuntamente riverbero e rumore. Quindi, si investiga una tecnica recente che preserva la IC del rumore residuo, assunto come stazionario, e la si estende tenendo presente che, nel caso in analisi, le componenti diffuse residue possono essere altamente non-stazionarie. Infine viene condotta una valutazione esaustiva dei diversi filtri, confrontando gli effetti con e senza preservazione della IC.


ABSTRACT

Hearing loss is one of the most prevalent sensorineural impairments, affecting 35%-50% of the population in the age group of 65 and over, and more than 80 million people in Europe. Still, many hearing-impaired people do not like using hearing aids, partly due to stigma-related reasons and cost issues, but also due to technical limitations.

Background noise as well as reverberation can decrease speech quality and intelligibility. Recently developed spatial filtering techniques are able to provide a significant amount of noise and reverberation reduction. By integrating these techniques in hearing aids, it is possible to improve speech quality and intelligibility. Monaural processing can destroy binaural cues, which contain crucial spatial information about the acoustic scene surrounding the listener. Binaural processing, on the other hand, makes it possible to maintain certain binaural cues. Most research has focused on the preservation of interaural level and time differences. So far, no work has focused on the preservation of the Interaural Coherence (IC) of the reverberation, which can be modeled as a highly time-varying diffuse sound component. If the IC of diffuse sound is destroyed, the diffuse sound components can be perceived as directional, which is decidedly undesirable.

Within the scope of this thesis, in the first step a tool to simulate binaural room impulse responses for hearing aids is developed. The simulation is based on the image source method, incorporating impulse responses measured in an anechoic environment by hearing aids placed on a head and torso simulator. In the next step, a spatial filtering technique to jointly reduce reverberation and background noise is developed and implemented. Finally, a recently proposed technique to preserve the IC of the residual noise (which was assumed to be stationary) is investigated and extended to also preserve the IC of the residual diffuse sound (in our case, highly non-stationary); in this way, the IC of the residual reverberation can also be preserved. A comprehensive evaluation of the performance of the different spatial filters (i.e., with and without preservation of the IC) is conducted.


1 INTRODUCTION

In speech communication systems, room reverberation often leads to a degradation of speech quality and intelligibility. More specifically, late reverberation can overlap and mask the desired signal. This detrimental perceptual effect generally grows with increasing distance between the source and the receiver. Background noise and other interfering sources may further deteriorate the perceived speech quality. Thus, noise and reverberation suppression can improve speech intelligibility in many different applications (e.g., hands-free devices, voice-controlled systems, mobile telephones), and particularly in hearing aids.

Hearing-impaired people have great difficulty communicating in noisy and reverberant environments. To achieve the same level of speech understanding, they generally require a signal-to-interference ratio that is much higher than a normal-hearing person would need. Similar problems also exist for normal-hearing persons in difficult situations (e.g., train stations, cars, large noisy rooms), especially when using telecommunication systems. These communication difficulties represent a great challenge for our communicating and ageing society.

Even though the majority of hearing-impaired people could benefit from wearing hearing aids, only a small portion of them treat their impairment. The main reasons that prevent people from wearing hearing aids are related to stigma or cost, but also to technical limitations. The lack of treatment of hearing loss affects not only the hearing-impaired individual but also society as a whole.

This thesis aims to contribute to the research field that seeks to improve hearing aids, so that people can feel more comfortable wearing them. Our contribution can be summarized as follows: a spatial filtering technique to jointly reduce reverberation and background noise for hearing aids is investigated and extended to preserve the binaural impression of the sound scene.

We start by introducing the two main elements that form the problem area:

1. Room reverberation and its perceived effect. How the human auditory system deals with it inspires our research (Sec. 1.1).

2. Hearing loss and a possible treatment: hearing aids (Sec. 1.2).


Figure 1: Room reverberation as multi-path propagation.

1.1 Room Reverberation

As depicted in Fig. 1, a sound source in a room causes the propagation of a sound wave to the receiver through a direct path, as well as through multiple reflection paths, which result from the sound being reflected by the walls, surfaces, and objects of the room. Due to differences in the lengths of the propagation paths to the receiver and in the amount of sound energy absorbed by the walls, each sound wavefront arrives with a different amplitude and phase. The term reverberation designates the presence of delayed and attenuated copies of the source signal in the received signal. Reverberation comes from the Latin reverberare, i.e., "beat back, strike back, cause to rebound": from re- "back" + verberare "to strike, to beat."

Fig. 2 shows a schematic Room Impulse Response (RIR), i.e., the main temporal structure of the received sound at a single point in the room (described in more detail in Chapter 2). After the direct sound (the impulse itself), the first strong reflections occur sporadically (early reverberation); later, their temporal density increases rapidly, so that the individual reflections become essentially indistinguishable (late reverberation). At the same time, however, the reflections carry less and less energy.
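This temporal structure (a direct impulse followed by an exponentially decaying tail of dense reflections) can be mimicked in a few lines of code. The sketch below is purely illustrative and not taken from the thesis: it models the late reflections as exponentially decaying noise, with parameter names and values chosen only for the example, and applies the RIR by plain convolution, i.e., as delayed and attenuated copies of the source signal.

```python
import numpy as np

def schematic_rir(fs=16000, t60=0.5, direct_delay=0.01, length=1.0, seed=0):
    """Toy RIR: a unit impulse for the direct sound, followed by an
    exponentially decaying noise tail standing in for the dense (late)
    reflections. Illustrative only; real RIRs come from measurements
    or from the image method discussed in Chapter 2."""
    rng = np.random.default_rng(seed)
    n = int(length * fs)
    h = np.zeros(n)
    d = int(direct_delay * fs)          # direct-path arrival time in samples
    h[d] = 1.0                          # direct sound
    t = np.arange(n - d - 1) / fs
    # amplitude factor 10^(-3 t / T60): 60 dB energy decay after t60 seconds
    h[d + 1:] = 0.1 * rng.standard_normal(n - d - 1) * 10.0 ** (-3.0 * t / t60)
    return h

def reverberate(x, h):
    """Reverberation as delayed, attenuated copies of x: convolution with h."""
    return np.convolve(x, h)
```

Convolving an anechoic signal with such an `h` yields a reverberant signal whose tail energy decays at the chosen T60.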

Reverberation is essential in defining the perceived spatial impression of the sound scene. However, late reverberation in particular can introduce degrading effects such as blurring of the speech phonemes and overlap-masking phenomena. An example of overlap-masking is two phonemes with similar or different frequency content occurring sequentially with a brief delay between them. Because of reverberation, the initial phoneme endures and may overlap the second phoneme and its associated reverberation, masking it [8], [42]. Moreover, if an interfering source is present in the room, the listener has to deal with its reverberation as well. Filtering out this undesired signal is an additional difficulty for the listener.

Figure 2: Schematic RIR.

As also observed in [30], the distortion caused by reverberation in small rooms seems to go largely unnoticed by normal-hearing listeners, since their masking threshold is higher than that of hearing-impaired listeners. Furthermore, there is a difference in speech intelligibility between monaural (i.e., "one ear") and binaural (i.e., "two ears") listening. Most listeners benefit from binaural listening when reverberation or other interfering sources are present. This indicates that the listener's binaural system processes the two signals from the ears to reduce reverberation. Binaural listening enables the auditory system to estimate the distance and direction of sound sources, and to detect certain sounds at much lower intensity levels than if only one ear is used. These phenomena are known as the binaural hearing advantage; their usefulness is shown in [47] and [59]. Reverberation induces spatial diversity, i.e., the direct sound and the reflections arrive from different directions. This spatial diversity is apparently exploited when two ears are used. It can also be exploited by acoustic signal processing algorithms via a spatial processor that combines multiple microphone signals in order to perform reverberation suppression, i.e., dereverberation. In Chapter 3, we will pursue this option by proposing spatial filtering techniques for noise and reverberation suppression for binaural hearing aids. This means that two signals (each to be received by one ear) will be generated with the intention of making them more intelligible than the input signals. But this is not all: we will also focus on preserving the binaural impression of the sound scene. In addition to a realistic listening experience, the listener's auditory system will still be able to perform additional filtering on the residual noise and reverberation in the cognitive stage, thanks to the binaural hearing advantage.
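As a minimal illustration of such a spatial processor, a delay-and-sum combiner time-aligns the microphone channels toward an assumed source direction and averages them, so that the direct sound adds coherently while reflections from other directions add incoherently. This sketch is only a stand-in for the binaural filters developed in Chapter 3; the function, its name, and the integer-sample delays are simplifying assumptions for the example.

```python
import numpy as np

def delay_and_sum(mic_signals, delays_samples):
    """Align each channel by its assumed direct-path delay (in samples)
    and average. The direct sound is reinforced relative to reflections
    arriving from other directions, which do not align."""
    n = min(len(x) - d for x, d in zip(mic_signals, delays_samples))
    aligned = [np.asarray(x)[d:d + n]
               for x, d in zip(mic_signals, delays_samples)]
    return np.mean(aligned, axis=0)
```

For example, with two microphones where the second receives the source three samples later, `delay_and_sum([x1, x2], [0, 3])` averages the time-aligned channels.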


1.2 hearing loss and hearing aids

Hearing is considered normal if the lowest sound that the listener can detect (i.e. the hearing threshold) goes up to 25 dB HL (decibels Hearing Level; the better ear is considered), as recognized by the World Health Organisation (WHO), [56], [66], [14], [29]. The dB HL is a frequency-dependent dB measure relative to the quietest sounds that a young healthy individual ought to be able to hear. If the hearing threshold is above 25 dB HL, we talk about hearing loss or hearing impairment.

From 25 to 40 dB HL, the hearing loss is referred to as mild and causes difficulties in hearing soft speech (such as the voices of women and children, or whispers) in noisy situations. Disabling hearing loss includes the moderate, severe and profound degrees and refers to a hearing loss greater than 40 dB HL in the better ear for adults, and greater than 30 dB HL in the better ear for children.

In [56], the WHO estimates that disabling hearing loss affects over 5% of the world's population and approximately one-third of people over 65 years of age. If mild hearing loss is also considered, around 16% of adults are affected, and the percentage grows to 35%-50% for the age group of 65 and over, [63], [41]. The majority of these people could benefit from wearing hearing aids.

A hearing aid is an electroacoustic device which is designed to amplify sound for the wearer, with the aim of making speech more intelligible. There are many types of hearing aids, which vary in size, power and circuitry. Fig. 3 shows a schematic summary by [57] of the main modern types. In this work, we focus on the Behind-The-Ear (BTE) and on the receiver-in-canal (also known as mini-BTE) types.

Even though the majority of hearing impaired people could benefit from wearing hearing aids, only a small portion of them seeks treatment: less than 1 in 10 people with mild hearing loss and 4 in 10 people with disabling hearing loss, Fig. 4. The reasons that prevent people from wearing hearing aids are various. One main cause is stigma-related: a large portion of people trying to adapt to the use of hearing aids believes that it makes them look old, less attractive, less intelligent and worth less in the eyes of others, [5]. For this reason, industry is putting great emphasis on reducing the size of the devices in order to make them less visible and intrusive. Another big problem is the cost of the devices and of their batteries. Good hearing aids are prohibitively expensive for most users, especially in developing countries. But the issues in which we are most interested are due to technical limitations. For example, hearing impaired persons typically have hearing loss at both ears. As a consequence, they are fitted with a hearing aid at each ear. Since it is very difficult to let the two devices cooperate (nearly) instantaneously, a so-called bilateral system is usually adopted, i.e. no cooperation between the hearing aids takes place. This greatly affects binaural listening, distorting the auditory impression of the acoustic sound scene, i.e. obstructing sound localization and the binaural hearing advantage. This is exactly what we strive to overcome with this work. In order to achieve actual binaural (and not bilateral) processing, the devices need to cooperate with each other. An analysis of the main binaural processing techniques for hearing aids can be found in [21]. In this work we consider binaural hearing aids, supposing that an ideal wireless link (no latency, no transmission errors) is available.

Figure 3: Main types of modern hearing aids.

The lack of treatment of hearing loss impacts not only the hearing impaired individual but also his or her family, colleagues, and social surroundings. At work, a hearing impairment can result in a loss of job function, since the vast majority of jobs these days require communication skills. Thus, hearing loss also costs society in terms of lost productivity¹. The current thesis tries to contribute to the research field that aims to improve hearing aids, such that people can feel more comfortable wearing them.

Figure 4: According to [41], just a small portion of hearing impaired people wears hearing aids.

1.3 structure of the thesis

The remainder of this thesis is structured as follows.

In order to investigate the effects of recently developed spatial filtering techniques applied to hearing aids, in Chapter 2 we present a generator of Binaural Room Impulse Responses (BRIRs) for hearing aids. The aim is to produce artificial reverberant signals as received by BTE hearing aids, with full control over the characteristics of the room.

The mathematical signal model is then presented in Chapter 3, where a spatial filtering method for joint reverberation and noise suppression is also proposed. This technique is well known in the literature to provide a significant amount of interference reduction. On the other hand, it is shown that, if a single desired source is assumed, the binaural cues of the residual undesired signal components are severely distorted. Thus, we propose a modification to this method that is able to preserve the spatial impression of the sound scene. The approach is novel in the context of dereverberation².

Chapter 4 deals with the definitions of the evaluation tools that we will use.

In Chapter 5, results obtained from MATLAB® [52] simulations are presented, showing the effectiveness of the proposed method when applied to generated and measured signals.

Chapter 6 concludes the thesis with a summary and an exposition of some open issues that could be addressed in future work.

1 Further data, statistics and information about hearing loss can be found at http://www.hear-it.org.

2 Excerpts from this thesis and the tools developed in this context were used to write a paper that will appear in the IEEE International Workshop on Acoustic Signal Enhancement 2014: http://www.iwaenc2014.org/. The abstract of the article and related sound examples, which will also be discussed in this thesis, are available at http://www.audiolabs-erlangen.de/resources/2014-IWAENC-BDICP/.


2 BINAURAL ROOM IMPULSE RESPONSE GENERATOR FOR HEARING AIDS

The first step of this work deals with a tool to simulate binaural room impulse responses for hearing aids.

A Room Impulse Response (RIR) is a function of time that describes the acoustic reaction of a room, at the considered position, to a Dirac delta function. If one assumes (a first approximation for most real-life scenarios) that a room with given source and receiver positions behaves like a Linear Time-Invariant (LTI) system, then its acoustic properties are completely characterized by the RIR. That is, one can calculate how any acoustic input will sound inside the room. Nevertheless, it is usually easier to analyze systems using transfer functions as opposed to Impulse Responses (IRs). In this context, we will talk about Room Transfer Functions (RTFs).

In general, a Transfer Function (TF) is the Fourier transform of an IR. The Fourier transform of a system's output may be determined by multiplying the system's TF with the input's Fourier transform in the frequency domain. An inverse Fourier transform of this result yields the output in the time domain. The same result can be obtained by convolving the time-domain input with the system's IR, [55].
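As a quick sanity check of this relation, the following sketch (plain NumPy, not part of the thesis toolchain) verifies numerically that convolving an input with an IR gives the same output as multiplying their Fourier transforms and transforming back:

```python
import numpy as np

# Illustrative signals: an arbitrary input and an arbitrary impulse response
rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # input signal
h = rng.standard_normal(16)   # impulse response (IR)

# Time domain: output by direct convolution
y_time = np.convolve(x, h)

# Frequency domain: zero-pad both to the full output length,
# multiply the spectra (TF times input spectrum), transform back
n = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real
```

Both paths produce the same output sequence of length `len(x) + len(h) - 1`.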

Simulated RIRs are essential for comprehensive testing of acoustic signal processing algorithms (e.g. to understand their performance in terms of noise suppression, dereverberation or speech distortion) while controlling parameters (e.g. the room dimensions, the source-receiver distance and the characteristics of the wall materials).

Recently, many algorithms for the generation of simulated RIRs were proposed, e.g. in [36], [44] and [58].

The main new contribution of the developed generator is to extend an available RIR simulation technique to obtain Binaural Room Impulse Responses (BRIRs), with additional channels for hearing aids. So not only the RIRs from the source to the ears in a reverberant environment (i.e. the BRIRs in the strict sense of the word) are considered, but also the RIRs to the microphones positioned Behind-The-Ear (BTE) are taken into account. As mentioned in Sec. 1.2, BTE and mini-BTE are very common kinds of hearing aids.

For this purpose, the Oldenburg database ([40]) contributes significantly: it is a database of IRs, measured with three-channel BTE hearing aids and an in-ear microphone at both ears of a human head and torso simulator. More about it can be found in Sec. 2.3.2.


Figure 5: Two-dimensional room schematic for a source S, a receiver R and mirror sources.

In addition, accurate angle-dependent coefficients represent another relevant element of this work: they are used to model the room reverberation, as summarized in Sec. 2.3.1.

Last but not least, the Image Method [3] is the real core of the proposed BRIR generator (and of the aforementioned [36], [44], [58] as well). A brief illustration of the method is given in the following Sec. 2.1.

2.1 the image method

Way back in 1978, J.B. Allen and D.A. Berkley from the Bell Laboratories proposed a powerful technique to simulate a "carefully controlled, easily changed, acoustic environment", called the Image Method. A description of both the general theoretical approach and a specific implementation can be found in [3]. The method models an empty rectangular room (shoebox) and estimates a source-to-receiver RIR using a mirror source method. The source is modeled as a point source, while the receiver is infinitely small, causing no effect at all on the acoustic scene.

Considering just a rectangular enclosure may seem very limiting, but it is extremely advantageous for the sake of simplicity and reliability. Indeed, this model can be easily realized in an efficient computer program, and the solution for a rectangular enclosure rapidly approaches an exact solution of the wave equation as the walls of the room become rigid. Furthermore, common rooms are mostly nearly rectangular (e.g. office environments), if considered without furniture. For these reasons the method has been used widely for a broad range of investigations, in particular for basic studies in the speech enhancement field.

The main idea starts from the consideration that a point source emits a spherical wave that has to satisfy the classical rigid-wall boundary conditions when a rigid wall is present. This boundary condition may be satisfied by mirroring an image source symmetrically on the far side of the wall - as sketched in Fig. 5 - and then omitting the presence of the wall. When the main source S is excited, each image point (S' and S'' in Fig. 5) is simultaneously excited, creating spherical pressure waves which propagate away from each image point, arriving at the receiver as the reflections would have done.

In the general case of six reflective walls, each image is itself mirrored, up to an infinite order. In practice, truncation to a finite order N_m is needed for computer-based simulation. Even so, by choosing the order N_m sufficiently large, the truncation error becomes negligible.

As might be reasonably expected, the solution in terms of point images is no longer exact if the room walls are not rigid. However, for the sake of simplicity, the method assumes the approximate point image model even for nonrigid walls. In addition, it assumes real-valued, angle-independent and frequency-independent pressure wall reflection coefficients.

The following Eq. (1) has been derived heuristically by Allen and Berkley from geometrical considerations as depicted in Fig. 5, extended to the three-dimensional case. The resulting RIR is formulated in [3] as follows:

h(t) = \sum_{p \in P} \sum_{m \in M} \beta_{x_1}^{|m_x - q|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - i|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - k|} \beta_{z_2}^{|m_z|} \frac{\delta(t - d_{pm}/c)}{4 \pi d_{pm}} ,    (1)

where:

h(t)    room impulse response;
β       frequency-independent, real-valued, angle-independent pressure wall reflection coefficients (0 ≤ β ≤ 1), with the subscripts distinguishing between the 6 walls (x1, x2, y1, y2, z1, z2); β = 0.7 is a typical value for walls;
d_pm    distance between the receiver and the considered image source (which depends on the indexes p and m);
p       p ∈ P = {(q, i, k)}: each of these elements can take the value 0 or 1, resulting in eight combinations that form a set of eight different image source positions;
m       m ∈ M = {(m_x, m_y, m_z)}: each of these elements can take values between −N_m and N_m, thus resulting in a set M of (2N_m + 1)^3 combinations;
t       time;
c       speed of sound.

The RIR of Eq. (1) can be convolved with any desired anechoic input signal. In practice, the frequency-domain counterpart is often considered: it is the form that is exploited in this work. It can be expressed as follows:

H(\omega) = \sum_{p \in P} \sum_{m \in M} \beta_{x_1}^{|m_x - q|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - i|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - k|} \beta_{z_2}^{|m_z|} \frac{\exp\left(-j \frac{\omega}{c} d_{pm}\right)}{4 \pi d_{pm}} ,    (2)

where:

H(ω)    room transfer function;
j       imaginary unit;
ω       angular frequency.
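To make the structure of Eq. (1) concrete, here is a minimal sketch of the image method in Python (an illustration, not the thesis' MATLAB implementation): it enumerates the image sources over p and m, weights each by the product of wall reflection coefficients, and places a scaled Dirac at the corresponding delay, rounding fractional delays to the nearest sample for simplicity:

```python
import numpy as np

def image_method_rir(room, src, rcv, beta, fs, c=343.0, order=10, n=4096):
    """Sketch of Eq. (1). room: (Lx,Ly,Lz); src/rcv: (x,y,z) in m;
    beta: 6 reflection coefficients (x1,x2,y1,y2,z1,z2); fs in Hz."""
    h = np.zeros(n)
    bx1, bx2, by1, by2, bz1, bz2 = beta
    idx = range(-order, order + 1)
    for mx in idx:
        for my in idx:
            for mz in idx:
                for q in (0, 1):
                    for i in (0, 1):
                        for k in (0, 1):
                            # image source position for p=(q,i,k), m=(mx,my,mz)
                            ix = (1 - 2 * q) * src[0] + 2 * mx * room[0]
                            iy = (1 - 2 * i) * src[1] + 2 * my * room[1]
                            iz = (1 - 2 * k) * src[2] + 2 * mz * room[2]
                            d = np.sqrt((ix - rcv[0])**2 + (iy - rcv[1])**2
                                        + (iz - rcv[2])**2)
                            # product of wall reflection coefficients, Eq. (1)
                            g = (bx1**abs(mx - q) * bx2**abs(mx) *
                                 by1**abs(my - i) * by2**abs(my) *
                                 bz1**abs(mz - k) * bz2**abs(mz))
                            s = int(round(d / c * fs))  # delayed, rounded Dirac
                            if s < n:
                                h[s] += g / (4 * np.pi * d)
    return h

h = image_method_rir(room=(5, 6, 4), src=(2, 3.5, 2), rcv=(1.5, 2, 1.8),
                     beta=(0.7,) * 6, fs=8000, order=3, n=2048)
```

The first nonzero sample of `h` corresponds to the direct path (all indexes zero), scaled by 1/(4πd) with no wall attenuation.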

The approach gives a lot of flexibility for simulating various room setups, changing the size of the room or the positions of source and receiver. Nevertheless, substantial room for improvement can be found by investigating the validity of the assumptions of the method. In particular, the following sections address the "invisibility" of the receiver (Sec. 2.2) and the coarseness of the reflection coefficients β (Sec. 2.3).

2.2 simulating binaural room impulse responses

As already mentioned, Allen & Berkley's Image Method (as well as many works such as [44] and [58]) is not capable of modeling obstacles. However, an obstacle in the acoustic space (such as the body of the listener or a microphone array) causes scattering and shadowing effects that cannot be ignored, especially at high frequencies. Moreover, it is a fact that the human body (especially head, pinna, and torso) plays a fundamental role in binaural sound perception. This kind of filtering is particularly relevant, since it is exploited by our brain to understand the spatial acoustic scene through the so-called binaural cues - e.g. the well-known Interaural Level Difference (ILD) and Interaural Time Difference (ITD) - and monaural cues - caused mostly by the pinna. Descriptors for the binaural cues will be formally defined in Sec. 3.2.3.

Taking into account the presence of a human body is unavoidable in our context, since the focus of this work is on how a sound is perceived by a listener in a room.

In order to do that, two main approaches can be considered: for each reflection, either the TFs from each mirror source to the hearing aid microphones or models for the binaural cues have to be integrated into the generation process of the RTFs. Such TFs, usually referred to as Head-Related Transfer Functions (HRTFs), can be obtained by anechoic measurements or using a model.


Figure 6: If h_L(t) and h_R(t) are the HRIRs, x_L(t) is obtained by convolving the anechoic x(t) with h_L(t).

This work focuses on leveraging measured HRTFs, as explained in the following subsection.

Head-Related Transfer Functions

An anechoic HRTF, the frequency-domain counterpart of the Head-Related Impulse Response (HRIR), fully characterizes how the eardrum receives a sound from a certain distance; a pair of HRTFs for the two ears can be used to synthesize a binaural sound that will be perceived as coming from a particular direction. As depicted in Fig. 6, the generation of such a sound is performed by convolving an anechoic input signal with the corresponding HRIR for each reflection, [12]. This procedure is correct only if the far-field assumption holds and if the HRIR is measured in anechoic conditions. Simulating BRIRs for hearing aids is possible in a similar way, if a measurement set or a model for these devices exists.
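The convolution of Fig. 6 can be sketched as follows; the HRIRs here are toy placeholders standing in for measured ones (the delay and gain values are illustrative assumptions, not database values):

```python
import numpy as np

fs = 16000
x = np.random.default_rng(1).standard_normal(fs)  # 1 s anechoic input

# Toy HRIR pair: the right ear receives the sound later and attenuated,
# roughly as for a source on the left side (real HRIRs come from a database)
h_left = np.zeros(64)
h_left[0] = 1.0
h_right = np.zeros(64)
h_right[10] = 0.5   # ~0.6 ms interaural delay, 6 dB level difference

# Binaural synthesis: one convolution per ear, as in Fig. 6
x_left = np.convolve(x, h_left)
x_right = np.convolve(x, h_right)
```

With these toy filters, the right-ear signal is exactly the input delayed by 10 samples and halved in amplitude, which is what the ITD/ILD cues encode.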

Alongside the critical dependence on the relative position between listener and sound source, anthropometric features of the human body (especially the shapes of the listener's outer ear, head and torso) play a key role in HRTF characterization. All these characteristics influence how (or whether) a listener can accurately tell which direction a sound is coming from.

In order to create a binaural sound through an HRTF, we would need the HRTF of the person who will listen to the generated sound. Since measuring the exact HRTF of each possible listener would be burdensome (in terms of equipment, time, complexity, etc.), two main solutions can be adopted to face this problem. A first one consists in trying to create a non-individualized HRTF, measuring it on a mannequin that has average anthropometric features. Measurements of this nature can be found in the MIT public-domain dataset for the KEMAR mannequin, [25], or in the CIPIC HRTF database, [2]. In the latter, besides the KEMAR mannequin, measurements for 43 subjects are present as well.

The aforementioned databases are not suitable to simulate sound impinging on modern hearing aids, as they are limited to two-channel information recorded near the entrance of the ear canal (thus including the effects of the pinna), while modern hearing aids typically process 2 or 3 signals per ear, coming from microphones usually located behind the pinna. Therefore, the Oldenburg HRTF database ([40] and Sec. 2.3.2) is a precious collection of data for this work. Indeed, this database also contains measurements of three-channel BTE hearing aids, giving an important contribution to the evaluation of modern hearing aids.

In order to generate a Binaural Room Transfer Function (BRTF), one can integrate these HRTFs in the Image Method. Eq. (2) then becomes as follows - using the same notation as Eq. (2) and introducing θ_pm, the Direction Of Arrival (DOA) of the pm-th simulated reflection:

H(\omega) = \sum_{p \in P} \sum_{m \in M} \beta_{x_1}^{|m_x - q|} \beta_{x_2}^{|m_x|} \beta_{y_1}^{|m_y - i|} \beta_{y_2}^{|m_y|} \beta_{z_1}^{|m_z - k|} \beta_{z_2}^{|m_z|} \mathrm{HRTF}(\omega, \theta_{pm}) \frac{\exp\left(-j \frac{\omega}{c} d_{pm}\right)}{4 \pi d_{pm}}    (3)

Non-individualized HRTFs represent a cheap and straightforward way of providing 3-D perception in headphone reproduction, but may result in evident sound localization errors, such as incorrect perception of the source elevation, front-back reversals, and lack of externalization, [54] and [6]. A particularly difficult issue concerns the pinna, where small variations in its shape can produce large changes in the HRTF. On the other hand, individual HRTF measurements on a significant number of subjects may be both time- and resource-expensive. Moreover, each measurement is subject to errors and uncertainty.

Structural modeling of HRTFs ultimately represents an attractive solution to these shortcomings. As a matter of fact, if one isolates the contributions of the listener's head, pinnae, ear canals, shoulders, and torso to the HRTF in different sub-components - each accounting for some well-defined physical phenomenon - then, thanks to linearity, one can reconstruct the global HRTF from a proper combination of all the considered effects. Relating temporal and/or spectral features of each sub-component to the corresponding anthropometric quantities (that may be derived, e.g., from pictures of the listener) would then yield an HRTF model which is both economical and individualizable, [11], [61], [27].


A very basic structural approximation of the HRTF consists in taking into account only the principal effects of the head. In this case, the unique anthropometric feature is the head radius. This is clearly far from including all the perceptually relevant effects that the human body introduces. Nevertheless, it can be described analytically and results in a very flexible approach. Moreover, it can be approximated in the spherical harmonic domain. Exploiting a rigid sphere scattering model (first developed by Clebsch and Rayleigh in 1871-1872, [46]), [36] developed a tool able to simulate the sound pressure on a sphere. By considering microphones placed at locations on this sphere corresponding to the ear positions on a human head, one can simulate simple HRIRs or even BRIRs, including the reflections causing reverberation. Very flexible BRIRs can be simulated by arbitrarily distributing microphones on a rigid spherical microphone array. This tool is called Spherical Microphone array Impulse Response (SMIR) generator and is available on the web, [36].

2.3 binaural room impulse responses for hearing aids

The developed BRIR generator has been created by modifying and expanding a version of the SMIR generator proposed in [36], including:

• an implementation of the image method considering a rectangular empty room (with the exception of the listener);

• angle-dependent reflection coefficients instead of the frequency-independent, real-valued, angle-independent pressure wall reflection coefficients introduced in Eq. (1): Sec. 2.3.1;

• the possibility to choose between HRTFs and the rigid sphere model.

The main new extension developed in the context of this work is:

• the integration of HRTFs for hearing aids using the Oldenburg database, as explained in Sec. 2.3.2.

These aspects are detailed in the following.

2.3.1 Angle Dependent Reflection Coefficient

In Allen & Berkley's TF, only real-valued, frequency- and angle-independent pressure wall reflection coefficients (0 ≤ β ≤ 1) are considered. In fact, several acoustical investigations showed that the reflection coefficients of porous surfaces strongly depend on frequency and on the angle of incidence, [17]. The frequency- and angle-dependent plane wave reflection coefficient is given by [13]:

\beta(f, \theta) = \frac{\sin\theta - \frac{Z_1(f)}{Z_2(f)} \left(1 - \frac{k_1^2}{k_2^2} \cos^2\theta\right)^{0.5}}{\sin\theta + \frac{Z_1(f)}{Z_2(f)} \left(1 - \frac{k_1^2}{k_2^2} \cos^2\theta\right)^{0.5}}    (4)


where:

β(f, θ)   plane wave reflection coefficient;
Z1(f)     characteristic acoustic impedance of the air (Pa · s/m);
Z2(f)     characteristic acoustic impedance of the surface (Pa · s/m);
k1        wave number of the sound field in the air;
k2        wave number of the sound field in the material of the reflecting surface;
θ         angle of incidence on the surface;
f         frequency.

In a benchmark study [19], Delany and Bazley proposed empirical expressions, later amended by Miki [53] and then again by Komatsu [43], based on experimental data, for calculating the frequency-dependent values of the characteristic acoustic impedance Z(f) and of the wave number k, given the flow resistivity σ, [43]:

\frac{Z_2(f)}{Z_1(f)} = 1 + 0.00027 \left(2 - \log\frac{f}{\sigma}\right)^{6.2} - 0.0047\,i \left(2 - \log\frac{f}{\sigma}\right)^{4.1}    (5)

\frac{k_2}{k_1} = 0.0069 \left(2 - \log\frac{f}{\sigma}\right)^{4.1} + i \left(1 + 0.0004 \left(2 - \log\frac{f}{\sigma}\right)^{6.2}\right)    (6)

The flow resistivity σ (Pa · s/m²) fully characterizes the acoustic behavior of a specific material. In a more technical sense, the flow resistivity measures how easily air can enter a porous absorber surface, and the resistance that the air flow meets through the structure. In this way, it describes how much sound energy is absorbed within the material because of boundary effects. A relevant advantage of this model is that the reflection coefficient depends on a single parameter σ only, i.e. no material thickness is required. Typical values for σ are shown in Table 1.

2.3.2 The Oldenburg Database

The Oldenburg database ([40]) is an eight-channel database of HRIRs, measured with three-channel BTE hearing aids and an in-ear microphone (Fig. 7) at both ears of a human head and torso simulator (commonly referred to as dummy head or mannequin) in an anechoic chamber. In addition, sets of BRIRs for some natural reverberant environments, reflecting daily-life communication situations, are provided.


Types of Ground      Range                   Average
Upper limit          250,000 - 2,500,000     800,000
Concrete, painted    200,000                 200,000
Asphalt, new         5,000 - 15,000          10,000
Sand                 40 - 906                550
Grass                125 - 300               200
Snow                 1.3 - 50                29

Table 1: Typical effective flow resistivity values, expressed in kPa · s/m². A more complete list and description can be found in [17].

Figure 7: BTE hearing aid and in-ear microphone mounted on a dummy head as done for the Oldenburg database.

Even though not exploited in this context, they are important tools in the evaluation of acoustic signal processing algorithms and of reverberation simulations (Sec. 5.7).

The anechoic HRTFs are used for the developed generator. They are combined with the Image Method to simulate not only the BRTFs to the entrance of the listener's ear canals, but also those to the listener's BTE hearing aids. Indeed, the developed BRIR generator for hearing aids uses Eq. (3) together with the Oldenburg database and provides 8 channels: 6 of them are related to the microphones of the 2 BTE hearing aids (one per ear), and the remaining 2 to the in-ear microphones.

The Oldenburg database also provides information about the loudspeaker used to measure the HRIRs. The IR of the loudspeaker has been measured by a reference microphone at the head-and-torso simulator position in the anechoic chamber. So the BRIRs generated by our tool are equalized to a linear frequency response using this reference IR, to mitigate influences of the loudspeaker. In Fig. 8, both the TF of the loudspeaker and the proposed equalization TF are plotted. The equalizer or compensator (referred to as COMP in the following formula) acts on each HRTF: referring to Eq. (3), one can replace HRTF(ω, θ) with the following expression:

Figure 8: The TF of the loudspeaker used during the measurements of the Oldenburg database is plotted in blue. The equalizer used in the proposed BRIR generator is plotted in red.

\mathrm{HRTF}(\omega, \theta) = \mathrm{HRTF}_{\mathrm{Oldenburg}}(\omega, \theta) \, \mathrm{COMP}_{\mathrm{Oldenburg}}(\omega)    (7)

where:

HRTF_Oldenburg(ω, θ)   HRTF from the Oldenburg database;
COMP_Oldenburg(ω)      loudspeaker compensation TF.

The loudspeaker compensation TF COMP_Oldenburg(ω) (plotted in red in Fig. 8) is obtained by inverting the loudspeaker TF (plotted in blue) and applying a low-pass filter.

Now, to compute the BRTFs, the HRTFs are required for all continuous angles. The problem is that HRTF_Oldenburg(ω, θ_pm) is measured only for certain angles: e.g., the measurements are taken every 5 degrees in the azimuthal plane and for just 4 different elevations. For angles in between, the generator performs a bilinear interpolation between the 4 nearest data points, in order to make any angle available, as depicted in Fig. 9.
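The bilinear interpolation of Fig. 9 can be sketched as follows; the grid spacing and the corner values are hypothetical stand-ins for measured HRTF samples:

```python
def bilinear_hrtf(az, el, az0, az1, el0, el1, hA, hB, hC, hD):
    """Interpolate at (az, el) from 4 grid corners:
    hA at (az0, el0), hB at (az1, el0), hC at (az0, el1), hD at (az1, el1)."""
    ta = (az - az0) / (az1 - az0)   # fractional position in azimuth
    te = (el - el0) / (el1 - el0)   # fractional position in elevation
    bottom = (1 - ta) * hA + ta * hB   # interpolate along the lower edge
    top = (1 - ta) * hC + ta * hD      # interpolate along the upper edge
    return (1 - te) * bottom + te * top

# Halfway between grid points, the result is the average of the 4 corners
h = bilinear_hrtf(2.5, 5.0, 0, 5, 0, 10, 1.0, 2.0, 3.0, 4.0)  # → 2.5
```

In the generator, the same weights would be applied per frequency bin to the complex HRTF samples of the 4 nearest measured directions.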


Figure 9: The HRTF corresponding to point h_E (at azimuth θ and elevation φ) is obtained by bilinear interpolation from the measured data points h_A, h_B, h_C, and h_D.

In addition, the generator allows switching between the HRTFs of the Oldenburg database and those of the MIT KEMAR database, [25], mentioned in Sec. 2.2. In the latter case, just the 2 in-ear signals are produced. On the other hand, the advantage of the MIT KEMAR database is that it provides an almost full spherical set of HRTFs, while in the Oldenburg database only positions with elevations of −10°, 0°, 10°, and 20° are available.

Moreover, one can decide to use the spherical head model instead of the dummy-head HRTFs. The former method leads to perceptually less accurate results but is faster; the latter is definitely perceptually superior, but pays the cost of greater complexity.

The source code in Listing 1 shows an example of calling the developed BRIR generator; the result, concerning the in-ear channels only, is shown in Fig. 10. The head shadow on the right side is clearly noticeable. The room setup of these RIRs is shown in Fig. 11, where one can also notice that both the source and the receiver are not placed in the vicinity of the walls, in order to obey the far-field assumption.


Figure 10: Two channels of the generated BRIRs and BRTFs (left and right in-ear) for the situation described by Fig. 11. Listing 1 shows the related source code.

Figure 11: Room setup of the generation of Fig. 10: the red cross depicts the source, while the blue circle stands for the listener. The line coming out of the circle can be thought of as the nose of the listener, representing the looking direction.


2.4 conclusions

The proposed generator was completely implemented using a commercial software package (MATLAB® [52]). A schematic of the main structure of this tool is presented in Fig. 12: parameters concerning the room setup and the source and receiver positions are used by the Image Method to calculate the image sources. Each image source simulates a reflection with an individual DOA (azimuth θ and elevation φ) and is therefore filtered by the corresponding HRTF for all microphones. In the illustration, each hearing aid is supposed to have M microphones; as a consequence, 2M signals are produced. As a last step, all the reflections relative to the same sensor are summed up.

The developed tool is able to simulate realistic BRIRs for hearing aids. It mostly takes advantage of these constitutive blocks:

• the Image Method as its core (Sec. 2.1), but with frequency- and angle-dependent reflection coefficients (Sec. 2.3.1);

• the Oldenburg multichannel database of binaural HRIRs for BTE hearing aids (Sec. 2.3.2). As an alternative to the Oldenburg database, the spherical model of the human head can also be used (Sec. 2.2).


Figure 12: Schematic of the structure of the proposed generator of BRIRs for hearing aids: the Image Method takes the room characteristics and the source and receiver positions, computes the image sources (Image 0 ... Image N), filters each by the HRTF at its DOA (θ, φ) taken from the database of HRTFs for hearing aids, and sums the per-channel contributions into BRTF_1 ... BRTF_2M.


Listing 1: Calling the proposed BRIR generator via MATLAB® [52] code.

c = 343;                        % Sound velocity (m/s)
procFs = 22000;                 % Sampling frequency (Hz)
head_pos = [1.5 2 1.8];         % Listener location (x,y,z) in m

% Coordinate system (azimuth):
%    0  front of the head
%   90  left side of the head
%  -90  right side of the head
azimuth = 50;                   % Source position w.r.t. listener position
elevation = 0;
distance = 1.5;

L = [5 6 4];                    % Room dimensions (x,y,z) in m
sigma = 20*10^4*[1 1 1 1 0 0];  % Effective flow resistivity
nsample = 8*1024;               % Length of desired RIR

% Select mode to use: default is oldenburg_compensated
mode = 'oldenburg_compensated';
%mode = 'oldenburg';
%mode = 'MITkemar';
%mode = 'headSphericalModel';

refl_order = Inf;               % Maximum reflection order

[src(1),src(2),src(3)] = sph2cart(azimuth/180*pi, elevation/180*pi, distance);
src = src + head_pos;

%% Run simulation
[h, H] = brir_generator(c, procFs, head_pos, src, L, sigma, nsample, mode, refl_order);

% Outputs:
% h   nChannel x nsample       matrix containing the calculated BRIRs
% H   nChannel x (nsample/2+1) matrix containing the calculated BRTFs
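The sph2cart call converts the (azimuth, elevation, distance) description of the source into Cartesian coordinates, which are then shifted by the head position. A minimal Python re-implementation of this step, following MATLAB's convention (azimuth measured in the x-y plane, elevation from it, angles in radians):

```python
import math

def sph2cart(azimuth, elevation, r):
    """Spherical to Cartesian coordinates, MATLAB convention (radians)."""
    x = r * math.cos(elevation) * math.cos(azimuth)
    y = r * math.cos(elevation) * math.sin(azimuth)
    z = r * math.sin(elevation)
    return x, y, z

# Source at 50 deg azimuth (front-left), 0 deg elevation, 1.5 m distance
sx, sy, sz = sph2cart(math.radians(50), 0.0, 1.5)
head_pos = (1.5, 2.0, 1.8)
src = tuple(hp + s for hp, s in zip(head_pos, (sx, sy, sz)))
```

With this convention a positive azimuth moves the source toward the listener's left, consistent with the coordinate-system comments in Listing 1.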


3 Dereverberation by Spatial Filtering for Hearing Aids

This chapter deals with approaches to recover the clean speech signal from signals degraded by room reverberation and noise: the aim is to jointly reduce reverberation and noise in order to improve speech intelligibility in reverberant environments.

Indeed, noise and reverberation reduction is useful in many different applications (e.g. hands-free devices) and is of fundamental importance for hearing aids. In particular, late reverberation has a noise-like nature that may cause overlap-masking effects, degrading speech intelligibility [8], [42].

Modern hearing aids typically feature more than one microphone, enabling the use of multi-microphone algorithms, which can exploit not only spectral and temporal information but also spatial information about the sound scene. It should be noted that the objective of a binaural signal enhancement algorithm is not only to selectively extract the desired signal and to suppress noise and reverberation, but also to preserve the binaural auditory impression of the sound scene. The final goal is to provide a realistic listening experience for hearing-impaired persons. This is not only a matter of comfort: offering the proper spatial information to the cognitive stages of the auditory system can result in better intelligibility, as observed in Sec. 1.1.

This aspect is referred to as the binaural hearing advantage, and its usefulness is shown in [47] and [59]. In [59] the ear is said to be intelligent, since the human hearing system can deeply exploit spatial information. For example, the brain can use time and intensity differences between the ears to localize sources and filter out the unwanted ones (an ability also known as the cocktail party effect). For these reasons, it is important not to distort this kind of information. In addition, confusion due to a mismatch between acoustical and visual information has to be avoided.
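These interaural cues can be estimated directly from a pair of ear signals. A minimal illustrative sketch (not part of the thesis implementation): the interaural time difference (ITD) is taken as the lag maximizing the cross-correlation of the two channels, and the interaural level difference (ILD) as their power ratio in dB.

```python
import numpy as np

def itd_ild(left, right, fs):
    """Estimate broadband interaural cues from two ear signals.

    ITD: lag (in seconds) maximizing the cross-correlation;
         positive when the left channel lags the right one.
    ILD: left-over-right power ratio in dB.
    """
    xcorr = np.correlate(left, right, mode='full')
    lag = int(np.argmax(xcorr)) - (len(right) - 1)
    itd = lag / fs
    ild = 10 * np.log10(np.sum(left**2) / np.sum(right**2))
    return itd, ild

# Toy check: left is the right channel delayed by 3 samples, at half amplitude
fs = 8000
right = np.zeros(64); right[10] = 1.0
left = np.zeros(64);  left[13] = 0.5
itd, ild = itd_ild(left, right, fs)
```

A binaural enhancement algorithm that preserves these cues should leave the ITD and ILD of each source essentially unchanged between its input and output signals.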

It has been reported in [7] that hearing-impaired individuals localize sounds better without their bilateral hearing aids than when using them (with noise reduction schemes that are not designed to preserve the localization cues). It should also be pointed out that, in some cases, such as in street traffic, incorrect sound localization may even be dangerous.

In order to preserve the binaural auditory impression of the sound scene, it is required to preserve the binaural cues of all sound sources in the acoustical scene. There are two main approaches that can be followed. The first method suggests using multiple microphones to
