
Accuracy, recording interference, and articulatory quality of headsets for ultrasound recordings


Academic year: 2021


Contents lists available at ScienceDirect

Speech Communication

journal homepage: www.elsevier.com/locate/specom

Michael Pucher a,∗, Nicola Klingler a, Jan Luttenberger a, Lorenzo Spreafico b

a Acoustics Research Institute (ARI), Austrian Academy of Sciences (ÖAW), Vienna, Austria

b Department of Foreign Languages, Literatures and Cultures (DLLCS), University of Bergamo, Italy

Article info

Keywords: Ultrasound tongue imaging; Articulatory phonetics

Abstract

In this paper we evaluate the accuracy, recording interference, and articulatory quality of two different ultrasound probe stabilization headsets: a metallic Ultrasound Stabilisation Headset (USH) and UltraFit, a recently developed headset that is 3D printed in Nylon. To evaluate accuracy, we recorded three native speakers of German with different head sizes using an optical marker tracking system that provides sub-millimeter tracking accuracy (NaturalPoint OptiTrack Expression). The speakers had to read C1V1C2V1/2 non-words (to diminish lexical influences) in three conditions: wearing the USH headset, wearing the UltraFit headset, and without a headset. To estimate the relative headset movement, we measured the movement between tracked points on the probe, headset, and speaker’s nose. By also tracking visual marker points on the speaker’s lip and chin, we compared the movement of the outer articulators with and without a headset and, thereby, measured how the headsets interfere with the articulatory space of the speaker. Additionally, we computed the differences in tongue profiles at the acoustic midpoint of V1 under the three conditions and evaluated the articulatory recording quality with a distance index and an area index. In the final evaluation, we also compared formant measurements of recordings with and without headsets. With this objective evaluation we provide a systematic analysis of different headsets for Ultrasound Tongue Imaging (UTI) and also contribute to the discussion of using UTI stabilization headsets for recording natural speech. We show that both headsets have a similar accuracy, with the USH performing slightly better overall but introducing the largest error for one speaker, and that the UltraFit headset shows more flexibility during recordings. Each headset influences the lip opening differently. Concerning the tongue movement, there are no significant differences between different sessions, showing the stability of both headsets during the recordings. Acoustic analysis of formant differences in vowels revealed that the USH headset has a larger influence on formant production than the UltraFit headset.

1. Introduction

Ultrasound Tongue Imaging (UTI) is a medically derived technique developed within articulatory phonetics to study real-time and offline tongue movements during speech (Stone, 2005). In the last decade, the technique, which appeared on the scene in the early 1980s (Shawker and Sonies Phd, 1984), has made progress both on the technical side, with the introduction of systems that perform increasingly well in terms of spatial and temporal resolution (de Jong et al., 2019), and on the methodological side, with the development of increasingly informative techniques for the analysis of static and dynamic data (Pini et al., 2019).

∗ Corresponding author.

E-mail addresses: michael.pucher@oeaw.ac.at (M. Pucher), nicola.klingler@oeaw.ac.at (N. Klingler), jan.luttenberger@oeaw.ac.at (J. Luttenberger), lorenzo.spreafico@unibg.it (L. Spreafico).

URL: http://www.sociolectix.org (M. Pucher)

In addition to analyses of articulatory phonetics, the UTI technique is also well suited to technological (Hueber et al., 2010; Fabre et al., 2017), educational (Wilson and Gick, 2006; Nakai et al., 2016; Ribeiro et al., 2019) and clinical (Preston et al., 2016) applications. Among the technological applications, the most interesting ones are silent speech interfaces, which are systems that allow speech communication without audible vocalization (Bruce et al., 2010).

Among the educational and clinical applications, interfaces have been developed that allow for visualization of the tongue profile, also in mixed reality environments, which can improve speech articulation thanks to the positive action of the visual feedback (Eleanor et al., 2019).

https://doi.org/10.1016/j.specom.2020.07.001

Received 4 November 2019; Received in revised form 7 May 2020; Accepted 6 July 2020 Available online 11 July 2020


Fig.1. Exploded view of the UltraFit system.

Whatever the field of application of UTI, one of the open issues is the stabilization of the ultrasound probe under the chin of the speaker to enable the definition of a fixed reference system for analysis or observations (Davidson and Decker, 2005). Although the resolution of this problem is felt differently by those who use the technique for research purposes or clinical practice, in recent years a number of ideas have been proposed to solve it; these include, for example, the usage of mechanical systems (Stone and Davis, 1995; Scobbie et al., 2008; Davidson and Decker, 2005; Cai et al., 2011; Derrick et al., 2015; 2018); or software (Whalen et al., 2005); or simply holding the ultrasound probe by hand (Zharkova et al., 2015).

In this contribution we intend to deepen the evaluation of one of those solutions, the UltraFit headset (see Fig. 1) developed by Matosova (2016) and subsequently perfected and marketed by Articulate Instruments, and compare it with the Ultrasound Stabilisation Headset (USH) developed and marketed until 2018 by the same company (Scobbie et al., 2008; Articulate Instruments Ltd., 2008). The evaluation and comparison are relevant because the USH is among the most used stabilization devices in articulatory phonetics laboratories around the world.

The two headsets differ first in the material with which they are made: UltraFit is made of nylon, a synthetic polymer, while USH is made of aluminium, a nonmagnetic metal. A previous paper (Spreafico et al., 2017) described the process of developing the UltraFit headset and analyzed its usability.

The difference in the choice of materials has repercussions for many other aspects. First, it affects the shape of the UltraFit headset, because the polymer can be printed in 3D, enabling the headset to obtain a more organic shape, which is better with regard to both the fit and the maneuverability of the headset during setup, as well as with regard to the stability of the headset. Second, the choice of the polymer has positive repercussions for the weight, which is lower than that of the metal headset. This is likely to be reflected in greater tolerability during prolonged sessions of use. Finally, the choice has an advantage in terms of integration with other techniques for investigating speech articulation. If necessary, the headset can be assembled without using metallic screws and bolts, thus, for example, ensuring compatibility in data collection sessions involving the use of Electromagnetic Articulography (EMA) or Magnetic Resonance Imaging (MRI).

Despite the many advantages, the accuracy of the measurements achievable using the UltraFit system remained to be tested. A preliminary assessment of the stability and accuracy of UltraFit was made by recording a speaker and showing that the overall error range of the headset movement for this speaker lay within 3 mm, with most errors lying in a 1–2 mm range (Spreafico et al., 2018).

Hence, in this paper we compare the accuracy of two different headsets (USH and UltraFit) by using data from three different speakers analyzed in reference to visual data about the movements of the headset. These movements were detectable externally using an optical tracking system. Additionally, we report acoustic data on the production of vowels and articulatory data on discrepancies detectable in the positioning of the tongue. Furthermore, we compare the results to acoustic and visual recordings of natural speech with and without wearing the headset.

This paper is organized as follows: In Section 2 we describe the visual, articulatory, and acoustic recordings. In Section 3 we present the analysis based on the visual data, which shows the accuracy of the headset and its influence on mouth opening. Section 4 contains the analysis of the headsets based on articulatory data and Section 5 those based on formants derived from acoustic data. Section 7 concludes the paper.

2. Data elicitation

For the evaluation of the accuracy of the two headsets shown in Fig. 2, we designed and ran a dedicated experiment. The experiment involved three informants. All informants were native speakers of Standard Austrian German or Standard German German. The informants were characterized by having heads of different circumference, so as to highlight whether this parameter affects the stability of the helmet and therefore the accuracy of the measurement. The first speaker (spk1), female, had a small head size (53 cm in circumference); the second (spk2), male, had an average circumference (57 cm); the third (spk3), male, had a large circumference (60 cm).

Each speaker was seated in front of a computer in a semi-anechoic booth, and was instructed to read aloud the stimuli presented to him/her. The stimuli consisted of the following non-words of the type C1V1C2V1/2 repeated three times: /’paka ’paka ’paka/, /’taka ’taka ’taka/, /’tuki ’tuki ’tuki/, /’tipi ’tipi ’tipi/. Each non-word, pronounced with a trochaic stress in accordance with German phonotactics, was repeated three times during each recording session.

Each session began with a silence trial in which the speakers were instructed to keep the tongue in rest position and ended with a swallow trial. Each speaker attended three recording sessions: one wearing the metal helmet, one wearing the polymer helmet, one without helmets. During each session visual and synchronized articulatory and acoustic data was collected. Altogether, the database contains 648 trials, namely 216 for each speaker.

Fig. 2. UltraFit headset (left) and Ultrasound Stabilisation Headset (USH) (right).

Fig. 3. Visual marker configuration (top). Video still from recordings (bottom). Natural - spk2 (left column), UltraFit - spk3 (middle column), and USH - spk1 (right column) recording condition.

[Fig. 4 panels: nose-nose distance during speech (sp). Range (mm) per panel, first for the 2.5th to 97.5th percentile, then for the 25th to 75th percentile. Top row: spk1 0.9, 0.1; spk2 0.3, 0.1; spk3 0.4, 0.1. Bottom row: spk1 1.1, 0.6; spk2 1, 0.1; spk3 0.7, 0.2.]

Fig. 4. Distribution of Euclidean distance between nose marker 1 and nose marker 2 for speech in UltraFit (top), and USH (bottom) condition.

2.1. Visual recordings

Facial movement was recorded using a NaturalPoint OptiTrack Expression system with seven FLEX:V100R2 infrared cameras. This system records the 3D position of reflective markers glued to the speaker’s face at 100 Hz.

We recorded the speakers without headset (natural), with the UltraFit headset (UltraFit) and with the USH (USH). The helmets were fixed by the same operator as firmly as possible to the head of the speakers. Regarding UltraFit, the auxiliary Velcro straps were not used to stabilise the probe arm laterally. The natural recordings were made to compare the lip opening with and without headsets. Depending on the recording condition we glued markers to the speaker’s nose, the lips and jaw, the headset, and the ultrasound probe as shown in Fig. 3 for one speaker.¹

Here we only need a reduced set of markers; in previous work we used this system to record a full set of facial markers for facial animation (Schabus et al., 2014).

Additionally we also use the four headband markers that are used to remove head movement from the recordings. For the evaluation we use the output of the system directly without applying any manual corrections. Table 1 also shows the different marker configurations for the different conditions. Markers on the nose are also used to measure the inherent error of the system, distances between nose and probe markers are used for measuring the error of the recordings, and distances between lip markers are used to compare mouth opening.

Table 1
Number of markers per location / condition.

Location    Natural    UltraFit/USH
Headband    4          4
Nose        2          2
Lips        2          2
Jaw         3          3
Headset     0          2
Probe       0          2
Total       11         15

¹ The adhesive tape on the USH headset was used to cover glossy parts and thereby improve the visual tracking robustness.

The different head sizes of the speakers are also indicated by the distances between nose and probe markers in Fig. 5. Fig. 3 shows the different marker configurations for spk1 - spk3. Spk3 is a Standard German German speaker, spk1 and spk2 are Standard Austrian German speakers.

2.2. Articulatory recordings

The analysis of articulatory data is of value because it can lead to the detection of differences between the two headsets that are not detectable by the analysis of visual recordings only. While the visual recordings referred to in Sections 2.1 and 3 are based on the observation of reflective markers in direct or indirect contact with the skin (which is independent from the position of the tongue), those referred to in this Section and Section 4 are based on the observation of ultrasound recordings from the probe, the fixation of which is why the headsets were developed.

[Fig. 5 panels: nose-probe distance. Range (mm) per panel, first for the 2.5th to 97.5th percentile, then for the 25th to 75th percentile. Speech (sp): spk1 3.3, 0.9; spk2 2.7, 1.2; spk3 3.5, 1. Silence (sil): spk1 1.1, 0.4; spk2 2.5, 1.8; spk3 0.6, 0.3.]

Fig. 5. Distribution of Euclidean distance between nose marker 1 and probe marker 1 for speech (top) and silence (bottom) in the UltraFit condition. Spk1 (left column), spk2 (middle column), spk3 (right column).

The experimental data concern only the sets of recordings in which the speakers wore headsets. In fact, collecting ultrasound data by fixing the probe under the speaker’s chin by hand would not guarantee reliable results for comparing the accuracy of the stabilization devices.

The articulatory recordings were made using the Micro Speech Research Ultrasound System (Articulate Instruments Ltd., 2017b) marketed by Articulate Assistant Advanced™ coupled to the 5–8 MHz microconvex probe. The weight of the probe and two-thirds of the length of the hanging cable, excluding the weight of the connector, is 0.17 kg. This value should be taken into account because recent modelling work (Canella, 2019) has shown that the stability of UltraFit (and, therefore, accuracy) is strongly influenced by the mass of the ultrasound probe. Ultrasound tongue imaging data was recorded with a fixed field of view of 150 degrees, at depths varying from 70 mm to 80 mm, at a sampling rate varying from 85 fps to 95 fps. The collected data was analyzed using the Articulate Assistant Advanced™ software (AAA, v.2.17.10; Articulate Instruments Ltd., 2017a).

With regard to the recording of articulatory UTI data, it is necessary to highlight some possible methodological criticalities. In fact, all the sessions took place on the same day and involved the same researchers, so as to try to partly mitigate the problems related to the reproducibility and repeatability requirements of data analysis involving biomarkers (Toeger et al., 2017).

Unfortunately, in an absolute sense this was impossible. In particular, the most difficult factors to control were operator variability, technical variability and image analysis variability for articulatory UTI data. With reference to the first factor, it was possible to exclude the inter-operator variability because the set-up of the articulatory instrumentation was entrusted to two researchers specialized in the practice. However, given the duration of the experiments, it was not possible to control or estimate a possible intra-operator variability.

With reference to the second factor, technical variability, the anatomical differences of the subjects concerning both the size of the head (intentional) and the shape of the chin and the mouth cavity (non-intentional) did not allow the repositioning of the ultrasound probe in anatomically identical positions for each of the three subjects. However, during the elicitation phase of the articulatory data (a subject that will be discussed in more detail in Section 4) an attempt was made to find a functional correspondence of the images, orienting the ultrasound probe so as to include the points of contact between tongue and palate for the consonants /k/ and /t/. Since for our study we are mainly interested in the within-speaker rather than the between-speaker variability, the technical variability is less critical.

Finally, with reference to the third factor, for the processing of the articulatory data and the extraction of the relative values it was necessary to adopt a semi-automatic analysis technique that also requires the initial manual definition of the tongue profile. Although the operation was entrusted to the same researcher, also in this case it is not possible to exclude a possible intra-operator variability.

2.3. Acoustic recordings

Acoustic recordings were conducted using a desk microphone and a USB audio interface (Focusrite Scarlett Solo) with a sampling rate of 44.1 kHz in a semi-anechoic booth.

3. Analysis of visual data

3.1. Accuracy

[Fig. 6 panels: nose-probe distance. Range (mm) per panel, first for the 2.5th to 97.5th percentile, then for the 25th to 75th percentile. Speech (sp): spk1 1.7, 1; spk2 1.2, 0.4; spk3 4.2, 1.2. Silence (sil): spk1 1.3, 0.8; spk2 1.2, 0.4; spk3 1, 0.2.]

Fig. 6. Distribution of Euclidean distance between nose marker 1 and probe marker 1 for speech (top) and silence (bottom) in the USH condition. Spk1 (left column), spk2 (middle column), spk3 (right column).

Measuring the distance between different visual markers allows for the measurement of the movement of the headset relative to the speaker’s head. With no movement the distance of the markers should stay fixed with no variance. To measure the movement of the probe in relation to the speaker’s head we measure the distance between a marker on the nose and a marker on the probe. To measure the internal error of the visual tracking system we measure the distance between the two nose markers. These two types of measures allow us to evaluate the accuracy of the headsets.

To measure the error of the recording setup we measured the distance between both nose markers, assuming that there is only little change in distance between the nose markers. Small changes are possible between the nose markers when the speaker produces a facial movement that involves the nose.

Any changes in the nose-nose distance measurements can therefore be attributed to the visual tracking hardware and software, or to small movements of the nose. The distribution of the nose-nose error is shown in Fig. 4. The range in millimeters (mm) in the title of each sub-figure is given for the 2.5th to the 97.5th percentile (first number) and for the 25th to 75th percentile of the data (second number).
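The two range values reported per sub-figure can be reproduced from raw marker tracks along the following lines. This is only a minimal sketch: the function names, the input layout (one `(x, y, z)` tuple per frame) and the linear-interpolation percentile are assumptions for illustration, not the authors' processing code.

```python
from math import dist

def percentile(sorted_vals, q):
    """Percentile (q in 0..100) of a sorted list, using linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def distance_ranges(marker_a, marker_b):
    """Return (95% range, 50% range) of the frame-wise Euclidean distance.

    marker_a and marker_b are hypothetical per-frame (x, y, z) tracks of two
    markers, e.g. nose marker 1 and probe marker 1. Units follow the input.
    """
    d = sorted(dist(a, b) for a, b in zip(marker_a, marker_b))
    range_95 = percentile(d, 97.5) - percentile(d, 2.5)
    range_50 = percentile(d, 75) - percentile(d, 25)
    return range_95, range_50
```

With perfectly rigid markers both ranges would be zero, so any non-zero value reflects tracking error or relative movement.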

We can see that the error is between 0.1 mm and 0.6 mm for 50% of the data for all speakers, and between 0.3 mm and 1.1 mm for 95% of the data for all speakers, such that we can conclude that the system performs with sub-millimeter accuracy almost all the time. For the USH condition (bottom) there is a slightly larger error of 1.1 mm (spk1), 1 mm (spk2), and 0.7 mm (spk3) for the 2.5th to 97.5th percentile.

Figs. 5 and 6 show the Euclidean distances between the 3D points nose marker 1 and probe marker 1 for the whole recording session for the three speakers. This shows the error of the UTI headset during the recording session.

We can see that the maximum error in the 2.5th to 97.5th percentile is 3.5 mm for UltraFit and 4.2 mm for USH. The values for UltraFit in the 2.5th to 97.5th percentile range from 0.6 to 3.5 mm, for USH from 1.0 to 4.2 mm. The head circumference can also be indirectly seen in the distances between nose and probe markers on the y-axis for the UltraFit condition; for the USH condition this relation does not hold due to different placements of the ultrasound probe for the three speakers.

We can see that the largest error occurs for the USH condition with 4.2 mm for spk3, although this speaker has a low intrinsic error of 0.6 mm as shown in Fig. 4. This shows that the nose-probe error is not dependent on the nose-nose error.

An F-test was performed to test if the samples are from a distribution with the same variance and significant differences (p < .001) were found for all speakers between the two conditions (UltraFit vs. USH). For spk1 and spk2 USH shows a higher accuracy, for spk3 UltraFit shows a higher accuracy. Since the accuracy values of both conditions are in a similar range we may conclude that both headsets have a similar performance with the USH being slightly better.
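For readers who want to see what such a variance comparison involves, the sketch below computes a two-sample F-test for equal variances using only the Python standard library. It assumes a two-sided test and evaluates the F distribution through the regularized incomplete beta function (continued-fraction method); the paper does not state which software or sidedness was used, so all names here are illustrative.

```python
from math import exp, lgamma, log
from statistics import variance

def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
             + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b

def variance_f_test(x, y):
    """Return (F statistic, two-sided p-value) for H0: equal variances."""
    f = variance(x) / variance(y)
    d1, d2 = len(x) - 1, len(y) - 1
    cdf = reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))
    return f, 2.0 * min(cdf, 1.0 - cdf)
```

Applied to the nose-probe distances of one speaker in the two headset conditions, a small p-value would indicate that the spread of the distances, and hence the amount of probe movement, differs between the headsets.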

Figs. 7 and 8 show the distributions for the individual coordinates (x, y, z). This shows the error of the ultrasound headset in the different dimensions. To measure the movement in the different coordinates we have to remove the head movement first. This is done by using the four points of the headband, although we observed small movements of the headband during the recordings due to movements of the forehead. This is likely to have introduced errors in the numbers shown in Figs. 7 and 8. The fixing of the headband markers was easier with the USH than with the UltraFit condition. Furthermore, one change of position of the headband markers introduces an error that is then present during the rest of the recordings.
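The text does not specify how head movement was removed from the marker data. One common approach, shown here purely as an assumed illustration, is to build a head-fixed coordinate frame from the headband markers in every frame and to express the probe marker in that frame. The helper names and the choice of which markers define the axes are hypothetical.

```python
from math import sqrt

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = sqrt(dot(a, a))
    return tuple(x / n for x in a)

def head_frame(headband):
    """Origin (marker centroid) and orthonormal axes for one video frame.

    The axes are derived from the first three headband markers; this is an
    assumption of the sketch, not the paper's documented procedure.
    """
    n = len(headband)
    origin = tuple(sum(m[i] for m in headband) / n for i in range(3))
    x_axis = unit(sub(headband[1], headband[0]))
    z_axis = unit(cross(x_axis, sub(headband[2], headband[0])))
    y_axis = cross(z_axis, x_axis)
    return origin, (x_axis, y_axis, z_axis)

def to_head_coords(point, headband):
    """Express a marker position in the head-fixed frame of the same frame."""
    origin, axes = head_frame(headband)
    rel = sub(point, origin)
    return tuple(dot(rel, axis) for axis in axes)
```

A consequence visible in this sketch is the one the text points out: if a headband marker shifts on the forehead, the frame itself changes and the error propagates to every probe coordinate computed afterwards.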

[Fig. 7 panels: coordinates of probe marker 1 during speech. Range (mm) per panel, first for the 2.5th to 97.5th percentile, then for the 25th to 75th percentile. spk1: x 4.1, 2.2; y 9, 2.9; z 22.2, 12.1. spk2: x 13.9, 2; y 13.2, 1.9; z 33.6, 7.7. spk3: x 4.3, 1.5; y 3.3, 1.1; z 5.2, 1.9.]

Fig. 7. Distribution of individual coordinates (x, y, z) for probe marker 1 for speech in the UltraFit condition after head movement removal.

[Fig. 8 panels (caption not recovered from the source): coordinates of probe marker 1 during speech. Range (mm) per panel, in the same order as above. spk1: x 7.2, 4.5; y 7.2, 4.8; z 12.9, 4.5. spk2: x 5.1, 0.7; y 9.1, 1.3; z 6.8, 4.2. spk3: x 2.2, 0.7; y 5.4, 2.9; z 8.8, 5.8.]

[Fig. 9 panels: nose-probe distance for the first silence (sil1), first speech (sp1) and fourth speech (sp4) recording. Range (mm) per panel, in the same order as above. spk1: sil1 2.9, 0.7; sp1 3.1, 0.6; sp4 3.4, 0.7. spk2: sil1 0.2, 0.1; sp1 1.6, 0.4; sp4 1.2, 0.4. spk3: sil1 0.5, 0.1; sp1 3.5, 1.1; sp4 3.5, 1.1.]

Fig. 9. Distribution of Euclidean distance between nose marker 1 and probe marker 1 for silence1, speech1, and speech4 in the UltraFit condition.

Table 2
Distribution of Euclidean distance between nose marker 1 and probe marker 1 for speech before and after head movement removal for the UltraFit headset.

                        Spk1    Spk2    Spk3
Range before removal    3.3     2.7     3.5
Range after removal     9       13.5    2.6
Difference              -5.7    -10.8   0.9

The large errors especially for the UltraFit condition (Fig. 7) of 22.2 mm for spk1 and 33.6 mm for spk2 can be explained by the errors introduced through head movement removal. If we simply compute the distances between nose and probe marker from the data where head movement was removed and compare them with the distances in Fig. 5 we get 3.3 vs 9.0 mm (speech of spk1) and 2.7 vs 13.5 mm (speech of spk2) as shown in Table 2.

What we can still infer from Figs. 7 and 8 is that the largest error lies in the z-direction, that is, from the speaker’s head in the direction of the microphone.

To investigate the dynamics of the recordings, Figs. 9 and 10 show the distances for the first silence and speech, and last speech recordings for the UltraFit and USH conditions. In this way we can evaluate if there are changes during the recording session.

As can be seen in Fig. 9, the UltraFit is more flexible since it allows for expansion during the recording session, starting from a smaller size in the first silence and then expanding during the recording. For the USH, in comparison, the median values do not change as much during the recordings. A Wilcoxon rank sum test for equal medians shows significant (p < .001) differences between the first silence recording (sil1) and the fourth speech recording (sp4) for the UltraFit condition for all three speakers and for the USH condition for spk2 and spk3.
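The rank sum comparison of two recordings can be sketched as follows. This is a minimal illustration that uses the large-sample normal approximation without tie or continuity correction, which is an assumption of the sketch; the paper does not say which implementation was used, and the function name is hypothetical.

```python
from statistics import NormalDist

def rank_sum_test(x, y):
    """Wilcoxon rank sum test via the normal approximation.

    Returns (W, z, p) where W is the rank sum of sample x and p is the
    two-sided p-value. No tie or continuity correction is applied.
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values and give them the average rank
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (w - mean_w) / sd_w
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return w, z, p
```

In the setting of the text, `x` would hold the frame-wise nose-probe distances of sil1 and `y` those of sp4 for one speaker and one headset.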

3.2. Recording interference

By recording visual markers at the outer articulators (lip, jaw) in the natural and two headset conditions we are able to measure if there is a difference between these recording conditions, which can be due to constraints that are set by the headsets leading to hypo- or hyperarticulation.

Fig. 11 shows the amount of lip opening during the recordings, which was measured by the distance between the upper and lower lip marker in cm. It can be seen that the largest difference between the three conditions appears at the rounded /u/ vowel (leftmost figure), which indicates a larger amount of rounding of /u/ vowels in the USH condition (hyperarticulation), and a lower amount of rounding in the UltraFit condition (hypoarticulation).

The production of the /a/ vowel shows a very similar distribution for all three conditions. In /i/ vowels there are also small differences between the three conditions.

4. Analysis of articulatory data

[Fig. 10 panels: nose-probe distance for the first silence (sil1), first speech (sp1) and fourth speech (sp4) recording. Range (mm) per panel, first for the 2.5th to 97.5th percentile, then for the 25th to 75th percentile. spk1: sil1 3.7, 1.8; sp1 2.7, 0.5; sp4 1.6, 0.7. spk2: sil1 0.1, 0; sp1 0.8, 0.2; sp4 0.5, 0.1. spk3: sil1 0.2, 0.1; sp1 3.2, 0.5; sp4 3.2, 0.5.]

Fig. 10. Distribution of Euclidean distance between nose marker 1 and probe marker 1 for silence1, speech1, and speech4 in the USH condition.

[Fig. 11 panels: probability density (pdf) of the lip opening in cm for the vowels /u/, /i/, and /a/, with one curve each for the Natural, UltraFit, and USH conditions.]

Fig. 11. Lip opening measured by distance of upper and lower lip marker for the vowels /u, i, a/.

During the experiment, the transition from the metal headset to the polymer headset forced the researchers to re-position the ultrasound probe. This re-positioning was done without any aid that would ensure that the probe was positioned in the identical location for both recording sessions. Because of this, each time the probe was re-positioned, a new spatial reference system was defined (Stone, 2005), differences in transducer angles relative to the head were introduced and different portions of the tongue and palate were visualized. This made it difficult to run a direct comparison of tongue and palate profiles in the two recording sessions of the same speaker and between the recording sessions of the different speakers (Pini et al., 2019). Therefore, in order to evaluate the differences between the two headsets, we compared two indices a posteriori.

The first index considered is the distance between the ultrasound probe and the tongue profile. This distance is measured along the central radius of the fan superimposed on each ultrasound image by the software AAA (see Fig. 12). The choice of the radius is deliberate; since the depth setting of the ultrasound system is calibrated by taking this segment as a rough reference, a clear image of the tongue surface with a high spatial resolution is expected to always be on this radius, allowing for more accurate measurements to be obtained. This decision is similar to the one discussed by Vietti et al. (2015), which has also proven to be accurate enough to solve the task of word recognition from ultrasonic tongue images (Alessandro et al., 2015). Since this index is based on the measurement of the maximum tongue displacement for the radius in question, any low quality palate images are irrelevant.

Fig. 12. Radius used for the computation of the distance between the probe and the surface of the tongue.

Fig. 13. Area taken into account for the computation of the area index.

The second index considered is the area of the geometric figure shown in Fig. 13. The figure has four sides defined by the intersection of the following: (a) the profile of the tongue of the speaker (AD line); (b) the line joining the place of articulation of /t/ and /k/ (BC line); (c) and (d) the radii joining the origin of the ultrasound probe with the place of articulation of /t/ (line AB) and /k/ (line CD), respectively.
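The area of such a figure can be approximated once the tongue contour between A and D is available as a sequence of points. The sketch below uses the shoelace formula on the closed outline A ... D, C, B; it is an illustration under the assumption of 2D image coordinates in a common unit, and the function names are hypothetical rather than part of the AAA software.

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon with ordered (x, y) vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def area_index(tongue_a_to_d, point_b, point_c):
    """Area bounded by the tongue contour A..D and the sides D-C, C-B, B-A.

    tongue_a_to_d: tongue contour points ordered from A (on the /t/ radius)
    to D (on the /k/ radius); point_b and point_c are the places of
    articulation of /t/ and /k/.
    """
    outline = list(tongue_a_to_d) + [point_c, point_b]
    return polygon_area(outline)
```

For example, a flat contour from (0, 0) to (2, 0) with B = (0, 1) and C = (2, 1) encloses an area of 2, and raising the middle of the contour towards the BC line reduces that area.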

The choice to use the BC segment as the upper side of the geometric figure instead of the more usual palate profile is due to the low quality of the ultrasound images during swallowing, namely those images that are used, typically, for the reconstruction of the palate profile itself (Epstein and Stone, 2005). Moreover, the decision to consider the places of articulation of /t/ and /k/ as points of reference (chosen because the tongue profile disappears from the ultrasound image there, due to the contact with the dental alveoli and the palate, respectively) was made because of the need to locate a region of the vocal tract that is significant for the production of linguistic sounds. This decision partly takes over the decision taken by Spreafico et al. (2015), Recasens and Rodríguez (2016), and Daniel and Clara (2018) to identify articulatory zones.

Since one of the aims of the research was to show whether the size of the speaker’s head affected the accuracy of the headset in any way, the values of the second index have not been normalized so as to compensate for the different vocal tract sizes of the three informants; hence the following paragraphs show the results of the comparisons of the absolute values of the measurements made for the area index.

In this section we refer to the values measured for the two indices in reference to the production of the sequences /’taka ’taka ’taka/. The choice falls on this pseudo-word only because it contains the vowel /a/, which is supposed to determine the maximum displacement of the probe (and, thus, of the headset) because it involves the maximum jaw opening.

Of the four repetitions recorded by each speaker, only the second, third and fourth were considered. It was necessary to discard the first repetition because in two cases out of three (spk1 and spk2) there were synchronization problems between the audio and the ultrasound signal that would affect the reliability of the data.

In addition, for each of the three repetitions, the values of the two indices were calculated based on the acoustics at the midpoint of the pronunciation of /a/. These values were extracted automatically after manually identifying the coordinates for the area index and defining the formula to calculate it using the “Analyse Value” function of the AAA software.

A first boxplot representation of the absolute values for the area and distance indices shows that for each recording there is homogeneity in the variance of both. These differences are due to the fact that the comparison concerns absolute values (expressed in cm² and cm) while the size of each speaker’s head and, therefore, the positioning of headsets and probe, as well as the field of view and depth settings in the ultrasound system, are different for each subject (see Fig. 14).

[Fig. 14 panels: boxplots of the distance index (in cm) and of the area index for spk1 - UltraFit, spk1 - USH, spk2 - UltraFit, spk2 - USH, spk3 - UltraFit, and spk3 - USH.]

Fig. 14. Distribution of values for the distance index in cm (top) and area index in cm² (bottom).

The values were compared with each other. The first comparison concerned the variations in area and distance between the second, third and fourth repetition of /’taka ’taka ’taka/ as pronounced by each of the three speakers. The hypothesis was that if the headset had been moved because of the cycles of pronunciation and swallowing, then the values of the two indices would have changed from repetition to repetition. However, since there are no significant differences between the three repetitions (ANOVA, p > 0.05), it can be deduced that the position of the ultrasound probe attached to the headset remains stable across all repetitions and recording sessions regardless of the type of headset or speaker.

The second comparison concerned the variation of the area and distance indices between two recordings made by the same speaker wearing the two types of headset. The objective in this case was not to check for significant differences between the two indices when wearing UltraFit or USH because, for the reasons set out above (the re-positioning of the probe), this was expected to be the case. Instead, the objective was to verify whether the difference in the indices between the recordings of the speakers with the polymer headset and the metal headset was significant. Indeed, if this were the case, it could be deduced that the difference is dependent on the type of headset used.

In fact, the results of the comparisons show that the differences are significant for all speakers, and for both indices (t-test, area index: spk1 (𝑝 < .001), spk2 (𝑝 < .001), spk3 (𝑝 < .001); distance index: spk1 (𝑝 < .001), spk2 (𝑝 < .001), spk3 (𝑝 = .012)). This may be due to variances in absolute values related to the different sizes of the speakers’ heads and to inconsistencies in the re-positioning of the headsets and the probe between one session and the other.

We also report the absolute values of the differences between the two indices because they are relevant for the purposes of our research. According to the data, the differences are, on average, 175 mm² (spk1 = 204 mm²; spk2 = 191 mm²; spk3 = 128 mm²) for the area index and 2.1 mm for the distance index (spk1 = 2.6 mm; spk2 = 3.2 mm; spk3 = 0.4 mm). While the data relating to the area is more difficult to interpret because it would also deserve a quantitative discussion of the differences in the geometric figure, the data relating to the distance is very informative.
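As a quick arithmetic check, the averages can be recomputed from the per-speaker values quoted above. This sketch assumes that the spk3 distance value is in mm like the other two. Because the per-speaker values are themselves rounded, the recomputed area mean (about 174 mm²) can differ slightly from the reported 175 mm²; the variable names are illustrative only.

```python
from statistics import mean

# Per-speaker absolute differences between the UltraFit and USH recordings,
# as quoted in the text (already rounded there).
area_diff_mm2 = {"spk1": 204, "spk2": 191, "spk3": 128}
# Assumption: the spk3 value (0.4) is in mm, like spk1 and spk2.
distance_diff_mm = {"spk1": 2.6, "spk2": 3.2, "spk3": 0.4}

mean_area = mean(area_diff_mm2.values())
mean_distance = mean(distance_diff_mm.values())

print(f"mean area difference: {mean_area:.1f} mm^2")
print(f"mean distance difference: {mean_distance:.1f} mm")
```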

First, the linear distances from the origin of the probe to the tongue surface detected with the polymer headset are always smaller than those detected with the metal headset, which could indicate a greater insertion of the probe between the metal protuberances. Second, the average difference between the maximum and minimum measurements is always lower for UltraFit (average: 2.3 mm) than for USH (average: 3.8 mm), perhaps testifying to a greater stability of the probe’s positioning during the experiment. Moreover, these last values are relevant because they present orders of magnitude in line with those obtained from the analysis of the visual markers conducted with the NaturalPoint OptiTrack Expression system.
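The max-minus-min comparison mentioned above can be illustrated with a short Python sketch. The measurement values and the helper name `index_range` are hypothetical and serve only to show the computation; they are not the study's data.

```python
def index_range(values):
    """Return the difference between the largest and smallest measurement."""
    return max(values) - min(values)


# Hypothetical distance-index measurements in mm for one speaker.
recordings = {
    "UltraFit": [38.1, 39.0, 40.2, 39.5],
    "USH": [41.0, 43.6, 42.2, 44.5],
}

for headset, values in recordings.items():
    print(f"{headset}: range = {index_range(values):.1f} mm")
```

A smaller range for one headset would point in the same direction as the stability interpretation given in the text, although a range is sensitive to single outlying measurements.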

5. Analysis of acoustic data

The acoustic analysis includes measurements of the formants F1 to F3, and the duration of the stressed and unstressed vowels. The spectral information was extracted using a semi-automatic procedure in PRAAT (Boersma and Weenink, 2017).

The duration of the vowel was measured manually, relying on the periodicity and amplitude of the waveform. The script calculated the temporal midpoint of the interval (start and end of the vowel) and extracted the formants at these time points. Fourteen stimuli had to be excluded due to technical problems during the recording, resulting in 634 stimuli as a basis for the formant analyses.
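The midpoint step described above can be sketched in Python as follows. This is an illustration only, not the PRAAT script used in the study: the class name `VowelInterval` and the boundary times are hypothetical, and the formant extraction itself (reading F1 to F3 at the midpoint) is not shown.

```python
from dataclasses import dataclass


@dataclass
class VowelInterval:
    label: str
    start: float  # manually annotated vowel onset, in seconds
    end: float    # manually annotated vowel offset, in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def midpoint(self) -> float:
        # Temporal midpoint at which the formants would be read.
        return self.start + self.duration / 2.0


# Hypothetical vowel boundaries for two annotated vowels.
intervals = [
    VowelInterval("a", 0.120, 0.260),
    VowelInterval("i", 0.410, 0.500),
]

for iv in intervals:
    print(f"/{iv.label}/ duration={iv.duration:.3f}s midpoint={iv.midpoint:.3f}s")
```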

The formant analyses included separate analyses of F1, of F2, and combined measures of F1 and F2. In Fig. 15 the F1 values (in Hz) are given on the y-axis; the speakers and conditions (type of headset) are on the x-axis. As can be seen, the formant frequencies for F1 differ by speaker and condition. Furthermore, there is much higher variability in spk1 - female than in the other two speakers. Regarding F2 (Fig. 15 bottom), all speakers display a higher variability; however, spk1 - female again seems to have the most variable production.

Fig. 15. Formant distributions for F1 (top) and F2 (bottom), in Hz, per condition (Natural, UltraFit, USH) for spk1 (female), spk2 (male), and spk3 (male).

Fig. 16 displays the vowel space in the three conditions per speaker, with F1 on the y-axis and F2 on the x-axis. The colors differentiate the three conditions. Consistent with the three plots, the formant analysis shows that the production of vowels is influenced by condition and speaker. Furthermore, it seems as if the conditions natural and UltraFit produce a much more similar vowel space than the USH, indicating that the speakers were influenced to a higher degree by the metal headset in their vowel production.

To estimate the influence of the three conditions (Natural, UltraFit, USH) and to account for the combination of the two dependent variables F1 and F2, we fitted a multivariate analysis of variance (MANOVA) in R (R Core Team, 2018). The model found significant main effects for speaker (𝐹(4, 129), 𝑝 < .001), triplet (𝐹(2, 4.6), 𝑝 = .01), condition (𝐹(4, 3.1), 𝑝 < .001), and target word (𝐹(6, 1025), 𝑝 < .001) on the formant values of the stressed vowel. The follow-up ANOVA showed that F1 and F2 are significantly influenced by speaker (F1: 𝐹(2, 417), 𝑝 < .001; F2: 𝐹(2, 417), 𝑝 < .001) and target word (F1: 𝐹(3, 417), 𝑝 < .001; F2: 𝐹(3, 417), 𝑝 < .001), but only F1 is influenced by triplet (𝐹(1, 417), 𝑝 = .003) and condition (𝐹(2, 417), 𝑝 = .01).

For comparing the two headsets we are interested in the influence of the recording condition; the other variables (speaker, triplet, target word) are expected to have an influence on the formants, as also revealed by the ANOVA. Only F1, not F2, is influenced by the recording condition, since the probe restricts the jaw opening in the USH and UltraFit conditions, and the first formant is the most informative for restrictions in the jaw opening. F2 values are commonly associated with back-front vowels and are less likely to be influenced by the different conditions.

Therefore, we ran a separate analysis (linear mixed-effects model, lmer; Bates et al., 2015) for F1 of each vowel to account for the possibility of effects in different directions masking each other.

For the vowel /a/ the model revealed significant influences of the repetition (𝑡(210) = −2.927, 𝑝 = .0038), no significant difference between UltraFit and USH condition (𝑡(210) = 0.888, 𝑝 < .37), but a tendency for a difference between USH and natural condition (𝑡(210) = 1.697, 𝑝 = .091).

For /i/ the model did not reveal significant effects for either repetition (𝑡(96) = −0.49, 𝑝 = .62) or condition (UltraFit: 𝑡(96) = 1.170, 𝑝 = .245; natural: 𝑡(96) = 0.53, 𝑝 = .568).

For /u/ we found significant differences between USH and UltraFit condition (𝑡(102) = 2.424, 𝑝 = .017) and USH and natural condition (𝑡(102) = 2.536, 𝑝 = .013), as well as a tendency for an influence of repetition (𝑡(102) = −1.934, 𝑝 = .056).

As is documented in the statistics and can also be seen in Figs. 16 and 17, the speakers are influenced by the headset condition. The influence of condition on the vowel /a/ for F1 suggests that speakers were restricted in their jaw opening to a greater extent in the USH condition, thereby producing lower F1 and more centralized /a/ vowels (see Fig. 17). For the vowel /u/ a lower F1 was also produced in the USH condition, which may be attributed to general articulation restrictions, since the lowering of the jaw plays a minor role in the production of /u/ vowels (see Fig. 17). Overall we can see that the USH condition had a larger influence on the first formant than the UltraFit condition.

Fig. 16. Joint formant distributions of F1 (y-axis, from high to low values) and F2 (x-axis, from high to low values) for spk1 (top), spk2 (middle) and spk3 (bottom).

Fig. 17. Fitted F1 (Hz) for the vowels a, i, and u in the USH, Natural, and UltraFit conditions.

Table 3
Overall evaluation of headsets.

              USH      UltraFit
Usability     ★★★✩✩    ★★★★★
Flexibility   ★★★✩✩    ★★★★★
Accuracy      ★★★★✩    ★★★★✩
Lip opening   ★★★★✩    ★★★★✩
Articulation  ★★★★✩    ★★★★✩
Formants      ★★★★✩    ★★★★★
Applications  ★★★★✩    ★★★★★

6. Discussion

The evaluation in this paper and previous work (Spreafico et al., 2018) allows us to compare the two headsets along different dimensions such as

Usability, concerning the usability from the side of the speaker wearing the headset (comfort, ease of use) and the experimenter using the headset (fixing the headset). UltraFit has much better usability since it is lighter and does not rest on parts of the head that can induce pain (Spreafico et al., 2018).

Flexibility, concerning the possibilities to use the headset in different recording setups together with visual tracking software, MRI, etc. Here also the UltraFit headset is more flexible, since it can be realized completely in plastic material (Spreafico et al., 2018).

Accuracy, concerning the stability of the headset during recording, which was evaluated in Subsection 3.1, where we showed that the two headsets have similar accuracy.

Lip opening, concerning the question of whether the headset influences the opening of the lips in some way, which was evaluated in Subsection 3.2 and showed that both headsets slightly influence the lip opening.

Analysis of articulatory data in Section 4 showed that the position of the ultrasound probe remains satisfactorily steady across recording sessions regardless of the speaker or the type of headset.

The formant analysis in Section 5 showed that the USH headset has a larger influence on the production of the first formant F1.

Concerning application scenarios, which are discussed in detail in Spreafico et al. (2018), the UltraFit headset has the advantage of being more easily usable with children, for example for educational purposes.

Table 3 shows a scoring of the two headsets according to a five-star system that we derived from the overall evaluation.

7. Conclusion

We performed an objective evaluation of two headsets for Ultrasound Tongue Imaging (UTI): the USH, a metallic headset used in many laboratories today, and the UltraFit, a new headset made from polymer that was recently developed.

Using optical tracking hardware and software we showed that both headsets have a similar accuracy, with the USH performing slightly better overall but introducing the largest error for one speaker, and that the UltraFit headset shows more flexibility during recordings. By also measuring the lip movement with visual tracking we showed that both headsets have a different influence on lip opening. Concerning the tongue movement there are no significant differences between different sessions, showing the stability of both headsets during the recordings. Acoustic analysis of formant differences in vowels revealed that the USH headset has a larger influence on formant production than the UltraFit headset.

With these results we may conclude that both headsets are equally well suited for recordings in speech science research, with the UltraFit being better in terms of usability, flexibility, and production of formants, which also makes it better suited for technological, educational and clinical applications.

Declaration of Competing Interest

Dr. Lorenzo Spreafico has a financial interest in UltraFit, one of the evaluated headsets, since he was involved in the development of UltraFit, which is now commercialised by Articulate Instruments. For this paper he only performed the articulatory analysis, based on data that was collected at the Acoustics Research Institute (ARI) by the other authors. The analysis of visual and acoustic data was performed at ARI by the other authors. The other authors from the ARI have no financial interest/personal relationships regarding UltraFit.

CRediT authorship contribution statement

Michael Pucher: Conceptualization, Methodology, Software, Data curation, Writing - original draft, Formal analysis, Visualization, Supervision, Funding acquisition. Nicola Klingler: Data curation, Formal analysis, Visualization. Jan Luttenberger: Data curation, Formal analysis, Writing - review & editing. Lorenzo Spreafico: Data curation, Formal analysis, Visualization.

Acknowledgements

This work was supported by the Austrian Science Fund (FWF): F6002, I2539. This work was also supported with research funds from the project on Digital Humanities by the DLLCS Department of Excellence of the University of Bergamo.

References

Alessandro, V., Vittorio, A., Lorenzo, S., 2015. Verso un sistema di riconoscimento automatico del parlato tramite immagini ultrasoniche. In: Il farsi e disfarsi del linguaggio. Acquisizione, mutamento e destrutturazione della struttura sonora del linguaggio/Language acquisition and language loss. Acquisition, change and disorders of the language sound structure, pp. 477–489. doi: 10.17469/O2101AISV000032.

Articulate Instruments Ltd., 2008. Ultrasound Stabilisation Headset - Users Manual, revision 1.5. Articulate Instruments Ltd. URL http://www.articulateinstruments.com/

Articulate Instruments Ltd., 2017a. Articulate Assistant Advanced - Ultrasound Module User Guide, version 217.01. Articulate Instruments Ltd. URL http://www.articulateinstruments.com/

Articulate Instruments Ltd., 2017. Installation manual for Micro Ultrasound system - Users Manual, revision 1.5. Articulate Instruments Ltd. URL http://www.articulateinstruments.com/

Bates, D., Mächler, M., Bolker, B., Walker, S., 2015. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67 (1), 1–48. doi: 10.18637/jss.v067.i01.

Boersma, P., Weenink, D., 2017. Praat: doing phonetics by computer. URL: http://www.praat.org.

Bruce, D., Tanja, S., Kiyoshi, H., Thomas, H., J.M., G., J.S., B., 2010. Silent speech interfaces. Speech Commun. 52 (4), 270–287.

Cai, J., Denby, B., Roussel-Ragot, P., Dreyfus, G., Crevier-Buchman, L., 2011. Recognition and Real Time Performance of a Lightweight Ultrasound Based Silent Speech Interface Employing a Language Model, pp. 1005–1008.

Canella, G., 2019. UltraFit: Modelling and Simulation of an Ultrasound Probe Stabilization Headset. Free University of Bozen, Italy. Master’s thesis. Unpublished BA thesis.

Daniel, R., Clara, R., 2018. An ultrasound study of contextual and syllabic effects in consonant sequences produced under heavy articulatory constraint conditions. Speech Commun. 105, 34–52. doi: 10.1016/j.specom.2018.10.007.

Davidson, L., Decker, P.D., 2005. Stabilization techniques for ultrasound imaging of speech articulations. J. Acoust. Soc. Am. 2544.

Derrick, D., Best, C., Fiasson, R., 2015. Non-metallic ultrasound probe holder for co-collection and co-registration with ema. In: Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015).

Derrick, D., Carignan, C., Chen, W.-r., Shujau, M., Best, C.T., 2018. Three-dimensional printable ultrasound transducer stabilization system. J. Acoust. Soc. Am. 144 (5), EL392–EL398.

Eleanor, S., Lloyd, S., Lam, J., Cleland, J., 2019. Systematic review of ultrasound visual biofeedback in intervention for speech sound disorders. Int. J. Lang. Commun. Disord. 54 (5), 705–728.

Epstein, M.A., Stone, M., 2005. The tongue stops here: ultrasound imaging of the palate. J. Acoust. Soc. Am. 118 (4), 2128–2131. doi: 10.1121/1.2031977.

Fabre, D., Hueber, T., Alameda-Pineda, X., Badin, P., 2017. Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract. Speech Commun. 93 (4), 67–75.

Hueber, T., Benaroya, E.-L., Chollet, G., Denby, B., Dreyfus, G., Stone, M., 2010. Development of a silent speech interface driven by ultrasound and optical images of the tongue and lips. Speech Commun. 52 (4), 288–300.

de Jong, K., Berkson, K., Lulich, S.M., Myers, S., Bohnert, A., 2019. The lingual topography of american english laterals in onsets and codas. J. Acoust. Soc. Am. 145 (3), 1928. doi: 10.1121/1.5102009.

Matosova, A., 2016. UltraFit. Free University of Bozen, Italy. Master’s thesis. Unpublished BA thesis.

Nakai, S., Beavan, D., Lawson, E., Leplatre, G., Scobbie, J., Stuart-Smith, J., 2016. Viewing speech in action: speech articulation videos in the public domain that demonstrate the sounds of the international phonetic alphabet (IPA). Innov. Lang. Learn. Teach. 0 (0).

Pini, A., Spreafico, L., Vantini, S., Vietti, A., 2019. Multi-aspect local inference for functional data: Analysis of ultrasound tongue profiles. Journal of Multivariate Analysis 170, 162–185. Special Issue on Functional Data Analysis and Related Topics.

Preston, J., Leece, M., Maas, E., 2016. Intensive treatment with ultrasound visual feedback for speech sound errors in childhood apraxia. Neuroscience 10 (240).

R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/

Recasens, D., Rodríguez, C., 2016. A study on coarticulatory resistance and aggressiveness for front lingual consonants and vowels using ultrasound. J. Phonetics 59, 58–75.

Ribeiro, M.S., Eshky, A., Richmond, K., Renals, S., 2019. Speaker-independent classification of phonetic segments from raw ultrasound in child speech. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1328–1332. doi: 10.1109/ICASSP.2019.8683564.

Schabus, D., Pucher, M., Hofer, G., 2014. Joint audiovisual hidden semi-Markov model-based speech synthesis. IEEE J. Sel. Top. Signal Process. 8 (2), 336–347.

Scobbie, J.M., Wrench, A.A., Linden, M.L.V.D., 2008. Head-probe stabilisation in ultrasound tongue imaging using a headset to permit natural head movement. In: Proceedings of the 8th Speech Production Workshop: Models and Data, pp. 373–376.

Shawker, T.H., Sonies, B.C., 1984. Tongue movement during speech: a real-time ultrasound evaluation. J. Clin. Ultrasound 12 (3), 125–133.

Spreafico, L., Celata, C., Vietti, A., Bertini, C., Ricci, I., 2015. An epg+uti study of italian /r/. ICPhS.

Spreafico, L., Matosova, A., Vietti, A., Galata, V., 2017. Two head-probe stabilization devices for speech research and applications. Poster presentation. Ultrafest VIII. Potsdam, October 4–6, 2017.

Spreafico, L., Pucher, M., Matosova, A., 2018. Ultrafit: a speaker-friendly headset for ultrasound recordings in speech science. In: Proc. Interspeech 2018, pp. 1517–1520.

Stone, M., 2005. A guide to analyzing tongue motion from ultrasound images. Clin. Linguist. Phon. 19 (6–7), 455–502.

Stone, M., Davis, E., 1995. A head and transducer support system for making ultrasound images of tongue/jaw movement. J. Acoust. Soc. Am. 98 (6), 3107–3112.

Toeger, J., Sorensen, T., Somandepalli, K., Toutios, A., Lingala, S.G., Narayanan, S., Nayak, K., 2017. Test–retest repeatability of human speech biomarkers from static and real-time dynamic magnetic resonance imaging. J. Acoust. Soc. Am. 141 (5), 3323–3336.

Vietti, A., Spreafico, L., Anselmi, V., Spreafico, L., 2015. Allophonic variation: an articulatory perspective. In: Presentation Ultrafest VII December 8th-10th, 2015, The University of Hong Kong.

Whalen, D.H., Iskarous, K., Tiede, M.K., Ostry, D.J., Lehnert-LeHouillier, H., Vatikiotis-Bateson, E., Hailey, D.S., 2005. The haskins optically corrected ultrasound system (hocus). J. Speech Lang. Hear. Res. 48 (3), 543–553.

Wilson, I., Gick, B., 2006. Ultrasound technology and second language acquisition research. In: Proceedings of the 8th Generative Approaches to Second Language Acquisition Conference (GASLA 2006), pp. 148–152.

Zharkova, N., Gibbon, F., Hardcastle, W., 2015. Quantifying lingual coarticulation using ultrasound imaging data collected with and without head stabilisation. Clin. Linguist. Phon. 29, 1–17.

