Contents lists available at ScienceDirect
Biomedical Signal Processing and Control
journal homepage: www.elsevier.com/locate/bspc
Testing software tools for newborn cry analysis using synthetic signals
S. Orlandi a,∗, A. Bandini a,b, F.F. Fiaschi a, C. Manfredi a
a Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
b Department of Electrical, Electronic and Information Engineering (DEI) “Guglielmo Marconi”, Università di Bologna, Bologna, Italy
Article info
Article history: Received 7 June 2016; Received in revised form 20 September 2016; Accepted 15 December 2016; Available online 21 December 2016
Keywords: Infant cry; Acoustical analysis; Autoregressive models; Wavelet transform; Fundamental frequency; Resonance frequencies
Abstract
Contactless techniques are of increasing clinical interest as they offer advantages in terms of patient comfort and safety with respect to sensor-based methods. They are therefore particularly well suited for vulnerable patients such as newborns. Specifically, the acoustical analysis of the infant cry is a contactless approach that assists the clinical specialist in detecting abnormalities in infants with possible neurological disorders. Along with perceptual analysis, the automated analysis of infant cry is usually performed with software tools that, however, might not be designed for this specific signal. The newborn cry is extremely difficult to analyze with standard techniques because of its quasi-stationarity and the very wide range of frequencies of interest. Software tools should therefore be specifically configured and used with caution. To address this issue, three methods are tested and compared: one freely available and two specifically built using different approaches, namely adaptive autoregressive models and wavelets. The three methods are compared on synthetic signals produced by a synthesizer developed to generate the basic melodic shapes of the newborn cry. Results point out the strengths and weaknesses of each method, thus suggesting their most appropriate use according to the goals of the analysis.
© 2016 Elsevier Ltd. All rights reserved.
1. Introduction
Crying is the first and primary method of communication among humans. It is a functional expression of basic biological needs and of emotional or psychological conditions such as hunger, cold, pain, cramps and even joy [1,2]. It involves activation of the central nervous system and requires a coordinated effort of several brain regions, mainly the brainstem and the limbic system. Therefore a brain dysfunction may lead to disorders in the vibration of the vocal folds and in the coordination of the larynx, pharynx and vocal tract, giving rise to an abnormal cry [3,4]. To date, newborn cry analysis is most often performed with a perceptive examination based on listening to the cry and visually inspecting the signal waveform and its spectrogram. This approach is operator-dependent and requires a considerable amount of time, often prohibitive in daily clinical practice. An accurate and automated acoustic analysis of newborn cry could help the clinician detect risk markers of neurodevelopmental disorders. Specifically, the distinction between a regular and an abnormal cry can be very useful in clinical practice. Therefore the scientific community is paying special attention to techniques devoted to an accurate automated analysis of the newborn cry [5–7].
∗ Corresponding author. E-mail address: silvia.orlandi@unifi.it (S. Orlandi).
The most significant acoustical parameters of infant crying are the fundamental frequency (F0) and the first three formant frequencies of the vocal tract (F1, F2 and F3). F0 reflects the regularity of the vibration of the vocal folds, while F1, F2 and F3 are related to the varying shape and length of the vocal tract during phonation and thus to its control [8]. Actually, in newborns it is more appropriate to refer to resonance frequencies (RFs) rather than formants. In fact, the vocal tract is almost flat, the mobility of the oral cavity is reduced and the baby is unable to articulate vowel or consonant sounds, as the pharynx is too short and not wide enough for that purpose. Therefore hereafter we will refer to F1–F3 as resonance frequencies (RFs).
Infant F0 values are usually in the range 200 Hz–800 Hz (in the case of hyperphonation they can reach and exceed 1000 Hz) [5,9]. This range is quite wide and includes both healthy and pathological newborns. Indeed, no precise ranges are validated in the literature for differentiating the two categories, the possible diseases or categories being of very different nature (e.g., deaf, asphyxiated, gastroschistic, preterm). Typical values for the first three RFs are around 1000 Hz, 3000 Hz and 5000 Hz, respectively [10]. RFs estimation is very difficult to perform because of their high variability. Significant deviations from these ranges may be related to pathological conditions of the central nervous system [11,12].
http://dx.doi.org/10.1016/j.bspc.2016.12.012
The automated analysis of infant cry has its origins several decades ago, when the technology was limited. It was therefore mainly based on the perceptual analysis made by clinicians through listening to the cry signal and visual inspection of the spectrogram estimated with the Fast Fourier Transform (FFT) [13]. This approach is implemented in the Multidimensional Voice Program (MDVP™), the first commercial tool and still in use, though developed for adult voices [14]. Currently, many researchers use PRAAT [15,16], freely available online. Like MDVP™, it was developed for the adult voice. It requires a careful manual setting of some parameters and thus some technical skills [16].
Thus, since early studies, the need was highlighted to develop dedicated software tools that could automatically provide the main parameters of the infant cry. In the last twenty years, most of the research was devoted to F0 estimation with traditional approaches such as FFT, correlation and cepstrum [13,17–20]. Few papers instead addressed RFs estimation: in some papers F1 was estimated with FFT [10,18,21,22], and recently FFT was applied to F1–F3 estimation [20]. In Robb et al. [22], FFT and Linear Prediction (LP) methods were compared, with results comparable only for F1.
Several approaches were tested on synthetic and real signals [7,10,11,23]. An automatic adaptive parametric approach, called BioVoice [24], was developed and successfully applied to newborn cry [7,25,26]. As for F0, the difficulty in estimating the RFs is mainly due to the quasi-stationarity and the very wide range of frequencies of interest in the newborn cry, which require sophisticated adaptive numerical techniques characterized by high time-frequency resolution. To the authors’ knowledge, only two software tools exist that are specifically designed for the automatic analysis of infant crying: the above-mentioned BioVoice [7,23,25–27], based on an innovative adaptive parametric approach for F0 and formants estimation [10] and successfully tested against other software tools [11,28], and a software tool recently proposed by Reggiannini et al. [20] that estimates F0 by means of a cepstrum approach. It was shown that the cepstrum approach is faster than a parametric approach, but less robust against noise in the estimation of frequencies (F0 and RFs) [11].
Taking into account the high variability of the signal, the wavelet approach could be particularly suited to the study of neonatal cry thanks to its varying time-frequency resolution and low computing time. Therefore a new automated method based on the wavelet transform is proposed here for the estimation of F0 and the RFs of newborn cry that, like the BioVoice tool, does not require any manual setting by the user. Based on the literature, the Mexican hat Continuous Wavelet Transform (CWT) is applied for F0 estimation and the complex Morlet for RFs estimation [29,30]. The proposed approach, named WInCA (Wavelet Infant Cry Analyzer), is currently implemented in MATLAB but can be easily adapted to any embedded processor.
A new synthesizer capable of reproducing variable melodic shapes of newborn cry was developed, based on the methods described in [8,31]. This synthesizer is used to test WInCA against PRAAT and BioVoice.
This paper presents the first attempt to apply wavelets to the analysis of newborn cry; the method is therefore described in the next section along with the new synthesizer.
The innovative features of this paper are: a synthesizer specifically developed to reproduce the melodic shape of the neonatal cry; a new method for newborn cry analysis based on the wavelet transform; and a comparison of three different methods of automated cry analysis on synthetic signals. Advantages and limits of the three methods are highlighted and the optimal use of each of them is suggested.
2. Methods
2.1. The newborn cry synthesizer
The synthesizer was developed in the MATLAB computing environment. It is capable of synthesizing newborn cry signals with different melody shapes. The synthesizer, based on a method developed for adult male voices [8,31], is composed of two blocks: a pulse train generator and a vocal tract filter, according to [32].
2.1.1. Pulse train generator
The pulse train generator, based on additive synthesis, builds a glottal pulse sequence. It approximates a periodic signal through a linear combination of sine waves whose frequencies are in harmonic ratio with each other. Thus, the first step of the infant cry synthesizer assumes the glottal pulse sequence to be a pulse train with period T, the reciprocal of the fundamental frequency F0 (F0 = 1/T).
The synthesis of each harmonic component is obtained through an oscillator block. Each oscillator is driven by two control vectors: the amplitude (a[n]) and the frequency (f[n]) of the output sine waves, where n indexes the elements of the control vectors (1 ≤ n ≤ N) and N is related to the duration (in s) of the synthesized signal and to the number of frames into which the synthesized signal is segmented. Assuming a sampling frequency Fs = 44.1 kHz and a frame duration of 10 ms, we get 441 samples per frame (SpF). Thus, the frame rate (the rate of the control vector) is given by the ratio Fs/SpF (100 Hz). For a synthesized signal of 2 s duration, N = 201 samples.
In the first harmonic, a0[n] = 1 and f0[n] = F0. Higher harmonics are obtained with the same amplitude vector as the first harmonic (a0[n]) and with frequency vectors given by multiples of F0:
$f_k[n] = k\,f_0[n]$ (1)
where 1 ≤ k ≤ K and K = 30 is the number of digital oscillators considered in the synthesizer. From the literature [5] we know that the range of F0 for newborn cry is around 200–800 Hz and that the first three resonance frequencies can reach values up to 10 kHz. Assuming an average F0 = 400 Hz, with K = 30 we are able to synthesize signals with frequencies up to 12 kHz, thus covering the range of interest. The signals obtained from the oscillators are then summed, giving rise to the glottal pulse train ss[n]. In the additive synthesis described above, F0 is kept constant throughout the whole vocal emission. Thus, to get a time-varying F0 in the synthesized signal, a modification of this approach is proposed here through the modulation of the first harmonic. Modulation is obtained by driving the oscillator that generates the first harmonic with a vector f0[n] made up of variable values, as discussed in Section 2.1.3. According to Eq. (1), the other oscillators (which generate the higher-order harmonics) then give rise to frequency-modulated signals.
2.1.2. Vocal tract filter
Vocal tract resonances are given by a filter bank with 3 parallel resonance filters [31]. Each filter is a 2nd-order all-pole filter with frequency response:
$H(z) = \dfrac{b_1}{1 + a_2 z^{-1} + a_3 z^{-2}}$ (2)
Given the bandwidth (B) of the filter, the resonance central frequency (fc) and $\omega_c = 2\pi f_c / F_s$, the filter coefficients are obtained through the following relationships [33]:
$r = e^{-\pi B / F_s}$ (3)
$a_2 = -2r\cos(2\pi f_c / F_s) = -2r\cos(\omega_c)$ (4)
Fig. 1. Basic melody shapes of the newborn cry obtained through the newborn cry synthesizer: a) Complex (double-arch melody shape); b) Falling (single-arch melody shape with a rapid F0 increase followed by a slow decrease, with asymmetric shape skewed to the left); c) Plateau (flat melody profile); d) Rising (single-arch melody shape with a slow F0 increase followed by a rapid decrease, with asymmetric shape skewed to the right); e) Symmetric (single-arch melody shape almost symmetric with respect to its midpoint).
$a_3 = r^2$ (5)
$b_1 = (1 - r)\sqrt{1 - 2r\cos(\omega_c) + r^2}$ (6)
Thus the frequency response (2) can be written as:
$H(z) = \dfrac{b_1}{1 - 2r\cos(\omega_c)\,z^{-1} + r^2 z^{-2}}$ (7)
The central frequencies of the three parallel filters (fc) correspond to the vocal tract formant frequencies (F1–F3). Thus, the proposed synthesizer can be controlled by setting the value of the fundamental frequency (F0) and of the first three formant frequencies (F1–F3).
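The two synthesizer blocks described in Sections 2.1.1 and 2.1.2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation. The function names, the linear upsampling of the control vectors, the 100 Hz bandwidth and the square-root form of the gain b1 are all assumptions made here for the example; Fs, the frame length, K and the three resonance frequencies are taken from the text.

```python
import numpy as np

FS = 44100    # sampling frequency in Hz (from the text)
SPF = 441     # samples per 10 ms frame (from the text)
K = 30        # number of harmonic oscillators (from the text)

def glottal_pulse_train(f0_ctrl, amp_ctrl, fs=FS, spf=SPF, k_max=K):
    """Sum K harmonics whose frequencies follow Eq. (1), f_k[n] = k * f_0[n]."""
    f0_ctrl = np.asarray(f0_ctrl, dtype=float)
    amp_ctrl = np.asarray(amp_ctrl, dtype=float)
    n_samples = (len(f0_ctrl) - 1) * spf
    t_ctrl = np.arange(len(f0_ctrl)) * spf       # control instants, in samples
    t = np.arange(n_samples)
    # Assumption: control vectors are upsampled by linear interpolation.
    f0 = np.interp(t, t_ctrl, f0_ctrl)
    amp = np.interp(t, t_ctrl, amp_ctrl)
    phase = 2.0 * np.pi * np.cumsum(f0) / fs     # running phase of harmonic 1
    s = np.zeros(n_samples)
    for k in range(1, k_max + 1):
        s += amp * np.sin(k * phase)             # k-th harmonic
    return s

def resonator_coeffs(fc, bandwidth, fs=FS):
    """Coefficients of one 2nd-order all-pole resonator, Eqs. (3)-(6)."""
    r = np.exp(-np.pi * bandwidth / fs)          # Eq. (3)
    wc = 2.0 * np.pi * fc / fs
    a2 = -2.0 * r * np.cos(wc)                   # Eq. (4)
    a3 = r ** 2                                  # Eq. (5)
    # Eq. (6): square-root gain normalisation is an assumed reading.
    b1 = (1.0 - r) * np.sqrt(1.0 - 2.0 * r * np.cos(wc) + r ** 2)
    return b1, a2, a3

def resonator(x, fc, bandwidth, fs=FS):
    """Two-pole filter y[n] = b1*x[n] - a2*y[n-1] - a3*y[n-2]."""
    b1, a2, a3 = resonator_coeffs(fc, bandwidth, fs)
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        y[n] = b1 * xn - a2 * y1 - a3 * y2
        y2, y1 = y1, y[n]
    return y

def synthesize(f0_ctrl, amp_ctrl, formants=(1100.0, 3000.0, 5500.0), bandwidth=100.0):
    """Pulse train followed by three parallel resonators (hypothetical bandwidth)."""
    source = glottal_pulse_train(f0_ctrl, amp_ctrl)
    return sum(resonator(source, fc, bandwidth) for fc in formants)
```

A call such as `synthesize(np.full(201, 550.0), np.ones(201))` would produce a 2 s signal with constant F0; replacing the constant vector with a time-varying one yields the frequency-modulated harmonics described above.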
2.1.3. Infant cry synthesis: melody shapes and vocal tract resonances
According to [5] the fundamental frequency F0 of healthy newborns usually varies between 200 Hz and 800 Hz. Thus, the F0 values of the synthesized signals are allowed to vary in this range. F0 variations were set according to the basic melody shapes of the newborn cry [23,34–37]:
• Complex: double-arch melody shape (Fig. 1a);
• Falling: single-arch melody shape with a rapid F0 increase followed by a slow decrease, with asymmetric shape skewed to the left (Fig. 1b);
• Plateau: flat melody profile (Fig. 1c);
• Rising: single-arch melody shape with a slow F0 increase followed by a rapid decrease, with asymmetric shape skewed to the right (Fig. 1d);
• Symmetric: single-arch melody shape almost symmetric with respect to its midpoint (Fig. 1e).
As mentioned above, the synthesized signals have a fixed duration of 2 s. Thus, the control vectors that drive the oscillators are made up of 201 samples. To get control vectors with varying frequency for synthesizing the 5 basic melody shapes of F0, a spline interpolation was implemented. The interpolation points (nodes) were set to obtain melodic shapes within the range 400 Hz–650 Hz, which is the central interval of the range reported in [5]. Specifically, for each basic shape, the frequency control vectors are obtained as follows:
• The complex shape is defined as composed of multiple arcs (at least 2) [34]. We synthesized a double-arc shape whose maxima (with frequency equal to 590 Hz) are located at 0.25 s and 1.25 s respectively. Between these maxima, a local minimum (with frequency equal to 520 Hz) is located at 0.6 s, in order to make the second peak slower than the first one (Fig. 1a). The first and last nodes of the vector to be interpolated were both set at 500 Hz.
• The falling shape was obtained by setting a maximum frequency (650 Hz) at 0.4 s, which allows for a slow decrease towards 450 Hz (Fig. 1b). The first node was set equal to 550 Hz.
• The plateau shape was obtained by varying F0 within a quite narrow range (540–555 Hz) around an average F0 value equal to 550 Hz. In particular, there is a slightly ascending section up to 555 Hz, followed by a decrease towards 540 Hz (Fig. 1c).
• The rising shape was obtained by reversing the falling shape, ending with a decrease towards 550 Hz (Fig. 1d). The first node was set equal to 450 Hz.
• The symmetric shape was obtained by setting the maximum (650 Hz) at the midpoint of the vector to be interpolated (thus at 1 s). The first and last nodes were set equal to 450 Hz (Fig. 1e).
The amplitude control vectors were obtained in the same way for all the shapes: increasing amplitude is set in the first part of the signal, followed by a nearly constant part and then decreasing amplitude at the end of the signal.
Resonance frequency ranges were set according to [38], where the first three RFs of 28 healthy term newborns were found in the ranges: F1 [700–1400] Hz; F2 [2000–4000] Hz; F3 [4000–7000] Hz. Specifically, here we set: F1 = 1100 Hz, F2 = 3000 Hz and F3 = 5500 Hz.
To test the proposed method, additive white noise of increasing amplitude (1%, 5% and 10% of the signal maximum amplitude) was added to the synthesized signals with the Audacity tool [39]. Noise was added to obtain more realistic melodic shapes, since the cry of both healthy and sick infants is always characterized by irregularities. The five melodic shapes described above were synthesized without noise and with each of the three levels of added noise. Therefore twenty synthetic signals were analyzed.
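As an illustration of how a frequency control vector and a noisy test signal might be built, the sketch below interpolates the three nodes of the symmetric shape and adds white noise scaled to the signal peak. The text does not state which spline was used, nor the noise distribution produced by Audacity; the natural cubic spline and the uniform noise used here are assumptions, and the function names are hypothetical.

```python
import numpy as np

def natural_cubic_spline(x, y, t):
    """Evaluate the natural cubic spline through nodes (x, y) at times t."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    m = len(x) - 1
    h = np.diff(x)
    # Linear system for the second derivatives M at the nodes.
    A = np.zeros((m + 1, m + 1))
    rhs = np.zeros(m + 1)
    A[0, 0] = A[m, m] = 1.0          # natural ends: zero second derivative
    for i in range(1, m):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Segment index for every evaluation point.
    i = np.clip(np.searchsorted(x, t, side="right") - 1, 0, m - 1)
    dl = x[i + 1] - t
    dr = t - x[i]
    return ((M[i] * dl ** 3 + M[i + 1] * dr ** 3) / (6.0 * h[i])
            + (y[i] / h[i] - M[i] * h[i] / 6.0) * dl
            + (y[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * dr)

def add_white_noise(signal, level, rng):
    """Add uniform white noise with amplitude `level` times the signal peak."""
    peak = np.max(np.abs(signal))
    return signal + rng.uniform(-level * peak, level * peak, size=len(signal))

# Symmetric shape: 450 Hz at 0 s, 650 Hz at 1 s, 450 Hz at 2 s (from the text).
t_ctrl = np.linspace(0.0, 2.0, 201)              # 201 control samples at 100 Hz
f0_symmetric = natural_cubic_spline([0.0, 1.0, 2.0], [450.0, 650.0, 450.0], t_ctrl)

# Noise levels used in the paper: 1%, 5% and 10% of the maximum amplitude.
rng = np.random.default_rng(0)
clean = np.sin(2.0 * np.pi * 550.0 * np.arange(4410) / 44100.0)  # placeholder signal
noisy = {level: add_white_noise(clean, level, rng) for level in (0.01, 0.05, 0.10)}
```

The other shapes would follow by changing the node lists according to the values given in the bullet list above.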
2.2. WInCA: wavelet method
In this section the novel wavelet-based method for estimating F0 and RFs is explained. The new tool, named WInCA, has been developed specifically for the acoustical analysis of newborn cry, which is characterized by high fundamental frequency F0 and quasi-stationarity. To date no other application of the wavelet transform to newborn cry exists. Therefore, the choice of the mother wavelet is based on results obtained with adult voice signals. Specifically, the Mexican Hat continuous wavelet is applied for F0 estimation [29] and the complex Morlet for RFs estimation [30].
2.2.1. Continuous wavelet transform
The wavelet transform filters a signal f(t) with a shifted and scaled version of a prototype function ψ(t), the so-called “mother wavelet”, a continuous function in both the time domain and the frequency domain [40].
The Continuous Wavelet Transform (CWT) of f(t) is defined as [40]:
$\mathrm{CWT}(a, b; f(t), \psi(t)) = \int_{-\infty}^{+\infty} f(t)\,\dfrac{1}{\sqrt{a}}\,\psi^{*}\!\left(\dfrac{t-b}{a}\right) dt$ (8)
The scale parameter a of a CWT is related to the width of the analysis window: it either dilates or compresses the wavelet. The shift parameter b locates the wavelet in time. Varying a and b allows locating the wavelet at the desired frequency and time instant [40]. The relationship between a and the frequency is given by the so-called pseudo-frequency (Fa) in Hz, defined by the following equation:
$F_a = \dfrac{F_c}{a\,\Delta}$ (9)
where Δ is the sampling period and Fc is the wavelet central frequency.
For F0 estimation, a Mexican Hat CWT is used. The Mexican Hat wavelet is defined as:
$\psi(t) = \dfrac{2}{\sqrt{3}}\,\pi^{-1/4}\,(1 - t^2)\,e^{-t^2/2}$ (10)
For each time window and within the F0 frequency range (200–800 Hz), the highest coefficient of the CWT matrix is found. The autocorrelation (AC) is computed on the row of the matrix that contains this value, which corresponds to the optimal scale. F0 is given by:
$F_0 = F_s / \tau$ (11)
where τ is the position (lag) of the maximum of the AC.

Table 1
Frequency bands of interest in newborn cry, center frequency Fc and bandwidth Fb for the complex Morlet.
Frequency band [Hz]    Fc [Hz]    Fb [Hz]
F1 [800–1500]          0.8        1.98
F2 [1500–3500]         0.75       2.25
F3 [3500–5500]         1.5        0.56
The estimation of F1–F3 is performed in a similar way, with different ranges for the band-pass filter as reported in Table 1 and with a complex Morlet wavelet as prototype [40]. The CWT provides a more accurate representation of the oscillatory components within the signal without introducing fluctuations in the coefficients [40]. The complex Morlet wavelet is described by the following relationship [41]:
$\psi(t) = \dfrac{1}{\pi^{1/4}}\,e^{j\omega_c t}\,e^{-t^2/(2\sigma_t^2)}$ (12)
where $1/\pi^{1/4}$ is the energy normalization term; for this wavelet the center frequency is defined as $\omega_c = 2\pi F_c$. The scale parameter $\sigma_t$ is the standard deviation (STD) of the Gaussian envelope and determines the width of the wavelet. In fact the product $\omega_c\sigma_t$ sets the link between the bandwidth of the wavelet and its frequency Fc. For the Morlet wavelet, this product must satisfy [29,30]:
$\omega_c\,\sigma_t \geq 5$ (13)
Moreover, the following relationship is taken into account:
$F_b = 2\sigma_t^2$ (14)
where Fb is the bandwidth of the wavelet. By comparing the frequency ranges and by analogy with [29], the values of Fc and the corresponding values of Fb were set as in Table 1. Specifically, for each Fc relative to each frequency band, Fb was computed with $\omega_c\sigma_t = 5$, according to Eqs. (13) and (14).
2.2.2. Estimation of F0
For the estimation of the fundamental frequency F0, the proposed method involves the following steps:
1. Band-pass FIR filtering with the Kaiser window [200–800] Hz;
2. Mexican Hat CWT of the signal. A matrix M (dimensions p × q) of coefficients is obtained, where p is the maximum value of the scale and q is the number of frames over which the scaled versions of the mother wavelet are shifted. For a signal duration of 2 s and a fixed shift of 10 ms, q = 200.
3. Location of the scale (row) of M corresponding to the coefficients of maximum modulus and estimation of F0 according to Eq. (11). An F0 estimate is computed every 10 ms.
On each time window the CWT scale parameter a was allowed to vary in the range 1–55. This choice covers the whole range of infant cry F0 [5]. The largest scale of the Mexican Hat CWT, a = 55, with Δ = 1/Fs = 1/44.1 ms and Fc = 0.25 Hz, gives Fa ≈ 200 Hz according to Eq. (9).
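Steps 2 and 3 above (scale selection followed by autocorrelation) can be illustrated on a single analysis segment. The sketch below is not the WInCA code: the band-pass pre-filter of step 1 is omitted, the wavelet support of ±5 scales, the 50 ms segment length and the function names are assumptions, and the CWT is approximated by direct convolution with sampled Mexican Hat wavelets.

```python
import numpy as np

FS = 44100  # sampling frequency in Hz

def mexican_hat(a):
    """Mexican Hat wavelet of Eq. (10), sampled at scale a (in samples)."""
    n = np.arange(-5 * a, 5 * a + 1)   # assumed effective support of +/- 5 scales
    x = n / a
    c = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return c * (1.0 - x ** 2) * np.exp(-x ** 2 / 2.0) / np.sqrt(a)

def estimate_f0(segment, fs=FS, scales=range(1, 56), fmin=200.0, fmax=800.0):
    """Pick the scale of maximum |CWT|, then apply Eq. (11) to its autocorrelation."""
    # One row of coefficients per scale (segment must be longer than the wavelets).
    rows = np.array([np.convolve(segment, mexican_hat(a), mode="same") for a in scales])
    best_row = np.unravel_index(np.argmax(np.abs(rows)), rows.shape)[0]
    row = rows[best_row] - rows[best_row].mean()
    ac = np.correlate(row, row, mode="full")[len(row) - 1:]   # lags 0, 1, 2, ...
    lag_min = int(fs / fmax)           # shortest admissible period, in samples
    lag_max = int(fs / fmin)           # longest admissible period, in samples
    tau = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / tau                    # Eq. (11)

# Usage on a 50 ms test tone at 500 Hz.
t = np.arange(2205) / FS
f0_hat = estimate_f0(np.sin(2.0 * np.pi * 500.0 * t))
```

Because the lag τ is an integer number of samples, the estimate is quantised; at 500 Hz and Fs = 44.1 kHz neighbouring lags differ by roughly 5 to 6 Hz, which gives an idea of the resolution of this simplified version.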
Table 2
RMSE [Hz] for the three methods (WInCA, PRAAT and BioVoice) applied to the 20 synthetic melodic shapes: complex, falling, rising, symmetric and plateau with no added noise and with 1%, 5% and 10% added noise (Noise column, as a fraction of the maximum amplitude). Best results are highlighted in bold.
                  WInCA                               PRAAT                               BioVoice
Shape      Noise  F0       F1       F2       F3       F0       F1       F2       F3       F0       F1       F2       F3
Complex    0.00   6.29     55.27    227.47   600      2.33     50.47    82.53    1050.79  5.87     47.77    65.38    89.35
Complex    0.01   6.24     55.27    227.47   600      2.33     50.51    82.56    1049.77  5.86     55.62    77.31    87.91
Complex    0.05   6.24     55.27    226.9    600      2.40     50.50    82.37    1025.91  6.10     49.99    79.81    91.70
Complex    0.10   6.48     55.91    203.35   600      3.20     51.40    82.59    1007.15  6.10     51.06    76.27    90.74
Falling    0.00   6.62     96.94    467.58   600      2.07     318.30   96.01    1098.48  7.02     120.14   94.51    134.61
Falling    0.01   6.62     96.94    467.58   600      2.06     310.11   96.07    1051.83  6.97     102.55   97.15    135.30
Falling    0.05   6.62     96.90    464.32   600      2.12     310.11   96.07    1051.83  7.30     107.67   78.41    112.56
Falling    0.10   6.62     96.51    441.46   600      2.38     279.06   95.53    987.23   6.98     102.55   97.15    135.30
Rising     0.00   8.99     103.43   495.63   600      2.15     337.57   99.48    1096.02  3.95     119.25   87.68    119.40
Rising     0.01   9.04     103.43   495.63   600      2.15     338.22   99.46    1093.86  3.90     105.96   82.37    108.57
Rising     0.05   8.85     103.76   474.08   600      2.13     330.36   99.42    1076.90  4.08     102.35   71.76    117.54
Rising     0.10   9.34     102.36   463.95   600      2.14     305.95   97.99    1007.47  3.96     104.84   81.57    105.41
Symmetric  0.00   8.21     120.77   388.45   600      2.30     311.86   91.28    1126.26  8.89     138.80   84.80    116.46
Symmetric  0.01   8.28     120.77   388.45   600      2.31     311.62   91.31    1130.12  8.85     128.64   89.26    109.04
Symmetric  0.05   8.53     120.47   362.32   600      2.31     299.99   90.85    1080.83  8.83     128.92   68.46    110.74
Symmetric  0.10   8.59     121.89   308.44   600      2.33     303.84   102.52   996.42   8.92     121.00   73.09    111.82
Plateau    0.00   6.51     24.39    148.42   600      0.17     6.98     89.91    1479.89  3.80     18.85    75.49    46.73
Plateau    0.01   6.51     24.39    148.42   600      0.17     6.99     90.00    1478.08  3.71     18.80    60.00    45.52
Plateau    0.05   6.51     24.39    148.42   600      0.24     7.03     90.29    1449.22  3.85     20.73    62.52    47.36
Plateau    0.10   6.51     24.39    150      600      0.23     7.13     90.26    1365.96  3.89     6.31     73.01    28.76

2.2.3. Estimation of RFs
The estimation of F1–F3 is carried out with a procedure similar to that used for F0, with a different band-pass filter for each RF according to Table 1 and using the complex Morlet as mother wavelet.
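The bandwidth values listed in Table 1 can be reproduced from the relationships between the Morlet parameters. Assuming the limiting case ω_c σ_t = 5 with ω_c = 2πFc, the standard deviation is σ_t = 5/(2πFc) and the bandwidth follows as Fb = 2σ_t². The sketch below carries out this arithmetic; the function name is illustrative.

```python
import math

def morlet_bandwidth(fc, product=5.0):
    """Fb = 2 * sigma_t**2 with sigma_t chosen so that 2*pi*fc*sigma_t = product."""
    sigma_t = product / (2.0 * math.pi * fc)
    return 2.0 * sigma_t ** 2

# Center frequencies from Table 1 for the F1, F2 and F3 bands.
bandwidths = {fc: round(morlet_bandwidth(fc), 2) for fc in (0.8, 0.75, 1.5)}
# bandwidths == {0.8: 1.98, 0.75: 2.25, 1.5: 0.56}
```

The three values agree with the Fb column of Table 1, which supports this reading of the two relationships.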
2.3. Estimation through existing methods and performance comparison
WInCA was compared with PRAAT and BioVoice in F0 and RFs estimation.
With BioVoice, F0 estimation is performed by means of a two-step algorithm. First, the Simple Inverse Filter Tracking (SIFT) is applied to time windows of short and fixed length; afterwards, F0 is adaptively estimated on signal frames of variable length through the Average Magnitude Difference Function (AMDF) within the range provided by the SIFT [7,24,42]. Resonance frequencies are obtained by peak picking in the power spectral density (PSD) evaluated on the same adaptive time windows used for F0 estimation. On each varying time window, an autoregressive model of order q = 22 (half of the sampling frequency in kHz) is applied. This choice for q comes from acoustic and physiological constraints, giving a sufficiently detailed spectrum while preventing spectral smoothing and the consequent loss of spectral peaks [23,43].
In PRAAT, F0 is estimated through the autocorrelation of the signal calculated on short frames, while Linear Predictive Coding (LPC) is applied for RFs estimation [15,16]. We underline here that using the default values of PRAAT for both the F0 and RFs ranges, as well as for the number of RFs, gave definitely incorrect results. Therefore a careful selection of the best values was carried out, necessarily in an empirical way. Specifically, we set the F0 range to 200–800 Hz and the RFs range up to 10000 Hz. The number of RFs was set to 5, but we considered only the first three frequencies [16].
Once F0–F3 were estimated with the three methods (WInCA, BioVoice and PRAAT), the F0 vectors were resampled at 100 Hz (the sampling frequency of the control vectors of the synthesized signals). In this way, the comparison with the reference values was possible through the calculation of the root-mean-square error (RMSE) in Hz, defined as:
$\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2}$ (15)
where N = 201 is the number of samples of the control vectors (length of the F0 vector of the synthesized signal), $x_i$ is the i-th F0 value of the synthetic signal and $y_i$ is the corresponding i-th estimated F0 value. As the synthesized signals had fixed RFs, the interpolation process was not performed on the RFs vectors (F1–F3). Thus, for the RFs, N in Eq. (15) is equal to the number of samples contained in the RFs vector estimated with each of the three methods, and $x_i$ corresponds to the RFs values set for the synthesized signals (F1 = 1100 Hz, F2 = 3000 Hz, F3 = 5500 Hz).
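The comparison procedure can be sketched as follows. The text does not specify how the F0 tracks were resampled to 100 Hz, so linear interpolation is assumed here; the function names and the example track are hypothetical.

```python
import numpy as np

def resample_to_control_rate(t_est, f_est, duration=2.0, rate=100.0):
    """Resample an estimated track onto the 100 Hz control grid (201 points for 2 s)."""
    t_ref = np.arange(int(round(duration * rate)) + 1) / rate
    # Assumption: linear interpolation between the estimated values.
    return t_ref, np.interp(t_ref, t_est, f_est)

def rmse(reference, estimate):
    """Root-mean-square error between reference and estimated values, in Hz."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))

# F0: compare a (hypothetical) estimated track with the reference control vector.
t_ref = np.arange(201) / 100.0
f0_ref = np.full(201, 550.0)
t_est = np.arange(0.0, 2.0001, 0.005)             # estimator output every 5 ms
f0_est = np.full(len(t_est), 552.0)               # constant 2 Hz bias
_, f0_est_100 = resample_to_control_rate(t_est, f0_est)
f0_error = rmse(f0_ref, f0_est_100)               # 2.0 Hz for this example

# RFs: the reference is constant, so no resampling is needed.
f1_est = np.array([1080.0, 1120.0, 1100.0, 1090.0])
f1_error = rmse(np.full(len(f1_est), 1100.0), f1_est)
```

For the RFs, the reference vector simply repeats the value set in the synthesizer, so N equals the number of estimates returned by each tool.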
3. Results
Table 2 shows the RMSE values for each method and the 20 synthetic melodic shapes: complex, falling, rising, symmetric and plateau with no added noise and with 1%, 5% and 10% added noise. Best results are highlighted in bold.
To provide an overall picture, Table 3 summarizes the mean values of RMSE over all the melody shapes and all noise levels for the three methods.
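The summary values can be cross-checked against the per-signal results. The sketch below averages the twenty WInCA F0 errors of Table 2 and reproduces the first entry of Table 3, under the assumption that the ± term is the sample standard deviation (n − 1 in the denominator).

```python
import statistics

# WInCA F0 RMSE values from Table 2, in Hz (complex, falling, rising, symmetric, plateau).
winca_f0 = [
    6.29, 6.24, 6.24, 6.48,
    6.62, 6.62, 6.62, 6.62,
    8.99, 9.04, 8.85, 9.34,
    8.21, 8.28, 8.53, 8.59,
    6.51, 6.51, 6.51, 6.51,
]

mean_rmse = statistics.mean(winca_f0)
std_rmse = statistics.stdev(winca_f0)    # sample standard deviation (assumed)
summary = f"{mean_rmse:.2f} ± {std_rmse:.2f}"
```

With these inputs `summary` evaluates to "7.38 ± 1.16", matching the WInCA F0 entry of Table 3.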
4. Discussion
The comparison among the three methods reported in this paper is made on synthetic signals that simulate the main melodic shapes of newborn cry. To test the robustness of the methods, increasing noise levels are added to the signals. The proposed synthesizer represents a breakthrough with respect to existing literature.

Table 3
Overall RMSE of F0–F3 estimates with the three methods. Best results are highlighted in bold.
          RMSE F0 [Hz]    RMSE F1 [Hz]       RMSE F2 [Hz]        RMSE F3 [Hz]
WInCA     7.38 ± 1.16     80.17 ± 36.13      334.92 ± 135.48     600 ± 0.00
PRAAT     1.88 ± 0.89     199.40 ± 144.11    92.33 ± 6.28        1135.20 ± 164.39

To give a measure of the matching capability of the three methods, the RMSE is computed between the reference values of the synthetic signals and the estimated ones. Results (Tables 2 and 3) show that:
– PRAAT gives slightly better results as far as F0 estimation is concerned, with a mean error of less than 2 Hz. However, it fails in RFs estimation (except for plateau), with a mean error ranging from 92 Hz (for F2) up to 1135 Hz (for F3). We point out again that these results are obtained after a careful manual selection of ranges and thresholds for F0 and for the RFs, according to the literature. Much worse results are obtained if the default values are used (40–500 Hz for F0, up to 5500 Hz for RFs, considering 5 formants) [16]. Thus PRAAT must be used carefully. An additional advantage of PRAAT is undoubtedly its low computing time: the analysis of a signal of 1 s duration requires less than 0.5 s for F0 estimation and about 2 s for the RFs.
– WInCA gives an acceptable F0 error, though higher than PRAAT. Concerning RFs, quite good results are obtained for F1 estimation only. The computing time is comparable to PRAAT: for 1 s of recording WInCA requires 0.9 s for the estimation of F0 and 2.8 s for the estimation of RFs. Another advantage is that WInCA does not require any manual setting. If integrated into BioVoice (or provided with the pre-processing step), it could be a suitable alternative to other tools.
– BioVoice gives quite good results for F0 (mean error < 6 Hz) and better results for RFs as compared to PRAAT. For F2 and F3 it gives the best results among the three tools (errors less than 100 Hz), with an error similar to that of WInCA for F1 (82 Hz against 80 Hz with WInCA). Moreover, it does not require any manual parameter setting and implements the pre-processing step. Finally, it allows the analysis of multiple signals simultaneously, building folders with exportable graphs and tables. These advantages, along with the sophisticated techniques implemented for F0 and RFs estimation, come at the cost of its main drawback, computational complexity (for 1 s of recording it requires 3 s for F0 estimation and 4.7 s for the RFs).
The choice of one technique or another should therefore be carefully evaluated according to the specific needs, in order to avoid errors and unreliable results.
Further suggestions concern the choice of the analysis window and the minimum duration of cry units to analyze. The duration of the analysis window should be set as short as possible. In fact, we applied a window of 10 ms in WInCA for the estimation of F0 and RFs. However, most traditional analysis techniques do not allow adequate frequency resolution for frames of such short duration, and typically much longer windows (even 50 ms) are used, leading to poor temporal resolution. As a compromise, it is therefore suggested to use windows of at most 20 ms duration (as used in BioVoice).
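The trade-off behind this recommendation can be made concrete with the usual rule of thumb for Fourier-based analysis, in which the frequency resolution of a window of duration T is roughly 1/T. This rule is an assumption introduced here for illustration; it is not stated in the paper and it ignores the effect of the window shape.

```python
def fourier_resolution_hz(window_ms):
    """Approximate frequency resolution (Hz) of a window lasting window_ms milliseconds."""
    return 1000.0 / window_ms

# Window lengths mentioned in the text: 10 ms (WInCA), 20 ms (BioVoice), 50 ms.
resolutions = {ms: fourier_resolution_hz(ms) for ms in (10, 20, 50)}
# resolutions == {10: 100.0, 20: 50.0, 50: 20.0}
```

Under this approximation a 10 ms window resolves only about 100 Hz, which is coarse compared with F0 errors of a few Hz, while a 50 ms window reaches about 20 Hz at the price of averaging over five control frames.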
Among the techniques presented here, PRAAT and BioVoice have already been applied to real signals [7,25,44], while the application of WInCA will be the subject of future studies.
5. Conclusions
This paper presents a comparison of three different techniques for the automated acoustical analysis of newborn cry. Specifically, two existing tools (PRAAT and BioVoice) are compared to a newly developed one, named WInCA (Wavelet Infant Cry Analyzer). We point out that the infant cry is a signal extremely difficult to analyze with standard techniques because of its quasi-stationarity and the wide range of frequencies of interest. This study could serve as a guideline for the automated acoustical analysis of infant cry, in order to include these objective methods in clinical practice along with perceptual assessments.
The crying of newborns is a functional expression of basic biological needs. It is based on a coordinated effort of several brain regions, mainly the brainstem and limbic system, and is linked to the respiratory system. Therefore cry features reflect the development and the integrity of the central nervous system.
In addition to other clinical tests, automated infant cry analysis, when properly applied, is a suitable, cheap and contactless approach for an early assessment of the neurological state of infants. A reliable automated method for the estimation of the acoustical characteristics of crying could support the perceptive analysis made by clinicians, reducing the amount of time required, which is often prohibitive in daily clinical practice.
In this work, a newborn cry synthesizer and a new wavelet-based method for newborn cry analysis are presented. Synthetic signals are obtained by implementing a new accurate simulation model that takes into account the variability of the signal under study as well as the basic shapes of the newborn cry melody. A database of the synthetic signals is available on request from the authors.
The new analysis tool named WInCA has been developed specifically for the acoustical analysis of newborn cry, which is characterized by high fundamental frequency F0 and by being quasi-stationary. The wavelet approach is in fact particularly suited to the study of infant cry thanks to its high time-frequency resolution and low computing time. To assess its performance, WInCA is compared to the existing tools PRAAT and BioVoice on a set of synthetic melodic F0 shapes corrupted by increasing noise levels.
Results show that PRAAT gives slightly better F0 results but requires proper settings. WInCA is best suited for F1 estimation and BioVoice gives the best results for F2 and F3. BioVoice is also the most robust against increasing noise.
As specified in the introduction, the aim of this work is to propose for the first time a comparison among objective methods for newborn cry analysis. This study was performed on synthetic signals because exact reference values are needed to obtain an accurate comparison.
The synthesizer proposed in this work is capable of generating the five basic waveforms of the newborn cry that are pointed out in the literature: rising, falling, symmetric, complex and plateau. The synthetic signals match real cries well, also from the perceptual point of view. Of course, they do not represent the wide variety of real cries, which include tens of shapes, but they are nevertheless useful, since no references exist to date. Upon request the authors will provide the synthesized signals to test new methods for infant cry analysis.
The application to real signals will be the subject of future work. In particular, the advantages of BioVoice and WInCA will be merged for the analysis of infant cry in order to detect clinically useful parameters and ranges. Moreover, the approach could be applied to any category of sick newborns or to compare preterm and full-term infants as well.
Acknowledgements
This work was partially carried out under Project PGR00202 “Analysis and classification of voice and facial expression: application to neurological disorders in neonates and adults”, Italian Ministry of Foreign Affairs, Progetti Grande Rilevanza Italy-Mexico.
References
[1] O. Wasz-Höckert, E. Valanne, V. Vuorenkoski, K. Michelsson, A. Sovijarvi, Analysis of some types of vocalization in the newborn and in early infancy, Ann. Paediatr. Fenn. 9 (1963) 1–10.
[2] O. Wasz-Höckert, T.J. Partanen, V. Vuorenkoski, K. Michelsson, E. Valanne, The identification of some specific meanings in infant vocalization, Experientia 20 (3) (1964) 154.
[3] K. Michelsson, K. Eklund, P. Leppanen, H. Lyytinen, Cry characteristics of 172 healthy 1- to 7-day-old infants, Folia Phoniatr. Logop. 5 (2002) 190–200.
[4] H.E. Baeck, M.N. de Souza, Longitudinal study of the fundamental frequency of hunger cries along the first 6 months of healthy babies, J. Voice 5 (2007) 551–559.
[5] H. Rothganger, Analysis of the sounds of the child in the first year of age and a comparison to the language, Early Hum. Dev. 75 (2003) 55–69.
[6] O.F. Reyes-Galaviz, E.A. Tirado, C.A. Reyes-Garcia, et al., Classification of infant crying to identify pathologies in recently born babies with ANFIS, in: Klaus Miesenberger (Ed.), Computers Helping People with Special Needs, Springer, Berlin, 2004, pp. 408–415.
[7] C. Manfredi, L. Bocchi, S. Orlandi, L. Spaccaterra, G.P. Donzelli, High-resolution cry analysis in preterm newborn infants, Med. Eng. Phys. 31 (5) (2009) 528–532.
[8] J.R. Deller, J.H. Hansen, J.G. Proakis, Discrete-Time Processing of Speech Signal, McMillan Publishing Company, New York, 1993, pp. 724–728.
[9] P.S. Zeskind, Infant crying and the synchrony of arousal, in: The Evolution of Emotional Communication: From Sounds in Nonhuman Mammals to Speech and Music in Man, Oxford University Press, Oxford, 2013, pp. 155–174.
[10] A. Fort, A. Ismaelli, C. Manfredi, P. Bruscaglioni, Parametric and non-parametric estimation of speech formants: application to infant cry, Med. Eng. Phys. 18 (8) (1996) 677–691.
[11] A. Fort, C. Manfredi, Acoustic analysis of newborn infant cry signals, Med. Eng. Phys. 20 (1998) 432–442.
[12] S.J. Sheinkopf, J.M. Iverson, B.M. Lester, Atypical cry acoustics in 6-month-old infants at risk for autism spectrum disorder, Autism Res. 5 (5) (2012) 331–339.
[13] P. Sirviö, K. Michelsson, Sound-spectrographic cry analysis of normal and abnormal newborn infants, Folia Phoniatr. Logop. 28 (3) (1976) 161–173.
[14] Kay Elemetrics Corp., Multi-Dimensional Voice Program (MDVP): Software Instruction Manual, Kay Elemetrics, Lincoln Park, 2003.
[15] P. Boersma, D. Weenink, Praat: Doing Phonetics by Computer [Computer Program], Version 5.374, 2014, http://www.praat.org/, Accessed: 07 June 2016.
[16] P. Boersma, Praat, a system for doing phonetics by computer, Glot. Int. 5 (9/10) (2002) 341–345.
[17] M.Z. Laufer, Y. Horii, Fundamental frequency characteristics of infant non-distress vocalization during the first twenty-four weeks, J. Child Lang. 4 (02) (1977) 171–184.
[18] K. Michelsson, O. Michelsson, Phonation in the newborn, infant cry, Int. J. Pediatr. Otorhinol. 49 (1999) S297–S301.
[19] M.P. Robb, D.H. Crowell, P. Dunn-Rankin, Sudden infant death syndrome: cry characteristics, Int. J. Pediatr. Otorhinol. 77 (8) (2013) 1263–1267.
[20] B. Reggiannini, S.J. Sheinkopf, H.F. Silverman, X. Li, B.M. Lester, A flexible analysis tool for the quantitative acoustic assessment of infant cry, J. Speech Lang. Hear. Res. 6 (5) (2013) 1416–1428.
[21] Y. Kheddache, C. Tadj, Resonance frequencies behavior in pathologic cries of newborns, J. Voice 29 (1) (2014) 1–12.
[22] M.P. Robb, A.T. Cacace, Estimation of formant frequencies in infant cry, Int. J. Pediatr. Otorhinol. 32 (1) (1995) 57–67.
[23] K. Wermke, W. Mende, C. Manfredi, P. Bruscaglioni, Developmental aspects of infant’s cry melody and formants, Med. Eng. Phys. 24 (2002) 501–514.
[24] C. Manfredi, L. Bocchi, G. Cantarella, A multipurpose user-friendly tool for voice analysis: application to pathological adult voices, Biomed. Signal Process. Control 4 (2009) 212–220.
[25] S. Orlandi, L. Bocchi, G.P. Donzelli, C. Manfredi, Central blood oxygen saturation vs crying in preterm newborns, Biomed. Signal Process. Control 7 (2012) 88–92.
[26] S. Orlandi, P.H. Dejonckere, J. Schoentgen, J. Lebacq, N. Rruqja, C. Manfredi, Effective pre-processing of long term noisy audio recordings. An aid to clinical monitoring, Biomed. Signal Process. Control 8 (6) (2013) 799–810.
[27] C. Manfredi, V. Tocchioni, L. Bocchi, A robust tool for newborn infant cry analysis, in: Conf. Proc. IEEE Eng. Med. Biol. Soc., New York, NY, 2006, pp. 509–512.
[28] N. Rruqja, P.H. Dejonckere, G. Cantarella, J. Schoentgen, S. Orlandi, S.D. Barbagallo, C. Manfredi, Testing software tools with synthesized deviant voices for medicolegal assessment of occupational dysphonia, Biomed. Signal Process. Control 13 (2014) 71–78.
[29] L. Cnockaert, J. Schoentgen, P. Auzou, C. Ozsancak, L. Defebvre, F. Grenez, Low-frequency vocal modulations in vowels produced by Parkinsonian subjects, Speech Commun. 50 (4) (2003) 288–300.
[30] L. Falek, A. Amrouche, L. Fergani, H. Teffahi, A. Djeradi, Formantic analysis of speech signal by wavelet transform, Conf. Proc. World Congr. Eng. vol. 2 (2011) 1572–1576.
[31] G. De Poli, C. Drioli, F. Avanzini, Sintesi Dei Segnali Audio, 1999, http://www.dei.unipd.it/~musica/Dispense/cap5.pdf. Accessed: 07 June 2016.
[32] J.D. Markel, A.H. Gray, Linear Prediction of Speech, Springer-Verlag, Berlin, Heidelberg, New York, 2011.
[33] G. De Poli, F. Avanzini, Sound modeling: signal-based approaches, in: Algorithms for Sound and Music Computing, 2009.
[34] K. Wermke, W. Mende, Musical elements in human infants’ cries: in the beginning is the melody, Musicae Scientiae 13 (Suppl. 2) (2009) 151–175.
[35] G. Varallyay, The melody of crying, Int. J. Pediatr. Otorhinol. 71 (11) (2007) 1699–1708.
[36] E. Amaro-Camargo, C.A. Reyes-García, Applying statistical vectors of acoustic characteristics for the automatic classification of infant cry, in: Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, Springer Berlin, Heidelberg, 2007, pp. 1078–1085.
[37] K. Wermke, D. Leising, A. Stellzig-Eisenhauer, Relation of melody complexity in infants’ cries to language outcome in the second year of life: a longitudinal study, Clin. Linguist. Phonet. 21 (2007) 961–973.
[38] S. Orlandi, C.A. Reyes-Garcia, A. Bandini, G.P. Donzelli, C. Manfredi, Application of pattern recognition techniques to the classification of full-term and preterm infant cry, J. Voice 30 (6) (2016) 656–663.
[39] http://www.audacityteam.org/, Accessed: 07 June 2016.
[40] C. Chui, An Introduction to Wavelets, Academic Press, San Diego, CA, USA, 1995.
[41] P.S. Addison, The Illustrated Wavelet Transform Handbook: Introductory Theory and Applications in Science, Engineering, Medicine and Finance, Institute of Physics Publishing, London, 2002.
[42] C. Manfredi, G. Peretti, A new insight into post-surgical objective voice quality evaluation. Application to thyroplastic medialisation, IEEE Trans. Biomed. Eng. 53 (3) (2006) 442–451.
[43] C. Manfredi, Adaptive noise energy estimation in pathological speech signals, IEEE Trans. Biomed. Eng. 47 (2000) 1538–1542.
[44] G. Esposito, M. del Carmen Rostagno, P. Rostagno, P. Venuti, J.D. Haltigan, D.S. Messinger, Brief report: atypical expression of distress during the separation phase of the strange situation procedure in infant siblings at high risk for ASD, J. Autism Dev. Disord. (2013) 1–6.