Testing software tools for newborn cry analysis using synthetic signals

(1)

ContentslistsavailableatScienceDirect

Biomedical

Signal

Processing

and

Control

j ou rn a l h o m e pa g e :w w w . e l s e v i e r . c o m / l o c a t e / b s p c

Testing

software

tools

for

newborn

cry

analysis

using

synthetic

signals

S. Orlandi

a,∗

_,

_A.

_Bandini

a,b

_,

_F.F.

_Fiaschi

a

_,

_C.

_Manfredi

a

a_Department_of_Information_Engineering,_Università_degli_Studi_di_Firenze,_Firenze,_Italy

b_Department_of_Electrical,_Electronic_and_Information_Engineering_(DEI)_“Guglielmo_Marconi”,_Università_di_Bologna,_Bologna,_Italy

a

r

t

i

c

l

e

i

n

f

o

Articlehistory: Received7June2016 Receivedinrevisedform 20September2016 Accepted15December2016 Availableonline21December2016 Keywords: Infantcry Acousticalanalysis Autoregressivemodels Wavelettransform Fundamentalfrequency Resonancefrequencies

a

b

s

t

r

a

c

t

Contactlesstechniquesareofincreasingclinicalinterestastheycanprovideadvantagesintermsof

comfortandsafetyofthepatientwithrespecttosensor-basedmethods.Therefore,theyareparticularly

wellsuitedforvulnerablepatientssuchasnewborns.Speciﬁcallytheacousticalanalysisoftheinfantcry

isacontactlessapproachtoassisttheclinicalspecialistinthedetectionofabnormalitiesininfantswith

possibleneurologicaldisorders.Alongwiththeperceptualanalysis,theautomatedanalysisofinfant

cryisusuallyperformedthroughsoftwaretoolsthathowevermightnotbedevotedtothisspeciﬁc

signal.Thenewborncryisasignalextremelydifﬁculttoanalyzewithstandardtechniquesduetoits

quasi-stationarityandtoveryhighrangeoffrequenciesofinterest.Thereforesoftwaretoolsshouldbe

speciﬁcallysetandusedwithcaution.Toaddressthisissuethreemethodsaretestedandcompared,

onefreelyavailableandothertwospeciﬁcallybuiltusingdifferentapproaches:autoregressiveadaptive

modelsandwavelets.Thethreemethodsarecomparedusingsyntheticsignalscomingfromasynthesizer

developedforthegenerationofbasicmelodicshapesofthenewborncry.Resultspointoutstrengthsand

weaknessesofeachmethod,thussuggestingtheirmostappropriateuseaccordingtothegoalsofthe

analysis.

1. Introduction

Cryingisthefirstandprimarymethodofcommunicationamong humans.It is a functional expression of basicbiological needs, emotionalorpsychologicalconditionssuchashunger,cold,pain, crampsand even joy [1,2]. It involvesactivation of thecentral nervoussystemandrequiresacoordinatedeffortofseveralbrain regions,mainlybrainstemandlimbicsystem.Thereforeabrain dys-functionmayleadtodisordersinthevibrationofthevocalfoldsand inthecoordinationofthelarynx,pharynxandvocaltract,giving risetoanabnormalcry[3,4].Todatenewborncryanalysisismost oftenperformedwithaperceptiveexaminationbasedon listen-ingtothecryandvisuallyinspectingthesignalwaveformandits spectrogram.Thisapproachisoperator-dependentandrequiresa considerableamountoftimeoftenprohibitiveindailyclinical prac-tice.Anaccurateandautomatedacousticanalysisofnewborncry couldbehelpfultoassistthecliniciantodetectriskmarkersof neu-rodevelopmentaldisorders.Specifically,thedistinctionbetweena regularandanabnormalcryingcanbeveryusefulintheclinical practice.Thereforethescientificcommunityispayingspecial

atten-∗ Correspondingauthor.

E-mailaddress:silvia.orlandi@uniﬁ.it(S.Orlandi).

tiontotechniquesdevotedtoanaccurateautomatedanalysisofthe newborncry[5–7].

Themostsigniﬁcantacousticalparametersofinfantcryingare thefundamentalfrequency(F0)andtheﬁrstthreeformant

frequen-ciesofthevocaltract(F1,F2andF3).F0reﬂectstheregularityofthe

vibrationofthevocalfoldswhileF1,F2andF3 arerelatedtothe

varyingshapeandlengthofthevocaltractduringphonationand thustoitscontrol[8].Actually,itismoreappropriatetoreferto resonancefrequencies(RFs)ratherthanformantsinnewborns.In fact,thevocaltractisalmostﬂat,themobilityoftheoralcavityis reducedandthebabyisunabletoarticulatevowelorconsonant soundsasthepharynxistooshortandnotwideenoughforthat purpose.ThereforehereafterwewillrefertoF1-F3asofresonance

frequencies(RFs).

InfantF0valuesareusuallyintherange200Hz–800Hz(inthe

caseofhyperphonationtheycanreachandexceed1000Hz)[5,9]. Thisrangeisquitewideincludingbothhealthyandpathological newborns.Indeed nopreciserangesare validatedin the litera-turefordifferentiatingthetwocategories,thepossiblediseases orcategoriesbeingofverydifferentnature(i.e.,deaf,asphyxiated, gastroschistic,preterm,etc).TypicalvaluesfortheﬁrstthreeRFs arearound1000Hz,3000Hzand5000Hz,respectively[10].The RFsestimationisverydifﬁculttoperformconsideringtheirhigh

http://dx.doi.org/10.1016/j.bspc.2016.12.012

(2)

variability.Signiﬁcantdeviationsfromtheserangesmayberelated topathologicalconditionsofthecentralnervoussystem[11,12].

The automated analysis of infant cry hasits origins several decadesagowhenthetechnologywaslimited.Therefore,itwas mainlybasedontheperceptualanalysismadebycliniciansthrough listeningtothecrysignalandvisualinspectionofspectrogram esti-matedwiththeFastFourierTransform(FFT)[13].Thisapproachis implementedintheMultidimensionalVoiceProgram(MDVPTM),

theﬁrstandstillusedcommercialtool,thoughdevelopedforadult voices[14].Currently,manyresearchersusePRAAT[15,16]freely availableonline.AsMDVPTM,itwasdevelopedfortheadultvoice.

Itrequiresacarefulmanualsettingofsomeparametersandthus sometechnicalskills[16].

Thus,sinceearlystudies,itwashighlightedtheneedtodevelop dedicatedsoftwaretoolsthatcouldprovideautomaticallythemain parametersoftheinfantcry.Inthelasttwentyyears,mostofthe researchwasdevotedtoF0estimationwithtraditionalapproaches

suchas FFT, correlation and cepstrum [13,17–20]. Instead, few papersaddressedtheRFsestimation:insomepapersF1was esti-matedwithFFT[10,18,21,22]andrecentlyFFTisappliedforF1-F3

estimation[20].InRobbetal.[22],FFTandLinearPrediction(LP) methodswerecompared,withresultscomparableonlyforF1.

Several approaches were tested on synthetic and real sig-nals[7,10,11,23].Anautomaticadaptiveparametricapproachwas developed,calledBioVoice[24],andsuccessfullyappliedto new-borncry[7,25,26].AswellasforF0,thedifﬁcultyintheestimation

oftheRFsismainlyduetothequasi-stationarityandtheveryhigh rangeoffrequenciesofinterestinthenewborncrywhichrequires sophisticatedadaptivenumericaltechniquescharacterizedbyhigh time-frequencyresolution. Totheauthors’knowledgeonly two softwaretoolsexistspeciﬁcallydesignedfortheautomaticanalysis ofinfantcrying.TheabovementionedBioVoice[7,23,25–27],based onaninnovativeadaptiveparametricapproachforF0andformants

estimation[10]andsuccessfullytestedagainstothersoftwaretools [11,28]andasoftwaretoolrecentlyproposedbyReggianninietal. [20]thatestimatesF0 bymeansofacepstrumapproach.[11].It

wasshownthatthecepstrumapproachisfasterthanaparametric approach,butlessrobustagainstnoiseinfrequencies(F0andRFs)

estimation[11].

Takingintoaccountthehighvariabilityofthesignalthewavelet approachcouldbeparticularlysuitedtothestudyofneonatalcry thankstoitstime-frequencyvaryingresolutionandlowcomputing time.Thereforeanewautomatedmethodbasedonthewavelet transformisproposedherefortheestimationofF0 andtheRFs

ofnewborncrythat,liketheBioVoicetool,doesnotrequireany manualsettingtobemadebytheuser.Basedontheliteraturethe MexicanhatContinuousWaveletTransform(CWT)isappliedforF0

estimationandthecomplexMorletforRFsestimation[29,30].The proposedapproach,namedWInCA(WaveletInfantCryAnalyzer) iscurrentlyimplementedinMATLABbutcanbeeasilyadaptedto anyembeddedprocessor.

Anewsynthesizercapabletoreproducevariablemelodicshapes ofnewborncrywasdeveloped,basedonthemethodsdescribedin [8,31].ThissynthesizerisusedtotestWInCAagainstPRAATand BioVoice.

Thispaperpresentstheﬁrstattempttoapplywaveletstothe analysisofnewborncry,thereforethemethodwillbedescribedin thenextsectionalongwiththenewsynthesizer.

Theinnovativefeaturesofthispaperare:asynthesizer specif-icallydevelopedtoreproducethemelodicshapeoftheneonatal cry;anewmethodfornewborncryanalysisbasedonthewavelet transformandacomparisonofthreedifferentmethodsof auto-matedanalysisofcryonsyntheticsignals.Advantagesandlimits ofthethreemethodsarehighlightedandtheoptimaluseofeach ofthemissuggested.

2. Methods

2.1. Thenewborncrysynthesizer

ThesynthesizerwasdevelopedunderMatlabcomputing envi-ronment.Itiscapableofsynthesizingnewborncrysignalswith differentmelodyshapes.Thesynthesizer,basedonamethod devel-opedforadultmalevoices[8,31]iscomposedbytwoblocks:apulse traingeneratorandavocaltractﬁlter,accordingto[32].

2.1.1. Pulsetraingenerator

Thepulsetraingenerator,basedonadditivesynthesis,buildsa glottalpulsesequence.Itapproximatesaperiodicsignalthrough alinearcombinationofsinewaveswhosefrequenciesarein har-monicratiowitheachother.Thus,theﬁrststepoftheinfantcry synthesizerassumestheglottalpulsesequenceasa pulsetrain withperiodTthatisthereciprocalofthefundamentalfrequency F0(F0=1/T).

Thesynthesisofeachharmoniccomponentisobtainedthrough anoscillatorblock.Eachoscillatorisdrivenbytwocontrolvectors: theamplitude(a[n])and thefrequency(f[n])oftheoutputsine waves,wherenisthen-thelementofthecontrolvectors(1≤n≤N) andNisrelatedtotheduration(ins)ofthesynthesizedsignaland tothenumberofframesinwhichthesynthesizedsignalis seg-mented.AssumingasamplingfrequencyFs=44.1kHzandaframe

durationof10ms,weget441samplesperframe(SpF).Thus,the framerate(thefrequencyofthecontrolvector)isgivenbytheratio Fs/SpF(100Hz).Withasynthesizedsignalof2sofdurationN=201

samples.

Intheﬁrstharmonica0[n]=1andf0[n]=F0.Higherharmonics

areobtainedwiththesameamplitudevectoroftheﬁrstharmonic (a0[n])andfrequencyvectorsgivenbymultiplesofF0:

fk[n] =kf0[n] (1)

where1≤k≤KandK=30isthenumberofdigitaloscillators con-sideredinthesynthesizer.Fromtheliterature[5]weknowthatthe rangeofF0fornewborncryisaround200–800Hzandtheﬁrstthree

resonancefrequenciescanreachvaluesupto10kHz.Assumingan averageF0=400Hz,withK=30weareabletosynthesizesignals

withfrequenciesupto12kHz,thuscovering therangeof inter-est.Signalsobtainedbyeachoscillatorarethensummedupgiving risetotheglottalpulsetrainss[n].Accordingtotheadditive

syn-thesisdescribedabove,F0iskeptconstantthroughoutthewhole

vocalemission.Thus,togetatimevaryingF0inthesynthesized

sig-nal,amodificationofthisapproachisproposedherethroughthe modulationofthefirstharmonic.Modulationisobtaineddriving theoscillatorthatgeneratesthefirstharmonicwithavectorf0[n]

madeupbyvariablevalues,asitwillbediscussedinsubsection A3.AccordingtoEq.(1),theotheroscillators(thatgeneratehigher orderharmonics)willgiverisetofrequencymodulatedsignals. 2.1.2. Vocaltractﬁlter

Vocaltractresonancesaregivenbyafilterbankwith3parallel resonancefilters[31].Eachfilterisgivenbya2nd-orderall-pole filterwithfrequencyresponse:

H(z)= b1

1−a2z−1+a3z−2

(2) Giventhebandwidth(B)ofthefilter,theresonancecentral fre-quency(fc)andωc=2fc/Fs,thefiltercoefficients areobtained

throughthefollowingrelationships[33]:

r=e−BFs (3)

a2=−2rcos(2fc

(3)

Fig.1.Basicmelodyshapesofthenewborncryobtainedthroughthenewborncrysynthesizer:a)Complex(double-archmelodyshape);b)Falling(singlearchmelodyshape witharapidF0increasefollowedbyaslowdecrease,withasymmetricshapeskewedtotheleft);c)Plateau(ﬂatmelodyproﬁle);d)Rising(single-archmelodyshapewith

aslowF0increasefollowedbyarapiddecrease,withasymmetricshapeskewedtotheright);e)Symmetric(singlearchmelodyshapealmostsymmetricwithrespecttoits

midpoint).

a3=r2 (5)

b1=(1−r)

1−2rcos(ωc)+r2 (6)

Thusthefrequencyresponse(2)canbewrittenas:

H(Z)= b1

1−2rcos(ωc)z−1+r2z−2

(7) Thecentralfrequenciesofthethreeparallelﬁlters(fc)correspond

tothevocaltractformantfrequencies(F1–F3).Thus,theproposed

synthesizercanbecontrolledbysettingthevalueofthe funda-mentalfrequency(F0)andoftheﬁrstthreeformantfrequencies

(F1–F3).

2.1.3. Infantcrysynthesis−melodyshapesandvocaltract resonances

Accordingto[5]thefundamentalfrequencyF0ofhealthy

new-bornsusuallyvariesbetween200Hzand800Hz.Thus,theF0values

ofthesynthesizedsignalsareallowedtovaryinthisrange.F0

varia-tionsweresetaccordingtothebasicmelodyshapesofthenewborn cry[23,34–37]:

• Complex:double-archmelodyshape(Fig.1a);

• Falling:singlearchmelodyshapewitharapidF0 increase

fol-lowedbyaslowdecrease,withasymmetricshapeskewedtothe left(Fig.1b);

• Plateau:ﬂatmelodyproﬁle(Fig.1c);

• Rising:singlearchmelodyshapewithaslowF0increasefollowed

byarapiddecrease,withasymmetricshapeskewedtotheright (Fig.1d);

• Symmetric: singlearch melody shape almostsymmetric with respecttoitsmidpoint(Fig.1e).

Asmentionedabove,thesynthesizedsignalshaveﬁxed dura-tionequalto2s.Thus,thecontrolvectorsthatdrivetheoscillators aremadeupby201samples.Togetcontrolvectorswithvarying frequencyforsynthesizingthe5basicmelodyshapesofF0,aspline

interpolationwasimplemented.Theinterpolationpoints(nodes) weresettoobtainmelodicshapeswithintherange400Hz–650Hz, whichisthecentralintervaloftherangereportedin[5]. Specif-ically, for each basic shape, the frequency control vectors are obtainedasfollows:

• Thecomplexshapeisdefinedascomposedbymultiplearcs(at least2)[34].Wesynthesizedadouble-arcshapewhosemaxima (withfrequencyequalto590Hz)arelocatedat0.25sand1.25s respectively.Betweenthesemaxima,alocalminimum(with fre-quencyequalto520Hz)islocatedat0.6s,inordertomakethe secondpeakslowerwithrespecttothefirstone(Fig.1a).Thefirst andlastnodesofthevectortobeinterpolatedwerebothsetat 500Hz

• Thefallingshapewasobtainedsettinga maximumfrequency (650Hz)at0.4s,thatallowforaslowdecreasetowards450Hz (Fig.1b).Theﬁrstnodewassetequalto550Hz

• Theplateaushape wasobtainedby varyingF0 withina quite

narrowrange(540–555Hz)aroundanaverageF0valueequalto

550Hz.Inparticular,thereisaslightlyascendingsectionupto 555Hz,followedbyadecreasetowards540Hz(Fig.1c). • Therisingshapewasobtainedbyreversingthefallingshape.The

(4)

decreasetowards550Hz(Fig.1d).Theﬁrstnodewassetequalto 450Hz

• The symmetric shape was obtained setting the maximum (650Hz)inthemidpointofthevectortobeinterpolated(thus at1s).Theﬁrstandlastnodesweresetequalto450Hz(Fig.1e). Theamplitudecontrolvectorswereobtainedinthesameway foralltheshapes:increasingamplitudeissetintheﬁrstpartof thesignal,followedbyanearlyconstantpartandthendecreasing amplitudeattheendofthesignal.

Resonancefrequencyrangesweresetaccordingto[38],where theﬁrstthreeRFsof28healthytermnewbornswerefoundinthe range:F1[700–1400]Hz;F2[2000–4000]Hz;F3[4000–7000]Hz. Speciﬁcallyhereweset:F1=1100Hz,F2=3000HzandF3=5500Hz.

Totesttheproposedmethod,additivewhitenoiseofincreasing amplitude(1%,5%and10%ofthesignalmaximumamplitude)was addedtothesynthesizedsignalsthroughtheAudacitytool[39]. Noisewasaddedtohavemorerealisticmelodicshapesofcriesin healthyorsickinfants,sinceitisalwayscharacterizedby irregu-larities.Theﬁvemelodicshapesdescribedaboveweresynthesized withoutnoiseandwitheachoneofthethreelevelsofaddednoise. Thereforetwentysyntheticsignalswereanalyzed.

2.2. WInCA:waveletmethod

Inthis section thenovelwavelet-basedmethodfor estimat-ing F0 and RFs is explained. The new tool named WInCA has

beendevelopedspeciﬁcallyfortheacousticalanalysisofnewborn cry,characterizedbyhighfundamental frequencyF0 and

quasi-stationarity.Todatenootherapplicationofthewavelettransform tonewborncryexists.Therefore,thechoiceofthemotherwavelet isbasedonresultsobtainedwithadultvoicesignals.Speciﬁcally theMexicanHatcontinuouswaveletisappliedforF0 estimation

[29]andthecomplexMorletforRFsestimation[30]. 2.2.1. Continuouswavelettransform

Thewavelettransformﬁlters asignalf(t)withashiftedand scaledversionofaprototypefunction␺(t),theso-called“mother wavelet”,acontinuousfunctioninboththetimedomainandthe frequencydomain[40].

TheContinuousWaveletTransform(CWT)off(t)isdeﬁnedas [40]: CWT(a,b;f(t),␺(t))=

+∞ −∞ f(t)√1 a␺ ∗

t−b a

dt (8)

ThescaleparameteraofaCWTisrelatedtothewidthoftheanalysis window:iteitherdilatesorcompressesthesignal.Theshift param-eterblocatesthewaveletintime.Varyingaandballowslocating thewavelet atthedesiredfrequency andtimeinstant[40].The relationshipbetweenaandthefrequencyisgivenbytheso-called pseudo-frequency(Fa)inHz,deﬁnedbythefollowingequation:

Fa= Fc

a (9)

WhereisthesamplingperiodandFcisthewaveletcentral

fre-quency.

ForF0estimation,aMexicanHatCWTisused.TheMexicanHat

CWTisdeﬁnedas: ␺(t)= √2

3

−1

4₍₁−_x2_)e−x22 ₍₁₀₎

For each time window and within the F0 frequency range

(200–800Hz),thehighestcoefﬁcientoftheCWTmatrixisfound. Theautocorrelation(AC)iscomputedontherowofthematrixthat

Table1

Frequencybandsofinterestinnewborncry,centerfrequencyFcandbandwidthFb

forthecomplexMorlet.

Frequencyband[Hz] Fc[Hz] Fb[Hz]

F1[800–1500] 0.8 1.98

F2[1500–3500] 0.75 2.25

F3[3500–5500] 1.5 0.56

containsthisvalue,whichcorrespondstotheoptimalscale.F0is

givenby:

F0=Fs/␶ (11)

Where␶referstotheposition(lag)ofthemaximumoftheAC.

TheestimationofF1–F3isperformedinasimilarway,with

dif-ferentrangesfortheband-passﬁlterasreportedinTable1with

acomplexMorletwaveletasprototype[40].TheCWTprovidesa moreaccuraterepresentationoftheoscillatorycomponentswithin thesignalwithoutintroducingﬂuctuationsinthecoefﬁcients[40]. ThecomplexMorletwaveletisdescribedbythefollowing relation-ship[41]: (t)= 1 14 ej␻ct_e − t2 2 2t ₍₁₂₎

where 1/␲1/4 _is _the_{normalization} _term_of _the _energy;_for _this

waveletthecenterfrequencyisdeﬁned as␻c=2␲Fc.Thescale

parameter␴t isthestandard deviation(STD).It determinesthe

amplitudeofthewavelet.Infact␻c␴tsetsthelinkbetweenthe

bandwidthof the waveletand itsfrequency Fc.For the Morlet

wavelet,thelattermustassumevaluessuchthat[29,30]:

ωct≥5 (13)

Moreover,thefollowingrelationshipistakenintoaccount:

Fb=2t2 (14)

WhereFbisthebandwidthofthewavelet.Comparingthefrequency

rangesandonanalogyto[29]thevaluesofFcandthecorresponding

valuesofFb weresetasinTable1.Speciﬁcally,foreachFc

rela-tivetoeachfrequencyband,Fbwascomputedwith␻c␴t=5and

accordingtoEqs.(13)and(14). 2.2.2. EstimationofF0

FortheestimationofthefundamentalfrequencyF0,the

pro-posedmethodinvolvesthefollowingsteps:

1.Band-passFIRﬁlteringwiththeKaiserwindow[200–800]Hz; 2.MexicanHatCWTofthesignal.AmatrixM(dimensionspxq)of

coefﬁcientsisobtained,wherepisthemaximumvalueofthe scaleandqisthenumberofframesonwhichthescaledversions ofthemotherwaveletareshifted.Forasignaldurationof2sand aﬁxedshiftof10ms,q=200.

3.Locationofthescale(line)ofMcorrespondingtothecoefﬁcients ofmaximummodulusandestimationofF0accordingtoEq.(11).

AnF0estimateiscomputedevery10ms.

OneachtimewindowtheCWTscaleparameterawasallowed tovaryintherange1÷55.Thischoiceallowsincludingthewhole range ofinfantcryF0 [5].Therefore theMexicanHatCWT was

appliedwitha=55,=1/Fs=1/44.1s, Fc=0.25Hz.Consequently

Fa=200HzaccordingtoEq.(9).

2.2.3. EstimationofRFs

TheestimationofF1–F3iscarriedoutwithaproceduresimilar

(5)

ﬁl-Table2

RMSEforthethreemethods(WInCA,PRAATandBioVoice)appliedtothe20syntheticmelodicshapes:complex,falling,rising,symmetricandplateauwithnoaddednoise, 1%,5%and10%addednoise.Bestresultsarehighlightedinbold.

RMSE[Hz] Noise WInCA PRAAT BIOVOICE

F0 F1 F2 F3 F0 F1 F2 F3 F0 F1 F2 F3 Complex 0.00 6.29 55.27 227.47 600 2.33 50.47 82.53 1050.79 5.87 47.77 65.38 89.35 0.01 6.24 55.27 227.47 600 2.33 50.51 82.56 1049.77 5.86 55.62 77.31 87.91 0.05 6.24 55.27 226.9 600 2.40 50.50 82.37 1025.91 6.10 49.99 79.81 91.70 0.10 6.48 55.91 203.35 600 3.20 51.40 82.59 1007.15 6.10 51.06 76.27 90.74 Falling 0.00 6.62 96.94 467.58 600 2.07 318.30 96.01 1098.48 7.02 120.14 94.51 134.61 0.01 6.62 96.94 467.58 600 2.06 310.11 96.07 1051.83 6.97 102.55 97.15 135.30 0.05 6.62 96.90 464.32 600 2.12 310.11 96.07 1051.83 7.30 107.67 78.41 112.56 0.10 6.62 96.51 441.46 600 2.38 279.06 95.53 987.23 6.98 102.55 97.15 135.30 Rising 0.00 8.99 103.43 495.63 600 2.15 337.57 99.48 1096.02 3.95 119.25 87.68 119.40 0.01 9.04 103.43 495.63 600 2.15 338.22 99.46 1093.86 3.90 105.96 82.37 108.57 0.05 8.85 103.76 474.08 600 2.13 330.36 99.42 1076.90 4.08 102.35 71.76 117.54 0.10 9.34 102.36 463.95 600 2.14 305.95 97.99 1007.47 3.96 104.84 81.57 105.41 Symmetric 0.00 8.21 120.77 388.45 600 2.30 311.86 91.28 1126.26 8.89 138.80 84.80 116.46 0.01 8.28 120.77 388.45 600 2.31 311.62 91.31 1130.12 8.85 128.64 89.26 109.04 0.05 8.53 120.47 362.32 600 2.31 299.99 90.85 1080.83 8.83 128.92 68.46 110.74 0.10 8.59 121.89 308.44 600 2.33 303.84 102.52 996.42 8.92 121.00 73.09 111.82 Plateau 0.00 6.51 24.39 148.42 600 0.17 6.98 89.91 1479.89 3.80 18.85 75.49 46.73 0.01 6.51 24.39 148.42 600 0.17 6.99 90.00 1478.08 3.71 18.80 60.00 45.52 0.05 6.51 24.39 148.42 600 0.24 7.03 90.29 1449.22 3.85 20.73 62.52 47.36 0.10 6.51 24.39 150 600 0.23 7.13 90.26 1365.96 3.89 6.31 73.01 28.76

ter,accordingtoTable1andusingtheComplexMorletasmother

wavelet.

2.3. Estimationthroughexistingmethodsandperformance comparison

WInCAwascomparedwithPRAATandBioVoiceinF0andRFs

estimation.

WithBioVoiceF0estimationisperformedbymeansofa

two-stepalgorithm.First,theSimpleInverseFilterTracking(SIFT)is appliedtotimewindowsofshortandﬁxedlength;afterwards,F0

isadaptivelyestimatedonsignalframesofvariablelengththrough theAverage Magnitude Difference Function (AMDF) withinthe rangeprovidedbytheSIFT[7,24,42].Resonancefrequenciesare obtainedbypeakpickinginthepowerspectraldensity(PSD) eval-uatedonthesameadaptivetimewindowsusedforF0estimation.

Oneachvaryingtimewindow,anautoregressivemodeloforder q=22(halfofthesamplingfrequencyinkHz)isapplied.Thischoice forqcomesfromacousticandphysiologicalconstraints,givingan enoughdetailed spectrumwhile preventingspectral smoothing andconsequentlossofspectralpeaks[23,43].

InPRAATF0isestimatedthroughtheautocorrelationofthe

sig-nalcalculatedonshortframeswhileLinearPredictiveCoding(LPC) isappliedfortheRFsestimation[15,16].Weunderlineherethat usingthedefaultvaluesofPRAATforbothF0 andRFsrangesas

wellastheirnumber deﬁnitelyincorrectresultswereobtained. Thereforeacareful selectionofthebestvalueswascarriedout, necessarilyinanempiricalway.SpeciﬁcallywechosetheF0range

equalto200–800HzandforRFsupto10000Hz.ThenumberofRFs wassetto5butweconsideredonlytheﬁrstthreefrequencies[16]. Once that F0–F3 were estimated with the three methods

(WInCA,BiovoiceandPRAAT),theF0vectorswereresampledat

100Hz(samplingfrequencyofthecontrolvectorsofthe synthe-sizedsignals).Inthisway,thecomparisonwiththereferencevalues

waspossiblethroughthecalculationoftheroot-mean-squareerror (RMSE)inHz,deﬁnedas:

RMSE=

1 N N

i=1 (xi−yi)2 (15)

where N=201is thenumber of samplesof thecontrol vectors (lengthoftheF0 vectorofthesynthesizedsignal),xiis thei-th

F0 valueofthesyntheticsignaland yi isthecorrespondingi-th

estimatedF0values.AsthesynthesizedsignalshadﬁxedRFs,the

interpolationprocesswasnotperformedontheRFsvectors(F1–F3).

ThusNinEq.(15)isequaltothenumberofsamplescontainedinthe RFsvectorestimatedwiththethreemethodsandxicorrespondsto

theRFsvaluesconsideredforthesynthesizedsignals(F1=1100Hz,

F2=3000Hz,F3=5500Hz).

3. Results

Table2showstheRMSEvaluesfor eachmethodandthe20 syntheticmelodicshapes:complex,falling,rising,symmetricand plateauwith:noaddednoise,1%,5%and10%addednoise.Best resultsarehighlightedinbold.

Toprovideanoverallpicture,Table3summarizesthemean val-uesofRMSEoverallthemelodyshapesandallnoiselevelsforthe threemethods.

4. Discussion

Thecomparisonamongthethreemethodsreportedinthispaper is made according to synthetic signals that simulate the main melodicshapesofnewborncry.Totesttherobustnessofthe meth-odsincreasingnoiselevelsareaddedtothesignals.Theproposed

Table3

OverallRMSEofF0–F3estimateswiththethreemethods.Bestresultsarehighlightedinbold.

RMSEF0[Hz] RMSEF1[Hz] RMSEF2[Hz] RMSEF3[Hz]

WInCA 7.38±1.16 80.17±36.13 334.92±135.48 600±0.00

PRAAT 1.88±0.89 199.40±144.11 92.33±6.28 1135.20±164.39

(6)

synthesizerrepresentsabreakthroughwithrespecttoexisting lit-erature.

Togiveameasureofthematchingcapabilityofthethree

meth-odstheRMSEiscomputedbetweenthesyntheticsignalsandthe

estimatedones.Results(Tables2and3)showthat:

–PRAATgivesslightlybetterresultsasfarasF0estimationis

con-cerned,withameanerrorlessthan2Hz.However,itfailsinthe RFsestimation(exceptforplateau)withameanerrorranging from92Hz(forF2)upto1135HzforF3.

Wepointoutagainthattheseresultsareobtainedaftera care-fulmanualselectionofrangesandthresholdsforF0andforthe

RFs,accordingtoliterature.Muchworseresultsareobtainedifthe defaultvalueswereused(40–500HzforF0,upto5500HzforRFs,

considering5formants)[16].ThusPRAATmustbeusedcarefully. AnadditionaladvantageofPRAATisundoubtedlythelow com-putingtime:theanalysisofasignalof1sofdurationrequiresless than0.5sforF0estimationandabout2sfortheRFs.

–WInCA: F0 gives an acceptable errorbut higher than PRAAT.

ConcerningRFs,quitegoodresultsareobtainedforF1

estima-tiononly.ThecomputingtimeiscomparabletoPRAAT:for1s ofrecordingWInCArequires0.9sfortheestimationofF0 and

2.8sfortheestimationofRFs.AnotheradvantageisthatWInCA doesnotrequireanymanualsetting.IfintegratedintoBioVoice (orprovidedwiththepre-processingstep)itcouldbeasuitable alternativetoothertools.

–BioVoice:QuitegoodresultsforF0(meanerror<6Hz)andbetter

resultsforRFsascomparedtoPRAAT.ForF2andF3itgivesthe

bestresultsamongthethreetools(errorslessthan100Hz),with anerrorsimilartothatofWInCAforF1(82Hzagainst80Hzwith

WInCA).Moreoveritdoesnotrequireanymanualsetting param-etersandimplementsthepre-processingstep.Finally,itallows theanalysisofmultiplesignalssimultaneously,buildingfolders withexportablegraphsandtables.Theseadvantagesalongwith thesophisticatedimplementedtechniquesforF0 andRFs

esti-mationrepresentitsmaindrawbackthatisthecomputational complexity(for1sofrecodingitrequires3sforF0 estimation

and4.7sfortheRFs).

Thus,theuseofonetechniqueoranotheroneshouldtherefore becarefullyevaluatedaccordingtothespeciﬁcneedsinorderto avoiderrorsandunreliableresults.

Furthersuggestionsincludethechoiceoftheanalysiswindow andtheminimumdurationofcryunitstoanalyze.Thetime dura-tionoftheanalysiswindowshouldbesetasshortaspossible.In fact,weappliedawindowof10msinWInCAfortheestimationof F0 andRFs.However,mostofthetraditionalanalysistechniques

do not allowadequatefrequency resolution for frames of such shortdurationandtypicallymuchlongerwindows(even50ms) areusedleadingtoapoortemporalresolution.Asacompromiseit isthereforesuggestedtousewindowsuptoamaximumof20ms ofduration(usedinBioVoice).

AmongthetechniquespresentedherePRAATandBioVoicehave alreadybeenappliedtorealsignals[7,25,44],whiletheapplication ofWInCAwillbethesubjectoffuturestudies.

5. Conclusions

Thispaperpresentsacomparisonofthreedifferenttechniques fortheautomatedacousticalanalysisofnewborncry.Speciﬁcally twoexistingtools(PRAATandBioVoice)arecomparedtoanewly developedone,namedWInCA(WaveletInfantCryAnalyzer).We pointoutthattheinfantcryisasignalextremelydifﬁculttoanalyze

withstandardtechniquesduetoitsquasi-stationarityandtothe highrangeoffrequenciesofinterest.Thisstudycouldrepresent aguidelinefortheautomatedacousticalanalysisofinfantcry,in ordertoincludetheseobjectivemethodsintheclinicalpractice alongwithperceptualassessments.

Thecryingofnewbornsisafunctionalexpressionofbasic bio-logicalneeds.Itisbasedonacoordinatedeffortofseveralbrain regions,mainlybrainstemandlimbicsystemandislinkedtothe breathsystem.Thereforecryfeaturesreﬂectthedevelopmentand theintegrityofthecentralnervoussystem.

In addition to other clinical tests the automated infant cry analysis,whenproperlyapplied,isasuitablecheapand contact-lessapproachforanearlyassessmentoftheneurologicalstateof infants.Areliableautomatedmethodfortheestimationofcrying acousticalcharacteristicscouldprovideasupporttotheperceptive analysismadebycliniciansreducingtherequiredamountoftime oftenprohibitiveindailyclinicalpractice.

Inthis work,a newborncrysynthesizerandanew wavelet-basedmethodfornewborncryanalysisarepresented.Synthetic signals are obtained implementing a new accurate simulation modelthattakesintoaccountthevariabilityofthesignalunder study as wellas the basic shapesof thenewborn crymelody. Adatabaseofthesyntheticsignalisavailableonrequesttothe authors.

ThenewanalysistoolnamedWInCAhasbeendeveloped specif-icallyfortheacousticalanalysisofnewborncry,characterizedby highfundamentalfrequencyF0andforbeingquasi-stationary.The

waveletapproachisinfactparticularlysuitedtothestudyofinfant crythankstoitstimefrequencyhighresolutioncharacteristicsand lowcomputingtime.ToassessitsperformanceWInCAiscompared tothealreadyexistingtoolsPRAATandBioVoiceonasetof syn-theticmelodicF0shapescorruptedbyincreasingnoiselevels.

Results showthat PRAATgives slightly betterF0 resultsbut

requirespropersettings.WInCAisbestsuitedforF1estimationand

BioVoicegivesbestresultsforF2andF3.Itisalsothemostrobust

againstincreasingnoise.

Asspeciﬁedintheintroduction,theaimofthisworkisto pro-posefortheﬁrsttimeacomparisonamongobjectivemethodsfor newborncryanalysis.Thisstudywasperformedonsynthetic sig-nalsbecauseweneedtoknowtheexactreferencevaluesofsignals toobtainanaccuratecomparison.

Thesynthesizerproposedinthisworkiscapabletogeneratethe ﬁvebasicwaveformsofthenewborncrythatarepointedoutinthe literature:rising,falling,symmetric,complexandplateau.The syn-theticsignalshaveagoodmatchwiththerealcriesalsofromthe perceptualpointofview.Ofcourse,theydonotrepresentthewide varietyofrealcries,whichincludetensofshapes,butare never-thelessuseful,sincenoreferencesexisttodate.Uponrequestthe authorswillprovidethesynthesizedsignalstotestnewmethods forinfantcryanalysis.

Theapplicationtorealsignalswillbethesubjectoffutureworks. Inparticular,theadvantagesofBiovoiceandWInCAwillbemerged for theanalysisofinfantcryin ordertodetectclinically useful parametersandranges.Moreover,itcouldbeappliedtoany cate-goryofsicknewbornsortocomparepretermandfullterminfants aswell.

Acknowledgements

This workwas partially carried onunder Project PGR00202 “Analysisandclassiﬁcationofvoiceandfacialexpression: appli-cation toneurologicaldisordersin neonatesandadults”, Italian Ministry of Foreign Affairs − Progetti Grande Rilevanza Italy-Mexico.

(7)

References

[1]O.Wasz-Höckert,E.Valanne,V.Vuorenkoski,K.Michelsson,A.Sovijarvi, Analysisofsometypesofvocalizationinthenewbornandinearlyinfancy, Ann.Paediatr.Fenn.9(1963)1–10.

[2]O.Wasz-Höckert,T.J.Partanen,V.Vuorenkoski,K.Michelsson,E.Valanne,The identiﬁcationofsomespeciﬁcmeaningsininfantvocalization,Experientia20 (3)(1964)154.

[3]K.Michelsson,K.Eklund,P.Leppanen,H.Lyytinen,Crycharacteristicsof172 healthy1-to7-day-oldinfantsm,FoliaPhoniatr.Logop.5(2002)190–200.

[4]H.E.Baeck,M.N.deSouza,Longitudinalstudyofthefundamentalfrequencyof hungercriesalongtheﬁrst6monthsofhealthybabies,J.Voice5(2007) 551–559.

[5]H.Rothganger,Analysisofthesoundsofthechildintheﬁrstyearofageanda comparisontothelanguage,EarlyHum.Dev.75(2003)55–69.

[6]O.F.Reyes-Galaviz,E.A.Tirado,C.A.Reyes-Garcia,etal.,Classiﬁcationofinfant cryingtoidentifypathologiesinrecentlybornbabieswithANFIS,in:Klaus Miesenberger(Ed.),ComputersHelpingPeoplewithSpecialNeeds,Springer, Berlin,2004,pp.408–415.

[7]C.Manfredi,L.Bocchi,S.Orlandi,L.Spaccaterra,G.P.Donzelli,

High-resolutioncryanalysisinpretermnewborninfants,Med.Eng.Phys.31 (5)(2009)528–532.

[8]J.R.Deller,J.H.Hansen,J.G.Proakis,Discrete-TimeProcessingofSpeechSignal, McMillanPublishingCompany,NewYork,1993,pp.724–728.

[9]P.S.Zeskind,Infantcryingandthesynchronyofarousal,in:TheEvolutionof EmotionalCommunication:FromSoundsinNonhumanMammalstoSpeech andMusicinMan,OxfordUniversityPress,Oxford,2013,pp.155–174.

[10]A.Fort,A.Ismaelli,C.Manfredi,P.Bruscaglioni,Parametricand non-parametricestimationofspeechformants:applicationtoinfantcry, Med.Eng.Phys.18(8)(1996)677–691.

[11]A.Fort,C.Manfredi,Acousticanalysisofnewborninfantcrysignals,Med.Eng. Phys.20(1998)432–442.

[12]S.J.Sheinkopf,J.M.Iverson,B.M.Lester,Atypicalcryacousticsin6-month-old infantsatriskforautismspectrumdisorder,AutismRes.5(5)(2012)331–339.

[13]P.Sirviö,K.Michelsson,Sound-spectrographiccryanalysisofnormaland abnormalnewborninfants,FoliaPhoniatr.Logop.28(3)(1976)161–173.

[14]CorpKayElemetrics,Multi-DimensionalVoiceProgram(MDVP):Software InstructionManual,KayElemetrics,LinconPark,2003.

[15]P.Boersma,D.Weenink,Praat:DoingPhoneticsbyComputer[Computer Program]2014Version5.374,2014,http://www.praat.org/,Accessed:07 June2016.

[16]P.Boersma,Praat,asystemfordoingphoneticsbycomputer,Glot.Int.5 (9/10)(2002)341–345.

[17]M.Z.Laufer,Y.Horii,Fundamentalfrequencycharacteristicsofinfant non-distressvocalizationduringtheﬁrsttwenty-fourweeks,J.ChildLang.4 (02)(1977)171–184.

[18]K.Michelsson,O.Michelsson,Phonationinthenewborn,infantcry,Int.J. Pediatr.Otorhinol.49(1999)S297–S301.

[19]M.P.Robb,D.H.Crowell,P.Dunn-Rankin,Suddeninfantdeathsyndrome:cry characteristics,Int.J.Pediatr.Otorhinol.77(8)(2013)1263–1267.

[20]B.Reggiannini,S.J.Sheinkopf,H.F.Silverman,X.Li,B.M.Lester,Aﬂexible analysistoolforthequantitativeacousticassessmentofinfantcry,J.Speech Lang.Hear.Res.6(5)(2013)1416–1428.

[21]Y.Kheddache,C.Tadj,Resonancefrequenciesbehaviorinpathologiccriesof newborns,J.Voice29(1)(2014)1–12.

[22]M.P.Robb,A.T.Cacace,Estimationofformantfrequenciesininfantcry,Int.J. Pediatr.Otorhinol.32(1)(1995)57–67.

[23]K.Wermke,W.Mende,C.Manfredi,P.Bruscaglioni,Developmentalaspectsof infant’scrymelodyandformants,Med.Eng.Phys.24(2002)501–514.

[24]C.Manfredi,L.Bocchi,G.Cantarella,Amultipurposeuser-friendlytoolfor voiceanalysis:applicationtopathologicaladultvoices,Biomed.Signal Process.Control4(2009)212–220.

[25]S.Orlandi,L.Bocchi,G.P.Donzelli,C.Manfredi,Centralbloodoxygen saturationvscryinginpretermnewborns,Biomed.Sig.Proc.Control7(2012) 88–92.

[26]S.Orlandi,P.H.Dejonckere,J.Schoentgen,J.Lebacq,N.Rruqja,C.Manfredi, Effectivepre-processingoflongtermnoisyaudiorecordings.Anaidtoclinical monitoring,Biomed.SignalProcess.Control8(6)(2013)799–810.

[27]C.Manfredi,V.Tocchioni,L.Bocchi,Arobusttoolfornewborninfantcry analysis,in:Conf.Proc.IEEEEng.Med.Biol.Soc.,NewYork,NY,2006,pp. 509–512.

[28]N.Rruqja,P.H.Dejonckere,G.Cantarella,J.Schoentgen,S.Orlandi,S.D. Barbagallo,C.Manfredi,Testingsoftwaretoolswithsynthesizeddeviant voicesformedicolegalassessmentofoccupationaldysphonia,Biomed.Signal Process.Control13(2014)71–78.

[29]L.Cnockaert,J.Schoentgen,P.Auzou,C.Ozsancak,L.Defebvre,F.Grenez, Low-frequencyvocalmodulationsinvowelsproducedbyParkinsonian subjects,SpeechCommun.50(4)(2003)288–300.

[30]L.Falek,A.Amrouche,L.Fergani,H.Teffahi,A.Djeradi,Formanticanalysisof speechsignalbywavelettransform,Conf.Proc.WorldCongr.Eng.vol.2 (2011)1572–1576.

[31]G.DePoli,C.Drioli,F.Avanzini,SintesiDeiSegnaliAudio,1999,http://www. dei.unipd.it/∼musica/Dispense/cap5.pdf.Accessed:07June2016.

[32]J.D.Markel,A.H.Gray,LinearPredictionofSpeech,Springer–Verlag,Berlin, Heidelberg,NewYork,2011.

[33]G.DePoli,F.Avanzini,Soundmodeling:signal-basedapproaches.In: AlgorithmsforSoundandMusicComputing,2009.

[34]K.Wermke,W.Mende,Musicalelementsinhumaninfants’cries:inthe beginningisthemelody,MusicaeScientiae13(Suppl.2)(2009)151–175.

[35]G.Varallyay,Themelodyofcrying,Int.J.Pediatr.Otorhinol.71(11)(2007) 1699–1708.

[36]E.Amaro-Camargo,C.A.Reyes-García,Applyingstatisticalvectorsofacoustic characteristicsfortheautomaticclassiﬁcationofinfantcry,in:Advanced IntelligentComputingTheoriesandApplications.WithAspectsofTheoretical andMethodologicalIssues,SpringerBerlin,Heidelberg,2007,pp.1078–1085.

[37]K.Wermke,D.Leising,A.Stellzig-Eisenhauer,Relationofmelodycomplexity ininfants’criestolanguageoutcomeinthesecondyearoflife:alongitudinal study,Clin.Linguist.Phonet.21(2007)961–973.

[38]S.Orlandi,C.A.Reyes-Garcia,A.Bandini,G.P.Donzelli,C.Manfredi, Applicationofpatternrecognitiontechniquestotheclassiﬁcationof full-Termandpreterminfantcry,J.Voice30(6)(2016)656–663.

[39]http://www.audacityteam.org/,Accessed:07June2016.

[40]C.Chui,AnIntroductiontoWavelets,AcademicPress,SanDiego,CA,USA, 1995.

[41]P.S.Addison,TheIllustratedWaveletTransformHandbook:Introductory TheoryandApplicationsinScience,Engineering,MedicineandFinance, InstituteofPhysicsPublishing,London,2002.

[42]C.Manfredi,G.Peretti,Anewinsightintopost-surgicalobjectivevoicequality evaluation.Applicationtothyroplasticmedialisation,IEEETrans.Biomed. Eng.53(3)(2006)442–451.

[43]C.Manfredi,Adaptivenoiseenergyestimationinpathologicalspeechsignals, IEEETrans.Biomed.Eng.47(2000)1538–1542.

[44]G.Esposito,M.delCarmenRostagno,P.Rostagno,P.Venuti,J.D.Haltigan,D.S. Messinger,Briefreport:atypicalexpressionofdistressduringtheseparation phaseofthestrangesituationprocedureininfantsiblingsathighriskforASD, J.AutismDev.Disord.(2013)1–6.