• Non ci sono risultati.

Automatic Classification of Epilepsy Types Using Ontology-based and Genetic based Machine Learning

N/A
N/A
Protected

Academic year: 2021

Condividi "Automatic Classification of Epilepsy Types Using Ontology-based and Genetic based Machine Learning"

Copied!
10
0
0

Testo completo

(1)

ContentslistsavailableatScienceDirect

Artificial

Intelligence

in

Medicine

jou rn al h om e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / a i i m

Automatic

classification

of

epilepsy

types

using

ontology-based

and

genetics-based

machine

learning

Yohannes

Kassahun

a,∗

,

Roberta

Perrone

b,∗∗

,

Elena

De

Momi

b

,

Elmar

Berghöfer

f

,

Laura

Tassi

c

,

Maria

Paola

Canevini

d

,

Roberto

Spreafico

e

,

Giancarlo

Ferrigno

b

,

Frank

Kirchner

a,f

aFachbereich3MathematicsandComputerScience,UniversityofBremen,Robert-Hooke-Str.5,D-28359Bremen,Germany

bDepartmentofElectronics,InformationandBioengineering(DEIB),PolitecnicodiMilano,PiazzaLeonardodaVinci32,20133Milano,Italy

cCentroChirurgiaEpilessia“ClaudioMunari”,DipartimentodiNeuroscienze,OspedaleNiguardaGranda,PiazzaOspedaleMaggiore3,20162Milano,

Italy

dEpilepsyCenter,SanPaoloHospital,ViaA.DiRudini8,20142Milan,Italy

eDepartmentofExperimentalNeurophysiologyandEpileptology,IstitutoNazionaleNeurologico‘C.Besta’,ViaCeloria11,20133Milano,Italy

fGermanResearchCenterforArtificialIntelligence(DFKI),RoboticsInnovationCenter,Robert-Hooke-Str.5,D-28359Bremen,Germany

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received4September2013

Receivedinrevisedform24February2014

Accepted7March2014

Keywords:

Ontology-basedclassification

Genetics-basedclassification

Datamining(knowledgediscovery)from

medicaldata

Epileptogeniczoneidentification

a

b

s

t

r

a

c

t

Objectives:Inthepresurgicalanalysisfordrug-resistantfocalepilepsies,thedefinitionofthe epilepto-geniczone,whichisthecorticalareawhereictaldischargesoriginate,isusuallycarriedoutbyusing clinical,electrophysiologicalandneuroimagingdataanalysis.Clinicalevaluationisbasedonthevisual detectionofsymptomsduringepilepticseizures.Thisworkaimsatdevelopingafullyautomaticclassifier ofepileptictypesandtheirlocalizationusingictalsymptomsandmachinelearningmethods.

Methods:Wepresent theresultsachievedbyusingtwomachinelearningmethods.Thefirstis an ontology-basedclassificationthatcandirectlyincorporatehumanknowledge,whilethesecondisa genetics-baseddataminingalgorithmthatlearnsorextractsthedomainknowledgefrommedicaldata inimplicitform.

Results:Thedevelopedmethodsaretestedonaclinicaldatasetof129patients.Theperformanceofthe methodsismeasuredagainsttheperformanceofsevenclinicians,whoselevelofexpertiseishigh/very high,inclassifyingtwoepilepsytypes:temporallobeepilepsyandextra-temporallobeepilepsy.When comparingtheperformanceofthealgorithmswiththatofasingleclinician,whoisoneoftheseven cli-nicians,thealgorithmsshowaslightlybetterperformancethantheclinicianonthreetestsetsgenerated randomlyfrom99patientsoutofthe129patients.Theaccuracyobtainedforthetwomethodsandthe clinicianisasfollows:firsttestset65.6%and75%forthemethodsand56.3%fortheclinician,secondtest set66.7%and76.2%forthemethodsand61.9%fortheclinician,andthirdtestset77.8%forthemethods andtheclinician.Whencomparedwiththeperformanceofthewholepopulationofcliniciansontherest 30patientsoutofthe129patients,wherethepatientswereselectedbythecliniciansthemselves,the meanaccuracyofthemethods(60%)isslightlyworsethanthemeanaccuracyoftheclinicians(61.6%). Resultsshowthatthemethodsperformatthelevelofexperiencedclinicians,whenboththemethods andthecliniciansusethesameinformation.

Conclusion:Ourresultsdemonstratethatthedevelopedmethodsformimportantingredientsforrealizing afullyautomaticclassificationofepilepsytypesandcancontributetothedefinitionofsignsthataremost importantfortheclassification.

©2014PublishedbyElsevierB.V.

∗ Correspondingauthor.Tel.:+4942125752299.

∗∗ Correspondingauthor.

E-mailaddresses:kassahun@informatik.uni-bremen.de(Y.Kassahun),

roberta.perrone@polimi.it(R.Perrone).

1. Introduction

Fordrugresistantfocalepilepticpatients,brainsurgerycanbe

theonlyoption tocurethepatient.Theepileptogeniczone(EZ)

is the corticalarea where ictal dischargesoriginate and which

must be surgically resected or disconnectedto achieve seizure

freedom[1].ThedefinitionoftheEZisusuallyobtainedthrough

http://dx.doi.org/10.1016/j.artmed.2014.03.001

(2)

non-invasivepre-surgicalinvestigations,inmostcasesbyclinical,

electrophysiologicaland neuroimagingdataanalysis[2].Clinical

evaluation is based on the anamnestic data and onthe visual

analysisofsymptomsduringepilepticseizuresinlongtermscalp

video-electroencephalograms(video-EEG).Intra-cerebralinvasive

monitoring(stereo-EEG) maybeneededforverydifficultcases.

Clinicalmanifestationscanbejustsubjective[3](epilepticauras

withoutimpairmentofconsciousness)ormayprogressto

objec-tivesignsthatcanbeobservedandanalyzed,oftenassociatedwith

impairmentofawareness.Thechronologicalsequenceof

symp-tomsandtheirrelativedurationmightindicatetheseizureonset

areaandtheprogressiveinvolvementofadjacentbrainstructures

andis,therefore,crucialforthedefinitionoftheEZ[4],though

iden-ticalsymptomsmayarisefromdifferentcorticalregions,leadingto

misleadinglocalizationoftheEZ[5,6].Themostsignificant

symp-tomsappearinthefirst60s,evenifthepost-ictalconfusioncanbe

prolonged,mainlyifassociatedwithspeechdisturbances,whichis

thecaseforEZlocatedinthedominanthemisphere.

Temporallobeepilepsies(TLE),inwhichseizuresoriginatefrom

thetemporallobe,representthemajorityoffocalepilepsies(over

60%).ThesurgicaltreatmentofTLEhasthebestresultswithalmost

70%ofthepatientsreportedtobeseizure-freeaftersurgery[7].The

duration,thetimecourse,thelossofconsciousnessandthe

post-ictalphenomenaaredifferentinTLEandextra-temporalseizures

(ExTE).Seizuresarisingfromthetemporallobehavearelatively

gradualevolution (comparedtoExTE)[8].In TLEthesubjective

phenomenaaremorefrequent,thelossofconsciousnessismore

gradual,seizuresarelongerandthepost-ictalconfusionismore

important.TheidealsequenceofictalmodificationsinTLEisa

sub-jectivemanifestation(epigastricfeelingoftenraisingup,auditory,

visualandpsychicsymptoms,arequiteexclusivelylocatedinthe

temporallobe)followedbyalossofconsciousness,oro-alimentary

automatisms,homolateralheadorientationandacontralateralarm

dystonicposturingwithhomolateralgesturalautomatisms.Onthe

contrary,seizuresin ExTE are frequentlyabrupt, brief,without

subjectivemanifestations,thelossofconsciousnessisprecocious,

nooro-alimentaryautomatismsarepresentandmotor

modifica-tioncanbebilateralandmoretonic.However,notalltheseizures

areeasilyrecognizableandparticularlythehypermotorgestural

automatismscanoccurinbothTLEandExTE[5].The

state-of-the-artisthevisualanalysisoftheclinicaldataperformedbyexpert

cliniciansalwaysinassociationwithanatomicalmedicalimages

andEEGdata.Inliterature,thereisnotyetanyautomatic

classifi-cationofepilepsytypesbasedonclinicalsymptomsonly.

Itwouldbeextremelyimportanttoprovideclinicianswithan

automaticclassifierof epilepsytypes,whichcouldhelpto

vali-datetheopinionofthecliniciansincaseofagreementbetween

themethodsandtheclinicians,orwould,onthecontrary,forcethe

clinicianstorevisetheirdecisionorlookinmoredetailatthecase.

Inthispaperweanalyzetwomethodsforrealizinganautomatic

classifierofepilepsytypes,startingfrommedicaldata,whichis

usu-allyunstructuredinformation.Thefirstmethodisabletolearnthe

humanknowledgeonhowtoperformdiagnosis,basedonclinical

symptomsandthesecondautomaticallyretrievesknowledgefrom

medicaldata,searchingforexistingpatternsinthedata(e.g.

clin-icalsymptoms).Forthefirstmethodweusedanontology-based

classificationapproach[9],whileforthesecondmethodweused

genetics-basedmachinelearning(datamining)[10].Inadditionto

dealingwithunstructureddata,bothmethodscanalsodealwith

heterogeneousandincompletedata.Themotivationbehind

choos-ingthetwomethodsistheviewthatlearningbynomeansentails

atabularasaview.Itinvolvestheincorporationofpriorknowledge

thatcanaccelerateorotherwiseimprovethelearningprocess.A

combinationoftheabovemethods willallowus toincorporate

domainknowledgeandatthesametimediscovernovelmedical

knowledge.Thetwomethodsweretestedonaclinicaldatasetof

patientsandcomparedwiththeclassificationscarriedoutbyexpert

clinicians.

Thepaperisorganizedasfollows:firstwereviewrelevantwork

inthefieldofontology-basedclassificationandgenetics-baseddata

miningin thecontext of learning from medical data. Thenwe

presentthealgorithmsusedinthepaper,followedbythe

presen-tationoftheactualexperimentswithpatientdataandtheresults

obtained.Finally,wegiveaconclusionandanoutlookbasedonthe

workpresentedinthispaper.

2. Reviewofrelatedwork

In this paper two differentalgorithms are presented, which

exploitdifferentmethods:theontology-basedclassification(OBC)

and the genetics-based data mining (GDM). We give a review

ofrelated workintheareaof ontology-basedclassificationand

genetics-baseddatamining.

2.1. Ontology-basedclassification

Ontological modelling provides a description of a specific

knowledgedomainand itis madeofclasses, properties(which

expressrelationshipsbetweenclasses)andinstances(whichare

the“groundlevel”componentsofanontologyandpopulateclasses)

[11].Inmedicalfield,ontologiescanthereforebeusedforsharing

medicalknowledgeinareliableformatsothatitis

understand-ableandcanbeprocessedbybothhumansandmachines.Medical

ontologieslikeOpenGALEN[12],SNOMED-CT[13]andUMLS[14]

areusedinthehealthcarepractice.TheOpenGALENontologywas

usedinurologytodevelopadecisionsupportsystemfortreating

patientswithurinary-tractinfections[15].SNOMED-CTontology

provideshealthcareterminologywithcomprehensivecoverageof

diseases,clinicalfindings,etiologiesandtherapiesintheparticular

fieldofanesthesiology.Itisusedinelectronichealthcarerecords

(EHR)and isrelated tocritical patient-specificinformation,like

drugallergies.SNOMED-CTmakesthemedicalknowledgemore

usableandaccessible[16].EPILONT1isanontologyaboutepilepsy

domainandepilepticseizureswhiletheEpilepsyandSeizure

Ontol-ogy(EpSO)2givesaclassificationofepilepsysyndromes,location,

etiologyandrelatedmedicalconditionsaccordingtothe

Interna-tionalLeagueAgainstEpilepsy(ILAE)organizationstandards.

Ontologiesmadeofhierarchiesandpropertiesbetweenclasses

canbeusefulfordataaggregationandclustering.Suchontologies

providedomainknowledgeandsupporttheinterpretationof

rela-tionsidentifiedindatasetthroughdataminingprocesses,basedon

linguisticorstatisticaltechniques[17].Asanexample,the

founda-tionalmodelofanatomy(FMA)[18]ontologyisusedasasourceof

anatomicalknowledgeforpredictingtheconsequencesofinjury.

Inthisapplication,knowledgeaboutspatialrelationsbetweenthe

injuryandvitalorgansisprovidedbytheFMA[19].

Ontologicalmodelingwasusedalsoinotherdomains,like

cus-tomerknowledgeassessment[20],whereontologiesareusedto

describe customers population and the ontology graph,

repre-sentedwitharcsandnodes,isusedforclassifyingthemaccording

todifferentcriteria.In [21],theauthorsdesignedsurgical

mod-elsof neurosurgery makinguseof ontology and described 106

surgicalcases. Through classification trees and clustering

algo-rithms,theyextractedsurgicalknowledge,facilitatingthesurgical

decision-makingprocessandsurgicalplanning.In[22],Leeetal.

triedtoovercomethelimitofclassicalontologydealingwith

uncer-tainknowledgeandclassifieddifferentdiabetessyndromesusing

ontology-based inference rules. Highly expressive rules-based

1http://bioportal.bioontology.org/ontologies/EPILONT. 2http://prism.case.edu/prism/index.php/.

(3)

languages,likeSWRL[23],failinrepresentingfuzzyinformation

[24],asincaseofsymptomsoccurrence.Forthisreason,weexploit

ontologiestoextractnewprobabilisticinformationconsideringthe

depthofconceptswithrespecttotherootconceptandthedistance

betweensymptoms.

2.2. Genetics-baseddatamining

Datamining(DM)isamachinelearningprocedure,whichcan

beusedtoautomaticallyfindusefulpatternsfromadataset[25].

Theprocedurehasbeensuccessfullyappliedinvariousfieldsand

ismakingitswayintohealthcaresystems.Dataminingalgorithms

canlearnfromexamplesandmodelthenon-linearrelationships

betweenvariables.Themostcommonlyuseddatamining

algo-rithms for health care systems include naive Bayes classifiers,

neural networks, support vector machines, logistic regression,

fuzzyrulesanddecisiontrees.

ThereareseveralDMtechniquesdevelopedfordiagnosing

dis-eases.Forexample,Soni etal.[26],andDangareand Apte[27]

presenteddataminingtechniquesforheartdiseasediagnosis,and

Ganesanetal.[28]presentedtheuseofartificialneuralnetworks

forcancerdiagnosis.

DMtechniquesarealsodevelopedforprognosingdiseases.In

[29], a Bayesianexpertsystemfor clinically detectingcoronary

arterydiseaseisgiven.In[30],DMtechniquesareusedfor

pre-dictingheartattacks.In[31],artificialneuralnetworkshavebeen

appliedforprognosingendstagekidneydisease.TheworkofFloyd

[32]presentstheapplicationofDMtechniquesforprognosisofthe

pancreaticcancer.

Moreover,someauthorscomparedtheperformance of

algo-rithms for the diagnosis or prognoses purposes. In [33], the

discriminatorypowerofk-nearestneighbors,logisticregression,

artificial neural networks, decision trees, and support vector

machines on classifying pigmented skin lesions for diagnosis

purposeis analyzed.In[34],asummary ofcomparisonof

well-performingDMalgorithms usedfor bothdisease diagnosisand

disease prognosis is given. Prasad et al. [35] compared

auto-associativememoryneuralnetworks,Bayesiannetworks,iterative

dichotomized 3 (ID3) and C4.5in the diagnosisof thedisease

asthma.

DMtechniquesarefurthermoreappliedtoanalyzevarious

sig-nalsandtheirrelationshipswithparticulardiseasesorsymptoms.

Anexampleistheapplicationofsupportvectormachinesin

auto-maticseizuredetectioninEEG,whichcanbeusedindiagnosing

neurologicaldisordersrelatedtoepilepsy[36].

Inadditiontodiseasediagnosesand/orprognoses, DM

tech-niquesareusedtoextractdiagnosticrulesfrommedicaldataset

[37,38].Therulesgeneratedaresimilartothosecreatedmanually

inexpertsystemsandthereforecanbeeasilyvalidatedbydomain

experts.ForadetailedreviewofDMtechniquesandchallengesin

miningmedicaldata,pleasereferto[39–42].

Thegenetics-baseddataminingmethodpresentedinthispaper

isrelatedtotheworkintheareaofdataminingfordiscovering

temporalhiddenknowledgefrommedicaldataset.In[43],learning

aBayesiannetworkandrulesisusedtodiscovermedical

knowl-edge.Agrammar-basedgeneticprogrammingisusedasasearch

algorithm.Themethodisappliedinthedomainsoflimbfracture

and scoliosis. Thework by Tan et al.[44] utilizesgenetic

pro-grammingandgeneticalgorithmstoevolveclassificationrulesfor

hepatitisandbreastcancerdatasets.NuovoandCatania[45]

inte-gratedfuzzylogicand geneticalgorithmsinordertodiscovera

fuzzyrulebaseddiagnosticsystemfrommedicaldatasetsforbreast

cancerandPima-IndiandiabetesdatafoundintheUCIrepository

ofmachine learning databases[46], andmedical dataon

apha-siafoundinAachenaphasiadatabase[47].Dametal.presented

a neural-basedlearning classifier system[48] that incorporates

neuralnetworksintoalearningclassifiersystems[49].Themethod

wasappliedonmedicaldatasetssuchasthediabetes,breast

can-cer,heartdiseaseandlymphographyfoundintheUCIrepository

ofmachinelearningdatabases.In[50]decisiontreesareevolved

usingevolutionaryalgorithms.

3. Methods

In thissectionwe giveanoverviewofthemethods thatwe

developedandusedtodeterminethetypeofepilepsyofpatients

basedonclinicaldatadescribedinSection4.1.

3.1. Ontology-basedclassification

Theontology-basedclassificationmethodfocusesonepileptic

symptomscorrelation.Theontologicalmodelingdescribesthe

rela-tionshipsbetweenclassesandclassesandindividuals.

Inthe“patientseizureontology”,ictalsymptomsareontological

instances(i.e.thegroundlevelobjects)ofclass“ictal-symptom”

(suchas“visualillusions”,“fear”,etc.happeningduringseizures).

Therelationshipbetweenclassesisexpressedbypredicates(such

as“subclassOf”,“hasManifestation”,etc.).Inthedomain

ontol-ogy,therearesomegeneralclasses(i.e.collectionsofobjects)like

the“ictal-symptom”classwhichgroupsallthepossible

symp-tomsthatcanbeobservedduringanepilepticseizure.Fig.1shows

theseizureontologymodel.

The “patient” class groups all the patients of the dataset

and itis linkedtothe“seizure”generalclass bytheproperty

“hasSeizure”. Every epileptic seizure is described by a set of

manifestations, defined in the “manifestation” and

“noMani-festation” classes, which refer toobserved and not observed

symptomsduringseizurebythecliniciansandthesetwoclasses

are linkedtothe“seizure” class by“hasManifestation” and

“hasNoManifestation”relationshipsrespectively.Inadditionto

this,observedmanifestationsarecharacterizedbytheiroccurrence

time.“early”and“late”classesaresubclassesofthe

“manifes-tation”class.Symptomsmanifestedinthefirst10sbelongtothe

“early”class,whilelater symptomsbelongtothe“late”class.

“manifestation”and“noManifestation”classesaredefinedas

“unionOf”objectsoftheclass“ictal-symptom”.Asin[51],inorder

totranslatetherelationshipbetweenpairsofsymptoms,

consider-ingtheirproperties,wecomputedthedistancesintermsofgraph

arcs.

TheCkmatrix,whichexpressesthecorrelationbetweeneach

pairof symptoms(i.e.thelikelihood that ifduringan epileptic

seizuresymptomiispresent,symptomjispresentaswell)fora

specificpatientk(k∈{1,...,M},whereMisthenumberofpatients)

iscomputed.

Eachcorrelationmatrixelementck(i,j)betweenpairsof

symp-tomsiscalculatedusing

ck(i,j)=

(d(si)+d(sj))˛+ˇ

((D(si,sj)∗˛)+2max(d(si),d(sj)))+ˇ

(1)

i,j∈{1,...N},whereNisthetotalnumberofsymptoms,siandsj

aretheconsideredsymptoms(e.g.“headdeviation”and“visual

illu-sion”),d(si)andd(sj),arethedistancesofsiandsjfromtheseizure

classintheontologytreeasshowninFig.2.D(si,sj)represents

thelengthoftheshortestpathbetweensiandsjintheontology

tree(numberofarcsbetweensiandsj)asshowninFig.3aandb.

max(d(si),d(sj))representsthelongestdistancefromthe“seizure”

classinthedomainontologytreeofthetwosymptomssiandsj.The

variables˛andˇrangefrom0to1.˛isusedtobalancethe

pro-portionbetweendandDvalues,ˇmakesthedenominatornotto

(4)

Fig.1. Ontologymodelofapatientseizurewithearlyandlatesymptomsmanifestations.“eyesdeviation”isobservedinthefirst10softheregistration,while“head

deviation”and“visualillusion”appearafterthefirst10s.

Fig.2.Thedistanceofsymptomsifromthe“seizure”classintheontologytreeiscalculatedasthenumberofarcsinbetweenanditisequalto2.

Fig.3.Thedistancebetweentwosymptoms(D)iscalculatedasthenumberofarcsbetweenthem:e.g.thedistancebetween“headdeviation”and“visualillusion”is2arcs

(5)

Fig.4.InteractionbetweentheGAandtheMLtools.InthefigureMindicatesthepopulationsizeofthegeneticalgorithm.

Fig.5.AchromosomeencodingasolutionproposedbytheGA.ThegeneCjencodesthejthclassifiertotrainanduse.ThefirstNgenesfollowingthegeneencodingthe

classifierrepresentthetimeintervalstoconsider.Forsymptomsithetimeintervalisgivenbytesi−t

b

si.ThegeneFisaflagencodingwhetherafeatureselectionwillbe

employed.Ifitsvalueisone,thenfeatureselectionwillbeemployed,andifitsvalueiszero,nofeatureselectionwillbeused.ThelastgeneFsmencodesthefeatureselection

methodtoapply.

EachelementmcTLE(i,j)andmcExTE(i,j)ofthemeanofthe

matri-cesthatexpressthecorrelationbetweensymptomsforeachgroup

(TLEandExTEpatients,MCTLEandMCExTEmatrices),iscomputedas

mc(i,j)TLE= 1 W W



w=1 Cw(i,j) (2) mc(i,j)ExTE= 1 V V



v=1 Cv(i,j), (3)

whereW+V=M,WandVarethenumberofpatientsofthedataset

withtemporallobeepilepsiesandextratemporallobeepilepsies

respectively.

Inordertoclassifytheseizuretypes,eachseizureisassigned

tothegroupwhosedistancefromMCisminimum,wherethe

dis-tances,DTLEandDExTE,arecomputedby

DTLE=











N



i=1



N j=1

(ch(i,j)−mc(i,j)TLE)2

(4) DExTE=











N



i=1



N j=1

(ch(i,j)−mc(i,j)ExTE)2

(5)

h∈{1,...H},whereHisthetotalnumberofpatientstobetested.

Weoptimized˛andˇusingthetenfoldcross-validationonthe

dataset[52]bycontinuouslychangingtheirvalues.Varyingthe

val-uesof˛andˇ,doesnotaffecttheclassificationpower,evenifthe

correlationbetweensymptomsischanged.Wethereforesetboth

˛andˇto0.5,asthesumofthevariableshastobeone[51].

Theepilepticseizuresontologyis builtinontologyweb

lan-guage(OWL),thepopularontologicallanguage,usingProtégé3.5,

anopen-sourcetool,forcreating,editingandvisualizingontologies

[53].Wefollowedtheproposedguideline[54],collectingspecific

domainknowledgefromexperts,textbooksandpapersintegrating

existingontologiesanddefiningclassesandclasshierarchy.

3.2. Genetics-baseddatamining

TheGDM algorithm is a combinationof a geneticalgorithm

(GA),Pyevolve[55],andmachinelearning(ML)toolsforsupervised

learningWEKA[56]andOrange[57],whichimplementdifferent

classifiers(e.g.decisiontrees,multilayerperceptrons,support

vec-tormachines,etc.).TheflowchartofGDMisshowninFig.4.

Theparametersthatareoptimizedbythegeneticalgorithmare:

1theclassifier,Cj,totrain,

2theinstantoftimeinwhichasymptomsistartstsbiandtheinstant

oftimewhenitendste

si,

3thefeatureselectionmethod,Fsm,thatisusedtodetermine

symp-tomswhichplayanimportantroleinclassifyingpatientsintoTLE

andExTEgroups.

TheGAstartsbygeneratingrandomsolutions(i.e.the

chromo-somess1,s2,...,sk,...,sM,whereMisthenumberofsolutions).

Fig.5showsachromosomeencodingpropertiesofasolution.Each

solution,sk,willbedecodedintotheclassifiertotrainCj,the

fea-tureselectionflag,F,thefeatureselectionmethodtoapply,Fsm,

andthecorrespondingtrainingdata.Aftertraining,themachine

learningtoolsgivebacktheconfusionmatrix[58]thatresultsafter

thetenfoldcross-validation[52]isperformed.Fromtheconfusion

matrix,thefitnessoftheindividual(qualityofasolution)is

calcu-latedusingtheharmonicmeanaccuracy(HMACC),whichisgiven

by

HMACC(sk)=

2

(1/(1−FP(sk)))+(1/(1−FN(sk))),

(6)

Table1

GAparameters.

Populationsize 50

Mutationprobability 0.05

Crossoverprobability 0.8

Maximumnumberofgenerations 100

Selectionmethod Rankbasedselection[59]

whereFP(sk)standsforthefalsepositiverateandFN(sk)standsfor

thefalsenegativeratecorrespondingtoasolutionsk.Thevalues

fortheHMACCliebetweenzeroandone.AhighervalueofHMACC

meansabettersolution.IfasolutionclassifiescorrectlyalltheTLE

patientsbutmisclassifiestheExTEpatients,itwillresultinalow

HMACCvalue.

Theoptimizationprocessof thegeneticalgorithm isrun for

somegenerations and stopped if the maximum HMACC sofar

obtaineddoesnot improveover the last 20 generations orthe

maximumnumberofgenerationsisovercome.Table1showsthe

parameters of the geneticalgorithm for resultsobtained using

genetics-baseddataminingalgorithm.Theparameterswereset

manuallyandkeptthesameforallexperimentsreportedinthis

paper.

4. Experimentsandresults

Toevaluatemachinelearningmethods,onehastotakecareof

afewgeneralproblems.Forcategorizingtheepilepsytypesinto

temporal(TLE)andextra-temporal(ExTE),themethodspresented

inthispaperaretrainedonatrainingdatasetandtestedonatest

dataset.Thesetwodatasetswereselectedtobedisjointbecausethe

algorithmsareoptimizedonthetrainingdatasetforinstanceusing

crossvalidation.Sincethetestdatasetisnotincludedinthetraining

datasetatall,itiscompletelyunknowntothemethodswhenit

ispresentedtothemethodstotesttheirperformance.Hencethe

responsesofthemethodsgiveanideaonhowthemethodswould

performonanewdatadataset(newpatientsinourcase).

Inthissectionwedescribefirsttheclinicaldataofpatientswe

usedforanalyzingtheperformanceofthemethodsinSection4.1

andthentheresultsobtainedwithitinSections4.2,4.3and4.4.In

contrasttothestatisticalanalysisgiveninSection4.4,inSections

4.2and4.3wedealwiththecomparisonbetweenthemethodsand

clinicians.Forcliniciansre-samplingoftestdatasetsisnot

possi-blebecausetheclinicianswouldremembertheexamplesinthe

datasets.Therefore,different“trials”canonlybedoneby

differ-entclinicians.Inthefirsttest(Section4.2),weprovidedaclinician,

whoselevelofexpertiseisveryhigh,withthreetestdatasetsof

patients,andweaskedhertoclassifyeachpatient.Inthesecond

test(Section4.3)weaskedsevendifferentclinicians,whose

exper-tiseishigh/veryhigh,toclassify30patientsthatdidnotbelongto

anytraining,testingorvalidationdatasetpreviouslyused.

4.1. Descriptionofdatasetused

Theclinicaldatasetofpatientsthatweusedinourworkwas

obtainedfromthree clinicalepileptological centers:CarloBesta

NeurologicalInstitute,NiguardaCaGrandaHospitalandSanPaolo

Hospital,allinMilan,Italy.Allpatientsusedintheexperiments

reportedinthispaperaredrug-resistantepilepticpatients,whodo

notpresentanyotherpathology(e.g.malignanttumors),andthey

wereoperatedonandseizure-freeforatleastoneyear.Thisfact

isusedasaground-truthforthegenerationofthelabelsofthe

patients,whichareTLEandExTE.Allofthepatientshavesigned

aninformedconsenttothetreatmentoftheiranonymizeddata.

Thedatasetfor theexperimentsconsistsof99 labeledpatients,

ofwhich 60sufferfromTLEand 39fromExTE.Furthermore,an

additionaltestsetof30patients,ofwhich10sufferfromTLEand

Table2

Patientsdataset’soccurrencetable.Therowsareforsubjectiveandobjective

symp-toms,thecolumnsareforthetwogroupsofpatients(TLEandExTE).Thefirst14

symptomsaresubjective,thelast15symptomsareobjective.

No. Symptom TLE ExTE

1 Epigastric 20 1

2 Auditoryillusion 0 0

3 Auditoryhallucinations 0 2

4 Visualillusionsimple 1 0

5 Visualillusioncomplex 0 1

6 Visualhallucinationsimple 0 1

7 Visualhallucinationscomplex 0 1

8 Olfactivehallucinations 1 0

9 Gustatoryhallucinations 1 0

10 Psychicmanifestation 10 2

11 Fear 2 3

12 Autonomic(tachicardia,dyspnoea,etc.) 11 8 13 Motor-sensitivemanifestationmonolateral 1 2 14 Motor-sensitivemanifestationbilateral 1 0

15 Oroalimentaryautomatisms 36 10

16 Gesturalautomatismsmonolateral 26 14

17 Gesturalautomatismsbilateral 16 12

18 Headdeviation 35 38

19 Eyesdeviation 32 29

20 Motorclonicmonolateralarm 5 6

21 Motorclonicbilateralarms 1 2

22 Motorclonicmonolateralleg 1 3

23 Motorclonicbilaterallegs 1 4

24 Motorclonicmonolateralmouth 3 5

25 Motordystonicmonolateralarm 27 23

26 Motordystonicbilateralarms 14 10

27 Motordystonicmonolateralleg 2 5

28 Motordystonicbilaterallegs 6 7

29 Autonomic(flushing,shialorrhea,vomiting) 11 3

20fromExTE,isusedfortheexperimentdescribedinSection4.3.

Thedatasetisgeneratedasfollows.Onceclinicianshavea

collec-tionofvideorecordingsofepilepticseizures,theycreateatablefor

eachpatientinwhich29symptomsarepresentandwhichgives

thedurationinsecondsoftheseizures.Thefirst14 rowsofthe

tablecorrespondtothesubjectivesymptomsandthenext15to

theobjectivesymptoms.Foreachpatient,cliniciansfillinthetable,

assigningtothesymptomsobserveda‘1’,anda‘0’tomanifestation

notobserved(Fig.6).Thepatientfromwhichthedataisshownin

Fig.6hasthreedifferentsymptoms(“epigastric”sensation,“visual

illusionsimple”and“motordystonicmonolateralleg”).The

occur-renceofsymptomsinpatientsdatasetusedforthisworkisshown

inTable2.

4.2. Test1.ComparisonofOBCandGDMwithasingleclinician

ThecomparisonofOBCandGDMdiagnosiswithasingle

clin-ician,expertinepilepsytypeclassification,isperformedonthree

testdatasetsgeneratedfromthetrainingsetweobtained.The

sin-gleclinician’slevelofexpertiseisveryhighandsheisoneofthe

7clinicianswhoparticipatedinthesecondstudy(seeSection4.3).

Thetestdatasetsaregeneratedbyselectingrandomentriesfrom

thetrainingset.Afterselectingthetestdatasets,therestofthedata

isusedintrainingOBCandGDM.Pleasenotethatwhilethe

mem-bersofthetestdatasetsdonotchangeonceselected,themembers

ofthetrainingandvalidationwillchangeduringcross-validation.

Table3givestheexperimentalsetupusedforTest1.

Table4showstheperformancewithrespecttoaccuracy,that

isthenumberofcorrectlyclassifiedpatientsfortheclinician,OBC

andGDMonthetestdatasets.Theaccuracyweusedisgivenbythe

equation

Accuracy=#correctlyclassifiedTLEs+#correctlyclassifiedExTEs

#TLEs+#ExTEs (7)

wherethesymbol#standsfor“numberof”.Cochrantestwas

(7)

Fig.6.Exampleofpatientsymptoms.Symptomsarelistedinthefirstcolumn.A‘1’standsforapresenceofasymptom,anda‘0’fortheabsenceofasymptomataparticular time.

AscanbeseenfromTable4,theperformanceofOBCandGDMis

betterthantheclinicianfordataset0andcomparableperformance

fordataset1anddataset2evenifthereisnostatisticalsignificance

(p0=0.2592,p1=0.5292,p2=1).Table5showsthereliabilityofthe

agreementsusingtheFleiss’kappavalues[61]amongtheclinician,

OBCandGDM.Fordataset0anddataset2,OBCandGDMhavethe

highestreliabilityofagreements,whilefordataset1GDMhasthe

highestreliabilityagreementwiththeclinician.Fordataset0,GDM

hastheleastreliabilityofagreementwiththeclinician.

4.3. Test2.ComparisonofOBCandGDMwithsevenclinicians

Sevenclinicians,expertsinthefieldofepilepsydiagnosis,were

askedtoclassifyepilepsytypesonthebasisofapatient’sseizure

symptoms.Pleasenotethatthelevelofexpertiseofalloftheseven

cliniciansishigh/veryhigh.AccuracywascomputedasinEq.(7)

Table3

Test1.ExperimentalsetupusedforcomparingOBCandGDMclassificationwitha

singleclinician.

Dataset0 Dataset1 Dataset2

TLE ExTE TLE ExTE TLE ExTE

Training 39 19 39 29 49 32

Validation 4 5 7 3 5 4

Testing 17 15 14 7 6 3

Table4

Test1.Accuracyobtainedfortheclinician,OBCandGDMonthreetestdatasets.

Dataset0 Dataset1 Dataset2

Clinician 18/32 13/21 7/9

OBC 24/32 14/21 7/9

GDM 21/32 16/21 7/9

Table5

Test1.Fleiss’kappavalues.

Dataset0 Dataset1 Dataset2

OBCandclinician 0.13 0.25 0.13

GDMandclinician 0.02 0.57 0.45

OBCandGDM 0.60 0.49 0.59

Clinician,OBCandGDM 0.55 0.36 0.31

Table6

Test2.Experimentalsetup.

TLE ExTE

Training 45 26

Validation 5 3

Testing 10 20

andtheCochrantestwasperformed(p≤0.05).Table6showsthe

experimentalsetupforTest2.

AscanbeseenfromTable7bothOBCandGDMperformatthe

levelofexperts(p=0.4139).Table8showsthecollaborative

deci-sionmadebytheclinicians,thealgorithms,andthealgorithmsand

theclinicians.Amajorityofvotesisusedtodeterminethe

deci-sionofthecliniciansandthealgorithms.Fortheclinicians,ifmore

thanthreecliniciansvotedinfavorofanepilepsytype,thiswillbe

consideredasthedecisionoftheclinicians.ForOBCandGDM,only

caseswherebothvotedinfavorofanepilepsytypeisconsidered

asthedecisionofthealgorithms.Ifbothdisagree,thiscasewillbe

consideredasatiesincethereisnomajorityvotinginfavorofan

epilepsytype.Anotherinterestinganalysisistheagreementlevels

oftheclinicians,thealgorithmsandthealgorithmsandthe

clini-cians.Table9showstheFleiss’kappavaluesfortheclinicians,the

algorithmsandthealgorithmsandtheclinicians.

Fromthetable,onecanseethatthereliabilityofagreements

ofthecliniciansisbetterthanthereliabilityofagreementsofthe

algorithms.Moreover, onecan seethatthereliability of

agree-mentsofthealgorithmswiththecliniciansiscomparablewithGDM

slightlybetterthanOBC.FollowingLandisandKoch[62]thereisa

fairagreementbetweenOBCandGDMandamoderateagreement

amongtheclinicians.Onecanalsoseethatthere isamoderate

agreementamongOBC,GDMandtheclinicians.

Table7

Test2.AccuracyobtainedbyOBC,GDMandsevenclinicianson30testingpatients.

ThesymbolsCl-1,Cl-2,...,Cl-7standforclinician1,clinician2,...andclinician7

respectively.

OBC GDM Cl-1 Cl-2 Cl-3 Cl-4 Cl-5 Cl-6 Cl-7

18/30 18/30 15/30 16/30 19/30 19/30 19/30 20/30 22/30

Table8

AccuracyobtainedbyOBCandGDMandsevenclinicianson30novelpatientsusing majorityvoting.

Clinicians OBCandGDM All

Majorityvoting 20/30 12/30 20/30

Tie – 11/30 –

Table9

Test2.Fleiss’kappavaluesfortheclinicians,OBCandGDM,andOBC,GDMandthe clinicians.

Fleiss’kappa

Clinicians 0.55

OBCandclinicians 0.44

GDMandclinicians 0.48

OBCandGDM 0.27

(8)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

HMACC Boxplot

M D G C B O

HMACC

Fig.7. AboxplotdiagramoftheHMACCof100valuesfrom100repetitionsofthe trainingandtestingofthealgorithms.TheboxplotoftheHMACCisonlyforthetest accuracy.

4.4. Test3.Statisticalanalysisofthemethods

Togetan ideaof theperformance that wecan expectfrom

thealgorithmsweusedstandardmeasuresfromstatistics,suchas

expectedvalue(mean)andthestandarddeviation.Inordertobe

abletocalculatesuchvalueswehavetorepeatthetrainingphase

andthetestingofthealgorithmsindependentlyandmultipletimes.

Sincethedatacomesfromrealpatientsitisnotpossibleto

gener-atealargenumberoftestsamples.Therefore,theonlypossibility

istogeneratenewtrainingandtestdatasetsbyre-samplingthe

availabledata.

Wedecidedtouse20%oftheavailabledataasthetestdataset

andtheremaining80%asthetrainingdataset.Foreachtrialthetest

datasetisdrawnfromthedatabyrandomselection,20%ofthe

tem-poraland20%oftheextra-temporalexamplesandtheremaining

examplesareusedasthetrainingdataset.Selectingthesamples

inthiswayguaranteesthatthedistributionofthetwoclassesis

preservedforthetestandthetrainingdataset.Thealgorithmsare

trainedonthetrainingdatasetandaftertheyareoptimizedfortheir

performancetheyareevaluatedonthetestdataset.Whenthetest

performanceiscalculateditisimportantthatthealgorithmsare

completelyresetsothattheydonothaveanyinformationabout

thepriortrial.Thetrainingandtestingofthealgorithmsisrepeated

100times.

Afterthetestperformanceofthealgorithmsismeasured100

times,themeanvalueandthestandarddeviationofalltrialsare

calculated.ThemeanvalueisnowtheexpectedvalueoftheHMACC

thatanalgorithmwouldmakeonnewpatientdataandthestandard

deviationgivesameasureofhowstablethealgorithmis.Therefore,

ahighexpectationvalueoftheHMACCaswellasasmallstandard

deviationisdesired.

Theresultsobtainedbythealgorithms areshown in a

box-plotdiagraminFig.7.Theplotshowsthemedianaccuracyofthe

algorithms,whichgivesagoodideaoftheexpectedlikelihoodof

correctlyclassifyingnew data.Thebox aroundthe meanvalue

showsthe25thand75thpercentilesoftheaccuracyvalues

dur-ingthe100repetitionsandthewhiskersshowtheminimumand

maximumrangeswhicharenotdetectedasoutliers,whileoutliers

aremarkedascrosses.Thisgivesanideahowsensitivethe

algo-rithmistotheselectionofthetrainingdataset.Inarealapplication,

aftertrainingthealgorithmwithalloftheavailabledata,thisisthe

rangeinwhichonecanexpecttheaccuracytolie.Ingeneral,itis

Table10

Themean()andthestandarddeviation()oftheHMACCforbothmethodsonthe

100testruns.

 

OBC 0.7125 0.1080

GDM 0.8072 0.1065

expectedthattheexpectationoftheaccuracywillincreaseandthe

standarddeviationwilldecreasewithadditionaltrainingexamples.

Therefore,ifthealgorithmsarecontinuouslytrainedonnewdata,

theirperformancewillincreaseovertime.

InFig.7onecanseethatthemedianaccuracyfortheGDMis

about82%andfortheOBCitis70%.Thevaluesforthemean

accu-raciesandthestandarddeviationsforthetwoalgorithmsaregiven

inTable10.Itisobviousthatbothalgorithmsshowamuchbetter

resultthanrandomguessing,whichwouldbe50%.

4.5. ComputationalefficiencyofOBCandGDM

TheaveragetrainingtimeneededforGDMfortrainingon79

patientstakesonaverage3h,whiletheresultingclassifiertakes

onaverage0.1sfordeliveringtheoutputoftheclassificationona

standardpersonalcomputer.Sincegeneticalgorithmscanbeeasily

parallelized,inprincipleitshouldbepossibletoreducethetraining

timeofGDMdrastically.ForOBCthetimethatisneededtotrainthe

classifieron79patientsisonaverage10min,andoncetheclassifier

iscomputedittakes0.3mstoclassifyanovelpatient.Thisshows

thatoncethealgorithmsaretrainedtheycanbeappliedinpractical

settings.

5. Conclusionandoutlook

Clinical evaluation of epileptic patients is based on careful

visualdetectionofsymptomsduringepilepticseizuresinlongterm

Video-EEGrecordings.Weproposedtwomachinelearning

meth-odstohelpthecliniciansduringthediagnosis.Thefirstmethod

(OBC)canincorporatetheknowledgeofexpertcliniciansdirectly,

whilethe secondmethod(GDM) canmine ordiscover medical

knowledgeinimplicitformfrommedicaldata.Ourontology-based

classificationmethod(OBC)isnotbasedoninferencerulessince

itisnotpossibletodefinedeterministicrulescorrelatingthe

pres-enceortheabsenceofaparticularsymptomtoepilepsytypes.Both

methodsconsiderthetimingofoccurrenceofsymptomssinceearly

symptomsare moresignificantthanlate symptoms(i.e.typical

behavioralpatternsarecloselyrelatedtotheseizureinitialspread

location).Inadditiontothis,thegenetics-baseddatamining

algo-rithmhasafeatureselectionstep,whereafeatureisaparticular

symptom.Throughfeatureselectionitispossibletoidentifywhich

featureplaysanimportantroleintheclassificationoftheepilepsy

types.Itispossibletoconcludethatthemethodscanbeappliedto

discoverinterestingspatial(symptomsplayinganimportantrole)

andtemporal(sequenceofsymptoms)patternsinmedicaldatafor

aparticulardisease.

Whencomparingtheperformanceofthealgorithmswiththat

ofa singleclinician, both OBCand GDM showa slightly better

performance(evenifnotsignificant)thantheclinician.When

com-paredwiththeperformanceofthewholepopulationofclinicians

(inourcasesevenclinicians),theaccuracyofthemachine

learn-ingalgorithms(18/30)isslightlyworsethantheaverageaccuracy

of theclinicians (18.5/30) (see Section 4.3).Results of the two

comparisonsshowthattheperformanceofthemachinelearning

algorithmsisapproachingthelevelofexpertsforthecasewhere

boththecliniciansandthemachinelearningmethodsexploitonly

theclinicaldatausedinthispaper(stringsof‘1’sand‘0’sintime).

(9)

thatGDMperformsslightlybetterthanOBC,andbothalgorithms

obviouslyperformmuchbetterthanrandomguessing.The

accu-racyvaluesobtainedbythemethodsareatfirstglancenotexpected

bypersonsnormallyworkingwithothermachinelearning

bench-markproblems,sinceinthesebenchmarkproblemstheaccuracy

values obtainedareusually higher thanthe resultsreportedin

thispaper.Therefore,onewouldconsiderthattheresultsarenot

satisfactory.However,when comparingtheperformance ofthe

algorithmswiththatoftheclinicians,itcanbeeasilyseenfrom

theresultsthatthealgorithmsapproachthelevelofexpertiseof

veryexperiencedclinicians,whoparticipatedintheexperiments

reportedinthepaper.

ConsideringtheFleiss’kappaindex,computedvaluesarestrictly

dependentontheconsidereddatasets (theagreement between

OBCandGDMcanbesubstantialorfairincaseofdifferentdataset

considered,evenif thedatasetshavethesame numberof

sub-jects).Kappavaluesinterpretationissomewhatcontroversial,the

referencevalueforouranalysisistheagreementamong allthe

clinicians,whichis0.55.Fromtheresults,itappearsthatthe

devel-opedmethodshavethesameperformanceincomparisontothe

poolofconsideredclinicians,whenallareanalyzingthesamedata.

Aninteresting application of the machine learning methods

developedin thispaperwouldbein theareaoftraining young

cliniciansin theclassification of epilepsytypes. It would make

thelearningprocessfortheyoungclinicianseasier.Incaseofless

experiencedclinicians,automaticclassificationalgorithmscanbe

usedtoteachhowasymptom,ratherthananotherone,couldbe

moreimportantandsignificantforaspecificgroupofepilepsy.An

investigationofthistopicwouldbeinterestingforfuturework.In

additiontothis,themachinelearningmethodscouldbeapplied

togenerateanalternativeopinionforclassifyingepilepsytypes.

Thiswouldhelp tovalidatetheopinionof theclinicianin case

ofagreementbetweenthemethodsandtheclinicians,orwould,

incase of divergence,forcetheclinician torevise his/her

deci-sionorlookatthecase inmoredetail. Afurtherapplicationof

themethodsliesinformalizingmedicalknowledgeandbest

prac-tices.Theadvantageofdevelopinganontologyofepilepticseizure

isthesymptomsclassificationstandard, sothatallclinical

cen-terswouldbeabletorepresentanepilepticseizureinthesame

way.The useof anontologytodefine symptomsof an

epilep-ticseizurewillalsoovercometheproblemofterminologyused

by different clinicalcenters for symptoms description.Thus, it

isimportant tohavea glossaryof thesymptoms: ourontology

willfacethisdiscrepancy(i.e.“owl:sameAs”statementindicates

that twoindividuals actually refer tothesame thing:the

indi-viduals have thesame “identity”,so theyinheritall properties

or “rdfs:seeAlso”statement used to indicatea resource that

mightprovideadditionalinformationaboutthesubjectresource).

Thisis obviously a difficult taskand requires a detailed

analy-sisofthestructureandtheconceptsofmedicalterminologies.It

canbeachievedbyconstructing medicaldomainontologiesfor

representingmedicalterminologysystems.Inthiswork,the

real-izationofanontology(OBC)thatrepresentsthedomainofepilepsy

diagnosiswillhelpinformalizingthesymptomsobservable

dur-inganepilepticseizureandtocorrectlyclassifythetwotypesof

epilepsy.Moreover,afurthersubclassificationshouldbeobtained

inExTEconcerningthedifferentlobes(parietal,insular,occipital

andfrontal).

An important bottleneck that always arises when machine

learningistransferredtopracticalimplementationisthe

knowl-edgetransferfromtheexperttotheagent.Inthiscase,clinicians

areabletoclassifypatientsobservingtheseizureusingtheir

sensi-tivityacquiredbyexperience,whichisdifficulttotranslateintothe

artificialintelligencealgorithm.Ithastobenoticedthatinstandard

clinicalroutine,cliniciansbasetheirevaluationalsoon

anamnes-ticandneuroimagingdata.Inordertoimprovetheperformance

ofautomaticclassificationalgorithms,moreinformationshouldbe

encodedandsubsequentlyprovidedtotheclassifiers.

In healthcare,data isfrequentlynot idealorincomplete.In

ourcase, a problemcanarise ifcliniciansmake errors in

com-pilingthetableofsymptomsoccurrenceduringepilepsyseizure.

Thesemistakescanbecorrectedbecausewehaveasecond

reli-ablekindofdatathatisthevideorecordingsofepilepsyseizures.

Also,healthcarerecordsareunstructuredinformation.Our

onto-logicalapproachisalsosuitedforsuchkindofdata,sinceontologies

canbestructuredtoautomaticallyclassifyunstructuredtext,once

correspondencesaredefined.Futureworkcouldaddnewclinical

informationofapatient,likeanewsymptom,“staring”(a

motion-lessstatewiththeinterruptionofallthepreviousactivities),when

patientsstareatseizureonset,andwhichismorefrequent

dur-ingtemporallobeepilepsies.Anadditionalvariablecouldimprove

theclassifier, even moreif thatvariable had higheroccurrence

inasetofpopulationpatients. Theelectroencephalogram(EEG)

detectselectricalactivityinthebrainanditisafundamentaltool

fortheanalysisinthediagnosisofepilepsybecauseelectrical

alter-ations,oftenveryindicative,canalsobepresent intheabsence

ofothersymptoms.Otherdiagnostictestsincludemagnetic

res-onanceimagingandlaboratorytests,whichcanverifyorexclude

specificcauses.Inthissense,withthehelpoftheproposed

algo-rithms,differenttypesofinformationcouldbeunifiedandthen

processedtoobtainamoreaccuratediagnosis.Sinceitisknown

thatcliniciansusedifferentsourcesofinformationotherthanthe

codedclinicaldatausedinthispaperfordecision,itremainstobe

seenhowthealgorithmsimprovetheirperformanceifadditional

informationisavailable.Moreover,thereliabilityofthemethods

tocorrectlyclassifyco-morbidpatientscouldbeevaluatedinorder

toseeifthepresentedmethodsareabletolocalizetheepileptic

focusalthoughsomesymptomsmaybeproducedbysecondary

pathology.

Thepaperisafeasibilitystudyontheapplicationofmachine

learningtechniquesfortheautomaticlocalizationofthe

epilepto-geniczone.Althoughtheepilepsytypesanalyzedarecharacterized

by distinctive signs during the seizure manifestations, patients

diagnosisbasedonthevisualinspectionofsymptomsduringthe

first60scanbesubjective,asshownbycliniciandisagreementson

the30patientsanalyzed.

TheOBCandGDMmethodspresentedcanbeappliedin

min-ingmedicalknowledgeasshowninthepaper,orevenbeusedto

discoveranovelideaintheareaofmedicaldiagnosisofa

partic-ulardisease.Itisworthunderliningthatthismakestheautomatic

classificationsystemscreativeandinteractive,givingbackthe

dis-covered knowledge totheclinicians.Thisin turn improvesthe

existingknowledgeofclassifyingepilepsytypes.

Acknowledgements

Thiswork is supportedin partwithintheACTIVE European

project (FP7-ICT-2009-6-270460) and the EuRoSurge European

project(FP7-ICT-2011-7-288233).Wewouldliketothankthetwo

anonymousreviewersfortheirconstructivesuggestionsand

com-ments.

References

[1]Kahane P,Landré E, Minotti L, Francione S,Ryvlin P. The Bancaudand Talairachviewontheepileptogeniczone:aworkinghypothesis.Epileptic Dis-ord2006;8:16–26.

[2]MiserocchiA,CascardoB,PiroddiC,FuschilloD,CardinaleF,NobiliL,etal. Surgeryfortemporallobeepilepsyinchildren:relevanceofpresurgical eval-uationandanalysisofoutcome.JNeurosurgPediatr2013;11(3):1–12[Clinical article].

[3]LüdersH,AcharyaJ,BaumgartnerC,BenbadisS,BleaselA,BurgessR,etal. Semiologicalseizureclassification.Epilepsia1998;39(9):1006–13.

(10)

[4]Nobili L,Francione S,Mai R, Cardinale F, CastanaL, TassiL, etal. Sur-gical treatment of drug-resistant nocturnal frontal lobe epilepsy. Brain 2007;130(2):561–73.

[5]TassiL,MeroniA,DeleoF,VillaniF,MaiR,RussoGL,etal.Temporallobe epilepsy:neuropathologicalandclinicalcorrelationsin243surgicallytreated patients.EpilepticDisord2009;11(4):281–92.

[6]MaiR,SartoriI,FrancioneS,TassiL,CastanaL,CardinaleF,etal.Sleep-related hyperkineticseizures:alwaysafrontalonset?NeurolSci2005;26(3):220–4.

[7]JacksonGD,BriellmannRS,KuznieckyRI.Temporallobeepilepsy.In:Kuzniecky RI,JacksonGD,editors.Magneticresonanceinepilepsy.2nded.Burlington: AcademicPress;2005.p.99–176[chapter4].

[8]GuptaA,JeavonsP,HughesR,CovanisA.Auraintemporallobeepilepsy: clin-icalandelectroencephalographiccorrelation.JNeurolNeurosurgPsychiatry 1983;46(12):1079–83.

[9]BurgerS,StiegerB.Ontology-basedclassificationofunstructured informa-tion.In:Thefifthinternationalconferenceondigitalinformationmanagement (ICDIM).IEEE;2010.p.254–9.

[10]KovacsT.Genetics-basedmachinelearning.In:RozenbergG,BckT,KokJ, edit-ors.Handbookofnaturalcomputing.Berlin,Heidelberg:Springer;2012.p. 937–86.

[11]SowaJ.Ontology,metadata,andsemiotics.In:GanterB,MineauG,editors. Conceptualstructures:logical,linguistic,andcomputationalissues,vol.186 ofLectureNotesinComputerScience.Berlin,Heidelberg:Springer;2000.p. 55–81.

[12]RectorAL,RogersJE,PoleP.TheGALENhighlevelontology.StudHealthTechnol Inform1996;34:174–8.

[13]BosL,DonnellyK.SNOMED-CT:theadvancedterminologyandcodingsystem foreHealth.StudHealthTechnolInform2006;121:279–90.

[14]Bodenreider O.Theunified medicallanguage system (UMLS):integrating biomedicalterminology.NucleicAcidsRes2004;32(suppl.1):D267–70.

[15]KarlssonD.Adesignandprototypeforadecision-supportsysteminthefield ofurinarytractinfections-applicationofOpenGALENtechniquesforindexing medicalinformation.StudHealthTechnolInform2001;84:479–83.

[16]ElevitchFR.SNOMED-CT:electronichealthrecordenhancesanesthesiapatient safety.AANAJ2005;73(5):361–6.

[17]BodenreiderO.Biomedicalontologiesinaction:roleinknowledge manage-ment,dataintegrationanddecisionsupport.YearbMedInform2008;47:67–79.

[18]RosseC,MejinoJrJL.Areferenceontologyforbiomedicalinformatics:the foundationalmodelofanatomy.JBiomedInform2003;36(6):478–500.

[19]RubinDL,DameronO,BashirY,GrossmanD,DevP,MusenMA.Usingontologies linkedwithgeometricmodelstoreasonaboutpenetratinginjuries.ArtifIntell Med2006;37(3):167–76.

[20]ZhangL,WangZ.Ontology-basedclusteringalgorithmwithfeatureweights.J ComputInformSyst2010;6(9):2959–66.

[21]JanninP,MorandiX.Surgicalmodelsforcomputer-assistedneurosurgery. Neu-roimage2007;37(3):783–91.

[22]LeeC-S,WangM-H,AcamporaG,LoiaV,HsuC-Y.Ontology-basedintelligent fuzzyagentfordiabetesapplication.In:IEEEsymposiumonintelligentagents, 2009(IA’09).IEEE;2009.p.16–22.

[23]HorrocksI,Patel-SchneiderPF,BoleyH,TabetS,GrosofB,DeanM,etal.SWRL: asemanticwebrulelanguagecombiningOWLandRuleML.W3CMember Submission2004;21:79.

[24]PanJ,StoilosG,StamouG,TzouvarasV,HorrocksI.f-SWRL:afuzzy exten-sionofSWRL.In:SpaccapietraS,AbererK,Cudr-MaurouxP,editors.Journal onDataSemanticsVI,vol.4090ofLectureNotesinComputerScience.Berlin, Heidelberg:Springer;2006.p.28–46.

[25]GorunescuF.Datamining:concepts,modelsandtechniques.Berlin, Heidel-berg:Springer-Verlag;2011.

[26]SoniJ,AnsariU,SharmaD,SoniS.Predictivedataminingformedicaldiagnosis: anoverviewofheartdiseaseprediction.IntJComputAppl2011;17(8):43–8.

[27]DangareCS,ApteSS.Improvedstudyofheartdiseasepredictionsystemusing dataminingclassificationtechniques.IntJComputAppl2012;47(10):44–8.

[28]GanesanN,VenkateshK,RamaMA,PalaniAM.Applicationofneuralnetworks indiagnosingcancer diseaseusingdemographicdata. IntJComput Appl 2010;1(26):76–85.

[29]ChuC-M,ChienW-C,LaiC-H,BludauH-B,TschaiH-J,LuPaiS-MH,etal.A Bayesianexpertsystemforclinicaldetectingcoronaryarterydisease.JMed Sci2009;4:187–94.

[30]SrinivasK,RaniBK,GovrdhanA,KarimnagarJ.Applicationsofdatamining techniquesinhealthcareandpredictionofheartattacks.IntJComputSciEng 2010;2(2):250–5.

[31]DiNoiaT,OstuniVC,PesceF,BinettiG,NasoD,SchenaFP,etal.Anendstage kid-neydiseasepredictorbasedonanartificialneuralnetworksensemble.Expert SystAppl2013;40(11):4438–45.

[32]FloydS.Dataminingtechniquesforprognosisinpancreaticcancer[Master’s thesis].USA:WorcesterPolytechnicInstitute;2007.

[33]DreiseitlS,Ohno-machadoL,KittlerH,VinterboS,BinderM.Acomparisonof machinelearningmethodsforthediagnosisofpigmentedskinlesions.JBiomed Inform2001;34:28–36.

[34]Kolc¸eE,FrasheriN.Aliteraturereviewofdataminingtechniquesusedin healthcaredatabases.In:MarkovskiS,GusevM,editors.ProceedingsofICT innovations.2012.p.577–82.

[35]PrasadB,PrasadP,SagarY.Acomparativestudyofmachinelearning algo-rithmsasexpertsystemsinmedicaldiagnosis(asthma).In:MeghanathanN, KaushikB,NagamalaiD,editors.Advancesincomputerscienceandinformation technology,vol.131ofcommunicationsincomputerandinformationscience. Berlin,Heidelberg:Springer;2011.p.570–6.

[36]FanJ,ShaoC,OuyangY,WangJ,LiS,WangZ.Automaticseizuredetectionbased onsupportvectormachineswithgeneticalgorithms.In:WangT-D,LiX,Chen S-H,WangX,AbbassH,IbaH,etal.,editors.Simulatedevolutionandlearning, vol.4247ofLectureNotesinComputerScience.Berlin,Heidelberg:Springer; 2006.p.845–52.

[37]MeamarzadehH,KhayyambashiM,SaraeeM-H.Extractingtemporalrulesfrom medicaldata.In:Internationalconferenceoncomputertechnologyand devel-opment(ICCTD09),vol.1.2009.p.327–31.

[38]MenaLJ,OrozcoEE,FelixVG,OstosR,MelgarejoJ,MaestreG.Machine learn-ingapproachtoextractdiagnosticandprognosticthresholds:applicationin prognosisofcardiovascularmortality.In:Computationalandmathematical methodsinmedicine;2012,6pp.

[39]FreitasA.Asurveyofevolutionaryalgorithmsfordataminingandknowledge discovery.In:GhoshA,TsutsuiS,editors.Advancesinevolutionarycomputing, naturalcomputingseries.Berlin,Heidelberg:Springer;2003.p.819–45.

[40]WasanSK,BhatnagarV,KaurH.Theimpactofdataminingtechniqueson medicaldiagnostics.DataSciJ2006;5:119–26.

[41]HosseinkhahF,AshktorabH,VeenR,OwrangM.Challengesindataminingon medicaldatabases.IGIGlobal;2009.p.1393–404[chapter83].

[42]Satyanandam N,SatyanarayanaCh, RiyazuddinMd,ShaikA.Data mining machinelearningapproachesandmedicaldiagnosesystems:asurvey.IntJ ComputOrgTrends2012;2(3):53–60.

[43]NganPS,WongML,LamW,LeungKS,ChengJCY.Medicaldataminingusing evolutionarycomputation.ArtifIntellMed1999;16(1):73–96.

[44]TanK,YuQ,HengC,LeeT.Evolutionarycomputingforknowledgediscovery inmedicaldiagnosis.ArtifIntellMed2003;27(2):129–54.

[45]DiNuovoA,CataniaV.Genetictuningoffuzzyruledeepstructuresfor effi-cientknowledgeextractionfrommedicaldata.IEEEIntConfSystManCybernet 2006;6:5053–8.

[46]BacheK,LichmanM.UCImachinelearningrepository.URL:http://archive.

ics.uci.edu/ml[accessed25.11.13].

[47]Axer H,JantzenJ,vonKeyserlingkDG.Anaphasiadatabaseonthe inter-net: a model for computer-assisted analysis in aphasiology. Brain Lang 2000;75(3):390–8.

[48]DamHH,AbbassHA,LokanC,YaoX.Neural-basedlearningclassifiersystems. IEEETransKnowlDataEng2008;20(1):26–39.

[49]UrbanowiczRJ,MooreJH.Learningclassifiersystems:acompleteintroduction, review,androadmap.ArtifEvolAppl2009;2009:1–25.

[50]KokolP,PohorecS,tiglicG,PodgorelecV.Evolutionarydesignofdecision treesformedicalapplication.WileyInterdisciplinaryReviews:DataMining andKnowledgeDiscovery2012;2(3):237–54.

[51]SiregarP,ToulouseP.Model-baseddiagnosisofbraindisorders:aprototype framework.ArtifIntellMed1995;7(4):315–42.

[52]KohaviR.Astudyofcross-validationandbootstrapforaccuracyestimation andmodelselection.In:Proceedingsofthe14thinternationaljoint confer-enceonArtificialintelligence–vol.2,IJCAI-95.SanFrancisco,CA,USA:Morgan KaufmannPublishersInc.;1995.p.1137–43.

[53]KnublauchH,FergersonRW,NoyNF,MusenMA.TheProtégéOWLplugin: anopendevelopmentenvironmentforsemanticwebapplications.In:The SemanticWeb-ISWC2004.Springer;2004.p.229–43.

[54]NoyNF,McGuinnessDL.Ontologydevelopment101:aguidetocreatingyour firstontology[Tech.rep.].StanfordKnowledgeSystemsLaboratory(KSL-01-05) andStanfordMedicalInformatics(SMI-2001-0880);2001.

[55]PeroneCS.Pyevolve:apythonopen-sourceframeworkforgeneticalgorithms. SIGEVOlution2009;4(1):12–20.

[56]HallM,FrankE,HolmesG,PfahringerB,ReutemannP,WittenIH.TheWEKA dataminingsoftware:anupdate.SIGKDDExplorations2009;11(1):10–8.

[57]CurkT,DemarJ,XuQ,LebanG,PetrovicU,BratkoI,etal.Microarraydatamining withvisualprogramming.Bioinformatics2005;21:396–8.

[58]TingK.Confusionmatrix.In:SammutC,WebbG,editors.Encyclopediaof machinelearning.USA:Springer;2010.p.209.

[59]WhitleyD.TheGENITORalgorithmandselectionpressure:whyrank-based allocationofreproductivetrialsisbest.In:Proceedingsofthethird interna-tionalconferenceongeneticalgorithms.MorganKaufmann;1989.p.116–21.

[60]Cochran WG.The 2 testof goodness of fit. AnnMath Stat 1952;23(3): 315–45.

[61]FleissJL.Statisticalmethodsforratesandproportions,2ndedition,Wiley seriesinprobabilityandmathematicalstatistics.NewYork:JohnWiley&Sons; 1981.

[62]LandisJR,KochGG.Themeasurementofobserveragreementforcategorical data.Biometrics1977;33(1):159–74.

Riferimenti

Documenti correlati

We investigate (i) the complexity of deciding whether an ontology annotated with provenance information entails a conjunctive query associated with a prove- nance polynomial, and

In particular, we compare a fully automated feature selection method based on ontology-based data summaries with more classical ones, and we evaluate the performance of these methods

The cytotoxic effect of the natural porphyrin precursor 5-aminolevulinic acid (ALA) exposed to high energy shock waves (HESW) was investigated in vitro on

Although the underlying molecular mechanism remains unclear, pharma- cological blockade of AMPK or its genetic ablation by specific RNAi indicate that the activity of this kinase

The fundamental role of persistent diagrams is explicitly shown in the following Representation Theorem 1.6 [10, 14], claiming that they uniquely determine 1- dimensional

Bottom panel: Gaia (de- reddened) color-magnitude diagram of all member stars of Kshir and GD-1, previously shown in Fig 1. The absolute magnitude were then obtained by correcting

Moreover, studies on the social dynamics of knowledge in different practice contexts, and specifically how these affect science communication efforts and research uptake, also have

It accesses a number of predefined web/social/media sources (e.g., broadcasters official web sites, social networks, TV channels, etc) and processes them in order to extract