ContentslistsavailableatScienceDirect
Artificial
Intelligence
in
Medicine
jou rn al h om e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / a i i m
Automatic
classification
of
epilepsy
types
using
ontology-based
and
genetics-based
machine
learning
Yohannes
Kassahun
a,∗,
Roberta
Perrone
b,∗∗,
Elena
De
Momi
b,
Elmar
Berghöfer
f,
Laura
Tassi
c,
Maria
Paola
Canevini
d,
Roberto
Spreafico
e,
Giancarlo
Ferrigno
b,
Frank
Kirchner
a,faFachbereich3–MathematicsandComputerScience,UniversityofBremen,Robert-Hooke-Str.5,D-28359Bremen,Germany
bDepartmentofElectronics,InformationandBioengineering(DEIB),PolitecnicodiMilano,PiazzaLeonardodaVinci32,20133Milano,Italy
cCentroChirurgiaEpilessia“ClaudioMunari”,DipartimentodiNeuroscienze,OspedaleNiguardaCàGranda,PiazzaOspedaleMaggiore3,20162Milano,
Italy
dEpilepsyCenter,SanPaoloHospital,ViaA.DiRudini8,20142Milan,Italy
eDepartmentofExperimentalNeurophysiologyandEpileptology,IstitutoNazionaleNeurologico‘C.Besta’,ViaCeloria11,20133Milano,Italy
fGermanResearchCenterforArtificialIntelligence(DFKI),RoboticsInnovationCenter,Robert-Hooke-Str.5,D-28359Bremen,Germany
a
r
t
i
c
l
e
i
n
f
o
Articlehistory:
Received4September2013
Receivedinrevisedform24February2014
Accepted7March2014
Keywords:
Ontology-basedclassification
Genetics-basedclassification
Datamining(knowledgediscovery)from
medicaldata
Epileptogeniczoneidentification
a
b
s
t
r
a
c
t
Objectives:Inthepresurgicalanalysisfordrug-resistantfocalepilepsies,thedefinitionofthe epilepto-geniczone,whichisthecorticalareawhereictaldischargesoriginate,isusuallycarriedoutbyusing clinical,electrophysiologicalandneuroimagingdataanalysis.Clinicalevaluationisbasedonthevisual detectionofsymptomsduringepilepticseizures.Thisworkaimsatdevelopingafullyautomaticclassifier ofepileptictypesandtheirlocalizationusingictalsymptomsandmachinelearningmethods.
Methods:Wepresent theresultsachievedbyusingtwomachinelearningmethods.Thefirstis an ontology-basedclassificationthatcandirectlyincorporatehumanknowledge,whilethesecondisa genetics-baseddataminingalgorithmthatlearnsorextractsthedomainknowledgefrommedicaldata inimplicitform.
Results:Thedevelopedmethodsaretestedonaclinicaldatasetof129patients.Theperformanceofthe methodsismeasuredagainsttheperformanceofsevenclinicians,whoselevelofexpertiseishigh/very high,inclassifyingtwoepilepsytypes:temporallobeepilepsyandextra-temporallobeepilepsy.When comparingtheperformanceofthealgorithmswiththatofasingleclinician,whoisoneoftheseven cli-nicians,thealgorithmsshowaslightlybetterperformancethantheclinicianonthreetestsetsgenerated randomlyfrom99patientsoutofthe129patients.Theaccuracyobtainedforthetwomethodsandthe clinicianisasfollows:firsttestset65.6%and75%forthemethodsand56.3%fortheclinician,secondtest set66.7%and76.2%forthemethodsand61.9%fortheclinician,andthirdtestset77.8%forthemethods andtheclinician.Whencomparedwiththeperformanceofthewholepopulationofcliniciansontherest 30patientsoutofthe129patients,wherethepatientswereselectedbythecliniciansthemselves,the meanaccuracyofthemethods(60%)isslightlyworsethanthemeanaccuracyoftheclinicians(61.6%). Resultsshowthatthemethodsperformatthelevelofexperiencedclinicians,whenboththemethods andthecliniciansusethesameinformation.
Conclusion:Ourresultsdemonstratethatthedevelopedmethodsformimportantingredientsforrealizing afullyautomaticclassificationofepilepsytypesandcancontributetothedefinitionofsignsthataremost importantfortheclassification.
©2014PublishedbyElsevierB.V.
∗ Correspondingauthor.Tel.:+4942125752299.
∗∗ Correspondingauthor.
E-mailaddresses:kassahun@informatik.uni-bremen.de(Y.Kassahun),
roberta.perrone@polimi.it(R.Perrone).
1. Introduction
Fordrugresistantfocalepilepticpatients,brainsurgerycanbe
theonlyoption tocurethepatient.Theepileptogeniczone(EZ)
is the corticalarea where ictal dischargesoriginate and which
must be surgically resected or disconnectedto achieve seizure
freedom[1].ThedefinitionoftheEZisusuallyobtainedthrough
http://dx.doi.org/10.1016/j.artmed.2014.03.001
non-invasivepre-surgicalinvestigations,inmostcasesbyclinical,
electrophysiologicaland neuroimagingdataanalysis[2].Clinical
evaluation is based on the anamnestic data and onthe visual
analysisofsymptomsduringepilepticseizuresinlongtermscalp
video-electroencephalograms(video-EEG).Intra-cerebralinvasive
monitoring(stereo-EEG) maybeneededforverydifficultcases.
Clinicalmanifestationscanbejustsubjective[3](epilepticauras
withoutimpairmentofconsciousness)ormayprogressto
objec-tivesignsthatcanbeobservedandanalyzed,oftenassociatedwith
impairmentofawareness.Thechronologicalsequenceof
symp-tomsandtheirrelativedurationmightindicatetheseizureonset
areaandtheprogressiveinvolvementofadjacentbrainstructures
andis,therefore,crucialforthedefinitionoftheEZ[4],though
iden-ticalsymptomsmayarisefromdifferentcorticalregions,leadingto
misleadinglocalizationoftheEZ[5,6].Themostsignificant
symp-tomsappearinthefirst60s,evenifthepost-ictalconfusioncanbe
prolonged,mainlyifassociatedwithspeechdisturbances,whichis
thecaseforEZlocatedinthedominanthemisphere.
Temporallobeepilepsies(TLE),inwhichseizuresoriginatefrom
thetemporallobe,representthemajorityoffocalepilepsies(over
60%).ThesurgicaltreatmentofTLEhasthebestresultswithalmost
70%ofthepatientsreportedtobeseizure-freeaftersurgery[7].The
duration,thetimecourse,thelossofconsciousnessandthe
post-ictalphenomenaaredifferentinTLEandextra-temporalseizures
(ExTE).Seizuresarisingfromthetemporallobehavearelatively
gradualevolution (comparedtoExTE)[8].In TLEthesubjective
phenomenaaremorefrequent,thelossofconsciousnessismore
gradual,seizuresarelongerandthepost-ictalconfusionismore
important.TheidealsequenceofictalmodificationsinTLEisa
sub-jectivemanifestation(epigastricfeelingoftenraisingup,auditory,
visualandpsychicsymptoms,arequiteexclusivelylocatedinthe
temporallobe)followedbyalossofconsciousness,oro-alimentary
automatisms,homolateralheadorientationandacontralateralarm
dystonicposturingwithhomolateralgesturalautomatisms.Onthe
contrary,seizuresin ExTE are frequentlyabrupt, brief,without
subjectivemanifestations,thelossofconsciousnessisprecocious,
nooro-alimentaryautomatismsarepresentandmotor
modifica-tioncanbebilateralandmoretonic.However,notalltheseizures
areeasilyrecognizableandparticularlythehypermotorgestural
automatismscanoccurinbothTLEandExTE[5].The
state-of-the-artisthevisualanalysisoftheclinicaldataperformedbyexpert
cliniciansalwaysinassociationwithanatomicalmedicalimages
andEEGdata.Inliterature,thereisnotyetanyautomatic
classifi-cationofepilepsytypesbasedonclinicalsymptomsonly.
Itwouldbeextremelyimportanttoprovideclinicianswithan
automaticclassifierof epilepsytypes,whichcouldhelpto
vali-datetheopinionofthecliniciansincaseofagreementbetween
themethodsandtheclinicians,orwould,onthecontrary,forcethe
clinicianstorevisetheirdecisionorlookinmoredetailatthecase.
Inthispaperweanalyzetwomethodsforrealizinganautomatic
classifierofepilepsytypes,startingfrommedicaldata,whichis
usu-allyunstructuredinformation.Thefirstmethodisabletolearnthe
humanknowledgeonhowtoperformdiagnosis,basedonclinical
symptomsandthesecondautomaticallyretrievesknowledgefrom
medicaldata,searchingforexistingpatternsinthedata(e.g.
clin-icalsymptoms).Forthefirstmethodweusedanontology-based
classificationapproach[9],whileforthesecondmethodweused
genetics-basedmachinelearning(datamining)[10].Inadditionto
dealingwithunstructureddata,bothmethodscanalsodealwith
heterogeneousandincompletedata.Themotivationbehind
choos-ingthetwomethodsistheviewthatlearningbynomeansentails
atabularasaview.Itinvolvestheincorporationofpriorknowledge
thatcanaccelerateorotherwiseimprovethelearningprocess.A
combinationoftheabovemethods willallowus toincorporate
domainknowledgeandatthesametimediscovernovelmedical
knowledge.Thetwomethodsweretestedonaclinicaldatasetof
patientsandcomparedwiththeclassificationscarriedoutbyexpert
clinicians.
Thepaperisorganizedasfollows:firstwereviewrelevantwork
inthefieldofontology-basedclassificationandgenetics-baseddata
miningin thecontext of learning from medical data. Thenwe
presentthealgorithmsusedinthepaper,followedbythe
presen-tationoftheactualexperimentswithpatientdataandtheresults
obtained.Finally,wegiveaconclusionandanoutlookbasedonthe
workpresentedinthispaper.
2. Reviewofrelatedwork
In this paper two differentalgorithms are presented, which
exploitdifferentmethods:theontology-basedclassification(OBC)
and the genetics-based data mining (GDM). We give a review
ofrelated workintheareaof ontology-basedclassificationand
genetics-baseddatamining.
2.1. Ontology-basedclassification
Ontological modelling provides a description of a specific
knowledgedomainand itis madeofclasses, properties(which
expressrelationshipsbetweenclasses)andinstances(whichare
the“groundlevel”componentsofanontologyandpopulateclasses)
[11].Inmedicalfield,ontologiescanthereforebeusedforsharing
medicalknowledgeinareliableformatsothatitis
understand-ableandcanbeprocessedbybothhumansandmachines.Medical
ontologieslikeOpenGALEN[12],SNOMED-CT[13]andUMLS[14]
areusedinthehealthcarepractice.TheOpenGALENontologywas
usedinurologytodevelopadecisionsupportsystemfortreating
patientswithurinary-tractinfections[15].SNOMED-CTontology
provideshealthcareterminologywithcomprehensivecoverageof
diseases,clinicalfindings,etiologiesandtherapiesintheparticular
fieldofanesthesiology.Itisusedinelectronichealthcarerecords
(EHR)and isrelated tocritical patient-specificinformation,like
drugallergies.SNOMED-CTmakesthemedicalknowledgemore
usableandaccessible[16].EPILONT1isanontologyaboutepilepsy
domainandepilepticseizureswhiletheEpilepsyandSeizure
Ontol-ogy(EpSO)2givesaclassificationofepilepsysyndromes,location,
etiologyandrelatedmedicalconditionsaccordingtothe
Interna-tionalLeagueAgainstEpilepsy(ILAE)organizationstandards.
Ontologiesmadeofhierarchiesandpropertiesbetweenclasses
canbeusefulfordataaggregationandclustering.Suchontologies
providedomainknowledgeandsupporttheinterpretationof
rela-tionsidentifiedindatasetthroughdataminingprocesses,basedon
linguisticorstatisticaltechniques[17].Asanexample,the
founda-tionalmodelofanatomy(FMA)[18]ontologyisusedasasourceof
anatomicalknowledgeforpredictingtheconsequencesofinjury.
Inthisapplication,knowledgeaboutspatialrelationsbetweenthe
injuryandvitalorgansisprovidedbytheFMA[19].
Ontologicalmodelingwasusedalsoinotherdomains,like
cus-tomerknowledgeassessment[20],whereontologiesareusedto
describe customers population and the ontology graph,
repre-sentedwitharcsandnodes,isusedforclassifyingthemaccording
todifferentcriteria.In [21],theauthorsdesignedsurgical
mod-elsof neurosurgery makinguseof ontology and described 106
surgicalcases. Through classification trees and clustering
algo-rithms,theyextractedsurgicalknowledge,facilitatingthesurgical
decision-makingprocessandsurgicalplanning.In[22],Leeetal.
triedtoovercomethelimitofclassicalontologydealingwith
uncer-tainknowledgeandclassifieddifferentdiabetessyndromesusing
ontology-based inference rules. Highly expressive rules-based
1http://bioportal.bioontology.org/ontologies/EPILONT. 2http://prism.case.edu/prism/index.php/.
languages,likeSWRL[23],failinrepresentingfuzzyinformation
[24],asincaseofsymptomsoccurrence.Forthisreason,weexploit
ontologiestoextractnewprobabilisticinformationconsideringthe
depthofconceptswithrespecttotherootconceptandthedistance
betweensymptoms.
2.2. Genetics-baseddatamining
Datamining(DM)isamachinelearningprocedure,whichcan
beusedtoautomaticallyfindusefulpatternsfromadataset[25].
Theprocedurehasbeensuccessfullyappliedinvariousfieldsand
ismakingitswayintohealthcaresystems.Dataminingalgorithms
canlearnfromexamplesandmodelthenon-linearrelationships
betweenvariables.Themostcommonlyuseddatamining
algo-rithms for health care systems include naive Bayes classifiers,
neural networks, support vector machines, logistic regression,
fuzzyrulesanddecisiontrees.
ThereareseveralDMtechniquesdevelopedfordiagnosing
dis-eases.Forexample,Soni etal.[26],andDangareand Apte[27]
presenteddataminingtechniquesforheartdiseasediagnosis,and
Ganesanetal.[28]presentedtheuseofartificialneuralnetworks
forcancerdiagnosis.
DMtechniquesarealsodevelopedforprognosingdiseases.In
[29], a Bayesianexpertsystemfor clinically detectingcoronary
arterydiseaseisgiven.In[30],DMtechniquesareusedfor
pre-dictingheartattacks.In[31],artificialneuralnetworkshavebeen
appliedforprognosingendstagekidneydisease.TheworkofFloyd
[32]presentstheapplicationofDMtechniquesforprognosisofthe
pancreaticcancer.
Moreover,someauthorscomparedtheperformance of
algo-rithms for the diagnosis or prognoses purposes. In [33], the
discriminatorypowerofk-nearestneighbors,logisticregression,
artificial neural networks, decision trees, and support vector
machines on classifying pigmented skin lesions for diagnosis
purposeis analyzed.In[34],asummary ofcomparisonof
well-performingDMalgorithms usedfor bothdisease diagnosisand
disease prognosis is given. Prasad et al. [35] compared
auto-associativememoryneuralnetworks,Bayesiannetworks,iterative
dichotomized 3 (ID3) and C4.5in the diagnosisof thedisease
asthma.
DMtechniquesarefurthermoreappliedtoanalyzevarious
sig-nalsandtheirrelationshipswithparticulardiseasesorsymptoms.
Anexampleistheapplicationofsupportvectormachinesin
auto-maticseizuredetectioninEEG,whichcanbeusedindiagnosing
neurologicaldisordersrelatedtoepilepsy[36].
Inadditiontodiseasediagnosesand/orprognoses, DM
tech-niquesareusedtoextractdiagnosticrulesfrommedicaldataset
[37,38].Therulesgeneratedaresimilartothosecreatedmanually
inexpertsystemsandthereforecanbeeasilyvalidatedbydomain
experts.ForadetailedreviewofDMtechniquesandchallengesin
miningmedicaldata,pleasereferto[39–42].
Thegenetics-baseddataminingmethodpresentedinthispaper
isrelatedtotheworkintheareaofdataminingfordiscovering
temporalhiddenknowledgefrommedicaldataset.In[43],learning
aBayesiannetworkandrulesisusedtodiscovermedical
knowl-edge.Agrammar-basedgeneticprogrammingisusedasasearch
algorithm.Themethodisappliedinthedomainsoflimbfracture
and scoliosis. Thework by Tan et al.[44] utilizesgenetic
pro-grammingandgeneticalgorithmstoevolveclassificationrulesfor
hepatitisandbreastcancerdatasets.NuovoandCatania[45]
inte-gratedfuzzylogicand geneticalgorithmsinordertodiscovera
fuzzyrulebaseddiagnosticsystemfrommedicaldatasetsforbreast
cancerandPima-IndiandiabetesdatafoundintheUCIrepository
ofmachine learning databases[46], andmedical dataon
apha-siafoundinAachenaphasiadatabase[47].Dametal.presented
a neural-basedlearning classifier system[48] that incorporates
neuralnetworksintoalearningclassifiersystems[49].Themethod
wasappliedonmedicaldatasetssuchasthediabetes,breast
can-cer,heartdiseaseandlymphographyfoundintheUCIrepository
ofmachinelearningdatabases.In[50]decisiontreesareevolved
usingevolutionaryalgorithms.
3. Methods
In thissectionwe giveanoverviewofthemethods thatwe
developedandusedtodeterminethetypeofepilepsyofpatients
basedonclinicaldatadescribedinSection4.1.
3.1. Ontology-basedclassification
Theontology-basedclassificationmethodfocusesonepileptic
symptomscorrelation.Theontologicalmodelingdescribesthe
rela-tionshipsbetweenclassesandclassesandindividuals.
Inthe“patientseizureontology”,ictalsymptomsareontological
instances(i.e.thegroundlevelobjects)ofclass“ictal-symptom”
(suchas“visualillusions”,“fear”,etc.happeningduringseizures).
Therelationshipbetweenclassesisexpressedbypredicates(such
as“subclassOf”,“hasManifestation”,etc.).Inthedomain
ontol-ogy,therearesomegeneralclasses(i.e.collectionsofobjects)like
the“ictal-symptom”classwhichgroupsallthepossible
symp-tomsthatcanbeobservedduringanepilepticseizure.Fig.1shows
theseizureontologymodel.
The “patient” class groups all the patients of the dataset
and itis linkedtothe“seizure”generalclass bytheproperty
“hasSeizure”. Every epileptic seizure is described by a set of
manifestations, defined in the “manifestation” and
“noMani-festation” classes, which refer toobserved and not observed
symptomsduringseizurebythecliniciansandthesetwoclasses
are linkedtothe“seizure” class by“hasManifestation” and
“hasNoManifestation”relationshipsrespectively.Inadditionto
this,observedmanifestationsarecharacterizedbytheiroccurrence
time.“early”and“late”classesaresubclassesofthe
“manifes-tation”class.Symptomsmanifestedinthefirst10sbelongtothe
“early”class,whilelater symptomsbelongtothe“late”class.
“manifestation”and“noManifestation”classesaredefinedas
“unionOf”objectsoftheclass“ictal-symptom”.Asin[51],inorder
totranslatetherelationshipbetweenpairsofsymptoms,
consider-ingtheirproperties,wecomputedthedistancesintermsofgraph
arcs.
TheCkmatrix,whichexpressesthecorrelationbetweeneach
pairof symptoms(i.e.thelikelihood that ifduringan epileptic
seizuresymptomiispresent,symptomjispresentaswell)fora
specificpatientk(k∈{1,...,M},whereMisthenumberofpatients)
iscomputed.
Eachcorrelationmatrixelementck(i,j)betweenpairsof
symp-tomsiscalculatedusing
ck(i,j)=
(d(si)+d(sj))˛+ˇ
((D(si,sj)∗˛)+2max(d(si),d(sj)))+ˇ
(1)
i,j∈{1,...N},whereNisthetotalnumberofsymptoms,siandsj
aretheconsideredsymptoms(e.g.“headdeviation”and“visual
illu-sion”),d(si)andd(sj),arethedistancesofsiandsjfromtheseizure
classintheontologytreeasshowninFig.2.D(si,sj)represents
thelengthoftheshortestpathbetweensiandsjintheontology
tree(numberofarcsbetweensiandsj)asshowninFig.3aandb.
max(d(si),d(sj))representsthelongestdistancefromthe“seizure”
classinthedomainontologytreeofthetwosymptomssiandsj.The
variables˛andˇrangefrom0to1.˛isusedtobalancethe
pro-portionbetweendandDvalues,ˇmakesthedenominatornotto
Fig.1. Ontologymodelofapatientseizurewithearlyandlatesymptomsmanifestations.“eyesdeviation”isobservedinthefirst10softheregistration,while“head
deviation”and“visualillusion”appearafterthefirst10s.
Fig.2.Thedistanceofsymptomsifromthe“seizure”classintheontologytreeiscalculatedasthenumberofarcsinbetweenanditisequalto2.
Fig.3.Thedistancebetweentwosymptoms(D)iscalculatedasthenumberofarcsbetweenthem:e.g.thedistancebetween“headdeviation”and“visualillusion”is2arcs
Fig.4.InteractionbetweentheGAandtheMLtools.InthefigureMindicatesthepopulationsizeofthegeneticalgorithm.
Fig.5.AchromosomeencodingasolutionproposedbytheGA.ThegeneCjencodesthejthclassifiertotrainanduse.ThefirstNgenesfollowingthegeneencodingthe
classifierrepresentthetimeintervalstoconsider.Forsymptomsithetimeintervalisgivenbytesi−t
b
si.ThegeneFisaflagencodingwhetherafeatureselectionwillbe
employed.Ifitsvalueisone,thenfeatureselectionwillbeemployed,andifitsvalueiszero,nofeatureselectionwillbeused.ThelastgeneFsmencodesthefeatureselection
methodtoapply.
EachelementmcTLE(i,j)andmcExTE(i,j)ofthemeanofthe
matri-cesthatexpressthecorrelationbetweensymptomsforeachgroup
(TLEandExTEpatients,MCTLEandMCExTEmatrices),iscomputedas
mc(i,j)TLE= 1 W W
w=1 Cw(i,j) (2) mc(i,j)ExTE= 1 V V v=1 Cv(i,j), (3)whereW+V=M,WandVarethenumberofpatientsofthedataset
withtemporallobeepilepsiesandextratemporallobeepilepsies
respectively.
Inordertoclassifytheseizuretypes,eachseizureisassigned
tothegroupwhosedistancefromMCisminimum,wherethe
dis-tances,DTLEandDExTE,arecomputedby
DTLE=
N i=1⎛
⎝
N j=1(ch(i,j)−mc(i,j)TLE)2
⎞
⎠
(4) DExTE= N i=1⎛
⎝
N j=1(ch(i,j)−mc(i,j)ExTE)2
⎞
⎠
(5)h∈{1,...H},whereHisthetotalnumberofpatientstobetested.
Weoptimized˛andˇusingthetenfoldcross-validationonthe
dataset[52]bycontinuouslychangingtheirvalues.Varyingthe
val-uesof˛andˇ,doesnotaffecttheclassificationpower,evenifthe
correlationbetweensymptomsischanged.Wethereforesetboth
˛andˇto0.5,asthesumofthevariableshastobeone[51].
Theepilepticseizuresontologyis builtinontologyweb
lan-guage(OWL),thepopularontologicallanguage,usingProtégé3.5,
anopen-sourcetool,forcreating,editingandvisualizingontologies
[53].Wefollowedtheproposedguideline[54],collectingspecific
domainknowledgefromexperts,textbooksandpapersintegrating
existingontologiesanddefiningclassesandclasshierarchy.
3.2. Genetics-baseddatamining
TheGDM algorithm is a combinationof a geneticalgorithm
(GA),Pyevolve[55],andmachinelearning(ML)toolsforsupervised
learningWEKA[56]andOrange[57],whichimplementdifferent
classifiers(e.g.decisiontrees,multilayerperceptrons,support
vec-tormachines,etc.).TheflowchartofGDMisshowninFig.4.
Theparametersthatareoptimizedbythegeneticalgorithmare:
1theclassifier,Cj,totrain,
2theinstantoftimeinwhichasymptomsistartstsbiandtheinstant
oftimewhenitendste
si,
3thefeatureselectionmethod,Fsm,thatisusedtodetermine
symp-tomswhichplayanimportantroleinclassifyingpatientsintoTLE
andExTEgroups.
TheGAstartsbygeneratingrandomsolutions(i.e.the
chromo-somess1,s2,...,sk,...,sM,whereMisthenumberofsolutions).
Fig.5showsachromosomeencodingpropertiesofasolution.Each
solution,sk,willbedecodedintotheclassifiertotrainCj,the
fea-tureselectionflag,F,thefeatureselectionmethodtoapply,Fsm,
andthecorrespondingtrainingdata.Aftertraining,themachine
learningtoolsgivebacktheconfusionmatrix[58]thatresultsafter
thetenfoldcross-validation[52]isperformed.Fromtheconfusion
matrix,thefitnessoftheindividual(qualityofasolution)is
calcu-latedusingtheharmonicmeanaccuracy(HMACC),whichisgiven
by
HMACC(sk)=
2
(1/(1−FP(sk)))+(1/(1−FN(sk))),
Table1
GAparameters.
Populationsize 50
Mutationprobability 0.05
Crossoverprobability 0.8
Maximumnumberofgenerations 100
Selectionmethod Rankbasedselection[59]
whereFP(sk)standsforthefalsepositiverateandFN(sk)standsfor
thefalsenegativeratecorrespondingtoasolutionsk.Thevalues
fortheHMACCliebetweenzeroandone.AhighervalueofHMACC
meansabettersolution.IfasolutionclassifiescorrectlyalltheTLE
patientsbutmisclassifiestheExTEpatients,itwillresultinalow
HMACCvalue.
Theoptimizationprocessof thegeneticalgorithm isrun for
somegenerations and stopped if the maximum HMACC sofar
obtaineddoesnot improveover the last 20 generations orthe
maximumnumberofgenerationsisovercome.Table1showsthe
parameters of the geneticalgorithm for resultsobtained using
genetics-baseddataminingalgorithm.Theparameterswereset
manuallyandkeptthesameforallexperimentsreportedinthis
paper.
4. Experimentsandresults
Toevaluatemachinelearningmethods,onehastotakecareof
afewgeneralproblems.Forcategorizingtheepilepsytypesinto
temporal(TLE)andextra-temporal(ExTE),themethodspresented
inthispaperaretrainedonatrainingdatasetandtestedonatest
dataset.Thesetwodatasetswereselectedtobedisjointbecausethe
algorithmsareoptimizedonthetrainingdatasetforinstanceusing
crossvalidation.Sincethetestdatasetisnotincludedinthetraining
datasetatall,itiscompletelyunknowntothemethodswhenit
ispresentedtothemethodstotesttheirperformance.Hencethe
responsesofthemethodsgiveanideaonhowthemethodswould
performonanewdatadataset(newpatientsinourcase).
Inthissectionwedescribefirsttheclinicaldataofpatientswe
usedforanalyzingtheperformanceofthemethodsinSection4.1
andthentheresultsobtainedwithitinSections4.2,4.3and4.4.In
contrasttothestatisticalanalysisgiveninSection4.4,inSections
4.2and4.3wedealwiththecomparisonbetweenthemethodsand
clinicians.Forcliniciansre-samplingoftestdatasetsisnot
possi-blebecausetheclinicianswouldremembertheexamplesinthe
datasets.Therefore,different“trials”canonlybedoneby
differ-entclinicians.Inthefirsttest(Section4.2),weprovidedaclinician,
whoselevelofexpertiseisveryhigh,withthreetestdatasetsof
patients,andweaskedhertoclassifyeachpatient.Inthesecond
test(Section4.3)weaskedsevendifferentclinicians,whose
exper-tiseishigh/veryhigh,toclassify30patientsthatdidnotbelongto
anytraining,testingorvalidationdatasetpreviouslyused.
4.1. Descriptionofdatasetused
Theclinicaldatasetofpatientsthatweusedinourworkwas
obtainedfromthree clinicalepileptological centers:CarloBesta
NeurologicalInstitute,NiguardaCaGrandaHospitalandSanPaolo
Hospital,allinMilan,Italy.Allpatientsusedintheexperiments
reportedinthispaperaredrug-resistantepilepticpatients,whodo
notpresentanyotherpathology(e.g.malignanttumors),andthey
wereoperatedonandseizure-freeforatleastoneyear.Thisfact
isusedasaground-truthforthegenerationofthelabelsofthe
patients,whichareTLEandExTE.Allofthepatientshavesigned
aninformedconsenttothetreatmentoftheiranonymizeddata.
Thedatasetfor theexperimentsconsistsof99 labeledpatients,
ofwhich 60sufferfromTLEand 39fromExTE.Furthermore,an
additionaltestsetof30patients,ofwhich10sufferfromTLEand
Table2
Patientsdataset’soccurrencetable.Therowsareforsubjectiveandobjective
symp-toms,thecolumnsareforthetwogroupsofpatients(TLEandExTE).Thefirst14
symptomsaresubjective,thelast15symptomsareobjective.
No. Symptom TLE ExTE
1 Epigastric 20 1
2 Auditoryillusion 0 0
3 Auditoryhallucinations 0 2
4 Visualillusionsimple 1 0
5 Visualillusioncomplex 0 1
6 Visualhallucinationsimple 0 1
7 Visualhallucinationscomplex 0 1
8 Olfactivehallucinations 1 0
9 Gustatoryhallucinations 1 0
10 Psychicmanifestation 10 2
11 Fear 2 3
12 Autonomic(tachicardia,dyspnoea,etc.) 11 8 13 Motor-sensitivemanifestationmonolateral 1 2 14 Motor-sensitivemanifestationbilateral 1 0
15 Oroalimentaryautomatisms 36 10
16 Gesturalautomatismsmonolateral 26 14
17 Gesturalautomatismsbilateral 16 12
18 Headdeviation 35 38
19 Eyesdeviation 32 29
20 Motorclonicmonolateralarm 5 6
21 Motorclonicbilateralarms 1 2
22 Motorclonicmonolateralleg 1 3
23 Motorclonicbilaterallegs 1 4
24 Motorclonicmonolateralmouth 3 5
25 Motordystonicmonolateralarm 27 23
26 Motordystonicbilateralarms 14 10
27 Motordystonicmonolateralleg 2 5
28 Motordystonicbilaterallegs 6 7
29 Autonomic(flushing,shialorrhea,vomiting) 11 3
20fromExTE,isusedfortheexperimentdescribedinSection4.3.
Thedatasetisgeneratedasfollows.Onceclinicianshavea
collec-tionofvideorecordingsofepilepticseizures,theycreateatablefor
eachpatientinwhich29symptomsarepresentandwhichgives
thedurationinsecondsoftheseizures.Thefirst14 rowsofthe
tablecorrespondtothesubjectivesymptomsandthenext15to
theobjectivesymptoms.Foreachpatient,cliniciansfillinthetable,
assigningtothesymptomsobserveda‘1’,anda‘0’tomanifestation
notobserved(Fig.6).Thepatientfromwhichthedataisshownin
Fig.6hasthreedifferentsymptoms(“epigastric”sensation,“visual
illusionsimple”and“motordystonicmonolateralleg”).The
occur-renceofsymptomsinpatientsdatasetusedforthisworkisshown
inTable2.
4.2. Test1.ComparisonofOBCandGDMwithasingleclinician
ThecomparisonofOBCandGDMdiagnosiswithasingle
clin-ician,expertinepilepsytypeclassification,isperformedonthree
testdatasetsgeneratedfromthetrainingsetweobtained.The
sin-gleclinician’slevelofexpertiseisveryhighandsheisoneofthe
7clinicianswhoparticipatedinthesecondstudy(seeSection4.3).
Thetestdatasetsaregeneratedbyselectingrandomentriesfrom
thetrainingset.Afterselectingthetestdatasets,therestofthedata
isusedintrainingOBCandGDM.Pleasenotethatwhilethe
mem-bersofthetestdatasetsdonotchangeonceselected,themembers
ofthetrainingandvalidationwillchangeduringcross-validation.
Table3givestheexperimentalsetupusedforTest1.
Table4showstheperformancewithrespecttoaccuracy,that
isthenumberofcorrectlyclassifiedpatientsfortheclinician,OBC
andGDMonthetestdatasets.Theaccuracyweusedisgivenbythe
equation
Accuracy=#correctlyclassifiedTLEs+#correctlyclassifiedExTEs
#TLEs+#ExTEs (7)
wherethesymbol#standsfor“numberof”.Cochrantestwas
Fig.6.Exampleofpatientsymptoms.Symptomsarelistedinthefirstcolumn.A‘1’standsforapresenceofasymptom,anda‘0’fortheabsenceofasymptomataparticular time.
AscanbeseenfromTable4,theperformanceofOBCandGDMis
betterthantheclinicianfordataset0andcomparableperformance
fordataset1anddataset2evenifthereisnostatisticalsignificance
(p0=0.2592,p1=0.5292,p2=1).Table5showsthereliabilityofthe
agreementsusingtheFleiss’kappavalues[61]amongtheclinician,
OBCandGDM.Fordataset0anddataset2,OBCandGDMhavethe
highestreliabilityofagreements,whilefordataset1GDMhasthe
highestreliabilityagreementwiththeclinician.Fordataset0,GDM
hastheleastreliabilityofagreementwiththeclinician.
4.3. Test2.ComparisonofOBCandGDMwithsevenclinicians
Sevenclinicians,expertsinthefieldofepilepsydiagnosis,were
askedtoclassifyepilepsytypesonthebasisofapatient’sseizure
symptoms.Pleasenotethatthelevelofexpertiseofalloftheseven
cliniciansishigh/veryhigh.AccuracywascomputedasinEq.(7)
Table3
Test1.ExperimentalsetupusedforcomparingOBCandGDMclassificationwitha
singleclinician.
Dataset0 Dataset1 Dataset2
TLE ExTE TLE ExTE TLE ExTE
Training 39 19 39 29 49 32
Validation 4 5 7 3 5 4
Testing 17 15 14 7 6 3
Table4
Test1.Accuracyobtainedfortheclinician,OBCandGDMonthreetestdatasets.
Dataset0 Dataset1 Dataset2
Clinician 18/32 13/21 7/9
OBC 24/32 14/21 7/9
GDM 21/32 16/21 7/9
Table5
Test1.Fleiss’kappavalues.
Dataset0 Dataset1 Dataset2
OBCandclinician 0.13 0.25 0.13
GDMandclinician 0.02 0.57 0.45
OBCandGDM 0.60 0.49 0.59
Clinician,OBCandGDM 0.55 0.36 0.31
Table6
Test2.Experimentalsetup.
TLE ExTE
Training 45 26
Validation 5 3
Testing 10 20
andtheCochrantestwasperformed(p≤0.05).Table6showsthe
experimentalsetupforTest2.
AscanbeseenfromTable7bothOBCandGDMperformatthe
levelofexperts(p=0.4139).Table8showsthecollaborative
deci-sionmadebytheclinicians,thealgorithms,andthealgorithmsand
theclinicians.Amajorityofvotesisusedtodeterminethe
deci-sionofthecliniciansandthealgorithms.Fortheclinicians,ifmore
thanthreecliniciansvotedinfavorofanepilepsytype,thiswillbe
consideredasthedecisionoftheclinicians.ForOBCandGDM,only
caseswherebothvotedinfavorofanepilepsytypeisconsidered
asthedecisionofthealgorithms.Ifbothdisagree,thiscasewillbe
consideredasatiesincethereisnomajorityvotinginfavorofan
epilepsytype.Anotherinterestinganalysisistheagreementlevels
oftheclinicians,thealgorithmsandthealgorithmsandthe
clini-cians.Table9showstheFleiss’kappavaluesfortheclinicians,the
algorithmsandthealgorithmsandtheclinicians.
Fromthetable,onecanseethatthereliabilityofagreements
ofthecliniciansisbetterthanthereliabilityofagreementsofthe
algorithms.Moreover, onecan seethatthereliability of
agree-mentsofthealgorithmswiththecliniciansiscomparablewithGDM
slightlybetterthanOBC.FollowingLandisandKoch[62]thereisa
fairagreementbetweenOBCandGDMandamoderateagreement
amongtheclinicians.Onecanalsoseethatthere isamoderate
agreementamongOBC,GDMandtheclinicians.
Table7
Test2.AccuracyobtainedbyOBC,GDMandsevenclinicianson30testingpatients.
ThesymbolsCl-1,Cl-2,...,Cl-7standforclinician1,clinician2,...andclinician7
respectively.
OBC GDM Cl-1 Cl-2 Cl-3 Cl-4 Cl-5 Cl-6 Cl-7
18/30 18/30 15/30 16/30 19/30 19/30 19/30 20/30 22/30
Table8
AccuracyobtainedbyOBCandGDMandsevenclinicianson30novelpatientsusing majorityvoting.
Clinicians OBCandGDM All
Majorityvoting 20/30 12/30 20/30
Tie – 11/30 –
Table9
Test2.Fleiss’kappavaluesfortheclinicians,OBCandGDM,andOBC,GDMandthe clinicians.
Fleiss’kappa
Clinicians 0.55
OBCandclinicians 0.44
GDMandclinicians 0.48
OBCandGDM 0.27
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
HMACC Boxplot
M D G C B OHMACC
Fig.7. AboxplotdiagramoftheHMACCof100valuesfrom100repetitionsofthe trainingandtestingofthealgorithms.TheboxplotoftheHMACCisonlyforthetest accuracy.
4.4. Test3.Statisticalanalysisofthemethods
Togetan ideaof theperformance that wecan expectfrom
thealgorithmsweusedstandardmeasuresfromstatistics,suchas
expectedvalue(mean)andthestandarddeviation.Inordertobe
abletocalculatesuchvalueswehavetorepeatthetrainingphase
andthetestingofthealgorithmsindependentlyandmultipletimes.
Sincethedatacomesfromrealpatientsitisnotpossibleto
gener-atealargenumberoftestsamples.Therefore,theonlypossibility
istogeneratenewtrainingandtestdatasetsbyre-samplingthe
availabledata.
Wedecidedtouse20%oftheavailabledataasthetestdataset
andtheremaining80%asthetrainingdataset.Foreachtrialthetest
datasetisdrawnfromthedatabyrandomselection,20%ofthe
tem-poraland20%oftheextra-temporalexamplesandtheremaining
examplesareusedasthetrainingdataset.Selectingthesamples
inthiswayguaranteesthatthedistributionofthetwoclassesis
preservedforthetestandthetrainingdataset.Thealgorithmsare
trainedonthetrainingdatasetandaftertheyareoptimizedfortheir
performancetheyareevaluatedonthetestdataset.Whenthetest
performanceiscalculateditisimportantthatthealgorithmsare
completelyresetsothattheydonothaveanyinformationabout
thepriortrial.Thetrainingandtestingofthealgorithmsisrepeated
100times.
Afterthetestperformanceofthealgorithmsismeasured100
times,themeanvalueandthestandarddeviationofalltrialsare
calculated.ThemeanvalueisnowtheexpectedvalueoftheHMACC
thatanalgorithmwouldmakeonnewpatientdataandthestandard
deviationgivesameasureofhowstablethealgorithmis.Therefore,
ahighexpectationvalueoftheHMACCaswellasasmallstandard
deviationisdesired.
Theresultsobtainedbythealgorithms areshown in a
box-plotdiagraminFig.7.Theplotshowsthemedianaccuracyofthe
algorithms,whichgivesagoodideaoftheexpectedlikelihoodof
correctlyclassifyingnew data.Thebox aroundthe meanvalue
showsthe25thand75thpercentilesoftheaccuracyvalues
dur-ingthe100repetitionsandthewhiskersshowtheminimumand
maximumrangeswhicharenotdetectedasoutliers,whileoutliers
aremarkedascrosses.Thisgivesanideahowsensitivethe
algo-rithmistotheselectionofthetrainingdataset.Inarealapplication,
aftertrainingthealgorithmwithalloftheavailabledata,thisisthe
rangeinwhichonecanexpecttheaccuracytolie.Ingeneral,itis
Table10
Themean()andthestandarddeviation()oftheHMACCforbothmethodsonthe
100testruns.
OBC 0.7125 0.1080
GDM 0.8072 0.1065
expectedthattheexpectationoftheaccuracywillincreaseandthe
standarddeviationwilldecreasewithadditionaltrainingexamples.
Therefore,ifthealgorithmsarecontinuouslytrainedonnewdata,
theirperformancewillincreaseovertime.
InFig.7onecanseethatthemedianaccuracyfortheGDMis
about82%andfortheOBCitis70%.Thevaluesforthemean
accu-raciesandthestandarddeviationsforthetwoalgorithmsaregiven
inTable10.Itisobviousthatbothalgorithmsshowamuchbetter
resultthanrandomguessing,whichwouldbe50%.
4.5. ComputationalefficiencyofOBCandGDM
TheaveragetrainingtimeneededforGDMfortrainingon79
patientstakesonaverage3h,whiletheresultingclassifiertakes
onaverage0.1sfordeliveringtheoutputoftheclassificationona
standardpersonalcomputer.Sincegeneticalgorithmscanbeeasily
parallelized,inprincipleitshouldbepossibletoreducethetraining
timeofGDMdrastically.ForOBCthetimethatisneededtotrainthe
classifieron79patientsisonaverage10min,andoncetheclassifier
iscomputedittakes0.3mstoclassifyanovelpatient.Thisshows
thatoncethealgorithmsaretrainedtheycanbeappliedinpractical
settings.
5. Conclusionandoutlook
Clinical evaluation of epileptic patients is based on careful
visualdetectionofsymptomsduringepilepticseizuresinlongterm
Video-EEGrecordings.Weproposedtwomachinelearning
meth-odstohelpthecliniciansduringthediagnosis.Thefirstmethod
(OBC)canincorporatetheknowledgeofexpertcliniciansdirectly,
whilethe secondmethod(GDM) canmine ordiscover medical
knowledgeinimplicitformfrommedicaldata.Ourontology-based
classificationmethod(OBC)isnotbasedoninferencerulessince
itisnotpossibletodefinedeterministicrulescorrelatingthe
pres-enceortheabsenceofaparticularsymptomtoepilepsytypes.Both
methodsconsiderthetimingofoccurrenceofsymptomssinceearly
symptomsare moresignificantthanlate symptoms(i.e.typical
behavioralpatternsarecloselyrelatedtotheseizureinitialspread
location).Inadditiontothis,thegenetics-baseddatamining
algo-rithmhasafeatureselectionstep,whereafeatureisaparticular
symptom.Throughfeatureselectionitispossibletoidentifywhich
featureplaysanimportantroleintheclassificationoftheepilepsy
types.Itispossibletoconcludethatthemethodscanbeappliedto
discoverinterestingspatial(symptomsplayinganimportantrole)
andtemporal(sequenceofsymptoms)patternsinmedicaldatafor
aparticulardisease.
Whencomparingtheperformanceofthealgorithmswiththat
ofa singleclinician, both OBCand GDM showa slightly better
performance(evenifnotsignificant)thantheclinician.When
com-paredwiththeperformanceofthewholepopulationofclinicians
(inourcasesevenclinicians),theaccuracyofthemachine
learn-ingalgorithms(18/30)isslightlyworsethantheaverageaccuracy
of theclinicians (18.5/30) (see Section 4.3).Results of the two
comparisonsshowthattheperformanceofthemachinelearning
algorithmsisapproachingthelevelofexpertsforthecasewhere
boththecliniciansandthemachinelearningmethodsexploitonly
theclinicaldatausedinthispaper(stringsof‘1’sand‘0’sintime).
thatGDMperformsslightlybetterthanOBC,andbothalgorithms
obviouslyperformmuchbetterthanrandomguessing.The
accu-racyvaluesobtainedbythemethodsareatfirstglancenotexpected
bypersonsnormallyworkingwithothermachinelearning
bench-markproblems,sinceinthesebenchmarkproblemstheaccuracy
values obtainedareusually higher thanthe resultsreportedin
thispaper.Therefore,onewouldconsiderthattheresultsarenot
satisfactory.However,when comparingtheperformance ofthe
algorithmswiththatoftheclinicians,itcanbeeasilyseenfrom
theresultsthatthealgorithmsapproachthelevelofexpertiseof
veryexperiencedclinicians,whoparticipatedintheexperiments
reportedinthepaper.
ConsideringtheFleiss’kappaindex,computedvaluesarestrictly
dependentontheconsidereddatasets (theagreement between
OBCandGDMcanbesubstantialorfairincaseofdifferentdataset
considered,evenif thedatasetshavethesame numberof
sub-jects).Kappavaluesinterpretationissomewhatcontroversial,the
referencevalueforouranalysisistheagreementamong allthe
clinicians,whichis0.55.Fromtheresults,itappearsthatthe
devel-opedmethodshavethesameperformanceincomparisontothe
poolofconsideredclinicians,whenallareanalyzingthesamedata.
Aninteresting application of the machine learning methods
developedin thispaperwouldbein theareaoftraining young
cliniciansin theclassification of epilepsytypes. It would make
thelearningprocessfortheyoungclinicianseasier.Incaseofless
experiencedclinicians,automaticclassificationalgorithmscanbe
usedtoteachhowasymptom,ratherthananotherone,couldbe
moreimportantandsignificantforaspecificgroupofepilepsy.An
investigationofthistopicwouldbeinterestingforfuturework.In
additiontothis,themachinelearningmethodscouldbeapplied
togenerateanalternativeopinionforclassifyingepilepsytypes.
Thiswouldhelp tovalidatetheopinionof theclinicianin case
ofagreementbetweenthemethodsandtheclinicians,orwould,
incase of divergence,forcetheclinician torevise his/her
deci-sionorlookatthecase inmoredetail. Afurtherapplicationof
themethodsliesinformalizingmedicalknowledgeandbest
prac-tices.Theadvantageofdevelopinganontologyofepilepticseizure
isthesymptomsclassificationstandard, sothatallclinical
cen-terswouldbeabletorepresentanepilepticseizureinthesame
way.The useof anontologytodefine symptomsof an
epilep-ticseizurewillalsoovercometheproblemofterminologyused
by different clinicalcenters for symptoms description.Thus, it
isimportant tohavea glossaryof thesymptoms: ourontology
willfacethisdiscrepancy(i.e.“owl:sameAs”statementindicates
that twoindividuals actually refer tothesame thing:the
indi-viduals have thesame “identity”,so theyinheritall properties
or “rdfs:seeAlso”statement used to indicatea resource that
mightprovideadditionalinformationaboutthesubjectresource).
Thisis obviously a difficult taskand requires a detailed
analy-sisofthestructureandtheconceptsofmedicalterminologies.It
canbeachievedbyconstructing medicaldomainontologiesfor
representingmedicalterminologysystems.Inthiswork,the
real-izationofanontology(OBC)thatrepresentsthedomainofepilepsy
diagnosiswillhelpinformalizingthesymptomsobservable
dur-inganepilepticseizureandtocorrectlyclassifythetwotypesof
epilepsy.Moreover,afurthersubclassificationshouldbeobtained
inExTEconcerningthedifferentlobes(parietal,insular,occipital
andfrontal).
An important bottleneck that always arises when machine
learningistransferredtopracticalimplementationisthe
knowl-edgetransferfromtheexperttotheagent.Inthiscase,clinicians
areabletoclassifypatientsobservingtheseizureusingtheir
sensi-tivityacquiredbyexperience,whichisdifficulttotranslateintothe
artificialintelligencealgorithm.Ithastobenoticedthatinstandard
clinicalroutine,cliniciansbasetheirevaluationalsoon
anamnes-ticandneuroimagingdata.Inordertoimprovetheperformance
ofautomaticclassificationalgorithms,moreinformationshouldbe
encodedandsubsequentlyprovidedtotheclassifiers.
In healthcare,data isfrequentlynot idealorincomplete.In
ourcase, a problemcanarise ifcliniciansmake errors in
com-pilingthetableofsymptomsoccurrenceduringepilepsyseizure.
Thesemistakescanbecorrectedbecausewehaveasecond
reli-ablekindofdatathatisthevideorecordingsofepilepsyseizures.
Also,healthcarerecordsareunstructuredinformation.Our
onto-logicalapproachisalsosuitedforsuchkindofdata,sinceontologies
canbestructuredtoautomaticallyclassifyunstructuredtext,once
correspondencesaredefined.Futureworkcouldaddnewclinical
informationofapatient,likeanewsymptom,“staring”(a
motion-lessstatewiththeinterruptionofallthepreviousactivities),when
patientsstareatseizureonset,andwhichismorefrequent
dur-ingtemporallobeepilepsies.Anadditionalvariablecouldimprove
theclassifier, even moreif thatvariable had higheroccurrence
inasetofpopulationpatients. Theelectroencephalogram(EEG)
detectselectricalactivityinthebrainanditisafundamentaltool
fortheanalysisinthediagnosisofepilepsybecauseelectrical
alter-ations,oftenveryindicative,canalsobepresent intheabsence
ofothersymptoms.Otherdiagnostictestsincludemagnetic
res-onanceimagingandlaboratorytests,whichcanverifyorexclude
specificcauses.Inthissense,withthehelpoftheproposed
algo-rithms,differenttypesofinformationcouldbeunifiedandthen
processedtoobtainamoreaccuratediagnosis.Sinceitisknown
thatcliniciansusedifferentsourcesofinformationotherthanthe
codedclinicaldatausedinthispaperfordecision,itremainstobe
seenhowthealgorithmsimprovetheirperformanceifadditional
informationisavailable.Moreover,thereliabilityofthemethods
tocorrectlyclassifyco-morbidpatientscouldbeevaluatedinorder
toseeifthepresentedmethodsareabletolocalizetheepileptic
focusalthoughsomesymptomsmaybeproducedbysecondary
pathology.
Thepaperisafeasibilitystudyontheapplicationofmachine
learningtechniquesfortheautomaticlocalizationofthe
epilepto-geniczone.Althoughtheepilepsytypesanalyzedarecharacterized
by distinctive signs during the seizure manifestations, patients
diagnosisbasedonthevisualinspectionofsymptomsduringthe
first60scanbesubjective,asshownbycliniciandisagreementson
the30patientsanalyzed.
TheOBCandGDMmethodspresentedcanbeappliedin
min-ingmedicalknowledgeasshowninthepaper,orevenbeusedto
discoveranovelideaintheareaofmedicaldiagnosisofa
partic-ulardisease.Itisworthunderliningthatthismakestheautomatic
classificationsystemscreativeandinteractive,givingbackthe
dis-covered knowledge totheclinicians.Thisin turn improvesthe
existingknowledgeofclassifyingepilepsytypes.
Acknowledgements
Thiswork is supportedin partwithintheACTIVE European
project (FP7-ICT-2009-6-270460) and the EuRoSurge European
project(FP7-ICT-2011-7-288233).Wewouldliketothankthetwo
anonymousreviewersfortheirconstructivesuggestionsand
com-ments.
References
[1]Kahane P,Landré E, Minotti L, Francione S,Ryvlin P. The Bancaudand Talairachviewontheepileptogeniczone:aworkinghypothesis.Epileptic Dis-ord2006;8:16–26.
[2]MiserocchiA,CascardoB,PiroddiC,FuschilloD,CardinaleF,NobiliL,etal. Surgeryfortemporallobeepilepsyinchildren:relevanceofpresurgical eval-uationandanalysisofoutcome.JNeurosurgPediatr2013;11(3):1–12[Clinical article].
[3]LüdersH,AcharyaJ,BaumgartnerC,BenbadisS,BleaselA,BurgessR,etal. Semiologicalseizureclassification.Epilepsia1998;39(9):1006–13.
[4]Nobili L,Francione S,Mai R, Cardinale F, CastanaL, TassiL, etal. Sur-gical treatment of drug-resistant nocturnal frontal lobe epilepsy. Brain 2007;130(2):561–73.
[5]TassiL,MeroniA,DeleoF,VillaniF,MaiR,RussoGL,etal.Temporallobe epilepsy:neuropathologicalandclinicalcorrelationsin243surgicallytreated patients.EpilepticDisord2009;11(4):281–92.
[6]MaiR,SartoriI,FrancioneS,TassiL,CastanaL,CardinaleF,etal.Sleep-related hyperkineticseizures:alwaysafrontalonset?NeurolSci2005;26(3):220–4.
[7]JacksonGD,BriellmannRS,KuznieckyRI.Temporallobeepilepsy.In:Kuzniecky RI,JacksonGD,editors.Magneticresonanceinepilepsy.2nded.Burlington: AcademicPress;2005.p.99–176[chapter4].
[8]GuptaA,JeavonsP,HughesR,CovanisA.Auraintemporallobeepilepsy: clin-icalandelectroencephalographiccorrelation.JNeurolNeurosurgPsychiatry 1983;46(12):1079–83.
[9]BurgerS,StiegerB.Ontology-basedclassificationofunstructured informa-tion.In:Thefifthinternationalconferenceondigitalinformationmanagement (ICDIM).IEEE;2010.p.254–9.
[10]KovacsT.Genetics-basedmachinelearning.In:RozenbergG,BckT,KokJ, edit-ors.Handbookofnaturalcomputing.Berlin,Heidelberg:Springer;2012.p. 937–86.
[11]SowaJ.Ontology,metadata,andsemiotics.In:GanterB,MineauG,editors. Conceptualstructures:logical,linguistic,andcomputationalissues,vol.186 ofLectureNotesinComputerScience.Berlin,Heidelberg:Springer;2000.p. 55–81.
[12]RectorAL,RogersJE,PoleP.TheGALENhighlevelontology.StudHealthTechnol Inform1996;34:174–8.
[13]BosL,DonnellyK.SNOMED-CT:theadvancedterminologyandcodingsystem foreHealth.StudHealthTechnolInform2006;121:279–90.
[14]Bodenreider O.Theunified medicallanguage system (UMLS):integrating biomedicalterminology.NucleicAcidsRes2004;32(suppl.1):D267–70.
[15]KarlssonD.Adesignandprototypeforadecision-supportsysteminthefield ofurinarytractinfections-applicationofOpenGALENtechniquesforindexing medicalinformation.StudHealthTechnolInform2001;84:479–83.
[16]ElevitchFR.SNOMED-CT:electronichealthrecordenhancesanesthesiapatient safety.AANAJ2005;73(5):361–6.
[17]BodenreiderO.Biomedicalontologiesinaction:roleinknowledge manage-ment,dataintegrationanddecisionsupport.YearbMedInform2008;47:67–79.
[18]RosseC,MejinoJrJL.Areferenceontologyforbiomedicalinformatics:the foundationalmodelofanatomy.JBiomedInform2003;36(6):478–500.
[19]RubinDL,DameronO,BashirY,GrossmanD,DevP,MusenMA.Usingontologies linkedwithgeometricmodelstoreasonaboutpenetratinginjuries.ArtifIntell Med2006;37(3):167–76.
[20]ZhangL,WangZ.Ontology-basedclusteringalgorithmwithfeatureweights.J ComputInformSyst2010;6(9):2959–66.
[21]JanninP,MorandiX.Surgicalmodelsforcomputer-assistedneurosurgery. Neu-roimage2007;37(3):783–91.
[22]LeeC-S,WangM-H,AcamporaG,LoiaV,HsuC-Y.Ontology-basedintelligent fuzzyagentfordiabetesapplication.In:IEEEsymposiumonintelligentagents, 2009(IA’09).IEEE;2009.p.16–22.
[23]HorrocksI,Patel-SchneiderPF,BoleyH,TabetS,GrosofB,DeanM,etal.SWRL: asemanticwebrulelanguagecombiningOWLandRuleML.W3CMember Submission2004;21:79.
[24]PanJ,StoilosG,StamouG,TzouvarasV,HorrocksI.f-SWRL:afuzzy exten-sionofSWRL.In:SpaccapietraS,AbererK,Cudr-MaurouxP,editors.Journal onDataSemanticsVI,vol.4090ofLectureNotesinComputerScience.Berlin, Heidelberg:Springer;2006.p.28–46.
[25]GorunescuF.Datamining:concepts,modelsandtechniques.Berlin, Heidel-berg:Springer-Verlag;2011.
[26]SoniJ,AnsariU,SharmaD,SoniS.Predictivedataminingformedicaldiagnosis: anoverviewofheartdiseaseprediction.IntJComputAppl2011;17(8):43–8.
[27]DangareCS,ApteSS.Improvedstudyofheartdiseasepredictionsystemusing dataminingclassificationtechniques.IntJComputAppl2012;47(10):44–8.
[28]GanesanN,VenkateshK,RamaMA,PalaniAM.Applicationofneuralnetworks indiagnosingcancer diseaseusingdemographicdata. IntJComput Appl 2010;1(26):76–85.
[29]ChuC-M,ChienW-C,LaiC-H,BludauH-B,TschaiH-J,LuPaiS-MH,etal.A Bayesianexpertsystemforclinicaldetectingcoronaryarterydisease.JMed Sci2009;4:187–94.
[30]SrinivasK,RaniBK,GovrdhanA,KarimnagarJ.Applicationsofdatamining techniquesinhealthcareandpredictionofheartattacks.IntJComputSciEng 2010;2(2):250–5.
[31]DiNoiaT,OstuniVC,PesceF,BinettiG,NasoD,SchenaFP,etal.Anendstage kid-neydiseasepredictorbasedonanartificialneuralnetworksensemble.Expert SystAppl2013;40(11):4438–45.
[32]FloydS.Dataminingtechniquesforprognosisinpancreaticcancer[Master’s thesis].USA:WorcesterPolytechnicInstitute;2007.
[33]DreiseitlS,Ohno-machadoL,KittlerH,VinterboS,BinderM.Acomparisonof machinelearningmethodsforthediagnosisofpigmentedskinlesions.JBiomed Inform2001;34:28–36.
[34]Kolc¸eE,FrasheriN.Aliteraturereviewofdataminingtechniquesusedin healthcaredatabases.In:MarkovskiS,GusevM,editors.ProceedingsofICT innovations.2012.p.577–82.
[35]PrasadB,PrasadP,SagarY.Acomparativestudyofmachinelearning algo-rithmsasexpertsystemsinmedicaldiagnosis(asthma).In:MeghanathanN, KaushikB,NagamalaiD,editors.Advancesincomputerscienceandinformation technology,vol.131ofcommunicationsincomputerandinformationscience. Berlin,Heidelberg:Springer;2011.p.570–6.
[36]FanJ,ShaoC,OuyangY,WangJ,LiS,WangZ.Automaticseizuredetectionbased onsupportvectormachineswithgeneticalgorithms.In:WangT-D,LiX,Chen S-H,WangX,AbbassH,IbaH,etal.,editors.Simulatedevolutionandlearning, vol.4247ofLectureNotesinComputerScience.Berlin,Heidelberg:Springer; 2006.p.845–52.
[37]MeamarzadehH,KhayyambashiM,SaraeeM-H.Extractingtemporalrulesfrom medicaldata.In:Internationalconferenceoncomputertechnologyand devel-opment(ICCTD09),vol.1.2009.p.327–31.
[38]MenaLJ,OrozcoEE,FelixVG,OstosR,MelgarejoJ,MaestreG.Machine learn-ingapproachtoextractdiagnosticandprognosticthresholds:applicationin prognosisofcardiovascularmortality.In:Computationalandmathematical methodsinmedicine;2012,6pp.
[39]FreitasA.Asurveyofevolutionaryalgorithmsfordataminingandknowledge discovery.In:GhoshA,TsutsuiS,editors.Advancesinevolutionarycomputing, naturalcomputingseries.Berlin,Heidelberg:Springer;2003.p.819–45.
[40]WasanSK,BhatnagarV,KaurH.Theimpactofdataminingtechniqueson medicaldiagnostics.DataSciJ2006;5:119–26.
[41]HosseinkhahF,AshktorabH,VeenR,OwrangM.Challengesindataminingon medicaldatabases.IGIGlobal;2009.p.1393–404[chapter83].
[42]Satyanandam N,SatyanarayanaCh, RiyazuddinMd,ShaikA.Data mining machinelearningapproachesandmedicaldiagnosesystems:asurvey.IntJ ComputOrgTrends2012;2(3):53–60.
[43]NganPS,WongML,LamW,LeungKS,ChengJCY.Medicaldataminingusing evolutionarycomputation.ArtifIntellMed1999;16(1):73–96.
[44]TanK,YuQ,HengC,LeeT.Evolutionarycomputingforknowledgediscovery inmedicaldiagnosis.ArtifIntellMed2003;27(2):129–54.
[45]DiNuovoA,CataniaV.Genetictuningoffuzzyruledeepstructuresfor effi-cientknowledgeextractionfrommedicaldata.IEEEIntConfSystManCybernet 2006;6:5053–8.
[46]BacheK,LichmanM.UCImachinelearningrepository.URL:http://archive.
ics.uci.edu/ml[accessed25.11.13].
[47]Axer H,JantzenJ,vonKeyserlingkDG.Anaphasiadatabaseonthe inter-net: a model for computer-assisted analysis in aphasiology. Brain Lang 2000;75(3):390–8.
[48]DamHH,AbbassHA,LokanC,YaoX.Neural-basedlearningclassifiersystems. IEEETransKnowlDataEng2008;20(1):26–39.
[49]UrbanowiczRJ,MooreJH.Learningclassifiersystems:acompleteintroduction, review,androadmap.ArtifEvolAppl2009;2009:1–25.
[50]KokolP,PohorecS,tiglicG,PodgorelecV.Evolutionarydesignofdecision treesformedicalapplication.WileyInterdisciplinaryReviews:DataMining andKnowledgeDiscovery2012;2(3):237–54.
[51]SiregarP,ToulouseP.Model-baseddiagnosisofbraindisorders:aprototype framework.ArtifIntellMed1995;7(4):315–42.
[52]KohaviR.Astudyofcross-validationandbootstrapforaccuracyestimation andmodelselection.In:Proceedingsofthe14thinternationaljoint confer-enceonArtificialintelligence–vol.2,IJCAI-95.SanFrancisco,CA,USA:Morgan KaufmannPublishersInc.;1995.p.1137–43.
[53]KnublauchH,FergersonRW,NoyNF,MusenMA.TheProtégéOWLplugin: anopendevelopmentenvironmentforsemanticwebapplications.In:The SemanticWeb-ISWC2004.Springer;2004.p.229–43.
[54]NoyNF,McGuinnessDL.Ontologydevelopment101:aguidetocreatingyour firstontology[Tech.rep.].StanfordKnowledgeSystemsLaboratory(KSL-01-05) andStanfordMedicalInformatics(SMI-2001-0880);2001.
[55]PeroneCS.Pyevolve:apythonopen-sourceframeworkforgeneticalgorithms. SIGEVOlution2009;4(1):12–20.
[56]HallM,FrankE,HolmesG,PfahringerB,ReutemannP,WittenIH.TheWEKA dataminingsoftware:anupdate.SIGKDDExplorations2009;11(1):10–8.
[57]CurkT,DemarJ,XuQ,LebanG,PetrovicU,BratkoI,etal.Microarraydatamining withvisualprogramming.Bioinformatics2005;21:396–8.
[58]TingK.Confusionmatrix.In:SammutC,WebbG,editors.Encyclopediaof machinelearning.USA:Springer;2010.p.209.
[59]WhitleyD.TheGENITORalgorithmandselectionpressure:whyrank-based allocationofreproductivetrialsisbest.In:Proceedingsofthethird interna-tionalconferenceongeneticalgorithms.MorganKaufmann;1989.p.116–21.
[60]Cochran WG.The 2 testof goodness of fit. AnnMath Stat 1952;23(3): 315–45.
[61]FleissJL.Statisticalmethodsforratesandproportions,2ndedition,Wiley seriesinprobabilityandmathematicalstatistics.NewYork:JohnWiley&Sons; 1981.
[62]LandisJR,KochGG.Themeasurementofobserveragreementforcategorical data.Biometrics1977;33(1):159–74.