Automatic Classification of Epilepsy Types Using Ontology-based and Genetic based Machine Learning

(1)

ContentslistsavailableatScienceDirect

Artiﬁcial

Intelligence

in

Medicine

jou rn al h om e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / a i i m

Automatic

classiﬁcation

of

epilepsy

types

using

ontology-based

and

genetics-based

machine

learning

Yohannes

Kassahun

a,∗

_,

_Roberta

_Perrone

b,∗∗

_,

_Elena

_De

_Momi

b

_,

_Elmar

_Berghöfer

f

_,

Laura

Tassi

c

_,

_Maria

_Paola

_Canevini

d

_,

_Roberto

_Spreaﬁco

e

_,

_Giancarlo

_Ferrigno

b

_,

Frank

Kirchner

a,f

a_Fachbereich₃_–_Mathematics_and_Computer_Science,_University_of_Bremen,_{Robert-Hooke-Str.}_5,_D-28359_Bremen,_Germany

b_Department_of_Electronics,_Information_and_{Bioengineering}_(DEIB),_Politecnico_di_Milano,_Piazza_Leonardo_da_Vinci_32,₂₀₁₃₃_Milano,_Italy

c_Centro_Chirurgia_Epilessia_“Claudio_Munari”,_Dipartimento_di_{Neuroscienze,}_Ospedale_Niguarda_Cà_Granda,_Piazza_Ospedale_Maggiore_3,₂₀₁₆₂_Milano,

Italy

d_Epilepsy_Center,_San_Paolo_Hospital,_Via_A._Di_Rudini_8,₂₀₁₄₂_Milan,_Italy

e_Department_of_Experimental_{Neurophysiology}_and_{Epileptology,}_Istituto_Nazionale_Neurologico_‘C._Besta’,_Via_Celoria_11,₂₀₁₃₃_Milano,_Italy

f_German_Research_Center_for_Artiﬁcial_Intelligence_(DFKI),_Robotics_Innovation_Center,_{Robert-Hooke-Str.}_5,_D-28359_Bremen,_Germany

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received4September2013

Receivedinrevisedform24February2014

Accepted7March2014

Keywords:

Ontology-basedclassiﬁcation

Genetics-basedclassiﬁcation

Datamining(knowledgediscovery)from

medicaldata

Epileptogeniczoneidentiﬁcation

a

b

s

t

r

a

c

t

Objectives:Inthepresurgicalanalysisfordrug-resistantfocalepilepsies,thedeﬁnitionofthe epilepto-geniczone,whichisthecorticalareawhereictaldischargesoriginate,isusuallycarriedoutbyusing clinical,electrophysiologicalandneuroimagingdataanalysis.Clinicalevaluationisbasedonthevisual detectionofsymptomsduringepilepticseizures.Thisworkaimsatdevelopingafullyautomaticclassiﬁer ofepileptictypesandtheirlocalizationusingictalsymptomsandmachinelearningmethods.

Methods:Wepresent theresultsachievedbyusingtwomachinelearningmethods.Theﬁrstis an ontology-basedclassiﬁcationthatcandirectlyincorporatehumanknowledge,whilethesecondisa genetics-baseddataminingalgorithmthatlearnsorextractsthedomainknowledgefrommedicaldata inimplicitform.

Results:Thedevelopedmethodsaretestedonaclinicaldatasetof129patients.Theperformanceofthe methodsismeasuredagainsttheperformanceofsevenclinicians,whoselevelofexpertiseishigh/very high,inclassifyingtwoepilepsytypes:temporallobeepilepsyandextra-temporallobeepilepsy.When comparingtheperformanceofthealgorithmswiththatofasingleclinician,whoisoneoftheseven cli-nicians,thealgorithmsshowaslightlybetterperformancethantheclinicianonthreetestsetsgenerated randomlyfrom99patientsoutofthe129patients.Theaccuracyobtainedforthetwomethodsandthe clinicianisasfollows:ﬁrsttestset65.6%and75%forthemethodsand56.3%fortheclinician,secondtest set66.7%and76.2%forthemethodsand61.9%fortheclinician,andthirdtestset77.8%forthemethods andtheclinician.Whencomparedwiththeperformanceofthewholepopulationofcliniciansontherest 30patientsoutofthe129patients,wherethepatientswereselectedbythecliniciansthemselves,the meanaccuracyofthemethods(60%)isslightlyworsethanthemeanaccuracyoftheclinicians(61.6%). Resultsshowthatthemethodsperformatthelevelofexperiencedclinicians,whenboththemethods andthecliniciansusethesameinformation.

Conclusion:Ourresultsdemonstratethatthedevelopedmethodsformimportantingredientsforrealizing afullyautomaticclassificationofepilepsytypesandcancontributetothedefinitionofsignsthataremost importantfortheclassification.

∗ Correspondingauthor.Tel.:+4942125752299.

∗∗ Correspondingauthor.

E-mailaddresses:kassahun@informatik.uni-bremen.de(Y.Kassahun),

roberta.perrone@polimi.it(R.Perrone).

1. Introduction

Fordrugresistantfocalepilepticpatients,brainsurgerycanbe

theonlyoption tocurethepatient.Theepileptogeniczone(EZ)

is the corticalarea where ictal dischargesoriginate and which

must be surgically resected or disconnectedto achieve seizure

freedom[1].ThedeﬁnitionoftheEZisusuallyobtainedthrough

http://dx.doi.org/10.1016/j.artmed.2014.03.001

(2)

non-invasivepre-surgicalinvestigations,inmostcasesbyclinical,

electrophysiologicaland neuroimagingdataanalysis[2].Clinical

evaluation is based on the anamnestic data and onthe visual

analysisofsymptomsduringepilepticseizuresinlongtermscalp

video-electroencephalograms(video-EEG).Intra-cerebralinvasive

monitoring(stereo-EEG) maybeneededforverydifﬁcultcases.

Clinicalmanifestationscanbejustsubjective[3](epilepticauras

withoutimpairmentofconsciousness)ormayprogressto

objec-tivesignsthatcanbeobservedandanalyzed,oftenassociatedwith

impairmentofawareness.Thechronologicalsequenceof

symp-tomsandtheirrelativedurationmightindicatetheseizureonset

areaandtheprogressiveinvolvementofadjacentbrainstructures

andis,therefore,crucialforthedeﬁnitionoftheEZ[4],though

iden-ticalsymptomsmayarisefromdifferentcorticalregions,leadingto

misleadinglocalizationoftheEZ[5,6].Themostsigniﬁcant

symp-tomsappearintheﬁrst60s,evenifthepost-ictalconfusioncanbe

prolonged,mainlyifassociatedwithspeechdisturbances,whichis

thecaseforEZlocatedinthedominanthemisphere.

Temporallobeepilepsies(TLE),inwhichseizuresoriginatefrom

thetemporallobe,representthemajorityoffocalepilepsies(over

60%).ThesurgicaltreatmentofTLEhasthebestresultswithalmost

70%ofthepatientsreportedtobeseizure-freeaftersurgery[7].The

duration,thetimecourse,thelossofconsciousnessandthe

post-ictalphenomenaaredifferentinTLEandextra-temporalseizures

(ExTE).Seizuresarisingfromthetemporallobehavearelatively

gradualevolution (comparedtoExTE)[8].In TLEthesubjective

phenomenaaremorefrequent,thelossofconsciousnessismore

gradual,seizuresarelongerandthepost-ictalconfusionismore

important.TheidealsequenceofictalmodiﬁcationsinTLEisa

sub-jectivemanifestation(epigastricfeelingoftenraisingup,auditory,

visualandpsychicsymptoms,arequiteexclusivelylocatedinthe

temporallobe)followedbyalossofconsciousness,oro-alimentary

automatisms,homolateralheadorientationandacontralateralarm

dystonicposturingwithhomolateralgesturalautomatisms.Onthe

contrary,seizuresin ExTE are frequentlyabrupt, brief,without

subjectivemanifestations,thelossofconsciousnessisprecocious,

nooro-alimentaryautomatismsarepresentandmotor

modiﬁca-tioncanbebilateralandmoretonic.However,notalltheseizures

areeasilyrecognizableandparticularlythehypermotorgestural

automatismscanoccurinbothTLEandExTE[5].The

state-of-the-artisthevisualanalysisoftheclinicaldataperformedbyexpert

cliniciansalwaysinassociationwithanatomicalmedicalimages

andEEGdata.Inliterature,thereisnotyetanyautomatic

classiﬁ-cationofepilepsytypesbasedonclinicalsymptomsonly.

Itwouldbeextremelyimportanttoprovideclinicianswithan

automaticclassiﬁerof epilepsytypes,whichcouldhelpto

vali-datetheopinionofthecliniciansincaseofagreementbetween

themethodsandtheclinicians,orwould,onthecontrary,forcethe

clinicianstorevisetheirdecisionorlookinmoredetailatthecase.

Inthispaperweanalyzetwomethodsforrealizinganautomatic

classiﬁerofepilepsytypes,startingfrommedicaldata,whichis

usu-allyunstructuredinformation.Theﬁrstmethodisabletolearnthe

humanknowledgeonhowtoperformdiagnosis,basedonclinical

symptomsandthesecondautomaticallyretrievesknowledgefrom

medicaldata,searchingforexistingpatternsinthedata(e.g.

clin-icalsymptoms).Fortheﬁrstmethodweusedanontology-based

classiﬁcationapproach[9],whileforthesecondmethodweused

genetics-basedmachinelearning(datamining)[10].Inadditionto

dealingwithunstructureddata,bothmethodscanalsodealwith

heterogeneousandincompletedata.Themotivationbehind

choos-ingthetwomethodsistheviewthatlearningbynomeansentails

atabularasaview.Itinvolvestheincorporationofpriorknowledge

thatcanaccelerateorotherwiseimprovethelearningprocess.A

combinationoftheabovemethods willallowus toincorporate

domainknowledgeandatthesametimediscovernovelmedical

knowledge.Thetwomethodsweretestedonaclinicaldatasetof

patientsandcomparedwiththeclassiﬁcationscarriedoutbyexpert

clinicians.

Thepaperisorganizedasfollows:ﬁrstwereviewrelevantwork

intheﬁeldofontology-basedclassiﬁcationandgenetics-baseddata

miningin thecontext of learning from medical data. Thenwe

presentthealgorithmsusedinthepaper,followedbythe

presen-tationoftheactualexperimentswithpatientdataandtheresults

obtained.Finally,wegiveaconclusionandanoutlookbasedonthe

workpresentedinthispaper.

2. Reviewofrelatedwork

In this paper two differentalgorithms are presented, which

exploitdifferentmethods:theontology-basedclassiﬁcation(OBC)

and the genetics-based data mining (GDM). We give a review

ofrelated workintheareaof ontology-basedclassiﬁcationand

genetics-baseddatamining.

2.1. Ontology-basedclassiﬁcation

Ontological modelling provides a description of a speciﬁc

knowledgedomainand itis madeofclasses, properties(which

expressrelationshipsbetweenclasses)andinstances(whichare

the“groundlevel”componentsofanontologyandpopulateclasses)

[11].Inmedicalﬁeld,ontologiescanthereforebeusedforsharing

medicalknowledgeinareliableformatsothatitis

understand-ableandcanbeprocessedbybothhumansandmachines.Medical

ontologieslikeOpenGALEN[12],SNOMED-CT[13]andUMLS[14]

areusedinthehealthcarepractice.TheOpenGALENontologywas

usedinurologytodevelopadecisionsupportsystemfortreating

patientswithurinary-tractinfections[15].SNOMED-CTontology

provideshealthcareterminologywithcomprehensivecoverageof

diseases,clinicalﬁndings,etiologiesandtherapiesintheparticular

ﬁeldofanesthesiology.Itisusedinelectronichealthcarerecords

(EHR)and isrelated tocritical patient-speciﬁcinformation,like

drugallergies.SNOMED-CTmakesthemedicalknowledgemore

usableandaccessible[16].EPILONT1_is_an_ontology_about_epilepsy

domainandepilepticseizureswhiletheEpilepsyandSeizure

Ontol-ogy(EpSO)2_gives_a_{classiﬁcation}_of_epilepsy_syndromes,_location,

etiologyandrelatedmedicalconditionsaccordingtothe

Interna-tionalLeagueAgainstEpilepsy(ILAE)organizationstandards.

Ontologiesmadeofhierarchiesandpropertiesbetweenclasses

canbeusefulfordataaggregationandclustering.Suchontologies

providedomainknowledgeandsupporttheinterpretationof

rela-tionsidentiﬁedindatasetthroughdataminingprocesses,basedon

linguisticorstatisticaltechniques[17].Asanexample,the

founda-tionalmodelofanatomy(FMA)[18]ontologyisusedasasourceof

anatomicalknowledgeforpredictingtheconsequencesofinjury.

Inthisapplication,knowledgeaboutspatialrelationsbetweenthe

injuryandvitalorgansisprovidedbytheFMA[19].

Ontologicalmodelingwasusedalsoinotherdomains,like

cus-tomerknowledgeassessment[20],whereontologiesareusedto

describe customers population and the ontology graph,

repre-sentedwitharcsandnodes,isusedforclassifyingthemaccording

todifferentcriteria.In [21],theauthorsdesignedsurgical

mod-elsof neurosurgery makinguseof ontology and described 106

surgicalcases. Through classiﬁcation trees and clustering

algo-rithms,theyextractedsurgicalknowledge,facilitatingthesurgical

decision-makingprocessandsurgicalplanning.In[22],Leeetal.

triedtoovercomethelimitofclassicalontologydealingwith

uncer-tainknowledgeandclassiﬁeddifferentdiabetessyndromesusing

ontology-based inference rules. Highly expressive rules-based

1_{http://bioportal.bioontology.org/ontologies/EPILONT}_. 2_{http://prism.case.edu/prism/index.php/}_.

(3)

languages,likeSWRL[23],failinrepresentingfuzzyinformation

[24],asincaseofsymptomsoccurrence.Forthisreason,weexploit

ontologiestoextractnewprobabilisticinformationconsideringthe

depthofconceptswithrespecttotherootconceptandthedistance

betweensymptoms.

2.2. Genetics-baseddatamining

Datamining(DM)isamachinelearningprocedure,whichcan

beusedtoautomaticallyﬁndusefulpatternsfromadataset[25].

Theprocedurehasbeensuccessfullyappliedinvariousﬁeldsand

ismakingitswayintohealthcaresystems.Dataminingalgorithms

canlearnfromexamplesandmodelthenon-linearrelationships

betweenvariables.Themostcommonlyuseddatamining

algo-rithms for health care systems include naive Bayes classiﬁers,

neural networks, support vector machines, logistic regression,

fuzzyrulesanddecisiontrees.

ThereareseveralDMtechniquesdevelopedfordiagnosing

dis-eases.Forexample,Soni etal.[26],andDangareand Apte[27]

presenteddataminingtechniquesforheartdiseasediagnosis,and

Ganesanetal.[28]presentedtheuseofartiﬁcialneuralnetworks

forcancerdiagnosis.

DMtechniquesarealsodevelopedforprognosingdiseases.In

[29], a Bayesianexpertsystemfor clinically detectingcoronary

arterydiseaseisgiven.In[30],DMtechniquesareusedfor

pre-dictingheartattacks.In[31],artiﬁcialneuralnetworkshavebeen

appliedforprognosingendstagekidneydisease.TheworkofFloyd

[32]presentstheapplicationofDMtechniquesforprognosisofthe

pancreaticcancer.

Moreover,someauthorscomparedtheperformance of

algo-rithms for the diagnosis or prognoses purposes. In [33], the

discriminatorypowerofk-nearestneighbors,logisticregression,

artiﬁcial neural networks, decision trees, and support vector

machines on classifying pigmented skin lesions for diagnosis

purposeis analyzed.In[34],asummary ofcomparisonof

well-performingDMalgorithms usedfor bothdisease diagnosisand

disease prognosis is given. Prasad et al. [35] compared

auto-associativememoryneuralnetworks,Bayesiannetworks,iterative

dichotomized 3 (ID3) and C4.5in the diagnosisof thedisease

asthma.

DMtechniquesarefurthermoreappliedtoanalyzevarious

sig-nalsandtheirrelationshipswithparticulardiseasesorsymptoms.

Anexampleistheapplicationofsupportvectormachinesin

auto-maticseizuredetectioninEEG,whichcanbeusedindiagnosing

neurologicaldisordersrelatedtoepilepsy[36].

Inadditiontodiseasediagnosesand/orprognoses, DM

tech-niquesareusedtoextractdiagnosticrulesfrommedicaldataset

[37,38].Therulesgeneratedaresimilartothosecreatedmanually

inexpertsystemsandthereforecanbeeasilyvalidatedbydomain

experts.ForadetailedreviewofDMtechniquesandchallengesin

miningmedicaldata,pleasereferto[39–42].

Thegenetics-baseddataminingmethodpresentedinthispaper

isrelatedtotheworkintheareaofdataminingfordiscovering

temporalhiddenknowledgefrommedicaldataset.In[43],learning

aBayesiannetworkandrulesisusedtodiscovermedical

knowl-edge.Agrammar-basedgeneticprogrammingisusedasasearch

algorithm.Themethodisappliedinthedomainsoflimbfracture

and scoliosis. Thework by Tan et al.[44] utilizesgenetic

pro-grammingandgeneticalgorithmstoevolveclassiﬁcationrulesfor

hepatitisandbreastcancerdatasets.NuovoandCatania[45]

inte-gratedfuzzylogicand geneticalgorithmsinordertodiscovera

fuzzyrulebaseddiagnosticsystemfrommedicaldatasetsforbreast

cancerandPima-IndiandiabetesdatafoundintheUCIrepository

ofmachine learning databases[46], andmedical dataon

apha-siafoundinAachenaphasiadatabase[47].Dametal.presented

a neural-basedlearning classiﬁer system[48] that incorporates

neuralnetworksintoalearningclassiﬁersystems[49].Themethod

wasappliedonmedicaldatasetssuchasthediabetes,breast

can-cer,heartdiseaseandlymphographyfoundintheUCIrepository

ofmachinelearningdatabases.In[50]decisiontreesareevolved

usingevolutionaryalgorithms.

3. Methods

In thissectionwe giveanoverviewofthemethods thatwe

developedandusedtodeterminethetypeofepilepsyofpatients

basedonclinicaldatadescribedinSection4.1.

3.1. Ontology-basedclassiﬁcation

Theontology-basedclassiﬁcationmethodfocusesonepileptic

symptomscorrelation.Theontologicalmodelingdescribesthe

rela-tionshipsbetweenclassesandclassesandindividuals.

Inthe“patientseizureontology”,ictalsymptomsareontological

instances(i.e.thegroundlevelobjects)ofclass“ictal-symptom”

(suchas“visualillusions”,“fear”,etc.happeningduringseizures).

Therelationshipbetweenclassesisexpressedbypredicates(such

as“subclassOf”,“hasManifestation”,etc.).Inthedomain

ontol-ogy,therearesomegeneralclasses(i.e.collectionsofobjects)like

the“ictal-symptom”classwhichgroupsallthepossible

symp-tomsthatcanbeobservedduringanepilepticseizure.Fig.1shows

theseizureontologymodel.

The “patient” class groups all the patients of the dataset

and itis linkedtothe“seizure”generalclass bytheproperty

“hasSeizure”. Every epileptic seizure is described by a set of

manifestations, deﬁned in the “manifestation” and

“noMani-festation” classes, which refer toobserved and not observed

symptomsduringseizurebythecliniciansandthesetwoclasses

are linkedtothe“seizure” class by“hasManifestation” and

“hasNoManifestation”relationshipsrespectively.Inadditionto

this,observedmanifestationsarecharacterizedbytheiroccurrence

time.“early”and“late”classesaresubclassesofthe

“manifes-tation”class.Symptomsmanifestedintheﬁrst10sbelongtothe

“early”class,whilelater symptomsbelongtothe“late”class.

“manifestation”and“noManifestation”classesaredeﬁnedas

“unionOf”objectsoftheclass“ictal-symptom”.Asin[51],inorder

totranslatetherelationshipbetweenpairsofsymptoms,

consider-ingtheirproperties,wecomputedthedistancesintermsofgraph

arcs.

TheCkmatrix,whichexpressesthecorrelationbetweeneach

pairof symptoms(i.e.thelikelihood that ifduringan epileptic

seizuresymptomiispresent,symptomjispresentaswell)fora

speciﬁcpatientk(k∈{1,...,M},whereMisthenumberofpatients)

iscomputed.

Eachcorrelationmatrixelementck(i,j)betweenpairsof

symp-tomsiscalculatedusing

ck(i,j)=

(d(si)+d(sj))˛+ˇ

((D(si,sj)∗˛)+2max(d(si),d(sj)))+ˇ

(1)

i,j∈{1,...N},whereNisthetotalnumberofsymptoms,siandsj

aretheconsideredsymptoms(e.g.“headdeviation”and“visual

illu-sion”),d(si)andd(sj),arethedistancesofsiandsjfromtheseizure

classintheontologytreeasshowninFig.2.D(si,sj)represents

thelengthoftheshortestpathbetweensiandsjintheontology

tree(numberofarcsbetweensiandsj)asshowninFig.3aandb.

max(d(si),d(sj))representsthelongestdistancefromthe“seizure”

classinthedomainontologytreeofthetwosymptomssiandsj.The

variables˛andˇrangefrom0to1.˛isusedtobalancethe

pro-portionbetweendandDvalues,ˇmakesthedenominatornotto

(4)

Fig.1. Ontologymodelofapatientseizurewithearlyandlatesymptomsmanifestations.“eyesdeviation”isobservedintheﬁrst10softheregistration,while“head

deviation”and“visualillusion”appearaftertheﬁrst10s.

Fig.2.Thedistanceofsymptomsifromthe“seizure”classintheontologytreeiscalculatedasthenumberofarcsinbetweenanditisequalto2.

Fig.3.Thedistancebetweentwosymptoms(D)iscalculatedasthenumberofarcsbetweenthem:e.g.thedistancebetween“headdeviation”and“visualillusion”is2arcs

(5)

Fig.4.InteractionbetweentheGAandtheMLtools.IntheﬁgureMindicatesthepopulationsizeofthegeneticalgorithm.

Fig.5.AchromosomeencodingasolutionproposedbytheGA.ThegeneCjencodesthejthclassiﬁertotrainanduse.TheﬁrstNgenesfollowingthegeneencodingthe

classiﬁerrepresentthetimeintervalstoconsider.Forsymptomsithetimeintervalisgivenbytesi−t

b

si.ThegeneFisaﬂagencodingwhetherafeatureselectionwillbe

employed.Ifitsvalueisone,thenfeatureselectionwillbeemployed,andifitsvalueiszero,nofeatureselectionwillbeused.ThelastgeneFsmencodesthefeatureselection

methodtoapply.

EachelementmcTLE(i,j)andmcExTE(i,j)ofthemeanofthe

matri-cesthatexpressthecorrelationbetweensymptomsforeachgroup

(TLEandExTEpatients,MCTLEandMCExTEmatrices),iscomputedas

mc(i,j)TLE= 1 W W

w=1 Cw(i,j) (2) mc(i,j)ExTE= 1 V V

v=1 Cv(i,j), (3)

whereW+V=M,WandVarethenumberofpatientsofthedataset

withtemporallobeepilepsiesandextratemporallobeepilepsies

respectively.

Inordertoclassifytheseizuretypes,eachseizureisassigned

tothegroupwhosedistancefromMCisminimum,wherethe

dis-tances,DTLEandDExTE,arecomputedby

DTLE=

N

i=1

⎛

⎝

N j=1

(ch(i,j)−mc(i,j)TLE)2

⎞

⎠

(4) DExTE=

N

i=1

⎛

⎝

N j=1

(ch(i,j)−mc(i,j)ExTE)2

⎞

⎠

(5)

h∈{1,...H},whereHisthetotalnumberofpatientstobetested.

Weoptimized˛andˇusingthetenfoldcross-validationonthe

dataset[52]bycontinuouslychangingtheirvalues.Varyingthe

val-uesof˛andˇ,doesnotaffecttheclassiﬁcationpower,evenifthe

correlationbetweensymptomsischanged.Wethereforesetboth

˛andˇto0.5,asthesumofthevariableshastobeone[51].

Theepilepticseizuresontologyis builtinontologyweb

lan-guage(OWL),thepopularontologicallanguage,usingProtégé3.5,

anopen-sourcetool,forcreating,editingandvisualizingontologies

[53].Wefollowedtheproposedguideline[54],collectingspeciﬁc

domainknowledgefromexperts,textbooksandpapersintegrating

existingontologiesanddeﬁningclassesandclasshierarchy.

3.2. Genetics-baseddatamining

TheGDM algorithm is a combinationof a geneticalgorithm

(GA),Pyevolve[55],andmachinelearning(ML)toolsforsupervised

learningWEKA[56]andOrange[57],whichimplementdifferent

classiﬁers(e.g.decisiontrees,multilayerperceptrons,support

vec-tormachines,etc.).TheﬂowchartofGDMisshowninFig.4.

Theparametersthatareoptimizedbythegeneticalgorithmare:

1theclassiﬁer,Cj,totrain,

2theinstantoftimeinwhichasymptomsistartstsb_iandtheinstant

oftimewhenitendste

s_i,

3thefeatureselectionmethod,Fsm,thatisusedtodetermine

symp-tomswhichplayanimportantroleinclassifyingpatientsintoTLE

andExTEgroups.

TheGAstartsbygeneratingrandomsolutions(i.e.the

chromo-somess1,s2,...,sk,...,sM,whereMisthenumberofsolutions).

Fig.5showsachromosomeencodingpropertiesofasolution.Each

solution,sk,willbedecodedintotheclassiﬁertotrainCj,the

fea-tureselectionﬂag,F,thefeatureselectionmethodtoapply,Fsm,

andthecorrespondingtrainingdata.Aftertraining,themachine

learningtoolsgivebacktheconfusionmatrix[58]thatresultsafter

thetenfoldcross-validation[52]isperformed.Fromtheconfusion

matrix,theﬁtnessoftheindividual(qualityofasolution)is

calcu-latedusingtheharmonicmeanaccuracy(HMACC),whichisgiven

by

HMACC(sk)=

2

(1/(1−FP(sk)))+(1/(1−FN(sk))),

(6)

Table1

GAparameters.

Populationsize 50

Mutationprobability 0.05

Crossoverprobability 0.8

Maximumnumberofgenerations 100

Selectionmethod Rankbasedselection[59]

whereFP(sk)standsforthefalsepositiverateandFN(sk)standsfor

thefalsenegativeratecorrespondingtoasolutionsk.Thevalues

fortheHMACCliebetweenzeroandone.AhighervalueofHMACC

meansabettersolution.IfasolutionclassiﬁescorrectlyalltheTLE

patientsbutmisclassiﬁestheExTEpatients,itwillresultinalow

HMACCvalue.

Theoptimizationprocessof thegeneticalgorithm isrun for

somegenerations and stopped if the maximum HMACC sofar

obtaineddoesnot improveover the last 20 generations orthe

maximumnumberofgenerationsisovercome.Table1showsthe

parameters of the geneticalgorithm for resultsobtained using

genetics-baseddataminingalgorithm.Theparameterswereset

manuallyandkeptthesameforallexperimentsreportedinthis

paper.

4. Experimentsandresults

Toevaluatemachinelearningmethods,onehastotakecareof

afewgeneralproblems.Forcategorizingtheepilepsytypesinto

temporal(TLE)andextra-temporal(ExTE),themethodspresented

inthispaperaretrainedonatrainingdatasetandtestedonatest

dataset.Thesetwodatasetswereselectedtobedisjointbecausethe

algorithmsareoptimizedonthetrainingdatasetforinstanceusing

crossvalidation.Sincethetestdatasetisnotincludedinthetraining

datasetatall,itiscompletelyunknowntothemethodswhenit

ispresentedtothemethodstotesttheirperformance.Hencethe

responsesofthemethodsgiveanideaonhowthemethodswould

performonanewdatadataset(newpatientsinourcase).

Inthissectionwedescribeﬁrsttheclinicaldataofpatientswe

usedforanalyzingtheperformanceofthemethodsinSection4.1

andthentheresultsobtainedwithitinSections4.2,4.3and4.4.In

contrasttothestatisticalanalysisgiveninSection4.4,inSections

4.2and4.3wedealwiththecomparisonbetweenthemethodsand

clinicians.Forcliniciansre-samplingoftestdatasetsisnot

possi-blebecausetheclinicianswouldremembertheexamplesinthe

datasets.Therefore,different“trials”canonlybedoneby

differ-entclinicians.Intheﬁrsttest(Section4.2),weprovidedaclinician,

whoselevelofexpertiseisveryhigh,withthreetestdatasetsof

patients,andweaskedhertoclassifyeachpatient.Inthesecond

test(Section4.3)weaskedsevendifferentclinicians,whose

exper-tiseishigh/veryhigh,toclassify30patientsthatdidnotbelongto

anytraining,testingorvalidationdatasetpreviouslyused.

4.1. Descriptionofdatasetused

Theclinicaldatasetofpatientsthatweusedinourworkwas

obtainedfromthree clinicalepileptological centers:CarloBesta

NeurologicalInstitute,NiguardaCaGrandaHospitalandSanPaolo

Hospital,allinMilan,Italy.Allpatientsusedintheexperiments

reportedinthispaperaredrug-resistantepilepticpatients,whodo

notpresentanyotherpathology(e.g.malignanttumors),andthey

wereoperatedonandseizure-freeforatleastoneyear.Thisfact

isusedasaground-truthforthegenerationofthelabelsofthe

patients,whichareTLEandExTE.Allofthepatientshavesigned

aninformedconsenttothetreatmentoftheiranonymizeddata.

Thedatasetfor theexperimentsconsistsof99 labeledpatients,

ofwhich 60sufferfromTLEand 39fromExTE.Furthermore,an

additionaltestsetof30patients,ofwhich10sufferfromTLEand

Table2

Patientsdataset’soccurrencetable.Therowsareforsubjectiveandobjective

symp-toms,thecolumnsareforthetwogroupsofpatients(TLEandExTE).Theﬁrst14

symptomsaresubjective,thelast15symptomsareobjective.

No. Symptom TLE ExTE

1 Epigastric 20 1

2 Auditoryillusion 0 0

3 Auditoryhallucinations 0 2

4 Visualillusionsimple 1 0

5 Visualillusioncomplex 0 1

6 Visualhallucinationsimple 0 1

7 Visualhallucinationscomplex 0 1

8 Olfactivehallucinations 1 0

9 Gustatoryhallucinations 1 0

10 Psychicmanifestation 10 2

11 Fear 2 3

12 Autonomic(tachicardia,dyspnoea,etc.) 11 8 13 Motor-sensitivemanifestationmonolateral 1 2 14 Motor-sensitivemanifestationbilateral 1 0

15 Oroalimentaryautomatisms 36 10

16 Gesturalautomatismsmonolateral 26 14

17 Gesturalautomatismsbilateral 16 12

18 Headdeviation 35 38

19 Eyesdeviation 32 29

20 Motorclonicmonolateralarm 5 6

21 Motorclonicbilateralarms 1 2

22 Motorclonicmonolateralleg 1 3

23 Motorclonicbilaterallegs 1 4

24 Motorclonicmonolateralmouth 3 5

25 Motordystonicmonolateralarm 27 23

26 Motordystonicbilateralarms 14 10

27 Motordystonicmonolateralleg 2 5

28 Motordystonicbilaterallegs 6 7

29 Autonomic(ﬂushing,shialorrhea,vomiting) 11 3

20fromExTE,isusedfortheexperimentdescribedinSection4.3.

Thedatasetisgeneratedasfollows.Onceclinicianshavea

collec-tionofvideorecordingsofepilepticseizures,theycreateatablefor

eachpatientinwhich29symptomsarepresentandwhichgives

thedurationinsecondsoftheseizures.Theﬁrst14 rowsofthe

tablecorrespondtothesubjectivesymptomsandthenext15to

theobjectivesymptoms.Foreachpatient,cliniciansﬁllinthetable,

assigningtothesymptomsobserveda‘1’,anda‘0’tomanifestation

notobserved(Fig.6).Thepatientfromwhichthedataisshownin

Fig.6hasthreedifferentsymptoms(“epigastric”sensation,“visual

illusionsimple”and“motordystonicmonolateralleg”).The

occur-renceofsymptomsinpatientsdatasetusedforthisworkisshown

inTable2.

4.2. Test1.ComparisonofOBCandGDMwithasingleclinician

ThecomparisonofOBCandGDMdiagnosiswithasingle

clin-ician,expertinepilepsytypeclassiﬁcation,isperformedonthree

testdatasetsgeneratedfromthetrainingsetweobtained.The

sin-gleclinician’slevelofexpertiseisveryhighandsheisoneofthe

7clinicianswhoparticipatedinthesecondstudy(seeSection4.3).

Thetestdatasetsaregeneratedbyselectingrandomentriesfrom

thetrainingset.Afterselectingthetestdatasets,therestofthedata

isusedintrainingOBCandGDM.Pleasenotethatwhilethe

mem-bersofthetestdatasetsdonotchangeonceselected,themembers

ofthetrainingandvalidationwillchangeduringcross-validation.

Table3givestheexperimentalsetupusedforTest1.

Table4showstheperformancewithrespecttoaccuracy,that

isthenumberofcorrectlyclassiﬁedpatientsfortheclinician,OBC

andGDMonthetestdatasets.Theaccuracyweusedisgivenbythe

equation

Accuracy=#correctlyclassiﬁedTLEs+#correctlyclassiﬁedExTEs

#TLEs+#ExTEs (7)

wherethesymbol#standsfor“numberof”.Cochrantestwas

(7)

Fig.6.Exampleofpatientsymptoms.Symptomsarelistedintheﬁrstcolumn.A‘1’standsforapresenceofasymptom,anda‘0’fortheabsenceofasymptomataparticular time.

AscanbeseenfromTable4,theperformanceofOBCandGDMis

betterthantheclinicianfordataset0andcomparableperformance

fordataset1anddataset2evenifthereisnostatisticalsigniﬁcance

(p0=0.2592,p1=0.5292,p2=1).Table5showsthereliabilityofthe

agreementsusingtheFleiss’kappavalues[61]amongtheclinician,

OBCandGDM.Fordataset0anddataset2,OBCandGDMhavethe

highestreliabilityofagreements,whilefordataset1GDMhasthe

highestreliabilityagreementwiththeclinician.Fordataset0,GDM

hastheleastreliabilityofagreementwiththeclinician.

4.3. Test2.ComparisonofOBCandGDMwithsevenclinicians

Sevenclinicians,expertsintheﬁeldofepilepsydiagnosis,were

askedtoclassifyepilepsytypesonthebasisofapatient’sseizure

symptoms.Pleasenotethatthelevelofexpertiseofalloftheseven

cliniciansishigh/veryhigh.AccuracywascomputedasinEq.(7)

Table3

Test1.ExperimentalsetupusedforcomparingOBCandGDMclassiﬁcationwitha

singleclinician.

Dataset0 Dataset1 Dataset2

TLE ExTE TLE ExTE TLE ExTE

Training 39 19 39 29 49 32

Validation 4 5 7 3 5 4

Testing 17 15 14 7 6 3

Table4

Test1.Accuracyobtainedfortheclinician,OBCandGDMonthreetestdatasets.

Clinician 18/32 13/21 7/9

OBC 24/32 14/21 7/9

GDM 21/32 16/21 7/9

Table5

Test1.Fleiss’kappavalues.

OBCandclinician 0.13 0.25 0.13

GDMandclinician 0.02 0.57 0.45

OBCandGDM 0.60 0.49 0.59

Clinician,OBCandGDM 0.55 0.36 0.31

Table6

Test2.Experimentalsetup.

TLE ExTE

Training 45 26

Validation 5 3

Testing 10 20

andtheCochrantestwasperformed(p≤0.05).Table6showsthe

experimentalsetupforTest2.

AscanbeseenfromTable7bothOBCandGDMperformatthe

levelofexperts(p=0.4139).Table8showsthecollaborative

deci-sionmadebytheclinicians,thealgorithms,andthealgorithmsand

theclinicians.Amajorityofvotesisusedtodeterminethe

deci-sionofthecliniciansandthealgorithms.Fortheclinicians,ifmore

thanthreecliniciansvotedinfavorofanepilepsytype,thiswillbe

consideredasthedecisionoftheclinicians.ForOBCandGDM,only

caseswherebothvotedinfavorofanepilepsytypeisconsidered

asthedecisionofthealgorithms.Ifbothdisagree,thiscasewillbe

consideredasatiesincethereisnomajorityvotinginfavorofan

epilepsytype.Anotherinterestinganalysisistheagreementlevels

oftheclinicians,thealgorithmsandthealgorithmsandthe

clini-cians.Table9showstheFleiss’kappavaluesfortheclinicians,the

algorithmsandthealgorithmsandtheclinicians.

Fromthetable,onecanseethatthereliabilityofagreements

ofthecliniciansisbetterthanthereliabilityofagreementsofthe

algorithms.Moreover, onecan seethatthereliability of

agree-mentsofthealgorithmswiththecliniciansiscomparablewithGDM

slightlybetterthanOBC.FollowingLandisandKoch[62]thereisa

fairagreementbetweenOBCandGDMandamoderateagreement

amongtheclinicians.Onecanalsoseethatthere isamoderate

agreementamongOBC,GDMandtheclinicians.

Table7

Test2.AccuracyobtainedbyOBC,GDMandsevenclinicianson30testingpatients.

ThesymbolsCl-1,Cl-2,...,Cl-7standforclinician1,clinician2,...andclinician7

respectively.

OBC GDM Cl-1 Cl-2 Cl-3 Cl-4 Cl-5 Cl-6 Cl-7

18/30 18/30 15/30 16/30 19/30 19/30 19/30 20/30 22/30

Table8

AccuracyobtainedbyOBCandGDMandsevenclinicianson30novelpatientsusing majorityvoting.

Clinicians OBCandGDM All

Majorityvoting 20/30 12/30 20/30

Tie – 11/30 –

Table9

Test2.Fleiss’kappavaluesfortheclinicians,OBCandGDM,andOBC,GDMandthe clinicians.

Fleiss’kappa

Clinicians 0.55

OBCandclinicians 0.44

GDMandclinicians 0.48

OBCandGDM 0.27

(8)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

HMACC Boxplot

M D G C B O

HMACC

Fig.7. AboxplotdiagramoftheHMACCof100valuesfrom100repetitionsofthe trainingandtestingofthealgorithms.TheboxplotoftheHMACCisonlyforthetest accuracy.

4.4. Test3.Statisticalanalysisofthemethods

Togetan ideaof theperformance that wecan expectfrom

thealgorithmsweusedstandardmeasuresfromstatistics,suchas

expectedvalue(mean)andthestandarddeviation.Inordertobe

abletocalculatesuchvalueswehavetorepeatthetrainingphase

andthetestingofthealgorithmsindependentlyandmultipletimes.

Sincethedatacomesfromrealpatientsitisnotpossibleto

gener-atealargenumberoftestsamples.Therefore,theonlypossibility

istogeneratenewtrainingandtestdatasetsbyre-samplingthe

availabledata.

Wedecidedtouse20%oftheavailabledataasthetestdataset

andtheremaining80%asthetrainingdataset.Foreachtrialthetest

datasetisdrawnfromthedatabyrandomselection,20%ofthe

tem-poraland20%oftheextra-temporalexamplesandtheremaining

examplesareusedasthetrainingdataset.Selectingthesamples

inthiswayguaranteesthatthedistributionofthetwoclassesis

preservedforthetestandthetrainingdataset.Thealgorithmsare

trainedonthetrainingdatasetandaftertheyareoptimizedfortheir

performancetheyareevaluatedonthetestdataset.Whenthetest

performanceiscalculateditisimportantthatthealgorithmsare

completelyresetsothattheydonothaveanyinformationabout

thepriortrial.Thetrainingandtestingofthealgorithmsisrepeated

100times.

Afterthetestperformanceofthealgorithmsismeasured100

times,themeanvalueandthestandarddeviationofalltrialsare

calculated.ThemeanvalueisnowtheexpectedvalueoftheHMACC

thatanalgorithmwouldmakeonnewpatientdataandthestandard

deviationgivesameasureofhowstablethealgorithmis.Therefore,

ahighexpectationvalueoftheHMACCaswellasasmallstandard

deviationisdesired.

Theresultsobtainedbythealgorithms areshown in a

box-plotdiagraminFig.7.Theplotshowsthemedianaccuracyofthe

algorithms,whichgivesagoodideaoftheexpectedlikelihoodof

correctlyclassifyingnew data.Thebox aroundthe meanvalue

showsthe25thand75thpercentilesoftheaccuracyvalues

dur-ingthe100repetitionsandthewhiskersshowtheminimumand

maximumrangeswhicharenotdetectedasoutliers,whileoutliers

aremarkedascrosses.Thisgivesanideahowsensitivethe

algo-rithmistotheselectionofthetrainingdataset.Inarealapplication,

aftertrainingthealgorithmwithalloftheavailabledata,thisisthe

rangeinwhichonecanexpecttheaccuracytolie.Ingeneral,itis

Table10

Themean()andthestandarddeviation()oftheHMACCforbothmethodsonthe

100testruns.

OBC 0.7125 0.1080

GDM 0.8072 0.1065

expectedthattheexpectationoftheaccuracywillincreaseandthe

standarddeviationwilldecreasewithadditionaltrainingexamples.

Therefore,ifthealgorithmsarecontinuouslytrainedonnewdata,

theirperformancewillincreaseovertime.

InFig.7onecanseethatthemedianaccuracyfortheGDMis

about82%andfortheOBCitis70%.Thevaluesforthemean

accu-raciesandthestandarddeviationsforthetwoalgorithmsaregiven

inTable10.Itisobviousthatbothalgorithmsshowamuchbetter

resultthanrandomguessing,whichwouldbe50%.

4.5. ComputationalefﬁciencyofOBCandGDM

TheaveragetrainingtimeneededforGDMfortrainingon79

patientstakesonaverage3h,whiletheresultingclassiﬁertakes

onaverage0.1sfordeliveringtheoutputoftheclassiﬁcationona

standardpersonalcomputer.Sincegeneticalgorithmscanbeeasily

parallelized,inprincipleitshouldbepossibletoreducethetraining

timeofGDMdrastically.ForOBCthetimethatisneededtotrainthe

classiﬁeron79patientsisonaverage10min,andoncetheclassiﬁer

iscomputedittakes0.3mstoclassifyanovelpatient.Thisshows

thatoncethealgorithmsaretrainedtheycanbeappliedinpractical

settings.

5. Conclusionandoutlook

Clinical evaluation of epileptic patients is based on careful

visualdetectionofsymptomsduringepilepticseizuresinlongterm

Video-EEGrecordings.Weproposedtwomachinelearning

meth-odstohelpthecliniciansduringthediagnosis.Theﬁrstmethod

(OBC)canincorporatetheknowledgeofexpertcliniciansdirectly,

whilethe secondmethod(GDM) canmine ordiscover medical

knowledgeinimplicitformfrommedicaldata.Ourontology-based

classiﬁcationmethod(OBC)isnotbasedoninferencerulessince

itisnotpossibletodeﬁnedeterministicrulescorrelatingthe

pres-enceortheabsenceofaparticularsymptomtoepilepsytypes.Both

methodsconsiderthetimingofoccurrenceofsymptomssinceearly

symptomsare moresigniﬁcantthanlate symptoms(i.e.typical

behavioralpatternsarecloselyrelatedtotheseizureinitialspread

location).Inadditiontothis,thegenetics-baseddatamining

algo-rithmhasafeatureselectionstep,whereafeatureisaparticular

symptom.Throughfeatureselectionitispossibletoidentifywhich

featureplaysanimportantroleintheclassiﬁcationoftheepilepsy

types.Itispossibletoconcludethatthemethodscanbeappliedto

discoverinterestingspatial(symptomsplayinganimportantrole)

andtemporal(sequenceofsymptoms)patternsinmedicaldatafor

aparticulardisease.

Whencomparingtheperformanceofthealgorithmswiththat

ofa singleclinician, both OBCand GDM showa slightly better

performance(evenifnotsigniﬁcant)thantheclinician.When

com-paredwiththeperformanceofthewholepopulationofclinicians

(inourcasesevenclinicians),theaccuracyofthemachine

learn-ingalgorithms(18/30)isslightlyworsethantheaverageaccuracy

of theclinicians (18.5/30) (see Section 4.3).Results of the two

comparisonsshowthattheperformanceofthemachinelearning

algorithmsisapproachingthelevelofexpertsforthecasewhere

boththecliniciansandthemachinelearningmethodsexploitonly

theclinicaldatausedinthispaper(stringsof‘1’sand‘0’sintime).

(9)

thatGDMperformsslightlybetterthanOBC,andbothalgorithms

obviouslyperformmuchbetterthanrandomguessing.The

accu-racyvaluesobtainedbythemethodsareatﬁrstglancenotexpected

bypersonsnormallyworkingwithothermachinelearning

bench-markproblems,sinceinthesebenchmarkproblemstheaccuracy

values obtainedareusually higher thanthe resultsreportedin

thispaper.Therefore,onewouldconsiderthattheresultsarenot

satisfactory.However,when comparingtheperformance ofthe

algorithmswiththatoftheclinicians,itcanbeeasilyseenfrom

theresultsthatthealgorithmsapproachthelevelofexpertiseof

veryexperiencedclinicians,whoparticipatedintheexperiments

reportedinthepaper.

ConsideringtheFleiss’kappaindex,computedvaluesarestrictly

dependentontheconsidereddatasets (theagreement between

OBCandGDMcanbesubstantialorfairincaseofdifferentdataset

considered,evenif thedatasetshavethesame numberof

sub-jects).Kappavaluesinterpretationissomewhatcontroversial,the

referencevalueforouranalysisistheagreementamong allthe

clinicians,whichis0.55.Fromtheresults,itappearsthatthe

devel-opedmethodshavethesameperformanceincomparisontothe

poolofconsideredclinicians,whenallareanalyzingthesamedata.

Aninteresting application of the machine learning methods

developedin thispaperwouldbein theareaoftraining young

cliniciansin theclassiﬁcation of epilepsytypes. It would make

thelearningprocessfortheyoungclinicianseasier.Incaseofless

experiencedclinicians,automaticclassiﬁcationalgorithmscanbe

usedtoteachhowasymptom,ratherthananotherone,couldbe

moreimportantandsigniﬁcantforaspeciﬁcgroupofepilepsy.An

investigationofthistopicwouldbeinterestingforfuturework.In

additiontothis,themachinelearningmethodscouldbeapplied

togenerateanalternativeopinionforclassifyingepilepsytypes.

Thiswouldhelp tovalidatetheopinionof theclinicianin case

ofagreementbetweenthemethodsandtheclinicians,orwould,

incase of divergence,forcetheclinician torevise his/her

deci-sionorlookatthecase inmoredetail. Afurtherapplicationof

themethodsliesinformalizingmedicalknowledgeandbest

prac-tices.Theadvantageofdevelopinganontologyofepilepticseizure

isthesymptomsclassiﬁcationstandard, sothatallclinical

cen-terswouldbeabletorepresentanepilepticseizureinthesame

way.The useof anontologytodeﬁne symptomsof an

epilep-ticseizurewillalsoovercometheproblemofterminologyused

by different clinicalcenters for symptoms description.Thus, it

isimportant tohavea glossaryof thesymptoms: ourontology

willfacethisdiscrepancy(i.e.“owl:sameAs”statementindicates

that twoindividuals actually refer tothesame thing:the

indi-viduals have thesame “identity”,so theyinheritall properties

or “rdfs:seeAlso”statement used to indicatea resource that

mightprovideadditionalinformationaboutthesubjectresource).

Thisis obviously a difﬁcult taskand requires a detailed

analy-sisofthestructureandtheconceptsofmedicalterminologies.It

canbeachievedbyconstructing medicaldomainontologiesfor

representingmedicalterminologysystems.Inthiswork,the

real-izationofanontology(OBC)thatrepresentsthedomainofepilepsy

diagnosiswillhelpinformalizingthesymptomsobservable

dur-inganepilepticseizureandtocorrectlyclassifythetwotypesof

epilepsy.Moreover,afurthersubclassiﬁcationshouldbeobtained

inExTEconcerningthedifferentlobes(parietal,insular,occipital

andfrontal).

An important bottleneck that always arises when machine

learningistransferredtopracticalimplementationisthe

knowl-edgetransferfromtheexperttotheagent.Inthiscase,clinicians

areabletoclassifypatientsobservingtheseizureusingtheir

sensi-tivityacquiredbyexperience,whichisdifﬁculttotranslateintothe

artiﬁcialintelligencealgorithm.Ithastobenoticedthatinstandard

clinicalroutine,cliniciansbasetheirevaluationalsoon

anamnes-ticandneuroimagingdata.Inordertoimprovetheperformance

ofautomaticclassiﬁcationalgorithms,moreinformationshouldbe

encodedandsubsequentlyprovidedtotheclassiﬁers.

In healthcare,data isfrequentlynot idealorincomplete.In

ourcase, a problemcanarise ifcliniciansmake errors in

com-pilingthetableofsymptomsoccurrenceduringepilepsyseizure.

Thesemistakescanbecorrectedbecausewehaveasecond

reli-ablekindofdatathatisthevideorecordingsofepilepsyseizures.

Also,healthcarerecordsareunstructuredinformation.Our

onto-logicalapproachisalsosuitedforsuchkindofdata,sinceontologies

canbestructuredtoautomaticallyclassifyunstructuredtext,once

correspondencesaredeﬁned.Futureworkcouldaddnewclinical

informationofapatient,likeanewsymptom,“staring”(a

motion-lessstatewiththeinterruptionofallthepreviousactivities),when

patientsstareatseizureonset,andwhichismorefrequent

dur-ingtemporallobeepilepsies.Anadditionalvariablecouldimprove

theclassiﬁer, even moreif thatvariable had higheroccurrence

inasetofpopulationpatients. Theelectroencephalogram(EEG)

detectselectricalactivityinthebrainanditisafundamentaltool

fortheanalysisinthediagnosisofepilepsybecauseelectrical

alter-ations,oftenveryindicative,canalsobepresent intheabsence

ofothersymptoms.Otherdiagnostictestsincludemagnetic

res-onanceimagingandlaboratorytests,whichcanverifyorexclude

speciﬁccauses.Inthissense,withthehelpoftheproposed

algo-rithms,differenttypesofinformationcouldbeuniﬁedandthen

processedtoobtainamoreaccuratediagnosis.Sinceitisknown

thatcliniciansusedifferentsourcesofinformationotherthanthe

codedclinicaldatausedinthispaperfordecision,itremainstobe

seenhowthealgorithmsimprovetheirperformanceifadditional

informationisavailable.Moreover,thereliabilityofthemethods

tocorrectlyclassifyco-morbidpatientscouldbeevaluatedinorder

toseeifthepresentedmethodsareabletolocalizetheepileptic

focusalthoughsomesymptomsmaybeproducedbysecondary

pathology.

Thepaperisafeasibilitystudyontheapplicationofmachine

learningtechniquesfortheautomaticlocalizationofthe

epilepto-geniczone.Althoughtheepilepsytypesanalyzedarecharacterized

by distinctive signs during the seizure manifestations, patients

diagnosisbasedonthevisualinspectionofsymptomsduringthe

ﬁrst60scanbesubjective,asshownbycliniciandisagreementson

the30patientsanalyzed.

TheOBCandGDMmethodspresentedcanbeappliedin

min-ingmedicalknowledgeasshowninthepaper,orevenbeusedto

discoveranovelideaintheareaofmedicaldiagnosisofa

partic-ulardisease.Itisworthunderliningthatthismakestheautomatic

classiﬁcationsystemscreativeandinteractive,givingbackthe

dis-covered knowledge totheclinicians.Thisin turn improvesthe

existingknowledgeofclassifyingepilepsytypes.

Acknowledgements

Thiswork is supportedin partwithintheACTIVE European

project (FP7-ICT-2009-6-270460) and the EuRoSurge European

project(FP7-ICT-2011-7-288233).Wewouldliketothankthetwo

anonymousreviewersfortheirconstructivesuggestionsand

com-ments.

References

[1]Kahane P,Landré E, Minotti L, Francione S,Ryvlin P. The Bancaudand Talairachviewontheepileptogeniczone:aworkinghypothesis.Epileptic Dis-ord2006;8:16–26.

[2]MiserocchiA,CascardoB,PiroddiC,FuschilloD,CardinaleF,NobiliL,etal. Surgeryfortemporallobeepilepsyinchildren:relevanceofpresurgical eval-uationandanalysisofoutcome.JNeurosurgPediatr2013;11(3):1–12[Clinical article].

[3]LüdersH,AcharyaJ,BaumgartnerC,BenbadisS,BleaselA,BurgessR,etal. Semiologicalseizureclassiﬁcation.Epilepsia1998;39(9):1006–13.

(10)

[4]Nobili L,Francione S,Mai R, Cardinale F, CastanaL, TassiL, etal. Sur-gical treatment of drug-resistant nocturnal frontal lobe epilepsy. Brain 2007;130(2):561–73.

[5]TassiL,MeroniA,DeleoF,VillaniF,MaiR,RussoGL,etal.Temporallobe epilepsy:neuropathologicalandclinicalcorrelationsin243surgicallytreated patients.EpilepticDisord2009;11(4):281–92.

[6]MaiR,SartoriI,FrancioneS,TassiL,CastanaL,CardinaleF,etal.Sleep-related hyperkineticseizures:alwaysafrontalonset?NeurolSci2005;26(3):220–4.

[7]JacksonGD,BriellmannRS,KuznieckyRI.Temporallobeepilepsy.In:Kuzniecky RI,JacksonGD,editors.Magneticresonanceinepilepsy.2nded.Burlington: AcademicPress;2005.p.99–176[chapter4].

[8]GuptaA,JeavonsP,HughesR,CovanisA.Auraintemporallobeepilepsy: clin-icalandelectroencephalographiccorrelation.JNeurolNeurosurgPsychiatry 1983;46(12):1079–83.

[9]BurgerS,StiegerB.Ontology-basedclassiﬁcationofunstructured informa-tion.In:Theﬁfthinternationalconferenceondigitalinformationmanagement (ICDIM).IEEE;2010.p.254–9.

[10]KovacsT.Genetics-basedmachinelearning.In:RozenbergG,BckT,KokJ, edit-ors.Handbookofnaturalcomputing.Berlin,Heidelberg:Springer;2012.p. 937–86.

[11]SowaJ.Ontology,metadata,andsemiotics.In:GanterB,MineauG,editors. Conceptualstructures:logical,linguistic,andcomputationalissues,vol.186 ofLectureNotesinComputerScience.Berlin,Heidelberg:Springer;2000.p. 55–81.

[12]RectorAL,RogersJE,PoleP.TheGALENhighlevelontology.StudHealthTechnol Inform1996;34:174–8.

[13]BosL,DonnellyK.SNOMED-CT:theadvancedterminologyandcodingsystem foreHealth.StudHealthTechnolInform2006;121:279–90.

[14]Bodenreider O.Theuniﬁed medicallanguage system (UMLS):integrating biomedicalterminology.NucleicAcidsRes2004;32(suppl.1):D267–70.

[15]KarlssonD.Adesignandprototypeforadecision-supportsystemintheﬁeld ofurinarytractinfections-applicationofOpenGALENtechniquesforindexing medicalinformation.StudHealthTechnolInform2001;84:479–83.

[16]ElevitchFR.SNOMED-CT:electronichealthrecordenhancesanesthesiapatient safety.AANAJ2005;73(5):361–6.

[17]BodenreiderO.Biomedicalontologiesinaction:roleinknowledge manage-ment,dataintegrationanddecisionsupport.YearbMedInform2008;47:67–79.

[18]RosseC,MejinoJrJL.Areferenceontologyforbiomedicalinformatics:the foundationalmodelofanatomy.JBiomedInform2003;36(6):478–500.

[19]RubinDL,DameronO,BashirY,GrossmanD,DevP,MusenMA.Usingontologies linkedwithgeometricmodelstoreasonaboutpenetratinginjuries.ArtifIntell Med2006;37(3):167–76.

[20]ZhangL,WangZ.Ontology-basedclusteringalgorithmwithfeatureweights.J ComputInformSyst2010;6(9):2959–66.

[21]JanninP,MorandiX.Surgicalmodelsforcomputer-assistedneurosurgery. Neu-roimage2007;37(3):783–91.

[22]LeeC-S,WangM-H,AcamporaG,LoiaV,HsuC-Y.Ontology-basedintelligent fuzzyagentfordiabetesapplication.In:IEEEsymposiumonintelligentagents, 2009(IA’09).IEEE;2009.p.16–22.

[23]HorrocksI,Patel-SchneiderPF,BoleyH,TabetS,GrosofB,DeanM,etal.SWRL: asemanticwebrulelanguagecombiningOWLandRuleML.W3CMember Submission2004;21:79.

[24]PanJ,StoilosG,StamouG,TzouvarasV,HorrocksI.f-SWRL:afuzzy exten-sionofSWRL.In:SpaccapietraS,AbererK,Cudr-MaurouxP,editors.Journal onDataSemanticsVI,vol.4090ofLectureNotesinComputerScience.Berlin, Heidelberg:Springer;2006.p.28–46.

[25]GorunescuF.Datamining:concepts,modelsandtechniques.Berlin, Heidel-berg:Springer-Verlag;2011.

[26]SoniJ,AnsariU,SharmaD,SoniS.Predictivedataminingformedicaldiagnosis: anoverviewofheartdiseaseprediction.IntJComputAppl2011;17(8):43–8.

[27]DangareCS,ApteSS.Improvedstudyofheartdiseasepredictionsystemusing dataminingclassiﬁcationtechniques.IntJComputAppl2012;47(10):44–8.

[28]GanesanN,VenkateshK,RamaMA,PalaniAM.Applicationofneuralnetworks indiagnosingcancer diseaseusingdemographicdata. IntJComput Appl 2010;1(26):76–85.

[29]ChuC-M,ChienW-C,LaiC-H,BludauH-B,TschaiH-J,LuPaiS-MH,etal.A Bayesianexpertsystemforclinicaldetectingcoronaryarterydisease.JMed Sci2009;4:187–94.

[30]SrinivasK,RaniBK,GovrdhanA,KarimnagarJ.Applicationsofdatamining techniquesinhealthcareandpredictionofheartattacks.IntJComputSciEng 2010;2(2):250–5.

[31]DiNoiaT,OstuniVC,PesceF,BinettiG,NasoD,SchenaFP,etal.Anendstage kid-neydiseasepredictorbasedonanartiﬁcialneuralnetworksensemble.Expert SystAppl2013;40(11):4438–45.

[32]FloydS.Dataminingtechniquesforprognosisinpancreaticcancer[Master’s thesis].USA:WorcesterPolytechnicInstitute;2007.

[33]DreiseitlS,Ohno-machadoL,KittlerH,VinterboS,BinderM.Acomparisonof machinelearningmethodsforthediagnosisofpigmentedskinlesions.JBiomed Inform2001;34:28–36.

[34]Kolc¸eE,FrasheriN.Aliteraturereviewofdataminingtechniquesusedin healthcaredatabases.In:MarkovskiS,GusevM,editors.ProceedingsofICT innovations.2012.p.577–82.

[35]PrasadB,PrasadP,SagarY.Acomparativestudyofmachinelearning algo-rithmsasexpertsystemsinmedicaldiagnosis(asthma).In:MeghanathanN, KaushikB,NagamalaiD,editors.Advancesincomputerscienceandinformation technology,vol.131ofcommunicationsincomputerandinformationscience. Berlin,Heidelberg:Springer;2011.p.570–6.

[36]FanJ,ShaoC,OuyangY,WangJ,LiS,WangZ.Automaticseizuredetectionbased onsupportvectormachineswithgeneticalgorithms.In:WangT-D,LiX,Chen S-H,WangX,AbbassH,IbaH,etal.,editors.Simulatedevolutionandlearning, vol.4247ofLectureNotesinComputerScience.Berlin,Heidelberg:Springer; 2006.p.845–52.

[37]MeamarzadehH,KhayyambashiM,SaraeeM-H.Extractingtemporalrulesfrom medicaldata.In:Internationalconferenceoncomputertechnologyand devel-opment(ICCTD09),vol.1.2009.p.327–31.

[38]MenaLJ,OrozcoEE,FelixVG,OstosR,MelgarejoJ,MaestreG.Machine learn-ingapproachtoextractdiagnosticandprognosticthresholds:applicationin prognosisofcardiovascularmortality.In:Computationalandmathematical methodsinmedicine;2012,6pp.

[39]FreitasA.Asurveyofevolutionaryalgorithmsfordataminingandknowledge discovery.In:GhoshA,TsutsuiS,editors.Advancesinevolutionarycomputing, naturalcomputingseries.Berlin,Heidelberg:Springer;2003.p.819–45.

[40]WasanSK,BhatnagarV,KaurH.Theimpactofdataminingtechniqueson medicaldiagnostics.DataSciJ2006;5:119–26.

[41]HosseinkhahF,AshktorabH,VeenR,OwrangM.Challengesindataminingon medicaldatabases.IGIGlobal;2009.p.1393–404[chapter83].

[42]Satyanandam N,SatyanarayanaCh, RiyazuddinMd,ShaikA.Data mining machinelearningapproachesandmedicaldiagnosesystems:asurvey.IntJ ComputOrgTrends2012;2(3):53–60.

[43]NganPS,WongML,LamW,LeungKS,ChengJCY.Medicaldataminingusing evolutionarycomputation.ArtifIntellMed1999;16(1):73–96.

[44]TanK,YuQ,HengC,LeeT.Evolutionarycomputingforknowledgediscovery inmedicaldiagnosis.ArtifIntellMed2003;27(2):129–54.

[45]DiNuovoA,CataniaV.Genetictuningoffuzzyruledeepstructuresfor efﬁ-cientknowledgeextractionfrommedicaldata.IEEEIntConfSystManCybernet 2006;6:5053–8.

[46]BacheK,LichmanM.UCImachinelearningrepository.URL:http://archive.

ics.uci.edu/ml[accessed25.11.13].

[47]Axer H,JantzenJ,vonKeyserlingkDG.Anaphasiadatabaseonthe inter-net: a model for computer-assisted analysis in aphasiology. Brain Lang 2000;75(3):390–8.

[48]DamHH,AbbassHA,LokanC,YaoX.Neural-basedlearningclassiﬁersystems. IEEETransKnowlDataEng2008;20(1):26–39.

[49]UrbanowiczRJ,MooreJH.Learningclassiﬁersystems:acompleteintroduction, review,androadmap.ArtifEvolAppl2009;2009:1–25.

[50]KokolP,PohorecS,tiglicG,PodgorelecV.Evolutionarydesignofdecision treesformedicalapplication.WileyInterdisciplinaryReviews:DataMining andKnowledgeDiscovery2012;2(3):237–54.

[51]SiregarP,ToulouseP.Model-baseddiagnosisofbraindisorders:aprototype framework.ArtifIntellMed1995;7(4):315–42.

[52]KohaviR.Astudyofcross-validationandbootstrapforaccuracyestimation andmodelselection.In:Proceedingsofthe14thinternationaljoint confer-enceonArtiﬁcialintelligence–vol.2,IJCAI-95.SanFrancisco,CA,USA:Morgan KaufmannPublishersInc.;1995.p.1137–43.

[53]KnublauchH,FergersonRW,NoyNF,MusenMA.TheProtégéOWLplugin: anopendevelopmentenvironmentforsemanticwebapplications.In:The SemanticWeb-ISWC2004.Springer;2004.p.229–43.

[54]NoyNF,McGuinnessDL.Ontologydevelopment101:aguidetocreatingyour ﬁrstontology[Tech.rep.].StanfordKnowledgeSystemsLaboratory(KSL-01-05) andStanfordMedicalInformatics(SMI-2001-0880);2001.

[55]PeroneCS.Pyevolve:apythonopen-sourceframeworkforgeneticalgorithms. SIGEVOlution2009;4(1):12–20.

[56]HallM,FrankE,HolmesG,PfahringerB,ReutemannP,WittenIH.TheWEKA dataminingsoftware:anupdate.SIGKDDExplorations2009;11(1):10–8.

[57]CurkT,DemarJ,XuQ,LebanG,PetrovicU,BratkoI,etal.Microarraydatamining withvisualprogramming.Bioinformatics2005;21:396–8.

[58]TingK.Confusionmatrix.In:SammutC,WebbG,editors.Encyclopediaof machinelearning.USA:Springer;2010.p.209.

[59]WhitleyD.TheGENITORalgorithmandselectionpressure:whyrank-based allocationofreproductivetrialsisbest.In:Proceedingsofthethird interna-tionalconferenceongeneticalgorithms.MorganKaufmann;1989.p.116–21.

[60]Cochran WG.The 2 testof goodness of ﬁt. AnnMath Stat 1952;23(3): 315–45.

[61]FleissJL.Statisticalmethodsforratesandproportions,2ndedition,Wiley seriesinprobabilityandmathematicalstatistics.NewYork:JohnWiley&Sons; 1981.

[62]LandisJR,KochGG.Themeasurementofobserveragreementforcategorical data.Biometrics1977;33(1):159–74.