ContentslistsavailableatScienceDirect
Applied
Soft
Computing
jo u r n al hom e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / a s o c
Naïve
Bayes
ant
colony
optimization
for
designing
high
dimensional
experiments
M.
Borrotti
a,b,∗,
G.
Minervini
d,
D.
De
Lucrezia
e,
I.
Poli
b,caInstituteofAppliedMathematicsandInformationTechnologies,CNR-IMATI,viaBassini15,20133Milan,Italy bEuropeanCentreforLivingTechnology,Ca’FoscariUniversityofVenice,SanMarco2940,30124Venice,Italy
cDepartmentofEnvironmentalScience,InformaticsandStatistics,Ca’FoscariUniversityofVenice,Dorsoduro2137,30123Venice,Italy dDepartmentofBiology,UniversityofPadua,ViaU.Bassi58,35121Padua,Italy
eExploraBiotechS.r.l.,ViadellaLibertá9,30175Venice,Italy
a
r
t
i
c
l
e
i
n
f
o
Articlehistory:
Received15October2015 Receivedinrevisedform8July2016 Accepted10August2016
Availableonline22August2016 Keywords:
Antcolonyoptimization NaïveBayesclassifier Experimentaldesign Enzymeengineering Highdimensionality
a
b
s
t
r
a
c
t
Inalargenumberofexperimentalproblems,highdimensionalityofthesearchareaandeconomical constraintscanseverelylimitthenumberofexperimentalpointsthatcanbetested.Withinthese con-straints,classicaloptimizationtechniquesperformpoorly,inparticular,whenlittleaprioriknowledgeis available.Inthisworkweinvestigatethepossibilityofcombiningapproachesfromstatisticalmodeling andbio-inspiredalgorithmstoeffectivelyexploreahugesearchspace,samplingonlyalimitednumberof experimentalpoints.Tothispurpose,weintroduceanovelapproach,combiningantcolonyoptimization (ACO)andnaïveBayesclassifier(NBC)thatis,thenaïveBayesantcolonyoptimization(NACO) proce-dure.WecompareNACOwithothersimilarapproachesdevelopingasimulationstudy.Wethenderive theNACOprocedurewiththegoaltodesignartificialenzymeswithnosequencehomologytotheextant one.Ourfinalaimistomimicthenaturalfoldof200aminoacids1AGYserineesterasefromFusarium solani.
©2016ElsevierB.V.Allrightsreserved.
1. Introduction
Enzymesare biologicalmoleculesthat catalyzethousandsof
chemicalreactionsthatsustainlife.Eachenzymeisapolymer
gen-eratedfromaseriesofupto20differentaminoacids.Polymers
canbe representedaswordsformed accordingtothealphabet
a={a1,a2,a3,...,a20}.Theymaydifferinlength,sequenceand
aminoacidorder.Asecondaryandtertiarystructure,whichdefines
enzymeactivity,isassociatedtoeachsequence[1].Enzyme
activi-tiescanbeassociatedtoeachofthesecombinationsofaminoacids
throughexperimentation.Biologicalexperimentationaimsto
opti-mizethesequenceofaminoacidsinordertofindenzymeswith
thebestactivities.Thenumberofpossiblecombinationsofamino
acidsrapidlyincreaseleadingtoacombinatorialexplosionofthe
searchspace.Designingartificialenzymeswithdesiredproperties
is knownin computationalbiology asEnzymeEngineering[2,3].
EnzymeEngineeringisanimportanttaskinnumerousfieldssuch
∗ Correspondingauthorat:InstituteofAppliedMathematicsandInformation Technologies,CNR-IMATI,viaBassini15,20133Milan,Italy.
E-mailaddress:matteo.borrotti@mi.imati.cnr.it(M.Borrotti).
aschemicals,pharmaceuticals,fuel,foodoragriculturaladditives
[4].
In this contextit isnecessary todevelopnewenzyme
engi-neeringprocedures basedoncomputationalapproaches ableto
identifybestsolutionswiththelowestnumberofcostlyandtime
consumingexperimentations.Differentoptimizationapproaches
havebeenproposedintheliteraturetotackleproblemswithhigh
dimensionalandcomplexsearchspaces[5–11].Someapproaches
arebasedonEvolutionaryComputationAlgorithms[12].Genetic
Algorithms(GAs)[13,14],forexample,usetechniquesinspiredby
naturalevolution,i.e.mutation,selectionandcrossover,inorderto
optimizecomplexfunctions.Theapplicationoftheseapproaches
islimitedbyseveralissuessuchastheneedofsolidapriori
knowl-edgeoftherelationshipbetweenthestructureandfunctionofan
enzyme,thevastnumberofpossiblemodificationsofanexisting
enzyme,thelackofefficientscreeningproceduresabletoreduce
thenumberofmodifiedenzymestobetestedandthe
computa-tionaleffortneededtocreatenewenzymes.Apromisingsolution
iscombiningmetaheuristicalgorithmsandstatisticalmodels.This
isanovelresearchareathataddressesproblemscharacterizedby
largedesignspace,high-orderinteractionsbetweenvariables,and,
complexandnon-linearexperimentalsurfaces[15].Some
exam-plesofthesehybridizedmethodscanbefoundin[16–21].Arecent
http://dx.doi.org/10.1016/j.asoc.2016.08.018
M.Borrottietal./AppliedSoftComputing49(2016)259–268
approach isproposedin [21]where an optimizationalgorithm,
calledCo-InformationCompositeLikelihood(COIL),basedonthe
evolutionaryparadigm is introducedby coupling cross-entropy
sampling[22]andcompositelikelihoodprinciples[23]todesign
novelenzymeswithimprovedfunctionalities.
Inthispaper,weintroduceanovelapproachthatcombinesant
colonyoptimization(ACO)[24–26]andthenaïveBayesClassifier
(NBC)[27],callednaïve-Bayesantcolonyoptimization(NACO).The
ACOapproachhasbeenwidelyusedinmanyoptimization
prob-lemsrangingfromsequentialorderingstudies[28]toopenshop
scheduling[29].ACOhasbeenalsoappliedtobiologicalproblems
suchasDNAsequencingproblems[30,31].
SeveraltheoreticalandpracticalfeaturesofNBChavebeen
stud-iedthathaveshownitsreliabilityintermsofclassificationaccuracy
[32–37].Oneofthemostpowerfulproceduresproposedinthe
bio-logicalfieldisanalgorithmforgeneticbiomarkerselectionsand
subjectclassificationsfromthesimultaneousanalysisof
genome-wide Single-NucleotidePolymorphism (SNP) databased onthe
naïveBayesframework[37].
ThekeyideaofNACOistoincludeinformationachievedviathe
NBCmethodinthepathconstructionmechanismofACO,inorderto
drivethesearchprocesstowardsthetargetregioninamuchmore
effectiveway.NACOwasevaluatedbothinasimulationstudyand
inarealcaseofbuildingartificialenzymes.Inparticular,theNACO
algorithmwasconstructedandasimulationcomparisonwithaset
ofotherproceduresfrequentlyusedforthispurposewas
devel-oped.Thealgorithmwasthenderivedwiththegoalofdesigning
artificialenzymeswithnosequencehomologytoextantones;the
finalaimbeingtomimicthenaturalfoldof200aminoacidlong
1AGYserineesterasefromFusariumsolani[38].
Thispaperisorganizedasfollows.InSection2theoptimization
problemisformalizedandinSection3,webrieflydescribetheACO
andNBCalgorithm.InSection4,weintroduceourapproach,
naïve-Bayesantcolonyoptimization(NACO).InSection5,weevaluate
NACOperformanceinasimulationstudy,comparingresultswith
well-knowntechniquesand,inSection6,weapplyNACOtothe
designofnewenzymes.Finally,someconclusionsaredrawnin
Section7.
2. Theoptimizationproblem
Enzymeengineeringcanbedescribedasacombinatorial
opti-mizationproblemP=(X,f)inwhich:
-fis theobjective function tobeoptimized and defined as f:
D1×D2×···×Dd→R+whereD1,...,Ddarethedomainsofthe
followingvariablesinX={x1,...,xd};
-Xisthesetofallpossiblecombinationsx=(x1li,...,xdli)where
xili ∈Diindicatesthevalueofthevariablexi
∀
iwherei=1,...,d.Thefunctionfcalculatesaresponsevalueforeachcombinationof
variables,y=f(x1li,...,xdli)withli=1,...,k.Withinthissetting,
thefinalaimistofindthecombination,orsolution,x∗ ∈Xwith
maximumresponsevalue,thatis,f(x∗)>f(x)
∀
x∈X.Toaddresstheenzymeengineeringproblemweproposethat
the variables represent the set of amino-acids in the enzyme
sequence,thevariabledomainisthealphabeta,Xisthesetofall
possiblecandidateenzymestobetestedandtheobjectivefunction
istheenzymeactivitytobemaximized.
3. AntcolonyoptimizationandnaïveBayesclassifier
In this section webriefly introduce ant colonyoptimization
(ACO)andnaïveBayesclassifier(NBC),whichweintendtocombine
inannewapproachtohighdimensionaloptimizationproblems.
3.1. Antcolonyoptimization(ACO)
Antcolonyoptimization(ACO)[24]isametaheuristicalgorithm
introducedforcomplexcombinatorialproblems.Itisinspiredby
thebehaviour of specificreal antcolonies in search of foodin
theirenvironment.Innatureantsexplorethesurrounding
environ-mentforfoodsourcethroughrandomwalks.Oncefoundthefood,
antsreturntotheircolonyleavingbehindatrailofpheromones.
Pheromonesarevolatilecompoundsthatquicklydryoffsothata
shortpathfromcolonytofoodsourceismorelikelytopreserve
detectableamountofpheromones.Thisinturnwillallowmore
antstofollowthatpathandthusreinforcethepheromonetrails.
Thispositivefeedbackeventuallyleadsallantstouseasinglepath
fromthecolonytothefood.
ThemainideaoftheACOalgorithmistomimicthisbehaviour
withartificialantswalkingaroundagraphrepresentingthe
envi-ronment;morespecifically,theproblemtosolveisillustratedin
Fig.1.
ACOalgorithm isappliedtoproblems that canbedescribed
asagraphG=(N,A)whereNisthesetofnodesandAthesetof
arcsthatfullyconnectthenodes.Aweight!i,j,calledpheromone
value,isassociatedwitheacharcwhichconnectsnodeiwithnode
j.The pheromone valuerepresents theattractiveness of a
spe-cific arc for theants:the higherthe amountof pheromone on
anarc,thehighertheprobabilitythat antswillchooseitwhen
constructingsolutions.Inaddition,aheuristicvalue"i,j,which
rep-resentsaprioriinformation,isintroducedtomovefromnodeito
nodej.
The construction of the ACOalgorithm includes the
follow-ingphases[26,39]:(i)Constructantsolutions,(ii)Daemonactions
and(iii)Updatepheromone.Thefirstphase,Constructantsolutions,
representstheprocedureneededfor antstoconstructsolutions
incrementally:startingfromafirstnode,antsaddanewnodeat
eachiterationoftheprocedureinordertobuildtheentiresolution.
Anantdecideswheretogonextinaccordancewithpheromone
val-ues!i,jandtheheuristicvalues"i,j.Therelativeinfluenceofthese
twovaluesisweighedinaccordancetotwoparameters,called˛>0
andˇ≥0,respectively.Thesecondphase, Daemonactions,
com-prisesallproblem-specificoperationsthatmaybeconsideredfor
boostingtheperformanceofACOalgorithms.Themainexample
ofsuchoperationsistheintroductionoflocalsearchtechniques
[40,41].Thethirdphase,Update pheromone,includesthe
proce-duretoupdatespheromonevalues.Thisprocedureisdividedin
twophases.First,pheromoneevaporationisappliedtodecrease
pheromonevalues.Thedegreeofdecreasedependsonthe
param-eter#∈[0,1],calledtheevaporationrate.Theaimofpheromone
Fig.1.Representationofapossiblegraphwhereantsmove,fromastartingnodeto adestinationnode.
M.Borrottietal./AppliedSoftComputing49(2016)259–268
evaporationistoavoidanunlimitedincreaseinpheromone
val-ues and to allow the ant colony to forget poor choices made
previously.Apheromonedepositisthenappliedtoincreasethe
pheromonevalues thatbelongtogoodsolutionsthat antshave
generated.
In the literature there are many different versions of this
algorithm andoneofthemostpowerfulversionsof ACOis the
MAX–MINAntSystem(MMAS)[42].MMASisableto
exten-sivelyexploitthesearch spacebyallowingonlya singleantto
depositpheromoneaftereachiteration,forexample,theantthat
reachedthebestsolutioninthecurrentiteration,called
interation-bestsolutionsib.Moreover,itusesasimplemethodforlimitingthe
strengthofthepheromonetrailthateffectivelyavoidspremature
convergenceofthesearch.
3.2. NaïveBayesclassifier(NBC)
ThenaïveBayesclassifier[27]isaprobabilisticmethod,based
onBayesrule,whichassumesthatthevariables{x1,...,xd}inthe
setXbeallconditionallyindependentfromeachother,giventhe
responsevaluey.Inotherwords,anaïveBayesclassifierassumes
thatthepresence(orabsence)ofaparticularvariableinacandidate
solutionisunrelatedtothepresence(orabsence)ofanyother
vari-ableand,further,assumesthatinteractionsbetweenvariablesdo
notinfluencetheresponsevalue.Moreprecisely,consideringa
can-didatesolutionx=(x1li,...,xdli)wherexili ∈Diindicatesthevalue
ofthevariablexi
∀
iwherei=1,...,d,wecancalculateprobabilityP(x1li,...,xdli|y)asfollows: P(x1li,...,xdli|y)= d
!
i=1 P(xili|y). (1)TheaimofthenaïveBayesclassifieristoderivethe
probabil-itydistributionofy,foreachnewcandidatesolutionnotalready
tested.StartingfromEq.(1),itispossibletoderivethenaïveBayes
algorithminthefollowingway:yisassumedtobediscretewithr
possiblevaluesandvariables{x1,...,xd}canassumeanydiscrete
orrealvalues.Theprobabilitythatywilltakeitsrthpossiblevalue,
accordingtoBayesrule,isgivenas:
P(y=yr|x1li,...,xdli)=
P(y=yr)P(x1li,...,xdli|yr)
"
rj=1P(y=yj)P(x1li,...,xdli|yj)
, (2)
wherethesumistakenoverallpossiblevaluesyjwithj=1,...,r.
Assumingthen{x1,...,xd}tobeconditionallyindependentgiven
y,wecanwritethefollowingequation:
P(y=yr|x1li,...,xdli)=
P(y=yr)
#
di=1P(xili|yr)"
rj=1P(y=yj)
#
di=1P(xili|yj). (3)
Eq.(3)isthefundamentalequationforthenaïveBayesclassifier.
Givenanewcombinationofvariablesxnew,thisequationallowsto
derivetheprobabilitythatywilltakeapossiblevalueinitsdiscrete
domain,givenxnewandthedistributionsP(y)andP(x
ili|y)withi=1,
...,destimatedfromtheavailablesetofcandidatesolutionsand
theassociatedsetofresponsevalues.
4. Naïve-Bayesantcolonyoptimization(NACO)
Inbuildingthenaïve-Bayesantcolonyoptimization(NACO),
we assumethat a colony ofants mutuallycollaborate toreach
theoptimum in ahighdimensional andcomplex searchspace.
Thiscolonysharesinformation andcollaborates viapheromone
values and information extraction by NBC. Pheromone update
rules are appliedonce allants have builttheirsolutions based
Fig.2. FlowdiagramofNACOapproach.
on the iteration-best solution, sib, and the probability
distri-bution to obtain a certain response given a specific set of
variables.
TheconstructionofNACOapproach(seeFig.2)canbedescribed
asfollows:
1.Initializationoftheprobabilitywithwhichanantwchoosesto
gofromnodeitonodej.Eachnodehasthesameprobabilityto
bechosen.
2.Generationofasetofcandidatesolutionsofsizem(colonyof
ants).
3.Evaluationoftheresponseyofeachcandidatesolutions
accord-ingtotheobjectivefunctionf.
4.Iftheoptimalsolutionis found(orthemaximumnumberof
iterationsisreached)stoptheprocedure.Otherwise,continue.
5.Identificationoftheiteration-bestsolution,sib.
6.CalculationoftheNBConthecurrentsetofcandidatesolutions.
Ateach iteration, NBCfocuses onthevalues oftheresponse
greaterthanaproblem-specificthreshold$,with$ ∈R.
7.Updatingtheprobabilitywithwhichanantwchoosestogofrom
nodeitonodejusingtheinformationextractedinpoints3and
4.
8.Gotopoint2.
Ateachiteration,thecandidatesolutionresponseiscomputed.
Inordertoupdatetheprobabilitywithwhichanantwchoosesto
gofromnodeitonodej,NACO,first,identifiestheiteration-best
solution,sibandthen,inordertofocusonthecandidatesolutions
thathavereachedthehighestresponses,candidatesolutionsare
dividedintwoclassesaccordingtoa certainconstantthreshold
($, with $ ∈R) such as the specific quantile of the response
M.Borrottietal./AppliedSoftComputing49(2016)259–268
function,$ canbeequal tothefirstquartile.Consequently,the
followingrulewillbeapplied:
yr=
$
yr=1=1 I(y≤$)=1
yr=0=0 I(y≤$)=0,
whereI()istheindicatorfunction.
Giventhesetofcandidatesolutionsandtheassignedclasses,
theNBCderivestheprobabilitiesP(y=yr)andP(xil
i|yr)withr={0,
1}.Consideringcategoricalvariablestheseprobabilitydistributions
arerepresentedwithconditionalprobabilities,whichareestimated
fromthedatabycountingtheoccurrencesof eachvariableand
classlabelyr.Allthepossiblecombinationsobtainedwiththexili
inthecandidatesolutionswithyr=1=1arethenconsideredandthe
correspondingP(y=yr|xnew)calculated.Thepossiblesolutionwith
thehighestprobabilityisthen selectedandtherelatedP(xili|yr)
areusedto weighonthearcs.These weights arenamed Naïve
Informationanddenoted%.
4.1. Probabilityupdateprocedure
InNACO,eacharcdescribestheprobabilityofmovingfromnode
itonodej.Thisprobabilityisusuallycalculatedonthebasisof
pheromonequantityandheuristicinformation,i.e.apriori
infor-mationconcerningtheproblem.Inourapproachthisprobability
alsodependsonNaïveInformation%.
Giventhestateofthegraphatiterationt−1ofthealgorithm,
theNaïveInformation%ijisextractedfromthesetofsolutions
eval-uatedatiterationtandthesetof{%ij}isupdated.Subsequently,
theiteration-bestsolutionsibisidentifiedandthecorresponding
pheromonevalueisupdated.
Theprobabilityatiterationtofgoingfromnodeitonodejwill
beobtainedbycombiningthepheromonevalueswiththeheuristic
values(ifavailable)andwithNaïveInformationasfollows:
pij(t)= [!ij(t)] ˛[" ij]ˇ[%ij(t)]ı
"
p∈Ni[!ip(t)] ˛[" ip]ˇ[%ip(t)]ı∀
p∈Ni, (4)where!ij(t)istheamountofpheromonevalueand%ij(t)isthe
NaïveInformationonarc(i,j)atiterationtand"i,jisthe
heuris-ticvalueonthesamearc(i,j).Ni denotesthesetofnodesthat
canbevisitedfromnodeiand˛,ˇandıaretheparametersthat
controltherelativeweightofthepheromonetrail,heuristic
infor-mationandNaïveInformation.Afteralltheantshavecompleted
theirtour,pheromoneevaporationonallarcsisappliedconsistent
withevaporationfactor#,#∈(0,1].
ACOperformancestronglydependsontheavailabilityof
rel-evant apriori information of theproblem. The introduction of
NaïveInformationaimstoavoidtheneedforheuristic
informa-tionin order totackle high-dimensional design problemswith
computationally-expensive complex functions without a priori
knowledge.
5. Simulationstudy
WeevaluatedtheefficiencyandrobustnessofNACOapproach
by developing a simulation study where we compared this
approachwithotheroptimizationtechniques.Weconsideredtwo
benchmark functions selected to test the performance of the
techniques.Thesefunctionsarefrequentlyusedtocompare
opti-mizationproceduresintheliterature[21,43].
5.1. Benchmarkfunctions
In oursimulation studytheresponse value ofeach solution
wasderivedassumingthemodely=ϕh(x)+
'
,'
∼N(0,(h2),whereh=1,2denotestwotypesofresponsesurface.Theoptimalsolution
minimizesthedeterministicfunctionϕh,whichwasinterpretedas
theobjectivefunction.WecomputedMonteCarlosimulation
com-posedof50runsandcharacterizedbydvariableswithklevelseach.
Arunwasstoppediftheglobalminimumwasachievedbeforea
maximumnumberofiterationsT=30.
TheResponseSurface1takesthefollowingform:
ϕ1(x1,...,xd)=10d+ d
%
i=1 k%
l=1&
ˇil2Iil−10cos(2)ˇil)Iil'
, (5)whereIil=1,ifxi=xil,andIil=0,otherwise.Foralli=1,...,d,the
constantsˇil,l=1,...,k,formedanequallyspacedgridofkelements
on(−5,5].Thefunctionϕ1hasmultiplelocalminimawithsimilar
valueswithrespecttotheglobalminimum.
TheResponseSurface2takesthefollowingform:
ϕ2(x1,...,xd)=
%
E∈P:|E|≤d∗
"E, (6)
where"EistheeffectcorrespondingtotheindexsetE ∈P,P=
P({1,...,d})thepowersetconstructedfromindexes{1,...,d}and
thesumwastakenoverallvariablecombinationswithatmostd*
elements.Theeffects"Earecomputedas"E=
!
i∈E
ˇiliIili,whereIili
isanindicatorvariabletakingvalue1,ifxi=xil
i,(li=1,...,k)and
thecoefficientsˇili ∈{ˇ1,...,ˇk}formedaequallyspacedgridof
kelementsontheinterval(0,1].Inallthesimulationspresented
hered*issetequalto⌈d/2⌉inaccordanceto[21].So,whenusing
d=4variables,wecomputeonlymaineffectsandtwo-ways
inter-actions.Ifd=20,wetakeallmaineffectsandallinteractionswith
atmostd*=10variables.
5.2. Searchapproachesforcomparison
InordertovalidatetheperformanceofNACOweselectedthree
approachesfromtheliterature:
• MAX–MIN Ant System(MMAS)[42] is based onthree key
aspects.Firstofall,inthecolonyofantsonlyasingleantisused
toupdatethepheromonetrailsaftereachiteration helpingto
exploitthebestsolutionsfoundduringaniterationorduringthe
runofthealgorithm.Thesecondaspectrestrictstherangeof
pos-siblevaluesfortheprobabilityofchoosingaspecificarcwhich
helpstoavoidearlystagnation.Thefinalaspectrequiresthatthe
pheromonetrailisdeliberatelyinitializedtoamaximumvalue,
!max.Stützleetal.[42]showbynumericalanalysisthatMMAS
achievesastronglyimprovedperformancecompared toother
versionsofACOalgorithm.
WeimplementedanMMASwith˛=1,withoutanyapriori
informationregarding theproblemand withevaporationrate
#=0.88.Theparameter#wasusedtoavoidunlimited
accumu-lationofthepheromonetrailsenablingthealgorithmtoforget
baddecisionspreviouslytaken.Infact,ifanarc isnotchosen
bytheants,itsassociatedpheromonevaluedecreases
exponen-tiallyinthenumberofiterations[44].Sinceevaporationrate#
hasanessentialroleinthealgorithmsperformancewedecided
toperformadedicatedtuninganalysis.Theresultsareshownin
Fig.3wherethebehaviourofMMASwithdifferent#valuesis
reported.Fromthefigurewenoticethatwhen#=0.88the
algo-rithmisabletooutperformtheotherversionsofMMASinthe
twobenchmarkfunctionswithk=13andd=2.
• GeneticAlgorithm(GA)[13,14]isconsideredoneof themost
powerfulalgorithmsforoptimization.GAsareadaptiveheuristic
searchalgorithmsbasedontheevolutionaryideaofnatural
M.Borrottietal./AppliedSoftComputing49(2016)259–268
Fig.3.TuninganalysisofMMASwithdifferentvaluesof#withk=13andd=2.
selectioninthepresenceofvariation-inducingoperatorssuchas
mutationandrecombination(crossover).Further,afitness
func-tionisusedtoevaluatesolutions,andtheselectionprocessis
basedonfitnessvalues(responses).
We implementeda GAwhere thecandidatesolutionswere
generatedbyaone-pointcrossoverwitha5%mutationrateand
proportionalselectionassamplingschemes[21].
• Co-InformationCompositeLikelihood(COIL)[21] optimization
isanovelapproachforhighdimensionalproblems.COILisan
adaptiveprocedurewherenewcandidatesolutionsarechosen
byminimizingtheco-informationcompositelikelihood
objec-tivefunction,derivedfromcouplingimportancesamplingand
compositelikelihoodprinciples.
COILdependsontwoparameters:(i)afixedthreshold$that
dividestheresponseintwoclassesand(ii)aparameter"that
con-trolsthetrade-offbetweenexplorationandexploitationofthe
searchspace.Inthesesimulations,wheretheaimistominimize
theresponse,$isthefirstquartileoftheresponsedistribution
and"t=1−1/(1+t),t≤TwheretisthecurrentiterationandTthe
maximumnumberofiterations.Thischoicefor"tismotivated
bytheneedtothoroughlyexploreXduringtheinitialiterations
andexploitthebestsolutionsmoreefficientlylateron[21].
NACO was tested with the following fixed parameters: (i)
#=0.88and(ii)theconstantthreshold$wasequaltothefirst
quar-tileasinCOIL.˛andıweredecidedinaccordancewithpreliminary
resultsasshowninFig.4andhavebeenfixedto˛=1andı=2.We
didnotreporttheresultsobtainedwithresponsesurfaceϕ1since
thedifferentsettingsgavesimilarperformance.
Inallthefourapproachesthenumberofcandidatesolutions
tobetestedateachiterationwassetasm=k×20inordertohave
enoughcandidatesolutionsandrelatedresponsestoobtainreliable
estimationsateachiteration[21].Atuningparameterprocedure
wasimplementedin[21]tooptimizetheperformanceofGAand
COILonthesamebenchmarkfunctionsusedtooptimizetheNACO
parameters.
5.3. Computationalresults
In order to compare the performance of NACO with the
approaches presented in Paragraph 5.2, two indicators were
derivedtoshowthemaindifferencebetweenthefouralgorithms
• expected number of iterations to hit the optimal solutions,
denotedbyEHI;
• explored fraction of the search space, %X=(EHI×m×100)/
dim(X)wheredim(X)isthedimensionofthewholeexperimental
searchspace.
TounderstandhowNACObehavedwhenthedimensionofthe
search space increasedwe consideredvarious values for k and
d,correspondingtodim(X)=kd≈104,105 and 106. Theresults
are reportedin Table1.Table 1(a)reports theresultsobtained
fordim(X)≈104.Thebenchmarkfunctionconsideredforthe
spe-cific simulation is reported in the first column. In columns 2,
3, 4wereportthenumber oflevelsk,thenumber ofvariables
d considered in the specific instance and the number of
can-didate solutions tested at each iteration. The value of EHI and
theexploredfractionofthesearchspacewasthencalculatedfor
eachmethod.InTable1(b)and(c)thesameresultsarereported
but considering largersearchspaces, such asdim(X)≈105 and
dim(X)≈106.
Our numerical studies show that NACO converges to the
optimuminallthedesignspacesizesandrequiresa verysmall
numberoftestedsolutionsinallsimulations.Thisisaverysmall
number,infact,atdim(X)≈104 NACOneedsjustlessthanthree
iterationstoreachtheoptimum,equalto0,ofresponsesurface
2 4 6 8 10 14.0 14.5 15.0 15.5 16.0 16.5 17.0 Experiment Batches Response Global Optimum =1, =1 =1, =2 =2, =1 =2, =2
Fig.4. TuninganalysisofNACOwithdifferentvaluesofıand˛withk=13andd=2 andresponsesurfaceϕ2.
M.Borrottietal./AppliedSoftComputing49(2016)259–268 Table1
Expectednumberofiterationsneededtoachievetheoptimalsolution(EHI)andthecorrespondentfractionofexploredsearchspace(%X)forMMAS,COILandGA.
k d m NACO MMAS COIL GA
EHI %X EHI %X EHI %X EHI %X
(a)dim(X)≈104 ϕ1 2 13 40 2.74 1.34 16.26 7.94 3.42 1.67 8.30 4.05 4 7 80 2.22 0.54 14.86 3.63 2.94 0.72 6.34 1.55 10 4 200 2.48 0.99 18.66 7.46 3.30 1.32 8.60 3.44 ϕ2 2 13 40 3.48 1.70 17.68 8.63 4.26 2.08 9.36 4.57 4 7 80 2.86 0.70 23.10 5.64 4.12 1.01 9.16 2.24 10 4 200 3.20 1.28 20.60 8.24 4.64 1.86 7.80 3.12 (b)dim(X)≈105 ϕ1 2 17 40 3.34 0.20 23.70 1.45 4.26 0.26 10.80 0.66 4 8 80 2.54 0.31 21.52 2.63 3.48 0.48 7.72 0.94 10 5 200 3.38 0.27 29.04 2.32 4.60 0.37 12.24 0.98 ϕ2 2 17 40 5.14 0.31 23.28 1.42 7.16 0.44 12.14 0.74 4 8 80 3.34 0.41 26.44 3.23 4.66 0.57 10.36 1.26 10 5 200 4.56 0.36 28.44 2.28 6.54 0.52 16.38 1.31 (c)dim(X)≈106 ϕ1 2 20 40 3.62 0.07 27.70 0.52 4.52 0.09 13.28 0.25 4 10 80 2.96 0.06 27.16 0.52 4.20 0.08 10.60 0.20 10 6 200 4.12 0.08 28.42 0.57 6.06 0.10 16.08 0.32 ϕ2 2 20 40 6.12 0.12 27.08 0.52 11.20 0.21 16.96 0.32 4 10 80 5.02 0.10 29.60 0.56 6.98 0.13 16.74 0.32 10 6 200 5.96 0.12 30.00 0.60 7.12 0.14 19.24 0.38
ϕ1andlessthanfouriterationsforresponsesurfaceϕ2,with
opti-mumvalueequalto13.89.Themethodperformsparticularlywell
whendim(X)≈106consideringresponsesurfaceϕ1,sinceNACO
needsless than0.07%(inaverage)of thetotal spacetoachieve
theoptimum.MMASperformsverybadlyinallthesimulations
becauseitdoesnotbenefitfromtheinclusionofheuristic
informa-tion,anessentialcomponentinACOalgorithms.Thisresultshows
thatthecombinationofthenaïveBayesclassifierintheACO
frame-workbooststheperformanceofNACOoutperformingalltheother
approaches.
Ifdandkincrease,NACOisstillabletoreachtheoptimumin
areasonabletime.Infact,consideringresponsesurfaceϕ2,k=10
andd=6,NACOrequires5.17iterations(inaverage)toreachthe
optimum,whichis61.67.Intheothercase,responsesurfaceϕ1,
NACOrequires3.57iterations(inaverage)toreachtheoptimum,
whichis0.
InFig.5,wecomparetheconvergenceofourmethodwithCOIL,
MMASandGAmethods.ThisanalysisconfirmsthatNACO
outper-formsMMASandGA,infactNACOtypicallyreachestheminimum
fasterthantheothertwoapproaches.NACOandCOILapproaches
havesimilarperformanceinalmostalldimensions.
WefurtherinvestigatedthedifferencebetweenNACOandall
theotherapproachesperformingapairwiseStudent’st-test.The
nullhypothesis,H0,isthattheaveragebestvalueisequalbetween
twoapproachesotherwisethetwovaluesaredifferent(alternative
hyphotesis,H1).Ifthep-value,p,islessthan(orequalto)˛pthen
thenullhypothesisisrejectedinfavorofthealternative
hypoth-esis.NACOis significantlydifferentwith˛p=0.05 ina pairwise
comparisonwithMMASinallinstances.NACOisalsosignificantly
differentwith˛p=0.05withGAmethodconsideringk=2andd=20
andk=10andd=(5,6)withresponsesurfaceϕ1.Ifweconsider
responsesurfaceϕ2,NACOissignificantlydifferentwithk=4and
d=10andk=10andd=(5,6).Ifweset˛p=0.10,NACOis
signif-icantlydifferentincomparisonwithGAmethodinallinstances
exceptfork=4and d=(7,10)withϕ1 andk=2and d=(13,17)
withϕ2.ThepairwiseStudent’st-testconfirmsthatNACOandCOIL
havesimilarbehaviourinallinstancessinceweacceptedthenull
hypothesisH0with˛p=0.05.
TofurtherunderstandthemaindifferencebetweenNACOand
COIL,weinvestigatethecomputationaltimeasafunctionofthe
averagenumberofsecondsneededtoreachtheoptimalvalue.We
thenranthetwoalgorithmsassumingthesamesettingofthe
previ-ousstudy.Alsointhiscase,wecomputedaMonteCarlosimulation
composedby50runs.Arunwasstoppediftheglobalminimum
wasachievedbeforeamaximumnumberofiterationsT=15.For
thisstudy,wedecidedtolimitthenumberofiterationssincewe
areinterestedinthecomputationalcostandnotinreachingthe
optimum.TheresultsarereportedinTable2inwhichthe
aver-agenumberofseconds,*sec,andrelativestandarddeviation,(,are
reportedforbothalgorithms.Intermsofcomputationaltime,NACO
performsbetterthanCOILinalmostalldimensions.Insomecases
thedifferenceislarge.Forinstance,NACOrequireslessthan13sto
reachtheoptimumwhenk=2andd=20whitϕ1.COILneedsmore
than38min.
Alsointhisstudy,weappliedapairwiseStudent’st-testtoassess
iftheaveragecomputationaltimeissignificantlydifferentbetween
theNACOandCOILmethods,alternativehyphotesisH1.TheNACO
p-value,p,islessthan˛p=0.05,thenthealternativehypothesisis
acceptedinalmostallinstancesexceptfork=10andd=(5,6)with
ϕ2.
6. Theartificial1AGYserineesterase
The NACO approach was used to address the problem of
designinganartificialenzymewhichmimicsthefold(i.e.
three-dimensionalshape)ofthenaturalenzyme1AGYoffungusFusarium
solani [38] (see Fig.6).Artificial enzymes composedof 4
sub-sequences (d=4) linkedtogether sequentially werebuilt and a
solutionxwasthenmadefromasequenceofthefourvariables.
Eachsubsequencewasembodiedbyaspecificstringofaminoacids
calledpseudo-domainschosenfrom acollectionof 95 different
pseudo-domains.Accordingly,thecorrespondingvariabledomain
wascomposedofk={1,...,95}levels.ThesearchspaceX
com-prisedallthepossiblepermutationsrepeatingkpseudo-domainsin
dpositionsequalling954=8.1×107possiblesolutions.For
M.Borrottietal./AppliedSoftComputing49(2016)259–268
Fig.5.ConvergencebehaviorforExperiment1(ϕ1)andExperiment2(ϕ2)fortheNACO(solidlines),COIL(dashedlineswithdots),MMAS(dottedlines)andGA(dashed
lines)methods.Thefirst,secondandthirdrowsofthedisplayshowMonteCarloaveragesofthebestresponsevalueford=13,17,20,respectively,andk=2.Thevertical barsare95%MonteCarloconfidenceintervals.
66).Inthisproblem,theobjectivefunctionwastomaximizethe
structuralsimilaritybetweenagivencandidateartificialenzyme
andthenaturalenzyme1AGY[45].Thesimilarityscoreis
calcu-latedbymeasuringthe2Dstructuresimilaritytothetargetenzyme
bypairwisealignmentof2DstructurespredictedusingPSIPRED
algorithm[46,47].Thescorerangesfrom0to1000,thehigherthe
scorethehigheristhestructuresimilaritybetweenthecandidate
artificialenzymeandthetargetenzyme.
Generally,inenzymeengineeringproblems,direct
experimen-talevaluationofpotentialsolutionsistheonlymeansforknowing
how an enzyme performs. Experimentation, however, is costly
andtimeconsuming.Inourstudy,realexperimentationwas
cou-pledwiththeNACOalgorithm. Solutionswereevaluatedin the
realworldby conductingphysicalexperiments andby creating
newcandidatesolutionsusedinacomputationalsetting.Atthis
point, evaluationswerefedbacktothecomputationalphaseof
theapproachand thegenerationofsubsequentsolutionswasa
functionofthese.
Themaximumnumberofiterationswasestimatedin5cycles
considering current lab procedures and corresponding costs.
Furthermore, at each iteration, the number of candidate
solu-tions was 96, which corresponded to the technology used for
experimentations(96microwellplate).
6.1. Graphrepresentation
To address the Enzyme Engineering problem of designing a
particularenzyme whichmimics thenatural enzyme1AGY,we
Table2
Averagenumberofseconds,*sec,neededtoreachtheoptimum.ThemaximumnumberofiterationswasfixedtoT=15.Thestandarddeviation,(,isalsoreported.
213 217 220 47 48 410 104 105 106 ϕ1 NACO *(sec 9.571.13 10.465.69 12.253.19 12.957.12 10.756.14 16.796.15 21.4467.08 27.7963.03 124.4542.20 COIL *(sec 12.331.73 230.7065.05 2288.67821.85 1.296.66 33.755.18 6303.6015908.38 1.283.39 40.836.55 679.2388.28 ϕ2 NACO *(sec 9.213.09 27.0720.89 55.1959.32 17.747.73 16.806.18 35.9616.50 31.1147.04 74.5792.62 1765.895035.15 COIL *(sec 13.493.81 468.05256.62 17549.7423912.79 2.277.86 38.8612.97 1772.351095.41 1.814.04 11.7856.37 8639.3318928.35
M.Borrottietal./AppliedSoftComputing49(2016)259–268
Fig.6. Thecrystalstructureoftargetprotein1AGY.Thepurpleelements
repre-sentalphahelices,theblueonesturnsandtheyellowonesabetastrand.Gray
linesarepredictedcoilwithnodefinedsecondarystructure.Catalyticresiduesare
representedinball-and-stickconfiguration.
designedanetworkwhereanodecorrespondstoapseudo-domain
andanarccorrespondstotheconnectionbetweenvariableiina
positionandvariablejinthenextpositionoftheenzymesequence
(seeFig.7).Thetotalnumberofnodeswas95×4=380sincean
enzymeis composedof 4subsequences (d=4)and each
subse-quence can be chosen among k=95 different pseudo-domains.
Therefore,acandidatesolutionisapathcomposedof4variables
(4nodes);wherethis pathrepresentsafull-lengthenzyme.We
assumedthatnoheuristicinformationwasavailablewithregards
totheproblem.
6.2. Results
Toaddresstheproblemofdiscoveringtheoptimalcombination
thatmimicsthefold(i.e.three-dimensionalshape)ofthenatural
Fig.7. Antcolonygraphrepresentationwhereantsmovefromxitoxj.Asolutionis
apathcomposedbydnodesandd−1arcs.
enzyme1AGY,wedevelopedNACOandwecomparedtheresults
with the COIL approach. The main descriptive statistics of the
responsevaluesarereportedinTable3.NACOidentifiedthehigher
responsevalueafterfouriterationsreachingtheresponsevalue
845.0andsampling384pointsinthesearchspace.COILneeded480
pointstoreachthesamevalue.Thisresultshowsthatinan
exper-imentalsettingNACO,comparedtoCOIL,ismoreefficientinterms
ofreducedcosts.Further,thethirdquantileinTable3,showsthat
NACOhasthecapacitytomovefastertowardstheoptimalregion
withrespecttoCOIL,incrementingthevalueof thethird
quan-tilebyabout50%fromthefirsttotheseconditeration.Thisisalso
confirmedapplyingaStudent’st-testbetweenvaluesachievedby
NACOandCOILmethods.WeperformedapairwiseStudent’st-test
foreachiterationwithH1:*NACOc =/*COILc withc=1,...,5.
Consid-eringiterationc=1,weacceptedH0sincethetwoapproachesstart
fromthesamesetofartificialenzymes.Consideringiterationsc=2
and3,theaveragebiological fitnessscoreresponsesare
signifi-cantlydifferentbetweenNACOandCOIL,whichconfirmsadifferent
behaviourtowardstheoptimalregion.Inthelasttwoiterations,
c=4and5,weacceptedH0.Ifweconsideralltheartificialenzymes
testedbythetwoapproaches,theaveragebiologicalfitnessscore
responsesaresignificantlydifferent.
In Fig.8,we reportedthebestenzyme sequence(see [5,11]
formoredetailsaboutaminoacidssequences),foundbyNACO,
thatmimics thetertiarystructureof thenaturalenzyme 1AGY.
1AGYischaracterizedbybilaterallysymmetricwithabeta-sheet
coreandasetofalphahelicesconnectedbyunstructuredloops.
ThebestartificialenzymefoundbyNACOinthefourthiteration
presentsasolidstructurearoundabeta-sheetcoreandbilateral
symmetrysimilartothenaturalenzyme1AGY.Althoughthebest
artificialenzymeidentifiedin thefourthiterationlackssomeof
001 - 050 EACI QGE SP I WAVGRGHATPAL RLT NVDTPQKP DAQ KDG RRL LKQEL TG L
051 - 100 D D S Y A V P R K A D C E T D G Q T R S E D S S K K Y K A S L F S A T M E V M Q Q W C Y K K T Q W D
101 - 150 QP HVQYQEE I A ES QGDL PNPGKRFNNKDSGGEKLGD NNP T GK SVP CVLA N
151 - 200 TGCDI SVNS HT YAPPET RAFDCSFKLS L EDI LANKI RKL F AH I VS KCGWG
Fig.8. AminoacidssequenceofthebestartificialenzymefoundbyNACO.Eachrowisasubsequenceof50aminoacids(i.e.pseudo-domain). Table3
MaindescriptivestatisticsofNACOandCOILforthe5iterations.Ineachiteration96solutionsaretested.
Iter. Min. 1stQu. Median Mean 3rdQu. Max
(a)NACOperformance
1 189.0 327.5 379.0 384.0 432.0 596.0 2 306.0 568.8 601.5 602.7 647.2 792.0 3 436.0 534.2 589.0 598.3 650.2 830.0 4 241.0 476.5 579.4 579.4 712.8 845.0 5 402.0 509.5 582.0 602.4 751.2 841.0 (b)COILperformance 1 189.0 327.5 379.0 384.0 432.0 596.0 2 230.0 338.8 399.5 424.8 498.2 759.0 3 217.0 412.0 493.0 482.4 549.8 754.0 4 219.0 491.0 562.5 552.8 615.2 780.0 5 400.0 533.5 585.0 590.1 643.0 845.0
M.Borrottietal./AppliedSoftComputing49(2016)259–268
thestructuralelementsofthetargetenzyme(e.g.misplacement
ofthealphaheliceswithrespecttobetacore),itclearlyshowsan
overallsimilaritytothetargetenzyme.
7. Conclusion
In this work, we have presented a novelapproach for high
dimensionalproblemswithcomputationally-expensivecomplex
functionswithoutaprioriknowledge.Themainideabehindthe
newmethodistocombinethenaïveBayesclassifiertoantcolony
optimizationtocreatetheresultingnaïveBayesantcolony
opti-mization(NACO)algorithm.
Numericalresultspoint outquick convergencetotheglobal
optimum.Oneinterestingfindingistheremarkablescalabilityof
themethodtotheoverallsizeofthedesignspaceX.Asthenumber
ofvariables/variableinteractionsincreases,NACOshowsgrowing
efficiency,measuredintermsofthetotalnumberofexperiments
neededtoreachtheoptimum.ThisaspectmakesNACOavalid
choicewhendealingwithhigh-dimensionalsettingscharacterized
bylargedesignspaces.
NACOhasalsobeentestedonacomplexenzymeengineering
problemwiththeaimofidentifyinganartificialenzymewhich
mimics thenatural enzyme 1AGY. The approach representsan
interactiveprocesswherethedialoguebetweendesignand
lab-oratory experimentation at each generation creates a path in
the combinatorial search spacethat may lead toward a region
of optimality. The evolving design requires a small number of
experimentalpointstotestfromgenerationtogenerationand,
con-sequently,alimitedinvestmentintermsofresources.Furthermore,
thesmallnumberoftestsmakeeachexperimentalphaseextremely
rapid.
Acknowledgements
Thiswork wassupported bythe EU integrated project
Pro-grammableArtificialCellEvolution(PACE)inFP6-IST-FET-002035,
Complex Systems Initiative and by the Fondazione di Venezia
– DICE project. The authors would like to acknowledge the
EuropeanCentreforLivingTechnology(www.ecltech.org)for
pro-vidingopportunitiesofpresentationandfruitfuldiscussionofthe
research.Furthermore,theauthorsaregratefultoThomas
Stüt-zle,MauroBirattariandAlessandraGiovagnolifortheirveryuseful
suggestions.Theauthorsgratefullyacknowledgetheeditorandthe
refereesfortheircommentsandsuggestions.
References
[1]C.-I.Branden,J.Tooze,IntroductiontoProteinStructure,GarlandPub.,1999.
[2]A.Zanghellini,Denovocomputationalenzymedesign,Curr.Opin.Biotechnol. 29(C)(2014)132–138.
[3]J.Damborsky,J.Brezovsky,Computationaltoolsfordesigningand engineeringenzymes,Curr.Opin.Chem.Biol.19(C)(2014)8–16.
[4]B.M.Nestl,B.A.Nebel,B.Hauer,Recentprogressinindustrialbiocatalysis, Curr.Opin.Chem.Biol.15(2)(2011)187–193.
[5]A.M.M.S.Ullah,ADNA-basedcomputingmethodforsolvingcontrolchart patternrecognitionproblems,CIRPJ.Manuf.Sci.Technol.3(4)(2010) 293–303.
[6]V.K.Garlapati,P.R.Vundavilli,R.Banerjee,Evaluationoflipaseproductionby geneticalgorithmandparticleswarmoptimizationandtheircomparative study,Appl.Biochem.Biotechnol.162(5)(2010)1350–1361.
[7]A.M.Mohsen,A.T.Khader,D.Ramachandram,Anoptimizationalgorithm basedonharmonysearchforRNAsecondarystructureprediction,in:Z.W. Geem(Ed.),RecentAdvancesinHarmonySearchAlgorithm,Studiesin ComputationalIntelligence,vol.270,SpringerBerlinHeidelberg,2010,pp. 163–174.
[8]Y.Zhang,J.Xu,Z.Yuan,H.Xu,Q.Yu,Artificialneuralnetwork-genetic algorithmbasedoptimizationfortheimmobilizationofcellulaseonthesmart polymerEudragitL-100,Bioresour.Technol.101(9)(2010)3153–3158.
[9]B.Blum,M.I.Jordan,D.Baker,Featurespaceresamplingforprotein conformationalsearch,Proteins78(6)(2010)1583–1593.
[10]M.Borrotti,D.DeLucrezia,G.Minervini,I.Poli,Amodelbasedantcolony designfortheproteinengineeringproblem,in:M.Dorigo,M.Birattari,G.Di Caro,R.Doursat,A.Engelbrecht,D.Floreano,L.Gambardella,R.Groß,E.Sahin, H.Sayama,T.Stützle(Eds.),SwarmIntelligence,LectureNotesinComputer Science,vol.6234,SpringerBerlinHeidelberg,2010,pp.352–359.
[11]A.M.M.S.Ullah,D.D’Addona,N.Arai,DNAbasedcomputingforunderstanding complexshapes,BioSystems117(1)(2014)40–53.
[12]K.A.DeJong,EvolutionaryComputation,MITPress,2006.
[13]D.E.Goldberg,GeneticAlgorithmsinSearch,OptimizationandMachine Learning,Addison-WesleyLongmanPublishingCo.,1989.
[14]J.H.Holland,AdaptationinNaturalandArtificialSystems,MITPress,1992.
[15]R.Baragona,F.Battaglia,I.Poli,EvolutionaryStatisticalProcedures, Springer-Verlag,2011.
[16]R.Berni,D.DeMarch,F.M.Stefanini,T-optimalityandneuralnetworks:a comparisonofapproachesforbuildingexperimentaldesigns,Appl.Stoch. ModelsBus.Ind.29(5)(2013)454–467.
[17]F.Caschera,M.A.Bedau,A.Buchanan,J.Cawse,D.deLucrezia,G.Gazzola, M.M.Hanczyc,N.H.Packard,Copingwithcomplexity:machinelearning optimizationofcell-freeproteinsynthesis,Biotechnol.Bioeng.108(9)(2011) 2218–2228.
[18]J.Ji,R.Hu,H.Zhang,C.Liu,AhybridmethodforlearningBayesiannetworks basedonantcolonyoptimization,Appl.SoftComput.11(4)(2011) 3373–3384.
[19]R.Tang,S.Fong,X.S.Yang,S.Deb,Integratingnature-inspiredoptimization algorithmstok-meansclustering,in:SeventhInternationalConferenceon DigitalInformationManagement,2012,pp.116–123.
[20]M.Forlin,D.Slanzi,I.Poli,Combiningprobabilisticdependencymodelsand particleswarmoptimizationforparameterinferenceinstochasticbiological systems,in:F.L.Gaol,Q.V.Nguyen(Eds.),Proceedingsofthe20112nd InternationalCongressonComputerApplicationsandComputationalScience, AdvancesinIntelligentandSoftComputing,vol.145,SpringerBerlin Heidelberg,2012,pp.437–443.
[21]D.Ferrari,M.Borrotti,D.D.March,Responseimprovementincomplex experimentsbyco-informationcompositelikelihoodoptimization,Stat. Comput.24(3)(2014)351–363.
[22]Y.Rubinstein,D.P.Kroese,TheCross-EntropyMethod:AUnifiedApproachto MonteCarloSimulation,RandomizedOptimizationandMachineLearning, SpringerVerlag,2004.
[23]B.G.Lindsay,Compositelikelihoodmethods,in:N.U.Prabhu(Ed.),Statistical InferencefromStochasticProcesses,1988,pp.221–239.
[24]M.Dorigo,Optimization,learningandnaturalalgorithms(Ph.D.thesis), PolitecnicodiMilano–DEI,1992.
[25]M.Dorigo,V.Maniezzo,A.Colorni,Antsystem:optimizationbyacolonyof cooperatingagents,IEEETrans.Syst.ManCybern.B:Cybern.26(1)(1996) 29–41.
[26]M.Dorigo,G.DiCaro,Theantcolonyoptimizationmetaheuristic,in:D.Corne, M.Dorigo,F.Glover,D.Dasgupta,P.Moscato,R.Poli,K.V.Price(Eds.),New IdeasinOptimization,McGraw-Hill,1999,pp.11–32.
[27]T.M.Mitchell,MachineLearning,McGraw-Hill,1997.
[28]R.Montemanni,D.H.Smith,L.M.Gambardella,Aheuristicmanipulation techniqueforthesequentialorderingproblem,Comput.Oper.Res.35(2008) 3931–3944.
[29]C.Blum,Beam-ACO–hybridizingantcolonyoptimizationwithbeamsearch: anapplicationtoopenshopscheduling,Comput.Oper.Res.32(6)(2005) 1565–1591.
[30]C.Blum,M.Y.Vallès,M.J.Blesa,AnantcolonyoptimizationalgorithmforDNA sequencingbyhybridization,Comput.Oper.Res.35(11)(2008)3620–3635.
[31]S.J.Shyu,C.Y.Tsai,Findingthelongestcommonsubsequenceformultiple biologicalsequencesbyantcolonyoptimization,Comput.Oper.Res.36(1) (2009)73–91.
[32]P.J.Bickel,E.Levina,SometheoryforFisher’slineardiscriminantfunction, naïveBayes,andsomealternativeswhentherearemanyvariablesthan observations,Bernoulli10(6)(2004)989–1010.
[33]I.Rish,AnempiricalstudyoftheNaïveBayesclassifier,in:TheIJCAI-01 WorkshoponEmpiricalMethodsinArtificialIntelligent,2001, pp.41–46.
[34]M.Yousef,M.Nebozhyn,H.Shatkay,S.Kanterakis,L.C.Showe,M.K.Showe, Combiningmulti-speciesgenomicdataformicroRNAidentificationusinga NaïveBayesclassifier,Bioinformatics22(11)(2006)1325–1334.
[35]G.L.Rosen,E.R.Reichenberger,A.M.Rosenfeld,NBC:theNaïveBayes classificationtoolwebserverfortaxonomicclassificationofmetagenomic reads,Bioinformatics27(1)(2011)127–129.
[36]M.Sahami,S.Dumais,D.Heckerman,E.Horvitz,ABayesianapproachto filteringjunke-mail,in:LearningforTextCategorization:Papersfromthe 1998Workshop,1998,pp.1–8.
[37]F.Sambo,E.Trifoglio,B.D.Camillo,G.M.Toffolo,C.Cobelli,BagofNaïveBayes: biomarkerselectionandclassificationfromgenome-wideSNPdata,BMC Bioinf.13(14)(2012)1–10.
[38]S.Longhi,M.Czjzek,V.Lamzin,A.Nicolas,C.Cambillau,Atomicresolution (1.0a)crystalstructureofFusariumsolanicutinase:stereochemicalanalysis,J. Mol.Biol.268(4)(1997)779–799.
[39]C.Blum,A.Roli,Metaheuristicincombinatorialoptimization:overviewand conceptualcomparison,ACMComput.Surv.35(3)(2003)268–308.
[40]L.Gambardella,M.Dorigo,Anantcolonysystemhybridizedwithanewlocal searchforsequentialorderingproblem,INFORMSJ.Comput.12(3)(2000) 237–255.
M.Borrottietal./AppliedSoftComputing49(2016)259–268 [41]P.Pellegrini,E.Moretti,Acomputationalanalysisonahybridapproach:
quick-and-dirtyantcolonyoptimization,Appl.Math.Sci.3(23)(2009) 1127–1140.
[42]T.Stützle,H.Hoos,MAX–MINantsystem,FutureGener.Comput.Syst.16 (8)(2000)889–914.
[43]X.-S.Yang,EngineeringOptimization:AnIntroductionwithMetaheuristic Applications,JohnWileyandSons,2010.
[44]M.Dorigo,T.Stützle,AntColonyOptimization,BradfordCompany,Scituate, MA,USA,2004.
[45]G.Minervini,G.Evangelista,L.Villanova,D.Slanzi,D.D.Lucrezia,I.Poli,P.L. Luisi,F.Polticelli,Massivenon-naturalproteinsstructurepredictionusing gridtechnologies,BMCBioinf.10(6)(2009)1–9.
[46]D.T.Jones,Proteinsecondarystructurepredictionbasedonposition-specific scoringmatrices,J.Mol.Biol.292(2)(1999)195–202.
[47]D.Slanzi,D.DeLucrezia,I.Poli,QueryingBayesiannetworkstodesign experimentswithapplicationto1AGYserineesteraseproteinengineering, Chemom.Intell.Lab.Syst.149(A)(2015)28–38.