Naïve colony Bayes optimization ant for designing high dimensionalexperiments Applied Soft Computing

(1)

ContentslistsavailableatScienceDirect

Applied

Soft

Computing

jo u r n al hom e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / a s o c

Naïve

Bayes

ant

colony

optimization

for

designing

high

dimensional

experiments

M. Borrotti

a,b,∗

_,

_G.

_Minervini

d

_,

_D.

_De

_Lucrezia

e

_,

_I.

_Poli

b,c

a_Institute_of_Applied_Mathematics_and_Information_{Technologies,}_CNR-IMATI,_via_Bassini_15,₂₀₁₃₃_Milan,_Italy b_European_Centre_for_Living_Technology,_Ca’_Foscari_University_of_Venice,_San_Marco_2940,₃₀₁₂₄_Venice,_Italy

c_Department_of_{Environmental}_Science,_Informatics_and_Statistics,_Ca’_Foscari_University_of_Venice,_Dorsoduro_2137,₃₀₁₂₃_Venice,_Italy d_Department_of_Biology,_University_of_Padua,_Via_U._Bassi_58,₃₅₁₂₁_Padua,_Italy

e_Explora_Biotech_S.r.l.,_Via_della_Libertá_9,₃₀₁₇₅_Venice,_Italy

a

r

t

i

c

l

e

i

n

f

o

Articlehistory:

Received15October2015 Receivedinrevisedform8July2016 Accepted10August2016

Availableonline22August2016 Keywords:

Antcolonyoptimization NaïveBayesclassifier Experimentaldesign Enzymeengineering Highdimensionality

a

b

s

t

r

a

c

t

Inalargenumberofexperimentalproblems,highdimensionalityofthesearchareaandeconomical constraintscanseverelylimitthenumberofexperimentalpointsthatcanbetested.Withinthese con-straints,classicaloptimizationtechniquesperformpoorly,inparticular,whenlittleaprioriknowledgeis available.Inthisworkweinvestigatethepossibilityofcombiningapproachesfromstatisticalmodeling andbio-inspiredalgorithmstoeffectivelyexploreahugesearchspace,samplingonlyalimitednumberof experimentalpoints.Tothispurpose,weintroduceanovelapproach,combiningantcolonyoptimization (ACO)andnaïveBayesclassifier(NBC)thatis,thenaïveBayesantcolonyoptimization(NACO) proce-dure.WecompareNACOwithothersimilarapproachesdevelopingasimulationstudy.Wethenderive theNACOprocedurewiththegoaltodesignartificialenzymeswithnosequencehomologytotheextant one.Ourfinalaimistomimicthenaturalfoldof200aminoacids1AGYserineesterasefromFusarium solani.

1. Introduction

Enzymesare biologicalmoleculesthat catalyzethousandsof

chemicalreactionsthatsustainlife.Eachenzymeisapolymer

gen-eratedfromaseriesofupto20differentaminoacids.Polymers

canbe representedaswordsformed accordingtothealphabet

a={a1,a2,a3,...,a20}.Theymaydifferinlength,sequenceand

aminoacidorder.Asecondaryandtertiarystructure,whichdefines

enzymeactivity,isassociatedtoeachsequence[1].Enzyme

activi-tiescanbeassociatedtoeachofthesecombinationsofaminoacids

throughexperimentation.Biologicalexperimentationaimsto

opti-mizethesequenceofaminoacidsinordertofindenzymeswith

thebestactivities.Thenumberofpossiblecombinationsofamino

acidsrapidlyincreaseleadingtoacombinatorialexplosionofthe

searchspace.Designingartificialenzymeswithdesiredproperties

is knownin computationalbiology asEnzymeEngineering[2,3].

EnzymeEngineeringisanimportanttaskinnumerousfieldssuch

∗ Correspondingauthorat:InstituteofAppliedMathematicsandInformation Technologies,CNR-IMATI,viaBassini15,20133Milan,Italy.

E-mailaddress:matteo.borrotti@mi.imati.cnr.it(M.Borrotti).

aschemicals,pharmaceuticals,fuel,foodoragriculturaladditives

[4].

In this contextit isnecessary todevelopnewenzyme

engi-neeringprocedures basedoncomputationalapproaches ableto

identifybestsolutionswiththelowestnumberofcostlyandtime

consumingexperimentations.Differentoptimizationapproaches

havebeenproposedintheliteraturetotackleproblemswithhigh

dimensionalandcomplexsearchspaces[5–11].Someapproaches

arebasedonEvolutionaryComputationAlgorithms[12].Genetic

Algorithms(GAs)[13,14],forexample,usetechniquesinspiredby

naturalevolution,i.e.mutation,selectionandcrossover,inorderto

optimizecomplexfunctions.Theapplicationoftheseapproaches

islimitedbyseveralissuessuchastheneedofsolidapriori

knowl-edgeoftherelationshipbetweenthestructureandfunctionofan

enzyme,thevastnumberofpossiblemodificationsofanexisting

enzyme,thelackofefficientscreeningproceduresabletoreduce

thenumberofmodifiedenzymestobetestedandthe

computa-tionaleffortneededtocreatenewenzymes.Apromisingsolution

iscombiningmetaheuristicalgorithmsandstatisticalmodels.This

isanovelresearchareathataddressesproblemscharacterizedby

largedesignspace,high-orderinteractionsbetweenvariables,and,

complexandnon-linearexperimentalsurfaces[15].Some

exam-plesofthesehybridizedmethodscanbefoundin[16–21].Arecent

http://dx.doi.org/10.1016/j.asoc.2016.08.018

(2)

M.Borrottietal./AppliedSoftComputing49(2016)259–268

approach isproposedin [21]where an optimizationalgorithm,

calledCo-InformationCompositeLikelihood(COIL),basedonthe

evolutionaryparadigm is introducedby coupling cross-entropy

sampling[22]andcompositelikelihoodprinciples[23]todesign

novelenzymeswithimprovedfunctionalities.

Inthispaper,weintroduceanovelapproachthatcombinesant

colonyoptimization(ACO)[24–26]andthenaïveBayesClassifier

(NBC)[27],callednaïve-Bayesantcolonyoptimization(NACO).The

ACOapproachhasbeenwidelyusedinmanyoptimization

prob-lemsrangingfromsequentialorderingstudies[28]toopenshop

scheduling[29].ACOhasbeenalsoappliedtobiologicalproblems

suchasDNAsequencingproblems[30,31].

SeveraltheoreticalandpracticalfeaturesofNBChavebeen

stud-iedthathaveshownitsreliabilityintermsofclassificationaccuracy

[32–37].Oneofthemostpowerfulproceduresproposedinthe

bio-logicalfieldisanalgorithmforgeneticbiomarkerselectionsand

subjectclassificationsfromthesimultaneousanalysisof

genome-wide Single-NucleotidePolymorphism (SNP) databased onthe

naïveBayesframework[37].

ThekeyideaofNACOistoincludeinformationachievedviathe

NBCmethodinthepathconstructionmechanismofACO,inorderto

drivethesearchprocesstowardsthetargetregioninamuchmore

effectiveway.NACOwasevaluatedbothinasimulationstudyand

inarealcaseofbuildingartificialenzymes.Inparticular,theNACO

algorithmwasconstructedandasimulationcomparisonwithaset

ofotherproceduresfrequentlyusedforthispurposewas

devel-oped.Thealgorithmwasthenderivedwiththegoalofdesigning

artificialenzymeswithnosequencehomologytoextantones;the

finalaimbeingtomimicthenaturalfoldof200aminoacidlong

1AGYserineesterasefromFusariumsolani[38].

Thispaperisorganizedasfollows.InSection2theoptimization

problemisformalizedandinSection3,webrieflydescribetheACO

andNBCalgorithm.InSection4,weintroduceourapproach,

naïve-Bayesantcolonyoptimization(NACO).InSection5,weevaluate

NACOperformanceinasimulationstudy,comparingresultswith

well-knowntechniquesand,inSection6,weapplyNACOtothe

designofnewenzymes.Finally,someconclusionsaredrawnin

Section7.

2. Theoptimizationproblem

Enzymeengineeringcanbedescribedasacombinatorial

opti-mizationproblemP=(X,f)inwhich:

-fis theobjective function tobeoptimized and defined as f:

D1×D2×···×Dd→R+whereD1,...,Ddarethedomainsofthe

followingvariablesinX={x1,...,x_d};

-Xisthesetofallpossiblecombinationsx=(x1li,...,xdli)where

xili ∈Diindicatesthevalueofthevariablexi

∀

iwherei=1,...,d.

Thefunctionfcalculatesaresponsevalueforeachcombinationof

variables,y=f(x1l_i,...,xdli)withli=1,...,k.Withinthissetting,

thefinalaimistofindthecombination,orsolution,x∗ _∈_X_with

maximumresponsevalue,thatis,f(x∗₎_>_f_(x)

_∀

_x_∈_X.

Toaddresstheenzymeengineeringproblemweproposethat

the variables represent the set of amino-acids in the enzyme

sequence,thevariabledomainisthealphabeta,Xisthesetofall

possiblecandidateenzymestobetestedandtheobjectivefunction

istheenzymeactivitytobemaximized.

3. AntcolonyoptimizationandnaïveBayesclassifier

In this section webriefly introduce ant colonyoptimization

(ACO)andnaïveBayesclassifier(NBC),whichweintendtocombine

inannewapproachtohighdimensionaloptimizationproblems.

3.1. Antcolonyoptimization(ACO)

Antcolonyoptimization(ACO)[24]isametaheuristicalgorithm

introducedforcomplexcombinatorialproblems.Itisinspiredby

thebehaviour of specificreal antcolonies in search of foodin

theirenvironment.Innatureantsexplorethesurrounding

environ-mentforfoodsourcethroughrandomwalks.Oncefoundthefood,

antsreturntotheircolonyleavingbehindatrailofpheromones.

Pheromonesarevolatilecompoundsthatquicklydryoffsothata

shortpathfromcolonytofoodsourceismorelikelytopreserve

detectableamountofpheromones.Thisinturnwillallowmore

antstofollowthatpathandthusreinforcethepheromonetrails.

Thispositivefeedbackeventuallyleadsallantstouseasinglepath

fromthecolonytothefood.

ThemainideaoftheACOalgorithmistomimicthisbehaviour

withartificialantswalkingaroundagraphrepresentingthe

envi-ronment;morespecifically,theproblemtosolveisillustratedin

Fig.1.

ACOalgorithm isappliedtoproblems that canbedescribed

asagraphG=(N,A)whereNisthesetofnodesandAthesetof

arcsthatfullyconnectthenodes.Aweight!i,j,calledpheromone

value,isassociatedwitheacharcwhichconnectsnodeiwithnode

j.The pheromone valuerepresents theattractiveness of a

spe-cific arc for theants:the higherthe amountof pheromone on

anarc,thehighertheprobabilitythat antswillchooseitwhen

constructingsolutions.Inaddition,aheuristicvalue"i,j,which

rep-resentsaprioriinformation,isintroducedtomovefromnodeito

nodej.

The construction of the ACOalgorithm includes the

follow-ingphases[26,39]:(i)Constructantsolutions,(ii)Daemonactions

and(iii)Updatepheromone.Thefirstphase,Constructantsolutions,

representstheprocedureneededfor antstoconstructsolutions

incrementally:startingfromafirstnode,antsaddanewnodeat

eachiterationoftheprocedureinordertobuildtheentiresolution.

Anantdecideswheretogonextinaccordancewithpheromone

val-ues!i,jandtheheuristicvalues"i,j.Therelativeinfluenceofthese

twovaluesisweighedinaccordancetotwoparameters,called˛>0

andˇ≥0,respectively.Thesecondphase, Daemonactions,

com-prisesallproblem-specificoperationsthatmaybeconsideredfor

boostingtheperformanceofACOalgorithms.Themainexample

ofsuchoperationsistheintroductionoflocalsearchtechniques

[40,41].Thethirdphase,Update pheromone,includesthe

proce-duretoupdatespheromonevalues.Thisprocedureisdividedin

twophases.First,pheromoneevaporationisappliedtodecrease

pheromonevalues.Thedegreeofdecreasedependsonthe

param-eter#∈[0,1],calledtheevaporationrate.Theaimofpheromone

Fig.1.Representationofapossiblegraphwhereantsmove,fromastartingnodeto adestinationnode.

(3)

evaporationistoavoidanunlimitedincreaseinpheromone

val-ues and to allow the ant colony to forget poor choices made

previously.Apheromonedepositisthenappliedtoincreasethe

pheromonevalues thatbelongtogoodsolutionsthat antshave

generated.

In the literature there are many different versions of this

algorithm andoneofthemostpowerfulversionsof ACOis the

MAX–MINAntSystem(MMAS)[42].MMASisableto

exten-sivelyexploitthesearch spacebyallowingonlya singleantto

depositpheromoneaftereachiteration,forexample,theantthat

reachedthebestsolutioninthecurrentiteration,called

interation-bestsolutionsib_._Moreover,_it_uses_a_simple_method_for_limiting_the

strengthofthepheromonetrailthateffectivelyavoidspremature

convergenceofthesearch.

3.2. NaïveBayesclassifier(NBC)

ThenaïveBayesclassifier[27]isaprobabilisticmethod,based

onBayesrule,whichassumesthatthevariables{x1,...,xd}inthe

setXbeallconditionallyindependentfromeachother,giventhe

responsevaluey.Inotherwords,anaïveBayesclassifierassumes

thatthepresence(orabsence)ofaparticularvariableinacandidate

solutionisunrelatedtothepresence(orabsence)ofanyother

vari-ableand,further,assumesthatinteractionsbetweenvariablesdo

notinfluencetheresponsevalue.Moreprecisely,consideringa

can-didatesolutionx=(x1l_i,...,xdli)wherexili ∈Diindicatesthevalue

ofthevariablexi

∀

iwherei=1,...,d,wecancalculateprobability

P(x1li,...,xdli|y)asfollows: P(x1li,...,xdli|y)= d

!

i=1 P(xili|y). (1)

TheaimofthenaïveBayesclassifieristoderivethe

probabil-itydistributionofy,foreachnewcandidatesolutionnotalready

tested.StartingfromEq.(1),itispossibletoderivethenaïveBayes

algorithminthefollowingway:yisassumedtobediscretewithr

possiblevaluesandvariables{x1,...,x_d}canassumeanydiscrete

orrealvalues.Theprobabilitythatywilltakeitsrthpossiblevalue,

accordingtoBayesrule,isgivenas:

P(y=yr|x1l_i,...,xdli)=

P(y=yr)P(x1li,...,xdli|yr)

"

r

j=1P(y=yj)P(x1li,...,xdli|yj)

, (2)

wherethesumistakenoverallpossiblevaluesy_jwithj=1,...,r.

Assumingthen{x1,...,x_d}tobeconditionallyindependentgiven

y,wecanwritethefollowingequation:

P(y=yr|x1l_i,...,xdli)=

P(y=yr)

#

d_i=1P(x_il_i|yr)

"

r

j=1P(y=yj)

#

d_i=1P(xili|yj)

. (3)

Eq.(3)isthefundamentalequationforthenaïveBayesclassifier.

Givenanewcombinationofvariablesxnew_,_this_equation_allows_to

derivetheprobabilitythatywilltakeapossiblevalueinitsdiscrete

domain,givenxnew_and_the_{distributions}_P(y)_and_P(x

ili|y)withi=1,

...,destimatedfromtheavailablesetofcandidatesolutionsand

theassociatedsetofresponsevalues.

4. Naïve-Bayesantcolonyoptimization(NACO)

Inbuildingthenaïve-Bayesantcolonyoptimization(NACO),

we assumethat a colony ofants mutuallycollaborate toreach

theoptimum in ahighdimensional andcomplex searchspace.

Thiscolonysharesinformation andcollaborates viapheromone

values and information extraction by NBC. Pheromone update

rules are appliedonce allants have builttheirsolutions based

Fig.2. FlowdiagramofNACOapproach.

on the iteration-best solution, sib_, _and _the _probability

distri-bution to obtain a certain response given a specific set of

variables.

TheconstructionofNACOapproach(seeFig.2)canbedescribed

asfollows:

1.Initializationoftheprobabilitywithwhichanantwchoosesto

gofromnodeitonodej.Eachnodehasthesameprobabilityto

bechosen.

2.Generationofasetofcandidatesolutionsofsizem(colonyof

ants).

3.Evaluationoftheresponseyofeachcandidatesolutions

accord-ingtotheobjectivefunctionf.

4.Iftheoptimalsolutionis found(orthemaximumnumberof

iterationsisreached)stoptheprocedure.Otherwise,continue.

5.Identificationoftheiteration-bestsolution,sib_.

6.CalculationoftheNBConthecurrentsetofcandidatesolutions.

Ateach iteration, NBCfocuses onthevalues oftheresponse

greaterthanaproblem-specificthreshold$,with$ ∈R.

7.Updatingtheprobabilitywithwhichanantwchoosestogofrom

nodeitonodejusingtheinformationextractedinpoints3and

4.

8.Gotopoint2.

Ateachiteration,thecandidatesolutionresponseiscomputed.

Inordertoupdatetheprobabilitywithwhichanantwchoosesto

gofromnodeitonodej,NACO,first,identifiestheiteration-best

solution,sib_and_then,_in_order_to_focus_on_the_candidate_solutions

thathavereachedthehighestresponses,candidatesolutionsare

dividedintwoclassesaccordingtoa certainconstantthreshold

($, with $ ∈R) such as the specific quantile of the response

(4)

function,$ canbeequal tothefirstquartile.Consequently,the

followingrulewillbeapplied:

yr=

$

y_r=1=1 I(y≤$)=1

y_r=0=0 I(y≤$)=0,

whereI()istheindicatorfunction.

Giventhesetofcandidatesolutionsandtheassignedclasses,

theNBCderivestheprobabilitiesP(y=yr)andP(x_il

i|yr)withr={0,

1}.Consideringcategoricalvariablestheseprobabilitydistributions

arerepresentedwithconditionalprobabilities,whichareestimated

fromthedatabycountingtheoccurrencesof eachvariableand

classlabelyr.Allthepossiblecombinationsobtainedwiththexili

inthecandidatesolutionswithyr=1=1arethenconsideredandthe

correspondingP(y=yr|xnew)calculated.Thepossiblesolutionwith

thehighestprobabilityisthen selectedandtherelatedP(xili|yr)

areusedto weighonthearcs.These weights arenamed Naïve

Informationanddenoted%.

4.1. Probabilityupdateprocedure

InNACO,eacharcdescribestheprobabilityofmovingfromnode

itonodej.Thisprobabilityisusuallycalculatedonthebasisof

pheromonequantityandheuristicinformation,i.e.apriori

infor-mationconcerningtheproblem.Inourapproachthisprobability

alsodependsonNaïveInformation%.

Giventhestateofthegraphatiterationt−1ofthealgorithm,

theNaïveInformation%ijisextractedfromthesetofsolutions

eval-uatedatiterationtandthesetof{%ij}isupdated.Subsequently,

theiteration-bestsolutionsib_is_identified_and_the_{corresponding}

pheromonevalueisupdated.

Theprobabilityatiterationtofgoingfromnodeitonodejwill

beobtainedbycombiningthepheromonevalueswiththeheuristic

values(ifavailable)andwithNaïveInformationasfollows:

pij(t)= [!ij(t)] ˛_[" ij]ˇ[%ij(t)]ı

"

p∈Ni[!ip(t)] ˛_[" ip]ˇ[%ip(t)]ı

∀

p∈Ni, (4)

where!ij(t)istheamountofpheromonevalueand%ij(t)isthe

NaïveInformationonarc(i,j)atiterationtand"i,jisthe

heuris-ticvalueonthesamearc(i,j).Ni denotesthesetofnodesthat

canbevisitedfromnodeiand˛,ˇandıaretheparametersthat

controltherelativeweightofthepheromonetrail,heuristic

infor-mationandNaïveInformation.Afteralltheantshavecompleted

theirtour,pheromoneevaporationonallarcsisappliedconsistent

withevaporationfactor#,#∈(0,1].

ACOperformancestronglydependsontheavailabilityof

rel-evant apriori information of theproblem. The introduction of

NaïveInformationaimstoavoidtheneedforheuristic

informa-tionin order totackle high-dimensional design problemswith

computationally-expensive complex functions without a priori

knowledge.

5. Simulationstudy

WeevaluatedtheefficiencyandrobustnessofNACOapproach

by developing a simulation study where we compared this

approachwithotheroptimizationtechniques.Weconsideredtwo

benchmark functions selected to test the performance of the

techniques.Thesefunctionsarefrequentlyusedtocompare

opti-mizationproceduresintheliterature[21,43].

5.1. Benchmarkfunctions

In oursimulation studytheresponse value ofeach solution

wasderivedassumingthemodely=ϕh(x)+

'

,

'

∼N(0,(_h2),where

h=1,2denotestwotypesofresponsesurface.Theoptimalsolution

minimizesthedeterministicfunctionϕh,whichwasinterpretedas

theobjectivefunction.WecomputedMonteCarlosimulation

com-posedof50runsandcharacterizedbydvariableswithklevelseach.

Arunwasstoppediftheglobalminimumwasachievedbeforea

maximumnumberofiterationsT=30.

TheResponseSurface1takesthefollowingform:

ϕ1(x1,...,xd)=10d+ d

%

i=1 k

%

l=1

&

ˇ_il2Iil−10cos(2)ˇil)Iil

'

, (5)

whereI_il=1,ifx_i=x_il,andI_il=0,otherwise.Foralli=1,...,d,the

constantsˇ_il,l=1,...,k,formedanequallyspacedgridofkelements

on(−5,5].Thefunctionϕ1hasmultiplelocalminimawithsimilar

valueswithrespecttotheglobalminimum.

TheResponseSurface2takesthefollowingform:

ϕ2(x1,...,x_d)=

%

E∈P:|E|≤d∗

"E, (6)

where"EistheeffectcorrespondingtotheindexsetE ∈P,P=

P({1,...,d})thepowersetconstructedfromindexes{1,...,d}and

thesumwastakenoverallvariablecombinationswithatmostd*

elements.Theeffects"Earecomputedas"E=

!

i∈E

ˇiliIili,whereIili

isanindicatorvariabletakingvalue1,ifx_i=x_il

i,(li=1,...,k)and

thecoefficientsˇili ∈{ˇ1,...,ˇk}formedaequallyspacedgridof

kelementsontheinterval(0,1].Inallthesimulationspresented

hered*_is_set_equal_to_⌈d/2⌉_in_accordance_to_[21]_._So,_when_using

d=4variables,wecomputeonlymaineffectsandtwo-ways

inter-actions.Ifd=20,wetakeallmaineffectsandallinteractionswith

atmostd*₌₁₀_variables.

5.2. Searchapproachesforcomparison

InordertovalidatetheperformanceofNACOweselectedthree

approachesfromtheliterature:

• MAX–MIN Ant System(MMAS)[42] is based onthree key

aspects.Firstofall,inthecolonyofantsonlyasingleantisused

toupdatethepheromonetrailsaftereachiteration helpingto

exploitthebestsolutionsfoundduringaniterationorduringthe

runofthealgorithm.Thesecondaspectrestrictstherangeof

pos-siblevaluesfortheprobabilityofchoosingaspecificarcwhich

helpstoavoidearlystagnation.Thefinalaspectrequiresthatthe

pheromonetrailisdeliberatelyinitializedtoamaximumvalue,

!max.Stützleetal.[42]showbynumericalanalysisthatMMAS

achievesastronglyimprovedperformancecompared toother

versionsofACOalgorithm.

WeimplementedanMMASwith˛=1,withoutanyapriori

informationregarding theproblemand withevaporationrate

#=0.88.Theparameter#wasusedtoavoidunlimited

accumu-lationofthepheromonetrailsenablingthealgorithmtoforget

baddecisionspreviouslytaken.Infact,ifanarc isnotchosen

bytheants,itsassociatedpheromonevaluedecreases

exponen-tiallyinthenumberofiterations[44].Sinceevaporationrate#

hasanessentialroleinthealgorithmsperformancewedecided

toperformadedicatedtuninganalysis.Theresultsareshownin

Fig.3wherethebehaviourofMMASwithdifferent#valuesis

reported.Fromthefigurewenoticethatwhen#=0.88the

algo-rithmisabletooutperformtheotherversionsofMMASinthe

twobenchmarkfunctionswithk=13andd=2.

• GeneticAlgorithm(GA)[13,14]isconsideredoneof themost

powerfulalgorithmsforoptimization.GAsareadaptiveheuristic

searchalgorithmsbasedontheevolutionaryideaofnatural

(5)

Fig.3.TuninganalysisofMMASwithdifferentvaluesof#withk=13andd=2.

selectioninthepresenceofvariation-inducingoperatorssuchas

mutationandrecombination(crossover).Further,afitness

func-tionisusedtoevaluatesolutions,andtheselectionprocessis

basedonfitnessvalues(responses).

We implementeda GAwhere thecandidatesolutionswere

generatedbyaone-pointcrossoverwitha5%mutationrateand

proportionalselectionassamplingschemes[21].

• Co-InformationCompositeLikelihood(COIL)[21] optimization

isanovelapproachforhighdimensionalproblems.COILisan

adaptiveprocedurewherenewcandidatesolutionsarechosen

byminimizingtheco-informationcompositelikelihood

objec-tivefunction,derivedfromcouplingimportancesamplingand

compositelikelihoodprinciples.

COILdependsontwoparameters:(i)afixedthreshold$that

dividestheresponseintwoclassesand(ii)aparameter"that

con-trolsthetrade-offbetweenexplorationandexploitationofthe

searchspace.Inthesesimulations,wheretheaimistominimize

theresponse,$isthefirstquartileoftheresponsedistribution

and"t=1−1/(1+t),t≤TwheretisthecurrentiterationandTthe

maximumnumberofiterations.Thischoicefor"tismotivated

bytheneedtothoroughlyexploreXduringtheinitialiterations

andexploitthebestsolutionsmoreefficientlylateron[21].

NACO was tested with the following fixed parameters: (i)

#=0.88and(ii)theconstantthreshold$wasequaltothefirst

quar-tileasinCOIL.˛andıweredecidedinaccordancewithpreliminary

resultsasshowninFig.4andhavebeenfixedto˛=1andı=2.We

didnotreporttheresultsobtainedwithresponsesurfaceϕ1since

thedifferentsettingsgavesimilarperformance.

Inallthefourapproachesthenumberofcandidatesolutions

tobetestedateachiterationwassetasm=k×20inordertohave

enoughcandidatesolutionsandrelatedresponsestoobtainreliable

estimationsateachiteration[21].Atuningparameterprocedure

wasimplementedin[21]tooptimizetheperformanceofGAand

COILonthesamebenchmarkfunctionsusedtooptimizetheNACO

parameters.

5.3. Computationalresults

In order to compare the performance of NACO with the

approaches presented in Paragraph 5.2, two indicators were

derivedtoshowthemaindifferencebetweenthefouralgorithms

• expected number of iterations to hit the optimal solutions,

denotedbyEHI;

• explored fraction of the search space, %X=(EHI×m×100)/

dim(X)wheredim(X)isthedimensionofthewholeexperimental

searchspace.

TounderstandhowNACObehavedwhenthedimensionofthe

search space increasedwe consideredvarious values for k and

d,correspondingtodim(X)=kd_≈₁₀4_,₁₀5 _and ₁₀6_. _The_results

are reportedin Table1.Table 1(a)reports theresultsobtained

fordim(X)≈104.Thebenchmarkfunctionconsideredforthe

spe-cific simulation is reported in the first column. In columns 2,

3, 4wereportthenumber oflevelsk,thenumber ofvariables

d considered in the specific instance and the number of

can-didate solutions tested at each iteration. The value of EHI and

theexploredfractionofthesearchspacewasthencalculatedfor

eachmethod.InTable1(b)and(c)thesameresultsarereported

but considering largersearchspaces, such asdim(X)≈105 _and

dim(X)≈106.

Our numerical studies show that NACO converges to the

optimuminallthedesignspacesizesandrequiresa verysmall

numberoftestedsolutionsinallsimulations.Thisisaverysmall

number,infact,atdim(X)≈104 NACOneedsjustlessthanthree

iterationstoreachtheoptimum,equalto0,ofresponsesurface

2 4 6 8 10 14.0 14.5 15.0 15.5 16.0 16.5 17.0 Experiment Batches Response Global Optimum =1, =1 =1, =2 =2, =1 =2, =2

Fig.4. TuninganalysisofNACOwithdifferentvaluesofıand˛withk=13andd=2 andresponsesurfaceϕ2.

(6)

M.Borrottietal./AppliedSoftComputing49(2016)259–268 Table1

Expectednumberofiterationsneededtoachievetheoptimalsolution(EHI)andthecorrespondentfractionofexploredsearchspace(%X)forMMAS,COILandGA.

k d m NACO MMAS COIL GA

EHI %X EHI %X EHI %X EHI %X

(a)dim(X)≈104 ϕ1 2 13 40 2.74 1.34 16.26 7.94 3.42 1.67 8.30 4.05 4 7 80 2.22 0.54 14.86 3.63 2.94 0.72 6.34 1.55 10 4 200 2.48 0.99 18.66 7.46 3.30 1.32 8.60 3.44 ϕ2 2 13 40 3.48 1.70 17.68 8.63 4.26 2.08 9.36 4.57 4 7 80 2.86 0.70 23.10 5.64 4.12 1.01 9.16 2.24 10 4 200 3.20 1.28 20.60 8.24 4.64 1.86 7.80 3.12 (b)dim(X)≈105 ϕ1 2 17 40 3.34 0.20 23.70 1.45 4.26 0.26 10.80 0.66 4 8 80 2.54 0.31 21.52 2.63 3.48 0.48 7.72 0.94 10 5 200 3.38 0.27 29.04 2.32 4.60 0.37 12.24 0.98 ϕ2 2 17 40 5.14 0.31 23.28 1.42 7.16 0.44 12.14 0.74 4 8 80 3.34 0.41 26.44 3.23 4.66 0.57 10.36 1.26 10 5 200 4.56 0.36 28.44 2.28 6.54 0.52 16.38 1.31 (c)dim(X)≈106 ϕ1 2 20 40 3.62 0.07 27.70 0.52 4.52 0.09 13.28 0.25 4 10 80 2.96 0.06 27.16 0.52 4.20 0.08 10.60 0.20 10 6 200 4.12 0.08 28.42 0.57 6.06 0.10 16.08 0.32 ϕ2 2 20 40 6.12 0.12 27.08 0.52 11.20 0.21 16.96 0.32 4 10 80 5.02 0.10 29.60 0.56 6.98 0.13 16.74 0.32 10 6 200 5.96 0.12 30.00 0.60 7.12 0.14 19.24 0.38

ϕ1andlessthanfouriterationsforresponsesurfaceϕ2,with

opti-mumvalueequalto13.89.Themethodperformsparticularlywell

whendim(X)≈106consideringresponsesurfaceϕ1,sinceNACO

needsless than0.07%(inaverage)of thetotal spacetoachieve

theoptimum.MMASperformsverybadlyinallthesimulations

becauseitdoesnotbenefitfromtheinclusionofheuristic

informa-tion,anessentialcomponentinACOalgorithms.Thisresultshows

thatthecombinationofthenaïveBayesclassifierintheACO

frame-workbooststheperformanceofNACOoutperformingalltheother

approaches.

Ifdandkincrease,NACOisstillabletoreachtheoptimumin

areasonabletime.Infact,consideringresponsesurfaceϕ2,k=10

andd=6,NACOrequires5.17iterations(inaverage)toreachthe

optimum,whichis61.67.Intheothercase,responsesurfaceϕ1,

NACOrequires3.57iterations(inaverage)toreachtheoptimum,

whichis0.

InFig.5,wecomparetheconvergenceofourmethodwithCOIL,

MMASandGAmethods.ThisanalysisconfirmsthatNACO

outper-formsMMASandGA,infactNACOtypicallyreachestheminimum

fasterthantheothertwoapproaches.NACOandCOILapproaches

havesimilarperformanceinalmostalldimensions.

WefurtherinvestigatedthedifferencebetweenNACOandall

theotherapproachesperformingapairwiseStudent’st-test.The

nullhypothesis,H0,isthattheaveragebestvalueisequalbetween

twoapproachesotherwisethetwovaluesaredifferent(alternative

hyphotesis,H1).Ifthep-value,p,islessthan(orequalto)˛pthen

thenullhypothesisisrejectedinfavorofthealternative

hypoth-esis.NACOis significantlydifferentwith˛p=0.05 ina pairwise

comparisonwithMMASinallinstances.NACOisalsosignificantly

differentwith˛p=0.05withGAmethodconsideringk=2andd=20

andk=10andd=(5,6)withresponsesurfaceϕ1.Ifweconsider

responsesurfaceϕ2,NACOissignificantlydifferentwithk=4and

d=10andk=10andd=(5,6).Ifweset˛p=0.10,NACOis

signif-icantlydifferentincomparisonwithGAmethodinallinstances

exceptfork=4and d=(7,10)withϕ1 andk=2and d=(13,17)

withϕ2.ThepairwiseStudent’st-testconfirmsthatNACOandCOIL

havesimilarbehaviourinallinstancessinceweacceptedthenull

hypothesisH0with˛p=0.05.

TofurtherunderstandthemaindifferencebetweenNACOand

COIL,weinvestigatethecomputationaltimeasafunctionofthe

averagenumberofsecondsneededtoreachtheoptimalvalue.We

thenranthetwoalgorithmsassumingthesamesettingofthe

previ-ousstudy.Alsointhiscase,wecomputedaMonteCarlosimulation

composedby50runs.Arunwasstoppediftheglobalminimum

wasachievedbeforeamaximumnumberofiterationsT=15.For

thisstudy,wedecidedtolimitthenumberofiterationssincewe

areinterestedinthecomputationalcostandnotinreachingthe

optimum.TheresultsarereportedinTable2inwhichthe

aver-agenumberofseconds,*sec,andrelativestandarddeviation,(,are

reportedforbothalgorithms.Intermsofcomputationaltime,NACO

performsbetterthanCOILinalmostalldimensions.Insomecases

thedifferenceislarge.Forinstance,NACOrequireslessthan13sto

reachtheoptimumwhenk=2andd=20whitϕ1.COILneedsmore

than38min.

Alsointhisstudy,weappliedapairwiseStudent’st-testtoassess

iftheaveragecomputationaltimeissignificantlydifferentbetween

theNACOandCOILmethods,alternativehyphotesisH1.TheNACO

p-value,p,islessthan˛p=0.05,thenthealternativehypothesisis

acceptedinalmostallinstancesexceptfork=10andd=(5,6)with

ϕ2.

6. Theartificial1AGYserineesterase

The NACO approach was used to address the problem of

designinganartificialenzymewhichmimicsthefold(i.e.

three-dimensionalshape)ofthenaturalenzyme1AGYoffungusFusarium

solani [38] (see Fig.6).Artificial enzymes composedof 4

sub-sequences (d=4) linkedtogether sequentially werebuilt and a

solutionxwasthenmadefromasequenceofthefourvariables.

Eachsubsequencewasembodiedbyaspecificstringofaminoacids

calledpseudo-domainschosenfrom acollectionof 95 different

pseudo-domains.Accordingly,thecorrespondingvariabledomain

wascomposedofk={1,...,95}levels.ThesearchspaceX

com-prisedallthepossiblepermutationsrepeatingkpseudo-domainsin

dpositionsequalling954₌_8.1_×₁₀7_possible_solutions._For

(7)

Fig.5.ConvergencebehaviorforExperiment1(ϕ1)andExperiment2(ϕ2)fortheNACO(solidlines),COIL(dashedlineswithdots),MMAS(dottedlines)andGA(dashed

lines)methods.Thefirst,secondandthirdrowsofthedisplayshowMonteCarloaveragesofthebestresponsevalueford=13,17,20,respectively,andk=2.Thevertical barsare95%MonteCarloconfidenceintervals.

66).Inthisproblem,theobjectivefunctionwastomaximizethe

structuralsimilaritybetweenagivencandidateartificialenzyme

andthenaturalenzyme1AGY[45].Thesimilarityscoreis

calcu-latedbymeasuringthe2Dstructuresimilaritytothetargetenzyme

bypairwisealignmentof2DstructurespredictedusingPSIPRED

algorithm[46,47].Thescorerangesfrom0to1000,thehigherthe

scorethehigheristhestructuresimilaritybetweenthecandidate

artificialenzymeandthetargetenzyme.

Generally,inenzymeengineeringproblems,direct

experimen-talevaluationofpotentialsolutionsistheonlymeansforknowing

how an enzyme performs. Experimentation, however, is costly

andtimeconsuming.Inourstudy,realexperimentationwas

cou-pledwiththeNACOalgorithm. Solutionswereevaluatedin the

realworldby conductingphysicalexperiments andby creating

newcandidatesolutionsusedinacomputationalsetting.Atthis

point, evaluationswerefedbacktothecomputationalphaseof

theapproachand thegenerationofsubsequentsolutionswasa

functionofthese.

Themaximumnumberofiterationswasestimatedin5cycles

considering current lab procedures and corresponding costs.

Furthermore, at each iteration, the number of candidate

solu-tions was 96, which corresponded to the technology used for

experimentations(96microwellplate).

6.1. Graphrepresentation

To address the Enzyme Engineering problem of designing a

particularenzyme whichmimics thenatural enzyme1AGY,we

Table2

Averagenumberofseconds,*sec,neededtoreachtheoptimum.ThemaximumnumberofiterationswasfixedtoT=15.Thestandarddeviation,(,isalsoreported.

213 ₂17 ₂20 ₄7 ₄8 ₄10 ₁₀4 ₁₀5 ₁₀6 ϕ1 NACO *₍sec 9.57_1.13 10.46_5.69 12.25_3.19 12.95_7.12 10.75_6.14 16.79_6.15 _21.4467.08 _27.7963.03 124.45_42.20 COIL *₍sec 12.33_1.73 230.70_65.05 2288.67_821.85 _1.296.66 33.75_5.18 6303.60_15908.38 _1.283.39 40.83_6.55 679.23_88.28 ϕ2 NACO *₍sec 9.21_3.09 27.07_20.89 55.19_59.32 17.74_7.73 16.80_6.18 35.96_16.50 _31.1147.04 _74.5792.62 1765.89_5035.15 COIL *₍sec 13.49_3.81 468.05_256.62 17549.74_23912.79 _2.277.86 38.86_12.97 1772.35_1095.41 _1.814.04 _11.7856.37 8639.33_18928.35

(8)

Fig.6. Thecrystalstructureoftargetprotein1AGY.Thepurpleelements

repre-sentalphahelices,theblueonesturnsandtheyellowonesabetastrand.Gray

linesarepredictedcoilwithnodefinedsecondarystructure.Catalyticresiduesare

representedinball-and-stickconfiguration.

designedanetworkwhereanodecorrespondstoapseudo-domain

andanarccorrespondstotheconnectionbetweenvariableiina

positionandvariablejinthenextpositionoftheenzymesequence

(seeFig.7).Thetotalnumberofnodeswas95×4=380sincean

enzymeis composedof 4subsequences (d=4)and each

subse-quence can be chosen among k=95 different pseudo-domains.

Therefore,acandidatesolutionisapathcomposedof4variables

(4nodes);wherethis pathrepresentsafull-lengthenzyme.We

assumedthatnoheuristicinformationwasavailablewithregards

totheproblem.

6.2. Results

Toaddresstheproblemofdiscoveringtheoptimalcombination

thatmimicsthefold(i.e.three-dimensionalshape)ofthenatural

Fig.7. Antcolonygraphrepresentationwhereantsmovefromxitoxj.Asolutionis

apathcomposedbydnodesandd−1arcs.

enzyme1AGY,wedevelopedNACOandwecomparedtheresults

with the COIL approach. The main descriptive statistics of the

responsevaluesarereportedinTable3.NACOidentifiedthehigher

responsevalueafterfouriterationsreachingtheresponsevalue

845.0andsampling384pointsinthesearchspace.COILneeded480

pointstoreachthesamevalue.Thisresultshowsthatinan

exper-imentalsettingNACO,comparedtoCOIL,ismoreefficientinterms

ofreducedcosts.Further,thethirdquantileinTable3,showsthat

NACOhasthecapacitytomovefastertowardstheoptimalregion

withrespecttoCOIL,incrementingthevalueof thethird

quan-tilebyabout50%fromthefirsttotheseconditeration.Thisisalso

confirmedapplyingaStudent’st-testbetweenvaluesachievedby

NACOandCOILmethods.WeperformedapairwiseStudent’st-test

foreachiterationwithH1:*NACOc =/*COILc withc=1,...,5.

Consid-eringiterationc=1,weacceptedH0sincethetwoapproachesstart

fromthesamesetofartificialenzymes.Consideringiterationsc=2

and3,theaveragebiological fitnessscoreresponsesare

signifi-cantlydifferentbetweenNACOandCOIL,whichconfirmsadifferent

behaviourtowardstheoptimalregion.Inthelasttwoiterations,

c=4and5,weacceptedH0.Ifweconsideralltheartificialenzymes

testedbythetwoapproaches,theaveragebiologicalfitnessscore

responsesaresignificantlydifferent.

In Fig.8,we reportedthebestenzyme sequence(see [5,11]

formoredetailsaboutaminoacidssequences),foundbyNACO,

thatmimics thetertiarystructureof thenaturalenzyme 1AGY.

1AGYischaracterizedbybilaterallysymmetricwithabeta-sheet

coreandasetofalphahelicesconnectedbyunstructuredloops.

ThebestartificialenzymefoundbyNACOinthefourthiteration

presentsasolidstructurearoundabeta-sheetcoreandbilateral

symmetrysimilartothenaturalenzyme1AGY.Althoughthebest

artificialenzymeidentifiedin thefourthiterationlackssomeof

001 - 050 EACI QGE SP I WAVGRGHATPAL RLT NVDTPQKP DAQ KDG RRL LKQEL TG L

051 - 100 D D S Y A V P R K A D C E T D G Q T R S E D S S K K Y K A S L F S A T M E V M Q Q W C Y K K T Q W D

101 - 150 QP HVQYQEE I A ES QGDL PNPGKRFNNKDSGGEKLGD NNP T GK SVP CVLA N

151 - 200 TGCDI SVNS HT YAPPET RAFDCSFKLS L EDI LANKI RKL F AH I VS KCGWG

Fig.8. AminoacidssequenceofthebestartificialenzymefoundbyNACO.Eachrowisasubsequenceof50aminoacids(i.e.pseudo-domain). Table3

MaindescriptivestatisticsofNACOandCOILforthe5iterations.Ineachiteration96solutionsaretested.

Iter. Min. 1stQu. Median Mean 3rdQu. Max

(a)NACOperformance

1 189.0 327.5 379.0 384.0 432.0 596.0 2 306.0 568.8 601.5 602.7 647.2 792.0 3 436.0 534.2 589.0 598.3 650.2 830.0 4 241.0 476.5 579.4 579.4 712.8 845.0 5 402.0 509.5 582.0 602.4 751.2 841.0 (b)COILperformance 1 189.0 327.5 379.0 384.0 432.0 596.0 2 230.0 338.8 399.5 424.8 498.2 759.0 3 217.0 412.0 493.0 482.4 549.8 754.0 4 219.0 491.0 562.5 552.8 615.2 780.0 5 400.0 533.5 585.0 590.1 643.0 845.0

(9)

thestructuralelementsofthetargetenzyme(e.g.misplacement

ofthealphaheliceswithrespecttobetacore),itclearlyshowsan

overallsimilaritytothetargetenzyme.

7. Conclusion

In this work, we have presented a novelapproach for high

dimensionalproblemswithcomputationally-expensivecomplex

functionswithoutaprioriknowledge.Themainideabehindthe

newmethodistocombinethenaïveBayesclassifiertoantcolony

optimizationtocreatetheresultingnaïveBayesantcolony

opti-mization(NACO)algorithm.

Numericalresultspoint outquick convergencetotheglobal

optimum.Oneinterestingfindingistheremarkablescalabilityof

themethodtotheoverallsizeofthedesignspaceX.Asthenumber

ofvariables/variableinteractionsincreases,NACOshowsgrowing

efficiency,measuredintermsofthetotalnumberofexperiments

neededtoreachtheoptimum.ThisaspectmakesNACOavalid

choicewhendealingwithhigh-dimensionalsettingscharacterized

bylargedesignspaces.

NACOhasalsobeentestedonacomplexenzymeengineering

problemwiththeaimofidentifyinganartificialenzymewhich

mimics thenatural enzyme 1AGY. The approach representsan

interactiveprocesswherethedialoguebetweendesignand

lab-oratory experimentation at each generation creates a path in

the combinatorial search spacethat may lead toward a region

of optimality. The evolving design requires a small number of

experimentalpointstotestfromgenerationtogenerationand,

con-sequently,alimitedinvestmentintermsofresources.Furthermore,

thesmallnumberoftestsmakeeachexperimentalphaseextremely

rapid.

Acknowledgements

Thiswork wassupported bythe EU integrated project

Pro-grammableArtificialCellEvolution(PACE)inFP6-IST-FET-002035,

Complex Systems Initiative and by the Fondazione di Venezia

– DICE project. The authors would like to acknowledge the

EuropeanCentreforLivingTechnology(www.ecltech.org)for

pro-vidingopportunitiesofpresentationandfruitfuldiscussionofthe

research.Furthermore,theauthorsaregratefultoThomas

Stüt-zle,MauroBirattariandAlessandraGiovagnolifortheirveryuseful

suggestions.Theauthorsgratefullyacknowledgetheeditorandthe

refereesfortheircommentsandsuggestions.

References

[1]C.-I.Branden,J.Tooze,IntroductiontoProteinStructure,GarlandPub.,1999.

[2]A.Zanghellini,Denovocomputationalenzymedesign,Curr.Opin.Biotechnol. 29(C)(2014)132–138.

[3]J.Damborsky,J.Brezovsky,Computationaltoolsfordesigningand engineeringenzymes,Curr.Opin.Chem.Biol.19(C)(2014)8–16.

[4]B.M.Nestl,B.A.Nebel,B.Hauer,Recentprogressinindustrialbiocatalysis, Curr.Opin.Chem.Biol.15(2)(2011)187–193.

[5]A.M.M.S.Ullah,ADNA-basedcomputingmethodforsolvingcontrolchart patternrecognitionproblems,CIRPJ.Manuf.Sci.Technol.3(4)(2010) 293–303.

[6]V.K.Garlapati,P.R.Vundavilli,R.Banerjee,Evaluationoflipaseproductionby geneticalgorithmandparticleswarmoptimizationandtheircomparative study,Appl.Biochem.Biotechnol.162(5)(2010)1350–1361.

[7]A.M.Mohsen,A.T.Khader,D.Ramachandram,Anoptimizationalgorithm basedonharmonysearchforRNAsecondarystructureprediction,in:Z.W. Geem(Ed.),RecentAdvancesinHarmonySearchAlgorithm,Studiesin ComputationalIntelligence,vol.270,SpringerBerlinHeidelberg,2010,pp. 163–174.

[8]Y.Zhang,J.Xu,Z.Yuan,H.Xu,Q.Yu,Artificialneuralnetwork-genetic algorithmbasedoptimizationfortheimmobilizationofcellulaseonthesmart polymerEudragitL-100,Bioresour.Technol.101(9)(2010)3153–3158.

[9]B.Blum,M.I.Jordan,D.Baker,Featurespaceresamplingforprotein conformationalsearch,Proteins78(6)(2010)1583–1593.

[10]M.Borrotti,D.DeLucrezia,G.Minervini,I.Poli,Amodelbasedantcolony designfortheproteinengineeringproblem,in:M.Dorigo,M.Birattari,G.Di Caro,R.Doursat,A.Engelbrecht,D.Floreano,L.Gambardella,R.Groß,E.Sahin, H.Sayama,T.Stützle(Eds.),SwarmIntelligence,LectureNotesinComputer Science,vol.6234,SpringerBerlinHeidelberg,2010,pp.352–359.

[11]A.M.M.S.Ullah,D.D’Addona,N.Arai,DNAbasedcomputingforunderstanding complexshapes,BioSystems117(1)(2014)40–53.

[12]K.A.DeJong,EvolutionaryComputation,MITPress,2006.

[13]D.E.Goldberg,GeneticAlgorithmsinSearch,OptimizationandMachine Learning,Addison-WesleyLongmanPublishingCo.,1989.

[14]J.H.Holland,AdaptationinNaturalandArtificialSystems,MITPress,1992.

[15]R.Baragona,F.Battaglia,I.Poli,EvolutionaryStatisticalProcedures, Springer-Verlag,2011.

[16]R.Berni,D.DeMarch,F.M.Stefanini,T-optimalityandneuralnetworks:a comparisonofapproachesforbuildingexperimentaldesigns,Appl.Stoch. ModelsBus.Ind.29(5)(2013)454–467.

[17]F.Caschera,M.A.Bedau,A.Buchanan,J.Cawse,D.deLucrezia,G.Gazzola, M.M.Hanczyc,N.H.Packard,Copingwithcomplexity:machinelearning optimizationofcell-freeproteinsynthesis,Biotechnol.Bioeng.108(9)(2011) 2218–2228.

[18]J.Ji,R.Hu,H.Zhang,C.Liu,AhybridmethodforlearningBayesiannetworks basedonantcolonyoptimization,Appl.SoftComput.11(4)(2011) 3373–3384.

[19]R.Tang,S.Fong,X.S.Yang,S.Deb,Integratingnature-inspiredoptimization algorithmstok-meansclustering,in:SeventhInternationalConferenceon DigitalInformationManagement,2012,pp.116–123.

[20]M.Forlin,D.Slanzi,I.Poli,Combiningprobabilisticdependencymodelsand particleswarmoptimizationforparameterinferenceinstochasticbiological systems,in:F.L.Gaol,Q.V.Nguyen(Eds.),Proceedingsofthe20112nd InternationalCongressonComputerApplicationsandComputationalScience, AdvancesinIntelligentandSoftComputing,vol.145,SpringerBerlin Heidelberg,2012,pp.437–443.

[21]D.Ferrari,M.Borrotti,D.D.March,Responseimprovementincomplex experimentsbyco-informationcompositelikelihoodoptimization,Stat. Comput.24(3)(2014)351–363.

[22]Y.Rubinstein,D.P.Kroese,TheCross-EntropyMethod:AUnifiedApproachto MonteCarloSimulation,RandomizedOptimizationandMachineLearning, SpringerVerlag,2004.

[23]B.G.Lindsay,Compositelikelihoodmethods,in:N.U.Prabhu(Ed.),Statistical InferencefromStochasticProcesses,1988,pp.221–239.

[24]M.Dorigo,Optimization,learningandnaturalalgorithms(Ph.D.thesis), PolitecnicodiMilano–DEI,1992.

[25]M.Dorigo,V.Maniezzo,A.Colorni,Antsystem:optimizationbyacolonyof cooperatingagents,IEEETrans.Syst.ManCybern.B:Cybern.26(1)(1996) 29–41.

[26]M.Dorigo,G.DiCaro,Theantcolonyoptimizationmetaheuristic,in:D.Corne, M.Dorigo,F.Glover,D.Dasgupta,P.Moscato,R.Poli,K.V.Price(Eds.),New IdeasinOptimization,McGraw-Hill,1999,pp.11–32.

[27]T.M.Mitchell,MachineLearning,McGraw-Hill,1997.

[28]R.Montemanni,D.H.Smith,L.M.Gambardella,Aheuristicmanipulation techniqueforthesequentialorderingproblem,Comput.Oper.Res.35(2008) 3931–3944.

[29]C.Blum,Beam-ACO–hybridizingantcolonyoptimizationwithbeamsearch: anapplicationtoopenshopscheduling,Comput.Oper.Res.32(6)(2005) 1565–1591.

[30]C.Blum,M.Y.Vallès,M.J.Blesa,AnantcolonyoptimizationalgorithmforDNA sequencingbyhybridization,Comput.Oper.Res.35(11)(2008)3620–3635.

[31]S.J.Shyu,C.Y.Tsai,Findingthelongestcommonsubsequenceformultiple biologicalsequencesbyantcolonyoptimization,Comput.Oper.Res.36(1) (2009)73–91.

[32]P.J.Bickel,E.Levina,SometheoryforFisher’slineardiscriminantfunction, naïveBayes,andsomealternativeswhentherearemanyvariablesthan observations,Bernoulli10(6)(2004)989–1010.

[33]I.Rish,AnempiricalstudyoftheNaïveBayesclassifier,in:TheIJCAI-01 WorkshoponEmpiricalMethodsinArtificialIntelligent,2001, pp.41–46.

[34]M.Yousef,M.Nebozhyn,H.Shatkay,S.Kanterakis,L.C.Showe,M.K.Showe, Combiningmulti-speciesgenomicdataformicroRNAidentificationusinga NaïveBayesclassifier,Bioinformatics22(11)(2006)1325–1334.

[35]G.L.Rosen,E.R.Reichenberger,A.M.Rosenfeld,NBC:theNaïveBayes classificationtoolwebserverfortaxonomicclassificationofmetagenomic reads,Bioinformatics27(1)(2011)127–129.

[36]M.Sahami,S.Dumais,D.Heckerman,E.Horvitz,ABayesianapproachto filteringjunke-mail,in:LearningforTextCategorization:Papersfromthe 1998Workshop,1998,pp.1–8.

[37]F.Sambo,E.Trifoglio,B.D.Camillo,G.M.Toffolo,C.Cobelli,BagofNaïveBayes: biomarkerselectionandclassificationfromgenome-wideSNPdata,BMC Bioinf.13(14)(2012)1–10.

[38]S.Longhi,M.Czjzek,V.Lamzin,A.Nicolas,C.Cambillau,Atomicresolution (1.0a)crystalstructureofFusariumsolanicutinase:stereochemicalanalysis,J. Mol.Biol.268(4)(1997)779–799.

[39]C.Blum,A.Roli,Metaheuristicincombinatorialoptimization:overviewand conceptualcomparison,ACMComput.Surv.35(3)(2003)268–308.

[40]L.Gambardella,M.Dorigo,Anantcolonysystemhybridizedwithanewlocal searchforsequentialorderingproblem,INFORMSJ.Comput.12(3)(2000) 237–255.

(10)

M.Borrottietal./AppliedSoftComputing49(2016)259–268 [41]P.Pellegrini,E.Moretti,Acomputationalanalysisonahybridapproach:

quick-and-dirtyantcolonyoptimization,Appl.Math.Sci.3(23)(2009) 1127–1140.

[42]T.Stützle,H.Hoos,MAX–MINantsystem,FutureGener.Comput.Syst.16 (8)(2000)889–914.

[43]X.-S.Yang,EngineeringOptimization:AnIntroductionwithMetaheuristic Applications,JohnWileyandSons,2010.

[44]M.Dorigo,T.Stützle,AntColonyOptimization,BradfordCompany,Scituate, MA,USA,2004.

[45]G.Minervini,G.Evangelista,L.Villanova,D.Slanzi,D.D.Lucrezia,I.Poli,P.L. Luisi,F.Polticelli,Massivenon-naturalproteinsstructurepredictionusing gridtechnologies,BMCBioinf.10(6)(2009)1–9.

[46]D.T.Jones,Proteinsecondarystructurepredictionbasedonposition-specific scoringmatrices,J.Mol.Biol.292(2)(1999)195–202.

[47]D.Slanzi,D.DeLucrezia,I.Poli,QueryingBayesiannetworkstodesign experimentswithapplicationto1AGYserineesteraseproteinengineering, Chemom.Intell.Lab.Syst.149(A)(2015)28–38.