Fa oltà di Ingegneria
Corso diLaurea Spe ialisti ain IngegneriaInformati a
tesi di laurea
Logo Dete tion Appli ation
Extra tion, Mat hing, Segmentation
and Classi at ion
Relatore: MassimoMelu i
Correlatore: RoelofvanZwol
Laureando: Mi hele Trevisiol
Questatesidilaureaèstatala on lusionediun'esperienzain redibile he
hovissutotra PadovaeBar ellona,tra un mondouniversitarioeun ambi-
entelavorativodiri er a,un'avventura hemihafatto res ere eimparare
molto. Tutto questo lo devo al prof. Massimo Melu i dell'Università di
Padova, he miha permessodi s opriree divivere un mondonuovo.
A Yahoo! sono stato guidato e a ompagnato da persone e ezionali: il
prof. Ri ardoBaeza-Yates diYahoo! Resear h Bar elona, he mihadato
un'opportunitàperilmiofuturo, ildr. RoelofvanZwolel'ing. LluisGar-
ia Pueyo, on ui ho ollaborato, he mi hanno fatto onos ere il mondo
della ri er a seguendomi in questi mesi, dandomi du ia e insegnandomi
tanto.
Ringrazio la mia famiglia, mia mamma Daniela, mio papà Maurizio e
mia sorella Martina, he non mi hanno mai negato un'opportunita' e he
sono sempre stati presenti.
Un ringraziamento spe iale va a Marta, he è stata sempre al mio -
an o, dandomi un appoggio per non adere e on la quale spero di vivere
an ora tante importanti esperienze. Non posso non itare Nello, Balto,
Enri o, Elisa, Matteo, Lally, Fox e an ora Matteo he sono persone su
ui possosempre ontare. Silvia,perlasua ostantepresenza, Alvise,Ste-
fanoeMauro,peressermistativi inisoprattuttoinquestiultimimomenti.
Ringrazio tutti i miei ompagni di università, on ui ho ondiviso gli
studiegliesami,maan hemolteserateestupendeavventure. Imieiami i
diVeronaedi al etto on uihopassatotantimomentiindimenti abilitra
emozioni, viaggi e terzi tempi. Ultime ma non meno importanti le nuove
ami iziediBar ellona, legaminuovi maforti he mihannoa ompagnato
inquesto per orso.
Aknowledgements
Abstra t I
1 Introdu tion III
2 Related Work 1
3 Test Colle tion 5
3.1 Logo Dete tion Database . . . . . . . . . . . . . . . . . . . 5
3.2 Logos Colle tion . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.1 Developing. . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Fli krColle tion . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Test and Train Set . . . . . . . . . . . . . . . . . . . . . . 12
4 Features Extra tion 15 4.1 SIFT vs SURF . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.1 SIFT . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.2 SURF . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Mat hing Points. . . . . . . . . . . . . . . . . . . . . . . . 20
4.2.1 Mat hing Te hnique Comparison . . . . . . . . . . 21
4.2.2 Correlation . . . . . . . . . . . . . . . . . . . . . . 21
4.2.3 Bounding Box . . . . . . . . . . . . . . . . . . . . . 22
4.3 Image Filtering . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Problemsand Con lusions . . . . . . . . . . . . . . . . . . 26
5 Colle tion Segmentation 29 5.1 Varian e-Balan ed De ision Tree . . . . . . . . . . . . . . 30
5.1.1 Implementation . . . . . . . . . . . . . . . . . . . . 31
5.1.2 Experiments . . . . . . . . . . . . . . . . . . . . . . 33
5.2 RandomVe tors . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.2 Signatures Distribution . . . . . . . . . . . . . . . . 37
5.3 Other Te hniques . . . . . . . . . . . . . . . . . . . . . . . 39
5.3.1 K-Means Clustering. . . . . . . . . . . . . . . . . . 40
6 Classi ation and Ranking 43 6.1 Classi ation Methodology . . . . . . . . . . . . . . . . . . 44
6.1.1 Varian e-Balan edDe ision Tree . . . . . . . . . . 44
6.1.2 (Semi-)RandomVe tors . . . . . . . . . . . . . . . 45
6.1.3 Other Te hniques . . . . . . . . . . . . . . . . . . . 46
6.2 Ranking of the Candidates . . . . . . . . . . . . . . . . . . 47
6.2.1 TF-IDF . . . . . . . . . . . . . . . . . . . . . . . . 49
6.2.2 ProportionalTe hnique. . . . . . . . . . . . . . . . 49
7 Con lusions and Future Developments 51 7.1 Future Developments . . . . . . . . . . . . . . . . . . . . . 52
This thesis des ribes the resear h work arried out to full the Master in
Computer S ien e at the University of Padua. The work was performed
during a visit at Yahoo! Resear h in Bar elona and was supervised by
Roelof van Zwol and Lluis Gar ia Puyeo. The visit was funded by the
EU Erasmus Programme. The resear h work des ribed inthis thesis was
part ofa largerresear h proje t whi h isunderway atYahoo! Resear h in
Bar elona.
Theresar hworkdes ribed inthisthesisaimedataddressingthe prob-
lemofdete tingandretrievingallthe logos ontainedinanimagegiven as
input. The problemofdete ting and retrievingalogo onsists ofavariety
of steps, ea h of whi h has dierent possible solutions. These steps were
investigatedinthisresear hwork. Althoughlogodete tionisanimportant
resear hproblemdue tothepotentialindustrialimpa t,asolutionhasnot
yetbeen proposed in the literature to our knowledge.
This thesisisorganizedintwoparts: olle tionpreparationand exper-
imentalstudy. Asregardstotheformer, a olle tionof logoswasdesigned
and implementedtotrain the lassier, toidentify and toextra tthe logo
features whi h were eventuallyused forlogo dete tion. The latter regards
thedete tionoflogosfromaninputimage. Inparti ular,theexperimental
study aimed to dete t if the input image ontains one or more logos and
Introdu tion
Beingabletoanalyzeavideo, extra tframesandtryingtogureoutwhat
they ontain automati ally is still anopen hallenge. The problem of en-
tity dete tion and re ognition in videos an be seen as the analysis of a
frames sequen e and, therefore, simplied to the analysis of single image.
Unfortunately there are no easy ways to identify orre tly what a pi ture
ontains.
This thesis is a pea e of a larger proje t fo used on the analysis of
images,toexaminewhetherit ontainslogos, andif so,tore ognize them.
What is a logo? It's a name, symbol or trademark designed for an easy
re ognition. Thus, given a generi image, the goal of the proje t is to
ompare its ontent with all the brands in a database and identify whi h
ones mat h with input image. Moreover, we will try tolo ate the logo in
the input image, obtainingthe orre tposition.
To better understandhowit'sstru tured thiswork,it'sne essarysplit
the main pro ess in two parts: the rst one is the data olle tions prepa-
ration, a large set of logos and images for training and testing, and the
se ond one is the real appli ation. This part is resumed in the following
steps:
Features Extra tion: Extra t the key-points of allthe logos (and all
the test images). These points are very des riptive for the ontents
of an image, and they are used to ompare and to analyze parts of
images.
Features Segmentation: Given a large set of key-points, extra ted
from the logos olle tion, we have to segment all of them to obtain
Forreal appli ationwemean the maingoal of the proje t,that ishow
to retrieve the logo(s) from an image given as input. We an resume this
pro edure inthe followingpoints:
Extra t the features: Dene the key-points of the input image with
the same te hnique used toextra t the key-points fromthe logos.
Classify the features: Given the features of the image, we want to
nd the logos segment that mat h better ea h key-point. All the
logos points in those segments are onsidered possibles andidates
logos.
Rank the andidates: Now we have a list of logos whi h have in
ommons some key-points with the input image. We want to rank
these points by some hara teristi s and keep only the most similar
ones.
Sele t the logo(s): After the ranking we have all the informationto
de ide if there is or not a logo inside the image and, in ase, whi h
one is ontained. Besidesinthispart we ande ideifthereare more
than one logo.
So, toarrange the assortment of brands, we retrieved all logos from a
web site famous to host alarge olle tionof more than 150K logos 1
. This
olle tion onstitutes our train set. On the other hand we used 16K im-
ages fortesting,andweretrieved themfromawebsitethat guarantees the
presen e of logos inits images 2
.
The main problem related to the proje t goal is how to ompare two
images . We don't need to ompare if an image is more or less similar to
another, butifanimageisin ludedintoanother. Theaimisverydierent
and the omplexity too. Todothis thereis onlyonewayinliterature: ex-
tra t the features pointsof both imagesand use them forthe omparison.
The rst issue is how to treat these points, be ause if we want to dete t
the logos features we need a way to ompare all the input image points
with all the logos ones: and there is no question to do it point-to-point.
In this ontextitisveryimportanttoknowwhatimagefeatures are: they
are the hara teristi parts of an image, texture, olor, and shape are ex-
amples of image features. There are some algorithms that an omputes
1
BrandsoftheWorld,http://www.brandsoftheworld. om/
2
Group "Brands, Marks & Logos" at Fli kr,
them asve tors, usually they are stru tured with64 or128 dimensions of
oating values. It is ne essary larify the dieren e from global and lo al
features. Therst one des ribes the wholeimage,ora bigpartof that, as
the brightness orthe ontrast of aphoto. The se ondone ismorespe i ,
itdes ribesapoint,oragroupof losepointsthatare very hara teristi s
for the image,but for example, the brightness in asmall part an be very
dierentfromthe globalvalue. This proje tuse thelo alfeaturesbe ause
it ompares parts of images and not the entire ones. The main problem
then is howto extra t features fromimages and how tomat hthem with
logos inthe trainingset. Weextra ted themobtainingaunique signature
foreveryinterestingpointoftheimage. The hoi eofthesepointsdepends
onthe approa hused: usuallyonefeatureisdes ribedbyoneve tor alled
lo al des riptor. Thereare several methodstoextra tthese points;inthis
thesis we'lldis uss about the main ones and we'llexplain our de ision.
It is important to onsider the working environment of this proje t, be-
ausetheautomati re ognitionoflogosimpliesastorageofalargenumber
of data: imagesof logos, informationonthe label,features extra ted, et .
A ordinglyone of the problems is the omputation time to ompare the
image with all the logos, we used 175K brands even if the number is ex-
pe ted to growday by day.
For the features extra tion, the problem is not in the image but in the
brand, be ause a logo may onsist of symbols, words or anything else. A
ommon and problemati ondition is that it ould be very simple with-
out some parti ular details than helps to a good dete tion; in on lusion
it may be hard dene a good features extra tor and a good omparison
metri s. After a brief resear h in literature, we de ided to use the SURF
des riptor. This de ision was taken for two main reasons: pre ision and
fast omputationspeed. Giventheinputimageandits al ulatedfeatures,
weneedtomat hits hara teristi swiththelogosimagesinthe olle tion.
Toavoidthetraversaloftheentire olle tionwithone-to-one omparisons,
weuse some te hniques tosegmentthe featuresin sub-groupsby ommon
properties. Anumeri alexampletounderstandbetterthesituationshould
be this: a logos (200x200) has an average of 100 features points and our
olle tion ontains 175K logos so they willbe 17Milion of points; instead
an image (500x375) has in average 440 features points: so, a set of om-
parisons point-to-point should be about 7500 million operations and this
is una eptable.
We analyzed dierent methods to resolve the problem: K-Means luster-
the points olle tionin groupsusing one of these te hniques, and retrieve
a representative ve tor for ea h of them. So, on e omputed the ve tors
of the input image, they willbe mat hed only with these ve tors and not
with allthe olle tion. By this we an redu e the number of omparisons
and onsequentlythe omputationtime. To ompare twove torsinamul-
tidimensionalspa e thereare manyway: weused the Pearson Correlation
that measures the similarity al ulating the osine of the angle between
them. Itisate hniqueoftenusedinTextMiningandinmanyotherelds.
It's important to dene how is stru tured this thesis be ause, as we
said, this is a part of a more omplex proje t. This is a Master Thesis
in Computer S ien e and ontains in details only the work developed by
the undersigned. The se tion that regards parts of dierent authors are
orre tlyhighlighted. Forthatreasontherearenotalltheexperimentation
andallthete hniquesused,butonlythemainones,nevertheless thethesis
is wrote togivea full idea of the proje t and it isstru tured as follow:
Chapter two: "Related Works" , It's essential spe ied the sour es
that have allowed us to study, understand and resolve the various prob-
lems en ountered inthe work.
Chapter three: "Test Colle tion" , In this hapter we'll des ribe the
two olle tions used: one with the brands logos and the other one with
Fli kr'sphotos. Therst oneaslogossour eandthese ondoneastesting
images. Willbe dis uss the reasonsof how itwas possible toobtain these
olle tions.
Chapter four: "Features Extra tion" ,Willbedis ussabout available
features and its problems. There are dierent des riptors, we'll see the
better one for the proje t after a short omparison with other methods.
Furthermore there will be abrief some appli ableee ts tothe imagesto
try to dete t the features in a more learly way. In the end we'll des ribe
how tomat h the points with relatedproblems and solutions.
Chapter ve: "Colle tion Segmentation", In this hapter we'll ana-
lyzehowtosegmentthe olle tion. We'lldes ribethe te hniquestryingto
highlightthe advantages and disadvantages of ea h methods with a short
des ription of the implementation ode.
Chapter six: "Classi ation and Ranking " , This is the real appli a-
logosretrieved. It ontainsthe lassi ationsofthefeatures extra tedand
the sele tionbetween the mostrepresentativelogos segments. Moreoverit
ontains the te hniques torank the logos andidates.
Chapter seven: "Con lusions and Future Work" , This is the nal
dis ussion aboutthe work,with ommentsand observationsand moreover
with suggestions and future steps or goals topursue.
Related Work
In this hapter we will prin ipally des ribe the state-of-the-art whi h will
be the basis of our work: features des riptor, segmentation, lassi ation
and lustering. There will be a short des ription of many te hniques al-
readyimplementedandusedin ertain ases whi hare losertoour needs.
There are dierent evaluation points to analyze, and some texts and pa-
pers were very useful for our de ision.
When we talk about logo dete tion we must onsider dierent mean-
ings: the main one is the identi ation of the logo into a do ument [1℄.
In this ase the dete tion ismade easierbe ause of thewhite ba kground.
In [2℄is ommonly used geometri invariants to prune the logos database
and lo al invariants to have a more rened mat h; on the other hand, in
[3℄ they used spatial density to involve a page segmentation omputing.
Anotherspread eld isthe removalof the hannel brandfroma television
program. This issueis analyzedinmany worksby Zhu[4℄ [5℄ orby Zhang
[6℄. There are dierent ways used to solve these topi , but they are all
based onsome fa ts: in televisionthe logo is ommonlyin a orner inthe
samepositionand,moreover, the ba kground images hange ontinuously,
instead, the logo's pixels that remain xed [7℄. This is useful for om-
mer ialdevelopment: in the transmissions re order where it'srequired an
automati removaloftheadvertising. It'sinterestingtonoti ethatduring
a television program they keep the tv logo but not during the advertise-
ments.
However the original goal of this proje t isto dete ta logo in avideo,
the main issue is to manageand retrieve the multimediadata [8℄. A brief
but omplete introdu tionof this area is des ribed in [9℄. The arti le was
ontent analysis stru tured on the following points: feature extra tion,
stru ture analysis, abstra tion, and indexing. This is broadly the pro-
edure adopted by our proje t, but ea h pro ess poses many hallenging
resear h problems. In this area we annot avoidmentioningthe Sivi and
Zisserman's Video Google [10, 11℄, these two works are done respe tively
in 2003 and in 2006 in whi h they reated a strong onne tion between
video andtextretrieval. They ompared avideoframetoado umentand
they onsidered avisual interest point asa word, applying the te hniques
of Text Retrieval. Again, in 2005, they published a te hni al report for
MIT [12℄ with unsupervised method. A pe uliar aspe t is the ompari-
son with a semi-supervised approa h [13℄. In almost allthese works they
used the SIFT Des riptor [14℄. In this proje twe willuse adierentkind
of feature extra toralgorithm with similarbut faster performan e, the so
alled SURF [15, 16℄. Thereare other ways toextra t lo alfeatures from
an image su h as [17, 18, 19, 20℄, this does not depend on the presen e
of edges or orners in the image but it is purely intensity-based whi h is
invariant to full ane transformations. The Harris dete tor [21℄ instead,
based onthe prede essorpresented by Morave [22℄ inhisPhD thesis,has
the aimtosele t the orners hara teristi s. On the otherhand the Maxi-
mally StableExtremal Regions [23℄isanotherblobdete tion methodwhi h
extra ts some area of interest, similar to a image segmentation; there is
additionallyanextendedversionwith oloursupport[24℄. Therearemany
debatesanddis ussionsaboutwhi histhebestdes riptorin[25,26,27,28℄
for example, but it ontinues tobe anissue withoutsolution.
Themainproblemoflogodete tioninimagesistondthebest wayto
omparethefeaturesextra tedfromtheimagestotheones olle tedinthe
logo database. Ifwedoitpoint-to-pointahuge numberof omparisonwill
o ur. Forthis reason we need some expedients. Furtheraid toour s ope
it isthe segmentation of the image. Segmenting animageintoregionsit's
very helpful be ause the area toextra t the features is alreadysegmented
and we an ompare the features only for determinate surfa es. MIT [29℄
and Berkeley[30℄built someappli ationsuseful totest theirsegmentation
te hniques. The issue of these methodsis the hard ompatibility with the
features extra tor algorithm. Indeed as SIFT or SURF dete t a lot of
points near the orners or lose the borders, omputing them only in one
region an lead to lossof information. Another way to redu e operations
and omputationaltime istosegment ingroupsall the set ofve tors, and
to analyze the image features to dene whi h is the best group for every
is the way hosen by this proje t.
In this thesis, to split the ve tors olle tion of logos' features, we use
a de ision tree, spe i ally a binary de ision tree. This is a method om-
monly used in de ision analysis to help identi ation. Indeed it'sused to
nd the logo into a do ument too, for example in [31℄ to redu e the false
positive fromthe andidates.