• Non ci sono risultati.

LluisGar- ia Pueyo, on ui ho ollaborato, he mi hanno fatto onos ere il mondo della ri er a seguendomi in questi mesi, dandomi du ia e insegnandomi tanto

N/A
N/A
Protected

Academic year: 2021

Condividi "LluisGar- ia Pueyo, on ui ho ollaborato, he mi hanno fatto onos ere il mondo della ri er a seguendomi in questi mesi, dandomi du ia e insegnandomi tanto"

Copied!
74
0
0

Testo completo

(1)

Fa oltà di Ingegneria

Corso diLaurea Spe ialisti ain IngegneriaInformati a

tesi di laurea

Logo Dete tion Appli ation

Extra tion, Mat hing, Segmentation

and Classi at ion

Relatore: MassimoMelu i

Correlatore: RoelofvanZwol

Laureando: Mi hele Trevisiol

(2)
(3)

Questatesidilaureaèstatala on lusionediun'esperienzain redibile he

hovissutotra PadovaeBar ellona,tra un mondouniversitarioeun ambi-

entelavorativodiri er a,un'avventura hemihafatto res ere eimparare

molto. Tutto questo lo devo al prof. Massimo Melu i dell'Università di

Padova, he miha permessodi s opriree divivere un mondonuovo.

A Yahoo! sono stato guidato e a ompagnato da persone e ezionali: il

prof. Ri ardoBaeza-Yates diYahoo! Resear h Bar elona, he mihadato

un'opportunitàperilmiofuturo, ildr. RoelofvanZwolel'ing. LluisGar-

ia Pueyo, on ui ho ollaborato, he mi hanno fatto onos ere il mondo

della ri er a seguendomi in questi mesi, dandomi du ia e insegnandomi

tanto.

Ringrazio la mia famiglia, mia mamma Daniela, mio papà Maurizio e

mia sorella Martina, he non mi hanno mai negato un'opportunita' e he

sono sempre stati presenti.

Un ringraziamento spe iale va a Marta, he è stata sempre al mio -

an o, dandomi un appoggio per non adere e on la quale spero di vivere

an ora tante importanti esperienze. Non posso non itare Nello, Balto,

Enri o, Elisa, Matteo, Lally, Fox e an ora Matteo he sono persone su

ui possosempre ontare. Silvia,perlasua ostantepresenza, Alvise,Ste-

fanoeMauro,peressermistativi inisoprattuttoinquestiultimimomenti.

Ringrazio tutti i miei ompagni di università, on ui ho ondiviso gli

studiegliesami,maan hemolteserateestupendeavventure. Imieiami i

diVeronaedi al etto on uihopassatotantimomentiindimenti abilitra

emozioni, viaggi e terzi tempi. Ultime ma non meno importanti le nuove

ami iziediBar ellona, legaminuovi maforti he mihannoa ompagnato

inquesto per orso.

(4)
(5)

Aknowledgements

Abstra t I

1 Introdu tion III

2 Related Work 1

3 Test Colle tion 5

3.1 Logo Dete tion Database . . . . . . . . . . . . . . . . . . . 5

3.2 Logos Colle tion . . . . . . . . . . . . . . . . . . . . . . . 8

3.2.1 Developing. . . . . . . . . . . . . . . . . . . . . . . 9

3.3 Fli krColle tion . . . . . . . . . . . . . . . . . . . . . . . 9

3.4 Test and Train Set . . . . . . . . . . . . . . . . . . . . . . 12

4 Features Extra tion 15 4.1 SIFT vs SURF . . . . . . . . . . . . . . . . . . . . . . . . 16

4.1.1 SIFT . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.1.2 SURF . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.2 Mat hing Points. . . . . . . . . . . . . . . . . . . . . . . . 20

4.2.1 Mat hing Te hnique Comparison . . . . . . . . . . 21

4.2.2 Correlation . . . . . . . . . . . . . . . . . . . . . . 21

4.2.3 Bounding Box . . . . . . . . . . . . . . . . . . . . . 22

4.3 Image Filtering . . . . . . . . . . . . . . . . . . . . . . . . 24

4.4 Problemsand Con lusions . . . . . . . . . . . . . . . . . . 26

5 Colle tion Segmentation 29 5.1 Varian e-Balan ed De ision Tree . . . . . . . . . . . . . . 30

5.1.1 Implementation . . . . . . . . . . . . . . . . . . . . 31

5.1.2 Experiments . . . . . . . . . . . . . . . . . . . . . . 33

5.2 RandomVe tors . . . . . . . . . . . . . . . . . . . . . . . 35

(6)

5.2.2 Signatures Distribution . . . . . . . . . . . . . . . . 37

5.3 Other Te hniques . . . . . . . . . . . . . . . . . . . . . . . 39

5.3.1 K-Means Clustering. . . . . . . . . . . . . . . . . . 40

6 Classi ation and Ranking 43 6.1 Classi ation Methodology . . . . . . . . . . . . . . . . . . 44

6.1.1 Varian e-Balan edDe ision Tree . . . . . . . . . . 44

6.1.2 (Semi-)RandomVe tors . . . . . . . . . . . . . . . 45

6.1.3 Other Te hniques . . . . . . . . . . . . . . . . . . . 46

6.2 Ranking of the Candidates . . . . . . . . . . . . . . . . . . 47

6.2.1 TF-IDF . . . . . . . . . . . . . . . . . . . . . . . . 49

6.2.2 ProportionalTe hnique. . . . . . . . . . . . . . . . 49

7 Con lusions and Future Developments 51 7.1 Future Developments . . . . . . . . . . . . . . . . . . . . . 52

(7)

This thesis des ribes the resear h work arried out to full the Master in

Computer S ien e at the University of Padua. The work was performed

during a visit at Yahoo! Resear h in Bar elona and was supervised by

Roelof van Zwol and Lluis Gar ia Puyeo. The visit was funded by the

EU Erasmus Programme. The resear h work des ribed inthis thesis was

part ofa largerresear h proje t whi h isunderway atYahoo! Resear h in

Bar elona.

Theresar hworkdes ribed inthisthesisaimedataddressingthe prob-

lemofdete tingandretrievingallthe logos ontainedinanimagegiven as

input. The problemofdete ting and retrievingalogo onsists ofavariety

of steps, ea h of whi h has dierent possible solutions. These steps were

investigatedinthisresear hwork. Althoughlogodete tionisanimportant

resear hproblemdue tothepotentialindustrialimpa t,asolutionhasnot

yetbeen proposed in the literature to our knowledge.

This thesisisorganizedintwoparts: olle tionpreparationand exper-

imentalstudy. Asregardstotheformer, a olle tionof logoswasdesigned

and implementedtotrain the lassier, toidentify and toextra tthe logo

features whi h were eventuallyused forlogo dete tion. The latter regards

thedete tionoflogosfromaninputimage. Inparti ular,theexperimental

study aimed to dete t if the input image ontains one or more logos and

(8)
(9)

Introdu tion

Beingabletoanalyzeavideo, extra tframesandtryingtogureoutwhat

they ontain automati ally is still anopen hallenge. The problem of en-

tity dete tion and re ognition in videos an be seen as the analysis of a

frames sequen e and, therefore, simplied to the analysis of single image.

Unfortunately there are no easy ways to identify orre tly what a pi ture

ontains.

This thesis is a pea e of a larger proje t fo used on the analysis of

images,toexaminewhetherit ontainslogos, andif so,tore ognize them.

What is a logo? It's a name, symbol or trademark designed for an easy

re ognition. Thus, given a generi image, the goal of the proje t is to

ompare its ontent with all the brands in a database and identify whi h

ones mat h with input image. Moreover, we will try tolo ate the logo in

the input image, obtainingthe orre tposition.

To better understandhowit'sstru tured thiswork,it'sne essarysplit

the main pro ess in two parts: the rst one is the data olle tions prepa-

ration, a large set of logos and images for training and testing, and the

se ond one is the real appli ation. This part is resumed in the following

steps:

ˆ Features Extra tion: Extra t the key-points of allthe logos (and all

the test images). These points are very des riptive for the ontents

of an image, and they are used to ompare and to analyze parts of

images.

ˆ Features Segmentation: Given a large set of key-points, extra ted

from the logos olle tion, we have to segment all of them to obtain

(10)

Forreal appli ationwemean the maingoal of the proje t,that ishow

to retrieve the logo(s) from an image given as input. We an resume this

pro edure inthe followingpoints:

ˆ Extra t the features: Dene the key-points of the input image with

the same te hnique used toextra t the key-points fromthe logos.

ˆ Classify the features: Given the features of the image, we want to

nd the logos segment that mat h better ea h key-point. All the

logos points in those segments are onsidered possibles andidates

logos.

ˆ Rank the andidates: Now we have a list of logos whi h have in

ommons some key-points with the input image. We want to rank

these points by some hara teristi s and keep only the most similar

ones.

ˆ Sele t the logo(s): After the ranking we have all the informationto

de ide if there is or not a logo inside the image and, in ase, whi h

one is ontained. Besidesinthispart we ande ideifthereare more

than one logo.

So, toarrange the assortment of brands, we retrieved all logos from a

web site famous to host alarge olle tionof more than 150K logos 1

. This

olle tion onstitutes our train set. On the other hand we used 16K im-

ages fortesting,andweretrieved themfromawebsitethat guarantees the

presen e of logos inits images 2

.

The main problem related to the proje t goal is how to ompare two

images . We don't need to ompare if an image is more or less similar to

another, butifanimageisin ludedintoanother. Theaimisverydierent

and the omplexity too. Todothis thereis onlyonewayinliterature: ex-

tra t the features pointsof both imagesand use them forthe omparison.

The rst issue is how to treat these points, be ause if we want to dete t

the logos features we need a way to ompare all the input image points

with all the logos ones: and there is no question to do it point-to-point.

In this ontextitisveryimportanttoknowwhatimagefeatures are: they

are the hara teristi parts of an image, texture, olor, and shape are ex-

amples of image features. There are some algorithms that an omputes

1

BrandsoftheWorld,http://www.brandsoftheworld. om/

2

Group "Brands, Marks & Logos" at Fli kr,

(11)

them asve tors, usually they are stru tured with64 or128 dimensions of

oating values. It is ne essary larify the dieren e from global and lo al

features. Therst one des ribes the wholeimage,ora bigpartof that, as

the brightness orthe ontrast of aphoto. The se ondone ismorespe i ,

itdes ribesapoint,oragroupof losepointsthatare very hara teristi s

for the image,but for example, the brightness in asmall part an be very

dierentfromthe globalvalue. This proje tuse thelo alfeaturesbe ause

it ompares parts of images and not the entire ones. The main problem

then is howto extra t features fromimages and how tomat hthem with

logos inthe trainingset. Weextra ted themobtainingaunique signature

foreveryinterestingpointoftheimage. The hoi eofthesepointsdepends

onthe approa hused: usuallyonefeatureisdes ribedbyoneve tor alled

lo al des riptor. Thereare several methodstoextra tthese points;inthis

thesis we'lldis uss about the main ones and we'llexplain our de ision.

It is important to onsider the working environment of this proje t, be-

ausetheautomati re ognitionoflogosimpliesastorageofalargenumber

of data: imagesof logos, informationonthe label,features extra ted, et .

A ordinglyone of the problems is the omputation time to ompare the

image with all the logos, we used 175K brands even if the number is ex-

pe ted to growday by day.

For the features extra tion, the problem is not in the image but in the

brand, be ause a logo may onsist of symbols, words or anything else. A

ommon and problemati ondition is that it ould be very simple with-

out some parti ular details than helps to a good dete tion; in on lusion

it may be hard dene a good features extra tor and a good omparison

metri s. After a brief resear h in literature, we de ided to use the SURF

des riptor. This de ision was taken for two main reasons: pre ision and

fast omputationspeed. Giventheinputimageandits al ulatedfeatures,

weneedtomat hits hara teristi swiththelogosimagesinthe olle tion.

Toavoidthetraversaloftheentire olle tionwithone-to-one omparisons,

weuse some te hniques tosegmentthe featuresin sub-groupsby ommon

properties. Anumeri alexampletounderstandbetterthesituationshould

be this: a logos (200x200) has an average of 100 features points and our

olle tion ontains 175K logos so they willbe 17Milion of points; instead

an image (500x375) has in average 440 features points: so, a set of om-

parisons point-to-point should be about 7500 million operations and this

is una eptable.

We analyzed dierent methods to resolve the problem: K-Means luster-

(12)

the points olle tionin groupsusing one of these te hniques, and retrieve

a representative ve tor for ea h of them. So, on e omputed the ve tors

of the input image, they willbe mat hed only with these ve tors and not

with allthe olle tion. By this we an redu e the number of omparisons

and onsequentlythe omputationtime. To ompare twove torsinamul-

tidimensionalspa e thereare manyway: weused the Pearson Correlation

that measures the similarity al ulating the osine of the angle between

them. Itisate hniqueoftenusedinTextMiningandinmanyotherelds.

It's important to dene how is stru tured this thesis be ause, as we

said, this is a part of a more omplex proje t. This is a Master Thesis

in Computer S ien e and ontains in details only the work developed by

the undersigned. The se tion that regards parts of dierent authors are

orre tlyhighlighted. Forthatreasontherearenotalltheexperimentation

andallthete hniquesused,butonlythemainones,nevertheless thethesis

is wrote togivea full idea of the proje t and it isstru tured as follow:

Chapter two: "Related Works" , It's essential spe ied the sour es

that have allowed us to study, understand and resolve the various prob-

lems en ountered inthe work.

Chapter three: "Test Colle tion" , In this hapter we'll des ribe the

two olle tions used: one with the brands logos and the other one with

Fli kr'sphotos. Therst oneaslogossour eandthese ondoneastesting

images. Willbe dis uss the reasonsof how itwas possible toobtain these

olle tions.

Chapter four: "Features Extra tion" ,Willbedis ussabout available

features and its problems. There are dierent des riptors, we'll see the

better one for the proje t after a short omparison with other methods.

Furthermore there will be abrief some appli ableee ts tothe imagesto

try to dete t the features in a more learly way. In the end we'll des ribe

how tomat h the points with relatedproblems and solutions.

Chapter ve: "Colle tion Segmentation", In this hapter we'll ana-

lyzehowtosegmentthe olle tion. We'lldes ribethe te hniquestryingto

highlightthe advantages and disadvantages of ea h methods with a short

des ription of the implementation ode.

Chapter six: "Classi ation and Ranking " , This is the real appli a-

(13)

logosretrieved. It ontainsthe lassi ationsofthefeatures extra tedand

the sele tionbetween the mostrepresentativelogos segments. Moreoverit

ontains the te hniques torank the logos andidates.

Chapter seven: "Con lusions and Future Work" , This is the nal

dis ussion aboutthe work,with ommentsand observationsand moreover

with suggestions and future steps or goals topursue.

(14)
(15)

Related Work

In this hapter we will prin ipally des ribe the state-of-the-art whi h will

be the basis of our work: features des riptor, segmentation, lassi ation

and lustering. There will be a short des ription of many te hniques al-

readyimplementedandusedin ertain ases whi hare losertoour needs.

There are dierent evaluation points to analyze, and some texts and pa-

pers were very useful for our de ision.

When we talk about logo dete tion we must onsider dierent mean-

ings: the main one is the identi ation of the logo into a do ument [1℄.

In this ase the dete tion ismade easierbe ause of thewhite ba kground.

In [2℄is ommonly used geometri invariants to prune the logos database

and lo al invariants to have a more rened mat h; on the other hand, in

[3℄ they used spatial density to involve a page segmentation omputing.

Anotherspread eld isthe removalof the hannel brandfroma television

program. This issueis analyzedinmany worksby Zhu[4℄ [5℄ orby Zhang

[6℄. There are dierent ways used to solve these topi , but they are all

based onsome fa ts: in televisionthe logo is ommonlyin a orner inthe

samepositionand,moreover, the ba kground images hange ontinuously,

instead, the logo's pixels that remain xed [7℄. This is useful for om-

mer ialdevelopment: in the transmissions re order where it'srequired an

automati removaloftheadvertising. It'sinterestingtonoti ethatduring

a television program they keep the tv logo but not during the advertise-

ments.

However the original goal of this proje t isto dete ta logo in avideo,

the main issue is to manageand retrieve the multimediadata [8℄. A brief

but omplete introdu tionof this area is des ribed in [9℄. The arti le was

(16)

ontent analysis stru tured on the following points: feature extra tion,

stru ture analysis, abstra tion, and indexing. This is broadly the pro-

edure adopted by our proje t, but ea h pro ess poses many hallenging

resear h problems. In this area we annot avoidmentioningthe Sivi and

Zisserman's Video Google [10, 11℄, these two works are done respe tively

in 2003 and in 2006 in whi h they reated a strong onne tion between

video andtextretrieval. They ompared avideoframetoado umentand

they onsidered avisual interest point asa word, applying the te hniques

of Text Retrieval. Again, in 2005, they published a te hni al report for

MIT [12℄ with unsupervised method. A pe uliar aspe t is the ompari-

son with a semi-supervised approa h [13℄. In almost allthese works they

used the SIFT Des riptor [14℄. In this proje twe willuse adierentkind

of feature extra toralgorithm with similarbut faster performan e, the so

alled SURF [15, 16℄. Thereare other ways toextra t lo alfeatures from

an image su h as [17, 18, 19, 20℄, this does not depend on the presen e

of edges or orners in the image but it is purely intensity-based whi h is

invariant to full ane transformations. The Harris dete tor [21℄ instead,

based onthe prede essorpresented by Morave [22℄ inhisPhD thesis,has

the aimtosele t the orners hara teristi s. On the otherhand the Maxi-

mally StableExtremal Regions [23℄isanotherblobdete tion methodwhi h

extra ts some area of interest, similar to a image segmentation; there is

additionallyanextendedversionwith oloursupport[24℄. Therearemany

debatesanddis ussionsaboutwhi histhebestdes riptorin[25,26,27,28℄

for example, but it ontinues tobe anissue withoutsolution.

Themainproblemoflogodete tioninimagesistondthebest wayto

omparethefeaturesextra tedfromtheimagestotheones olle tedinthe

logo database. Ifwedoitpoint-to-pointahuge numberof omparisonwill

o ur. Forthis reason we need some expedients. Furtheraid toour s ope

it isthe segmentation of the image. Segmenting animageintoregionsit's

very helpful be ause the area toextra t the features is alreadysegmented

and we an ompare the features only for determinate surfa es. MIT [29℄

and Berkeley[30℄built someappli ationsuseful totest theirsegmentation

te hniques. The issue of these methodsis the hard ompatibility with the

features extra tor algorithm. Indeed as SIFT or SURF dete t a lot of

points near the orners or lose the borders, omputing them only in one

region an lead to lossof information. Another way to redu e operations

and omputationaltime istosegment ingroupsall the set ofve tors, and

to analyze the image features to dene whi h is the best group for every

(17)

is the way hosen by this proje t.

In this thesis, to split the ve tors olle tion of logos' features, we use

a de ision tree, spe i ally a binary de ision tree. This is a method om-

monly used in de ision analysis to help identi ation. Indeed it'sused to

nd the logo into a do ument too, for example in [31℄ to redu e the false

positive fromthe andidates.

(18)

Riferimenti

Documenti correlati

contrary, it pursues the aim to arouse the reader’s curiosity about this topic and to make him/her realize how vast and sometimes blurred the field of screen translation can be, in

Díaz Cintas and Orero define it as a method which “involves replacing the original soundtrack containing the actors’ dialogue with a TL recording that reproduces the original

1 •Proprietà2 (monotonia): Date due distribuzioni unitarie con n termini, rispettivamente, x 1,x2,..,xne y1,y2,..,yn, se vale la condizione x j≤yjper ogni j, e almeno una volta

LAD Left anterior descending coronary artery LCL Lateral collateral ligament. LCX Left circumflex coronary artery LES Lower

We note that since the correlation function is an integral of the power spectrum across a range of scales, if the power spectrum is systematically underestimated across the range

Different types of gestures (metaphoric- referential and pragmatic ones) were identified, and in some cases, different types of visuals, too, contributed to multimodal ensembles with

completamente sbarazzarsi delle polveri eventual- mente accumulatesi alla superficie cutanea duran- te le ore di lavoro, e far ritorno alle loro 'abitazioni

However, in the canonical scenario of stel- lar evolution (no rotation), such systems are not promising as SNe Ia progenitors since the accretion rate resulting after the merging of