VECTORS
A. Bonzanini,R. Leonardi, P. Migliorati
DEA - University of Brescia , Via Branze, 38- 25123, Brescia, Italy
Tel: +39 030 3715434;fax: +39 030 380014
e-mail: leon@ing.unibs.it
ABSTRACT
With the diusion of large video databasesand
"elec-tronicprogram guides",theproblemof semanticvideo
indexingisofgreatinterest. Inliteraturewecanfound
manyvideoindexingalgorithms,basedonvarioustypes
oflow-levelfeatures,buttheproblemofsemanticvideo
indexingislessstudiedandsurelyitisagreat
challeng-ingone. Inthispaperwepresentaparticularsemantic
videoindexingalgorithmbasedonthemotion
informa-tionextractedfromMPEGcompressedbit-stream. This
algorithm is an example of solution to the problem of
nding asemantic event (scoring of a goal)in case of
specic typeofsequences(soccervideo).
1 INTRODUCTION
With modern technologies we are able to store and
transmitlargequantitiesofaudiovisualmaterial,thanks
to the development of devices with large storage and
transmissioncapabilitiesandtotheuseofeÆcientdata
compressiontechniques.
The need of an eective management of audiovisual
materialhasbroughttothedevelopmentofsemantic
in-dexingtechniques[2][1]usefulforvarious applications,
suchas,forexample,electronic programguides.
InFigure 1is representedthefact that,whileaman
uses its cognitive skills to face the semantic indexing
problem, anautomatic systemcan face it in twosteps
[3]: in the rst step some "low-level indexes" are
ex-tractedin order to representlowlevelinformation ina
compactway; in the second step weneed a
"decision-makingalgorithm"toextractasemanticindexfrom
low-levelindexes.
While the problem of low-levelindexes extraction is
widely discussed in literature, the problem of nding
decision-makingalgorithmsislessstudied; furthermore
itisquiteclearthatthisproblemcanbepartiallysolved
onlyforparticulartypesofsequences.
In this work we have addressed the problem of
se-mantic video indexing using the associated motion
in-formation. We have faced this problem tryingto nd
acorrelationbetweensemanticfeaturesand motion
in-Inliteraturemanyindexessuitabletobeusedforany
type of videosequences canbe found, no matterwhat
thesemanticcontentofthesequenceis;onthecontrary
the problem of nding a correlation between these
in-dexes and semanticcontent of video sequences canbe
solved only for a specic type of sequences. For this
reason in this work we haveconsidered theproblem of
semanticindexing of soccervideo sequences; the
stan-dard formatchosen for their representation is MPEG,
whichiswidelyusedforstorageandtransmissionofTV
signals.
Forsoccervideosequences thesemanticcontentcan
beidentiedwiththepresenceofinterestingeventssuch
as, forexample,goals, shots to goal,andso on. These
events canbe found at the beginning orat theend of
the game actions. A good semantic index of a soccer
videosequence couldbetherefore asummary madeup
of a list of all game actions, each characterized by its
beginningandending event. Such asummarycouldbe
veryusefultosatisfyvarioustypesofsemanticqueries.
Inparticularwehavechosenthreelow-levelmotion
in-dexeswhichrepresentthefollowingcharacteristics: lack
of motion and camera operations, represented by pan
and zoom parameters. We have then studied the
cor-relationbetweentheseindexesand thesemanticevents
dened above,and we havefound that cameramotion
isverymeaningful. Exploitingthiscorrelation,the
pro-posed algorithmdetectsthe presenceofgoalsand
rele-vantevents.
The document is organized as follow. In Section 2
we will discuss the indexes chosen to represent these
low-levelmotioninformation. InSection3theproposed
algorithmispresented,whileinSection4wewillreport
someexperimentalresults. Finalconclusionsaredrawn
in Section5.
2 LOW-LEVELMOTION INDEXES
Wehaveusedthreemotionindexesrelatedtothe
follow-ing infomation: "lack of motion" and "camera motion
parameters",pan andzoom.
Sequence Cognitive skills Index extraction Decision-making algorithm Semantic index H H H H H j ? + @ @ @ @ @ R system Human
Figure1: Semanticvideoindexingproblemsolution.
eachP-frameby:
= 1 MN I M 1 X i=0 N 1 X j=0 q v 2 x (i;j)+v 2 y (i;j) (1) <S no motion (2)
where M and N are the frame dimensions (in
mac-roblocks),I is thenumberof intra-coded macroblocks,
v
x and v
y
are the horizontal and vertical components
ofmotionvectorsandS
no motion
isthethresholdvalue.
Wehaveevaluatedthecorrelationbetweenthis
parame-terandsemanticeventsforvariousthresholdvalues,and
wehavefoundthatwiththevalueofS
no motion =4we
are able to detect 65 over92 semantic events with 60
falsedetections (asking the presence of lack of motion
for at least 3 P-frames). Qualitativelywe have found
lackof motion before eventsat the beginning of game
actionsorafter eventsattheendofgameactions.
Camera motion parameters, represented by "pan"
and "zoom" factors [5], have been evaluated using a
least-mean square method applied to P-frame motion
elds[6]. Wehavedetectedfastpan (orfastzoom)by
thresholdingthepan factor(orthezoomfactor),using
thethresholdvalueS
pan (orS zoom ): pan>S pan (3) zoom>S zoom (4)
Exploitingthecorrelationbetweenpanandzoom
fac-torsand semanticevents, wehavenoticed that wecan
ndfastpan in correspondencewith goalshots orfast
passes, andfast zoomin correspondencewith
interest-ingsituations underlinedbythecameraoperator.
Ask-ingthepresence offastpan ORfast zoom forat least
3P-frames, weare abletodetect53 over103semantic
2
2.5
3
3.5
4
4.5
5
20
30
40
50
60
70
80
90
S
no−motion
n.
correct detection
false detection
Figure 2: Number of signicant events detected(total
number: 92)andfalsedetections,withvariousvaluesof
S no motion .
15
16
17
18
19
20
21
22
23
24
25
10
20
30
40
50
60
70
80
90
S
pan
n.
correct detection
false detection
Figure 3: Number of signicant events detected(total
number: 103) and falsedetections, withvarious values
ofS
pan .
0.01
0.012
0.014
0.016
0.018
0.02
0.022
0.024
0.026
0.028
0.03
15
20
25
30
35
40
45
50
55
60
S
zoom
n.
correct detection
false detection
Figure 4: Number of signicant events detected(total
number: 103) and falsedetections, with variousvalues
ofS
zoom .
3 THE GOAL-FINDING ALGORITHM
ThemotionindexesabovementionedarenotsuÆcient,
individually, to reach satisfying results (all semantic
eventsdetected with fewfalse detection). Tond
par-ticular events, such as, for example, goals or shot to
goals, we have tried to exploit the temporal evolution
of motionindexes in correspondence with such events.
Wehavenoticed that in correspondence with goalswe
can nd fast pan or zoom followed by lack of motion
followedby ashot cut. The sequence of this low level
eventshavethereforebeendetectedandexploited.
Shot cuts havebeen detected using only motion
in-formation too; in particular, we have used the sharp
variation of the above mentioned motion indexes and
thenumberof intra-coded macroblocksof P-frames[7]
[4].
4 EXPERIMENTAL RESULTS
Wehave tested theperformance of the proposed
goal-ndingalgorithmon2hoursofMPEG2sequences
con-tainingthesemanticeventsreportedon table1, where
isreported thenumberofdetected events too. Almost
alllivegoalsare detected,and thealgorithmis ableto
detectsomeshotstogoaltoo,whileitgivespoorresults
on penalties. The number of false detection is quite
relevant, but we have to take into account that these
resultsare obtained using motioninformation only, so
these falsedetection will probably be eliminated using
otherinformation(typicallyaudioinformation).
5 CONCLUSIONS
Inthispaperwehavepresentedasemanticindexing
al-gorithmbasedonmotioninformationextracteddirectly
Events Present Detected
Live Replay Total Live Replay Total
Goals 20 14 34 18 7 25
Shotstogoal 21 12 33 8 3 11
Penalties 6 1 7 1 0 1
Total 47 27 74 27 10 37
False 116
Table1: Performanceoftheproposedgoal-nding
algo-rithm.
algorithmwhichexploitsthetemporalevolutionofsome
low-levelmotionindexes,togetherwithitsperformance
when appliedto someMPEG2 testsequences. The
ob-tained results are encouraging, particularly if we take
intoaccountthe fact that the problem isverydiÆcult
andthat wehaveusedmotioninformationonly.
Currentresearchare devoted to theextension ofthe
proposedideatoothertypesofsequences.
References
[1] C. Saraceno and R. Leonardi, "Indexing
audio-visual databases through a joint audio and video
processing",InternationalJournalof Imaging
Sys-temsandTechnology,9(5):320-331,Oct.1998.
[2] Yao Wang,Zhu Liu, andJin-chengHuang,
"Mul-timedia ContentAnalysis UsingAudio and Visual
Information",ToappearinSignalProcessing
Mag-azine.
[3] Reginald Lagendijk, "A Position Statement for
Panel1: ImageRetrieval", Proc. of theVLBV99,
Kyoto,Japan,pp.14-15,October29-30,1999.
[4] Thomas Sikora, "MPEG Digital Video-Coding
Standards", IEEE Signal Processing Magazine,
September1997,vol.14,n. 5.
[5] GAvid,"Determiningtree-dimensionalmotionand
structure from optical ow generated by several
moving object", IEEE Trans. on PAMI,
PAMI-7:384-401,July1985.
[6] P. Migliorati and S. Tubaro, "Multistage motion
estimation for image interpolation", Image
Com-munication,July1995,pp.187-199.
[7] YiningDengandB.S.Manjunath,"Content-based
searchof video using color,texture, andmotion",
Proc. of IEEE International Conference onImage
ProcessingICIP-97,vol.2,SantaBarbara,