Semantic Video Indexing Using MPEG Motion Vectors


A. Bonzanini, R. Leonardi, P. Migliorati

DEA - University of Brescia, Via Branze 38 - 25123 Brescia, Italy

Tel: +39 030 3715434; fax: +39 030 380014

e-mail: leon@ing.unibs.it

ABSTRACT

With the diffusion of large video databases and "electronic program guides", the problem of semantic video indexing is of great interest. Many video indexing algorithms based on various types of low-level features can be found in the literature, but semantic video indexing itself is less studied and remains a very challenging problem. In this paper we present a semantic video indexing algorithm based on the motion information extracted from the MPEG compressed bit-stream. The algorithm is an example of a solution to the problem of finding a semantic event (the scoring of a goal) in a specific type of sequence (soccer video).

1 INTRODUCTION

With modern technologies we are able to store and transmit large quantities of audiovisual material, thanks to the development of devices with large storage and transmission capabilities and to the use of efficient data compression techniques.

The need for effective management of audiovisual material has led to the development of semantic indexing techniques [2] [1], useful for various applications such as electronic program guides.

Figure 1 illustrates that, while a human uses cognitive skills to solve the semantic indexing problem, an automatic system can approach it in two steps [3]: in the first step, some "low-level indexes" are extracted in order to represent low-level information in a compact way; in the second step, a "decision-making algorithm" is needed to extract a semantic index from the low-level indexes.

While the problem of low-level index extraction is widely discussed in the literature, the problem of finding decision-making algorithms is less studied; furthermore, it is quite clear that this problem can be partially solved only for particular types of sequences.

In this work we have addressed the problem of semantic video indexing using the associated motion information. We have approached this problem by trying to find a correlation between semantic features and motion in[formation]. In the literature, many indexes suitable for any type of video sequence can be found, no matter what the semantic content of the sequence is; on the contrary, the problem of finding a correlation between these indexes and the semantic content of video sequences can be solved only for a specific type of sequence. For this reason, in this work we have considered the problem of semantic indexing of soccer video sequences; the standard format chosen for their representation is MPEG, which is widely used for storage and transmission of TV signals.

For soccer video sequences, the semantic content can be identified with the presence of interesting events such as goals, shots to goal, and so on. These events can be found at the beginning or at the end of the game actions. A good semantic index of a soccer video sequence could therefore be a summary made up of a list of all game actions, each characterized by its beginning and ending event. Such a summary could be very useful to satisfy various types of semantic queries.

In particular, we have chosen three low-level motion indexes which represent the following characteristics: lack of motion, and camera operations, represented by pan and zoom parameters. We have then studied the correlation between these indexes and the semantic events defined above, and we have found that camera motion is very meaningful. Exploiting this correlation, the proposed algorithm detects the presence of goals and relevant events.

The document is organized as follows. Section 2 discusses the indexes chosen to represent this low-level motion information. Section 3 presents the proposed algorithm, while Section 4 reports some experimental results. Final conclusions are drawn in Section 5.

2 LOW-LEVEL MOTION INDEXES

We have used three motion indexes related to the following information: "lack of motion" and the "camera motion parameters", pan and zoom.

[Figure 1 diagram: a sequence is mapped to a semantic index either by a human, through cognitive skills, or by a system, through index extraction followed by a decision-making algorithm.]

Figure 1: Semantic video indexing problem solution.

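Lack of motion has been detected by thresholding the mean motion vector magnitude $\mu$, computed for each P-frame by:

$$\mu = \frac{1}{MN - I} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \sqrt{v_x^2(i,j) + v_y^2(i,j)} \tag{1}$$

$$\mu < S_{\text{no motion}} \tag{2}$$

where $M$ and $N$ are the frame dimensions (in macroblocks), $I$ is the number of intra-coded macroblocks, $v_x$ and $v_y$ are the horizontal and vertical components of the motion vectors, and $S_{\text{no motion}}$ is the threshold value.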

We have evaluated the correlation between this parameter and semantic events for various threshold values, and we have found that with $S_{\text{no motion}} = 4$ we are able to detect 65 out of 92 semantic events with 60 false detections (requiring lack of motion for at least 3 P-frames). Qualitatively, we have found lack of motion before events at the beginning of game actions or after events at the end of game actions.
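As an illustration, the lack-of-motion index could be computed along the following lines. This is only a sketch under stated assumptions: the function names and the dictionary-based motion field are hypothetical, and the default values simply mirror the threshold of 4 and the 3 P-frame persistence quoted above.

```python
import math


def mean_motion_magnitude(vectors, rows, cols, intra_count):
    """Mean motion-vector magnitude of one P-frame, following Eq. (1).

    vectors maps (i, j) -> (vx, vy) for inter-coded macroblocks only;
    intra-coded macroblocks carry no vector and are simply absent.
    (This data layout is an assumption made for the example.)
    """
    inter_count = rows * cols - intra_count
    if inter_count <= 0:
        return None  # fully intra-coded frame: the index is undefined
    total = sum(math.hypot(vx, vy) for vx, vy in vectors.values())
    return total / inter_count


def lack_of_motion_runs(mu_values, threshold=4.0, min_frames=3):
    """Return (start, end) P-frame spans, end exclusive, where the mean
    magnitude stays below threshold for at least min_frames P-frames."""
    runs, start = [], None
    for k, mu in enumerate(mu_values):
        still = mu is not None and mu < threshold
        if still and start is None:
            start = k
        elif not still and start is not None:
            if k - start >= min_frames:
                runs.append((start, k))
            start = None
    # Close a run that extends to the last P-frame.
    if start is not None and len(mu_values) - start >= min_frames:
        runs.append((start, len(mu_values)))
    return runs
```

For example, `lack_of_motion_runs([5, 3, 2, 1, 6, 3, 3, 1, 2])` keeps the spans `(1, 4)` and `(5, 9)`, since both last at least three P-frames.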

Camera motion parameters, represented by "pan" and "zoom" factors [5], have been evaluated using a least-mean-square method applied to P-frame motion fields [6]. We have detected fast pan (or fast zoom) by thresholding the pan factor (or the zoom factor), using the threshold value $S_{\text{pan}}$ (or $S_{\text{zoom}}$):

$$\text{pan} > S_{\text{pan}} \tag{3}$$

$$\text{zoom} > S_{\text{zoom}} \tag{4}$$
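The text does not spell out the camera model, so the sketch below assumes a simple one for illustration: each motion vector is modelled as $v_x = p_x + z\,x$, $v_y = p_y + z\,y$, where $(x, y)$ is the macroblock position relative to the frame centre, $(p_x, p_y)$ is the pan and $z$ is the zoom factor. The parameters are obtained from the closed-form least-squares solution. The function names, the use of the pan magnitude as the "pan factor", and the use of the absolute zoom value are all assumptions, not the authors' stated method.

```python
import math


def estimate_pan_zoom(samples):
    """Least-squares fit of vx = px + z*x, vy = py + z*y.

    samples: list of (x, y, vx, vy), one entry per inter-coded macroblock,
    with (x, y) measured from the frame centre (assumed convention).
    Returns (pan_x, pan_y, zoom).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two motion vectors are needed")
    sx = sum(x for x, _, _, _ in samples)
    sy = sum(y for _, y, _, _ in samples)
    svx = sum(vx for _, _, vx, _ in samples)
    svy = sum(vy for _, _, _, vy in samples)
    srr = sum(x * x + y * y for x, y, _, _ in samples)
    srv = sum(x * vx + y * vy for x, y, vx, vy in samples)
    # Normal equations reduced to a single equation in the zoom factor.
    denom = srr - (sx * sx + sy * sy) / n
    if denom == 0:
        raise ValueError("degenerate macroblock geometry")
    zoom = (srv - (svx * sx + svy * sy) / n) / denom
    pan_x = (svx - zoom * sx) / n
    pan_y = (svy - zoom * sy) / n
    return pan_x, pan_y, zoom


def is_fast_camera_motion(pan_x, pan_y, zoom, s_pan, s_zoom):
    """Fast pan OR fast zoom, in the spirit of Eqs. (3) and (4)."""
    return math.hypot(pan_x, pan_y) > s_pan or abs(zoom) > s_zoom
```

The thresholds `s_pan` and `s_zoom` are left as explicit arguments because their chosen values are not recoverable from the text.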

Exploiting the correlation between pan and zoom factors and semantic events, we have noticed that fast pan occurs in correspondence with goal shots or fast passes, and fast zoom in correspondence with interesting situations underlined by the camera operator. Requiring the presence of fast pan OR fast zoom for at least 3 P-frames, we are able to detect 53 out of 103 semantic events [...]

[Figure 2 plot: number (n.) of correct detections and false detections versus $S_{\text{no motion}}$.]

Figure 2: Number of significant events detected (total number: 92) and false detections, with various values of $S_{\text{no motion}}$.

[Figure 3 plot: number (n.) of correct detections and false detections versus $S_{\text{pan}}$.]

Figure 3: Number of significant events detected (total number: 103) and false detections, with various values of $S_{\text{pan}}$.


[Figure 4 plot: number (n.) of correct detections and false detections versus $S_{\text{zoom}}$.]

Figure 4: Number of significant events detected (total number: 103) and false detections, with various values of $S_{\text{zoom}}$.

3 THE GOAL-FINDING ALGORITHM

The motion indexes mentioned above are not sufficient, individually, to reach satisfactory results (all semantic events detected with few false detections). To find particular events such as goals or shots to goal, we have tried to exploit the temporal evolution of the motion indexes in correspondence with such events. We have noticed that, in correspondence with goals, we can find fast pan or zoom followed by lack of motion followed by a shot cut. The sequence of these low-level events has therefore been detected and exploited.
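The temporal pattern described above (fast pan or zoom, then lack of motion, then a shot cut) can be sketched as a simple search over per-P-frame flags. The function names are hypothetical, and the `max_gap` tolerance between consecutive stages is an assumption introduced for the example; the text does not state how close the three events must be.

```python
def find_runs(flags, min_frames=3):
    """(start, end) spans, end exclusive, where flags stay True for at
    least min_frames consecutive P-frames."""
    runs, start = [], None
    for k, flag in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            if k - start >= min_frames:
                runs.append((start, k))
            start = None
    return runs


def goal_candidates(fast_camera, no_motion, shot_cuts, max_gap=10):
    """Return the P-frame indices of shot cuts that close the pattern
    fast pan/zoom -> lack of motion -> shot cut.

    fast_camera, no_motion: per-P-frame boolean flags.
    shot_cuts: iterable of P-frame indices where a cut was detected.
    max_gap: assumed maximum distance (in P-frames) between stages.
    """
    camera_runs = find_runs(fast_camera)
    still_runs = find_runs(no_motion)
    candidates = []
    for cut in sorted(shot_cuts):
        for still_start, still_end in still_runs:
            # The cut must come shortly after the still period ends.
            if not 0 <= cut - still_end <= max_gap:
                continue
            # The still period must start shortly after a fast camera run.
            if any(0 <= still_start - cam_end <= max_gap
                   for _, cam_end in camera_runs):
                candidates.append(cut)
                break
    return candidates
```

Each returned index marks a shot cut that ends a candidate goal action; the preceding runs give the approximate extent of the action itself.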

Shot cuts have also been detected using only motion information; in particular, we have used the sharp variation of the above-mentioned motion indexes and the number of intra-coded macroblocks of P-frames [7] [4].
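A minimal sketch of such a motion-only shot-cut detector is given below. How the two cues are combined, and the threshold values `intra_ratio` and `index_jump`, are not given in the text; the OR rule and the default values here are purely illustrative assumptions.

```python
def detect_shot_cuts(intra_counts, index_values, total_blocks,
                     intra_ratio=0.6, index_jump=8.0):
    """Flag P-frames that look like shot cuts.

    intra_counts: number of intra-coded macroblocks in each P-frame.
    index_values: one motion index per P-frame (e.g. mean magnitude).
    total_blocks: macroblocks per frame (M * N).
    A frame is flagged when most of its macroblocks are intra-coded,
    or when the motion index changes sharply from the previous P-frame.
    """
    cuts = []
    for k, intra in enumerate(intra_counts):
        many_intra = intra / total_blocks >= intra_ratio
        sharp_change = (k > 0 and
                        abs(index_values[k] - index_values[k - 1]) >= index_jump)
        if many_intra or sharp_change:
            cuts.append(k)
    return cuts
```

The intuition behind the first cue is that, across a cut, motion-compensated prediction from the previous frame fails, so the encoder falls back to intra coding for many macroblocks of the P-frame.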

4 EXPERIMENTAL RESULTS

We have tested the performance of the proposed goal-finding algorithm on 2 hours of MPEG-2 sequences containing the semantic events reported in Table 1, which also reports the number of detected events. Almost all live goals are detected, and the algorithm is able to detect some shots to goal too, while it gives poor results on penalties. The number of false detections is quite high, but we have to take into account that these results are obtained using motion information only, so these false detections will probably be eliminated using other information (typically audio information).
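To make these observations concrete, the detection rates implied by Table 1 can be worked out directly from its counts. The snippet below only restates the table's figures; the variable and function names are illustrative, and treating detected events against false detections as a precision measure is an interpretation rather than a figure reported in the paper.

```python
# Counts copied from Table 1: (live, replay) per event type.
present = {"goals": (20, 14), "shots to goal": (21, 12), "penalties": (6, 1)}
detected = {"goals": (18, 7), "shots to goal": (8, 3), "penalties": (1, 0)}
false_detections = 116


def detection_rate(event, kind=None):
    """Fraction of present events that were detected.

    kind is "live", "replay", or None for live and replay together.
    """
    if kind is None:
        return sum(detected[event]) / sum(present[event])
    index = {"live": 0, "replay": 1}[kind]
    return detected[event][index] / present[event][index]


total_present = sum(sum(v) for v in present.values())    # 74
total_detected = sum(sum(v) for v in detected.values())  # 37
overall_rate = total_detected / total_present
precision = total_detected / (total_detected + false_detections)
```

With these counts, 18 of the 20 live goals are found (a rate of 0.9), half of all events are detected overall, and roughly one detection in four corresponds to a real event.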

5 CONCLUSIONS

In this paper we have presented a semantic indexing algorithm based on motion information extracted directly

Events           Present                  Detected
                 Live   Replay   Total    Live   Replay   Total
Goals             20      14      34       18      7       25
Shots to goal     21      12      33        8      3       11
Penalties          6       1       7        1      0        1
Total             47      27      74       27     10       37

False detections: 116

Table 1: Performance of the proposed goal-finding algorithm.

algorithm which exploits the temporal evolution of some low-level motion indexes, together with its performance when applied to some MPEG-2 test sequences. The obtained results are encouraging, particularly if we take into account the fact that the problem is very difficult and that we have used motion information only.

Current research is devoted to the extension of the proposed idea to other types of sequences.

References

[1] C. Saraceno and R. Leonardi, "Indexing audio-visual databases through a joint audio and video processing", International Journal of Imaging Systems and Technology, 9(5):320-331, Oct. 1998.

[2] Yao Wang, Zhu Liu, and Jin-cheng Huang, "Multimedia Content Analysis Using Audio and Visual Information", to appear in Signal Processing Magazine.

[3] Reginald Lagendijk, "A Position Statement for Panel 1: Image Retrieval", Proc. of the VLBV99, Kyoto, Japan, pp. 14-15, October 29-30, 1999.

[4] Thomas Sikora, "MPEG Digital Video-Coding Standards", IEEE Signal Processing Magazine, vol. 14, n. 5, September 1997.

[5] G. Adiv, "Determining three-dimensional motion and structure from optical flow generated by several moving objects", IEEE Trans. on PAMI, PAMI-7:384-401, July 1985.

[6] P. Migliorati and S. Tubaro, "Multistage motion estimation for image interpolation", Image Communication, July 1995, pp. 187-199.

[7] Yining Deng and B. S. Manjunath, "Content-based search of video using color, texture, and motion", Proc. of IEEE International Conference on Image Processing ICIP-97, vol. 2, Santa Barbara,