• Non ci sono risultati.

Structural Properties of XPath Fragments

N/A
N/A
Protected

Academic year: 2021

Condividi "Structural Properties of XPath Fragments"

Copied!
22
0
0

Testo completo

(1)

UNIVERSITY

OF TRENTO

DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY

38050 Povo – Trento (Italy), Via Sommarive 14

http://www.dit.unitn.it

STRUCTURAL PROPERTIES OF XPATH FRAGMENTS

Michael Benedikt, Wenfei Fan and Gabriel M. Kuper

October 2002

Technical Report # DIT-02-0089

(2)
(3)

Michael Benedikt BellLaboratories benedikt@research.bell-labs.com Wenfei Fan BellLaboratories wenfei@research.bell-labs.com Gabriel M. Kuper UniversitadiTrento kuper@acm.org Abstract

We study structural properties of each of the main

sublanguagesofXPath[8] commonlyused in practice.

First,wecharacterizetheexpressivepowerofthese

lan-guagefragmentsin termsof both logics and tree

pat-terns. Second, we investigate closure properties,

fo-cusing on the ability to perform basic Boolean

oper-ationswhileremainingwithinthefragment. Wegivea

completepictureoftheclosurepropertiesofthese

frag-ments,treatingXPathexpressionsbothasfunctionsof

arbitrary nodes in a document tree, and as functions

that are applied only at the root of the tree. Finally,

weprovidesoundandcompleteaxiomsystemsand

nor-malformsforseveralofthesefragments. Theseresults

are useful for simpli cation of XPath expressions and

optimizationofXMLqueries.

1 Introduction

XPath[8] is a languagefor specifying the selectionof

element nodes within XML documents. It has been

widely used in XML querylanguages (e.g., XSLT [7],

XQL [22],XQuery[5]),XMLspeci cations(e.g.,XML

Schema [23]), and subscription systems (e.g., [6]). It

supports a number of powerfulmodalities and thus is

rather expensive to process [3, 18, 10]. In practice,

many applications do not need the excessive power

of the full language; they use only a fragment of

XPath. Forexample, XML Schema speci es integrity

constraintswithanXPathfragmentthatdoesnot

sup-portupwardmodalities(theparentandancestoraxes).

Itisthusnecessarytostudytheexpressivenessand

opti-mizationoftheseXPathfragments,sincetheiranalysis

andsimpli cationarecriticalforeÆcientprocessingof

XMLdocuments.

Theseconsiderationsmotivateustoconsidera

vari-etyoffragments,focusingonseveraldichotomies:

 downward vs.upward: somefragmentssupport

bothdownwardandupwardnavigation,while

oth-ersallowonlydownwardtraversal.

 recursive vs. nonrecursive: somefragments

al-low navigation alongthe ancestor and descendant

axes, while others permit only parent and child

axes.

 quali ed vs. non-quali ed: fragments may or

maynotincludequali ers (predicatestesting

prop-ertiesofanotherexpression).

Theaimofourworkistoshowhowthesefragments

di erfromoneanother,in termsofexpressivenessand

structural properties, andto developmethods for

sim-plifyingXPathexpressionsineachfragment.

Our rstcontributionisacharacterizationofthe

ex-pressivenessofthesefragmentsintermsoflogicsaswell

astreepatterns,aclassofqueriesfundamentalinmany

areasofcomputerscience[12,14] andstudiedrecently

inconnectionwithLDAPandXML[1]. Extendingthe

preliminary results of [15], we show that the natural

XPath fragments with quali ers can be characterized

viathepositiveexistentialfragmentof rst-orderlogic,

andthatthesefragmentsalsomatchexactlythequeries

formed using appropriate tree patterns. One

surpris-ing consequence of our logicalcharacterization is that

equality quali ers canbeadded to the fragments with

quali ers,withoutincreasein expressivepower.

The second contribution is to outline the

contain-ments that hold between these fragments, and to

dis-coverwhatadditionaloperationscanbede nedwithin

them. Using ourlogicalandpattern characterizations,

weshowthatthecontainmentsholdingbetweenXPath

fragmentscandi ersigni cantlydependingonwhether

theirexpressionsaretobeevaluatedatarbitrarynodes

of an XML document tree or are restricted to work

onlyfrom therootof thetree;that is, wedescribethe

containments that hold under general equivalence and

rootequivalence. Intheprocessweshowthatevery

ex-pression with upward modalities is root-equivalent to

onewithoutupwardnavigation. Thisisimportantfor,

(4)

sure properties of XPath fragments. Knowing that a

fragmentF isclosedunderasetofoperationsS isnot

only helpful for optimization algorithms {

manipula-tions involving S canbe done while remaining in the

fragment{itisalso usefulforXPathimplementers,as

it indicates that the operations in S can be built on

topofthefundamentaloperationsofthefragment. We

giveacompletepictureoftheclosure propertiesunder

BooleanoperationsforallofourXPathfragments,

mak-inguseofthelogicalandpatterncharacterizations

men-tionedabove. As with containment,weshow that the

closureproperties ofafragmentunder general

equiva-lencemaydi erfromthose underrootequivalence.

The nalcontributionofthepaperisin connection

withsimpli cation ofXPathexpressions. Weapproach

thisproblembasedonproofsystems forXPathusinga

set of axioms that allow simpli cation of expressions.

Thisis aninitial steptowardestablishinganalgebraic

frameworkbothfordirectingtheoptimizationprocess,

andfor allowinganoptimizer totrade o weaker

sim-pli cationforloweroptimization time. Notethat each

individualfragmentrequires aseparateanalysis of the

axiomatizability of containment/equivalence. Indeed,

given fragmentsF 1  F 2 , F 2

may admit a simple set

ofsimpli cationrulesalthoughF

1

doesnot,due tothe

lackof certainclosurepropertiesinF

1

,andvice versa.

To this end, weestablish somepreliminary results,

both positive and negative, forthe axiomatizabilityof

ourXPath fragments. We showthat no nite

axiom-atization can exist for any of our fragments; but we

givesoundandcompletecomputableaxiomsystemsfor

thedownwardnon-quali ed fragments. We show that

somefragmentsactuallybecome nitelyaxiomatizable

whenthe labels arerestrictedto come from a xed

-nitealphabet. Wealsoprovidenormalforms forsome

ofthesefragments,whichareuniqueuptoequivalence

ofexpressions.

Taken together, these results give a picture of the

advantagesanddisadvantagesofworkingwithina

par-ticularfragment.

Related work: There hasbeenconsiderablework on

theexpressiveness and complexityof XPath. Oneline

ofinvestigationstudiesformal modelsofXPath[3,18,

17, 16], abstracting away from the concrete syntax of

XPathinto apowerfulautomatonmodel{complexity

boundsarethenderivedfromboundsontheautomata.

The results to date have generally given considerable

insightintotheexpressivenessandcomplexityofXPath

asawhole,buthaveshedlittlelightonthedistinction

bestofourknowledge,thereisnopreviousworkdealing

directly withtheexpressivenessofXPathfragments.

An active area of research with direct bearing on

XPath optimization has been the analysis of the

con-tainment problem for XPath: in aseries of papers [1,

9, 15, 25], lowerand upperbounds forthe complexity

of containment havebeen establishedfor anumber of

XPathfragments. Note thatunderstanding

axiomatiz-ability ofcontainment in afragment requires a

funda-mentallydi erentanalysis from thesemanticmethods

usedinthesepapers,sincetheexistenceofanaxiom

sys-tem isapropertyofthe fragmentas a whole. Solving

containment itself is only astep toward optimization,

but developing a framework for, and thorough study

of,simpli cationinthecontextofXPathfragments

re-mainsanimportantopenresearchissue.

Closest in spirit to our paper is [19]. It presentsa

set of rewriterules for eliminating upward modalities,

which is similar to some of our results in Section 5.

However,[19]focusesonasinglelargeXPathfragment,

andtheirresultsdonotapplytothesmallerfragments

consideredhere. Normalformsforonesimplefragment

ofXPathareexaminedin[26];fortreepatterns,[1,20]

present algorithms for achieving minimization, which

canbeviewedasacertainnormalformfortreepatterns.

Theissueofaxiomatizingexpressionequivalencehas

been investigated for a number of formalisms related

to XPath: [21] shows that there can be no nite

ax-iomsystemforregularexpressions, while[11]gives

ax-iom systemsfor propositionaldynamic logic with

con-verse. Elimination of inverse roles (upwardmodality)

hasbeenstudied fordescriptionlogics[4]. Theresults

ofpropositionaldynamiclogicanddescriptionlogicsdo

notcarryoverheresincetheydealwithgeneral

struc-tures(graphs)insteadoftrees;furthermore,sincethose

languagessupportanegationoperator,the

expressive-ness di ers radically from XPath, which (as we shall

see)isnotnegation-closed.

Organization: Section2de nesourXPathfragments,

followedbyacharacterizationoftheirexpressivepower

inSection3. Closurepropertiesareinvestigatedin

Sec-tions 4 under general equivalence. Section 5 revisits

expressivenessandclosurepropertiesunderroot

equiv-alence. Section 6 provides axiomsystems and normal

forms forseveral fragments,and Section7summarizes

the main results of the paper. Proofs are in the

(5)

XPathexpressions are built upfrom anin nite set of

labels(tags,names). ThelargestfragmentofXPath

studiedinthispaper,denotedbyX

" r;[] ,issyntactically de nedasfollows: p ::=  j ; j l j # j " j #  j "  j p=p j p[p j p[q];

where , ;, l denote the empty path, the empty set,

and a name in , respectively; `[' and `=' stand for

unionand concatenation, `#' and `"' for thechild-axis

andparent-axis,`#

 'and`"



'forthe

descendant-or-self-axisand ancestor-or-self-axis,respectively;and nally,

qinp[q]iscalled aquali er andde ned by:

q ::= pjlabel=l;

wherepisanX

"

r;[]

expressionandlisanamein.

ThesemanticsofXPathexpressionsisgivenwith

re-spect toanXMLdocumentmodeledas anode-labeled

tree, referred to as an XML tree. Each node n in an

XML tree T (denoted by n 2 T) is labeled with a

name tag from some nite alphabet 

0

  (we

as-sumew.l.o.g. that

0

hasatleasttwosymbolsinit). It

alsohasa(possibly empty)listofchildren;itis called

aleaf of T ifit hasno children. A distinguishednode

rt iscalledtherootofT;eachnodein T exceptforthe

root hasaparent. Wedo notconsider orderof nodes

inT sinceourXPathfragmentsignoretheorder.Inan

XML tree T,an X

"

r;[ ]

expression pis interpreted asa

binarypredicateonthenodesofT. Thatis,foranyn,

n 0

2T,T j=p(n;n

0

)i oneofthefollowingissatis ed:

(1)ifp=,thenn=n

0

;(2)ifp=;,thenT j=p(n;n

0 )

isfalse foranyn

0

; (3) ifp=l, thenn

0

is achild ofn,

and is labeled with l; (4) if p = #, then n

0

is a child

of n, and itslabeldoes notmatter; (5) ifp= ",then

n 0 is the parent of n; (6) if p = #  , then n 0 is either n itself oradescendant of n; (7) if p= "  , then n 0 is

eithernitselforanancestorofn;(8)ifp=p

1 =p

2

,then

there exists x 2T such that T j=p

1 (n;x)^p 2 (x;n 0 ); (9)ifp=p 1 [p 2 ,thenT j=p 1 (n;n 0 )_p 2 (n;n 0 );(10)if p=p 1

[q],thenthere aretwocases: whenq isp

2 ,there existsn 00 2T suchthatT j=p 1 (n;n 0 )^p 2 (n 0 ;n 00 );when

q isalabeltest \label=l",n

0

islabeledwith l. This

semanticsisinthesamespiritastheonegivenin[24].

Example. Referring to a node n in an XML tree T,

someX

"

r;[]

expressionsandtheirsemanticsare:

p 1

: #=A: allthegrandchildrenofnthat arelabeledA;

p 2

: "=A: alltheA-siblingsofn, includingnitself

ifitislabeledA;

3

p 4

: "[label=A]: theparentof nifitislabeledA,

andtheemptyset otherwise;

p 5 : #  =A=# 

: allthedescendantsofnthathavean

A-ancestorwhichitselfisadescendantof n;

p 6 : #  =A[B[# 

=C]]: alltheA-descendantsofnthat

haveaB-childwhichinturnhasaC-descendant;

p 7 : #  =B=" 

: nodeshavingaB-descendantwhich

isalsoadescendantofn;note thatthesenodes

arenotnecessarilythemselvesdescendantsofn;

p 8 : #  =A[" 

=B]: alltheA-descendantsofnthat

haveanancestorwithaB-child.

IfT j=p(n;n

0

)thenwesaythatn

0

isreachable from

nviap. Weusen[[p]]to denotethesetofallthenodes

reachedfromnviap,i.e.,fn

0 jn 0 2T; T j=p(n;n 0 )g.

Example. Foranyleafnin T,n[[#]]=;,n[[#



]]=fng,

whilert[["]]=;and rt[[" 

]]=frtgfortherootrt ofT.

TwoXPathexpressionspandp

0

aregenerally

equiv-alent (or simply equivalent when it is clear from the

context),denotedbypp

0

,i foranyXMLtreeT and

anynoden2T,n[[p]]=n[[p

0

]]. Wesaytwoexpressions

areequivalent over

0 (denotedby  0 )where 0 isa

xed nite alphabet,if theaboveholdsfor any treeT

whoselabelsarein 

0

. Forexample,#is equivalentto

A[BoverthealphabetfA;Bg,butnotingeneral. We

will usually work with the stronger notion of general

equivalence , and specify when results also hold for

restrictedequivalence|equivalencew.r.t. some nite

alphabet

0 .

We use the following notations for subclasses of

X "

r;[]

: the subscript `[ ]' means that the subclass

al-lows quali ers, i.e., expressions of the form p[q]; the

subscript `r' indicates the support for \recursion"#

 ;

andthesuperscript`"'denotesthesupportforupward

modality`"'(and`"



'in presenceof thesubscriptr).

ThesubclassesofX

"

r;[]

consideredinthispapercan

be classi ed into two categories: with or without

re-cursion. Inthe rst category,X

" r

isthesubclass

with-out quali ers; X

r;[]

is thesubclass with quali ersbut

without "and "



; and X

r

is the subclass with neither

quali ers, "nor"



. Inthe second category, X

" [] is the subclassofX " r;[] without#  and"  ;X [] andX " further restrict X " [ ]

by disallowing, respectively, " and

quali- ers; nally,X isthesubclassofX

"

[]

withneither"nor

quali ers. All these fragments havebeen found to be

useful in practice. For example, X

r

is used by XML

Schema[23]tospecifyintegrityconstraints.

(6)

X

[ ]

X

r

X

r

X

X

[ ]

X

r,

[ ]

X

X

r,

[ ]

Figure1: FragmentsofXPath

inFig.1(seetheappendix foraproof).

Proposition 2.1: The fragments form the lattice of

Fig.1;that is,thereisanedgefromfragmentF

1 toF 2 i F 1 F 2

, i.e., ifeveryexpression ofF

1

isequivalent

toanexpressioninF

2

. 2

Example. TheXPathexpressionsgivenabovecanbe

classi edasfollows: p

1

is in all these fragments; p

2 is inX " ,X " [] ,X " r ,X " r;[]

,but isnotintheothers;p

3 isin X [] , X " [] ,X r;[] , andX " r;[] only, while p 4 is in X " [] and X " r;[] only;p 5 is in X r , X " r , X r;[] and X " r;[] but is not intheothers;p 6 isinX r;[] ,X " r;[] onlywhilep 7 isinX " r andX " r;[] only; nally,p 8 isonlyin X " r;[] .

A weakerequivalence relation isde ned asfollows:

pand p

0

areroot equivalent,denoted byp

r p 0 ,i for any XML tree T, rt[[p]] = rt[[p 0

]], where rt is the root

of T. In Section 5 we shall see that root equivalence

attensthehierarchyofFig.1.

The fragments studied by [15] do not support

up-wardtraversalsandunion,and areproperlycontained

in X

r;[]

. Note that because the fragmentsconsidered

in [15] do not deal with upward modalities such as "

and "



, the distinction between root equivalence and

general equivalence is not relevant to this prior work,

sincethese notionsare thesamein the absenceof

up-wardmodalities. Although[9]considersfragments

sim-ilar to X

"

r;[]

, it doesnot study the closure properties

andaxiomsystemsinvestigatedhere.

3 Logic and quali ed fragments

Inthissectionwecharacterizetheexpressivenessofeach

ofthefragmentswith quali ersde ned intheprevious

section,bymeansof bothpredicatelogicandtree

pat-terns. Asweshallsee,thelogicalcharacterizationisnot

onlyinterestinginitsownright,butisalsousefulinthe

analysisoftheclosurepropertiesof ourfragments.

We beginwith asimpleobservation. Onemightbe

interestedinextendingquali ersinanX

"

r;[]

expression

p[q]byincludingconjunctionanddisjunction:

q ::= p j label=l j q^q j q_q;

with thesemantics: at anynode nin an XMLtree T,

[q 1

^q

2

]istruei thereexistnodesn

0 andn 00 suchthat both q 1 (n;n 0 ) and q 2 (n;n 00 ), and such [q 1 _q 2 ] is true i q 1 (n;n 0 ) orq 2 (n;n 00

). Let us denote this extension

of X " r;[] by X " r;[^;_]

. Thenextproposition shows,

how-ever, that conjunction and disjunction add no

expres-sive power over X

"

r;[]

(see the appendix for a proof).

This allows us to use conjunction and disjunction as

shorthandsin quali ersofX

"

r;[]

expressions. This also

carriesovertoalloftheotherfragmentswithquali ers,

i.e.,X r;[] ,X " [] ,andX [] . Proposition3.1: X " r;[] andX " r;[^;_] areequivalent. 2

We next show that each of these fragments with

quali ers can be captured using a version of

positive-existential rst-order logic, and de ne notions of tree

patterns that capture the expressive powerof each of

thesefragments. Aprecursortothisworkis [15]where

inanalyzingthesubsetofX

r;[]

consistingofexpressions

built up withoutthe union operator `[', they observe

that this fragment isequivalentto thequeriesde ned

usinganaturalnotionof\unarytreepattern".

Theformalde nitionsofourlogicsandpattern

lan-guagesaregivenbelow.

Let 9

+

(child) be the fragment of rst order logic

builtupfromtherelationschild,labelpredicatesP(x)

foreachnamePaswellasequality`=',byclosingunder

^, _and 9, while 9

+

(child;desc ) is the corresponding

fragmentbuiltupfrom child ,desc ,thelabelpredicates

and `='. The semantics is the standard semantics of

rst-order logic over trees (with child and desc given

theirstandardinterpretation withinatree).

Weuse9

+

(child)(c;s)todenote 9

+

(child)formulae

with exactly the variables c and s free, and similarly

for 9

+

(child;desc )(c;s). Note that an 9

+

(child)(c;s)

formula de nes a function from a node c to a set of

nodess.

A forest pattern pc is aforest with labelson nodes

and edges, where nodes are labeled by names or the

wildcardsymbol`'(thatmatchesanynode),andedges

are labeled with # or #



. It has a distinguished node

(7)

nodesreferredtoastheselectednodesofpc. Apattern

pc thus hasthe form (V;E;l;c;S),where V is the

un-derlying forest domain, E the ordering,l the labeling

function, and c, S the context node and selected set,

respectively.

A forest pattern can be given semantics by

trans-latingitinto 9

+

(child;desc ). Thepattern (V;E;l;c;S)

is translated asfollows: Letting V = v

1 :::v

n

be the

elements ofV, assume,w.l.o.g., that v

0 is thecontext node, v 1 :::v k

are the selected nodes, and v

k +1 :::v

n

aretheremainingnodesofV. Thenlet(x

v 0 ;:::;x v k ) betheformula: 9x v k +1 ::: x v n ( ^ P ^ v2V;l(v)=P P(x v ) ^ ^ (v;v 0 )2E;l(v;v 0 )=child child(x v ;x v 0 ) ^ ^ (v;v 0 )2E;l(v;v 0 )=desc desc (x v ;x v 0 ))

Specialkindsofpatternsweconsiderare:

 A tree pattern is a forest pattern consisting of a

singletree.

 Achildpatternisoneinwhichalledgesarelabeled

with#.

 A unarypatternis oneinwhichthe selectedsetS

hasexactlyonenodeinit.

A forest pattern query is a nite set of forest

pat-terns,withallpatternshavingthesamecardinalityfor

theselected set S (designedto model the arity of the

query). A forest pattern query t returns, given atree

andadistinguishedcontext node,theunionofall

out-putsreturnedbythepatternsinit,i.e.,[t]=

S

tp2t [tp],

where[tp]is therelationde ned bytp. Byconvention

we take the empty forest pattern query to be

equiv-alent to false. We likewise talk about a child

pat-ternquery, unary tree pattern query, etc (note that a

treepattern queryis a nite setof treepatterns). We

let S

TP(child) be the set of unary child tree pattern

queriesand

S

TP(child;desc) be the set of unary tree

patternqueries.

It is easy to see that a tree pattern query can be

expressed in 9

+

(child;desc), and that a child pattern

querycanbeexpressed in 9

+

(child). A unarypattern

can be translated as above to a formula (x

v0 ;x vn ): renamingx v 0 ascandx v n ass,wegetaformula(c;s).

The rstresultisforthefragmentX

"

r;[]

.

Theorem3.2: Thefollowinglanguagesareequivalent

inexpressivepower  X r;[] ,  S TP(child;desc ),  9 + (child;desc )(c;s). 2

Proof sketch: Theproofconsistsofthreeparts.

1. S TP(child;desc)X " r;[] .

It suÆces to show that each unary tree pattern

p = (V;E;l;c;s) has an equivalent formula in X

"

r;[] .

Wehandlethecasewherecisneitheradescendantnor

anancestorofs;theothertwocasesaresimilarbut

sim-pler. We rstde neamappingQfromnodesv2V to

quali ers. Themappingisde nedinductively(starting

fromtheleaves)asfollows:

Q(v) = (label=l(v)) -- if l(v)6=` 0 ^ ^ (v;v 0 )2E;l(v;v 0 )=desc #  =[Q(v 0 )] ^ ^ (v;v)2E;l(v;v 0 )=child #=[Q(v 0 )]

Nowletrt betherootnodeof(V;E). Letc=v

0 ,:::,

v n

=rt bethepathfrom cto rt,andletrt =y

0 ,:::,

y m

= s be the path from rt to s. Let u

i : 0  i  n 1 be de ned by: u i = " if l(v i ;v i+1 ) = child and "  otherwise. Let d i = # ifl(y i ;y i+1 ) =child and #  otherwise.

Thenwetranslatethepatternpto

[Q(v 0 )]=u 0 [Q(v 1 )]=u n 2 [Q(v n 1 )]=u n 1 [Q(rt)]= d 0 [Q(y 1 )]==d m [Q(s)]:

Thatis,startingatthecontextnodec,theX

"

r;[]

expres-sion rstgoesupwardtotherootrtandthendownward

totheselectednodess. Onecanverifythat this

trans-lation iscorrect. 2. X " r;[] 9 + (child;desc )(c;s).

Thisisimmediatefromthede nition (logic

transla-tion)ofthesemanticsofX

" r;[]

expressionsgiveninthe

previoussection. 3. 9 + (child;desc )(c;s) S TP(child;desc).

Weprovethemoregeneralfactthatanarbitrary

for-mula(c;s 1 ;:::;s n )in9 +

(child;desc )isequivalenttoa

treepatternquery. Letbeaformula. Wecanassume

thatitisoftheform9x

1 :::x

n

where is

quanti er-free. We now create a graph whose vertices are the

variablesof andwhoseedgescorrespondtotheatomic

formulaein (withnodesandedgeslabeledbythe

cor-responding unaryand binaryatomicformulaesatis ed

by these variables). Bycombiningstrongly-connected

(8)

getan equivalentformulaforwhich thecorresponding

graphis atree. From thistree itis easyto generatea

treepatternquery. Detailsarein theappendix.

Thiscompletestheproofofthetheorem. 2

Wenowconsider thesituation for X

"

[]

. Part of the

previousresultgoesthroughasbefore. Tocapture the

expressivepower of X " [] precisely, let 9 + (child)[loc] be

the rst-order logic built up from child and the label

predicates via conjunction, disjunction and the

quan-ti cation 9x 2 B(c;n) (local quanti cation), for every

integern. Here 9x2 B(c;n) (x;~y) holds i there is

aconnected set of nodes of size at mostn in thetree

thatcontainsxandc(wesay\xisintheballofradius

naroundc"inthiscase). Welet9

+

(child)[loc](c;~s)be

thefragmentof 9

+

(child)[loc]with the further

restric-tionthateachs

i

isrestrictedtobeintheballofradius

naroundc. Itisclearthateverypropertyexpressiblein

9 +

(child)[loc]canbeexpressedin9

+

(child)(bymaking

thechainleadingtoc explicit). Then wehave(see the

appendixforaproof):

Theorem3.3: Thefollowinglanguagesareequivalent

inexpressivepower  X " [] ,  S TP(child),  9 + (child)[loc](c;s). 2

Asan immediateresultofTheorems 3.2and 3.3,it

followsthatequalitytestinquali ersaddsnoexpressive

power overX " [] and X " r;[]

. XPath allows quali ers of

theform [q

1

=q

2

] withthefollowingsemantics: atany

node n in an XML tree T, [q 1 = q 2 ] is true i there exists anode n 0

such that bothq

1 (n;n 0 ) and q 2 (n;n 0 ) hold, i.e., n[[q 1 ]] and n[[q 2

]] are not disjoint. Note that

thesemanticsofequalitytestinXPathisquitedi erent

fromthatofquali erconjunction.

Corollary 3.4: The extensions of X

" [] and X " r;[] to

allowtheequalitytestq

1

=q

2

inquali ersarethesame

asX " [] and X " r;[] ,respectively. 2

Proof sketch: This follows from the fact that the

statementsoftheform[q

1

=q

2

]canclearlybeexpressed

in 9

+

(child) and9

+

(child;desc ): let

1 (s 1 )and 2 (s 2 ) bethe9 + (child)(resp. 9 +

(child;desc ))formulae

repre-sentingq

1

and q

2

, withfreevariabless

1

and s

2

denot-ing the selectednodes; then [q

1 =q 2 ] is equivalent to 9x( 1 (x)^ 2 (x)). 2

Forexample,theexpression

[A=#  =B=#  =A=#  =C=#  ] [A=#  =B=#  =C=#  [A=#  =C=#  =B=#  ]:

There is an additionalequivalence betweenthe full

logic 9

+

(child) andforest patterns, which canbe

veri- edalongthesamelines asTheorems3.2and3.3.

Theorem3.5:Everyformulain9

+

(child)isequivalent

to achildforestquery,andviceversa. 2

We now turn to the downward fragments. Let us

de ne S

TP(child;desc)[down]tobetreepatternqueries

wherethenodelabeledc isrestrictedto beattheroot

of each query. Let 9

+

(child;desc )[down](c;s) be the

fragmentof 9

+

(child;desc)(c;s) in which every bound

variable as well as s is syntactically restricted to be

a descendant-or-self of the node c. Then similarly to

Theorem3.2 wehave(seetheappendixforaproof):

Theorem3.6: Thefollowinglanguagesareequivalent

in expressivepower:  X r;[] ,  S

TP(child;desc )[down],

 9

+

(child;desc )[down](c;s).

2

NowwegivethecharacterizationsforX

[]

. Wede ne

S

TP(child)[down] to be the fragment of

S

TP(child)

withthenodecattherootofeachtreepattern,andlet

9 +

(child)[loc;down]betheintersectionofthelanguages

9 +

(child)[loc]and9

+

(child;desc )[down],i.e.,every

vari-ableandalsosaresyntacticallyrestrictedtobea xed

descendantofc. ThensimilartoTheorem3.3,wehave

(seetheappendix foraproof):

Theorem3.7: Thefollowinglanguagesareequivalent

in expressivepower  X [] ,  S TP(child)[down],  9 + (child)[loc;down](c;s). 2 4 Closure properties

An XML queryfrequently involvesunion, intersection

andcomplementationofXPathexpressions. This

moti-vatesthestudyoftheclosurepropertiesofXPathunder

theseBooleanoperations,i.e.,whethertheBoolean

op-erationspreserveourXPathfragments. Thisis

impor-tantfor, amongother things,queryoptimization. The

(9)

will remainrelevant, given that mostapplications will

useonly aportion of XPath2.0, and that XPathand

XQuery implementations will focus their optimization

e ortsonparticularsubsetsof thelanguage.

Formally, a fragment F of XPath is closed under

intersection i for any expressions p

1

, p

2

in F, there

existsanexpressionpinF,denotedbyp

1 \p 2 ,suchthat n[[p]]=n[[p 1 ]]\n[[p 2

]]foranyXMLtreeT andanynode

n 2 T. Similarly, F is closed under complementation

i foranypin F, there existsp

0 in F, denoted by :p, such thatn[[p 0 ]]=fn 0 jn2T; n62n[[p]]gfor anyXML

treeT andn2T. Closureunder unioncanbede ned

similarly; all of our fragments are closed under union

sincetheyallcontaintheoperator `['.

Theorem4.1: ThefragmentsX,X

[] ,X " [] ,X r , X r;[] , andX " r;[]

areclosedunderintersection,whereasX

" and X " r arenot. 2

Proof sketch: Theorems3.2, 3.3,3.6 and3.7

charac-terizethequali edfragmentsX

" r;[] ,X " [] ,X r;[] andX []

with the logics 9

+

(child;desc )(c;s) and 9

+

(child)(c;s)

as well as their restrictions 9

+

(child;desc)[down](c;s)

9 +

(child)[loc;down](c;s), respectively. In particular,

for expressions p

1

and p

2

in any of these fragments,

their intersection can be expressed in the

correspond-inglogicas9y 1  1 (x;y 1 ) ^ 9y 2  2 (x;y 2 ),where  1 and  2

arethecodingsofp

1

andp

2

respectively. Fromthis,

itimmediately followsthat these fragmentsare closed

underintersectionsincethelogicsareclearlyclosed

un-derconjunction.

We now show that X

r

is closed under intersection

usinganautomata-theoreticapproach. Asimilar

argu-mentalso worksforX.

Letpbeanexpression inX

r

. Acceptance ofanode

n by such a p depends only on the labels on a path

from the context node c to n, hence p can be

mod-eledbyanautomatonthatacceptsstringswithcbeing

itsstartstate. Theautomatoncorrespondingto pcan

betakento beacyclic, withtheexceptionofself-loops

which canbefollowed regardlessofthe alphabet

sym-bol(corresponding tothesymbol#



). Conversely,any

such automaton canbe convertedto anX

r

-expression

bytakingtheunionofallpathsfromthestarttoan

ac-ceptingstate, introducing #



whereversuchaself-loop

occurs.Wenowshowthattheserestrictedautomataare

closedunderthestandardproductconstruction. LetM

and M

0

be twosuch automata,and letM

00

bethe

re-sultof thestandardCartesianproductconstructionof

theintersection[13]. WeshowthatM

00

correspondsto

r

thatthetransitiongraphofM

00

hasacyclewhichisnot

aself-loop. Let(q 1 ;q 0 1 ),:::, (q n ;q 0 n )beacyclein M 00

withn>1withallpairsdistinct. Thenq

1 , :::, q n ,q 1 andq 0 1 ,:::,q 0 n ,q 0 1

arecyclesinMandM

0 ,respectively. Butthen, q 1 =q 2 ==q n and q 0 1 =q 0 2 ==q 0 n ,

using the fact that M and M

0

have no non-self-loop

cycles. Fromthiswegetacontradiction.

Toshow thenegativeresults,consider the

intersec-tion \ "=A="=#oftwoX

"

expressions{this

expres-sionreturnsthecontextnodealone,butonlyifithasa

siblinglabeled`A'. Weshallshowthatthisintersection

cannot beexpressed in X

" r

, andthus notinX

" either.

Toseethis,supposethatpweresuchanX

" r

expression.

LetT bethetree withrootrt, havingchildrenn

1

(la-beled B), n

2

(labeled A), and n

3

(labeledB). Letn

1

bethecontextnode; clearlyn

1

[[p]]=fn

1

g. Then there

must exist a simple path p

0

, i.e., a sequence of labels

and ", such that n

1 [[p

0

]] = fn

1

g and for everytree T

andnoden,n[[p 0 ]]n[[p]]. Letv 1 :::v k beasequenceof

nodesstartingandendingn

1 thatwitnessesn 1 2n 1 [[p 0 ]]. If noneof thev i isrt, thenn 1 willbein n 1 [[p]] evenif

wedeletethenoden

2

fromthetree,acontradiction. If

somev

j

isrt, thenreplaceeach subsequentoccurrence

ofn

1

inthepathbyn

3

,anditfollowsthatn

1 [[p

0

]]must

include the node n

3

, once morea contradiction. This

concludes theproof. 2

Forcomplementation,ontheotherhand:

Theorem 4.2: None ofthe fragmentsis closed under

complementation. 2

Proofsketch: Weconsiderherethecaseofequivalence

overa xed nitealphabet

0

{theproofswillalsowork

for general equivalence, but in such a case one could

simply show that the complement of the simple path

\A"cannotberepresentedinanyofthefragments.

We distinguish between two groups of fragments{

those with recursion (#



and possibly "



), and those

without.

First, consider anon-recursivefragment. Let p

1 be

thesimplepath\A". Weclaim thatp=:(p

1

)cannot

berepresented in thisfragment. Tosee this, let mbe

thesizeofp. Itiseasytoseethat pcannotmatch any

pathoflength>m,acontradiction.

In the caseof recursive fragments, when fragments

are considered w.r.t. equivalenceovera xed nite

al-phabet 

0

, this argument does not work: if the

al-phabet is fA;A 2 ;:::;A n g, :A can be represented as [A 2 [[A n [#=#=# 

(10)

X

r

X

r,

[ ]

X

r,

[ ]

X

r

X

[ ]

X

X

[ ]

X

Figure2: XPathfragmentsunderrootequivalence

andletp 1 =#  =A=#  . Notethat p 1

doesnotmakeuse

of"orofquali ers,andhenceisin alloftherecursive

fragments. Weclaim that the complement of p

1

can-notbeexpressed in X

"

r;[]

,and hencenotin anyof the

smallerrecursivefragments.

Weshowthisbyprovingthatthecomplementofp

1

cannot be expressed in 9

+

(child;desc)(c;s). Let p be

arepresentationof thecomplementof p

1

in this logic,

of size m. Consider two structures T

1 and T 2 . The structure T 1

consists of a root rt followed by a chain

of length 2

m+1

, with all these nodes labeled B (and

hencethe end of the chain is in the complementwith

respecttocontextnodert). ThestructureT

2

issimilar,

exceptthatanodeinthemiddleofthechainislabeled

A(andhencethenodeattheendofthechainisnotin

thecomplement). Asimpleargumentshowsthatevery

m-quanti er9

+

(child;desc )(c;s)sentenceholdinginT

1

alsoholdsinT

2

,acontradiction. 2

5 Root equivalence

Recall the notion of root equivalence de ned in

Sec-tion2. Underthis weakerequivalencerelation, the

hi-erarchyof Fig.1collapses: theXPathfragmentsform

thelatticeofFig.2. Thisfollowsfrom:

Theorem5.1: Underrootequivalence,

1. X " r X r;[] but X r;[] 6X " r ; 2. X " X []

; moreover,in the absenceoflabeltests,

i.e., quali ersoftheform[label=l],X

" =X [] ; 3. X r;[] =X " r;[] ; 4. X [] =X " [] . 2

As an immediate consequence, any XPath

expres-sionwith upwardmodalities inanyof thesefragments

isroot-equivalentto anXPathexpression withneither

" nor " 

. This is important for, among other things,

processingstreamingdata.

Proof sketch: 1. X " r X r;[] .

Firstnotethatp=(p

1 [p 2 )=p 0  r p=p 1 =p 0 [ p=p 2 =p 0 .

Thus, w.l.o.g.,weassumethat anyX

" r expression isof theformp 1 [[p k ,whereeachp i (1ik)hasthe form 1 == m ,andeach j isoneof;,,l,#,#  ,",or " 

. ItsuÆcestoshowthateachp

i

canbeconvertedto

anequivalentX

r;[]

expression. Theconversionisdone

fromlefttoright,startingwiththe rstoccurrenceof"

or" 

, using thefollowingrewriting rules(we omitthe

trivialrulesfor;and):

"=p ) ;; if"isthe rstsymbolofp i ; "  =p ) p; if"  isthe rstsymbolofp i ; p==" ) p[]; where is#oralabel;  1 == n ="  )  1 [ 2 == n ] [  [ 1 == i [ i+1 == n ] [  [ 1 == n ; where i is#,#  oralabel. Inotherwords,if 1 is"(resp." 

),the rst(resp.

sec-ond)ruleisappliedtoeliminateit;otherwisewe

elimi-nate "and"



byintroducingquali ersusingtheother

rules. Inthese rules, p, p

1 , and p 2 denote X " r

expres-sions without upward modalities, and theleft-to-right

conversionimpliesthat arewritingruleis appliedonly

ifitsleft-handsidematchesthepre xofanX

" r

expres-sion. Although the rewriting may introduce `[', one

needsonly to considerp

i

'sthat do notcontain`['

be-cause of the rewriting rules for `[' referredto earlier.

It is straightforward to verify that each rewriting rule

is root-equivalence preserving and that p

i

canbe

con-vertedtoanX

r;[]

expressionina nitenumberofsteps,

concludingtheproofofcontainment.

ThecontainmentisproperbecausetheX

r;[] expres-sion #  =A[# 

=B] is not root-equivalent to any X

" r

ex-pression. Seetheappendix foraproof.

2. X " X [ ] , and furthermore, X " = X [] in the

ab-senceoflabeltests.

The proof that X

"

r

 X

r;[]

given abovealso shows

X "

X

[]

. Forthe other direction,it suÆces to show

that without label tests in quali ers, each X

[]

expres-sion can be rewritten to an equivalent X

" r

expression.

This can be done by eliminating the quali ers using

therewritingrules,alongthesamelinesastheproofof

part1above(seetheappendix).

Note that in thepresenceof labeltests, X

[]

6X

" .

(11)

whether theroot islabeledA. It is easyto showthat

thisX

[ ]

expressionisnotexpressiblein X

" . 3. X r;[] =X " r;[] .

From Theorems 3.2 and 3.6, X

r;[]

and X

"

r;[] are

equivalenttothetreepatternqueries

S

TP(child;desc )

and S

TP(child;desc )[down] respectively. Recall that

S

TP(child;desc )[down] is de ned to be the fragment

of S

TP(child;desc ) where the context node labeled c

is restricted to be at the root of each query. Under

rootequivalence,thecontextnodecisalwaysexplicitly

restricted to be at the root. Hence,

S

TP(child;desc )

and S

TP(child;desc)[down] have the same expressive

powerunder rootequivalence;thesameholdsforX

" r;[] andX r;[] . 4. X [] =X " [ ] .

In a similar way to part 3, we use Theorems 3.3

and3.7toshowthatX

[]

andX

"

[]

arerootequivalentto

eachother. 2

Asaconsequenceof Theorem5.1 thefollowingcan

beeasilyveri edalongthesamelinesasProposition2.1.

Corollary5.2: Underrootequivalence,thefragments

formthelatticeofFig.2;thatis,thereisanedgefrom

fragmentF 1 to F 2 i F 1 F 2

, i.e., ifeveryexpression

ofF 1 isroot-equivalenttoanexpressionin F 2 . 2 Recall that X "

is not closed under intersection as

far asgeneralequivalence is concerned. The situation

isdi erentwhenweconsiderrootequivalence:

Corollary5.3: Underrootequivalence,allofour

frag-mentsexceptforX

" r

areclosedunderintersection.

How-ever, they remain not closed under complementation.

2

Thiscanbeveri edusingTheorems4.1,4.2and5.1

(seetheappendixforaproof).

6 Axiom systems and normal forms

We next study axiomatizabilityand normal forms for

XPath. Inparticular,wepresentpreliminaryresultsfor

twofragments,namely,X

r

andX. Althoughtheresults

of this section are established under full equivalence,

theyalsoholdunder rootequivalence.

An XPathtermoverafragmentF isbuiltupfromthe

constructors of F, but supplementing thebase case of

labels(names)and with asetof expressionvariables

(rangedoverbyp

1

, :::,p

n

). Foraterm overF,the

variablesusedinconstructing arecalledthefree

vari-ablesof. AlltheXPathexpressionsthatwehaveseen

beforearejusttermswithnofreevariables,andwillalso

bereferredtoasgroundterms,orsimplyexpressions.

AnequationforafragmentF isanequalityofterms.

GivenasetofequationsAinF,theequivalencerelation

equivalence modulo A, 

A

, is the smallestequivalence

relationbetweentermsthatcontainsthesymmetric

clo-sure of A (considering A asa binaryrelation between

terms) andclosedundertheinferencerules:

1. if   A  0 then ()  A ( 0

) for any

substitu-tion  that maps the free variables in  or 

0 to expressionsofF; 2. if  A  0 then  A 0 ,where 0 isobtainedfrom

byreplacingsomeoccurrenceofsubexpression 

in with

0 .

A set of equations A is sound for afragment F of

XPathifforeveryexpressions

1 and 2 ofF, 1  A  2 implies 1  2

,andiscompleteifforeverysuch

1 and  2 ,  1   2 implies  1  A  2

. A (resp. nite) axiom

system forF is a(resp. nite)set of equations that is

sound and complete. A fragment of XPath is said to

beaxiomatizablei ithasanaxiomsystem,and nitely

axiomatizable i ithasa niteaxiomsystem.

A ( nite) axiomsystemforafragmentof XPathis

useful in,among otherthings, optimizingand

normal-izingtheXPathexpressions.

Unfortunately, none of thefragmentsconsidered in

thispaperis nitely axiomatizable.

Proposition6.1: Noneofthefragmentsis nitely

ax-iomatizable. 2

Proofsketch: WeshowthatX isnot nitely

axioma-tizable. Thesameproofalsoappliestootherfragments.

Assume, by contradiction, that there were a nite

axiomatizationAforX. SinceA has nitely many

ax-ioms,thereareonly nitelymanylabelsofmentioned

in A. Letc beanamethatdoesnotappearin A.

Ob-servethatc[#  #holds. SinceAiscomplete, there

must bea proof S of c[# 

A

# using theinference

rules and axioms in A. Since c is notin anyof these

axioms,andcappearsintheproof,itcanonlybe

(12)

Notethatthisproofwillstillbeaproofif,ineachsuch

E,onesubstitutesapathc=cforeach occurrenceof c.

This yields aproof S

0

which is thesame asS, except

that c=cappears in S

0

whereverc appears in S. Thus

it is a proof for c=c[# 

A

#, which doesnot hold.

Hence Acannot besound. Therefore,it isnota nite

axiomatizationforX. 2

6.2 Axiomsystems forX

r

andX

We next present a natural set of axiom schemas for

twofragments,namely,X

r

andX. Weshowthatwhen

equivalenceovera nitealphabet

0

isconsidered,X is

nitelyaxiomatizable. Oneapplication ofthese axiom

systems,derivinganormalformforexpressionsinthese

fragments,willbeshownlater.

It is worth mentioning that since the equivalence

problem forXPath expressionsin allof ourfragments

isdecidable(e.g. byappealingto [16] ),atrivial

com-putable axiomatization can be taken for any of our

fragmentsbysimplyenumeratingallgroundequalities.

Clearlysuchanaxiomatizationwouldnotassist

simpli- cationofXPath,andwouldalsonothelpinproviding

niteaxiomatizationsforrestrictedequivalence.

Fragment X

r

. Table1 providesan axiomsystemfor

X r

,denotedbyI

r

. Thesystemisin nitesinceforeach

labellinthealphabet,there isanl-childruleinI

r .

Whenconsidered withrespect to equivalence overany

xed nite alphabet 

0

, I

r

is still sound, but it is no

longercomplete. Notethat by theconcat-associativity

axiomin I r , we can write  1 = 2 = 3 for ( 1 = 2 )= 3 and  1 =( 2 = 3 ). Theorem6.2: ForanyX r expressions and 0 ,  0 i   Ir  0 . 2 Example. Using I r , #  =#   #  can be veri ed as follows: #  =#   I r #  =(# 

[ ) bythedescendantsaxiom

 Ir #  =#  [#  byleft-distribution  Ir #  bydescendants-union

ToproveTheorem6.2,wede neacontainment

rela-tion. AnXPathexpressionp

1 iscontainedinp 2 (written p 1 p 2

)i foranyXMLtree T andanynoden inT,

n[[p

1

]]n[[p

2

]]. Ifwewantcontainmenttoholdonlyfor

treesovera xedalphabet

0 , wewritep 1  0 p 2 .

Thefollowing lemma showsthat I

r also suÆces to determinecontainmentofX r expressions. =p  p  p= (empty-path) ; [ p  p (empty-set-union) p 1 =;=p 2  ; (empty-set-concat) l [ #  # (l -child) #    [ #=#  (descendants) p [ #   #  (descendants-union) #=#   #  =# (child-descendants) p 1 =(p 2 =p 3 )  (p 1 =p 2 )=p 3 (concat-associativity) p 1 [(p 2 [p 3 )  (p 1 [p 2 )[p 3 (union-associativity) p1 [ p2  p2 [ p1 (union-commutativity) p [ p  p (union-idempotent) p=(p1 [ p2)  p=p1 [ p=p2 (left-distributivity) (p1 [ p2)=p  p1=p [ p2=p (right-distributivity) Table1: I r : axioms forX r

Lemma6.3: ForanyX

r expressions 1 and 2 , 1  2 i  2 [ 1  Ir  2 . 2

If this holds, then so does Theorem 6.2. Indeed, if

 1  2 then 1  2 and 2  1

. ThusfromLemma6.3

itfollowsthat 2 [ 1  Ir  2 and 1 [ 2  Ir  1 . Thenby

theunion-commutativityaxiomandtheinferencerules,

 1  I r  2 . Conversely,if 1  I r  2

,thenby the

union-idempotentaxiomandtheinferencerules,

1 [ 2  Ir  2 and  2 [ 1  I r  1

. Again from Lemma 6.3 it follows

that  1  2 and 2  1 ;thatis, 1  2 .

It is worth mentioning that Lemma 6.3 is also

in-terestingin itsown right, sinceit iscommonfor XML

queriestocheckcontainmentofXPathexpressions.

Theproofofthelemmaisgivenintheappendix. It

should bementionedthattheproofcanbemade

e ec-tive, thus providing an algorithm for testing

contain-ment,andhencefortestingequivalence.

Fragment X. We now consider X. Let I be I

r

ex-cluding thedescendants, descendants-union and

child-descendants axioms. Fora nite alphabet 

0

, assume

that it canbe enumerated asl

1 , :::, l n . A nite ax-iomsystemI f ( 0

)consistsofrulesin I exceptforthe

l-childrulesandwiththeadditionofthefollowingrule:

l 1

[ [ l

n

 # ( nite-alphabet)

Note that for each l 2 

0

, l[#  # canbederived

(13)

ofI ( 0

).

OnecanverifythatI isanaxiomatizationofX

un-der generalequivalence (the default notion), and that

foreach nite

0 ,I f ( 0 )isa niteaxiomatizationofX

forequivalenceover

0

(seetheappendixforaproof).

Theorem6.4: For anyX expressions

1 and 2 ,   1  2 i  1  I  2 ,and 1  2 i  2 [ 1  I  2 ;

 Foreach nite 

0 ,  1  0  2 i  1  I f (0)  2 ,and  1   0  2 i  2 [ 1  I f ( 0 )  2 . 2

6.3 Normal forms forX

r

andX

Westudy herenormalformsforX

r

andX.

A fragmentF of XPathhasanormal formifthere

isa function from F to asubsetF

nl

ofF such that

foranyexpressions

1 and  2 of F,  1  2 i ( 1 )= ( 2 ), i.e., ( 1 ) and ( 2

) are syntactically equal to

each other. We shall say that F

nl

is a normal form

of F. Observe that  is determined by F

nl

. For an

expression in F,wereferto()asthenormal form

of w.r.t.F

nl

,orsimplythenormalformifF

nl

isclear

from the context. Similarly, wecan say that F has a

normalform forequivalenceovera nitealphabet

0 .

Anormalform isusefulforsimplifying theanalysis

ofequivalenceofXPathexpressionssinceitallowsone

toreducesemanticequivalencetosyntacticequality.

Fragment X

r

. An X

r

expression is saidto be

union-free if it does not contain union `['. Below we give

anormal form for union-free X

r

expressions. Normal

formsforX

r

expressionswithunions arestillunder

in-vestigation.

Aunion-free X

r

expression isin thenormal form

if (1) it does not contain  (resp. ;) unless = 

(resp. = ;); (2) any contiguous sequence in

consisting of #'s and at least one #

 is of the form #  =#=#  ==#=# 

(with the same number of

occur-rences of #); and (3) does not contain consecutive

# 

=# 

. The next proposition shows that the set of

union-freeX

r

expressionsofthisformisindeedanormal

formforallunion-freeX

r

expressions(seetheappendix

foraproof).

Proposition 6.5: Foranyunion-free X

r

expression

thereexistsauniqueX

r expression 0 suchthat  0 and 0

is in the normal form above. Furthermore, in

the case of a nite alphabet 

0

, for any union-free

thereexistssuchaunique

0 suchthat  0 0 . 2

Fragment X. We nextde ne anormalform for

gen-eral X expressions. An X expression  is in theunion

normalform ifitisoftheform

1 [[ k ,wherefor each i , referredto asadisjunctof , (1) i doesnot containunless i

=,and doesnotcontain;unless

 =;; (2)

i

isa union-free expression; and(3)

i is

notcontainedinanyother

j ,i.e., i 6 j forallj6=i.

When considering a xed nite alphabet 

0 = fl 1 ;:::;l n

g,weaddthecondition(4) doesnotcontain

#,which isrepresentedinsteadbyl

1

[[l

n .

The following then holds (see the appendix for a

proof):

Proposition 6.6: ForanyX expression there isan

X expression 0 suchthat  0 and 0 isin theunion

normalform satisfyingconditions (1){(3)givenabove.

Forevery nite alphabet 

0 there is  0  0  with  0

satisfying(1){(4). Moreover,inbothcases

0

isunique

uptoorderingofthedisjuncts. 2

7 Conclusion

We have investigated important structural properties

of a variety of XPath fragments, parameterized with

modalitiesthatincludequali ers,upwardtraversal(the

parent and ancestor axes), and recursion (descendant

and ancestor). First, we have provided

characteriza-tionsof the fragmentswith quali ersin termsof both

logicsandtreepatterns(Fig.3). Second,wehavegiven

acompletepictureoftheclosurepropertiesofallthese

fragments(Fig.4),under bothgeneralequivalenceand

root equivalence. Finally, wehave established

prelim-inary results on axiom systems and normal forms for

some of these fragments. These results not only

ad-vanceourunderstandingofdi erentXPathmodalities,

but are alsouseful for simpli cation of XPath

expres-sionsandoptimizationofXMLqueries.

One open problem is the nite axiomatizability of

X r

for

 0

,andsimilarlyfortheotherfragments. Itis

wellknownthatregularexpressionsdonothavea nite

axiomsystemwhentheinferencerulesarerestrictedto

the ones given in Section 6, even when the alphabet

consistsofasinglesymbol[21]. AlthoughX

r

isalarge

class of regular expressions, we speculate that 

0 is

nitely axiomatizableforX

r

. Wearecurrently

investi-gatingthis issue forX

r

terms, which maycontainfree

variables. Another topic for future work is to study

theexpressivenessandclosurepropertiesofother

frag-ments,especiallythoseallowingnegationsinquali ers.

Thethird topic isto strengthenourlogical

(14)

characterization X [] X [] X r;[] X r;[] logic 9 + (child)[loc;down](c;s) 9 + (child)[loc](c;s) 9 + (child;desc)[down](c;s) 9 + (child;desc )(c;s) treepatterns S TP(child)[down] S TP(child) S

TP(child;desc )[down]

S

TP(child;desc )

Figure 3: Characterizationsoftheexpressiveness ofthefragments

closureproperties X X [] X " X r X " [] X r;[] X " r X " r;[]

intersection under Yes Yes No Yes Yes Yes No Yes

under

r

Yes Yes Yes Yes Yes Yes No Yes

complementation under No No No No No No No No

under

r

No No No No No No No No

Figure 4: Theclosurepropertiesofthefragments

given here can be extended to handle the case where

trees have data values. In addition, the

characteriza-tionscan be made moreprecise by mappinginto

pos-itive existential logic with two variables. The fourth

topic isto reinvestigatethese issuesin the presenceof

DTDs/XML Schema and integrity constraints, which

commonlycoexist with XPathexpressions in practice.

Finally, weare also exploringthe use ofour resultsin

developingrewritesystemsandalgorithmsfor

simplify-ingXPathexpressions.

References

[1] S.Amer-Yahia,S.Cho,V.Lakshmaman,andD.Srivistava.

Minimizationoftreepatternqueries. InSIGMOD,2001.

[2] A.Berglundetal. XMLPathLanguage(XPath)2.0.W3C

WorkingDraft,Dec.2001.http://www.w3.org/TR/x path 20.

[3] G.Bex,S.Maneth,andF.Neven.Aformalmodelforan

ex-pressivefragmentofXSLT.InformationSystems,27(1):21{

39,2002.

[4] D.Calvanese,G.DeGiacomo,M.Lenzerini,andD.Nardi.

Reasoninginexpressivedescriptionlogics. InHandbook of

Automated Reasoning,pages1581{1634.Elsevier,2001.

[5] D.Chamberlinetal.XQuery1.0: AnXMLquerylanguage.

W3CWorkingDraft,June2001.

http://www.w3.org/TR/xque ry.

[6] C.Chan,P.Felber, M.Garofalakis,and R.Rastogu.

EÆ-cient lteringof XMLdocumentswithXPath expressions.

InICDE,2002.

[7] J.Clark. XSLTransformations(XSLT). W3C

Recommen-dation,Nov.1999. http://www.w3.org/TR/x slt.

[8] J.ClarkandS.DeRose.XMLPathLanguage(XPath).W3C

Recommendation,Nov.1999.http://www.w3.org/TR/xpa th.

[9] A.DeutschandV.Tannen.ContainmentforclassesofXPath

expressionsunderintegrityconstraints.InKRDB,2001.

[10] G.Gottlob, C.Koch,and R.Pichler. EÆcientalgorithms

forprocessingXPathqueries.InVLDB,2002.

[11] D. Harel,D. Kozen, and J.Tiuryn. DynamicLogic. The

MITPress,2000.

[12] C.Ho mannandM.O'Donnell.Patternmatchingintrees.

JACM,29(1):68{95,1982.

[13] J.E.HopcroftandJ.D.Ullman. IntroductiontoAutomata

Theory,LanguagesandComputation.AddisonWesley,1979.

[14] P.KilpelainenandH.Manilla.Orderedandunorderedtree

inclusion. SIAMJ.Comput.,24(2):340{356,1995.

[15] G.Miklau and D.Suciu. Containment andequivalence of

XPathexpressions.InPODS,2002.

[16] T. Milo,D. Suciu, andV. Vianu. Typecheckingfor XML

transformers.InPODS,2001.

[17] M.Murata.ExtendedpathexpressionsforXML.InPODS,

2001.

[18] F. Nevenand T.Schwentick. Expressiveand eÆcient

lan-guagesfortree-structureddata. InPODS,2000.

[19] D.Olteanu,H.Meuss,T.Furche,andF.Bry.XPath:

Look-ingforward.InWorkshoponXML-BasedDataManagement

(XMLDM),2002.

[20] P.Ramanan.EÆcientalgorithmsforminimizingtreepattern

queries. InSIGMOD,2002.

[21] V. Redko. On de ning relations for the algebra of

regu-larevents.UkrainskiiMatematicheskiiZhurnal,16:120{126,

1964. (Russian).

[22] J. Robie, J. Lapp,and D. Schach. XML Query Language

(XQL).WorkshoponXMLQueryLanguages,Dec.1998.

[23] H. Thompsonet al. XMLSchema. W3C WorkingDraft,

May2001. http://www.w3.org/XML/S che ma.

[24] P.Wadler. TwosemanticsforXPath.Technicalreport,Bell

Labs,2000.

[25] P.Wood. OntheequivalenceofXMLpatterns. In

Compu-tational Logic2000,2000.

[26] P.Wood.MinimisingsimpleXPathexpressions.InWebDB,

(15)

Proof of Proposition2.1

Obviously,ifthereisanedgefromF

1

toF

2

inFig.1thenfromthesyntacticde nitionofthesefragmentsitfollows

thatF

1

F

2

. Fortheotherdirection,itsuÆcestoshowthefollowing.

Claim1: NoneofX [] ,X " andX r

iscontainedinanyoftheothertwofragments.

Consider(1) [label =A] in X [] , (2) " in X " , and (3) #  in X r

. It is easyto see the following: expression (1)

witnesses that X [] 6 X " and X [] 6 X r , expression (2) witnesses X " 6 X [] and X " 6 X r , and expression (3) witnessesX r 6X [] andX r 6X " .

Asaresult,thisalsoshowsthat noneofX

[] ,X " andX r iscontainedinX. Claim2: NoneofX r;[] ,X " [] andX " r

iscontainedintheanyoftheothertwofragments.

Consider(1) X r;[] expression#  =A[#  =B], (2) X " []

expression "[label=A], and(3) X

" r expression #  =B="  . It

is not diÆcult to verify thefollowing: expression (1) witnesses that X

r;[] 6 X " [] and X r;[] 6X " r , expression (2) witnesses X " [] 6X r;[] and X " [] 6 X " r

, and expression (3) witnesses X

" r 6X r;[] and X " r 6X " []

. Forexample, the

proofofTheorem5.1givesaformalargumentforthat#

 =A[#



=B]is notequivalenttoanyexpressioninX

" r .

Asaresult,this alsoshowsthat noneofX

r;[] ,X " [] ,andX " r iscontainedinanyofX [] ,X " ,andX r . Claim3: X " r;[]

isnotcontainedin anyofX

r;[] ,X " [] ,andX " r .

ItisnotdiÆculttoshowthat["



=A]=#



is notequivalenttoanyexpressioninanyofX

r;[] ,X " [] andX " r . Proof of Proposition3.1 Obviously,everyX " r;[] expressionisinX " r;[^;_]

. ThusitsuÆcestoprovethateveryX

" r;[^;_] expressionp[q]isequivalent toanX " r;[]

expression. Thiscandonebyrewritingp[q]toanequivalentexpressionthatcontainsneither`^'nor`_',

basedontherulesp[q

1 ^q 2 ])p[q 1 ]=[q 2 ]andp[q 1 _q 2 ])p[q 1 ][p[q 2

]. Eachsuchsteprewritesanexpressionto

anequivalentform,andonecan thenconvert p[q]toanequivalentX

"

r;[]

expressionin a nitenumberofsteps. 2

Proof of Theorem3.2 Thecontainments S TP(child;desc )  X " r;[] 9 +

(child;desc )(c;s) were given in the body of thepaper. We now

givetheproofof9

+

(child;desc)(c;s)

S

TP(child;desc ).

Asmentionedinthebodyofthepaper,weprovethemoregeneralfactthatanarbitraryformula(c;s

1 ;:::;s n ) in 9 +

(child;desc)(c;s) isequivalent to atreepattern query. Letbeaformula,and write as9x

1 :::x

n

where

is quanti er-free. Turn into disjunctive normal form, and move the disjunction outside of the existential

quanti cation. Since the set of tree pattern queries is clearly closed under unions, it suÆces to show that the

resultholds for  of the form 9x

1 :::x

n

(~x;c;~s), where is aconjunction of statementsof the form desc (v

1 ;v 2 ), child(v 1 ;v 2

)andP(v)(with`='eliminatedviavariablesubstitution). LetV bethesetofvariablesin andconsider

thestructureL( )=(V;E;l;c;s

1 ;:::;s

n

),withconstantsforthevariablesc,~sof ,whereE(v

1 ;v 2 )i child(v 1 ;v 2 ) ordesc (v 1 ;v 2

)is aconjunct of ,and llabelsanedge withchild ifthe rstcaseholds andwith descifthe second

caseholdsandthe rstdoesnot. Byaddingextraquanti ers,wecanassumethatanytwovariablesv

1

andv

2 have

aleast upperbound in therelationE,andwecanalsoassume thatthere arenoedgerelationsE(v

1 ;v

2

)derivable

from other edge relations bytransitivity (by removingany such redundantclauses), and that llabelseverynode

(16)

1 n

treepatternsandtalkaboutanequivalentformulaF(L)in9

+

(child;desc )(c;s). Clearly,if(V;E)formsatree,then

F(L) is equivalent to atree pattern; in what follows we will modify the structure L =L( )to get an equivalent

structureinwhich(V;E)isatree.

We rstshowthat wecantake(V;E)to beacyclic. First,if thereis anycycle in (V;E) that containsanedge

labeledchild,then isequivalentto thetreepattern queryfalse, andsimilarly ifthereare twonodesin thesame

strongly-connectedcomponentof(V;E)with di erentnodelabellings. Otherwise,wetakethequotientofL( )by

theequivalencerelation \beingin thesamestronglyconnectedcomponent", andobtainan acyclicgraph. Wecan

checkthatthenewstructureL

0

hasF(L

0

)equivalentto F(L( )).

If(V;E)hasthepropertythat betweentheroot ofV and anyothernode thereisexactlyonedirected E-path,

thenL( )isatreepatternandwearedone. Foranacycliclabeledpartial orderLletp(L)bethenumberofpaths

fromtherootto aleaf. WewillgiveafunctionDC fromlabeledacyclicpartialorderssuchthat F(L)isequivalent

totheunionofF(L

i

)forL

i

2DC(L)andsuchthat wheneverLisnotatreep(L

i

)<p(L)foreveryL

i

2DC(L).

Iteratingthe function DC until each newlyobtainedpartial order is atree givesus a collectionof edge-and-node

labeledtreesL

i

suchtheF(L( ))isequivalenttothedisjunctionF(L).

Wede neDC onL=(V;E;l)asfollows: takeanyv

1

,v

2

suchthattherearetwodisjoint(exceptforend-points)

pathsfromv

1

to v

2

,andchooseanytwosuchpathsp

1 andp 2 , wherethep i 's donotinclude v 1 andv 2 (andhence

arecompletelydisjoint). Eachpathp

i

canbecodedbyastringcode(p

i

)consistingofnodesalternatingwith either

childordesc. WecallanystringwconsistingofalternatingnodesofV andelementsoffchild ;descgacode,andfor

anycodeletNodes(w)bethenodesappearinginw.

Chooseaninterleaving ofp

1

andp

2

;thatis,acodew,anorder-preservingf

1

mappingNodes(code(p

1

))bijectively

intoNodes(w),andorder-preservingf

2

mappingNodes(code(p

2

))bijectivelyinto wsuchthat:

 Nodes(w)areexactlytheunionoftherangesoff

1

andf

2 ,

 wheneverasequence(v;child;v

0

)appearsasa(contiguous)subwordofp

i

,then(f(v);child;f(v

0

))appearsasa

subwordof w.

Given any interleaving,there is acorrespondingstructure formed by replacing range(p

1

)[range(p

2

) with w and

connecting nodes that were connected to a node n in some p

i

with the image f

i

(n) in w, and connecting nodes

withinwaccordingtotheedgescodedin w.

One can check that for every such L

0

formed from this process, F(L

0

) implies F(L). Furthermore, one can

constructasurjectionoftheroot-to-leafpathsinLintotheroot-to-leafpathsofL

0

that isnot1{1,henceshowing

p(L 0

)<p(L).

Thiscompletestheproofoftheinclusionandhencethetheorem. 2

Proof of Theorem3.3

Theproofthat X

"

[]

iscontained in9

+

(child)[loc](c;s) isstraightforward,and theproofthat theset

S

TP(child)of

unarychild treepatterns iscontainedin X

"

[]

followsfromthe sameconstructionasin Theorem3.6. Itremains to

show that 9

+

(child)[loc](c;s) is contained in

S

TP(child). We go throughthe same argumentas in Theorem 3.6.

Theonly di erenceis in the stepjustifying that foreverytwovariables, thereis someupper bound in thelattice

L( ): in Theorem3.6 this wasmadetrue by addingextra quanti cations, but in thiscaseit isguaranteed bythe

(17)

We can show X r;[]

 9

+

(child;desc )[down](c;s) by induction, using the inductive semantics for X

r;[]

in terms

of logic; for example, the inductive translation of p=#

 is 9s 0  p (c;s 0 )^desc (s 0 ;s) where  p

is the coding of the

path p, and by applying the induction hypothesis to 

p

and the transitivity of desc , we see that the result is in

9 + (child;desc)[down](c;s). To see 9 + (child;desc)[down](c;s)  S

TP(child;desc )[down], inspect the proof of 9

+

(child;desc )(c;s) 

S

TP(child;desc ) ofTheorem 3.2andnote that in thiscasethevariable cwill beanupperboundon allvariables,

andhencethetranslationtherewillput cattherootofthetree.

Finally, to see

S

TP(child;desc)[down]  X

r;[]

, inspect the translation given for

S

TP(child;desc )  X

"

r;[] of

Theorem3.2andnotethat ifc isattheroot,itproducesnooccurrencesof"or"



. 2

Proof of Theorem3.7

Again the direction X

[]

 9

+

(child)[loc;down](c;s) can be seen by inspecting the translation. The direction

9 +

(child)[loc;down](c;s) 

S

TP(child)[down] follows because the original translation from 9

+

(child;desc )(c;s)

to S

TP(child;desc ) has already been shown to produce both a

S

TP(child) query in the case of a formula in

9 +

(child)[loc](c;s),anda

S

TP(child;desc )[down]queryin thecaseofaformulain 9

+

(child;desc )[down](c;s); asa

resultitmust produceaqueryin theintersection oftreepattern queries

S

TP(child)\

S

TP(child;desc )[down] for

aformulain 9

+

(child)[loc;down](c;s). Forthe lastdirection, thetranslationfrom

S

TP(child)[down]to X

[] again

usesthe translationin the proof of

S TP(child;desc ) X " r;[] , and for S

TP(child)[down] queriesthis will produce

neitherupwardmodalitiesnordescendantaxes. 2

Proof of Theorem5.1

Weshowthefollowingclaims. Theproofsof theotherparts weregivenin thebodyofthepaper.

1. X r;[] 6X " r .

Toseethat thecontainmentisproper,consider theexpression#

 =A[#



=B]. WeclaimthatthisX

r;[]

expression

isnotroot-equivalenttoanyX

" r

expression. Toseethis,assumethatpissuchanX

" r

-expression. LetN bethesize

ofp,andconsiderthefollowingstructureT: T hasrootrt,withchildn

1

,whichin turnhastwochildrenn

2

andn

3 ,

bothlabeled`A'. Thenodesn

2

and n

3

havebeneath themachain oflength N ofnodes, alllabeled`C', withthe

exceptionofthenoden

4

attheendofthechainbelown

2 ,whichislabeled`B'. Clearly,n 2 2rt[[#  =A[#  =B]]], andhencen 2

2rt[[p]]. Asin theproofin Theorem4.1, wecanassumethatpisa

simplepath. Considerthesequenceofnodesinanacceptingpathrt,v

1 ,:::,v k ,n 2 . Ifn 4

isnotamongthesenodes,

itis easyto seethat n

3

2rt[[p]], acontradiction. Letv

i

bethe lastoccurrenceof n

4

onthis path. Wedistinguish

twocases: 1. n 1 is amongv i+1 ,:::, v k

. Butthen, byfollowingtheotherpathin thetree,and usingthefact that thepath

doesnotreachaleafagain,wecanseeoncemorethat n

3 2rt[[p]], acontradiction. 2. n 1 is not among v i+1 , :::, v k

. Since the length of the chain of nodes above n

4

is greater than the number

of symbolsin p, itfollowsthat one ofthe symbols in pis "



and this mustcorrespondwith at least one pair

(v j ;v j+1 ), (ji), where v j+1

isan ancestor,butnota directparent,of v

j

. Inthis case,wereplace v

j , v j+1 , :::,v k

bytheirrespectivechildren,whichshowsthatn

0 2r[[p]], wheren 0 isthechildof n 2 . Sincen 0 islabeled `C', thisisacontradiction.

(18)

2. X =X []

intheabsenceoflabeltests.

We show that without label tests in quali ers, each X

[]

expression can be rewritten to a root-equivalent X

" r

expression. Notethat []

r

;thuswecanassumethateveryX

[]

expressionisoftheformp

1 [[p k witheach p i

havingthe form 

1 [q 1 ]== m [q m ], where  j

is one of ;,, l, or#. Starting from theinnermost quali ers, we

eliminatethequali ersusingtherewritingrules:

[] ) ; [;] ) ;; [l=q] ) =l[q]="; [#=q] ) =#[q]="; [p 1 [p 2 ] ) [p 1 ][[p 2 ]:

Here  is anysymboland therules canbeapplied at any nesteddepth of quali ers. A straightforwardinduction

on the number and the size of quali ers shows that with these rules one can convert an X

[]

expression into a

root-equivalentX

" r

expressionin a nitenumberofsteps. ThusX

"

=X

[]

intheabsenceoflabeltestsinquali ers.

Proof of Corollary5.3

Firstobservethat ifafragmentisclosedunderintersectionunder generalequivalence,itremainsclosedunder root

equivalence. Thus fromTheorem4.1itfollowsthat underrootequivalence, allofthefragmentsexceptX

"

andX

" r

areclosedunderintersection.

Wenowshowthatunderrootequivalence,X

"

isclosedunderintersection. ConsiderarbitraryX

"

expressionsp

1

andp

2

. Theorem 5.1 tellsus that underroot equivalence, there existtwoX

[] expressions p 0 1 andp 0 2 equivalent to p 1 andp 2

respectively,suchthatp

0 i

hastheformp

i;1

[[p

i;ki

andp

i;j

hastheform

i;j 1 [q i;j 1 ]== i;j m [q i;j m ],where  i;j s is not unlessp i;j is itself ,and q i;j s

doesnotcontainlabeltests. Theintersection ofp

0 1 and p 0 2 is aunionof X []

expressionsoftheform

1 [q 1 ]== m [q m

], oneforeachpair

p 1;s =  1;s 1 [q 1;s 1 ]== 1;s m [q 1;s m ] p 2;s 0 =  2;s 0 1 [q 2;s 0 1 ]== 2;s 0 m [q 2;s 0 m ]

such that forj 2[1;m],

1;s j (resp. 2;s 0 j

)isnot unlessforall j

0 >j,  1;s j =(resp. 2;s 0 j =; thisis areasonable

assumptionwhenconjunction^is allowedin quali ers),q

j =q 1;s j ^q 2;s 0 j ,and  j =min ( 1;s ; 2;s 0 ). Here min(; 0 )

isde nedasfollows: (1)itis;ifeitheroneof  or

0

is;,orif and

0

arebothdi erentlabels,orifone ofthem

iswhereastheotherisnot;(2)itisifbothand

0

are;(3)itislifoneof or

0

isthelabellandtheotheris

#;and(4)itis if=

0

. It iseasytoverifythattheunionofallsuch expressionsisindeedtheintersectionof p

1

andp

2

. Observethat theunionisanX

[]

expressioncontainingnolabeltests. Theorem5.1 thenimpliesthat this

expressioncanbeexpressedinX

"

underrootequivalence. Inotherwords,theconjunctionofp

1 andp 2 isexpressible inX " . However,X " r

isnotclosed underintersection. Tosee this,considerthe X

" r expressions#  =A and#  =B="  . The

rst expression selectsthose nodes in the tree labeled `A', while the second selects those that havea descendant

labeled`B'. Theintersectionisthereforeprecisely#



=A[#



=B],whichhasalreadybeenshown(Theorem5.1,part1)

tobeinexpressibleinX

" r .

Figura

Figure 1: F ragments of XPath
Figure 2: XPath fragments under root equivalence
Figure 3: Characterizations of the expressiveness of the fragments

Riferimenti

Documenti correlati

“yes” responses to script-consistent distracters (i.e., False alarms Consistent), “yes” re­ sponses to script-inconsistent distracters (i.e., False alarms Inconsistent),

IOCM can be defined as a strategic cost management approach based on performing cost management activities beyond the boundaries of a single organization to include supply

Attualmente nella scuola italiana, le attività di movimento ricoprono un ruolo importante e di conseguenza il laboratorio motorio che rappresenta una modalità di

Alcuni, erroneamente, considerano l’Integrated Report come una modalità più semplice per aggregare informazioni finanziarie e performance sostenibili in un unico report,

I 30-40enni di oggi sono conside- rati una generazione di rottu- ra, che ha tagliato molti ponti con la precedente e anche per questo ha difficoltà ad accetta- re un'idea di

Per dar conto della creatività dell’interprete, Davidson rinuncia a una descrizione della competenza linguistica come di un insieme di regole, di convenzioni e di

Legends: Pf1-Pf5: pet preference (i.e. palatability); F1-F5: produces shiny coat; S1-S5: normal stool appearance; O1-O5: food smell; Lk1-Lk5: feed appearance; Ps1-Ps5:

A European Organisation for Research and Treatment of Cancer phase III trial of adjuvant whole-brain radiotherapy versus observation in patients with one to three brain metastases