Dipartimento di informatica e automazione Università di Roma Tre
WEB SITES NEED
MODELS AND SCHEMES
Paolo Atzeni
atzeni@dia.uniroma3.it
http://www.dia.uniroma3.it/~atzeni
Outline
• Databases and information systems over the Web: a great opportunity
• The design of Web-based information systems (WBIS)
• Models for the design of WBIS
• Conceptual and logical design of WBIS
• Conclusions
Paolo Atzeni ER98 3
Databases and information systems over the Web: a great opportunity
Database and information systems technology: present and future
• the technology of relational databases is now mature and reliable: a de-facto standard for business applications
• the current challenge is integration:
– integration of technologies
– integration of distributed, autonomous, heterogeneous systems
The need for cross-fertilization
• database technology was developed within the domain of business applications; other domains have other requirements (also very different from each other)
• database technology and “X” technology can be complementary, with potential mutual benefit
• basic problems have been solved; specific areas may have specific problems, for which the general solutions need not be satisfactory
Integration of information systems
• Various motivations:
– interaction of independently developed components
– cooperation of previously separated business processes
– cooperation (or merger) of companies
• Typical requirements:
– distribution
– heterogeneity
– autonomy
A topical issue
• a request from our “users”:
– “computing facilities should become similar to standard utilities (gas, phone, power, etc.)”
• our usual reply:
– “computing services have application-specific features, for which standard services would be only a limited solution (as is the case for the other utilities)”
• however:
– what would a standard offer of services be?
The great opportunity
• the Internet (and intranets and extranets) and the World Wide Web offer a great opportunity
• a simplified stack of layers:
– cooperation (of applications)
– interoperability (ftp, telnet, mail, http, ...)
– connectivity
• standardization climbs the stack (functionalities get standardized and move down: think of database systems!)
The Web: a great opportunity
• the diffusion of the Web is ...
• the Web (with its browsers) is becoming a standard interface for the final user
– the protocol is very simple and public
– the interface is uniform
– the content is very rich (in breadth and depth)
• it is becoming a standard interface for accessing many services, with information systems and databases of every type
Evolution of the Web
• Publishing information
• Interactive services
• Cooperative work
• Integration (of sources, services, etc.)
• Embedded systems
• Extranets
Web and DBs: a contradiction?
• databases are well structured and organized
• how much structure and organization is there in the Web?
• it depends, both on the source and on the user
• there are different degrees of granularity and structure for our data
• we need to be able to make conversions (from databases to hypertexts and vice versa)
We need the best of databases and hypertexts!
Database perspectives on the Web
• What is the equivalent of a database (a “source of data”)?
– a page independently of the others
– the whole Web
– a site
Database approaches
• bottom-up: accessing information from Web sources
• top-down: designing and maintaining Web sites
• global: integrating existing sites and offering the information through new ones
Integration on the Web
• the Web is a simple and powerful integration tool; it allows the natural implementation of a (data-centred) cooperative approach
• various approaches:
– coarse integration: pages of hypertextual links
– fine-grain integration: unified interfaces for accessing different (usually similar) information systems available on the Web
Problems
• Databases can be queried in a flexible way; hypertexts are easy to access, but cannot be “queried”
• Web sites are often difficult to explore, use and monitor
• Web sites are difficult to design and maintain
Database approaches
• bottom-up: accessing information from Web sources, and integrating them
• top-down: designing and maintaining Web sites
• global: integrating existing sites and offering the information through new ones
Web-based information systems: a database point of view
• Data-Intensive Web Sites:
– large amount of data
– significance of the hypertext structure
The design of Web-based information systems (WBIS)
Problems with many Web-sites (design)
• information is often poorly organized and difficult to access
• it is not even clear which pieces of information are available
• the access structure is haphazard and many dangling references occur
• the style of presentation is heterogeneous
Problems with many Web-sites (maintenance)
• difficulties in updating the content
• difficulties in changing the initially defined structure
• difficulties in changing the presentation details
Web-based information systems
• What we have:
– DBMSs for the management of data
– various tools for the generation of Web pages
• What we advocate:
– a systematic approach to Web site design: models, steps, guidelines
– tools to support the development process
Hypertext data-independence
• Data: “what information is offered through the site and what are the conceptual details and the logical organization”
• Hypertext: “how data is arranged in pages and what navigation links correlate them”
• Presentation: “the appearance of each piece of information in pages”
Design Issues
• Data: choosing the content
• Hypertext: choosing navigation paths
• Presentation: defining layout and graphics
Maintenance Issues
• Data: changing the content
• Hypertext: changing navigation paths
• Presentation: changing layout and graphics
Models for the design of WBIS
Components and Models
• data: ER and Relational
• hypertext: (missing)
• presentation: HTML
What is missing is a model for hypertexts!
Models for hypertexts
• in data-intensive Web sites (and often in general) there are (many) pages with a similar (or even the same) structure
• thirty or forty years ago people realized that an application often has many records with the same structure; files with a rather fixed structure were invented for this purpose
• the notion of a database scheme was introduced later, as an overall description of the content of a database
A Web page
A page-scheme: ProfessorPage
ProfessorPage
  Name, Position, Address, EMail
  ResearchList: Area, ToResP
ADM (Araneus Data Model): a logical model for Web hypertexts
• page-schemes
• “unique” pages
• simple attributes
– text, images, ...
– link (anchor, URL)
• complex attributes: lists (possibly nested)
• heterogeneous union
• form (as virtual list over form fields and link to
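The ADM constructs listed above can be pictured as a handful of record types. The following is only a minimal sketch under stated assumptions, not the Araneus implementation: the class names (`TextAttr`, `LinkAttr`, `ListAttr`, `PageScheme`), the helper `attribute_paths`, and the target name `ResearchPage` are all hypothetical, and heterogeneous unions and forms are left out.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextAttr:
    """Simple attribute holding text (an image attribute would be analogous)."""
    name: str


@dataclass
class LinkAttr:
    """Simple attribute holding a link to a page of the target page-scheme."""
    name: str
    target: str  # name of the target page-scheme


@dataclass
class ListAttr:
    """Complex attribute: a list whose items have their own attributes (may nest)."""
    name: str
    items: List["Attr"] = field(default_factory=list)


Attr = Union[TextAttr, LinkAttr, ListAttr]


@dataclass
class PageScheme:
    """Describes a set of pages that share the same structure."""
    name: str
    attrs: List[Attr] = field(default_factory=list)
    unique: bool = False  # True for a "unique" page (exactly one instance)


def attribute_paths(attrs, prefix=""):
    """Flatten (possibly nested) attributes into dotted path names."""
    paths = []
    for attr in attrs:
        path = prefix + attr.name
        paths.append(path)
        if isinstance(attr, ListAttr):
            paths.extend(attribute_paths(attr.items, path + "."))
    return paths


# ProfessorPage as shown on the slides; "ResearchPage" is an assumed name
# for the target of ToResP, which the slides do not spell out.
professor_page = PageScheme(
    name="ProfessorPage",
    attrs=[
        TextAttr("Name"),
        TextAttr("Position"),
        TextAttr("Address"),
        TextAttr("EMail"),
        ListAttr("ResearchList", [TextAttr("Area"), LinkAttr("ToResP", "ResearchPage")]),
    ],
)

print(attribute_paths(professor_page.attrs))
```

The dotted paths produced by `attribute_paths` (for example `ResearchList.Area`) follow the same naming style that the query example later in the deck uses for `AuthorPage.WorkList.Title`.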
A Web page
(containing a list of links)
A “unique” page-scheme: ProfessorListPage
ProfessorListPage
  ProfessorList: Name, ToProfP
An ADM Scheme
ProfessorListPage
  ProfessorList: Name, ToProfP
ProfessorPage
  Name, Position, Address, EMail
  ResearchList: Area, ToResP
Heterogeneous Union and Forms
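Heterogeneous Union and Forms in ADM
[Figure: ADM scheme with ProfessorListPage (ProfessorList: Name, ToProfP), ProfessorPage (Name, Position, Address, EMail, ResearchList: Area, ToResP) and SearchProfPage (form with Name field and Submit), including a heterogeneous union (U)]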
Data Models
• ER: Database Conceptual Scheme (entities, relationships)
• ADM: Hypertext Logical Scheme (page-schemes, links)
There is a lot of ‘distance’ between the two!
A simple ER scheme
An ADM scheme
Data Models
• ER: Database Conceptual Scheme (entities, relationships)
• NCM: Hypertext Conceptual Scheme (macroentities, directed relationships, aggregations)
• ADM: Hypertext Logical Scheme (page-schemes, links)
NCM fills the gap between the two
Navigation Conceptual Model (NCM)
Hypertext Conceptual Features
• Which concepts should be the hypertext nodes
• Which should be the navigation paths between nodes
• How nodes should be aggregated to build the hierarchical access structure
NCM Constructs
• Macroentity
• Directed Relationship
• Aggregation
NCM: Macroentities and directed relationships
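[Figure: the macroentities Professor, Course and Student, with attributes such as Name, Room, Email, Description and Lesson (Day, Room, Hour), and the directed relationships Teacher and Tutorship with cardinalities 1:1 and 1:N]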
NCM: aggregation nodes
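[Figure: the macroentities Professor, Course, Student and Seminar, the directed relationships Teacher and Tutorship (cardinalities 1:N and 1:1), and the aggregation nodes People, Activities and Department]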
An NCM scheme
Conceptual and logical design of WBIS
The Araneus Methodology
[Diagram: the phases of the methodology]
• Requirements analysis
• Database conceptual design
• Database logical design
• Hypertext conceptual design
• Hypertext logical design
• Presentation design
• Page generation / Site generation
design from scratch
design from an existing database (with an ER scheme)
design from an existing database (without an ER scheme)
Database conceptual design (reverse engineering)
Hypertext conceptual design:
from ER to NCM
Hypertext Conceptual Design
ER scheme → NCM scheme
• step 1: choose and describe macroentities: design “views” over the input ER scheme
• step 2: choose navigation paths
• step 3: shape the hypertext access structure on the basis of (“bottom-up”) conceptual aggregation
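Hypertext Conceptual Design
ER scheme → NCM scheme
• step 1: choose and describe macroentities: design “views” over the input ER scheme
– this usually corresponds to “de-normalizing” the input ER scheme
[Figure: in the ER scheme, Course (Name, Description) and Lesson (Day, Hour) are separate entities linked by a relationship (cardinalities 1:1 and 1:N); in the NCM scheme, Course is a single macroentity with Name, Description and a multivalued (1:N) attribute Lesson (Day, Hour)]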
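The de-normalization in step 1 can be illustrated on data: the rows of two separate entities are combined into one nested record per macroentity instance. This is a sketch under assumptions; the function name `build_course_macroentity` and all sample values are invented for illustration.

```python
from collections import defaultdict

# Normalized (ER-style) data: one row per entity instance (invented values).
courses = [
    {"name": "Databases", "description": "Introductory course"},
    {"name": "Compilers", "description": "Parsing and code generation"},
]
lessons = [
    {"course": "Databases", "day": "Mon", "hour": "10:00"},
    {"course": "Databases", "day": "Wed", "hour": "14:00"},
    {"course": "Compilers", "day": "Tue", "hour": "09:00"},
]


def build_course_macroentity(courses, lessons):
    """Nest the lessons of each course inside the course record."""
    by_course = defaultdict(list)
    for lesson in lessons:
        by_course[lesson["course"]].append(
            {"day": lesson["day"], "hour": lesson["hour"]}
        )
    return [
        {
            "name": course["name"],
            "description": course["description"],
            "lessons": by_course[course["name"]],  # the 1:N attribute
        }
        for course in courses
    ]


for macro in build_course_macroentity(courses, lessons):
    print(macro["name"], len(macro["lessons"]))
```

Each resulting record carries everything a single hypertext node about a course would show, which is the sense in which a macroentity is a “view” over the ER scheme.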
Hypertext Conceptual Design
ER scheme → NCM scheme
• step 2: choose navigation paths
– it may introduce redundancies
[Figure: an ER scheme and the corresponding NCM scheme over Professor, Research-Group and Paper, with cardinalities 1:1 and 1:N; the NCM scheme shows additional 1:N directed relationships]
Hypertext Conceptual Design
ER scheme → NCM scheme
• step 3: shape the hypertext access structure
– it is based on “bottom-up” conceptual aggregations
[Figure: two NCM schemes over Professor, Research-Group and Seminar (cardinalities 1:1 and 1:N); the second one adds the aggregation Research Activities]
The Input ER scheme
The resulting NCM scheme
Hypertext logical design:
from NCM to ADM
Hypertext Logical Design
NCM scheme → ADM scheme
• step 1: map each macroentity into either
– a page-scheme or
– a list inside a page-scheme
• step 2: map each directed relationship into a (list of) link attribute(s)
• step 3: map each aggregation into a unique page-scheme with link attributes to the target page-schemes
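The three mapping steps above can be sketched as one function over plain dictionaries. This is a simplified illustration, not the Araneus tool: the input encoding, the `Page` suffix and the `To...` link names are assumptions. Two things are left out of the sketch: the option of mapping a macroentity to a list inside a page-scheme, and how a unique page reaches the individual pages of a target page-scheme.

```python
def ncm_to_adm(macroentities, relationships, aggregations):
    """Map an NCM scheme (plain dicts) to ADM page-schemes (plain dicts)."""
    pages = {}

    # Step 1: each macroentity becomes a page-scheme.
    for name, attributes in macroentities.items():
        pages[name + "Page"] = {
            "unique": False,
            "attributes": list(attributes),
            "links": [],
        }

    # Step 2: each directed relationship becomes a link attribute on the
    # source page-scheme; a 1:N relationship becomes a list of links.
    for rel in relationships:
        pages[rel["source"] + "Page"]["links"].append(
            {
                "name": "To" + rel["target"],
                "target": rel["target"] + "Page",
                "list": rel["card"] == "1:N",
            }
        )

    # Step 3: each aggregation becomes a unique page-scheme with links
    # to the page-schemes of its members.
    for name, members in aggregations.items():
        pages[name + "Page"] = {
            "unique": True,
            "attributes": [],
            "links": [
                {"name": "To" + m, "target": m + "Page", "list": False}
                for m in members
            ],
        }
    return pages


adm = ncm_to_adm(
    macroentities={"Professor": ["Name", "Email"], "Course": ["Name", "Description"]},
    relationships=[{"source": "Professor", "target": "Course", "card": "1:N"}],
    aggregations={"Department": ["Professor", "Course"]},
)
for page_name, page in adm.items():
    print(page_name, page["unique"], [link["name"] for link in page["links"]])
```

Keeping the mapping as a pure function of the NCM scheme means that a change at the conceptual level can be propagated by re-running the mapping, in the spirit of the maintenance approach discussed later in the deck.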
Hypertext Logical Design
Step 1 (example)
Hypertext Logical Design
Step 2 (example)
Hypertext Logical Design
Step 3 (example)
Resulting ADM Scheme
Maintenance
• The schemes help designers to maintain the hypertext structure
• Maintenance activities correspond to applying scheme transformations:
– introduce multilevel lists
– introduce forms
– split pages
– ...
Maintenance: example
Conclusions
• Models and schemes
– are essential in the design and documentation of Web sites
– can help in the generation of Web sites
– can also be useful to support querying, extraction, and integration
DBLP Site at Trier
http://dblp.uni-trier.de
DBLP Site at Trier: ADM Scheme
Queries over Web Sites:
Wrappers
• The need for wrappers
• Pages are often logically homogeneous
[Figure: a wrapper maps an HTML page from the Internet to an ADM object, e.g. the text “E.F. Codd” becomes the value of the attribute Name : TEXT]
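A wrapper of this kind can be sketched with Python's standard `html.parser` module. The page layout assumed here (the name in an `<h1>` element, one work per `<li>` element) and the sample HTML are invented for illustration; a real site would need a wrapper written against its actual markup.

```python
from html.parser import HTMLParser


class AuthorPageWrapper(HTMLParser):
    """Extract Name (from <h1>) and WorkList titles (from <li>) of a page."""

    def __init__(self):
        super().__init__()
        self._current = None  # tag whose text is currently being collected
        self.name = ""
        self.work_list = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "li"):
            self._current = tag
            if tag == "li":
                self.work_list.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.name += data
        elif self._current == "li":
            self.work_list[-1] += data


def wrap(html):
    """Map an HTML source to a nested record shaped like an ADM object."""
    parser = AuthorPageWrapper()
    parser.feed(html)
    parser.close()
    return {
        "Name": parser.name.strip(),
        "WorkList": [{"Title": title.strip()} for title in parser.work_list],
    }


# Invented sample page; the titles are placeholders, not real publications.
SAMPLE = """
<html><body>
<h1>E.F. Codd</h1>
<ul>
  <li>Placeholder title one</li>
  <li>Placeholder title two</li>
</ul>
</body></html>
"""

print(wrap(SAMPLE))
```

Because pages of the same page-scheme are logically homogeneous, one such wrapper can in principle serve every page of that page-scheme.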
Queries over Web Sites:
Reverse Engineering
• Building a database representation of a site is a reverse engineering process
• First step: deriving the logical structure of data in the site
• Second step: wrapping pages in order to map physical HTML sources to database objects
• Both processes should be automated
Queries over Web Sites:
Query Interfaces: Ulixes
Example of SQL Query: "All papers by Codd in the VLDB Conference"
DEFINE TABLE VLDBPapersByCodd(Title, Year)
AS AuthorSearchPage.NameForm.Submit ->
   AuthorPage.WorkList
IN DBLPScheme
USING AuthorPage.WorkList.Title, AuthorPage.WorkList.Year
WHERE AuthorSearchPage.NameForm.Name = 'E. F. Codd'
AND AuthorPage.WorkList.Reference LIKE '%VLDB%'
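To make the intent of the query concrete, the sketch below evaluates the same navigation and selection over in-memory data. It does not show how Ulixes works: the functions `submit_name_form` and `vldb_papers_by`, the dictionary `AUTHOR_PAGES` and every entry in it are invented placeholders, not real bibliographic data.

```python
# Hypothetical wrapped AuthorPage objects, keyed by the name typed in the form.
# All entries are invented placeholders.
AUTHOR_PAGES = {
    "E. F. Codd": {
        "WorkList": [
            {"Title": "Placeholder paper A", "Year": 2000,
             "Reference": "VLDB (placeholder entry)"},
            {"Title": "Placeholder paper B", "Year": 2001,
             "Reference": "Another venue (placeholder entry)"},
        ]
    }
}


def submit_name_form(name):
    """Stand-in for AuthorSearchPage.NameForm.Submit: the AuthorPage reached."""
    return AUTHOR_PAGES.get(name)


def vldb_papers_by(name):
    """Return (Title, Year) pairs whose Reference contains 'VLDB'."""
    page = submit_name_form(name)
    if page is None:
        return []
    return [
        (work["Title"], work["Year"])
        for work in page["WorkList"]
        if "VLDB" in work["Reference"]  # plays the role of LIKE '%VLDB%'
    ]


print(vldb_papers_by("E. F. Codd"))
```

The navigation expression of the query corresponds to the call to `submit_name_form`, the `USING` clause to the tuple that is built, and the `WHERE` clause to the form input plus the filter on `Reference`.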
Integration of Web Sites
• Data-Centered Cooperative Applications on the Web:
– Extraction of data from existing sites
– Correlation
– Generation of new sites
• Dealing with Heterogeneities:
– Schematic heterogeneities
– Semantic heterogeneities
Integration of Web Sites in Araneus
Integration of Web Sites:
The Integrated Web Museum
• Integrates data coming from several Virtual Museums from the Web (Uffizi, Louvre and Capodimonte)
• Data are re-organized:
– Uffizi: paintings organized by rooms
– Louvre, Capodimonte: works organized by collections
– Integrated Museum: data organized by author
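The re-organization step can be sketched as a regrouping of extracted records. The function `by_author` and all sample records below are invented for illustration; they are not data from the actual museum sites, and the sketch ignores the schematic and semantic heterogeneities that a real integration has to resolve.

```python
from collections import defaultdict

# Invented sample data: each source groups its works in its own way.
uffizi = {  # organized by rooms
    "Room 1": [{"title": "Work U1", "author": "Author X"}],
    "Room 2": [{"title": "Work U2", "author": "Author Y"}],
}
louvre = {  # organized by collections
    "Collection A": [{"title": "Work L1", "author": "Author X"}],
}


def by_author(*sources):
    """Merge (museum, groups) sources into one index organized by author."""
    index = defaultdict(list)
    for museum, groups in sources:
        for works in groups.values():
            for work in works:
                index[work["author"]].append(
                    {"title": work["title"], "museum": museum}
                )
    return dict(index)


integrated = by_author(("Uffizi", uffizi), ("Louvre", louvre))
for author, works in sorted(integrated.items()):
    print(author, [w["title"] for w in works])
```

The original grouping (room or collection) is dropped here; it could be kept as an extra field if the integrated site should link back to the source organization.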
The Integrated Web Museum
Bibliography
• Will be available soon (together with the presentation) on my Web site: http://www.dia.uniroma3.it/~atzeni
Acknowledgements
• The Araneus project at Roma Tre:
– Gianni Mecca
– Paolo Merialdo
– Alessandro Masci
– Valter Crescenzi
– Giuseppe Sindoni
– Marco Magnante