An An Introduction to Relational Introduction to Relational DatabasesDatabases
An Introduction to
Relational Databases An An Introduction to Introduction to
Relational
Relational Databases Databases
–– ChapterChapter 1 –1 –
Foundations of Information Systems (WS 2008/09) Foundations
Foundations of Information Systems (WS 2008/09)of Information Systems (WS 2008/09)
Strategy
Strategy in this chapterin this chapter
Microsoft Access for Beginners Microsoft Access Microsoft Access
for Beginners for Beginners
Basic concepts of relational
databases Basic
Basic conceptsconcepts of relational of relational
databases databases Strategy
Strategy in thisin this chapter:chapter:
•• PracticePractice
•• Application orientedApplication oriented
•• Hands-Hands-on on experienceexperience
•• TheoryTheory
•• PrinciplesPrinciples and ideasand ideas
•• Scientific basisScientific basis
BothBoth sidessidesareare important:important: LearningLearningLearning without applying is rather useless !withoutwithout applyingapplyingis rather uselessis rather useless !!
Relational
Relational databasesdatabases: General remarks: General remarks
•• At presentAt present, , thethe DB-DB-marketmarket is completely dominated by systems is completely dominated by systems supporting the supporting the relational
relational modelmodel of dataof data..
•• Leading (commercialLeading (commercial) manufacturers) manufacturers of relational DBof relational DB--productsproducts::
•• The notionThe notion"relational" is"relational" is motivatedmotivated bybythethemathematicalmathematicalconceptconcept of a relationof a relation. . Relations in
Relations in mathematics are sets of mathematics are sets of tuplestuples. .
•• Relational databasesRelational databases areare collectionscollectionsof of one or moreone or more relations.relations.
•• In practiceIn practice, relations , relations can becan bevisualized asvisualized as tables,tables, the rowsthe rows of of which are individual which are individual records
records of of datadatawith thewith the samesame(homogeneous(homogeneous) ) fieldfield structure.structure.
•• In scienceIn science, relational , relational databases have a databases have a broad range broad range of of theoretical foundationstheoretical foundations.. Oracle
Oracle SybaseSybase
Microsoft (Access, SQL Server)
Microsoft (Access, SQL Server) Postgres Postgres (Freeware)(Freeware) IBM (DB2,
IBM (DB2, Informix)Informix) MySQLMySQL (Freeware) (Freeware)
Relational
Relational databasesdatabases: History: History
Edgar F.
Edgar F. CoddCodd
•• TheTheideaideato organize datato organize data in tablesin tables isis quite quite old and
old and pretty obviouspretty obvious. .
•• TheTheidea to idea to investigate this representationinvestigate this representation of data by means of data by means of of the theory of relationsthe theory of relations is due
is due to to one one man, man, who proposed this viewwho proposed this view at the at the end of end of the the 1960s: 1960s:
Edgar F.
Edgar F. CoddCodd
•• In 1970, he publishedIn 1970, he published his seminalhis seminal paper paper
"A Relational Model of
"A Relational Model of Data for Large Data for Large Shared Data
Shared Data Banks",Banks",
in whichin which he fixedhe fixed all foundationsall foundations of relationalof relational databases
databases withwith amazingamazingprecisionprecision and and clarityclarity. .
•• Codd died in Codd died in earlyearly 2003.2003.
For this pioneering work Codd received the Turing Award in 1982, the „Nobel price of informatics".
For this pioneering work Codd receivedFor this pioneering work Codd received the Turing theTuring AwardAwardin 1982, thein 1982, the „Nobel „Nobel price
price of informaticsof informatics".".
Literature Literature
A strong recommendation for your own studiesA strong recommendation for your own studies::
Ramon A. Mata-Toledo, Pauline K. Cushman
“Fundamentals of Relational Databases”
(Schaum's Outlines)
McGraw-Hill Professional ISBN 978-0071361880 249 pp.
€ 15,99 (amazon.de) Ramon A. Mata
Ramon A. Mata-Toledo-Toledo, , Pauline K. CushmanPauline K. Cushman
““Fundamentals of Relational Databases”Fundamentals of Relational Databases”
(Schaum's(Schaum's Outlines) Outlines)
McGMcGraw-raw-Hill Professional Hill Professional ISBN
ISBN 978-978-0071361880 0071361880 249 pp.
249 pp.
€€ 15,99 (amazon15,99 (amazon.de).de)
There are many
There are many good and good and expensive academic textbooks expensive academic textbooks on (relational) on (relational) databasesdatabases.. This one is cheap
This one is cheap and and not really not really a higha high--profile book, profile book, but it fits perfectly with ourbut it fits perfectly with our lecture
lecture, , is is upup--toto--date, date, very readable and very readable and covers exactly what you needcovers exactly what you need.. Everybody should have
Everybody should have his/her his/her own copy!own copy!
A A briefbriefprimer in dataprimer in data representationrepresentationformatsformatsand dataand data manipulationmanipulationparadigmsparadigms
•• BeforeBefore „diving„diving into“into“ relational databasesrelational databases proper, weproper, we will brieflywill briefly investigateinvestigatevariousvarious competitive
competitive formatsformats of representingof representing and manipulatingand manipulating datadata arrangedarrangedin tabularin tabular form:form:
•• plainplain text filestext files
•• formattedformatted text filestext files
•• spreadsheetsspreadsheets
•• relational databasesrelational databases
•• In eachIn each representationrepresentationformat, format, thethe datadataareare storedstored in filesin files. Such . Such filesfiles maymaywell bewell be con-con- sidered
sidered as databasesas databases ––however, however, therethere areare different degreesdifferent degrees of „of „databasedatabase--nessness““!!
•• EachEach formatformat comescomesalongalong withwith a speciala special software system (orsoftware system (or programprogram) that) that controlscontrols anyany kindkind of accessof access to and manipulationto and manipulation of of thethe respectiverespective „database„database““..
•• DataData manipulationmanipulation in thisin this contextcontextmeansmeans searchingsearching forforspecialspecial datadata in thein the filefile and/orand/or changing
changing (adding(adding, , deletingdeleting, , modifyingmodifying) ) datadata..
•• EachEach of theseof these pairspairs of representationof representation formatformat+ manipulation+ manipulation system cansystem can bebe viewedviewed asas a particulara particular variantvariant of theof the equationequation DBS = DBMS + DB. DBS = DBMS + DB.
Reminder:
Reminder: databasedatabase (management(management) systems) systems
DBDB DBMSDBMS
Database System Database System Database System
. . . .
users
users and and application
application programmesprogrammes DBMS: Database Management SystemDBMS:
databases databases
(application(application--independent independent servicesservices forfor managing
managing data)data)
A A littlelittle„case„case study“study“: : TheThe chemicalchemicalelementselements„database„database““
Representing
Representing information/information/datadata aboutaboutthethe 116 chemical116 chemical elementselementsin different in different formats:formats:
•• as textfile as textfile ((separatedseparatedbyby tabstabs))
•• as textfile as textfile ((separatedseparatedbyby semicolonssemicolons))
•• as Word as Word filefile
•• as as Excel Excel filefile
•• as Access as Access (relational) database(relational) database
•• TheThe simplestsimplest formatformatforfor representingrepresenting thethe chemicalchemicalelementselements datadataisis thethe text file format..
•• Text filesText files (extension(extension .txt under.txt under Windows) Windows) areareconceptuallyconceptually just longjust long stringsstrings of printable of printable symbols
symbols (such as digits(such as digits, characters, characters, or, or eveneven blanksblanks) arranged) arranged in linesin lines. .
•• ByBy hand, spaceshand, spaces in a text filein a text file cancan bebearrangedarranged in such a wayin such a way thatthat e.g. a tabular structuree.g. a (rows(rows--columnscolumns))appearsappears..
•• Thus, Thus, text files appearappear asas simple
simple databases..
•• A text editorA cancan bebe usedused forfor performingperformingsimplesimple manipulations
manipulations of theof the filefile contents
contents, such , such as patternas pattern matching
matching and substringand substring replacement
replacement: : TheTheeditoreditor
Elements
Elements databasedatabase: .txt : .txt filefile + text editor+ text editor (1)(1)
•• DataData in text filesin text files cancan bebe arrangedarranged in anyin any form form convenientconvenientforfor humanshumans readingreading thatthatfile.file.
•• TheThe text editortext editor isis unableunableto „to „seesee““ thethe particularparticular structuringstructuring conventionconvention (e.g. columns(e.g. columns).).
•• Computer programsComputer programs usingusing thethe
„text DB„text DB““ don‘don‘t t needneedvisualvisual support
support either, either, theythey just needjust need somesome meansmeans of separating of separating individual
individual partspartsof theof the data.data.
•• In thisIn this text filetext file versionversion of of thethe elements
elements data, data, thethe line structure isis retainedretained (one(one elementelementper line),per line), butbutcolumnscolumnsarearejust just separatedseparated byby a semicolon delimiter.a .
•• WeWe will deal laterwill deal later withwithdifferent different popular
popular structuringstructuringconventionsconventions (not(not mademade forforpeople, butpeople, but forfor pro-pro-
grams
grams) such ) such asas XML and RDF.XML and RDF.
Elements
Elements databasedatabase: .txt : .txt filefile + text editor+ text editor (2)(2)
•• MS Office MS Office offersoffersa mucha much moremore powerfulpowerfultext text editor, editor, calledcalled MS Word, , supportingsupporting e.g.e.g.
a „a „truetrue““ tabulartabularformatformatforfor datadata(visualizing(visualizing rowsrowsand columnsand columns automatically). Textautomatically). Text files
files managedmanaged byby thetheWord software areWord software are identifiableidentifiable byby theirtheir extensionextension.doc..
•• In In additionaddition to theto the normal
normal functionsfunctions ofof a „a „plainplain““ text editortext editor, , Word
Word offersoffers severalseveral
„luxury„luxury““ variantsvariants of text
of text editingediting..
•• WordWord‘s ‘s abilityability to searchto search and changeand change datadata isis notnot moremore powerfulpowerfulthanthan thatthat of „of „normalnormal““editorseditors..
•• BeingBeing ableable to recognize to recognize tables
tables, , howeverhowever, , makes makes Word
Word moremore convenient.convenient.
Elements
Elements databasedatabase: .doc : .doc filefile + MS Word editor+ MS Word editor
•• Word files are just „illusions“ created by the MS Word system. IfIf openedopened withwitha normala normal text
text editoreditor, , theythey turn out to beturn out to be specialspecial text filestext files containingcontaininga lot of a lot of crypticcrypticspecialspecial symbolssymbols generated
generated (and used(and used) by) by thethe Word software.Word software.
•• TheThe„Word DBMS„Word DBMS““ interprets
interprets thesethesespecialspecial symbols
symbols in order to, e.g.,in order to, e.g., generate
generate thethe table formattable format notnotvisiblevisible in „in „plainplain““ text
text filesfiles..
•• In additionIn addition, Word , Word insertsinserts plenty
plenty of otherof other internalinternal codecode whichwhichisis neededneeded in order to
in order to bebe ableable toto offer
offer thethe extra functionsextra functions notnotpresentpresent in a textin a text editor
editor..
Elements
Elements databasedatabase: .doc : .doc filefile + text editor+ text editor
•• ContainedContained in eachin each Office packageOffice package on Microsoft PCs, thereon Microsoft PCs, there isisan an eveneven moremore powerfulpowerful tooltool forformanagingmanaging data, a sodata, a so--calledcalled spreadsheetspreadsheet programprogramcalledcalledMS Excel supporting supporting files
files withwithextensionextension .xls..
•• IfIf processedprocessed viavia Excel,Excel, eveneven moremore detailsdetails ofof tabular
tabular structurestructurebecomebecome visible
visible and canand can bebe mani-mani- pulated
pulated..
•• For searchingFor searching and changingand changing,, thethe Excel system doesExcel system does notnot exceed
exceed thethe functionalityfunctionalityofof Word.
Word.
•• Excel, howeverExcel, however, , isis speciaspecia-- lized
lized in statistical evalua-in tions of numericalof numerical datadata..
Elements
Elements databasedatabase: .: .xlsxls filefile+ MS Excel system+ MS Excel system
•• BehindBehind thethe surface, surface, howeverhowever, , therethere isis againagain a speciallya specially formattedformatted text filetext file format, format, containing
containing plentyplenty of of internalinternalcontrolcontrol symbolssymbols interpretedinterpretedbybythethe Excel software.Excel software.
•• IfIf openedopened withwitha normal text a normal text editor, editor, thisthis „hidden„hidden““ informationinformationbecomesbecomes visible. visible. ItIt isis inin principle
principle notnotdifferent from different from handhand-made-made separatorsseparatorslikelike semicolons
semicolons mixedmixed withwith thethe „proper„proper““datadata partsparts as seen
as seen before.before.
•• WhatWhat mattersmatters isisthethe specialspecial software
software managingmanaging thesethese
„enhanced„enhanced text filestext files“, “, in in thisthis casecase thethe ExcelExcel system.
system.
•• EnhancedEnhanced system functiosystem functio-- nality
nality requiresrequiresan an enhancedenhanced representation format for representation format for data. data.
Elements
Elements databasedatabase: .: .xlsxls filefile+ text editor+ text editor
•• EachEach MS Windows computerMS Windows computer withwithMS Office software MS Office software offersoffersan an evenevenmoremore powerfulpowerfultooltool forfor managingmanaging data, data, calledcalled MS Access ––thisthis isis thethefirstfirstsystem computersystem computer scientistsscientists wouldwould callcall a „a „realreal““ DBMS. Files „DBMS. Files „understoodunderstood““ byby Access haveAccess have thethe extensionextension.mdb..
•• Access supportsAccess supports a tabulara tabular viewview of of data, data, tootoo, like, like Word and Excel, Word and Excel, butbut offers
offers a mucha much, , muchmuch moremore powerfulpowerfulsetsetofof techniques
techniques forfor searchingsearching and changingand changing data.data.
•• Access will beAccess will be consideredconsidered in morein more detail in detail in thethe
remainder
remainder of of thisthis section
section, , dedicateddedicated to to so-so-calledcalled relational databases..
Elements
Elements databasedatabase: .: .mdbmdb filefile+ MS Access DBMS+ MS Access DBMS
•• OpeningOpening an mdban mdb..filefile (alias an Access database(alias an Access database) ) withwith a text editora text editor revealsreveals thethe „true„true nature“nature“ of of thetherepresentationrepresentation againagain: : HeavylyHeavyly formattedformatted text filetext file formatformatwithwith excessiveexcessiveuseuseof of
internal
internal codingcodinginterpretableinterpretable forfor thethe MS Access DBMS onlyMS Access DBMS only..
•• NeverthelessNevertheless, searching, searching e.g.
e.g. forfor certaincertain symbolssymbols oror stringsstringswithinwithin thisthis filefile withwith a text editora text editor returns
returns thethe same same results
results as searchingas searching in thein the human-human-friendlyfriendly tabular
tabular text filetext file fromfrom thethe beginning.beginning.
•• TrickyTricky internalinternal format-format- tingting plus intelligentplus intelligent interpreting
interpreting softwaresoftware isis ableable to generateto generate powerful illusions powerful illusions about databases about databases!!
Elements
Elements databasedatabase: .: .mdbmdb filefile+ text editor+ text editor
Relational
Relational databasedatabase proper vs. dedicatedproper vs. dedicated filefileformatsformats
•• In theIn the remainderremainder of of thisthischapterchapter, we, we will usewill use thethe last of theselast of these representationrepresentationformatsformats only:only:
•• In additionIn addition, we, we will forgetwill forget aboutabouttext text editors, Word and Excel, and editors, Word and Excel, and exploreexplore thethe power ofpower of Access
Access, a , a truetrue relational DBMS.relational DBMS.
•• In theIn the companioncompanion lecturelecturebyby Prof. Hofmann-Prof. Hofmann-ApitiusApitius youyouwill getwill get to knowto know a widea wide spectrumspectrum of additional
of additional datadata representationrepresentation formats, formats, manymany of of themthemdevelopeddevelopedparticularlyparticularly forfor lifelife science
science applicationsapplications („(„dedicateddedicated formats“formats“).).
•• All of All of thesethese formatsformatsareare ultimatelyultimately basedbasedon text fileson text files as underlyingas underlying „real„real““ format. Specialformat. Special structuring
structuring informationinformationisis alwaysalways interleaved with „interleaved with „plainplain““ datadata ––asas was shownwas shown forfor thethe general
general--purposepurposeformatsformats .doc, ..doc, .xlsxls, and ., and .mdbmdb discusseddiscussed beforebefore..
•• In additionIn addition, most, most of theseof these dedicateddedicated formatsformatscomescomeswithwith itsits ownown „gatekeeper„gatekeeper““ software,software, comparable
comparable to Word, Excel orto Word, Excel or Access in thatAccess in that ititinterpretsinterprets thethe specialspecialformatformatin a in a particu-particu- larlar, system-, system-specificspecific way. way.
Relational Databases Relational
Relational DatabasesDatabases
Our example
Our example DBMS: Microsoft AccessDBMS: Microsoft Access
•• AccessAccessisis a DBMS fora DBMS for relational databasesrelational databases (data(data organizedorganized in form of tablesin form of tables), ), developeddeveloped and distributedand distributed sincesince 1992 by1992 by Microsoft.Microsoft.
•• "Access-"Access-Homepage" at Microsoft:Homepage" at Microsoft:
http://
http://www.www.microsoftmicrosoft.com.com/office/office//accessaccess//defaultdefault..aspasp
•• recentrecentversionversionin MS Office packagesin MS Office packages: Access 2000: Access 2000
•• Access Access isis veryvery wellwell--suitedsuited forfor smallsmall to to mediummediumDB DB applicatonsapplicatons in singlein single--useruser mode.mode.
•• useful internetuseful internet tutorialstutorialson Access on Access
•• Michael BrydonMichael Brydon‘‘s s tutorialtutorial at Simon Fraser at Simon Fraser University, University, CanadaCanada http://
http://mis.mis.busbus..sfusfu..caca//tutorialstutorials//MSAccessMSAccess//tutorialstutorials..htmlhtml
•• Maggie StraplandMaggie Strapland‘‘s Access s Access pagespages at University of Bristol, UKat University of Bristol, UK http://
http://www.www.brisbris.ac.ac..ukuk//isis//servicesservices/software//software/packagespackages//accessaccess//
•• Jakob Jakob LindenmeyerLindenmeyer‘‘s Access s Access tutorialtutorialat ETH Zat ETH Züürich, Schweizrich, Schweiz http://
http://www.inf.www.inf.ethz.ethz.chch/personal//personal/lindenmelindenme//publicationspublications//accessaccess//AccessTutorialAccessTutorial..htmlhtml
• In additionIn addition, , therethere areare many, many, manymany booksbooks on howon how to useto use Access, mostAccess, most of of themthemnotnot really
really thatthat helpful, helpful, becausebecause therethere isis poorpoorstructurestructure and tooand too manymany detailsdetails..
Throughout the course, we will use a small, but handy DBMS available on most PCs.
Throughout the course
Throughout the course, , we we will will use use a a smallsmall,, but handy
but handy DBMS DBMS availableavailable on most on most PCs.PCs.
A A first example database: European first example database: European geographygeography
•• Our first „Our first „real real worldworld““ database exampledatabase example is about geography
is about geography: Facts about coun: Facts about coun-- tries
tries, cities, cities etc. in Europe !etc. in Europe !
•• A wealth A wealth of of geo data can be accessedgeo data can be accessed freely
freely on on the the web in web in the „the „World World FactFact Book“Book“ of of the CIA:the CIA:
http://
http://www.www.ciacia..govgov//ciacia/publications/publications// factbook
factbook//indexindex..htmlhtml
•• You will find a You will find a database called database called
„europe„europe.mdb.mdb““
on the lecture homepage for your ownon the lecture homepage for your own
„„experimentsexperiments““. . This This will will continuously continuously grow
grow ––you are invited to you are invited to helphelp!!
•• At this moment At this moment in in our lectureour lecture, , the geothe geo database serves as
database serves as a a first first „„appetizerappetizer““ toto (relational)
(relational) database managementdatabase management..
A relational
A relational database aboutdatabase aboutEuropean geographyEuropean geography
Europe.
Europe.mdb mdb isis a small a small database for introductory database for introductory purposes
purposes. . Just
Just now it contains now it contains twotwo tables
tables, , one one on on countriescountries in Europe,
in Europe, the other the other on on cities
cities. . Today we
Today we will just will just learnlearn the most basic ideas about the most basic ideas about relational
relational databasesdatabases ––some some of you of you will will be already famibe already fami-- liar with this
liar with this..
And we And we will will discuss thediscuss the important question why important question why a simple text a simple text filefile is notis not sufficient for keeping data sufficient for keeping data..
Tables
Tables in Access: some basic terminologyin Access: some basic terminology
Italy I Rome 301230 57460274 2001 . . .
Italy
Italy I Rome I Rome 301230 57460274 2001301230 57460274 2001 . . .
. . .
country code capital area population yea country code capital area population yearr
field names field names
countries countries
table
table namename
text
text char text integer integer integerchar text integer integer integer
field data type field data type record record
Relational
Relational tables are gridstables are grids, , the fields the fields of which are consistingof which are consisting of columns of columns and and rowsrows. . There is
There is a specific terminology for a specific terminology for such such tables in Access.tables in Access.
Terminological confusion possible Terminological confusion possible !!
relation table datasheet
tuple row record
attribute column field name
domain data type field data type
relation
relation tabletable datasheetdatasheet tuple
tuple rowrow recordrecord
attribute
attribute columncolumn field namefield name domain
domain data typedata type field data typefield data type theory
theory SQLSQL Access Access Unfortunately
Unfortunately, , the basic conceptsthe basic concepts of the of the relational relational model model arearedenoteddenoted byby different termsdifferent terms depending
depending on theon the context. context. There are There are synonymoussynonymous, , but but different different terminologiesterminologies in databasein database theory
theory, , the standard the standard DB DB language language SQL and MS Access:SQL and MS Access:
Be Be warned warned of of this this „„Babylonic confusionBabylonic confusion““of of terms terms –– we urgently recommend that youwe urgently recommend that you always
always stick to a stick to a single single system of notions system of notions in a in a consistent mannerconsistent manner. . It doesnIt doesn‘‘t matter t matter whichwhich system
system you use you use –– but never mix but never mix them them upup !!
Access table:
Access table: Datasheet view Datasheet view
Access table Access table
„countries„countries":":
Datasheet view Datasheet view
Access table: Design Access table: Design viewview
Access table Access table
„countries„countries":":
Design
Design viewview
field size field size format format input mask input mask caption caption default value default value validation rule validation rule validation validation texttext Required Required
Allow zero length Allow zero length Indexed
Indexed
Unicode compression Unicode compression
field name field data type
field name field data type description description