Scien&ﬁc and Large Data Visualiza&on Introduc&on to Informa&on Visualiza&on Massimiliano Corsini Visual Compu,ng Lab, ISTI - CNR - Italy

(1)

Scien&ﬁc and Large Data Visualiza&on Introduc&on to Informa&on Visualiza&on

Massimiliano Corsini

Visual Compu,ng Lab, ISTI - CNR - Italy

(2)

Next lessons – Overview

•  Introduc&on to Informa&on Visualiza&on

•  Mo&va&ons

•  Data Types, Graph Types and Visual Percep&on

•  Mul&dimensional Data

•  Graph Drawing

•  Prac&ce

–  Visualiza&on on the Web –  Javascript, WebGL

–  D3.js

(3)

Lesson 14-15 – Intro to InfoVis

•  Introduc&on and mo&va&ons

•  Ingredients of eﬀec&ve visualiza&on

•  Data types

•  Graph types

•  Visual percep&on

(4)

Informa,on Visualiza,on

•  Informa,on visualiza,on is the study of

(interac,ve) visual representa,ons of abstract data to reinforce human cogni,on. [Wikipedia]

•  The use of computer-supported, interac,ve, visual representa,ons of abstract data to

amplify cogni,on. [Card et al. 1999]

•  The use of computer graphics and interac,on to assist humans in solving problems.

[Purchase et al. 2008]

(5)

•  The purpose of informa,on visualiza,on is to amplify cogni,ve performance, not just to create interes,ng pictures. [Card 2007]

•  Infographics is a visual tool for

communica,on, for the understanding and for the analysis. [Alberto Cairo]

(6)

•  Diﬀerence from Scien4ﬁc Visualiza4on:

– Informa,on visualiza,on treats also abstract data (numerical and non-numerical data).

– In scien,ﬁc visualiza,on spa,al representa,on is given.

•  Diﬀerence from Visual Analy4cs:

– In Visual Analy,cs the accent is on the reasoning/

interac,on loop.

(7)

InfoGraphics – Charts

From the Wall Street Journal

(8)

InfoGraphics – Diagrams

Juan Colombata & Enzo Oliva – La Voz del Interior (Argentina)

(9)

Mo,va,ons

(10)

Mo,va,ons

Which country is close to its historical maximum ?

(11)

Mo,va,ons

Easier to answer… Why ?

(12)

Ra,onale

•  The human visual system (HVS) is very good at iden,ﬁes and analyzes paZerns.

•  We can visualize data to make easy for our brain to analyze them, for example to do comparisons.

(13)

Eﬀec,veness

•  To be eﬀec,ve, data visualiza,on should be taken into account several factors:

– Data type

– Goal (func,on)

– Visual Percep,on System

(14)

Func,onal Art

•  Func,on does not

dictate but restrict our choices.

•  This is par,cularly true for infographics.

(15)

Example – Comparisons

From “The functional art” by Alberto Cairo

(16)

Cleveland and McGill 1984

Less accurate

Less accurate

Adapted from “The functional art” by Alberto Cairo

(17)

Key Ingredients

•  In the following we focus on:

– Data types

•  One dimension, two dimensions, N-dimensions

•  Quan,ta,ve, Ordinal, Nominal

– Graph types

– Visual Percep,on

•  We give some design guidelines ,me by ,me.

(18)

Data

•  Informa,on is obtained from data (!)

•  Structured / Unstructured.

•  Generated by sensors, by computers, by humans, etc.

(19)

Variable Types

•  According to Stevens (1946):

– Nominal

•  Labels (e.g. apples, oranges, bananas)

– Ordinal

•  To ordering things (e.g. ranks of movies)

– Interval

•  Interval scale of measurements (e.g. ,me of departure- arrival)

– Ra,o

•  Measures deﬁned on a ra,o scale (e.g. the mass of an object)

S. S. Stevens, “On the theory of scales and measurements.”, Science, 103, pp. 677-680, 1946.

(20)

Variable Types

•  Category

– Steven’s nominal class (e.g. country names, type of disease)

•  Ordinal

– Labels expressing degree (e.g. cold, hot, very hot) – In general, encoded as integer data.

•  Quan,ta,ve

– Intervals, measures, etc.

– In general, real-numbered data.

(21)

Data Dimensions

•  Common dimensions:

– Univariate, bivariate, trivariate – Mul,-variate (N>3 dimensions)

•  Variables can be dependent or independent.

•  Each case is a point in a space with N dimension (data point).

(22)

Data Dimensions

•  A set of data can be represented by a table with N columns (one for each variable).

Variable 1 Variable 2 Variable 3 Data 1

Data 2 Data 3 Data 4

…

(23)

Data Rela,onships

•  A table can be used to represent a rela,onship between diﬀerent data.

User Id Game id Data 1 Smith 023923 Data 2 James 238548 Data 3 Frank 385753 Data 4 Powell 357352

…

(24)

Data Rela,onships

•  1-to-1

•  1-to-many

•  Many-to-many

•  Rela,onships may also have aZributes

(25)

What we want..

•  A set of well formed and interconnected tables of data.

(26)

What we have..

•  Data does not come in the form we would like.

•  Data may have inconsistencies:

– Corrupted data – Missing data

– Several data may be equivalent (e.g. text ﬁeld

“R&D”, “Research”, “Research and Development”,

“r*d”)

(27)

Data Processing Pipeline

•  Data does not come in the form we would like.

•  Typical data processing pipeline (from raw data to clean structured data):

1.  Collect data.

2.  Data simpliﬁca,on (extract a subset of interest).

3.  Clean and structure them.

•  An output of the pipeline can be also metadata.

(28)

Data Collec,on

•  Search and download data.

•  Parse text.

•  Convert between diﬀerent formats.

– Examples: CVS to JSON , MySQL to HTML , etc.

•  Merge heterogeneous sources.

(29)

Data Simpliﬁca,on

•  Filter / Selec,on

– Remove unwanted data

– Remove invalid data (null values)

•  Aggrega,on

– Collapse several data points into a single one – Replace some values with minimum, maximum,

average, total, etc.

(30)

Data Processing Pipeline

•  Typically performed automa,cally using scripts.

•  Human-guided data transforma,ons is possible (through macro-opera,ons)

– Use appropriate tools (e.g. OpenReﬁne - h?p://openreﬁne.org)

(31)

Metadata

•  Data which describes the data

– Role of variables – Type of variables – Constraints

– Dependencies

•  Collec,on and processing opera,ons can be also described.

(32)

How to Present Data ?

•  How to present data graphically ?

– To allow visual analysis – To highlight paZerns

– To answer some speciﬁc ques,ons – Etc.

(33)

Quan,ta,ve Values

•  We have just men,oned the work by Cleveland & McGill (1984)

– Posi,on – Length

– Angle / slope – Area

– Volume

– Color satura,on / shading

Perceptual Accuracy

(34)

From “Data Visualization” Course by John C. Hart, for Coursera, 2015.

QUANTITATIVE ORDINAL NOMINAL

Posi,on Posi,on Posi,on

Length Density Hue

Angle Satura,on Texture

Slope Hue Connec,on

Area Texture Containment

Volume Connec,on Density

Density Containment Satura,on

Satura,on Length Shape

Hue Angle Length

Slope Angle

Area Slope

Volume Area

Volume

(35)

Table vs Graphs

•  Present tables directly is preferred when:

– Few data points

– Precise values are important

(36)

Univariate Data

•  Few interes,ng solu,ons.

•  Sta,s,cal descrip,on:

– Mean, median, standard devia,on, quar,les.

•  Warning! (some data that appears to be univariate are actually bivariate).

(37)

Stemplots

•  Also called stem and leaf plot.

•  Used to display quan,ta,ve data, generally from small data sets (50 or

fewer observa,ons).

•  Easy to print.

•  Easy to read.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(38)

Stemplots

(39)

Box and Whisker Plots

•  Box and Whisker Plot (or Box Plot) is a convenient way of visually displaying a data distribu,on through their quar,les.

•  Advantages:

–  Key values (average, median, 25th percen,le, etc.)

–  If there are any outliers and what their values are.

–  If the data is skewed and in what direc,on.

(40)

Box and Whisker Plots

(41)

Line Charts/Line Graphs

•  Invented by the Scotsh engineer and sta,s,cian William Playfair (1759-1823)

•  Two quan,ta,ve variables, typically:

–  X à ,me or intervals , Y any

•  Line indicates that there are also intermediate values

(42)

Bar Charts

•  Bivariate Data

– One nominal variable (typically independent) – One quan,ta,ve variable

(typically dependent variable)

•  Horizontal/Ver,cal bars

•  Do not confuse with histograms.

(43)

Bar Charts

3D?!

Not a very good idea..

(44)

Histograms

– One independent and one dependent variable – The ﬁrst variable is

quan,zed in intervals (bins)

(45)

Pie Charts

– One independent and one dependent variable

•  Good for a quick visual check.

•  Not good for:

– Many values.

– Accurate comparisons.

(46)

(47)

Pie Charts

3D?!

Not more readable.

(48)

Figure by Luiz Salomão.

(49)

•  Essen,ally, a pie chart with the center area cut out.

•  Allows to focus more on arc length instead of

comparing the propor,on between slices.

•  More space eﬃcient.

Donut Charts

(50)

Sunburst Charts

•  To shows hierarchy

through a series of rings.

Each ring corresponds to a level in the hierarchy.

•  Hierarchy moving outwards from the center.

•  Colour can be used to highlight hierarchal groupings or speciﬁc categories.

(51)

Sunburst Charts

Produces by Space Radar app (https://github.com/zz85/space-radar)

(52)

ScaZer Plots

– Two independent variables

•  Good to iden,fy

rela,onships, outliers and clusters.

Variable 1 Variable 2

(53)

ScaZer Plots

Variable 1 Variable 2

Outliers High

Density

Variable 1 Variable 2 Clusters

(54)

Surface Graphs

•  Trivariate Data

– Three con,nuous variables

– Two independent and one dependent

(55)

Surface Graphs

•  Color may be associated to the dependent variable.

(56)

Surface Graphs

•  Level curves can be also used.

(57)

3D ScaZer Plots

– Three quan,ta,ve variables

•  Same concept of 2D ScaZer Plot

(58)

Colored ScaZer Plots

•  Color can encode a

variable or a category

(59)

Bubble Charts

– Three quan,ta,ve variables

•  Do not allow for accurate comparison.

•  Colors can be used to show diﬀerent categories.

(60)

Bubble Charts

Figure from http://gapminder.com

(61)

Chloropleth Map

–  Quan,ta,ve value over a geographical areas/regions

•  Data are coloured, shaded or paZerned in diﬀerent ways.

•  Good for an overview, not for accurate comparison.

•  Small areas can be underemphasized.

(62)

Word Cloud

•  Word Clouds displays how frequently words appear in a given body of text, by making the size of each word propor4onal to its

frequency.

•  Arrangement and color can vary a lot.

•  Word Clouds can be used to compare two bodies of text or to give a quick idea of

repea,ng keywords (e.g. used by researchers to summarize the content of their papers).

(63)

Word Clouds

•  Disadvantages:

– Long words are emphasized over short words.

– Words whose leZers contain many ascenders and descenders may receive more aZen,on.

– No accuracy comparison, mainly used for aesthe,c reasons.

(64)

Word Clouds (example)

Produces by www.jasondavies.com/wordcloud.

(65)

Produces by www.jasondavies.com/wordcloud.

(66)

From www.wordclouds.com .

(67)

Summary

•  A brief introduc,on on Informa,on Visualiza,on has been given.

•  Informa,on coming from data. Data that

should be collected and processed properly.

•  A quick panoramic of graph types has been given.

(68)

Ques&ons ?