
Academic year: 2021


(1)

Scientific and Large Data Visualization

Introduction to Information Visualization

Massimiliano Corsini

Visual Computing Lab, ISTI - CNR - Italy

(2)

Next lessons – Overview

•  Introduction to Information Visualization

•  Motivations

•  Data Types, Graph Types and Visual Perception

•  Multidimensional Data

•  Graph Drawing

•  Practice

–  Visualization on the Web

–  JavaScript, WebGL

–  D3.js

(3)

Lesson 14-15 – Intro to InfoVis

•  Introduction and motivations

•  Ingredients of effective visualization

•  Data types

•  Graph types

•  Visual perception

(4)

Information Visualization

•  Information visualization is the study of (interactive) visual representations of abstract data to reinforce human cognition. [Wikipedia]

•  The use of computer-supported, interactive, visual representations of abstract data to amplify cognition. [Card et al. 1999]

•  The use of computer graphics and interaction to assist humans in solving problems. [Purchase et al. 2008]

(5)

Information Visualization

•  The purpose of information visualization is to amplify cognitive performance, not just to create interesting pictures. [Card 2007]

•  An infographic is a visual tool for communication, understanding, and analysis. [Alberto Cairo]

(6)

Information Visualization

•  Difference from Scientific Visualization:

– Information visualization also handles abstract data (numerical and non-numerical).

– In scientific visualization the spatial representation is given.

•  Difference from Visual Analytics:

– In Visual Analytics the emphasis is on the reasoning/interaction loop.

(7)

InfoGraphics – Charts

From the Wall Street Journal

(8)

InfoGraphics – Diagrams

Juan Colombata & Enzo Oliva – La Voz del Interior (Argentina)

(9)

Motivations

(10)

Motivations

Which country is close to its historical maximum?

(11)

Motivations

Easier to answer… Why?

(12)

Rationale

•  The human visual system (HVS) is very good at identifying and analyzing patterns.

•  We can visualize data to make it easy for our brain to analyze, for example to make comparisons.

(13)

Effectiveness

•  To be effective, a data visualization should take several factors into account:

– Data type

– Goal (function)

– Visual Perception System

(14)

Functional Art

•  Function does not dictate our choices, but it does restrict them.

•  This is particularly true for infographics.

(15)

Example – Comparisons

From “The functional art” by Alberto Cairo

(16)

Cleveland and McGill 1984

Less accurate

Less accurate

Adapted from “The functional art” by Alberto Cairo

(17)

Key Ingredients

•  In the following we focus on:

– Data types

•  One dimension, two dimensions, N-dimensions

•  Quantitative, Ordinal, Nominal

– Graph types

– Visual Perception

•  We give some design guidelines along the way.

(18)

Data

•  Information is obtained from data (!)

•  Structured / Unstructured.

•  Generated by sensors, by computers, by humans, etc.

(19)

Variable Types

•  According to Stevens (1946):

– Nominal

•  Labels (e.g. apples, oranges, bananas)

– Ordinal

•  For ordering things (e.g. movie rankings)

– Interval

•  Interval scale of measurement (e.g. time of departure or arrival)

– Ratio

•  Measures defined on a ratio scale (e.g. the mass of an object)

S. S. Stevens, “On the Theory of Scales of Measurement”, Science, 103, pp. 677-680, 1946.

(20)

Variable Types

•  Category

– Stevens's nominal class (e.g. country names, type of disease)

•  Ordinal

– Labels expressing degree (e.g. cold, hot, very hot)

– In general, encoded as integer data.

•  Quantitative

– Intervals, measures, etc.

– In general, real-numbered data.

(21)

Data Dimensions

•  Common dimensions:

– Univariate, bivariate, trivariate

– Multivariate (N>3 dimensions)

•  Variables can be dependent or independent.

•  Each case is a point in an N-dimensional space (a data point).

(22)

Data Dimensions

•  A set of data can be represented by a table with N columns (one for each variable).

         Variable 1   Variable 2   Variable 3
Data 1
Data 2
Data 3
Data 4

(23)

Data Relationships

•  A table can be used to represent a relationship between different data.

         User Id   Game id
Data 1   Smith     023923
Data 2   James     238548
Data 3   Frank     385753
Data 4   Powell    357352

(24)

Data Relationships

•  1-to-1

•  1-to-many

•  Many-to-many

•  Relationships may also have attributes

(25)

What we want..

•  A set of well-formed and interconnected tables of data.

(26)

What we have..

•  Data does not come in the form we would like.

•  Data may have inconsistencies:

– Corrupted data

– Missing data

– Several values may be equivalent (e.g. the text field values “R&D”, “Research”, “Research and Development”, “r*d”)
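
The equivalent-values problem above can be sketched in a few lines of Python. This is only an illustration: the lookup table `CANONICAL` and the function `normalize_label` are hypothetical names, and in practice the mapping has to be built by inspecting the actual data.

```python
import re

# Hypothetical lookup table: maps normalized spellings to one canonical label.
CANONICAL = {
    "r&d": "Research and Development",
    "r*d": "Research and Development",
    "research": "Research and Development",
    "research and development": "Research and Development",
}

def normalize_label(raw):
    """Return the canonical label for a free-text field, or None if unknown."""
    # Trim, lowercase, and collapse repeated whitespace before the lookup.
    key = re.sub(r"\s+", " ", raw.strip().lower())
    return CANONICAL.get(key)

labels = ["R&D", "Research", " research and  development ", "r*d", "Sales"]
cleaned = [normalize_label(x) for x in labels]
```

Values that are not in the table come back as `None`, so they can be reviewed by hand instead of being silently merged.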

(27)

Data Processing Pipeline

•  Data does not come in the form we would like.

•  Typical data processing pipeline (from raw data to clean structured data):

1.  Collect data.

2.  Data simplification (extract a subset of interest).

3.  Clean and structure the data.

•  Metadata can also be an output of the pipeline.

(28)

Data Collection

•  Search and download data.

•  Parse text.

•  Convert between different formats.

– Examples: CSV to JSON, MySQL to HTML, etc.

•  Merge heterogeneous sources.
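
As an illustration of a format conversion, here is a minimal CSV-to-JSON sketch using only the Python standard library. The function name `csv_to_json` and the sample column names are assumptions made for the example.

```python
import csv
import io
import json

def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Every field is kept as a string, so identifiers keep their leading zeros.
    return json.dumps(list(reader), indent=2)

# Hypothetical sample data.
sample = "user,game_id\nSmith,023923\nJames,238548\n"
converted = csv_to_json(sample)
```

Keeping all fields as strings is a deliberate choice here: converting `023923` to a number would lose the leading zero. Numeric columns would need an explicit conversion step.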

(29)

Data Simplification

•  Filter / Selection

– Remove unwanted data

– Remove invalid data (null values)

•  Aggregation

– Collapse several data points into a single one

– Replace some values with minimum, maximum, average, total, etc.
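
The two operations above can be sketched as follows. The records are made-up `(category, value)` pairs, with `None` standing for a missing value.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (category, value); None marks a missing value.
records = [("A", 10.0), ("A", None), ("B", 4.0), ("A", 14.0), ("B", 6.0)]

# Filter / selection: drop records with null values.
valid = [(key, value) for key, value in records if value is not None]

# Aggregation: collapse all points of a category into a few summary values.
groups = defaultdict(list)
for key, value in valid:
    groups[key].append(value)

summary = {
    key: {"min": min(vals), "max": max(vals), "mean": mean(vals), "total": sum(vals)}
    for key, vals in groups.items()
}
```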

(30)

Data Processing Pipeline

•  Typically performed automatically using scripts.

•  Human-guided data transformations are also possible (through macro-operations)

– Use appropriate tools (e.g. OpenRefine - http://openrefine.org)

(31)

Metadata

•  Data which describes the data

– Role of variables

– Type of variables

– Constraints

– Dependencies

•  Collection and processing operations can also be described.

(32)

How to Present Data?

•  How to present data graphically?

– To allow visual analysis

– To highlight patterns

– To answer some specific questions

– Etc.

(33)

Quantitative Values

•  We have just mentioned the work by Cleveland & McGill (1984)

– Position

– Length

– Angle / slope

– Area

– Volume

– Color saturation / shading

Perceptual Accuracy

(34)

From “Data Visualization” Course by John C. Hart, for Coursera, 2015.

QUANTITATIVE   ORDINAL       NOMINAL
Position       Position      Position
Length         Density       Hue
Angle          Saturation    Texture
Slope          Hue           Connection
Area           Texture       Containment
Volume         Connection    Density
Density        Containment   Saturation
Saturation     Length        Shape
Hue            Angle         Length
               Slope         Angle
               Area          Slope
               Volume        Area
                             Volume

(35)

Table vs Graphs

•  Presenting a table directly is preferable when:

– There are few data points

– Precise values are important

(36)

Univariate Data

•  Few interesting solutions.

•  Statistical description:

– Mean, median, standard deviation, quartiles.

•  Warning: some data that appear to be univariate are actually bivariate.
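
The statistical description listed above can be computed with Python's standard `statistics` module. The sample values are made up, and `pstdev` treats them as a whole population, which is an assumption of this sketch; `stdev` would be the choice for a sample.

```python
from statistics import mean, median, pstdev, quantiles

# Hypothetical univariate sample.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": mean(values),
    "median": median(values),
    "std_dev": pstdev(values),            # population standard deviation
    "quartiles": quantiles(values, n=4),  # the three cut points Q1, Q2, Q3
}
```

Note that several conventions exist for computing quartiles, so Q1 and Q3 can differ slightly between tools.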

(37)

Stemplots

•  Also called stem and leaf plot.

•  Used to display quantitative data, generally from small data sets (50 or fewer observations).

•  Easy to print.

•  Easy to read.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
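
Because a stemplot is plain text, it is easy to generate. The sketch below assumes a non-empty list of non-negative integers and uses the tens as stem and the units as leaf; the function name `stemplot` is only illustrative.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf lines (stem = tens digit(s), leaf = units digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Also list stems with no leaves, so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines

data = [12, 15, 21, 23, 23, 38, 40, 41, 47]
plot_lines = stemplot(data)
print("\n".join(plot_lines))
```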

(38)

Stemplots

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(39)

Box and Whisker Plots

•  A Box and Whisker Plot (or Box Plot) is a convenient way of visually displaying a data distribution through its quartiles.

•  Advantages:

–  Shows key values (average, median, 25th percentile, etc.)

–  Shows whether there are any outliers and what their values are.

–  Shows whether the data is skewed and in what direction.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
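
The values a box plot displays can be sketched as below. The slides do not say how outliers are defined; this example assumes the common convention of flagging points more than 1.5 times the interquartile range beyond the quartiles, and the function name `box_summary` is illustrative.

```python
from statistics import quantiles

def box_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR convention."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

stats = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
```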

(40)

Box and Whisker Plots

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(41)

Line Charts/Line Graphs

•  Invented by the Scottish engineer and statistician William Playfair (1759-1823)

•  Two quantitative variables, typically:

–  X → time or intervals, Y → any

•  The line indicates that intermediate values also exist

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(42)

Bar Charts

•  Bivariate Data

– One nominal variable (typically independent)

– One quantitative variable (typically dependent)

•  Horizontal/Vertical bars

•  Do not confuse with histograms.

(43)

Bar Charts

3D?!

Not a very good idea..

(44)

Histograms

•  Bivariate Data

– One independent and one dependent variable

– The first variable is quantized into intervals (bins)

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
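
The quantization into bins can be sketched as follows. The example assumes equal-width bins and a data range that is not degenerate (maximum greater than minimum); the function name `histogram` is illustrative.

```python
def histogram(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls exactly on the last edge: keep it in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

edges, counts = histogram([1, 2, 2, 3, 5, 7, 8, 9, 9, 10], 3)
```

The bin counts are the dependent variable drawn as bar heights; changing `n_bins` can change the apparent shape of the distribution noticeably.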

(45)

Pie Charts

•  Bivariate Data

– One independent and one dependent variable

•  Good for a quick visual check.

•  Not good for:

– Many values.

– Accurate comparisons.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)
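
A pie chart encodes each value as an angle proportional to its share of the total. A minimal sketch, with made-up category labels and an illustrative function name:

```python
def pie_angles(values):
    """Map each category to its slice angle in degrees (slices sum to 360)."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("pie chart needs a positive total")
    return {label: 360.0 * v / total for label, v in values.items()}

angles = pie_angles({"A": 50, "B": 30, "C": 20})
```

With many categories the slices become thin and their angles hard to tell apart, which is one reason pie charts do not suit many values or accurate comparisons.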

(46)

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(47)

Pie Charts

3D?!

Not more readable.

(48)

Figure by Luiz Salomão.

(49)

Donut Charts

•  Essentially, a pie chart with the center area cut out.

•  Lets the viewer focus more on arc length instead of comparing the proportions between slices.

•  More space efficient.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(50)

Sunburst Charts

•  Shows a hierarchy through a series of rings. Each ring corresponds to a level in the hierarchy.

•  The hierarchy moves outwards from the center.

•  Colour can be used to highlight hierarchical groupings or specific categories.

Figure from Data Visualization Catalogue (http://datavizcatalogue.com)

(51)

Sunburst Charts

Produced by the Space Radar app (https://github.com/zz85/space-radar)

(52)

Scatter Plots

•  Bivariate Data

– Two independent variables

•  Good for identifying relationships, outliers and clusters.

[Figure: scatter plot of Variable 1 vs Variable 2]

(53)

Scatter Plots

[Figure: scatter plots of Variable 1 vs Variable 2, annotated with outliers, high-density regions and clusters]

(54)

Surface Graphs

•  Trivariate Data

– Three continuous variables

– Two independent and one dependent

(55)

Surface Graphs

•  Color may be associated with the dependent variable.

(56)

Surface Graphs

•  Level curves can also be used.

(57)

3D Scatter Plots

•  Trivariate Data

– Three quantitative variables

•  Same concept as the 2D Scatter Plot

(58)

Colored Scatter Plots

•  Trivariate Data

•  Color can encode a variable or a category

(59)

Bubble Charts

•  Trivariate Data

– Three quantitative variables

•  Do not allow for accurate comparison.

•  Colors can be used to show different categories.
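
In a bubble chart the third variable is usually shown through bubble size. The slides do not say how size is computed; the sketch below assumes the common convention of making the bubble area, not the radius, proportional to the value. The function name and the default `max_radius` are illustrative.

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius whose circle area is proportional to value (assumed convention)."""
    if value < 0 or max_value <= 0:
        raise ValueError("expected value >= 0 and max_value > 0")
    # Area grows with radius squared, so the radius scales with the square root.
    return max_radius * math.sqrt(value / max_value)

radii = [bubble_radius(v, 100) for v in (100, 25, 0)]
```

A value four times smaller gets half the radius. Since area ranks low for perceptual accuracy, this also hints at why bubble charts do not allow accurate comparison.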

(60)

Bubble Charts

Figure from http://gapminder.com

(61)

Choropleth Map

•  Bivariate Data

–  Quantitative value over geographical areas/regions

•  Data are coloured, shaded or patterned in different ways.

•  Good for an overview, not for accurate comparison.

•  Small areas can be underemphasized.

(62)

Word Clouds

•  Word clouds display how frequently words appear in a given body of text, by making the size of each word proportional to its frequency.

•  Arrangement and color can vary a lot.

•  Word clouds can be used to compare two bodies of text or to give a quick idea of repeating keywords (e.g. used by researchers to summarize the content of their papers).
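
The size-proportional-to-frequency rule can be sketched as below. The tokenization is deliberately naive (lowercase letters and apostrophes only), and `word_sizes` and `max_size` are hypothetical names; layout and color are left out entirely.

```python
import re
from collections import Counter

def word_sizes(text, max_size=60.0):
    """Map each word to a font size proportional to its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others scale linearly.
    return {word: max_size * count / top for word, count in counts.items()}

sizes = word_sizes("data data data chart chart map")
```

A real word cloud would normally also drop very common words (stop words) before counting; that step is omitted here.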

(63)

Word Clouds

•  Disadvantages:

– Long words are emphasized over short words.

– Words whose letters contain many ascenders and descenders may receive more attention.

– Not suited to accurate comparison; mainly used for aesthetic reasons.

(64)

Word Clouds (example)

Produced with www.jasondavies.com/wordcloud.

(65)

Word Clouds (example)

Produced with www.jasondavies.com/wordcloud.

(66)

Word Clouds (example)

From www.wordclouds.com.

(67)

Summary

•  A brief introduction to Information Visualization has been given.

•  Information comes from data, which should be collected and processed properly.

•  A quick overview of graph types has been given.

(68)

Questions?
