• Non ci sono risultati.

Laurea in Computer Science Laurea in Computer Science MAGISTRALE MAGISTRALE

N/A
N/A
Protected

Academic year: 2021

Condividi "Laurea in Computer Science Laurea in Computer Science MAGISTRALE MAGISTRALE"

Copied!
7
0
0

Testo completo

(1)

1

Image Processing (for documents) Image Processing (for documents)

Image Representation – Pre-processing – Tasks – Image Representation – Pre-processing – Tasks –

Applications Applications

Laurea in Computer Science Laurea in Computer Science

MAGISTRALE MAGISTRALE

Corso di

ARTIFICIAL INTELLIGENCE Stefano Ferilli

Questi lucidi sono stati preparati per uso didattico. Essi contengono materiale originale di proprietà dell'Università degli Studi di Bari e/o figure di proprietà di altri autori, società e organizzazioni di cui e' riportato il riferimento. Tutto o parte del materiale può essere fotocopiato per uso personale o didattico ma non può essere distribuito per uso commerciale. Qualunque altro uso richiede una specifica autorizzazione da parte dell'Università degli Studi di Bari e degli altri autori coinvolti.

“A picture is worth a thousand words”

Unknown Author

Motivations

Images widespread and central in everyday life

Pictures

Videos

(Documents)

Documents as images

Images in documents

Images extremely information-dense objects

Human processing of images expression of intelligence

Perceptual level

Introduction

Image Processing

A traditional objective of AI

Aims:

Identifying and recognizing objects

Understanding scenes

Need for automatic processing of images

Problems

Huge amount of data to be handled

Manual processing infeasible

Semantics

Objects not explicit

3D world into 2D images

Types

2 approaches

Vector graphics

Based on geometric-mathematical graphic primitives

+ Scalability

+ Storage occupation

Used for ‘artificial’ images

Raster graphics

Description based on a grid of colored dots

+ representation of halftones

Used for ‘natural’ images

Formats

Vector graphics

SVG (Scalable Vector Graphics)

Raster graphics

BMP (BitMaP)

GIF (Graphics Interchange Format)

TIFF (Tagged Image File Format)

JPEG (Joint Picture Experts Group)

PNG (Portable Network Graphics)

FITS (Flexible Image Transport System)

...

(2)

Raster Images

Sampling grid

Bitmap

Pixel (Picture Element)

Density or Depth

bit/pixel

n bits  2

n

colors

Resolution

width * height * depth / 8

Color Spaces

Mathematical model for color representation

3 channels

(RYB)

Tricromy (RGB)

Additive

Quadricromy (CMYK)

Subtractive

YUV

Gray-levels

HSV/HSB (Hue, Saturation, Value/Brightness)

Linear

...

Transparency

Alpha channel

Color Representation

Color

1 byte/channel  256 levels/channel

3 bytes/pixel  17 777 216 colors (true color)

Gray Level

1 byte/pixel  256 levels

Average of R, G, B

Black&White

1 bit/pixel

(Alpha channel)

1 byte  256 levels

Our focus

Documents

Documents as images

Images in documents

2 types

Born-digital

Layout and content information partly explicit

Digitized

Documents themselves are images

All information must be extracted from images

Documents

Examples

Documents

Examples

(3)

Documents

Examples

Documents

Examples

Documents

Examples

Document Layout

Visual appearance of documents

Organization of document components

Very regular in most document types

Articles, Books, Bills, Letters, ...

Useful to approach automatic processing

Content-based

Indication for semantics

Not explicit

Processing needed to extract it

Layout Structure

Geometric/perceptive structure in which components are visually organized

Characterizes each document

Fundamental for comprehension

Defines a hierarchy of abstract representations of a document’s geometric organization

Leaves = basic blocks identifiable in the document

Internal nodes = various livels of compound components

Root = whole document

Sequence of pages

Layout Hierarchy

(multi-page) Document

Page

Frame

Block

Text line

Word

Basic text block

Graphical line

Image

Graphic

(4)

Layout Structure

Not all pages interesting for one’s goals

Document processing by single pages

Scope restriction

Computational savings

A page may group different visual components

Frames

Rectangular areas of interest

Each associated to a specific logical function

Layout Analysis

Process for identifying a document’s components and their hierarchy

Should result in a set of frames, each of which can be associated to a specific logical component

Layout types

Manhattan

non-Manhattan

Layout Analysis

Problems in digitized documents (images)

Rotation

Specks

Deformations

Unstructured

Computational complexity

(Black & White)

Pre-processing Tasks

Binarization

Skew correction

Dewarping

Color separation

Line identification

Object Identification

Optical Character Recognition (OCR)

Binarization

Steps

Transformation into gray scale

Luminance = average of chromatic components

Thresholding

Based on distribution of gray levels

Global: computed on the whole image

E.g. Otsu

Local: based on a neighborhood of each pixel

E.g. Niblack

Scaling

Scale reduction

Squares of nxn pixels collapsed in a single pixel

Black&White: majority of collapsed pixels

(Gray level / color: average of collapsed pixels)

Advantages

Removes insignificant details

Small specks

Reduces computational complexity

(5)

Deskewing

Skew angle

Non-horizontal orientation of text lines

Positive: clockwise

Negative: counterclockwise

Estimated through horizontal projection

Try various positive and negative angles

OK high peaks having similar size as characters

KO smooth shape with low peaks

Possibly on selected

areas of the document

Correction:

opposite rotation

Deskewing

Problem: multiple orientations

Choice of dominant orientation

Separation of different zones

Dewarping

Some documents digitized through photography

Historical documents

Introduction of geometric distorsions

Waving

Perspective

2 approaches

Specialized hardware (expensive)

Generic hardware

Manipulation software

Dewarping

Grid straightening

Geometric approach

Dewarping

Grid straightening

Image

Original

Dewarped

OCR

Original

Dewarped

Spread Factor

Indicator of layout complexity

Average distance between regioni / average height of regions

>1 : simple structure

A few sparse regions

<1 : complex strutture

Many dense regions

0 5 10 15 20 25 30 35 ...

60 50

0 0,5 1 1,5 2 2,5 3

Spread Factor Th

res ho

ld

s Ch

Cv Ca

(6)

Layout Analysis

Identification of frames

Rectangles enclosing portions of contents

Classification of methodologies

Direction in tree building

top-down

More specific

bottom-up

More generic

hybrid

Layout Analysis

Geometric Structure Analysis

Input

Vector representation of blocks that make up a document

Graphic or text elements

Output

High-level geometric structure

frames

Boxes that should represent consistent and significant block aggregations

XY-cut

Alternate and recursive Horizontal / Vertical cut of portions of a document

Based on projections

Horizontal + Cut

Vertical + Cut

XY-cut

Results

Run Length Smoothing

Run

Sequence of pixels of the same type delimited by pixels of opposite type

White/Black

Run Length

Number of pixels that make up a run

Run Length Smoothing

Removal of some runs

Shorter than a pre-defined threshold by transforming pixels in the opposite type

t = 3

Run Length Smoothing

RLSA

Horizontal smoothing with threshold t

h

Vertical smoothing with threshold t

v

AND

Horizontal smoothing

with threshold t

a

(7)

Run Length Smoothing

RLSO (Run Length Smoothing with OR)

Horizontal smoothing with threshold t

h

Vertical smoothing with threshold t

v

OR

RLSA RLSO

Run Length Smoothing

RLSO

Advantages

Handles non- Manhattan layout

Less parameters

Lower thresholds

Less steps

Efficiency

Can be applied to connected components

PDF

Different definition of run length

Run Length Smoothing

RLSO

Automatic assessment of thresholds

Histograms of runs

Cumulative

Graphic slope

Derivative

Problem

Documents with characters in very different sizes

Docstrum

Assessment of the document spectrum

Bottom-up

Works on connected components

Clustering

Based on k-Nearest Neighbors

Coordinates: Euclidean distance + Angle

Not sensitive to skew

Useful to assess it

CLiDE

Chemical Literature Data Extraction

Bottom-up

Works on connected components

Progressive grouping in a tree

Sensitive to skew

Some levels:

References

S. Ferilli

“Automatic Digital Document Processing and

Management - Problems, Algorithms and

Techniques”, Springer, 2011

Riferimenti

Documenti correlati

● Representation = a system able to handle symbols that allows to access the body of knowledge so as to choose the actions to attain some goals. ●

● So, #$Individual includes, among other things, physical objects, temporal sub-abstractions of physical objects, numbers, relationships and groups. ● An instance of #$Individual

– Human reasoning: if a fact cannot be proven false, it is assumed to be true?.

Essi contengono materiale originale di proprietà dell'Università degli Studi di Bari e/o figure di proprietà di altri autori, società e organizzazioni di cui e' riportato

Essi contengono materiale originale di proprietà dell'Università degli Studi di Bari e/o figure di proprietà di altri autori, società e organizzazioni di cui e' riportato

● Abduction then emerges as an important computational paradigm that would be needed for certain problem solving tasks within a declarative representation of the problem

Automatic Inductive and Analogical Reasoning on Concept Networks: a Forensic Application. Candidate: Fabio Leuzzi –

Our main results are as follows: (a) exponential memory is sufficient and may be necessary for winning strategies in energy parity games; (b) the problem of deciding the winner