
UNIVERSITÀ DEGLI STUDI DI PISA
SCUOLA SUPERIORE SANT'ANNA

MASTER OF SCIENCE IN EMBEDDED COMPUTING SYSTEMS

A hand tracking system with haptic feedback for fully immersive virtual environments

Author:
Michele LAMANNA

Supervisors:
Dott. Ing. Franco TECCHIA
Prof. Marcello CARROZZINO


“Every problem solved is another revealed.”


Università degli Studi di Pisa

Abstract


Facoltà di Ingegneria
Dipartimento di Ingegneria dell'Informazione
Embedded Computing Systems

A hand tracking system with haptic feedback for fully immersive virtual environments

by Michele LAMANNA

Tracking has been, since the beginning of Virtual Reality, the core task of any VR application. Hand tracking in particular has been addressed only in the last decade, thanks to improvements in technology and computer vision, and there are still no standard solutions or approaches. Hand tracking will be crucial in VR applications, as it allows the user to interact with the virtual environment without the use of metaphors, thereby boosting the feeling of immersion and presence. Another important capability of any virtual application is giving the user feedback. In this thesis we propose a new method to track the user's hand and deliver an electric feedback when needed. Specifically, the tracking is performed using infrared LEDs, a tracking system made by Optitrack, and a microcontroller driving the infrared LEDs. Everything is managed by a client executing on the PC. The electric feedback is managed by the virtual environment, also executing on the PC, but the actual electric pulse is driven by the microcontroller. In the first phase we designed and developed a glove prototype capable of both active tracking and giving feedback. The algorithm used for the tracking task is based on a time-division multiple access approach. Furthermore, we achieve a satisfying electrical feedback through the use of an electromagnetic transformer, driven by an operational amplifier, which can recreate a fair number of different feedback sensations.


Acknowledgements

The first and most important thanks goes to my family, in particular to my parents, who allowed me to face this journey in complete serenity, without pressure, and gave me the opportunity to focus fully on my academic career.

I thank Alice, for having put up with me for almost six years now; nobody has yet figured out how you do it and/or what makes you do it! Jokes aside, thank you for always being at my side, sincere, sweet and patient, with the hope of staying side by side, together, for a long time to come...

I thank my lifelong friends: we have literally known each other for more than ten years, and yet it feels like yesterday that we shared school desks. Unique, loyal and fantastic, every time we get together it is always a celebration! Thanks to Cipo, Bubi, il Grossi, Erica, Andrea, Cheto, Tommy, Best, Alessia and all the others... if I had a glass in my hand, I would propose a toast to us!

I thank the friends who accompanied me on this journey of six long (very long) years, a journey all of us couldn't wait to see end quickly, but which we instead enjoyed to the fullest, until the very end, laughing, drinking, eating (above all eating!) and sometimes studying. A special thanks to Antonio, who was patient like few others in explaining to us all the things incomprehensible to us common mortals, besides being kind and comforting when I needed it; to Sara, for teaching me to be more determined and "mean"; to Federico, who besides being a friend and an expert pokémon trainer, was crazy enough to throw himself with me into a year-long project... and not just any project, THE project!

I wholeheartedly thank Azione Cattolica, both parish and diocesan, and the wonderful people who make it up. Thanks to AC I trained, I grew, and I met people with whom working and collaborating to pursue a single goal is nothing but a pleasure. In particular, thanks to Francesco, the best friend that ever happened to me.

Finally, I thank all the people who work at Percro and helped me with this project, Franco and Sandro first of all, whom I bothered again and again: every time, with infinite patience, they listened to me and helped me. Thank you for the opportunity to gain experience in a field that fascinates me greatly and in which I hope to work in the future.


Contents

Abstract v

Acknowledgements vii

1 Introduction 1

1.1 Virtual Reality: a brief history . . . 1

Battle of Borodino 1812 . . . 1

Stereoscopic photos & viewers 1838 . . . 2

Link Trainer 1929 . . . 2

Morton Heilig’s inventions 1955-1960 . . . 3

Headsight 1961 . . . 4

The Ultimate Display 1965 . . . 5

Sword of Damocles 1968 . . . 5

Artificial Reality 1969 . . . 6

Virtual Reality 1987 . . . 6

Virtuality Group Arcade Machines 1991 . . . 7

VR glasses by SEGA . . . 7

Nintendo Virtual Boy 1995 . . . 8

The Matrix 1999 . . . 8

1.2 The importance of Virtual Reality . . . 9

Low level abstraction - Conservation and safeguard . . . 9

Medium level abstraction - Architecture . . . 9

High level abstraction - Information Landscapes . . . 10

CAD modeling . . . 11

Ergonomics . . . 11


1.3 Interaction in Virtual Environments . . . 13

1.3.1 Direct interaction . . . 13

Locomotion interface - Virtuix Omni VR Treadmill . . . 15

Brain Computer Interfaces . . . 16

1.3.2 Interaction metaphors . . . 16

Navigation metaphors . . . 17

Selection metaphors . . . 17

1.4 Virtual Reality: Presence, immersion and Interaction . . . 18

1.5 Thesis structure . . . 19

Chapter 2: related work . . . 19

Chapter 3: The proposed system . . . 20

Chapter 4: Implementation . . . 20

Chapter 5: Experiments and results . . . 20

Chapter 6: Conclusions and future applications . . . 21

2 Related work 23

2.1 Topics . . . 23

2.2 Articles . . . 25

2.2.1 Real-Time Hand-Tracking with a Color Glove . . . 25

2.2.2 Efficient and Precise Interactive Hand Tracking Through Joint, Continuous Optimization of Pose and Correspondences . . . . 28

2.2.3 Efficient Model-based 3D Tracking of Hand Articulations using Kinect . . . 32

2.2.4 Model-Based Hand Tracking with Texture, Shading and Self-occlusions . . . 37

2.2.5 Receptive Field Characteristics Under Electrotactile Stimulation of the Fingertip . . . 42

3 The proposed system 49

3.1 Capture system and tracking . . . 49

Mechanical trackers . . . 50

Magnetic Tracker . . . 50


Inertial tracker . . . 52

Markerless tracking . . . 53

Optical Trackers . . . 54

3.1.1 Optical Tracking . . . 54

Inside - out approach . . . 55

Outside - in approach . . . 56

Passive trackers . . . 57

Active trackers . . . 60

3.2 Microcontroller and glove . . . 61

3.2.1 Timing . . . 62

3.3 Association Algorithm . . . 62

3.3.1 Undesired noise tracked as led . . . 63

3.3.2 Occlusions and misplacements . . . 64

3.4 Virtual Environment . . . 65

3.4.1 The XVR Studio 2.0 . . . 67

XVR framework . . . 67

XVR Functions . . . 67

The type management . . . 69

4 Implementation 71

4.1 IRLed and microcontroller . . . 71

4.1.1 Description . . . 71

4.1.2 Behavior . . . 75

4.1.3 Code . . . 75

4.1.4 Problems and choices of implementation . . . 81

4.2 Client running on the PC . . . 84

4.2.1 Behavior . . . 84

4.2.2 Association Algorithm . . . 84

4.2.3 Problems and choices of implementation . . . 87

4.2.4 Code . . . 87

The Mmarker Class . . . 88


The Mmarker Class - Mmarker.cpp . . . 89

Global variables . . . 91

Initialization code . . . 93

Function Data Handler . . . 94

4.3 Virtual Environment . . . 101

4.3.1 Behavior . . . 101

4.3.2 Problems and choices of implementation . . . 103

4.3.3 Code . . . 104

marker Class . . . 104

Level Class . . . 105

Cube Class . . . 106

Fingertip class . . . 107

Creation of a level instance . . . 107

Global variables . . . 109

OnTimer() function . . . 110

my_level.step() function . . . 112

OnFrame() function . . . 114

5 Experiments and results 119

5.1 Electric feedback experiment . . . 119

Description . . . 119
Results . . . 121
5.2 Tracking experiment . . . 123
Description . . . 123
Results . . . 123
5.3 Testing . . . 127

6 Conclusions and future work 129

6.1 Future work . . . 130


List of Figures

1.1 A section of Franz Roubaud, Battle of Borodino, 1812 . . . 1

1.2 People watching the panoramic painting . . . 2

1.3 Example of a stereoscopic photo . . . 2

1.4 Edward Link and a Link Trainer [37] . . . 3

1.5 Sensorama system [31] . . . 4

1.6 The Sword of Damocles [43] . . . 6

1.7 Example of a virtuality group arcade machine . . . 7

1.8 SEGA VR glasses . . . 8

1.9 Nintendo Virtual Boy set . . . 8

1.10 Graphic representation of vectors . . . 14

1.11 Virtuix Omni VR Treadmill . . . 15

1.12 On the left, graphics that reveal a P300 peak; in the middle an interface model, in which every symbol flashes at different frequency; on the right, graphics showing how the brain activity changes according to the frequency of the input wave. . . 16

2.1 Patched glove, [49] . . . 26

2.2 Hand Model, [46] . . . 28

2.3 From pose to surface, [46] . . . 29

2.4 Different starting points, different results, [46] . . . 31

2.5 Comparison with described approach (in the figure "this paper", as the image is taken directly from Taylor et al.[46]) and state of the art algorithms on DEXTER . . . 32

2.6 Comparison with described approach (in the figure "this paper", as the image is taken directly from Taylor et al.[46]) and state of the art algorithms on NYU . . . 32


2.7 From real to model [30] . . . 33

2.8 Performance of the PSO algorithm w.r.t. (a) the PSO parameters (b) the distance from the sensor (c) noise (d) viewpoint variation. [30] . . . 36

2.9 Various hand poses correctly recognized by the method. [30] . . . 36

2.10 Error frequency expressed in mm [30] . . . 36

2.11 Skinned hand model, Γθ visualized [25] . . . 37

2.12 First sequence illustrating improvement due to self-occlusion forces. Each row corresponds in order to: the observed image, the final synthetic image, the final residual image, the synthetic side view at 45 deg, the final synthetic image with residual summed on surface, the residual for visible points on the surface, the synthetic side view [25] . . . 42

2.13 Second sequence. Each row corresponds in order to: the observed image, the final synthetic image with limited pose space, the final residual image, the synthetic side view with an angle of 45 deg, the final synthetic image with full pose space, the residual image, the synthetic side view [25] . . . 43

2.14 Third sequence. Each row corresponds in order to: the observed image, the final synthetic image with limited pose space, the final residual image, the synthetic side view with an angle of 45 deg, the final synthetic image with full pose space, the residual image, the synthetic side view [25] . . . 43

2.15 Experiment scheme [50] . . . 44

2.16 Electrodes disposition [50] . . . 44

2.17 Chart of percentage correct responses for electrode size, interelectrode spacing, and frequency of stimulation. Half-rectified, anodic square-wave pulses of varying pulse width (1.03 +/− 0.70 ms) were used for the electrical stimulation. Percentages indicate mean percentage correct responses across all subjects and trials. Error bars indicate standard error. [50] . . . 46


2.18 Chart of percentage correct responses for electrode size and interelectrode spacing. Stimulation frequencies were combined due to lack of significance. Percentages indicate mean percentage correct responses across all subjects and trials. Error bars indicate standard error. [50] . . . 47

2.19 Chart of percentage error for various error types, electrode sizes, and interelectrode spacing. Percentages indicate mean percentage correct responses across all subjects and trials. Error bars indicate standard error. Medial-lateral errors are the errors in which the subject indicates the horizontal opposite of the stimulated electrode, proximal-distal the vertical opposite electrode, and diagonal the electrode in the opposite corner. [50] . . . 47

3.1 The architecture of the proposed system. The dotted line represents indirect communication (led turned on); the other lines represent message exchange. . . . 49

3.2 Example of mechanical tracker from Fontana et al. [12] . . . 50

3.3 Example of magnetic tracker disposition: the receivers are mounted on the user and the transmitter is fixed at enough distance. . . . 51

3.4 Example of ultrasonic tracker . . . 52

3.5 Example of movement perceivable by inertial sensors . . . 52

3.6 Example of leap motion application. The small silver and black box under the hands is the actual leap motion device [28] . . . 53

3.7 Example of leap motion application. The leap motion can be attached to an HMD (head mounted display) such as the HTC Vive in order to "carry" the leap and increase the workspace. . . . 54

3.8 Example of inside-out approach setup. Camera on the user's head and trackers on the ceiling . . . 55

3.9 Example of HiBall setup. The camera recognizes the pattern of the sensor over the user's head and retrieves position and orientation . . . 56

3.10 Example of outside-in approach setup. Tracker on the user's head . . .


3.11 Example of squared black and white marker in an augmented reality application. The camera sees the marker associated with that monster, and shows it on the screen over the marker, with correct position and orientation. . . . 58

3.12 Example of reflectance spheres. Those trackers can be mounted over a rigid body to make them easily recognizable . . . 59

3.13 Example of colored markers. In this project the markers were simply two balloons, one orange and one green, because those are two colors that can be filtered with ease. . . . 60

3.14 Led positions on the back of the hand, 16 total. . . 61

3.15 Communication scheme among camera, PC, and microcontroller; the deeper the level, the further in time. . . . 62

3.16 1. two leds correctly identified; 2. two leds close to each other; 3. identity exchange happened, the letters are switched because of a misplacement. . . . 64

3.17 Bubble strategy visualized, yellow means led turned on, grey means led turned off. . . 65

3.18 Example of realistic virtual environment . . . 65

3.19 A frame taken from Avatar, an example of a fantasy virtual environment. . . . 66

4.1 Front scheme of the microcontroller, Teensy 3.2 . . . 72

4.2 Back scheme of the microcontroller, Teensy 3.2 . . . 72

4.3 IR led by Optitrack . . . 73

4.4 Led positions on the back of the hand, 16 total. . . 73

4.5 Our glove . . . 73

4.6 Electrical schematics for led circuit . . . 74

4.7 Electrodes placed inside the glove. The ones on the fingertips will be connected to the transformer output; the other two will be connected to GND. . . . 74

4.8 Transformer electric scheme: elements with the same name have equal value. Ep are the electrodes connected to the palm, Ef are the electrodes connected to the fingertips. . . . 75


4.9 First case, working with the TRIO camera. The camera-glove distance is relatively short, so we can afford 3 groups. Group blue is the first, red the second, and green the third. At each frame only one group is "active" and, inside the group, only one led is turned off. . . . 82

4.10 Second case, working in the CAVE. The camera-glove distance is very long, so we must use at least 4 groups. Group blue is the first, red the second, green the third, and pink the fourth. At each frame only one group is "active" and, inside the group, only one led is turned off. . . . 82

4.11 First case, working with the TRIO camera. Having 3 groups, the disposition is not linear, otherwise the groups would have been all placed in the same place (all group two on the bone, all group three in the middle part of the finger, all group one on the fingertips). . . . 83

4.12 Second case, working in the CAVE. The disposition is linear because naturally having three leds on each finger and four groups makes the order shift by one at every finger . . . 83

4.13 Client running on the PC flowchart. . . . 85

4.14 Association algorithm flowchart. . . . 86

4.15 The hand drawn in the virtual environment. Each finger has a color (red for the thumb, yellow for the index, green for the middle, blue for the ring, purple for the pinkie, and the palm is white). The darker the color, the further the marker from the palm. . . . 102

5.1 The electrodes setup of the subject for the electric feedback. One electrode is attached to the fingertip, the other one near the palm. . . . 119

5.2 The led setup of the subject for the electric feedback. One led is enough for this experiment, so we kept the led on the fingertip of the subject with an elastic band. . . . 120

5.3 The setup of the virtual environment for the electric feedback. There is only one marker (red sphere), which represents the only led that was attached to the index of the subject. . . . 121


5.4 Diagram showing the subjects' feeling about the standard parameters. The question is in Italian, translated as: "With the standard parameters, do you feel you were able to correctly identify weak, medium, and high pulses?". The colors are coded as orange: yes; purple: yes, with difficulty; yellow: sometimes I confused them. . . . 121

5.5 In this diagram we show the actual result of the test. On the x axis there is the number of pulses correctly identified divided by two, on the y axis the number of subjects that have given that response. . . . 122

5.6 Diagram showing the subjects' feeling about their own parameters. The question is in Italian, translated as: "With your parameters, do you feel you were able to correctly identify weak, medium, and high pulses?". The colors are coded as blue: yes, definitely; orange: yes; purple: yes, with difficulty; green: no, definitely . . . 122

5.7 In this diagram we show the actual result of the test. On the x axis there is the number of pulses correctly identified divided by two, on the y axis the number of subjects that have given that response. . . . 122

5.8 The glove setup of the subject for the tracking experiment. The glove is attached to the panel via bi-adhesive tape, and so is the reflectance ball. . . . 123

5.9 Figure that shows the x coordinates of the reflectance sphere and of 3 IRLeds mounted on the glove. . . . 124

5.10 Figure that shows the y coordinates of the reflectance sphere and of 3 IRLeds mounted on the glove. . . . 124

5.11 Figure that shows the z coordinates of the reflectance sphere and of 3 IRLeds mounted on the glove. . . . 125

5.12 In this image we drew the x coordinate of the reflectance sphere, and the x coordinate of Led 0 summed with the difference between the first two values of those arrays. As can be seen, the tracking is practically the same. . . . 125


5.13 Zooming into the previous image, it can be noticed what we described before: the tracked led is constant for some frames, the frames in which it actually is not being tracked. Keep an eye on the X axis, because the duration of those constant values isn't enough for humans to notice the difference. . . . 126

5.14 Same as 5.12, but the offset was not the same; we tried to keep them a little bit separated to better appreciate other aspects. . . . 126

5.15 Same as 5.13, but the offset was not the same; we tried to keep them a little bit separated to better appreciate other aspects. We also zoomed into another part of the curves. . . . 127


List of Abbreviations

AR Augmented Reality

DAC Digital to Analog Converter

DOF Degree Of Freedom

HMD Head Mounted Display

ICP Iterated Closest Point

OPAMP OPerational AMPlifier

VR Virtual Reality

VE Virtual Environment

PSO Particle Swarm Optimization


To aunt Gina, who passed on to me the love and enthusiasm for the education of young people and, through them, the love of God.


Chapter 1

Introduction

1.1 Virtual Reality: a brief history

Virtual Reality is an emerging field with enormous potential. Although the fascinating idea of being present in another place is old, nonstop innovation and improvement in technology are now allowing the realization of valid systems built around the concept of a virtual environment.

Battle of Borodino 1812

The origin of "virtual reality", in the sense of immersion and presence, can be seen in the panoramic paintings of the nineteenth century. The artist's idea was to fill the viewer's entire field of vision, giving the impression of standing in the middle of the representation. This is the case of the Battle of Borodino, a 115-meter-long, 360-degree circular painting by Franz Roubaud.

FIGURE 1.1: A section of Franz Roubaud, Battle of Borodino, 1812


FIGURE 1.2: People watching the panoramic painting

Stereoscopic photos & viewers 1838

In 1838 Charles Wheatstone's research demonstrated that what we see is actually artificially elaborated by our brain. As a matter of fact, our eyes see independently, communicating to our brain two different 2D images that the brain elaborates into a single 3D image. Viewing two side-by-side stereoscopic images or photos through tools like stereoscopes gave subjects depth cues about what they were seeing.

FIGURE 1.3: Example of a stereoscopic photo

Link Trainer 1929

In 1929 Edward Link developed the "Link Trainer", a platform used to train people to pilot an airplane. It was entirely electromechanical, controlled by motors linked to the input devices in the cabin in order to modify pitch and roll. A smaller motor simulated turbulence and disturbances. The US military bought six simulators in order to train their pilots at lower cost and in safer conditions. During World War II, over 10,000 platforms like this were used by the US Army to improve pilots' skills.

FIGURE 1.4: Edward Link and a Link Trainer [37]

Morton Heilig’s inventions 1955-1960

In the mid 1950s cinematographer Morton Heilig developed the Sensorama (patented 1962), an arcade-style theatre cabinet that would stimulate all the senses, not just sight and sound. It featured stereo speakers, a stereoscopic 3D display, fans, smell generators and a vibrating chair. The Sensorama was intended to fully immerse the individual in the film. He also created six short films for his invention, all of which he shot, produced and edited himself. The Sensorama films were titled Motorcycle, Belly Dancer, Dune Buggy, Helicopter, A Date with Sabina and I'm a Coca-Cola Bottle! Morton Heilig's next invention was the Telesphere Mask (patented 1960), the first example of a head-mounted display (HMD), albeit for the non-interactive film medium and without any motion tracking. The headset provided stereoscopic 3D and wide vision with stereo sound.


FIGURE 1.5: Sensorama system [31]

Headsight 1961

The first ever tracking HMD, patented by Philco Corporation, incorporated a video screen for each eye and a magnetic motion tracking system, which was linked to a closed-circuit camera. The Headsight was not actually developed for virtual reality applications (the term didn't exist then), but to allow immersive remote viewing of dangerous situations by the military. Head movements would move a remote camera, allowing the user to naturally look around the environment. Headsight was the first step in the evolution of the VR head-mounted display, but it lacked the integration of computer and image generation.


The Ultimate Display 1965

Ivan Sutherland described the "Ultimate Display" concept that could simulate reality to the point where one could not tell the difference from actual reality. His concept included:

• A virtual world viewed through an HMD that appeared realistic through augmented 3D sound and tactile feedback.

• Computer hardware to create the virtual world and maintain it in real time.

• The ability for users to interact with objects in the virtual world in a realistic way.

"The ultimate display would, of course, be a room within which the computer can control the existence of matter. A chair displayed in such a room would be good enough to sit in. Handcuffs displayed in such a room would be confining, and a bullet displayed in such a room would be fatal. With appropriate programming such a display could literally be the Wonderland into which Alice walked." – Ivan Sutherland

This paper would become a core blueprint for the concepts that encompass virtual reality today.

Sword of Damocles 1968

In 1968 Ivan Sutherland and his student Bob Sproull created the first VR/AR head-mounted display (the Sword of Damocles) that was connected to a computer and not a camera. It was a large and scary-looking contraption that was too heavy for any user to comfortably wear and was suspended from the ceiling (hence its name). The user would also need to be strapped into the device. The computer-generated graphics were very primitive wireframe rooms and objects.


FIGURE 1.6: The Sword of Damocles [43]

Artificial Reality 1969

In 1969 Myron Krueger, a virtual reality computer artist, developed a series of experiences which he termed "artificial reality", in which computer-generated environments responded to the people in them. The projects, named GLOWFLOW, METAPLAY, and PSYCHIC SPACE, were progressions in his research which ultimately led to the development of the VIDEOPLACE technology. This technology enabled people to communicate with each other in a responsive computer-generated environment despite being miles apart.

Virtual Reality 1987

Even after all of this development in virtual reality, there still wasn't an all-encompassing term to describe the field. This all changed in 1987 when Jaron Lanier, founder of the Visual Programming Lab (VPL), coined (or, according to some, popularised) the term "virtual reality". The research area now had a name. Through his company VPL Research, Jaron developed a range of virtual reality gear including the Dataglove (along with Tom Zimmerman) and the EyePhone head-mounted display. They were the first company to sell virtual reality goggles (EyePhone 1 $9,400; EyePhone HRX $49,000) and gloves ($9,000), a major development in the area of virtual reality haptics.


Virtuality Group Arcade Machines 1991

We began to see virtual reality devices to which the public had access, although household ownership of cutting-edge virtual reality was still far out of reach. The Virtuality Group launched a range of arcade games and machines. Players would wear a set of VR goggles and play on gaming machines with real-time (less than 50 ms latency) immersive stereoscopic 3D visuals. Some units were also networked together for a multi-player gaming experience.

FIGURE 1.7: Example of a Virtuality Group arcade machine

VR glasses by SEGA

Sega announced the Sega VR headset for the Sega Genesis console at the Consumer Electronics Show in 1993. The wrap-around prototype glasses had head tracking, stereo sound and LCD screens in the visor. Sega fully intended to release the product at a price point of about $200 at the time, or about $322 in 2015 money. However, technical development difficulties meant that the device would forever remain in the prototype phase, despite four games having been developed for it. This was a huge flop for Sega.


FIGURE 1.8: SEGA VR glasses

Nintendo Virtual Boy 1995

The Nintendo Virtual Boy (originally known as VR-32) was a 3D gaming console hyped to be the first ever portable console that could display true 3D graphics. It was first released in Japan and North America at a price of $180, but it was a commercial failure despite price drops. The reported reasons for this failure were the lack of color graphics (games were in red and black), the lack of software support, and the difficulty of using the console in a comfortable position. The following year Nintendo discontinued its production and sale.

FIGURE 1.9: Nintendo Virtual Boy set

The Matrix 1999

In 1999 the Wachowski siblings' film The Matrix hit theatres. The film features characters living in a fully simulated world, many of them completely unaware that they do not live in the real world. Although some previous films had dabbled in depicting virtual reality, such as Tron in 1982 and The Lawnmower Man in 1992, The Matrix had a major cultural impact and brought the topic of simulated reality into the mainstream.

1.2 The importance of Virtual Reality

With that said, how can Virtual Reality be useful? Virtual Reality comes in handy when working with the real counterpart is:

• Expensive

• Dangerous

• Actually impossible (space, paintings...)

There are three levels of abstraction, according to which the Virtual Environment is built:

• Low level abstraction: acquisition of a real environment, for example through scanning. This approach is called sampling.

• Medium level abstraction: modeling of a real environment. This approach is called synthesis.

• High level abstraction: modeling of a non-real environment. This approach is called creation.

Low level abstraction - Conservation and safeguard

An example of low level abstraction is the reconstruction, or sampling, of sculptures, statues and buildings from the far past. A virtual copy of those objects allows users to see and even touch them without deteriorating the real thing. Furthermore, we can reconstruct artworks or historical environments that time has destroyed or damaged.

Medium level abstraction - Architecture

An example of a medium level abstraction Virtual Environment is the synthesis of a not yet existing building. Users can explore a 3D architectural environment so as to evaluate spaces, lighting, materials, acoustics and more. In this case the virtual application is used as a modeling tool to analyze space from the inside and evaluate different project choices.

High level abstraction - Information Landscapes

An example of a high level abstraction Virtual Environment is the Information Landscape. Information landscapes are virtual environments that do not correspond to any real environment, but exploit 3D space to map semantic relationships among data into spatial relationships. The idea behind information landscapes is that the user can walk across a text structure placed inside a virtual environment and access knowledge otherwise difficult to acquire. Information landscapes can be enhanced using media other than text, like video, pictures and sounds, and by giving feedback, achieving MILs (Multimodal Information Landscapes). There are many applications in which virtual reality can be useful; here we show some:

• Fruition: allow the user to see an artwork from another perspective, for example reconstructing a virtual environment from a painting, so that the user can literally "explore" the painting from the inside.

• Promotion and enhancement: as a compelling technology, VR is an effective means to promote and enhance cultural assets.

• Promulgation and education: VR can be a novel, interesting and powerful means for didactics and promulgation.

• Creation: VR provides means to create new forms of art.

• Restoration: assist real restoration, or perform virtual restoration without touching the original object.

• Documentation: provide an intuitive and effective interface to multimedia databases.

Those are all interesting applications, but decoupled from the industrial field. Next we describe some applications useful to industry, in particular in fields like:

• Aeronautical

• Automotive

• Entertainment and games

• Transport and logistics

• Military

Within these fields, the main activities requested of a Virtual Reality application are:

• Personnel training

• Design and development

• Ergonomics studies

• Clinical tests

CAD modeling

CAD modeling evolves into Virtual Prototyping. Virtual prototyping reduces prototype costs by allowing virtual checks to be performed before actual construction. The advantage of a Virtual Environment is that you can share the same environment with other users, so that design can be collaborative and development can proceed in parallel. A virtual prototype can also be virtually tested before it has been built, which is useful to find problems more rapidly.

Furthermore, an entire assembly line can be designed and developed inside a virtual environment, to test how much space the machinery requires, what the optimal placement of the machinery is, whether there is enough space for employees to easily transport products from one stage to another, and so on. CAD modeling can also be used as a compelling marketing tool for more engaging presentations.

Ergonomics

Ergonomics studies how easy and comfortable products are to use. Thanks to virtual reality, many variants of the same product can be tested without spending any money building different prototypes. For example, inside a car it is important to focus on:

• Reachability of commands

• External visibility

• Tools visibility

• Driving comfort

• Access to the driving seat

The number of buttons on the dashboard, the design of the radio, the distance at which the steering wheel is placed are all parameters that define a real prototype. Inside a Virtual Environment, instead, changing one of these parameters in the development phase is simple and very cost-effective, resulting in faster development and increased quality, because time is not spent waiting for the prototype to be built, but inside the VE judging how to improve the comfort of the final user of the real product.

Simulators

VR simulators allow training people while reducing the risks due to real training in dangerous conditions (soldiers, pilots, surgeons). The simulation must be accurate and take into account multiple risk factors that may cause the failure of the training. On many occasions a realistic imitation of a dangerous situation is not easy to achieve; therefore, instead of reproducing the dangerous situation in reality, we use simulators in Virtual Reality. Although the costs of building a very good simulator are high, the whole procedure is cost-effective, because with a single platform (for example the Stewart platform) it is possible not only to replicate the same scene for many trainees, but also to simulate different scenarios.


1.3 Interaction in Virtual Environments

Earlier we saw how a virtual environment can be useful, in industry as well as in other fields. Now we briefly explain how the user can interact with the virtual environment.

There are two types of interaction:

• Direct: the user interacts directly with the Virtual Environment

• Mediated: the user interacts with the virtual environment through an avatar.

1.3.1 Direct interaction

The type of interaction is forced by the system/hardware: if you play with a console, most likely you have a joystick that controls an avatar in the virtual environment; if you stand inside a CAVE, you have to interact with your own body. How can the virtual environment acquire data about the user's body? The Virtual Environment needs information about the position, orientation and pose of (at least) the head of the user, so that the graphics are updated according to the laws of perspective from the user's virtual point of view. Although tracking systems are better explained in chapter 3, we proceed to explain the theory behind this technique. In order to give the VE the data it needs, the user has to be tracked; in particular:

• Absolute position/orientation allows the system to correctly place the user/avatar inside the virtual environment.

• The head position and orientation are necessary to correctly compute and show the virtual environment from the viewpoint of the user inside the VE.

• The hand position and orientation allow the system to check whether the user is interacting with some objects inside the VE.

• The whole-body joint values (knee, elbow, torso...) are useful to animate an avatar inside the VE (this technique is also called motion capture).

Trackers need values to correctly calculate the views on the HMD. Those values are the head position, to calculate the position of the viewpoint (up vector), and the head orientation, to compute the view direction (front vector) and the eye offset direction (right vector).

FIGURE 1.10: Graphic representation of vectors

View position and direction are used to calculate the view volume and dynamically update the perspective. The eye offset direction is used to know on which axis stereo images must be separated; furthermore, the same axis is used for stereo sounds.
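To make the use of these vectors concrete, the following C++ sketch derives the three view vectors from a tracked head orientation. It is only an illustration: the Vec3 and Quat types, the function names, and the axis conventions are assumptions for this example, not part of the system described in this thesis.

// Minimal 3D vector and unit quaternion types (assumed for illustration).
struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; }; // head orientation reported by the tracker

// Rotate vector v by unit quaternion q: v' = v + 2w(q_v x v) + 2 q_v x (q_v x v)
Vec3 rotate(const Quat& q, const Vec3& v) {
    Vec3 t{ 2 * (q.y * v.z - q.z * v.y),   // t = 2 * cross(q_v, v)
            2 * (q.z * v.x - q.x * v.z),
            2 * (q.x * v.y - q.y * v.x) };
    return Vec3{ v.x + q.w * t.x + (q.y * t.z - q.z * t.y),
                 v.y + q.w * t.y + (q.z * t.x - q.x * t.z),
                 v.z + q.w * t.z + (q.x * t.y - q.y * t.x) };
}

// front: view direction; right: eye offset axis for stereo separation; up: up vector.
void headToViewVectors(const Quat& head, Vec3& front, Vec3& right, Vec3& up) {
    front = rotate(head, Vec3{0.0f, 0.0f, -1.0f}); // camera looks down -Z by convention
    right = rotate(head, Vec3{1.0f, 0.0f,  0.0f}); // stereo images are separated along this axis
    up    = rotate(head, Vec3{0.0f, 1.0f,  0.0f});
}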

Parameters to evaluate the performance of trackers are:

• Accuracy: how large the tracking error is.

• Repeatability: tracking an object twice in the same position should give the same result.

• Frequency: how fast the sensors acquire data.

• Latency: after how much time the data arrives at the application.

• Weight, volume: how cumbersome the tracking system is for the user.

• #DOF: number of degrees of freedom tracked.

• Robustness: how fast the system detects and recovers from an error situation.


Other than with the user's own body (i.e., through the use of trackers), there are many input devices through which the user can interact inside the virtual environment: from the more traditional ones, such as the keyboard, the mouse, the joystick... to the most recent ones, like the touchscreen, the air mouse and so on. Next we describe some of them.

Locomotion interface - Virtuix Omni VR Treadmill

The Virtuix Omni VR Treadmill is a locomotion simulator designed to work as an input interface for the user to walk inside the virtual environment. The innovation of this product is that the user stays in the same position even when taking a step forward. This is possible thanks to the base and its shape: the concavity, combined with the low-friction material, helps the feet come back to the original position, while a flat surface would require the user to actively take a step back, greatly reducing the sense of immersion. The Virtuix Omni VR Treadmill is used in combination with any HMD, as a peripheral for a full virtual reality setup. In addition to gaming, the Virtuix Omni can be used for virtual reality applications including training and simulation, fitness, healthcare, architecture, virtual tourism, meet-ups and events.


Brain Computer Interfaces

BCIs can measure the brain activity elicited by a stimulus (e.g. the P300, a peak measurable in an EEG about 300 ms after an infrequent stimulus arriving after a series of frequent stimuli). The P300 can be used to select an element in a matrix whose rows/columns flash following predefined patterns. Biofeedback-based BCIs measure modifications of the brain activity as feedback to a conscious act of thought/concentration by the user (e.g. a left/right direction, even only imagined).

FIGURE 1.12: On the left, graphics that reveal a P300 peak; in the middle, an interface model in which every symbol flashes at a different frequency; on the right, graphics showing how the brain activity changes according to the frequency of the input wave.

1.3.2 Interaction metaphors

The easiest way for a human being to behave in a virtual environment is to behave as if he were in a real environment. This is not always possible or suitable, due to technological constraints or the need for more advanced options to perform the tasks required by the application.

Metaphors reproduce concepts known by the user in a certain context, in order to transfer that knowledge to a new context related to the execution of a task. A good metaphor must be:

• Representative of the task

• Compatible with the user knowledge

• Compatible with the physical constraints of the interface

In VR applications, metaphors are mostly used to navigate, select, and manipulate objects.


Navigation metaphors

Navigation in 2 or 3 degrees of freedom is achievable through simple devices such as a mouse or joystick. To allow 6-DOF movements it is necessary to add commands, for example buttons to jump.

Navigation in 6 DOF is more complex; common metaphors are:

• Flying vehicle: the virtual camera is placed on a virtual vehicle which the user drives.

• Eyeball in hand: a tracker is held in the hand, and the virtual camera is coupled to the tracker's movements.

• Head tracker viewing: the virtual camera follows head movements; used for HMDs and CAVEs, i.e. immersive virtual environments.

• Teleportation: the user is teleported to the desired place, accessible through a list or by vocal commands.

Selection metaphors

Unlike 2D applications, where objects are directly handled, 3D applications have an additional degree of complexity (depth perception, occlusion issues, etc.). Selection metaphors are useful to ease this task, allowing the user to feel more comfortable and capable inside the virtual environment. Some of the selection metaphors are:

• Raycasting: the most common and easiest metaphor to implement. A virtual ray is cast from the user's hand, head or other body part, and the first intercepted object is selected. It is simple and immediate to learn how to use, but it is sensitive to rotation at long distances, and it is difficult to select occluded objects (a small sketch of this metaphor follows the list).

• Virtual hand: a hand is moved in the scene, and it selects an object upon contact, like a magnet.

• Speech: select an object by pronouncing its name or a property. Although it is immediate, unique names are needed, and speech recognition isn't always reliable.

• List: of course, the selection from a list of options or objects.
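As an illustration of the raycasting metaphor, the following C++ sketch selects the nearest object whose bounding sphere is hit by a ray cast from the tracked hand. The scene representation is a hypothetical simplification, not taken from this thesis or from any specific engine.

#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Object { Vec3 center; float radius; int id; }; // assumed bounding-sphere proxy

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Cast a ray (origin, unit-length dir) and return the id of the first
// intersected object, or -1 if the ray hits nothing.
int raycastSelect(const Vec3& origin, const Vec3& dir, const std::vector<Object>& scene) {
    int best = -1;
    float bestT = 1e30f;
    for (const Object& o : scene) {
        Vec3 oc{ o.center.x - origin.x, o.center.y - origin.y, o.center.z - origin.z };
        float tca = dot(oc, dir);               // distance along the ray to the closest approach
        if (tca < 0.0f) continue;               // object is behind the hand
        float d2 = dot(oc, oc) - tca * tca;     // squared ray-to-center distance
        if (d2 > o.radius * o.radius) continue; // the ray misses the sphere
        float t = tca - std::sqrt(o.radius * o.radius - d2); // first intersection
        if (t < bestT) { bestT = t; best = o.id; }
    }
    return best;
}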

1.4 Virtual Reality: Presence, immersion and Interaction

There are three keywords when it comes to Virtual Reality: presence, immersion and interaction.

Presence is the mental feeling of being inside (the action, the scene, the story...). Presence is experienced in games, while reading a book or watching a movie.

Immersion is related to the user's environment, and it is the physical feeling of being inside (the action, the scene, the story...). Immersion is experienced in those virtual simulators where the seats can move, and usually images of a futuristic rollercoaster are shown on a screen.

Interaction is related to realism, and it is the ability to modify the virtual environment and receive feedback on your actions. Interaction is the most important and difficult achievement for a VR application, because the behavior of the user must be taken into account.

Improving interaction obviously leads to improvements in presence and immersion, because the user becomes part of the environment, and not a mere spectator. [8]

In VR in general, but particularly in videogames, interaction is obtained using a joystick, which allows the user to move an avatar and make it interact with the virtual environment. Technically the use of a joystick is a metaphor, namely the act of doing something that represents something else, for example pressing a button to make an avatar jump, or tilting an analog stick to move the avatar around the VE.

The idea behind this thesis is that replacing the joystick with gestures of the user's own body will greatly improve the feeling of interaction. In particular, I focused on hand gestures and pose, which is an interesting and difficult problem that has lately become very popular.


The problem of hand tracking has been considered for many years. Many paths have been examined, but we tried to develop a system not yet considered, to the best of our knowledge. Therefore this thesis has the goal of showing the possibility of tracking the hand pose inside a CAVE. By achieving this result, user immersion and presence inside the virtual environment will be highly improved.

In addition, I tried to simulate a tactile feedback for the user using electrodes and electric stimuli. In this way the user does not only see that the VE is changing, he can also feel it changing. For example, grabbing a fork or a spoon in a VE can improve immersion if the user actually feels something in his hand.

1.5 Thesis structure

Next, we explain the contents of this thesis, highlighting the objectives and purpose of each chapter.

Chapter 2: related work

In this chapter we describe the main articles related to this project, to the best of our knowledge. The objective is to understand where the technology has arrived and how some applications are realized, in order to do work not already done and to try to improve some aspects seen in this chapter. In particular, we analyze 5 different articles:

• Real-Time Hand-Tracking with a Color Glove [49]

• Efficient and Precise Interactive Hand Tracking Through Joint, Continuous Optimization of Pose and Correspondences [46]

• Efficient Model-based 3D Tracking of Hand Articulations using Kinect [30]

• Model-Based Hand Tracking with Texture, Shading and Self-occlusions [25]

• Receptive Field Characteristics Under Electrotactile Stimulation of the Fingertip [50]


Chapter 3: The proposed system

In this chapter we describe at a high level the system that we designed and developed, explaining in detail the motivations for our choices and illustrating our procedures without giving implementation details, so that in the future this scheme can be reused and reenacted by others with other instruments. The main components of our system are:

• The capture system

• The client of the capture system

• The virtual environment

• The microcontroller driving the glove

Chapter 4: Implementation

This is the core of this thesis: here we explain in detail the system we designed, developed and implemented. We describe the hardware, the software and the tools that we used for this project. Furthermore, the most important segments of our code are inserted and commented to make clear how we achieved those results. For each of the components previously mentioned we provide:

• A description

• Behavior

• Problems that we faced, and how we addressed them

• A detailed explanation of the code

Chapter 5: Experiments and results

In this chapter we describe the two experiments we made and explain their results. The experiments are described in such a way that they can be reenacted by anybody who wants to. We made two experiments, one for each topic of this thesis:

• Experiments on the perception of the electric feedback

• Experiments on the robustness and precision of the tracking

Furthermore, we describe the first experience of some people who used our system, regarding ease of use, dexterity and precision.

Chapter 6: Conclusions and future applications

In this chapter we discuss the conclusions and whether we are satisfied with the results. We also describe our hopes for this system in the near future.


Chapter 2

Related work

2.1 Topics

In order to understand the problems and challenges to be solved in this work, the state of the art must be considered. Every section in this chapter discusses articles that we read to begin this work, so that the reader will have an idea of the topics we are covering. The two main topics are:

• Hand tracking

• Electrotactile stimulation [50]

Hand Tracking

Hand tracking is the most difficult part of this thesis for one simple reason: the hand can assume so many positions and orientations that modeling it is very difficult. From the kinematic point of view, the hand has over 20 DOF, resulting in billions of combinations that make the movement and pose of a hand highly unpredictable. Here is a brief list of the methods we reviewed:

• Depth images [26] [40] [42] [13] [4] [35]

• Vision-based tracking [14] [10] [5] [48] [34]

• Utilizing markers [9] [22] [41] [24] [33]

A popular problem for tracking is occlusion, namely the situation in which part of the hand (or the whole hand) is invisible to the camera, due to other objects or even the hand itself. Hiding part of the hand from the camera is a huge loss of information that makes the tracking, and therefore the reconstruction of the hand pose, very complex. There are many ways to deal with such a problem, as we will see later. Another issue in tracking is that algorithms often give more than one solution for the same input (this is due to the high dimensionality of the hand), and several hypotheses need to be compared to find the best one. Luckily, hand tracking is a problem widely addressed in the literature, both with the help of markers and without them.

Electrotactile stimulation

Electrotactile stimulation is the technique that aims to simulate tactile feedback through the use of electricity. This is possible because our fingertips are full of electro-receptors, nerve endings that pick up electric stimuli and process them. Those electro-receptors are divided into two classes:

• Type I fibers

• Type II fibers

Type I fibers are generally located 0.08 mm from the surface of the fingertip and experience a stronger electric field w.r.t. type II fibers. Type I fibers have the same function as Meissner corpuscles and Merkel corpuscles. These mechanoreceptors can be found in large numbers in very little space, and each of them has a small receptive field: in this way spatial resolution (the ability to understand precisely where the point of contact between the object and the fingertip is) is very high.

Type II fibers instead are deeper (0.2 mm from the surface) and feel a weaker electric field. They assume the function of Ruffini endings and Pacinian corpuscles: these are inferior in number but have a greater receptive field, resulting in a low spatial mechanical resolution. Simulating these two types of mechanoreceptors perfectly is not an easy task, for many reasons: first of all, those receptors vary across individuals, meaning that for the same input different people feel different stimulations; secondly, even with a calibration procedure, the space required to achieve a perfect match is wider than the one actually available on the fingertips. We found many articles about electrotactile feedback given through an "electrotactile display", a small portion of metal over which the user places the fingertip and feels different stimulations based on its pressure; but for our objective this method isn't practical, since it cannot be mounted inside a glove.


Next, we discuss the most interesting articles that we read, in order to give a better view of what has been done until now.

2.2 Articles

2.2.1 Real-Time Hand-Tracking with a Color Glove

Proposed by Robert Y. Wang and Jovan Popovic [49] in 2009, this approach uses a single camera and a color-patched glove to determine the hand pose cheaply. The glove is custom designed for the needs of the nearest-neighbor algorithms they used, which allow the entire process to be executed at interactive rates.

The idea behind this work is that a distinctive glove simplifies pose inference in such a way that it is possible to determine the entire posture of a hand in a single frame. This is done with the help of a database whose construction is discussed later. Inverse kinematics is used to improve accuracy by means of constraints that link the query image and the nearest-neighbor pose. To reduce jitter, a temporal smoothness term is introduced in the inverse kinematics optimization problem.
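The paper's exact objective is not reproduced here, but a temporal smoothness term of the kind described is commonly added as a quadratic penalty on the change of pose; purely as an assumption, a plausible form is:

$$\theta_t = \arg\min_\theta \; E_{IK}(\theta) + \lambda_{temp}\,\|\theta - \theta_{t-1}\|^2$$

where $E_{IK}(\theta)$ collects the correspondence constraints between the query image and the nearest-neighbor pose, $\theta_{t-1}$ is the pose of the previous frame, and $\lambda_{temp}$ trades accuracy against jitter.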

The basis of this approach is to determine the user's hand pose by means of a colored glove (as in [38]) and a database. A variety of colored gloves could be used and may be good for the tracking problem; this particular pattern was adopted because of the low-quality consumer cameras that are commonly available. The glove has twenty patches colored at random with a set of ten different colors. The color set is limited by the camera, so the better the camera, the larger the pool of usable colors. In order to distinguish the hand profile from the background, fully saturated colors are used. Another important choice regarding the glove is to use few large patches instead of many smaller ones, in order to improve robustness and reduce the problem of occlusion, along with the algorithm complexity required if more patches were used.

Starting from a 3D model of the hand, twenty seed triangles are chosen so that they are maximally distant from each other; the remaining triangles are assigned to each patch based on their distance from the seeds. The non-smooth boundary of the patches is due to the fact that the 3D model of the hand has a low number of triangles.

FIGURE 2.1: Patched glove, [49]

An ideal database is small and covers uniformly all the possible natural poses of the hand. The risk of an over-complete database is increasing the search time for the solution, while an under-populated database will have very low accuracy. For these reasons, a set of 18,000 finger configurations was collected using a CyberGlove II motion capture system. The configurations are mainly the sign language alphabet, common hand gestures, and random movements of the fingers. The distance metric used to evaluate the difference between two images is the root mean square (RMS) distance. Transformation of the image into a query is done in the following way:

• Noise in the image acquired from the camera is reduced using a bilateral filter.

• Each pixel is classified as glove or background based on its color, using Gaussian mixture models trained on a set of hand-labeled images.

• The resulting image has glove pixels and non-glove pixels; using an iterative algorithm, the final image is cropped to the area occupied by the hand.

Experimental results lead to an optimal database size of 100,000 entries. However, graphical comparison is computationally expensive, so the tiny cropped image obtained previously is converted and compressed into a 192-bit string, as first shown by Torralba and colleagues. The distance measure in this scenario is the classic Hamming distance, which is fast to compute, so the search time in the database is drastically reduced.
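As a rough illustration of this lookup (not the authors' code), the following C++ sketch stores each 192-bit string as three 64-bit words and performs a brute-force nearest-neighbor search under the Hamming distance; the Descriptor layout and the function names are assumptions.

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 192-bit descriptor stored as three 64-bit words (layout assumed; the paper
// only states that the cropped image is compressed into 192 bits).
using Descriptor = std::array<std::uint64_t, 3>;

// Hamming distance: the number of differing bits between two descriptors.
int hamming(const Descriptor& a, const Descriptor& b) {
    int d = 0;
    for (int i = 0; i < 3; ++i)
        d += __builtin_popcountll(a[i] ^ b[i]); // GCC/Clang bit-count builtin
    return d;
}

// Brute-force nearest-neighbor search over the pose database.
std::size_t nearestPose(const Descriptor& query, const std::vector<Descriptor>& db) {
    std::size_t best = 0;
    int bestDist = 193; // larger than the maximum possible distance (192)
    for (std::size_t i = 0; i < db.size(); ++i) {
        int d = hamming(query, db[i]);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}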


Inverse kinematics is used to penalize incoherence between the query image and the indexed images by means of constraints. However, rasterization and computation of the Jacobian are too time-consuming to be performed at each frame. Instead, patch centroids are computed, and distances between centroids are evaluated as the metric.

In conclusion, this approach has many perks:

• Simple

• Cheap

• Robust

• Fast

• The lack of previous-frame dependencies adds the perk of fast recovery

And of course, some disadvantages:

• Low precision and accuracy

• Performance limited by low-quality hardware

• Background dependent

To better understand this work, we suggest reading some of the sources, in particular [7], [6] and [38].


2.2.2 Efficient and Precise Interactive Hand Tracking Through Joint, Continuous Optimization of Pose and Correspondences

Proposed by Taylor et al. [46] in 2016, this approach is very interesting and powerful. It requires no glove to help the tracking, using only a depth camera.

FIGURE 2.2: Hand Model, [46]

The camera provides a stream of depth images $I_t$, $t \in \{t_0, t_1, \ldots\}$, where $t$ is the time at which the image was acquired. $I_t$ is then decomposed by an algorithm into the following (a minimal data-structure sketch follows the list):

• An array of $N$ points in 3D space.

• An array of $N$ estimated normals coupled to the first array.

• A segmentation mask used to compute the distance transform $D_t : \mathbb{R}^2 \to \mathbb{R}$.

• A set of 3D points $F_t = \{f_f^t\}_{f=1}^{F_t}$ representing the detected fingertips.
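To fix ideas, a minimal C++ container mirroring these four per-frame items might look as follows; every name and layout here is an assumption made for illustration.

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output of the depth-image preprocessing step for one frame I_t.
struct DepthFrame {
    std::vector<std::array<float, 3>> points;     // N points in 3D space
    std::vector<std::array<float, 3>> normals;    // N estimated normals, same indexing
    std::vector<std::uint8_t> mask;               // row-major segmentation mask
    std::size_t maskWidth = 0, maskHeight = 0;    // mask dimensions
    std::vector<std::array<float, 3>> fingertips; // F_t detected fingertip positions
};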

The hand pose was parametrized according to Khamis et al.[23], with a pose vector

∈ R28including global translation and rotation, four elements for each finger, and

three element for the wrist.

An hand blend shape is generated by the model, then a pose θ is applied, to produce

an articulated triangular control mesh P (θ) ∈ R3×M. This mesh define the posed

hand’s smooth surface S(θ) ⊆ R3.

The function P (θ) : R28 → R3 uses linear blend skinning, obtaining S(θ) as an

ap-proximation w.r.t. the surface that the control mesh defines.

From a previous work of Taylor et al. [47], there is a smooth map $S(u; \theta)$ that connects a 2D parameter space $\Omega$ to $\mathbb{R}^3$, as well as the corresponding surface normal map $S^{\perp}(u; \theta)$.



The objective is to find pose parameters $\theta_t$ such that the 3D surface $S(\theta_t)$ is a "valid response" to the user's actual hand pose retrieved by the depth camera and described in $I_t$.

FIGURE 2.3: From pose to surface, [46]

What exactly is a "valid response"? It is explained by the following cost function:

$$\hat{\theta}_t = \operatorname{argmin}_{\theta} E_t(\theta) \quad (2.1)$$

Since this cost function clearly has multiple local minima, a good way to address the problem is to start the algorithm from multiple starting points, increasing the chance of finding different solutions, and then pick the lowest one.

The algorithm is summarized by the following four steps (a minimal sketch follows the list):

1. Preprocess
2. Generate multiple starting points
3. Optimize the cost function for each starting point
4. Retrieve the pose with the lowest cost.
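The promised sketch of this multi-start scheme follows; `propose_starts`, `local_optimizer` and `energy` are hypothetical callables standing in for the paper's machinery, not its actual API:

```python
def track_frame(depth_frame, prev_pose, propose_starts, local_optimizer, energy):
    """Multi-start optimization as summarized above (illustrative names)."""
    # 1. Preprocessing (points, normals, fingertips) is assumed done upstream.
    # 2. Generate several starting poses, e.g. perturbations of the previous
    #    frame's pose plus detector-based reinitializations.
    starts = propose_starts(depth_frame, prev_pose)
    # 3. Run the continuous optimizer from each starting point...
    candidates = [local_optimizer(energy, theta0) for theta0 in starts]
    # 4. ...and keep the candidate with the lowest energy.
    return min(candidates, key=energy)
```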

Defining the cost function is always challenging. The cost function proposed by Taylor et al. [46] is a weighted sum of the following terms:

data Each point $x_n$ should have a similar normal to the closest point on the surface

bg Points should not face the background

limit The pose $\theta$ should be coherent with the physical joint constraints of a real hand

temp Temporal coherence between subsequent frames

int The hand model should not penetrate itself

tips Every detected fingertip should have a corresponding model fingertip nearby.

Therefore, denoting by Terms = {bg, tips, pose, limit, int, temp} the contributions to the function, the energy formula is:

$$E(\theta) = E_{\mathrm{data}}(\theta) + \sum_{\tau \in \mathrm{Terms}} \lambda_\tau E_\tau(\theta) \quad (2.2)$$

where

$$E_{\mathrm{data}}(\theta) = \frac{1}{N} \sum_{n=1}^{N} \min_{u \in \Omega} \left( \frac{\|S(u;\theta) - x_n\|^2}{\sigma_x^2} + \frac{\|S^{\perp}(u;\theta) - n_n\|^2}{\sigma_n^2} \right) \quad (2.3)$$

$\sigma_x^2$ and $\sigma_n^2$ are the estimated noise variances regarding position and orientation respectively. Imposing that:

$$\epsilon(u, \theta, x, n) := \frac{\|S(u;\theta) - x\|^2}{\sigma_x^2} + \frac{\|S^{\perp}(u;\theta) - n\|^2}{\sigma_n^2} \quad (2.4)$$

and inserting this in the previous formula,

$$E_{\mathrm{data}}(\theta) = \frac{1}{N} \sum_{n=1}^{N} \min_{u \in \Omega} \epsilon(u, \theta, x_n, n_n) \quad (2.5)$$

the variable being minimized can be substituted with a specific set of correspondences, let it be $U = \{u_n\}_{n=1}^{N}$:

$$E'_{\mathrm{data}}(\theta, U) = \frac{1}{N} \sum_{n=1}^{N} \epsilon(u_n, \theta, x_n, n_n) \quad (2.6)$$

Now, 2.2 can be written as:

$$E'(\theta, U) = E'_{\mathrm{data}}(\theta, U) + \sum_{\tau \in \mathrm{Terms}} \lambda_\tau E_\tau(\theta) \quad (2.7)$$



At this point, the following statements are true:

$$E'(\theta, U) \geq E(\theta), \quad \forall\, U \quad (2.8)$$

and

$$\min_{\theta} E(\theta) = \min_{\theta, U} E'(\theta, U) \quad (2.9)$$

The previous passages allow the authors to use ICP (iterated closest point) through gradient-based optimization, because $E'(\theta, U)$ is now easily differentiable w.r.t. all parameters.
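As a rough sketch, the alternation that equations 2.8 and 2.9 justify looks as follows; note that [46] actually updates $\theta$ and $U$ jointly, so this plain ICP-style loop is a simplification, and the three callables are placeholders for the paper's machinery:

```python
def icp_minimize(theta0, points, normals, closest_u, grad_step, iters=20):
    """Alternating descent on the lifted energy E'(theta, U) of eq. 2.7.

    closest_u(theta, x, n) -> u      : correspondence search on the surface
    grad_step(theta, U)    -> theta' : one gradient-based update of theta
    """
    theta = theta0
    for _ in range(iters):
        # U-step: with theta fixed, the best U is the set of closest points,
        # which makes E'(theta, U) touch E(theta) from above (eq. 2.8).
        U = [closest_u(theta, x, n) for x, n in zip(points, normals)]
        # theta-step: E'(theta, U) is smooth in theta for fixed U.
        theta = grad_step(theta, U)
    return theta
```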

FIGURE 2.4: Different starting points, different results, [46]

The performance of this approach is very promising; as a matter of fact, the following graphs compare the implementation described above with others belonging to the state of the art.

This approach is surely very efficient, robust and fast. The absence of gloves is a perk but also a downside: with nothing on the user's hand, it is not possible to stimulate it and give the user feedback on what he is doing (although that was not the purpose of the paper).


FIGURE 2.5: Comparison between the described approach (in the figure "this paper", as the image is taken directly from Taylor et al. [46]) and state-of-the-art algorithms on DEXTER

FIGURE 2.6: Comparison between the described approach (in the figure "this paper", as the image is taken directly from Taylor et al. [46]) and state-of-the-art algorithms on NYU

To better understand the topics of this article, we suggest reading [45], [39], [11] and [1].

2.2.3 Efficient Model-based 3D Tracking of Hand Articulations using Kinect

Proposed by Oikonomidis et al. [30] in 2011, it uses a Kinect to obtain both RGB and depth information. The idea of this paper is to use skin color detection jointly with depth cues to isolate the hand both in 2D and 3D.



FIGURE 2.7: From real to model [30]

The tracking problem is to find the combination of the model's 27 parameters (described below) that minimizes the difference between the virtual pose and the real pose. To do so, skin and depth maps of a hypothetical hand pose are generated with graphics techniques, and a PSO (Particle Swarm Optimization) is then performed on an appropriate objective function. The algorithm can be described in 5 steps:

1. Observing a hand
2. Modeling a hand
3. Evaluating a hand hypothesis
4. Stochastic optimization through particle swarms
5. GPU acceleration

Observing a hand

The input to the algorithm is a 640 × 480 RGB color image and the coupled depth frame, both acquired with a Kinect. A first selection is made by taking the largest skin-colored blob in the RGB image. This blob is then dilated with a circular mask, to estimate the hand's spatial extent. Using the result from the previous frame, i.e. the pose of the hand, the 3D points that match color within the circular mask are preserved, while the rest of the depth map is set to zero. In this way the input for the next step, $O = (o_s, o_d)$, is created, where $o_s$ is the 2D segmented color map and $o_d$ is the corresponding depth map.
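A simplified sketch of this observation step, assuming OpenCV and omitting the previous-frame color matching (the skin-blob extraction itself is done upstream; the dilation radius is illustrative):

```python
import cv2
import numpy as np

def build_observation(depth, skin_mask, dilate_radius=25):
    """Build O = (o_s, o_d) from a Kinect frame.

    depth:     float depth map in meters
    skin_mask: uint8 binary mask of the largest skin-colored blob
    """
    # Dilate the skin blob with a circular mask to estimate the hand's
    # spatial extent, so slightly mis-segmented pixels are kept.
    size = 2 * dilate_radius + 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (size, size))
    region = cv2.dilate(skin_mask, kernel)

    # Keep skin and depth only inside that region; zero out the rest.
    o_s = np.where(region > 0, skin_mask, 0).astype(np.uint8)
    o_d = np.where(region > 0, depth, 0.0)
    return o_s, o_d
```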

Modeling a hand

The hand model is a simple palm with five fingers. The palm is essentially an elliptic cylinder; each finger except the thumb consists of three cones and four spheres, while the thumb consists of an ellipsoid, two cones and three spheres. The finger kinematics is modeled with four parameters encoding the joint angles, and the global orientation of the hand is represented by exploiting the redundancy of quaternions. The result of this step is a 26-DOF model with 27 parameters.
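One plausible layout of such a parameter vector is sketched below; the exact ordering used in [30] is an assumption here, but the counts match: 3 (position) + 4 (quaternion) + 5 × 4 (finger angles) = 27 parameters for 26 DOF.

```python
import numpy as np

# Assumed layout of the 27-parameter, 26-DOF hypothesis vector h:
#   h[0:3]   global hand position (3 DOF)
#   h[3:7]   global orientation as a quaternion
#            (4 parameters, 3 DOF -- the redundancy mentioned above)
#   h[7:27]  5 fingers x 4 joint angles each (20 DOF)

def random_hypothesis(rng, angle_bounds):
    """Draw one hand-pose hypothesis h in R^27 (e.g. a PSO particle)."""
    position = rng.uniform(-0.5, 0.5, size=3)   # meters, illustrative range
    quat = rng.normal(size=4)
    quat /= np.linalg.norm(quat)                # normalize to a unit quaternion
    lo, hi = angle_bounds                       # two (20,) arrays of radians
    angles = rng.uniform(lo, hi)
    return np.concatenate([position, quat, angles])
```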

Evaluating a hand hypothesis

Let $h$ be the hand pose hypothesis and $C$ the camera calibration information; $r_d(h, C)$ is a depth map generated by rendering. $r_d$ is compared with $o_d$ from the previous step, so that a "matched depths" binary map $r_m(h, C)$ is inferred. Then $o_s$ is taken into consideration and reduced again, eliminating those pixels whose depth is unmatched in $r_m$.

The function $E(h, O)$ is created in order to measure the discrepancy between the observed skin and depth maps coming from the Kinect and the maps generated for the hypothesis.

$$E(h, O) = D(O, h, C) + \lambda_k \cdot kc(h) \quad (2.10)$$

where $\lambda_k$ is a normalization factor, $kc$ is a function used to penalize kinematically impossible poses, and $D$ is as follows:

$$D(O, h, C) = \frac{\sum \min(|o_d - r_d|,\, d_M)}{\sum (o_s \vee r_m) + \varepsilon} + \lambda \left( 1 - \frac{2 \sum (o_s \wedge r_m)}{\sum (o_s \wedge r_m) + \sum (o_s \vee r_m)} \right) \quad (2.11)$$

The first term measures the difference between the observation $O$ and the hypothesis $h$; $\varepsilon$ is added to avoid division by zero. The second term measures the difference between the model's and the observation's skin-colored pixels. The summation range is omitted for clarity's sake, but it is over the whole feature maps.
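A direct transcription of equation 2.11 into code might look like this, with binary maps as boolean arrays and depths as floats; the values of $d_M$ and $\lambda$ here are illustrative, not taken from [30]:

```python
import numpy as np

def discrepancy(o_s, o_d, r_d, r_m, d_M=0.04, lam=20.0, eps=1e-6):
    """Eq. 2.11: discrepancy between observation (o_s, o_d) and a rendered
    hypothesis (r_d, r_m). o_s, r_m are boolean maps; o_d, r_d are depths
    in meters. Depth differences are clamped at d_M.
    """
    # Depth term: clamped absolute depth differences, normalized by the
    # union of observed-skin and matched-model pixels.
    union = (o_s | r_m).sum()
    depth_term = np.minimum(np.abs(o_d - r_d), d_M).sum() / (union + eps)

    # Silhouette term: penalizes poor overlap between observed skin pixels
    # and the model's matched pixels (one minus a Dice-like overlap score).
    inter = (o_s & r_m).sum()
    overlap_term = lam * (1.0 - 2.0 * inter / (inter + union + eps))
    return depth_term + overlap_term
```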

Stochastic optimization through particle swarms

"Particle Swarm Optimization (PSO) was introduced by Kennedy and Eberhart in [19][21]. PSO is a stochastic, evolutionary algorithm that optimizes an objective function through the evolution of atoms of a population. A population is essentially a set of particles that lie in the parameter space of the objective function to be optimized. The particles evolve in runs which are called generations according to a policy which emulates “social interaction”. Every par-ticle holds its current position (current candidate solution and kept history) in a vector xk



[each particle also stores the position at which it] achieved, up to the current generation $k$, the best value of the objective function. Finally, the swarm as a whole stores in vector $G_k$ the best position encountered across all particles of the swarm. $G_k$ is broadcast to the entire swarm, so every particle is aware of the global optimum. Typically, the particles are initialized at random positions and zero velocities. Each dimension of the multidimensional parameter space is bounded in some range. If, during the position update, a velocity component forces the particle to move to a point outside the bounded search space, a handling policy is required. [. . . ] The "nearest point" method was chosen in our implementation. According to this, if a particle has a velocity that forces it to move to a point $p_o$ outside the bounds of the parameter space, that particle moves to the point $p_b$ inside the bounds that minimizes the distance $|p_o - p_b|$." (brief explanation of PSO, [30])

In the cited paper, PSO is adapted to work in a 27-dimensional space, the number of parameters used to describe the hand pose in 3D. Exploiting temporal coherence, the results of frame $f_x$ are the starting population for the next frame $f_{x+1}$.
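A minimal PSO sketch over this 27-dimensional space, seeded with the previous frame's solution; the coefficient values are common PSO defaults rather than necessarily those of [30], and the clamping implements the "nearest point" policy quoted above:

```python
import numpy as np

def pso_track(objective, prev_best, bounds, n_particles=64, generations=40,
              w=0.72, c1=2.8, c2=1.3, sigma=0.1, seed=0):
    """Minimize objective(h) over h in R^27 with a basic particle swarm.

    Temporal coherence: the swarm is initialized around prev_best, the
    solution found for the previous frame.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds                                  # two (27,) arrays
    x = prev_best + sigma * rng.normal(size=(n_particles, 27))
    x = np.clip(x, lo, hi)                           # "nearest point" policy
    v = np.zeros_like(x)
    p = x.copy()                                     # per-particle best position
    p_val = np.array([objective(xi) for xi in x])
    g = p[np.argmin(p_val)]                          # global best position

    for _ in range(generations):
        r1, r2 = rng.random((2, n_particles, 27))
        # Velocity update emulating "social interaction": pull toward each
        # particle's own best p and toward the swarm's global best g.
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                   # clamp to the search bounds
        vals = np.array([objective(xi) for xi in x])
        improved = vals < p_val
        p[improved], p_val[improved] = x[improved], vals[improved]
        g = p[np.argmin(p_val)]
    return g
```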

GPU acceleration

The most resource-consuming part of this algorithm is the computation of the hypothesis-observation discrepancy function $E(h, O)$: this calculation involves rendering and other pixel-wise operations that exploit GPU capabilities. This led to developing a solution in which several hypotheses are evaluated simultaneously, and also to simplifying the hand model so that the GPU computation can be parallelized: the hand model is therefore decomposed into simpler primitives, cylinders and spheres.

All in all, this approach has its perks in:

• Cheap
• Trackerless
• Robust (see figures 2.9, 2.10)

and downsides in:

• No feedback possibilities


FIGURE 2.8: Performance of the PSO algorithm w.r.t. (a) the PSO parameters, (b) the distance from the sensor, (c) noise, (d) viewpoint variation. [30]

FIGURE 2.9: Various hand poses correctly recognized by the method. [30]

FIGURE 2.10: Error frequency expressed in mm [30]



2.2.4 Model-Based Hand Tracking with Texture, Shading and Self-occlusions

FIGURE 2.11: Skinned hand model, $\Gamma_\theta$ visualized [25]

Proposed by de La Gorce et al. in 2008 [25], this method uses an "analysis by synthesis" approach, consisting of the following steps:

1. Synthesis
2. Creation of the objective function
3. Pose and lighting estimation
4. Model registration
5. Texture update

Synthesis

At first the surface of the hand is modeled with 1000 closed and oriented triangles. Then the surface is deformed according to the procedure called skinning. The skeleton of this method is composed of 17 bones, with a total of 22 degrees of freedom. Each DOF is coded as an angle representing one articulation; thus, in order to avoid unrealistic poses, each angle is bounded within realistic values. The pose is therefore fully determined by the vector $\theta \in \mathbb{R}^{28}$, namely the 22 angles plus the 6 variables of global position and orientation of the hand w.r.t. the camera's coordinate frame.

The reflectance model for the hand used in [25] is the Lambertian reflectance model, which can be implemented by assigning an RGB triplet to each vertex of the surface mesh,
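As a reference for the reader, a minimal sketch of Lambertian per-vertex shading under a single directional light, assuming precomputed unit vertex normals (the lighting estimation that [25] performs as part of its optimization is omitted, and the ambient weight is illustrative):

```python
import numpy as np

def lambertian_vertex_colors(albedo, normals, light_dir, ambient=0.1):
    """Lambertian shading of a mesh with per-vertex RGB albedo.

    albedo:    (M, 3) per-vertex RGB triplets
    normals:   (M, 3) unit vertex normals
    light_dir: (3,) unit vector pointing toward the light
    """
    # Diffuse term: proportional to the cosine between normal and light,
    # clamped at zero so back-facing vertices receive only ambient light.
    diffuse = np.clip(normals @ light_dir, 0.0, None)
    return albedo * (ambient + (1.0 - ambient) * diffuse)[:, None]
```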
