

UNIVERSITÀ DEGLI STUDI DI PISA

Facoltà di scienze matematiche fisiche e naturali

CORSO DI LAUREA MAGISTRALE IN INFORMATICA

TESI DI LAUREA

Comparison between two different tracking systems for augmented reality in industrial workspace

Candidato: Francesco Vaira, Matr. n° 292954

Relatori:

Ing. Franco Tecchia, Prof. Marcello Carozzino


Contents

1 Introduction
1.1 History of augmented reality
1.2 Uses of augmented reality
1.2.1 Entertainment
1.2.2 Industrial production
1.2.3 Medical science
1.2.4 Marketing
1.2.5 Touring
1.2.6 Teaching
1.3 Thesis structure
2 Previous work
2.1 3D model drawing
2.2 Tracking systems overview
2.2.1 Visual tracking
2.2.2 Tracking by fiducial markers
2.2.3 Building marker structures
2.2.4 Tracking by feature points
2.2.5 Tracking by CAD models
2.2.6 Camera calibration
2.3 Searching for feature points
2.3.1 FAST
2.4 PTAM
3 General Framework and Tools
3.1 The portable device
3.2 Vuforia
3.3 Android Studio
3.4 PTAM
3.5 The computer
3.6 Fiducial markers
4 Detecting camera pose using markerfields
4.1 Implementation
4.1.1 Confidence factor function
5 Introducing parallel tracking and mapping (PTAM)
5.1 Pros and cons of PTAM
5.2 The map scale factor
5.2.1 Method 1
5.2.2 Method 2
5.2.3 Method 3
5.3 The base marker pose in map space
5.3.1 Adaptations
5.4 Implementation
6 Measures and Tests
6.1 Spatial error
6.2 Visual error
6.3 Results
7 Conclusion and future work
7.1 Future work
7.1.1 Building a complete Android app
7.1.2 Self-detected barriers
7.1.3 Using additional base markers


List of Figures

1.1 Examples of a handheld device and of an HMD
1.2 The HMD conceived by Ivan Sutherland
1.3 People using Videoplace
1.4 Overlaid graphics of the KARMA system
1.5 Quake in augmented reality
1.6 Augmented reality games, at macroscopic scale and at real scale
1.7 Face masks in the Snapchat application
1.8 AR for designers
1.9 Augmented reality for industry
1.10 Examples of augmented reality in surgery
1.11 Information automatically obtained by an AR application in an urban context
1.12 Augmented reality used for advanced physics studies
2.1 Perspective projection of a cube
2.2 Registration of a 3D model
2.3 The pinhole camera model
2.4 Example of an ambiguous projection
2.5 Examples of fiducial markers
2.6 A data center not suitable for the use of markers
2.7 Walls completely covered by a marker grid
2.8 Graph of a markerfield
2.9 Picture processing by Baratoff
2.10 Marker relation mean computation
2.11 Standard deviation of pose-detection error at various acquisition points
2.12 Differences in projection depending on marker tilt
2.13 Cube structure built from images taken from several points of view
2.14 Natural edge detection
2.15 Pose detection using a truck CAD model
2.16 An image captured by a camera and its radial correction
2.17 A model of the environment drawn distorted on purpose, to coincide perfectly with the distorted captured image
2.18 Types of regions in feature detection
2.19 The circle of pixels involved in corner detection
2.20 The pyramid built in a keyframe
2.21 The steps involved in patch comparison
2.22 Map initialization in the PTAM application
2.23 The dominant plane used for playing purposes
2.24 Tracking with PTAM
3.1 A fiducial marker used by Vuforia
3.2 A marker fixed on a slab
4.1 Structure of the markerfield object
4.2 Inclination angle α, used to compute the rotational confidence factor
4.3 The midway quaternion computed through the Slerp method
5.1 Exploiting the poses detected from several points of view
5.2 The barriers problem
5.3 The working of a barrier
6.1 A cross used to measure the spatial error
6.2 Schema of cross positions on the test table
6.3 Schema of fiducial marker placement
6.4 The drifting of a markerfield
6.5 Error in the first session of the visual test
6.6 Error in the second session of the visual test
6.7 Positional error along the marker row
6.9 Wrong rotation causes an error with a linear trend
7.1 Error correction using the additional markers


Chapter 1

Introduction

This study proposes a comparison between two different tracking systems for augmented reality. Both systems use visual techniques to perform tracking. The first system is based on markerfields, a well-known method for keeping tracking in large environments. The second is a hybrid system that uses both a fiducial marker and the reference points already present in the environment.

Both systems have been implemented using state-of-the-art techniques, and they were compared by taking into account the measurement error made by each of them.

Many industrial facilities still operate old machinery because it works fine. Nevertheless, each machine has to be kept in perfect efficiency, with regular checks and occasional repairs of any malfunction.

Since these machines are old, their technical manuals are often unavailable because they have been lost. For this reason, the inspections performed by the factory's technical experts become very hard and time-consuming, especially when more than one expert works on the same machine.

When a machine needs a check or a repair, the expert has to disassemble it completely, because the technical manual is not available and he knows nothing about its internals. Conversely, if he knows the position of each machine component, he only has to disassemble the part containing the element that causes the problem.


Once the expert has fixed the machine, he has discovered where the faulty part is, but he has no practical way to share this knowledge with the other experts who will work on the same machine in the future. Therefore, the next expert who checks or fixes the machine will have to disassemble it completely, like the first one.

The most advanced tool for sharing and learning this kind of knowledge is augmented reality.

Augmented reality is the perception of the real-world environment together with virtual objects, used to increase the amount of information available to the user.

With the support of this technology, when an expert discovers some new interesting information about an element of the machine, he can store it in a database, bound to points of a three-dimensional virtual model of the machine space. Through the same technology, anyone can quickly view all the information collected about a machine.

The main purpose of this work is to build a support tool that provides localized technical information about industrial machinery to the experts who fix and maintain it.

Each piece of information provided by this tool has to be bound to the point on the machine it refers to, so that it is quickly available in a practical way. The tool should let the user know every constructive detail of a machine, in order to speed up maintenance and repair operations and make them easier. A technician in charge of fixing a machine can take advantage of augmented reality even when getting help from a remote expert. In this case, while the technician fixes the machine and uses the device to exploit the augmented reality, the remote expert can see what the camera is capturing through another suitable device anywhere else in the world. Furthermore, he can virtually place information at a precise point of the technician's environment to help him in his work.

Though augmented reality can in principle involve all five human senses, this study only considers augmented reality based on sight.

When the user looks at a machine through an augmented reality system, he immediately sees the machine together with the relevant information overlaid near the points it refers to.


The most common hardware platforms for augmented reality are head-mounted displays and handheld devices.

A head-mounted display (HMD) is a display paired with a headset, such as a harness or a helmet. HMDs place images of both the physical world and the virtual objects in the user's field of view. Modern HMDs often employ sensors for six-degrees-of-freedom tracking, which allow the system to align the virtual information with the physical world and adjust it according to the user's head movements.

A handheld device employs a small display that fits in the user's hand. Initially, handheld AR employed fiducial markers; later came GPS units and MEMS sensors, such as digital compasses and six-degrees-of-freedom accelerometer-gyroscopes. Today, markerless trackers are starting to come into use. Handheld AR promises to be the first commercial success for AR technologies.

The two main advantages of handheld AR are the portability of handheld devices and the ubiquity of camera phones. The disadvantages are the physical constraint that the user must always hold the device out in front of him, and the distorting effect of the typically wide-angled mobile phone cameras compared to the real world seen through the eye. Figure 1.1 shows these types of devices.

In any case, the main problem in augmented reality is detecting the positional relation between the point of view and the environment to be augmented, in order to show the augmented objects in the right place on the device's display.

In the scientific literature, this problem goes under two different names: the "tracking problem", when it refers to computing the relation between the real environment and the observer's (or device's) point of view, and the "registration problem", when it concerns the correct placement of the augmented elements with respect to the real environment.

This study investigates the visual tracking systems used in augmented reality, hence it does not depend on the particular hardware platform employed. Its only assumption is that any suitable hardware platform providing augmented reality has a video camera and a display. The camera is needed to keep tracking the environment, while the display shows the augmented reality to the user.

Source: https://commons.wikimedia.org/wiki

Figure 1.1: Examples of a handheld device and of an HMD

1.1 History of augmented reality

The history of augmented reality begins in 1968, when Ivan Sutherland and his student Bob Sproull built the first head-mounted display [1]. The tracking system of this prototype was mechanical: the user wore a helmet linked to a fixed point through a system of mechanical sensors. Using direct kinematics, the system computed the position and rotation of the helmet with respect to the environment and rendered the virtual or augmented reality coherently. This system was called the "Sword of Damocles" because of the heavy mechanism the user had to bear in order to use it. Figure 1.2 shows the system.

Next, in 1975, Myron Krueger conceived a system called "Videoplace". Through this system, the user could interact with graphical objects without any kind of wearable device. It used a video camera to detect the user's silhouette and treated it as an object able to move the virtual objects and interact with them in many other ways. The user could see the result drawn on a big screen, as shown in figure 1.3. The system allowed many people in different places to join the same session and interact together with the same objects, for example playing a simulated tennis match.

Source: http://www.idesigner.es/noticia/la-realidad-virtual-desde-sus-inicios/21

Figure 1.2: The HMD conceived by Ivan Sutherland

Source: http://www.inventinginteractive.com/2010/03/22/myron-krueger/

Figure 1.3: People using Videoplace

In 1990 Thomas Caudell, a researcher at Boeing, built a device to help the technicians responsible for the maintenance of the electronic equipment of aircraft. He introduced the term "augmented reality" for the first time, giving this definition: "The interaction of superimposed graphics, audio and other sense enhancements over a real-world environment displayed in real-time". This definition was less detailed than the current one, but the meaning was similar.

In 1992 Louis B. Rosenberg coined the term "virtual fixture", meaning the overlay of abstract sensory information on a workspace to improve the performance of a task.

In the same year Steven Feiner, Blair MacIntyre and Doree Seligmann presented one of the seminal articles on an AR system called KARMA. KARMA stands for "Knowledge-based Augmented Reality for Maintenance Assistance", and it is an AR system that supports the maintenance of a laser printer. It exploits a head-mounted display and several ultrasonic tracking devices placed in the workspace to get the pose of the user's head. Figure 1.4 shows an example of the use of this technology. Thanks to it, a user can see where all the parts of a device are and, if a component is not reachable, which components block it, just by watching the abstract object representation.

Source: http://prior.sigchi.org/chi96/proceedings/overview/Feiner/fs txt.html

Figure 1.4: Overlaid graphics of the KARMA system

In 1997 Ronald Azuma introduced the current definition of "augmented reality"; he is considered a pioneer of the augmented reality field. In the same year Steve Feiner developed the "Touring Machine", a system capable of showing information about the surrounding environment. It exploits a system invented by Columbia University called MARS, the Mobile Augmented Reality System. It is a portable and wearable system; at that time, "portable" meant the user could carry the system with him, wearing it like a backpack.

In 1999 Hirokazu Kato created ARToolKit, the most famous open-source framework for developing visual augmented reality applications.

In 2002 Steven Feiner published an article in "Scientific American" magazine. Its main argument was that computer scientists were working to improve human perception of the world through augmented reality.

In the same year Bruce Thomas developed a version of the famous video game Quake playable in augmented reality. With it, the player plays by moving his body outdoors in the real world, as shown in figure 1.5. Strangely, this did not attract the interest of players, but rather that of computer science researchers.

Source: https://ultimatehistoryvideogames.jimdo.com/arquake/

Figure 1.5: Quake in augmented reality


1.2 Uses of augmented reality

Nowadays this technology is used in several fields, for example entertainment, industrial production, marketing and medical science.

1.2.1 Entertainment

Augmented reality is used in video games to create environments in which the virtual objects and game characters are drawn so that they seem to belong to the real world.

One famous example of a game based on augmented reality is "Ingress". It exploits the GPS sensor of portable devices to get their position in the world, and shows the augmented reality at a macroscopic scale, drawing the elements over a map. Players can interact with virtual points of interest placed near their position.

The same idea has been exploited by another famous game, "Pokemon Go". As in Ingress, players walk in the real world, searching for virtual points of interest and interacting with them. This game adds visual tracking to register the game characters in the real environment, at real scale.

The idea that a player has to move his body in the real world to play proved successful from the first days after release.

Figure 1.6 shows some screenshots of these types of games.

Figure 1.6: Augmented reality games, at macroscopic scale and at real scale

Another kind of entertainment application is the masking of human faces. The system tracks the user's face using the portable device camera and draws a mask over it. Since faces do not have a fixed solid shape, the system cannot use an ordinary augmented reality tracking system; it must use more complex tools to detect the position of each part of the face individually. Knowing the position of each part, the system can deform a mask and draw it over the user's face. In this way the system can draw a mask over a person's real face while keeping the same facial expression. Such applications can show people with the face of an animal, of a fantasy character, or with the facial features of other people, as shown in figure 1.7. More complex procedures allow changing the facial expression, making a happy face angry or vice versa, but this arguably belongs to another field: it could be called "modified reality" rather than augmented reality.

Most modern films are made using many special effects to insert fake objects or characters into the video, because it would be impossible to record them in reality.

This is done by drawing a 3D model into a captured image, as in augmented reality. In this case, due to the realism needed in film scenes, the final result has to be very precise: the augmented elements must be rendered in high definition for each video frame, and the procedure requires a lot of time. Hence, the result is not shown in real time; the objects are registered with the same kind of techniques used for visual augmented reality, but spending more computation time to obtain perfect results.

1.2.2 Industrial production

Augmented reality technologies are used to support industrial production, making the realization of artefacts easier, faster and cheaper. While a worker is assembling a product, since his hands must be free to work, he can use augmented reality only through a head-mounted display.

(16)

Source: http://balancedscorecard.blogspot.com/2017/04/

Figure 1.7: Face masks in the Snapchat application

Through this display, he receives information about every step of the product's construction.

Depending on the type of artefact, the construction steps can be very complex. With AR available, the worker does not need to read the technical specification manual, because the system provides him with all the information about the pieces and operations needed at every step of the construction process. The technician sees this information at the precise points it refers to, and the work becomes easier, faster and less error-prone. The technician works by moving his head within a bounded region over the workspace, which is often a table, so it is possible to mount several types of tracking structures. These can precisely estimate the position of the technician's head in the workspace and draw the augmented reality accurately. This type of sensor uses electromagnetic or ultrasonic signals to detect the pose of the user's head: exploiting the signals emitted by a base fixed on the workspace, the head-mounted display measures the delay of each received signal to compute its position, much as a GPS receiver obtains the device's coordinates.

The current state of the art allows the position of the worker's point of view to be obtained quite precisely using visual methods alone. Producers of augmented reality systems can therefore offer more precise but more expensive systems that exploit electromagnetic signals, or cheaper but less precise systems that exploit visual tracking.

Augmented reality also helps technicians in the earlier step of industrial production: design.

When a designer has to create a new prototype of a product, he can build its 3D model and, using augmented reality, view it in a real context, as shown in figure 1.8, to check whether he is getting the intended result.

Source: www.slashgear.com

Figure 1.8: AR for designers

Augmented reality can also be used for maintenance. The technician can see all the relevant information about a machine overlaid on it: where each part of the assembly is, or which parts must be removed to reach other parts.

Figure 1.9 shows what a technician would see using augmented reality while working on a set of electrical circuits. The whole schematic is projected onto the real panel in the environment, and the technician does not have to use his hands to consult the paper version.

Augmented reality can also support maintenance and repair in other ways. Sometimes a factory owns machines purchased from a company headquartered on the other side of the world. If one of these machines breaks down, production stops, but the company's expert technicians are very far away; before the expert can reach the machine and fix it, the factory owner loses a lot of money.

Using augmented reality with a set of suitable devices, the expert technician can remotely place information in the machine's environment while he guides a local technician to repair the fault. This way of working saves a lot of time and money.

Source: https://www.produktion.de/technik/datenbrille-fuer-servicetechniker-290.html

Figure 1.9: Augmented reality for industry

1.2.3 Medical science

Augmented reality also helps surgeons when they are performing surgery. Using radiography and other kinds of medical exams, it is possible to define a three-dimensional model of the internal parts of a body. Thanks to augmented reality, the doctor can see the model overlaid on the real body, so that every point of interest of the model is projected in the correct position on the real body. The surgeon can thus identify the best point to cut in order to reach the internal points to operate on.

1.2.4 Marketing

Augmented reality is also used to promote clothing products and increase their sales.


Sources: https://anixvlog.wordpress.com/augmented-reality ; http://plusarquitectura.info/?n=Using+Virtual+And+Augmented+Reality+In+Medical+Diagnosis

Figure 1.10: Examples of augmented reality in surgery

Some online shops provide customers with an application that lets them try on the clothes the shop sells.

The customer just has to download the application on his device and stand in front of the camera. He sees his image on the device screen, with the garment overlaid on his body as if he were really wearing it.

Other applications exploit the same idea to promote other kinds of products. Using augmented reality, a customer can preview his house to see how it would look if painted with several kinds of paint, and buy the one that gives the best result.

Similarly, when people have to furnish their house and a manufacturer provides the 3D model of a piece of furniture, they can virtually place it in their house and see how the apartment would look with the furniture there.

This is applicable to any product that should be seen together with other objects or in a certain place, like eyeglasses or car accessories.

1.2.5 Touring

There are many applications developed to help tourists. They can show all kinds of information about the surrounding environment, such as the name of a mountain, the height of a building, the way to reach a certain place, or the names of points of interest like restaurants, shops and museums. They use the GPS device to get the coarse position of the observer and register the precise information in the captured images, thanks to the compass and visual tracking techniques.

An example of the information this kind of application can provide is shown in figure 1.11.

Source: https://decoratorist.com/google-glass-architecture-solutions/

Figure 1.11: Information automatically obtained by an AR application in an urban context

Augmented reality is also used in museums. When a visitor enters, the attendant gives him a suitable device that provides the augmented reality and all the information about each work or exhibit of the museum.

AR provides some further features: for example, pointed at a broken statue it shows the original one, and pointed at a dinosaur skeleton it shows a reconstruction of its body.

1.2.6 Teaching

Augmented reality catches children's attention. For this reason, it is also used to teach some concepts in primary school.

Learning by reading a book or listening to the teacher can be very boring for a child; AR, instead, is like an interactive documentary, viewable from any direction, and is more engaging.


Teaching does not only concern children; AR is also useful for teenagers and adults. For example, a student who has to study the theory of interaction between atomic particles can use augmented reality to view an animation that visually explains the behaviour of the atomic components from any desired position, as shown in figure 1.12.

Source: https://deskgram.net/explore/tags/technologyi

Figure 1.12: Augmented reality used for advanced physics studies

1.3 Thesis structure

This thesis has the following structure:

• Chapter 1 explains the reasons why this work was done, and sums up the history of augmented reality and its current uses.

• Chapter 2 surveys the current state of the art in augmented reality. The first part covers the most recent methods and algorithms used to track the environment, and the costs and benefits of the latest kinds of sensor systems. This part also explains some basic concepts of computer vision.

• Chapter 3 specifies the tools used to perform the work and gather the data needed to write this thesis.

• Chapter 4 explains the working and the implementation of the first method, based on markerfields.

• Chapter 5 explains the working of the hybrid method and some problems found during its implementation.

• Chapter 6 describes the procedures performed to test the methods and compare them.

• Chapter 7 presents the conclusions that can be drawn from the test data and outlines some possible future work that could improve the results.


Chapter 2

Previous work

To make sure the results of both methods were as good as possible, some research on the state of the art in augmented reality was performed before development.

2.1 3D model drawing

Visual augmented reality is obtained by showing the real environment with other objects superimposed. The augmented objects are drawn like in any other 3D application, with some slight distortion applied to make them coherent with the images captured by the device's camera.

The developer makes a 3D model of the objects that the application must show to the users. Even if the AR application only shows textual information, since the user must perceive it as placed at a precise point of the real environment, it has to be treated as planar drawings in a three-dimensional representation of the environment.

This 3D model stores the three-dimensional coordinates of each vertex that defines the model. All object surfaces are represented by a set of triangles, and the model stores their vertices.

These objects are drawn using the rules of perspective. Since the user can move anywhere in the environment, the system must be able to draw the perspective projection from any point of view. Whenever the system shows the model, it computes the position of its points with respect to the point of view and draws the perspective projection.

This is a geometrical process that finds the coordinates of the model vertices with respect to a reference system centered on the point of view. The task is performed using a standard mathematical tool: homogeneous transformation matrices.

The result is a virtual space, where the point of view is the center of the reference system, and every object keeps the same spatial relation with respect to it.

The augmented reality application works well if every augmented object is drawn as if it were really in the environment: even as the camera moves, the object stays overlaid on the same real objects and looks fixed to the workspace. The object appears bigger when the user gets closer to it, and smaller when the user moves away.

To make this possible, the system must know at all times the camera position with respect to the real environment.

This knowledge has a simple mathematical representation: it is the homogeneous transformation that links the reference system centered on the camera to a reference system fixed in the real world.

The task of detecting the position, which allows the objects to be drawn in the correct place, is called pose detection, or tracking.

When a developer makes an augmented reality application, he must establish a reference system fixed to the environment and build a 3D model containing all the augmented objects. As a final result, the objects described in the model are drawn overlaid on the real environment, so that the reference system used to describe the model coincides with the reference system established for the real environment.

To draw the augmented reality, the system must get the positions at which the points of the augmented objects must appear with respect to the camera. To obtain these positions, the system must detect the transformation between the camera reference system and the one fixed in the environment, and transform every point of the model accordingly.

The objects are then drawn in those positions, using the rules of perspective. Figure 2.1 shows an example of perspective projection.


Figure 2.1: Perspective projection of a cube

Since the light rays entering the device camera pass through a lens, the resulting acquired image is not the exact perspective projection of the environment, but is slightly distorted. Hence, to draw the augmented reality perfectly overlapped on the acquired image, the rendering must undergo the same type of distortion.

Each point defining the augmented object model is transformed in the following way:

x_c = T x_m    (2.1)

Here x_m is the 3D model point, defined through a vector in homogeneous coordinates, x_c is the corresponding point in the camera reference system, where the model point will appear, and T is the homogeneous transformation matrix that maps the environment reference system into the camera reference system. The transformation represented by T is provided by the tracking system.
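As a minimal numpy illustration of equation (2.1) — the pose matrix values here are invented for the example:

```python
import numpy as np

# Invented pose: the environment origin sits 3 units in front of the camera.
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 3.0]

x_m = np.array([0.1, 0.2, 0.0, 1.0])   # model point in homogeneous coordinates
x_c = T @ x_m                          # equation (2.1): x_c = T x_m
print(x_c[:3])                         # -> [0.1 0.2 3.0], the point in camera space
```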

If the augmented objects are pure information, they are presented without any further enhancement, because the main purpose of the information is to say everything about a certain point of the environment. If instead the augmented objects are representations of real objects, they must look like them. Hence a shading system, like Phong's [33], tries to give the augmented objects an appearance similar to reality by changing the surface colours.


Though in most 3D applications the Phong method uses three kinds of surface colour components (ambient, diffuse and specular), when the augmented reality system draws an object the light source direction in the real environment is unknown, so only the ambient and diffuse components can be used to colour the object surfaces.

2.2 Tracking systems overview

In an augmented reality application, each virtual object or piece of information has to appear at a fixed place in the real environment. Hence, to draw the augmented reality correctly, the system must know the position in the captured images of the environmental points where every augmented object has to appear. Fixing a three-dimensional reference system in the environment, if the system knows the spatial relation between this reference system and the device camera, and the position of every interesting environmental point with respect to the reference system, it can mathematically obtain the position of each interesting point with respect to the camera.

Since the positions of the interesting points in the environment remain the same, while the position of the camera changes as the user uses the application, the system must get the relation between the environment and the camera for every captured frame. This is the task performed by the tracking system.

As shown in fig. 2.2, to draw the augmented reality the system moves a model centered in the camera reference system to the center of the environment reference system and draws it as if it were there. The information about the position and orientation of an object with respect to a reference system is usually called its pose. A pose also represents the spatial relation between two reference systems.

It is possible to describe the relation between the environment and the camera as the camera pose in the environmental reference system, or as the environmental pose in the camera reference system.

The standard way to represent a pose is to use elements of the algebraic group SE(3). Each member is a homogeneous transformation of three-dimensional space.

Figure 2.2: Registration of a 3D model

The SE(3) element that represents an object pose with respect to a certain point of view is the homogeneous transformation from the reference system centered on the point of view to the reference system centered on that object.

Precise details about the elements of the SE(3) group and their representations can be found in [23]. The representation used to store a member of SE(3) on a machine, suitable for an augmented reality application, is the 4x4 homogeneous transformation matrix for three-dimensional space.

Below is the general structure of a homogeneous transformation matrix:

    T = | r11 r12 r13 X |
        | r21 r22 r23 Y |
        | r31 r32 r33 Z |
        |  0   0   0  1 |

Each of these matrices is composed of a 3x3 rotation matrix in the upper-left block (its members carry an r in their indices) and a translation vector (the members X, Y and Z). The members of the last row are fixed to 0, 0, 0, 1 in every case.

To apply the transformation to a point, we multiply the matrix by the homogeneous coordinate vector that represents the point. The result is the homogeneous coordinate vector of the transformed point.


A pose has six degrees of freedom: three for the position and three for the rotation. It would therefore be possible to use a six-member vector to store a pose, saving memory with respect to a sixteen-member matrix. Nevertheless, the use of matrices greatly simplifies the work of the developers.

For instance, having the pose of a first object with respect to the camera and the pose of a second object with respect to the first object, both represented as matrices, it is possible to get the pose of the second object with respect to the camera just by multiplying the pose matrices together. This fact is heavily exploited in the development of the methods compared in this study, and it will be explained further in the chapters on the methods' implementation.
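A small numpy sketch of this matrix algebra; the poses are made up for the example:

```python
import numpy as np

def make_pose(R, t):
    """Pack a 3x3 rotation and a translation into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of object 1 w.r.t. the camera, and of object 2 w.r.t. object 1.
T_cam_obj1 = make_pose(np.eye(3), [0.0, 0.0, 2.0])
T_obj1_obj2 = make_pose(np.eye(3), [0.5, 0.0, 0.0])

# Chaining poses is just a matrix product.
T_cam_obj2 = T_cam_obj1 @ T_obj1_obj2

# The inverse matrix gives the opposite relation (camera w.r.t. object 2).
T_obj2_cam = np.linalg.inv(T_cam_obj2)
```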

Some augmented reality systems use other kinds of sensors to compute the device's pose, such as infrared sensors, magnetic sensors, GPS, etc. Some of these systems require structures made of many external objects fixed in the environment to get the device's pose. Such tracking systems are not serviceable enough in a factory, the main place where the systems covered by this study should work. The two compared methods, instead, are accurate enough for the specified purpose and use just the device's video camera.

2.2.1 Visual tracking

As the name suggests, "visual tracking" is a research area in which scientists look for ways to detect an object's pose using only a visual sensor.

Once the system has the pose of an object fixed in an environment, it can mathematically determine the environment's pose.

The system receives the perspective projection of the environment captured by the device camera and has to get the pose of a depicted object using only the information available in the image.

Assuming the acquired image is a perfect perspective projection of the environment (it is not quite, because of lens distortion), and knowing the real position of a set of points with respect to an environmental reference system and finding them in the acquired image, it is possible to estimate the transformation between the camera reference system and the environmental one.


Figure 2.3: The pinhole camera model

The estimate is obtained by looking for the transformation that reprojects the known points into the positions where they appear in the captured image. There is a mathematical relation between a point's position with respect to the camera and its position in the image. If a point p_i has coordinates (x_i, y_i, z_i) in the camera reference system, it will appear at the pixel d_i of the captured image whose coordinates (u_i, v_i) are

u_i = x_i / z_i ,    v_i = y_i / z_i

and the point (u = 0, v = 0) is placed at the center of the image, as illustrated in figure 2.3.

The set of three coloured lines in the figure represents the camera reference system. The model describing this type of perspective projection is called the pinhole camera. Since the system must determine the transformation that links the camera reference system to the environmental reference system, it must find the transformation that moves the known object points to where they currently are. It does not know the precise current position of each point with respect to the camera, but it knows their positions in the perspective projection.
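A minimal sketch of this normalized pinhole projection (unit focal length, principal point at the image center):

```python
import numpy as np

def project(points_cam):
    """Pinhole projection: (x, y, z) -> (u, v) = (x/z, y/z).

    points_cam: Nx3 array of points expressed in the camera reference system.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    return points_cam[:, :2] / points_cam[:, 2:3]

# A point two units in front of the camera, slightly up and to the right.
print(project([[0.4, 0.2, 2.0]]))   # -> [[0.2 0.1]]
```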

It is then possible to set up a non-linear system of equations whose unknowns are the members of the matrix representing the wanted transformation.


Because of the rules of perspective, the number of solutions of the system depends on the number of point correspondences found in the image. With two correspondences or fewer, the non-linear system has infinitely many solutions; with three, there is more than one solution; with four points, the solution is usually unique.

This is a well-known issue called the PnP problem, where PnP means "perspective-n-point". It concerns the fast and reliable computation of an object's pose, given the positions of n of its points in the perspective projection.

The techniques proposed by researchers follow two different approaches: iterative methods and direct methods. In most cases the former are slower but provide more accurate solutions than the latter.

The iterative methods exploit a procedure called calibration by minimization of the reprojection error. Given a candidate transformation T, the system can compute the positions where the real points of an object would be projected if T were applied to them. So the system iteratively searches for the transformation producing a projection as similar as possible to the one actually observed in the camera image.

Given a set S of 3D points s_i of an object, whose coordinates (x_i, y_i, z_i) are known in advance, and their current corresponding projections p_i, the iterative method searches for the following value:

min_{T ∈ R^{4x4}} Σ_{i ∈ S} dist(p_i, P T s_i)

where dist(a, b) is the Euclidean distance in pixels between points a and b, and P is the projection matrix used to get the coordinates where a point is projected. All points are defined through homogeneous coordinates.

Lepetit, Moreno-Noguer and Fua in [24] propose a hybrid method that provides a solution to the problem for any number of available correspondences. It spends an amount of time proportional to the number of correspondences to find a fairly accurate solution with a direct method, then performs a few steps of the Gauss-Newton iterative method to reach a more precise solution. The time spent in the iterative steps is negligible.
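As an illustration of the iterative approach — a generic least-squares refinement over the six pose parameters, not the specific algorithm of [24] — assuming normalized image coordinates as in the pinhole model above:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reproject(params, pts3d):
    # params: rotation vector (3 values) followed by translation (3 values).
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    cam = pts3d @ R.T + params[3:]        # move the known points into camera space
    return cam[:, :2] / cam[:, 2:3]       # pinhole projection: (x/z, y/z)

def refine_pose(pts3d, pts2d, x0=(0, 0, 0, 0, 0, 1.0)):
    """Search the pose minimizing the reprojection error (in least-squares form)."""
    pts3d, pts2d = np.asarray(pts3d, float), np.asarray(pts2d, float)
    resid = lambda p: (reproject(p, pts3d) - pts2d).ravel()
    return least_squares(resid, np.asarray(x0, float)).x
```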

In some cases, due to ambiguity in the projection, the reprojection error function can have more than one local minimum. For this reason, Schweighofer and Pinz in [29] provide a method to get all the possible local minima of the function, namely all the poses that correspond to a certain projection. Figure 2.4 shows an example where the same triangle, placed in two different poses, produces the same perspective projection; in this case, the reprojection error function has two local minima.

Source: Algorithms for Augmented Reality - 3D Pose Estimation - Sebastian Grembowietz

Figure 2.4: Example of an ambiguous projection

The system can correctly perform this task only if it can find an exact correspondence between a set of points projected in a captured image and a set of points in the real world, and if it knows the position of those points with respect to the environment reference system. Hence, to perform tracking, the system needs a set of points that are easily recognizable and distinguishable. These points are called feature points, and several techniques have been developed to find all the suitable points in an image quickly.

To perform the tracking, it is possible to place in the environment a special object whose points are easily recognizable and whose positions are well known. These kinds of objects are called fiducial markers.


2.2.2 Tracking by fiducial markers

A fiducial marker (simply "marker" below) is a planar object with a figure drawn on it. The drawing's colours and shapes are chosen so that a detection algorithm can find the marker pose as easily and precisely as possible. The most used colours in practice are black and white, because they give high contrast in the grayscale image under most environmental lighting conditions. The figure usually drawn on a marker is a large square, whose four vertices can be detected precisely in the image. Knowing the positions of the marker's vertices in the acquired image and their positions in a reference system fixed on the marker, the system can get the marker pose with respect to the camera simply by solving the corresponding PnP problem previously described.
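In this thesis the markers are detected by Vuforia, but the underlying computation can be illustrated with OpenCV's solvePnP; here the marker side length, the camera matrix K and the distortion coefficients are assumed to be known from calibration:

```python
import numpy as np
import cv2

def marker_pose(corners_px, side, K, dist):
    """4x4 pose of a square marker from its four projected corners.

    corners_px: 4x2 pixel coordinates of the corners, in a fixed order.
    """
    s = side / 2.0
    # Corner coordinates in the marker's own reference system (the z = 0 plane).
    obj = np.array([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]], np.float64)
    ok, rvec, tvec = cv2.solvePnP(obj, np.asarray(corners_px, np.float64), K, dist)
    R, _ = cv2.Rodrigues(rvec)            # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T                              # marker pose w.r.t. the camera
```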

Figure 2.5: Examples of fiducial markers

Since the marker has a fixed position in the workspace, the system gets the workspace pose by detecting the marker pose.

Registering augmented objects through a single fiducial marker is adequate for augmented reality. This is because the marker pose detection made by any detection system is affected by the same kind of error a person makes when trying to guess a marker's pose using just one eye. Both people and machines can perceive the coordinates of an object's position with high precision along the plane perpendicular to their viewing direction; on the contrary, neither can perceive the object's distance with high precision.

Thus, if the system measures the distance of the marker wrongly, the augmented objects are drawn in the wrong place, too near or too far. Since the user's eye is not capable of perceiving this error, the augmented objects still seem to belong to the real workspace.

This holds only when each marker is used alone. When the marker poses are used to build complex structures, like the ones used in the first method, the user can perceive some errors.

To perform the tracking with markers, the user has to place them throughout the environment, so that at least one marker is visible from any point where the system must register an augmented object. This is not always possible because in some environments, like the one depicted in fig. 2.6, there is not enough space to place many fiducial markers.

Source: http://www.proactiveitsolutions.co.in/service.html

Figure 2.6: A data center not suitable for the use of markers

Research on tracking by fiducial markers aims to find the best shapes and colours for the marker, to improve the precision and efficiency of the algorithms that detect the marker pose with respect to the camera. González, Guil and Cózar in [25] propose a colour-handling procedure that aims to improve the accuracy of the tracking methods already known.

Other studies address practical issues of tracking by markers: Wagner, Langlotz and Schmalstieg in [27] invented new types of unobtrusive markers that can be placed on surfaces without hiding the drawings already present on them.


Other works propose markers with high resilience to occlusion: using these fiducial markers, it is possible to keep tracking even if the markers are not completely visible.

For some applications, the required tracking accuracy is higher than that provided by a single marker. Hence, some studies use more than one marker to improve tracking accuracy.

The work of Yu et al. [30] improves pose detection using a grid of markers. It improves the accuracy considerably, but it is hard to use in practice, since it needs a large planar surface to work. Fig. 2.7 shows an example of a marker grid.

Source: JanusVF: Accurate Navigation Using SCAAT and Virtual Fiducials. Malcolm Hutson, Dirk Reiners

Figure 2.7: Walls completely covered by a marker grid

Ababsa and Mallem in [31] propose a tracking method with the same purpose as the previous one, but with no constraints on the relative positions of the markers in the workspace. To work correctly, this system must know the positional relations among the markers in advance.

Since inertial sensors are nowadays quite cheap and almost all portable devices have them, Maidi et al. in [32] describe a hybrid system, robust to marker occlusions, that uses the data provided by both the camera and the inertial sensors. When the marker is not visible, it uses the inertial data to keep tracking.


2.2.3 Building marker structures

Research also aims to exploit several fiducial markers in the same environment to enlarge the area where the system can keep tracking. This has led to the development of complex structures of fiducial markers. Using them, it is possible to cover a wide space with information, placing a marker at each point where the system has to show a piece of information. The user moves the camera around the machinery and, every time a marker enters the camera's field of view, the system tracks the marker and shows the related information at the right point.

But this is not enough. If a user had to add information about a machine in a system where each marker works alone, he would have to work with as many different 3D models as there are markers. This would be impractical.

There should be only one model of the displayed information for the whole environment. With a single model, the user who places the information about the environment works with only one model, and his work is easier. Moreover, the system can provide all the information in a continuous way while the user moves the camera over the space.

Many studies have been done on using several markers as a single structure. These kinds of structures are called markerfields. The main purpose of a markerfield is to obtain the pose relationship between any marker placed in the environment and the main marker of the set.

Using markerfields, the user can place the information about the machines while preparing only one 3D model. This model is referred to the reference system centered on the main marker of the markerfield (marker zero, or base marker, below).

The structure of a markerfield can be described using a graph. Each node of the graph represents a marker placed in the workspace, and each edge represents the measured relation between the two markers it links. An example of a markerfield graph is depicted in figure 2.8. It represents the graph of a markerfield composed of seven fiducial markers: the node named M_i represents marker number i, with 0 ≤ i ≤ 6.


Figure 2.8: Graph of a markerfield

An edge between two nodes M_a and M_b means the system knows the positional relation between markers M_a and M_b, because it has measured and stored it.

The graph is undirected because, if the transformation from marker M_a to marker M_b is available, the one from M_b to M_a is available too: the second is the inverse of the first. For any pair of markers there is a bijection between the two transformations that link them.

The main purpose of the markerfield is to make every marker pose available with respect to marker zero. In the example shown in figure 2.8, the transformation from marker M_0 to marker M_1 is called T_{M0->M1} and is equal to the wanted pose for marker M_1: it is the pose of marker M_1 with respect to marker M_0, its global pose.

The system can get the global pose of marker M_2 by concatenating the global pose of marker M_1 with the transformation T_{M1->M2}. The concatenation is performed through the product of the respective matrices:

T_{M0->M2} = T_{M0->M1} T_{M1->M2}    (2.2)

Even if two markers M_a and M_b are never visible at the same time (in the same captured image), as long as there is a path in the graph linking their nodes it is possible to evaluate the transformation from M_a to M_b.


Assuming G_a is the global pose of marker M_a, the general rule to get the global pose of another marker M_b linked to M_a in the graph is the following:

G_b = G_a T_{Ma->Mb}    (2.3)

Namely, to get the global pose of a marker M_b, the system must have the transformation between M_b and another marker M_a whose global pose is already known.

Hence it is possible to give a recursive definition of the global pose:

G_0 = I
G_m = G_n T_{Mn->Mm}

where I is the identity transformation.

Since every transformation is represented by a matrix, the global pose matrix of a certain marker equals the ordered product of the matrices corresponding to the transformations along the path from the base marker to that marker in the graph.

From the above, it follows that if the graph is connected, it is possible to get the relation between every marker and the base marker (the global pose of every marker).
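A minimal sketch of this breadth-first computation of the global poses (the edge-dictionary layout is an assumed representation, not the thesis's actual data structure):

```python
import numpy as np
from collections import deque

def global_poses(edges):
    """Pose of every reachable marker w.r.t. marker 0.

    edges: dict mapping (a, b) -> 4x4 matrix T_Ma->Mb, the measured relation.
    Each edge is usable in both directions by inverting the matrix.
    """
    adj = {}
    for (a, b), T in edges.items():
        adj.setdefault(a, []).append((b, T))
        adj.setdefault(b, []).append((a, np.linalg.inv(T)))
    G = {0: np.eye(4)}                    # G_0 = I
    queue = deque([0])
    while queue:
        n = queue.popleft()
        for m, T in adj.get(n, []):
            if m not in G:                # keep the first path found to m
                G[m] = G[n] @ T           # G_m = G_n T_Mn->Mm
                queue.append(m)
    return G
```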

If the available relations between the markers contained no errors, the transformation between two markers would be exactly equal to the concatenation of all the transformations corresponding to the edges encountered on the path from the first marker to the second.

Of course, since the measurements of the marker poses are not precise, the resulting transformations between the markers are not exactly equal to the real ones, and there are errors in the computed global poses. Although the relations between markers are not completely error-free, using them it is still possible to get a pose of each marker with respect to the base marker that is quite close to the real pose.

Since every marker is fixed to the environment, the relations between markers and environment do not change. Hence, the system could use a set of measurements taken by hand: in this way, the user can measure the relationships between the markers with high precision. But this is a time-consuming operation.

Source: Interactive Multi-Marker Calibration for Augmented Reality Applications. Gregory Baratoff, Alexander Neubeck, Holger Regenbrecht

Figure 2.9: Picture processing by Baratoff

Baratoff, Neubeck and Regenbrecht in [4] propose a technique to obtain the relationships between markers without measuring them by hand.

Baratoff suggests taking a set of high-resolution pictures of the markers placed in the environment before the AR application runs. The markers must be placed in such a way that at least one marker is visible in any framing of the workspace; there are no other constraints on the positions and rotations of the markers.

Then, offline, the environment pictures are processed to compute the mutual relations between the markers. This process iteratively searches for the set of mutual relations that minimizes the reprojection error for all the markers in all the pictures. There is no time limit for this work, since it is performed completely offline. An example of picture processing is shown in figure 2.9.

Later, in real time, the system uses the computed relationships to treat every marker as part of a single structure, the markerfield.

Nowadays CPUs, including those in portable devices, are faster than the ones available when that paper was written, and it is possible to perform the same procedure in real time: while the system shows the augmented reality to the user, it can collect the relationships among all the visible markers. Of course, Baratoff's procedure provides more reliable results than the real-time one, because the latter has time constraints and is not based on high-definition pictures; still, it provides acceptable results. The system can measure the relation between two markers whenever they both appear in the same frame.

Assuming that M_a is the matrix describing the detected pose of marker a with respect to the camera at a certain moment, and M_b describes the detected pose of marker b with respect to the camera at the same moment, the system can compute the transformation from marker a to marker b with the formula:

T_{Ma->Mb} = M_a^{-1} M_b    (2.4)

The transformation matrix T_{Ma->Mb} describes the pose of marker b with respect to the reference system centered on marker a.
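Equation (2.4) in numpy form, assuming M_a and M_b are the 4x4 pose matrices detected in the same frame:

```python
import numpy as np

def relative_pose(M_a, M_b):
    """T_Ma->Mb: pose of marker b in marker a's reference system."""
    return np.linalg.inv(M_a) @ M_b
```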

Siltanen, Hakkarainen and Honkamaa in [5] succeed in building the markerfield while the system shows the AR. Their method collects and refines the marker relationships every time they can be measured. The resulting relation for each pair of markers is the weighted mean of all the measured relations.

Figure 2.10 shows how the system computes the weighted mean of two different relation measurements. At position φ_1 the system gets the measurements M_1^{φ1} and M_2^{φ1}, and it can compute the relation (T_{M1->M2})^{φ1} = (M_1^{φ1})^{-1} M_2^{φ1}. Then, when the camera is at position φ_2, the system gets the measurements M_1^{φ2} and M_2^{φ2} and computes (T_{M1->M2})^{φ2} = (M_1^{φ2})^{-1} M_2^{φ2}. The relation used to build the markerfield is a weighted mean of (T_{M1->M2})^{φ1} and (T_{M1->M2})^{φ2}.

Figure 2.10: Marker relation mean computation

The authors of the paper base the weight of each measurement on a confidence value denoting the probable accuracy with which a marker pose has been measured. The paper assumes the ARToolKit framework is used to detect the marker poses; this tool also provides a confidence value for each pose detection performed.

To detect the poses of the fiducial markers, we used the Vuforia tracking system. The reason for this choice is explained further in section 2.5.

Unfortunately, the Vuforia tools do not provide any type of confidence value to the users.

Since we need a confidence value to compute a weighted average and build a better markerfield, we must use another method to get the confidence value of every marker pose detection.

In [6], [7] and [8], researchers use statistics to study the relation between a fiducial marker's pose and the probable error in its detection. The results lead to a function that computes the most probable absolute error of a pose detection, starting from the measured pose.

Using this function, it is possible to derive a confidence value from the pose computed by Vuforia and build the markerfield with weighted measurements.
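Neither the papers nor this chapter fix a specific averaging formula at this point, but as a rough sketch, two weighted relative-pose measurements can be blended by averaging the translations linearly and interpolating the rotations with Slerp (the same idea as the midway quaternion of figure 4.3):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def blend_relations(T1, w1, T2, w2):
    """Weighted mean of two 4x4 relative poses, weighted by confidence."""
    t = w2 / (w1 + w2)                    # fraction assigned to the second measure
    rots = Rotation.from_matrix(np.stack([T1[:3, :3], T2[:3, :3]]))
    R = Slerp([0.0, 1.0], rots)(t).as_matrix()
    out = np.eye(4)
    out[:3, :3] = R
    out[:3, 3] = (1 - t) * T1[:3, 3] + t * T2[:3, 3]
    return out
```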

All these papers agree on the fact that the error in pose computation increases when the area of the marker's projection in the captured image decreases.

Figure 2.11: Error standard deviation of pose detection at various acquisition points. (Source: Accuracy in Optical Tracking with Fiducial Markers: An Accuracy Function for ARToolKit.)

Figure 2.11 shows the standard deviation of the measurement error collected at different acquisition points.

The work in [8] makes a further distinction between these types of errors: error in rotation and error in position. The authors analyse how the error and the standard deviation of the measurements change when the marker orientation changes with respect to the camera.

Errors in yaw, roll and pitch detection vary in different ways depending on the marker orientation. Summing up, yaw is almost always detected well, while errors in roll and pitch detection can occur when the marker plane is nearly parallel to the image plane.

The same happens when a human being estimates the marker rotation by eye, due to the perspective projection.

For instance, assume there are two markers in front of a camera (figure 2.12): the plane of the first marker (a) is almost perpendicular to the projection plane of the camera, while that of the second marker (b) is almost parallel to it. If both markers are tilted by the same number of degrees, the positions of the first marker's corners in the captured image change a lot, while the corners of the second one move very little.

Figure 2.12: Differences of projections depend on marker tilt.

This means that a small change in the corner positions can be caused by a large rotation when the marker plane is almost parallel to the camera projection plane. Hence, the tracking system perceives the marker pose with better accuracy when its plane is not perpendicular to the view direction.
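A small numeric check of this effect, using a simplified pinhole projection of a single marker corner; the focal length, camera distance and marker half-size below are assumed values.

```python
import math

f, Z, s = 800.0, 1.0, 0.05   # focal length (px), distance (m), half-size (m)

def corner_u(theta):
    """Horizontal pixel coordinate of a marker corner after rotating the
    marker by theta about a vertical axis through its centre."""
    x = s * math.cos(theta)        # lateral offset of the corner
    z = Z + s * math.sin(theta)    # depth offset of the corner
    return f * x / z

tilt = math.radians(10)
# Marker facing the camera: 10 degrees of tilt moves the corner ~1 px.
print(corner_u(tilt) - corner_u(0.0))
# Marker nearly edge-on: the same tilt moves the corner ~6.6 px.
print(corner_u(math.pi / 2) - corner_u(math.pi / 2 - tilt))
```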

Regarding the error in position, three coordinates describe the marker position with respect to the camera. The authors split them into two groups: X and Y, corresponding to the axes parallel to the image plane, and Z, corresponding to the axis perpendicular to that plane (the Z coordinate is approximately the marker's distance from the camera).

The error on the X and Y axes is in every case close to zero, whereas the error on the Z axis increases as the marker projection area decreases, as the other two papers also describe.

2.2.4 Tracking by feature points

In any visual tracking system, the system must be aware of the real position of a set of points in the workspace, and must compute the pose by minimizing the reprojection error. There are techniques that allow the tracking to be performed using reference points naturally present in the environment.

This problem is very similar to the SLAM problem known in the robotics community. SLAM stands for Simultaneous Localization And Mapping, and it concerns the self-localization of robots in an unknown environment; Durrant-Whyte in [43] gives a thorough description of the problem and of the methods to solve it.

Scientists have created many solutions, but some of them require several types of sensors, such as ultrasound sensors, that are not available on portable devices. Therefore, the only solutions of interest for this study are those using the camera sensor.

Most of these solutions use the Structure-from-Motion methodology to perform extensible tracking. They construct, little by little, a workspace map that contains the positions of the environment's reference points, called landmarks.

The map contains the coordinates of every landmark whose position the system knows with respect to the map reference system. For every image acquired from the camera, the system tries to compute the pose of the map; if it succeeds, it then collects all the information about new landmarks that is needed to insert them into the map. If the system knows the positions of a group of landmarks in some three-dimensional reference system (the map, in this case) and locates them in an acquired image, it can use geometric methods to compute the camera pose with respect to that same reference system, just as happens with tracking through fiducial markers. Then, once the system knows the camera pose, it can constrain the possible positions of the visible landmarks which are not yet in the map.
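This geometric step is the classic perspective-n-point problem, for which libraries offer ready solvers. A minimal sketch with OpenCV is shown below; the intrinsic matrix K and the landmark/pixel correspondences are placeholder values.

```python
import numpy as np
import cv2

# 3D landmark positions in map coordinates and their detected pixels.
object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=np.float64)
image_pts = np.array([[320, 240], [420, 238], [424, 342], [318, 344]], dtype=np.float64)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# rvec/tvec give the map's pose in the camera frame, found by
# minimizing the reprojection error internally.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
```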

In other terms, the system determines the rays starting from the camera focal point and passing through all the points where the landmarks could be, using only the information obtainable from the currently acquired image. When more than one ray is available for the same landmark, the system can compute its real position, which is the intersection of all the rays referring to that landmark.
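Because of noise, the rays almost never intersect exactly, so a common choice is the midpoint of their closest approach. The following sketch, our illustration rather than a specific paper's method, computes it for two rays expressed in map coordinates.

```python
import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between two rays.

    o1, o2: camera centres for the two observations;
    d1, d2: unit ray directions through the landmark's image positions.
    """
    b = o2 - o1
    c = d1 @ d2
    denom = 1.0 - c * c
    if denom < 1e-12:          # near-parallel rays: depth is unreliable
        return None
    t1 = ((b @ d1) - c * (b @ d2)) / denom
    t2 = (c * (b @ d1) - (b @ d2)) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))
```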

The method is called Structure from Motion because it allows the real structure of a workspace to be recovered using information acquired from different points of view. This makes extensible tracking possible, because it enlarges the mapped workspace area step by step wherever possible. Figure 2.13 shows how the Structure-from-Motion methodology manages to reconstruct the structure of a cube from images taken from several points of view.

Figure 2.13: Cube structure built through images taken from several points of view. (Source: www.researchgate.net/figure/Structure-from-Motion fig1 234044400)

To allow tracking, a landmark must be easily localizable, fixed in the workspace and distinguishable from the others in the environment.

Examples of proper landmarks are object corners whose colour differs markedly from the background. They maintain their position in the workspace, and it is easy to match the corners pictured in one image with the same corners pictured in another image acquired from a different point of view.

The points suitable to serve as landmarks in the map are called feature points. There are different techniques to detect them in an image, explained in more detail in section 2.3.
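As a quick illustration, detecting candidate feature points with OpenCV's FAST detector (discussed in section 2.3.1) might look as follows; the image file name and threshold are placeholders.

```python
import cv2

img = cv2.imread("workspace.jpg", cv2.IMREAD_GRAYSCALE)
fast = cv2.FastFeatureDetector_create(threshold=25)
keypoints = fast.detect(img, None)       # candidate feature points
out = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("features.jpg", out)
```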

Tracking through feature points seems to be a very powerful technology: neither fiducial markers nor any other artifact is needed to keep tracking, only an environment with natural landmarks. Unfortunately, this is no longer true if the extensible tracking is used to provide augmented reality.


If no other tool is available, the system has to build the initial map using only the information obtained from the captured images. There are procedures, for example the one described in [18], that initialize a map from the landmark positions in two different images. But the system has no way to perceive the real dimensions of the map, because it does not know the real distance between the points where the two images were acquired. This leads to building a map that is coherent but has an unknown scale factor. Since the map extension proceeds from the already scaled map points, the new ones are inserted with coordinates scaled by the same ratio. The final result is therefore a scaled map which allows the camera rotation with respect to the environment to be tracked correctly, but does not allow the real distances of the camera displacements to be tracked. This problem prevents augmented reality for our purpose from being provided using natural feature points alone.
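The ambiguity is easy to verify numerically: scaling every map point and every camera translation by the same factor leaves all image projections unchanged. A short sketch, with an assumed intrinsic matrix K:

```python
import numpy as np

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
R, t = np.eye(3), np.array([0.1, 0.0, 2.0])   # some camera pose
X = np.array([0.3, -0.2, 1.0])                # a map landmark

def project(X, R, t):
    p = K @ (R @ X + t)
    return p[:2] / p[2]

k = 3.7                                       # unknown scale factor
print(project(X, R, t))                       # same pixel coordinates...
print(project(k * X, R, k * t))               # ...after scaling the whole map
```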

Precise registration of the augmented objects in the workspace is therefore not possible: since the map scale factor is unknown, the system cannot obtain the real distance of the environment from the camera, and hence cannot draw the augmented objects with the correct size and in the right position.

We need a system able to draw technical information in the right place with high precision. Hence, for a system that tracks using feature points, we need a tool that allows the real map scale factor to be computed.

Sometimes the acquired images do not depict the real world correctly, because for technical reasons they are affected by noise. This is a serious problem: a small error in the landmark position detected in an image leads to extending the map with new landmarks in the wrong positions.

To solve this problem, scientists have adopted probabilistic methods to construct the map. Two examples of these methods are EKF-SLAM [34] and FastSLAM 2.0 [35]. They are more robust against the uncertainty introduced by the noise: they merge the current sensor measurements (in our case the landmark positions in the images) with the previous pose estimates through appropriate computations, giving a better estimate of the current pose. In other terms, the probabilistic methods build and work with maps that contain the expected landmark positions.

Figure 2.14: Natural edge detection. (Source: Edge Landmarks in Monocular SLAM - Ethan Eade and Tom Drummond)

To merge this information, EKF-SLAM uses the Extended Kalman Filter, while FastSLAM 2.0 uses the Particle Filter algorithm. FastSLAM is more recent and more efficient than EKF-SLAM.
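The core idea can be seen in a deliberately simplified one-dimensional Kalman update, where the fused estimate weights the prediction and the measurement by their variances; the real filters operate on full poses and landmark vectors.

```python
def kalman_update(x_pred, var_pred, z, var_meas):
    """Fuse a predicted landmark coordinate with a noisy measurement.

    The gain moves the estimate toward whichever source is more certain,
    and the fused variance is smaller than either input variance.
    """
    gain = var_pred / (var_pred + var_meas)
    x = x_pred + gain * (z - x_pred)
    var = (1.0 - gain) * var_pred
    return x, var

# Confident prediction, noisy measurement: the estimate barely moves.
print(kalman_update(1.00, 0.01, 1.30, 0.25))   # ~ (1.01, 0.0096)
```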

Many extensible tracking systems have been developed; [38] and [40] are further examples of extensible tracking that use the Kalman filter. In [39] Jiang et al. propose a method to use the lines of the environment as landmarks too: a special procedure detects the sets of edges that form the lines and inserts them directly into the map. Figure 2.14 shows a system detecting the edges naturally present in the environment.

Another method that uses edges as landmarks is proposed by Eade and Drummond in [44].

Subbarao et al. in [41] do not use the Kalman filter but a special pose estimator that reduces the drift generated by the map extension.

2.2.5 Tracking by CAD models

A CAD model is an approximate 3D representation of a group of solid objects, and it can also be used to describe an environment. Systems that track using a CAD model must have the structure model of most of the workspace available in advance.

Figure 2.15: Pose detection using a truck CAD model. (Source: https://www.jvrb.org/past-issues/4.2007/1159)

Using computer vision techniques, the system detects the positions of the edges in the captured images and finds a correspondence between them and the model edges. As usual, using the minimization of the reprojection error, the system computes the workspace pose with respect to the camera.

If the environment model is designed with real measurements, the system can perceive real distances and draw the augmented reality at precise points. It would also be possible to use a CAD model together with an extensible tracking system: the starting CAD model provides the real camera position while the extensible tracking system builds the map. In this way the map would be constructed at real scale, and the system would continue to extend it maintaining the same scale. Knowing the map scale factor, it is possible to register the augmented objects correctly over the whole workspace. This idea is exploited by [42]. Figure 2.15 shows the overlay of a CAD model (red lines) on an image taken by the camera to perform camera pose detection.

However, this approach has practical drawbacks: in a factory, the knowledge of a technician does not necessarily include 3D object modelling, and since the physical structure of the workspaces can change, the CAD models may need maintenance.

Genc et al. in [37] propose a tracking method that builds a model of the workspace using machine learning techniques, but it needs an external tracker to be initialized.

A more practical solution is to use a fiducial marker as an object of known dimensions: like the CAD model, it allows the map scale factor to be obtained, but no further modelling work is needed.
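The scale factor then follows from a single comparison between a distance measured in map units and the same distance in the real world. A minimal sketch, assuming the marker's physical side length is known:

```python
import numpy as np

def map_scale_factor(corner_a_map, corner_b_map, marker_size_m):
    """Real-world metres per map unit, from one marker of known size.

    corner_a_map, corner_b_map: two adjacent marker corners expressed
    in (scaled) map coordinates; marker_size_m: their true distance.
    """
    map_len = np.linalg.norm(np.asarray(corner_b_map) - np.asarray(corner_a_map))
    return marker_size_m / map_len
```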

2.2.6 Camera calibration

Every visual tracking system computes the camera pose by exploiting knowledge about a set of points in the world and their positions in an image captured by the camera.

This is done by searching for the parameters that minimize an error function, and it involves the basic rules of perspective geometry.

The ideal camera is a device that provides images representing the perspective projection of the environment. This model, called the pinhole camera and previously described in section 2.2.1, is only an abstraction. Indeed, due to the lens shape, the images often do not correspond to a perfect perspective projection: they can be affected by a radial distortion and, since each camera has its own constructive details, each distorts the images in a different way.

To perform tracking optimally, the system needs a perfect perspective projection. For this reason it needs a procedure to transform the distorted acquired images into their perfect perspective projection.

First of all, to obtain undistorted images, the system must know how the acquired images are distorted; then it can invert the distortion and obtain a useful image.

The distortion generated by the camera can be described using a set of values, called the intrinsic parameters of the camera.

The main parameters are the focal length, expressed in pixels along the two image axes; the principal point, where the optical axis meets the image plane; and the distortion coefficients, which model the radial distortion of the lens.
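With OpenCV, for instance, the inversion is a single call once the intrinsic parameters have been estimated by a calibration procedure; the values of K and of the distortion coefficients below are placeholders.

```python
import numpy as np
import cv2

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
dist = np.array([-0.25, 0.07, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

frame = cv2.imread("frame.jpg")
undistorted = cv2.undistort(frame, K, dist)     # approximates the ideal pinhole image
```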
