
Design of the control system

4.3 Real-World experiments

4.3.4 Track design

To carry out real-world experiments with Cozmo, we needed to build a track designed specifically for the robot's dimensions. We therefore spent some time looking for the most reliable way to build a path on which Cozmo could be trained efficiently.

First, we wanted an easily transportable track, so that we could run attempts in different locations and under different environmental conditions that could affect the training phase. Furthermore, we noticed that reflections might influence the learning process, so the track surface had to be as non-reflective as possible. Our first choice was the black floor of the Data Science laboratory of Eurecom. It was useful only during the initial design and development of the control system, when we built small pieces of track on which to test functionality. This solution had significant drawbacks: it could not be transported, and it reflected a great deal of light.

After this first attempt, we analysed different solutions for the track. The following list briefly reports the solutions considered during the thesis, together with their advantages and drawbacks.

• Covering fabric: this material is easily transportable, but it reflects a lot of light, and it tends to form wrinkles and bumps that are particularly hard to remove.

• Tar paper: this solution slightly reduced the reflection problem compared to the previous choice, but the material was fragile and had the same drawbacks as the covering fabric.

• Cotton fabric: this solution offers an easily transportable material with reduced light reflection, from which wrinkles and bumps are easy to remove.

Figure 4.5. Picture from above of the track we designed and built to carry out experiments with the Anki Cozmo robot. The route is a mixed track that includes two straights, a narrow hairpin and a wide one that, joined by an elbow curve, make up a serpentine. The track length is about 3 metres.

In summary, the cotton fabric provided the best trade-off among the requirements listed above.

Another crucial factor was the width of the lane, which had to reproduce an environment similar to real driving. We based it on the ratio between the size of a vehicle and the width of a road, taking the width of a typical family car and the average width of a lane on a typical Italian motorway: about 160-170 cm for the former and between 275 cm and 375 cm for the latter. Cozmo is about 5.5 cm wide, which gives a scale ratio of about 1/30. Accordingly, the scaled lane width was between 9 cm and 12.5 cm. The other component to take into account was the material used to draw the lane: we used a simple paper tape 2.5 cm wide. Because of the narrow field of view of Cozmo's camera, we opted for a total width of about 10 cm; the width of the road changes slightly along the path, emulating what can happen on a real road. We chose this dimension because placing the tape lines more than 10 cm apart would leave a significant part of the tape outside the camera's view. A picture of the track is shown in fig. 4.5.
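The lane scaling described above can be reproduced with a few lines of arithmetic. The following Python sketch is only illustrative: the function and variable names are hypothetical, and it assumes that the car width is taken as the midpoint (165 cm) of the 160-170 cm range, which is the value consistent with the 1/30 ratio reported in the text.

```python
# Measurements quoted in the text (centimetres).
CAR_WIDTH_RANGE_CM = (160.0, 170.0)   # typical family car
ROAD_LANE_RANGE_CM = (275.0, 375.0)   # typical Italian motorway lane
COZMO_WIDTH_CM = 5.5                  # measured robot width


def scale_factor(robot_width_cm, car_width_cm):
    """Ratio between the robot width and the real vehicle width."""
    return robot_width_cm / car_width_cm


def scaled_lane_range(lane_range_cm, factor):
    """Scale the real lane width range down to robot size."""
    low, high = lane_range_cm
    return low * factor, high * factor


# Assumption: use the midpoint of the car width range (165 cm).
car_width_cm = sum(CAR_WIDTH_RANGE_CM) / 2
factor = scale_factor(COZMO_WIDTH_CM, car_width_cm)
lane_low_cm, lane_high_cm = scaled_lane_range(ROAD_LANE_RANGE_CM, factor)

print(f"scale ratio: 1/{1 / factor:.0f}")
print(f"scaled lane width: {lane_low_cm:.1f} cm to {lane_high_cm:.1f} cm")
```

Under this assumption the scaled range is roughly 9 cm to 12.5 cm, so the chosen total width of about 10 cm falls inside it.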

Chapter 5 Experimental results

In the previous chapters, we described the reinforcement learning control system we designed, together with an analysis of the solutions we proposed for the problems we faced during development. This process was not free from difficulties, both in implementation and in parameter optimisation. After completing the design of this architecture, our second goal was to look for an algorithm that could adapt better to a real-world context, overcoming the limits imposed by DDPG's hyper-parameter tuning. The ideal would have been a parameter-agnostic algorithm, that is, one able to achieve excellent performance regardless of the specific configuration and hyper-parameter selection. During our research, we came across the SAC algorithm. After a careful analysis of the paper and of the authors' considerations about real-world applications of SAC, we thought it might be the right choice to obtain better performance than in the DDPG experiments. For this reason, this chapter presents a detailed comparison between the DDPG and SAC experiments carried out on the Pendulum-v0 environment and shows the performance achieved by SAC in the experiments with Cozmo.

The first section of this chapter focuses on the experimental methodology. It describes the hardware of the development machine we used for the experiments and presents, in a more schematic and precise way, the OpenAI Gym environments to which we apply the algorithms. We use the plural because we report both the experiments performed on the Pendulum-v0 environment and those carried out with Anki Cozmo in the real world using the architecture we built. This part also contains a brief analysis of how reinforcement learning experiments are currently assessed. For this segment, we took inspiration from [24], a publication in which the authors investigate reproducibility challenges, proper experimental techniques, and reporting procedures in modern deep reinforcement learning, in order to draw up guidelines for better reports, focusing less on the results themselves than on how they are reported.

The second and third sections of this chapter are devoted, respectively, to the two types of environments used. We show the relevant graphs in order to analyse and comment on the results obtained.
