
Crowdtesting for chatbots: a market analysis. Empirical activity with App-Quality srl



POLITECNICO DI MILANO

School of Industrial and Information Engineering

Master of science in

Management Engineering – Digital Business and Market Innovation

A.Y. 2016/17

CROWD TESTING FOR CHATBOTS: A MARKET ANALYSIS

Empirical activity with App-Quality srl

Supervisor:

Prof. Filippo Renga

Master thesis by:

Arcangelo Trani


Part 1

Working on the internal assets: Crowd & Platform

1. Company and service overview
1.1 Crowdtesting
1.2 Testing workflow
1.3 Testing solutions
2. Problem analysis and Plan of action
2.1 Domain of the analysis
2.1.1 The platform
2.1.2 The crowd
2.2 IT infrastructure: the Database
2.3 Problem overview
2.4 Solution overview
3. Implementation of the solution
3.1 Tool utilized: Microsoft Power BI
3.2 Database set-up
3.3 Analysis, correction, integration of the as-is (back-end) + Internal process organization
3.3.1 Data cleansing/correction
3.3.2 Feeding techniques
3.3.3 Attributes correction/integration
3.4 Analysis, correction, integration of the as-is (back-end) + External feature for customer
3.5 Creation of a new service: dashboard (front-end)
3.5.1 Pre-test crowd visualization
3.5.2 Post-test results visualization
4. Results and conclusion

Part 1+2

Integrating the internal assets with the market opportunity: crowd selection by chatbot related attributes

1. ER diagram for chatbot testers attributes integration


Part 2

Working on a potential market: Chatbot

1. Bot introduction
2. Variables
2.1. Development Back-end
How to develop your chatbot
How to let people discover your chatbot
How to sell your chatbot
Agency
Must have characteristics for an agency
Selling preparation phases of the chatbot followed by an agency
After sales service
Development platforms and Bot frameworks
Types of bots by developing technology
1. Interpretation technology (understanding)
2. Response technology (replying)
3. Knowledge base
4. Handling the conversation
5. Designing the dialogue
2.2. Consumption Front end
How we conducted the analysis
Types of bots by level of interaction with user
User platforms
Chatbot categories per platform
All the platforms
Alexa
Messenger
Telegram
Slack
Skype
3. Chatbots analysis by sector
3.1 Bank and Insurance sector
3.2 Media sector
3.3 Utility sector


Abstract

This work is the result of an empirical activity conducted with an Italian start-up based in Milan, well established in Italy but aiming for a top spot in the international field of software testing solutions. It is an innovative start-up offering crowdtesting solutions through two main assets: the crowd of real end users working as testers and a proprietary platform where enterprise customers hire the crowdtesters to conduct testing campaigns on their own software.

In Part 1 of the thesis we focus on the internal improvement of the two main assets, the crowd of testers and the platform. Testers' personal data and the data of the devices registered on the platform are stored in an internal database. We initially worked on the back-end to correct existing errors in the database, due to misunderstandings, low attention and laziness of testers during the registration phase; we then added new attributes where necessary and set up new foolproof feeding techniques to prevent the recurrence of the same mistakes in the future. After setting up the database on a third-party tool (we did not work directly on the company infrastructure: we extracted the data from the real database and created our own ER diagram with tables and attributes), we used the corrected and integrated infrastructure to create a front-end dashboard for crowd selection by demographics and device characteristics. We also added to the database a table with the results of a particular testing campaign and created the relative dashboard, allowing a user to interact with the results of those kinds of campaigns that, since no bugs are submitted due to the nature of the test, do not provide the customer with an interactive dashboard in his/her personal space on the platform.

Passing from Part 1 to Part 2, the focus shifts from an internal to an external perspective, while trying to turn the external opportunity into a potential internal strength of the proprietary platform: we created a new database table with tester attributes coherent with the requirements and needs of the fast-growing market of chatbots. The attributes of the database were fed with fake data assigned to each tester, so as to create a demo dashboard for tester selection by a customer requiring certain characteristics and willing to test his/her own chatbot with the proper end user.

In Part 2 of the thesis we focus on the analysis of the chatbot itself and its market. A chatbot is defined by some peculiar characteristics determined both during the development and the consumption phases: when developing a chatbot, a company can choose among many different agencies specialized in chatbot development, several development platforms and diverse development technologies; when interacting with the user, the chatbot can reach different levels of interaction, be available on multiple platforms and perform actions solving several issues depending on the category of application. An in-depth literature review of a still highly unstructured theoretical background was conducted to describe the variables related to the development phase; a statistical analysis of the actual chatbot market was conducted to define the characteristics of around 5,000 chatbots and their level of interaction with users, the consumption platforms on which they work and their categories of application. Finally, looking at three main Italian sectors (Utility, Media, Bank & Insurance), we identified the existing chatbots and interacted with each and every one of them to understand the progress of the actual chatbot market.


Part 1


1. Company and service overview

1.1 Crowdtesting

App-Quality is an innovative Italian start-up offering software application testing that leverages the power of the crowd. The testing phase is executed by letting real users interact with the application, discovering bugs and bad experiences.

Traditionally, the testing phase has been conducted in a laboratory or through automated tests, often resulting in unexpected malfunctions or generally bad experiences, because the point of view of a real user differs from that of a team of testers in a laboratory.

For a software company, the advantage of using real end-users for testing rather than an internal team lies in both quality and quantity of the testing campaign:

- If conducted in a testing lab, you can cover around ten devices per sprint of testing; with a session of crowdtesting you can easily reach fifty devices, addressing the problem of device diversification and catching malfunctions on devices that would not have been tested in the laboratory

- The testing phase in a laboratory can be time pressured and available only in predefined windows of time; crowdtesting offers the possibility to engage testers 24/7, thanks to the great number of testers and their availability whenever they are required

- It can take two weeks to cover all the possible scenarios because of the limited time and resources dedicated to the testing phase; crowdtesting covers several scenarios in a very short time thanks to the high number of devices and users available at the same time

A crowdtesting company provides the customer with three main services:

- The crowd: it is the community of testers registered on the platform, available 24/7 to test software with a great variety of devices, operating systems and demographic characteristics. The company itself is in charge of their "education" through a university of testing, where they are kept updated about the testing modalities, the good practices and the rewards associated with good performance

- The platform: the platform is two-sided; testers and customers can both register. When a test project is activated, the customer can follow the evolution of the testing campaign in real time, while the tester, if selected for the campaign, can open his personal dashboard to enter the testing campaign, follow the guidelines to successfully conduct the test and earn his experience points and rewards

- The quality/project manager: he follows the project from the very beginning to its closure. He engages directly with the customer to find the kind of test best fitting the application, and together they create the use cases the testers will go through when conducting the test. He opens the project on the dashboard and is the first contact during the entire length of the test. He also reviews the uploaded bugs to approve them, reject them or ask the tester for more information. At the end of the project, having had all sides of the testing campaign in his hands, he can draw up a final report to be delivered to the customer

1.2 Testing workflow

A crowdtesting campaign is organized in four main phases:

1) Definition of testing constraints

In this phase the project manager, together with the customer, defines the domain of the project that is about to start and sets the objectives of the test.

They create use cases and scenarios in which the software will be tested, so that testers, when engaged in the test sprint, follow exactly the pre-defined use cases and all the bugs and malfunctions are found exactly where the customer is looking for them.

2) Selection of the crowd

The customer is interested in letting the app be tested by target users, in order to create a product that has already been tested and judged by the typical user. Releasing on the market a product that you are sure will satisfy the end user from a usability perspective is a strong incentive to accurately select the crowd of testers.

In this phase the crowd manager is in charge of keeping in touch with his own crowd to set up the date of the test, the types of devices and operating systems and the correct mix of devices required by the customer (smartphones, tablets, laptops, wearables)

3) Testing phase

During the testing phase, the project manager keeps the evolution of the test under control, accepting or rejecting the bugs uploaded by testers and helping them identify a high number of quality bugs.

Customers can manage in real time the uploaded bugs directly from their platform and get statistics about devices and types of bugs.

4) Presentation of the results

The final results are always available to the customer on his own dashboard on the platform, but the project manager provides an interpretation of the results and gives the customer insights about the problems that arose from the beginning to the end.

A report is available for customers to be downloaded directly from the dashboard.

1.3 Testing solutions

The workflow of testing is always the one defined above, but the type of test depends on the needs of the customer and is arranged during phase 1, when constraints are set.

There are two main business areas for which customers can conduct a test:

1) Quality assurance

It makes sure that the software works properly, with no faults. It focuses on the identification of bugs in order to develop bug-free software. Errors can appear, or the application can even stop working. It is possible to find malfunctions that prevent the user from making the application work as expected.

- Test cases bug finding: the project manager and the customer create personalized use cases to be followed by testers during the campaign. Users go through the test cases one by one and, every time a defect arises, they fill in the form and upload the issue, proving it with screenshots and video recordings of the screen and, if required, also logs

- Test reproducibility: there are bugs already known on applications that are difficult to replicate and probably depend on the devices and operating systems utilized. In this way it is possible to identify all the cases in which the bug can be replicated and solve it precisely.

- Geo testing: if an application is available in several countries, to make sure that it works properly in each of them, a diversified crowd of testers from different geographical regions is engaged to perform the test, and each tester evaluates the correct functioning of the application in his own country.

- Fast test prototype: during the initial phases of development the final product is still far away, but you would probably like to interact with your target user to understand whether you are going in the right direction. A controlled field is set up where a group of target users is asked to try out the prototype and give their feedback.

- Performance testing: measuring the performance of apps or websites using a third-party tool like JMeter or BlazeMeter, the ability of the software to handle a growing number of users is evaluated, also forecasting the number of users that would prevent the software from working correctly.

- Bug arena: the app or the website is left in the wild to users in an arena they can access whenever they want; when using the software and finding something wrong, they upload the issue on the platform (only if not already reported by another tester).

2) Usability and User experience

It makes sure that the application runs as expected by users. Even if an application has no bugs, a user can be frustrated by the use of the software. Users can find it difficult to fulfil certain actions, or they may be used to performing the same activity in a different way and so do not feel comfortable acting as the application requires. Navigation paths, contents and images can leave the customer unsatisfied with the experience. Users can be required to fill in surveys or record videos during the utilization of the software, to get precious insights about the real actions of users when engaged with the software.

There are seven types of test it is possible to run under this business area:

- Rating preview: it is possible to forecast the rating you would get if you released the software on the store in its current condition.

- User screencast: testers record a video + audio "interview" where each step in the utilization of the app is described out loud. Testers express their expectations and frustrations about the usability and the functioning of the application.

- User experience and User interaction study: thanks to the support of a user experience manager, testers are required to answer some questions while using the application, and the level of confidence they gain when interacting with the application and their level of satisfaction with the actual experience of usage are evaluated.

- Apps benchmark: a group of testers is required to compare and evaluate applications against each other, typically the applications of the main competitors. Thanks to a survey it is possible to draw up a ranking of the applications by level of preference, usability, available features and so on.

2. Problem analysis and Plan of action

2.1 Domain of the analysis

The key factors for the success of the company are a functional exploitation of two fundamental assets: the platform and the crowd. Both will be briefly presented in this first chapter and represent the domain of action of this work.

The crowd, intended as the database with all the data about the testers registered on the platform, will be the subject of our back-end implementation; we will clean the actual database, integrate it with new attributes and create a Database infrastructure (Entity-Relationship diagram) allowing us to exploit the implemented solution for the creation of a new front-end service on the platform.

The platform, intended as the place where testers log on to take part in a testing campaign and customers log on to visualize the bugs uploaded by testers, will be the subject of our front-end implementation; in fact, we will provide the customer with an easy-to-use interface for pre- and post-test analysis (a "test"/"testing campaign" is the phase in which a software is available on the platform and tested by the crowd to identify defects and/or usability issues):

- pre-test service: dashboard for crowd/device selection;

- post-test service: dashboard for interacting with the results of testing campaigns that do not ask testers to submit bugs;

2.1.1 The platform

The platform offers a personal dashboard to its users and can be considered two-sided: customer dashboard and tester dashboard.

- Tester dashboard: testers can register and, by providing their demographics and devices' information, they become part of the existing crowd. In order to start testing, and so finally be rewarded, they are required to take an entry test, which proves their ability to go through a typical testing campaign. They are provided with a section of the dashboard called "University" where they learn all the tips to become skilled testers able to, first, correctly perform the test and, second, identify high-quality defects.

- Customer dashboard: customers can register as well and gain access to real-time test monitoring, showing them all the bug statistics about the running test and letting them download the final report with all the defects that arose and the analysis of the results conducted by the work manager.

2.1.2 The crowd

The crowd is composed of a huge variety of testers who autonomously register on the platform to learn the basics of testing, or just to improve their existing skills and be rewarded by performing in real test campaigns and finding defects in customers' apps. Only a restricted number of testers is selected for each campaign. They are selected on the basis of three parameters: demographics' attributes, devices' attributes, and testing abilities shown in previous campaigns.

2.2 IT infrastructure: the Database

Once the crowd has been recruited and is waiting for the next campaign to start, it "just" represents a database of information the company can leverage to offer the customer a group of testers and devices. Customers are interested in the crowd database for setting up a testing campaign from two different perspectives: demographic characteristics and device characteristics. They usually look for:

- testers representing exactly the target end-user of the application, in order to get precious insights about the pains and errors real users face during typical use of the app;

- devices covering a huge number of alternative solutions in terms of manufacturer, OS and technical features, in order to cope with the typical problem of fragmentation of the devices used by end users in their daily engagement with the software.

During the registration phase a tester is asked to fill out several fields: in this way he creates a new tuple (one per tester) in the database, and each field represents the value assumed by an attribute of the company database.

Columns are our attributes, rows are our entities; every time a tester registers a new device on the platform, a new row is created in the database with all the related demographic and device data.

Below is the list of attributes a tester is required to fill out during registration. In the original spreadsheet they are presented in two columns but, obviously, each of them represents a single column of the database.

Tester ID, spoken_lang, name, booty, surname, payment_status, total_exp_pts, pending_booty, is_verified, address, last_login, address_number, sex, qualification, education, u2b_login_token, birth_date, user_login, Acc. Subscription, user_email, Subscription, device_id, wp_user_id, form_factor, employment, manufacturer, city, os_version, postal_code, operating_system, phone_number, enabled

They represent the attributes of the table we will refer to under the name “Full_data_crowd” in our Entity-relationship diagram.

Just like the attributes asked of testers during registration, there is a table that is generated during the testing campaign, when testers submit their bugs. Some fields are directly typed by testers (i.e. step-by-step description of the bug), others are introduced by the Work manager (i.e. status of the bug, whether accepted or refused), and others are automatically generated when creating or editing the testing campaign on the platform (i.e. id, bug id).

id, note, bug #id, tester id, title & description, tester email, step by step description, campaign_id, expected results, version_id, actual results, status_reason, status, updated, bug category, os, bug severity, os_version, time, manufacturer, occurance, model, created, application_section, Edited By, media1

These presented above are the attributes that populate our database, but let us see the structure the data have when they are extracted from the database:

- Data about the crowd (devices and demographics) can be extracted from the admin page of the platform as a single spreadsheet in csv format. This is a very useful function to quickly navigate through the data, but it is not suitable for our work: we need an Entity-Relationship structure of tables linked through primary and foreign keys to be able to analyse and correct the corrupted data, implement new attributes and feeding techniques, and exploit the clean data for the creation of the final dashboard.

Below is the spreadsheet directly downloaded from the admin page of the platform that represents the starting point of this work:

34 attributes (columns), 5992 rows. This was the amount of data at the beginning of the work; during the phases of analysis, correction, implementation of new attributes and creation of the dashboards, new testers joined the platform and the numbers further increased.

- Data about the testing campaigns are available for download from the platform, in a single spreadsheet (one per campaign) with all the information about each testing campaign, concluded or in progress. In this spreadsheet the focus is obviously on the defects identified by testers and, of course, there is a relation between each single defect and the relative tester/device associated with it.

Below is the spreadsheet with the data about a testing campaign:

24 attributes (columns) and a variable number of rows depending on the number of defects identified by testers in each testing campaign.

The starting point of our work is not a database, just two excel spreadsheets extracted from the real database.

2.3 Problem overview

We will consider the asset "Crowd" from a back-end perspective of database analysis, correction and integration, and we will then exploit the solution adopted in the back-end to improve the quality of the service offered through the asset "Platform", creating a front-end dashboard for crowd selection and for the visualization of the results of tests with no bugs, which is currently not available to the customer.

The Db is made up of attributes describing the demographics of each tester (language, city, etc.), the characteristics of their devices (manufacturer, OS, etc.) and the results they achieved when engaged in testing apps on the platform.

A first preliminary problem to solve is the creation of our own database with primary and foreign keys because, as we saw when talking about the database, what we actually have is not a database but two excel spreadsheets: one with the data about the characteristics of the crowd (demographics and devices), the other with the data about the results of a testing campaign. After that, starting from the set-up database, an analysis of the data can be conducted to understand which kinds of problems we are going to solve on the actual data and which functional implementations we will add to the actual database, before finally creating a dashboard to exploit the data.

After the set-up and the analysis of the as-is situation, a few problems came out:

1) The feeding techniques of the Database are not properly implemented: testers can input wrong data, not homogeneous with the others, sometimes leaving fields blank, etc.;

2) As a consequence of the first problem, the values submitted by testers are not homogeneous, are wrong or left blank and need to be corrected;

3) The Database does not have a user interface for the analysis and visualization of data, both for internal and external purposes (i.e. visualization of data and statistics about crowd characteristics; visualization of results from particular campaigns of testing without bugs: the customer dashboard, in fact, allows only the visualization of statistics related to bugs, so if a campaign is not asking for bugs it is not useful); thus, data exploration is difficult due to the necessary use of queries on the tables of the database;

The following are the key objectives to be achieved by the work:

1) Establishing new foolproof feeding techniques aimed at creating a mechanism ensuring the correct input of future data;

2) Analysing and correcting all the wrong values in the actual database;

3) Utilizing the new Database to create a new service of visualization of, and interaction with, the data through a user-friendly interface;

Once the existing problems were known exactly and the objectives set, I defined a line of action: I decided to look at the implementation from two perspectives and two levels.

Two levels:

1) Analysis, correction, integration of the as-is (back-end);
2) Creation of a new service (front-end);

The objective is to obtain tangible results at the end of the work and, even if a back-end operation of cleaning and integrating the Database is clearly a tangible result, in some ways I did not consider it satisfactory from an "artistic" point of view: to be able to tangibly appreciate those results you would need to go through a technical analysis. Hence, the first level (back-end) of the solution aims at making the infrastructure perfectly functional, and then, thanks to the second level (front-end), at transforming the back-end intervention into a new service available in the front-end.

Two perspectives:

1) Internal process organization;

2) External feature for customer;

The functionality of the interventions should be optimized both for internal purposes and for external customers: the first perspective focuses on what we wanted the database to be, based on the experience in this field; the latter focuses on what the database is not offering in terms of features to be exploited by the customer.

Matching the two levels of depth with the two perspectives of analysis, I defined three macro-areas that were analysed and worked on individually during the work in order to achieve the key objectives set above:

Analysis, correction, integration of the as-is (back-end):
- Internal process organization: DATABASE IS DIRTY
- External feature for customer: CAMPAIGNS WITHOUT BUGS ARE NOT INTEGRATED IN THE DB

Creation of a new service (front-end):
- Internal process organization / External feature for customer: DATA VISUALIZATION ABOUT CROWD CHARACTERISTICS AND TESTING CAMPAIGNS WITHOUT BUGS IS NOT AVAILABLE

2.4 Solution overview

Considering the two perspectives and two levels of intervention we defined, here is the process we are going to undertake for each of them:

Analysis, correction, integration of the as-is (back-end):
- Internal process organization: data cleansing/correction of wrong values (rows of the DB); feeding techniques; attributes correction/integration (columns of the DB);
- External feature for customer: Entity-relationship diagram dynamically linking the existing tables about the crowd characteristics to the testing campaigns not available in the internal Db.

Creation of a new service (front-end): a user-friendly interface
- Internal process organization: open to all the members of the company in order to get, through data visualization/interaction, insights on past data and complete control over the ongoing process;
- External feature for customer: integrated on the customer dashboard, covering the PRE-TESTING CAMPAIGN (selection/visualization of crowd characteristics) and the POST-TESTING CAMPAIGN (visualization of results).


3. Implementation of the solution

3.1 Tool utilized: Microsoft Power BI

Microsoft Power BI is a business intelligence software for data preparation and analysis that also allows the creation of interactive dashboards and reports. It is possible to connect SQL databases, or even just excel spreadsheets of data, and start working on them by creating tables and relations just like in a normal database. It is possible to upload up to 1 GB of raw data.

There are plenty of functionalities to exploit, but we will just list the ones we needed for the purpose of this work:

1) Setting up a database through an Entity-relationship diagram: once the data from different sources have been uploaded, you can connect tables from different databases in your own new database. In this way you are not just replicating a database in the Power BI cloud: you can actually integrate it, exploring new possibilities through the use of new data; data can be collected and stored even from webpages;

2) Transforming and cleaning data: data collected can be cleaned in bulk by creating formulas ad hoc for selection, cleansing and replacement;

3) Creating interactive reports and dashboards: from the raw data, it is possible to create not just simple graphs but real dashboards and interactive reports through which you can visually navigate the data to get insights. It is possible to share reports, publish them on the web and even integrate them in webpages through HTML code.

3.2 Database set-up

Before actually working on the values of the attributes themselves, we need to create our database (using an entity-relationship diagram) in the chosen software, through which we will conduct the phases of cleaning, integration and creation of a dashboard. This is necessary to transform the excel files we have in input - about the crowd characteristics (directly exported from the platform) and about the testing results (exported from the database for tests with bugs, imported through an excel spreadsheet when representing results arising from surveys of tests with no bugs) - into a set of tables interlinked through foreign keys that can be exploited at the end of the back-end phase to create the interfaces we want in the front end, thus not just cleaning the data but using them in a functional way.

The two csv spreadsheets from which we will extract the crowd characteristics about demographics and devices (called “Full_data_crowd” ) and the test results have already been presented in the Database paragraph.

The number of testers who registered, and so filled in their own row, is exactly 4195. Considering one value under each attribute, the amount of single values stored is 4,195 (number of testers) * 34 (number of attributes) = 142,630.

Now, considering that for each tester there is more than one device, each row related to one tester is replicated for each device they submitted; the number of devices available on the platform is 4,562. Considering the attributes, the total amount of values stored in the Database related to both the demographics' and the devices' characteristics of testers is 4,562 * 34 = 155,108.

The number of bugs, devices and testers involved in a testing campaign strongly depends on the type of campaign; the only strict constraint we need to set up in our architecture is that each bug has its own ID and that each bug is associated with the ID of the tester who submitted it.

Knowing the attributes available in the csv spreadsheets, we can now split these data into more tables of our database and create the architecture we need. This is the very first step of our work: dividing the single existing table with the crowd's characteristics ("Full_data_crowd") into two different tables, one with only the demographics' characteristics and another with only the devices' characteristics.

I decided on this approach since the attribute "Tester ID" is of fundamental importance when we create the dashboard for crowd selection and for the visualization of results of testing campaigns that are not actually stored in the internal database. In fact, the tester (with his primary key TesterID) will become the centre of the entire database: he can own more than one device, he can submit more than one bug and he can participate in more than one testing campaign. I create a primary key TesterID associated with the table Demographics and a primary key DeviceID associated with the table Devices.

In the following screenshot is the Entity-relationship diagram of the new infrastructure:

Picture 1: ER diagram initial database

"Full_data_crowd" is the initial table we extracted from our database. We said that the value of TesterID is repeated in order to register many devices for one single tester.

The second table, "Demographics", considers only the demographics' attributes of the tester, so the TesterID is not repeated and represents a primary key. It is related to the original table Full_data_crowd through the attribute TesterID with a one-to-many relation (a TesterID is replicated many times in Full_data_crowd, while a tester is listed only once in the Demographics table).

The third table, "Devices", considers only the devices' attributes of the tester. A complete list of all the devices is available here; therefore the DeviceID is unique and represents a primary key. In order to assign each device to the tester who owns it, I used the attribute TesterID in this table as a foreign key linking to the primary key TesterID in the Demographics table.
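To make the split concrete, here is a minimal Python/pandas sketch of the same idea; the file name and the lower-case column names are placeholders for illustration only, since the actual split was performed inside Power BI rather than in code.

```python
import pandas as pd

# Hypothetical export of the "Full_data_crowd" spreadsheet: one row per
# tester-device pair, with the demographics repeated for every device.
full_data_crowd = pd.read_csv("full_data_crowd.csv")

# Demographics table: one row per tester, keyed by the tester id.
demographic_cols = ["tester_id", "name", "surname", "sex", "birth_date",
                    "city", "postal_code", "spoken_lang", "education"]
demographics = (full_data_crowd[demographic_cols]
                .drop_duplicates(subset="tester_id")
                .set_index("tester_id"))

# Devices table: one row per device, keyed by the device id, keeping
# tester_id as a foreign key towards the Demographics table.
device_cols = ["device_id", "tester_id", "form_factor", "manufacturer",
               "operating_system", "os_version"]
devices = (full_data_crowd[device_cols]
           .drop_duplicates(subset="device_id")
           .set_index("device_id"))
```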

The fourth table, "Bugs Entry Test 2.0", contains all the bugs associated with the campaign named Entry Test 2.0. Each bug is unique and has its own BUG#ID, which represents the primary key of this table.

A tester taking part in a testing campaign is linked to the table showing the bugs associated with the campaign; each tester can submit many bugs, so the relation between the key TesterID (from "Demographics") and TesterID (in "Bugs Entry Test 2.0") is one to many.

Tables about Demographics and Devices are linked to the testing campaign through the key Tester ID.

Just to be sure that the overall picture of the infrastructure is clear, it is important to underline how a single tester, while submitting many bugs in a testing campaign, can use many devices if required. Another important consideration is that, if we want, we can integrate this infrastructure with other testing campaigns. In the screenshot below you can see how one TesterID is linked to the campaign Entry Test 2.0 but also to the campaign Apps Benchmarking:

Picture 2: ER diagram of the initial database including the table with the attributes of the Bugs Campaign Apps Benchmarking

Our infrastructure is set; we can now start with the phase of analysis, correction and integration of the as-is database, passing through three phases:

- Data cleansing/correction;
- Feeding techniques;
- Attributes correction/integration;

3.3 Analysis, correction, integration of the as-is (back-end) + Internal process organization

3.3.1 Data cleansing/correction:

From the analysis of the values in the Db it emerged that multiple values were repeated and input in several different ways, even when clearly representing the same value. For example, it is clear that testers who declared the possession of a device whose manufacturer was called "Hewlett Packard", "HP" or "Hewlett-Packard" were talking about exactly the same manufacturer.

In this section, I am not going to show every single row of the Db, because it would result in an annoying list of devices, manufacturers, OSs, cities, languages and so on wrongly typed by a user; moreover, it would not be representative of the methodology I used to solve the issue. In fact, once past the initial phase of analysis, I used queries to recall and replace in bulk all the data requiring a correction. Let us go gradually through the formalization of the existing problems and their resolution.

Attribute: “Country”

Under the attribute “Country” the intention was to ask the tester his own nationality, but this created some confusion and the value is now corrupted. Some testers inserted the name of their city, others the region, others the nationality. In the screenshot below you can see the list of non-repeated values under the attribute Country:

Two main problems were identified:

1) The values typed do not represent the nationality; we need to take the value typed and transform it into the relative nationality. When talking about the feeding techniques we are going to adopt, we will also define a foolproof approach to avoid repeating the same mistake in the future;

2) There are 2111 out of 4195 fields without the correct value of Country in the database. This is an issue we can solve without directly involving the tester: in fact, since we already know the city of the tester, or they just typed the region instead of the country, we can associate the relative nationality to the value of the city or region.

I followed these steps to solve the issues (a sketch of the bulk replacement logic is given after the list):

- For the rows with the name of a city instead of the nationality, I just used the table I added to the database with an association between cities and nationalities;

- For the rows with the name of a region instead of the nationality, I used the table with the data about cities and Italian regions and replaced the value of the region with the general value Italy;

- For the wrongly typed values of Italy (i.e. Italioa, Italian, Itali, Itaglia) I replaced all the rows having "It" in the value with the unique value Italy;

- For the values left blank, I associated the city typed under the attribute City with the relative country, using another table with an association between cities and countries;

- The rest were just punctual problems to solve, like "TE" or "VB" representing the province, easily replaced manually;
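As a rough illustration of the bulk re-association described above, the following pandas sketch applies the same rules in code; the lookup files and column names are placeholders, since the actual corrections were expressed as bulk replacement formulas inside Power BI.

```python
import pandas as pd

# Hypothetical exports: the crowd table plus two lookup tables
# (city -> country and the list of Italian regions).
demographics = pd.read_csv("demographics.csv")
city_to_country = pd.read_csv("city_to_country.csv").set_index("city")["country"].to_dict()
italian_regions = set(pd.read_csv("italian_regions.csv")["region"])

def fix_country(row):
    value = "" if pd.isna(row["country"]) else str(row["country"]).strip()
    if value.lower().startswith("it"):      # Italioa, Italian, Itali, Itaglia, ...
        return "Italy"
    if value in italian_regions:            # a region was typed instead of the country
        return "Italy"
    if value in city_to_country:            # a city was typed instead of the country
        return city_to_country[value]
    if value == "" and row["city"] in city_to_country:   # blank field: derive it from the city
        return city_to_country[row["city"]]
    return value                            # leftover cases (e.g. "TE", "VB") fixed manually

demographics["country"] = demographics.apply(fix_country, axis=1)
```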

The final list of values arising from the bulk re-association of the attribute Country is the following:

Out of the total number of testers (4195), only 193 testers have the value of Country left blank; we can consider the issue of blank fields as solved, and we will directly ask those testers to type some indication about their provenance.

The number of values under the attribute Country is now correct; only nationalities are shown.

Attribute: “Language”

Leaving the tester the freedom to type a list of his languages created great confusion in the database. The list of alternative values is endless, since testers can not only type the languages in different orders but also input the Language value using abbreviations, capital letters, wrong syntax and so on.

The list of non-repeated values input by testers is shown in the screenshots below. It represents a single column of our database but, to be able to show all the values in a compact way, I took three different screenshots with 6 columns each; taking the list from each column and putting it under the single attribute Language, it is easy to imagine how the use of the database as an asset becomes impossible, even for those able to use queries to navigate through it.

Our first concern in this chapter is to define a few categories and to group under the same attribute value those values that should be the same but were just typed using different abbreviations; in the next chapter, when talking about the feeding techniques, we will change the way these data are input by testers.

The order of input of the languages remains important: in fact, a tester who typed "Italian, English" is considered to have Italian as first language and English as second, which, obviously, is different from a tester who typed "English, Italian".

The available languages on the platform are the following: Albanian, English, Hindi, Iranian, Italian, French, Spanish, Tamil, Telugu, Philippine, Chinese, Gujarati, Bengali, Marwari, Marathi, Russian, Dutch, German, Slovakian, Japanese, Portuguese, Arabic, Ukrainian.

The new categories available under the attribute Language are then a combination of these languages, also considering the order as important. If a proposed order has never been typed by any tester, the category is not created.

Below is a screenshot of the new values available in the database under the attribute Language. Being available means that at least one tester typed those languages exactly in that sequence. In order to account for the use of abbreviations expressing the same language (i.e. Italiano, Italian, Ita), the matching between the actual values and the potential combinations has been made using codes easily matching each possible abbreviation (i.e. for Italian, Italiano, Ita we used Ita).
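By way of illustration, the matching logic could look like the sketch below, where each free-text token is reduced to a short code and the order in which the tester typed the languages is preserved; the abbreviation table is a small, hypothetical subset of the one actually used.

```python
# Illustrative abbreviation table: every variant of a language collapses onto one code.
LANGUAGE_CODES = {
    "italiano": "Ita", "italian": "Ita", "ita": "Ita", "it": "Ita",
    "inglese": "Eng", "english": "Eng", "eng": "Eng", "en": "Eng",
    "francese": "Fra", "french": "Fra", "fra": "Fra", "fr": "Fra",
    "spagnolo": "Spa", "spanish": "Spa", "spa": "Spa", "es": "Spa",
}

def normalize_languages(raw_value: str) -> str:
    """Turn a free-text list such as 'Italiano, english' into the category 'Ita, Eng'."""
    tokens = [t.strip().lower() for t in raw_value.replace(";", ",").split(",") if t.strip()]
    codes = []
    for token in tokens:
        code = LANGUAGE_CODES.get(token, token.title())  # unknown values are kept as typed
        if code not in codes:                            # drop exact duplicates only
            codes.append(code)
    return ", ".join(codes)

print(normalize_languages("Italiano, english"))   # -> "Ita, Eng"
```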

Attributes: “Manufacturer” and “Operating_System”

In the screenshots below is the list of values input by testers under the attribute "Manufacturer" and the attribute "Operating_System"; the column of one attribute is split into several columns (there is always a label on top recalling the name of the attribute) but obviously in the Db there is only one column per attribute; here it is split just to give a compact view of the Db column:

The problems that arose are now listed and solved one by one:

Decoupling of multiple values input in a single cell:

a few testers, during the registration of a single device, thought they were going to register all of their devices together; thus, the value of the attribute "Manufacturer"/"Operating_System" is a list of four manufacturers/OSs. Looking at the other parameters the tester typed in the subsequent phases of registration, I identified the correct manufacturer/OS of the device they were registering and eliminated the other values typed.

Grouping same values (but differently typed) in a single homonymous category: as anticipated in the general presentation of the data-cleansing phase, the choice to leave the tester free to type has created multiple repeated values even when they represent the same value. For example, the values "Asus", "Asus pro" and "ASUSTek Computer Ink" clearly refer to the same manufacturer "Asus".

All of these activities were performed in bulk by creating functions that, when finding similarities in the name of the manufacturer/OS, grouped them under the single name of the unique common manufacturer/OS.
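A sketch of what such a grouping function might look like is given below; the mapping rules are illustrative examples, not the exact ones used in the work.

```python
import pandas as pd

# Illustrative grouping rules: any value containing the key (case-insensitive)
# collapses onto the canonical manufacturer name.
MANUFACTURER_GROUPS = {
    "hewlett": "HP",
    "hp": "HP",
    "asus": "Asus",
    "samsung": "Samsung",
}

def canonical_manufacturer(raw) -> str:
    value = str(raw).strip().lower()
    for fragment, canonical in MANUFACTURER_GROUPS.items():
        if fragment in value:
            return canonical
    return str(raw).strip()      # values with no match are left as typed

devices = pd.read_csv("devices.csv")                                  # hypothetical export
devices["manufacturer"] = devices["manufacturer"].map(canonical_manufacturer)
```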

3.3.2 Feeding techniques

We decided to create a new feeding procedure for the database, focusing on four main areas of the user profile:

1. Address
2. Educational background
3. Language
4. Devices

User profile – Address

The address typed by users when registering on the platform will be verified in real time through the use of a Google API.

When the user starts typing the name of a city and the address, an auto-fill of the field is proposed thanks to the API. From this information, we can extract the values of:

1. City
2. Region
3. Cap

An example is shown here.
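One possible way to wire this up is sketched below using the Google Geocoding API: the endpoint and the address_components fields are part of Google's public API, while the function name and the way the result is mapped onto our attributes are illustrative assumptions (the thesis only specifies that a Google API will auto-fill the field in real time).

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def resolve_address(raw_address: str, api_key: str) -> dict:
    """Validate a tester's address and extract city, region and postal code (CAP)."""
    response = requests.get(GEOCODE_URL, params={"address": raw_address, "key": api_key})
    response.raise_for_status()
    results = response.json().get("results", [])
    if not results:
        return {}              # address not recognized: ask the tester to retype it

    components = results[0]["address_components"]

    def pick(component_type):
        for component in components:
            if component_type in component["types"]:
                return component["long_name"]
        return None

    return {
        "city": pick("locality"),
        "region": pick("administrative_area_level_1"),
        "cap": pick("postal_code"),
    }
```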

User profile – Educational background

Currently, the value is typed in a blank free-input field. The new solution will lean towards a mandatory multiple-choice field plus another free-format field.

For the mandatory field the available choices will be:

1. Second level (High School)
2. College (No Degree)
3. Degree or equivalent
4. Bachelor or equivalent
5. Master or equivalent
6. PhD

The second field, free to type in, gives the tester the possibility to better explain his level of education. For users logged in through LinkedIn, the education data taken from the API will be used to automatically fill in this field.

User profile – Language

On the internet, the standard when asking for a language requires a user to specify the level of knowledge in three categories: spoken, written, comprehension.

This approach is too complex and we are not interested in this level of depth. Our approach will be as follows:

a dynamic field auto-filled by a function that associates the first language to the nationality, plus other fields available for testers interested in adding secondary languages.

In the Db we will use the international standard codes (it, en, ja, fr, ...) and a level of knowledge from 1 to 4 indicating the levels Base, Discrete, Advanced, Mother tongue.

A code list of languages is available on wikipedia and we are interested in adding the European, Indian and main oriental ones.
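A minimal sketch of the auto-fill function described above is shown below; the nationality-to-language mapping is a small illustrative subset, and level 4 corresponds to "Mother tongue".

```python
# Illustrative subset: nationality -> ISO 639-1 code of the expected first language.
NATIONALITY_TO_LANGUAGE = {
    "Italy": "it",
    "France": "fr",
    "Spain": "es",
    "Japan": "ja",
}

MOTHER_TONGUE_LEVEL = 4   # levels: 1 Base, 2 Discrete, 3 Advanced, 4 Mother tongue

def default_language(nationality):
    """Pre-fill the first language of a tester from the Nationality field."""
    code = NATIONALITY_TO_LANGUAGE.get(nationality)
    return (code, MOTHER_TONGUE_LEVEL) if code else None

print(default_language("Italy"))   # -> ('it', 4)
```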

User profile – Devices

- Tablet and smartphone:

Testers will be asked to select their device from a list of existing ones. In the actual database there already exists a list of the main devices registered over the past years. Obviously, it is not exhaustive and we will integrate that list through the use of an API.

An example of the API is on this page. The data we will collect from the API are:

• Manufacturer (Brand)
• Model (device model)
• Screen Size

The version of the software will be asked directly of testers in a multiple-choice field. The list of versions we will make available for selection is the following:

Android versions

All the major releases of Android will be shown, both with the reference code and the commercial name:

os_version version_name Release date API level Label shown to user

7.1.1 Nougat 2016 Dec 5 25 7.1.1 (Nougat)

7.1 Nougat 2016 Oct 4 25 7.1 (Nougat)

7.0 Nougat 2016 Aug 22 24 7.0 (Nougat)

6.0.1 Marshmallow 2015 Dec 7 23 6.0.1 (Marshmallow)

6 Marshmallow 2015 Oct 5 23 6 (Marshmallow)

5.1.1 Lollipop 2015 Apr 21 22 5.1.1 (Lollipop)

5.1 Lollipop 2015 Mar 9 22 5.1 (Lollipop)

5.0.2 Lollipop 2014 Dec 19 21 5.0.2 (Lollipop)

5.0.1 Lollipop 2014 Dec 2 21 5.0.1 (Lollipop)

5.0 Lollipop 2014 Oct 17 21 5.0 (Lollipop)

4.4.4 KitKat 2014 Jun 23 19 4.4.4 (KitKat)

4.4.3 KitKat 2014 Apr 14 19 4.4.3 (KitKat)

4.4.2 KitKat 2013 Dec 9 19 4.4.2 (KitKat)

4.4.1 KitKat 2013 Dec 5 19 4.4.1 (KitKat)

4.4 KitKat 2013 Oct 31 19 4.4 (KitKat)

4.3 Jelly Bean 2013 Jul 24 18 4.3 (Jelly Bean)

4.2.2 Jelly Bean 2013 Feb 11 17 4.2.2 (Jelly Bean)

4.2.1 Jelly Bean 2012 Nov 27 17 4.2.1 (Jelly Bean)

4.2 Jelly Bean 2012 Nov 13 17 4.2 (Jelly Bean)

4.1.2 Jelly Bean 2012 Oct 9 16 4.1.2 (Jelly Bean)

4.1.1 Jelly Bean 2012 Jul 23 16 4.1.1 (Jelly Bean)

4.1 Jelly Bean 2012 Jul 9 16 4.1 (Jelly Bean)

4.0.4 Ice Cream Sandwich 2012 Mar 28 15 4.0.4 (Ice Cream Sandwich)

3.2.6 Honeycomb 2012 Feb 15 13 3.2.6 (Honeycomb)

4.0.3 Ice Cream Sandwich 2011 Dec 16 15 4.0.3 (Ice Cream Sandwich)

3.2.4 Honeycomb 2011 Dec 15 13 3.2.4 (Honeycomb)

4.0.2 Ice Cream Sandwich 2011 Nov 28 14 4.0.2 (Ice Cream Sandwich)

4.0.1 Ice Cream Sandwich 2011 Oct 19 14 4.0.1 (Ice Cream Sandwich)

4.0 Ice Cream Sandwich 2011 Oct 18 14 4.0 (Ice Cream Sandwich)

3.2.2 Honeycomb 2011 Sep 30 13 3.2.2 (Honeycomb)

2.3.7 Gingerbread 2011 Sep 21 10 2.3.7 (Gingerbread)

3.2.1 Honeycomb 2011 Sep 20 13 3.2.1 (Honeycomb)

2.3.6 Gingerbread 2011 Sep 2 10 2.3.6 (Gingerbread)

2.3.5 Gingerbread 2011 Jul 25 10 2.3.5 (Gingerbread)

3.2 Honeycomb 2011 Jul 15 13 3.2 (Honeycomb)

2.3.4 Gingerbread 2011 May 10 10 2.3.4 (Gingerbread)

3.1 Honeycomb 2011 May 10 12 3.1 (Honeycomb)

3.0 Honeycomb 2011 Feb 22 11 3.0 (Honeycomb)

2.3.3 Gingerbread 2011 Feb 9 10 2.3.3 (Gingerbread)

2.3 Gingerbread 2010 Dec 6 9 2.3 (Gingerbread)

2.2 Froyo 2.2 (Froyo)

iOS versions

os_version: iOS 7.0, iOS 7.1, iOS 8.0, iOS 8.1, iOS 8.2, iOS 8.3, iOS 8.4, iOS 9.0, iOS 9.1, iOS 9.2, iOS 9.3, iOS 10.0, iOS 10.1, iOS 10.2, iOS 10.3

Windows Phone versions

os_version: 10, 8, 8.1

- PC:

o Laptop side: we will create a list of main brands from which testers will choose, plus a last field named "Custom/other".

o Mac side: we will give the list of all the available models on the market.

We will create macro-categories of OS versions:

Windows Desktop versions

An available tip will allow testers to select 32 or 64 bit, and the versions will be as follows:

os_version: 10, 8.1, 8, 7, Vista, Xp

OS X Desktop versions

Version Commercial name Label shown to user

10.6 Snow Leopard 10.6 (Snow Leopard)
10.7 Lion 10.7 (Lion)
10.8 Mountain Lion 10.8 (Mountain Lion)
10.9 Mavericks 10.9 (Mavericks)
10.10 Yosemite 10.10 (Yosemite)
10.10.1 Yosemite 10.10.1 (Yosemite)
10.10.2 Yosemite 10.10.2 (Yosemite)
10.10.3 Yosemite 10.10.3 (Yosemite)
10.10.4 Yosemite 10.10.4 (Yosemite)
10.10.5 Yosemite 10.10.5 (Yosemite)
10.11 El Capitan 10.11 (El Capitan)
10.11.1 El Capitan 10.11.1 (El Capitan)
10.11.2 El Capitan 10.11.2 (El Capitan)
10.11.3 El Capitan 10.11.3 (El Capitan)
10.11.4 El Capitan 10.11.4 (El Capitan)
10.11.5 El Capitan 10.11.5 (El Capitan)
10.11.6 El Capitan 10.11.6 (El Capitan)
10.12 Sierra 10.12 (Sierra)
10.12.1 Sierra 10.12.1 (Sierra)
10.12.2 Sierra 10.12.2 (Sierra)
10.12.3 Sierra 10.12.3 (Sierra)
10.12.4 Sierra 10.12.4 (Sierra)


Linux versions

A list of OS versions will be available, plus another free field to give us more info if considered valuable by the tester.

os_version: Ubuntu, Centos, Redhat, Fedora, Arch Linux, Mint, Suse, Slackware, Other

3.3.3 Attributes correction/integration

Since the business is particularly well established in Italy, it is important to keep track of the region of provenance and the geographic area (Nord-ovest, Nord-est, Centro, Sud, Isole) of each tester. Even though this is currently not asked of the tester, we can integrate this feature by adding a new table to our database with an association between city and region. We link the foreign key "city", available in our actual database, to the primary key "city" of the newly added table.

Here is a screenshot of the Entity-relationship diagram:

Picture 3: ER diagram with new table CODICI COMUNI ITALIANI

Obviously, the relation is many to one: there are many testers having the value "Milan" under the attribute City, while in the table CODICI COMUNI ITALIANI there is only one city Milan, with a direct association to the region it belongs to.
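In code, the many-to-one link could be expressed as a simple left join, as in the pandas sketch below; file and column names such as codici_comuni_italiani.csv, region and geographic_area are placeholders mirroring the CODICI COMUNI ITALIANI table, while the real relation lives in the Power BI ER diagram.

```python
import pandas as pd

demographics = pd.read_csv("demographics.csv")                 # hypothetical export of the crowd table
codici_comuni = pd.read_csv("codici_comuni_italiani.csv")      # city -> region, geographic area

# Many testers share the same city; each city maps to exactly one region and
# one geographic area (Nord-ovest, Nord-est, Centro, Sud, Isole).
demographics = demographics.merge(
    codici_comuni[["city", "region", "geographic_area"]],
    on="city",
    how="left",      # testers with a blank or unknown city simply keep empty region fields
)
```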

3.4 Analysis, correction, integration of the as-is (back-end) + External feature for customer

The current service offered to the customer includes access to a personal dashboard where it is possible to inspect the results of their testing campaigns. The process behind the visualization of the results on the dashboard is as follows:

- Testers subscribe to a testing campaign

- Testers submit bugs during the testing campaign
- Bugs are stored in a table of our Db
- In the Bugs table, "Tester ID" directly links, as a foreign key, to the crowd table in the Db
- The dashboard shows statistics including the association of the submitted bugs with the testers who submitted them and all the related info about device, OS and so on.

The problem we are going to solve in this part of the work is the following:

during a testing campaign where testers are required to submit bugs (usually functional testing), since the form to submit them was created on the platform itself, the bugs input are directly stored in the Db and can be linked to the info about the tester through the use of foreign keys among the different tables of the Db.

On the contrary, during a testing campaign where testers are not required to submit bugs (usually usability testing), they are required to provide answers to a qualitative survey or to upload sample videos using an external interface, not implemented on the platform. For this reason, the data are not directly stored in our Database but in excel files, which are used to create a final report to be shown to the customer.

Thanks to the creation of an Entity-relationship diagram, we are going to create a dynamic infrastructure where the excel files output by non-bug-based campaigns can easily become part of the entire data infrastructure and lead to analysis, manipulation and dashboard creation directly linked to all the information about the crowd demographics and devices. In this way, the customer can access both the results of the test and the demographics associated with the testers who submitted them.

In the picture below is the Entity-relationship diagram created:

Picture 4: ER diagram including External table - Test without bugs

Tables about Demographics and Devices are linked to the testing campaign with no bugs, named "External table – Test without bugs", through the foreign keys "Tester ID" and "Device ID". This new, completely interconnected infrastructure represents a starting point for the potential creation of an interactive webpage for the customer, to be implemented on his dashboard. In the next part of the work the focus will be exactly on this: exploiting the new infrastructure and offering a new service for both internal and external purposes.
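The effect of this linkage can be sketched as follows; file and column names are placeholders, and the actual relations are defined in the Power BI Entity-relationship diagram rather than in code.

```python
import pandas as pd

# Results of a campaign without bugs, collected outside the platform (surveys, video uploads).
external_results = pd.read_excel("test_without_bugs.xlsx")     # hypothetical: tester_id, device_id, answers...
demographics = pd.read_csv("demographics.csv")                 # keyed by tester_id
devices = pd.read_csv("devices.csv")                           # keyed by device_id

# Link the external results back to the crowd: each answer is enriched with the
# demographics of the tester and the characteristics of the device used.
enriched = (external_results
            .merge(demographics, on="tester_id", how="left")
            .merge(devices, on="device_id", how="left"))
```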

3.5 Creation of a new service: dashboard (front-end)

A database is now available that is clean, progressively fed in a correct way by testers, completely linked (through the Entity-relationship diagram) to all the possible kinds of testing services and linked to all the past campaigns activated by customers.

Nevertheless, the use of this infrastructure is not easy and it is time consuming, even for people able to write queries and formulas to inspect the data.

The scope of this part of the work is to create a visual interface that is interactive and full of quick and useful infographics.

The visual interface created can be useful both before and after a testing campaign. In fact, two interfaces have been integrated on the dashboard:

- the PRE-TEST crowd visualization;
- the POST-TEST results visualization;

3.5.1 Pre-test crowd visualization

(Here are the links to the initial and final dashboards created, which I will refer to in this paragraph.) From the clean database it is now possible to extract insights about the crowd, showing appropriate statistics and allowing the selection of the preferred demographics' and devices' characteristics.

In particular, we are interested in showing some aspects of our crowd that can be particularly relevant for our customers. The cleaning, feeding and integration phases of the initial database followed a practically oriented implementation, keeping in mind the final goal of enhancing the value of both assets: the crowd (with respect to the database) and the platform (with respect to the customer dashboard).

To show the improvements we achieved on our database through the previous steps of the work, we can compare two dashboards: one with the initial data and another with the clean database. In this way, we can graphically and clearly recognize the value of the back-end implementations.

Since at the beginning of the work we set up a database based on an Entity-Relationship diagram with two separate tables for Demographics and Devices, it is clear that the final purpose was to allow a customer or an internal actor to select testers from these two different perspectives. Hence, the dashboard itself has been created two-sided: Demographics on the left, Devices characteristics on the right.

Demographics' characteristics

Talking about the demographics' characteristics of the crowd, we can show the results of the cleansing phase related to the attributes Country and Language, and we can also show the integration of the two new attributes Region and Geographical Area, which were added considering that the business of the company is mainly concentrated in Italy and customers could be particularly interested in selecting testers from specific geographic areas.

In the screenshots below are the two front-end interfaces for the visualization and selection of the demographic characteristics, before and after the back-end solutions we adopted. Note that the number of active testers in the first screenshot is lower than in the second one: this is because, when we initially extracted the values from the database, before cleaning and integrating it, the number of registered testers was 3834; while we were working on the database to clean and integrate it, new testers registered, and the final dashboard shows 4195 active testers.

The initial interface for Demographics selection before correcting the Database:

Picture 5: Initial dashboard for crowd selection

Imagining for a moment that we are an Italian customer looking for testers to select for a testing campaign, let us consider the actual pain points of the initial database from different perspectives:

1) The geographic distribution of the testers is quite difficult to interpret and use, since they are mapped using the only geographical attribute available in the database: City.

It is clear that there is a high density of testers in Italy, the Americas, India and Albania, but if we zoom in on the map to get a more precise analysis we only obtain a distribution of thousands of points. City is clearly too granular an attribute to be exploited.

2) Even using a slicer to select or deselect the preferred cities (the section named City at the very bottom of the dashboard), it is still too difficult to make a decision about tester selection through the attribute City; a customer would rather select testers from an Italian region or, more generally, from a geographic area.


3) The Nationality attribute has been filled in with values that do not represent a nationality; in fact, as we saw in the cleansing phase, testers misunderstood the attribute Country and entered values referring to their region or city.

4) The Language attribute has been filled in with different strings even when they represent the same value (see ENG and English in the screenshot above) and needed to be recategorized into a set of unique categories.

The final interface for Demographics selection after correcting the database appears as follows:

Picture 6: Final dashboard for crowd selection

One by one, the four pain points reported above were solved during the cleansing and integration phase of the database, and the final solution exploits the new back-end implementation to create a useful interface for the customer (and for internal processes as well):

1) The geographic distribution of testers now follows a selection per area (Nord-ovest, Nord-est, Centro, Sud, Isole). The treemap graph draws larger the areas where the density of testers is higher and uses different colours to denote different categories. The high number of blank fields is due to the fact that a large number of testers did not indicate their city, so the allocation to a geographical area was not possible during the back-end implementation. As testers are pushed to input the missing city, the blank field will become smaller and smaller; in the near future the problem will be solved through the new feeding techniques we established (see the sketch after this list for how missing cities end up in the blank bucket).


2) The slicer at the bottom of the dashboard no longer asks for the selection of the city: by adding to the database a new table associating Italian cities with their regions, it is now possible to create on the dashboard a slicer that lets the customer select one of the Italian regions.

3) As we can see from the new dashboard, the nationalities are now correctly available for selection; we already showed how we grouped them, so here the aim is simply to show the clarity of the new solution, with the slicer allowing the selection.

4) The list of available languages is now made of unique, non-repeated values.
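Below is a minimal sketch, under the same naming assumptions as the previous snippets, of how the new city–region lookup table could be joined to the testers and aggregated for the treemap and the region slicer; testers with no city naturally end up in the blank bucket.

    import pandas as pd

    testers = pd.DataFrame({"tester_id": [1, 2, 3, 4],
                            "city": ["Milano", "Roma", "Palermo", None]})
    # Hypothetical extract of the new lookup table added to the database
    city_regions = pd.DataFrame({"city": ["Milano", "Roma", "Palermo"],
                                 "region": ["Lombardia", "Lazio", "Sicilia"],
                                 "area": ["Nord-ovest", "Centro", "Isole"]})

    # Left join: testers without a city keep NaN and fall into the "blank" field of the treemap
    crowd = testers.merge(city_regions, on="city", how="left")

    # Counts per geographical area feed the treemap; counts per region feed the slicer
    print(crowd.groupby("area", dropna=False)["tester_id"].count())
    print(crowd.groupby("region", dropna=False)["tester_id"].count())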

Devices ‘characteristics

Regarding the devices' characteristics, we worked on the database by correcting and aggregating all the values of the attributes Manufacturer (referred to as Device brand in the dashboard below) and Os (referred to as Device OS in the dashboard below).

We did not add any attribute in the device section. The dashboard created for the selection of the devices' characteristics focuses on offering the user the possibility to select the type of device (Smartphone, PC, Tablet), the Brand and the OS.

Below is the initial dashboard built on the dirty database we set up before the actual correction:

Picture 7: Initial dashboard for device selection

It is clear how values of Device brand like “AsRock” and “Assembled” are repeated with different spellings while representing the same value. Likewise, under the Device OS attribute, the values “Android”, “Android OS” etc. stand for the same value; they have also been aggregated. All the other wrongly typed values were corrected and the final dashboard appears as follows:

Picture 8: Final dashboard for device selection

There is now a unique value for each existing Device OS category and a unique value for each existing Device brand in the DB.

Referring to the examples mentioned above: in the Device brand slicer, “Asrock” and “Assembled” now each have a single, non-duplicated category; in the Device OS slicer, “Android” is unique and no longer repeated in different spellings. Likewise, all the other values are unique.
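The normalization logic can be sketched as follows; the mapping dictionaries are illustrative examples only, not the full correction tables applied to the real database.

    import pandas as pd

    devices = pd.DataFrame({
        "brand": ["AsRock", "ASRock ", "assembled", "Assembled PC", "Samsung"],
        "os": ["Android", "Android OS", "android 7.0", "Windows 10", "Windows"],
    })

    # Hypothetical, partial mappings from raw variants to canonical categories
    BRAND_MAP = {"asrock": "Asrock", "assembled": "Assembled", "assembled pc": "Assembled"}
    OS_MAP = {"android": "Android", "android os": "Android", "android 7.0": "Android",
              "windows": "Windows", "windows 10": "Windows"}

    # Lower-case and trim, then map every variant to a single category;
    # values not covered by the map keep their original spelling
    devices["brand"] = devices["brand"].str.strip().str.lower().map(BRAND_MAP).fillna(devices["brand"])
    devices["os"] = devices["os"].str.strip().str.lower().map(OS_MAP).fillna(devices["os"])
    print(devices.drop_duplicates())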

Final dashboard for crowd selection

The final result, aggregating the selection of both the Devices' and Demographics' characteristics, is shown in the following picture:


Picture 9: Final dashboard for crowd and device selection

Here is the link to the interactive dashboard.

Just to close the loop, I show below how the initial dashboard would appear without any cleansing and integration of the database:

Picture 10: Initial dashboard for crowd and device selection

Here is the link to the interactive dashboard.


This concludes the pre-test dashboard for crowd selection. Let us now exploit the clean and integrated database to create a dashboard, not yet existing on the platform, for interacting with the results of testing campaigns with no bugs to submit.

3.5.2 Post-test results visualization

This dashboard serves two purposes:

- It allows internal members of the company to read, understand and quickly get insights from past work, current data and future integrations;
- It allows a customer to access and inspect the statistics of a testing campaign without bugs; in fact, at the moment, a campaign that does not ask testers to submit bugs (asking, for example, only to answer a questionnaire) is not directly integrated in the internal database.

To present the new service we are going to use the real results of a testing campaign we recently conducted for a well-known Italian player.

The campaign falls under the category of “Apps-Benchmarking”: in this type of test, a group of testers is asked to compare and evaluate the usability and the features of a set of applications belonging to the same category. Testers use them as end users and, during the use, they are asked to follow a set of instructions and evaluate exactly what the customer wants to understand about the apps.

A series of questions to be answered leads the tester through the entire test. In this report the focus is not on the test itself, but we need to understand the test in order to create the best visual interface, one that lets the Project Manager interact with the data and allows the customer to visually analyse the outcome.

I created a no-brand version of the survey, deleting any trace of the real app references (simply calling them App A, B, C, D). In this way, the dashboard showing the results can be shared with anyone without any privacy concern.

Our customer asked us to compare four food delivery apps in three different cities: Turin, Milan and Rome. In Rome only three of the apps are available on the market, in Turin two, in Milan four. The list of questions is exactly the same for all testers, but obviously in Milan they are testing four apps, in Rome three and in Turin two.

Testers were required to go through an initial familiarization with the applications before starting the real analysis, and a first page of questions asks them to analyse general information such as the number of restaurants, the variety of the menu etc.

Obviously, this is objective information that can be found in the applications, but the customer was interested in understanding how clear the apps are for end users. Even if 150 types of cuisine are available, if users cannot find them because of the complexity of the menu, the app will be perceived as poorly functional.

Here is the list of questions asked in Milan; it was the same for Rome and Turin:

Demo - Apps Benchmarking - Comparative survey - Milan

• Tester ID
(You can find your tester ID on the manual page)

• From how many restaurants in your city is it possible to order through the different applications? *
(one numeric answer, “# Restaurants”, for each app: C, A, B, D)

• How many different types of cuisine are available on the different applications (e.g. Mexican, Thai, Italian, ...)? *
(one numeric answer, “# Types of cuisine”, for each app: C, A, B, D)

• With which criteria is it possible to search for a restaurant on the apps you tested?
(grid with one column per app, C, A, B, D, and the rows: Distance, Type of cuisine, Price range, Alphabetical order, Available offers, Rating, Delivery costs, Favourites)

• Was it possible to use other search filters as well?
(free answer for each app: C, A, B, D)

After the first phase of general usage of the applications, we asked testers to place an order on the application in order to get insights about their journey when selecting dishes and buying, and to express their opinion about delivery, taste and variety of choice.


Each tester had the possibility to order from all the applications; the survey was exactly the same, so we present only the list of questions for App A, the same applying to the others. Here is the part of the survey related to the purchase phase:

Order – App A

After placing the order on App A, fill in the questionnaire below.

• Tester ID
(You can find your tester ID on the manual page)

• E-mail *
(Enter the e-mail you use to log in on crowd.app-quality.com)

• In which city did you place the order? *
(Milan / Turin / Rome)

Characteristics of the app

• Version of the app you are testing *
(If you do not know how to find it, read the instructions in the “Frequently asked questions” submenu of your manual.)

• Which of the following features are provided by the app? *
(Browsing the archive of previous orders with the possibility to reorder; Possibility to leave and read reviews and feedback; Possibility to check the status of the order (purchase, preparation, delivery…); Possibility to save favourite restaurants to find them quickly)

• Rate the clarity of the menus (description of dishes, photos, prices) *
(scale from 1 to 5)

• Rate the ease of use of the interface *
(scale from 1 to 5, from “Very poor” to “Very simple”)

Since it is a testing campaign with no bugs to be submitted, the survey to fill in was created on a third-party service and not directly on the platform. In the previous section, we described and showed the Entity-Relationship diagram directly linking an Excel file (the output of this kind of test without bugs) to our database of testers. Starting from this new integrated database:

- I created calculated columns and formulas to show the results I considered important for the testing campaign. The results can be seen directly in the interactive webpage on the customer dashboard; in fact, I consider it useless and redundant to explain the calculations here, since they are self-explanatory in the interactive report, where each section carries a descriptive title above the interactive graphs (for example, the average app rating is the average of the scores given by each tester to the app; see the sketch after this list);

- In Power BI, some visuals are available out of the box and other custom visuals can be imported. Here is the list of the visuals I decided to use for the webpage: Clustered bar chart; Gauge; Stars; Card; Treemap; Funnel (again, I do not consider it valuable to write down a theoretical description of each chart's purpose since the interactive webpage clearly shows their functionality).
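As an illustration of one of these calculations, here is a minimal pandas sketch of the average app rating computed from a hypothetical export of the survey answers; the file name, column names and sample values are invented for the example.

    import pandas as pd

    # Illustrative rows: one answer per tester and per app (hypothetical columns)
    answers = pd.DataFrame({
        "tester_id": [1, 1, 2, 2, 3],
        "app":       ["A", "B", "A", "B", "A"],
        "city":      ["Milano", "Milano", "Roma", "Roma", "Torino"],
        "menu_clarity_score": [4, 3, 5, 2, 4],   # 1-5 scale from the survey
    })
    # In the real case the answers would be read from the third-party export, e.g.:
    # answers = pd.read_excel("apps_benchmarking_results.xlsx")

    # Average app rating = mean of the scores given by the testers to each app, per city
    avg_rating = answers.groupby(["app", "city"])["menu_clarity_score"].mean().round(2)
    print(avg_rating)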

Here is the final dashboard we are going to describe.

The welcome page of the interactive report allows the user to have an overall picture of the different sections of the report he is going to navigate.

On this page, the user can also download an Excel file with the aggregated results of the survey and a document with the instructions, containing a link to the webpage on our platform that explains to testers the steps to go through while conducting the test, as well as the webpage with the questionnaires to be filled in.

The interactive webpage is made up of seven sections, through which I summarized the results that emerged from the test:
