
Master’s degree in:

Computer Science and Engineering

A reference approach and an application profile to support the integration of chatbot systems with web information sources

Supervisor: Prof. Alfonso FUGGETTA

Master Thesis by:

Paolo CAPPELLO

Matr. n. 841469


Having reached this milestone, I am happy to be able to thank everyone who made the achievement of this important personal goal possible.

First of all, I thank Professor Alfonso Fuggetta for his kind availability, obviously including Irene Celino and Marco Comerio for their invaluable guidance and for the considerable amount of patience they showed towards me.

I also thank all the colleagues at Cefriel with whom I had the pleasure of working, for allowing me to carry out my internship in a calm, professional and stimulating environment.

A warm thank-you goes to my parents for all the sacrifices they made, for sustaining me over the years and for supporting all my decisions, even the most questionable ones.

A big hug goes to Andrea, Marco, Omar and Mattia for the strong friendship that binds us, for all the experiences we have shared and for those we will still share. Finally, I thank everyone who accompanied me along this journey, allowing me to become the person I am today.


Chatbots are software conversational agents whose distribution across the market is predicted to grow over the next years. These software agents simulate an "intelligent" conversation with final users by exploiting human natural language and enable the accomplishment of tasks across the web, e.g. reservation requests, public and private data retrieval. The information consumed by chatbots is generally published and maintained by stakeholders within datasets acting as web information sources. The thesis proposes BotDCAT-AP, an application profile composed of a set of best practices and a vocabulary enabling the creation of expressive information source descriptions to be used along the chatbot development process. BotDCAT-AP enables a reference approach that reduces the integration effort of web information sources within chatbot applications, and increases dataset discoverability, reuse and sharing within the developers' community. Moreover, the thesis proposes the BotDCAT-AP Interpreter, a software application that supports the usage of information source descriptions defined according to the application profile. Finally, BotDCAT-AP enhances the chatbot reference architecture with a new component managing the access to information sources for the formulation of appropriate answers to users' requests. A qualitative evaluation of the thesis contribution is given by the comparison of two different chatbot solutions: TestBot E015 and Talkin'Piazza Bot. The former application is built using an existing framework, while the latter is developed exploiting the proposed application profile, the Interpreter and the enhanced chatbot reference architecture. The comparison of the two implementation experiences highlights the advantages of adopting the proposed solutions.


Chatbots are conversational software agents whose adoption on the market is expected to keep growing over the next few years. Their main goal is to engage end users in natural-language conversations, creating the impression of talking with "intelligent" entities. The conversations aim at the correct completion of specific activities that are useful to the end user, such as managing reservations or requesting data from the public and private sectors. In general, the information desired by the user and returned by chatbots is made available and maintained by external actors through datasets published on the web; chatbots therefore access these information sources to satisfy user requests. This thesis proposes BotDCAT-AP, an application profile consisting of a set of guidelines and a vocabulary with which to create explicit descriptions of information sources, in order to facilitate the development of chatbot applications that exploit such datasets. In particular, BotDCAT-AP establishes a reference approach to ease the integration of the sources available on the web into chatbots, promoting the visibility, reuse and sharing of datasets among the developers of this kind of application. In addition, the thesis proposes a software solution, called BotDCAT-AP Interpreter, that supports developers in using the information sources described through the proposed application profile. Finally, BotDCAT-AP improves the general architecture of chatbots, because it identifies and isolates the key component managing the access to information sources, enabling the appropriate formulation of answers to users. A qualitative evaluation of the contribution is provided through the comparison of two chatbot solutions: the first built using an existing framework, the second realized with the additional support of the proposed application profile. This comparison highlights the advantages deriving from the adoption of the solutions designed during this work.

Keywords: chatbot, conversational agent, application profile, dataset, vocabulary


Contents

1 Introduction
1.1 Motivation
1.2 Innovative contribution
1.3 Outline

2 State of the art
2.1 Chatbots
2.1.1 Evolution of chatbots from the Turing test to Siri
2.1.2 Natural Language Processing (NLP) techniques for chatbots
2.1.3 A classification for chatbot systems
2.1.4 General architecture of chatbots
2.2 Information sources
2.2.1 Descriptions of sources exposed as web services
2.2.2 Descriptions of datasets published over the web

3 Problem setting
3.1 Building a chatbot with existing frameworks
3.1.1 Microsoft Bot Framework
3.1.2 E015 digital ecosystem
3.2 Use case description
3.3 Limitations

4.1.1 Requirements
4.1.2 Best practices
4.2 Reference architecture
4.3 BotDCAT-AP Vocabulary
4.4 Adopting the reference architecture
4.4.1 Design-time
4.4.2 Run-time

5 Results
5.1 Use case scenario
5.2 Implementation
5.2.1 Information sources descriptions
5.2.2 BotDCAT-AP Interpreter
5.2.3 Chatbot development and architecture
5.3 Advantages

6 Conclusions

Bibliography


List of Figures

2.1 The Turing test (adapted from [43])
2.2 A sample of conversation with ELIZA chatbot (adapted from [60])
2.3 The proposed chatbot classification
2.4 The proposed general architecture of chatbots
2.5 W3C Data on the Web Best Practices context (source [11])
2.6 Data Catalogue vocabulary class diagram (source [19])
2.7 DCAT Application Profile class diagram (source [44])
3.1 Architecture of a chatbot developed with the Microsoft Bot Framework
3.2 Architecture of a chatbot exploiting E015 services
3.3 TestBot E015 basic interactions
3.4 TestBot E015 events search
3.5 TestBot E015 directions search
3.6 TestBot E015 general architecture
4.1 Use case diagram of chatbot actors
4.2 Proposed high-level architecture
4.3 BotDCAT-AP simplified UML Class Diagram
4.4 BotDCAT-AP complete UML Class diagram
4.5 Design-time sequence diagram
4.6 Run-time sequence diagram
5.1 Talkin'Piazza functionalities
5.4 "OpenStreetMap" BotDCAT-AP description
5.5 BotDCAT-AP Interpreter landing page
5.6 BotDCAT-AP Interpreter functionalities


List of Tables

2.1 Data on the web challenges and benefits
4.1 BotDCAT-AP defining classes
4.2 BotDCAT-AP defining properties
5.1 Advantages of our solution


1 Introduction

Chatbots are software programs capable of simulating a conversation between humans and computers. The main goal of these applications is to provide suitable answers, possibly carrying out specific actions, based on the context of conversations and users' intentions. Chatbots usually operate on standard communication channels, including the majority of chat and instant messaging (IM) applications on the Internet, but also on customised channels or infrastructures (e.g., websites, mobile applications) where text and audio messages, pictures, video or any other content is transmitted. More specifically, chatbots can be seen as "intelligent" information retrieval systems that access relevant datasets available on the web in order to retrieve useful data for the formulation of appropriate answers to users' requests (e.g., weather forecasts, reservations, public and private sector information). Datasets are published and mainly maintained by external stakeholders that generally offer their web information sources as services, enabling their usage by multiple software applications, including chatbots.

1.1 Motivation

Chatbots represent one of the major rising trends and their usage and distribution are predicted to grow over the next years. Gartner places chatbot systems among the top strategic technology trends for 2017 [2], evolving and expanding the use of Artificial Intelligence and Machine Learning in apps and services during the next 20 years. A 2016 survey conducted by Oracle [3] reports that 80% of respondents already use or plan to employ chatbots by 2020. This is because chatbots open up new ways and opportunities to save time and resources. A 2015 study directed by McKinsey [1] states that 29% of customer service positions in the US could be automated by means of chatbots, resulting in $23B in annual savings.

With this growing demand and market potential, the need arises to simplify and standardize the development of such applications. As a matter of fact, even when relying on existing development frameworks, great effort is required to create the set of components that connect chatbots to web information sources. This thesis proposes a reference approach for reducing the effort required to develop those components and, at the same time, improving the sharing and reuse of web information sources.

1.2 Innovative contribution

The thesis starts with an analysis of the different chatbot implementation methodologies, particularly focusing on those systems exploiting modern natural language understanding components. This step leads to the identification of a common model representing the general chatbot reference architecture. This model is used as a foundation for the creation of a chatbot (named TestBot E015) providing information about events and public transport. The creation of such an application aims at identifying current requirements and limitations of the chatbot development process. What emerges is that the major issues are related to the integration of web information sources, containing users' relevant data (e.g., public events and transport schedules), with chatbot internal components. Examples of such issues are (i) the discovery of relevant external datasets and their access methods and (ii) the training and configuration of the chatbot for accessing the proper dataset at run-time. As a matter of fact, while the literature presents several standards to describe datasets as information sources, to the best of our knowledge there exists no solution that supports the description of those information sources for facilitating their integration into chatbot applications.

To overcome the limitations encountered during the creation of TestBot E015 and to compensate for this gap in the literature, the thesis proposes an application profile named BotDCAT-AP to support the integration of chatbot systems with web information sources (e.g., datasets). The solution comprises a set of best practices and a vocabulary describing terms and relations to be used along with the descriptions of information sources. Specifically, the vocabulary is built following the DCAT-AP [44] principles of reusing standard vocabularies and avoiding the inclusion of domain-specific terms and properties.

The application profile development leads to the creation of a specific software program (i.e., the BotDCAT-AP Interpreter) exploiting the information source descriptions built using the proposed vocabulary, and the identification of an internal chatbot component (i.e., the Wrapper) in charge of accessing the data contained within those external sources.

Finally, the application profile and the previously mentioned components are used to develop a second example of chatbot, deployed within a European project named Piazza1. The comparison between this chatbot and TestBot E015 is used to highlight the advantages derived from the adoption of the proposed innovative contribution and to show how the previously mentioned limitations are resolved.

1.3 Outline

The thesis is organized as follows:

• Chapter 2 contains an overview of the main thematic areas of this thesis: chatbots and information sources. The former describes the evolution of chatbot systems and their characteristics. The latter concerns the analysis of available standards for the description of web services and datasets published over the web, which constitutes the starting point for the development of the proposed solution.

• Chapter 3 describes the realization of a chatbot system with the support of an existing framework. This activity enables the identification of current limitations encountered along the chatbot development process.

• Chapter 4 presents the innovative contribution of this thesis. The work starts with the identification of requirements and best practices for chatbot development. The obtained results are then used as the basis for the realization of an application profile describing data sources accessed by chatbots. Finally, the chapter ends with the description of additional software components exploiting the application profile and the advantages derived from their adoption.

• Chapter 5 illustrates the development of a chatbot application with the support of the innovative contribution described in Chapter 4.

• Chapter 6 summarizes the conclusions of this thesis and proposes future extensions.


2 State of the art

This chapter starts (Section 2.1) with the state of the art for chatbot systems, providing a brief introduction to their history, their classification and their general architecture. Then, in Section 2.2, current standards for the description of information sources and their access methods are described.

2.1 Chatbots

As mentioned in Chapter 1, chatbots are software applications aiming at simulating conversations with humans in natural language and proposing suitable answers based on the context. The final purpose of this simulation has changed over the years and has taken on different meanings. In the following, this chapter proposes an introduction to the history of chatbots in order to clarify the evolution of the chatbots' role and underlying technologies.

2.1.1 Evolution of chatbots from the Turing test to Siri

The evolution of chatbots is strictly related to advancements in computer science. These software applications were theorized by Alan Turing in 1950 [56], when, for the first time and in relation to the newborn computers, the existence of intelligent machines capable of reacting to external inputs and, at the same time, acting as a human being was hypothesized.


In his work, Turing laid the foundations for a test meant to evaluate the thinking capability and the effective intelligence of a machine by judging its ability to simulate human behaviour. The Turing test is based on another experiment named the Imitation Game.

The Imitation Game is composed of (i) a woman, providing answers to incoming questions; (ii) a man, also answering incoming questions but pretending to be a woman, aiming at fooling the third participant into believing he is of the opposite sex; (iii) an interrogator, asking questions to the previous two with the objective of telling the man and the woman apart.

The interrogator is placed in front of an apparatus similar to modern keyboards and computer monitors and, through a series of questions in text-based natural language, attempts to tell the man from the woman. The interaction takes place by means of messages exchanged through the provided equipment only and without the interrogator knowing the actual sex of the two other individuals.

Turing's proposal consists in substituting one human participant with a machine (i.e., a chatbot) driven by the same intent as the replaced individual. The Turing test (depicted in Figure 2.1) therefore consists of (i) a human, providing answers to incoming questions; (ii) a computer, aiming at fooling the third participant; (iii) an interrogator, having the objective of telling the human and the computer apart.

The test takes place under the same conditions as the Imitation Game. If the interrogator is not able to properly understand which of the two is the human then, from Turing's perspective, the computer has proven to exhibit intelligent behaviour that is indistinguishable from that of a human and, as a consequence, can be considered an intelligent machine.

In 1991 Hugh Loebner founded a competition called the Loebner Prize1 with the purpose of encouraging and promoting the creation of chatbots capable of passing the Turing test. Nowadays, this annual competition still takes place and offers valid examples of state-of-the-art chatbots.

Figure 2.1: The Turing test (adapted from [43])

A first example of chatbot is represented by ELIZA [61], developed by Joseph Weizenbaum between 1964 and 1966 at the Massachusetts Institute of Technology. The most famous script based on this chatbot was created to provide an emulation of a Rogerian psychotherapist. This choice was initially preferred because a psychiatric interview allows the chatbot to act as an individual having no knowledge about the external world and to demonstrate no personality. The same approach was used by the first winners of the Loebner Prize competition [43].

ELIZA's functionality is based on the concept of keyword recognition; when the chatbot receives a message, the input is inspected for the presence of selected keywords. If one such word is found, the input sentence is transformed according to a specific rule. On the other hand, if the result of this inspection is inconclusive, another rule that occurred previously during the conversation, or a standard transformation, is triggered. An example of conversation with this chatbot is contained in Figure 2.2.


The scope of ELIZA and of all the other chatbots developed until the end of the nineties (e.g., Parry2, Jabberwacky3, ALICE4) is to strictly emulate a predefined human behaviour in order to trick the conversational partner into thinking that there is no computer program involved in the conversation.

The goal of modern chatbots has changed over the last years. Apple Siri5, Google Assistant6 and Microsoft Cortana7 are examples of chatbots aiming at assisting users in the execution of different tasks and activities. As such, chatbots are no longer employed to "fool" users, who are perfectly aware that they are not talking to a person. The objective is to provide useful information through a conversation that is as close as possible to a natural dialogue.

Figure 2.2: A sample of conversation with ELIZA chatbot (adapted from [60])

We provide two examples to better clarify the concept of task. AzureBot8 is a chatbot created by Microsoft to increase users' productivity over Azure services, which enables the interaction with the resources deployed over users' subscriptions by means of natural language. Herzi9 can connect public events with transport information, conversing directly with users through a chat service.

2 https://www.botlibre.com/browse?id=857177
3 http://www.jabberwacky.com/
4 http://alice.pandorabots.com/
5 https://www.apple.com/ios/siri/
6 https://assistant.google.com/
7 https://www.microsoft.com/en-us/windows/cortana
8 https://microsoft.github.io/AzureBot/

2.1.2 Natural Language Processing (NLP) techniques for chatbots

There are several mechanisms that allow chatbots to process and interpret text-based inputs sent by the users, generically called Natural Language Processing (NLP) methodologies. Common rule-based approaches are particularly appropriate for chatbots built to succeed in the Turing test, attempting to simulate human behaviour and providing the illusion of talking directly to a real person. However, a different approach based on the concept of intent recognition is more suitable for modern chatbots, which are focused on the completion of specific tasks and the provision of useful information to users. In the following, the two approaches are illustrated.

Rule-based approach

A first approach to develop chatbots involves the creation and utilization of components exploiting the definition of rules that are triggered once the input is processed. ELIZA represents the first example of a chatbot implemented by following this paradigm. The general procedure takes into account the presence of specific keywords within the input sentences and triggers the associated rules. The fundamental technical issues managed by ELIZA are as follows:

• Identifying the most important keywords within the user input and their context;

• Selecting the appropriate rules based on the keywords or selecting the appropriate behaviour when no keyword is identified;



• Providing an extensible solution to enable adding more scripts with further rules.

ELIZA's scripts are composed of the list of recognizable keywords and the set of associated rules, also known as transformations due to the fact that they manipulate the input sentences to create appropriate answers. The program runs by executing the scripts that, all together, characterize the behaviour of the chatbot. ELIZA is not restricted to a specific behaviour or a single language: the NLP system can be extended and modified even by other users by editing or adding more scripts.
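To make the keyword-and-transformation mechanism more concrete, the following minimal Python sketch (our own illustration, not Weizenbaum's original implementation) shows how an input sentence can be matched against a small script of keyword rules and turned into an answer:

```python
import re
import random

# A tiny ELIZA-like script: each keyword is associated with response
# templates; "%s" is filled with the text captured after the keyword.
RULES = {
    "i am": ["Why do you say you are %s?", "How long have you been %s?"],
    "mother": ["Tell me more about your family."],
    "because": ["Is that the real reason?"],
}
DEFAULT = ["Please go on.", "Can you elaborate on that?"]

def reply(utterance: str) -> str:
    text = utterance.lower().strip(".!? ")
    for keyword, templates in RULES.items():
        match = re.search(rf"\b{keyword}\b(.*)", text)
        if match:
            template = random.choice(templates)
            rest = match.group(1).strip()
            return template % rest if "%s" in template else template
    # No keyword found: fall back to a standard transformation.
    return random.choice(DEFAULT)

print(reply("I am unhappy"))  # e.g. "Why do you say you are unhappy?"
```

Real scripts such as those of ELIZA and ALICE are of course far richer, but the loop above captures the essence of the rule-based approach: inspect the input for keywords and apply the associated transformation, falling back to a generic answer otherwise.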

Starting from ELIZA, adaptations and improvements were proposed. A famous example is represented by Artificial Linguistic Internet Computer Entity (ALICE) [60], developed by Richard Wallace and winner of the Loebner Prize competition in 2000, 2001 and 2004. ALICE makes use of the general architecture previously adopted by ELIZA, extending the triggering rules in number and complexity. Rules are stored in specific files written in the Artificial Intelligence Markup Language (AIML [10]), a derivative of XML specifically built for chatbot development, which still represents a standard for the implementation of rule-based NLP systems.

Intent recognition approach

Intent recognition represents a modern and alternative solution to the previous methodology. The basic assumption underlying this approach is that every sentence in natural language coming from the users can be associated with a label symbolizing the intention of the human interlocutor to complete specific tasks or actions.

Chatbots that are implemented following this paradigm make use of components named Natural Language Understanding (NLU) engines [29], in charge of translating users' input into machine-understandable actions. More specifically, those engines associate with every single input (utterance) an intent, a label representing what users wish to accomplish using the chatbot, and entities, domain-specific information extracted from the utterance when present. The association of intents and the extraction of entities is accomplished by means of machine learning (ML) algorithms (e.g., logistic regression, conditional random fields). To facilitate the development of chatbots, many companies (e.g., Microsoft LUIS10, Google API.ai11, Facebook Wit.ai12) are offering NLU engines as Software-as-a-Service (SaaS), or more precisely as AI-as-a-Service, integrable with chatbot systems using REST API endpoints [29].

In order to work properly, NLU engines need to be trained at design-time so that they are able to identify intents at run-time. Intent training is performed over a set of input examples provided by the chatbot developer. Each sentence of the training set is manually assigned to an intent, representing the meaning of the action to be triggered. The set of intents recognized by the chatbot is usually identified from the project requirements analysis.

Entities are used to identify the possible parameters that are required by an action. Standard entities (e.g., dates, names, locations, numbers) are generally automatically extracted by NLU engines without the need for prior training. On the contrary, customised entities (i.e., ad-hoc keywords that need to be identified in the utterance) require a different treatment. First, entities are structured into a taxonomy by the chatbot developer. Then, those entities are highlighted within the utterances of the training set previously used to predict the intents. Finally, the NLU engine specific training is started. This process enables the generation of a prediction model for intents and entities.

Through the completion of the previous steps, NLU engines are trained to identify the intent and extract the entities of new and unseen utterances at run-time, and to attribute a score representing the level of confidence of the prediction. The information computed by NLU engines is used to guide the chatbots in selecting the action to perform accordingly.

10 https://docs.microsoft.com/en-us/azure/cognitive-services/luis/home
11 https://docs.api.ai/docs
12 https://wit.ai/docs

With the intention of better introducing the concepts of intent and entity, let's consider a chatbot providing weather forecasts. This chatbot can interpret limited inputs such as "tell me the weather in Milan", "what are the weather forecasts for tomorrow?", "and for the weekend?". All these utterances are associated by a trained NLU engine with a hypothetical intent called "get weather", representing the intention contained in the sentences above. Moreover, the engine extracts "Milan", "tomorrow" and "weekend" as entities. Once the chatbot receives the intent "get weather" from the NLU engine, it is able to identify the action to carry out (i.e., querying the weather forecast data source using location and dates as parameters).
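As an illustration, the sketch below shows what the output of a generic NLU engine might look like for the first utterance, and how a chatbot could route the conversation based on it. The field names are purely hypothetical, since each vendor uses its own response format:

```python
# Hypothetical output of a generic NLU engine for the utterance
# "tell me the weather in Milan"; field names are illustrative only.
nlu_result = {
    "query": "tell me the weather in Milan",
    "topScoringIntent": {"intent": "get weather", "score": 0.94},
    "entities": [{"type": "location", "value": "Milan"}],
}

def select_action(result: dict, confidence_threshold: float = 0.7) -> str:
    """Choose the chatbot action based on the predicted intent."""
    intent = result["topScoringIntent"]
    if intent["score"] < confidence_threshold:
        return "ask_clarification"
    if intent["intent"] == "get weather":
        # Here the chatbot would query the weather forecast data source,
        # using the extracted entities (location, dates) as parameters.
        return "query_weather_source"
    return "fallback"

print(select_action(nlu_result))  # query_weather_source
```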

2.1.3 A classification for chatbot systems

Chatbots are also known as chatterbots, bots, machine conversation systems and virtual agents [50]. A standard classification for them has not been proposed in the literature yet. Generally, chatbots are in close relation with the concept of "agent", defined as "a component of software and/or hardware which is capable of acting exactingly in order to accomplish tasks on behalf of its user" [36]; some agent derivations, such as Software Agents (SA), Virtual Agents (VA) and Intelligent Personal Assistants (IPA), can also be related. Software Agents appear to be the closest reference to chatbots in the literature, because they share common characteristics: reactive, pro-active and goal-oriented, deliberative, continual, adaptive, communicative and mobile [49].

In this section we propose the classification represented in Figure 2.3, based on the analysis of chatbot examples currently and publicly available over the web, focusing on their domain and their level of understanding.


Command-based bots

Command-based bots are chatbots that rely on predefined commands. There is no NLP involved in these solutions, as they do not possess any knowledge composed of predefined rules and they do not exploit the capabilities of NLU engines. However, this kind of chatbot has its advantages, such as the simplicity of system implementation, resulting in a reduced time-to-market (TTM), and the certainty of a correct, although limited, interpretation of users' inputs. The domain of Command-based bots is limited and composed of the set of interpretable commands. Additionally, their level of understanding of the users' input is practically non-existent.

Figure 2.3: The proposed chatbot classification

TrackBot13 is an example of Command-based bot currently deployed on the Telegram messenger; its objective consists in keeping track of users' shipments throughout different carriers. Input handling is limited to specific commands activated by single keywords and proposed to the users by means of a collection of buttons (e.g., "/traccia" to track a shipment, "/corrieri" to retrieve the list of available carriers).


Standard bots

Standard bots represent the most common category of chatbots available on the market. Their strength is the understanding of a defined domain of utterances proposed in natural language through several NLP techniques. These chatbots are able to process inputs in text format but also voice commands, making use of technologies like speech-to-text to parse the request and text-to-speech to propose a vocal response. However, Standard bots' range of action is limited to specialised domains (e.g., weather forecasts, reservations, public transport information, FAQ-based Q&A sessions) and their implementation does not involve the understanding of inputs that fall outside their specific purpose.

For instance, Poncho14 is a Standard bot operating on different communication channels (i.e., Facebook Messenger, Kik, Viber, Slack and mobile apps) in charge of providing weather forecasts for worldwide locations. The NLP techniques used by Poncho allow the comprehension of utterances mainly related to weather; anything else is not recognized and standard messages (e.g., "So, I'm good at talking about the weather. Other stuff, not so good. If you need help just enter help.") are returned to users in an attempt to redirect the conversation towards the right domain.

AI machines

AI machines are an evolution of Standard bots, from which they inherit all the characteristics except for the fact that they are not limited in domain. This last category of conversational agents is capable of answering queries related to multiple subjects and areas of competence. Obviously, their level of complexity and reliability is much higher than what Standard bots are able to accomplish, and their NLP systems are far more sophisticated than the ones exploiting simple rule-based or intent recognition approaches.

Apple Siri, Google Assistant, Microsoft Cortana and, more generally, all agents classified as Intelligent Personal Assistants (IPA) and QA Computer Systems (e.g., IBM Watson15) compose this category of chatbots.

2.1.4 General architecture of chatbots

Chatbot architecture is far from being uniquely defined in the computer science literature. Some approaches were proposed for the integration of chatbots in IoT [29] and Q&A [46] systems.

Based on the analysis conducted to identify a classification for chatbot systems, we propose a general architecture (depicted in Figure 2.4) containing a high-level view of all the common elements identified during our research. Due to the fact that Standard bots represent the majority of chatbots currently available, we structured our architecture to describe this category of conversational agents only.

Figure 2.4: The proposed general architecture of chatbots


Channels

Channels represent the connectors between the users and the chatbot application. The main role of these components is to transfer messages containing text, audio, images and other conversational data (e.g., information about the sender of the message) from one interlocutor to the other. Channels can occasionally also support group conversations towards the same chatbot to enhance the conversational experience.

Many chat providers (e.g., Telegram, Skype, Facebook Messenger, Slack, Skype for Business, WeChat and many others) are offering solutions to support and facilitate the integration of their communication channels into chatbots. Usually these environments provide REST API access points to exchange messages, or other data like contact relation updates.
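As an example of such REST access points, the following sketch sends a text reply through the Telegram Bot API; the token and chat identifier are placeholders, and a complete chatbot would also receive incoming messages through the corresponding webhook or polling mechanisms:

```python
import requests

# Placeholder credentials: a real bot token is issued by Telegram's BotFather.
BOT_TOKEN = "123456:ABC-EXAMPLE-TOKEN"
CHAT_ID = 987654321  # identifier of the conversation with the user

def send_message(text: str) -> None:
    """Send a plain text reply to the user over the Telegram channel."""
    url = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
    response = requests.post(url, json={"chat_id": CHAT_ID, "text": text})
    response.raise_for_status()

send_message("Hello! I am a chatbot reachable through the Telegram channel.")
```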

Chatbot application

The chatbot application is the heart of the entire architecture. Inputs received from the users through channels are processed by the application and transferred to the component managing NLP. In Figure 2.4 this element is represented by an external component named NLU Engine, so as to describe, with our schema, the currently most common category of chatbots based on intent recognition. Important data, such as the conversation state and user information, is cached within the chatbot and used, together with the output of NLU engines, to identify the action to be performed and access external data sources.

NLU Engine

The NLU Engine is the part of the system that processes and converts natural language utterances into machine-readable data so that they can be consumed by the main chatbot application. Due to the fact that these components exploit Machine Learning algorithms that are usually complex and time-consuming to implement, NLU engines today come mostly as external SaaS solutions. However, research is still very active in this field.

Datasets

Datasets are external information sources that are accessed by the chatbot application to retrieve information and collections of data requested by the users. Datasets serve as knowledge bases; they are queried every time users ask to perform an action (e.g., place a reservation) or simply to be informed or notified of some events (e.g., traffic and weather information updates).

Today, chatbots following the architectural structure shown in Figure 2.4 can be implemented making use of pre-existing frameworks (e.g., Microsoft Bot Framework16 and IBM Watson Conversation17) to remarkably speed up all the deployment steps. The most noticeable advantages brought by the adoption of these frameworks are (i) the auto-configuration and the possibility to deploy chatbots over most of the available communication channels at the same time and (ii) the presence of NLU engines and SDKs to ease the configuration and the interfacing of those components with the chatbot application. Still, developing chatbots with the aforementioned frameworks requires time and effort to code the required software parts; we will show an example in Chapter 3.

16 https://dev.botframework.com/

2.2 Information sources

An important part of the deployment and implementation of chatbots is reserved to the integration of external information sources. In Section 2.1 we pointed out a significant difference between chatbots whose purpose is to simulate a human behaviour in order to pass the Turing test, and other, more task-oriented solutions aiming at providing a service for final users. Chatbots implementing the latter concept, representing the majority of solutions available on the web, are characterized by the access to external and heterogeneous datasets. For instance, a hypothetical chatbot retrieving information about weather would most probably need to query a dataset containing data about weather forecasts; another chatbot in charge of proposing public events to the citizens of a city would have to access a series of external data sources containing information about those events.

Dataset access is usually accomplished by means of web service invocations; therefore, in the following, a description of different languages to describe both web services and datasets is provided.

2.2.1 Descriptions of sources exposed as web services

Web services are software entities that can be discovered and invoked by other software systems. In our context, web services compose the access layer that lies between chatbots and datasets; this section provides an insight into state-of-the-art languages, techniques, reference and standardization attempts relevant for the description of web service properties. The analysis starts with languages for the syntactic description of web service functionalities. Then, an overview of semantic solutions for web service descriptions is provided. Finally, innovative approaches to enhance functional descriptions with the specification of non-functional properties are detailed.

Syntactic web service descriptions

A large set of languages has been proposed for the description of the functionalities provided by Web services.

Concerning SOAP-based Web Services [16], the Web Service Description Language (WSDL) is the adopted standard language. It is XML-based, providing a formal and computer-readable description of SOAP Web Services. WSDL enables the functional description of interfaces for software components implemented in different languages by dealing with (i) interface information describing publicly available operations; (ii) data type declarations for message requests and responses; (iii) binding information about the transport protocol; (iv) address information for locating the service.

The technology underlying RESTful web services [47] is an evolution of SOAP based on the primary Hyper-Text Transfer Protocol (HTTP) methods (i.e., POST, GET, PUT, DELETE) that does not adopt WSDL. Despite the fact that a common agreement still needs to be reached in the literature [42], several description languages were proposed to be adopted for RESTful Web Service descriptions, such as WADL [24] (a REST adaptation of WSDL), OpenAPI18 (previously known as Swagger), RAML19, API Blueprint20 and I/O Docs21.

WADL is designed to provide a machine-readable protocol description for HTTP-based applications, and particularly for those exploiting the Extensible Markup Language (XML). WADL describes web services by means of resource elements that are composed of a list of methods. Each method corresponds to a particular operation exposed by the web service. Moreover, methods contain param elements representing the input parameters characterizing the request tag. HTTP response status codes are dealt with within the response part of methods.

OpenAPI is part of an initiative focused on creating and promoting a language-agnostic API description format based on the Swagger framework22, which defines the set of files required to describe REST APIs. While WADL covers any possible API design at the cost of complexity, Swagger aims at providing a description covering the more common design patterns, resulting in a much simpler solution. Descriptions are produced using the more human-readable JSON format and contain information about the URL path, HTTP method, parameters and HTTP response status codes for every web service method contained in the description.

RAML is a YAML-based [6] language for describing RESTful APIs as well as APIs that do not conform to REST restrictions. Similarly to Swagger, RAML enables the description of HTTP methods but also provides a way to represent output schemas within method definitions.

API Blueprint is a documentation-oriented description language. Essentially, this language is composed of a set of semantic assumptions laid on top of the Markdown syntax (i.e., a plain text formatting syntax) used to describe a web API.

A last example of language aimed at describing APIs is I/O Docs. Through the definition of API resources, methods and parameter levels in JSON, I/O Docs generates interactive documentation for the web services.

18 https://www.openapis.org/
19 http://raml.org/
20 https://apiblueprint.org/
21 https://github.com/mashery/iodocs
22 http://swagger.io/
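To give a flavour of such syntactic descriptions, the fragment below sketches a minimal OpenAPI-style document for a hypothetical weather web service, expressed here as a Python dictionary; the endpoint and parameter names are illustrative and not taken from any real API:

```python
# Minimal OpenAPI-style description of a hypothetical weather service;
# only a few of the fields defined by the specification are shown.
weather_api_description = {
    "openapi": "3.0.0",
    "info": {"title": "Weather forecast API", "version": "1.0.0"},
    "paths": {
        "/forecast": {
            "get": {
                "summary": "Retrieve the forecast for a location and date",
                "parameters": [
                    {"name": "location", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "date", "in": "query", "required": False,
                     "schema": {"type": "string", "format": "date"}},
                ],
                "responses": {
                    "200": {"description": "Forecast returned successfully"},
                    "404": {"description": "Unknown location"},
                },
            }
        }
    },
}
```

A description of this kind tells a client which URL path, HTTP method, parameters and response codes the service exposes, which is precisely the information a chatbot developer needs in order to wire an information source into the application.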

Semantic web service descriptions

A popular approach for describing functional properties of stateful and conversational web services is by means of Inputs, Outputs, Preconditions and Effects (IOPE) [57]. The IOPE approach is applicable to both SOAP-based and RESTful web services. Inputs and outputs represent the aspects of a service dealing with the information transformation between the inputs consumed and the outputs generated by the web service. Preconditions and effects manage respectively (i) the state-wise conditions that need to hold before service operations can be invoked and (ii) the state changes that will be applied after the invocation. The main advantage of this approach is that input and output descriptions allow service providers to uniquely associate method parameters with the semantics provided by data vocabularies. Additionally, descriptions of preconditions and effects are useful for web services whose behaviour depends on the state of the world, and whose execution may cause changes to it.

WSDL-S [4] is a solution based on the IOPE model proposed by the W3C. It attempts to describe both the semantic part and the syntactic level, exposing the actual operations of Semantic Web Services in the WSDL language. Semantic models (i.e., ontologies) are maintained outside the original WSDL document describing the syntactic level of the web service. The reference to semantic models is achieved by the usage of WSDL extensibility elements within the WSDL document.

Variations of the IOPE model are employed within all of the existing Semantic Web Services (SWS) frameworks. The most prominent frameworks are OWL-S [34] and WSMO [48]. OWL-S is based on the Inputs, Outputs, Preconditions and Results (IOPR) model. Results are associated with specific preconditions under which they can occur and with the outputs (or effects) of their operations. WSMO is characterized by a more fine-grained definition of Preconditions, Assumptions, Postconditions and Effects in order to define state-wise constraints.

OWL-S and WSMO have proven to be much more complex than what service providers are willing to use to describe their web services. To overcome this problem, the W3C proposes a recommendation called Semantic Annotations for WSDL (SAWSDL) [53]. SAWSDL simplifies the description of Semantic Web Services, establishing correspondences between tags in WSDL and concepts in arbitrary ontologies. A similar approach, named Semantic Annotations for REST (SA-REST) [51], is implemented by the W3C Member Submission for RESTful solutions.

WSMO-Lite [20] is a lightweight version of the WSMO ontology that is meant to be an evolution of SAWSDL. The original simplified tag annotations in SAWSDL are filled with concrete semantic service descriptions. In doing so, WSMO-Lite provides the bridge between WSDL, SAWSDL and (already existing) domain-specific ontologies (e.g., classification schemas, domain ontology models). WSMO-Lite is composed of two levels, namely the Syntactic level and the Semantic level. Web service functional descriptions are represented using WSDL at the former level, while at the latter they are represented as capabilities and/or functional classifications. Capabilities define preconditions and effects. Classifications define the service functionality using some classification ontologies (i.e., a hierarchy of categories).


Another noticeable ontology presenting lightweight approaches to the semantic modelling of web service descriptions is the Minimal Service Model23 (MSM) [30]. MSM is an RDF integration ontology capturing the common ground between existing conceptual models for services. This solution is not intended to be another service model in addition to the ones composing the heterogeneity of this landscape; it is instead an integration model able to capture the core semantics and support the publication and discovery of web services. MSM is built upon existing vocabularies, including SAWSDL and WSMO-Lite.

Enhanced web service descriptions

The previous section presents languages for the description of functional properties and technical aspects of automated web services. In order to enhance and complete web service descriptions, business orientation, co-creation, pricing, legal aspects and security issues must also be considered. Moreover, web service functionalities must be captured in different layers, for different levels of concern. For instance, service providers must be able to see the detailed functionality of services aligned to the organizational resources and objects accessed (a white-box view). Intermediaries, on the other hand, should be limited to a less intimate view of the functionality, but one sufficiently detailed so that they can configure third-party delivery functionality (grey-box view). Finally, consumers would only see a view of the service focused on the interactions (black-box view).

The Unified Service Description Language (USDL) [37] and its enhancement Linked-USDL [12], based on Linked Data principles [7], are examples of languages supporting the production of enhanced web service descriptions. Both solutions are composed of different modules. The Functional Module is the one that allows describing service functionalities at an abstract level, without entering into the details of the technical implementation. The elements contained in the Functional Module depict (i) the features that are offered by the service and (ii) the interfaces implementing access to the service. Features are associated with functions, representing the course of action to be followed in order to generate the value proposed by the service. The main purpose of interfaces is to capture all the different ways of accessing the service and to map these onto the capabilities. Functions are elements that abstract service capabilities and may feature one or more input and output parameters, as well as one or more faults. Functions are characterized by pre-conditions and produce post-conditions (i.e., effects). The decomposition of functions into sub-functions supports different degrees of detail for different concerns related to providers, intermediaries and consumers.

A different approach to produce enhanced web service descriptions consists in complementing functional descriptions with documentation on non-functional properties (NFPs). Basically, while Functional Properties (FPs) characterize what a service does, NFPs describe how it does it. NFPs specify conceptual elements, such as quality of service (e.g., response time and availability), legal aspects (e.g., fair use and copyright), intellectual rights (e.g., allowing or denying composition), and business aspects (e.g., payments and taxes), that are part of the agreement between a service provider and a service consumer.

This agreement takes the role of an understanding about the business transaction between the two counterparts, and it is established by specifying (i) policies, (ii) service level agreements, (iii) licenses and (iv) contracts. Policies [28] establish relationships by exposing both the set of activities that an object must (or must not) perform (i.e., obligations), and the set of activities that an object is permitted (or prohibited) to perform (i.e., authorizations) on target objects. Service Level Agreements (SLAs) [31] describe the minimum performance criteria that providers are expected to follow in delivering their services. SLAs also contain information about corrective actions and penalties that apply when performance falls below the promised standard. Licenses [21] identify the extent to which services can be used (i.e., permissions, commercial terms of use, warranties and indemnities), as well as the licensor's limitations of liability in case of service failure. Service contracts [15] are more comprehensive documents stipulated between providers and consumers that contain contractual terms potentially referring to policy properties, SLA terms and licence clauses.

The Italian Public Administration Services (IPAS) model [14, 38] is an example of a model supporting the description of service contracts for public administration services. Basically, in the IPAS model the service contract is composed of contractual terms that are defined over NFPs.

2.2.2 Descriptions of datasets published over the web

In Section 2.1.4 datasets are described as external information sources that are accessed by chatbots every time users ask for specific data. In an ideal world, chatbots should be able to automatically discover and access datasets published over the web to obtain the requested relevant data.

The W3C recently published a document, namely Data on the Web Best Practices (DWBP) [11], denoting a set of best practices for the publication and usage of data over the web. The general concept is that data should be discoverable and understandable by both humans and machines (e.g., chatbots). More specifically, those best practices are meant to (i) facilitate interaction between dataset publishers and consumers; (ii) promote the reuse of data by providing guidance to dataset publishers and, at the same time, improving the level of consistency in the way data is managed; (iii) encourage trust in the data from the developers' point of view. To achieve this result, the work makes use of the Data Quality Vocabulary [5] and the Dataset Usage Vocabulary [45], created by the same Working Group in charge of publishing the best practices.

Figure 2.5 depicts the context considered within the DWBP identification. Vocabularies are used to create publications that are acknowledged and used by the community of data publishers. Datasets described through DWBP are composed of one or more distributions that specify different physical representations of the data. Both datasets and distributions are characterized by metadata providing machine-readable and possibly human-readable descriptions.

Figure 2.5: W3C Data on the Web Best Practices context (source [11])

Data on the web: benefits

Every best practice is associated with one or more benefits highlighting advantages in the way datasets are published over the web. The complete list is the following:

• Comprehension (C): increases the level of understanding, from a human perspective, about data structure, meaning, metadata and the nature of the dataset;

• Processability (P): facilitates the automatic processing and manipulation of the data contained in datasets by machines;

• Discoverability (D): facilitates the automatic discovery of datasets and their data by machines;

• Reuse (R): augments the possibility of data reuse;

• Trust (T): improves the confidence that consumers have in the dataset;

• Linkability (L): enables the creation of links between datasets and data items;

• Access (A): facilitates access to up-to-date data in multiple formats for both humans and machines;

• Interoperability (I): creates agreement between data publishers and consumers.

Data on the web challenges

Starting from the requirements extracted from the most common use cases [33], the W3C Working Group [11] identified a complete set of challenges that need to be taken into account when publishing or consuming data on the web, summarized in Table 2.1.

Every challenge is associated with a summarizing question, the proposed list of best practices to overcome the challenge and the introduced benefits. For instance, the challenge "Data Access" consists in clearly defining how to provide access to data published over the web. Among the related best practices, there is an explicit indication to provide real-time access to up-to-date data in order to increase Reuse and Access. Bulk download and APIs are the most used solutions for data access. The latter solution is preferred over the former when dealing with large, frequently updated, or highly complex datasets, since it improves Processability and Interoperability. On the other hand, bulk download resolves situations where individually accessing data over many retrievals, to reassemble the complete dataset, leads to inconsistent results.

"Data Access", "Metadata" and "Data Vocabularies" are the major challenges that developers have to face when describing datasets used within chatbot systems. Metadata provide both machine- and human-readable descriptions of datasets and are forged over the utilization of standard terms and popular vocabularies to increase interoperability. The next section proposes a vocabulary to deal with these challenges.


Table 2.1: Data on the web challenges and benefits

Metadata: How do I provide metadata for humans and machines?
  - Provide metadata (R, C, D, P)
  - Provide descriptive metadata (R, C, D)
  - Provide structural metadata (R, C, P)

Data License: How do I permit and restrict access?
  - Provide data license information (R, T)

Provenance and Quality: How can I increase trust?
  - Provide data provenance information (R, C, T)
  - Provide data quality information (R, T)

Data Versioning: How can I track versions and version histories?
  - Provide a version indicator (R, T)
  - Provide version history (R, T)

Data Identification: How can I identify datasets and distributions?
  - Use persistent URIs as identifiers of datasets (R, L, D, I)
  - Use persistent URIs as identifiers within datasets (R, L, D, I)
  - Assign URIs to dataset versions and series (R, D, T)

Data Formats: What data formats should I use?
  - Use machine-readable, standardized formats (R, P)
  - Use locale-neutral data representations (R, C)
  - Provide data in multiple formats (R, P)

Data Vocabularies: How to improve data interoperability?
  - Reuse vocabularies, preferably standardized ones (R, P, C, T, I)
  - Choose the right formalization level (R, C, I)

Data Access: How can I provide access to data?
  - Provide bulk download (R, A)
  - Provide subsets for large datasets (R, L, A, P)
  - Use content negotiation (R, A)
  - Provide real time access (R, A)
  - Provide data up to date (R, A)
  - Provide an explanation for data that's not available (R, T)
  - Make data available through an API (R, P, I, A)
  - Use Web Standards as the foundation of APIs (R, L, I, D, A, P)
  - Provide complete documentation for your API (R, T)
  - Avoid breaking changes to your API (T, I)

Data Preservation: How can data be archived?
  - Preserve identifiers (R, T)
  - Assess dataset coverage (R, T)

Feedback: How can I engage users?
  - Gather feedback from data consumers (R, C, T)
  - Make feedback available (R, T)

Data Enrichment: How can I add value to data?
  - Enrich data by generating new data (R, C, T, P)
  - Provide complementary presentations (R, C, A, T)

Data Republication: How can I reuse data responsibly?
  - Provide feedback to the original publisher (R, I, T)
  - Follow Licensing Terms (R, T)
  - Cite the original publication (R, D, T)

Data Catalogue vocabulary

The Data Catalogue vocabulary (DCAT) [19] is the solution recommended by the W3C to describe datasets and the metadata of their distributions. The usage of DCAT by dataset publishers results in increased Reuse, Comprehension, Discoverability and Processability of the data. The DCAT class diagram is represented in Figure 2.6.

Figure 2.6: Data Catalogue vocabulary class diagram (source [19])

DCAT describes a limited and concise set of classes characterized by the namespace prefix dcat. Those are Catalog, Dataset, Distribution and CatalogRecord. DCAT itself makes use of already existing standard terms and vocabularies to complete and extend those classes; examples are Dublin Core [18] (prefix dct), FOAF [9] (prefix foaf) and SKOS [35] (prefix skos).

Catalogs are containers for several Datasets where the actual data is located. The CatalogRecord class is recommended when the specification of metadata about the Dataset entries in the Catalogs is needed. For example, such optional information can include the first publication date of Datasets within the Catalogs. Dataset metadata specifies some general information such as title, description, language and landing page. Optionally, Datasets can have multiple Distributions that represent different physical formats of the data they contain. More specifically, Distributions can provide information about the media type (e.g., CSV, JSON) and general information about alternative ways to access the Datasets (e.g., downloadable files, API, RSS feed).

The FOAF and SKOS vocabularies are meant to cover the description of, respectively, (i) the publishers and (ii) the domains (e.g., thematic area) that characterize Catalogs and Datasets.
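As a concrete illustration, the following sketch uses the rdflib Python library to build a minimal DCAT description of a hypothetical weather-forecast dataset with a single JSON distribution; the example.org URIs and the dataset content are placeholders:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
dataset = URIRef("http://example.org/dataset/weather-forecasts")
dist = URIRef("http://example.org/dataset/weather-forecasts/json")

# dcat:Dataset with general metadata expressed through Dublin Core terms.
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCT.title, Literal("Weather forecasts", lang="en")))
g.add((dataset, DCT.description, Literal("Hourly forecasts for major cities", lang="en")))
g.add((dataset, DCAT.distribution, dist))

# dcat:Distribution describing one physical way to access the data.
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.accessURL, URIRef("https://api.example.org/forecasts")))
g.add((dist, DCT.format, Literal("application/json")))

print(g.serialize(format="turtle"))
```

Serializing the graph produces a Turtle document that catalogues, crawlers or, as proposed later in this thesis, chatbot components can consume to discover the dataset and its access point.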

Data Catalogue vocabulary Application Profile

The Data Catalogue vocabulary Application Profile (DCAT-AP) [44] is a metadata specification created by a Working Group under the ISA programme and promoted by the European Commission. DCAT-AP extends DCAT, improving the descriptions of public sector datasets in Europe. This improvement is accomplished by reusing the set of terms (i.e., classes and properties) contained in DCAT and related vocabularies but, in addition, terms are specified through their categorization into mandatory, recommended and optional elements. Mandatory terms are classes and properties that must be supplied by data providers and must be processable by data consumers. Recommended terms follow the same concept as the previous ones but stand as recommendations, whilst optional terms can be arbitrarily used or not within the descriptions.

Additions to the DCAT specification are not limited to this categorization. DCAT-AP augments the set of external vocabularies by reusing terms defined in other solutions such as SPDX [55], Schema.org [23] and ADMS [52].

The complete UML class diagram is depicted in Figure 2.7; its skeleton resembles the structure analysed in the previous section about DCAT, but it additionally provides information about term usage recommendations and about the properties coming from the newly added vocabularies. For instance, Dataset descriptions can now be enriched with information about samples, version notes or secondary identifiers; Distributions are able to provide data about their status, denoting the maturity level, and checksums to verify that the contents have not changed.
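Assuming illustrative values only (the IRIs, titles and checksum below are placeholders), a DCAT-AP Dataset description enriched with ADMS and SPDX terms could be sketched as the following JSON-LD object literal; the comments indicate the obligation level that, to the best of our reading of the profile, applies to each property.

```javascript
// Illustrative DCAT-AP Dataset description (JSON-LD as a JavaScript object).
// IRIs, titles and the checksum value are placeholders.
const dcatApDataset = {
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "adms": "http://www.w3.org/ns/adms#",
    "spdx": "http://spdx.org/rdf/terms#"
  },
  "@id": "http://example.org/dataset/public-events",
  "@type": "dcat:Dataset",
  // Mandatory properties for a Dataset:
  "dct:title": "Public events",
  "dct:description": "Public events taking place in the city of Milan",
  // Recommended property:
  "dcat:keyword": ["events", "Milan"],
  // Optional property coming from ADMS:
  "adms:versionNotes": "First public release",
  "dcat:distribution": {
    "@type": "dcat:Distribution",
    // Mandatory property for a Distribution:
    "dcat:accessURL": { "@id": "http://example.org/api/events" },
    // Optional properties coming from ADMS and SPDX:
    "adms:status": { "@id": "http://purl.org/adms/status/Completed" },
    "spdx:checksum": {
      "@type": "spdx:Checksum",
      "spdx:algorithm": { "@id": "spdx:checksumAlgorithm_sha1" },
      "spdx:checksumValue": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"
    }
  }
};

module.exports = dcatApDataset;
```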


DCAT-AP supports extensions in order to cover specific domains of application. GeoDCAT-AP [26] and StatDCAT-AP [27] are two examples aiming at completing the original DCAT-AP specification with metadata to describe more detailed information for geospatial and statistical datasets, respectively.

Figure 2.7: The DCAT-AP UML class diagram

Other solutions for metadata descriptions

Asset Description Metadata Schema (ADMS) [52] is a metadata schema created by the ISA Programme of the European Commission. Similarly to DCAT, ADMS is meant to help publishers create metadata descriptions. The difference is that DCAT is designed to facilitate interoperability between dataset catalogues, while ADMS focuses on highly reusable metadata (e.g., XML schemata, generic data models) and reference data (e.g., code lists, taxonomies, dictionaries, vocabularies) within catalogues. Some ADMS concepts were reused for the specification of these metadata inside DCAT-AP.

INSPIRE [39] is a European directive that lays the foundation for spatial information infrastructures established and operated by the Member States of the European Union. The INSPIRE framework is based on the usage of common specifications for metadata, data monitoring, sharing and reporting. Moreover, INSPIRE defines a set of implementing rules [41] and technical guidelines [40] that were also used as a starting point for the formulation of GeoDCAT-AP, which additionally provides an RDF syntax binding. Regarding metadata, the INSPIRE implementing rules include directives for the description of geospatial datasets.

Schema.org [23] is a community activity founded by Google, Microsoft, Yahoo and Yandex aiming at creating, maintaining and promoting a collection of schemas for structured data on the Internet. This project covers many domains and offers classes and properties to define datasets and catalogues. Schema.org can be optionally used within DCAT-AP to extend the descriptions of the temporal periods covered by datasets.

Vocabulary of Interlinked Datasets (VoID) [17] is an alternative proposed by the W3C specifically created for expressing metadata about RDF datasets. VoID provides a way to describe (i) general metadata (through the Dublin Core model [18]), (ii) access metadata that describes several ways of accessing RDF data, (iii) structural metadata containing information about the datasets' structure and schema, and (iv) links between datasets, facilitating the understanding of the relationships between multiple datasets.
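As an illustration of these four kinds of metadata (again with placeholder IRIs and figures), a VoID description could be sketched as follows:

```javascript
// Illustrative VoID description of an RDF dataset (JSON-LD as a JavaScript object).
// The dataset IRI, endpoint, dump URL and figures are placeholders.
const voidDescription = {
  "@context": {
    "void": "http://rdfs.org/ns/void#",
    "dct": "http://purl.org/dc/terms/"
  },
  "@id": "http://example.org/void/public-events",
  "@type": "void:Dataset",
  // (i) general metadata expressed with Dublin Core
  "dct:title": "Public events (RDF)",
  "dct:description": "RDF version of the public events dataset",
  // (ii) access metadata
  "void:sparqlEndpoint": { "@id": "http://example.org/sparql" },
  "void:dataDump": { "@id": "http://example.org/dumps/events.ttl" },
  // (iii) structural metadata
  "void:triples": 120000,
  "void:vocabulary": { "@id": "http://www.w3.org/ns/dcat#" },
  // (iv) links towards another dataset
  "void:subset": {
    "@type": "void:Linkset",
    "void:linkPredicate": { "@id": "http://www.w3.org/2002/07/owl#sameAs" },
    "void:target": { "@id": "http://example.org/void/another-dataset" }
  }
};

module.exports = voidDescription;
```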


Problem setting

Section 2.1.4 presented the general architecture of chatbots, identifying the major components that are fundamental for chatbot functionalities. This chapter contextualizes the previously identified chatbot architecture by proposing an example that we created using already existing solutions (Section 3.1) and by highlighting the limitations derived from this use case (Section 3.3).

3.1 Building a chatbot with existing frameworks

The web offers multiple solutions (i.e., chatbot frameworks) designed to facilitate and speed up chatbot implementation and deployment processes. Most of the time, these frameworks offer the possibility to exploit the most widely used communication channels to enable conversations between users and chatbots (e.g., Facebook Messenger, Skype, Telegram). In this way, developers can easily deploy the same chatbot application over multiple channels at the same time, while chatbots are capable of interacting with an increased variety of users. Generally, chatbot frameworks also provide proprietary NLU engines that represent the core components for managing users' utterances expressed in natural language. The decision to use a specific NLU engine over the others depends on developer preferences; on the one hand, when dealing with chatbot frameworks, the adoption of proprietary NLU engines simplifies code and reduces development time, because those components are already integrated within the frameworks. On the other hand, NLU engine performance in terms of intent and entity recognition varies considerably; for this reason, developers may choose to use engines different from those provided with the adopted framework because they are more suitable to their needs.

The analysis of the differences among the chatbot frameworks (e.g., Microsoft Bot Framework, IBM Watson Conversation) or among the NLU engines (e.g., Microsoft LUIS, Facebook Wit.ai) listed in Chapter 2 is out of the scope of this chapter. In the following, we use the Microsoft Bot Framework along with its NLU engine, namely the Language Understanding Intelligent Service (LUIS), to develop our example chatbot. This decision was taken because, in our opinion, this framework represents, as of today, the most widely used and best supported solution within the developer community.

In the remainder of this section, we give details on the selected framework and illustrate the data sources that we integrate in our sample chatbot.

3.1.1 Microsoft Bot Framework

The Microsoft Bot Framework is a development framework that supports the creation of chatbot applications. In particular, this platform enables developers to build, connect, test and deploy functional conversational chatbots.

The Bot Framework provides a Software Development Kit (SDK) called Bot Builder, constituted by a set of tools to manage the building process. The SDK is available in two versions based on different programming languages: C# and JavaScript. The former provides a familiar way for .NET developers to create chatbots, while the latter is more oriented to developers experienced with Node.js.

The Bot Framework Emulator is a desktop application provided by Microsoft to support the creation of chatbots over the Bot Framework. This application allows developers to test and debug chatbots before publishing them. The Emulator can interact with a chatbot running locally or remotely through an HTTP tunnel (by exploiting solutions such as Ngrok2). Its functionalities are simple and designed to transfer text messages to the chatbot application by emulating a communication channel and to log JSON requests and responses for later evaluation.

When the debugging process is complete, chatbots are generally deployed over the web so that they can interact with users and scale on demand. The C# version of Bot Builder offers direct deployment on Azure3 through the Visual Studio IDE. In contrast, the JavaScript version uses Restify4 (i.e., a popular framework for building web services) to deploy chatbots as web services. Moreover, by using the Bot Framework, developers can deploy chatbots with continuous integration from git repositories or platforms like GitHub5.
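As a rough sketch of what the JavaScript flavour of the SDK looks like, the following snippet exposes a minimal echo chatbot as a Restify web service, following the conventions of Bot Builder SDK v3; the port, endpoint path, echo behaviour and credential variable names are illustrative.

```javascript
// Minimal Bot Builder (Node.js) chatbot exposed through Restify.
// Port number, endpoint path and application credentials are illustrative.
const restify = require('restify');
const builder = require('botbuilder');

// Web service that receives the messages forwarded by the Bot Connector
const server = restify.createServer();
server.listen(process.env.PORT || 3978, function () {
    console.log('%s listening at %s', server.name, server.url);
});

// Connector handling authentication against the Bot Framework Portal
const connector = new builder.ChatConnector({
    appId: process.env.MICROSOFT_APP_ID,
    appPassword: process.env.MICROSOFT_APP_PASSWORD
});
server.post('/api/messages', connector.listen());

// Default dialog: simply echoes back the user's utterance
const bot = new builder.UniversalBot(connector, function (session) {
    session.send('You said: %s', session.message.text);
});
```

Running such a script locally and pointing the Bot Framework Emulator at the exposed messages endpoint is typically enough to test the conversation loop before any channel is configured.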

Finally, chatbot connections to the selected channels are accomplished through registration on the Bot Framework Portal. The Portal provides a dashboard interface to perform management activities, such as configuring communication channels and managing credentials.

The architecture of a chatbot created using the Bot Framework is depicted in Figure 3.1. The differences with respect to the general architecture proposed in Section 2.1.4 consist in the presence of a component called Bot Connector and in the contextualization of the NLU engine.

Figure 3.1: Architecture of a chatbot developed with the Microsoft Bot Framework

The Bot Connector provides a service that allows chatbots to exchange messages with the communication channels configured in the Bot Framework Portal, by exploiting the REST paradigm through JSON over HTTPS. The available channels are: Facebook Messenger, Skype, Slack, Telegram, Kik, GroupMe and Microsoft Teams. In addition, there are some other channels that enable interaction through SMS (Twilio), emails, voice messages (using Cortana, Microsoft's virtual personal assistant), or text messages exploiting web chats deployed over web sites. Finally, the Bot Connector provides "Direct Line", an API to directly interface chatbots with any other customized solution (e.g., mobile applications).

2 https://ngrok.com/
3 https://portal.azure.com/
4 http://restify.com/
5 https://github.com/

Cognitive Services are a suite of Machine-Learning-as-a-Service (MLaaS) components designed to improve chatbots' "intelligence". Those services expose APIs offering functionalities specific for: natural language processing (e.g., LUIS, Translator Text API, Text Analytics API), image processing (e.g., Computer Vision API, Face API, Emotion API, Video API), speech recognition (e.g., Bing Speech API, Speaker Recognition API), knowledge management (e.g., Recommendations API, QnA Maker API, Entity Linking Intelligence Service API) and search services (e.g., Bing Autosuggest API, Bing Web Search API). The Microsoft NLU engine, named LUIS, is part of the natural language services.
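To give an idea of how LUIS is plugged into a chatbot built with the Bot Builder SDK (v3 conventions), the following sketch registers a LUIS recognizer and a dialog triggered by a recognized intent; the model URL, the intent name ('SearchEvents') and the entity type ('Location') are illustrative and depend on the model trained in the LUIS portal.

```javascript
// Extends the chatbot of the previous sketch with intent recognition via LUIS.
// The LUIS model URL, intent name and entity type are illustrative placeholders.
const builder = require('botbuilder');

// 'bot' is the UniversalBot instance created in the previous sketch.
function addLuisRecognition(bot) {
    const recognizer = new builder.LuisRecognizer(process.env.LUIS_MODEL_URL);
    bot.recognizer(recognizer);

    // Dialog triggered whenever LUIS classifies an utterance as 'SearchEvents'
    bot.dialog('SearchEvents', function (session, args) {
        const location = builder.EntityRecognizer.findEntity(args.intent.entities, 'Location');
        const city = location ? location.entity : 'Milano';
        session.endDialog('Looking for events in %s ...', city);
    }).triggerAction({ matches: 'SearchEvents' });
}

module.exports = addLuisRecognition;
```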


3.1.2 E015 digital ecosystem

E015 [62, 8, 63] is a digital ecosystem enabling the interaction between information systems and actors operating in the public and private sectors (e.g., private companies from multiple business sectors, SMEs, public authorities, universities, innovative startups). E015 is openly accessible and built upon shared standards, processes, policies, technologies and common participation guidelines that allow developers to access data and services for integration with information systems (e.g., web sites, mobile web apps, chatbots) and data publishers to make their contents available. E015 fully exploits the notion of API Economy [22] because it offers a registry of APIs and Web Services made available by providers to build new applications or enhance existing solutions. The number of E015 participants is constantly growing and the platform itself represents one of the legacies that Expo Milano 2015 left to the city and, more generally, to the Italian and European public and private economic system after the closure of the event.

Figure 3.2: Architecture of a chatbot exploiting E015 services

Chatbot applications can take advantage of the digital ecosystem by accessing specific services to retrieve relevant information and propose it to users. The overall architecture of a chatbot system that consumes E015 services is depicted in Figure 3.2. The functional characteristics of this architecture follow the general representation of Section 2.1.4: the NLU engine is still used by the chatbot at run-time to recognize users' intents and to choose the correct information source to be accessed. However, the digital ecosystem creates a layer that lies between the chatbot and the datasets and simplifies developers' effort at design-time. As a matter of fact, E015 brings the following advantages to chatbot development: (i) the presence of a catalogue of services, where developers can search and select the information sources to be accessed by chatbots; (ii) the availability of thematic glossaries (i.e., standard models to represent information by means of taxonomies, ontologies, classification schemes, etc.), so that developers can rely on a set of shared and consolidated data models in order to facilitate data access and interoperability across multiple sources; (iii) bespoke API Descriptors, providing both technical aspects (e.g., interface description, data model description, supported interaction patterns, supported technologies, security requirements) and non-functional aspects (e.g., pricing plan, terms of use, time of availability) that govern API usage.
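Purely as an illustration, a chatbot dialog could consume an E015-style event service along the following lines; the endpoint URL, query parameter and response fields are hypothetical placeholders, since the actual interface and data model of each real service are specified by its API Descriptor and by the adopted thematic glossary.

```javascript
// Hypothetical consumption of an E015-style event service from a Bot Builder dialog.
// URL, query parameter and response fields are placeholders, not the actual E015 API.
const request = require('request');

const EVENTS_SERVICE_URL = 'https://services.example.org/e015/events'; // placeholder

function findEvents(city, callback) {
    request({ url: EVENTS_SERVICE_URL, qs: { city: city }, json: true },
        function (err, response, body) {
            if (err || response.statusCode !== 200) {
                return callback(err || new Error('Service unavailable'));
            }
            // The response structure depends on the data model adopted by the service
            callback(null, body.events || []);
        });
}

// Usage inside a dialog (see the LUIS sketch above): turn the results into a reply
function replyWithEvents(session, city) {
    findEvents(city, function (err, events) {
        if (err) {
            return session.endDialog('Sorry, I cannot retrieve events right now.');
        }
        const titles = events.map(function (e) { return '- ' + e.title; }).join('\n');
        session.endDialog('Events in %s:\n%s', city, titles);
    });
}
```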

3.2 Use case description

In the previous sections, we described the functionalities and characteristics of the main tools to develop chatbots; the Microsoft Bot Framework takes care of communication channel management and natural language processing components, while the E015 digital ecosystem is used to leverage back-end components that act as information sources. The creation of our example chatbot, named TestBot E015, is needed to investigate the current limitations and issues of chatbots from the point of view of their development process.

TestBot E015 is a chatbot application capable of proposing to users events located in the city of Milan and in the three major valleys near the city of Bergamo (i.e., Val Seriana, Val Cavallina and Terre del Vescovado). Figure 3.3 shows some screenshots of basic interactions between a user and TestBot E015 over the mobile Skype channel. In this scenario, a welcome screen providing a description of the chatbot capabilities is proposed when starting a new conversation with the conversational agent (Figure 3.3a); TestBot E015 is capable of interpreting user inputs contextualized within its domain (i.e., public events over the specified locations) but also general inputs that usually appear during the conversation, such as greeting utterances (Figure 3.3b). To help the user better understand what kind of utterances can be correctly recognized, the Aiuto ("help") keyword is used to obtain some examples of inputs (Figure 3.3c). These sentences can be directly typed in by the user or sent using buttons in order to save time and improve the user experience.

Figure 3.3: TestBot E015 basic interactions: (a) Welcome, (b) Greetings, (c) Help

Figure 3.4 presents some extracts of a conversation centred on the retrieval of information about public events. The user can ask TestBot E015 questions in natural language to retrieve lists of events. The chatbot can manage requests

