Low-cost air quality stations and control network for a smart city


of great help during my research. In particular, my thanks go to Alessandro Zaldei, Sara Di Lonardo and Giovanni Gualtieri, who collaborated on the main parts of my research work. I would also like to thank Carolina Vagnoli and Francesca Camilli for their kind support.

My thanks go to all the academics who supported me during the PhD period. Thanks to Prof. Paolo Nesi and the DISIT Lab, with which I worked during the first year. I am also very grateful to both Prof. Michele Basso and Prof. Stefano Ruffo; thanks to their contributions and suggestions, I had the opportunity to identify not only a suitable path for the study but also a direction which stimulated my interest.

From a personal point of view, I am grateful to my friends for being with me in these years, particularly to my dear friend Lorenzo for his support and affection, and also to Dasara, Daniele and Nirmala, who shared this adventure with me. Special thanks are due to Adriano, who taught me to see things through new eyes.

Finally, I would like to thank my family, especially my mum, my sister and my dad who, I'm sure, would be proud of me.


A Smart City (SC) is a city structured and organized with sensors and information networks so that its resources can be accessed through an efficient telematic infrastructure. Prefiguring urban settings where new technologies enable and interact with individuals is one of the most fascinating research themes and deepest issues in the field of urban sciences.

Environmental monitoring and protection in the urban context play a crucial role in people's physical, mental and social well-being. Air pollution is one of the most important factors affecting the quality of life and the health of the increasingly urban population of industrial societies. In the European Union (EU) and in the US, air quality guidelines, evaluated by the World Health Organization (WHO), have been established in order to protect human health and vegetation. It is well known that cities exhibit high air pollutant concentrations for a number of reasons, such as the high spatial density of human activities, including industry, transport and home energy consumption. For these reasons, air quality monitoring is required by national air quality regulations; EU countries share a large number of elements owing to the adoption of Directive 2008/50/EC, which acts as a common framework. However, the cost and complexity of the equipment necessary to meet the standards established by these regulations are generally considerable. Conversely, low-cost air quality sensors are an emerging technology and are now commercially available in a wide variety of designs and capabilities, together with the spread of consumer communication devices (smartphones, Internet of Things devices, etc.). Thus, low-cost systems offer new opportunities for community-based sensing projects. Aimed at investigating the usefulness and potential of next-generation monitoring systems for air quality measurements at higher temporal and spatial resolution, this thesis addresses the challenging issue of improving the reliability of low-cost air quality sensors by performing both laboratory and field testing. We explore how these monitoring systems can be used to improve the public understanding of local air quality. This community-based monitoring relies not only on technical aspects but also on communication skills, mainly concerning social media, and on the participation methodologies typical of citizen science.

In any case, sensor calibration is mandatory in order to quantify the reliability of the measurements. In this work the AirQino air quality monitoring system, developed by the Institute of Biometeorology-Italian National Research Council (IBIMET-CNR), has been analysed. AirQino is based on an Arduino Shield Compatible electronic board integrated with low-cost, high-resolution sensors. It is designed to collect measurements of both meteorological parameters (i.e. relative humidity and temperature) and pollutant/species concentrations (CO, CO2, O3, NO2, CH4, PM2.5 and PM10). Each sensor exhibits a characteristic curve that defines its response to an input. A preliminary evaluation of these low-cost sensors was made, based on the specifications provided by each sensor's datasheet.

Therefore, a twofold sensor calibration was performed: (i) in the IBIMET-CNR laboratory, against reference air quality monitoring instrumentation; (ii) on-site, against fixed air quality stations currently operated by the local authority (ARPAT). The results of the sensor calibration, performed through linear and non-linear regression, were expressed using a set of typical statistical indicators. We then performed pilot studies with AirQino.

The first one is an integrated monitoring platform (IMP) for monitoring traffic flows and related air pollution in urban areas. The IMP included AirQino, to measure air pollutant concentrations, and a traffic monitoring device, equipped with a camera sensor and video analysis software, to detect vehicle counts, speed and category. The data provided by the IMP were used to investigate the influence of local road traffic and meteorological parameters on NO2 and CO2 concentrations; both linear regression and an artificial neural network (ANN) were applied to the full dataset. The second application is an outdoor air pollutant monitoring activity performed using a mobile version of AirQino installed on a special bike developed by the Department of Industrial Engineering of the University of Florence (Italy). This case study was selected for its potential interest for future mapping of pollutants at road and user level. Finally, in order to assess its performance under extreme temperature and humidity conditions for future measurement campaigns, the system was tested at the International Scientific Base in Ny-Alesund (Svalbard Islands, Norway) in both fixed and mobile mode during March 2017. These tests are still running.

Furthermore, with a view to future implementation at the European level, a research strategy based on participatory methods (citizen science) was proposed. A survey on urban air quality monitoring was conducted to investigate citizens' awareness of air pollution phenomena and their willingness to host, at home and/or at their workplace, the AirQino air quality monitoring system in combination with a road traffic monitoring device. The results of the survey highlighted good awareness among the panel under study and a potential interest in supporting environmental projects.


Contents

1 Introduction
  1.1 Contributions

2 Literature review
  2.1 Air quality monitoring
  2.2 Participatory approach for environmental monitoring
  2.3 Low-cost sensors for air pollution measurement

3 AirQino air quality board
  3.1 Introduction
  3.2 Sensors specifications
  3.3 Evaluating sensors’ outputs in a controlled environment
  3.4 Conclusions

4 Laboratory calibration
  4.1 Laboratory calibration set up
    4.1.1 HORIBA Ambient Air Pollution
    4.1.2 TSI DustTrak DRX
  4.2 Calibration methods
    4.2.1 Linear regression
      4.2.1.1 Residual analysis
      4.2.1.2 Cook’s distance
    4.2.2 Robust linear regression
    4.2.3 Non-linear regression
  4.3 Figaro TGS-2600 for CO
  4.4 SGX MICS-2714 for NO2
  4.5 SGX MICS-OZ-47 for O3
  4.6 SDS011 for PM2.5
  4.7 SDS011 for PM10
  4.8 Conclusions

5 On-site calibration
  5.1 Introduction
  5.2 SIM1 PM calibration
  5.3 SIM2 PM calibration
  5.4 Conclusions

6 Monitoring campaigns
  6.1 Introduction
  6.2 Monitoring of traffic flows and related air pollution in urban areas
    6.2.1 Data analysis
    6.2.2 Multi-regressive framework and relative importance analysis
  6.3 Monitoring and tracking outdoor pollutants
    6.3.1 Data analysis
  6.4 Extreme environment test
    6.4.1 System performance

7 Atmospheric Dispersion Models
  7.1 Introduction
  7.2 Gaussian plume distribution
    7.2.1 Pasquill’s stability classes

8 Community-based sensing projects
  8.1 Introduction
  8.2 Citizen Science
    8.2.1 Urban quality air monitoring survey
  8.3 Twitter Vigilance
    8.3.1 Social impacts of certain unusual climatic events

9 Conclusions
  9.1 Summary of contribution
  9.2 Directions for future work

A Publications


Introduction

A Smart City (SC) is a city structured and organized with sensors and information networks so that its resources can be accessed through an efficient telematic infrastructure. Envisaging urban settings where new technologies operate and interact with individuals is one of the most challenging research topics and deepest issues in the field of urban sciences.

The adjective “smart” has always been used to indicate an evolutionary state that can be traced back to the latest developments in technological innovation. Within the general evolutionary process of the urban system, technological innovation involves some of the structural factors of the SC, including: (i) physical detector systems, which constitute the basis for a network of sensors in the urban area, (ii) functional data architectures ensuring open data availability and (iii) socio-anthropic organizations such as community-based sensing projects.

The systemic approach is the most suitable paradigmatic framework for understanding the new urban structure, and this research project is structured with reference to what has been defined as a physical detector system. In other words, the SC is characterized by the presence of sensors capable of real-time monitoring of the status of the urban system. In particular, we dealt with sensors for air pollutants and their sources.

The environment in the urban system plays a crucial role in people’s physical, mental and social well-being. The World Health Organization (WHO) defined air pollution as follows: “Air pollution occurs when one or several air pollutants are present in such amounts and for such a long period in the outside air that they are harmful to humans, animals, plants or properties, contribute to damage, or may impair the well-being or use of property to a measurable degree”. Air pollution is a public health issue associated with various health effects, including cardiovascular and respiratory diseases, cancer, pregnancy complications, adverse pregnancy outcomes and even death Refs. [1]. In a recent study, the Lancet Commission on Pollution and Health, involving more than 40 international health and environmental authors, states that air pollution caused 6.5 million heart- and lung-related deaths Refs. [2]. Among these deaths, around 2.1 and 0.47 million are due to fine particulate matter (PM) and ozone (O3), respectively Refs. [3]. In Europe, over the past decades, levels of many air pollutants have decreased substantially thanks to effective legislation, resulting in improved air quality across the region. However, air pollutant concentrations are still too high, and air quality problems persist. A significant fraction of Europe’s population lives in areas, especially cities, where air quality exceedances still occur. Air pollutants such as carbon monoxide (CO), sulphur dioxide (SO2), nitrogen oxides (NOx), volatile organic compounds (VOCs), ozone (O3), heavy metals, and breathable particulate matter (PM2.5 and PM10) are generally recognized as the pollutants that most significantly affect human health. In European cities, around 90% of the population is exposed to pollutant concentrations higher than those considered harmful to health. For example, the city of Turin (Italy), where safe limits for fine particles have been exceeded for several consecutive days in the last few years, has the greatest exposure to deadly air pollution compared to other cities throughout the European Union. Therefore, reducing air pollution remains crucial, and in many cases authorities have imposed traffic bans, often applying to the most polluting vehicles.

Air pollution is characterized by non-uniform trends, particularly in dense urban areas, which makes it necessary to carry out pollution monitoring at a finer resolution. In developed countries, about 80% of the population lives in urban areas, while urbanization is growing rapidly in developing countries. Due to the high spatial density of human activities, cities exhibit high concentration levels of different air pollutants. For these reasons, air quality monitoring is required by regulations such as the European Directive 2008/50/EC. In Italy, air quality monitoring is regulated by Legislative Decree no. 155 of 13/08/2010, which transposes the European Directive 2008/50/EC. This decree assigns to the regional environmental protection agencies (ARPAs) the institutional task of monitoring and controlling air quality. Monitoring is carried out through fixed stations located across the territory. Stations are classified as urban or rural and as background, traffic or industrial. For example, in the city of Florence (Italy), the regional network consists of 6 stations: 2 urban-traffic stations, 3 background stations, and one rural-background station. High road traffic volumes are not the only cause of increased air pollution, as emissions from industrial plants or from domestic heating may also contribute significantly to high air pollution levels. Various air pollutants have been reported, which differ in chemical composition, reaction properties, emission rates, persistence in the environment, capability of being transported over short or long ranges, and eventual impact on human and/or animal health. Air pollutants may be solid particles, liquids or gases. Various sources, activities or factors are responsible for releasing pollutants into the atmosphere. Pollutant emitting sources can be classified into two major categories:

Natural: e.g. forest fires, erupting volcanoes, Sahara dust, and gases released from the radioactive decay of rocks;

Anthropogenic: e.g. road vehicles and engines, electric utilities, heating plants, industrial processes.

Air pollutant composition may vary greatly as well, depending on the season, the weather and the types and numbers of sources. Substances emitted into the atmosphere can be:

Gases (CO, NO, NO2, SOx, O3),

Solid particles (e.g. aerosol and metals).

Based on their chemical characteristics, they can be subdivided into:

Inorganic substances (minerals: silica, asbestos, metals; non-minerals: CO, CO2 [note 1], NOx, SOx, O3),

Organic substances (organometallic and organochlorinated compounds).

Note 1: Carbon dioxide is a natural component of the atmosphere so it’s not properly classified

This characterization leads to a further classification of air pollutants as primary or secondary:

Primary pollutants, directly emitted from the sources (primary component of particulate matter PM, SOx, NO, primary component of NO2, CO, polycyclic aromatic hydrocarbons (PAHs), unburned hydrocarbons, metals),

Secondary pollutants, chemical species formed through reactions in the atmosphere (secondary component of particulate matter PM, O3, secondary component of NO2).

Many types of chemical reactions in the atmosphere create, modify, and destroy chemical pollutants. Inert pollutants, once emitted from sources, are subject to various atmospheric processes, including advection due to wind, diffusion due to atmospheric turbulence (dilution or mixing), and (dry or wet) ground deposition. Reactive pollutants undergo further processes, e.g. chemical-physical transformations leading to the formation of secondary pollutants. Thus, it is important to discern between:

Non-reactive (or inert) pollutants like CO and NO,

Reactive pollutants like NO2 and O3.

Carbon monoxide (CO) is a colourless and odourless gas. It is very stable and has a lifetime of 2 to 4 months in the atmosphere. It originates from the incomplete combustion of carbonaceous material and is the air pollutant emitted in the largest quantity. The combustion of carbon-based fuels produces carbon dioxide (CO2). The two main nitrogen oxides are nitrogen monoxide (NO) and nitrogen dioxide (NO2); their sum is referred to as NOx. Only a small percentage of the NO2 found in the atmosphere is directly emitted from sources, as the (prevailing) remainder is formed as a result of chemical reactions in the atmosphere when combustion takes place at high temperature. Therefore, NO2 has both a primary and a secondary component.

Conversely, ozone (O3) can be found in the stratosphere [note 2] and in the troposphere, where it occurs as a result of both natural and human-generated emissions. Ozone is not directly emitted from sources: it is a secondary pollutant formed in the troposphere as a result of anthropogenic emissions, mainly produced by the reactions of primary pollutants (nitrogen oxides and hydrocarbons) in the presence of sunlight. For this reason, nitrogen oxides and hydrocarbons are generally referred to as ozone precursors.

Note 2: The natural stratospheric ozone protects the surface of the Earth from harmful

In addition to gaseous pollutants, the atmosphere contains solid and liquid particles that are suspended in the air. These particles are referred to as aerosols or particulate matter (PM). They exhibit a wide range of sizes, with aerodynamic diameters varying from more than 100 µm to less than 0.1 µm. Particles with a diameter larger than 10 µm are mainly generated by natural processes, e.g. the long-range transport of sea spray, dust, and other debris into the atmosphere. Fine aerosol particles are mostly of anthropogenic origin and are mainly produced when precursor gases condense in the atmosphere. Major components of fine aerosols are sulfate, nitrate, organic carbon, and elemental carbon.

1.1 Contributions

Measurements at appropriate spatial and temporal scales are essential for understanding and monitoring spatially heterogeneous environments with complex and highly variable emission sources, such as urban areas. Monitoring of air pollutants is primarily performed using analytical instruments, such as optical and chemical analysers. Air pollutant analysers are usually complicated, bulky and expensive: each instrument costs anywhere from about five thousand to tens of thousands of euros, and a significant amount of resources is required to routinely maintain and calibrate them. Conversely, low-cost air quality sensors are an emerging technology and are now commercially available in a wide variety of designs and capabilities. Nowadays there is an ample choice of sensors, and many of them are good enough out of the box to be used for non-critical applications.

Aimed at investigating the usefulness and potential of next-generation monitoring systems for air quality measurements at higher temporal and spatial resolution, this work addresses the challenging issue of improving low-cost air quality sensor reliability. The main aim of the PhD activity presented herein is the analysis and proposal of methodologies to achieve the most reliable measurements from low-cost, high-resolution sensors.

For this purpose, in Chapter 2 we present a literature review on air pollution monitoring with low-cost sensors, and in Chapter 3 we analyse the AirQino air quality monitoring system, developed by the CNR-IBIMET research institute. A preliminary evaluation of the AirQino low-cost sensors is made based on the specifications provided by each sensor datasheet, and then a twofold sensor calibration is presented: (i) in the laboratory, against reference air quality monitoring instrumentation (Chapter 4); (ii) on-site, against official fixed air quality stations currently in operation (Chapter 5). Once the calibration testing was completed, some outdoor air pollutant monitoring campaigns were performed. In Chapter 6 we describe a test in extreme environmental conditions and two campaigns on monitoring traffic flows and related air pollution in urban areas. In Chapter 7 we introduce some basic concepts of environmental modelling and the problem of advection, and report the fundamentals of Gaussian plume modelling of the diffusion of pollutants emitted from a point source such as a car exhaust pipe.

Furthermore, since low-cost air quality sensors offer new opportunities for community-based sensing projects that rely on participation methodologies typical of citizen science, in Chapter 8 we present the results of a survey on urban air quality monitoring. This survey was conducted to investigate the readiness of Italian citizens to host, in their homes and/or workplaces, sensors connected to an integrated monitoring platform (IMP) developed to monitor road traffic and related air pollution. Finally, since sensor systems like the proposed IMP enable new applications across a wide variety of domains, such as social networks, a preliminary analysis of the Twitter social network was made to analyse the impact of meteorological phenomena on the population.


Literature review

This chapter presents a literature review on air quality monitoring. The first part of the chapter deals with works focused on urban environmental monitoring. The following section introduces the use of low-cost sensors for air pollution measurement. While reviewing the literature, two aspects have been primarily investigated: the importance of obtaining high-resolution spatial measurements using participatory, low-cost sensors, and the need to evaluate their accuracy and performance.

2.1 Air quality monitoring

Air pollution is one of the most important factors affecting the quality of life and the health of the increasingly urban population of industrial societies. In the European Union (EU) and in the US, air quality guidelines, evaluated by the World Health Organization (WHO), have been established in order to protect human health and vegetation. In the European Union, the setting of limit values is a multi-step process involving various EC directives, the first of which was adopted in 1980 Refs. [4].

A successive series of directives has imposed progressively more stringent limits on levels of harmful air pollutants in ambient air. The most recent directive is Directive 2008/50/EC, which requires Member States to monitor and assess air quality, report to the Commission, publish the results of this monitoring and assessment, and prepare and implement air quality plans. Table 2.1 summarizes the air quality standards as set by the directives. EU directives must be transposed into national legislation and, nowadays, air pollution is monitored by networks of fixed measurement stations operated by official authorities. These stations are highly reliable and able to accurately measure a wide range of air pollutants. Environmental impacts are particularly severe in urban areas due to high population, traffic levels, intense vehicle use, driving patterns, vehicle characteristics and complex urban geometry Refs. [5]. In the past few decades, worldwide regulations have progressively imposed more and more restrictive thresholds for air pollutant concentrations Refs. [6].

Pollutant                        | Concentration | Averaging period          | Permitted exceedances each year
---------------------------------|---------------|---------------------------|--------------------------------
Fine particles (PM2.5)           | 25 µg/m3      | 1 year                    | n/a
Sulphur dioxide (SO2)            | 350 µg/m3     | 1 hour                    | 24
Sulphur dioxide (SO2)            | 125 µg/m3     | 24 hours                  | 3
Nitrogen dioxide (NO2)           | 200 µg/m3     | 1 hour                    | 18
Nitrogen dioxide (NO2)           | 40 µg/m3      | 1 year                    | n/a
PM10                             | 50 µg/m3      | 24 hours                  | 35
PM10                             | 40 µg/m3      | 1 year                    | n/a
Lead (Pb)                        | 0.5 µg/m3     | 1 year                    | n/a
Carbon monoxide (CO)             | 10 mg/m3      | Maximum daily 8 hour mean | n/a
Benzene                          | 5 µg/m3       | 1 year                    | n/a
Ozone                            | 120 µg/m3     | Maximum daily 8 hour mean | 25 days averaged over 3 years
Arsenic (As)                     | 6 ng/m3       | 1 year                    | n/a
Cadmium (Cd)                     | 5 ng/m3       | 1 year                    | n/a
Nickel (Ni)                      | 20 ng/m3      | 1 year                    | n/a
Polycyclic Aromatic Hydrocarbons | 1 ng/m3       | 1 year                    | n/a

Table 2.1: European Union Air Quality Standards.
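As an illustration of how the limit values in Table 2.1 are applied, the following minimal Python sketch counts how many daily mean PM10 values exceed the 24-hour limit of 50 µg/m3 and compares the count with the 35 exceedances permitted each year. The function names and the sample data are hypothetical and are introduced here only for illustration; only the limit value and the permitted number of exceedances come from Table 2.1.

```python
from typing import Iterable

# Values taken from Table 2.1 (PM10, 24-hour averaging period).
PM10_DAILY_LIMIT_UG_M3 = 50.0
PM10_PERMITTED_EXCEEDANCES = 35


def count_exceedances(daily_means: Iterable[float], limit: float) -> int:
    """Return the number of daily mean concentrations strictly above `limit`."""
    return sum(1 for value in daily_means if value > limit)


def is_compliant(daily_means: Iterable[float],
                 limit: float = PM10_DAILY_LIMIT_UG_M3,
                 permitted: int = PM10_PERMITTED_EXCEEDANCES) -> bool:
    """True if the number of exceedances stays within the permitted count."""
    return count_exceedances(daily_means, limit) <= permitted


# Hypothetical year of daily PM10 means (µg/m3): 40 polluted days, 325 clean days.
sample_year = [62.0] * 40 + [21.5] * 325
n_exceedances = count_exceedances(sample_year, PM10_DAILY_LIMIT_UG_M3)
print(n_exceedances, is_compliant(sample_year))
```

A value exactly equal to the limit is not counted as an exceedance in this sketch, since the limit is treated as a threshold not to be exceeded.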

This has led to improvements in the road transport sector, e.g. promotion of public over private transport, vehicle fleet turnover, fuel improvement, and an increased share of electric/hybrid vehicles Refs. [7], resulting in considerably reduced air pollutant emissions. In Europe, the road transportation sector accounts for 40% of total annual NOx emissions, 23% of CO, 13% of primary PM2.5, 9% of primary PM10, and 11% of VOCs. Shares rise to 23% for total PM10 and 28% for O3 precursors if emissions from precursors of secondary aerosol and O3 are also considered Refs. [8].

2.2 Participatory approach for environmental monitoring

Today, mobility managers and city administrators are strongly committed to analysing exceedances of air quality limits and taking action to tackle them.

However, the costs of data acquisition and station maintenance severely limit the number of installations; the collected data therefore have a low spatial resolution and cannot be used to assess the spatial variability of air pollutants in urban environments.

To this end, clear support might come from the availability of devices that monitor both air pollutant concentrations and traffic flows at the same urban site. To provide adequate information on the spatial distribution of air quality, Directive 2008/50/EC also states that fixed measurements may be supplemented by indicative measurements Refs. [9]. Thus, the legislative importance of indicative measurements is to be stressed, as “the results of indicative measurement shall be taken into account for the assessment of air quality with respect to the limit values” Refs. [10].

Recent air quality regulations (including Directive 2008/50/EC itself) enforce the transition from point-based monitoring networks to new tools that must be capable of mapping and forecasting air quality over the whole urban area, and thus for the totality of citizens. This implies that new technologies, such as models and additional indicative measurements, are needed in addition to accurate fixed air quality monitoring stations.

From this point of view, urban environmental monitoring is a key tool for the development of continuous information services able to provide the data needed to take appropriate decisions and interventions in the area. Innovations in ICT now make it possible to integrate the existing data infrastructure with participatory systems involving different stakeholders, including the citizens themselves. This participatory sensing approach is reflected in specific projects like SensorWebBike, a platform consisting of a real-time Spatial Data Infrastructure (SDI) and a web interface offering an open and participatory approach for environmental monitoring in urban areas Refs. [11].


To obtain high-resolution spatial measurements, participatory and mobile monitoring networks have been proposed. For example, Refs. [12] presents a study of measurements collected using mobile sensor nodes installed on top of public transport vehicles in the city of Zurich, Switzerland. Furthermore, in Refs. [13] a Mobile Weather Station (MWS) equipped with a built-in GPS antenna is proposed. This mobile station is specifically designed for the buses or tramways of Florence (Italy), and logs every minute both meteorological variables (i.e. temperature and air humidity) and air quality parameters (i.e. atmospheric CO2 concentration and noise detection).

A vehicle-based approach has also been proposed in Refs. [14], where two solutions for monitoring air pollution are described: one that can be deployed on public transportation, and the other mounted inside personal vehicles.

Similar sensor-based projects were also proposed in cities with high pollution levels outside Europe Refs. [15]: an air pollution study in megacities is proposed in Refs. [16]; this review focuses on nine urban centers: Los Angeles (USA), Delhi (India), Mexico City (Mexico), Toronto (Canada), Beijing (China), Santiago (Chile), São Paulo (Brazil), Bogotá (Colombia) and Cairo (Egypt). The study proposed in Refs. [17] also focuses on pollution in China.

2.3 Low-cost sensors for air pollution measurement

The increasing use of low-cost sensors for air pollution monitoring in cities is radically changing the conventional approach to delivering real-time information; as described in the previous section, they represent an efficient solution for refining the coarse spatial coverage of urban air pollution monitoring. Low-cost, portable, and autonomous sensors are manufactured using micro-fabrication techniques and contain micro-electro-mechanical systems (MEMS) made of microfluidic, optical and nanostructured elements, which allow them to be compact and inexpensive Refs. [18].

Most of these sensors can be classified into a few categories Refs. [19]: (i) those that depend on interactions between the sensing material (semiconductor) and gas-phase components such as nitrogen dioxide (NO2), ozone [...] visible by chemiluminescence (NO2) or non-dispersive infrared (CO2), (iii) those that use light scattering for measuring particle number concentrations that can be converted to any mass fraction (PM).

These types of electrochemical sensors have been used in many works Refs. [20]. For example, in Ref. [21], a pollution monitoring project using pervasive low-cost sensor technology is presented. A study using metal oxide semiconductor sensors to monitor O3 is presented in Ref. [22], while in Ref. [23] a wearable system measuring CO, NO2 and total NOx using MOx sensors is described. An implementation of electrochemical sensors for carbon monoxide and nitrogen dioxide is also presented in Refs. [24].

All these studies show that new low-cost, small-size sensors provide new opportunities to enhance existing monitoring systems and that they could be employed in air quality monitoring. For this purpose, modelling is also encouraged in order to provide better information on the spatial distribution of concentrations; thus, in Chapter 7 we introduce basic concepts of environmental modelling.

However, a rigorous assessment of sensor accuracy and performance in both controlled and real monitoring conditions is crucial. Thus, it is mandatory to define a calibration procedure. In general, we can distinguish two ways of implementing a calibration procedure: in the laboratory and in the field. The first methodology consists of laboratory tests with controlled artificial gas mixtures Refs. [25]; the second consists of tests in the field against real pollution measurements, such as the data provided by local environmental agencies Refs. [26]. The ultimate objective of both procedures is to develop and validate calibration models aimed at predicting air pollutant concentrations. Some works implemented neural network techniques to determine concentration values (e.g. Refs. [27]), while others used parametric regression-based models (e.g. Refs. [20, 28]).
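To make the idea of a regression-based calibration model concrete, the following minimal Python sketch fits a straight line between raw sensor readings and co-located reference concentrations by ordinary least squares, and then reports two common statistical indicators (the coefficient of determination R2 and the root mean square error). It is only an illustrative sketch under simple assumptions, not the procedure used in the cited works: the function names and the paired samples are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_linear_calibration(raw: Sequence[float],
                           reference: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of reference = slope * raw + intercept."""
    n = len(raw)
    if n != len(reference) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw sensor readings are constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def evaluate(raw: Sequence[float], reference: Sequence[float],
             slope: float, intercept: float) -> Tuple[float, float]:
    """Return (R2, RMSE) of the calibrated readings against the reference."""
    predicted = [slope * x + intercept for x in raw]
    mean_y = sum(reference) / len(reference)
    ss_res = sum((y - p) ** 2 for y, p in zip(reference, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in reference)
    # R2 is undefined when the reference series is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / len(reference))
    return r2, rmse


# Hypothetical paired samples: raw sensor output vs. reference analyser (µg/m3).
raw_signal = [10.0, 20.0, 30.0, 40.0, 50.0]
reference_conc = [25.0, 45.0, 65.0, 85.0, 105.0]

slope, intercept = fit_linear_calibration(raw_signal, reference_conc)
r2, rmse = evaluate(raw_signal, reference_conc, slope, intercept)
print(slope, intercept, r2, rmse)
```

In practice the fitted coefficients would be estimated on one portion of the paired data and the indicators computed on a separate portion, so that the reported accuracy is not overly optimistic.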

In this thesis we decided to implement both of the above-mentioned calibration procedures and to apply some regression-based models. Sensor laboratory calibration has been performed following Ref. [29], which suggests testing sensors in an exposure chamber through which the generated atmosphere is passed and which has sufficient capacity to accommodate several sensors simultaneously. We have also implemented a pre-calibration methodology to evaluate the linearity and the range of the sensors' response; for details, please refer to Chapter 3.


mobile sensors and calibrated the sensors after comparison against the official data measured by an ARPA fixed air quality station. In both cases, to evaluate the calibration procedure, as suggested in Ref. [30], we plotted the sensor responses over time and then the sensor responses versus the reference measurements of the test gas or the reference data.

In conclusion, several research projects are exploring the possibility of collecting air quality data using low-cost sensor systems which can provide aggregated information on the observed air quality Refs. [31]. This possibility, especially within a multidisciplinary approach, is particularly important in citizen science applications, where citizens collect and interpret the data. Citizen Science is a research strategy based on participatory methods and aimed at narrowing the gap between citizens and research by actively involving citizens in studying issues of public interest. Please refer to Chapter 8 for a closer look.


AirQino air quality board

In this chapter we present AirQino®, a compact low-cost air quality monitoring station developed by the Institute of Biometeorology of the Italian National Research Council (IBIMET-CNR). AirQino is based on an Arduino Shield compatible electronic board integrated with low-cost, high-resolution sensors dedicated to monitoring environmental parameters and air quality pollutants. The board also integrates a microprocessor unit that acquires the data from all the installed sensors.

3.1 Introduction

Recent air quality regulations (Directive 2008/50/EC) enforce the transition from point-based monitoring networks to new tools that must be capable of mapping and forecasting air quality over the totality of the land area, and therefore for the totality of citizens. This implies that new tools, such as models and additional indicative measurements, are needed in addition to the accurate fixed air quality monitoring stations that until now have been taken as the reference by local administrators for the enforcement of various mitigation strategies. Because of their sparse spatial distribution, however, fixed stations cannot describe the high spatial variability of pollutants within cities. Integrating additional indicative measurements may provide adequate information on the spatial distribution of air quality parameters.

For this purpose, new low-cost and small-size sensors are becoming available for air quality monitoring, including mobile applications. However, an accurate assessment of their accuracy and performance in both controlled and real monitoring conditions is crucially needed. Quantifying sensor response is a significant challenge because of the sensitivity to ambient temperature and humidity and the cross-sensitivity to other pollutant species Refs. [32].

This chapter reports the analysis of the AirQino board and the sensors it is equipped with. AirQino is a complete air quality sensor board equipped with a set of industrial SMD sensors. The board is Arduino Shield compatible and integrates low-cost, high-resolution sensors dedicated to monitoring both environmental parameters and air quality pollutants (humidity, temperature, CO, CO2, O3, NO2, VOC, PM2.5, PM10) in the urban environment.

The sensors transmit geolocated data through General Packet Radio Service (GPRS) technology to a data server connected to the applications and to a web server, which allows observations to be visualised in real time in a web browser Refs. [33]. The board is placed in an IP68 waterproof box. Because the gases monitored by the AirQino sensor board also include reactive ones, the airflow inside the waterproof box is designed to minimize gas interference. Furthermore, a small brushless fan blows the air out of the box. This creates a negative pressure that draws air in through the inlet window. For the sensor board to work correctly, it is very important to keep both the inlet and the outlet windows clean.

Figure 3.1: AirQino IP68 box. The system is provided with an internal DC-DC converter unit that accepts a wide range of input voltages, from 10 Vdc to 30 Vdc.


3.2 Sensors specifications

The sensors were chosen for their accuracy, low cost and interfacing capabilities, seeking a good compromise between cost-effectiveness and reliability and ensuring the continuity and traceability of observations. After a market survey of the available sensors Refs. [34], the following sensors were selected (Table 3.1):

• S8 by SenseAir for CO2 detection: an NDIR (non-dispersive infrared) sensor with an analysis chamber protected by a membrane which avoids dust contamination.

• DHT22 for air temperature and relative humidity measurements: an integrated sensor, encapsulated in a plastic shell with a porous filter.

• MiCS-OZ-47 for O3 measurements: a MOS-type gas sensor, installed on a sensor board and connected to the motherboard with a red connector.

• MiCS-2614 for O3 measurements: a MOS-type gas sensor, installed on the sensor board and connected to the motherboard with a red connector.

• MiCS-2714 for NO2 detection: a MOS-type gas sensor, installed on the sensor board and connected to the motherboard with a red connector.

• TGS-2600 for CO measurements: a MOS-type gas sensor, installed on the sensor board and connected to the motherboard with a red connector.

• MiCS-5524 for volatile organic compounds (VOCs) measurements: a MOS-type gas sensor, installed on the sensor board and connected to the motherboard with a red connector.

• SDS011 for PM2.5 and PM10 particles: based on the laser scattering principle. The scattered light is transformed into electrical signals that are amplified and processed. The number and diameter of the particles are obtained by analysing the relation between the signal waveform and the particle diameter.

The measurement ranges of these sensors are reported in Table 3.2. The layout of the sensors in the AirQino station is shown in Figure 3.2.


| Pollutant | Manufacturer | Model | Type | Output |
|---|---|---|---|---|
| CO2 | CO2meter | Senseair S8 | Non-dispersive infrared (NDIR) | ppm |
| O3 | SGX Sensortech | MiCS-OZ-47 | MOS-type gas sensor | ppb |
| T/RH | Aosong | DHT22 | Semiconductor | °C / % |
| O3 | SGX Sensortech | MiCS-2614 | MOS-type gas sensor | counts |
| NO2 | SGX Sensortech | MiCS-2714 | MOS-type gas sensor | counts |
| CO/CH4 | FIGARO | TGS-2600 | MOS-type gas sensor | counts |
| CO/VOC | SGX Sensortech | MiCS-5524 | MOS-type gas sensor | counts |
| PM2.5/PM10 | Novasense | SDS011 | Laser scattering | µg/m3 |

Table 3.1: AirQino sensors specifications. Outputs given in counts, in the range 0-1024, are expressed on the scale of the board's analog-to-digital converter.

| Parameter | Reference unit | Range |
|---|---|---|
| T | °C | -40 – 80 |
| RH | % | 0 – 100 |
| CO2 | ppm | 0 – 2000 |
| O3 | ppb | 20 – 200 |
| NO2 | ppm | 0.05 – 5 |
| CO | ppm | 1 – 30 |
| PM | µg/m3 | 0 – 999 |
| VOC | ppm | 1 – 100 |


3.3 Evaluating sensors' outputs in a controlled environment

Each sensor exhibits a characteristic curve that defines its response to input. A preliminary evaluation of these low-cost sensors (excluding the SDS011) was made on the basis of the specifications provided in each sensor datasheet. The evaluation was performed by placing the board in a test chamber. Controlled concentrations of different gases were injected into the box through a flow-meter and then a specific container (AeroQual, mod. AQM R41 Calibration Humidifier) in order to maintain a constant airflow humidity (figure 3.3). For the characterization of the ozone sensors, an Ozone Calibration Source (2btech mod. 306c [OCS])¹ was used. To evaluate the other sensors, certified gas mixtures were used.

Figure 3.3: AirQino test chamber: controlled concentrations of different gases were injected in the box through a flow-meter and then to a specific container in order to maintain a constant airflow humidity.

¹ Ozone Calibration Source is an O


During the first evaluation sessions, limits in the sensor responses emerged because of an over-range problem. The analog-to-digital conversion scale of the board (0-1024 counts) was not suitable for some sensors, whose signal variation exceeded the converter limit (1024 counts). For this reason, the values of the resistors that set the output signal of the sensors were changed.

Figure 3.4: O3 Rgain. Resistance range is from 4.7 kΩ up to 22 kΩ.

Thus, we proceeded to choose an appropriate Rgain to reduce noise and to stabilize the sensor outputs. Figure 3.4 shows the different responses obtained by changing the Rgain of the MiCS-2614 O3 sensor. In order to obtain the best response ranges, the values of the resistors that characterize the sensor output signal were changed. After this adjustment, we proceeded to evaluate the sensors. Tables 3.3, 3.4 and 3.5 show some of the data collected with different dilutions of the injected gases.


| Dilution | Concentration (ppm) | TGS2600 (counts) | MiCS5524 (counts) |
|---|---|---|---|
| 100 | 10 | 668 | 730 |
| 80 | 12.5 | 643 | 644 |
| 40 | 25 | 515 | 205 |

Table 3.3: Use of 1000 ppm Isobutane for TGS2600 and MiCS5524 evaluation.

| Dilution | Concentration (ppm) | MiCS2714 (counts) |
|---|---|---|
| 400 | 0.05 | 820 |
| 200 | 0.1 | 732 |
| 40 | 0.5 | 430 |

Table 3.4: Use of 20 ppm NO2 for MiCS2714 evaluation.

| OCS (ppb) | MiCS2614 (counts) | OZ47 (ppb) |
|---|---|---|
| 10 | 531 | 15 |
| 50 | 419 | 16 |
| 100 | 222 | 16 |
| 150 | 98 | 18 |
| 200 | 49 | 21 |
| 250 | 34 | 26 |
| 300 | 23 | 34 |
| 350 | 19 | 40 |

Table 3.5: Ozone Calibration Source (OCS) ppb for MiCS2614 and OZ47 evaluation.
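As a reading aid for Table 3.5, the following minimal Python sketch computes the step-by-step sensitivity of the two ozone sensors from the tabulated values. It is only an illustration of how the table can be read, not part of the evaluation procedure described here; the helper names `step_slopes` and `is_monotonic` are hypothetical.

```python
# Values transcribed from Table 3.5 (OCS set point, MiCS-2614 output, MiCS-OZ-47 output).
ocs_ppb = [10, 50, 100, 150, 200, 250, 300, 350]
mics2614_counts = [531, 419, 222, 98, 49, 34, 23, 19]
oz47_ppb = [15, 16, 16, 18, 21, 26, 34, 40]


def step_slopes(x, y):
    """Slope of y between consecutive set points (output units per ppb)."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


def is_monotonic(values, decreasing=False):
    """True if the sequence never reverses direction (ties are allowed)."""
    pairs = zip(values, values[1:])
    if decreasing:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)


mics_slopes = step_slopes(ocs_ppb, mics2614_counts)
oz47_slopes = step_slopes(ocs_ppb, oz47_ppb)
```

With these values, the MiCS-2614 output falls by about 2.5 to 4 counts per ppb up to 150 ppb and by less than 0.1 counts per ppb on the last step, so the resolution in counts decreases towards the upper end of the tested range.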


Finally, figure 3.5 reports an example of the complete response curve of one of the tested sensors (in this case, MiCS2614). All the tests were carried out at ambient temperature and humidity, taking into account the response time of the sensor (in this case 6 min).

Figure 3.5: MiCS2614 O3 counts response curve (range values 10 ppb-350 ppb).

This test, lasting an hour and a half, was performed in order to evaluate the sensor output range and to verify the response time. The peaks visible in the graph correspond to the warm-up time of the sensor between one concentration and the previous one.

3.4 Conclusions

SENSEAIR S8. The sensor provides its output directly in ppm and shows a response proportional to the increase of the injected CO2 concentrations. However, the manufacturer declares a ±70 ppm error on the measurement, with a further ±3% error on the single observation. This is in itself an aspect to be taken into account in environmental monitoring.

MiCS-OZ-47 and MiCS-2614. Two types of ozone sensors were used in this test phase. MiCS-OZ-47 works in a range of 20 to 200 ppb and its response is compensated for air temperature and humidity, which are measured and accounted for by the sensor itself. The response of this sensor is proportional to the increase in gas concentration.

MiCS-2614 initially showed an output response that was poorly amplified with respect to the converter range. This resulted in a loss of information, because signal variations were represented by only a few counts of digital output. Thus, an adjustment of Rgain was mandatory in this case. Moreover, there is a certain difference in response between one sensor and another; it is also necessary to keep in mind the possible hysteresis effect that occurs in some sensors when the physical quantity increases or decreases.

MiCS-2714. The MiCS-2714 sensor is characterized by a range of 0.05 to 5 ppm. The output signal for NO2 concentration is in counts. In this case, we performed only a few tests because of the technical difficulty of preparing a certified concentration cylinder at limits as tight as the environmental ones.

TGS2600 and MiCS5524. TGS2600 detects CO/CH4 in the 1-30 ppm range. The output signal is in counts, but the specifications of this sensor do not allow the reading error to be computed. MiCS-5524, in turn, is characterized by a response range of 1-1000 ppm and its output signal is also in counts; again, the specifications do not state the reading error. The peculiarity of this sensor is that it is also sensitive to VOCs.

A common issue is that some sensors also respond to variations in other gases (cross-sensitivity) Refs. [35]. This effect can alter their response, i.e. a sensor may react when exposed to a gas or mixture other than the one to which it should in theory be sensitive. For example, the response of ozone sensors can be influenced by different concentrations of other gases (e.g. CO and CH4). These effects should be considered especially during the outdoor test phases; therefore, indoor tests are generally performed as a mandatory procedure to achieve the most reliable calibration and understanding of the sensors' performance.


Laboratory calibration

In chapter 3, a preliminary evaluation of the AirQino low-cost sensors was made on the basis of the specifications provided in each sensor's datasheet. To achieve the most reliable measurements, sensor calibration is a mandatory procedure. In this chapter, a laboratory calibration of the AirQino sensors for CO, NO2, O3, PM2.5 and PM10 against reference air quality monitoring instrumentation is presented.

4.1 Laboratory calibration set up

The calibration process maps the sensor's response to the ideal linear response provided by the reference instrument. To this aim, a calibration laboratory was set up at IBIMET-CNR with high-quality analytical instruments, namely Horiba AP SERIES instruments for NO2, O3 and CO and a TSI DustTrak for PM2.5 and PM10, which served as reference instruments for the characterization and calibration.

4.1.1 HORIBA Ambient Air Pollution

The HORIBA Ambient Air Pollution AP SERIES analysers feature advanced technology, field-proven reliability with excellent sensitivity and precision at ppb levels, and hassle-free maintenance. Each of these instruments, which differ in their operating principle, measures single or multiple components in ambient air or diluted stack gases.

For our laboratory calibration we analysed the pollutant concentration outputs of the APMA-370, APNA-370 and APOA-370.

Figure 4.1: Horiba AP SERIES for CO, SO2 (future calibration), O3, NO2.

APNA-370 is an ambient nitrogen oxide monitor using the chemiluminescence (CLD) method as its operating principle. This instrument continuously measures the concentrations of the nitrogen oxides NO, NO2 and NOx = NO + NO2 in the atmosphere, with 0.5 ppb as lower detectable limit [36].

APMA-370 is a device for continuously monitoring CO concentrations using a non-dispersive cross-modulation infrared analysis method. This instrument continuously measures the concentration of CO in the atmosphere, with 0.02 ppm as lower detectable limit [37].

Figure 4.3: Horiba APMA-370 system configuration.

APOA-370 monitors atmospheric ozone concentrations using a cross-flow modulated ultraviolet absorption method. This instrument continuously measures the concentration of O3 in the atmosphere, with 0.02 ppm as lower detectable limit [38].

4.1.2 TSI DustTrak DRX

The DustTrak DRX desktop monitor is a battery-operated, data-logging, light-scattering laser photometer that simultaneously measures size-segregated mass fraction concentrations corresponding to the PM1, PM2.5, PM10 and total PM size fractions [39]. The aerosol concentration range goes from 0.001 to 150 mg/m3.

Figure 4.5: DustTrak interface.


4.2 Calibration methods

To perform the laboratory calibration, the AirQino board was located outside in a dedicated space, and the same sampled air was simultaneously fed to the reference instruments through Teflon tubes. To capture a complete sensor output, a dataset must include at least one day-night cycle, so that the presence of periodic patterns can be evaluated. To this aim, a two-week dataset was collected.

Figure 4.7: Clusters of AirQino at IBIMET-CNR, via Caproni Firenze.

Data were collected simultaneously from all instruments, yielding the following datasets:

1. AirQino: from 02/07/2017 00:01:01 to 15/07/2017 09:10:22 with a sensor reading interval of 2 minutes. Pollutants: O3 (ppb), NO2 (counts), CO (counts), PM2.5 (µg/m3), PM10 (µg/m3).

2. Horiba: from 02/07/2017 00:03:00 to 17/07/2017 15:00:00 with a sensor reading interval of 3 minutes. Pollutants: O3 (ppm), NO2 (ppm), CO (ppm).

3. DustTrak: from 03/07/2017 12:58:21 to 18/07/2017 12:04:45 with a sensor reading interval of 2 minutes. Pollutants: PM2.5 (mg/m3), PM10 (mg/m3).
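Since the three datasets use different units for the same quantities (O3 in ppb for AirQino and in ppm for Horiba; PM in µg/m3 for AirQino and in mg/m3 for DustTrak), the reference series have to be brought to common units before any comparison. A minimal Python sketch of this step follows; the function names and the sample values are hypothetical, and the original processing was carried out in MATLAB.

```python
def ppm_to_ppb(value_ppm):
    """Convert a mixing ratio from parts per million to parts per billion."""
    return value_ppm * 1000.0


def mg_m3_to_ug_m3(value_mg_m3):
    """Convert a mass concentration from mg/m3 to µg/m3."""
    return value_mg_m3 * 1000.0


# Hypothetical sample readings, for illustration only.
horiba_o3_ppm = [0.021, 0.034, 0.052]
dusttrak_pm25_mg_m3 = [0.012, 0.018]

horiba_o3_ppb = [ppm_to_ppb(v) for v in horiba_o3_ppm]
dusttrak_pm25_ug_m3 = [mg_m3_to_ug_m3(v) for v in dusttrak_pm25_mg_m3]
```

NO2 and CO from AirQino are expressed in counts, so no unit conversion applies to them; the mapping from counts to ppm is precisely what the calibration models estimate.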

Once the data had been collected, we defined a sensor evaluation procedure, developed in MATLAB. This procedure consisted of:

• Timescale alignment and common interval search.

• Optimized interpolation and resampling every 1 minute.

• Scatterplot of reference signal vs. sensor signal, with the least-squares line, to evaluate the presence of a linear relationship.

• Linear regression model with model specification $F(x) = \beta_0 + \beta_1 x$, followed by:
– Residual analysis,
– Outlier determination using Cook's distance.

• Polynomial regression model with model specification:
– $F(x) = \beta_0 + \beta_1 x + \beta_2 x^2$
– $F(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

• Robust linear regression.

• Non-linear regression: exponential and power models.

Therefore, the following section explains the methods we used and the sta-tistical properties that we reported to analyse the results.
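The first two steps of the procedure (common interval search and resampling to a 1-minute grid) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: plain linear interpolation is used in place of the "optimized interpolation" of the original MATLAB procedure, whose details are not given here, and the function names and toy series are hypothetical.

```python
from datetime import datetime, timedelta


def common_interval(*series):
    """Overlap (start, end) of several series given as (times, values) pairs."""
    start = max(times[0] for times, _ in series)
    end = min(times[-1] for times, _ in series)
    if start >= end:
        raise ValueError("the series do not overlap")
    return start, end


def resample_linear(times, values, start, end, step=timedelta(minutes=1)):
    """Linearly interpolate a time-ordered series onto a regular grid.

    Assumes times[0] <= start and end <= times[-1], with at least two samples.
    """
    grid, resampled = [], []
    j = 0
    t = start
    while t <= end:
        # Advance to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
        grid.append(t)
        resampled.append(values[j] + frac * (values[j + 1] - values[j]))
        t += step
    return grid, resampled


# Hypothetical toy series: 2-minute sensor samples and 3-minute reference samples.
t0 = datetime(2017, 7, 2, 0, 0)
sensor = ([t0 + timedelta(minutes=2 * i) for i in range(4)], [0.0, 2.0, 4.0, 6.0])
reference = ([t0 + timedelta(minutes=1 + 3 * i) for i in range(3)], [10.0, 13.0, 16.0])

start, end = common_interval(sensor, reference)
grid, sensor_1min = resample_linear(*sensor, start, end)
_, reference_1min = resample_linear(*reference, start, end)
```

After this step both signals share the same time stamps, so each sensor value can be paired with a reference value for the scatterplot and the regression models.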

4.2.1 Linear regression

A regression model relating a predictor x to a response y is of the form:

$$y = F(x) + \varepsilon \tag{4.1}$$

where the function $F(x)$ represents the expected value (mean) of $y$ given $x$. In this work we studied the following forms of the function $F(x)$:

1. $F(x) = \beta_0 + \beta_1 x$

2. $F(x) = \beta_0 + \beta_1 x + \beta_2 x^2$

3. $F(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$

The values of the coefficients are unknown and must be estimated from the data. The method of estimation is least squares, which minimizes the sum of squared residuals in the sample:

$$\hat{\beta}_{OLS} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - \hat{F}(x_i)\right)^2 = \arg\min_{\beta} \sum_{i=1}^{n} \varepsilon_i^2$$

A residual $\varepsilon_i$ is the difference between the observed response $y_i$ and the fitted value based on the estimated coefficients. To determine the proper model type, we evaluated:

• Ordinary R-squared or $R^2_{ord}$: the proportion of the total sum of squares explained by the model.

$$R^2_{ord} = \frac{SSR}{SST}$$

$R^2_{ord}$ may take any value between 0 and 1, with a value closer to 1 indicating that a greater proportion of variance is accounted for by the model.

• RMSE: root mean squared error (also known as the fit standard error and the standard error of the regression).

$$RMSE = \sqrt{MSE}$$

where MSE is the mean square error or residual mean square,

$$MSE = \frac{SSE}{DFE}$$

where SSE is the sum of squared errors and DFE is the error degrees of freedom.

• SSE: sum of squared errors (residuals). This statistic measures the total deviation of the response values from the fitted values.

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the i-th value of the variable to be predicted and $\hat{y}_i$ is the predicted value of $y_i$. A value closer to 0 indicates that the model has a smaller random error component.

• SSR: regression sum of squares, equal to the sum of squared deviations of the fitted values from the mean of the response.

$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

where $\hat{y}_i$ is the predicted value of $y_i$ and $\bar{y}$ is the mean value of the response variable.

• SST: total sum of squares.

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The total sum of squares is therefore equal to the sum of the explained sum of squares and the residual sum of squares:

$$SST = SSE + SSR.$$

• N TESTING: number of observations.
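The statistics defined above can be computed for a straight-line fit with a few lines of Python. The sketch below is only illustrative (the original analysis was done in MATLAB); the function names and the toy data are hypothetical, and DFE is taken as the number of observations minus the number of fitted coefficients.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for F(x) = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_statistics(x, y, b0, b1, n_coefficients=2):
    """R2, RMSE, SSR, SSE, SST and N for a fitted straight line."""
    n = len(y)
    mean_y = sum(y) / n
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    dfe = n - n_coefficients  # error degrees of freedom
    return {"R2": ssr / sst, "RMSE": sqrt(sse / dfe),
            "SSR": ssr, "SSE": sse, "SST": sst, "N": n}


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)
stats = fit_statistics(x, y, b0, b1)
```

For a least squares fit with an intercept, `stats["SSE"] + stats["SSR"]` reproduces `stats["SST"]` up to rounding, which is a convenient sanity check on any reported table of statistics.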

4.2.1.1 Residual analysis

The difference between the observed value of the dependent variable $y$ and the predicted value $\hat{y}$ is called the residual $e_i$:

$$e_i = y_i - \hat{y}_i$$

Each observed residual can be thought of as an estimate of the actual unknown error term:

$$\varepsilon_i = y_i - E(y_i)$$


The typical assumption about the error terms $\varepsilon$ is that they are independent of each other and identically normally distributed random variables with mean zero and common variance $\sigma^2$. The observed residuals should therefore reflect the properties assumed for the unknown true error terms. Plots of residuals are powerful tools to check the validity of these assumptions and provide information on how to improve the model Refs. [40]. In this work we analysed:

• Case order plot: a scatterplot that can be used to check for drift of the variance during the experimental process, when data are time-ordered. If the residuals are randomly distributed around zero, there is no drift in the process.

• Residuals vs fitted values: a scatterplot of residuals on the y axis and fitted values (estimated responses) on the x axis. The plot is used to detect non-linearity, unequal error variances, and outliers. If the residuals are randomly distributed around the 0 line, the linearity assumption is reasonable. If the residuals roughly form a "horizontal band" around the 0 line, the variances of the error terms are equal (homoscedasticity). Finally, if no residual stands out from the basic random pattern, there are no outliers.

• Normal probability plot: constructed by plotting the sorted values of the residuals versus the associated theoretical values from the standard normal distribution. If the random errors are normally distributed, the plotted points lie close to a straight line. Distinct curvature or other significant deviations from a straight line indicate that the random errors are probably not normally distributed.

• Histogram: it can be used to check whether the residuals are normally distributed. A symmetrical bell-shaped histogram evenly distributed around zero indicates that the normality assumption is likely true. If the histogram indicates that the random errors are not normally distributed, the model's underlying assumptions may have been violated.
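As an illustration of how the normal probability plot is constructed, the Python sketch below pairs the sorted residuals with standard normal quantiles. The plotting positions $(i - 0.5)/n$ are an assumption (one common convention among several), and the function name and the sample residuals are hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pairs (theoretical normal quantile, sorted residual) for a probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    # Assumed plotting positions: (i - 0.5) / n for i = 1, ..., n.
    quantiles = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical residuals, for illustration only.
points = normal_probability_points([0.3, -1.2, 0.1, 0.9, -0.4])
```

If the residuals are approximately normal, the points returned here fall close to a straight line when plotted; a curved pattern points to a skewed or heavy-tailed distribution.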

4.2.1.2 Cook’s distance

In regression models, one or a few observations may have undue effects on the estimators. In classical linear models, Cook's distance (D) Refs. [41] is a commonly used estimate of the influence of a data point when performing a least-squares regression analysis.

Technically, Cook's D is calculated by removing the i-th data point from the model and recalculating the regression. It summarizes how much all the fitted values in the regression model change when the i-th observation is removed. The formula for Cook's distance is:

$$D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$$

where $p$ is the number of regression coefficients, MSE is the mean squared error of the regression model and $h_{ii} \equiv x_i^T (X^T X)^{-1} x_i$, the i-th diagonal element of the projection matrix $H$, is the leverage of the i-th observation. A large $D_i$ indicates that the data point strongly influences the fitted values. Observations with Cook's distance values greater than a threshold value suggest the presence of outliers. In Refs. [42] the following cut-off is proposed:

$$D_i = \frac{4}{n - k - 1} \tag{4.2}$$

where $n$ is the number of observations and $k$ the number of coefficients computed by the model. However, the traditional cut-off might not be adequate where model error distributions exhibit skewness or heavy tails Refs. [43].
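A minimal Python sketch of Cook's distance for a straight-line fit, together with the cut-off of equation (4.2), is given below. For a single predictor the leverage has the closed form $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$, which avoids matrix algebra. The function names and the toy data (a perfect line with one corrupted point) are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance of every observation for the fit F(x) = b0 + b1 * x."""
    n, p = len(x), 2  # p = number of regression coefficients
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point (diagonal of the projection matrix) for one predictor.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(residuals, leverage)]


def influential_points(distances, k=2):
    """Indices whose Cook's distance exceeds the cut-off 4 / (n - k - 1)."""
    cutoff = 4 / (len(distances) - k - 1)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Hypothetical toy data: y = 1 + 2x with the last observation corrupted.
x = [float(i) for i in range(10)]
y = [1 + 2 * xi for xi in x]
y[9] = 40.0
distances = cooks_distance(x, y)
flagged = influential_points(distances)
```

In this toy case only the corrupted observation exceeds the cut-off, so it would be the only point removed before refitting the model.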

4.2.2 Robust linear regression

The main disadvantage of least-squares fitting is its sensitivity to outliers Refs. [44]. Outliers have a large influence on the fit because squaring the residuals magnifies the effects of these extreme data points. Robust regression methods provide an alternative to least squares regression by requiring less restrictive assumptions. These methods attempt to reduce the influence of outlying cases in order to provide a better fit to the majority of the data Refs. [45].

If we consider the linear model for the i-th of n observations,

$$y_i = \beta_0 + \beta_1 x_i + \dots + \varepsilon_i = x_i^T \beta + \varepsilon_i$$

then, to reflect the error term's dependency on the regression coefficients, we can write

$$\varepsilon_i(\beta) = y_i - x_i^T \beta$$

The most common general method of robust regression is M-estimation. M-estimators attempt to minimize the sum of a particular objective function $\rho(\cdot)$ that gives the contribution of each residual:

$$\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho(\varepsilon_i(\beta)).$$

Some M-estimators are influenced by the scale of the residuals, so a scale-invariant version of the M-estimator is used:

$$\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho\left(\frac{\varepsilon_i(\beta)}{\tau}\right), \qquad \hat{\tau} = \frac{MAD}{0.6745}$$

where $\hat{\tau}$ is a robust estimate of the scale of the residuals and MAD is the median absolute deviation of the residuals from their median; the constant is based on the idea that, for the standard normal distribution, $E(MAD) = 0.6745$.

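Minimization of the above is accomplished primarily in two steps:

1. Differentiating the objective function with respect to the coefficients $\beta$ and setting the partial derivatives to 0, resulting in a set of $p$ non-linear equations

$$\sum_{i=1}^{n} x_{i,j}\, \psi\left(\frac{\varepsilon_i}{\tau}\right) = 0, \qquad j = 1, \dots, p$$

where $\psi(\cdot) = \rho'(\cdot)$ is called the influence curve Refs. [46]. Define the weight function $w(\varepsilon) = \psi(\varepsilon)/\varepsilon$, and let $w_i = w(\varepsilon_i/\tau)$ for $i = 1, \dots, n$, with $w_i = 1$ if $\varepsilon_i = 0$. Then, for $j = 1, \dots, p$,

$$\sum_{i=1}^{n} x_{i,j} w_i \frac{\varepsilon_i}{\tau} = \sum_{i=1}^{n} x_{i,j} w_i \left(y_i - x_i^T \beta\right) \frac{1}{\tau} = 0 \;\Longrightarrow\; \sum_{i=1}^{n} x_{i,j} w_i x_i^T \beta = \sum_{i=1}^{n} x_{i,j} w_i y_i \tag{4.3}$$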


2. Solving the resulting weighted least squares problem iteratively, with a numerical method called iteratively reweighted least squares Refs. [47, 48], until a stopping criterion is met.

Defining the weight matrix $[W^{-1}]^{(t)} = \mathrm{diag}(w_1^{(t)}, \dots, w_n^{(t)})$ yields, for iterations $t = 0, 1, \dots$, the following matrix form of equation (4.3):

$$\hat{\beta}^{(t+1)} = \left(X^T [W^{-1}]^{(t)} X\right)^{-1} X^T [W^{-1}]^{(t)} y,$$

with weights

$$w_i^{(t)} = \begin{cases} \dfrac{\psi\left[(y_i - x_i^T \beta^{(t)})/\hat{\tau}^{(t)}\right]}{(y_i - x_i^T \beta^{(t)})/\hat{\tau}^{(t)}}, & \text{if } y_i \neq x_i^T \beta^{(t)}; \\ 1, & \text{if } y_i = x_i^T \beta^{(t)}. \end{cases}$$

This is very similar to the solution for the least squares estimator, but with the introduction of a weight matrix to reduce the influence of outliers Refs. [49, 50].
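The two steps above can be sketched in Python for a straight-line model. This is a simplified illustration under stated assumptions, not the MATLAB routine used in this work: it uses the Huber weight with tuning constant 1.345, estimates the scale as MAD/0.6745 at every iteration, and omits any leverage adjustment of the residuals. The function names and the toy data are hypothetical.

```python
from statistics import median


def weighted_line(x, y, w):
    """Weighted least squares estimates (b0, b1) for F(x) = b0 + b1 * x."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def huber_weight(e, k=1.345):
    """Huber weight of a scaled residual e."""
    return 1.0 if abs(e) <= k else k / abs(e)


def irls_line(x, y, weight=huber_weight, max_iter=50, tol=1e-8):
    """Iteratively reweighted least squares for a straight line."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(max_iter):
        residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        centre = median(residuals)
        tau = median(abs(r - centre) for r in residuals) / 0.6745
        if tau == 0:
            break  # more than half of the points are fitted exactly
        weights = [weight(r / tau) for r in residuals]
        new_b0, new_b1 = weighted_line(x, y, weights)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical toy data: y = 1 + 2x with the last observation corrupted.
x = [float(i) for i in range(10)]
y = [1 + 2 * xi for xi in x]
y[9] = 40.0
ols_b0, ols_b1 = weighted_line(x, y, [1.0] * len(x))
robust_b0, robust_b1 = irls_line(x, y)
```

On this toy dataset the corrupted point pulls the ordinary least squares slope well above 2, while the reweighted fit moves back towards the slope of the uncorrupted points, which is the behaviour robust regression is meant to provide.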

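The weight functions chosen in M-estimation are given below as functions of the scaled residual $e = \varepsilon_i/\tau$ Refs. [51]:

1. Andrews:

$$w(e) = \begin{cases} \dfrac{\sin(e/k)}{e/k} & \text{for } |e| < \pi k \\ 0 & \text{for } |e| \geq \pi k \end{cases} \tag{4.4}$$

with default tuning constant k = 1.339.

2. Bisquare:

$$w(e) = \begin{cases} \left[1 - \left(\dfrac{e}{k}\right)^2\right]^2 & \text{for } |e| < k \\ 0 & \text{for } |e| \geq k \end{cases} \tag{4.5}$$

with default tuning constant k = 4.685.

3. Cauchy:

$$w(e) = \frac{1}{1 + (e/k)^2} \tag{4.6}$$

with default tuning constant k = 2.385.

4. Fair:

$$w(e) = \frac{1}{1 + |e/k|} \tag{4.7}$$

5. Huber:

$$w(e) = \begin{cases} 1 & \text{for } |e| \leq k \\ \dfrac{k}{|e|} & \text{for } |e| > k \end{cases} \tag{4.8}$$

with default tuning constant k = 1.345.

6. Logistic:

$$w(e) = \frac{\tanh(e/k)}{e/k} \tag{4.9}$$

with default tuning constant k = 1.205.

7. Talwar:

$$w(e) = \begin{cases} 1 & \text{for } |e| \leq k \\ 0 & \text{for } |e| > k \end{cases} \tag{4.10}$$

with default tuning constant k = 2.795.

8. Welsch:

$$w(e) = e^{-(e/k)^2} \tag{4.11}$$

with default tuning constant k = 2.985.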

Default tuning constants k give coefficient estimates that are approximately 95% as statistically efficient as the ordinary least squares (OLS) estimates, provided the response has a normal distribution with no outliers. A lower value of k increases the down-weighting of large residuals; a higher value of k decreases it (Refs. [52, 53]).
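For reference, the weight functions of this section can be written as plain Python functions of the scaled residual e, with the tuning constants quoted above as defaults. This is an illustrative sketch, not the MATLAB code used in this work; the Cauchy weight is written in its usual form 1/(1 + (e/k)^2), and the Fair weight takes k as a required argument because no tuning constant is stated for it here.

```python
from math import exp, pi, sin, tanh


def andrews(e, k=1.339):
    if e == 0:
        return 1.0  # limit of sin(u)/u as u -> 0
    return sin(e / k) / (e / k) if abs(e) < pi * k else 0.0


def bisquare(e, k=4.685):
    return (1 - (e / k) ** 2) ** 2 if abs(e) < k else 0.0


def cauchy(e, k=2.385):
    return 1 / (1 + (e / k) ** 2)


def fair(e, k):
    # No default: the text does not state a tuning constant for Fair.
    return 1 / (1 + abs(e / k))


def huber(e, k=1.345):
    return 1.0 if abs(e) <= k else k / abs(e)


def logistic(e, k=1.205):
    if e == 0:
        return 1.0  # limit of tanh(u)/u as u -> 0
    return tanh(e / k) / (e / k)


def talwar(e, k=2.795):
    return 1.0 if abs(e) <= k else 0.0


def welsch(e, k=2.985):
    return exp(-((e / k) ** 2))
```

All eight functions give weight 1 to a zero residual and smaller weights as the scaled residual grows; Andrews, bisquare and Talwar reject observations completely beyond their cut-off, whereas the others only down-weight them.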

4.2.3 Non-linear regression

While linear regression is often used for building a purely empirical model, non-linear regression is usually applied when there are physical reasons for believing that the relationship between the response and the predictors follows a particular functional form.

Non-linear regression is a regression in which the dependent variables are modelled as a non-linear function of model parameters and one or more independent variables (called predictors).

To determine the non-linear parameter estimates, an iterative algorithm is typically used Refs. [54]. A non-linear regression model has the form:

$$y_i = f(x_i, \gamma) + \varepsilon_i$$

where $x$ is a vector of $p$ predictors, $\gamma$ is a vector of $k$ parameters², $f(\cdot)$ is some known regression function, and $\varepsilon_i$ are random errors.

The unknown parameter vector $\gamma$ in the non-linear regression model is estimated from the data by minimizing a suitable goodness-of-fit expression with respect to $\gamma$. The most popular criterion is the sum of squared residuals:

$$\hat{\gamma} = \arg\min_{\gamma} \sum_{i=1}^{n} \left(y_i - f(x_i, \gamma)\right)^2$$

and estimation based on this criterion is known as non-linear least squares. If the errors $\varepsilon_i$ follow a normal distribution, then the least squares estimator for $\gamma$ is also the maximum likelihood estimator. However, since the functions are non-linear in the parameters $\gamma_k$ to be estimated, iterative numerical methods are often employed Refs. [54]. In this work we employed a MATLAB implementation of the Levenberg-Marquardt non-linear least squares algorithm, which requires initial estimates of the parameters in order to converge Refs. [55].

Some non-linear regression problems can be moved to a linear domain by a suitable transformation of the model formulation. When so transformed, standard linear regression can be performed to supply initial estimates of the parameters for the non-linear least squares algorithm Refs. [56].

• Exponential regression model:

$$y_i = \gamma_0 e^{\gamma_1 x_i} + \varepsilon_i, \tag{4.12}$$

where the $\varepsilon_i$ are independent normal errors with constant variance.

To find initial values for $\gamma_0$ and $\gamma_1$ we can linearise the response function by taking the natural logarithm:

$$\log f(x, \gamma) = \log\left(\gamma_0 e^{\gamma_1 x}\right) = \log \gamma_0 + \gamma_1 x. \tag{4.13}$$

Thus we can fit a simple linear regression model $g(x, \gamma) = \beta_0 + \beta_1 x$ with response $g(x, \gamma) = \log f(x, \gamma)$ and $\beta_0 = \log \gamma_0$, implying $\gamma_0 = e^{\beta_0}$ and $\gamma_1 = \beta_1$. So, in the exponential model procedure, the least squares linear regression method is used to solve for the $\beta_0$ and $\beta_1$ coefficients, which are then used to determine the original constants of the exponential model.

² The parameter vector in the response function is denoted by γ rather than β to underline non-linearity.

• Power regression model⁴:

$$y_i = \gamma_0 x_i^{\gamma_1} + \varepsilon_i. \tag{4.14}$$

To find initial values for $\gamma_0$ and $\gamma_1$ we can linearise the response function by taking the natural logarithm:

$$\log(y_i) = \log \gamma_0 + \gamma_1 \log x_i.$$

Thus we can fit a simple linear regression model $g(x, \gamma) = \beta_0 + \beta_1 x$ with response $g(x, \gamma) = \log(y_i)$, predictor $x = \log(x_i)$, $\beta_0 = \log \gamma_0$ and $\gamma_1 = \beta_1$. In the power model procedure, a log-log transform and the least squares linear regression method are used to solve for the $\beta_0$ and $\beta_1$ coefficients, which are then used to determine the original constants of the power model.
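The linearisation used to obtain starting values for the exponential and power models can be sketched in Python as follows. This is a minimal illustration: the function names are hypothetical, the data are noise-free toy values, and the log transforms require strictly positive responses (and strictly positive predictors for the power model). The resulting estimates would then be passed to the non-linear least squares routine as initial values.

```python
from math import exp, log


def fit_line(u, v):
    """Ordinary least squares estimates (b0, b1) for v = b0 + b1 * u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    b1 = suv / suu
    return mean_v - b1 * mean_u, b1


def exponential_start(x, y):
    """Initial (gamma0, gamma1) for y = gamma0 * exp(gamma1 * x); needs y > 0."""
    b0, b1 = fit_line(x, [log(yi) for yi in y])
    return exp(b0), b1


def power_start(x, y):
    """Initial (gamma0, gamma1) for y = gamma0 * x ** gamma1; needs x > 0, y > 0."""
    b0, b1 = fit_line([log(xi) for xi in x], [log(yi) for yi in y])
    return exp(b0), b1


# Hypothetical noise-free toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
exp_gamma = exponential_start(x, [2.0 * exp(0.5 * xi) for xi in x])
pow_gamma = power_start(x, [3.0 * xi ** 1.5 for xi in x])
```

With noise-free data the transformed fit recovers the generating constants; with real data it only provides a starting point, because the log transform changes the error structure assumed in equations (4.12) and (4.14).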

Finally, when analysing the model, $R^2$ is not useful in the non-linear case, since in many cases $SSR + SSE \neq SST$. Instead, we can check the other statistics and the residual plots. A further problem with the $R^2$ definition is that it requires the presence of an intercept, which most non-linear models do not have Refs. [57].

4.3 Figaro TGS-2600 for CO

In section 4.2 we described the calibration procedure, which is instrumental in evaluating the sensor and reference signals. Figure 4.8 shows the HORIBA APMA-370 dataset and figure 4.9 the AirQino output.

⁴ In chemical engineering, the rate of chemical reaction is often written in power function


Figure 4.8: Horiba APMA-370 CO dataset (ppm). The x-axis represents the time interval during which data were obtained. Red lines indicate day-time CO data collected from 7:00am to 5:00pm; blue lines indicate night-time CO data from 5:00pm to 7:00am.

Figure 4.9: AirQino CO dataset (counts). The x-axis represents the time interval during which data were obtained. Red lines indicate day-time CO data collected from 7:00am to 5:00pm; blue lines indicate night-time CO data from 5:00pm to 7:00am.


In each figure, data are coloured in two different ways: (i) red lines indicate day-time data collected from 7:00am to 5:00pm; (ii) blue lines indicate night-time data from 5:00pm to 7:00am. A preliminary analysis shows that both the AirQino CO sensor and the reference instrument exhibit a decreasing trend during the day-time and an increasing trend during the night-time. The next step was to draw a scatterplot to evaluate the linearity of the relationship between the signals and to compute a linear regression model.

Figure 4.10: CO Horiba (ppm) vs AirQino CO (counts). The red line indicates the least-squares line.

Linear regression model

As we can see in figure 4.10, where the least-squares line is also plotted, the relationship is only weakly linear, so we expect poor statistics for the linear regression.

The statistical scores are reported in table 4.1. Not only does R2 have a low value, but the SSE, the total deviation of the response values from the fitted values, is not encouraging either.


$R^2$    RMSE (ppm)   SSR      SSE       SST       N TESTING
0.0319   0.0461       1.0578   32.0896   33.1474   15070

Table 4.1: CO linear model statistics.

Figure 4.11 compares the predicted response computed by the model $F(x) = 0.0363 + 0.0004x$ with the reference values.

Figure 4.11: CO predicted response (red line, ppm) vs the reference signal (blue line, ppm). The x-axis represents the time interval during which data were obtained.
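The scores listed in table 4.1 can be computed along the following lines. This is a minimal sketch rather than the code used in this work: it assumes an ordinary least-squares line, $R^2 = SSR/SST$ and $RMSE = \sqrt{SSE/N}$, and the function name and the sample values are hypothetical.

```python
import math

def linear_fit_stats(x, y):
    """Ordinary least-squares line y = a + b*x and its summary statistics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    fitted = [a + b * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    ssr = sum((fi - y_mean) ** 2 for fi in fitted)          # regression sum of squares
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    return {"intercept": a, "slope": b, "R2": ssr / sst,
            "RMSE": math.sqrt(sse / n),  # assumption: RMSE = sqrt(SSE / N)
            "SSR": ssr, "SSE": sse, "SST": sst, "N": n}

# Hypothetical sensor counts and reference CO values (ppm), for illustration only.
counts = [310.0, 325.0, 340.0, 355.0, 370.0, 385.0]
co_ppm = [0.16, 0.18, 0.15, 0.20, 0.17, 0.21]
stats = linear_fit_stats(counts, co_ppm)
```

Under the same assumptions, the values in table 4.1 are mutually consistent: $SSR/SST = 1.0578/33.1474 \approx 0.0319$ and $\sqrt{SSE/N} = \sqrt{32.0896/15070} \approx 0.0461$.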

Residual Analysis and Cook’s Distance

As suggested by the poor results in table 4.1, we do not expect the residuals to be normally distributed with zero mean and common variance.

In figure 4.12 the normal probability plot shows that it is not reasonable to assume that the error terms are normally distributed, as a decreasing curve implies an asymmetrical distribution. The histogram also suggests that the residuals (and hence the error terms) are not normally distributed; on the contrary, their distribution is quite skewed. Moreover, the residuals vs. fits plot shows that the residuals do not have constant variance (heteroskedasticity).

Figure 4.12: CO linear model residual plots of linear regression.

To investigate whether these problems are caused by the presence of outliers, we used Cook’s distance, described in section 4.2.1.2, and then re-evaluated the model and the residuals after the cut-off.
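A minimal sketch of a Cook’s distance cut-off for a simple linear fit is given below. It uses the standard definition $D_i = \frac{e_i^2}{p\,s^2}\,\frac{h_i}{(1-h_i)^2}$, with $e_i$ the residual, $h_i$ the leverage, $p = 2$ fitted coefficients and $s^2 = SSE/(n-p)$. The threshold of three times the mean distance is an assumption made for illustration and is not necessarily the rule of section 4.2.1.2; the function names and sample data are hypothetical.

```python
def cooks_distance(x, y):
    """Cook's distance of each point for the least-squares line y = a + b*x."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1.0 - h) ** 2
            for e, h in zip(resid, leverage)]

def cut_off(x, y, factor=3.0):
    """Drop points whose Cook's distance exceeds factor * mean distance."""
    d = cooks_distance(x, y)
    threshold = factor * sum(d) / len(d)  # assumed rule, not taken from the source
    kept = [(xi, yi) for xi, yi, di in zip(x, y, d) if di <= threshold]
    return [k[0] for k in kept], [k[1] for k in kept]

# Hypothetical data: an exact line with one gross outlier at x = 5.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * xi for xi in xs]
ys[5] = 50.0
kept_x, kept_y = cut_off(xs, ys)  # the outlier is removed, nine points remain
```

After the cut-off, the model is simply refitted on the retained points and the residual plots are drawn again.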

As we can see in figure 4.14a, the residuals in the normal probability plot are better distributed and the histogram in figure 4.14b is less skewed, but in this case the procedure does not suffice to improve the linear model. Table 4.2 reports the statistics for the new model $F(x) = 0.0905 + 0.0002x$.

Cut-off (%)   $R^2$    RMSE (ppm)   SSR      SSE       SST       N TESTING
3.39          0.0158   0.0379       0.3362   20.9610   21.2972   14559


Figure 4.13: Cook’s distance.

(a) Residual probability plot after cut-off. (b) Histogram after cut-off

Figure 4.14: Residuals CO plot after Cook’s distance cut-off.

Robust linear regression model

In the previous section we showed that, in this case, the Cook’s distance cut-off does not suffice to improve the linear regression model. However, the statistics in table 4.2 confirm that an outlier detection strategy reduces the SSE (from 32.0896 to 20.9610). Therefore, a robust linear regression model, which attempts to reduce the influence of outlying observations, should provide a better fit to the majority of the data. Table 4.3 reports the usual statistics for the M-estimator functions, and figure 4.16 shows the results of the Talwar M-estimator. As we can see, the Talwar M-estimator, with model equation $F(x) = 0.0798 + 0.0002x$, is the best estimator in terms of SSE.


Function   $R^2$    RMSE (ppm)   SSR      SSE       SST       N TESTING
Andrews    0.0576   0.0387       1.3759   22.5145   23.8903   15070
Bisquare   0.0573   0.0387       1.3703   22.5314   23.9017   15070
Cauchy     0.0516   0.0394       1.2703   23.3590   24.6292   15070
Fair       0.0516   0.0394       1.3308   24.4821   25.8128   15070
Huber      0.0452   0.0402       1.1499   24.3703   25.5202   15070
Logistic   0.0507   0.0398       1.2747   23.8584   25.1331   15070
Talwar     0.0497   0.0357       1.0066   19.2473   20.2540   15070
Welsch     0.0543   0.0389       1.3067   22.7608   24.0674   15070

Table 4.3: CO linear model statistics robust regression.

(a) Andrews residual probability plot. (b) Talwar residual probability plot.

Figure 4.15: Residuals plots of Talwar and Andrews M-estimator.

On the other hand, the Andrews M-estimator, $F(x) = 0.0363 + 0.0004x$, improves the model in terms of $R^2$. Neither estimator has normally distributed residuals (figures 4.15(a) and 4.15(b)).


Figure 4.16: CO predicted response of the Talwar robust regression model (red line, ppm) vs the reference signal (blue line, ppm). The x-axis represents the time interval during which data were obtained.
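The idea behind an M-estimator fit can be sketched with iteratively reweighted least squares. The example below uses the Talwar weight function, which gives weight 1 to points whose scaled residual is below a tuning constant and weight 0 otherwise. It is a simplified illustration, not the implementation used to produce table 4.3: the tuning constant 2.795 is a commonly used default that is assumed here, the residual scale is a plain median absolute residual divided by 0.6745, no leverage correction is applied, and all names are hypothetical.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope

def talwar_fit(x, y, tune=2.795, max_iter=50, tol=1e-10):
    """Iteratively reweighted least squares with Talwar (hard-rejection) weights."""
    w = [1.0] * len(x)
    a, b = weighted_line(x, y, w)  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate: median absolute residual / 0.6745 (simplified).
        scale = statistics.median(abs(e) for e in resid) / 0.6745
        if scale == 0.0:
            break  # at least half of the residuals are exactly zero
        w = [1.0 if abs(e) / (tune * scale) < 1.0 else 0.0 for e in resid]
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b, w

# Hypothetical data: an exact line with one gross outlier at x = 5.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * xi for xi in xs]
ys[5] = 50.0
intercept, slope, weights = talwar_fit(xs, ys)
```

Unlike the Cook’s distance cut-off, all N points stay in the dataset; the outlying ones simply receive zero weight in the fit, which is why N TESTING remains 15070 in table 4.3.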

Polynomial regression model

The model order (the degree of the polynomial) determines the kind of trend the model can represent. It is an important factor in assessing how accurately the model describes the data and predicts a response. A quadratic model can explain curvature in the data, while a cubic model can describe a peak-and-valley pattern in the data.

Pol. Degree   $R^2$    RMSE (ppm)   SSR      SSE       SST       N TESTING
Quadratic     0.0426   0.0459       1.412    31.7354   33.1474   15070
Cubic         0.0608   0.0455       2.0155   31.1319   33.1474   15070


The equations for the data in table 4.4 are:

(i) $F(x) = -0.7671 + 0.0047x + 0.00005x^2$

(ii) $F(x) = -7.4763 + 0.0572x - 0.0001x^2 + 0.000001x^3$

Figure 4.17 shows the predicted response of both polynomial models.

Figure 4.17: A detail of the CO predicted response (ppm) of the polynomial regression models. The red line represents the quadratic model response (ppm), the yellow line the cubic model response (ppm) and the blue line the reference signal (ppm). The x-axis represents the time interval during which data were obtained.
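A polynomial least-squares fit of a given degree can be sketched as follows. This is an illustrative implementation that solves the normal equations directly and is not the code used to obtain table 4.4; the function names are hypothetical. For raw sensor counts of several hundred, the normal equations become ill-conditioned, so centring and scaling the predictor before fitting is advisable.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Normal equations A c = rhs with A[j][k] = sum(x^(j+k)), rhs[j] = sum(y * x^j).
    a = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    rhs = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, m):
            f = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= f * a[col][k]
            rhs[row] -= f * rhs[col]
    # Back substitution.
    coeffs = [0.0] * m
    for row in range(m - 1, -1, -1):
        s = sum(a[row][k] * coeffs[k] for k in range(row + 1, m))
        coeffs[row] = (rhs[row] - s) / a[row][row]
    return coeffs

def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

# Hypothetical data generated from 1 - 2x + 0.5x^2, for illustration only.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 - 2.0 * v + 0.5 * v * v for v in xs]
quadratic = polyfit(xs, ys, 2)  # close to [1.0, -2.0, 0.5]
cubic = polyfit(xs, ys, 3)      # cubic coefficient close to zero
```

Because each higher-degree model contains the lower-degree one as a special case, the SSE of a least-squares fit cannot increase with the degree, which is consistent with the ordering of the SSE values in tables 4.1 and 4.4.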