Accuracy of predicting sepsis mortality with machine learning compared to Sequential Organ Failure Assessment Scale


Author

Alexander Aviri Faculty of Medicine

Lithuanian University of Health Sciences

Supervisor

Vita Špečkauskienė Doctor of Science

Department of Physics, Mathematics and Biophysics

2020-02 – 2021-05 Kaunas


Summary

Alexander Aviri. Final Master Thesis: Effectiveness of machine learning in detection of sepsis compared to current screening methods.

Sepsis is a life-threatening condition that leads to rapid tissue damage and organ failure. It is difficult to diagnose at an early stage. The SOFA score is currently used in clinical practice to diagnose sepsis. Rapid diagnosis is crucial for the patient, since early treatment gives a better prognosis. Machine learning is a subbranch of artificial intelligence that is already used in several branches of medical care and diagnostics.

Aim: To analyze the performance of machine learning in predicting sepsis mortality. Objectives: (1, 2) To assess, analyze and present the performance of machine learning-based algorithms. (3) To review machine learning and compare it to the SOFA score. (4) To present different predictive machine learning models and discuss the difficulties.

Methods: An electronic database search was conducted on this topic. The search focused on studies published between 2012 and 2020. Peer-reviewed articles were included. The measure of interest was the AUROC score, which is a summary measure of sensitivity and specificity. Machine learning studies were included if they predicted sepsis mortality. Studies conducted in various countries were included. Studies based on pediatric patients were excluded, as were studies written in languages other than English. The selected environments were the ICU, hospital and emergency department.

Results: A total of 1959 studies were found through the online search, and only ten of them met the inclusion and exclusion criteria and were included in this research. The studies were divided between machine learning algorithms (n = 5) and SOFA score (n = 5). The included studies were retrospective or prospective. In the hospital environment the best AUROC score of the machine learning algorithms was 0.85, while the SOFA score gave 0.75–0.9. In the emergency department the machine learning score was 0.9, while the SOFA score gave 0.63–0.78. In the ICU environment the machine learning score was 0.85 and the SOFA score 0.79–0.83. In all environmental settings the machine learning score was similar to or slightly higher than the SOFA score in predicting sepsis mortality.

Conclusion: Machine learning marginally outperforms the SOFA score in the prediction of sepsis mortality. Therefore, it can be used in clinical practice to help physicians make faster treatment decisions and more easily select higher-risk patients. The use of machine learning in medical care needs to be further evaluated.


Acknowledgement

I sincerely thank my supervisor, Vita Špečkauskienė, for her help and support during the whole study, for her guidance regarding the structure of the thesis, and for her feedback throughout the work.

Conflict of Interest

There was no conflict of interest in this research.

Permission issued by the Ethics Committee

Approved by the bioethics center at LUHS.


Abbreviations

AUC – Area under curve
AUROC – Area under the receiver operating characteristic curve
CART – Classification and Regression Tree
CI – Confidence interval
EHR – Electronic health records
GBM – Gradient boosting machine
ML – Machine Learning
LASSO – Least absolute shrinkage and selection operator
LRM – Logistic regression model
LGBM – Light gradient boosting machine
RFM – Random forest model
ROC – Receiver operating characteristics curve
SOFA – Sequential organ failure assessment score
SVM – Support vector machine


Introduction

Sepsis is a common life-threatening condition caused by a reaction of the body to an infection and a rapid increase of chemicals, leading to rapid tissue damage, organ failure and, if treatment is not given early, death. The burden of sepsis is very high, with an estimated global incidence of 148 per 100 000 persons per year and a mortality rate of 26% (1)(44). Clinical manifestations are fever, tachycardia, shortness of breath, and an increase in peripheral white blood cells.

Sepsis is often followed by a prolonged hospital stay and a high cost of treatment (2). It is crucial to identify sepsis at an early stage and to give antibiotics immediately in order to reduce the mortality rate and the elevated cost (3)(38). Today medical care still struggles to diagnose it because specific and rapid diagnostic tests are lacking.

Currently, there is no accurate way to diagnose sepsis; diagnosis relies on physiological and laboratory measures. In 1991, sepsis was first defined as an “ongoing” process. In 2001, a new definition was proposed that described sepsis as a clinical syndrome together with organ injury, while the diagnostic criteria were left the same as in 1991. In 2016, the definition was changed again: new criteria were introduced under the name Sepsis-3 and the term severe sepsis was removed. Sepsis is now described as a life-threatening organ dysfunction caused by a dysregulated response of the body to an infection (5). To diagnose sepsis, medical doctors follow an algorithm with guidelines that starts with the quick Sequential Organ Failure Assessment (qSOFA) scale. The qSOFA scale has a maximum of three points, and the patient gets one point for each criterion met: a respiratory rate of 22/min or higher, a change in mental status, and a systolic blood pressure of 100 mmHg or lower (6)(41)(42). Once at least two of the three criteria are met, the clinician should go on to evaluate the SOFA scale (Figure 1). The SOFA scale evaluates several criteria and laboratory tests, as there is no specific test that could be considered the gold standard. PaO2/FiO2, the ratio of arterial oxygen partial pressure to the fraction of inspired oxygen, is scored with 0–4 points. The Glasgow scale assesses the consciousness of the patient. The kidneys are evaluated by creatinine values, the liver by bilirubin, and finally the platelet count in the blood is measured (45). As seen, the whole evaluation of sepsis is based on various variables and not on any specific biomarker. A more advanced stage of sepsis is septic shock, in which vasopressors are needed to maintain a mean arterial pressure of at least 65 mmHg and serum lactate is above 2 mmol/l.
The algorithm for sepsis with its different criteria is summarized in Figure 1 for an overview.
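As an illustration, the qSOFA screening step described above can be sketched in a few lines of Python. This is a minimal sketch and not part of any of the reviewed studies: the function names and the example values are hypothetical, and the cut-offs assume the Sepsis-3 form of the criteria (respiratory rate of 22/min or higher, systolic blood pressure of 100 mmHg or lower, any change in mental status).

```python
def qsofa_score(respiratory_rate, systolic_bp, altered_mentation):
    """Return the quick SOFA score (0 to 3), one point per criterion met.

    Thresholds are assumptions taken from the Sepsis-3 criteria:
    respiratory rate >= 22/min, systolic blood pressure <= 100 mmHg.
    """
    score = 0
    if respiratory_rate >= 22:   # breaths per minute
        score += 1
    if altered_mentation:        # any change in mental status
        score += 1
    if systolic_bp <= 100:       # mmHg
        score += 1
    return score


def needs_full_sofa(score):
    """Two or more qSOFA points lead on to the full SOFA evaluation."""
    return score >= 2


# Hypothetical patient: fast breathing and low blood pressure, alert.
example = qsofa_score(respiratory_rate=24, systolic_bp=95, altered_mentation=False)
print(example, needs_full_sofa(example))
```

A patient with two or more points would, following the algorithm above, be evaluated further with the full SOFA scale.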


Studies show different sensitivity and specificity depending on whether the qSOFA or the SOFA scale is used. Medical doctors often have a hard time differentiating sepsis from other diseases, as the symptoms overlap and there is no test sensitive and specific enough to make the diagnosis easy. This can prolong the time until the patient gets a definitive diagnosis, and the delay can lead to the patient's death. Several ongoing studies are searching for biomarkers with a higher sensitivity and specificity than the scales mentioned above, either a single gold-standard biomarker or a combination of biomarkers, but nothing has yet entered current clinical routine (6). Studies have shown that a SOFA score of 9 was associated with a mortality of around 33%, and that mortality reached 95% when the score exceeded 11 points. The score was correlated with the mortality percentage (7).


Figure 1. The algorithm and criteria for clinicians to follow in a suspected case of sepsis

Modified from (1) Lambden, S., Laterre, P. F., Levy, M. M., & Francois, B. (2019). The SOFA score-development, utility and challenges of accurate assessment in clinical trials. Critical Care (London, England), 23(1), 374, and from (2) Gul, F., Arslantas, M. K., Cinel, I., & Kumar, A. (2017). Changing Definitions of Sepsis/Sepsis Tanimlarinin Degisimi (Review). Turkish Journal of Anaesthesiology and Reanimation, 45(3), 129–138. https://doi.org/10.5152/TJAR.2017.93753


During the last decades, with rapid progress in technology, interest in machine learning has increased immensely, as has its application in the medical field. Machine learning could help clinicians to act faster and to analyze critically ill patients with greater accuracy.

Machine learning techniques, with the help of the data stored in electronic health records, give an opportunity to conduct data-driven predictive analytics in different medical environments, such as emergency care or hospitals (8). Machine learning (ML) is commonly used as a term under the umbrella of Artificial Intelligence (AI). Over time it has proven to be of great value in various fields (9). From a clinical perspective, ML could help physicians to administer antibiotics and take blood cultures for the correct pathogens at the right time, which could save numerous lives.

ML models take time to develop, and their results are not always as easily interpreted and compared as the triage scores taken at the patient's admission. In some environments, such as the emergency department, there are more resources to obtain blood tests and other tests quickly than in smaller hospitals. Earlier studies have shown that the SOFA scale is one of the most accurate scales for predicting mortality in sepsis compared to competitors such as SIRS or qSOFA (10).

The performance results are given as AUROC scores, because the majority of ML studies report this measure. The AUROC score is the AUC (area under the curve) of the ROC curve (receiver operating characteristic). ROC analysis is generally used in clinical epidemiology to quantify the accuracy of diagnostic tests, that is, how well they discriminate between two states of a patient, typically whether a patient has a disease or not (22, 23). A ROC analysis therefore examines a single test in a single population and compares its accuracy at different thresholds for a positive result (25). The ROC curve is drawn by plotting the sensitivity (true positive rate, on the vertical axis) against 1 − specificity (false positive rate, on the horizontal axis), where each point on the graph is obtained from a different threshold. Once the true positive and false positive rates have been obtained for all thresholds, the ROC curve is plotted on a graph (26).
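The construction of the ROC curve described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the labels and risk scores are invented toy values (1 = died, 0 = survived), and the area is computed with the trapezoidal rule, which is one common way to obtain the AUC from the plotted points.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A subject is called positive when its score is at or above the threshold.
    Thresholds are taken in descending order, so the points run from (0, 0) to (1, 1).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented toy data: outcome (1 = died, 0 = survived) and a model's risk score.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_points(toy_labels, toy_scores)))
```

In this toy example eight of the nine possible pairs of one deceased and one surviving patient are ranked correctly, which is why the area comes out at 8/9.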

The value obtained is a score between zero and one. A test that discriminates no better than chance gives an AUC of 0.5. The closer the score is to one, the more accurately the test discriminates between true positives and true negatives (27)(28)(29). The AUC summarizes the whole ROC curve. This is significant, since it can be interpreted as the probability that a randomly chosen subject with the outcome receives a higher score than a randomly chosen subject without it (24). Almost all results here and further on are given with 95% confidence intervals. Several formulas are applied to calculate the confidence interval of the AUC. The confidence interval is equal to

\[ \mathrm{AUC} \pm se \cdot z_{crit}, \]

where $z_{crit}$ is the two-tailed critical value of the standard normal distribution. The values of $se$ and $z_{crit}$ are calculated separately with two different formulas.

First, $z_{crit}$ is obtained in Excel with the formula =NORM.S.INV(1-α/2), where α takes the value 0.05 to give a 95% confidence interval. Thereafter, $se$ is calculated as

\[ se = \sqrt{\frac{q_0 + (n_1 - 1)\,q_1 + (n_2 - 1)\,q_2}{n_1 n_2}}, \]

with

\[ q_0 = \mathrm{AUC}\,(1 - \mathrm{AUC}), \qquad q_1 = \frac{\mathrm{AUC}}{2 - \mathrm{AUC}} - \mathrm{AUC}^2, \qquad q_2 = \frac{2\,\mathrm{AUC}^2}{1 + \mathrm{AUC}} - \mathrm{AUC}^2. \]

Here $n_1$ and $n_2$ are the sample sizes of the groups with and without the disease; in this study, the patients who died of sepsis and those who survived (29).

Once the AUC score has been calculated together with its confidence interval in the various studies, it is possible to compare how well the ML models predict compared to the SOFA score and to each other.
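The confidence interval calculation can also be written as a short Python sketch instead of spreadsheet formulas. The sketch is an illustration only: the function name and the example numbers are hypothetical, `statistics.NormalDist().inv_cdf` from the Python standard library is used in place of Excel's NORM.S.INV, and q2 is assumed to take the Hanley and McNeil form with denominator 1 + AUC.

```python
import math
from statistics import NormalDist


def auc_confidence_interval(auc, n1, n2, alpha=0.05):
    """Return (lower, upper) bounds of the confidence interval of an AUC.

    n1 and n2 are the sizes of the two outcome groups (for example, patients
    who died and patients who survived). alpha = 0.05 gives a 95% interval.
    Assumes the Hanley and McNeil standard error.
    """
    q0 = auc * (1 - auc)
    q1 = auc / (2 - auc) - auc ** 2
    q2 = 2 * auc ** 2 / (1 + auc) - auc ** 2
    se = math.sqrt((q0 + (n1 - 1) * q1 + (n2 - 1) * q2) / (n1 * n2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    return auc - se * z_crit, auc + se * z_crit


# Hypothetical example: AUC of 0.85 with 100 deaths and 400 survivors.
lower, upper = auc_confidence_interval(0.85, n1=100, n2=400)
print(round(lower, 3), round(upper, 3))
```

Larger groups give a smaller standard error and therefore a narrower interval, which is one reason why the width of the reported intervals differs between studies of different size.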

The aim of this research is to compare the performance of machine learning models to the SOFA score in AUC measurement.


Aim

The aim of this work is to evaluate the performance of machine learning models and compare it with the SOFA scale in predicting mortality from sepsis and septic shock, with AUROC as the metric, in the hospital, ICU and emergency environments.

Objectives

1 To assess the best-performing machine learning algorithms and to compare the results to the SOFA scoring system.

2 To present the results of machine learning algorithms and their findings.

3 To review the machine learning-based algorithms and compare them to the current SOFA score for sepsis.

4 To present how accurately different predictive machine learning models anticipate sepsis mortality and what difficulties there are.


Literature review

Sepsis is a very dangerous and serious condition caused by an infection that leads to tissue damage and organ failure due to an increase of chemicals in the bloodstream. It is crucial to provide fast and efficient treatment for the patient. Different scoring systems are currently used in clinical practice to diagnose sepsis, and the most important one today is the sequential organ failure assessment score, abbreviated to SOFA score. It is based on various physiological and laboratory measures, which are taken from blood samples and analyzed to see whether they are within the normal range.

The definition of sepsis has undergone various changes during recent decades. The first definition, made in 1991, described an ongoing inflammatory process. The next change was made in 2001, when sepsis was instead described as a clinical syndrome together with organ injury. In 2016 sepsis was redefined as a life-threatening organ dysfunction caused by a dysregulated response of the body to an infection (1)(5).

In clinical practice today there is still no very accurate way to diagnose sepsis apart from clinical and laboratory measures (3)(38). Researchers are constantly trying to find new ways and methods to diagnose sepsis more accurately. This study investigated whether machine learning could give better predictions than the current laboratory and clinical measures in the form of the SOFA score system. Machine learning is a branch of artificial intelligence in which algorithms improve automatically as data are fed to them.

Definition & classification

In clinical practice there are some variations in the terminology of sepsis: septic shock, severe sepsis or just sepsis. Although the term severe sepsis is no longer used, as it was removed in 2016, it is still important to mention it, since many studies still include this term and it is still quite often seen in the literature. Severe sepsis has the same definition as sepsis, with the addition of acute organ dysfunction.

Septic shock, which also appears in this review, is sepsis with persistent or refractory hypotension despite adequate fluid resuscitation (46). Machine learning uses variables to learn from past experience and to detect patterns in large and complex datasets. In this review all of the ML algorithms were supervised rather than unsupervised. Supervised algorithms are used when the outcome is known and the ML algorithm develops a prediction model (47). There are numerous ML algorithms used for this purpose; the random forest model, support vector machine and extreme gradient boosting are some of the algorithms that will be reviewed.
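To make the idea of supervised learning concrete, the following Python sketch trains on cases whose outcome is already known and then predicts the outcome of a new case. It is a toy illustration only: it uses k-nearest neighbours (KNN), one of the simpler algorithms that appears among the reviewed studies, the function name is hypothetical, and the patient values (serum lactate in mmol/l and respiratory rate per minute) are invented for the example and do not come from any study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_outcomes, new_patient, k=3):
    """Predict an outcome by majority vote among the k closest known cases."""
    distances = sorted(
        (math.dist(features, new_patient), outcome)
        for features, outcome in zip(train_features, train_outcomes)
    )
    votes = Counter(outcome for _, outcome in distances[:k])
    return votes.most_common(1)[0][0]


# Invented training data: (lactate, respiratory rate) with known outcome
# (1 = died, 0 = survived). The known outcomes are what makes this supervised.
toy_features = [(1.0, 16), (1.5, 18), (1.2, 20), (4.0, 28), (5.5, 30), (3.8, 26)]
toy_outcomes = [0, 0, 0, 1, 1, 1]

print(knn_predict(toy_features, toy_outcomes, (4.5, 29)))
print(knn_predict(toy_features, toy_outcomes, (1.1, 17)))
```

The models in the reviewed studies follow the same principle but use far more variables and patients, and more elaborate algorithms.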

Prevalence & mortality

Sepsis prevalence varies between age groups: the older the age group, the higher the prevalence. According to one study, patients older than 65 years had a mortality 2.3 times higher than patients below that age (48)(57). Another study showed that more than 50% of patients had at least one comorbid illness and that chronic pulmonary disease, immunosuppression, liver disease and diabetes mellitus were associated with sepsis (49).

A study of sepsis in Germany showed an incidence of 256 to 335 cases per 100 000 persons per year, that the percentage of sepsis was 41% and that the in-hospital mortality was 24.3% (50). Another study of sepsis mortality in Europe, North America and Australia showed an average 30-day septic shock mortality of 34.7% and a 90-day mortality of 38.5% (51). These numbers show that sepsis is a very serious and common condition in our societies.

Etiology

Sepsis can have various etiologies, but gram-negative bacteria are usually the most frequent cause (52)(53). Sepsis can also be caused by fungal, parasitic or viral infections. Lately gram-positive bacteria have gained in incidence, with Staphylococcus aureus as the predominant microorganism (54). Sepsis can originate from various sites of the body, but some sites are more frequent sources of infection than others.

One study showed that the most common source of sepsis is respiratory tract infection, specifically pneumonia, which is also associated with the highest mortality (55). Other leading sources of infection in patients with sepsis are the genitourinary tract and the abdomen (56).

Management

The main treatment of sepsis is currently antibiotics and control of the source of infection. Selecting the correct antibiotic can be challenging due to antibiotic resistance, the type of pathogen and clinical comorbidities (58). It is crucial to provide antibiotics as soon as possible; one study showed that a delay of more than one hour after ICU admission significantly increases hospital mortality (59).

Intravenous fluids play a critical role in the hemodynamic stabilization and resuscitation of a patient with sepsis. Although fluids play a critical role, there are still some questions remaining

condition and this is causing an imbalance for the cellular oxygen consumption and oxygen delivery. A patient in sepsis and especially septic shock should be given oxygen to maximize hemodynamic resuscitation for the patient (62).


Methods

This study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), with its guidelines on conduct and reporting applied (11).

Protocol and information sources

An electronic database search was conducted in Medline through PubMed and Ovid, together with a manual search on Google Scholar. The search covered studies published between 2012 and 2020.

The keywords used were “Machine learning sepsis prediction mortality” and “Sepsis prediction mortality SOFA”. Simple keywords such as “Machine learning”, “Sepsis” and “AUROC sepsis” were also used.

Searches were also conducted manually by analyzing the references of various articles and through various search engines. A search filter was applied to narrow the results down to peer-reviewed articles and to exclude systematic literature reviews.

Eligibility criteria and study selection

For convenience, studies were selected if they reported results as AUROC. Most machine learning studies report results in this metric, which is a summary measure of sensitivity and specificity and is customary in the field of diagnostic test accuracy (12). The closer this number is to 100%, the more accurately the model predicts true positives while minimizing false positives (13).

Studies were included if the machine learning models predicted mortality caused by sepsis. The type of machine learning algorithm used was not important. Only studies written in English were chosen. Both prospective and retrospective studies were included. The aim was to include studies regarding either sepsis or septic shock, and studies that predicted whether the patient would die due to sepsis or septic shock up to 12 hours in advance. No gold-standard variables or diagnostic markers were taken into consideration in the inclusion criteria, as each ML model analyzed its own variables.

Regarding machine learning models, there were no exclusion criteria for either supervised or unsupervised models. Regarding SOFA scale prediction of sepsis, the inclusion criterion was that the results were given as AUROC. The environment of prediction was either the hospital or the emergency department.

Exclusion criteria

There were several exclusion criteria. Studies were excluded when they gave no AUROC results but only sensitivity, specificity or other types of measurement. Studies that used machine learning models to predict other types of sequelae of sepsis were also excluded, as were studies conducted on pediatric patients.

Studies on other conditions, such as postoperative sepsis, were not included.

Data collection

Collection of all data, analysis, sampling, inclusion and exclusion were performed by one researcher and verified by a supervisor.

Analysis

Eligible studies were collected and summarized in tables and in the results. The next step was to report the outcomes in a summarized way. The data were not sufficient to report results in a meta-analysis; thus, a narrative approach was used.

The results of the ML algorithms were compared against the SOFA scores in AUROC measurements. The highest score of each study was analyzed together with the ML model that produced it, and we observed how well the models competed with the SOFA scale using the AUROC score as the unit of measurement.


Results

After screening various databases for eligible studies, a total of 1959 studies were found, of which 1917 came from the electronic database search (n = 281 in PubMed and n = 1636 in the Ovid database) and 39 from the manual search in Google Scholar. Of these, 10 studies met the inclusion criteria. The reasons for not meeting the eligibility criteria can be found in the flow diagram. Eight studies were conducted retrospectively, and two were prospective.

Accordingly, the eligible studies were divided into two groups: studies that met the inclusion criteria for machine learning models (n = 5) and studies that met the inclusion criteria for the SOFA score (n = 5).

The studies were published from 2012 up to 2020, and most were published in the USA and China. USA: n = 3, Turkey: n = 1, China: n = 3, Australia: n = 2, Korea: n = 1.

Regarding the machine learning studies, n = 3 were carried out in the hospital environment, n = 2 in the ICU and n = 1 in the emergency department. One of the machine learning studies was carried out in both the ICU and the in-hospital environment. Of the studies concerning the SOFA score, n = 2 were conducted in the hospital environment, n = 2 in the ICU and n = 2 in the emergency department; one of them was conducted in both the ICU and the hospital.

All of the studies are summarized in Figure 4. The patient population contained only adult patients older than 18 years.

All of the studies involving machine learning were conducted retrospectively and under the sepsis condition. There was variation in the outcomes of the different machine learning models.

The number of participants in each study varied from n = 42220 at the highest to n = 5278 at the lowest. The number of variables varied from 53 clinical variables to over 500. Among the models used, RFM gave results in the range 0.829–0.89.

The CART algorithm gave a result of 0.69 and was used in only one study (8). The LRM algorithm gave a range of 0.76–0.833. The Lasso algorithm gave a result of 0.829 (15). One study predicted mortality over two different time spans, 72 hours and 28 days after admission to the emergency department. The predictions were very similar for the two time spans, with a best score of 0.93 at 72 hours and 0.9 at 28 days. In both cases the algorithm that gave the highest AUROC in this study was SVM (18). The confidence interval was reported for all ML studies except one (Boggle), which used the GBM algorithm and reported a result of 0.8561.

Fig 3 (flow diagram)

Records identified: PubMed n = 281; Ovid database n = 1636; manual search n = 39; total n = 1959.

Excluded during first-stage review, n = 1919: irrelevant abstract/title n = 1580; results given in a measurement other than AUROC n = 734; conducted in a different environment n = 1400.

Second-stage evaluation: n = 40.

Excluded during second-stage evaluation, n = 30: no AUROC results n = 20; prediction of other types of sequelae n = 3; pediatric patients n = 5; post-operative sepsis n = 15.

Final analysis: n = 10.


Figure 4. Overview of the studies and the environments in which they were conducted


Table 1. Summary of all studies on prediction of mortality with ML algorithms

The table shows the results in AUROC with confidence intervals, the environment, the number of variables included, the number of participants with their age, and the types of ML algorithms.

| First author, year, country | Patient population | Design | Target condition | Machine learning model | Environment | AUROC ML results | Other |
|---|---|---|---|---|---|---|---|
| R. Andrew Taylor et al. (8) | n = 5278, of which n = 4676 fulfilled sepsis | Retrospective. October 2013 to October 2014. | Sepsis | RFM, CART, LRM | In hospital¹ | RFM = 0.86 (95% CI = 0.82–0.90); CART = 0.69 (95% CI = 0.62–0.77); LRM = 0.76 (95% CI = 0.69–0.82) | Over 500 clinical variables. |
| Guilan Kong, Ke Lin, Yonghua Hu (15) | n = 16688; deaths n = 2949 | Retrospective | Sepsis | Lasso, RFM, GBM, LRM | In hospital | Lasso = 0.829 (95% CI = 0.827–0.831); RFM = 0.829 (95% CI = 0.823–0.834); GBM = 0.845 (95% CI = 0.837–0.853); LRM = 0.833 (95% CI = 0.830–0.838) | 86 predictor variables. |
| Young Suk Kwon, Moon Seong Baek (16) | n = 23587 training-validation dataset; n = 4234 | Retrospective. January 2016 to December 2018. | Sepsis | XGB, LGBM and RF | ICU and in-hospital | qSOFA-based variables = 0.86 (95% CI = 0.85–0.87); n/a which ML algorithm; AUROC of qSOFA = 0.78 | Prediction of three-day mortality. Median population age = 63 y. qSOFA variables were used. |
| Boggle et al. | n = 10593 | Retrospective and cross-sectional. Between 2006 and 2016. | Sepsis | GBM | In hospital and up to 90 days after hospital discharge | GBM = 0.8561; no confidence interval | Mean age = 67 y. Mortality between admission and 90 days after discharge. A total of 199 variables were used. |
| Jau-Woei Perng et al. (18) | n = 42220; 1991 died within 72 h and 5939 died within 28 days | Retrospective. January 2007 to December 2013. | Sepsis | RFM, KNN, SVM | Emergency department | 72-h mortality: RF = 0.89 (95% CI = 0.88–0.89); KNN = 0.83 (95% CI = 0.83–0.84); SVM = 0.93 (95% CI = 0.92–0.93). 28-day mortality: RF = 0.89 (95% CI = 0.89–0.89); KNN = 0.84 (95% CI = 0.83–0.84); SVM = 0.90 (95% CI = 0.89–0.90) | 53 clinical variables chosen as factors for the ML algorithms. |


Table 2. Summary of all studies on prediction of mortality with SOFA score

The table shows the results in AUROC with confidence intervals, the environment setting, the condition the patients were in, the number of participants with median age, and the type of study.

| First author, year, country | Patient population | Design | Target condition | Environment | AUROC SOFA | Other |
|---|---|---|---|---|---|---|
| Baig MA et al. (13) | N = 760 | Prospective observational cohort study | Severe sepsis¹ / Septic shock² | Emergency department | Severe sepsis = 0.63 (95% CI = 0.55–0.70), 71% sensitivity and 57% specificity; Septic shock = 0.63 (95% CI = 0.55–0.70), 70% sensitivity and 59% specificity | Low middle-income country. Conducted from October 2016 to March 2017. |
| Eamon P. Raith et al. | N = 184875 | Retrospective cohort analysis | Sepsis | In hospital | In-hospital mortality = 0.753 (99% CI = 0.750–0.757); in-hospital mortality or ICU stay ≥ 3 days = 0.736 (99% CI = 0.733–0.739) | From 2000 to 2015. Mean age = 62.9 y. Patients who died = 34578 (18.7%). |
| Christopher P. Kovach et al. | N = 10942 | Retrospective | Sepsis³ | ICU and in-hospital | Hospital mortality = 0.90 (95% CI = 0.89–0.91); ICU stay > 3 days = 0.84 (95% CI = 0.82–0.87) | Median age 52 y. Conducted between January 2011 and March 2017. |
| Mion-Yun Wen et al. | N = 138 | Prospective observational study | Sepsis | ICU | ICU mortality = 0.793 (95% CI = 0.716–0.870) | Median age 62 y. Conducted between June 2017 and November 2018. |
| Stephan P. J. Macdonald | N = 240 | Retrospective | Severe sepsis / Septic shock | Emergency department | ED mortality = 0.78 (95% CI = 0.71–0.87) | |


1. Severe sepsis was considered when the criteria for sepsis were fulfilled with signs of acute organ dysfunction or hypoperfusion: hypotension (defined as systolic blood pressure < 90 mmHg, or mean arterial blood pressure < 70 mmHg, or a systolic blood pressure decrease > 40 mmHg or less than two standard deviations below normal for age, in the absence of other causes of hypotension); serum lactate above the upper limit of normal; urine output < 0.5 mL/kg/h for more than 2 h despite adequate fluid resuscitation; acute lung injury (ALI) with PaO2/FiO2 < 250 in the absence of pneumonia as infection source; ALI with PaO2/FiO2 < 200 in the presence of pneumonia as infection source; serum creatinine > 2.0 mg/dL (176.8 μmol/L); total bilirubin > 2 mg/dL (34.2 μmol/L); platelet count < 100 000/μL; or coagulopathy (international normalized ratio > 1.5).

2. Patients were considered to have septic shock when they fulfilled the criteria for severe sepsis with the presence of hypotension (systolic blood pressure < 90 mmHg) despite adequate fluid resuscitation (13).

3. Septic shock was considered when the criteria for septic shock were fulfilled with the additional presence of hypotension (systolic blood pressure < 90 mmHg) despite adequate fluid resuscitation (13).

4. If a urine, blood or sputum culture order was followed by a clinician order of an intravenous (IV) antibiotic within 72 hours, or if a clinician order of an IV antibiotic was followed by a culture order within 24 hours (14).

5. Exclusion criteria were age ≤18 years; pregnancy; cancers; hematologic disorders; a history of transplantation; and immunosuppressive drug use (15)


Figure 5. Synthesis of results

All the studies are summarized for a general overview. The two end lines of each AUROC score represent the confidence interval for those studies where it was calculated. The figure also shows the target condition, the number of patients and the type of model used in each study.


Discussion and Conclusions

Summary of evidence

The focus of this review is to evaluate ML models and their accuracy in various studies that predict sepsis mortality in advance from various variables. The difficulty in clinical practice is that it takes time to collect all variables and to calculate the SOFA score. We summarize the main findings for the ML and SOFA score results and compare their performance in similar environments. Among the ML models there was a large variation in how many variables were used to train the model. Every study but one evaluated several ML algorithms, and in our discussion we consider the best-performing ML algorithm of each study.

To start with the comparison in the hospital environment, the best-performing ML algorithms reached an AUROC of about 0.85; these were RFM in one study and GBM in the other. The SOFA score showed greater variance, from 0.75 up to 0.9 (8)(10)(13)(15)(16). This suggests that ML could be a better model for predicting sepsis, although the greater variance in the SOFA results makes it hard to draw a specific conclusion about the benefit. ML algorithms nevertheless appear to be a very convenient option. In the emergency environment, the ML score was 0.9 for the best algorithms, RFM and SVM, whereas the SOFA score reached only 0.63-0.78. In this environment the ML models outperformed the SOFA scale in prediction of mortality (12)(18)(19). For the ICU environment, the ML score was 0.85 (the study did not specify which ML algorithm achieved it), compared with 0.79-0.83 for the SOFA score (1)(13)(16). This again shows that ML is very effective in predicting mortality due to sepsis. In all environments, ML performed better than or equivalently to the SOFA score.
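The AUROC values quoted in the paragraph above can be set side by side to make the comparison explicit. The short sketch below uses only the figures stated in that paragraph. The dictionary layout and the helper name `compare` are illustrative assumptions, not part of the reviewed studies.

```python
# AUROC values as quoted in the text: best ML score and the SOFA range per environment.
reported = {
    "hospital": {"ml_best": 0.85, "sofa_range": (0.75, 0.90)},
    "emergency": {"ml_best": 0.90, "sofa_range": (0.63, 0.78)},
    "icu": {"ml_best": 0.85, "sofa_range": (0.79, 0.83)},
}


def compare(ml_best, sofa_range):
    """Place the best ML AUROC relative to the reported SOFA AUROC range."""
    low, high = sofa_range
    if ml_best > high:
        return "above SOFA range"
    if ml_best >= low:
        return "within SOFA range"
    return "below SOFA range"


summary = {env: compare(v["ml_best"], v["sofa_range"]) for env, v in reported.items()}

for env, verdict in summary.items():
    print(f"{env}: best ML AUROC {reported[env]['ml_best']:.2f} is {verdict}")
```

Read this way, the best ML score lies within the SOFA range in the hospital environment and above it in the emergency and ICU environments, which matches the statement that ML was better than or equivalent to SOFA in every environment.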

Other studies have reached similar conclusions regarding the benefits of applying ML algorithms. Studies have concluded that ML algorithms can be applied reliably in clinical practice and that they give better results than manual methods of identifying sepsis (30)(40). Another study concluded that an algorithm called XGBoost could provide an accuracy higher than 80% in the diagnosis of sepsis, and that it outperformed the SOFA score in diagnosing sepsis (31).

Mortality and diagnosis of sepsis are strongly associated, as one affects the other. The analysis of mortality provides information that supports a more accurate diagnosis, and early diagnosis of sepsis could in turn decrease the mortality rate.

Applying ML models in clinical practice could prove very helpful to clinicians in managing patients and improving their outcomes, which is consistent with other studies (30)(32)(33)(35)(39).


Clinical implications will become important in the future, as ML algorithms will need to be integrated into electronic medical records so as not to create unnecessary work for physicians. With integration into medical records and automatic triggering of clinical decision support, it will become more efficient for clinicians to use ML as a tool for selecting patients eligible for a higher level of treatment (34)(37). Sepsis is a very heterogeneous syndrome, and thus patients in need of a higher level of treatment would benefit from early administration of antibiotics (36).

Strengths and Limitations

There are several strengths and limitations to take into consideration. One strength is that all of the results were reported with the same measure, AUROC, which gives a more uniform result. Although AUROC was chosen as the outcome measure, it has some limitations. False positives and false negatives of a diagnosis can carry different misclassification costs, which a single summary value does not show (21). AUROC has also been criticized for neglecting the relative costs and harms of false negatives versus false positives (22).
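The limitation of AUROC described above can be illustrated with a small worked example. The sketch below computes AUROC as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, and then counts false positives and false negatives at two thresholds. The labels, scores, thresholds and the 5:1 cost weighting are all made-up assumptions for illustration and are not taken from any reviewed study.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def false_pos_neg(labels, scores, threshold):
    """Count false positives and false negatives when flagging scores >= threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return fp, fn


# Made-up data: 1 = died, 0 = survived; scores are hypothetical predicted risks.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3, 0.2, 0.1]

auc = auroc(labels, scores)
low_threshold = false_pos_neg(labels, scores, 0.35)
high_threshold = false_pos_neg(labels, scores, 0.75)

# Hypothetical weighting: a missed death is treated as five times as costly as a false alarm.
cost_low = low_threshold[0] * 1 + low_threshold[1] * 5
cost_high = high_threshold[0] * 1 + high_threshold[1] * 5
```

For this toy data the AUROC is the same single number whichever threshold is chosen, yet the lower threshold gives two false alarms and no missed deaths, while the higher threshold gives one false alarm and two missed deaths. Under the assumed weighting, the two operating points have very different total costs, which the AUROC alone does not reveal.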

Further limitations are that the indications for ICU admission or hospital stay varied among the studies, which could lead to different AUROC results. Because the predictor variables for each ML algorithm vary between studies, outcomes could differ even for the same ML model. Some studies were conducted in hospitals while others were conducted in the ICU or emergency department, so factors such as average age could vary with the environment. The ML algorithms made predictions over different numbers of days into the future, which could contribute to the deviating results; one study predicted up to 90 days after hospital admission, and this variance needs to be taken into consideration. Some studies calculated the AUROC for septic shock while others did so for sepsis alone, which could introduce further discrepancy in the results.

Moreover, it is important to consider that ML-based prediction in sepsis is a newly emerging topic with many remaining limitations and no standard way of conducting such studies. The limited number of articles that met all of the inclusion criteria was also an issue. Because the studies were conducted in a wide range of countries, routines may differ between those environments, which could affect the sampling and the interpretation of results.

Conclusions

ML provides a great tool for assisting physicians, as shown by the combined mortality predictions of the ML algorithms in our study. When comparing the performance of all algorithms, the RFM showed top results in different environments. In the hospital environment the best ML algorithms were RFM together with GBM. In the emergency environment the best ML algorithms were RFM and SVM. It is still difficult to tell which ML algorithm is the best because of the variance in results, and additional studies with higher numbers of participants are needed. Better results could be achieved by integrating the techniques into electronic medical records. This is still a very novel, emerging topic, and further research is needed, for instance on how to apply these technologies in daily clinical practice to help identify patients who are at higher risk of dying and thus in need of extra care.


References

1. Wang X, Wang Z, Weng J, Wen C, Chen H, Wang X. A New Effective Machine Learning Framework for Sepsis Diagnosis. IEEE access. 2018;6:48300–10.

2. Fleischmann C, Scherag A, Adhikari NKJ, Hartog CS, Tsaganos T, Schlattmann P, et al. Assessment of Global Incidence and Mortality of Hospital-treated Sepsis. Current Estimates and Limitations. American journal of respiratory and critical care medicine. 2016;193(3):259–72.

3. Novosad SA, Sapiano MRP, Grigg C, Lake J, Robyn M, Dumyati G, et al. Vital Signs: Epidemiology of Sepsis: Prevalence of Health Care Factors and Opportunities for Prevention. MMWR Morbidity and mortality weekly report. 2016;65(33):864–9.

4. Gül F, Arslantaş MK, Cinel İ, Kumar A. Changing definitions of sepsis. Turk J Anaesthesiol Reanim. 2017;45(3):129–38.

5. Marik PE, Taeb AM. SIRS, qSOFA and new sepsis definition. Journal of thoracic disease. 2017;9(4):943–5.

6. Lambden S, Laterre PF, Levy MM, Francois B. The SOFA score-development, utility and challenges of accurate assessment in clinical trials. Critical care (London, England). 2019;23(1):374–374.

7. Ferreira FL, Bota DP, Bross A, Mélot C, Vincent J-L. Serial Evaluation of the SOFA Score to Predict Outcome in Critically Ill Patients. JAMA : the journal of the American Medical Association. 2001;286(14):1754–8.

8. Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, et al. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data–Driven, Machine Learning Approach. Academic emergency medicine. 2016;23(3):269–78.

9. Vellido A, Ribas V, Morales C, Ruiz Sanmartín A, Ruiz Rodríguez JC. Machine learning in critical care: state-of-the-art and a sepsis case study. Biomedical engineering online. 2018;17(Suppl 1):135–135.

10. Raith EP, Udy AA, Bailey M, McGloughlin S, MacIsaac C, Bellomo R, et al. Prognostic Accuracy of the SOFA Score, SIRS Criteria, and qSOFA Score for In-Hospital Mortality Among Adults With Suspected Infection Admitted to the Intensive Care Unit. JAMA : the journal of the American Medical Association. 2017;317(3):290–300.

11. McInnes MDF, Moher D, Thombs BD, McGrath TA, Bossuyt PM, Clifford T, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA : the journal of the American Medical Association. 2018;319(4):388–96.

12. Baig MA, Sheikh S, Hussain E, Bakhtawar S, Subhan Khan M, Mujtaba S, et al. Comparison of qSOFA and SOFA score for predicting mortality in severe sepsis and septic shock patients in the emergency department of a low middle income country. Turkish journal of emergency medicine. 2018;18(4):148–51.

13. Kovach CP, Fletcher GS, Rudd KE, Grant RM, Carlbom DJ. Comparative prognostic accuracy of sepsis scores for hospital mortality in adults with suspected infection in non-ICU and ICU at an academic public hospital. PloS one. 2019;14(9):e0222563–e0222563.

14. Wen M-Y, Huang L-Q, Yang F, Ye J-K, Cai G-X, Li X-S, et al. Presepsin level in predicting patients’ in-hospital mortality from sepsis under sepsis-3 criteria. Therapeutics and clinical risk management. 2019;15:733–9.

15. Kong G, Lin K, Hu Y. Using machine learning methods to predict in-hospital mortality of sepsis patients in the ICU. BMC medical informatics and decision making. 2020;20(1):251–251.

16. Kwon YS, Baek MS. Development and Validation of a Quick Sepsis-Related Organ Failure Assessment-Based Machine-Learning Model for Mortality Prediction in Patients with Suspected Infection in the Emergency Department. Journal of clinical medicine. 2020;9(3):875.

17. Boggle, Brittany & Balduino, & Wolk, Donna & Farag, Hosam & Kethireddy, & Chatterjee, & Abedi, Vida. (2019). Predicting Mortality of Sepsis Patients in a Multi-Site Healthcare System using Supervised Machine Learning.

18. Perng J-W, Kao I-H, Kung C-T, Hung S-C, Lai Y-H, Su C-M. Mortality prediction of septic patients in the emergency department based on machine learning. J Clin Med. 2019;8(11):1906.

19. Macdonald SPJ, Arendts G, Fatovich DM, Brown SGA. Comparison of PIRO, SOFA, and MEDS scores for predicting mortality in emergency department patients with severe sepsis and septic shock. Acad Emerg Med. 2014;21(11):1257–63.


20. Halligan S, Altman DG, Mallett S. Disadvantages of using the area under the receiver operating characteristic curve to assess imaging tests: a discussion and proposal for an alternative approach. Eur Radiol. 2015;25(4):932–9.

21. Janssens ACJW. A more intuitive interpretation of the area under the ROC curve [Internet]. PeerJ. 2017

22. Swets JA. Indices of discrimination or diagnostic accuracy: Their ROCs and implied models. Psychol Bull. 1986;99(1):100–17.

23. Metz CE. ROC methodology in radiologic imaging. Invest Radiol. 1986;21(9):720–33.

24. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.

25. Vamvakas EC. Meta-analyses of studies of the diagnostic accuracy of laboratory tests: a review of the concepts and methods. Arch Pathol Lab Med. 1998;122(8):675–86.

26. Jones CM, Athanasiou T. Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. Ann Thorac Surg. 2005;79(1):16–20.

27. Walter SD. Properties of the summary receiver operating characteristic (SROC) curve for diagnostic test data. Stat Med. 2002;21(9):1237–56.

28. Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med. 1998;17(9):1033–53.

29. Zaiontz C. Statistics Using Excel Succinctly. United States: Createspace Independent Publishing Platform; 2017.

30. Dhungana P, Serafim LP, Ruiz AL, Bruns D, Weister TJ, Smischney NJ, et al. Machine learning in data abstraction: A computable phenotype for sepsis and septic shock diagnosis in the intensive care unit. World J Crit Care Med. 2019;8(7):120–6.

31. Yuan K-C, Tsai L-W, Lee K-H, Cheng Y-W, Hsu S-C, Lo Y-S, et al. The development an artificial intelligence algorithm for early sepsis diagnosis in the intensive care unit. Int J Med Inform. 2020;141(104176):104176.

32. Hou N, Li M, He L, Xie B, Wang L, Zhang R, et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med. 2020;18(1):462.

33. Burdick H, Pino E, Gabel-Comeau D, Gu C, Roberts J, Le S, et al. Validation of a machine learning algorithm for early severe sepsis prediction: a retrospective study predicting severe sepsis up to 48 h in advance using a diverse dataset from 461 US hospitals. BMC Med Inform Decis Mak. 2020;20(1):276.

34. Horng S, Sontag DA, Halpern Y, Jernite Y, Shapiro NI, Nathanson LA. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS One. 2017;12(4):e0174708.

35. Mani S, Ozdas A, Aliferis C, Varol HA, Chen Q, Carnevale R, et al. Medical decision support using machine learning for early detection of late-onset neonatal sepsis. J Am Med Inform Assoc. 2014;21(2):326–36.

36. Schinkel M, Paranjape K, Nannan Panday RS, Skyttberg N, Nanayakkara PWB. Clinical applications of artificial intelligence in sepsis: A narrative review. Comput Biol Med. 2019;115(103488):103488.

37. Tsoukalas A, Albertson T, Tagkopoulos I. From data to optimal decision making: a data-driven, probabilistic machine learning approach to decision support for patients with sepsis. JMIR Med Inform. 2015;3(1):e11.

38. Dellinger RP, Levy MM, Rhodes A, Annane D, Gerlach H, Opal SM, et al. Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock, 2012. Intensive Care Med. 2013;39(2):165–228.

39. Fleuren LM, Klausch TLT, Zwager CL, Schoonmade LJ, Guo T, Roggeveen LF, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 2020;46(3):383–400.

40. Shimabukuro DW, Barton CW, Feldman MD, Mataraso SJ, Das R. Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial. BMJ Open Respir Res. 2017;4(1):e000234.

41. Seymour CW, Liu VX, Iwashyna TJ, Brunkhorst FM, Rea TD, Scherag A, et al. Assessment of clinical criteria for sepsis: For the third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA. 2016;315(8):762.


42. Raith EP, Udy AA, Bailey M, McGloughlin S, MacIsaac C, Bellomo R, et al. Prognostic accuracy of the SOFA score, SIRS criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. JAMA. 2017;317(3):290.

43. Freund Y, Lemachatti N, Krastinova E, Van Laer M, Claessens Y-E, Avondo A, et al. Prognostic accuracy of sepsis-3 criteria for in-hospital mortality among patients with suspected infection presenting to the emergency department. JAMA. 2017;317(3):301–8.

44. Rivers EP, Coba V, Whitmill M. Early goal-directed therapy in severe sepsis and septic shock: a contemporary review of the literature. Curr Opin Anaesthesiol. 2008;21(2):128–40.

45. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA. 2016;315(8):801–10.

46. Martin GS. Sepsis, severe sepsis and septic shock: changes in incidence, pathogens and outcomes. Expert Rev Anti Infect Ther. 2012;10(6):701–6.

47. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19(1):281.

48. Martin GS, Mannino DM, Moss M. The effect of age on the development and outcome of adult sepsis. Crit Care Med. 2006;34(1):15–21.

49. Suarez De La Rica A, Gilsanz F, Maseda E. Epidemiologic trends of sepsis in western countries. Ann Transl Med. 2016;4(17):325.

50. Fleischmann C, Thomas-Rueddel DO, Hartmann M, Hartog CS, Welte T, Heublein S, et al. Hospital incidence and mortality rates of sepsis. Dtsch Arztebl Int. 2016;113(10):159–66.

51. Bauer M, Gerlach H, Vogelmann T, Preissing F, Stiefel J, Adam D. Mortality in sepsis and septic shock in Europe, North America and Australia between 2009 and 2019- results from a systematic review and meta-analysis. Crit Care. 2020;24(1):239.

52. Mayr FB, Yende S, Angus DC. Epidemiology of severe sepsis. Virulence. 2014;5(1):4–11.

53. Finfer S, Bellomo R, Lipman J, French C, Dobb G, Myburgh J. Adult-population incidence of severe sepsis in Australian and New Zealand intensive care units. Intensive Care Med. 2004;30(6):1252–1252.

54. Vincent J-L, Rello J, Marshall J, Silva E, Anzueto A, Martin CD, et al. International study of the prevalence and outcomes of infection in intensive care units. JAMA. 2009;302(21):2323–9.

55. Esper AM, Moss M, Lewis CA, Nisbet R, Mannino DM, Martin GS. The role of infection and comorbidity: Factors that influence disparities in sepsis. Crit Care Med. 2006;34(10):2576–82.

56. Mayr FB, Yende S, Angus DC. Epidemiology of severe sepsis. Virulence. 2014;5(1):4–11.

57. Mayr FB, Yende S, Linde-Zwirble WT, Peck-Palmer OM, Barnato AE, Weissfeld LA, et al. Infection rate and acute organ dysfunction risk as explanations for racial differences in severe sepsis. JAMA. 2010;303(24):2495–503.

58. Gyawali B, Ramakrishna K, Dhamoon AS. Sepsis: The evolution in definition, pathophysiology, and management. SAGE Open Med. 2019;7:2050312119835043.

59. Ferrer R, Martin-Loeches I, Phillips G, Osborn TM, Townsend S, Dellinger RP, et al. Empiric antibiotic treatment reduces mortality in severe sepsis and septic shock from the first hour: Results from a guideline-based performance improvement program. Crit Care Med. 2014;42(8):1749–55.

60. Myburgh JA, Mythen MG. Resuscitation fluids. N Engl J Med. 2013;369(25):2462–3.

61. Guidet B, Martinet O, Boulain T, Philippart F, Poussel JF, Maizel J, et al. Assessment of hemodynamic efficacy and safety of 6% hydroxyethylstarch 130/0.4 vs. 0.9% NaCl fluid replacement in patients with severe sepsis: the CRYSTMAS study. Crit Care. 2012;16(3):R94.

62. Stolmeijer R, ter Maaten JC, Zijlstra JG, Ligtenberg JJM. Oxygen therapy for sepsis patients in the emergency department: a little less? Eur J Emerg Med. 2014;21(3):233–5.
