A Different Traditional Approach for Automatic Comparative Machine Learning in Multimodality Covid-19 Severity Recognition


Introduction
The origins of the coronavirus go back to World War II (Zaim, Chong, Sankaranarayanan, & Harky, 2020), and the World Health Organization (WHO) has classified Covid-19 as a "pandemic" since 2020 (Rubin, et al., 2020). The most common clinical signs of Covid-19 are fever and cough, or at least three signs of pulmonary and gastrointestinal problems. The severity of the disease is graded into five categories: asymptomatic, mild, moderate, severe, and critical (AMMSC) (Wang, Kang, Liu, & Tong, 2020). Covid-19 is diagnosed from the laboratory results of real-time polymerase chain reaction (PCR) and from computed tomography (CT) of the lung (MOHME, 2021). Under different conditions, various other tests are also taken, such as complete blood count (CBC), the intracellular enzyme lactate dehydrogenase (LDH), erythrocyte sedimentation rate (ESR), blood clotting (D-dimer), and C-reactive protein (CRP). However, none of these methods alone can correctly diagnose the disease and support a timely treatment plan for Covid-19 patients. For example, RT-PCR results at the beginning of the disease period show a 40% false-negative rate.
Therefore, methods based on machine learning (ML) (until 2010), deep learning (DL) (from 2010 onwards), and fuzzy inference systems (FIS) have been used for classification, segmentation, detection, and prediction tasks, with the aim of preventing sources of infection, prevalence, and increasing disease severity (Gayathri & Satapathy, 2020) & (Aydin & Yurdakul, 2020). ML can perform these tasks and predict diseases from different aspects using self-learning based on records (task-driven), similarities (data-driven), and trial and error (reinforcement) (Nayak, et al., 2021). In the real world, however, the input information of Covid-19 patients is varied. It includes physiological symptoms (e.g., manometry, thermometry, and pulse oximetry), laboratory tests by sampling (e.g., hematology and PCR), medical imaging (e.g., chest X-rays (CXR) and lung CT), acquired biological signals (e.g., electrocardiography (ECG), plethysmography (PLG), and rhythm of speech), and medical records (e.g., demographic and epidemiological information), all of which help physicians interpret and diagnose the disease better and decide on a treatment plan. This variety is a major challenge for designers of artificial intelligence-based (AI) prediction systems: more than 99% of medical data analysis studies are devoted to AI techniques with unimodal inputs, and only 0.04% to AI methods with multimodal inputs (Huang, Pareek, Seyyedi, Banergee, & Lungren, 2020). Because many people have died from coronavirus during the last two years, and to allow better control in the future, especially of similar pandemic diseases and endemic mutations, various ML prediction systems have been examined for Covid-19 (Bhargava & Bansal, 2021), the severity of the disease (Zhang, Xiao, Zhu, Lin, & Tang, 2022), the risk of death (Moulaei, Shanbehzadeh, Mohammadi-Taghiabad, & Kazemi-Arpanahi, 2022), adjuvant drugs (Das, et al., 2021), etc., which are part of the related studies (Muhammad, et al., 2021).
Based on the designed systems of Automated Comparative Machine Learning (AutoCML) and Automated Comparative Machine Learning based on Important Features Selection (AutoIFSCML), this paper tries to answer the following question: Which is the best traditional implementation method for the severity recognition model on the multimodal data of patients with coronavirus? We implement the two systems we have designed on the Early Fusion Type-I (EFT1) of 2500 samples extracted from central corona hospitals in East Azerbaijan during the pandemic period from 2020 until now, and we evaluate the resulting models using the Descended Composite Scores Average (DCSA) technique to find the best classification model.

Covid-19 Data Collection and Understanding
Information from 2500 multimodal samples of Covid-19 patients who had been referred to central SARS-CoV-2 hospitals in East Azerbaijan province since 2020 was randomly collected in comma-separated values (CSV) format. This information consists of physiological symptoms, clinical signs, medical records, demographic information, laboratory tests, and the percentage of lung severity scores extracted from CT images (NCIRD, 2021) & (WHO, 2021). The data were in the form of "structured numerical data" and "textual data that can be converted to structured data". Features were extracted from CT images in three ways: density of lesions (e.g., ground glass opacity (GGO) and consolidation), signs of lesions (e.g., halo sign, reverse halo sign, and crazy paving sign), and interstitial pulmonary involvement (e.g., vascular enlargement and bronchial wall thickening). Data were standardized using conditional encoding for each attribute. The Covid-19 severity recognition model has 15 features and is a four-modal combination of clinical signs, physiological symptoms, demographic information, epidemiological information, laboratory tests, and CT imaging (including the percentage of lung severity) (Abirami & Kumar, 2022). In this model, 37% of the samples are asymptomatic, 33.32% are mild, 5% are moderate, 4.04% are severe, and 20.64% are critical.
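As a quick sanity check on the class distribution, the reported shares can be converted to sample counts. The sketch below assumes the percentages refer to the 2500 collected samples before any cleaning or balancing; the variable names are illustrative only.

```python
# Class shares (in percent) as reported in the text.
# Assumption: they refer to the 2500 collected samples.
TOTAL_SAMPLES = 2500
CLASS_SHARES = {
    "asymptomatic": 37.00,
    "mild": 33.32,
    "moderate": 5.00,
    "severe": 4.04,
    "critical": 20.64,
}

# Convert each percentage into a whole number of samples.
class_counts = {
    label: round(TOTAL_SAMPLES * pct / 100) for label, pct in CLASS_SHARES.items()
}

# Ratio between the largest and smallest class, a simple imbalance indicator.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print(class_counts)
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

Under this assumption the counts add up to exactly 2500, and the largest class is roughly nine times the size of the smallest.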

Covid-19 Data Fusion
Information fusion is the process of connecting data from different modalities with the aim of extracting an integrated dataset. It takes place in three architectural forms: Early Fusion (EF), Joint Fusion (JF), and Late Fusion (LF). In this study, EFT1 was used as the feature-level combination (Fig. 1). This type of combination joins multiple input modalities into a single feature vector before it enters a prediction model. In most studies, EFT1 has been used as the basic method for combining multimodal inputs, but it requires the multimodal input data to have the same dimensions; in this study, discrete encoding was used to equalize the dimensions.
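Feature-level early fusion can be illustrated as a plain concatenation of already-encoded modality vectors. This is a minimal sketch and not the study's pipeline: the modality names, feature values, and the `early_fuse` helper are hypothetical.

```python
from typing import Dict, List


def early_fuse(modalities: Dict[str, List[float]], order: List[str]) -> List[float]:
    """Concatenate per-modality feature vectors into one flat feature vector."""
    fused: List[float] = []
    for name in order:  # a fixed order keeps columns aligned across patients
        fused.extend(modalities[name])
    return fused


# Hypothetical encoded inputs for a single patient (illustrative values only).
patient = {
    "clinical": [1.0, 0.0],   # e.g. two encoded clinical signs
    "laboratory": [1.0],      # e.g. an encoded PCR result
    "ct_imaging": [0.35],     # e.g. lung severity score as a fraction
    "demographic": [0.62],    # e.g. scaled age
}
ORDER = ["clinical", "laboratory", "ct_imaging", "demographic"]

fused_vector = early_fuse(patient, ORDER)
print(fused_vector)
```

The fused vector is what a single downstream classifier would receive, which is why every modality has to be reduced to numeric features first.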

Automatic Comparative Machine Learning (AutoCML)
In simple words, ML applies an executable algorithm to a set of different types of data (in our example, Covid-19 numerical, pixel, and categorical textual multisource data) and its related labels (in our example, asymptomatic, mild, moderate, severe, and critical). By learning from this set of training data, the algorithm can predict new data with a certain degree of confidence. If an algorithm's hyperparameters are set so that its performance improves (fine-tuning), that is, more test cases are predicted correctly, then the result is considered the optimal prediction model for that task (in our example, classification for detecting Covid-19 severity). ML is done in three ways: supervised (task-driven), unsupervised (data-driven), and reinforcement (reward and punishment). In our example, supervised learning is used for the classification task. This type of learning uses labeled (targeted) datasets to train algorithms that predict new data.
The comparative machine learning approach uses different algorithms and selects the best prediction model. Our system includes twelve machine learning prediction models that fall into three distinct categories: traditional models (K-Nearest Neighbors Classifier (KNN), Decision Tree Classifier (DTC), Gaussian Naive Bayes Classifier (GNB), Support Vector Machine Classifier (SVM), and Logistic Regression Classifier (LRG)), ensemble models (Random Forest Classifier (RFC), Gradient Boosting Classifier (GBC), Extreme Gradient Boosting Classifier (XGB), Ada Boost Classifier (ADB), Extra Trees Classifier (ETC), and Cat Boost Classifier (CBC)), and a feed-forward artificial neural network (FFANN) (Multilayer Perceptron Classifier (MLP)). As shown in Fig. 2, in this system data preparation (preprocessing, normalization, standardization, integration, anomaly elimination, balancing, etc.), multimodal data fusion, training and testing of the 12 popular algorithms, and finally evaluation based on DCSA are done automatically in a single step, and the system introduces one prediction model as the best model.
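The comparative idea (train several candidates on the same data, score them the same way, keep the best) can be sketched with two toy classifiers written from scratch. These stand in for the twelve models named above and are not the study's implementation; the data, function names, and the use of accuracy as the only score are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]
Predictor = Callable[[Sequence[float]], int]


def fit_majority(train: List[Sample]) -> Predictor:
    """Baseline: always predict the most frequent training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority


def fit_1nn(train: List[Sample]) -> Predictor:
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(x: Sequence[float]) -> int:
        nearest = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        return nearest[1]
    return predict


def accuracy(predict: Predictor, test: List[Sample]) -> float:
    return sum(predict(x) == y for x, y in test) / len(test)


def compare(candidates: Dict[str, Callable[[List[Sample]], Predictor]],
            train: List[Sample], test: List[Sample]) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on the same split and return the best name plus all scores."""
    scores = {name: accuracy(fit(train), test) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Tiny made-up two-class dataset, only to exercise the loop.
train = [([0.0, 0.1], 0), ([0.1, 0.0], 0), ([0.2, 0.1], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
test = [([0.05, 0.05], 0), ([0.95, 0.9], 1), ([0.8, 1.0], 1)]

best_name, all_scores = compare({"majority": fit_majority, "1nn": fit_1nn}, train, test)
print(best_name, all_scores)
```

In a fuller system, the single accuracy score would be replaced by a composite of several evaluation methods, and the candidate dictionary would hold the actual library models.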

Fig. 2. The automatic comparative machine learning system (AutoCML)
When the AutoCML system was applied to the severity recognition model, labels were coded as asymptomatic (0), mild (1), moderate (2), severe (3), and critical (4). There were no duplicate records, and missing values were replaced with the value 1.13. The Isolation Forest (IF) was then used to identify erroneous records (outliers) (Liu, Ting, & Zhou, 2008). Because the classes were imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) algorithm was used to balance the data according to the more appropriate class (Blagus & Lusa, 2013). The size of the dataset remained constant in the preprocessing step, decreased by 10% after the cleaning step, and finally increased by 74.44% with data balancing, giving a total of 3925 samples.
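The reported sample counts can be reproduced with a few lines of arithmetic. The starting size and the two percentages come from the text; the per-class figure at the end is only an inference that holds if balancing equalised all five classes, which the text does not state explicitly.

```python
ORIGINAL_SAMPLES = 2500
CLEANING_DROP = 0.10       # 10% removed after outlier cleaning (from the text)
BALANCING_GAIN = 0.7444    # 74.44% added by data balancing (from the text)
NUM_CLASSES = 5

after_cleaning = round(ORIGINAL_SAMPLES * (1 - CLEANING_DROP))
after_balancing = round(after_cleaning * (1 + BALANCING_GAIN))

# Assumption: balancing brought every class to the same size.
per_class_if_equal = after_balancing / NUM_CLASSES

print(after_cleaning, after_balancing, per_class_if_equal)
```

The rounded result matches the total of 3925 given above, and it divides evenly by the five severity classes.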

Automatic Comparative Machine Learning Based on Important Feature Selection (AutoIFSCML)
ML is the extraction of meaningful patterns from examples of human intelligence in order to simulate fatigue-free human behavior. ML algorithms are the operational arms of computer-aided diagnostic and decision-making systems (Erickson, Korfiatis, Akkus, & Kline, 2017). We implemented and evaluated the process of selecting the best ML prediction model based on the most important features, automatically and in one step, by adding DCSA-based automatic feature selection (AutoIFS) to the structure of the AutoCML system. In effect, we designed AutoIFS and merged it with the previous system (Fig. 3).

Descended Composite Scores Average for System Assessment (DCSA)
Evaluating the performance of ML models is one of the basic steps in selecting the best and most efficient predictive model. Confusion Matrix (CM), Accuracy (Acc.), Precision (Prec.), Sensitivity or Recall (Rec.), ROC AUC, F1 Score, and K-Fold Cross Validation (KFCV) are some of the common assessment methods in research and are described below.
CM. This criterion has four measurement values: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). In our example, TP means that the Covid-19 patient is suspicious or probable and the model correctly predicts this. FP means that the patient is non-suspicious or healthy, but the model detects the case as suspicious or probable. TN means that the patient is non-suspicious or healthy and the model correctly predicts this. FN means that the patient is suspicious or probable, but the model identifies the case as non-suspicious or healthy. In medical problems, keeping FN low is more important than keeping FP low.
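The four confusion-matrix counts are the inputs to the standard accuracy, precision, recall, and F1 definitions. The sketch below uses those textbook formulas with made-up counts. For a five-class problem such as severity grading, the same counts would be taken per class in a one-vs-rest fashion; how the study averaged across classes is not stated, so that step is left out.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (values in the range 0..1)."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    prec = tp / (tp + fp) if (tp + fp) else 0.0   # guard against division by zero
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return {"acc": acc, "prec": prec, "rec": rec, "f1": f1}


# Illustrative counts only, not results from the study.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

Multiplying each value by 100 gives the percentage scale used when scores are described as being close to 100.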
Acc. and Prec. These criteria are computed from the output of the confusion matrix according to Eq. (1) and Eq. (2). The closer the output is to 100, the better the model.
DCSA is a traditional composite assessment method for the AutoCML and AutoIFS systems, and for their compact combination, the AutoIFSCML system, which we designed for Covid-19 severity recognition. In AutoCML, DCSA ranges from zero to one: the scores of the above six evaluation methods are averaged according to Eq. (6), and the highest average is taken [refer to Table 1]. The aim is to choose the best-performing recognition model using the combined efficiency across all six methods rather than any single one (Fig. 4). We proposed DCSA to use the full capacity of the methods, because the evaluation methods often give different values and selecting a proper model then becomes hard.
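The DCSA idea can be sketched as follows: average the six scores per model, sort the averages in descending order, and take the top entry. Eq. (6) is not reproduced in this text, so the unweighted mean used here is an assumption; the model names and scores are placeholders, not values from Table 1.

```python
from statistics import mean
from typing import Dict, List, Tuple

METHODS = ("acc", "prec", "rec", "roc_auc", "f1", "kfcv")


def dcsa_ranking(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Average the six method scores per model and sort in descending order."""
    averaged = {model: mean(vals[m] for m in METHODS) for model, vals in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores on a 0..1 scale for two hypothetical models.
example_scores = {
    "model_a": {"acc": 0.90, "prec": 0.88, "rec": 0.92, "roc_auc": 0.95, "f1": 0.90, "kfcv": 0.85},
    "model_b": {"acc": 0.91, "prec": 0.91, "rec": 0.91, "roc_auc": 0.95, "f1": 0.91, "kfcv": 0.80},
}

ranking = dcsa_ranking(example_scores)
best_model, best_score = ranking[0]
print(ranking)
```

In this made-up case model_b wins on accuracy, precision, and F1, yet model_a ranks first once cross-validation is averaged in, which is the kind of disagreement between methods that a composite score is meant to resolve.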

Fig. 4. Mechanism of descended composite scores average (DCSA)
DCSA can also be used in the AutoIFS system to choose the most important features. In AutoIFS, the range of DCSA depends on the values produced by each important-feature selection method, and it computes the highest average score across those methods [refer to Table 1].

Results
The designed systems were developed for Covid-19 severity recognition. As shown in

AutoCML
Table 1 shows the scores obtained for each conventional evaluation criterion in AutoCML, together with the DCSA, for the severity recognition model. The first row and column are "Models" and "Methods", respectively. The results show that XGB has a higher Acc. than GBC, an equal ROC AUC score, and a lower KFCV value. Likewise, GBC's Acc., Prec., F1 Score, and Rec. are equal to those of DTC, but GBC's KFCV value is higher than DTC's. CBC's KFCV value is lower than DTC's, but its other scores are higher. The scores of the other models also differ from method to method.

AutoIFSCML
The AutoIFS system prioritized the features of the severity recognition model, as shown in Table 2. The first row and column are "Features as the Models" and "Methods", respectively. The results show that the ANOVAF, Chi2, and PIKNN values of feature 8 are lower than those of feature 1, but its other scores are higher. The same pattern appears in other features. From this ranking, the system selected the 70% of features with the highest DCSA to enter the AutoCML system. For the severity recognition model, these are "loss of consciousness and muscle control", "blood oxygen saturation or SpO2", "lung severity scores", "PCR", "shortness of breath", "age", "chest pain", "medical records", "loss of smell and taste", and "headache". Table 3 gives the DCSA calculated for the model in the AutoIFSCML system. It shows that RFC has a lower KFCV than CBC but higher Acc. and Rec., while the other methods give equal scores for CBC and RFC.
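The selection step can be sketched as ranking features by their composite score and keeping the top 70%. With 15 features, 70% is 10.5; rounding down gives the ten features listed above, so the floor used here is an assumption consistent with that list. The feature names and scores are placeholders, not values from Table 2.

```python
import math
from typing import Dict, List


def select_top_fraction(feature_scores: Dict[str, float], fraction: float = 0.70) -> List[str]:
    """Return the highest-scoring features, keeping floor(n * fraction) of them."""
    keep = math.floor(len(feature_scores) * fraction)
    ranked = sorted(feature_scores, key=feature_scores.get, reverse=True)
    return ranked[:keep]


# Fifteen hypothetical features with strictly decreasing placeholder scores.
placeholder_scores = {f"feature_{i}": 1.0 - 0.05 * i for i in range(1, 16)}

selected = select_top_fraction(placeholder_scores)
print(selected)
```

The selected names would then be used to subset the fused feature matrix before the model comparison is rerun.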

Conclusion
AI is a trend in computer science that has aroused special interest among medical researchers because of its ability to make decisions, solve problems, and identify meaningful patterns. ML is a subset of this field that simulates human behavior in different tasks using various ML methods (Xiong, et al., 2022). In ML, the early identification of severe and critically infected cases can help reduce the burden on healthcare systems, enabling them to prioritize the allocation of limited resources during peaks and to optimize decision-making (Gao, et al., 2020). Prognosis and early recognition tools and methods in ML must be precise, especially during Covid-19 peaks. Because there is a direct relationship between mortality and severity (Pollack, 2016), this issue is a significant subject in ML. In this study, the first hypothesis was the combination of multimodal data from Covid-19 cases in order to design the severity recognition model. Because the data sources are capable of dimensional unification (that is, they can all be converted to numerically structured data), "EFT1" was chosen as the most appropriate type of fusion. The second hypothesis was the design of automated data preparation, important feature selection, and comparative machine learning in one step, for which the two systems AutoCML and AutoIFSCML were designed. The third hypothesis was the implementation of the systems on the severity recognition model and the selection of the best prediction algorithm for that model. DCSA is a solution for selecting the best model by considering the model's performance across all methods. In AutoCML, XGB (DCSA=0.998), and in AutoIFSCML, RFC (DCSA=0.960), showed the best performance for the model. Finally, the DCSA-based systems can be useful for implementing fine-tuned machine learning models in medical processes by leveraging the capacity and performance of the model across all methods.
In addition, ensemble learning performed well among the evaluated traditional models in the systems.