505 results on '"Machine Learning Classifiers"'
Search Results
202. Classification of Problem and Solution Strings in Scientific Texts: Evaluation of the Effectiveness of Machine Learning Classifiers and Deep Neural Networks
- Author
- Rohit Bhuvaneshwar Mishra and Hongbing Jiang
- Subjects
- discourse analysis, problem–solution pattern, automatic classification, machine learning classifiers, deep neural networks, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
One of the central aspects of science is systematic problem-solving. Therefore, problem and solution statements are an integral component of the scientific discourse. The scientific analysis would be more successful if the problem–solution claims in scientific texts were automatically classified. It would help in knowledge mining, idea generation, and information classification from scientific texts. It would also help to compare scientific papers and automatically generate review articles in a given field. However, computational research on problem–solution patterns has been scarce. The linguistic analysis, instructional-design research, theory, and empirical methods have not paid enough attention to the study of problem–solution patterns. This paper tries to solve this issue by applying the computational techniques of machine learning classifiers and neural networks to a set of features to intelligently classify a problem phrase from a non-problem phrase and a solution phrase from a non-solution phrase. Our analysis shows that deep learning networks outperform machine learning classifiers. Our best model was able to classify a problem phrase from a non-problem phrase with an accuracy of 90.0% and a solution phrase from a non-solution phrase with an accuracy of 86.0%.
- Published
- 2021
- Full Text
- View/download PDF
203. Heart Disease Risk Prediction Using Machine Learning Classifiers with Attribute Evaluators
- Author
- Karna Vishnu Vardhana Reddy, Irraivan Elamvazuthi, Azrina Abd Aziz, Sivajothi Paramasivam, Hui Na Chua, and S. Pranavanand
- Subjects
- heart disease, data pre-processing, attribute evaluation, machine learning classifiers, hyperparameter tuning, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
- Abstract
Cardiovascular diseases (CVDs) kill about 20.5 million people every year. Early prediction can help people to change their lifestyles and to ensure proper medical treatment if necessary. In this research, ten machine learning (ML) classifiers from different categories, such as Bayes, functions, lazy, meta, rules, and trees, were trained for efficient heart disease risk prediction using the full set of attributes of the Cleveland heart dataset and the optimal attribute sets obtained from three attribute evaluators. The performance of the algorithms was appraised using a 10-fold cross-validation testing option. Finally, we performed tuning of the hyperparameter number of nearest neighbors, namely, ‘k’ in the instance-based (IBk) classifier. The sequential minimal optimization (SMO) achieved an accuracy of 85.148% using the full set of attributes and 86.468% was the highest accuracy value using the optimal attribute set obtained from the chi-squared attribute evaluator. Meanwhile, the meta classifier bagging with logistic regression (LR) provided the highest ROC area of 0.91 using both the full and optimal attribute sets obtained from the ReliefF attribute evaluator. Overall, the SMO classifier stood as the best prediction method compared to other techniques, and IBk achieved an 8.25% accuracy improvement by tuning the hyperparameter ‘k’ to 9 with the chi-squared attribute set.
- Published
- 2021
- Full Text
- View/download PDF
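The evaluation protocol described in record 203 (10-fold cross-validation, then tuning the number of neighbours 'k' of an instance-based classifier) can be illustrated with a minimal sketch. This is not the authors' WEKA setup: the synthetic two-class data, the candidate values of k, the random seeds, and all function names below are assumptions made purely for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, folds, seed=0):
    """Shuffle the indices 0..n-1 and split them into `folds` near-equal, disjoint parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::folds] for f in range(folds)]


def cv_accuracy(X, y, k, folds=10, seed=0):
    """Pooled held-out accuracy: each sample is predicted once by a model that never saw it."""
    correct = 0
    for test_idx in k_fold_indices(len(X), folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical data: two overlapping Gaussian blobs standing in for a real dataset.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
X += [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(100)]
y = [0] * 100 + [1] * 100

# Tune k over a small grid of odd values (odd k avoids tied votes with two classes).
scores = {k: cv_accuracy(X, y, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Reusing the same fold assignment for every candidate k keeps the comparison between values of k paired, so differences in accuracy are not caused by different splits.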
204. Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations
- Author
- Steve Agajanian, Odeyemi Oluyemi, and Gennady M. Verkhivker
- Subjects
- cancer driver mutations, machine learning classifiers, ensemble-based machine learning features, random forest, deep learning, convolutional neural networks, Biology (General), QH301-705.5
- Abstract
Development of machine learning solutions for prediction of functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the last decade. In this work, we integrate different machine learning approaches, including tree-based methods, random forest and gradient boosted tree (GBT) classifiers along with deep convolutional neural networks (CNN) for prediction of cancer driver mutations in the genomic datasets. The feasibility of CNN in using raw nucleotide sequences for classification of cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate the performance on a relative scale. We then integrated DNA-based scores generated by CNN with various categories of conservational, evolutionary and functional features into a generalized random forest classifier. The results of this study have demonstrated that CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance of various machine learning classifiers. Our findings have also suggested that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features.
Machine learning predictions are leveraged in molecular simulations, protein stability, and network-based analysis of cancer mutations in the protein kinase genes to obtain insights about molecular signatures of driver mutations and enhance the interpretability of cancer-specific classification models.
- Published
- 2019
- Full Text
- View/download PDF
205. The role of beat-by-beat cardiac features in machine learning classification of ischemic heart disease (IHD) in magnetocardiogram (MCG).
- Author
- Senthilnathan S, Shenbaga Devi S, Sasikala M, Satheesh S, and Selvaraj RJ
- Subjects
- Humans, Male, Female, Middle Aged, Adult, Case-Control Studies, Signal Processing, Computer-Assisted, Algorithms, Electrocardiography methods, Aged, Heart Rate physiology, Heart physiopathology, Reproducibility of Results, Magnetocardiography methods, Myocardial Ischemia physiopathology, Myocardial Ischemia diagnosis, Machine Learning
- Abstract
Cardiac electrical changes associated with ischemic heart disease (IHD) are subtle and can be detected even at rest in magnetocardiography (MCG), which measures weak cardiac magnetic fields. Cardiac features derived from MCG recorded at multiple locations on the chest, together with some conventional time-domain indices, are widely used in machine learning (ML) classifiers to objectively distinguish IHD from control subjects. Most earlier studies have employed features derived from signal-averaged cardiac beats and have ignored inter-beat information. The present study demonstrates the utility of beat-by-beat features in classifying IHD subjects (n = 23) and healthy controls (n = 75) in 37-channel MCG data recorded at rest. The study reveals the importance of three features (out of eight measured features), namely the field map angle (FMA) computed from the magnetic field map, beat-by-beat variations of the alpha angle in the ST-T region, and T wave magnitude variations, in yielding a better classification accuracy (92.7%) than that achieved by conventional features (81%). Further, beat-by-beat features are also found to improve the accuracy of classifying myocardial infarction (MI) versus control subjects in two public ECG databases (to 92% from 88% and to 94% from 77%). These demonstrations suggest the importance of beat-by-beat features in the clinical diagnosis of ischemia. (© 2024 IOP Publishing Ltd.)
- Published
- 2024
- Full Text
- View/download PDF
206. Reliable Crops Classification Using Limited Number of Sentinel-2 and Sentinel-1 Images
- Author
- Beata Hejmanowska, Piotr Kramarczyk, Ewa Głowienka, and Sławomir Mikrut
- Subjects
- reliability of the classification, machine learning classifiers, random forest, Sentinel-2, Sentinel-1, Science
- Abstract
The study analyses whether a limited number of Sentinel-2 and Sentinel-1 images can be used to check if the crop declarations that EU farmers submit to receive subsidies are true. The declarations used in the research were randomly divided into two independent sets (training and test). Based on the training set, supervised classification of both single images and their combinations was performed using the random forest algorithm in SNAP (ESA) and our own Python scripts. A comparative accuracy analysis was performed on the basis of two forms of confusion matrix (the full confusion matrix commonly used in remote sensing and the binary confusion matrix used in machine learning) and various accuracy metrics (overall accuracy, accuracy, specificity, sensitivity, etc.). The highest overall accuracy (81%) was obtained in the simultaneous classification of multitemporal images (three Sentinel-2 and one Sentinel-1). An unexpectedly high accuracy (79%) was achieved in the classification of one Sentinel-2 image from the end of May 2018. Noteworthy is the fact that the accuracy of the random forest method trained on the entire training set equals 80%, while with the sampling method it is ca. 50%. Based on the analysis of various accuracy metrics, it can be concluded that the metrics used in machine learning, for example specificity and accuracy, are always higher than the overall accuracy. These metrics should be used with caution, because unlike the overall accuracy, to calculate these metrics, not only true positives but also false positives are used as positive results, giving the impression of higher accuracy. Correct calculation of overall accuracy values is essential for comparative analyses. Reporting the mean accuracy value for the classes as overall accuracy gives a false impression of high accuracy. In our case, the difference was 10–16% for the validation data, and 25–45% for the test data.
- Published
- 2021
- Full Text
- View/download PDF
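The distinction drawn in record 206 between overall accuracy (from the full multi-class confusion matrix) and per-class binary accuracy (from a one-vs-rest confusion matrix) can be made concrete with a small worked sketch. The 3-class confusion matrix below is invented for illustration and is not taken from the study; rows are assumed to be true classes and columns predicted classes.

```python
def overall_accuracy(cm):
    """Share of all samples on the diagonal of the full confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def binary_metrics(cm, c):
    """One-vs-rest metrics for class c, derived from the same full confusion matrix."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                  # class c samples predicted as something else
    fp = sum(row[c] for row in cm) - tp   # other samples predicted as class c
    tn = total - tp - fn - fp             # everything that does not involve class c
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical confusion matrix for three crop classes (rows: true, columns: predicted).
cm = [
    [50, 10, 0],
    [5, 30, 5],
    [0, 10, 40],
]

oa = overall_accuracy(cm)
per_class = [binary_metrics(cm, c) for c in range(len(cm))]
mean_binary_accuracy = sum(m["accuracy"] for m in per_class) / len(per_class)

print("overall accuracy:", round(oa, 4))
print("mean one-vs-rest accuracy:", round(mean_binary_accuracy, 4))
```

In this sketch the one-vs-rest accuracy of a class can never fall below the overall accuracy, because the errors that involve one class are a subset of all errors, and the true negatives of that class are counted as correct. That is why averaging the per-class binary accuracies and reporting the result as overall accuracy overstates performance.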
207. Efficient Classification of Optical Modulation Formats Based on Singular Value Decomposition and Radon Transformation.
- Author
- Eltaieb, Rania A., Farghal, Ahmed E. A., Ahmed, HossamEl-din H., Saif, Waddah S., Ragheb, Amr, Alshebeili, Saleh A., Shalaby, Hossam M. H., and Abd El-Samie, Fathi E.
- Abstract
Two schemes for blind optical modulation format identification (MFI), based on the singular value decomposition (SVD) and Radon transform (RT) of the constellation diagrams, are proposed. Constellation diagrams are obtained at optical signal-to-noise ratios (OSNRs) ranging from 2 to 30 dB for eight different modulation formats as images. The first scheme depends on the utilization of feature vectors composed of the singular values (SVs) of the obtained images, while the second scheme is based on applying the RT and then getting the SVs. Different classifiers are used and compared for the MFI task. The effect of varying the number of samples on the accuracy of the classifiers is studied for each modulation format. Simulation and experimental setups have been provided to study the efficiency of the two schemes at high bit rates for three dual-polarized modulation formats. A decimation approach for the constellation diagrams is suggested to reduce the SVD complexity, while maintaining high classification accuracy. The obtained results reveal that the proposed schemes can accurately be used to identify the optical modulation format blindly with classification rates up to 100% even at low OSNR values of 10 dBs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
208. Particle swarm optimization and feature selection for intrusion detection system.
- Author
- Kunhare, Nilesh, Tiwari, Ritu, and Dhar, Joydip
- Abstract
The network traffic in the intrusion detection system (IDS) has unpredictable behaviour due to the high computational power. The complexity of the system increases; thus, it is required to investigate the enormous number of features. However, the features that are inappropriate and (or) have some noisy data severely affect the performance of the IDSs. In this study, we have performed feature selection (FS) through a random forest algorithm for reducing irrelevant attributes. It makes the underlying task of intrusion detection effective and efficient. Later, a comparative study is carried through applying different classifiers, e.g., k Nearest Neighbour (k-NN), Support Vector Machine (SVM), Logistic Regression (LR), decision tree (DT) and Naive Bayes (NB) for measuring the different IDS metrics. The particle swarm optimization (PSO) algorithm was applied on the selective features of the NSL-KDD dataset, which cut down the false alarm rate and enhanced the detection rate and the accuracy of the IDS as compared with the mentioned state-of-the-art classifiers. This study includes the accuracy, precision, false-positive rate and the detection rate as performance metrics for the IDSs. The experimental results show low computational complexity, 99.32% efficiency and 99.26% detection rate on the selected features (=10) out of a complete set (= 41). [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
209. Paper-based device for the colorimetric assay of bilirubin based on in-situ formation of gold nanoparticles.
- Author
- Edachana, Resmi P., Kumaresan, Abishek, Balasubramanian, Vidhya, Thiagarajan, Ramachandran, Nair, Bipin G., and Thekkedath Gopalakrishnan, Satheesh Babu
- Subjects
- *GOLD nanoparticles, *BILIRUBIN, *DIGITAL cameras, *SURFACE plasmon resonance, *DETECTION limit
- Abstract
A paper-based colorimetric assay for the determination of bilirubin has been developed. The method is based on the in-situ reduction of chloroauric acid to form gold nanoparticles. A chromatographic paper was patterned using a wax printer. Chloroauric acid was drop-cast onto the reagent zone. In the presence of bilirubin, gold(III) ions are reduced and form gold nanoparticles. This leads to a color change from yellow to purple. The intensity of the purple color (peak at 530 nm) increases with bilirubin concentration in the 5.0 to 1000 mg L−1 range. The detection limit is 1.0 mg L−1. For the quantification of bilirubin, images were captured using a digital camera, and data were processed with the help of machine learning-based supervised prediction using Random Forest classification. The method was applied to the determination of bilirubin in urine samples. The spiked urine samples exhibit more than 95% recovery. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
210. DDLA: dual deep learning architecture for classification of plant species.
- Author
- Sundara Sobitha Raj, Anubha Pearline and Vajravelu, Sathiesh Kumar
- Abstract
Plant species recognition is performed using a dual deep learning architecture (DDLA) approach. DDLA consists of MobileNet and DenseNet‐121 architectures. The feature vectors obtained from individual architectures are concatenated to form a final feature vector. The extracted features are then classified using machine learning (ML) classifiers such as linear discriminant analysis, multinomial logistic regression (LR), Naive Bayes, classification and regression tree, k‐nearest neighbour, random forest classifier, bagging classifier and multi‐layer perceptron. The datasets considered in the study are the standard Flavia, Folio, and Swedish Leaf datasets and a custom-collected dataset (Leaf‐12). The MobileNet and DenseNet‐121 architectures are also used as a feature extractor and a classifier. It is observed that the DDLA architecture with the LR classifier produced the highest accuracies of 98.71, 96.38, 99.41, and 99.39% for the Flavia, Folio, Swedish Leaf, and Leaf‐12 datasets, respectively. The observed accuracy for DDLA + LR is higher than that of the other approaches (DDLA + ML classifiers, MobileNet + ML classifiers, DenseNet‐121 + ML classifiers, MobileNet + fully connected layer (FCL), DenseNet‐121 + FCL). It is also observed that the DDLA architecture with the LR classifier achieves higher accuracy in a computation time comparable with that of the other approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
211. A modified content-based evolutionary approach to identify unsolicited emails.
- Author
- Trivedi, Shrawan Kumar and Dey, Shubhamoy
- Subjects
- EMAIL management, EMAIL, SUPPORT vector machines
- Abstract
This computational research seeks to classify unsolicited versus legitimate emails. A modified version of an existing genetic programming (GP) classifier, i.e., modified genetic programming (MGP), is implemented to build an ensemble of classifiers to identify unsolicited emails. The proposed classifier is assessed using informative features extracted from two corpora (Enron and SpamAssassin) with the help of the greedy stepwise feature search method. Further, a comparative study is performed with other popular classifiers, such as Bayesian network, naïve Bayes, decision tree, random forest (RF), support vector machine (SVM), and GP. The results are then validated with 20-fold cross-validation and a paired t-test. The results indicate that the proposed classifier performs better in terms of accuracy and false-positive detection in comparison with the other machine learning classifiers tested in this study. Using different training and testing sets of email files from the Enron corpus, ensemble-based classifiers, such as boosted SVM, boosted Bayesian, boosted naïve Bayesian, RF, and the proposed MGP classifier, are tested and compared on all metrics, including training and testing time. The findings suggest that the MGP classifier with the greedy stepwise feature search method offers an improvement over alternative methods in detecting unsolicited emails. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
212. Semantic segmentation of road furniture in mobile laser scanning data.
- Author
- Li, Fashuai, Lehtomäki, Matti, Oude Elberink, Sander, Vosselman, George, Kukko, Antero, Puttonen, Eetu, Chen, Yuwei, and Hyyppä, Juha
- Subjects
- *AIRBORNE lasers, *GAUSSIAN mixture models, *FURNITURE, *TRAFFIC signs & signals, *POINT cloud, *SUPPORT vector machines, *OPTICAL scanners
- Abstract
Road furniture recognition has become a prevalent issue in the past few years because of its great importance in smart cities and autonomous driving. Previous research has especially focussed on pole-like road furniture, such as traffic signs and lamp posts. Published methods have mainly classified road furniture as individual objects. However, most road furniture consists of a combination of classes, such as a traffic sign mounted on a street light pole. To tackle this problem, we propose a framework to interpret road furniture at a more detailed level. Instead of being interpreted as single objects, mobile laser scanning data of road furniture is decomposed in elements individually labelled as poles, and objects attached to them, such as, street lights, traffic signs and traffic lights. In our framework, we first detect road furniture from unorganised mobile laser scanning point clouds. Then detected road furniture is decomposed into poles and attachments (e.g. traffic signs). In the interpretation stage, we extract a set of features to classify the attachments by utilising a knowledge-driven method and four representative types of machine learning classifiers, which are random forest, support vector machine, Gaussian mixture model and naïve Bayes, to explore the optimal method. The designed features are the unary features of attachments and the spatial relations between poles and their attachments. Two experimental test sites in Enschede dataset and Saunalahti dataset were applied, and Saunalahti dataset was collected in two different epochs. In the experimental results, the random forest classifier outperforms the other methods, and the overall accuracy acquired is higher than 80% in Enschede test site and higher than 90% in both Saunalahti epochs. The designed features play an important role in the interpretation of road furniture. 
The results of two epochs in the same area prove the high reliability of our framework and demonstrate that our method achieves good transferability with an accuracy over 90% through employing the training data of one epoch to test the data in another epoch. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
213. CBR-PDS: a case-based reasoning phishing detection system.
- Author
- Abutair, Hassan, Belghith, Abdelfettah, and AlAhmadi, Saad
- Abstract
Phishing attacks have become the preferred vehicle to gather sensitive information as well as to deliver dangerous malware. So far, there is still no phishing detection system that can perfectly detect and progressively self-adapt to differentiate between phishing and legitimate websites. This paper proposes the case-based reasoning phishing detection system (CBR-PDS), which relies on previous cases to detect phishing attacks. CBR-PDS is highly adaptive and dynamic, as it can adapt to detect new phishing attacks using a rather small dataset, in contrast to other machine learning techniques. CBR-PDS aims to improve the detection accuracy and the reliability of the results by identifying a set of discriminative features and discarding irrelevant features. CBR-PDS relies on a two-stage hybrid procedure using information gain and genetic algorithms. The reduction of the data dimensionality yields an improved accuracy rate while requiring less processing time. CBR-PDS is tested using different scenarios on various balanced datasets. The obtained performances clearly show the suitability of our proposed hybrid feature selection procedure as well as the efficiency of the proposed CBR-PDS system. The obtained accuracy rates exceed 95%. We also show that the integration of an Online Phishing Threats component into the CBR-PDS system further improves the accuracy rate. Finally, CBR-PDS performances are compared to those of several known competitive classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
214. Sentiment Analysis for Roman Urdu.
- Author
- RAFIQUE, AYESHA, MALIK, MUHAMMAD KAMRAN, NAWAZ, ZUBAIR, BUKHARI, FAISAL, and JALBANI, AKHTAR HUSSAIN
- Subjects
- SENTIMENT analysis, SUPPORT vector machines, URDU language, SUPERVISED learning, ONLINE comments
- Abstract
The majority of online comments/opinions are written in free-text format. Sentiment analysis can be used as a measure to express the polarity (positive/negative) of comments/opinions. These comments/opinions can be in different languages, e.g. English, Urdu, Roman Urdu, Hindi, Arabic, etc. Most work so far has addressed sentiment analysis of the English language. Very limited research work has been done in the Urdu or Roman Urdu languages, even though Hindi/Urdu is the third largest language in the world. In this paper, we focus on the sentiment analysis of comments/opinions in Roman Urdu. There is no publicly available Roman Urdu public opinion dataset. We prepare a dataset by taking comments/opinions of people in Roman Urdu from different websites. Three supervised machine learning algorithms, namely NB (Naive Bayes), LRSGD (Logistic Regression with Stochastic Gradient Descent) and SVM (Support Vector Machine), have been applied to this dataset. From the results of the experiments, it can be concluded that SVM performs better than NB and LRSGD in terms of accuracy. In the case of SVM, an accuracy of 87.22% is achieved. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
215. Machine learning-based cardiac activity non-linear analysis for discriminating COVID-19 patients with different degrees of severity.
- Author
- Ribeiro, Pedro, Marques, João Alexandre Lobo, Pordeus, Daniel, Zacarias, Laíla, Leite, Camila Ferreira, Sobreira-Neto, Manoel Alves, Peixoto, Arnaldo Aires, de Oliveira, Adriel, Madeiro, João Paulo do Vale, and Rodrigues, Pedro Miguel
- Subjects
- NONLINEAR analysis, COVID-19, UNCERTAINTY (Information theory), FRACTAL dimensions, LYAPUNOV exponents
- Abstract
This study highlights the potential of an electrocardiogram (ECG) as a powerful tool for early diagnosis of COVID-19 in critically ill patients with limited access to CT scan rooms. In this investigation, 3 categories of patient status were considered: Low, Moderate, and Severe. For each patient, 2 different body positions were used to collect 2 ECG signals. Then, from each collected signal, 10 non-linear features (Energy, Approximate Entropy, Logarithmic Entropy, Shannon Entropy, Hurst Exponent, Lyapunov Exponent, Higuchi Fractal Dimension, Katz Fractal Dimension, Correlation Dimension and Detrended Fluctuation Analysis) were extracted from every 1 s of the ECG time series to serve as inputs for 19 machine learning classifiers within a leave-one-out cross-validation procedure. Four different classification scenarios were tested: Low vs. Moderate, Low vs. Severe, Moderate vs. Severe and one multi-class comparison (All vs. All). The classification results were: (1) Low vs. Moderate: accuracy of 100% and F1-score of 100%; (2) Low vs. Severe: accuracy of 91.67% and F1-score of 94.92%; (3) Moderate vs. Severe: accuracy of 94.12% and F1-score of 96.43%; and (4) All vs. All: accuracy of 78.57% and F1-score of 84.75%. The results indicate that the applied methodology could be considered a good tool for distinguishing the different severity stages of COVID-19 using ECG signals. The findings highlight the potential of ECG as a fast and effective tool for COVID-19 examination. In comparison to previous studies using the same database, this study shows a 7.57% improvement in diagnostic accuracy for the All vs. All comparison.
• Description of COVID-19 activity using ECG non-linear features.
• 19 Machine Learning models have been used.
• 3 different COVID-19 degrees of severity groups have been tested.
• The best accuracy results ranged between 78.57% and 100%.
• 7.57% accuracy increase compared with a previous study.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
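Record 215 describes extracting non-linear features from every 1 s of an ECG signal. As a rough illustration of that windowing step, the sketch below computes one such feature, a histogram-based Shannon entropy, per 1 s window. The histogram estimator, the 16-bin setting, the 250 Hz sampling rate, and the synthetic signal are all assumptions for illustration; the study's exact feature definitions are not given in the abstract.

```python
import math


def shannon_entropy(window, bins=16):
    """Shannon entropy (in bits) of the amplitude histogram of one window."""
    lo, hi = min(window), max(window)
    if hi == lo:
        return 0.0  # a constant window carries no amplitude uncertainty
    counts = [0] * bins
    for v in window:
        # Map the amplitude to a bin index; the maximum value goes into the last bin.
        b = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[b] += 1
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def windowed_entropy(signal, fs):
    """One entropy value per non-overlapping window of fs samples (1 s at fs Hz)."""
    return [
        shannon_entropy(signal[start:start + fs])
        for start in range(0, len(signal) - fs + 1, fs)
    ]


# Hypothetical 4 s signal sampled at an assumed 250 Hz: a sine wave as a stand-in for ECG.
fs = 250
signal = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(4 * fs)]
features = windowed_entropy(signal, fs)
print(len(features), "windows:", [round(f, 3) for f in features])
```

Each window yields one feature value, so a recording of N seconds contributes N rows to the classifier input once the remaining features are computed in the same way.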
216. A combined framework of Biplots and Machine Learning for real-world driving volatility and emissions data interpretation.
- Author
- Ferreira, E., Macedo, E., Fernandes, P., and Coelho, M.C.
- Subjects
- MACHINE learning, MOTOR vehicle driving, RANDOM forest algorithms, ROADS, TRAFFIC monitoring, CARBON dioxide, NITROGEN dioxide
- Abstract
• Field measurements on Euro 6c vehicles were performed using PEMS.
• A proof of concept using several types of variables was developed.
• Biplots allowed obtaining partitions and correlations among the variables.
• Biplots allowed recognizing the differentiation at the driving and road levels.
• 5 learning classifiers corroborate biplots results.
Advanced visualization techniques can be useful for a better understanding of driving behavior and vehicle emissions in real time. This study used classic and sparse HJ-biplots to examine the relationship between driving behavior, vehicle engine, exhaust emissions, and route type variables. Different machine learning classifiers were applied. Second-by-second vehicle dynamic, engine, and emissions data were collected from three light-duty vehicles (hybrid, diesel, and gasoline) along three different routes (urban, rural, and highway). The dataset included a sample of 12,150 s of speed, acceleration, vehicular jerk, engine speed, engine load, fuel flow rate, vehicular specific power mode, and carbon dioxide and nitrogen oxides emissions. The proposed methodology not only enables the distinction of driving styles, road types, and emissions profiles but also reveals the correlation of variables in a single plot. The Random Forest algorithm achieved the highest accuracy. This study can be useful in the context of road traffic emissions monitoring since it identifies hidden relationships in input data and reduces the redundancy in input parameters without compromising information. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
217. Machine Learning Algorithms for Prediction of the Quality of Transmission in Optical Networks
- Author
- Stanisław Kozdrowski, Paweł Cichosz, Piotr Paziewski, and Sławomir Sujecki
- Subjects
- artificial intelligence, machine learning, optical networks, quality of transmission, machine learning classifiers, Science, Astrophysics, QB460-466, Physics, QC1-999
- Abstract
Increasing demand in backbone Dense Wavelength Division Multiplexing (DWDM) network traffic prompts the introduction of new solutions that allow increasing the transmission speed without a significant increase in the service cost. In order to achieve this objective, simpler and faster DWDM network reconfiguration procedures are needed. A key problem that is intrinsically related to network reconfiguration is that of assessing the quality of transmission. Thus, in this contribution a Machine Learning (ML) based method for assessing the quality of transmission is proposed. The proposed ML methods use a database that was created only on the basis of information that is available to a DWDM network operator via the DWDM network control plane. Several types of ML classifiers are proposed and their performance is tested and compared for two real DWDM network topologies. The results obtained are promising and motivate further research.
- Published
- 2020
- Full Text
- View/download PDF
218. Classification of Aggressive Movements Using Smartwatches
- Author
- Franck Tchuente, Natalie Baddour, and Edward D. Lemaire
- Subjects
- aggressive movements, smartwatches, feature selection, machine learning classifiers, performance metrics, Chemical technology, TP1-1185
- Abstract
Recognizing aggressive movements is a challenging task in human activity recognition. Wearable smartwatch technology with machine learning may be a viable approach for human aggressive behavior classification. This research identified a viable classification model and feature selector (CM-FS) combination for separating aggressive from non-aggressive movements using smartwatch data and determined if only one smartwatch is sufficient for this task. A ranking method was used to select relevant CM-FS models across accuracy, sensitivity, specificity, precision, F-score, and Matthews correlation coefficient (MCC). The Waikato environment for knowledge analysis (WEKA) was used to run 6 machine learning classifiers (random forest, k-nearest neighbors (kNN), multilayer perceptron neural network (MP), support vector machine, naïve Bayes, decision tree) coupled with three feature selectors (ReliefF, InfoGain, Correlation). Microsoft Band 2 accelerometer and gyroscope data were collected during an activity circuit that included aggressive (punching, shoving, slapping, shaking) and non-aggressive (clapping hands, waving, handshaking, opening/closing a door, typing on a keyboard) tasks. A combination of kNN and ReliefF was the best CM-FS model for separating aggressive actions from non-aggressive actions, with 99.6% accuracy, 98.4% sensitivity, 99.8% specificity, 98.9% precision, 0.987 F-score, and 0.984 MCC. kNN and random forest classifiers, combined with any of the feature selectors, generated the top models. Models with naïve Bayes or support vector machines had poor performance for sensitivity, F-score, and MCC. Wearing the smartwatch on the dominant wrist produced the best single-watch results. The kNN and ReliefF combination demonstrated that this smartwatch-based approach is a viable solution for identifying aggressive behavior. 
This wrist-based wearable sensor approach could be used by care providers in settings where people suffer from dementia or mental health disorders, where random aggressive behaviors often occur.
- Published
- 2020
- Full Text
- View/download PDF
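The aggressive-movement abstract above ranks models partly by Matthews correlation coefficient (MCC). As a minimal sketch of how that metric follows from a binary confusion matrix, the snippet below uses hypothetical counts that are not taken from the study; the function name `mcc` is illustrative.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts, for illustration only (not from the paper).
print(f"MCC = {mcc(tp=90, fp=5, tn=95, fn=10):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside sensitivity and specificity when classes are imbalanced.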
219. Comparison of Machine Learning Classifiers for Accurate Prediction of Real-Time Stuck Pipe Incidents
- Author
-
Javed Akbar Khan, Muhammad Irfan, Sonny Irawan, Fong Kam Yao, Md Shokor Abdul Rahaman, Ahmad Radzi Shahari, Adam Glowacz, and Nazia Zeb
- Subjects
artificial neural networks ,drilling operation ,machine learning classifiers ,RBF Kernel function ,stuck pipe ,support vector machines ,Technology - Abstract
Stuck pipe incidents are one of the contributors to non-productive time (NPT), where they can result in a higher well cost. This research investigates the feasibility of applying machine learning to predict events of stuck pipes during drilling operations in petroleum fields. The predictive model aims to predict the occurrence of stuck pipes so that relevant drilling operation personnel are warned to enact a mitigation plan to prevent stuck pipes. Two machine learning methodologies were studied in this research, namely, the artificial neural network (ANN) and support vector machine (SVM). A total of 268 data sets were successfully collected through data extraction for the well drilling operation. The data also consist of the parameters with which the stuck pipes occurred during the drilling operations. These drilling parameters include information such as the properties of the drilling fluid, bottom-hole assembly (BHA) specification, state of the bore-hole and operating conditions. The R programming software was used to construct both the ANN and SVM machine learning models. The prediction performance of the machine learning models was evaluated in terms of accuracy, sensitivity and specificity. Sensitivity analysis was conducted on these two machine learning models. For the ANN, two activation functions—namely, the logistic activation function and hyperbolic tangent activation function—were tested. Additionally, all the possible combinations of network structures, from [19, 1, 1, 1, 1] to [19, 10, 10, 10, 1], were tested for each activation function. For the SVM, three kernel functions—namely, linear, Radial Basis Function (RBF) and polynomial—were tested. Apart from that, SVM hyper-parameters such as the regularization factor (C), sigma (σ) and degree (D) were used in sensitivity analysis as well. 
The results from the sensitivity analysis demonstrate that the best ANN model achieved 88.89% accuracy, 91.89% sensitivity and 86.36% specificity, whereas the best SVM model achieved 83.95% accuracy, 86.49% sensitivity and 81.82% specificity. Upon comparison, the ANN model is the better machine learning model in this study because its accuracy, sensitivity and specificity are consistently higher than those of the best SVM model. In conclusion, judging from the promising prediction accuracy demonstrated in this study, the results suggest that stuck pipe prediction using machine learning is practical.
- Published
- 2020
- Full Text
- View/download PDF
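The stuck pipe abstract above mentions testing every network structure from [19, 1, 1, 1, 1] to [19, 10, 10, 10, 1]. One plausible reading, assumed here, is 19 inputs, three hidden layers of 1 to 10 units each, and one output. The sketch below enumerates that grid; the function name and parameters are illustrative, and the study itself used R rather than Python.

```python
from itertools import product

def network_structures(n_inputs=19, n_hidden_layers=3, max_units=10, n_outputs=1):
    """Yield every [inputs, h1, ..., hk, outputs] layout with 1..max_units per hidden layer."""
    for hidden in product(range(1, max_units + 1), repeat=n_hidden_layers):
        yield [n_inputs, *hidden, n_outputs]

structures = list(network_structures())
print(structures[0])    # [19, 1, 1, 1, 1]
print(structures[-1])   # [19, 10, 10, 10, 1]
print(len(structures))  # 1000
```

Under this assumption the search covers 1,000 layouts per activation function, so each candidate must be cheap to train for an exhaustive sweep to be feasible.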
220. Benchmarking Datasets for Breast Cancer Computer-Aided Diagnosis (CADx)
- Author
-
Moura, Daniel Cardoso, López, Miguel Angel Guevara, Cunha, Pedro, de Posada, Naimy González, Pollan, Raúl Ramos, Ramos, Isabel, Loureiro, Joana Pinheiro, Moreira, Inês C., de Araújo, Bruno M. Ferreira, Fernandes, Teresa Cardoso, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Ruiz-Shulcloper, José, editor, and Sanniti di Baja, Gabriella, editor
- Published
- 2013
- Full Text
- View/download PDF
221. Prediction of Radical Hysterectomy Complications for Cervical Cancer Using Computational Intelligence Methods
- Author
-
Kluska, Jacek, Kusy, Maciej, Obrzut, Bogdan, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Rutkowski, Leszek, editor, Korytkowski, Marcin, editor, Scherer, Rafał, editor, Tadeusiewicz, Ryszard, editor, Zadeh, Lotfi A., editor, and Zurada, Jacek M., editor
- Published
- 2012
- Full Text
- View/download PDF
222. Corrosion detection and severity level prediction using acoustic emission and machine learning based approach
- Author
-
Hassan Zaheer, Salman Sabir, Khurram Kamal, Faheem Rafique, Muhammad Fahad Sheikh, and Kashif Khan
- Subjects
Corrosion detection ,Materials science ,business.industry ,General Engineering ,Machine learning ,computer.software_genre ,Engineering (General). Civil engineering (General) ,High frequency sampling ,Corrosion ,Corrosion testing ,Machine learning classifiers ,Acoustic emission ,Naive Bayes classifier ,Severity level prediction ,Kurtosis ,Artificial intelligence ,Severity level ,Accelerated corrosion testing ,TA1-2040 ,business ,computer ,Energy (signal processing) - Abstract
Failures caused by corrosion in industry are a major cause of breakdown maintenance. Acoustic emission during accelerated corrosion testing is a reliable method for corrosion detection; however, classification of these acoustic emission signals by machine learning techniques is still in its infancy. The proposed approach uses a hybrid technique that combines the detection of corrosion through acoustic emission signals from accelerated corrosion testing with machine learning techniques to accurately predict corrosion severity levels. A laboratory-based experimental setup was established for accelerated corrosion testing of mild steel samples over different time spans, and the mass loss of the samples was recorded. Acoustic emission signals were acquired at a high sampling rate with a Sound Well AE sensor, an NI Elvis kit and NI Labview software. AE mean, AE RMS, AE energy, and kurtosis were selected as distinct features because they show a linear relationship with the corrosion process. For the multi-class problem, five corrosion severity levels were defined based on the mass loss that occurred during accelerated corrosion testing, for which Naive Bayes, BP-NN and RBF-NN showed accuracies of 90.4%, 94.57%, and 100%, respectively.
- Published
- 2021
223. Brain Tumour Detection and Classification by using Deep Learning Classifier
- Author
-
Solanki, Shubhangi, Singh, Uday Pratap, Chouhan, Siddharth Singh, and Jain, Sanjeev
- Subjects
Brain Tumor Detection ,Magnetic Resonance Images ,Deep Learning Classifiers ,Machine Learning Classifiers ,CNN - Abstract
In medical image processing, the classification of brain tumours is one of the most significant and difficult problems. Manual classification by humans can result in incorrect diagnoses and forecasts, and when a substantial amount of information must be processed manually, the process becomes lengthy and difficult to complete. Because brain tumours take a wide variety of forms, and because normal and tumour tissues share a certain degree of similarity, it can be challenging to distinguish the sections of a patient's brain that contain tumours from scans of that brain. A model is therefore constructed to detect brain tumours from 2D magnetic resonance images of the brain using a hybrid deep learning technique that combines traditional classification techniques with deep learning approaches. The application of the concept in clinical settings is the ultimate goal. The research was carried out using a Kaggle and BRaTS MICCAI dataset that had a wide range of tumours, differing in size, location, and form as well as in image intensity. A total of 6 classification methods, including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Multi-layer Perceptron (MLP), Logistic Regression (LR), and Naive Bayes (NB), were used in the conventional classification phase. Among these conventional classifiers, the SVM produced the most accurate results. A Convolutional Neural Network (CNN) was then used, which showed a significant improvement in overall performance over the traditional classifiers. CNNs with various numbers of layers were evaluated using different dataset split ratios. The experimental findings show that a 5-layer CNN obtains the highest accuracy, 97.86%, using an 80:20 split ratio.
- Published
- 2023
224. Introducing ROC Curves as Error Measure Functions: A New Approach to Train ANN-Based Biomedical Data Classifiers
- Author
-
Ramos-Pollán, Raúl, Guevara-López, Miguel Ángel, Oliveira, Eugénio, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Bloch, Isabelle, editor, and Cesar, Roberto M., Jr., editor
- Published
- 2010
- Full Text
- View/download PDF
225. Automatic Quality Inspection of Percussion Cap Mass Production by Means of 3D Machine Vision and Machine Learning Techniques
- Author
-
Tellaeche, A., Arana, R., Ibarguren, A., Martínez-Otzeta, J. M., Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Graña Romay, Manuel, editor, Corchado, Emilio, editor, and Garcia Sebastian, M. Teresa, editor
- Published
- 2010
- Full Text
- View/download PDF
226. Leveraging Electronic Dental Record Data to Classify Patients Based on Their Smoking Intensity.
- Author
-
Patel, J., Siddiqui, Z., Krishnan, A., and Thyvalikakath, T. P.
- Abstract
Background: Smoking is an established risk factor for oral diseases and, therefore, dental clinicians routinely assess and record their patients' detailed smoking status. Researchers have successfully extracted smoking history from electronic health records (EHRs) using text mining methods. However, they could not retrieve patients' smoking intensity due to its limited availability in the EHR. The presence of detailed smoking information in the electronic dental record (EDR) often under a separate section allows retrieving this information with less preprocessing. Objective: To determine patients' detailed smoking status based on smoking intensity from the EDR. Methods: First, the authors created a reference standard of 3,296 unique patients' smoking histories from the EDR that classified patients based on their smoking intensity. Next, they trained three machine learning classifiers (support vector machine, random forest, and naïve Bayes) using the training set (2,176) and evaluated performances on test set (1,120) using precision (P), recall (R), and F-measure (F). Finally, they applied the best classifier to classify smoking status from an additional 3,114 patients' smoking histories. Results: Support vector machine performed best to classify patients into smokers, nonsmokers, and unknowns (P, R, F: 98%); intermittent smoker (P: 95%, R: 98%, F: 96%); past smoker (P, R, F: 89%); light smoker (P, R, F: 87%); smokers with unknown intensity (P: 76%, R: 86%, F: 81%), and intermediate smoker (P: 90%, R: 88%, F: 89%). It performed moderately to differentiate heavy smokers (P: 90%, R: 44%, F: 60%). EDR could be a valuable source for obtaining patients' detailed smoking information. Conclusion: EDR data could serve as a valuable source for obtaining patients' detailed smoking information based on their smoking intensity that may not be readily available in the EHR. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
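The smoking-intensity abstract above reports precision (P), recall (R), and F-measure (F) per class. Assuming F is the balanced F1 score (the harmonic mean of P and R, which the abstract does not state explicitly), the sketch below recomputes F from the quoted P and R values. Small differences from the reported figures are expected because the inputs are already rounded.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F %) as quoted in the abstract.
reported = {
    "intermittent smoker": (95, 98, 96),
    "smokers with unknown intensity": (76, 86, 81),
    "intermediate smoker": (90, 88, 89),
    "heavy smoker": (90, 44, 60),
}

for label, (p, r, f_reported) in reported.items():
    f = 100 * f_measure(p / 100, r / 100)
    print(f"{label}: computed F = {f:.1f}%, reported F = {f_reported}%")
```

The heavy-smoker row shows why F is useful here: high precision cannot compensate for low recall, so the harmonic mean stays well below the precision figure.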
227. SmiDCA: An Anti-Smishing Model with Machine Learning Approach.
- Author
-
Sonowal, Gunikhan and Kuppusamy, K S
- Subjects
*MACHINE learning , *PHISHING prevention , *TEXT messages , *TEXT mining , *PEARSON correlation (Statistics) - Abstract
Phishing has become a serious cyber-security issue, and it is spreading through various media such as e-mail and SMS to capture the victim’s critical profile information. Although many novel anti-phishing techniques have been developed to forestall the progress of phishing, it remains an unresolved issue. Smishing is a form of phishing attack that uses Short Messaging Service (SMS), or simple text messages on mobile phones, to lure victims into giving up their online credentials. This paper presents an anti-phishing model entitled ‘SmiDCA’ (SMIshing Detection based on Correlation Algorithm). For the proposed model, smishing messages were collected from various sources, and 39 distinct features were extracted initially. The SmiDCA model incorporates dimensionality reduction, and machine learning-based experiments were conducted without (BFSA) and with (AFSA) reduction of features. The model has been validated with experiments on both English and non-English datasets, and the results of both experiments are encouraging in terms of accuracy: 96.40% for the English dataset and 90.33% for the non-English dataset. In addition, the model achieved an accuracy of 96.16% even after nearly half of the features were pruned. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
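The SmiDCA abstract above describes pruning features with a correlation-based algorithm but does not spell out the procedure. The sketch below shows a generic Pearson-correlation filter as one way such pruning can work; it is not the authors' method, and the feature names, toy data, and 0.5 threshold are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_features(columns, labels, threshold=0.5):
    """Keep feature names whose absolute correlation with the label meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, labels)) >= threshold]

# Toy data: 1 = smishing, 0 = legitimate (hypothetical).
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "has_url": [1, 1, 0, 0, 1, 0],        # tracks the label closely
    "is_sms": [1, 1, 1, 1, 1, 1],         # constant, carries no signal
    "has_greeting": [0, 1, 0, 1, 0, 1],   # only weakly related
}
print(select_features(columns, labels))
```

A filter of this kind scores each feature independently of the classifier, so the reduced feature set can be reused across several learners.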
228. PERFORMANCE COMPARISON OF MACHINE LEARNING CLASSIFIERS ON AIRCRAFT DATABASES.
- Author
-
Kamarudin, Nur Diyana, Rahayu, Syarifah Bahiyah, Zainol, Zuraini, Rusli, Mohd Shahrizal, and Ghani, Kamaruddin Abdul
- Subjects
MACHINE learning ,NAIVE Bayes classification ,DATA mining ,COMPARATIVE studies ,PARAMETER estimation ,AERONAUTICS - Abstract
The aim of this research is to analyse the performance of six different classifiers, namely k-Nearest Neighbours (kNN), Naive Bayes, Random Tree, J48 Decision Tree, Random Forest Tree and Sequential Minimal Optimisation (SMO), on aircraft databases and to optimise their cost parameter for better accuracy. The six algorithms are implemented to classify aircraft type and country of origin using the Waikato Environment for Knowledge Analysis (WEKA) workbench. Additionally, we report our parameter optimisation results for SMO, obtained by varying the cost parameter to find the optimum result. In both classifications, SMO with a linear kernel obtained the best performance of all the classifiers, with a classification accuracy of 100%. [ABSTRACT FROM AUTHOR]
- Published
- 2018
229. MRI radiomics analysis of molecular alterations in low-grade gliomas.
- Author
-
Shofty, Ben, Artzi, Moran, Ben Bashat, Dafna, Liberman, Gilad, Haim, Oz, Kashanian, Alon, Bokstein, Felix, Blumenthal, Deborah T., Ram, Zvi, and Shahar, Tal
- Abstract
Purpose: Low-grade gliomas (LGG) are classified into three distinct groups based on their IDH1 mutation and 1p/19q codeletion status, each of which is associated with a different clinical expression. The genomic sub-classification of LGG requires tumor sampling via neurosurgical procedures. The aim of this study was to evaluate the radiomics approach for noninvasive classification of patients with LGG and IDH mutation, based on their 1p/19q codeletion status, by testing different classifiers and assessing the contribution of the different MR contrasts. Methods: Preoperative MRI scans of 47 patients diagnosed with LGG with IDH1-mutated tumors and a genetic analysis for 1p/19q deletion status were included in this study. A total of 152 features, including size, location and texture, were extracted from fluid-attenuated inversion recovery images, T2-weighted images (WI) and post-contrast T1WI. Classification was performed using 17 machine learning classifiers. Results were evaluated by a fivefold cross-validation analysis. Results: Radiomic analysis differentiated tumors with 1p/19q intact (n=21; astrocytomas) from those with 1p/19q codeleted (n=26; oligodendrogliomas). Best classification was obtained using the Ensemble Bagged Trees classifier, with sensitivity = 92%, specificity = 83% and accuracy = 87%, and with area under the curve = 0.87. Tumors with 1p/19q intact were larger than those with 1p/19q codeleted (46.2±30.0 vs. 30.8±16.8 cc, respectively; p=0.03) and predominantly located to the left insula (p=0.04). Conclusion: The proposed method yielded good discrimination between LGG with and without 1p/19q codeletion. Results from this study demonstrate the great potential of this method to aid decision-making in the clinical management of patients with LGG. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
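The glioma radiomics abstract above evaluates classifiers with fivefold cross-validation on 47 patients. A minimal sketch of how such a split partitions the data is given below. The abstract does not say whether folds were stratified or how they were shuffled, so the plain shuffled split and the seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds

folds = k_fold_indices(47, k=5)
for i, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [j for j in range(47) if j not in held_out]
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```

Each patient is held out exactly once, so the reported sensitivity, specificity, and accuracy can be pooled over all 47 predictions.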
230. Exploratory data analysis and deception detection in news articles on social media using machine learning classifiers.
- Author
-
Sharma, Anu, Sharma, M.K, and Kr. Dwivedi, Rakesh
- Subjects
DATA analysis ,SOCIAL media ,FAKE news ,DECEPTION ,RANDOM forest algorithms ,MACHINE learning - Abstract
This paper investigates realistic ways to automatically identify fake news on digital platforms. To begin, a large number of current and related works were surveyed in an attempt to incorporate all possible features for detecting fake news, followed by exploratory data analysis to identify sources that frequently publish fake news and determine the most frequently occurring words in the title and body of fake and genuine news. Our findings indicate that the suggested computer models have useful discriminative potential for detecting fake news transmitted via digital channels. In this paper, we classify documents into fake/real news categories using Random Forest (RF), Naive Bayes (NB), and Passive Aggressive (PA) machine learning classifiers with and without text processing (TP). Results are derived from the confusion matrix, and classifier performance is measured by accuracy, precision, recall, and F1 score for fake news detection. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
231. Classification of EEG Signals Based on Pattern Recognition Approach
- Author
-
Hafeez Ullah Amin, Wajid Mumtaz, Ahmad Rauf Subhani, Mohamad Naufal Mohamad Saad, and Aamir Saeed Malik
- Subjects
feature extraction ,feature selection ,machine learning classifiers ,electroencephalogram (EEG) ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Feature extraction is an important step in the process of electroencephalogram (EEG) signal classification. The authors propose a “pattern recognition” approach that discriminates EEG signals recorded during different cognitive conditions. Wavelet-based features were computed, including multi-resolution decompositions into detailed and approximate coefficients as well as relative wavelet energy. Extracted relative wavelet energy features were normalized to zero mean and unit variance and then optimized using Fisher's discriminant ratio (FDR) and principal component analysis (PCA). A high-density (128-channel) EEG dataset validated the proposed method by identifying two classifications: (1) EEG signals recorded during complex cognitive tasks using the Raven's Advanced Progressive Matrices (RAPM) test; (2) EEG signals recorded during a baseline task (eyes open). Classifiers such as K-nearest neighbors (KNN), Support Vector Machine (SVM), Multi-layer Perceptron (MLP), and Naïve Bayes (NB) were then employed. Outcomes yielded 99.11% accuracy via the SVM classifier for approximation coefficients (A5) of low frequencies ranging from 0 to 3.90 Hz. For detailed coefficients (D5) from the sub-band range 3.90–7.81 Hz, accuracy rates were 98.57 and 98.39% for SVM and KNN, respectively. Accuracy rates for MLP and NB classifiers were comparable at 97.11–89.63% and 91.60–81.07% for A5 and D5 coefficients, respectively. In addition, the proposed approach was also applied to a public dataset for classification of two cognitive tasks and achieved comparable classification results, i.e., 93.33% accuracy with KNN. The proposed scheme yielded significantly higher classification performance using machine learning classifiers compared to extant quantitative feature extraction. These results suggest that the proposed feature extraction method reliably classifies EEG signals recorded during cognitive tasks with a higher degree of accuracy.
- Published
- 2017
- Full Text
- View/download PDF
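The EEG abstract above normalizes features to zero mean and unit variance and then ranks them with Fisher's discriminant ratio (FDR). The sketch below illustrates both steps for a single feature, using one common two-class definition of FDR, (μ_a − μ_b)² / (σ_a² + σ_b²). The abstract does not give the exact formula used, and the sample values are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def z_score(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def fisher_discriminant_ratio(class_a, class_b):
    """Squared difference of class means divided by the sum of class variances."""
    denom = pvariance(class_a) + pvariance(class_b)
    if denom == 0:
        return float("inf")
    return (mean(class_a) - mean(class_b)) ** 2 / denom

# Hypothetical relative wavelet energy values for one feature.
task = [0.42, 0.47, 0.45, 0.50]
baseline = [0.30, 0.33, 0.29, 0.35]

normalized = z_score(task + baseline)
fdr = fisher_discriminant_ratio(normalized[:4], normalized[4:])
print(f"FDR = {fdr:.2f}")
```

A larger FDR means the two conditions are better separated on that feature, so features can be sorted by FDR and the top-ranked ones passed to the classifier.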
232. Classification of Cotton Genotypes with Mixed Continuous and Categorical Variables: Application of Machine Learning Models
- Author
-
Sudha Bishnoi, Nadhir Al-Ansari, Mujahid Khan, Salim Heddam, and Anurag Malik
- Subjects
machine learning classifiers ,supervised classification ,mixed data ,heterogeneous data ,cotton genotypes - Abstract
Mixed data is a combination of continuous and categorical variables and occurs frequently in fields such as agriculture, remote sensing, biology, medical science, marketing, etc., but only limited work has been done with this type of data. In this study, data on continuous and categorical characters of 452 genotypes of cotton (Gossypium hirsutum) were obtained from an experiment conducted by the Central Institute of Cotton Research (CICR), Sirsa, Haryana (India) during the Kharif season of the year 2018–2019. The machine learning (ML) classifiers/models, namely k-nearest neighbor (KNN), Classification and Regression Tree (CART), C4.5, Naïve Bayes, random forest (RF), bagging, and boosting were considered for cotton genotypes classification. The performance of these ML classifiers was compared to each other along with the linear discriminant analysis (LDA) and logistic regression. The holdout method was used for cross-validation with an 80:20 ratio of training and testing data. The results of the appraisal based on hold-out cross-validation showed that the RF and AdaBoost performed very well, having only two misclassifications with the same accuracy of 97.26% and the error rate of 2.74%. The LDA classifier performed the worst in terms of accuracy, with nine misclassifications. The other performance measures, namely sensitivity, specificity, precision, F1 score, and G-mean, were all together used to find out the best ML classifier among all those considered. Moreover, the RF and AdaBoost algorithms had the highest value of all the performance measures, with 96.97% sensitivity and 97.50% specificity. Thus, these models were found to be the best in classifying the low- and high-yielding cotton genotypes.
- Published
- 2022
- Full Text
- View/download PDF
233. Comparing Machine Learning Classifiers for Object-Based Land Cover Classification Using Very High Resolution Imagery
- Author
-
Yuguo Qian, Weiqi Zhou, Jingli Yan, Weifeng Li, and Lijian Han
- Subjects
object-based classification ,machine learning classifiers ,very high resolution image ,urban area ,tuning parameters ,Science - Abstract
This study evaluates and compares the performance of four machine learning classifiers—support vector machine (SVM), normal Bayes (NB), classification and regression tree (CART) and K nearest neighbor (KNN)—to classify very high resolution images, using an object-based classification procedure. In particular, we investigated how tuning parameters affect the classification accuracy with different training sample sizes. We found that: (1) SVM and NB were superior to CART and KNN, and both could achieve high classification accuracy (>90%); (2) the setting of tuning parameters greatly affected classification accuracy, particularly for the most commonly-used SVM classifier; the optimal values of tuning parameters might vary slightly with the size of training samples; (3) the size of training sample also greatly affected the classification accuracy, when the size of training sample was less than 125. Increasing the size of training samples generally led to the increase of classification accuracies for all four classifiers. In addition, NB and KNN were more sensitive to the sample sizes. This research provides insights into the selection of classifiers and the size of training samples. It also highlights the importance of the appropriate setting of tuning parameters for different machine learning classifiers and provides useful information for optimizing these parameters.
- Published
- 2014
- Full Text
- View/download PDF
234. Machine Learning Classification for Assessing the Degree of Stenosis and Blood Flow Volume at Arteriovenous Fistulas of Hemodialysis Patients Using a New Photoplethysmography Sensor Device
- Author
-
Pei-Yu Chiang, Paul C. -P. Chao, Tse-Yi Tu, Yung-Hua Kao, Chih-Yu Yang, Der-Cherng Tarng, and Chin-Long Wey
- Subjects
photoplethysmography (PPG) sensor ,arteriovenous fistula (AVF) ,hemodialysis (HD) patients ,machine learning classifiers ,support vector machine (SVM) ,Chemical technology ,TP1-1185 - Abstract
This work presents a support vector machine (SVM) classifier for assessing the quality of arteriovenous fistulae (AVFs) in hemodialysis (HD) patients using a new photoplethysmography (PPG) sensor device. In clinical practice, there are two important indices for assessing the quality of an AVF: the blood flow volume (BFV) and the degree of stenosis (DOS). In hospitals, the BFV and DOS of AVFs are currently assessed using an ultrasound Doppler machine, which is bulky, expensive, hard to use, and time consuming. In this study, a newly developed PPG sensor device was utilized to provide patients and doctors with an inexpensive and small-sized solution for ubiquitous AVF assessment. The readout in this sensor was custom-designed to increase the signal-to-noise ratio (SNR) and reduce environmental interference by making full use of the dynamic range of the measured PPG signal entering an analog-to-digital converter (ADC) and by applying effective filtering techniques. With quality PPG measurements obtained, machine learning classifiers including SVM were adopted to assess AVF quality, where the input features were determined based on the optical Beer-Lambert law and a hemodynamic model to ensure that all the necessary features are considered. Finally, the clinical experiment results showed that the proposed PPG sensor device achieved an accuracy of 87.84% based on SVM analysis in assessing DOS at AVF, while an accuracy of 88.61% was achieved for assessing BFV at AVF.
- Published
- 2019
- Full Text
- View/download PDF
235. Texture-Based Metallurgical Phase Identification in Structural Steels: A Supervised Machine Learning Approach
- Author
-
Dayakar L. Naik, Hizb Ullah Sajid, and Ravi Kiran
- Subjects
Gray level co-occurrence matrix (GLCM) ,ASTM A36 ,steel microstructure ,textural features ,machine learning classifiers ,Mining engineering. Metallurgy ,TN1-997 - Abstract
Automatic identification of metallurgical phases based on thresholding methods in microstructural images may not be possible when the pixel intensities associated with the metallurgical phases overlap and, hence, are indistinguishable. To circumvent this problem, additional visual information about the metallurgical phases, referred to as textural features, is considered in this study. Mathematically, textural features are the second-order statistics of an image domain and can be distinct for each metallurgical phase. Textural features are evaluated from the gray level co-occurrence matrix (GLCM) of each metallurgical phase (ferrite, pearlite, and martensite) present in heat-treated ASTM A36 steels. The dataset of textural features and pixel intensities generated for the metallurgical phases is used to train supervised machine learning classifiers, which are subsequently employed to predict the metallurgical phases in the microstructure. Four classifiers are employed in this study: Naïve Bayes (NB), k-nearest neighbor (K-NN), linear discriminant analysis (LDA), and decision tree (DT). The performance of all four classifiers was assessed prior to their deployment, and the classification accuracy was found to be >97%. The proposed technique has two unique advantages: (1) unlike pixel intensity-based methods, the proposed method does not misclassify the grain boundaries as a metallurgical phase, and (2) the proposed method does not require the end-user to input the number of phases present in the microstructure.
- Published
- 2019
- Full Text
- View/download PDF
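The steel microstructure abstract above derives textural features from the gray level co-occurrence matrix (GLCM). As a minimal sketch of that idea, the code below builds a normalized, symmetric GLCM for one pixel offset and computes two widely used second-order statistics, contrast and energy. The abstract does not list which features the study used, so these two are assumptions, and the 4x4 image is a toy example rather than micrograph data.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Count gray-level pairs at offset (dx, dy) and normalise to probabilities."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the pair in reverse order
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Sum of squared gray-level differences weighted by pair probability."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))

def energy(p):
    """Sum of squared probabilities (also called angular second moment)."""
    return sum(v * v for row in p for v in row)

# Toy 4x4 image with 4 gray levels (hypothetical).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
print(f"contrast = {contrast(p):.4f}, energy = {energy(p):.4f}")
```

Two regions with the same mean intensity can still differ in contrast or energy, which is why such features help where intensity thresholding alone fails.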
236. Machine Learning Algorithms and Fault Detection for Improved Belief Function Based Decision Fusion in Wireless Sensor Networks
- Author
-
Atia Javaid, Nadeem Javaid, Zahid Wadud, Tanzila Saba, Osama E. Sheta, Muhammad Qaiser Saleem, and Mohammad Eid Alzahrani
- Subjects
Wireless Sensor Networks ,machine learning classifiers ,KNN ,ELM ,SVM ,RELM ,belief function ,Chemical technology ,TP1-1185 - Abstract
Decision fusion is used to fuse classification results and improve the classification accuracy in order to reduce the consumption of energy and bandwidth demand for data transmission. The decentralized classification fusion problem motivated the use of the belief function-based decision fusion approach in Wireless Sensor Networks (WSNs). To improve the belief function fusion approach, we propose four classification techniques, namely Enhanced K-Nearest Neighbor (EKNN), Enhanced Extreme Learning Machine (EELM), Enhanced Support Vector Machine (ESVM), and Enhanced Recurrent Extreme Learning Machine (ERELM). In addition, WSNs are prone to errors and faults because of software and hardware failures and their deployment in diverse fields. Because of these challenges, efficient fault detection methods must be used to detect faults in a WSN in a timely manner. We induced four types of faults (offset fault, gain fault, stuck-at fault, and out of bounds fault) and used the enhanced classification methods to address sensor failures. Experimental results show that ERELM gave the best result for the improvement of the belief function fusion approach. The other three proposed techniques, ESVM, EELM, and EKNN, provided the second, third, and fourth best results, respectively. The proposed enhanced classifiers are used for fault detection and are evaluated using three performance metrics, i.e., Detection Accuracy (DA), True Positive Rate (TPR), and Error Rate (ER). Simulations show that the proposed methods outperform the existing techniques and give better results for the belief function and fault detection in WSNs.
- Published
- 2019
- Full Text
- View/download PDF
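The wireless sensor network abstract above induces four fault types: offset, gain, stuck-at, and out of bounds. The sketch below shows one simple way to inject each into a list of readings, following the usual textbook meaning of the terms. The abstract does not define the faults or their parameters, so the function names, magnitudes, and sample readings are all hypothetical.

```python
def offset_fault(readings, offset):
    """Add a constant bias to every reading."""
    return [x + offset for x in readings]

def gain_fault(readings, gain):
    """Scale every reading by a constant factor."""
    return [x * gain for x in readings]

def stuck_at_fault(readings, start):
    """Freeze the output at the value observed at index `start`."""
    return readings[:start] + [readings[start]] * (len(readings) - start)

def out_of_bounds_fault(readings, upper_limit, excess):
    """Replace every reading with a value beyond the valid upper limit."""
    return [upper_limit + excess for _ in readings]

# Hypothetical temperature readings in degrees Celsius.
readings = [20.0, 21.0, 22.0, 23.0]
print(offset_fault(readings, 5.0))
print(gain_fault(readings, 2.0))
print(stuck_at_fault(readings, 1))
print(out_of_bounds_fault(readings, upper_limit=50.0, excess=10.0))
```

Labelled clean and faulty sequences produced this way can serve as training data for a fault-detection classifier.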
237. Position Invariance for Wearables: Interchangeability and Single-Unit Usage via Machine Learning
- Author
-
Soydan Redif, Aras Yurtman, and Billur Barshan
- Subjects
Computer Networks and Communications ,Computer science ,Activity monitoring and classification ,Position invariance ,Inertial sensors ,02 engineering and technology ,Interchangeability ,Activity recognition ,Inertial measurement unit ,Pattern recognition ,Classifier (linguistics) ,Singular value decomposition ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,Flexibility (engineering) ,Orientation (computer vision) ,business.industry ,Orientation invariance ,020206 networking & telecommunications ,Motion sensors ,Gyroscope ,Magnetometer ,Wearable sensing ,Internet of Things (IoT) ,Machine learning classifiers ,Computer Science Applications ,Accelerometer ,Hardware and Architecture ,Signal Processing ,Pattern recognition (psychology) ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Information Systems - Abstract
We propose a new methodology to attain invariance to the positioning of body-worn motion-sensor units for recognizing everyday and sports activities. We first consider random interchangeability of the sensor units so that the user does not need to distinguish between them before wearing. To this end, we propose to use the compact singular value decomposition (SVD) that significantly reduces the accuracy degradation caused by random interchanging of the units. Second, we employ three variants of a generalized classifier that requires wearing only a single sensor unit on any one of the body parts to classify the activities. We combine both approaches with our previously developed methods to achieve invariance to both position and orientation, which ultimately allows the user significant flexibility in sensor-unit placement (position and orientation). We assess the performance of our proposed approach on a publicly available activity data set recorded by body-worn motion-sensor units. The experimental results suggest that there is a tolerable reduction in accuracy, which is justified by the significant flexibility and convenience offered to users when placing the units.
- Published
- 2021
- Full Text
- View/download PDF
238. Brain seizures detection using machine learning classifiers based on electroencephalography signals: a comparative study
- Author
-
Attia, Atef Hashem and Said, Ashraf Mahroos
- Subjects
Support vector machine ,Epileptic seizure ,Electroencephalography ,Machine learning classifiers ,Random forest - Abstract
The paper demonstrates various machine learning classifiers that have been used for detecting epileptic seizures quickly and accurately, in real time, through electroencephalography (EEG). Symptoms of epilepsy are caused by abnormal brain activity. Analyzing and detecting epileptic seizures presents many challenges because EEG signals are non-stationary, and the patterns of the seizure vary for each patient. Moreover, EEG signals are noisy, and this affects the process of seizure detection. On the other hand, machine learning algorithms are very accurate and adaptive, generalize very well when provided with large and diverse training data, and, compared to other methods, can easily analyze the complex structure of the EEG signal despite the noise. With this approach, the features of epileptic seizures can be learned and used to correctly identify other seizure cases. The demonstration compares various classifiers, including random forests, K-nearest neighbors (K-NN), decision trees, support vector machine (SVM), logistic regression, and naïve Bayes. Different performance metrics are used, such as accuracy, receiver operating characteristic (ROC), mean absolute error (MAE), root-mean-square error (RMSE), and, most importantly, detection time for each algorithm. The Bonn University dataset has been used to demonstrate the classification of epileptic seizures.
- Published
- 2022
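Note on the entry above: two of the listed metrics, MAE and RMSE, can be computed directly from true and predicted labels. The following is a minimal sketch using the standard definitions. The function names and the 0/1 label encoding (1 for seizure, 0 for non-seizure) are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: the average of |true - predicted| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE: the square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical labels: 1 = seizure, 0 = non-seizure.
y_true = [1, 0, 1, 1]
y_pred = [1, 1, 1, 0]
mae = mean_absolute_error(y_true, y_pred)      # two of four labels differ
rmse = root_mean_square_error(y_true, y_pred)
```

With 0/1 labels, MAE equals the misclassification rate, so here it is 0.5 and the RMSE is its square root.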
239. Modern drowsiness detection techniques: a review
- Author
-
Sarah Saadoon and Dr. Alia Karim AbdulHassan
- Subjects
General Computer Science ,Identification of fatigue classification ,Optical image processing driver drowsiness sensors ,Electrical and Electronic Engineering ,Machine learning classifiers - Abstract
According to recent statistics, drowsiness, rather than alcohol, is now responsible for one-quarter of all automobile accidents. As a result, many monitoring systems have been created to reduce and prevent such accidents. However, despite the large number of state-of-the-art drowsiness detection systems, it is not clear which one is the most appropriate. This paper discusses the following points. First, it considers the existing supervised detection techniques, grouped into four categories (behavioral, physiological, automobile, and hybrid). Second, it describes the supervised machine learning classifiers used for drowsiness detection, followed by a discussion of the advantages and disadvantages of each technique evaluated. Lastly, it recommends a new strategy for detecting drowsiness.
- Published
- 2022
240. A Low-Power Wearable Stand-Alone Tongue Drive System for People With Severe Disabilities.
- Author
-
Jafari, Ali, Buswell, Nathanael, Ghovanloo, Maysam, and Mohsenin, Tinoosh
- Abstract
This paper presents a low-power stand-alone tongue drive system (sTDS) that individuals with severe disabilities can use to control their environment, such as a computer, smartphone, or wheelchair, using voluntary tongue movements. A low-power local processor is proposed, which can perform signal processing to convert raw magnetic sensor signals to user-defined commands on the sTDS wearable headset, rather than sending all raw data out to a PC or smartphone. The proposed sTDS significantly reduces the transmitter power consumption and subsequently increases the battery life. Assuming the sTDS user issues one command every 20 ms, the proposed local processor reduces the data volume that needs to be wirelessly transmitted by a factor of 64, from 9.6 to 0.15 kb/s. The proposed processor consists of three main blocks: a serial peripheral interface bus for receiving raw data from the magnetic sensors, external magnetic interference attenuation to remove the external magnetic field from the raw magnetic signal, and a machine learning classifier for command detection. A proof-of-concept prototype sTDS has been implemented with a low-power IGLOO-nano field programmable gate array (FPGA), Bluetooth Low Energy, a battery, and magnetic sensors on a headset, and tested. At a clock frequency of 20 MHz, the processor takes 6.6 µs and consumes 27 nJ to detect a command, with a detection accuracy of 96.9%. To further reduce power consumption, an application-specific integrated circuit processor for the sTDS is implemented at the postlayout level in 65-nm CMOS technology with a 1-V power supply; it consumes 0.43 mW, which is 10× lower than the FPGA power consumption, and occupies an area of only 0.016 mm². [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
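Note on the entry above: the figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below recomputes the data-rate reduction factor and derives two quantities the abstract does not state: the number of clock cycles per detection and a rough estimate of the FPGA's power while detecting a command. Both derived values rest on assumptions (constant clock, energy spread evenly over the detection time), and the function names are hypothetical.

```python
def reduction_factor(raw_kbps: float, reduced_kbps: float) -> float:
    """Ratio of the raw data rate to the rate after local processing."""
    return raw_kbps / reduced_kbps


def clock_cycles(freq_mhz: float, time_us: float) -> float:
    """Cycles elapsed: MHz times microseconds gives a plain cycle count."""
    return freq_mhz * time_us


def average_power_mw(energy_nj: float, time_us: float) -> float:
    """Average power: nanojoules per microsecond equals milliwatts."""
    return energy_nj / time_us


factor = reduction_factor(9.6, 0.15)     # about 64, matching the abstract
cycles = clock_cycles(20.0, 6.6)         # about 132 cycles per detection
fpga_mw = average_power_mw(27.0, 6.6)    # about 4.1 mW (estimate only)
asic_ratio = fpga_mw / 0.43              # roughly 10x, consistent with the abstract
```

The estimated FPGA power of about 4.1 mW is roughly ten times the 0.43 mW reported for the ASIC, which agrees with the "10× lower" claim to within rounding.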
241. PERFORMANCE ANALYSIS OF CLASSIFICATION METHODS FOR INDOOR LOCALIZATION IN VLC NETWORKS.
- Author
-
Sánchez-Rodríguez, D., Alonso-González, I., Sánchez-Medina, J., Ley-Bosch, C., and Díaz-Vilariño, L.
- Subjects
INDOOR positioning systems ,MOBILE geographic information systems - Abstract
Indoor localization has gained considerable attention over the past decade because of the emergence of numerous location-aware services. Research works have been proposed to solve this problem by using wireless networks. Nevertheless, there is still much room for improvement in the quality of the proposed classification models. In recent years, the emergence of Visible Light Communication (VLC) has brought a brand new approach to high-quality indoor positioning. Among its advantages, this new technology is immune to electromagnetic interference and has a smaller variance of received signal power compared to RF-based technologies. In this paper, a performance analysis of seventeen machine learning classifiers for indoor localization in VLC networks is carried out. The analysis is accomplished in terms of accuracy, average distance error, computational cost, training size, precision, and recall measurements. Results show that most of the classifiers achieve an accuracy above 90%. The best tested classifier yielded a 99.0% accuracy, with an average error distance of 0.3 centimetres. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
242. Using Case-Based Reasoning for Phishing Detection.
- Author
-
Abutair, Hassan Y.A. and Belghith, Abdelfettah
- Subjects
PHISHING ,CLASSIFICATION ,CASE-based reasoning ,ADAPTIVE control systems ,BIG data ,FEATURE extraction ,MACHINE learning - Abstract
Many classification techniques have been devised and used to combat phishing threats, but none of them is able to efficiently identify web phishing attacks due to the continuous change and the short life cycle of phishing websites. In this paper, we introduce a Case-Based Reasoning (CBR) Phishing Detection System (CBR-PDS). It mainly depends on CBR methodology as a core part. The proposed system is highly adaptive and dynamic, as it can easily adapt to detect new phishing attacks with a relatively small data set, in contrast to other classifiers that need to be heavily trained in advance. We test our system using different scenarios on a balanced set of 572 phishing and legitimate URLs. Experiments show that the CBR-PDS system accuracy exceeds 95.62%, and that it significantly enhances the classification accuracy with a small set of features and limited data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
243. Apnoea detection using ECG signal based on machine learning classifiers and its performances.
- Author
-
J RG and K D
- Subjects
- Humans, Support Vector Machine, Male, Female, Adult, Middle Aged, Algorithms, Electrocardiography methods, Machine Learning, Signal Processing, Computer-Assisted, Sleep Apnea Syndromes diagnosis
- Abstract
Sleep apnoea is a common disorder affecting sleep quality by obstructing the respiratory airway. This disorder can also be correlated to certain diseases like stroke, depression, neurocognitive disorder, non-communicable disease, etc. We implemented machine learning techniques for detecting sleep apnoea to make the diagnosis easier, feasible, convenient, and cost-effective. Electrocardiography (ECG) signals are the main input used here to detect sleep apnoea. The ECG signal undergoes pre-processing to remove noise and other artefacts. After pre-processing, the R-R intervals are found from the pre-processed signal, and time- and frequency-domain features are extracted. The power spectral density is calculated using the Welch method for extracting the frequency-domain features. The extracted features are fed to different machine learning classifiers, namely Support Vector Machine, Decision Tree, k-Nearest Neighbour (k-NN), and Random Forest, for detecting sleep apnoea, and their performances are analysed. The results show that the k-NN classifier obtains the highest accuracy, 92.85%, compared to the other classifiers based on 10 extracted features. They also show that the proposed combination of signal processing and machine learning techniques can be a reliable and promising method for detecting sleep apnoea with a reduced number of features.
- Published
- 2023
- Full Text
- View/download PDF
244. Prediction of diabetes disease using an ensemble of machine learning multi-classifier models.
- Author
-
Abnoosian K, Farnoosh R, and Behzadi MH
- Subjects
- Humans, Bayes Theorem, Computer Systems, Machine Learning, ROC Curve, Diabetes Mellitus diagnosis
- Abstract
Background and Objective: Diabetes is a life-threatening chronic disease with a growing global prevalence, necessitating early diagnosis and treatment to prevent severe complications. Machine learning has emerged as a promising approach for diabetes diagnosis, but challenges such as limited labeled data, frequent missing values, and dataset imbalance hinder the development of accurate prediction models. Therefore, a novel framework is required to address these challenges and improve performance. Methods: In this study, we propose an innovative pipeline-based multi-classification framework to predict diabetes in three classes: diabetic, non-diabetic, and prediabetes, using the imbalanced Iraqi Patient Dataset of Diabetes. Our framework incorporates various pre-processing techniques, including duplicate sample removal, attribute conversion, missing value imputation, data normalization and standardization, feature selection, and k-fold cross-validation. Furthermore, we implement multiple machine learning models, such as k-NN, SVM, DT, RF, AdaBoost, and GNB, and introduce a weighted ensemble approach based on the Area Under the Receiver Operating Characteristic Curve (AUC) to address dataset imbalance. Performance optimization is achieved through grid search and Bayesian optimization for hyper-parameter tuning. Results: Our proposed model outperforms other machine learning models, including k-NN, SVM, DT, RF, AdaBoost, and GNB, in predicting diabetes. The model achieves high average accuracy, precision, recall, F1-score, and AUC values of 0.9887, 0.9861, 0.9792, 0.9851, and 0.999, respectively. Conclusion: Our pipeline-based multi-classification framework demonstrates promising results in accurately predicting diabetes using an imbalanced dataset of Iraqi diabetic patients. The proposed framework addresses the challenges associated with limited labeled data, missing values, and dataset imbalance, leading to improved prediction performance. This study highlights the potential of machine learning techniques in diabetes diagnosis and management, and the proposed framework can serve as a valuable tool for accurate prediction and improved patient care. Further research can build upon our work to refine and optimize the framework and explore its applicability in diverse datasets and populations. (© 2023. BioMed Central Ltd., part of Springer Nature.)
- Published
- 2023
- Full Text
- View/download PDF
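Note on the entry above: the abstract mentions a weighted ensemble based on AUC but does not give the weighting formula. One plausible reading is soft voting in which each model's class probabilities are weighted in proportion to its validation AUC. The sketch below illustrates that idea only. The normalization scheme, function names, AUC values, and probabilities are all assumptions, not the authors' method or data.

```python
from typing import Dict, List


def auc_weights(aucs: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-model AUC scores so that the weights sum to 1 (assumed scheme)."""
    total = sum(aucs.values())
    return {name: auc / total for name, auc in aucs.items()}


def weighted_ensemble(probas: Dict[str, List[float]],
                      weights: Dict[str, float]) -> List[float]:
    """Combine per-model class probabilities into one weighted average per class."""
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, class_probs in probas.items():
        for k in range(n_classes):
            combined[k] += weights[name] * class_probs[k]
    return combined


def predict(combined: List[float], labels: List[str]) -> str:
    """Return the label whose combined probability is highest."""
    best = max(range(len(combined)), key=lambda k: combined[k])
    return labels[best]


# Hypothetical validation AUCs and class probabilities for one patient.
labels = ["non-diabetic", "prediabetes", "diabetic"]
weights = auc_weights({"knn": 0.9, "svm": 0.6})
combined = weighted_ensemble(
    {"knn": [0.2, 0.7, 0.1], "svm": [0.6, 0.3, 0.1]}, weights)
predicted = predict(combined, labels)
```

In this toy case the higher-AUC model outvotes the other, so the ensemble follows the k-NN prediction even though the SVM prefers a different class.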
245. Investigating the effect of road condition and vacation on crash severity using machine learning algorithms.
- Author
-
Almannaa M, Zawad MN, Moshawah M, and Alabduljabbar H
- Subjects
- Saudi Arabia, Accidents, Traffic, Machine Learning
- Abstract
Investigating the contributing factors to traffic crash severity is a demanding topic in research focusing on traffic safety and policies. This research investigates the impact of 16 roadway condition features and vacations (along with the spatial and temporal factors and road geometry) on crash severity for major intra-city roads in Saudi Arabia. We used a crash dataset that covers four years (Oct. 2016 - Feb. 2021) with more than 59,000 crashes. Machine learning algorithms were utilized to predict the crash severity outcome (non-fatal/fatal) for three types of roads: single, multilane, and freeway. Furthermore, features that have a strong impact on crash severity were examined. Results show that only 4 out of 16 road condition variables were found to contribute to crash severity, namely: paints, cat eyes, fence side, and metal cable. Additionally, vacation was found to be a contributing factor to crash severity, meaning that crashes occurring during vacations are more severe than those occurring on non-vacation days.
- Published
- 2023
- Full Text
- View/download PDF
246. Classifying Daily and Sports Activities Invariantly to the Positioning of Wearable Motion Sensor Units
- Author
-
Aras Yurtman and Billur Barshan
- Subjects
Computer Networks and Communications ,Computer science ,Feature extraction ,Wearable computer ,Inertial sensors ,02 engineering and technology ,Activity recognition ,Activity recognition and monitoring ,Position (vector) ,0202 electrical engineering, electronic engineering, information engineering ,Computer vision ,Flexibility (engineering) ,Wearable motion sensors ,Orientation (computer vision) ,business.industry ,020206 networking & telecommunications ,Motion detection ,Gyroscope ,Rigid body ,Magnetometer ,Wearable sensing ,Internet of Things (IoT) ,Machine learning classifiers ,Computer Science Applications ,Accelerometer ,Hardware and Architecture ,Signal Processing ,Position-invariant sensing ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,Information Systems - Abstract
We propose techniques that achieve invariance to the positioning of wearable motion sensor units on the body for the recognition of daily and sports activities. Using two sequence sets based on the sensory data allows each unit to be placed at any position on a given rigid body part. As the unit is shifted from its ideal position with larger displacements, the activity recognition accuracy of the system that uses these sequence sets degrades slowly, whereas that of the reference system (which is not designed to achieve position invariance) drops very fast. Thus, we observe a tradeoff between the flexibility in sensor unit positioning and the classification accuracy. The reduction in the accuracy is at acceptable levels, considering the convenience and flexibility provided to the user in the placement of the units. We compare the proposed approach with an existing technique to achieve position invariance and combine the former with our earlier methodology to achieve orientation invariance. We evaluate our proposed methodology on a publicly available data set of daily and sports activities acquired by wearable motion sensor units. The proposed representations can be integrated into the preprocessing stage of existing wearable systems without significant effort.
- Published
- 2020
- Full Text
- View/download PDF
247. An artificial intelligence solution for crop recommendation
- Author
-
Varshitha D. N. and Savita Choudhary
- Subjects
Control and Optimization ,Computer Networks and Communications ,Hardware and Architecture ,NPK ,Signal Processing ,Deep learning ,Electrical and Electronic Engineering ,Deep neural network ,Prediction ,Information Systems ,Machine learning classifiers - Abstract
Agriculture is the major occupation in India. The development of India is in the hands of farmers. Farmers are said to be our nation’s backbone, so there is a need to support them technologically so that the difficulties of traditional agricultural practices can be overcome, with a positive impact on the yield, harvest, healthy crop output, and income of the farmers. Farmers need awareness of their soil and of the methods to improve it in order to grow healthy crops. We propose an approach that involves deep learning and some IoT features to help our farmers. Soil parameters such as nitrogen, phosphorus, potassium (NPK), pH, organic carbon, moisture content, and a few others are considered for predicting the fertility of the soil, the right crops to be grown, and the nutrition they require. We have developed a deep neural network model to predict the crop that can be suitably grown in the soil. We have also implemented other machine learning classifiers on the same collected dataset to compare the accuracy of each classifier with that of our deep neural network model.
- Published
- 2022
248. First Steps towards Data-Driven Adversarial Deduplication
- Author
-
Jose N. Paredes, Gerardo I. Simari, Maria Vanina Martinez, and Marcelo A. Falappa
- Subjects
adversarial deduplication ,machine learning classifiers ,cyber threat intelligence ,Information technology ,T58.5-58.64 - Abstract
In traditional databases, the entity resolution problem (which is also known as deduplication) refers to the task of mapping multiple manifestations of virtual objects to their corresponding real-world entities. When addressing this problem, in both theory and practice, it is widely assumed that such sets of virtual objects appear as the result of clerical errors, transliterations, missing or updated attributes, abbreviations, and so forth. In this paper, we address this problem under the assumption that this situation is caused by malicious actors operating in domains in which they do not wish to be identified, such as hacker forums and markets in which the participants are motivated to remain semi-anonymous (though they wish to keep their true identities secret, they find it useful for customers to identify their products and services). We are therefore in the presence of a different, and even more challenging, problem that we refer to as adversarial deduplication. In this paper, we study this problem via examples that arise from real-world data on malicious hacker forums and markets arising from collaborations with a cyber threat intelligence company focusing on understanding this kind of behavior. We argue that it is very difficult—if not impossible—to find ground truth data on which to build solutions to this problem, and develop a set of preliminary experiments based on training machine learning classifiers that leverage text analysis to detect potential cases of duplicate entities. Our results are encouraging as a first step towards building tools that human analysts can use to enhance their capabilities towards fighting cyber threats.
- Published
- 2018
- Full Text
- View/download PDF
249. Support vector machines applied to torsional vibration severity in drill strings
- Author
-
Caballero, E. F., Lobo, D. M., Di Vaio, M. V., Silva, E. C. C. M., and Ritto, T. G.
- Published
- 2021
- Full Text
- View/download PDF
250. Hurricane damage assessment in satellite images using hybrid VGG16 model.
- Author
-
Kaur, Swapandeep, Gupta, Sheifali, Singh, Swati, Koundal, Deepika, Hoang, Vinh Truong, Alkhayyat, Ahmed, and Vu-Van, Hung
- Subjects
REMOTE-sensing images ,HURRICANE damage ,LOGISTIC regression analysis ,FEATURE extraction ,IMAGE recognition (Computer vision) ,K-nearest neighbor classification - Abstract
Hurricanes are among the most disastrous natural phenomena on Earth, causing loss of human lives and immense damage to property. A damage assessment method is proposed for the damage caused to buildings by Hurricane Harvey, which hit the Texas region in 2017. The aim of our study is to predict whether there is any damage to the buildings present in post-disaster satellite images. Principal component analysis has been used for the visualization of data. The VGG16 model has been used for extracting features from the input images. K-nearest neighbor (KNN), logistic regression, decision tree, random forest, and XGBoost classification techniques have been used to classify the images from the features extracted by VGG16. The best accuracy of 97% is obtained by the KNN classifier on the balanced test set, and an accuracy of 96% is obtained by logistic regression on the unbalanced test set. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF