14,174 results on '"Statistics"'
Search Results
2. Linear Regression and Holt’s Winter Algorithm in Forecasting Daily Coronavirus Disease 2019 Cases in Malaysia: Preliminary Study
- Author
-
Hudzaifah Hasri, Robiah Ahmad, and Siti Armiza Mohd Aris
- Subjects
Coronavirus disease 2019 (COVID-19) ,Linear regression ,Statistics ,Mathematics - Published
- 2021
3. An Improved Deep Forest Regression
- Author
-
Heng Xia and Jian Tang
- Subjects
Statistics ,Environmental science ,Regression - Published
- 2021
4. Palmprint ROI Cropping Based on Enhanced Correlation Coefficient Maximisation Algorithm
- Author
-
Raja Abdullah Raja Ahmad, Muhammad Imran Ahmad, Mohd Nazrin Md Isa, Mustafa Zuhaer Nayef Al-Dabagh, Noor Aldeen A. Khalid, and Thulfiqar H. Mandeel
- Subjects
Correlation coefficient ,Statistics ,Cropping ,Mathematics - Published
- 2021
5. Intelligent Computational Model for Early Heart Disease Prediction using Logistic Regression and Stochastic Gradient Descent (A Preliminary Study)
- Author
-
Eka Miranda, Mediana Aryuni, Charles Bernando, and Faair M Bhatti
- Subjects
Stochastic gradient descent ,Heart disease ,Statistics ,medicine ,Logistic regression ,Mathematics - Published
- 2021
6. Spread of COVID-19 Deaths in Jakarta: Cluster and Regression Analysis
- Author
-
Intan Saskia, Ro’fah Nur Rachmawati, and Derwin Suhartono
- Subjects
Geography ,Coronavirus disease 2019 (COVID-19) ,Statistics ,Regression analysis ,Disease cluster - Published
- 2021
7. Research on Evaluation of Green Development Level of Urban Agglomeration Based on Interval Optimization
- Author
-
Sun Zenjie, Zhang Xiaojie, Yang Di, Lv Yuntong, and Zhu Jinhui
- Subjects
Urban agglomeration ,Computer science ,Statistics ,Interval (graph theory) ,Green development - Published
- 2021
8. Probabilistic Imputation for High Resolution Univariate Electric Load Data with Large Gaps
- Author
-
Michael W. Ross and Patrick Giles
- Subjects
Electrical load ,Computer science ,Statistics ,Probabilistic logic ,Univariate ,High resolution ,Imputation (statistics) - Published
- 2021
9. Classification of Citrus Tree Leaf Diseases Using Convolutional Neural Networks (Narenciye Ağaç Yaprak Hastalıklarının Evrişimli Sinir Ağları ile Sınıflandırılması)
- Author
-
Ercan Kilic, Irem Nur Ecemis, and Hamza Osman Ilhan
- Subjects
Statistics ,Early detection ,Convolutional neural network ,Mathematics - Abstract
Disease factors in citrus trees greatly affect the productivity and quality of the products; consequently, diseases in citrus trees harm farmers financially. Early detection of diseases allows necessary precautions to be taken and productivity to be increased, so that the financial losses of farmers are minimized. In this study, a convolutional neural network (CNN) model that classifies leaf images has been proposed in order to detect citrus diseases on leaf images. In addition, the classification performances of the VGG16 and AlexNet architectures, used in a transfer learning setting for citrus leaf diseases, were measured and compared with the custom model. The performances of the models were obtained on a leaf dataset containing images of four different disease classes and of healthy citrus leaves. The proposed CNN model and the transfer learning architectures VGG16 and AlexNet classified the dataset with accuracy rates of 82.64%, 93.39% and 92.56%, respectively.
- Published
- 2021
10. Technical Analysis of the Displacements of the Centre of Pressure in the Standing Posture Based on Data Obtained Using Selected Stock Market Indicators
- Author
-
Marta Chmura, Piotr Wodarski, Jacek Jurkojć, Grzegorz Gruszka, and Marek Gzik
- Subjects
Moving average ,Stock exchange ,Technical analysis ,Statistics ,Range (statistics) ,Stock market ,Divergence (statistics) ,Balance (ability) ,Mathematics ,MACD - Abstract
BACKGROUND: Commonly used analyses of the ability to maintain balance do not take into account momentary changes appearing during the entire analysis. Looking at the nature of the COP and COM curves, such trends could be investigated by applying methods used for the technical analysis of stock exchange rates. OBJECTIVE: The objective of the study was to determine whether the Moving Average Convergence/Divergence (MACD) indicator could be used in analyses assessing the ability to maintain balance. METHODS: The study group consisted of 85 healthy individuals in the real environment tests and 12 healthy individuals in the virtual environment tests (with an oscillating scenery). The calculations performed enabled the identification of time intervals between successive trend changes in the COP displacements. RESULTS: Test results revealed that, when standing, the most frequently appearing time intervals between successive trend changes fell within the range of 0.1 s to 0.5 s. CONCLUSIONS: This trend was observed both in the measurements conducted in the real environment and in those performed in the oscillating virtual environment, which could indicate that the changes are characteristic and indispensable for the proper maintenance of balance by humans.
- Published
- 2021
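The MACD computation the abstract above borrows from stock-market analysis can be sketched in a few lines. This is a plausible reading of how trend-change instants in a COP signal would be detected, not the authors' exact procedure; the 12/26/9 EMA periods are the conventional technical-analysis defaults, assumed here.

```python
def ema(xs, period):
    """Exponential moving average with smoothing factor 2 / (period + 1)."""
    k = 2.0 / (period + 1)
    out = [xs[0]]
    for x in xs[1:]:
        out.append(k * x + (1 - k) * out[-1])
    return out

def macd_trend_changes(signal, fast=12, slow=26, smooth=9):
    """Indices where the MACD line crosses its signal line, i.e. candidate
    trend-change instants in the displacement signal."""
    macd = [f - s for f, s in zip(ema(signal, fast), ema(signal, slow))]
    sig = ema(macd, smooth)
    hist = [m - s for m, s in zip(macd, sig)]
    return [i for i in range(1, len(hist)) if hist[i - 1] * hist[i] < 0]
```

Dividing the gaps between successive returned indices by the sampling rate gives time intervals of the kind analysed in the study.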
11. Classification of Osmancik and Cammeo Rice Varieties using Deep Neural Networks
- Author
-
Ahmet İlhan, Erkut Inan Iseri, Kaan Uyar, and Umit Ilhan
- Subjects
Data set ,Normalization (statistics) ,Artificial neural network ,Statistics ,Deep neural networks ,Mathematics - Abstract
Rice is one of the most widely consumed grains in the world. Countries in southern Asia mostly produce and also consume this particular type of grain, and about 800 million tons of rice in many varieties are produced in the world every year. Each variety has its unique characteristics. This study covers research on the classification of the Osmancik and Cammeo rice varieties using Deep Neural Networks (DNNs). The University of California Irvine (UCI) Rice (Osmancik and Cammeo) Data Set used in this work contains 3810 numerical samples, of which 2180 belong to Osmancik and 1630 to Cammeo. The data are subjected to a normalization process, which improves the performance of multilayer neural networks. The performance of this study is measured through calculating accuracy, sensitivity, specificity, precision, F1-score, NPV, FPR, FDR, and FNR. The overall success rate of this study is found to be 93.04%.
- Published
- 2021
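All nine performance figures listed in the abstract above derive from the four confusion-matrix counts. A minimal sketch for the binary case (labels and the choice of positive class are illustrative, not taken from the paper):

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Derive the reported metrics from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    sens = tp / (tp + fn)   # sensitivity (recall)
    prec = tp / (tp + fp)   # precision
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sens,
        "specificity": tn / (tn + fp),
        "precision": prec,
        "F1": 2 * prec * sens / (prec + sens),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FDR": fp / (fp + tp),
        "FNR": fn / (fn + tp),
    }
```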
12. Lower Bounds on the Expected Excess Risk Using Mutual Information
- Author
-
M. Bora Dogan and Michael Gastpar
- Subjects
Statistics ,Absolute risk reduction ,Mutual information ,Mathematics - Published
- 2021
13. Effect of Sampling Method on the Regression Accuracy for a High-Speed Link Problem
- Author
-
Hanzhi Ma, Andreas C. Cangellaris, Xu Chen, and Xing-Jian Shangguan
- Subjects
Statistics ,Link (knot theory) ,Regression ,Mathematics - Published
- 2021
14. Research on Bearing Life Trend Prediction Method Based on Principal Component Analysis and Grey Model
- Author
-
MA Hailong and Li Zhen
- Subjects
Bearing (mechanical) ,Trend prediction ,Statistics ,Principal component analysis ,Mathematics - Published
- 2021
15. Prediction of Cement Specific Surface Area Based on XGBoost
- Author
-
Chunxing Xiong and Zhugang Yuan
- Subjects
Mean absolute percentage error ,Correlation coefficient ,Mean squared error ,Multicollinearity ,Dimensionality reduction ,Statistics ,Feature selection ,Mutual information ,Pearson product-moment correlation coefficient ,Mathematics - Abstract
Aiming at the problem of detecting the specific surface area (SSA) of ground cement, a cement SSA model based on XGBoost is proposed. First, based on actual production data of a cement plant, missing values and outliers in the original data are detected and processed. Secondly, using the Pearson Correlation Coefficient (PCC) and Mutual Information (MI) methods for feature selection, the multicollinearity between auxiliary variables is detected, and auxiliary variables with a high correlation to the cement SSA are screened out to achieve dimensionality reduction of the input variables. Finally, the prediction model of cement SSA based on XGBoost is established. The results show that, compared with GBM and RF, XGBoost obtains a higher squared correlation coefficient (R2) and a smaller root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
- Published
- 2021
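The PCC-based screening step described above can be sketched as follows. The 0.5 cut-off and the variable names are illustrative assumptions; the paper does not state its threshold.

```python
def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def screen_features(features, target, threshold=0.5):
    """Keep auxiliary variables whose |PCC| with the target passes the cut-off."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= threshold]
```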
16. Retrieval Method of Poor Students Subsidy Resource Database Based on Improved Grey Correlation Degree
- Author
-
Xiaoling Zhang, Xu Pang, and Luyao Wei
- Subjects
Resource (project management) ,Computer science ,Statistics ,Subsidy ,Grey correlation ,Degree (temperature) - Published
- 2021
17. Prediction of The Export Value in Indonesia for The Year 2021 as The Impact of Covid-19 Pandemic Using ARIMA Algorithm
- Author
-
Ririn Ikana Desanti, Aathis Kavana Royan, and Yanti
- Subjects
Coronavirus disease 2019 (COVID-19) ,Statistics ,Value (economics) ,Pandemic ,Autoregressive integrated moving average ,Mathematics - Published
- 2021
18. Research on Auditory Tension Classification Algorithm Based on Interval Statistics
- Author
-
Ning Ding
- Subjects
Tension (physics) ,Computer science ,Statistics ,Feature extraction ,Interval (graph theory) ,Complex network ,Algorithm - Abstract
This paper proposes a classification model based on interval statistics and a classification algorithm of auditory tension for complex networks. Firstly, a model of the interval statistical method for feature extraction from music and speech is introduced. Then, the relationship between interval consonance and auditory tension, and the method of calculating it, are analysed, and a classification model of auditory tension based on the interval statistical method is proposed. Finally, this classification model is used to classify and simulate unseen music and speech.
- Published
- 2021
19. MOD3D-PAT - A Novel Modified 3rd Degree Polynomial Approximation for Modelling Traffic Congestion in Urban Areas
- Author
-
Robin Kuok Cheong Chan, Rajendran Parthiban, and Joanne Mun Yee Lim
- Subjects
Set (abstract data type) ,Traffic congestion ,Mean squared error ,Robustness (computer science) ,Computer science ,Statistics ,Process (computing) ,Suburban area ,Degree of a polynomial ,Traffic flow - Abstract
Traffic congestion continues to be a concern up to this day despite the efforts of many studies to overcome it. Even though there are various state-of-the-art methods such as machine learning to model traffic congestion, many factors are computationally expensive to take into account in this modelling process. This paper aims to provide the following: (i) study general traffic behaviours in an urban or suburban location based on online traffic information (Traffic API), (ii) propose a novel generalised mathematical model used to model traffic congestion for a given set of parameters, and (iii) evaluate the robustness of the proposed mathematical model with traffic data from certain cities in Malaysia (Bukit Bintang, Bandar Sunway, and Damansara Utama). The model was tested on traffic patterns of different days. The goodness-of-fit (R-squared value) and root-mean-square error (RMSE) of the models for the various tests were obtained. Overall, the model shows a good approximation for all three areas, with the main location, Bukit Bintang, having an R-squared of 0.93 and an RMSE of 0.035 as the worst result and an R-squared of 0.97 and an RMSE of 0.024 as the best. Sharp and varied peaks in traffic during the evening, such as those in Bandar Sunway, can affect the results, but the proposed model is still able to provide a reliable model as per the characteristics of a 3rd degree polynomial, where the evening traffic is averaged out. The proposed model also displays a similar level of accuracy for the suburban area of Damansara Utama.
- Published
- 2021
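A plain 3rd-degree least-squares fit, together with the R-squared/RMSE evaluation quoted above, can be sketched in pure Python via the normal equations. This shows only the baseline cubic fit; the MOD3D-PAT modifications themselves are not reproduced here.

```python
def polyfit3(x, y):
    """Least-squares cubic fit: solve the 4x4 normal equations of the
    Vandermonde system with Gaussian elimination (partial pivoting)."""
    A = [[sum(xi ** (i + j) for xi in x) for j in range(4)] for i in range(4)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(4)]
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 4):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * 4
    for r in range(3, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 4))) / A[r][r]
    return coef  # c0 + c1*x + c2*x^2 + c3*x^3

def r2_rmse(x, y, coef):
    """Goodness-of-fit measures used to evaluate the model."""
    pred = [sum(c * xi ** i for i, c in enumerate(coef)) for xi in x]
    my = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, (ss_res / len(y)) ** 0.5
```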
20. Tracking Rt of COVID-19 Vaccine Effectiveness Using Kalman Filter and SIRD Model
- Author
-
Jinan Charafeddine, Mahmoud Kaddour, and Nazih Moubayed
- Subjects
Coronavirus disease 2019 (COVID-19) ,Epidemic outbreak ,Pandemic ,Statistics ,Kalman filter - Abstract
In this paper, a SIRD model is adapted to study the vaccine's impact on the spread of coronavirus (COVID-19) in Lebanon. To describe the epidemic's development across the country, a Kalman filter is integrated with the SIRD model in order to estimate the time-varying reproduction number Rt, the most important indicator for predicting the severity of an epidemic outbreak. Rt denotes the number of healthy persons to whom an infected person can spread the disease. The results show a reduction in the spread of the pandemic after deployment of the vaccine. All data and the relevant codebase are available at https://www.moph.gov.lb
- Published
- 2021
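A minimal discrete SIRD step and the corresponding effective reproduction number can be sketched as follows. The parameter names and values are illustrative, and the paper's Kalman-filter estimation of the time-varying Rt is not reproduced; only the compartment bookkeeping is shown.

```python
def sird_step(s, i, r, d, beta, gamma, mu, n):
    """One day of the discrete SIRD model: beta = infection rate,
    gamma = recovery rate, mu = mortality rate, n = population size."""
    new_inf = beta * s * i / n
    new_rec = gamma * i
    new_dead = mu * i
    return (s - new_inf,
            i + new_inf - new_rec - new_dead,
            r + new_rec,
            d + new_dead)

def effective_rt(beta, gamma, mu, s, n):
    """Effective reproduction number: secondary cases per infection,
    scaled by the remaining susceptible fraction."""
    return beta / (gamma + mu) * s / n
```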
21. Missing Values Imputation in Food Consumption: An Analytical Study
- Author
-
Geetanjali Rathee, Hemraj Saini, and Ashok Kumar Tripathi
- Subjects
Iterative method ,Computer science ,Convergence (routing) ,Statistics ,Food consumption ,Production (economics) ,Imputation (statistics) ,Overall performance ,Missing data - Abstract
Missing values are an unavoidable problem in many real-world applications, and how to impute them has become a challenging problem in food consumption and production. Although some well-known imputation techniques have been proposed, these techniques perform poorly in estimating food consumption with missing values. This paper introduces an iterative imputation approach, a KNN imputation method, and a median imputation method. These techniques are instance-based imputation methods that take advantage of the correlation of attributes. The plausible values for the missing entries are estimated from the nearest-neighbor instances. In addition, the iterative imputation allows all available values, including the attribute values in the instances with missing data and the imputed values from the previous iteration, to be used for estimating the missing values. Specifically, the imputation approach can fill in all of the missing values with reliable records irrespective of the missing rate of the food consumption dataset. We test our proposed approach on several food consumption datasets at different missing rates in comparison with some existing imputation techniques. The experimental results suggest that the proposed approach achieves better overall performance than other techniques in terms of imputation accuracy and convergence speed.
- Published
- 2021
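The KNN imputation idea above, estimating a missing entry from the nearest complete instances, might look like the following sketch. The value of k and the use of Euclidean distance over shared attributes are assumptions, not details taken from the paper.

```python
def knn_impute(rows, k=3):
    """Fill missing entries (None) with the mean of the k nearest complete
    rows; distance is computed over the attributes both rows share."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        return (sum((x - y) ** 2 for x, y in shared) / len(shared)) ** 0.5

    complete = [r for r in rows if None not in r]
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        neighbours = sorted(complete, key=lambda c: dist(r, c))[:k]
        filled.append([x if x is not None
                       else sum(c[j] for c in neighbours) / len(neighbours)
                       for j, x in enumerate(r)])
    return filled
```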
22. Detecting Aberrant Values and Their Influence on the Time Series Forecast
- Author
-
Florentina-Loredana Dragomir, Cristian Stefan Dumitriu, and Alina Barbulescu
- Subjects
Series (mathematics) ,Statistics ,Outlier ,Data series ,Precipitation ,Mathematics - Abstract
This article addresses the influence of outliers on building time series models. Two methods for detecting aberrant values are discussed, and models are built for the studied data series in the outliers' presence and absence. The data used consisted of the annual precipitation series recorded at Sulina (Romania). The models have been further employed for generating precipitation fields. This process shows a good concordance between the historical data and the forecast in terms of mean, minimum, and maximum values, and of the minimum recorded precipitation in two, five, seven, and ten successive years. The results show that the field generated after the outliers' removal is better in terms of ten statistical indicators.
- Published
- 2021
23. Comparative Study of Forecasting Models for COVID-19 Outbreak in Turkey
- Author
-
Mert Nakip, Cuneyt Guzelis, and Onur Copur
- Subjects
Coronavirus disease 2019 (COVID-19) ,Computer science ,Generalization ,Population ,Statistics ,Linear regression ,Feature selection ,Simple linear regression ,Time series ,Perceptron - Abstract
This paper gives an explanation for the failure of machine learning models in predicting the cases and the other future trends of the COVID-19 pandemic. The paper shows that simple Linear Regression models provide high prediction accuracy reliably, but only for a 2-week period, and that relatively complex machine learning models, which have the potential of learning long-term predictions with low errors, cannot obtain good predictions with high generalization ability. It is suggested in the paper that the lack of a sufficient number of samples is the source of the low prediction performance of the forecasting models. To exploit the information most relevant to the active cases, we perform feature selection over a variety of variables such as the numbers of active cases, deaths, recoveries, and the population. Furthermore, we compare Linear Regression, Multi-Layer Perceptron, and Long Short-Term Memory models, each of which is used for prediction of active cases, together with various feature selection methods. Our results show that accurate forecasting of the active cases with high generalization ability is possible only up to 3 days because of the small sample size of COVID-19 data. We observe that the Linear Regression model has much better prediction performance with high generalization ability compared to the complex models but, as expected, its performance decays sharply for prediction horizons of more than 14 days.
- Published
- 2021
24. Micro-Spatial Projection of Energy Demand Based on Dominant Factors Identification: an Exploratory Factor Analysis
- Author
-
Hasna Satya Dini, Adri Senen, and Dwi Anggaini
- Subjects
Identification (information) ,Transformation matrix ,Variables ,Mathematical model ,Computer science ,Statistics ,Projection method ,Macro ,Projection (set theory) ,Exploratory factor analysis - Abstract
The common energy projection method is a macro-based model. As a consequence, it is unable to show load centers in microgrids and fails to locate distribution stations. Thus, a macro forecasting model cannot be applied in creating a master plan for electricity distribution; for this reason, micro-spatial energy projection needs to be implemented. Micro-spatial energy projection methods fall into two categories: trend-based and multivariate simulation analysis. The more variables involved in the energy projection, the more accurate the result. The projection is correlated to the interaction among variables in the form of factors, as each service area has different dominant factors. Exploratory Factor Analysis was applied in this research to identify the dominant factors among the observed variables. Of the 12 independent variables used in this research, 8 were grouped into 3 principal factors based on the component transformation matrix. The three factors are used in a mathematical model of the projection of energy demand, so the result of the projection can be more accurate.
- Published
- 2021
25. Mobility-aware COVID-19 Case Prediction using Cellular Network Logs
- Author
-
Arti Ramesh, Necati Ayan, Sushil Chaskar, Antonio A. de A. Rocha, and Anand Seetharam
- Subjects
Mobility model ,Markov chain ,Mean squared error ,Computer science ,Markov process ,Linear regression ,Statistics ,Cellular network ,Leverage (statistics) ,Predictive modelling ,Computer network - Abstract
In this paper, our goal is to model the aggregate mobility of individuals in a city by analyzing cellular network connections, and then leverage the designed mobility model to model and predict the number of COVID-19 infections in the future. We analyze cellular network connections from 973 antennas for all users in the city of Rio de Janeiro from April 5, 2020 to July 2, 2020. We design a Markovian model that captures the mobility across municipalities. We then combine the transition probabilities of the Markov chain with the number of COVID-19 cases in a municipality during a particular week in the design of our mobility-aware COVID-19 case prediction models to predict the number of cases for the following week. Our experiments demonstrate that our mobility-aware models significantly outperform a baseline mobility-agnostic linear regression model in terms of metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
- Published
- 2021
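The core of the mobility-aware prediction above, propagating this week's case counts through the Markov transition matrix, might look like the following. This is a deliberate simplification of the authors' models: the expected case mass arriving in municipality j is the sum over i of cases[i] * P[i][j].

```python
def predict_next_week(cases, transition):
    """Propagate per-municipality case counts one step through a row-stochastic
    mobility transition matrix (row i = where residents of i travel)."""
    n = len(cases)
    return [sum(cases[i] * transition[i][j] for i in range(n)) for j in range(n)]
```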
26. Double-Weighted Low-Rank Matrix Recovery Based on Rank Estimation
- Author
-
Zhengqin Xu, Shun Fang, Shiqian Wu, Huasong Xing, and Shoulie Xie
- Subjects
Estimation ,Statistics ,Rank (graph theory) ,Low-rank approximation ,Artificial intelligence ,Mathematics - Published
- 2021
27. Understanding the Effects of Visualizing Missing Values on Visual Data Exploration
- Author
-
Hayeong Song, Yu Fu, John Stasko, and Bahador Saket
- Subjects
Workflow ,Empirical research ,Computer science ,Statistics ,Portfolio ,Missing data ,Cognitive load ,Stock (geology) ,Task (project management) ,Visualization - Abstract
When performing data analysis, people often confront data sets containing missing values. We conducted an empirical study to understand the effects of visualizing those missing values on participants' decision-making processes while performing a visual data exploration task. More specifically, our study participants purchased a hypothetical portfolio of stocks based on a dataset where some stocks had missing values for attributes such as PE ratio, beta, and EPS. The experiment used scatterplots to communicate the stock data. For one group of participants, stocks with missing values simply were not shown, while the second group saw such stocks depicted with estimated values as points with error bars. We measured participants' cognitive load involved in decision-making with data with missing values. Our results indicate that their decision-making workflow was different across two conditions.
- Published
- 2021
28. Adaptive NN-based Root Cause Analysis in Volume Diagnosis for Yield Improvement
- Author
-
Min Qin, Cheng Chen, Ruosheng Xu, Xin Huang, Shangling Jui, Pengyun Li, Zhihao Ding, and Yu Huang
- Subjects
Yield (engineering) ,Volume (thermodynamics) ,Statistics ,Root cause analysis ,Mathematics - Published
- 2021
29. Peak Load Forecasting Using Long-Short Term Memory : Case Study of Jawa-Madura-Bali System
- Author
-
K. G. H. Mangunkusumo, Anindita Satria Surya, Musa Partahi Marbun, and Muhammad Ridwan
- Subjects
Mean absolute percentage error ,Recurrent neural network ,Mean squared error ,Artificial neural network ,Population ,Statistics ,Range (statistics) ,Demand forecasting ,Realization (probability) ,Mathematics - Abstract
The demand forecasting process at PLN relies on many projection assumptions originating from outside PLN, such as assumptions on economic growth, population growth, inflation, electrification ratio targets, and new and renewable energy development targets. This paper provides an alternative method of calculating annual peak load forecasts using the Long Short-Term Memory (LSTM) approach, a Deep Neural Network method in Artificial Intelligence (AI). This method aims to improve the accuracy of load forecasts against the realized loads by learning patterns that occurred in the past. The calculation of the load forecast shows that the maximum Root Mean Square Error (RMSE) of the peak load forecast with the Recurrent Neural Network (RNN)-LSTM is 2,167. The Mean Absolute Percentage Error (MAPE) value of the RNN-LSTM obtained a maximum of 8.6% or fell within the range
- Published
- 2021
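The two error measures quoted above are standard and computed as follows (the data in the usage check are hypothetical, not the paper's):

```python
def rmse(actual, forecast):
    """Root mean square error."""
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)) ** 0.5

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```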
30. Sampling Frame of Square Segment by Points for Maize Observation
- Author
-
Budi Heru Santosa, Widyo Pura Buana, Bangun Muljo Sukojo, Heri Sadmono, Swasetyo Yulianto, Lena Sumargana, Bambang Winarno, and Fauziah Alhasanah
- Subjects
Statistics ,Square (unit) ,Sampling frame ,Mathematics - Published
- 2021
31. Design of Weight and Height Measurement System Based Wireless Communication
- Author
-
Moch. Zen Samsono Hadi, Haniah Mahmudah, and Sifaul Warohmatulilla
- Subjects
Measurement scales ,Computer science ,System of measurement ,Statistics ,Wireless ,Measurement uncertainty ,Nutritional status ,Body weight ,Field (computer science) - Abstract
Amid rapidly developing technology, many innovations need to be made in various fields, especially the health field. Existing weight and height measurement scales still require the help of others to use. This research aims to obtain a person's body weight and height, after which others can monitor the measurement results on a website. The two values are also processed to obtain a BMI value that can classify someone's nutritional status. The data are sent to a MySQL database using wireless communication and displayed on the website and an LCD screen. The results show an average percentage error of 0.52% for weight detection and 0.5% for height. The percentage of successful displays on the LCD is 100%, while on the website it is 99.99%. Therefore, the system works properly.
- Published
- 2021
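The BMI classification step in the system above might look like the following sketch. The WHO adult cut-offs are assumed, since the abstract does not state the thresholds the system uses.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    h = height_cm / 100.0
    return weight_kg / (h * h)

def bmi_category(value):
    """Nutritional status from BMI, using the WHO adult cut-offs (assumed)."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal"
    if value < 30.0:
        return "overweight"
    return "obese"
```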
32. Spatial Mapping of Tuberculosis Vulnerability in Tuban District, Indonesia, Using Hierarchical Clustering
- Author
-
Arna Fariza, Arif Basofi, and Amalia Kusumaningtyas
- Subjects
Linkage (software) ,Tuberculosis ,Vulnerability assessment ,Public health ,Statistics ,Vulnerability ,medicine ,Cluster analysis ,Complete linkage ,Hierarchical clustering - Abstract
Tuberculosis (TB) has spread over the world and continues to be a serious public health concern. The high rate of pulmonary TB case finding in Tuban district requires an analysis to find out what actions and policies are needed to reduce the number of cases of pulmonary TB. Spatial analysis is an instrument for identifying areas at risk of disease, so it is an important instrument for managing and planning health policies. This paper proposes a new approach to assess the level of susceptibility to tuberculosis in Tuban district using hierarchical clustering. The multi-criteria for vulnerability assessment consist of population density, number of patients, total number of TB cases, CNR of all TB cases, estimates of patients found and treated, and mortality during treatment. The TB susceptibility level resulting from the clustering of 20 sub-districts was mapped spatio-temporally for 2017-2019. Based on the evaluation of the variance of the 2017-2019 tuberculosis vulnerability level mapping, single linkage yields an average variance of 0.23660, smaller than the 0.29861 of average linkage and the 0.30332 of complete linkage. This shows that single-linkage grouping is better than average and complete linkage.
- Published
- 2021
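Single-linkage agglomerative clustering, the variant that performed best above, can be sketched as follows. This is the naive O(n^3) version with Euclidean distance assumed; the paper's multi-criteria feature vectors are not reproduced.

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest member pair is nearest (single linkage)."""
    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = min(d(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters
```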
33. State-Space versus Linear Regression Models between ECG Leads
- Author
-
Ivan Tomasic, Roman Trobec, and Maria Lindén
- Subjects
Mean squared error ,Correlation coefficient ,Linear regression ,Statistics ,medicine ,State space ,Regression analysis ,Statistical model ,Torso ,Residual ,Mathematics - Abstract
The first attempts at modeling relationships between electrocardiographic leads were based on measuring lead vectors using models of the human torso (deterministic approach) and on estimating linear regression models between the leads of interest (statistical models). Among the most recent attempts, one of the most prominent is the state-space model approach, because of its better noise immunity compared to statistical models estimated by minimizing the mean squared error. This study uses state-space models to synthesize the precordial leads and Frank leads from leads I, II, and V1. The synthesis was evaluated with the linear correlation coefficient (CC) on 200 measurements from Physionet's PTB diagnostic ECG database. The results show better performance for the regression models (mean CC between 0.88 and 0.96) than for the state-space models (mean CC between 0.78 and 0.86). The leads were not pre-aligned to the R-peaks, which can be the main cause of the lower performance of the state-space models, as a previous study has also shown. Residual baseline wander (after filtering) was the dominant reason for not obtaining better synthesis results with both methods.
- Published
- 2021
34. Distribution Analysis of Long-Term Heart Rate Variability Versus Blood Glucose
- Author
-
Marjan Gusev, Lidija Poposka, I. Vishinov, Marija Vavlukis, and I. Ahmeti
- Subjects
Correlation ,Separation (statistics) ,Statistics ,Outlier ,Univariate ,Heart rate variability ,Interval (mathematics) ,Glycated hemoglobin ,F1 score ,Mathematics - Abstract
This research explores the class distributions of long-term heart rate variability (HRV) parameters compared to the distribution of glycated hemoglobin (HbA1c), which depicts the long-term blood glucose regulation ability. The goal is to find the optimal HRV parameter, and the time interval over which it is measured, that correlates most with the class distribution based on HbA1c. The class distribution separability will indicate whether highly accurate, precise, and sensitive machine learning classifiers can be constructed in the future and, if so, aid their interaction with the input data. We found that removing any dataset sample in which at least one feature value is considered an outlier led to much better results. The strongest point-biserial correlations for the class distribution separation were found for 24-hour SDRMSSD-3 (r = -0.43), 20-hour SDRMSSD-3 (r = -0.34), and 24-hour ARMSSD-3 (r = -0.33), all satisfying the significance p-value threshold (p ≤ 0.01). All correlations were negative, showing that lower HRV is associated with worse blood glucose regulation. We observed that the longer the measurement period, the better the point-biserial correlation. The best class distribution separation based on a univariate threshold is achieved for SDRMSSD-3, with ACC = 86.52% and a weighted F1 score of 86.71%, making it stand out as the single most valuable HRV parameter for distinguishing good from bad blood glucose regulation.
- Published
- 2021
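The point-biserial correlation used for the class-separation analysis above is simply Pearson's r computed against a binary class label; a sketch with hypothetical data:

```python
def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 class label and a continuous
    parameter: (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), with s_n the
    population standard deviation of all values."""
    n = len(values)
    g1 = [v for b, v in zip(binary, values) if b == 1]
    g0 = [v for b, v in zip(binary, values) if b == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    mean = sum(values) / n
    s = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return (m1 - m0) / s * (len(g1) * len(g0) / n ** 2) ** 0.5
```

A negative value, as reported in the study, means the parameter tends to be lower in the class coded 1.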
35. Nonlinear Energy Harvesting Evaluation through the Logit Pearson Distribution
- Author
-
George K. Karagiannidis, Panagiotis D. Diamantoulakis, Nestor D. Chatzidiamantis, and Sotiris A. Tegos
- Subjects
Nonlinear system ,Distribution (number theory) ,Cumulative distribution function ,Logit ,Statistics ,Gamma distribution ,Pearson distribution ,Probability density function ,Logistic function ,Mathematics - Abstract
In this paper, we introduce the logit Pearson type III distribution and utilize it, for the first time in the literature, to investigate the statistical behavior of wireless power transfer, assuming that the harvested energy follows a well-established nonlinear energy harvesting (EH) model based on the logistic function. Specifically, we present closed-form expressions for the statistical properties of the introduced logit Pearson type III distribution, e.g., the cumulative distribution function (CDF), the probability density function, and the moments, and we utilize this distribution to define the logit gamma distribution. Furthermore, taking into account that the logit Pearson type III distribution is closely related to the considered nonlinear EH model, the statistical properties of the distribution of the harvested power are derived. These expressions are of high practical value, since useful insights into the EH system can be extracted through the evaluation of the CDF, as well as the average harvested power and the variance.
- Published
- 2021
36. Effect of Criteria Range on the Similarity of Results in the COMET Method
- Author
-
Andrii Shekhovtsov, Jakub Więckowski, Bartłomiej Kizielewicz, and Wojciech Sałabun
- Subjects
Correlation ,Correlation coefficient ,Similarity (network science) ,Comet ,Statistics ,Range (statistics) ,Contrast (statistics) ,Mathematics - Abstract
Defining input values in the decision-making process can be done with appropriate methods or based on expert knowledge. In both cases, it is essential to ensure that the values are adequate for the problem to be solved. There may be situations where values are overestimated, and it should be checked whether this affects the final results. In this paper, the Characteristic Objects Method (COMET) was used to investigate the effect of overestimation on the final rankings. Decision matrices with different numbers of alternatives and criteria were assessed. The obtained results were compared using the WS similarity coefficient and Spearman's weighted correlation coefficient. The study showed that overestimation has a significant effect on the rankings. A larger number of criteria has a positive effect on the correlation strength of the compared rankings. In contrast, a large overestimation of characteristic values has a negative effect on the similarity of the results.
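The two ranking-similarity measures named here can be sketched directly. The formulas below follow the definitions commonly cited in the COMET literature (WS coefficient by Sałabun and Urbaniak; weighted Spearman rw) and are an assumed reading of the abstract, not taken from this paper:

```python
import numpy as np

def ws_coefficient(rx, ry):
    """WS rank-similarity coefficient: top-ranked positions weigh more
    via the 2**(-rx) factor (assumed standard definition)."""
    rx, ry = np.asarray(rx, float), np.asarray(ry, float)
    n = len(rx)
    denom = np.maximum(np.abs(rx - 1), np.abs(rx - n))
    return 1.0 - np.sum(2.0 ** (-rx) * np.abs(rx - ry) / denom)

def weighted_spearman(rx, ry):
    """Weighted Spearman rank correlation (rw), emphasising top ranks."""
    rx, ry = np.asarray(rx, float), np.asarray(ry, float)
    n = len(rx)
    num = 6.0 * np.sum((rx - ry) ** 2 * ((n - rx + 1) + (n - ry + 1)))
    return 1.0 - num / (n**4 + n**3 - n**2 - n)

a = np.array([1, 2, 3, 4, 5])
print(ws_coefficient(a, a), weighted_spearman(a, a))  # identical rankings: 1.0 1.0
print(round(ws_coefficient(a, a[::-1]), 3))           # reversed ranking: much lower
```

Both return 1 for identical rankings; the WS coefficient penalises disagreements at the top of the ranking more heavily than at the bottom.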
- Published
- 2021
37. Estimating Wind Power Plant Outputs in Transmission System Planning Studies Based on Probability Approaches
- Author
-
Ahmet Ova and Sevki Demirbas
- Subjects
Wind power ,Mean squared error ,business.industry ,Cumulative distribution function ,Statistics ,Probability density function ,Context (language use) ,Scale factor ,business ,Wind speed ,Weibull distribution ,Mathematics - Abstract
In this study, the hourly power output of Wind Power Plants (WPP), to be used as input in long-term Transmission System Planning (TSP), is examined. In this context, the Weibull Distribution, one of the probabilistic approaches generally used in wind speed analysis, is applied. Using historical hourly wind speed data, the average wind speed, most probable wind speed, maximum wind speed, probability density function, cumulative distribution function, power density, and wind power output for the Nordex N60 model are obtained for each hour of the planning period. The Maximum Likelihood Method (MLM) and the Energy Pattern Factor Method (EPFM) are used to obtain the Weibull Distribution parameters (Weibull Shape Factor, K, and Weibull Scale Factor, C). In addition, error analysis using the Root Mean Square Error (RMSE) is performed to determine which parameter estimation method is more suitable for the wind speed analysis.
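Both estimation methods can be sketched on synthetic wind speeds. The MLM fit uses SciPy directly; the EPFM uses the widely cited approximation k = 1 + 3.69/Epf² (an assumed form, e.g. Akdag and Dinler), with C recovered from the mean speed via the gamma function:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_fn

rng = np.random.default_rng(42)
# Synthetic hourly wind speeds from a known Weibull distribution (K=2, C=7 m/s).
v = stats.weibull_min.rvs(2.0, scale=7.0, size=5000, random_state=rng)

# Maximum Likelihood Method (MLM): fit K and C with location fixed at 0.
k_mlm, _, c_mlm = stats.weibull_min.fit(v, floc=0)

# Energy Pattern Factor Method (EPFM): Epf = mean(v^3) / mean(v)^3.
epf = np.mean(v**3) / np.mean(v) ** 3
k_epf = 1.0 + 3.69 / epf**2
c_epf = np.mean(v) / gamma_fn(1.0 + 1.0 / k_epf)

print(f"MLM:  K={k_mlm:.2f}, C={c_mlm:.2f}")
print(f"EPFM: K={k_epf:.2f}, C={c_epf:.2f}")
```

With real data, the RMSE between the fitted and empirical distributions would then decide between the two parameter sets, as the abstract describes.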
- Published
- 2021
38. A random forest-based approach for survival curves comparing: principles, computational aspects and asymptotic time complexity analysis
- Author
-
Lubomir Stepanek, Ivana Mala, Lubos Marek, and Filip Habarta
- Subjects
Statistical assumption ,Proportional hazards model ,Robustness (computer science) ,Statistics ,Pruning ,Time complexity ,Survival analysis ,Statistical power ,Mathematics ,Random forest - Abstract
The log-rank test and Cox’s proportional hazard model can be used to compare survival curves but are limited by strict statistical assumptions. In this study, we introduce a novel, assumption-free method based on a random forest algorithm able to compare two or more survival curves. The proportion of the random forest’s trees with sufficient complexity serves as an estimate of the test’s p-value. Pruning the trees in the model modifies their complexity and, thus, both the method’s robustness and statistical power. The discussed results are confirmed using a simulation study, varying the survival curves and the tree pruning level.
- Published
- 2021
39. Prediction of cardiovascular disease survival based on artificial neural network
- Author
-
Weiliang Zeng, Hui Fu, Xinzui Wang, Kangkang Xu, and Yalun Zhang
- Subjects
Support vector machine ,Correlation ,Naive Bayes classifier ,Blood pressure ,Ejection fraction ,Artificial neural network ,business.industry ,Statistics ,Medicine ,Disease ,Akaike information criterion ,business - Abstract
Cardiovascular disease (CVD) is a chronic disease involving the heart and blood vessels. In the early 21st century, cardiovascular diseases accounted for nearly 50% of mortality in developed countries and about 25% in developing countries, and they have gradually become common diseases. Accurate prediction of survival events in patients with cardiovascular disease can provide a more meaningful reference for subsequent treatment and secure the best treatment opportunity for patients, with the aim of prolonging life. The data set used in this study was collected from Kaggle and includes variables such as high blood pressure, creatinine phosphokinase, ejection fraction, serum creatinine, and smoking. Based on the Akaike information criterion (AIC), stepwise regression analysis was used to select the variables strongly correlated with cardiovascular disease, and then an artificial neural network (ANN) based survival prediction model for cardiovascular disease was constructed. In this paper, a support vector machine (SVM) and naive Bayes are used for comparison with the artificial neural network. The results show that the performance of the artificial neural network is better than the other algorithms regardless of whether the strongly correlated variables are used. After using the strongly correlated variables, the performance of each algorithm improves. After training with the strongly correlated variables, the artificial neural network achieves the highest accuracy, precision, recall, F1-score, and AUC, reaching 0.81, 0.83, 0.85, 0.84, and 0.84, respectively.
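AIC-based stepwise selection for a classifier can be sketched as forward selection: repeatedly add the feature that lowers AIC = 2k − 2 ln L the most, stopping when no addition helps. This is a generic sketch on synthetic data, not the paper's pipeline; the log-likelihood is recovered from scikit-learn's `log_loss`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n = 400
# Illustrative features: only the first two actually drive the outcome.
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

def aic(X, y):
    """AIC = 2k - 2 ln L for a fitted logistic regression."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    ll = -log_loss(y, model.predict_proba(X)[:, 1], normalize=False)
    k = X.shape[1] + 1  # coefficients + intercept
    return 2 * k - 2 * ll

# Forward stepwise selection: add the feature that lowers AIC most.
selected, remaining = [], list(range(X.shape[1]))
best_aic = np.inf
while remaining:
    cand_aic, j = min((aic(X[:, selected + [j]], y), j) for j in remaining)
    if cand_aic >= best_aic:
        break
    best_aic = cand_aic
    selected.append(j)
    remaining.remove(j)

print("selected features:", selected, "AIC:", round(best_aic, 1))
```

The selected subset would then feed the ANN, SVM, and naive Bayes comparison described above.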
- Published
- 2021
40. A Metabolic Rate Estimation Model Based on Heart Rate and Respiratory Rate
- Author
-
Kun Shang, Tao Wang, Tanqiu Li, and Hexiang Zhang
- Subjects
Estimation ,Heart rate method ,Respiratory rate ,Energy expenditure ,Statistics ,Heart rate ,Metabolic rate ,Mathematics ,Random forest - Abstract
This paper proposes a novel method to estimate metabolic rate. By extracting time-domain statistical features of heart rate and respiratory rate, a metabolic rate evaluation model is established using a random forest regression algorithm. The results show that the model is superior to the existing heart rate method in metabolic rate prediction accuracy and in adapting to individual differences, with R2 reaching 0.99. At present, accurate measurement of metabolic rate requires complex or expensive equipment, but the method proposed in this paper enables convenient real-time measurement of dynamic metabolic rate, which can be used to study energy expenditure during exercise.
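The modelling step can be sketched with scikit-learn: time-domain features of heart rate and respiratory rate as inputs, a random forest regressor as the model, and R2 on held-out data as the score. The features and the target function below are entirely synthetic stand-ins, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Illustrative time-domain features: mean/std of heart rate, mean respiratory rate.
hr_mean = rng.uniform(60, 160, n)
hr_std = rng.uniform(1, 10, n)
rr_mean = rng.uniform(10, 40, n)
# Hypothetical metabolic-rate target: a smooth function of the features plus noise.
met = 0.9 * hr_mean + 2.0 * rr_mean + 0.05 * hr_mean * hr_std / 10 + rng.normal(0, 3, n)

X = np.column_stack([hr_mean, hr_std, rr_mean])
X_tr, X_te, y_tr, y_te = train_test_split(X, met, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2 = r2_score(y_te, model.predict(X_te))
print(f"R2 on held-out data: {r2:.2f}")
```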
- Published
- 2021
41. Explore a New Category Normalized Indicator in the Context of Big Data
- Author
-
Xuelan Li
- Subjects
Correlation ,Correlation coefficient ,Statistics ,Context (language use) ,Sample (statistics) ,A-weighting ,Standard score ,Citation impact ,Reliability (statistics) - Abstract
Category normalized indicators within a short time window are unreliable for scientific research evaluation. To solve this problem, Wang and Zhang proposed a new normalized indicator called WCNCI (Weighted Category Normalized Citation Impact), which reflects the reliability of a paper's standard score by introducing a weighting factor, where the longer (shorter) the publication time of a paper, the greater (smaller) the weight. Because this indicator assigns a large weight to older papers with longer publication times and to disciplines such as biology and medicine that are prone to high citations, the evaluation results are biased. Thus, in this study, an improved normalized indicator called the NWCNCI (New WCNCI) is proposed. The values and rankings of each university under different normalized indicators, that is, CNCI, WCNCI, and NWCNCI, were calculated separately using Java on a sample of the top 500 universities in the ARWU2020 world ranking, and correlation analysis was performed. The results demonstrated that the NWCNCI values and rankings of the 500 universities were highly correlated with the CNCI and WCNCI values and rankings, although some universities still showed large variations in their values and rankings. Thus, it is concluded that the NWCNCI normalized indicator provides a solution to the standardization problem and is an ideal tool for evaluating research output.
- Published
- 2021
42. Forecasting Stock Exchange Using Gated Recurrent Unit
- Author
-
Djoko Budiyanto Setyohadi, Bernadectus Yudi Dwiandiyanta, and Yakobus Nobel H. Judo Prajitno
- Subjects
Stock exchange ,Value (economics) ,Statistics ,Stock forecasting ,Stock (geology) ,Unit (housing) ,Data modeling ,Mathematics - Abstract
Stock forecasting is an important part of stock investing, used to anticipate the next price movement. The major aim of this research is to forecast McDonald's stock on the New York Stock Exchange using data covering the period from 6 January 2006 to 14 April 2021 on a daily, weekly, and monthly basis. The GRU (Gated Recurrent Unit) method is used to create the training model and to make predictions from the closing-price movement of McDonald's shares. This study determines the best forecasting configuration for the GRU method based on the accuracy and error values obtained on the available data. The results on the three data sets tested show that the medium-term (weekly) data provide the best result compared to the others, based on the accuracy value, the minimum error value, and the consistent results obtained on repeated tests.
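A single GRU step can be written out in NumPy to show the mechanism behind the method: an update gate z, a reset gate r, and a candidate state blended into the hidden state. Gate conventions vary between texts (some swap the roles of z and 1−z); the weights and the toy price sequence below are illustrative only:

```python
import numpy as np

def gru_cell(x, h, params):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde,
    then a convex mix of the old hidden state and the candidate."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 1, 8  # univariate closing prices, 8 hidden units
params = [rng.normal(scale=0.1, size=s) for s in
          [(d_h, d_in), (d_h, d_h), (d_h,)] * 3]

# Run the cell over a short, illustrative price sequence.
prices = np.array([1.0, 1.2, 0.9, 1.1, 1.3])
h = np.zeros(d_h)
for p in prices:
    h = gru_cell(np.array([p]), h, params)
print(h.shape, float(np.max(np.abs(h))))
```

In practice the forecasting model in the abstract would be a trained GRU layer (e.g. in a deep learning framework) followed by a dense output over the final hidden state.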
- Published
- 2021
43. Risk Prediction Model of Sudden Public Safety Incidents Based on Neural Network
- Author
-
Fan Li, Siya Yu, Zaipeng Duan, and Yiyang Zhang
- Subjects
Support vector machine ,education.field_of_study ,Artificial neural network ,Mean squared error ,Population ,Statistics ,Graphical model ,education ,Bayesian linear regression ,Regression ,Predictive modelling - Abstract
Taking natural disasters as an example, this paper collects data on natural disasters in China from 2014 to 2020 and performs a statistical analysis of their temporal and spatial distribution. Four prediction models are established: Bayesian regression, KNN, support vector machine, and neural network. Natural disaster risk prediction indexes are constructed using the direct economic loss of natural disasters and the affected population, and regression fitting and prediction experiments are carried out. The fitting and prediction performance of each model are compared using four quantitative evaluation indexes: mean absolute error, R2 score, mean square error, and adjusted R-squared. The results show that natural disasters in China follow a clear temporal and spatial distribution pattern, and the prediction performance of the neural network model is the best.
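The four evaluation indexes used here are standard and easy to compute by hand; adjusted R-squared corrects R2 for the number of features via 1 − (1 − R²)(n − 1)/(n − p − 1). A small self-contained sketch (the prediction values are made up):

```python
import numpy as np

def regression_report(y_true, y_pred, n_features):
    """MAE, MSE, R2 score, and adjusted R-squared for a set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    n = len(y_true)
    mae = np.mean(np.abs(y_true - y_pred))
    mse = np.mean((y_true - y_pred) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return mae, mse, r2, adj_r2

# Illustrative: disaster-loss predictions from some model vs. observed values.
y_true = np.array([10.0, 12.0, 9.0, 15.0, 11.0, 14.0])
y_pred = np.array([10.5, 11.5, 9.5, 14.0, 11.0, 13.5])
mae, mse, r2, adj_r2 = regression_report(y_true, y_pred, n_features=2)
print(mae, mse, round(r2, 3), round(adj_r2, 3))
```

Adjusted R-squared is always at most R2, which is why it is the fairer index when comparing models with different numbers of predictors.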
- Published
- 2021
44. Research and Implementation of Crop Identification Based on Sentinel-2 Time Series Data in Shijiazhuang City
- Author
-
Weimin Hou, Jianxi Huang, Zhuang Miao, Jia Su, and Huimin Cui
- Subjects
Tree (data structure) ,Statistical classification ,Cohen's kappa ,Statistics ,Classifier (linguistics) ,Decision tree ,Enhanced vegetation index ,Time series ,Random forest ,Mathematics - Abstract
Using accurate remote sensing technology to monitor the spatial distribution of crops is of great practical significance for food security and the sustainable development of agriculture. In this study, Shijiazhuang city, Hebei province, China, was taken as the study area, and 13 Sentinel-2 scenes from June to September 2017 were used as the data source. A time series data set of three indices was constructed: the enhanced vegetation index (EVI), the land surface water index (LSWI), and the red edge position (REP). Combined with the multispectral data from Sentinel-2 imagery, corn, soybean, peanut, pear tree, and walnut tree in the study area were identified by the classification and regression tree (CART) algorithm and the random forest (RF) algorithm on the Google Earth Engine (GEE) cloud computing platform. The results showed that the combination of EVI and REP achieved the best overall accuracy and kappa coefficient in both classification algorithms, and that crop classification with the RF classifier achieved higher accuracy than the CART classifier, with an overall accuracy of 95.09% and a kappa coefficient of 0.94. Therefore, this method provides an important reference for large-area crop classification.
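The evaluation pattern (random forest classifier, overall accuracy, kappa coefficient) can be sketched with scikit-learn. The features below are random stand-ins for EVI/REP time-series values and the three classes are arbitrary, so only the workflow is illustrated:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
# Illustrative per-pixel features standing in for index time series.
X = rng.normal(size=(n, 6))
y = (X[:, 0] + 0.8 * X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)  # 3 classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(f"overall accuracy: {accuracy_score(y_te, pred):.2%}")
print(f"kappa: {cohen_kappa_score(y_te, pred):.2f}")
```

Kappa corrects accuracy for chance agreement, which matters when crop classes are imbalanced across a study area.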
- Published
- 2021
45. Measurement of Detection Rate Accuracy in Forecasting Crude Palm Oil Production using Fuzzy Time Series
- Author
-
Santi Prayudani, Al-Khowarizmi, Arif Ridho Lubis, Yulia Fatmi, Yuyun Yusnida Lase, and Julham
- Subjects
Data visualization ,Series (mathematics) ,business.industry ,Statistics ,Palm oil ,Production (economics) ,Detection rate ,Time series ,business ,Fuzzy logic ,Mathematics ,Test data - Abstract
Time series analysis is a superior method of predicting the future based on past data. Time series are also used in various businesses to make forecasts for profit, and time series data provide data visualization with the statistical explanations necessary for business decisions. One business that serves the needs of all sectors is the Crude Palm Oil (CPO) commodity industry, where the CPO price can be forecast using time series methods because historical price observations are available. In this paper, 599 CPO price records were crawled, covering September 10, 2019 to April 30, 2021, and divided into 560 training data and 39 testing data. Forecasting accuracy was measured using MAPE: forecasting with the time series alone yielded 0.01781302%, while MAPE combined with the detection rate yielded 0.501031843%. This indicates that when forecasting CPO price data with time series, the best accuracy is obtained using MAPE without any combination with other techniques.
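The MAPE metric used throughout this abstract is the mean of the absolute percentage errors between actual and forecast values. A minimal sketch (the price values are made up):

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Illustrative CPO price series (actual vs. forecast, arbitrary units).
actual = np.array([100.0, 102.0, 101.0, 105.0])
forecast = np.array([101.0, 101.0, 102.0, 104.0])
print(f"MAPE = {mape(actual, forecast):.3f}%")
```

Note that MAPE is undefined when an actual value is zero, which is rarely an issue for commodity prices.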
- Published
- 2021
46. Assessing Acceptance of Electronic Marketing among Consumers of Agribusiness Products
- Author
-
Elpawati Elpawati, Ujang Maman, Yayat Suharyat, Nana Danapriatna, Ihsanul Muttaqien, and Iwan Aminudin
- Subjects
Value (ethics) ,Subjective norm ,Variables ,media_common.quotation_subject ,Statistics ,Electronic marketing ,Psychology ,Affect (psychology) ,Structural equation modeling ,Agribusiness ,media_common - Abstract
This study aimed to identify factors of electronic marketing (e-marketing) acceptance and how they influence acceptance among consumers of Hartanimart.com. The research examines the influences of the independent variables (i.e., usefulness, ease, attitude, and subjective norm) on the dependent variable (i.e., the use of e-marketing). The sample consisted of 100 people who purchased seeds through Hartanimart.com, selected by accidental sampling. Data were analyzed statistically using the partial least squares structural equation modeling (PLS-SEM) method. The study found that three hypotheses were accepted: the factors of usefulness, ease, and subjective norm are significant at the 0.05 level, with t-values greater than the t-table value $\boldsymbol{(> 1.96)}$, with respect to the use of e-marketing on Hartanimart.com. One hypothesis was rejected: the factor of attitude did not significantly influence the use of e-marketing on Hartanimart.com. The factors of usefulness, ease, and subjective norm can support consumers in increasing their use of e-marketing.
- Published
- 2021
47. Optimization Parameters Support Vector Regression using Grid Search Method
- Author
-
Muhammad Agreindra Helmiawan, Yanyan Sofiyan, and Irfan Fadil
- Subjects
Support vector machine ,Cryptocurrency ,Mean absolute percentage error ,Kernel (statistics) ,Hyperparameter optimization ,Statistics ,Value (economics) ,Radial basis function ,Time series ,Mathematics - Abstract
Bitcoin is a cryptocurrency known for high price fluctuation. Investment returns depend on these price fluctuations, which carry a high level of risk. To avoid losses and gain profits, a method is needed to forecast the price of Bitcoin accurately. In this research, Bitcoin prices were predicted from past price data (time series forecasting) using the Support Vector Regression method. The data retrieved are weekly Bitcoin prices from January 2018 to March 2020. Because Bitcoin price data are nonlinear, a Radial Basis Function kernel is used. The parameters of the Support Vector Regression method are optimized using the Grid Search method. The purpose of the research is to determine the accuracy of the Support Vector Regression method by examining the Mean Absolute Percentage Error value. The research showed that the Mean Absolute Percentage Error obtained was 10.74% with parameter values $\mathbf{C=5}$, $\boldsymbol{\varepsilon=0.004}$, and $\boldsymbol{\gamma=0.07}$. This Mean Absolute Percentage Error value indicates that the prediction results are categorized as a good prediction.
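Grid search over the SVR hyperparameters named in the abstract (C, epsilon, gamma with an RBF kernel) can be sketched with scikit-learn. The synthetic series and lag-feature construction below are illustrative stand-ins for the weekly price data:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
# Illustrative series turned into a supervised problem:
# predict the next value from the previous 4 (lag features).
t = np.arange(150)
series = np.sin(t / 8.0) + 0.05 * rng.normal(size=t.size)
lags = 4
X = np.column_stack([series[i:i - lags] for i in range(lags)])
y = series[lags:]

# Grid search over the SVR hyperparameters tuned in the paper.
param_grid = {"C": [1, 5, 10], "epsilon": [0.004, 0.01, 0.1], "gamma": [0.07, 0.5, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
```

`GridSearchCV` scores every combination by cross-validation and refits the best one, which is what produces a single tuned (C, epsilon, gamma) triple like the one reported.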
- Published
- 2021
48. The impact of data characteristics on the estimation of the three-dimensional passenger macroscopic fundamental diagram
- Author
-
Klaus Bogenberger, Antonia Pawlowski, and Gabriel Tilg
- Subjects
Estimation ,Ground truth ,Observational error ,Sampling (signal processing) ,Computer science ,Position (vector) ,Statistics ,Detector ,Contrast (statistics) ,Context (language use) - Abstract
The three-dimensional passenger macroscopic fundamental diagram (pMFD) defines the functional relation between car accumulation, bus accumulation, and travel production of passengers. It facilitates the analysis of bi-modal traffic dynamics from the perspective of passenger flows. Usually, it is estimated based on empirical or simulated data. This paper focuses on the role of data characteristics for such a data-based estimation. Thereby, we study two main scenarios: First, we examine the effects of penetration and sampling rates, as well as speed measurement errors of person-specific position data on the pMFD estimation. Second, we study the estimation accuracy assuming that only data from automatic passenger count (APC) devices and loop detectors are available. This reflects a more realistic data source for local authorities. In this context, we investigate the impact of the penetration rate of APC device-equipped buses on the pMFD estimation. To quantify the impact of those aspects, we compare the resulting pMFDs to the ground truth pMFD, which has a penetration rate of 100 %, a sampling rate of 1 Hz, and no error in the speed data. To reduce the case dependency, we conduct this analysis for two networks: the well-known Sioux Falls network, and Schwabing, a district in Munich, Germany. The results of our study show that the penetration rate has a strong influence on the pMFD and the error increases faster the lower the penetration rate is. In contrast, the sampling rate and speed measurement error have a smaller impact on the pMFD estimation. Additionally, our results indicate that the estimation accuracy is reasonably high when APC data is utilized. The results of this study are relevant to both scientists and practitioners, as we not only advance the knowledge on the pMFD estimation but also show what data are necessary for robust estimation and thus its application in practice.
- Published
- 2021
49. PCNET: Parallelly Conquer the Large Variance of Person Re-Identification
- Author
-
Jianyuan Wang, Meiyue You, Guanglu Song, Ming Jiang, and Biao Leng
- Subjects
Statistics ,Variance (accounting) ,Re identification ,Mathematics - Published
- 2021
50. Learning Imbalanced Datasets With Maximum Margin Loss
- Author
-
Thang Vu, Haeyong Kang, and Chang D. Yoo
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Margin (machine learning) ,Statistics ,Geology ,Machine Learning (cs.LG) - Abstract
A learning algorithm referred to as Maximum Margin (MM) is proposed to address the class-imbalanced data learning issue: the trained model tends to predict the majority classes rather than the minority ones. That is, underfitting on minority classes appears to be one of the main challenges to generalization. For good generalization on the minority classes, we design a new Maximum Margin (MM) loss function, motivated by minimizing a margin-based generalization bound through shifting the decision boundary. The theoretically principled label-distribution-aware margin (LDAM) loss was successfully applied with prior strategies such as re-weighting or re-sampling, along with an effective training schedule; however, the maximum margin loss function has not yet been investigated. In this study, we investigate the performance of two types of hard maximum-margin-based decision boundary shift with LDAM's training schedule on artificially imbalanced CIFAR-10/100 for fair comparison and effectiveness.
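The LDAM loss that this work builds on can be sketched in NumPy: each class j gets a margin proportional to 1/n_j^(1/4), and the true-class logit is shifted down by that margin before the cross-entropy, so rarer classes are pushed further from the decision boundary. This is a sketch of the LDAM baseline, not of the proposed MM loss:

```python
import numpy as np

def ldam_margins(class_counts, C=0.5):
    """Label-distribution-aware margins: Delta_j = C / n_j**0.25
    (larger margin for rarer classes)."""
    return C / np.asarray(class_counts, float) ** 0.25

def margin_cross_entropy(logits, label, margins):
    """Cross-entropy after shifting the true-class logit down by its margin."""
    z = logits.astype(float).copy()
    z[label] -= margins[label]
    z -= z.max()  # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

counts = [5000, 500, 50]  # imbalanced class sizes
margins = ldam_margins(counts)
logits = np.array([2.0, 1.0, 0.5])

loss_rare = margin_cross_entropy(logits, 2, margins)
loss_common = margin_cross_entropy(logits, 2, np.zeros(3))  # no margin
print(margins.round(3), round(float(loss_rare), 3), round(float(loss_common), 3))
```

The margin makes the loss strictly larger for the shifted class, forcing the model to score minority examples with extra confidence during training.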
- Published
- 2021