Descriptor: "random forest regression" / Database: Supplemental Index - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "random forest regression" returned 39 results.

1. Preconditioning of clinical data for intraocular lens formula constant optimisation using Random Forest Quantile Regression Trees.

Author: Langenbucher, Achim, Szentmáry, Nóra, Cayless, Alan, Wendelstein, Jascha, and Hoffmann, Peter
Abstract: To implement a fully data driven strategy for identifying outliers in clinical datasets used for formula constant optimisation, in order to achieve proper formula predicted refraction after cataract surgery, and to assess the capabilities of this outlier detection method. Two clinical datasets (DS1/DS2: N = 888/403) of eyes treated with a monofocal aspherical intraocular lens (Hoya XY1/Johnson&Johnson Vision Z9003) containing preoperative biometric data, power of the lens implant and postoperative spherical equivalent (SEQ) were transferred to us for formula constant optimisation. Original datasets were used to generate baseline formula constants. A random forest quantile regression algorithm was set up using bootstrap resampling with replacement. Quantile regression trees were grown and the 25% and 75% quantiles and the interquartile range (IQR) were extracted from SEQ and formula predicted refraction REF for the SRKT, Haigis and Castrop formulae. Fences were defined from the quantiles and data points outside the fences were marked and removed as outliers before recalculating the formula constants. N_B = 1000 bootstrap samples were derived from both datasets, and random forest quantile regression trees were grown to model SEQ versus REF and to estimate the median and 25% and 75% quantiles. The fence boundaries were defined as running from the 25% quantile − 1.5·IQR to the 75% quantile + 1.5·IQR, with data points outside the fence being marked as outliers. In total, for DS1 and DS2, 25/27/32 and 4/5/4 data points were identified as outliers for the SRKT/Haigis/Castrop formulae respectively. The respective root mean squared formula prediction errors (DS1/DS2) were slightly reduced from 0.4370/0.4449 dpt to 0.4271/0.4348 dpt for SRKT, from 0.3625/0.4056 dpt to 0.3528/0.3952 dpt for Haigis, and from 0.3376/0.3532 dpt to 0.3277/0.3432 dpt for Castrop. 
We were able to prove that with random forest quantile regression trees a fully data driven outlier identification strategy acting in the response space is achievable. In a real life scenario this strategy has to be complemented by an outlier identification method acting in the parameter space for a proper qualification of datasets prior to formula constant optimisation. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
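The fence rule described in entry 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the per-point 25% and 75% quantile estimates have already been produced by a quantile regression forest (not implemented here), and the function names `fence` and `flag_outliers` and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def fence(q25: float, q75: float, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) = (q25 - k*IQR, q75 + k*IQR)."""
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


def flag_outliers(observed: Sequence[float],
                  q25s: Sequence[float],
                  q75s: Sequence[float],
                  k: float = 1.5) -> List[bool]:
    """Mark each observation lying outside its own (conditional) fence.

    q25s and q75s are per-point quantile estimates, assumed to come from a
    quantile regression forest fitted elsewhere.
    """
    flags = []
    for y, lo_q, hi_q in zip(observed, q25s, q75s):
        lower, upper = fence(lo_q, hi_q, k)
        flags.append(y < lower or y > upper)
    return flags


if __name__ == "__main__":
    # Hypothetical SEQ values (dpt) with made-up conditional quantiles.
    seq = [-0.25, 0.10, 1.90, -0.05]
    q25 = [-0.50, -0.30, -0.40, -0.35]
    q75 = [0.00, 0.20, 0.10, 0.15]
    print(flag_outliers(seq, q25, q75))  # → [False, False, True, False]
```

Because the fences are built from conditional quantiles, each point is judged against the spread expected for its own predicted refraction rather than against one global range.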

2. Generating synthetic mixed-type tabular data by decoding samples from a latent-space: a case study in healthcare.

Author: Drapała, Jarosław and Świątek, Jerzy
Subjects: PROBABILITY density function, DECISION support systems, SUPPORT vector machines, MULTIDIMENSIONAL scaling, RANDOM forest algorithms
Abstract: Medical data are subject to privacy regulations, which severely limit AI specialists who wish to construct decision support systems for medicine. Large amounts of this data are tabular, indicating that they are organized into a table format, where patient records are represented in rows and measured variables in columns. Furthermore, the variables come in different types—some are numerical, while others are categorical. In this work, we introduce a novel method for constructing generators of synthetic tabular data with mixed types. The key point of our approach is the explicit utilization of a latent space to represent the original data. A case study using real medical data is presented. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

3. Development of Customer Review Ranking Model Considering Product and Service Aspects Using Random Forest Regression Method.

Author: Djunaidy, Arif and Fano, Nisrina Fadhilah
Abstract: Customer reviews are the second-most reliable source of information, after family and friend referrals. However, the sheer volume of existing customer reviews is a problem. Some online shopping platforms address this issue by ranking customer reviews according to their usefulness. However, we propose an alternative method to rank customer reviews, given that this system is easily manipulable. This study aims to create a ranking model for reviews based on their usefulness by combining product and seller service aspects from customer reviews. This methodology consists of six primary steps: data collection and preprocessing, aspect extraction and sentiment analysis, followed by constructing a regression model using random forest regression, and the review ranking process. The results demonstrate that the ranking model with service considerations outperformed the model without service considerations. This demonstrates the model's superiority in the three tests, which include a comparison of the regression results, the aggregate helpfulness ratio, and the matching score. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. The removal of methylene blue from aqueous solutions by polyethylene microplastics: Modeling batch adsorption using random forest regression.

Author: Bahrami, Mehdi, Amiri, Mohammad Javad, Rajabi, Sara, and Mahmoudi, Mohamadreza
Subjects: PLASTIC marine debris, METHYLENE blue, RANDOM forest algorithms, MICROPLASTICS, AQUEOUS solutions, POLYETHYLENE, ADSORPTION (Chemistry)
Abstract: In light of the extensive contamination of water sources by microplastics, their substantial specific surface area makes them favorable candidates as adsorbents for the simultaneous removal of coexisting contaminants in wastewater. In this regard, polyethylene microplastics were utilized to eliminate methylene blue (MB) dye from water. MB adsorption onto microplastics reached equilibrium in just 30 min at pH 7. The better fit of fractional power and Redlich-Peterson models on kinetic and equilibrium adsorption data, respectively, revealed that the MB removal process is a chemisorption in multilayer adsorption on the heterogeneous surface of the microplastics particles. The reusability of the microplastics adsorbent was confirmed based on the promising outcomes observed after five cycles. The results of the random forest regression exhibited an R² of 97.55% for the correlation between the model-computed and measured amounts of MB reduction. The sensitivity analysis illustrated that the MB sorption process on microplastics is highly influenced by the initial MB concentration and adsorbent mass. These results show that although microplastics may pose potential risks to water environments, their adsorption potential can be utilized to simultaneously remove other pollutants from aqueous solutions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
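Entry 4 reports an R² "for the correlation between the model-computed and measured amounts", which can mean either the coefficient of determination or the squared Pearson correlation. The sketch below, with hypothetical function names, computes both. They coincide for an ordinary least-squares line with an intercept evaluated on its own training data, but not in general: a prediction that is perfectly correlated yet systematically biased scores 1.0 on one and poorly on the other.

```python
from typing import Sequence


def coefficient_of_determination(measured: Sequence[float],
                                 predicted: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; can be negative for a poor model."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def squared_pearson(measured: Sequence[float],
                    predicted: Sequence[float]) -> float:
    """Square of the Pearson correlation between measured and predicted."""
    n = len(measured)
    mx = sum(measured) / n
    mp = sum(predicted) / n
    cov = sum((x - mx) * (p - mp) for x, p in zip(measured, predicted))
    var_x = sum((x - mx) ** 2 for x in measured)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_x * var_p)


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0]
    biased = [2.0, 4.0, 6.0]  # perfectly correlated, but systematically wrong
    print(coefficient_of_determination(measured, biased))  # → -6.0
    print(squared_pearson(measured, biased))  # → 1.0
```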

5. Intelligent System for Assessing University Student Personality Development and Career Readiness.

Author: Izbassar, Assylzhan, Muratbekova, Muragul, Amangeldi, Daniyar, Oryngozha, Nazzere, Ogorodova, Anna, and Shamoi, Pakizar
Subjects: MACHINE learning, PERSONALITY development, STUDENT development, COLLEGE students, KNOWLEDGE acquisition (Expert systems), PREPAREDNESS
Abstract: While academic metrics such as transcripts and GPA are commonly used to evaluate students' knowledge acquisition, there are limited comprehensive metrics to measure their preparedness for the challenges of post-graduation life. This research paper explores the impact of various factors on university students' readiness for change and transition, with a focus on their preparedness for careers. The methodology employed in this study involves designing a survey based on Paul J. Mayer's "The Balance Wheel" to capture students' sentiments on various life aspects, including satisfaction with the educational process and expectations of salary. The collected data from a KBTU student survey (n=47) were processed through machine learning models: Linear Regression, Support Vector Regression (SVR), and Random Forest Regression. Subsequently, an intelligent system was built using these models and fuzzy sets. The system is capable of evaluating graduates' readiness for their future careers and demonstrates a high predictive power. The findings of this research have practical implications for educational institutions. Such an intelligent system can serve as a valuable tool for universities to assess and enhance students' preparedness for post-graduation challenges. By recognizing the factors contributing to students' readiness for change, universities can refine curricula and processes to better prepare students for their career journeys. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

6. Comparison of Predictive Algorithms for IoT Smart Agriculture Sensor Data.

Author: Fondaj, Jakup, Hamiti, Mentor, Krrabaj, Samedin, Zenuni, Xhemal, and Ajdari, Jaumin
Subjects: INTELLIGENT sensors, ARTIFICIAL neural networks, INTERNET of things, ALGORITHMS, ARTIFICIAL intelligence, AGRICULTURAL technology
Abstract: This paper compares predictive algorithms for smart agriculture sensor data in Internet of Things (IoT) applications. The main objective of IoT in agriculture is to improve productivity and reduce production costs using advanced technology and artificial intelligence. In this study, we compared various predictive algorithms for analyzing IoT smart agriculture sensor data. Specifically, we evaluated the performance of NeuralProphet, Random Forest Regression, SARIMA, and Artificial Neural Networks (ANN) built with Keras on a dataset containing temperature, humidity, and soil moisture data. The dataset was collected using IoT sensors in a smart agriculture system. The results showed that Random Forest Regression, Seasonal ARIMA, and the Keras-based Artificial Neural Networks outperformed the NeuralProphet algorithm in terms of accuracy and computational efficiency. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

7. Machine Learning Algorithm for Fintech Innovation in Blockchain Applications.

Author: Narayanan, V. Lakshmana, Pandi, G. Ramesh, Kaleeswari, K., and Veni, S.
Subjects: MACHINE learning, RANDOM forest algorithms, BLOCKCHAINS, FINANCIAL technology, DECISION trees, DEFAULT (Finance)
Abstract: The rapid growth of Fintech innovation and the widespread adoption of blockchain technologies have indeed had a transformative impact on the financial industry. In this paper, the focus is on the application of machine learning algorithms, specifically the Random Forest Regression algorithm, within the context of Fintech and blockchain. This research contributes to the advancement of machine learning techniques in the field of Fintech and blockchain. The Random Forest Regression algorithm utilizes ensemble learning, combining multiple decision trees to analyze complex financial data and make predictions on various outcomes. This algorithm has proven to be effective in addressing key challenges within the industry, such as predicting loan defaults, detecting fraud, and assessing risks. Through experimental evaluations and case studies, the paper demonstrates the effectiveness of the Random Forest Regression algorithm in enhancing Fintech innovation in blockchain applications. The algorithm's improved accuracy, scalability, and interpretability enable financial institutions to make data-driven decisions and optimize their operations. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
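Entry 7 describes random forest regression as an ensemble that combines multiple decision trees. A minimal, dependency-free sketch of that idea follows: each tree is grown on a bootstrap sample drawn with replacement, considers a random subset of features at each split, and splits to minimize the children's total squared error; the forest then averages the tree outputs. `SimpleForestRegressor` and its parameters are hypothetical teaching names, not any paper's implementation; for real work a library class such as scikit-learn's `RandomForestRegressor` is the usual choice.

```python
import random
from statistics import mean
from typing import List, Sequence, Union

# A node is either a leaf value or a tuple (feature, threshold, left, right).
Node = Union[float, tuple]


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the split criterion)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, depth, max_features, min_leaf, rng) -> Node:
    """Greedily grow a regression tree minimizing the children's total SSE."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), min(max_features, n_features))
    best = None  # (score, feature, threshold)
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no valid split among the sampled features
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = build_tree([X[i] for i in li], [y[i] for i in li],
                           depth - 1, max_features, min_leaf, rng)
    right_node = build_tree([X[i] for i in ri], [y[i] for i in ri],
                            depth - 1, max_features, min_leaf, rng)
    return (f, t, left_node, right_node)


def predict_tree(node: Node, row: Sequence[float]) -> float:
    """Follow splits from the root until a leaf value is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class SimpleForestRegressor:
    """Hypothetical minimal random forest: bagged trees + random feature subsets."""

    def __init__(self, n_trees=50, max_depth=5, max_features=1, min_leaf=1, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.max_features = max_features
        self.min_leaf = min_leaf
        self.seed = seed
        self.trees: List[Node] = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(y)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, self.max_features,
                                         self.min_leaf, rng))
        return self

    def predict(self, X) -> List[float]:
        # The forest prediction is the average of the tree predictions.
        return [mean(predict_tree(t, row) for t in self.trees) for row in X]


if __name__ == "__main__":
    # Hypothetical toy data: y is a noisy step in x0; x1 is pure noise.
    data_rng = random.Random(1)
    X = [[i, data_rng.random()] for i in range(40)]
    y = [(0.0 if row[0] < 20 else 10.0) + data_rng.gauss(0, 0.5) for row in X]
    forest = SimpleForestRegressor(n_trees=30, max_depth=4, max_features=2).fit(X, y)
    print(forest.predict([[5, 0.5], [35, 0.5]]))
```

Averaging many trees grown on different bootstrap samples is what reduces the variance of a single deep tree; the random feature subsets further decorrelate the trees.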

8. Response of zooplankton to nutrient reduction and enhanced fish predation in a shallow eutrophic lake.

Author: Mao, Zhigang, Cao, Yong, Gu, Xiaohong, Zeng, Qingfei, Chen, Huihui, and Jeppesen, Erik
Subjects: PREDATION, LAKE restoration, WATER quality management, FISH stocking, ZOOPLANKTON, STRUCTURAL equation modeling, RANDOM forest algorithms, LAKES
Abstract: As a key link between top‐down regulators and bottom‐up factors, zooplankton responds sensitively to environmental variations and provides information on the ecological state of freshwater systems. Although the response of zooplankton to anthropogenic pressures and fluctuating natural conditions, such as nutrient loading and climate change, has been extensively examined, findings have varied markedly. The mechanistic basis for the correlation between environmental variability and the zooplankton community is still debated, particularly for subtropical eutrophic lakes. We used two methods to analyze physicochemical and selected biological variables derived from long‐term monitoring of Lake Taihu, a subtropical shallow lake in China. We first applied random forest regression to examine how changes in zooplankton were related to a set of environmental variables on interannual time scales. Then we used the results to guide the construction of a conceptual model for piecewise structural equation modeling (pSEM) to quantify more precisely the zooplankton–environment relationship. Zooplanktivorous fish and nutrient concentrations were the most important predictors of long‐term trends in zooplankton in RF regression. Intensification of planktivorous fish predation led to a lower zooplankton biomass and smaller individuals through the removal of larger crustaceans. Moreover, suppression of zooplankton can in part be explained by increases in inedible algae, triggered by a combination of reduced nutrient concentrations and weakened grazer control. These results were also confirmed in the pSEM, which further indicated that top‐down regulators might be more important than bottom‐up factors for the zooplankton community in Lake Taihu. 
Our results suggest that stocking of filter‐feeding fish in the lake did not meet the expectation that they would control algae, but that the use of biomanipulation measures considering both water quality and fishery management seems promising. This study offers insights into how indicator metrics of zooplankton can improve our understanding of the associations between plankton communities and ecosystem alterations. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

9. A random forest‐regression‐based inverse‐modeling evolutionary algorithm using uniform reference points.

Author: Gholamnezhad, Pezhman, Broumandnia, Ali, and Seydi, Vahid
Subjects: RANDOM forest algorithms, DISTRIBUTION (Probability theory), GAUSSIAN processes, EVOLUTIONARY algorithms, PARTICLE swarm optimization
Abstract: The model‐based evolutionary algorithms are divided into three groups: estimation of distribution algorithms, inverse modeling, and surrogate modeling. Existing inverse modeling is mainly applied to solve multi‐objective optimization problems and is not suitable for many‐objective optimization problems. Some inverse‐model techniques have been introduced, such as the inverse‐model‐based multi‐objective evolutionary algorithm, which maps the Pareto front (PF) to the Pareto solutions over nondominated solutions using a random grouping method and Gaussian processes. However, some of the most efficient inverse models might be eliminated during this procedure. Also, there are challenges, such as the presence of many local PFs and the generation of poor solutions when the population has no evident regularity. This paper proposes inverse modeling using random forest regression and uniform reference points that map all nondominated solutions from the objective space to the decision space to solve many‐objective optimization problems. The proposed algorithm is evaluated using the benchmark test suite for evolutionary algorithms. The results show an improvement in diversity and convergence performance (quality indicators). [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

10. Industrial Centric Node Localization and Pollution Prediction Using Hybrid Swarm Techniques.

Author: Ram, R. Saravana, Kumar, M. Vinoth, Krishnamoorthy, N., Baseera, A., Hussain, D. Mansoor, and Susila, N.
Subjects: WIRELESS sensor networks, INDUSTRIAL revolution, PARTICLE swarm optimization, METAHEURISTIC algorithms, STANDARD deviations, MACHINE learning
Abstract: Major fields such as military applications, medical fields, weather forecasting, and environmental applications use wireless sensor networks for major computing processes. Sensors play a vital role in emerging technologies of the 20th century. Localizing sensors in the required locations is a serious problem. The environment is home to every living being in the world. The growth of industries after the industrial revolution increased pollution across the environment. Owing to recent uncontrolled growth and development, sensors to measure pollution levels across industries and surroundings are needed. An interesting and challenging task is choosing where to place the sensors. Many meta-heuristic techniques have been introduced for node localization. Swarm intelligence algorithms have proven their efficiency in many studies on localization problems. In this article, we introduce an industrial-centric approach to solve the problem of node localization in the sensor network. First, our work aims at selecting industrial areas in the sensed location. We use random forest regression methodology to select the polluted area. Then, the elephant herding algorithm is used in sensor node localization. These two algorithms are combined to produce the best standard result in localizing the sensor nodes. To evaluate the performance of the proposed approach, experiments are conducted with data from the KDD Cup 2018, which contain the names of 35 stations and their concentrations of air pollutants such as PM, SO2, CO, NO2, and O3. These data are normalized and tested with the algorithms. The results are comparatively analyzed with other swarm intelligence algorithms such as the elephant herding algorithm, particle swarm optimization, and machine learning algorithms such as decision tree regression and multi-layer perceptron. The results indicate that our proposed algorithm suggests more meaningful locations for localizing the sensors in the topology. 
Our proposed method achieves lower root mean square values of 0.06 to 0.08 when localizing Stations 1 to 5. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

11. Towards resilient communities: Evaluating the nonlinear impact of the built environment on COVID-19 transmission risk in residential areas.

Author: Guo, Weiqi, Wang, Jingwei, Liu, Xiaoyu, Pan, Zhenyu, Zhuang, Rui, Li, Chunying, and Tang, Haida
Abstract: • Assessing the community resilience to epidemic based on a GIS grid method. • Residential areas around the city's outer ring roads need specific attention. • Both low-density and high-density construction areas exhibit high epidemic risks. • Socio-economic factors significantly influence the spread of COVID-19. • Property fee and green rate are negatively correlated with the epidemic risk. Reflections on urban epidemics often drive improvements in the resilience of the built environment. However, the assessment regarding the nonlinear influence of the community environment on the spread of Corona Virus Disease 2019 (COVID-19) is inadequate. This study analyzed the influential mechanism of built environment factors on the epidemic risk in residential areas, using Shanghai as a case study. During the lockdown in April 2022, Shanghai reported daily data on COVID-19 outbreaks in residential areas, amounting to a total of 90,324 entries. Based on a GIS-based grid analysis approach, we employed a Random Forest (RF) model and a Multiscale Geographically Weighted Regression (MGWR) model to investigate the marginal effects and spatial heterogeneity of environmental factors on the mean count of COVID-19 outbreak days (MC) in residential areas within each grid zone. The results show that the value of MC forms a ring-mountain distribution surrounding the city's outer ring road. The RF model (R² = 0.57) demonstrates that the house price, population density, family number, and the standard deviation of building height (BH_SD) significantly correlated with MC, with the relative importance of 25 %, 13 %, 11 %, and 6 %, respectively. The MGWR model (R² = 0.63) highlights the spatial heterogeneity of family number, house age, house price, property fee, and delivery density. We also found that property fee and green rate were negatively correlated with the MC. 
These findings help improve responses to public health emergencies and create more resilient communities to cope with pandemics. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

12. The impact of urban morphology on land surface temperature across urban-rural gradients in the Pearl River Delta, China.

Author: Wu, Ying, Che, Yangzi, Liao, Weilin, and Liu, Xiaoping
Abstract: • Investigated 2D and 3D urban morphology impact on LST using urban-rural gradients. • Used RF models to quantify the impact of 9 indicators on LST. • 3D factors have more significant influence on urban core compared to rural area. • SBS, SCD, and MBH are the major 3D factors in rural area, suburb, and urban core. • MBH and SDH mainly show cooling effect, especially in urban core. Optimizing urban morphology effectively mitigates urban heat island effects. However, previous research on urban morphology and land surface temperature (LST) has often concentrated on entire cities or specific local areas, neglecting the heterogeneity within urban regions. This study investigates the impact of two-dimensional (2D) and three-dimensional (3D) urban morphology on LST based on urban-rural gradients. Firstly, we calculated nine 2D and 3D indicators to comprehensively depict the urban morphology. Then, we employed a multi-iterative quantile method based on nighttime light data to delineate three city subclasses: urban core, suburb, and rural area (USR). Finally, we quantified the impact of these indicators on LST through correlation analysis and a random forest (RF) model. Results indicate that, overall and post-USR classification, the sum of building area (SBA) is the primary factor influencing LST, contributing up to 31.6% in suburb. The main 3D factors affecting LST differ across subclasses: in rural area, it is the sum of building surface (SBS, 12.9%); in suburb, it is the spatial congestion degree (SCD, 9.7%); and in urban core, it is the mean building height (MBH, 12.0%). Notably, the influence of 3D indicators increases from 68.1% in rural area to 72.3% in urban core. In urban core, MBH and standard deviation of height (SDH) are negatively correlated with LST. Case studies in Tianhe District (Guangzhou) and Futian District (Shenzhen) confirmed these indicators do have a certain effect on reducing the temperature. 
This study provides valuable insights for improving the urban thermal environment and promoting sustainable urban development. [ABSTRACT FROM AUTHOR]
Published: 2025
Full Text: View/download PDF

13. Predicting The Throughput Of Next Generation IEEE 802.11 WLANs In Dense Deployments.

Author: Mohan, Rajasekar, Ramnan, K Venkat, and J, Manikandan
Subjects: WIRELESS LANs, ARTIFICIAL neural networks, STANDARD deviations, RANDOM forest algorithms, MACHINE learning
Abstract: When next-generation IEEE 802.11 WLANs are deployed in dense environments and complex situations, the throughput achieved is much lower than the estimated values due to interference, overlapping channel bandwidths and contention. Throughput estimation through simulators is tedious and needs elaborate information about the deployment details of overlapping BSS scenarios. With large, accurate datasets of BSS deployments, it is possible to approach the problem of predicting the throughput of each BSS using well-crafted machine learning (ML) models. In this paper, we propose three ML approaches to predict the throughput, namely artificial neural networks (ANN), k-Nearest Neighbours (kNN) regression and random forest regression. The root mean square error and the mean absolute error calculated for each of these approaches in the given setting are promising enough to further pursue more accurate prediction models based on machine learning. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

14. Random Forest Regression to Predict Catalyst Deactivation in Industrial Catalytic Process.

Author: Hanif, Wisnu Hafi and Gunawan, Fergyanto E.
Subjects: RANDOM forest algorithms, MANUFACTURING processes, CATALYST poisoning, MACHINE learning, TIME series analysis, CATALYSTS
Abstract: Catalyst deactivation has become a great concern in industries with heterogeneous catalyst-based production. An accurate model to predict catalyst performance is needed to optimize the maintenance schedule, avoid unplanned shutdowns, and ensure reliable operation. This research work applies a machine learning model to predict catalyst deactivation based on actual data from relevant multitube-reactor sensors. The product conversion is a crucial indicator of the catalyst performance degradation over time. The random forest regression (RFR) algorithm is chosen to construct the model. Hyperparameter tuning is applied and shows improvement over the default model. The results show that the RFR model can predict the conversion as a time series. The feature importance analysis identifies the most influential factor and facilitates model interpretation. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
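Entries 11, 14 and 15 rank predictors by importance. One model-agnostic way to do this is permutation importance: shuffle one feature column and measure how much the prediction error grows. The abstracts do not say whether permutation or impurity-based importance was used, so treat this only as an illustration of the general idea. The sketch assumes an already fitted model exposed as a `predict` function (here a hypothetical linear stand-in) and normalizes the results to percentages, similar in spirit to the "relative importance" figures in entry 11.

```python
import random
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict: Callable[[Matrix], List[float]],
                           X: Matrix, y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Return each feature's share (in %) of the total error increase when shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    raw = []
    for f in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the target
            X_perm = [list(row) for row in X]
            for row, value in zip(X_perm, column):
                row[f] = value
            increases.append(mse(y, predict(X_perm)) - baseline)
        raw.append(max(0.0, sum(increases) / n_repeats))  # clip noise below zero
    total = sum(raw)
    return [100.0 * r / total if total > 0 else 0.0 for r in raw]


if __name__ == "__main__":
    # Hypothetical stand-in for a fitted model: feature 0 matters most,
    # feature 1 a little, feature 2 not at all.
    def model(rows):
        return [3.0 * r[0] + 0.5 * r[1] for r in rows]

    data_rng = random.Random(42)
    X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
    y = model(X)
    print([round(v, 1) for v in permutation_importance(model, X, y)])
```

Because it only needs predictions, the same function works for a random forest, a linear model, or any other regressor, which makes importances comparable across the model families these entries evaluate.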

15. A Recommender System for Predicting Students' Admission to a Graduate Program using Machine Learning Algorithms.

Author: Guabassi, Inssaf El, Bousalem, Zakaria, Marah, Rim, and Qazdar, Aimad
Subjects: RECOMMENDER systems, MACHINE learning, SUPERVISED learning, DECISION trees, GRADUATE education, SUPPORT vector machines, RANDOM forest algorithms, UNIVERSITY & college admission
Abstract: In the 21st century, university education is becoming a key pillar of social and economic life. It plays a major role not only in the educational process but also in ensuring two important things: a prosperous career and financial security. However, predicting university admission can be especially difficult because students are not aware of admission requirements. For that reason, the main purpose of this research work is to provide a recommender system for the early prediction of university admission. The contributions are threefold. The first is to apply several supervised machine learning algorithms, namely Linear Regression, Support Vector Regression, Decision Tree Regression, and Random Forest Regression. The second is to compare and evaluate the algorithms used to create a predictive model based on various evaluation metrics. The third is to determine the most important parameters that influence the chance of admission. The experimental results showed that Random Forest Regression is the most suitable machine learning algorithm for predicting university admission. Also, the Cumulative Grade Point Average is the most important parameter that influences the chance of admission. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

16. Estimation of diaphragm wall deflections for deep braced excavation in anisotropic clays using ensemble learning.

Author: Zhang, Runhong, Wu, Chongzhi, Goh, Anthony T.C., Böhlke, Thomas, and Zhang, Wengang
Abstract: This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on the diaphragm wall deflections in the braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δ_hmax). Then the results of ELMs were compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way. • FE analysis considering soil anisotropy via NGI-ADP model carried out. • Effects of anisotropy on diaphragm wall deflections evaluated. • ELMs as well as soft computing models for prediction of lateral wall deformation. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

17. Exploring the Determinants of Higher Education Degree Productivity in Machine Learning.

Author: Guo, Qianhui and Lee, James J.
Subjects: MACHINE learning, COVID-19 pandemic, HIGHER education, CLASSROOM environment, ONLINE education
Abstract: Studying the measures and determinants of institutional productivity is a critical field for policymakers and institutional leaders seeking an improvement strategy for properly allocating limited resources to achieve higher productivity. Among these measures, degree completion is not only an ultimate outcome for students who are seeking a degree but also one of the essential measures of student success. Studies show that achieving higher degree productivity is the most common concern for postsecondary institutions. By adopting a machine learning approach, this study reveals that cost-related determinants have significant impacts on degree productivity. The results suggest how these determinants work in different scenarios, which further explains previous studies of degree productivity. With more online education being imposed by the lifestyle adaptations to the COVID-19 pandemic, this explanation of degree productivity plays an important role in providing insight into how to allocate resources in an online learning environment. [ABSTRACT FROM AUTHOR]
Published: 2020

18. Mass Appraisal with a Machine Learning Algorithm: Random Forest Regression.

Author: Canaz Sevgen, Sibel and Aliefendioğlu Tanrıvermiş, Yeşim
Abstract: Not available in this index. Copyright of the International Journal of Informatics Technologies is held by the Institute of Informatics, Gazi University; refer to the original published version for the full abstract.
Published: 2020
Full Text: View/download PDF

19. Population estimation by random forest analysis using Social Sensors.

Author: Hara, Hiroki, Fujita, Yoshikatsu, and Tsuda, Kazuhiko
Subjects: BASEBALL teams, DETECTORS, REGRESSION analysis, SPACE groups
Abstract: This paper aims to estimate the population in a specific space from the numbers of posted tweets and their senders, using Twitter's real-time property and location information data. The population to be estimated was set to be the attendance at each game among the six baseball teams of the Japan Professional Baseball Pacific League held at the main stadium of each team. The relation between the attendance and Twitter data was analyzed, and random forest regression models using Twitter data were used to estimate the attendances. While there are many studies on event detection or location identification using Twitter data, no study has been reported on the estimation of the population in a specific space using "time information" and "location information" characteristic of Twitter data. Using Twitter data, which contains users' messages, for estimating the population can be extended to various types of analyses, such as the analysis of feelings and opinions of the groups in the space. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

20. Restaurants store management based on demand forecasting.

Author: Tanizaki, Takashi, Hoshino, Tomohiro, Shimmura, Takeshi, and Takenaka, Takeshi
Abstract: In this paper, restaurant store management based on demand forecasting is proposed. The restaurant service industry has low productivity due to the simultaneity of service goods. To solve such problems, we are researching how to manage restaurant stores (for example, employee placement and food material ordering) based on highly accurate demand forecasting by machine learning, using internal data such as POS data and ubiquitously available external data such as weather and events. In this paper, we discuss the results of forecasting customer order quantity and shop inventory order quantity of draft beer for restaurant chain R using a machine learning forecasting method. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

21. Stock Closing Price Prediction using Machine Learning Techniques.

Author: Vijh, Mehar, Chandola, Deeksha, Tikkiwal, Vinay Anand, and Kumar, Arun
Subjects: STOCK prices, FINANCIAL databases, MACHINE learning, ARTIFICIAL neural networks, FORECASTING, STOCK exchanges
Abstract: Accurate prediction of stock market returns is a very challenging task due to volatile and non-linear nature of the financial stock markets. With the introduction of artificial intelligence and increased computational capabilities, programmed methods of prediction have proved to be more efficient in predicting stock prices. In this work, Artificial Neural Network and Random Forest techniques have been utilized for predicting the next day closing price for five companies belonging to different sectors of operation. The financial data: Open, High, Low and Close prices of stock are used for creating new variables which are used as inputs to the model. The models are evaluated using standard strategic indicators: RMSE and MAPE. The low values of these two indicators show that the models are efficient in predicting stock closing price. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
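The two evaluation indicators named in the abstract above can be written out directly. A minimal Python sketch, assuming the standard textbook definitions of RMSE and MAPE (the paper's own implementation is not described here); the prices below are hypothetical and only illustrate the calls.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent: 100 * mean(|(y - y_hat) / y|)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical closing prices and next-day predictions, for illustration only
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
print(f"RMSE = {rmse(actual, predicted):.4f}")
print(f"MAPE = {mape(actual, predicted):.4f}%")
```

RMSE is expressed in the units of the target (here, price), while MAPE is scale-free, so reporting both helps when comparing errors across stocks with different price levels.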

22. An innovative approach to the assessment of hydro-political risk: A spatially explicit, data driven indicator of hydro-political issues.

Author: Farinosi, F., Giupponi, C., Reynaud, A., Ceccherini, G., Carmona-Moreno, C., De Roo, A., Gonzalez-Sanchez, D., and Bidoglio, G.
Subjects: CLIMATE change, SOCIOECONOMICS, BIOPHYSICS, FRESH water, WATER management
Abstract: Highlights • Data driven spatially explicit index of hydro-political issues magnitude. • Estimation of the non-linear interactions between factors determining water issues. • Increasing climate change and population are likely to boost hydro-political issues. Abstract Competition over limited water resources is one of the main concerns for the coming decades. Although water issues alone have not been the sole trigger for warfare in the past, tensions over freshwater management and use represent one of the main concerns in political relations between riparian states and may exacerbate existing tensions, increase regional instability and social unrest. Previous studies made great efforts to understand how international water management problems were addressed by actors in a more cooperative or confrontational way. In this study, we analyze what are the pre-conditions favoring the insurgence of water management issues in shared water bodies, rather than focusing on the way water issues are then managed among actors. We do so by proposing an innovative analysis of past episodes of conflict and cooperation over transboundary water resources (jointly defined as "hydro-political interactions"). On the one hand, we aim at highlighting the factors that are more relevant in determining water interactions across political boundaries. On the other hand, our objective is to map and monitor the evolution of the likelihood of experiencing hydro-political interactions over space and time, under changing socioeconomic and biophysical scenarios, through a spatially explicit data driven index. Historical cross-border water interactions were used as indicators of the magnitude of corresponding water joint-management issues. These were correlated with information about river basin freshwater availability, climate stress, human pressure on water resources, socioeconomic conditions (including institutional development and power imbalances), and topographic characteristics. 
This analysis allows for identification of the main factors that determine water interactions, such as water availability, population density, power imbalances, and climatic stressors. The proposed model was used to map at high spatial resolution the probability of experiencing hydro-political interactions worldwide. This baseline outline is then compared to four distinct climate and population density projections aimed at estimating trends for hydro-political interactions under future conditions (2050 and 2100), while considering two greenhouse gas emission scenarios (moderate and extreme climate change). The combination of climate and population growth dynamics is expected to impact negatively on the overall hydro-political risk by increasing the likelihood of water interactions in the transboundary river basins, with an average increase ranging from 74.9% (2050 – population and moderate climate change) to 95% (2100 – population and extreme climate change). Future demographic and climatic conditions are expected to exert particular pressure on already water-stressed basins such as the Nile, the Ganges/Brahmaputra, the Indus, the Tigris/Euphrates, and the Colorado. The results of this work allow us to identify current and future areas where water issues are more likely to arise, and where cooperation over water should be actively pursued to avoid possible tensions, especially under changing environmental conditions. From a policy perspective, the index presented in this study can be used to provide a sound quantitative basis for the assessment of Sustainable Development Goal 6, Target 6.5 "Water resources management", and in particular indicator 6.5.2 "Transboundary cooperation". [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

23. RNNs-RT: Flood based Prediction of Human and Animal deaths in Bihar using Recurrent Neural Networks and Regression Techniques.

Author: Khosla, Ekaansh, Ramesh, Dharavath, Sharma, Rashmi Priya, and Nyakotey, Samuel
Subjects: FLOODS, RAINFALL, ANIMAL mortality, POPULATION density, NATURAL disasters
Abstract: A flood is a natural disaster that occurs when an overflow of water submerges land that is typically dry. In many cases of heavy rainfall, residential areas lack an adequate drainage system, which is the main reason behind flood deaths. Floods cause large numbers of deaths in every country, and because of India’s high population density and low development standards, many deaths and much damage occur that could otherwise be avoided if necessary preventive measures were taken. In Bihar, one of India’s most flood-prone states, millions of people and animals are hit by floods every year, and many of them die. To support flood-based prediction, this study proposes a prediction strategy named RNNs-RT. First, the amount of rainfall that may occur in Bihar in future years is predicted using recurrent neural networks (RNNs); then the number of human and animal deaths that may result from that rainfall is predicted using regression techniques (RTs). With this, proper preventive measures can be taken to reduce the damage caused by rain in the upcoming years. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

24. Biophysical and socioeconomic determinants of tea expansion: Apportioning their relative importance for sustainable land use policy.

Author: Zhang, Qianwen, Gao, Wujun, Su, Shiliang, Weng, Min, and Cai, Zhongliang
Subjects: LAND use, SUSTAINABLE agriculture, RANDOM forest algorithms, TEA growing, DATA mining
Abstract: Tea expansion, a typical process of regional land use and cover change (LUCC), has raised great concerns about regional sustainability. In this regard, exploring the determinants of tea expansion should provide critical implications for land use policy. It has been widely recognized that LUCC interacts nonlinearly with a set of determinants and that their feedbacks are likely to be complex. Policy makers now face the challenge of identifying, apportioning, and comparing the determinants of regional tea expansion in order to design more targeted political interventions. Our paper uses a robust tool, random forest (RF) regression, to explore the determinants of tea expansion across two periods (1985–2007 and 2007–2016) in Anji County, a typical tea-producing region in subtropical China. More specifically, tea is extracted from Landsat imagery, and total tea cultivated area acts as the dependent variable. Explanatory variables include 38 potential determinants, divided into two categories (biophysical and socioeconomic) at two levels (pixel and village). We obtain some similar findings in both periods, though the relative importance of determinants varies between them. In general, biophysical determinants (e.g., topography, soil type, land use in the neighborhood) show greater relative importance than socioeconomic determinants in both periods. In 1985–2007, biophysical determinants at pixel level are more essential in governing tea expansion. In 2007–2016, the relative importance of pixel-level biophysical determinants is comparable with that of village-level determinants. Comparisons of the two periods indicate that the relative importance of soil type and socioeconomic proximity becomes greater in 2007–2016, while that of total employees and non-agricultural population proportion becomes lower. Partial dependence plots are further drawn to visualize the marginal effect of each determinant.
We finally propose three options for land use policy towards sustainability. Our study demonstrates that the RF regression is efficient for policy makers to understand the determinants of tea expansion with a nonlinear and complex nature. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF
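The partial dependence plots mentioned in the abstract above show the average RF prediction as one determinant is varied while the others keep their observed values. A minimal sketch of that computation with scikit-learn's `RandomForestRegressor`, on synthetic data; the three "determinants" and their effects are hypothetical, and this is not the authors' code. scikit-learn also ships `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay`, which compute the same average.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def partial_dependence_1d(model, X, feature, grid):
    """Average model prediction with column `feature` fixed at each grid value,
    all other columns kept at their observed values."""
    averages = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = value
        averages.append(model.predict(X_fixed).mean())
    return np.array(averages)


rng = np.random.default_rng(0)
# Synthetic data: three hypothetical determinants scaled to [0, 1]
X = rng.uniform(0.0, 1.0, size=(500, 3))
# Target rises linearly with determinant 0, nonlinearly with 1, and ignores 2
y = 3.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(0.0, 0.1, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
grid = np.linspace(0.05, 0.95, 10)
pd_0 = partial_dependence_1d(model, X, 0, grid)
pd_2 = partial_dependence_1d(model, X, 2, grid)
print("PD of determinant 0:", np.round(pd_0, 2))
print("PD of determinant 2:", np.round(pd_2, 2))
```

A rising curve for determinant 0 and a nearly flat one for determinant 2 is the kind of marginal-effect contrast such plots are meant to reveal.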

25. Impact assessment of human activities on resources of juvenile horseshoe crabs in Hainan coastal areas, China.

Author: Chen, Xiaohai, Gu, Yang-Guang, Ying, Ziwei, Luo, Zimeng, Zhang, Wanling, and Xie, Xiaoyong
Subjects: LIMULIDAE, HUMAN geography, HUMAN resources departments, MARINE parks & reserves, BEACHES, COASTS, MARINE habitats
Abstract: The booming coastal zone economy poses increasing anthropogenic threats to marine life and habitats. Using the endangered living fossil horseshoe crab (HSC) as an example, we quantified the intensity of various anthropogenic pressures along the coast of Hainan Island, China, and for the first time assessed their impact on the distribution of juvenile HSCs through a field survey, remote sensing, spatial geographic modeling, and machine learning methods. The results indicate that Danzhou Bay needs to be protected as a priority based on species and anthropogenic pressure information. Aquaculture and port activities dramatically impact the density of HSCs and should therefore be managed as a priority. Finally, a threshold effect between total, coastal residential, and beach pressure and the density of juvenile HSCs was detected, which indicates the need for a balance between development and conservation as well as the designation of suitable sites for the construction of marine protected areas. • First isolation and quantification of the effects of coastal anthropogenic pressure intensity on juvenile horseshoe crabs. • The impact of human activities on the geographical distribution of juvenile horseshoe crabs along different coasts was revealed for the first time. • Prioritizing decisions for horseshoe crab conservation based on the spatial heterogeneity of anthropogenic pressure intensity and its impact on the geographical distribution of juvenile horseshoe crabs. • Complementing the baseline data for juvenile Asian HSCs on Hainan Island. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

26. Modeling urban canopy air temperature at city-block scale based on urban 3D morphology parameters– A study in Tianjin, North China.

Author: Li, Xiaorui, Yang, Bisheng, Liang, Fuxun, Zhang, Hongsheng, Xu, Yong, and Dong, Zhen
Subjects: ATMOSPHERIC temperature, URBAN morphology, URBAN heat islands, URBAN growth
Abstract: Urban 3D morphology significantly influences the outdoor thermal environment. Understanding the influence of urban expansion in both horizontal and vertical landscapes helps mitigate the canopy urban heat island (CUHI) effect. However, microscale numerical CUHI models are difficult to apply to large-area CUHI effect studies. On the other hand, mesoscale CUHI models make wider study areas workable but lose some model accuracy. To perform a large-area study of the CUHI effect with relatively light computing costs and fine accuracy, this paper builds a canopy air temperature prediction model at city-block scale with urban 3D morphology parameters, including building coverage ratio (BCR), grass coverage ratio (GCR) and mean building height (BH), to obtain the citywide block-mean 2-m temperature (T2M). The model accuracy was validated through RMSEs and comparison with meteorological station data. The proposed model shows an RMSE of 0.286 °C and an R-square of 0.83. Using the validated model, Tianjin, with an area of 647 km², was studied to investigate the effects of vertical landscape on canopy air temperature under different scenarios between 2010 and 2016, including changes in landcover and building heights. The study finds that a 40% increase in BCR may lead to the highest canopy air temperature, and that an increase in BH may raise the canopy air temperature in low-rise and high-rise building areas, with an opposite trend in multi-story and mid-rise building areas. • A citywide canopy air temperature prediction model was proposed at block scale. • A 40% increase in building coverage ratio would cause the highest canopy air temperature. • Canopy air temperature rises with the increase of building height in low-rise and high-rise building areas. • Canopy air temperature decreases with the increase of building height in multi-story and mid-rise building areas.
• The change of urban 3D morphology in Tianjin between 2010 and 2016 has led to a 0.286 °C rise of canopy air temperature. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

27. What matters in the e-commerce era? Modelling and mapping shop rents in Guangzhou, China.

Author: Liu, Xuan, Tong, De, Huang, Jiangming, Zheng, Wenfeng, Kong, Minghui, and Zhou, Guohui
Subjects: RANDOM forest algorithms, CONSUMER behavior, ELECTRONIC commerce, SUBURBS, PEARSON correlation (Statistics), GEOSPATIAL data
Abstract: The rise of e-commerce is changing consumer behaviours and the value of retail space. Tracking the changes in shop rents under the impact of e-commerce and understanding the logic behind those changes is important for urban management. By identifying the factors that reflect the impacts of e-commerce and applying the Pearson correlation coefficient and the LASSO model, this study uses geospatial big data to quantify and evaluate the changing factors that impact shop rents. The rising importance of trade areas and neighbourhood services challenges the traditional emphasis on location for shop rents. Using the random forest regression algorithm, this study also maps the shop rent distribution in Guangzhou, China, and compares the result with the land value distribution in 2015. The scattered and smaller highest-rent centres indicate the decreasing influence of central place logic. The decline of some traditional spots with the highest commercial values and the rise of new catering services centres in suburban areas have been observed, suggesting the changing impacts of agglomeration externalities. • Identify factors that reflect the impacts of e-commerce on shop rents. • Quantify and evaluate the changing factors that impact shop rents with geospatial big data. • The rising importance of trade areas and neighbourhood services challenges the traditional emphasis on location. • Use the random forest regression algorithm to map the shop rent distribution. • Find smaller highest-rent centres, decline of luxury goods retail, and rise of suburban catering services centres. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

28. Investigating the effects of urban morphological factors on seasonal land surface temperature in a "Furnace city" from a block perspective.

Author: Yao, Xiong, Zhu, Zhipeng, Zhou, Xingwen, Shen, Yuanping, Shen, Xiabing, and Xu, Zhanghua
Subjects: LAND surface temperature, NORMALIZED difference vegetation index, URBAN heat islands, SEASONS, RANDOM forest algorithms
Abstract: • The effects of urban block form zone on the seasonal LSTs were explored. • The correlations between LST and urban morphological factors vary with seasons. • Blocks with higher building density and lower building height had higher LST. • Different contributions of urban morphological factors to seasonal LSTs were compared. • More attention should be paid to building form metrics and surface biophysical parameters in urban planning. A growing number of studies have examined the impact of urban morphological factors on surface urban heat islands (SUHI). However, less attention is paid to investigating the comprehensive effect of urban morphological factors on land surface temperature (LST) across seasons, particularly at the block scale. In this study, we investigated 385 blocks in the central part of Fuzhou city, China. Twelve urban morphological factors in four categories were calculated from multi-source data, and the random forest regression (RFR) method was employed to calculate the relative importance of urban morphological factors from a block perspective. Our results confirmed that the blocks with high-density low-rise buildings had the highest LST and distribution index. The correlations between LST and urban morphological factors varied seasonally. The normalized difference vegetation index yielded a positive correlation with LST in winter. The RFR model revealed that twelve urban morphological factors could explain 52–57.8% of LST variation. More importantly, the building form metrics and surface biophysical parameters are the two essential categories affecting LST, contributing >70% of the total relative contribution to LST. These findings provide useful insights to ameliorate the SUHI effect and have substantial implications for urban ecological sustainability from a block perspective. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
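The abstract above reports the share of LST variation explained by the RFR model and the relative contribution of each category of urban morphological factors. A minimal scikit-learn sketch of one way to compute both, assuming out-of-bag R² as the "variance explained" measure and impurity-based importances summed within each category; only the 385-block count comes from the abstract, while the factor names, categories and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 385  # number of blocks in the study; all values below are synthetic

# Hypothetical factor names mapped to hypothetical categories
factors = {
    "building_density": "building form",
    "mean_building_height": "building form",
    "ndvi": "surface biophysical",
    "impervious_fraction": "surface biophysical",
    "road_density": "landscape",
    "block_area": "landscape",
}
X = rng.uniform(0.0, 1.0, size=(n, len(factors)))
# Synthetic LST: driven by the first four factors, unaffected by the last two
lst = 4 * X[:, 0] - 2 * X[:, 1] - 3 * X[:, 2] + X[:, 3] + rng.normal(0.0, 0.5, n)

model = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=0)
model.fit(X, lst)
print(f"OOB R^2 (share of variance explained): {model.oob_score_:.3f}")

# Sum impurity-based importances (which total 1) within each category
shares = {}
for name, importance in zip(factors, model.feature_importances_):
    category = factors[name]
    shares[category] = shares.get(category, 0.0) + importance
for category, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {100 * share:.1f}% of total importance")
```

Impurity-based importances can favour continuous or high-cardinality features; permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check.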

29. Comprehensive efficiency evaluation of wastewater treatment plants in northeast Qinghai–Tibet Plateau using slack–based data envelopment analysis.

Author: Feng, Zhaohui, Liu, Xiaojie, Wang, Lingqing, Wang, Yong, Yang, Jun, Wang, Yazhu, Huan, Yizhong, Liang, Tao, and Yu, Qiming Jimmy
Subjects: SEWAGE disposal plants, DATA envelopment analysis, WASTEWATER treatment, KRUSKAL-Wallis Test, RANDOM forest algorithms
Abstract: A comprehensive efficiency analysis of wastewater treatment plants (WWTPs) in an alpine region with a harsh environment, limited technology and little management experience could provide targeted and effective evidence for improving the local wastewater treatment industry and help to improve the water quality of downstream reaches. In this paper, slack–based data envelopment analysis (SBM–DEA) was adopted to assess the operating efficiencies of WWTPs in the northeast Qinghai–Tibet Plateau (QTP). Results showed that the average efficiency score for all WWTPs was 0.608, and 32.5% of WWTPs were efficient. Some WWTPs had large improvement potential in operating costs and pollutant removal rates. Given these improvement potentials, lowering expenditures and promoting facility construction were necessary for WWTPs to overcome climate difficulties and improve their management level. In addition, the relative importance of the quantitative influencing factors to efficiency scores, calculated by random forest regression (RFR), indicated that design capacity and temperature were important quantitative factors affecting the performance of WWTPs. Furthermore, the Kruskal–Wallis test verified that geographical location and design capacity also had a significant influence on the comprehensive efficiency of WWTPs. Our results highlight the importance of facility upgrading and scientific management for WWTPs. The related improvement suggestions for overcoming the high-altitude, cold environment should also be considered for the efficient operation of WWTPs and the protection of the aquatic environment. • Operating efficiency of WWTPs in the northeast QTP was evaluated by SBM–DEA. • Some WWTPs had improvement potential in cost savings and pollutant removal rates. • Temperature and design capacity were important factors affecting efficiency. • Lower altitude and higher temperature were more beneficial to the operation of WWTPs.
• Insulation, upgrading and reconstruction were necessary for WWTPs in the QTP. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

30. Prediction of surface roughness during hard turning of AISI 4340 steel (69 HRC).

Author: Agrawal, Anupam, Goel, Saurav, Rashid, Waleed Bin, and Price, Mark
Subjects: SURFACE roughness, STEEL analysis, INTERFEROMETERS, REGRESSION analysis, STATISTICAL correlation
Abstract: In this study, 39 sets of hard turning (HT) experimental trials were performed on a Mori-Seiki SL-25Y (4-axis) computer numerical controlled (CNC) lathe to study the effect of cutting parameters in influencing the machined surface roughness. In all the trials, AISI 4340 steel workpiece (hardened up to 69 HRC) was machined with a commercially available CBN insert (Warren Tooling Limited, UK) under dry conditions. The surface topography of the machined samples was examined by using a white light interferometer and a reconfirmation of measurement was done using a Form Talysurf. The machining outcome was used as an input to develop various regression models to predict the average machined surface roughness on this material. Three regression models – Multiple regression, Random forest, and Quantile regression were applied to the experimental outcomes. To the best of the authors’ knowledge, this paper is the first to apply random forest or quantile regression techniques to the machining domain. The performance of these models was compared to ascertain how feed, depth of cut, and spindle speed affect surface roughness and finally to obtain a mathematical equation correlating these variables. It was concluded that the random forest regression model is a superior choice over multiple regression models for prediction of surface roughness during machining of AISI 4340 steel (69 HRC). [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

31. Machine learning approach to predict terrestrial gross primary productivity using topographical and remote sensing data.

Author: Prakash Sarkar, Deep, Uma Shankar, B., and Ranjan Parida, Bikash
Subjects: REMOTE sensing, CARBON cycle, RANDOM forest algorithms, MACHINE learning, ECOSYSTEM health, ENVIRONMENTAL health, SPATIAL resolution
Abstract: • Proposed an accurate GPP prediction with Random Forest Regression (RFR). • Meteorological and topographical features combined with remote sensing features. • RFR model is compared to state-of-the-art machine learning models. • Effectiveness of different modality/feature combinations is evaluated. • Prediction of GPP based on Plant Functional Types is analysed. Gross Primary Productivity (GPP) is the amount of CO₂ sequestered during plant photosynthesis. GPP is an important indicator of ecosystem health in various ecologies and for assessing climate change. The objective of the present work is to propose a machine learning based GPP estimation model using remote sensing (RS) data in combination with meteorological data (MET) and topographical data (TOPO) for prediction of GPP, which can be upscaled in temporal and spatial resolution. Random Forest Regression (RFR) is proposed for this using the Fluxnet2015 GPP dataset for the Australian region. The model attained a very high accuracy, with an R² value of 0.82, as estimated by 10-fold cross-validation. The model has been compared with state-of-the-art machine learning models and found to perform better than the others. Different feature sets, such as MET-features and TOPO-features, were evaluated in combination with RS-features. The results showed that the RFR model performed better when MET and TOPO features were combined with RS-features. GPP prediction for the year 2014, at 8-day temporal and 500 m spatial resolution, for the Australian region for different plant functional types is demonstrated using the proposed model and produced a very high R² (0.84) when compared to ground truth. Thus, the proposed RFR approach for GPP estimation showed significant improvement for regional carbon cycle studies and can also be employed to simulate GPP under different future climate scenarios. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
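The 10-fold cross-validated R² and the feature-set comparison described in the abstract above can be sketched with scikit-learn's `KFold` and `cross_val_score`. This is a minimal illustration on synthetic data, assuming one hypothetical stand-in variable per feature group (RS, MET, TOPO); it does not use the Fluxnet2015 data or reproduce the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(2)
n = 400
# Hypothetical stand-ins for one feature from each group (synthetic values)
rs = rng.uniform(0.0, 1.0, n)    # remote sensing, e.g. a vegetation index
met = rng.uniform(0.0, 1.0, n)   # meteorological, e.g. radiation
topo = rng.uniform(0.0, 1.0, n)  # topographical, e.g. elevation
gpp = 5.0 * rs * met + np.sin(4.0 * topo) + rng.normal(0.0, 0.2, n)  # synthetic target

feature_sets = {
    "RS only": np.column_stack([rs]),
    "RS + MET": np.column_stack([rs, met]),
    "RS + MET + TOPO": np.column_stack([rs, met, topo]),
}
# Shuffled 10-fold split, reused for every feature set so scores are comparable
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = {}
for name, X in feature_sets.items():
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    scores[name] = cross_val_score(rf, X, gpp, cv=cv, scoring="r2")
    print(f"{name}: mean 10-fold R^2 = {scores[name].mean():.3f} (sd {scores[name].std():.3f})")
```

Reusing the same `cv` object keeps the folds identical across feature sets, so differences in mean R² reflect the features rather than the split. Spatially or temporally grouped folds may be more appropriate for flux-tower data than the random folds assumed here.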

32. Analysis of affected population vulnerability to rainstorms and its induced floods at county level: A case study of Zhejiang Province, China.

Author: Liao, Xinli, Xu, Wei, Zhang, Junlin, Qiao, Yu, and Meng, Chenna
Abstract: The impacts of rainstorms and their induced floods (RAIF) are substantial and rising, and the coastal regions of southeastern China may suffer more because of the frequent occurrence of RAIF. Therefore, research on the vulnerability of the affected population to RAIF is vital. This study presents an assessment framework for vulnerability in Zhejiang Province, China, at county level. Based on data related to loss records, precipitation, population, economy, topography and hydrology, relatively important variables were first selected by random forest regression and agglomerative hierarchical clustering to avoid multicollinearity and over-fitting. We then established and validated a vulnerability model and analyzed the importance score and response curve of each variable. The results indicated the following: (1) The counties suffering more from RAIF were mostly distributed in the southeast and west of Zhejiang Province; approximately 1.42 million people were affected per year. (2) The R² of the vulnerability model based on random forest regression was 0.41, and the largest multiple-day rainfall was the most important driver of the population affected by RAIF. (3) The response curve of the largest multiple-day rainfall first increased and then stabilized; that of GDP per capita first decreased sharply and then tended to stabilize; that of population first increased, then decreased, and then showed an increasing trend again. An innovative aspect of this work was the use of machine learning to analyze the vulnerability and the non-linear relationships between variables and the affected population, and these results may help policymakers develop suitable mitigation strategies against RAIF. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
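The abstract above combines random forest regression with agglomerative hierarchical clustering to select variables while avoiding multicollinearity. One plausible reading, sketched here as an assumption rather than the authors' procedure: cluster variables on the distance 1 − |r| (Pearson correlation), then keep the variable with the highest RF importance from each cluster. The 0.3 cut-off, the variable names and the data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestRegressor


def select_variables(X, y, names, max_distance=0.3, random_state=0):
    """Cluster correlated variables, then keep the most important one per cluster."""
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)
    dist = (dist + dist.T) / 2.0  # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=max_distance, criterion="distance")
    rf = RandomForestRegressor(n_estimators=300, random_state=random_state)
    importances = rf.fit(X, y).feature_importances_
    selected = []
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        best = members[np.argmax(importances[members])]
        selected.append(names[best])
    return sorted(selected)


rng = np.random.default_rng(3)
n = 300
rain_max = rng.gamma(2.0, 50.0, n)                 # hypothetical largest multiple-day rainfall
rain_total = 1.5 * rain_max + rng.normal(0, 5, n)  # deliberately collinear with rain_max
gdp = rng.lognormal(10.0, 0.5, n)                  # hypothetical GDP per capita
slope = rng.uniform(0.0, 30.0, n)                  # hypothetical mean slope
X = np.column_stack([rain_max, rain_total, gdp, slope])
affected = 2.0 * rain_max - 0.001 * gdp + rng.normal(0, 10, n)  # synthetic response

names = ["max_multiday_rainfall", "total_rainfall", "gdp_per_capita", "mean_slope"]
selected = select_variables(X, affected, names)
print("selected:", selected)
```

The two rainfall variables fall into one cluster, so only one of them survives. This is the multicollinearity reduction the abstract describes, though the choice of correlation measure and linkage method here is an assumption.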

33. Discovering optimal strategies for mitigating COVID-19 spread using machine learning: Experience from Asia.

Author: Pan, Yue, Zhang, Limao, Yan, Zhenzhen, Lwin, May O., and Skibniewski, Miroslaw J.
Subjects: COVID-19, COVID-19 pandemic, MACHINE learning, SOCIAL adjustment, RANDOM forest algorithms
Abstract: • A spatiotemporal analysis framework is proposed to study COVID-19 evolution in four Asian countries. • Random forest learns various features to capture pandemic dynamics and make time-series predictions. • NSGA-II determines optimal solutions to minimize growth of confirmed cases and deaths. • Two features, temperature and stringency, play the most important role in accurate prediction. • Adjustment of social features can bring larger improvements in fighting COVID-19. To inform data-driven decisions in fighting the global pandemic caused by COVID-19, this research develops a spatiotemporal analysis framework that combines an ensemble model (random forest regression) with a multi-objective optimization algorithm (NSGA-II). It has been verified for four Asian countries: Japan, South Korea, Pakistan, and Nepal. Accordingly, we can gain valuable experience to better understand the disease evolution and forecast the prevalence of the disease, which can provide sustainable evidence to guide further intervention and management. Random forest with a proper rolling time-window can learn the combined effects of environmental and social factors to accurately predict the daily growth of confirmed cases and the daily death rate on a national scale; NSGA-II then finds a range of Pareto optimal solutions that minimize the infection rate and mortality at the same time. Experimental results demonstrate that the predictive model can alert the local government in advance, allowing time to put forward relevant measures. Temperature in the environmental category and the stringency index among the social factors are identified as the top two features exerting the greatest impact on virus transmission.
Moreover, the optimal solutions provide references for designing the best control strategies for pandemic containment and prevention that accommodate country-specific circumstances, and they could decrease the two objectives by more than 95%. In particular, appropriate adjustment of social features should take priority over other adjustments, since it can bring about at least a 1.47% average improvement in the two objectives compared with environmental factors. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
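The rolling time-window random forest in the abstract above can be sketched as a one-step-ahead loop: for each day t, fit on the preceding `window` days and predict day t. This is a minimal illustration under stated assumptions. The window length, the two synthetic features (a temperature series and a 0 to 100 stringency-like index), the target series and the helper name `rolling_window_forecast` are all hypothetical, and the NSGA-II optimisation stage is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rolling_window_forecast(X, y, window, **rf_kwargs):
    """One-step-ahead forecasts: for each day t >= window, fit on days
    [t - window, t) and predict day t from that day's features."""
    preds = []
    for t in range(window, len(y)):
        model = RandomForestRegressor(**rf_kwargs)
        model.fit(X[t - window:t], y[t - window:t])
        preds.append(model.predict(X[t:t + 1])[0])
    return np.array(preds)


rng = np.random.default_rng(4)
days = 90
# Synthetic, hypothetical daily features
temperature = 20 + 8 * np.sin(np.arange(days) / 15) + rng.normal(0, 1, days)
stringency = np.clip(np.cumsum(rng.normal(0.5, 2, days)), 0, 100)
X = np.column_stack([temperature, stringency])
# Synthetic daily growth rate of confirmed cases
growth = 0.05 - 0.0005 * stringency + 0.001 * (temperature - 20) + rng.normal(0, 0.002, days)

window = 30
preds = rolling_window_forecast(X, growth, window, n_estimators=100, random_state=0)
mae = np.mean(np.abs(preds - growth[window:]))
print(f"{len(preds)} one-step forecasts, MAE = {mae:.4f}")
```

Refitting on a sliding window lets the model track changing relationships over the course of a pandemic. The trade-off is that a random forest cannot extrapolate beyond the range of targets seen in the current window.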

34. Investigating the spatial distribution of antimony geochemical anomalies located in the Yunnan-Guizhou-Guangxi region, China.

Author: Zhao, Dongjie and Wang, Xueqiu
Subjects: GOLD ores, RIVER sediments, MINES & mineral resources, HYDROTHERMAL deposits, ANTIMONY, RANDOM forest algorithms, ALGORITHMS
Abstract: The Yunnan-Guizhou-Guangxi region is well known for its abundant mineral resources, and low-temperature hydrothermal mineralization represented by the elemental association of gold, arsenic, antimony and mercury is widely developed there. Many studies on the geological-geochemical characteristics of gold have been conducted, but a comprehensive understanding of the antimony geochemical pattern is still lacking. This paper studied the Sb distribution characteristics and the cause of geochemical anomalies based on the geochemical data of stream sediments and rocks in the study area. In addition, the geochemical data of Au, As, Sb and Hg were centered and log-ratio transformed to eliminate the closure effect, and then random forest regression (RFR) with Au, As and Hg as the characteristic variables was used to investigate the ore-related geochemical anomalies of Sb. Seven geochemical provinces were delineated from the original geochemical data, and they are not entirely consistent with the known deposits. Sb moves from the rocks to the stream sediments during weathering. The variation trend in the Sb background values in stream sediments in each tectonic unit is consistent with that in the rocks themselves, implying that Sb in the stream sediments is inherited from the background rocks. The distributions of Sb predicted by RFR are similar to the distribution pattern of Sb in stream sediments. Of the three elements considered, the influence of As on the variations in the Sb geochemical background is the greatest, followed by Au and then Hg. The geochemical anomalies extracted by the residuals produced in this algorithm are consistent with where the known Sb metallogenic district is located, indicating that this method of recognizing geochemical anomalies is feasible and effective and has theoretical and practical significance. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
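
The antimony study above prepares its compositional data with a log-ratio transform before fitting random forest regression. As a minimal sketch, assuming the transform is the standard centered log-ratio (CLR), in which each part is divided by the sample's geometric mean before taking logs, the preprocessing step could look like this in Python. The element order, the helper names `clr` and `features_and_target`, and the sample values are hypothetical and not taken from the paper; the random forest fit and the residual step (observed minus predicted Sb) are left to whichever regressor is used.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical element order; Sb is the target, Au/As/Hg are the predictors.
ELEMENTS = ["Au", "As", "Sb", "Hg"]


def clr(parts: List[float]) -> List[float]:
    """Centered log-ratio: ln(x_i / g(x)), where g(x) is the geometric mean."""
    if any(p <= 0 for p in parts):
        raise ValueError("CLR is only defined for strictly positive parts")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)  # ln of the geometric mean
    return [v - mean_log for v in logs]


def features_and_target(sample: Dict[str, float]) -> Tuple[List[float], float]:
    """CLR-transform one sample, then split it into (Au, As, Hg) features and the Sb target."""
    t = dict(zip(ELEMENTS, clr([sample[e] for e in ELEMENTS])))
    return [t["Au"], t["As"], t["Hg"]], t["Sb"]


# Invented concentrations, all in mg/kg so that the ratios are unit-consistent.
sample = {"Au": 0.002, "As": 15.0, "Sb": 1.2, "Hg": 0.08}
x, y = features_and_target(sample)
print(x, y)
```

A quick sanity check on any CLR output is that the coordinates of a single sample sum to zero.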

35. Random forest regression analysis on combined role of meteorological indicators in disease dissemination in an Indian city: A case study of New Delhi.

Author: Hariharan, Ramya
Abstract: Meteorological parameters strongly influence disease transmission in urban localities. This case study investigates the combined influence of daily mean temperature, absolute humidity and average wind speed on the attack rate and mortality rate of the COVID-19 rise in Delhi, India. A random forest regression algorithm was used to relate the epidemiological parameters to the meteorological parameters, and the model's performance was evaluated using statistical performance metrics. The random forest model shows strong positive correlations between the predictor parameters and the attack rate (96.09%) and mortality rate (93.85%). For both response variables, absolute humidity was the most influential variable. In addition, both temperature and wind speed showed a moderate positive influence on the transmission and survival of coronavirus during the study period. The synergistic effect of absolute humidity with temperature and wind speed in increasing the attack and mortality rates is also addressed. The increase in coronavirus infection under the observed weather conditions is attributed to the inhibition of respiratory droplet evaporation, the increase in droplet size due to hygroscopic effects, and the longer survival of coronavirus carried in respiratory droplets. • Combined effect of urban climatic parameters on COVID-19 dissemination. • Random forest regression model built to understand the synergistic effect. • Increase in absolute humidity aids the increase of COVID-19 transmission in Delhi. • Inhibition of respiratory droplet evaporation attributed to high humidity. • High correlation coefficient (>92%) achieved with the chosen machine learning model. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

36. Experimental study on erosion behavior of fracturing pipeline involving tensile stress and erosion prediction using random forest regression.

Author: Yang, Siqi, Zhang, Laibin, Fan, Jianchun, and Sun, Bingcai
Subjects: MATERIAL erosion, RANDOM forest algorithms, ARTHRITIS, PIPELINES, PIPE fittings, FRACTURING fluids, PIPELINE failures, HYDRAULIC fracturing
Abstract: Damage induced by erosion wear is an inevitable problem in the oil and gas industry. During hydraulic fracturing, the fracturing pipeline is not only subjected to tensile stress caused by the high internal pressure but also inevitably suffers erosion damage from solid particles carried by the fracturing fluid. Because accurate methods for predicting the erosion of fracturing pipelines under operating conditions are lacking, it is difficult to prevent pipe fitting failures caused by erosion wear. Therefore, in this paper, erosion wear experiments on fracturing pipelines were carried out under varying conditions (including impact angle, tensile stress, target material, flow velocity and particle concentration). The results indicate that tensile stress plays a crucial role in the erosion wear rate. Furthermore, erosion wear prediction models were built from the experimental data using different machine learning algorithms, and their predictions were validated against the experiments via error analysis. The random forest regression (RFR) approach showed good prediction accuracy and generalization ability, making it a potential solution for predicting slurry erosion wear in fracturing pipelines that may be extended to high-pressure pipelines in general. • Slurry erosion experiments on fracturing pipelines involving different factors were carried out. • The erosion resistances of different steel materials taken from fracturing pipelines were compared. • The applied tensile stress plays a pivotal role in the erosion wear of the target material. • A novel erosion prediction approach for high-pressure pipelines based on the RFR algorithm was presented. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

37. Applying machine learning methods to better understand, model and estimate mass concentrations of traffic-related pollutants at a typical street canyon.

Author: Šimić, Iva, Lovrić, Mario, Godec, Ranka, Kröll, Mark, and Bešlić, Ivan
Subjects: AIR pollutants, MACHINE learning, POLLUTANTS, CANYONS, SUPPORT vector machines, URBAN planning, PARTIAL least squares regression, SPATIAL variation
Abstract: Narrow city streets surrounded by tall buildings tend to produce a "canyon" effect, in which pollutants accumulate strongly in a relatively small area because ventilation is weak or nonexistent. In this study, mass concentrations of nitrogen dioxide (NO2) and of elemental carbon (EC) and organic carbon (OC) in PM10 particles were determined and compared across seasons and years. Daily samples were collected at one such street canyon location in the center of Zagreb in 2011, 2012 and 2013. By applying machine learning methods, we showed seasonal and yearly variations in the mass concentrations of carbon species in PM10 and of NO2, as well as their covariations and relationships. Furthermore, we compared the predictive capabilities of five regressors (Lasso, Random Forest, AdaBoost, Support Vector Machine and Partial Least Squares), with Lasso regression being the best-performing algorithm overall. By showing the feature importance for each model, we revealed the true predictors for each target. These measurements and this application of machine learning to pollutants were carried out for the first time at a street canyon site in the city of Zagreb, Croatia. • Machine learning methods applied to traffic-related pollutants at a street canyon site in Croatia. • Feature selection and model explainability were improved with permutation importance. • The predictability of pollutants by regression models increases in the order PM10 < NO2 < EC < OC. • Test-set predictability of pollutants is important for future urban planning and air quality. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
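
The Zagreb study reports using permutation importance to explain its regression models. The idea is to shuffle one feature column at a time and measure how much the model's score drops; a feature the model ignores produces no drop. The following is a minimal, library-free Python sketch under stated assumptions: the model is any `predict` function over a row of features, the score is R², and `perm_importance` and `r_squared` are hypothetical helper names, not the authors' code or a particular library's API. In practice the score is usually computed on held-out test data.

```python
import random
from typing import Callable, List

Row = List[float]


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def perm_importance(predict: Callable[[Row], float], X: List[Row], y: List[float],
                    n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in R^2 when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = r_squared(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r_squared(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (invented): y depends only on feature 0; the model ignores feature 1.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]
model = lambda row: 2.0 * row[0]
print(perm_importance(model, X, y))
```

Because the toy model never reads feature 1, its importance comes out as exactly zero, while shuffling feature 0 lowers R².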

38. Estimating hourly average indoor PM2.5 using the random forest approach in two megacities, China.

Author: Xu, Chunyu, Xu, Dongqun, Liu, Zhe, Li, Yunpu, Li, Ning, Chartier, Ryan, Chang, Junrui, Wang, Qin, Wu, Yaxi, and Li, Na
Subjects: APARTMENT buildings, MEGALOPOLIS, STANDARD deviations, PARTICULATE matter
Abstract: This study developed a predictive model for hourly indoor fine particulate matter (PM2.5) concentrations based on the random forest regression (RFR) method and compared its performance with that of the traditional multiple linear regression (MLR) method. Indoor and outdoor PM2.5 concentrations were monitored in a total of 66 apartments in Nanjing (NJ) and Beijing (BJ), China, during both the heating and non-heating seasons. In total, 14,442 pairs of hourly indoor and outdoor PM2.5 concentrations were measured with a light-scattering nephelometer, while potential influencing factors were obtained via questionnaires. Hourly indoor PM2.5 prediction models were developed based on either the RFR or the MLR method, and ten-fold cross-validation (10-fold CV) was used to evaluate their predictive power. The 10-fold CV results showed that the MLR models agree fairly well with the measured data, with coefficients of determination (R²) ranging from 0.70 (BJ) to 0.73 (NJ) and root mean square errors (RMSE) ranging from 28.0 μg/m³ (NJ) to 28.2 μg/m³ (BJ). Overall, the RFR models outperformed the reference MLR method, as indicated by higher CV R² (0.82 in BJ and 0.78 in NJ) and lower CV RMSE (20.4 μg/m³ in BJ and 24.3 μg/m³ in NJ). Our results show that the RFR approach can exceed the predictive power of the classic MLR method and is a promising methodology for estimating indoor PM2.5 concentrations in Chinese megacities when direct PM2.5 measurements are not possible. • High intraday variations in hourly indoor PM2.5 were detected. • Random forest regression (RFR) was applied to model hourly indoor PM2.5. • RFR performed better than the traditional multiple linear regression (MLR) model. • Outdoor PM2.5 levels were the most important predictor of indoor PM2.5. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
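
The indoor PM2.5 study evaluates its models with ten-fold cross-validation and reports CV R² and RMSE. A minimal Python sketch of such an evaluation loop follows. It assumes the CV metrics are computed once over the pooled out-of-fold predictions (the abstract does not say whether per-fold scores were averaged instead), and it uses a one-predictor least-squares line as a stand-in model; the helper names `kfold_indices`, `fit_line` and `cross_validate` and the synthetic data are hypothetical. A random forest or a multiple linear regression could be swapped in through the same `fit` argument.

```python
import math
import random
from typing import Callable, List, Tuple

Model = Callable[[float], float]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_line(x: List[float], y: List[float]) -> Model:
    """Ordinary least squares for y = a + b * x (a stand-in for RFR or MLR)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def cross_validate(x: List[float], y: List[float],
                   fit: Callable[[List[float], List[float]], Model],
                   k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Return (CV R^2, CV RMSE) computed from pooled out-of-fold predictions."""
    n = len(y)
    preds = [0.0] * n
    for test_idx in kfold_indices(n, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            preds[i] = model(x[i])  # predicted by a model that never saw point i
    ss_res = sum((p - t) ** 2 for p, t in zip(preds, y))
    mean_y = sum(y) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Synthetic, invented data: indoor = 0.6 * outdoor + 10 + noise (ug/m3).
rng = random.Random(1)
outdoor = [rng.uniform(10, 200) for _ in range(200)]
indoor = [0.6 * o + 10 + rng.gauss(0, 5) for o in outdoor]
r2, rmse = cross_validate(outdoor, indoor, fit_line)
print(f"CV R2 = {r2:.2f}, CV RMSE = {rmse:.1f} ug/m3")
```

Pooling out-of-fold predictions gives every observation exactly one prediction from a model that did not train on it, which keeps the CV R² and RMSE directly comparable between two competing model types.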

39. Analyzing parcel-level relationships between Luojia 1-01 nighttime light intensity and artificial surface features across Shanghai, China: A comparison with NPP-VIIRS data.

Author: Wang, Congxiao, Chen, Zuoqi, Yang, Chengshu, Li, Qiaoxuan, Wu, Qiusheng, Wu, Jianping, Zhang, Guo, and Yu, Bailang
Subjects: LIGHT intensity, INFRARED imaging, REMOTE sensing, BODIES of water, REGRESSION analysis
Abstract: • Parcel-level artificial surface features were linked to Luojia 1-01 nighttime light (NTL) imagery. • NTL variations among artificial surface features in Shanghai were explained, with NPP-VIIRS data used for comparison. • Luojia 1-01 showed fewer "blooming" phenomena than NPP-VIIRS data. • Luojia 1-01 is more suitable than NPP-VIIRS data for estimating socioeconomic activities at a finer scale. Nighttime light (NTL) remote sensing data have been widely used to derive socioeconomic indices at national and regional scales. However, because of the limited resolution of existing NTL data, few studies have analyzed the factors that may explain NTL variations at a fine scale. As a new-generation NTL satellite, Luojia 1-01 provides NTL data with a finer spatial resolution of ∼130 m and can be used to assess the relationship between NTL intensity and artificial surface features at an unprecedented scale. This study represents the first effort to assess the relationship between Luojia 1-01 NTL intensity and artificial surface features at the parcel level, in comparison with Suomi National Polar-orbiting Partnership-Visible Infrared Imaging Radiometer Suite (NPP-VIIRS) NTL data. Points-of-interest (POIs) and land-use/land-cover (LULC) data were used in random forest (RF) regression models for both Luojia 1-01 and NPP-VIIRS to analyze the contribution of artificial surface features to NTL intensity. The results show that luminosity variations across land-use types were more significant in Luojia 1-01 data than in NPP-VIIRS data because of its finer spatial resolution and wider measurement range. Seventeen variables extracted from POI and LULC data explained the Luojia 1-01 and NPP-VIIRS NTL intensity, with good out-of-bag scores of 0.62 and 0.76, respectively. Moreover, Luojia 1-01 data showed fewer "blooming" phenomena than NPP-VIIRS data, especially for cropland, water bodies, and rural areas. Luojia 1-01 is more suitable for estimating socioeconomic activities and can capture more comprehensive information on human activities, since the feature contribution of POI variables is more sensitive to NTL intensity in the Luojia 1-01 RF regression model than in the NPP-VIIRS RF regression model. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF
