9 results
Search Results
2. Logistic regression versus XGBoost for detecting burned areas using satellite images.
- Author
- Militino, A. F., Goyena, H., Pérez-Goya, U., and Ugarte, M. D.
- Subjects
- MODIS (Spectroradiometer), MACHINE learning, BOOSTING algorithms, LOGISTIC regression analysis, REMOTE-sensing images, LANDSAT satellites
- Abstract
Classical statistical methods are generally held to be advantageous for small datasets, whereas machine learning algorithms can excel with larger datasets. Our paper challenges this conventional wisdom by addressing a highly significant problem: the identification of burned areas through satellite imagery, which is a clear example of imbalanced data. The methods are illustrated in North-Central Portugal and North-West Spain in October 2017 within a multi-temporal setting of satellite imagery. Daily satellite images are taken from Moderate Resolution Imaging Spectroradiometer (MODIS) products. Our analysis shows that a classical logistic regression (LR) model performs on par with, if not better than, a widely employed machine learning algorithm, the extreme gradient boosting algorithm (XGBoost), within this particular domain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
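The abstract does not say how the class imbalance was handled for LR. One common option is to reweight the classes in the log-likelihood. The sketch below is a minimal pure-Python, class-weighted logistic regression fitted by gradient descent; the "balanced" weighting, the single burn-index-like feature, and the toy labels are all hypothetical and are not the authors' model or data.

```python
import math

def balanced_weights(y):
    """Per-class weights n / (2 * n_c) for a 0/1 target, so both classes carry equal total weight."""
    n = len(y)
    n_pos = sum(y)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_lr(X, y, lr=1.0, epochs=3000):
    """Batch gradient descent on the class-weighted logistic log-loss."""
    cw = balanced_weights(y)
    total = sum(cw[t] for t in y)  # normalizer for the weighted mean gradient
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = cw[yi] * (p - yi)  # weighted residual: the rare class counts more
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (burned) if the predicted probability reaches the threshold, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)

# Hypothetical toy data: one burn-index-like feature, 8 unburned vs 2 burned pixels
X = [[0.05], [0.10], [0.12], [0.20], [0.25], [0.30], [0.33], [0.35], [0.80], [0.90]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
w, b = fit_weighted_lr(X, y)
print([predict(w, b, xi) for xi in X])
```

Reweighting keeps the model a plain LR (interpretable coefficients) while stopping the majority class from dominating the loss; resampling or threshold tuning are alternatives.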
3. Probabilistic feature selection for improved asset lifetime estimation in renewables. Application to transformers in photovoltaic power plants.
- Author
- Ramirez, Ibai, Aizpurua, Jose I., Lasa, Iker, and del Rio, Luis
- Subjects
- FEATURE selection, POWER transformers, RENEWABLE energy sources, TRANSFORMER models, PHOTOVOLTAIC power systems, ELECTRIC power distribution grids, POWER plants
- Abstract
The increased penetration of renewable energy sources (RESs), an effective mechanism to reduce carbon emissions, increases the weather dependency of power and energy systems. This has created dynamic operation and degradation phenomena, which affect the lifetime estimation of assets operated with RESs. For the reliable and efficient operation of RESs, it is crucial to monitor the health of their constituent components, and feature selection is a key step in building robust and accurate health monitoring approaches. In this context, this paper presents a probabilistic feature selection approach that weights and selects features through a heuristic, iterative process to improve asset lifetime estimation. Power transformers are key power grid assets, and they are used to demonstrate the validity and impact of the proposed approach. The approach is tested on two photovoltaic power plants operated in Spain and Australia. Results show that the proposed feature selection approach reduces the prediction error and consistently selects relevant features. Although the approach has been applied to transformer lifetime estimation, it can be applied generally to assist the lifetime estimation of other components operated in RESs. Part of the studies presented here, as well as the source code, are open source in the GitHub repository https://github.com/iramirezg/FeatureSelection.
• Probabilistic feature selection approach for improved asset lifetime estimation.
• Integration of environmental features for improved lifetime estimation of renewable-operated assets.
• Systematic and robust feature weighting methodology.
• Improved transformer lifetime estimation including sensor and environmental data.
• Validated on two real photovoltaic power plant case studies.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Tackling data challenges in forecasting effluent characteristics of wastewater treatment plants.
- Author
- Roohi, Ali Mohammad, Nazif, Sara, and Ramazi, Pouria
- Subjects
- SEWAGE disposal plants, MACHINE learning, FORECASTING, BAYESIAN analysis, MISSING data (Statistics), ELECTRIC conductivity, PERCENTILES
- Abstract
In wastewater treatment plants (WWTPs), the stochastic nature of influent wastewater, together with operational and weather conditions, causes fluctuations in effluent quality. Data-driven models can forecast effluent quality a few hours ahead from the influent characteristics, providing enough time to adjust system operations and avoid undesired consequences. However, the existing data for training models are often incomplete and contain missing values, while collecting additional data by installing new sensors is costly. The trade-off between using existing incomplete data and collecting costly new data gives rise to three data challenges in developing data-driven WWTP effluent forecasters: determining which important variables to measure, the minimum number of data instances required, and the maximum percentage of missing values that can be tolerated without impeding the development of an accurate model. As previous studies have not discussed these issues, this research provides, for the first time, a comprehensive analysis to answer these challenges. Another issue that arises in all data-driven modeling is how to select an appropriate forecasting model. This paper addresses these issues by first testing nine machine learning models on data collected from three wastewater treatment plants located in Iran, Australia, and Spain. The most accurate forecaster, a Bayesian network, was then used to address the articulated challenges. The key variables for forecasting effluent characteristics were flow rate, total suspended solids, electrical conductivity, phosphorus compounds, wastewater temperature, and air temperature. A minimum of 250 training samples was needed to achieve a substantial reduction in forecasting error. Moreover, the error increased steeply once the portion of missing values exceeded 10%. The results help plant managers estimate the data collection effort needed to obtain an accurate forecaster, contributing to effluent quality.
• A Bayesian network is benchmarked against eight AI models to predict wastewater quality.
• Bayesian networks can handle missing values in input data due to their probabilistic nature.
• Key variables in WWTP quality and quantity were specified.
• Input data quality plays a major role in the successful development of data-driven models.
• The prediction error sharply increases when the portion of missing data exceeds 10%.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
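The abstract reports two practical data thresholds: at least 250 training samples and no more than 10% missing values. The sketch below is a hypothetical pre-training screening helper that uses those two figures as defaults; the dict-of-columns layout, the use of `None` for a missing reading, and the variable names are assumptions, and treating the reported empirical thresholds as hard cutoffs is a simplification.

```python
def missing_fraction(column):
    """Share of entries that are None (treated here as missing readings)."""
    return sum(v is None for v in column) / len(column)

def check_dataset(data, min_samples=250, max_missing=0.10):
    """Screen a dataset against a minimum size and a missing-value tolerance.

    data: hypothetical layout, a dict mapping variable name -> list of values (None = missing).
    """
    n = len(next(iter(data.values())))
    return {
        "enough_samples": n >= min_samples,
        # variables whose missing share exceeds the tolerance
        "too_sparse": sorted(
            name for name, col in data.items() if missing_fraction(col) > max_missing
        ),
    }

# Hypothetical plant records: 300 samples per variable
data = {
    "flow_rate": [1.0] * 300,
    "tss": [None] * 40 + [2.0] * 260,         # 13.3% missing
    "air_temp": [None] * 20 + [15.0] * 280,   # 6.7% missing
}
print(check_dataset(data))  # → {'enough_samples': True, 'too_sparse': ['tss']}
```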
5. Profiling Social Sentiment in Times of Health Emergencies with Information from Social Networks and Official Statistics.
- Author
- Velasco-López, Jorge-Eusebio, Carrasco, Ramón-Alberto, Serrano-Guerrero, Jesús, and Chiclana, Francisco
- Subjects
- SOCIAL dynamics, MACHINE learning, SOCIAL networks, DATA mining, COVID-19 pandemic, SENTIMENT analysis, PUBLIC spaces
- Abstract
Social networks and official statistics have become vital sources of information in times of health emergencies. The ability to monitor and profile social sentiment is essential for understanding public perception and response during public health crises, such as the one resulting from the COVID-19 pandemic. This study explores how social sentiment can be monitored and profiled using information from social networks and official statistics, and how combining these data sources can offer a more complete picture of social dynamics in times of emergency, providing a valuable tool for understanding public perception and guiding the public health response. To this end, a three-layer architecture based on Big Data and Artificial Intelligence is presented: the first layer focuses on collecting, storing, and governing the necessary data, such as social media posts and official statistics; the second layer builds the representation models and machine learning needed for knowledge generation; and the third layer adapts the generated knowledge, through visualization techniques among others, so that crisis managers can understand it better. Based on this architecture, a KDD (Knowledge Discovery in Databases) framework is implemented using methodological tools such as sentiment analysis, fuzzy 2-tuple linguistic models, and time series prediction with the Prophet model. As a practical demonstration, tweets (from the social network X, formerly known as Twitter) generated during the COVID-19 lockdown period in Spain are used as the data source; they are processed with sentiment analysis techniques and fuzzy linguistic variables to identify the overall sentiment, combined with official statistical indicators for prediction, and the results are visualized through dashboards. [ABSTRACT FROM AUTHOR]
- Published
- 2024
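The abstract names fuzzy 2-tuple linguistic models without giving details. As a hedged illustration, the sketch below implements the usual 2-tuple translation: an aggregated value β in [0, g] becomes a pair (s_i, α) with i = round(β) and α = β − i in [−0.5, 0.5), and the inverse maps the pair back to β. The five-term label set, the linear mapping from a polarity score in [−1, 1], and the mean aggregation are assumptions for illustration, not the paper's configuration.

```python
import math

# Hypothetical five-term sentiment scale s_0 .. s_4 (g = 4)
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

def polarity_to_beta(score, labels=LABELS):
    """Map a polarity score in [-1, 1] linearly onto the interval [0, g]."""
    g = len(labels) - 1
    return (score + 1.0) / 2.0 * g

def delta(beta, labels=LABELS):
    """2-tuple translation: beta in [0, g] -> (s_i, alpha), i = round(beta), alpha in [-0.5, 0.5)."""
    g = len(labels) - 1
    if not 0.0 <= beta <= g:
        raise ValueError("beta must lie in [0, g]")
    i = math.floor(beta + 0.5)  # round half up, so alpha stays in [-0.5, 0.5)
    return labels[i], beta - i

def delta_inverse(label, alpha, labels=LABELS):
    """Inverse translation: (s_i, alpha) -> i + alpha."""
    return labels.index(label) + alpha

# Aggregate three hypothetical tweet polarities by the mean of their beta values
scores = [0.5, -0.2, 0.9]
beta = sum(polarity_to_beta(s) for s in scores) / len(scores)
label, alpha = delta(beta)
print(label, round(alpha, 2))  # → positive -0.2
```

The symbolic translation α keeps the information lost by rounding to a label, so "positive −0.2" reads as slightly below the "positive" term rather than collapsing to it.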
6. Improving the prediction of extreme wind speed events with generative data augmentation techniques.
- Author
- Vega-Bayo, M., Pérez-Aracil, J., Prieto-Godino, L., and Salcedo-Sanz, S.
- Subjects
- DATA augmentation, PROBABILISTIC generative models, CONVOLUTIONAL neural networks, MACHINE learning, WIND speed, WIND damage, DEEP learning, WIND power plants
- Abstract
Extreme wind speed (EWS) events are responsible for the most severe wind damage in wind farms. An accurate estimation of the frequency and intensity of EWS is essential to avoid wind turbine damage and to minimize cut-out events in these facilities. In this paper we discuss how generative data augmentation (DA) techniques improve the performance of machine learning (ML) and deep learning (DL) algorithms in EWS prediction problems. These problems are usually tackled as classification tasks, which are highly imbalanced due to the small number of EWS events in wind farms. Different versions of variational autoencoders (VAEs) are proposed and analysed in this work (VAEs, conditional VAEs (CVAEs), and class-informed VAEs (CI-VAEs)) as generative DA techniques to balance the data in EWS problems, leading to better performance of the prediction systems. The proposed generative DA techniques have been compared against traditional DA algorithms in a real EWS prediction problem in Spain, using ERA5 reanalysis data as predictive variables. The results showed that the CI-VAE combined with a convolutional neural network obtained the best results, with a precision of 0.62, a recall of 0.74, and an F1 score of 0.67, improving by up to 4% on the method without data augmentation.
• Generative data augmentation techniques can improve extreme wind prediction.
• Modified variational autoencoders are used to generate synthetic extreme wind samples.
• Experiments on real wind speed data from a Spanish wind farm are presented.
• Comparisons with traditional data augmentation techniques show prediction improvements.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
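Because EWS events are rare, the abstract reports precision, recall, and F1 rather than accuracy. The sketch below checks that the reported F1 of 0.67 is the harmonic mean of the reported precision (0.62) and recall (0.74), and uses hypothetical confusion counts to show why accuracy alone is misleading on such imbalanced data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)

# Reported CI-VAE + CNN precision and recall: 2 * 0.62 * 0.74 / 1.36 ≈ 0.6747
print(round(f1_score(0.62, 0.74), 2))  # → 0.67

# Hypothetical: 1000 samples with 50 EWS, and a model that never predicts an EWS
print(metrics(tp=0, fp=0, fn=50, tn=950))  # → (0.95, 0.0, 0.0, 0.0)
```

The trivial "never extreme" model scores 95% accuracy yet has zero recall and F1, which is why balancing the training data is judged on F1-type metrics.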
7. Evaluation of multispectral data for recent manure application: A case study in northern Spain.
- Author
- Pedrayes, Oscar D., Usamentiaga, Rubén, Trichakis, Yanni, and Bouraoui, Faycal
- Subjects
- MULTISPECTRAL imaging, POLLUTION, FEATURE selection, REMOTE-sensing images, AGRICULTURE, PRECISION farming, MANURES
- Abstract
The use of manure on agricultural fields during the wet season can lead to environmental pollution by releasing nitrates into nearby water sources. To address this issue, authorities may impose closed periods during which manure application is prohibited. However, ensuring compliance with these regulations is challenging, as it is difficult to monitor every field in a country. To tackle this problem, a solution is proposed that employs machine learning techniques together with satellite imagery to automatically identify freshly manured fields. This paper investigates the relationship and effectiveness of the Sentinel-2 satellite bands and 51 frequently used multispectral indices in the context of precision agriculture by exploring different feature selection methods. The proposed method achieves an F1-score of nearly 90% and detects all test plots in the northern Spanish region, showing its potential for large-scale use in precision agriculture and environmental monitoring. Incorporating temporal data improves the detection F1-score by 8%. Despite their lower spatial resolution, infrared bands proved more effective than visible bands, raising the F1-score by 4%. Furthermore, using more than 80 features increases the F1-score by 12% compared with using fewer than 10 features. For further research, a dataset of recently manured plots, verified on site, has been developed and made publicly available.
• Inclusion of temporal data can improve the detection F1-score by 8%.
• Infrared bands provide about 4% more F1-score than visible bands despite their lower resolution.
• Using over 80 features increases the F1-score by about 12% over using fewer than 10.
• The proposed method successfully detects all test plots with a nearly 90% F1-score.
• A dataset of recent manure application, verified through on-site validation, is made public.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
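The abstract does not list its 51 multispectral indices, so the sketch below uses NDVI only as a familiar example of a normalized-difference index built from Sentinel-2 bands B8 (near-infrared, 10 m) and B4 (red, 10 m), and derives a simple two-date change feature to mirror the "temporal data" idea. The per-plot band dictionary layout and the reflectance values are hypothetical, and no claim is made about how manure actually shifts NDVI.

```python
def normalized_difference(a, b):
    """Generic normalized-difference index (a - b) / (a + b); 0.0 when a + b == 0."""
    total = a + b
    return (a - b) / total if total else 0.0

def ndvi(b8_nir, b4_red):
    """NDVI from Sentinel-2 surface reflectance: band B8 (NIR) and band B4 (red)."""
    return normalized_difference(b8_nir, b4_red)

def plot_features(before, after):
    """Index values for one plot on two dates plus their change (a temporal feature).

    before/after: hypothetical dicts of mean plot reflectance, e.g. {"B4": ..., "B8": ...}.
    """
    ndvi_before = ndvi(before["B8"], before["B4"])
    ndvi_after = ndvi(after["B8"], after["B4"])
    return {
        "ndvi_before": ndvi_before,
        "ndvi_after": ndvi_after,
        "ndvi_change": ndvi_after - ndvi_before,
    }

feats = plot_features({"B4": 0.05, "B8": 0.45}, {"B4": 0.10, "B8": 0.30})
print({k: round(v, 2) for k, v in feats.items()})
# → {'ndvi_before': 0.8, 'ndvi_after': 0.5, 'ndvi_change': -0.3}
```

Other indices can reuse `normalized_difference` with different band pairs, which is one way a feature vector of dozens of indices and dates can be assembled before feature selection.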
8. Improving Fire Severity Analysis in Mediterranean Environments: A Comparative Study of eeMETRIC and SSEBop Landsat-Based Evapotranspiration Models.
- Author
- Quintano, Carmen, Fernández-Manso, Alfonso, Fernández-Guisuraga, José Manuel, and Roberts, Dar A.
- Subjects
- WILDFIRE prevention, MACHINE learning, EVAPOTRANSPIRATION, HUMAN capital, FIRE management, ECOSYSTEM dynamics, WATER supply, FOREST fires
- Abstract
Wildfires represent a significant threat to both ecosystems and human assets in Mediterranean countries, where fire occurrence is frequent and often devastating. Accurate assessments of initial fire severity are required to manage and mitigate the negative impacts of fire. Evapotranspiration (ET) is a crucial hydrological process that links vegetation health and water availability, making it a valuable indicator for understanding fire dynamics and ecosystem recovery after wildfires. This study uses the Mapping Evapotranspiration at High Resolution with Internalized Calibration (eeMETRIC) and Operational Simplified Surface Energy Balance (SSEBop) ET models, based on Landsat imagery, to estimate fire severity in five large forest fires that occurred in Spain and Portugal in 2022 from two perspectives: uni-temporal and bi-temporal (post/pre-fire ratio). Using fine-spatial-resolution ET is particularly relevant for heterogeneous Mediterranean landscapes with different vegetation types and water availability. ET was significantly affected by fire severity according to the eeMETRIC (F > 431.35; p-value < 0.001) and SSEBop (F > 373.83; p-value < 0.001) metrics, with reductions of 61.46% and 63.92%, respectively, after the wildfire event. A random forest machine learning algorithm was used to predict fire severity. Higher accuracy (0.60 < Kappa < 0.67) was achieved when both ET models (eeMETRIC and SSEBop) were used as predictors than with the conventional differenced Normalized Burn Ratio (dNBR) index, which yielded a Kappa value of 0.46. We conclude that both fine-resolution ET models are valid indicators of fire severity in Mediterranean countries. This research highlights the value of Landsat-based ET models as accurate tools to improve the initial analysis of fire severity in Mediterranean countries. [ABSTRACT FROM AUTHOR]
- Published
- 2024
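The abstract compares ET-based predictors with the conventional dNBR and reports agreement as Cohen's kappa. The sketch below shows the standard formulas involved: NBR = (NIR − SWIR2) / (NIR + SWIR2) (for Landsat 8/9 OLI, bands 5 and 7), dNBR = NBR_pre − NBR_post, the bi-temporal post/pre-fire ET ratio with its percentage reduction, and kappa = (p_o − p_e) / (1 − p_e) from a confusion matrix. All reflectance, ET, and count values are hypothetical; the authors' reported reductions and kappas are not reproduced here.

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio from NIR and SWIR2 reflectance."""
    return (nir - swir2) / (nir + swir2)

def dnbr(pre, post):
    """Differenced NBR: pre-fire NBR minus post-fire NBR (larger values = more severe burn).

    pre/post: (nir, swir2) reflectance tuples.
    """
    return nbr(*pre) - nbr(*post)

def et_ratio(et_pre, et_post):
    """Bi-temporal post/pre-fire ET ratio and the matching percentage reduction."""
    ratio = et_post / et_pre
    return ratio, (1.0 - ratio) * 100.0

def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = reference, columns = predicted)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n              # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

print(dnbr(pre=(0.30, 0.10), post=(0.15, 0.25)))  # ≈ 0.75
print(et_ratio(4.0, 1.5))                          # → (0.375, 62.5)
print(cohen_kappa([[40, 10], [10, 40]]))           # ≈ 0.6
```

Kappa discounts the agreement expected by chance, so 80% raw agreement on two balanced classes corresponds to a kappa of only 0.6.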
9. Elemental Fingerprinting Combined with Machine Learning Techniques as a Powerful Tool for Geographical Discrimination of Honeys from Nearby Regions.
- Author
- Mara, Andrea, Migliorini, Matteo, Ciulu, Marco, Chignola, Roberto, Egido, Carla, Núñez, Oscar, Sentellas, Sònia, Saurina, Javier, Caredda, Marco, Deroma, Mario A., Deidda, Sara, Langasco, Ilaria, Pilo, Maria I., Spano, Nadia, and Sanna, Gavino
- Subjects
- INDUCTIVELY coupled plasma mass spectrometry, FISHER discriminant analysis, MACHINE learning, HONEY, RARE earth metals, HONEY composition
- Abstract
Misdeclaring the geographical origin of honey is a common fraudulent practice, and discriminating honeys by geographical origin is one of the most investigated topics in honey authentication. This research aims to discriminate honeys according to their geographical origin by combining elemental fingerprinting with machine learning techniques. In particular, the main objective of this study is to distinguish the origin of unifloral and multifloral honeys produced in neighboring regions, namely Sardinia (Italy) and Spain. The elemental compositions of 247 honeys were determined using inductively coupled plasma mass spectrometry (ICP-MS). The origins of the honeys were differentiated using principal component analysis (PCA), linear discriminant analysis (LDA), and random forest (RF). Compared with LDA, RF showed greater stability and better classification performance. The best classification was based on geographical origin, achieving 90% accuracy using Na, Mg, Mn, Sr, Zn, Ce, Nd, Eu, and Tb as predictors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
Discovery Service for Jio Institute Digital Library