217 results for "Time-series clustering"
Search Results
2. Identifying temporal changes in student engagement in social annotation during online collaborative reading.
- Author
- Chen, Fu, Li, Shan, Lin, Lijia, and Huang, Xiaoshan
- Subjects
- STUDENT engagement, ACADEMIC motivation, COGNITIVE ability, SOCIAL learning, SOCIALIZATION
- Abstract
Social annotation plays a crucial role in nurturing and sustaining a collaborative reading community, offering the potential to enhance students' motivation and performance within socially supportive learning environments. Nonetheless, research on the dynamic changes in student engagement in social annotation remains limited. This study aims to unveil the temporal changes in students' behavioral, cognitive, affective, and social engagement within the context of social annotation. In addition, it examines the disparities in social annotation behaviors between students with different engagement profiles. Using a multivariate time series clustering approach to analyze a dataset comprising 91 undergraduate students interacting with 29 reading materials, this study identified two distinct engagement profiles. Both profiles revealed a decline in behavioral engagement as the number of reading activities increased. However, students' cognitive, affective, and social engagement levels in social annotation remained relatively stable across these activities. Subsequent analyses showed that students exhibiting a declining engagement profile displayed higher levels of aggregated behavioral, cognitive, and social engagement. Furthermore, they posted a significantly greater number of annotations and responses to peers' annotations compared to students characterized by a low engagement profile. Potential explanations and pedagogical implications of these findings were discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Inferring Demand in Drinking Water Distribution Systems through Stratified Sampling of Billing Data for Smart Meter Installation.
- Author
- Almeida Silva, Maria, Amado, Conceição, and Loureiro, Dália
- Subjects
- *SMART meters, *WATER distribution, *MUNICIPAL water supply, *WATER utilities, *SUSTAINABILITY
- Abstract
The importance of urban water supply systems and public services is globally recognized. Nonrevenue water directly affects a water utility's economic, financial, and environmental sustainability. In Portugal, mean nonrevenue water in distribution systems was 28.8% in 2019. Smart metering technology is crucial for consumption monitoring and for improving the management of apparent and real network losses (e.g., evaluating the global error of water meters, detecting illegal uses, and estimating real losses through minimum night flow analysis). However, this technology is expensive to acquire, install, operate, and maintain. This study aims to help water utilities infer total consumption from a representative sample of customers with smart meters instead of from smart metering data for all customers. Stratified sampling was used, with the strata defined using only the customers' billing time series. A predominantly domestic zone was studied, and eight strata were obtained by clustering analysis [temporal correlation (CORT) dissimilarity and Ward method]. Stratified sampling was applied to minimize the variance of the total water consumption estimator. A representative sample of size 259 (53%) was chosen to infer, with small errors, essential consumption statistics for water utilities: total consumption (with an error of 0.12%), the total consumption time series, water consumption patterns, minimum night consumption, and volume distribution by flow rate. These successful outcomes support the proposed methodology. This study has provided evidence that installing smart meters for all consumers in a distribution network area is not necessary to acquire accurate and meaningful consumption information crucial for effective network management and water loss control.
Moreover, using only billing data to perform the sample selection of consumers is useful for water utilities, because they may face difficulties obtaining extra consumer information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
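The strata in the study above come from hierarchical clustering with the CORT dissimilarity and the Ward method. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: CORT is taken as the first-order temporal correlation of two equal-length series and, following the CORT-based dissimilarity in the R package TSclust, the Euclidean distance is scaled by φ_k(CORT) = 2 / (1 + exp(k · CORT)) with an assumed k = 2. The Ward step uses Lance-Williams updates on squared dissimilarities (the abstract does not say which Ward variant was used), and the billing series are invented toy data.

```python
import math


def cort(x, y):
    """First-order temporal correlation: cosine of the two series' successive differences."""
    dx = [b - a for a, b in zip(x, x[1:])]
    dy = [b - a for a, b in zip(y, y[1:])]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return num / den if den else 0.0


def cort_dissimilarity(x, y, k=2.0):
    """Euclidean distance scaled by phi_k(CORT) = 2 / (1 + exp(k * CORT)); k=2 is an assumption."""
    euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 2.0 / (1.0 + math.exp(k * cort(x, y))) * euclid


def ward_clusters(dist, n_clusters):
    """Ward agglomeration via Lance-Williams updates on squared dissimilarities.

    Returns one integer label per item; the label values themselves are arbitrary.
    """
    n = len(dist)
    # Keys are (smaller_id, larger_id); merged clusters get fresh, larger ids.
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    next_id = n
    while len(members) > n_clusters:
        i, j = min(d2, key=d2.get)  # closest pair of current clusters
        ni, nj = len(members[i]), len(members[j])
        for k in members:
            if k in (i, j):
                continue
            nk = len(members[k])
            dki = d2[(min(k, i), max(k, i))]
            dkj = d2[(min(k, j), max(k, j))]
            d2[(k, next_id)] = ((ni + nk) * dki + (nj + nk) * dkj - nk * d2[(i, j)]) / (ni + nj + nk)
        members[next_id] = members.pop(i) + members.pop(j)
        # Drop every pair that still refers to one of the two merged clusters.
        d2 = {pair: v for pair, v in d2.items() if i not in pair and j not in pair}
        next_id += 1
    labels = [0] * n
    for label, items in enumerate(members.values()):
        for item in items:
            labels[item] = label
    return labels


# Hypothetical billing series (one value per billing period) for four customers.
bills = [
    [10, 12, 11, 13, 12, 14],
    [20, 24, 22, 26, 24, 28],
    [14, 13, 12, 11, 10, 9],
    [28, 26, 24, 22, 20, 18],
]
D = [[cort_dissimilarity(a, b) for b in bills] for a in bills]
labels = ward_clusters(D, n_clusters=2)
print(labels)  # customers 0 and 1 share one label, customers 2 and 3 the other
```

Because φ_k is below 1 for series whose increases and decreases move together and above 1 for opposite movements, customers are grouped by consumption dynamics rather than by consumption level alone: customers 0 and 1 differ in level but share the same ups and downs.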
4. Intra-annual vegetation changes and spatial variation in China over the past two decades based on remote sensing and time-series clustering.
- Author
- Cheng, Xi, Luo, Mingliang, Chen, Ke, Sun, Jian, and Wu, Yong
- Subjects
- VEGETATION dynamics, SPATIAL variation, REMOTE sensing, REGIONAL economic disparities, DEW point, REGIONAL disparities, TIME series analysis
- Abstract
Vegetation is an important link between land, atmosphere, and water, making its changes highly significant. However, existing research has predominantly focused on long-term vegetation changes, neglecting intra-annual variation. Hence, this study uses Enhanced Vegetation Index (EVI) data from 2000 to 2022, at a time step of 16 days, to analyze the intra-annual patterns of vegetation change in China. The average intra-annual EVI values for each municipal-level administrative region were calculated, and the time-series k-means clustering algorithm was employed to group these regions, exploring the spatial variation in China's intra-annual vegetation changes. Finally, ridge regression and random forest methods were used to assess the drivers of intra-annual vegetation changes. The results showed that: (1) China's vegetation status exhibits a notable intra-annual variation pattern of "high in summer and low in winter," and the changes are more pronounced in the northern regions than in the southern regions; (2) the intra-annual vegetation changes exhibit remarkable regional disparities, and China can be optimally clustered into four distinct clusters, which align well with China's temperature and precipitation zones; and (3) the intra-annual vegetation changes demonstrate significant correlations with meteorological factors such as dew point temperature, precipitation, maximum temperature, and sea-level pressure. In conclusion, our study reveals the characteristics, spatial patterns, and driving forces of intra-annual vegetation changes in China, which help explain ecosystem response mechanisms and provide valuable insights for ecological research and for the formulation of ecological conservation and management strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
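Time-series k-means, as used in the study above, clusters whole curves rather than single values. A minimal sketch, assuming Euclidean distance between equal-length annual profiles (23 values per year at the 16-day step) and random initialization from the data; the profiles are synthetic stand-ins for regional EVI curves, and the paper may use a different distance or initialization.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_series(series, k, n_iter=100, seed=0):
    """Lloyd's k-means on equal-length series with Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid curve for every series.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c])) for s in series
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the point-wise mean curve of its members.
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def profile(base, amplitude, steps=23):
    """Synthetic annual curve peaking mid-year ("high in summer, low in winter")."""
    return [base + amplitude * math.sin(math.pi * t / (steps - 1)) for t in range(steps)]


# Two strongly seasonal (northern-like) and two weakly seasonal (southern-like) regions.
regions = [profile(0.10, 0.40), profile(0.12, 0.38), profile(0.35, 0.08), profile(0.33, 0.10)]
labels, centroids = kmeans_series(regions, k=2)
print(labels)
```

Each centroid is itself a mean annual curve, so the clusters can be read directly as typical seasonal patterns. Libraries such as tslearn provide a `TimeSeriesKMeans` estimator that also supports DTW-based metrics; the abstract does not state which distance the authors used.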
5. Investigating the temporal dynamics of motor vehicle collision density patterns in urban road networks – A case study of New York.
- Author
- Chang, Haoliang, Xu, Corey Kewei, and Tang, Tian
- Subjects
- *MOTOR vehicle dynamics, *URBAN density, *PROBABILITY density function, *MOTOR vehicles, *TRANSPORTATION agencies, *ROAD safety measures
- Abstract
• This study tracks roads with different collision density patterns over time.
• A spatio-temporal network KDE method was used to compute collision densities.
• Roads with various collision density patterns were grouped by a clustering method.
• Spatio-temporal and semantic analyses were conducted to profile the risky roads.
• The developed method can help transport departments propose traffic treatments.
Introduction: Motor vehicle collisions are a leading source of mortality and injury on urban highways. From a temporal perspective, whether a road segment is classified as collision-prone can fluctuate dramatically over time, making it difficult for transportation agencies to propose traffic interventions. However, there has been limited research to identify and characterize collision-prone road segments with varying collision density patterns over time. Method: This study proposes an identification and characterization framework that profiles collision-prone roads with various collision density variations. We first employ the spatio-temporal network kernel density estimation (STNKDE) method and time-series clustering to identify road segments with different collision density patterns. Next, we characterize collision-prone road segments based on spatio-temporal information, consequences, vehicle types, and contributing factors to collisions. The proposed method is applied to two years of motor vehicle collision records for New York City. Results: Seven clusters of road segments with different collision density patterns were identified. Road segments frequently determined to be collision-prone were primarily found in Lower Manhattan and the center of the Bronx. Furthermore, collisions near road segments that exhibit greater collision densities over time result in more fatalities and injuries, many of which are caused by both human and vehicle factors.
Conclusions: Collision-prone road segments with various collision density patterns over time differ distinctly in the spatio-temporal domain and in the collisions that occur on them. Practical Applications: The proposed method can help policymakers understand how collision-prone road segments change over time and can serve as a reference for more targeted traffic treatment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Unsupervised Machine Learning for Blind Rivets Quality Inspection
- Author
- Rebe, Ander Martin, Penalva, Mariluz, Veiga, Fernando, Del Val, Alain Gil, Abousoliman, Bilal El Moussaoui, Chaari, Fakher, Series Editor, Gherardini, Francesco, Series Editor, Ivanov, Vitalii, Series Editor, Haddar, Mohamed, Series Editor, Cavas-Martínez, Francisco, Editorial Board Member, di Mare, Francesca, Editorial Board Member, Kwon, Young W., Editorial Board Member, Tolio, Tullio A. M., Editorial Board Member, Trojanowska, Justyna, Editorial Board Member, Schmitt, Robert, Editorial Board Member, Xu, Jinyang, Editorial Board Member, Wagner, Achim, editor, Alexopoulos, Kosmas, editor, and Makris, Sotiris, editor
- Published
- 2024
7. Twenty-four-hour physical activity patterns associated with depressive symptoms: a cross-sectional study using big data-machine learning approach
- Author
- Saida Salima Nawrin, Hitoshi Inada, Haruki Momma, and Ryoichi Nagatomi
- Subjects
- Activity pattern, Depressive symptoms, Kernel K-means, Objectively measured physical activity, Time-series clustering, Unsupervised machine learning, Public aspects of medicine, RA1-1270
- Abstract
Background: Depression is a global burden with profound personal and economic consequences. Previous studies have reported that the amount of physical activity is associated with depression. However, the relationship between the temporal patterns of physical activity and depressive symptoms is poorly understood. In this exploratory study, we hypothesize that a particular temporal pattern of daily physical activity could be associated with depressive symptoms and might be a better marker than the total amount of physical activity. Methods: To address the hypothesis, we investigated the association between depressive symptoms and daily dominant activity behaviors based on 24-h temporal patterns of physical activity. We conducted a cross-sectional study on NHANES 2011–2012 data collected from the noninstitutionalized civilian resident population of the United States. In total, 6613 participants had a complete set of accelerometer-measured physical activity data. Of these, 4242 had complete demographic data and responses to the Patient Health Questionnaire-9 (PHQ-9), a tool to quantify depressive symptoms. The association between activity-count behaviors and depressive symptoms was analyzed using multivariable logistic regression, adjusting for confounding factors in sequential models. Results: We identified four physical activity-count behaviors based on five physical activity-count patterns classified by unsupervised machine learning. Regarding PHQ-9 scores, we found that evening-dominant behavior was positively associated with depressive symptoms compared with morning-dominant behavior as the control group. Conclusions: Our results might contribute to monitoring and identifying individuals with latent depressive symptoms, emphasizing the importance of nuanced activity patterns and their potential for assessing depressive symptoms effectively.
- Published
- 2024
8. Twenty-four-hour physical activity patterns associated with depressive symptoms: a cross-sectional study using big data-machine learning approach.
- Author
- Nawrin, Saida Salima, Inada, Hitoshi, Momma, Haruki, and Nagatomi, Ryoichi
- Subjects
- *MENTAL depression, *PHYSICAL activity, *CROSS-sectional method, *MORNINGNESS-Eveningness Questionnaire, *MACHINE learning, *LOGISTIC regression analysis
- Abstract
Background: Depression is a global burden with profound personal and economic consequences. Previous studies have reported that the amount of physical activity is associated with depression. However, the relationship between the temporal patterns of physical activity and depressive symptoms is poorly understood. In this exploratory study, we hypothesize that a particular temporal pattern of daily physical activity could be associated with depressive symptoms and might be a better marker than the total amount of physical activity. Methods: To address the hypothesis, we investigated the association between depressive symptoms and daily dominant activity behaviors based on 24-h temporal patterns of physical activity. We conducted a cross-sectional study on NHANES 2011–2012 data collected from the noninstitutionalized civilian resident population of the United States. In total, 6613 participants had a complete set of accelerometer-measured physical activity data. Of these, 4242 had complete demographic data and responses to the Patient Health Questionnaire-9 (PHQ-9), a tool to quantify depressive symptoms. The association between activity-count behaviors and depressive symptoms was analyzed using multivariable logistic regression, adjusting for confounding factors in sequential models. Results: We identified four physical activity-count behaviors based on five physical activity-count patterns classified by unsupervised machine learning. Regarding PHQ-9 scores, we found that evening-dominant behavior was positively associated with depressive symptoms compared with morning-dominant behavior as the control group. Conclusions: Our results might contribute to monitoring and identifying individuals with latent depressive symptoms, emphasizing the importance of nuanced activity patterns and their potential for assessing depressive symptoms effectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. UNSURE - A machine learning approach to cryptocurrency trading.
- Author
- Kochliaridis, Vasileios, Papadopoulou, Anastasia, and Vlahavas, Ioannis
- Subjects
- REINFORCEMENT learning, CRYPTOCURRENCY exchanges, DEEP reinforcement learning, MACHINE learning, CRYPTOCURRENCIES, PRICE fluctuations
- Abstract
Although cryptocurrency trading can be highly profitable, it carries significant risks due to extreme price fluctuations and a high degree of market noise. To increase profits and minimize risks, traders typically use various forecasting methods, such as technical analysis and Machine Learning (ML), but developing effective trading strategies in noisy markets remains a challenging task. Recently, Deep Reinforcement Learning (DRL) agents have achieved high performance on challenging tasks, including algorithmic trading; however, they require a significant amount of time and high-quality data to train effectively. Additionally, DRL agents lack explainability, making them a less popular option for traders. The purpose of this paper is to address these challenges by proposing a reliable trading framework. Our framework, named UNSURE, generates high-quality features from candlestick data using technical analysis along with a novel parameterization method, and then exploits high price fluctuations by combining three ML components: (A) an unsupervised component, which further improves feature quality by clustering market data; (B) a DRL component, which is responsible for training agents that open Buy or Short positions; and (C) a supervised component, which estimates price fluctuations in order to open and close positions efficiently while reducing trading uncertainty. We demonstrate the effectiveness of this approach on nine cryptocurrency markets using several risk-adjusted performance metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Basin-scale responses of groundwater-resource quality to drought and recovery, San Joaquin Valley, California.
- Author
- Levy, Zeno F., Jurgens, Bryant C., Faulkner, Kirsten E., Harkness, Jennifer S., and Fram, Miranda S.
- Subjects
- DROUGHTS, GROUNDWATER quality, AQUIFER pollution, WATER quality, GROUNDWATER tracers, CLUSTER analysis (Statistics), FACTOR analysis, POLLUTANTS
- Abstract
Groundwater-resource quality is assumed to be less responsive to drought than surface-water quality because of the relatively long transit times of recharge to drinking-supply wells. Here, we show that perturbations in aquifer pressure dynamics during drought and subsequent recovery periods cause dramatic shifts in groundwater quality at the basin scale. We used a novel application of time-series clustering on annual nitrate anomalies at >450 public-supply wells (PSWs) across California's San Joaquin Valley during 2000-22 to group subpopulations of wells with similar water-quality responses to drought. Additionally, we statistically evaluated the direction and magnitude of multi-constituent water-quality changes across the San Joaquin Valley using a broader dataset of >3000 PSWs with data from two selected hydrologic stress periods representing an extreme drought (2012-16) and subsequent recovery (2016-19). Results of time-series clustering and stress-period change analyses corroborate a predominant regional response to pumping stress characterized by increased concentrations of anthropogenic constituents (nitrate, total dissolved solids) and decreased concentrations of geogenic constituents (arsenic, fluoride), which largely reversed during recovery. Cluster analysis also identified a secondary, less common group of PSWs where nitrate decreased during drought, but explanatory factor analysis was not able to discern hydrogeologic drivers for these two divergent response patterns. Long-term tracer data support the hypothesis that the predominant regional signal of nitrate increase during drought is caused by enhanced capture of modern-aged groundwater by PSWs during periods of pumping stress, which can drive rapid changes in water quality on seasonal and multiannual timescales.
Pumping-induced migration of modern, oxic groundwater to depth during drought may affect geochemical conditions in deeper portions of regional aquifers controlling the mobility of geogenic contaminants over the long term. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Temporal Multi-Features Representation Learning-Based Clustering for Time-Series Data
- Author
- Jaehoon Lee, Dohee Kim, and Sunghyun Sim
- Subjects
- Time-series clustering, temporal multi-features representation clustering (TMRC), temporal multi-features representation learning (TMRL), variational mode decomposition (VMD), Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Time-series clustering remains a challenge in data mining. Although novel deep-learning-based representation learning methods integrated with deep clustering have considerably enhanced the performance of time-series clustering, efficiently capturing the various temporal patterns inherent in time-series data remains difficult for representation learning. In this study, we propose a novel representation learning method called temporal multi-features representation learning (TMRL) to capture various temporal patterns embedded in time-series data. Based on TMRL, we introduce the temporal multi-features representation clustering (TMRC) framework for performing time-series clustering. The proposed framework decomposes the input time-series data into k temporal patterns and uses k LSTM autoencoders to extract specialized features for each decomposed temporal pattern through TMRL. Variational mode decomposition is used to extract the temporal multi-features. Finally, the temporal multi-features derived from TMRL are ensembled for time-series clustering. To evaluate the proposed method, comparative experiments were conducted on 36 publicly available time-series datasets against 16 baseline models. In these experiments, the proposed method achieved the highest Rand index (RI) and normalized mutual information (NMI) values on 12 datasets. In particular, on the eight datasets of the motion and spectro types, it attained the highest RI and NMI values on six. Furthermore, visualizations of the features learned through TMRL demonstrated better representation learning than existing methods. These results indicate that the proposed TMRC framework is highly suitable for learning representations of time-series data and can be effectively used for time-series clustering.
- Published
- 2024
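The comparison above is scored with the Rand index (RI) and normalized mutual information (NMI). Both compare a predicted partition with ground-truth class labels while ignoring the label names themselves. A minimal sketch follows; normalizing NMI by the arithmetic mean of the two entropies is an assumption, since several normalizations are in use and the abstract does not say which one the authors chose.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Share of item pairs on which both labelings agree (same cluster vs. different clusters)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the arithmetic mean of the two label entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true, count_pred = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (c / n) * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0  # both labelings trivial: treat as identical


truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]), normalized_mutual_info(truth, [1, 1, 0, 0]))
print(rand_index(truth, [0, 1, 0, 1]), normalized_mutual_info(truth, [0, 1, 0, 1]))
```

The first prediction is the true partition with the cluster names swapped, so both scores are 1; the second splits every true class evenly, so NMI drops to 0 while RI stays at 1/3, because RI still credits pairs that are correctly kept apart.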
12. Cross-border spillovers in G20 sovereign CDS markets: cluster analysis based on K-means machine learning algorithm and TVP–VAR models
- Author
- Chen, Zhizhen, Shi, Guifen, and Sun, Boyang
- Published
- 2024
13. Time-series clustering and forecasting household electricity demand using smart meter data
- Author
- Hyojeoung Kim, Sujin Park, and Sahm Kim
- Subjects
- Time-series clustering, Residential electricity demand, Time-series forecasting, Smart meter data, Weather variables, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
This study forecasts electricity consumption in a smart grid environment. We present a bottom-up prediction method that combines forecasts based on time-series clustering of advanced metering infrastructure (AMI) data, one of the core smart grid technologies. With the development of AMI and real-time communication of power generation and consumption information, remote metering every 15 min to 1 h is possible. However, prediction at this level is more challenging because each household's electricity consumption varies widely. These issues were addressed by time-series clustering methods using Euclidean distance and dynamic time warping distance. The autoregressive integrated moving average (ARIMA), ARIMA with exogenous variables (ARIMAX), double seasonal Holt–Winters (DSHW), TBATS (trigonometric seasonality, Box–Cox transformation, autoregressive moving average errors, trend, and seasonal components), neural network nonlinear autoregressive (NNAR), and nonlinear autoregressive exogenous (NARX) models were used for demand forecasting based on clustering. The results showed that forecasting based on time-series clustering performed better than forecasting the total electricity demand directly, in terms of the mean absolute percentage error (MAPE) and root mean squared error (RMSE). Various exogenous variables were then considered to improve model accuracy. The model with exogenous variables (cooling degree days, humidity, insolation, indicator variables, and generation power consumption) performed better than the model without them.
- Published
- 2023
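The bottom-up scheme described above forecasts each cluster's demand separately, sums the cluster forecasts to obtain total demand, and scores the result with MAPE and RMSE. A minimal sketch of the aggregation and the two error measures follows; the forecasting models themselves (ARIMA, TBATS, NARX, and so on) are omitted, and all numbers are hypothetical.

```python
import math


def bottom_up_total(cluster_forecasts):
    """Add the per-cluster forecasts time step by time step to forecast total demand."""
    return [sum(step) for step in zip(*cluster_forecasts)]


def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical observed total demand (kWh) and per-cluster forecasts for three time steps.
actual_total = [100, 120, 110]
cluster_forecasts = [[38, 50, 42], [60, 72, 66]]
total_forecast = bottom_up_total(cluster_forecasts)  # [98, 122, 108]
print(total_forecast, round(mape(actual_total, total_forecast), 3), rmse(actual_total, total_forecast))
```

MAPE is scale-free and therefore comparable across households or feeders of different size, while RMSE stays in kWh and weights large misses more heavily; reporting both, as the study does, covers the two views.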
14. How do greenspace landscapes affect PM2.5 exposure in Wuhan? Linking spatial-nonstationary, annual varying, and multiscale perspectives.
- Author
- Zhan, Qingming, Yang, Chen, and Liu, Huimin
- Subjects
- PARTICULATE matter, RISK exposure, WORLD health, CENTROID, CULTURAL landscapes
- Abstract
As an ambient atmospheric pollutant, fine particulate matter (PM2.5) has had significant adverse impacts on public health around the world. Greenspace has been considered a promising approach to attenuating population exposure risk to PM2.5 pollution. Little is known, however, about the attenuating impacts of greenspace landscapes on PM2.5 exposure risks at various locations, scales, and exposure levels. This study employed hotspot analysis, weighted barycenter, and time-series clustering to investigate the spatiotemporal dynamics of PM2.5 exposure across Wuhan. In addition, multi-scale geographically weighted regression (MGWR) was used to determine the relationships between greenspace landscape patterns and yearly PM2.5 exposure in four years (2000, 2005, 2010, and 2015). The results revealed that, between 2000 and 2016, the coverage of PM2.5 exposure hotspots within Wuhan followed an inverted U-shaped trend. K-DTW clustering divided the study area into seven spatial clusters with homogeneous temporal dynamics. In general, PM2.5 exposure in Wuhan fluctuated in three stages: 2000–2005, 2006–2011, and 2012–2016. MGWR also revealed associations between PM2.5 exposure and greenspace landscape parameters (AI, ED, SI, and PLAND). The PLAND of green spaces can mitigate PM2.5 exposure at a broader scale (the average bandwidth was 1391), while AI, ED, and SI are generally associated with PM2.5 exposure reduction at local scales. We also confirmed these relationships between the four landscape metrics and PM2.5 exposure in Wuhan across varying levels of exposure risk. The results indicate that the effectiveness of greenspace landscapes in attenuating PM2.5 exposure risk is not only site- and scale-dependent but also affected by exposure risk levels. The findings of this study can contribute to greenspace planning and management for mitigating PM2.5-attributable adverse health impacts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
15. Nationwide epidemiology and clinical practice patterns of pediatric urinary tract infections: application of multivariate time-series clustering.
- Author
- Okubo, Yusuke, Uda, Kazuhiro, Miyairi, Isao, Michihata, Nobuaki, Kumazawa, Ryosuke, Matsui, Hiroki, Fushimi, Kiyohide, and Yasunaga, Hideo
- Subjects
- *SCIENTIFIC observation, *URINARY tract infections, *MACHINE learning, *RETROSPECTIVE studies, *ACQUISITION of data, *TIME series analysis, *MEDICAL records, *RESEARCH funding, *PHYSICIAN practice patterns, *ALGORITHMS, *ANTIBIOTICS, *CHILDREN
- Abstract
Background: The nationwide epidemiology and clinical practice patterns for younger children hospitalized with urinary tract infections (UTIs) were unclear. Methods: We conducted a retrospective observational study of 32,653 children aged < 36 months who were hospitalized with UTIs at 856 medical facilities during fiscal years 2011–2018, using a nationally representative inpatient database in Japan. We investigated the epidemiology of UTIs and changes in clinical practice patterns (e.g., antibiotic use) over 8 years. A machine learning algorithm of multivariate time-series clustering with dynamic time warping was used to classify hospitals based on antibiotic use for UTIs. Results: We observed a marked male predominance among children aged < 6 months, a slight female predominance among children aged > 12 months, and summer seasonality among children hospitalized with UTIs. Most physicians selected intravenous second- or third-generation cephalosporins as empiric therapy for UTIs, which was switched to oral antibiotics during hospitalization for 80% of inpatients. Whereas total antibiotic use was constant over the 8 years, broad-spectrum antibiotic use decreased gradually from 5.4 days of therapy per 100 patient-days in 2011 to 2.5 in 2018. Time-series clustering classified the hospitals into 5 distinct clusters based on antibiotic use patterns and identified hospital clusters that preferred broad-spectrum antibiotics (e.g., antipseudomonal penicillin and carbapenem). Conclusions: Our study provides novel insight into the epidemiology and practice patterns of pediatric UTIs. Time-series clustering can be useful for identifying hospitals with aberrant practice patterns to further promote antimicrobial stewardship. [ABSTRACT FROM AUTHOR]
- Published
- 2023
16. Characterizing Pairwise U-Turn Behavior in Fish: A Data-Driven Analysis.
- Author
- Tao, Yuan, Zhou, Yuchen, Zheng, Zhicheng, Lei, Xiaokang, and Peng, Xingguang
- Subjects
- *FISH locomotion, *SWIMMING
- Abstract
We applied a time-series clustering method to analyze the trajectory data of rummy-nose tetra (Hemigrammus rhodostomus), with a particular focus on their spontaneous paired turning behavior. First, an automated U-turn maneuver identification method was proposed to extract turning behaviors from open trajectory data of two fish swimming in an annular tank. We revealed two distinct modes of pairwise U-turn swimming, named dominated turn and non-dominated turn. By comparison, the dominated turn is smoother and more efficient, with a fixed leader–follower relationship, i.e., the leader dominates the turning process. Because these two modes corresponded to different patterns of turning feature parameters over time, we incorporated the Toeplitz inverse covariance-based clustering (TICC) method to gain deeper insights into this process. Pairwise turning behavior was decomposed into compositions of elemental states. Specifically, we found that the main influencing factor for a spontaneous U-turn is collision avoidance with the wall. In dominated turns, when inter-individual distances were appropriate, fish adjusted their positions and movement directions to achieve turning. Conversely, in closely spaced non-dominated turns, various factors such as changes in distance, velocity, and movement direction resulted in more complex behaviors. The purpose of our study is to integrate common location-based analysis methods with time-series clustering methods to analyze biological behavioral data. The study provides valuable insights into the U-turn behavior, motion characteristics, and decision factors of rummy-nose tetra during pairwise swimming. Additionally, the study extends the analysis of fish interaction features through the application of time-series clustering methods, offering a fresh perspective on the analysis of biological collective data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
17. Unraveling fundamental properties of power system resilience curves using unsupervised machine learning
- Author
- Bo Li and Ali Mostafavi
- Subjects
- Power system, Infrastructure resilience, Unsupervised learning, Time-series clustering, Resilience curve, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Computer software, QA76.75-76.765
- Abstract
Power systems are vital to modern societies but are susceptible to hazard events. Thus, analyzing the resilience characteristics of power systems is important. The standard model of infrastructure resilience, the resilience triangle, has been the primary way of characterizing and quantifying resilience in infrastructure systems for more than two decades. However, this theoretical model provides a one-size-fits-all framework for all infrastructure systems and specifies general characteristics of resilience curves (e.g., residual performance and duration of recovery). Little empirical work has been done to delineate infrastructure resilience curve archetypes and their fundamental properties based on observational data. Most existing studies examine the characteristics of infrastructure resilience curves using analytical models built on simulated system performance. This dearth of empirical studies has hindered our ability to fully understand and predict resilience characteristics in infrastructure systems. To address this gap, this study examined more than two hundred power-grid resilience curves related to power outages in three major extreme weather events in the United States. Using unsupervised machine learning, we examined different curve archetypes, as well as the fundamental properties of each archetype. The results show two primary archetypes of power grid resilience curves: triangular curves and trapezoidal curves. Triangular curves characterize resilience behavior based on three fundamental properties: 1. critical functionality threshold, 2. critical functionality recovery rate, and 3. recovery pivot point. Trapezoidal archetypes explain resilience curves based on 1. duration of sustained function loss and 2. constant recovery rate. The longer the duration of sustained function loss, the slower the constant rate of recovery.
The findings of this study provide novel perspectives enabling better understanding and prediction of resilience performance of power system infrastructure in extreme weather events.
- Published
- 2024
- Full Text
- View/download PDF
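The abstract above describes the two archetypes through properties such as residual performance, duration of sustained function loss, and recovery rate. The sketch below shows one way such quantities could be computed from a sampled performance curve. It is not the authors' method: the function name `resilience_properties`, the tolerance `tol`, and the definitions used (plateau = consecutive samples within `tol` of the minimum; recovery rate = average slope from the end of the plateau back to baseline) are assumptions for illustration.

```python
def resilience_properties(performance, tol=0.01):
    """Summarize a sampled resilience curve (illustrative definitions only).

    performance: normalized performance values at equal time steps,
    with performance[0] taken as the pre-event baseline.
    tol: hypothetical tolerance for "at the minimum" and "back at baseline".
    """
    baseline = performance[0]
    floor = min(performance)               # residual performance
    t_min = performance.index(floor)       # first step that reaches the floor

    # Sustained function loss: how long the curve stays within tol of the floor.
    t_end = t_min
    while t_end + 1 < len(performance) and performance[t_end + 1] - floor <= tol:
        t_end += 1

    # First step after the plateau at which performance is back near baseline.
    t_rec = next((t for t in range(t_end + 1, len(performance))
                  if baseline - performance[t] <= tol), None)

    # Average (assumed constant) recovery rate per time step; None if no recovery.
    rate = None
    if t_rec is not None:
        rate = (performance[t_rec] - floor) / (t_rec - t_end)

    return {
        "residual_performance": floor,
        "sustained_loss_steps": t_end - t_min,
        "recovery_rate": rate,
    }


# A trapezoidal-looking toy curve: drop, hold for two steps, then recover linearly.
print(resilience_properties([1.0, 0.6, 0.6, 0.6, 0.7, 0.8, 0.9, 1.0]))
```

Under these assumed definitions, a plateau of zero steps corresponds to the triangular shape and a longer plateau to the trapezoidal one; the paper's own feature definitions may differ.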
18. Time series clustering of COVID-19 pandemic-related data
- Author
-
Zhixue Luo, Lin Zhang, Na Liu, and Ye Wu
- Subjects
Pandemic time series ,SARS-CoV-2 ,COVID-19 ,Time-series clustering ,Sequence data ,Electronic computers. Computer science ,QA75.5-76.95 - Abstract
The COVID-19 pandemic continues to affect daily life worldwide. It would be valuable to extract information that characterizes the pandemic directly from the sequential COVID-19 data itself. Here, we aim to demonstrate that it is feasible to analyze the patterns of the pandemic using a time-series clustering approach. In this work, we use dynamic time warping (DTW) distance and hierarchical clustering to group the time series of daily new cases and deaths from different countries into four patterns. We find that geographic factors have a large but not decisive influence on the pattern of pandemic development. Moreover, the age structure of the population may also influence the formation of cluster patterns. Our method, which we show to be valid, may provide a different but useful perspective for other scholars and researchers.
- Published
- 2023
- Full Text
- View/download PDF
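The pipeline in the abstract above (DTW distance followed by hierarchical clustering into a fixed number of patterns) can be sketched in plain Python. The abstract does not state the local cost, linkage criterion, or normalization of case counts, so this sketch assumes absolute-difference cost, no warping window, and average linkage; the function names and toy curves are hypothetical. On real data, one would more likely pass a condensed DTW distance matrix to `scipy.cluster.hierarchy.linkage(..., method="average")` and cut the tree with `fcluster(..., criterion="maxclust")`.

```python
import math


def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses |a_i - b_j| as the local cost and the classic three-way recursion
    (match, insertion, deletion) with no warping-window constraint.
    """
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def average_linkage(series, k):
    """Agglomerative clustering on pairwise DTW distances (average linkage).

    Starts with one cluster per series and repeatedly merges the closest pair
    of clusters until k remain. Returns clusters as lists of series indices.
    """
    n = len(series)
    dist = {(i, j): dtw(series[i], series[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(c1, c2):
        # Mean DTW distance over all cross-cluster pairs.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        p, q = min(((p, q) for p in range(len(clusters))
                    for q in range(p + 1, len(clusters))),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]  # q > p, so index p is unaffected
        del clusters[q]
    return clusters


# Toy daily-count curves: two peaked (one shifted in time), two nearly flat.
curves = [[0, 1, 2, 3, 2, 1, 0], [0, 0, 1, 2, 3, 2, 1],
          [5, 5, 5, 5, 5, 5, 5], [5, 5, 6, 5, 5, 5, 5]]
print(average_linkage(curves, k=2))
```

DTW lets the two peaked curves match despite the one-day shift, which is why they end up together even though a point-by-point Euclidean comparison would separate them more.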
19. Time-series clustering for pattern recognition of speed and heart rate while driving: A magnifying lens on the seconds around harsh events.
- Author
-
Tselentis, Dimitrios I. and Papadimitriou, Eleonora
- Subjects
- *
PATTERN recognition systems , *HEART beat , *ACCELERATION (Mechanics) , *DATA scrubbing , *SPEED , *DISTRACTION - Abstract
• This research reveals existing driving patterns around harsh events.
• A microscopic analytical methodology is employed using naturalistic driving data.
• Driving metrics used include driving speed and heart rate.
• Time-series clustering is applied for driving pattern recognition.
Driving pattern recognition has been applied for the purposes of driving style identification and harsh driving event detection. However, the evolution of driving behavior around, and especially before, such events has not been investigated at a microscopic level. The objective of this research is to reveal existing driving patterns around harsh events at the driving 'pulse' level, i.e. a few seconds before and after the event. For that purpose, a time-series clustering approach is applied to the speed and heart rate metrics of individual drivers using data collected from a large naturalistic driving study. Results show that there are distinct speed patterns before harsh braking, harsh acceleration, and harsh cornering events. A deceleration is identified shortly before most harsh acceleration and cornering events, which possibly indicates reckless behavior, i.e. drivers not dedicating enough time to brake smoothly before cornering, or a brief 'decision-making' moment before the harsh manoeuvre. In contrast, speed seems to be steady before harsh braking events. Regarding heart rate, the analysis revealed certain patterns only after the raw data were cleansed and filtered. These patterns may show increasing, decreasing or variable heart rate trends, which may correspond to different stress patterns of drivers around harsh events. Finally, we introduce the concept of driving pattern consistency, which can reveal the share of individual drivers that follow the same harsh event pattern. It is indicated that more than half of the drivers are not consistent, suggesting that driving patterns around harsh events may be more context-related than driver personality-related. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
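The abstract above defines driving pattern consistency as the share of drivers who follow the same harsh-event pattern. A minimal sketch under one assumed reading: a driver counts as consistent if all of their events of a given type fall into a single cluster, and drivers with fewer than `min_events` events are skipped because one event is trivially consistent. The function name, the `min_events` threshold, and the toy data are hypothetical, not the paper's definitions.

```python
from collections import defaultdict


def pattern_consistency(assignments, min_events=2):
    """Share of drivers whose harsh events all fall into one cluster.

    assignments: iterable of (driver_id, cluster_label) pairs, one per harsh
    event of a single type (e.g. harsh braking).
    Returns None when no driver has at least min_events events.
    """
    clusters_per_driver = defaultdict(list)
    for driver, label in assignments:
        clusters_per_driver[driver].append(label)

    # Only drivers with enough events can meaningfully be (in)consistent.
    eligible = [labels for labels in clusters_per_driver.values()
                if len(labels) >= min_events]
    if not eligible:
        return None

    consistent = sum(1 for labels in eligible if len(set(labels)) == 1)
    return consistent / len(eligible)


events = [("d1", "A"), ("d1", "A"), ("d2", "A"), ("d2", "B"),
          ("d3", "B"), ("d3", "B"), ("d3", "B"), ("d4", "A")]
print(pattern_consistency(events))
```

A stricter or looser rule (for instance, requiring only a majority of events in one cluster) would change the reported share, so the threshold choice matters when comparing against the "more than half inconsistent" finding.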
20. Association between intraoperative pulmonary artery pressure and cardiovascular complications after off-pump coronary artery bypass surgery: a single-center observational study
- Author
-
Mitsuhiro Matsuo, Toshio Doi, Masahito Katsuki, Yuichiro Yoshimura, Hisakatsu Ito, Kazuaki Fukahara, Naoki Yoshimura, and Mitsuaki Yamazaki
- Subjects
Artificial intelligence ,Postoperative complication ,Pulmonary hypertension ,Off-pump coronary artery bypass surgery ,Time-series clustering ,Anesthesiology ,RD78.3-87.3 - Abstract
Background: The impact of intraoperative pulmonary hemodynamics on prognosis after off-pump coronary artery bypass (OPCAB) surgery remains unknown. In this study, we examined the association between intraoperative vital signs and the development of major adverse cardiovascular events (MACE) during hospitalization or within 30 days postoperatively. Methods: This retrospective study analyzed data from a university hospital. The study cohort comprised consecutive patients who underwent isolated OPCAB surgery between November 2013 and July 2021. We calculated the mean and coefficient of variation of vital signs obtained from the intra-arterial catheter, pulmonary artery catheter, and pulse oximeter. The optimal cut-off was defined as the point on the receiver operating characteristic (ROC) curve with the largest Youden index (Youden index = sensitivity + specificity – 1). Multivariate logistic regression analysis ROC curves were used to adjust for all baseline characteristics that yielded P values of < 0.05.
- Published
- 2023
- Full Text
- View/download PDF
21. Analyzing Air Pollution and Traffic Data in Urban Areas in Luxembourg
- Author
-
Wassila Aggoune-Mtalaa and Mohamed Laib
- Subjects
urban air quality ,sustainable mobility ,urban traffic ,time-series clustering ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
Monitoring air quality is gaining popularity in the research community because it can help policymakers make informed decisions to mitigate the negative effects of ever-increasing pollution in cities. One of the significant sources of air pollution in urban areas is road transport. Assessing and understanding the relationship between urban traffic and local pollutants is crucial to maintaining sustainable urban mobility. This paper presents an exploratory data analysis of air pollution and traffic in several cities in Luxembourg. Furthermore, we studied how several pollutants relate to other parameters, such as temperature and humidity. The paper also focuses on traffic and offers further insights for sustainable urban mobility.
- Published
- 2023
- Full Text
- View/download PDF
22. Elucidation of mosaic patterns in gravel riverbeds using classifying flow velocity regimes obtained from a planar two‐dimensional analysis.
- Author
-
Niwa, Hideyuki, Taki, Kentaro, and Izumino, Tamaho
- Subjects
FLOW velocity ,RIVER channels ,GRAVEL ,RIVER conservation ,BIODIVERSITY conservation ,DRONE aircraft - Abstract
Gravel riverbeds in the middle reaches of Japanese rivers are essential habitats for various plants and animals. Disturbance from flooding is necessary for the formation of gravel riverbeds, but human control of rivers, such as dams and channelization, has altered flow and sediment regimes, thereby reducing disturbance. Flooding generates a mosaic pattern characterized by varying frequencies and intensities of disturbance in gravel riverbeds. Understanding the disturbance regimes that form mosaic patterns is important for the conservation of biodiversity in rivers. In this study, we proposed a method to extract mosaic patterns from flow velocity regimes obtained by planar two‐dimensional analysis by classifying them with time‐series clustering. Based on the distribution of Anaphalis margaritacea var. yedoensis on gravel riverbanks, we compared several past flooding events to identify mosaic patterns that are important for A. margaritacea var. yedoensis. The study site is the Echi River, which flows through Shiga Prefecture in Japan and into Lake Biwa. Using an unmanned aerial vehicle (UAV), orthomosaic images with an average ground resolution of 3.3 mm/pixel were created, and colony polygons of A. margaritacea var. yedoensis were created using image detection and visual correction. Hydraulic analysis was performed using iRIC ver2.3 (Nays2DH ver1.0). Time‐series clustering was used to classify the flow velocity regimes for each computed mesh into 30 clusters. The relationship between the clusters of each flooding event and the distribution of A. margaritacea var. yedoensis was evaluated. Mosaic patterns were created by classifying the flow velocity regimes of each computational mesh calculated by planar 2D analysis into clusters using time‐series clustering. After analyzing the relationship between each cluster and the area of distribution of A. margaritacea var. yedoensis, the mosaic pattern of the first flooding event was determined to best explain the distribution of A. margaritacea var. yedoensis. Cluster 1, the "low peak, short duration type," was considered the growth center of A. margaritacea var. yedoensis. The method used in this study is an innovative approach for obtaining mosaic patterns that quantifies these five elements of the disturbance regime. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. A Career in Football: What is Behind an Outstanding Market Value?
- Author
-
Acs, Balazs, Toka, Laszlo, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Brefeld, Ulf, editor, Davis, Jesse, editor, Van Haaren, Jan, editor, and Zimmermann, Albrecht, editor
- Published
- 2022
- Full Text
- View/download PDF
24. 4D electrical resistivity to monitor unstable slopes in mountainous tropical regions: an example from Munnar, India.
- Author
-
Watlet, Arnaud, Thirugnanam, Hemalatha, Singh, Balmukund, Kumar M., Nitin, Brahmanandan, Deepak, Inauen, Cornelia, Swift, Russell, Meldrum, Phil, Uhlemann, Sebastian, Wilkinson, Paul, Chambers, Jonathan, and Ramesh, Maneesha Vinodini
- Subjects
- *
ELECTRICAL resistivity , *PORE water pressure , *LANDSLIDE hazard analysis , *SENSOR networks , *SOIL moisture , *COMMUNITIES , *SPATIAL resolution - Abstract
The number of large landslides in India has risen in recent years due to an increased occurrence of extreme monsoon rainfall events. There is an urgent need to improve our understanding of moisture-induced landslide dynamics, which vary both spatially and temporally. Geophysical methods provide integrated tools to monitor subsurface hydrological processes in unstable slopes at high spatial resolution. They complement more conventional approaches using networks of point sensors, which can provide high temporal resolution information but are severely limited in spatial resolution. Here, we present and discuss data from an electrical resistivity tomography monitoring system, called PRIME, deployed at the Amrita Landslide Early Warning System (Amrita-LEWS) site in Munnar in the Western Ghats (Kerala, India). The system monitors changes in electrical resistivity in the subsurface of a landslide-prone slope that directly threatens a local community. The monitoring system provides a 4D resistivity model that informs on the moisture dynamics in the subsurface of the slope. Results from a 10-month period spanning the pre-monsoon period to the end of the 2019 monsoon season are presented and discussed with regard to the spatial variation of soil moisture. The temporal changes in resistivity within the slope are further investigated using time-series clustering and compared to weather and subsurface pore water pressure data. This study sheds new light on the hydrological processes that occur in the shallow subsurface during the monsoon and can potentially lead to slope failure. This geophysical approach aims to better understand and forecast slope failure in order to reduce the risk for the local community, providing a powerful tool for inclusion in local landslide early warning systems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. Regional Economic Disparities in Europe: Time-Series Clustering of NUTS 3 Regions.
- Author
-
López-Villuendas, Ana María and del Campo, Cristina
- Subjects
- *
REGIONAL economic disparities , *CLUSTER analysis (Statistics) , *REGIONAL disparities - Abstract
The aim of this research is to identify regional economic disparities in the level of economic wealth and its dynamics in the NUTS 3 regions of the EU28 from 2000 to 2017. By performing a time-series clustering analysis at the NUTS 3 level, we expect to uncover economic disparities that might have been hidden in the aggregate NUTS 2 regions. Our results indicate that at a finer spatial scale (NUTS 3 level) disparities flourish, particularly in the period after the global crisis of 2008, in which different recovery rates are observed. In general, NUTS 2 regions tend to cluster spatially at the national level; although NUTS 3 regions also show this tendency slightly, the spatial effect is weaker than at the NUTS 2 level, revealing specific behaviours of local economies and markets that remain hidden at the aggregate NUTS 2 level. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
26. Association between intraoperative pulmonary artery pressure and cardiovascular complications after off-pump coronary artery bypass surgery: a single-center observational study.
- Author
-
Matsuo, Mitsuhiro, Doi, Toshio, Katsuki, Masahito, Yoshimura, Yuichiro, Ito, Hisakatsu, Fukahara, Kazuaki, Yoshimura, Naoki, and Yamazaki, Mitsuaki
- Subjects
- *
SURGICAL therapeutics , *BLOOD pressure , *CORONARY artery bypass , *SCIENTIFIC observation , *CONFIDENCE intervals , *MULTIPLE regression analysis , *ISCHEMIC stroke , *MAJOR adverse cardiovascular events , *PULMONARY artery , *CARDIOVASCULAR diseases , *RETROSPECTIVE studies , *DESCRIPTIVE statistics , *TIME series analysis , *RECEIVER operating characteristic curves , *SENSITIVITY & specificity (Statistics) , *LONGITUDINAL method , *HEART failure , *DISEASE complications - Abstract
Background: The impact of intraoperative pulmonary hemodynamics on prognosis after off-pump coronary artery bypass (OPCAB) surgery remains unknown. In this study, we examined the association between intraoperative vital signs and the development of major adverse cardiovascular events (MACE) during hospitalization or within 30 days postoperatively. Methods: This retrospective study analyzed data from a university hospital. The study cohort comprised consecutive patients who underwent isolated OPCAB surgery between November 2013 and July 2021. We calculated the mean and coefficient of variation of vital signs obtained from the intra-arterial catheter, pulmonary artery catheter, and pulse oximeter. The optimal cut-off was defined as the point on the receiver operating characteristic (ROC) curve with the largest Youden index (Youden index = sensitivity + specificity – 1). Multivariate logistic regression analysis ROC curves were used to adjust for all baseline characteristics that yielded P values of < 0.05. Results: In total, 508 patients who underwent OPCAB surgery were analyzed. The mean patient age was 70.0 ± 9.7 years, and 399 (79%) were male. There were no patients with confirmed or suspected preoperative pulmonary hypertension. Postoperative MACE occurred in 32 patients (heart failure in 16, ischemic stroke in 16). The mean pulmonary artery pressure (PAP) was significantly higher in patients with MACE than in those without (19.3 ± 3.0 vs. 16.7 ± 3.4 mmHg, respectively; absolute difference, 2.6 mmHg; 95% confidence interval, 1.5 to 3.8). The area under the ROC curve of PAP for the prediction of MACE was 0.726 (95% confidence interval, 0.645 to 0.808). The optimal mean PAP cut-off was 18.8 mmHg, with a specificity of 75.8% and sensitivity of 62.5% for predicting MACE. After multivariate adjustments, high PAP remained an independent risk factor for MACE.
Conclusions: Our findings provide the first evidence that intraoperative borderline pulmonary hypertension may affect the prognosis of patients undergoing OPCAB surgery. Future large-scale prospective studies are needed to verify the present findings. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
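The cut-off rule in the abstract above picks the ROC point with the largest Youden index, J = sensitivity + specificity − 1. Because each candidate threshold corresponds to one ROC point, scanning the observed predictor values is enough. A minimal sketch, assuming a case is called positive when its score is at or above the cut-off; the function name and the toy data are hypothetical and are not the study's patients.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    scores: continuous predictor values (e.g. mean PAP in mmHg).
    labels: 1 if the outcome (e.g. MACE) occurred, else 0.
    A case is classified positive when score >= cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both outcome classes")

    best = None
    for cut in sorted(set(scores)):
        # Count correct calls in each class at this threshold (one ROC point).
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1
        if best is None or j > best[1]:  # ties keep the lower cut-off
            best = (cut, j, sens, spec)
    return best


print(youden_cutoff([1, 2, 3, 4, 5], [0, 1, 0, 1, 1]))
```

Ties are resolved in favour of the lowest threshold here; a study may instead prefer the tied point with higher sensitivity or specificity, depending on the clinical cost of each error type.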
27. Characterizing Pairwise U-Turn Behavior in Fish: A Data-Driven Analysis
- Author
-
Yuan Tao, Yuchen Zhou, Zhicheng Zheng, Xiaokang Lei, and Xingguang Peng
- Subjects
rummy-nose tetra ,fish interaction ,time-series clustering ,data analysis ,Science ,Astrophysics ,QB460-466 ,Physics ,QC1-999 - Abstract
We applied the time-series clustering method to analyze the trajectory data of rummy-nose tetra (Hemigrammus rhodostomus), with a particular focus on their spontaneous paired turning behavior. Firstly, an automated U-turn maneuver identification method was proposed to extract turning behaviors from the open trajectory data of two fish swimming in an annular tank. We revealed two distinct ways of pairwise U-turn swimming, named dominated turn and non-dominated turn. Upon comparison, the dominated turn is smoother and more efficient, with a fixed leader–follower relationship, i.e., the leader dominates the turning process. Because these two distinct ways corresponded to different patterns of turning feature parameters over time, we incorporated the Toeplitz inverse covariance-based clustering (TICC) method to gain deeper insights into this process. Pairwise turning behavior was decomposed into some elemental state compositions. Specifically, we found that the main influencing factor for a spontaneous U-turn is collision avoidance with the wall. In dominated turn, when inter-individual distances were appropriate, fish adjusted their positions and movement directions to achieve turning. Conversely, in closely spaced non-dominated turn, various factors such as changes in distance, velocity, and movement direction resulted in more complex behaviors. The purpose of our study is to integrate common location-based analysis methods with time-series clustering methods to analyze biological behavioral data. The study provides valuable insights into the U-turn behavior, motion characteristics, and decision factors of rummy-nose tetra during pairwise swimming. Additionally, the study extends the analysis of fish interaction features through the application of time-series clustering methods, offering a fresh perspective for the analysis of biological collective data.
- Published
- 2023
- Full Text
- View/download PDF
28. How do greenspace landscapes affect PM2.5 exposure in Wuhan? Linking spatial-nonstationary, annual varying, and multiscale perspectives
- Author
-
Qingming Zhan, Chen Yang, and Huimin Liu
- Subjects
PM2.5 exposure ,greenspace ,spatiotemporal variations ,multi-scale geographically weighted regression ,time-series clustering ,Mathematical geography. Cartography ,GA1-1776 ,Geodesy ,QB275-343 - Abstract
As an ambient atmospheric pollutant, fine particulate matter (PM2.5) has significant adverse impacts on public health around the world. To attenuate the population exposure risk to PM2.5 pollution, greenspace has been considered a promising approach. Little is known, however, about the attenuating impacts of greenspace landscapes on PM2.5 exposure risks at various locations, scales, and exposure levels. This study employed hotspot analysis, weighted barycenter, and time-series clustering to investigate the spatiotemporal dynamics of PM2.5 exposure across Wuhan. In addition, multi-scale geographically weighted regression (MGWR) was used to determine the relationships between greenspace landscape patterns and yearly PM2.5 exposure over four years (2000, 2005, 2010, and 2015). Results revealed that, between 2000 and 2016, the PM2.5 exposure hotspot coverage within Wuhan followed an inverted U-shaped trend. The K-DTW clustering divided the study area into seven spatial clusters with homogeneous temporal dynamics. In general, PM2.5 exposure in Wuhan fluctuated in three stages: 2000–2005, 2006–2011, and 2012–2016. MGWR also revealed associations between PM2.5 exposure and greenspace landscape parameters (AI, ED, SI, and PLAND). PLAND of green spaces can mitigate PM2.5 exposure at a broader scale (the average bandwidth was 1391), while AI, ED, and SI are generally associated with PM2.5 exposure reduction on local scales. In Wuhan, we also confirmed such relationships between the four landscape metrics across varying levels of exposure risk. The results indicate that the effectiveness of greenspace landscapes in attenuating PM2.5 exposure risk is not only site- and scale-dependent but also affected by exposure risk levels. The findings of this study can contribute to greenspace planning and management for mitigating PM2.5-attributable adverse health impacts.
- Published
- 2022
- Full Text
- View/download PDF
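The abstract above lists the weighted barycenter among its spatiotemporal tools. As a general illustration (the paper's exact weighting is not given here), the barycenter of one year's exposure surface is the weight-averaged position of the grid cells; computing it year by year shows the direction in which the exposure center shifts. The function name and the (x, y, w) input layout are assumptions.

```python
def weighted_barycenter(points):
    """Weight-averaged centre of a set of grid cells.

    points: iterable of (x, y, w) tuples, where (x, y) are projected
    coordinates of a cell centre and w is a non-negative weight, such as
    the (hypothetical) PM2.5 exposure value of that cell.
    """
    points = list(points)  # allow generators, since we iterate three times
    total = sum(w for _, _, w in points)
    if total == 0:
        raise ValueError("weights sum to zero")
    x = sum(px * w for px, _, w in points) / total
    y = sum(py * w for _, py, w in points) / total
    return x, y


# Two cells 10 units apart; the heavier one pulls the centre toward it.
print(weighted_barycenter([(0, 0, 1), (10, 0, 3)]))
```

Using projected (planar) coordinates keeps the averaging meaningful; averaging raw latitude and longitude is only a rough approximation over a city-sized area.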
29. Dynamic characteristics of the COVID-19 epidemic in China's major cities.
- Author
-
Song, Ci, Pei, Tao, Wang, Xi, Liu, Yaxi, Ma, Jia, and Zhou, Daojing
- Subjects
- *
COVID-19 pandemic , *METROPOLIS , *COVID-19 , *SARS-CoV-2 - Abstract
The novel coronavirus disease of 2019 (COVID-19) first appeared in Wuhan and subsequently spread rapidly in cities and provinces across the country and all over the world. In order to effectively control the spread of the epidemic in different areas, zonal management and endemic prevention and control policies should be implemented according to local epidemic situations. This study proposes a time-series clustering method to discover dynamic characteristics of the COVID-19 epidemic by categorizing the epidemic situations in China's major cities into groups based on daily reported confirmed cases and analysing the driving factors of the city background conditions for each category. Our results show that according to the dynamic patterns of the COVID-19 epidemic there are eight types of epidemic situations, including extreme outbreak areas, large spread areas, potential resurged areas, middle spread areas, controlled outbreak areas, limited growth areas, delayed outbreak areas, and lag report areas. These dynamic patterns are mainly related to the city background conditions, such as population flow, local resident number, government emergency response capability, and medical resource conditions. Based on our results, different endemic prevention and control measures are recommended for containing the COVID-19 epidemic in cities with different types of epidemic situations. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. Dynamic characteristics of the COVID-19 epidemic in China’s major cities
- Author
-
Ci Song, Tao Pei, Xi Wang, Yaxi Liu, Jia Ma, and Daojing Zhou
- Subjects
covid-19 ,endemic prevention and control measures ,epidemic dynamic characteristics ,time-series clustering ,Mathematical geography. Cartography ,GA1-1776 - Abstract
The novel coronavirus disease of 2019 (COVID-19) first appeared in Wuhan and subsequently spread rapidly in cities and provinces across the country and all over the world. In order to effectively control the spread of the epidemic in different areas, zonal management and endemic prevention and control policies should be implemented according to local epidemic situations. This study proposes a time-series clustering method to discover dynamic characteristics of the COVID-19 epidemic by categorizing the epidemic situations in China’s major cities into groups based on daily reported confirmed cases and analysing the driving factors of the city background conditions for each category. Our results show that according to the dynamic patterns of the COVID-19 epidemic there are eight types of epidemic situations, including extreme outbreak areas, large spread areas, potential resurged areas, middle spread areas, controlled outbreak areas, limited growth areas, delayed outbreak areas, and lag report areas. These dynamic patterns are mainly related to the city background conditions, such as population flow, local resident number, government emergency response capability, and medical resource conditions. Based on our results, different endemic prevention and control measures are recommended for containing the COVID-19 epidemic in cities with different types of epidemic situations.
- Published
- 2022
- Full Text
- View/download PDF
31. Utilizing Social Clustering-Based Regression Model for Predicting Student’s GPA
- Author
-
Yomna M. I. Hassan, Abeer Elkorany, and Khaled Wassif
- Subjects
Time-series clustering ,k-means ,community of inquiry ,GPA prediction ,DTW similarity ,student modeling ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
The importance of e-learning has exceeded expectations over the past decade. Accordingly, several intelligent assistive systems have been developed in which students' behavior can be tracked and followed with suitable recommendations to enhance their performance. This paper has two main objectives. First, the Community of Inquiry (CoI) framework, one of the most prominent models of student behavior, is used to select the features that best represent the students. Based on experts' annotation, this study filters students' measured attributes from the StudentLife dataset according to the CoI model, focusing on social presence. Second, the research aims to improve the accuracy and runtime of Grade Point Average (GPA) prediction by introducing a hybrid model that combines a k-means clustering phase based on student similarity with regression-based prediction. The clustering was performed on both static and spatiotemporal (spatial time-series) student attributes. Results show that LassoCV outperforms other regression techniques such as standard linear, Lasso, and Ridge regression, with an RMSE of around 0.15 and an average adjusted R2 of 0.935 over all trials. Selecting the features according to the CoI reduces the number of features by 62.8%. Time-series clustering on its own was not beneficial; however, when combined with the feature selection phase, it improved model quality by 2-3%.
- Published
- 2022
- Full Text
- View/download PDF
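The abstract above pairs a k-means clustering phase with regression-based GPA prediction but does not spell out how the two are joined. One plausible reading, sketched below, fits a separate regression inside each cluster and routes a new student to the nearest cluster. To stay self-contained, the sketch substitutes single-predictor ordinary least squares for the paper's LassoCV and uses Euclidean k-means rather than DTW; all function names and the toy data are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means; centroids start at the first k vectors (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # single point or constant predictor: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_cluster_regression(profiles, xs, ys, k):
    """Cluster students by profile, then fit one regression per cluster."""
    centroids, labels = kmeans(profiles, k)
    models = {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            models[c] = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    return centroids, models


def predict(centroids, models, profile, x):
    """Route a new student to the nearest fitted cluster and apply its model."""
    c = min(models, key=lambda m: sq_dist(profile, centroids[m]))
    a, b = models[c]
    return a + b * x


# Toy data: two behavioural groups whose predictor-outcome slopes differ.
profiles = [[0, 0], [0, 1], [10, 10], [10, 11]]
centroids, models = fit_cluster_regression(profiles, [1, 2, 1, 2], [2, 3, 10, 8], k=2)
print(predict(centroids, models, [0, 0.2], 3))
```

Per-cluster models pay off when groups differ in how predictors relate to the outcome, as in the toy data; with small clusters, a regularized regressor such as the paper's LassoCV is the safer choice.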
32. Knowledge graph and behavior portrait of intelligent attack against path planning.
- Author
-
Zhang, Li, Li, Zhao, Ren, Huali, Yu, Xiao, Ma, Yuxi, and Zhang, Quanxin
- Subjects
KNOWLEDGE graphs ,AUTONOMOUS vehicles ,ARTIFICIAL intelligence ,REINFORCEMENT learning ,GRAPH algorithms ,REMOTELY piloted vehicles - Abstract
The broad application of artificial intelligence (AI) has exposed a growing number of vulnerabilities, giving adversaries more opportunities to attack AI systems. For example, adversaries may interfere with the path planning of unmanned vehicles, leaving them unable to follow the planned route and even causing serious safety problems. On the other hand, portrait technology can extract highly refined characteristics of different attack strategies, so that unmanned vehicles can defend themselves based on the characteristics of each attack. Existing research lacks studies of intelligent attacks on path planning for unmanned vehicles, as well as portraits of attack behaviors in this scenario. This paper combines multiagent reinforcement learning, time‐series segmentation clustering, and knowledge graph technology to study portraits of adversarial intelligent attack behavior in unmanned vehicle path planning. First, simulation results of unmanned vehicle path planning are obtained, and the steps of adversary attack behavior are extracted using Toeplitz inverse covariance‐based clustering (TICC) for time‐series segmentation. Second, a knowledge graph is used to store the attack strategies, forming an attack behavior portrait for unmanned vehicle path planning. Tests on the Neo4j platform show that our method is general, effectively describes the attack steps in unmanned vehicle path planning, and provides a basis for attack detection when building defense systems for unmanned vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Evaluation and Comparison of Spatial Clustering for Solar Irradiance Time Series.
- Author
-
Garcia-Gutierrez, Luis, Voyant, Cyril, Notton, Gilles, and Almorox, Javier
- Subjects
TIME series analysis ,HIERARCHICAL clustering (Cluster analysis) ,SOLAR radiation ,K-means clustering ,STANDARD deviations ,IRRADIATION - Abstract
This work presents an innovative method for clustering solar radiation stations using static and dynamic parameters, based on multi-criteria analysis, with the future aim of making solar resource forecasting easier. The innovation lies in characterizing solar irradiation from both a quantitative and a qualitative point of view (the variability of intermittent sources). Each of the 76 Spanish stations studied is first characterized by static parameters of its solar radiation distribution (mean, standard deviation, skewness, and kurtosis) and then by dynamic ones (the Hurst exponent and the forecastability coefficient, a new concept that characterizes the "difficulty" of predicting solar radiation intermittence), which have rarely or never been used in such a study. A redundancy analysis shows that, among all the explanatory variables used, three are essential and sufficient to characterize the solar irradiation behavior of each site; thus, in accordance with the principle of parsimony, only the mean and the two dynamic parameters are used. Four clustering methods were applied to identify geographical areas with similar solar irradiation characteristics at a half-hourly time step: hierarchical, k-means, k-medoids, and spectral clustering. The resulting clusters are compared with each other and with an updated Köppen–Geiger climate classification. The relationship between clusterings is analyzed using the Rand and Jaccard indexes. In both cases (five and three classes), the hierarchical clustering algorithm is the closest to the Köppen classification. An evaluation of the clustering algorithms' performance shows no benefit in applying both k-means and spectral clustering, since their results agree by more than 90% for both three and five classes. For clustering solar radiation, the recommendation is to use k-means or hierarchical clustering based on the mean, Hurst exponent, and forecastability parameters.
[ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
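The abstract above compares clusterings with the Rand and Jaccard indexes. Both are pair-counting measures: every pair of stations is checked for whether two labelings place it in the same group. The Rand index counts agreements of both kinds (together in both, apart in both) over all pairs; the Jaccard index divides the pairs that are together in both labelings by the pairs that are together in at least one. A minimal stdlib sketch with hypothetical labels; the function names are illustrative.

```python
from itertools import combinations


def pair_counts(a, b):
    """Classify every point pair by whether labelings a and b group it together.

    Returns (same_both, same_a_only, same_b_only, different_both).
    """
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(a)), 2):
        same_a, same_b = a[i] == a[j], b[i] == b[j]
        if same_a and same_b:
            ss += 1
        elif same_a:
            sd += 1
        elif same_b:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def rand_index(a, b):
    """Fraction of pairs on which the two labelings agree."""
    ss, sd, ds, dd = pair_counts(a, b)
    return (ss + dd) / (ss + sd + ds + dd)


def jaccard_index(a, b):
    """Pairs together in both labelings over pairs together in at least one."""
    ss, sd, ds, _ = pair_counts(a, b)
    denom = ss + sd + ds
    return ss / denom if denom else 1.0


# e.g. hierarchical-clustering labels vs. climate classes for four stations
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]), jaccard_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Both indexes ignore the label names themselves, so a clustering and a relabelled copy of it score 1.0. Because Jaccard discards pairs that are apart in both labelings, it is usually lower than Rand when there are many small clusters.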
34. Offline Radar Pulse Track Association with Deinterleaving Errors
- Author
-
Jakobsson, Valdemar
- Abstract
Radar systems play a crucial role in detecting the position and velocity of objects by emitting electromagnetic signals into the environment. In electronic warfare, it is essential to identify which pulses from pulsed radar systems originate from the same system, in order to track the activity of the radar system. This problem can be solved by deinterleaving the radar pulses, a process that aims to sort the received pulses within a window into different tracks based on which radar system generated them. However, deinterleaving typically does not process the entire recording at once, so it produces sets of partial sequences. Additionally, deinterleaving is prone to errors, necessitating solutions that associate sequences to form complete tracks and correct the errors. This thesis investigates time-series clustering as a solution to this problem, proposing three distinct algorithms: Mixture of Hidden Markov Models Clustering, a Fisher-Rao distance-based clustering algorithm, and a Long Short-Term Memory-based Autoencoder. These algorithms represent model-based, distance-based, and representation-based approaches to time-series clustering, respectively. Additionally, a two-step process is proposed for error correction: identifying errors within each track using Out-of-Distribution detection, followed by inserting the misassigned pulses into the correct track with a classifier. Among the algorithms studied, Mixture of Hidden Markov Models Clustering demonstrates the best performance for sequence association, albeit with high processing costs. However, Fisher-Rao and the Long Short-Term Memory-based Autoencoder also yield promising results, offering potential for extension to online applications.
For error correction, a combination of an Isolation Forest followed by a Decision Tree was observed to perform well under these circumstances.
Radar systems play a decisive role in detecting the position and velocity of objects. This is done by transmitting an electromagnetic signal into the surroundings, which then returns to the radar system. However, this radar signal can also be picked up by a passive radar. In electronic warfare, it is important to be able to determine which detected pulses were emitted by the same pulsed radar system. This problem can be solved by a so-called deinterleaving process, which sorts the incoming pulses according to the radar system that emitted them. However, deinterleaving typically does not produce complete sequences for a radar system, and errors can arise within the deinterleaving process. It is therefore important to be able to build complete sequences from the incomplete ones and to correct the errors of the deinterleaving process. This report investigates time-series clustering as a potential solution to this problem. Three different algorithms have been developed for this purpose: Mixture of Hidden Markov Models Clustering, a Fisher-Rao distance-based clustering algorithm, and a Long Short-Term Memory-based Autoencoder. These algorithms represent different ways of solving a time-series clustering problem: model-based, distance-based, and representation-based. In addition, a two-step process is presented for correcting the deinterleaving errors. First, the errors in each final sequence are identified with Out-of-Distribution detection, after which the detected pulses are inserted into the correct sequence using a classifier. Among the algorithms investigated, Mixture of Hidden Markov Models Clustering gave the best performance for the time-series clustering, although this algorithm suffers from a high processing cost.
However, Fisher-Rao and the Long Short-Term Memory-based Autoencoder also performed well, which may offer potential solutions for real-time extensions of the problem. For the deinterleaving error correction, a combination of Isolation For
- Published
- 2024
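The thesis abstract names a Fisher-Rao distance-based clustering algorithm but does not say which distributions are compared. As an illustration only, the sketch below assumes each sequence is summarised as a categorical histogram of a quantised pulse feature, for which the Fisher-Rao distance has a closed form; the helper names, bins, and fragments are hypothetical.

```python
import math
from collections import Counter


def histogram(values, bins):
    """Normalised histogram of discrete values over a fixed list of bins."""
    counts = Counter(values)
    total = sum(counts[b] for b in bins)
    if total == 0:
        raise ValueError("no values fall into the given bins")
    return [counts[b] / total for b in bins]


def fisher_rao_categorical(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions.

    Closed form: 2 * arccos(sum_i sqrt(p_i * q_i)), which lies in [0, pi].
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same bins")
    bc = sum(math.sqrt(pa * qa) for pa, qa in zip(p, q))  # Bhattacharyya coefficient
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding just above 1


# Hypothetical quantised pulse features from three track fragments.
bins = [1, 2, 3, 4]
frag_a = histogram([1, 1, 2, 2, 2, 3], bins)
frag_b = histogram([1, 2, 2, 2, 3, 3], bins)
frag_c = histogram([4, 4, 4, 3], bins)
print(fisher_rao_categorical(frag_a, frag_b) < fisher_rao_categorical(frag_a, frag_c))  # True
```

Pairwise distances computed this way could feed any distance-based clustering routine; continuous parametric models (for example Gaussian pulse features) would need a different closed form.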
35. Unsupervised Visual Time-Series Representation Learning and Clustering
- Author
-
Anand, Gaurangi, Nayak, Richi, Filipe, Joaquim, Editorial Board Member, Ghosh, Ashish, Editorial Board Member, Prates, Raquel Oliveira, Editorial Board Member, Zhou, Lizhu, Editorial Board Member, Yang, Haiqin, editor, Pasupa, Kitsuchart, editor, Leung, Andrew Chi-Sing, editor, Kwok, James T., editor, Chan, Jonathan H., editor, and King, Irwin, editor
- Published
- 2020
- Full Text
- View/download PDF
36. Trade network dynamics in a globalized environment and on the edge of crises.
- Author
-
Kosztyán, Zsolt Tibor, Kiss, Dénes, and Fehérvölgyi, Beáta
- Subjects
*TECHNOLOGICAL innovations , *GLOBAL Financial Crisis, 2008-2009 , *CRISES , *ELECTRONIC equipment , *INTERNATIONAL trade , *SUSTAINABILITY
- Abstract
At the edges of crises, the roles of trade networks and supply chains are evident. Global trade networks aggregate all legal trade activities. Thus, researchers believe that network structure changes reflect phenomena such as crises, globalization, deglobalization, or even technological development, including those related to green technologies. Although trade network analysis has been widely researched, researchers have focused mainly on one or a few products or product groups. While the impact of crises and technological changes in the case of individual products has been demonstrated in some studies, less attention has been given to structural pattern changes in trade on these products. To understand how a shock, crisis, or technological change in one product may affect the trade network of another product, the patterns and relationships of node- and network-level indicators across products need to be identified. In this study, the dynamics and interdependencies of trade networks are analyzed using a combination of network indicators, cluster analysis and causality analysis. Variations and implications of the temporal patterns and causal relationships of trade network indicators are revealed for different products and product groups. Researchers have argued that the deterioration of structural indicators exacerbates crises and amplifies their consequences. A causality analysis shows that disruptions in one product's supply chain quickly affect the trade of other products, altering the structural characteristics of trade networks. The proposed temporal pattern analysis shows that while the 2008 financial crisis did not cause structural changes in the product trade network, indicators improved before, during, and after the crisis. Nevertheless, before the pandemic, there was a noticeable deterioration in almost all structural indicators, signaling more deglobalization. 
This study contributes to a comprehensive understanding of trade networks, equipping decision-makers with the knowledge to promote the resilience and sustainability of these networks in an evolving global landscape.
• Frameworks for databases of trade network indicators are proposed.
• Nonparametric methods for classifying trade network dynamics are employed.
• Causality tests for identifying trade shocks and crises are employed.
• Network characteristics and technological changes are compared.
• The trade patterns of electronic and military equipment are analyzed.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Constructing a Large-Scale Urban Land Subsidence Prediction Method Based on Neural Network Algorithm from the Perspective of Multiple Factors.
- Author
-
Zhou, Dingyi, Zuo, Xiaoqing, and Zhao, Zhifang
- Subjects
*LAND subsidence , *HYDROGEOLOGY , *STANDARD deviations , *SOIL structure , *ARTIFICIAL neural networks
- Abstract
The existing neural network model in urban land-subsidence prediction is over-reliant on historical subsidence data. It cannot accurately capture or predict the fluctuation in the sequence deformation, while the improper selection of training samples directly affects its final prediction accuracy for large-scale urban land subsidence. In response to the shortcomings of previous urban land-subsidence predictions, a subsidence prediction method based on a neural network algorithm was constructed in this study, from a multi-factorial perspective. Furthermore, the scientific selection of a large range of training samples was controlled using a K-shape clustering algorithm in order to produce this high-precision urban land subsidence prediction method. Specifically, the main urban area of Kunming city was taken as the research object, LiCSBAS technology was adopted to obtain the information on the land-subsidence deformation in the main urban area of Kunming city from 2018–2021, and the relationship between the land subsidence and its influencing factors was revealed through a grey correlation analysis. Hydrogeology, geological structure, fault, groundwater, high-speed railways, and high-rise buildings were selected as the influencing factors. Reliable subsidence training samples were obtained by using the time-series clustering K-shape algorithm. Particle swarm optimization–back propagation (PSO-BP) was constructed from a multi-factorial perspective. Additionally, after the neural network algorithm was employed to predict the urban land subsidence, the fluctuation in the urban land-subsidence sequence deformation was predicted with the LSTM neural network from a multi-factorial perspective. Finally, the large-scale urban land-subsidence prediction was performed. The results demonstrate that the maximum subsidence rate in the main urban area of Kunming reached −30.591 mm·a⁻¹ between 2018 and 2021.
Moreover, there were four main significant subsidence areas in the whole region, with uneven distribution characteristics along Dianchi: within the range of 200–600 m from large commercial areas and high-rise buildings, within the range of 400–1200 m from the under-construction subway, and within the annual average. The land subsidence tended to occur within the range of 109–117 mm of annual average rainfall. Furthermore, the development of faults destroys the stability of the soil structure and further aggravates the land subsidence. Hydrogeology, geological structure, and groundwater also influence the land subsidence in the main urban area of Kunming. The reliability of the training sample selection can be improved by clustering the subsidence data with the K-shape algorithm, and the constructed multi-factorial PSO-BP method can effectively predict the subsidence rate with a mean squared error (MSE) of 4.820 mm. The prediction accuracy was slightly improved compared to the non-clustered prediction. We used the constructed multi-factorial long short-term memory (LSTM) model to predict the next ten periods of any time-series subsidence data in the three types of cluster data (Cluster 1, Cluster 2, and Cluster 3). The root mean square errors (RMSE) were 0.445, 1.475, and 1.468 mm; the absolute error ranges were 0.007–1.030, 0–3.001, and 0.401–3.679 mm; the errors (mean absolute error, MAE) were 0.319, 1.214, and 1.167 mm, respectively. Their prediction accuracy was significantly improved, and the predictions met the measurement specifications. Overall, the prediction method proposed from the multi-factorial perspective improves large-scale, high-accuracy urban land-subsidence prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
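The abstract relies on the K-shape algorithm to select training samples. k-Shape compares z-normalised series with a shape-based distance (SBD) derived from normalised cross-correlation, which tolerates shifts in time. The sketch below computes only that distance, evaluated directly rather than via the FFT, and omits k-Shape's centroid-extraction step; the example series are made up.

```python
import math


def z_normalise(x):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)
    return [(v - mean) / std for v in x]


def cross_correlation(x, y, shift):
    """Sum of x[i] * y[i - shift] over overlapping indices (zero padding elsewhere)."""
    m = len(x)
    return sum(x[i] * y[i - shift] for i in range(max(0, shift), min(m, m + shift)))


def shape_based_distance(x, y):
    """Shape-based distance: 1 minus the largest normalised cross-correlation.

    Values lie in [0, 2]; smaller means more similar shapes under some alignment.
    Direct O(m^2) evaluation; k-Shape itself uses the FFT for speed.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-length series")
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return 1.0  # hypothetical convention for all-zero series
    m = len(x)
    best = max(cross_correlation(x, y, s) for s in range(-(m - 1), m))
    return 1.0 - best / norm


# Hypothetical series: a peak, the same peak one step later, and a falling trend.
a = z_normalise([0, 0, 1, 5, 1, 0, 0, 0])
b = z_normalise([0, 0, 0, 1, 5, 1, 0, 0])
c = z_normalise([5, 4, 3, 2, 1, 0, -1, -2])
print(shape_based_distance(a, b), shape_based_distance(a, c))
```

The shifted peak stays close to the original under SBD, while the trend does not; a plain point-by-point Euclidean distance would penalise the one-step shift much more heavily.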
38. Information Granulation-Based Fuzzy Clustering of Time Series.
- Author
-
Guo, Hongyue, Wang, Lidong, Liu, Xiaodong, and Pedrycz, Witold
- Abstract
In this article, we propose a two-stage time-series clustering approach to cluster time series with different shapes. The first step is to represent the time series by a suite of information granules following the principle of justifiable granularity to perform dimensionality reduction, while the second step is to realize the fuzzy clustering of the time series in the transformed representation space (viz., the space of information granules). In the dimensionality reduction process, the numerical data are granulated using a collection of information granules, forming a new sequence that describes the original time series well. Then, when clustering the time series, dynamic time warping (DTW) is employed to measure the similarity between time series, and DTW barycenter averaging (DBA) is generalized to a weighted DBA for use within the fuzzy C-means (FCM) algorithm. Finally, experiments are conducted on datasets from the UCR time-series database and on Chinese stock data to demonstrate the effectiveness and advantages of the proposed fuzzy clustering approach. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
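The abstract combines DTW with fuzzy C-means. The sketch below shows only the two generic ingredients: a DTW distance and one FCM membership update against fixed prototypes, treating DTW as the distance in the standard membership formula. The information-granule representation and the weighted DBA prototype update from the paper are not reproduced, and all names and data are hypothetical.

```python
def dtw_distance(x, y):
    """Dynamic time warping with absolute-difference cost and no warping window."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fcm_memberships(series, prototypes, m=2.0):
    """One fuzzy C-means membership update, using DTW as the distance.

    u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)), where d_ki is the DTW distance
    from series k to prototype i and m > 1 is the fuzzifier.
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for s in series:
        dists = [dtw_distance(s, p) for p in prototypes]
        zero = [i for i, d in enumerate(dists) if d == 0]
        if zero:
            # Zero distance: share full membership among the matching prototypes.
            row = [1.0 / len(zero) if i in zero else 0.0 for i in range(len(prototypes))]
        else:
            row = [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]
        memberships.append(row)
    return memberships


prototypes = [[0, 1, 2, 3], [3, 2, 1, 0]]  # hypothetical rising and falling shapes
series = [[0, 0, 1, 2, 3], [3, 3, 2, 1, 0], [1, 1, 1, 1]]
print(fcm_memberships(series, prototypes))  # [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
```

In a full FCM loop, this membership step would alternate with a prototype update, which is where the paper's weighted DBA would come in.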
39. Cluster Analysis on Dengue Incidence and Weather Data Using K-Medoids and Fuzzy C-Means Clustering Algorithms (Case Study: Spread of Dengue in the DKI Jakarta Province)
- Author
-
Cindy Cindy, Cynthia Cynthia, Valentino Vito, Devvi Sarwinda, Bevina Desjwiandra Handari, and Gatot Fatwanto Hertono
- Subjects
Dengue ,Dynamic Time Warping distance ,Fuzzy C-Means Clustering ,K-Medoids Clustering ,time-series clustering ,Science ,Science (General) ,Q1-390
- Abstract
In Indonesia, Dengue incidence tends to increase every year but has fluctuated in recent years. The potential for Dengue outbreaks in DKI Jakarta, the capital city, deserves serious attention. Weather factors are suspected of being associated with the incidence of Dengue in Indonesia. This research used weather and Dengue incidence data for five regions of DKI Jakarta, Indonesia, from December 30, 2008, to January 2, 2017. The study applied K-Medoids and Fuzzy C-Means clustering to both time-series and non-time-series data. The clustering results for the non-time-series data showed a positive correlation between the number of Dengue incidents and both average relative humidity and amount of rainfall. However, Dengue incidence and average temperature were negatively correlated. Moreover, the clustering of the time-series data showed that rainfall patterns most closely resembled those of Dengue incidence. Therefore, rainfall can be used to estimate Dengue incidence. Both results suggest that the government could use weather data to predict possible spikes in dengue hemorrhagic fever (DHF) incidence, especially at the start of the rainy season, and alert the public to a greater probability of a Dengue outbreak.
- Published
- 2022
- Full Text
- View/download PDF
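The study clusters regional weather and Dengue series with K-Medoids, and its subject terms mention the DTW distance. Because k-medoids only needs pairwise distances, it can run on a precomputed DTW matrix. The sketch below is a simple alternating (Voronoi-iteration) variant, which may differ from the K-Medoids implementation the authors used; the toy distances are hypothetical one-dimensional values.

```python
def k_medoids(dist, k, init, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    dist: square list of lists; init: k distinct starting medoid indices.
    Returns (medoids, labels), where labels[i] indexes into medoids.
    """
    n = len(dist)
    medoids = list(init)
    for _ in range(max_iter):
        # Assignment step: every item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: each cluster's new medoid is the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda j: sum(dist[j][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Hypothetical 1-D values standing in for regional series; |a - b| plays the role of DTW.
values = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels = k_medoids(dist, k=2, init=[0, 1])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Replacing the toy matrix with DTW distances between regional series gives medoids that are actual observed series, which is convenient when interpreting each cluster.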
40. Data-driven approach for labelling process plant event data
- Author
-
Debora Corrêa, Adriano Polpo, Michael Small, Shreyas Srikanth, Kylie Hollins, and Melinda Hodkiewicz
- Subjects
time-series clustering ,imbalanced data ,process plant data ,event detection ,labelling strategies ,absence of ground truth data ,Engineering machinery, tools, and implements ,TA213-215 ,Systems engineering ,TA168
- Abstract
An essential requirement in any data analysis is to have a response variable representing the aim of the analysis. Much academic work is based on laboratory or simulated data, where the experiment is controlled and the ground truth clearly defined. This is seldom the reality for equipment performance in an industrial environment, and it is common to find issues with the response variable in industrial situations. We discuss this matter using a case study where the problem is to detect an asset event (failure) using the available data, for which no ground truth is available from historical records. Our data frame contains measurements of 14 sensors recorded every minute from a process control system and 4 current motors on the asset of interest over a three-year period. In this situation, how to label the event of interest is of fundamental importance. Different labelling strategies will generate different models, with direct impact on the in-service fault detection efficacy of the resulting model. We discuss a data-driven approach to labelling a binary response variable (fault/anomaly detection) and compare it to a rule-based approach. Labelling of the time series was performed using dynamic time warping followed by agglomerative hierarchical clustering to group events with similar event dynamics. Both data sets are highly imbalanced, with 1,200,000 non-event records but only 150 events in the rule-based data set and 64 events in the data-driven data set. We study the performance of the models based on these two labelling strategies, treating each data set independently. We describe decisions made in window-size selection, managing imbalance, hyper-parameter tuning, and training and test selection, and use two models, logistic regression and random forest, for event detection. We obtain useful models for both data sets, in the sense that they detected events during the first four months of the test set.
However, as the months progressed, the performance of both models deteriorated, with an increasing number of false positives, reflecting possible changes in the dynamics of the system. This work raises questions such as "what are we detecting?" and "is there a right way to label?" and presents a data-driven approach to support labelling of historical events in process plant data for event detection in the absence of ground truth data.
- Published
- 2022
- Full Text
- View/download PDF
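The labelling step groups candidate events with dynamic time warping followed by agglomerative hierarchical clustering. The abstract does not state the linkage or the stopping rule, so the sketch below assumes average linkage and a fixed number of clusters; the event windows are invented for illustration.

```python
def dtw_distance(x, y):
    """Dynamic time warping with absolute-difference cost and no warping window."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical event windows: three shifted spikes and two nearly flat segments.
events = [[0, 1, 5, 1, 0], [0, 0, 1, 5, 1], [0, 5, 1, 0, 0], [2, 2, 2, 2, 2], [2, 2, 3, 2, 2]]
dist = [[dtw_distance(a, b) for b in events] for a in events]
print(average_linkage(dist, n_clusters=2))  # [[0, 1, 2], [3, 4]]
```

DTW lets the three spikes group together even though their peaks occur at different times, which is the property that makes it useful for grouping events with similar dynamics.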
41. Neutral genetic structuring of pathogen populations during rapid adaptation.
- Author
-
Saubin ME, Stoeckel S, Tellier AE, and Halkett F
- Abstract
Pathogen species experience strong joint demographic and selective events, especially when they adapt to a new host, for example by overcoming plant resistance. Stochasticity in the founding event and the associated demographic variations hinder our understanding of the expected evolutionary trajectories and of the genetic structure emerging at both neutral and selected loci. The typical genetic signatures of such a rapid adaptation event have not yet been elucidated. Here, we build a demogenetic model to monitor pathogen population dynamics and genetic evolution on two host compartments (susceptible and resistant). We design our model to fit two plant pathogen life cycles, 'with' and 'without' host alternation. Our aim is to draw a typology of eco-evolutionary dynamics. Using time-series clustering, we identify three main scenarios: 1) small variations in the pathogen population size and small changes in genetic structure, 2) a strong founder event on the resistant host that in turn leads to the emergence of genetic structure on the susceptible host, and 3) evolutionary rescue that results in a strong founder event on the resistant host, preceded by a bottleneck on the susceptible host. We pinpoint differences between life cycles, with notably more evolutionary rescue 'with' host alternation. Beyond the selective event itself, the demographic trajectory imposes specific changes in the genetic structure of the pathogen population. Most of these genetic changes are transient, with a signature of resistance overcoming that vanishes within only a few years. Considering time series is therefore of utmost importance to accurately decipher pathogen evolution. (© The Author(s) 2024. Published by Oxford University Press on behalf of The American Genetic Association. All rights reserved.)
- Published
- 2024
- Full Text
- View/download PDF
42. Time-Series Clustering Based on the Characterization of Segment Typologies.
- Author
-
Guijo-Rubio, David, Duran-Rosal, Antonio Manuel, Gutierrez, Pedro Antonio, Troncoso, Alicia, and Hervas-Martinez, Cesar
- Abstract
Time-series clustering is the process of grouping time series with respect to their similarity or characteristics. Previous approaches usually combine a specific distance measure for time series and a standard clustering method. However, these approaches do not take the similarity of the different subsequences of each time series into account, which can be used to better compare the time-series objects of the dataset. In this article, we propose a novel technique of time-series clustering consisting of two clustering stages. In a first step, a least-squares polynomial segmentation procedure is applied to each time series, which is based on a growing window technique that returns different-length segments. Then, all of the segments are projected into the same dimensional space, based on the coefficients of the model that approximates the segment and a set of statistical features. After mapping, a first hierarchical clustering phase is applied to all mapped segments, returning groups of segments for each time series. These clusters are used to represent all time series in the same dimensional space, after defining another specific mapping process. In a second and final clustering stage, all the time-series objects are grouped. We consider internal clustering quality to automatically adjust the main parameter of the algorithm, which is an error threshold for the segmentation. The results obtained on 84 datasets from the UCR Time Series Classification Archive have been compared against three state-of-the-art methods, showing that the performance of this methodology is very promising, especially on larger datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
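The first stage of the proposed method segments each series with a growing window and a least-squares polynomial fit, cutting when an error threshold is exceeded. The sketch below assumes a degree-1 polynomial and uses the root-mean-square residual as the error; the paper's exact error definition and the later feature mapping and two clustering stages are not reproduced.

```python
def linear_fit_rmse(y, start, end):
    """Least-squares line through y[start:end]; returns the root-mean-square residual."""
    xs = range(start, end)
    n = end - start
    mean_x = sum(xs) / n
    mean_y = sum(y[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y[x] - mean_y) for x in xs) / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y[x] - (intercept + slope * x)) ** 2 for x in xs)
    return (sse / n) ** 0.5


def growing_window_segments(y, threshold, min_len=2):
    """Cut y into consecutive (start, end) segments, end exclusive.

    Each segment starts with min_len points and keeps absorbing the next point
    while a straight-line fit stays within the RMSE threshold.
    """
    segments = []
    start = 0
    while start < len(y):
        end = min(start + min_len, len(y))
        while end < len(y) and linear_fit_rmse(y, start, end + 1) <= threshold:
            end += 1
        segments.append((start, end))
        start = end
    return segments


print(growing_window_segments([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], threshold=0.1))
# [(0, 6), (6, 11)]
```

Lowering the threshold produces more, shorter segments, so the threshold trades off segment count against fit quality; the abstract notes that the authors tune this parameter automatically using internal clustering quality.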
43. HierArchical-Grid CluStering Based on DaTA Field in Time-Series and the Influence of the First-Order Partial Derivative Potential Value for the ARIMA-Model
- Author
-
Jinklub, Krid, Geng, Jing, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Gan, Guojun, editor, Li, Bohan, editor, Li, Xue, editor, and Wang, Shuliang, editor
- Published
- 2018
- Full Text
- View/download PDF
44. Clustering River Basins Using Time-Series Data Mining on Hydroelectric Energy Generation
- Author
-
Arslan, Yusuf, Küçük, Dilek, Eren, Sinan, Birturk, Aysenur, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Weikum, Gerhard, Series Editor, Woon, Wei Lee, editor, Aung, Zeyar, editor, Catalina Feliú, Alejandro, editor, and Madnick, Stuart, editor
- Published
- 2018
- Full Text
- View/download PDF
45. Driver Classification Using Self-reported, Psychophysiological, and Performance Metrics Within a Simulated Environment
- Author
-
Kummetha, Vishal Chandra, Durrani, Umair, Mason, Justin, Concas, Sisinnio, and Kondyli, Alexandra
- Published
- 2023
- Full Text
- View/download PDF
46. A Framework for Generalizing Uncertainty in Mobile Network Traffic Prediction
- Author
-
Downey, Alexander Roman
- Subjects
- Network traffic prediction, time-series forecasting, time-series clustering, deep learning
- Abstract
As Next Generation (NextG) networks become more complex, it has become increasingly necessary to use more advanced algorithms to enhance the robustness, autonomy, and reliability of existing wireless infrastructure. One such algorithm is network traffic prediction, which plays a crucial role in the efficient operation of real-time and near-real-time network management. The contributions of this thesis are twofold. The first is a novel cluster-train-predict framework that leverages domain knowledge to identify unique time-series sub-behaviors within aggregates of network data. This method produces distributions that are more robust to changes in the spatio-temporal environment. The ensemble of time-series prediction models trained on these distributions possesses a greater affinity towards accurate network prediction, selectively employing learned behaviors to handle sources of time-series data without any prior knowledge of them. This approach tends to improve the ability to accurately forecast network traffic volumes. Second, this thesis explains the development and implementation of a modular data pipeline to support the cluster-train-predict framework under a variety of conditions. This setup promotes repeatable and comparable results, facilitating rapid iteration and experimentation on current and future research. The results of this thesis surpass traditional approaches in [1] by up to 60%. Furthermore, the effectiveness of this framework is also validated using two additional time-series datasets [2] and [3], demonstrating the ability of this approach to generalize to other time-series data and machine learning applications in uncertain environments.
- Published
- 2024
47. Time-series data clustering with load-shape preservation for identifying residential energy consumption behaviors.
- Author
-
Kim, Jinwoo, Song, Kwonsik, Lee, Gaang, and Lee, SangHyun
- Abstract
Categorizing residential energy demand patterns is a principal task for demand-side management (DSM) and energy-saving strategies. While deep learning (DL)-based clustering offers a promising alternative to conventional machine learning (ML), DL's advantages and disadvantages over ML remain unclear for identifying energy demand patterns. Moreover, prevalent DL-based clustering can suffer from catastrophic feature distortion when capturing load-shape information from energy-load data, leading to erroneous pattern identification. To address these issues, we propose integrating a load-shape preservation mechanism into representative DL-based clustering and investigate its effectiveness in categorizing energy demand patterns, compared to existing ML and DL. We experiment with and compare the three clustering approaches using one year of residential energy-load data. Results show that the proposed DL, equipped with load-shape preservation, outperformed ML quantitatively and closely aligned with the baseline DL's performance. This is particularly significant considering that the baseline DL prioritizes quantitative enhancements, sometimes compromising load-shape precision. Furthermore, the proposed DL discovered more diverse energy demand patterns than the baseline ML and DL, while producing more human-agreeable results. This finding underscores the pivotal role of load-shape preservation in enhancing data clustering and demand pattern recognition in the real world. These benefits will facilitate personalized DSM interventions and foster residents' energy-saving behaviors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. A multiple k-means cluster ensemble framework for clustering citation trajectories.
- Author
-
Chakraborty, Joyita, Pradhan, Dinesh K., and Nandi, Subrata
- Subjects
K-means clustering ,TIME series analysis ,CITATION networks ,INDIVIDUAL differences
- Abstract
Citation maturity time varies for different articles. However, the impact of all articles is measured in a fixed window (2-5 years). Clustering their citation trajectories helps understand the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Moreover, clustering trajectories is necessary for paper impact recommendation algorithms. It is a challenging problem because citation time series exhibit significant variability due to their non-linear and non-stationary characteristics. Prior works propose a set of arbitrary thresholds and a fixed rule-based approach. All methods are primarily parameter-dependent, which leads to inconsistencies in defining similar trajectories and ambiguities regarding their number. Most studies only capture extreme trajectories. Thus, a generalized clustering framework is required. This paper proposes a feature-based multiple k-means cluster ensemble framework. Unlike single clustering algorithms, it trains multiple learners to evaluate the credibility of class labels. 195,783 and 41,732 well-cited articles from the Microsoft Academic Graph data are considered for clustering short-term (10-year) and long-term (30-year) trajectories, respectively. The framework has linear run time. Four distinct trajectories are obtained: Early Rise-Rapid Decline (ER-RD) (2.2%), Early Rise-Slow Decline (ER-SD) (45%), Delayed Rise-Not yet Declined (DR-ND) (53%), and Delayed Rise-Slow Decline (DR-SD) (0.8%). Individual trajectory differences for two different spans are studied. Most papers exhibit ER-SD and DR-ND patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of individual trajectories are redefined empirically. A detailed comparative study reveals that our proposed methodology can detect all distinct trajectory classes.
• Introduces an innovative unsupervised Multiple K-Means Cluster Ensemble (MKMCE) framework for citation trajectory clustering.
• Algorithmically determines the ideal number of clusters, eliminating the need for predefined labels and providing a data-driven approach.
• Proposes a feature set that uniformly defines all trajectory patterns, aiding in accurate and consistent feature analysis.
• Addresses and analyzes redundant interpretations of similar trajectories identified as different clusters, contributing to clearer nomenclature in the literature.
• Offers practical applications, including the automated creation of a taxonomy based on inter-cluster similarity, facilitating a comprehensive understanding of trajectory patterns.
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
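The abstract proposes combining multiple k-means learners but does not detail how their labels are merged or how the number of clusters is chosen. As a stand-in, the sketch below uses a generic co-association (evidence accumulation) consensus: several k-means runs with different seeds and k, a matrix of how often each pair of items shares a cluster, and connected components at or above a threshold. This is not the authors' MKMCE procedure, and the two-dimensional trajectory features are hypothetical.

```python
import random


def kmeans(points, k, seed, n_iter=50):
    """Plain Lloyd's k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(label_runs):
    """Fraction of runs in which each pair of items shares a cluster."""
    n = len(label_runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in label_runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[cnt / len(label_runs) for cnt in row] for row in counts]


def consensus_clusters(co, threshold=0.5):
    """Connected components of the graph linking pairs with co-association >= threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical per-article features, e.g. [years to peak, decay rate].
features = [[1.0, 0.9], [1.2, 0.8], [0.9, 1.0], [8.0, 0.1], [8.5, 0.2], [7.8, 0.15]]
runs = [kmeans(features, k, seed) for k in (2, 3) for seed in range(5)]
print(consensus_clusters(co_association(runs)))  # [0, 0, 0, 1, 1, 1]
```

Pooling runs with different k and seeds means a pair is only linked when many learners agree, which is one common way to make consensus labels less sensitive to any single k-means initialisation.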
49. Evaluation and Comparison of Spatial Clustering for Solar Irradiance Time Series
- Author
-
Luis Garcia-Gutierrez, Cyril Voyant, Gilles Notton, and Javier Almorox
- Subjects
solar irradiation ,data mining ,time-series clustering ,artificial intelligence ,statistics methods ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999
- Abstract
This work presents an innovative method for clustering solar radiation stations using static and dynamic parameters, based on multi-criteria analysis, with the longer-term aim of making solar resource forecasting easier. The innovation lies in characterizing solar irradiation from both a quantitative point of view and a qualitative one (variability of the intermittent sources). Each of the 76 Spanish stations studied is first characterized by static parameters of solar radiation distributions (mean, standard deviation, skewness, and kurtosis) and then by dynamic ones (the Hurst exponent and the forecastability coefficient, a new concept that characterizes the “difficulty” of predicting solar radiation intermittence), which have rarely, if ever, been used in such a study. A redundancy analysis shows that, among all the explanatory variables used, three are essential and sufficient to characterize the solar irradiation behavior of each site; thus, in accordance with the principle of parsimony, only the mean and the two dynamic parameters are used. Four clustering methods were applied to identify geographical areas with similar solar irradiation characteristics at a half-hour time step: hierarchical, k-means, k-medoids, and spectral clustering. The resulting clusters are compared with each other and with an updated Köppen–Geiger climate classification. The agreement between clusterings is analyzed using the Rand and Jaccard indexes. For both cases (five and three classes), the hierarchical clustering algorithm is the closest to the Köppen classification. An evaluation of the clustering algorithms’ performance shows no benefit in implementing both k-means and spectral clustering, since their results agree by more than 90% for both three and five classes. For clustering solar radiation stations, the recommendation is to use k-means or hierarchical clustering based on the mean, the Hurst exponent, and the forecastability coefficient.
- Published
- 2022
- Full Text
- View/download PDF
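The abstract uses the Hurst exponent as one of the two dynamic parameters but does not say how it was estimated. The sketch below uses classical rescaled-range (R/S) analysis, one of several common estimators; the window sizes and the synthetic data are assumptions. The forecastability coefficient is not sketched because the abstract does not define it.

```python
import itertools
import math
import random


def rescaled_range(chunk):
    """R/S of one chunk: range of cumulative mean-adjusted sums divided by the std."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative, running = [], 0.0
    for v in chunk:
        running += v - mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)
    s = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    return r / s if s > 0 else None


def hurst_rs(series, min_window=8):
    """Hurst exponent as the log-log slope of mean R/S against window size."""
    log_n, log_rs = [], []
    n = min_window
    while n <= len(series) // 2:
        # Average R/S over non-overlapping chunks of length n.
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                values.append(rs)
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
        n *= 2
    if len(log_n) < 2:
        raise ValueError("series too short for the chosen min_window")
    # Ordinary least-squares slope of log(R/S) on log(n).
    mean_x = sum(log_n) / len(log_n)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_n, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_n)
    return num / den


# Synthetic check: white noise versus its cumulative sum (a random walk).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk = list(itertools.accumulate(noise))
print(hurst_rs(noise), hurst_rs(walk))
```

Values near 0.5 indicate little memory and values above 0.5 indicate persistence. R/S estimates from short windows are known to be biased upward, so values are best compared across stations estimated in the same way.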
50. Scenario Reduction for Stochastic Day-Ahead Scheduling: A Mixed Autoencoder Based Time-Series Clustering Approach.
- Author
-
Liang, Junkai and Tang, Wenyuan
- Abstract
Scenario-based stochastic scheduling has drawn a tremendous amount of interest worldwide for tackling the uncertainty of renewable energy and accounting for risks. It is important to generate representative time-series scenarios of renewable energy while keeping the dimensionality of the scenario set tractable. This article presents a mixed autoencoder-based clustering approach to select a reduced scenario set from high-dimensional time series. In contrast to other techniques that target minimizing different probability distances, the proposed architecture accounts for pattern recognition within a large set of scenarios. The effectiveness of the model is verified in case studies using data sets from the Bonneville Power Administration and Elia. The numerical results show that the model outperforms the state of the art in terms of statistical metrics and through empirical analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library