1,932 results on '"Self organizing maps"'
Search Results
2. The use of weighted self-organizing maps to interrogate large seismic data sets.
- Author
-
Meyer, S G, Reading, A M, and Bassom, A P
- Subjects
- *
BIG data , *SELF-organizing maps , *MICROSEISMS , *SIGNAL classification , *AUTOMATIC classification , *MACHINE learning , *DATA mining - Abstract
Modern microseismic monitoring systems can generate extremely large data sets with signals originating from a variety of natural and anthropogenic sources. These data sets may contain multiple signal types that require classification, analysis and interpretation: a considerable task if done manually. Machine learning techniques may be applied to these data sets to expedite and improve such analysis. In this study, we apply an unsupervised technique, the Self-Organizing Map (SOM), to high-volume data recorded by an in-mine microseismic network. This represents a good example of a large seismic data set that contains a wide range of signals, owing to the diversity of source processes occurring within the mine. The signals are quantified by extracting a number of features (temporal and spectral) from the waveforms which are provided as input data for the SOM. We develop and implement a weighted variant of the SOM in which the contributions of various different features to the training of the map are allowed to evolve. The standard and weighted SOMs are applied to the data, and the output maps compared. Both variants are able to separate source types based on the waveform characteristics, allowing for rapid, automatic classification of signals and the ability to find sources with similar waveforms. Fast classification of such signals provides practical benefit by automatically discarding waveforms associated with anthropogenic sources within the mine while seismic signals originating from genuine microseismic events, which constitute a small fraction of all signals, can be prioritized for subsequent processing and analysis. The weighted variant provides an exploratory tool through quantification of the contribution of different features to the clustering process. This helps to optimize the performance of the SOM through the identification of redundant features. 
Furthermore, those features that are assigned large weights are considered to be more representative of the source generation processes as they contribute more to the cluster separation process. We apply weighted SOMs to data from a mine recorded during two different time periods, corresponding to different stages of the mine development. Changes in feature importance and in the observed distribution of feature values indicate evolving source generation processes and may be used to support investigatory analysis. The weighted SOM therefore represents an effective tool to help manage and investigate large seismic data sets, providing both practical benefit and insight into underlying event mechanisms. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
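The abstract above describes a SOM variant in which features contribute unequally to training. As a rough illustration only, the sketch below shows how per-feature weights can change which map node is selected as the best matching unit. The function name `weighted_bmu`, the toy codebook, and the fixed weights are assumptions for this example; the abstract does not say how the authors' weights evolve during training.

```python
def weighted_bmu(sample, codebook, feature_weights):
    """Index of the codebook vector nearest to `sample` under a
    feature-weighted squared Euclidean distance (illustrative only)."""
    def wdist(node):
        return sum(w * (x - c) ** 2
                   for w, x, c in zip(feature_weights, sample, node))
    return min(range(len(codebook)), key=lambda i: wdist(codebook[i]))

# Hypothetical three-node codebook with two features per node.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sample = [0.6, 0.7]

# Equal weights: both features count the same.
bmu_equal = weighted_bmu(sample, codebook, [1.0, 1.0])
# Down-weighting the second feature (assumed values) shifts the match.
bmu_first = weighted_bmu(sample, codebook, [1.0, 0.05])
```

With equal weights the sample matches the third node; when the second feature is nearly ignored, the second node becomes the closest, which is the kind of effect a feature weighting has on cluster assignment.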
3. Quality assessment of data discrimination using self-organizing maps.
- Author
-
Mekler A and Schwarz D
- Subjects
- Data Mining methods, Pattern Recognition, Automated standards, Russia, Data Mining standards, Databases, Factual classification, Databases, Factual standards, Meaningful Use standards, Neural Networks, Computer, Quality Assurance, Health Care standards, Research Design standards
- Abstract
Motivation: One of the important aspects of the data classification problem lies in making the most appropriate selection of features. The set of variables should be small and, at the same time, should provide reliable discrimination of the classes. A method for evaluating discriminating power that enables a comparison between different sets of variables would be useful in the search for such a set. Results: A new approach to feature selection is presented. Two methods of evaluating the data discriminating power of a feature set are suggested. Both methods use self-organizing maps (SOMs) and newly introduced exponents of the degree of data clusterization on the SOM. The first method is based on the comparison of intraclass and interclass distances on the map. The second method evaluates the relative number of nearest neighbors of the best matching unit (BMU) that belong to the same class. Both methods make it possible to evaluate the discriminating power of a feature set in cases when this set provides nonlinear discrimination of the classes. Availability: Current algorithms in program code can be downloaded for free at http://mekler.narod.ru/Science/Articles_support.html, as well as the supporting data files. (Copyright © 2014 Elsevier Inc. All rights reserved.)
- Published
- 2014
- Full Text
- View/download PDF
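The first method in the abstract above compares intraclass and interclass distances on the map. A minimal sketch of that idea follows, assuming each sample has already been assigned the grid coordinates of its best matching unit. The function `intra_inter_ratio` and the sample data are hypothetical, and the exact "exponents" defined in the paper are not given in the abstract, so this is not the authors' measure.

```python
from itertools import combinations
from math import dist

def intra_inter_ratio(bmu_coords, labels):
    """Mean map distance between same-class samples divided by the mean
    map distance between different-class samples (lower = better separated)."""
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        d = dist(bmu_coords[i], bmu_coords[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need same-class and different-class pairs")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))

# Hypothetical BMU grid positions for six samples from two classes.
coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
ratio = intra_inter_ratio(coords, labels)
```

Running the same calculation on maps trained with different feature subsets would give one number per subset, which is one simple way to compare them.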
4. Diagnosis of Induced Resistance State in Tomato Using Artificial Neural Network Models Based on Supervised Self-Organizing Maps and Fluorescence Kinetics.
- Author
-
Pantazi, Xanthoula Eirini, Lagopodi, Anastasia L., Tamouridou, Afroditi Alexandra, Kamou, Nathalie Nephelie, Giannakis, Ioannis, Lagiotis, Georgios, Stavridou, Evangelia, Madesis, Panagiotis, Tziotzios, Georgios, Dolaptsis, Konstantinos, and Moshou, Dimitrios
- Subjects
- *
ARTIFICIAL neural networks , *SELF-organizing maps , *PLANT defenses , *FLUORESCENCE , *TOMATOES - Abstract
The aim of this study was to develop three supervised self-organizing map (SOM) models for the automatic recognition of a systemic resistance state in plants after application of a resistance inducer. The pathosystem Fusarium oxysporum f. sp. radicis-lycopersici (FORL) + tomato was used. The inorganic, defense inducer, Acibenzolar-S-methyl (benzo-[1,2,3]-thiadiazole-7-carbothioic acid-S-methyl ester, ASM), reported to induce expression of defense genes in tomato, was applied to activate the defense mechanisms in the plant. A handheld fluorometer, FluorPen FP 100-MAX-LM by SCI, was used to assess the fluorescence kinetics response of the induced resistance in tomato plants. To achieve recognition of resistance induction, three models of supervised SOMs, namely SKN, XY-F, and CPANN, were used to classify fluorescence kinetics data, in order to determine the induced resistance condition in tomato plants. To achieve this, a parameterization of fluorescence kinetics curves was developed corresponding to fluorometer variables of the Kautsky Curves. SKN was the best supervised SOM, achieving 97.22% to 100% accuracy. Gene expression data were used to confirm the accuracy of the supervised SOMs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
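5. Pengelompokan Obyek Wisata Potensial dengan Self Organizing Maps (SOM) dan Sum Additive Weighting (SAW) [Grouping Potential Tourist Attractions with Self Organizing Maps (SOM) and Sum Additive Weighting (SAW)]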
- Author
-
Indra Dharma Wijaya, Muhammad Afif Hendrawan, and Nurcahya Nania Anabela
- Subjects
Tourism ,Data Mining ,Clustering ,SOM ,SAW ,Information technology ,T58.5-58.64 - Abstract
Probolinggo Regency is an area in East Java with tourism potential, as shown by the many tourists visiting its various attractions. To increase the number of tourist visits, its tourism objects need to be developed. However, budget limitations mean that not all attractions in Probolinggo Regency can be developed at the same time, so the attractions need to be grouped by development priority. In this study, the researchers used the Self Organizing Maps (SOM) and Sum Additive Weighting (SAW) methods to group attractions by development priority level. SOM is used to determine groups of tourist objects based on four parameters: the number of domestic tourists, the number of foreign tourists, infrastructure, and the number of attractions. SAW is then used to find which group has the highest priority based on these parameters. To measure the quality of the resulting groups, the researchers used the silhouette coefficient. The grouping process produced three groups: C1 with 4 attractions, C2 with 20 attractions, and C3 with 10 attractions. The silhouette coefficient is good, especially for group C1, at 0.75006. Based on the SAW ranking, C1 is the group of tourist attractions with the highest priority for development.
- Published
- 2023
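The ranking step in the abstract above can be illustrated with the usual additive-weighting scheme: normalize each benefit criterion by its column maximum, then take a weighted sum. The sketch below is an assumption-laden example, not the paper's computation; the function `saw_scores`, the group averages, and the criterion weights are all hypothetical.

```python
def saw_scores(alternatives, weights):
    """Weighted sum of max-normalized benefit criteria for each alternative.
    Assumes every criterion has a non-zero maximum."""
    n_criteria = len(weights)
    col_max = [max(row[k] for row in alternatives.values())
               for k in range(n_criteria)]
    return {
        name: sum(w * v / m for w, v, m in zip(weights, row, col_max))
        for name, row in alternatives.items()
    }

# Hypothetical group averages, in the order used by the abstract:
# domestic tourists, foreign tourists, infrastructure, number of attractions.
groups = {
    "C1": [9000, 400, 8, 5],
    "C2": [1500, 20, 4, 2],
    "C3": [4000, 100, 6, 3],
}
weights = [0.4, 0.2, 0.2, 0.2]  # assumed weights, not taken from the paper

scores = saw_scores(groups, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these made-up numbers the group that leads on every criterion scores highest, and the remaining groups are ordered by their weighted, normalized totals.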
6. Classification of fragmented pottery with the use of Kohonen self‐organising maps (case study from the Hlyboke Ozero‐2 settlement in Eastern Ukraine).
- Author
-
Korokhina, Anastasiia V.
- Subjects
- *
SELF-organizing maps , *MORPHOLOGY , *PRINCIPAL components analysis , *MACHINE learning , *DATA mining - Abstract
The paper is devoted to testing Kohonen self‐organising maps, with elliptic Fourier coefficients as quantitative variables, for the task of morphological classification of fragmented and non‐standardised ceramics. The advantage of the methodology used is its ability to account for the systematic statistical relationships inherent in the dataset, build models of varying degrees of generalisation and visualise multivariate data. The approbation of the method was carried out on materials from the Hlyboke Ozero‐2 settlement in Eastern Ukraine. The results are compared with the results obtained using principal component analysis + k‐means clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Boosting the Development and Management of Wind Energy: Self-Organizing Map Neural Networks for Clustering Wind Power Outputs.
- Author
-
Li, Yanqian, Zhou, Yanlai, Luo, Yuxuan, Ning, Zhihao, and Xu, Chong-Yu
- Subjects
- *
WIND power , *SELF-organizing maps , *POWER resources , *ELECTRIC power distribution grids , *CLUSTER analysis (Statistics) - Abstract
Aimed at the information loss problem of using discrete indicators in wind power output characteristics analysis, a self-organizing map neural network-based clustering method is proposed in this study. By identifying the appropriate representativeness and topological structure of the competition layer, cluster analysis of the wind power output process in four seasons is realized. The output characteristics are evaluated through multiple evaluation indicators. Taking the wind power output of the Hunan power grid as a case study, the results underscore that the 1 × 3-dimensional competition layer structure had the highest representativeness (72.9%), and the wind power output processes of each season were divided into three categories, with a robust and stable topology structure. Summer and winter were the most representative seasons. Summer had strong volatility and small wind power outputs, which required the utilization of other power sources to balance power supply and load demand. Winter featured low volatility and large wind power outputs, necessitating cooperation with peak-shaving power sources to enhance the power grid's absorbability to wind power. The seasonal clustering analysis of wind power outputs will be helpful to analyze the seasonality of wind power outputs and can provide scientific and technical support for guiding the power grid's operation and management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. A model to estimate the Self-Organizing Maps grid dimension for Prototype Generation.
- Author
-
Silva, Leandro A., de Vasconcelos, Bruno P., and Del-Moral-Hernandez, Emilio
- Subjects
- *
SELF-organizing maps , *GRIDS (Cartography) , *ARTIFICIAL intelligence , *NEAREST neighbor analysis (Statistics) , *PROTOTYPES , *DATA mining - Abstract
Due to the high accuracy of the K nearest neighbor algorithm in different problems, KNN is one of the most important classifiers used in data mining applications and is recognized in the literature as a benchmark algorithm. Despite its high accuracy, KNN has some weaknesses, such as the time taken by the classification process, which is a disadvantage in many problems, particularly in those that involve a large dataset. The literature presents some approaches to reduce the classification time of KNN by selecting only the most important dataset examples. One of these methods is called Prototype Generation (PG) and the idea is to represent the dataset examples in prototypes. Thus, the classification process occurs in two steps; the first is based on prototypes and the second on the examples represented by the nearest prototypes. The main problem of this approach is a lack of definition about the ideal number of prototypes. This study proposes a model that allows the best grid dimension of Self-Organizing Maps and the ideal number of prototypes to be estimated using the number of dataset examples as a parameter. The approach is contrasted with other PG methods from the literature based on artificial intelligence that propose to automatically define the number of prototypes. The main advantage of the proposed method tested here using eighteen public datasets is that it allows a better relationship between a reduced number of prototypes and accuracy, providing a sufficient number that does not degrade KNN classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
9. Outcome prediction for salivary gland cancer using multivariate adaptative regression splines (MARS) and self-organizing maps (SOM)
- Author
-
Lequerica-Fernández, Paloma, Peña, Ignacio, Iglesias-Rodríguez, Francisco Javier, González-Gutiérrez, Carlos, and De Vicente, Juan Carlos
- Published
- 2020
- Full Text
- View/download PDF
10. Diagnosis of Induced Resistance State in Tomato Using Artificial Neural Network Models Based on Supervised Self-Organizing Maps and Fluorescence Kinetics
- Author
-
Xanthoula Eirini Pantazi, Anastasia L. Lagopodi, Afroditi Alexandra Tamouridou, Nathalie Nephelie Kamou, Ioannis Giannakis, Georgios Lagiotis, Evangelia Stavridou, Panagiotis Madesis, Georgios Tziotzios, Konstantinos Dolaptsis, and Dimitrios Moshou
- Subjects
artificial intelligence ,clustering ,data mining ,gene expression ,plant protection ,Chemical technology ,TP1-1185 - Abstract
The aim of this study was to develop three supervised self-organizing map (SOM) models for the automatic recognition of a systemic resistance state in plants after application of a resistance inducer. The pathosystem Fusarium oxysporum f. sp. radicis-lycopersici (FORL) + tomato was used. The inorganic, defense inducer, Acibenzolar-S-methyl (benzo-[1,2,3]-thiadiazole-7-carbothioic acid-S-methyl ester, ASM), reported to induce expression of defense genes in tomato, was applied to activate the defense mechanisms in the plant. A handheld fluorometer, FluorPen FP 100-MAX-LM by SCI, was used to assess the fluorescence kinetics response of the induced resistance in tomato plants. To achieve recognition of resistance induction, three models of supervised SOMs, namely SKN, XY-F, and CPANN, were used to classify fluorescence kinetics data, in order to determine the induced resistance condition in tomato plants. To achieve this, a parameterization of fluorescence kinetics curves was developed corresponding to fluorometer variables of the Kautsky Curves. SKN was the best supervised SOM, achieving 97.22% to 100% accuracy. Gene expression data were used to confirm the accuracy of the supervised SOMs.
- Published
- 2022
- Full Text
- View/download PDF
11. Visual Data Mining With Self-organizing Maps for “Self-monitoring” Data Analysis.
- Author
-
Oliver, Elia, Vallés-Pérez, Iván, Baños, Rosa-María, Cebolla, Ausias, Botella, Cristina, and Soria-Olivas, Emilio
- Subjects
- *
DATA analysis , *SELF-organizing maps , *SELF-organizing systems , *DATA mining , *CHILDHOOD obesity - Abstract
Data collected in psychological studies are mainly characterized by containing a large number of variables (multidimensional data sets). Analyzing multidimensional data can be a difficult task, especially if only classical approaches are used (hypothesis tests, analyses of variance, linear models, etc.). Regarding multidimensional models, visual techniques play an important role because they can show the relationships among variables in a data set. Parallel coordinates and Chernoff faces are good examples of this. This article presents self-organizing maps (SOM), a multivariate visual data mining technique used to provide global visualizations of all the data. This technique is presented as a tutorial with the aim of showing its capabilities, how it works, and how to interpret its results. Specifically, SOM analysis has been applied to analyze the data collected in a study on the efficacy of a cognitive and behavioral treatment (CBT) for childhood obesity. The objective of the CBT was to modify the eating habits and level of physical activity in a sample of children with overweight and obesity. Children were randomized into two treatment conditions: CBT traditional procedure (face-to-face sessions) and CBT supported by a web platform. In order to analyze their progress in the acquisition of healthier habits, self-register techniques were used to record dietary behavior and physical activity. In the traditional CBT condition, children completed the self-register using a paper-and-pencil procedure, while in the web platform condition, participants completed the self-register using an electronic personal digital assistant. Results showed the potential of SOM for analyzing the large amount of data necessary to study the acquisition of new habits in a childhood obesity treatment. 
Currently, the high prevalence of childhood obesity points to the need to develop strategies to manage a large number of data in order to design procedures adapted to personal characteristics and increase treatment efficacy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
12. Cuticular Hydrocarbons of Tetramorium Ants from Central Europe: Analysis of GC-MS Data with Self-Organizing Maps (SOM) and Implications for Systematics
- Author
-
Steiner, Florian M., Schlick-Steiner, Birgit C., Nikiforov, Alexej, Kalb, Roland, and Mistrik, Robert
- Published
- 2002
- Full Text
- View/download PDF
13. Data on Data Mining and Knowledge Discovery Described by Researchers at Stanford University (Somtimes: Self Organizing Maps for Time Series Clustering and Its Application To Serious Illness Conversations)
- Subjects
United States. National Science Foundation ,Data mining ,Data warehousing/data mining ,Computers ,Stanford University - Abstract
2023 NOV 28 (VerticalNews) -- By a News Reporter-Staff News Editor at Information Technology Newsweekly -- Investigators publish new report on Information Technology - Data Mining and Knowledge Discovery. According [...]
- Published
- 2023
14. A patent quality analysis and classification system using self-organizing maps with support vector machine.
- Author
-
Wu, Jheng-Long, Chang, Pei-Chann, Tsao, Cheng-Chin, and Fan, Chin-Yuan
- Subjects
SELF-organizing maps ,SUPPORT vector machines ,DATA mining ,MACHINE learning ,DATA analysis ,PRINCIPAL components analysis - Abstract
A plethora of patents are approved by the patent officers each year and current patent systems face a solemn quandary of evaluating these patents’ qualities. Traditional researchers and analyzers have fixated on developing sundry patent quality indicators only, but these indicators do not have further prognosticating power on incipient patent applications or publications. Therefore, the data mining (DM) approaches are employed in this article to identify and to classify the new patent's quality in time. An automatic patent quality analysis and classification system, namely SOM-KPCA-SVM, is developed according to patent quality indicators and characteristics, respectively. First, the self-organizing map (SOM) approach is used to cluster patents published before into different quality groups according to the patent quality indicators and defines group quality type instead of via experts. The kernel principal component analysis (KPCA) approach is used to transform nonlinear feature space in order to improve classification performance. Finally, the support vector machine (SVM) is used to build up the patent quality classification model. The proposed SOM-KPCA-SVM is applied to classify patent quality automatically in patent data of the thin film solar cell. Experimental results show that our proposed system can capture the analysis effectively compared with traditional manpower approach. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
15. Atmospheric Environment and Quality of Life Information Extraction from Twitter with the Use of Self-Organizing Maps.
- Author
-
Riga, M., Stocker, M., Rönkkö, M., Karatzas, K., and Kolehmainen, M.
- Subjects
QUALITY of life ,DATA mining ,SELF-organizing maps ,WEB 2.0 - Abstract
The emergence of Web 2.0 technologies has changed dramatically not only the way users perceive the Internet and interact on it but also the way they influence a community and act in real life. With the rapid rise in use and popularity of social media, people tend to share opinions and observations on almost any subject or event in their everyday life. Consequently, microblogging websites have become a rich data source for user-generated information. The leading opportunity is to take advantage of the wisdom of the crowd and to benefit from collective intelligence in any applicable domain. In this direction, we focus on the problem of mining and extracting knowledge from unstructured textual content, for the atmospheric environment domain and its effect on quality of life. As the main contribution, we propose a combined methodology of unsupervised learning methods for analyzing posts from Twitter and clustering textual data into concepts with semantically similar context. By applying Self-Organizing Maps and k-means clustering, we identify possible inter-relationships and patterns of words used in tweets that can form upper concepts of atmospheric and health related topics of discussion. We are able to group tweets together, from more generic to more specific description levels of their content, according to the selected number of clusters. Strong clusters with significant semantic relatedness among their content are revealed, and hidden relations between concepts and their related semantics are acquired. The results highlight the potential use of social media text streams as a highly valued supplementary source of environmental information and situation awareness. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
16. Implementing in-situ self-organizing maps with memristor crossbar arrays for data mining and optimization.
- Author
-
Wang, Rui, Shi, Tuo, Zhang, Xumeng, Wei, Jinsong, Lu, Jian, Zhu, Jiaxue, Wu, Zuheng, Liu, Qi, and Liu, Ming
- Subjects
ARTIFICIAL neural networks ,DATA mining ,PHYSICAL laws ,TRAVELING salesman problem ,ARTIFICIAL intelligence ,SELF-organizing maps ,MACHINE learning - Abstract
A self-organizing map (SOM) is a powerful unsupervised learning neural network for analyzing high-dimensional data in various applications. However, hardware implementation of SOM is challenging because of the complexity in calculating the similarities and determining neighborhoods. We experimentally demonstrated a memristor-based SOM based on Ta/TaOx/Pt 1T1R chips for the first time, which has advantages in computing speed, throughput, and energy efficiency compared with the CMOS digital counterpart, by utilizing the topological structure of the array and physical laws for computing without complicated circuits. We employed additional rows in the crossbar arrays and identified the best matching units by directly calculating the similarities between the input vectors and the weight matrix in the hardware. Using the memristor-based SOM, we demonstrated data clustering, image processing and solved the traveling salesman problem with much-improved energy efficiency and computing throughput. The physical implementation of SOM in memristor crossbar arrays extends the capability of memristor-based neuromorphic computing systems in machine learning and artificial intelligence. Self-organizing maps are data mining tools for unsupervised learning algorithms dealing with big data problems. The authors experimentally demonstrate a memristor-based self-organizing map that is more efficient in computing speed and energy consumption for data clustering, image processing and solving optimization problems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
17. Discover botnets in IoT sensor networks: A lightweight deep learning framework with hybrid self-organizing maps.
- Author
-
Khan, Saad and Mailewa, Akalanka B.
- Subjects
- *
SELF-organizing maps , *DEEP learning , *BOTNETS , *DATA mining , *MACHINE learning , *SENSOR networks , *BLENDED learning - Abstract
In recent years, we have witnessed a massive growth of intrusion attacks targeted at Internet of things (IoT) devices. Due to inherent security vulnerabilities, these devices have become easy targets for hackers. Recent studies have focused on deploying intrusion detection systems at the network's edge within IoT devices to localize threat mitigation and avoid computational expenses. Intrusion detection systems based on machine learning and deep learning algorithms have demonstrated the potential to detect zero-day attacks where traditional signature-based detection falls short. Thus, the purpose of the paper is to present a lightweight and robust deep learning framework for intrusion detection that has the computational potential to be efficiently scaled down and deployed as localized threat detection within IoT devices. The paper's methodology to demonstrate the scalability and threat detection performance is to train and test on intrusion datasets such as NSL-KDD (Network Security Laboratory - Knowledge Discovery in Databases) and N-BaIoT (Network-Based Anomaly Internet of Things) to assess anomaly detection performance. In addition, the proposed Hybrid model is compared against a benchmark Artificial Neural Network model. The evaluation metrics are training time, precision, recall, accuracy, and f1-score, along with their macro and weighted averages. Significant findings show a 948% decrease in model training time and a 41.87% increase in f1-score when comparing the proposed Hybrid Self Organizing Maps (HSOM) model with the Artificial Neural Network model. Additionally, scaling down the nodes in the proposed Self Organizing Maps (SOM) model demonstrated a reduction of 955% in training time and a 27% increase in macro averages of precision, recall, and f1-score. 
A significant implication of this study would be adopting the proposed SOM model as localized IoT threat detection, as the research proves the increase in detection performance after scaling down the model's input and output nodes. The contribution of the research is a scalable and high-performant IoT threat detection framework suited for localized IoT deployment. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. An improved impedance-based damage classification using self-organizing maps
- Author
-
Paulo Roberto de Aguiar, Pedro Oliveira Junior, Doriana M. D’Addona, Fabricio Baptista, and Salvatore Conte
- Subjects
Self-organizing map ,Artificial neural network ,Computer science ,Diagnostic and maintenance, Self-organizing maps, Neural networks, Sensor monitoring, SHM, Electromechanical impedance, Grinding ,Field (computer science) ,Identification (information) ,EMI ,General Earth and Planetary Sciences ,Classification methods ,Structural health monitoring ,Data mining ,Electrical impedance ,General Environmental Science - Abstract
Identifying structural damage and its severity, especially at an early stage, is critical in structural health monitoring (SHM) systems. Among several approaches used to accomplish this goal, the electromechanical impedance (EMI) technique has taken its place among non-destructive evaluation (NDE) methods. Meanwhile, neural networks (NN) based on self-organizing maps (SOM) have been a promising tool in many engineering classification problems. However, there is a gap in applications combining the EMI technique and SOM NN. To address this, an enhanced EMI-based damage classification method using self-organizing features is proposed in the present research paper. A SOM NN architecture was implemented whose inputs were derived from representative features of the impedance signatures. As a result, self-organizing maps can be used as an effective tool to enhance the damage classification in EMI-based SHM applications. For the present application, the results indicated a promising and useful contribution to the grinding field.
- Published
- 2020
19. Urban traffic modeling and pattern detection using online map vendors and self-organizing maps
- Author
-
Ludger Hovestadt, Li Biao, and Zifeng Guo
- Subjects
Self-organizing map ,Data source ,Archeology ,Computer science ,Simulation modeling ,Fidelity ,Building and Construction ,Pipeline (software) ,NA1-9428 ,Urban Studies ,Pattern detection ,Urban management ,Data-driven modeling ,Urban traffic patterns ,Map vendors ,Architecture ,Data mining ,Data limitations - Abstract
Typical traffic modeling approaches, such as network-based methods and simulation models, have been shown to be inadequate for urban-scale studies because of model fidelity issues. As a workaround, data-driven models have received increasing attention recently. However, most data-driven methods have been restricted by their data source and cannot be scaled up to manage urban- and regional-scale studies. To address this issue, this research proposes a pipeline that collects traffic data from online map vendors to bypass data limitations for large-scale studies. The study consists of two experiments: 1) recognizing the dominant traffic patterns of cities and 2) site-specific predictions of typical traffic or the most probable locations of patterns of interest. The experiments were conducted on 32 Swiss cities using traffic data that were collected for a two-month period. The results show that dominant patterns can be extracted from the temporal traffic data, and similar patterns exist not only in various parts of a city but also in different cities. Moreover, the results reveal that a country-level lockdown decreased traffic congestion on regional highways but increased it on connections near the city centers and the country borders. Frontiers of Architectural Research, 10 (4), ISSN: 2095-2635, ISSN: 2095-2643
- Published
- 2021
20. Clustering and Classification of Red Wines According to Physical-Chemical Properties Using Data Mining Methods.
- Author
-
Bondarev, N. V.
- Subjects
- *
DATA mining , *SELF-organizing maps , *ITALIAN wines , *SUPPORT vector machines , *MACHINE learning , *RED wines - Abstract
The data on 178 samples of Italian red wines taken from the public machine learning repository UCI have been studied. Computer analysis of 13 physico-chemical properties of the wine samples on their distribution between three groups has been performed via different data mining methods. Classification models: factor, discriminant, canonical, and neural network (multilayer perceptron MLP, Kohonen's map SOFM) ones, predicting models (support vector machine, Bayesian classifier, and nearest neighbor method), and decision trees have been built. The neural network classifiers SOFM 13-3, MLP 13-5-3, and SOFM 16-3 have been trained. It has been found that predicting power of the models is determined by the following variables: proline, flavonoids, color intensity, proteins, and alcohol. [ABSTRACT FROM AUTHOR]
- Published
- 2023
21. Self-Organizing Maps applied to ecological sciences.
- Author
-
Chon, Tae-Soo
- Subjects
SELF-organizing maps ,ECOLOGY ,DATA analysis ,DATA mining ,VISUALIZATION ,COMPUTER network architectures - Abstract
Abstract: Ecological data are considered to be difficult to analyze because numerous biological and environmental factors are involved in a complex manner in environment–organism relationships. The Self-Organizing Map (SOM) has advantages for information extraction (i.e., without prior knowledge) and the efficiency of presentation (i.e., visualization). It has been implemented broadly in ecological sciences across different hierarchical levels of life. Recent applications of the SOM, which are reviewed here, include the molecular, organism, population, community, and ecosystem scales. Further development of the SOM is discussed regarding network architecture, spatio-temporal patterning, and the presentation of model results in ecological sciences. [Copyright Elsevier]
- Published
- 2011
22. Assessment of the water quality of Kłodnica River catchment using self-organizing maps.
- Author
-
Olkowska, Ewa, Kudłak, Błażej, Tsakovski, Stefan, Ruman, Marek, Simeonov, Vasil, and Polkowska, Zaneta
- Subjects
- *
WATER quality , *SELF-organizing maps , *WATERSHEDS , *RISK assessment , *BODIES of water , *HEAT production (Biology) , *ELECTRIC power production , *WATER pollution - Abstract
Abstract: Risk assessment of industrial areas heavily polluted by anthropogenic activity is of increasing concern worldwide. This is the case in the Polish Silesia region, where heavy industry such as smelters, mining, chemical plants, and heat and electricity production facilities is concentrated. This situation raises numerous questions about the environmental state of local water bodies, with special attention paid to the Kłodnica catchment, which receives waste water from numerous industrial plants. Efforts have been undertaken to describe the spatial and temporal distribution of pollution in the area of interest with the help of self-organizing maps, a modern non-parametric data mining method that is still rarely applied in environmental studies where numerous input parameters have to be considered. As a result, a clear division into 3 pollution groups was obtained and the seasonal variation of pollution could be distinguished. [Copyright Elsevier]
- Published
- 2014
23. Self-organizing maps for latent semantic analysis of free-form text in support of public policy analysis.
- Author
-
Till, Bernie C., Longo, Justin, Dobell, A. Rod, and Driessen, Peter F.
- Subjects
- *
SELF-organizing maps , *LATENT semantic analysis , *DATA mining , *BIG data , *WEB 3.0 - Abstract
The huge amount of free-form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools which address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the associated commentary. We give a tutorial review of latent semantic analysis and self-organizing maps, as considered in this context, and show how to apply the self-organizing map over a probabilistic latent semantic space to the problem of completely unsupervised clustering of unstructured text in such a way as to be entirely independent of spelling, grammar, and even source language. This provides an algorithm suitable for clustering free-form commentary with a well-structured test environment. The algorithm is applied to academic paper abstracts instead, treated as unstructured text as though they were blog posts, because this set of documents has a known ground truth. The algorithm constructs a word category map and a document map in which words with similar meaning and documents with similar content are clustered together. WIREs Data Mining Knowl Discov 2014, 4:71-86. doi: 10.1002/widm.1112 Conflict of interest: The authors have declared no conflicts of interest for this article. [ABSTRACT FROM AUTHOR]
- Published
- 2014
24. Self-organizing maps with information theoretic learning.
- Author
-
Chalasani, Rakesh and Principe, Jose C.
- Subjects
- *
SELF-organizing maps , *INFORMATION theory , *MACHINE learning , *CLUSTER analysis (Statistics) , *DATA mining - Abstract
The self-organizing map (SOM) is one of the popular clustering and data visualization algorithms and has evolved as a useful tool in pattern recognition and data mining since it was first introduced by Kohonen. However, it is observed that the magnification factor for such mappings deviates from the information-theoretically optimal value of 1 (for the SOM it is 2/3). This can be attributed to the use of the mean square error to adapt the system, which distorts the mapping by oversampling the low probability regions. In this work, we first discuss the kernel SOM in terms of a similarity measure called the correntropy induced metric (CIM) and empirically show that this can enhance the magnification of the mapping without much increase in the computational complexity of the algorithm. We also show that adapting the SOM in the CIM sense is equivalent to reducing the localized cross information potential, an information-theoretic function that quantifies the similarity between two probability distributions. Using this property we propose a kernel bandwidth adaptation algorithm for Gaussian kernels, with both homoscedastic and heteroscedastic components. We show that the proposed model can achieve a mapping with optimal magnification and can automatically adapt the parameters of the kernel function. [ABSTRACT FROM AUTHOR]
- Published
- 2015
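The correntropy induced metric named in the abstract of entry 24 can be illustrated with a small sketch. It assumes one common sample estimator, CIM(x, y) = sqrt(k(0) - mean_i k(x_i - y_i)) with a normalized Gaussian kernel k of bandwidth sigma; the paper's exact formulation, and the function names `gaussian_kernel`, `cim`, and `best_matching_unit`, are assumptions made here for illustration and are not taken from the article.

```python
import math


def gaussian_kernel(e, sigma):
    """Normalized Gaussian kernel evaluated at the error e."""
    return math.exp(-e * e / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def cim(x, y, sigma=1.0):
    """Sample estimate of the correntropy induced metric between two vectors.

    Assumed estimator: sqrt(k(0) - mean_i k(x_i - y_i)).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    k0 = gaussian_kernel(0.0, sigma)
    v = sum(gaussian_kernel(a - b, sigma) for a, b in zip(x, y)) / len(x)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(k0 - v, 0.0))


def best_matching_unit(sample, weights, sigma=1.0):
    """Index of the weight vector closest to the sample in the CIM sense."""
    return min(range(len(weights)), key=lambda i: cim(sample, weights[i], sigma))
```

Because the kernel saturates for large errors, the distance is bounded by sqrt(k(0)), so distant samples pull on a unit less strongly than they would under the squared error. This is in line with the abstract's remark that the mean square error distorts the mapping.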
25. Application of materials informatics to vapor-grown carbon nanofiber/vinyl ester nanocomposites through self-organizing maps and clustering techniques.
- Author
-
Abuomar, O., Nouranian, S., King, R., and Lacy, T.E.
- Subjects
- *
CARBON nanofibers , *NANOCOMPOSITE materials , *ESTERS , *SELF-organizing maps , *MICROCLUSTERS - Abstract
Highlights
• Data mining was employed to acquire new information on a new VGCNF/VE framework.
• Self-organizing maps, PCA, and FCM clustering techniques were applied.
• Different features were ordered in terms of their effects on the responses.
• Optimal responses' values using certain combination(s) of inputs were determined.
• The viscoelastic responses of the VGCNF/VE specimens are the most significant.
Abstract: Data mining and knowledge discovery techniques were employed herein to acquire new information on the viscoelastic, flexural, compressive, and tensile properties of vapor-grown carbon nanofiber (VGCNF)/vinyl ester (VE) nanocomposites. Formulation and processing factors (curing environment, presence or absence of dispersing agent, mixing method, VGCNF weight fraction, VGCNF type, high-shear mixing time, and sonication time) and testing temperature were utilized as inputs and the true ultimate strength, true yield strength, engineering elastic modulus, engineering ultimate strength, flexural modulus, flexural strength, storage modulus, loss modulus, and tan delta were selected as outputs. The data mining and knowledge discovery algorithms used in this study include self-organizing maps (SOMs) and clustering techniques. SOMs demonstrated that temperature and tan delta had the most significant effects on the output responses followed by the VGCNF high-shear mixing time, and sonication time. SOMs were also used to produce optimal responses using certain combination(s) of inputs. Fuzzy C-means algorithm (FCM) was also applied to discover patterns in the nanocomposite behavior subsequent to a principal component analysis (PCA), which is a dimensionality reduction technique. Utilizing these techniques, the nanocomposite specimens were separated into different clusters based on the testing temperature (30 °C and 120 °C being the most dominant responses), tan delta, high-shear mixing time, and sonication time. Furthermore, the VGCNF/VE specimens were separated into a cluster based on their viscoelastic responses (storage and loss moduli) at the same temperature. The FCM results indicate that, while all nanocomposite properties in the new framework are essential, the viscoelastic responses of the VGCNF/VE specimens are the most significant. This work highlights the utility of data mining and knowledge discovery techniques in the context of materials informatics for the discovery of patterns and trends in the material behavior that are not immediately known. [ABSTRACT FROM AUTHOR]
- Published
- 2019
26. Advanced visualization of Self-Organizing Maps with vector fields
- Author
-
Pölzlbauer, Georg, Dittenbach, Michael, and Rauber, Andreas
- Subjects
- *
SELF-organizing maps , *DATA mining , *VISUALIZATION , *VECTOR analysis - Abstract
Abstract: Self-Organizing Maps have been applied in various industrial applications and have proven to be a valuable data mining tool. In order to fully benefit from their potential, advanced visualization techniques assist the user in analyzing and interpreting the maps. We propose two new methods for depicting the SOM based on vector fields, namely the Gradient Field and Borderline visualization techniques, to show the clustering structure at various levels of detail. We explain how this method can be used on aggregated parts of the SOM that show which factors contribute to the clustering structure, and show how to use it for finding correlations and dependencies in the underlying data. We provide examples on several artificial and real-world data sets to point out the strengths of our technique, specifically as a means to combine different types of visualizations offering effective multidimensional information visualization of SOMs. [Copyright Elsevier]
- Published
- 2006
27. Contributions of data mining for psycho‐educational research: what self‐organizing maps tell us about the well‐being of gifted learners
- Author
-
Thuneberg, Helena and Hotulainen, Risto
- Subjects
- *
SELF-organizing maps , *DATA mining , *COMPREHENSION , *ACHIEVEMENT , *GIFTED persons , *LEARNING , *MOTIVATION (Psychology) , *EDUCATIONAL psychology , *AUTONOMY (Psychology) - Abstract
This article explores applications of the Self‐Organizing Maps method (SOM) to psycho‐educational data. The study examines the psychological well‐being, self‐regulatory and motivational styles of pupils at elementary and middle school (N = 795). The method is presented through cases related to general education, special needs and giftedness. The aim of this article is to show that SOM provides a unique means with which to visualize, comprehend and interpret psycho‐educational data. The SOM method is a convenient way to identify and study exceptional subgroups and non‐linear correlations, as well as to examine theoretical assumptions. The results showed that high academic achievement is related to anxiety, as well as to external and internal pressure, in some gifted subgroups. Such a result is obviously socially constructed and for this reason calls for further study. [ABSTRACT FROM AUTHOR]
- Published
- 2006
28. Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets.
- Author
-
Lingras, Pawan, Hogo, Mofreh, and Snorek, Miroslav
- Subjects
- *
SELF-organizing maps , *SELF-organizing systems , *ARTIFICIAL neural networks , *DATA mining , *KNOWLEDGE management , *WEB analytics - Abstract
Web usage mining involves the application of data mining techniques to discover usage patterns from web data. Clustering is one of the important functions in web usage mining. The likelihood of bad or incomplete web usage data is higher than in conventional applications. The clusters and associations in web usage mining do not necessarily have crisp boundaries. Researchers have studied the possibility of using fuzzy sets in web mining clustering applications. Recent attempts have adapted the K-means clustering algorithm as well as genetic algorithms based on rough sets to find interval sets of clusters. Genetic-algorithm-based clustering may not be able to handle large amounts of data. The K-means algorithm does not lend itself well to adaptive clustering. This paper proposes an adaptation of Kohonen self-organizing maps based on the properties of rough sets, to find the interval sets of clusters. Experiments are used to create interval set representations of clusters of web visitors on three educational web sites. The proposed approach has wider applications in other areas of web mining as well as data mining. [ABSTRACT FROM AUTHOR]
- Published
- 2004
29. Evaluation of Energy Distribution Using Network Data Envelopment Analysis and Kohonen Self Organizing Maps
- Author
-
Danilo Pinto Moreira de Souza, Thiago Gomes Leal Ganhadeiro, Eliane da Silva Christo, Kelly Alonso Costa, and Lidia Angulo Meza
- Subjects
Self-organizing map ,data envelopment analysis ,Kohonen self-organizing maps ,factor analysis ,multiple regression ,energy efficiency ,Control and Optimization ,Computer science ,Energy Engineering and Power Technology ,Margin (machine learning) ,Linear regression ,Electrical and Electronic Engineering ,Dimension (data warehouse) ,Engineering (miscellaneous) ,Variables ,Renewable Energy, Sustainability and the Environment ,Data mining ,Energy (miscellaneous) ,Curse of dimensionality - Abstract
This article presents an alternative way of evaluating the efficiency of the electric distribution companies in Brazil. This assessment is currently performed and designed by the National Electric Energy Agency (ANEEL), a Brazilian regulatory agency, to regulate energy prices. This involves calculating the X-factor, which represents the efficiency evolution in the price-cap regulation model. The proposed model aims to use a network Data Envelopment Analysis (DEA) model with the network dimension as an intermediate variable and to use Kohonen Self-Organizing Maps (SOM) to correct the difficulties presented by environmental variables. In order to find which environmental variables influence the efficiency, factor analysis was used to reduce the dimensionality of the model. The analysis still uses multiple regression with the previous efficiency as the dependent variable and the four factors extracted from factor analysis as independent variables. The SOM generated four clusters based on the environment and the efficiency for each distributor in each group. This allows for a better evaluation of the correction in the X-factor, since it can be conducted inside each cluster with a maintained margin for comparison. It is expected that the use of this model will reduce the margin of questioning by distributors about the evaluation.
- Published
- 2018
30. Identifying health status of wind turbines by using self organizing maps and interpretation-oriented post-processing tools
- Author
-
Pere Marti-Puig, Karina Gibert, Alejandro Blanco-M., Jordi Cusidó, Jordi Solé-Casals, Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa, Universitat Politècnica de Catalunya. Departament d'Enginyeria de Projectes i de la Construcció, and Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic
- Subjects
Self-organizing map ,Control and Optimization ,interpretation oriented tools ,Process (engineering) ,Computer science ,Energy Engineering and Power Technology ,Sample (statistics) ,post-processing ,Turbine ,Anàlisi multivariable ,wind farms ,Supervisory Control and Data Acquisition (SCADA) data ,self organizing maps (SOM) ,clustering ,fault diagnosis ,renewable energy ,data science ,Electrical and Electronic Engineering ,Cluster analysis ,Engineering (miscellaneous) ,Class (computer programming) ,Wind power ,Renewable Energy, Sustainability and the Environment ,62 Statistics::62H Multivariate analysis [Classificació AMS] ,Multivariate analysis ,Matemàtiques i estadística::Estadística matemàtica [Àrees temàtiques de la UPC] ,Data mining ,Energy (miscellaneous) - Abstract
Identifying the health status of wind turbines has become critical to reduce the impact of failures on generation costs (between 25–35%). This is a time-consuming task since a human expert has to explore turbines individually. Methods: To optimize this process, we present a strategy based on Self Organizing Maps, clustering and a further grouping of turbines based on the centroids of their SOM clusters, generating groups of turbines that have similar behavior for subsystem failure. The human expert can diagnose the wind farm health by analyzing a small sample from each group. By introducing post-processing tools like Class panel graphs and Traffic lights panels, the conceptualization of the clusters is enhanced, providing additional information about the real scenarios the clusters point to and contributing to a better diagnosis. Results: The proposed approach has been tested in real wind farms with different characteristics (number of wind turbines, manufacturers, power, type of sensors, ...) and compared with classical clustering. Conclusions: Experimental results show that the states healthy, unhealthy and intermediate have been detected. Besides, the operational modes identified for each wind turbine improve on those obtained with classical clustering techniques, capturing the intrinsic stationarity of the data.
- Published
- 2018
31. Gene clustering by using query-based self-organizing maps
- Author
-
Chang, Ray-I, Chu, Chih-Chun, Wu, Yu-Ying, and Chen, Yen-Liang
- Subjects
- *
CLUSTER analysis (Statistics) , *SELF-organizing maps , *QUERYING (Computer science) , *GENE expression , *ARTIFICIAL neural networks , *STOCHASTIC convergence , *BIOINFORMATICS , *DATA mining - Abstract
Abstract: Gene clustering is very important for extracting underlying biological information from gene expression data. Currently, SOM (self-organizing maps) is known as one of the most popular neural networks applied to gene clustering. However, SOM is sensitive to the initialization of neurons' weights. As a result, biologists may need to spend a lot of time repeating experiments until they obtain a satisfactory clustering result. In this paper, we apply QBSOM (query-based SOM) to tackle the drawbacks of SOM. We have tested the proposed method on several kinds of real gene expression data. Experimental results show that QBSOM is superior to SOM not only in the time consumed but also in the result obtained. For the gene clustering result on the YF (yeast full) dataset, QBSOM yields a 17% lower MSE (mean-square error) and a 68% lower computation cost than SOM. Our experiments also indicate that QBSOM is particularly well suited to clustering high-dimensional data such as gene expression data. It is better than SOM in terms of system convergence. [Copyright Elsevier]
- Published
- 2010
32. Automatic cluster identification for environmental applications using the self-organizing maps and a new genetic algorithm.
- Author
-
Oyana, Tonny J. and Dajun Dai
- Subjects
- *
SELF-organizing maps , *ARTIFICIAL neural networks , *GENETIC algorithms , *GEOGRAPHIC information systems , *DATA mining - Abstract
A rapid increase of environmental data dimensionality emphasizes the importance of developing data-driven inductive approaches to geographic analysis. This article uses a loosely coupled strategy to combine the technique of self-organizing maps (SOM) with a new genetic algorithm (GA) for automatic identification of clusters in multidimensional environmental datasets. In the first stage, we employ the well-known classic SOM because it is able to handle the dimensional interactions and capture the number of clusters via visualization; and thus provide extraordinary insights into original data. In the second stage, this new GA rigorously delineates the cluster boundaries using a flexibly oriented elliptical search window. To test this approach, one synthetic and two real-world datasets are employed. The results confirm a more robust and reliable approach that provides a better understanding and interpretation of massive multivariate environmental datasets, thus maximizing our insights. Other key benefits include the fact that it provides a computationally fast and efficient environment to accurately detect clusters, and is highly flexible. In a nutshell, the article presents a computational approach to facilitate knowledge discovery of massive multivariate environmental datasets; as we are too familiar with their accelerating growth rate. [ABSTRACT FROM AUTHOR]
- Published
- 2010
33. Visual dynamic model based on self-organizing maps for supervision and fault detection in industrial processes
- Author
-
Fuertes, Juan J., Domínguez, Manuel, Reguera, Perfecto, Prada, Miguel A., Díaz, Ignacio, and Cuadrado, Abel A.
- Subjects
- *
SELF-organizing maps , *DATA mining , *FACTORY design & construction , *CLUSTER analysis (Statistics) , *SUPERVISION , *MATHEMATICAL variables , *MATHEMATICAL models - Abstract
Abstract: Visual data mining techniques have experienced a growing interest for processing and interpretation of the large amounts of multidimensional data available in current industrial processes. One of the approaches to visualize data is based on self-organizing maps (SOM), which define a projection of the input space onto a 2D or 3D space that can be used to obtain visual representations. Although these techniques have been usually applied to visualize static relations among the process variables, they have proven to be very useful to display dynamic features of the processes. In this work, an approach based on the SOM to model the dynamics of multivariable processes is presented. The proposed method identifies the process conditions (clusters) and the probabilities of transition among them, using the trajectory followed by the input data on the 2D visualization space. Furthermore, a new method of residual computation for fault detection and identification that uses the dynamic information provided by the model of transitions is proposed. The proposed method for modeling and fault identification has been applied to supervise a real industrial plant and the results are included. [Copyright Elsevier]
- Published
- 2010
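The abstract of entry 33 describes estimating the probabilities of transition among process conditions from the trajectory of the data on the map. A minimal sketch of that idea follows. It assumes each sample has already been assigned a cluster label (for example through its best-matching unit), and it uses plain frequency counts; the function name `transition_probabilities` and the counting estimator are illustrative assumptions, not the authors' method.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate first-order transition probabilities from a label sequence.

    labels: cluster labels in time order, one per sample.
    Returns {state: {next_state: probability}} based on observed counts.
    """
    counts = defaultdict(Counter)
    # Count each consecutive pair (current state, next state).
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1
    probs = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        probs[state] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Hypothetical trajectory over two process conditions, "A" and "B".
trajectory = ["A", "A", "B", "A", "B", "B"]
model = transition_probabilities(trajectory)
```

A transition that is absent from `model`, or that has a very low estimated probability, is the kind of event a residual-based fault detector could flag, although the article's actual residual computation is not reproduced here.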
34. Unsupervised feature selection and general pattern discovery using Self-Organizing Maps for gaining insights into the nature of seismic wavefields
- Author
-
Köhler, Andreas, Ohrnberger, Matthias, and Scherbaum, Frank
- Subjects
- *
SELF-organizing maps , *MACHINE learning , *SEISMIC networks , *ELECTRONICS in earth sciences , *EARTHQUAKE zones , *SEISMIC waves , *DATA visualization , *RAYLEIGH waves - Abstract
Abstract: This study presents an unsupervised feature selection and learning approach for the discovery and intuitive imaging of significant temporal patterns in seismic single-station or network recordings. For this purpose, the data are parametrized by real-valued feature vectors for short time windows using standard analysis tools for seismic data, such as frequency-wavenumber, polarization, and spectral analysis. We use Self-Organizing Maps (SOMs) for a data-driven feature selection, visualization and clustering procedure, which is in particular suitable for high-dimensional data sets. Our feature selection method is based on significance testing using the Wald–Wolfowitz runs test for individual features and on correlation hunting with SOMs in feature subsets. Using synthetics composed of Rayleigh and Love waves and real-world data, we show the robustness and the improved discriminative power of that approach compared to feature subsets manually selected from individual wavefield parametrization methods. Furthermore, the capability of the clustering and visualization techniques to investigate the discrimination of wave phases is shown by means of synthetic waveforms and regional earthquake recordings. [Copyright Elsevier]
- Published
- 2009
35. THE COMBINED USE OF SELF-ORGANIZING MAPS AND ANDREWS' CURVES.
- Author
-
GARCÍA-OSORIO, C. and FYFE, C.
- Subjects
- *
SELF-organizing maps , *ARTIFICIAL neural networks , *ARTIFICIAL intelligence , *VISUALIZATION , *DATA analysis - Abstract
The use of self-organizing maps to analyze data often depends on finding effective methods to visualize the SOM's structure. In this paper we propose a new way to perform that visualization using a variant of Andrews' Curves. Also we show that the interaction between these two methods allows us to find sub-clusters within identified clusters. Perhaps more importantly, using the SOM to pre-process data by identifying gross features enables us to use Andrews' Curves on data sets which would have previously been too large for the methodology. Finally we show how a three way interaction between the human user and these two methods can be a valuable exploratory data analysis tool. [ABSTRACT FROM AUTHOR]
- Published
- 2005
36. Two-level Clustering of Web Sites Using Self-Organizing Maps
- Author
-
Petrilis, Dimitris and Halatsis, Constantin
- Published
- 2008
37. Data mining in actuarial performance indicators using self-organizing maps
- Author
-
Schmid, Stephanie
- Subjects
Life Insurance ,Self-Organizing Maps ,Visualization ,New Business Margin ,Data Mining ,Artificial Neural Networks ,Data Processing - Abstract
In this thesis we apply a data mining method based on artificial neural networks to a data set of existing new-business contracts in life insurance. The aim is to better understand the new business margin performance indicator and to predict it better without using the internal stochastic models. After explaining artificial neural networks in general, we look more closely at the Self-Organizing Maps used in the application and explain them by means of their predecessor, vector quantization. We also discuss how data mining can be applied in business processes in general and in life insurance in particular. The data consist of indicators on contracts concluded within one month at Allianz Elementar Lebensversicherungs-AG. The results show that Self-Organizing Maps can group the available data and that a comprehensible visualization is possible. They support the assumption of strongly non-linear relationships between the new business margin and the data leading to it. Because of time-consuming processing steps before the neural networks can be applied, regular use of the calculations carried out in this work on these data is not recommended. Data mining has become a vital aspect of data analysis, and clustering is a potential data mining tool. In this work we apply the data mining method of an artificial neural net, namely Self-Organizing Maps, to a dataset containing information about new business contracts. We explain neural nets in general and how data mining can be applied in insurance companies. By means of the classical vector quantisation process we explain the algorithm of Self-Organizing Maps and its parameters. 
We then apply the algorithm to a data sheet provided by the actuarial life department of Allianz Elementar Lebensversicherungs-AG to get a better insight into parameters affecting the new business margin, going beyond the already widely performed analyses. The outcomes show clear evidence that Self-Organizing Maps can cluster this data into individual groups. Though the results support the assumption of a highly non-linear correlation between the new business margin and the parameters leading to it, only a small amount of data can be used for analyses. A suggestion on how to set parameters leading to a more stable, higher new business margin is made nevertheless. Because of limitations concerning data, software and time, Self-Organizing Maps are not the ideal solution for analysing this kind of data. Especially due to the required time-consuming, manual pre- and postprocessing it is not recommended to use the methods presented in this work on a regular basis on this particular data sheet.
- Published
- 2017
38. Relative characterization of rosemary samples according to their geographical origins using microwave-accelerated distillation, solid-phase microextraction and Kohonen self-organizing maps
- Author
-
Tigrine-Kordjani, N., Chemat, F., Meklati, B. Y., Tuduri, L., Giraudel, J. L., and Montury, M.
- Published
- 2007
39. Explicit Magnification Control of Self-Organizing Maps for "Forbidden" Data.
- Author
-
Merényi, Erzsébet, Jain, Abha, and Villmann, Thomas
- Subjects
- *
SELF-organizing maps , *SELF-organizing systems , *ARTIFICIAL neural networks , *ALGORITHMS , *DATA mining , *DATA analysis - Abstract
In this paper, we examine the scope of validity of the explicit self-organizing map (SOM) magnification control scheme of Bauer et al. (1996) on data for which the theory does not guarantee success, namely data that are n-dimensional, n ≥ 2, and whose components in the different dimensions are not statistically independent. The Bauer et al. algorithm is very attractive for the possibility of faithful representation of the probability density function (pdf) of a data manifold, or for discovery of rare events, among other properties. Since theoretically unsupported data of higher dimensionality and higher complexity would benefit most from the power of explicit magnification control, we conduct systematic simulations on ‘forbidden’ data. For the unsupported n = 2 cases that we investigate, the simulations show that even though the magnification exponent α_achieved achieved by magnification control is not the same as the desired α_desired, α_achieved systematically follows α_desired with a slowly increasing positive offset. We show that for simple synthetic higher dimensional data information, theoretically optimum pdf matching (α_achieved = 1) can be achieved, and that negative magnification has the desired effect of improving the detectability of rare classes. In addition, we further study theoretically unsupported cases with real data. [ABSTRACT FROM AUTHOR]
- Published
- 2007
40. Data mining for household water consumption analysis using self-organizing maps
- Author
-
Ioannou, Alexandra, Kofinas, Dimitris, Spyropoulou, Alexandra, and Laspidou, Chrysi
- Subjects
self-organizing maps ,Data mining ,water consumption analysis - Abstract
Household water consumption is a part of the human-related water cycle that lies at the core of water resources management. Analysis of water consumption data can reveal great potential for planning individualized water services. Data mining is the process of identifying and extracting potentially useful information from data sets. The Self-Organizing Map (SOM) is a data mining technique that uses an unsupervised learning method to analyze, cluster, and model various types of large data sets. In this paper, we show how the daily water consumption of a household in Sosnowiec, Poland, can be clustered by day of the week using a set of features. The features used to discretize the days of water consumption are statistical metrics and time-zone consumption metrics. The time zoning is done in two ways: the first uses the typical morning, noon, afternoon, evening and night periods, and the second uses the local working-hour time zones of three main working sectors: banks, offices and shops. We apply the SOM algorithm in three approaches, each using some of the selected features. We obtained clusters with specific features that divide the days of this household into weekdays and weekends.
- Published
- 2017
- Full Text
- View/download PDF
41. Interpretable interval type-2 fuzzy predicates for data clustering: A new automatic generation method based on self-organizing maps
- Author
-
Gustavo J. Meschino, Virginia L. Ballarin, Juan Ignacio Pastore, Diego S. Comas, and Agustina Bouchet
- Subjects
Clustering high-dimensional data ,Self-organizing map ,Information Systems and Management ,Fuzzy clustering ,Computer science ,Single-linkage clustering ,Correlation clustering ,Fuzzy set ,Conceptual clustering ,Engineering and technology ,INGENIERÍAS Y TECNOLOGÍAS ,INTERPRETABLE CLUSTERING ,KNOWLEDGE DISCOVERY ,Fuzzy logic ,Management Information Systems ,Artificial Intelligence ,CURE data clustering algorithm ,Information systems ,Consensus clustering ,FUZZY PREDICATES ,Electrical engineering, electronic engineering, information engineering ,INTERVAL TYPE-2 FUZZY LOGIC ,Cluster analysis ,k-medians clustering ,Ingeniería Eléctrica, Ingeniería Electrónica e Ingeniería de la Información ,Ingeniería de Sistemas y Comunicaciones ,Brown clustering ,Novelty ,Pattern recognition ,SELF-ORGANIZING MAPS ,ComputingMethodologies_PATTERNRECOGNITION ,Compact space ,Data stream clustering ,Canopy clustering algorithm ,FLAME clustering ,Affinity propagation ,Artificial intelligence & image processing ,Artificial intelligence ,Data mining ,Software ,Membership function - Abstract
In previous works, we proposed two methods for data clustering based on automatically discovered fuzzy predicates, referred to as SOM-based Fuzzy Predicate Clustering (SFPC) [Meschino et al., Neurocomputing, 147, 47–59 (2015)] and Type-2 Data-based Fuzzy Predicate Clustering (T2-DFPC) [Comas et al., Expert Syst. Appl., 68, 136–150 (2017)]. In these methods, fuzzy predicates allow both data clustering and knowledge discovery about the obtained clusters. This last feature is novel compared with other existing approaches and is a major contribution to the data clustering field. Building on these previous methods, the present paper proposes a new automatic clustering method based on fuzzy predicates, which uses Self-Organizing Maps (SOMs) and is called Type-2 SOM-based Fuzzy Predicate Clustering (T2-SFPC). The new method does not require any prior knowledge about the clustering problem addressed. First, a random partition is defined on the dataset to be clustered, and SOMs are configured and trained using the resulting data subsets. Second, an automatic clustering approach is applied to the SOM codebooks, discovering representative data of the different clusters, which are called cluster prototypes. Third, interval type-2 membership functions formed by Gaussian-shaped sub-functions, together with fuzzy predicates, are defined, allowing data clustering and its interpretation. The proposed method preserves all the advantages of the previous methods SFPC and T2-DFPC regarding their knowledge extraction capabilities and their potential application to distributed clustering and parallel computing. Results obtained on several public datasets showed more compact and better separated clusters for T2-SFPC, which outperformed both the previous methods and the several classical clustering approaches tested, considering internal and external validation indices. Additionally, both clustering interpretation and optimization capabilities are improved by the proposed method when compared to SFPC and T2-DFPC.
Affiliation: Comas, Diego Sebastián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica.; Argentina
Affiliation: Pastore, Juan Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica.; Argentina
Affiliation: Bouchet, Agustina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica.; Argentina
Affiliation: Ballarin, Virginia Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica.; Argentina
Affiliation: Meschino, Gustavo Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Instituto de Investigaciones Científicas y Tecnológicas en Electrónica.; Argentina. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Departamento de Ingeniería Eléctrica. Laboratorio de Bioingeniería; Argentina
- Published
- 2017
42. Comparison of substructural epitopes in enzyme active sites using self-organizing maps
- Author
-
Kupas, Katrin, Ultsch, Alfred, and Klebe, Gerhard
- Published
- 2004
- Full Text
- View/download PDF
43. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis.
- Author
-
Gorzalczany, Marian B. and Rudzinski, Filip
- Subjects
- *
SELF-organizing maps , *CLUSTER analysis (Statistics) , *GENERALIZATION - Abstract
This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain—during learning—to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network—working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)—to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
44. Restoration of Hydrological Data in the Presence of Missing Data via Kohonen Self Organizing Maps
- Author
-
Sobri Harun, Siti Mariyam Shamsuddin, and Marlinda Abdul Malek
- Subjects
Engineering ,Data collection ,Kohonen self organizing map ,Missing data ,Terminology ,Chart ,Current practice ,Data logger ,Data mining ,Imputation (statistics) - Abstract
The Malaysia National Network system utilises three methods of rainfall data collection, namely the manual, chart recording and data logger methods. These methods are used simultaneously at most rainfall stations. As a result, where data are missing, the gaps take one of three predictable patterns: data missing from one recording method, from two recording methods, or from all three recording methods. It is also noted that, where data are available, there are prevalent measurement inconsistencies between the three methods, even though all apparatus are placed at the same rainfall station. A data exploration exercise found that the discrepancy between one method of measurement and another may range from 0% to 100%, indicating that the data are too unstable to be relied upon. The current practice for resolving missing data from one recording method is to substitute the missing data with the remaining recorded data. Similarly, if data from any two of the three recording methods are missing, then the available data from the third method is used as a reference. In statistical terminology, this method of substitution is referred to as "Hot-deck Imputation". While easily applied, this method has obvious drawbacks: it is not supported by any scientific rationale, and it cannot be applied when data are missing from all three recording methods.
- Published
- 2021
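Note on entry 44: the substitution practice described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' code. The function name `substitute_missing`, the use of `None` for a missing reading, and the rule of taking the first available reading when two remain are all assumptions, since the abstract does not say which remaining record is preferred.

```python
from typing import List, Optional, Sequence


def substitute_missing(readings: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill missing readings (None) for one station and one day.

    `readings` holds one value per recording method, e.g. in the
    assumed order (manual, chart, data logger).
    """
    available = [r for r in readings if r is not None]
    if not available:
        # All methods are missing: substitution has nothing to copy from,
        # which is the limitation the abstract points out.
        return [None] * len(readings)
    fallback = available[0]  # assumed tie-break: first available method
    return [fallback if r is None else r for r in readings]


print(substitute_missing([None, 4.2, 4.0]))
print(substitute_missing([None, None, None]))
```

The second call returns the input unchanged, which shows why the abstract treats the all-methods-missing pattern as the case this practice cannot handle.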
45. Stall margin evaluation and data mining based multi-objective optimization design of casing treatment for an axial compressor rotor.
- Author
-
Chi, Zhidong, Chu, Wuli, Zhang, Haoguang, and Zhang, Ziyun
- Subjects
DATA mining ,OPTIMIZATION algorithms ,AXIAL flow compressors ,SELF-organizing maps ,COMPRESSORS ,NUMERICAL calculations - Abstract
Casing treatment is an effective passive technology for improving the compressor stability. However, the current design methods for the casing treatment rely excessively on trial and error experiences, presenting significant challenges to actual engineering applications. In this paper, we propose a multi-objective optimization design method based on stall margin evaluation and data mining to enhance the stability of axial compressor rotors. We have developed a multi-objective optimization platform that combines geometric parameterization, mesh generation, numerical calculations, optimization algorithms, and other relevant components. To optimize six design variables and two objective functions, we have implemented two optimization strategies based on direct stall margin calculation and stall margin evaluation. The optimization results revealed that optimal casing treatment structures can be obtained by considering both compressor stability and efficiency. Furthermore, we employed data mining of self-organizing maps to explain the tradeoffs from the optimal solutions. The aerodynamic analysis demonstrated that the casing treatment enhances stability by restricting negative axial momentum of tip leakage flow and reducing passage blockage. Four categories of stall margin evaluation parameters were quantified, and their effectiveness was assessed through a correlation analysis. Finally, we used the axial momentum of the tip leakage flow-related evaluation parameter for the optimization of stall margin evaluation. Compared with direct stall margin calculation-based optimization, the evaluation of the parameter-based optimization method effectively predicted the stability enhancement of casing treatment while revealing the optimal geometric features. It suggests that the stall margin evaluation-based optimization method should be utilized in the initial optimization process of casing treatment due to its advantages in the optimization speed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. Analyzing International Travelers' Profile with Self-Organizing Maps.
- Author
-
Gang Li, Law, Rob, and Jinlong Wang
- Subjects
- *
SELF-organizing maps , *TOURISM , *TOURIST maps , *TRAVEL , *MARKET segmentation , *MAPS - Abstract
It is generally agreed that knowledge is the most valuable asset to an organization. Knowledge enables a business to effectively compete with its competitors. In the tourism context, an in-depth knowledge of the profile of international travelers to a destination has become a crucial factor for decision makers to formulate their business strategies and better serve their customers. In this research, a self-organizing map (SOM) network was used for segmenting international travelers to Hong Kong, a major travel destination in Asia. An association rules discovery algorithm is then utilized to automatically characterize the profile of each segment. The resulting maps serve as a visual analysis tool for tourism managers to better understand the characteristics, motivations, and behaviors of international travelers. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
47. Data-driven Homogeneous Pavement Groups—Soft Versus Hard Clustering
- Author
-
Mukhtarli, Kanan, Nik-Bakht, Mazdak, and Amador-Jimenez, Luis
- Published
- 2023
- Full Text
- View/download PDF
48. Ranked Centroid Projection: A Data Visualization Approach With Self-Organizing Maps.
- Author
-
Yen, Gary G. and Zheng Wu
- Subjects
- *
ARTIFICIAL neural networks , *SELF-organizing maps , *TEXT mining , *VISUAL programming languages (Computer science) , *CONTENT mining , *SELF-organizing systems , *VECTOR analysis , *DATA mining , *ENCODING - Abstract
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
49. Dynamic self organizing maps for discovery and sharing of knowledge in multi agent systems.
- Author
-
Wickramasinghe, L. K. and Alahakoon, L. D.
- Subjects
- *
DATA mining , *SELF-organizing maps , *ARTIFICIAL neural networks , *SELF-organizing systems , *DATABASE searching - Abstract
This paper presents a multi agent system which provides a method for cooperation and coordination among agents using collective reasoning. There are two types of agents in the system: distributed agents and a central administrator agent. The distributed agents provide local knowledge extraction tasks and communicate it to the central administrator agent. The central administrator agent builds a global picture about the problem domain using the local knowledge of each agent. It uses the Growing Self Organizing Map (GSOM), which is a dynamic version of the Self Organizing Map (SOM) for analyzing the global knowledge. The system differs from similar systems which employ central administrator functionality as the central administrator agent in the proposed system is capable of autonomously refining the task model of the distributed agents and dynamically determining the efficient number of agents required in the system. [ABSTRACT FROM AUTHOR]
- Published
- 2005
50. How to make large self-organizing maps for nonvectorial data
- Author
-
Kohonen, Teuvo and Somervuo, Panu
- Subjects
- *
SELF-organizing maps , *DATABASE management - Abstract
The self-organizing map (SOM) represents an open set of input samples by a topologically organized, finite set of models. In this paper, a new version of the SOM is used for the clustering, organization, and visualization of a large database of symbol sequences (viz. protein sequences). This method combines two principles: the batch computing version of the SOM, and computation of the generalized median of symbol strings. [Copyright Elsevier]
- Published
- 2002
- Full Text
- View/download PDF
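Note on entry 50: the idea of a median of symbol strings can be illustrated with a small sketch. This is not the paper's algorithm. It computes the simpler set median (the member of a set with the smallest total distance to the others) instead of the generalized median, and it assumes Levenshtein edit distance as the string dissimilarity; the function names and the sample strings are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def set_median(strings: Sequence[str]) -> str:
    """Return the member with the smallest summed distance to all members."""
    return min(strings, key=lambda s: sum(levenshtein(s, t) for t in strings))


print(set_median(["ACGT", "ACGA", "TCGT"]))
```

In a batch-style map for nonvectorial data, a median of this kind could stand in for the mean used to update a model from the samples assigned to its neighborhood, since strings cannot be averaged directly.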