Author: "Murilo Coelho Naldi" - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "Murilo Coelho Naldi" (48 results in total)



1. Optimization Algorithms for Scalable Stream Batch Clustering with k Estimation

Author: Paulo Gustavo Lopes Cândido, Jonathan Andrade Silva, Elaine Ribeiro Faria, and Murilo Coelho Naldi
Subjects: machine learning, clustering, data stream, massive parallel computation, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: The increasing volume and velocity of continuously generated data (data streams) challenge machine learning algorithms, which must evolve to fit real-world problems. Data stream clustering algorithms face issues such as the rapidly increasing volume of data and variation in the number and shape of clusters. The present work aims to improve the accuracy of clustering sequential batches of data streams in scenarios where clusters evolve dynamically and continuously, automatically estimating their number. To achieve this goal, three evolutionary algorithms are presented, along with three novel algorithms designed to handle normally distributed clusters using goodness-of-fit tests, in the context of scalable batch stream clustering with automatic estimation of the number of clusters. All of them are built on top of MapReduce, Discretized-Stream models, and the most recent MPC frameworks to provide scalability, reliability, resilience, and flexibility. The proposed algorithms are experimentally compared with state-of-the-art methods and achieve the best accuracy on normally distributed data sets, reaching their goal.
Published: 2022

2. Automatic identification of charcoal origin based on deep learning

Author: Ricardo Rodrigues de Oliveira Neto, Larissa Ferreira Rodrigues, João Fernando Mari, Murilo Coelho Naldi, Emerson Gomes Milagres, Benedito Rocha Vital, Angélica de Cássia Oliveira Carneiro, Daniel Henrique Breda Binoti, Pablo Falco Lopes, and Helio Garcia Leite
Subjects: Charcoal, classification, deep learning, native wood, preprocessing, Forestry, SD1-669.5, Manufactures, TS1-2301
Abstract: Differentiating charcoal produced from Eucalyptus plantations from charcoal produced from native forests is essential to the control, commercialization, and supervision of its production in Brazil. The main contribution of this study is to identify the origin of charcoal using macroscopic images and a deep learning algorithm. We applied a Convolutional Neural Network (CNN) with the VGG-16 architecture, with preprocessing based on contrast enhancement and data augmentation by rotating the training set images, and evaluated the effect of these strategies on the performance of the fine-tuned CNN using 360 macroscopic charcoal images from plantation and native forests. The results show that the method provides new perspectives for identifying the origin of charcoal, achieving a mean accuracy above 95% in classifying charcoal from native forests for all compared preprocessing strategies.
Published: 2021

3. Efficient Density-Based Models for Multiple Machine Learning Solutions over Large Datasets.

Author: Natanael F. Dacioli Batista, Bruno Leonel Nunes, and Murilo Coelho Naldi
Published: 2023

4. ESIREOS: Efficient, Scalable, Internal, Relative Evaluation of Outliers Solutions.

Author: William A. Alves, Henrique O. Marques, Murilo Coelho Naldi, and Jörg Sander
Published: 2023

5. CORE-SG: Efficient Computation of Multiple MSTs for Density-Based Methods.

Author: Antônio Cavalcante Araújo Neto, Murilo Coelho Naldi, Ricardo J. G. B. Campello, and Jörg Sander
Published: 2022

6. N-BEATS-RNN: deep learning for time series forecasting.

Author: Attilio Sbrana, André Luis Debiaso Rossi, and Murilo Coelho Naldi
Published: 2020

7. Scalable Batch Stream Clustering with k Estimation.

Author: Paulo G. L. Candido, Jonathan de Andrade Silva, Elaine R. Faria, and Murilo Coelho Naldi
Published: 2018

8. Scalable Data Stream Clustering with k Estimation.

Author: Paulo G. L. Candido, Murilo Coelho Naldi, Jonathan de Andrade Silva, and Elaine R. Faria
Published: 2017

9. Online Detection of Outliers in Clusters of Continuous Data Streaming.

Author: Mariana Alves Pereira, Elaine Ribeiro de Faria Paiva, and Murilo Coelho Naldi
Published: 2017

10. Exploiting Convolutional Neural Networks and Preprocessing Techniques for HEp-2 Cell Classification in Immunofluorescence Images.

Author: Larissa Ferreira Rodrigues, Murilo Coelho Naldi, and João Fernando Mari
Published: 2017

11. Multiple Parallel MapReduce k-Means Clustering with Validation and Selection.

Author: Kemilly Dearo Garcia and Murilo Coelho Naldi
Published: 2014

12. Distributed K-Means Clustering with Low Transmission Cost.

Author: Murilo Coelho Naldi and Ricardo José Gabrielli Barreto Campello
Published: 2013

13. Combining Information from Distributed Evolutionary k-Means.

Author: Murilo Coelho Naldi and Ricardo José Gabrielli Barreto Campello
Published: 2012

14. Comparison Among Methods for k Estimation in k-means.

Author: Murilo Coelho Naldi, Andre Fontana, and Ricardo J. G. B. Campello
Published: 2009

15. Evolutionary Fuzzy Clustering: An Overview and Efficiency Issues.

Author: Danilo Horta, Murilo Coelho Naldi, Ricardo José Gabrielli Barreto Campello, Eduardo R. Hruschka, and André Carlos Ponce de Leon Ferreira de Carvalho
Published: 2009

16. Genetic Clustering for Data Mining.

Author: Murilo Coelho Naldi, André Carlos Ponce de Leon Ferreira de Carvalho, Ricardo José Gabrielli Barreto Campello, and Eduardo R. Hruschka
Published: 2008

17. A review and comparative analysis of coarsening algorithms on bipartite networks

Author: Alan Valejo, Murilo Coelho Naldi, Wellington de Oliveira dos Santos, and Liang Zhao
Subjects: Complex networks, Computer science, Dimensionality reduction, General Physics and Astronomy, Visualization, Set (abstract data type), Resource (project management), Bipartite graph, General Materials Science, Relevance (information retrieval), Quality (business), Physical and Theoretical Chemistry, Cluster analysis, Algorithm
Abstract: Coarsening algorithms have been successfully used as a powerful strategy to deal with data-intensive machine learning problems defined in bipartite networks, such as clustering, dimensionality reduction, and visualization. Their main goal is to build informative simplifications of the original network at different levels of detail. Despite their widespread relevance, a comparative analysis and performance evaluation of these algorithms is still needed. Additionally, some aspects of the current versions of these algorithms have not been explored in their original or complementary studies. In that regard, we strive to fill this gap, presenting a formal and illustrative description of coarsening algorithms developed for bipartite networks. Afterward, we illustrate the usage of these algorithms in a set of emblematic problems. Finally, we evaluate and quantify their accuracy using quality and runtime measures on thousands of synthetic and real-world networks with various properties and structures. The presented empirical analysis provides evidence to assess the strengths and shortcomings of such algorithms. Our study is a unified and useful resource that provides guidelines to researchers interested in learning about and applying these algorithms.
Published: 2021

18. Clustering using genetic algorithm combining validation criteria.

Author: Murilo Coelho Naldi and André Carlos Ponce de Leon Ferreira de Carvalho
Published: 2007

19. Hierarchical Density-Based Clustering Using MapReduce

Author: Joelson Antonio dos Santos, Joerg Sander, Talat Iqbal Syed, Ricardo J. G. B. Campello, and Murilo Coelho Naldi
Subjects: Information Systems and Management, Computational complexity theory, Computer science, 02 engineering and technology, 01 natural sciences, Automatic summarization, Hierarchical clustering, Data modeling, 010104 statistics & probability, Exploratory data analysis, Scalability, 0202 electrical engineering, electronic engineering, information engineering, Programming paradigm, 020201 artificial intelligence & image processing, Data mining, 0101 mathematics, Cluster analysis, Information Systems
Abstract: Hierarchical density-based clustering is a powerful tool for exploratory data analysis, which can play an important role in the understanding and organization of datasets. However, its applicability to large datasets is limited because the computational complexity of hierarchical clustering methods has a quadratic lower bound in the number of objects to be clustered. MapReduce is a popular programming model to speed up data mining and machine learning algorithms operating on large, possibly distributed datasets. In the literature, there have been attempts to parallelize algorithms such as Single-Linkage, which in principle can also be extended to the broader scope of hierarchical density-based clustering, but hierarchical clustering algorithms are inherently difficult to parallelize with MapReduce. In this paper, we discuss why adapting previous approaches to parallelize Single-Linkage clustering using MapReduce leads to very inefficient solutions when one wants to compute density-based clustering hierarchies. Preliminarily, we discuss one such solution, which is based on an exact, yet very computationally demanding, random blocks parallelization scheme. To be able to efficiently apply hierarchical density-based clustering to large datasets using MapReduce, we then propose a different parallelization scheme that computes an approximate clustering hierarchy based on a much faster, recursive sampling approach. This approach is based on HDBSCAN*, the state-of-the-art hierarchical density-based clustering algorithm, combined with a data summarization technique called data bubbles. The proposed method is evaluated in terms of both runtime and quality of the approximation on a number of datasets, showing its effectiveness and scalability.
Published: 2021
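
For background on the quadratic cost this abstract mentions: exact HDBSCAN* works on core distances, mutual reachability distances, and a minimum spanning tree (MST) over all pairs of objects. The following is a minimal single-machine sketch of that exact step using Prim's algorithm, assuming Euclidean distance and the HDBSCAN* convention that a core distance counts the object itself. It is not the MapReduce, recursive-sampling, or data-bubble method proposed in the paper, and the function names are hypothetical.

```python
import math


def core_distances(points, min_pts):
    """Core distance of each point: distance to its min_pts-th nearest
    neighbour, counting the point itself (HDBSCAN* convention)."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)  # includes d(p, p) = 0
        cores.append(dists[min_pts - 1])
    return cores


def mutual_reachability(points, cores, i, j):
    """d_mreach(a, b) = max(core(a), core(b), d(a, b))."""
    return max(cores[i], cores[j], math.dist(points[i], points[j]))


def mst_prim(points, min_pts):
    """Prim's algorithm on the complete mutual-reachability graph.
    Every pair of objects is examined, so the cost is quadratic in n."""
    n = len(points)
    cores = core_distances(points, min_pts)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []              # (u, v, weight) triples of the MST
    for _ in range(n):
        # pick the cheapest vertex that is not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # relax edges from u to every vertex still outside the tree
        for v in range(n):
            if not in_tree[v]:
                w = mutual_reachability(points, cores, u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges
```

Removing the MST edges in decreasing order of weight yields the density-based hierarchy. The all-pairs loop in `mst_prim` illustrates why the exact computation becomes expensive on large datasets.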

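20. Análise de agrupamento para aprimorar a extração automática de demonstrativos financeiros com estudo de escalabilidade (Cluster analysis to improve the automatic extraction of financial statements, with a scalability study)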

Author: Igor Raphael Magollo, Gabriel Olivato, Victor Vieira Ferraz, and Murilo Coelho Naldi
Published: 2022

21. Automatic identification of charcoal origin based on deep learning

Author: Benedito Rocha Vital, Emerson Gomes Milagres, Pablo Falco Lopes, João Fernando Mari, Larissa Ferreira Rodrigues, Daniel Henrique Breda Binoti, Ricardo Rodrigues de Oliveira Neto, Helio Garcia Leite, Angélica de Cássia Oliveira Carneiro, and Murilo Coelho Naldi
Subjects: Contrast enhancement, Computer science, Materials Science (miscellaneous), Manufactures, Convolutional neural network, native wood, TS1-2301, Industrial and Manufacturing Engineering, Native forest, Chemical Engineering (miscellaneous), Preprocessor, preprocessing, Charcoal, Training set, Deep learning, Forestry, Pattern recognition, SD1-669.5, Identification (information), classification, Artificial intelligence
Published: 2021

22. Labor Accidents in Brazil: a Descriptive Analysis

Author: Daniela Giacomelli, Elaine R. Faria, and Murilo Coelho Naldi
Subjects: Descriptive statistics, Socioeconomics
Abstract: Labor accidents cause several misfortunes, such as inconvenience to the injured, loss of labor productivity, and public spending on aid and accident compensation. This work aims to find and characterize groups of labor accidents, making the results interpretable, in order to extract information that may be relevant to public managers. The proposed method consists of the following steps: data preprocessing; the application of two hierarchical clustering algorithms, HDBSCAN* and COBWEB; and the evaluation of the results using the Simplified Silhouette. The research showed the susceptibility of male workers, mainly aged between 18 and 34, to labor accidents that caused finger injuries while handling machines and equipment or manual tools, followed by activities such as fishing. Among clusters composed mainly of female victims, those related to work with cellulose, paper, and related products stand out. Moreover, fingers are the most affected body part, notably in incidents caused by handling chemical, biological, or hand tools.
Published: 2020
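
The Simplified Silhouette used in this work scores each object by comparing its distance to its own cluster centroid (a) with its distance to the nearest other centroid (b), as (b - a) / max(a, b), and averages the scores. A minimal sketch, assuming Euclidean distance and clusters given as integer labels; the function names are hypothetical and this is not the authors' code.

```python
import math


def centroids_of(points, labels):
    """Mean vector of each cluster, keyed by label."""
    sums, counts = {}, {}
    for p, c in zip(points, labels):
        if c not in sums:
            sums[c] = [0.0] * len(p)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], p)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def simplified_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all objects, where a is the
    distance to the object's own centroid and b the distance to the
    nearest other centroid. Values close to 1 indicate compact,
    well-separated clusters."""
    cents = centroids_of(points, labels)
    if len(cents) < 2:
        raise ValueError("the criterion needs at least two clusters")
    total = 0.0
    for p, c in zip(points, labels):
        a = math.dist(p, cents[c])
        b = min(math.dist(p, v) for k, v in cents.items() if k != c)
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / len(points)
```

Because only centroid distances are needed, this variant costs O(n·k) per evaluation instead of the O(n²) of the original silhouette.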

23. Improving k-means through distributed scalable metaheuristics

Author: F.P. Coutinho, Ricardo J. G. B. Campello, Murilo Coelho Naldi, and G.V. Oliveira
Subjects: Theoretical computer science, Computer science, Cognitive Neuroscience, Evolutionary algorithm, k-means clustering, 02 engineering and technology, 01 natural sciences, Computer Science Applications, 010104 statistics & probability, Genetic algorithms, Artificial Intelligence, Distributed algorithm, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, 0101 mathematics, Cluster analysis, Metaheuristic
Abstract: The growing size of recent datasets requires scalable data mining algorithms, such as clustering algorithms. The MapReduce programming model provides the needed scalability, along with portability and automatic data safety and management. k-means is one of the most popular algorithms in data mining and can be easily adapted to the MapReduce model. Nevertheless, k-means has drawbacks, such as the need to provide the number of clusters (k) in advance and its sensitivity to the initial cluster prototypes. This paper presents two scalable evolutionary metaheuristics in MapReduce that automatically seek the solution with the optimal number of clusters and the best clustering structure for large-scale datasets. The first is an algorithm that iteratively enhances k-means clusterings through evolutionary operators designed to handle distributed data. The second applies evolutionary k-means to cluster each distributed portion of a dataset independently, combining the obtained results into an ensemble afterwards. The proposed techniques are compared asymptotically and experimentally with other state-of-the-art clustering algorithms also developed in MapReduce. The results are analyzed by statistical tests and show that the first proposed metaheuristic yielded the best-quality results, while the second achieved the best computing times.
Published: 2017
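
The abstract notes that k-means adapts easily to MapReduce. A minimal single-process sketch of one iteration in that style, assuming Euclidean distance and in-memory lists standing in for distributed partitions: each map task emits per-cluster partial sums and counts, and the reduce step merges them into new centroids. The function names are hypothetical, and this is the standard MapReduce k-means step, not the evolutionary metaheuristics proposed in the paper.

```python
import math


def map_partition(partition, centroids):
    """Map (with local combine) over one data partition:
    returns {cluster index: (vector sum, count)}."""
    partial = {}
    for p in partition:
        j = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        s, n = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return partial


def reduce_partials(partials, centroids):
    """Reduce: merge (sum, count) pairs from all partitions into new centroids."""
    merged = {}
    for partial in partials:
        for j, (s, n) in partial.items():
            ms, mn = merged.get(j, ([0.0] * len(s), 0))
            merged[j] = ([a + b for a, b in zip(ms, s)], mn + n)
    # a cluster that received no points keeps its previous centroid
    return [
        [x / merged[j][1] for x in merged[j][0]] if j in merged else list(c)
        for j, c in enumerate(centroids)
    ]


def distributed_kmeans_step(partitions, centroids):
    """One k-means iteration over partitioned data."""
    return reduce_partials([map_partition(part, centroids) for part in partitions], centroids)
```

Since sums and counts are additive, the merged update matches a single-machine k-means update (up to floating-point rounding), however the data is partitioned.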

24. Comparing convolutional neural networks and preprocessing techniques for HEp-2 cell classification in immunofluorescence images

Author: João Fernando Mari, Larissa Ferreira Rodrigues, and Murilo Coelho Naldi
Subjects: 0301 basic medicine, Computer science, Cytological Techniques, Fluorescent Antibody Technique, Health Informatics, Convolutional neural network, Standard procedure, 03 medical and health sciences, 0302 clinical medicine, Cell Line, Tumor, Image Processing, Computer-Assisted, Preprocessor, Humans, Hyperparameter, Pattern recognition, Cellular Structures, Computer Science Applications, Identification (information), 030104 developmental biology, Hyperparameter optimization, Hep 2 cell, Artificial intelligence, Neural Networks, Computer, 030217 neurology & neurosurgery, Algorithms
Abstract: Autoimmune diseases are the third leading cause of mortality in the world, and the identification of anti-nuclear antibodies via an immunofluorescence test on HEp-2 cells is a standard procedure to support diagnosis. In this work, we assess the performance of six preprocessing strategies and five state-of-the-art convolutional neural network architectures for the classification of HEp-2 cells. We also evaluate enhancement methods such as hyperparameter optimization, data augmentation, and fine-tuning training strategies. All experiments were validated using a five-fold cross-validation procedure over the training and test sets. In terms of accuracy, the best result (98.28%) was achieved by training the Inception-V3 model from scratch, without preprocessing and with data augmentation. The results suggest that most CNNs perform better on non-preprocessed images when trained from scratch on the analyzed dataset, and that data augmentation can improve the results of all models. Although fine-tuning did not improve accuracy compared with training the CNNs from scratch, it successfully reduced the training time.
Published: 2019

25. Combining semantic and term frequency similarities for text clustering

Author: Ricardo J. G. B. Campello, Murilo Coelho Naldi, Victor Hugo Andrade Soares, Evangelos E. Milios, and Seyednaser Nourashrafeddin
Subjects: Computer science, 02 engineering and technology, Similarity measure, Knowledge discovery, Semantic similarity, Similarity (network science), Artificial Intelligence, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, Cluster analysis, Statistical hypothesis testing, Measure (data warehouse), Document clustering, Human-Computer Interaction, Hardware and Architecture, Artificial intelligence, Software, Word (computer architecture), Natural language processing, Information Systems
Abstract: A key challenge for document clustering is finding a proper similarity measure for text documents that enables the generation of cohesive groups. Measures based on the classic bag-of-words model take into account only the presence (and frequency) of words in documents. As a result, semantically similar documents that use different vocabularies may end up in different clusters. For this reason, semantic similarity measures that use external knowledge, such as word n-gram corpora or thesauri, have been proposed in the literature. In this paper, the Frequency Google Tri-gram Measure is proposed to assess the similarity between documents based on the frequencies of terms in the compared documents, using the Google n-gram corpus as an additional source of semantic similarity. Clustering algorithms are applied to several real datasets to experimentally evaluate the quality of the clusters obtained with the proposed measure and to compare it with a number of state-of-the-art measures from the literature. Based on statistical tests, the experimental results show that the proposed measure significantly improves the quality of document clustering. We further show that clustering results combining bag-of-words and semantic similarity are superior to those obtained with either approach alone.
Published: 2019
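
The core idea in this abstract is that bag-of-words similarity misses documents that share meaning but not vocabulary, so a semantic source is combined with it. A hedged sketch of such a combination, assuming a simple weighted average (weight `alpha`) of term-frequency cosine similarity and a caller-supplied semantic score in [0, 1]. It does not reproduce the Frequency Google Tri-gram Measure, whose formula is not given here; `toy_semantic` and its synonym table are purely hypothetical stand-ins for an external corpus.

```python
import math
from collections import Counter


def tf_cosine(doc_a, doc_b):
    """Cosine similarity between term-frequency (bag-of-words) vectors."""
    ta, tb = Counter(doc_a), Counter(doc_b)
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values()) * sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def combined_similarity(doc_a, doc_b, semantic_sim, alpha=0.5):
    """Hypothetical convex combination of lexical and semantic similarity."""
    return alpha * tf_cosine(doc_a, doc_b) + (1 - alpha) * semantic_sim(doc_a, doc_b)


# Hypothetical stand-in for an external semantic source (e.g. an n-gram corpus).
TOY_SYNONYMS = {frozenset({"car", "automobile"})}


def toy_semantic(doc_a, doc_b):
    """Fraction of words in doc_a matched in doc_b exactly or via a synonym pair
    (not symmetric; illustration only)."""
    if not doc_a:
        return 0.0
    hits = sum(
        1 for w in doc_a
        if w in doc_b or any(frozenset({w, v}) in TOY_SYNONYMS for v in doc_b)
    )
    return hits / len(doc_a)
```

For `["car", "repair"]` versus `["automobile", "repair"]`, the bag-of-words cosine alone is 0.5, while the combined score also credits the synonym pair.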

26. Scalable Batch Stream Clustering with k Estimation

Author: Jonathan de Andrade Silva, Elaine R. Faria, Paulo L. Candido, and Murilo Coelho Naldi
Subjects: Concept drift, Discretization, Distributed database, Data stream mining, Computer science, 02 engineering and technology, Evolutionary computation, Data stream clustering, 020204 information systems, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Data mining, Cluster analysis, Streaming algorithm
Abstract: Approaches that combine streaming algorithms and distributed computing have the potential to deal with voluminous, high-speed data streams. For the data stream clustering task, another important issue must be addressed: estimating the number of clusters dynamically, since it may vary due to concept drift. This work proposes three evolutionary algorithms to meet these requirements. They are based on the discretized stream model, in which sequential batches of objects are distributed and processed in parallel using the MapReduce model. The proposed algorithms achieve superior experimental results in both quality and processing time, outperforming the state of the art.
Published: 2018

27. Comparison of distributed evolutionary k-means clustering algorithms

Author: Ricardo J. G. B. Campello and Murilo Coelho Naldi
Subjects: Clustering high-dimensional data, Fuzzy clustering, Theoretical computer science, Computer science, Cognitive Neuroscience, Correlation clustering, k-means clustering, Constrained clustering, Computer Science Applications, Data stream clustering, Artificial Intelligence, CURE data clustering algorithm, Canopy clustering algorithm, Affinity propagation, Data mining, Cluster analysis, Algorithm
Abstract: Dealing with distributed data is one of the challenges for clustering, as most clustering techniques require the data to be centralized. One of these techniques, k-means, has been elected one of the most influential data mining algorithms for being simple, scalable, and easily modifiable to a variety of contexts and application domains. However, exact distributed versions of k-means are still sensitive to the selection of the initial cluster prototypes and require the number of clusters to be specified in advance. Additionally, preserving data privacy among repositories may be a complicating factor. To overcome the limitations of k-means, two different approaches were adopted in this paper: the first obtains a final model identical to that of the centralized version of the clustering algorithm, and the second generates and selects clusters for each distributed data subset and combines them afterwards. The paper also describes how to apply the compared algorithms while preserving data privacy. The algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses, and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The results indicate which algorithm is more suitable for each application scenario.
Published: 2015

28. HEp-2 Cell Image Classification Based on Convolutional Neural Networks

Author: Murilo Coelho Naldi, João Fernando Mari, and Larissa Ferreira Rodrigues
Subjects: Identification (information), Contextual image classification, Computer science, Hyper parameters, Convergence (routing), Hep 2 cell, Subtraction, Pattern recognition, Artificial intelligence, Convolutional neural network
Abstract: Autoimmune diseases are the third leading cause of mortality in the world. A conventional method to support the diagnosis of autoimmune diseases is the identification of anti-nuclear antibodies (ANA) via the immunofluorescence (IIF) test on human epithelial type-2 (HEp-2) cells. In the present work, the Convolutional Neural Networks (CNNs) LeNet-5, AlexNet, and GoogLeNet are re-evaluated for this task, considering new validation techniques and a variety of CNN hyperparameter values. We also assess several preprocessing strategies for these CNNs. Moreover, the work presents an analysis of the optimization of training hyperparameters, which can affect the convergence of the cost function, the learning speed, and the classification performance. The best results were achieved by the GoogLeNet architecture trained on images with contrast stretching and average subtraction, reaching 95.53% accuracy with an initial learning rate of 0.001 and a gamma factor of 0.5.
Published: 2017
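
The best configuration reported above combines contrast stretching with average subtraction. A sketch of those two preprocessing steps on grayscale images stored as nested lists, assuming linear min-max stretching to [0, 255] and a pixel-wise mean image computed over the training set; the abstract does not specify the exact variants, and the function names are hypothetical.

```python
def contrast_stretch(image, out_min=0.0, out_max=255.0):
    """Linear min-max stretching of one grayscale image (list of rows)."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch
        return [[out_min for _ in row] for row in image]
    return [[out_min + (v - lo) * (out_max - out_min) / (hi - lo) for v in row] for row in image]


def mean_image(images):
    """Pixel-wise mean over equally sized images (e.g. the training set)."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)] for r in range(rows)]


def subtract_mean(image, mean):
    """Average subtraction: centre each pixel on the training-set mean."""
    return [[v - m for v, m in zip(row, mrow)] for row, mrow in zip(image, mean)]
```

The mean image would be computed once on the training images and then subtracted from every training and test image, so no test information leaks into preprocessing.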

29. Scalable Data Stream Clustering with k Estimation

Author: Jonathan de Andrade Silva, Paulo L. Candido, Elaine R. Faria, and Murilo Coelho Naldi
Subjects: Data stream, Computer science, Big data, 02 engineering and technology, Electronic mail, Data modeling, Data stream clustering, 020204 information systems, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Algorithm design, Data mining, Cluster analysis
Abstract: The constant growth of data generated in real time has created new challenges for machine learning tasks and for one of their main branches: data clustering. Big data streams (high-speed data streams) have become a reality, and new approaches are required to deal with them. In this work, we present new techniques based on three clustering fields: data streams, MapReduce, and automatic estimation of k from data. The goal is to cluster a high-speed data stream with a varying number of clusters. Two scalable algorithms are proposed, based on centralized data stream algorithms: the first is based on StreamKM++, and the second on F-EAC, an evolutionary algorithm used for batch clustering. The results achieved the same high quality as the original centralized versions of the algorithms. The proposed techniques can be used instead of the centralized ones when the velocity or volume of the data is too high to fit in a centralized system.
Published: 2017

30. Exploiting Convolutional Neural Networks and Preprocessing Techniques for HEp-2 Cell Classification in Immunofluorescence Images

Author: João Fernando Mari, Larissa Ferreira Rodrigues, and Murilo Coelho Naldi
Subjects: Contrast enhancement, Training set, Artificial neural network, Computer science, 02 engineering and technology, Machine learning, Convolutional neural network, 030218 nuclear medicine & medical imaging, 03 medical and health sciences, Identification (information), 0302 clinical medicine, Histogram, 0202 electrical engineering, electronic engineering, information engineering, Hep 2 cell, Preprocessor, 020201 artificial intelligence & image processing, Artificial intelligence
Abstract: Autoimmune diseases are the third leading cause of mortality in the world. The identification of anti-nuclear antibodies (ANA) via the immunofluorescence (IIF) test on human epithelial type-2 (HEp-2) cells is a conventional method to support the diagnosis of such diseases. In the present work, three popular Convolutional Neural Networks (CNNs) are evaluated for this task: LeNet-5, AlexNet, and GoogLeNet. We also assess the impact of six different preprocessing strategies on the performance of these CNNs. Additionally, data augmentation based on rotating the training set images after preprocessing was evaluated. Our work is the first to consider the AlexNet and GoogLeNet models, besides LeNet-5, for the analysis and classification of HEp-2 cell images. The experimental results show that none of the preprocessing strategies was essential to improve the accuracy of the CNNs. However, when data augmentation is used, contrast enhancement followed by data centralization is significant for achieving good results. Our results were also compared with those of other state-of-the-art papers. The best results were achieved by the GoogLeNet architecture trained on images with no preprocessing and no data augmentation, reaching 98.17% accuracy, which outperforms the results presented in other works in the literature.
Published: 2017

31. Online Detection of Outliers in Clusters of Continuous Data Streaming

Author: Elaine Ribeiro de Faria Paiva, Mariana Alves Pereira, and Murilo Coelho Naldi
Subjects: Data stream, Similarity (geometry), Computer science, 02 engineering and technology, Electronic mail, Data stream clustering, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Anomaly detection, Algorithm design, Data mining, Cluster analysis, Auxiliary memory
Abstract: This article addresses the treatment and detection of outliers in the online phase of data stream clustering algorithms. The main contribution is the use of an auxiliary memory to store new stream objects that have not been inserted into any micro-cluster of the clustering model because they are not sufficiently similar to any of them. Periodically, the auxiliary memory is checked: its objects are clustered together, and the micro-clusters formed by inliers are validated and inserted into the model. All remaining objects that have not been validated are kept in the auxiliary memory until they become valid or obsolete, and obsolete objects are then removed. This paper also proposes CluStreamOD, an improvement of the CluStream clustering algorithm that deals with outliers using the proposed approach. The experiments show the effectiveness of CluStreamOD, compared with CluStream, in detecting and dealing online with outliers in the stream, as well as the potential of the proposed approach for use in other micro-cluster-based data stream algorithms.
Published: 2017
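
The auxiliary-memory idea described above can be sketched in a deliberately simplified form. This is not CluStreamOD or CluStream: here a micro-cluster is reduced to a center and a fixed radius, buffered objects are grouped greedily around seed objects, and a group is "validated" simply when it reaches a minimum size. The class and parameter names (`radius`, `min_size`, `max_age`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    center: list
    radius: float
    n: int = 1


class AuxiliaryMemoryModel:
    """Hypothetical simplified model: objects that fit no micro-cluster
    wait in an auxiliary buffer until validated or obsolete."""

    def __init__(self, radius, min_size, max_age):
        self.radius = radius        # max distance for absorbing an object
        self.min_size = min_size    # objects needed to validate a new micro-cluster
        self.max_age = max_age      # buffered objects older than this are obsolete
        self.clusters = []
        self.buffer = []            # (timestamp, point) pairs

    def insert(self, point, t):
        """Absorb point into the first micro-cluster that fits; otherwise buffer it."""
        for mc in self.clusters:
            if math.dist(point, mc.center) <= mc.radius:
                mc.n += 1
                # incremental mean update of the center
                mc.center = [c + (x - c) / mc.n for c, x in zip(mc.center, point)]
                return True
        self.buffer.append((t, point))
        return False

    def check_buffer(self, now):
        """Periodic check of the auxiliary memory."""
        # 1) remove obsolete objects
        remaining = [(t, p) for t, p in self.buffer if now - t <= self.max_age]
        kept = []
        # 2) cluster the remaining objects greedily around seeds
        while remaining:
            seed_t, seed = remaining.pop(0)
            group = [(seed_t, seed)] + [(t, p) for t, p in remaining if math.dist(p, seed) <= self.radius]
            remaining = [(t, p) for t, p in remaining if math.dist(p, seed) > self.radius]
            if len(group) >= self.min_size:
                # 3) validated inliers become a new micro-cluster in the model
                dim = len(seed)
                center = [sum(p[d] for _, p in group) / len(group) for d in range(dim)]
                self.clusters.append(MicroCluster(center, self.radius, len(group)))
            else:
                kept.extend(group)  # not yet valid: stays in the buffer
        self.buffer = kept
```

In this sketch a single far-away object stays buffered until it either gathers enough neighbours or exceeds `max_age`, which mirrors the "until they become valid or obsolete" rule in the abstract.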

32. Evolutionary k-means for distributed data sets

Author: Ricardo J. G. B. Campello and Murilo Coelho Naldi
Subjects: Fuzzy clustering, Theoretical computer science, Computer science, Cognitive Neuroscience, Constrained clustering, k-means clustering, Evolutionary algorithm, Computer Science Applications, Data set, Artificial Intelligence, Distributed algorithm, Scalability, Data mining, Cluster analysis, Statistical hypothesis testing
Abstract: One of the challenges for clustering lies in dealing with data distributed across separate repositories, because most clustering techniques require the data to be centralized. One of these techniques, k-means, has been elected one of the most influential data mining algorithms for being simple, scalable, and easily modifiable to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome the limitations of k-means and, at the same time, to deal with distributed data. Two different distribution approaches are adopted: the first obtains a final model identical to that of the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses, and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The results indicate which variant is most adequate for each application scenario.
Published: 2014

33. Comparison of Scalable Approaches for Processing Textual Data Sets (Comparação entre abordagens escaláveis para o processamento de conjuntos de dados textuais)

Author: Gustavo de Paula Avelar, Murilo Coelho Naldi, Universidade Federal de Viçosa - Campus Rio Paranaíba, Instituto de Ciências Exatas e Tecnológicas, CNPq, FAPEMIG, and FUNARBIC/FUNARBE
Subjects: General Computer Science
Abstract: Data Analytics is a concept aimed at analyzing large amounts of data in search of relevant patterns and information. Handling these data is complex and requires automatic methods capable of processing large volumes of data, which demands computational power to obtain information in a timely manner. The MapReduce programming model emerged to help distribute these problems across several machines, improving processing efficiency. The Apache Hadoop and Spark platforms make it possible to use this paradigm on commodity hardware. Data clustering aims to determine a finite set of categories that describe a data set according to the similar characteristics of its objects. Different preprocessing strategies influence the results of the clustering step. Thus, this work studies different preprocessing methods for text documents, aiming to achieve representations that yield good results in the clustering step. We also propose an attribute selection approach based on the Latent Dirichlet Allocation (LDA) algorithm.
Published: 2017

34. Cluster ensemble selection based on relative validity indexes

Author: Ricardo J. G. B. Campello, Murilo Coelho Naldi, and André C. P. L. F. de Carvalho
Subjects: Computer Networks and Communications, media_common.quotation_subject, computer.software_genre, ARTIFICIAL INTELLIGENCE, Computer Science Applications, Set (abstract data type), Data quality, Cluster (physics), Quality (business), Data mining, Cluster analysis, computer, Selection (genetic algorithm), Information Systems, Statistical hypothesis testing, Mathematics, Relative validity, media_common
Abstract: Cluster ensembles aim to produce high-quality data partitions by combining a set of different partitions of the same data. Diversity and quality are claimed to be critical for selecting the partitions to be combined. To enhance these characteristics, methods can be applied to evaluate and select a subset of the partitions that yields ensemble results similar to or better than those based on the full set of partitions. Previous studies have shown that this selection can significantly improve the quality of the final partitions. To this end, the candidate partitions must be evaluated appropriately. In this work, several methods to evaluate and select partitions are investigated, most of them based on relative clustering validity indexes. These indexes select the partitions with the highest quality to participate in the ensemble. However, each relative index may be more suitable for particular data conformations. Thus, distinct relative indexes are combined to create a final evaluation that tends to be robust to changes in the application scenario, as the majority of the combined indexes may compensate for the poor performance of some individual indexes. We also investigate the impact of the diversity among the partitions used in the ensemble. A comparative evaluation of results obtained from an extensive collection of experiments involving state-of-the-art methods and statistical tests is presented. Based on these results, a practical design approach is proposed to support cluster ensemble selection. This approach was successfully applied to real public-domain data sets.
Published: 2012

35. Efficiency issues of evolutionary k-means

Author: André C. P. L. F. de Carvalho, Murilo Coelho Naldi, Eduardo R. Hruschka, and Ricardo J. G. B. Campello
Subjects: Simple (abstract algebra), Computer science, Scalability, k-means clustering, Evolutionary algorithm, Initialization, Data mining, Cluster analysis, computer.software_genre, ARTIFICIAL INTELLIGENCE, computer, Software, Evolutionary programming
Abstract: One of the top ten most influential data mining algorithms, k-means, is known for being simple and scalable. However, it is sensitive to the initialization of prototypes and requires the number of clusters to be specified in advance. This paper shows that evolutionary techniques conceived to guide the application of k-means can be more computationally efficient than systematic (i.e., repetitive) approaches that try to get around these drawbacks by repeatedly running the algorithm from different configurations for the number of clusters and the initial positions of the prototypes. To do so, a modified version of a (k-means-based) fast evolutionary algorithm for clustering is employed. Theoretical complexity analyses of the systematic and evolutionary algorithms under consideration are provided. Computational experiments and statistical analyses of the results are presented for artificial and text mining data sets.
Published: 2011

36. A Review of Clustering Ensembles (Uma Revisão Sobre Combinação de Agrupamentos)

Author: Murilo Coelho Naldi, Katti Faceli, André C. P. L. F. de Carvalho, FAPESP, and CNPq
Subjects: Computer Science, Clustering Ensembles, General Computer Science
Abstract: Several clustering algorithms have been proposed in the literature. Using different clustering algorithms, or even a single algorithm, can yield different results when they are applied to the same data set. Combining the results obtained from one classification technique, or from distinct techniques, has been used successfully to improve the stability or performance of these techniques. For this reason, interest in combining data clusterings has grown steadily in recent years. This work reviews the main clustering combination methods found in the literature. The review begins with a description of the combination problem and an analysis of the objectives commonly adopted by combination methods. It then discusses the need for diversity among the clusterings to be combined and methods for measuring it. A criterion for measuring the mutual information between clusterings is also defined, and examples of its use are presented. The performance of the methods has been compared by several authors in the literature, and this work analyzes those comparisons.
Published: 2010

37. Combination Techniques for Centralized and Distributed Data Clustering (Técnicas de combinação para agrupamento centralizado e distribuído de dados)

Author: Murilo Coelho Naldi, Ricardo José Gabrielli Barreto Campello, Francisco de Assis Tenório de Carvalho, Maria do Carmo Nicoletti, Solange Oliveira Rezende, and Fernando José von Zuben
Abstract: The large amount of data generated in different areas of knowledge creates the need to develop increasingly efficient and effective data mining techniques. Clustering techniques have been successfully applied in several areas, especially when there is no prior knowledge about the organization of the data. Nevertheless, the use of different clustering algorithms, or variations of the same algorithm, can generate a wide variety of results, which raises the need for methods to assess and select good results. One way to evaluate these results consists of using cluster validation indexes. However, a wide variety of validation indexes has been proposed in the literature, which can make choosing a single index challenging if the performance of the compared indexes is unknown for the class of problems of interest. In order to obtain a consensus among different results, a set of clusterings or validation indexes can be combined into a single final solution. Clustering ensembles have been successful in obtaining solutions that are robust to variations in the application scenario, which makes them an attractive alternative for finding solutions of reasonable quality according to different validation indexes. Moreover, using a combination of validation indexes can make the evaluation of clusterings more complete, as the majority of the combined indexes can compensate for the poor performance of the others. In some cases, it is not possible to work with a single centralized data set, for physical reasons or privacy concerns, which creates the need to distribute the mining process. Clustering ensembles can be extended to distributed data clustering problems, since information about the data from different sources can be combined into a single global solution. The main objective of this research is to investigate combination techniques for clusterings and validation indexes, applied to clustering ensemble selection and distributed data mining. Additionally, evolutionary clustering algorithms are studied in order to select quality solutions among the obtained results. The developed techniques are scalable and have reduced computational complexity, which allows their use on large data sets or in scenarios where the data are distributed.
Published: 2015

38. Performance Comparison of Virtualized Distributed Environments for Data Mining (Comparação de Desempenho entre Ambientes Distribuídos Virtualizados na Mineração de Dados)

Author: Joelson Antonio dos Santos and Murilo Coelho Naldi
Abstract: Today, large amounts of data are a challenge and create the need to distribute and manage large data sets in separate repositories. New distributed systems have been developed to scale from a single server to hundreds of machines. Systems such as Apache Hadoop and Apache Mahout are flexible and reliable, and they support Data Mining techniques. Combined with these systems, Virtualization is an important mechanism for building stable and economical systems capable of analyzing large amounts of data. Several consolidated Virtualization products are currently on the market, such as VMware, VirtualBox, and Xen, among others. However, one must choose which Virtualization software most efficiently meets the needs of real or simulated application scenarios. Performance evaluation techniques are important for assessing the advantages and disadvantages of each Virtualization software more precisely. The main objective of this work is to develop virtual, distributed environments on the VirtualBox, VMware Player, and Xen virtualizers that can support the Apache Hadoop and Apache Mahout platforms. The performance of each environment is compared using computational performance evaluation techniques, in order to identify the advantages of using Virtualization in Data Mining tasks.
Published: 2015

39. Computational System for the Automatic Acquisition and Publication of Meteorological Data (Sistema computacional para aquisição automática e disponibilização de dados meteorológicos)

Author: Luís César Dias Drumond, Marques Moreira de Sousa, and Murilo Coelho Naldi
Subjects: scripts technology, Agriculture (General), scripts, meteorology, System automation, Agricultural and Biological Sciences (miscellaneous), S1-972
Abstract: This paper presents a methodology to automate the collection and publication of weather data. The approach uses a scripting language to define the actions to be taken, from collecting the data at the weather station, through processing them, to sending the processed data for publication on a website. Once published, meteorological data such as temperature, air humidity, wind speed, rainfall, solar radiation, and evapotranspiration help producers and researchers analyze weather variations, supporting more effective decision-making. The final outcome is the free provision of important, high-quality information on climate variables.
Published: 2015

40. Fuzzy Clustering Algorithms and Validity Indices for Distributed Data

Author: Ricardo J. G. B. Campello, Murilo Coelho Naldi, and Lucas Vendramin
Subjects: Fuzzy clustering, Data stream clustering, Distributed algorithm, CURE data clustering algorithm, Correlation clustering, Constrained clustering, FLAME clustering, Data mining, Cluster analysis, computer.software_genre, Algorithm, computer, Mathematics
Abstract: This chapter presents a unified framework that generalizes a number of fuzzy clustering algorithms to handle distributed data exactly, i.e., with no approximation of results with respect to their original centralized versions. The same framework allows the exact distribution of relative validity indices used to evaluate the quality of fuzzy clustering solutions. Complexity analyses for each distributed algorithm and index are reported in terms of space, time, and communication. A general procedure to estimate the number of clusters in a non-centralized fashion using the proposed framework is also described. This procedure is directly applicable not only to distributed data but also to parallel data processing scenarios. Experimental results illustrate the speedup obtained when running algorithms under the proposed framework on multiple cores of a processor, compared to their traditional, centralized counterparts running on a single core. Additionally, the quality of the results and the amount of data transmitted are assessed and compared among different fuzzy clustering algorithms.
Published: 2014

41. Multiple Parallel MapReduce k-Means Clustering with Validation and Selection

Author: Murilo Coelho Naldi and Kemilly Dearo Garcia
Subjects: Clustering high-dimensional data, Fuzzy clustering, Data stream clustering, Computer science, CURE data clustering algorithm, Correlation clustering, k-means clustering, Canopy clustering algorithm, Data mining, Cluster analysis, computer.software_genre, computer
Abstract: Dealing with large amounts of data is one of the challenges for clustering, as it creates the need to distribute and manage huge data sets in separate repositories. New distributed systems have been designed to scale up from a single server to thousands of machines. The MapReduce framework allows a job to be divided and its results to be combined seamlessly. k-means is one of the few clustering algorithms that satisfy the MapReduce constraints, but it requires the number of clusters to be specified in advance and is sensitive to their initialization. In this work, we propose a MapReduce clustering algorithm that executes multiple parallel runs of k-means with different initializations and numbers of clusters. Additionally, a MapReduce version of a relative cluster validity index is implemented and used to find the best result. The proposed algorithm is experimentally compared with the Apache Mahout Project's MapReduce implementation of k-means. Statistical tests applied to the results indicate that the proposed algorithm can outperform Mahout's implementation when multiple k-means partitions are required.
Published: 2014

42. Combining Information from Distributed Evolutionary k-Means

Author: Ricardo J. G. B. Campello and Murilo Coelho Naldi
Subjects: Determining the number of clusters in a data set, Theoretical computer science, Data stream clustering, CURE data clustering algorithm, Computer science, Correlation clustering, Single-linkage clustering, Canopy clustering algorithm, Constrained clustering, Data mining, computer.software_genre, Cluster analysis, computer
Abstract: One of the challenges for clustering resides in dealing with huge amounts of data, which creates the need to distribute large data sets across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been elected one of the most influential data mining algorithms. Although exact distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown. We propose a collection of algorithms that generate and select k-means clusterings for each distributed subset of the data and combine them afterwards. The variants of the algorithm are compared from two perspectives: the theoretical one, through asymptotic complexity analyses, and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests.
Published: 2012

43. Comparison Among Methods for k Estimation in k-means

Author: André Fontana, Ricardo J. G. B. Campello, and Murilo Coelho Naldi
Subjects: Estimation, Computer science, media_common.quotation_subject, k-means clustering, computer.software_genre, Partition (database), Evolutionary computation, Position (vector), Simplicity, Data mining, Cluster analysis, Heuristics, computer, media_common
Abstract: One of the most influential algorithms in data mining, k-means, is broadly used in practical tasks for its simplicity, computational efficiency, and effectiveness in high-dimensional problems. However, k-means has two major drawbacks: the need to choose the number of clusters, k, and its sensitivity to the initial positions of the prototypes. In this work, systematic, evolutionary, and order heuristics used to overcome these drawbacks are compared. Twenty-seven variants of four algorithmic approaches are used to partition 324 synthetic data sets, and the obtained results are compared.
Published: 2009

44. Evolutionary Fuzzy Clustering: An Overview and Efficiency Issues

Author: Ricardo J. G. B. Campello, D. Horta, A.C.P.L.F. de Carvalho, Eduardo R. Hruschka, and Murilo Coelho Naldi
Subjects: Fuzzy clustering, business.industry, Correlation clustering, Constrained clustering, Machine learning, computer.software_genre, Determining the number of clusters in a data set, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, CURE data clustering algorithm, Canopy clustering algorithm, Artificial intelligence, Data mining, business, Cluster analysis, computer, Mathematics
Abstract: Clustering algorithms have been successfully applied to several data analysis problems in a wide range of domains, such as image processing, bioinformatics, crude oil analysis, market segmentation, document categorization, and web mining. The need to organize data into categories of similar objects has made the task of clustering very important in these domains. In this context, there has been increasing interest in the study of evolutionary algorithms for clustering, especially algorithms capable of finding blurred clusters that are not clearly separated from each other. In particular, a number of evolutionary algorithms for fuzzy clustering have been addressed in the literature. This chapter has two main contributions. First, it presents an overview of evolutionary algorithms designed for fuzzy clustering. Second, it describes a fuzzy version of an evolutionary algorithm for clustering, which has been shown to be more computationally efficient than systematic (i.e., repetitive) approaches when the number of clusters in a data set is unknown. Illustrative experiments showing the influence of local optimization on the efficiency of the evolutionary search are also presented. These experiments reveal interesting aspects of the effect of an important parameter found in many evolutionary algorithms for clustering, namely, the number of iterations of a given local search procedure to be performed at each generation.
Published: 2009

45. Genetic Clustering for Data Mining

Author: André C. P. L. F. de Carvalho, Murilo Coelho Naldi, Eduardo R. Hruschka, and Ricardo José Gabrielli Barreto Campello
Subjects: business.industry, Computer science, Image processing, Genetic operator, Machine learning, computer.software_genre, Range (mathematics), Chromosome (genetic algorithm), Genetic algorithm, Artificial intelligence, Data mining, business, Representation (mathematics), Cluster analysis, computer
Abstract: Genetic Algorithms (GAs) have been successfully applied to several complex data analysis problems in a wide range of domains, such as image processing, bioinformatics, and crude oil analysis. The need for organizing data into categories of similar objects has made the task of clustering increasingly important to those domains. In this chapter, the authors present a survey of the use of GAs for clustering applications. A variety of encoding (chromosome representation) approaches, fitness functions, and genetic operators are described, all of them customized to solve problems in such an application context.
Published: 2008

46. Hybrid clustering techniques with genetic algorithms

Author: Murilo Coelho Naldi, André Carlos Ponce de Leon Ferreira de Carvalho, Zhao Liang, and Maria do Carmo Nicoletti
Subjects: Data set, Computer science, Process (engineering), Data mining, Cluster analysis, computer.software_genre, computer, Free parameter
Abstract: Clustering techniques have obtained good results in several data analysis problems, such as gene expression data analysis. However, the same clustering technique applied to the same data set can cluster the data in different ways, owing to different initial clusterings or to different values of its free parameters. Thus, obtaining a good clustering can be seen as an optimization process, which seeks to choose good initial clusterings and to find the best set of values for the free parameters. Because they are global search methods, Genetic Algorithms can be used in this optimization process. The goal of this research project is to investigate the use of clustering techniques together with Genetic Algorithms to improve the quality of the clusters found by clustering algorithms, mainly k-means. This investigation was carried out using, as its application, the analysis of gene expression data, a Bioinformatics problem. This master's dissertation presents a literature review of the topics covered in the project, a description of the methodology followed, its development, and an analysis of the results obtained.
Published: 2006

47. Intelligent Systems - 12th Brazilian Conference, BRACIS 2023, Belo Horizonte, Brazil, September 25-29, 2023, Proceedings, Part I

Author: Murilo Coelho Naldi and Reinaldo A. C. Bianchi
Published: 2023
Full Text: View/download PDF

48. Neural Networks as an Optimization Method for Regression (Redes neurais como método de otimização para regressão)

Author: Victor Azevedo Coscrato, Rafael Izbicki, Murilo Coelho Naldi, and Marcos Oliveira Prates
Subjects: Artificial neural network, Computer science, business.industry, Local regression, Artificial intelligence, Machine learning, computer.software_genre, business, computer, Regression
Abstract: Neural networks are a tool for solving prediction problems that has gained much prominence recently. In general, neural networks are used as a predictive method, that is, they are used to estimate a regression function. Instead, this work presents the use of neural networks as an optimization tool to combine existing regression estimators in order to obtain more accurate predictions and to fit local linear models more efficiently. Several tests were conducted showing that these methods are more efficient than the usual ones.
Published: 2019
