39 results for "data filtration"
Search Results
2. On Quality Analysis of Filtration Methods for Bathymetric Data in Harbour Areas through QPS Qimera Software.
- Author
-
Kazimierski, Witold and Jaremba, Małgorzata
- Subjects
-
DATA quality, HARBORS, EVALUATION methodology, COMPUTER software, INTERPOLATION
- Abstract
This paper presents an assessment of the quality of selected filtration methods for the postprocessing of multibeam echosounder data. In this regard, the methodology used in the quality assessment of these data is an important factor. One of the most important final products derived from bathymetric data is the digital bottom model (DBM). Therefore, quality assessment is often based on factors related to it. In this paper, we propose some quantitative and qualitative factors to perform these assessments, and we analyze a few selected filtration methods as examples. This research makes use of real data gathered in real environments, preprocessed with typical hydrographic flow. The methods presented in this paper may be used in empirical solutions, and the filtration analysis may be useful for hydrographers choosing a filtration method for DBM interpolation. The results showed that both data-oriented and surface-oriented methods can be used in data filtration and that various evaluation methods show different perspectives on data filtration quality assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
3. Data analysis of complex production data using innovative methods
- Author
-
Pelcastre, Leonardo Matias
- Abstract
SinterCast is an international company specializing in providing tools for manufacturing and analyzing the production of Compacted Graphite Iron (CGI), a complex type of iron often used in engine blocks by automotive companies such as Audi, Ford, and Scania. The company is committed to providing its customers with all the tools required to improve their CGI manufacturing process, so it supplies its customers with monthly improvements and a production data report. The improvements and reports are generated manually for all customers using Excel. The report generation alone is an extremely time-consuming, complex, and manual process that may contain hidden and difficult-to-rectify faults. The thesis focuses on providing SinterCast with a report generation pipeline that automatically generates a customer report from data in the customer database. The result of the thesis project provides SinterCast with a more correct, more flexible, and faster implementation of the report generation procedure. Specifically, the proposed pipeline reduces report generation time by 95.24%, saving approximately 78 hours of a metallurgist's time each year compared to the current company pipeline. The solution also allows any employee to generate a report through a user-friendly graphical interface. Finally, the solution provides a command-line interface for automating the report generation procedure, allowing for overnight report generation.
- Published
- 2024
4. Method for Enhanced Accuracy in Machining Free-Form Surfaces on CNC Milling Machines.
- Author
-
Werner, Andrzej
- Subjects
MILLING machinery, MANUFACTURING processes, INFORMATION retrieval, DATA analysis, ACCURACY
- Abstract
The present article describes a method for enhanced accuracy in machining free-form surfaces produced on CNC milling machines. In this method, surface patch machining programs are generated based on their nominal CAD model. After the pretreatment, coordinate control measurements are carried out. The obtained results of the measurements contain information on the values and distribution of observed machining deviations. These data, after appropriate processing, are used to build a corrected CAD model of the surface produced. This model, made using reverse engineering techniques, compensates for the observed machining deviations. After regeneration of machining programs, the object processing and control measurements are repeated. As a result of the conducted procedure, the accuracy of the manufacture of the surface object is increased. This article also proposes the introduction of a simple procedure for the filtration of measurement data. Its purpose is to minimise the effect of random phenomena on the final machining error correction. The final part of the article presents the effects of the proposed method of increasing the accuracy of manufacturing on 'raw' and filtered measurement data. In both cases, a significant improvement in the accuracy of the machining process was achieved, with better final results obtained from the filtered measurement data. The method proposed in the article has been verified for three-axis machining with a ball-end cutter. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
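The deviation-compensation idea described in the record above can be sketched in a few lines: measured machining deviations are first smoothed to damp random measurement effects, then mirrored about the nominal coordinates when the machining program is regenerated. The one-dimensional simplification, window size, and function names below are illustrative assumptions, not the paper's procedure.

```python
def smooth_deviations(deviations, window=3):
    """Average each measured deviation with its neighbours to damp
    random measurement noise before building the corrected model.
    Window size is an illustrative choice."""
    half = window // 2
    return [sum(deviations[max(0, i - half): i + half + 1]) /
            len(deviations[max(0, i - half): i + half + 1])
            for i in range(len(deviations))]

def corrected_targets(nominal, deviations):
    """Mirror the smoothed deviations about the nominal coordinates:
    a point machined 0.1 mm too high gets a target 0.1 mm lower."""
    return [n - d for n, d in zip(nominal, smooth_deviations(deviations))]

targets = corrected_targets([10.0, 10.0, 10.0], [0.0, 0.3, 0.0])
```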
5. On Quality Analysis of Filtration Methods for Bathymetric Data in Harbour Areas through QPS Qimera Software
- Author
-
Witold Kazimierski and Małgorzata Jaremba
- Subjects
bathymetry, multibeam echosounder, digital bottom model, data filtration, hydrography, surface modeling
- Abstract
This paper presents an assessment of the quality of selected filtration methods for the postprocessing of multibeam echosounder data. In this regard, the methodology used in the quality assessment of these data is an important factor. One of the most important final products derived from bathymetric data is the digital bottom model (DBM). Therefore, quality assessment is often based on factors related to it. In this paper, we propose some quantitative and qualitative factors to perform these assessments, and we analyze a few selected filtration methods as examples. This research makes use of real data gathered in real environments, preprocessed with typical hydrographic flow. The methods presented in this paper may be used in empirical solutions, and the filtration analysis may be useful for hydrographers choosing a filtration method for DBM interpolation. The results showed that both data-oriented and surface-oriented methods can be used in data filtration and that various evaluation methods show different perspectives on data filtration quality assessment.
- Published
- 2023
- Full Text
- View/download PDF
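As a rough illustration of the surface-oriented filtration discussed in the record above, the sketch below rejects soundings that deviate too far from a local reference surface, here the per-cell median depth. The grid cell size, threshold, and function names are assumptions for illustration, not the specific methods evaluated in Qimera.

```python
import statistics

def surface_oriented_filter(soundings, cell=1.0, threshold=0.5):
    """Reject soundings deviating from the local median depth.

    soundings: (x, y, depth) tuples; cell: grid size; threshold:
    maximum allowed deviation from the cell median. All defaults
    are illustrative."""
    cells = {}
    for x, y, z in soundings:
        cells.setdefault((int(x // cell), int(y // cell)), []).append(z)
    medians = {k: statistics.median(v) for k, v in cells.items()}
    return [
        (x, y, z) for x, y, z in soundings
        if abs(z - medians[(int(x // cell), int(y // cell))]) <= threshold
    ]

pts = [(0.1, 0.1, 10.0), (0.2, 0.3, 10.1), (0.4, 0.2, 25.0)]  # last is a spike
clean = surface_oriented_filter(pts)
```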
6. Tracking Dengue on Twitter Using Hybrid Filtration-Polarity and Apache Flume.
- Author
-
Ghani, Norjihan Binti Abdul, Hamid, Suraya, Ahmad, Muneer, Saadi, Younes, Jhanjhi, N. Z., Alzain, Mohammed A., and Masud, Mehedi
- Subjects
DENGUE, SOCIAL media, COMPUTER algorithms
- Abstract
The World Health Organization (WHO) terms dengue a serious illness that impacts almost half of the world's population and carries no specific treatment. Early and accurate detection of spread in affected regions can save precious lives. Despite the severity of the disease, few noticeable works can be found that involve sentiment analysis to mine accurate intuitions from social media text streams. However, the massive data explosion in recent years has led to difficulties in storing and processing large amounts of data, as reliable mechanisms to gather the data and suitable techniques to extract meaningful insights from it are required. This research study proposes a sentiment analysis polarity approach for collecting data and extracting relevant information about dengue via Apache Hadoop. The method consists of two main parts: the first collects data from social media using Apache Flume, while the second focuses on querying and extracting relevant information via the hybrid filtration-polarity algorithm using Apache Hive. To overcome the noisy and unstructured nature of the data, the process of extracting information is characterized by pre- and post-filtration phases. As a result, only with the integration of Flume and Hive with filtration and polarity analysis can a reliable sentiment analysis technique be offered to collect and process large-scale data from the social network. We introduce how the Apache Hadoop ecosystem - Flume and Hive - can provide a sentiment analysis capability by storing and processing large amounts of data. An important finding of this paper is that developing efficient sentiment analysis applications for detecting diseases can be made more reliable through the use of the Hadoop ecosystem components than through the use of normal machines. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. GPS Data Filtration Method for Drive Cycle Analysis Applications
- Author
-
Earleywine, Matthew
- Published
- 2013
- Full Text
- View/download PDF
8. Detection and Prevention of DDoS Attacks in Wireless Sensor Networks
- Author
-
Dhuria, Shivam, Sachdeva, Monika, Xhafa, Fatos, Series editor, Perez, Gregorio Martinez, editor, Mishra, Krishn K., editor, Tiwari, Shailesh, editor, and Trivedi, Munesh C., editor
- Published
- 2018
- Full Text
- View/download PDF
9. Mixing Textual Data Selection Methods for Improved In-Domain Data Adaptation
- Author
-
Wołk, Krzysztof, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Rocha, Álvaro, editor, Adeli, Hojjat, editor, Reis, Luís Paulo, editor, and Costanzo, Sandra, editor
- Published
- 2018
- Full Text
- View/download PDF
10. Augmenting SMT with Semantically-Generated Virtual-Parallel Corpora from Monolingual Texts
- Author
-
Wołk, Krzysztof, Wołk, Agnieszka, Kacprzyk, Janusz, Series editor, Rocha, Álvaro, editor, Adeli, Hojjat, editor, Reis, Luís Paulo, editor, and Costanzo, Sandra, editor
- Published
- 2018
- Full Text
- View/download PDF
11. Making Deep Neural Networks Robust to Label Noise: Cross-Training With a Novel Loss Function
- Author
-
Zhen Qin, Zhengwen Zhang, Yan Li, and Jun Guo
- Subjects
Deep neural networks, label noise, cross-training, loss function, data filtration
- Abstract
Deep neural networks (DNNs) have achieved astonishing results on a variety of supervised learning tasks owing to a large scale of well-labeled training data. However, as recent research has pointed out, the generalization performance of DNNs is likely to sharply deteriorate when training data contains label noise. In order to address this problem, a novel loss function is proposed to guide DNNs to pay more attention to clean samples via adaptively weighting the traditional cross-entropy loss. Under the guidance of this loss function, a cross-training strategy is designed by leveraging two synergic DNN models, each of which plays the role of both updating its own parameters and generating curriculums for the other. In addition, this paper further proposes an online data filtration mechanism and integrates it into the final cross-training framework, which simultaneously optimizes DNN models and filters out noisy samples. The proposed approach is evaluated through a large number of experiments on several benchmark datasets with man-made or real-world label noise, and the results demonstrate its robustness to different noise types and noise scales.
- Published
- 2019
- Full Text
- View/download PDF
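A minimal sketch of the adaptive weighting idea in the record above: the cross-entropy of each sample is weighted by the model's own confidence in the given label, so likely-noisy samples contribute less to the loss. The specific weighting `p**gamma` is an illustrative choice, not the paper's exact loss function.

```python
import math

def weighted_cross_entropy(probs, labels, gamma=2.0):
    """Down-weight likely-noisy samples by the predicted probability of
    their given label. probs: per-class probability lists; labels:
    integer class indices. gamma and the weighting scheme are
    illustrative assumptions."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = p[y] ** gamma          # high weight only if the model fits the label
        total += w * -math.log(max(p[y], 1e-12))
        weight_sum += w
    return total / max(weight_sum, 1e-12)

clean_batch = [[0.9, 0.1]]  # label 0 looks consistent with the model
noisy_batch = [[0.1, 0.9]]  # label 0 looks like label noise
```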
12. Formal Verification and Performance Analysis of a New Data Exchange Protocol for Connected Vehicles.
- Author
-
Chouali, Samir, Boukerche, Azzedine, Mostefaoui, Ahmed, and Merzoug, Mohammed Amine
- Subjects
-
DATA analysis, DATA reduction, ORDER picking systems, EXCHANGE, VEHICLES
- Abstract
In this article, we focus on the usage of MQTT (Message Queuing Telemetry Transport) within Connected Vehicles (CVs). Indeed, in the original version of the MQTT protocol, the broker is responsible “only” for sending received data to subscribers, abstracting the underlying mechanism of data exchange. However, within the CV context, subscribers (i.e., the processing infrastructure) may be overloaded with irrelevant data, in particular when the requirement is real or near-real-time processing. To overcome this issue, we propose MQTT-CV, a new variant of the MQTT protocol in which the broker is able to perform local processing in order to reduce the workload at the infrastructure, i.e., filtering data before sending them. In this article, we first formally validate the correctness of the MQTT-CV protocol (i.e., that the three components of the proposed protocol interact correctly) through the use of the Promela language and its system verification tool, the model checker SPIN. Secondly, using real-world data provided by our car manufacturer partner, we have conducted a real implementation and experiments. The obtained results show the effectiveness of our approach in terms of data workload reduction at the processing infrastructure. The mean improvement, though dependent on the target application, was in general about a tenfold reduction in comparison to the native MQTT protocol. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
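The broker-side filtering described in the record above can be sketched without any MQTT library: the broker applies a subscriber-relevance predicate before forwarding, so irrelevant messages never reach the processing infrastructure. The names, the message shape, and the speed threshold are hypothetical.

```python
def make_filtering_broker(is_relevant):
    """Return a publish handler that forwards only messages satisfying
    the relevance predicate, mimicking MQTT-CV's broker-side filtering.
    A plain-Python sketch; no real broker is involved."""
    forwarded = []
    def on_publish(message):
        # filter before forwarding, instead of pushing everything downstream
        if is_relevant(message):
            forwarded.append(message)
    return on_publish, forwarded

# hypothetical example: only forward high-speed telemetry
publish, delivered = make_filtering_broker(lambda m: m["speed_kmh"] > 90)
for msg in ({"speed_kmh": 50}, {"speed_kmh": 120}, {"speed_kmh": 80}):
    publish(msg)
```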
13. The Impact of Data Filtration on the Accuracy of Multiple Time-Domain Forecasting for Photovoltaic Power Plants Generation.
- Author
-
Eroshenko, Stanislav A., Khalyasmaa, Alexandra I., Snegirev, Denis A., Dubailova, Valeria V., Romanov, Alexey M., and Butusov, Denis N.
- Subjects
PHOTOVOLTAIC power systems, PHOTOVOLTAIC power generation, FORECASTING, WIND power plants, FILTERS & filtration, MULTIPLE comparisons (Statistics), SCALABILITY
- Abstract
The paper reports the forecasting model for multiple time-domain photovoltaic power plants, developed in response to the necessity of accurate and robust power generation forecasting on bad weather days. We provide a brief description of the piloted short-term forecasting system and place under close scrutiny the main sources of photovoltaic power plants' generation forecasting errors. The effectiveness of the empirical approach versus unsupervised learning was investigated in application to source data filtration in order to improve the power generation forecasting accuracy for unstable weather conditions. The k-nearest neighbors methodology was justified to be optimal for initial data filtration, based on the clusterization results associated with peculiar weather and seasonal conditions. The photovoltaic power plants' forecasting accuracy improvement was further investigated for a one-hour-ahead time domain. It was proved that operational forecasting could be implemented based on the results of short-term day-ahead forecast mismatch predictions, which form the basis for multiple time-domain integrated forecasting tools. After a comparison of multiple time series forecasting approaches, operational forecasting was realized based on the second-order autoregression function and applied to short-term forecasting errors, with a resulting accuracy of 87%. In the concluding part of the article, the authors propose the hardware system composition from the points of view of computational efficiency and scalability. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
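A generic sketch of k-nearest-neighbors data filtration of the kind the record above applies before forecasting: samples whose mean distance to their k nearest neighbors is unusually large are treated as outliers and dropped. The distance measure, k, and the quantile cutoff are illustrative assumptions, not the paper's clusterization-based configuration.

```python
def knn_outlier_filter(samples, k=3, quantile=0.9):
    """Drop samples whose mean distance to their k nearest neighbours
    exceeds the chosen quantile of those scores. Parameters are
    illustrative."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scores = []
    for i, s in enumerate(samples):
        d = sorted(dist(s, t) for j, t in enumerate(samples) if j != i)
        scores.append(sum(d[:k]) / k)
    cutoff = sorted(scores)[int(quantile * (len(scores) - 1))]
    return [s for s, sc in zip(samples, scores) if sc <= cutoff]

samples = [(0, 0), (0, 1), (1, 0), (1, 1), (100, 100)]  # last is an outlier
kept = knn_outlier_filter(samples)
```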
14. Stabilizing Sensor Data Collection for Control of Environment-Friendly Clean Technologies Using Internet of Things.
- Author
-
Bhadoria, Robin Singh and Bajpai, Dhananjai
- Subjects
INTERNET of things, ACQUISITION of data, DATA packeting, ENVIRONMENTAL monitoring, DATA collection platforms, INFORMATION sharing
- Abstract
The Internet of Things (IoT) is a network formed by smart devices whose core contains embedded technology to collect sensory information and exchange it with every device present within the network. One major challenge in IoT is stabilization of the sensory data collected by smart devices before it is fed into the cloud. Data collected by sensing devices suffer from random glitches, variation in volume, inter-packet delay, and data packet loss due to inline power fluctuations. The cleaning and filtration of such noisy characteristics must be done before storing this raw data in a global repository. This paper proposes a method for clean monitoring of captured environmental parameters and a few algorithms that help stabilize the captured sensory data on the basis of data packet volume, inter-packet delay, data type, and random amplitude levels. Such an approach supports the service-level implementation required by an enterprise for deploying error-free, stabilized data collection without ambiguity, in order to compute the accurate and sustainable results required for effective environmental parameter analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. SEMANTIC APPROACH FOR BUILDING GENERATED VIRTUAL-PARALLEL CORPORA FROM MONOLINGUAL TEXTS.
- Author
-
WOŁK, KRZYSZTOF, WOŁK, AGNIESZKA, and MARASEK, KRZYSZTOF
- Subjects
NATURAL language processing, CORPORA, NATURAL languages, STATISTICS
- Abstract
Several natural languages have undergone a great deal of processing, but the problem of limited textual linguistic resources remains. The manual creation of parallel corpora by humans is rather expensive and time consuming, while the language data required for statistical machine translation (SMT) do not exist in adequate quantities for their statistical information to be used to initiate the research process. On the other hand, applying known approaches to build parallel resources from multiple sources, such as comparable or quasi-comparable corpora, is very complicated and provides rather noisy output, which later needs to be further processed and requires in-domain adaptation. To optimize the performance of comparable corpora mining algorithms, it is essential to use a quality parallel corpus for training of a good data classifier. In this research, we have developed a methodology for generating an accurate parallel corpus (Czech-English, Polish-English) from monolingual resources by calculating the compatibility between the results of three machine translation systems. We have created translations of large, single-language resources by applying multiple translation systems and strictly measuring translation compatibility using rules based on the Levenshtein distance. The results produced by this approach were very favorable. The generated corpora successfully improved the quality of SMT systems and seem to be useful for many other natural language processing tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
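The compatibility check described in the record above can be illustrated with a small sketch: edit distances between the outputs of three translation systems are normalized by sentence length, and a sentence pair is accepted only when all systems agree closely. The threshold and the simple pairwise rule are assumptions; the paper's actual rules are more elaborate.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def compatible(t1, t2, t3, max_ratio=0.3):
    """Accept a sentence when all pairwise edit-distance ratios stay
    below max_ratio (an illustrative threshold)."""
    pairs = [(t1, t2), (t1, t3), (t2, t3)]
    return all(levenshtein(a, b) / max(len(a), len(b), 1) <= max_ratio
               for a, b in pairs)
```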
16. Lifetime improvement of wireless sensor network by information sensitive aggregation method for railway condition monitoring.
- Author
-
Tolani, Manoj, Sunny, and Singh, Rajat Kumar
- Subjects
WIRELESS sensor networks, WIRELESS communications
- Abstract
Lifetime maximization is an important issue while designing a wireless sensor network. One of the ways to increase the lifetime of a WSN is to reduce the energy demand of the sensor nodes and cluster head nodes, which can be decreased by filtering out redundant data traffic. A Two-Layer Hierarchal Aggregation (TLHA) protocol has been proposed, which filters out redundant data traffic. A simple aggregation algorithm classifies the sensed data based on their normalized standard deviation for efficient data filtration, and also performs k-Means based re-classification of the data to improve the efficiency of the algorithm. A two-layered redundant data filtration technique has been proposed to classify the data. The aggregation protocol reduces data traffic in the lower-layer hierarchy (sensor nodes to cluster head node) as well as the upper-layer hierarchy (cluster head nodes to base station node). In the former, the Energy-efficient Time Division Multiple Access (EA-TDMA) Medium Access Control (MAC) protocol is used, while in the latter the Bit-Map-Assisted (BMA) MAC protocol is used for the transmission of data. Simulation and experimental results show that TLHA saves an enormous amount of energy, which finally increases the lifetime of the sensor network. The performance of the proposed TLHA protocol has been compared with existing protocols, viz. Data Aggregation Window Function (DAWF) and Spatial-Temporal Correlation Algorithm (STCA). [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
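The "classify by normalized standard deviation" step in the record above can be sketched as a redundancy test: a window of sensor readings is worth transmitting only if its coefficient of variation shows the readings actually changed. The threshold and function name are illustrative, and TLHA's k-Means re-classification step is omitted.

```python
import statistics

def is_informative(window, threshold=0.05):
    """Transmit a window only if its coefficient of variation
    (std / mean) exceeds a threshold, i.e. it is not redundant.
    The threshold is an illustrative assumption."""
    mean = statistics.fmean(window)
    if mean == 0:
        return True  # cannot normalize; err on the side of transmitting
    return statistics.pstdev(window) / abs(mean) > threshold
```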
17. The Impact of Data Filtration on the Accuracy of Multiple Time-Domain Forecasting for Photovoltaic Power Plants Generation
- Author
-
Stanislav A. Eroshenko, Alexandra I. Khalyasmaa, Denis A. Snegirev, Valeria V. Dubailova, Alexey M. Romanov, and Denis N. Butusov
- Subjects
photovoltaic power plant, short-term forecasting, data processing, data filtration, k-nearest neighbors, regression
- Abstract
The paper reports the forecasting model for multiple time-domain photovoltaic power plants, developed in response to the necessity of accurate and robust power generation forecasting on bad weather days. We provide a brief description of the piloted short-term forecasting system and place under close scrutiny the main sources of photovoltaic power plants' generation forecasting errors. The effectiveness of the empirical approach versus unsupervised learning was investigated in application to source data filtration in order to improve the power generation forecasting accuracy for unstable weather conditions. The k-nearest neighbors methodology was justified to be optimal for initial data filtration, based on the clusterization results associated with peculiar weather and seasonal conditions. The photovoltaic power plants' forecasting accuracy improvement was further investigated for a one-hour-ahead time domain. It was proved that operational forecasting could be implemented based on the results of short-term day-ahead forecast mismatch predictions, which form the basis for multiple time-domain integrated forecasting tools. After a comparison of multiple time series forecasting approaches, operational forecasting was realized based on the second-order autoregression function and applied to short-term forecasting errors, with a resulting accuracy of 87%. In the concluding part of the article, the authors propose the hardware system composition from the points of view of computational efficiency and scalability.
- Published
- 2020
- Full Text
- View/download PDF
18. Study of the solar coronal hole rotation.
- Author
-
Oghrapishvili, N.B., Bagashvili, S.R., Maghradze, D.A., Gachechiladze, T.Z., Japaridze, D.R., Shergelashvili, B.M., Mdzinarishvili, T.G., and Chargeishvili, B.B.
- Subjects
-
CORONAL holes, SOLAR corona, ROTATIONAL motion, CENTROID, HELIOGRAPH
- Abstract
Rotation of coronal holes is studied using data from SDO/AIA for 2014 and 2015. A new approach to the treatment of data is applied. Instead of calculating average angular velocities of each coronal hole centroid and then grouping them in latitudinal bins to calculate average rotation rates for the corresponding latitudes, we compiled instant rotation rates of centroids and their corresponding heliographic coordinates in one matrix for further processing. Even unfiltered data showed the clear differential nature of the rotation of coronal holes. We studied possible reasons for distortion of data by the limb effects, to eliminate some discrepancies at high latitudes caused by the high degree of scattering of data in that region. A study of the longitudinal distribution of angular velocities revealed the optimal longitudinal interval for the best result. We examined different methods of data filtering and found that filtration targeting the local medians of the data with a constant threshold is a more acceptable approach that is not biased towards a predefined notion of an expected result. The results showed a differential pattern of rotation of coronal holes. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
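The filtering approach preferred in the record above, targeting local medians with a constant threshold, can be sketched directly: each point is kept only if it lies within a fixed distance of the median of its local window. Window size and threshold below are illustrative, not the study's values.

```python
import statistics

def local_median_filter(values, window=5, threshold=1.0):
    """Keep points within a constant threshold of the median of their
    local window; no model of the expected result is assumed.
    Window size and threshold are illustrative."""
    half = window // 2
    kept = []
    for i, v in enumerate(values):
        nbhd = values[max(0, i - half): i + half + 1]
        if abs(v - statistics.median(nbhd)) <= threshold:
            kept.append(v)
    return kept

readings = [10, 10.2, 9.8, 50, 10.1, 9.9, 10]  # 50 is a spurious point
filtered = local_median_filter(readings)
```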
19. Experiments with Consistency-Based Preprocessing of MMPI Data for Classification Tasks.
- Author
-
Pancerz, Krzysztof and Gomuła, Jerzy
- Subjects
MINNESOTA Multiphasic Personality Inventory, ELECTRONIC data processing, PSYCHOMETRICS, PATHOLOGICAL psychology, ROUGH sets
- Abstract
The paper is devoted to the problem of consistency-based preprocessing of MMPI data for classification tasks. MMPI (Minnesota Multiphasic Personality Inventory) is a standardized psychometric test of adult personality and psychopathology. The MMPI test delivers psychometric data in the form of so-called profiles consisting of values of thirteen descriptive attributes (corresponding to scales). The preprocessing procedure covers the filtration of cases using consistency factors calculated according to the original rough set based approach. The proposed filtration enables us to reduce the size of classifiers without a significant loss of their classification ability. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
20. Data Filtration Methods of Electronic Measurement of Log Dimensions
- Author
-
Veronika Hunková and Karel Janák
- Subjects
log yards, round wood, electronic reception, electronic measurement, data filtration
- Abstract
The article deals with the processing of log dimension data collected by electronic measurement during reception at sawmills. The subject of the work concerns the filtration of data before their use for calculation. Filtration methods were designed based on simple mathematical and statistical methods, and compared and evaluated with the use of a designed comparative methodology. As a result, five filtration methods were selected that best suit the reception requirements. At the same time, the impact of filtration on the measurement results is evaluated in relation to the calculation method of wood volume. Furthermore, the calculation by sections is recommended, as it is less affected by filtration errors.
- Published
- 2014
- Full Text
- View/download PDF
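The record above recommends volume calculation by sections because a single mis-filtered diameter then affects only one short section. A sketch of both formulas, the Huber mid-diameter formula and the section method built on it (units and function names are illustrative):

```python
import math

def volume_huber(mid_diameter, length):
    """Huber formula: volume from the cross-section at mid-length
    (diameter and length in metres, volume in cubic metres)."""
    return math.pi * (mid_diameter / 2) ** 2 * length

def volume_by_sections(diameters, section_length):
    """Section method: sum Huber volumes of short sections, so one
    mis-filtered diameter reading distorts only its own section."""
    return sum(volume_huber(d, section_length) for d in diameters)
```

For a perfect cylinder both methods agree; they diverge on tapered or locally distorted stems, which is where the section method's robustness matters.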
21. Analytic Correlation Filtration: A New Tool to Reduce Analytical Complexity of Metabolomic Datasets
- Author
-
Stephanie Monnerie, Melanie Petera, Bernard Lyan, Pierrette Gaudreau, Blandine Comte, and Estelle Pujos-Guillot
- Subjects
metabolomics, data filtration, high-resolution mass spectrometry
- Abstract
Metabolomics generates massive and complex data. Redundant analytical species and the high degree of correlation in datasets are a constraint on the use of data mining/statistical methods and on interpretation. In this context, we developed a new tool to detect analytical correlations in datasets without confounding them with biological correlations. Based on several parameters, such as a similarity measure, retention time, and mass information from known isotopes, adducts, or fragments, the algorithm groups features coming from the same analyte and proposes one single representative per group. To illustrate the functionalities and added value of this tool, it was applied to published datasets and compared to one of the most commonly used free packages proposing a grouping method for metabolomics data: 'CAMERA'. This tool was developed to be included in Galaxy and is available in Workflow4Metabolomics.
- Published
- 2019
- Full Text
- View/download PDF
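The grouping principle in the record above, clustering features that co-elute and whose intensity profiles correlate, then keeping one representative per group, can be sketched greedily. The retention-time tolerance, the correlation cutoff, and the "most intense wins" rule are illustrative assumptions, not the tool's or CAMERA's defaults.

```python
def group_features(features, rt_tol=0.02, min_corr=0.9):
    """features: (retention_time, intensity_profile) pairs. Keep the most
    intense feature per group of co-eluting, strongly correlated features.
    All parameters are illustrative."""
    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a) ** 0.5
        vb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (va * vb) if va and vb else 0.0

    representatives = []
    for rt, profile in sorted(features, key=lambda f: -max(f[1])):
        if any(abs(rt - r) <= rt_tol and pearson(profile, p) >= min_corr
               for r, p in representatives):
            continue  # analytical redundancy: same analyte, drop it
        representatives.append((rt, profile))
    return representatives

# hypothetical data: an analyte, its isotope (co-eluting, correlated), one other analyte
features = [(1.00, [100, 200, 300]), (1.01, [10, 20, 30]), (5.00, [300, 100, 200])]
reps = group_features(features)
```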
22. Estimation of a low pass filter for solar radiation data
- Author
-
Jacobsen, Judith L., Madsen, Henrik, Harremoës, Poul, Arkeryd, Leif, editor, Engl, Heinz, editor, Fasano, Antonio, editor, Mattheij, Robert M. M., editor, Neittaanmäki, Pekka, editor, Neunzert, Helmut, editor, Brøns, Morton, editor, Bendsøe, Martin Philip, editor, and Sørensen, Mads Peter, editor
- Published
- 1997
- Full Text
- View/download PDF
23. Methods of data filtration and their effects on the resulting image of a log at the electronic sensing of its dimensions
- Author
-
Veronika Hunková and Karel Janák
- Subjects
data filtration, electronic measurement, log dimensions, running average, median, log diameter
- Abstract
The data taken at the electronic reception of logs are affected by mistaken values and do not correspond to the real shape of a stem. The aim of data filtration is to remove the incorrect data and replace them with values closer to the real ones. The goal of the presented work was to analyse the effect of different filtration methods on original data and to recommend the methods whose results correspond as closely as possible to the real shape of the logs, including their surface defects. The methods of filtration are based on simple mathematical and statistical procedures, which are subsequently combined in various ways, because the simplest comparative methods did not fulfil expectations. More than 50 methods were combined and analysed. Approximately 15 of them were selected and tested. The methods were applied to data scanned on ca 150 logs randomly selected from thousands of files in the sawmill log yard. The filtration effects of the different methods were visually assessed. The three most applicable methods were selected and will be tested practically at a sawmill in the next step of the research; these three methods are described in the following paper. The results show that all proposed methods meet the requirements for log dimension determination. It is not possible to define the best method in general: the different properties of these methods make them suitable for calculating log dimensions for different types of use, i.e., mid-diameter determination, necessary for volume calculation using the Huber method; section diameter determination, for volume calculation using the section method; top diameter determination, necessary for sorting; and filtration of the drive dogs of the conveyor passing through the measuring equipment.
- Published
- 2010
- Full Text
- View/download PDF
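Since the record above aims to replace incorrect readings rather than simply drop them, a running-median replacement is a natural sketch: each scanned diameter is replaced by the median of its local window, so isolated spikes (for example a drive dog passing the sensor) are mapped back to values close to the true stem shape. The window size is an illustrative choice.

```python
import statistics

def replace_by_running_median(values, window=3):
    """Replace each scanned diameter by the median of its local window,
    removing isolated spikes while preserving the stem profile."""
    half = window // 2
    return [statistics.median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

# diameters in mm; 80 is a spike caused by a conveyor drive dog
profile = replace_by_running_median([30, 30, 80, 30, 30])
```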
24. Making Deep Neural Networks Robust to Label Noise: Cross-Training With a Novel Loss Function
- Author
-
Zhengwen Zhang, Zhen Qin, Yan Li, and Jun Guo
- Subjects
deep neural networks, label noise, cross-training, loss function, data filtration, robustness
- Abstract
Deep neural networks (DNNs) have achieved astonishing results on a variety of supervised learning tasks owing to a large scale of well-labeled training data. However, as recent research has pointed out, the generalization performance of DNNs is likely to sharply deteriorate when training data contains label noise. In order to address this problem, a novel loss function is proposed to guide DNNs to pay more attention to clean samples via adaptively weighting the traditional cross-entropy loss. Under the guidance of this loss function, a cross-training strategy is designed by leveraging two synergic DNN models, each of which plays the role of both updating its own parameters and generating curriculums for the other. In addition, this paper further proposes an online data filtration mechanism and integrates it into the final cross-training framework, which simultaneously optimizes DNN models and filters out noisy samples. The proposed approach is evaluated through a large number of experiments on several benchmark datasets with man-made or real-world label noise, and the results demonstrate its robustness to different noise types and noise scales.
- Published
- 2019
25. The impact of data filtration on the accuracy of multiple time-domain forecasting for photovoltaic power plants generation
- Author
-
Eroshenko, S. A., Khalyasmaa, A. I., Snegirev, D. A., Dubailova, V. V., Romanov, A. M., and Butusov, D. N.
- Abstract
The paper reports a forecasting model for multiple time-domain photovoltaic power plant generation, developed in response to the need for accurate and robust power generation forecasting on bad-weather days. We provide a brief description of the piloted short-term forecasting system and closely scrutinize the main sources of photovoltaic power plant generation forecasting errors. The effectiveness of the empirical approach versus unsupervised learning was investigated as applied to source data filtration, in order to improve power generation forecasting accuracy under unstable weather conditions. The k-nearest neighbors methodology was justified as optimal for initial data filtration, based on clusterization results associated with particular weather and seasonal conditions. The improvement in photovoltaic power plant forecasting accuracy was further investigated for a one-hour-ahead time domain. It was shown that operational forecasting could be implemented based on predictions of short-term day-ahead forecast mismatches, which form the basis for multiple time-domain integrated forecasting tools. After a comparison of several time series forecasting approaches, operational forecasting was realized using a second-order autoregression function applied to the short-term forecasting errors, with a resulting accuracy of 87%. In the concluding part of the article, the authors propose the hardware system composition from the standpoints of computational efficiency and scalability. © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
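The second-order autoregression on forecast errors can be sketched as follows (a minimal illustration of the general AR(2) idea, not the authors' implementation; the least-squares fit and the subtraction-style correction are assumptions):

```python
import numpy as np

def fit_ar2(errors):
    """Fit a second-order autoregression e_t = a1*e_(t-1) + a2*e_(t-2)
    to a series of day-ahead forecast errors by least squares."""
    y = errors[2:]
    X = np.column_stack([errors[1:-1], errors[:-2]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                   # [a1, a2]

def operational_forecast(day_ahead, errors, coef):
    """One-hour-ahead correction: predict the next forecast error from the
    last two observed errors and subtract it from the day-ahead value."""
    next_err = coef[0] * errors[-1] + coef[1] * errors[-2]
    return day_ahead - next_err
```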
- Published
- 2020
26. Sources, identification and removal of ChIP-seq artifacts
- Author
-
Shumilova, Aleksandra, Převorovský, Martin, and Fišer, Karel
- Subjects
kontrola kvality ,data filtration ,filtrování dat ,quality control ,chromatin imunoprecipitation ,chromatinová imunoprecipitace ,ChIP-seq ,next generation sequencing - Abstract
Chromatin immunoprecipitation is used to enrich DNA sequences that are associated with a protein of interest, and to map those sequences to genomic regions. Studying these DNA-protein binding regions provides an understanding of gene regulation and chromatin remodeling. However, some signals in fact represent no binding event and are known as false positives. This thesis discusses the main sources of false-positive signals that commonly arise during ChIP-seq analysis, and offers possible solutions for minimizing or filtering them. Keywords: ChIP-seq, chromatin immunoprecipitation, quality control, data filtration
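One widely used filtration step for such false positives (a generic illustration, not necessarily the thesis's own method) is dropping candidate peaks that overlap known artifact "blacklist" regions:

```python
def filter_blacklisted(peaks, blacklist):
    """Remove candidate peaks overlapping known artifact ('blacklist') regions.
    Intervals are (chrom, start, end) tuples with half-open coordinates;
    the data layout here is illustrative."""
    def overlaps(a, b):
        # same chromosome and the two half-open intervals intersect
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]
```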
- Published
- 2021
27. Data Filtration Methods of Electronic Measurement of Log Dimensions.
- Author
-
Hunkova, Veronika and Janak, Karel
- Abstract
Copyright of Wood Industry / Drvna Industrija is the property of Drvna Industrija and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2014
- Full Text
- View/download PDF
28. The Impact of Data Filtration on the Accuracy of Multiple Time-Domain Forecasting for Photovoltaic Power Plants Generation
- Author
-
Alexey Romanov, Stanislav Eroshenko, Denis A. Snegirev, Alexandra I. Khalyasmaa, Denis N. Butusov, and Valeria V. Dubailova
- Subjects
K-NEAREST NEIGHBORS ,Source data ,photovoltaic power plant ,Computer science ,020209 energy ,media_common.quotation_subject ,DATA FILTRATION ,02 engineering and technology ,lcsh:Technology ,Domain (software engineering) ,lcsh:Chemistry ,REGRESSION ,short-term forecasting ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,AUTOREGRESSION ,Function (engineering) ,lcsh:QH301-705.5 ,Instrumentation ,Physics::Atmospheric and Oceanic Physics ,media_common ,SHORT-TERM FORECASTING ,Fluid Flow and Transfer Processes ,Data processing ,lcsh:T ,autoregression ,Process Chemistry and Technology ,k-nearest neighbors ,General Engineering ,data filtration ,021001 nanoscience & nanotechnology ,DATA PROCESSING ,lcsh:QC1-999 ,Computer Science Applications ,Reliability engineering ,Electricity generation ,PHOTOVOLTAIC POWER PLANT ,lcsh:Biology (General) ,lcsh:QD1-999 ,Autoregressive model ,lcsh:TA1-2040 ,Scalability ,Unsupervised learning ,regression ,lcsh:Engineering (General). Civil engineering (General) ,0210 nano-technology ,lcsh:Physics ,data processing - Abstract
The paper reports a forecasting model for multiple time-domain photovoltaic power plant generation, developed in response to the need for accurate and robust power generation forecasting on bad-weather days. We provide a brief description of the piloted short-term forecasting system and closely scrutinize the main sources of photovoltaic power plant generation forecasting errors. The effectiveness of the empirical approach versus unsupervised learning was investigated as applied to source data filtration, in order to improve power generation forecasting accuracy under unstable weather conditions. The k-nearest neighbors methodology was justified as optimal for initial data filtration, based on clusterization results associated with particular weather and seasonal conditions. The improvement in photovoltaic power plant forecasting accuracy was further investigated for a one-hour-ahead time domain. It was shown that operational forecasting could be implemented based on predictions of short-term day-ahead forecast mismatches, which form the basis for multiple time-domain integrated forecasting tools. After a comparison of several time series forecasting approaches, operational forecasting was realized using a second-order autoregression function applied to the short-term forecasting errors, with a resulting accuracy of 87%. In the concluding part of the article, the authors propose the hardware system composition from the standpoints of computational efficiency and scalability.
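The k-nearest-neighbors filtration step can be sketched as follows (illustrative only: selecting, for a given forecast day, the k historical days with the most similar weather features; the feature layout and Euclidean distance are assumptions):

```python
import numpy as np

def knn_filter(train_X, train_y, query, k=5):
    """Initial data filtration: keep only the k training days whose weather
    features are nearest (Euclidean) to the forecast day's features, so the
    forecasting model is fitted on comparable conditions only."""
    dist = np.linalg.norm(train_X - query, axis=1)
    idx = np.argsort(dist)[:k]
    return train_X[idx], train_y[idx]
```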
- Published
- 2020
- Full Text
- View/download PDF
29. Description and validation of a circular padding method for linear roughness measurements of short data lengths
- Author
-
Jean-Pierre Kruth, Wim Dewulf, Han Haitjema, Stijn Schoeters, and Bart Boeckmans
- Subjects
AM, Additive manufacturing ,Computer science ,Clinical Biochemistry ,CP, Circularly padded ,Surface finish ,010501 environmental sciences ,01 natural sciences ,Padding ,NP, Non-padded ,Convolution ,Surface topography ,Data Filtration ,03 medical and health sciences ,symbols.namesake ,PBF, Powder bed fusion ,Profilometry ,lcsh:Science ,Dimensional Metrology ,030304 developmental biology ,0105 earth and related environmental sciences ,0303 health sciences ,Circular Padding ,Waviness ,Filter (signal processing) ,Method Article ,Roughness ,Gaussian filter ,Medical Laboratory Technology ,Kernel (image processing) ,SLM, Selective laser melting ,symbols ,lcsh:Q ,Profilometer ,Algorithm ,GR, Gaussian regression - Abstract
Surface topography measurements are vital in industrial quality control. Linear roughness measurements are among the most preferred methods, being quick to perform and easy to interpret. The ISO 16610 standard series prescribes filters that can be used for most cases, but has limitations for restricted measurement lengths. This is because the standard filter type is a Gaussian filter, which, like most kernel convolution filters, has no output near the edges of the profile, effectively shortening the filtered output profile compared to the input. In some cases, this leads to a lack of representative data after filtration. Especially in fields such as Additive Manufacturing (AM) this becomes a problem, due to the high “roughness to measurable data length” ratio that characterizes complex AM parts. This paper describes a method that overcomes this limitation:
• A method for circular padding of short measured tracks is described and validated.
• A flexible profile data post-processing tool was developed in MATLAB to grant users more control over the data analysis.
Results obtained from roughness profiles long enough for normal ISO procedures are shown not to change significantly when circularly padded. When only a shorter section of the data is available, where the standard protocol would no longer be able to compute a filtered profile and related parameters, the circular padding method is shown to lead to results that are in good agreement with the ISO standard procedures.
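The core idea can be sketched in Python rather than MATLAB (an illustration under assumptions: the ISO 16610-21 Gaussian weighting function with alpha = sqrt(ln 2 / pi), applied as a circular convolution, which is equivalent to filtering the circularly padded profile):

```python
import numpy as np

def gaussian_filter_circular(profile, dx, cutoff):
    """Gaussian profile filter (ISO 16610-21-style weighting function) applied
    with circular padding: the circular convolution returns a waviness profile
    of the full data length instead of losing half a cutoff at each edge."""
    alpha = np.sqrt(np.log(2) / np.pi)
    n = len(profile)
    x = (np.arange(n) - n // 2) * dx              # kernel support, centred on 0
    s = np.exp(-np.pi * (x / (alpha * cutoff)) ** 2)
    s /= s.sum()                                  # unit-sum weighting function
    # circular convolution == linear convolution of the circularly padded profile
    return np.real(np.fft.ifft(np.fft.fft(profile) * np.fft.fft(np.fft.ifftshift(s))))
```

The roughness profile would then be `profile - gaussian_filter_circular(profile, dx, cutoff)`.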
- Published
- 2020
30. Mixture Based Outlier Filtration
- Author
-
P. Pecherková and I. Nagy
- Subjects
data filtration ,system modelling ,mixture models ,Bayesian estimation ,prediction ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
Success/failure of adaptive control algorithms – especially those designed using the Linear Quadratic Gaussian criterion – depends on the quality of the process data used for model identification. One of the most harmful types of process data corruptions are outliers, i.e. ‘wrong data’ lying far away from the range of real data. The presence of outliers in the data negatively affects an estimation of the dynamics of the system. This effect is magnified when the outliers are grouped into blocks. In this paper, we propose an algorithm for outlier detection and removal. It is based on modelling the corrupted data by a two-component probabilistic mixture. The first component of the mixture models uncorrupted process data, while the second models outliers. When the outlier component is detected to be active, a prediction from the uncorrupted data component is computed and used as a reconstruction of the observed data. The resulting reconstruction filter is compared to standard methods on simulated and real data. The filter exhibits excellent properties, especially in the case of blocks of outliers.
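The idea can be sketched as follows (a strongly simplified illustration, not the paper's Bayesian mixture estimation: a fixed robust location/scale replaces online estimation, and the "prediction" is simply the previous clean value):

```python
import numpy as np

def mixture_filter(data, out_scale=10.0):
    """Two-component filtration sketch: a narrow 'process' Gaussian and a wide
    'outlier' Gaussian share a common centre; when the outlier component is
    more likely for a point, the point is replaced by a prediction from the
    uncorrupted component (here, the previous clean value)."""
    mu = np.median(data)
    sigma = max(1.4826 * np.median(np.abs(data - mu)), 1e-9)  # robust scale
    clean = data.astype(float).copy()
    for t in range(len(data)):
        z = (data[t] - mu) / sigma
        log_proc = -0.5 * z ** 2 - np.log(sigma)              # log N(mu, sigma)
        log_out = -0.5 * (z / out_scale) ** 2 - np.log(out_scale * sigma)
        if log_out > log_proc:                                # outlier component active
            clean[t] = clean[t - 1] if t > 0 else mu          # reconstruct from prediction
    return clean
```

Because the reconstruction uses the last clean value, the filter also copes with short blocks of consecutive outliers, the case the paper highlights.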
- Published
- 2006
31. Receiver operating characteristic analysis: a general tool for DNA array data filtration and performance estimation
- Author
-
Khodarev, Nikolai N., Park, James, Kataoka, Yasushi, Nodzenski, Edwardine, Hellman, Samuel, Roizman, Bernard, Weichselbaum, Ralph R., and Pelizzari, Charles A.
- Subjects
- *
DNA , *GENES - Abstract
A critical step for DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. Commonly accepted rules for such filtration are still absent. We present a rational approach, based on thresholding of intensities with cutoff levels that are estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC combined with other types of filtration (for example, based on significances of differences and/or ratios), should improve DNA array analysis. ROC methodology is also demonstrated for comparison of the performance of different types of arrays, imagers, and analysis software. [Copyright © Elsevier]
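A minimal sketch of choosing an intensity cutoff from known positive and negative signal distributions (illustrative; here the Youden index, TPR minus FPR, stands in for the paper's "highest ratio of true to false signals" criterion):

```python
import numpy as np

def roc_optimal_cutoff(pos, neg):
    """Scan every observed intensity as a candidate threshold and return the
    one maximizing the Youden index (true-positive rate minus false-positive
    rate) over the known positive/negative distributions."""
    cuts = np.unique(np.concatenate([pos, neg]))
    youden = [np.mean(pos >= c) - np.mean(neg >= c) for c in cuts]
    return cuts[int(np.argmax(youden))]
```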
- Published
- 2003
- Full Text
- View/download PDF
32. Analytic Correlation Filtration: A New Tool to Reduce Analytical Complexity of Metabolomic Datasets
- Author
-
Monnerie, Petera, Lyan, Gaudreau, Comte, and Pujos-Guillot
- Subjects
high-resolution mass spectrometry ,data filtration ,metabolomics - Abstract
Metabolomics generates massive and complex data. Redundant analytical species and the high degree of correlation in datasets are constraints on the use of data mining/statistical methods and on interpretation. In this context, we developed a new tool to detect analytical correlations within datasets without confounding them with biological correlations. Based on several parameters, such as a similarity measure, retention time, and mass information from known isotopes, adducts, or fragments, the algorithm groups features coming from the same analyte and proposes a single representative per group. To illustrate the functionalities and added value of this tool, it was applied to published datasets and compared to one of the most commonly used free packages offering a grouping method for metabolomics data: 'CAMERA'. This tool was developed to be included in Galaxy and will be available in Workflow4Metabolomics (http://workflow4metabolomics.org). Source code is freely available for download under the CeCILL 2.1 license at https://services.pfem.clermont.inra.fr/gitlab/grandpa/tool-acf and is implemented in Perl.
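The grouping principle can be sketched as follows (a greedy, much-simplified illustration of the idea; the tool itself is written in Perl and uses richer criteria such as isotope/adduct mass information):

```python
import numpy as np

def group_features(rts, intensities, rt_tol=0.05, corr_min=0.9):
    """Greedy analytic-correlation grouping sketch: features eluting at nearly
    the same retention time whose intensity profiles across samples are highly
    correlated are assumed to come from one analyte; the most intense feature
    of each group is kept as its single representative."""
    n = len(rts)
    group = -np.ones(n, dtype=int)
    for i in range(n):
        if group[i] >= 0:
            continue                               # already assigned to a group
        group[i] = i
        for j in range(i + 1, n):
            if group[j] < 0 and abs(rts[i] - rts[j]) <= rt_tol and \
               np.corrcoef(intensities[i], intensities[j])[0, 1] >= corr_min:
                group[j] = i
    reps = []
    for g in np.unique(group):
        members = np.where(group == g)[0]
        reps.append(int(members[np.argmax(intensities[members].mean(axis=1))]))
    return sorted(reps)
```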
- Published
- 2019
- Full Text
- View/download PDF
33. Analytic correlation filtration: A new tool to reduce analytical complexity of metabolomic datasets
- Author
-
Pétéra, Mélanie, Lyan, Bernard, Gaudreau, Pierrette, Comte, Blandine, Monnerie, Stéphanie, and Pujos-Guillot, Estelle
- Subjects
Chimie analytique ,data filtration ,high-resolution mass spectrometry ,metabolomics ,Analytical chemistry ,Autre (Sciences du Vivant) - Abstract
Metabolomics generates massive and complex data. Redundant analytical species and the high degree of correlation in datasets are constraints on the use of data mining/statistical methods and on interpretation. In this context, we developed a new tool to detect analytical correlations within datasets without confounding them with biological correlations. Based on several parameters, such as a similarity measure, retention time, and mass information from known isotopes, adducts, or fragments, the algorithm groups features coming from the same analyte and proposes a single representative per group. To illustrate the functionalities and added value of this tool, it was applied to published datasets and compared to one of the most commonly used free packages offering a grouping method for metabolomics data: 'CAMERA'. This tool was developed to be included in Galaxy and will be available in Workflow4Metabolomics (http://workflow4metabolomics.org). Source code is freely available for download under the CeCILL 2.1 license at https://services.pfem.clermont.inra.fr/gitlab/grandpa/tool-acf and is implemented in Perl.
- Published
- 2019
34. Methods of data filtration and their effects on the resulting image of a log at the electronic sensing its dimensions
- Author
-
Veronika Hunková and Karel Janák
- Subjects
Computer science ,median ,log dimensions ,lcsh:S ,Sorting ,data filtration ,Measuring equipment ,electronic measurement ,Image (mathematics) ,Original data ,lcsh:Agriculture ,log diameter ,lcsh:Biology (General) ,Moving average ,Statistics ,Filtration (mathematics) ,running average ,General Agricultural and Biological Sciences ,lcsh:QH301-705.5 ,Algorithm - Abstract
HUNKOVA, V., JANAK, K.: Methods of data filtration and their effects on the resulting image of a log at the electronic sensing of its dimensions. Acta univ. agric. et silvic. Mendel. Brun., 2010, LVIII, No. 1, pp. 77–86. The data taken at the electronic reception of logs are burdened with mistaken values and do not correspond to the real shape of a stem. The aim of data filtration is to remove the incorrect data and replace them with values closer to the real ones. The goal of the presented work was to analyse the effect of different filtration methods on the original data and to recommend the methods whose results correspond as closely as possible to the real shape of the logs, including their surface defects. The filtration methods are based on simple mathematical and statistical procedures, which are subsequently combined in various ways, because the simplest comparative methods did not meet expectations. More than 50 methods were combined and analyzed. Approximately 15 of them were selected and tested. The methods were applied to data scanned on ca. 150 logs randomly selected from thousands of files in the sawmill log yard. The filtration effects of the different methods were visually assessed. The three most applicable methods were selected and will be tested practically at a sawmill in the next step of the research. These three methods are described in the following paper. The results indicate that all proposed methods meet the requirements for log dimension determination. It is not possible to define the best method in general. The different properties of these methods make them suitable for calculating log dimensions for different types of use: mid-diameter determination, necessary for volume calculation using the Huber method; section diameter determination, for volume calculation using the section method; top diameter determination, necessary for sorting; and filtration of the drive dogs of the conveyor passing through the measuring equipment. Keywords: data filtration, electronic measurement, log dimensions, running average, median, log diameter
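A combined filtration in the spirit of these methods can be sketched as follows (illustrative; the window sizes and edge handling are assumptions, not the paper's selected methods):

```python
import numpy as np

def median_then_average(diameters, med_win=5, avg_win=3):
    """Running median followed by a running average: the median knocks out
    isolated spikes (bark flaps, conveyor drive dogs), then the short average
    smooths the remaining diameter profile; edges are padded by repetition."""
    d = np.asarray(diameters, dtype=float)
    pad = med_win // 2
    padded = np.pad(d, pad, mode="edge")
    med = np.array([np.median(padded[i:i + med_win]) for i in range(len(d))])
    pad2 = avg_win // 2
    padded2 = np.pad(med, pad2, mode="edge")
    return np.array([padded2[i:i + avg_win].mean() for i in range(len(d))])
```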
- Published
- 2014
35. Data Filtration Methods of Electronic Measurement of Log Dimensions
- Author
-
Karel Janák and Veronika Hunková
- Subjects
Chromatography ,law ,Statistics ,Forestry ,round wood ,data filtration ,SD1-669.5 ,log yards ,electronic reception ,Filtration ,electronic measurement ,Mathematics ,law.invention - Abstract
The article deals with the processing of log dimension data collected by electronic measurement during reception at sawmills. The work concerns the filtration of data before their use in calculations. Filtration methods were designed based on simple mathematical and statistical methods, and were compared and evaluated using a designed comparative methodology. As a result, five filtration methods were selected that best suit the reception requirements. At the same time, the impact of filtration on the measurement results is evaluated in relation to the method of wood volume calculation. Calculation by sections is recommended, as it is less affected by filtration errors.
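The two volume calculations the filtration feeds into can be sketched with the standard formulas (shown for illustration only):

```python
import math

def huber_volume(mid_diameter, length):
    """Huber method: the whole log as one cylinder with the mid-length
    diameter; a single bad (unfiltered) diameter value spoils the result."""
    return math.pi * (mid_diameter / 2.0) ** 2 * length

def section_volume(diameters, section_length):
    """Section method: the log as a stack of short cylinders, each with its
    own filtered diameter; one bad value affects only its section."""
    return sum(math.pi * (d / 2.0) ** 2 * section_length for d in diameters)
```

For a perfect cylinder the two agree; for a real log the section method averages filtration errors over many diameters, which is why it is recommended above.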
- Published
- 2014
36. Description and validation of a circular padding method for linear roughness measurements of short data lengths.
- Author
-
Schoeters S, Dewulf W, Kruth JP, Haitjema H, and Boeckmans B
- Abstract
Surface topography measurements are vital in industrial quality control. Linear roughness measurements are among the most preferred methods, being quick to perform and easy to interpret. The ISO 16610 standard series prescribes filters that can be used for most cases, but has limitations for restricted measurement lengths. This is because the standard filter type is a Gaussian filter, which, like most kernel convolution filters, has no output near the edges of the profile, effectively shortening the filtered output profile compared to the input. In some cases, this leads to a lack of representative data after filtration. Especially in fields such as Additive Manufacturing (AM) this becomes a problem, due to the high "roughness to measurable data length" ratio that characterizes complex AM parts. This paper describes a method that overcomes this limitation:
• A method for circular padding of short measured tracks is described and validated.
• A flexible profile data post-processing tool was developed in MATLAB to grant users more control over the data analysis.
Results obtained from roughness profiles long enough for normal ISO procedures are shown not to change significantly when circularly padded. When only a shorter section of the data is available, where the standard protocol would no longer be able to compute a filtered profile and related parameters, the circular padding method is shown to lead to results that are in good agreement with the ISO standard procedures. Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (© 2020 The Authors. Published by Elsevier B.V.)
- Published
- 2020
- Full Text
- View/download PDF
37. Duomenų filtravimas ir apdorojimas biodujų gamybos procese (Data Filtration and Processing in the Biogas Production Process)
- Author
-
Kurauskas, Mantas and Tekorius, Tomas
- Subjects
duomenų filtravimas ,biodujų modelis ,bio-gas model ,duomenų apdorojimas ,data filtration ,data processing - Abstract
Using data collected during the work placement, methods for processing and filtering them were tested. The initial signal was unsuitable for further work because of noise, which is characteristic of the gas production process; to obtain data corresponding to the optimum value, the noise had to be reduced or eliminated from the signal. In this work, the following filters were tested on the experimental data: a moving average filter, an exponential filter, and other low-pass filters. A polynomial approximation method was also used. In addition, an original filter was designed for filtering the data. A model simulating biogas curves was also constructed, in which the biogas and noise components are separated. The results obtained with the different methods were compared.
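The exponential (first-order low-pass) filter mentioned above can be sketched in its standard textbook form (the smoothing constant alpha is an assumption):

```python
def exponential_filter(signal, alpha=0.2):
    """First-order exponential smoothing, y_t = alpha*x_t + (1-alpha)*y_(t-1):
    a simple low-pass filter for a noisy production signal."""
    out = [float(signal[0])]                      # initialise with the first sample
    for x in signal[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out
```

Smaller alpha suppresses more noise but lags the true signal more, which is the trade-off such a comparison of filters explores.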
- Published
- 2015
38. Analizator motorjev z notranjim izgorevanjem (Internal Combustion Engine Analyzer)
- Author
-
Sotler, Tilen and Škraba, Igor
- Subjects
computer and information science ,računalništvo ,digitalno procesiranje signalov ,visokošolski strokovni študij ,filtracija podatkov ,digital signal processing ,programska oprema za izvajanje meritev ,computer science ,internal combustion engines ,data filtration ,udc:004:621.43(043.2) ,motorji z notranjim izgorevanjem ,računalništvo in informatika ,diploma ,diplomske naloge ,DAQ software - Published
- 2014
39. Časový souhrn audiometrických vyšetření (Time Summary of Audiometric Examinations)
- Author
-
Barot, Tomáš
- Abstract
The main aim of this thesis was to create a system for the statistical evaluation of time summaries of the number of examinations performed in ENT (otorhinolaryngology) clinics. For the statistical evaluation of the examination data, an appropriate visual output form was designed; the outputs can be filtered according to conditions defined by the doctors, with a choice of how the results are interpreted. The theoretical part explains the types and methods of hearing examination in the medical field of otorhinolaryngology, describes the software development methodology, and analyses the way examination data are archived in ENT practice. The practical part documents the requirements definition, the system architecture design, and the implementation and verification of the solution, followed by an overview of the functions the system provides for the statistical analysis of an ENT clinic's operation. Finally, the possible practical deployment of the implemented system is assessed, in the form of a statistical analysis tool complementing the software already in use in ENT clinics. (Ústav automatizace a řídicí techniky; defended.)
- Published
- 2010