137 results for "DATA fusion (Statistics)"
Search Results
2. Integration of Historical Data for the Analysis of Multiple Assessment Studies.
- Author: Marcoulides, Katerina M.
- Subjects: DATA integration; DATA fusion (Statistics); DATA analysis
- Abstract
Integrative data analyses have recently been shown to be an effective tool for researchers interested in synthesizing datasets from multiple studies in order to draw statistical or substantive conclusions. The actual process of integrating the different datasets depends on the availability of some common measures or items reflecting the same studied constructs. However, exactly how many common items are needed to effectively integrate multiple datasets has to date not been determined. This study evaluated the effect of using different numbers of common items in integrative data analysis applications. The study used simulations based on realistic data integration settings in which the number of common item sets was varied. The results provided insight concerning the optimal number of common item sets needed to safeguard estimation precision. The practical implications of these findings in view of past research in the psychometric literature concerning the necessary number of common item sets are also discussed. [ABSTRACT FROM AUTHOR]
- Published: 2023
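The role of common items in the abstract above can be made concrete with a classical psychometric linking step. The sketch below is not the paper's method; it shows mean–sigma linking, a minimal way to place scores from one study onto another study's scale using only the items both studies share. All data and names here are hypothetical:

```python
import statistics

def mean_sigma_link(common_a, common_b):
    """Estimate a linear transformation y = A*x + B that places scores from
    study B onto the scale of study A, using scores on the shared (common)
    items observed in both studies."""
    sd_a = statistics.stdev(common_a)
    sd_b = statistics.stdev(common_b)
    A = sd_a / sd_b
    B = statistics.mean(common_a) - A * statistics.mean(common_b)
    return A, B

# Hypothetical common-item scores from two assessment studies.
study_a = [3.1, 2.8, 3.5, 2.9, 3.3]
study_b = [2.1, 1.8, 2.5, 1.9, 2.3]

A, B = mean_sigma_link(study_a, study_b)
rescaled_b = [A * x + B for x in study_b]
```

With more common items, the linking mean and standard deviation are estimated more precisely, which is exactly the precision question the study's simulations address.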
3. Toward a Data Fusion Index for the Assessment and Enhancement of 3D Multimodal Reconstruction of Built Cultural Heritage.
- Author: Pamart, Anthony; Abergel, Violette; de Luca, Livio; Veron, Philippe
- Subjects: MULTISENSOR data fusion; CULTURAL property; DIGITIZATION; DATA fusion (Statistics); POINT cloud; ACQUISITION of data; METADATA; DATA analysis
- Abstract
In the field of digital cultural heritage (DCH), 2D/3D digitization strategies are becoming more and more complex. The emerging trend of multimodal imaging (i.e., data acquisition campaigns that concurrently put multi-sensor, multi-scale, multi-band and/or multi-epoch data in cooperation) implies several challenges in terms of data provenance, data fusion and data analysis. Making the assumption that the current usability of multi-source 3D models could be more meaningful than millions of aggregated points, this work explores a "reduce to understand" approach to increase the interpretative value of multimodal point clouds. Starting from several years of accumulated digitizations of a single use-case, we define a method based on density estimation to compute a Multimodal Enhancement Fusion Index (MEFI) revealing the intricate modality layers behind the 3D coordinates. Seamlessly stored in point cloud attributes, MEFI can be expressed as a heat-map showing whether the underlying data are isolated and sparse or redundant and dense. Beyond the colour-coded quantitative features, a semantic layer is added to provide qualitative information about the data sources. Based on a versatile descriptive metadata schema (MEMoS), the 3D model resulting from the data fusion can therefore be semantically enriched by incorporating all the information concerning its digitization history. A customized 3D viewer is presented to explore this enhanced multimodal representation as a starting point for further 3D-based investigations. [ABSTRACT FROM AUTHOR]
- Published: 2023
4. Multisource Data Fusion Analysis of Maintainability for Overlapping Degree High Performance Computing.
- Author: Li, Ze; Li, Yonghua; Meng, Lijun; Meng, Dongrong
- Subjects: HIGH performance computing; MULTISENSOR data fusion; DATA fusion (Statistics); MAINTAINABILITY (Engineering); MANUFACTURING processes; DATA analysis
- Abstract
With the continuous development of the economy, industry has become a major contributor to it. As industry has developed, human operation has gradually been replaced by machine operation, and machines have become more and more important in industry. However, although machines free up manpower, over time they are subject to external pressures such as the environment and personnel, and are internally affected by the technical level, experience, equipment familiarity, and physical and mental state of the maintenance personnel. Industrial machines that run for too long tend to develop all sorts of problems, so ensuring the efficient, long-term operation of the machine is a crucial issue. In view of this situation, this paper combines maintenance test data from different equipment, adopts a high-performance-computing overlapping method, establishes corresponding multisource data for data conversion and processing, and then uses a Bayesian method to analyze the multisource data. Parameter fusion and fitting are performed, and finally prior data fusion for the device is tested using the overlap data model. The simulation results of this paper show that the high-performance overlap calculation method is effective and can effectively support the fusion analysis of maintainability-related multisource data. [ABSTRACT FROM AUTHOR]
- Published: 2022
5. Block Storage Optimization and Parallel Data Processing and Analysis of Product Big Data Based on the Hadoop Platform.
- Author: Wang, Yajun; Cheng, Shengming; Zhang, Xinchen; Leng, Junyu; Liu, Jun
- Subjects: PARALLEL processing; ELECTRONIC data processing; DATA fusion (Statistics); BIG data; DATA analysis; DISTRIBUTED databases; DATA distribution; MULTISENSOR data fusion
- Abstract
The traditional distributed database storage architecture has problems of low efficiency and limited storage capacity in managing the data resources of seafood products. We reviewed various storage and retrieval technologies for big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. A multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties is used in the parallel data processing and analysis method. The data distribution strategy and block size adjustment are studied based on the Hadoop platform. A multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm based on data-optimized storage are designed for the big data resources of seafood products according to the MapReduce parallel framework. Practical verification shows that the storage optimization and data-retrieval methods provide support for constructing a big data resource-management platform for seafood products and realize efficient organization and management of the big data resources of seafood products. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme. [ABSTRACT FROM AUTHOR]
- Published: 2021
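The multireplica consistent hashing idea in the abstract above can be sketched in a few lines. This is a generic consistent-hash ring with virtual nodes, not the paper's correlation- and spatio-temporal-aware variant; the node names and block ID are hypothetical:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    # Stable hash for placing both virtual nodes and blocks on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes, a sketch of how
    multireplica block placement on Hadoop-style storage nodes might look."""
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self.keys = [k for k, _ in self.ring]

    def nodes_for(self, block_id: str, replicas=3):
        """Walk clockwise from the block's hash, collecting distinct nodes."""
        idx = bisect.bisect(self.keys, _h(block_id)) % len(self.ring)
        chosen = []
        while len(chosen) < replicas:
            node = self.ring[idx % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
            idx += 1
        return chosen

ring = ConsistentHashRing(["dn1", "dn2", "dn3", "dn4"])
placement = ring.nodes_for("seafood-block-0001")
```

A correlation-aware version would additionally bias placement so that blocks frequently joined together land on the same nodes, reducing data movement in the parallel join queries the paper describes.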
6. An overview of air quality analysis by big data techniques: Monitoring, forecasting, and traceability.
- Author: Huang, Wei; Li, Tianrui; Liu, Jia; Xie, Peng; Du, Shengdong; Teng, Fei
- Subjects: AIR analysis; AIR quality; AIR quality monitoring; ARTIFICIAL neural networks; DATA analysis; DATA fusion (Statistics); BIG data
- Abstract
With the rapid development of the economy and the frequent occurrence of air pollution incidents, air pollution has become a pressing public concern. Air quality big data are generally characterized by multi-source heterogeneity, dynamic mutability, and spatial–temporal correlation, and big data technology is usually applied to air quality analysis after data fusion. In recent years, various models and algorithms using big data techniques have been proposed. To summarize these methodologies of air quality study, in this paper, we first classify air quality monitoring by big data techniques into three categories: the spatial model, the temporal model and the spatial–temporal model. Second, we summarize the typical big data methods needed in air quality forecasting into three categories, namely statistical forecasting models, deep neural network models, and hybrid models, presenting representative scenarios for some of them. Third, we analyze and compare some representative air pollution traceability methods in detail, classifying them into two categories: traditional models combined with big data techniques and data-driven models. Finally, we provide an outlook on the future of air quality analysis with some promising and challenging ideas. • Air quality early warning systems are systematically reviewed. • Some air quality analysis methods by big data techniques are overviewed. • The difficulties and some ideas of air quality research are discussed briefly. [ABSTRACT FROM AUTHOR]
- Published: 2021
7. Multi-source data fusion for economic data analysis.
- Author: Li, Menggang; Wang, Fang; Jia, Xiaojun; Li, Wenrui; Li, Ting; Rui, Guangwei
- Subjects: MULTISENSOR data fusion; ECONOMIC research; DATA analysis; BOOSTING algorithms; ECONOMIC forecasting; DATA fusion (Statistics)
- Abstract
Economic data include data of various types and characteristics, such as macro-data, meso-data, and micro-data. Sources of economic data include economy-related data held by the National Bureau of Statistics and various software systems. These multi-source, heterogeneous data have important value for economic analysis and forecasting. Taking into account the limitations of existing methods, such as low accuracy and complex calculations, this paper proposes an economic data analysis and prediction method based on machine learning. We use machine learning to solve the data fusion problem in the process of multi-source data analysis and prediction in the economic field. Specifically, we propose an economic data analysis and forecasting method combining convolutional auto-encoder and extreme gradient boosting algorithms. This method uses a convolutional auto-encoder to extract the data characteristics of the normalized parameter sequence and uses them to train an extreme gradient boosting model to predict the level of economic development and evaluate the importance of each influencing factor. Finally, through a case study, this paper integrates labor force, education and population data to forecast GDP. In this case, the prediction accuracy of the proposed method is higher than that of the AE-XGBoost and CAE-1D-XGBoost methods used in the experiment, and the error is kept below 11.7%. [ABSTRACT FROM AUTHOR]
- Published: 2021
8. Calibrated regression estimation using empirical likelihood under data fusion.
- Author: Li, Wei; Luo, Shanshan; Xu, Wangli
- Subjects: ASYMPTOTIC normality; DATA fusion (Statistics); MULTISENSOR data fusion; EXTREME value theory; MULTIPLE imputation (Statistics); ELECTRONIC data processing; DATA analysis
- Abstract
Data analysis based on information from different sources, typically known as the data fusion problem, is common in economic and biomedical studies. An interesting question concerns the regression of an outcome variable on certain covariates when combining two distinct datasets. These datasets consist of a primary sample containing the outcome and a subset of the covariates, and a supplemental sample comprising information only on the full set of covariates. Previous methods have proposed doubly robust estimation procedures that employ a single propensity score model for the data fusion process and a single imputation model for the covariates available only in the supplemental dataset. However, it may be questionable to assume that either model is correctly specified due to an unknown data generating process. To address this issue, an empirical likelihood based approach that calibrates multiple propensity scores and imputation models is introduced. The resulting estimator is consistent when any one of the models is correctly specified and is robust against extreme values of the fitted propensity scores. The asymptotic normality property and the estimation efficiency are also discussed. Simulation studies show that the proposed estimator has substantial advantages over existing estimators, and an assembled U.S. household expenditure data example is used for illustration. [ABSTRACT FROM AUTHOR]
- Published: 2024
9. Non-readily identifiable data collaboration analysis for multiple datasets including personal information.
- Author: Imakura, Akira; Sakurai, Tetsuya; Okada, Yukihiko; Fujii, Tomoya; Sakamoto, Teppei; Abe, Hiroyuki
- Subjects: DATA analysis; SUPERVISED learning; MULTISENSOR data fusion; STATISTICAL sampling; PERSONALLY identifiable information; DATA fusion (Statistics)
- Abstract
Multi-source data fusion, in which multiple data sources are jointly analyzed to obtain improved information, has attracted considerable research attention. Data confidentiality and cross-institutional communication are critical for the construction of a prediction model using datasets of multiple medical institutions. In such cases, data collaboration (DC) analysis, which shares dimensionality-reduced intermediate representations without iterative cross-institutional communication, may be appropriate. Identifiability of the shared data is essential when analyzing data that include personal information. In this study, the identifiability of DC analysis is investigated. The results reveal that, for supervised learning, the shared intermediate representations can readily be identified with the original data. This study then proposes a non-readily identifiable DC analysis that shares only non-readily identifiable data for multiple medical datasets including personal information. The proposed method addresses identifiability concerns through a random sample permutation, the concept of interpretable DC analysis, and the use of functions that cannot be reconstructed. In numerical experiments on medical datasets, the shared data of the proposed method are non-readily identifiable while the high recognition performance of conventional DC analysis is maintained. On a hospital dataset, the proposed method improves recognition performance by nine percentage points over local analysis that uses only the local dataset. • A non-readily identifiable data collaboration (DC) analysis was proposed. • Identifiability of the shared data of DC analysis was investigated. • A novel mathematical definition of the identifiability of data was introduced. • The performance of the proposed method was verified for medical datasets. • The proposed method will be a key technology for privacy-preserving analysis. [ABSTRACT FROM AUTHOR]
- Published: 2023
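A minimal sketch of the two privacy ingredients named in the abstract above, dimensionality-reduced intermediate representations plus a random sample permutation, might look as follows. This is an illustration, not the authors' DC pipeline; the random projection stands in for whatever reduction function each institution keeps private, and the records are hypothetical:

```python
import random

def intermediate_representation(rows, dim, seed):
    """Project each record onto `dim` random directions (a stand-in for the
    dimensionality reduction used in DC analysis), then shuffle record order
    so that the shared data are no longer row-aligned with the originals."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Secret projection matrix known only to the local institution.
    proj = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(dim)]
    reduced = [[sum(w * x for w, x in zip(p, row)) for p in proj] for row in rows]
    rng.shuffle(reduced)  # random sample permutation
    return reduced

records = [[5.1, 3.5, 1.4], [4.9, 3.0, 1.3], [6.2, 2.9, 4.3], [5.9, 3.0, 5.1]]
shared = intermediate_representation(records, dim=2, seed=42)
```

Only `shared` would leave the institution; without the projection matrix and the permutation, relinking the shared rows to `records` is the identifiability question the paper formalizes.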
10. Individualized inference through fusion learning.
- Author: Cai, Chencheng; Chen, Rong; Xie, Min‐ge
- Subjects: DATA fusion (Statistics); GRAPHICAL modeling (Statistics); DATA science; DATA analysis
- Abstract
Fusion learning methods, developed for the purpose of analyzing datasets from many different sources, have become a popular research topic in recent years. Individualized inference approaches through fusion learning extend fusion learning to individualized inference problems over a heterogeneous population, where similar individuals are fused together to enhance the inference for the target individual. Both classical fusion learning and individualized inference approaches through fusion learning are based on weighted aggregation of individual information, but the weights used in the latter are localized to the target individual. This article reviews two individualized inference methods through fusion learning, iFusion and iGroup, which were developed under different asymptotic settings. Both procedures guarantee optimal asymptotic theoretical performance and computational scalability. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods; Data: Types and Structure > Massive Data. [ABSTRACT FROM AUTHOR]
- Published: 2020
11. A Method of Interest Degree Mining Based on Behavior Data Analysis.
- Author: Li, Zhen; Xu, Shuo; Wang, Tianyu
- Subjects: BEHAVIORAL assessment; DATA fusion (Statistics); USER-generated content; DATABASES; INFORMATION-seeking behavior; DATA analysis; DATA mining
- Abstract
Starting from the behavior data of users on social media, this paper studies and explores the core issues of user modeling for personalized services in a big data setting. Focusing on user interest modeling, it proposes improvements to existing interest models, which differ greatly in how they describe interests across users and have difficulty detecting changes in user interest in time. To address these problems, this paper takes user-generated content and user behavior information as the analysis objects, and applies natural language processing, knowledge warehousing, data fusion, and other methods and techniques to numerically analyze user interest mining based on text mining and multi-source data fusion. We propose a user interest label space mapping method to avoid the data sparsity caused by too many dimensions in interest analysis. At the same time, we propose a method to extract and blend long-term and short-term interests, realizing a comprehensive evaluation of interests. In the big data analysis phase, regularities in users' social attributes and application preference values are examined, with the aim of mining users' Internet social media application preferences from a big data perspective. [ABSTRACT FROM AUTHOR]
- Published: 2020
12. Fusion of open forest data and machine fieldbus data for performance analysis of forest machines.
- Author: Melander, Lari; Einola, Kalle; Ritala, Risto
- Subjects: FORESTS & forestry; DATA fusion (Statistics); ATMOSPHERIC models; DATA analysis; MACHINE performance; MULTISENSOR data fusion; WEATHER
- Abstract
Forest resource data are important in targeting forestry operations and are at the heart of the precision forestry concept. Forest resource data can be produced with many techniques, and the number of existing forest data sources has increased over the years. In addition to forest resource data, other data describing the circumstances of the forest site, such as trafficability and weather conditions, are available. In Finland, a forest data platform gathers these data sources under a single service for easier implementation of precision forestry applications. This data is useful in operations planning, but it also describes the conditions that prevail when the forest machine arrives at the forest site. This study proposes data fusion between fieldbus time series of the forest machine and the forest data. The fused dataset enables explorative statistical analysis for examining the relationship between machine performance and forest attributes and provides data for building predictive models between the two. The presented methods are applied to a dataset generated from field test data. The results show that some fieldbus time series features are predictable from forest attributes with an R² value over 0.80, and that clustering methods help in interpreting machine behavior in different environments. In addition, an idea for generating a new forest data source for the forest data platform based on the fusion is discussed. [ABSTRACT FROM AUTHOR]
- Published: 2020
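The kind of explorative statistic the abstract above reports (an R² value when predicting fieldbus features from forest attributes) can be computed from a fused dataset with ordinary least squares. The records below are hypothetical, purely for illustration:

```python
def simple_ols_r2(x, y):
    """Fit y = a + b*x by least squares and return the coefficient of
    determination R^2, the statistic the study reports (> 0.80 for some
    fieldbus features)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical fused records: stem volume (forest attribute, m3/ha)
# versus a mean fuel-rate feature extracted from the fieldbus time series.
volume    = [120, 150, 180, 200, 240, 260]
fuel_rate = [10.1, 11.0, 12.2, 12.9, 14.5, 15.0]
r2 = simple_ols_r2(volume, fuel_rate)
```

In the study's setting, each fused record would come from joining a fieldbus time-series segment with the forest attributes of the stand where the machine was working at that time.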
13. Parameter learning and applications of the inclusion-exclusion integral for data fusion and analysis.
- Author: Honda, Aoi; James, Simon
- Subjects: MULTISENSOR data fusion; FUZZY integrals; DATA fusion (Statistics); DATA analysis; INTEGRALS; FUZZY measure theory
- Abstract
• A framework of modeling with the inclusion-exclusion integral is proposed. • Global and quadratic (with linear constraints) learning methods are detailed. • Methods are tested on benchmark data and compared with neural networks. • Data analysis examples are illustrated. • The approach demonstrated some advantages over current techniques. Developments in the learning and interpretation of fuzzy integrals have paved the way for a myriad of applications in data analysis and prediction. The ability of the associated fuzzy measure to model heterogeneous interactions allows high flexibility when it comes to data fusion tasks – comparable to that of neural networks – however the fuzzy integral structure and properties also afford a degree of robustness and interpretability not enjoyed by such tools. On the other hand, neural network architectures can accommodate fuzzy integrals as a special case. In this paper, we propose that such a representation allows us to naturally extend and adapt the fuzzy integral framework toward specific applications. We focus on the inclusion-exclusion integral, which is a generalization of the Choquet integral, and detail methods for learning the various parameters, given its extended architecture. We then validate the performance and usefulness of this approach on some benchmark datasets. [ABSTRACT FROM AUTHOR]
- Published: 2020
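As background for the abstract above: the inclusion-exclusion integral generalizes the Choquet integral, which can be computed in a few lines once a fuzzy measure over criterion subsets is given. The measure values below are hypothetical:

```python
def choquet_integral(values, mu):
    """Discrete Choquet integral of `values` (one input per criterion) with
    respect to fuzzy measure `mu`, a dict mapping frozensets of criterion
    indices to measures (mu(empty) = 0, mu(full set) = 1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        coalition = frozenset(order[pos:])  # criteria with value >= values[i]
        total += (values[i] - prev) * mu[coalition]
        prev = values[i]
    return total

# Hypothetical 2-criterion fuzzy measure with positive interaction.
mu = {frozenset(): 0.0, frozenset({0}): 0.3,
      frozenset({1}): 0.4, frozenset({0, 1}): 1.0}
score = choquet_integral([0.6, 0.9], mu)
```

Because mu({0,1}) here exceeds mu({0}) + mu({1}), the two criteria reinforce each other, an interaction a plain weighted average cannot express; this is the heterogeneous-interaction modeling the abstract refers to.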
14. A comparison of Bayesian synthesis approaches for studies comparing two means: A tutorial.
- Author: Du, Han; Bradbury, Thomas N.; Lavner, Justin A.; Meltzer, Andrea L.; McNulty, James K.; Neff, Lisa A.; Karney, Benjamin R.
- Subjects: MULTISENSOR data fusion; DATA fusion (Statistics); SELF-control; DATA analysis
- Abstract
Researchers often seek to synthesize results of multiple studies on the same topic to draw statistical or substantive conclusions and to estimate effect sizes that will inform power analyses for future research. The most popular synthesis approach is meta‐analysis. There have been few discussions and applications of other synthesis approaches. This tutorial illustrates and compares multiple Bayesian synthesis approaches (i.e., integrative data analyses, meta‐analyses, data fusion using augmented data‐dependent priors, and data fusion using aggregated data‐dependent priors) and discusses when and how to use these Bayesian synthesis approaches to combine studies that compare two independent group means or two matched group means. For each approach, fixed‐, random‐, and mixed‐effects models with other variants are illustrated with real data. R code is provided to facilitate the implementation of each method and each model. On the basis of these analyses, we summarize the strengths and limitations of each approach and provide recommendations to guide future synthesis efforts. [ABSTRACT FROM AUTHOR]
- Published: 2020
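For orientation, the simplest synthesis approach against which the tutorial's Bayesian methods can be compared is fixed-effect, inverse-variance pooling of study-level mean differences. The sketch below is that frequentist baseline, not any of the Bayesian variants in the paper, and the study effects and standard errors are made up:

```python
import math

def fixed_effect_synthesis(studies):
    """Inverse-variance (fixed-effect) pooling of mean differences.
    Each study is a (mean_difference, standard_error) pair."""
    weights = [1 / se ** 2 for _, se in studies]
    pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical two-group mean differences from three studies.
studies = [(0.30, 0.10), (0.25, 0.15), (0.40, 0.20)]
est, se = fixed_effect_synthesis(studies)
```

Random- and mixed-effects variants, like those illustrated in the tutorial, additionally model between-study heterogeneity instead of assuming one common effect.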
15. Common and distinct variation in data fusion of designed experimental data.
- Author: Alinaghi, Masoumeh; Bertram, Hanne Christine; Brunse, Anders; Smilde, Age K.; Westerhuis, Johan A.
- Subjects: MULTISENSOR data fusion; DATA fusion (Statistics); BIOLOGICAL systems; EXPERIMENTAL design; DATA analysis; MESENCEPHALON
- Abstract
Introduction: Integrative analysis of multiple data sets can provide complementary information about the studied biological system. However, data fusion of multiple biological data sets can be complicated as data sets might contain different sources of variation due to underlying experimental factors. Therefore, taking the experimental design of data sets into account could be of importance in data fusion concept. Objectives: In the present work, we aim to incorporate the experimental design information in the integrative analysis of multiple designed data sets. Methods: Here we describe penalized exponential ANOVA simultaneous component analysis (PE-ASCA), a new method for integrative analysis of data sets from multiple compartments or analytical platforms with the same underlying experimental design. Results: Using two simulated cases, the result of simultaneous component analysis (SCA), penalized exponential simultaneous component analysis (P-ESCA) and ANOVA-simultaneous component analysis (ASCA) are compared with the proposed method. Furthermore, real metabolomics data obtained from NMR analysis of two different brains tissues (hypothalamus and midbrain) from the same piglets with an underlying experimental design is investigated by PE-ASCA. Conclusions: This method provides an improved understanding of the common and distinct variation in response to different experimental factors. [ABSTRACT FROM AUTHOR]
- Published: 2020
16. Convolutional neural network for hyperspectral data analysis and effective wavelengths selection.
- Author: Liu, Yisen; Zhou, Songbin; Han, Wei; Liu, Weixin; Qiu, Zefan; Li, Chang
- Subjects: DATA analysis; SUPPORT vector machines; WEIGHING instruments; WAVELENGTHS; COFFEE beans; DATA fusion (Statistics); PIPELINE inspection
- Abstract
Fusion of spectral and spatial information has proved to be an effective approach to improve model performance in near-infrared hyperspectral data analysis. Regardless, most existing spectral-spatial classification methods require fairly complex pipelines and exact selection of parameters, which mainly depend on the investigator's experience and the object under test. A convolutional neural network (CNN) is a powerful tool for representing complicated data and usually works with little "hand-engineering", making it an appropriate candidate for developing a general and automatic approach. In this paper, a two-branch convolutional neural network (2B-CNN) was developed for spectral-spatial classification and effective wavelengths (EWs) selection. The proposed network was evaluated on three classification data sets, including herbal medicine, coffee beans and strawberries. The results showed that the 2B-CNN obtained the best classification accuracies (96.72% on average) when compared with the support vector machine (92.60% on average), the one-dimensional CNN (92.58% on average), and the grey level co-occurrence matrix based support vector machine (93.83% on average). Furthermore, the learned weights of the two-dimensional branch in 2B-CNN were adopted as the indicator of EWs and compared with the successive projections algorithm. The 2B-CNN models built with wavelengths selected by the weight indicator achieved the best accuracies (96.02% on average) among all the examined EWs models. Unlike conventional EWs selection methods, the proposed algorithm works without any additional retraining and can comprehensively consider the discriminative power in both the spectral and spatial domains. • A convolutional neural network is developed for hyperspectral data classification. • A two-branch network structure is adopted for spectral-spatial information fusion. • The learned weights are adopted as the indicator of effective wavelengths. • The proposed network shows robustness under small sample size. [ABSTRACT FROM AUTHOR]
- Published: 2019
17. Data analytics approach for melt-pool geometries in metal additive manufacturing.
- Author: Lee, Seulbi; Peng, Jian; Shin, Dongwon; Choi, Yoon Suk
- Subjects: MANUFACTURING processes; PROCESS optimization; STATISTICAL correlation; MACHINE learning; GEOMETRY; DATA analysis; DATA fusion (Statistics)
- Abstract
Modern data analytics was employed to understand and predict physics-based melt-pool formation by fabricating Ni alloy single tracks using powder bed fusion. An extensive database of melt-pool geometries was created, including processing parameters and material characteristics as input features. Correlation analysis provided insight for relationships between process parameters and melt-pools, and enabled the development of meaningful machine learning models via the use of highly correlated features. We successfully demonstrated that data analytics facilitates understanding of the inherent physics and reliable prediction of melt-pool geometries. This approach can serve as a basis for the melt-pool control and process optimization. [ABSTRACT FROM AUTHOR]
- Published: 2019
18. Weighted Evidential Fusion Method for Fault Diagnosis of Mechanical Transmission Based on Oil Analysis Data.
- Author: Yan, Shu-fa; Ma, Biao; Zheng, Chang-song; Chen, Man
- Subjects: FAULT diagnosis; DATA fusion (Statistics); BASE oils; MULTISENSOR data fusion; DATA analysis; MECHANICAL wear
- Abstract
Condition monitoring (CM) and fault diagnosis are critical for the stable and reliable operation of mechanical transmissions. Mechanical transmission wear, which changes the physicochemical properties of the lubrication oil and leads to severe wear, is a slow degradation process that can be monitored by oil analysis, but the actual degree of degradation is difficult to evaluate. To solve this problem, we propose a new weighted evidential data fusion method to better characterize the degradation degree of the mechanical transmission through the fusion of multiple CM datasets from oil analysis. This method includes weight allocation and data fusion steps that lead to a more accurate data-based fault diagnostic result for CM. First, the weight of each piece of evidence is modeled with a weighted average function by measuring the relative scale of the permutation entropy of each CM dataset. Then, the multiple CM datasets are fused by the Dempster combination rule. Compared with other evidential data fusion methods, the proposed method using the new weight allocation function seems more reasonable. The rationality and superiority of the proposed method were evaluated through a case study involving an oil-based CM dataset from a power-shift steering transmission. [ABSTRACT FROM AUTHOR]
- Published: 2019
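The fusion step named in the abstract above, the Dempster combination rule, is straightforward to sketch. The mass assignments below are hypothetical oil-analysis evidence over wear states, and the paper's permutation-entropy-based weighting step is omitted:

```python
def dempster_combine(m1, m2):
    """Dempster's rule for two basic probability assignments whose focal
    elements are frozensets; conflicting mass is renormalized away."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    k = 1.0 - conflict  # normalization constant (assumes evidence not fully conflicting)
    return {s: v / k for s, v in combined.items()}

# Hypothetical evidence from two oil-analysis indicators about wear state.
NORMAL, SEVERE = frozenset({"normal"}), frozenset({"severe"})
EITHER = NORMAL | SEVERE  # ignorance: mass on the whole frame
m_spectro = {SEVERE: 0.6, EITHER: 0.4}
m_ferro   = {SEVERE: 0.5, NORMAL: 0.2, EITHER: 0.3}
fused = dempster_combine(m_spectro, m_ferro)
```

In the paper's method, each dataset's mass function would first be discounted according to its permutation-entropy-based weight before combination.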
19. The data analysis of roughness extraction of target topography using minimum median plane fitting method.
- Author: Wang, Qiangfeng; Cao, Yan; Bai, Yu; Wu, Yujia; Wu, Qingyun
- Subjects: DATA analysis; TOPOGRAPHY; INTERPOLATION algorithms; FEATURE extraction; SURFACE roughness; DATA fusion (Statistics)
- Abstract
To address the problem that elevation data do not directly reflect the slope and surface roughness of the target topography, this paper proposes methods for preprocessing topographic elevation data and extracting topographic features, and carries out the corresponding feature extraction. A terrain risk assessment method based on the fusion of terrain roughness and slope information is presented, targeting the problem that terrain roughness and gradient cannot be read directly from terrain elevation data. The innovation is that, for the first time, the bilinear interpolation algorithm is applied to the preprocessing of elevation data and the extraction of topographic features, and the terrain roughness and gradient information fusion algorithm is applied to terrain feature extraction and risk assessment. Simulation and verification on a digital topography example show that the extraction method for topographic information based on elevation data is feasible and reliable. It provides a new research approach for accurate target information recognition and topography risk assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
20. A geographical origin assessment of Italian hazelnuts: Gas chromatography-ion mobility spectrometry coupled with multivariate statistical analysis and data fusion approach.
- Author
-
Sammarco, Giuseppe, Bardin, Daniele, Quaini, Federica, Dall'Asta, Chiara, Christmann, Joscha, Weller, Philipp, and Suman, Michele
- Subjects
- *
MULTISENSOR data fusion , *HAZELNUTS , *STATISTICS , *DATA fusion (Statistics) , *DATA analysis , *PRINCIPAL components analysis , *MULTIVARIATE analysis - Abstract
[Display omitted] • Geographical origin assessment is a growing requirement for both food quality and safety. • GC-IMS is proposed as an effective technique for the geographical assessment of Italian hazelnuts. • The VOCs analysed differ among samples according to their provenance. • GC-IMS data are successfully merged with sensory analysis data, improving statistical model performance. Hazelnut is a commodity that has gained interest in the food science community concerning its authenticity. The quality of Italian hazelnuts is guaranteed by Protected Designation of Origin and Protected Geographical Indication certificates. However, owing to their modest availability and high price, fraudulent producers/suppliers blend, or even substitute, Italian hazelnuts with cheaper, and often lower-quality, hazelnuts from other countries. To counter or prevent these illegal activities, the present work investigated the application of the Gas Chromatography-Ion Mobility Spectrometry (GC-IMS) technique along the hazelnut chain (fresh, roasted, and paste of hazelnuts). The raw data obtained were processed in two different ways: with statistical analysis software and with a programming language. In both cases, Principal Component Analysis and Partial Least Squares-Discriminant Analysis models were exploited to study how the volatile organic profiles of Italian, Turkish, Georgian, and Azerbaijani products differ. A prediction set was extrapolated from the training set for a preliminary evaluation of the models; then an external validation set containing blended samples was analysed. Both approaches highlighted an interesting class separation and good model parameters (accuracy, precision, sensitivity, specificity, F1-score).
Moreover, a data fusion approach with a complementary methodology, sensory analysis, was implemented to estimate the performance enhancement of the statistical models, adding more discriminant variables while integrating further information correlated with quality aspects. GC-IMS could be a key player as a rapid, direct, cost-effective strategy for facing authenticity issues in the hazelnut chain. [ABSTRACT FROM AUTHOR]
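The fusion of GC-IMS features with sensory-analysis variables before modelling can be illustrated with a common mid-level scheme: autoscale each block, then concatenate sample-wise. The abstract does not specify the fusion level used, so this is an assumed illustration rather than the paper's procedure:

```python
def autoscale(block):
    """Column-wise mean-centre and unit-variance scale a data block
    (a list of rows); constant columns are left centred (std guard)."""
    cols = list(zip(*block))
    means = [sum(c) / len(c) for c in cols]
    stds = [((sum((v - m) ** 2 for v in c) / len(c)) ** 0.5) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in block]

def fuse_blocks(*blocks):
    """Mid-level data fusion: concatenate autoscaled blocks row by row,
    giving one fused feature vector per sample for PCA / PLS-DA."""
    scaled = [autoscale(b) for b in blocks]
    return [sum((rows[i] for rows in scaled), []) for i in range(len(blocks[0]))]
```

Autoscaling each block first prevents the block with the largest numeric range (here, instrumental intensities versus sensory scores) from dominating the fused model.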
- Published
- 2023
- Full Text
- View/download PDF
21. Multi-block DD-SIMCA as a high-level data fusion tool.
- Author
-
Rodionova, O. and Pomerantsev, A.
- Subjects
- *
MULTISENSOR data fusion , *BLOCK codes , *DATA analysis , *DATA fusion (Statistics) - Abstract
A multi-block classification method based on Data-Driven Soft Independent Modeling of Class Analogy (DD-SIMCA) is presented. A high-level data fusion approach is used for the joint analysis of data collected with the help of different analytical instruments. The proposed fusion technique is very simple and straightforward. It uses a Cumulative Analytical Signal, which is a combination of the outcomes of the individual classification models. Any number of blocks can be combined. Although the high-level fusion eventually leads to a rather complex model, the analysis of partial distances makes it possible to establish a meaningful relationship between the classification results and the influence of individual samples and specific tools. Two real-world examples are used to demonstrate the applicability of the multi-block algorithm and the consistency of the multi-block method with its predecessor, conventional DD-SIMCA. [Display omitted] • New high-level data fusion method for class modeling is presented. • The method is based on a new concept of Cumulative Analytical Signal. • Multi-block DD-SIMCA procedure is simple and straightforward. • Explicit links between final results and specific blocks/objects are kept. • Any number of blocks can be combined in a single step. [ABSTRACT FROM AUTHOR]
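The high-level fusion idea, combining the outcomes of the individual per-block class models into a Cumulative Analytical Signal while keeping the partial distances for interpretation, can be sketched as below. The acceptance threshold is left as a user-supplied parameter; the paper's exact statistic is not reproduced here:

```python
def cumulative_signal(block_distances):
    """Cumulative Analytical Signal: combine the normalized distances
    produced by each block's individual DD-SIMCA-style class model."""
    return sum(block_distances)

def accept(block_distances, threshold):
    """High-level fusion decision: the sample is accepted by the multi-block
    class model if its cumulative signal stays below the critical threshold.
    The partial (per-block) distances remain available, so a rejection can be
    traced back to the instrument block responsible for it."""
    return cumulative_signal(block_distances) <= threshold
```
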
- Published
- 2023
- Full Text
- View/download PDF
22. Development of a clustering based fusion framework for locating the most consistent IrisCodes bits.
- Author
-
Sadhya, Debanjan, De, Kanjar, Balasubramanian, Raman, and Pratim Roy, Partha
- Subjects
- *
DATA fusion (Statistics) , *CLUSTER analysis (Statistics) , *BIOMETRIC identification , *MATHEMATICAL models , *DATA analysis - Abstract
• A framework has been developed for extracting the most consistent bits from iris features (IrisCodes). • The proposed model is based on the incorporation of some novel features such as use of iris masks and optimized scoring based on 1D cluster formation. • The results obtained on four benchmark iris databases (CASIAv3 Interval, IITD, MMU2, and CASIAv4 Thousand) demonstrate significant improvements over the baseline EER (%). Iris-based biometric systems are widely considered as one of the most accurate forms for authenticating individual identities. Features from an iris image are commonly represented as a sequence of bits, known as IrisCodes. The work in this paper focuses on locating and subsequently extracting the most consistent bit-locations from these binary iris features. We achieve this objective by initially constructing a Matching-Code vector from some specifically designated training IrisCodes, and subsequently forming a series of 1D clusters in them. Every cluster element is then assigned a score in the range [0, 1] on the basis of two cluster properties: the size of the cluster it belongs to and its distance from the center of the cluster. We term this cumulative score the Significance Index S(b) for a cluster element b. Finally, we select those locations which correspond to the highest scores for every IrisCode. We have tested our approach for four benchmark iris databases (CASIAv3-Interval, CASIAv4-Thousand, IIT Delhi and MMU2) while varying the number of extracted bit-locations from 50 to 300. Our empirical results exhibit significant improvements over the baseline results regarding both the consistency of the extracted bit-locations, as well as the overall performance of the resulting biometric system. [ABSTRACT FROM AUTHOR]
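The scoring step, assigning each cluster element a Significance Index that grows with cluster size and shrinks with distance from the cluster center, can be sketched as follows. The abstract does not specify how the two properties are weighted, so an equal-weight combination is assumed:

```python
def significance_index(clusters):
    """Assign each 1-D cluster element b a score S(b) in [0, 1] combining
    (assumed equal weights) the relative size of its cluster and its
    proximity to the cluster centre."""
    n_total = sum(len(c) for c in clusters)
    scores = {}
    for cluster in clusters:
        centre = sum(cluster) / len(cluster)
        spread = max(abs(b - centre) for b in cluster) or 1.0  # singleton guard
        for b in cluster:
            size_term = len(cluster) / n_total
            proximity_term = 1.0 - abs(b - centre) / spread
            scores[b] = 0.5 * (size_term + proximity_term)
    return scores
```

Bit-locations whose scores rank highest would then be kept as the most consistent IrisCode bits.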
- Published
- 2019
- Full Text
- View/download PDF
23. Fusing information from tickets and alerts to improve the incident resolution process.
- Author
-
Salah, Saeed, Maciá-Fernández, Gabriel, and Díaz-Verdejo, Jesús E.
- Subjects
- *
DATA fusion (Statistics) , *STATISTICAL correlation , *SEMANTIC integration (Computer systems) , *TELECOMMUNICATION systems ,INFORMATION technology personnel - Abstract
In the context of network incident monitoring, alerts are useful notifications that provide IT management staff with information about incidents. They are usually triggered in an automatic manner by network equipment and monitoring systems, thus containing only technical information available to the systems that are generating them. On the other hand, ticketing systems play a different role in this context. Tickets represent the business point of view of incidents. They are usually generated by human intervention and contain enriched semantic information about ongoing and past incidents. In this article, our main hypothesis is that incorporating tickets information into the alert correlation process will be beneficial to the incident resolution life-cycle in terms of accuracy, timing, and overall incident’s description. We propose a methodology to validate this hypothesis and suggest a solution to the main challenges that appear. The proposed correlation approach is based on the time alignment of the events (alerts and tickets) that affect common elements in the network. For this we use real alert and ticket datasets obtained from a large telecommunications network. The results have shown that using ticket information enhances the incident resolution process, mainly by reducing and aggregating a higher percentage of alerts compared with standard alert correlation systems that only use alerts as the main source of information. Finally, we also show the applicability and usability of this model by applying it to a case study where we analyze the performance of the management staff. [ABSTRACT FROM AUTHOR]
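The core of the proposed correlation, time-aligning alerts and tickets that affect the same network element, can be sketched with hypothetical event tuples. The field layout and the slack window around a ticket's lifetime are assumptions for illustration:

```python
def correlate(alerts, tickets, slack=300):
    """Attach each alert (element, timestamp) to every ticket
    (ticket_id, element, open_time, close_time) that covers the same
    network element, allowing `slack` seconds of misalignment around
    the ticket's lifetime. Unmatched alerts are returned separately."""
    linked = {t[0]: [] for t in tickets}
    unmatched = []
    for element, ts in alerts:
        hit = False
        for ticket_id, t_element, opened, closed in tickets:
            if element == t_element and opened - slack <= ts <= closed + slack:
                linked[ticket_id].append((element, ts))
                hit = True
        if not hit:
            unmatched.append((element, ts))
    return linked, unmatched
```

Aggregating alerts under the ticket they belong to is what enriches the technical alert stream with the ticket's business-level incident description.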
- Published
- 2019
- Full Text
- View/download PDF
24. Unsupervised graph-based feature selection via subspace and pagerank centrality.
- Author
-
Henni, K., Mezghani, N., and Gouin-Vallerand, C.
- Subjects
- *
GRAPH theory , *FEATURE selection , *FEATURE extraction , *DATA analysis , *DATA fusion (Statistics) - Abstract
Highlights • Features that discriminate classes are linked to provide an undirected graph. • Feature relationships are defined based on unsupervised subspace learning. • PageRank is investigated to rank features according to their importance in the graph. • High-dimension, low-sample data are used to assess and compare the proposed method. • The proposed unsupervised graph-based method outperforms competitive methods. Abstract Feature selection has become an indispensable part of intelligent systems, especially with the proliferation of high dimensional data. It identifies the subset of discriminative features leading to better learning performances, i.e., higher learning accuracy, lower computational cost and significant model interpretability. This paper proposes a new efficient unsupervised feature selection method based on graph centrality and subspace learning, called UGFS ('Unsupervised Graph-based Feature Selection'). The method maps features onto an affinity graph where the relationships (edges) between feature nodes are defined by means of data points' subspace preference. A feature importance score is then computed on the entire graph using a centrality measure. For this purpose, we investigated Google's PageRank method, originally introduced to rank web pages. The proposed feature selection method has been evaluated using classification and redundancy rates measured on the selected feature subsets. Comparisons with well-known unsupervised feature selection methods, on gene/expression benchmark datasets, demonstrate the validity and the efficiency of the proposed method. [ABSTRACT FROM AUTHOR]
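The feature-ranking step, PageRank centrality on the feature affinity graph, can be sketched with a plain power-iteration implementation over an adjacency matrix; building the graph from subspace preferences is omitted, so the input graph here is assumed to be given:

```python
def pagerank(adj, d=0.85, iters=100):
    """Power iteration for PageRank scores over an adjacency matrix,
    where adj[j][i] = 1 denotes an edge from node j to node i.
    Dangling nodes (no outgoing edges) spread their rank uniformly."""
    n = len(adj)
    rank = [1.0 / n] * n
    out_deg = [sum(row) for row in adj]
    for _ in range(iters):
        dangling = sum(rank[j] for j in range(n) if out_deg[j] == 0)
        rank = [
            (1 - d) / n
            + d * (dangling / n
                   + sum(rank[j] / out_deg[j]
                         for j in range(n) if out_deg[j] and adj[j][i]))
            for i in range(n)
        ]
    return rank
```

Features would then be ranked by their scores, and the top-scoring subset retained.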
- Published
- 2018
- Full Text
- View/download PDF
25. A multimodal fusion based framework to reinforce IDS for securing Big Data environment using Spark.
- Author
-
Donkal, Gita and Verma, Gyanendra K.
- Subjects
- *
BIG data , *DATA security , *DATA fusion (Statistics) , *DATA analysis , *ELECTRONIC data processing - Abstract
Securing Big Data has become one of the major issues of the rapidly advancing computing world, where data analysis plays an integral role, as it helps data analysts figure out the interests and detailed information of organizational and industrial assets. Acts like cyber espionage and data theft lead to the inappropriate use of data. In order to detect malicious content, we propose a model that ensures the security of heterogeneous data residing on commodity hardware. To verify the correctness of our model, the NSL KDD Cup 99 (Knowledge Data Discovery) dataset is used, which has been employed by various researchers working on Intrusion Detection Systems (IDS). We incorporate decision-based majority-voting multi-modal fusion that combines the results of different classifiers and facilitates better performance in terms of accuracy, detection rate and false alarm rate. Moreover, NSGA-II (Non-dominated Sorting Genetic Algorithm) is used to select the most promising features. Additionally, to reduce the computational complexity, which is again a crucial aspect while processing Big Data, we incorporate the concepts of Hadoop MapReduce and Spark to ensure fast processing of Big Data in a parallel computational environment. Our proposed model achieves 92.03% accuracy, a 99.38% detection rate and a testing time of 0.32 seconds. Additionally, we achieve advantages in terms of accuracy and testing time over existing techniques that use IDS as a security mechanism. [ABSTRACT FROM AUTHOR]
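The decision-level fusion step, majority voting over per-classifier labels, together with the metrics the abstract reports, can be sketched as follows (the label names are hypothetical):

```python
from collections import Counter

def majority_vote(per_sample_labels):
    """Decision-based majority-voting fusion: each inner list holds one
    sample's labels as predicted by the individual classifiers."""
    return [Counter(labels).most_common(1)[0][0] for labels in per_sample_labels]

def detection_metrics(predicted, actual, positive="attack"):
    """Accuracy, detection rate (recall on the attack class) and
    false alarm rate (fraction of normal traffic flagged as attack)."""
    tp = sum(p == positive == a for p, a in zip(predicted, actual))
    tn = sum(p != positive and a != positive for p, a in zip(predicted, actual))
    fp = sum(p == positive != a for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    accuracy = (tp + tn) / len(actual)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return accuracy, detection_rate, false_alarm_rate
```

In a Spark deployment, the voting would run per-partition over the classifier outputs; the logic itself is unchanged.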
- Published
- 2018
- Full Text
- View/download PDF
26. Fusing Landsat and MODIS data to retrieve multispectral information from fire-affected areas over tropical savannah environments in the Brazilian Amazon.
- Author
-
Borini Alves, Daniel, Montorio Llovería, Raquel, Pérez-Cabello, Fernando, and Vlassova, Lidia
- Subjects
- *
FOREST fires , *MODIS (Spectroradiometer) , *LANDSAT satellites , *DATA analysis , *DATA fusion (Statistics) - Abstract
In this study, the combination of surface reflectance products from the Terra Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat Enhanced Thematic Mapper Plus (ETM+) sensors is explored through the Flexible Spatiotemporal DAta Fusion (FSDAF) algorithm within the framework of forest fire studies over tropical savannah environments. Thus, 60 fusion-derived images were generated from four spectral bands [red, near-infrared, shortwave infrared (SWIR1 and SWIR2)] and six spectral indices [normalized difference vegetation index, normalized difference moisture index, global environment monitoring index, soil-adjusted vegetation index, normalized burn ratio (NBR), and differenced normalized burn ratio (dNBR)] over two selected study sites. For all fusion processes performed, the actual Landsat images for the corresponding dates are available, which supports validation of the blended images. Additionally, integration of blended spectral indices in the immediate post-fire evaluation and the generation of fire severity were analysed. The blended bands presented correlation and Structure Similarity Index Measure (SSIM) values that were consistently higher than 0.819 and root mean square error values of less than 0.027, which confirms the good accuracy levels obtained from the model. Similar correlation and SSIM accuracy levels were observed in the blended indices assessment for both study sites, which enables their values to be integrated into an analysis of the immediate post-fire date. However, fire severity mapping from fused images needs to be carefully implemented since the dNBR index is generally less accurate than the other blended indices. FSDAF fusion proved to be a useful alternative for retrieving multispectral information from savannah environments affected by fires. [ABSTRACT FROM AUTHOR]
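The burn-related indices blended in the study follow standard definitions; for example, the normalized burn ratio and its temporal difference can be written as:

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio from near-infrared and shortwave-infrared-2
    surface reflectance values."""
    return (nir - swir2) / (nir + swir2)

def dnbr(pre_nir, pre_swir2, post_nir, post_swir2):
    """Differenced NBR: pre-fire NBR minus post-fire NBR; higher values
    indicate more severe burning (NIR drops and SWIR2 rises after fire)."""
    return nbr(pre_nir, pre_swir2) - nbr(post_nir, post_swir2)
```
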
- Published
- 2018
- Full Text
- View/download PDF
27. Chemometrics in analytical chemistry—part II: modeling, validation, and applications.
- Author
-
Brereton, Richard G., Jansen, Jeroen, Lopes, João, Marini, Federico, Pomerantsev, Alexey, Rodionova, Oxana, Roger, Jean Michel, Walczak, Beata, and Tauler, Romà
- Subjects
- *
CHEMOMETRICS , *ANALYTICAL chemistry , *DATA analysis , *DATA fusion (Statistics) , *CALIBRATION - Abstract
The contribution of chemometrics to important stages throughout the entire analytical process such as experimental design, sampling, and explorative data analysis, including data pretreatment and fusion, was described in the first part of the tutorial “Chemometrics in analytical chemistry.” This is the second part of a tutorial article on chemometrics which is devoted to the supervised modeling of multivariate chemical data, i.e., to the building of calibration and discrimination models, their quantitative validation, and their successful applications in different scientific fields. This tutorial provides an overview of the popularity of chemometrics in analytical chemistry. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
28. In-move aligned SINS/GNSS system using recurrent wavelet neural network (RWNN)-based integration scheme.
- Author
-
Rafatnia, S., Nourmohammadi, H., Keighobadi, J., and Badamchizadeh, M.A.
- Subjects
- *
AUTONOMOUS vehicles , *GLOBAL Positioning System , *MEMS resonators , *DATA fusion (Statistics) , *DATA analysis - Abstract
Abstract Advances in micro-electro mechanical system (MEMS) technology bring about revolutionary changes in autonomous vehicle navigation. As a new development, the strap-down inertial navigation system (SINS) is effectively combined with the global navigation satellite system (GNSS) to construct an integrated SINS/GNSS system. However, time-growing navigation error is the main challenge of using a MEMS-grade inertial measurement unit (IMU) in the SINS/GNSS system. Unaccounted-for inertial sensor errors cause a rapid degradation in the overall performance of low-cost SINSs. This paper aims to enhance the long-term performance of a low-cost MEMS-grade SINS/GNSS navigation system. A new integration scheme is presented for the in-move aligned SINS/GNSS system. Un-modeled nonlinearities in the SINS dynamics as well as error uncertainties in the measurements of the MEMS-grade IMU motivate using a robust data fusion algorithm for the proposed integration scheme. Considering these facts, a new recurrent wavelet neural network (RWNN)-based algorithm is designed for data fusion in the proposed in-move aligned SINS/GNSS system. Several vehicular field tests have been carried out to assess the long-term performance and accuracy of the proposed navigation algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
29. (Un)Conditional Sample Generation Based on Distribution Element Trees.
- Author
-
Meyer, Daniel W.
- Subjects
- *
BIG data , *STATISTICAL bootstrapping , *DATA analysis , *DATA fusion (Statistics) , *COMPUTATIONAL statistics - Abstract
Recently, distribution element trees (DETs) were introduced as an accurate and computationally efficient method for density estimation. In this work, we demonstrate that the DET formulation promotes an easy and inexpensive way to generate random samples similar to a smooth bootstrap. These samples can be generated unconditionally, but also, without further complications, conditionally using available information about certain probability-space components. This article is accompanied by the R codes that were used to produce all simulation results. Supplementary material for this article is available online. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
30. Regression Modeling and File Matching Using Possibly Erroneous Matching Variables.
- Author
-
Dalzell, Nicole M. and Reiter, Jerome P.
- Subjects
- *
DATA fusion (Statistics) , *STATISTICAL matching , *MULTIPLE imputation (Statistics) , *REGRESSION analysis , *DATA analysis - Abstract
Many analyses require linking records from two databases comprising overlapping sets of individuals. In the absence of unique identifiers, the linkage procedure often involves matching on a set of categorical variables, such as demographics, common to both files. Typically, however, the resulting matches are inexact: some cross-classifications of the matching variables do not generate unique links across files. Further, the variables used for matching can be subject to reporting errors, which introduce additional uncertainty in analyses. We present a Bayesian file matching methodology designed to estimate regression models and match records simultaneously when categorical variables used for matching are subject to errors. The method relies on a hierarchical model that includes (1) the regression of interest involving variables from the two files given a vector indicating the links, (2) a model for the linking vector given the true values of the variables used for matching, (3) a model for reported values of the variables used for matching given their true values, and (4) a model for the true values of the variables used for matching. We describe algorithms for sampling from the posterior distribution of the model. We illustrate the methodology using artificial data and data from education records in the state of North Carolina. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
31. DECHADE: DEtecting slight Changes with HArd DEcisions in Wireless Sensor Networks.
- Author
-
Ciuonzo, D. and Salvo Rossi, P.
- Subjects
- *
DATA fusion (Statistics) , *DATA analysis , *WIRELESS sensor networks , *INTERNET of things , *BANDWIDTH allocation , *BINOMIAL distribution - Abstract
This paper focuses on the problem of change detection through a Wireless Sensor Network (WSN) whose nodes report only binary decisions (on the presence/absence of a certain event to be monitored), due to bandwidth/energy constraints. The resulting problem can be modelled as testing the equality of samples drawn from independent Bernoulli probability mass functions, when the bit probabilities under both hypotheses are not known. Both One-Sided (OS) and Two-Sided (TS) tests are considered, with reference to: (i) identical bit probability (a homogeneous scenario), (ii) different per-sensor bit probabilities (a non-homogeneous scenario) and (iii) regions with identical bit probability (a block-homogeneous scenario) for the observed samples. The goal is to provide a systematic framework collecting a plethora of viable detectors (designed via theoretically founded criteria) which can be used for each instance of the problem. Finally, verification of the derived detectors in two relevant WSN-related problems is provided to show the appeal of the proposed framework. [ABSTRACT FROM AUTHOR]
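The homogeneous two-sided instance of the problem, testing whether two sets of binary sensor decisions share the same bit probability, can be illustrated with the textbook two-proportion z-test. This is only a classical baseline; the paper derives its own family of detectors:

```python
from math import erf, sqrt

def two_proportion_z(k1, n1, k2, n2):
    """Two-sided z-test of H0: p1 == p2 for k1 successes out of n1 Bernoulli
    trials versus k2 out of n2. Returns the statistic and two-sided p-value.
    Assumes the pooled proportion is strictly between 0 and 1."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

In the WSN setting, k1/n1 and k2/n2 would be the fractions of "event present" bits reported in two observation windows; a small p-value flags a change.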
- Published
- 2018
- Full Text
- View/download PDF
32. Optimal estimation of sensor biases for asynchronous multi-sensor data fusion.
- Author
-
Pu, Wenqiang, Liu, Ya-Feng, Yan, Junkun, Liu, Hongwei, and Luo, Zhi-Quan
- Subjects
- *
DATA fusion (Statistics) , *DATA analysis , *ESTIMATION theory , *MATHEMATICAL statistics , *STOCHASTIC processes - Abstract
An important step in a multi-sensor surveillance system is to estimate sensor biases from their noisy asynchronous measurements. This estimation problem is computationally challenging due to the highly nonlinear transformation between the global and local coordinate systems as well as the measurement asynchrony from different sensors. In this paper, we propose a novel nonlinear least squares formulation for the problem by assuming the existence of a reference target moving with an (unknown) constant velocity. We also propose an efficient block coordinate descent (BCD) optimization algorithm, with a judicious initialization, to solve the problem. The proposed BCD algorithm alternately updates the range and azimuth bias estimates by solving linear least squares problems and semidefinite programs. In the absence of measurement noise, the proposed algorithm is guaranteed to find the global solution of the problem and the true biases. Simulation results show that the proposed algorithm significantly outperforms the existing approaches in terms of the root mean square error. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
33. Efficient error detection in soft data fusion for cooperative spectrum sensing.
- Author
-
Bhatti, Dost Muhammad Saqib, Ahmed, Saleem, Saeed, Nasir, and Shaikh, Bushra
- Subjects
- *
ERROR detection (Information theory) , *INFORMATION theory , *DATA fusion (Statistics) , *DATA analysis , *WIRELESS sensor networks - Abstract
The primary objective of cooperative spectrum sensing (CSS) is to determine whether a particular spectrum is occupied by a licensed user, so that unlicensed users, called secondary users (SUs), can utilize the spectrum if it is not occupied. For CSS, all SUs report their sensing information through a reporting channel to the central base station, called the fusion center (FC). During transmission, some of the SUs are subjected to fading and shadowing, which degrades the overall performance of CSS. We propose an algorithm that applies an error detection technique to the sensing measurements of all SUs. Each SU is required to re-transmit its sensing data to the FC if an error is detected in it. The proposed algorithm combines the sensing measurements of a limited number of SUs. Using the proposed algorithm, we achieve improved probability of detection (PD) and throughput. The simulation results compare the proposed algorithm with the conventional scheme. [ABSTRACT FROM AUTHOR]
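The retransmission logic, re-sending a sensing report whenever the fusion center detects an error in it, can be sketched with a simple even-parity check standing in for the error-detecting code. The abstract does not name the code used, so parity here is purely an assumption:

```python
def parity(bits):
    """Even-parity check bit for a list of report bits."""
    return sum(bits) % 2

def receive_report(bits, check_bit):
    """FC-side check: accept the SU's sensing report if it passes parity,
    otherwise request a retransmission over the reporting channel."""
    return "ok" if parity(bits) == check_bit else "retransmit"
```

Only reports that arrive intact (possibly after retransmission) would then enter the soft data fusion, which is what protects the detection probability from reporting-channel fading.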
- Published
- 2018
- Full Text
- View/download PDF
34. Analysis of Field Return Data With Failed-But-Not-Reported Events.
- Author
-
Wang, Xin, Ye, Zhi-Sheng, Hong, Yi-Li, and Tang, Loon-Ching
- Subjects
- *
DATA analysis , *DESCRIPTIVE statistics , *DATA fusion (Statistics) , *DATA modeling , *DATA mining - Abstract
Warranty data contain valuable information on product field reliability and customer behaviors. Most previous studies on the analysis of warranty data implicitly assume that all failures within the warranty period are reported and recorded. However, the failed-but-not-reported (FBNR) phenomenon is quite common for a product whose price is not very high. Ignoring the FBNR phenomenon leads to an overestimate of product reliability based on field return data or an overestimate of warranty cost based on lab data or tracking data. Being an indicator of customer satisfaction, the FBNR proportion provides valuable managerial insights. In this study, statistical inference for the FBNR phenomenon as well as the field lifetime distribution is described. We first propose a flexible FBNR function to model the time-dependent FBNR behavior. Then, a framework for data analysis is developed. In the framework, both semiparametric and parametric approaches are used to jointly analyze warranty claim data and supplementary tracking data from a follow-up of selected customers. The FBNR problem in the tracking data is minimal, and thus these data can be used to effectively decouple the FBNR information from the warranty claim data. The proposed methods are illustrated with an example. Supplementary materials for this article are available online. [ABSTRACT FROM PUBLISHER]
- Published
- 2018
- Full Text
- View/download PDF
35. Small-area estimation in the presence of area-level correlated responses.
- Author
-
Bartoli, Luca, Pagliarella, Maria Chiara, Russo, Carlo, and Salvatore, Renato
- Subjects
- *
SPATIAL ability , *MATHEMATICAL domains , *COMPUTER simulation , *DATA analysis , *DATA fusion (Statistics) - Abstract
The Fay-Herriot area-level model for correlated response data is augmented with a between-groups-of-domains effect. Correlated-response parameters of small-area estimates no longer need the assumption of spatial contiguity. A simulation shows that area-level correlated-response observations increase the efficiency of the estimates, but do not reduce the biases. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
36. Hybrid Orientation Based Human Limbs Motion Tracking Method.
- Author
-
Glonek, Grzegorz and Wojciechowski, Adam
- Subjects
- *
MOTION detectors , *KINECT (Motion sensor) , *GESTURE controlled interfaces (Computer systems) , *DATA fusion (Statistics) , *DATA analysis - Abstract
One of the key technologies that lies behind human-machine interaction and human motion diagnosis is limb motion tracking. To be efficient, limb tracking must estimate a precise and unambiguous position of each tracked human joint and the resulting body-part pose. In recent years, body pose estimation has become very popular and broadly available to home users because of easy access to cheap tracking devices. Their robustness can be improved by fusing data from different tracking modes. The paper defines a novel approach, orientation-based data fusion, instead of the position-based approach dominating the literature, for two classes of tracking devices: depth sensors (i.e., Microsoft Kinect) and inertial measurement units (IMU). A detailed analysis of their working characteristics allowed the elaboration of a new method that fuses limb orientation data from both devices more precisely and compensates for their imprecisions. The paper presents a series of experiments that verified the method's accuracy. This novel approach improved on the precision of position-based joint tracking, the methods dominating the literature, by up to 18%. [ABSTRACT FROM AUTHOR]
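Fusing a drift-prone IMU orientation with an absolute but noisy depth-sensor orientation is commonly done with a complementary filter; the sketch below shows that standard baseline for a single joint angle, not the paper's own (more elaborate) fusion rule:

```python
def complementary_filter(prev_angle, gyro_rate, dt, kinect_angle, alpha=0.98):
    """One update step fusing IMU angular-rate integration (smooth but
    drifting) with a depth-sensor (Kinect) angle estimate (absolute but
    noisy) for one joint. alpha close to 1 trusts the gyro short-term;
    (1 - alpha) pulls the estimate toward the Kinect reading,
    cancelling gyro drift over time."""
    gyro_angle = prev_angle + gyro_rate * dt
    return alpha * gyro_angle + (1.0 - alpha) * kinect_angle
```

Repeated updates make an initial gyro-induced offset decay geometrically toward the depth-sensor reference.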
- Published
- 2017
- Full Text
- View/download PDF
37. Scene Recognition for Indoor Localization Using a Multi-Sensor Fusion Approach.
- Author
-
Mengyun Liu, Ruizhi Chen, Deren Li, Yujin Chen, Guangyi Guo, Zhipeng Cao, and Yuanjin Pan
- Subjects
- *
DATA analysis , *WIRELESS localization , *DATA fusion (Statistics) , *GLOBAL Positioning System , *INDOOR positioning systems - Abstract
After decades of research, there is still no solution for indoor localization like the GNSS (Global Navigation Satellite System) solution for outdoor environments. The major reasons for this phenomenon are the complex spatial topology and RF transmission environment. To deal with these problems, an indoor scene constrained method for localization is proposed in this paper, which is inspired by the visual cognition ability of the human brain and the progress in the computer vision field regarding high-level image understanding. Furthermore, a multi-sensor fusion method is implemented on a commercial smartphone including cameras, WiFi and inertial sensors. Compared to former research, the camera on a smartphone is used to "see" which scene the user is in. With this information, a particle filter algorithm constrained by scene information is adopted to determine the final location. For indoor scene recognition, we take advantage of deep learning that has been proven to be highly effective in the computer vision community. For the particle filter, both WiFi and magnetic field signals are used to update the weights of particles. Similar to other fingerprinting localization methods, there are two stages in the proposed system, offline training and online localization. In the offline stage, an indoor scene model is trained by Caffe (one of the most popular open source frameworks for deep learning) and a fingerprint database is constructed by user trajectories in different scenes. To reduce the volume requirement of training data for deep learning, a fine-tuned method is adopted for model training. In the online stage, a camera in a smartphone is used to recognize the initial scene. Then a particle filter algorithm is used to fuse the sensor data and determine the final location. To prove the effectiveness of the proposed method, an Android client and a web server are implemented. The Android client is used to collect data and locate a user.
The web server is developed for indoor scene model training and communication with an Android client. To evaluate the performance, comparison experiments are conducted and the results demonstrate that a positioning accuracy of 1.32 m at 95% is achievable with the proposed solution. Both positioning accuracy and robustness are enhanced compared to approaches without scene constraint including commercial products such as IndoorAtlas. [ABSTRACT FROM AUTHOR]
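The online weight update, reweighting particles by how well the WiFi and magnetic measurements match each particle's predicted readings, can be sketched under a Gaussian likelihood assumption. The predicted-signal function and noise level below are hypothetical:

```python
from math import exp

def update_weights(particles, weights, predict, measurement, sigma):
    """Multiply each particle's weight by a Gaussian likelihood of the
    observed fingerprint value given that particle's predicted value,
    then renormalize. Called once per signal source (e.g. once for a
    WiFi RSS reading, once for the magnetic field magnitude)."""
    new = [
        w * exp(-((predict(p) - measurement) ** 2) / (2.0 * sigma ** 2))
        for p, w in zip(particles, weights)
    ]
    total = sum(new)
    return [w / total for w in new]
```

The scene constraint from the camera would enter before this step, by restricting where particles are spawned; the weight update itself is the standard fingerprinting likelihood.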
- Published
- 2017
- Full Text
- View/download PDF
38. Heterogeneous Data Fusion Method to Estimate Travel Time Distributions in Congested Road Networks.
- Author
-
Chaoyang Shi, Bi Yu Chen, Lam, William H. K., and Qingquan Li
- Subjects
- *
DATA analysis , *ESTIMATION theory , *TRAFFIC congestion , *TRAFFIC engineering , *DATA fusion (Statistics) - Abstract
Travel times in congested urban road networks are highly stochastic. Provision of travel time distribution information, including both mean and variance, can be very useful for travelers to make reliable path choice decisions to ensure higher probability of on-time arrival. To this end, a heterogeneous data fusion method is proposed to estimate travel time distributions by fusing heterogeneous data from point and interval detectors. In the proposed method, link travel time distributions are first estimated from point detector observations. The travel time distributions of links without point detectors are imputed based on their spatial correlations with links that have point detectors. The estimated link travel time distributions are then fused with path travel time distributions obtained from the interval detectors using Dempster-Shafer evidence theory. Based on fused path travel time distribution, an optimization technique is further introduced to update link travel time distributions and their spatial correlations. A case study was performed using real-world data from Hong Kong and showed that the proposed method obtained accurate and robust estimations of link and path travel time distributions in congested road networks. [ABSTRACT FROM AUTHOR]
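Given a fused path travel time distribution with known mean and variance, the probability of on-time arrival that motivates the work follows directly under a normal assumption. Note the paper fuses distributions via Dempster-Shafer evidence theory; the normal form here is only an illustration of how travelers would use the fused mean and variance:

```python
from math import erf, sqrt

def on_time_probability(mean, std, time_budget):
    """P(travel time <= time_budget) for a path travel time assumed to be
    normally distributed with the fused mean and standard deviation."""
    return 0.5 * (1.0 + erf((time_budget - mean) / (std * sqrt(2.0))))
```

A traveler comparing two paths would prefer the one with the higher probability of arriving within the same budget, even if its mean travel time is slightly worse.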
- Published
- 2017
- Full Text
- View/download PDF
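The Dempster-Shafer combination step described in the abstract can be sketched as below; the travel-time bands and mass values are hypothetical illustrations, not the paper's data:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of hypotheses -> mass)
    with Dempster's rule, normalizing away conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    norm = 1.0 - conflict
    return {s: m / norm for s, m in combined.items()}

# Hypothetical evidence on which travel-time band (minutes) a path falls in:
point_evidence = {frozenset({"10-15", "15-20"}): 0.7, frozenset({"20-25"}): 0.3}
interval_evidence = {frozenset({"15-20"}): 0.6, frozenset({"10-15", "20-25"}): 0.4}
fused = dempster_combine(point_evidence, interval_evidence)
```

After combination, the fused masses concentrate on the bands both detector types support, which is the basis for the paper's subsequent distribution update.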
39. Development and evaluation of a lookup-table-based approach to data fusion for seasonal wetlands monitoring: An integrated use of AMSR series, MODIS, and Landsat.
- Author
-
Mizuochi, Hiroki, Hiyama, Tetsuya, Ohta, Takeshi, Fujioka, Yuichiro, Kambatuku, Jack R., Iijima, Morio, and Nasahara, Kenlo N.
- Subjects
- *
LANDSAT satellites , *DATA fusion (Statistics) , *DATA analysis , *ARTIFICIAL satellites , *MODIS (Spectroradiometer) - Abstract
Broad scale monitoring of inland waters is essential to research on carbon and water cycles, and for application in the monitoring of disasters including floods and droughts on various spatial and temporal scales. Satellite remote sensing using spatiotemporal data fusion (STF) has recently attracted attention as a way of simultaneously describing spatial heterogeneity and tracking the temporal variability of inland waters. However, existing STF approaches have limitations in describing abrupt temporal changes, integrating “dissimilar” datasets (i.e., fusions between microwave and optical data), and compiling long-term, frequent STF datasets. To overcome these limitations, in this study we developed and evaluated a lookup table (LUT)-based STF, termed database unmixing (DBUX), using multiple types of satellite data (AMSR series, MODIS, and Landsat), and applied it to semi-arid seasonal wetlands in Namibia. The results show that DBUX is: 1) flexible in integrating optical data (MODIS or Landsat) with microwave (AMSR series) and seasonal (day of year) information; 2) able to generate long-term, frequent Landsat-like datasets; and 3) more reliable than an existing approach (spatial and temporal adaptive reflectance fusion model; STARFM) for tracking dynamic temporal variations in seasonal wetlands. Water maps retrieved from the resulting STF dataset for the wetlands had a 30-m spatial resolution and a temporal frequency of 1 or 2 days, and the dataset covered from 2002 to 2015. The time series water maps accurately described both seasonal and interannual changes in the wetlands, and could act as a basis for understanding the hydrological features of the region. Further studies are required to enable application of DBUX in other regions, and for other landscapes with different satellite sensor combinations. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
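A minimal sketch of the lookup-table idea behind DBUX, assuming a simplified key of (microwave class, day-of-year bin) and scalar reflectance values; the actual method fuses full AMSR/MODIS/Landsat imagery:

```python
from collections import defaultdict

def build_lut(coarse_keys, fine_values):
    """LUT-based fusion sketch: index fine-resolution observations by a
    coarse-data key (e.g., microwave class + day-of-year bin) and average
    the fine values observed under each key."""
    sums = defaultdict(lambda: [0.0, 0])
    for key, value in zip(coarse_keys, fine_values):
        sums[key][0] += value
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

def predict(lut, key, fallback=None):
    """Predict a fine-resolution value on a date with only coarse data."""
    return lut.get(key, fallback)

# Hypothetical keys: (microwave wetness class, DOY bin)
lut = build_lut([("wet", 1), ("wet", 1), ("dry", 2)], [0.2, 0.4, 0.8])
```

Dates with only coarse observations are then filled from the table, which is how a LUT approach can produce frequent Landsat-like time series.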
40. Secure approximation of edit distance on genomic data.
- Author
-
Momin Al Aziz, Md, Alhadidi, Dima, and Mohammed, Noman
- Subjects
- *
GENETIC engineering , *NUCLEOTIDE sequencing , *APPROXIMATION theory , *DATA analysis , *DATA fusion (Statistics) - Abstract
Background: Edit distance is a well-established metric that quantifies how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity, as it captures the relevant requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences is highly important. As these genomic sequences are unique and can identify an individual, they cannot be shared in plaintext. Methods: In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations. Results: Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method. Conclusion: The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly, as they have complementary properties (runtime vs. accuracy) on different types of datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
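The banded alignment component the authors mention can be sketched as a standard diagonal-band edit-distance DP; this is the textbook formulation, not the paper's secure implementation (which additionally uses garbled circuits and private set intersection):

```python
def banded_edit_distance(s, t, band):
    """Edit distance restricted to a diagonal band of half-width `band`.
    Exact whenever the true distance is <= band; returns None when the
    length difference alone already exceeds the band."""
    n, m = len(s), len(t)
    if abs(n - m) > band:
        return None
    INF = float("inf")
    # Row 0: cells outside the band are treated as unreachable.
    prev = [j if j <= band else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        lo, hi = max(1, i - band), min(m, i + band)
        curr = [INF] * (m + 1)
        if i - band <= 0:
            curr[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,   # substitute / match
                          prev[j] + 1,          # delete from s
                          curr[j - 1] + 1)      # insert into s
        prev = curr
    return prev[m]
```

Restricting the DP to the band reduces work from O(nm) to O(n·band), which is why it helps with long genomic sequences.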
41. Collinear masking effect in visual search is independent of perceptual salience.
- Author
-
Jingling, Li, Lu, Yi-Hui, Cheng, Miao, and Tseng, Chia-huei
- Subjects
- *
RESEARCH methodology , *OPTIMISM , *EMOTIONS , *DATA analysis , *DATA fusion (Statistics) - Abstract
Searching for a target in a salient region should be easier than looking for one in a nonsalient region. However, we previously discovered a contradictory phenomenon in which a local target in a salient structure was more difficult to find than one in the background. The salient structure was constructed of orientation singletons aligned to each other to form a collinear structure. In the present study, we undertake to determine whether such a masking effect was a result of salience competition between a global structure and the local target. In the first 3 experiments, we increased the salience of the local target in the hope of adding to its competitive advantage and eventually eliminating the masking effect; nevertheless, the masking effect persisted. In an additional 2 experiments, we reduced the salience of the global collinear structure by altering the orientation of the background bars, and the masking effect still emerged. Our salience manipulations were validated by a control condition in which the global structure was grouped noncollinearly. In this case, increasing local target salience (e.g., onset) or reducing global distractor salience (e.g., randomized flanking orientations) effectively removed the facilitation effect of the noncollinear structure. Our data suggest that salience competition is unlikely to explain the collinear masking effect, and other mechanisms such as contour integration, border formation, or the crowding effect may be prospective candidates for further investigation. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
42. Single and double over-barrier ionization of He, He and Ne system by positron impact.
- Author
-
Yang, Aixiang, Zhang, Ning, Zhu, Binghui, Zou, Xianrong, Chen, Ximeng, and Shao, Jianxiong
- Subjects
- *
DATA analysis , *DATA fusion (Statistics) , *POSITRONS , *BETA rays , *POSITRON annihilation - Abstract
The classical over-barrier ionization model (COBI) method and trajectory calculations were utilized to simulate the ionization of He impacted by a positron. The calculated ionization cross sections of He agree with other theoretical data. Additionally, we found that the double ionization of He has a definite association with the positron-He impact. This result can explain why doubly ionized He seemed to be positron-scattered by the rest of the He in our previous study. The COBI model was also extended to study the double ionization caused by positron-Ne impacts. Our theoretical results agree with the experimental data. Graphical abstract: [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
43. Mobile agent itinerary planning for WSN data fusion: considering multiple sinks and heterogeneous networks.
- Author
-
Gavalas, Damianos, Venetis, Ioannis E., Konstantopoulos, Charalampos, and Pantziou, Grammati
- Subjects
- *
MOBILE agent systems , *DATA fusion (Statistics) , *DATA analysis , *WIRELESS sensor networks , *COMPUTER simulation - Abstract
Mobile agent (MA)-based middleware has been thoroughly investigated in the past few years as a means to address the efficiency, scalability, and reliability issues of data fusion applications on wireless sensor networks. Deriving an efficient itinerary for each MA to follow is of high importance, because itineraries determine to a large extent the overall performance of data fusion tasks. In this article, we present a novel algorithmic approach for efficient itinerary planning of MA objects undertaking data fusion tasks. We adopt a method based on iterated local search to construct the itineraries (i.e., visiting sequences of source nodes) assigned to multiple traveling MAs. We apply alternative optimization criteria which aim either at minimizing the overall energy expenditure over all derived MA itineraries or at prolonging the network lifetime. Furthermore, we propose algorithmic solutions for 2 realistic settings which have not been investigated in the past: firstly, the employment of multiple sinks that share the responsibility of MA-based data fusion tasks across the sensor field, and secondly, the consideration of heterogeneous sensor networks comprising nodes powerful enough to host the runtime environment required to execute MA code as well as 'ordinary' nodes which lack these resources. Simulation tests verify the performance gain attained by our algorithmic methods against alternative itinerary planning approaches which involve multiple MAs. Copyright © 2016 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
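The iterated-local-search construction of MA itineraries can be sketched roughly as below, using 2-opt as the local search and a random swap as the perturbation; the paper's energy model and multi-sink assignment logic are omitted:

```python
import random

def tour_cost(tour, dist):
    """Total cost of an itinerary visiting nodes in order."""
    return sum(dist[tour[i]][tour[i + 1]] for i in range(len(tour) - 1))

def two_opt(tour, dist):
    """Local search: reverse segments while doing so shortens the itinerary.
    Endpoints (e.g., the dispatching sink) are kept fixed."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 2):
            for j in range(i + 1, len(tour) - 1):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_cost(cand, dist) < tour_cost(tour, dist):
                    tour, improved = cand, True
    return tour

def iterated_local_search(nodes, dist, iters=50, seed=0):
    """ILS: perturb the best itinerary, re-optimize, keep improvements."""
    rng = random.Random(seed)
    best = two_opt(list(nodes), dist)
    for _ in range(iters):
        cand = best[:]
        i, j = sorted(rng.sample(range(1, len(cand) - 1), 2))
        cand[i], cand[j] = cand[j], cand[i]      # perturbation: swap two stops
        cand = two_opt(cand, dist)
        if tour_cost(cand, dist) < tour_cost(best, dist):
            best = cand
    return best
```

The swap perturbation lets the search escape 2-opt local optima, which is the defining ingredient of iterated local search.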
44. A Multiple Data Fusion Approach to Wheel Slip Control for Decentralized Electric Vehicles.
- Author
-
Dejun Yin, Nan Sun, Danfeng Shan, and Jia-Sheng Hu
- Subjects
- *
MULTISENSOR data fusion , *ELECTRIC vehicles , *ELECTRIC automobiles , *DATA fusion (Statistics) , *DATA analysis - Abstract
Currently, active safety control methods for cars, i.e., the antilock braking system (ABS), the traction control system (TCS), and electronic stability control (ESC), govern wheel slip control based on the wheel slip ratio, which relies on information from the non-driven wheels. However, these methods are not applicable in cases without non-driven wheels, e.g., a four-wheel decentralized electric vehicle. Therefore, this paper proposes a new wheel slip control approach based on a novel data fusion method to ensure good traction performance in any driving condition. Firstly, with the proposed data fusion algorithm, the acceleration estimator makes use of the data measured by the sensor installed near the vehicle center of mass (CM) to calculate the reference acceleration of each wheel center. Then, the wheel slip is constrained by controlling the acceleration deviation between the actual wheel and the reference wheel center. By comparison with non-control and model following control (MFC) cases in double lane change tests, the simulation results demonstrate that the proposed control method has significant anti-slip effectiveness and stabilizing control performance. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
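A toy sketch of the fusion idea: derive a reference acceleration for each wheel center from CM measurements via planar rigid-body kinematics (centripetal terms ignored), then limit torque by the acceleration deviation. The gain and limits are illustrative placeholders, not the paper's controller:

```python
def reference_wheel_accel(a_cm, yaw_accel, r_x, r_y):
    """Reference linear acceleration at a wheel center located at (r_x, r_y)
    relative to the CM, from measured CM acceleration and yaw angular
    acceleration (a_P = a_CM + alpha x r; centripetal term omitted)."""
    ax = a_cm[0] - yaw_accel * r_y
    ay = a_cm[1] + yaw_accel * r_x
    return ax, ay

def slip_limit_torque(t_cmd, a_wheel, a_ref, k=50.0, t_max=250.0):
    """Reduce commanded torque in proportion to the acceleration deviation
    between the actual wheel and its reference wheel center; a wheel
    accelerating faster than the body indicates incipient slip."""
    dev = max(0.0, a_wheel - a_ref)
    t_out = t_cmd - k * dev
    return min(max(t_out, 0.0), t_max)
```

Because the reference comes from the CM sensor rather than non-driven wheels, the scheme stays applicable to four-wheel decentralized drives.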
45. Surface and subsurface data integration and geological modelling from the Little Ice Age to the present, in the Ravenna coastal plain, northwest Adriatic Sea (Emilia-Romagna, Italy).
- Author
-
Scarelli, Frederico M., Barboza, Eduardo G., Cantelli, Luigi, and Gabbianelli, Giovanni
- Subjects
- *
DATA analysis , *DATA integration , *DATABASE management , *DATA protection , *DATA fusion (Statistics) - Abstract
New geological analysis of the Ravenna coastal plain has allowed a major update of the geological coastal model, with anthropogenic overprints removed from an area heavily affected by human actions in the last four centuries, a period in which the Little Ice Age (LIA) had a large influence on coastal dynamics. The geological build-up of the coastal plain is a natural occurrence that is important for understanding the present-day coastal build-up. However, anthropization of the coastal zone has defined the actual coastal morphology, erasing most of the surface expression needed to create an accurate coastal evolution model, such as the beach-ridge morphology in the study area. Because anthropogenic actions have removed this surface expression, geomorphological analysis alone cannot be used to construct a local coastal evolution model. Building a new model of how coastal evolution would have proceeded without human influence allows us to understand the interaction between natural forces and anthropogenic influence, which together have shaped the actual morphology of the study area from the LIA to the present day. We therefore propose an integration of data using the following: actual surface data, such as the high-resolution digital surface model (DSM) and the digital terrain model (DTM); the current database of the Emilia-Romagna Region Geological Survey, including the shapefile with the regional soil chart and the geological elements present in the study area; knowledge from previous work carried out in the area; and subsurface data from Ground Penetrating Radar (GPR), making this the first study to use a GPR survey of the coastal plain of the northwest Adriatic.
In addition, the work follows a successful case study in the southern Brazilian coastal plain that integrated previous work on that coastal plain with new methods such as GPR data acquisition. The GPR data allow corroboration of the interpretation of the coastal model constructed from the surface data. Finally, the surface geological model proposed in this work may aid by providing the following: i) a new means for local coastal geology research; ii) a methodology applicable to other barrier-lagoon systems around the world in order to update our knowledge about these systems; and iii) powerful information to support coastal managers and decision makers in constructing a long-term master plan for effective Integrated Coastal Zone Management. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
46. Data mining approach to monitoring the requirements of the job market: A case study.
- Author
-
Karakatsanis, Ioannis, AlKhader, Wala, MacCrory, Frank, Alibasic, Armin, Omar, Mohammad Atif, Aung, Zeyar, and Woon, Wei Lee
- Subjects
- *
DATA analysis , *DESCRIPTIVE statistics , *DATA binning , *DATA mining , *DATA fusion (Statistics) - Abstract
In challenging economic times, the ability to monitor trends and shifts in the job market would be hugely valuable to job-seekers, employers, policy makers and investors. To analyze the job market, researchers are increasingly turning to data science and related techniques which are able to extract underlying patterns from large collections of data. One database of particular relevance in the present context is O*NET, which is one of the most comprehensive publicly accessible databases of occupational requirements for skills, abilities and knowledge. However, by itself the information in O*NET is not enough to characterize the distribution of occupations required in a given market or region. In this paper, we suggest a data mining based approach for identifying the most in-demand occupations in the modern job market. To achieve this, a Latent Semantic Indexing (LSI) model was developed that is capable of matching job advertisements extracted from the Web with occupation description data in the O*NET database. The findings of this study demonstrate the general usefulness and applicability of the proposed method for highlighting job trends in different industries and geographical areas, identifying occupational clusters, studying changes in job context over time, and for various other research purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
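The LSI matching step can be sketched with a truncated SVD of the term-document matrix; the vocabulary and documents below are toy stand-ins for job advertisements and occupation descriptions:

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-document matrix (terms x docs): returns the
    term basis U_k, singular values s_k, and documents mapped into the
    k-dimensional latent space."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    doc_vecs = (np.diag(sk) @ Vt[:k]).T        # one row per document
    return Uk, sk, doc_vecs

def lsi_query(q_terms, Uk, sk):
    """Fold a query term vector into the latent space: q' = q^T U_k S_k^-1."""
    return q_terms @ Uk / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy term-document matrix: 4 terms x 3 "occupation descriptions".
A = np.array([[1., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.],
              [0., 1., 0.]])
Uk, sk, doc_vecs = lsi_fit(A, 2)
```

A job advertisement is then folded in with `lsi_query` and matched to the occupation whose latent vector has the highest cosine similarity.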
47. A survey of official online sources of high-quality free-of-charge geospatial data for maritime geographic information systems applications.
- Author
-
Kalyvas, Christos, Kokkos, Athanasios, and Tzouramanis, Theodoros
- Subjects
- *
GEOSPATIAL data collection & preservation , *DIGITAL preservation , *DATA analysis , *DATA fusion (Statistics) , *DATA binning - Abstract
Maritime information systems are innovative geographic information systems for study, monitoring and action-taking in maritime areas. They respond to needs in the development of intelligent systems for applications such as scientific research and safety (monitoring the global ecosystem, the atmosphere, the oceans, the biosphere, ice fields, fish populations etc.) or the support of the maritime industry and its related organizations (tracking the position of vessels in motion, providing them with safe routing etc.). For these systems to efficiently handle the complex demands made by such specialized applications, up-to-date real-world data purchased or downloaded from official, trustworthy online data sources is needed. This article examines free-of-charge geospatial data sources and discusses the various classes of available data. Several hundred resources and their available datasets were empirically tested and their quality and usefulness verified, producing a selective thesaurus. An accompanying website summarizing useful available information about the data sources and datasets also includes information that could not be included in the article. The survey, covering a wide spectrum of online information regarding up-to-date sources for genuine, valuable, real-world high-precision maritime data worldwide, is, to the best of the authors' knowledge, the only one of its kind at the time of writing. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
48. Parallel meta-blocking for scaling entity resolution over big heterogeneous data.
- Author
-
Efthymiou, Vasilis, Papadakis, George, Papastefanatos, George, Stefanidis, Kostas, and Palpanas, Themis
- Subjects
- *
DATA analysis , *DESCRIPTIVE statistics , *DATA binning , *DATA fusion (Statistics) , *MODEL validation - Abstract
Entity resolution constitutes a crucial task for many applications, but has an inherently quadratic complexity. In order to enable entity resolution to scale to large volumes of data, blocking is typically employed: it clusters similar entities into (overlapping) blocks so that it suffices to perform comparisons only within each block. To further increase efficiency, Meta-blocking is used to clean the overlapping blocks of unnecessary comparisons, increasing precision by orders of magnitude at a small cost in recall. Despite its high time efficiency, however, using Meta-blocking in practice to solve the entity resolution problem on very large datasets is still challenging: applying it to 7.4 million entities takes almost 8 full days on a modern high-end server. In this paper, we introduce scalable algorithms for Meta-blocking, exploiting the MapReduce framework. Specifically, we describe a strategy for parallel execution that explicitly targets the core concept of Meta-blocking, the blocking graph. Furthermore, we propose two more advanced strategies, aiming to reduce the overhead of data exchange. The comparison-based strategy creates the blocking graph implicitly, while the entity-based strategy is independent of the blocking graph, employing fewer MapReduce jobs with a more elaborate processing. We also introduce a load balancing algorithm that distributes the computationally intensive workload evenly among the available compute nodes. Our experimental analysis verifies the feasibility and superiority of our advanced strategies, and demonstrates their scalability to very large datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
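A serial sketch of the core meta-blocking idea (weight each entity pair by the blocks it co-occurs in, then prune light edges of the blocking graph); the paper's contribution is parallelizing this over MapReduce, which is not shown here:

```python
from collections import defaultdict
from itertools import combinations

def meta_blocking(blocks):
    """Meta-blocking sketch with CBS (common-blocks) weighting: each edge of
    the blocking graph is weighted by the number of blocks the entity pair
    shares; edges below the average weight are pruned, removing the
    comparisons least likely to be matches."""
    weights = defaultdict(int)
    for block in blocks:
        for a, b in combinations(sorted(set(block)), 2):
            weights[(a, b)] += 1
    if not weights:
        return []
    threshold = sum(weights.values()) / len(weights)   # average edge weight
    return sorted(pair for pair, w in weights.items() if w >= threshold)
```

Pairs sharing many blocks survive pruning, so the quadratic comparison workload shrinks while most true matches are retained.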
49. Multiobjective Location Routing Problem considering Uncertain Data after Disasters.
- Author
-
Chang, Keliang, Zhou, Hong, Chen, Guijing, and Chen, Huiqin
- Subjects
- *
DATA analysis , *DESCRIPTIVE statistics , *TRANSPORTATION , *DATA binning , *DATA fusion (Statistics) - Abstract
Relief distribution after large disasters plays an important role in rescue work. After disasters there is a high degree of uncertainty in, for example, the demands of affected points and the damage to paths. In this article, the demands of affected points and the velocities between two points on the paths are uncertain, and the robust optimization method is applied to deal with the uncertain parameters. This paper proposes a nonlinear location routing problem with half-time windows and three objectives. The affected points can be visited more than once. The objectives are the total cost of transportation, the satisfaction rates of the disaster nodes, and the path transport capacities, which are represented by vehicle velocities. Finally, the genetic algorithm is applied to solve a number of numerical examples, and the results show that the genetic algorithm is very stable and effective for this problem. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
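A single-objective GA sketch for a routing permutation, with order crossover and swap mutation; the paper's model additionally handles multiple objectives, uncertain demands and half-time windows, none of which appear here:

```python
import random

def genetic_route(dist, pop_size=30, gens=100, seed=1):
    """Minimal genetic algorithm for a routing permutation minimizing the
    single objective of transport cost (sum of consecutive edge costs)."""
    rng = random.Random(seed)
    n = len(dist)
    cost = lambda p: sum(dist[p[i]][p[i + 1]] for i in range(n - 1))
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        parents = pop[:pop_size // 2]            # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            # Order crossover: prefix from a, remaining genes in b's order.
            child = a[:cut] + [g for g in b if g not in a[:cut]]
            if rng.random() < 0.2:               # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = parents + children
    return min(pop, key=cost)
```

A multiobjective variant would replace the single sort key with Pareto ranking over cost, satisfaction rate and path capacity.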
50. Matrix-based dynamic updating rough fuzzy approximations for data mining.
- Author
-
Huang, Yanyong, Li, Tianrui, Luo, Chuan, Fujita, Hamido, and Horng, Shi-jinn
- Subjects
- *
DATA analysis , *DESCRIPTIVE statistics , *DATA binning , *DATA fusion (Statistics) , *META-synthesis - Abstract
In a dynamic environment, the data collected from real applications varies not only in the number of objects but also in the number of features, which results in a continuous change of knowledge over time. Static methods of updating knowledge need to recompute from scratch every time new data are added. This makes updating knowledge potentially very time-consuming, especially as the dataset grows dramatically. Calculation of approximations is one of the main mining tasks in rough set theory, like frequent pattern mining in association rules. Considering the fuzzy descriptions of decision states in the universe under a fuzzy environment, this paper aims to provide an efficient approach for computing rough approximations of fuzzy concepts in dynamic fuzzy decision systems (FDS) with simultaneous variation of objects and features. We first present a matrix-based representation of rough fuzzy approximations by a Boolean matrix associated with a matrix operator in FDS. For adding objects and features concurrently, incremental mechanisms for updating rough fuzzy approximations are introduced, and the corresponding matrix-based dynamic algorithm is developed. Unlike the static method of computing approximations by updating the whole relation matrix, our new approach partitions it into sub-matrices and updates each sub-matrix locally, utilizing the previous matrix information and the interactive information of each sub-matrix to avoid unnecessary calculations. Experimental results on six UCI datasets show that the proposed dynamic algorithm achieves significantly higher efficiency than the static algorithm and the combination of two reference incremental algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
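The matrix-based approximations and an incremental object update can be sketched as follows, assuming a reflexive Boolean relation matrix; the paper's sub-matrix partitioning and feature-addition mechanisms are omitted:

```python
def rough_fuzzy_approximations(M, mu):
    """Matrix-based lower/upper rough fuzzy approximations of a fuzzy set.
    M: Boolean relation matrix (list of lists, assumed reflexive),
    mu: fuzzy membership vector over the universe.
    lower_i = min{ mu_j : M_ij = 1 },  upper_i = max{ mu_j : M_ij = 1 }."""
    n = len(M)
    lower = [min(mu[j] for j in range(n) if M[i][j]) for i in range(n)]
    upper = [max(mu[j] for j in range(n) if M[i][j]) for i in range(n)]
    return lower, upper

def add_object(M, new_row):
    """Incremental update sketch: extend the relation matrix in place for one
    new object (new_row has length n+1, symmetric relation assumed) instead
    of recomputing the whole matrix from scratch."""
    for row, r in zip(M, new_row[:-1]):
        row.append(r)
    M.append(list(new_row))
    return M
```

Only the new row and column are touched on insertion, which is the essence of the incremental approach over the static recomputation.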