Descriptor: "Data density" / Topic: computer - Searchworks@Jio Institute Digital Library Search Results

1. Drug-target affinity prediction using applicability domain based on data density

Author: Masahito Ohue and Shunya Sugita
Subjects: Data density, Training set, Drug candidate, Drug discovery, Computer science, Drug target, Binding potential, computer.software_genre, Data modeling, chemistry.chemical_compound, chemistry, Chemogenomics, Data mining, computer, Applicability domain
Abstract: In the pursuit of research and development of drug discovery, the computational prediction of the target affinity of a drug candidate is useful for screening compounds at an early stage and for verifying the binding potential to an unknown target. The chemogenomics-based method has attracted increased attention as it integrates information pertaining to the drug and target to predict drug-target affinity (DTA). However, the compound and target spaces are vast, and without sufficient training data, proper DTA prediction is not possible. If a DTA prediction is made in this situation, it will potentially lead to false predictions. In this study, we propose a DTA prediction method that can advise whether/when there are insufficient samples in the compound/target spaces based on the concept of the applicability domain (AD) and the data density of the training dataset. AD indicates a data region in which a machine learning model can make reliable predictions. By preclassifying the samples to be predicted by the constructed AD into those within (In-AD) and those outside the AD (Out-AD), we can determine whether a reasonable prediction can be made for these samples. The results of the evaluation experiments based on the use of three different public datasets showed that the AD constructed by the k-nearest neighbor (k-NN) method worked well, i.e., the prediction accuracy of the samples classified by the AD as Out-AD was low, while the prediction accuracy of the samples classified by the AD as In-AD was high.
Published: 2021
Full Text: View/download PDF
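A k-NN applicability domain of the kind described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation. It assumes each drug-target pair has already been encoded as a numeric feature vector (for example, concatenated compound and target descriptors). The class name `KnnApplicabilityDomain`, the default `k=5`, and the 95th-percentile threshold on the training samples' own k-NN distances are hypothetical choices.

```python
import numpy as np


def knn_mean_dist(ref, query, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest rows in ref."""
    d = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)
    d = np.sort(d, axis=1)
    # When query is ref itself, column 0 holds the zero self-distance; skip it.
    start = 1 if exclude_self else 0
    return d[:, start:start + k].mean(axis=1)


class KnnApplicabilityDomain:
    """In-AD if a sample is at least as close to the training data as most training samples are."""

    def __init__(self, k=5, percentile=95.0):
        self.k = k
        self.percentile = percentile

    def fit(self, X_train):
        self.X_train_ = np.asarray(X_train, dtype=float)
        train_scores = knn_mean_dist(self.X_train_, self.X_train_, self.k, exclude_self=True)
        # Threshold: the chosen percentile of the training samples' own k-NN distances.
        self.threshold_ = np.percentile(train_scores, self.percentile)
        return self

    def in_ad(self, X_new):
        scores = knn_mean_dist(self.X_train_, np.asarray(X_new, dtype=float), self.k)
        return scores <= self.threshold_


# Usage: a dense 10 x 10 grid of hypothetical "training" samples.
grid = np.array([(0.1 * i, 0.1 * j) for i in range(10) for j in range(10)])
ad = KnnApplicabilityDomain(k=5).fit(grid)
flags = ad.in_ad([[0.45, 0.45], [10.0, 10.0]])   # -> [True, False]
```

A sample far from every training point gets a large mean k-NN distance and is classified as Out-AD, so its affinity prediction would be flagged rather than trusted.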

2. RADNN: Robust to Imperceptible Adversarial Attacks Deep Neural Network

Author: Plamen Angelov and Eduardo Soares
Subjects: Data density, Adversarial system, Artificial neural network, Computer science, Order (business), Mechanism (biology), Data_MISCELLANEOUS, Data mining, Data patterns, computer.software_genre, computer
Abstract: This paper presents the RADNN algorithm. RADNN is an algorithm that is robust to imperceptible adversarial attacks; it uses the concepts of data density and similarity to detect attacks in real time. Unlike traditional deep learning models, which must be trained on the attacks in order to detect them, RADNN has a mechanism that detects changes in data patterns. To evaluate the proposed method, we considered PerC attacks and 1,000 images from the ImageNet dataset. RADNN correctly identified 97.2% of the attacks.
Published: 2021
Full Text: View/download PDF

3. Long-term storage of information about nuclear waste. 100 000 years and beyond

Author: Martin Kunze
Subjects: Data density, Research program, Civilization, Order (exchange), Computer science, media_common.quotation_subject, Sustainability, Radioactive waste, Computer security, computer.software_genre, computer, media_common, Term (time)
Abstract: In the 20th century, intertwined with the topic of the “final nuclear repository”, the ethical requirement to warn about the danger of radioactive radiation over a period of 1 million years was debated. In the meantime, a narrative is beginning to gain acceptance, also among the public, that a repository should be described in terms of content and location in such a way that future generations are capable of making their own informed decisions. After all, nuclear waste consists of materials ranging from dangerous to precious. From the concept of sustainability and the responsible use of resources comes the demand not to isolate, bury and forget nuclear waste in the biosphere forever, but rather to leave information about it in such a way that, even if its transmission is interrupted, it can be reconstructed by a technically industrialized civilization. The materials that we store in the depths, especially in places where one would not expect them geologically, could represent valuable resources for future generations. The following questions arise: What time horizons are we talking about? In what form can information exist for so long? What language or symbols do we use for this? Who are the addressees? Conventional information carriers are unsuitable for these purposes: even the most durable ones, under optimal storage, have a shelf life that is orders of magnitude below the temporal safety requirements of nuclear waste repositories. In this lecture, the latest technologies and methods for long-term storage of information are introduced.
Ceramic-based data carriers. Data carriers with a durability extending to millions of years, even under the most extreme conditions. Originating from the Memory of Mankind project in Hallstatt, Austria, a research program is being carried out at the Vienna University of Technology on data carriers that, in addition to extremely long durability, also have a high data density.
Data formats. There is no guarantee that the digital formats used today will be readable in the near or distant future. Information intended for addressees thousands of years from now must therefore be recognizable as such and directly legible. Data formats must be intuitively decodable and readable. Finally, universal icons are needed for a “manual” that describes the location and contents of a nuclear waste repository to a distant technical civilization.
Published: 2021
Full Text: View/download PDF

4. Intelligent Seismic Deblending Based Deep Learning Based U-Net

Author: Y. Wang, J. Li, B. Wang, and D. Han
Subjects: Data density, Data processing, Computer science, business.industry, Deep learning, Stability (learning theory), computer.software_genre, Trial and error, Thresholding, Synthetic data, Set (abstract data type), Data mining, Artificial intelligence, business, computer
Abstract: Blended acquisition can help improve acquisition efficiency or increase data density. However, blended seismic data, which contain information from multiple sources, must first be separated before the traditional seismic data processing steps can be applied. We therefore propose a U-Net-based intelligent deblending algorithm combined with the traditional iterative strategy. The proposed method obtains the optimal parameters through self-learning, whereas the traditional method requires them to be selected by trial and error. We train and validate the U-Net on a labeled synthetic dataset and then fine-tune it with part of the labeled field data. Finally, the fine-tuned U-Net is used to deblend the remaining field data. The deblending performance is promising compared with the curvelet-transform-based thresholding method, which demonstrates the validity of the proposed algorithm in terms of deblending accuracy, stability and efficiency.
Published: 2021
Full Text: View/download PDF

5. A Crawling Method with No Parameters for Geo-social Data based on Road Maps

Author: Shohei Yokoyama, Masaharu Hirota, and Sou Ijima
Subjects: Data density, Difficult problem, Grid size, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 0211 other engineering and technologies, 021107 urban & regional planning, 02 engineering and technology, Crawling, Grid, computer.software_genre, Data mining, Focus (optics), Web crawler, computer, 021101 geological & geomatics engineering
Abstract: To analyze and visualize geo-social data, researchers must first crawl them. A conventional method for exhaustively crawling geo-social data is based on a grid: the crawler divides a specified area into a grid and uses the center coordinates of each cell to query databases through APIs. However, the grid-based method poses a difficult problem: researchers cannot estimate in advance the optimal grid size for exhaustive crawling, because it depends on the data density, which varies with the geographical characteristics of an area. We focus on the fact that geo-social data are dense along roads and therefore propose a method based on road maps to exhaustively crawl geo-social data. We demonstrated that our method can crawl geo-social data using almost the same number of queries as a crawler with an optimized grid size.
Published: 2019
Full Text: View/download PDF
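The conventional grid-based crawler described above reduces to generating one query point per cell center. The sketch below assumes planar coordinates and uses a hypothetical function name, `grid_query_points`. The actual API call is omitted. It does not implement the authors' road-map method, whose details the abstract does not give.

```python
import math


def grid_query_points(min_x, min_y, max_x, max_y, cell_size):
    """Center coordinates of every grid cell covering the bounding box.

    Coordinates are treated as planar (a simplification for longitude/latitude).
    """
    n_x = math.ceil((max_x - min_x) / cell_size)
    n_y = math.ceil((max_y - min_y) / cell_size)
    return [
        (min_x + (i + 0.5) * cell_size, min_y + (j + 0.5) * cell_size)
        for i in range(n_x)
        for j in range(n_y)
    ]


# Each center would become one API query (the query call itself is not shown).
coarse = grid_query_points(0.0, 0.0, 2.0, 2.0, 1.0)   # 4 query points
fine = grid_query_points(0.0, 0.0, 2.0, 2.0, 0.5)     # 16 query points
```

Halving `cell_size` quadruples the number of queries. This illustrates why the grid size matters. A grid that is too coarse presumably misses data in dense areas, while one that is too fine wastes queries in sparse areas. The right trade-off depends on the local data density.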

6. Recurrent neural networks for early detection of heart failure from longitudinal electronic health record data: Implications for temporal modeling with respect to time before diagnosis, data density, data quantity and data type

Author: Xiaowei Yan, Robert Chen, Walter F. Stewart, Jimeng Sun, and Kenney Ng
Subjects: Male, Time Factors, Alcohol Drinking, Early detection, 030204 cardiovascular system & hematology, computer.software_genre, Data type, Article, California, Machine Learning, 03 medical and health sciences, 0302 clinical medicine, Electronic health record, Predictive Value of Tests, Risk Factors, Medicine, Electronic Health Records, Humans, 030212 general & internal medicine, Diagnosis, Computer-Assisted, Longitudinal Studies, Data density, Heart Failure, Primary Health Care, business.industry, Vital Signs, Incidence, Smoking, Reproducibility of Results, medicine.disease, Recurrent neural network, Early Diagnosis, Heart failure, Female, Data mining, Neural Networks, Computer, Cardiology and Cardiovascular Medicine, business, computer, Temporal modeling, Volume (compression)
Abstract: Background: We determined the impact of data volume and diversity and training conditions on recurrent neural network methods compared with traditional machine learning methods. Methods and Results: Using longitudinal electronic health record data, we assessed the relative performance of machine learning models trained to detect a future diagnosis of heart failure in primary care patients. Model performance was assessed in relation to data parameters defined by the combination of different data domains (data diversity), the number of patient records in the training data set (data quantity), the number of encounters per patient (data density), the prediction window length, and the observation window length (ie, the time period before the prediction window that is the source of features for prediction). Data on 4370 incident heart failure cases and 30 132 group-matched controls were used. Recurrent neural network model performance was superior under a variety of conditions that included (1) when data were less diverse (eg, a single data domain like medication or vital signs) given the same training size; (2) as data quantity increased; (3) as density increased; (4) as the observation window length increased; and (5) as the prediction window length decreased. When all data domains were used, the performance of recurrent neural network models increased in relation to the quantity of data used (ie, up to 100% of the data). When data are sparse (ie, fewer features or low dimension), model performance is lower, but a much smaller training set size is required to achieve optimal performance compared with conditions where data are more diverse and include more features. Conclusions: Recurrent neural networks are effective for predicting a future diagnosis of heart failure given sufficient training set size. Model performance appears to continue to improve in direct relation to training set size.
Published: 2019

7. Early Detection of Heart Failure Using Electronic Health Records

Author: Sanjoy Dey, Kenney Ng, Steven R. Steinhubl, Christopher DeFilippi, and Walter F. Stewart
Subjects: Male, Time Factors, 02 engineering and technology, 030204 cardiovascular system & hematology, Health records, computer.software_genre, Machine Learning, 0302 clinical medicine, Risk Factors, Statistics, 0202 electrical engineering, electronic engineering, information engineering, Data Mining, Electronic Health Records, 030212 general & internal medicine, Data diversity, Data density, Incidence, Data domain, General Medicine, Prognosis, Area Under Curve, Female, 020201 artificial intelligence & image processing, Data mining, Cardiology and Cardiovascular Medicine, medicine.medical_specialty, Relation (database), 0206 medical engineering, Early detection, Primary care, Risk Assessment, Data type, Article, Set (abstract data type), 03 medical and health sciences, Predictive Value of Tests, medicine, Humans, Intensive care medicine, Practical implications, Heart Failure, business.industry, Pennsylvania, medicine.disease, 020601 biomedical engineering, Early Diagnosis, Logistic Models, ROC Curve, Case-Control Studies, Heart failure, Observational study, business, computer
Abstract: Background: Using electronic health records data to predict events and the onset of diseases is increasingly common. Relatively little is known, however, about the trade-offs between data requirements and model utility. Methods and Results: We examined the performance of machine learning models trained to detect prediagnostic heart failure in primary care patients using longitudinal electronic health records data. Model performance was assessed in relation to data requirements defined by the prediction window length (time before clinical diagnosis), the observation window length (duration of observation before the prediction window), the number of different data domains (data diversity), the number of patient records in the training data set (data quantity), and the density of patient encounters (data density). A total of 1684 incident heart failure cases and 13 525 sex-, age-category-, and clinic-matched controls were used for modeling. Model performance improved as (1) the prediction window length decreased, especially when […] Conclusions: These empirical findings suggest possible guidelines for the minimum amount and type of data needed to train effective disease onset predictive models using longitudinal electronic health records data.
Published: 2016
Full Text: View/download PDF

8. Method and analysis for the upscaling of structural data

Author: Thomas Llewellyn Carmichael and Laurent Ailleres
Subjects: Data density, Data collection, 010504 meteorology & atmospheric sciences, Process (engineering), InformationSystems_DATABASEMANAGEMENT, Geology, 010502 geochemistry & geophysics, computer.software_genre, 01 natural sciences, Physics::Geophysics, Set (abstract data type), Outlier, Range (statistics), Data mining, Scale (map), Cluster analysis, computer, 0105 earth and related environmental sciences
Abstract: 3D geological models are created to integrate a set of input measurements into a single geological model. This approach has many problems, as there is uncertainty in all stages of the modelling process, from initial data collection to the modelling scheme used to calculate the geological model. This study looks at the uncertainty inherent in geological models due to data density and introduces a novel method to upscale geological data that optimises the information in the initial dataset. This method also makes it possible to determine the dominant trend of a geological dataset at different scales. By using self-organizing maps (SOMs) to examine the different metrics used to quantify a geological model, we allow a larger range of metrics to be used than traditional statistical methods do, owing to the ability of SOMs to deal with incomplete datasets. Classifying the models into clusters based on the geological metrics using k-means clustering provides useful insight into which models are most similar and which are statistical outliers. Our approach is guided and can be calculated on any input dataset of this type to determine the effect that data density will have on a resultant model. These models are all statistical derivations that represent simplifications and different scales of the initial dataset and can be used to interrogate the scale of observations.
Published: 2016
Full Text: View/download PDF

9. Appropriate Data Density Models in Probabilistic Machine Learning Approaches for Data Analysis

Author: Mehrdad Mohannazadeh Bakhtiari, Andrea Villmann, Marika Kaden, and Thomas Villmann
Subjects: Data density, Measure (data warehouse), Mathematical model, Computer science, business.industry, Process (computing), Probabilistic logic, Estimator, 02 engineering and technology, Machine learning, computer.software_genre, 03 medical and health sciences, 0302 clinical medicine, Data visualization, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, 030217 neurology & neurosurgery
Abstract: This paper investigates the mathematically appropriate treatment of data density estimators in machine learning approaches when these estimators rely on data dissimilarity density models. Using two well-known machine learning approaches for classification and data visualization as examples, we show that this dependence becomes apparent when analyzing the respective mathematical models. Numerical experiments show that data sets generate different data dissimilarity densities depending on the dissimilarity measure in use. Thus, an appropriate choice of data density model is mandatory in machine learning to process the data consistently.
Published: 2019
Full Text: View/download PDF

10. Data density-based fault detection and diagnosis with nonlinearities between variables and multimodal data distributions

Author: Hiromasa Kaneko and Kimito Funatsu
Subjects: Data density, Computer simulation, Computer science, Process Chemistry and Technology, Data domain, Process (computing), Process variable, computer.software_genre, Fault detection and isolation, Computer Science Applications, Analytical Chemistry, Process control, Partial derivative, Data mining, computer, Spectroscopy, Software
Abstract: Multivariate statistical process control (MSPC) is an important means of monitoring multiple process variables and their interrelationships while controlling chemical and industrial plants efficiently and stably. To consider nonlinearities between process variables and multimodal data distributions, the data density can be used as an index for fault detection. Data domains with a low data density are considered abnormal states. However, after fault detection, faulty process variables cannot be diagnosed with an MSPC model based on the data density. Therefore, we have developed a new index to diagnose the process variables that contribute to process faults using a data density-based MSPC model. The proposed index uses the partial derivative of an MSPC model with respect to each process variable. We demonstrate the effectiveness of the proposed method using numerical simulation data, Tennessee Eastman process data, and real plant data analyses.
Published: 2015
Full Text: View/download PDF
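The density-based monitoring index and the partial-derivative contribution index described above can be illustrated with a Gaussian kernel density estimate. This is a sketch under stated assumptions. The abstract does not say which density model the authors use. The kernel estimator, the bandwidth `h=1.0`, the 1st-percentile fault threshold, and the use of the absolute partial derivative as the contribution are all hypothetical choices. The variables are assumed to be standardized.

```python
import numpy as np


def _kernel_weights(X_train, x, h):
    diff = x - X_train                                    # (n, d)
    w = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * h ** 2))
    return diff, w


def kde_density(X_train, x, h):
    """Gaussian-kernel data density of x (up to a constant factor)."""
    _, w = _kernel_weights(X_train, x, h)
    return w.mean()


def density_gradient(X_train, x, h):
    """Analytic partial derivatives of kde_density with respect to each variable."""
    diff, w = _kernel_weights(X_train, x, h)
    # d/dx_j mean_i exp(-||x - x_i||^2 / 2h^2) = mean_i(-(x_j - x_ij) / h^2 * w_i)
    return (-(diff / h ** 2) * w[:, None]).mean(axis=0)


class DensityMonitor:
    def __init__(self, X_train, h=1.0, percentile=1.0):
        self.X = np.asarray(X_train, dtype=float)
        self.h = h
        train_dens = np.array([kde_density(self.X, x, h) for x in self.X])
        # Samples less dense than the lowest `percentile` % of training data count as faults.
        self.threshold = np.percentile(train_dens, percentile)

    def is_fault(self, x):
        return bool(kde_density(self.X, np.asarray(x, dtype=float), self.h) < self.threshold)

    def contributions(self, x):
        # A larger |d density / d x_j| means variable j changes the density fastest at x.
        return np.abs(density_gradient(self.X, np.asarray(x, dtype=float), self.h))


rng = np.random.default_rng(0)
X_normal = rng.standard_normal((500, 3))      # standardized normal-operation data
monitor = DensityMonitor(X_normal)
faulty = np.array([0.0, 4.0, 0.0])            # drift in the second variable only
```

For the faulty sample, the second variable should show by far the largest contribution. Moving along that axis changes the density fastest.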

11. Data density and poor organization

Author: Nadine Sarter and Nadine Marie Moacdieh
Subjects: Data density, Visual search, Computer science, business.industry, 05 social sciences, Word error rate, Eye movement, Response time, Pattern recognition, computer.software_genre, 050105 experimental psychology, Medical Terminology, Graphics software, Clutter, Eye tracking, 0501 psychology and cognitive sciences, Computer vision, Artificial intelligence, business, computer, 050107 human factors, Medical Assisting and Transcription
Abstract: Display clutter has been shown to lead to breakdowns in attention and performance during visual search in data-rich domains. However, the contribution of, and interaction between, the two key aspects of clutter (data density and poor organization) are not well understood. The aim of this study was to fill this gap by systematically varying both factors and collecting performance and eye tracking data. These data were then used to analyze the performance and underlying attentional costs resulting from the two aspects of clutter. Participants performed visual search tasks in a simulated graphics program. Data density (density of icons) and display organization (grouping of icons) were manipulated. The dependent measures were response time, error rate, eye movements, and subjective clutter ratings. Results confirmed the negative effects of high data density and poor organization on response time and error rate. More importantly, eye tracking metrics reflected the effects of data density and organization on attention allocation and helped explain the observed performance decrements. In particular, spatial density mirrored the interaction effects between data density and organization, and the nearest neighbor index (NNI) helped differentiate between the effects of high data density and poor organization. These findings suggest that eye tracking is a powerful means of obtaining a more detailed understanding of the effects of clutter and may also prove useful for real-time detection of clutter.
Published: 2015
Full Text: View/download PDF
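The nearest neighbor index mentioned above is commonly computed as the Clark-Evans ratio. This is the observed mean nearest-neighbor distance divided by the mean distance expected for a random pattern of the same density, 0.5 / sqrt(n / A). The abstract does not state how the authors defined the points or the area. The sketch therefore assumes points such as fixation locations on a display of known area. The function name is hypothetical.

```python
import numpy as np


def nearest_neighbor_index(points, area):
    """Clark-Evans nearest neighbor index R = observed / expected mean NN distance.

    R < 1: clustered, R ~ 1: random (Poisson), R > 1: regular/dispersed.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # ignore each point's distance to itself
    observed = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)   # mean NN distance for a random pattern of density n/area
    return observed / expected


# Hypothetical 10 x 10 display area.
xs = np.arange(10) + 0.5
regular = [(x, y) for x in xs for y in xs]                                 # evenly spread points
clustered = [(0.01 * i, 0.01 * j) for i in range(10) for j in range(10)]   # all in one corner
```

The evenly spread grid gives R = 2.0 (nearest neighbor at distance 1, expected 0.5), while the tightly packed points give a value far below 1. A lower R therefore signals that points concentrate in part of the display.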

12. Dynamic Load Shedding Scheme based on Input Rate of Spatial Data Stream and Data Density

Author: Weonil Jeong
Subjects: Data stream, Scheme (programming language), Data density, Computer science, Real-time computing, Load Shedding, Spatial analysis, computer, Dynamic load testing, computer.programming_language
Abstract: In u-GIS environments, various load shedding techniques have been studied to balance the loads caused by input spatial data streams. However, typical load shedding methods for aspatial data disregard the characteristics of spatial data, and previous load shedding approaches for spatial data still disregard spatial data density and dynamic input data streams, which degrades spatial query processing performance and accuracy. Therefore, to improve the accuracy and performance of spatial continuous queries in u-GIS environments, a dynamic load shedding scheme for spatial data streams is proposed that uses the deviation of stored spatial data and the load ratio of the input data stream. In the proposed scheme, input data with a high probability of being related to a spatial continuous query have a relatively strong chance of being dropped.
Key Words: Data Stream, Load Shedding, Spatial Continuous Query, Smart Object
1. Introduction: u-GIS platform technology is emerging as a foundational technology for supporting application services that use location information in ubiquitous environments [1-2]. u-GIS platform technology requires the integrated processing of static 2D or 3D spatial information, such as buildings, roads, and rivers, together with dynamic GeoSensor information that contains location information changing over time in ubiquitous environments. Data stream processing techniques are being used to process the real-time spatial information generated by GeoSensors [3-6]. Large data streams collected in real time can exceed the finite storage space during processing, and as a result, data may be lost…
Published: 2015
Full Text: View/download PDF

13. The evolution of non-quantitative geological graphics in texts during the formative years of geology (1788–1840)

Author: Renee M. Clary and James H. Wandersee
Subjects: Data density, Multivariate statistics, business.industry, Small multiple, Graphic design, computer.software_genre, Proxy (climate), Formative assessment, History and Philosophy of Science, Data presentation, General Earth and Planetary Sciences, Artificial intelligence, Graphics, business, computer, Natural language processing
Abstract: Although modern geology uses both pictorial and graphical illustrations to convey information and present data, early books in the discipline did not rely on graphics to the same extent. Using Edward R. Tufte's principles of graphic design, this study investigated the numbers and types of graphics in 72 representative texts containing geological illustrations (excluding works with solely mineralogical or paleontological illustrations) published during the formative years of geology (1788–1840). The text graphics were analyzed for the presence of proxy or inferred imagery, direct or keyed labeling, unnecessary embellishment, and data density, and for whether they exhibited multivariate properties, used the small multiple format, or showed graphic modifications. Mixed-methods analyses revealed four stages in the evolution of geologic illustrations from 1788 to 1840: (1) early pictorial or proxy representations; (2) the introduction of labeled graphics, coinciding with the first geology textbooks; (3) ‘grand’ or elaborate illustration; and (4) high graphic density. Although progress was made in graphical representation during the period studied, statistical graphics were hardly ever used.
Published: 2015
Full Text: View/download PDF

14. Research on Improve DBSCAN Algorithm Based On Ant Clustering

Author: Liu Ying, Fang Yuankang, Huang Zhiqiu, Ye Zan, and Luo Yuping
Subjects: DBSCAN, Data density, Computer science, business.industry, Improved algorithm, Pattern recognition, Computer Science::Computational Geometry, computer.software_genre, Similarity (network science), Control and Systems Engineering, SUBCLU, Data pre-processing, Artificial intelligence, Data mining, Cluster analysis, business, computer
Abstract: The DBSCAN algorithm is sensitive to its input parameter Eps, especially when the data density is non-uniform, and it produces poor clustering results when a single global Eps is used. In addition, the algorithm has difficulty with high-dimensional data. In this paper, an improved DBSCAN algorithm, LF-DBSCAN, is proposed. It uses an ant clustering algorithm in the data preprocessing phase to partition the dataset and obtain several values of the parameter Eps, and then calls DBSCAN with the different Eps values to cluster the non-uniform dataset. Experimental results demonstrate the effectiveness of the improved algorithm.
Published: 2014
Full Text: View/download PDF
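The core idea of LF-DBSCAN can be sketched as follows: run DBSCAN with a separate Eps for each region of different density. This is an illustration only. The ant clustering preprocessing is not reproduced, so the group labels and the per-group Eps values are supplied by hand as hypothetical inputs. `dbscan` is a plain textbook implementation, not the authors' code.

```python
import numpy as np


def dbscan(X, eps, min_pts):
    """Plain DBSCAN. Returns cluster ids (0, 1, ...) and -1 for noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]   # includes the point itself
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # already assigned, or not a core point
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:                      # expand through density-reachable points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels


def multi_eps_dbscan(X, groups, eps_per_group, min_pts):
    """Run DBSCAN separately on each pre-computed group with its own Eps."""
    X = np.asarray(X, dtype=float)
    labels = np.full(len(X), -1)
    next_id = 0
    for g, eps in eps_per_group.items():
        idx = np.flatnonzero(groups == g)
        sub = dbscan(X[idx], eps, min_pts)
        clustered = sub >= 0
        labels[idx[clustered]] = sub[clustered] + next_id   # keep ids unique across groups
        if clustered.any():
            next_id += int(sub.max()) + 1
    return labels


dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10.0 + i, 10.0 + j) for i in range(5) for j in range(5)]
X = np.array(dense + sparse)
groups = np.array([0] * 25 + [1] * 25)    # stand-in for the ant-clustering partition
single = dbscan(X, eps=0.15, min_pts=4)
multi = multi_eps_dbscan(X, groups, {0: 0.15, 1: 1.5}, min_pts=4)
```

With one global `eps=0.15`, every point of the sparse group is labeled noise (`-1`). Giving that group its own larger Eps recovers it as a second cluster.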

15. Applicability Domain Based on Ensemble Learning in Classification and Regression Analyses

Author: Kimito Funatsu and Hiromasa Kaneko
Subjects: Data density, Computer science, business.industry, General Chemical Engineering, Reliability (computer networking), Regression analysis, General Chemistry, Models, Theoretical, Library and Information Sciences, Machine learning, computer.software_genre, Ensemble learning, Regression, Computer Science Applications, Set (abstract data type), ComputingMethodologies_PATTERNRECOGNITION, Learning, Regression Analysis, Artificial intelligence, Data mining, business, computer, Applicability domain
Abstract: We discuss applicability domains (ADs) based on ensemble learning in classification and regression analyses. In regression analysis, the AD can be appropriately set, although attention needs to be paid to the bias of the predicted values. However, because the AD set in classification analysis is too wide, we propose an AD based on ensemble learning and data density. First, we set a threshold for data density below which the prediction result of new data is not reliable. Then, only for new data with a data density higher than the threshold, we consider the reliability of the prediction result based on ensemble learning. By analyzing data from numerical simulations and quantitative structural relationships, we validate our discussion of ADs in classification and regression analyses and confirm that appropriate ADs can be set using the proposed method.
Published: 2014
Full Text: View/download PDF
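The two-stage applicability domain described above can be sketched as below: a data-density threshold first, then ensemble-based reliability. The density values are taken as given from any estimator. Reliability is measured here as the fraction of ensemble members agreeing with the majority class. The function name and both thresholds are hypothetical; the abstract does not specify these details.

```python
import numpy as np


def two_stage_ad(density, member_preds, density_threshold, min_agreement):
    """Two-stage applicability domain for classification.

    density:       (n,) data density of each new sample (from any density estimator).
    member_preds:  (n_members, n) class labels predicted by each ensemble member.
    Returns one status string per sample.
    """
    density = np.asarray(density, dtype=float)
    member_preds = np.asarray(member_preds)
    n_members = member_preds.shape[0]
    status = []
    for i in range(member_preds.shape[1]):
        # Stage 1: too little nearby training data, so the prediction is not trusted at all.
        if density[i] < density_threshold:
            status.append("Out-AD (low density)")
            continue
        # Stage 2: reliability = fraction of members voting for the majority class.
        _, counts = np.unique(member_preds[:, i], return_counts=True)
        agreement = counts.max() / n_members
        status.append("In-AD" if agreement >= min_agreement else "Out-AD (low agreement)")
    return status


preds = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]          # 5 hypothetical ensemble members x 3 new samples
result = two_stage_ad([0.1, 0.9, 0.9], preds, density_threshold=0.5, min_agreement=0.8)
```

The first sample is rejected on density alone, without consulting the ensemble. The other two pass the density check, and only the one on which the members agree is kept In-AD.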

16. Supervised Rank Normalization for Support Vector Machines

Author: Soo-Jong Lee and Gyeongyong Heo
Subjects: Data density, Normalization (statistics), business.industry, Supervised learning, Pattern recognition, computer.software_genre, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Feature Dimension, Distribution function, Data point, Decision boundary, Data mining, Artificial intelligence, business, computer, Mathematics
Abstract: Feature normalization has been widely used as a pre-processing step in classification problems to reduce the effect of differing scales across feature dimensions, and the errors that result. Most existing methods, however, assume some distribution function for the features. Even worse, existing methods do not use the labels of the data points and, as a result, do not guarantee that the normalization is optimal for classification. This paper proposes a supervised rank normalization that combines rank normalization with a supervised learning technique. Like rank normalization, the proposed method does not assume any feature distribution, and it uses the class labels of nearest neighbors to reduce classification error. Because an SVM tries to draw its decision boundary in the middle of the zone where classes overlap, reducing the data density in that area helps the SVM find a decision boundary that reduces generalization error. These claims are verified by experimental results.
Published: 2013
Full Text: View/download PDF
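The unsupervised rank normalization that the proposed method builds on can be sketched as follows. Only this baseline is shown. The abstract does not describe the supervised step, which uses nearest neighbors' class labels, in enough detail to reproduce. The class name and the mid-rank treatment of ties and unseen values are assumptions.

```python
import numpy as np


class RankNormalizer:
    """Unsupervised rank normalization of one feature to [0, 1] (no distribution assumed)."""

    def fit(self, values):
        self.sorted_ = np.sort(np.asarray(values, dtype=float))
        if len(self.sorted_) < 2:
            raise ValueError("need at least two training values")
        return self

    def transform(self, values):
        v = np.asarray(values, dtype=float)
        n = len(self.sorted_)
        left = np.searchsorted(self.sorted_, v, side="left")
        right = np.searchsorted(self.sorted_, v, side="right")
        # Mid-rank: tied values share the average of their ranks; unseen values
        # land halfway between the ranks of their neighbors.
        mid_rank = (left + right - 1) / 2.0
        return np.clip(mid_rank / (n - 1), 0.0, 1.0)


norm = RankNormalizer().fit([10.0, 30.0, 20.0])
scaled = norm.transform([10.0, 30.0, 20.0])   # -> [0.0, 1.0, 0.5]
```

Each feature would be fitted separately. After the transform, only the order of the values matters, so differences in scale between features disappear without assuming any distribution.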

17. Estimation of predictive accuracy of soft sensor models based on data density

Author: Kimito Funatsu and Hiromasa Kaneko
Subjects: Data density, Measure (data warehouse), Spectrum analyzer, Computer science, business.industry, Process Chemistry and Technology, Process (computing), Value (computer science), Pattern recognition, Soft sensor, computer.software_genre, Computer Science Applications, Analytical Chemistry, Support vector machine, Error bar, Artificial intelligence, Data mining, business, computer, Spectroscopy, Software
Abstract: Soft sensors are widely used to predict process variables that are difficult to measure online. With soft sensors, analyzer faults can be detected when the difference between a measured value and a predicted value is large. However, it is difficult to detect abnormal data and determine the reasons for the abnormality, because prediction errors increase not only because of analyzer faults but also because of variations caused by changes in the state of the chemical plants. To separate these factors, we previously applied applicability domains to soft sensors, proposed quantitatively constructing the relationships between the distances to the soft sensor models (DMs) and the prediction accuracy of the models, and estimated the prediction accuracy, i.e. the error bar, for new data online. In this paper, we use the k-nearest-neighbor method and a one-class support vector machine (OCSVM) to estimate the data density, and use the average distance to the k nearest data points and the output of the OCSVM, respectively, as DMs. The proposed method was applied to both simulation data and real industrial data, and a comparison of the results demonstrated the superiority of the proposed DMs over the traditional models.
Published: 2013
Full Text: View/download PDF
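The idea of estimating an error bar from a distance to the model (DM) can be sketched as follows, using the k-NN average distance as the DM. Several parts of this sketch are assumptions made for illustration. Validation samples are grouped into equal-count DM bins, and the standard deviation of their prediction errors serves as the error bar. The names `knn_mean_distance` and `ErrorBarEstimator` are hypothetical. The OCSVM-based DM is not shown. The synthetic errors below merely stand in for measured-minus-predicted values.

```python
import numpy as np


def knn_mean_distance(train, query, k):
    """DM: average Euclidean distance from each query row to its k nearest training rows."""
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)


class ErrorBarEstimator:
    """Maps a DM value to the spread of prediction errors seen at similar DM values."""

    def fit(self, dm, errors, n_bins=4):
        dm, errors = np.asarray(dm, dtype=float), np.asarray(errors, dtype=float)
        # Equal-count bins along the DM axis.
        self.edges_ = np.quantile(dm, np.linspace(0.0, 1.0, n_bins + 1))
        idx = self._bin(dm)
        self.std_ = np.array([errors[idx == b].std() for b in range(n_bins)])
        return self

    def _bin(self, dm):
        # Inner edges only, so values outside the fitted range fall into the end bins.
        return np.searchsorted(self.edges_[1:-1], dm, side="right")

    def predict(self, dm):
        """Estimated error bar (standard deviation of errors) for each DM value."""
        return self.std_[self._bin(np.asarray(dm, dtype=float))]


train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dm_example = knn_mean_distance(train, np.array([[0.0, 0.0]]), k=2)   # -> [0.5]

# Hypothetical validation set whose errors grow with DM.
dm_val = np.linspace(0.0, 1.0, 100)
err_val = dm_val * np.where(np.arange(100) % 2 == 0, 1.0, -1.0)
est = ErrorBarEstimator().fit(dm_val, err_val)
bars = est.predict([0.1, 0.9])
```

A new sample with a larger DM, i.e. one lying in a sparser region of the training data, receives a wider error bar.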

18. Enrich the data density of cluster for imbalanced learning using immune representatives

Author: Xusheng Ai, Yufeng Yao, Zhiming Cui, Xuefeng Xian, and Jian Wu
Subjects: Data density, 0209 industrial biotechnology, Dependency (UML), Computer science, Immune network, 02 engineering and technology, Overfitting, computer.software_genre, 020901 industrial engineering & automation, 0202 electrical engineering, electronic engineering, information engineering, Cluster (physics), Oversampling, 020201 artificial intelligence & image processing, Data architecture, Data mining, Cluster analysis, computer
Abstract: To deal with between-class and within-class imbalances, a novel over-sampling method, shape-based oversampling (SBO), is proposed. It reduces the dependence on the parameter settings of CURE by generating variable-length representatives, which represent the data architecture. Meanwhile, our method discriminates faked clusters and generates immune representatives in small disjuncts. As immune representatives are not copies of the original examples, overfitting is also alleviated. Our experimental results also show that the proposed over-sampling method SBO achieves better performance than other renowned re-sampling methods.
Published: 2016
Full Text: View/download PDF

19. Online evolving fuzzy rule-based prediction model for high frequency trading financial data stream

Author: Georgi Gaydadjiev, William A. Gruver, Plamen Angelov, Xiaowei Gu, and Azliza Mohd Ali
Subjects: Data density, Finance, Data stream, Fuzzy rule, business.industry, Computer science, Data stream mining, Intelligent decision support system, computer.software_genre, Machine learning, Adaptive system, Artificial intelligence, Data mining, Data patterns, High-frequency trading, business, computer
Abstract: Analyzing and predicting the high frequency trading (HFT) financial data stream is very challenging due to the fast arrival times and large number of data samples. To address this problem, an online evolving fuzzy rule-based prediction model is proposed in this paper. Because this prediction model is based on evolving fuzzy rule-based systems and a novel, simpler form of data density, it can autonomously learn from the live data stream, automatically build and remove its rules, and recursively update its parameters. The model responds quickly to sudden, unpredictable changes in financial data and readjusts itself to follow the new data pattern. Experimental results on a real financial data stream show excellent prediction performance despite quick shifts in data patterns and frequent appearances of abnormal data samples.
Published: 2016
Full Text: View/download PDF
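
The abstract does not spell out its "simpler form of data density". As an assumption, the sketch below uses the recursive Cauchy-type density known from Angelov's earlier evolving-systems work, which needs only a running mean and a running mean of squared norms, so it can be updated per sample on a live stream. `RecursiveDensity` is a hypothetical class name; the paper's exact formula may differ.

```python
class RecursiveDensity:
    """Recursive Cauchy-type data density, updated one sample at a time.

    D(x) = 1 / (1 + ||x - mu||^2 + X - ||mu||^2), where mu is the running mean
    and X is the running mean of squared norms. Only mu and X are stored, so
    each update costs O(d) no matter how many samples have been seen.
    """

    def __init__(self, dim):
        self.k = 0
        self.mu = [0.0] * dim
        self.X = 0.0

    def update(self, x):
        self.k += 1
        w = 1.0 / self.k
        self.mu = [(1 - w) * m + w * xi for m, xi in zip(self.mu, x)]
        self.X = (1 - w) * self.X + w * sum(xi * xi for xi in x)

    def density(self, x):
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mu))
        spread = self.X - sum(m * m for m in self.mu)  # variance-like term
        return 1.0 / (1.0 + dist2 + spread)


rd = RecursiveDensity(dim=1)
for price in [100.0, 100.2, 99.9, 100.1]:
    rd.update([price])
print(rd.density([100.0]), rd.density([105.0]))
```

A sample far from the running mean, such as a sudden price jump, receives a low density; that is the kind of signal an evolving model can use to adapt its rule base.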

20. Automatic Reverse Engineering Based on Reconstructing Measurement Data in 3D-Lattice

Author: Hideki Aoyama and Kiyomoto Tsushima
Subjects: Reverse engineering, Data density, Engineering, Physical model, Mathematical model, business.industry, Car model, Mechanical Engineering, Lattice (group), CAD, computer.software_genre, Mechanics of Materials, General Materials Science, Data mining, business, computer, Algorithm
Abstract: Reverse engineering systems are used to construct mathematical models of physical models, such as clay models, based on measurement data. In this study, we propose a reverse engineering method that can construct high-quality surface data automatically. The method consists of the following steps. The first step smooths the measured data globally and regionally according to the target shape by fitting quadric surfaces to the measurement data. The second step defines quadric surfaces and converts measurement points into 3D lattice points to obtain a uniform measurement data density. Because the positions of the measurement data are converted from coordinate values into 3D lattice points, it is easier to find neighboring points and to clarify the neighboring relations between surfaces. The third step segments the measurement data based on the maximum curvature and normal at each point. The last step defines NURBS surfaces for each segment using the least-squares method to average out positional errors. To validate the effectiveness of the proposed method, we developed a reverse engineering system and constructed mathematical models through basic experiments using measurement data from a clay car model.
Published: 2012
Full Text: View/download PDF
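
The paper converts points to 3D lattice points with the help of fitted quadric surfaces; the sketch below only illustrates the simpler underlying idea of snapping scattered points to a regular grid. It assumes a uniform cubic cell size and uses the centroid of each occupied cell as its representative; `snap_to_lattice` and `lattice_neighbours` are hypothetical helpers.

```python
from collections import defaultdict


def snap_to_lattice(points, cell):
    """Replace scattered 3D points by one representative per lattice cell.

    Each point goes to the cell (i, j, k) = floor(coord / cell); the cell's
    representative is the centroid of the points that fell into it, which
    evens out the local data density.
    """
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell) for c in p)
        buckets[key].append(p)
    return {
        key: tuple(sum(c) / len(pts) for c in zip(*pts))
        for key, pts in buckets.items()
    }


def lattice_neighbours(key, lattice):
    """Occupied cells whose integer indices differ from key by at most 1."""
    i, j, k = key
    return [
        (i + di, j + dj, k + dk)
        for di in (-1, 0, 1) for dj in (-1, 0, 1) for dk in (-1, 0, 1)
        if (di, dj, dk) != (0, 0, 0) and (i + di, j + dj, k + dk) in lattice
    ]
```

Because occupied cells are addressed by integer index, finding neighbours is a constant-size lookup rather than a search over all points, which matches the abstract's remark that lattice points make neighbouring relations easier to find.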

21. Improving rival penalized competitive learning using density‐evaluated mechanism

Author: Yao-Jen Chang, Chia‐Lu Ho, and Sheng‐Sung Yang
Subjects: Data density, Computer science, Mechanism (biology), business.industry, Competitive learning, General Engineering, k-means clustering, Machine learning, computer.software_genre, Determining the number of clusters in a data set, Local optimum, Convergence (routing), Artificial intelligence, business, Cluster analysis, computer
Abstract: Rival penalized competitive learning (RPCL) and its variants provide attractive ways to perform clustering without knowing the exact number of clusters. However, they are always accompanied by the problems of falling into local optima and slow learning. We therefore investigate RPCL and propose a mechanism that directly prunes the RPCL structure by evaluating the data density of each unit. We call the new strategy density‐evaluated RPCL (DERPCL). In the simulations, DERPCL is used to estimate the communication channel state, and comprehensive comparisons are made with other RPCLs. Results show that DERPCL is superior in terms of convergence accuracy and speed.
Published: 2010
Full Text: View/download PDF
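
A single step of classic RPCL can be sketched as follows: the winner is pulled toward the input and the runner-up (the rival) is pushed slightly away. This sketch assumes frequency-weighted squared Euclidean distances for winner selection and win counts initialised to 1. The pruning function is a hypothetical stand-in for the paper's density evaluation, whose exact form the abstract does not give; it simply drops units that won too small a share of samples.

```python
import math


def rpcl_step(x, units, counts, a_c=0.05, a_r=0.005):
    """Apply one rival penalized competitive learning (RPCL) update in place.

    Squared distances are scaled by each unit's relative win frequency, so
    units that win often are handicapped. The winner moves toward x with
    rate a_c; the runner-up (the rival) is pushed away with the much smaller
    rate a_r. Returns the index of the winning unit.
    """
    if len(units) < 2:
        raise ValueError("RPCL needs at least two units")
    total = sum(counts)
    scores = [(counts[j] / total) * math.dist(x, w) ** 2 for j, w in enumerate(units)]
    order = sorted(range(len(units)), key=scores.__getitem__)
    winner, rival = order[0], order[1]
    units[winner] = [w + a_c * (xi - w) for w, xi in zip(units[winner], x)]
    units[rival] = [w - a_r * (xi - w) for w, xi in zip(units[rival], x)]
    counts[winner] += 1
    return winner


def prune_sparse_units(units, counts, min_share=0.05):
    """Hypothetical pruning rule: drop units that won less than min_share of samples."""
    total = sum(counts)
    keep = [j for j in range(len(units)) if counts[j] / total >= min_share]
    return [units[j] for j in keep], [counts[j] for j in keep]
```

Pushing rivals away tends to drive surplus units out of the data, and removing them explicitly, rather than waiting for them to drift, is the kind of structural pruning the abstract credits for faster convergence.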

22. Multi-dimensional data density estimation in P2P networks

Author: Aoying Zhou, Xueqing Gong, Minqi Zhou, and Weining Qian
Subjects: Data density, Information Systems and Management, Computer science, Density estimation, Load balancing (computing), Data structure, computer.software_genre, Hardware and Architecture, Histogram, Discrete cosine transform, Data mining, computer, Software, Information exchange, Multi dimensional data, Information Systems
Abstract: Estimating the global data distribution in Peer-to-Peer (P2P) networks is an important issue that has not yet been well addressed. It can benefit many P2P applications, such as load balancing analysis, query processing, and data mining. In this paper, we propose a novel algorithm based on compact multi-dimensional histogram information that achieves high estimation accuracy at low estimation cost. The data distribution is maintained in a multi-dimensional histogram that is spread among peers without overlap, and each part of the histogram is further condensed into a set of discrete cosine transform (DCT) coefficients. By exchanging information, each peer can hierarchically accumulate this compact information into the entire histogram and consequently estimate the global data density accurately and efficiently. Algorithms for hierarchically accumulating the DCT coefficients, together with the density estimation error, are presented with detailed theoretical analysis and proofs. Our extensive performance study confirms the effectiveness and efficiency of our methods for density estimation in dynamic P2P networks.
Published: 2009
Full Text: View/download PDF
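
A one-dimensional sketch shows why DCT coefficients suit this kind of accumulation. The DCT is linear, so adding the coefficients of two peers' non-overlapping histogram parts gives the coefficients of the combined histogram, and keeping only the leading coefficients yields a compact, smoothed approximation. This assumes an orthonormal DCT-II over a small 1D histogram; the paper works with multi-dimensional histograms, and `dct`/`idct` here are plain illustrative implementations, not its algorithm.

```python
import math


def dct(h):
    """Orthonormal DCT-II of a 1D histogram."""
    n = len(h)
    out = []
    for k in range(n):
        s = sum(h[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct(coeffs, n):
    """Inverse of dct(); coefficients beyond len(coeffs) are treated as zero."""
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


peer_a = [2, 4, 8, 0, 0, 0, 0, 0]   # one peer's part of the histogram
peer_b = [0, 0, 0, 12, 8, 4, 2, 0]  # another peer's, non-overlapping
combined = [a + b for a, b in zip(dct(peer_a), dct(peer_b))]
approx = idct(combined[:4], 8)      # keep only 4 of 8 coefficients
print([round(v, 2) for v in approx])
```

Each peer only ever forwards a short coefficient list, which is the cost saving the abstract refers to; the truncation length trades accuracy for message size.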

23. Development of Color QR Code for Increasing Capacity

Author: Sartid Vongpradhip and Nutchanad Taveerad
Subjects: Data density, Java, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Barcode, law.invention, Mobile phone, law, Code (cryptography), Android application, Android (operating system), business, computer, Computer hardware, Decoding methods, computer.programming_language
Abstract: Barcodes are widely popular, and their popularity has encouraged the ongoing invention of decoding methods. Barcodes can be categorized into two main groups: one-dimensional (1D) barcodes, in which information is stored horizontally, and two-dimensional (2D) barcodes, which contain information in both the vertical and horizontal directions and promise a higher storage capacity than 1D barcodes. Despite their high data density, the amount of information that 2D barcodes can hold is still limited to some extent. This study selected QR Code (Quick Response Code), a type of 2D barcode, because, first, it can handle a variety of information; second, decoding is reasonably straightforward; and finally, the structure of QR Code is clearly specified by its developer. This research aimed to increase QR Code capacity by proposing a color Quick Response Code (color QR code) encoding concept that can hold more information than a traditional black-and-white QR Code of the same physical size. A two-color (black and white) QR Code can store only 1 bit in each module, whereas a module of a color QR code with sixteen different colors can hold 4 bits of data. To decode a color QR code, this study used a code reader equipped with at least an 8-megapixel camera, and a decoding application was developed for the Android (mobile phone application) and Java (PC application) platforms.
Published: 2015
Full Text: View/download PDF
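
The capacity arithmetic in the abstract follows from log2 of the number of colors: 2 colors give 1 bit per module and 16 colors give 4 bits. The sketch below shows that arithmetic and one simple way to map bytes onto 16-color module values (two nibbles per byte). The nibble mapping and function names are hypothetical; the paper's actual color assignment, module layout, and error correction are not modelled.

```python
def bits_per_module(num_colors):
    """Whole bits one module can carry if it may take num_colors distinct colors.

    This is floor(log2(num_colors)): 2 colors -> 1 bit, 16 colors -> 4 bits.
    """
    if num_colors < 2:
        raise ValueError("a module needs at least two distinguishable colors")
    return num_colors.bit_length() - 1


def bytes_to_color_indices(data):
    """Split each byte into two 4-bit nibbles, i.e. two 16-color module values."""
    indices = []
    for byte in data:
        indices.append(byte >> 4)    # high nibble
        indices.append(byte & 0x0F)  # low nibble
    return indices


def color_indices_to_bytes(indices):
    """Inverse of bytes_to_color_indices: pair nibbles back into bytes."""
    return bytes((hi << 4) | lo for hi, lo in zip(indices[0::2], indices[1::2]))


message = "QR".encode("ascii")
modules = bytes_to_color_indices(message)
print(modules, color_indices_to_bytes(modules))
```

Under this mapping, 16 bytes (128 bits) need 32 sixteen-color modules instead of 128 black-and-white ones, which is the fourfold data-density gain the abstract describes.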

24. Qualizon graphs

Author: Paolo Federico, Alexander Rind, Wolfgang Aigner, Silvia Miksch, and Stephan Hoffmann
Subjects: Data density, Theoretical computer science, Series (mathematics), Computer science, business.industry, Horizon, Qualitative property, Space (commercial competition), computer.software_genre, Visualization, Information visualization, Data mining, Time series, business, computer
Abstract: In several application fields, the joint visualization of quantitative data and qualitative abstractions can help analysts make sense of complex time series data by associating precise numeric values with corresponding domain-specific interpretations, such as good, bad, high, low, or normal. At the same time, the need to analyse large multivariate time-oriented datasets often calls for keeping visualizations as compact as possible. In this paper, we introduce Qualizon Graphs, a compact visualization that combines quantitative data and qualitative abstractions. It is based on the well-known Horizon Graphs, but instead of a predefined number of equally sized bands, it uses as many bands as there are qualitative categories, each with a correspondingly different size. In this way, Qualizon Graphs increase the data density of visualized quantitative values and inherently integrate qualitative abstractions. A user study shows that Qualizon Graphs are as fast and accurate as Horizon Graphs for quantitative data, and are an alternative to state-of-the-art visualizations for both quantitative and qualitative data, enabling a trade-off between speed and accuracy.
Published: 2014
Full Text: View/download PDF

25. An Adaptation of the Barnes Filter Applied to the Objective Analysis of Radar Data

Author: Jerry M. Straka, Jean-Pierre Aubagnac, and Mark A. Askelson
Subjects: Scheme (programming language), Data density, Atmospheric Science, Computer science, Objective analysis, law.invention, Filter (video), law, Range (statistics), Radar, Adaptation (computer science), computer, Algorithm, computer.programming_language
Abstract: Spatial objective analysis is routinely performed in several applications that utilize radar data. Because of their relative simplicity and computational efficiency, one-pass distance-dependent weighted-average (DDWA) schemes that utilize either the Cressman or the Barnes filter are often used in these applications. The DDWA schemes that have traditionally been used do not, however, directly account for two fundamental characteristics of radar data. These are 1) the spacing of radar data depends on direction and 2) radar data density systematically decreases with increasing range. A DDWA scheme based on an adaptation of the Barnes filter is proposed. This scheme, termed the adaptive Barnes (A-B) scheme, explicitly takes into account radar data properties 1 and 2 above. Both theoretical and experimental investigations indicate that two attributes of the A-B scheme, direction-splitting and automatic adaptation to data density, may facilitate the preservation of the maximum amount of meaningful info...
Published: 2000
Full Text: View/download PDF
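
A conventional one-pass DDWA scheme with Barnes weights, the baseline that the A-B scheme adapts, can be sketched as below. It assumes the common Gaussian weight form w = exp(-r^2 / kappa) with a single isotropic smoothing parameter kappa on 2D Cartesian points. The A-B scheme's direction splitting and range-dependent adaptation are not reproduced here; `barnes_one_pass` is a hypothetical name.

```python
import math


def barnes_one_pass(obs, grid_points, kappa):
    """One-pass distance-dependent weighted average with Gaussian (Barnes) weights.

    obs: list of ((x, y), value) observations.
    Each grid value is sum(w_i * v_i) / sum(w_i) with w_i = exp(-r_i**2 / kappa),
    where r_i is the distance from the grid point to observation i.
    """
    result = []
    for g in grid_points:
        num = den = 0.0
        for p, v in obs:
            w = math.exp(-math.dist(g, p) ** 2 / kappa)
            num += w * v
            den += w
        # If every weight underflows to zero there is no usable data nearby.
        result.append(num / den if den > 0 else float("nan"))
    return result


obs = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0), ((0.0, 2.0), 5.0)]
print(barnes_one_pass(obs, [(1.0, 0.0), (0.0, 0.0), (0.5, 1.5)], kappa=1.0))
```

With one fixed kappa, the effective smoothing is the same everywhere, even though radar gate spacing grows with range and differs between the radial and azimuthal directions; those are the two properties the abstract says the A-B scheme accounts for explicitly.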

26. Potential for two‐dimensional codes in automated manufacturing

Author: Anthony Furness and Keith A. Osman
Subjects: Flexibility (engineering), Data density, Engineering, Database, business.industry, Barcode, computer.software_genre, Industrial and Manufacturing Engineering, law.invention, Data capacity, Identifier, Control and Systems Engineering, law, Encoding (memory), Data file, Data mining, business, computer
Abstract: Linear barcodes have found wide acceptance in all sectors of industry as machine‐readable part identifiers, but their low data density limits practical data capacity to some 20 characters. Two‐dimensional codes, however, have a much higher data density, and can contain significant volumes of data in compact symbols that can be printed or marked directly on to small parts. When used as portable data files 2‐D encoding provides both flexibility and prospects for applications unachievable with linear barcode data carriers. This paper discusses 2‐D codes and the potential for their applications in automated manufacturing.
Published: 2000
Full Text: View/download PDF

27. Collaborative Filtering Algorithm Based on Preference of Item Properties

Author: Xiao-gang Yang
Subjects: Data density, Information retrieval, Computer science, media_common.quotation_subject, Rating matrix, Recommender system, computer.software_genre, Preference, Similarity (network science), Collaborative filtering, Quality (business), Data mining, Algorithm, computer, media_common
Abstract: To address the weakness of traditional collaborative filtering algorithms with respect to the sparsity of the user–item rating matrix, a collaborative filtering algorithm based on users' preferences for item properties is proposed. The algorithm calculates the similarity between users from their preference values for item properties. It then uses this user similarity to predict ratings for items that users have not rated, increasing the data density of the original user–item rating matrix. Finally, it applies the corresponding collaborative filtering algorithm based on item property preferences to produce personalized recommendations. The experimental results show that this method can effectively improve the quality of the recommendations.
Published: 2014
Full Text: View/download PDF
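
The densification step described above can be sketched with generic user-based collaborative filtering. How the paper derives preference values from item properties is not detailed in the abstract, so this sketch assumes each user is already summarised by a preference vector over item properties (e.g. genres), uses cosine similarity between those vectors, and predicts each missing rating as a similarity-weighted mean over positively similar users. `fill_ratings` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fill_ratings(ratings, prefs):
    """Return a copy of the rating matrix with missing entries (None) predicted.

    ratings: per-user rating lists, None where unrated.
    prefs: per-user preference vectors over item properties.
    """
    n_users = len(ratings)
    sim = [[cosine(prefs[a], prefs[b]) for b in range(n_users)] for a in range(n_users)]
    filled = [row[:] for row in ratings]
    for u in range(n_users):
        for i, r in enumerate(ratings[u]):
            if r is not None:
                continue
            pairs = [
                (sim[u][v], ratings[v][i])
                for v in range(n_users)
                if v != u and ratings[v][i] is not None and sim[u][v] > 0
            ]
            total = sum(s for s, _ in pairs)
            if total > 0:  # leave the gap if no similar user rated item i
                filled[u][i] = sum(s * x for s, x in pairs) / total
    return filled


ratings = [[5, None, 3], [4, 2, None], [1, 5, 4]]
prefs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]  # e.g. liking for two item genres
print(fill_ratings(ratings, prefs))
```

The filled matrix is denser than the original, so the later recommendation step has more overlapping ratings to work with.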

28. Applying Over-sampling Technique Based on Data Density and Cost-sensitive SVM to Imbalanced Learning

Author: Qinghua Cao and SenZhang Wang
Subjects: Data density, business.industry, Group method of data handling, Computer science, Cost sensitive, Pattern recognition, Minority class, Overfitting, Machine learning, computer.software_genre, Support vector machine, ComputingMethodologies_PATTERNRECOGNITION, Oversampling, Artificial intelligence, Noise (video), business, computer
Abstract: The performance of SVM is greatly limited when it is applied to imbalanced datasets, in which the classification categories are not approximately equally represented. Real-world datasets are often composed of "normal" examples with only a small percentage of "abnormal" examples. Under-sampling the majority class and over-sampling the minority class are two obvious ways to balance a dataset before training. The SMOTE algorithm is a simple and effective over-sampling technique, but it ignores the data distribution and density information that is important for synthesizing minority examples, and it cannot effectively eliminate the influence of noise either. A novel over-sampling algorithm, SMOBD, is proposed and shows better performance in experiments. We also combine this algorithm with an SVM using different error costs. We compare the performance of our algorithm against regular SVM, SMOTE, SMOTE-ENN, and SDC (SMOTE with different costs of SVM), and the experimental results show that our algorithm outperforms all of them.
Published: 2011
Full Text: View/download PDF
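
The baseline SMOTE interpolation that SMOBD builds on can be sketched as follows: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point at a random position on the segment between them. This is plain SMOTE, not SMOBD; the density weighting and noise handling that SMOBD adds are not specified in the abstract and are not shown. Choosing the base sample at random is a simplification of the original per-sample loop, and `smote` is a hypothetical function name.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    For a randomly chosen minority sample x, pick one of its k nearest minority
    neighbours nn and return x + gap * (nn - x) with gap drawn from [0, 1).
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote(minority, 5, k=2, rng=random.Random(0)))
```

Every synthetic point lies between two existing minority samples regardless of how dense or noisy their neighbourhood is, which is the limitation the abstract attributes to SMOTE.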

29. Memristive Multilevel Memory with Applications in Audio Signal Storage

Author: Lidan Wang, Xiaofang Hu, and Shukai Duan
Subjects: Scheme (programming language), Data density, Hardware_MEMORYSTRUCTURES, Audio signal, business.industry, Computer science, Multilevel memory, Memristor, Hewlett packard, law.invention, Memistor, law, Binary data, Electronic engineering, business, computer, Computer hardware, computer.programming_language
Abstract: The memristor, a two-terminal device whose conductance depends dynamically on the charge or flux flowing through it, was predicted by Leon Chua about four decades ago and named the fourth fundamental circuit element. In 2008, Hewlett Packard (HP) Labs announced that they had found the missing memristor in a nano-scale physical device. Since then, the memristor has garnered extensive interest among researchers and has been proposed for many applications. In this paper, an implementation scheme is presented for a memristive multilevel memory in which a single unit stores multilevel information (several bits of binary data). A record/play system using the memristive multilevel memory is designed as an application in audio signal storage. Owing to the multilevel storage ability and nano-scale size of the memristor, the design has a simpler and smaller circuit structure, greater data density, and nonvolatility. A series of computer simulations verifies the effectiveness of the memristive memory and provides a new solution for audio signal storage and processing.
Published: 2011
Full Text: View/download PDF

30. A novel self-organizing map algorithm for text mining

Author: Chung-Hong Lee and Hsin-Chang Yang
Subjects: Self-organizing map, Data density, Structure (mathematical logic), Computer science, business.industry, computer.software_genre, Machine learning, ComputingMethodologies_PATTERNRECOGNITION, Text mining, Pattern recognition (psychology), Cluster (physics), Data mining, Artificial intelligence, business, computer, Algorithm
Abstract: The self-organizing map (SOM) learning algorithm has been widely applied to various tasks in pattern recognition, machine learning, and data mining. Recently, it has been used to cluster documents, with reasonable results. The traditional SOM algorithm learns from data using a fixed map, and approaches have been proposed to allow an adaptable map structure. In this work, we propose a novel SOM learning algorithm that can expand the map laterally and hierarchically. The adaptation of the map structure is based on topics identified from the underlying document clusters. This approach differs from traditional approaches, which are typically driven by the data density of clusters. Preliminary experimental results suggest that the proposed algorithm outperforms other similar approaches.
Published: 2010
Full Text: View/download PDF

31. Database of Marine Magnetic Anomalies in the Northeast Pacific, Atlantic, and Southeast Indian Oceans

Author: Keizo Sayanagi and Kensaku Tamaki
Subjects: Data density, geography, geography.geographical_feature_category, Database, Seamount, computer.software_genre, Lineation, Indian ocean, Tectonics, Oceanography, Earth's magnetic field, General Earth and Planetary Sciences, Magnetic anomaly, Expansive, computer, Geology, General Environmental Science
Abstract: We have compiled 3.5 million marine magnetic anomaly data points in four areas of the Northeast Pacific, Atlantic, and Southeast Indian Oceans in an attempt to produce an expansive database of marine magnetic anomalies. The boundaries of the study areas are 20°-60°N and 170°-105°W for the Northeast Pacific, 20°-60°N and 70°-5°W for the North Atlantic, 50°-25°S and 45°W-10°E for the South Atlantic, and 60°-30°S and 125°-145°E for the Southeast Indian Ocean. Marine magnetic anomalies were obtained from geomagnetic total intensity data using IGRF/DGRF. All the marine magnetic anomaly data in those areas were reduced to a 5-minute regular grid using weighted average interpolation. The gridded data in the Northeast Pacific and the North Atlantic provide quite remarkable resolution of magnetic anomaly lineations, fracture zones, and seamounts. Results in the South Atlantic and the Southeast Indian Ocean show a limitation of the gridding procedure due to low data density. We can, however, see characteristic patterns of marine magnetic anomalies corresponding to major tectonic features in the parts of these areas where the data density is comparatively high. This attempt to build a database of marine magnetic anomalies has succeeded at least in the Northeast Pacific and North Atlantic Oceans, because magnetic anomaly maps confirm that the characteristic features of marine magnetic anomalies in these areas are well represented by the gridded data. The magnetic database can easily be compared with other databases, such as the topography database (ETOPO5). As more cruise data accumulate in areas that lack magnetic data, this magnetic database will become more useful, especially for analyzing marine magnetic anomalies globally.
Published: 1992
Full Text: View/download PDF

32. Demonstration Report for Geonics EM-63 Cued-Interrogation Data Collection, Processing and Archiving at Camp Sibert, Alabama

Author: Kevin Kingdon and Stephen Billings
Subjects: Cued speech, Data density, Engineering, Data processing, Data collection, business.industry, Anomaly (natural sciences), computer.software_genre, Data acquisition, Metal detectors, Data mining, business, Interrogation, computer, Remote sensing
Abstract: This report describes the data collection, processing and archiving of Geonics EM-63 time-domain electromagnetic data over selected anomalies within Site 18 of the Camp Sibert FUDS. In May 2007, cued-interrogation data were collected over 200 anomalies in a blind-test scenario and 38 items in a GPO. The cued data were collected dynamically along 11 North-South lines spaced 30 cm apart, 3 East-West lines spaced 50 cm apart, and two "pitch" lines directly over the anomaly center. The pitch lines involved collecting data while the EM-63 was pitched backwards and then forwards over the anomaly. In addition to the cued interrogation, full-coverage data at 0.5 m spacing were collected over the GPO and a 35 m by 60 m area of the blind site. In this demonstration report we evaluate six identified performance metrics for the technology, including reliability/robustness, data density, survey rate, percentage of site covered, and position and depth accuracy of inverted positions. Metrics more directly related to discrimination performance are assessed in a separate demonstration report that addresses data processing and interpretation.
Published: 2008
Full Text: View/download PDF

33. Part Surfaces 3D Digitising

Author: Charyar Mehdi-Souzani and Claire Lartigue
Subjects: Data density, Laser scanning, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, computer.software_genre, Voxel, Data accuracy, Artificial intelligence, Completeness (statistics), business, computer, Surface reconstruction
Abstract: The accurate measurement of characteristic lines using contact-less sensors is an important issue because it conditions the subsequent exploitation of the points. For applications such as surface reconstruction, the characteristic lines represent the 3D boundaries of point sets that can be fitted with surface models. The problem combines two specific issues: evaluating the accuracy of data obtained using non-contact sensors, and identifying characteristic lines from discrete data. The first problem is solved through quality indicators that represent data density, data completeness, and data accuracy. Particular attention is given to digitising noise, which greatly influences the precision of the acquired points. The second problem is solved through the 2D identification of points that are characteristic of 3D contours. The proposed approach relies on a voxel-space representation of the 3D digitised data that allows both the extraction of voxels belonging to the contour and the evaluation of voxels that do not satisfy the specified precision index. To improve data accuracy, a method for rescanning inaccurate zones is detailed; it calculates new sensor orientations so that the sensor is as normal as possible to the contour voxel and the digitising noise is limited.
Published: 2005
Full Text: View/download PDF

34. Corrigendum to 'Strategic Parameter Search Method Based on Prediction Errors and Data Density for Efficient Product Design' [Chemom. Intell. Lab. Syst. 127 (2013) 70–79]

Author: Takuya Kishio, Kimito Funatsu, and Hiromasa Kaneko
Subjects: Data density, Product design, Computer science, Process Chemistry and Technology, Data mining, computer.software_genre, computer, Spectroscopy, Software, Computer Science Applications, Analytical Chemistry
Published: 2014
Full Text: View/download PDF

35. Challenges and Solutions in Allocating Data in a SAN Environment

Author: Bruce Naegel
Subjects: Data density, Storage area network, Computer science, Distributed computing, Operating system, computer.software_genre, Application software, computer, Computer network management
Abstract: This storage explosion shows little sign of slowing down. The 72 GB capacity of our largest disk drives today will grow to 1TB within a few years. In addition to growing data density on disk drives, we have faster processors to use all this data and increasingly fast networks to access and share it. This means we need methods of allocating storage for all these uses that can respond to rapidly changing environments in an intelligent fashion.
Published: 2001
Full Text: View/download PDF

36. Database inspection capability for the high-grade device

Author: NamKyu Park, Jong-woon Chang, Hwa-Sup Bae, and Seung-Woo Yoo
Subjects: Automated optical inspection, Data density, Engineering, Critical layer, Database, business.industry, Volume (computing), computer.software_genre, Micrometre, Optical proximity correction, Control system, business, computer, Dram
Abstract: Recently, as device design rules have rapidly tightened, defect control has become more critical, and high-end masks at the 256 M and 1 G DRAM level are difficult to inspect by database inspection because of high data volume, data density, OPC, etc. It is therefore necessary to evaluate the machine capability for database inspection and the defect capture ability for critical layers. For the experiment, we prepared three test plates with tight CD design and extremely small OPC patterns; one of them combines 4 different layers (metal, contact, ipso, and poly) with a design rule of 1.0 - 1.5 micrometer. We also shrank some areas (80% and 75%) to confirm the limits of DB inspection. Through this evaluation, we tried to identify current barriers, such as the CD uniformity problem, overcome them, and find ways to improve the inspection capability.
Published: 1997
Full Text: View/download PDF

37. Metrics for effective information visualization

Author: Richard Brath
Subjects: Data density, business.industry, Computer science, Context (language use), computer.software_genre, Software metric, Visualization, Information visualization, Data visualization, Data point, Overhead (computing), Data mining, business, computer
Abstract: Metrics for information visualization will help designers create and evaluate 3D information visualizations. Based on experience from 60+ 3D information visualizations, the metrics we propose are: number of data points and data density; number of dimensions and cognitive overhead; occlusion percentage; and reference context and percentage of identifiable points.
Published: 1997
Full Text: View/download PDF

38. Data compression

Author: Debra A. Lelewer and Daniel S. Hirschberg
Subjects: Data density, General Computer Science, Computer science, Data_CODINGANDINFORMATIONTHEORY, Fano plane, Information theory, Huffman coding, computer.software_genre, Theoretical Computer Science, symbols.namesake, Redundancy (information theory), symbols, Data mining, File storage, computer, Data compression
Abstract: This paper surveys a variety of data compression methods spanning almost 40 years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems. Concepts from information theory as they relate to the goals and evaluation of data compression methods are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported, and possibilities for future research are suggested.
Published: 1987
Full Text: View/download PDF
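
Huffman coding, one of the methods the survey covers, illustrates how removing redundancy raises effective data density: frequent symbols receive short codewords and rare ones long codewords, with no codeword a prefix of another. The following is a minimal sketch of Huffman's algorithm over the characters of a string; `huffman_code` is a hypothetical name, and the integer tie-breaker only keeps heap comparisons well defined.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a prefix-free {symbol: bitstring} code built by Huffman's algorithm."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


text = "abracadabra"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(code, bits, "bits vs", 8 * len(text), "bits at 8 bits per character")
```

The exact codewords depend on how ties are broken, but the total encoded length is the same for every Huffman code of a given text.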

38 results on '"Data density"'
