152 results
Search Results
2. Technical Perspective: The Effectiveness of Security Measures.
- Author
-
Christin, Nicolas
- Subjects
COMPUTER security, INFORMATION technology security, COMPUTER users, HUMAN behavior, DATA privacy, ACQUISITION of data, INTERVIEWING - Abstract
The article discusses various aspects of computer and information security practices, and it mentions computer user behavior in relation to information privacy and security-related problems. Computer user behavior-based data collection is examined, along with analyses of how individuals react to advice regarding security measures and procedures. Exposure to computer-based vulnerabilities is assessed, as well as user interviews and passive measurements of user behavior.
- Published
- 2022
- Full Text
- View/download PDF
3. Event-Based Visual Flow.
- Author
-
Benosman, Ryad, Clercq, Charles, Lagorce, Xavier, Ieng, Sio-Hoi, and Bartolozzi, Chiara
- Subjects
VISUALIZATION, ARTIFICIAL neural networks, RETINA, ACQUISITION of data, SPATIOTEMPORAL processes, COMPUTATIONAL complexity, VOLTAGE control - Abstract
This paper introduces a new methodology to compute dense visual flow using the precise timings of spikes from an asynchronous event-based retina. Biological retinas, and their artificial counterparts, are totally asynchronous and data-driven and rely on a paradigm of light acquisition radically different from most of the currently used frame-grabber technologies. This paper introduces a framework to estimate visual flow from the local properties of events' spatiotemporal space. We will show that precise visual flow orientation and amplitude can be estimated using a local differential approach on the surface defined by coactive events. Experimental results are presented; they show that the method copes well with the high data sparseness and temporal resolution of event-based acquisition, which allows the computation of motion flow with microsecond accuracy and at very low computational cost. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
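The local differential approach described in the abstract above can be illustrated with a least-squares plane fit over a small event neighborhood: fit t = a·x + b·y + c to event timestamps, then read the flow orientation off the spatial gradient and the amplitude off its inverse norm. A minimal sketch (the event coordinates, plane coefficients, and helper names are illustrative, not taken from the paper):

```python
import math

def solve3(A, b):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 linear system
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def fit_local_plane(events):
    # least-squares fit of t = a*x + b*y + c over coactive events (x, y, t)
    n = len(events)
    Sx = sum(x for x, y, t in events)
    Sy = sum(y for x, y, t in events)
    Sxx = sum(x * x for x, y, t in events)
    Syy = sum(y * y for x, y, t in events)
    Sxy = sum(x * y for x, y, t in events)
    St = sum(t for x, y, t in events)
    Sxt = sum(x * t for x, y, t in events)
    Syt = sum(y * t for x, y, t in events)
    A = [[Sxx, Sxy, Sx], [Sxy, Syy, Sy], [Sx, Sy, n]]
    return solve3(A, [Sxt, Syt, St])

# Synthetic events whose time surface is an exact plane (an edge sweeping by).
events = [(x, y, 0.5 * x + 0.25 * y + 0.1) for x in range(3) for y in range(3)]
a, b, c = fit_local_plane(events)
speed = 1.0 / math.hypot(a, b)  # flow amplitude from the spatial time gradient
```

The gradient (a, b) gives the flow orientation, mirroring at toy scale the surface-of-coactive-events construction the abstract describes.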
4. Many cancer studies can’t be replicated.
- Author
-
HAELLE, TARA
- Subjects
CANCER research, REPRODUCIBLE research, ACQUISITION of data, EFFECT sizes (Statistics), RESEARCH ethics - Abstract
The article discusses the Reproducibility Project: Cancer Biology study which aimed to replicate 193 experiments from 53 top cancer papers published from 2010 to 2012. Topics include the replication problem with cancer research identified in the study, the reasons researchers could not complete the majority of experiments such as not getting enough information from the original papers, and the criteria used by the research team to determine if a replication was successful such as effect size.
- Published
- 2022
5. Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks.
- Author
-
Cerutti, Gianmarco, Cavigelli, Lukas, Andri, Renzo, Magno, Michele, Farella, Elisabetta, and Benini, Luca
- Subjects
FEATURE extraction, ENERGY consumption, DEEP learning, ACQUISITION of data, MICROCONTROLLERS - Abstract
Keyword spotting (KWS) is a crucial function enabling the interaction with the many ubiquitous smart devices in our surroundings, either activating them through wake-word or directly as a human-computer interface. For many applications, KWS is the entry point for our interactions with the device and, thus, an always-on workload. Many smart devices are mobile and their battery lifetime is heavily impacted by continuously running services. KWS and similar always-on services are thus the focus when optimizing the overall power consumption. This work addresses KWS energy-efficiency on low-cost microcontroller units (MCUs). We combine analog binary feature extraction with binary neural networks. By replacing the digital preprocessing with the proposed analog front-end, we show that the energy required for data acquisition and preprocessing can be reduced by 29×, cutting its share from a dominating 85% to a mere 16% of the overall energy consumption for our reference KWS application. Experimental evaluations on the Speech Commands Dataset show that the proposed system outperforms state-of-the-art accuracy and energy efficiency, respectively, by 1% and 4.3× on a 10-class dataset while providing a compelling accuracy-energy trade-off including a 2% accuracy drop for a 71× energy reduction. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
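The binary-network half of this design owes its efficiency to a well-known arithmetic trick: a dot product of two {-1, +1} vectors packed into machine words reduces to XNOR plus popcount. A minimal sketch of that trick (the bit-packing convention here is an illustrative assumption, not the paper's kernel):

```python
def pack(vec):
    # pack a {-1, +1} vector into a bit mask: bit i set means element i is +1
    return sum(1 << i for i, v in enumerate(vec) if v > 0)

def binary_dot(a_bits, w_bits, n):
    # XNOR counts matching signs; each match contributes +1, each mismatch -1
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

a = [1, -1, 1, 1]
w = [-1, -1, 1, -1]
assert binary_dot(pack(a), pack(w), len(a)) == sum(x * y for x, y in zip(a, w))
```

On an MCU the same idea runs one word (32 or 64 sign bits) per XOR and popcount instruction, which is where the energy savings of binarized layers come from.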
6. A Nonlinear Semantic-Preserving Projection Approach to Visualize Multivariate Periodical Time Series.
- Author
-
Blanchart, Pierre and Depecker, Marine
- Subjects
DIMENSIONAL reduction algorithms, DIMENSION reduction (Statistics), DATA acquisition systems, ACQUISITION of data, DATA management - Abstract
A major drawback of nonlinear dimensionality reduction (DR) techniques is their inability to preserve some authentic information from the source domain, leading to projections that are often hard to interpret when it comes to observing anything other than the topological structure of the data. In this paper, we propose a nonlinear DR approach enforcing projection constraints resulting from an a priori knowledge about the structure of the data in multivariate periodical time series. We then propose several ways of exploiting this constrained projection to extract user-relevant information, such as the nominal behavior of a periodical dynamical system or the deviant behaviors which may occur at different time scales. The techniques are demonstrated on both a synthetic dataset composed of simulated multivariate data exhibiting a periodical behavior, and a real dataset corresponding to six months of sensor data acquisitions and recordings inside experimental buildings.
We would like to thank the Institut National de l'Energie Solaire (INES) and the CEA, LITEN, Laboratoire Energétique du Bâtiment for providing the data. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
7. Yahoo concerned that release of redacted FISA papers may mislead.
- Author
-
Ribeiro, John
- Subjects
ACQUISITION of data, ELECTRONIC surveillance - Abstract
The article reports that internet company Yahoo! Inc. has asked the U.S. Foreign Intelligence Surveillance Court (FISC) to release redacted documents. It states that the company believes the documents, which show that it resisted data collection by the U.S. National Security Agency (NSA), could mislead people if released in redacted form. The company highlights that disclosure of the information would reveal that Yahoo objected throughout the proceedings.
- Published
- 2013
8. Decentralized privacy-preserving truth discovery for crowd sensing.
- Author
-
Xiong, Ping, Li, Guirong, Liu, Hengzhu, Hu, Yiyi, and Jin, Dawei
- Subjects
CROWDSENSING, DISTRIBUTED databases, INTERNET privacy, DATA privacy, ACQUISITION of data, COLLUSION - Abstract
Truth discovery is an efficient technique for tackling data conflict problems in crowd sensing for distributed data collection. As the sensory data to be collected may include sensitive information about users, privacy-preserving truth discovery has attracted significant attention in recent years. Most existing studies apply a centralized architecture based on a cryptographic system, which may be vulnerable to single-point attacks and also has a very high computational cost. In this paper, we propose DPriTD, a decentralized privacy-preserving framework for truth discovery in crowd sensing. The proposed approach leverages the additively homomorphic property of Shamir's Secret Sharing scheme to protect users' privacy. DPriTD provides a strict privacy guarantee for crowd sensing applications. Because each sensitive data point, treated as a secret, is split into a batch of shares and cannot be recovered unless a sufficient number of shares are aggregated, DPriTD achieves effective truth discovery while protecting sensitive data from collusion attacks. Furthermore, DPriTD is independent of a centralized server and can perform reliably when not all participants are online in real time. It thus enhances the robustness of a crowd sensing system. Extensive experiments conducted on real-world datasets demonstrate the high performance of our method compared with existing mechanisms. • DPriTD is a decentralized privacy-preserving framework for truth discovery in crowd sensing. • DPriTD leverages the additively homomorphic property of Shamir's Secret Sharing scheme to protect users' privacy. • DPriTD achieves effective truth discovery while protecting sensitive data from collusion attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
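The additively homomorphic property of Shamir's scheme that the abstract above relies on can be demonstrated in a few lines: summing shares pointwise yields valid shares of the summed secrets, so an aggregator can recover the sum without ever seeing an individual value. A self-contained sketch (the field modulus and parameters are illustrative choices, not the paper's):

```python
import random

P = 2_147_483_647  # prime field modulus (illustrative choice)

def make_shares(secret, threshold, n):
    # random polynomial of degree threshold-1 with constant term = secret
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

# Additive homomorphism: pointwise sums of shares are shares of the sum.
secrets = [42, 7, 100]
all_shares = [make_shares(s, threshold=3, n=5) for s in secrets]
summed = [(x, sum(sh[k][1] for sh in all_shares) % P)
          for k, (x, _) in enumerate(all_shares[0])]
assert reconstruct(summed[:3]) == sum(secrets) % P
```

Fewer than `threshold` shares reveal nothing about a secret, which is the collusion-resistance property the abstract refers to.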
9. An industrial virus propagation model based on SCADA system.
- Author
-
Zhu, Qingyi, Zhang, Gang, Luo, Xuhang, and Gan, Chenquan
- Subjects
SUPERVISORY control & data acquisition systems, SUPERVISORY control systems, COMPUTER network security, ZIKA Virus Epidemic, 2015-2016, ACQUISITION of data, VIRAL transmission - Abstract
Supervisory control and data acquisition (SCADA) systems are widely used in the national infrastructure. As more and more SCADA systems are connected to the Internet, they face serious network security threats, especially attacks from industrial viruses. This paper aims to propose a novel mathematical model of industrial virus propagation over SCADA systems. Then, the existence and stability of the equilibrium point of this model are studied. To better control the spread of this virus with limited resources, an optimal control system is established for the model. The existence of optimal control is then proved, and the corresponding optimal control system is derived. Numerical simulation results show that our proposed control method can effectively control the spread of industrial viruses. • A novel industrial virus propagation model based on SCADA system. • The local and global dynamic behaviors are fully studied. • The optimal control strategy to suppress malware propagation is obtained. • Numerical simulations are given to verify the main results. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
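Propagation models of this kind are typically compartmental. As a rough illustration of how such a system is simulated and how its equilibrium behaves — this is a generic susceptible-infected (SIS-style) sketch with invented parameters, not the paper's actual model — forward-Euler integration drives the infected fraction to the endemic equilibrium whenever the infection rate exceeds the recovery rate:

```python
# dS/dt = -beta*S*I + gamma*I,  dI/dt = beta*S*I - gamma*I  (fractions, S+I=1)
beta, gamma = 0.5, 0.1   # illustrative infection / recovery rates
S, I = 0.99, 0.01        # almost everyone starts susceptible
dt = 0.01
for _ in range(20000):   # forward-Euler integration out to t = 200
    dS = -beta * S * I + gamma * I
    dI = beta * S * I - gamma * I
    S, I = S + dt * dS, I + dt * dI

endemic = 1 - gamma / beta   # predicted equilibrium infected fraction (0.8)
```

Optimal control in models like this typically makes beta or gamma time-varying control inputs and minimizes a cost over the trajectory; the simulation above only shows the uncontrolled equilibrium.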
10. Should data ever be thrown away? Pooling interval-censored data sets with different precision.
- Author
-
Tretiak, Krasymyr and Ferson, Scott
- Subjects
DATA quality, SAMPLE size (Statistics), EPISTEMIC uncertainty, ACQUISITION of data, DATA analysis - Abstract
Data quality is an important consideration in many engineering applications and projects. Data collection procedures do not always involve careful utilization of the most precise instruments and strictest protocols. As a consequence, data are invariably affected by imprecision and by sometimes sharply varying levels of quality. Different mathematical representations of imprecision have been suggested, including a classical approach to censored data which is considered optimal when the proposed error model is correct, and a weaker approach called interval statistics based on partial identification that makes fewer assumptions. Maximizing the quality of statistical results is often crucial to the success of many engineering projects, and a natural question that arises is whether data of differing qualities should be pooled together, or whether only precise measurements should be included and imprecise data disregarded. Some worry that combining precise and imprecise measurements can depreciate the overall quality of the pooled data. Some fear that excluding data of lesser precision can increase their overall uncertainty about results because lower sample size implies more sampling uncertainty. This paper explores these concerns and describes simulation results that show when it is advisable to combine fairly precise data with rather imprecise data by comparing analyses using different mathematical representations of imprecision. Pooling data sets is preferred when the low-quality data set does not exceed a certain level of uncertainty. However, so long as the data are random, it may be legitimate to reject the low-quality data if its reduction of sampling uncertainty does not counterbalance the effect of its imprecision on the overall uncertainty. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Beyond the Repository: Best practices for open source ecosystems researchers.
- Author
-
CASARI, AMANDA, FERRAIOLI, JULIA, and LOVATO, JUNIPER
- Subjects
OPEN source software, RESEARCH personnel, RESEARCH ethics, ACQUISITION of data, OPEN data movement, DATA privacy, INFORMED consent (Law) - Abstract
This article details best practices for open source ecosystems research to uphold the integrity of ecosystems. The article details nine best practices as a guide for researchers working with ecosystems with an emphasis on ethics and respect. Topics include understanding and adhering to information usage policies, data collection methods, and collaboration with the communities involved with these ecosystems.
- Published
- 2023
- Full Text
- View/download PDF
12. Library Technology REPORTS.
- Author
-
Glowacka-Musial, Monika
- Subjects
PROGRAMMING languages, HISTORICAL source material, COLLECTION agencies, ACQUISITION of data, LIBRARIES - Abstract
Since the 1990s, libraries have invested in developing digital collections and online services to provide access to historical sources. One way to inspire users to actively engage with these materials is by creating visual contexts for the materials. These visuals provide an overview of a collection's content and inspire users to experiment with the collection's data for various purposes, including research. This issue of Library Technology Reports (vol. 57, no. 1) presents an approach that views digital collections as data that can be mined, analyzed, and visualized by means of the R programming language. R is open source, relatively easy to learn, and supported by an established community of coders. The selection of plots presented in the report includes R scripts, fragments of data tables, and some explanation of the R code used to create the plots. [ABSTRACT FROM AUTHOR]
- Published
- 2021
13. An accurate and dynamic predictive model for a smart M-Health system using machine learning.
- Author
-
Naseer Qureshi, Kashif, Din, Sadia, Jeon, Gwanggil, and Piccialli, Francesco
- Subjects
CLOUD storage, MACHINE learning, MOBILE health, PREDICTION models, ACQUISITION of data, DYNAMIC models - Abstract
• Emerging Mobile Health systems are examples of novel technologies. • Data are collected from sensor nodes and forwarded to local databases. • From cloud computing services, the data are collected for further analysis. • This paper presents a detailed overview of M-Health systems. • We propose a secure Android-based architecture to collect patient data. Nowadays, new highly-developed technologies are changing traditional processes related to medical and healthcare systems. Emerging Mobile Health (M-Health) systems are examples of novel technologies based on advanced data communication, deep learning, artificial intelligence, cloud computing, big data, and other machine learning methods. Data are collected from sensor nodes and forwarded to local databases through new technologies that enable cellular networks and then store the information in cloud storage systems. From cloud computing services or medical centres, the data are collected for further analysis. Furthermore, machine learning techniques are being used for accurate prediction of disease analysis and for purposes of classification. This paper presents a detailed overview of M-Health systems, their model and architecture, technologies and applications and also discusses statistical and machine learning approaches. We also propose a secure Android-based architecture to collect patient data and a reliable cloud-based model for data storage. Finally, a predictive model able to classify cardiovascular diseases according to their seriousness will be discussed. Moreover, the proposed prediction model has been compared with existing models in terms of accuracy, sensitivity, and specificity. The experimental results show encouraging results in terms of the proposed predictive model for an M-Health system. Keywords: Machine Learning, Predictive Models, M-Health, Classification, SVM, Decision Tree, Accuracy [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
14. The optimal upper bound of the number of queries for Laplace mechanism under differential privacy.
- Author
-
Li, Xiaoguang, Li, Hui, Zhu, Hui, and Huang, Muyang
- Subjects
INFORMATION theory, BIG data, ACQUISITION of data, PROBABILITY theory, COMPUTER systems - Abstract
Differential privacy is a state-of-the-art technology for privacy preservation, and the Laplace mechanism is a simple and powerful tool to realize differential privacy. However, there is an obvious flaw in differential privacy: each query function can only be executed a finite number of times, because an adversary who executes the same query function many times can recover the real query result. Unfortunately, how to set the upper bound for the number of linear queries is still an open issue. In this paper, we focus on the linear query function in Laplace-based mechanisms, and we propose a method to set the upper bound for the number of linear queries from the perspective of information theory. The main idea is that we first find the most aggressive linear query, the one that leaks the maximum information about the dataset to adversaries, and then set the upper bound on the number of queries so that even the most aggressive linear query cannot leak the whole self-information about any individual to the adversary. On the other hand, the number of queries is also influenced by the type of dataset (continuous or discrete). In this paper, we also discuss the different upper bounds on the number of queries for continuous datasets and discrete datasets. Finally, we conduct experiments on a continuous dataset and a discrete dataset to validate our result. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
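The flaw the abstract above describes — repeated queries averaging away the noise — is easy to reproduce. The sketch below implements the standard Laplace mechanism (the dataset, epsilon, and function names are illustrative) and shows that the mean of many noisy answers converges back to the true value, which is why a query-count bound is needed:

```python
import math
import random

def laplace_noise(scale, rng):
    # inverse-CDF sampling of a zero-mean Laplace variate
    u = rng.random() - 0.5
    return scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
true_count, epsilon = 100.0, 0.5
answers = [private_count(true_count, epsilon, rng) for _ in range(5000)]
estimate = sum(answers) / len(answers)   # averaging attack across repeats
```

With noise scale 2 each single answer is well masked, but `estimate` sits very close to 100: the attacker's standard error shrinks like 1/sqrt(k) with the number of repeats k.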
15. Physical unclonable functions based secret keys scheme for securing big data infrastructure communication.
- Author
-
Farha, Fadi, Ning, Huansheng, Liu, Hong, Yang, Laurence T., and Chen, Liming
- Subjects
BIG data, COMMUNICATION, INTERNET of things, ACQUISITION of data, BLOCKCHAINS - Abstract
Internet of Things (IoT) is expanding rapidly and so is the number of devices, sensors and actuators joining this world. IoT devices are an important part of the data collection process in Big Data systems, so by protecting them we support and improve the security of the whole system. ZigBee is a secure communication system for the underlying Internet of Things (IoT) infrastructure. Even though ZigBee has a strong security stack built on a variety of secret keys, ZigBee devices are vulnerable to side-channel and key extraction attacks. Due to their low cost and limited resources, most ZigBee devices store their secret keys in plaintext. In this paper, we focus on protecting the storage of ZigBee secret keys and show how Physical Unclonable Functions (PUFs) can help ZigBee devices become robust and tamper-resistant against physical attacks. The proposed schemes include PUF-based key storage protection and key generation. The experiments in this paper were done using SRAM-PUF. Furthermore, two algorithms were proposed to overcome the defects in the randomness of keys generated using SRAM-PUF and, at the same time, to increase the reliability of these keys. We were able to significantly improve the hardware security of ZigBee end devices (ZEDs) by protecting their keying materials using low-cost, highly secure, random, stable and volatile PUF-based secret keys. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
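One standard way to raise the reliability of SRAM-PUF responses — in the spirit of the reliability-improving algorithms the abstract mentions, though not necessarily the paper's exact scheme — is temporal majority voting over repeated power-up readouts. A sketch with simulated 2% per-bit readout noise:

```python
import random

def majority_vote(readouts):
    # stabilize a noisy SRAM-PUF response by bitwise majority over readouts
    n = len(readouts[0])
    return [int(sum(r[i] for r in readouts) > len(readouts) / 2)
            for i in range(n)]

random.seed(7)
true_response = [random.randint(0, 1) for _ in range(64)]  # ideal power-up state
readouts = [[b ^ (random.random() < 0.02) for b in true_response]
            for _ in range(15)]                            # 2% per-bit flips
recovered = majority_vote(readouts)
```

With 15 readouts and 2% noise, the chance that any bit is flipped in 8 or more readouts is negligible, so the vote recovers the reference response; real designs pair this with an error-correcting fuzzy extractor.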
16. FastTrack: Efficient and Precise Dynamic Race Detection.
- Author
-
Flanagan, Cormac and Freund, Stephen N.
- Subjects
PARALLEL programs (Computer programs), VECTOR analysis, DATA analysis, ACQUISITION of data, COMPUTER programming, ALGORITHM research - Abstract
Multithreaded programs are notoriously prone to race conditions. Prior work developed precise dynamic race detectors that never report false alarms. However, these checkers employ expensive data structures, such as vector clocks (VCs), that result in significant performance overhead. This paper exploits the insight that the full generality of VCs is not necessary in most cases. That is, we can replace VCs with an adaptive lightweight representation that, for almost all operations of the target program, requires constant space and supports constant-time operations. Experimental results show that the resulting race detection algorithm is over twice as fast as prior precise race detectors, with no loss of precision. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
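FastTrack's key observation can be sketched in miniature: most happens-before checks compare a single last-access epoch (clock, tid) against a vector clock, an O(1) test, instead of performing a full O(n) vector-clock comparison. The names below are illustrative, not the paper's implementation:

```python
def epoch_leq(epoch, vc):
    # O(1) happens-before test: has the observing thread's view advanced
    # past the clock at which the epoch's thread made its access?
    clock, tid = epoch
    return clock <= vc[tid]

def vc_leq(vc1, vc2):
    # the O(n) comparison that FastTrack avoids in the common case
    return all(a <= b for a, b in zip(vc1, vc2))

last_write = (3, 0)       # thread 0 wrote at its logical clock 3
reader_vc = [5, 1]        # thread 1 has synchronized past that write
assert epoch_leq(last_write, reader_vc)      # ordered: no race reported
stale_vc = [2, 1]
assert not epoch_leq(last_write, stale_vc)   # unordered: a data race
```

The detector falls back to full vector clocks only for the rarer read-shared cases, which is where the "adaptive lightweight representation" of the abstract comes from.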
17. Federated stochastic configuration networks for distributed data analytics.
- Author
-
Dai, Wei, Ji, Langlong, and Wang, Dianhui
- Subjects
DATA security, COLLABORATIVE learning, CLOUD storage, PRIVACY, ACQUISITION of data - Abstract
Stochastic configuration networks (SCNs), as a class of randomized learning models, are incrementally built under a supervisory mechanism, and theoretically ensure error-free learning for training sets. This paper proposes a federated version of SCNs (FSCNs) for large-scale data, which are geographically distributed among different end-user clients with non-shareable data due to privacy and security concerns. Unlike centralized learning that needs to collect data from clients and store them collectively on a cloud server, FSCNs enable distributed analytics in a collaborative learning paradigm without centrally aggregating new data, thereby preventing the leakage of private information. Considering different supervisory and aggregate schemes of model parameters, two FSCN algorithms with two aggregate strategies are presented. Experimental results on both data regression and classification show the effectiveness and feasibility of our proposed federated learning scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
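The paper's aggregation of SCN parameters is specific to its supervisory mechanism, but the underlying federated pattern — combining locally fitted parameters without moving raw data, typically weighted by each client's sample count — can be sketched generically (function and variable names are invented for illustration):

```python
def federated_average(client_params, client_sizes):
    # weight each client's parameter vector by its local sample count;
    # only parameters travel to the aggregator, never raw data
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(dim)]

# Two clients with locally fitted parameter vectors and unequal data sizes.
params = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
avg = federated_average(params, sizes)  # [2.5, 3.5]
```

A client holding more data pulls the aggregate toward its local fit, which is the usual design choice when client datasets are of very different sizes.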
18. A structure noise-aware tensor dictionary learning method for high-dimensional data clustering.
- Author
-
Yang, Jing-Hua, Chen, Chuan, Dai, Hong-Ning, Fu, Le-Le, and Zheng, Zibin
- Subjects
DOCUMENT clustering, DATA scrubbing, RANDOM noise theory, GAUSSIAN distribution, DATA mining, ACQUISITION of data - Abstract
With the development of data acquisition technology, high-dimensional data clustering is an important yet challenging task in data mining. Despite advances achieved by current clustering methods, they can be further improved. First, many of them usually unfold the high-dimensional data into a large matrix, consequently resulting in destroying the intrinsic structural property. Second, some methods assume that the noise in the dataset conforms to a predefined distribution (e.g., the Gaussian or Laplacian distribution), which violates real-world applications and eventually decreases the clustering performance. To address these issues, in this paper, we propose a novel tensor dictionary learning method for clustering high-dimensional data with the coexistence of structure noise. We adopt tensors, the natural and powerful tools for the generalizations of vectors and matrices, to characterize high-dimensional data. Meanwhile, to depict the noise accurately, we decompose the observed data into clean data, structure noise, and Gaussian noise. Furthermore, we use low-rank tensor modeling to characterize the inherent correlations of clean data and adopt tensor dictionary learning to adaptively and accurately describe the structure noise instead of using the predefined distribution. We design the proximal alternating minimization algorithm to solve the proposed model with the theoretical convergence guarantee. Experimental results on both simulated and real datasets show that the proposed method outperforms the compared methods for high-dimensional data clustering. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. Hybrid Sampling-Based Clustering Ensemble With Global and Local Constitutions.
- Author
-
Yang, Yun and Jiang, Jianmin
- Subjects
DOCUMENT clustering, BOOSTING algorithms, BOOTSTRAP aggregation (Algorithms), ACQUISITION of data, FACE perception - Abstract
Among a number of ensemble learning techniques, boosting and bagging are the most popular sampling-based ensemble approaches for classification problems. Boosting is considered stronger than bagging on noise-free data set with complex class structures, whereas bagging is more robust than boosting in cases where noise data are present. In this paper, we extend both ensemble approaches to clustering tasks, and propose a novel hybrid sampling-based clustering ensemble by combining the strengths of boosting and bagging. In our approach, the input partitions are iteratively generated via a hybrid process inspired by both boosting and bagging. Then, a novel consensus function is proposed to encode the local and global cluster structure of input partitions into a single representation, and applies a single clustering algorithm to such representation to obtain the consolidated consensus partition. Our approach has been evaluated on 2-D-synthetic data, collection of benchmarks, and real-world facial recognition data sets, which show that the proposed technique outperforms the existing benchmarks for a variety of clustering tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
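The consensus function described above encodes the cluster structure of many input partitions into a single representation. A common encoding for this step (illustrative of the general idea, not necessarily the authors' exact function) is the co-association matrix, whose entry (i, j) is the fraction of input partitions that place points i and j in the same cluster:

```python
def coassociation(partitions):
    # entry (i, j): fraction of input partitions co-clustering points i and j
    n = len(partitions[0])
    M = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                M[i][j] += labels[i] == labels[j]
    return [[v / len(partitions) for v in row] for row in M]

# Two bootstrap-style input partitions of four points.
parts = [[0, 0, 1, 1], [0, 0, 0, 1]]
M = coassociation(parts)
```

Running a single clustering algorithm on 1 − M as a distance matrix then yields the consolidated consensus partition, matching the final step the abstract describes.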
20. Tensor nonconvex unified prior for tensor recovery.
- Author
-
Wu, Yumo, Sun, Jianing, and Yin, Junping
- Subjects
PRINCIPAL components analysis, INVERSE problems, ACQUISITION of data, PRIOR learning, NOISE - Abstract
Tensor data, such as hyperspectral images and multi-frame videos, have gained significant attention in practical applications. However, the inherent degradation phenomena during data acquisition, including noise and missing pixels, give rise to a series of ill-posed inverse problems that need to be addressed. Currently, the rational exploration of prior knowledge for tensor recovery, including global low-rankness and local smoothness, has emerged as a common concern. Inspired by recent notable works, this paper proposes a novel tensor non-convex unified prior term, which employs the weighted tensor Schatten p-norm as a rank surrogate function in the gradient domain. The new prior can yield a regularizer that effectively captures low-rankness and smoothness, and is applied to tensor completion and tensor robust principal component analysis models. An efficient algorithm is developed by using the alternating direction method of multipliers and its convergence analysis is also provided. Extensive experimental results demonstrate that the proposed method outperforms the state-of-the-art methods, particularly in cases of high missing rates and strong noise levels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Topology-Based Clustering Using Polar Self-Organizing Map.
- Author
-
Xu, Lu, Chow, Tommy W. S., and Ma, Eden W. M.
- Subjects
SELF-organizing maps, DOCUMENT clustering, TOPOLOGY, REPRESENTATION theory, ACQUISITION of data - Abstract
Cluster analysis of unlabeled data sets has been recognized as a key research topic in a variety of fields. In many practical cases, no a priori knowledge is specified; for example, the number of clusters is unknown. In this paper, grid clustering based on the polar self-organizing map (PolSOM) is developed to automatically identify the optimal number of partitions. The data topology consisting of both the distance and density is exploited in the grid clustering. The proposed clustering method also provides a visual representation as PolSOM allows the characteristics of clusters to be presented as a 2-D polar map in terms of the data feature and value. Experimental studies on synthetic and real data sets demonstrate that the proposed algorithm provides higher clustering accuracy and lower computational cost compared with six conventional methods. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
- Full Text
- View/download PDF
22. Robust Self-Triggered Coordination With Ternary Controllers.
- Author
-
De Persis, Claudio and Frasca, Paolo
- Subjects
ROBUST control, COMMUNICATION policy, INFORMATION theory, ACQUISITION of data, HYBRID systems, CONTROL theory (Engineering) - Abstract
This paper regards the coordination of networked systems, studied in the framework of hybrid dynamical systems. We design a coordination scheme which combines the use of ternary controllers with a self-triggered communication policy. The communication policy requires the agents to measure, at each sampling time, the difference between their states and those of their neighbors. The collected information is then used to update the control and determine the following sampling time. We show that the proposed scheme ensures finite-time convergence to a neighborhood of a consensus state: the coordination scheme does not require the agents to share a global clock, but allows them to rely on local clocks. We then study the robustness of the proposed self-triggered coordination system with respect to skews in the agents' local clocks, to delays, and to limited precision in communication. Furthermore, we present two significant variations of our scheme. First, assuming a global clock to be available, we design a time-varying controller which asymptotically drives the system to consensus. The assumption of a global clock is then discussed, and relaxed to a certain extent. Second, we adapt our framework to a communication model in which each agent polls its neighbors separately, instead of polling all of them simultaneously. This communication policy actually leads to a self-triggered “gossip” coordination system. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
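The ternary-controller idea above can be illustrated with a two-agent toy simulation. This is a periodic-sampling simplification — the paper's self-triggered computation of the next sampling time, clock skews, and delays are all omitted — with invented parameters, but it shows the characteristic behavior: finite-time convergence to a neighborhood of consensus, after which the controls switch off:

```python
def tern(z, eps):
    # ternary controller: -1, 0, or +1 depending on measured disagreement
    return 0 if abs(z) <= eps else (1 if z > 0 else -1)

# Two agents; each samples its neighbor's state and applies a ternary control.
x = [0.0, 10.0]
eps, dt = 0.5, 0.05
for _ in range(2000):
    u = [tern(x[1] - x[0], eps), tern(x[0] - x[1], eps)]
    x = [xi + dt * ui for xi, ui in zip(x, u)]
```

Once the disagreement falls within the deadband eps, both controls are zero and the states stop moving, so the agents settle in a neighborhood of consensus rather than at exact consensus — the trade the abstract describes.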
23. The Lean Data Scientist: Recent Advances toward Overcoming the Data Bottleneck: A taxonomy of the methods used to obtain quality datasets enhances existing resources.
- Author
-
SHANI, CHEN, ZARECKI, JONATHAN, and SHAHAF, DAFNA
- Subjects
BOTTLENECKS (Manufacturing), ACQUISITION of data, MACHINE learning, DATA science, DATA augmentation, BIG data - Abstract
The article offers insights on how to overcome the "data bottleneck" of obtaining data for machine-learning (ML) applications. Particular focus is given to a comprehensive taxonomy of ways to tackle this "data bottleneck." Methods discussed include dataset repurposing (using a preexisting dataset for a different task than it was originally constructed for), data augmentation (artificial inflation of the training set through the application of modifications) and multimodal learning (attempts to enrich the input to the learning algorithm).
- Published
- 2023
- Full Text
- View/download PDF
24. An influence analysis of diversity and collective cardinality on collective performance.
- Author
-
Nguyen, Van Du and Nguyen, Ngoc Thanh
- Subjects
SWARM intelligence, ACQUISITION of data, COMPUTER performance, INFORMATION theory, COMPUTATIONAL complexity - Abstract
This paper presents a general framework to demonstrate the prominent role of diversity in the effectiveness of collective performance. There appears to be ample evidence that diversity is one of the essential criteria for a collective to be intelligent. Intuitively, a collective involving diverse individuals may add new information, new perspectives, and so forth on the problem that needs to be solved. Moreover, the diversity of individual solutions to the given problem has been proven helpful in eliminating the phenomenon of correlated errors. The objective of the paper is to investigate the influence of the latter kind of diversity on the collective performance by taking into account the collective cardinality. Our findings confirm the positive impact of diversity on collective performance. In particular, collectives with higher diversity levels lead to better collective performance. Subsequently, expanding the collective cardinality in a way that increases its diversity will also be positively associated with the collective performance. With some restrictions, the hypothesis "the more diverse the collective, the higher the collective performance" is formally proved. Furthermore, the conditions under which increasing the cardinality of a collective will cause its diversity to be increased (or decreased) are worked out. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. A trust active and Trace back based trust Management system about effective data collection for mobile IoT services.
- Author
-
Zhang, Rui, Liu, Anfeng, Wang, Tian, Xiong, Neal N., and Vasilakos, Athanasios V.
- Subjects
- *
TRUST , *ACQUISITION of data , *TELECOMMUNICATION satellites , *DRONE aircraft , *INTERNET of things - Abstract
Mobile Crowdsensing (MCS) is a widely applicable and inexpensive data-obtaining method that leverages mobile devices to sense and report data without deploying sensors. The rapid development of integrated IoT systems, as well as the widespread use of communication satellites, has allowed MCS to develop further and receive extensive attention from researchers. One key issue is how to ensure the reliability of the reported data, so that the presence of malicious participants does not lead to poor IoT service quality. In this paper, we propose an Active and Trace Back based Trust Management (ATBTM) algorithm to evaluate the trust of participants and handle malicious ones. The main innovations are: (1) An active trust evaluation approach for MCS is proposed, which uses an Unmanned Aerial Vehicle (UAV) to collect data as a baseline to verify the reliability of the data reported by participants. The data reported by high-reputation participants can then also be used as a sub-baseline to verify the trust of other participants. (2) A traceback based trust evaluation method is also proposed. In this approach, some reliable devices provide historical sensing data that has been collected but not reported. The system then compares it to real data with corresponding timestamps to evaluate the trust of other participants. Thorough theoretical analysis and experimental results demonstrate that the proposed ATBTM framework can effectively identify malicious workers and overcome the lack of trust evaluation methods in existing MCS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games.
- Author
-
Li, Jinna, Modares, Hamidreza, Chai, Tianyou, Lewis, Frank L., and Xie, Lihua
- Subjects
REINFORCEMENT learning ,ACQUISITION of data - Abstract
This paper develops an off-policy reinforcement learning (RL) algorithm to solve optimal synchronization of multiagent systems. This is accomplished by using the framework of graphical games. In contrast to traditional control protocols, which require complete knowledge of agent dynamics, the proposed off-policy RL algorithm is a model-free approach, in that it solves the optimal synchronization problem without requiring any knowledge of the agent dynamics. A prescribed control policy, called the behavior policy, is applied to each agent to generate and collect data for learning. An off-policy Bellman equation is derived for each agent to learn the value function for the policy under evaluation, called the target policy, and simultaneously find an improved policy. Actor and critic neural networks, along with a least-squares approach, are employed to approximate target control policies and value functions using the data generated by applying the prescribed behavior policies. Finally, an off-policy RL algorithm is presented that is implemented in real time and gives the approximate optimal control policy for each agent using only measured data. It is shown that the optimal distributed policies found by the proposed algorithm satisfy the global Nash equilibrium and synchronize all agents to the leader. Simulation results illustrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
27. Cox model and decision trees: an application to breast cancer data.
- Author
-
Cardoso Pereira, Lucas, Silva, Sóstenes Jerônimo da, Romualdo Fidelis, Cleanderson, de Lima Brito, Alisson, Alves Xavier Júnior, Silvio Fernando, dos Santos Andrade, Lorena Sofia, Casé de Oliveira, Milena Edite, and Almeida de Oliveira, Tiago
- Subjects
- *
BREAST cancer prognosis , *BREAST tumor treatment , *DECISION trees , *MATHEMATICAL statistics , *PARAMETERS (Statistics) , *SPECIALTY hospitals , *HORMONE therapy , *TIME , *RETROSPECTIVE studies , *ACQUISITION of data , *REGRESSION analysis , *MOLECULAR pathology , *CANCER patients , *TREATMENT effectiveness , *CANCER treatment , *COMPARATIVE studies , *RISK assessment , *MEDICAL records , *KAPLAN-Meier estimator , *DESCRIPTIVE statistics , *RESEARCH funding , *STATISTICAL models , *BREAST tumors , *PROPORTIONAL hazards models , *LONGITUDINAL method , *HORMONE receptor positive breast cancer , *IMMUNOTHERAPY ,MORTALITY risk factors - Abstract
Objective. To evaluate, using semiparametric methodologies of survival analysis, the relationship between covariates and time to death of patients with breast cancer, as well as to determine the discriminatory power of a conditional inference tree for these patients. Methods. A retrospective cohort study was conducted using data collected from medical records of women who had breast cancer and underwent treatment between 2005 and 2015 at the Hospital da Fundação de Assistencial da Paraíba in Campina Grande, State of Paraíba, Brazil. Survival curves were estimated using the Kaplan-Meier method, Cox regression, and a conditional decision tree. Results. Women with the triple-negative molecular subtype had a shorter survival time compared to women with positive hormone receptors. The addition of hormone therapy reduced the risk of a patient dying by 5.5%, and the risk of a HER2-positive patient dying was 34.5% lower compared to those who were negative for this gene. Patients undergoing hormone therapy had a median survival time of 4,753 days. Conclusions. This paper shows a favorable scenario for the use of immunotherapy for patients with HER2 overexpression. Further studies could assess the effectiveness of immunotherapy in patients with other conditions, to improve prognosis and quality of life for the patient. [ABSTRACT FROM AUTHOR]
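The Kaplan-Meier method cited above is the product-limit estimator: at each observed death time, multiply the running survival probability by the fraction of at-risk patients who survive that time. A minimal pure-Python sketch with illustrative data (not the study's records):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.  times: observed times (death or
    censoring); events: 1 = death observed, 0 = censored.  Returns a list
    of (time, survival probability) pairs, one per distinct death time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # pool ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk   # product-limit step
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Illustrative data: five patients, one censored observation at t = 12.
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 1, 0, 1, 1])
```

Censored patients leave the risk set without triggering a step down, which is what distinguishes the estimator from a naive empirical survival fraction.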
- Published
- 2022
- Full Text
- View/download PDF
28. Action needed on breastfeeding data collection to gauge medicine impact.
- Subjects
BREASTFEEDING ,ACQUISITION of data ,LIFE sciences - Abstract
2023 MAY 11 (NewsRx) -- By a News Reporter-Staff News Editor at Women's Health Weekly -- A new systematic review from Swansea University in collaboration with ConcePTION, an Innovative Medicines Initiative (IMI), has called for action on breastfeeding data collection. In the paper, which has been published in PLOS ONE, researchers identified only 10 established databases reporting on breastfeeding, medicines, and infant outcomes: none reported education outcomes. Keywords for this news article include: Pregnancy, Epidemiology, Breastfeeding, Women's Health, Data Acquisition, Swansea University, Health and Medicine, Risk and Prevention, Information Technology. [Extracted from the article]
- Published
- 2023
29. HOW TO BUILD A Desk Statistics Tracker IN LESS THAN AN HOUR USING FORMS IN GOOGLE DOCS.
- Author
-
Carter, Sunshine and Ambrosi, Thomas
- Subjects
ACADEMIC libraries ,INTERNET ,ONLINE information services ,LIBRARY reference services ,STATISTICS ,SEARCH engines ,ACQUISITION of data - Abstract
The article discusses the University of Minnesota-Duluth's (UMD) development of a desk statistics tracking tool for use in its library. An overview of the paper-based method of collecting daily usage statistics at the library is provided. A discussion of the reference team's attempt to create its own reference tracker using Microsoft Access is given. UMD then used Forms in the UM Google Docs environment, which allows data entered through a form to populate a spreadsheet.
- Published
- 2011
30. Consensus algorithms for biased labeling in crowdsourcing.
- Author
-
Zhang, Jing, Sheng, Victor S., Li, Qianmu, Wu, Jian, and Wu, Xindong
- Subjects
- *
ALGORITHMS , *CROWDSOURCING , *ACQUISITION of data , *ADAPTIVE control systems , *EXPERT systems - Abstract
Although it has become an accepted lay view that non-expert annotators often exhibit biases when labeling objects through crowdsourcing systems, this argument lacks sufficient evidential observation and systematic empirical study. This paper initially analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency for binary categorization; in other words, for a large number of annotators, their labeling qualities on the negative class (supposed to be the majority) are significantly greater than those on the positive class (minority). The paper therefore empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of datasets and perform badly although they model the complexity of the systems. To address the issue of handling biased labeling, the paper further proposes a novel consensus algorithm, namely adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV utilizes the frequency of positive labels in the multiple noisy label set of each example to obtain a bias rate and then assigns weights derived from the bias rate to negative and positive labels. Comparison results among the five consensus algorithms (AWMV and the four existing ones) show that the proposed AWMV algorithm has the best overall performance. Finally, this paper notes some potential related topics for future study. [ABSTRACT FROM AUTHOR]
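The abstract describes AWMV only at a high level. As a hedged sketch of the idea, the following weighted majority vote derives a bias rate from the overall positive-label frequency and gives the under-reported (positive) class the larger weight; the exact AWMV weighting is defined in the paper, so treat this as an illustration, not the published algorithm:

```python
def weighted_majority_vote(label_sets):
    """Weighted majority voting over multiple noisy label sets (one list of
    0/1 crowd labels per example).  The overall positive-label frequency
    stands in for the bias rate; the under-reported (positive) class gets
    the larger weight.  Illustrative only -- AWMV's actual weighting is
    defined in the paper."""
    all_labels = [l for labels in label_sets for l in labels]
    bias_rate = sum(all_labels) / len(all_labels)   # fraction of positive labels
    w_pos, w_neg = 1.0 - bias_rate, bias_rate       # rare positives weigh more
    predictions = []
    for labels in label_sets:
        pos_mass = w_pos * sum(labels)
        neg_mass = w_neg * (len(labels) - sum(labels))
        predictions.append(1 if pos_mass > neg_mass else 0)
    return predictions

# Three examples, four annotators each, with positives in the minority overall.
preds = weighted_majority_vote([[1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]])
```

On the first example, an unweighted vote would be a 2-2 tie; the bias-aware weights tip it positive, which is the behavior such a correction is meant to produce.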
- Published
- 2017
- Full Text
- View/download PDF
31. A method for measuring consensus within groups: An index of disagreement via conditional probability.
- Author
-
Akiyama, Yoshio, Nolan, James, Darrah, Marjorie, Abdal Rahem, Mushtaq, and Wang, Lei
- Subjects
- *
GROUP theory , *ACQUISITION of data , *VARIANCES , *DISTRIBUTION (Probability theory) , *ARITHMETIC mean - Abstract
This paper presents a new index of disagreement (or measure of consensus) for comparing data collected using Likert items. This new index, which assesses the level of disagreement among group members, exploits the conditional distribution of the variance for a given mean. The variance is often used as a measure of disagreement, with high variance seen as high disagreement in a group. However, the range of the variance is a function of the mean: for a mean close to the end points of the scale, the range of the variance is relatively small, while for a mean at the center of the scale it is larger. The index of disagreement introduced in this paper takes into account both the mean and the variance, and provides a way to compare two groups that is more meaningful than considering only the variance or other measures of disagreement or consensus that depend on the variance alone. [ABSTRACT FROM AUTHOR]
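The mean-dependence the abstract describes can be made concrete: on a scale from min to max, the largest variance attainable at mean m is (m - min)(max - m), reached by a two-point split at the scale's ends. The sketch below divides observed variance by that bound; it illustrates the spirit of the authors' index, not necessarily their exact formula:

```python
def disagreement_index(responses, scale_max, scale_min=1):
    """Disagreement on a Likert item: observed (population) variance divided
    by the maximum variance attainable at the observed mean, which is
    (mean - min) * (max - mean), reached by a two-point split at the scale
    ends.  Returns a value in [0, 1]."""
    n = len(responses)
    mean = sum(responses) / n
    variance = sum((x - mean) ** 2 for x in responses) / n
    max_variance = (mean - scale_min) * (scale_max - mean)
    return variance / max_variance if max_variance > 0 else 0.0
```

Unanimity gives 0, a perfectly polarized group gives 1, and two groups with equal raw variance but different means can receive different normalized scores, which is the comparison the raw variance cannot make.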
- Published
- 2016
- Full Text
- View/download PDF
32. Technical Perspective: Motion Fields for Interactive Character Animation.
- Author
-
van de Panne, Michiel
- Subjects
ANIMATION (Cinematography) ,VIRTUAL reality ,COMPUTER graphics ,ACQUISITION of data ,COMPUTER simulation ,LEARNING - Abstract
The author conveys his thoughts on discarding the idea of motion clips for character animation as a set of independent motion vectors. Topics discussed include the use of prerecorded motion clips as a standard building block for generating the motion of virtual characters for use in film, games, and interactive simulations, the basic notion for developing new motions with motion clips in hand, and the need to use the neighboring motion vectors to define a discrete set of actions that represent an alternative choice for guiding the future evolution of the motion.
- Published
- 2014
- Full Text
- View/download PDF
33. Multi-subject data augmentation for target subject semantic decoding with deep multi-view adversarial learning.
- Author
-
Li, Dan, Du, Changde, Wang, Shengpei, Wang, Haibao, and He, Huiguang
- Subjects
- *
FUNCTIONAL magnetic resonance imaging , *ACQUISITION of data - Abstract
• We provide a new insight into the problem of brain semantic decoding. That is, we introduce a multi-subject fMRI data augmentation method to improve the performance of the target subject. • A latent space is introduced to solve the problem of feature mismatch and multiple GAN architectures are introduced to solve the problem of distribution mismatch between distinct subjects. • The experimental results show that our method is better than the baseline methods, especially when the size of the training data is small. Functional magnetic resonance imaging (fMRI) is widely used in the field of brain semantic decoding. However, as fMRI data acquisition is time-consuming and expensive, the number of samples is usually small in the existing fMRI datasets. It is difficult to build an accurate brain decoding model for a subject with insufficient fMRI data. The majority of semantic decoding methods focus on designing predictive model with limited samples, while less attention is paid to fMRI data augmentation. Leveraging data from related but different subjects can be regarded as a new strategy to improve the performance of predictive model. There are two challenges when using information from different subjects: 1) feature mismatch; 2) distribution mismatch. In this paper, we propose a multi-subject fMRI data augmentation method to address the above two challenges, which can improve the decoding accuracy of the target subject. Specifically, the subject information can be translated from one to another by using multiple subject-specific encoders, decoders and discriminators. The encoder maps each subject to a shared latent space, solving the feature mismatch problem. The decoders and discriminators form multiple generative adversarial network architectures, which solves the distribution mismatch problem. 
Meanwhile, to ensure that the representation of the latent space preserves information of the input space, our method not only minimizes the local data reconstruction loss, but also preserves the sparse reconstruction (semantic) relation over the whole dataset of the input space. Extensive experiments on three fMRI datasets demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
34. A trustworthiness-based vehicular recruitment scheme for information collections in Distributed Networked Systems.
- Author
-
Li, Ting, Liu, Anfeng, Xiong, Neal N., Zhang, Shaobo, and Wang, Tian
- Subjects
- *
ACQUISITION of data , *INFORMATION technology security , *REWARD (Psychology) , *INTERNET of things , *MACHINE learning - Abstract
Because of their high mobility, large numbers of vehicles are utilized to obtain timely, quality-based information in the smart Internet of Things, forming a dynamic Distributed Networked System (DNS). However, designing a vehicular recruitment scheme to enhance a security-based DNS is challenging, since it is hard to judge the trustworthiness of vehicular sensors. Therefore, in this paper, a novel vehicular trust evaluation scheme is proposed to analyze and supervise the data collected by vehicular sensors in a trusted and low-cost manner. To obtain vehicular trust, the proposed scheme calculates the trustworthiness of vehicles by considering a time factor and the gap between sensed data and real data. Moreover, sensing data in vehicle-sparse regions contributes more because of its rareness. Thus, to inspire vehicles to sense data in vehicle-sparse regions, a trustworthiness-based gradient pricing method is designed to pay rewards to the vehicular sensors. Finally, with real vehicular GPS datasets, simulation results demonstrate that the proposed scheme can improve the accuracy rate of data sensing by 37.72% and data quality by 76.95%. With the incentive pricing method, the coverage ratio of data sensing is improved by 13.1%. Overall, the performance of the proposed scheme is improved by approximately 19.39% to 22.32%. Future work will focus on improving information security via advanced machine learning methods in the dynamic DNS. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
35. Validating the coverage of bus schedules: A Machine Learning approach.
- Author
-
Mendes-Moreira, João, Moreira-Matias, Luís, Gama, João, and Freire de Sousa, Jorge
- Subjects
- *
COMPUTER scheduling , *MACHINE learning , *PUBLIC transit , *AUTOMATIC vehicle location systems , *ACQUISITION of data - Abstract
Nowadays, every public transportation company uses Automatic Vehicle Location (AVL) systems to track the services provided by each vehicle. Such information can be used to improve operational planning. This paper describes an AVL-based evaluation framework to test whether the actual Schedule Plan fits, in terms of days covered by each schedule, the network’s operational conditions. Firstly, clustering is employed to group days with similar profiles in terms of travel times (this is done for each different route). Secondly, consensus clustering is used to obtain a unique set of clusters for all routes. Finally, a set of rules about the groups content is drawn based on appropriate decision variables. Each group will correspond to a different schedule and the rules identify the days covered by each schedule. This methodology is simultaneously an evaluator of the schedules that are offered by the company (regarding its coverage) and an advisor on possible changes to such offer. It was tested by using data collected for one year in a company running in Porto, Portugal. The results are sound. The main contribution of this paper is that it proposes a way to combine Machine Learning techniques to add a novel dimension to the Schedule Plan evaluation methods: the day coverage. Such approach meets no parallel in the current literature. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
36. Data-Driven Diversity.
- Author
-
Williams, Joan C. and Dolkas, Jamie
- Subjects
DIVERSITY in the workplace ,COMMERCIAL statistics ,ACQUISITION of data ,BUSINESS process management ,RISK management in business - Abstract
Many companies today recognize that workforce diversity is both a moral imperative and a key to stronger business performance. U.S. firms alone spend billions of dollars every year to educate their employees about diversity, equity, and inclusion (DEI). But research shows that such training programs don’t lead to meaningful change. What’s necessary, say the authors, is a metrics-based approach that can identify problems, establish baselines, and measure progress. Company managers and in-house lawyers often worry that collecting diversity data may yield evidence of discrimination that can fuel lawsuits against them. But there are ways to minimize the legal threats while still embracing the use of metrics. The authors suggest first determining your risk tolerance and then developing an action plan. You will need to track both outcome metrics and process metrics and act promptly on what you find. Starting with a pilot program can be a good idea. You should also build the business case for intervention, control expectations through careful messaging, and create clear protocols for accessing, sharing, and retaining DEI data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
37. Program Complexity.
- Author
-
Hayes, Stephen, Kopunic, Daniel, and Wood, Roy
- Subjects
PROJECT management ,ACQUISITION of data ,LEADERSHIP ,PERFORMANCE standards ,EXCELLENCE - Abstract
The article discusses the establishment of the International Centre for Complex Project Management (ICCPM) to provide solutions to the complexity of the defense acquisition process. It mentions that ICCPM is a network committed to better management and delivery of complex projects for industry and government sectors. It states that ICCPM will provide international leadership in knowledge advancement, applied practice, and delivery of excellence.
- Published
- 2011
38. Privacy preservation for machine learning training and classification based on homomorphic encryption schemes.
- Author
-
Li, Jing, Kuang, Xiaohui, Lin, Shujie, Ma, Xu, and Tang, Yi
- Subjects
- *
MACHINE learning , *DENIAL of service attacks , *REAL numbers , *PRIVACY , *ACQUISITION of data , *MACHINE performance , *VIDEO coding - Abstract
• This work presents a novel homomorphic encryption framework over non-abelian rings (a matrix ring). It is one-way secure based on the Conjugacy Search Problem. • The scheme supports real-number encryption and achieves fast homomorphic comparison of ciphertexts without decrypting the intermediate result of any ciphertext operation. • We use the scheme to realize privacy preservation for machine learning training and classification in a data ciphertext environment. The analysis shows that our proposed schemes are efficient for encryption/decryption and homomorphic operations. In recent years, more and more machine learning algorithms depend on cloud computing. When a machine learning system is trained or classified in the cloud environment, the cloud server obtains data from the user side. The privacy of the data then depends on the service provider, which makes malicious acquisition and utilization of data easy. On the other hand, attackers can detect the statistical characteristics of machine learning data and infer the parameters of the machine learning model through reverse attacks. Therefore, it is urgent to design an effective encryption scheme that protects the data's privacy without degrading the performance of machine learning. In this paper, we propose a novel homomorphic encryption framework over non-abelian rings, and define the homomorphism operations in ciphertext space. The scheme achieves one-way security based on the Conjugacy Search Problem. We then propose a homomorphic encryption scheme over a matrix ring. It supports real-number encryption based on the homomorphism of a 2-order displacement matrix coding function and achieves fast homomorphic comparison of ciphertexts without decrypting the intermediate result of any ciphertext operation. Furthermore, we use the scheme to realize privacy preservation for machine learning training and classification in a data ciphertext environment.
The analysis shows that our proposed schemes are efficient for encryption/decryption and homomorphic operations. [ABSTRACT FROM AUTHOR]
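A toy illustration of why conjugation by a secret matrix is homomorphic (recovering the key from plaintext/ciphertext pairs is an instance of the conjugacy search problem the abstract invokes). This sketches only the underlying algebra over 2x2 rational matrices, not the paper's full scheme with its real-number coding function:

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mat_inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

# Secret key: an invertible matrix K.  "Encryption" is conjugation
# E(M) = K M K^-1; recovering K from (M, E(M)) pairs is an instance of
# the conjugacy search problem.
K = [[F(2), F(1)], [F(1), F(1)]]
K_inv = mat_inv(K)

def enc(M): return mat_mul(mat_mul(K, M), K_inv)
def dec(C): return mat_mul(mat_mul(K_inv, C), K)

A = [[F(1), F(2)], [F(3), F(4)]]
B = [[F(0), F(1)], [F(1), F(0)]]
# Conjugation is a ring homomorphism: sums and products of ciphertexts
# decrypt to the sums and products of the plaintexts.
sum_ct = mat_add(enc(A), enc(B))
prod_ct = mat_mul(enc(A), enc(B))
```

Because K A K⁻¹ · K B K⁻¹ = K(AB)K⁻¹, both addition and multiplication pass through encryption, which is the property the cloud server needs in order to compute on ciphertexts.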
- Published
- 2020
- Full Text
- View/download PDF
39. A decentralized trust inference approach with intelligence to improve data collection quality for mobile crowd sensing.
- Author
-
Yang, Xuezheng, Zeng, Zhiwen, Liu, Anfeng, Xiong, Neal N., Wang, Tian, and Zhang, Shaobo
- Subjects
- *
CROWDSENSING , *TRUST , *DATA quality , *ACQUISITION of data , *MATRIX decomposition , *PROBABILISTIC databases , *INFERENCE (Logic) - Abstract
Mobile Crowd Sensing (MCS) has been recognized as a promising paradigm for constructing numerous applications by employing an enormous number of workers to perceive and collect data. The quality of MCS depends on the quality of the data submitted by the workers. Therefore, there is an urgent need to evaluate the trust of workers. In this paper, we propose a Decentralized Trust Inference (DTI) approach to improve data collection quality for MCS, mainly including the following components: (a) A trust evaluation method is proposed to obtain trust baselines of different levels, which can be used to assess the trust of workers. In addition, a data filling scheme based on Bayesian Probabilistic Matrix Factorization (BPMF) is adopted to fill data when there is no credible baseline in the region. (b) Based on the DTI approach, we propose a worker recruitment method according to the priority of trust and bid ratio. Then, by preferentially selecting reliable and low-bid workers, we can improve data quality and reduce costs. Finally, theoretical analysis and experimental results demonstrate the effectiveness of our proposed scheme. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems.
- Author
-
Zhang, Junbo, Wong, Jian-Syuan, Li, Tianrui, and Pan, Yi
- Subjects
- *
PARALLEL computers , *THEORY of knowledge , *ACQUISITION of data , *ROUGH sets , *MATHEMATICAL mappings , *SYSTEMS theory - Abstract
Abstract: Nowadays, with the volume of data growing at an unprecedented rate, large-scale data mining and knowledge discovery have become a new challenge. Rough set theory for knowledge acquisition has been successfully applied in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability in big data analysis. To mine knowledge from big data, we present parallel large-scale rough set based methods for knowledge acquisition using MapReduce in this paper. We implemented them on several representative MapReduce runtime systems: Hadoop, Phoenix and Twister. Performance comparisons on these runtime systems are reported in this paper. The experimental results show that (1) the computational time is mostly minimal on Twister when employing the same number of cores; (2) Hadoop has the best speedup for larger data sets; (3) Phoenix has the best speedup for smaller data sets. The excellent speedups also demonstrate that the proposed parallel methods can effectively process very large data on different runtime systems. Pitfalls and advantages of these runtime systems are also illustrated through our experiments, which are helpful for users deciding which runtime system should be used in their applications. [Copyright Elsevier]
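The core rough-set computation being parallelized here is grouping objects into indiscernibility (equivalence) classes by their condition-attribute values and forming lower/upper approximations of a decision class: in MapReduce terms, essentially a key-grouping (map/shuffle) job. A minimal sequential sketch with hypothetical data, not the paper's distributed implementation:

```python
from collections import defaultdict

def approximations(table, attrs, decision, target):
    """Rough-set lower/upper approximations of one decision class.
    Objects are first grouped into indiscernibility (equivalence) classes by
    their values on the condition attributes -- the grouping step that
    MapReduce implementations distribute as a map/shuffle job."""
    classes = defaultdict(list)
    for obj in table:
        classes[tuple(obj[a] for a in attrs)].append(obj)
    lower, upper = [], []
    for eq_class in classes.values():
        decisions = {o[decision] for o in eq_class}
        if decisions == {target}:      # certainly in the target class
            lower.extend(eq_class)
        if target in decisions:        # possibly in the target class
            upper.extend(eq_class)
    return lower, upper

# Hypothetical decision table: condition attribute "a", decision "d".
table = [{"a": 1, "d": "yes"}, {"a": 1, "d": "no"}, {"a": 2, "d": "yes"}]
lower, upper = approximations(table, ["a"], "d", "yes")
```

The two objects with a = 1 disagree on the decision, so they fall only in the upper approximation; the a = 2 object is in both.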
- Published
- 2014
- Full Text
- View/download PDF
41. Ulterior Motives: 2023-2024 ACM Athena Lecturer Margo Seltzer recalls the motivations behind the development of the Berkeley DB database software library, and other achievements during her career.
- Author
-
Hoffmann, Leah
- Subjects
SOFTWARE libraries (Computer programming) ,DATABASE design ,ACQUISITION of data ,OPEN source software ,WOMEN college teachers - Abstract
An interview with lecturer Margo Seltzer is presented which addresses topics such as the motivations behind the creation of the Berkeley DB database software library by Seltzer, Keith Bostic, and Mike Olson. The Sleepycat Software firm and fundraising are assessed, as well as open-source dual licenses, data provenance, and collections of certain types of data.
- Published
- 2023
- Full Text
- View/download PDF
42. AI Regulation Is Coming.
- Author
-
Candelon, Francois, di Carlo, Rodolphe Charme, De Bondt, Midas, and Evgeniou, Theodoros
- Subjects
ARTIFICIAL intelligence ,ARTIFICIAL intelligence laws ,ACQUISITION of data ,ALGORITHMS ,ALGORITHM software - Abstract
For years public concern about technological risk has focused on the misuse of personal data. But as firms embed more and more artificial intelligence in products and processes, attention is shifting to the potential for bad or biased decisions by algorithms—particularly the complex, evolving kind that diagnose cancers, drive cars, or approve loans. Inevitably, many governments will feel regulation is essential to protect consumers from that risk. This article explains the moves regulators are most likely to make and the three main challenges businesses need to consider as they adopt and integrate AI. The first is ensuring fairness. That requires evaluating the impact of AI outcomes on people’s lives, whether decisions are mechanical or subjective, and how equitably the AI operates across varying markets. The second is transparency. Regulators are very likely to require firms to explain how the software makes decisions, but that often isn’t easy to unwind. The third is figuring out how to manage algorithms that learn and adapt; while they may be more accurate, they also can evolve in a dangerous or discriminatory way. Though AI offers businesses great value, it also increases their strategic risk. Companies need to take an active role in writing the rulebook for algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
43. Privacy-preserving and high-accurate outsourced disease predictor on random forest.
- Author
-
Ma, Zhuoran, Ma, Jianfeng, Miao, Yinbin, and Liu, Ximeng
- Subjects
- *
RANDOM forest algorithms , *ACQUISITION of data , *DATA distribution , *DATA security , *MEDICAL information storage & retrieval systems - Abstract
• We propose a privacy-preserving and high-accuracy outsourced disease predictor on random forest. • We design secure computation protocols over rational numbers to guarantee computation accuracy. • Our system can provide a secure disease predictor over ciphertexts in multi-data-source settings. • We prove that our system implements privacy protection and high-accuracy prediction. Training data distributed across multiple different institutions is ubiquitous in disease prediction applications. Data collection may involve multiple data sources who are willing to contribute their datasets to train a more precise classifier with a larger training set. Nevertheless, integrating multiple-source datasets can leak sensitive information to untrusted data sources. Hence, it is imperative to protect multiple-source data privacy during the predictor construction process. Besides, since disease diagnosis is strongly associated with health and life, it is vital to guarantee prediction accuracy. In this paper, we propose a privacy-preserving and high-accuracy outsourced disease predictor on random forest, called PHPR. The PHPR system can perform secure training with medical information belonging to different data owners and make accurate predictions. Besides, the original data and computed results in the rational field can be securely processed and stored in the cloud without privacy leakage. Specifically, we first design privacy-preserving computation protocols over rational numbers to guarantee computation accuracy and handle outsourced operations on-the-fly. Then, we demonstrate that the PHPR system achieves a secure disease predictor. Finally, the experimental results using real-world datasets demonstrate that the PHPR system not only provides a secure disease predictor over ciphertexts, but also maintains the prediction accuracy of the original classifier. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
44. User selection utilizing data properties in mobile crowdsensing.
- Author
-
Wang, En, Yang, Yongjian, and Lou, Kaihao
- Subjects
- *
BIG data , *HEURISTIC algorithms , *PROBABILITY theory , *ACQUISITION of data , *DETECTORS - Abstract
Highlights • We propose a triple-layer architecture considering the data's properties. • We propose a user selection method utilizing data properties in mobile crowdsensing. • We conduct simulations based on a synthetic data set and three real-world traces. Abstract The information sensed by a single mobile device is just data, while the aggregation of information sensed by thousands of devices is knowledge. This is the basic idea of Mobile CrowdSensing (MCS). In MCS, the user selection problem has always been a key issue, where the task publisher attempts to recruit a suitable set of users to cooperatively sense events or data at particular times and places. Hence, temporal and spatial constraints become the main criteria for selecting users. However, whether a user is willing to perform a sensing task depends not only on the temporal and spatial relationship, but also on the properties of the data. In other words, a user is likely to take the sensing task if it is interested in the data to be sensed. In this paper, we propose a user Selection method utilizing data Properties in Mobile crowdsensing (SPM), in which a triple-layer structure considering not only the temporal and spatial probability but also the data's properties is formulated, and task-finishing probabilities are calculated in both the intentional and unintentional situations. Then, a greedy algorithm is proposed to select a suitable set of users for finishing the sensing tasks. We conduct extensive simulations based on three widely used real-world traces: geolife, roma/taxi, epfl, and a synthetic data set. The results show that, compared with other user selection strategies, SPM finishes the largest number of sensing tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
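The greedy user selection described in the SPM abstract above is, at its core, an expected-coverage heuristic. The sketch below is not from the paper; it is a minimal illustration of that general idea under assumed inputs (the user/task names, the per-user task completion probabilities, and the `budget` parameter are all hypothetical):

```python
def greedy_select(users, tasks, budget):
    """Greedy user selection: repeatedly pick the user whose completion
    probabilities add the most expected newly-finished tasks, until the
    budget is exhausted. `users` maps user -> {task: completion prob}."""
    selected = []
    # probability that each task is still unfinished by selected users
    unfinished = {t: 1.0 for t in tasks}
    for _ in range(budget):
        best_u, best_gain = None, 0.0
        for u, probs in users.items():
            if u in selected:
                continue
            gain = sum(unfinished[t] * p for t, p in probs.items())
            if gain > best_gain:
                best_u, best_gain = u, gain
        if best_u is None:
            break
        selected.append(best_u)
        for t, p in users[best_u].items():
            unfinished[t] *= (1 - p)  # task less likely to stay unfinished
    return selected
```

The paper's SPM additionally derives the completion probabilities themselves from its triple-layer temporal/spatial/data-property model, which this sketch takes as given.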
45. PPTDS: A privacy-preserving truth discovery scheme in crowd sensing systems.
- Author
- Zhang, Chuan, Zhu, Liehuang, Xu, Chang, Sharif, Kashif, and Liu, Ximeng
- Subjects
DATA privacy, REMOTE sensing, HUMAN-computer interaction, ACQUISITION of data, COMPUTER users - Abstract
Abstract Benefiting from the fast development of human-carried mobile devices, crowd sensing has become an emerging paradigm for sensing and collecting data. However, the reliability of sensory data provided by participating users remains a major concern. To address this challenge, truth discovery is an effective technology for improving data accuracy and has garnered significant attention. Nevertheless, many state-of-the-art works on truth discovery either fail to protect participants' privacy or incur tremendous overhead on the user side. In this paper, we first propose a privacy-preserving truth discovery scheme, named PPTDS-I, which is implemented on two non-colluding cloud platforms. By capitalizing on properties of modular arithmetic, this scheme protects both users' sensory data and reliability information while simultaneously achieving high efficiency and fault tolerance. Additionally, for scenarios with resource-constrained devices, an efficient truth discovery scheme, named PPTDS-II, is presented. It not only protects users' sensory data but also avoids user participation in the iterative truth discovery procedure. Detailed security analysis shows that the proposed schemes are secure under a comprehensive threat model. Furthermore, extensive experimental analysis demonstrates the efficiency of the proposed schemes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
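The PPTDS schemes above add privacy protection on top of an iterative truth discovery procedure. For orientation only, here is a minimal unprotected sketch of CRH-style iterative truth discovery (not the paper's protocol; the input map and iteration count are hypothetical): truths are re-estimated as weighted averages, then each user's weight grows as their reported values sit closer to the current truths.

```python
import math

def truth_discovery(observations, iterations=10):
    """Alternate between (1) truth update: weighted average per object,
    and (2) weight update: log-inverse of each user's squared error.
    `observations` maps user -> {object: numeric value}."""
    users = list(observations)
    weights = {u: 1.0 for u in users}
    objects = {o for obs in observations.values() for o in obs}
    truths = {}
    for _ in range(iterations):
        for o in objects:
            num = sum(weights[u] * observations[u][o]
                      for u in users if o in observations[u])
            den = sum(weights[u] for u in users if o in observations[u])
            truths[o] = num / den
        errors = {u: sum((v - truths[o]) ** 2
                         for o, v in observations[u].items())
                  for u in users}
        total = max(sum(errors.values()), 1e-9)
        for u in users:
            # reliable users (small error) receive large weights
            weights[u] = math.log(total / max(errors[u], 1e-12))
    return truths, weights
```

In PPTDS both the sensory values and these reliability weights stay hidden from the servers; this sketch only shows the aggregation logic being protected.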
46. Coupling privileged kernel method for multi-view learning.
- Author
- Tang, Jingjing, Tian, Yingjie, Liu, Dalian, and Kou, Gang
- Subjects
MACHINE learning, KERNEL functions, SUPPORT vector machines, ACQUISITION of data, INFORMATION science - Abstract
Highlights • MCPK serves as a simple yet effective model satisfying both the consensus and complementarity principles. • We theoretically analyze the generalization capability of MCPK and compare MCPK with PSVM-2V and SVM-2K. • Extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed method. Abstract Multi-view learning concentrates on fully using data collected from diverse domains or obtained from various feature extractors to learn effectively. The consensus and complementarity principles provide significant guidance in multi-view modeling. Many support vector machine (SVM)-based multi-view learning models have been proposed, which mainly follow the consensus principle by exploiting the label correlation with regularization terms. In this paper, we propose a simple yet effective coupling privileged kernel method for multi-view learning, termed MCPK. The coupling term included in the primal objective allows the combined errors from all views to be minimized, which guarantees the consensus principle. Similar to our previous work PSVM-2V, MCPK realizes the complementarity principle by applying the learning using privileged information (LUPI) paradigm. The proposed model not only fully integrates the information from all views in the learning process but also maintains the characteristics of the different views to some extent. We employ a standard quadratic programming solver to solve MCPK. Furthermore, we theoretically analyze the performance of MCPK in terms of its generalization capability and in comparison with the PSVM-2V and SVM-2K models. Experimental results demonstrate that MCPK compares favorably with other state-of-the-art multi-view algorithms in terms of classification accuracy and efficiency. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
47. Key-value data collection and statistical analysis with local differential privacy.
- Author
- Zhu, Hui, Tang, Xiaohu, Yang, Laurence Tianruo, Fu, Chao, and Peng, Shuangrong
- Subjects
STATISTICS, ACQUISITION of data, STATISTICAL accuracy, BUDGET, PRIVACY, DATA analysis - Abstract
The collection and statistical analysis of simple data types (e.g., categorical, numerical, and multi-dimensional data) under local differential privacy has been widely studied. Recently, researchers have focused on the collection of key-value data, one of the main types of NoSQL data model. In the collection and statistical analysis of key-value data under local differential privacy, the frequency and mean of each key must be estimated simultaneously. However, achieving a good utility-privacy tradeoff is difficult, because key-value data has inherent correlation and users may hold different numbers of key-value pairs. In this paper, we propose an efficient sampling-based scheme for collecting and analyzing key-value data. Note that the more valid data collected, the higher the accuracy of the statistics under the same disturbance level and disturbance algorithm. Therefore, we make full use of probability sampling and the inherent correlation of key-value data to improve the probability of users submitting valid key-value data. Moreover, we optimize the budget allocation on key-value data so that the overall variance of the frequency and mean estimation is close to optimal. Detailed theoretical analysis and experimental results show that the proposed scheme is superior to existing schemes in accuracy. • We propose an efficient SKV-GRR scheme with separate key and value selection for collecting and analyzing key-value data. • In the key selection, we use unequal-probability sampling to improve the probability of users submitting valid data. • The value selection, based on weakly correlated perturbation, can improve the probability of users submitting valid value data. • We optimize the budget allocation on the selected key and the selected value to improve the accuracy of the estimated data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
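The SKV-GRR scheme above builds on generalized randomized response (GRR), the standard local-differential-privacy primitive for categorical data. As background only (this is not the paper's scheme, and the domain, epsilon value, and simulated data are illustrative assumptions), here is plain GRR with its unbiased frequency-estimation correction:

```python
import math
import random

def grr_perturb(true_key, domain, epsilon, rng):
    """Keep the true key with probability p = e^eps / (e^eps + k - 1);
    otherwise report a uniformly chosen *other* key from the domain."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return true_key
    return rng.choice([d for d in domain if d != true_key])

def estimate_frequencies(reports, domain, epsilon):
    """Invert the perturbation to get an unbiased frequency estimate:
    f_hat = (observed_rate - q) / (p - q), where q = (1 - p)/(k - 1)
    is the chance any fixed wrong key is reported."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)
    return {key: (sum(1 for r in reports if r == key) / n - q) / (p - q)
            for key in domain}
```

The paper's contribution sits on top of this primitive: sampling which key-value pair each user reports, perturbing the value in a correlation-aware way, and splitting the privacy budget between the key and value steps.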
48. Advertising Gets Personal.
- Subjects
BEHAVIORAL targeting (Internet advertising) ,ACQUISITION of data ,DATA privacy ,RIGHT of privacy ,INTERNET advertising ,CONSUMER research - Abstract
The article discusses behavioral targeting in advertising and consumer privacy. According to the article, statisticians use data aggregation of online behavior and purchasing behavior to identify likely consumers for products. Topics include data mining, the retail chain Target, and activities monitored by behavioral tracking including web browsing, credit purchases, and interactions on social media websites. The internet advertising service AdChoices and the Consumer Privacy Bill of Rights measure from U.S. President Barack Obama are mentioned.
- Published
- 2012
- Full Text
- View/download PDF
49. GCOTraj: A storage approach for historical trajectory data sets using grid cells ordering.
- Author
- Yang, Shengxun, He, Zhen, and Chen, Yi-Ping Phoebe
- Subjects
TRAJECTORY measurements, ACQUISITION of data, INFORMATION retrieval, SPATIOTEMPORAL processes, DATA mining - Abstract
Vast amounts of trajectory data have been collected due to the popularity of GPS devices. Analyzing this wealth of data is important, highlighting the need to efficiently index and store it on secondary storage to allow for efficient retrieval. Existing approaches index trajectories with data-partitioning index structures such as R-trees or space-partitioning index structures such as quad-trees. R-tree-like data structures, when used for indexing trajectories, result in large overlapping minimum bounding boxes and are therefore inefficient for indexing and storing large trajectory data sets. Existing approaches based on space partitioning do not allow time versus space constraints to be traded off in a way that is sensitive to query patterns. This paper proposes a new indexing and storage approach called GCOTraj, which partitions a large spatio-temporal data space into multi-dimensional grid cells and orders these grid cells in two different ways: first via traditional space-filling curves, which are not sensitive to query patterns; and second via the Graph-Based Ordering (GBO) approach, a state-of-the-art workload-based ordering technique for multidimensional data. GCOTraj clusters and stores trajectories on secondary storage based on the ordering produced by these algorithms. A good ordering results in fewer disk seeks when retrieving disk blocks to answer a query. In addition, GCOTraj uses an index to locate the targeted data on disk and reduce the redundant data retrieved, thereby reducing disk IO. Extensive experiments suggest that GCOTraj outperforms the state-of-the-art trajectory storage scheme TrajStore by a factor of up to 16.07 in IO time when answering range queries. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
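Of the two orderings the GCOTraj abstract mentions, the "traditional space-filling curve" family is easy to illustrate. The sketch below (not from the paper; cell coordinates and bit width are assumptions) shows Z-order/Morton ordering, where interleaving the bits of a cell's (x, y) coordinates gives a 1-D position that tends to keep spatially close cells close on disk:

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of x and y (Z-order curve): bit i of x goes
    to position 2i, bit i of y to position 2i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def order_cells(cells):
    """Order grid cells along the Z-order curve; trajectory blocks
    would then be laid out on disk in this sequence."""
    return sorted(cells, key=lambda c: morton_code(*c))
```

The paper's GBO alternative instead derives the cell ordering from the query workload, which a fixed curve like this cannot adapt to.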
50. Map matching for low-sampling-rate GPS trajectories by exploring real-time moving directions.
- Author
- Hsueh, Yu-Ling and Chen, Ho-Chian
- Subjects
GLOBAL Positioning System, ACQUISITION of data, REAL-time computing, ALGORITHMS, ERRORS - Abstract
Abstract Map matching is the process of matching a series of recorded geographic coordinates (e.g., a GPS trajectory) to a road network. Due to GPS positioning errors and sampling constraints, the data collected by GPS devices are not precise, and the location of a user cannot always be shown correctly on the map. Therefore, map matching is an important preprocessing step for many applications such as navigation systems, traffic flow analysis, and autonomous cars. Unfortunately, most current map-matching algorithms only consider the distance between the GPS points and the road segments, the topology of the road network, and the speed constraint of each road segment to determine the matching results, and most cannot handle matching errors at junctions. In this paper, we propose a spatio-temporal based matching algorithm (STD-matching) for low-sampling-rate GPS trajectories. STD-matching considers (1) spatial features such as the distance information and topology of the road network, (2) the speed constraints of the road network, and (3) the real-time moving direction, which shows the movement of the user. We also reduce the running time by applying GPS clustering, GPS smoothing, and the A* shortest path algorithm. In our experiments, we compare STD-matching with three existing algorithms, ST-matching, stMM, and HMM-RCM, on a real data set. The results show that STD-matching outperforms the three existing algorithms in terms of matching accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
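The STD-matching abstract's key ingredient is combining point-to-segment distance with agreement between the user's moving direction and the road segment's bearing. This sketch is not the paper's algorithm; it is a minimal single-point illustration of that scoring idea on planar coordinates, with hypothetical weights `w_dist` and `w_dir`:

```python
import math

def project_to_segment(p, a, b):
    """Project point p onto segment ab; return (closest point, distance)."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(
        0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)

def match_point(p, heading, segments, w_dist=1.0, w_dir=1.0):
    """Score each candidate segment by proximity plus the angular gap
    between the user's heading and the segment bearing; return the
    index of the best-scoring (lowest-cost) segment."""
    best_i, best_score = None, float('inf')
    for i, (a, b) in enumerate(segments):
        _, dist = project_to_segment(p, a, b)
        bearing = math.atan2(b[1] - a[1], b[0] - a[0])
        # smallest absolute angle difference, folded into [0, pi]
        diff = abs((heading - bearing + math.pi) % (2 * math.pi) - math.pi)
        score = w_dist * dist + w_dir * diff
        if score < best_score:
            best_i, best_score = i, score
    return best_i
```

A full map matcher like STD-matching scores whole candidate paths (with topology and speed constraints) rather than isolated points, but the direction term is what lets it disambiguate segments that are equally close, such as the two roads meeting at a junction.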
Discovery Service for Jio Institute Digital Library