162,140 results on '"020204 information systems"'
Search Results
2. The Social Fabric Framework
- Author
-
Signe Dyrby, Michel Avital, and Tina Blegind Jensen
- Subjects
02 engineering and technology ,Library and Information Sciences ,Digital media ,020204 information systems ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,Social media ,Sociology ,Research method ,Organisational work ,business.industry ,05 social sciences ,Public relations ,Enterprise social media ,Making-of ,Entreprise social media ,Social fabric ,Trace data ,Social making ,Controversy analysis ,business ,050203 business & management ,Information Systems - Abstract
The proliferation of enterprise social media generates an ever-growing record of digital traces that provides ample opportunities to study the social making of organisations. Accordingly, we present the social fabric framework, which comprises a structured five-step approach for eliciting, interpreting, and representing the situated social idiosyncrasies and underlying patterns of the social making of organisations. The paper focuses on the application of the social fabric framework as a research method. However, the framework also lends itself to practice as a diagnostic tool that can detect emergent changes in the social fabric of an organisation as well as support organisational development and change. Moreover, by providing a vocabulary for articulating the social making of organisations, the framework can help organisation members reify their dispositions, make sense of the social dynamics, and enable a constructive discussion at the grassroots level about any controversy or aspiration.
- Published
- 2023
3. Discovering and Interpreting Biased Concepts in Online Communities
- Author
-
Xavier Ferrer-Aran, Tom van Nuenen, Natalia Criado, and Jose Such
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,02 engineering and technology ,Computer Science Applications ,Computer Science - Computers and Society ,Artificial Intelligence (cs.AI) ,Computational Theory and Mathematics ,020204 information systems ,Computers and Society (cs.CY) ,68T50, 68T09, 91D30 ,0202 electrical engineering, electronic engineering, information engineering ,Computation and Language (cs.CL) ,Information Systems - Abstract
Language carries implicit human biases, functioning both as a reflection and a perpetuation of stereotypes that people carry with them. Recently, ML-based NLP methods such as word embeddings have been shown to learn such language biases with striking accuracy. This capability of word embeddings has been successfully exploited as a tool to quantify and study human biases. However, previous studies only consider a predefined set of biased concepts to attest (e.g., whether gender is more or less associated with particular jobs), or just discover biased words without helping to understand their meaning at the conceptual level. As such, these approaches are either unable to find biased concepts that have not been defined in advance, or they find biases that are difficult to interpret and study. This could make existing approaches unsuitable to discover and interpret biases in online communities, as such communities may carry different biases than those in mainstream culture. This paper improves upon, extends, and evaluates our previous data-driven method to automatically discover and help interpret biased concepts encoded in word embeddings. We apply this approach to study the biased concepts present in the language used in online communities and experimentally show the validity and stability of our method.
- Published
- 2023
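A minimal sketch of the embedding-association idea this line of work builds on: bias is probed by comparing a target word's similarity to two attribute word sets. The toy vectors, word lists, and scoring function below are illustrative assumptions, not the authors' data or exact metric.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def relative_association(word_vec, attr_a, attr_b):
    # Mean cosine similarity to attribute set A minus that to set B:
    # a positive score means the word leans towards A.
    return (np.mean([cosine(word_vec, a) for a in attr_a])
            - np.mean([cosine(word_vec, b) for b in attr_b]))

# Toy 50-d embeddings standing in for vectors trained on a community's corpus.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["nurse", "engineer", "she", "he", "woman", "man"]}

female = [emb["she"], emb["woman"]]
male = [emb["he"], emb["man"]]
for target in ["nurse", "engineer"]:
    print(target, round(relative_association(emb[target], female, male), 3))
```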
4. A Deep Dual Adversarial Network for Cross-Domain Recommendation
- Author
-
Qian Zhang, Wenhui Liao, Guangquan Zhang, Bo Yuan, and Jie Lu
- Subjects
Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,08 Information and Computing Sciences ,02 engineering and technology ,Computer Science Applications ,Information Systems - Abstract
Data sparsity is a common issue for most recommender systems and can severely degrade the usefulness of a system. One of the most successful solutions has been cross-domain recommender systems, which supplement the sparse data of the target domain with knowledge transferred from a source domain rich with data that is in some way related. However, there are three challenges that, if overcome, could significantly improve the quality and accuracy of cross-domain recommendation: 1) ensuring the latent feature spaces of the users and items are both maximally matched; 2) taking into consideration the user-item relationship and their interactions when modelling user preferences; 3) enabling a two-way cross-domain recommendation in which both the source and the target domains benefit from a knowledge exchange. Hence, in this paper, we propose a novel deep neural network called the Dual Adversarial network for Cross-Domain Recommendation (DA-CDR). By training the shared encoders with a domain discriminator via dual adversarial learning, the latent feature spaces for both the users and items are maximally matched. Allowing the two domains to collaboratively benefit from each other results in better recommendations for both domains. Extensive experiments with real-world datasets on six tasks demonstrate that DA-CDR significantly outperforms seven state-of-the-art baselines.
- Published
- 2023
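The core adversarial step the abstract describes (matching latent feature spaces across domains via a domain discriminator) can be sketched as below. This is a generic adversarial feature-alignment loop with made-up layer sizes and random stand-in data, not the DA-CDR architecture itself.

```python
import torch
import torch.nn as nn

# Shared encoder maps users/items from either domain into a common latent space.
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
# Domain discriminator tries to tell source embeddings from target embeddings.
discriminator = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
opt_e = torch.optim.Adam(encoder.parameters(), lr=1e-3)

src = torch.randn(128, 64)   # stand-in source-domain features
tgt = torch.randn(128, 64)   # stand-in target-domain features

for _ in range(100):
    z_src, z_tgt = encoder(src), encoder(tgt)

    # 1) Train the discriminator to separate the two domains.
    d_loss = bce(discriminator(z_src.detach()), torch.ones(128, 1)) + \
             bce(discriminator(z_tgt.detach()), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the encoder to fool the discriminator, so the latent
    #    feature spaces of the two domains become indistinguishable.
    g_loss = bce(discriminator(encoder(tgt)), torch.ones(128, 1))
    opt_e.zero_grad(); g_loss.backward(); opt_e.step()
```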
5. Public Management Challenges in the Digital Risk Society
- Author
-
Helle Zinner Henriksen, Ojelanki K. Ngwenyama, and Daniel Hardt
- Subjects
Critical social theory ,Digital risk society ,Government ,business.industry ,05 social sciences ,Public debate ,02 engineering and technology ,Data breach ,Library and Information Sciences ,Public relations ,language.human_language ,Danish ,Business economics ,Impression management ,Critical theory ,020204 information systems ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,language ,Risk society ,National digital infrastructure ,business ,050203 business & management ,Information Systems ,Critical is research - Abstract
The rise of the digital society is accompanied by incalculable social risks, but very little IS research has examined the implications of the new digital society. Drawing on concepts from Beck’s critical theory of the risk society and critical discourse analysis, this study examines the public discourse on risk events during the launch of NemID, a personal digital identifier for Danish citizens. This research illustrates our difficulties and challenges in managing some of the fundamental social risks from societal digitalisation. Limited institutional capabilities for digital technologies force public officials to depend on private companies motivated by profit instead of the public interest. Beliefs in digital technology as the primary determinant of social and economic progress also present many public management dilemmas. When digital risk events occur and citizens’ fears are stoked by news media and public discourse, public officials seem to have no other strategy for managing the escalating fears than systematically distorted communication. The continued rise of the digital risk society demands that IS research respond to the challenge of generating knowledge for its public management.
- Published
- 2023
6. Data Anonymization With Diversity Constraints
- Author
-
Yu Huang, Mostafa Milani, and Fei Chiang
- Subjects
Computational Theory and Mathematics ,Data anonymization ,Computer science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,02 engineering and technology ,Data mining ,computer.software_genre ,computer ,Computer Science Applications ,Information Systems ,Diversity (business) - Published
- 2023
7. Balancing Security and Privacy in Genomic Range Queries
- Author
-
Ercan Ozturk, Gene Tsudik, and Xuhua Ding
- Subjects
Novel technique ,0303 health sciences ,Range query (data structures) ,General Computer Science ,business.industry ,Computer science ,02 engineering and technology ,Computer security ,computer.software_genre ,03 medical and health sciences ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Personalized medicine ,business ,Safety, Risk, Reliability and Quality ,computer ,030304 developmental biology - Abstract
Exciting recent advances in genome sequencing, coupled with greatly reduced storage and computation costs, make genomic testing increasingly accessible to individuals. Already today, one’s digitized DNA can be easily obtained from a sequencing lab and later used to conduct numerous tests by engaging with a testing facility. Due to the inherent sensitivity of genetic material and the often-proprietary nature of genomic tests, privacy is a natural and crucial issue. While genomic privacy received a great deal of attention within and outside the research community, genomic security has not been sufficiently studied. This is surprising since the usage of fake or altered genomes can have grave consequences, such as erroneous drug prescriptions and genetic test outcomes. Unfortunately, in the genomic domain, privacy and security (as often happens) are at odds with each other. In this article, we attempt to reconcile security with privacy in genomic testing by designing a novel technique for a secure and private genomic range query protocol between a genomic testing facility and an individual user. The proposed technique ensures authenticity and completeness of user-supplied genomic material while maintaining its privacy by releasing only the minimum thereof. To confirm its broad usability, we show how to apply the proposed technique to a previously proposed genomic private substring matching protocol. Experiments show that the proposed technique offers good performance and is quite practical. Furthermore, we generalize the genomic range query problem to sparse integer sets and discuss potential use cases.
- Published
- 2023
8. A random growth model with any real or theoretical degree distribution
- Author
-
Frédéric Giroire, Stéphane Pérennes, and Thibaud Trolliet (Laboratoire d'Informatique, Signaux, et Systèmes de Sophia Antipolis (I3S), COATI project-team, Université Côte d'Azur, CNRS, Inria Sophia Antipolis - Méditerranée)
- Subjects
FOS: Computer and information sciences ,Random Growth Model ,General Computer Science ,Twitter ,02 engineering and technology ,Poisson distribution ,Preferential attachment ,Power law ,Complex Networks ,[INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI] ,Theoretical Computer Science ,[INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI] ,symbols.namesake ,Random Graphs ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Statistical physics ,Mathematics ,Social and Information Networks (cs.SI) ,Degree (graph theory) ,Preferential Attachment ,Computer Science - Social and Information Networks ,Function (mathematics) ,Complex network ,Heavy-Tailed Distributions ,Degree distribution ,Degree Distribution ,[MATH.MATH-PR]Mathematics [math]/Probability [math.PR] ,symbols ,020201 artificial intelligence & image processing ,Node (circuits) - Abstract
The degree distributions of complex networks are usually considered to be power law. However, it is not the case for a large number of them. We thus propose a new model able to build random growing networks with (almost) any wanted degree distribution. The degree distribution can either be theoretical or extracted from a real-world network. The main idea is to invert the recurrence equation commonly used to compute the degree distribution in order to find a convenient attachment function for node connections, instead of the linear one commonly chosen. We compute this attachment function for some classical distributions, such as the power-law, broken power-law, geometric and Poisson distributions. We also use the model on an undirected version of the Twitter network, for which the degree distribution has an unusual shape. We finally show that the divergence of the chosen attachment functions is closely linked to the heavy-tailed property of the obtained degree distributions.
- Published
- 2023
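The inversion idea can be made concrete with the standard rate-equation argument, under the simplifying assumption that each arriving node creates a single edge. The sketch below is that textbook derivation, not necessarily the paper's exact formulation.

```latex
% Rate equation for a growing network with attachment kernel f(k), assuming one
% new node and one new edge per time step; P(k) is the stationary degree
% distribution and C = \sum_k f(k) P(k) a normalising constant.
\[
  C\,P(k) \;=\; f(k-1)\,P(k-1) \;-\; f(k)\,P(k), \qquad k > 1 .
\]
% Summing this recurrence over all degrees strictly greater than k telescopes
% the right-hand side and inverts it, giving an attachment function for any
% target distribution P (up to the irrelevant constant C):
\[
  f(k) \;\propto\; \frac{\sum_{j>k} P(j)}{P(k)} .
\]
% Sanity check: P(k) \propto k^{-3} gives f(k) \propto k, i.e. the linear
% preferential attachment that produces exactly that power law.
```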
9. Formal Analysis and Estimation of Chance in Datasets Based on Their Properties
- Author
-
Petr Knoth, Abdel Aziz Taha, Mihai Lupu, Luca Papariello, and Alexandros Bampoulidis
- Subjects
Estimation ,Generalization ,Process (engineering) ,Computer science ,business.industry ,Small number ,Estimator ,02 engineering and technology ,Machine learning ,computer.software_genre ,Class (biology) ,Computer Science Applications ,Computational Theory and Mathematics ,Sample size determination ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,business ,computer ,Predictive modelling ,Information Systems - Abstract
Machine learning research, particularly in genomics, is often based on wide-shaped datasets, i.e., datasets having a large number of features but a small number of samples. Such configurations raise the possibility of chance influence (the increase of measured accuracy due to chance correlations) on the learning process and the evaluation results. Prior research underlined the problem of generalization of models obtained from such data. In this paper, we investigate the influence of chance on prediction and show its significant effects on wide-shaped datasets. First, we empirically demonstrate how significant the influence of chance in such datasets is by showing that prediction models trained on thousands of randomly generated datasets can achieve high accuracy. This is the case even when using cross-validation. We then provide a formal analysis of chance influence and design formal chance influence estimators based on the dataset parameters, namely its sample size, the number of features, the number of classes, and the class distribution. Finally, we provide an in-depth discussion of the formal analysis including applications of the findings and recommendations on chance influence mitigation.
- Published
- 2022
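The phenomenon the abstract describes (non-trivial measured accuracy on purely random wide data, even under cross-validation) is easy to reproduce. The sketch below uses illustrative dataset sizes and a plain logistic regression, not the paper's estimators.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_features, n_datasets = 20, 2000, 200

cv_scores = []
for _ in range(n_datasets):
    # A "wide" dataset: far more random features than samples, balanced random labels.
    X = rng.normal(size=(n_samples, n_features))
    y = rng.permutation([0] * (n_samples // 2) + [1] * (n_samples // 2))
    cv_scores.append(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())

cv_scores = np.array(cv_scores)
print("mean CV accuracy over all random datasets:", round(cv_scores.mean(), 3))
print("random datasets with CV accuracy >= 0.7:", int((cv_scores >= 0.7).sum()))
```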
10. Faster Domain Adaptation Networks
- Author
-
Lei Zhu, Jingjing Li, Hongzu Su, Ke Lu, Heng Tao Shen, and Mengmeng Jing
- Subjects
Early stopping ,Artificial neural network ,business.industry ,Computer science ,Distributed computing ,Deep learning ,02 engineering and technology ,Computer Science Applications ,Domain (software engineering) ,Computational Theory and Mathematics ,020204 information systems ,Transfer (computing) ,0202 electrical engineering, electronic engineering, information engineering ,Artificial intelligence ,business ,Adaptation (computer science) ,Protocol (object-oriented programming) ,Edge computing ,Information Systems - Abstract
It is widely acknowledged that the success of deep learning is built on large-scale training data and tremendous computing power. However, the data and computing power are not always available for many real-world applications. In this paper, we address the machine learning problem in which training data are scarce and computing power is limited. Specifically, we investigate domain adaptation, which is able to transfer knowledge from one labeled source domain to an unlabeled target domain, so that we do not need much training data from the target domain. At the same time, we consider the situation where the running environment is confined, e.g., in edge computing the end device has very limited running resources. Technically, we present the Faster Domain Adaptation (FDA) protocol and further report two paradigms of FDA: early stopping and amid skipping. The former accelerates domain adaptation with multiple early exit points. The latter speeds up the adaptation by wisely skipping several intermediate neural network blocks. Extensive experiments on standard benchmarks verify that our method is able to achieve comparable and even better accuracy while using much less computing resources. To the best of our knowledge, very few works have investigated accelerating knowledge adaptation in the community.
- Published
- 2022
11. Coloring Embedder: Towards Multi-Set Membership Queries in Web Cache Sharing
- Author
-
Zhaodong Kang, Jin Xu, Wenqi Wang, Jie Jiang, Tong Yang, Shiqi Jiang, Tilman Wolf, and Bin Cui
- Subjects
Theoretical computer science ,Source code ,Computer science ,media_common.quotation_subject ,Hash function ,02 engineering and technology ,Bloom filter ,Data structure ,Computer Science Applications ,Set (abstract data type) ,Computational Theory and Mathematics ,020204 information systems ,Web cache ,0202 electrical engineering, electronic engineering, information engineering ,Hit rate ,Graph (abstract data type) ,Information Systems ,media_common - Abstract
Multi-set membership queries are fundamental operations in data science. In this paper, we propose a new data structure for multi-set membership queries, named coloring embedder, which is fast, accurate, and memory efficient. The idea of coloring embedder is to first map elements to a high-dimensional space, which nearly eliminates hashing collisions, and then use a dimensional reduction representation, similar to coloring a graph, to save memory. Theoretical proofs and experimental results show that the coloring embedder is effective in solving the problem of multi-set membership queries. We also find that web cache sharing is one of the typical application scenarios of multi-set membership queries, and that current methods based on Bloom filters always send redundant queries. We apply the coloring embedder to web cache sharing by arranging our data structure on the on-chip and off-chip memory and designing query, insertion and deletion operations for this scenario. The experimental results show that compared with the present method, our method can reduce the queries sent by proxies while reaching an equal hit rate with the same size of on-chip memory. The source code of the coloring embedder has been released on GitHub.
- Published
- 2022
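For context, the Bloom-filter baseline the abstract contrasts with can be sketched as follows: one filter per cache, so a multi-set membership query must probe every filter, and any false positive triggers a redundant request. This is a minimal illustrative filter, not the coloring embedder itself.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: the per-set baseline the coloring embedder is compared against."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)
    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m
    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1
    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Multi-set membership with one filter per cache: a query must probe every
# filter, and false positives cause the redundant requests the abstract mentions.
caches = {name: BloomFilter() for name in ("proxy_a", "proxy_b")}
caches["proxy_a"].add("/index.html")
caches["proxy_b"].add("/logo.png")
url = "/index.html"
print([name for name, bf in caches.items() if url in bf])
```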
12. Learning From Incomplete and Inaccurate Supervision
- Author
-
Yuan Jiang, Peng Zhao, Zhi-Hua Zhou, and Zhen-Yu Zhang
- Subjects
Training set ,Exploit ,Notice ,Computer science ,business.industry ,Supervised learning ,02 engineering and technology ,Machine learning ,computer.software_genre ,Computer Science Applications ,Task (project management) ,Noise ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Labeled data ,020201 artificial intelligence & image processing ,Software system ,Noise (video) ,Artificial intelligence ,business ,computer ,Information Systems - Abstract
In plenty of real-life tasks, strongly supervised information is hard to obtain, such that there is not sufficient high-quality supervision to make traditional learning approaches succeed. Therefore, weakly supervised learning has drawn considerable attention recently. In this paper, we consider the problem of learning from incomplete and inaccurate supervision, where only a limited subset of training data is labeled but potentially with noise. This setting is challenging and of great importance but rarely studied in the literature. We notice that in many applications, the limited labeled data usually carry one-sided noise. For instance, consider the bug detection task in a software system: the identified buggy code indeed contains defects, whereas code that has been checked many times or newly fixed may still have other flaws due to the complexity of the system. We propose a novel method which is able to effectively alleviate the negative influence of one-sided label noise with the help of a vast number of unlabeled data. Excess risk analysis is provided as a theoretical justification of the usefulness of incomplete and one-sided inaccurate supervision. We conduct experiments on synthetic, benchmark datasets, and real-life tasks to validate the effectiveness of the proposed approach.
- Published
- 2022
13. On Distributed Computing With Heterogeneous Communication Constraints
- Author
-
Nishant Shakya, Jinyuan Chen, and Fan Li
- Subjects
Mobile edge computing ,Shuffling ,Computer science ,Computer Networks and Communications ,Computation ,Distributed computing ,020206 networking & telecommunications ,02 engineering and technology ,Computer Science Applications ,020204 information systems ,Server ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Edge computing ,Heterogeneous network ,Software - Abstract
We consider a distributed computing framework where the distributed nodes have different communication capabilities, motivated by the heterogeneous networks in data centers and mobile edge computing systems. Following the structure of MapReduce, this framework consists of a Map computation phase, a Shuffle phase, and a Reduce computation phase. The Shuffle phase allows distributed nodes to exchange intermediate values, in the presence of heterogeneous communication bottlenecks for different nodes (heterogeneous communication load constraints). For this setting, we characterize the minimum total computation load and the minimum worst-case computation load in some cases, under the heterogeneous communication load constraints. While the total computation load depends on the sum of the computation loads of all the nodes, the worst-case computation load depends on the computation load of the node with the heaviest job. We show an interesting insight that, for some cases, there is a tradeoff between the minimum total computation load and the minimum worst-case computation load, in the sense that both cannot be achieved at the same time. The achievability schemes are proposed with careful design on the file assignment and data shuffling. Beyond the cut-set bound, a novel converse is proposed using the proof by contradiction. For the general case, we identify two extreme regimes in which the scheme with coding and the scheme without coding are optimal, respectively.
- Published
- 2022
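For readers unfamiliar with the MapReduce structure the framework builds on, here is a toy single-process illustration of the Map, Shuffle, and Reduce phases. The paper's coded file assignment and heterogeneous communication-load constraints are not modelled here.

```python
from collections import defaultdict

# Input files statically assigned to (simulated) nodes.
files_per_node = {
    "node0": ["a b a", "c a"],
    "node1": ["b b c"],
}

# Map phase: each node turns its local files into (key, value) pairs.
mapped = {node: [(w, 1) for line in files for w in line.split()]
          for node, files in files_per_node.items()}

# Shuffle phase: intermediate pairs are exchanged so that each key
# ends up at the node responsible for it (here: hash partitioning).
nodes = list(files_per_node)
shuffled = {node: defaultdict(list) for node in nodes}
for pairs in mapped.values():
    for key, value in pairs:
        dest = nodes[hash(key) % len(nodes)]
        shuffled[dest][key].append(value)

# Reduce phase: each node aggregates the values of the keys it owns.
reduced = {node: {k: sum(v) for k, v in kv.items()} for node, kv in shuffled.items()}
print(reduced)
```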
14. Possibilistic Data Cleaning
- Author
-
Henning Köhler and Sebastian Link
- Subjects
Degree (graph theory) ,Computer science ,Relational database ,Vertex cover ,02 engineering and technology ,Computer Science Applications ,Data modeling ,Set (abstract data type) ,Computational Theory and Mathematics ,020204 information systems ,Data integrity ,0202 electrical engineering, electronic engineering, information engineering ,Tuple ,Algorithm ,Information Systems ,Possibility theory - Abstract
Classical data cleaning performs a minimal set of operations on the data to satisfy the given integrity constraints. Often, this minimization is equivalent to vertex cover, for example when tuples can be removed due to the violation of functional dependencies. Classically, the uncertainty of tuples and constraints is ignored. We propose to view not the data as dirty but rather the uncertainty information about the data. Since probabilities are often unavailable and their treatment is limited due to correlations in the data, we investigate a qualitative approach to uncertainty. Tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. Our approach is non-invasive to the data as we lower the possibility degree of tuples as little as possible. The new resulting qualitative version of vertex cover remains NP-hard. We establish an algorithm that is fixed-parameter tractable in the size of the qualitative vertex cover. Experiments with synthetic and real-world data show that our algorithm outperforms the classical algorithm proportionally to the available number of uncertainty degrees. Based on the novel mining of the certainty degrees with which constraints hold, our framework becomes applicable even when uncertainty information is unavailable.
- Published
- 2022
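The classical-repair view the paper starts from can be sketched as below: tuples that jointly violate a functional dependency form a conflict graph, and classical cleaning removes a vertex cover of it (here, the textbook 2-approximation). The FD, the toy relation, and the greedy cover are illustrative; the paper instead lowers possibility degrees of tuples as little as possible rather than deleting them.

```python
from itertools import combinations

# Tuples (id -> (zip, city)); the FD zip -> city should hold.
tuples = {
    1: ("10001", "New York"),
    2: ("10001", "Newark"),
    3: ("94105", "San Francisco"),
    4: ("94105", "San Francisco"),
}

# Conflict graph: an edge joins two tuples that together violate the FD.
edges = [(a, b) for a, b in combinations(tuples, 2)
         if tuples[a][0] == tuples[b][0] and tuples[a][1] != tuples[b][1]]

# Classical repair = minimum vertex cover of the conflict graph; the standard
# 2-approximation takes both endpoints of every uncovered edge.
cover = set()
for a, b in edges:
    if a not in cover and b not in cover:
        cover.update((a, b))

print("tuples to remove (classical cleaning):", cover)
```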
15. Transfer Learning for Dynamic Feature Extraction Using Variational Bayesian Inference
- Author
-
Stevan Dubljevic, Biao Huang, and Junyao Xie
- Subjects
Training set ,Computer science ,Probabilistic logic ,02 engineering and technology ,computer.software_genre ,Bayesian inference ,Soft sensor ,Computer Science Applications ,Weighting ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Probability distribution ,Data mining ,Point estimation ,Transfer of learning ,computer ,Information Systems ,Test data - Abstract
Data-driven methods have been extensively utilized in establishing predictive models from historical data for process monitoring and prediction of quality variables. However, most data-driven approaches assume that training data and testing data come from steady-state operating regions and follow the same distribution, which may not be the case when it comes to complex industrial processes. To avoid these restrictive assumptions and account for practical implementation, a novel online transfer learning technique is proposed to dynamically learn cross-domain features based on the variational Bayesian inference in this work. Stemming from the probabilistic slow feature analysis, a transfer slow feature analysis (TSFA) technique is presented to transfer dynamic models learned from different source processes to enhance prediction performance in the target process. In particular, two weighting functions associated with transition and emission equations are introduced and updated dynamically to quantify the transferability from source domains to the target domain at each time instant. Instead of point estimation, a variational Bayesian inference scheme is designed to learn the parameters under probability distributions accounting for corresponding uncertainties. The effectiveness of the proposed technique with applications to soft sensor modelling is demonstrated by a simulation example, a public dataset and an industrial case study.
- Published
- 2022
16. Context-Aware Service Recommendation Based on Knowledge Graph Embedding
- Author
-
Djamal Benslimane, Haithem Mezni, and Ladjel Bellatreche (Service Oriented Computing (SOC) team, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université de Lyon, INSA Lyon, Université Claude Bernard Lyon 1, CNRS)
- Subjects
Service (business) ,Class (computer programming) ,Information retrieval ,Relation (database) ,Computer science ,Quality of service ,Context (language use) ,02 engineering and technology ,Computer Science Applications ,Recurrent neural network ,Computational Theory and Mathematics ,020204 information systems ,Scalability ,0202 electrical engineering, electronic engineering, information engineering ,[INFO]Computer Science [cs] ,Representation (mathematics) ,ComputingMilieux_MISCELLANEOUS ,Information Systems - Abstract
As a class of context-aware systems, context-aware service recommendation (CASR) aims to bind high-quality services to users while taking into account their context requirements, including invocation time, location, social profiles, connectivity, and so on. However, current CASR approaches are not scalable with the huge amount of service data (QoS and context information, users' reviews and feedback). In addition, they lack a rich representation of contextual information as they adopt a simple matrix view. Moreover, current CASR approaches adopt the traditional user-service relation and they do not allow for multi-relational interactions between users and services in different contexts. To offer a scalable and context-sensitive service recommendation with great analysis and learning capabilities, we provide a rich and multi-relational representation of the CASR knowledge, based on the concept of knowledge graph. The constructed context-aware service knowledge graph (C-SKG) is then transformed into a low-dimensional vector space to facilitate its processing. For this purpose, we adopt Dilated Recurrent Neural Networks to propose a context-aware knowledge graph embedding, based on the principles of first-order and subgraph-aware proximity. Finally, a recommendation algorithm is defined to deliver the top-rated services according to the target user's context. Experiments have proved the accuracy and scalability of our CASR approach.
- Published
- 2022
17. Density-Based Top-K Spatial Textual Clusters Retrieval
- Author
-
Christian S. Jensen, Song Wu, Kezhong Lu, Hao Zhou, Ilkcan Keles, Simonas Šaltenis, and Dingming Wu
- Subjects
DBSCAN ,Information retrieval ,Exploit ,Point of interest ,Computer science ,query processing ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,02 engineering and technology ,Space (commercial competition) ,Object (computer science) ,Clustering ,Computer Science Applications ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Cluster (physics) ,data management ,Web content ,Cluster analysis ,indexing ,Information Systems - Abstract
So-called spatial web queries retrieve web content representing points of interest, such that the points of interest have descriptions that are relevant to query keywords and are located close to a query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, the top-k spatial textual cluster retrieval (k-STC) query that returns the top-k clusters that (i) are located close to a query location, (ii) contain objects that are relevant with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a DBSCAN-based approach and an OPTICS-based approach that rely on on-line density-based clustering and that exploit early stop conditions. Empirical studies on real data sets offer evidence that the paper's proposals can find good quality clusters and are capable of excellent performance.
- Published
- 2022
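A brute-force version of the query the abstract defines can be sketched with off-the-shelf DBSCAN: keep keyword-relevant objects, density-cluster them, and rank clusters by proximity to the query location. The toy data, parameters, and the naive centroid-distance ranking are illustrative assumptions; the paper's contribution lies in indexes, early-stop conditions, and an OPTICS-based variant.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy POIs: (x, y) location plus a set of keywords.
pois = [
    ((1.0, 1.1), {"coffee", "wifi"}),
    ((1.1, 0.9), {"coffee"}),
    ((5.0, 5.2), {"coffee", "bakery"}),
    ((5.1, 5.0), {"coffee"}),
    ((9.0, 0.5), {"museum"}),
]
query_loc, query_kw, k = np.array([0.0, 0.0]), {"coffee"}, 2

# Keep only objects relevant to the query keywords, then density-cluster them.
relevant = [(np.array(loc), kw) for loc, kw in pois if kw & query_kw]
labels = DBSCAN(eps=1.0, min_samples=2).fit_predict([loc for loc, _ in relevant])

# Rank clusters by distance of their centroid to the query location (noise label -1 skipped).
clusters = {}
for (loc, _), lab in zip(relevant, labels):
    if lab != -1:
        clusters.setdefault(lab, []).append(loc)
ranked = sorted(clusters.values(),
                key=lambda pts: np.linalg.norm(np.mean(pts, axis=0) - query_loc))
print("top-k cluster centroids:", [np.round(np.mean(pts, axis=0), 2).tolist() for pts in ranked[:k]])
```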
18. Point-of-Interest Recommendation With Global and Local Context
- Author
-
Aixin Sun, Kai Zheng, Shuo Shang, Xiangliang Zhang, Peng Han, and Peilin Zhao
- Subjects
Context model ,Information retrieval ,Point of interest ,Computer science ,Context (language use) ,02 engineering and technology ,Computer Science Applications ,Task (project management) ,Matrix decomposition ,Computational Theory and Mathematics ,020204 information systems ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Collaborative filtering ,Task analysis ,Information Systems - Abstract
The task of point of interest (POI) recommendation aims to recommend unvisited places to users based on their check-in history. A major challenge in POI recommendation is data sparsity, because a user typically visits only a very small number of POIs among all available POIs. In this paper, we propose AUC-MF to address the POI recommendation problem by maximizing Area Under the ROC curve (AUC). AUC has been widely used for measuring classification performance with imbalanced data distributions. To optimize AUC, we transform the recommendation task to a classification problem, where the visited locations are positive examples and the unvisited are negative ones. We define a new lambda for AUC to utilize the LambdaMF model, which combines the lambda-based method and matrix factorization model in collaborative filtering. Many studies have shown that geographic information plays an important role in POI recommendation. In this study, we focus on two levels of geographic information: local similarity and global similarity. We further show that AUC-MF can be easily extended to incorporate geographical contextual information for POI recommendation.
- Published
- 2022
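The quantity being maximized, per-user AUC over visited (positive) versus unvisited (negative) POIs, has a simple pairwise form. The sketch below uses made-up scores; AUC-MF itself optimizes a smoothed lambda-based surrogate of this quantity inside matrix factorization.

```python
import numpy as np

# Per-user AUC: the probability that a visited POI is scored above an unvisited one.
def user_auc(scores, visited):
    pos = [scores[i] for i in visited]
    neg = [s for i, s in enumerate(scores) if i not in visited]
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    return float(np.mean(pairs))

# Toy scores from some latent-factor model for 6 POIs; POIs 0 and 3 were visited.
scores = np.array([2.1, 0.3, -0.5, 1.7, 0.0, -1.2])
print(user_auc(scores, visited={0, 3}))  # 1.0 means every visited POI outranks every unvisited one
```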
19. Attentive Representation Learning With Adversarial Training for Short Text Clustering
- Author
-
Jianyong Wang, Chao Dong, Wei Zhang, and Jianhua Yin
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer science ,Semantic analysis (machine learning) ,02 engineering and technology ,Machine learning ,computer.software_genre ,Computer Science - Information Retrieval ,Machine Learning (cs.LG) ,Adversarial system ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,Computer Science - Computation and Language ,business.industry ,Unified Model ,Document clustering ,Minimax ,Automatic summarization ,Computer Science Applications ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,Artificial intelligence ,business ,Computation and Language (cs.CL) ,computer ,Feature learning ,Information Retrieval (cs.IR) ,Information Systems - Abstract
Short text clustering has far-reaching effects on semantic analysis, showing its importance for multiple applications such as corpus summarization and information retrieval. However, it inevitably encounters the severe sparsity of short text representations, making the previous clustering approaches still far from satisfactory. In this paper, we present a novel attentive representation learning model for short text clustering, wherein cluster-level attention is proposed to capture the correlations between text representations and cluster representations. Relying on this, the representation learning and clustering for short texts are seamlessly integrated into a unified model. To further ensure robust model training for short texts, we apply adversarial training to the unsupervised clustering setting, by injecting perturbations into the cluster representations. The model parameters and perturbations are optimized alternately through a minimax game. Extensive experiments on four real-world short text datasets demonstrate the superiority of the proposed model over several strong competitors, verifying that robust adversarial training yields substantial performance gains.
- Published
- 2022
20. Fast Error-Bounded Distance Distribution Computation
- Author
-
Man Lung Yiu, Jiahao Zhang, Qing Li, and Bo Tang
- Subjects
Work (thermodynamics) ,Computer science ,Computation ,Sampling (statistics) ,02 engineering and technology ,Distance measures ,Computer Science Applications ,Distribution (mathematics) ,Computational Theory and Mathematics ,Orders of magnitude (time) ,020204 information systems ,Bounded function ,0202 electrical engineering, electronic engineering, information engineering ,Cluster analysis ,Algorithm ,Information Systems - Abstract
In this work, we study the distance distribution computation problem. It has been widely used in many real-world applications, e.g., human genome clustering, cosmological model analysis, and parameter tuning. The straightforward solution for the exact distance distribution computation problem is unacceptably slow due to (i) massive data size, and (ii) expensive distance computation. In this paper, we propose a novel method to compute approximate distance distributions with error bound guarantees. Furthermore, our method is generic to different distance measures. We conduct extensive experimental studies on three widely used distance measures with real-world datasets. The experimental results demonstrate that our proposed method outperforms the sampling-based solution (which offers no error guarantees) by up to three orders of magnitude.
- Published
- 2022
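For reference, the sampling baseline the paper compares against (estimating the pairwise-distance histogram from random pairs, with no error guarantee) looks like the sketch below; the dataset, bin edges, and sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(2000, 16))          # toy dataset
bins = np.linspace(0.0, 12.0, 25)

# Estimate the pairwise Euclidean-distance distribution from a random sample of pairs.
n_pairs = 20000
i = rng.integers(0, len(data), n_pairs)
j = rng.integers(0, len(data), n_pairs)
keep = i != j                               # discard self-pairs
dists = np.linalg.norm(data[i[keep]] - data[j[keep]], axis=1)
hist, _ = np.histogram(dists, bins=bins, density=True)
print(np.round(hist, 3))
```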
21. MDLdroidLite: A Release-and-Inhibit Control Approach to Resource-Efficient Deep Neural Networks on Mobile Devices
- Author
-
Yu Zhang, Xi Zhang, and Tao Gu
- Subjects
Computer Networks and Communications ,Computer science ,business.industry ,Deep learning ,Distributed computing ,Inference ,02 engineering and technology ,Model predictive control ,Resource (project management) ,020204 information systems ,Convergence (routing) ,0202 electrical engineering, electronic engineering, information engineering ,Overhead (computing) ,020201 artificial intelligence & image processing ,Artificial intelligence ,Electrical and Electronic Engineering ,Adaptation (computer science) ,business ,Mobile device ,Software - Abstract
Mobile Deep Learning (MDL) has emerged as a privacy-preserving learning paradigm for mobile devices. This paradigm offers unique features such as privacy preservation, continual learning and low-latency inference to the building of personal mobile sensing applications. However, squeezing Deep Learning to mobile devices is extremely challenging due to resource constraints. Traditional Deep Neural Networks (DNNs) are usually over-parameterized, hence incurring huge resource overhead for on-device learning. In this paper, we present a novel on-device deep learning framework named MDLdroidLite that transforms traditional DNNs into resource-efficient model structures for on-device learning. To minimize resource overhead, we propose a novel Release-and-Inhibit Control (RIC) approach based on Model Predictive Control theory to efficiently grow DNNs from tiny to backbone. We also design a gate-based fast adaptation mechanism for channel-level knowledge transformation to quickly adapt new-born neurons with existing neurons, enabling safe parameter adaptation and fast convergence for on-device training. Our evaluations show that MDLdroidLite boosts on-device training on various personal mobile sensing (PMS) datasets with 28x to 50x fewer model parameters and 4x to 10x fewer floating-point operations than state-of-the-art model structures, while keeping the same accuracy level.
- Published
- 2022
22. CFFNN: Cross Feature Fusion Neural Network for Collaborative Filtering
- Author
-
Bo Jin, Ruiyun Yu, Jie Li, Fadi J. Kurdahi, Dezhi Ye, Zhihong Wang, Biyun Zhang, and Ann Move Oguti
- Subjects
Focus (computing) ,Artificial neural network ,Computer science ,business.industry ,Feature extraction ,Pattern recognition ,02 engineering and technology ,Construct (python library) ,Perceptron ,Computer Science Applications ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Collaborative filtering ,Learning to rank ,Artificial intelligence ,Layer (object-oriented design) ,business ,Information Systems - Abstract
Numerous state-of-the-art recommendation frameworks employ deep neural networks in Collaborative Filtering (CF). In this paper, we propose a cross feature fusion neural network (CFFNN) for the enhancement of CF. Existing studies overlook either user preferences for various item features or the relationship between item features and user features. To solve this problem, we construct a cross feature fusion network to enable the fusion of user features and item features, as well as a self-attention network to determine users' preferences for items. Specifically, we design a feature extraction layer with multiple MLP (multilayer perceptron) modules to extract both user features and item features. Then, we introduce a cross feature fusion mechanism for an accurate determination of the relationship between different user-item interactions. The features of users and items are crossly embedded and then fed into a prediction network. The attention mechanism enables the model to focus on more effective features. The effectiveness of the CFFNN model is demonstrated through extensive experiments on four real-world datasets. The experimental results indicate that CFFNN significantly outperforms the existing state-of-the-art models, with a relative improvement of 3.0% to 12.1% on hit ratio (HR) and normalized discounted cumulative gain (NDCG) compared with the baselines.
- Published
- 2022
23. Accelerated Log-Regularized Convolutional Transform Learning and Its Convergence Guarantee
- Author
-
Haoli Zhao, Shengli Xie, Zuyuan Yang, Yongcheng Guo, and Zhenni Li
- Subjects
Computer science ,Open problem ,Regular polygon ,Extrapolation ,02 engineering and technology ,Function (mathematics) ,Convolutional neural network ,Computer Science Applications ,Human-Computer Interaction ,CTL ,Control and Systems Engineering ,020204 information systems ,Convergence (routing) ,0202 electrical engineering, electronic engineering, information engineering ,Unsupervised learning ,020201 artificial intelligence & image processing ,Electrical and Electronic Engineering ,Algorithm ,Software ,Information Systems - Abstract
Convolutional transform learning (CTL), which learns filters by minimizing a data fidelity loss function in an unsupervised way, is becoming very pervasive, as it keeps the best of both worlds: the benefit of unsupervised learning and the success of the convolutional neural network. There have been growing interests in developing efficient CTL algorithms. However, developing a convergent and accelerated CTL algorithm with accurate representations simultaneously with proper sparsity is an open problem. This article presents a new CTL framework with a log regularizer that can not only obtain accurate representations but also yield strong sparsity. To efficiently address our nonconvex composite optimization, we propose to employ the proximal difference-of-convex algorithm (PDCA), which relies on decomposing the nonconvex regularizer into the difference of two convex parts and then optimizes the convex subproblems. Furthermore, we introduce the extrapolation technique to accelerate the algorithm, leading to a fast and efficient CTL algorithm. In particular, we provide a rigorous convergence analysis for the proposed algorithm under the accelerated PDCA. The experimental results demonstrate that the proposed algorithm can converge more stably to desirable solutions with lower approximation error and simultaneously with stronger sparsity and, thus, learn filters efficiently. Meanwhile, the convergence speed is faster than the existing CTL algorithms.
- Published
- 2022
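To make the PDCA step concrete: a log penalty of the common form below admits a difference-of-convex split, which is exactly the structure PDCA linearises. The specific penalty and splitting shown are a standard textbook choice and may differ from the paper's exact regularizer.

```latex
% A log penalty admits a difference-of-convex (DC) split; for \theta > 0 and scalar x,
\[
  \log\!\Bigl(1 + \tfrac{|x|}{\theta}\Bigr)
  \;=\;
  \underbrace{\tfrac{|x|}{\theta}}_{\text{convex}}
  \;-\;
  \underbrace{\Bigl(\tfrac{|x|}{\theta} - \log\bigl(1 + \tfrac{|x|}{\theta}\bigr)\Bigr)}_{\text{convex}} ,
\]
% since t \mapsto t - \log(1+t) is convex and non-decreasing on [0, \infty).
% Each PDCA iteration linearises the subtracted convex part and solves the
% remaining convex subproblem, optionally with extrapolation for acceleration.
```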
24. Event Popularity Prediction Using Influential Hashtags From Social Media
- Author
-
Lei Chen, Xi Chen, Jeffrey Chan, Timos Sellis, Yanchun Zhang, and Xiangmin Zhou
- Subjects
Boosting (machine learning) ,Event (computing) ,business.industry ,Computer science ,02 engineering and technology ,Machine learning ,computer.software_genre ,Popularity ,Computer Science Applications ,Set (abstract data type) ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Feature (machine learning) ,Graph (abstract data type) ,Pairwise comparison ,Social media ,Artificial intelligence ,business ,computer ,Information Systems - Abstract
Event popularity prediction over social media is crucial for estimating information propagation scope, decision making, and emergency prevention. However, existing approaches only focus on predicting the occurrences of a single attribute, such as a message, a hashtag or an image, which are not comprehensive enough for representing complex social event propagation. In this paper, we predict the event popularity, where an event is described as a set of messages containing multiple hashtags. We propose a novel hashtag-influence-based event popularity prediction by mining the impact of an influential hashtag set on the event propagation. Specifically, we first propose a hashtag-influence-based cascade model to select the influential hashtags over an event hashtag graph built by the pairwise hashtag similarity and the topic distribution of event-related hashtags. A novel measurement is proposed to identify the hashtag influence of an event over its content and social impacts. A hashtag correlation-based algorithm is proposed to optimize the seed selection in a greedy manner. Then, we propose an event-fitting boosting model to predict the event popularity by embedding the feature importance over events into the XGBoost model. Moreover, we propose an event-structure-based method, which incrementally updates the prediction model over social streams. We have conducted extensive experiments to prove the effectiveness and efficiency of the proposed approach.
- Published
- 2022
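The boosting step can be sketched with the XGBoost scikit-learn interface; the event-level features and the synthetic target below are hypothetical stand-ins for the influential-hashtag and impact features the paper constructs.

```python
import numpy as np
from xgboost import XGBRegressor

# Hypothetical event-level features (e.g., influential-hashtag count, early engagement,
# content-impact score, social-impact score) against an observed popularity value.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X[:400], y[:400])                       # train on the first 400 events
print("held-out R^2:", round(model.score(X[400:], y[400:]), 3))
print("feature importances:", np.round(model.feature_importances_, 3))
```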
25. Consensus One-Step Multi-View Subspace Clustering
- Author
-
Pei Zhang, Xinwang Liu, Zhiping Cai, Jian Xiong, Wentao Zhao, En Zhu, and Sihang Zhou
- Subjects
Optimization problem ,Iterative method ,Computer science ,02 engineering and technology ,computer.software_genre ,Computer Science Applications ,Computational Theory and Mathematics ,Similarity (network science) ,Discriminative model ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Benchmark (computing) ,Noise (video) ,Data mining ,Representation (mathematics) ,Cluster analysis ,computer ,Information Systems - Abstract
Multi-view clustering has attracted increasing attention in the multimedia, machine learning and data mining communities. As an essential kind of multi-view clustering algorithm, multi-view subspace clustering (MVSC) has become more and more popular due to its strong ability to reveal the intrinsic low-dimensional clustering structure hidden across views. Despite their superior clustering performance in various applications, we observe that existing MVSC methods directly fuse multi-view information at the similarity level by merging noisy affinity matrices, and that they isolate the processes of affinity learning, multi-view information fusion and clustering. Both factors may cause insufficient utilization of multi-view information, leading to unsatisfying clustering performance. This paper proposes a novel consensus one-step multi-view subspace clustering (COMVSC) method to address these issues. Instead of directly fusing multiple affinity matrices, COMVSC optimally integrates discriminative partition-level information, which is helpful to eliminate noise among data. Moreover, the affinity matrices, consensus representation and final clustering labels matrix are learned simultaneously in a unified framework. By doing so, the three steps can negotiate with each other to best serve the clustering task, leading to improved performance. Accordingly, we propose an iterative algorithm to solve the resulting optimization problem. Extensive experiment results on benchmark datasets demonstrate the superiority of our method against other state-of-the-art approaches.
- Published
- 2022
26. Unsupervised Ensemble Classification With Sequential and Networked Data
- Author
-
Panagiotis A. Traganitis and Georgios B. Giannakis
- Subjects
Signal Processing (eess.SP) ,FOS: Computer and information sciences ,Computer Science::Machine Learning ,Independent and identically distributed random variables ,Computer Science - Machine Learning ,Matching (graph theory) ,Computer science ,Machine Learning (stat.ML) ,02 engineering and technology ,Machine learning ,computer.software_genre ,Data type ,Machine Learning (cs.LG) ,Data modeling ,Ensembles of classifiers ,Statistics - Machine Learning ,020204 information systems ,Classifier (linguistics) ,FOS: Electrical engineering, electronic engineering, information engineering ,0202 electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Signal Processing ,business.industry ,Ensemble learning ,Computer Science Applications ,ComputingMethodologies_PATTERNRECOGNITION ,Computational Theory and Mathematics ,Graph (abstract data type) ,Artificial intelligence ,business ,computer ,Information Systems - Abstract
Ensemble learning, the machine learning paradigm where multiple algorithms are combined, has exhibited promising performance in a variety of tasks. The present work focuses on unsupervised ensemble classification. The term unsupervised refers to the ensemble combiner who has no knowledge of the ground-truth labels that each classifier has been trained on. While most prior works on unsupervised ensemble classification are designed for independent and identically distributed (i.i.d.) data, the present work introduces an unsupervised scheme for learning from ensembles of classifiers in the presence of data dependencies. Two types of data dependencies are considered: sequential data and networked data whose dependencies are captured by a graph. Moment matching and Expectation Maximization algorithms are developed for the aforementioned cases, and their performance is evaluated on synthetic and real datasets.
- Published
- 2022
27. Recurrent Learning on PM2.5 Prediction Based on Clustered Airbox Dataset
- Author
-
Ling-Jyh Chen, Wen Hsing Huang, Min-Te Sun, Wei-Shinn Ku, Ming Feng Ho, Kazuya Sakai, and Chia Yu Lo
- Subjects
Artificial neural network ,Correlation coefficient ,Computer science ,Air pollution ,02 engineering and technology ,medicine.disease_cause ,Autoencoder ,Computer Science Applications ,Airbox ,Computational Theory and Mathematics ,020204 information systems ,Statistics ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Cluster analysis ,Air quality index ,Predictive modelling ,Information Systems - Abstract
The reliance on thermal power plants as well as increased vehicle emissions have constituted the primary factors of serious air pollution. Inhaling too much particulate air pollution, especially PM2.5, may lead to respiratory diseases and even death. By predicting the air pollutant concentration, people can take precautions to avoid overexposure to air pollutants. Consequently, accurate PM2.5 prediction becomes more important. In this paper, we propose a PM2.5 prediction system, which utilizes the datasets from EdiGreen Airbox and Taiwan EPA. An autoencoder and linear interpolation are adopted for solving the missing value problem. Spearman's correlation coefficient is used to identify the most relevant features for PM2.5. Two prediction models (i.e., LSTM and LSTM based on K-means) are implemented which predict the PM2.5 value for each Airbox device. To assess the performance of the model prediction, the daily average error and the hourly average accuracy for the duration of a week are calculated. The experimental results show that LSTM based on K-means has the best performance among all methods.
- Published
- 2022
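A minimal PyTorch version of the kind of LSTM predictor described above, trained on a synthetic hourly series instead of the EdiGreen Airbox / Taiwan EPA data; the paper additionally imputes missing values, selects features via Spearman correlation, and clusters Airbox devices with K-means before training.

```python
import torch
import torch.nn as nn

# Turn an hourly PM2.5-like series into (24-hour window -> next value) training pairs.
series = torch.sin(torch.linspace(0, 20, 500)) * 30 + 35   # stand-in for real sensor data
window = 24
X = torch.stack([series[i:i + window] for i in range(len(series) - window)]).unsqueeze(-1)
y = series[window:].unsqueeze(-1)

class PM25LSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])        # predict the next hour from the last hidden state

model, loss_fn = PM25LSTM(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("training MSE:", float(loss))
```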
28. Efficient Similarity-Aware Influence Maximization in Geo-Social Network
- Author
-
Kai Zheng, Xiaofang Zhou, Xuanhao Chen, Rui Sun, Guanfeng Liu, and Yan Zhao
- Subjects
Online and offline ,Similarity-aware ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,Machine learning ,computer.software_genre ,Mathematical model ,Promotion (rank) ,020204 information systems ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Social media ,Greedy algorithm ,Probability ,Q measurement ,media_common ,Consumption (economics) ,Measurement ,Social network ,business.industry ,Social networking (online) ,Sun ,Geo-social networks ,Maximization ,Influence maximization ,Computer Science Applications ,Computational Theory and Mathematics ,Artificial intelligence ,business ,computer ,Upper bound ,Information Systems - Abstract
With the explosion of GPS-enabled smartphones and social media platforms, geo-social networks are increasing as tools for businesses to promote their products or services. Influence maximization, which aims to maximize the expected spread of influence in the networks, has drawn increasing attention. However, most recent work tries to study influence maximization by only considering geographic distance, while ignoring the influence of users' spatio-temporal behavior on information propagation or location promotion, which can often lead to poor results. To relieve this problem, we propose a Similarity-aware Influence Maximization (SIM) model to efficiently maximize the influence spread by taking the effect of users' spatio-temporal behavior into account, which is more reasonable to describe the real information propagation. We first calculate the similarity between users according to their historical check-ins, and then we propose a Propagation to Consumption (PTC) model to capture both online and offline behaviors of users. Finally, we propose two greedy algorithms to efficiently maximize the influence spread. The extensive experiments over real datasets demonstrate the efficiency and effectiveness of the proposed algorithms.
- Published
- 2022
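A toy sketch of two ingredients mentioned in the influence-maximization entry above: check-in-based user similarity (here a simple Jaccard overlap of visited locations, which is only one possible choice) and greedy seed selection against a generic spread estimator. The `spread` function is a stand-in; it is not the paper's PTC model.

```python
def jaccard_similarity(checkins_a, checkins_b):
    """Similarity of two users from the sets of locations they checked in at."""
    a, b = set(checkins_a), set(checkins_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_seeds(candidates, k, spread):
    """Pick k seed users, each time adding the one with the largest marginal gain.
    `spread(seed_set)` is any estimator of expected influence (a stand-in here)."""
    seeds = set()
    for _ in range(min(k, len(candidates))):
        remaining = [u for u in candidates if u not in seeds]
        best = max(remaining, key=lambda u: spread(seeds | {u}) - spread(seeds))
        seeds.add(best)
    return seeds

print(jaccard_similarity({"cafe", "gym"}, {"gym", "office"}))   # 0.333...
print(greedy_seeds({"u1", "u2", "u3"}, k=2, spread=len))        # trivial spread: set size
```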
29. Cross-View Locality Preserved Diversity and Consensus Learning for Multi-View Unsupervised Feature Selection
- Author
-
Jing Zhang, Lizhe Wang, Xinwang Liu, Chang Tang, Wei Zhang, Xiao Zheng, and Jian Xiong
- Subjects
business.industry ,Computer science ,Feature vector ,Locality ,Feature selection ,Pattern recognition ,02 engineering and technology ,Regularization (mathematics) ,Computer Science Applications ,Projection (relational algebra) ,Computational Theory and Mathematics ,Discriminative model ,Feature (computer vision) ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Graph (abstract data type) ,Artificial intelligence ,business ,Information Systems - Abstract
Although demonstrating great success, previous multi-view unsupervised feature selection (MV-UFS) methods often construct a view-specific similarity graph and characterize the local structure of data within each single view. In this way, cross-view information can be ignored. In addition, they usually assume that different feature views are projected from a latent feature space, so the diversity of different views cannot be fully captured. In this work, we present an MV-UFS model via cross-view local structure preserved diversity and consensus learning, referred to as CvLP-DCL for short. In order to exploit both the shared and distinguishing information across different views, we project each view into a label space that consists of a consensus part and a view-specific part, reflecting the fact that different views represent the same samples. Meanwhile, a cross-view similarity graph learning term with matrix-induced regularization is embedded to preserve the local structure of data in the label space. By imposing the $l_{2,1}$-norm on the feature projection matrices to constrain row sparsity, discriminative features can be selected from different views. An efficient algorithm is designed to solve the resultant optimization problem, and extensive experiments on six publicly available datasets are conducted to validate the effectiveness of the proposed CvLP-DCL.
- Published
- 2022
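A small illustration of the $l_{2,1}$-norm row-sparsity idea used in the entry above: the norm sums the Euclidean norms of the rows of a projection matrix, and features whose rows shrink towards zero are discarded. This is a generic NumPy sketch, not the CvLP-DCL optimization itself; the matrix sizes are invented.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

def select_features(W, num_features):
    """Keep the features (rows of the projection matrix) with the largest row norms."""
    row_norms = np.sqrt((W ** 2).sum(axis=1))
    return np.argsort(row_norms)[::-1][:num_features]

W = np.random.randn(100, 10)        # hypothetical projection: 100 features -> 10 label dimensions
print(l21_norm(W))
print(select_features(W, 20))       # indices of the 20 most discriminative features
```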
30. Congestion Control for Cross-Datacenter Networks
- Author
-
Yibo Zhu, Lei Cui, Kai Chen, Ge Chen, Wei Bai, Dongsu Han, and Gaoxiong Zeng
- Subjects
TCP Vegas ,business.industry ,Computer science ,Computer Networks and Communications ,ComputerSystemsOrganization_COMPUTER-COMMUNICATIONNETWORKS ,Testbed ,020206 networking & telecommunications ,Linux kernel ,02 engineering and technology ,Computer Science Applications ,Network congestion ,Wide area network ,Packet loss ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Latency (engineering) ,Electrical and Electronic Engineering ,business ,Software-defined networking ,Software ,Computer network - Abstract
Geographically distributed applications hosted in the cloud are becoming prevalent. They run on cross-datacenter networks that consist of multiple data center networks (DCNs) connected by a wide area network (WAN). Such a cross-DC network imposes significant challenges in transport design because the DCN and WAN segments have vastly distinct characteristics (e.g., buffer depths, RTTs). In this paper, we find that existing DCN or WAN transports reacting to ECN or delay alone do not (and cannot be extended to) work well in such an environment. The key reason is that neither signal, by itself, can simultaneously capture the location and degree of congestion, due to the discrepancies between DCN and WAN. Motivated by this, we present the design and implementation of GEMINI, which strategically integrates both ECN and delay signals for cross-DC congestion control. To achieve low latency, GEMINI bounds the inter-DC latency with the delay signal and prevents intra-DC packet loss with ECN. To maintain high throughput, GEMINI modulates the window dynamics and maintains low buffer occupancy utilizing both congestion signals. GEMINI is implemented in the Linux kernel and evaluated in extensive testbed experiments. Results show that GEMINI achieves up to 53%, 31% and 76% reduction of small-flow average completion times compared to TCP Cubic, DCTCP and BBR, and up to 58% reduction of large-flow average completion times compared to TCP Vegas.
- Published
- 2022
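An illustrative toy of the general idea in the entry above, combining an ECN signal and a delay signal in one congestion-window update. The constants, thresholds, and the update rule itself are invented for illustration; this is not GEMINI's actual algorithm or its Linux kernel implementation.

```python
def update_cwnd(cwnd, ecn_fraction, rtt, base_rtt,
                target_queue_delay=0.005, beta_ecn=0.5, gain=1.0):
    """Toy congestion-window update driven by two signals (not the paper's rule):
    - ecn_fraction: fraction of ACKed packets carrying ECN marks (intra-DC congestion)
    - rtt - base_rtt: estimated queuing delay (inter-DC congestion)"""
    if ecn_fraction > 0:
        # React to in-network marks, scaled by how many packets were marked (DCTCP-style).
        cwnd *= 1.0 - beta_ecn * ecn_fraction
    elif rtt - base_rtt > target_queue_delay:
        # Back off when the end-to-end queuing delay exceeds the target.
        cwnd *= base_rtt / rtt
    else:
        # No congestion signal: additive increase.
        cwnd += gain
    return max(cwnd, 1.0)

print(update_cwnd(cwnd=40.0, ecn_fraction=0.2, rtt=0.012, base_rtt=0.010))
```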
32. cGAIL: Conditional Generative Adversarial Imitation Learning—An Application in Taxi Drivers’ Strategy Learning
- Author
-
Xin Zhang, Jun Luo, Yanhua Li, and Xun Zhou
- Subjects
Information Systems and Management ,Operations research ,business.industry ,Computer science ,Quality of service ,02 engineering and technology ,010501 environmental sciences ,01 natural sciences ,Adversarial system ,020204 information systems ,Urban computing ,Public transport ,0202 electrical engineering, electronic engineering, information engineering ,Trajectory ,Global Positioning System ,business ,Baseline (configuration management) ,Generative grammar ,0105 earth and related environmental sciences ,Information Systems - Abstract
Smart passenger-seeking strategies employed by taxi drivers contribute not only to drivers’ incomes but also to the quality of service passengers receive. Therefore, understanding taxi drivers’ behaviors and learning good passenger-seeking strategies are crucial to boost taxi drivers’ well-being and the quality of service of public transportation. However, we observe that drivers’ preferences for choosing which area to find the next passenger in are diverse and dynamic across locations and drivers. It is hard to learn such location-dependent preferences from partial data (i.e., an individual driver's trajectory may not cover all locations). In this paper, we make the first attempt to develop a conditional generative adversarial imitation learning (cGAIL) model, a unifying collective inverse reinforcement learning framework that learns drivers' decision-making preferences and policies by transferring knowledge across taxi driver agents and across locations. Our evaluation on three months of taxi GPS trajectory data in Shenzhen, China, demonstrates that the preferences and policies learned by cGAIL are on average 34.7% more accurate than those learned by other state-of-the-art baseline approaches.
- Published
- 2022
32. Looking Back on the Past: Active Learning With Historical Evaluation Results
- Author
-
Jian-Yun Nie, Jing Yao, Ji-Rong Wen, and Zhicheng Dou
- Subjects
Iterative and incremental development ,Sequence ,Heuristic ,Computer science ,business.industry ,Active learning (machine learning) ,Sampling (statistics) ,02 engineering and technology ,computer.software_genre ,Machine learning ,Small set ,Computer Science Applications ,Computational Theory and Mathematics ,Named-entity recognition ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Labeled data ,Artificial intelligence ,business ,computer ,Information Systems - Abstract
Active learning is an effective approach for tasks with limited labeled data. It actively samples a small set of data to annotate and is widely applied in various AI tasks. It follows an iterative process: the current trained model evaluates all unlabeled samples, the best samples are selected for annotation according to a specific query strategy, and the underlying model is updated iteratively. Most existing active learning approaches rely only on the evaluation results produced by the current model and ignore the results from previous iterations. In this paper, we propose to also use historical evaluation results, which provide additional information to help select better samples. First, we apply two heuristic features of the historical evaluation results, the weighted sum and the fluctuation of the historical evaluation sequence, to improve the effectiveness of sampling. Next, to use the information contained in the historical results more globally, we design a novel query strategy that automatically learns how to select samples based on the historical sequences. We also improve current state-of-the-art active learning methods by introducing historical evaluation results. Experimental results on two common NLP tasks, text classification and named entity recognition, show that our methods significantly outperform current methods.
- Published
- 2022
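A toy sketch of the first idea in the active-learning entry above: score unlabeled samples by an uncertainty measure aggregated over the model's historical predictions rather than only the latest one. The exponential weighting, the least-confidence measure, and the way the two signals are combined are illustrative assumptions, not the paper's exact features.

```python
import numpy as np

def historical_uncertainty(prob_history, decay=0.7):
    """prob_history: array of shape (iterations, num_samples, num_classes) holding the
    model's predicted class probabilities at each past active-learning round."""
    # Per-round uncertainty: 1 - max predicted probability (least-confidence).
    per_round = 1.0 - prob_history.max(axis=2)               # (iterations, num_samples)
    # Weighted sum over history, giving recent rounds more weight.
    weights = decay ** np.arange(per_round.shape[0])[::-1]
    weighted = (weights[:, None] * per_round).sum(axis=0) / weights.sum()
    # Fluctuation of the uncertainty sequence as a second signal.
    fluctuation = per_round.std(axis=0)
    return weighted + fluctuation                            # higher = more worth labeling

def query(prob_history, k=10):
    """Return the indices of the top-k samples to send for annotation."""
    return np.argsort(historical_uncertainty(prob_history))[::-1][:k]

hist = np.random.rand(4, 100, 3)
hist /= hist.sum(axis=2, keepdims=True)                      # normalize to probabilities
print(query(hist, k=5))
```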
33. COVID-19 Contact Tracing and Privacy: A Longitudinal Study of Public Opinion
- Author
-
Tadayoshi Kohno, Ryan Calo, Maggie Jiang, Jack Keng-Wei Chang, Franziska Roesner, and Lucy Simko
- Subjects
FOS: Computer and information sciences ,medicine.medical_specialty ,Longitudinal study ,Computer Science - Cryptography and Security ,Computer Networks and Communications ,Internet privacy ,Computer Science - Human-Computer Interaction ,02 engineering and technology ,Public opinion ,Human-Computer Interaction (cs.HC) ,Computer Science - Computers and Society ,03 medical and health sciences ,Leverage (negotiation) ,020204 information systems ,Computers and Society (cs.CY) ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Wearable technology ,030304 developmental biology ,0303 health sciences ,Event (computing) ,business.industry ,Public health ,3. Good health ,Computer Science Applications ,Hardware and Architecture ,business ,Cryptography and Security (cs.CR) ,Safety Research ,Software ,Contact tracing ,Information Systems ,Diversity (business) - Abstract
There is growing use of technology-enabled contact tracing, the process of identifying potentially infected COVID-19 patients by notifying all recent contacts of an infected person. Governments, technology companies, and research groups alike have been working towards releasing smartphone apps, using IoT devices, and distributing wearable technology to automatically track "close contacts" and identify prior contacts in the event an individual tests positive. However, there has been significant public discussion about the tensions between effective technology-based contact tracing and the privacy of individuals. To inform this discussion, we present the results of seven months of online surveys focused on contact tracing and privacy, each with 100 participants. Our first surveys were on April 1 and 3, before the first peak of the virus in the US, and we continued to conduct the surveys weekly for 10 weeks (through June), and then fortnightly through November, adding topical questions to reflect current discussions about contact tracing and COVID-19. Our results present the diversity of public opinion and can inform policy makers, technologists, researchers, and public health experts on whether and how to leverage technology to reduce the spread of COVID-19, while considering potential privacy concerns. We are continuing to conduct longitudinal measurements and will update this report over time; citations to this version of the report should reference Report Version 2.0, December 4, 2020., 37 pages, 11 figures. Supercedes arXiv:2005.06056
- Published
- 2022
34. Do You Really Know if It’s True? How Asking Users to Rate Stories Affects Belief in Fake News on Social Media
- Author
-
Alan R. Dennis, Patricia L. Moravec, Antino Kim, and Randall K. Minas
- Subjects
Information Systems and Management ,Point (typography) ,business.industry ,Event (computing) ,Computer Networks and Communications ,media_common.quotation_subject ,Flagging ,05 social sciences ,Internet privacy ,02 engineering and technology ,Library and Information Sciences ,Management Information Systems ,020204 information systems ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,050211 marketing ,The Internet ,Social media ,Misinformation ,Fake news ,business ,Psychology ,Skepticism ,media_common ,Information Systems - Abstract
The rise of “fake news” has become a major concern for social media platforms. In response, Facebook has proposed and tested the idea of users flagging and rating news articles and sources, much akin to how consumers rate products and services on the Internet. One obvious challenge with this crowdsourced rating approach is whether the users really know enough to rate news articles and sources. Perhaps, a side benefit of asking users to evaluate an article—and asking about their personal experience with the event described in the article—is making them realize that they do not know enough about the event to make an accurate judgment, thus pushing them to become more skeptical. We asked 68 social media users to assess the believability of 42 social media headlines. We found that, while users were generally more likely to believe articles that agreed with their point of view, asking users to rate pushed them to think more critically about the truthfulness of the articles. Moreover, once users had been asked to rate some articles, they remained critical of other articles as well, even without the rating prompt. Overall, our findings suggest that asking users to evaluate the truthfulness of articles may not only produce rating information that can be a useful reference at a later point in time but also have an immediate benefit of alerting users to think more critically about all articles they see.
- Published
- 2022
35. Towards an Optimal Bus Frequency Scheduling: When the Waiting Time Matters
- Author
-
Baihua Zheng, Songsong Mo, Zhifeng Bao, and Zhiyong Peng
- Subjects
Mathematical optimization ,Schedule ,Optimization problem ,Linear programming ,Heuristic (computer science) ,Computer science ,Approximation algorithm ,02 engineering and technology ,Computer Science Applications ,Scheduling (computing) ,Bus network ,Computational Theory and Mathematics ,020204 information systems ,11. Sustainability ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Greedy algorithm ,Information Systems - Abstract
Reorganizing bus frequencies to cater for actual travel demands can significantly reduce the cost of the public transport system. This paper studies the bus frequency optimization problem considering user satisfaction. Specifically, for the first time to the best of our knowledge, we study how to schedule buses such that the total number of passengers who receive their bus service within a waiting-time threshold is maximized. We propose two variants of the problem, FAST and FASTCO, to cater for different application needs, and prove that both are NP-hard. To solve FAST effectively and efficiently, we first present an index-based (1 − 1/e)-approximation algorithm. By exploiting the locality property of routes in a bus network, we further propose a partition-based greedy method that achieves a (1 − ε)(1 − 1/e) approximation ratio, and then a progressive partition-based greedy method that further boosts efficiency while keeping the (1 − ε)(1 − 1/e) approximation ratio. For the FASTCO problem, two greedy-based heuristic methods are proposed. Experiments on a real city-wide bus dataset in Singapore verify the efficiency, effectiveness, and scalability of our methods in addressing FAST and FASTCO.
- Published
- 2022
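A generic sketch of the greedy scheme that underlies (1 − 1/e)-style guarantees for coverage problems of the flavor described in the entry above: repeatedly add the candidate trip that serves the most not-yet-served passengers. The data model (trips mapped to passenger sets) is an invented simplification, not the paper's FAST formulation.

```python
def greedy_schedule(trips, budget):
    """trips: dict mapping a candidate trip to the set of passengers it would serve
    within the waiting-time threshold. Pick `budget` trips greedily by marginal coverage."""
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(trips, key=lambda t: len(trips[t] - covered))
        if not trips[best] - covered:
            break                        # no remaining trip serves any new passenger
        chosen.append(best)
        covered |= trips[best]
    return chosen, covered

# Tiny example with hypothetical passenger IDs.
trips = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 5, 6}}
print(greedy_schedule(trips, budget=2))  # (['t1', 't4'], {1, 2, 3, 5, 6})
```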
36. Entity Alignment for Knowledge Graphs With Multi-Order Convolutional Networks
- Author
-
Bolong Zheng, Thanh Trung Huynh, Vinh Van Tong, Tam Thanh Nguyen, Quoc Viet Hung Nguyen, Darnbi Sakong, and Hongzhi Yin
- Subjects
Theoretical computer science ,Computational Theory and Mathematics ,Knowledge graph ,Computer science ,020204 information systems ,Knowledge engineering ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,02 engineering and technology ,Computer Science Applications ,Information Systems ,Data modeling - Abstract
Knowledge graphs (KGs) have become popular structures for unifying real-world entities by modelling the relationships between them and their attributes. Entity alignment, the task of identifying corresponding entities across different KGs, has attracted a great deal of attention in both academia and industry. However, existing alignment techniques often require large amounts of labelled data, are unable to encode multi-modal data simultaneously, and enforce only a few consistency constraints. In this paper, we propose an end-to-end, unsupervised entity alignment framework for cross-lingual KGs that fuses different types of information in order to fully exploit the richness of KG data. The model captures the relation-based correlation between entities by using a multi-order graph convolutional neural (GCN) model designed to satisfy the consistency constraints, while incorporating the attribute-based correlation via a translation machine. We adopt a late-fusion mechanism to combine all the information, which allows these approaches to complement each other, enhances the final alignment result, and makes the model more robust to consistency violations. Empirical results show that our model is more accurate and orders of magnitude faster than existing baselines. We also demonstrate its sensitivity to hyper-parameters, the labelling effort it saves, and its robustness against adversarial conditions.
- Published
- 2022
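A bare-bones illustration of "multi-order" graph convolution in the spirit of the entry above: propagate node features over the normalized adjacency matrix for several hops and concatenate the per-order representations. This NumPy sketch omits trainable weights, the consistency constraints, and the attribute/translation component; the toy graph is invented.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalize an adjacency matrix with self-loops: D^-1/2 (A+I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def multi_order_embeddings(A, X, orders=3):
    """Concatenate node features propagated over 1..`orders` hops."""
    S = normalized_adjacency(A)
    reps, H = [], X
    for _ in range(orders):
        H = S @ H                      # one more hop of neighborhood aggregation
        reps.append(H)
    return np.concatenate(reps, axis=1)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # toy 3-node graph
X = np.eye(3)                                                   # one-hot node features
print(multi_order_embeddings(A, X).shape)                       # (3, 9)
```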
37. Few-Shot Named Entity Recognition via Meta-Learning
- Author
-
Hao Wang, Jing Li, Billy Chiu, and Shanshan Feng
- Subjects
Meta learning (computer science) ,Contextual image classification ,Computer science ,business.industry ,02 engineering and technology ,Overfitting ,computer.software_genre ,Relationship extraction ,Sequence labeling ,Computer Science Applications ,Task (project management) ,Computational Theory and Mathematics ,Named-entity recognition ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,Artificial intelligence ,business ,computer ,Natural language processing ,Information Systems - Abstract
Few-shot learning under the N-way K-shot setting (i.e., K annotated samples for each of N classes) has been widely studied in relation extraction (e.g., FewRel) and image classification (e.g., Mini-ImageNet). Named entity recognition (NER) is typically framed as a sequence labeling problem where the entity classes are inherently entangled together because the entity number and classes in a sentence are not known in advance, leaving the N-way K-shot NER problem so far unexplored. In this paper, we first formally define a more suitable N-way K-shot setting for NER. Then we propose FewNER, a novel meta-learning approach for few-shot NER. FewNER separates the entire network into a task-independent part and a task-specific part. During training in FewNER, the task-independent part is meta-learned across multiple tasks and a task-specific part is learned for each single task in a low-dimensional space. At test time, FewNER keeps the task-independent part fixed and adapts to a new task via gradient descent by updating only the task-specific part, resulting in it being less prone to overfitting and more computationally efficient. The results demonstrate that FewNER achieves state-of-the-art performance against nine baseline methods by significant margins on three adaptation experiments.
- Published
- 2022
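A hedged PyTorch sketch of the adaptation idea described in the few-shot NER entry above: keep a shared (task-independent) encoder frozen at test time and update only a small task-specific head by gradient descent on the new task's support set. The architecture, dimensions, tag count, and hyper-parameters are placeholders, not FewNER's actual design.

```python
import torch
import torch.nn as nn

encoder = nn.LSTM(input_size=100, hidden_size=64, batch_first=True)   # task-independent part
head = nn.Linear(64, 5)                                               # task-specific part (5 tag types assumed)
loss_fn = nn.CrossEntropyLoss()

def adapt_to_task(support_x, support_y, steps=10, lr=1e-2):
    """Adapt only the task-specific head on a new task's small support set.
    support_x: FloatTensor (B, T, 100); support_y: LongTensor (B, T) of tag indices."""
    for p in encoder.parameters():
        p.requires_grad_(False)                 # freeze the meta-learned encoder
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(steps):
        feats, _ = encoder(support_x)           # (B, T, 64)
        logits = head(feats)                    # (B, T, 5)
        loss = loss_fn(logits.reshape(-1, 5), support_y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

adapt_to_task(torch.randn(2, 8, 100), torch.randint(0, 5, (2, 8)))
```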
38. Smooth Compact Tensor Ring Regression
- Author
-
Jiani Liu, Yipeng Liu, and Ce Zhu
- Subjects
Ring (mathematics) ,Rank (linear algebra) ,Computer science ,02 engineering and technology ,Computer Science Applications ,Matrix decomposition ,Constraint (information theory) ,Computational Theory and Mathematics ,Robustness (computer science) ,020204 information systems ,Linear regression ,0202 electrical engineering, electronic engineering, information engineering ,Applied mathematics ,Tensor ,Information Systems - Abstract
In learning tasks with high-order correlations, the low-rank approximation of the regression coefficient tensor has become increasingly important. Tensor ring decomposition can capture more correlation information among tensor networks; however, its optimal rank is generally unknown and needs to be tuned over many combinations. To address this issue, we propose a novel tensor regression framework with a group sparsity constraint on latent factors for tensor ring rank estimation. Specifically, the proposed group-sparsity-constrained matrix factorization problem is first proved to be equivalent to a better approximation of matrix rank, namely the Schatten-$1/2$ quasi-norm. Extending it to tensors, the tensor ring rank can be inferred during the learning process to balance the prediction error and the model complexity. Besides, a total variation term is introduced to enhance the local consistency of the predicted response, which helps reduce the adverse effects of random noise. Experiments on a simulated dataset show that the proposed method can exactly recover the tensor ring rank, and the effectiveness and robustness of the proposed algorithm are further verified on a real dataset for human motion capture tasks.
- Published
- 2022
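For reference, the Schatten-$p$ quasi-norm mentioned in the entry above is computed from singular values as $\|X\|_{S_p} = (\sum_i \sigma_i^p)^{1/p}$; smaller values favor lower-rank matrices. A tiny NumPy illustration for $p = 1/2$, unrelated to the paper's full tensor-ring optimization; the example matrix is arbitrary.

```python
import numpy as np

def schatten_quasi_norm(X, p=0.5):
    """(sum_i sigma_i^p)^(1/p), computed from the singular values of X."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return (sigma ** p).sum() ** (1.0 / p)

X = np.random.randn(6, 4) @ np.random.randn(4, 8)   # a rank-4 matrix
print(schatten_quasi_norm(X, p=0.5))                 # smaller for lower-rank matrices
```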
39. Efficient Radius-Bounded Community Search in Geo-Social Networks
- Author
-
Shuting Wang, Xin Cao, Kai Wang, and Lu Qin
- Subjects
Theoretical computer science ,Computer science ,computer.internet_protocol ,02 engineering and technology ,Computer Science Applications ,Vertex (geometry) ,Computational Theory and Mathematics ,Cover (topology) ,restrict ,020204 information systems ,Bounded function ,0202 electrical engineering, electronic engineering, information engineering ,Search problem ,RADIUS ,08 Information and Computing Sciences ,Pruning (decision trees) ,Baseline (configuration management) ,computer ,Information Systems - Abstract
Driven by real-life applications in geo-social networks, we study the problem of computing radius-bounded k-cores (RB-k-cores), which aims to find communities satisfying both social and spatial constraints. In particular, the k-core model (i.e., the subgraph where each vertex has at least k neighbors) is used to ensure social cohesiveness, and a radius-bounded circle is used to restrict the locations of users in an RB-k-core. We explore several algorithmic paradigms to compute RB-k-cores, including a triple-vertex-based paradigm, a binary-vertex-based paradigm, and a paradigm utilizing the concept of rotating circles. The rotating-circle-based paradigm is further enhanced by several pruning techniques to achieve better efficiency. In addition, to find representative RB-k-cores, we study the diversified radius-bounded k-core search problem, which finds t RB-k-cores that cover the largest number of vertices. We first propose a baseline algorithm that identifies the distinctive RB-k-cores after finding all RB-k-cores. Beyond this, we design algorithms that efficiently maintain the top-t candidate RB-k-cores and achieve a guaranteed approximation ratio. Experimental studies on both real and synthetic datasets demonstrate that our proposed techniques can efficiently compute (diversified) RB-k-cores. Moreover, our techniques can be used to compute the minimum-circle-bounded k-core and significantly outperform existing techniques.
- Published
- 2022
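The k-core constraint used in the entry above has a classical computation: repeatedly remove vertices of degree less than k until none remain. A plain-Python peeling sketch (the radius and spatial constraints of RB-k-cores are omitted, and the toy graph is invented):

```python
def k_core(adj, k):
    """adj: dict mapping each vertex to the set of its neighbors.
    Repeatedly peel vertices of degree < k; returns the vertex set of the k-core."""
    alive = {v: set(nbrs) for v, nbrs in adj.items()}
    queue = [v for v, nbrs in alive.items() if len(nbrs) < k]
    while queue:
        v = queue.pop()
        if v not in alive:
            continue
        for u in alive.pop(v):           # remove v and update surviving neighbors
            if u in alive:
                alive[u].discard(v)
                if len(alive[u]) < k:
                    queue.append(u)
    return set(alive)

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(k_core(adj, k=2))                  # {1, 2, 3}
```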
40. An Attribute-Aware Attentive GCN Model for Attribute Missing in Recommendation
- Author
-
Fan Liu, Lei Zhu, Chenghao Liu, Zhiyong Cheng, and Liqiang Nie
- Subjects
Information retrieval ,Computer science ,Aggregate (data warehouse) ,02 engineering and technology ,Construct (python library) ,Recommender system ,Computer Science Applications ,Computational Theory and Mathematics ,020204 information systems ,Node (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Leverage (statistics) ,Graph (abstract data type) ,Representation (mathematics) ,Feature learning ,Information Systems - Abstract
As important side information, attributes have been widely exploited in existing recommender systems for better performance. Prior studies usually use a default value (i.e., "other") to represent a missing attribute, resulting in sub-optimal performance. To address this problem, in this paper we present an attribute-aware attentive graph convolution network. In particular, we first construct a graph where users, items, and attributes are three types of nodes and their associations are edges. Thereafter, we leverage the graph convolution network to characterize the complicated interactions among them. Furthermore, to learn the node representation, we adopt the message-passing strategy to aggregate the messages passed from the other directly linked types of nodes (e.g., a user or an attribute). In this way, we are able to incorporate associated attributes to strengthen the user and item representation learning, and thus naturally solve the attribute missing problem. Given that, for different users, the attributes of an item have different influences on their preference for this item, we design a novel attention mechanism to filter the message passed from an item to a target user by considering the attribute information. Extensive experiments on several publicly accessible datasets demonstrate the superiority of our model over several state-of-the-art methods.
- Published
- 2022
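A compact sketch of attention-weighted message passing in the spirit of the entry above: a target node aggregates neighbor messages with softmax weights derived from a relevance score. This NumPy toy uses a dot-product score purely for illustration; it is not the paper's attention design, and the embedding sizes are invented.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_aggregate(target, neighbors):
    """target: (d,) embedding of the user node; neighbors: (n, d) embeddings of linked
    items/attributes. Returns the attention-weighted neighborhood message."""
    scores = neighbors @ target              # dot-product relevance (illustrative choice)
    alpha = softmax(scores)                  # attention weights over neighbors
    return alpha @ neighbors                 # weighted sum of neighbor messages

user = np.random.randn(8)
linked = np.random.randn(5, 8)               # e.g., items the user interacted with and their attributes
print(attentive_aggregate(user, linked).shape)   # (8,)
```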
41. Wahrnehmung von Technik und Digitalisierung in Deutschland und Europa : Befunde aus dem TechnikRadar
- Author
-
Jürgen Hampel, Constanze Störk-Biber, Michael M. Zwick, and Cordula Kropp
- Subjects
0209 industrial biotechnology ,020901 industrial engineering & automation ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,02 engineering and technology - Abstract
This article presents how technology and digitalisation are perceived in Germany and in comparison with other European countries. It draws on the results of the TechnikRadar from 2018 and 2019. The TechnikRadar, published annually by acatech - Deutsche Akademie der Technikwissenschaften and the Körber-Stiftung, examines with a different focus each year what Germans think about technology; it is compiled and scientifically evaluated by the authors of this article. In 2018 and 2019 the perception and assessment of digitalisation were the central theme: the 2018 edition presented the results of a representative survey in Germany, while the 2019 edition subjected these results to a deeper analysis of age- and gender-specific differences and compared them with findings from other international studies. Using application fields such as autonomous driving, care robotics, smart homes, and the digitalisation of infrastructures, the article highlights how technology is perceived in each context and which opportunities and risks are seen. The findings show that people's perceived competence in dealing with digital technologies and services, as well as their trust in the actors driving digitalisation, are essential for a basic societal acceptance of digitalisation processes., Projekt DEAL
- Published
- 2023
- Full Text
- View/download PDF
42. First-Order Stable Model Semantics with Intensional Functions
- Author
-
Michael Bartholomew and Joohyung Lee
- Subjects
FOS: Computer and information sciences ,Computer Science - Symbolic Computation ,Linguistics and Language ,Theoretical computer science ,Computer science ,Computer Science - Artificial Intelligence ,Modulo ,Classical logic ,02 engineering and technology ,Symbolic Computation (cs.SC) ,First order ,Language and Linguistics ,Answer set programming ,Artificial Intelligence (cs.AI) ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Artificial Intelligence ,020204 information systems ,Satisfiability modulo theories ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Logic program ,Real number ,Stable model semantics - Abstract
In classical logic, nonBoolean fluents, such as the location of an object, can be naturally described by functions. However, this is not the case in answer set programs, where the values of functions are pre-defined, and nonmonotonicity of the semantics is related to minimizing the extents of predicates but has nothing to do with functions. We extend the first-order stable model semantics by Ferraris, Lee, and Lifschitz to allow intensional functions -- functions that are specified by a logic program just like predicates are specified. We show that many known properties of the stable model semantics are naturally extended to this formalism and compare it with other related approaches to incorporating intensional functions. Furthermore, we use this extension as a basis for defining Answer Set Programming Modulo Theories (ASPMT), analogous to the way that Satisfiability Modulo Theories (SMT) is defined, allowing for SMT-like effective first-order reasoning in the context of ASP. Using SMT solving techniques involving functions, ASPMT can be applied to domains containing real numbers and alleviates the grounding problem. We show that other approaches to integrating ASP and CSP/SMT can be related to special cases of ASPMT in which functions are limited to non-intensional ones., 69 pages
- Published
- 2023
43. Recommender Systems for Online and Mobile Social Networks: A survey
- Author
-
Mattia Giovanni Campana and Franca Delmastro
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Exploit ,Computer Networks and Communications ,Process (engineering) ,Computer science ,02 engineering and technology ,Recommender system ,computer.software_genre ,Machine Learning (cs.LG) ,Task (project management) ,Computer Science - Information Retrieval ,World Wide Web ,Open research ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Social media ,mobile social networks ,Social and Information Networks (cs.SI) ,Distributed Computing Environment ,Multimedia ,Communication ,Computer Science - Social and Information Networks ,online social networks ,020201 artificial intelligence & image processing ,Standard algorithms ,computer ,Information Retrieval (cs.IR) ,Information Systems - Abstract
Recommender Systems (RS) currently represent a fundamental tool in online services, especially with the advent of Online Social Networks (OSN). In this setting, users generate huge amounts of content and can quickly be overloaded by useless information. At the same time, social media represent an important source of information for characterizing contents and users’ interests. RS can exploit this information to further personalize suggestions and improve the recommendation process. In this paper we present a survey of Recommender Systems designed and implemented for Online and Mobile Social Networks, highlighting how the use of social context information improves the recommendation task, and how standard algorithms must be enhanced and optimized to run in fully distributed environments such as opportunistic networks. We describe the advantages and drawbacks of these systems in terms of algorithms, target domains, evaluation metrics and performance evaluations. Eventually, we present some open research challenges in this area.
- Published
- 2023
44. Hybrid Fuzzy Neural Search Retrieval System
- Author
-
Rawan Ghnemat and Adnan Shaout
- Subjects
Adaptive neuro fuzzy inference system ,Engineering ,Information Systems and Management ,Web search query ,business.industry ,Process (computing) ,02 engineering and technology ,computer.software_genre ,Machine learning ,Computer Science Applications ,Management Information Systems ,Hybrid intelligent system ,Query expansion ,Search engine ,Experimental system ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Beam search ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,business ,computer - Abstract
Search engines are crucial for information gathering systems (IGS), and they face new challenges concerning automatic learning from user requests. In this paper, a new hybrid intelligent system is proposed to enhance the search process. Based on a Multilayer Fuzzy Inference System (MFIS), the first step implements a scalable system that relies on logical rules to produce three classifications, for search behavior, user profiles, and query characteristics, from analysis of navigation log files. These three outputs from the MFIS are used as inputs for the second step, an Adaptive Neuro-Fuzzy Inference System (ANFIS). The training process of the ANFIS replaces the rules by adjusting the weights in order to find the most relevant result for the search query. The proposed system, called MFIS-ANFIS, is implemented as an experimental system, and its performance is evaluated using quantitative and comparative analysis. MFIS-ANFIS aims to be the core of an intelligent and reliable search process.
- Published
- 2022
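To make the fuzzy-inference vocabulary in the entry above concrete, here is a miniature example of rule evaluation with triangular membership functions and a weighted-average defuzzification. It is a generic toy in plain Python, not the MFIS-ANFIS system; the sets, rules, and numbers are invented.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def infer(query_length):
    """Two toy rules: short queries -> broad search (0.2), long queries -> focused search (0.9)."""
    short = triangular(query_length, 0, 1, 4)      # membership in "short query"
    long_ = triangular(query_length, 3, 8, 12)     # membership in "long query"
    weights, outputs = [short, long_], [0.2, 0.9]
    total = sum(weights)
    return sum(w * o for w, o in zip(weights, outputs)) / total if total else 0.5

print(infer(query_length=5))   # somewhere between broad and focused
```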
45. A Multi-Stage Fuzzy Model for Assessing Applicants for Faculty Positions in Universities
- Author
-
Raghda Hraiz, Mariam Khader, and Adnan Shaout
- Subjects
Interpretation (logic) ,Java ,Operations research ,Computer science ,Fuzzy model ,02 engineering and technology ,Fuzzy control system ,Fuzzy logic ,Multi stage ,020204 information systems ,Credibility ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Decision Sciences (miscellaneous) ,computer ,Reliability (statistics) ,Information Systems ,computer.programming_language - Abstract
Assessing applicants for faculty positions in universities involves many issues, and each issue may require a judgment based on uncertain or imprecise data. Uncertainty may also lie in the interpretation made by the evaluator, which can lead to improper decision making. Modeling such a system using fuzzy logic provides a more effective way of handling imprecision. This article presents a fuzzy system for modeling the assessment of applicants for employment at academic universities. The system utilizes a multi-stage fuzzy model for measuring and evaluating applicants. Using fuzzy logic for applicant evaluation helps administrators choose the best candidates for faculty positions. The fuzzy system was developed using the jFuzzyLogic Java library. Its reliability was demonstrated by evaluating real-world case studies, showing its effectiveness in mimicking human judgment. Moreover, the developed system was compared with a traditional mathematical method to establish the credibility and fairness of the proposed fuzzy system.
- Published
- 2022
46. Improving Auto-Detection of Phishing Websites using Fresh-Phish Framework
- Author
-
Indrakshi Ray, Kyle Haefner, and Hossein Shirazi
- Subjects
World Wide Web ,Computer science ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Immunology and Allergy ,020201 artificial intelligence & image processing ,02 engineering and technology ,Phishing - Abstract
Denizens of the Internet are under a barrage of phishing attacks of increasing frequency and sophistication. Emails accompanied by authentic-looking websites are ensnaring users who, unwittingly, hand over their credentials, compromising both their privacy and security. Methods such as blacklisting phishing websites have become untenable and cannot keep pace with the explosion of fake sites; detection of nefarious websites must become automated and able to adapt to this ever-evolving form of social engineering. "Fresh-Phish" is a previously implemented framework for creating current machine-learning data for phishing websites. The improved framework presented here uses a total of 28 different website features queried using Python; a large labeled dataset is then built, and several machine-learning classifiers are analyzed against this dataset to determine which is the most accurate. The modified framework improves the accuracy of modeling those features by using integer rather than binary values where possible. This article analyzes not only the accuracy of the technique but also how long it takes to train the model.
- Published
- 2022
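A minimal sketch of the general pipeline in the entry above: extract integer-valued features from a website's URL and train a classifier on labeled examples. The handful of features, the URLs, the labels, and the choice of a random forest are toy assumptions; they are not the framework's 28 features or its evaluated classifiers.

```python
from urllib.parse import urlparse
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def url_features(url):
    """A few toy integer-valued features; the real framework queries 28 website features."""
    parsed = urlparse(url)
    return [
        len(url),                          # total URL length
        url.count("."),                    # number of dots
        int("@" in url),                   # presence of '@'
        int(parsed.scheme == "https"),     # uses HTTPS
        len(parsed.netloc.split(".")),     # number of host labels
    ]

urls = ["https://example.com/login", "http://paypal.example.badsite.ru/@verify"]
labels = [0, 1]                            # 0 = legitimate, 1 = phishing (toy labels)
X = np.array([url_features(u) for u in urls])
clf = RandomForestClassifier(n_estimators=50).fit(X, labels)
print(clf.predict([url_features("http://secure-update.example.net/@login")]))
```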
47. Graph Neural Network for Fraud Detection via Spatial-Temporal Attention
- Author
-
Liqing Zhang, Ying Zhang, Dawei Cheng, and Xiaoyang Wang
- Subjects
Focus (computing) ,Computer science ,business.industry ,Credit card fraud ,02 engineering and technology ,Machine learning ,computer.software_genre ,Computer Science Applications ,Domain (software engineering) ,Empirical research ,Computational Theory and Mathematics ,Knowledge extraction ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Domain knowledge ,Graph (abstract data type) ,08 Information and Computing Sciences ,Artificial intelligence ,business ,Database transaction ,computer ,Information Systems - Abstract
Card fraud is an important issue and incurs a considerable cost for both cardholders and issuing banks. Contemporary methods apply machine-learning-based approaches to detect fraudulent behavior from transaction records, but manually engineered features require domain knowledge and may lag behind the evolving modus operandi of fraud, which means an online detection system needs to focus automatically on the most relevant fraudulent behavior patterns. Therefore, in this work, we propose a spatial-temporal attention-based graph network (STAGN) for credit card fraud detection. In particular, we first learn temporal and location-based transaction graph features with a graph neural network. Afterwards, we apply spatial-temporal attention on top of the learned tensor representations, which are then fed into a 3D convolution network; the attention weights are jointly learned in an end-to-end manner with the 3D convolution and detection networks. We conduct extensive experiments on a real-world card transaction dataset. The results show that STAGN performs better than other state-of-the-art baselines in both AUC and precision-recall curves. Moreover, empirical studies with domain experts on the proposed method for fraud detection and knowledge discovery demonstrate its superiority in detecting suspicious transactions, mining spatial and temporal fraud hotspots, and uncovering fraud patterns. The effectiveness of the proposed method in other user-behavior-based tasks is also demonstrated. Finally, in order to tackle the challenges of big data, we integrate STAGN into the fraud detection system as the predictive model and present the implementation details of each module in the system.
- Published
- 2022
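A minimal illustration of spatial-temporal attention in the informal sense used in the entry above: weight a grid of per-location, per-time-step transaction features by softmax scores and pool them. Purely an illustrative NumPy toy with an invented relevance score; the actual STAGN attention is learned jointly with its convolutional and detection networks.

```python
import numpy as np

def spatio_temporal_attention(tensor):
    """tensor: (locations, time, d) activity features. Returns softmax attention weights
    over the location-time grid and the attention-pooled feature vector."""
    scores = np.linalg.norm(tensor, axis=-1)            # illustrative relevance: feature magnitude
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    pooled = (weights[..., None] * tensor).sum(axis=(0, 1))
    return weights, pooled

activity = np.random.randn(10, 24, 8)                    # 10 regions x 24 hours x 8 features
w, pooled = spatio_temporal_attention(activity)
print(w.shape, pooled.shape)                             # (10, 24) (8,)
```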
48. Position-Transitional Particle Swarm Optimization-Incorporated Latent Factor Analysis
- Author
-
Nianyin Zeng, Xin Luo, Ye Yuan, Zidong Wang, and Sili Chen
- Subjects
Mathematical optimization ,Computer science ,Computation ,Process (computing) ,Swarm behaviour ,Particle swarm optimization ,02 engineering and technology ,Missing data ,Computer Science Applications ,Matrix (mathematics) ,Stochastic gradient descent ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Information Systems ,Premature convergence - Abstract
High-dimensional and sparse (HiDS) matrices are frequently found in various industrial applications. A latent factor analysis (LFA) model is commonly adopted to extract useful knowledge from an HiDS matrix, and its parameter training mostly relies on a stochastic gradient descent (SGD) algorithm. However, the learning rate of an SGD-based LFA model is hard to tune in real applications, making its self-adaptation vital. To address this critical issue, this study first carefully investigates the evolution process of a particle swarm optimization algorithm and then incorporates more dynamic information into it to avoid the accuracy loss caused by premature convergence, without extra computation burden, thereby achieving a novel position-transitional particle swarm optimization (P2SO) algorithm. P2SO is subsequently adopted to implement a P2SO-based LFA (PLFA) model that builds a learning-rate swarm applied to the same group of LFs. Thus, a PLFA model implements highly efficient learning-rate adaptation and represents an HiDS matrix precisely. Experimental results on four HiDS matrices from real applications demonstrate that, compared with an SGD-based LFA model, a PLFA model no longer suffers from a tedious and expensive learning-rate tuning process and achieves higher prediction accuracy for missing data.
- Published
- 2022
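For orientation, the canonical particle swarm update that the entry above builds on (the position-transitional modification itself is not specified in the abstract): each particle's velocity is pulled toward its personal best and the global best, then added to its position. The objective, swarm size, and coefficients below are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One standard PSO update for a swarm of shape (n_particles, dims)."""
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel

# Toy use: minimize f(x) = ||x||^2 with 5 particles in 2-D.
f = lambda x: (x ** 2).sum(axis=1)
pos = rng.standard_normal((5, 2))
vel = np.zeros_like(pos)
pbest, gbest = pos.copy(), pos[f(pos).argmin()]
for _ in range(50):
    pos, vel = pso_step(pos, vel, pbest, gbest)
    better = f(pos) < f(pbest)
    pbest[better] = pos[better]
    gbest = pbest[f(pbest).argmin()]
print(gbest)    # close to the origin after a few dozen iterations
```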
49. Joint Representation Learning and Clustering: A Framework for Grouping Partial Multiview Data
- Author
-
Wenzhang Zhuge, Chenping Hou, Hong Tao, Dongyun Yi, Tingjin Luo, and Ling-Li Zeng
- Subjects
Optimization problem ,Theoretical computer science ,Computer science ,Iterative method ,02 engineering and technology ,Computer Science Applications ,Matrix (mathematics) ,Computational Theory and Mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Task analysis ,Embedding ,Graph (abstract data type) ,Cluster analysis ,Feature learning ,Information Systems - Abstract
Partial multi-view clustering has attracted considerable attention across diverse fields. Most existing methods adopt separate steps to obtain unified representations and to extract clustering indicators; this separation prevents the two learning processes from negotiating with each other to achieve optimal performance. In this paper, we propose the Joint Representation Learning and Clustering (JRLC) framework to address this issue. The JRLC framework employs representation matrices to extract view-specific clustering information directly from the available partial similarity matrices, and rotates them to learn a common probability label matrix simultaneously, which connects representation learning and clustering seamlessly to achieve better clustering performance. Under the guidance of the JRLC framework, new incomplete multi-view clustering methods can be developed by extending existing single-view graph-based representation learning methods. For illustration, within the framework, we propose two specific methods, JRLC with spectral embedding (JRLC-SE) and JRLC via integrating nonnegative embedding and spectral embedding (JRLC-NS). Two iterative algorithms with guaranteed convergence are designed to solve the resultant optimization problems of JRLC-SE and JRLC-NS. Experimental results on various datasets and a news topic clustering application demonstrate the effectiveness of the proposed algorithms.
- Published
- 2022
50. Multiple Flat Projections for Cross-Manifold Clustering
- Author
-
Yuan-Hai Shao, Lan Bai, Nai-Yang Deng, Zhen Wang, and Wei-Jie Chen
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Computer science ,Machine Learning (stat.ML) ,02 engineering and technology ,Machine Learning (cs.LG) ,Statistics - Machine Learning ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Electrical and Electronic Engineering ,Cluster analysis ,Projection (set theory) ,Mathematics::Symplectic Geometry ,Series (mathematics) ,Manifold ,Computer Science Applications ,Human-Computer Interaction ,Nonlinear system ,ComputingMethodologies_PATTERNRECOGNITION ,Kernel (image processing) ,Control and Systems Engineering ,Video tracking ,Benchmark (computing) ,020201 artificial intelligence & image processing ,Mathematics::Differential Geometry ,Algorithm ,Software ,Information Systems - Abstract
Cross-manifold clustering is a hard topic, and many traditional clustering methods fail because of the cross-manifold structures. In this paper, we propose a Multiple Flat Projections Clustering (MFPC) method to deal with cross-manifold clustering problems. In MFPC, the given samples are projected into multiple subspaces to discover the global structures of the implicit manifolds, so the cross-manifold clusters are distinguished from the various projections. Further, MFPC is extended to nonlinear manifold clustering via kernel tricks to deal with more complex cross-manifold clustering. A series of non-convex matrix optimization problems in MFPC are solved by a proposed recursive algorithm. Synthetic tests show that MFPC works well on cross-manifold structures, and experimental results on benchmark datasets show its excellent performance compared with some state-of-the-art clustering methods., Comment: 12 pages, 58 figures
- Published
- 2022