99 results on '"Rossi, Ryan"'
Search Results
2. Factors Associated With Tobacco Cessation Advice Recall and Quit Rates in Vascular Surgery Patients. A Single Center Study.
- Author
-
Peng, Yuanzun, Rossi, Ryan, Falkenhain, Alec, Bose, Saideep, Williams, Michael, Wittgen, Catherine, Han, David, and Smeds, Matthew R.
- Subjects
-
*SMOKING cessation, *RISK assessment, *PATIENT education, *SURGERY, *PATIENTS, *SMOKING, *OUTPATIENT medical care, *RETROSPECTIVE studies, *DESCRIPTIVE statistics, *VASCULAR surgery, *LONGITUDINAL method, *MEMORY, *MEDICAL appointments, *STATISTICS, *MEDICAL records, *ACQUISITION of data, *COUNSELING, *SOCIAL classes, *TIME
- Abstract
Objectives: Smoking is an important modifiable risk factor in all vascular diseases and verbal advice from providers has been shown to increase rates of tobacco cessation. We sought to identify factors that will improve tobacco cessation and recall of receiving verbal cessation advice in vascular surgery patients at a single institution. Methods: The study is a retrospective cohort study. Patients seen in outpatient vascular surgery clinic who triggered a tobacco Best Practice Advisory (BPA) during their office visits over a 10-month period were contacted post-clinic and administered surveys detailing smoking status, cessation advice recall, and validated scales for nicotine dependence and willingness to quit smoking. This BPA is a "hard stop" that requires providers to document actions taken. Charts were reviewed for tobacco cessation documentation. Nine-digit zip-codes identified the area deprivation index, a measure of socioeconomic status. Univariate analysis was used to identify factors associated with cessation and advice recall. Results: One hundred out of 318 (31.4%) patients responded to the survey. Epic Slicer Dicer found 97 BPA responses. To dismiss the BPA, 89 providers (91.8%) selected "advised tobacco cessation" and "Unable to Advise" otherwise. Of the 318 patients, 115 (36.1%) had cessation intervention documented in their provider notes and 151 (47.5%) received written tobacco cessation advice. Of survey respondents, 70 recalled receiving verbal advice, 27 recalled receiving written advice, 28 reported receiving offers of medication/therapy for cessation. 55 patients reported having tobacco cessation plans, and among those 17 reported having quit tobacco. Recall of receiving written advice (P <.001) and recall of receiving medication/therapy (P =.008) were associated with recall of receiving verbal cessation advice. 
Conclusions: Providing patients with tobacco cessation medication/therapy and written tobacco cessation education during office visits is associated with increased patients' recall of tobacco cessation advice. Vascular surgeons should continue to provide directed tobacco cessation advice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Bias and Fairness in Large Language Models: A Survey.
- Author
-
Gallegos, Isabel O., Rossi, Ryan A., Barrow, Joe, Tanjim, Md Mehrab, Kim, Sungchul, Dernoncourt, Franck, Yu, Tong, Zhang, Ruiyi, and Ahmed, Nesreen K.
- Subjects
-
*LANGUAGE models, *NATURAL language processing, *RESEARCH personnel, *SOCIAL groups, *COUNTERFACTUALS (Logic)
- Abstract
Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this article, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely, metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
4. Fairness-Aware Graph Neural Networks: A Survey.
- Author
-
Chen, April, Rossi, Ryan A., Park, Namyong, Trivedi, Puja, Wang, Yu, Yu, Tong, Kim, Sungchul, Dernoncourt, Franck, and Ahmed, Nesreen K.
- Subjects
GRAPH neural networks, AGGREGATION operators
- Abstract
Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. We categorize these techniques by whether they focus on improving fairness in the pre-processing, in-processing (during training), or post-processing phases. We discuss how such techniques can be used together whenever appropriate and highlight the advantages and intuition as well. We also introduce an intuitive taxonomy for fairness evaluation metrics, including graph-level fairness, neighborhood-level fairness, embedding-level fairness, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Correction to: Complex networks are structurally distinguishable by domain
- Author
-
Rossi, Ryan A. and Ahmed, Nesreen K.
- Published
- 2020
6. FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
- Author
-
Singh, Ashish, Agarwal, Prateek, Huang, Zixuan, Singh, Arpita, Yu, Tong, Kim, Sungchul, Bursztyn, Victor, Vlassis, Nikos, and Rossi, Ryan A.
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15], leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF, a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs, and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem. (19 pages, 4 figures. Benchmark Documentation: https://figcapshf.github.io/)
- Published
- 2023
7. Fairness-Aware Graph Neural Networks: A Survey
- Author
-
Chen, April, Rossi, Ryan A., Park, Namyong, Trivedi, Puja, Wang, Yu, Yu, Tong, Kim, Sungchul, Dernoncourt, Franck, and Ahmed, Nesreen K.
- Subjects
Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Social and Information Networks, Information Retrieval (cs.IR), Machine Learning (cs.LG), Computer Science - Information Retrieval
- Abstract
Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. Previous work on fair GNN models and techniques are discussed in terms of whether they focus on improving fairness during a preprocessing step, during training, or in a post-processing phase. Furthermore, we discuss how such techniques can be used together whenever appropriate, and highlight the advantages and intuition as well. We also introduce an intuitive taxonomy for fairness evaluation metrics including graph-level fairness, neighborhood-level fairness, embedding-level fairness, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.
- Published
- 2023
8. Learning the Visualness of Text Using Large Vision-Language Models
- Author
-
Verma, Gaurav, Rossi, Ryan A., Tensmeyer, Christopher, Gu, Jiuxiang, and Nenkova, Ani
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
Visual text evokes an image in a person's mind, while non-visual text fails to do so. A method to automatically detect visualness in text will unlock the ability to augment text with relevant images, as neural text-to-image generation and retrieval models operate on the implicit assumption that the input text is visual in nature. We curate a dataset of 3,620 English sentences and their visualness scores provided by multiple human annotators. Additionally, we use documents that contain text and visual assets to create a distantly supervised corpus of document text and associated images. We also propose a fine-tuning strategy that adapts large vision-language models like CLIP that assume a one-to-one correspondence between text and image to the task of scoring text visualness from text input alone. Our strategy involves modifying the model's contrastive learning objective to map text identified as non-visual to a common NULL image while matching visual text to their corresponding images in the document. We evaluate the proposed approach on its ability to (i) classify visual and non-visual text accurately, and (ii) attend over words that are identified as visual in psycholinguistic studies. Empirical evaluation indicates that our approach performs better than several heuristics and baseline models for the proposed task. Furthermore, to highlight the importance of modeling the visualness of text, we conduct qualitative analyses of text-to-image generation systems like DALL-E. (9 pages of main text, 5 pages of appendix, 9 figures, and 9 tables)
- Published
- 2023
9. Complex networks are structurally distinguishable by domain
- Author
-
Rossi, Ryan A. and Ahmed, Nesreen K.
- Published
- 2019
10. Graphlet decomposition: framework, algorithms, and applications
- Author
-
Ahmed, Nesreen K., Neville, Jennifer, Rossi, Ryan A., Duffield, Nick G., and Willke, Theodore L.
- Published
- 2017
11. PersonaSAGE: A Multi-Persona Graph Neural Network
- Author
-
Choudhary, Gautam, Burhanuddin, Iftikhar Ahamath, Koh, Eunyee, Du, Fan, and Rossi, Ryan A.
- Subjects
Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Social and Information Networks, Machine Learning (cs.LG)
- Abstract
Graph Neural Networks (GNNs) have become increasingly important in recent years due to their state-of-the-art performance on many important downstream applications. Existing GNNs have mostly focused on learning a single node representation, even though a node often exhibits polysemous behavior in different contexts. In this work, we develop a persona-based graph neural network framework called PersonaSAGE that learns multiple persona-based embeddings for each node in the graph. Such disentangled representations are more interpretable and useful than a single embedding. Furthermore, PersonaSAGE learns the appropriate set of persona embeddings for each node in the graph, and every node can have a different number of assigned persona embeddings. The framework is flexible enough and the general design helps in the wide applicability of the learned embeddings to suit the domain. We utilize publicly available benchmark datasets to evaluate our approach against a variety of baselines. The experiments demonstrate the effectiveness of PersonaSAGE for a variety of important tasks including link prediction where we achieve an average gain of 15% while remaining competitive for node classification. Finally, we also demonstrate the utility of PersonaSAGE with a case study for personalized recommendation of different entity types in a data management platform. (10 pages, 6 figures, 7 tables)
- Published
- 2022
12. A Hypergraph Neural Network Framework for Learning Hyperedge-Dependent Node Embeddings
- Author
-
Aponte, Ryan, Rossi, Ryan A., Guo, Shunan, Hoffswell, Jane, Lipka, Nedim, Xiao, Chang, Chan, Gromit, Koh, Eunyee, and Ahmed, Nesreen
- Subjects
Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Machine Learning, Discrete Mathematics (cs.DM), Computer Science - Social and Information Networks, Machine Learning (cs.LG), Computer Science - Discrete Mathematics
- Abstract
In this work, we introduce a hypergraph representation learning framework called Hypergraph Neural Networks (HNN) that jointly learns hyperedge embeddings along with a set of hyperedge-dependent embeddings for each node in the hypergraph. HNN derives multiple embeddings per node in the hypergraph where each embedding for a node is dependent on a specific hyperedge of that node. Notably, HNN is accurate, data-efficient, flexible with many interchangeable components, and useful for a wide range of hypergraph learning tasks. We evaluate the effectiveness of the HNN framework for hyperedge prediction and hypergraph node classification. We find that HNN achieves an overall mean gain of 7.72% and 11.37% across all baseline models and graphs for hyperedge prediction and hypergraph node classification, respectively.
- Published
- 2022
13. GraphZIP: a clique-based sparse graph compression method
- Author
-
Rossi, Ryan A. and Zhou, Rong
- Published
- 2018
14. Parallel collective factorization for modeling large heterogeneous networks
- Author
-
Rossi, Ryan A. and Zhou, Rong
- Published
- 2016
15. Network Report: A Structured Description for Network Datasets
- Author
-
Zheng, Xinyi, Rossi, Ryan A., Ahmed, Nesreen, and Moritz, Dominik
- Subjects
Social and Information Networks (cs.SI), FOS: Computer and information sciences, Computer Science - Computers and Society, Computer Science - Machine Learning, Computers and Society (cs.CY), Computer Science - Human-Computer Interaction, Computer Science - Social and Information Networks, Human-Computer Interaction (cs.HC), Machine Learning (cs.LG)
- Abstract
The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers only share links, while others provide some contexts or basic statistics. As a result, critical information may be unintentionally dropped, and network dataset consumers may misunderstand or overlook critical aspects. Inappropriately using a network dataset can lead to severe consequences (e.g., discrimination) especially when machine learning models on networks are deployed in high-stake domains. Challenges arise as networks are often used across different domains (e.g., network science, physics, etc) and have complex structures. To facilitate the communication between network dataset providers and consumers, we propose network report. A network report is a structured description that summarizes and contextualizes a network dataset. Network report extends the idea of dataset reports (e.g., Datasheets for Datasets) from prior work with network-specific descriptions of the non-i.i.d. nature, demographic information, network characteristics, etc. We hope network reports encourage transparency and accountability in network research and development across different fields.
- Published
- 2022
16. Neural Point Process for Learning Spatiotemporal Event Dynamics
- Author
-
Zhou, Zihao, Yang, Xingyi, Rossi, Ryan, Zhao, Handong, and Yu, Rose
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Machine Learning (cs.LG)
- Abstract
Learning the dynamics of spatiotemporal events is a fundamental problem. Neural point processes enhance the expressivity of point process models with deep neural networks. However, most existing methods only consider temporal dynamics without spatial modeling. We propose Deep Spatiotemporal Point Process (DeepSTPP), a deep dynamics model that integrates spatiotemporal point processes. Our method is flexible, efficient, and can accurately forecast irregularly sampled events over space and time. The key construction of our approach is the nonparametric space-time intensity function, governed by a latent process. The intensity function enjoys closed form integration for the density. The latent process captures the uncertainty of the event sequence. We use amortized variational inference to infer the latent process with deep networks. Using synthetic datasets, we validate that our model can accurately learn the true intensity function. On real-world benchmark datasets, our model demonstrates superior performance over state-of-the-art baselines. Our code and data can be found at https://github.com/Rose-STL-Lab/DeepSTPP.
- Published
- 2021
17. Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes
- Author
-
Reddy, Aravind, Rossi, Ryan A., Song, Zhao, Rao, Anup, Mai, Tung, Lipka, Nedim, Wu, Gang, Koh, Eunyee, and Ahmed, Nesreen
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Statistics - Machine Learning, Computer Science - Data Structures and Algorithms, Data Structures and Algorithms (cs.DS), Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs) where data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory. The online setting has an additional requirement of maintaining a valid solution at any point in time. For solving these new problems, we propose algorithms with theoretical guarantees, evaluate them on several real-world datasets, and show that they give comparable performance to state-of-the-art offline algorithms that store the entire data in memory and take multiple passes over it.
- Published
- 2021
18. Coloring large complex networks
- Author
-
Rossi, Ryan A. and Ahmed, Nesreen K.
- Published
- 2014
19. Graph Deep Factors for Probabilistic Time-series Forecasting.
- Author
-
Chen, Hongjie, Rossi, Ryan A., Mahadik, Kanak, Kim, Sungchul, and Eldardiry, Hoda
- Subjects
MACHINE learning, FORECASTING, GRAPH connectivity, ONLINE education, GLOBAL method of teaching
- Abstract
Effective time-series forecasting methods are of significant importance to solve a broad spectrum of research problems. Deep probabilistic forecasting techniques have recently been proposed for modeling large collections of time-series. However, these techniques explicitly assume either complete independence (local model) or complete dependence (global model) between time-series in the collection. This corresponds to the two extreme cases where every time-series is disconnected from every other time-series in the collection or likewise, that every time-series is related to every other time-series resulting in a completely connected graph. In this work, we propose a deep hybrid probabilistic graph-based forecasting framework called Graph Deep Factors (GraphDF) that goes beyond these two extremes by allowing nodes and their time-series to be connected to others in an arbitrary fashion. GraphDF is a hybrid forecasting framework that consists of a relational global and relational local model. In particular, a relational global model learns complex non-linear time-series patterns globally using the structure of the graph to improve both forecasting accuracy and computational efficiency. Similarly, instead of modeling every time-series independently, a relational local model not only considers its individual time-series but also the time-series of nodes that are connected in the graph. The experiments demonstrate the effectiveness of the proposed deep hybrid graph-based forecasting model compared to the state-of-the-art methods in terms of its forecasting accuracy, runtime, and scalability. Our case study reveals that GraphDF can successfully generate cloud usage forecasts and opportunistically schedule workloads to increase cloud cluster utilization by 47.5% on average. 
Furthermore, we target addressing the common nature of many time-series forecasting applications where time-series are provided in a streaming version; however, most methods fail to leverage the newly incoming time-series values and result in worse performance over time. In this article, we propose an online incremental learning framework for probabilistic forecasting. The framework is theoretically proven to have lower time and space complexity. The framework can be universally applied to many other machine learning-based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
20. Insight-centric Visualization Recommendation
- Author
-
Harris, Camille, Rossi, Ryan A., Malik, Sana, Hoffswell, Jane, Du, Fan, Lee, Tak Yeon, Koh, Eunyee, and Zhao, Handong
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Information Retrieval (cs.IR), Human-Computer Interaction (cs.HC), Machine Learning (cs.LG), Computer Science - Information Retrieval
- Abstract
Visualization recommendation systems simplify exploratory data analysis (EDA) and make understanding data more accessible to users of all skill levels by automatically generating visualizations for users to explore. However, most existing visualization recommendation systems focus on ranking all visualizations into a single list or set of groups based on particular attributes or encodings. This global ranking makes it difficult and time-consuming for users to find the most interesting or relevant insights. To address these limitations, we introduce a novel class of visualization recommendation systems that automatically rank and recommend both groups of related insights as well as the most important insights within each group. Our proposed approach combines results from many different learning-based methods to discover insights automatically. A key advantage is that this approach generalizes to a wide variety of attribute types such as categorical, numerical, and temporal, as well as complex non-trivial combinations of these different attribute types. To evaluate the effectiveness of our approach, we implemented a new insight-centric visualization recommendation system, SpotLight, which generates and ranks annotated visualizations to explain each insight. We conducted a user study with 12 participants and two datasets which showed that users are able to quickly understand and find relevant insights in unfamiliar data.
- Published
- 2021
21. Personalized Visualization Recommendation.
- Author
-
Qian, Xin, Rossi, Ryan A., Du, Fan, Kim, Sungchul, Koh, Eunyee, Malik, Sana, Lee, Tak Yeon, and Ahmed, Nesreen K.
- Subjects
VISUALIZATION, PSYCHOLOGICAL feedback, RECOMMENDER systems
- Abstract
Visualization recommendation work has focused solely on scoring visualizations based on the underlying dataset, and not the actual user and their past visualization feedback. These systems recommend the same visualizations for every user, despite that the underlying user interests, intent, and visualization preferences are likely to be fundamentally different, yet vitally important. In this work, we formally introduce the problem of personalized visualization recommendation and present a generic learning framework for solving it. In particular, we focus on recommending visualizations personalized for each individual user based on their past visualization interactions (e.g., viewed, clicked, manually created) along with the data from those visualizations. More importantly, the framework can learn from visualizations relevant to other users, even if the visualizations are generated from completely different datasets. Experiments demonstrate the effectiveness of the approach as it leads to higher quality visualization recommendations tailored to the specific user intent and preferences. To support research on this new problem, we release our user-centric visualization corpus consisting of 17.4k users exploring 94k datasets with 2.3 million attributes and 32k user-generated visualizations. [ABSTRACT FROM AUTHOR]
- Published
- 2022
22. Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation
- Author
-
Raman, Mrigank, Chan, Aaron, Agarwal, Siddhant, Wang, Peifeng, Wang, Hansen, Kim, Sungchul, Rossi, Ryan, Zhao, Handong, Lipka, Nedim, and Ren, Xiang
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Machine Learning (cs.LG)
- Abstract
Knowledge graphs (KGs) have helped neural models improve performance on various knowledge-intensive tasks, like question answering and item recommendation. By using attention over the KG, such KG-augmented models can also "explain" which KG information was most relevant for making a given prediction. In this paper, we question whether these models are really behaving as we expect. We show that, through a reinforcement learning policy (or even simple heuristics), one can produce deceptively perturbed KGs, which maintain the downstream performance of the original KG while significantly deviating from the original KG's semantics and structure. Our findings raise doubts about KG-augmented models' ability to reason about KG information and give sensible explanations. (13 pages, 11 figures)
- Published
- 2020
23. Automating Outlier Detection via Meta-Learning
- Author
-
Zhao, Yue, Rossi, Ryan A., and Akoglu, Leman
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Information Retrieval (cs.IR), Machine Learning (cs.LG), Computer Science - Information Retrieval
- Abstract
Given an unsupervised outlier detection (OD) task on a new dataset, how can we automatically select a good outlier detection method and its hyperparameter(s) (collectively called a model)? Thus far, model selection for OD has been a "black art," as any model evaluation is infeasible due to the lack of (i) hold-out data with labels, and (ii) a universal objective function. In this work, we develop the first principled data-driven approach to model selection for OD, called MetaOD, based on meta-learning. MetaOD capitalizes on the past performances of a large body of detection models on existing outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without using any labels. To capture task similarity, we introduce specialized meta-features that quantify outlying characteristics of a dataset. Through comprehensive experiments, we show the effectiveness of MetaOD in selecting a detection model that significantly outperforms the most popular outlier detectors (e.g., LOF and iForest) as well as various state-of-the-art unsupervised meta-learners while being extremely fast. To foster reproducibility and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets. (21 pages. The code is available at http://github.com/yzhao062/MetaOD)
- Published
- 2020
24. From Static to Dynamic Node Embeddings
- Author
-
Jin, Di, Kim, Sungchul, Rossi, Ryan A., and Koutra, Danai
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) - Abstract
We introduce a general framework for leveraging graph stream data for temporal prediction-based applications. Our proposed framework includes novel methods for learning an appropriate graph time-series representation, modeling and weighting the temporal dependencies, and generalizing existing embedding methods for such data. While previous work on dynamic modeling and embedding has focused on representing a stream of timestamped edges using a time-series of graphs based on a specific time-scale (e.g., 1 month), we propose the notion of an $\epsilon$-graph time-series that uses a fixed number of edges for each graph, and show its superiority over the time-scale representation used in previous work. In addition, we propose a number of new temporal models based on the notion of temporal reachability graphs and weighted temporal summary graphs. These temporal models are then used to generalize existing base (static) embedding methods by enabling them to incorporate and appropriately model temporal dependencies in the data. From the 6 temporal network models investigated (for each of the 7 base embedding methods), we find that the top-3 temporal models are always those that leverage the new $\epsilon$-graph time-series representation. Furthermore, the dynamic embedding methods from the framework almost always achieve better predictive performance than existing state-of-the-art dynamic node embedding methods that are developed specifically for such temporal prediction tasks. Finally, the findings of this work are useful for designing better dynamic embedding methods.
- Published
- 2020
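The contrast drawn in the abstract of entry 24, between a time-scale representation and an $\epsilon$-graph time-series with a fixed number of edges per graph, can be illustrated with a minimal sketch. The function names, the `(source, target, timestamp)` edge format, and the toy stream below are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Tuple

# Assumed edge format for this sketch: (source, target, timestamp).
Edge = Tuple[str, str, float]


def epsilon_graph_series(edges: List[Edge], epsilon: int) -> List[List[Edge]]:
    """Split a timestamped edge stream into consecutive graphs of `epsilon` edges each.

    The last graph may hold fewer than `epsilon` edges.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    ordered = sorted(edges, key=lambda e: e[2])
    return [ordered[i:i + epsilon] for i in range(0, len(ordered), epsilon)]


def time_scale_series(edges: List[Edge], window: float) -> List[List[Edge]]:
    """Split the same stream into graphs covering fixed-length time windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not edges:
        return []
    ordered = sorted(edges, key=lambda e: e[2])
    start = ordered[0][2]
    n_bins = int((ordered[-1][2] - start) // window) + 1
    bins: List[List[Edge]] = [[] for _ in range(n_bins)]
    for edge in ordered:
        bins[int((edge[2] - start) // window)].append(edge)
    return bins


# A bursty toy stream: three edges early, one in the middle, two late.
stream = [
    ("a", "b", 0.0), ("b", "c", 0.1), ("c", "d", 0.2),
    ("a", "d", 5.0), ("b", "d", 9.0), ("a", "c", 9.5),
]
print([len(g) for g in epsilon_graph_series(stream, 2)])   # [2, 2, 2]
print([len(g) for g in time_scale_series(stream, 3.0)])    # [3, 1, 0, 2]
```

On bursty streams the fixed-window split yields graphs of very uneven size, including empty ones, whereas the fixed-edge-count split keeps every graph equally populated.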
25. Inferring Individual Level Causal Models from Graph-based Relational Time Series
- Author
-
Rossi, Ryan, Sarkhel, Somdeb, and Ahmed, Nesreen
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) - Abstract
In this work, we formalize the problem of causal inference over graph-based relational time-series data where each node in the graph has one or more time-series associated to it. We propose causal inference models for this problem that leverage both the graph topology and time-series to accurately estimate local causal effects of nodes. Furthermore, the relational time-series causal inference models are able to estimate local effects for individual nodes by exploiting local node-centric temporal dependencies and topological/structural dependencies. We show that simpler causal models that do not consider the graph topology are recovered as special cases of the proposed relational time-series causal inference model. We describe the conditions under which the resulting estimate can be used to estimate a causal effect, and describe how the Durbin-Wu-Hausman test of specification can be used to test for the consistency of the proposed estimator from data. Empirically, we demonstrate the effectiveness of the causal inference models on both synthetic data with known ground-truth and a large-scale observational relational time-series data set collected from Wikipedia.
- Published
- 2020
26. Role-Based Graph Embeddings.
- Author
-
Ahmed, Nesreen K., Rossi, Ryan A., Lee, John Boaz, Willke, Theodore L., Zhou, Rong, Kong, Xiangnan, and Eldardiry, Hoda
- Subjects
- *
RANDOM walks , *TASK analysis - Abstract
Random walks are at the heart of many existing node embedding and network representation learning methods. However, such methods have many limitations that arise from the use of traditional random walks, e.g., the embeddings resulting from these methods capture proximity (communities) among the vertices as opposed to structural similarity (roles). Furthermore, the embeddings are unable to transfer to new nodes and graphs as they are tied to node identity. To overcome these limitations, we introduce the Role2Vec framework based on the proposed notion of attributed random walks to learn structural role-based embeddings. Notably, the framework serves as a basis for generalizing any walk-based method. The Role2Vec framework enables these methods to be more widely applicable by learning inductive functions that capture the structural roles in the graph. Furthermore, the original methods are recovered as a special case of the framework when each vertex is mapped to its own function that uniquely identifies it. Finally, the Role2Vec framework is shown to be effective with an average AUC improvement of 17.8 percent for link prediction while requiring on average 853x less space than existing methods on a variety of graphs from different domains. [ABSTRACT FROM AUTHOR]
- Published
- 2022
27. An Automated Approach to Reasoning About Task-Oriented Insights in Responsive Visualization.
- Author
-
Kim, Hyeok, Rossi, Ryan, Sarma, Abhraneel, Moritz, Dominik, and Hullman, Jessica
- Subjects
VISUALIZATION ,RANDOM forest algorithms ,MACHINE learning - Abstract
Authors often transform a large screen visualization for smaller displays through rescaling, aggregation and other techniques when creating visualizations for both desktop and mobile devices (i.e., responsive visualization). However, transformations can alter relationships or patterns implied by the large screen view, requiring authors to reason carefully about what information to preserve while adjusting their design for the smaller display. We propose an automated approach to approximating the loss of support for task-oriented visualization insights (identification, comparison, and trend) in responsive transformation of a source visualization. We operationalize identification, comparison, and trend loss as objective functions calculated by comparing properties of the rendered source visualization to each realized target (small screen) visualization. To evaluate the utility of our approach, we train machine learning models on human ranked small screen alternative visualizations across a set of source visualizations. We find that our approach achieves an accuracy of 84% (random forest model) in ranking visualizations. We demonstrate this approach in a prototype responsive visualization recommender that enumerates responsive transformations using Answer Set Programming and evaluates the preservation of task-oriented insights using our loss measures. We discuss implications of our approach for the development of automated and semi-automated responsive visualization recommendation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
28. Polyphony : a workflow orchestration framework for Cloud Computing
- Author
-
Shams, Khawaja S, Powell, Dr. Mark W, Crockett, Tom M, Norris, Dr. Jeffrey S, Rossi, Ryan, and Soderstrom, Tom
- Published
- 2010
29. Linear-time Hierarchical Community Detection
- Author
-
Rossi, Ryan A., Ahmed, Nesreen K., Koh, Eunyee, and Kim, Sungchul
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Computer Science - Social and Information Networks ,Distributed, Parallel, and Cluster Computing (cs.DC) - Abstract
Community detection in graphs has many important and fundamental applications including in distributed systems, compression, image segmentation, divide-and-conquer graph algorithms such as nested dissection, document and word clustering, circuit design, among many others. Finding these densely connected regions of graphs remains an important and challenging problem. Most work has focused on scaling up existing methods to handle large graphs. These methods often partition the graph into two or more communities. In this work, we focus on the problem of hierarchical community detection (i.e., finding a hierarchy of dense community structures going from the lowest granularity to the largest) and describe an approach that runs in linear time with respect to the number of edges and is thus fast and efficient for large-scale networks. The experiments demonstrate the effectiveness of the approach quantitatively. Finally, we show an application of it for visualizing large networks with hundreds of thousands of nodes/links.
- Published
- 2019
30. Higher-Order Ranking and Link Prediction: From Closing Triangles to Closing Higher-Order Motifs
- Author
-
Rossi, Ryan A., Rao, Anup, Kim, Sungchul, Koh, Eunyee, Ahmed, Nesreen K., and Wu, Gang
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Information Retrieval (cs.IR) ,Machine Learning (cs.LG) ,Computer Science - Information Retrieval - Abstract
In this paper, we introduce the notion of motif closure and describe higher-order ranking and link prediction methods based on the notion of closing higher-order network motifs. The methods are fast and efficient for real-time ranking and link prediction-based applications such as web search, online advertising, and recommendation. In such applications, real-time performance is critical. The proposed methods do not require any explicit training data, nor do they derive an embedding from the graph data, or perform any explicit learning. Existing methods with the above desired properties are all based on closing triangles (common neighbors, Jaccard similarity, and the ilk). In this work, we investigate higher-order network motifs and develop techniques based on the notion of closing higher-order motifs that move beyond closing simple triangles. All methods described in this work are fast with a runtime that is sublinear in the number of nodes. The experimental results indicate the importance of closing higher-order motifs for ranking and link prediction applications. Finally, the proposed notion of higher-order motif closure can serve as a basis for studying and developing better ranking and link prediction methods.
- Published
- 2019
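The abstract of entry 30 names common neighbors and Jaccard similarity as the existing triangle-closing baselines that it moves beyond. A minimal sketch of those two baseline scores follows; it does not implement the paper's higher-order motif closure, and the helper names and toy graph are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from an edge list."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj: Graph, u: Hashable, v: Hashable) -> int:
    """Number of triangles that adding the edge (u, v) would close."""
    return len(adj[u] & adj[v])


def jaccard(adj: Graph, u: Hashable, v: Hashable) -> float:
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")])
print(common_neighbors(adj, "a", "d"))  # 2
print(jaccard(adj, "a", "d"))           # 1.0
print(jaccard(adj, "a", "b"))           # 0.25
```

Both scores need only the neighbor sets of the two endpoints, which matches the abstract's point that such methods require no training data, embedding, or explicit learning.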
31. Figure Captioning with Reasoning and Sequence-Level Training
- Author
-
Chen, Charles, Zhang, Ruiyi, Koh, Eunyee, Kim, Sungchul, Cohen, Scott, Yu, Tong, Rossi, Ryan, and Bunescu, Razvan
- Subjects
FOS: Computer and information sciences ,Computer Science - Computation and Language ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,Computation and Language (cs.CL) - Abstract
Figures, such as bar charts, pie charts, and line plots, are widely used to convey important information in a concise format. They are usually human-friendly but difficult for computers to process automatically. In this work, we investigate the problem of figure captioning where the goal is to automatically generate a natural language description of the figure. While natural image captioning has been studied extensively, figure captioning has received relatively little attention and remains a challenging problem. First, we introduce a new dataset for figure captioning, FigCAP, based on FigureQA. Second, we propose two novel attention mechanisms. To achieve accurate generation of labels in figures, we propose Label Maps Attention. To model the relations between figure labels, we propose Relation Maps Attention. Third, we use sequence-level training with reinforcement learning in order to directly optimize evaluation metrics, which alleviates the exposure bias issue and further improves the models in generating long captions. Extensive experiments show that the proposed method outperforms the baselines, thus demonstrating a significant potential for the automatic captioning of vast repositories of figures.
- Published
- 2019
32. Heterogeneous Network Motifs
- Author
-
Rossi, Ryan A., Ahmed, Nesreen K., Carranza, Aldo, Arbour, David, Rao, Anup, Kim, Sungchul, and Koh, Eunyee
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Machine Learning ,Discrete Mathematics (cs.DM) ,Quantitative Biology::Molecular Networks ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) ,Computer Science - Discrete Mathematics - Abstract
Many real-world applications give rise to large heterogeneous networks where nodes and edges can be of any arbitrary type (e.g., user, web page, location). Special cases of such heterogeneous graphs include homogeneous graphs, bipartite, k-partite, signed, labeled graphs, among many others. In this work, we generalize the notion of network motifs to heterogeneous networks. In particular, small induced typed subgraphs called typed graphlets (heterogeneous network motifs) are introduced and shown to be the fundamental building blocks of complex heterogeneous networks. Typed graphlets are a powerful generalization of the notion of graphlet (network motif) to heterogeneous networks as they capture both the induced subgraph of interest and the types associated with the nodes in the induced subgraph. To address this problem, we propose a fast, parallel, and space-efficient framework for counting typed graphlets in large networks. We discover the existence of non-trivial combinatorial relationships between lower-order ($k-1$)-node typed graphlets and leverage them for deriving many of the $k$-node typed graphlets in $O(1)$ constant time. Thus, we avoid explicit enumeration of those typed graphlets. Notably, the time complexity matches the best untyped graphlet counting algorithm. The experiments demonstrate the effectiveness of the proposed framework in terms of runtime, space-efficiency, parallel speedup, and scalability as it is able to handle large-scale networks.
- Published
- 2019
33. Predicting Graph Categories from Structural Properties
- Author
-
Canning, James P., Ingram, Emma E., Nowak-Wolff, Sammantha, Ortiz, Adriana M., Ahmed, Nesreen K., Rossi, Ryan A., Schmitt, Karl R. B., and Soundarajan, Sucheta
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks - Abstract
This paper has been withdrawn from arXiv.org due to a disagreement among the authors related to several peer-review comments received prior to submission on arXiv.org. Even though the current version of this paper is withdrawn, there was no disagreement between authors on the novel work in this paper. One specific issue was the discussion of related work by Ikehara & Clauset (found on page 8 of the previously posted version). Peer-review comments on a similar version made ALL authors aware that the discussion misrepresented their work prior to submission to arXiv.org. However, some authors chose to post to arXiv a minimally updated version without the consent of all authors or properly addressing this attribution issue. ================ Original Paper Abstract: Complex networks are often categorized according to the underlying phenomena that they represent such as molecular interactions, re-tweets, and brain activity. In this work, we investigate the problem of predicting the category (domain) of arbitrary networks. This includes complex networks from different domains as well as synthetically generated graphs from five different network models. A classification accuracy of $96.6\%$ is achieved using a random forest classifier with both real and synthetic networks. This work makes two important findings. First, our results indicate that complex networks from various domains have distinct structural properties that allow us to predict with high accuracy the category of a new previously unseen network. Second, synthetic graphs are trivial to classify as the classification model can predict with near-certainty the network model used to generate it. Overall, the results demonstrate that networks drawn from different domains (and network models) are trivial to distinguish using only a handful of simple structural properties. This submission has been withdrawn by one of the authors due to an unresolved conflict between the authors.
This version of the article did not receive consent for posting to arXiv.org from authors: Karl R. B. Schmitt, Sucheta Soundarajan, James P. Canning, Emma E. Ingram, Sammantha Nowak-Wolff, Adriana M. Ortiz
- Published
- 2018
34. Online Sampling of Temporal Networks.
- Author
-
AHMED, NESREEN K., DUFFIELD, NICK, and ROSSI, RYAN A.
- Subjects
ALGORITHMS ,TIME-varying networks ,PREDICTION models - Abstract
Temporal networks representing a stream of timestamped edges are seemingly ubiquitous in the real world. However, the massive size and continuous nature of these networks make them fundamentally challenging to analyze and leverage for descriptive and predictive modeling tasks. In this work, we propose a general framework for temporal network sampling with unbiased estimation. We develop online, single-pass sampling algorithms, and unbiased estimators for temporal network sampling. The proposed algorithms enable fast, accurate, and memory-efficient statistical estimation of temporal network patterns and properties. In addition, we propose a temporally decaying sampling algorithm with unbiased estimators for studying networks that evolve in continuous time, where the strength of links is a function of time, and the motif patterns are temporally weighted. In contrast to the prior notion of a Δt-temporal motif, the proposed formulation and algorithms for counting temporally weighted motifs are useful for forecasting tasks in networks such as predicting future links, or a future time-series variable of nodes and links. Finally, extensive experiments on a variety of temporal networks from different domains demonstrate the effectiveness of the proposed algorithms. A detailed ablation study is provided to understand the impact of the various components of the proposed framework. [ABSTRACT FROM AUTHOR]
- Published
- 2021
35. Heterogeneous Graphlets.
- Author
-
ROSSI, RYAN A., AHMED, NESREEN K., CARRANZA, ALDO, ARBOUR, DAVID, RAO, ANUP, SUNGCHUL KIM, and EUNYEE KOH
- Subjects
MAGNITUDE (Mathematics) ,ALGORITHMS - Abstract
In this article, we introduce a generalization of graphlets to heterogeneous networks called typed graphlets. Informally, typed graphlets are small typed induced subgraphs. Typed graphlets generalize graphlets to rich heterogeneous networks as they explicitly capture the higher-order typed connectivity patterns in such networks. To address this problem, we describe a general framework for counting the occurrences of such typed graphlets. The proposed algorithms leverage a number of combinatorial relationships for different typed graphlets. For each edge, we count a few typed graphlets, and with these counts along with the combinatorial relationships, we obtain the exact counts of the other typed graphlets in O(1) constant time. Notably, the worst-case time complexity of the proposed approach matches the time complexity of the best known untyped algorithm. In addition, the approach lends itself to an efficient lock-free and asynchronous parallel implementation. While there are no existing methods for typed graphlets, there has been some work that focused on computing a different and much simpler notion called colored graphlet. The experiments confirm that our proposed approach is orders of magnitude faster and more space-efficient than methods for computing the simpler notion of colored graphlet. Unlike these methods that take hours on small networks, the proposed approach takes only seconds on large networks with millions of edges. Notably, since typed graphlet is more general than colored graphlet (and untyped graphlets), the counts of various typed graphlets can be combined to obtain the counts of the much simpler notion of colored graphlets. The proposed methods give rise to new opportunities and applications for typed graphlets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
36. Inductive Representation Learning in Large Attributed Graphs
- Author
-
Ahmed, Nesreen K., Rossi, Ryan A., Zhou, Rong, Lee, John Boaz, Kong, Xiangnan, Willke, Theodore L., and Eldardiry, Hoda
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Learning ,Artificial Intelligence (cs.AI) ,Statistics - Machine Learning ,Computer Science - Artificial Intelligence ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) - Abstract
Graphs (networks) are ubiquitous and allow us to model entities (nodes) and the dependencies (edges) between them. Learning a useful feature representation from graph data lies at the heart and success of many machine learning tasks such as classification, anomaly detection, link prediction, among many others. Many existing techniques use random walks as a basis for learning features or estimating the parameters of a graph model for a downstream prediction task. Examples include recent node embedding methods such as DeepWalk, node2vec, as well as graph-based deep learning algorithms. However, the simple random walk used by these methods is fundamentally tied to the identity of the node. This has three main disadvantages. First, these approaches are inherently transductive and do not generalize to unseen nodes and other graphs. Second, they are not space-efficient as a feature vector is learned for each node which is impractical for large graphs. Third, most of these approaches lack support for attributed graphs. To make these methods more generally applicable, we propose a framework for inductive network representation learning based on the notion of attributed random walk that is not tied to node identity and is instead based on learning a function $\Phi : \mathbf{x} \rightarrow w$ that maps a node attribute vector $\mathbf{x}$ to a type $w$. This framework serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many other previous methods that leverage traditional random walks. Comment: NIPS WiML
- Published
- 2017
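The attributed random walk described in the abstract of entry 36 replaces node identities in a walk with types $w = \Phi(\mathbf{x})$ computed from node attribute vectors. The sketch below is one possible reading of that idea under stated assumptions: the paper learns $\Phi$, whereas the `phi` here is a fixed, hypothetical stand-in, and the function name, graph, and attributes are invented for illustration.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Attributes = Tuple[int, ...]


def attributed_random_walk(
    adj: Dict[Hashable, Sequence[Hashable]],
    attrs: Dict[Hashable, Attributes],
    phi: Callable[[Attributes], str],
    start: Hashable,
    length: int,
    rng: random.Random,
) -> List[str]:
    """Uniform random walk that records phi(attrs[node]) instead of the node id."""
    node = start
    walk = [phi(attrs[node])]
    for _ in range(length - 1):
        neighbors = adj[node]
        if not neighbors:
            break  # dead end: stop the walk early
        node = rng.choice(neighbors)
        walk.append(phi(attrs[node]))
    return walk


# Toy bipartite graph: u-nodes carry attribute (0,), v-nodes carry (1,).
adj = {"u1": ["v1", "v2"], "u2": ["v1", "v2"], "v1": ["u1", "u2"], "v2": ["u1", "u2"]}
attrs = {"u1": (0,), "u2": (0,), "v1": (1,), "v2": (1,)}


def phi(x: Attributes) -> str:
    """Hypothetical stand-in for the learned function: name the type after the attribute."""
    return f"type{x[0]}"


walk_a = attributed_random_walk(adj, attrs, phi, "u1", 5, random.Random(0))
walk_b = attributed_random_walk(adj, attrs, phi, "u2", 5, random.Random(1))
print(walk_a)            # ['type0', 'type1', 'type0', 'type1', 'type0']
print(walk_a == walk_b)  # True
```

Two walks that start at different nodes and visit different node ids yield the same type sequence, which is the sense in which the walk is not tied to node identity and could carry over to unseen nodes with the same attributes.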
37. Deep Graph Attention Model
- Author
-
Lee, John Boaz, Rossi, Ryan, and Kong, Xiangnan
- Subjects
FOS: Computer and information sciences ,Computer Science - Learning ,Artificial Intelligence (cs.AI) ,Computer Science - Artificial Intelligence ,Machine Learning (cs.LG) ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
Graph classification is a problem with practical applications in many different domains. Most of the existing methods take the entire graph into account when calculating graph features. In a graphlet-based approach, for instance, the entire graph is processed to get the total count of different graphlets or sub-graphs. In the real-world, however, graphs can be both large and noisy with discriminative patterns confined to certain regions in the graph only. In this work, we study the problem of attentional processing for graph classification. The use of attention allows us to focus on small but informative parts of the graph, avoiding noise in the rest of the graph. We present a novel RNN model, called the Graph Attention Model (GAM), that processes only a portion of the graph by adaptively selecting a sequence of "interesting" nodes. The model is equipped with an external memory component which allows it to integrate information gathered from different parts of the graph. We demonstrate the effectiveness of the model through various experiments.
- Published
- 2017
38. Network Classification and Categorization
- Author
-
Canning, James P., Ingram, Emma E., Nowak-Wolff, Sammantha, Ortiz, Adriana M., Ahmed, Nesreen K., Rossi, Ryan A., Schmitt, Karl R. B., and Soundarajan, Sucheta
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Statistics - Machine Learning ,Digital Libraries (cs.DL) ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Computer Science - Digital Libraries - Abstract
To the best of our knowledge, this paper presents the first large-scale study that tests whether network categories (e.g., social networks vs. web graphs) are distinguishable from one another (using both categories of real-world networks and synthetic graphs). A classification accuracy of $94.2\%$ was achieved using a random forest classifier with both real and synthetic networks. This work makes two important findings. First, real-world networks from various domains have distinct structural properties that allow us to predict with high accuracy the category of an arbitrary network. Second, classifying synthetic networks is trivial as our models can easily distinguish between synthetic graphs and the real-world networks they are supposed to model.
- Published
- 2017
39. A Framework for Generalizing Graph-based Representation Learning Methods
- Author
-
Ahmed, Nesreen K., Rossi, Ryan A., Zhou, Rong, Lee, John Boaz, Kong, Xiangnan, Willke, Theodore L., and Eldardiry, Hoda
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Learning ,Artificial Intelligence (cs.AI) ,Statistics - Machine Learning ,Computer Science - Artificial Intelligence ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) - Abstract
Random walks are at the heart of many existing deep learning algorithms for graph data. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to node identity. In this work, we introduce the notion of attributed random walks which serves as a basis for generalizing existing methods such as DeepWalk, node2vec, and many others that leverage random walks. Our proposed framework enables these methods to be more widely applicable for both transductive and inductive learning as well as for use on graphs with attributes (if available). This is achieved by learning functions that generalize to new nodes and graphs. We show that our proposed framework is effective with an average AUC improvement of 16.1% while requiring on average 853 times less space than existing methods on a variety of graphs from several domains.
- Published
- 2017
40. Deep Feature Learning for Graphs
- Author
-
Rossi, Ryan A., Zhou, Rong, and Ahmed, Nesreen K.
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) - Abstract
This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher-order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize across-networks and are therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(|E|)$, and scalable for large networks via an efficient parallel implementation. Compared with the state-of-the-art method, DeepGL is (1) effective for across-network transfer learning tasks and attributed graph representation learning, (2) space-efficient requiring up to 6x less memory, (3) fast with up to 182x speedup in runtime performance, and (4) accurate with an average improvement of 20% or more on many learning tasks.
- Published
- 2017
41. Estimation of Graphlet Statistics
- Author
-
Rossi, Ryan A., Zhou, Rong, and Ahmed, Nesreen K.
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Distributed, Parallel, and Cluster Computing ,Statistics - Machine Learning ,FOS: Mathematics ,Mathematics - Combinatorics ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Distributed, Parallel, and Cluster Computing (cs.DC) ,Combinatorics (math.CO) - Abstract
Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms, however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. In this work, we propose an unbiased graphlet estimation framework that is (a) fast with significant speedups compared to the state-of-the-art, (b) parallel with nearly linear-speedups, (c) accurate with
- Published
- 2017
42. On Proximity and Structural Role-based Embeddings in Networks: Misconceptions, Techniques, and Applications.
- Author
-
ROSSI, RYAN A., DI JIN, SUNGCHUL KIM, AHMED, NESREEN K., KOUTRA, DANAI, and LEE, JOHN BOAZ
- Subjects
EMBEDDINGS (Mathematics) - Abstract
Structural roles define sets of structurally similar nodes that are more similar to nodes inside the set than outside, whereas communities define sets of nodes with more connections inside the set than outside. Roles based on structural similarity and communities based on proximity are fundamentally different but important complementary notions. Recently, the notion of structural roles has become increasingly important and has gained a lot of attention due to the proliferation of work on learning representations (node/edge embeddings) from graphs that preserve the notion of roles. Unfortunately, recent work has sometimes confused the notion of structural roles and communities (based on proximity) leading to misleading or incorrect claims about the capabilities of network embedding methods. As such, this article seeks to clarify the misconceptions and key differences between structural roles and communities, and formalize the general mechanisms (e.g., random walks and feature diffusion) that give rise to community- or role-based structural embeddings. We theoretically prove that embedding methods based on these mechanisms result in either community- or role-based structural embeddings. These mechanisms are typically easy to identify and can help researchers quickly determine whether a method preserves community- or role-based embeddings. Furthermore, they also serve as a basis for developing new and improved methods for community- or role-based structural embeddings. Finally, we analyze and discuss applications and data characteristics where community- or role-based embeddings are most appropriate. [ABSTRACT FROM AUTHOR]
- Published
- 2020
43. Deep Inductive Graph Representation Learning.
- Author
-
Rossi, Ryan A., Zhou, Rong, and Ahmed, Nesreen K.
- Subjects
- *
REPRESENTATIONS of graphs , *NATURAL language processing , *DEEP learning - Abstract
This paper presents a general inductive graph representation learning framework called DeepGL for learning deep node and edge features that generalize across networks. In particular, DeepGL begins by deriving a set of base features from the graph (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that naturally generalize across networks and are therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable inductive graph representations, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(|E|)$, and scalable for large networks via an efficient parallel implementation. Compared with recent methods, DeepGL is (1) effective for across-network transfer learning tasks and large (attributed) graphs, (2) space-efficient requiring up to 6x less memory, (3) fast with up to 106x speedup in runtime performance, and (4) accurate with an average improvement in AUC of 20 percent or more on many learning tasks and across a wide variety of networks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
44. Revisiting Role Discovery in Networks: From Node to Edge Roles
- Author
-
Ahmed, Nesreen K., Rossi, Ryan A., Willke, Theodore L., and Zhou, Rong
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Learning ,Statistics - Machine Learning ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Machine Learning (cs.LG) ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
Previous work in network analysis has focused on modeling the mixed-memberships of node roles in the graph, but not the roles of edges. We introduce the edge role discovery problem and present a generalizable framework for learning and extracting edge roles from arbitrary graphs automatically. Furthermore, while existing node-centric role models have mainly focused on simple degree and egonet features, this work also explores graphlet features for role discovery. In addition, we also develop an approach for automatically learning and extracting important and useful edge features from an arbitrary graph. The experimental results demonstrate the utility of edge roles for network analysis tasks on a variety of graphs from various problem domains.
- Published
- 2016
45. Hybrid CPU-GPU Framework for Network Motifs
- Author
-
Rossi, Ryan A. and Zhou, Rong
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,I.2.6 ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,H.2.8 ,G.1.0 ,Computer Science::Performance ,Computer Science::Graphics ,Computer Science - Distributed, Parallel, and Cluster Computing ,Statistics - Machine Learning ,Computer Science::Mathematical Software ,Distributed, Parallel, and Cluster Computing (cs.DC) - Abstract
Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition to the hybrid multi-core CPU-GPU framework, we also investigate single GPU methods (using multiple cores) and multi-GPU methods that leverage all available GPUs simultaneously for computing induced subgraph statistics. Both methods leverage GPU devices only, whereas the hybrid multi-core CPU-GPU framework leverages all available multi-core CPUs and multiple GPUs for computing graphlets in large networks. Compared to recent approaches, our methods are orders of magnitude faster while also being more cost-effective, with superior performance per capita and per watt. In particular, the methods are up to 300 times faster than the recent state-of-the-art method. To the best of our knowledge, this is the first work to leverage multiple CPUs and GPUs simultaneously for computing induced subgraph statistics.
- Published
- 2016
46. Relational Similarity Machines
- Author
-
Rossi, Ryan A., Zhou, Rong, and Ahmed, Nesreen K.
- Subjects
FOS: Computer and information sciences ,Computer Science::Machine Learning ,Computer Science - Learning ,Artificial Intelligence (cs.AI) ,Statistics - Machine Learning ,Computer Science - Artificial Intelligence ,Machine Learning (stat.ML) ,Machine Learning (cs.LG) - Abstract
This paper proposes Relational Similarity Machines (RSM): a fast, accurate, and flexible relational learning framework for supervised and semi-supervised learning tasks. Despite the importance of relational learning, most existing methods are hard to adapt to different settings, due to issues with efficiency, scalability, accuracy, and flexibility for handling a wide variety of classification problems, data, constraints, and tasks. For instance, many existing methods perform poorly for multi-class classification problems, graphs that are sparsely labeled, or network data with low relational autocorrelation. In contrast, the proposed relational learning framework is designed to be (i) fast for learning and inference at real-time interactive rates, and (ii) flexible for a variety of learning settings (multi-class problems), constraints (few labeled instances), and application domains. The experiments demonstrate the effectiveness of RSM for a variety of tasks and data. MLG16
- Published
- 2016
47. A Web-based Interactive Visual Graph Analytics Platform
- Author
-
Ahmed, Nesreen K. and Rossi, Ryan A.
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Statistics - Machine Learning ,Computer Science - Human-Computer Interaction ,Machine Learning (stat.ML) ,Computer Science - Social and Information Networks ,Human-Computer Interaction (cs.HC) - Abstract
This paper proposes a web-based visual graph analytics platform for interactive graph mining, visualization, and real-time exploration of networks. GraphVis is fast, intuitive, and flexible, combining interactive visualizations with analytic techniques to reveal important patterns and insights for sense making, reasoning, and decision making. Networks can be visualized and explored within seconds by simply dragging and dropping a graph file into the web browser. The structure, properties, and patterns of the network are computed automatically and can be instantly explored in real-time. At the heart of GraphVis lies a multi-level interactive network visualization and analytics engine that allows for real-time graph mining and exploration across multiple levels of granularity simultaneously. Both the graph analytic and visualization techniques (at each level of granularity) are dynamic and interactive, with immediate and continuous visual feedback upon every user interaction (e.g., change of a slider for filtering). Furthermore, nodes, edges, and subgraphs are easily inserted, deleted, or exported via a number of novel techniques and tools that make it extremely easy and flexible for exploring, testing hypotheses, and understanding networks in real-time over the web. A number of interactive visual graph analytic techniques are also proposed including interactive role discovery methods, community detection, as well as a number of novel block models for generating graphs with community structure. Finally, we also highlight other key aspects including filtering, querying, ranking, manipulating, exporting, partitioning, as well as tools for dynamic network analysis and visualization, interactive graph generators, and a variety of multi-level network analysis, summarization, and statistical techniques.
- Published
- 2015
48. NetworkRepository: An Interactive Data Repository with Multi-scale Visual Analytics
- Author
-
Rossi, Ryan A. and Ahmed, Nesreen K.
- Subjects
Social and Information Networks (cs.SI) ,FOS: Computer and information sciences ,Computer Science - Human-Computer Interaction ,Digital Libraries (cs.DL) ,Computer Science - Digital Libraries ,Computer Science - Social and Information Networks ,Human-Computer Interaction (cs.HC) - Abstract
Network Repository (NR) is the first interactive data repository with a web-based platform for visual interactive analytics. Unlike other data repositories (e.g., UCI ML Data Repository, and SNAP), the network data repository (networkrepository.com) allows users to not only download, but to interactively analyze and visualize such data using our web-based interactive graph analytics platform. Users can in real-time analyze, visualize, compare, and explore data along many different dimensions. The aim of NR is to make it easy to discover key insights into the data extremely fast with little effort while also providing a medium for users to share data, visualizations, and insights. Other key factors that differentiate NR from the current data repositories are the number of graph datasets, their size, and variety. Other data repositories are static and also lack a means for users to collaboratively discuss a particular dataset, corrections, or challenges with using the data for certain applications. In contrast, we have incorporated many social and collaborative aspects into NR in hopes of further facilitating scientific research (e.g., users can discuss each graph, post observations, visualizations, etc.). AAAI 2015 DT
- Published
- 2014
49. Interactive Visual Graph Mining and Learning.
- Author
-
Rossi, Ryan A., Ahmed, Nesreen K., Zhou, Rong, and Eldardiry, Hoda
- Subjects
- *
DATA mining , *INTERACTIVE computer graphics , *MACHINE learning , *DATA modeling , *PREDICTION models - Abstract
This article presents a platform for interactive graph mining and relational machine learning called GraphVis. The platform combines interactive visual representations with state-of-the-art graph mining and relational machine learning techniques to aid in revealing important insights quickly as well as learning an appropriate and highly predictive model for a particular task (e.g., classification, link prediction, discovering the roles of nodes, and finding influential nodes). Visual representations and interaction techniques and tools are developed for simple, fast, and intuitive real-time interactive exploration, mining, and modeling of graph data. In particular, we propose techniques for interactive relational learning (e.g., node/link classification), interactive link prediction and weighting, role discovery and community detection, higher-order network analysis (via graphlets, network motifs), among others. GraphVis also allows for the refinement and tuning of graph mining and relational learning methods for specific application domains and constraints via an end-to-end interactive visual analytic pipeline that learns, infers, and provides rapid interactive visualization with immediate feedback at each change/prediction in real-time. Other key aspects include interactive filtering, querying, ranking, manipulating, exporting, as well as tools for dynamic network analysis and visualization, interactive graph generators (including new block model approaches), and a variety of multi-level network analysis techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2018
50. Estimation of Graphlet Counts in Massive Networks.
- Author
-
Rossi, Ryan A., Zhou, Rong, and Ahmed, Nesreen K.
- Subjects
- *
COMPUTER networks , *GRAPH theory - Abstract
Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms; however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. In this paper, we propose an unbiased graphlet estimation framework that is: (a) fast with large speedups compared to the state of the art; (b) parallel with nearly linear speedups; (c) accurate with less than 1% relative error; (d) scalable and space efficient for massive networks with billions of edges; and (e) effective for a variety of real-world settings as well as estimating global and local graphlet statistics (e.g., counts). On 300 networks from 20 domains, we obtain <1% relative error for all graphlets. This is vastly more accurate than the existing methods while using less data. Moreover, it takes a few seconds on billion edge graphs (as opposed to days/weeks). These are by far the largest graphlet computations to date. [ABSTRACT FROM AUTHOR]
- Published
- 2019