Deep Pre-Training Transformers for Scientific Paper Representation.
- Authors
Wang, Jihong; Yang, Zhiguang; Cheng, Zhanglin
- Subjects
NATURAL language processing; LANGUAGE models; SCIENTIFIC literature; BIG data; VECTOR spaces
- Abstract
In the age of scholarly big data, efficiently navigating and analyzing the vast corpus of scientific literature is a significant challenge. This paper introduces a specialized pre-trained BERT-based language model, termed SPBERT, which enhances natural language processing tasks specifically tailored to the domain of scientific paper analysis. Our method employs a novel neural network embedding technique that leverages textual components, such as keywords, titles, abstracts, and full texts, to represent papers in a vector space. By integrating recent advancements in text representation and unsupervised feature aggregation, SPBERT offers a sophisticated approach to encode essential information implicitly, thereby enhancing paper classification and literature retrieval tasks. We applied our method to several real-world academic datasets, demonstrating notable improvements over existing methods. The findings suggest that SPBERT not only provides a more effective representation of scientific papers but also facilitates a deeper understanding of large-scale academic data, paving the way for more informed and accurate scholarly analysis.
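The abstract describes encoding a paper's keywords, title, abstract, and full text with a BERT-style encoder and aggregating the component representations into a single vector used for classification and retrieval. The sketch below is only a rough illustration of that idea, not the authors' SPBERT: the model name (`bert-base-uncased`), mean pooling over tokens, and simple averaging of component vectors are all assumptions standing in for details the record does not provide.

```python
# Hypothetical sketch: embed a paper's textual components with a generic
# pre-trained BERT encoder and average them into one paper vector.
# Model choice, pooling, and aggregation are assumptions, not the SPBERT setup.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder; the paper's SPBERT weights are not referenced here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed_text(text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states over non-padding tokens."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)        # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, 768)

def embed_paper(title: str, abstract: str, keywords: list[str]) -> torch.Tensor:
    """Aggregate component embeddings (title, abstract, keywords) by averaging."""
    parts = [title, abstract, ", ".join(keywords)]
    vectors = torch.cat([embed_text(p) for p in parts], dim=0)  # (3, 768)
    return vectors.mean(dim=0)                                   # (768,)

# Usage: rank papers by cosine similarity, as in a retrieval task.
paper_a = embed_paper(
    "Deep Pre-Training Transformers for Scientific Paper Representation",
    "A BERT-based model for representing scientific papers in a vector space.",
    ["language models", "scientific literature"],
)
paper_b = embed_paper(
    "A Survey of Text Embeddings",
    "An overview of neural text representation methods.",
    ["embeddings", "natural language processing"],
)
similarity = torch.nn.functional.cosine_similarity(paper_a, paper_b, dim=0)
print(f"cosine similarity: {similarity.item():.3f}")
```

In a pipeline of this kind, the resulting paper vectors could feed a standard classifier for paper categorization or a nearest-neighbour index for literature retrieval; the paper's own aggregation scheme may differ from the plain averaging shown here.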
- Published
- 2024