45 results
Search Results
2. A novel double keys adapted elliptic curve cryptography and log normalized Gaussian sigmoid adaptive neuro‐fuzzy interference system based secure resource allocation system in decentralized cloud storage.
- Author
-
Karuppasamy, Lakshmanan and Vasudevan, Venkatraman
- Subjects
- *
RESOURCE allocation , *CLOUD storage , *QUANTUM cryptography , *DATA warehousing , *ELLIPTIC curve cryptography , *INFORMATION retrieval , *MATHEMATICAL optimization , *CLOUD computing - Abstract
Cloud computing has emerged as a promising platform that grants users direct yet shared access to computing resources and services without worrying about the complex internal infrastructure. However, secure data storage and retrieval are basic requirements. Therefore, this paper proposes a novel double keys adapted elliptic curve cryptography (DKECC) and log normalized Gaussian sigmoid adaptive neuro‐fuzzy interference system (LGS‐ANFIS) based secure resource allocation system in decentralized cloud storage. Initially, the input data undergoes data fragmentation. After that, encryption is carried out using the DKECC algorithm. Next, hash codes are generated using the Pearson hash function. Then, resource availability is estimated via a logistic sine chaotic mapping indulged rock hyraxes swarm optimization technique. Afterwards, data is checked for deduplication. Finally, resource allocation through LGS‐ANFIS takes place. The experimental outcomes demonstrated better results compared to baseline techniques. The modelling process showed that the suggested secure resource allocation model achieves a total security level of 96.27% while also having shorter reaction times (3018 ms), greater throughput (1258), reduced load balancing (0.355874), and reduced latency (4372 ms). [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
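The Pearson hash used for hash-code generation in result 2 is a classic byte-oriented algorithm (Pearson, 1990). A minimal sketch of the general technique in Python; the permutation table below is illustrative, since the paper's parameters are not given in the abstract:

```python
import random

# Illustrative 256-entry permutation table; the paper's actual table is not
# specified in the abstract, so we generate a fixed pseudo-random one.
random.seed(42)
TABLE = list(range(256))
random.shuffle(TABLE)

def pearson_hash(data: bytes) -> int:
    """Classic 8-bit Pearson hash: fold each byte through a permutation table."""
    h = 0
    for byte in data:
        h = TABLE[h ^ byte]
    return h

print(pearson_hash(b"fragment-001"))  # deterministic digest in 0..255
```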
3. Long‐Lasting and Rapid‐Responsive Media for Rewritable Information Storage Based on Low‐Cost N‐Substituted Maleimides Oligomers.
- Author
-
Zhai, Congcong, Sun, Yipeng, Xu, Lin, Azhar, Umair, Zhang, Yabin, Zong, Chuanyong, and Zhang, Shuxiang
- Subjects
- *
INFORMATION retrieval , *ELECTRONIC paper , *MALEIMIDES , *OLIGOMERS , *MOLECULAR structure , *HIGH resolution imaging - Abstract
Controllable synthesis of high‐performance materials at low cost is critical for the development of functional devices. Herein, a series of sequence‐controlled, low‐dispersity oligomers with acid–base chromotropic capability is synthesized via simple copolymerization between N‐substituted maleimides (NMI) and methyldiallylamine (MDAA). The structure and molecular weight of the oligomers are characterized. Owing to the presence of strong electrophilic carbonyl oxygen and imine nitrogen, a rapid‐responsive color switching system is achieved. This change is attributed to the acid/base‐triggered isomerization between the enolate state and the enol or keto tautomer via intramolecular proton transfer. More importantly, the color of P(MDAA/NMI) is regulated by varying the substituent group of the N‐substituted maleimide. A new type of rewritable paper based on the designed N‐substituted maleimide oligomers is fabricated by a simple spin‐coating process, on which images with high resolution can be acid‐printed and base‐erased for over ten cycles. The writing and erasure times can be as short as 10 s, and the legible time can be more than 90 days under ambient conditions. The as‐formed rewritable paper with excellent rewriting performance is low‐cost, easy to produce at large scale, and may find advanced potential applications in memory devices, rewritable labels, and sensors. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
4. Chinese Fine-Grained Geological Named Entity Recognition With Rules and FLAT.
- Author
-
Siying Chen, Weihua Hua, Xiuguo Liu, Xiaotong Deng, Xinling Zeng, and Jianchao Duan
- Subjects
- *
GEOLOGICAL modeling , *DATA mining , *RANDOM fields , *EXPERIMENTAL literature , *PROBLEM solving , *INFORMATION retrieval - Abstract
Geological named entity recognition (NER) is an essential prerequisite for geological information extraction and information retrieval, and an effective means of achieving structured reconstruction of unstructured geological data. Existing geological NER methods mainly focus on coarse-grained geological entity recognition, but geological entities are fine-grained. To solve this problem, a Chinese fine-grained geological entity corpus encompassing 21 types of fine-grained labels is constructed. In addition, in this article, a fine-grained geological entity recognition model based on Bidirectional Encoder Representations from Transformers (BERT) and the Flat-Lattice Transformer (FLAT) is designed. The method, named FGNER (Fine-Grained Geological Named Entity Recognition), adds geological naming rules that revise the model's results to improve the recognition of complex geological entities. The fine-grained geological entity recognition method is evaluated using regional geological literature reports as experimental data. The experimental results show that the precision, recall, and F1-score of the FGNER model are 95.73%, 89.26%, and 92.05%, respectively, thus achieving better performance than baseline models, such as BERT-Conditional Random Field. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. ISFET‐based ion sensors with photopolymerizable membranes.
- Author
-
Abramova, Natalia and Bratov, Andrey
- Subjects
- *
POLYURETHANES , *FIELD-effect transistors , *MICROFABRICATION , *POLYVINYL chloride , *INFORMATION retrieval - Abstract
This minireview summarizes information on the development and application of ion‐sensitive field‐effect transistors (ISFETs) with UV‐cured polyurethane ion‐selective membranes. Among the advantages of photopolymerizable polymer membranes over traditional poly(vinyl chloride) are excellent adhesion to a solid sensor surface and very fast curing times. Moreover, the membrane deposition and curing processes are compatible with ISFET fabrication technology, which is important for mass production of sensors. The paper presents specific features of photopolymerizable membranes and discusses required precautions in their preparation and treatment. Examples of various analytical applications of such sensors are provided. Finally, the outlook and perspectives for further development in the field of miniature solid‐state ion sensors are given. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
6. Online variational inference on finite multivariate Beta mixture models for medical applications.
- Author
-
Manouchehri, Narges, Kalra, Meeta, and Bouguila, Nizar
- Subjects
- *
TECHNOLOGICAL innovations , *MACHINE learning , *INFORMATION retrieval , *DATA extraction , *IMAGE segmentation , *COLORECTAL cancer - Abstract
Technological advances have led to the generation of large-scale, complex data. Thus, extraction and retrieval of information to automatically discover latent patterns have been widely studied across the various domains of science and technology. Consequently, machine learning has experienced tremendous development, and various statistical approaches have been suggested. In particular, data clustering has received a lot of attention. Finite mixture models have proven to be one of the most flexible and popular approaches to data clustering. Considering mixture models, three crucial aspects should be addressed. The first is choosing a distribution flexible enough to fit the data; in this paper, a model based on multivariate Beta distributions is proposed. The two other challenges in mixture models are estimation of the model's parameters and model complexity. To tackle these challenges, variational inference techniques have demonstrated considerable robustness. In this paper, two methods are studied, namely batch and online variational inference, and the models are evaluated on four medical applications: image segmentation of colorectal cancer, multi‐class colon tissue analysis, digital imaging in skin lesion diagnosis, and computer-aided detection of malaria. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. An approach of method‐level bug localization.
- Author
-
Ni, Zhen, Bo, Lili, Li, Bin, Chen, Tianhao, Sun, Xiaobing, and Wu, Xiaoxue
- Subjects
- *
SOURCE code , *INFORMATION retrieval , *SOFTWARE engineers - Abstract
Bug localization is an important field in software engineering research. Traditional bug localization approaches based on information retrieval separate words through lexical analysis. In this way, the comments of the source code are ignored or treated as plain text, which loses some semantic information. In this paper, MBL_SHL, an automatic Method‐level Bug Localization approach, which utilises code Summarization, Historical fixed bugs and code Length, is presented. Based on code summarization technology, the approach first supplements comments for uncommented code, and then calculates Word2vec and Term Frequency–Inverse Document Frequency vectors for the bug report, methods, and comments, respectively. After that, the authors separately calculate the similarity between the bug report and each method, and between the bug report and each comment. The code length information and historical fix information are also considered, as a weight and a part of the score, respectively, in calculating the final score of each method. Finally, the scores are sorted to determine the list of methods that may need to be modified when fixing the software bugs. We built a method‐level bug localization dataset containing five open‐source projects. The experimental results show that the proposed approach significantly outperforms existing approaches at the method level. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
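Result 7 ranks methods by combining bug-report-to-method and bug-report-to-comment similarities. A minimal sketch of the TF-IDF half of that pipeline using scikit-learn; the toy data and the 0.5/0.5 weighting are assumptions, not the paper's exact formula:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

bug_report = "NullPointerException when saving user profile"
methods = [  # hypothetical method bodies rendered as token strings
    "save user profile write database null check",
    "render login page html template",
]
comments = [  # one (possibly auto-generated) comment per method
    "persists the user profile to the database",
    "draws the login form",
]

vec = TfidfVectorizer()
matrix = vec.fit_transform([bug_report] + methods + comments)
q, m, c = matrix[0], matrix[1:3], matrix[3:5]

# Combine the two similarity signals; the equal weights are illustrative.
scores = 0.5 * cosine_similarity(q, m)[0] + 0.5 * cosine_similarity(q, c)[0]
for rank, i in enumerate(scores.argsort()[::-1], 1):
    print(rank, round(float(scores[i]), 3), methods[i][:40])
```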
8. Cross‐modal semantic correlation learning by Bi‐CNN network.
- Author
-
Wang, Chaoyi, Li, Liang, Yan, Chenggang, Wang, Zhan, Sun, Yaoqi, and Zhang, Jiyong
- Subjects
- *
SEMANTIC computing , *MACHINE learning , *INFORMATION retrieval , *DATA extraction , *DATA analysis - Abstract
Cross‐modal retrieval can retrieve images through a text query and vice versa. In recent years, cross‐modal retrieval has attracted extensive attention. Most existing cross‐modal retrieval methods aim to find a common subspace and maximize the correlation between different modalities. To generate specific representations consistent with cross‐modal tasks, this paper proposes a novel cross‐modal retrieval framework that integrates feature learning and latent space embedding. In detail, a deep CNN and a shallow CNN are proposed to extract features from the samples. The deep CNN is used to extract representations of images, and the shallow CNN uses a multi‐dimensional kernel to extract multi‐level semantic representations of text. Meanwhile, the semantic manifold is enhanced by constructing cross‐modal ranking and within‐modal discriminant losses to improve the division of semantic representation. Moreover, the most representative samples are selected by an online sampling strategy, so that the approach can be applied to large‐scale data. This approach not only increases the discriminative ability among different categories, but also maximizes the relativity between different modalities. Experiments on three real‐world datasets show that the proposed method is superior to popular methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
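Result 8 learns a common subspace with cross-modal ranking and within-modal discriminant losses. A minimal numpy sketch of one standard bidirectional ranking (triplet hinge) objective over paired embeddings; the margin and the loss form are illustrative, as the abstract does not specify them:

```python
import numpy as np

def triplet_ranking_loss(img, txt, margin=0.2):
    """Bidirectional hinge ranking loss over matched (img[i], txt[i]) pairs."""
    # Cosine similarity matrix between all image/text embedding pairs.
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = img @ txt.T
    pos = np.diag(sim)                      # similarity of matched pairs
    # Hinge: every mismatched pair should trail its matched pair by `margin`.
    cost_i2t = np.maximum(0, margin + sim - pos[:, None])
    cost_t2i = np.maximum(0, margin + sim - pos[None, :])
    np.fill_diagonal(cost_i2t, 0)
    np.fill_diagonal(cost_t2i, 0)
    return cost_i2t.sum() + cost_t2i.sum()

rng = np.random.default_rng(0)
print(triplet_ranking_loss(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```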
9. Error correction in data storage systems using polar codes.
- Author
-
Gerrar, Nana Kobina, Zhao, Shengmei, and Kong, Lingjun
- Subjects
- *
INFORMATION retrieval , *ERROR correction (Information theory) , *BACK up systems , *INFORMATION storage & retrieval systems , *CANCELLATION theory (Group theory) - Abstract
This paper investigates error correction in two‐dimensional (2‐D) intersymbol interference (ISI) channels using polar codes. 2‐D channels offer higher data storage capacity than traditional one‐dimensional (1‐D) channels but suffer from greater ISI effects. Error correction codes can be used to mitigate the effects of ISI in 2‐D data storage systems and improve their performance and reliability. Polar codes achieve the symmetric capacity for the class of binary‐input discrete memoryless channels (B‐DMCs). A post‐processing technique is proposed to improve the performance of polar codes in the 2‐D ISI channel. The simplified successive cancellation (SSC) polar decoder is useful for data storage systems owing to its good waterfall‐region performance, low error floors, and good throughput rates, and is therefore adopted for the study. Simulation results indicate that the proposed technique achieves error correction performance comparable to some existing decoders and outperforms other decoding techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. Assessing CryoSat‐2 Antarctic Snow Freeboard Retrievals Using Data From ICESat‐2.
- Author
-
Fons, S. W., Kurtz, N. T., Bagnardi, M., Petty, A. A., and Tilling, R. L.
- Subjects
- *
SEA ice , *LASER altimeters , *INFORMATION retrieval , *ANTARCTIC ice , *STANDARD deviations , *DATA binning - Abstract
NASA's Ice, Cloud, and land Elevation Satellite‐2 (ICESat‐2) laser altimeter launched in fall 2018, providing an invaluable addition to the polar altimetry record generated by ESA's CryoSat‐2 radar altimeter. The simultaneous operation of these two satellite altimeters enables unique comparison studies of sea ice altimetry, utilizing the different frequencies and profiling strategies of the two instruments. Here, we use freeboard data from ICESat‐2 to assess Antarctic snow freeboard retrievals from CryoSat‐2. We first discuss updates made to a previously published CryoSat‐2 retrieval process and show how this Version 2 algorithm improves upon the original method by comparing the new retrievals to ICESat‐2 in specific along‐track profiles as well as at the basin scale. In two near‐coincident along‐track profiles, we find mean snow freeboard differences (standard deviations of differences) of 0.3 cm (9.3 cm) and 7.6 cm (9.6 cm), with 25 km binned correlation coefficients of 0.77 and 0.89. Monthly mean freeboard differences range between −2.9 cm (10.8 cm) and 6.6 cm (16.8 cm) basin wide, with the largest differences typically occurring in Austral fall months, which is hypothesized to be related to new ice growth and the use of static snow backscatter coefficients in the retrieval. Monthly mean correlation coefficients range between 0.57 and 0.80. While coincident data show good agreement between the two sensors, they highlight issues related to geometric and frequency sampling differences that can impact the freeboard distributions. Plain Language Summary: Measuring sea ice freeboard from space is an important first step in estimating its thickness. A previous study developed a new method of measuring freeboard over Antarctic sea ice using ESA's CryoSat‐2 altimeter; however, few validation data existed at the time to determine how well it performed. In this paper, we improve the CryoSat‐2 processing and make use of data from NASA's ICESat‐2 altimeter for comparisons with the CryoSat‐2 data. While agreement is strong overall, there are still differences between the measurements that we hypothesize come from the different footprint sizes and wavelengths of the two instruments. Key Points: We present an updated CryoSat‐2 Antarctic snow freeboard retrieval method; these improved CryoSat‐2 snow freeboard retrievals show strong agreement with ICESat‐2 data both along‐track and basin‐wide; this assessment highlights difficulties in laser–radar comparisons brought on by frequency and geometric sampling discrepancies. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
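Result 10 compares the two altimeters through 25 km binned means and correlation coefficients. A minimal numpy sketch of that comparison step on synthetic along-track data; the variable names and noise model are assumptions:

```python
import numpy as np

def binned_means(distance_km, freeboard_cm, bin_km=25.0):
    """Average freeboard within consecutive along-track distance bins."""
    bins = (distance_km // bin_km).astype(int)
    return np.array([freeboard_cm[bins == b].mean() for b in np.unique(bins)])

rng = np.random.default_rng(1)
dist = np.sort(rng.uniform(0, 500, 2000))        # along-track distance, km
truth = 20 + 5 * np.sin(dist / 50)               # shared large-scale signal, cm
cs2 = truth + rng.normal(0, 4, dist.size)        # synthetic radar freeboard
is2 = truth + 3 + rng.normal(0, 4, dist.size)    # synthetic laser freeboard

cs2_b, is2_b = binned_means(dist, cs2), binned_means(dist, is2)
print("mean difference (cm):", round((is2_b - cs2_b).mean(), 2))
print("25 km binned correlation:", round(np.corrcoef(cs2_b, is2_b)[0, 1], 2))
```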
11. Linking data from a large clinical trial with the Australian Cerebral Palsy Register.
- Author
-
Shepherd, Emily, McIntyre, Sarah, Smithers‐Sheedy, Hayley, Ashwood, Pat, Sullivan, Thomas R, Te Velde, Anna, Doyle, Lex W, Makrides, Maria, Middleton, Philippa, and Crowther, Caroline A
- Subjects
- *
CEREBRAL palsy , *CHILDREN with cerebral palsy , *CLINICAL trials , *MAGNESIUM sulfate , *GESTATIONAL age , *DROOLING , *CEREBRAL palsy prevention , *ACQUISITION of data , *INFORMATION retrieval - Abstract
Aim: To link data from a large maternal perinatal trial with the Australian Cerebral Palsy Register (ACPR) to identify children with cerebral palsy (CP). Method: Deidentified data from the Australasian Collaborative Trial of Magnesium Sulphate (ACTOMgSO4) and the ACPR were linked. Children born from 1996 to 2000 at Australian hospitals who survived and had 2-year paediatric assessments were included. Children identified with CP in: (1) both the ACTOMgSO4 (2y) and the ACPR (5y), (2) the ACTOMgSO4 only, and (3) the ACPR only were compared. Results: We included 913 children (492 males, 421 females; mean gestational age at birth 27.8wks [standard deviation 2.1wks]; range 23.0-40.0wks). Eighty-four children received a CP diagnosis: 35 by the ACTOMgSO4 and the ACPR, 29 by the ACTOMgSO4 only, and 20 by the ACPR only. The ACTOMgSO4 diagnosed 76.2% (95% confidence interval [CI] 65.9-84.1) and the ACPR identified 65.5% (95% CI 54.7-74.9). Children born in states/territories with long-standing versus more recently established registers were more likely to be included on the ACPR (p<0.05). Interpretation: Linking deidentified perinatal trial data with the ACPR was achieved. Limitations of both strategies for identifying children with CP in this era (late 1990s and early 2000s) probably explain many of the differences observed, and inform future linkage studies and evaluations of CP-preventive interventions. What This Paper Adds: Randomized trial data were linked with the Australian Cerebral Palsy Register. Trial (2y) and register (up to 5y) diagnoses of cerebral palsy (CP) differed. States with long-standing registers were more likely to include children with CP. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
12. A topic‐based term frequency normalization framework to enhance probabilistic information retrieval.
- Author
-
Jian, Fanghong, Huang, Jimmy X., Zhao, Jiashu, Ying, Zhiwei, and Wang, Yuqi
- Subjects
- *
INFORMATION retrieval , *INFORMATION modeling , *TERMS & phrases - Abstract
Many well‐known probabilistic information retrieval models have shown promise for document ranking, especially BM25. Nevertheless, the control parameters in BM25 usually need to be adjusted to achieve improved performance on different data sets; additionally, BM25's bag‐of‐words assumption prevents its direct utilization of rich information at the sentence or document level. Inspired by these challenges, we first propose a new normalization method for the term frequency in BM25 (called BM25QL in this paper); the method is also incorporated into CRTER2, a recent BM25‐based model, to construct CRTER2QL. Then, we incorporate topic modeling and word embedding into BM25 to relax the bag‐of‐words assumption. In this direction, we propose a topic‐based retrieval model, TopTF, for BM25, which is further incorporated into the language model (LM) and the multiple aspect term frequency (MATF) model. Furthermore, an enhanced topic‐based term frequency normalization framework, ETopTF, based on embedding is presented. Experimental studies demonstrate the effectiveness of these methods. Specifically, on all tested data sets and in terms of mean average precision (MAP), our proposed models, BM25QL and CRTER2QL, are comparable to BM25 and CRTER2 with the best b parameter value; the TopTF models significantly outperform the baselines, and the ETopTF models further improve on TopTF in terms of MAP. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
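Result 12 builds BM25QL on top of BM25's term-frequency normalization. For reference, a minimal sketch of the standard BM25 score it starts from, with the length-normalization parameter b (the one the paper tunes) made explicit:

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Standard BM25: saturating TF normalized by document length, times IDF."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc.count(t)
        # b controls how strongly TF is normalized by document length;
        # this is the sensitivity that BM25QL aims to reduce.
        norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm_tf
    return score

corpus = [["term", "frequency", "model"], ["topic", "model", "retrieval"],
          ["probabilistic", "retrieval", "model", "bm25"]]
print(bm25_score(["retrieval", "model"], corpus[2], corpus))
```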
13. ArA*summarizer: An Arabic text summarization system based on subtopic segmentation and using an A* algorithm for reduction.
- Author
-
Bahloul, Belahcene, Aliane, Hassina, and Benmohammed, Mohamed
- Subjects
- *
NATURAL language processing , *GRAPH algorithms , *ALGORITHMS , *INFORMATION retrieval , *GRAPH theory , *HYBRID systems - Abstract
Automatic text summarization is a field situated at the intersection of natural language processing and information retrieval. Its main objective is to automatically produce a condensed, representative form of documents. This paper presents ArA*summarizer, an automatic system for Arabic single‐document summarization. The system is based on an unsupervised hybrid approach that combines statistical, cluster‐based, and graph‐based techniques. The main idea is to divide the text into subtopics, then select the most relevant sentences in the most relevant subtopics. The selection process is done by an A* algorithm executed on a graph representing the different lexical–semantic relationships between sentences. Experimentation is conducted on the Essex Arabic summaries corpus using the recall‐oriented understudy for gisting evaluation (ROUGE), automatic summarization engineering (AutoSummENG), merged model graphs (MeMoG), and n‐gram graph powered evaluation via regression (NPowER) evaluation metrics. The evaluation results show the good performance of our system compared with existing work. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
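Result 13 selects sentences with an A* search over a sentence graph. A minimal heapq-based A* sketch on a toy weighted graph; the graph, weights, and heuristic are illustrative, since the paper's lexical-semantic edge weights are not given in the abstract:

```python
import heapq

def a_star(graph, start, goal, h):
    """A*: expand by f = g (path cost so far) + h (admissible estimate to goal)."""
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, w in graph.get(node, []):
            ng = g + w
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h[nxt], ng, nxt, path + [nxt]))
    return None

# Toy sentence graph: nodes are sentences, edge weights are dissimilarities.
graph = {"s1": [("s2", 2), ("s3", 5)], "s2": [("s4", 2)], "s3": [("s4", 1)]}
h = {"s1": 3, "s2": 2, "s3": 1, "s4": 0}   # illustrative admissible heuristic
print(a_star(graph, "s1", "s4"))            # -> (4, ['s1', 's2', 's4'])
```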
14. Swat: A system for detecting salient Wikipedia entities in texts.
- Author
-
Ponza, Marco, Ferragina, Paolo, and Piccinno, Francesco
- Subjects
- *
NATURAL language processing , *LATENT semantic analysis - Abstract
We study the problem of entity salience by proposing the design and implementation of Swat, a system that identifies the salient Wikipedia entities occurring in an input document. Swat consists of several modules that are able to detect and classify on‐the‐fly Wikipedia entities as salient or not, based on a large number of syntactic, semantic, and latent features properly extracted via a supervised process that has been trained over millions of examples drawn from the New York Times corpus. The validation process is performed through a large experimental assessment, which shows that Swat improves on known solutions over all publicly available datasets. We release Swat via an API that we describe and comment on in the paper to ease its use in other software. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
15. Direct information retrieval after 3D reconstruction in grating‐based X‐ray phase‐contrast computed tomography.
- Author
-
Wu, Zhao, Gao, Kun, Wang, Zhili, Wei, Chenxi, Wali, Faiz, Zan, Guibin, Wei, Wenbin, Zhu, Peiping, and Tian, Yangchao
- Subjects
- *
IMAGE reconstruction , *INFORMATION retrieval , *COMPUTED tomography , *THREE-dimensional imaging , *ABSORPTION - Abstract
Grating‐based X‐ray differential phase‐contrast imaging has attracted considerable attention and has been considered a potential imaging method in clinical medicine because of its compatibility with traditional X‐ray tube sources and the possibility of a large field of view. Moreover, phase‐contrast computed tomography provides three‐dimensional phase‐contrast visualization. Generally, two‐dimensional information retrieval performed on every projection is required prior to three‐dimensional reconstruction in phase‐contrast computed tomography. In this paper, a three‐dimensional information retrieval method is derived that separates absorption and phase information directly from two reconstructed images. Theoretical derivations together with numerical simulations confirm the feasibility and accuracy of the proposed method. The advantages and limitations compared with the reverse projection method are also discussed. Owing to the reduced data size and the absence of a logarithm operation, the proposed method shortens the computational time for information retrieval. In addition, the hybrid three‐dimensional images of absorption and phase information are reconstructed using an absorption reconstruction algorithm, so existing data pre‐processing methods and iterative reconstruction algorithms from absorption reconstruction may be utilized directly in phase reconstruction. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
16. A parallel computational approach for similarity search using Bloom filters.
- Author
-
Chauhan, Sachendra Singh and Batra, Shalini
- Subjects
- *
DATA science , *INFORMATION retrieval , *GRAPHICS processing units , *INTEGERS , *COEFFICIENTS (Statistics) - Abstract
Finding similar items in a large and unstructured dataset is a challenging task in many applications of data science, such as searching, indexing, and retrieval. With increasing data volume and demand for real‐time responses, similarity search has gained much consideration. In this paper, a parallel computational approach for similarity search using Bloom filters (PCASSB) is proposed, which uses a Bloom filter to represent features of a document and compare them with the user's query. Query features are stored in an integer query array (IQA). PCASSB, an approximate similarity search technique, has been implemented on a graphics processing unit with the compute unified device architecture as the programming platform. To compute the similarity score between the query and the reference dataset, the Dice coefficient is used as a baseline method. The accuracy of the results generated by PCASSB is compared with the baseline method and other state‐of‐the‐art methods. The experimental results show that the proposed technique is effective in processing large numbers of text documents, as it takes less computational time. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
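Result 16 represents document features in a Bloom filter and scores them against the query with the Dice coefficient. A minimal serial sketch; the filter size, hash construction, and GPU parallelization are assumptions:

```python
import hashlib

M, K = 256, 3  # filter size in bits and number of hash functions (illustrative)

def positions(feature: str):
    """K bit positions per feature, derived from salted MD5 digests."""
    return [int(hashlib.md5(f"{i}:{feature}".encode()).hexdigest(), 16) % M
            for i in range(K)]

def bloom(features):
    """Set K bits per feature in an M-bit filter."""
    bits = [0] * M
    for f in features:
        for p in positions(f):
            bits[p] = 1
    return bits

doc = bloom(["similarity", "search", "bloom", "filter"])
query = bloom(["similarity", "search"])

# Dice coefficient over set bits: 2|A ∩ B| / (|A| + |B|).
inter = sum(a & b for a, b in zip(doc, query))
print(round(2 * inter / (sum(doc) + sum(query)), 3))
```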
17. Modeling multiple interactions with a Markov random field in query expansion for session search.
- Author
-
Li, Jingfei, Zhao, Xiaozhao, Zhang, Peng, and Song, Dawei
- Subjects
- *
MARKOV processes , *SEARCH engines , *MARKOV random fields , *QUERY (Information retrieval system) , *ARTIFICIAL intelligence , *INFORMATION retrieval - Abstract
How to automatically understand and answer users' questions (e.g., queries issued to a search engine) expressed in natural language has become an important yet difficult problem across the research fields of information retrieval and artificial intelligence. In a typical interactive Web search scenario, namely session search, to obtain relevant information the user usually interacts with the search engine for several rounds in the form of, e.g., query reformulations, clicks, and skips. These interactions are usually mixed and intertwined with each other in complex ways. Ideally, an intelligent search engine can be seen as an artificial intelligence agent that is able to infer what information the user needs from these interactions. However, there still exists a big gap between the current state of the art and this goal. In this paper, in order to bridge the gap, we propose a Markov random field–based approach to capture dependence relations among interactions, queries, and clicked documents for automatic query expansion (as a way of inferring the information needs of the user). An extensive empirical evaluation is conducted on large‐scale web search data sets, and the results demonstrate the effectiveness of our proposed models. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
18. A case study of duplications detection for educational domain thorough ad hoc search and identification NLP-based method.
- Author
-
Mikhaylov, S.N., Chuikova, V.V., Sokolova, Marina V., and Potapenko, A.M.
- Subjects
- *
NATURAL language processing , *ELECTRONIC records , *WEB services , *TEACHING , *EDUCATION , *COMPUTER software - Abstract
During the organization and planning of lecture courses for a discipline, content may overlap and be partially delivered in more than one course. This sometimes causes time loss through unnecessary repetition. This paper introduces an automated tool for duplication detection that adapts natural language processing methods used for Web search. An experiment on clustering unstructured electronic document repositories to identify thematic duplicates across documents in the educational domain is presented. A prototype of this Web service–based search engine is designed and discussed. An experiment aimed at identifying thematic duplicates across various courses within one teaching discipline is also presented. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
19. EMOTAG: AN APPROACH TO AUTOMATED MARKUP OF EMOTIONS IN TEXTS.
- Author
-
Francisco, Virginia and Gervás, Pablo
- Subjects
- *
EMOTIONS , *ONTOLOGY , *ALGORITHMS , *ABSTRACT thought , *NATURAL language processing , *INFORMATION retrieval - Abstract
This paper presents an approach to the automated markup of texts with emotional labels. The approach considers two possible representations of emotions in parallel: emotional categories (emotional tags used to refer to emotions) and emotional dimensions (measures that try to model the essential aspects of emotions numerically). For each representation, a corpus of example texts previously annotated by human evaluators is mined for an initial assignment of emotional features to words. This results in a list of emotional words (LEW), which becomes a useful resource for later automated markup. The algorithm proposed for the automated markup of text closely mirrors the steps taken during feature extraction, employing a combination of the LEW resource and the ANEW word list for the actual assignment of emotional features, WordNet for knowledge-based expansion of words not occurring in either list, and an ontology of emotional categories. The algorithm for automated markup is tested and the results are discussed with respect to three main issues: the relative adequacy of each of the representations used, the correctness and coverage of the proposed algorithm, and additional techniques and solutions that may be employed to improve the results. The average percentage of success obtained by our approach is around 80% when it marks up with emotional dimensions and around 50% when it marks up with emotional categories. The main contribution of the approach presented in this paper is that it allows dimensions and categories at different levels of abstraction to operate simultaneously during markup. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
20. Aggressive pruning strategy for time series retrieval using a multi-resolution representation based on vector quantization coupled with discrete wavelet transform.
- Author
-
Muhammad Fuad, Muhammad Marwan
- Subjects
- *
TIME series analysis , *DISCRETE wavelet transforms , *VECTOR quantization , *INFORMATION retrieval , *SEARCH algorithms - Abstract
Time series representation methods are widely used to handle time series data by projecting them onto low-dimensional spaces where queries are processed. Multi-resolution representation methods speed up the similarity search process by using pre-computed distances, which are calculated and stored at the indexing stage and then used at the query stage, together with filters in the form of exclusion conditions. In this paper, we present a new multi-resolution representation method that combines the Haar wavelet-based multi-resolution method with vector quantization to maximize the pruning power of the similarity search algorithm. The new method is validated through extensive experiments on different datasets from several time series repositories. The results obtained prove the efficiency of the new method. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
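Result 20 prunes candidate time series using distances computed on Haar wavelet coefficients at coarse resolutions. A minimal sketch of the underlying exclusion condition: with an orthonormal Haar transform, the distance on a prefix of coefficients lower-bounds the true Euclidean distance, so a candidate can be safely discarded once that bound exceeds the query radius. The radius and data are illustrative:

```python
import numpy as np

def haar(x):
    """Orthonormal Haar DWT (length must be a power of two)."""
    x = np.asarray(x, dtype=float)
    out = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)
        out.append((x[0::2] - x[1::2]) / np.sqrt(2))  # detail coefficients
        x = avg
    out.append(x)
    return np.concatenate(out[::-1])  # coarsest coefficients first

def prune(query, candidates, radius, keep_coeffs=2):
    """Keep only series whose coarse-level distance is within the radius."""
    q = haar(query)[:keep_coeffs]
    survivors = []
    for c in candidates:
        # Orthonormality means the distance on a coefficient prefix
        # lower-bounds the true Euclidean distance, so exclusion is safe.
        if np.linalg.norm(q - haar(c)[:keep_coeffs]) <= radius:
            survivors.append(c)
    return survivors

rng = np.random.default_rng(2)
data = [rng.normal(size=8) for _ in range(100)]
print(len(prune(rng.normal(size=8), data, radius=2.0)))
```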
21. The History of ASIS&T and Information Science and Technology.
- Author
-
Miller, Karen
- Subjects
- *
INFORMATION science associations , *PROFESSIONAL associations , *CLASSIFICATION , *CONFERENCES & conventions , *INFORMATION retrieval , *ORAL history , *SPECIAL days , *HISTORY - Abstract
A highlight of the 2012 ASIS&T Annual Meeting, the pre-conference session on the History of ASIS&T and Information Science and Technology Worldwide drew presenters and attendees from around the globe. The day featured papers on four historical themes, starting with the institutional roots of ASIS&T and recognizing decades of research presented in the Annual Review of Information Science and Technology. The evolution of the field was apparent through a review of information revolutions prompted by the printing press, the post-World War II information crisis and the Internet, as well as through presentations on digital curation, ongoing work on relevance, sense-making theory and developments from Croatia to France. Discussion of the historical contexts of technology innovations and impacts considered photographic documentary techniques, binary computing and networking standards. The development of foundational ideas was explored through presentations on pioneering document indexing methods, the semantic challenge of term-oriented retrieval, early European perceptions of classification systems and the French view of communication and information science. Efforts to deepen the historical understanding of information science and technology will continue through oral history interviews, funded research and awards for outstanding papers. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
22. Hierarchy and concentration in the American urban system of technological advance.
- Author
-
Maliszewski, Paul J. and Ó hUallacháin, Breandán
- Subjects
- *
CITIES & towns , *METROPOLITAN areas , *POWER law (Mathematics) , *DISAGGREGATED data , *SEMICONDUCTORS , *X-rays , *INFORMATION retrieval - Abstract
This paper investigates aspects of the urban hierarchy and concentration of patent sub-categories in United States metropolitan areas by estimating Zipf, Gini and Moran's I coefficients. Results do not support a power law depiction of the location of disaggregate patenting in the entire metropolitan system. The most concentrated and hierarchical patent technologies are computer hardware and software, computer peripherals, information storage, communications, surgery and medical instruments, nuclear and x-rays, semiconductor devices, optics and organic compounds. Technologies are cross-classified, which reveals aspects of variety in locational patterns and offers clues into systems of knowledge exchange in urban-based technological advance. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
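Result 22 measures spatial concentration of patenting with, among other statistics, the Gini coefficient. A minimal sketch of that computation over hypothetical metro-area patent counts:

```python
import numpy as np

def gini(counts):
    """Gini coefficient: 0 = patents spread evenly, near 1 = fully concentrated."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    # Sorted-rank identity for the mean-absolute-difference formulation.
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

# Hypothetical patent counts for five metro areas in one technology class.
print(gini([5, 5, 5, 5, 5]))    # 0.0 -- even spread
print(gini([0, 0, 0, 0, 25]))   # 0.8 -- highly concentrated
```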
23. Multi-level reranking approach for bug localization.
- Author
-
Kılınç, Deniz, Yücalar, Fatih, Borandağ, Emin, and Aslan, Ersin
- Subjects
- *
COMPUTER viruses , *COMPUTER software quality control , *INFORMATION retrieval research , *COMPUTER programming , *OPEN source software - Abstract
Bug fixing has a key role in software quality evaluation. Bug fixing starts with the bug localization step, in which developers use textual bug information to locate the source code containing the bug. Bug localization is a tedious and time-consuming process: it requires understanding the program's goal, coding structure, programming logic, and the relevant attributes of the bug. Information retrieval (IR)-based bug localization is a retrieval task in which bug reports and source files represent the queries and documents, respectively. In this paper, we propose BugCatcher, a newly developed bug localization method based on a multi-level reranking IR technique. We evaluate BugCatcher on three open source projects with approximately 3400 bugs. Our experiments show that the multi-level reranking approach to bug localization is promising. The retrieval performance and accuracy of BugCatcher are better than those of current bug localization tools, and BugCatcher has the best Top-N, Mean Average Precision (MAP), and Mean Reciprocal Rank (MRR) values for all datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
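Result 23 evaluates with Top-N, MAP, and MRR. A minimal sketch of the two rank-based metrics over hypothetical ranked lists of candidate files:

```python
def average_precision(ranked, relevant):
    """AP: mean of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, 1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """RR: 1 / rank of the first relevant item (0 if none is retrieved)."""
    for i, item in enumerate(ranked, 1):
        if item in relevant:
            return 1 / i
    return 0.0

# Two hypothetical bug reports with ranked candidate files and known fixes.
queries = [(["f3", "f1", "f7"], {"f1"}), (["f2", "f9", "f4"], {"f4", "f9"})]
print("MAP:", sum(average_precision(r, rel) for r, rel in queries) / len(queries))
print("MRR:", sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries))
```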
24. THE NAME OF THE ROSE: A REVIEW OF IDEAS ON THE EUROPEAN BIAS IN ANGIOSPERM CLASSIFICATION.
- Author
-
Walters, S. M.
- Subjects
- *
ANGIOSPERMS , *ROSES , *ETHNOBIOLOGY , *BIOLOGY in folklore , *PLANT classification - Abstract
This 'Tansley Review' paper takes as its starting point a paper of mine published in 1961, which explored the shape and size of modern Angiosperm families and genera as a product of taxonomic practice over centuries. It considers how far the conclusion - that our existing scientific classification, based on Linnaeus' masterly standardization in the eighteenth century, is very markedly 'Eurocentric' - has been criticized, accepted or modified by subsequent writers. In particular, it assesses the important contribution made in recent years by ethnobiologists using expert knowledge from social anthropology, linguistics and other disciplines. Finally, the paper considers briefly the broader aspects of current controversies about the nature and purpose of taxonomic activity, including the argument about its relevance to evolutionary knowledge and speculation. It concludes that, to a remarkable degree, practising taxonomists ignore conceptual or philosophical difficulties and are able to co-operate, although their leisurely and arcane procedures may not measure up to the 'information explosion'. [ABSTRACT FROM AUTHOR]
- Published
- 1986
- Full Text
- View/download PDF
25. Structured Methods for reproducible science.
- Author
-
Polychronidou, Maria
- Subjects
- *
DOCUMENTATION , *INFORMATION services , *RECORDS management , *INFORMATION retrieval , *GUIDELINES - Abstract
Detailed and accurate documentation of the reagents, tools and methods used in a study is key for reproducible science. However, the information provided in the Materials and Methods section is not always sufficiently detailed to allow for the adoption of methodologies across laboratories. Substantial time and effort, as well as extensive correspondence with the authors of a published paper, is often required in order to obtain all the relevant information related to a particular technique. Even after following a trail of references that frequently lead to a paper published decades ago, it is sometimes impossible to find a sufficiently detailed description of a technique “performed as described before”. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
26. A Case-Based Reasoning system for complex medical diagnosis.
- Author
-
Chattopadhyay, Subhagata, Banerjee, Suvendu, Rabhi, Fethi A., and Acharya, U. Rajendra
- Subjects
- *
CASE-based reasoning , *COMPUTER diagnostic software , *PREMENSTRUAL syndrome , *INFORMATION retrieval , *GYNECOLOGY , *PSYCHIATRY , *EUCLIDEAN distance , *DIAGNOSIS - Abstract
A Case-Based Reasoning (CBR) system for medical diagnosis mimics the way doctors make a diagnosis. Given a new case, its accuracy in practice depends on successful retrieval of similar cases. CBR systems have had some success in dealing with simple diseases because of the robustness of their case base. However, their diagnostic accuracy suffers when dealing with complex diseases, particularly those that involve multiple domains in medicine. An example of such a condition is premenstrual syndrome (PMS), as it falls under both gynaecology and psychiatry. To address this issue, the paper proposes a CBR-based expert system that uses the K-nearest neighbour (KNN) algorithm to retrieve the k most similar cases based on the Euclidean distance measure. The novelty of the system lies in the design of a flexible auto-set tolerance (T), which serves as a threshold to extract cases whose similarities are greater than the assigned value of T. A prototype software tool with a menu-driven Graphical User Interface (GUI) has been developed for case input, analysis of results, and case adaptation within the system. Finally, the performance of the tool has been checked on a set of real-world PMS cases. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
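Result 26 retrieves the k nearest cases by Euclidean distance and then filters them with an auto-set tolerance T on similarity. A minimal sketch; the inverse-distance similarity and the mean-based auto-set rule are assumptions, as the abstract does not give the paper's exact rule:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, case_base, k=3):
    """KNN retrieval filtered by a flexible, auto-set similarity threshold T."""
    scored = sorted(case_base, key=lambda c: euclidean(query, c["features"]))[:k]
    sims = [1 / (1 + euclidean(query, c["features"])) for c in scored]
    T = sum(sims) / len(sims)  # illustrative auto-set rule: mean similarity
    return [(c, s) for c, s in zip(scored, sims) if s >= T]

# Hypothetical symptom-score vectors for past PMS cases.
cases = [{"id": 1, "features": (3, 7, 2)}, {"id": 2, "features": (4, 6, 3)},
         {"id": 3, "features": (9, 1, 8)}, {"id": 4, "features": (3, 6, 2)}]
for case, sim in retrieve((3, 7, 3), cases):
    print(case["id"], round(sim, 3))
```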
27. Evolving and Emerging Research Methods: 2012 ASIS&T SIG/USE Symposium.
- Author
-
Mon, Lorri and Williamson, Jeanine
- Subjects
- *
INFORMATION science associations , *CONFERENCES & conventions , *ENDOWMENTS , *HOMELESS persons , *INFORMATION retrieval , *RESEARCH , *INFORMATION-seeking behavior , *SOCIAL media - Abstract
SIG/USE celebrated members' work and achievements in research methods at the 2012 ASIS&T Annual Meeting through a keynote speech, brief talks and awards. In her opening talk, professor Lisa Given challenged attendees to expand research methods to engage participants more fully, include qualitative findings and explore information behavior in nontraditional media. The topics of two-minute lightning talks ranged from research techniques for exploring young people's information behavior to imaging brain activity related to relevance decisions. Others addressed incorporating mobile technologies in research, collaborative search, direct interaction with participants in contrast with background server log checks and working with data collections from social Q&A sites. Following the talks, small group discussions further explored topics such as cognitive approaches, content analysis and text analytics and usability. The symposium wound to a close with the 2011 Elfreda A. Chatman Research Award presentation on information strategies of the homeless and a preview of the 2012 proposed study on refugees' information seeking. Additional awards were presented for best paper and poster to support travel and further presentations. Pertti Vakkari was recognized for outstanding contributions to information behavior research and was inducted into the SIG/USE Academy of Fellows. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
28. The 23rd Annual SIG/CR Classification Research Workshop: A Report.
- Author
-
Furner, Jonathan
- Subjects
- *
CLASSIFICATION , *INFORMATION retrieval , *MATHEMATICAL models , *METADATA , *ADULT education workshops , *THEORY - Abstract
The 23rd SIG/CR workshop on classification research featured papers, lightning talks, brief presentations of doctoral projects and two keynote talks, all exploring what's new in the field. Under the theme of new approaches with a historical focus, presenters explored novel theories, models and applications, approaches to building classificatory structures, methods and criteria for evaluation and much more. Classification theory, concepts and terminology were considered from a historical perspective, and new theories and changes in conceptualization and classification structures were raised. Modern perspectives on classification include folksonomies, personal classification practices, power structures captured through classification and the limitations of standardization. Researchers discussed cognitive processes involved in classifying, the evolution of concepts associated with terms and sources for new terms in a domain. Through the variety of presentations, it was clear that classification encompasses a broad array of topics, ultimately serving information retrieval and access. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
29. AN ASPECT QUERY LANGUAGE MODEL BASED ON QUERY DECOMPOSITION AND HIGH-ORDER CONTEXTUAL TERM ASSOCIATIONS.
- Author
-
Song, Dawei, Huang, Qiang, Bruza, Peter, and Lau, Raymond
- Subjects
- *
QUERY languages (Computer science) , *INFORMATION retrieval , *ASSOCIATION rule mining , *DOCUMENT selection , *INFORMATION resources management - Abstract
In information retrieval (IR) research, more and more focus has been placed on optimizing a query language model by detecting and estimating the dependencies between the query and the observed terms occurring in the selected relevance feedback documents. In this paper, we propose a novel Aspect Language Modeling framework featuring term association acquisition, document segmentation, query decomposition, and an Aspect Model (AM) for parameter optimization. Through the proposed framework, we advance the theory and practice of applying high-order and context-sensitive term relationships to IR. We first decompose a query into subsets of query terms. Then we segment the relevance feedback documents into chunks using multiple sliding windows. Finally we discover the higher-order term associations, that is, the terms in these chunks with a high degree of association to the subsets of the query. In this process, we combine the AM with Association Rule (AR) mining. The AM not only considers the subsets of a query as 'hidden' states and estimates their prior distributions, but also evaluates the dependencies between the subsets of a query and the observed terms extracted from the chunks of feedback documents. The AR mining provides a reasonable initial estimate of the high-order term associations by discovering association rules from the document chunks. Experimental results on various TREC collections verify the effectiveness of our approach, which significantly outperforms a baseline language model and two state-of-the-art query language models, namely the Relevance Model and the Information Flow model. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
30. Representation Recovers Information.
- Author
-
Thornton, Chris
- Subjects
- *
COGNITIVE science , *REPRESENTATION (Philosophy) , *INFORMATION retrieval , *REASONING , *THOUGHT & thinking , *BINARY principle (Linguistics) , *EMPIRICISM , *RATIONALISM , *COGNITION - Abstract
Early agreement within cognitive science on the topic of representation has now given way to a combination of positions. Some question the significance of representation in cognition. Others continue to argue in favor, but the case has not been demonstrated in any formal way. The present paper sets out a framework in which the value of representation use can be mathematically measured, albeit in a broadly sensory context rather than a specifically cognitive one. Key to the approach is the use of Bayesian networks for modeling the distal dimension of sensory processes. More relevant to cognitive science is the theoretical result obtained, which is that a certain type of representational architecture is necessary for achievement of sensory efficiency. While exhibiting few of the characteristics of traditional, symbolic encoding, this architecture corresponds quite closely to the forms of embedded representation now being explored in some embedded/embodied approaches. It becomes meaningful to view that type of representation use as a form of information recovery. A formal basis then exists for viewing representation not so much as the substrate of reasoning and thought, but rather as a general medium for efficient, interpretive processing. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
31. A knowledge acquisition methodology to ontology construction for information retrieval from medical documents.
- Author
-
Valencia-García, Rafael, Fernández-Breis, Jesualdo Tomás, Ruiz-Martínez, Juana María, García-Sánchez, Francisco, and Martínez-Béjar, Rodrigo
- Subjects
- *
MANAGEMENT science research , *MEDICAL informatics , *COMPUTERS in medicine , *MEDICAL records , *INFORMATION retrieval software , *TEXT mining , *INFORMATION storage & retrieval systems - Abstract
Vast amounts of medical information reside within text documents, so automatic retrieval of such information would certainly benefit clinical activities. The need to overcome the bottleneck caused by manual construction of ontologies has generated several studies and lines of research on semi-automatic methods for building ontologies. Most techniques for learning domain ontologies from free text have important limitations: they extract only concepts, so generally only taxonomies are produced, although other types of semantic relations are also relevant in knowledge modelling. This paper presents a language-independent approach for extracting knowledge from medical natural language documents. The knowledge is represented by means of ontologies that can have multiple semantic relationships among concepts. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
32. CONTEXTUALIZING LEARNING OBJECTS USING ONTOLOGIES.
- Author
-
Mohammed, Phaedra and Mohan, Permanand
- Subjects
- *
ONTOLOGIES (Information retrieval) , *COMPUTER assisted instruction , *COMPUTERS in education , *DATA structures , *INFORMATION retrieval , *EDUCATIONAL technology - Abstract
Educational research over the past three years has intensified to the point that the context of learning resources needs to be properly modeled. Many researchers have described, and even mandated, the use of ontologies in the research being conducted, yet the process of actually connecting one or more ontologies to a learning object has not been extensively discussed. This paper describes a practical model for associating multiple ontologies with learning objects while making full use of the IEEE LOM specification. The model categorizes these ontologies according to five major categories of context, based on the most popular fields of study actively pursued by the educational research community: Thematic context, Pedagogical context, Learner context, Organizational context, and Historical/Statistical context. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
33. THE DATALOGDL COMBINATION OF DEDUCTION RULES AND DESCRIPTION LOGICS.
- Author
-
JING MEI, ZUOQUAN LIN, BOLEY, HAROLD, JIE LI, and BHAVSAR, VIRENDRAKUMAR C.
- Subjects
- *
ONTOLOGIES (Information retrieval) , *PROGRAMMING languages , *COMPUTER logic , *COMPUTER algorithms , *INFORMATION retrieval , *COMPUTER science - Abstract
Uniting ontologies and rules has become a central topic in the Semantic Web. Bridging the discrepancy between these two knowledge representations, this paper introduces DatalogDL as a family of hybrid languages in which Datalog rules are parameterized by various DL (description logic) languages. Making DatalogDL a decidable system with EXPTIME complexity, we propose independence properties in the DL body as the restriction on hybrid rules, and weaken the safeness condition to balance the trade-off between expressivity and reasoning power. Building on existing well-developed techniques, we present a principled approach to enriching (RuleML) rules with information from (OWL) ontologies, and develop a prototype system combining a rule engine (OO jDREW) with a DL reasoner (RACER). [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
34. CASE-BASED REASONING AND KNOWLEDGE DISCOVERY IN MEDICAL APPLICATIONS WITH TIME SERIES.
- Author
-
Funk, Peter and Xiong, Ning
- Subjects
- *
CASE-based reasoning , *DATA mining , *TIME series analysis , *CLINICAL medicine , *BAYESIAN analysis , *INFORMATION retrieval - Abstract
This paper discusses the role and integration of knowledge discovery (KD) in case-based reasoning (CBR) systems. The general view is that KD is complementary to the task of knowledge retaining and can be treated as a separate process outside the traditional CBR cycle. Unlike knowledge retaining, which is mostly related to case-specific experience, KD aims at the elicitation of new knowledge that is more general and valuable for improving the different CBR substeps. KD for CBR is exemplified by a real application scenario in medicine in which time series of patterns are to be analyzed and classified. As a single pattern cannot convey sufficient information in the application, sequences of patterns are more adequate. Hence it is advantageous if sequences of patterns and their co-occurrence with categories can be discovered. Evaluation with cases containing series classified into a number of categories and injected with indicator sequences shows that the approach is able to identify these key sequences. In a clinical application and a case library that is representative of the real world, these key sequences would improve the classification ability and may spawn clinical research to explain the co-occurrence between certain sequences and classes. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
35. ACCOUNTING FOR THE TEMPORAL DIMENSION IN CASE-BASED RETRIEVAL: A FRAMEWORK FOR MEDICAL APPLICATIONS.
- Author
-
Montani, Stefania and Portinale, Luigi
- Subjects
- *
TIME series analysis software , *CASE-based reasoning , *CLINICAL medicine , *HEMODIALYSIS , *INFORMATION retrieval , *INTENSIVE care units , *THERAPEUTICS - Abstract
Time-varying information embedded in cases has often been neglected and its role oversimplified in case-based reasoning systems. In several real-world problems, and in particular in medical applications, a case should capture the evolution of the observed phenomenon over time. To this end, we propose to represent temporal information at two levels: (1) at the case level, when some features are collected in the form of time series, because they describe parameters varying within a period of time (which corresponds to the case duration), and we aim at analyzing the system behavior within the case duration interval itself; (2) at the history level, when we are interested in reconstructing the evolution of the system by retrieving temporally related cases. In this paper, we describe a framework for case representation and retrieval that is able to take into account the temporal dimension and is meant to be used in any time-dependent domain, being particularly well suited for medical applications. To support case retrieval, we provide an analysis of similarity-based time series retrieval techniques; to support history retrieval, we introduce possible ways to summarize the case content, together with the corresponding strategies for identifying similar instances in the knowledge base. A concrete application of our framework is represented by Rhene, a system for intelligent retrieval in the hemodialysis domain. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
36. TREC: Improving Information Access through Evaluation.
- Author
-
Voorhees, Ellen M.
- Subjects
- *
FORUMS , *INFORMATION retrieval , *INFORMATION science , *INFORMATION storage & retrieval systems , *INFORMATION resources , *INFORMATION technology - Abstract
This article describes the aims and objectives of the Text REtrieval Conference (TREC) workshops and its impact in the areas of retrieval system effectiveness, retrieval system evaluation, and support of new retrieval tasks. The first TREC workshop was held in November 1992, and there has been a workshop held annually since then. The cumulative effort represented by the TREC is significant. Approximately 250 distinct groups representing more than 20 different countries have participated in at least one TREC, thousands of individual retrieval experiments have been performed and hundreds of papers have been published in the TREC proceedings. The TREC's impact on information retrieval research has been equally significant. A variety of large test collections have been built for both traditional ad hoc retrieval and new tasks such as cross-language retrieval, speech retrieval and question answering. The TREC has standardized the evaluation methodology used to assess the quality of retrieval results and, through the large repository of retrieval runs, demonstrated both the validity and efficacy of the methodology.
- Published
- 2005
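Central to the evaluation methodology TREC standardized is (mean) average precision over a set of topics. A minimal sketch, with the run format (ranked document IDs plus a relevance set per topic) assumed for illustration:

```python
# Mean average precision: average, over topics, of the mean precision
# observed at the rank position of each relevant document.
def average_precision(ranked_doc_ids, relevant_ids):
    """Average of precision values at the rank of each relevant document."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)

def mean_average_precision(topics):
    """topics: list of (ranked_doc_ids, relevant_ids) pairs, one per topic."""
    return sum(average_precision(r, q) for r, q in topics) / len(topics)
```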
37. CBPOP: A Domain-Independent Multi-Case Reuse Planner.
- Author
-
Britanik, J. and Marefat, M.
- Subjects
- *
PLANNING , *COMPUTATIONAL intelligence , *PLANNERS , *ARTIFICIAL intelligence , *INFORMATION retrieval - Abstract
The reuse of multiple cases to solve a single planning problem promises better utilization of past experience than single-case reuse planning, which can lead to better planning performance. In this paper, we present the theory and implementation of CBPOP and show how it addresses the problems of multi-case reuse planning. In particular, we present novel approaches to retrieval and refitting. We also explore the difficult issue of when to retrieve in multi-reuse scenarios, and we empirically compare the results of several solutions we propose. Results from our experiments show that the best ranking function for pure generative planning is not necessarily the best ranking function for multi-reuse planning. The surprising result in the reuse scenarios is that the single-goal case library performed better than larger case libraries consisting of solutions to multi-goal problems. [ABSTRACT FROM AUTHOR]
- Published
- 2004
- Full Text
- View/download PDF
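The retrieval-and-ranking question the abstract above raises can be sketched with a simple goal-coverage heuristic. Both functions below are illustrative assumptions, not CBPOP's actual ranking functions, and the case dictionary keys are hypothetical:

```python
# Rank stored cases by how many open query goals they cover, then greedily
# assemble a set of cases that together cover all query goals.
def rank_cases(query_goals, case_library):
    """Best-first ordering by covered goals, lightly penalizing plan length."""
    query_goals = set(query_goals)
    def score(case):
        return len(query_goals & set(case["goals"])) - 0.01 * len(case["plan"])
    return sorted(case_library, key=score, reverse=True)

def cover_goals(query_goals, case_library):
    """Greedily pick cases until every query goal is covered or none helps."""
    remaining, chosen = set(query_goals), []
    while remaining:
        best = max(case_library,
                   key=lambda c: len(remaining & set(c["goals"])),
                   default=None)
        if best is None or not remaining & set(best["goals"]):
            break
        chosen.append(best)
        remaining -= set(best["goals"])
    return chosen
```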
38. Data-driven approaches to information access
- Author
-
Dumais, Susan
- Subjects
- *
LATENT structure analysis , *INFORMATION retrieval - Abstract
This paper summarizes three lines of research that are motivated by the practical problem of helping users find information from external data sources, most notably computers. The application areas include information retrieval, text categorization, and question answering. A common theme in these applications is that practical information access problems can be solved by analyzing the statistical properties of words in large volumes of real-world texts. The same statistical properties constrain human performance; thus we believe that solutions to practical information access problems can shed light on human knowledge representation and reasoning. [Copyright © Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
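One of the statistical word-analysis techniques behind the work summarized above is latent semantic analysis (the subject heading "latent structure analysis" points the same way). A minimal sketch of the standard SVD construction, with the toy matrix an assumption for illustration:

```python
# Latent semantic analysis: factor a term-by-document count matrix with SVD
# and represent each document as a k-dimensional latent vector.
import numpy as np

def lsa_doc_vectors(term_doc, k=2):
    """Return one k-dimensional row vector per document (columns of term_doc)."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]  # document vectors scaled by singular values

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Toy term-by-document matrix (3 terms, 3 documents), purely illustrative.
docs = np.array([[2, 0, 1],   # term "retrieval"
                 [1, 0, 0],   # term "index"
                 [0, 3, 1]])  # term "neural"
vecs = lsa_doc_vectors(docs, k=2)
print(cosine(vecs[0], vecs[2]))  # similarity of documents 1 and 3
```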
39. Sequential sampling models of human text classification
- Author
-
Lee, Michael D. and Corlett, Elissa Y.
- Subjects
- *
INFORMATION storage & retrieval systems , *RANDOM walks - Abstract
Text classification involves deciding whether or not a document is about a given topic. It is an important problem in machine learning, because automated text classifiers have enormous potential for application in information retrieval systems. It is also an interesting problem for cognitive science, because it involves real world human decision making with complicated stimuli. This paper develops two models of human text document classification based on random walk and accumulator sequential sampling processes. The models are evaluated using data from an experiment where participants classify text documents presented one word at a time under task instructions that emphasize either speed or accuracy, and rate their confidence in their decisions. Fitting the random walk and accumulator models to these data shows that the accumulator provides a better account of the decisions made, and a “balance of evidence” measure provides the best account of confidence. Both models are also evaluated in the applied information retrieval context, by comparing their performance to established machine learning techniques on the standard Reuters-21578 corpus. It is found that they are almost as accurate as the benchmarks, and make decisions much more quickly because they only need to examine a small proportion of the words in the document. In addition, the ability of the accumulator model to produce useful confidence measures is shown to have application in prioritizing the results of classification decisions. [Copyright &y& Elsevier]
- Published
- 2003
- Full Text
- View/download PDF
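The two sequential sampling schemes modeled in the abstract above can be sketched compactly. The per-word evidence dictionary (a log-likelihood-ratio stand-in) and the thresholds are assumptions for illustration:

```python
# Two sequential sampling classifiers that read a document word by word:
# a random walk (one signed total, two boundaries) and an accumulator
# (separate totals per class, with |pro - con| as "balance of evidence").
def random_walk_classify(words, evidence, threshold=3.0):
    """Accumulate signed evidence; decide when either boundary is crossed."""
    total = 0.0
    for n, word in enumerate(words, start=1):
        total += evidence.get(word, 0.0)
        if abs(total) >= threshold:
            return ("on-topic" if total > 0 else "off-topic"), n
    return ("on-topic" if total > 0 else "off-topic"), len(words)

def accumulator_classify(words, evidence, criterion=3.0):
    """Keep separate pro/con totals; the balance of evidence |pro - con|
    serves as the confidence measure the paper finds most accurate."""
    pro = con = 0.0
    for n, word in enumerate(words, start=1):
        e = evidence.get(word, 0.0)
        if e > 0:
            pro += e
        else:
            con -= e
        if pro >= criterion or con >= criterion:
            return ("on-topic" if pro > con else "off-topic"), n, abs(pro - con)
    return ("on-topic" if pro > con else "off-topic"), len(words), abs(pro - con)
```

Both classifiers stop as soon as the evidence suffices, which mirrors the paper's finding that only a small proportion of a document's words need be examined.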
40. Research for Informed Decisions: An Evolving Model of Applied Research.
- Author
-
Nye, F. Ivan
- Subjects
- *
FAMILIES , *DECISION making , *DECISION theory , *PROBLEM solving , *EDUCATORS , *INFORMATION retrieval - Abstract
In a society that is not only complex and rapidly changing but has also lost much of its normative structure, individuals, families, and other groups are often confronted with decisions that may have major effects on them for many years, yet they may lack adequate information on which to base these decisions. In most instances there is much information that could help them make decisions consistent with their own values and goals. This paper proposes a model through which researcher/educators can identify critical issues that confront large numbers of persons in the society, find the facts from research relevant to the issues, integrate these findings, and, if necessary, restate them in language appropriate to the persons affected. It also presents innovative ways of disseminating this information to those faced with decisions concerning the issues. The material employed in this evolving research model is drawn from library research rather than from the collection and analysis of new data. Examples are drawn from two recent applied family projects. [ABSTRACT FROM AUTHOR]
- Published
- 1982
- Full Text
- View/download PDF
41. Text analytics for big data using rough–fuzzy soft computing techniques.
- Author
-
Al‐Maitah, Mohammed
- Subjects
- *
SOFT computing , *BIG data , *PARTICLE swarm optimization , *PROCESS mining , *INFORMATION retrieval , *FUZZY algorithms , *VISUAL analytics - Abstract
Text mining or analytics is important for various applications such as market analysis and biomedical purposes because it enables the efficient retrieval of information from large datasets. During the analysis, increasing the dimensionality of the data reduces the performance of the entire system because doing so may retrieve irrelevant text, which creates errors. Therefore, this paper introduces big data and data mining techniques to analyse large volumes of information while mining texts, emails, blogs, online forums, news, and call centre documents. Initially, the data are collected from various sources and contain noise, which is removed by applying normalization techniques. Data mining techniques eliminate the irrelevant information and noise, and the relevant features are selected using the rough set-based particle swarm optimization algorithm. The selected features are clustered using a fuzzy set with the particle swarm optimization algorithm, which improves the efficiency of the mining process. Then, the efficiency of the system is evaluated using the University of California Irvine Machine Learning Repository knowledge process mining database, along with the sum of the intra-cluster distances, the mean squared error rate, and the accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
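The feature-selection step described in the abstract above rests on binary particle swarm optimization. A minimal, self-contained sketch of the general scheme; the fitness function is a caller-supplied stand-in for the rough set dependency measure, and all parameter values are illustrative assumptions:

```python
# Binary PSO for feature selection: each particle is a boolean mask over
# features; velocities pass through a sigmoid to give bit-flip probabilities.
import math
import random

def binary_pso(n_features, fitness, swarm=10, iters=30, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.random() < 0.5 for _ in range(n_features)] for _ in range(swarm)]
    vel = [[0.0] * n_features for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pscore = [fitness(p) for p in pos]
    g = max(range(swarm), key=lambda i: pscore[i])
    gbest, gscore = pbest[g][:], pscore[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = random.random() < 1 / (1 + math.exp(-vel[i][d]))
            s = fitness(pos[i])
            if s > pscore[i]:
                pbest[i], pscore[i] = pos[i][:], s
                if s > gscore:
                    gbest, gscore = pos[i][:], s
    return gbest, gscore

# Toy fitness: reward masks close to a hypothetical 'ideal' feature subset.
ideal = [True, False, True, False, True]
best, score = binary_pso(5, lambda m: sum(a == b for a, b in zip(m, ideal)))
```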
42. A systematic review on deep learning architectures and applications.
- Author
-
Khamparia, Aditya and Singh, Karan Mehtab
- Subjects
- *
DEEP learning , *INFORMATION retrieval , *MACHINE learning , *ARTIFICIAL neural networks , *COMPUTER network architectures - Abstract
The amount of digital data in the universe is growing at an exponential rate, doubling every 2 years, and changing how we live in the world. Information storage capacity and data requirements have crossed into the zettabyte range. With this level of bombardment of data on machine learning techniques, it becomes very difficult to carry out parallel computations. Deep learning is broadening its scope and gaining more popularity in natural language processing, feature extraction and visualization, and almost every machine learning trend. The purpose of this study is to provide a brief review of deep learning architectures and their working. Research papers and conference proceedings from various authentic resources (Institute of Electrical and Electronics Engineers, Wiley, Nature, and Elsevier) are studied and analyzed. Different architectures and their effectiveness at solving domain-specific problems are evaluated. Various limitations and open problems of current architectures are discussed to provide better insights and to help researchers and students pursue research on these issues. One hundred one articles were reviewed for this meta-analysis of deep learning. From this analysis, it is concluded that advanced deep learning architectures are combinations of a few conventional architectures. For example, the deep belief network and the convolutional neural network are combined to build the convolutional deep belief network, which has higher capabilities than its parent architectures. These combined architectures are more robust in exploring the problem space and thus may be the answer to building a general-purpose architecture. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
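The review's central observation, that advanced architectures compose a few conventional building blocks, can be made concrete with one such block. A minimal numpy sketch of a single convolution + activation + pooling stage (the shapes and random inputs are illustrative assumptions; deeper architectures repeat and hybridize stages like this):

```python
# One convolutional stage: valid 2-D convolution (cross-correlation, as in
# most networks), ReLU activation, then non-overlapping max pooling.
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# One forward stage on a toy 8x8 input with a random 3x3 kernel.
features = max_pool(relu(conv2d(np.random.rand(8, 8), np.random.rand(3, 3))))
```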
43. Type of Submissions.
- Subjects
- *
REPORT writing , *INFORMATION science , *INFORMATION resources management , *INFORMATION resources , *INFORMATION retrieval , *COMMUNICATION , *INFORMATION technology , *TECHNOLOGY - Abstract
The article describes the types of paper submissions that will be accepted by the program committee of the American Society for Information Science and Technology. There are four types of submissions: 1) contributed papers; 2) contributed posters and short papers; 3) symposia and panels; and 4) pre-conference sessions. For instance, contributed papers present original, recent research and design projects, theoretical developments, or innovative practical applications that provide more general insight into an area of practice.
- Published
- 2005
44. What's New? Selected Abstracts from JASIS&T.
- Subjects
- *
INFORMATION science , *INFORMATION resources , *LIBRARY science , *RECORDS management , *INFORMATION retrieval , *CITATION indexes , *CITATION of electronic information resources - Abstract
This article presents abstracts of research papers related to information science. A study titled "Role-Related Library Use by Local Union Officials," by M.A. Chaplan and E.J. Hertenstein, presents information on a group of library users that has not been studied in decades and relates their library use to their union roles. The study collected specific information about library use, both to understand their information-seeking behavior and to learn how to improve library service to this group. It also proposes a possible model for union officials' information-seeking behavior, with directions for further research to determine the applicability of the model. Another study, "Web Citation Data for Impact Assessment: A Comparison of Four Science Disciplines," by L. Vaughan and D. Shaw, sampled 5,972 articles published in 114 journals covering four science disciplines. The researchers searched for Web citations to these articles and found that the numbers of citations correlated.
- Published
- 2005
45. Inside ASIS&T.
- Subjects
- *
ASSOCIATIONS, institutions, etc. , *INFORMATION science , *DIGITAL libraries , *INFORMATION retrieval , *INFORMATION storage & retrieval systems , *CYBERNETICS - Abstract
This article presents news briefs related to the American Society for Information Science and Technology's members and chapters. The Special Interest Group (SIG)/International Information Issues raised over $7000 in just two months this spring for the 6th International Paper Contest on International Digital Libraries and Information Science and Technology Advances in Developing Countries. With fundraising continuing, SIG officials expressed their appreciation and their expectation of ongoing support from the information community. For its June 2005 meeting, the Los Angeles Chapter of the American Society for Information Science and Technology combined a professional development workshop, Personal Knowledge Management: Coping with Information Overload, with a business meeting. Karen Fisher, Joan Durrance, and Marian Bouch Hinton are the winners of ALA's 2005 Jesse H. Shera Award for Distinguished Published Research for their article "Information Grounds and the Use of Need-based Services by Immigrants in Queens, NY: A Context-based, Outcome Evaluation Approach."
- Published
- 2005
- Full Text
- View/download PDF