86 results on '"Rocchio algorithm"'
Search Results
2. Utilizing Center-Based Sampling Theory to Enhance Particle Swarm Classification of Textual Data
- Author
Yahya, Anwar Ali, Asiri, Yousef, Alattab, Ahmed Abdu, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Fujita, Hamido, editor, Selamat, Ali, editor, Lin, Jerry Chun-Wei, editor, and Ali, Moonis, editor
- Published
- 2021
3. Block-based pseudo-relevance feedback for image retrieval.
- Author
Lin, Wei-Chao
- Subjects
IMAGE retrieval; INFORMATION retrieval; PSYCHOLOGICAL feedback
- Abstract
Pseudo-relevance feedback (PRF) is a relevance feedback (RF) technique for information retrieval that treats the top k retrieved images as relevance feedback. PRF is used to avoid the limitations of the traditional RF approach, which is a human-in-the-loop process. Although the pseudo-relevance feedback set contains noise, PRF can perform retrieval reasonably effectively. For implementing PRF, the Rocchio algorithm has been considered reasonably effective and is a well-established baseline method. However, it simply treats all of the top k feedback images as being equally similar to the query. Therefore, we present a block-based PRF approach for improving image retrieval performance. In this approach, images in the positive and negative feedback sets are further divided into predefined blocks, each of which contains one to several images, and blocks containing higher- or lower-ranked images will be assigned higher or lower weights, respectively. Experiments using the NUS-WIDE-LITE and Caltech 256 datasets and two different feature representations consistently show that the proposed approach using 30 blocks outperforms the baseline PRF in terms of P@10, P@20, and P@50. Furthermore, we show that a system that incorporates the user's feedback allows the 30-block-based PRF approach to perform even better. [ABSTRACT FROM AUTHOR]
- Published
- 2022
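The block-weighting idea in this abstract can be illustrated with a minimal pure-Python sketch. It is not the author's implementation. The following are all assumptions made for illustration: the linearly decreasing block weights, the Rocchio coefficients `alpha=1.0`, `beta=0.75`, `gamma=0.15`, and the helper names `block_weights`, `weighted_mean`, and `block_rocchio`.

```python
from typing import List

Vector = List[float]


def block_weights(n_items: int, n_blocks: int) -> List[float]:
    """Weight ranked feedback items by the block they fall into.

    Assumption (illustrative only): items, ordered best rank first, are split
    into n_blocks contiguous blocks of near-equal size, and block b
    (0 = top-ranked) gets raw weight n_blocks - b. Weights sum to 1.
    """
    if n_items == 0:
        return []
    base, extra = divmod(n_items, n_blocks)
    weights: List[float] = []
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)  # earlier blocks absorb the remainder
        weights.extend([float(n_blocks - b)] * size)
    total = sum(weights)
    return [w / total for w in weights]


def weighted_mean(vectors: List[Vector], weights: List[float], dim: int) -> Vector:
    """Weighted sum of vectors (weights already normalized); zero vector if empty."""
    out = [0.0] * dim
    for vec, w in zip(vectors, weights):
        for i in range(dim):
            out[i] += w * vec[i]
    return out


def block_rocchio(query: Vector, positives: List[Vector], negatives: List[Vector],
                  n_blocks: int = 3, alpha: float = 1.0, beta: float = 0.75,
                  gamma: float = 0.15) -> Vector:
    """Rocchio query update using block-weighted (not uniform) feedback means."""
    dim = len(query)
    pos_mean = weighted_mean(positives, block_weights(len(positives), n_blocks), dim)
    neg_mean = weighted_mean(negatives, block_weights(len(negatives), n_blocks), dim)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos_mean, neg_mean)]


# Three pseudo-positive images (ranked 1..3) and one pseudo-negative image.
new_query = block_rocchio([1.0, 0.0],
                          [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0]],
                          [[0.0, 2.0]])
print(new_query)  # approximately [1.375, 0.325]
```

Setting `n_blocks=1` gives every feedback item the same weight, which recovers the ordinary Rocchio mean. The rank-dependent block weights are the only part this sketch changes.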
4. Adversarial Attacks on Content-Based Filtering Journal Recommender Systems.
- Author
Zhaoquan Gu, Yinyin Cai, Sheng Wang, Mohan Li, Jing Qiu, Shen Su, Xiaojiang Du, and Zhihong Tian
- Subjects
RECOMMENDER systems; ELECTRONIC journals; FILTERS & filtration; PERIODICAL publishing
- Abstract
Recommender systems help people find what they actually need. Academic papers are important achievements for researchers, who often have many journals to choose from when submitting their work. To make selecting the most suitable journal more efficient, journal recommender systems (JRS) can automatically suggest a small number of candidate journals based on key information such as the title and the abstract. However, users or journal owners may attack the system for their own purposes. In this paper, we discuss adversarial attacks against content-based filtering JRS. We propose both a targeted attack method, which makes some target journals appear more often in the recommendations, and a non-targeted attack method, which makes the system provide incorrect recommendations. We also conduct extensive experiments to validate the proposed methods. We hope this paper helps improve JRS by raising awareness of such adversarial attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2020
5. Rocchio Algorithm to Enhance Semantically Collaborative Filtering
- Author
Ben Ticha, Sonia, Roussanaly, Azim, Boyer, Anne, Bsaïes, Khaled, van der Aalst, Wil M.P., Series editor, Mylopoulos, John, Series editor, Rosemann, Michael, Series editor, Shaw, Michael J., Series editor, Szyperski, Clemens, Series editor, Monfort, Valérie, editor, and Krempels, Karl-Heinz, editor
- Published
- 2015
6. A New Similarity Measure for Document Classification and Text Mining.
- Author
Eminağaoğlu, Mete and Gökșen, Yılmaz
- Subjects
TEXT mining; DECISION support systems; INFORMATION retrieval; KNOWLEDGE management; CUSTOMER relationship management; PEARSON correlation (Statistics)
- Abstract
Accurate, efficient, and fast processing of textual data and classification of electronic documents have become key factors in knowledge management and related businesses. Text mining, information retrieval, and document classification systems have a strong positive impact on digital libraries, electronic content management, e-marketing, electronic archives, customer relationship management, decision support systems, copyright infringement, and plagiarism detection, all of which directly affect economics, businesses, and organizations. In this study, we propose a new similarity measure that can be used with k-nearest neighbors (k-NN) and Rocchio algorithms, two well-known algorithms for document classification, information retrieval, and other text mining tasks. We tested our novel similarity measure on several structured textual data sets and compared the results with other standard distance metrics and similarity measures such as cosine similarity, Euclidean distance, and the Pearson correlation coefficient. We obtained promising results, which show that the proposed similarity measure could be used as an alternative within all suitable algorithms, methods, and models for text mining, document classification, and related knowledge management systems. [ABSTRACT FROM AUTHOR]
- Published
- 2020
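The abstract does not define the proposed measure, so that measure is not reproduced here. As a point of reference, the three baselines it is compared against can be sketched in a few lines of standard-library Python. The function names are illustrative. The convention of returning 0.0 for zero or constant vectors is an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b) if sd_a and sd_b else 0.0


# Two term-frequency vectors where one document is the other repeated twice.
doc1 = [1.0, 2.0, 0.0]
doc2 = [2.0, 4.0, 0.0]
print(cosine_similarity(doc1, doc2))    # ~1.0 (same direction)
print(euclidean_distance(doc1, doc2))   # ~2.236 (lengths differ)
print(pearson_correlation(doc1, doc2))  # ~1.0
```

Cosine similarity and Pearson correlation treat the two vectors as equivalent because one is a scaled copy of the other. Euclidean distance does not. This length sensitivity is one reason cosine similarity is commonly preferred for text vectors.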
7. Improving Rocchio Algorithm for Updating User Profile in Recommender Systems
- Author
Wang, Chong, Shen, Yao, Yang, Huan, Guo, Minyi, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Lin, Xuemin, editor, Manolopoulos, Yannis, editor, Srivastava, Divesh, editor, and Huang, Guangyan, editor
- Published
- 2013
8. A Text Classification Algorithm Based on Rocchio and Hierarchical Clustering
- Author
Zeng, Anping, Huang, Yongping, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Huang, De-Shuang, editor, Gan, Yong, editor, Bevilacqua, Vitoantonio, editor, and Figueroa, Juan Carlos, editor
- Published
- 2012
9. Swarm intelligence-based approach for educational data classification.
- Author
Yahya, Anwar Ali
- Subjects
SWARM intelligence; PARTICLE swarm optimization; FEATURE selection; MACHINE learning; TAXONOMY
- Abstract
This paper explores the effectiveness of Particle Swarm Classification (PSC) for a classification task in the field of educational data mining. More specifically, it proposes PSC to design a classification model capable of classifying questions into the six cognitive levels of Bloom's taxonomy. To this end, this paper proposes a novel specialized initialization mechanism based on the Rocchio Algorithm (RA) to mitigate the adverse effects of the curse of dimensionality on PSC performance. Furthermore, several feature selection approaches are investigated in the design of the RA-based PSC model of question classification. In doing so, a dataset of teachers' classroom questions was collected, manually annotated with Bloom's cognitive levels, and transformed into a vector space representation. Several experiments were conducted on this dataset, and the results show poor performance of the standard PSC due to the curse of dimensionality. However, when the proposed RA-based initialization mechanism is used, the average performance improves significantly, from 0.243 to 0.663. In addition, the results indicate that the feature selection approaches affect the performance of the RA-based PSC (average performance ranges from 0.535 to 0.708). Finally, a comparison between the performance of RA-based PSC (average performance = 0.663) and seven machine learning approaches (best average performance = 0.646) confirms the effectiveness of the proposed RA-based PSC approach. [ABSTRACT FROM AUTHOR]
- Published
- 2019
10. Centroid particle swarm optimisation for high-dimensional data classification.
- Author
Yahya, Anwar Ali
- Subjects
PARTICLE swarm optimization; SWARM intelligence; GENETIC algorithms; ARTIFICIAL intelligence; MATHEMATICAL optimization
- Abstract
This paper proposes a new variant of Particle Swarm Optimization (PSO), dubbed CentroidPSO, to tackle the data classification problem in high-dimensional domains. It is inspired by center-based sampling theory, which states that points in the center region of a search space have a higher probability of being closer to the optimal solution. The experimental results show striking performance of CentroidPSO compared to the standard PSO, four closely related PSO variants, and three recent evolutionary computation approaches. Moreover, a comparison with three machine learning approaches indicates that CentroidPSO is a very competitive and promising classifier. [ABSTRACT FROM AUTHOR]
- Published
- 2018
11. Block-based pseudo-relevance feedback for image retrieval
- Author
Wei-Chao Lin
- Subjects
Rocchio algorithm; Computer science; business.industry; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Relevance feedback; 02 engineering and technology; Theoretical Computer Science; Artificial Intelligence; 020204 information systems; Block (telecommunications); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; business; Image retrieval; Software
- Abstract
Pseudo-relevance feedback (PRF) is a relevance feedback (RF) technique for information retrieval that treats the top k retrieved images as relevance feedback. PRF is used to avoid the limitations o...
- Published
- 2021
12. Mobile-based digital dictionary of medicinal plants using the Rocchio algorithm (KAMUS DIGITAL TANAMAN OBAT MENGGUNAKAN ALGORITMA ROCCHIO BERBASIS MOBILE)
- Author
Triastinurmiatiningsih, Arie Qur'ania, and Nazar Muhamad Ikhbal
- Subjects
Rocchio algorithm; Information retrieval; Similarity (network science); business.industry; Computer science; Word processing; Relevance feedback; Word search; business; Digital media; Weighting; Term (time)
- Abstract
Digital dictionaries have been widely used to facilitate word processing and word search through digital media such as mobile phones. People generally learn the efficacy of medicinal plants and how to mix them from the experience of previous generations or through books and other writings. Searching through books or writings takes longer than searching through digital media such as a digital dictionary. The research aims to create a mobile-based digital dictionary of medicinal plants with a search facility based on the words entered, for example, the contents of the medicinal plants. The application uses a search method based on the Rocchio algorithm with relevance feedback, checking the proximity of the query to the average of the relevant documents. Similarity is calculated after tokenizing, filtering, stemming, and term weighting, over a dataset of 200 medicinal plants.
- Published
- 2020
13. Query bot for retrieving patients’ clinical history: a COVID-19 use-case
- Author
Hari Trivedi, Judy Wawira Gichoya, Fiza M. Khan, Yibo Wang, Imon Banerjee, and Amara Tariq
- Subjects
Computer science; media_common.quotation_subject; Information Storage and Retrieval; Relevance feedback; Health Informatics; Feedback; Contextual design; Humans; Word2vec; information retrieval; clinical notes; Cluster analysis; media_common; Original Research; Rocchio algorithm; relevance feedback; Information retrieval; SARS-CoV-2; COVID-19; Ambiguity; Computer Science Applications; K-Means; Language model; Algorithms; Natural language; BERT
- Abstract
Objective: As patient data grow more complex and are stored in fragmented health information systems, automated and time-efficient ways of gathering important information from patients' medical histories are needed for effective clinical decision making. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback that allows clinicians to ask natural-language questions to retrieve data from patient notes.
Materials and methods: We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, using K-Means to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance.
Results: In an iterative feedback loop experiment, MAP for the final iteration was 0.93/0.94, compared to an initial MAP of 0.66/0.52, for generic queries, and 1.0/1.0, compared to 0.79/0.83, for COVID-19-specific queries. These results confirm that the contextual model handles the ambiguity in natural language queries and that feedback helps improve retrieval performance. The user-in-the-loop experiment also outperformed the automated pseudo relevance feedback method. Moreover, the null hypothesis of identical precision between initial retrieval and relevance feedback was rejected with high statistical significance (p ≪ 0.05). Compared to Word2Vec, TF-IDF, and bioBERT models, clinicalBERT offers the best balance between response precision and user feedback.
Discussion: Our model works well for generic as well as COVID-19-specific queries. However, some generic queries are answered less well than others because clustering reduces query performance and vague relations between queries and sentences are considered non-relevant. We also tested our model on queries with the same meaning but different expressions and showed that these query variations yielded similar performance after user feedback was incorporated.
Conclusion: We developed an NLP-based query bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User feedback is critical to improving model performance.
- Published
- 2021
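This abstract reports retrieval quality as MAP (mean average precision). A minimal sketch of the standard binary-relevance definition follows. The function names and toy rankings are hypothetical. The abstract does not state the paper's exact evaluation protocol, such as any rank cutoff.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank over the ranks where a relevant document appears.

    Divides by the total number of relevant documents, so relevant documents
    that are never retrieved count as zero precision.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average of per-query average precision."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))  # 0.6667
```

In an iterative feedback loop like the one described, MAP would be recomputed after each Rocchio update and compared with the value for the initial retrieval.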
14. The effect of low-level image features on pseudo relevance feedback.
- Author
Lin, Wei-Chao, Chen, Zong-Yao, Ke, Shih-Wen, Tsai, Chih-Fong, and Lin, Wei-Yang
- Subjects
IMAGE processing; FEATURE extraction; FEEDBACK control systems; IMAGE retrieval; PERFORMANCE evaluation; ALGORITHMS
- Abstract
Relevance feedback (RF) is a technique popularly used to improve the effectiveness of traditional content-based image retrieval systems. However, users must provide relevant and/or irrelevant images as feedback for their queries, which is a tedious task. To alleviate this problem, pseudo relevance feedback (PRF) can be utilized. It not only automates the manual component of RF, but can also provide reasonably good retrieval performance. Specifically, it is assumed that a fraction of the top-ranked images in the initial search results are pseudo-positive. The Rocchio algorithm is a classic approach for implementing RF/PRF and is based on the query vector modification discipline. It produces a new query vector as a weighted sum of the original query and the mean vectors of the relevant and irrelevant sets, with the irrelevant mean given a negative weight. Image feature representation is the key factor affecting PRF performance. This study is the first to examine the retrieval performance of 63 different image feature descriptors, with dimensionalities ranging from 64 to 10426, in the context of PRF. Experimental results are obtained on the NUS-WIDE dataset, which contains 22156 Flickr images associated with 69 concepts. The combination of color moments, edges, wavelet textures, and locality-constrained linear coding of the bag-of-words model is shown to provide the optimal feature representation, giving relatively good retrieval effectiveness and reasonably good retrieval efficiency for Rocchio-based PRF. [ABSTRACT FROM AUTHOR]
- Published
- 2015
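The query-vector modification described in this abstract is usually written as `q_new = alpha * q + beta * mean(relevant) - gamma * mean(irrelevant)`. The sketch below implements that update in standard-library Python. Two details are common conventions assumed here, not values taken from the study: the coefficient values (`alpha=1.0`, `beta=0.75`, `gamma=0.15`) and the clipping of negative weights to zero. The helper names are hypothetical.

```python
from typing import List

Vector = List[float]


def centroid(vectors: List[Vector], dim: int) -> Vector:
    """Component-wise mean of the vectors (zero vector if the set is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(query: Vector, relevant: List[Vector], irrelevant: List[Vector],
                   alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15,
                   clip_negative: bool = True) -> Vector:
    """q_new = alpha * q + beta * mean(relevant) - gamma * mean(irrelevant)."""
    dim = len(query)
    rel_mean = centroid(relevant, dim)
    irr_mean = centroid(irrelevant, dim)
    new_query = [alpha * q + beta * r - gamma * n
                 for q, r, n in zip(query, rel_mean, irr_mean)]
    if clip_negative:
        # Negative feature weights are often set to zero after the update.
        new_query = [max(0.0, w) for w in new_query]
    return new_query


# In PRF, `relevant` would hold the top-ranked images' feature vectors.
query = [1.0, 0.0, 0.0]
relevant = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
irrelevant = [[0.0, 0.0, 2.0]]
print(rocchio_update(query, relevant, irrelevant))  # [1.75, 0.375, 0.0]
```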
15. Text categorization based on improved Rocchio algorithm.
- Author
Gao, Guanyu and Guan, Shengxiao
- Abstract
Text categorization assigns each text document to predefined categories. This paper presents a new method for classifying Chinese text based on the Rocchio algorithm. We first use TF-IDF to extract document vectors from correctly categorized training documents, and then use those vectors to generate codebooks as classification models using the LBG and Rocchio algorithms. The codebook is then used to categorize the target documents by vector scores. We tested this method experimentally, and the results show that it achieves better performance. [ABSTRACT FROM PUBLISHER]
- Published
- 2012
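The abstract does not describe the LBG codebook step in enough detail to reproduce it. What can be sketched is the basic Rocchio (nearest-centroid) classifier that the method builds on. It computes TF-IDF vectors for the training documents, builds one centroid prototype per category, and assigns a new document to the most cosine-similar prototype. The smoothed IDF formula, the toy corpus, and all function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency: log(N / df) + 1 (an assumed variant)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}


def tfidf(doc: List[str], idf: Dict[str, float]) -> SparseVec:
    """Raw term frequency times IDF; terms unseen in training are dropped."""
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a: SparseVec, b: SparseVec) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[SparseVec]) -> SparseVec:
    """Component-wise mean of sparse vectors."""
    out: SparseVec = {}
    for vec in vectors:
        for term, w in vec.items():
            out[term] = out.get(term, 0.0) + w / len(vectors)
    return out


def train_rocchio(docs: List[List[str]],
                  labels: List[str]) -> Tuple[Dict[str, float], Dict[str, SparseVec]]:
    """Return the IDF table and one centroid prototype per category."""
    idf = fit_idf(docs)
    by_label: Dict[str, List[SparseVec]] = {}
    for doc, label in zip(docs, labels):
        by_label.setdefault(label, []).append(tfidf(doc, idf))
    return idf, {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(doc: List[str], idf: Dict[str, float],
             prototypes: Dict[str, SparseVec]) -> str:
    """Assign the category whose prototype is most cosine-similar to doc."""
    vec = tfidf(doc, idf)
    return max(prototypes, key=lambda label: cosine(vec, prototypes[label]))


train_docs = [["stock", "market", "price"], ["market", "trade", "stock"],
              ["football", "goal", "match"], ["match", "team", "goal"]]
train_labels = ["finance", "finance", "sport", "sport"]
idf, prototypes = train_rocchio(train_docs, train_labels)
print(classify(["stock", "price", "rise"], idf, prototypes))  # finance
print(classify(["goal", "team"], idf, prototypes))            # sport
```

Many Rocchio text classifiers also subtract a weighted centroid of the other categories from each prototype. This sketch leaves out that variant and the paper's codebook construction.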
16. A content-based recommender system with consideration of repeat purchase behavior.
- Author
Kuo, R.J. and Cheng, Hong-Ruei
- Subjects
CONSUMER behavior; RECOMMENDER systems; CONSUMER preferences; PSYCHOLOGICAL feedback; VERNACULAR architecture; CUSTOMER feedback; ONLINE shopping
- Abstract
With the increasing popularity of online shopping and the information explosion, personalized recommender systems for e-commerce have become increasingly necessary. They help customers find desired products efficiently across a variety of categories based on their previous behavior, such as buying patterns and rating history. However, most recommender systems for e-commerce adopt binary (purchase/non-purchase) or subjective weighting methods to represent customer preferences. This makes it hard to predict customer profiles precisely, because tastes change rapidly. Therefore, this study focuses on the application of transactional data. A personalized recommender system for e-commerce (PROSE) is proposed to enhance the quality of recommendations. It integrates the architecture of a traditional content-based recommender system with a new component called a feedback adjuster. The feedback adjuster is designed to make customers' implicit feedback reflect their actual preferences as closely as possible by taking their repeat purchase behavior into account. The computational results indicate that the proposed algorithm outperforms other algorithms.
• Propose a personalized recommender system for e-commerce.
• The proposed algorithm integrates the architecture of a traditional content-based recommender system with a feedback adjuster.
• The computational results indicate that the proposed algorithm is able to outperform other existing algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. Information Retrieval based Improvising Search using Automatic Query Expansion
- Author
Mayura Kulkarni and Shubhangi Kale
- Subjects
Rocchio algorithm; Thesaurus (information retrieval); Vocabulary; Query expansion; Search engine; Web search query; Information retrieval; Computer science; media_common.quotation_subject; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Cosine similarity; Relevance feedback; media_common
- Abstract
Current search engines such as Google and Yahoo return search results, yet users still face problems in information retrieval. The main causes are word mismatch and the availability of many resources. Petabytes of information are available on the Internet, and in that huge volume of data it becomes hard for a naive user to distinguish information relevant to their individual interest from irrelevant information. Another reason for retrieving irrelevant information is the mismatch between the terms users use and the keywords present in documents. Query expansion adds keywords to the original query. The main issue in query expansion is selecting appropriate terms to add to the user's original query. A vocabulary database helps to solve this issue. A vocabulary, which is frequently incorporated into information retrieval systems, identifies words and language entities that are similar in meaning. Thesauri have been used widely in information retrieval, natural language processing, and many other applications. In this work, the BM25 model is used for query expansion to improve the performance of a search query. Cosine similarity is used to determine the similarity between two keywords. The Rocchio algorithm is used to calculate the relevance feedback. Experimental results show improved results when the Rocchio algorithm is used.
- Published
- 2021
18. Improving Pseudo Relevance Feedback with Term Relationship using Firefly Algorithm
- Author
Muhammad Fikri Hasani and Rila Mandala
- Subjects
Rocchio algorithm; Query expansion; Jaccard index; Computer science; Relevance feedback; Firefly algorithm; Data mining; computer.software_genre; computer; Word (computer architecture); Term (time); Weighting
- Abstract
When users search for information with an information retrieval (IR) system, the documents returned sometimes do not match their information needs. Pseudo Relevance Feedback (PRF)-based Query Expansion (QE) tries to overcome this problem by adding words, drawn from the top N ranked retrieved documents, that are expected to improve retrieval results. A previous study showed that using the firefly algorithm (FA), an optimization method, improves the performance of the IR system. However, in that study words were weighted using the Rocchio function of the Pseudo Relevant Documents (PRD). As a result, the performance of the IR system may degrade when the PRD contain few or no relevant documents. Therefore, this study scores terms by the term relationship between the query and the PRD, combined with the Rocchio algorithm. The results show that using word co-occurrence or word similarity as the term relationship can improve the performance of the previously built IR system. In addition, word co-occurrence with Jaccard performs best compared to the previous study and to the other combinations. FA itself was able to choose the optimal terms even as the number of top N ranked documents increased. Furthermore, combining the term relationship with the Rocchio algorithm converges faster than the variants without the Rocchio algorithm.
- Published
- 2020
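This abstract reports that word co-occurrence scored with the Jaccard coefficient worked best as the term relationship. The sketch below follows one plausible reading of that approach. Each candidate expansion term is scored by summing its document-level Jaccard coefficient with every query term inside the pseudo relevant documents (PRD). The summation, the toy PRD, and the function names are assumptions. The paper's firefly-based term selection and Rocchio weighting are not reproduced.

```python
from typing import Dict, List, Set


def build_doc_sets(docs: List[List[str]]) -> Dict[str, Set[int]]:
    """Map each term to the set of document indices that contain it."""
    index: Dict[str, Set[int]] = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            index.setdefault(term, set()).add(i)
    return index


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_expansion_terms(query: List[str], prd: List[List[str]]) -> Dict[str, float]:
    """Co-occurrence score of each non-query term with the query terms."""
    index = build_doc_sets(prd)
    scores: Dict[str, float] = {}
    for term, docs_with_term in index.items():
        if term in query:
            continue
        scores[term] = sum(jaccard(docs_with_term, index.get(q, set())) for q in query)
    return scores


pseudo_relevant = [["solar", "panel", "energy"],
                   ["solar", "panel", "cost"],
                   ["wind", "energy"]]
scores = score_expansion_terms(["solar", "energy"], pseudo_relevant)
print(max(scores, key=scores.get))  # panel
```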
19. Improving Effectiveness Information Retrieval System Using Pseudo Irrelevance Feedback
- Author
Elvina and Rila Mandala
- Subjects
Rocchio algorithm; Query expansion; Improved performance; ComputingMethodologies_PATTERNRECOGNITION; Information retrieval; Computer science; Component (UML); InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Relevance feedback; Centroid; Filter (signal processing); Ranking (information retrieval)
- Abstract
Pseudo relevance feedback (PRF) applies relevance feedback automatically to enhance retrieval performance. PRF assumes that the k highest-ranking documents in the first retrieval are relevant and extracts query expansion terms from them. The Rocchio algorithm is a classical algorithm for implementing relevance feedback in vector space models. It forms a new query that moves toward the centroid of the relevant documents and away from the centroid of the irrelevant documents. However, in the relevance feedback method, irrelevant documents are ignored. In this paper, we propose a pseudo irrelevance feedback (PIRF) method that effectively supplies the irrelevant-document component of the Rocchio algorithm. Documents that rank highly but fall outside the k relevant documents, and documents that are not similar to any of the k relevant documents, yield good query expansion when used as irrelevant documents. The Rocchio algorithm uses PRF for the relevant-document component and the proposed method for the irrelevant-document component; this combination is denoted Roc PRF PIRF (filter). Experiments on the CISI dataset, testing several numbers of irrelevant documents, show that Roc PRF PIRF (filter) improves performance compared to the standard Rocchio algorithm and to the Rocchio algorithm with irrelevant documents but without the proposed method.
- Published
- 2020
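The feedback-set construction described in this abstract can be sketched as follows. The abstract does not give the selection rule precisely, so this sketch reads it as follows. The top `k` documents are taken as pseudo-relevant. Pseudo-irrelevant documents are those in the next `window` ranks whose cosine similarity to every pseudo-relevant document stays below `max_sim`. The parameter names and values, the Rocchio coefficients, and the helper names are assumptions, not the paper's settings.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_feedback(ranked: List[Vector], k: int, window: int,
                    max_sim: float) -> Tuple[List[Vector], List[Vector]]:
    """Split an initial ranking into pseudo-relevant and pseudo-irrelevant sets."""
    relevant = ranked[:k]  # PRF assumption: the top k are relevant
    candidates = ranked[k:k + window]  # high-ranking documents just outside the top k
    irrelevant = [doc for doc in candidates
                  if all(cosine(doc, rel) < max_sim for rel in relevant)]
    return relevant, irrelevant


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Component-wise mean (zero vector if the set is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def rocchio_prf_pirf(query: Vector, ranked: List[Vector], k: int = 2, window: int = 3,
                     max_sim: float = 0.5, alpha: float = 1.0, beta: float = 0.75,
                     gamma: float = 0.15) -> Vector:
    """Rocchio update with PRF relevant set and filtered pseudo-irrelevant set."""
    relevant, irrelevant = select_feedback(ranked, k, window, max_sim)
    dim = len(query)
    rel_mean, irr_mean = mean(relevant, dim), mean(irrelevant, dim)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_mean, irr_mean)]


ranked_docs = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0],   # top k = 2
               [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(select_feedback(ranked_docs, k=2, window=3, max_sim=0.5)[1])  # [[0.0, 0.0, 1.0]]
print(rocchio_prf_pirf([1.0, 0.0, 0.0], ranked_docs))  # [1.75, 0.375, -0.15]
```

In the toy ranking, the third and fifth documents are dropped from the irrelevant set because they resemble a pseudo-relevant document too closely. Only the dissimilar fourth document pushes the query away.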
20. Swarm intelligence-based approach for educational data classification
- Author
Anwar Ali Yahya
- Subjects
Rocchio algorithm; General Computer Science; business.industry; Computer science; Particle swarm optimization; Initialization; 020206 networking & telecommunications; Feature selection; 02 engineering and technology; Machine learning; computer.software_genre; Educational data mining; Swarm intelligence; lcsh:QA75.5-76.95; Taxonomy (general); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; lcsh:Electronic computers. Computer science; Artificial intelligence; Data mining; business; computer; Curse of dimensionality
- Abstract
This paper explores the effectiveness of Particle Swarm Classification (PSC) for a classification task in the field of educational data mining. More specifically, it proposes PSC to design a classification model capable of classifying questions into the six cognitive levels of Bloom's taxonomy. To this end, this paper proposes a novel specialized initialization mechanism based on the Rocchio Algorithm (RA) to mitigate the adverse effects of the curse of dimensionality on PSC performance. Furthermore, several feature selection approaches are investigated in the design of the RA-based PSC model of question classification. In doing so, a dataset of teachers' classroom questions was collected, manually annotated with Bloom's cognitive levels, and transformed into a vector space representation. Several experiments were conducted on this dataset, and the results show poor performance of the standard PSC due to the curse of dimensionality. However, when the proposed RA-based initialization mechanism is used, the average performance improves significantly, from 0.243 to 0.663. In addition, the results indicate that the feature selection approaches affect the performance of the RA-based PSC (average performance ranges from 0.535 to 0.708). Finally, a comparison between the performance of RA-based PSC (average performance = 0.663) and seven machine learning approaches (best average performance = 0.646) confirms the effectiveness of the proposed RA-based PSC approach.
Keywords: Particle swarm classification, Rocchio Algorithm, Educational data mining, Questions classification, Bloom's taxonomy
- Published
- 2019
21. RLIRank: Learning to Rank with Reinforcement Learning for Dynamic Search
- Author
Jianghong Zhou and Eugene Agichtein
- Subjects
FOS: Computer and information sciences; Rocchio algorithm; Artificial neural network; Computer science; business.industry; 02 engineering and technology; Machine learning; computer.software_genre; Computer Science - Information Retrieval; Domain (software engineering); 03 medical and health sciences; Search engine; 0302 clinical medicine; Ranking; 0202 electrical engineering, electronic engineering, information engineering; Key (cryptography); Reinforcement learning; 020201 artificial intelligence & image processing; Learning to rank; Artificial intelligence; business; computer; Information Retrieval (cs.IR); 030217 neurology & neurosurgery
- Abstract
To support complex search tasks, where the initial information requirements are complex or may change during the search, a search engine must adapt the information it delivers as the user's information requirements evolve. To support this dynamic ranking paradigm effectively, search result ranking must incorporate both the user feedback received and the information displayed so far. To address this problem, we introduce a novel reinforcement learning-based approach, RLIrank. We first build an adapted reinforcement learning framework to integrate the key components of dynamic search. Then, we implement a new Learning to Rank (LTR) model for each iteration of the dynamic search, using a recurrent Long Short-Term Memory (LSTM) neural network. The network estimates the gain of each next result, learning from each previously ranked document. To incorporate the user's feedback, we develop a word-embedding variation of the classic Rocchio algorithm to help guide the ranking toward high-value documents. These innovations enable RLIrank to outperform the previously reported methods from the TREC Dynamic Domain Track 2017 and to exceed all the methods in the 2016 TREC Dynamic Domain Track after multiple search iterations, advancing the state of the art for dynamic search.
Proceedings of The Web Conference 2020 (WWW '20), April 20-24, 2020, Taipei, Taiwan
- Published
- 2020
22. Centroid particle swarm optimisation for high-dimensional data classification
- Author
Anwar Ali Yahya
- Subjects
Clustering high-dimensional data; Rocchio algorithm; 0209 industrial biotechnology; Hardware_MEMORYSTRUCTURES; Computer science; Computer Science::Neural and Evolutionary Computation; Data classification; MathematicsofComputing_NUMERICALANALYSIS; Centroid; Particle swarm optimization; 02 engineering and technology; New variant; Educational data mining; Theoretical Computer Science; 020901 industrial engineering & automation; Artificial Intelligence; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Center (algebra and category theory); Algorithm; Software
- Abstract
This paper proposes a new variant of Particle Swarm Optimization (PSO), dubbed CentroidPSO, to tackle data classification problem in high dimensional domains. It is inspired by the center-based sam...
- Published
- 2018
23. Biomedical system based on the Discrete Hidden Markov Model using the Rocchio–Genetic approach for the classification of internal carotid artery Doppler signals
- Author
Uğuz, Harun, Güraksın, Gür Emre, Ergün, Uçman, and Saraçoğlu, Rıdvan
- Subjects
MEDICINE; CAROTID artery; DOPPLER effect; HIDDEN Markov models; GENETIC algorithms; MATHEMATICAL optimization
- Abstract
When the maximum likelihood (ML) approach is used to calculate the Discrete Hidden Markov Model (DHMM) parameters, the DHMM parameters of each class are calculated using only the training samples of that class (positive training samples). The training samples not belonging to that class (negative training samples) are not used in calculating the DHMM model parameters. To remedy this deficiency, a Rocchio algorithm-based approach is proposed that involves the training samples of all classes in the calculation. During calculation, a genetic algorithm is used as an optimization technique to determine the most appropriate values of the parameters that adjust the relative effect of the positive and negative training samples. The proposed method is used to classify internal carotid artery Doppler signals recorded from 136 patients and 55 healthy people. The proposed method reached 97.38% classification accuracy with fivefold cross-validation (CV). The classification results show that the proposed method is effective for classifying internal carotid artery Doppler signals. [Copyright Elsevier]
- Published
- 2011
24. Pairwise optimized Rocchio algorithm for text categorization
- Author
- Miao, Yun-Qian and Kamel, Mohamed
- Subjects
- *ALGORITHMS , *MATHEMATICAL category theory , *COMPARATIVE studies , *SUPPORT vector machines , *MATHEMATICAL optimization , *MATHEMATICAL analysis
- Abstract
This paper examines the Rocchio algorithm and its application to text categorization. Existing approaches that optimize the Rocchio algorithm's parameters globally end up choosing one fixed prototype to represent each category in multi-category text categorization problems. Consequently, they have limited power to discriminate between the distributions of different categories, and their parameter optimization relies on negative samples that span several categories and therefore have weak representation ability. We present a pairwise optimized Rocchio algorithm, which dynamically adjusts the prototype position between pairs of categories. Experiments were conducted on three benchmark corpora: 20-Newsgroup, Reuters-21578, and TDT2. The results confirm that the proposed pairwise method achieves encouraging performance improvements over the conventional Rocchio method. A comparative study with the top-notch text classifier Support Vector Machine (SVM) also shows that the pairwise Rocchio method achieves competitive results. [ABSTRACT FROM AUTHOR]
- Published
- 2011
25. Rough set based hybrid algorithm for text classification
- Author
- Miao, Duoqian, Duan, Qiguo, Zhang, Hongyun, and Jiao, Na
- Subjects
- *TEXT processing (Computer science) , *ROUGH sets , *ALGORITHMS , *TEXT mining , *DATA mining , *NEAREST neighbor analysis (Statistics)
- Abstract
Automatic classification of text documents, one of the essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available online. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method that builds no model in advance, kNN has a high cost to classify new documents when the training set is large. The Rocchio algorithm is another well-known and widely used technique for text classification. One drawback of the Rocchio classifier is that it restricts the hypothesis space to the set of linearly separable hyperplane regions. When the data do not fit this underlying assumption well, the Rocchio classifier performs poorly. In this paper, a hybrid algorithm based on variable precision rough sets is proposed to combine the strengths of both the kNN and Rocchio techniques and overcome their weaknesses. An experimental evaluation of different methods is carried out on two common text corpora, the Reuters-21578 collection and the 20-newsgroup collection. The experimental results indicate that the novel algorithm achieves a significant performance improvement. [Copyright © Elsevier]
- Published
- 2009
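The Rocchio classifier discussed in this abstract is often implemented as a nearest-prototype rule. The Python sketch below is a minimal illustration of that plain form only, not the rough-set hybrid proposed in the paper: each class prototype is assumed to be the mean of its unit-length training vectors (the Rocchio prototype with no negative-example term), and a document is assigned to the class whose prototype has the highest cosine similarity. The function names and the toy term-count vectors are hypothetical.

```python
import math
from collections import defaultdict


def unit(vec):
    """Scale a vector to unit L2 length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def train_rocchio(vectors, labels):
    """One prototype per class: the mean of that class's unit-length training vectors."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(vectors, labels):
        u = unit(vec)
        total = sums.setdefault(label, [0.0] * len(u))
        sums[label] = [t + x for t, x in zip(total, u)]
        counts[label] += 1
    return {label: [t / counts[label] for t in total] for label, total in sums.items()}


def classify(prototypes, vec):
    """Nearest-prototype rule: pick the class whose prototype is most cosine-similar."""
    return max(prototypes, key=lambda label: cosine(prototypes[label], vec))


# Hypothetical term-count vectors over the vocabulary [ball, goal, vote, party].
train = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]]
labels = ["sport", "sport", "politics", "politics"]
prototypes = train_rocchio(train, labels)
print(classify(prototypes, [1, 2, 0, 0]))  # sport
```

Because each decision compares a document with a single prototype per class, the class boundaries are hyperplanes. This is the linear restriction the abstract names, and it is why a class whose documents form several separate clusters can be poorly served by one centroid.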
26. A new approach based on a discrete hidden Markov model using the Rocchio algorithm for the diagnosis of heart valve diseases.
- Author
- Uğuz, Harun and Arslan, Ahmet
- Subjects
- *HEART disease diagnosis , *MARKOV processes , *MEDICAL imaging systems , *ARTIFICIAL neural networks , *ARTIFICIAL intelligence , *HEART diseases
- Abstract
Application of the Doppler ultrasound technique in the diagnosis of heart diseases has been increasing over the last decade because it is non-invasive, practicable, and reliable. In this study, a new approach based on the discrete hidden Markov model (DHMM) is proposed for the diagnosis of heart valve disorders. When hidden Markov model (HMM) parameters are calculated with the maximum likelihood approach, the parameters of each class are calculated only from training samples belonging to that class. In the proposed approach, the DHMM parameters are calculated from the training samples of the related class and also from the training samples of the other classes, so that the HMM parameters of a class reflect that class's characteristics more strongly than those of the other classes. To this end, a hybrid method adapting the Rocchio algorithm was used. The proposed system was used to classify Doppler signals obtained from the aortic and mitral heart valves of 215 subjects. Its performance was compared with the classification performance reported in previous studies that used the same data set, and the efficiency of the new approach was tested. The total classification accuracy of the proposed approach (95.12%) is higher than that of the standard DHMM (94.31%), continuous HMM (93.5%), and support vector machine (92.67%) classifiers employed in our previous studies, and comparable with the performance of classifications using artificial neural networks (95.12%) and fuzzy-C-means/CHMM (95.12%). [ABSTRACT FROM AUTHOR]
- Published
- 2008
27. Rocchio algorithm-based particle initialization mechanism for effective PSO classification of high dimensional data
- Author
- Anwar Ali Yahya, Mohammad Said El-Bashir, and Addin Osman
- Subjects
- Rocchio algorithm ,Clustering high-dimensional data ,General Computer Science ,Computer science ,business.industry ,General Mathematics ,Data classification ,MathematicsofComputing_NUMERICALANALYSIS ,Evolutionary algorithm ,Initialization ,Particle swarm optimization ,020206 networking & telecommunications ,02 engineering and technology ,computer.software_genre ,Machine learning ,ComputingMethodologies_ARTIFICIALINTELLIGENCE ,Educational data mining ,ComputingMethodologies_PATTERNRECOGNITION ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Data mining ,Artificial intelligence ,business ,computer ,Curse of dimensionality
- Abstract
In recent years, there has been growing interest in applying Particle Swarm Optimization (PSO) to data classification. However, due to the curse of dimensionality, the effectiveness of PSO for high-dimensional data classification is questionable. This paper proposes a novel PSO initialization mechanism developed specifically for PSO applications to high-dimensional data classification. The proposed mechanism is inspired by center-based sampling theory, which argues that the center of the search space is a promising region for the initialization step of evolutionary algorithms. It is based on an information retrieval algorithm called the Rocchio Algorithm (RA), which identifies the center region of the search space for data classification. To validate the proposed mechanism, RA-based PSO has been applied to a high-dimensional classification task in educational data mining: classifying a dataset of teachers' classroom questions into Bloom's taxonomy cognitive levels. To do so, a dataset of teachers' classroom questions was collected and manually annotated with Bloom's taxonomy cognitive levels, and pre-processing steps were applied to convert the questions into a representation suitable for classification. Using this dataset, the standard PSO, PSO with generic initialization mechanisms, and RA-based PSO were evaluated and compared. The results show poor performance for the standard PSO and for PSO with generic initialization mechanisms, and a significant improvement for RA-based PSO. These results indicate that a proper task-specific PSO initialization mechanism is crucial for effective PSO performance in high-dimensional data classification.
Furthermore, a comparison between RA-based PSO and pure RA classification provides a quantitative estimate of the roles of the initialization mechanism and the PSO search in classifying the dataset. In addition, a comparison between the RA-based PSO approach and three conventional machine learning approaches on the same dataset confirms the effectiveness of RA-based PSO for high-dimensional data classification. A comparison of computational time efficiency shows that RA-based PSO and the machine learning approaches are comparable in classification time. However, because PSO learning is time-consuming, its effectiveness is significantly reduced when learning time matters.
- Published
- 2017
28. A New Similarity Measure for Document Classification and Text Mining
- Author
- Mete Eminağaoğlu and Yılmaz Gökşen
- Subjects
- Rocchio algorithm ,Information retrieval ,Computer science ,business.industry ,Document classification ,Cosine similarity ,Similarity measure ,Digital library ,computer.software_genre ,ComputingMethodologies_PATTERNRECOGNITION ,Text mining ,Similarity (network science) ,Plagiarism detection ,business ,computer
- Abstract
Accurate, efficient, and fast processing of textual data and classification of electronic documents have become a key factor in knowledge management and related businesses in today’s world. Text mining, information retrieval, and document classification systems have a strong positive impact on digital libraries and electronic content management, e-marketing, electronic archives, customer relationship management, decision support systems, copyright infringement, and plagiarism detection, all of which directly affect economics, businesses, and organizations. In this study, we propose a new similarity measure that can be used with the k-nearest neighbors (k-NN) and Rocchio algorithms, two well-known algorithms for document classification, information retrieval, and other text mining purposes. We tested the new similarity measure on several structured textual data sets and compared the results with other standard distance metrics and similarity measures such as cosine similarity, Euclidean distance, and the Pearson correlation coefficient. The results are promising and indicate that the proposed similarity measure could be used as an alternative within all suitable algorithms, methods, and models for text mining, document classification, and related knowledge management systems. Keywords: text mining, document classification, similarity measures, k-NN, Rocchio algorithm
- Published
- 2020
29. Content based Image Retrieval with Rocchio Algorithm for Relevance Feedback Using 2D Image Feature Representation
- Author
- Indah Agustien Siradjuddin, S Mochammad Kautsar, and Aryandi Triyanto
- Subjects
- Rocchio algorithm ,business.industry ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Cosine similarity ,Feature extraction ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Relevance feedback ,Pattern recognition ,0102 computer and information sciences ,Content-based image retrieval ,01 natural sciences ,030218 nuclear medicine & medical imaging ,Image (mathematics) ,03 medical and health sciences ,0302 clinical medicine ,010201 computation theory & mathematics ,Feature (computer vision) ,Artificial intelligence ,Representation (mathematics) ,business
- Abstract
This paper presents Content-based Image Retrieval with Relevance Feedback to retrieve relevant images based on a query image. Three main steps are proposed. First, a 2D feature representation of the query image and of the database images is obtained using the Integrated Color Co-Occurrence Matrix, a feature extraction method that captures two features simultaneously: color and texture. Second, the cosine similarity between the features of the query image and the features of every image in the database is computed to retrieve similar images. Third, the query features are updated with the Rocchio algorithm based on the user's relevance feedback, and the cosine similarity between the updated query features and the features of all images in the database is recalculated. Experiments are conducted using the Corel image database, which consists of 1000 images from ten classes. The proposed model achieved higher accuracy in retrieving similar images than Content-based Image Retrieval without relevance feedback.
- Published
- 2019
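The third step in this abstract, updating the query with the Rocchio algorithm and re-ranking by cosine similarity, can be sketched in Python as follows. The sketch assumes the classic update q' = alpha*q + beta*mean(relevant) - gamma*mean(non-relevant) with alpha = 1.0, beta = 0.75, gamma = 0.15, and clips negative components to zero. These weights, the clipping, and the three-dimensional toy vectors (standing in for the much longer Integrated Color Co-Occurrence Matrix features) are assumptions, since the abstract gives no parameter values.

```python
import math


def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant), negatives clipped to 0."""
    dim = len(query)
    rel = mean_vector(relevant, dim)
    nonrel = mean_vector(nonrelevant, dim)
    return [max(0.0, alpha * q + beta * r - gamma * n) for q, r, n in zip(query, rel, nonrel)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank(query, database):
    """Image ids sorted by decreasing cosine similarity to the query features."""
    return sorted(database, key=lambda key: cosine(query, database[key]), reverse=True)


# Hypothetical 3-dimensional feature vectors standing in for ICCM features.
database = {"a": [1.0, 0.0, 0.0], "b": [0.8, 0.6, 0.0], "c": [0.0, 1.0, 0.0], "d": [0.0, 0.6, 0.8]}
query = [0.9, 0.1, 0.0]
print(rank(query, database))    # ['a', 'b', 'c', 'd']
# The user marks image "c" as relevant and image "a" as non-relevant.
updated = rocchio_update(query, [database["c"]], [database["a"]])
print(rank(updated, database))  # ['b', 'c', 'a', 'd']
```

After one round of feedback, image c moves from third to second place and the image marked non-relevant drops below it.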
30. A new approach based on discrete hidden Markov model using Rocchio algorithm for the diagnosis of the brain diseases
- Author
- Uğuz, Harun and Arslan, Ahmet
- Subjects
- *HIDDEN Markov models , *ALGORITHMS , *DIAGNOSIS of brain diseases , *TRANSCRANIAL Doppler ultrasonography , *CEREBRAL circulation , *MEDICAL sciences , *HEMORRHAGE , *FEATURE extraction
- Abstract
Transcranial Doppler (TCD) study of the adult intracerebral circulation has gained considerable popularity over the last 10 years because it is a non-invasive, easy-to-apply, and reliable technique. In this study, a biomedical system was developed to classify signals gathered via TCD from the middle cerebral arteries in the temporal area of 24 healthy people and 82 patients, each with one of four brain diseases: cerebral aneurysm, brain hemorrhage, cerebral oedema, or brain tumor. The system is composed of feature extraction and classification parts. In the feature extraction stage, Linear Predictive Coding (LPC) analysis and cepstral analysis were applied to extract frame-level cepstral and delta-cepstral coefficients as feature vectors. In the classification stage, a new Discrete Hidden Markov Model (DHMM)-based approach, developed using the Rocchio algorithm, was proposed for the diagnosis of brain diseases. To calculate the DHMM parameters, which are regulated according to the maximum likelihood (ML) approach, training samples of both the related class and the other classes were included in the calculation. Thus, the DHMM parameters for one class are intended to represent the training samples of that class better while not representing the training samples of the other classes. The performance of the proposed DHMM with Rocchio approach was compared with methods such as DHMM, Artificial Neural Network (ANN), and neuro-fuzzy approaches, and it obtained better classification performance than these methods. [Copyright © Elsevier]
- Published
- 2010
31. Book Search using social information
- Author
- Rajendra Pamula and Ritesh Kumar
- Subjects
- Rocchio algorithm ,Query expansion ,Information retrieval ,Computer science ,Book search ,Social book search ,Social information ,Task (project management)
- Abstract
Books are a source of knowledge and information, and users search for relevant books based on their needs. In this paper, we develop a method for book suggestion that combines the Sequential Dependence Model with heterogeneous social information assigned by users, such as tags, reviews, and ratings. We use the CLEF-2016 Social Book Search track (Suggestion task): we train the proposed method on the CLEF-2015 Social Book Search (Suggestion task) datasets and test it on the CLEF-2016 Social Book Search datasets. Our method obtains better results than other systems.
- Published
- 2018
32. An Improved Rocchio Algorithm for Music Mood Classification
- Author
- G P Sajeev and K Sri Nikitha
- Subjects
- Rocchio algorithm ,0209 industrial biotechnology ,InformationSystems_INFORMATIONINTERFACESANDPRESENTATION(e.g.,HCI) ,Computer science ,business.industry ,02 engineering and technology ,Musical ,Class (biology) ,Metadata ,030507 speech-language pathology & audiology ,03 medical and health sciences ,020901 industrial engineering & automation ,Mood ,Resource (project management) ,Music information retrieval ,The Internet ,0305 other medical science ,business ,Algorithm
- Abstract
The amount of music-related information available on the Internet is huge. This information can be mined using many features, and the online contributions of both musical experts and general listeners have provided music researchers with a rich resource of information. The mood of a song helps in recommending songs to online users. There is also strong application-oriented interest in mood classification, since music download services and audio players allow users to browse music collections using mood as one search criterion. This paper proposes a novel mood classification technique using an improved Rocchio algorithm. Because the Rocchio algorithm uses only one prototype vector to represent a class, its prediction accuracy is limited. We addressed this problem by considering the $k$-nearest vectors along with the prototype vector. The proposed method is validated on real music data collected from well-known music portals and compared with other machine learning methods.
- Published
- 2018
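The abstract does not spell out how the k-nearest vectors are combined with the prototype vector, so the Python sketch below is only one hypothetical reading, not the authors' method: a class score that blends cosine similarity to the class centroid with the mean similarity of the song's k most similar training vectors in that class. The blending weight, the default k = 1, the mood labels, and the 2-D features are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def centroid(vectors):
    """Plain Rocchio prototype: the component-wise mean of a class's vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hybrid_score(x, class_vectors, k, weight):
    """Hypothetical blend of prototype similarity and mean similarity of the k nearest class vectors."""
    proto_sim = cosine(x, centroid(class_vectors))
    nearest = sorted((cosine(x, v) for v in class_vectors), reverse=True)[:k]
    return weight * proto_sim + (1 - weight) * sum(nearest) / len(nearest)


def classify(x, training, k=1, weight=0.5):
    """training maps a mood label to its list of feature vectors; weight=1.0 is plain Rocchio."""
    return max(training, key=lambda label: hybrid_score(x, training[label], k, weight))


# Hypothetical 2-D song features: "calm" songs form two separate groups.
training = {"calm": [[1.0, 0.0], [0.0, 1.0]], "happy": [[0.766, 0.643]]}
song = [1.0, 0.0]
print(classify(song, training, weight=1.0))  # happy (prototype only)
print(classify(song, training, weight=0.5))  # calm (prototype + nearest neighbor)
```

With weight=1.0 the rule reduces to the single-prototype classifier, which assigns the song to "happy" because the "calm" centroid lies between its two groups; blending in the nearest training vector recovers "calm".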
33. Research on MapReduce-Based Rocchio Relevance Feedback in Massive Information Filtering
- Author
- Zhi Dong Shang, Zhi Cheng Zhang, and Wen Chuan Yang
- Subjects
- Rocchio algorithm ,Data set ,Statistical classification ,Computer science ,General Engineering ,Relevance feedback ,Data mining ,computer.software_genre ,computer
- Abstract
Traditional text classification algorithms have a vital impact on information filtering. However, their performance is largely limited on massive data sets. This paper proposes an approach using a MapReduce-based Rocchio relevance feedback algorithm, which adapts the traditional Rocchio algorithm to the MapReduce paradigm, to solve the problem of massive information filtering. Experiments on a Hadoop cluster showed that the new method effectively improves performance.
- Published
- 2014
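The abstract does not describe how the Rocchio computation is split into map and reduce tasks. As a hypothetical illustration, the training stage decomposes naturally: mappers emit (class, (vector, 1)) pairs, the shuffle groups them by class, and reducers sum vectors and counts to form each class prototype. The Python sketch below simulates this in a single process without any Hadoop API, and it uses the simplest prototype, the per-class centroid of positive examples (the negative-example term of the Rocchio prototype is omitted); the labels and vectors are made up.

```python
from collections import defaultdict
from functools import reduce


def mapper(split):
    """Map step: emit (class label, (vector, 1)) for every labelled document in a split."""
    for label, vector in split:
        yield label, (vector, 1)


def add_partials(a, b):
    """Reduce step: add two (vector sum, count) partial results component-wise."""
    (sum_a, count_a), (sum_b, count_b) = a, b
    return [x + y for x, y in zip(sum_a, sum_b)], count_a + count_b


def run_job(splits):
    """Simulate map, shuffle and reduce in one process (no Hadoop API is used)."""
    groups = defaultdict(list)
    for split in splits:                    # on a cluster, each split gets its own mapper
        for label, partial in mapper(split):
            groups[label].append(partial)   # shuffle: group partials by class label
    prototypes = {}
    for label, partials in groups.items():  # one reduce call per class label
        total, count = reduce(add_partials, partials)
        prototypes[label] = [t / count for t in total]
    return prototypes


splits = [
    [("spam", [2.0, 0.0]), ("ham", [0.0, 1.0])],
    [("spam", [4.0, 2.0]), ("ham", [0.0, 3.0])],
]
print(run_job(splits))  # {'spam': [3.0, 1.0], 'ham': [0.0, 2.0]}
```

Because vector addition is associative and commutative, the same add_partials function could also serve as a combiner on each mapper's output before the shuffle, which reduces network traffic on a real cluster.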
34. The English Language Scientific Literature Classification Based on Abstract Using Rocchio Algorithm
- Author
- Dedy Arisandi, Ulfi Andayani, Mohammad Fadly Syahputra, Baihaqi Siregar, and Misbah Hasugian
- Subjects
- Rocchio algorithm ,History ,Training set ,Information retrieval ,Computer science ,Pattern recognition (psychology) ,System testing ,English language ,Scientific literature ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,Computer Science Applications ,Education ,Test data
- Abstract
The need for documents in the form of journals or scientific articles is currently increasing, and so is the number of available documents, which makes journals harder to find and present. Therefore, a method is needed to classify journals automatically according to their categories. Classification is one method that can help organize documents by category. In this study, the Rocchio algorithm is used to classify journals. The data come from Google Scholar. The journals fall into four categories: Computer Vision and Pattern Recognition, Artificial Intelligence, Data Mining and Analysis, and Computer System. System testing used 160 journals as training data and 48 journals as test data. This study produced a web-based journal classification system, and the method classified journals with 93% accuracy.
- Published
- 2019
35. Text categorization using Rocchio algorithm and random forest algorithm
- Author
- A. Vincent, G. Neeraja, V. Abinaya, R. Deepika, P. Karthikeyan, and S. Thamarai Selvi
- Subjects
- Computer science ,Feature extraction ,0211 other engineering and technologies ,Decision tree ,Word error rate ,02 engineering and technology ,Machine learning ,computer.software_genre ,0202 electrical engineering, electronic engineering, information engineering ,Relevance (information retrieval) ,Cluster analysis ,Rocchio algorithm ,021110 strategic, defence & security studies ,Stop words ,business.industry ,Boosting methods for object categorization ,Random forest ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,Categorization ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,Algorithm design ,Artificial intelligence ,business ,computer
- Abstract
Millions of file uploads and downloads happen every minute, creating big data and making manual text categorization impossible. Hence, automatic document categorization is needed to make storage and retrieval more efficient. This research paper proposes a hybrid model that combines the Rocchio algorithm and the Random Forest algorithm to perform multi-label text categorization. A stop-word remover and a word stemmer were used to overcome limitations of the Rocchio algorithm, and the Random Forest model takes a minimal set of categories as input to reduce its error rate. Experiments were done on standard text categorization datasets. The proposed model was found to categorize documents more efficiently than other text categorization models such as fuzzy relevance clustering, ML-KNN (Multi-label KNN), and Naive Bayes algorithms.
- Published
- 2017
36. Large scale multi-label text classification of a hierarchical dataset using Rocchio algorithm
- Author
- Sowmya B J, Chetan, and K. G. Srinivasa
- Subjects
- Rocchio algorithm ,Computer science ,business.industry ,computer.software_genre ,Machine learning ,Hierarchical database model ,k-nearest neighbors algorithm ,ComputingMethodologies_PATTERNRECOGNITION ,Text categorization ,Data mining ,Artificial intelligence ,Scale (map) ,business ,computer
- Abstract
Hierarchical data is becoming increasingly prominent, especially on the web. Wikipedia is one example: millions of documents are classified into multiple classes in a hierarchical fashion. This raises the interesting problem of automatically classifying new documents. As the dataset grows, so does the number of classes, and sparsity appears to remain an issue even as the dataset grows, which makes classifying such data challenging. We present two text categorization algorithms, the Rocchio algorithm and kNN, and implement and compare them to better understand which approach to take when classifying hierarchical data.
- Published
- 2016
37. An Improved Parallel Algorithm for Text Categorization
- Author
- Wenchuan Yang, Yimin Fu, and Dong Zhang
- Subjects
- Rocchio algorithm ,020203 distributed computing ,business.industry ,Computer science ,Parallel algorithm ,Relevance feedback ,020207 software engineering ,02 engineering and technology ,Machine learning ,computer.software_genre ,Data modeling ,Support vector machine ,Statistical classification ,ComputingMethodologies_PATTERNRECOGNITION ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,0202 electrical engineering, electronic engineering, information engineering ,Algorithm design ,Artificial intelligence ,Data mining ,business ,Cluster analysis ,computer
- Abstract
This paper proposes an approach using a MapReduce-based Rocchio relevance feedback algorithm, which adapts the traditional Rocchio algorithm to the MapReduce paradigm, to solve the problem of massive information filtering. Traditional text classification algorithms have a vital impact on information filtering.
- Published
- 2016
38. Semantic Mapping in Video Retrieval
- Author
- Maaike H. T. de Boer
- Subjects
- Rocchio algorithm ,Information retrieval ,Computer science ,Event (computing) ,business.industry ,Deep learning ,Relevance feedback ,020207 software engineering ,02 engineering and technology ,Content-based image retrieval ,Sensor fusion ,TRECVID ,Management Information Systems ,Query expansion ,Semantic mapping ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Embedding ,020201 artificial intelligence & image processing ,Artificial intelligence ,business
- Abstract
In the modern world, networked sensor technology makes it possible to capture the world around us in real time. In the security domain, cameras are an important source of information. Cameras in public places, bodycams, drones, and smartphone recordings are used for real-time monitoring of the environment to prevent crime (monitoring case) and/or for the investigation and retrieval of crimes, for example in evidence forensics (forensic case). In both cases, the right information must be obtained quickly, without manually searching through the data. Currently, many algorithms are available to index a video with pre-trained concepts, such as people, objects, and actions. These algorithms require a representative and sufficiently large set of examples (training data) to recognize a concept. This training data is, however, not always available. In this thesis, we aim to assist analysts in their work on video stream data by providing a search capability that handles ad-hoc textual queries, i.e., queries that include concepts or events that are not pre-trained. We use the security domain as inspiration for our work, but the analyst can be working in any application domain that uses video stream data, or even indexed data. Additionally, we consider only the technical aspects of the search capability, not the legal, ethical, or privacy issues related to video stream data. We focus on the retrieval of high-level events, such as birthday parties. We assume that these events can be composed of smaller pre-trained concepts, such as a group of people, a cake, and decorations, and of relations between those concepts, which together capture the essence of the unseen event (decompositionality assumption). We also adopt the open world assumption, i.e., the system does not have complete world knowledge.
Although current state-of-the-art systems can detect an increasingly large number of concepts, this number still falls far short of the nearly infinite number of possible (textual) queries that a system needs to handle. To assist the analyst, we focus on improving visual search effectiveness (i.e., performance) through a semantic query-to-concept mapping: the mapping from the user query to the set of pre-trained concepts. We use the TRECVID Multimedia Event Detection benchmark, as it contains high-level events inspired by the security domain. In this thesis, we show that the main improvements can be achieved by combining i) query-to-concept mapping based on semantic word embeddings (+12%), ii) exploiting user feedback (+26%), and iii) fusion of different modalities (data sources) (+17%). First, we propose an incremental word2vec (i-w2v) method [1], which uses word2vec trained on GoogleNews items as a semantic embedding model and incrementally adds concepts to the set of selected concepts for a query in order to deal with query drift. This method improves performance in terms of MAP compared to the state-of-the-art word2vec method and knowledge-based techniques. Combined with a state-of-the-art video event retrieval pipeline, it achieves top performance on the TRECVID MED benchmark for the zero-example task (MED14Test results). This improvement, however, depends on the availability of concepts in the Concept Bank: without concepts related to or occurring in the event, we cannot detect the event. We thus need a properly composed Concept Bank to index videos properly. Second, we propose an Adaptive Relevance Feedback interpretation method named ARF [2] that not only achieves high retrieval performance but is also theoretically founded on the Rocchio algorithm from the text retrieval field.
This algorithm is adjusted to the event retrieval domain so that the weights of the concepts are changed based on positive and negative annotations of videos. The ARF method has higher visual search effectiveness than k-NN based methods on video-level annotations and methods based on concept-level annotations. Third, we propose blind late fusion methods based on state-of-the-art methods [3], such as average fusion or fusion based on probabilities. In particular, the combination of a Joint Ratio (ratio of probabilities) and Extreme Ratio (ratio of minimum and maximum) method (JRER) achieves high performance in cases with reliable detectors, i.e., enough training examples. This method is applicable not only to video retrieval but also to sensor fusion in general. Although future work can address implicit query-to-concept mapping through deep learning methods, smarter combination of concepts, and the use of spatial and temporal information, we have shown that our proposed methods can improve visual search effectiveness through a semantic query-to-concept mapping, which brings us a step closer to a search capability that handles ad-hoc textual queries for analysts.
- Published
- 2018
39. Text Categorization Based on Topic Model
- Author
- Shibin Zhou, Yushu Liu, and Kan Li
- Subjects
- Topic model ,General Computer Science ,Computer science ,Latent Dirichlet allocation ,Variational Inference ,computer.software_genre ,Machine learning ,lcsh:QA75.5-76.95 ,Naive Bayes classifier ,symbols.namesake ,Pachinko allocation ,Additive smoothing ,Rocchio algorithm ,business.industry ,Computer Science::Information Retrieval ,Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) ,QA75.5-76.95 ,Category LanguageModel ,Dynamic topic model ,Computational Mathematics ,ComputingMethodologies_PATTERNRECOGNITION ,Electronic computers. Computer science ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,symbols ,lcsh:Electronic computers. Computer science ,Artificial intelligence ,Language model ,business ,computer ,Natural language processing
- Abstract
In the text literature, many topic models have been proposed to represent documents and words as topics or latent topics in order to process text effectively and accurately. In this paper, we propose LDACLM, or Latent Dirichlet Allocation Category LanguageModel, for text categorization and estimate the model parameters by variational inference. As a variant of the Latent Dirichlet Allocation model, LDACLM regards the documents of a category as a language model and uses variational parameters to obtain maximum a posteriori estimates of terms. In general, experiments show that the LDACLM model is effective: it outperforms Naïve Bayes with Laplace smoothing and the Rocchio algorithm, but is slightly inferior to SVM for text categorization.
- Published
- 2009
40. A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods
- Author
- Pan-Jun Kim
- Subjects
- Rocchio algorithm ,Computer science ,Data mining ,Performance improvement ,computer.software_genre ,tf–idf ,computer ,Classifier (UML) ,Weighting
- Abstract
This study examines various term weighting methods for improving the performance of automatic classification based on the Rocchio algorithm on two collections (LISA and Reuters-21578). First, three weighting factors were identified (document factor, document set factor, and category factor), and the performance of the single weighting schemes based on each factor was investigated. Second, the performance of combined weighting methods built from the single schemes was examined. Among the single schemes, category-factor-based schemes performed best, document-set-factor-based schemes second, and document-factor-based schemes worst. Among the combined schemes, those combining the document set factor with the category factor (idf*cat) performed better than those combining the document factor with the category factor (tf*cat or ltf*cat), and better than the common schemes combining the document factor with the document set factor (tfidf or ltfidf). However, when the single and combined schemes are compared per collection, the category-factor-based scheme (cat only) performs best on LISA, whereas the combined scheme that joins the document set factor with the category factor (idf*cat) performs best on Reuters-21578. Therefore, the practical application of these weighting methods requires careful consideration of the categories in the collection to be classified.
- Published
- 2008
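The weighting comparison above rests on the Rocchio classifier, which builds one prototype vector per category from weighted training documents and assigns a new document to the most similar prototype. The following is a minimal Python sketch of that idea using tf-idf weights, one of the common schemes named in the abstract. The smoothed idf formula, the `beta`/`gamma` defaults, the clipping of negative prototype weights, and all function names are assumptions for illustration, not the study's configuration; the category-factor (cat) schemes the paper evaluates are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def fit_idf(docs):
    """Smoothed inverse document frequency (an assumed variant) from tokenized docs."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(doc, idf):
    """Unit-length tf-idf vector; terms unseen during training are dropped."""
    tf = Counter(doc)
    return normalize({t: c * idf[t] for t, c in tf.items() if t in idf})

def train_rocchio(docs, labels, beta=1.0, gamma=0.25):
    """One prototype per class: beta * mean(in-class) - gamma * mean(out-of-class)."""
    idf = fit_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    prototypes = {}
    for c in set(labels):
        pos = [v for v, y in zip(vecs, labels) if y == c]
        neg = [v for v, y in zip(vecs, labels) if y != c]
        proto = defaultdict(float)
        for group, coef in ((pos, beta), (neg, -gamma)):
            for v in group:
                for t, w in v.items():
                    proto[t] += coef * w / len(group)
        # Negative weights are clipped to zero (a common, assumed choice).
        prototypes[c] = normalize({t: w for t, w in proto.items() if w > 0})
    return idf, prototypes

def classify(doc, idf, prototypes):
    """Assign the class whose unit-length prototype has the highest dot product (cosine)."""
    v = tfidf(doc, idf)
    return max(prototypes,
               key=lambda c: sum(w * prototypes[c].get(t, 0.0) for t, w in v.items()))
```

Swapping `tfidf` for another weighting function is the only change needed to compare schemes in the spirit of the study, since the prototype and decision steps do not depend on how term weights are computed.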
41. Rocchio Algorithm to Enhance Semantically Collaborative Filtering
- Author
Khaled Bsaïes, Azim Roussanaly, Anne Boyer, Sonia Ben Ticha, Valérie Monfort, and Karl-Heinz Krempels; Unité de Recherche en Programmation Algorithmique et Heuristique (URPAH), Faculté des Sciences Mathématiques, Physiques et Naturelles de Tunis (FST), Université de Tunis El Manar (UTM); Knowledge Information and Web Intelligence (KIWI), Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS), Université de Lorraine (UL), Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
Rocchio algorithm, Information retrieval, Probabilistic latent semantic analysis, Latent semantic analysis, Computer science, Recommender system, Semantic data model, Artificial intelligence, Semantic computing, Collaborative filtering, Hybrid recommender system, Information filtering system
- Abstract
A recommender system provides users with relevant items from a huge catalogue. Collaborative filtering and content-based filtering are the most widely used techniques in personalized recommender systems. Collaborative filtering uses only user-rating data to make predictions, while content-based filtering relies on semantic information about items. A hybrid recommender system combines the two techniques. In this paper, we present another hybridization approach: User Semantic Collaborative Filtering. Our approach aims to predict users' preferences for items based on their inferred preferences for the items' semantic information. To this end, we design a new user semantic model that describes user preferences using the Rocchio algorithm. Because item content is high-dimensional, we apply latent semantic analysis to reduce the dimension of the data. The user semantic model is then used in user-based collaborative filtering to compute predicted ratings and provide recommendations. Applied to a real dataset, MovieLens 1M, our approach shows a significant improvement over both the usage-only and the content-based-only approaches.
- Published
- 2015
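The abstract above states that the user semantic model is built with the Rocchio algorithm but does not give the formula. The following is a minimal sketch under assumed conventions, not the authors' model: each item is a vector of semantic attributes (for example, one-hot genres), ratings at or above a hypothetical `threshold` count as positive feedback, and the profile is `beta` times the mean of liked items minus `gamma` times the mean of disliked items. Two users' profiles can then be compared with cosine similarity, the kind of user-user similarity a user-based collaborative filter needs. The latent semantic analysis step is omitted, and all names and parameter values are illustrative.

```python
import math

def rocchio_profile(ratings, item_vectors, threshold=4, beta=1.0, gamma=0.5):
    """Rocchio-style user profile over item attribute vectors.

    ratings: dict item_id -> rating; item_vectors: dict item_id -> list of floats.
    Items rated >= threshold count as liked, the rest as disliked (assumption).
    """
    dim = len(next(iter(item_vectors.values())))
    liked = [item_vectors[i] for i, r in ratings.items() if r >= threshold]
    disliked = [item_vectors[i] for i, r in ratings.items() if r < threshold]
    profile = [0.0] * dim
    # beta * mean(liked) - gamma * mean(disliked); an empty group contributes nothing.
    for group, coef in ((liked, beta), (disliked, -gamma)):
        for vec in group:
            for k in range(dim):
                profile[k] += coef * vec[k] / len(group)
    return profile

def cosine(a, b):
    """Cosine similarity between two dense profiles (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

For instance, with one-hot genre vectors `m1 = [1, 0, 0]` and `m3 = [0, 0, 1]`, a user who rates `m1` as 4 and `m3` as 2 gets the profile `[1.0, 0.0, -0.5]` under the default parameters.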
42. Text learning for user profiling in e-commerce
- Author
Teresa Maria Altomare Basile, Marco Degemmis, Stefano Ferilli, Giovanni Semeraro, N. Di Mauro, and Pasquale Lops
- Subjects
Rocchio algorithm, Information retrieval, Computer science, Computation, Probabilistic logic, E-commerce, Computer Science Applications, Theoretical Computer Science, World Wide Web, Inductive logic programming, Control and Systems Engineering, Text learning, Profiling (information science), Relevant information
- Abstract
Exploring digital collections to find information relevant to a user's interests is a challenging task. Algorithms designed to solve this relevant-information problem base their relevance computations on user profiles that maintain representations of the users' interests. This article presents a new method, based on the classic Rocchio algorithm for text categorization, that discovers user preferences by analyzing the textual descriptions of items in the online catalogs of e-commerce Web sites. Experiments were carried out on several datasets, and the results were compared with those obtained using an inductive logic programming (ILP) approach and a probabilistic one.
- Published
- 2006
43. A Study on the Automatic Descriptor Assignment for Scientific Journal Articles Using Rocchio Algorithm
- Author
Pan-Jun Kim
- Subjects
Rocchio algorithm, Computer science, Information storage and retrieval, Search engine indexing, Pattern recognition, Support vector machine, Text categorization, Automatic indexing, Controlled vocabulary, Learning based, Artificial intelligence, Performance improvement
- Abstract
Several performance factors in automatic indexing with a controlled vocabulary and in text categorization based on the Rocchio algorithm were examined, and simple methods for improving their performance were tried. The results of the Rocchio-based methods were also compared with those of other learning-based methods under the same conditions. While retaining their strengths of easy implementation and computational efficiency, the Rocchio-based methods showed results equivalent to or better than those of the other learning-based methods (SVM, VPT, NB). In particular, for semi-automatic (computer-aided) indexing, Rocchio-based methods with a high recall level could be used preferentially.
- Published
- 2006
44. Filtering search results using an optimal set of terms identified by an artificial neural network
- Author
Peretz Shoval, Zvi Boger, and Tsvi Kuflik
- Subjects
Rocchio algorithm, User profile, Artificial neural network, Computer science, Feature selection, Filter (signal processing), Library and Information Sciences, Management Science and Operations Research, Computer Science Applications, Set (abstract data type), Media Technology, Feature (machine learning), Data mining, Word (computer architecture), Information Systems
- Abstract
Information filtering (IF) systems usually filter data items by correlating a set of terms representing the user's interests (a user profile) with similar sets of terms representing the data items. Many techniques can be employed to construct user profiles automatically, but they usually yield large sets of terms. Various dimensionality-reduction techniques can be applied to reduce the number of terms in a user profile. We describe a new term selection technique that includes a dimensionality-reduction mechanism based on the analysis of a trained artificial neural network (ANN) model. Its novel feature is the identification of an optimal set of terms that can correctly classify the data items relevant to a user. The proposed technique was compared with the classical Rocchio algorithm. We found that when all distinct terms in the training set were used to train an ANN, the Rocchio algorithm outperformed the ANN-based filtering system; but after applying the new dimensionality-reduction technique, leaving only an optimal set of terms, the improved ANN technique outperformed both the original ANN and the Rocchio algorithm.
- Published
- 2006
45. Relevant data expansion for learning concept drift from sparsely labeled data
- Author
John Yen and Dwi H. Widyantoro
- Subjects
Rocchio algorithm, Concept drift, Computer science, Data stream mining, Relevance feedback, Semi-supervised learning, Missing data, Machine learning, Computer Science Applications, Information extraction, Computational Theory and Mathematics, Concept learning, Artificial intelligence, Data mining, Information Systems
- Abstract
Keeping track of changing interests is a natural phenomenon as well as an interesting tracking problem, because interests can emerge and fade over different time frames. Doing so from only a few feedback examples is an even more important and challenging problem, because existing concept drift learning algorithms typically struggle when labeled examples are scarce. This work presents a new computational framework for extending incomplete labeled data streams (FEILDS), which extends the ability of existing algorithms to learn concept drift from a few labeled data. The system transforms the original input stream into a new stream that existing learning algorithms can conveniently track. Experimental results show that, when learning from a sparsely labeled data stream, FEILDS can significantly improve the performance of a Multiple Three-Descriptor Representation (MTDR) algorithm, the Rocchio algorithm, and window-based concept drift learning algorithms compared with their performance without FEILDS.
- Published
- 2005
46. Improving linear classifier for Chinese text categorization
- Author
Jyh-Jong Tsay and Jing-Doo Wang
- Subjects
Rocchio algorithm, Linear model, Linear classifier, Pattern recognition, Library and Information Sciences, Management Science and Operations Research, Quadratic classifier, Machine learning, Computer Science Applications, Text categorization, Categorization, Margin classifier, Media Technology, Artificial intelligence, Classifier (UML), Information Systems, Mathematics
- Abstract
The goal of this paper is to derive extra representatives for each class to compensate for a potential weakness of linear classifiers that compute a single representative per class. To evaluate the effectiveness of our approach, we compared it with the linear classifier produced by the Rocchio algorithm and with the k-nearest neighbor (kNN) classifier. Experimental results show that our approach improved the linear classifier and achieved micro-averaged accuracy close to that of kNN, with much less classification time. Furthermore, identifying new representatives for the linear classifier allowed us to suggest a reorganization of the class structure.
- Published
- 2004
47. [Untitled]
- Author
Padmini Srinivasan and Miguel E. Ruiz
- Subjects
Rocchio algorithm, Divide and conquer algorithms, Artificial neural network, Computer science, Library and Information Sciences, Mixture of experts, Pattern recognition, Text categorization, Categorization, Test set, Data mining, Classifier (UML), Information Systems
- Abstract
This paper presents the design and evaluation of a text categorization method based on the Hierarchical Mixture of Experts model. This model uses a divide-and-conquer principle to define smaller categorization problems based on a predefined hierarchical structure. The final classifier is a hierarchical array of neural networks. The method is evaluated using the UMLS Metathesaurus as the underlying hierarchical structure and the OHSUMED test set of MEDLINE records. Comparisons are provided with an optimized version of the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers. The results show that using the hierarchical structure improves text categorization performance relative to an equivalent flat model. The optimized Rocchio algorithm achieves performance comparable to that of the hierarchical neural networks.
- Published
- 2002
48. User Semantic Model for Dependent Attributes to Enhance Collaborative Filtering
- Author
Azim Roussanaly, Anne Boyer, Sonia Ben Ticha, and Khaled Bsaïes; Knowledge Information and Web Intelligence (KIWI), Department of Complex Systems, Artificial Intelligence & Robotics (LORIA - AIS), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS), Université de Lorraine (UL), Institut National de Recherche en Informatique et en Automatique (Inria); Unité de Recherche en Programmation Algorithmique et Heuristique (URPAH), Faculté des Sciences Mathématiques, Physiques et Naturelles de Tunis (FST), Université de Tunis El Manar (UTM); INSTICC
- Subjects
Rocchio algorithm, Information retrieval, Probabilistic latent semantic analysis, Latent semantic analysis, Computer science, Recommender system, Semantic data model, Artificial intelligence, Semantic computing, Collaborative filtering, Hybrid recommender system, Information filtering system
- Abstract
A recommender system provides users with relevant items from a huge catalogue. Collaborative filtering and content-based filtering are the most widely used techniques in personalized recommender systems. Collaborative filtering uses only user-rating data to make predictions, while content-based filtering relies on semantic information about items. A hybrid recommender system combines the two techniques. The aim of this work is to introduce a new approach for semantically enhanced collaborative filtering. Many works have addressed this problem by proposing hybrid solutions. In this paper, we present another hybridization technique that predicts users' preferences for items based on their inferred preferences for the items' semantic information. To do so, we design a new user semantic model using the Rocchio algorithm, and we apply latent semantic analysis to reduce the dimension of the data. Applied to real data, the MovieLens 1M dataset, our approach shows a significant improvement over a usage-only approach and a hybrid algorithm.
- Published
- 2014
49. Parallel Sentiment Polarity Classification Method with Substring Feature Reduction
- Author
Yaowen Zhang, Cunyan Yin, Lin Shang, and Xiaojun Xiang
- Subjects
Rocchio algorithm, Polarity (physics), Computer science, Feature extraction, Sentiment analysis, Pattern recognition, Substring, Data set, Reduction (complexity), Feature (machine learning), Data mining, Artificial intelligence
- Abstract
Sentiment analysis is an important problem in machine learning that aims to identify the emotion expressed in a corpus. However, it is a difficult task, especially on large-scale data, where feature reduction is needed. In this paper, we propose a parallel feature reduction algorithm for sentiment polarity classification based on a substring method. Specifically, the proposed algorithm relies on parallel computing on the Hadoop platform. It is evaluated on a large dataset, with a k-nearest neighbor algorithm and a Rocchio algorithm used for classification. Experimental results show that the proposed algorithm outperforms other commonly used methods in both classification performance and computational cost.
- Published
- 2013
50. Text categorization based on improved Rocchio algorithm
- Author
Shengxiao Guan and Guanyu Gao
- Subjects
Rocchio algorithm, Training set, Computer science, Information storage and retrieval, Codebook, Pattern recognition, Statistical classification, Text categorization, Text mining, Categorization, Document and text processing, Algorithm design, Artificial intelligence, tf–idf
- Abstract
Text categorization assigns each text document to predefined categories. This paper presents a new method for classifying Chinese text based on the Rocchio algorithm. We first use TF-IDF to extract document vectors from correctly categorized training documents, and then use those vectors to generate codebooks as classification models with the LBG and Rocchio algorithms. The codebooks are then used to categorize target documents by vector scores. Experimental results show that this method achieves better performance.
- Published
- 2012
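The method above derives per-category codebooks from tf-idf vectors with the LBG (Linde-Buzo-Gray) algorithm. The sketch below shows only generic LBG codebook growth (split each codeword by a small factor `eps`, then refine with Lloyd/k-means passes) plus a nearest-codeword decision rule. How the authors combine LBG with Rocchio and compute their "vector scores" is not described in the abstract, so the squared Euclidean distance, `eps`, the fixed number of refinement passes (instead of a distortion-based stopping test), and all function names are assumptions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(vectors, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting (size is rounded up to a power of 2)."""
    codebook = [mean(vectors)]
    while len(codebook) < size:
        # Split each codeword c into c*(1+eps) and c*(1-eps).
        codebook = [[x * (1 + s) for x in c] for c in codebook for s in (eps, -eps)]
        # Lloyd (k-means) refinement with a fixed number of passes.
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                j = min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
                cells[j].append(v)
            # An empty cell keeps its previous codeword.
            codebook = [mean(cell) if cell else c for cell, c in zip(cells, codebook)]
    return codebook

def train_codebooks(vectors_by_class, size=2):
    """One LBG codebook per category (hypothetical helper)."""
    return {c: lbg(vs, size) for c, vs in vectors_by_class.items()}

def classify_by_codebook(v, codebooks):
    """Pick the category whose closest codeword is nearest to v."""
    return min(codebooks, key=lambda c: min(dist2(v, cw) for cw in codebooks[c]))
```

Compared with a single Rocchio prototype per category, a codebook of several codewords lets one category cover more than one cluster of documents, at the cost of more distance computations per classification.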