14 results for "Probabilistic latent semantic analysis"
Search Results
2. Unified collaborative filtering model based on combination of latent features
- Author
-
Zhong, Jiang and Li, Xue
- Subjects
- *
INFORMATION filtering systems, *RECOMMENDER systems, *LATENT semantic analysis, *PROBABILITY theory, *PREDICTION models, *EXPERT systems - Abstract
Abstract: Collaborative filtering (CF) has been studied extensively in the literature and has been demonstrated successfully in many types of personalized recommender systems. In this paper, we propose a unified method that combines the latent and external features of users and items for accurate recommendation. A mapping scheme from the collaborative filtering problem to a text analysis problem is introduced, and probabilistic latent semantic analysis is used to calculate the latent features from the historical rating data. The main advantages of this technique over standard memory-based methods are higher accuracy, constant-time prediction, and an explicit, compact model representation. The experimental evaluation shows that substantial improvements in accuracy over existing methods can be obtained.
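For readers unfamiliar with the PLSA machinery this entry relies on, the sketch below shows the standard EM updates for a PLSA aspect model over a user-item count matrix. It is a minimal illustration with invented toy data, not the authors' implementation; the number of latent factors and the iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-item interaction counts (e.g., rating events); 5 users x 4 items.
N = rng.integers(0, 5, size=(5, 4)).astype(float)
K = 2  # number of latent aspects z (assumed, not from the paper)

# Random initialization of P(z), P(u|z), P(i|z).
p_z = np.full(K, 1.0 / K)
p_u_z = rng.random((5, K)); p_u_z /= p_u_z.sum(axis=0)
p_i_z = rng.random((4, K)); p_i_z /= p_i_z.sum(axis=0)

for _ in range(100):  # EM iterations
    # E-step: posterior P(z|u,i) for every (u, i) pair, shape (5, 4, K).
    joint = p_z[None, None, :] * p_u_z[:, None, :] * p_i_z[None, :, :]
    post = joint / joint.sum(axis=2, keepdims=True)
    # M-step: re-estimate parameters from expected counts.
    nz = N[:, :, None] * post            # expected counts per (u, i, z)
    p_u_z = nz.sum(axis=1); p_u_z /= p_u_z.sum(axis=0)
    p_i_z = nz.sum(axis=0); p_i_z /= p_i_z.sum(axis=0)
    p_z = nz.sum(axis=(0, 1)); p_z /= p_z.sum()

# Smoothed joint P(u, i): dense predictions even where N had zeros.
pred = (p_z[None, None, :] * p_u_z[:, None, :] * p_i_z[None, :, :]).sum(axis=2)
print(pred.round(3))
```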
- Published
- 2010
- Full Text
- View/download PDF
3. Email thread sentiment sequence identification using PLSA clustering algorithm.
- Author
-
Srinivasarao, Ulligaddala and Sharaff, Aakanksha
- Subjects
- *
EMAIL, *LATENT semantic analysis, *USER-generated content, *FEATURE extraction, *ALGORITHMS, *SENTIMENT analysis - Abstract
• The SentiWordNet lexicon is used to generate sentiment features of email data. • The probabilistic latent semantic analysis (PLSA) algorithm is used for clustering. • The sentiment sequence of each thread and the thread size are identified. • The proposed model is evaluated using accuracy, precision, recall, and F-measure. Email is the most common way of providing effective communication between Internet users, so the number of sent and received emails keeps growing and users cannot remember all of them. Although email thread identification approaches offer clear benefits, they may fail to alert users to the sentiments behind an email thread. To address this issue, a probabilistic latent semantic analysis clustering algorithm is used in this paper to identify the sentiment sequence of email threads. The sentiment and the thread sequence within the emails are discovered as clusters of sentiment polarity and temporal categories with the help of PLSA. In the initial stage, three feature extraction methods, namely latent Dirichlet allocation (LDA), bag of words (BoW), and TF-IDF, together with the SentiWordNet (SWN) lexicon, are used to generate sentiment features of emails. Next, the PLSA algorithm forms email clusters based on the sentiment features, which helps to identify thread sentiment and the sequence of sentiment threads. Email threads provide a mechanism by which any user can find the sentiment-based sequence of messages related to a specific set of communications during a specific time period. Several evaluation measures, such as accuracy, precision, recall, and F-measure, are considered in this work, the proposed algorithm is compared with other standard algorithms, and a statistical test has also been performed. [ABSTRACT FROM AUTHOR]
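The feature extraction step named in the highlights can be approximated as below: a minimal sketch that averages SentiWordNet scores over an email's tokens using NLTK. The example emails are invented, and taking only each word's first sense is a simplification; the PLSA clustering stage would then run on such feature vectors (see the EM sketch under result 2).

```python
import nltk
from nltk.corpus import sentiwordnet as swn

nltk.download("wordnet", quiet=True)       # one-time corpus downloads
nltk.download("sentiwordnet", quiet=True)

def sentiment_features(text):
    """Average SentiWordNet positive/negative scores over the tokens of a text."""
    pos, neg, hits = 0.0, 0.0, 0
    for token in text.lower().split():
        synsets = list(swn.senti_synsets(token))
        if synsets:  # first (most common) sense, a crude approximation
            pos += synsets[0].pos_score()
            neg += synsets[0].neg_score()
            hits += 1
    return (pos / hits, neg / hits) if hits else (0.0, 0.0)

emails = ["thanks the fix works great",
          "this bug is awful and still broken"]
print([sentiment_features(e) for e in emails])
```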
- Published
- 2022
- Full Text
- View/download PDF
4. Machine learning techniques for business blog search and mining
- Author
-
Chen, Yun, Tsai, Flora S., and Chan, Kap Luk
- Subjects
- *
MACHINE learning, *BLOGS, *DATA mining, *DATABASES, *SEARCH engines, *DATABASE searching, *LATENT semantic analysis - Abstract
Weblogs, or blogs, have rapidly gained in popularity over the past few years. In particular, the growth of business blogs, which are written by or provide commentary on businesses and companies, opens up new opportunities for developing blog-specific search and mining techniques. In this paper, we propose probabilistic models for blog search and mining using two machine learning techniques, latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA). We implement the models in our database of business blogs, BizBlogs07, with the aim of achieving higher precision and recall. The probabilistic model is able to segment the business blogs into separate topic areas, which is useful for keyword detection on the blogosphere. Various term-weighting schemes and factor values were also studied in detail, revealing interesting patterns in our database of business blogs. Our multi-functional business blog system differs from existing blog search engines in that it aims to provide better relevance and precision of search.
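The LSA half of the paper's model pairs naturally with a TF-IDF index; a minimal sketch with scikit-learn follows. The blog snippets and the two-dimensional concept space are placeholders, and this does not reproduce the paper's PLSA model or the BizBlogs07 database.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in business blog snippets; BizBlogs07 itself is not public here.
blogs = ["quarterly earnings beat analyst forecasts",
         "startup raises venture capital for expansion",
         "new product launch drives customer growth",
         "market downturn hits retail investors"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(blogs)

lsa = TruncatedSVD(n_components=2, random_state=0)  # latent concept space
X_lsa = lsa.fit_transform(X)

query = tfidf.transform(["investor earnings report"])
q_lsa = lsa.transform(query)

# Rank blogs by cosine similarity in the latent space.
scores = cosine_similarity(q_lsa, X_lsa)[0]
print(sorted(zip(scores, blogs), reverse=True)[0])
```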
- Published
- 2008
- Full Text
- View/download PDF
5. Hierarchically linked infinite hidden Markov model based trajectory analysis and semantic region retrieval in a trajectory dataset
- Author
-
Kyuchang Kang, Yongjin Kwon, Jongyoul Park, Junho Jin, and Jinyoung Moon
- Subjects
Dependency (UML), Probabilistic latent semantic analysis, Computer science, General Engineering, Software engineering, Object (computer science), Semantics, Computer Science Applications, Semantic similarity, Artificial Intelligence, Trajectory, Artificial intelligence & image processing, Data mining, Hidden Markov model, Semantic compression - Abstract
• A novel model for trajectories and semantic regions (sest-hiHMM) is proposed. • A sticky version of sest-hiHMMs is proposed for reducing redundant semantic regions. • An extended definition of semantic regions covers actual regions, not sets of points. • Our models account for the temporal dependency of observations in a trajectory. • Our models retrieve reasonable semantic regions from a real trajectory dataset. With increasing attempts to find latent semantics in video datasets, trajectories have become key components since they intrinsically encode concise characteristics of object movements. One approach to analyzing a trajectory dataset concentrates on semantic region retrieval, which extracts regions that have their own patterns of object movements. Semantic region retrieval has become an important topic because semantic regions are useful for various applications, such as activity analysis. Previous work, however, has revealed only semantically relevant points, rather than actual regions, and has given little consideration to the temporal dependency of observations in a trajectory. In this paper, we propose a novel model for trajectory analysis and semantic region retrieval. We first extend the meaning of semantic regions so that it covers actual regions. We build a model for the extended semantic regions based on a hierarchically linked infinite hidden Markov model, which can capture the temporal dependency between adjacent observations, and retrieve the semantic regions from a trajectory dataset. In addition, we propose a sticky extension to diminish the redundant semantic regions that occur in a non-sticky model. The experimental results demonstrate that our models extract semantic regions well from a real trajectory dataset.
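A rough finite stand-in for the paper's idea, assuming the hmmlearn package: hidden states of a Gaussian HMM play the role of semantic regions, and the learned transition structure captures the temporal dependency between adjacent trajectory points. The trajectories are synthetic, and this omits the hierarchical, infinite (nonparametric) and sticky aspects of sest-hiHMM.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # pip install hmmlearn

rng = np.random.default_rng(1)

# Two synthetic trajectories, each wandering through two spatial regions.
traj1 = np.vstack([rng.normal([0, 0], 0.3, (20, 2)), rng.normal([5, 5], 0.3, (20, 2))])
traj2 = np.vstack([rng.normal([0, 0], 0.3, (15, 2)), rng.normal([5, 5], 0.3, (25, 2))])
X = np.vstack([traj1, traj2])
lengths = [len(traj1), len(traj2)]  # hmmlearn's way of marking sequence boundaries

# Each hidden state acts as a "semantic region"; the transition matrix
# captures temporal dependency between adjacent observations.
hmm = GaussianHMM(n_components=2, covariance_type="diag", random_state=0)
hmm.fit(X, lengths)
print(hmm.means_.round(2))               # recovered region centers
print(hmm.predict(traj1, [len(traj1)]))  # per-point region labels
```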
- Published
- 2017
- Full Text
- View/download PDF
6. Ensemble multi-label text categorization based on rotation forest and latent semantic indexing
- Author
-
Wafa Saadaoui, Alex Aussem, Haytham Elghazel, and Ouadie Gharroudi
- Subjects
Computer Science / Artificial Intelligence, Computer Science / Machine Learning, Vocabulary, Computer science, Machine learning, Ranking (information retrieval), Artificial Intelligence, Information systems, Multi-label classification, Structure (mathematical logic), Probabilistic latent semantic analysis, General Engineering, Bootstrapping (linguistics), Ensemble learning, Computer Science Applications, Projection (relational algebra), Ranking, Artificial intelligence & image processing, Natural language processing, Latent semantic indexing - Abstract
Text categorization has gained increasing popularity in recent years due to the explosive growth of multimedia documents. As a document can be associated with multiple non-exclusive categories simultaneously (e.g., Virus, Health, Sports, and Olympic Games), text categorization provides many opportunities for developing novel multi-label learning approaches devoted specifically to textual data. In this paper, we propose an ensemble multi-label classification method for text categorization based on four key ideas: (1) performing Latent Semantic Indexing based on distinct orthogonal projections onto lower-dimensional spaces of concepts; (2) random splitting of the vocabulary; (3) document bootstrapping; and (4) the use of BoosTexter as a powerful multi-label base learner for text categorization, to simultaneously encourage diversity and individual accuracy in the committee. Diversity of the ensemble is promoted through random splits of the vocabulary that lead to different orthogonal projections onto lower-dimensional latent concept spaces. Accuracy of the committee members is promoted through the underlying latent semantic structure uncovered in the text. The combination of rotation-based ensemble construction and Latent Semantic Indexing projection is shown to bring about significant improvements in terms of Average Precision, Coverage, Ranking Loss, and One-Error compared to five state-of-the-art approaches across 14 real-world textual data sets covering a wide variety of topics including health, education, business, science, and the arts.
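A compressed sketch of the four ingredients, with invented toy documents and labels; logistic regression stands in for BoosTexter, which is the paper's actual base learner and is not reproduced here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)

docs = ["flu vaccine trial results", "stock market rally continues",
        "hospital funding bill passes", "tech shares drive index higher",
        "new diabetes treatment approved", "central bank raises rates"]
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 1], [1, 0], [0, 1]])  # health, finance

X = TfidfVectorizer().fit_transform(docs).toarray()
n_terms = X.shape[1]

ensemble = []
for _ in range(5):  # committee members
    halves = np.array_split(rng.permutation(n_terms), 2)  # random vocabulary split
    svds = [TruncatedSVD(n_components=2, random_state=0).fit(X[:, h]) for h in halves]
    while True:  # document bootstrap; resample until every label has both classes
        boot = rng.integers(0, len(docs), len(docs))
        if all(0 < Y[boot][:, j].sum() < len(boot) for j in range(Y.shape[1])):
            break
    Xr = np.hstack([s.transform(X[boot][:, h]) for s, h in zip(svds, halves)])
    clf = OneVsRestClassifier(LogisticRegression()).fit(Xr, Y[boot])
    ensemble.append((halves, svds, clf))

def predict_scores(Xnew):
    """Average the members' per-label probabilities (usable for ranking metrics)."""
    preds = []
    for halves, svds, clf in ensemble:
        Xr = np.hstack([s.transform(Xnew[:, h]) for s, h in zip(svds, halves)])
        preds.append(clf.predict_proba(Xr))
    return np.mean(preds, axis=0)

print(predict_scores(X).round(2))
```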
- Published
- 2016
- Full Text
- View/download PDF
7. Text classification using genetic algorithm oriented latent semantic features
- Author
-
Alper Kursat Uysal and Serkan Gunal
- Subjects
Genetic algorithm, Probabilistic latent semantic analysis, Latent semantic analysis, Computer science, General Engineering, Pattern recognition, Feature selection, Filter (signal processing), Computer Science Applications, Artificial Intelligence, Feature (computer vision), Artificial intelligence, Data mining, Latent semantic indexing, Projection (set theory), Text classification - Abstract
In this paper, genetic algorithm oriented latent semantic features (GALSF) are proposed to obtain a better representation of documents in text classification. The proposed approach consists of feature selection and feature transformation stages. The first stage is carried out using state-of-the-art filter-based methods. The second stage employs latent semantic indexing (LSI) empowered by a genetic algorithm, so that a better projection is attained using appropriate singular vectors, which are not limited to those corresponding to the largest singular values, unlike the standard LSI approach. In this way, singular vectors with small singular values may also be used for projection, whereas vectors with large singular values may be eliminated, to obtain better discrimination. Experimental results demonstrate that GALSF outperforms both LSI and filter-based feature selection methods on benchmark datasets for various feature dimensions.
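A toy rendition of the second stage, with invented spam/ham documents: a small genetic loop searches over binary masks that pick arbitrary singular vectors (not just the leading ones) for projection, scoring each mask by cross-validated accuracy. The GA operators here are deliberately simplistic and do not mirror the paper's exact configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

docs = ["cheap pills online offer now", "project meeting agenda attached",
        "win a free prize click here", "quarterly report draft for review",
        "limited offer buy cheap now", "schedule review meeting tomorrow",
        "free prize winner claim now", "attached draft of the agenda"]
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # spam vs. ham

X = TfidfVectorizer().fit_transform(docs).toarray()
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = len(s)  # candidate singular vectors, small and large values alike

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    Xp = X @ Vt[mask.astype(bool)].T  # project onto the selected singular vectors
    return cross_val_score(LogisticRegression(), Xp, y, cv=2).mean()

pop = rng.integers(0, 2, size=(10, r))  # random initial population of masks
for _ in range(15):  # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-5:]]           # keep the fittest half
    cross = rng.integers(0, 2, size=(5, r))          # uniform crossover pattern
    children = np.where(cross, parents, parents[rng.permutation(5)])
    children ^= (rng.random((5, r)) < 0.1)           # mutation: random bit flips
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected singular vectors:", np.flatnonzero(best))
```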
- Published
- 2014
- Full Text
- View/download PDF
8. Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: A study on blockchain technology trend analysis.
- Author
-
Kim, Suhyeon, Park, Haecheong, and Lee, Junghye
- Subjects
- *
LATENT semantic analysis, *TREND analysis, *EXPERT systems, *INDUSTRY 4.0, *TECHNOLOGY, *STATISTICS - Abstract
• Blockchain has considerable value as one of the promising technologies of Industry 4.0. • We propose a new topic modeling method based on word embedding and clustering. • The proposed method outperforms an existing method in both qualitative and quantitative terms. • The proposed method contributes to analyzing the research trend of blockchain technology. Blockchain has become one of the core technologies in Industry 4.0. To help decision-makers establish action plans based on blockchain, analyzing trends in blockchain technology is an urgent task. However, most existing studies on blockchain trend analysis rely either on labor-intensive full-text investigation or on traditional bibliometric methods whose scope is limited to frequency-based statistical analysis. Therefore, in this paper, we propose a new topic modeling method called Word2vec-based Latent Semantic Analysis (W2V-LSA), which is based on Word2vec and spherical k-means clustering, to better capture and represent the context of a corpus. We then used W2V-LSA to perform an annual trend analysis of blockchain research by country and time for 231 abstracts of blockchain-related papers published over the past five years. The performance of the proposed algorithm was compared to Probabilistic LSA, one of the common topic modeling techniques. The experimental results confirmed the usefulness of W2V-LSA in terms of the accuracy and diversity of topics, by both quantitative and qualitative evaluation. The proposed method can be a competitive alternative for better topic modeling, providing direction for future research in technology trend analysis, and it is applicable to various expert systems related to text mining. [ABSTRACT FROM AUTHOR]
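The two building blocks named in the abstract can be sketched as follows, assuming gensim for Word2vec; KMeans on L2-normalized vectors serves as a stand-in for spherical k-means. The four mini-abstracts and all hyperparameters are invented.

```python
import numpy as np
from gensim.models import Word2Vec  # pip install gensim
from sklearn.cluster import KMeans

abstracts = ["blockchain smart contract security on ethereum",
             "supply chain traceability with blockchain ledger",
             "consensus protocol for distributed ledger scalability",
             "smart contract vulnerability detection methods"]
tokens = [a.split() for a in abstracts]

w2v = Word2Vec(tokens, vector_size=16, window=3, min_count=1, seed=0, epochs=200)

words = list(w2v.wv.index_to_key)
vecs = w2v.wv[words]
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit length

# KMeans on unit vectors approximates spherical k-means
# (clustering by cosine rather than Euclidean geometry).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vecs)
for c in range(3):
    print(c, [w for w, lab in zip(words, km.labels_) if lab == c])
```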
- Published
- 2020
- Full Text
- View/download PDF
9. Using Google latent semantic distance to extract the most relevant information
- Author
-
Shi-Jen Lin, P. C. Chen, and Ya-Chi Chu
- Subjects
User profile, Information retrieval, Probabilistic latent semantic analysis, Computer science, General Engineering, Semantic search, Construct (python library), Computer Science Applications, Domain (software engineering), Search engine, Semantic similarity, Artificial Intelligence, Reading (process) - Abstract
Research highlights: • We adapted the Google similarity distance algorithm into a more efficient new algorithm. • We used PLSA to enhance the original 2-gram NGD into a 3-gram algorithm. • We extract the most important sequence of keywords to provide the most relevant search results to the user. There have been many studies of how to help users enter more keywords into a search engine to find the most relevant documents. Methods previously reported in the literature require a database to store the user profile and a well-trained model to suggest the potential "next keyword" to the user. Because such predictive models are based on training data, they can only be used in a single knowledge domain. In this paper, we describe a new algorithm called "Google latent semantic distance" (GLSD) and use it to extract the most important sequence of keywords so as to provide the most relevant search results to the user. Our method uses on-line, real-time processing and needs no training data; thus, it can be used in different knowledge domains. Our experiments show that the GLSD achieves high accuracy, and in most cases the most relevant information can be found within the top search results. We believe that this new system can increase users' effectiveness in both reading and writing articles.
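GLSD builds on the normalized Google distance (NGD) of Cilibrasi and Vitanyi; a minimal sketch of the base 2-gram NGD is below. The hit counts and index size are hard-coded placeholders where a live search-engine query would go, and the paper's PLSA-based 3-gram extension is not shown.

```python
import math

# Placeholder hit counts standing in for live search-engine queries:
# f[x] = pages containing x, f[(x, y)] = pages containing both terms.
N = 25_000_000_000  # assumed number of indexed pages

f = {"machine": 1_500_000_000, "learning": 2_000_000_000,
     ("machine", "learning"): 900_000_000}

def ngd(x, y):
    """Normalized Google Distance between two terms; smaller = more related."""
    fx, fy, fxy = math.log(f[x]), math.log(f[y]), math.log(f[(x, y)])
    return (max(fx, fy) - fxy) / (math.log(N) - min(fx, fy))

print(round(ngd("machine", "learning"), 3))
```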
- Published
- 2011
- Full Text
- View/download PDF
10. A novel sentence similarity measure for semantic-based expert systems
- Author
-
Ming Che Lee
- Subjects
Computer science, WordNet, Semantic role labeling, Semantic similarity, Semantic equivalence, Artificial Intelligence, Explicit semantic analysis, Semantic computing, Semantic Web, Semantic compression, Information retrieval, Probabilistic latent semantic analysis, General Engineering, Expert system, Computer Science Applications, Vector space model, Ontology, Semantic technology, Artificial intelligence, Natural language processing, Sentence, Natural language - Abstract
Research highlights: • This research takes advantage of corpus-based ontology and information retrieval technologies to evaluate the semantic similarity between irregular sentences. • The part-of-speech concept is taken into account and integrated into the proposed semantic-VSM measure. • This research attempts to quantify the semantic similarity of natural language sentences. A novel sentence similarity measure for semantic-based expert systems is presented. A well-known problem in fields of semantic processing, such as QA systems, is evaluating the semantic similarity between irregular sentences. This paper takes advantage of corpus-based ontology to overcome this problem. A transformed vector space model is introduced in this article. The proposed two-phase algorithm evaluates the semantic similarity of two or more sentences via a semantic vector space. The first phase builds part-of-speech (POS) based subspaces from the raw data, and the second carries out a cosine evaluation and adopts the WordNet ontology to construct the semantic vectors. Unlike related research that focuses only on short sentences, our algorithm is applicable to short (4-5 words), medium (8-12 words), and even long sentences (over 12 words). The experiments demonstrate that the proposed algorithm has outstanding performance in handling long sentences with complex syntax. The significance of this research lies in the semantic similarity extraction of sentences with arbitrary structures.
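A crude sketch of the WordNet scoring core, assuming NLTK: word pairs are scored by path similarity and aggregated into a symmetric sentence score. This omits the paper's POS-based subspaces and transformed vector space model, which are its actual contributions.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def word_sim(w1, w2):
    """Best WordNet path similarity over the first few senses of two words."""
    s1, s2 = wn.synsets(w1), wn.synsets(w2)
    scores = [a.path_similarity(b) for a in s1[:3] for b in s2[:3]]
    scores = [sc for sc in scores if sc is not None]
    return max(scores, default=0.0)

def sentence_sim(sa, sb):
    """Average best-match word similarity in both directions (a crude semantic VSM)."""
    ta, tb = sa.lower().split(), sb.lower().split()
    ab = sum(max(word_sim(w, v) for v in tb) for w in ta) / len(ta)
    ba = sum(max(word_sim(w, v) for v in ta) for w in tb) / len(tb)
    return (ab + ba) / 2

print(round(sentence_sim("the dog chased a cat", "a hound pursued the kitten"), 3))
```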
- Published
- 2011
- Full Text
- View/download PDF
11. Using backward elimination with a new model order reduction algorithm to select best double mixture model for document clustering
- Author
-
Farshad Almasganj and Tahereh Emami Azadi
- Subjects
Probabilistic latent semantic analysis, Model selection, General Engineering, Pattern recognition, Latent variable, Document clustering, Mixture model, Latent Dirichlet allocation, Latent class model, Computer Science Applications, Artificial Intelligence, Expectation–maximization algorithm, Data mining, Artificial intelligence, Algorithm, Mathematics - Abstract
Probabilistic latent semantic analysis (PLSA) is a double-structure mixture model that has found wide application in text and web mining. The method establishes hidden semantic relations among the observed features using a number of latent variables. In this approach, selecting the correct number of latent variables is critical; in most previous research, the number of latent topics was chosen to match the number of invoked classes. This paper presents a method, based on a backward elimination approach, that is capable of unsupervised order selection in PLSA. The method starts with a model having more components than needed and then prunes the mixture to its optimum size. During the elimination process, the proper selection of the latent variables to delete is the most essential problem, and it bears directly on the final performance of the pruned model. To treat this problem, we introduce a new combined pruning method that selects the best candidates for removal while keeping the computational cost low. We conducted experiments on two datasets from the Reuters-21578 corpus. The results show that this algorithm leads to an optimized number of latent variables and in turn achieves better clustering performance than conventional model selection methods. It also shows superiority over the case in which a PLSA model with a fixed number of latent variables, equal to the real number of clusters, is exploited.
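A stand-in illustration of backward elimination for model order selection: NMF with a Kullback-Leibler loss (closely related to PLSA up to normalization) is deliberately over-provisioned with components, and the component whose removal least worsens the reconstruction is pruned at each step. The data are toy, and the stop-at-one loop below replaces the paper's actual combined pruning criterion.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = ["grain wheat corn export", "oil crude barrel price",
        "wheat harvest grain yield", "crude oil price rises",
        "corn grain shipment export", "barrel price of crude oil"]
X = CountVectorizer().fit_transform(docs).toarray().astype(float)

K = 5  # deliberately more components than the two real clusters
nmf = NMF(n_components=K, solver="mu", beta_loss="kullback-leibler",
          init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(X); H = nmf.components_

def kl_error(Wk, Hk):
    """Generalized KL divergence between X and its reconstruction."""
    R = np.clip(Wk @ Hk, 1e-12, None)
    Xc = np.clip(X, 1e-12, None)
    return (X * np.log(Xc / R) - X + R).sum()

# Backward elimination: repeatedly drop the component whose removal
# hurts the reconstruction the least (a real criterion would stop
# when the error jumps sharply instead of going all the way to one).
keep = list(range(K))
while len(keep) > 1:
    errs = [kl_error(W[:, [i for i in keep if i != j]],
                     H[[i for i in keep if i != j]]) for j in keep]
    j_best = keep[int(np.argmin(errs))]
    print(f"dropping component {j_best}: error {min(errs):.2f}")
    keep.remove(j_best)
```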
- Published
- 2009
- Full Text
- View/download PDF
12. Incorporating topic transition in topic detection and tracking algorithms
- Author
-
Shiyong Zhang and Jianping Zeng
- Subjects
Topic model, Information retrieval, Probabilistic latent semantic analysis, Artificial Intelligence, Computer science, Transition (fiction), General Engineering, Document clustering, Hidden Markov model, Tracking (particle physics), Mixture model, Algorithm, Computer Science Applications - Abstract
Topics often transition across documents in a document collection. To improve the accuracy of topic detection and tracking (TDT) algorithms in discovering topics or classifying documents, it is necessary to make full use of this topic transition information. However, TDT algorithms usually find topics with topic models such as LDA and pLSI, which are mixture models in which topic transitions are difficult to represent and implement. A topic transition model based on the hidden Markov model is presented, and learning topic transitions from documents is discussed. Based on the model, two TDT algorithms incorporating topic transition, one for topic discovery and one for document classification, are provided to show the application of the proposed model. Experiments on two real-world document collections were performed with the two algorithms, and comparison with similar algorithms shows that accuracy reaches 93% for topic discovery on Reuters-21578 and 97.3% for document classification. Furthermore, the topic transitions discovered by the algorithm on a dataset collected from a BBS website are consistent with manual analysis results.
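The transition-learning step can be illustrated in a few lines: given per-document topic labels in time order, the maximum-likelihood HMM-style transition matrix is simply the row-normalized matrix of adjacent-pair counts. The label sequences below are invented.

```python
import numpy as np

# Per-document topic label sequences (e.g., produced by a topic model over
# a time-ordered collection); topic ids are 0..2. Illustrative data only.
sequences = [[0, 0, 1, 1, 2], [0, 1, 1, 2, 2], [0, 0, 1, 2, 2]]

K = 3
counts = np.zeros((K, K))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):  # adjacent documents in time order
        counts[a, b] += 1

# Row-normalize counts into an HMM-style topic transition matrix.
transition = counts / counts.sum(axis=1, keepdims=True)
print(transition.round(2))
```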
- Published
- 2009
- Full Text
- View/download PDF
13. Web page classification based on a support vector machine using a weighted vote schema
- Author
-
Rung-Ching Chen and Chung Hsun Hsieh
- Subjects
Information retrieval, Probabilistic latent semantic analysis, Latent semantic analysis, Computer science, Feature extraction, General Engineering, Feature selection, Computer Science Applications, Support vector machine, Schema (genetic algorithms), Artificial Intelligence, Web query classification, Web page, Website Parse Template - Abstract
Traditional information retrieval methods use keywords occurring in documents to determine the class of the documents, but they often retrieve unrelated web pages. In order to classify web pages effectively while solving the synonymous keyword problem, we propose a web page classification method based on a support vector machine (SVM) using a weighted vote schema for various features. The system uses both latent semantic analysis and web page feature selection for training and recognition by the SVM model. Latent semantic analysis is used to find the semantic relations between keywords and between documents; it projects terms and documents into a vector space to find latent information in a document. At the same time, we extract text features from web page content, through which web pages are classified into a suitable category. These two feature sets are sent to the SVM for training and testing, respectively. Based on the output of the SVM, a voting schema is used to determine the category of the web page. Experimental results indicate that our method is more effective than traditional methods.
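A minimal sketch of the weighted vote schema with scikit-learn: one SVM is trained on LSA features, one on raw TF-IDF features, and their decision scores are combined with fixed weights. The pages, labels, and the 0.6/0.4 weights are invented, not the paper's values.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import SVC

pages = ["laptop review benchmark cpu", "senate passes budget bill",
         "gpu benchmark gaming laptop", "election results budget debate",
         "cpu cooler review gaming", "parliament votes on the bill"]
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = tech, 1 = politics

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(pages)

# Feature set 1: latent semantic space; feature set 2: raw term space.
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X)

svm_lsa = SVC().fit(X_lsa, y)
svm_txt = SVC().fit(X, y)

def weighted_vote(texts, w_lsa=0.6, w_txt=0.4):  # assumed weights
    Xt = tfidf.transform(texts)
    score = (w_lsa * svm_lsa.decision_function(lsa.transform(Xt))
             + w_txt * svm_txt.decision_function(Xt))
    return (score > 0).astype(int)  # combined vote decides the category

print(weighted_vote(["new gpu benchmark results"]))
```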
- Published
- 2006
- Full Text
- View/download PDF
14. A semantic learning for content-based image retrieval using analytical hierarchy process
- Author
-
Tzu-Chuan Chou, Shyi-Chyi Cheng, Hung-Yi Chang, and Chao-Lung Yang
- Subjects
Information retrieval, Probabilistic latent semantic analysis, Computer science, General Engineering, Semantic search, Relevance feedback, Semantic data model, Content-based image retrieval, Computer Science Applications, Automatic image annotation, Semantic similarity, Semantic equivalence, Artificial Intelligence, Explicit semantic analysis, Semantic computing, Semantic learning, Semantic technology, Visual Word, Image retrieval, Semantic compression - Abstract
In this paper, a new semantic learning method for content-based image retrieval using the analytic hierarchy process (AHP) is proposed. AHP, proposed by Saaty, provides a systematic way to solve multi-criteria preference problems involving qualitative data and has been widely applied to a great diversity of areas. In general, the interpretations of an image are multiple and hard to describe in terms of low-level features, owing to the lack of a complete image understanding model. The AHP provides a good way to evaluate the fitness of a semantic description used to interpret an image. According to a predefined concept hierarchy, a semantic vector, consisting of the fitness values of the semantic descriptions of a given image, is used to represent the semantic content of the image. Based on the semantic vectors, the database images are clustered. For each semantic cluster, the weightings of the low-level features (color, shape, and texture) used to represent the content of the images are calculated by analyzing the homogeneity of the class. The weightings assigned to the three low-level feature types thus differ across semantic clusters for retrieval. The proposed semantic learning scheme bridges the gap between high-level semantic concepts and low-level features in content-based image retrieval. Experimental results show that the performance of the proposed method is excellent compared with traditional text-based semantic retrieval techniques and content-based image retrieval methods.
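The AHP step at the heart of the method reduces to an eigenvector computation: given a Saaty-style pairwise comparison matrix over candidate semantic descriptions, the normalized principal eigenvector yields the fitness values that form the image's semantic vector. The matrix below is illustrative only.

```python
import numpy as np

# Saaty-style pairwise comparison matrix for three semantic descriptions
# of an image: A[i, j] = how strongly description i is preferred over
# description j on the 1-9 scale (values invented for illustration).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])

# AHP weights = principal eigenvector of A, normalized to sum to 1.
vals, vecs = np.linalg.eig(A)
w = np.real(vecs[:, np.argmax(np.real(vals))])
w = w / w.sum()
print(w.round(3))  # fitness values forming the image's semantic vector
```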
- Published
- 2005
- Full Text
- View/download PDF