35 results on '"P. C. Reghu Raj"'
Search Results
2. A novel technique using graph neural networks and relevance scoring to improve the performance of knowledge graph-based question answering systems.
- Author
-
Sincy V. Thambi and P. C. Reghu Raj
- Published
- 2024
- Full Text
- View/download PDF
3. State-of-the-Art Methods for Fine-Grained Emotion Detection from Malayalam Text using Deep Learning: A Survey
- Author
-
K Anuja, P C Reghu Raj, and Remesh Babu K R
- Published
- 2022
- Full Text
- View/download PDF
4. Mapping Documents Onto Concept Databases for Threshold-Based Retrieval.
- Author
-
P. C. Reghu Raj and S. Raman 0001
- Published
- 2003
5. Design of a high speed string matching co-processor for NLP.
- Author
-
Vadali Srinivasa Murty, P. C. Reghu Raj, and S. Raman 0001
- Published
- 2003
- Full Text
- View/download PDF
6. Phrase Grammar-Based Automatic Conceptual Tagging System.
- Author
-
P. C. Reghu Raj and S. Raman 0001
- Published
- 2003
7. Augmenting Phrase-Based Text Representation with Conceptual Indexing for Effective Retrieval.
- Author
-
Rupali Sharma, P. C. Reghu Raj, and S. Raman 0001
- Published
- 2003
8. A Phrase Grammar-Based Conceptual Indexing Paradigm.
- Author
-
P. C. Reghu Raj and S. Raman 0001
- Published
- 2005
- Full Text
- View/download PDF
9. Summarization and Categorization of Text Data in High-Level Data Cleaning for Information Retrieval.
- Author
-
M. Saravanan 0001, P. C. Reghu Raj, and S. Raman 0001
- Published
- 2003
- Full Text
- View/download PDF
10. Architecture of an Ontology-Based Domain-Specific Natural Language Question Answering System.
- Author
-
Athira P. M., Sreeja M., and P. C. Reghu Raj
- Published
- 2013
11. Pre-trained Word Embeddings for Malayalam Language: A Review
- Author
-
P C Rafeeque, K Reji Rahmath, and P C Reghu Raj
- Subjects
0209 industrial biotechnology ,Word embedding ,business.industry ,Computer science ,Sentiment analysis ,Context (language use) ,02 engineering and technology ,Semantic property ,computer.software_genre ,Semantics ,language.human_language ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,Malayalam ,language ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing ,Meaning (linguistics) - Abstract
Word embeddings convert human language into numerical form by encoding the semantic properties of words. With them, each word can be transformed into an N-dimensional vector. They play a vital role in language-processing applications such as natural language inference, information retrieval, and sentiment analysis. The goal of word embedding is to capture the meaning of words in their context and to find the semantic relationships and similarities between words. The aim of this work is to summarize the existing word embedding techniques and the available corpora for the Malayalam language. Since Malayalam is a resource-constrained Indian language, this paper is expected to help NLP researchers working on Malayalam to identify the existing resources and to improve the current research trend.
- Published
- 2021
- Full Text
- View/download PDF
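As an illustration of how embeddings expose similarity between words, the sketch below compares vectors with cosine similarity. The three-dimensional vectors and the English words are hypothetical placeholders chosen for readability; they are not taken from the reviewed paper or from any Malayalam resource.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (real models use hundreds of dimensions).
embeddings = {
    "river": [0.9, 0.1, 0.0],
    "stream": [0.8, 0.2, 0.1],
    "loan": [0.0, 0.1, 0.9],
}

near = cosine_similarity(embeddings["river"], embeddings["stream"])
far = cosine_similarity(embeddings["river"], embeddings["loan"])
print(near > far)
```

Words whose vectors point in similar directions score close to 1, and unrelated words score close to 0.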
12. Improving relation extraction beyond sentence boundaries using attention
- Author
-
C. A. Deepa, P. C. Reghu Raj, and Ajeesh Ramanujan
- Subjects
0209 industrial biotechnology ,Sentence boundary disambiguation ,Relation (database) ,Computer science ,business.industry ,Self attention ,02 engineering and technology ,computer.software_genre ,Relationship extraction ,Focus (linguistics) ,Information extraction ,020901 industrial engineering & automation ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Word (computer architecture) ,Sentence - Abstract
Relation Extraction (RE) is the subprocess of Information Extraction (IE) that focuses on determining and extracting the relation between two participating entities. Most past work focuses on extracting relations within a sentence. More recent research on relation extraction focuses on identifying and determining relationships between participating entities across sentences. This paper proposes a bi-directional GRU model with a self-attention mechanism for inter-sentential relation extraction. First, a bi-directional GRU with self-attention is used to capture the information about the relation from the intermediary terms between two entities. Then a bi-directional GRU is used to capture the information represented by the entities, which plays a vital role in relation extraction. Finally, the proposed model combines both word embeddings and entity embeddings for extracting a relation. Experimental results show that the proposed bi-directional GRU model can deliver state-of-the-art results on relation classification. Applying the self-attention mechanism to intermediary terms improves the performance of relation extraction. Experimental results show that the F-measure of the proposed inter-sentential relation extraction is 0.75, which is better than state-of-the-art systems for inter-sentential relation extraction on the same dataset.
- Published
- 2021
- Full Text
- View/download PDF
13. Relation Extraction across sentences using Bi-directional Long Short Term Memory Networks
- Author
-
Ajeesh Ramanujan, P C Reghu Raj, and C. A. Deepa
- Subjects
Sentence boundary disambiguation ,Relation (database) ,Computer science ,business.industry ,computer.software_genre ,Relationship extraction ,Field (computer science) ,Long short term memory ,Relation classification ,Work (electrical) ,Artificial intelligence ,business ,computer ,Natural language processing ,Sentence - Abstract
Most of the past work on relation extraction (RE) has focused on identifying relationships between entities within a sentence. More recently, most research in the field of RE has turned to relation extraction between entity pairs across sentence boundaries. This paper proposes a bi-directional LSTM model for inter-sentential RE. Experimental results show that the proposed Bi-LSTM model can achieve better results on relation classification by capturing the information hidden in long-distance relation patterns.
- Published
- 2020
- Full Text
- View/download PDF
14. Web Page Ranking Using Multilingual Information Search Algorithm - A Novel Approach
- Author
-
P.V. Vidya, P. C. Reghu Raj, and V. Jayan
- Subjects
Cognitive models of information retrieval ,Information retrieval ,Concept search ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,Google Search ,Word Frequency ,020206 networking & telecommunications ,02 engineering and technology ,Google Translate API ,Query language ,Ranking (information retrieval) ,World Wide Web ,Query expansion ,Human–computer information retrieval ,0202 electrical engineering, electronic engineering, information engineering ,General Earth and Planetary Sciences ,Multilingual Information Retrieval ,Inverted Index ,020201 artificial intelligence & image processing ,Relevance (information retrieval) ,Information filtering system ,General Environmental Science - Abstract
The goal of an information retrieval system is to provide the information that is relevant to the user's query. In some cases the information relevant to the user request may not exist in the user's native language. Situations may also arise where the user is able to read documents in languages different from the native one, but might have difficulty in formulating queries in them. The main intention behind Multilingual Information Retrieval is to find the relevant information available irrespective of the language used in the query.
- Published
- 2016
- Full Text
- View/download PDF
15. Sandhi Splitter for Malayalam Using MBLP Approach
- Author
-
P. C. Reghu Raj and M. Nisha
- Subjects
Topic model ,Agglutinative language ,Memory Based Language Processing ,Computer science ,Speech recognition ,computer.software_genre ,050105 experimental psychology ,Sandhi ,030507 speech-language pathology & audiology ,03 medical and health sciences ,0501 psychology and cognitive sciences ,Segmentation ,General Environmental Science ,business.industry ,05 social sciences ,Search engine indexing ,Malayalam Morphology ,language.human_language ,Identification (information) ,Malayalam ,language ,General Earth and Planetary Sciences ,Artificial intelligence ,Memory Based Learning ,Suffix ,0305 other medical science ,business ,computer ,Natural language processing - Abstract
The morphological richness and the agglutinative nature of Malayalam make it necessary to retrieve the root word from its inflected form in most NLP tasks. This paper presents an approach to identifying the suffixes of Malayalam words using the Memory Based Language Processing (MBLP) algorithm. MBLP is an approach to language processing based on exemplar storage during learning and analogical reasoning during processing. Sandhi splitting is essential for morphological analysis, document indexing and topic modeling. Suffix separation improves the quality of machine-translated text. Training instances created from words are manually annotated for their segmentation, and the system is trained using TiMBL (Tilburg Memory Based Learner). The paper presents a memory-based model of Malayalam suffix identification and its generalization accuracy.
- Published
- 2016
- Full Text
- View/download PDF
16. Unsupervised Approach to Word Sense Disambiguation in Malayalam
- Author
-
P. C. Reghu Raj, V. Jayan, and K.P. Sruthi Sankar
- Subjects
Machine translation ,Information extraction ,Computer science ,media_common.quotation_subject ,Word sense disambiguation ,Context (language use) ,02 engineering and technology ,computer.software_genre ,Context similarity ,Unsupervised methods ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,Collocations ,General Environmental Science ,media_common ,business.industry ,Ambiguity ,language.human_language ,SemEval ,Word lists by frequency ,Malayalam ,language ,General Earth and Planetary Sciences ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Natural language processing - Abstract
Word Sense Disambiguation (WSD) is the task of identifying the correct sense of a word in a specific context when the word has multiple meanings. WSD is very important as an intermediate step in many Natural Language Processing (NLP) tasks, especially in Information Extraction (IE), Machine Translation (MT) and Question Answering systems. Word sense ambiguity arises when a particular word has more than one possible sense, and every language includes many ambiguous words. Since the sense of a word depends on its context of use, the disambiguation process requires the understanding of word knowledge. Automatic WSD systems are available for structured languages like English, Chinese, etc. But Indian languages are morphologically rich, and thus the processing task is very complex. The aim of this work is to develop a WSD system for Malayalam, a language spoken in India, predominantly in the state of Kerala. The proposed system uses a corpus collected from various Malayalam web documents. For each possible sense of the ambiguous word, a relatively small set of training examples (a seed set) that represents the sense is identified. Collocations and the most frequently co-occurring words are considered as training examples. A seed set expansion module extends the seed set by adding the words most similar to the seed set elements. These extended sets act as sense clusters. The sense cluster most similar to the input text context is taken as the sense of the target word.
- Published
- 2016
- Full Text
- View/download PDF
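The final step described in the abstract above, choosing the sense cluster most similar to the input context, can be sketched as follows. The abstract does not name the similarity measure, so Jaccard overlap is used here as an assumption; the function name `pick_sense` and the English toy clusters are hypothetical and stand in for the expanded Malayalam seed sets.

```python
def pick_sense(context_words, sense_clusters):
    """Return the sense whose cluster overlaps most with the context.

    Similarity is Jaccard overlap (an assumption; the abstract does not
    specify the measure used by the authors).
    """
    context = set(context_words)

    def score(sense):
        cluster = set(sense_clusters[sense])
        union = context | cluster
        return len(context & cluster) / len(union) if union else 0.0

    return max(sense_clusters, key=score)

# Hypothetical expanded seed sets for two senses of an ambiguous word.
clusters = {
    "finance": ["money", "loan", "deposit", "account"],
    "river": ["water", "shore", "flow", "fishing"],
}

sentence = ["he", "went", "fishing", "near", "the", "water"]
print(pick_sense(sentence, clusters))
```

In a full system the clusters would come from the seed set expansion module rather than being written by hand.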
17. Paragraph Ranking Based on Eigen Analysis
- Author
-
O. K. Reshma and P. C. Reghu Raj
- Subjects
Eigen analysis ,Information retrieval ,Software_GENERAL ,Computer science ,media_common.quotation_subject ,Fuzzy graph ,Paragraph correlation ,Ranking (information retrieval) ,Data set ,Semantic similarity ,Reading (process) ,Node (computer science) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,General Earth and Planetary Sciences ,Paragraph ,Paragraph ranking ,General Environmental Science ,media_common - Abstract
The information contained in the document can be retrieved from its most significant paragraph, rather than by reading the whole document. The proposed work ranks the paragraphs of a text document using eigen analysis and returns the most important paragraph of a document. The importance of each paragraph is determined based on the correlation between the paragraphs. The proposed method explores the use of fuzzy graphs in capturing the inter-paragraph correlation of text documents. This approach models the document as a fuzzy graph where a node refers to a paragraph and an edge indicates the relationship between the paragraphs. The correlation between paragraphs is measured by extracting their semantic similarity. The importance of each node is determined based on this correlation. Subsequently the system ranks the paragraphs according to their importance. The proposed system is evaluated using DUC 2001 data set. The ROUGE scores show that the significant paragraph suggested by the proposed method covers relatively a good amount of relevant information in the document.
- Published
- 2015
- Full Text
- View/download PDF
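The ranking idea in the abstract above can be sketched as eigenvector centrality on a paragraph similarity graph. This is a minimal illustration under stated assumptions: word-level Jaccard overlap stands in for the semantic similarity measure (which the abstract does not specify), edge weights in [0, 1] play the role of fuzzy memberships, and the principal eigenvector is found by power iteration. The function names are hypothetical.

```python
import math

def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; a stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def rank_paragraphs(paragraphs, iterations=100):
    """Return paragraph indices ordered from most to least central."""
    n = len(paragraphs)
    # Weighted adjacency matrix; each paragraph has similarity 1 with itself.
    w = [[jaccard(paragraphs[i], paragraphs[j]) for j in range(n)]
         for i in range(n)]
    v = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        v = [x / norm for x in nxt]
    # v now approximates the principal eigenvector of w.
    return sorted(range(n), key=lambda i: v[i], reverse=True)

docs = [
    "wind power grid stability",
    "wind power grid control",
    "wind power turbines",
    "cooking pasta recipes",
]
ranking = rank_paragraphs(docs)
print(ranking)
```

Paragraphs that share content with many others receive high eigenvector scores, while an unrelated paragraph drifts to the bottom of the ranking.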
18. RTRL based adaptive neuro-controller for damping SSR oscillations in SCIG based windfarms
- Author
-
P. C. Reghu Raj and K. C. Sindhu Thampatty
- Subjects
Wind power ,business.industry ,Computer science ,020209 energy ,Induction generator ,02 engineering and technology ,Grid ,Wind speed ,Renewable energy ,Electric power system ,Electric power transmission ,Control theory ,0202 electrical engineering, electronic engineering, information engineering ,business - Abstract
As the global energy consumption is rising dramatically, wind energy is a prominent one among the renewable energy sources. The penetration of wind energy into grid is increasing day by day. In order to carry huge amount of wind power during the grid integration of large scale wind farms, high transmission line capability is demanded. In order to improve the power carrying capability of the transmission line and to improve the stability of the system, series compensation is the best practical solution. Series compensation can result in Sub-Synchronous Resonance (SSR) oscillations in the electrical system which will lead to damages in the system such as shaft failure. In this paper, a novel idea of using the Real Time Recurrent Learning (RTRL) based adaptive neuro controller is proposed for damping SSR oscillations in grid connected windfarms. The controller is trained in real time without a reference model. The effectiveness of the proposed controller is tested under varying series compensation, wind speeds and grid impedance conditions and it has been proved that the proposed controller performs far better than any other linear controllers.
- Published
- 2017
- Full Text
- View/download PDF
19. Architecture of an Ontology-Based Domain-Specific Natural Language Question Answering System
- Author
-
P. C. Reghu Raj, M Sreeja, and P M Athira
- Subjects
Information retrieval ,Computer science ,Process (engineering) ,Question answering ,Ontology ,Domain knowledge ,Architecture ,Semantic Web ,Natural language ,Domain (software engineering) - Abstract
A question answering (QA) system aims at retrieving precise information from a large collection of documents in response to a query. This paper describes the architecture of a Natural Language Question Answering (NLQA) system for a specific domain based on ontological information, a step towards semantic web question answering. The proposed architecture defines four basic modules suitable for enhancing current QA capabilities with the ability to process complex questions. The first module is question processing, which analyses and classifies the question and also reformulates the user query. The second module retrieves the relevant documents. The next module processes the retrieved documents, and the last module performs the extraction and generation of a response. Natural language processing techniques are used for processing the question and documents and also for answer extraction. Ontology and domain knowledge are used for reformulating queries and identifying the relations. The aim of the system is to generate a short and specific answer to a question asked in natural language in a specific domain. We have achieved 94 % accuracy of natural language question answering in our implementation.
- Published
- 2013
- Full Text
- View/download PDF
20. A ROBUST DLQG CONTROLLER FOR DAMPING OF SUB-SYNCHRONOUS OSCILLATIONS IN A SERIES COMPENSATED POWER SYSTEM
- Author
-
K C Sindhu Thampatty and P. C. Reghu Raj
- Subjects
LTI system theory ,Electric power system ,Engineering ,business.industry ,Robustness (computer science) ,Control theory ,Full state feedback ,Open-loop controller ,Kalman filter ,Modular design ,business ,Linear-quadratic-Gaussian control - Abstract
This paper investigates the use of a Discrete Linear Quadratic Gaussian (DLQG) compensator to damp sub-synchronous oscillations in a Thyristor Controlled Series Capacitor (TCSC) compensated power system. The study is conducted on the IEEE First Benchmark Model (FBM), in which the TCSC is modelled as a discrete linear time-invariant modular unit in the synchronously rotating DQ reference frame. This modular TCSC is then integrated with the Linear Time Invariant (LTI) model of the rest of the system. The design of the DLQG includes the design of a Kalman filter for full state estimation and a full state feedback for control. Since the order of the controller is as large as the order of the system considered here (27 states), the practical implementation of the controller is difficult. Hence, by using the Hankel norm approximation technique, the order of the controller is reduced from 27 to 15 without losing the significant system dynamics. The eigen analysis of the system shows that the DLQG can damp torsional oscillations as well as swing mode oscillations simultaneously, which is practically difficult for a conventional sub-synchronous damping controller. The performance of the system with the DLQG is appreciable for all operating conditions, which shows the robustness of the controller.
- Published
- 2013
- Full Text
- View/download PDF
21. Design and Implementation of RTRL Based Adaptive Controller for TCSC to enhance power system stability
- Author
-
K. C. Sindhu Thampatty and P. C. Reghu Raj
- Subjects
Engineering ,Artificial neural network ,business.industry ,020208 electrical & electronic engineering ,Stability (learning theory) ,Thyristor ,Control engineering ,02 engineering and technology ,law.invention ,Electric power system ,Capacitor ,Flexible AC transmission system ,Control theory ,law ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Electric power ,business - Abstract
Power system complexity is increasing day by day, and stable, secure and high-quality electrical power is mandatory in the present scenario. Flexible AC Transmission System (FACTS) devices such as the Thyristor Controlled Series Capacitor (TCSC) are commonly used nowadays to improve power system performance. This paper presents the design and implementation of a non-linear, adaptive Real Time Recurrent Learning (RTRL) algorithm based controller for the TCSC to damp power system oscillations and enhance the stability of the system. This control scheme requires two sets of neural networks. The first set is a neuro-identifier and the second set is a neuro-controller, which generates the required control signals for the thyristors.
- Published
- 2016
- Full Text
- View/download PDF
22. A Memory Based approach to Malayalam noun generation
- Author
-
Reji Rahmath K and P. C. Reghu Raj
- Subjects
Root (linguistics) ,Machine translation ,business.industry ,Computer science ,Speech recognition ,Part of speech ,computer.software_genre ,language.human_language ,Rule-based machine translation ,Noun ,Malayalam ,language ,Artificial intelligence ,business ,computer ,Natural language processing ,Word (computer architecture) ,Generator (mathematics) - Abstract
Words are the important building blocks of every language. A morphological generator is used to get the inflected form of a word, given its root word and a set of properties such as lexical category and morphological properties. Morphological generation and analysis are necessary for developing computational grammars as well as machine translation systems. This paper presents a morphological generator for Malayalam nouns using the Memory Based Language Processing (MBLP) approach. MBLP is an approach to language processing based on exemplar storage during learning, and analogical reasoning during processing. For training the system, a training corpus is created. It contains the basic examples of root words and their features. The feature set for this Malayalam noun generation system includes number, case, and the last syllable of the root word. Tilburg Memory Based Learner (TiMBL) is used for training the system. The system doesn't require a dictionary or rules for its working. It gives a satisfactory result, with an accuracy of 93.68%.
- Published
- 2015
- Full Text
- View/download PDF
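Memory-based learning, as described in the abstract above, stores training exemplars and classifies a new instance by analogy with its nearest stored neighbours. The sketch below shows that idea with a plain overlap distance over the three features the abstract lists (number, case, last syllable of the root). It is not the TiMBL API; the function names, feature values and suffix-class labels are hypothetical placeholders.

```python
from collections import Counter

def overlap_distance(a, b):
    """Number of feature positions in which two instances differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(instance, memory, k=1):
    """Label an instance by majority vote among its k nearest exemplars."""
    ranked = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical exemplars: (number, case, last syllable) -> suffix class.
memory = [
    (("sg", "nom", "an"), "S1"),
    (("pl", "nom", "an"), "S2"),
    (("sg", "acc", "am"), "S3"),
]

print(classify(("pl", "nom", "am"), memory))
```

Because all exemplars are kept, no rules or dictionary are needed; generalisation comes entirely from the similarity metric.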
23. Text chunker for Malayalam using Memory-Based Learning
- Author
-
P. C. Reghu Raj and C T Rekha Raj
- Subjects
Analogical reasoning ,Shallow parsing ,Phrase ,Computer science ,business.industry ,Speech recognition ,Reuse ,computer.software_genre ,Class (biology) ,language.human_language ,Chunking (psychology) ,Malayalam ,language ,Artificial intelligence ,business ,computer ,Natural language processing ,Word order - Abstract
Text chunking consists of dividing a text into syntactically correlated parts of words. Given the words and their morphosyntactic class, a chunker will decide which words can be grouped as chunks. Malayalam is a free word order language and has relatively unrestricted phrase structures that make the problem of chunking quite challenging. This paper aims to develop a text chunker for Malayalam using Memory-Based Learning (MBL) approach. Memory-Based Learning is a machine learning methodology based on the idea that the direct reuse of examples using analogical reasoning is more suited for solving language processing problems than the application of rules extracted from those examples. The chunker was trained using the tool Memory-Based Tagger (MBT) with words and their POS tags as features. The chunker demonstrated an accuracy of 97.14%.
- Published
- 2015
- Full Text
- View/download PDF
24. Tamil to Malayalam Transliteration
- Author
-
P.V. Vidya, T V Sreerekha, R. R. Rajeev, P. C. Reghu Raj, and Kavitha Raju
- Subjects
Machine translation ,Computer science ,business.industry ,Feature extraction ,Pragmatics ,computer.software_genre ,language.human_language ,ComputingMethodologies_PATTERNRECOGNITION ,Transcription (linguistics) ,Writing system ,Tamil ,Malayalam ,language ,Transliteration ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Transliteration forms an essential part of transcription which converts text from one writing system to another. The need for translating data has become larger than before as the world is getting together through social media. Machine transliteration has emerged as a part of information retrieval and machine translation projects to translate named entities, that are not registered in the dictionary, based on phonemes and graphemes. This paper proposes a machine learning technique that performs transliteration from Tamil to Malayalam, two languages that belong to Dravidian family. Transliteration can be used to supplement machine translation process by handling the issues that can happen due to the presence of named entities.
- Published
- 2015
- Full Text
- View/download PDF
25. Adaptive RTRL based hybrid controller for series connected FACTS devices for damping power system oscillations
- Author
-
K. C. Sindhu Thampatty and P. C. Reghu Raj
- Subjects
Engineering ,Artificial neural network ,business.industry ,Thyristor ,Control engineering ,Power (physics) ,law.invention ,Capacitor ,Recurrent neural network ,Control theory ,law ,Control system ,business ,MATLAB ,computer ,computer.programming_language - Abstract
This paper presents a novel design of a co-ordinated controller for series connected FACTS devices such as the Thyristor Controlled Series Capacitor (TCSC) and the Thyristor Controlled Power Angle Regulator (TCPAR). The scheme can be used for non-linear system control because it does not require an exact linearized mathematical model of the system, and it can control many FACTS devices with a single controller. The basis of the proposed design is the Real Time Recurrent Learning (RTRL) algorithm, in which the Neural Network (NN) is trained in real time. This requires two sets of neural networks. The first set is a fully connected Recurrent Neural Network (RNN), which acts as a neuro-identifier that provides the dynamic model of the system. The second set is the neuro-controller, used to generate the required control signals for the thyristors. Simulation results of the system using MATLAB/SIMULINK show that the performance of the system with the proposed controller is better than with conventional PI controllers and GA-based PI controllers.
- Published
- 2015
- Full Text
- View/download PDF
26. Malayalam morphological analysis using MBLP approach
- Author
-
P. C. Reghu Raj, C T Rekha Raj, Reji Rahmath K, and M. Nisha
- Subjects
Analogical reasoning ,Computer science ,Generalization ,business.industry ,Speech recognition ,computer.software_genre ,language.human_language ,Statistical classification ,Morphological analysis ,Malayalam ,language ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
This paper presents an approach to morphological analysis of Malayalam words as a classification Problem. The idea here is to use Memory Based Language Processing (MBLP) algorithm for Malayalam morphological analysis. MBLP is an approach to language processing based on exemplar storage during learning, and analogical reasoning during processing. The aim of the system is to find the citation forms (or meaningful parts) of words rather than a detailed morphological analysis. Training instances created from words are manually annotated for their segmentation and the system is trained using TiMBL (Tilburg Memory based Learner). The paper presents memory based model of Malayalam morphological analysis and its generalization accuracy.
- Published
- 2015
- Full Text
- View/download PDF
27. Fuzzy logic based hybrid approach for sentiment analysis of Malayalam movie reviews
- Author
-
P. C. Reghu Raj, Raveena R Kumar, M Anagha, and K Sreetha
- Subjects
Fuzzy classification ,Neuro-fuzzy ,business.industry ,Computer science ,Feature extraction ,Sentiment analysis ,Context (language use) ,computer.software_genre ,Machine learning ,Fuzzy logic ,language.human_language ,Mood ,Text mining ,Malayalam ,language ,Artificial intelligence ,business ,computer ,Sentence ,Natural language processing ,Natural language - Abstract
In this paper, Sentence level Sentiment Analysis of Malayalam movie reviews is done by classifying the polarity of opinions obtained from the user as positive, negative and neutral. Sentiment analysis is an application of Natural Language Processing and text analysis which helps to identify the emotions in a given context. In this work a hybrid approach for Sentiment Analysis is used in which Machine Learning method is used for tagging and Fuzzy Logic is used to find the membership of the review in each sentiment class. Fuzzy logic can be used to handle the vagueness in natural language. Manually tagged data are trained using TnT and Fuzzy rules are incorporated for identifying and classifying the emotions. Certain other rules are also incorporated to handle certain special cases. The fuzzy rules yield output that varies in degree between 0 and 1.
- Published
- 2015
- Full Text
- View/download PDF
28. TnT tagger with fuzzy rule based learning
- Author
-
Alen Jacob, P. C. Reghu Raj, and Amal Babu
- Subjects
Fuzzy rule ,Computer science ,business.industry ,Speech recognition ,Pattern recognition ,Context (language use) ,Viterbi algorithm ,Fuzzy logic ,Set (abstract data type) ,symbols.namesake ,symbols ,Probability distribution ,Artificial intelligence ,Forward algorithm ,Computational linguistics ,Hidden Markov model ,business ,Word (computer architecture) - Abstract
TnT is an efficient statistical Parts-of-speech (POS) Tagger based on Hidden Markov Model. TnT stands for Trigrams‘n’Tags. Viterbi algorithm is used for finding the best tag sequence for a given observation sequence of words. TnT performs well on known word sequences. But, the performance degrades with increase in the number of unknown words. In this paper, we propose a method to overcome this performance degradation using fuzzy rules. Fuzzy rule based model is designed to provide TnT with sufficient information about the tag of unknown words without degrading the performance of TnT. When TnT with fuzzy rule based learning encounters an unknown word, the TnT generates a set of possible tags for the given word, based on the fuzzy rules matched by the word. If the word does not match any fuzzy rule, then the model depends upon the probability distribution of the suffix. This approach guarantees that the performance of TnT will only be improved from its normal performance.
- Published
- 2015
- Full Text
- View/download PDF
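The abstract above mentions that the Viterbi algorithm finds the best tag sequence for an observed word sequence. The sketch below shows Viterbi decoding for a first-order (bigram) HMM to keep the example short; TnT itself conditions on trigrams, so this is a simplification. The toy probabilities are hypothetical, and the constant `unknown_p` is only a placeholder for proper unknown-word handling such as the suffix distribution or the fuzzy rules the paper proposes.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, unknown_p=1e-6):
    """Most probable tag sequence under a bigram HMM."""
    # best[i][t] = (probability of best path ending in tag t, previous tag)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], unknown_p), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prob, prev = max((best[-1][p][0] * trans_p[p][t], p) for p in tags)
            column[t] = (prob * emit_p[t].get(word, unknown_p), prev)
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy model with two tags.
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.5, "bark": 0.1}, "V": {"dogs": 0.05, "bark": 0.6}}

print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))
```

A production tagger would work with log probabilities to avoid underflow on long sentences.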
29. Towards improving the performance of language identification system for Indian languages
- Author
-
P. C. Reghu Raj, Abitha Anto, K. T. Sreekumar, and C. Santhosh Kumar
- Subjects
Hindi ,Phonotactics ,Language identification ,business.industry ,Computer science ,Speech recognition ,computer.software_genre ,language.human_language ,Data modeling ,Tamil ,language ,Malayalam ,Artificial intelligence ,Language model ,business ,computer ,Natural language processing ,Test data - Abstract
In this paper, we present the details of a phonotactic language identification (LID) system developed for five Indian languages: English (Indian), Hindi, Malayalam, Tamil and Kannada. Since there are no publicly available speech databases for English, Malayalam and Kannada, we developed the database for each of the target languages by downloading the audio files from YouTube videos and removing the non-speech signals manually. The system was tested using a test data set consisting of 40 utterances with durations of 30, 10, and 3 secs in each of the 5 target languages. The performance evaluation was done according to the NIST benchmarking sessions, separately for 30 s, 10 s and 3 s segments. For the baseline system, we got an overall EER of 10.41 %, 19.56 % and 31.45 % for 30, 10, and 3 sec segments when tested with a 3-gram language model. The use of a 4-gram language model has helped enhance the performance of the LID system to 9.81 %, 19.38 % and 32.77 % respectively for 30, 10 and 3 sec test segments. Further, by using n-gram smoothing, we were able to improve the EER of the LID system to 9.02 %, 18.70 % and 29.24 % for 3-gram language models and 8.88 %, 16.46 % and 32.03 % for 4-gram language models, respectively for 30, 10, and 3 sec test segments. The study shows that the use of 4-gram language models can help enhance the performance of LID systems for Indian languages.
- Published
- 2014
30. Random forest algorithm for improving the performance of speech/non-speech detection
- Author
-
P. C. Reghu Raj, K. T. Sreekumar, C. Santhosh Kumar, and Sincy V. Thambi
- Subjects
Voice activity detection, business.industry, Computer science, Speech recognition, Feature extraction, Decision tree, Pattern recognition, Feature selection, Random forest, Statistical classification, Frequency domain, Artificial intelligence, business, Smoothing
- Abstract
Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce the storage space required when only the speech segments of the audio documents are needed, for example in content analysis or spoken language identification. In this work, we experimented with time domain, frequency domain and cepstral domain features for short time frames of 20 ms, along with their mean and standard deviation over segments of 200 ms. We then analysed whether selecting a subset of the features can help improve the performance of the SND system. Towards this, we experimented with different feature selection algorithms and observed that correlation-based feature selection gave the best results. Further, we experimented with different decision tree classification algorithms and note that the random forest algorithm outperformed the other decision tree algorithms. We further improved the SND system performance by smoothing the decisions over 5 segments of 200 ms each. Our baseline system has 272 features and a classification accuracy of 94.45 %; the final system, with 8 features, has a classification accuracy of 97.80 %.
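The decision-smoothing step can be illustrated with a sliding majority vote over per-segment labels. This is a sketch under assumptions: the abstract states only that decisions were smoothed over 5 segments, so the majority-vote rule, the centred window, and the name `smooth_decisions` are illustrative choices.

```python
def smooth_decisions(labels, window=5):
    """Smooth binary speech (1) / non-speech (0) labels by majority vote.

    Each label is replaced by the majority label in a window centred on
    it; the window is truncated at the start and end of the sequence.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        votes = sum(labels[lo:hi])
        # Speech wins only with a strict majority of the window.
        smoothed.append(1 if 2 * votes > (hi - lo) else 0)
    return smoothed
```

With 200 ms segments, a window of 5 spans one second of audio, so an isolated misclassified segment inside a longer run is overruled by its neighbours.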
- Published
- 2014
31. Towards improving the performance of speaker recognition systems
- Author
-
P. C. Reghu Raj, Kuruvachan K. George, C. Santhosh Kumar, and Neethu Johnson
- Subjects
Speaker diarisation, Voice activity detection, Mobile phone, Computer science, Headset, Speech recognition, Test set, Feature extraction, Speaker recognition, Data modeling
- Abstract
This paper studies the contribution of different phones in speech data towards improving the performance of text/language-independent speaker recognition systems. This work is motivated by the fact that removing silence segments from the speech data improves system performance significantly, as silence does not contain any speaker-specific information. It is also clear from the literature that not all phones in the speech data contain an equal amount of speaker-specific information, and that the performance of speaker recognition systems depends on this information. In addition to the silence segments, our work empirically finds 18 other diluent phones that have minimal speaker discrimination capability. We propose a preprocessing stage that recursively identifies the set of non-informative phones and removes them along with the silence segments. Results show that using the phone-removed, preprocessed data in a state-of-the-art i-vector system outperforms the baseline i-vector system. We report absolute improvements of 1%, 1%, 2%, 2% and 1% in EER for test sets collected through the Digital Voice Recorder, Headset, Mobile Phone 1, Mobile Phone 2 and Tablet PC channels, respectively, on the IITG-MV database.
- Published
- 2014
32. An effective Malayalam information retrieval system using query expansion
- Author
-
P. C. Reghu Raj, C. Sreejith, and O. K. Reshma
- Subjects
Agglutinative language, Information retrieval, Computer science, business.industry, Information needs, Query language, computer.software_genre, language.human_language, Classical language, Query expansion, Negation, Malayalam, language, Artificial intelligence, business, computer, Natural language processing, RDF query language, computer.programming_language
- Abstract
Malayalam, a classical language of India, is spoken by over 40 million people. This paper proposes an effective information retrieval system for Malayalam, which retrieves Malayalam documents relevant to the user's information need. The proposed system improves effectiveness by considering synonyms and negations of the terms specified in the query. Though the system is evaluated over a small corpus, the results are promising. The proposed system is thus relevant for various natural language processing tasks in a highly agglutinative language like Malayalam.
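One possible reading of query expansion with synonyms and negations is sketched below. Everything here is an assumption made for illustration: the synonym table, the `not:` prefix for marking negated terms, the rule that negated terms exclude documents, and the English toy data. The abstract does not describe how the system actually represents or applies negation.

```python
# Hypothetical synonym table; a real system would use a Malayalam lexicon.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "film": {"movie"},
}
NEGATION_PREFIX = "not:"  # assumed marker for negated query terms


def expand_query(query_terms):
    """Split a query into expanded include and exclude term sets."""
    include, exclude = set(), set()
    for term in query_terms:
        negated = term.startswith(NEGATION_PREFIX)
        base = term[len(NEGATION_PREFIX):] if negated else term
        expanded = {base} | SYNONYMS.get(base, set())
        (exclude if negated else include).update(expanded)
    return include, exclude


def retrieve(documents, query_terms):
    """Return ids of documents matching an include term and no exclude term."""
    include, exclude = expand_query(query_terms)
    hits = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        if tokens & include and not tokens & exclude:
            hits.append(doc_id)
    return sorted(hits)
```

For an agglutinative language, the tokens would normally be stemmed before matching, so that inflected forms of a query term or its synonyms are still found.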
- Published
- 2013
33. LALITHA: A light weight Malayalam stemmer using suffix stripping method
- Author
-
P. C. Reghu Raj, U. Prajitha, and C. Sreejith
- Subjects
Agglutinative language, Root (linguistics), Computer science, business.industry, Speech recognition, computer.software_genre, language.human_language, Stripping (linguistics), Malayalam, language, Artificial intelligence, Suffix, business, computer, Word (computer architecture), Natural language processing
- Abstract
Stemming is the process of removing affixes from inflected words to return the root form. Malayalam is highly agglutinative in nature, and hundreds of inflections are possible for each word. An effective stemmer for Malayalam has not yet been implemented. This paper presents a lightweight stemmer for Malayalam, which conflates terms by suffix removal. The proposed stemmer is both computationally inexpensive and domain independent, and will serve as a vital part of many areas of Malayalam language computing.
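A suffix-stripping stemmer of this general kind can be sketched in a few lines. The sketch assumes longest-match-first stripping that repeats until no suffix applies; the suffix list is a small placeholder in Latin transliteration and is not LALITHA's actual rule set, and `stem` is an assumed name.

```python
# Placeholder suffixes in Latin transliteration, for illustration only.
SUFFIXES = ("kal", "ude", "yude", "il")


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip known suffixes repeatedly, trying the longest suffix first.

    A suffix is removed only if at least `min_stem_len` characters
    remain, which guards against reducing short words to nothing.
    """
    ordered = sorted((s for s in suffixes if s), key=len, reverse=True)
    stripped = True
    while stripped:
        stripped = False
        for suffix in ordered:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[:-len(suffix)]
                stripped = True
                break
    return word
```

Repeating the strip matters for agglutinative forms, where several suffixes can be stacked on one root; a single pass would remove only the outermost one.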
- Published
- 2013
34. Design of a Language-Independent Parallel String Matching Unit for NLP
- Author
-
P. C. Reghu Raj, S. Raman, and V.S. Murty
- Subjects
Matching (statistics), Computational complexity theory, business.industry, Computer science, Commentz-Walter algorithm, Parallel computing, String searching algorithm, computer.software_genre, Parallel processing (DSP implementation), Interleaved memory, Artificial intelligence, business, Time complexity, computer, Natural language, Natural language processing
- Abstract
In natural language processing applications, string matching is the main time-consuming operation because of the large size of the lexicon. Data dependence is minimal in string matching operations, which makes them ideal for parallelization. Dedicated string matching hardware that uses memory interleaving and parallel processing techniques can relieve the host CPU of this burden, thereby making the system suitable for real-time applications. This paper reports the FPGA design of such a system with m parallel matching units. The time complexity of the proposed algorithm is O(log2 n), where n is the total number of lexical entries. This has been achieved by a proper selection of the value of m. A special memory organization technique, which reduces the storage space by nearly 70%, has been adopted for storing lexical entries. The techniques used for matching and storage of lexical entries make the system language independent.
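As a loose software analogy for interleaved storage with several matching units, the sketch below distributes a sorted lexicon across m banks in round-robin order and runs a binary search in each bank. It is an assumption-laden illustration: the abstract does not specify the matching algorithm or the memory layout, the hardware units would run concurrently whereas this code visits the banks one after another, and `build_banks`, `search_bank` and `lookup` are hypothetical names.

```python
from bisect import bisect_left


def build_banks(lexicon, m):
    """Interleave a sorted lexicon across m banks (entry i goes to bank i % m).

    Each bank is a slice of a sorted list, so each bank is itself sorted.
    """
    ordered = sorted(set(lexicon))
    return [ordered[k::m] for k in range(m)]


def search_bank(bank, word):
    """Binary search in one bank: O(log(n / m)) comparisons."""
    i = bisect_left(bank, word)
    return i < len(bank) and bank[i] == word


def lookup(banks, word):
    """Report whether any bank holds the word.

    In hardware the m units would search their banks simultaneously;
    here they are simulated sequentially.
    """
    return any(search_bank(bank, word) for bank in banks)
```

Because the banks share no state, the per-bank searches are independent of one another, which matches the abstract's observation that data dependence in string matching is minimal.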
- Published
- 2006
35. A Hybrid Approach to Relationship Extraction from Stories
- Author
-
P. C. Reghu Raj and V. Devisree
- Subjects
Text corpus, Relation (database), business.industry, Computer science, Supervised learning, 02 engineering and technology, Machine learning, computer.software_genre, Relationship extraction, Automatic summarization, Relation Extraction, Task (project management), Information extraction, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, General Earth and Planetary Sciences, Unsupervised learning, 020201 artificial intelligence & image processing, Story Understanding, Artificial intelligence, business, computer, Information Extraction, General Environmental Science
- Abstract
A story may be analyzed to identify the main characters and to extract the relationships between them. Relation extraction problems are generally solved through either supervised or unsupervised learning algorithms. The supervised approach requires a text corpus in which the entities and their relation types are already known. Such algorithms typically learn to classify new entity pairs into one of the relation types they have already seen, based on recurring patterns. The unsupervised approach, on the other hand, is used when no such marked-up corpus exists. Such algorithms typically identify patterns in the corpus that are relevant to the relation extraction task and then use these patterns to group entities so that the entities within a group share similar relationships. The proposed method is a hybrid approach that combines features of unsupervised and supervised learning methods. It also uses some rules to extract relationships. The method identifies the main characters and collects the sentences related to them. These sentences are then analyzed and classified to extract relationships. The main applications are story summarization and analysis of the major characters in stories.