Search Results
18,210 results
2. Classifying Papers from Different Computer Science Conferences
- Author
- Avi Rosenfeld, Yaakov HaCohen-Kerner, Daniel Nisim Cohen, and Maor Tzidkani
- Subjects
- Computer science, Decision tree learning, Document classification, Key (cryptography), Feature (machine learning), Artificial intelligence, Part of speech, Natural language processing
- Abstract
This paper analyzes which stylistic characteristics differentiate different styles of writing, and specifically different types of A-level computer science articles. To do so, we compared various full papers using stylistic feature sets and a supervised machine learning method. We report on the success of this approach in identifying papers from the last 6 years of three conferences: SIGIR, ACL, and AAMAS. The approach achieves high accuracy results of 95.86%, 97.04%, 93.22%, and 92.14% for the following four classification experiments: (1) SIGIR / ACL, (2) SIGIR / AAMAS, (3) ACL / AAMAS, and (4) SIGIR / ACL / AAMAS, respectively. The Part of Speech (PoS) and Orthographic feature sets were superior to all others and were found to be key components in differentiating types of writing.
- Published
- 2013
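The stylistic-classification approach summarized above rests on turning each paper into a vector of style features before applying a supervised learner. A minimal sketch of orthographic feature extraction (the particular features and the example sentence are illustrative assumptions, not the authors' actual feature set):

```python
def orthographic_features(text):
    """Turn a document into a small orthographic style vector.

    Features: fraction of capitalized tokens, fraction of purely numeric
    tokens, fraction of tokens containing punctuation, mean token length.
    """
    tokens = text.split()
    n = max(len(tokens), 1)
    return {
        "frac_capitalized": sum(t[0].isupper() for t in tokens) / n,
        "frac_numeric": sum(t.isdigit() for t in tokens) / n,
        "frac_punct": sum(any(not c.isalnum() for c in t) for t in tokens) / n,
        "mean_token_length": sum(len(t) for t in tokens) / n,
    }

feats = orthographic_features("We report results of 95.86% on SIGIR , ACL and AAMAS .")
```

Vectors like this, computed per paper, could then feed any supervised classifier of the kind the abstract describes.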
3. Text Classification of Technical Papers Based on Text Segmentation
- Author
- Thien Hai Nguyen and Kiyoaki Shirai
- Subjects
- Structure (mathematical logic), Multi-label classification, Computer science, Supervised learning, Text segmentation, Binary number, Feature selection, Text mining, Artificial intelligence, Representation (mathematics), Natural language processing
- Abstract
The goal of this research is to design a multi-label classification model which determines the research topics of a given technical paper. Based on the idea that papers are well organized and some parts of papers are more important than others for text classification, segments such as title, abstract, introduction and conclusion are intensively used in text representation. In addition, new features called Title Bi-Gram and Title SigNoun are used to improve the performance. The results of the experiments indicate that feature selection based on text segmentation and these two features are effective. Furthermore, we proposed a new model for text classification based on the structure of papers, called Back-off model, which achieves 60.45% Exact Match Ratio and 68.75% F-measure. It was also shown that Back-off model outperformed two existing methods, ML-kNN and Binary Approach.
- Published
- 2013
4. Personalized Paper Recommendation Based on User Historical Behavior
- Author
- Jie Liu, Yuan Wang, Tianbi Liu, XingLiang Dong, and Yalou Huang
- Subjects
- Information retrieval, Computer science, Field (computer science), Preference, World Wide Web, Recommendation model, Similarity (psychology), Language model, Artificial intelligence, Natural language processing
- Abstract
With the increasing number of scientific papers, it is both important and difficult for paper-sharing platforms to recommend related papers to users accurately. This paper tackles the problem by proposing a method that models user historical behavior. By collecting the operations of online users on scientific papers and analyzing them in detail, we build a preference model for each user. The personalized recommendation model is constructed from a content-based filtering model and a statistical language model. Experimental results show that users' historical behavior plays an important role in user preference modeling and that the proposed method improves the final prediction performance in the field of technical paper recommendation.
- Published
- 2012
5. Paper Retrieval Based on Specific Paper Features: Chain and Laid Lines
- Author
- Pavel Paclík, J.C.A. van der Lubbe, M. van Staalduinen, and E. Backer
- Subjects
- Similarity (geometry), Computer science, Image processing, Similarity measure, Similitude, Set (abstract data type), Metric (mathematics), Visual Word, Artificial intelligence, Data mining
- Abstract
This paper presents paper retrieval using two specific paper features: chain and laid lines. These features are detected in digitized paper images and represented so that they can be used for retrieval. Optimal retrieval performance is achieved by means of a trainable similarity measure for a given set of paper features. Using these methods, a retrieval system has been developed that art experts can use in real time to speed up their paper research.
- Published
- 2006
6. Advances in Deep Parsing of Scholarly Paper Content
- Author
- Bernd Kiefer and Ulrich Schäfer
- Subjects
- Head-driven phrase structure grammar, Information retrieval, Parsing, Computer science, Semantic search, Semantic similarity, Language technology, Question answering, Artificial intelligence, Computational linguistics, Phrase structure grammar, Natural language processing
- Abstract
We report on advances in deep linguistic parsing of the full textual content of 8200 papers from the ACL Anthology, a collection of electronically available scientific papers in the fields of Computational Linguistics and Language Technology. We describe how - by incorporating new techniques - we increase both speed and robustness of deep analysis, specifically on long sentences where deep parsing often failed in former approaches. With the current open source HPSG (Head-driven phrase structure grammar) for English (ERG), we obtain deep parses for more than 85% of the sentences in the 1.5 million sentences corpus, while the former approaches achieved only approx. 65% coverage. The resulting sentence-wise semantic representations are used in the Scientist's Workbench, a platform demonstrating the use and benefit of natural language processing (NLP) to support scientists or other knowledge workers in fast and better access to digital document content. With the generated NLP annotations, we are able to implement important, novel applications such as robust semantic search, citation classification, and (in the future) question answering and definition exploration.
- Published
- 2011
7. COMPENDIUM: A Text Summarization System for Generating Abstracts of Research Papers
- Author
- Manuel Palomar, Elena Lloret, and María Teresa Romá-Ferri
- Subjects
- Information retrieval, Computer science, User satisfaction, Automatic summarization, Compendium, Preliminary analysis, Multi-document summarization, Information system, Selection (linguistics), Artificial intelligence, Natural language processing
- Abstract
This paper presents COMPENDIUM, a text summarization system, which has achieved good results in extractive summarization. Therefore, our main goal in this research is to extend it, suggesting a new approach for generating abstractive-oriented summaries of research papers. We conduct a preliminary analysis in which we compare the extractive version of COMPENDIUM (COMPENDIUM-E) with the new abstractive-oriented approach (COMPENDIUM-E-A). The final summaries are evaluated according to three criteria (content, topic, and user satisfaction) and, from the results obtained, we can conclude that COMPENDIUM is appropriate for producing summaries of research papers automatically, going beyond the simple selection of sentences.
- Published
- 2011
8. Handwriting on Paper as a Cybermedium
- Author
- Akira Yoshida, Marcus Liwicki, Masakazu Iwamura, Seiichi Uchida, Shinichiro Omachi, and Koichi Kise
- Subjects
- Sequence, Handwriting, Computer science, Speech recognition, Carry (arithmetic), Value (computer science), Image processing, Artificial intelligence, Natural language processing
- Abstract
In this paper, we report recent work on the data-embedding pen, which adds an ink-dot sequence along a handwritten pattern during writing. The ink-dot sequence represents information such as the writer's name, the date of writing, or a URL. This information drastically increases the value of handwriting on paper. The embedded information can be extracted from the handwritten pattern by image processing techniques and a stroke recovery technique. Consequently, the data-embedding pen lets us augment a handwritten pattern to carry arbitrary information.
- Published
- 2011
9. A Divide-and-Conquer Tabu Search Approach for Online Test Paper Generation
- Author
- Minh Luan Nguyen, Siu Cheung Hui, and Alvis C. M. Fong
- Subjects
- Divide and conquer algorithms, Optimization problem, Computer science, Constraint satisfaction, Machine learning, Swarm intelligence, Multi-objective optimization, Tabu search, Dynamic programming, Constraint (information theory), Artificial intelligence
- Abstract
Online Test Paper Generation (Online-TPG) is a promising approach for Web-based testing and intelligent tutoring. It generates a test paper automatically online according to a user specification based on multiple assessment criteria, and the generated test paper can then be attempted over the Web by the user for self-assessment. Online-TPG is challenging: it is a multi-objective optimization problem on constraint satisfaction that is NP-hard, and it must also satisfy an online runtime requirement. Current techniques such as dynamic programming, tabu search, swarm intelligence, and biologically inspired algorithms are ineffective for Online-TPG, as they generally require long runtimes to generate good-quality test papers. In this paper, we propose an efficient approach, called DAC-TS, which is based on the principle of constraint-based divide-and-conquer (DAC) and tabu search (TS) for constraint decomposition and multi-objective optimization in Online-TPG. Our empirical performance results show that the proposed DAC-TS approach outperforms other techniques in terms of runtime and paper quality.
- Published
- 2011
10. Screening Paper Runnability in a Web-Offset Pressroom by Data Mining
- Author
- Ahmad Alzghoul, Magnus Hållander, Antanas Verikas, Adas Gelzinis, and Marija Bacauskiene
- Subjects
- Offset (computer science), Computer science, Data classification, Information and Computer Science, Feature selection, Machine learning, Data mapping, Search engine, Test set, Data mining, Artificial intelligence, Classifier (UML)
- Abstract
This paper is concerned with data mining techniques for identifying the main parameters of the printing press, the printing process, and the paper that affect the occurrence of paper web breaks in a pressroom. Two approaches are explored. The first treats the problem as a task of classifying data into "break" and "non-break" classes. The procedures of classifier design and selection of relevant input variables are integrated into one process based on genetic search. The search process results in a set of input variables providing the lowest average loss incurred in taking decisions. The second approach, also based on genetic search, combines procedures of input variable selection and data mapping into a low-dimensional space. The tests have shown that the web tension parameters are amongst the most important ones. It was also found that, provided the basic off-line paper parameters are in an acceptable range, the paper-related parameters recorded online contain more information for predicting the occurrence of web breaks than the off-line ones. Using the selected set of parameters, on average, 93.7% of the test set data were classified correctly. The average classification accuracy for the break cases was 76.7%.
- Published
- 2009
11. A Reliable Classification Method for Paper Currency Based on LVQ Neural Network
- Author
- Xiaofeng Li, Xuedong Li, Hongling Gou, and Jing Yi
- Subjects
- Learning vector quantization, Artificial neural network, Computer science, Feature vector, Pattern recognition, Kernel principal component analysis, Principal component analysis, Classification methods, Artificial intelligence, Data mining, Classifier (UML), LVQ neural network
- Abstract
To increase the reliability of currency classification, this paper proposes a classification method using neural networks with multi-pattern vectors. The data space of the samples is divided into three blocks, each of which is further divided into four sub-pattern vectors; kernel principal component analysis is then applied to extract features and assemble the feature vectors used to train an LVQ neural network classifier. Tests on the new fifth-edition RMB (1, 5, 10, and 20 Yuan notes, each in four input directions, 800 samples in total) show that PCA can compress the data, reduce the dimension of the input vectors, and extract the feature vectors effectively, so that high reliability can be achieved with the LVQ network classifier.
- Published
- 2011
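The LVQ stage described above can be illustrated with the core LVQ1 rule: classify a sample by its nearest prototype, then pull the winning prototype toward the sample if the classes match and push it away otherwise. A minimal sketch (the prototype coordinates and denomination labels are made up for illustration, not the paper's trained model):

```python
def classify(prototypes, labels, x):
    """Assign x the label of the nearest prototype (squared Euclidean)."""
    dists = [sum((p - xi) ** 2 for p, xi in zip(proto, x)) for proto in prototypes]
    return labels[dists.index(min(dists))]

def lvq1_update(prototype, x, same_class, lr=0.1):
    """One LVQ1 step: move the winning prototype toward x if the predicted
    and true classes match, away from x otherwise."""
    sign = 1.0 if same_class else -1.0
    return [p + sign * lr * (xi - p) for p, xi in zip(prototype, x)]

protos = [[0.0, 0.0], [1.0, 1.0]]
labels = ["1 Yuan", "5 Yuan"]
pred = classify(protos, labels, [0.9, 0.8])
updated = lvq1_update(protos[1], [0.9, 0.8], same_class=(pred == "5 Yuan"))
```

In the paper's pipeline, the inputs to such a classifier would be the KPCA-compressed feature vectors rather than raw pixel data.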
12. S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently
- Author
- Xiaoyong Du, Pei Li, Jun He, Yuanzhe Cai, and Hongyan Liu
- Subjects
- SimRank, Computer science, Content analysis, Graph (abstract data type), Artificial intelligence, Data mining, Machine learning, Cluster analysis, Link analysis
- Abstract
Both content analysis and link analysis have their advantages in measuring relationships among documents. In this paper, we propose a new method that combines the two to compute the similarity of research papers so that we can cluster these papers more accurately. To improve the efficiency of the similarity calculation, we develop a strategy that processes parts of the relationship graph separately without affecting accuracy. We also design an approach that assigns different weights to different links to the papers, which enhances the accuracy of the similarity calculation. Experimental results on the ACM data set show that our new algorithm, S-SimRank, outperforms other algorithms.
- Published
- 2008
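S-SimRank extends SimRank, whose core recurrence scores two nodes by the average similarity of their in-neighbors: s(a, b) = C / (|I(a)||I(b)|) Σ s(i, j), with s(a, a) = 1. A minimal sketch of plain SimRank (not the weighted S-SimRank variant itself) on a toy citation graph where two papers are cited by the same source:

```python
def simrank(in_neighbors, C=0.8, iters=5):
    """Basic SimRank over a directed graph given as {node: [in-neighbors]}."""
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    new[(a, b)] = 0.0  # no evidence without in-links
                    continue
                total = sum(sim[(i, j)] for i in ia for j in ib)
                new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim

# "q" cites both "p1" and "p2", so p1 and p2 become similar.
sim = simrank({"q": [], "p1": ["q"], "p2": ["q"]})
```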
13. Discovering User Profiles from Semantically Indexed Scientific Papers
- Author
- Pasquale Lops, Pierpaolo Basile, Marco de Gemmis, and Giovanni Semeraro
- Subjects
- Information retrieval, User profile, Computer science, Search engine indexing, WordNet, Lexical database, Session (web analytics), Naive Bayes classifier, Text mining, Categorization, Artificial intelligence, Word (computer architecture), Natural language processing
- Abstract
Typically, personalized information recommendation services automatically infer the user profile, a structured model of the user's interests, from documents that the user has already deemed relevant. We present an approach based on Word Sense Disambiguation (WSD) for the extraction of user profiles from documents. This approach relies on a knowledge-based WSD algorithm, called JIGSAW, for the semantic indexing of documents: JIGSAW exploits the WordNet lexical database to select, among all the possible meanings (senses) of a polysemous word, the correct one. Semantically indexed documents are used to train a naive Bayes learner that infers "semantic", sense-based user profiles as binary text classifiers (user-likes and user-dislikes). Two empirical evaluations are described in the paper. In the first experimental session, JIGSAW was evaluated according to the parameters of the Senseval-3 initiative, which provides a forum where WSD systems are assessed against disambiguated datasets. The goal of the second empirical evaluation was to measure the accuracy of the user profiles in selecting relevant documents to be recommended. The performance of classical keyword-based profiles was compared to that of sense-based profiles in the task of recommending scientific papers. The results show that sense-based profiles outperform keyword-based ones.
- Published
- 2007
14. Automatic Recognition and Interpretation of Pen- and Paper-Based Document Annotations
- Author
- Markus Weber, Andreas Dengel, and Marcus Liwicki
- Subjects
- Information management, Information retrieval, Computer science, Semantic desktop, Gesture recognition, Handwriting recognition, Semantic Web Stack, Artificial intelligence, Semantic Web, Natural language processing, Gesture, Meaning (linguistics)
- Abstract
In this paper we present a system which recognizes handwritten annotations on printed text documents and interprets their semantic meaning. The system proceeds in four steps. In the first step, document analysis methods are applied to identify possible gestures and text regions. In the second step, the text and gestures are recognized using several state-of-the-art recognition methods. In the third step, the actual marked text is identified. Finally, the recognized information is sent to the Semantic Desktop, the personal Semantic Web on the desktop computer, which supports users in their information management. To assess the performance of the system, we performed an experimental study in which we evaluated the different stages of the system and measured the overall performance.
- Published
- 2009
15. Evaluating the Adaptation of a Learning System before the Prototype Is Ready: A Paper-Based Lab Study
- Author
- Barbara Kump, Antonia Maas, Tobias Ley, Dietrich Albert, and Neil Maiden
- Subjects
- Proactive learning, Computer science, Active learning (machine learning), Context (language use), Machine learning, Robot learning, Task (project management), Human–computer interaction, Adaptive system, Adaptive learning, Artificial intelligence, Adaptation (computer science)
- Abstract
We report on the results of a paper-based lab study that used information on task performance, self-appraisal, and personal learning-need assessment to validate the adaptation mechanisms of a work-integrated learning system. We discuss the results in the wider context of the evaluation of adaptive systems, where the validation methods we used can be transferred to a work-based setting to iteratively refine adaptation mechanisms and improve model validity.
- Published
- 2009
16. A Bayesian Approach to Classify Conference Papers
- Author
- Kok-Chin Khor and Choo-Yee Ting
- Subjects
- Computer science, Bayesian probability, Bayesian network, Feature selection, Machine learning, Intelligent tutoring system, Classifier (linguistics), Expectation–maximization algorithm, Prior probability, The Internet, Artificial intelligence
- Abstract
This article presents a methodological approach for classifying educational conference papers by employing a Bayesian Network (BN). A total of 400 conference papers were collected and categorized into 4 major topics (Intelligent Tutoring System, Cognition, e-Learning, and Teacher Education). In this study, we implemented an 80-20 split of the collected papers: 80% of the papers were used for keyword extraction and BN parameter learning, whereas the remaining 20% were reserved for measuring predictive accuracy. A feature selection algorithm was applied to automatically extract keywords for each topic. The extracted keywords were then used to construct the BN. The prior probabilities were subsequently learned using the Expectation Maximization (EM) algorithm. The network went through a series of validations by human experts and an experimental evaluation of its predictive accuracy. The results demonstrate that the proposed BN outperformed both a Naive Bayes classifier and a BN learned directly from the training data.
- Published
- 2006
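The keyword-based classification step above can be illustrated with the simpler baseline the paper compares against: a multinomial Naive Bayes model over extracted keywords. A minimal sketch with Laplace smoothing (the topics and keyword lists are toy assumptions, and this is plain Naive Bayes, not the paper's full Bayesian network):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (topic, keyword-token list) pairs."""
    priors = Counter(topic for topic, _ in docs)
    counts = defaultdict(Counter)
    vocab = set()
    for topic, tokens in docs:
        counts[topic].update(tokens)
        vocab.update(tokens)
    return priors, counts, vocab, len(docs)

def predict(model, tokens):
    priors, counts, vocab, n_docs = model
    best_topic, best_lp = None, float("-inf")
    for topic, prior in priors.items():
        lp = math.log(prior / n_docs)  # log prior P(topic)
        total = sum(counts[topic].values())
        for w in tokens:
            # add-one (Laplace) smoothing over the shared vocabulary
            lp += math.log((counts[topic][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best_topic, best_lp = topic, lp
    return best_topic

docs = [
    ("e-Learning", ["online", "course", "web"]),
    ("e-Learning", ["web", "learning", "online"]),
    ("Intelligent Tutoring System", ["tutor", "student", "model"]),
    ("Intelligent Tutoring System", ["student", "feedback", "tutor"]),
]
model = train_nb(docs)
```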
17. Modelling Citation Networks for Improving Scientific Paper Classification Performance
- Author
- Mengjie Zhang, Minh Duc Cao, Xiaoying Gao, and Yuejin Ma
- Subjects
- Computer science, Probabilistic logic, Bayesian network, Hyperlink, Machine learning, Class (biology), Data set, Naive Bayes classifier, Content analysis, Data mining, Artificial intelligence, Citation
- Abstract
This paper describes an approach to the use of citation links to improve the scientific paper classification performance. In this approach, we develop two refinement functions, a linear label refinement (LLR) and a probabilistic label refinement (PLR), to model the citation link structures of the scientific papers for refining the class labels of the documents obtained by the content-based Naive Bayes classification method. The approach with the two new refinement models is examined and compared with the content-based Naive Bayes method on a standard paper classification data set with increasing training set sizes. The results suggest that both refinement models can significantly improve the system performance over the content-based method for all the training set sizes and that PLR is better than LLR when the training examples are sufficient.
- Published
- 2006
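The linear refinement idea above, interpolating a paper's content-based class scores with the scores of the papers it is linked to by citations, can be sketched as follows (the interpolation weight, toy scores, and function shape are assumptions for illustration, not the paper's exact LLR formulation):

```python
def refine_scores(content_scores, citations, alpha=0.7, iters=3):
    """Iteratively blend each paper's content-based class-score vector with
    the mean score vector of its citation neighbors.

    content_scores: {paper: [score per class]}; citations: {paper: [neighbors]}.
    """
    scores = {p: list(v) for p, v in content_scores.items()}
    k = len(next(iter(content_scores.values())))
    for _ in range(iters):
        new = {}
        for p, base in content_scores.items():
            nbrs = citations.get(p, [])
            if not nbrs:
                new[p] = list(scores[p])  # no links: keep current scores
                continue
            mean = [sum(scores[n][c] for n in nbrs) / len(nbrs) for c in range(k)]
            new[p] = [alpha * base[c] + (1 - alpha) * mean[c] for c in range(k)]
        scores = new
    return scores

# Paper "x" is ambiguous on content alone but cites two clear class-0 papers.
content = {"x": [0.5, 0.5], "a": [1.0, 0.0], "b": [1.0, 0.0]}
refined = refine_scores(content, {"x": ["a", "b"]})
```

The citation evidence tips the ambiguous paper toward class 0, which is the effect the refinement models aim for.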
18. Inventing Malleable Scores: From Paper to Screen Based Scores
- Author
- Arthur Clay
- Subjects
- Malleability, Computer science, Mathematics education, Artificial intelligence, Standard score, Notation, Composition (language), License, Interpreter
- Abstract
This paper examines the idea of artistic license of the interpreter as a positive aspect of composition. The possibilities of participating in the creative act beyond the role of the traditional interpreter are illustrated by tracing the development of malleability in score writing in selected works of the author. Starting with the standard score, examples are given for the various forms of malleable scores that lead up to the application of real-time electronic scores in which a concept of self-conduction is feasibly implemented for use in distributed ensembles.
- Published
- 2008
19. Searching for Illustrative Sentences for Multiword Expressions in a Research Paper Database
- Author
- Hidetsugu Nanba and Satoshi Morishita
- Subjects
- Information retrieval, Database, Computer science, Parse tree, Limiting, Measure (mathematics), Expression (mathematics), Focus (linguistics), Component (UML), Artificial intelligence, Natural language processing
- Abstract
We propose a method to search for illustrative sentences for English multiword expressions (MWEs) from a research paper database. We focus on syntactically flexible expressions such as "regard --- as." Traditionally, illustrative sentences that contain such expressions have been searched for by limiting the maximum number of words between the component words of the MWE. However, this method could not collect enough illustrative sentences in which clauses are inserted between component words of MWEs. We therefore devised a measure that calculates the distance between component words of an MWE in a parse tree, and use it for flexible expression search. We conducted experiments, and obtained a precision of 0.832 and a recall of 0.911.
- Published
- 2008
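The proposed measure replaces surface word distance with the path length between the MWE's component words in the parse tree. A minimal sketch of that tree distance via the lowest common ancestor (the toy parse for "regard ... as" is an illustrative assumption, not the paper's parser output):

```python
def tree_distance(parent, a, b):
    """Number of edges on the path between nodes a and b in a tree,
    given as a child -> parent map (the root maps to None)."""
    def ancestors(n):
        path = [n]
        while parent[n] is not None:
            n = parent[n]
            path.append(n)
        return path

    pa, pb = ancestors(a), ancestors(b)
    depth = {n: i for i, n in enumerate(pa)}
    for j, n in enumerate(pb):
        if n in depth:  # first shared ancestor = lowest common ancestor
            return depth[n] + j
    raise ValueError("nodes are not in the same tree")

# Toy parse: S -> VP -> {regard, PP}; PP -> as
parse = {"S": None, "VP": "S", "regard": "VP", "PP": "VP", "as": "PP"}
```

Unlike a word-count window, this distance stays small even when a long clause is inserted between "regard" and "as", which is exactly the case the abstract says surface-distance methods miss.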
20. Extracting and Querying Relations in Scientific Papers
- Author
- Yajing Zhang, Torsten Marek, Hans Uszkoreit, Christian Federmann, and Ulrich Schäfer
- Subjects
- Head-driven phrase structure grammar, Parsing, Information retrieval, Grammar, Computer science, WordNet, Named-entity recognition, Language technology, Minimal recursion semantics, Artificial intelligence, Computational linguistics, Natural language processing
- Abstract
High-precision linguistic and semantic analysis of scientific texts is an emerging research area. We describe methods and an application for extracting interesting factual relations from scientific texts in computational linguistics and language technology. We use a hybrid NLP architecture with shallow preprocessing for increased robustness and domain-specific, ontology-based named entity recognition, followed by a deep HPSG parser running the English Resource Grammar (ERG). The extracted relations in the MRS (minimal recursion semantics) format are simplified and generalized using WordNet. The resulting 'quriples' are stored in a database from where they can be retrieved by relation-based search. The query interface is embedded in a web browser-based application we call the Scientist's Workbench. It supports researchers in editing and online-searching scientific papers.
- Published
- 2008
21. A Memetic Differential Evolution in Filter Design for Defect Detection in Paper Production
- Author
- Tuomo Rossi, Ville Tirronen, Kirsi Majava, Ferrante Neri, and Tommi Kärkkäinen
- Subjects
- Engineering, Finite impulse response, Process (engineering), Population, Evolutionary algorithm, Machine learning, Filter design, Differential evolution, Memetic algorithm, Artificial intelligence, Digital filter
- Abstract
This article proposes a Memetic Differential Evolution (MDE) for designing digital filters that detect defects in paper produced during an industrial process. The MDE is an adaptive evolutionary algorithm which combines the powerful explorative features of Differential Evolution (DE) with the exploitative features of two local searchers. The local searchers are adaptively activated by means of a novel control parameter which measures fitness diversity within the population. Numerical results show that the DE framework is efficient for the class of problems under study and that employing exploitative local searchers helps the DE's explorative mechanism avoid stagnation and thus find high-performance solutions.
- Published
- 2007
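The DE core that the memetic algorithm builds on creates, for each individual, a trial vector from three random donors, the classic DE/rand/1/bin scheme. A minimal sketch of that step (the parameter values F and CR are conventional defaults, not the paper's tuned settings):

```python
import random

def de_trial(pop, i, F=0.5, CR=0.9, rng=random):
    """DE/rand/1/bin: build a trial vector for individual i by mutating
    donor r1 with the scaled difference of donors r2 and r3, then doing
    binomial crossover with the target vector pop[i]."""
    candidates = [j for j in range(len(pop)) if j != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    target = pop[i]
    d = len(target)
    jrand = rng.randrange(d)  # guarantees at least one mutated component
    trial = []
    for j in range(d):
        if rng.random() < CR or j == jrand:
            trial.append(pop[r1][j] + F * (pop[r2][j] - pop[r3][j]))
        else:
            trial.append(target[j])
    return trial

random.seed(1)
pop = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
trial = de_trial(pop, 0, F=0.5, CR=1.0)
```

In a full DE loop the trial replaces the target only if its fitness is better; the MDE of the paper additionally interleaves local search steps.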
22. Vectorization-Free Reconstruction of 3D CAD Models from Paper Drawings
- Author
- Frank Ditrich, Herbert Suesse, and Klaus Voss
- Subjects
- Engineering drawing, Computer science, 3D reconstruction, Process (computing), Image processing, CAD, Iterative reconstruction, Computer graphics (images), Pattern recognition (psychology), Computer Aided Design, Computer vision, Image tracing, Artificial intelligence
- Abstract
We propose a new approach for the reconstruction of 3D CAD models from paper drawings. Our method uses a combination of the well-known fleshing-out-projections method and accumulation techniques from image processing to reconstruct part models. It should provide a comfortable method to handle inaccuracies and missing elements unavoidable in scanned paper drawings while giving the user the chance to observe and interactively control the reconstruction process.
- Published
- 2004
23. An Intelligent Grading System for Descriptive Examination Papers Based on Probabilistic Latent Semantic Analysis
- Author
- Jae-Young Lee, Yu-Seop Kim, Jeong-Ho Chang, and Jung-Seok Oh
- Subjects
- Probabilistic latent semantic analysis, Computer science, Semantics, Similitude, Semantic similarity, Vector space model, Semantic memory, Artificial intelligence, Grading (education), Natural language processing
- Abstract
In this paper, we developed an intelligent grading system, based on Probabilistic Latent Semantic Analysis (PLSA), which scores descriptive examination papers automatically. For grading, we estimated the semantic similarity between a student paper and a model paper. PLSA can represent complex semantic structures of given contexts, such as text passages, and is used to build linguistic semantic knowledge for estimating contextual semantic similarity. We graded real examination papers and achieved about 74% agreement with manual grading, 7% higher than that of the Simple Vector Space Model.
- Published
- 2004
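Once PLSA maps each paper onto a distribution over latent topics, grading reduces to comparing the student vector with the model-answer vector, for instance by cosine similarity. A minimal sketch (the topic vectors below are made-up values standing in for PLSA output, and cosine is one common choice of similarity, not necessarily the paper's exact measure):

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical P(topic | document) vectors over three latent topics.
model_answer = [0.7, 0.2, 0.1]
student_good = [0.6, 0.3, 0.1]
student_weak = [0.1, 0.2, 0.7]
```

A student answer that concentrates on the same latent topics as the model answer scores high; one that drifts to other topics scores low, which is the signal the grading system thresholds.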
24. Extracting Positive Attributions from Scientific Papers
- Author
- Achim Hoffmann and Son Bao Pham
- Subjects
- Computer science, Context (language use), Machine learning, Knowledge acquisition, Task (project management), Term (time), Information extraction, Knowledge base, Expression (architecture), Reading (process), Artificial intelligence, Natural language processing
- Abstract
The aim of our work is to provide support for reading (or skimming) scientific papers. In this paper we report on the task to identify concepts or terms with positive attributions in scientific papers. This task is challenging as it requires the analysis of the relationship between a concept or term and its sentiment expression. Furthermore, the context of the expression needs to be inspected. We propose an incremental knowledge acquisition framework to tackle these challenges. With our framework we could rapidly (within 2 days of an expert’s time) develop a prototype system to identify positive attributions in scientific papers. The resulting system achieves high precision (above 74%) and high recall rates (above 88%) in our initial experiments on corpora of scientific papers. It also drastically outperforms baseline machine learning algorithms trained on the same data.
- Published
- 2004
25. Relevant Information Extraction Driven with Rhetorical Schemas to Summarize Scientific Papers
- Author
- Abdelmajid Ben Hamadou and Mariem Ellouze
- Subjects
- Structural linguistics, Phrase, Computer science, Cohesion (linguistics), Information extraction, Knowledge base, Rhetorical question, Artificial intelligence, Source text, Natural language processing, Sentence
- Abstract
Automatic summaries are often subject to several criticisms (e.g., lack of cohesion and coherence). In this paper, we propose an approach that uses coherent Summary-Schemas (templates) conceived from the rhetorical structure of scientific papers including their abstracts. The Summary-Schemas embed rhetorical roles specified by signatures (sets of positional, structural, linguistic and thematic features) that guide the search for appropriate sentences in the source text.
- Published
- 2002
26. 3D Reconstruction of Paper Based Assembly Drawings: State of the Art and Approach
- Author
- Harald Kunze, Hans Grabowski, Arno Michelis, El-Fathi El-Mejbri, and Ralf-Stefan Lossack
- Subjects
- Engineering drawing, Process (engineering), Computer science, 3D reconstruction, Computer Aided Design, Image tracing, Artificial intelligence, State (computer science), Digitization
- Abstract
Engineering solutions are generally documented in assembly and part drawings and bills of materials. A great benefit, both qualitative and commercial, can be achieved if these paper-based stores can be transformed into digital information archives. The process of this transformation is called reconstruction. The reconstruction of paper-based assembly drawings consists of four steps: digitization; vectorization/interpretation; 3D reconstruction of the parts; and 3D reconstruction of the assembly. This paper evaluates existing commercial systems worldwide for the interpretation of paper-based mechanical engineering drawings. For a complete reconstruction process, 3D reconstruction is needed. This functionality is already supported by some CAD systems to a certain extent, but it remains a major topic of research. One CAD system which converts 2D CAD models into 3D CAD models is presented. Finally, after the reconstruction of the parts, the whole assembly can be reconstructed. Until now, no system for the automatic reconstruction of assemblies has been available. In this paper we present a general approach for the automatic reconstruction of 3D assembly model data by interpretation of mechanical engineering 2D assembly drawings, their part drawings, and the bill of materials.
- Published
- 2002
27. Capturing Abstract Matrices from Paper
- Author
-
Volker Sorge, Toshihiro Kanahori, Masakazu Suzuki, and Alan P. Sexton
- Subjects
business.industry ,Computer science ,media_common.quotation_subject ,Semantic analysis (machine learning) ,Image processing ,Ambiguity ,computer.software_genre ,Semantics ,Task (project management) ,Matrix (mathematics) ,Artificial intelligence ,business ,computer ,Natural language processing ,media_common - Abstract
Capturing and understanding mathematics from print form is an important task in translating written mathematical knowledge into electronic form. While the problem of syntactically recognising mathematical formulas from scanned images has received attention, very little work has been done on semantic validation and correction of recognised formulas. We present a first step towards such an integrated system by combining the Infty system with a semantic analyser for matrix expressions. We applied the combined system in experiments on the semantic analysis of matrix images scanned from textbooks. While the first results are encouraging, they also demonstrate many ambiguities one has to deal with when analysing matrix expressions in different contexts. We give a detailed overview of the problems we encountered that motivate further research into semantic validation of mathematical formula recognition.
- Published
- 2006
28. Helli-Respina 2001 Team Description Paper
- Author
-
Omid Aladini, B Bahador Nooraei, and N Siavash Rahbar
- Subjects
Intelligent agent ,Computer science ,business.industry ,Unsupervised learning ,Artificial intelligence ,computer.software_genre ,Agent architecture ,business ,computer - Abstract
One of the most important problems in the development of intelligent agents is adaptation to the environment. In this paper we briefly describe the Helli-Respina soccer-simulator team, which uses a new self-adaptive method named Dynamic Multi-Behavior Assessment (DMBA). A built-in behavior manager, the dynamic behavior transformer, lets the agent choose the best algorithms to apply during the game. The system always tries to choose the set of available algorithms that gives the best result against each opponent. The main objective of this research is how to choose such a set of algorithms dynamically.
- Published
- 2002
29. Symbolic Learning Techniques in Paper Document Processing
- Author
-
Donato Malerba, Floriana Esposito, O. Altamura, and Francesca A. Lisi
- Subjects
Active learning (machine learning) ,business.industry ,Computer science ,Document classification ,Decision tree ,Well-formed document ,Document management system ,Document clustering ,computer.software_genre ,Document processing ,Document Schema Definition Languages ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
WISDOM++ is an intelligent document processing system that transforms a paper document into HTML/XML format. The main design requirement is adaptivity, which is realized through the application of machine learning methods. This paper illustrates the application of symbolic learning algorithms to the first three steps of document processing, namely document analysis, document classification and document understanding. Machine learning issues related to the application are: Efficient incremental induction of decision trees from numeric data, handling of both numeric and symbolic data in first-order rule learning, learning mutually dependent concepts. Experimental results obtained on a set of real-world documents are illustrated and commented.
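The decision-tree step described above rests on entropy-based split selection. A minimal sketch of the information-gain computation follows (the labels are invented for illustration; WISDOM++'s incremental induction algorithm is considerably more involved):

```python
import math

def entropy(labels):
    """Shannon entropy of a label multiset."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, left, right):
    """Gain from splitting `labels` into `left` and `right` partitions."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A hypothetical layout feature that cleanly separates text from figure blocks
labels = ["text", "text", "figure", "figure"]
gain = information_gain(labels, ["text", "text"], ["figure", "figure"])
print(round(gain, 3))  # 1.0: a pure split recovers the full one bit of entropy
```

The learner picks, at each node, the candidate split maximizing this gain.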
- Published
- 1999
30. Building digital libraries from paper documents, using ART based neuro-fuzzy systems
- Author
-
R. Sanz Guadarrama, Juan López Coronado, Yannis Dimitriadis, Gregorio Ismael Sainz Palmero, and J. Manuel Cano Izquierdo
- Subjects
Information retrieval ,Neuro-fuzzy ,business.industry ,Computer science ,Context (language use) ,Optical character recognition ,Digital library ,computer.software_genre ,Document processing ,Modularity ,Adaptive resonance theory ,Table of contents ,Artificial intelligence ,business ,computer - Abstract
In this paper a new neuro-fuzzy system is proposed for both document analysis and Optical Character Recognition. FasART (Fuzzy adaptive system ART based) inherits the stability, flexibility and modularity of ART supervised models, but with a formal description as a Fuzzy Logic System and increased functionality. Recursive FasART, in turn, lets us exploit context information, a crucial aspect of document understanding. Satisfactory experimental results are presented for the overall application of building a digital library of scientific papers, with special emphasis on creating links between table-of-contents items and the first pages of papers.
- Published
- 1997
31. Research to Improve Cross-Language Retrieval — Position Paper for CLEF
- Author
-
Fredric C. Gey
- Subjects
Information retrieval ,Machine translation ,Computer science ,business.industry ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,computer.software_genre ,Variety (linguistics) ,Clef ,Romanization ,Human–computer information retrieval ,Transliteration ,Multilingualism ,Artificial intelligence ,Computational linguistics ,business ,computer ,Natural language processing - Abstract
Improvement in cross-language information retrieval results can come from a variety of sources - failure analysis, resource enrichment in terms of stemming and parallel and comparable corpora, use of pivot languages, as well as phonetic transliteration and Romanization. Application of these methodologies should contribute to a gradual increase in the ability of search software to cross the language barrier.
- Published
- 2001
32. Processing paper documents with WISDOM
- Author
-
Giovanni Semeraro, Luca de Filippis, Donato Malerba, and Floriana Esposito
- Subjects
Information retrieval ,Interface (Java) ,Computer science ,business.industry ,Document classification ,Optical character recognition ,Representation (arts) ,computer.software_genre ,Document processing ,Knowledge base ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Systems architecture ,Artificial intelligence ,business ,computer - Abstract
WISDOM is a paper-computer interface that transforms printed information into a symbolic representation. This is done in four distinct steps: document analysis, document classification, document understanding, and text recognition with an OCR. Machine learning tools and techniques are used in the first three steps to easily customize the interface to the requirements of different users.
- Published
- 1997
33. An Experimental Implementation of a Document Recognition System for Papers Containing Mathematical Expressions
- Author
-
Masayuki Okamoto and Akira Miyazawa
- Subjects
business.industry ,Character (computing) ,Computer science ,Process (computing) ,Skew ,computer.software_genre ,Symbol (chemistry) ,Software ,Experimental system ,Artificial intelligence ,Graphics ,business ,computer ,Natural language processing ,Block (data storage) - Abstract
This paper describes the current state of an experimental document recognition system for scientific papers. A scientific paper contains not only text but also tables, pictures, graphics, and mathematical expressions. The system can convert character or symbol strings in text, as well as mathematical expressions and tables, into coded data. Of all the functions required for the entire process from document scanning through recognition, the following have been investigated and implemented: skew detection and correction, region (block) segmentation, and mathematical expression recognition. The algorithms have been designed to be as fast as possible. The experimental system is implemented entirely in software on a workstation under X Windows. Some experimental results for each stage of the document recognition process are presented.
- Published
- 1992
34. Predicting COVID-19 statistics using machine learning regression model: Li-MuLi-Poly
- Author
-
Seema Bawa and Hari Singh
- Subjects
Mean squared error ,Computer Networks and Communications ,Computer science ,02 engineering and technology ,Machine learning ,computer.software_genre ,Matrix (mathematics) ,symbols.namesake ,Statistics ,Linear regression ,0202 electrical engineering, electronic engineering, information engineering ,Media Technology ,Regular Paper ,Accuracy ,t-Test ,Polynomial regression ,Minimum mean square error ,business.industry ,COVID-19 ,020207 software engineering ,Regression analysis ,Regression ,Pearson product-moment correlation coefficient ,Hardware and Architecture ,symbols ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Software ,Information Systems - Abstract
In this paper, linear regression (LR), multi-linear regression (MLR) and polynomial regression (PR) techniques are applied to propose a model, Li-MuLi-Poly, that predicts COVID-19 deaths in the United States of America. Experiments were carried out with the machine learning, minimum-mean-square-error and maximum-likelihood-ratio models. The best-fitting model was selected according to mean square error, adjusted mean square error, root mean square error (RMSE) and maximum likelihood ratio, and a statistical t-test was used to verify the results. Datasets are analyzed, cleaned and reviewed before being applied to the proposed regression model. The correlation of the selected independent parameters was determined from a heat map and the Pearson correlation matrix. The LR model was found to best fit the dataset when all the independent parameters are used in modeling; however, its RMSE and mean absolute error (MAE) are high compared to the PR models. PR models of high degree are required to best fit the dataset when few independent parameters are considered in modeling, whereas PR models of low degree best fit the dataset when independent parameters from all dimensions are considered.
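The degree-selection-by-RMSE idea can be sketched on synthetic data; this is not the authors' Li-MuLi-Poly model, and the generating coefficients below are invented, with NumPy's `polyfit` standing in for the regression machinery:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
# Quadratic ground truth plus noise: a linear model must underfit this
y = 2.0 + 1.5 * x + 0.3 * x**2 + rng.normal(0, 1.0, x.size)

def rmse_of_fit(degree):
    """Fit a polynomial of the given degree and return its training RMSE."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sqrt(np.mean(residuals**2)))

rmse_lr = rmse_of_fit(1)  # linear regression misses the curvature
rmse_pr = rmse_of_fit(2)  # degree matched to the generating process
print(rmse_lr > rmse_pr)  # True: the quadratic fit has lower RMSE here
```

In practice the comparison would be made on held-out data, since training RMSE alone always favors higher degrees.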
- Published
- 2021
35. Predicting the pandemic: sentiment evaluation and predictive analysis from large-scale tweets on Covid-19 by deep convolutional neural network
- Author
-
Sourav Das and Anup Kumar Kolya
- Subjects
Text corpus ,Predictive analysis ,Phrase ,Computer science ,Cognitive Neuroscience ,Twitter ,Stability (learning theory) ,02 engineering and technology ,Machine learning ,computer.software_genre ,Convolutional neural network ,Sentiment analysis ,Mathematics (miscellaneous) ,Deep convolutional network ,Artificial Intelligence ,0202 electrical engineering, electronic engineering, information engineering ,Artificial neural network ,business.industry ,Deep learning ,020206 networking & telecommunications ,Coronavirus ,Test case ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Covid-19 ,computer ,Research Paper - Abstract
Engaging deep neural networks for textual sentiment analysis is an extensively practiced research domain. Textual sentiment classification harnesses the full computational potential of deep learning models. Typically, such work is carried out either on a popular open-source corpus or on short texts self-extracted from Twitter or Reddit, or web-scraped from other resources. Rarely is a large amount of data on a currently unfolding event collected and curated further. An even more complex task is to model the data from an ongoing event, not only to scale sentiment accuracy but also to make predictive analyses from it. In this paper, we propose a novel approach for achieving sentiment-evaluation accuracy by applying a deep neural network to live-streamed tweets on Coronavirus, together with future case-growth prediction. We develop a large tweet corpus based exclusively on Coronavirus tweets. We split the data into train and test sets, and perform polarity classification and trend analysis. The refined outcome of the trend analysis helps train the data to provide an incremental learning curvature for our neural network, and we obtain an accuracy of 90.67%. Finally, we provide a statistics-based future prediction of Coronavirus case growth. Our model not only outperforms several previous state-of-the-art experiments in overall sentiment accuracy for similar tasks, but also maintains consistent performance across all test cases when tested with several popular open-source text corpora.
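The polarity-classification step can be illustrated with a crude lexicon count; this is purely a sketch (the paper's classifier is a deep convolutional network, and the word lists here are invented):

```python
# Tiny hypothetical sentiment lexicons
POSITIVE = {"recover", "hope", "safe", "improve"}
NEGATIVE = {"death", "fear", "lockdown", "crisis"}

def polarity(tweet):
    """Lexicon score: positive minus negative word hits; the sign gives the label."""
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("hope cases improve soon"))   # positive
print(polarity("fear of another lockdown"))  # negative
```

Labels produced this way (or by a more careful tool) can serve as training signal for a neural classifier that then generalizes beyond the lexicon.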
- Published
- 2021
36. Application of artificial neural networks for automated analysis of cystoscopic images: a review of the current status and future prospects
- Author
-
Alexander Reiterer, Rodrigo Suarez-Ibarrola, Arkadiusz Miernik, Simon Hein, and Misgana Negassi
- Subjects
Urology ,02 engineering and technology ,Machine learning ,computer.software_genre ,Convolutional neural network ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,Data acquisition ,Medical image analysis ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,Image Processing, Computer-Assisted ,Humans ,Bladder cancer ,Artificial neural network ,medicine.diagnostic_test ,business.industry ,Deep learning ,Frame (networking) ,Cystoscopy ,medicine.disease ,Topic Paper ,Visualization ,Cystoscopic images ,030220 oncology & carcinogenesis ,020201 artificial intelligence & image processing ,Artificial intelligence ,Neural Networks, Computer ,business ,computer ,Neural networks ,Forecasting - Abstract
Background: Optimal detection and surveillance of bladder cancer (BCa) rely primarily on the cystoscopic visualization of bladder lesions. AI-assisted cystoscopy may improve image recognition and accelerate data acquisition. Objective: To provide a comprehensive review of machine learning (ML), deep learning (DL) and convolutional neural network (CNN) applications in cystoscopic image recognition. Evidence acquisition: A detailed search of original articles was performed using the PubMed-MEDLINE database to identify recent English literature relevant to ML, DL and CNN applications in cystoscopic image recognition. Evidence synthesis: In total, two articles and one conference abstract were identified addressing the application of AI methods in cystoscopic image recognition. These investigations showed accuracies exceeding 90% for tumor detection; however, future work is necessary to incorporate these methods into AI-aided cystoscopy and compare them to other tumor visualization tools. Furthermore, we present results from the RaVeNNA-4pi consortium initiative, which has extracted 4200 frames from 62 videos, analyzed them with the U-Net network and achieved an average Dice score of 0.67. Improvements in precision can be achieved by augmenting the video/frame database. Conclusion: AI-aided cystoscopy has the potential to outperform urologists at recognizing and classifying bladder lesions. To ensure real-life implementation, however, these algorithms require external validation to generalize their results across other data sets.
- Published
- 2020
37. Comparison of data science workflows for root cause analysis of bioprocesses
- Author
-
Christoph Herwig, Yvonne E. Thomassen, Daniel Borchert, Diego A. Suarez-Zuluaga, and Patrick Sagmeister
- Subjects
0106 biological sciences ,Drug Industry ,Process (engineering) ,Computer science ,Data analysis ,Bioengineering ,Machine learning ,computer.software_genre ,01 natural sciences ,Workflow ,Bioreactors ,Robustness (computer science) ,010608 biotechnology ,Partial least squares regression ,Chlorocebus aethiops ,Raw data analysis ,Root cause analysis ,Animals ,Vero Cells ,Principal Component Analysis ,010405 organic chemistry ,business.industry ,Data Science ,Feature based analysis ,General Medicine ,Variance (accounting) ,Work in process ,0104 chemical sciences ,Poliovirus ,Fermentation ,Multivariate Analysis ,Regression Analysis ,Artificial intelligence ,business ,Raw data ,computer ,Software ,Biotechnology ,Research Paper - Abstract
Root cause analysis (RCA) is one of the most prominent tools used to comprehensively evaluate a biopharmaceutical production process. Despite its widespread use in industry, the Food and Drug Administration has observed many unsuitable approaches to RCA in recent years. The reasons for these unsuitable approaches are the use of incorrect variables during the analysis and a lack of process understanding, which impedes correct model interpretation. Two major approaches to RCA currently dominate the chemical and pharmaceutical industry: raw data analysis and the feature-based approach. Both techniques are able to identify the significant variables causing the variance of the response. Although they differ in data unfolding, the same tools, such as principal component analysis and partial least squares regression, are used in both concepts. Within this article we demonstrate the strengths and weaknesses of both approaches. We show that fusing them yields a comprehensive and effective workflow that increases process understanding, and we demonstrate this workflow with an example. The presented workflow saves analysis time and reduces the effort of data mining by easily detecting the most important variables within a given dataset. The obtained process knowledge can then be translated into new hypotheses, which can be tested experimentally and thereby effectively improve process robustness.
- Published
- 2018
38. A framework for sensitivity analysis of decision trees
- Author
-
Bogumił Kamiński, Przemysław Szufel, and Michał Jakubczyk
- Subjects
Incremental decision tree ,Original Paper ,Computer science ,business.industry ,020209 energy ,Decision tree learning ,Decision trees ,Decision tree ,Evidential reasoning approach ,02 engineering and technology ,Decision rule ,Management Science and Operations Research ,Machine learning ,computer.software_genre ,Decision optimization ,0202 electrical engineering, electronic engineering, information engineering ,Influence diagram ,020201 artificial intelligence & image processing ,Decision sensitivity ,Artificial intelligence ,business ,computer ,Decision analysis ,Optimal decision - Abstract
In the paper, we consider sequential decision problems with uncertainty, represented as decision trees. Sensitivity analysis is always a crucial element of decision making and in decision trees it often focuses on probabilities. In the stochastic model considered, the user often has only limited information about the true values of probabilities. We develop a framework for performing sensitivity analysis of optimal strategies accounting for this distributional uncertainty. We design this robust optimization approach in an intuitive and not overly technical way, to make it simple to apply in daily managerial practice. The proposed framework allows for (1) analysis of the stability of the expected-value-maximizing strategy and (2) identification of strategies which are robust with respect to pessimistic/optimistic/mode-favoring perturbations of probabilities. We verify the properties of our approach in two cases: (a) probabilities in a tree are the primitives of the model and can be modified independently; (b) probabilities in a tree reflect some underlying, structural probabilities, and are interrelated. We provide a free software tool implementing the methods described.
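The kind of strategy flip this sensitivity analysis looks for can be sketched in a toy two-branch tree; the payoffs and probabilities below are invented for illustration and are not the authors' framework or their software tool:

```python
def expected_value(p_success, payoff_success, payoff_failure):
    """Expected value of a chance node with two outcomes."""
    return p_success * payoff_success + (1 - p_success) * payoff_failure

def best_strategy(p):
    """Choose between a risky venture (100 or 0) and a sure payoff of 50."""
    risky = expected_value(p, 100, 0)
    return ("risky", risky) if risky > 50 else ("safe", 50)

print(best_strategy(0.60))  # ('risky', 60.0): risky wins at the nominal probability
# Pessimistic perturbation of the success probability, as in robust analysis
print(best_strategy(0.45))  # ('safe', 50): the optimal strategy flips
```

A strategy whose choice survives such pessimistic and optimistic perturbations is, in the paper's sense, robust to the distributional uncertainty.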
- Published
- 2017
39. Towards infield, live plant phenotyping using a reduced-parameter CNN
- Author
-
John Atanbori, Tony P. Pridmore, and Andrew P. French
- Subjects
Computer science ,Population ,02 engineering and technology ,Machine learning ,computer.software_genre ,Convolutional neural network ,03 medical and health sciences ,0202 electrical engineering, electronic engineering, information engineering ,Segmentation ,education ,030304 developmental biology ,2. Zero hunger ,0303 health sciences ,education.field_of_study ,Original Paper ,Separable convolutions ,business.industry ,Lightweight deep convolutional neural networks ,Singular value decomposition ,Image segmentation ,G400 Computer Science ,15. Life on land ,Computer Science Applications ,Identification (information) ,Pixel-wise segmentation for plant phenotyping ,13. Climate action ,Hardware and Architecture ,Pattern recognition (psychology) ,Key (cryptography) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Mobile device ,computer ,Software - Abstract
There is an increase in consumption of agricultural produce as a result of the rapidly growing human population, particularly in developing nations. This has triggered high-quality plant phenotyping research to help with the breeding of high-yielding plants that can adapt to our continuously changing climate. Novel, low-cost, fully automated plant phenotyping systems, capable of infield deployment, are required to help identify quantitative plant phenotypes. The identification of quantitative plant phenotypes is a key challenge which relies heavily on the precise segmentation of plant images. Recently, the plant phenotyping community has started to use very deep convolutional neural networks (CNNs) to help tackle this fundamental problem. However, these very deep CNNs rely on some millions of model parameters and generate very large weight matrices, thus making them difficult to deploy infield on low-cost, resource-limited devices. We explore how to compress existing very deep CNNs for plant image segmentation, thus making them easily deployable infield and on mobile devices. In particular, we focus on applying these models to the pixel-wise segmentation of plants into multiple classes including background, a challenging problem in the plant phenotyping community. We combined two approaches (separable convolutions and SVD) to reduce model parameter numbers and weight matrices of these very deep CNN-based models. Using our combined method (separable convolution and SVD) reduced the weight matrix by up to 95% without affecting pixel-wise accuracy. These methods have been evaluated on two public plant datasets and one non-plant dataset to illustrate generality. We have successfully tested our models on a mobile device.
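The SVD half of the compression recipe can be illustrated on a random stand-in weight matrix; the shapes and rank below are made up for the example and do not come from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for one dense weight matrix from a segmentation CNN
W = rng.standard_normal((256, 512)).astype(np.float32)

def svd_compress(W, rank):
    """Factor W ~= U_r @ V_r, storing rank*(m+n) values instead of m*n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]  # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

rank = 32
U_r, V_r = svd_compress(W, rank)
original = W.size                    # 256 * 512 = 131072 parameters
compressed = U_r.size + V_r.size     # 32 * (256 + 512) = 24576 parameters
print(compressed / original)         # 0.1875: roughly an 81% reduction
```

At inference time the layer applies `V_r` then `U_r`, trading a small approximation error (controlled by the rank) for the parameter saving.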
- Published
- 2019
40. Development and application of a machine learning algorithm for classification of elasmobranch behaviour from accelerometry data
- Author
-
Samuel H. Gruber, Alexander C. Hansell, Lauran R. Brewster, Michael Elliott, Ian G. Cowx, Jonathan J. Dale, Nicholas M. Whitney, Tristan L. Guttridge, and Adrian C. Gleiss
- Subjects
0106 biological sciences ,Original Paper ,Ecology ,biology ,Artificial neural network ,business.industry ,010604 marine biology & hydrobiology ,Aquatic Science ,biology.organism_classification ,Logistic regression ,Headshaking ,Accelerometer ,Machine learning ,computer.software_genre ,010603 evolutionary biology ,01 natural sciences ,Random forest ,Negaprion brevirostris ,14. Life underwater ,Gradient boosting ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Ecology, Evolution, Behavior and Systematics - Abstract
Discerning behaviours of free-ranging animals allows for quantification of their activity budget, providing important insight into ecology. Over recent years, accelerometers have been used to unveil the cryptic lives of animals. The increased ability of accelerometers to store large quantities of high resolution data has prompted a need for automated behavioural classification. We assessed the performance of several machine learning (ML) classifiers to discern five behaviours performed by accelerometer-equipped juvenile lemon sharks (Negaprion brevirostris) at Bimini, Bahamas (25°44′N, 79°16′W). The sharks were observed to exhibit chafing, burst swimming, headshaking, resting and swimming in a semi-captive environment and these observations were used to ground-truth data for ML training and testing. ML methods included logistic regression, an artificial neural network, two random forest models, a gradient boosting model and a voting ensemble (VE) model, which combined the predictions of all other (base) models to improve classifier performance. The macro-averaged F-measure, an indicator of classifier performance, showed that the VE model improved overall classification (F-measure 0.88) above the strongest base learner model, gradient boosting (0.86). To test whether the VE model provided biologically meaningful results when applied to accelerometer data obtained from wild sharks, we investigated headshaking behaviour, as a proxy for prey capture, in relation to the variables: time of day, tidal phase and season. All variables were significant in predicting prey capture, with predations most likely to occur during early evening and less frequently during the dry season and high tides. These findings support previous hypotheses from sporadic visual observations. Electronic supplementary material The online version of this article (10.1007/s00227-018-3318-y) contains supplementary material, which is available to authorized users.
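The voting-ensemble idea can be sketched as a hard majority vote over base-model predictions; this is a simplification (the paper's VE model combines its base learners' predictions, but the exact combination rule here is illustrative, as are the labels):

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote across base-model predictions for one observation."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base models labelling three accelerometer windows
base_preds = [
    ["swim", "rest", "headshake"],   # e.g. logistic regression
    ["swim", "rest", "swim"],        # e.g. random forest
    ["chafe", "rest", "headshake"],  # e.g. gradient boosting
]
# Transpose so each tuple holds the three models' votes for one window
ensemble = [hard_vote(votes) for votes in zip(*base_preds)]
print(ensemble)  # ['swim', 'rest', 'headshake']
```

The ensemble corrects individual base-model mistakes whenever a majority of learners agree on the true behaviour, which is why the combined F-measure can exceed the best single learner's.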
- Published
- 2018
41. Analysis of Three Intrusion Detection System Benchmark Datasets Using Machine Learning Algorithms
- Author
-
H. Gunes Kayacik and Nur Zincir-Heywood
- Subjects
business.product_category ,Artificial neural network ,Competitive intelligence ,Computer science ,business.industry ,Intrusion detection system ,Benchmarking ,Machine learning ,computer.software_genre ,Paper machine ,Benchmark (computing) ,Artificial intelligence ,business ,Cluster analysis ,Algorithm ,computer ,Network analysis - Abstract
In this paper, we employed two machine learning algorithms, namely a clustering and a neural network algorithm, to analyze network traffic recorded from three sources. Two of the three traffic sources were synthetic, meaning the traffic was generated in a controlled environment for intrusion detection benchmarking. The main objective of the analysis is to determine the differences between synthetic and real-world traffic; however, the analysis methodology detailed in this paper can be employed for general network analysis purposes. Moreover, the framework we employed to generate one of the two synthetic traffic sources is briefly discussed.
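As one illustration of the clustering side of such an analysis, here is a minimal k-means on synthetic two-feature "traffic" data; the features, data and algorithm choice are invented for the sketch, since the paper does not specify its clustering algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two well-separated synthetic traffic groups in a toy feature space
# (e.g. connection rate vs. mean packet size)
X = np.vstack([
    rng.normal([1.0, 1.0], 0.1, (50, 2)),
    rng.normal([5.0, 5.0], 0.1, (50, 2)),
])

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign to nearest center, recompute means."""
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers, axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(X, 2)
print(labels[0] != labels[-1])  # True: the two generating groups separate
```

Comparing which records land in the same cluster across synthetic and real traces is one simple way to surface structural differences between the sources.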
- Published
- 2005
42. Ontologies and Similarity
- Author
-
Steffen Staab
- Subjects
business.industry ,Computer science ,Differentia ,Short paper ,Disjoint sets ,computer.software_genre ,Intersection ,Stepping stone ,Similarity (psychology) ,Artificial intelligence ,Data mining ,business ,computer ,Natural language processing - Abstract
Ontologies [9] comprise a definition of concepts describing their commonalities (genus proximum) as well as their differences (differentia specifica). One might think that with the definition of commonalities and differences, the definition of similarities in and for ontologies should follow immediately. Traditionally, however, the contrary is true, because the methodological background of ontologies, i.e. logic-based representations, and that of similarity, i.e. geometry-based representations, have been explored in disjoint communities that have mixed only to a limited extent. In this short paper we survey how our own work touches on the intersection between ontologies and similarity. While this cannot be a comprehensive account of their interrelationship, we intend it as a stepping stone for inspiration and as an indication of entry points for future investigations.
- Published
- 2011
43. Informing Datalog through Language Intelligence – A Personal Perspective
- Author
-
Veronica Dahl
- Subjects
business.industry ,Computer science ,computer.software_genre ,Data science ,Datalog ,Knowledge extraction ,Description logic ,Position paper ,Artificial intelligence ,Computational linguistics ,business ,Semantic Web ,computer ,Natural language ,Natural language processing ,Logic programming ,computer.programming_language - Abstract
Despite AI's paramount aim of developing convincing similes of true natural language "understanding", crucial knowledge that is increasingly becoming available to computers in text form on web repositories remains in fact decipherable only by humans. In this position paper, we present our views on the reasons for this failure, and we argue that for bringing computers closer to becoming true extensions of the human brain, we need to endow them with a cognitively-informed web by integrating new methodologies in the inter-disciplines involved, around the pivot of Logic Programming and Datalog.
- Published
- 2011
44. Image-Based and Sketch-Based Modeling of Plants and Trees
- Author
-
Sing Bing Kang
- Subjects
Tree (data structure) ,Sketch-based modeling ,Computer science ,business.industry ,Short paper ,Process (computing) ,Artificial intelligence ,Machine learning ,computer.software_genre ,business ,computer ,Image based ,Image (mathematics) - Abstract
In this short paper, I outline representative techniques for modeling plants and trees using images and sketches. Image-based approaches have the distinct advantage that the resulting model inherits the realistic shape and complexity of a real plant or tree. Using sketches to produce tree models relies much more on prior knowledge of tree construction but makes the modeling process intuitive and easy.
- Published
- 2011
45. Modeling Multi-agent Domains in an Action Language: An Empirical Study Using $\mathcal{C}$
- Author
-
Tran Cao Son, Chitta Baral, and Enrico Pontelli
- Subjects
Cognitive science ,Computer science ,business.industry ,Short paper ,A domain ,Action language ,computer.software_genre ,Empirical research ,Action (philosophy) ,Single agent ,Relevance (information retrieval) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
In the last two decades there has been a great deal of research on action languages and reasoning about actions. Most of this research assumes a domain with a single agent and possibly the environment. In this short paper we explore the relevance of this research to modeling multi-agent domains. We use the action language $\mathcal{C}$ and show that, with minimal extensions, it can capture several multi-agent domains from the literature.
- Published
- 2009
46. Imprecise Probability as an Approach to Improved Dependability in High-Level Information Fusion
- Author
-
Ronnie Johansson, Alexander Karlsson, and Sten F. Andler
- Subjects
Decision support system ,Situation awareness ,Computer science ,business.industry ,Bayesian probability ,Bayesian network ,Machine learning ,computer.software_genre ,Imprecise probability ,Position paper ,Dependability ,Artificial intelligence ,Data mining ,business ,computer ,Reliability (statistics) - Abstract
The main goal of information fusion can be seen as improving human or automatic decision-making by exploiting diversities in information from multiple sources. High-level information fusion aims specifically at decision support regarding situations, often expressed as “achieving situation awareness”. A crucial issue for decision making based on such support is trust, which can be defined as “accepted dependence”; dependence, or dependability, is an overall term for many other concepts, e.g., reliability. This position paper reports on ongoing and planned research concerning imprecise probability as an approach to improved dependability in high-level information fusion. We elaborate on high-level information fusion from a generic perspective and present a partial mapping from a taxonomy of dependability to high-level information fusion. Three application domains where experiments are planned (defense, manufacturing, and precision agriculture) are depicted. We conclude that high-level information fusion, an application-oriented research area where precise probability (Bayesian theory) is commonly adopted, provides an excellent evaluation ground for imprecise probability.
- Published
- 2008
47. Analyzing Argumentative Structures in Procedural Texts
- Author
-
Lionel Fontan and Patrick Saint-Dizier
- Subjects
Structure (mathematical logic) ,Argumentative ,Computer science ,business.industry ,Short paper ,Artificial intelligence ,computer.software_genre ,business ,Generative lexicon ,computer ,Linguistics ,Natural language processing ,Focus (linguistics) - Abstract
In this short paper, we present the explicative structure as found in procedural texts. We focus in particular on arguments, and show how warnings, a type of argument, can be extracted.
- Published
- 2008
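The abstract above mentions extracting warnings from procedural texts. A minimal sketch of that idea, assuming surface-level lexical cues (a toy heuristic, not the authors' actual extraction grammar):

```python
import re

# Hypothetical cue list; the paper's real method analyzes argumentative
# structure rather than matching keywords.
WARNING_CUES = re.compile(r"\b(do not|never|caution|warning|avoid)\b", re.I)

def extract_warnings(steps):
    """Return the instructions that look like warnings based on lexical cues."""
    return [s for s in steps if WARNING_CUES.search(s)]

steps = ["Unplug the device.",
         "Never open the casing while powered.",
         "Caution: the heat sink stays hot.",
         "Reassemble the cover."]
print(extract_warnings(steps))
```

A cue-based pass like this only approximates the task; the paper's point is that warnings function as arguments, so a full treatment needs discourse structure, not just keywords.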
48. Exploiting Structure and Conventions of Movie Scripts for Information Retrieval and Text Mining
- Author
-
Arnav Jhala
- Subjects
Structure (mathematical logic) ,Information retrieval ,Character (computing) ,business.industry ,Computer science ,Short paper ,computer.software_genre ,Interactive storytelling ,Blocking (computing) ,Annotation ,Scripting language ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Narrative ,Artificial intelligence ,business ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,computer ,Natural language processing - Abstract
Movie scripts are documents that describe the story, stage directions for actors and camera, and dialogue. Script writers, directors, and cinematographers have standardized the format and language used in script writing. Scripts contain a wealth of information about narrative patterns, character direction, blocking, and camera control that can be extracted for various applications in interactive storytelling. In this short paper, we propose the creation of an automatically annotated corpus of movie scripts and describe our initial efforts in automating script annotation. We first describe the parts of a movie script that can be automatically annotated and then describe the use of an existing language processing toolkit to automatically annotate specific parts of a movie script.
- Published
- 2006
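One part of a script that the standardized format makes automatically annotatable is the scene heading. A sketch of that single annotation step, assuming the common `INT./EXT. LOCATION - DAY/NIGHT` convention (this is an illustration of the format's regularity, not the paper's toolkit):

```python
import re

# Standard screenplay scene headings, e.g. "INT. LIBRARY - NIGHT".
SCENE = re.compile(r"^(INT\.|EXT\.)\s+(.+?)\s+-\s+(DAY|NIGHT)$")

def tag_lines(script_lines):
    """Label each line as a SCENE heading (with parsed fields) or plain TEXT."""
    out = []
    for line in script_lines:
        m = SCENE.match(line)
        out.append(("SCENE", m.groups()) if m else ("TEXT", line))
    return out

script = ["INT. LIBRARY - NIGHT",
          "JANE scans the shelves.",
          "EXT. COURTYARD - DAY"]
for tag, content in tag_lines(script):
    print(tag)
```

Because the conventions are enforced industry-wide, even shallow pattern matching recovers reliable structure, which is what makes the proposed corpus annotation feasible.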
49. Reactive Food Gathering
- Author
-
Robert Logie, Kevin Waugh, and Jon G. Hall
- Subjects
Engineering ,business.industry ,media_common.quotation_subject ,Deontic logic ,Short paper ,Representation (systemics) ,computer.software_genre ,Intelligent agent ,Human–computer interaction ,Formal specification ,Simplicity ,Artificial intelligence ,Problem set ,business ,computer ,Reactive system ,media_common - Abstract
This short paper describes a simple agent system aimed at addressing the food gathering problem set for the 2005 CLIMA contest. Our system is implemented as a collection of reactive agents which dynamically switch between a number of behaviours depending on interaction with their environment. Our agents maintain no internal representation of their environment and operate purely in response to their immediate surroundings. The agents collectively map the environment, co-operating indirectly via environmental markers, and they use these markers to assist them in locating the depot when they discover food. The required behaviour emerges from the interaction between agents and the marked environment. Despite the simplicity of the agents and their behaviours, a formal description is difficult. We concentrate on identifying interesting problems in characterising systems exhibiting emergent behaviour and outline possible logic approaches to dealing with them. The application (and one or two other systems addressing the same problem in a different manner) can be downloaded from: http://219.1.164.219/~robert/pwBlog/wp-content/CLIMAbuild.zip
- Published
- 2006
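The marker-following behaviour described above (agents with no internal map, guided to the depot by values left in the environment) can be sketched minimally. This is a hypothetical illustration of the stigmergic idea, not the contest entry's code; grid size, depot position, and distance-valued markers are all assumptions:

```python
GRID = 10
DEPOT = (0, 0)

def lay_markers():
    """Stand-in for indirect co-operation: each cell holds a marker whose
    value is its Manhattan distance to the depot, as if deposited by
    exploring agents over time."""
    return {(x, y): abs(x - DEPOT[0]) + abs(y - DEPOT[1])
            for x in range(GRID) for y in range(GRID)}

def step_toward_depot(pos, markers):
    """Purely reactive rule: move to the adjacent cell with the lowest
    marker value; no internal map is consulted."""
    x, y = pos
    neighbours = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                  if (x + dx, y + dy) in markers]
    return min(neighbours, key=lambda c: markers[c])

markers = lay_markers()
pos = (7, 4)          # an agent that has just found food
path = [pos]
while pos != DEPOT:
    pos = step_toward_depot(pos, markers)
    path.append(pos)
print(len(path) - 1)  # steps taken to reach the depot
```

Each agent's rule is trivial, yet the population-level behaviour (reliable homing to the depot) emerges from the marked environment, which is exactly why the paper finds a formal characterisation difficult.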
50. Automatic Video Shot Boundary Detection Using Machine Learning
- Author
-
Sameer Singh and Wei Ren
- Subjects
Boundary detection ,Motion compensation ,business.product_category ,business.industry ,Computer science ,Shot (filmmaking) ,Video sequence ,Machine learning ,computer.software_genre ,Edge detection ,Set (abstract data type) ,Paper machine ,Video tracking ,Computer Science::Multimedia ,Computer vision ,Artificial intelligence ,business ,computer - Abstract
In this paper we present a machine learning system that can accurately predict the transitions between frames in a video sequence. We propose a set of novel features and describe how to use dominant features based on a coarse-to-fine strategy to accurately predict video transitions.
- Published
- 2004
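The coarse-to-fine strategy mentioned in the abstract can be sketched as a two-stage test: a cheap global frame difference flags candidate boundaries, and only candidates pay for a finer per-region check. This is a hedged illustration with invented features and thresholds, not the paper's feature set:

```python
def global_diff(a, b):
    """Coarse feature: mean absolute intensity difference over whole frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def region_diff(a, b, regions=4):
    """Fine feature: fraction of equal-sized regions that changed strongly."""
    n = len(a) // regions
    changed = sum(global_diff(a[i * n:(i + 1) * n], b[i * n:(i + 1) * n]) > 50
                  for i in range(regions))
    return changed / regions

def detect_cuts(frames, coarse_t=30, fine_t=0.75):
    """Report frame indices where a shot transition is detected."""
    cuts = []
    for i in range(1, len(frames)):
        if global_diff(frames[i - 1], frames[i]) > coarse_t:     # cheap pass
            if region_diff(frames[i - 1], frames[i]) >= fine_t:  # costly pass
                cuts.append(i)
    return cuts

# Synthetic sequence: 8-pixel frames with a hard cut at index 3.
shot_a = [[10] * 8] * 3
shot_b = [[200] * 8] * 4
print(detect_cuts(shot_a + shot_b))  # → [3]
```

The design point is cost: the fine, region-level comparison runs only on the small set of frames that survive the coarse filter, which is what makes the strategy practical on full video sequences.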