559 results on "parsers"
Search Results
2. Classification of Deformed Objects Using Advanced LR Parsers
- Author
-
Junek, Lukas, Stastny, Jiri, Kacprzyk, Janusz, Series Editor, Matoušek, Radek, editor, and Kůdela, Jakub, editor
- Published
- 2021
- Full Text
- View/download PDF
3. Parsers and Grammars: A Tutorial Overview from the Linguistics Building.
- Author
-
Acuña-Fariña, Carlos
- Subjects
- *
GRAMMAR , *LINGUISTICS , *ELECTROPHYSIOLOGY , *HEURISTIC - Abstract
The purpose of this paper is to re-examine the relationship between grammars and processing systems in light of the various forms of experimental research (especially of an electrophysiological nature) that have been conducted in the last fifteen years or so. First, the notion of 'processing strategy' or 'heuristic processing' is considered, followed by a discussion of structures of great morphosyntactic complexity that parsing systems seem to tackle by simply respecting complex grammatical laws, instead of by resorting to shortcuts. Then, grammatical illusions and what these can teach us about the processing of grammar are considered. It is argued that illusions allow us to discern a few explanatory principles that may redefine the way we see parser–grammar relations. Among these is the idea that how long illusions last in the online-to-offline transition depends in part on their 'templatability', that is, the ease with which they become gestaltic templates. Another key idea is that some apparent illusions are in fact nothing more than grammar contemplated at work, as if in slow motion.
- Published
- 2022
- Full Text
- View/download PDF
4. Large‐scale semi‐automated migration of legacy C/C++ test code.
- Author
-
Schuts, Mathijs T. W., Aarssen, Rodin T. A., Tielemans, Paul M., and Vinju, Jurgen J.
- Subjects
SOURCE code ,SOFTWARE engineers ,SOFTWARE maintenance ,SOFTWARE engineering ,C++ - Abstract
This is an industrial experience report on a large semi-automated migration of legacy test code in C and C++. The particular migration was enabled by automating most of the maintenance steps. Without automation this particular large-scale migration would not have been conducted, due to the risks involved in manual maintenance (risk of introducing errors, risk of unexpected rework, and loss of productivity). We describe and evaluate the method of automation we used on this real-world case. The benefits were that by automating analysis, we could make sure that we understand all the relevant details for the envisioned maintenance, without having to manually read and check our theories. Furthermore, by automating transformations we could reiterate and improve over complex and large-scale source code updates, until they were "just right." The drawbacks were that, first, we have had to learn new metaprogramming skills. Second, our automation scripts are not readily reusable for other contexts; they were necessarily developed for this ad-hoc maintenance task. Our analysis shows that automated software maintenance as compared to the (hypothetical) manual alternative method seems to be better both in terms of avoiding mistakes and avoiding rework because of such mistakes. It seems that necessary and beneficial source code maintenance need not be avoided, if software engineers are enabled to create bespoke (and ad-hoc) analysis and transformation tools to support it.
- Published
- 2022
- Full Text
- View/download PDF
5. A Verified Earley Parser
- Author
-
Rau, Martin and Nipkow, Tobias
- Abstract
An Earley parser is a top-down parsing technique that is capable of parsing arbitrary context-free grammars. We present a functional implementation of an Earley parser verified using the interactive theorem prover Isabelle/HOL. Our formalization builds upon Cliff Jones' extensive, refinement-based paper proof. We implement and prove soundness and completeness of a functional recognizer modeling Jay Earley’s original imperative implementation and extend it with the necessary data structures to enable the construction of parse trees following the work of Elizabeth Scott. Building upon this foundation, we develop a functional parser and prove its soundness. We round off the paper by providing an informal argument and empirical data regarding the running time and space complexity of our implementation. more...
- Published
- 2024
- Full Text
- View/download PDF
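Entry 5 describes a verified functional Earley recognizer. For readers unfamiliar with the algorithm, the following is a minimal, unverified Python sketch of the recognizer's predict/scan/complete loop over a toy grammar; the grammar and item representation are assumptions made for the example and are not taken from the paper's Isabelle/HOL formalization.

```python
# Minimal Earley recognizer sketch (illustrative only, not the verified
# Isabelle/HOL development from the paper). A grammar is a list of
# (lhs, rhs-tuple) rules; any symbol without a rule is a terminal.
GRAMMAR = [                      # toy grammar, an assumption for the example
    ("S", ("S", "+", "n")),
    ("S", ("n",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def earley_recognize(tokens, start="S"):
    # An item is (lhs, rhs, dot, origin); chart[i] holds items ending at position i.
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(lhs, rhs, 0, 0) for lhs, rhs in GRAMMAR if lhs == start}
    for i in range(len(tokens) + 1):
        changed = True
        while changed:                                   # predict/complete to a fixed point
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in NONTERMINALS:        # predict
                    for lhs2, rhs2 in GRAMMAR:
                        if lhs2 == rhs[dot] and (lhs2, rhs2, 0, i) not in chart[i]:
                            chart[i].add((lhs2, rhs2, 0, i))
                            changed = True
                elif dot == len(rhs):                                  # complete
                    for lhs2, rhs2, dot2, origin2 in list(chart[origin]):
                        if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                            item = (lhs2, rhs2, dot2 + 1, origin2)
                            if item not in chart[i]:
                                chart[i].add(item)
                                changed = True
        if i < len(tokens):                                            # scan
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])

print(earley_recognize(["n", "+", "n"]))   # True
print(earley_recognize(["n", "+"]))        # False
```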
6. Designing and Interpreting a Mathematical Programming Language
- Author
-
Hüseyin Pehlivan
- Subjects
programming languages ,formal grammars ,parsers ,interpreters ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Chemistry ,QD1-999 - Abstract
The syntax of programming languages has a significant impact on the definition and validation of mathematical calculations. In particular, the management of code identification and validation processes can be made easier and faster, depending on the parametric behavior of the functions. In this article, a programming language that supports the use of mathematical function structures is designed and an interpreter, which can evaluate the source code written in this language, is developed. The language syntax is represented by an LL(k) grammar defined in the BNF notation. The interpreter consists of several basic components such as a parser, a semantic controller and a code evaluator, each of which makes a different kind of code interpretation. The LL(k) parser component used for the syntactic analysis of the language is generated via an automatic code generation tool called JavaCC. The other components work on the abstract syntax tree that this parser generates. To illustrate the use of the language with code samples, several mathematical algorithms that include calculations on different sequences of numbers are programmed and interpreted.
- Published
- 2019
- Full Text
- View/download PDF
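Entry 6 describes an interpreter whose JavaCC-generated LL(k) parser builds an abstract syntax tree that other components then evaluate. The sketch below illustrates that parser-to-AST-to-evaluator pipeline with a hand-written recursive-descent (LL(1)) parser for a toy arithmetic grammar; the grammar, token format and tuple-shaped AST are assumptions for the example, not the paper's language.

```python
# Minimal recursive-descent (LL(1)) parser and evaluator sketch. It illustrates
# the parser -> abstract syntax tree -> evaluator pipeline described above; the
# toy grammar here is an assumption, not the paper's mathematical language.
import re

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(src):
    tokens = []
    for number, op in TOKEN.findall(src):
        tokens.append(("num", int(number)) if number else ("op", op))
    tokens.append(("eof", None))
    return tokens

class Parser:
    # Grammar:  expr   -> term (('+'|'-') term)*
    #           term   -> factor (('*'|'/') factor)*
    #           factor -> num | '(' expr ')'
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos]

    def eat(self, kind, value=None):
        k, v = self.tokens[self.pos]
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind}, got {v!r}")
        self.pos += 1
        return v

    def expr(self):
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.eat("op")
            node = (op, node, self.term())       # AST node: (operator, left, right)
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.eat("op")
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.peek()[0] == "num":
            return self.eat("num")
        self.eat("op", "(")
        node = self.expr()
        self.eat("op", ")")
        return node

def evaluate(node):
    if isinstance(node, int):
        return node
    op, left, right = node
    l, r = evaluate(left), evaluate(right)
    return {"+": l + r, "-": l - r, "*": l * r, "/": l / r}[op]

ast = Parser(tokenize("2*(3+4)-5")).expr()
print(ast)            # ('-', ('*', 2, ('+', 3, 4)), 5)
print(evaluate(ast))  # 9
```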
7. Deep Learning for Natural Language Parsing
- Author
-
Sardar Jaf and Calum Calder
- Subjects
BiLSTM parsing ,deep learning ,dependency parsing ,natural language processing ,parsers ,shift-reduce parsing ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Natural language processing problems (such as speech recognition, text-based data mining, and text or speech generation) are becoming increasingly important. Before effectively approaching many of these problems, it is necessary to process the syntactic structures of the sentences. Syntactic parsing is the task of constructing a syntactic parse tree over a sentence which describes the structure of the sentence. Parse trees are used as part of many language processing applications. In this paper, we present a multi-lingual dependency parser. Using advanced deep learning techniques, our parser architecture tackles common issues with parsing such as long-distance head attachment, while using 'architecture engineering' to adapt to each target language in order to reduce the feature engineering often required for parsing tasks. We implement a parser based on this architecture to utilize transfer learning techniques to address important issues related to limited-resource languages. We exceed the accuracy of state-of-the-art parsers on languages with limited training resources by a considerable margin. We present promising results for solving core problems in natural language parsing, while also performing at state-of-the-art accuracy on general parsing tasks.
- Published
- 2019
- Full Text
- View/download PDF
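Entry 7 concerns a BiLSTM-based shift-reduce dependency parser. The sketch below shows only the arc-standard transition system such parsers drive; the neural scorer is replaced by a static oracle derived from a gold tree, and the example sentence and head indices are invented for illustration.

```python
# Arc-standard shift-reduce dependency parsing sketch. A trained parser (such as
# the BiLSTM model in the paper) chooses each transition with a classifier; here
# a static oracle derived from a hand-written gold tree plays that role.

def parse(words, choose_transition):
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []   # token 0 is ROOT
    while buffer or len(stack) > 1:
        action = choose_transition(stack, buffer, words)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            head, dep = stack[-1], stack[-2]
            arcs.append((head, dep))                                 # arc: head -> dependent
            del stack[-2]
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            head, dep = stack[-2], stack[-1]
            arcs.append((head, dep))
            stack.pop()
        else:
            raise ValueError(f"illegal action {action}")
    return arcs

def static_oracle(gold_heads):
    # Derives transitions from a gold tree; gold_heads[i] is the head of token i.
    def choose(stack, buffer, words):
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if gold_heads.get(below) == top:
                return "LEFT-ARC"
            if gold_heads.get(top) == below and not any(
                    gold_heads.get(b) == top for b in buffer):
                return "RIGHT-ARC"
        return "SHIFT"
    return choose

words = ["She", "reads", "books"]        # tokens 1..3; 0 is ROOT
gold = {1: 2, 2: 0, 3: 2}                # "reads" heads "She" and "books"; ROOT heads "reads"
print(parse(words, static_oracle(gold))) # [(2, 1), (2, 3), (0, 2)]
```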
8. Sayısal Çözümleme Yöntemlerinin Programlanması ve Yorumlanması [Programming and Interpreting Numerical Analysis Methods]
- Author
-
Hüseyin Pehlivan
- Subjects
numerical methods ,programming languages ,formal grammars ,parsers ,interpreters ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Science ,Science (General) ,Q1-390
Numerical computation, also known as numerical analysis, is an important branch of applied mathematics. Numerical methods are used when mathematics cannot produce an analytical solution, or when the solution produced has a computational complexity too high for practical application. In this article, a programming language that allows numerical methods to be programmed is designed, and an interpreter that can evaluate source code written in this language is developed. The language syntax is represented by an LL(k) grammar defined in BNF (Backus Naur Form) notation. The interpreter consists of several basic components, such as a parser, a semantic checker, a symbolic differentiator and a code evaluator, each of which performs a different kind of interpretation of the code. The LL(k) parser component used for the syntactic analysis of the language is generated with the help of JavaCC, an automatic code generation tool. The other components work on the abstract syntax tree built by this parser. To illustrate the use of the language, the programming and interpretation of several numerical root-finding methods are shown. A comparison with some popular languages based on language metrics is made and running times are evaluated.
- Published
- 2019
- Full Text
- View/download PDF
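Entry 8 mentions that the interpreter includes a symbolic differentiator working on the parser's abstract syntax tree. Below is a minimal sketch of that idea, assuming a tuple-based AST like the one in the earlier sketch; the paper's actual representation and rules are not shown here.

```python
# Symbolic differentiation over a small expression AST, sketching the kind of
# work a "symbolic differentiator" component performs on a parser's abstract
# syntax tree. The tuple-based AST shape is an assumption for the example.

def diff(node, var):
    if isinstance(node, (int, float)):             # constant
        return 0
    if isinstance(node, str):                      # variable
        return 1 if node == var else 0
    op, left, right = node
    dl, dr = diff(left, var), diff(right, var)
    if op == "+":
        return ("+", dl, dr)
    if op == "-":
        return ("-", dl, dr)
    if op == "*":                                  # product rule
        return ("+", ("*", dl, right), ("*", left, dr))
    if op == "/":                                  # quotient rule
        return ("/", ("-", ("*", dl, right), ("*", left, dr)), ("*", right, right))
    raise ValueError(f"unknown operator {op}")

# d/dx (x*x + 3*x) -> (1*x + x*1) + (0*x + 3*1), i.e. 2x + 3 after simplification
expr = ("+", ("*", "x", "x"), ("*", 3, "x"))
print(diff(expr, "x"))
# ('+', ('+', ('*', 1, 'x'), ('*', 'x', 1)), ('+', ('*', 0, 'x'), ('*', 3, 1)))
```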
9. History and Key Developments in Intelligent Computer-Assisted Language Learning (ICALL)
- Author
-
Heift, Trude, May, Stephen, Series editor, and Thorne, Steven L., editor
- Published
- 2017
- Full Text
- View/download PDF
10. Análise de Ferramentas de Compiladores em Ambientes Virtualizados.
- Author
-
Sachs C. de Barbosa, Cinthyan Renata, Roque e Faria, Carolinne, and M. Campano Junior, Maurílio
- Subjects
- *
COMPILERS (Computer programs) , *COMPUTER science , *PUBLIC universities & colleges , *C++ , *GRAMMAR , *UNDERGRADUATES - Abstract
The use of teaching tools has become an alternative to complement the learning of school content. This paper provides some overview aspects of Compilation and a performance analysis of the computational tools GALS, Grammophone, The Context Grammar Free Checker, Verto, and Parsing Simulator, which were developed to support the compilation process and aim at assisting learning in Compilers courses. There are several known tools, but only a few were built for academic purposes and will be presented in this paper, as they were tested by students in the Compilers course in the Undergraduate course and also in the Master's course in Computer Science at a Brazilian Public University in Paraná to analyze hypotheses, to help verify parsing examples and to exchange experiences about these Compiler tools. It was observed that the lexical and mainly syntactic analysis phases become more didactic and attractive to the students, making it easier to understand their functionalities and the implementation of a compiler as a whole. GALS has proven to be a good option with a simple interface, working with lexical and syntactic analysis for more than one language (Java, C++ and Delphi). Studies of Context Free Grammars in LL(1), LR(0) and LR(1) format may be favored not only with GALS, but also with the tools Grammophone and The Context Grammar Free Checker. Verto, on the other hand, works didactically, not only on the lexical and syntactic analysis steps (the latter also with an LR(1) Parser), but also on code generation. Parsing Simulator proved to be an intuitive tool and also presents an extensive collection of syntactic analysis options, showing the step-by-step LL(1) and LR(k) analysis tables, promoting teaching-learning in Compilers.
- Published
- 2021
- Full Text
- View/download PDF
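Entry 10 discusses tools that display LL(1), LR(0) and LR(1) analyses for context-free grammars. One small piece of what such tools compute is the FIRST set of each nonterminal; the sketch below computes FIRST sets by fixed-point iteration over a toy grammar. The single-character symbols and the dictionary grammar format are assumptions for the example, not taken from any of the tools described above.

```python
# Computing FIRST sets for a context-free grammar, a first step toward the
# LL(1) analysis tables that the teaching tools above visualize. Grammar
# format (an assumption): nonterminal -> list of right-hand-side strings,
# with "" marking the empty production (epsilon) and one character per symbol.

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                if rhs == "":           # epsilon production
                    if "" not in first[nt]:
                        first[nt].add("")
                        changed = True
                    continue
                for symbol in rhs:
                    symbol_first = first[symbol] if symbol in grammar else {symbol}
                    new = (symbol_first - {""}) - first[nt]
                    if new:
                        first[nt] |= new
                        changed = True
                    if "" not in symbol_first:   # symbol not nullable: stop here
                        break
                else:                   # every symbol nullable, so the rhs is nullable
                    if "" not in first[nt]:
                        first[nt].add("")
                        changed = True
    return first

# Classic LL(1) expression grammar with left recursion removed.
grammar = {
    "E": ["TX"], "X": ["+TX", ""],
    "T": ["FY"], "Y": ["*FY", ""],
    "F": ["(E)", "n"],
}
print(first_sets(grammar))
# FIRST(E) = {'(', 'n'}, FIRST(X) = {'+', ''}, FIRST(T) = {'(', 'n'},
# FIRST(Y) = {'*', ''}, FIRST(F) = {'(', 'n'}  (set element order may vary)
```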
11. Statically Resolvable Ambiguity
- Author
-
Palmkvist, Viktor, Castegren, Elias, Haller, Philipp, and Broman, David
- Abstract
Traditionally, a grammar defining the syntax of a programming language is typically both context free and unambiguous. However, recent work suggests that an attractive alternative is to use ambiguous grammars, thus postponing the task of resolving the ambiguity to the end user. If all programs accepted by an ambiguous grammar can be rewritten unambiguously, then the parser for the grammar is said to be resolvably ambiguous. Guaranteeing resolvable ambiguity statically---for all programs---is hard, and previous work only solves it partially using techniques based on property-based testing. In this paper, we present the first efficient, practical, and proven correct solution to the statically resolvable ambiguity problem. Our approach introduces several key ideas, including splittable productions, operator sequences, and the concept of a grouper that works in tandem with a standard parser. We prove static resolvability using a Coq mechanization and demonstrate its efficiency and practical applicability by implementing and integrating resolvable ambiguity into an essential part of the standard OCaml parser.
- Published
- 2023
- Full Text
- View/download PDF
12. Morpheus: Automated Safety Verification of Data-Dependent Parser Combinator Programs
- Author
-
Mishra, Ashish and Jagannathan, Suresh
- Abstract
Parser combinators are a well-known mechanism used for the compositional construction of parsers, and have been shown to be particularly useful in writing parsers for rich grammars with data-dependencies and global state. Verifying applications written using them, however, has proven to be challenging in large part because of the inherently effectful nature of the parsers being composed and the difficulty in reasoning about the arbitrarily rich data-dependent semantic actions that can be associated with parsing actions. In this paper, we address these challenges by defining a parser combinator framework called Morpheus equipped with abstractions for defining composable effects tailored for parsing and semantic actions, and a rich specification language used to define safety properties over the constituent parsers comprising a program. Even though its abstractions yield many of the same expressivity benefits as other parser combinator systems, Morpheus is carefully engineered to yield a substantially more tractable automated verification pathway. We demonstrate its utility in verifying a number of realistic, challenging parsing applications, including several cases that involve non-trivial data-dependent relations.
- Published
- 2023
- Full Text
- View/download PDF
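Entries 12 and 16 verify programs written with parser combinators. For orientation, the following is a minimal, effect-free parser combinator sketch in Python; it shows the compositional style only, and has none of Morpheus's effect abstractions, specification language, or verification machinery.

```python
# Miniature parser combinator sketch. A parser is a function (text, index) ->
# (value, new_index) on success or None on failure; combinators build bigger
# parsers from smaller ones. Illustrative only, unrelated to Morpheus itself.

def char(c):
    def parse(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return parse

def digit():
    def parse(s, i):
        return (s[i], i + 1) if i < len(s) and s[i].isdigit() else None
    return parse

def seq(*parsers):
    def parse(s, i):
        values = []
        for p in parsers:
            result = p(s, i)
            if result is None:
                return None
            value, i = result
            values.append(value)
        return values, i
    return parse

def alt(*parsers):
    def parse(s, i):
        for p in parsers:
            result = p(s, i)
            if result is not None:
                return result
        return None
    return parse

def many1(parser):
    def parse(s, i):
        values = []
        while True:
            result = parser(s, i)
            if result is None:
                break
            value, i = result
            values.append(value)
        return (values, i) if values else None
    return parse

def fmap(f, parser):
    def parse(s, i):
        result = parser(s, i)
        if result is None:
            return None
        value, i = result
        return f(value), i
    return parse

# number '+' number, evaluated on the fly via the semantic action in fmap
number = fmap(lambda ds: int("".join(ds)), many1(digit()))
addition = fmap(lambda v: v[0] + v[2], seq(number, char("+"), number))
print(addition("12+30", 0))   # (42, 5)
```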
13. Designing and Interpreting a Mathematical Programming Language.
- Author
-
Pehlivan, Hüseyin
- Subjects
PROGRAMMING languages ,SYNTAX in programming languages ,MATHEMATICAL programming ,MATHEMATICAL functions ,SOURCE code - Abstract
The syntax of programming languages has a significant impact on the definition and validation of mathematical calculations. In particular, the management of code identification and validation processes can be made easier and faster, depending on the parametric behavior of the functions. In this article, a programming language that supports the use of mathematical function structures is designed and an interpreter, which can evaluate the source code written in this language, is developed. The language syntax is represented by an LL(k) grammar defined in the BNF notation. The interpreter consists of several basic components such as a parser, a semantic controller and a code evaluator, each of which makes a different kind of code interpretation. The LL(k) parser component used for the syntactic analysis of the language is generated via an automatic code generation tool called JavaCC. The other components work on the abstract syntax tree that this parser generates. To illustrate the use of the language with code samples, several mathematical algorithms that include calculations on different sequences of numbers are programmed and interpreted. The paper also performs a comparative analysis of the language with some related ones based on some design principles and mathematical aspects.
- Published
- 2019
- Full Text
- View/download PDF
14. Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool.
- Author
-
Atutxa, Aitziber, Bengoetxea, Kepa, Diaz de Ilarraza, Arantza, and Iruskieta, Mikel
- Subjects
- *
DISCOURSE analysis , *MINERAL industry equipment , *TREE graphs , *SENTIMENT analysis , *NATURAL language processing - Abstract
Lately, discourse structure has received considerable attention due to the benefits its application offers in several NLP tasks such as opinion mining, summarization, question answering, and text simplification, among others. When automatically analyzing texts, discourse parsers typically perform two different tasks: i) identification of basic discourse units (text segmentation); ii) linking discourse units by means of discourse relations, building structures such as trees or graphs. The resulting discourse structures are, in general terms, accurate at intra-sentence discourse-level relations; however, they fail to capture the correct inter-sentence relations. Detecting the main discourse unit (the Central Unit) is helpful for discourse analyzers (and also for manual annotation) in improving their results in rhetorical labeling. Bearing this in mind, we set out to build the first two steps of a discourse parser following a top-down strategy: i) to find discourse units, ii) to detect the Central Unit. The final step, i.e. assigning rhetorical relations, remains to be worked on in the immediate future. In accordance with this strategy, our paper presents a tool consisting of a discourse segmenter and an automatic Central Unit detector.
- Published
- 2019
- Full Text
- View/download PDF
15. Statically Resolvable Ambiguity
- Author
-
Viktor Palmkvist, Elias Castegren, Philipp Haller, and David Broman
- Subjects
Ambiguity ,Datavetenskap (datalogi) ,OCaml ,Computer Sciences ,Coq ,Safety, Risk, Reliability and Quality ,Parsers ,Software - Abstract
Traditionally, a grammar defining the syntax of a programming language is typically both context free and unambiguous. However, recent work suggests that an attractive alternative is to use ambiguous grammars, thus postponing the task of resolving the ambiguity to the end user. If all programs accepted by an ambiguous grammar can be rewritten unambiguously, then the parser for the grammar is said to be resolvably ambiguous. Guaranteeing resolvable ambiguity statically---for all programs---is hard, and previous work only solves it partially using techniques based on property-based testing. In this paper, we present the first efficient, practical, and proven correct solution to the statically resolvable ambiguity problem. Our approach introduces several key ideas, including splittable productions, operator sequences, and the concept of a grouper that works in tandem with a standard parser. We prove static resolvability using a Coq mechanization and demonstrate its efficiency and practical applicability by implementing and integrating resolvable ambiguity into an essential part of the standard OCaml parser.
- Published
- 2023
16. Morpheus: Automated Safety Verification of Data-Dependent Parser Combinator Programs
- Author
-
Mishra, Ashish and Jagannathan, Suresh
- Subjects
FOS: Computer and information sciences ,Computer Science - Programming Languages ,Functional programming ,Refinement types ,Verification ,Type systems ,Parsers ,Domain-specific languages ,Programming Languages (cs.PL) ,Software and its engineering → General programming languages - Abstract
Parser combinators are a well-known mechanism used for the compositional construction of parsers, and have been shown to be particularly useful in writing parsers for rich grammars with data-dependencies and global state. Verifying applications written using them, however, has proven to be challenging in large part because of the inherently effectful nature of the parsers being composed and the difficulty in reasoning about the arbitrarily rich data-dependent semantic actions that can be associated with parsing actions. In this paper, we address these challenges by defining a parser combinator framework called Morpheus equipped with abstractions for defining composable effects tailored for parsing and semantic actions, and a rich specification language used to define safety properties over the constituent parsers comprising a program. Even though its abstractions yield many of the same expressivity benefits as other parser combinator systems, Morpheus is carefully engineered to yield a substantially more tractable automated verification pathway. We demonstrate its utility in verifying a number of realistic, challenging parsing applications, including several cases that involve non-trivial data-dependent relations. Published in LIPIcs, Vol. 263, 37th European Conference on Object-Oriented Programming (ECOOP 2023), pages 20:1-20:27.
- Published
- 2023
- Full Text
- View/download PDF
17. Analyzing SystemC Designs: SystemC Analysis Approaches for Varying Applications
- Author
-
Jannis Stoppe and Rolf Drechsler
- Subjects
SystemC ,ESL ,analysis ,machine learning ,parsers ,AOP ,hardware/software co-design ,Chemical technology ,TP1-1185 - Abstract
The complexity of hardware designs is still increasing according to Moore’s law. With embedded systems being more and more intertwined and working together not only with each other, but also with their environments as cyber physical systems (CPSs), more streamlined development workflows are employed to handle the increasing complexity during a system’s design phase. SystemC is a C++ library for the design of hardware/software systems, enabling the designer to quickly prototype, e.g., a distributed CPS without having to decide about particular implementation details (such as whether to implement a feature in hardware or in software) early in the design process. Thereby, this approach reduces the initial implementation’s complexity by offering an abstract layer with which to build a working prototype. However, as SystemC is based on C++, analyzing designs becomes a difficult task due to the complex language features that are available to the designer. Several fundamentally different approaches for analyzing SystemC designs have been suggested. This work illustrates several different SystemC analysis approaches, including their specific advantages and shortcomings, allowing designers to pick the right tools to assist them with a specific problem during the design of a system using SystemC. more...
- Published
- 2015
- Full Text
- View/download PDF
18. Automatically assembling a full census of an academic field.
- Author
-
Morgan, Allison C., Way, Samuel F., and Clauset, Aaron
- Subjects
- *
ACADEMIC achievement , *INFORMATION processing , *PROFESSIONAL associations , *COMPUTER science , *AUTOMATION - Abstract
The composition of the scientific workforce shapes the direction of scientific research, directly through the selection of questions to investigate, and indirectly through its influence on the training of future scientists. In most fields, however, complete census information is difficult to obtain, complicating efforts to study workforce dynamics and the effects of policy. This is particularly true in computer science, which lacks a single, all-encompassing directory or professional organization. A full census of computer science would serve many purposes, not the least of which is a better understanding of the trends and causes of unequal representation in computing. Previous academic census efforts have relied on narrow or biased samples, or on professional society membership rolls. A full census can be constructed directly from online departmental faculty directories, but doing so by hand is expensive and time-consuming. Here, we introduce a topical web crawler for automating the collection of faculty information from web-based department rosters, and demonstrate the resulting system on the 205 PhD-granting computer science departments in the U.S. and Canada. This method can quickly construct a complete census of the field, and achieve over 99% precision and recall. We conclude by comparing the resulting 2017 census to a hand-curated 2011 census to quantify turnover and retention in computer science, in general and for female faculty in particular, demonstrating the types of analysis made possible by automated census construction. [ABSTRACT FROM AUTHOR] more...
- Published
- 2018
- Full Text
- View/download PDF
19. Processing of ellipsis with garden-path antecedents in French and German: Evidence from eye tracking.
- Author
-
Paape, Dario, Hemforth, Barbara, and Vasishth, Shravan
- Subjects
- *
EYE tracking , *ELLIPSIS (Grammar) , *EXPERIMENTAL design , *AMBIGUITY , *FRENCH language - Abstract
In a self-paced reading study on German sluicing, Paape (Paape, 2016) found that reading times were shorter at the ellipsis site when the antecedent was a temporarily ambiguous garden-path structure. As a post-hoc explanation of this finding, Paape assumed that the antecedent’s memory representation was reactivated during syntactic reanalysis, making it easier to retrieve. In two eye tracking experiments, we subjected the reactivation hypothesis to further empirical scrutiny. Experiment 1, carried out in French, showed no evidence in favor of the reactivation hypothesis. Instead, results for one out of the three types of garden-path sentences that were tested suggest that subjects sometimes failed to resolve the temporary ambiguity in the antecedent clause, and subsequently failed to resolve the ellipsis. The results of Experiment 2, a conceptual replication of Paape’s (Paape, 2016) original study carried out in German, are compatible with the reactivation hypothesis, but leave open the possibility that the observed speedup for ambiguous antecedents may be due to occasional retrievals of an incorrect structure.
- Published
- 2018
- Full Text
- View/download PDF
20. Wailord: Parsers and Reproducibility for Quantum Chemistry
- Author
-
Rohit Goswami
- Subjects
python ,quantum chemistry ,reproducible reports ,parsers ,computational-chemistry - Abstract
Much of the scientific Python ecosystem deals with problems at the level where their structure is already present in memory. However, the generation of input files for driving existing codes, as well as the parsing of results, is not typically covered in great detail. This presentation bridges the gap between external programs and data-structures, demonstrating, via a practical example, the utility of code-generation and parsing expression grammar parsers for reproducible results in quantum chemistry. More details at: https://rgoswami.me/posts/scipycon-2022-meta. The concept of a crisis of reproducibility in scientific research needs no introduction. Although there are several tooling approaches one can take to reduce the cognitive load of keeping track of various steps of an analysis pipeline [1], there remains an almost linguistic gap when it comes to interfacing with domain specific tools. We demonstrate the role of parsers in the reproducibility workflow. By focusing on the generation of input files and the structured extraction of output data, we will aim to plug a gap in the generation of reproducible reports, namely, interfacing (via file I/O) with existing software. The file I/O interface justifiably has many detractors; especially on an HPC (high performance computing) cluster, I/O can be a bottleneck. However, when faced with an opaque binary which outputs freeform results, powered by an input file which has little to no structure beyond a 1500-page manual of keyword arguments, the utility of a domain specific parser can pay off immensely. In our quest to translate domain intuition into computational input constraints, we will work in a reduced grammar, an intermediate representation (IR). Such an IR can be generated for multiple program specifications, so extensions to other software are not difficult either. As a concrete realization of an abstract concept, we will discuss Wailord [2], which uses parsimonious [3] and cookiecutter [4] to interface with ORCA [5], a popular free (but not open source) quantum chemistry software suite. We will go over how such an input generation and output parser technique allows for catching otherwise hard-to-track-down errors. Taking a step away from the problem of writing single-purpose input files and functionalities, we demonstrate how a series of tasks can be defined, executed, and harvested into a single report, at the cost of giving up control over the folder structure. [1] https://rgoswami.me/posts/pycon-in-2020-meta/ [2] https://wailord.xyz [3] https://github.com/erikrose/parsimonious [4] https://cookiecutter.readthedocs.io/ [5] https://www.kofo.mpg.de/en/research/services/orca
- Published
- 2022
- Full Text
- View/download PDF
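Entry 20 builds on parsing expression grammars via the parsimonious library. The snippet below is a generic illustration of that approach on an invented key-value text format; the grammar and input are not Wailord's or ORCA's, and the parsimonious calls shown (Grammar, parse, and the Node attributes expr_name/children/text) are quoted from memory and should be checked against the library's documentation.

```python
# Illustrative PEG parsing with parsimonious (the library Wailord builds on).
# The grammar and the input text are invented for this example; they are not
# Wailord's grammar or ORCA's output format, and the API usage is from memory.
from parsimonious.grammar import Grammar

grammar = Grammar(r"""
    file    = line+
    line    = key ws? "=" ws? value newline?
    key     = ~"[A-Za-z_]+"
    value   = ~"[-0-9.]+"
    ws      = ~"[ \t]+"
    newline = ~"\r?\n"
""")

text = "energy = -76.02\ncharge = 0\n"
tree = grammar.parse(text)

def extract(node, pairs):
    # Walk the parse tree, collecting (key, value) pairs from each "line" node.
    if node.expr_name == "line":
        parts = [c for c in node.children if c.expr_name in ("key", "value")]
        pairs.append((parts[0].text, parts[1].text))
    for child in node.children:
        extract(child, pairs)

pairs = []
extract(tree, pairs)
print(pairs)   # [('energy', '-76.02'), ('charge', '0')]
```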
21. Large-scale semi-automated migration of legacy C/C++ test code
- Author
-
Schuts, M.T.W. (Mathijs), Aarssen, R.T.A. (Rodin), Tielemans, P.M. (Paul), and Vinju, J.J. (Jurgen)
- Abstract
This is an industrial experience report on a large semi-automated migration of legacy test code in C and C++. The particular migration was enabled by automating most of the maintenance steps. Without automation this particular large-scale migration would not have been conducted, due to the risks involved in manual maintenance (risk of introducing errors, risk of unexpected rework, and loss of productivity). We describe and evaluate the method of automation we used on this real-world case. The benefits were that by automating analysis, we could make sure that we understand all the relevant details for the envisioned maintenance, without having to manually read and check our theories. Furthermore, by automating transformations we could reiterate and improve over complex and large-scale source code updates, until they were “just right.” The drawbacks were that, first, we have had to learn new metaprogramming skills. Second, our automation scripts are not readily reusable for other contexts; they were necessarily developed for this ad-hoc maintenance task. Our analysis shows that automated software maintenance as compared to the (hypothetical) manual alternative method seems to be better both in terms of avoiding mistakes and avoiding rework because of such mistakes. It seems that necessary and beneficial source code maintenance need not be avoided, if software engineers are enabled to create bespoke (and ad-hoc) analysis and transformation tools to support it.
- Published
- 2022
- Full Text
- View/download PDF
22. Dependency-based Siamese long short-term memory network for learning sentence representations.
- Author
-
Zhu, Wenhao, Yao, Tengjun, Ni, Jianyue, Wei, Baogang, and Lu, Zhiguo
- Subjects
- *
NATURAL language processing , *SHORT-term memory , *ARTIFICIAL neural networks , *BAG-of-words model (Computer science) , *MACHINE learning - Abstract
Textual representations play an important role in the field of natural language processing (NLP). The efficiency of NLP tasks, such as text comprehension and information extraction, can be significantly improved with proper textual representations. As neural networks are gradually applied to learn the representation of words and phrases, fairly efficient models of learning short text representations have been developed, such as the continuous bag of words (CBOW) and skip-gram models, and they have been extensively employed in a variety of NLP tasks. Because of the complex structure generated by the longer text lengths, such as sentences, algorithms appropriate for learning short textual representations are not applicable for learning long textual representations. One method of learning long textual representations is the Long Short-Term Memory (LSTM) network, which is suitable for processing sequences. However, the standard LSTM does not adequately address the primary sentence structure (subject, predicate and object), which is an important factor for producing appropriate sentence representations. To resolve this issue, this paper proposes the dependency-based LSTM model (D-LSTM). The D-LSTM divides a sentence representation into two parts: a basic component and a supporting component. The D-LSTM uses a pre-trained dependency parser to obtain the primary sentence information and generate supporting components, and it also uses a standard LSTM model to generate the basic sentence components. A weight factor that can adjust the ratio of the basic and supporting components in a sentence is introduced to generate the sentence representation. Compared with the representation learned by the standard LSTM, the sentence representation learned by the D-LSTM contains a greater amount of useful information. The experimental results show that the D-LSTM is superior to the standard LSTM for sentences involving compositional knowledge (SICK) data. [ABSTRACT FROM AUTHOR] more...
- Published
- 2018
- Full Text
- View/download PDF
23. Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.
- Author
-
Murugesan, Gurusamy, Abdulkadhar, Sabenabanu, and Natarajan, Jeyakumar
- Subjects
- *
PROTEIN-protein interactions , *MACHINE learning , *SUPPORT vector machines , *KERNEL functions , *PROTEINS , *MATHEMATICAL models - Abstract
Automatic extraction of protein-protein interaction (PPI) pairs from biomedical literature is a widely examined task in biological information extraction. Currently, many kernel-based approaches such as the linear kernel, tree kernel, graph kernel and combinations of multiple kernels have achieved promising results in the PPI task. However, most of these kernel methods fail to capture the semantic relation information between two entities. In this paper, we present a special type of tree kernel for PPI extraction which exploits both syntactic (structural) and semantic vector information, known as the Distributed Smoothed Tree kernel (DSTK). DSTK comprises distributed trees with syntactic information along with distributional semantic vectors representing the semantic information of the sentences or phrases. To generate a robust machine learning model, a composition of a feature-based kernel and DSTK was combined using an ensemble support vector machine (SVM). Five different corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used for evaluating the performance of our system. Experimental results show that our system achieves a better f-score on the five different corpora compared to other state-of-the-art systems.
- Published
- 2017
- Full Text
- View/download PDF
24. Hardware Inexact Grammar Parser.
- Author
-
Dimopoulos, Alexandros C., Pavlatos, Christos, and Papakonstantinou, George
- Subjects
- *
PARSING (Computer grammar) , *FIELD programmable gate arrays , *ALGORITHMS , *EMBEDDED computer systems , *COMPUTER hardware description languages - Abstract
In this paper, a platform is presented that, given a Stochastic Context-Free Grammar (SCFG), automatically outputs the description of a parser in synthesizable Hardware Description Language (HDL) which can be downloaded to an FPGA (Field Programmable Gate Array) board. Although the proposed methodology can be used for various inexact models, the probabilistic model is analyzed in detail and the extension to other inexact schemes is described. Context-Free Grammars (CFG) are augmented with attributes which represent the probability values. Initially, a methodology is proposed based on the fact that the probabilities can be evaluated concurrently with the parsing during the parse table construction, by extending the fundamental parsing operation proposed by Chiang & Fu. Using this extended operation, an efficient architecture is presented based on Earley's parallel algorithm, which, given an input string, generates the parse table while evaluating concurrently the probabilities of the generated dotted grammar rules in the table. Based on this architecture, a platform has been implemented that automatically generates the hardware design of the parser given an SCFG. The platform is suitable for embedded systems applications where a natural language interface is required or in pattern recognition tasks. The proposed hardware platform has been tested for various SCFGs and was compared with a previously presented hardware parser for SCFGs based on Earley's parallel algorithm. The hardware generated by the proposed platform is much less complicated than the one it is compared with and achieves a speed-up of one order of magnitude.
- Published
- 2017
- Full Text
- View/download PDF
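Entry 24 describes hardware that evaluates rule probabilities of a stochastic CFG concurrently with Earley parsing. As a software illustration of the probability bookkeeping involved, the sketch below computes the best-parse probability with CYK over a Chomsky-normal-form SCFG; this is a simpler setting than the paper's Earley-based architecture, and the toy grammar is an assumption for the example.

```python
# Probabilistic CYK sketch: computes the probability of the most likely parse
# of a sentence under a stochastic CFG in Chomsky normal form. Software
# illustration only; the grammar below is invented for the example.
from collections import defaultdict

# Rules: (lhs, rhs) -> probability, with rhs either one terminal or two nonterminals.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6, ("NP", ("she",)): 0.4,
    ("VP", ("V", "NP")): 1.0,
    ("Det", ("the",)): 1.0,
    ("N", ("book",)): 1.0,
    ("V", ("reads",)): 1.0,
}

def pcyk(words):
    n = len(words)
    best = defaultdict(float)              # best[(i, j, A)] = max prob of A over words[i:j]
    for i, w in enumerate(words):          # lexical (unary) rules
        for (lhs, rhs), p in RULES.items():
            if rhs == (w,):
                best[(i, i + 1, lhs)] = max(best[(i, i + 1, lhs)], p)
    for span in range(2, n + 1):           # binary rules, shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, rhs), p in RULES.items():
                    if len(rhs) == 2:
                        prob = p * best[(i, k, rhs[0])] * best[(k, j, rhs[1])]
                        best[(i, j, lhs)] = max(best[(i, j, lhs)], prob)
    return best[(0, n, "S")]

print(pcyk(["she", "reads", "the", "book"]))   # 0.24
```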
25. Ambiguity in the processing of Mandarin Chinese relative clauses: One factor cannot explain it all.
- Author
-
Mansbridge, Michael P., Tamaoka, Katsuo, Xiong, Kexin, and Verdonschot, Rinus G.
- Subjects
- *
AMBIGUITY , *MANDARIN dialects , *CHINESE dialects , *CLAUSES (Grammar) , *WORD recognition - Abstract
This study addresses the question of whether native Mandarin Chinese speakers process and comprehend subject-extracted relative clauses (SRC) more readily than object-extracted relative clauses (ORC) in Mandarin Chinese. Presently, this has been a hotly debated issue, with various studies producing contrasting results. Using two eye-tracking experiments with ambiguous and unambiguous RCs, this study shows that both ORCs and SRCs have different processing requirements depending on the locus and time course during reading. The results reveal that ORC reading was possibly facilitated by linear/temporal integration and canonicity. On the other hand, similarity-based interference made ORCs more difficult, and expectation-based processing was more prominent for unambiguous ORCs. Overall, RC processing in Mandarin should not be broken down to a single ORC (dis)advantage, but understood as multiple interdependent factors influencing whether ORCs are either more difficult or easier to parse depending on the task and context at hand. [ABSTRACT FROM AUTHOR] more...
- Published
- 2017
- Full Text
- View/download PDF
26. A study of the transferability of influenza case detection systems between two large healthcare systems.
- Author
-
Su, Howard, Millett, Nicholas E., Aronis, John M., Ruiz, Victor M., Shi, Lingyun, Ye, Ye, Wagner, Michael M., Cooper, Gregory F., Tsui, Fuchiang, Ferraro, Jeffrey P., Haug, Peter J., Gesteland, Per H., Van Bree, Rudy, Nowalk, Andrew J., López Pineda, Arturo, and Ginter, Thomas
- Subjects
- *
INFLUENZA , *EPIDEMICS , *CLINICAL medicine , *MEDICAL care , *BAYESIAN analysis - Abstract
Objectives: This study evaluates the accuracy and transferability of Bayesian case detection systems (BCD) that use clinical notes from the emergency department (ED) to detect influenza cases. Methods: A BCD uses natural language processing (NLP) to infer the presence or absence of clinical findings from ED notes, which are fed into a Bayesian network classifier (BN) to infer patients’ diagnoses. We developed BCDs at the University of Pittsburgh Medical Center (BCDUPMC) and Intermountain Healthcare in Utah (BCDIH). At each site, we manually built a rule-based NLP component and trained a Bayesian network classifier from over 40,000 ED encounters between Jan. 2008 and May 2010 using feature selection, machine learning, and an expert debiasing approach. Transferability of a BCD in this study may be impacted by seven factors: development (source) institution, development parser, application (target) institution, application parser, NLP transfer, BN transfer, and classification task. We employed an ANOVA analysis to study their impacts on BCD performance. Results: Both BCDs discriminated well between influenza and non-influenza on local test cases (AUCs > 0.92). When tested for transferability using the other institution’s cases, BCDUPMC discriminations declined minimally (AUC decreased from 0.95 to 0.94, p<0.01), and BCDIH discriminations declined more (from 0.93 to 0.87, p<0.0001). We attributed the BCDIH decline to the lower recall of the IH parser on UPMC notes. The ANOVA analysis showed five significant factors: development parser, application institution, application parser, BN transfer, and classification task. Conclusion: We demonstrated high influenza case detection performance in two large healthcare systems in two geographically separated regions, providing evidentiary support for the use of automated case detection from routinely collected electronic clinical notes in national influenza surveillance. The transferability could be improved by training the Bayesian network classifier locally and increasing the accuracy of the NLP parser.
- Published
- 2017
- Full Text
- View/download PDF
27. Plagiarism in solutions of programming tasks in distance learning
- Author
-
Krzysztof Barteczko
- Subjects
plagiarism ,code duplicates detection ,parsers ,tokenization ,Abstract Syntax Tree ,Education - Abstract
Source code plagiarism in students' solutions of programming tasks is a serious problem, especially important in distance learning. Naturally, it should be prevented, but publicly available code plagiarism detection tools are not fully adjusted to this purpose. This paper proposes a specific approach to detecting code duplicates. The approach is based on adapting the detection process to the characteristics of programming tasks and comprises freshly developed detection tools, which can be configured and tuned to fit the individual features of a programming task. Particular attention is paid to the possibility of automatic elimination of duplicate code from the set of all solutions. As a minimum, this requires the rejection of false-positive duplicates, even for simple, schematic tasks. A case study of the use of the tools is presented in this context. The discussion is illustrated by applying the proposed tools to duplicate detection in a set of actual, real-life code written in the Java programming language.
- Published
- 2012
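Entry 27 detects duplicated Java submissions through tokenization and abstract syntax trees. The sketch below shows the token-normalization idea in Python using the standard tokenize module: identifiers and literals are collapsed to placeholders so renamed copies still match, and similarity is scored by token 4-gram overlap. The n-gram size and the example snippets are arbitrary choices; this is a generic illustration, not the paper's tool.

```python
# Token-based code-similarity sketch: normalize identifiers/literals so that
# renamed copies of a solution still look alike, then compare token 4-gram
# overlap (Jaccard similarity). The threshold-free scoring and n-gram size
# are arbitrary choices made for the example.
import io
import tokenize

def normalized_tokens(source):
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            out.append("ID")           # collapse identifier (and keyword) names
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
    return out

def similarity(src_a, src_b, n=4):
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    a, b = ngrams(normalized_tokens(src_a)), ngrams(normalized_tokens(src_b))
    return len(a & b) / len(a | b) if a | b else 0.0

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed  = "def suma(items):\n    acc = 0\n    for v in items:\n        acc += v\n    return acc\n"
print(similarity(original, renamed))   # 1.0: a renamed duplicate
```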
28. Large-scale semi-automated migration of legacy C/C++ test code
- Author
-
Mathijs T. W. Schuts, Rodin T. A. Aarssen, Paul M. Tielemans, and Jurgen J. Vinju
- Subjects
Source code generation ,Refactoring ,Program analysis ,Pattern matching ,Parsers ,Software - Abstract
This is an industrial experience report on a large semi-automated migration of legacy test code in C and C++. The particular migration was enabled by automating most of the maintenance steps. Without automation this particular large-scale migration would not have been conducted, due to the risks involved in manual maintenance (risk of introducing errors, risk of unexpected rework, and loss of productivity). We describe and evaluate the method of automation we used on this real-world case. The benefits were that by automating analysis, we could make sure that we understand all the relevant details for the envisioned maintenance, without having to manually read and check our theories. Furthermore, by automating transformations we could reiterate and improve over complex and large-scale source code updates, until they were “just right.” The drawbacks were that, first, we have had to learn new metaprogramming skills. Second, our automation scripts are not readily reusable for other contexts; they were necessarily developed for this ad-hoc maintenance task. Our analysis shows that automated software maintenance as compared to the (hypothetical) manual alternative method seems to be better both in terms of avoiding mistakes and avoiding rework because of such mistakes. It seems that necessary and beneficial source code maintenance need not be avoided, if software engineers are enabled to create bespoke (and ad-hoc) analysis and transformation tools to support it.
- Published
- 2022
29. It’s Harder to Break a Relationship When you Commit Long.
- Author
-
Arai, Manabu and Nakamura, Chie
- Subjects
- *
INTERPERSONAL relations , *COMMITMENT (Psychology) , *SELF-organizing systems , *SOCIAL sciences , *LEXICAL access , *LINGUISTICS - Abstract
Past research has produced evidence that parsing commitments strengthen over the processing of additional linguistic elements that are consistent with the commitments and undoing strong commitments takes more time than undoing weak commitments. It remains unclear, however, whether this so-called digging-in effect is exclusively due to the length of an ambiguous region or at least partly to the extra cost of processing these additional phrases. The current study addressed this issue by testing Japanese relative clause structure, where lexical content and sentence meaning were controlled for. The results showed evidence for a digging-in effect reflecting the strengthened commitment to an incorrect analysis caused by the processing of additional adjuncts. Our study provides strong support for the dynamical, self-organizing models of sentence processing but poses a problem for other models including serial two-stage models as well as frequency-based probabilistic models such as the surprisal theory. [ABSTRACT FROM AUTHOR] more...
- Published
- 2016
- Full Text
- View/download PDF
30. Parallel Hardware Stochastic Context-Free Parsers.
- Author
-
Pavlatos, Christos, Dimopoulos, Alexandros C., and Papakonstantinou, George
- Subjects
- *
STOCHASTIC processes , *COMPUTER hardware description languages , *FIELD programmable gate arrays , *PARALLEL algorithms , *COMPUTER science , *PROGRAMMING languages - Abstract
In this paper a platform is presented that, given a stochastic context-free grammar (SCFG), automatically outputs the description of the parser in synthesizable hardware description language (HDL), which can be downloaded to a Field Programmable Gate Array (FPGA) board. Initially, according to our methodology, the SCFG is augmented with attributes which store the probability values and can be evaluated through corresponding stack actions. The architecture of the produced system is based on a proposed extension of Earley's parallel algorithm, which, given an input string, generates the parse trees in the form of an AND-OR parse tree. This AND-OR parse tree is then traversed using a proposed tree traversal technique in order to execute the corresponding actions in the correct order, so as to compute the necessary probabilities. The platform is suitable for embedded systems applications where a natural language interface is required or in pattern recognition tasks. The parser generated by the presented platform has been tested for various SCFGs and compared to software approaches. The performance comparison is one to two orders of magnitude in favor of the presented hardware, compared to previous software approaches, depending on the application, the input string length and the number of produced trees.
- Published
- 2016
- Full Text
- View/download PDF
31. Analyzing SystemC Designs: SystemC Analysis Approaches for Varying Applications.
- Author
-
Stoppe, Jannis and Drechsler, Rolf
- Subjects
MOORE'S law ,EMBEDDED computer systems ,CYBER physical systems ,COMPUTER systems ,INTERNET of things - Abstract
The complexity of hardware designs is still increasing according to Moore's law. With embedded systems being more and more intertwined and working together not only with each other, but also with their environments as cyber physical systems (CPSs), more streamlined development workflows are employed to handle the increasing complexity during a system's design phase. SystemC is a C++ library for the design of hardware/software systems, enabling the designer to quickly prototype, e.g., a distributed CPS without having to decide about particular implementation details (such as whether to implement a feature in hardware or in software) early in the design process. Thereby, this approach reduces the initial implementation's complexity by offering an abstract layer with which to build a working prototype. However, as SystemC is based on C++, analyzing designs becomes a difficult task due to the complex language features that are available to the designer. Several fundamentally different approaches for analyzing SystemC designs have been suggested. This work illustrates several different SystemC analysis approaches, including their specific advantages and shortcomings, allowing designers to pick the right tools to assist them with a specific problem during the design of a system using SystemC. [ABSTRACT FROM AUTHOR] more...
- Published
- 2015
- Full Text
- View/download PDF
32. Bilinguals are better than monolinguals in detecting manipulative discourse
- Author
-
Natalia Mitrofanova, Evelina Leivada, and Marit Westergaard
- Subjects
Male ,Social Sciences ,Multilingualism ,Executive Function ,Cognition ,VDP::Humanities: 000::Linguistics: 010 ,Psychology ,Cognitive Linguistics ,Neuroscience of multilingualism ,Language ,Multidisciplinary ,Psycholinguistics ,Communication ,Software Engineering ,Illusions ,Semantics ,Engineering and Technology ,Medicine ,Female ,Cognitive Psychology ,Research Article ,Adult ,Computer and Information Sciences ,Cognitive Neuroscience ,Science ,Neurolinguistics ,Reaction Time ,Humans ,Biology and Life Sciences ,Linguistics ,Parsers ,Cognitive Science ,Neuroscience
One of the most contentious topics in cognitive science concerns the impact of bilingualism on cognitive functions and neural resources. Research on executive functions has shown that bilinguals often perform better than monolinguals in tasks that require monitoring and inhibiting automatic responses. The robustness of this effect is a matter of an ongoing debate, with both sides approaching bilingual cognition mainly through measuring abilities that fall outside the core domain of language processing. However, the mental juggling that bilinguals perform daily involves language. This study takes a novel path to bilingual cognition by comparing the performance of monolinguals and bilinguals in a timed task that features a special category of stimulus, which has the peculiar ability to manipulate the cognitive parser into treating it as well-formed while it is not: grammatical illusions. The results reveal that bilinguals outperform monolinguals in detecting illusions, but they are also slower across the board in judging the stimuli, illusory or not. We capture this trade-off by proposing the Plurilingual Adaptive Trade-off Hypothesis (PATH), according to which the adaptation of bilinguals’ cognitive abilities may (i) decrease fallibility to illusions by means of recruiting sharpened top-down control processes, but (ii) this is part of a larger bundle of effects, not all of which are necessarily advantageous.
- Published
- 2021
33. Modeling the predictive potential of extralinguistic context with script knowledge: The case of fragments
- Author
-
Robin Lemke, Ingo Reich, and Lisa Schäfer
- Subjects
Computer science ,Physiology ,Eggs ,Social Sciences ,Reproductive Physiology ,Psychology ,Event (probability theory) ,Language ,Linguistic context ,Grammar ,Multidisciplinary ,Psycholinguistics ,Approximation Methods ,Software Engineering ,Physical Sciences ,Medicine ,Engineering and Technology ,Utterance ,Natural Language Processing ,Research Article ,Linguistic Morphology ,Computer and Information Sciences ,Science ,Context (language use) ,Humans ,Syntax ,Preprocessing ,Probability ,Cognitive Psychology ,Biology and Life Sciences ,Linguistics ,Models, Theoretical ,Parsers ,Cognitive Science ,Language model ,Artificial intelligence ,Mathematics ,Neuroscience
We describe a novel approach to estimating the predictability of utterances given extralinguistic context in psycholinguistic research. Predictability effects on language production and comprehension are widely attested, but so far predictability has mostly been manipulated through local linguistic context, which is captured with n-gram language models. However, this method does not allow us to investigate predictability effects driven by extralinguistic context. Modeling effects of extralinguistic context is particularly relevant to discourse-initial expressions, which can be predictable even if they lack linguistic context altogether. We propose to use script knowledge as an approximation to extralinguistic context. Since the application of script knowledge involves the generation of predictions about upcoming events, we expect that scripts can be used to manipulate the likelihood of linguistic expressions referring to these events. Previous research has shown that script-based discourse expectations modulate the likelihood of linguistic expressions, but script knowledge has often been operationalized with stimuli which were based on researchers’ intuitions and/or expensive production and norming studies. We propose to quantify the likelihood of an utterance based on the probability of the event to which it refers. This probability is calculated with event language models trained on a script knowledge corpus and modulated with probabilistic event chains extracted from the corpus. We use the DeScript corpus of script knowledge to obtain empirically founded estimates of the likelihood of an event to occur in context without having to resort to expensive pre-tests of the stimuli. We exemplify our method with a case study on the usage of nonsentential expressions (fragments), which shows that utterances that are predictable given script-based extralinguistic context are more likely to be reduced.
- Published
- 2020
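Entry 33 estimates the predictability of an utterance from the probability of the script event it refers to, using event language models. Below is a minimal sketch of that idea, with an invented three-sequence corpus standing in for DeScript and a plain add-one-smoothed bigram model; the paper's models are richer than this.

```python
# Toy "event language model" sketch: an add-one-smoothed bigram model trained
# on sequences of script events, used to score how predictable an event (and
# hence an utterance referring to it) is in context. The corpus is invented.
import math
from collections import Counter

corpus = [
    ["enter_kitchen", "take_pan", "crack_eggs", "fry_eggs", "eat"],
    ["enter_kitchen", "take_pan", "crack_eggs", "scramble_eggs", "eat"],
    ["enter_kitchen", "take_bowl", "crack_eggs", "whisk_eggs", "fry_eggs", "eat"],
]

vocab = {event for seq in corpus for event in seq} | {"<s>"}
bigrams, unigrams = Counter(), Counter()
for seq in corpus:
    padded = ["<s>"] + seq
    unigrams.update(padded[:-1])                 # context counts
    bigrams.update(zip(padded[:-1], padded[1:])) # (context, event) counts

def surprisal(prev, event):
    # Add-one smoothed bigram probability, reported in bits.
    p = (bigrams[(prev, event)] + 1) / (unigrams[prev] + len(vocab))
    return -math.log2(p)

print(surprisal("crack_eggs", "fry_eggs"))       # lower: an expected next event
print(surprisal("crack_eggs", "enter_kitchen"))  # higher: unexpected given the script
```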
34. Resolving Structural Ambiguity in Language Processing: A Systematic Review
- Author
-
Darzhinova, L.
- Abstract
This paper addresses the following research questions: (1) What are the main ideas presented in the published articles (2005–2020) on structural ambiguity resolution in language processing? (2) What are the main venues for unveiling research on structural ambiguity resolution in language processing? To that end, a systematic review is performed, which reports on the eight most relevant studies. It is found that the investigations into the topic of interest are conducted across multidisciplinary areas and primarily in European institutions and the US. This research is circulated in journals which are peer-reviewed and indexed by Scopus, Web of Science, and other databases. The other major finding is that psychophysical tests are more popular in the field, and reasons for that are explained. The polarity of results on syntactic disambiguation leaves room for much to be discovered.
- Published
- 2020
35. Self-Checking Spreadsheets: Recognition of Semantics.
- Author
-
Stewart, M.E.M.
- Subjects
ERROR-correcting codes ,SEMANTICS ,MATHEMATICAL variables ,SCHEME programming language ,MATHEMATICAL formulas ,INTERNET - Abstract
This paper demonstrates a self-checking (self-validating) spreadsheet. This checking analyzes the meaning or semantics of the spreadsheet's variables and equations using a parsing scheme. These semantics go beyond dimension, unit, and type checking to include the physical and mathematical formulae that dominate science, engineering, and mathematics. The spreadsheet is a client JavaScript web application working with a server application. Entries in the spreadsheet are analyzed by semantic parsers on the server, and the representation and recognition of spreadsheet semantics are detailed. The intent of this prototype is to reduce the errors—errors in meaning—that commonly occur when spreadsheets are used. A prototype has been available on the Internet since early 2010 at http://semantics.grc.nasa.gov/cgi-bin/spread.cgi.
- Published
- 2013
- Full Text
- View/download PDF
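
The spreadsheet checker above validates full physical and mathematical formulae; the Python sketch below illustrates only the simpler, related idea of catching errors in meaning by tracking the dimensions of cell values. The Quantity class and the example cells are assumptions made for illustration, not the system's parsing scheme.

class Quantity:
    """A value with physical dimensions, e.g. Quantity(9.8, m=1, s=-2)."""
    def __init__(self, value, **dims):
        self.value = value
        self.dims = {k: v for k, v in dims.items() if v != 0}

    def __add__(self, other):
        # Addition is only meaningful for identical dimensions.
        if self.dims != other.dims:
            raise ValueError(f"dimension mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, **self.dims)

    def __mul__(self, other):
        # Multiplication adds the exponents of each dimension.
        merged = dict(self.dims)
        for k, v in other.dims.items():
            merged[k] = merged.get(k, 0) + v
        return Quantity(self.value * other.value, **merged)

distance = Quantity(100.0, m=1)            # cell A1: 100 m
time = Quantity(9.6, s=1)                  # cell A2: 9.6 s
speed = distance * Quantity(1.0 / time.value, s=-1)
print(speed.value, speed.dims)             # {'m': 1, 's': -1}
print((distance + distance).value)         # fine: same dimensions
# distance + time                          # would raise: dimension mismatch
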
36. The value of parsing as feature generation for gene mention recognition.
- Author
-
Smith, Larry H. and Wilbur, W. John
- Abstract
We measured the extent to which information surrounding a base noun phrase reflects the presence of a gene name, and evaluated seven different parsers in their ability to provide information for that purpose. Using the GENETAG corpus as a gold standard, we performed machine learning to recognize from its context when a base noun phrase contained a gene name. Starting with the best lexical features, we assessed the gain of adding dependency or dependency-like relations from a full sentence parse. Features derived from parsers improved performance in this partial gene mention recognition task by a small but statistically significant amount. There were virtually no differences between parsers in these experiments.
- Published
- 2009
- Full Text
- View/download PDF
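
As a rough illustration of the feature-generation setup above, the Python sketch below builds a feature dictionary for a base noun phrase from neighbouring words plus dependency relations supplied by a parser. The token list, dependency triples, and feature names are invented; no claim is made that they match the paper's feature set.

def np_features(tokens, np_span, deps=None):
    """Feature dict for the noun phrase tokens[np_span[0]:np_span[1]]."""
    start, end = np_span
    feats = {f"np_word={w.lower()}": 1 for w in tokens[start:end]}
    left = f"left={tokens[start - 1].lower()}" if start > 0 else "left=<BOS>"
    right = f"right={tokens[end].lower()}" if end < len(tokens) else "right=<EOS>"
    feats[left] = feats[right] = 1
    # deps: (head_index, relation, dependent_index) triples from a full parse.
    for head, rel, dep in deps or []:
        if start <= dep < end and not (start <= head < end):
            feats[f"dep_in={rel}:{tokens[head].lower()}"] = 1
    return feats

tokens = ["The", "BRCA1", "gene", "is", "mutated", "in", "tumours"]
deps = [(4, "nsubj", 2), (2, "det", 0), (2, "compound", 1)]
print(np_features(tokens, (0, 3), deps))
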
37. Toward an Engineering Discipline for Grammarware.
- Author
-
Klint, Paul, Lämmel, Ralf, and Verhoef, Chris
- Subjects
SOFTWARE engineering ,GRAMMAR ,COMPUTER software ,COMPUTER systems ,XML (Extensible Markup Language) - Abstract
Grammarware comprises grammars and all grammar-dependent software. The term grammar is meant here in the sense of all established grammar formalisms and grammar notations including context-free grammars, class dictionaries, and XML schemas as well as some forms of tree and graph grammars. The term grammar-dependent software refers to all software that involves grammar knowledge in an essential manner. Archetypal examples of grammar-dependent software are parsers, program converters, and XML document processors. Despite the pervasive role of grammars in software systems, the engineering aspects of grammarware are insufficiently understood. We lay out an agenda that is meant to promote research on increasing the productivity of grammarware development and on improving the quality of grammarware. To this end, we identify the problems with the current grammarware practices, the barriers that currently hamper research, and the promises of an engineering discipline for grammarware, its principles, and the research challenges that have to be addressed. [ABSTRACT FROM AUTHOR]
- Published
- 2005
- Full Text
- View/download PDF
38. Relationships among commercial practices and author conflicts of interest in biomedical publishing
- Author
-
Zoltan P. Majdik, Dave Clark, S. Scott Graham, Molly M. Kessler, and Tristin Brynn Hooker
- Subjects
Biomedical Research ,Medical Journals ,Economics ,Social Sciences ,030204 cardiovascular system & hematology ,Advertising revenue ,0302 clinical medicine ,Cognition ,Sociology ,Advertising ,Medicine and Health Sciences ,Psychology ,030212 general & internal medicine ,Publication ,Marketing ,Multidisciplinary ,Software Engineering ,Publishing ,Physical Sciences ,Engineering and Technology ,Medicine ,Editorial Policies ,Research Article ,Computer and Information Sciences ,Permutation ,Reprint ,Science ,Decision Making ,MEDLINE ,Research and Analysis Methods ,03 medical and health sciences ,Humans ,Scientific Publishing ,Estimation ,business.industry ,Conflict of Interest ,Discrete Mathematics ,Ownership ,Conflict of interest ,Cognitive Psychology ,Biology and Life Sciences ,Publication bias ,Parsers ,Communications ,Combinatorics ,Cognitive Science ,business ,Publication Bias ,Medical Humanities ,Mathematics ,Finance ,Neuroscience - Abstract
Recently, concerns have been raised over the potential impacts of commercial relationships on editorial practices in biomedical publishing. Specifically, it has been suggested that certain commercial relationships may make editors more open to publishing articles with author conflicts of interest (aCOI). Using a data set of 128,781 articles published in 159 journals, we evaluated the relationships among commercial publishing practices and reported author conflicts of interest. The 159 journals were grouped according to commercial biases (reprint services, advertising revenue, and ownership by a large commercial publishing firm). 30.6% (39,440) of articles were published in journals showing no evidence of the evaluated commercial publishing relationships. 33.9% (43,630) were published in journals accepting advertising and reprint fees; 31.7% (40,887) in journals owned by large publishing firms; 1.2% (1,589) in journals accepting reprint fees only; and 2.5% (3,235) in journals accepting only advertising fees. Journals with commercial relationships were more likely to publish articles with aCOI (9.2% (92/1000) vs. 6.4% (64/1000), p = 0.024). In the multivariate analysis, only a journal's acceptance of reprint fees served as a significant predictor (OR = 2.81, 95% CI 1.5 to 8.6). Shared control estimation was used to evaluate the relationships between commercial publishing practices and aCOI frequency in total and by type. BCa-corrected mean difference effect sizes ranged from -1.0 to 6.1 and confirm that accepting reprint fees may constitute the most significant commercial bias. The findings indicate that concerns over the influence of industry advertising in medical journals may be overstated, and that accepting fees for reprints may constitute the largest risk of bias for editorial decision-making.
- Published
- 2020
39. Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification
- Author
-
Uxoa Iñurrieta, Arantza Díaz de Ilarraza, Kepa Sarasola, Itziar Aduriz, and Gorka Labaka
- Subjects
Vocabulary ,Lexicography ,Computer science ,Social Sciences ,02 engineering and technology ,computer.software_genre ,Machine Learning ,0302 clinical medicine ,Morphology (Grammar) ,0202 electrical engineering, electronic engineering, information engineering ,media_common ,Grammar ,Multidisciplinary ,Parsing ,Software Engineering ,Semantics ,Semàntica ,Phraseology ,Engineering and Technology ,Medicine ,020201 artificial intelligence & image processing ,Information Technology ,Natural language processing ,Research Article ,Linguistic Morphology ,Computer and Information Sciences ,media_common.quotation_subject ,Science ,Verb ,Multiword expression ,03 medical and health sciences ,Artificial Intelligence ,Noun ,Aprenentatge automàtic ,Machine learning ,Syntax ,Lexicons ,Natural Language Processing ,business.industry ,Linguistics ,Parsers ,Morfologia (Gramàtica) ,030221 ophthalmology & optometry ,Artificial intelligence ,business ,computer - Abstract
Multiword Expressions (MWEs) are idiosyncratic combinations of words which pose important challenges to Natural Language Processing. Some kinds of MWEs, such as verbal ones, are particularly hard to identify in corpora, due to their high degree of morphosyntactic flexibility. This paper describes a linguistically motivated method to gather detailed information about verb+noun MWEs (VNMWEs) from corpora. Although the main focus of this study is Spanish, the method is easily adaptable to other languages. Monolingual and parallel corpora are used as input, and data about the morphosyntactic variability of VNMWEs are extracted. This information is then tested in an identification task, obtaining an F score of 0.52, which is considerably higher than in related work. This work was funded by the Basque Government, which qualified the IXA research group (of which the authors of this article are members) as an A type research group (IT1343-19). It is also part of the project entitled "MODENA: advanced neural modeling for high-quality translation" (KK-2018/00087).
- Published
- 2020
40. A self-describing data transfer model for ITS applications.
- Author
-
Dailey, D.J., Maclean, S., Cathey, F.W., and Meyers, D.
- Abstract
The wide variety of remote sensors used in Intelligent Transportation Systems (ITS) applications (loops, probe vehicles, radar, cameras, etc.) has created a need for general methods by which data can be shared among agencies and users who own disparate computer systems. In this paper, we present a methodology that demonstrates that it is possible to create, encode, and decode a self-describing data stream using: 1) existing data description language standards; 2) parsers to enforce language compliance; 3) a simple content language that flows out of the data description language; and 4) architecture-neutral encoders and decoders based on ASN.1. [ABSTRACT FROM PUBLISHER]
- Published
- 2002
- Full Text
- View/download PDF
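
The Python sketch below illustrates the self-describing idea from the record above: the description of the record layout travels with the data, so a receiver can decode it without any out-of-band agreement. JSON and the struct module stand in for the paper's description language and ASN.1 encoders; the schema and the example sensor record are invented.

import json
import struct

# The self-description: field names plus struct format codes.
schema = [("sensor_id", "H"), ("timestamp", "I"), ("speed_kmh", "f")]

def encode(record):
    header = json.dumps(schema).encode()           # description travels with the data
    fmt = "<" + "".join(code for _, code in schema)
    payload = struct.pack(fmt, *(record[name] for name, _ in schema))
    return struct.pack("<I", len(header)) + header + payload

def decode(blob):
    (hlen,) = struct.unpack_from("<I", blob, 0)
    fields = json.loads(blob[4:4 + hlen])          # recover the description
    fmt = "<" + "".join(code for _, code in fields)
    values = struct.unpack_from(fmt, blob, 4 + hlen)
    return {name: value for (name, _), value in zip(fields, values)}

msg = encode({"sensor_id": 17, "timestamp": 1030000000, "speed_kmh": 57.5})
print(decode(msg))   # decoded using only the embedded description
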
41. Training Agents to Recognize Text by Example.
- Author
-
Lieberman, Henry, Nardi, Bonnie, and Wright, David
- Abstract
An important function of an agent is to be “on the lookout” for bits of information that are interesting to its user, even if these items appear in the midst of a larger body of unstructured information. But how do we tell these agents which patterns are meaningful and what to do with the result? Especially when agents are used to recognize text, they are usually driven by parsers which require input in the form of textual grammar rules. Editing grammars is difficult and error-prone for end users. Grammex [“Grammars by Example”] is the first direct manipulation interface designed to allow non-expert users to define grammars interactively. The user presents concrete examples of text that he or she would like the agent to recognize. Rules are constructed by an iterative process, in which Grammex heuristically parses the example, displays a set of hypotheses, and the user critiques the system's suggestions. Actions to take upon recognition are also demonstrated by example. [ABSTRACT FROM AUTHOR]
- Published
- 2001
- Full Text
- View/download PDF
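
Grammex is an interactive, direct-manipulation system; the Python sketch below shows only its starting point in non-interactive form: generalising one concrete text example into a reusable pattern. The character classes chosen for generalisation are an assumption, not Grammex's actual heuristics.

import re

def generalise(example):
    """Turn a concrete example like 'MIT-2024' into a pattern of the same shape."""
    pattern = []
    for m in re.finditer(r"[A-Za-z]+|\d+|.", example):
        chunk = m.group()
        if chunk.isalpha():
            pattern.append(r"[A-Za-z]+")      # any run of letters
        elif chunk.isdigit():
            pattern.append(r"\d+")            # any run of digits
        else:
            pattern.append(re.escape(chunk))  # keep punctuation literally
    return re.compile("^" + "".join(pattern) + "$")

rule = generalise("MIT-2024")
print(bool(rule.match("UCL-1999")))   # True: same shape as the example
print(bool(rule.match("hello")))      # False
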
42. PlateEditor: A web-based application for the management of multi-well plate layouts and associated data
- Author
-
Vincent Delorme, Virginia Carla de Almeida Falcão, Connor Wood, and Minjeong Woo
- Subjects
Computer and Information Sciences ,Engineering drawing ,Computer science ,Science ,Context (language use) ,JavaScript ,01 natural sciences ,Computer Applications ,Computer Architecture ,Computer Software ,Automation ,03 medical and health sciences ,Disk formatting ,Software ,Data visualization ,Industrial Engineering ,Web application ,Data Management ,030304 developmental biology ,computer.programming_language ,Internet ,0303 health sciences ,Commercial software ,Multidisciplinary ,Computers ,business.industry ,Data Visualization ,Software Engineering ,Control Engineering ,Microarray Analysis ,Computer Hardware ,Parsers ,Source Code ,0104 chemical sciences ,Visualization ,010404 medicinal & biomolecular chemistry ,Web-Based Applications ,Medicine ,Engineering and Technology ,business ,computer ,Research Article - Abstract
Multi-well plates are convenient tools to work with in biology experiments, as they allow the probing of multiple conditions in a compact and economical way. Although both free and commercial software exist for the definition of plate layouts and the management of plate data, we were looking for a more flexible solution, available anywhere, free from download, installation, and licensing constraints. In this context, we created PlateEditor, a free web-based, client-side application allowing rapid creation of even complex layouts, including dose-response curves and multiple combination experiments, for any plate format up to 1536 wells. PlateEditor also provides heatmap visualization and aggregation features to speed up the process of data analysis and formatting for export to other applications. Written in pure JavaScript, it is fully open source, can be integrated into various workflows, and has the potential to be extended with more functionalities in the future.
- Published
- 2021
- Full Text
- View/download PDF
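
A minimal Python sketch of the two core notions behind a tool like PlateEditor: a plate layout mapping wells to experimental conditions, and per-condition aggregation of readings (the basis of heatmap-style summaries). The 96-well format, condition names, and fake readings are illustrative only.

from statistics import mean

ROWS, COLS = "ABCDEFGH", range(1, 13)               # a 96-well plate
layout = {f"{row}{col}": ("control" if col <= 6 else "drug_1uM")
          for row in ROWS for col in COLS}          # condition assigned per well

# Fake plate-reader signal: controls high, treated wells lower.
readings = {well: (1.0 if cond == "control" else 0.4) + 0.01 * i
            for i, (well, cond) in enumerate(layout.items())}

def aggregate(layout, readings):
    """Mean reading per condition, as a heatmap/summary view would show."""
    per_condition = {}
    for well, condition in layout.items():
        per_condition.setdefault(condition, []).append(readings[well])
    return {cond: round(mean(values), 3) for cond, values in per_condition.items()}

print(aggregate(layout, readings))
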
43. Principled procedural parsing
- Author
-
UCL - SST/ICTM/INGI - Pôle en ingénierie informatique, UCL - Ecole Polytechnique de Louvain, Mens, Kim, Pecheur, Charles, Van Roy, Peter, Bagge, Anya Helene, van der Storms, Tijs, and Laurent, Nicolas
- Abstract
Parsing is the process of analysing an input string in order to extract a structured representation of its content (a syntax tree) with respect to a specific language. In this thesis, we focus on parsing formal languages, such as programming or markup languages, as opposed to natural spoken languages. Unlike natural languages, formal languages are never ambiguous: there is only a single correct interpretation of the input. Parsing is a pervasive activity: every time a source file must be turned into executable code, a parser is required. Similarly, parsers are used to convert input files into relevant data structures. It is fair to say that most programs include a parser, sometimes many. As such, making parsers easier to write, use, and modify is a broadly beneficial endeavour. This thesis is concerned with the limitations of currently available parsing systems, and how to overcome them. In particular, we show how to develop a simple yet expressive notation that can be easily extended, and build upon this basis to add context-sensitive parsing to the parsing system, robust support for infix expressions (avoiding expressiveness and performance issues relating to associativity selection), as well as many more expressiveness and usability features, such as permissive parsing and debugging tools. (FSA - Sciences de l'ingénieur) -- UCL, 2019
- Published
- 2019
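
One pain point the thesis above singles out is infix expressions and associativity selection. The Python sketch below shows the standard precedence-climbing technique for honouring a declarative operator table; the operator set is an assumption and the code is unrelated to the thesis' own notation.

import re

# operator -> (precedence, associativity)
OPS = {"+": (1, "left"), "-": (1, "left"), "*": (2, "left"), "^": (3, "right")}

def tokenize(src):
    return re.findall(r"\d+|[-+*^()]", src)

def parse(tokens, min_prec=1):
    tok = tokens.pop(0)
    left = parse(tokens, 1) if tok == "(" else int(tok)
    if tok == "(":
        tokens.pop(0)                          # consume the closing ')'
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = OPS[op]
        right = parse(tokens, prec + 1 if assoc == "left" else prec)
        left = (op, left, right)               # build a syntax-tree node
    return left

print(parse(tokenize("1-2-3")))   # ('-', ('-', 1, 2), 3): left-associative
print(parse(tokenize("2^3^2")))   # ('^', 2, ('^', 3, 2)): right-associative
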
44. Programming and Interpretation of Numerical Analysis Methods
- Author
-
Hüseyin Pehlivan
- Subjects
parsers, formal grammars, interpreters, programming languages, numerical methods, Engineering, lcsh:T, lcsh:Technology, lcsh:TA1-2040, lcsh:Engineering (General). Civil engineering (General), lcsh:Q, lcsh:Science, lcsh:Science (General), lcsh:Q1-390
Numerical calculation, or, in other words, numerical analysis, is an important branch of applied mathematics. Numerical methods are used where mathematics cannot produce an analytical solution or the generated solution has high computational complexity. In this article, a programming language that allows numerical methods to be programmed is designed, and an interpreter that can evaluate source code written in this language is developed. The language syntax is represented by an LL(k) grammar defined in the BNF (Backus Naur Form) notation. The interpreter consists of several basic components, such as a parser, a semantic controller, a symbolic derivator and a code evaluator, each of which performs a different kind of code interpretation. The LL(k) parser component used for the syntactic analysis of the language is generated via JavaCC, an automatic code generation tool. The other components work on the abstract syntax tree that this parser generates. For the use of the language, the programming and interpretation of several numerical root-finding methods are demonstrated. A comparison is made with some popular languages based on language metrics, and the running time is evaluated.
- Published
- 2019
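
Independent of the interpreter itself, the kind of program the record above showcases is a numerical root finder. The Python sketch below shows Newton's method with a central-difference derivative; the target function, starting point, and tolerances are illustrative choices, not taken from the paper.

def newton(f, x0, tol=1e-10, max_iter=50, h=1e-7):
    """Find x with f(x) close to 0, using a numerical derivative."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        dfx = (f(x + h) - f(x - h)) / (2 * h)   # central-difference derivative
        x -= fx / dfx                            # Newton update
    raise RuntimeError("did not converge")

# Root of x^2 - 2 near 1.0, i.e. an approximation of sqrt(2):
print(newton(lambda x: x * x - 2.0, 1.0))
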
45. CavBench: a benchmark for protein cavity detection methods
- Author
-
Abel J. P. Gomes, Joaquim Jorge, Ana Mafalda Martins, Francisco Fernandes, Alfredo Ferreira, Tiago M. C. Simões, Sérgio Dias, and uBibliorum
- Subjects
Models, Molecular ,Protein Conformation ,Computer science ,False Negative Result ,Biochemistry ,Database and Informatics Methods ,Protein structure ,Software Design ,Drug Discovery ,Macromolecular Structure Analysis ,Medicine and Health Sciences ,False positive paradox ,Protein cavity detection ,Statistical Data ,0303 health sciences ,Multidisciplinary ,Protein pocket ,Drug discovery ,Statistics ,030302 biochemistry & molecular biology ,Software Engineering ,Physical Sciences ,Benchmark (computing) ,Engineering and Technology ,Medicine ,Algorithms ,Research Article ,Computer and Information Sciences ,Protein Structure ,Drug Research and Development ,Science ,Research and Analysis Methods ,Sensitivity and Specificity ,PDBsum ,Protein–protein interaction ,03 medical and health sciences ,Binding site ,Protein Interactions ,Molecular Biology ,030304 developmental biology ,Pharmacology ,business.industry ,Biology and Life Sciences ,Proteins ,Reproducibility of Results ,Pattern recognition ,Parsers ,Statistical classification ,Docking (molecular) ,Drug Design ,Artificial intelligence ,business ,Mathematics ,Software - Abstract
Extensive research has been applied to discover new techniques and methods to model protein-ligand interactions. In particular, considerable effort has focused on identifying candidate binding sites, which quite often are active sites that correspond to protein pockets or cavities. Thus, these cavities play an important role in molecular docking. However, there is no established benchmark to assess the accuracy of new cavity detection methods. In practice, each new technique is evaluated using a small set of proteins with known binding sites as ground truth. However, studies supported by large datasets of known cavities and/or binding sites and statistical classification (i.e., false positives, false negatives, true positives, and true negatives) would yield much stronger and more reliable assessments. To this end, we propose CavBench, a generic and extensible benchmark to compare different cavity detection methods relative to diverse ground truth datasets (e.g., PDBsum) using statistical classification methods.
- Published
- 2019
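
A small Python sketch of the statistical-classification scoring a benchmark like CavBench relies on: predicted cavities are matched to ground-truth cavities and counted as true positives, false positives, and false negatives. The centre-distance matching rule, cutoff, and coordinates below are assumptions, not CavBench's actual protocol.

import math

def evaluate(predicted, ground_truth, cutoff=4.0):
    """Greedy one-to-one matching of predicted to ground-truth cavity centres."""
    matched, tp = set(), 0
    for centre in predicted:
        hit = next((i for i, g in enumerate(ground_truth)
                    if i not in matched and math.dist(centre, g) <= cutoff), None)
        if hit is not None:
            matched.add(hit)
            tp += 1
    fp = len(predicted) - tp
    fn = len(ground_truth) - tp
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": tp / len(predicted) if predicted else 0.0,
            "recall": tp / len(ground_truth) if ground_truth else 0.0}

truth = [(1.0, 2.0, 3.0), (10.0, 10.0, 10.0)]
preds = [(1.5, 2.2, 2.9), (30.0, 0.0, 0.0)]
print(evaluate(preds, truth))   # 1 TP, 1 FP, 1 FN
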
46. Principled procedural parsing
- Author
-
Laurent, Nicolas, UCL - SST/ICTM/INGI - Pôle en ingénierie informatique, UCL - Ecole Polytechnique de Louvain, Mens, Kim, Pecheur, Charles, Van Roy, Peter, Bagge, Anya Helene, and van der Storms, Tijs more...
- Subjects
Parsing ,Grammar ,TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES ,Backtracking ,Parsers ,Parser - Abstract
Parsing is the process of analysing an input string in order to extract a structured representation of its content (a syntax tree) with respect to a specific language. In this thesis, we focus on parsing formal languages, such as programming or markup languages, as opposed to natural spoken languages. Unlike natural languages, formal languages are never ambiguous: there is only a single correct interpretation of the input. Parsing is a pervasive activity: every time a source file must be turned into executable code, a parser is required. Similarly, parsers are used to convert input files into relevant data structures. It is fair to say that most programs include a parser, sometimes many. As such, making parsers easier to write, use, and modify is a broadly beneficial endeavour. This thesis is concerned with the limitations of currently available parsing systems, and how to overcome them. In particular, we show how to develop a simple yet expressive notation that can be easily extended, and build upon this basis to add context-sensitive parsing to the parsing system, robust support for infix expressions (avoiding expressiveness and performance issues relating to associativity selection), as well as many more expressiveness and usability features, such as permissive parsing and debugging tools. (FSA - Sciences de l'ingénieur) -- UCL, 2019
- Published
- 2019
47. Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool
- Author
-
Arantza Díaz de Ilarraza, Kepa Bengoetxea, Aitziber Atutxa, and Mikel Iruskieta
- Subjects
Employment ,Computer and Information Sciences ,Linguistic Morphology ,Word embedding ,Neural Networks ,Economics ,Computer science ,Text simplification ,Science ,media_common.quotation_subject ,Discourse analysis ,Social Sciences ,computer.software_genre ,Semantics ,Automation ,Word Embedding ,Rhetorical question ,Question answering ,Syntax ,Natural Language Processing ,media_common ,Grammar ,Multidisciplinary ,Parsing ,business.industry ,Text segmentation ,Sentiment analysis ,Software Engineering ,Biology and Life Sciences ,Linguistics ,Brazilian Portuguese ,Parsers ,Automatic summarization ,Labor Economics ,Medicine ,Engineering and Technology ,Artificial intelligence ,Information Technology ,business ,computer ,Natural language processing ,Research Article ,Neuroscience - Abstract
Lately, discourse structure has received considerable attention due to the benefits its application offers in several NLP tasks such as opinion mining, summarization, question answering, and text simplification, among others. When automatically analyzing texts, discourse parsers typically perform two different tasks: i) identification of basic discourse units (text segmentation); ii) linking discourse units by means of discourse relations, building structures such as trees or graphs. The resulting discourse structures are, in general terms, accurate at intra-sentence discourse-level relations; however, they fail to capture the correct inter-sentence relations. Detecting the main discourse unit (the Central Unit) is helpful for discourse analyzers (and also for manual annotation) in improving their results in rhetorical labeling. Bearing this in mind, we set out to build the first two steps of a discourse parser following a top-down strategy: i) to find discourse units, ii) to detect the Central Unit. The final step, i.e. assigning rhetorical relations, remains to be worked on in the immediate future. In accordance with this strategy, our paper presents a tool consisting of a discourse segmenter and an automatic Central Unit detector. This study was carried out within the framework of the following projects: IXA Group: natural language processing IT1343-19 (Basque Government), DL4NLP KK-2019/00045 (Basque Government), PROSA-MED TIN2016-77820-C3-1-R (MINECO) and DeepReading: RTI2018-096846-B-C21 (MCIU/AEI/FEDER, UE).
- Published
- 2019
48. Incremental Parsing of Common Lisp Code
- Author
-
Durand, Irène, Strandh, Robert, Dave Cooper, Laboratoire Bordelais de Recherche en Informatique (LaBRI), and Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)
- Subjects
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing ,Development frameworks and environments ,[INFO.INFO-PL]Computer Science [cs]/Programming Languages [cs.PL] ,[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL] ,Functional languages ,[INFO.INFO-TT] Computer Science [cs]/Document and Text Processing ,Common Lisp ,Integrated and visual development environments ,Multiparadigm languages ,Parsers ,[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] ,[INFO.INFO-PL] Computer Science [cs]/Programming Languages [cs.PL] - Abstract
In a text editor for writing Common Lisp source code, it is desirable to have an accurate analysis of the buffer contents, so that the role of the elements of the code can be indicated to the programmer. Furthermore, the buffer contents should preferably be analyzed after each keystroke so that the programmer has up-to-date information resulting from the analysis. We describe an incremental parser that can be used as a key component of such an analyzer. The parser, itself written in Common Lisp, uses a special-purpose implementation of the Common Lisp read function in combination with a cache that stores existing results of calling the reader. Since the parser uses the standard Common Lisp reader, the resulting analysis is very accurate. Furthermore, the cache makes the parser very fast in most common cases; re-parsing a buffer in which a single character has been altered takes only a few milliseconds.
- Published
- 2018
- Full Text
- View/download PDF
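
The parser above caches reader results so that re-parsing after a keystroke costs only a few milliseconds. The Python sketch below caricatures the idea with a cache keyed by line text; the expensive_parse stand-in and the line-level granularity are simplifications of the described reader-level cache.

class IncrementalParser:
    def __init__(self):
        self.cache = {}                      # line text -> cached parse result

    def expensive_parse(self, line):
        print(f"  parsing: {line!r}")        # printed so cache hits are visible
        return line.split()                  # stand-in for a real parse

    def parse_buffer(self, buffer_text):
        results = []
        for line in buffer_text.splitlines():
            if line not in self.cache:
                self.cache[line] = self.expensive_parse(line)
            results.append(self.cache[line])
        return results

p = IncrementalParser()
p.parse_buffer("(defun f (x)\n  (+ x 1))")   # both lines parsed
p.parse_buffer("(defun f (x)\n  (+ x 2))")   # only the edited line re-parsed
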
49. Dependency-based Siamese long short-term memory network for learning sentence representations
- Author
-
Baogang Wei, Jianyue Ni, Tengjun Yao, Wenhao Zhu, and Zhiguo Lu
- Subjects
Computer science ,Social Sciences ,lcsh:Medicine ,02 engineering and technology ,computer.software_genre ,Machine Learning ,Learning and Memory ,Cognition ,Dependency grammar ,0202 electrical engineering, electronic engineering, information engineering ,Psychology ,lcsh:Science ,Neurolinguistics ,Grammar ,Multidisciplinary ,Artificial neural network ,Software Engineering ,Predicate (grammar) ,Semantics ,Information extraction ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Engineering and Technology ,020201 artificial intelligence & image processing ,Information Technology ,Natural language processing ,Sentence ,Research Article ,Computer and Information Sciences ,Neural Networks ,Object (grammar) ,Memory ,020204 information systems ,Learning ,Syntax ,Natural Language Processing ,business.industry ,lcsh:R ,Cognitive Psychology ,Biology and Life Sciences ,Linguistics ,Models, Theoretical ,Parsers ,Sentence Processing ,Bag-of-words model ,Cognitive Science ,lcsh:Q ,Neural Networks, Computer ,Artificial intelligence ,business ,computer ,Neuroscience - Abstract
Textual representations play an important role in the field of natural language processing (NLP). The efficiency of NLP tasks, such as text comprehension and information extraction, can be significantly improved with proper textual representations. As neural networks are gradually applied to learn the representation of words and phrases, fairly efficient models for learning short text representations have been developed, such as the continuous bag of words (CBOW) and skip-gram models, and they have been extensively employed in a variety of NLP tasks. Because of the more complex structure of longer texts, such as sentences, algorithms appropriate for learning short textual representations are not applicable to learning long textual representations. One method of learning long textual representations is the Long Short-Term Memory (LSTM) network, which is suitable for processing sequences. However, the standard LSTM does not adequately address the primary sentence structure (subject, predicate and object), which is an important factor for producing appropriate sentence representations. To resolve this issue, this paper proposes the dependency-based LSTM model (D-LSTM). The D-LSTM divides a sentence representation into two parts: a basic component and a supporting component. The D-LSTM uses a pre-trained dependency parser to obtain the primary sentence information and generate the supporting components, and it uses a standard LSTM model to generate the basic sentence components. A weight factor that can adjust the ratio of the basic and supporting components in a sentence is introduced to generate the sentence representation. Compared with the representation learned by the standard LSTM, the sentence representation learned by the D-LSTM contains a greater amount of useful information. The experimental results show that the D-LSTM is superior to the standard LSTM on the Sentences Involving Compositional Knowledge (SICK) data.
- Published
- 2018
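
The D-LSTM described above combines a standard LSTM sentence vector (the basic component) with a vector derived from the dependency-selected core words (the supporting component) through a weight factor. The NumPy sketch below reproduces only that mixing step with random stand-in vectors; the dimensionality and the weight value are assumptions.

import numpy as np

rng = np.random.default_rng(0)
dim = 8

basic_component = rng.normal(size=dim)         # stand-in for the LSTM output
core_word_vectors = rng.normal(size=(3, dim))  # stand-ins for subject/predicate/object
supporting_component = core_word_vectors.mean(axis=0)

def sentence_representation(basic, supporting, weight=0.5):
    """Weighted combination of the basic and supporting components."""
    return basic + weight * supporting

print(sentence_representation(basic_component, supporting_component).shape)   # (8,)
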
50. Extending code editors with LPeg parsers
- Author
-
Fajfar, Rok and Kosar, Tomaž
- Subjects
parsers, programming languages, editors, Lua, udc:004.4'42:004.43(043.2)
The goal of this diploma thesis is to present LPeg parsers as an alternative to regular expressions and to show their use for extending code editors. PEG parsers are a form of top-down parsers, and LPeg is their implementation for the Lua scripting language. They look a lot like context-free grammars with added regular expressions and have several attributes that make them perfect for working with programming language syntax. With the help of the LPeg library, we extended the editor Howl with Elixir support.
- Published
- 2017
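
LPeg builds patterns by composing first-class objects rather than writing regular-expression strings. Since the thesis targets Lua, the Python sketch below only imitates that compositional, PEG-style flavour with tiny combinators (literal, sequence, ordered choice, repetition); it does not reproduce the LPeg API. Each pattern takes (text, pos) and returns the new position, or None on failure.

def lit(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def seq(*patterns):                    # match all patterns, one after another
    def run(text, pos):
        for p in patterns:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return run

def choice(*patterns):                 # PEG ordered choice: first match wins
    def run(text, pos):
        for p in patterns:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return run

def many1(p):                          # one or more repetitions, greedy
    def run(text, pos):
        pos = p(text, pos)
        if pos is None:
            return None
        while True:
            nxt = p(text, pos)
            if nxt is None:
                return pos
            pos = nxt
    return run

digit = choice(*[lit(d) for d in "0123456789"])
letter = choice(*[lit(c) for c in "abcdefghijklmnopqrstuvwxyz"])
assignment = seq(many1(letter), lit(" = "), many1(digit))

print(assignment("answer = 42", 0))    # 11: the whole string was consumed
print(assignment("answer = x", 0))     # None: parse failure
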