15 results on '"Piao, Scott Songlin"'
Search Results
2. Open Welsh Language Resources for a Corpus Annotation Framework
- Author
-
Piao, Scott Songlin, Neale, Steven, Ezeani, Ignatius, Rayson, Paul Edward, Knight, Dawn, and Donnelly, Kevin
- Published
- 2019
3. Profiling Medical Journal Articles Using a Gene Ontology Semantic Tagger
- Author
-
Calzolari, Nicoletta, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Hasida, Koiti, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Moreno, Asuncion, Odijk, Jan, Piperidis, Stelios, Tokunaga, Takenobu, El Haj, Mahmoud, Rayson, Paul Edward, Piao, Scott Songlin, and Knight, Jo
- Abstract
In many areas of academic publishing, there is an explosion of literature, and sub-division of fields into subfields, leading to stove-piping where sub-communities of expertise become disconnected from each other. This is especially true in the genetics literature over the last 10 years where researchers are no longer able to maintain knowledge of previously related areas. This paper extends several approaches based on natural language processing and corpus linguistics which allow us to examine corpora derived from bodies of genetics literature and will help to make comparisons and improve retrieval methods using domain knowledge via an existing gene ontology. We derived two open access medical journal corpora from PubMed related to psychiatric genetics and immune disorder genetics. We created a novel Gene Ontology Semantic Tagger (GOST) and lexicon to annotate the corpora and are then able to compare subsets of literature to understand the relative distributions of genetic terminology, thereby enabling researchers to make improved connections between them.
- Published
- 2018
4. Towards A Welsh Semantic Annotation System
- Author
-
Calzolari, Nicoletta, Choukri, Khalid, Cieri, Christopher, Declerck, Thierry, Goggi, Sara, Hasida, Koiti, Isahara, Hitoshi, Maegaard, Bente, Mariani, Joseph, Mazo, Helene, Moreno, Asuncion, Odijk, Jan, Piperidis, Stelios, Tokunaga, Takenobu, Piao, Scott Songlin, Rayson, Paul Edward, Knight, Dawn, and Watkins, Gareth
- Abstract
Automatic semantic annotation of natural language data is an important task in Natural Language Processing, and a variety of semantic taggers have been developed for this task, particularly for English. However, for many languages, particularly for low-resource languages, such tools are yet to be developed. In this paper, we report on the development of an automatic Welsh semantic annotation tool (named CySemTagger) in the CorCenCC Project, which will facilitate semantic-level analysis of Welsh language data on a large scale. Based on Lancaster’s USAS semantic tagger framework, this tool tags words in Welsh texts with semantic tags from a semantic classification scheme, and is designed to be compatible with multiple Welsh POS taggers and POS tagsets by mapping different tagsets into a core shared POS tagset that is used internally by CySemTagger. Our initial evaluation shows that the tagger can cover up to 91.78% of words in Welsh text. This tagger is under continuous development, and will provide a critical tool for Welsh language corpus and information processing at semantic level.
- Published
- 2018
5. Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh)
- Author
-
Knight, Dawn, Fitzpatrick, Tess, Morris, Steve, Evas, Jeremy, Rayson, Paul Edward, Spasic, Irena, Stonelake, Mark, Mon Thomas, Enlli, Neale, Steven, Needs, Jennifer, Piao, Scott Songlin, Rees, Mair, Watkins, Gareth, Anthony, Laurence, Michael Cobb, Thomas, Deuchar, Margaret, Donnelly, Kevin, McCarthy, Michael, and Scannell, Kevin
- Published
- 2017
6. Towards a Welsh semantic tagger: creating lexicons for a resource poor language
- Author
-
Piao, Scott Songlin, Rayson, Paul Edward, Knight, Dawn, Watkins, Gareth, and Donnelly, Kevin
- Abstract
Semantic annotation is an important part of corpus linguistics. A major tool for semantic tagging is the USAS, developed at Lancaster University, which was originally designed for English but has been extended to cover many more languages. In the CorCenCC Project (http://sites.cardiff.ac.uk/corcencc), we are extending the USAS to automatically annotate Welsh language data with the USAS semantic tagset. In this paper, we report on the development of Welsh semantic lexicons for the semantic tagger, in which we have already built a Welsh semantic lexicon containing 143,290 entries that has achieved a lexical coverage of 72.42% in an initial evaluation. An initial version of the Welsh semantic tagger has already been developed based on the lexical resource.
- Published
- 2017
7. A Comparison Between Genetics Papers Relating to Immune Disorders and Psychiatric Disorders
- Author
-
El-Haj, Mahmoud, Piao, Scott Songlin, Rayson, Paul Edward, and Knight, Jo
- Published
- 2017
8. A time-sensitive historical thesaurus-based semantic tagger for deep semantic annotation
- Author
-
Piao, Scott Songlin, Dallachy, Fraser, Baron, Alistair, Demmen, Jane Elizabeth, Wattam, Steve, Durkin, Philip, McCracken, James, Rayson, Paul Edward, and Alexander, Marc
- Abstract
Automatic extraction and analysis of meaning-related information from natural language data has been an important issue in a number of research areas, such as natural language processing (NLP), text mining, corpus linguistics, and data science. An important aspect of such information extraction and analysis is the semantic annotation of language data using a semantic tagger. In practice, various semantic annotation tools have been designed to carry out different levels of semantic annotation, such as topics of documents, semantic role labeling, named entities or events. Currently, the majority of existing semantic annotation tools identify and tag partial core semantic information in language data, but they tend to be applicable only for modern language corpora. While such semantic analyzers have proven useful for various purposes, a semantic annotation tool that is capable of annotating deep semantic senses of all lexical units, or all-words tagging, is still desirable for a deep, comprehensive semantic analysis of language data. With large-scale digitization efforts underway, delivering historical corpora with texts dating from the last 400 years, a particularly challenging aspect is the need to adapt the annotation in the face of significant word meaning change over time. In this paper, we report on the development of a new semantic tagger (the Historical Thesaurus Semantic Tagger), and discuss challenging issues we faced in this work. This new semantic tagger is built on existing NLP tools and incorporates a large-scale historical English thesaurus linked to the Oxford English Dictionary. Employing contextual disambiguation algorithms, this tool is capable of annotating lexical units with a historically-valid highly fine-grained semantic categorization scheme that contains about 225,000 semantic concepts and 4,033 thematic semantic categories. In terms of novelty, it is adapted for processing historical English data, with rich information about historical usage
- Published
- 2017
9. Building a Spanish lexicon for corpus analysis
- Author
-
Jiménez, Ricardo-María, Sanjurjo-González, Hugo, Rayson, Paul Edward, and Piao, Scott Songlin
- Abstract
This paper describes the creation of a Spanish lexicon with semantic annotation in order to analyse more extensive corpora in the Spanish language. The semantic resources most employed nowadays are WordNet, FrameNet, PDEV and USAS, but they have been used mainly for English language research. The creation of a large Spanish lexicon will permit a greater number of studies of corpora in Spanish to be undertaken. The paper describes the steps followed in constructing the lexicon, the difficulties encountered in its creation, and the solutions used to overcome them. Finally, the construction of the lexicon will allow specific research tasks to be carried out, such as metaphor analysis, critical discourse analysis (CDA) studies and even natural language processing (NLP) studies.
- Published
- 2017
10. Discourses around climate change in the news media
- Author
-
Dayrell, Carmen, Caimotto, Maria Cristina, Muller, Marcus, and Piao, Scott Songlin
- Abstract
Ever since the publication of its first report, the Intergovernmental Panel on Climate Change (IPCC) has described the origin of climate change as anthropogenic, declaring it ‘unequivocal’ in 2007. Nevertheless, societies worldwide react in different ways while the level of scepticism remains high and the scientific evidence is challenged. This research examines the ways printed newspapers have framed climate change issues across four countries: Britain, Brazil, Germany and Italy. Our ultimate aim is to investigate the role that mass media play in shaping public opinion. These countries are all major emitters of greenhouse gases but their citizens reveal different attitudes and different levels of concern towards climate-change related issues (PEW 2010; EC 2011). Here, we are interested in examining the similarities and differences across these four countries regarding the debate around climate change issues within the news media. More specifically, we aim to explore the following questions: (i) what concerns are revealed through the debate? (ii) does the data explain why these societies respond differently to climate change in terms of level of concern and proposed solutions? (iii) what kinds of social practices do people discuss in relation to the causes and ways to mitigate climate change? To what extent can the results be understood as traces of national social practices? The data is drawn from a corpus comprising newspaper articles making reference to climate change/global warming published between Jan/2003 and Dec/2013 in the four countries under analysis. The texts were selected on the basis of a set of query words/phrases, established according to Gabrielatos (2007). The British corpus consists of 61.8 million words (86,088 texts), the Brazilian corpus contains 10.9 million words (19,268 texts), while the German and the Italian corpora reach 40 million and 10 million words (19,777 texts) respectively. This paper presents the results of such cross-cultura
- Published
- 2016
11. Reversing the polarity with emoticons
- Author
-
Metais, Elizabeth, Meziane, Farid, Saraee, Mohamad, Sugumaran, Vijayan, Vadera, Sunil, Teh, Phoey Lee, Rayson, Paul Edward, Pak, Irina, Piao, Scott Songlin, and Yeng, Seow Mei
- Abstract
Technology advancement in social media software allows users to include elements of visual communication in textual settings. Emoticons are widely used as visual representations of emotion and body expressions. However, the assignment of values to the “emoticons” in current sentiment analysis tools is still at a very early stage. This paper presents our experiments in which we study the impact of positive and negative emoticons on the classifications by fifteen different sentiment tools. The “smiley” :) and the “sad” emoticon :( and raw-text are compared to verify the degrees of sentiment polarity levels. Questionnaires were used to collect human ratings of the positive and negative values of a set of sample comments that end with these emoticons. Our results show that emoticons used in sentences are able to reverse the polarity of their true sentiment values.
- Published
- 2016
12. Lexical coverage evaluation of large-scale multilingual semantic lexicons for twelve languages
- Author
-
Calzolari, Nicoletta, Choukri, Khalid, Declerck, Thierry, Grobelnik, Marko, Maegaard, Bente, Mariani, Joseph, Moreno, Asuncion, Odijk, Jan, Piperidis, Stelios, Piao, Scott Songlin, Rayson, Paul Edward, Archer, Dawn, Bianchi, Francesca, Dayrell, Carmen, El-Haj, Mahmoud, Jiménez, Ricardo-María, Knight, Dawn, Křen, Michal, Lofberg, Laura, Nawab, Rao Muhammad Adeel, Shafi, Jawad, Teh, Phoey Lee, and Mudraya, Olga
- Abstract
The last two decades have seen the development of various semantic lexical resources such as WordNet (Miller, 1995) and the USAS semantic lexicon (Rayson et al., 2004), which have played an important role in the areas of natural language processing and corpus-based studies. Recently, increasing efforts have been devoted to extending the semantic frameworks of existing lexical knowledge resources to cover more languages, such as EuroWordNet and Global WordNet. In this paper, we report on the construction of large-scale multilingual semantic lexicons for twelve languages, which employ the unified Lancaster semantic taxonomy and provide a multilingual lexical knowledge base for the automatic UCREL semantic annotation system (USAS). Our work contributes towards the goal of constructing larger-scale and higher-quality multilingual semantic lexical resources and developing corpus annotation tools based on them. Lexical coverage is an important factor concerning the quality of the lexicons and the performance of the corpus annotation tools, and in this experiment we focus on evaluating the lexical coverage achieved by the multilingual lexicons and semantic annotation tools based on them. Our evaluation shows that some semantic lexicons such as those for Finnish and Italian have achieved lexical coverage of over 90% while others need further expansion.
- Published
- 2016
13. Sentiment analysis tools should take account of the number of exclamation marks!!!
- Author
-
Teh, Phoey Lee, Rayson, Paul Edward, Pak, Irina, and Piao, Scott Songlin
- Abstract
There are various factors that affect the sentiment level expressed in textual comments. Capitalization of letters tends to mark something for attention and repeating of letters tends to strengthen the emotion. Emoticons are used to help visualize facial expressions which can affect understanding of text. In this paper, we show the effect of the number of exclamation marks used, via testing with twelve online sentiment tools. We present opinions gathered from 500 respondents towards “like” and “dislike” values, with a varying number of exclamation marks. Results show that only 20% of the online sentiment tools tested considered the number of exclamation marks in their returned scores. However, results from our human raters show that the more exclamation marks are used in positive comments, the higher their “like” values are compared with the same comments with fewer exclamation marks. Similarly, adding more exclamation marks to negative comments results in a higher “dislike” value.
- Published
- 2015
14. Towards a semantic tagger for analysing contents of Chinese corporate reports
- Author
-
Piao, Scott Songlin, Hu, Xiaopeng, and Rayson, Paul Edward
- Published
- 2015
15. Word Alignment in English–Chinese Parallel Corpora.
- Author
-
Piao, Scott Songlin
- Subjects
ALGORITHMS, SENTENCES (Grammar), BILINGUALISM, LOANWORDS, LANGUAGE policy, TRANSLATING & interpreting, LANGUAGE arts, CORPORA, LINGUISTIC analysis
- Abstract
Word alignment in bilingual or multilingual parallel corpora has been a challenging issue for natural language engineering. An efficient algorithm for automatically aligning word translation equivalents across different languages will be of use for a number of practical applications such as multilingual lexical construction, machine translation, etc. This paper presents a hybrid algorithm for English–Chinese word alignment, which incorporates co‐occurrence association measures, word distribution distances, English word lemmatization, and part‐of‐speech information. Eleven co‐occurrence association coefficients and eight distance measures of word distribution are explored to compare their efficiency for word alignment. The paper also describes an experiment in which the algorithm is evaluated on sentence‐aligned English–Chinese parallel corpora. In the experiment, the algorithm produced encouraging success rates on two test corpora, with the highest success rate of 89.37 per cent. It provides a practical tool for extracting word translation equivalents from English–Chinese parallel corpora. [ABSTRACT FROM PUBLISHER]
- Published
- 2002