10 results on '"Dage, Särg"'
Search Results
2. Common clinical blood and urine biomarkers for ischemic stroke: an Estonian Electronic Health Records database study
- Author
-
Siim Kurvits, Ainika Harro, Anu Reigo, Anne Ott, Sven Laur, Dage Särg, Ardi Tampuu, the Estonian Biobank Research Team, Kaur Alasoo, Jaak Vilo, Lili Milani, Toomas Haller, and the PRECISE4Q consortium
- Subjects
Ischemic stroke, Electronic health records, Population health, Machine learning, Medicine
- Abstract
Background: Ischemic stroke (IS) is a major health risk without generally usable effective measures of primary prevention. Early warning signals that are easy to detect and widely available can save lives. Estonia has one nationwide Electronic Health Record (EHR) database for the storage of medical information of patients from hospitals and primary care providers. Methods: We extracted structured and unstructured data from the EHRs of participants of the Estonian Biobank (EstBB) and evaluated different formats of input data to understand how this continuously growing dataset should be prepared for best prediction. The utility of the EHR database for finding blood- and urine-based biomarkers for IS was demonstrated by applying different analytical and machine learning (ML) methods. Results: Several early trends in common clinical laboratory parameter changes (set of red blood indices, lymphocyte/neutrophil ratio, etc.) were established for IS prediction. The developed ML models predicted the future occurrence of IS with very high accuracy, and Random Forests proved to be the method best suited to EHR data. Conclusions: We conclude that the EHR database and the risk factors uncovered are valuable resources in screening the population for risk of IS as well as constructing disease risk scores and refining prediction models for IS by ML.
- Published
- 2023
3. EstNLTK 1.6: Remastered Estonian NLP Pipeline.
- Author
-
Sven Laur, Siim Orasmaa, Dage Särg, and Paul Tammo
- Published
- 2020
4. Quote extraction from Estonian media: Analysis and tools
- Author
-
Dage Särg, Karmen Kink, and Karl-Oskar Masing
- Subjects
quote extraction, indirect speech, named entity recognition, information extraction, corpus linguistics, computational linguistics, Philology. Linguistics, P1-1091, Finnic. Baltic-Finnic, PH91-98.5
- Abstract
This paper describes the identification, adaptation and creation of tools that are needed for creating a quote extractor for Estonian media texts that would be able to properly extract both direct and indirect quotes and attribute them to the correct person identified by full name and profession. This includes named entity recognition and resolution as well as grammar-based extraction of direct and indirect quotes. To get a further understanding of indirect speech in Estonian media, we also performed a corpus linguistic analysis of the quotes extracted with our tools from one week of Estonian news. *** Quote extraction from Estonian media texts: analysis and tools. The article gives an overview of the first stage of creating a quote extractor for Estonian. The quote extractor aims to extract quotes expressed in both direct and indirect speech, together with the quoted person's full name and, where possible, profession. We determined which components a quote extractor would need and, accordingly, tested and adapted existing tools and created the missing ones. We also identified the improvements and additional tools needed for developing the quote extractor, and analysed the reporting clauses used in news to convey direct and indirect speech. To find person names, we used the standard CRF-based named entity tagger of the EstNLTK library. We added a resolver that finds the quoted person's full name in the text when only the first name or surname is used in the reporting clause. In this way we were able to resolve 61.0% of single-word person names. To find professions, we created a 5,659-word Estonian lexicon of occupations and used it to annotate professions in the training corpus of the named entity recognizer. We then retrained the named entity recognizer to recognize professions as well. Evaluating the results, we found that the best option is the lexicon-based approach combined with the retrained CRF tagger, which gave an F1 score of 86.1% for profession recognition.
To extract direct speech, we used a regular-expression-based approach. Because direct speech is clearly marked with quotation marks, this gave an F1 score of 95.0%. The reporting clauses of direct speech provided input for extracting indirect speech: we created a lexicon of verbs, nouns and adverbials that indicate that a sentence conveys reported thoughts. The grammar-based approach gave an F1 score of 84.7%, with a precision of 93.5%. Examining the indirect speech sentences that went undetected, we found several further types of constructions that the grammar-based approach could handle to improve recall. Finally, using the newly created tools, we analysed the quotes extracted from one week of Estonian media texts and their reporting clauses, covering lexical, morphological, syntactic and semantic features. In the future, besides improving the separate tools, we plan to create a freely available complete solution for identifying Estonian quotes and the people quoted.
- Published
- 2021
5. Annotated Clause Boundaries' Influence on Parsing Results.
- Author
-
Dage Särg, Kadri Muischnek, and Kaili Müürisep
- Published
- 2018
6. Internetikeele automaatne süntaktiline analüüs kitsenduste grammatikaga
- Author
-
Dage Särg
- Subjects
computational linguistics, language processing, syntax, dependency syntax, language variation, Estonian language, Philology. Linguistics, P1-1091, Finnic. Baltic-Finnic, PH91-98.5
- Abstract
"Syntactic analysis of Estonian netspeak using Constraint Grammar" The paper provides an overview of an attempt to adapt the Estonian Constraint Grammar rule set for netspeak. The rule set has been developed by Kaili Müürisep and Tiina Puolakainen for shallow and dependency parsing of Estonian literary language, and it has previously been adapted for shallow parsing of spoken Estonian by Kaili Müürisep and Heli Uibo. First, in order to adapt the rules, a chatroom corpus was parsed with the existing rule set. The corpus was manually revised and, based on the errors that were found, changes were made to the rule set. The changes concerned detection of clause boundaries and particle verbs, as well as assignment of syntactic tags and dependency relations. Extensive use of discourse particles and direct addresses, short sentence length, and a small percentage of attributes among the syntactic functions used in text appeared to be the most distinctive features of netspeak, as well as the large number of elliptical sentences from which, in addition to other syntactic functions, a predicate can be left out. As a result of adapting the rule set, the results of both shallow and dependency parsing improved. The most error-prone syntactic functions were subjects, predicatives, and adverbials. In dependency parsing, the largest number of errors was made in determining the governors of adverbials.
- Published
- 2016
7. Transforming Estonian health data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model: lessons learned
- Author
-
Marek Oja, Sirli Tamm, Kerli Mooses, Maarja Pajusalu, Harry-Anton Talvik, Anne Ott, Marianna Laht, Maria Malk, Marcus Lõo, Johannes Holm, Markus Haug, Hendrik Šuvalov, Dage Särg, Jaak Vilo, Sven Laur, Raivo Kolde, and Sulev Reisberg
- Abstract
Objective: To describe the reusable transformation process of electronic health records (EHR), claims, and prescriptions data into the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), together with challenges faced and solutions implemented. Materials and Methods: We used Estonian national health databases that store almost all residents’ claims, prescriptions, and EHR records. To develop and demonstrate the transformation process of Estonian health data to OMOP CDM, we used a 10% random sample of the Estonian population (n = 150,824 patients) from 2012-2019. For the sample, complete information from all three databases was converted to OMOP CDM version 5.3. The validation was performed using open-source tools. Results: In total, we transformed over 100 million entries to standard concepts using standard OMOP vocabularies, with an average mapping rate of 95%. For conditions, observations, drugs, and measurements, the mapping rate was over 90%. In most cases, SNOMED Clinical Terms were used as the target vocabulary. Discussion: During the transformation process, we encountered several challenges, which are described in detail with concrete examples and solutions. Conclusion: For a representative 10% random sample, we successfully transferred complete records from three national health databases to OMOP CDM and created a reusable transformation process. Our work helps future researchers to transform linked databases into OMOP CDM more efficiently, ultimately leading to better real-world evidence.
- Published
- 2023
8. Quote extraction from Estonian media: Analysis and tools
- Author
-
Karmen Kink, Karl-Oskar Masing, and Dage Särg
- Subjects
Linguistics and Language, business.industry, Computer science, Extraction (chemistry), computer.software_genre, Estonian, Language and Linguistics, language.human_language, Education, language, Artificial intelligence, business, computer, Natural language processing
- Published
- 2021
9. Genome-wide Study Identifies Association between HLA-B∗55:01 and Self-Reported Penicillin Allergy
- Author
-
Kristi Krebs, Jonas Bovijn, Neil Zheng, Maarja Lepamets, Jenny C. Censin, Tuuli Jürgenson, Dage Särg, Erik Abner, Triin Laisk, Yang Luo, Line Skotte, Frank Geller, Bjarke Feenstra, Wei Wang, Adam Auton, Soumya Raychaudhuri, Tõnu Esko, Andres Metspalu, Sven Laur, Dan M. Roden, Wei-Qi Wei, Michael V. Holmes, Cecilia M. Lindgren, Elizabeth J. Phillips, Reedik Mägi, Lili Milani, João Fadista, Michelle Agee, Stella Aslibekyan, Robert K. Bell, Katarzyna Bryc, Sarah K. Clark, Sarah L. Elson, Kipper Fletez-Brant, Pierre Fontanillas, Nicholas A. Furlotte, Pooja M. Gandhi, Karl Heilbron, Barry Hicks, David A. Hinds, Karen E. Huber, Ethan M. Jewett, Yunxuan Jiang, Aaron Kleinman, Keng-Han Lin, Nadia K. Litterman, Marie K. Luff, Jennifer C. McCreight, Matthew H. McIntyre, Kimberly F. McManus, Joanna L. Mountain, Sahar V. Mozaffari, Priyanka Nandakumar, Elizabeth S. Noblin, Carrie A.M. Northover, Jared O’Connell, Aaron A. Petrakovitz, Steven J. Pitts, G. David Poznik, J. Fah Sathirapongsasuti, Anjali J. Shastri, Janie F. Shelton, Suyash Shringarpure, Chao Tian, Joyce Y. Tung, Robert J. Tunney, Vladimir Vacic, Xin Wang, Amir S. Zare, Institute for Molecular Medicine Finland, and University of Helsinki
- Subjects
0301 basic medicine, Genome-wide association study, Human leukocyte antigen, HYPERSENSITIVITY REACTIONS, FREQUENCY, BIOBANK, MECHANISMS, PTPN22, 03 medical and health sciences, 0302 clinical medicine, MANAGEMENT, Genetics, medicine, SNP, Allele, METAANALYSIS, Genetics (clinical), business.industry, 1184 Genetics, developmental biology, physiology, ADVERSE DRUG-REACTIONS, POLYMORPHISM, HLA-B, 3. Good health, HLA, Penicillin, 030104 developmental biology, 030220 oncology & carcinogenesis, Pharmacogenomics, Immunology, T-CELLS, business, medicine.drug
- Abstract
Hypersensitivity reactions to drugs are often unpredictable and can be life threatening, underscoring a need for understanding their underlying mechanisms and risk factors. The extent to which germline genetic variation influences the risk of commonly reported drug allergies such as penicillin allergy remains largely unknown. We extracted data from the electronic health records of more than 600,000 participants from the UK, Estonian, and Vanderbilt University Medical Center's BioVU biobanks to study the role of genetic variation in the occurrence of self-reported penicillin hypersensitivity reactions. We used imputed SNP to HLA typing data from these cohorts to further fine map the human leukocyte antigen (HLA) association and replicated our results in 23andMe's research cohort involving a total of 1.12 million individuals. Genome-wide meta-analysis of penicillin allergy revealed two loci, including one located in the HLA region on chromosome 6. This signal was further fine-mapped to the HLA-B∗55:01 allele (OR 1.41, 95% CI 1.33-1.49, p value 2.04 × 10^-31) and confirmed by independent replication in 23andMe's research cohort (OR 1.30, 95% CI 1.25-1.34, p value 1.00 × 10^-47). The lead SNP was also associated with lower lymphocyte counts and in silico follow-up suggests a potential effect on T-lymphocytes at HLA-B∗55:01. We also observed a significant hit in PTPN22 and the GWAS results correlated with the genetics of rheumatoid arthritis and psoriasis. We present robust evidence for the role of an allele of the major histocompatibility complex (MHC) I gene HLA-B in the occurrence of penicillin allergy.
- Published
- 2020
10. Methodology for Automated Extraction of Socialization Values from Online Media Texts
- Author
-
Teet Kalmus; Dage Särg, MA; and Prof. Veronika Kalmus
- Abstract
The aim of the present thesis was to create a methodology, together with a software solution, for finding texts that contain socialization values among online media texts and for analysing the filtered texts. The software solution covers web crawling and downloading texts from the Koolilaps forum of Perekool, creating a dictionary of socialization values, filtering texts with that dictionary, and analysing the filtered texts, including sentiment analysis. The solution was built with EstNLTK, a natural language processing library developed at the University of Tartu. Further plans include using the results of the thesis for writing academic articles and applying the methodology and the software solution to other media texts.
- Published
- 2019