77 results for "Plain text"
Search Results
2. A Novel Methodology for Converting English Text into Objects
- Author
-
Raj, I. Infant, Bala, B. Kiran, Smys, S., editor, Iliyasu, Abdullah M., editor, Bestak, Robert, editor, and Shi, Fuqian, editor
- Published
- 2020
- Full Text
- View/download PDF
3. Breakthrough!
- Author
-
Carroll, Michael, Alpert, Mark, Series editor, Ball, Philip, Series editor, Benford, Gregory, Series editor, Brotherton, Michael, Series editor, Callaghan, Victor, Series editor, Eden, Amnon H, Series editor, Kanas, Nick, Series editor, Landis, Geoffrey, Series editor, Rucker, Rudi, Series editor, Schulze-Makuch, Dirk, Series editor, Vaas, Rüdiger, Series editor, Walter, Ulrich, Series editor, Webb, Stephen, Series editor, and Carroll, Michael
- Published
- 2015
- Full Text
- View/download PDF
4. Text Quality, Text Variety, and Parsing XML
- Author
-
Jockers, Matthew L., DeFanti, Thomas, Series editor, Grafton, Anthony, Series editor, Levy, Thomas E., Series editor, Manovich, Lev, Series editor, Rockwood, Alyn, Series editor, and Jockers, Matthew L.
- Published
- 2014
- Full Text
- View/download PDF
5. Effective Reproducible Research with Org-Mode and Git
- Author
-
Stanisic, Luka, Legrand, Arnaud, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Lopes, Luís, editor, Žilinskas, Julius, editor, Costan, Alexandru, editor, Cascella, Roberto G., editor, Kecskemeti, Gabor, editor, Jeannot, Emmanuel, editor, Cannataro, Mario, editor, Ricci, Laura, editor, Benkner, Siegfried, editor, Petit, Salvador, editor, Scarano, Vittorio, editor, Gracia, José, editor, Hunold, Sascha, editor, Scott, Stephen L., editor, Lankes, Stefan, editor, Lengauer, Christian, editor, Carretero, Jesus, editor, Breitbart, Jens, editor, and Alexander, Michael, editor
- Published
- 2014
- Full Text
- View/download PDF
6. Redundant Encryption
- Author
-
Donovan, Peter, and Mack, John
- Published
- 2014
- Full Text
- View/download PDF
7. Major Encryption Systems
- Author
-
Donovan, Peter, and Mack, John
- Published
- 2014
- Full Text
- View/download PDF
8. Temporal Expression Recognition in Hindi
- Author
-
Ramrakhiyani, Nitin, Majumder, Prasenjit, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Prasath, Rajendra, editor, and Kathirvalavakumar, T., editor
- Published
- 2013
- Full Text
- View/download PDF
9. BIRL: Bidirectional-Interaction Reinforcement Learning Framework for Joint Relation and Entity Extraction
- Author
-
Yashen Wang and Huanhuan Zhang
- Subjects
Relation (database), Computer science, Plain text, Generalization, Machine learning, Relationship extraction, Consistency (database systems), Reinforcement learning, Artificial intelligence, Feature learning - Abstract
Joint relation and entity extraction is a crucial technology for constructing a knowledge graph. However, most existing methods (i) cannot fully capture the beneficial connections between the relation extraction and entity extraction tasks, and (ii) cannot combat noisy data in the training dataset. To overcome these problems, this paper proposes a novel Bidirectional-Interaction Reinforcement Learning (BIRL) framework to extract entities and relations from plain text. In particular, we apply a relation calibration RL policy to (i) measure relation consistency and enhance the bidirectional interaction between entity mentions and relation types, and (ii) guide a dynamic selection strategy that removes noise from the training dataset. Moreover, we introduce a data augmentation module to bridge the gap between data efficiency and generalization. Empirical studies on two real-world datasets confirm the superiority of the proposed model.
- Published
- 2021
10. Eye Movement Data Analysis and Visualization
- Author
-
Zhiguo Wang
- Subjects
Computer science, Plain text, Eye movement, Sample (graphics), Gaze, Visualization, Trace (semiology), Data visualization, Computer graphics (images), Data file - Abstract
Many tools can be used to analyze or visualize eye movement data. This chapter gives an overview of the format of the eye movement data stored in EyeLink EDF data files. We then show how to convert EyeLink EDF files into plain text files and extract samples and eye events. For data visualization, this chapter briefly discusses gaze trace, scanpath, heatmap, and interest area-based plotting techniques.
- Published
- 2021
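The EDF-to-text workflow described in the chapter above can be sketched as follows. This is a minimal illustration, assuming the common convention of EyeLink ASC exports (the plain-text form produced from EDF files) in which sample lines begin with a numeric timestamp while event lines begin with a keyword such as EFIX or MSG; the function name and the example fields are hypothetical, and real files contain more line types.

```python
def split_asc_lines(lines):
    """Split lines of a plain-text (ASC) eye-tracker export into
    gaze samples and eye events, by the leading token of each line."""
    samples, events = [], []
    for line in lines:
        fields = line.strip().split()
        if not fields:
            continue
        if fields[0].isdigit():      # e.g. "5483520 512.3 386.1 1201.0"
            samples.append(fields)   # timestamp, x, y, pupil size
        else:                        # e.g. "EFIX R 5483520 5483700 180"
            events.append(fields)    # event keyword + event-specific fields
    return samples, events
```

Once separated, the sample stream feeds gaze-trace and heatmap plots, while the event stream (fixations, saccades) feeds scanpath and interest-area analyses.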
11. Research on Aging Design of News APP Interface Layout Based on Perceptual Features
- Author
-
Zehua Li, Zhixin Wu, Hongqian Li, and Xiang Li
- Subjects
Plain text, Computer science, Interactive design, Interface (computing), Design strategy, Mode (computer interface), User experience design, Human–computer interaction, Perception, Graphical user interface - Abstract
Objective: Based on the perceptual characteristics of the elderly and an analysis of news APP interface layouts, this study varies the layout design to examine how the graphic presentation mode of the interface affects the information retrieval efficiency of elderly users. Methods: Guided by a model relating the perceptual characteristics of the elderly to news APP interface layouts, we combined subjective user experience data analysis with objective eye movement experiments, and summarized an aging-oriented design strategy for news APP interface layouts. Results: (1) A graphic (text-plus-image) layout of news information increases the information retrieval time of the elderly, while a plain text layout reduces it; (2) the subjective user experience data show that the elderly prefer graphic interface layouts; (3) the eye movement results show that the visual browsing time of the elderly is markedly higher on layouts with the picture on the right than on layouts with the picture on the left. Conclusion: An effective news APP interface layout helps improve the information retrieval efficiency of the elderly and further stimulates elderly users' enthusiasm for retrieving information. The results can be used to optimize and innovate the interaction design of news APP interface layouts.
- Published
- 2021
12. Decoding Secret Message with Frequency Analysis
- Author
-
Yucel Inan
- Subjects
Frequency analysis, Computer science, Plain text, Reliability (computer networking), Cryptography, Caesar cipher, Encryption, Arithmetic, Cryptanalysis, Decoding methods - Abstract
This study examines cryptography systems in which messages are encrypted, transmitted, and decoded according to a specific scheme. The Caesar cipher is one of the easiest ciphers in cryptography to break. Frequency analysis of such an encryption method rests on a basic principle: the frequency with which letters occur in the alphabet of the language the plain text belongs to. The study also aims to reveal the weaknesses and strengths of the software by testing how well the cipher resists letter frequency analysis; the results are evaluated in graphs and tables.
- Published
- 2021
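The principle the abstract describes, that the letter frequencies of the plain-text language break a Caesar cipher, can be illustrated with a short sketch. This is an illustration assuming English text (where 'e' is typically the most frequent letter), not the paper's actual software:

```python
from collections import Counter
import string

def shift_text(text, k):
    """Caesar cipher: shift every letter by k positions; a negative k decrypts."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def guess_shift(ciphertext):
    """Frequency analysis: assume the most frequent ciphertext letter stands for 'e'."""
    letters = [c for c in ciphertext.lower() if c in string.ascii_lowercase]
    top = Counter(letters).most_common(1)[0][0]
    return (ord(top) - ord('e')) % 26
```

Encrypting a sufficiently long English sentence with `shift_text(plain, 3)` and then calling `guess_shift` on the result recovers the key 3 without any knowledge of the plaintext, which is exactly why the Caesar cipher is considered easy to break.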
13. Reading and Writing Files
- Author
-
Rafael E. Banchs
- Subjects
Information retrieval, Plain text, Computer science, Text file, File format, Reading (process), Regular expression, XML, Sparse matrix - Abstract
This chapter presents and describes the main functions and procedures for reading and writing files. First, in Sect. 5.1, the most commonly used and basic file formats are presented along with specific functions for handling them. These include the MATLAB® proprietary binary MAT-file format, as well as more conventional formatted and unformatted text files which are commonly referred to as plain text files. Then, in Sect. 5.2, functions for reading and writing some other commonly used file formats such as CSV, Row-Column-Value, XLS and XML are presented and described. Finally, in Sect. 5.3, some useful tools for working with datasets and document collections are presented.
- Published
- 2021
14. Fusing Essential Knowledge for Text-Based Open-Domain Question Answering
- Author
-
Zhonghai Wu, Ying Li, and Xiao Su
- Subjects
Information retrieval, Plain text, Computer science, Variety (cybernetics), Knowledge base, Question answering, Open domain, Lack of knowledge, Document retrieval, Encoder - Abstract
Question answering (QA) systems can be classified as either text-based QA systems or knowledge base QA (KBQA) systems, depending on the knowledge source used. KBQA systems are generally domain-specific and cannot deal with the variety of questions in the open-domain QA setting, while text-based systems can. However, the performance of text-based systems is far from satisfactory. This paper focuses on the text-based open-domain QA setting. We argue that the poor performance of text-based approaches is largely caused by the absence, in plain text, of knowledge that is often essential for answering the question and can easily be found in a knowledge base (KB). We therefore propose a new text-based open-domain QA system called KF (Knowledge Fusion)-QA, which uses a KB as a second knowledge source to incorporate essential knowledge into text to help answer the question. Our system has a Knowledge-Aware Encoder that extracts essential knowledge from the KB and performs knowledge fusion to output knowledge-aware (KA) text representations. With these KA representations, the system first re-ranks the retrieved documents, then reads the re-ranked top-N documents to produce the answer. Our system significantly outperforms existing text-based QA systems on multiple open-domain QA datasets, demonstrating the effectiveness of fusing essential knowledge.
- Published
- 2021
15. Multi-modal Fake News Detection
- Author
-
Tanmoy Chakraborty
- Subjects
World Wide Web, Modal, Leverage (negotiation), Computer science, Plain text, Microblogging, Taxonomy (general), Social media, Representation (arts), Set (psychology) - Abstract
The primary motivation behind the spread of fake news is to convince readers to believe false information about certain events or entities. Human cognition tends to consume news more readily when it is depicted visually through multimedia content than as plain text. Fake news spreaders leverage this cognitive state to prepare false information in a way that looks attractive at first glance. Therefore, multi-modal representation of fake news has become highly popular. This chapter presents a thorough survey of recent approaches to detecting multi-modal fake news spreading on various social media platforms. To this end, we present a list of challenges and opportunities in detecting multi-modal fake news. We further provide a set of publicly available datasets that are often used to design multi-modal fake news detection models. We then describe the proposed methods, categorizing them through a taxonomy.
- Published
- 2020
16. Using Natural Language Processing to Translate Plain Text into Pythonic Syntax in Kannada
- Author
-
K R Pavan, Sundar Guntnur, G B Sanjana, Vinay Rao, Sanjana Reddy, and N Navya Priya
- Subjects
Computer science, First language, Language barrier, Literacy, Plain text, Python (programming language), Kannada, Artificial intelligence, Transfer of learning, Natural language processing, Coding (social sciences) - Abstract
Digital evolution has made various services and products available at everyone's fingertips and made human lives easier. For individuals who want to be part of this digital evolution, learning to write code has become a necessity: it is the basic literacy of the digital age. Yet writing code remains a privilege for students with prior knowledge of English. This project aims to remove this language barrier by letting students solve coding problems in their native language and converting their logic to code. The paper presents a platform where students express their logic for coding problems in their native language as plain text, which is then converted to Python code using natural language processing techniques. The current platform can successfully identify and convert conditional statements in the Kannada language into Python code. Future work will extend this to recognize loop statements and create a framework for a wide variety of languages.
- Published
- 2020
17. Assessing the Impact of OCR Errors in Information Retrieval
- Author
-
Gustavo Acauan Lorentz, Guilherme Torresan Bazzo, Viviane Pereira Moreira, and Danny Suarez Vargas
- Subjects
Information retrieval, Empirical research, Text mining, Plain text, Computer science, Word error rate, Noisy text - Abstract
A significant amount of the textual content available on the Web is stored in PDF files. These files are typically converted into plain text before they can be processed by information retrieval or text mining systems. Automatic conversion typically introduces various errors, especially if OCR is needed. In this empirical study, we simulate OCR errors and investigate the impact that misspelled words have on retrieval accuracy. In order to quantify such impact, errors were systematically inserted at varying rates in an initially clean IR collection. Our results showed that significant impacts are noticed starting at a 5% error rate. Furthermore, stemming has proven to make systems more robust to errors.
- Published
- 2020
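The error-injection procedure the abstract describes, systematically inserting OCR-style errors into a clean collection at varying rates, can be sketched as follows. The confusion table and function names are illustrative assumptions, not the authors' actual error model:

```python
import random

# Illustrative subset of common OCR character confusions (an assumption,
# not the paper's error model).
OCR_CONFUSIONS = {"l": "1", "o": "0", "e": "c", "m": "rn", "i": "l"}

def inject_ocr_errors(text, error_rate, seed=0):
    """Replace each confusable character with a typical OCR misreading
    with probability error_rate, so a collection can be corrupted at
    controlled rates (e.g. 5%, 10%) for retrieval experiments."""
    rng = random.Random(seed)   # fixed seed keeps runs reproducible
    out = []
    for ch in text:
        sub = OCR_CONFUSIONS.get(ch.lower())
        if sub is not None and rng.random() < error_rate:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)
```

Running the same retrieval benchmark over versions of the collection corrupted at increasing rates then isolates the effect of OCR noise on accuracy.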
18. A Novel Methodology for Converting English Text into Objects
- Author
-
I. Infant Raj and B. Kiran Bala
- Subjects
Steganography, Plain text, Computer science, English language, Communication source, Artificial intelligence, Dimension (data warehouse), Character recognition, Natural language processing, Data transmission - Abstract
Security is a very important aspect of day-to-day life. One well-known security technique in present technology is steganography, which gives data a new dimension by transforming it into another form: only the sender and receiver know the exact data, while others can see it but understand far less of it than with other techniques. To strengthen this technique, the proposed algorithm converts English-language text into objects, adding another dimension and giving more confidence in data transfer. The algorithm is applicable on both the sender and receiver sides, and the system supports only English letters and special characters.
- Published
- 2020
19. Interacting with a Salesman Chatbot
- Author
-
Charlotte Esteban and Thomas Beauvisage
- Subjects
Plain text, Politeness, Hyperlink, Chatbot, World Wide Web, Qualitative analysis, Conversation - Abstract
In recent years, chatbots have been spreading on social networks and brand websites, and interactions between users and commercial chatbots have become an ordinary experience within the range of human-computer interactions. Yet, whereas automated conversation has been analyzed in various experimental contexts, only a few studies describe real-world interactions with voicebots [1, 2, 3] or chatbots [4, 5, 6]. How do interactions with chatbots actually take place? What is an AI-driven commercial conversation in practice? To address these questions, we conducted a sociological study of interactions with chatbots, based on a quantitative and qualitative analysis of interaction logs from a vending chatbot deployed by a French online telecom company. The study relies on a dataset of 9 months of ComBot usage logs in 2019, representing roughly 47,000 interaction sessions. Our analysis shows that interactions with the commercial chatbot follow a highly hybrid format between click-based interfaces and conversational interactions. A majority of users mobilize the conventions of commercial conversation to express their need in plain text. However, the rest of the dialogue mainly combines response buttons, short input text, and hyperlinks. The use of politeness shows that users are keen to follow the conversational interaction format offered to them, even if they do not use it entirely.
- Published
- 2020
20. Multi Keyword Search on Encrypted Text Without Decryption
- Author
-
Vemula Sridhar and K. Ram-Mohan Rao
- Subjects
Information retrieval, Cloud computing security, Plain text, Computer science, Cosine similarity, Data security, Cloud computing, Encryption, Software portability, High availability - Abstract
Various organizations use plain text to store data for day-to-day computations. Data is stored as plain text documents without any structure or specifications. Retrieving and searching structured data is straightforward with existing database systems, but querying and searching unstructured content is difficult. In general, search over unstructured content can be implemented using a similarity measure between the input keywords and the documents. Organizations are moving to the cloud to store data because of its high availability, lower maintenance cost, reliability, and portability. In a cloud system, sensitive data such as personal records must be protected from malicious access by intruders, but searching encrypted content is difficult. In this paper we propose a scheme called Multi Keyword Search on Encrypted text (MKSE), which enables searching encrypted unstructured text in the cloud without decryption, using cosine similarity. Documents are stored in encrypted form in a CryptDB database; the multi-keyword search is thus performed on encrypted data in the cloud, with CryptDB providing data security.
- Published
- 2020
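The ranking step underlying MKSE, scoring documents against a multi-keyword query by cosine similarity, can be sketched on plain (unencrypted) term-frequency vectors. The scheme itself performs this over encrypted content stored in CryptDB, which this illustration does not attempt; the function names are hypothetical:

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_documents(query, docs):
    """Return document ids ordered by decreasing similarity to the query."""
    q = query.lower().split()
    return sorted(docs,
                  key=lambda d: cosine_similarity(q, docs[d].lower().split()),
                  reverse=True)
```

In the encrypted setting, the same score would be computed over vectors derived from ciphertext rather than raw tokens, so the server never sees the plain text.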
21. Text Steganography Based on Parallel Encryption Using Cover Text (PECT)
- Author
-
Sakshi Sharma, Mukesh Kumar, and Subhash Panwar
- Subjects
Correctness, Cover (telecommunications), Plain text, Language change, Computer science, Space (commercial competition), Encryption, Text steganography, Parallelism (grammar), Computer network - Abstract
This paper presents a self-checking text steganography method that can be used efficiently where large data is involved, as in various IoT applications. The space complexity of its cover text is one-fourth of the message size, which is much less than in other techniques. The cover text is generated from the private message itself and is later used to check the correctness of the data on the receiver side. This helps reduce the time overhead when any part of the data is corrupted. Thus, the security and trust issues of IoT are easily handled even on high-traffic channels.
- Published
- 2020
22. Detection of NAT64/DNS64 by SRV Records: Detection Using Global DNS Tree in the World Beyond Plain-Text DNS
- Author
-
Martin Hunek and Zdenek Pliva
- Subjects
Third party, Plain text, Computer science, Computer security, NAT64, Tree (data structure), Reputation - Abstract
Since its introduction, the NAT64/DNS64 transition mechanism has had a reputation as a method that simply works. This could change, because the currently used detection method for this transition mechanism, RFC 7050 [16], does not work with third-party/foreign DNS resolvers. Such resolvers were recently introduced by Mozilla Firefox [1] with its implementation of DNS over HTTPS. This paper describes the problems connected with default usage of third-party DNS resolvers and shows how to solve the issues of RFC 7050 [16] both with and without third-party resolvers.
- Published
- 2020
23. Entity Linking for Historical Documents: Challenges and Solutions
- Author
-
Pontes, Elvys Linhares, Cabrera-Diego, Luis Adrián, Moreno, José G., Boros, Emanuela, Pontes, Elvys, Hamdi, Ahmed, Sidère, Nicolas, Coustaty, Mickaël, Doucet, Antoine, Laboratoire Informatique, Image et Interaction - EA 2118 (L3I), Université de La Rochelle (ULR), Recherche d’Information et Synthèse d’Information (IRIT-IRIS), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), and Université Fédérale Toulouse Midi-Pyrénées
- Subjects
Historical data, Computer science, [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Entity linking, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL], [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC], Digital libraries, Plain text, Search engine indexing, Deep learning, Optical character recognition, Digital library, CLEF, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Index (publishing), [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], Artificial intelligence, Historical document, Natural language processing - Abstract
International audience; Named entities (NEs) are among the most relevant types of information that can be used to efficiently index and retrieve digital documents. Furthermore, the use of Entity Linking (EL) to disambiguate NEs and relate them to knowledge bases provides supplementary information that can help differentiate ambiguous elements such as geographical locations and people's names. In historical documents, the detection and disambiguation of NEs is a challenge. Most historical documents are converted into plain text using an optical character recognition (OCR) system at the expense of some noise. Documents in digital libraries will therefore be indexed with errors that may hinder their accessibility. OCR errors affect not only document indexing but also the detection, disambiguation, and linking of NEs. This paper analyses the performance of different EL approaches on two multilingual historical corpora, CLEF HIPE 2020 (English, French, German) and NewsEye (Finnish, French, German, Swedish), and proposes several techniques for alleviating the impact of historical data problems on the EL task. Our findings indicate that the proposed approaches not only outperform the baselines on both corpora but also considerably reduce the impact of historical document issues across subjects and languages.
- Published
- 2020
24. HMAC and 'Secure Preferences': Revisiting Chromium-Based Browsers Security
- Author
-
Gerardo Schneider, Pablo Picazo-Sanchez, and Andrei Sabelfeld
- Subjects
Computer science, Plain text, Home page, Hash function, Cryptography, Internet security, Computer security, Hash-based message authentication code, Code (cryptography), Message authentication code - Abstract
Years ago, Google disabled the ability to freely modify some internal configuration parameters, banning options such as silently (un)installing browser extensions or changing the home page or the search engine. Enabling these had been as simple as adding or removing a few lines in a plain text file called the Secure Preferences file, created automatically by Chromium the first time it is launched. Concretely, Google introduced a security mechanism based on a cryptographic algorithm named Hash-based Message Authentication Code (HMAC) to prevent users and applications other than the browser from modifying the Secure Preferences file. This paper demonstrates that it is possible to perform browser hijacking, browser extension fingerprinting, and remote code execution attacks, as well as silent browser extension (un)installation, by coding a platform-independent proof-of-concept changeware that exploits the HMAC, allowing free modification of the Secure Preferences file. Last but not least, we analyze the security of the four most important Chromium-based browsers: Brave, Chrome, Microsoft Edge, and Opera, concluding that all of them suffer from the same security pitfall.
- Published
- 2020
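The integrity mechanism the paper attacks can be sketched in a few lines: an HMAC over a serialized preferences file detects out-of-band edits, but anyone who recovers the key (or the key derivation) can re-sign a tampered file. This is a simplified illustration; Chromium's real scheme additionally mixes in machine- and profile-specific data and computes per-preference MACs, and the function names below are hypothetical:

```python
import hashlib
import hmac
import json

def sign_prefs(prefs, secret):
    """Compute an HMAC-SHA256 over a canonical JSON serialization,
    so any byte-level change to the preferences changes the MAC."""
    payload = json.dumps(prefs, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_prefs(prefs, secret, mac):
    """Constant-time comparison guards against timing attacks."""
    return hmac.compare_digest(sign_prefs(prefs, secret), mac)
```

The attack described in the paper amounts to reproducing `sign_prefs` outside the browser: once changeware can compute a valid MAC, it can rewrite the file and re-sign it without the browser noticing.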
25. HDTCat: Let’s Make HDT Generation Scale
- Author
-
Dennis Diefenbach and José M. Giménez-García
- Subjects
Database, Computer science, Plain text, Test data generation, Serialization, Data structure, Scalability, RDF, Merge (version control), Publication - Abstract
Data generation in RDF has been increasing over the last years as a means to publish heterogeneous and interconnected data. RDF is usually serialized in verbose text formats, which is problematic for publishing and managing huge datasets. HDT is a binary serialization of RDF that uses compact data structures, making it possible to publish and query highly compressed RDF data. This reduces the volume needed to store the data and increases the speed at which it can be transferred or queried. However, it moves the burden of dealing with huge amounts of data from the consumer to the publisher, who needs to serialize the text data into HDT. This process consumes a lot of resources in terms of time, processing power, and especially memory. In addition, adding data to a file in HDT format is currently not possible, whether the additional data is in plain text or already serialized into HDT.
- Published
- 2020
26. Latent Space Modeling for Cloning Encrypted PUF-Based Authentication
- Author
-
Sathyanarayanan N. Aakur, Vishalini Laguduva Ramnath, Srinivas Katkoori, University of South Florida [Tampa] (USF), Oklahoma State University [Stillwater], Augusto Casaca, Srinivas Katkoori, Sandip Ray, and Leon Strous
- Subjects
Authentication, Physically Unclonable Function, Latent space modeling, Cloning (programming), Computer science, Plain text, Encryption, Cryptographic protocol, Attack model, Physical access, [INFO]Computer Science [cs], Clone (computing), Cloning, Computer network - Abstract
Part 3: IoT Security; International audience; Physically Unclonable Functions (PUFs) have emerged as a lightweight, viable security protocol in the Internet of Things (IoT) framework. While there have been recent works on cryptanalysis of PUF-based models, they require physical access to the device, knowledge of the underlying architecture, and unlimited access to the challenge-response pairs (CRPs) in plain text, without encryption. In this work, we are the first to tackle the problem of encrypted PUF-based authentication in an IoT framework. We propose a novel generative framework based on variational autoencoders that is PUF-architecture-independent and can handle encryption protocols on the transmitted CRPs. We show that the proposed framework can successfully clone three different PUF architectures encrypted with two different encryption protocols, DES and AES. We also show that the proposed approach outperforms a brute-force machine learning-based attack model by over 20%.
- Published
- 2020
27. A Content Dictionary for In-Object Comments
- Author
-
Lars Hellström
- Subjects
Plain text, Computer science, Content (measure theory), OpenMath, Artificial intelligence, Object (computer science), Natural language processing - Abstract
It is observed that some OpenMath objects may benefit from containing comments. A content dictionary with suitable attribution symbols is proposed. This content dictionary also provides application symbols for constructing comments that are somewhat more than just plain text strings.
- Published
- 2020
28. Recommendations for Evolving Relational Databases
- Author
-
Stéphane Ducasse, Anne Etien, Julien Delplanque, Nicolas Anquetil, Analyses and Languages Constructs for Object-Oriented Application Evolution (RMOD), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189 (CRIStAL), Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Ecole Centrale de Lille-Centre National de la Recherche Scientifique (CNRS)-Université de Lille-Ecole Centrale de Lille, Contributions of the Data parallelism to real time (DART), Laboratoire d'Informatique Fondamentale de Lille (LIFL), Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Université de Lille, Sciences et Technologies-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille, Sciences Humaines et Sociales-Centre National de la Recherche Scientifique (CNRS)-Inria Lille - Nord Europe, and Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
SQL ,[INFO.INFO-PL]Computer Science [cs]/Programming Languages [cs.PL] ,Information retrieval ,Impact analysis ,Plain text ,Computer science ,Relational database ,Database schema ,InformationSystems_DATABASEMANAGEMENT ,[INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE] ,02 engineering and technology ,computer.file_format ,computer.software_genre ,Article ,Relational database management system ,020204 information systems ,Schema (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Information system ,Stored procedure ,Semi-automatic evolution ,computer ,Meta-model ,computer.programming_language - Abstract
International audience; Relational databases play a central role in many information systems. Their schemas contain structural and behavioral entity descriptions. Databases must continuously be adapted to the new requirements of a world in constant change, while: (1) relational database management systems (RDBMS) do not allow inconsistencies in the schema; (2) stored procedure bodies are not meta-described in RDBMSs such as PostgreSQL, which treat their bodies as plain text. As a consequence, evaluating the impact of an evolution of the database schema is cumbersome and essentially manual. We present a semi-automatic approach based on recommendations that can be compiled into a SQL patch fulfilling RDBMS constraints. To support recommendations, we designed a meta-model for relational databases that eases computation of change impact. We performed an experiment to validate the approach by reproducing a real evolution on a database. The results of our experiment show that our approach can set the database in the same state as the one produced by the manual evolution in 75% less time.
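The recommendation-to-patch idea can be illustrated with a toy sketch (the function name, example schema, and view are hypothetical, not from the paper): a column rename is compiled into an ordered SQL patch that also rewrites the views referencing the old column, preserving the consistency the RDBMS demands.

```python
def compile_rename_column(table, old, new, referencing_views):
    """Compile a rename recommendation into an ordered SQL patch (toy sketch)."""
    patch = [f"ALTER TABLE {table} RENAME COLUMN {old} TO {new};"]
    # Impact analysis: every view referencing the old column must be rebuilt.
    for view, body in referencing_views:
        patch.append(f"CREATE OR REPLACE VIEW {view} AS {body.replace(old, new)};")
    return patch

patch = compile_rename_column(
    "person", "lastname", "family_name",
    [("v_people", "SELECT lastname FROM person")])
assert patch[0] == "ALTER TABLE person RENAME COLUMN lastname TO family_name;"
assert "family_name" in patch[1]
```

A real implementation would derive the referencing views from a meta-model of the schema rather than take them as an argument, which is precisely the role of the meta-model described in the abstract.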
- Published
- 2020
29. Tri Layer Model for Color Image Ciphering Through DNA Assisted 2D Chaos
- Author
-
P. Sherine, C. V. Sanjay Siddharth, Amirtharajan Rengarajan, and Nithya Chidambaram
- Subjects
Plain text ,business.industry ,Color image ,Computer science ,02 engineering and technology ,computer.file_format ,Encryption ,01 natural sciences ,CHAOS (operating system) ,Brute-force attack ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,020201 artificial intelligence & image processing ,Computer vision ,Artificial intelligence ,Noise (video) ,business ,010301 acoustics ,computer - Abstract
In the modern digital world, many advances have been made in the field of networking. Many multimedia messages, in the form of images, are now shared online for communication. The images thus sent are vulnerable to attacks that are both internal (e.g., noise attacks) and external (hackers). An algorithm for securing these multimedia messages is proposed here. The key employed in the proposed scheme is made from three different 2D chaotic maps. A colour image is split into RGB planes, which undergo a 3-level confusion-double diffusion using Deoxyribonucleic Acid (DNA) encoding. A novel methodology is used to substantiate it against currently existing encryption standards. The method's ability to withstand brute-force, noise, and chosen-plain-text attacks is tested.
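As a minimal illustration of chaos-based diffusion (a drastic simplification of the 3-level confusion-double diffusion described above; the map parameters and byte quantization are assumptions), a logistic-map keystream can be XORed with the pixel bytes of one plane. XOR makes the step self-inverting:

```python
def logistic_keystream(x0, r, n):
    """Generate n keystream bytes from a 1D logistic map x -> r*x*(1-x)."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)  # quantize the chaotic state to a byte
    return bytes(out)

def xor_diffuse(data, x0=0.613, r=3.99):
    """XOR plane bytes with the chaotic keystream; XOR is its own inverse."""
    ks = logistic_keystream(x0, r, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

plane = bytes(range(16))             # one row of an RGB plane, for illustration
cipher = xor_diffuse(plane)
assert xor_diffuse(cipher) == plane  # applying the same keystream decrypts
```

The keystream depends entirely on the secret seed and parameter, so sender and receiver sharing (x0, r) can both regenerate it.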
- Published
- 2019
30. Advanced Similarity Measures Using Word Embeddings and Siamese Networks in CBR
- Author
-
George Lancaster, Stelios Kapetanakis, Klaus-Dieter Althoff, Kareem Amin, Miltos Petridis, and Andreas Dengel
- Subjects
business.industry ,Computer science ,Plain text ,Rich Text Format ,Context (language use) ,02 engineering and technology ,computer.file_format ,computer.software_genre ,Knowledge acquisition ,Text processing ,Similarity (psychology) ,0202 electrical engineering, electronic engineering, information engineering ,Domain knowledge ,020201 artificial intelligence & image processing ,Relevance (information retrieval) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
Automatic fuzzy text processing, context extraction and disambiguation are three challenging research areas with high relevance to complex business domains. Business knowledge can be found in plain text message exchanges, emails, support tickets, internal chat messengers and other volatile means, making the decoding of text-based domain knowledge a challenging task. Traditional natural language processing approaches focus on a comprehensive representation of business knowledge and any relevant mappings. However, such approaches can be highly complex, cost-ineffective, and high-maintenance, especially in environments that experience frequent changes. This work applies LSTM Siamese Networks to measure text similarities in ambiguous domains. We implement the Manhattan LSTM (MaLSTM) Siamese neural network for semi-automatic acquisition of business knowledge and decoding of domain-relevant features that enable building similarity measures. Our aim is to minimize the effort required from human experts while extracting domain knowledge from rich text containing context-free abbreviations, grammatically incorrect text and mixed language.
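The MaLSTM similarity at the heart of this approach is simply the exponential of the negative Manhattan (L1) distance between the two LSTM sentence encodings, giving a score in (0, 1]. A few lines can sketch it (the toy vectors stand in for learned encodings):

```python
import math

def malstm_similarity(h1, h2):
    """Manhattan-LSTM similarity: exp(-||h1 - h2||_1), in (0, 1]."""
    l1 = sum(abs(a - b) for a, b in zip(h1, h2))
    return math.exp(-l1)

# Identical encodings give similarity 1.0; distant ones approach 0.
assert malstm_similarity([0.2, 0.5], [0.2, 0.5]) == 1.0
assert malstm_similarity([0.0, 0.0], [3.0, 4.0]) < 0.001
```

In the full model, the two inputs are encoded by twin LSTMs with shared weights; only the final similarity computation is shown here.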
- Published
- 2019
31. Hybrid HSW Based Zero Watermarking for Tampering Detection of Text Contents
- Author
-
Anurag Dixit and Fatek Saeed
- Subjects
business.industry ,Plain text ,Computer science ,Data_MISCELLANEOUS ,ComputingMilieux_LEGALASPECTSOFCOMPUTING ,Pattern recognition ,Watermark ,Text document ,computer.file_format ,Zero (linguistics) ,Watermark embedding ,Embedding ,Pattern matching ,Artificial intelligence ,business ,Digital watermarking ,computer - Abstract
In this paper, a Hybrid Structural-component and Word-length (HSW) based zero-watermarking technique is proposed, in which the content of the text document is not altered for watermark embedding. It involves two steps, namely watermark embedding and extraction. The plain text is obtained from the data owner and the watermark is generated from it by utilizing the characteristics of the text. The text and the watermark are registered with a certifying authority and later used for the pattern matching that detects tampering in the text document. The proposed HSW embedding algorithm is used for embedding the watermark in the text. After watermarking, tampering is applied to the content. The HSW extraction algorithm is used to recover the applied watermark from the text content. For each sub-pattern, the corresponding watermarking pattern is retrieved from the certifying authority. The extracted patterns are compared with the generated pattern, and the pattern-matching procedure detects the tampered content.
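A zero-watermarking scheme leaves the text untouched and instead registers a pattern derived from it. A minimal, word-length-only sketch (the actual HSW pattern also uses structural components, which this toy omits) might look like:

```python
def generate_watermark(text):
    """Derive a zero-watermark pattern from word lengths (text is unchanged)."""
    return [len(w) for w in text.split()]

def is_tampered(text, registered_pattern):
    """Regenerate the pattern and compare it against the registered one."""
    return generate_watermark(text) != registered_pattern

original = "the quick brown fox"
pattern = generate_watermark(original)   # registered with the certifying authority
assert not is_tampered("the quick brown fox", pattern)
assert is_tampered("the quick brown wolf jumped", pattern)
```

Because nothing is embedded in the document, the scheme depends on the certifying authority holding the registered pattern for later comparison.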
- Published
- 2019
32. Legislative Document Content Extraction Based on Semantic Web Technologies
- Author
-
José Emilio Labra Gayo and Francisco Cifuentes-Silva
- Subjects
Markup language ,computer.internet_protocol ,Computer science ,Plain text ,Process (engineering) ,020207 software engineering ,02 engineering and technology ,Linked data ,computer.file_format ,Document processing ,World Wide Web ,0202 electrical engineering, electronic engineering, information engineering ,Semantic technology ,020201 artificial intelligence & image processing ,Semantic Web ,computer ,XML - Abstract
This paper describes the system architecture for generating the History of the Law, developed for the Chilean National Library of Congress (BCN). The production system uses Semantic Web technologies, Akoma Ntoso, and tools that automate the markup of plain text into XML, enriching and linking documents. These semantically annotated documents make it possible to develop specialized political and legislative services, and to extract knowledge into a Legal Knowledge Base for public use. We show the strategies used for the implementation of the automatic markup tools, and describe the knowledge graph generated from the semantic documents. Finally, we contrast document-processing time using semantic technologies with that of manual tasks, and report the lessons learnt in this process, establishing a base for replicating a technological model that enables useful services in diverse contexts.
- Published
- 2019
33. Extraction of RDF Statements from Text
- Author
-
Ana B. Rios-Alvarado, Edwin Aldana-Bobadilla, Ivan Lopez-Arevalo, José-Lázaro Martínez-Rodríguez, and Julio Hernandez
- Subjects
Entity linking ,Information retrieval ,Computer science ,Plain text ,computer.file_format ,Representation (arts) ,RDF ,Relationship extraction ,Semantic Web ,computer ,Meaning (linguistics) ,Task (project management) - Abstract
The vision of the Semantic Web is to give information a defined meaning so that computers and people can work collaboratively. In this sense, the RDF model provides such a definition by linking and representing resources and descriptions through defined schemes and vocabularies. However, much of the information that could be represented is contained in plain text, which makes it infeasible for humans to annotate large-scale data sources such as the Web. Therefore, this paper presents a strategy for the extraction and representation of RDF statements from text. The idea is to provide an architecture that receives sentences and returns triples whose elements are linked to resources and vocabularies of the Semantic Web. The results demonstrate the feasibility of representing RDF statements from text through an implementation of the proposed strategy.
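A drastically simplified version of such a pipeline can be sketched as follows (the real architecture links elements to actual Semantic Web vocabularies; the `ex:` prefix and the single "X is a Y" pattern here are toy assumptions):

```python
import re

def extract_triples(sentence):
    """Toy pattern-based extraction: 'X is a Y' -> (X, rdf:type, Y)."""
    triples = []
    for subj, obj in re.findall(r"(\w+) is an? (\w+)", sentence):
        triples.append((f"ex:{subj}", "rdf:type", f"ex:{obj}"))
    return triples

assert extract_triples("Berlin is a city") == [("ex:Berlin", "rdf:type", "ex:city")]
```

A production system would replace the regex with relation extraction and entity linking, but the output shape, subject-predicate-object triples anchored to vocabularies, is the same.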
- Published
- 2019
34. Mathematical Expression Extraction from Unstructured Plain Text
- Author
-
Kulakshi Fernando, Gihan Dias, and Surangika Ranathunga
- Subjects
010304 chemical physics ,Computer science ,Plain text ,business.industry ,Deep learning ,Scale (descriptive set theory) ,02 engineering and technology ,computer.file_format ,computer.software_genre ,01 natural sciences ,Expression (mathematics) ,Range (mathematics) ,Information extraction ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Word (computer architecture) ,Variable (mathematics) - Abstract
Mathematical expressions are often found embedded inline with unstructured plain text on the web and in documents. They can vary from numbers and variable names to average-complexity mathematical expressions. Traditional rule-based techniques for mathematical expression extraction do not scale well across a wide range of expression types, and are less robust to expressions with slight typos and lexical ambiguities. This research employs sequential as well as deep learning classifiers to identify mathematical expressions in a given unstructured text. We compare CRF, LSTM, Bi-LSTM with word embeddings, and Bi-LSTM with word and character embeddings, trained on a dataset containing 102K tokens and 9K mathematical expressions. Given the relatively small dataset, the CRF model outperformed the RNN models.
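For contrast, the kind of brittle rule-based detector this line of work argues against can be written in a few lines (the pattern is illustrative, not from the paper); it catches a clean `x = 2y + 1` but fails on typos, unusual notation, and lexical ambiguity:

```python
import re

# A brittle rule-based matcher: identifier, '=', then a run of operand characters.
MATH_EXPR = re.compile(r"[A-Za-z]\w*\s*=\s*[\w+\-*/^ ()]+")

text = "Given x = 2y + 1, solve for y in the equation."
m = MATH_EXPR.search(text)
assert m is not None and m.group(0).startswith("x = 2y + 1")
```

Sequence labelers such as CRFs instead tag each token using contextual features, which is why they generalize across expression types where fixed patterns do not.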
- Published
- 2019
35. Transfusion of Extended Vigenere Table and ASCII Conversion for Encryption Contrivance
- Author
-
Karan Goyal, Prateek Thakral, Deepak Garg, Sakshi, and Tarun Kumar
- Subjects
business.industry ,Computer science ,Plain text ,020206 networking & telecommunications ,Cryptography ,Caesar cipher ,02 engineering and technology ,computer.file_format ,Encryption ,ASCII ,Symmetric-key algorithm ,0202 electrical engineering, electronic engineering, information engineering ,Key (cryptography) ,Cryptosystem ,020201 artificial intelligence & image processing ,Arithmetic ,business ,computer - Abstract
In the field of cryptography, to make cryptosystems more secure, this paper deeply explores modulus operations on integral values followed by ASCII value generation on plain-text characters. On this basis, an extended algorithm is proposed. Its encryption technique consists of an extended combination of the Vigenere and Caesar ciphers, which is the main key feature of the algorithm; decryption of the text is then performed using the ASCII algorithm together with a substitution methodology. The algorithm was developed on the basis of an inspection of various research papers, and reviews were carried out to make the system more reliable. In the proposed algorithm, a modified Vigenere table and ASCII values are used to reduce the number of steps, decreasing complexity while providing a more secure approach to cryptography.
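A plain ASCII-range Vigenere, sketched below, shows the core shift-by-key-character modular arithmetic that such an extended combination builds on (the 95-symbol printable-ASCII alphabet is an assumption for illustration; the paper's modified table may differ):

```python
def vigenere(text, key, decrypt=False):
    """Vigenere cipher over the printable ASCII range (codes 32-126)."""
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - 32   # key character sets the shift
        if decrypt:
            shift = -shift
        out.append(chr((ord(ch) - 32 + shift) % 95 + 32))
    return "".join(out)

msg = "Attack at dawn!"
ct = vigenere(msg, "KEY")
assert ct != msg
assert vigenere(ct, "KEY", decrypt=True) == msg
```

A Caesar cipher is the degenerate case with a one-character key; combining the two, as the paper does, layers a fixed shift on top of the key-dependent one.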
- Published
- 2019
36. A New Signcryption Scheme Based on Elliptic Curves
- Author
-
Zhijuan Jia, Cui Wenjun, Hu Mingsheng, Bei-Gong, and Li-peng Wang
- Subjects
Discrete mathematics ,Computer science ,Plain text ,business.industry ,Hash function ,computer.file_format ,Public-key cryptography ,Forward secrecy ,Discrete logarithm ,Ciphertext ,Multiplication ,business ,computer ,Signcryption - Abstract
Based on the intractability of the discrete logarithm problem in ECC and of reversing a one-way hash function, this paper presents a signcryption scheme with public verifiability and forward security. In the security proof, unforgeability ensures that an attacker cannot create a valid ciphertext. We verify the cipher text \( c \) instead of the plain text \( m \) in the verification phase, which protects \( m \) and makes the proposed scheme confidential; thus, the proposed scheme has the property of public verifiability. The scheme also ensures that even if the sender's private key is compromised, the attacker cannot recover the original message \( m \) from the cipher text \( (c,R,s) \). By the performance analysis, our proposed scheme mainly uses modular multiplication. Compared with Zhou's scheme, the number of modular multiplications is reduced by one in the signcryption phase, which leads to a significant increase in computation speed. Moreover, the signature length is reduced by \( 2|n| \) compared with Zhou's scheme; in other words, the theoretical minimum complexity is reached. This gives the scheme higher security and wider applicability.
- Published
- 2019
37. Regular Expressions in Python
- Author
-
John Hunt
- Subjects
Computer science ,Plain text ,Programming language ,Data_FILES ,Regular expression ,computer.file_format ,Python (programming language) ,computer.software_genre ,computer ,computer.programming_language - Abstract
Regular expressions are a very powerful way of processing text while looking for recurring patterns; they are often used with data held in plain text files (such as log files), CSV files, and Excel files. This chapter introduces regular expressions, discusses the syntax used to define a regular expression pattern, and presents the Python re module and its use.
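A small example of the kind of log-file processing the chapter targets (the log lines and pattern are illustrative):

```python
import re

# Find timestamped ERROR lines in a plain-text log.
log = """2019-05-01 10:02:11 INFO  service started
2019-05-01 10:02:13 ERROR disk full
2019-05-01 10:02:15 ERROR connection lost"""

pattern = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ERROR (.+)$",
    re.MULTILINE)  # ^ and $ match at each line boundary
errors = pattern.findall(log)
assert errors == [("2019-05-01 10:02:13", "disk full"),
                  ("2019-05-01 10:02:15", "connection lost")]
```

Compiling the pattern once with `re.compile` and reusing it is the idiomatic approach when the same pattern is applied repeatedly.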
- Published
- 2019
38. Text Relation Extraction Using Sentence-Relation Semantic Similarity
- Author
-
Mohamed Lubani and Shahrul Azman Mohd Noah
- Subjects
Artificial neural network ,Plain text ,Computer science ,business.industry ,computer.file_format ,computer.software_genre ,Autoencoder ,Relationship extraction ,Semantic similarity ,Deep neural networks ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Sentence ,Natural language processing - Abstract
A huge amount of available information is stored as unstructured plain text. Relation Extraction (RE) is an important task in the process of converting unstructured resources into a machine-readable format. RE is usually considered a classification problem where a set of features is extracted from the training sentences and then passed to a classifier to predict the relation labels. Existing methods either manually design these features or automatically build them by means of deep neural networks. However, in many cases these features are general and do not accurately reflect the properties of the input sentences. In addition, these features are built only for the input sentences, with no regard to the features of the target relations. In this paper, we follow a different approach to the RE task. We propose an extended autoencoder model to automatically build vector representations for sentences and relations from their distinctive features. The built vectors are abstract continuous vector representations (embeddings) in which task-related features are preserved and noisy, irrelevant features are eliminated. Similarity measures are then used to compute sentence-relation semantic similarities from these representations, in order to label sentences with the most similar relations. The conducted experiments show that the proposed model is effective in labeling new sentences with their correct semantic relations.
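The final labeling step, picking the relation whose embedding is most similar to the sentence embedding, can be sketched with cosine similarity (the two-dimensional vectors and relation names here are hypothetical stand-ins for the autoencoder's learned embeddings):

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Hypothetical relation embeddings; the paper learns these jointly with
# sentence embeddings via an extended autoencoder.
relation_vecs = {"born_in": [0.9, 0.1], "works_for": [0.1, 0.9]}
sentence_vec = [0.8, 0.2]  # embedding of an input sentence

label = max(relation_vecs, key=lambda r: cosine(sentence_vec, relation_vecs[r]))
assert label == "born_in"
```

The key point is that both sentences and relations live in the same embedding space, so labeling reduces to a nearest-neighbor search.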
- Published
- 2019
39. The (Persistent) Threat of Weak Passwords: Implementation of a Semi-automatic Password-Cracking Algorithm
- Author
-
Christoph Meinel, Feng Cheng, Chris Pelchen, and David Jaeger
- Subjects
Password ,Authentication ,Software_OPERATINGSYSTEMS ,Point (typography) ,Computer science ,Plain text ,Hash function ,Password cracking ,computer.file_format ,Service provider ,ComputingMilieux_MANAGEMENTOFCOMPUTINGANDINFORMATIONSYSTEMS ,Cryptographic hash function ,computer ,Algorithm - Abstract
Password-based authentication remains the main method of user authentication in computer systems. In the case of a leak of the user database, the obfuscated storage of passwords is the last remaining protection of the credentials. The strength of a password determines how hard it is to crack the password hash and uncover the plain-text password. Internet users often ignore recommended password guidelines and choose weak passwords that are easy to guess. In addition, service providers do not warn users that their chosen passwords are not secure enough. In this work we present a semi-automatic password-cracking algorithm that orders and executes attacks on user-chosen passwords based on their efficiency. With our new approach, we are able to accelerate the cracking of password hashes and demonstrate that weak passwords are still a serious security risk. The intention of this work is to point out that the usage of weak passwords holds great dangers for both the user and the service provider.
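The ordering idea, running the cheapest and most promising attacks first, can be sketched as follows (the wordlists and the unsalted SHA-256 hashing are illustrative assumptions, not the paper's setup):

```python
import hashlib

def sha256_hex(pw):
    return hashlib.sha256(pw.encode()).hexdigest()

leaked = sha256_hex("123456")  # an obfuscated credential from a leaked database

# Attacks ordered by efficiency: the cheapest, most likely ones run first.
attacks = [
    ("top-wordlist", ["123456", "password", "qwerty"]),
    ("wordlist+digit", [w + d for w in ("password", "admin") for d in "0123456789"]),
]

def crack(target):
    """Try each attack in order; return the attack name and recovered password."""
    for name, candidates in attacks:
        for pw in candidates:
            if sha256_hex(pw) == target:
                return name, pw
    return None

assert crack(leaked) == ("top-wordlist", "123456")
```

The expensive strategies (full brute force, large rule-based mutations) would sit at the end of the list, so weak passwords fall quickly while strong ones cost the attacker the most.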
- Published
- 2019
40. Visual Meaningful Encryption Scheme Using Intertwinning Logistic Map
- Author
-
Muazzam A. Khan, Shehzad Amin Sheikh, Jan Sher Khan, Jawad Ahmad, and Saadullah Farooq Abbasi
- Subjects
Pixel ,business.industry ,Plain text ,Computer science ,Key space ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,Pattern recognition ,02 engineering and technology ,computer.file_format ,Data loss ,Encryption ,01 natural sciences ,Computer Science::Computer Vision and Pattern Recognition ,Computer Science::Multimedia ,0103 physical sciences ,Ciphertext ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Confusion and diffusion ,Logistic map ,business ,010301 acoustics ,computer ,Computer Science::Cryptography and Security - Abstract
Transmission of images over the Internet has increased exponentially in the last decade. However, the Internet is considered an insecure channel and hence may cause serious privacy issues. To overcome such privacy concerns, researchers secure image data from eavesdroppers through encryption. The final output of most traditional image encryption schemes is random-looking noise. However, attackers pay special attention to such random/noise-like images, and hence these images are vulnerable to different types of attacks. In this paper, we propose a novel encryption scheme that can transform plain-text image pixels into a visually meaningful encrypted image. In the proposed scheme, an intertwining logistic map is used to introduce confusion and diffusion in plain-text images. First, plain-text image pixels are permuted via random values obtained from the chaotic map. To strengthen the proposed scheme, random values obtained from the chaotic map are also XORed with the permuted image. In the final stage, a Gray Substitution Box (S-Box) is applied to obtain the final cipher-text image. All experimental results, including key space analysis, noise attack and data loss attack, are in favor of the proposed scheme.
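The confusion (permutation) stage can be sketched by argsorting values drawn from a chaotic map (for brevity this sketch uses the plain logistic map rather than the intertwining variant, and the seed values are assumptions):

```python
def chaotic_permutation(n, x0=0.37, r=3.999):
    """Derive a pixel permutation by sorting chaotic logistic-map values."""
    xs, x = [], x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return sorted(range(n), key=lambda i: xs[i])  # argsort gives a permutation

def permute(pixels, perm):
    return [pixels[i] for i in perm]

def unpermute(pixels, perm):
    out = [0] * len(perm)
    for j, i in enumerate(perm):
        out[i] = pixels[j]
    return out

pixels = list(range(8))
perm = chaotic_permutation(len(pixels))
scrambled = permute(pixels, perm)
assert unpermute(scrambled, perm) == pixels  # the confusion step is invertible
```

Because the permutation is fully determined by the secret seed, a receiver who knows (x0, r) regenerates the same permutation and inverts it; the XOR diffusion and S-box stages would follow this step.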
- Published
- 2018
41. Power Series Transform in Cryptology and ASCII
- Author
-
Muharrem Tuncay Gençoğlu and Dumitru Baleanu
- Subjects
Power series ,Laplace transform ,Computer science ,Plain text ,business.industry ,Cryptography ,computer.file_format ,ASCII ,Encryption ,Exponential function ,Ciphertext ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,business ,computer ,Algorithm ,Computer Science::Cryptography and Security - Abstract
This chapter introduces a different cryptographic method that uses power series transform and ASCII codes. We produce a new algorithm for cryptology, use an expanded Laplace transformation of the exponential function for encrypting plain text, and use ASCII codes to support the confidentiality of the cipher text. We also show the corresponding inverse of the power series transform for decryption.
- Published
- 2018
42. Identifying Vulnerabilities and Attacking Capabilities Against Pseudonym Changing Schemes in VANET
- Author
-
Arunita Jaekel, Ikjot Saini, and Sherif Saad
- Subjects
021110 strategic, defence & security studies ,Vehicular ad hoc network ,Computer science ,Plain text ,0211 other engineering and technologies ,Eavesdropping ,02 engineering and technology ,computer.file_format ,010501 environmental sciences ,Pseudonym ,Computer security ,computer.software_genre ,01 natural sciences ,Identification (information) ,Metric (mathematics) ,computer ,0105 earth and related environmental sciences - Abstract
Vehicular communication discloses critical information about the vehicle. Associating this information with the driver puts the driver's privacy at risk. The broadcast of safety messages in plain text is essential for safety applications, but not secure with respect to the privacy of the driver. Many pseudonym-changing schemes have been proposed in the literature, yet their levels of privacy have not been compared. Our contributions in this paper are the identification of vulnerabilities in the existing pseudonym-changing schemes, determining the attacking capabilities of a local passive attacker, and a demonstration of the optimal case for an attacker deploying a network of eavesdropping stations with feasible attacking capabilities. We also provide an analysis and comparison of the different pseudonym-changing schemes, using a new metric to measure the tracking ability of the local passive attacker in highway and urban scenarios, as well as with varying numbers of attacking stations.
- Published
- 2018
43. Challenges of an Annotation Task for Open Information Extraction in Portuguese
- Author
-
Rafael Glauber, Marlo Souza, Cleiton Fernando Lima Sena, Leandro Souza de Oliveira, and Daniela Barreiro Claro
- Subjects
Computer science ,business.industry ,Plain text ,05 social sciences ,02 engineering and technology ,Limiting ,computer.file_format ,computer.software_genre ,language.human_language ,Task (project management) ,Set (abstract data type) ,Information extraction ,Annotation ,Resource (project management) ,0202 electrical engineering, electronic engineering, information engineering ,language ,020201 artificial intelligence & image processing ,Artificial intelligence ,0509 other social sciences ,Portuguese ,050904 information & library sciences ,business ,computer ,Natural language processing - Abstract
Open information extraction (Open IE) is the task of extracting facts from plain text without limiting the analysis to a predefined set of relationships. Although a significant number of studies have focused on this problem in recent years, there is a lack of available linguistic resources for languages other than English. An essential resource for the evaluation of Open IE methods is an annotated corpus. In this work, we present the challenges involved in the creation of a golden-set corpus for the Open IE task in the Portuguese language. We describe our methodology, an annotation tool to support the task, and our results from performing this annotation task on a small validation corpus.
- Published
- 2018
44. Citation Field Learning by RNN with Limited Training Data
- Author
-
Jianzhong Qi, Xinxing Xu, Yiqing Zhang, Yimeng Dai, and Rui Zhang
- Subjects
Training set ,Artificial neural network ,Computer science ,business.industry ,Plain text ,String (computer science) ,02 engineering and technology ,computer.file_format ,Machine learning ,computer.software_genre ,Field (computer science) ,Task (project management) ,Recurrent neural network ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,Citation ,business ,GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries) ,computer - Abstract
Citation field learning is the task of segmenting a citation string into fields of interest, such as author, title, and venue, from plain text. We are interested in citation field learning from researchers' homepages. This task is challenging due to the free citation styles used by the different creators of the homepages. We aim to address the challenge with neural-network-based approaches that learn the citation field styles automatically. Neural-network-based approaches are data-hungry, but manually labeled training data is expensive to obtain. Therefore, we propose a novel framework that utilizes auto-generated training data and domain adaptation to enhance a manually labeled training dataset of limited size. At the same time, we design an adaptive Recurrent Neural Network (RNN) to learn citation styles from the enhanced training data effectively. Extensive experiments show that the proposed methods outperform state-of-the-art methods for citation field learning.
- Published
- 2018
45. On the Security of Stream Ciphers with Encryption Rate $$\frac{1}{2}$$
- Author
-
Michele Elia
- Subjects
Scheme (programming language) ,Degree (graph theory) ,Plain text ,business.industry ,Computer science ,computer.file_format ,Encryption ,Secrecy ,Projective plane ,Arithmetic ,business ,computer ,Stream cipher ,computer.programming_language - Abstract
Based on the geometry of finite projective planes, a secret-key encryption scheme which offers an exactly computable degree of secrecy is described. This target is achieved at the cost of an encryption rate equal to \(\frac{1}{2}\), as in one-time-pad encryption, but the devised scheme avoids the burden of exchanging and destroying long keys. It is also shown that knowledge of pieces of plain text does not significantly reduce the degree of secrecy; further, the cost of possible plain-text attacks is under the designer’s control, and can be made as high as desired.
- Published
- 2018
46. Factors Impacting the Label Denoising of Neural Relation Extraction
- Author
-
Yang Ji, Tingting Sun, and Chunhong Zhang
- Subjects
Focus (computing) ,Plain text ,business.industry ,Computer science ,Noise reduction ,02 engineering and technology ,computer.file_format ,Machine learning ,computer.software_genre ,Relationship extraction ,Variety (cybernetics) ,03 medical and health sciences ,ComputingMethodologies_PATTERNRECOGNITION ,0302 clinical medicine ,Knowledge base ,030221 ophthalmology & optometry ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Word (computer architecture) ,Sentence - Abstract
The goal of relation extraction is to obtain relational facts from plain text, which can benefit a variety of natural language processing tasks. To address the challenge of automatically labeling large-scale training data, a distant supervision strategy is applied to relation extraction by heuristically aligning entity pairs in plain text with a knowledge base. Unfortunately, the method is vulnerable to the noisy-label problem due to the incompleteness of the exploited knowledge base. Existing works focus on specific algorithms, but few summarize the commonalities between different methods and the factors influencing these denoising mechanisms. In this paper, we propose three main factors that impact the label denoising of distantly supervised relation extraction: labeling assumption, prior knowledge and confidence level. To analyze how these factors influence denoising effectiveness, we build a unified neural framework with word, sentence and label denoising modules for relation extraction. We then conduct experiments to evaluate and compare these factors across ten neural schemes. In addition, we discuss typical cases of these factors and find that influential word-level prior knowledge and partial confidence in distantly supervised labels can significantly affect denoising performance. These findings can provide researchers with more insight into distantly supervised relation extraction.
- Published
- 2018
47. DMSD-FPE: Data Masking System for Database Based on Format-Preserving Encryption
- Author
-
Shimeng Wei, Mingming Zhang, Zhonghao Guo, Zheli Liu, Pu Song, Guiyang Xie, and Zijing Cheng
- Subjects
Database ,Plain text ,Computer science ,business.industry ,computer.file_format ,computer.software_genre ,Encryption ,Masking (Electronic Health Record) ,Filesystem-level encryption ,Format-preserving encryption ,Ciphertext ,Referential integrity ,business ,computer ,Data masking - Abstract
Traditional data masking systems cannot provide reversible operations for databases, and they destroy the referential integrity of the database. To solve these problems, we present a new data masking system based on format-preserving encryption (DMSD-FPE). This paper presents its model and highlights appropriate masking algorithms for different databases. DMSD-FPE guarantees that the ciphertext has the same format as the plaintext, provides reversible operations for databases, and preserves referential integrity. Furthermore, the experiments demonstrate that the system is efficient enough for practical use.
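The two properties the abstract claims — ciphertext in the same format as the plaintext, and reversibility — can be illustrated with a deliberately simplified sketch. Real format-preserving encryption uses standardized constructions (e.g. the FF1 mode); the keyed digit permutation below is only a toy stand-in that demonstrates the format and round-trip behavior:

```python
import random

def digit_perm(key):
    """Derive a keyed permutation of the digits 0-9 (toy, not FF1)."""
    digits = list("0123456789")
    shuffled = digits[:]
    random.Random(key).shuffle(shuffled)
    return dict(zip(digits, shuffled))

def mask(value, key):
    """Map digits through the keyed permutation; leave separators intact."""
    perm = digit_perm(key)
    return "".join(perm.get(ch, ch) for ch in value)

def unmask(value, key):
    """Invert the permutation to recover the original value."""
    inv = {v: k for k, v in digit_perm(key).items()}
    return "".join(inv.get(ch, ch) for ch in value)

card = "4111-1111-1111-1111"
masked = mask(card, key=42)
```

Because the mapping is a fixed permutation per key, length, separators, and character class are preserved, and equal inputs mask to equal outputs, which is what keeps referential integrity across tables intact.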
- Published
- 2017
48. Mining Temporal Causal Relations in Medical Texts
- Author
-
Alejandro Sobrino, Cristina Puente, and José A. Olivas
- Subjects
Interpretation (logic) ,Conjecture ,Plain text ,business.industry ,Computer science ,02 engineering and technology ,computer.file_format ,computer.software_genre ,Causality ,Image (mathematics) ,020204 information systems ,Node (computer science) ,0202 electrical engineering, electronic engineering, information engineering ,Arrow ,020201 artificial intelligence & image processing ,Artificial intelligence ,business ,computer ,Natural language processing ,Sentence - Abstract
Causal sentences are a central part of medical explanations, giving the causes of diseases or describing the effects of medical treatments. In medicine, causal association is frequently subject to time restrictions: some drugs must be taken before or after meals, 'after' and 'before' being temporal constraints. We therefore conjecture that medical papers frequently include temporal causal sentences. Causality involves a transfer of qualities from the cause to the effect, denoted by a directed arrow; an arrow connecting a cause node with an effect node forms a causal graph. Causal graphs are a graphical way to show the causal dependencies that a sentence expresses in plain text. In this paper, we present several programs that extract temporal causal sentences from medical Internet resources and convert them into their equivalent causal graphs, providing an enlightening image of the relations a text describes and showing the cause-effect links and the temporal constraints that affect their interpretation.
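A minimal sketch of the sentence-to-graph step the abstract describes might look like the following. The single "X causes Y (after/before Z)" pattern is a hypothetical simplification for illustration; the paper's actual extraction programs are not specified here:

```python
import re

# Hypothetical pattern: "<cause> causes <effect>", optionally followed by a
# temporal qualifier such as "after meals" or "before surgery".
PATTERN = re.compile(
    r"(?P<cause>.+?) causes (?P<effect>.+?)"
    r"(?: (?P<when>after|before) (?P<anchor>\w+))?\."
)

def to_causal_graph(sentence):
    """Return one cause->effect edge, with an optional temporal constraint."""
    m = PATTERN.match(sentence)
    if not m:
        return None
    edge = {"cause": m["cause"], "effect": m["effect"]}
    if m["when"]:
        edge["constraint"] = f'{m["when"]} {m["anchor"]}'
    return edge

plain = to_causal_graph("Smoking causes lung cancer.")
timed = to_causal_graph("Aspirin causes stomach irritation after meals.")
```

Each returned dictionary is one directed edge of the causal graph; the optional `constraint` field carries the temporal restriction attached to the arrow.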
- Published
- 2017
49. SIMSSP: Secure Instant Messaging System for Smart Phones
- Author
-
Kahtan Aziz, Saed Tarapiah, and Shadi Atalla
- Subjects
Short Message Service ,Application programming interface ,Computer science ,business.industry ,Plain text ,computer.file_format ,Encryption ,Telecommunications network ,Key (cryptography) ,Wireless ,business ,computer ,Secure transmission ,Computer network - Abstract
Handheld smart devices such as smartphones, Personal Digital Assistants (PDAs), and tablets are ubiquitous and touch almost every aspect of people's lives. In most cases, handheld devices can be categorized as messaging-centric, conveying information in many forms of electronic messages, i.e., e-mail, Short Message Service (SMS), and instant messaging (IM). IM software exchanges large amounts of messages (e.g., plain text, images, and files) over insecure communication networks such as wireless and Internet networks. IM data travel through communication networks across different locations, so the potential for unauthorized access, abuse, or fraud is not limited to a single location but can occur at any access point in the network. The objective of this work is to provide a secure, effective platform for encrypting and decrypting IM data. The proposed platform is based on a pre-shared secret key and the Vernam algorithm in order to guarantee secure transmission and storage for IM applications, through a well-established Application Programming Interface (API) that integrates seamlessly with existing IM software.
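The Vernam cipher the abstract names is a byte-wise XOR of the message with a pre-shared key. A minimal sketch, assuming the key has already been securely shared and is at least as long as the message (key generation and distribution, the hard part in practice, are out of scope here):

```python
def vernam(data: bytes, key: bytes) -> bytes:
    """XOR the message with the pre-shared key (Vernam / one-time-pad style)."""
    if len(key) < len(data):
        raise ValueError("pre-shared key must cover the whole message")
    return bytes(b ^ k for b, k in zip(data, key))

key = bytes(range(1, 33))          # stand-in for a securely pre-shared key
cipher = vernam(b"meet at noon", key)
plain = vernam(cipher, key)        # XOR is its own inverse
```

The same function serves for both encryption and decryption, which is why a single API call suffices on either end of the IM exchange.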
- Published
- 2017
50. A Performance Comparison of Feature Extraction Methods for Sentiment Analysis
- Author
-
Lai Po Hung and Rayner Alfred
- Subjects
Phrase ,010308 nuclear & particles physics ,Computer science ,Plain text ,business.industry ,Bigram ,Feature extraction ,Sentiment analysis ,02 engineering and technology ,computer.file_format ,computer.software_genre ,Part of speech ,01 natural sciences ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Trigram ,Artificial intelligence ,business ,computer ,Natural language processing ,Sentence - Abstract
Sentiment analysis is the task of classifying documents according to their sentiment polarity. Before sentiment documents can be classified, plain text documents need to be transformed into workable data for the system; this step is known as feature extraction. Feature extraction produces text representations enriched with information in order to achieve better classification results. The experiment in this work investigates the effects of applying different sets of extracted features and discusses their behavior in sentiment analysis. The feature extraction methods examined are unigrams, bigrams, trigrams, Part-Of-Speech (POS), and SentiWordNet. The unigram, POS, and SentiWordNet features are word-based, whereas bigrams and trigrams are phrase-based. The results of the experiment show that phrase-based features are more effective for sentiment analysis, producing much higher accuracies than word-based features. This may be because word-based features disregard the sentence structure and word order of the original text, distorting its meaning, whereas bigram and trigram features retain some of the sentence sequence and thus represent the text better.
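The unigram, bigram, and trigram features the abstract compares are all sliding windows over the token sequence; a short sketch (the sample sentence is illustrative):

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the movie was not good".split()
unigrams = ngrams(tokens, 1)
bigrams  = ngrams(tokens, 2)   # "not good" keeps the negation together
trigrams = ngrams(tokens, 3)
```

The bigram "not good" illustrates the abstract's point: the unigram features "not" and "good" in isolation lose the negated sentiment that the phrase-based window preserves.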
- Published
- 2017