Descriptor: "*UNICODE (Computer character set)" / Publication Year Range: Last 50 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"*UNICODE (Computer character set)"' showing total 204 results


1. Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis.

Author: Struppek, Lukas, Hintersdorf, Dominik, Friedrich, Felix, Brack, Manuel, Schramowski, Patrick, and Kersting, Kristian
Subjects: GLYPHS (Graphic methods), LANGUAGE models, STABLE Diffusion, UNICODE (Computer character set), QUALITATIVE research
Abstract: Models for text-to-image synthesis, such as DALL-E 2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of concepts and styles when conditioned on textual descriptions. However, these models adopt cultural characteristics associated with specific Unicode scripts from their vast amount of training data, which may not be immediately apparent. We show that by simply inserting single non-Latin characters in the textual description, common models reflect cultural biases in their generated images. We analyze this behavior both qualitatively and quantitatively and identify a model's text encoder as the root cause of the phenomenon. Such behavior can be interpreted as a model feature, offering users a simple way to customize the image generation and reflect their own cultural background. Yet, malicious users or service providers may also try to intentionally bias the image generation. One goal might be to create racist stereotypes by replacing Latin characters with similarly-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
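
The homoglyph substitution described in this abstract can be illustrated with a short sketch. It uses only Python's standard `unicodedata` module; the function name `non_latin_letters` and the name-prefix check are illustrative assumptions, not the detection or unlearning method proposed in the paper.

```python
import unicodedata

def non_latin_letters(text):
    """Return (index, char, Unicode name) for each letter whose
    Unicode name does not start with 'LATIN'."""
    flagged = []
    for i, ch in enumerate(text):
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            if not name.startswith("LATIN"):
                flagged.append((i, ch, name))
    return flagged

# U+0430 CYRILLIC SMALL LETTER A looks like Latin "a" in most fonts.
prompt = "A photo of \u0430n actress"
print(non_latin_letters(prompt))
```

A check like this only reports script mixing; it cannot tell a malicious substitution from legitimate multilingual text.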

2. CuneiML: A Cuneiform Dataset for Machine Learning.

Author: Chen, Danlu, Agarwal, Aditi, Berg-Kirkpatrick, Taylor, and Myerston, Jacobo
Subjects: MACHINE learning, TRANSLITERATION, METADATA, UNICODE (Computer character set), ELECTRONIC data processing
Abstract: The cuneiform writing system holds a vast reservoir of ancient literature, encompassing over 3000 years of history. Originating around the mid-fourth millennium BCE and enduring until the late first millennium BCE, cuneiform writing spans various genres such as administrative, legal, medical, and scientific documents, among others. This article introduces a curated dataset, CuneiML, featuring 38,947 high-resolution 2D photos of Sumerian and Akkadian cuneiform tablets, accompanied by their cuneiform Unicode transcriptions, transliterations, lineart, and metadata. This dataset aims to support the development of machine learning tools for processing and analyzing Sumerian and Akkadian cuneiform artifacts - e.g. for automatically classifying genre, provenance, or period from unannotated tablet images. Thus, CuneiML is designed with consistency of format as a primary concern. Specifically, CuneiML is a result of meticulously preprocessing, segmenting, filtering, and re-transliterating data that is available online in the Cuneiform Digital Library Initiative (CDLI) collection. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

3. LATEX News.

Subjects: LATEX (Computer software), PDF (Computer file format), UNICODE (Computer character set), DOCUMENTATION, COPYING
Abstract: The article focuses on new functionality, commands, and improvements introduced in the latest LaTeX release, which is version 2023-06-01. Topics include new functionality for LaTeX Tagged Portable Document Format (PDF), providing copy and show functions for environments, and improvements in handling Unicode case changing, highlighting bug fixes and documentation improvements.
Published: 2023
Full Text: View/download PDF

4. The gods smile at me: The LATEX Companion, third edition, and ChatGPT.

Author: Grätzer, George
Subjects: LATEX (Computer software), CHATGPT, UNICODE (Computer character set), COMMANDS (Logic), LANGUAGE models
Abstract: The article focuses on the developments in LaTeX, highlighting its transition to Unicode Transformation Format (UTF)-8, changes in BibTeX, and significant improvements. Topics include the release of "The LaTeX Companion, third edition," which comprehensively covers LaTeX and its packages, the use of ChatGPT, and discussing LaTeX topics and English usage.
Published: 2023
Full Text: View/download PDF

5. Exploiting Vector Instructions with Generalized Stream Fusion.

Author: Mainland, Geoffrey, Leshchinskiy, Roman, and Peyton Jones, Simon
Subjects: *HASKELL (Computer program language), *COMPUTER programming, *ARRAY processing, *UNICODE (Computer character set), *VECTOR processing (Computer science), *SIMD (Computer architecture)
Abstract: Ideally, a program written as a composition of concise, self-contained components should perform as well as the equivalent hand-written version where the functionality of what was many components has been manually combined into a monolithic implementation. That is, programmers should not have to sacrifice code clarity or good software engineering practices to obtain performance--we want compositionality without a performance penalty. This work shows how to attain this goal for high-level Haskell in the domain of sequence-processing functions, which includes applications such as array processing. Prior work on stream fusion shows how to automatically transform some high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, do not perform well within the stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the stream fusion framework at all. We describe generalized stream fusion, which solves these issues through a careful choice of stream representation. Benchmarks show that high-level Haskell code written using our compiler and libraries can produce code that is faster than both compiler- and hand-vectorized C. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

6. The Unicode Cookbook for Linguists

Author: Moran, Steven and Cysouw, Michael
Subjects: Language and languages--Orthography and spelling, Unicode (Computer character set)
Abstract: This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection between the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with a consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. Thus we bring to light common, but not always transparent, pitfalls which researchers face when working with Unicode and IPA. Having identified and overcome these pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools to work with languages using orthography profiles that describe author- or document-specific orthographic conventions. In this cookbook we describe a formal specification of orthography profiles and provide recipes using open source tools to show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research. This book is a prime example of open publishing as envisioned by Language Science Press. It is open access, has accompanying open source software, has open peer review, versioning and so on. The book is continuously being improved. You can follow the development on https://github.com/unicode-cookbook/cookbook/releases/latest
Published: 2018
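
The idea of segmenting text against an orthography profile can be sketched as a greedy longest-match tokenizer. This is a simplified illustration using only the standard library; the `segment` function and the sample profile are hypothetical and are not the API of the tools that accompany the book.

```python
import unicodedata

def segment(text, graphemes):
    """Split text into the graphemes listed in a profile,
    always preferring the longest grapheme that matches."""
    # Normalize both sides so precomposed and decomposed forms compare equal.
    text = unicodedata.normalize("NFD", text)
    units = sorted(
        (unicodedata.normalize("NFD", g) for g in graphemes if g),
        key=len,
        reverse=True,
    )
    out, i = [], 0
    while i < len(text):
        for g in units:
            if text.startswith(g, i):
                out.append(g)
                i += len(g)
                break
        else:
            # A character outside the profile usually signals a data error.
            raise ValueError(f"no grapheme matches at position {i}: {text[i]!r}")
    return out

profile = ["t", "t\u0283", "a", "\u014b"]  # hypothetical profile: t, tʃ, a, ŋ
print(segment("t\u0283a\u014b", profile))
```

Normalizing to NFD before matching addresses one of the less visible pitfalls: a precomposed character and its base-plus-combining-mark sequence look identical but compare unequal as raw strings.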

7. The Structure and Content of MARC 21 Records in the Unicode Environment.

Author: Aliprand, Joan M.
Subjects: *UNICODE (Computer character set), *CIPHERS, *LATIN language, *LIBRARIES, *DATABASES, *MARC formats, *CHINESE language, *JAPANESE language
Abstract: The article discusses the effects of using Unicode in MARC 21 records. Specifically, it highlights the accommodation of the greatly expanded character repertoire, which includes not only additional non-Roman scripts but also many more Latin script characters. MARC 21 is flexible with respect to the structure of records containing multi-script data, in order to accommodate different needs worldwide. MARC 21 specifies two record models, designated A and B. Model A is a record in the preferred script, which can be augmented with specially designated fields holding other scripts; Model B is a record that can have data in any script in regular fields. Model A is largely used for MARC 21 bibliographic records with Latin as the preferred script. Model B records can be viewed by libraries with devices capable of displaying Chinese, Japanese, and Korean. For a Model A record encoded in Unicode with Latin as the preferred script, the definition of the Latin script data content of regular fields should be controlled by the Script Names property of Unicode. Creating consistent records for MARC 21 interchange necessitates that the data content in Model A records with Latin as the preferred script be controlled.
Published: 2005
Full Text: View/download PDF

8. Library Systems and Unicode: A Review of the Current State of Development.

Author: Tull, Laura
Subjects: *UNICODE (Computer character set), *ACADEMIC libraries, *COMPUTER industry
Abstract: Unicode, a standard developed in 1991, defines a universal character set for encoding the characters in the scripts of the world's languages. Unicode implementation has been gaining momentum in recent years especially in the software and computer industry. Academic libraries with collections of materials in multiple languages will want to take advantage of Unicode for display and searching of materials in non-Latin scripts such as Arabic, Hebrew, and Chinese. The focus of this article is a review of Unicode and its incorporation in library systems. [ABSTRACT FROM AUTHOR]
Published: 2002

9. UNICODE (HEX).

Author: TAN, MARYLYN
Subjects: UNICODE (Computer character set)
Published: 2018

10. ユニコード戦記

Author: 小林竜生
Subjects: Unicode (Computer character set)
Published: 2011

11. Google Expands 'Emoji Kitchen' for World Emoji Day.

Author: Hutchinson, Andrew
Subjects: SHORT videos, UNICODE (Computer character set), SEARCH engines, WEB browsers
Published: 2024

12. On bottom accents in OpenType math.

Author: Hagen, Hans and Sundqvist, Mikael P.
Subjects: STRESS (Linguistics), MATHEMATICS, UNICODE (Computer character set), NUMERICAL calculations, FONTS & typefaces
Abstract: The article focuses on addressing issues related to bottom accents in OpenType math, specifically concerning the placement of these accents. Topics include variations in how different fonts handle bottom accents, the absence of a standardized approach by Microsoft, and the proposed solution for calculating accent width and anchors.
Published: 2023
Full Text: View/download PDF

13. Motivic and real étale stable homotopy theory.

Author: Bachmann, Tom
Subjects: *HOMOTOPY theory, *SHEAF theory, *TOPOLOGY, *MORPHISMS (Mathematics), *UNICODE (Computer character set)
Abstract: Let $S$ be a Noetherian scheme of finite dimension and denote by $\rho \in [1, \mathbb{G}_m]_{\mathrm{SH}(S)}$ the (additive inverse of the) morphism corresponding to $-1 \in \mathcal{O}^{\times}(S)$. Here $\mathrm{SH}(S)$ denotes the motivic stable homotopy category. We show that the category obtained by inverting $\rho$ in $\mathrm{SH}(S)$ is canonically equivalent to the (simplicial) local stable homotopy category of the site $S_{\mathrm{r\acute{e}t}}$, by which we mean the small real étale site of $S$, comprised of étale schemes over $S$ with the real étale topology. One immediate application is that $\mathrm{SH}(\mathbb{R})[\rho^{-1}]$ is equivalent to the classical stable homotopy category. In particular this computes all the stable homotopy sheaves of the $\rho$-local sphere (over $\mathbb{R}$). As further applications we show that $D_{\mathbb{A}^1}(k,\mathbb{Z}[1/2]) \simeq DM_W(k)[1/2]$ (improving a result of Ananyevskiy–Levine–Panin), reprove Röndigs’ result that $\pi_i(1[1/\eta,1/2]) = 0$ for $i = 1,2$ and establish some new rigidity results. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

14. WIGNER’S THEOREM IN $\mathcal{L}^{\infty}(\Gamma)$-TYPE SPACES.

Author: JIA, WEIKE and TAN, DONGNI
Subjects: *UNICODE (Computer character set), *CHARACTER sets (Data processing), *ISOCHORIC processes, *THERMODYNAMICS, *INCOMPRESSIBLE flow, *VOLUME (Cubic content)
Abstract: We investigate surjective solutions of the functional equation $$\{\Vert f(x)+f(y)\Vert ,\Vert f(x)-f(y)\Vert \}=\{\Vert x+y\Vert ,\Vert x-y\Vert \}\quad (x,y\in X),$$ where $f:X\rightarrow Y$ is a map between two real $\mathcal{L}^{\infty}(\Gamma)$-type spaces. We show that all such solutions are phase equivalent to real linear isometries. This can be considered as an extension of Wigner’s theorem on symmetry for real $\mathcal{L}^{\infty}(\Gamma)$-type spaces. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

15. Unicode with rules Arabic text data hiding.

Author: Khami, Mohammed Jawar
Subjects: UNICODE (Computer character set), REVERSIBLE data hiding (Computer science), ARABIC manuscripts, ARABIC document writing, ALGORITHMS
Abstract: Text documents are an unavoidable form of communication among humans, yet research papers on text hiding techniques are fewer than those on techniques for other cover objects. This is because text documents have relatively few redundant features that can be used to hide data, compared with other cover object types (image, audio, and video). In this paper, a text hiding (text-in-text data hiding) algorithm is proposed and coded in Matlab (m-file) form. The algorithm represents a new technique and has many advantages over existing text-in-text hiding techniques. These include the use of Arabic or mixed Arabic-English text for both the secret and the cover text, with the aid of two nonprinting Unicode characters, and the application of new hiding rules based on the Arabic writing system. The cover text is classified into groups of Arabic letters, each with specific features, and hiding text between letters from these groups is controlled by the new hiding rules. Matlab programs for embedding and extracting the secret text according to the new approach were tested, and the outputs were found to be very satisfactory. Both secret and cover text have the same original format and text configuration. [ABSTRACT FROM AUTHOR]
Published: 2018

16. LOSSLESS TEXT COMPRESSION FOR UNICODE TAMIL DOCUMENTS.

Author: Vijayalakshmi, B. and Sasirekha, N.
Subjects: TAMIL language, UNICODE (Computer character set), DATA compression, ASCII (Character set), COMPUTER storage capacity
Abstract: Data compression for the world's languages, including Indian languages, is in high demand. Tamil is one of the longest-surviving classical languages in the world. Use of Tamil for communication and storage has increased with the digitization of government documents and orders. The lossless text compression process for Tamil documents involves substituting ASCII characters for Unicode Tamil characters, since an ASCII character occupies one byte whereas a Unicode character occupies between 1 and 4 bytes, depending on the encoding used for file storage. The decompression process reverses the compression technique, i.e., it replaces ASCII characters with Unicode characters. This paper describes the architecture of the compression and decompression processes for Tamil text documents. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF
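
The size difference this abstract relies on can be checked directly. The sketch below uses only built-in string encoding; the helper name `encoded_sizes` and the sample characters are illustrative choices and are not taken from the paper.

```python
def encoded_sizes(ch):
    """Number of bytes a single character occupies in common Unicode encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

samples = {
    "ASCII letter A": "A",
    "Tamil letter A (U+0B85)": "\u0b85",
    "Emoji (U+1F600)": "\U0001f600",
}

for label, ch in samples.items():
    print(label, encoded_sizes(ch))
```

In UTF-8, every character in the Tamil block (U+0B80 to U+0BFF) takes three bytes, which is why replacing frequent Tamil characters with single-byte codes can shrink a file.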

17. Scripta Bulgarica: Digital Library of Medieval Bulgarian Literature.

Author: MILTENOVA, ANISSAVA
Subjects: DIGITAL libraries, BULGARIAN literature, MEDIEVAL literature, UNICODE (Computer character set), LITERARY form, BYZANTINE authors, HAGIOGRAPHY
Published: 2018
Full Text: View/download PDF

18. MANAGING TECHNOLOGY: Unicode: The Universal Character Set Part 2: Unicode in Library Systems.

Author: Coyle, Karen
Subjects: *UNICODE (Computer character set), *MACHINE-readable bibliographic data, *LIBRARIES, *LIBRARY catalog management, *LIBRARY catalogs, *CATALOGING, *LANGUAGE & languages, *CATALOGERS, *DATABASES
Abstract: The article discusses the use by libraries of Unicode, a universal character set designed to represent all languages on computers. The machine-readable cataloging (MARC) record provides for a range of accented characters through the 256 values available in a byte for character expression. New methods were added to the MARC record and to cataloging systems that permitted libraries to encode some non-Latin-based alphabetic languages. Unicode has allowed the transformation of library catalogs into truly multilingual databases. Currently, the scripts available to catalogers are those available in the MARC 21 character set, MARC-8. However, there have been proposals to convert catalog records from MARC-8 to Unicode, disproving concerns about the cost and complexity of pursuing such a process.
Published: 2006
Full Text: View/download PDF

19. Unicode: The Universal Character Set.

Author: Coyle, Karen
Subjects: *UNICODE (Computer character set), *CHARACTER sets (Data processing), *ALPHABET -- Data processing, *DATA processing of signs & symbols, *ASCII (Character set), *COMPUTER software, *ENGLISH language, *LIBRARY science, *INFORMATION technology
Abstract: The article presents information about Unicode, known as the universal character set. Computers have become an indispensable part of life. Until recently, the dated ASCII standard was the key character set in use. ASCII has numerous limitations: because it uses only seven of the eight bits in a byte, it has no room to expand beyond the characters needed for the English language. To respond to the needs of an international customer base, the Unicode character set was developed. A universal character set has numerous advantages because of the large number of characters that languages share due to their common origins. Unicode offers a new way to encode languages for computer manipulation. Its encoding allows over one million characters to be defined. It covers not only currently used languages but also ancient languages studied by historians and scholars. The advantages of a single character set for all languages are particularly evident in libraries, where works occur in different scripts.
Published: 2005
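
The two capacity figures mentioned in this abstract (seven bits for ASCII, over one million for Unicode) follow from simple arithmetic, sketched here. The constant names are illustrative; the values come from the sizes of the ASCII and Unicode code spaces.

```python
import sys

# Seven bits give 2**7 distinct values.
ASCII_CODE_POINTS = 2 ** 7

# Unicode code points run from U+0000 to U+10FFFF inclusive.
UNICODE_CODE_POINTS = 0x10FFFF + 1

# The surrogate range U+D800..U+DFFF is reserved for UTF-16 and never holds characters.
SURROGATE_CODE_POINTS = 0xDFFF - 0xD800 + 1

print(ASCII_CODE_POINTS)                            # 128
print(UNICODE_CODE_POINTS)                          # 1114112
print(UNICODE_CODE_POINTS - SURROGATE_CODE_POINTS)  # 1112064
print(sys.maxunicode == 0x10FFFF)                   # True
```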

20. Cataloging, Character Sets and the Longue Durée.

Author: Riley, Charles L.
Subjects: *CATALOGING, *LIBRARY catalogs, *LIBRARY technical services, *CHARACTER sets (Data processing), *UNICODE (Computer character set), *CORPORATE headings (Cataloging)
Abstract: The article shares information on developments in library technical services, particularly on cataloging and character sets. Topics discussed are the character encoding standard called Unicode, authorized name headings that use expanded range of scripts and appear in catalogs, challenges in the conversion from MARC-8 character set to 8-bit Unicode Transformation Format (UTF-8) encoding, and variation in the capacity of operating system and browsers when it comes to Unicode processing.
Published: 2019

21. Decomposition of multi-output functions oriented to configurability of logic blocks.

Author: KUBICA, M. and KANIA, D.
Subjects: *LOGIC circuit synthesis (Electronic design), *FIELD programmable gate arrays, *ADAPTIVE computing systems, *UNICODE (Computer character set), *ALGORITHMS
Abstract: The main goal of the paper is to present a logic synthesis strategy dedicated to an LUT-based FPGA. New elements of the proposed synthesis strategy include: an original method of function decomposition, non-disjoint decomposition and technology mapping dedicated to configurability of logic blocks. The aim of all of the proposed synthesis approaches is the sharing of appropriately configured logic blocks. Innovation of the methods is based on the way of searching decomposition, which relies on multiple cutting of an MTBDD diagram describing a multi-output function. The essence of the proposed algorithms rests on the method of unicoding dedicated to sharing resources, searching non-disjoint decomposition on the basis of the partition of root tables and choosing the levels of diagram cutting that will guarantee the best mapping to complex logic blocks. The methods mentioned above were implemented in the MultiDec tool. The efficiency of the analyzed methods was experimentally confirmed by comparing the synthesis results with both academic and commercial tools. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

22. Optimising unicode regular expression evaluation with previews.

Author: Chivers, Howard
Subjects: UNICODE (Computer character set), SIMULATION methods & models, LOGARITHMIC functions, LIBRARIES, COMPUTER algorithms
Abstract: The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user-selectable character encodings. An established design approach was used: a simulated non-deterministic automaton (NFA) implemented as a virtual machine, avoiding exponential cost functions in either space or time. A deterministic automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of a regular expression Preview which provides a summary of all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and can be used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview and describes and evaluates several optimizations using this construct. They provide significant speed improvements accrued from fast scanning of anchor positions, avoiding retesting of repeated strings in unanchored searches and efficient searching of multiple alternate expressions which in the case of keyword searching has a time complexity which is logarithmic in the number of words to be searched. Copyright © 2016 John Wiley & Sons, Ltd. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

23. Importance and Challenges of Social Media Text.

Author: Singh, Shailendra Kumar and Sachan, Manoj Kumar
Subjects: SOCIAL media, TEXT processing (Computer science), NATURAL language processing, UNICODE (Computer character set), CODE switching (Linguistics)
Abstract: The rapid growth of social media such as Twitter, Facebook, WhatsApp, and Messenger has increased the amount of unstructured data (texts, images, videos) available on the internet. These texts differ from traditional text. Text written on social media is called social media text. In India, more than 50% of comments on social media are written in Indian languages using Unicode and phonetic typing. Preprocessing these texts for Natural Language Processing (NLP) applications is a challenging task. This paper will help researchers understand the concepts of code mixing, social media text, and code-mixed text, as well as the various challenges of social media text. [ABSTRACT FROM AUTHOR]
Published: 2017

24. A HYBRID TEXT STEGANOGRAPHY APPROACH UTILIZING UNICODE SPACE CHARACTERS AND ZERO-WIDTH CHARACTER.

Author: Aman, Muhammad, Khan, Aihab, Ahmad, Basheer, and Kouser, Saeeda
Subjects: CRYPTOGRAPHY, UNICODE (Computer character set), DATA security
Abstract: This paper presents a steganographic approach utilizing Unicode space and zero-width characters. Existing techniques are less robust, are not sensitive against steganalysis, and attain low hiding capacity. The proposed technique overcomes these limitations. It offers high hiding capacity by using a lossless compression algorithm and embedding 4 bits per space, using any version of MS Word file as a stego carrier. Moreover, robustness is greatly improved by adding multiple layers of security, and sensitivity is provided by the addition of the SHA-1 algorithm. The experimental results verify that the proposed scheme increases capacity 4 times and creates 4 times smaller stego-text compared to the existing Unispach method. Moreover, transparency is not affected, which shows that our approach is best suited for large messages when high security is required. [ABSTRACT FROM AUTHOR]
Published: 2017
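
The general idea of hiding data in zero-width characters can be shown with a deliberately simple sketch. It is not the scheme from this paper (which embeds 4 bits per space and adds compression and SHA-1); the bit-to-character mapping, the function names, and the choice to place the payload after the first space are all assumptions made for illustration. The cover text is assumed to contain no zero-width characters of its own.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here to encode bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here to encode bit 1

def embed(cover, secret):
    """Hide the UTF-8 bytes of secret as invisible characters after the first space."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if b == "1" else ZERO for b in bits)
    head, sep, tail = cover.partition(" ")
    return head + sep + payload + tail

def extract(stego):
    """Collect the zero-width characters and decode them back into text."""
    bits = "".join("1" if ch == ONE else "0" for ch in stego if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

stego = embed("hello world", "hi")
print(len("hello world"), len(stego))  # visible text is unchanged, length grows
print(extract(stego))
```

One bit per invisible character is wasteful, which is the kind of capacity limit the paper's 4-bits-per-space design and compression step aim to improve on.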

25. The winding, heated, and absurdly technical oral history of the ginger emoji.

Author: MONE, GREGORY
Subjects: *EMOTICONS & emojis, *UNICODE (Computer character set), *SOCIAL media, *INTERNET terminology, *INTERNET users
Abstract: The article discusses the development of emojis to expand the physical appearance of the pictograms. Information about the evolution of emoji in Japan with pictographic language and stylized picture characters in 1999, is provided. Also discussed is the updated Unicode standard for the social media and Internet users.
Published: 2018

26. Fonts & Encodings

Author: Haralambous, Yannis and Horne, P. Scott
Subjects: Type and type-founding--Digital techniques, Computer fonts, Web typography, Unicode (Computer character set), Character sets (Data processing)
Abstract: This reference is a fascinating and complete guide to using fonts and typography on the Web and across a variety of operating systems and application software. Fonts & Encodings shows you how to take full advantage of the incredible number of typographic options available, with advanced material that covers everything from designing glyphs to developing software that creates and processes fonts. The era of ASCII characters on green screens is long gone, and industry leaders such as Apple, HP, IBM, Microsoft, and Oracle have adopted the Unicode Worldwide Character Standard. Yet, many software applications and web sites still use a host of standards, including PostScript, TrueType, TeX/Omega, SVG, Fontlab, FontForge, Metafont, Panose, and OpenType. This book explores each option in depth, and provides background behind the processes that comprise today's 'digital space for writing': Part I introduces Unicode, with a brief history of codes and encodings including ASCII. Learn about the morass of the data that accompanies each Unicode character, and how Unicode deals with normalization, the bidirectional algorithm, and the handling of East Asian characters. Part II discusses font management, including installation, tools for activation/deactivation, and font choices for three different systems: Windows, the Mac OS, and the X Window System (Unix). Part III deals with the technical use of fonts in two specific cases: the TeX typesetting system (and its successor, Ω, which the author co-developed) and web pages. Part IV describes methods for classifying fonts: Vox, Alessandrini, and Panose-1, which is used by Windows and the CSS standard. Learn about existing tools for creating (or modifying) fonts, including FontLab and FontForge, and become familiar with OpenType properties and AAT fonts. Nowhere else will you find the valuable technical information on fonts and typography that software developers, web developers, and graphic artists need to know to get typography and fonts to work properly.
Published: 2007

27. Data Representation with Unicode Standard in Presentation Layer of OSI

Author: International Symposium on Information Theory & Its Applications (1994 : Sydney, N.S.W.), Liu, Raymond, and Lions, John
Published: 1994

28. Unicode Explained

Author: Jukka K. Korpela
Subjects: Unicode (Computer character set)
Abstract: Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. There are hundreds of different encoding systems for mapping characters to numbers, but Unicode promises a single mapping. Unicode enables a single software product or website to be targeted across multiple platforms, languages and countries without re-engineering. It's no wonder that industry giants like Apple, Hewlett-Packard, IBM and Microsoft have all adopted Unicode. Containing everything you need to understand Unicode, this comprehensive reference from O'Reilly takes you on a detailed guide through the complex character world. For starters, it explains how to identify and classify characters - whether they're common, uncommon, or exotic. It then shows you how to type them, utilize their properties, and process character data in a robust manner. The book is broken up into three distinct parts. The first few chapters provide you with a tutorial presentation of Unicode and character data. It gives you a firm grasp of the terminology you need to reference various components, including character sets, fonts and encodings, glyphs and character repertoires. The middle section offers more detailed information about using Unicode and other character codes. It explains the principles and methods of defining character codes, describes some of the widely used codes, and presents code conversion techniques. It also discusses properties of characters, collation and sorting, line breaking rules and Unicode encodings. The final four chapters cover more advanced material, such as programming to support Unicode. You simply can't afford to be without the nuggets of valuable information detailed in Unicode Explained.
Published: 2006

29. On algebraic surfaces of general type with negative $c_{2}$.

Author: Gu, Yi
Subjects: *ALGEBRAIC surfaces, *PRIME numbers, *EULER characteristic, *SIGNED numbers, *UNICODE (Computer character set)
Abstract: We prove that for any prime number $p\geqslant 3$, there exists a positive number $\kappa_{p}$ such that $\chi(\mathcal{O}_{X})\geqslant \kappa_{p}c_{1}^{2}$ holds true for all algebraic surfaces $X$ of general type in characteristic $p$. In particular, $\chi(\mathcal{O}_{X})>0$. This answers a question of Shepherd-Barron when $p\geqslant 3$. [ABSTRACT FROM PUBLISHER]
Published: 2016
Full Text: View/download PDF

30. The Content Of Their Characters.

Author: Erard, Michael and Dorfman, Matt
Subjects: *UNICODE (Computer character set), *EMOTICONS & emojis, *ASCII (Character set)
Abstract: The article offers information on Unicode, whose mission was to bring the world's neglected languages into the digital sphere until emoji came along. It mentions that technology firms like Apple, Xerox, DEC, Hewlett-Packard and Kodak created their own proprietary encodings in order to work with more writing systems than ASCII was able to handle.
Published: 2017

31. Stata tip 129: Efficiently processing textual data with Stata's new Unicode features.

Author: Koplenig, Alexander
Subjects: *NATURAL language processing, *UNICODE (Computer character set)
Published: 2018
Full Text: View/download PDF

32. Fo & Fo: Forscher und Fonts, oder Probleme der multilingualen Textverarbeitung in der Slavistik.

Author: Podtergera, Irina
Subjects: CATALOGING of Slavic literature, MULTILINGUALISM & literature, FONTS & typefaces, TEXT processing (Computer science), GREEK literature, UNICODE (Computer character set), MEDIEVAL manuscripts
Abstract: This paper is intended to help Slavicists dealing with multilingual text. It presents an account of the experience gained while working on the SlaVaComp project. The main concern of the paper is the problems of multilingual text processing, specifically the use of characters in entering Greek and Church Slavonic texts into the computer. It also broaches the most common problems encountered when using Unicode characters. Finally, the paper discusses select text processing technologies based on the WYSIWYG principle which can be employed to represent medieval manuscripts correctly by means of Unicode. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

33. Email Address Internationalization: 例子@例子.中国.

Author: Gulbrandsen, Arnt and Yao, Jiankang
Subjects: EMAIL, UNICODE (Computer character set), CLIENT/SERVER computing equipment, EMAIL systems standards, EMAIL systems -- Design & construction
Abstract: For 30 years, Internet email addresses have used the format user@example.com, with the character set A-Z, 0-9, ., and -. Gradually mail has been extended to support bæ, löffel, and 中国 in most places, but not in addresses. Now Email Address Internationalization (EAI) work has added support for Unicode in addresses, simplifying syntax simultaneously. (To ensure that characters display correctly, please read the PDF version of this article.) [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

34. Unicode-based method for text steganography with malayalam text.

Author: Vidhya, P.M. and Paul, Varghese
Subjects: *UNICODE (Computer character set), *CRYPTOGRAPHY, *INFORMATION processing, *INFORMATION sharing, *DATA encryption, *COMPUTER algorithms
Abstract: Recent research on information hiding has concentrated mostly on linguistic steganography. Steganography is the art and science of hiding a message inside another message without arousing suspicion, so that the message can be detected only by its intended recipient. Nowadays, attention is focused on local-language, encryption-based steganography to provide high security for sharing secret information. In this paper, a steganography method is proposed for an Indian local language, Malayalam. The proposed method consists of a custom Unicode-based technique with embedding based on indexing, i.e. the original message is encoded to a Malayalam text with custom Unicode values generated for the Malayalam text. An embedding algorithm is then designed to mix the encoded original message with the Malayalam text. An experimental study was done to evaluate the efficiency of the proposed approach. A comparison of the proposed method against an existing method revealed that the proposed steganography method is more precise in the encoding process and balanced in the decoding process. The proposed method achieved a precision rate of .95 and a decoding rate of .81. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

35. UBISLABEL & UNDERSCOREENCODING: A NEW APPROACH FOR LABEL-ENCODING IN THE MULTILINGUAL WORLD WIDE WEB.

Author: Heckmann, Dominikus and Loskyll, Matthias
Subjects: MULTILINGUAL websites, ENCODING, UNICODE (Computer character set), SEMANTIC Web, WORLD Wide Web, UNIFORM Resource Identifiers
Abstract: Internationalization of identification names, for cities for instance, bears two major problems: which language to choose for the label and which character set to choose for those characters that are not covered by the ASCII character set. UbisLabel is a new approach to combine ideas for internationalizing the labels for identifiers on the Semantic Web and the Web in general. We introduce an inline syntax to put several, possibly multilingual labels into one string. UnderscoreEncoding is able to represent the full range of Unicode code points more compactly than existing textual UTF representations. It has been developed in order to represent special characters in an efficient manner without using any characters apart from alphanumerical letters plus the underscore. The idea is that such encoded labels can be attached directly to the identifiers (like URIs) without the need to be further encoded anywhere else in the Internet. [ABSTRACT FROM AUTHOR]
Published: 2009

36. Texting in Ancient Mayan Hieroglyphs: What Unicode will make possible.

Author: Machulak, Erica
Subjects: *MAYAN languages, *HIEROGLYPHICS, *UNICODE (Computer character set), *SYLLABLE (Grammar)
Abstract: The article discusses the history of Mayan hieroglyphs. It notes that the complexity of Mayan poses unique challenges to scholars who want to make it more widely accessible electronically, and that to do so Mayan will need to be included in the Unicode Standard. It also mentions that Carlos Pallán Gayol is working with Deborah Anderson to get this writing system into the Unicode Standard, including signs that indicate words, signs that indicate syllables, and signs that indicate both words and syllables.
Published: 2023

37. A Fast Input Method for Tibetan Based on Word in Unicode.

Author: Weilan Wang and Lingwang Kun
Subjects: UNICODE (Computer character set), DATABASES, TIBETAN language, SYLLABLE (Grammar), LANGUAGE glossaries, vocabularies, etc.
Abstract: A database is an essential resource for an intelligent Tibetan language input system in Unicode. This article describes the database of a Tibetan language input system, which includes syllables and vocabulary. The syllable and vocabulary databases are built using the code rules of Latin Transcribing and root Latin Transcribing, respectively, and the word frequencies are statistics drawn from a Tibetan language corpus of 60M. Possible syllables and words are searched based on the inputted Latin Transcribing string or root Latin Transcribing string; by matching the template syllables and words according to an optimal-match algorithm, the key assignments of the input code can be obtained. Each key assignment then points to the linked list of the corresponding syllables or words, and syllables or words with the same code are sent from the linked list to the candidate window, thereby implementing the input of syllables, Sanskrit and words. As a result, this Tibetan input method in Unicode is fast and efficient. [ABSTRACT FROM AUTHOR]
Published: 2008

38. Unicode turn, BULAC experience.

Author: Desnoues, Bernard
Subjects: *CARD catalogs, *CHARACTER sets (Data processing), *UNICODE (Computer character set), *DATA processing of signs & symbols, *ALPHABET -- Data processing
Abstract: BULAC, the Bibliothèque universitaire des langues et civilisations (in English: University Library for Language and Civilization Studies) will open to the public in 2010. It will offer a wide range of documentary resources on non-Western languages and civilizations to students and researchers. These resources are at present scattered in many locations and are to be gathered in the new library. The primary aim of the project is to build a common multi-script catalogue. We first had to convert data from previous computerized catalogues (several formats, several character sets). We consequently purchased an integrated library system in 2003. Our specifications required this library system to be UNIMARC and Unicode compliant. The catalogue was launched in July 2005. It now contains CJK, Cyrillic, Greek and Latin scripts. However, we had many difficulties with the character set conversion. We also had to understand Unicode principles. The catalogue is now working properly, but we have to improve the index rules and implement many other scripts in the catalogue. [ABSTRACT FROM AUTHOR]
Published: 2006

39. Open source for an emerging country

Author: Balnaves, Edmund
Published: 2014

40. Unicode Encoding Conversions with STL Strings and Win32 APIs.

Author: Dicanio, Giovanni
Subjects: UNICODE (Computer character set), APPLICATION program interfaces, C++
Abstract: The article focuses on converting between the Unicode Transformation Format encodings UTF-8 and UTF-16 using Win32 application program interfaces (APIs) and on handling errors with C++ exceptions.
Published: 2016
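
As a rough illustration of the kind of conversion this record describes, the following Python sketch transcodes between UTF-8 and UTF-16 and reports malformed input as an exception. It is not the article's code: the article works in C++ with Win32 APIs, whereas this sketch assumes Python's built-in codecs, and the function names `utf8_to_utf16` and `utf16_to_utf8` are hypothetical.

```python
def utf8_to_utf16(data: bytes) -> bytes:
    """Transcode UTF-8 bytes to UTF-16 little-endian bytes (no BOM).

    Hypothetical helper for illustration only. Raises UnicodeDecodeError
    if the input is not well-formed UTF-8.
    """
    text = data.decode("utf-8", errors="strict")
    return text.encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 little-endian bytes (no BOM) to UTF-8 bytes.

    Hypothetical helper for illustration only. Raises UnicodeDecodeError
    on truncated input or unpaired surrogates.
    """
    text = data.decode("utf-16-le", errors="strict")
    return text.encode("utf-8")


if __name__ == "__main__":
    original = "Unicode \u00e9 \u4e2d \U0001F600".encode("utf-8")
    wide = utf8_to_utf16(original)
    # A round trip through UTF-16 should reproduce the original bytes.
    assert utf16_to_utf8(wide) == original

    try:
        utf8_to_utf16(b"\xff")  # 0xFF never appears in well-formed UTF-8
    except UnicodeDecodeError as exc:
        print("rejected malformed UTF-8:", exc.reason)
```

Using the strict error mode means invalid byte sequences are rejected rather than silently replaced, which mirrors the exception-based error handling the abstract mentions.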

41. PixelCAPTCHA.

Author: Kalra, Gursev Singh
Subjects: *CAPTCHA (Challenge-response test), *UNICODE (Computer character set), *DESIGN techniques
Abstract: This paper will discuss a new visual CAPTCHA [1] scheme that leverages the 64K Unicode code points from the Basic Multilingual Plane (plane 0) to construct CAPTCHAs that can be solved with 2 to 4 mouse clicks. We will review the design principles, the security mechanisms and its various features. We will also discuss the potential attack vectors on the proposed CAPTCHA scheme. The proposed CAPTCHA scheme will also be available as an open source Java library in the near future. [ABSTRACT FROM AUTHOR]
Published: 2016
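
To make the "plane 0" terminology in this record concrete, here is a minimal Python sketch that computes the Unicode plane of a code point and checks whether a character lies in the Basic Multilingual Plane. It is not part of the PixelCAPTCHA scheme; the names `plane_of` and `in_bmp` are hypothetical and chosen only for illustration.

```python
BMP_SIZE = 0x10000          # 65,536 code points per plane (the "64K" of plane 0)
MAX_CODE_POINT = 0x10FFFF   # last code point in the Unicode code space


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains the given code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    # Each plane spans 0x10000 code points, so the plane is the value above bit 15.
    return code_point >> 16


def in_bmp(ch: str) -> bool:
    """Return True if the single character ch is in the Basic Multilingual Plane."""
    return plane_of(ord(ch)) == 0


if __name__ == "__main__":
    for ch in ["A", "\u4e2d", "\U0001F600"]:
        print(f"U+{ord(ch):04X} plane={plane_of(ord(ch))} in_bmp={in_bmp(ch)}")
```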

42. An improved algorithm for information hiding based on features of Arabic text: A Unicode approach.

Author: Mohamed, A. A.
Subjects: ALGORITHMS, CRYPTOGRAPHY, REDUNDANCY (Linguistics), ARABIC language, UNICODE (Computer character set), INFORMATION technology security
Abstract: Steganography is the practice of hiding secret information in a cover medium so that other individuals fail to realize its existence. Due to the lack of data redundancy in text files in comparison with other carrier files, text steganography is a difficult problem to solve. In this paper, we propose a new, promising steganographic algorithm for Arabic text based on features of Arabic text. The focus is on a more secure algorithm and a high carrier capacity. Our extensive experiments using the proposed algorithm resulted in a high capacity of the carrier media. The embedding capacity ratio of the proposed algorithm is high. In addition, our algorithm can resist traditional attack methods since it keeps the changes to the carrier text as small as possible. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

43. Unicode Text Editor for Ancient Egyptian Hieroglyphs Writing System.

Author: AL-Nasrawi, Dhamyaa A., Hashem, Hashem A., and Odhaib, Mohammed A.
Subjects: UNICODE (Computer character set), TEXT editors (Computer programs), HIEROGLYPHICS, PROGRAMMING languages, COMPUTER research
Abstract: A writing system is a set of visible signs used to represent units of language in a systematic way. Egyptian hieroglyphs were a formal writing system used by the ancient Egyptians that combined logographic and alphabetic elements. In classical text editor programs it is difficult to write documents that include Egyptian hieroglyph symbols, which take more than two bytes, because there is no way to embed these symbols in a particular document. In this paper, a special text editor is designed for the ancient Egyptian hieroglyph writing system. It handles the Egyptian hieroglyph symbols using the Unicode standard and supports the basic operations of a classical text editor, such as file manipulation (load and save), selection of font size, type, and color, the operations of copy, paste, cut, select all, etc., as well as printing the final document. In addition, hieroglyphic numbers are represented in this editor. Such editors offer more facilities, flexibility, simplicity and benefits, especially for historical researchers and authors interested in ancient civilization. The editor was designed in Visual Basic.Net. [ABSTRACT FROM AUTHOR]
Published: 2014

44. Simple linear string constraints.

Author: Fu, Xiang, Powell, Michael, Bantegui, Michael, and Li, Chung-Chih
Subjects: *WEB-based user interfaces, *COMPUTER viruses, *TEXT processing (Computer science), *TRANSDUCERS, *UNICODE (Computer character set), *ALGORITHMS
Abstract: Modern web applications often suffer from command injection attacks. Even when equipped with sanitization code, many systems can be penetrated due to software bugs. It is desirable to automatically discover such vulnerabilities, given the bytecode of a web application. One approach would be symbolically executing the target system and constructing constraints for matching path conditions and attack patterns. Solving these constraints yields an attack signature, based on which, the attack process can be replayed. Constraint solving is the key to symbolic execution. For web applications, string constraints receive most of the attention because web applications are essentially text processing programs. We present simple linear string equation (SISE), a decidable fragment of the general string constraint system. SISE models a collection of regular replacement operations (such as the greedy, reluctant, declarative, and finite replacement), which are frequently used by text processing programs. Various automata techniques are proposed for simulating procedural semantics such as left-most matching. By composing atomic transducers of a SISE, we show that a recursive algorithm can be used to compute the solution pool, which contains the value range of each variable in concrete solutions. Then a concrete variable solution can be synthesized from a solution pool. To accelerate solver performance, a symbolic representation of finite state transducer is developed. This allows the constraint solver to support a 16-bit Unicode alphabet in practice. The algorithm is implemented in a Java constraint solver called SUSHI. We compare the applicability and performance of SUSHI with Kaluza, a bounded string solver. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

45. Ada Gems.

Subjects: *ADA (Computer program language), *UNICODE (Computer character set), *ENCODING, *COMPUTER programming
Abstract: The article discusses several Ada programming language Gems taken from the AdaCore Gem of the Week series. It describes Gem #144, which is concerned with the concepts behind Unicode and encoding. It also mentions the series of three Gems describing three aspects, Static_Predicate, Dynamic_Predicate and Type_Invariant, that can be used to specify invariant properties of types and subtypes.
Published: 2013

46. The Construction of the Multilingual Internet: Unicode, Hebrew, and Globalization.

Author: John, Nicholas A.
Subjects: MULTILINGUAL websites, UNICODE (Computer character set), HEBREW alphabet, GLOBALIZATION, CULTURAL imperialism, LINGUISTIC minorities
Abstract: This paper examines the technologies that enable the representation of Hebrew on websites. Hebrew is written from right to left and in non-Latin characters, issues shared by a number of languages which seem to be converging on a shared solution: Unicode. Regarding the case of Hebrew, I show how competing solutions have given way to one dominant technology. I link processes in the Israeli context with broader questions about the 'multilingual Internet,' asking whether the commonly accepted solution for representing non-Latin texts on computer screens is an instance of cultural imperialism and convergence around a western artifact. It is argued that while minority languages are given an online voice by Unicode, the context is still one of western power. [ABSTRACT FROM AUTHOR]
Published: 2013
Full Text: View/download PDF

47. Enhancing Data Security in Cloud Computing Environment.

Author: Sugumar, Ramalingam, Rajeswari, A. Janet, and Hariharan, Shanmugasundaram
Subjects: CLOUD computing security measures, DATA security, COMPUTER security, NETWORK PC (Computer), CRYPTOGRAMS, UNICODE (Computer character set)
Abstract: Security in a networked environment, especially in cloud computing, is the hottest field that is attracting several research communities. Storage of data and security using standard cryptographic algorithms has become an important topic of discussion and research. Cloud computing moves the software and databases to data centers that are not trustworthy. Also, cloud storage faces many new security challenges that need more attention. The major threats of data security in cloud storage arise out of scheduling the computing resources using network, storage and retrieval of data. In this paper, we suggest a framework that enhances data security in the cloud storage system. We use Unicode, colors and private keys to develop a new and simple method that ensures secure storage of data in the cloud. [ABSTRACT FROM AUTHOR]
Published: 2013

48. EyeMap: a software system for visualizing and analyzing eye movement data in reading.

Author: Tang, Siliang, Reilly, Ronan, and Vorstius, Christian
Subjects: *DATA analysis software, *EYE movements, *READING research, *UNICODE (Computer character set), *FONTS & typefaces, *WEB browsers, WRITING
Abstract: We have developed EyeMap, a freely available software system for visualizing and analyzing eye movement data specifically in the area of reading research. As compared with similar systems, including commercial ones, EyeMap has more advanced features for text stimulus presentation, interest area extraction, eye movement data visualization, and experimental variable calculation. It is unique in supporting binocular data analysis for unicode, proportional, and nonproportional fonts and spaced and unspaced scripts. Consequently, it is well suited for research on a wide range of writing systems. To date, it has been used with English, German, Thai, Korean, and Chinese. EyeMap is platform independent and can also work on mobile devices. An important contribution of the EyeMap project is a device-independent XML data format for describing data from a wide range of reading experiments. An online version of EyeMap allows researchers to analyze and visualize reading data through a standard Web browser. This facility could, for example, serve as a front-end for online eye movement data corpora. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF

49. A Comparative Study of UTF-8, UTF-16, and UTF-32 of Unicode Code Point.

Author: Kumar, Sanjeev
Subjects: UNICODE (Computer character set), COMPUTER software, COMPUTER software development, COMPUTER operating systems, XML (Extensible Markup Language), WML (Document markup language)
Abstract: Unicode is a critical enabling technology for developers who want to internationalize applications for global environments. Unicode assigns a unique number to every character, irrespective of the platform, the program, or the language. The Unicode Standard has been adopted in the industry by Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase, and many others. Unicode is required by modern standards such as XML, Java and WML, and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode standard, and the availability of tools supporting it, is among the most significant recent global software technology advances. Each of the available formats, UTF-8, UTF-16 and UTF-32, has its own pros and cons. This paper compares these three formats. [ABSTRACT FROM AUTHOR]
Published: 2012
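
One of the trade-offs between the three formats named in this record is storage size. The Python sketch below, which is an illustration and not taken from the paper, counts the bytes each encoding form needs for a few sample characters. The helper name `encoded_sizes` is hypothetical, and the little-endian codec variants are an assumption made so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes needed to store text in each encoding form.

    Hypothetical helper for illustration. The "-le" codecs are used so that
    Python does not prepend a byte order mark to the output.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # One character each from ASCII, Latin-1 Supplement, CJK, and a supplementary plane.
    for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 uses one to four bytes per code point, UTF-16 uses two or four, and UTF-32 always uses four, so the most compact choice depends on which scripts dominate the text.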

50. The liberty of invention: alchemical discourse and information technology standardization.

Author: Walsh, John A. and Hooper, Wallace Edd
Subjects: *ALCHEMY, *UNICODE (Computer character set), *INFORMATION technology, *STANDARDIZATION, *GLYPHS (Graphic methods), *ICONICITY (Linguistics)
Abstract: The Chymistry of Isaac Newton project, an online scholarly edition of Newton's alchemical manuscripts, has engaged in a process to include a number of core alchemical symbols into the Unicode standard, a standard for digital representation of characters and symbols from the world's languages, scripts, and writing systems. Our article explores the relationship between information technology standardization and humanities research. We discuss Newton's engagement with alchemy and explore the graphic dimensions of alchemical discourse. We illustrate this discussion with examples of Newton's use of alchemical symbols. We examine Unicode itself, particularly a core Unicode principle distinguishing between the abstract character and the image or glyph of the character, and we discuss the tensions between this core principle and the representation of graphic, symbolic, and pictorial discourse. We describe our experience with the Unicode proposal process and illustrate again—this time with an organizational scheme for the symbols—how the technical standardization process forced a reexamination of our historical materials. Our conclusions reemphasize the potential for mutually beneficial relationships between certain types of information technology standardization and humanities research and suggest that study of the graphic qualities of alchemical discourse, especially in light of competing theories of text represented by standards like Unicode, may contribute to our understanding of the increasingly graphic, iconic, and pictorial nature of information and communication. [ABSTRACT FROM AUTHOR]
Published: 2012
Full Text: View/download PDF
