Descriptor: "External Data Representation" / Publisher: acm - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"External Data Representation"' showing total 185 results

1. Real-time data visualisation on the adaptive city platform

Author: Rohit Verma, Justas Brazauskas, Matthew Danish, Vadim Safronov, Ian Lewis, and Richard Mortier
Subjects: Building management system, Data visualization, Computer science, business.industry, Information model, Human–computer interaction, Smart environment, Real-time data, business, External Data Representation, Building automation, Visualization
Abstract: In smart buildings research, the integration of Building Information Models (BIM), Building Management Systems (BMS), and Internet of Things (IoT) is of paramount importance. However, such integration often overlooks real-time building data visualisation. In this demo, we examine challenges related to spatiotemporal data representation and novel visualisation methods in smart environments. Following this, we present the front-end design of our Adaptive City Platform (ACP), a system for collecting, processing and visualising building information and sensor data in real-time.
Published: 2021

2. AutoML: From Methodology to Application

Author: Zhen Wang, Yaliang Li, Yuexiang Xie, Ce Zhang, Bolin Ding, and Kai Zeng
Subjects: Hyperparameter, Range (mathematics), Model architecture, business.industry, Computer science, Process (engineering), Hyperparameter optimization, Feature generation, Architecture, Software engineering, business, External Data Representation
Abstract: Machine Learning methods have been adopted for a wide range of real-world applications, ranging from social networks, online image/video-sharing platforms, and e-commerce to education, healthcare, etc. However, in practice, a large amount of effort is required to tune several components of machine learning methods, including data representation, hyperparameter, and model architecture, in order to achieve a good performance. To alleviate the required tuning efforts, Automated Machine Learning (AutoML), which can automate the process of applying machine learning methods, has been studied in both academy and industry recently. In this tutorial, we will introduce the main research topics of AutoML, including Hyperparameter Optimization, Neural Architecture Search, and Meta-Learning. Two emerging topics of AutoML, Automatic Feature Generation and Machine Learning Guided Database, will also be discussed since they are important components for real-world applications. For each topic, we will motivate it with application examples from industry, illustrate the state-of-the-art methodologies, and discuss some future research directions based on our experience from industry and the trends in academy.
Published: 2021

3. Using Generative Adversarial Networks to Create Graphical User Interfaces for Video Games

Author: Christopher Acornley
Subjects: Adversarial system, business.industry, Human–computer interaction, Process (engineering), Computer science, Component (UML), Architecture, USable, business, External Data Representation, Generative grammar, Graphical user interface
Abstract: Designing and creating a Graphical User Interface (GUI) is a difficult and slow process. It requires a number of professions to all contribute to its development and it can be heavily detrimental to a product if implemented poorly. This research aims to investigate a method of using Generative Adversarial Networks (GANs) to generate new and usable designs for GUIs. GANs are a relatively new architecture for adversarial learning and have been used to good effect in replicating instances of a real dataset. The primary aim is to develop a GAN that is capable of processing a collection of existing GUIs and learning how to replicate these to allow for creation of further designs. These GUI designs need to be formatted in a manner that enables modification, allowing for them to be used by a development team to enhance their production process. Completed work demonstrates numerous approaches to using GANs to create text files that contain the component elements of a GUI. Their results and the release of a similar research paper (GUIGAN) have led to a new approach focusing on more abstract data representation, with a quality control system for ensuring the output data is properly formatted. It is hypothesised that the approach will develop a model capable of creating new, editable GUI designs.
Published: 2021

4. Angelic and demonic visitation: school memories

Author: Leila Salem
Subjects: Cognitive science, Refinement calculus, Algebraic semantics, Computer science, Process (engineering), Interpretation (philosophy), Program derivation, External Data Representation, Programmer, Reusability
Abstract: The whole activity of programming can be thought of as a process of time reusability. This essay considers how computing education can transform these errors into shared learning journeys by refining the relationship between programmer and user. It is also an interpretation of Ralph-Johan Back’s Changing data representation in the refinement calculus.
Published: 2021

5. Measuring Data Collection Diligence for Community Healthcare

Author: Saachi Dalal, Dhyanesh Narayanan, Hamid Abdullah, Milind Tambe, Mohammad Sarparajul Ambiya, Ramesha Karunasena, Divy Thakkar, Arunesh Sinha, and Ruchit Nagar
Subjects: Data collection, business.industry, Computer science, media_common.quotation_subject, Context (language use), External Data Representation, Data science, Diligence, Subject-matter expert, Data quality, Health care, business, Raw data, media_common
Abstract: Data analytics has tremendous potential to provide targeted benefit in low-resource communities; however, the availability of high-quality public health data is a significant challenge in developing countries, primarily due to non-diligent data collection by community health workers (CHWs). Our use of the word non-diligence here is to emphasize that poor data collection is often not a deliberate action by CHWs but arises due to a myriad of factors, sometimes beyond the control of the CHW. In this work, we define and test a data collection diligence score. This challenging unlabeled data problem is handled by building upon domain expert’s guidance to design a useful data representation of the raw data, using which we design a simple and natural score. An important aspect of the score is relative scoring of the CHWs, which implicitly takes into account the context of the local area. The data is also clustered and interpreting these clusters provides a natural explanation of the past behavior of each data collector. We further predict the diligence score for future time steps. Our framework has been validated on the ground using observations by the field monitors of our partner NGO in India. Beyond the successful field test, our work is in the final stages of deployment in the state of Rajasthan, India. This system will be helpful in providing non-punitive intervention and necessary guidance to encourage CHWs.
Published: 2021

6. It’s about Time: Adopting Theoretical Constructs from Visualization for Sonification

Author: Robert Höldrich, Michael Iber, Alexander Rind, Wolfgang Aigner, and Kajetan Enge
Subjects: Perceptual system, Human–computer interaction, Analytics, business.industry, Sonification, Computer science, Design theory, External Data Representation, business, Field (computer science), Visualization, Terminology
Abstract: Both sonification and visualization convey information about data by effectively using our human perceptual system, but their ways to transform the data could not be more different. The sonification community has demanded a holistic perspective on data representation, including audio-visual analysis, several times during the past 30 years. A design theory of audio-visual analysis could be a first step in this direction. An indispensable foundation for this undertaking is a terminology that describes the combined design space. To build a bridge between the domains, we adopt two of the established theoretical constructs from visualization theory for the field of sonification. The two constructs are the spatial substrate and the visual mark. In our model, we choose time to be the temporal substrate of sonification. Auditory marks are then positioned in time, just as visual marks are positioned in space. The proposed definitions allow discussing visualization and sonification designs as well as multi-modal designs based on a common terminology. While the identified terminology can support audio-visual analytics research, it also provides a new perspective on sonification theory itself.
Published: 2021

7. Optimizing Data Science Applications using Static Analysis

Author: Sundararajarao Sudarshan, Mudra Sahu, and Bhushan Pal Singh
Subjects: Source code, Computer science, media_common.quotation_subject, Fetch, Python (programming language), Static analysis, External Data Representation, Data science, Column (database), Transformation (function), Memory footprint, computer, media_common, computer.programming_language
Abstract: Data science applications are often coded in Python, using Pandas and similar APIs. Pandas requires data to be in memory, and when run on larger datasets, these applications may run out of memory, or suffer from poor performance. We describe the SCIRPy system for optimizing such applications by source to source transformations, using static analysis and transformation rules. SCIRPy implements a number of optimizations like data selection, drop column removal, multistage data fetch, and efficient data representation based on metadata analysis. The application source code is transformed into a custom-built intermediate representation (IR) and these optimizations are performed in this IR. The optimized IR is then transformed back to Python source. Our experiments show that our approach reduces the memory footprint and time consumption of a number of data science applications.
Published: 2021

8. AutoML

Author: Ce Zhang, Bolin Ding, Yaliang Li, and Zhen Wang
Subjects: Hyperparameter, Meta learning (computer science), Computer science, Process (engineering), Scale (chemistry), Hyperparameter optimization, Perspective (graphical), Architecture, External Data Representation, Data science
Abstract: Machine learning methods have been adopted for various real-world applications, ranging from social networks, online image/video-sharing platforms, and e-commerce to education, healthcare, etc. However, several components of machine learning methods, including data representation, hyperparameter and model architecture, can largely affect their performance in practice. Moreover, the explosions of data scale and model size make the optimization of these components more and more time-consuming for machine learning developers. To tackle these challenges, Automated Machine Learning (AutoML) aims to automate the process of applying machine learning methods to solve real-world application tasks, reducing the time of tuning machine learning methods while maintaining good performance. In this tutorial, we will introduce the main research topics of AutoML, including Hyperparameter Optimization, Neural Architecture Search and Meta-Learning. Two emerging topics of AutoML, DNN-based Feature Generation and Machine Learning Guided Database, will also be discussed as they are important components for real-world applications. For each topic, we will motivate it with examples from industry, illustrate the state-of-the-art methods, and discuss their pros and cons from both perspectives of industry and academy. We will also discuss some future research directions based on our experience from industry and the trends in academy.
Published: 2021

9. Transformer-based Banking Products Recommender System

Author: Alexandre Boulenger, George Philippe Farajalla, and Davide Liu
Subjects: Metadata, Product (business), Information retrieval, Computer science, Recommender system, Representation (mathematics), External Data Representation, Encoder, Personalization, Transformer (machine learning model)
Abstract: Credit cards, deposits, loans, pension funds, mutual funds: which of these products is relevant to a bank's clients, and at what time in their banking journey? We propose a modeling framework for item recommendation using a Transformer encoder [6] and a novel input data representation accounting for the temporal context of item ownership and user metadata. We evaluate the model on a large dataset from Bank Santander. Our system outperforms industry baselines Amazon Personalize [1], and XGBoost [4], a top performing model in the Santander Kaggle competition [2]. We achieve a 56.6% top-3 precision and significantly outperform Amazon Personalize and the XGBoost model, with 21.5% and 37.9% top-3 precision, respectively. We engineered an original way of representing input data as a sequence and found that this specific representation, with our Transformer-based architecture, improves the model's performance. We hope that our contribution paves the way for the democratization of recommender systems in banking, and the use of the Transformer model for product recommendation in industry.
Published: 2021

10. Tactile Heatmaps: A Novel Visualisation Technique for Data Analysis with Tactile Charts

Author: Gerhard Weber, Christin Engel, and Emma Franziska Müller
Subjects: Focus (computing), InformationSystems_INFORMATIONINTERFACESANDPRESENTATION(e.g.,HCI), Computer science, Human–computer interaction, Visual impairment, Scalability, medicine, medicine.symptom, External Data Representation, Representation (mathematics), Information overload, Readability, Haptic technology
Abstract: Analysing large data sets for various purposes is a growing requirement for many professions. Tactile charts are suitable to enable people with visual impairment and blindness to perform data analysis tasks. However, only a few approaches focus on the development of tactile charts for data analysis purposes. Concepts are needed to represent a sufficient amount of data with tactile charts and address arising challenges, such as information overload. In this paper, we first discuss and analyse the scalability of data represented by tactile charts using tactile scatterplots. We further address the data size limitations and present methods to identify critical, tactile representation with limited readability respecting the analysis task. Moreover, we propose methods to increase the amount of data represented in tactile scatterplots. We further introduce tactile heatmaps as an innovative and new concept for haptic data representation that utilises different elevation levels. We evaluated our design concept as well as the feasibility of varying elevation levels with 11 blind and visually impaired people. We compared four design conditions for embossed tactile heatmaps as well as the suitability of 3D-printed heatmaps. The results show that tactile heatmaps are suitable for representing more data than previously known tactile representation methods. They support obtaining an overview of a high amount of data and can be applied for data analysis purposes.
Published: 2021

11. DataMoves: Entangling Data and Movement to Support Computer Science Education

Author: Su Adams, Nicolai Marquardt, Yvonne Rogers, Justas Brazauskas, Susan Lechelt, Ethan Wood, Rebecca Evans, and Emma McFarland
Subjects: Dance, education/learning, Computer science, Movement (music), Teaching method, media_common.quotation_subject, Physical computing, External Data Representation, Domain (software engineering), schools/educational setting, Embodied cognition, ComputingMilieux_COMPUTERSANDEDUCATION, Mathematics education, Curiosity, embodied interaction, media_common
Abstract: In the domain of computing education for children, much work has been done to devise creative and engaging methods of teaching about programming. However, there are many other fundamental aspects of computing that have so far received relatively less attention. This work explores how the topics of number systems and data representation can be taught in a way that piques curiosity and captures learners' imaginations. Specifically, we present the design of two interactive physical computing artefacts, which we collectively call DataMoves, that enable students, 12-14 years old, to explore number systems and data through embodied movement and dance. Our evaluation of DataMoves, used in tandem with other pedagogical methods, demonstrates that the form of embodied, exploration-based learning adopted has much potential for deepening students' understandings of computing topics, as well as for shaping positive perceptions of topics that are traditionally considered boring and dull.
Published: 2021

12. Data as Delight: Eating data

Author: Han D. Phan, Florian 'Floyd' Mueller, Sarah Goodwin, Jionghao Lin, Kun-Ting Chen, Jialin Deng, Kim Marriott, Rohit Ashok Khot, Tim Dwyer, and Yan Wang
Subjects: Research design, Work (electrical), Scope (project management), business.industry, Ephemerality, Interaction design, Sociology, Public relations, business, External Data Representation, ComputingMilieux_MISCELLANEOUS
Abstract: The HCI community has a rich history of finding new ways to engage people with data beyond the screen. With our work, we aim to expand the scope of how interaction design can engage people, arguing that “eating data” has the potential to allow people to experience “data as delight”. With reference to prior work and our design research findings, we discuss the advantages and the challenges of this approach to integrating data and food. We then identify four themes to guide the design of engagements with data through food: food form, food commensality, food ephemerality, and emotional response to food. Within these design themes, we articulate twelve insights for interaction designers to use when working on serving data as delight.
Published: 2021

13. Does Interaction Improve Bayesian Reasoning with Visualization?

Author: Ab Mosca, Alvitta Ottley, and Remco Chang
Subjects: FOS: Computer and information sciences, Computer science, media_common.quotation_subject, 05 social sciences, Testbed, Computer Science - Human-Computer Interaction, 020207 software engineering, Cognition, 02 engineering and technology, Interaction design, External Data Representation, Bayesian inference, Human-Computer Interaction (cs.HC), Task (project management), Visualization, Human–computer interaction, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, Function (engineering), 050107 human factors, media_common
Abstract: Interaction enables users to navigate large amounts of data effectively, supports cognitive processing, and increases data representation methods. However, there have been few attempts to empirically demonstrate whether adding interaction to a static visualization improves its function beyond popular beliefs. In this paper, we address this gap. We use a classic Bayesian reasoning task as a testbed for evaluating whether allowing users to interact with a static visualization can improve their reasoning. Through two crowdsourced studies, we show that adding interaction to a static Bayesian reasoning visualization does not improve participants' accuracy on a Bayesian reasoning task. In some cases, it can significantly detract from it. Moreover, we demonstrate that underlying visualization design modulates performance and that people with high versus low spatial ability respond differently to different interaction techniques and underlying base visualizations. Our work suggests that interaction is not as unambiguously good as we often believe; a well designed static visualization can be as, if not more, effective than an interactive one. (14 pages, 11 figures; to be published in 2021 ACM CHI Virtual Conference on Human Factors in Computing Systems.)
Published: 2021

14. Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

Author: Erik Schultheis, Priyanshu Gupta, Rohit Babbar, and Mohammadreza Qaraei (affiliations listed in the record: Professorship Babbar Rohit, Department of Computer Science, Indian Institute of Technology, Computer Science Professors, Aalto-yliopisto, Aalto University)
Subjects: Computer science, business.industry, Supervised learning, Perspective (graphical), Pattern recognition, 02 engineering and technology, Recommender system, External Data Representation, Convexity, ComputingMethodologies_PATTERNRECOGNITION, Distribution (mathematics), 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Fraction (mathematics), Learning to rank, Artificial intelligence, business
Abstract: Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels that are chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web-encyclopedia, prediction of related searches, and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges, (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) highly imbalanced data distribution where a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate for missing labels according to Natarajan et al. [26]. This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining with label-frequency-based rebalancing. We show that the proposed loss functions can be easily incorporated into various different frameworks for extreme classification. This includes (i) linear classifiers, such as DiSMEC, on sparse input data representation, (ii) attention-based deep architecture, AttentionXML, learnt on dense GloVe embeddings, and (iii) XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baseline models, on the propensity-scored metrics for precision and nDCG.
Published: 2021

15. HyperRec

Author: Yunhui Guo, Justin Morris, Jaeyoung Kang, Yeseong Kim, Sahand Salamat, Tajana Rosing, Mohsen Imani, and Baris Aksanli
Subjects: Speedup, Computer science, 020206 networking & telecommunications, 02 engineering and technology, Recommender system, External Data Representation, Memory management, Computer engineering, 0202 electrical engineering, electronic engineering, information engineering, Hardware acceleration, 020201 artificial intelligence & image processing, Central processing unit, Representation (mathematics), Efficient energy use
Abstract: Recommender systems are important tools for many commercial applications such as online shopping websites. There are several issues that make the recommendation task very challenging in practice. The first is that an efficient and compact representation is needed to represent users, items and relations. The second issue is that the online markets are changing dynamically; it is thus important that the recommendation algorithm is suitable for fast updates and hardware acceleration. In this paper, we propose a new hardware-friendly recommendation algorithm based on Hyperdimensional Computing, called HyperRec. Unlike existing solutions which leverage floating-point numbers for the data representation, in HyperRec, users and items are modeled with binary vectors in a high dimension. The binary representation makes it possible to perform the reasoning process of the proposed algorithm using only Boolean operations, which is efficient on various computing platforms and suitable for hardware acceleration. In this work, we show how to utilize GPU and FPGA to accelerate the proposed HyperRec. When compared with the state-of-the-art methods for rating prediction, the CPU-based HyperRec implementation is 13.75× faster and consumes 87% less memory, while decreasing the mean squared error (MSE) for the prediction by as much as 31.84%. Our FPGA implementation is on average 67.0× faster and has 6.9× higher energy efficiency as compared to CPU. Our GPU implementation further achieves on average 3.1× speedup as compared to FPGA, while providing only 1.2× lower energy efficiency.
Published: 2021
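The binary-vector idea in the HyperRec abstract can be illustrated with a generic hyperdimensional-computing sketch. This is not the HyperRec algorithm: the dimension `DIM`, the XOR binding, the Hamming-based similarity, and all function names are assumptions chosen for illustration, using only the Python standard library.

```python
import random

DIM = 10_000  # assumed hypervector dimension, chosen for illustration


def random_hv(rng):
    """Draw a random binary hypervector, stored as a DIM-bit Python integer."""
    return rng.getrandbits(DIM)


def bind(a, b):
    """Bind two hypervectors with XOR, a purely Boolean operation."""
    return a ^ b


def hamming_similarity(a, b):
    """Fraction of bit positions where the two hypervectors agree."""
    return 1.0 - bin(a ^ b).count("1") / DIM


rng = random.Random(42)
user = random_hv(rng)   # hypothetical user vector
item = random_hv(rng)   # hypothetical item vector

pair = bind(user, item)        # represents the (user, item) relation
recovered = bind(pair, user)   # XOR is its own inverse, so this is `item` again
```

Two independently drawn hypervectors agree on roughly half of their bits, so a similarity well above 0.5 signals a genuine relationship; everything here runs on integer bit operations without floating-point vector arithmetic.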

16. Zero Correlation Error

Author: Joshua San Miguel, Jason H. Anderson, Yuko Hara-Azumi, and Hsuan Hsiao
Subjects: Stochastic computing, Design space exploration, Computer science, Probabilistic logic, 020206 networking & telecommunications, 02 engineering and technology, External Data Representation, Measure (mathematics), 020202 computer hardware & architecture, Metric (mathematics), 0202 electrical engineering, electronic engineering, information engineering, Bitstream, Algorithm, Independence (probability theory)
Abstract: Stochastic computing (SC), with its probabilistic data representation format, has sparked renewed interest due to its ability to use very simple circuits to implement complex operations. Unlike traditional binary computing, however, SC needs to carefully handle correlations that exist across data values to avoid the risk of unacceptably inaccurate results. With many SC circuits designed to operate under the assumption that input values are independent, it is important to provide the ability to accurately measure and characterize independence of SC bitstreams. We propose zero correlation error (ZCE), a metric that quantifies how independent two finite-length bitstreams are, and show that it addresses fundamental limitations in metrics currently used by the SC community. Through evaluation at both the functional unit level and application level, we demonstrate how ZCE can be an effective tool for analyzing SC bitstreams, simulating circuits and design space exploration.
Published: 2021
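Why correlation matters in stochastic computing can be seen in a small simulation. The sketch below assumes the common unipolar encoding, where a value in [0, 1] is the fraction of 1s in a bitstream and multiplication is a bitwise AND. It does not compute the ZCE metric, whose definition is not given in the abstract, and the function names are illustrative.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a bitstream in which each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def value(bits):
    """Decode a bitstream: the represented value is the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a, b):
    """Multiply two unipolar bitstreams with a bitwise AND (one gate per bit)."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(0)
n = 10_000
a = to_bitstream(0.5, n, rng)
b = to_bitstream(0.5, n, rng)

independent = value(sc_multiply(a, b))  # approximates 0.5 * 0.5
correlated = value(sc_multiply(a, a))   # a AND a is just a, so this stays near 0.5
```

With independent inputs the AND gate approximates the product, while feeding the same stream to both inputs returns the input value unchanged; this is the kind of correlation-induced error that an independence metric is meant to detect.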

17. Tools for developing color ramps for representing quantitative data

Author: Kazuo Misue
Subjects: Color difference, business.industry, Computer science, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, 02 engineering and technology, Color space, External Data Representation, Color gradient, Data visualization, Path (graph theory), 0202 electrical engineering, electronic engineering, information engineering, Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: Color is a convenient visual attribute in data visualization that performs significant roles in data representation. However, there are several situations where colors are used inappropriately to represent quantitative data. A possible reason could be that it is not easy to develop color ramps that consider color differences. This paper describes some tools designed to support the development of color ramps to represent quantitative data. The tools help develop color ramps with uniform color differences by selecting colors from a specified color path (continuous straight lines or curved lines) in a color space at equal intervals according to some color difference formula.
Published: 2020
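The equal-interval selection described in the abstract can be sketched for the simplest case. The code assumes a straight-line path in CIELAB and the CIE76 color difference (Euclidean distance in Lab); the endpoint colors and function names are hypothetical, and the tools in the paper may use other paths and difference formulas.

```python
import math


def lab_ramp(start, end, steps):
    """Sample `steps` colors at equal intervals along a straight line in Lab space."""
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]


def delta_e76(c1, c2):
    """CIE76 color difference: Euclidean distance between two Lab colors."""
    return math.dist(c1, c2)


# Hypothetical endpoints given as (L*, a*, b*) triples.
ramp = lab_ramp((20.0, 10.0, -30.0), (90.0, -5.0, 40.0), 5)

# Color differences between neighbouring ramp entries; all are equal on a straight path.
diffs = [delta_e76(a, b) for a, b in zip(ramp, ramp[1:])]
```

On a straight path, equal parameter steps give equal CIE76 differences by construction. For a curved path or a non-Euclidean difference formula, the sample positions would have to be adjusted numerically instead.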

18. Mind the Gap

Author: W. Philip Kegelmeyer, Joe Ingram, Christopher C. Lamb, Armida J. Carbajal, Ramyaa Ramyaa, Eva Domschot, Michael R. Smith, Nicholas T. Johnson, Stephen J. Verzi, and Bridget I. Haus
Subjects: Training set, Computer science, business.industry, computer.software_genre, Machine learning, External Data Representation, Majority class, Bridging (programming), Malware, Artificial intelligence, Malware analysis, Transfer of learning, business, computer, Semantic gap
Abstract: Machine learning (ML) techniques are being used to detect increasing amounts of malware and variants. Despite successful applications of ML, we hypothesize that the full potential of ML is not realized in malware analysis (MA) due to a semantic gap between the ML and MA communities, as demonstrated in the data that is used. Due in part to the available data, ML has primarily focused on detection whereas MA is also interested in identifying behaviors. We review existing open-source malware datasets used in ML and find a lack of behavioral information that could facilitate stronger impact by ML in MA. As a first step in bridging this gap, we label existing data with behavioral information using open-source MA reports: 1) altering the analysis from identifying malware to identifying behaviors, 2) aligning ML better with MA, and 3) allowing ML models to generalize to novel malware in a zero/few-shot learning manner. We classify the behavior of a malware family not seen during training using transfer learning from a state-of-the-art model for malware family classification and achieve 57% - 84% accuracy on behavioral identification but fail to outperform the baseline set by a majority class predictor. This highlights opportunities for improvement on this task related to the data representation, the need for malware specific ML techniques, and a larger training set of malware samples labeled with behaviors.
Published: 2020

19. Hear Her Fear: Data Sonification for Sensitizing Society on Crime Against Women in India

Author: Surabhi S. Nath
Subjects: FOS: Computer and information sciences, Sound (cs.SD), Human studies, Interface (computing), Applied psychology, Computer Science - Human-Computer Interaction, External Data Representation, Computer Science - Sound, Human-Computer Interaction (cs.HC), Variety (cybernetics), Audio and Speech Processing (eess.AS), Sonification, Sound spatialization, FOS: Electrical engineering, electronic engineering, information engineering, User interface, Psychology, Period (music), Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Data sonification is a means of representing data through sound and has been utilized in a variety of applications. Crime against women has been a rising concern in India. We explore the potential of data sonification to provide an immersive engagement with sensitive data on crime against women in Indian states. The data for nine crime categories covering thirty-five Indian states over a period of twelve years is acquired from national records. Sonification techniques of parameter mapping and auditory icons are adopted: sound parameters such as frequencies, amplitudes and timbres are incorporated to represent the crime data, and audio sounds of women's screams are employed as auditory icons to emphasize the traumatic experience. Higher crime rates are assigned higher frequencies, harsher scream textures and larger amplitudes. A user-friendly interface is developed with multiple options for sequential and comparative data sonification. Through the interface, a user can evaluate and compare the extent of crime against women in different states, years or crime categories. Sound spatialization is used to immerse the listener in the sound and further intensify the sonification experience. To assess and validate effectiveness, a user study on twenty participants is conducted with feedback obtained through questionnaires. The responses indicate that the participants could comprehend trends in the data easily and found the data sonification experience impactful. Sonification may therefore prove to be a valuable tool for data representation in fields related to social and human studies. (6 pages)
Published: 2020
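
As a rough illustration of the parameter-mapping idea described in this abstract (higher values mapped to higher frequencies and larger amplitudes), the following minimal Python sketch linearly rescales a list of numbers into a frequency range and an amplitude range. The function names, the linear mapping, and the default ranges (220 to 880 Hz, 0.2 to 1.0) are assumptions made for illustration only; the abstract does not specify the authors' actual mapping.

```python
from typing import List, Sequence, Tuple


def map_to_range(value: float, lo: float, hi: float,
                 out_lo: float, out_hi: float) -> float:
    """Linearly rescale value from [lo, hi] to [out_lo, out_hi], clamping at the ends."""
    if hi == lo:
        # Degenerate input range: fall back to the low end of the output range.
        return out_lo
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)


def parameter_mapping(values: Sequence[float],
                      freq_range: Tuple[float, float] = (220.0, 880.0),
                      amp_range: Tuple[float, float] = (0.2, 1.0)
                      ) -> List[Tuple[float, float]]:
    """Return one (frequency_hz, amplitude) pair per data value.

    Hypothetical mapping: larger data values receive higher frequencies
    and larger amplitudes, relative to the min and max of the input.
    """
    lo, hi = min(values), max(values)
    return [
        (map_to_range(v, lo, hi, *freq_range),
         map_to_range(v, lo, hi, *amp_range))
        for v in values
    ]


if __name__ == "__main__":
    # Three made-up data points; each becomes a (frequency, amplitude) pair.
    for freq, amp in parameter_mapping([10, 20, 30]):
        print(f"{freq:.1f} Hz at amplitude {amp:.2f}")
```

A real sonification would pass these pairs to an audio synthesis library; that step is omitted here because the abstract does not name one.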

20. Echo: Analyzing Gameplay Sessions by Reconstructing Them From Recorded Data

Author: Daniel MacCormick and Loutfouz Zaman
Subjects: Computer science, business.industry, 05 social sciences, Echo (computing), ComputingMilieux_PERSONALCOMPUTING, 020207 software engineering, 02 engineering and technology, External Data Representation, User Research, Bridge (nautical), Session (web analytics), Workflow, Human–computer interaction, Analytics, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, business, 050107 human factors, Cognitive load
Abstract: Games user research (GUR) is centered on ensuring games deliver the experience that their designers intended. GUR researchers frequently make use of playtesting to evaluate games. This often requires watching back hours of video footage after the session to ensure that they did not miss anything important. Analytics have been used to help improve this process, providing visualizations of the underlying gameplay data. Yet, many of these game analytics tools provide static visualizations which do not accurately capture the dynamic aspects of modern video games. To address this problem, we have created Echo, a tool that uses gameplay data to reconstruct the original session with in-game assets, instead of abstracting them away. Echo has been designed to help bridge the gap between static gameplay data representation and video footage, with the goal of providing the best of both. A user study revealed that participants found Echo less frustrating to use compared to videos for gameplay analysis and also ranked it higher for efficiency, among others. It revealed that participants felt less cognitive load when using Echo as well. Qualitative results were also promising as participants employed several distinct workflows while using Echo. We received numerous suggestions for building upon the current state of the tool, including support for multiple viewports, live annotations, and visible gameplay metrics.
Published: 2020

21. Representation, navigation and exploration

Author: Luciana A. M. Zaina and José M. F. Vieira
Subjects: business.industry, Computer science, Computational thinking, 05 social sciences, 020207 software engineering, Usability, Context (language use), 02 engineering and technology, External Data Representation, Data type, Interactive Learning, User experience design, Human–computer interaction, User experience evaluation, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, business, 050107 human factors
Abstract: Learning trajectories are paths that students may follow in order to achieve their learning goals. Although the literature has addressed the subject, little has been done in the way of exploring how to visualize learning trajectories. In this paper, we present three forms of interactive learning trajectory visualizations linked to the context of computational thinking. As the interactions with the visualizations involve different aspects, our proposal comprises three layers: the data representation, the reactions to navigation, and data exploration, where more details of the data can be seen. Because visualizations are tightly related to the context from which the data comes, we analyzed the data types available in Code.org, a well-known platform commonly used to teach computational thinking. To assess the three visualizations, we carried out a usability and user experience evaluation with 23 Brazilian elementary school teachers. The results revealed that the three visualizations achieved an average of 72% of overall understanding by the audience. In addition, our findings showed that the visualizations were well accepted among the participants. We also found that the user experience reported by the participants is in some way associated with their level of understanding of the visualizations.
Published: 2020

22. Towards Clustering-friendly Representations

Author: Zhengrui Ma, Zhao Kang, Guangchun Luo, Ling Tian, and Wenyu Chen
Subjects: Computer science, business.industry, Deep learning, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, Filter (signal processing), Document clustering, External Data Representation, Linear subspace, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), 020201 artificial intelligence & image processing, Artificial intelligence, Cluster analysis, business, Feature learning
Abstract: Finding a suitable data representation for a specific task has been shown to be crucial in many applications. The success of subspace clustering depends on the assumption that the data can be separated into different subspaces. However, this simple assumption does not always hold since the raw data might not be separable into subspaces. To recover the "clustering-friendly" representation and facilitate the subsequent clustering, we propose a graph filtering approach by which a smooth representation is achieved. Specifically, it injects graph similarity into data features by applying a low-pass filter to extract useful data representations for clustering. Extensive experiments on image and document clustering datasets demonstrate that our method improves upon state-of-the-art subspace clustering techniques. In particular, its comparable performance with deep learning methods emphasizes the effectiveness of the simple graph filtering scheme for many real-world applications. An ablation study shows that graph filtering can remove noise, preserve structure in the image, and increase the separability of classes.
Published: 2020

23. SparseTrain

Author: Christopher W. Fletcher, Christopher J. Hughes, Josep Torrellas, Zhangxiaowen Gong, and Houxiang Ji
Subjects: 010302 applied physics, Computer science, business.industry, Computation, Deep learning, Inference, 02 engineering and technology, External Data Representation, 01 natural sciences, 020202 computer hardware & architecture, Convolution, Feature (computer vision), 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Leverage (statistics), SIMD, Artificial intelligence, business, Algorithm
Abstract: Our community has improved the efficiency of deep learning applications by exploiting sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known statically, and/or for specialized hardware. In this paper, we propose SparseTrain, a software-only scheme to leverage dynamic sparsity during training on general-purpose SIMD processors. SparseTrain exploits zeros introduced by the ReLU activation function to both feature maps and their gradients. Exploiting such sparsity is challenging because the sparsity degree is moderate and the locations of zeros change over time. SparseTrain identifies zeros in a dense data representation and performs vectorized computation. Variations of the scheme are applicable to all major components of training: forward propagation, backward propagation by inputs, and backward propagation by weights. Our experiments on a 6-core Intel Skylake-X server show that SparseTrain is very effective. In end-to-end training of VGG16, ResNet-34, and ResNet-50 with ImageNet, SparseTrain outperforms a highly-optimized direct convolution on the non-initial convolutional layers by 2.19x, 1.37x, and 1.31x, respectively. SparseTrain also benefits inference. It accelerates the non-initial convolutional layers of the aforementioned models by 1.88x, 1.64x, and 1.44x, respectively.
Published: 2020

24. ExPress: Simultaneously Achieving Storage, Execution and Energy Efficiencies in Moderately Sparse Matrix Computations

Author: Alex Weaver, Shashank Adavally, Krishna M. Kavi, Benjamin Wang, Pranoy Dutta, and Nagendra Gulur
Subjects: Computational complexity theory, business.industry, Computer science, computer.file_format, Parallel computing, External Data Representation, Column (database), Metadata, Software, Encoding (memory), Bitmap, business, computer, Sparse matrix
Abstract: Sparse matrix computations have witnessed a resurgence with the pervasive use of deep neural networks. Leveraging sparsity enables efficiency of storage by avoiding storing zeroes. However, sparse representations incur metadata computational overheads: software needs to process the metadata (or indexes) that describes row/column locations of non-zero values before it can access the corresponding data values. Several formats have been proposed for representing sparse matrices, including Compressed Sparse Row (CSR), Coordinate (COO), Bitmaps, Run-length encoding, and hierarchical representations. Each representation achieves different levels of memory compression and incurs different levels of computational complexity depending on the sparsity (percentage of zero values). We seek answers to the following: (i) at what sparsity levels is it worth eliminating the compressed representation of matrices and using the dense representation that includes both zeros and non-zero values, and (ii) even if we use a compressed data representation, will it be useful to expand the matrices internally to eliminate metadata processing overheads? In this paper we propose the use of special hardware called ExPress that expands compressed matrices into dense data, eliminating metadata computations from the main processing element. Our ExPress hardware is configurable so that it can expand from different compressed formats. Our experiments for matrix-vector multiplication using several DNN workloads show performance gains of 43%, 33% and 11% on average over software implementations that use CSR, Bitmap and Run-length encoding respectively. ExPress shows performance gains over sparse software codes for sparsity up to 70%. Further, ExPress simultaneously achieves energy improvement by reducing the instruction overhead of sparsity-aware computations.
Published: 2020
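
To make the "metadata overhead" mentioned in the ExPress abstract concrete, here is a minimal pure-Python sketch of Compressed Sparse Row (CSR) storage and matrix-vector multiplication, next to the dense equivalent. It is a software illustration only and does not model the ExPress hardware; the function names are hypothetical. Note how the CSR loop must read `row_ptr` and `col_idx` (the metadata) before it can touch each value, whereas the dense loop needs no index lookups but multiplies by zeros.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(matrix: Sequence[Sequence[float]]
                 ) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for A stored in CSR form."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Metadata processing: row_ptr gives the slice, col_idx gives the column.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_matvec(matrix: Sequence[Sequence[float]],
                 x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a dense matrix; no index metadata is consulted."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


if __name__ == "__main__":
    a = [[1, 0, 2],
         [0, 0, 0],
         [0, 3, 0]]
    vals, cols, ptrs = dense_to_csr(a)
    print(vals, cols, ptrs)
    print(csr_matvec(vals, cols, ptrs, [1, 2, 3]))
    print(dense_matvec(a, [1, 2, 3]))
```

The trade-off the abstract asks about is visible here: CSR skips the zeros but pays for two extra array reads per non-zero, so at moderate sparsity the dense loop can be competitive.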

25. GROOT

Author: Juheon Yi, Sunghyun Choi, Kyung Jin Lee, Youngki Lee, and Young Min Kim
Subjects: Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Point cloud, 020206 networking & telecommunications, 02 engineering and technology, Frame rate, External Data Representation, Data structure, Octree, Filter (video), 0202 electrical engineering, electronic engineering, information engineering, Overhead (computing), 020201 artificial intelligence & image processing, Augmented reality, Computer vision, Artificial intelligence, business
Abstract: We present GROOT, a mobile volumetric video streaming system that delivers three-dimensional data to mobile devices for a fully immersive virtual and augmented reality experience. The system design for streaming volumetric videos should be fundamentally different from conventional 2D video streaming systems. First, the amount of data required to deliver the 3D volume is considerably larger than conventional videos with frames of 2D images, even compared to high-resolution 2D or 360° videos. Second, the 3D data representation, which encodes the surface of objects within the volume, is a sparse and unorganized data structure with varying scales, whereas a conventional video is composed of a sequence of images with a fixed-size 2D grid structure. GROOT is a streaming framework with a novel data structure that enables not only real-time transmission and decoding on mobile devices but also continuous on-demand user view adaptation. Specifically, we modify the conventional octree to introduce the independence of leaf nodes with minimal memory overhead, which enables parallel decoding of highly irregular 3D data. We also developed a suite of techniques to compress color information and filter out 3D points outside of a user's view, which efficiently minimizes the data size and decoding cost. Our extensive evaluation shows that GROOT achieves faster and more stable frame rates than any previous method for streaming and visualizing volumetric videos on mobile devices.
Published: 2020

26. Cost Estimation for Configurable Model-Driven SoC Designs Using Machine Learning

Author: Robert Wille, Keerthikumara Devarajegowda, Edoardo Mosca, Michael Werner, Lorenzo Servadei, and Wolfgang Ecker
Subjects: Structure (mathematical logic), Design stage, Cost estimate, Computer science, business.industry, 02 engineering and technology, Machine learning, computer.software_genre, External Data Representation, 020202 computer hardware & architecture, Logic synthesis, Power consumption, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), 020201 artificial intelligence & image processing, Electronic design automation, Artificial intelligence, business, computer
Abstract: The complexity of today's Systems on Chip (SoCs) forces designers to use higher levels of abstraction. Here, early design decisions are conducted on abstract models while different configurations describe how to actually realize the desired SoC. Since those decisions severely affect the final costs of the resulting SoC (in terms of utilized area, power consumption, etc.), a fast and accurate cost estimation is essential at this design stage. Additionally, the resulting costs heavily depend on the adopted logic synthesis algorithms, which optimize the design towards one or more cost objectives. But how to structure a cost estimation method that supports multiple configurations of an SoC, implemented by use of different synthesis strategies, remains an open question. In this work, we address this problem by providing a cost estimation method for a configurable SoC using Machine Learning (ML). A key element of the proposed method is a data representation which describes SoC configurations in a way that is suited for advanced ML algorithms. Experimental evaluations conducted within an industrial environment confirm the accuracy as well as the efficiency of the proposed method.
Published: 2020

27. KDAP

Author: S. Sitharama Iyengar, Simran Setia, Amit Arjun Verma, and Neeru Dubey
Subjects: World Wide Web, Class (computer programming), Markup language, Process (engineering), Computer science, Benchmark (surveying), Knowledge building, Unavailability, External Data Representation, Representation (mathematics)
Abstract: With the success of crowdsourced portals, such as Wikipedia, Stack Overflow, Quora, and GitHub, a class of researchers is driven towards understanding the dynamics of knowledge building on these portals. Even though collaborative knowledge building portals are known to be better than expert-driven knowledge repositories, limited research has been performed to understand the knowledge building dynamics in the former. This is mainly due to two reasons: first, the unavailability of a standard data representation format; second, the lack of proper tools and libraries to analyze the knowledge building dynamics. We describe Knowledge Data Analysis and Processing Platform (KDAP), a programming toolkit that is easy to use and provides high-level operations for analysis of knowledge data. We propose Knowledge Markup Language (Knol-ML), a standard representation format for the data of collaborative knowledge building portals. KDAP can process the massive data of crowdsourced portals like Wikipedia and Stack Overflow efficiently. As a part of this toolkit, a data-dump of various collaborative knowledge building portals is published in Knol-ML format. The combination of Knol-ML and the proposed open-source library will help the knowledge building community to perform benchmark analysis. URL: https://github.com/descentis/kdap Supplementary Material: https://bit.ly/2Z3tZK5
Published: 2020

28. On 3d Face Attributes Analysis Using Deep Learning: A Preliminary Case Study on Gender and Ethnicity Recognition

Author: Daning Wang, Yicheng Fan, and Qijun Zhao
Subjects: Computer science, business.industry, Deep learning, Point cloud, Elevation, Pattern recognition, 02 engineering and technology, External Data Representation, 030210 environmental & occupational health, 03 medical and health sciences, 0302 clinical medicine, Benchmark (surveying), Face (geometry), Normal mapping, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Normal
Abstract: Human faces provide us with not only identity, but also demographic attributes like gender and ethnicity. Recognition of such attributes from 2D face images has developed rapidly due to deep learning (DL). However, the effectiveness of DL in facial attribute analysis using 3D data is still unknown. This paper systematically investigates the performance of DL-based 3D face gender and ethnicity recognition from three aspects: data representation, data augmentation, and comparison to the state-of-the-art. Using two typical deep networks in the literature, five representations including point clouds, depth images, normal maps, HHA (Horizontal disparity, Height over ground, and Angle between local surface normal and gravity direction) and DAE (Depth, Azimuth and Elevation angles of surface normal) maps are compared on two benchmark databases, FRGC v2 and BU3D-FE. Data augmentation by synthesizing multiview 3D faces is proven effective in cross-database evaluation, and the proposed DAE-based deep model effectively advances the state-of-the-art.
Published: 2020

29. Spatio-temporal Conditioned Language Models

Author: Juglar Diaz
Subjects: Hierarchy, Information retrieval, Artificial neural network, Computer science, Context (language use), Language model, External Data Representation, Mobile device, Task (project management)
Abstract: The ubiquitous availability of mobile devices with GPS capabilities and the popularity of social media platforms have created a rich source for textual data with spatio-temporal information. Also, other domains like crime incident description and search engine queries, can provide spatio-temporal textual data. These data sources can be used to discover space-time related insights of human behavior. This work focuses on modeling text that is associated with a particular time and place. We extend the traditional language modeling task from natural language processing to language modeling under spatio-temporal conditions. This task definition allows us to use the same evaluation framework used in language modeling. A model for spatio-temporal text data representation should be able to capture the patterns that guide how text is generated in a spatio-temporal context. We aim to develop neural network models for language modeling conditioned on spatio-temporal variables with the ability to capture properties such as: neighborhood, periodicity and hierarchy.
Published: 2020

30. SciPuRe

Author: Martin Lentschat, Juliette Dibie-Barthelemy, Patrice Buche, Mathieu Roche, Dibie, Juliette, Mathématiques et Informatique Appliquées (MIA-Paris), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-AgroParisTech-Université Paris-Saclay, Ingénierie des Agro-polymères et Technologies Émergentes (UMR IATE), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-Centre international d'études supérieures en sciences agronomiques (Montpellier SupAgro)-Université de Montpellier (UM)-Institut national d’études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS), Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Département Environnements et Sociétés (Cirad-ES), and Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)
Subjects: [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], 0301 basic medicine, Information extraction, Text Mining, Computer science, Process (engineering), 02 engineering and technology, Representation (arts), [INFO] Computer Science [cs], computer.software_genre, External Data Representation, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Set (abstract data type), 03 medical and health sciences, 0202 electrical engineering, electronic engineering, information engineering, False positive paradox, Information retrieval, [INFO]Computer Science [cs], Relevance (information retrieval), ComputingMilieux_MISCELLANEOUS, business.industry, Rank (computer programming), Ontological and Terminological Resource, Identification (information), 030104 developmental biology, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Natural language processing
Abstract: Retrieving entities associated with experimental data in the textual content of scientific documents faces a number of challenges. One of them is the assessment of the extracted entities for further processing, especially the identification of false positives. We present in this paper SciPuRe (Scientific Publication Representation): a new representation of entities. The extraction process presented in this paper is driven by an Ontological and Terminological Resource (OTR). It is applied to the extraction of entities associated with food packaging permeabilities, which can be symbolic (e.g. the Packaging "low density polyethylene") or quantitative (e.g. the Temperature "25", "°C" or the H20_Permeability "4.34 * 10-3", "cm3 μm-2 d-1 kPa"). A representation of each entity, composed of a set of features, is built during the extraction process. These features can be gathered in three categories: Ontological, Lexical and Structural. The features of SciPuRe are used to compute Relevance scores that consider the different information available for each entity extracted. Such Relevance scores inform the usefulness of SciPuRe and can then be used to rank the extraction results and discard false positives.
Published: 2020

31. Introducing Data Analytics Concepts in a CS Course for Non-Majors

Author: Zhuojun Duan, Andrew Jung, and Ingrid Russell
Subjects: Important research, business.industry, Computer science, Computational thinking, ComputingMilieux_COMPUTERSANDEDUCATION, Data analysis, Software engineering, business, External Data Representation, Course (navigation)
Abstract: We present a curricular model for introducing data analytics concepts into an introductory computer science course for non-majors. This is accomplished through the design and implementation of hands-on laboratory projects using the Python programming language and associated tools. While introducing students to an important research area, we believe the use of these projects improves students' learning experiences, enabling them to apply and relate fundamental computational thinking concepts of algorithmic reasoning, data representation, and computational efficiency to data analytics problems. We present the curricular modules, as well as preliminary experiences using them.
Published: 2020

32. Automated ontology-based annotation of scientific literature using deep learning

Author: Somya D. Mohanty, Prashanti Manda, and Saed Sayedahmed
Subjects: 0303 health sciences, Jaccard index, 020205 medical informatics, business.industry, Computer science, Deep learning, 02 engineering and technology, Scientific literature, Ontology (information science), Semantics, computer.software_genre, External Data Representation, 03 medical and health sciences, Annotation, Text mining, Named-entity recognition, Semantic similarity, 0202 electrical engineering, electronic engineering, information engineering, Ontology, Artificial intelligence, business, computer, Natural language processing, 030304 developmental biology, Data integration
Abstract: Representing scientific knowledge using ontologies enables data integration, consistent machine-readable data representation, and allows for large-scale computational analyses. Text mining approaches that can automatically process and annotate scientific literature with ontology concepts are necessary to keep up with the rapid pace of scientific publishing. Here, we present deep learning models (Gated Recurrent Units (GRU) and Long Short Term Memory (LSTM)) combined with different input encoding formats for automated Named Entity Recognition (NER) of ontology concepts from text. The Colorado Richly Annotated Full Text (CRAFT) gold standard corpus was used to train and test our models. Precision, Recall, F-1, and Jaccard semantic similarity were used to evaluate the performance of the models. We found that GRU-based models outperform LSTM models across all evaluation metrics. Surprisingly, considering the top two probabilistic predictions of the model for each instance instead of the top one resulted in a substantial increase in accuracy. Inclusion of ontology semantics via subsumption reasoning yielded modest performance improvement.
Published: 2020

33. An Evaluation of Methods of Compressing Doubles

Author: Jacob Spiegel
Subjects: Computer engineering, Series (mathematics), Computer science, Compression (functional analysis), External Data Representation, Throughput (business), Data compression
Abstract: Data compression is a problem with far-reaching implications across science and industry. In the era of big data, methods for efficient compression are crucial to achieve compact data representation, low-latency data transfers, and high throughput during query execution. Due to the explosion of Internet-of-Things applications, a large portion of this data is in the form of double-precision floating-point numbers. Despite the plethora of methods for compression, a comprehensive evaluation across real-world data and applications is still missing. In this paper, we perform such a comparison of methods and evaluate their performance in terms of compression ratio and throughput achieved across two dataset repositories of time series and featurized machine-learning problems, as well as on a dataset of machine logs.
Published: 2020

34. Effective Data Versioning for Collaborative Data Analytics

Author: Silu Huang
Subjects: SQL, Information retrieval, Relational database, Computer science, business.industry, Data management, Data transformation, Information repository, External Data Representation, Query language, Data model, Data analysis, business, computer, computer.programming_language
Abstract: With the massive proliferation of datasets in a variety of sectors, data science teams in these sectors spend vast amounts of time collaboratively constructing, curating, and analyzing these datasets. Versions of datasets are routinely generated during this data science process, via various data processing operations like data transformation and cleaning, feature engineering and normalization, among others. However, no existing systems enable us to effectively store, track, and query these versioned datasets, leading to massive redundancy in versioned data storage and making true collaboration and sharing impossible. In my PhD thesis, we develop solutions for versioned data management for collaborative data analytics. In the first part of my dissertation, we extend a relational database to support versioning of structured data. Specifically, we build a system, OrpheusDB, on top of a relational database with a carefully designed data representation and an intelligent partitioning algorithm for fast version control operations. OrpheusDB inherits many of the benefits of relational databases, while also compactly storing, keeping track of, and recreating versions on demand. However, OrpheusDB implicitly makes a few assumptions, namely: (a) the SQL assumption: a SQL-like language is the best fit for querying data and versioning information; (b) the structural assumption: the data is in a relational format with a regular structure; (c) the from-scratch assumption: users adopt OrpheusDB from the very beginning of their project and register each data version along with full metadata in the system. In the second part of my dissertation, we remove each of these assumptions, one at a time. First, we remove the SQL assumption and propose a generalized query language for querying data along with versioning and provenance information. Second, we remove the structural assumption and develop solutions for compact storage and fast retrieval of arbitrary data representations [4]. Finally, we remove the "from-scratch" assumption, by developing techniques to infer lineage relationships among versions residing in an existing data repository.
Published: 2020

35. Outlier detection based on sparse coding and neighbor entropy in high-dimensional space

Author: Meng Chow, Siyu Shao, and Ping Gu
Subjects: Computer science, business.industry, 020206 networking & telecommunications, Pattern recognition, 02 engineering and technology, External Data Representation, Outlier, 0202 electrical engineering, electronic engineering, information engineering, Entropy (information theory), 020201 artificial intelligence & image processing, Anomaly detection, Artificial intelligence, business, Linear combination, Neural coding, High dimensional space
Abstract: Outlier detection is an important branch of data mining and plays a vital role in a broad range of applications including network-traffic anomaly detection, credit fraud prevention, etc. Based on the assumption that a dataset can be approximately reconstructed by linear combinations of dictionary atoms, some detection algorithms initially project the data to a higher dimensional manifold such that the data representation becomes sparse. Unlike previous sparse coding based approaches, our method SNOD (Sparse coding and Neighbor entropy based Outlier Detection) can detect local and global outliers and construct neighborhood in a self-manner. Finally, the outlier score of each sample is computed using local reconstruction coefficients. Experiments on several benchmark datasets and the comparison to the state-of-the-art methods validate the advantages of our algorithm.
Published: 2020

36. A CS Course for Non-Majors Based on the Arduino Platform

Author: Ingrid Russell, Aaron Gold, and Carolyn Pe Rosiene
Subjects: 0209 industrial biotechnology, Computer science, Computational thinking, 05 social sciences, 050301 education, 02 engineering and technology, External Data Representation, Course (navigation), 020901 industrial engineering & automation, Human–computer interaction, Arduino, Active learning, ComputingMilieux_COMPUTERSANDEDUCATION, 0503 education
Abstract: We present a model for enhancing an introductory computer science course for non-majors through the use of the Arduino platform. We have developed and tested curricular modules and associated hands-on laboratories for this model. The use of the highly visual and interactive Arduino system has improved students' learning experiences, enabling them to apply and relate fundamental computational thinking concepts of algorithmic reasoning, data representation, and computational efficiency to real-world problems. Assessment results show that the approach has been effective. We present the curricular modules, our experiences using them, as well as assessment results.
Published: 2020

37. ECG sonification

Author: Shaid Hasan, Iqbal Kabir, and Pratic A Muntakim
Subjects: 0303 health sciences, 03 medical and health sciences, Computer science, Human–computer interaction, Sonification, Cardiac pathology, 0202 electrical engineering, electronic engineering, information engineering, Data analysis, 020201 artificial intelligence & image processing, 02 engineering and technology, External Data Representation, 030304 developmental biology, Visualization
Abstract: Easy detection of electrocardiogram (ECG) data is highly required in modern clinical systems for different diseases. The existing technique to represent and analyze data is visualization. An alternative way of data representation, known as sonification, can bring a revolutionary development to many clinical applications. In this work, we have applied a sonification technique to an ECG dataset and conducted a user study with 20 undergraduate students on the diagnosis of cardiac pathologies. We have also made a user study comparison between the sonification and visualization techniques. Our study can be a foundation for further sonification and medical research.
Published: 2019

38. Generation of Test Cases for Testing SuperSQL

Author: Motomichi Toyama, Amulya Bathini, and Kento Goto
Subjects: SQL, computer.internet_protocol, Programming language, Computer science, Relational database, Extension (predicate logic), Query language, computer.software_genre, External Data Representation, Test case, Nesting (computing), computer, XML, computer.programming_language
Abstract: SuperSQL is an extension of SQL which generates data in various formats like HTML, PDF, XML, among many others. The same data is represented in different forms according to the user, due to which it is called a data representation and publishing language. This research is to provide help in testing the SuperSQL processor. In this study, possible test cases that are used for testing the SuperSQL system are generated using a combinatorial algorithm that was constructed. This algorithm generated a combinatorially explosive number of test cases that are required for testing the SuperSQL system. A software tool called SStest was made to exhibit these test cases for use and also to manage (execute, add, delete) these test cases.
Published: 2019

39. A generalized semantic representation for procedural generation of rooms

Author: J.T. Balint, Rafael Bidarra, Foaad Khosmood, Johanna Pirker, Thomas Apperley, and Sebastian Deterding
Subjects: Theoretical computer science, Computer science, Node (networking), 05 social sciences, Bayesian network, Data representation, 050801 communication & media studies, 020207 software engineering, 02 engineering and technology, 3D content generation, Procedural content generation, Semantics, Object (computer science), External Data Representation, 0508 media and communications, 0202 electrical engineering, electronic engineering, information engineering, Graph (abstract data type), Representation (mathematics), Factor graph
Abstract: Procedural generation of rooms aims to create virtual environments that mimic common patterns found in real-world indoor locations, like offices or bedrooms. Graph-based models (e.g. factor graphs or Bayesian networks) have often been used to represent a typical location's objects and their occurrence likelihood (nodes), as well as their inter-relationships (edges). Previous methods have struggled to represent object semantics in their graph nodes; specifically, they fail to fully and effectively support notions such as abstraction (e.g. generic seat instead of chair) and replication (e.g. cups instead of cup). We propose a generalized representation and use for object semantics that overcomes the above limitations of graph-based models in the procedural generation of rooms. This node representation handles semantics as attributes, and clearly distinguishes the contribution of the attributes on the node from the potential effects of the node on the whole graph. We illustrate the additional expressive power of the resulting graph-based model for room generation, and show that it subsumes previous models as particular cases.
Published: 2019

40. An Efficient Intrusion Detection Model for Edge System in Brownfield Industrial Internet of Things

Author: Muna Al-Hawawreh, Frank den Hartog, and Elena Sitnikova
Subjects: Computer science, business.industry, Deep learning, Supervised learning, 020206 networking & telecommunications, 02 engineering and technology, Intrusion detection system, Machine learning, computer.software_genre, External Data Representation, Brownfield, Control system, 0202 electrical engineering, electronic engineering, information engineering, Unsupervised learning, 020201 artificial intelligence & image processing, Artificial intelligence, Enhanced Data Rates for GSM Evolution, business, computer
Abstract: The Industrial Internet of Things (IIoT) is bringing control systems online, leading to significant innovation in industry and business. However, this development also comes with new cybersecurity threats. As much of the value of IIoT systems resides at the edge tier, this makes them potentially desired targets for attackers. Protecting edge physical systems by monitoring them and identifying malicious activities based on an efficient detection model is therefore of utmost importance. This paper proposes a detection model based on deep learning techniques that can learn and test using data collected from Remote Telemetry Unit (RTU) streams of a gas pipeline system. It utilizes the sparse and denoising auto-encoder methods for unsupervised learning and deep neural networks for supervised learning to produce a high-level data representation from unlabeled and noisy data. Our results show that the proposed model achieves superior performance in identifying malicious activities.
Published: 2019

41. G2Q: Haskell constraint solving

Author: Ruzica Piskac, William T. Hallahan, and Anton Xue
Subjects: Programming language, Computer science, Interface (Java), 02 engineering and technology, computer.software_genre, Symbolic execution, External Data Representation, Constraint (information theory), 020204 information systems, Satisfiability modulo theories, Type safety, 0202 electrical engineering, electronic engineering, information engineering, Constraint programming, 020201 artificial intelligence & image processing, Haskell, computer, computer.programming_language
Abstract: Constraint solvers give programmers a useful interface to solve challenging constraints at runtime. In particular, SMT solvers have been used for a vast variety of different, useful applications, ranging from strengthening Haskell's type system to verifying network protocols. Unfortunately, interacting with constraint solvers directly from Haskell requires a great deal of manual effort. Data must be represented in and translated between two forms: that understood by Haskell, and that understood by the SMT solver. Such a translation is often done via printing and parsing text, meaning that any notion of type safety is lost. Furthermore, direct translations are rarely sufficient, as it typically takes many iterations on a design in order to get optimal -- or even acceptable -- performance from a SMT solver on large scale problems. This need for iteration complicates the translation issue: it is easy to introduce a runtime bug and frustrating to fix said bug. To address these problems, we introduce a new constraint solving library, G2Q. G2Q includes a quasiquoter that allows solving constraints written in Haskell itself, thus unifying data representation, ensuring correct typing, and simplifying development iteration. We describe the API to our library and its backend. Rather than a direct translation to SMT formulas, G2Q makes use of the G2 symbolic execution engine. This allows G2Q to solve problems that are out of scope when directly encoded as SMT formulas. Finally, we demonstrate the usability of G2Q via four example programs.
Published: 2019

42. Algorithmic Fairness

Author: Suresh Venkatasubramanian
Subjects: Point (typography), 020204 information systems, Field (Bourdieu), Accountability, 0202 electrical engineering, electronic engineering, information engineering, Key (cryptography), Fairness measure, 020201 artificial intelligence & image processing, 02 engineering and technology, External Data Representation, Transparency (behavior), Interpretability, Epistemology
Abstract: What happens when we replace - or augment - human decision-making with algorithms? This is a simple question, but the answers now define a new field of study - a field that I call algorithmic fairness, and that spans issues of fairness, discrimination, accountability, transparency, interpretability and responsibility, and so much more. While some of the early work in the area came out of data mining and machine learning, the field is now truly transdisciplinary, with contributions from all across computer science, as well as from all disciplines that touch on aspects of society - whether it be economics, philosophy, sociology, political science, or communication. In this tutorial, I'll try to do three things: I'll survey the main questions and some of the key insights we've developed over the years. I'll explain the web of connections between the technical and the social disciplines that make up this area, and I'll point to exciting directions that remain to be explored in both technical and social dimensions. Along the way I hope to illustrate what I think are some interesting "collisions" between computer science and the social sciences, and call for a reimagining of core ideas in our field, including the very idea of how we think about data representation.
Published: 2019

43. Recipes for Breaking Data Free

Author: Jordan Wirfs-Brock
Subjects: Process (engineering), Computer science, 05 social sciences, Activity tracker, Control (management), 020207 software engineering, 02 engineering and technology, External Data Representation, World Wide Web, Sonification, 0202 electrical engineering, electronic engineering, information engineering, 0501 psychology and cognitive sciences, Biometric data, 050107 human factors, Meaning (linguistics)
Abstract: How do the specific, predefined ways data brokers like Garmin or Fitbit render personal biometric data for us hinder, or enhance, our ability to find meaning in our data? Using a Garmin activity tracker as a platform, I present a series of recipes for alternative modes to experience personal data. Recipes are sets of instructions people can follow or remix to create personal, novel data interactions. These recipes highlight how the under-used medium of sound can be a creative material for producing meaning. When we allow our personal data to be brokered by companies like Garmin, we exchange the hidden labor of data representation for an easy-to-access personal data experience; but in doing so, we forfeit the ability to do unexpected things with our data. By exposing these tradeoffs, these recipes encourage us to reclaim control of our own data and embrace the effortful process of data representation as a sense-making practice.
Published: 2019

44. LABIOS

Author: Anthony Kougkas, Xian-He Sun, Hariharan Devarajan, and Jay Lofstead
Subjects: Elasticity (cloud computing), Asynchronous communication, business.industry, Analytics, Computer science, Distributed computing, Computer data storage, Big data, Provisioning, business, External Data Representation, Bridging (programming)
Abstract: In the era of data-intensive computing, large-scale applications, in both the scientific and the BigData communities, demonstrate unique I/O requirements leading to a proliferation of different storage devices and software stacks, many of which have conflicting requirements. In this paper, we investigate how to support a wide variety of conflicting I/O workloads under a single storage system. We introduce the idea of a Label, a new data representation, and we present LABIOS: a new, distributed, Label-based I/O system. LABIOS boosts I/O performance by up to 17x via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in-situ analytics via data provisioning. LABIOS demonstrates the effectiveness of storage bridging to support the convergence of HPC and BigData workloads on a single platform.
Published: 2019

45. Intrusion detection using dimensionality reduced soft matrix

Author: B. Uma and Arun Nagaraja
Subjects: Computer science, Network security, business.industry, Dimensionality reduction, Pattern recognition, Intrusion detection system, External Data Representation, ComputingMethodologies_PATTERNRECOGNITION, Preprocessor, Artificial intelligence, business, Literature survey, Classifier (UML), Curse of dimensionality
Abstract: Identifying attacks in real-time networks has become a recent point of focus in network security. A classifier's performance depends on its input data. The dataset fed as input to a classifier usually requires preprocessing to make it suitable for efficient processing. Preprocessing is the first stage of transforming data for better data representation. Dimensionality reduction is usually applied to datasets to reduce space and time complexity and to facilitate easy handling of large datasets. This paper gives an approach for performing dimensionality reduction of the input dataset. The result is the soft matrix. This dimensionality-reduced dataset is fed as input to the classifier. The paper is restricted to outlining the algorithm of the proposed approach.
Published: 2019

46. Boosted Race Trees for Low Energy Classification

Author: Georgios Tzimpragos, Dmitri B. Strukov, Timothy Sherwood, Dilip Vasudevan, and Advait Madhavan
Subjects: 010302 applied physics, Theoretical computer science, Computer science, Dataflow, Decision tree, 02 engineering and technology, External Data Representation, 01 natural sciences, Space exploration, 020202 computer hardware & architecture, CMOS, Frequency domain, 0103 physical sciences, 0202 electrical engineering, electronic engineering, information engineering, Architecture, Implementation
Abstract: When extremely low-energy processing is required, the choice of data representation makes a tremendous difference. Each representation (e.g. frequency domain, residue coded, log-scale) comes with a unique set of trade-offs: some operations are easier in that domain while others are harder. We demonstrate that race logic, in which temporally coded signals are processed in a dataflow fashion, provides interesting new capabilities for in-sensor processing applications. Specifically, with an extended set of race logic operations, we show that tree-based classifiers can be naturally encoded, and that common classification tasks can be implemented efficiently as a programmable accelerator in this class of logic. To verify this hypothesis, we design several race logic implementations of ensemble learners, compare them against state-of-the-art classifiers, and conduct an architectural design space exploration. Our proof-of-concept architecture, consisting of 1,000 reconfigurable Race Trees of depth 6, will process 15.2M frames/s, dissipating 613mW in 14nm CMOS.
Published: 2019

47. Arabic Sentiment Analysis based on Topic Modeling

Author: Abdelmonaime Lachkar and Mohammed Bekkali
Subjects: Topic model, Information retrieval, Conceptualization, Computer science, Scale (chemistry), Sentiment analysis, Social media, Representation (arts), External Data Representation, Semantics
Abstract: Users of social media generate a huge volume of reviews and comments. These reviews and comments express users' opinions about different topics. As a result, there is a great need to understand and classify these reviews. Sentiment Analysis Systems are a good way to overcome this problem. Reviews are considered short texts, and they differ from traditional documents in lacking sufficient contextual information. To address this issue, we propose an efficient representation for short text based on concepts instead of terms, which transforms the data representation into a shorter, more compact, and more predictive one. However, for the Arabic language, the majority of semantic resources are incomplete projects; this may present a serious problem for the coverage ratio of the Arabic language compared with other languages. To overcome this problem, we start with the assumption that terms belonging to the same topic share many semantic links in the same dataset, so their corresponding concepts will share the same semantic links in the same dataset. We suggest integrating Topic Modeling as a tool to bring together terms with the same semantic links. The proposed method has been tested and evaluated using the Large Scale Arabic Book Reviews Dataset, and the obtained results illustrate the interest and efficiency of our contribution.
Published: 2019

48. CluWords

Author: Felipe Viegas, Christian Gomes, Marcos André Gonçalves, Washington Luiz, Sergio Canuto, Sabir Ribas, Leonardo Rocha, and Thierson Couto Rosa
Subjects: Topic model, Word embedding, Computer science, business.industry, 02 engineering and technology, Space (commercial competition), computer.software_genre, External Data Representation, Matrix decomposition, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, business, Representation (mathematics), Cluster analysis, computer, Natural language processing, Word (computer architecture)
Abstract: In this paper, we advance the state-of-the-art in topic modeling by means of a new document representation based on pre-trained word embeddings for non-probabilistic matrix factorization. Specifically, our strategy, called CluWords, exploits the nearest words of a given pre-trained word embedding to generate meta-words capable of enhancing the document representation, in terms of both syntactic and semantic information. The novel contributions of our solution include: (i) the introduction of a novel data representation for topic modeling based on syntactic and semantic relationships derived from distances calculated within a pre-trained word embedding space and (ii) the proposal of a new TF-IDF-based strategy, particularly developed to weight the CluWords. In our extensive experimental evaluation, covering 12 datasets and 8 state-of-the-art baselines, we exceed (with a few ties) in almost all cases, with gains of more than 50% against the best baselines (achieving up to 80% against some runners-up). Finally, we show that our method is able to improve document representation for the task of automatic text classification.
Published: 2019

49. A comparative study on two XML editors (oxygon and ultraedit)

Author: QiBin Kang, Kien Tsong Chau, ZiHan Wei, and YouXuan Li
Subjects: World Wide Web, Markup language, Syntax (programming languages), SIMPLE (military communications protocol), Computer science, computer.internet_protocol, Functional features, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, External Data Representation, Variety (linguistics), computer, Research question, XML
Abstract: Extensible Markup Language (XML) is a simple, universal format supported by the W3C, designed for data representation, exchange and transition on the web between different applications. An XML editor is a markup language editor that provides a platform with functional features to facilitate XML editing. There is a variety of XML editors available in the market. Amongst the editors, the two most popular XML editors are Oxygen XML Editor and UltraEdit. Both of them offer easy functionalities such as syntax validation, auto-completion, multiple tabs, advanced search and replacement tools. However, novice programmers find it difficult to choose a suitable editor. For this reason, this paper aims to compare the features of Oxygen XML Editor and UltraEdit so that users are aware of their capability and capacity, leading them to select an editor that can fulfill their requirements. This paper discusses the objectives of the research, research question, literature review, research methodology and findings.
Published: 2019

50. Goal-based Ontology Creation for Natural Language Querying in SAP-ERP Platform

Author: Senthil Mani, Diptikalyan Saha, Neelamadhav Gantayat, and Jaydeep Sen
Subjects: SQL, Computer science, Database schema, 020207 software engineering, 02 engineering and technology, External Data Representation, Human–computer interaction, 020204 information systems, Schema (psychology), 0202 electrical engineering, electronic engineering, information engineering, Graph reduction, User interface, Mobile device, computer, Natural language, computer.programming_language
Abstract: The omnipresence of mobile devices coupled with recent advances in automatic speech recognition capabilities has led to a growing demand for natural language querying (NLQ) interfaces to retrieve information from data repositories. Going beyond consumer tools like Siri and Cortana towards industry settings, natural language interaction has been observed to be the next generation user interface to business applications (such as ERP systems) after GUI and touch-based UIs on mobile. It enables business users to ask questions in natural language without needing to have any programming knowledge (such as ABAP or SQL) or knowledge about the data representation mechanisms (such as data schema). State of the art NLQ systems such as ATHENA represent the domain schema in the form of an ontology and perform interpretation using the ontology. The primary challenge in developing an NLQ system for querying data in SAP-ERP is its large ontology, which results in inefficient interpretation. We propose a novel Steiner tree based algorithm which generates a relatively smaller goal-oriented ontology which does not affect the NLQ interpretation. We investigate practical ways to address the problem of precise interpretation generation and introduce an algorithm for Lazy Inclusion. We present the effectiveness of the proposed techniques in the SAP-ERP domain with a set of benchmark natural language questions.
Published: 2019
