36 results for "Prateek Jain"
Search Results
2. Reduction of Data Risk in Cloud Computing Through the Front Layer Security Mechanism
- Author
-
Rahul a and Prateek Jain
- Subjects
Multidisciplinary, Cloud computing security, Computer science, Interface (computing), Cloud computing, Cryptography, Client-side, Computer security, Encryption, Timestamp, Cloud storage
- Abstract
Background/objectives: Cloud computing is a centralised model that provides on-request network access to a shared pool of interconnected computing resources. These resources can be provisioned quickly and released with minimal management effort or cloud-provider interaction, giving customers a convenient interface to IT infrastructure. However, the promise of cloud computing, especially of shared clouds, can be undermined by security breaches that are hard to undo. Methods/statistical analysis: Encryption and decryption algorithms are proposed to protect the security of private data transmitted in the cloud. Findings: The growing need for secure, centrally managed cloud storage, combined with the benefits of client-side cryptography, motivates an innovative mechanism that treats the protection of private data as a third-party security and regulation concern. Improvements/applications: The proposed work provides a more secure framework layer over the data, but several aspects still need to be addressed. The current work suits only selected private data rather than large user inputs on the cloud; it can be extended to other types of modules, with timestamp-based encryption and improved efficiency, in future work. A technique called Each Word Secure Authentication (EWSA) is also planned to further improve cloud security for private data. Based on these security solutions, the article demonstrates a new secure client-side framework with advanced algorithm implementations.
Keywords: Cloud Enhancement, EWSA, Cloud Server, Secure Cloud Platform
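The client-side encryption idea above can be illustrated with a minimal sketch. This is an illustrative stand-in, not the authors' algorithm: it derives a keystream from a passphrase plus a timestamp salt via iterated SHA-256 and XORs it with the data before upload; a real deployment would use a vetted authenticated cipher.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    # Expand the key into a pseudo-random byte stream via iterated SHA-256.
    out, block = b"", key
    while len(out) < length:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:length]

def client_encrypt(plaintext: bytes, key: bytes) -> bytes:
    # Encrypt on the client, so the cloud provider only ever sees ciphertext.
    return bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))

client_decrypt = client_encrypt  # XOR with the same keystream inverts itself

# Key bound to a passphrase and an upload timestamp (fixed here for reproducibility).
key = hashlib.sha256(b"user-passphrase|2019-01-01T00:00:00").digest()
ciphertext = client_encrypt(b"private medical record", key)
assert client_decrypt(ciphertext, key) == b"private medical record"
```

Because the key never leaves the client, a compromise of the cloud store exposes only ciphertext.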
- Published
- 2019
3. Towards the Prevention of Car Hacking: A Threat to Automation Industry
- Author
-
Vasudha Arora, Vaibhav Jha, Prateek Jain, and Pooja Sharma
- Subjects
Multidisciplinary, Computer science, Attack surface, Computer security, Automation, CAN bus, Bluetooth, SAFER, Information system, The Internet, Hacker
- Abstract
Background/objectives: Connectivity provides a safer driving environment, but it also exposes an attack surface to hackers. With millions of cars on the road today and many more expected, passengers and drivers may be at risk. Methods/statistical analysis: This study discusses car hacking, a real threat to both automobiles and automation, and how it can be prevented by studying the controller area network (CAN) bus architecture, so that auto manufacturers place more emphasis on developing secure vehicular information systems. Findings: Hackers gain access to the car's systems via the internet, Bluetooth, etc. The more a car is automated, the more vulnerable it is to cyber-attack. When a car is connected to the internet, attackers can reach the vehicle's sensitive CAN bus and, by sending commands, hijack both non-safety and safety-critical functions such as the steering, accelerator, brakes, and clutch. Improvements/applications: This study gives a general overview of how the security features of a vehicle can be validated to protect it from black-hat hackers, potentially saving the millions of people who could fall victim to such menacing cyber-attacks.
Keywords: Car Hacking, CAN Bus, Cyber-attacks, OBD Hacking
- Published
- 2019
4. Nonconvex Optimization for Signal Processing and Machine Learning [From the Guest Editors]
- Author
-
Gesualdo Scutari, Anthony Man-Cho So, Wing-Kin Ma, and Prateek Jain
- Subjects
Signal processing, Optimization algorithm, Computer science, Applied Mathematics, Machine learning, Software deployment, Convex optimization, Special section, Artificial intelligence, Electrical and Electronic Engineering, Convex function
- Abstract
The articles in this special section focus on nonconvex optimization for signal processing and machine learning. Optimization is now widely recognized as an indispensable tool in signal processing (SP) and machine learning (ML). Indeed, many of the advances in these fields rely crucially on the formulation of suitable optimization models and deployment of efficient numerical optimization algorithms. In the early 2000s, there was a heavy focus on the use of convex optimization techniques to tackle SP and ML applications. This is largely due to the fact that convex optimization problems often possess favorable theoretical and computational properties and that many problems of practical interest have been shown to admit convex formulations or good convex approximations.
- Published
- 2020
5. Data Analysis FIFA World Cup Data Set
- Author
-
Palak Mittal, Nidhi Garg, Prateek Jain, and Mansi Sharma
- Subjects
Data set, Multidisciplinary, Information retrieval, Computer science, Analytics, Big data, Attendance, Python (programming language), Readability
- Abstract
Background/objectives: To analyze a data set related to the FIFA World Cup using suitable methods. Methods/statistical analysis: In this study we took the FIFA World Cup data sets and analyzed them using Python and R. The analysis focused on: a) which teams conceded more goals than they scored; b) the percentage of goals scored in the first half, second half, extra time, and penalty shootouts; and c) the highest average attendance at a particular stage of the tournament. Findings: Python is an emerging programming language and is thriving. Its easy-to-learn syntax, readability, object-oriented programming support, integration support, and extensive libraries make it adaptable to many fields, and its applications keep increasing.
Keywords: Big Data, Analytics, Data Set, API, Machine Learning
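Question (a) of the analysis above can be sketched in a few lines of Python. The match rows here are hypothetical stand-ins for the real World Cup data set:

```python
from collections import defaultdict

# Hypothetical match rows standing in for the real World Cup data set.
matches = [
    {"home": "BRA", "away": "GER", "home_goals": 1, "away_goals": 7},
    {"home": "NED", "away": "BRA", "home_goals": 0, "away_goals": 3},
    {"home": "GER", "away": "ARG", "home_goals": 1, "away_goals": 0},
]

scored, conceded = defaultdict(int), defaultdict(int)
for m in matches:
    scored[m["home"]] += m["home_goals"]; conceded[m["home"]] += m["away_goals"]
    scored[m["away"]] += m["away_goals"]; conceded[m["away"]] += m["home_goals"]

# (a) teams that conceded more goals than they scored
negative_diff = sorted(t for t in scored if conceded[t] > scored[t])
```

The same tally pattern extends to the half-by-half percentages and attendance questions once those columns are present.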
- Published
- 2019
6. Distributional Semantics Meets Multi-Label Learning
- Author
-
Vivek Gupta, Rahul Wadbude, Nagarajan Natarajan, Harish Karnick, Prateek Jain, and Piyush Rai
- Subjects
Computer science, Machine learning, Set (abstract data type), Benchmark (computing), Embedding, Distributional semantics, Artificial intelligence, Paragraph, Word (computer architecture)
- Abstract
We present a label embedding based approach to large-scale multi-label learning, drawing inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings. Besides leading to a highly scalable model for multi-label learning, our approach highlights interesting connections between label embedding methods commonly used for multi-label learning and paragraph embedding methods commonly used for learning representations of text data. The framework easily extends to incorporating auxiliary information such as label-label correlations; this is crucial especially when many training instances are only partially annotated. To facilitate end-to-end learning, we develop a joint learning algorithm that can learn the embeddings as well as a regression model that predicts these embeddings for the new input to be annotated, via efficient gradient based methods. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed models perform favorably as compared to state-of-the-art methods for large-scale multi-label learning.
- Published
- 2019
7. Designing the Empathetic Research IoT Network (ERIN) Chatbot for Mental Health Resources
- Author
-
Soussan Djamasbi, Prateek Jain, Christopher J. Chagnon, and Brandon Persons
- Subjects
Smart phone, Computer science, Mental health, Chatbot, World Wide Web, Service experience, User experience design, Laptop, Internet of Things
- Abstract
Grounded in the user experience driven innovation (UXDI) framework, we designed and developed a chatbot, ERIN, to help college students find resources about sensitive issues such as mental health and Title IX. ERIN was designed to be accessed from different devices. Throughout the design process, analysis of user interviews suggested that the service experience of the chatbot, and its adoption, may be strongly influenced by the medium through which it is accessed. To test this possibility, we conducted an experiment comparing user reactions to the chatbot on two different devices: a laptop and a smartphone. The preliminary results showed that the user experience of the chatbot was marginally significantly better in the mobile group, and that people in that group were marginally significantly more likely to adopt the chatbot. These results and their implications are discussed.
- Published
- 2021
8. Secure-iGLU: A Secure Device for Noninvasive Glucose Measurement and Automatic Insulin Delivery in IoMT Framework
- Author
-
Amit Joshi, Prateek Jain, and Saraju P. Mohanty
- Subjects
Authentication, Computer science, Fingerprint (computing), Physical unclonable function, Insulin delivery, Wearable computer, Continuous sensing, Computer security
- Abstract
The growth of healthcare technologies has made a great impact on human life over the last few years. Various innovations in implantable and wearable medical devices improve quality of life. Internet-of-Medical-Things (IoMT) based smart healthcare, with continuous sensing, connectivity, and automatic medication, is the latest trend. With this growth in technologies and connectivity, the security of these devices is a growing concern. The security of medical devices is important, as a security compromise may lead to critical situations. This paper explores the security aspect of the IoMT system with a non-invasive glucose monitoring device integrated with an insulin delivery system (called iGLU) as a specific example. We call this secure system Secure-iGLU. The paper presents a Hardware-Assisted Security (HAS) paradigm using a Physical Unclonable Function (PUF) to design Secure-iGLU. A PUF is a useful primitive for generating a fingerprint of the hardware, and it has great potential to mitigate the security problems of iGLU. The simulation results confirm the security of Secure-iGLU using a PUF in an IoMT setting with a safe insulin delivery system.
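PUF-based authentication follows a challenge-response protocol. The sketch below is a software stand-in: a keyed hash plays the role of the silicon PUF, whose responses in reality come from uncontrollable manufacturing variation; names like `crp_table` are illustrative, not from the paper.

```python
import hashlib

def puf_response(device_secret: bytes, challenge: bytes) -> str:
    # Toy stand-in: a real PUF derives this from physical device variation,
    # so the challenge-response mapping cannot be copied to another chip.
    return hashlib.sha256(device_secret + challenge).hexdigest()[:8]

# Enrollment: the server measures challenge-response pairs from the genuine device.
genuine = b"iglu-device-0421"
crp_table = {c: puf_response(genuine, c) for c in (b"c1", b"c2", b"c3")}

def authenticate(challenge: bytes, response: str) -> bool:
    # Later, the server replays a stored challenge and checks the response.
    return crp_table.get(challenge) == response

assert authenticate(b"c1", puf_response(genuine, b"c1"))
assert not authenticate(b"c1", puf_response(b"cloned-device", b"c1"))
```

A cloned device without the physical fingerprint produces wrong responses, so its insulin-delivery commands are rejected.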
- Published
- 2020
9. Understanding Financial Transaction Documents using Natural Language Processing
- Author
-
Prateek Jain, Kunal Verma, Aniket Gaikwad, and Pramod Gadde
- Subjects
Feature engineering, Purchase order, Computer science, Audit, Information extraction, Financial transaction, Artificial intelligence, Database transaction, Natural language processing, Sentence
- Abstract
In this paper, we share our experiences creating an NLP-based AI platform for finance, AppZen (http://www.appzen.com). AppZen's auditing technology is used by over 500 enterprise customers, including multiple Fortune 500 companies, for auditing employee expenses. AppZen's technology can process, analyze, and identify relationships between various kinds of transaction documents, such as receipts, invoices, contracts, and purchase orders. Each type of transaction document requires custom processing and analysis due to the diversity in the language and structure of the documents. Contracts typically require deep understanding of the content, such as identifying sentence structures and the entities and relationships within them, whereas receipts and invoices are somewhat semi-structured and require a different kind of processing. We elaborate on the challenges we have experienced and on the use of NLP in conjunction with a lightweight semantic layer to alleviate them.
- Published
- 2019
10. An IOT based Mechanism for Automatic Classroom Electricity Saving
- Author
-
Prateek Jain, Kaustav Mani Pathak, Priyanka Jain, Swati Yadav, and Veena Mittal
- Subjects
Computer science, Computer security, Carelessness, Microcontroller, Control system, Electricity, Internet of Things
- Abstract
Humans have communicated with each other for ages to solve problems and carry out complex work, but with the advancement of technology, the Internet of Things (IoT) promises a great future for communication between machines, which can be applied to many tasks that benefit the human community. Many problems, whether small or big, can be solved with the help of IoT. One such problem is saving electricity, which is one of everyone's major responsibilities, yet very few people actually do so, whether out of carelessness or lack of time. This paper aims to solve the problem of electricity wastage in school and college classrooms using an automatic electricity control system for a room, built with IoT sensors and microcontrollers.
- Published
- 2019
11. Prioritized Service Scheme with QOS Provisioning in a Cloud Computing System
- Author
-
Prateek Jain
- Subjects
QoS provisioning, Utility computing, Computer science, Distributed computing, Cloud computing, Provisioning, Computer network
- Published
- 2015
12. FlashProfile: A Framework for Synthesizing Data Profiles
- Author
-
Oleksandr Polozov, Saswat Padhi, Prateek Jain, Daniel Perelman, Sumit Gulwani, and Todd Millstein
- Subjects
FOS: Computer and information sciences, Machine Learning (cs.LG), Computer science, outlier detection, program synthesis, Safety, Risk, Reliability and Quality, Syntax, pattern learning, pattern profiles, data profiling, Artificial intelligence, hierarchical clustering, Software, Natural language processing
- Abstract
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153 tasks over 75 large real datasets, we observe a median profiling time of only ~0.7 s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.
Comment: 28 pages, SPLASH (OOPSLA) 2018
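The core idea, clustering strings by syntactic shape and describing each cluster with a regex-like pattern, can be sketched far more crudely than FlashProfile's synthesis approach. This toy abstracts each character to a token class and collapses runs:

```python
import re
from collections import defaultdict

def atom(ch):
    # Map each character to a regex-like token class.
    if ch.isdigit(): return r"\d"
    if ch.isalpha(): return r"[A-Za-z]"
    return re.escape(ch)

def pattern(s):
    # Abstract a string into a token pattern, collapsing runs
    # (e.g. "2019-07" and "1999-12" both become the same pattern).
    out = []
    for a in map(atom, s):
        if out and out[-1].rstrip("+") == a:
            out[-1] = a + "+"
        else:
            out.append(a)
    return "".join(out)

def profile(strings):
    # A "profile" is the set of patterns, each with its member strings.
    clusters = defaultdict(list)
    for s in strings:
        clusters[pattern(s)].append(s)
    return dict(clusters)

p = profile(["2019-07-01", "1999-12-31", "N/A", "TBD"])
```

Here the two dates fall into one cluster while "N/A" and "TBD" get their own patterns; FlashProfile additionally searches a pattern language to pick descriptions at a requested granularity.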
- Published
- 2017
13. Non-convex Optimization for Machine Learning
- Author
-
Prateek Jain and Purushottam Kar
- Subjects
FOS: Computer and information sciences, FOS: Mathematics, Optimization problem, Computer science, Inference, Machine Learning (stat.ML), Machine Learning (cs.LG), Machine learning, Artificial Intelligence, Tensor, Optimization and Control (math.OC), Rank, Relaxation (approximation), Gradient descent, Heuristics, Software
- Abstract
A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed, or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However, this approach may be lossy and nevertheless presents significant challenges for large-scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques; popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is to both introduce the rich literature in this area and equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.
Comment: The official publication is available from now publishers via http://dx.doi.org/10.1561/2200000058
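Projected gradient descent, one of the heuristics named above, is easy to sketch: take a gradient step on the objective, then project back onto the constraint set. A minimal pure-Python example, minimizing ||Ax - b||^2 subject to x >= 0 (the projection is just clipping at zero):

```python
def pgd(A, b, steps=500, lr=0.01):
    """Projected gradient descent for min ||Ax - b||^2 over x >= 0."""
    n = len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        # residual r = Ax - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(len(A))]
        # gradient g = 2 A^T r
        g = [2 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # gradient step, then projection onto the feasible set (clip at 0)
        x = [max(0.0, x[j] - lr * g[j]) for j in range(n)]
    return x

# With A = I and b = (2, -3), the constrained optimum is x = (2, 0).
x = pgd([[1.0, 0.0], [0.0, 1.0]], [2.0, -3.0])
```

For non-convex constraint sets (e.g. low-rank matrices), only the projection step changes; the convergence analyses surveyed in the monograph explain when such iterations still find good solutions.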
- Published
- 2017
- Full Text
- View/download PDF
14. Healthsurance – Mobile App for Standardized Electronic Health Records Database
- Author
-
Subhash Bhalla, Shivani Batra, Naman Jain, Sagar Bhargava, Prateek Jain, and Shelly Sachdeva
- Subjects
Database, Standardization, Computer science, Reliability, Semantic interoperability, World Wide Web, Centralized database, User interface, Graphical user interface
- Abstract
With the increasing popularity of Electronic Health Records (EHRs), there arises a need to understand their importance in terms of clinical context for standards-based health applications. Standards for semantic interoperability propose the use of archetypes for building health applications. EHRs are usually entered through graphical user interfaces. Generally, the user interface corresponding to the underlying medical concept is static, often made manually, and prone to errors. However, evolving knowledge demands dynamically generated user interfaces to reduce time, minimize cost, and enhance reliability. The current research implements a mobile app for a standardized Electronic Health Records database, termed HEALTHSURANCE. The application maintains its dynamic behavior by creating graphical user interfaces at runtime, drawing knowledge from artefacts (known as archetypes) available in standard clinical repositories (such as the Clinical Knowledge Manager). This provides easy, hassle-free operability without the need for a mobile developer. A standardized format and content uplift the credibility of the data and maintain a uniform, specific set of constraints used to evaluate the user's health. A generic centralized database is chosen for data storage to support evolution in clinical knowledge and to handle the heterogeneity of EHR data. Implementing a mobile app based on the archetype paradigm avoids reimplementing systems and migrating databases, and allows the creation of future-proof systems.
- Published
- 2017
15. Programming by Examples: PL Meets ML
- Author
-
Prateek Jain and Sumit Gulwani
- Subjects
Code refactoring, Computer science, Programming language, Application domain, Data wrangling
- Abstract
Programming by Examples (PBE) involves synthesizing intended programs in an underlying domain-specific language from example-based specifications. PBE systems are already revolutionizing the application domain of data wrangling and are set to significantly impact several other domains including code refactoring.
- Published
- 2017
16. An Impact of Digitalized Technologies Transformation in Healthcare Using Mobile Cloud Computing
- Author
-
Rahul Sharma and Prateek Jain
- Subjects
Multidisciplinary, Database, Computer science, Smart device, Cloud computing, Encryption, Mobile cloud computing, Operating system, Timestamp
- Abstract
Objectives: Smart devices with open-source platforms, cloud tools, and related technologies are at the heart of healthcare transformation. E-devices are replacing manual or semi-automated medical reports and graphs, private and secure clouds provide controlled access to automated healthcare records, and device-to-device collaboration tools are reforming information sharing among medical experts. Methods/analysis: The proposed work uses open-source APIs, such as the Google APIs, within the system, so that the whole system improves automatically rather than having to be redeployed after every new platform version. In the proposed scheme, each word is encrypted using the middle digits of the current timestamp as a sampling key, and the receiver-side panel automatically decrypts the private data using the same timestamp technique. Findings: The findings are summarized in the statements below. Statement 1: existing Microsoft-based systems work only on Windows platforms. Statement 2: existing systems rely on prebuilt algorithms, which are easy to decrypt with predefined decryption methods. Novelty/improvement: Smart-device-based healthcare applications become more advanced with open-source APIs, and private data becomes more secure.
- Published
- 2016
17. A Model based on Effective and Intelligent Sentiment Mining: A Review
- Author
-
Rajni Bhalla and Prateek Jain
- Subjects
Multidisciplinary, Information retrieval, Computer science, Machine learning, The Internet, Artificial intelligence, Sentence
- Abstract
Objectives: Due to the proliferation of the internet, spammers post fake reviews to promote or demote items. The objective is to propose a model that can extract spam reviews and implicit reviews. Methods/statistical analysis: Most research has concentrated on extracting only explicitly mentioned aspects. Extracting other aspects, such as implicit and spam content, gives more efficient results, including in rating. Pattern discovery methods are proposed to recognize different behaviors and discover spam reviews, and detection metrics can be used to score every review. Findings: Because of the absence of language constructs in the sentence, implicit aspect extraction is a complex problem. Most research has concentrated on extracting only explicitly mentioned aspects. The major weaknesses of existing methods are the lack of gold-standard datasets and the inability to achieve better accuracy. The framework builds a time series of the number of reviews for every brand and, after identifying suspicious intervals, distinguishes spam reviews from genuine assessments. Novelty/improvements: Before buying anything, we want to know the opinions of others. With the advancement of social websites, opinion shapes potential choices for customers, and manufacturers can even enhance the quality of their products. The proposed model also has the capacity to cover the majority of the elements that determine the effectiveness of an aspect-mining framework.
- Published
- 2016
18. 9/7 IWT Domain Data Hiding in Image using Adaptive and Non Adaptive Methods
- Author
-
V. Thanikaiselvan, Tushar Bansal, Shounak Shastri, and Prateek Jain
- Subjects
Steganography tools, Steganalysis, Multidisciplinary, Theoretical computer science, Cover (telecommunications), Steganography, Computer science, Cryptography, Least significant bit, Information hiding, Data mining, Digital watermarking
- Abstract
Background/Objectives: The advancement of information exchange through the internet has made it easy to transfer exact information quickly to a destination. To exchange information safely, with no alterations, there are many approaches, such as cryptography, steganography, and watermarking. Methods/Statistical Analysis: Steganography is a method of hiding secret data in another cover medium. Digital images are a more popular cover medium than others because of their frequent use on the internet. In this paper, a transform-domain steganography scheme based on the 9/7 Integer Wavelet Transform (IWT) is proposed. A pixel-adaptive embedding method using the LSB (Least Significant Bit) technique is employed to increase the security of the secret data embedded in the cover medium. Graph theory is used to select the coefficients randomly for embedding the secret messages. Findings: The proposed method is found to provide good security and high capacity, and the algorithm is applicable to all kinds of secret communication. Finally, the results are compared with the 5/3 IWT. Applications/Improvements: This method can be applied to all secret-communication applications, especially defence, telemedicine, etc. The proposed method can be developed further in terms of robustness against various steganalysis tools.
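The LSB embedding rule at the heart of the method can be sketched in isolation. The paper embeds in randomly selected 9/7 IWT coefficients with a pixel-adaptive rule; the toy below applies plain (non-adaptive) LSB substitution to raw values, which shows why the distortion per coefficient is at most 1:

```python
def embed_lsb(cover, bits):
    # Overwrite the least significant bit of each value with one message bit.
    out = list(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out

def extract_lsb(stego, n):
    # Recover the message by reading the LSBs back.
    return [v & 1 for v in stego[:n]]

cover = [137, 200, 64, 91, 18]          # stand-in for IWT coefficients
stego = embed_lsb(cover, [1, 0, 1])
assert extract_lsb(stego, 3) == [1, 0, 1]
assert all(abs(a - b) <= 1 for a, b in zip(cover, stego))
```

In the proposed scheme the positions are not sequential but chosen by a graph traversal, which is what an attacker without the traversal key cannot reproduce.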
- Published
- 2016
19. Fast Similarity Search for Learned Metrics
- Author
-
Brian Kulis, Prateek Jain, and Kristen Grauman
- Subjects
Machine Learning, Databases, Factual, Nearest neighbor search, Hash function, Pattern Recognition, Automated, Locality-sensitive hashing, Artificial Intelligence, Image Interpretation, Computer-Assisted, Mahalanobis distance, Applied Mathematics, Search engine indexing, Kernel method, Computational Theory and Mathematics, Metric (mathematics), Computer Vision and Pattern Recognition, Algorithms, Software
- Abstract
We introduce a method that enables scalable similarity search for learned metrics. Given pairwise similarity and dissimilarity constraints between some examples, we learn a Mahalanobis distance function that captures the examples' underlying relationships well. To allow sublinear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality makes it infeasible to learn an explicit transformation over the feature dimensions. We demonstrate the approach applied to a variety of image data sets, as well as a systems data set. The learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.
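The key trick, encoding a learned metric into locality-sensitive hash functions, can be sketched as follows. For a learned Mahalanobis metric M = G^T G, hashing Gx with random hyperplanes yields collision probabilities governed by angles under M. This is a simplified version of the paper's construction, with illustrative names:

```python
import random

def matvec(G, x):
    return [sum(gi * xi for gi, xi in zip(row, x)) for row in G]

def lsh_key(x, G, hyperplanes):
    # Each random hyperplane r contributes one bit: sign(r . Gx).
    # Hashing Gx instead of x bakes the learned metric into the hash.
    Gx = matvec(G, x)
    return tuple(1 if sum(ri * v for ri, v in zip(r, Gx)) >= 0 else 0
                 for r in hyperplanes)

random.seed(0)
G = [[2.0, 0.0], [0.5, 1.0]]   # learned factor, so that M = G^T G
planes = [[random.gauss(0, 1) for _ in range(2)] for _ in range(16)]

# Sign bits are invariant to positive scaling, so a point and its
# scaled copy land in the same bucket.
assert lsh_key([1.0, 3.0], G, planes) == lsh_key([2.0, 6.0], G, planes)
```

Candidates sharing many key bits are retrieved in sublinear time and then re-ranked with the exact learned distance.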
- Published
- 2009
20. Comparative Study of Preprocessing and Classification Methods in Character Recognition of Natural Scene Images
- Author
-
Yash Sinha, Prateek Jain, and Nirant Kasliwal
- Subjects
Computer science, Feature extraction, Image processing, Pattern recognition, Optical character recognition, Ensemble learning, Thresholding, Random forest, Histogram of oriented gradients, Median filter, Computer vision, Artificial intelligence
- Abstract
This paper presents an approach to character recognition in natural scene images. Recognizing such text is a challenging problem in the field of computer vision, more so than recognizing scanned documents, for several reasons. We propose a classification technique for characters based on a pipeline of image processing operations and ensemble machine learning techniques. This pipeline tackles cases where Optical Character Recognition (OCR) fails. We present a framework that comprises a sequence of operations, such as resizing, greyscaling, thresholding, morphological opening, and median filtering, on the images to handle background clutter, noise, multi-sized and multi-oriented characters, and variance in illumination. We used image pixels and HOG (Histogram of Oriented Gradients) features to train three different models based on Nearest-Neighbour, Random Forest, and Extra Tree classifiers. When the input images were pre-processed and HOG features were extracted and fed into the Extra Tree classifier, the model classified the characters with the highest accuracy among the models we tested. The proposed steps have been experimentally shown to yield better accuracy than present state-of-the-art classification techniques on the Chars74k dataset. In addition, the paper includes a comparative study elaborating on various image processing operations, feature extraction methods, and classification techniques.
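A drastically simplified version of the HOG feature used above can be sketched in pure Python: a single global histogram of gradient orientations weighted by gradient magnitude. Real HOG adds local cells and block normalization before feeding a classifier such as Extra Trees; this sketch only shows the orientation-binning idea:

```python
import math

def grad_orient_hist(img, bins=8):
    # Global histogram of gradient orientations, magnitude-weighted.
    # (Real HOG computes this per cell and normalizes over blocks.)
    h = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag:
                ang = math.atan2(gy, gx) % math.pi   # unsigned orientation
                h[min(int(ang / math.pi * bins), bins - 1)] += mag
    total = sum(h) or 1.0
    return [v / total for v in h]                 # L1-normalized descriptor

# A vertical edge produces purely horizontal gradients: all mass in bin 0.
edge = [[0, 0, 10, 10] for _ in range(4)]
feats = grad_orient_hist(edge)
```

Descriptors like `feats` (concatenated over cells in the real pipeline) are what the Nearest-Neighbour, Random Forest, and Extra Tree models are trained on.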
- Published
- 2015
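The preprocessing chain listed in entry 20 above (greyscaling, resizing, thresholding, median filtering) can be sketched in a few lines. This is a minimal, illustrative version only; the function name, fixed size, and threshold value are our assumptions, not the paper's configuration, and morphological opening is omitted:

```python
import numpy as np

def preprocess(img, size=32, threshold=128):
    """Toy preprocessing chain (names and parameters are ours):
    greyscale -> nearest-neighbour resize -> binary threshold -> 3x3 median filter."""
    grey = img.mean(axis=2) if img.ndim == 3 else img.astype(float)
    # Nearest-neighbour resize to a fixed square size.
    rows = np.arange(size) * grey.shape[0] // size
    cols = np.arange(size) * grey.shape[1] // size
    small = grey[np.ix_(rows, cols)]
    # Binary thresholding separates character strokes from background clutter.
    binary = (small > threshold).astype(np.uint8)
    # 3x3 median filter removes salt-and-pepper noise.
    padded = np.pad(binary, 1, mode="edge")
    windows = np.stack([padded[r:r + size, c:c + size]
                        for r in range(3) for c in range(3)])
    return np.median(windows, axis=0).astype(np.uint8)
```

The output of such a pipeline would then be fed to a HOG extractor and an ensemble classifier, as the abstract describes.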
21. Improved multiple sequence alignments using coupled pattern mining
- Author
-
Naren Ramakrishnan, Debprakash Patnaik, K. S. M. Tozammel Hossain, Srivatsan Laxman, Chris Bailey-Kellogg, and Prateek Jain
- Subjects
Multiple sequence alignment ,Basis (linear algebra) ,Base Sequence ,Applied Mathematics ,Molecular Sequence Data ,Proteins ,Biology ,computer.software_genre ,Pattern Recognition, Automated ,Statistical classification ,Pattern set mining ,Episode mining ,Sequence Analysis, Protein ,Genetics ,Key (cryptography) ,Data Mining ,Data mining ,Amino Acid Sequence ,Hidden Markov model ,computer ,Sequence Alignment ,Algorithms ,Conserved Sequence ,Biotechnology ,Sequence (medicine) - Abstract
We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed primarily to capture conservation in sequences, whereas couplings, or correlated mutations, are well known as an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in the other.) As a result, better exposition of couplings is sometimes one of the reasons practitioners hand-tweak MSAs. ARMiCoRe introduces a distinctly pattern-mining approach to improving MSAs: using frequent episode mining as a foundation, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional metrics of assessment. We demonstrate the effectiveness of ARMiCoRe on a large collection of datasets.
- Published
- 2014
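The correlated mutations that entry 21 builds on can be illustrated concretely. The sketch below scores column pairs of a toy MSA by mutual information, a common proxy for coupling; note this is our stand-in for exposition only, since ARMiCoRe itself uses frequent-episode mining, not MI:

```python
from collections import Counter
from math import log2

def coupled_columns(alignment, min_mi=0.5):
    """Score every column pair of an MSA (list of equal-length strings) by
    mutual information; pairs above min_mi are reported as 'coupled'.
    This is an illustrative proxy, not ARMiCoRe's pattern-mining method."""
    n, length = len(alignment), len(alignment[0])
    pairs = []
    for i in range(length):
        for j in range(i + 1, length):
            ci = Counter(s[i] for s in alignment)           # marginal of column i
            cj = Counter(s[j] for s in alignment)           # marginal of column j
            cij = Counter((s[i], s[j]) for s in alignment)  # joint distribution
            mi = sum(p / n * log2((p / n) / ((ci[a] / n) * (cj[b] / n)))
                     for (a, b), p in cij.items())
            if mi >= min_mi:
                pairs.append((i, j, round(mi, 3)))
    return pairs
```

On perfectly coupled columns (e.g. sequences "AC", "AC", "GT", "GT") the MI is 1 bit, while independent columns score 0.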
22. Automatic Domain Identification for Linked Open Data
- Author
-
Amit P. Sheth, Pascal Hitzler, Prateek Jain, and Sarasi Lalithsena
- Subjects
Computer science ,business.industry ,Linked data ,computer.software_genre ,Data structure ,Task (project management) ,Identification (information) ,Knowledge sources ,The Internet ,Data mining ,business ,computer ,Domain identification ,Reusability - Abstract
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.
- Published
- 2013
23. Constructing consumer profiles from social media data
- Author
-
Bogdan Alexe, Mauricio A. Hernández, Prateek Jain, Chitra Venkatramani, Rohit Wagle, Ioana Stanoi, Kirsten W. Hildrum, and Rajasekar Krishnamurthy
- Subjects
Multimedia ,business.industry ,Computer science ,Key (cryptography) ,Social media ,Comics ,business ,computer.software_genre ,Construct (philosophy) ,Data science ,computer ,Know-how - Abstract
Social media is playing a growing role in providing consumer feedback to companies about their products and services. To maximize the benefit of this feedback, companies want to know how the different consumer segments they are interested in, such as parents, frequent travelers, and comic book fans, react to their products and campaigns. In this paper, we describe how constructing consumer profiles is valuable for obtaining such insights. We present the challenges in analyzing noisy social media data and the techniques we employ for building the profiles. We also present detailed experimental results from the analysis of over seven billion messages to construct profiles of over 100 million consumers. We demonstrate how consumer profiles can help in understanding consumer feedback by different key segments using a TV show analysis scenario.
- Published
- 2013
24. A statistical and schema independent approach to identify equivalent properties on linked data
- Author
-
Krishnaprasad Thirunarayan, Prateek Jain, Amit P. Sheth, Kalpa Gunaratna, and Sanjaya Wijeratne
- Subjects
Information retrieval ,Computer science ,business.industry ,Schema (psychology) ,Cloud computing ,Linked data ,Data mining ,Precision and recall ,computer.software_genre ,business ,computer ,Semantic Web ,Data integration - Abstract
The Linked Open Data (LOD) cloud has recently gained significant attention in the Semantic Web community. Currently it consists of approximately 295 interlinked datasets with over 50 billion triples, including 500 million links, and continues to expand in size. This vast source of structured information has the potential to have a significant impact on knowledge-based applications. However, a key impediment to the use of the LOD cloud is limited support for data integration tasks over concepts, instances, and properties. Efforts to address this limitation for properties have focused on matching data-type properties across datasets; matching of object-type properties has not received similar attention. We present an approach that can automatically match object-type properties across linked datasets, primarily exploiting and bootstrapping from entity co-reference links such as owl:sameAs. Our evaluation, using sample instance sets taken from the Freebase, DBpedia, LinkedMDB, and DBLP datasets covering multiple domains, shows that our approach matches properties with high precision and recall (on average, an F-measure gain of 57% to 78%).
- Published
- 2013
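The bootstrapping idea in entry 24 can be sketched as follows: if two entities are linked by owl:sameAs, a property of one and a property of the other that keep producing matching values are candidates for equivalence. The helper below is our simplified stand-in (exact value match, a hypothetical score threshold), not the paper's statistical method:

```python
from collections import defaultdict

def match_properties(pairs, min_overlap=0.6):
    """Given a list of (props1, props2) dicts for owl:sameAs-linked entity
    pairs, score each cross-dataset property pair (p, q) by the fraction of
    pairs on which their values agree; keep those above min_overlap.
    Illustrative sketch only; the threshold and exact-match test are ours."""
    agree, seen = defaultdict(int), defaultdict(int)
    for props1, props2 in pairs:
        for p, v1 in props1.items():
            for q, v2 in props2.items():
                seen[(p, q)] += 1
                if v1 == v2:
                    agree[(p, q)] += 1
    return {pq: agree[pq] / seen[pq] for pq in seen
            if agree[pq] / seen[pq] >= min_overlap}
```

For instance, two co-referent film entities whose `dbo:director` and `lmdb:directed_by` values always coincide would yield that property pair with score 1.0.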
25. Ad impression forecasting for sponsored search
- Author
-
Abhirup Nath, Prateek Jain, Srivatsan Laxman, Navin Goyal, and Shibnath Mukherjee
- Subjects
Service (systems architecture) ,Search engine ,Computer science ,business.industry ,Common value auction ,Data mining ,Artificial intelligence ,Variance (accounting) ,Machine learning ,computer.software_genre ,business ,computer ,Impression - Abstract
A typical problem for a search engine (hosting a sponsored search service) is to provide advertisers with a forecast of the number of impressions their ads are likely to obtain for a given bid. Accurate forecasts have high business value, since they enable advertisers to select bids that lead to better returns on their investment. They also play an important role in services such as automatic campaign optimization. Despite its importance, the problem has remained relatively unexplored in the literature. Existing methods typically overfit to the training data, leading to inconsistent performance. Furthermore, some of the existing methods cannot provide predictions for new ads, i.e., for ads that are not present in the logs. In this paper, we develop a generative-model-based approach that addresses these drawbacks. We design a Bayes net to capture inter-dependencies between the query traffic features and the competitors in an auction. Furthermore, we account for variability in the volume of query traffic by using a dynamic linear model. Finally, we implement our approach on a production-grade MapReduce framework and conduct extensive large-scale experiments on substantial volumes of sponsored search data from Bing. Our experimental results demonstrate significant advantages over existing methods as measured by several accuracy/error criteria, an improved ability to provide estimates for new ads, and more consistent performance with smaller variance in accuracies. Our method can also be adapted to several other related forecasting problems, such as predicting the average position of ads or the number of clicks under budget constraints.
- Published
- 2013
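The dynamic linear model that entry 25 uses for query-traffic volume can be illustrated with its simplest instance, a local-level model filtered by the standard Kalman recursions. The noise variances below are assumed values for illustration, not parameters from the paper:

```python
import numpy as np

def local_level_filter(y, q=1.0, r=4.0):
    """Minimal local-level dynamic linear model: the latent traffic level is a
    random walk (process variance q) observed with noise (variance r).
    Returns the filtered mean at each step. Parameters are assumptions."""
    y = np.asarray(y, dtype=float)
    m, c = y[0], r                  # filtered mean and variance
    means = []
    for obs in y:
        c_pred = c + q              # predict: random-walk level
        k = c_pred / (c_pred + r)   # Kalman gain
        m = m + k * (obs - m)       # update with the new observation
        c = (1 - k) * c_pred
        means.append(m)
    return np.array(means)
```

On a flat series the filter locks onto the constant level; on a trending series the filtered mean tracks the trend with some lag, which is the behaviour that lets the forecaster absorb traffic variability.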
26. Moving beyond SameAs with PLATO
- Author
-
Kunal Verma, Peter Z. Yeh, Prateek Jain, Amit P. Sheth, and Pascal Hitzler
- Subjects
business.industry ,Property (programming) ,Computer science ,Cloud computing ,Linked data ,computer.software_genre ,Data science ,Variety (cybernetics) ,Information system ,Question answering ,Web application ,Web service ,business ,computer - Abstract
The Linked Open Data (LOD) Cloud has gained significant traction over the past few years. With over 275 interlinked datasets across diverse domains such as life science, geography, politics, and more, the LOD Cloud has the potential to support a variety of applications ranging from open-domain question answering to drug discovery. Despite its significant size (approx. 30 billion triples), the data is relatively sparsely interlinked (approx. 400 million links). A semantically richer LOD Cloud is needed to fully realize its potential. Data in the LOD Cloud are currently interlinked mainly via the owl:sameAs property, which is inadequate for many applications. Additional properties capturing relations based on causality or partonomy are needed to enable the answering of complex questions and to support applications. In this paper, we present a solution to enrich the LOD Cloud by automatically detecting partonomic relationships, which are well-established, fundamental properties grounded in linguistics and philosophy. We empirically evaluate our solution across several domains and show that our approach performs well at detecting partonomic properties between LOD Cloud data.
- Published
- 2012
27. Alignment-Based Querying of Linked Open Data
- Author
-
Pascal Hitzler, Peter Z. Yeh, Prateek Jain, Kunal Verma, Mariana Damova, Amit P. Sheth, and Amit Krishna Joshi
- Subjects
business.industry ,Computer science ,Cloud computing ,Linked data ,computer.file_format ,Ontology (information science) ,computer.software_genre ,Query plan ,Knowledge extraction ,Question answering ,Upper ontology ,SPARQL ,Data mining ,business ,computer - Abstract
The Linked Open Data (LOD) cloud is rapidly becoming the largest interconnected source of structured data on diverse domains. The potential of the LOD cloud is enormous, ranging from solving challenging AI problems such as open-domain question answering to automated knowledge discovery. However, due to the inherently distributed nature of LOD and the growing number of ontologies and vocabularies used in LOD datasets, querying over multiple datasets and retrieving LOD data remains a challenging task. In this paper, we propose a novel approach to querying linked data that uses alignments to process queries whose constituent data come from heterogeneous sources. We also report on our Alignment-based Linked Open Data Querying System (ALOQUS) and present its architecture and associated methods. Using the state-of-the-art alignment system BLOOMS, ALOQUS automatically maps concepts in users' SPARQL queries, written in terms of a conceptual upper ontology or a domain-specific ontology, to different LOD concepts and datasets. It then creates a query plan, sends sub-queries to the different endpoints, crawls for co-referent URIs, merges the results and presents them to the user. We also compare existing querying systems and demonstrate the added capabilities that the alignment-based approach provides for querying Linked Data.
- Published
- 2012
28. Mirror Descent Based Database Privacy
- Author
-
Abhradeep Thakurta and Prateek Jain
- Subjects
Privacy preserving ,Graph database ,Theoretical computer science ,Convex optimization algorithm ,Computer science ,Bipartite graph ,Database construction ,Mirror descent ,Data mining ,computer.software_genre ,Focus (optics) ,Database privacy ,computer - Abstract
In this paper, we focus on the problem of private database release in the interactive setting: a trusted database curator receives queries in an online manner for which it needs to respond with accurate but privacy preserving answers. To this end, we generalize the IDC (Iterative Database Construction) framework of [15,13] that maintains a differentially private artificial dataset and answers incoming linear queries using the artificial dataset. In particular, we formulate a generic IDC framework based on the Mirror Descent algorithm, a popular convex optimization algorithm [1]. We then present two concrete applications, namely, cut queries over a bipartite graph and linear queries over low-rank matrices, and provide significantly tighter error bounds than the ones by [15,13].
- Published
- 2012
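The Mirror Descent IDC loop in entry 28 has a well-known special case when the mirror map is the negative entropy: the update becomes multiplicative weights on a synthetic histogram. The sketch below shows that update shape only; it is non-private (the `answer` callback here returns exact values rather than noisy ones), and the step size and stopping threshold are our assumptions:

```python
import numpy as np

def mw_idc(true_hist, queries, answer, rounds=50, eta=0.5):
    """Non-private toy of an Iterative Database Construction loop with the
    entropy mirror map (multiplicative weights): maintain a synthetic
    histogram x and, whenever a linear query is answered badly, nudge x
    toward the answer. `answer(hist, q)` stands in for the private mechanism;
    this sketch illustrates the update, not the privacy analysis."""
    x = np.ones_like(true_hist, dtype=float) / len(true_hist)
    for _ in range(rounds):
        for q in queries:                    # q: a weight vector in [0, 1]^d
            err = x @ q - answer(true_hist, q)
            if abs(err) < 1e-3:              # query already answered well
                continue
            x = x * np.exp(-eta * err * q)   # multiplicative-weights step
            x = x / x.sum()                  # project back onto the simplex
    return x
```

With the four indicator queries over a length-4 histogram, the synthetic data converges to the true histogram, which is exactly the point of answering via an artificial dataset.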
29. SPARQL-ST: Extending SPARQL to Support Spatiotemporal Queries
- Author
-
Amit P. Sheth, Prateek Jain, and Matthew Perry
- Subjects
Information retrieval ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,InformationSystems_DATABASEMANAGEMENT ,computer.file_format ,Semantics ,Query language ,Temporal database ,Formal grammar ,Valid time ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,SPARQL ,RDF ,computer ,Semantic Web - Abstract
Spatial and temporal data are plentiful on the Web, and Semantic Web technologies have the potential to make these data more accessible and more useful. Semantic Web researchers have consequently made progress towards better handling of spatial and temporal data. SPARQL, the W3C-recommended query language for RDF, does not adequately support complex spatial and temporal queries. In this work, we present the SPARQL-ST query language, an extension of SPARQL for complex spatiotemporal queries. We present a formal syntax and semantics for SPARQL-ST. In addition, we describe a prototype implementation of SPARQL-ST and demonstrate its scalability with a performance study using large real-world and synthetic RDF datasets.
- Published
- 2011
30. Far-sighted active learning on a budget for image and video recognition
- Author
-
Kristen Grauman, Sudheendra Vijayanarasimhan, and Prateek Jain
- Subjects
Support vector machine ,Activity recognition ,Automatic image annotation ,business.industry ,Computer science ,Active learning ,Cognitive neuroscience of visual object recognition ,Artificial intelligence ,business ,Machine learning ,computer.software_genre ,computer ,Classifier (UML) - Abstract
Active learning methods aim to select the most informative unlabeled instances to label first, and can help to focus image or video annotations on the examples that will most improve a recognition system. However, most existing methods only make myopic queries for a single label at a time, retraining at each iteration. We consider the problem where at each iteration the active learner must select a set of examples meeting a given budget of supervision, where the budget is determined by the funds (or time) available to spend on annotation. We formulate the budgeted selection task as a continuous optimization problem where we determine which subset of possible queries should maximize the improvement to the classifier's objective, without overspending the budget. To ensure far-sighted batch requests, we show how to incorporate the predicted change in the model that the candidate examples will induce. We demonstrate the proposed algorithm on three datasets for object recognition, activity recognition, and content-based retrieval, and we show its clear practical advantages over random, myopic, and batch selection baselines.
- Published
- 2010
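The budgeted selection problem in entry 30 is, at its core, a knapsack-style trade-off between an example's expected benefit and its annotation cost. The paper solves a continuous optimization with far-sighted batch effects; the sketch below is only the classic greedy value-per-cost heuristic, shown as a simple hedged stand-in that respects the budget:

```python
def select_batch(candidates, budget):
    """Greedy budgeted selection: each candidate is a tuple
    (id, expected_improvement, annotation_cost). Rank by improvement per unit
    cost and take candidates while the budget allows. This is a baseline-style
    heuristic, not the paper's continuous-optimization method."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    batch, spent = [], 0.0
    for cid, gain, cost in ranked:
        if spent + cost <= budget:
            batch.append(cid)
            spent += cost
    return batch, spent
```

In an active-learning loop, `expected_improvement` would come from the learner's predicted change in its objective, and the selected batch would be sent for annotation before retraining.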
31. Ontology Alignment for Linked Open Data
- Author
-
Peter Z. Yeh, Kunal Verma, Amit P. Sheth, Prateek Jain, and Pascal Hitzler
- Subjects
Information retrieval ,business.industry ,Computer science ,Ontology-based data integration ,Bootstrapping (linguistics) ,Cloud computing ,Linked data ,Ontology (information science) ,computer.software_genre ,Benchmark (computing) ,Data mining ,business ,Semantic Web ,computer ,Ontology alignment ,ComputingMethodologies_COMPUTERGRAPHICS - Abstract
The Web of Data currently coming into existence through the Linked Open Data (LOD) effort is a major milestone in realizing the Semantic Web vision. However, the development of applications based on LOD faces difficulties due to the fact that the different LOD datasets are rather loosely connected pieces of information. In particular, links between LOD datasets are almost exclusively on the level of instances, while schema-level information is being ignored. In this paper, we therefore present a system for finding schema-level links between LOD datasets in the sense of ontology alignment. Our system, called BLOOMS, is based on the idea of bootstrapping from information already present on the LOD cloud. We also present a comprehensive evaluation which shows that BLOOMS outperforms state-of-the-art ontology alignment systems on LOD datasets. At the same time, BLOOMS is also competitive with these other systems on the Ontology Alignment Evaluation Initiative (OAEI) benchmark datasets.
- Published
- 2010
32. Active learning for large multi-class problems
- Author
-
Ashish Kapoor and Prateek Jain
- Subjects
Active learning (machine learning) ,business.industry ,Nearest neighbor search ,Probabilistic logic ,Semi-supervised learning ,Machine learning ,computer.software_genre ,Support vector machine ,Categorization ,Kernel (statistics) ,Metric (mathematics) ,Data mining ,Artificial intelligence ,business ,computer ,Mathematics - Abstract
Scarcity and infeasibility of human supervision for large scale multi-class classification problems necessitates active learning. Unfortunately, existing active learning methods for multi-class problems are inherently binary methods and do not scale up to a large number of classes. In this paper, we introduce a probabilistic variant of the K-nearest neighbor method for classification that can be seamlessly used for active learning in multi-class scenarios. Given some labeled training data, our method learns an accurate metric/kernel function over the input space that can be used for classification and similarity search. Unlike existing metric/kernel learning methods, our scheme is highly scalable for classification problems and provides a natural notion of uncertainty over class labels. Further, we use this measure of uncertainty to actively sample training examples that maximize discriminating capabilities of the model. Experiments on benchmark datasets show that the proposed method learns appropriate distance metrics that lead to state-of-the-art performance for object categorization problems. Furthermore, our active learning method effectively samples training examples, resulting in significant accuracy gains over random sampling for multi-class problems involving a large number of classes.
- Published
- 2009
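The core idea of entry 32, a probabilistic k-NN whose class-label uncertainty drives active sampling, can be sketched directly. The details below (softmax over negative squared distances, the `gamma` scale) are our assumptions for illustration; the paper additionally learns the metric/kernel rather than using raw Euclidean distance:

```python
import numpy as np

def knn_entropy(X_train, y_train, X_pool, k=3, gamma=1.0):
    """For each unlabeled pool point, form class probabilities from its k
    nearest labeled neighbours, weighted by exp(-gamma * squared distance),
    and return the label entropy. High entropy = most informative to label."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    ent = []
    for x in np.asarray(X_pool, dtype=float):
        d2 = ((X_train - x) ** 2).sum(axis=1)
        idx = np.argsort(d2)[:k]                      # k nearest neighbours
        w = np.exp(-gamma * d2[idx])                  # distance-based weights
        p = np.array([w[y_train[idx] == c].sum() for c in classes])
        p = p / p.sum()
        ent.append(-(p[p > 0] * np.log(p[p > 0])).sum())
    return np.array(ent)   # query the pool points with the highest entropy
```

A point sitting between two class clusters gets near-uniform class probabilities and hence high entropy, so it is queried first, which is the behaviour that yields the accuracy gains over random sampling.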
33. SPARQL Query Re-writing Using Partonomy Based Transformation Rules
- Author
-
Prateek Jain, Peter Z. Yeh, Amit P. Sheth, Kunal Verma, and Cory Henson
- Subjects
Web search query ,Information retrieval ,business.industry ,Computer science ,InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL ,InformationSystems_DATABASEMANAGEMENT ,computer.file_format ,Query optimization ,Query language ,Spatial query ,Query expansion ,Named graph ,Knowledge base ,Web query classification ,Ontology ,SPARQL ,Sargable ,business ,computer ,Spatial analysis ,RDF query language ,computer.programming_language - Abstract
Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. To query ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of the SPARQL query, which can result in long and complex queries. We present a novel approach that helps users write SPARQL queries over spatial data intuitively, rather than relying on knowledge of the ontology structure. Our framework re-writes queries using transformation rules that exploit part-whole relations between geographical entities, addressing the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third-party datasets and queries: evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in result retrieval and ease of query writing.
- Published
- 2009
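The partonomy-based rewriting of entry 33 can be sketched abstractly: a coarse spatial constraint is expanded into the transitive closure of part-of relations, so that fine-grained data still matches. The triple and rule representation below is entirely ours (a toy stand-in for SPARQL graph patterns), not the paper's rule format:

```python
def expand_spatial_constraint(triples, partonomy):
    """Toy partonomy transformation rule: rewrite a direct
    (?x, :locatedIn, Region) constraint into the set of Region plus everything
    the partonomy says is (transitively) part of it, akin to a VALUES-style
    expansion in SPARQL. Predicate names and formats are illustrative."""
    def parts_of(region, acc):
        for child in partonomy.get(region, []):
            if child not in acc:
                acc.add(child)
                parts_of(child, acc)   # recurse: parts of parts also match
        return acc
    rewritten = []
    for s, p, o in triples:
        if p == ":locatedIn":
            rewritten.append((s, p, tuple(sorted(parts_of(o, {o})))))
        else:
            rewritten.append((s, p, o))
    return rewritten
```

A query asking for cities located in the UK would thus also match data stating only that a city is in England or London, bridging the granularity mismatch the abstract describes.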
34. Integrating Stateful Services in Workflow
- Author
-
Vikram Sorathia, Prateek Jain, Zakir Laliwala, and Sanjay Chaudhary
- Subjects
Database ,Computer science ,business.industry ,computer.internet_protocol ,Services computing ,Service-oriented architecture ,computer.software_genre ,Workflow engine ,Workflow technology ,Business process management ,Business Process Execution Language ,Workflow ,Software engineering ,business ,computer ,Workflow management system - Abstract
Long-running business processes require the composition of services for task accomplishment. BPEL provides a mechanism to model business workflows among collaborating Web Services distributed across organizations. For effective execution and consistent outcomes from such composite services in a loosely coupled environment, management of state, transactions, notification and execution monitoring are the key requirements. Since the emergence of WS-RF, stateful Web services have become an integral part of the Grid environment. The Grid supports execution monitoring, resource sharing, and the integration and management of large-scale applications; such capabilities are the backbone of any large-scale enterprise application. Process-oriented workflow execution in a grid environment, with support for managing state, transactions and notification, is a challenging issue. In this paper we propose an architecture for integrating stateful services into a grid workflow. Development of the stateful services is based on the WS-RF specification, building services that support state management, transactions, notification and execution monitoring of a business process in a large-scale grid environment.
- Published
- 2006
35. Multi-objective Optimization for Adaptive Web Site Generation
- Author
-
Pabitra Mitra and Prateek Jain
- Subjects
DBSCAN ,Hierarchical agglomerative clustering ,Optimization problem ,business.industry ,Computer science ,Machine learning ,computer.software_genre ,Multi-objective optimization ,Index (publishing) ,Artificial intelligence ,business ,Cluster analysis ,computer ,Web site - Abstract
Designing web sites is a complex problem. Adaptive sites are those which improve themselves by learning from user access patterns. In this paper we consider the problem of index page synthesis for an adaptive website and frame it as a new type of multi-objective optimization problem. We give a solution to index page synthesis that uses the popular clustering algorithm DBSCAN along with NSGA-II, an evolutionary algorithm, to find the best index pages for a website. Our experiments show that very good candidate index pages can be generated automatically, and that our technique outperforms various existing methods such as PageGather, K-Means and Hierarchical Agglomerative Clustering.
- Published
- 2005
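The clustering stage of entry 35 is DBSCAN over user access patterns, whose output clusters would then be scored by NSGA-II. A minimal textbook DBSCAN is sketched below; the `eps` and `min_pts` values are illustrative, not the paper's configuration:

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=3):
    """Minimal textbook DBSCAN. Returns one integer label per point;
    -1 marks noise. Not the paper's exact setup, just the base algorithm."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    labels = np.full(n, -1)
    # Precompute the eps-neighbourhood (including self) of every point.
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    neigh = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neigh[i]) < min_pts:
            continue                       # not an unvisited core point
        visited[i] = True                  # grow a new cluster from i
        labels[i] = cluster
        frontier = list(neigh[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border or core point joins
            if not visited[j]:
                visited[j] = True
                if len(neigh[j]) >= min_pts:
                    frontier.extend(neigh[j])   # core points keep expanding
        cluster += 1
    return labels
```

In the paper's setting each point would represent a page (or access pattern), and each resulting cluster is a candidate grouping for an index page, with NSGA-II searching for the Pareto-optimal selections.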
36. Supporting complex thematic, spatial and temporal queries over semantic web data
- Author
-
Matthew Perry, Farshad Hakimpour, Prateek Jain, and Amit P. Sheth
- Subjects
Semantic query ,Information retrieval ,Computer science ,business.industry ,computer.file_format ,Linked data ,Social Semantic Web ,Semantic grid ,Semantic computing ,Semantic analytics ,SPARQL ,Semantic Web Stack ,business ,computer - Abstract
Spatial and temporal data are critical components in many applications. This is especially true in analytical domains such as national security and criminal investigation. Often, the analytical process requires uncovering and analyzing complex thematic relationships between disparate people, places and events. Fundamentally new query operators based on the graph structure of Semantic Web data models, such as semantic associations, are proving useful for this purpose. However, these analysis mechanisms are primarily intended for thematic relationships. In this paper, we describe a framework built around the RDF metadata model for analysis of thematic, spatial and temporal relationships between named entities. We discuss modeling issues and present a set of semantic query operators. We also describe an efficient implementation in Oracle DBMS and demonstrate the scalability of our approach with a performance study using a large synthetic dataset from the national security domain.