8 results for "Sudip Roy"
Search Results
2. TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines
- Author
- Paul Suganthan G. C., Martin Zinkevich, Zhuo Peng, Emily Caveness, Neoklis Polyzotis, and Sudip Roy
- Subjects
Computer science, Data management, Data validation, Machine learning, Scalability, Artificial intelligence
- Abstract
Machine Learning (ML) research has primarily focused on improving the accuracy and efficiency of training algorithms while paying much less attention to the equally important problem of understanding, validating, and monitoring the data fed to ML. Irrespective of the ML algorithms used, data errors can adversely affect the quality of the generated model. This means we need to adopt a data-centric approach to ML that treats data as a first-class citizen, on par with algorithms and infrastructure, the typical building blocks of ML pipelines. In this demonstration we showcase TensorFlow Data Validation (TFDV), a scalable data analysis and validation system for ML that we have developed at Google and recently open-sourced. The system is deployed in production as an integral part of TFX, an end-to-end machine learning platform at Google. It is used by hundreds of product teams at Google and has received significant attention from the open-source community as well. (A minimal usage sketch follows this entry.)
- Published
- 2020
- Full Text
- View/download PDF
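A minimal sketch of the workflow the entry above describes, using the open-source tensorflow_data_validation package: compute statistics over training data, infer a schema, and validate a new batch against it. The CSV paths are hypothetical placeholders, and a real deployment would curate the inferred schema by hand.
```python
# Minimal TFDV round trip: statistics -> schema -> validation.
# File paths are hypothetical placeholders.
import tensorflow_data_validation as tfdv

# Summary statistics over the training data.
train_stats = tfdv.generate_statistics_from_csv(data_location="train.csv")

# Infer an initial schema (types, domains, presence) from those statistics;
# in practice the schema is then reviewed and curated by the team.
schema = tfdv.infer_schema(statistics=train_stats)

# Validate a fresh batch of data against the schema and surface anomalies.
new_stats = tfdv.generate_statistics_from_csv(data_location="new_batch.csv")
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # renders a table in a notebook environment
```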
3. Factorization based dilution of biochemical fluids with micro-electrode-dot-array biochips
- Author
- Bhargab B. Bhattacharya, Partha Chakrabarti, Sohini Saha, Sudip Roy, Debraj Kundu, Sukanta Bhattacharjee, and Krishnendu Chakrabarty
- Subjects
Computer science, Sample preparation, Dilution, Factorization, Mixing, Biochip, MEDA
- Abstract
Sample preparation, an essential preprocessing step for biochemical protocols, is concerned with generating fluids that satisfy specific target ratios and error tolerances. Recent micro-electrode-dot-array (MEDA)-based digital microfluidic (DMF) biochips support both discrete and dynamic mixing models, a capability whose power has not yet been fully harnessed for implementing on-chip dilution and mixing of fluids. In this paper, we propose a novel factorization-based algorithm called FacDA for efficient and accurate dilution of a sample fluid on a MEDA chip. Simulation results over a large number of test cases with mixing-volume constraints in the range of 4-10 units show that, on average, FacDA requires around 38% fewer mixing steps and 52% fewer sample units, and generates approximately 23% less waste, compared to two prior dilution algorithms for MEDA chips. (An illustrative sketch of the baseline mixing model follows this entry.)
- Published
- 2019
- Full Text
- View/download PDF
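FacDA itself is not reproduced here, but the classical two-way (1:1) mix baseline that algorithms like FacDA are measured against fits in a few lines: approximate the target concentration as an n-bit binary fraction and scan the bits least-significant first, mixing the current droplet 1:1 with raw sample (bit = 1) or buffer (bit = 0) at each step. A purely illustrative sketch:
```python
# Classical bit-scanning two-way-mix dilution baseline; NOT the FacDA
# algorithm, just the standard (1:1) mixing model it is compared against.
def two_way_mix(target: float, n_bits: int = 8):
    """Approximate `target` in [0, 1] using n_bits 1:1 mix steps."""
    value = round(target * (1 << n_bits))        # n-bit binary approximation
    seq, conc = [], 0.0
    for i in range(n_bits):                      # scan bits LSB -> MSB
        bit = (value >> i) & 1
        conc = (conc + bit) / 2.0                # 1:1 mix: sample (1) or buffer (0)
        seq.append("sample" if bit else "buffer")
    return seq, conc                             # |conc - target| < 2**-n_bits

seq, conc = two_way_mix(0.3)
print(len(seq), "mix steps; achieved concentration:", conc)
```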
4. Data Management Challenges in Production Machine Learning
- Author
- Steven Euijong Whang, Martin Zinkevich, Neoklis Polyzotis, and Sudip Roy
- Subjects
Computer science, Data management, Data validation, Machine learning, Training set, Data science, Open research
- Abstract
The tutorial discusses data-management issues that arise in the context of machine learning pipelines deployed in production. Informed by our own experience with such large-scale pipelines, we focus on issues related to understanding, validating, cleaning, and enriching training data. The goal of the tutorial is to bring forth these issues, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art. (A toy data-validation sketch follows this entry.)
- Published
- 2017
- Full Text
- View/download PDF
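As a toy illustration of the kind of training-data validation the tutorial motivates (not an example taken from the tutorial), a few pandas checks already catch missing labels, out-of-range values, and duplicate rows; the column names are hypothetical:
```python
# Hypothetical pre-training sanity checks on a tabular training set.
import pandas as pd

def validate_training_data(df: pd.DataFrame) -> list:
    problems = []
    if df["label"].isna().any():
        problems.append("missing labels")
    if not df["age"].between(0, 130).all():
        problems.append("age out of expected range")
    if df.duplicated().any():
        problems.append("duplicate rows")
    return problems

df = pd.DataFrame({"age": [25, 240, 31], "label": [1, 0, None]})
print(validate_training_data(df))  # ['missing labels', 'age out of expected range']
```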
5. Goods
- Author
- Alon Halevy, Neoklis Polyzotis, Flip Korn, Sudip Roy, Steven Euijong Whang, Natalya F. Noy, and Christopher Olston
- Subjects
Computer science, Metadata, Data element, Schema, Enterprise data management, Metadata repository
- Abstract
Enterprises increasingly rely on structured datasets to run their businesses. These datasets take a variety of forms, such as structured files, databases, spreadsheets, or even services that provide access to the data. The datasets often reside in different storage systems, may vary in their formats, and may change every day. In this paper, we present GOODS, a project to rethink how we organize structured datasets at scale, in a setting where teams use diverse and often idiosyncratic ways to produce datasets and where there is no centralized system for storing and querying them. GOODS extracts metadata ranging from salient information about each dataset (owners, timestamps, schema) to relationships among datasets, such as similarity and provenance. It then exposes this metadata through services that allow engineers to find datasets within the company, monitor them, annotate them so that others can use them, and analyze the relationships between them. We discuss the technical challenges that we had to overcome to crawl and infer metadata for billions of datasets, to maintain the consistency of our metadata catalog at scale, and to expose the metadata to users. We believe that many of the lessons we learned apply to building large-scale enterprise-level data-management systems in general. (An illustrative catalog-entry sketch follows this entry.)
- Published
- 2016
- Full Text
- View/download PDF
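To make the metadata concrete, a GOODS-style catalog entry might carry fields along these lines; the class and field names are hypothetical, not taken from the paper:
```python
# Hypothetical per-dataset record for a GOODS-style metadata catalog.
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    path: str                                        # storage location / key
    owners: list = field(default_factory=list)       # responsible teams
    timestamp: float = 0.0                           # last-modified time
    schema: dict = field(default_factory=dict)       # field name -> type
    provenance: list = field(default_factory=list)   # upstream dataset paths
    similar_to: list = field(default_factory=list)   # content-similar datasets

catalog = {}
entry = DatasetEntry(path="/warehouse/logs/clicks/2016-01-01",
                     owners=["ads-team"],
                     schema={"user_id": "int64", "url": "string"})
catalog[entry.path] = entry  # keyed for lookup, monitoring, and annotation
```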
6. The Homeostasis Protocol
- Author
- Lucja Kot, Nate Foster, Christoph Koch, Hossein Hojjat, Sudip Roy, Bailu Ding, Johannes Gehrke, and Gabriel Bender
- Subjects
Computer science, Distributed computing, Databases (cs.DB), Correctness, Strong consistency, Replication, Program analysis, Database transaction
- Abstract
Datastores today rely on distribution and replication to achieve improved performance and fault tolerance. But the correctness of many applications depends on strong consistency properties, which can impose substantial overhead, since enforcing them requires coordinating the behavior of multiple nodes. This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. The key insight is to allow the state of the system to be inconsistent during execution, as long as this inconsistency is bounded and does not affect transaction correctness. In contrast to previous work, our approach uses program analysis to extract semantic information about permissible levels of inconsistency and is fully automated. We then employ a novel homeostasis protocol that allows sites to operate independently, without communicating, as long as any inconsistency is governed by appropriate treaties between the nodes. We discuss mechanisms for optimizing treaties based on workload characteristics to minimize communication, as well as a prototype implementation and experiments that demonstrate the benefits of our approach on common transactional benchmarks. (An escrow-style illustration follows this entry.)
- Published
- 2015
- Full Text
- View/download PDF
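The flavor of bounded inconsistency is easy to convey with an escrow-style example, a well-known special case rather than the homeostasis protocol itself: global stock is partitioned across sites, and each site sells from its local allocation with no communication, because the local treaty local_stock >= 0 at every site implies the global invariant that total stock stays non-negative.
```python
# Escrow-style illustration of treaty-governed, communication-free updates;
# a special case for intuition, NOT the paper's homeostasis protocol.
class Site:
    def __init__(self, allocation: int):
        self.local_stock = allocation      # this site's share of global stock

    def sell(self, qty: int) -> bool:
        # Purely local check: safe while the treaty local_stock >= 0 holds.
        # Breaking it would require re-negotiating allocations (not shown).
        if self.local_stock >= qty:
            self.local_stock -= qty
            return True
        return False

a, b = Site(60), Site(40)                  # global stock 100, split 60/40
assert a.sell(50) and b.sell(30)           # both commit with zero messages
print(a.local_stock + b.local_stock)       # 20 -- global invariant preserved
```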
7. Demand-Driven Mixture Preparation and Droplet Streaming using Digital Microfluidic Biochips
- Author
- Krishnendu Chakrabarty, Bhargab B. Bhattacharya, Partha Chakrabarti, Sudip Roy, and Srijan Kumar
- Subjects
Microfluidics, Digital microfluidics, Biochip, Latency, Demand driven, Electronic engineering, Process engineering
- Abstract
In many biochemical protocols, such as polymerase chain reaction, a mixture of fluids in a certain ratio is required repeatedly, and hence a sufficient quantity of the mixture must be supplied for assay completion. Existing sample-preparation algorithms for digital microfluidic (DMF) biochips emit only two target droplets per pass, so multiple costly passes are required to sustain a supply of mixture droplets. To alleviate this problem, we design a streaming engine on a DMF biochip that optimizes droplet emission depending on the demand and the available storage. Simulation results show a significant reduction in latency and reactant usage for mixture preparation. (A back-of-the-envelope sketch follows this entry.)
- Published
- 2014
- Full Text
- View/download PDF
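The cost of the multi-pass approach is simple to quantify: if each pass emits two target droplets, a demand of d droplets costs ceil(d / 2) complete passes, each repeating the full mixing sequence. A back-of-the-envelope sketch, with illustrative numbers only:
```python
# Passes needed when a dilution pass emits a fixed number of target droplets.
from math import ceil

def passes_needed(demand: int, droplets_per_pass: int = 2) -> int:
    return ceil(demand / droplets_per_pass)

for d in (2, 8, 32):
    print(d, "droplets ->", passes_needed(d), "passes")  # 1, 4, 16 passes
```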
8. A layout-aware physical design method for constructing feasible QCA circuits
- Author
- M. Bubna, Naresh Shenoy, Sudip Roy, and Subhra Mazumdar
- Subjects
Quantum dot cellular automaton, Clock rate, Grid, Netlist, Electronic design automation, Routing, Physical design, Electronic circuit
- Abstract
Quantum-dot Cellular Automata (QCA) is an emerging computing paradigm in which logical operations as well as signal transmission occur through Coulombic charge interaction between neighbouring QCA cells, moderated by a 4-phase QCA clock potential. Thermodynamic constraints, such as a bound on the number of QCA cells in a clocking zone, must be obeyed to obtain a logically correct and feasible QCA circuit. These constraints depend on design factors, such as the total wirelength of a circuit and the height of a clocking zone, that are not available until an actual circuit layout is obtained. In this paper, the various design-automation problems associated with obtaining a feasible QCA layout are addressed. The layout-generation problem is formulated as embedding the netlist digraph in an orthogonal grid, which provides an abstraction of the physical layout to be obtained. Novel graph-theoretic algorithms are proposed to perform placement and global routing, and design parameters such as clock rate, wasted area, and total wirelength are used to estimate the quality of the resulting layout. Planarization methods are also used to remove all wire crossings, which are expensive to fabricate. Applied to a large number of MCNC'93 and ISCAS'89 benchmarks, the methods show good results. (A toy grid-embedding sketch follows this entry.)
- Published
- 2008
- Full Text
- View/download PDF
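As a rough illustration of the grid-embedding formulation (a toy sketch, not the paper's algorithms): nodes of the netlist digraph can be layered by BFS depth from the primary inputs, assigned columns within each layer, and the total wirelength estimated as the sum of Manhattan distances over the edges.
```python
# Toy orthogonal-grid embedding of a netlist digraph with a Manhattan
# wirelength estimate; illustrative only, not the paper's placement/routing.
from collections import deque

def grid_embed(edges, inputs):
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    depth, q = {u: 0 for u in inputs}, deque(inputs)
    while q:                                   # BFS layering from inputs
        u = q.popleft()
        for v in succ.get(u, []):
            if v not in depth:
                depth[v] = depth[u] + 1
                q.append(v)
    pos, next_col = {}, {}                     # (row, col) per node
    for node, d in depth.items():              # columns in discovery order
        pos[node] = (d, next_col.get(d, 0))
        next_col[d] = next_col.get(d, 0) + 1
    wirelength = sum(abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1])
                     for u, v in edges)
    return pos, wirelength

edges = [("a", "g1"), ("b", "g1"), ("g1", "g2"), ("b", "g2")]
pos, wl = grid_embed(edges, inputs=["a", "b"])
print(pos, "total wirelength:", wl)            # wirelength 5 on this toy net
```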