Search Results: 1,010 results for '"Keutzer, Kurt"'
152. Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
- Author
-
Wu, Bichen, Wang, Yanghan, Zhang, Peizhao, Tian, Yuandong, Vajda, Peter, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Recent work in network quantization has substantially reduced the time and space complexity of neural network inference, enabling their deployment on embedded and mobile devices with limited computational and memory resources. However, existing quantization methods often represent all weights and activations with the same precision (bit-width). In this paper, we explore a new dimension of the design space: quantizing different layers with different bit-widths. We formulate this problem as a neural architecture search problem and propose a novel differentiable neural architecture search (DNAS) framework to efficiently explore its exponential search space with gradient-based optimization. Experiments show we surpass the state-of-the-art compression of ResNet on CIFAR-10 and ImageNet. Our quantized models with 21.1x smaller model size or 103.9x lower computational cost can still outperform baseline quantized or even full precision models.
- Published
- 2018
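The core DNAS relaxation described in this abstract can be sketched in a few lines: treat the per-layer bit-width as a softmax-weighted mixture of candidate quantizers, so the selection parameters become differentiable. This is an illustrative reconstruction under assumed details (uniform symmetric quantization, the `quantize` helper, and the candidate bit-widths are our choices, not the paper's code):

```python
import numpy as np

def quantize(w, bits):
    # Uniform symmetric quantization of weights to `bits` bits (illustrative).
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def dnas_mixed_weight(w, theta, bit_choices=(2, 4, 8)):
    """Relax the discrete bit-width choice: mix the candidate quantized
    weights with softmax(theta), so theta can be trained by gradient descent
    alongside the network weights."""
    probs = np.exp(theta - theta.max())
    probs = probs / probs.sum()
    return sum(p * quantize(w, b) for p, b in zip(probs, bit_choices))

w = np.array([0.5, -1.0, 0.25, 0.75])
theta = np.array([0.0, 0.0, 5.0])   # architecture parameters; strongly favors 8-bit
mixed = dnas_mixed_weight(w, theta)
```

After training, the hard bit-width per layer would be recovered by `argmax(theta)`.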
153. Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs
- Author
-
Yang, Yifan, Huang, Qijing, Wu, Bichen, Zhang, Tianjun, Ma, Liang, Gambardella, Giulio, Blott, Michaela, Lavagno, Luciano, Vissers, Kees, Wawrzynek, John, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Hardware Architecture
- Abstract
Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, the key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In this work, we adopt an algorithm-hardware co-design approach to develop a ConvNet accelerator called Synetgy and a novel ConvNet model called DiracDeltaNet$^{\dagger}$. Both the accelerator and ConvNet are tailored to FPGA requirements. DiracDeltaNet, as the name suggests, is a ConvNet with only $1\times 1$ convolutions while spatial convolutions are replaced by more efficient shift operations. DiracDeltaNet achieves competitive accuracy on ImageNet (88.7\% top-5), but with 42$\times$ fewer parameters and 48$\times$ fewer OPs than VGG16. We further quantize DiracDeltaNet's weights to 4-bit and activations to 4-bits, with less than 1\% accuracy loss. These quantizations exploit well the nature of FPGA hardware. In short, DiracDeltaNet's small model size, low computational OP count, low precision and simplified operators allow us to co-design a highly customized computing unit for an FPGA. We implement the computing units for DiracDeltaNet on an Ultra96 SoC system through high-level synthesis. Our accelerator's final top-5 accuracy of 88.1\% on ImageNet, is higher than all the previously reported embedded FPGA accelerators. In addition, the accelerator reaches an inference speed of 66.3 FPS on the ImageNet classification task, surpassing prior works with similar accuracy by at least 11.6$\times$., Comment: Update to the latest results
- Published
- 2018
154. Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
- Author
-
Bakas, Spyridon, Reyes, Mauricio, Jakab, Andras, Bauer, Stefan, Rempfler, Markus, Crimi, Alessandro, Shinohara, Russell Takeshi, Berger, Christoph, Ha, Sung Min, Rozycki, Martin, Prastawa, Marcel, Alberts, Esther, Lipkova, Jana, Freymann, John, Kirby, Justin, Bilello, Michel, Fathallah-Shaykh, Hassan, Wiest, Roland, Kirschke, Jan, Wiestler, Benedikt, Colen, Rivka, Kotrotsou, Aikaterini, Lamontagne, Pamela, Marcus, Daniel, Milchenko, Mikhail, Nazeri, Arash, Weber, Marc-Andre, Mahajan, Abhishek, Baid, Ujjwal, Gerstner, Elizabeth, Kwon, Dongjin, Acharya, Gagan, Agarwal, Manu, Alam, Mahbubul, Albiol, Alberto, Albiol, Antonio, Albiol, Francisco J., Alex, Varghese, Allinson, Nigel, Amorim, Pedro H. A., Amrutkar, Abhijit, Anand, Ganesh, Andermatt, Simon, Arbel, Tal, Arbelaez, Pablo, Avery, Aaron, Azmat, Muneeza, B., Pranjal, Bai, W, Banerjee, Subhashis, Barth, Bill, Batchelder, Thomas, Batmanghelich, Kayhan, Battistella, Enzo, Beers, Andrew, Belyaev, Mikhail, Bendszus, Martin, Benson, Eze, Bernal, Jose, Bharath, Halandur Nagaraja, Biros, George, Bisdas, Sotirios, Brown, James, Cabezas, Mariano, Cao, Shilei, Cardoso, Jorge M., Carver, Eric N, Casamitjana, Adrià, Castillo, Laura Silvana, Catà, Marcel, Cattin, Philippe, Cerigues, Albert, Chagas, Vinicius S., Chandra, Siddhartha, Chang, Yi-Ju, Chang, Shiyu, Chang, Ken, Chazalon, Joseph, Chen, Shengcong, Chen, Wei, Chen, Jefferson W, Chen, Zhaolin, Cheng, Kun, Choudhury, Ahana Roy, Chylla, Roger, Clérigues, Albert, Colleman, Steven, Colmeiro, Ramiro German Rodriguez, Combalia, Marc, Costa, Anthony, Cui, Xiaomeng, Dai, Zhenzhen, Dai, Lutao, Daza, Laura Alexandra, Deutsch, Eric, Ding, Changxing, Dong, Chao, Dong, Shidu, Dudzik, Wojciech, Eaton-Rosen, Zach, Egan, Gary, Escudero, Guilherme, Estienne, Théo, Everson, Richard, Fabrizio, Jonathan, Fan, Yong, Fang, Longwei, Feng, Xue, Ferrante, Enzo, Fidon, Lucas, Fischer, Martin, French, Andrew P., Fridman, Naomi, Fu, Huan, Fuentes, David, Gao, Yaozong, Gates, Evan, Gering, David, 
Gholami, Amir, Gierke, Willi, Glocker, Ben, Gong, Mingming, González-Villá, Sandra, Grosges, T., Guan, Yuanfang, Guo, Sheng, Gupta, Sudeep, Han, Woo-Sup, Han, Il Song, Harmuth, Konstantin, He, Huiguang, Hernández-Sabaté, Aura, Herrmann, Evelyn, Himthani, Naveen, Hsu, Winston, Hsu, Cheyu, Hu, Xiaojun, Hu, Xiaobin, Hu, Yan, Hu, Yifan, Hua, Rui, Huang, Teng-Yi, Huang, Weilin, Van Huffel, Sabine, Huo, Quan, HV, Vivek, Iftekharuddin, Khan M., Isensee, Fabian, Islam, Mobarakol, Jackson, Aaron S., Jambawalikar, Sachin R., Jesson, Andrew, Jian, Weijian, Jin, Peter, Jose, V Jeya Maria, Jungo, Alain, Kainz, B, Kamnitsas, Konstantinos, Kao, Po-Yu, Karnawat, Ayush, Kellermeier, Thomas, Kermi, Adel, Keutzer, Kurt, Khadir, Mohamed Tarek, Khened, Mahendra, Kickingereder, Philipp, Kim, Geena, King, Nik, Knapp, Haley, Knecht, Urspeter, Kohli, Lisa, Kong, Deren, Kong, Xiangmao, Koppers, Simon, Kori, Avinash, Krishnamurthi, Ganapathy, Krivov, Egor, Kumar, Piyush, Kushibar, Kaisar, Lachinov, Dmitrii, Lambrou, Tryphon, Lee, Joon, Lee, Chengen, Lee, Yuehchou, Lee, M, Lefkovits, Szidonia, Lefkovits, Laszlo, Levitt, James, Li, Tengfei, Li, Hongwei, Li, Wenqi, Li, Hongyang, Li, Xiaochuan, Li, Yuexiang, Li, Heng, Li, Zhenye, Li, Xiaoyu, Li, Zeju, Li, XiaoGang, Lin, Zheng-Shen, Lin, Fengming, Lio, Pietro, Liu, Chang, Liu, Boqiang, Liu, Xiang, Liu, Mingyuan, Liu, Ju, Liu, Luyan, Llado, Xavier, Lopez, Marc Moreno, Lorenzo, Pablo Ribalta, Lu, Zhentai, Luo, Lin, Luo, Zhigang, Ma, Jun, Ma, Kai, Mackie, Thomas, Madabushi, Anant, Mahmoudi, Issam, Maier-Hein, Klaus H., Maji, Pradipta, Mammen, CP, Mang, Andreas, Manjunath, B. S., Marcinkiewicz, Michal, McDonagh, S, McKenna, Stephen, McKinley, Richard, Mehl, Miriam, Mehta, Sachin, Mehta, Raghav, Meier, Raphael, Meinel, Christoph, Merhof, Dorit, Meyer, Craig, Miller, Robert, Mitra, Sushmita, Moiyadi, Aliasgar, Molina-Garcia, David, Monteiro, Miguel A. 
B., Mrukwa, Grzegorz, Myronenko, Andriy, Nalepa, Jakub, Ngo, Thuyen, Nie, Dong, Ning, Holly, Niu, Chen, Nuechterlein, Nicholas K, Oermann, Eric, Oliveira, Arlindo, Oliveira, Diego D. C., Oliver, Arnau, Osman, Alexander F. I., Ou, Yu-Nian, Ourselin, Sebastien, Paragios, Nikos, Park, Moo Sung, Paschke, Brad, Pauloski, J. Gregory, Pawar, Kamlesh, Pawlowski, Nick, Pei, Linmin, Peng, Suting, Pereira, Silvio M., Perez-Beteta, Julian, Perez-Garcia, Victor M., Pezold, Simon, Pham, Bao, Phophalia, Ashish, Piella, Gemma, Pillai, G. N., Piraud, Marie, Pisov, Maxim, Popli, Anmol, Pound, Michael P., Pourreza, Reza, Prasanna, Prateek, Prkovska, Vesna, Pridmore, Tony P., Puch, Santi, Puybareau, Élodie, Qian, Buyue, Qiao, Xu, Rajchl, Martin, Rane, Swapnil, Rebsamen, Michael, Ren, Hongliang, Ren, Xuhua, Revanuru, Karthik, Rezaei, Mina, Rippel, Oliver, Rivera, Luis Carlos, Robert, Charlotte, Rosen, Bruce, Rueckert, Daniel, Safwan, Mohammed, Salem, Mostafa, Salvi, Joaquim, Sanchez, Irina, Sánchez, Irina, Santos, Heitor M., Sartor, Emmett, Schellingerhout, Dawid, Scheufele, Klaudius, Scott, Matthew R., Scussel, Artur A., Sedlar, Sara, Serrano-Rubio, Juan Pablo, Shah, N. Jon, Shah, Nameetha, Shaikh, Mazhar, Shankar, B. Uma, Shboul, Zeina, Shen, Haipeng, Shen, Dinggang, Shen, Linlin, Shen, Haocheng, Shenoy, Varun, Shi, Feng, Shin, Hyung Eun, Shu, Hai, Sima, Diana, Sinclair, M, Smedby, Orjan, Snyder, James M., Soltaninejad, Mohammadreza, Song, Guidong, Soni, Mehul, Stawiaski, Jean, Subramanian, Shashank, Sun, Li, Sun, Roger, Sun, Jiawei, Sun, Kay, Sun, Yu, Sun, Guoxia, Sun, Shuang, Suter, Yannick R, Szilagyi, Laszlo, Talbar, Sanjay, Tao, Dacheng, Teng, Zhongzhao, Thakur, Siddhesh, Thakur, Meenakshi H, Tharakan, Sameer, Tiwari, Pallavi, Tochon, Guillaume, Tran, Tuan, Tsai, Yuhsiang M., Tseng, Kuan-Lun, Tuan, Tran Anh, Turlapov, Vadim, Tustison, Nicholas, Vakalopoulou, Maria, Valverde, Sergi, Vanguri, Rami, Vasiliev, Evgeny, Ventura, Jonathan, Vera, Luis, Vercauteren, Tom, Verrastro, C. 
A., Vidyaratne, Lasitha, Vilaplana, Veronica, Vivekanandan, Ajeet, Wang, Guotai, Wang, Qian, Wang, Chiatse J., Wang, Weichung, Wang, Duo, Wang, Ruixuan, Wang, Yuanyuan, Wang, Chunliang, Wen, Ning, Wen, Xin, Weninger, Leon, Wick, Wolfgang, Wu, Shaocheng, Wu, Qiang, Wu, Yihong, Xia, Yong, Xu, Yanwu, Xu, Xiaowen, Xu, Peiyuan, Yang, Tsai-Ling, Yang, Xiaoping, Yang, Hao-Yu, Yang, Junlin, Yang, Haojin, Yang, Guang, Yao, Hongdou, Ye, Xujiong, Yin, Changchang, Young-Moxon, Brett, Yu, Jinhua, Yue, Xiangyu, Zhang, Songtao, Zhang, Angela, Zhang, Kun, Zhang, Xuejie, Zhang, Lichi, Zhang, Xiaoyue, Zhang, Yazhuo, Zhang, Lei, Zhang, Jianguo, Zhang, Xiang, Zhang, Tianhao, Zhao, Sicheng, Zhao, Yu, Zhao, Xiaomei, Zhao, Liang, Zheng, Yefeng, Zhong, Liming, Zhou, Chenhong, Zhou, Xiaobing, Zhou, Fan, Zhu, Hongtu, Zhu, Jin, Zhuge, Ying, Zong, Weiwei, Kalpathy-Cramer, Jayashree, Farahani, Keyvan, Davatzikos, Christos, van Leemput, Koen, and Menze, Bjoern
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset., Comment: The International Multimodal Brain Tumor Segmentation (BraTS) Challenge
- Published
- 2018
155. A Novel Domain Adaptation Framework for Medical Image Segmentation
- Author
-
Gholami, Amir, Subramanian, Shashank, Shenoy, Varun, Himthani, Naveen, Yue, Xiangyu, Zhao, Sicheng, Jin, Peter, Biros, George, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
We propose a segmentation framework that uses deep neural networks and introduce two innovations. First, we describe a biophysics-based domain adaptation method. Second, we propose an automatic method to segment white and gray matter, and cerebrospinal fluid, in addition to tumorous tissue. Regarding our first innovation, we use a domain adaptation framework that combines a novel multispecies biophysical tumor growth model with a generative adversarial model to create realistic looking synthetic multimodal MR images with known segmentation. Regarding our second innovation, we propose an automatic approach to enrich available segmentation data by computing the segmentation for healthy tissues. This segmentation, which is done using diffeomorphic image registration between the BraTS training data and a set of prelabeled atlases, provides more information for training and reduces the class imbalance problem. Our overall approach is not specific to any particular neural network and can be used in conjunction with existing solutions. We demonstrate the performance improvement using a 2D U-Net for the BraTS'18 segmentation challenge. Our biophysics based domain adaptation achieves better results, as compared to the existing state-of-the-art GAN model used to create synthetic data for training.
- Published
- 2018
156. Large batch size training of neural networks with adversarial training and second-order information
- Author
-
Yao, Zhewei, Gholami, Amir, Arfeen, Daiyaan, Liaw, Richard, Gonzalez, Joseph, Keutzer, Kurt, and Mahoney, Michael
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control, Statistics - Machine Learning
- Abstract
The most straightforward method to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. To keep the distributed processors fully utilized requires commensurately growing the batch size. However, large batch training often leads to poorer generalization. A recently proposed solution for this problem is to use adaptive batch sizes in SGD. In this case, one starts with a small number of processes and scales the processes as training progresses. Two major challenges with this approach are (i) that dynamically resizing the cluster can add non-trivial overhead, in part since it is currently not supported, and (ii) that the overall speed up is limited by the initial phase with smaller batches. In this work, we address both challenges by developing a new adaptive batch size framework, with autoscaling based on the Ray framework. This allows very efficient elastic scaling with negligible resizing overhead (0.32\% of time for ResNet18 ImageNet training). Furthermore, we propose a new adaptive batch size training scheme using second order methods and adversarial training. These enable increasing batch sizes earlier during training, which leads to better training time. We extensively evaluate our method on Cifar-10/100, SVHN, TinyImageNet, and ImageNet datasets, using multiple neural networks, including ResNets and smaller networks such as SqueezeNext. Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1\% and $5\times$, respectively). Importantly, this is achieved without any additional hyper-parameter tuning to tailor our method in any of these experiments.
- Published
- 2018
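The adaptive-batch-size idea in this abstract — start with few processes and grow the batch (and cluster) as training progresses — can be illustrated with a toy schedule. The doubling rule and all constants below are illustrative assumptions, not the paper's actual schedule:

```python
def batch_size_at(epoch, initial_bs=256, max_bs=16384, double_every=30):
    """Toy adaptive schedule: double the global batch size every
    `double_every` epochs, capped at max_bs."""
    return min(initial_bs * 2 ** (epoch // double_every), max_bs)

def workers_needed(epoch, per_worker_bs=64, **kw):
    """Number of data-parallel workers an autoscaler would provision so
    that each worker keeps a fixed per-worker batch size."""
    return batch_size_at(epoch, **kw) // per_worker_bs
```

In the paper, the growth points are informed by second-order (Hessian) information and adversarial training rather than a fixed epoch interval.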
157. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud
- Author
-
Wu, Bichen, Zhou, Xuanyu, Zhao, Sicheng, Yue, Xiangyu, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Earlier work demonstrates the promise of deep-learning-based approaches for point cloud segmentation; however, these approaches need to be improved to be practically useful. To this end, we introduce a new model SqueezeSegV2 that is more robust to dropout noise in LiDAR point clouds. With improved model structure, training loss, batch normalization and additional input channel, SqueezeSegV2 achieves significant accuracy improvement when trained on real data. Training models for point cloud segmentation requires large amounts of labeled point-cloud data, which is expensive to obtain. To sidestep the cost of collection and annotation, simulators such as GTA-V can be used to create unlimited amounts of labeled, synthetic data. However, due to domain shift, models trained on synthetic data often do not generalize well to the real world. We address this problem with a domain-adaptation training pipeline consisting of three major components: 1) learned intensity rendering, 2) geodesic correlation alignment, and 3) progressive domain calibration. When trained on real data, our new model exhibits segmentation accuracy improvements of 6.0-8.6% over the original SqueezeSeg. When training our new model on synthetic data using the proposed domain adaptation pipeline, we nearly double test accuracy on real-world data, from 29.0% to 57.4%. Our source code and synthetic dataset will be open-sourced., Comment: Bichen Wu, Xuanyu Zhou, and Sicheng Zhao contributed equally to this paper
- Published
- 2018
158. Counterexample-Guided Data Augmentation
- Author
-
Dreossi, Tommaso, Ghosh, Shromona, Yue, Xiangyu, Keutzer, Kurt, Sangiovanni-Vincentelli, Alberto, and Seshia, Sanjit A.
- Subjects
Computer Science - Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
- Abstract
We present a novel framework for augmenting data sets for machine learning based on counterexamples. Counterexamples are misclassified examples that have important properties for retraining and improving the model. Key components of our framework include a counterexample generator, which produces data items that are misclassified by the model and error tables, a novel data structure that stores information pertaining to misclassifications. Error tables can be used to explain the model's vulnerabilities and are used to efficiently generate counterexamples for augmentation. We show the efficacy of the proposed framework by comparing it to classical augmentation techniques on a case study of object detection in autonomous driving based on deep neural networks.
- Published
- 2018
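The "error table" data structure described above can be sketched as a list of rows recording the modifiable features of each misclassified sample; querying the table reveals which feature values are implicated in the most errors, guiding counterexample generation. The schema and feature names below are illustrative assumptions, not the paper's:

```python
from collections import Counter

class ErrorTable:
    """Toy error table: each row records the modifiable scene features of one
    misclassified sample (feature names here are hypothetical)."""
    def __init__(self):
        self.rows = []

    def add(self, features, predicted, actual):
        self.rows.append({**features, "predicted": predicted, "actual": actual})

    def most_error_prone(self, feature):
        # The feature value implicated in the most misclassifications; a
        # counterexample generator would oversample such configurations.
        return Counter(r[feature] for r in self.rows).most_common(1)[0][0]

table = ErrorTable()
table.add({"car_color": "black", "time_of_day": "night"}, "background", "car")
table.add({"car_color": "black", "time_of_day": "day"}, "background", "car")
table.add({"car_color": "white", "time_of_day": "night"}, "background", "car")
worst = table.most_error_prone("car_color")   # "black" dominates the errors
```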
159. Co-Design of Deep Neural Nets and Neural Net Accelerators for Embedded Vision Applications
- Author
-
Kwon, Kiseok, Amid, Alon, Gholami, Amir, Wu, Bichen, Asanovic, Krste, and Keutzer, Kurt
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing
- Abstract
Deep Learning is arguably the most rapidly evolving research area in recent years. As a result it is not surprising that the design of state-of-the-art deep neural net models proceeds without much consideration of the latest hardware targets, and the design of neural net accelerators proceeds without much consideration of the characteristics of the latest deep neural net models. Nevertheless, in this paper we show that there are significant improvements available if deep neural net models and neural net accelerators are co-designed., Comment: This paper is trimmed to 6 pages to meet the conference requirement. A longer version with more detailed discussion will be released afterwards
- Published
- 2018
160. A LiDAR Point Cloud Generator: from a Virtual World to Autonomous Driving
- Author
-
Yue, Xiangyu, Wu, Bichen, Seshia, Sanjit A., Keutzer, Kurt, and Sangiovanni-Vincentelli, Alberto L.
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
3D LiDAR scanners are playing an increasingly important role in autonomous driving as they can generate depth information of the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms which are often data-hungry. We present a framework to rapidly create point clouds with accurate point-level labels from a computer game. The framework supports data collection from both auto-driving scenes and user-configured scenes. Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network, and use the falsifying examples to make the neural network more robust through retraining. In addition, the scene images can be captured simultaneously in order for sensor fusion tasks, with a method proposed to do automatic calibration between the point clouds and captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show by testing and retraining the network using point clouds from user-configured scenes, the weakness/blind spots of the neural network can be fixed.
- Published
- 2018
161. Unsupervised Domain Adaptation: from Simulation Engine to the RealWorld
- Author
-
Zhao, Sicheng, Wu, Bichen, Gonzalez, Joseph, Seshia, Sanjit A., and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning, Statistics - Machine Learning
- Abstract
Large-scale labeled training datasets have enabled deep neural networks to excel on a wide range of benchmark vision tasks. However, in many applications it is prohibitively expensive or time-consuming to obtain large quantities of labeled data. To cope with limited labeled training data, many have attempted to directly apply models trained on a large-scale labeled source domain to another sparsely labeled target domain. Unfortunately, direct transfer across domains often performs poorly due to domain shift and dataset bias. Domain adaptation is the machine learning paradigm that aims to learn a model from a source domain that can perform well on a different (but related) target domain. In this paper, we summarize and compare the latest unsupervised domain adaptation methods in computer vision applications. We classify the non-deep approaches into sample re-weighting and intermediate subspace transformation categories, while the deep strategy includes discrepancy-based methods, adversarial generative models, adversarial discriminative models and reconstruction-based methods. We also discuss some potential directions.
- Published
- 2018
162. SqueezeNext: Hardware-Aware Neural Network Design
- Author
-
Gholami, Amir, Kwon, Kiseok, Wu, Bichen, Tai, Zizheng, Yue, Xiangyu, Jin, Peter, Zhao, Sicheng, and Keutzer, Kurt
- Subjects
Computer Science - Neural and Evolutionary Computing
- Abstract
One of the main barriers for deploying neural networks on embedded systems has been large memory and power consumption of existing neural networks. In this work, we introduce SqueezeNext, a new family of neural network architectures whose design was guided by considering previous architectures such as SqueezeNet, as well as by simulation results on a neural network accelerator. This new network is able to match AlexNet's accuracy on the ImageNet benchmark with $112\times$ fewer parameters, and one of its deeper variants is able to achieve VGG-19 accuracy with only 4.4 Million parameters, ($31\times$ smaller than VGG-19). SqueezeNext also achieves better top-5 classification accuracy with $1.3\times$ fewer parameters as compared to MobileNet, but avoids using depthwise-separable convolutions that are inefficient on some mobile processor platforms. This wide range of accuracy gives the user the ability to make speed-accuracy tradeoffs, depending on the available resources on the target hardware. Using hardware simulation results for power and inference speed on an embedded system has guided us to design variations of the baseline model that are $2.59\times$/$8.26\times$ faster and $2.25\times$/$7.5\times$ more energy efficient as compared to SqueezeNet/AlexNet without any accuracy degradation., Comment: 12 Pages
- Published
- 2018
163. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
- Author
-
Yao, Zhewei, Gholami, Amir, Lei, Qi, Keutzer, Kurt, and Mahoney, Michael W.
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with large Hessian spectrum show poor robustness to adversarial perturbation. We further study this relationship, and provide empirical and theoretical proof that the inner loop for robust training is a saddle-free optimization problem \textit{almost everywhere}. We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. We have open sourced our method which can be accessed at [1]., Comment: Presented in NeurIPS'18 conference
- Published
- 2018
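The exact (approximation-free) Hessian spectrum computation mentioned in this abstract rests on Hessian-vector products obtained by back-propagating the second derivative; the top eigenvalue then follows from power iteration. A minimal sketch, using an explicit matrix as a stand-in for the autodiff-based HVP oracle:

```python
import numpy as np

def top_hessian_eigenvalue(hvp, dim, iters=100, seed=0):
    """Power iteration on a Hessian-vector-product oracle. For a neural
    net, hvp(v) would be computed by differentiating g(w)^T v, i.e. by
    back-propagating the second derivative; no explicit Hessian is formed."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / np.linalg.norm(hv)
    return float(v @ hvp(v))

A = np.diag([5.0, 2.0, 1.0])              # toy quadratic loss 0.5 * w^T A w
lam = top_hessian_eigenvalue(lambda v: A @ v, 3)
```

On this toy problem the iteration converges to the largest eigenvalue of `A`; the paper's observation is that large-batch training converges to points where this quantity is noticeably higher.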
164. Integrated Model, Batch and Domain Parallelism in Training Neural Networks
- Author
-
Gholami, Amir, Azad, Ariful, Jin, Peter, Keutzer, Kurt, and Buluc, Aydin
- Subjects
Computer Science - Learning, Statistics - Machine Learning
- Abstract
We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using $P$ processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see $P$ processes as logically divided into a $P_r \times P_c$ grid where the $P_r$ dimension is implicitly responsible for model/domain parallelism and the $P_c$ dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure model nor with pure data parallelism. We also show how the domain parallel approach can help in extending the theoretical scaling limit of the typical batch parallel method., Comment: 11 pages
- Published
- 2017
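The $P_r \times P_c$ process grid described in this abstract can be sketched as a rank-to-coordinate mapping: processes in the same row cooperate on model/domain parallelism, processes in the same column on batch parallelism. This is a simplified view under assumed conventions (row-major rank ordering, equal batch split), not the paper's implementation:

```python
def grid_position(rank, P_r, P_c):
    """Map process `rank` onto a P_r x P_c grid (row-major). The row index
    addresses the model/domain dimension, the column index the batch."""
    assert 0 <= rank < P_r * P_c
    return rank // P_c, rank % P_c

def batch_shard(batch_size, rank, P_r, P_c):
    """Half-open index range of the minibatch this process handles,
    split equally across the P_c batch dimension (illustrative)."""
    _, col = grid_position(rank, P_r, P_c)
    per_col = batch_size // P_c
    return col * per_col, (col + 1) * per_col
```

With pure data parallelism ($P_r = 1$) every process holds the whole model; with pure model parallelism ($P_c = 1$) every process sees the whole batch; the paper's point is that the communication-optimal configuration is typically in between.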
165. Network Processors: Origin of Species
- Author
-
Shah, Niraj and Keutzer, Kurt
- Published
- 2022
166. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
- Author
-
Wu, Bichen, Wan, Alvin, Yue, Xiangyu, Jin, Peter, Zhao, Sicheng, Golmant, Noah, Gholaminejad, Amir, Gonzalez, Joseph, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Neural networks rely on convolutions to aggregate spatial information. However, spatial convolutions are expensive in terms of model size and computation, both of which grow quadratically with respect to kernel size. In this paper, we present a parameter-free, FLOP-free "shift" operation as an alternative to spatial convolutions. We fuse shifts and point-wise convolutions to construct end-to-end trainable shift-based modules, with a hyperparameter characterizing the tradeoff between accuracy and efficiency. To demonstrate the operation's efficacy, we replace ResNet's 3x3 convolutions with shift-based modules for improved CIFAR10 and CIFAR100 accuracy using 60% fewer parameters; we additionally demonstrate the operation's resilience to parameter reduction on ImageNet, outperforming ResNet family members. We finally show the shift operation's applicability across domains, achieving strong performance with fewer parameters on classification, face verification and style transfer., Comment: Source code will be released afterwards
- Published
- 2017
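The shift operation itself costs no FLOPs and no parameters: each channel's feature map just moves one pixel in an assigned direction. A minimal sketch (the direction assignment is illustrative, and `np.roll` wraps at the border where a real implementation would zero-pad):

```python
import numpy as np

def shift(x, dirs=((0, 1), (0, -1), (1, 0), (-1, 0))):
    """Parameter-free, FLOP-free 'shift': channel c moves its H x W feature
    map one pixel along dirs[c % len(dirs)]. x has shape (C, H, W)."""
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        dy, dx = dirs[c % len(dirs)]
        out[c] = np.roll(np.roll(x[c], dy, axis=0), dx, axis=1)
    return out

x = np.zeros((4, 3, 3))
x[:, 1, 1] = 1.0
out = shift(x)   # channel 0 shifted right, channel 1 left, channel 2 down, ...
```

A shift-based module then follows this with a point-wise ($1 \times 1$) convolution, which is where all the learned parameters and FLOPs live; spatial information mixing comes entirely from the shifts.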
167. Regret Minimization for Partially Observable Deep Reinforcement Learning
- Author
-
Jin, Peter, Keutzer, Kurt, and Levine, Sergey
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong., Comment: ICML 2018
- Published
- 2017
168. SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud
- Author
-
Wu, Bichen, Wan, Alvin, Yue, Xiangyu, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this paper, we address semantic segmentation of road-objects from 3D LiDAR point clouds. In particular, we wish to detect and categorize instances of interest, such as cars, pedestrians and cyclists. We formulate this problem as a point- wise classification problem, and propose an end-to-end pipeline called SqueezeSeg based on convolutional neural networks (CNN): the CNN takes a transformed LiDAR point cloud as input and directly outputs a point-wise label map, which is then refined by a conditional random field (CRF) implemented as a recurrent layer. Instance-level labels are then obtained by conventional clustering algorithms. Our CNN model is trained on LiDAR point clouds from the KITTI dataset, and our point-wise segmentation labels are derived from 3D bounding boxes from KITTI. To obtain extra training data, we built a LiDAR simulator into Grand Theft Auto V (GTA-V), a popular video game, to synthesize large amounts of realistic training data. Our experiments show that SqueezeSeg achieves high accuracy with astonishingly fast and stable runtime (8.7 ms per frame), highly desirable for autonomous driving applications. Furthermore, additionally training on synthesized data boosts validation accuracy on real-world data. Our source code and synthesized data will be open-sourced.
- Published
- 2017
169. Keynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures
- Author
-
Iandola, Forrest and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Over the last five years Deep Neural Nets have offered more accurate solutions to many problems in speech recognition and computer vision, and these solutions have surpassed a threshold of acceptability for many applications. As a result, Deep Neural Networks have supplanted other approaches to solving problems in these areas, and enabled many new applications. While the design of Deep Neural Nets is still something of an art form, in our work we have found basic principles of design space exploration used to develop embedded microprocessor architectures to be highly applicable to the design of Deep Neural Net architectures. In particular, we have used these design principles to create a novel Deep Neural Net called SqueezeNet that requires as little as 480KB of storage for its model parameters. We have further integrated all these experiences to develop something of a playbook for creating small Deep Neural Nets for embedded systems., Comment: Keynote at Embedded Systems Week (ESWEEK) 2017
- Published
- 2017
170. ImageNet Training in Minutes
- Author
-
You, Yang, Zhang, Zhao, Hsieh, Cho-Jui, Demmel, James, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Finishing 90-epoch ImageNet-1k training with ResNet-50 on an NVIDIA M40 GPU takes 14 days. This training requires 10^18 single precision operations in total. On the other hand, the world's current fastest supercomputer can finish 2 * 10^17 single precision operations per second (Dongarra et al 2017, https://www.top500.org/lists/2017/06/). If we can make full use of the supercomputer for DNN training, we should be able to finish the 90-epoch ResNet-50 training in one minute. However, the current bottleneck for fast DNN training is at the algorithm level. Specifically, the current batch size (e.g. 512) is too small to make efficient use of many processors. For large-scale DNN training, we focus on using large-batch data-parallelism synchronous SGD without losing accuracy in the fixed epochs. The LARS algorithm (You, Gitman, Ginsburg, 2017, arXiv:1708.03888) enables us to scale the batch size to extremely large values (e.g. 32K). We finish the 100-epoch ImageNet training with AlexNet in 11 minutes on 1024 CPUs. We finish the 90-epoch ImageNet training with ResNet-50 in 20 minutes on 2048 KNLs without losing accuracy, about three times faster than Facebook's result (Goyal et al 2017, arXiv:1706.02677). State-of-the-art ImageNet training speed with ResNet-50 is 74.9% top-1 test accuracy in 15 minutes. We got 74.9% top-1 test accuracy in 64 epochs, which takes only 14 minutes. Furthermore, when we increase the batch size to above 16K, our accuracy is much higher than Facebook's on corresponding batch sizes. Our source code is available upon request.
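The LARS rule referenced above can be sketched for a single parameter tensor: each layer's step is rescaled by the ratio of its weight norm to its gradient norm, which is what keeps very large batch sizes stable. Hyperparameter values below are illustrative, not the settings used in the paper.

```python
import numpy as np

def lars_step(w, grad, lr, trust_coef=0.001, weight_decay=5e-4):
    """One layer-wise adaptive rate scaling (LARS) update.

    Sketch following You, Gitman & Ginsburg (2017): the local learning
    rate is proportional to ||w|| / ||g||, computed per layer.
    """
    g = grad + weight_decay * w                              # L2-regularized gradient
    local_lr = trust_coef * np.linalg.norm(w) / (np.linalg.norm(g) + 1e-12)
    return w - lr * local_lr * g
```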
- Published
- 2017
171. SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving
- Author
-
Wu, Bichen, Wan, Alvin, Iandola, Forrest, Jin, Peter H., and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Object detection is a crucial task for autonomous driving. In addition to requiring high accuracy to ensure safety, object detection for autonomous driving also requires real-time inference speed to guarantee prompt vehicle control, as well as small model size and energy efficiency to enable embedded system deployment. In this work, we propose SqueezeDet, a fully convolutional neural network for object detection that aims to simultaneously satisfy all of the above constraints. In our network, we use convolutional layers not only to extract feature maps but also as the output layer to compute bounding boxes and class probabilities. The detection pipeline of our model only contains a single forward pass of a neural network, thus it is extremely fast. Our model is fully-convolutional, which leads to a small model size and better energy efficiency. While achieving the same accuracy as previous baselines, our model is 30.4x smaller, 19.7x faster, and consumes 35.2x lower energy. The code is open-sourced at \url{https://github.com/BichenWuUCB/squeezeDet}., Comment: The supplementary material of this paper, which discusses the energy efficiency of SqueezeDet, is attached after the main paper. The source code of this work is open-source released at https://github.com/BichenWuUCB/squeezeDet
- Published
- 2016
172. Integrated Model, Batch, and Domain Parallelism in Training Neural Networks
- Author
-
Gholami, Amir, Azad, Ariful, Jin, Peter, Keutzer, Kurt, and Buluc, Aydin
- Subjects
Information and Computing Sciences ,Machine Learning ,cs.LG ,stat.ML - Abstract
We propose a new integrated method of exploiting model, batch and domain parallelism for the training of deep neural networks (DNNs) on large distributed-memory computers using minibatch stochastic gradient descent (SGD). Our goal is to find an efficient parallelization strategy for a fixed batch size using P processes. Our method is inspired by the communication-avoiding algorithms in numerical linear algebra. We see P processes as logically divided into a Pr × Pc grid where the Pr dimension is implicitly responsible for model/domain parallelism and the Pc dimension is implicitly responsible for batch parallelism. In practice, the integrated matrix-based parallel algorithm encapsulates these types of parallelism automatically. We analyze the communication complexity and analytically demonstrate that the lowest communication costs are often achieved neither with pure model nor with pure data parallelism. We also show how the domain parallel approach can help in extending the theoretical scaling limit of the typical batch parallel method.
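The claim that the lowest communication cost is often achieved neither with pure model nor with pure data parallelism can be illustrated with a toy communication-volume model over a Pr x Pc grid. This is not the paper's analysis; the two cost terms are simplified stand-ins for activation exchange and gradient all-reduce.

```python
def comm_cost(P, pr, n_params, batch):
    """Toy communication volume for a pr x pc process grid (pc = P // pr).

    Illustrative only: model/domain parallelism along pr exchanges
    activations each step; batch parallelism along pc all-reduces a
    gradient shard of size n_params / pr.
    """
    pc = P // pr
    model_cost = (pr - 1) / pr * batch / pc       # activation exchange along pr
    data_cost = (pc - 1) / pc * n_params / pr     # gradient all-reduce along pc
    return model_cost + data_cost

def best_grid(P, n_params, batch):
    """Return the pr (a divisor of P) minimizing the toy cost."""
    divisors = [d for d in range(1, P + 1) if P % d == 0]
    return min(divisors, key=lambda pr: comm_cost(P, pr, n_params, batch))
```

With a suitable ratio of parameter count to batch size, the optimum lands strictly between pr = 1 (pure data parallelism) and pr = P (pure model parallelism), which mirrors the paper's qualitative conclusion.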
- Published
- 2018
173. A LiDAR Point Cloud Generator
- Author
-
Yue, Xiangyu, Wu, Bichen, Seshia, Sanjit A, Keutzer, Kurt, and Sangiovanni-Vincentelli, Alberto L
- Subjects
LiDAR Point Cloud ,Simulation Environment ,Autonomous Driving ,Neural Network Analysis ,Neural Network Retraining ,cs.CV - Abstract
3D LiDAR scanners are playing an increasingly important role in autonomous driving as they can generate depth information of the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms which are often data-hungry. We present a framework to rapidly create point clouds with accurate point-level labels from a computer game. To the best of our knowledge, this is the first publication on a LiDAR point cloud simulation framework for autonomous driving. The framework supports data collection from both auto-driving scenes and user-configured scenes. Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network, and use the falsifying examples to make the neural network more robust through retraining. In addition, the scene images can be captured simultaneously for sensor fusion tasks, with a method proposed to do automatic registration between the point clouds and captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show that, by testing and retraining the network using point clouds from user-configured scenes, the weaknesses/blind spots of the neural network can be fixed.
- Published
- 2018
174. A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications
- Author
-
Moskewicz, Matthew W., Jannesari, Ali, and Keutzer, Kurt
- Subjects
Computer Science - Neural and Evolutionary Computing ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Mathematical Software - Abstract
In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high efficiency GPU programming remains difficult: it suffers from high complexity, low productivity, and low portability. GPU vendors such as NVIDIA have spent enormous effort to write special-purpose DNN libraries. However, on other hardware targets, especially mobile GPUs, such vendor libraries are not generally available. Thus, the development of portable, open, high-performance, energy-efficient GPU code for DNN operations would enable broader deployment of DNN-based algorithms. Toward this end, this work presents a framework to enable productive, high-efficiency GPU programming for DNN computations across hardware platforms and programming models. In particular, the framework provides specific support for metaprogramming, autotuning, and DNN-tailored data types. Using our framework, we explore implementing DNN operations on three different hardware targets: NVIDIA, AMD, and Qualcomm GPUs. On NVIDIA GPUs, we show both portability between OpenCL and CUDA as well as competitive performance compared to the vendor library. On Qualcomm GPUs, we show that our framework enables productive development of target-specific optimizations, and achieves reasonable absolute performance. Finally, on AMD GPUs, we show initial results that indicate our framework can yield reasonable performance on a new platform with minimal effort.
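The metaprogramming-plus-autotuning workflow can be sketched as a simple search loop: generate a kernel variant per tuning configuration, time it, and keep the fastest. The names `kernel_factory`, `param_space`, and `benchmark` are hypothetical placeholders for this sketch, not the framework's actual API.

```python
import itertools

def autotune(kernel_factory, param_space, benchmark):
    """Exhaustively search a kernel's tuning-parameter space.

    kernel_factory(**cfg) stands in for metaprogrammed code generation;
    benchmark(kernel) stands in for a timed run and returns a runtime.
    """
    best_cfg, best_t = None, float("inf")
    for vals in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space, vals))
        kernel = kernel_factory(**cfg)      # generate this variant
        t = benchmark(kernel)               # measure it
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t
```

Real autotuners prune or sample this space rather than enumerating it, but the generate-measure-select loop is the core idea.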
- Published
- 2016
175. How to scale distributed deep learning?
- Author
-
Jin, Peter H., Yuan, Qiaochu, Iandola, Forrest, and Keutzer, Kurt
- Subjects
Computer Science - Learning - Abstract
Training time on large datasets for deep neural networks is the principal workflow bottleneck in a number of important applications of deep learning, such as object classification and detection in automatic driver assistance systems (ADAS). To minimize training time, the training of a deep neural network must be scaled beyond a single machine to as many machines as possible by distributing the optimization method used for training. While a number of approaches have been proposed for distributed stochastic gradient descent (SGD), at the current time synchronous approaches to distributed SGD appear to be showing the greatest performance at large scale. Synchronous scaling of SGD suffers from the need to synchronize all processors on each gradient step and is not resilient in the face of failing or lagging processors. In asynchronous approaches using parameter servers, training is slowed by contention to the parameter server. In this paper we compare the convergence of synchronous and asynchronous SGD for training a modern ResNet network architecture on the ImageNet classification problem. We also propose an asynchronous method, gossiping SGD, that aims to retain the positive features of both systems by replacing the all-reduce collective operation of synchronous training with a gossip aggregation algorithm. We find, perhaps counterintuitively, that asynchronous SGD, including both elastic averaging and gossiping, converges faster at fewer nodes (up to about 32 nodes), whereas synchronous SGD scales better to more nodes (up to about 100 nodes)., Comment: Extended version of paper accepted at ML Sys 2016 (at NIPS 2016)
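The gossip aggregation idea can be sketched with scalars standing in for per-worker parameter vectors: instead of a global all-reduce, each worker averages with one randomly chosen peer, so repeated rounds drive all replicas toward the mean without a global synchronization barrier. The disjoint-pairing scheme below is illustrative, not the paper's exact protocol.

```python
import random

def gossip_round(params, rng=None):
    """One round of pairwise gossip averaging across workers.

    params: list of per-worker values (stand-ins for parameter vectors).
    Pairs are disjoint, so the global mean is preserved each round.
    """
    rng = rng or random.Random(0)
    n = len(params)
    order = list(range(n))
    rng.shuffle(order)
    out = list(params)
    for a, b in zip(order[::2], order[1::2]):   # disjoint random pairs
        avg = 0.5 * (out[a] + out[b])
        out[a] = out[b] = avg
    return out
```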
- Published
- 2016
176. Shallow Networks for High-Accuracy Road Object-Detection
- Author
-
Ashraf, Khalid, Wu, Bichen, Iandola, Forrest N., Moskewicz, Matthew W., and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The ability to automatically detect other vehicles on the road is vital to the safety of partially-autonomous and fully-autonomous vehicles. Most of the high-accuracy techniques for this task are based on R-CNN or one of its faster variants. In the research community, much emphasis has been applied to using 3D vision or complex R-CNN variants to achieve higher accuracy. However, are there more straightforward modifications that could deliver higher accuracy? Yes. We show that increasing input image resolution (i.e. upsampling) offers up to 12 percentage-points higher accuracy compared to an off-the-shelf baseline. We also find situations where earlier/shallower layers of a CNN provide higher accuracy than later/deeper layers. We further show that shallow models and upsampled images yield competitive accuracy. Our findings contrast with the current trend towards deeper and larger models to achieve high accuracy in domain-specific detection tasks., Comment: 9 pages, 5 figures
- Published
- 2016
177. Boda-RTC: Productive Generation of Portable, Efficient Code for Convolutional Neural Networks on Mobile Computing Platforms
- Author
-
Moskewicz, Matthew, Iandola, Forrest, and Keutzer, Kurt
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Mathematical Software ,Computer Science - Neural and Evolutionary Computing - Abstract
The popularity of neural networks (NNs) spans academia, industry, and popular culture. In particular, convolutional neural networks (CNNs) have been applied to many image based machine learning tasks and have yielded strong results. The availability of hardware/software systems for efficient training and deployment of large and/or deep CNN models has been, and continues to be, an important consideration for the field. Early systems for NN computation focused on leveraging existing dense linear algebra techniques and libraries. Current approaches use low-level machine specific programming and/or closed-source, purpose-built vendor libraries. In this work, we present an open source system that, compared to existing approaches, achieves competitive computational speed while achieving higher portability. We achieve this by targeting the vendor-neutral OpenCL platform using a code-generation approach. We argue that our approach allows for both: (1) the rapid development of new computational kernels for existing hardware targets, and (2) the rapid tuning of existing computational kernels for new hardware targets. Results are presented for a case study of targeting the Qualcomm Snapdragon 820 mobile computing platform for CNN deployment.
- Published
- 2016
178. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- Author
-
Iandola, Forrest N., Han, Song, Moskewicz, Matthew W., Ashraf, Khalid, Dally, William J., and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet, Comment: In ICLR Format
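Much of SqueezeNet's parameter savings comes from its squeeze-then-expand "Fire" module, which can be illustrated by counting weights. The channel sizes below follow the paper's fire2 configuration, but treat the exact numbers as illustrative.

```python
def fire_params(in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
    """Weight count of a SqueezeNet Fire module (biases omitted).

    A Fire module squeezes the input with 1x1 convolutions, then expands
    with a mix of 1x1 and 3x3 convolutions; keeping squeeze_ch small is
    what cuts the parameter count.
    """
    squeeze = in_ch * squeeze_ch * 1 * 1
    expand1 = squeeze_ch * expand1x1_ch * 1 * 1
    expand3 = squeeze_ch * expand3x3_ch * 3 * 3
    return squeeze + expand1 + expand3

# fire2: 96 input channels, squeeze to 16, expand to 64 + 64 output channels
fire = fire_params(96, 16, 64, 64)
# same input/output width with plain 3x3 convolutions, for comparison
plain = 96 * 128 * 3 * 3
```

The Fire module needs roughly an order of magnitude fewer weights than the plain 3x3 layer of the same width, which compounds across the network.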
- Published
- 2016
179. Convolutional Monte Carlo Rollouts in Go
- Author
-
Jin, Peter H. and Keutzer, Kurt
- Subjects
Computer Science - Learning ,Computer Science - Artificial Intelligence - Abstract
In this work, we present a MCTS-based Go-playing program which uses convolutional networks in all parts. Our method performs MCTS in batches, explores the Monte Carlo search tree using Thompson sampling and a convolutional network, and evaluates convnet-based rollouts on the GPU. We achieve strong win rates against open source Go programs and attain competitive results against state of the art convolutional net-based Go-playing programs.
- Published
- 2015
180. A Survey of Quantization Methods for Efficient Neural Network Inference
- Author
-
Gholami, Amir, primary, Kim, Sehoon, additional, Dong, Zhen, additional, Yao, Zhewei, additional, Mahoney, Michael W., additional, and Keutzer, Kurt, additional
- Published
- 2022
- Full Text
- View/download PDF
181. MTTrans: Cross-domain Object Detection with Mean Teacher Transformer
- Author
-
Yu, Jinze, primary, Liu, Jiaming, additional, Wei, Xiaobao, additional, Zhou, Haoyi, additional, Nakata, Yohei, additional, Gudovskiy, Denis, additional, Okuno, Tomoyuki, additional, Li, Jianxin, additional, Keutzer, Kurt, additional, and Zhang, Shanghang, additional
- Published
- 2022
- Full Text
- View/download PDF
182. PreTraM: Self-supervised Pre-training via Connecting Trajectory and Map
- Author
-
Xu, Chenfeng, primary, Li, Tian, additional, Tang, Chen, additional, Sun, Lingfeng, additional, Keutzer, Kurt, additional, Tomizuka, Masayoshi, additional, Fathi, Alireza, additional, and Zhan, Wei, additional
- Published
- 2022
- Full Text
- View/download PDF
183. MADAN: Multi-source Adversarial Domain Aggregation Network for Domain Adaptation
- Author
-
Zhao, Sicheng, Li, Bo, Xu, Pengfei, Yue, Xiangyu, Ding, Guiguang, and Keutzer, Kurt
- Published
- 2021
- Full Text
- View/download PDF
184. A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models
- Author
-
Ma, Rui, Zhou, Qiang, Xiao, Bangjun, Jin, Yizhu, Zhou, Daquan, Li, Xiuyu, Singh, Aishani, Qu, Yi, Keutzer, Kurt, Xie, Xiaodong, Hu, Jingtong, Dong, Zhen, and Zhang, Shanghang
- Abstract
Copyright is a legal right that grants creators the exclusive authority to reproduce, distribute, and profit from their creative works. However, the recent advancements in text-to-image generation techniques have posed significant challenges to copyright protection, as these methods have facilitated the learning of unauthorized content, artistic creations, and portraits, which are subsequently utilized to generate and disseminate uncontrolled content. In particular, the use of Stable Diffusion, an emerging model for text-to-image generation, poses an increased risk of unauthorized copyright infringement and distribution. Currently, there is a lack of systematic studies evaluating the potential correlation between content generated by Stable Diffusion and content under copyright protection. Conducting such studies faces several challenges, including i) the intrinsic ambiguity related to copyright infringement in text-to-image models, ii) the absence of a comprehensive large-scale dataset, and iii) the lack of standardized metrics for defining copyright infringement. This work provides the first large-scale standardized dataset and benchmark on copyright protection. Specifically, we propose a pipeline to coordinate CLIP, ChatGPT, and diffusion models to generate a dataset that contains anchor images, corresponding prompts, and images generated by text-to-image models, reflecting the potential abuses of copyright. Furthermore, we explore a suite of evaluation metrics to judge the effectiveness of copyright protection methods. The proposed dataset, benchmark library, and evaluation metrics will be open-sourced to facilitate future research and application. The website and dataset are publicly accessible., Comment: Improve experimental content
- Published
- 2024
185. FireCaffe: near-linear acceleration of deep neural network training on compute clusters
- Author
-
Iandola, Forrest N., Ashraf, Khalid, Moskewicz, Matthew W., and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms is almost always limited by the overhead of communicating between servers; DNN training is not an exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers -- Infiniband or Cray interconnects are ideal for this. Second, we consider a number of communication algorithms, and we find that reduction trees are more efficient and scalable than the traditional parameter server approach. Third, we optionally increase the batch size to reduce the total quantity of communication during DNN training, and we identify hyperparameters that allow us to reproduce the small-batch accuracy while training with large batch sizes. When training GoogLeNet and Network-in-Network on ImageNet, we achieve a 47x and 39x speedup, respectively, when training on a cluster of 128 GPUs., Comment: Version 2: Added results on 128 GPUs
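The reduction-tree argument above can be illustrated with a toy latency model: a single parameter server receives worker messages serially, while a binary reduction tree aggregates along logarithmic depth. Unit message time and the binary-tree shape are assumptions for the sketch, not measurements from the paper.

```python
import math

def param_server_time(n_workers, msg_time=1.0):
    """Serialized receive at one parameter server: O(n) message times."""
    return n_workers * msg_time

def reduction_tree_time(n_workers, msg_time=1.0):
    """Binary reduction tree: O(log2 n) message times along the tree depth."""
    return math.ceil(math.log2(n_workers)) * msg_time
```

At 128 GPUs the tree finishes in 7 message times versus 128 for the serialized server, which is the qualitative gap behind the reported scalability.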
- Published
- 2015
186. DeepLogo: Hitting Logo Recognition with the Deep Neural Network Hammer
- Author
-
Iandola, Forrest N., Shen, Anting, Gao, Peter, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recently, there has been a flurry of industrial activity around logo recognition, such as Ditto's service for marketers to track their brands in user-generated images, and LogoGrab's mobile app platform for logo recognition. However, relatively little academic or open-source logo recognition progress has been made in the last four years. Meanwhile, deep convolutional neural networks (DCNNs) have revolutionized a broad range of object recognition applications. In this work, we apply DCNNs to logo recognition. We propose several DCNN architectures, with which we surpass published state-of-art accuracy on a popular logo recognition dataset.
- Published
- 2015
187. A Novel Domain Adaptation Framework for Medical Image Segmentation
- Author
-
Gholami, Amir, Subramanian, Shashank, Shenoy, Varun, Himthani, Naveen, Yue, Xiangyu, Zhao, Sicheng, Jin, Peter, Biros, George, Keutzer, Kurt, Hutchison, David, Series Editor, Kanade, Takeo, Series Editor, Kittler, Josef, Series Editor, Kleinberg, Jon M., Series Editor, Mattern, Friedemann, Series Editor, Mitchell, John C., Series Editor, Naor, Moni, Series Editor, Pandu Rangan, C., Series Editor, Steffen, Bernhard, Series Editor, Terzopoulos, Demetri, Series Editor, Tygar, Doug, Series Editor, Crimi, Alessandro, editor, Bakas, Spyridon, editor, Kuijf, Hugo, editor, Keyvan, Farahani, editor, Reyes, Mauricio, editor, and van Walsum, Theo, editor
- Published
- 2019
- Full Text
- View/download PDF
188. Multitask Vision-Language Prompt Tuning
- Author
-
Shen, Sheng, primary, Yang, Shijia, additional, Zhang, Tianjun, additional, Zhai, Bohan, additional, Gonzalez, Joseph E., additional, Keutzer, Kurt, additional, and Darrell, Trevor, additional
- Published
- 2024
- Full Text
- View/download PDF
189. Namsel: An Optical Character Recognition System for Tibetan Text
- Author
-
Rowinski, Zach and Keutzer, Kurt
- Subjects
NLP ,Tibetan ,Optical Character Recognition ,OCR - Abstract
The use of advanced computational methods for the analysis of large corpora of electronic texts is becoming increasingly popular in humanities and social science research. Unfortunately, Tibetan Studies has lacked such a repository of electronic, searchable texts. The automated recognition of printed texts, known as Optical Character Recognition (OCR), offers a solution to this problem; however, until recently, robust OCR systems for the Tibetan language have not been available. In this paper, we introduce one new system, called Namsel, which uses OCR to support the production, review, and distribution of searchable Tibetan texts at a large scale. Namsel tackles a number of challenges unique to the recognition of complex scripts such as Tibetan uchen and has been able to achieve high accuracy rates on a wide range of machine-printed works. In this paper, we discuss the details of Tibetan OCR, how Namsel works, and the problems it is able to solve. We also discuss the collaborative work between Namsel and its partner libraries aimed at building a comprehensive database of historical and modern Tibetan works—a database that consists of more than one million pages of texts spanning over a thousand years of literary production.
- Published
- 2016
190. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids
- Author
-
Iandola, Forrest, Moskewicz, Matt, Karayev, Sergey, Girshick, Ross, Darrell, Trevor, and Keutzer, Kurt
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Convolutional Neural Networks (CNNs) can provide accurate object classification. They can be extended to perform object detection by iterating over dense or selected proposed object regions. However, the runtime of such detectors scales as the total number and/or area of regions to examine per image, and training such detectors may be prohibitively slow. Fortunately, for some CNN classifier topologies, it is possible to share significant work among overlapping regions to be classified. This paper presents DenseNet, an open source system that computes dense, multiscale features from the convolutional layers of a CNN-based object classifier. Future work will involve training efficient object detectors with DenseNet feature descriptors.
- Published
- 2014
191. Fast LSTM by dynamic decomposition on cloud and distributed systems
- Author
-
You, Yang, He, Yuxiong, Rajbhandari, Samyam, Wang, Wenhan, Hsieh, Cho-Jui, Keutzer, Kurt, and Demmel, James
- Published
- 2020
- Full Text
- View/download PDF
192. SANA: Sensitivity-Aware Neural Architecture Adaptation for Uniform Quantization
- Author
-
Guo, Mingfei, primary, Dong, Zhen, additional, and Keutzer, Kurt, additional
- Published
- 2023
- Full Text
- View/download PDF
193. SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation
- Author
-
Xu, Chenfeng, primary, Wu, Bichen, additional, Wang, Zining, additional, Zhan, Wei, additional, Vajda, Peter, additional, Keutzer, Kurt, additional, and Tomizuka, Masayoshi, additional
- Published
- 2020
- Full Text
- View/download PDF
194. Quadric Representations for LiDAR Odometry, Mapping and Localization
- Author
-
Xia, Chao, primary, Xu, Chenfeng, additional, Rim, Patrick, additional, Ding, Mingyu, additional, Zheng, Nanning, additional, Keutzer, Kurt, additional, Tomizuka, Masayoshi, additional, and Zhan, Wei, additional
- Published
- 2023
- Full Text
- View/download PDF
195. CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
- Author
-
Xiao, Lirui, primary, Yang, Huanrui, additional, Dong, Zhen, additional, Keutzer, Kurt, additional, Du, Li, additional, and Zhang, Shanghang, additional
- Published
- 2023
- Full Text
- View/download PDF
196. Open-Vocabulary Point-Cloud Object Detection without 3D Annotation
- Author
-
Lu, Yuheng, primary, Xu, Chenfeng, additional, Wei, Xiaobao, additional, Xie, Xiaodong, additional, Tomizuka, Masayoshi, additional, Keutzer, Kurt, additional, and Zhang, Shanghang, additional
- Published
- 2023
- Full Text
- View/download PDF
197. NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
- Author
-
Liu, Yijiang, primary, Yang, Huanrui, additional, Dong, Zhen, additional, Keutzer, Kurt, additional, Du, Li, additional, and Zhang, Shanghang, additional
- Published
- 2023
- Full Text
- View/download PDF
198. Technology Mapping
- Author
-
Keutzer, Kurt, Ravindran, Kaushik, and Kao, Ming-Yang, editor
- Published
- 2016
- Full Text
- View/download PDF
199. A Novel Domain Adaptation Framework for Medical Image Segmentation
- Author
-
Gholami, Amir, primary, Subramanian, Shashank, additional, Shenoy, Varun, additional, Himthani, Naveen, additional, Yue, Xiangyu, additional, Zhao, Sicheng, additional, Jin, Peter, additional, Biros, George, additional, and Keutzer, Kurt, additional
- Published
- 2019
- Full Text
- View/download PDF
200. Overview of the Factors Affecting the Power Consumption
- Author
-
Chinnery, David, Keutzer, Kurt, Chinnery, David, and Keutzer, Kurt
- Published
- 2007
- Full Text
- View/download PDF