Author: "Lin, Jianzhe" / Database: OAIster - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Lin, Jianzhe"' showing total 14 results

1. Learning How To Ask: Cycle-Consistency Refines Prompts in Multimodal Foundation Models

Author: Diesendruck, Maurice; Lin, Jianzhe; Imani, Shima; Mahalingam, Gayathri; Xu, Mingyang; Zhao, Jie
Abstract: When LLMs perform zero-shot inference, they typically use a prompt with a task specification, and generate a completion. However, there is no work to explore the possibility of the reverse - going from completion to task specification. In this paper, we employ both directions to perform cycle-supervised learning entirely in-context. Our goal is to create a forward map f : X -> Y (e.g. image -> generated caption), coupled with a backward map g : Y -> X (e.g. caption -> generated image) to construct a cycle-consistency "loss" (formulated as an update to the prompt) to enforce g(f(X)) ~= X. The technique, called CyclePrompt, uses cycle-consistency as a free supervisory signal to iteratively craft the prompt. Importantly, CyclePrompt reinforces model performance without expensive fine-tuning, without training data, and without the complexity of external environments (e.g. compilers, APIs). We demonstrate CyclePrompt in two domains: code generation and image captioning. Our results on the HumanEval coding benchmark put us in first place on the leaderboard among models that do not rely on extra training data or usage of external environments, and third overall. Compared to the GPT4 baseline, we improve accuracy from 80.5% to 87.2%. In the vision-language space, we generate detailed image captions which outperform baseline zero-shot GPT4V captions, when tested against natural (VQAv2) and diagrammatic (FigureQA) visual question-answering benchmarks. To the best of our knowledge, this is the first use of self-supervised learning for prompting.
Published: 2024

2. Multi-User Chat Assistant (MUCA): a Framework Using LLMs to Facilitate Group Conversations

Author: Mao, Manqing; Ting, Paishun; Xiang, Yijian; Xu, Mingyang; Chen, Julia; Lin, Jianzhe
Abstract: Recent advancements in large language models (LLMs) have provided a new avenue for chatbot development, while most existing research has primarily centered on single-user chatbots that focus on deciding "What" to answer after user inputs. In this paper, we identified that multi-user chatbots have more complex 3W design dimensions -- "What" to say, "When" to respond, and "Who" to answer. Additionally, we proposed Multi-User Chat Assistant (MUCA), which is an LLM-based framework for chatbots specifically designed for group discussions. MUCA consists of three main modules: Sub-topic Generator, Dialog Analyzer, and Utterance Strategies Arbitrator. These modules jointly determine suitable response contents, timings, and the appropriate recipients. To make the optimizing process for MUCA easier, we further propose an LLM-based Multi-User Simulator (MUS) that can mimic real user behavior. This enables faster simulation of a conversation between the chatbot and simulated users, making the early development of the chatbot framework much more efficient. MUCA demonstrates effectiveness, including appropriate chime-in timing, relevant content, and improving user engagement, in group conversations with a small to medium number of participants, as evidenced by case studies and experimental results from user studies.
Published: 2024

3. BatchPrompt: Accomplish more with less

Author: Lin, Jianzhe; Diesendruck, Maurice; Du, Liang; Abraham, Robin
Abstract: As the ever-increasing token limits of large language models (LLMs) have enabled long context as input, prompting with single data samples might no longer be an efficient way. A straightforward strategy for improving efficiency is to batch data within the token limit (e.g., 8k for gpt-3.5-turbo; 32k for GPT-4), which we call BatchPrompt. We have two initial observations for prompting with batched data. First, we find that prompting with batched data in longer contexts will inevitably lead to worse performance, compared to single-data prompting. Second, the performance of the language model is significantly correlated with the positions and order of the batched data, due to the corresponding change in decoder context. To retain efficiency and overcome performance loss, we propose Batch Permutation and Ensembling (BPE), and a novel Self-reflection-guided EArly Stopping (SEAS) technique. Our comprehensive experimental evaluation demonstrates that BPE can boost the performance of BatchPrompt with a striking margin on a range of popular NLP tasks, including question answering (Boolq), textual entailment (RTE), and duplicate questions identification (QQP). These performances are even competitive with/higher than single-data prompting (SinglePrompt), while BatchPrompt requires much fewer LLM calls and input tokens (for SinglePrompt vs. BatchPrompt with batch size 32, using just 9%-16% the number of LLM calls, Boolq accuracy 90.6% to 90.9% with 27.4% tokens, QQP accuracy 87.2% to 88.4% with 18.6% tokens, RTE accuracy 91.5% to 91.1% with 30.8% tokens). To the best of our knowledge, this is the first work to technically improve prompting efficiency of large language models. We hope our simple yet effective approach will shed light on the future research of large language models. The code will be released.
Comment: 20 pages, 5 figures
Published: 2023
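
The BatchPrompt abstract above does not spell out how Batch Permutation and Ensembling combines answers. A minimal Python sketch follows, assuming majority voting over several shuffled batchings. The function `batch_permutation_ensemble`, its parameters, and the `toy_model` stand-in for one LLM call per batch are hypothetical names chosen for illustration, not the authors' code.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def batch_permutation_ensemble(
    samples: Sequence[str],
    answer_batch: Callable[[List[str]], List[str]],
    batch_size: int = 4,
    rounds: int = 3,
    seed: int = 0,
) -> List[str]:
    """Answer every sample several times in differently ordered batches,
    then keep the most frequent answer per sample (assumed voting rule)."""
    rng = random.Random(seed)
    votes = [Counter() for _ in samples]
    for _ in range(rounds):
        # Shuffle so each sample lands in a different batch position per round.
        order = list(range(len(samples)))
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            # One call answers the whole batch; answers come back in batch order.
            answers = answer_batch([samples[i] for i in idx])
            for i, ans in zip(idx, answers):
                votes[i][ans] += 1
    return [v.most_common(1)[0][0] for v in votes]


def toy_model(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for a single LLM call on one batch.
    return ["yes" if "?" in text else "no" for text in batch]


labels = batch_permutation_ensemble(
    ["a?", "b", "c?", "d", "e"], toy_model, batch_size=2, rounds=3
)
```

With five samples and a batch size of 2, each round issues three batched calls instead of five single-sample calls, which is the efficiency trade-off the abstract describes.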

4. CitySurfaces: City-Scale Semantic Segmentation of Sidewalk Materials

Author: Hosseini, Maryam; Miranda, Fabio; Lin, Jianzhe; Silva, Claudio
Abstract: While the design of sustainable and resilient urban built environments is increasingly promoted around the world, significant data gaps have made research on pressing sustainability issues challenging to carry out. Pavements are known to have strong economic and environmental impacts; however, most cities lack a spatial catalog of their surfaces due to the cost-prohibitive and time-consuming nature of data collection. Recent advancements in computer vision, together with the availability of street-level images, provide new opportunities for cities to extract large-scale built environment data with lower implementation costs and higher accuracy. In this paper, we propose CitySurfaces, an active learning-based framework that leverages computer vision techniques for classifying sidewalk materials using widely available street-level images. We trained the framework on images from New York City and Boston and the evaluation results show a 90.5% mIoU score. Furthermore, we evaluated the framework using images from six different cities, demonstrating that it can be applied to regions with distinct urban fabrics, even outside the domain of the training data. CitySurfaces can provide researchers and city agencies with a low-cost, accurate, and extensible method to collect sidewalk material data which plays a critical role in addressing major sustainability issues, including climate change and surface water management.
Comment: Sustainable Cities and Society journal (accepted); Model: https://github.com/VIDA-NYU/city-surfaces
Published: 2022
Full Text: View/download PDF
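
The CitySurfaces abstract reports its result as an mIoU score. For readers unfamiliar with the metric, here is a minimal sketch of mean intersection-over-union on flat label sequences. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], truth: Sequence[int], num_classes: int) -> float:
    """Average, over classes, of |pred == c AND truth == c| / |pred == c OR truth == c|."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:  # assumption: ignore classes that appear in neither sequence
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

Here the two per-class values average to 7/12, roughly 0.583.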

5. NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

Author: Sheng, Diwei; Chai, Yuxiang; Li, Xinru; Feng, Chen; Lin, Jianzhe; Silva, Claudio; Rizzo, John-Ross
Abstract: Visual place recognition (VPR) is critical in not only localization and mapping for autonomous driving vehicles, but also in assistive navigation for the visually impaired population. To enable a long-term VPR system on a large scale, several challenges need to be addressed. First, different applications could require different image view directions, such as front views for self-driving cars and side views for people with low vision. Second, VPR in metropolitan scenes can often cause privacy concerns due to the imaging of pedestrian and vehicle identity information, calling for data anonymization before VPR queries and database construction. Both factors could lead to VPR performance variations that are not well understood yet. To study their influences, we present the NYU-VPR dataset that contains more than 200,000 images over a 2km by 2km area near the New York University campus, taken within the whole year of 2016. We present benchmark results on several popular VPR algorithms showing that side views are significantly more challenging for current VPR methods while the influence of data anonymization is almost negligible, together with our hypothetical explanations and in-depth analysis.
Comment: 8 pages, 10 figures, published in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)
Published: 2021

6. IntentVizor: Towards Generic Query Guided Interactive Video Summarization

Author: Wu, Guande; Lin, Jianzhe; Silva, Claudio T.
Abstract: The target of automatic video summarization is to create a short skim of the original long video while preserving the major content/events. There is a growing interest in the integration of user queries into video summarization or query-driven video summarization. This video summarization method predicts a concise synopsis of the original video based on the user query, which is commonly represented by the input text. However, two inherent problems exist in this query-driven way. First, the text query might not be enough to describe the exact and diverse needs of the user. Second, the user cannot edit once the summaries are produced, while we assume the needs of the user should be subtle and need to be adjusted interactively. To solve these two problems, we propose IntentVizor, an interactive video summarization framework guided by generic multi-modality queries. The input query that describes the user's needs is not limited to text but can also include video snippets. We further represent these multi-modality finer-grained queries as user 'intent', which is interpretable, interactable, editable, and can better quantify the user's needs. In this paper, we use a set of the proposed intents to represent the user query and design a new interactive visual analytic interface. Users can interactively control and adjust these mixed-initiative intents to obtain a more satisfying summary through the interface. Also, to improve the summarization quality via video understanding, a novel Granularity-Scalable Ego-Graph Convolutional Networks (GSE-GCN) is proposed. We conduct our experiments on two benchmark datasets. Comparisons with the state-of-the-art methods verify the effectiveness of the proposed framework. Code and dataset are available at https://github.com/jnzs1836/intent-vizor.
Comment: 10 pages and 4 figures, CVPR 2022
Published: 2021

7. ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN

Author: Wu, Guande; Lin, Jianzhe; Silva, Claudio T.
Abstract: Video summarization aims to simplify large scale video browsing by generating concise, short summaries that differ from but well represent the original video. Due to the scarcity of video annotations, recent progress for video summarization concentrates on unsupervised methods, among which the GAN based methods are most prevalent. This type of method includes a summarizer and a discriminator. The summarized video from the summarizer will be assumed as the final output, only if the video reconstructed from this summary cannot be discriminated from the original one by the discriminator. The primary problems of these GAN based methods are twofold. First, the summarized video in this way is a subset of original video with low redundancy and contains high priority events/entities. This summarization criterion is not enough. Second, the training of the GAN framework is not stable. This paper proposes a novel Entity relationship Aware video summarization method (ERA) to address the above problems. To be more specific, we introduce an Adversarial Spatio Temporal network to construct the relationship among entities, which we think should also be given high priority in the summarization. The GAN training problem is solved by introducing the Wasserstein GAN and two newly proposed video patch/score sum losses. In addition, the score sum loss can also relieve the model sensitivity to the varying video lengths, which is an inherent problem for most current video analysis tasks. Our method substantially lifts the performance on the target benchmark datasets and exceeds the current leaderboard Rank 1 state of the art CSNet (2.1% F1 score increase on TVSum and 3.1% F1 score increase on SumMe). We hope our straightforward yet effective approach will shed some light on the future research of unsupervised video summarization.
Comment: 8 pages, 3 figures
Published: 2021

8. Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification

Author: Lin, Jianzhe; Yu, Tianze; Wang, Z. Jane
Abstract: Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-labeled images. A recent trend for conducting such laboursome annotation tasks is through crowdsourcing, where images are annotated by volunteers or paid workers online (e.g., workers of Amazon Mechanical Turk) from scratch. However, the quality of crowdsourcing image annotations cannot be guaranteed, and incompleteness and incorrectness are two major concerns for crowdsourcing annotations. To address such concerns, we have a rethinking of crowdsourcing annotations: Our simple hypothesis is that if the annotators only partially annotate multi-label images with salient labels they are confident in, there will be fewer annotation errors and annotators will spend less time on uncertain labels. As a pleasant surprise, with the same annotation budget, we show a multi-label image classifier supervised by images with salient annotations can outperform models supervised by fully annotated images. Our method contributions are 2-fold: An active learning way is proposed to acquire salient labels for multi-label images; and a novel Adaptive Temperature Associated Model (ATAM) specifically using partial annotations is proposed for multi-label image classification. We conduct experiments on practical crowdsourcing data, the Open Street Map (OSM) dataset and benchmark dataset COCO 2014. When compared with state-of-the-art classification methods trained on fully annotated images, the proposed ATAM can achieve higher accuracy. The proposed idea is promising for crowdsourcing data annotation. Our code will be publicly available.
Published: 2021
Full Text: View/download PDF

9. SCIDA: Self-Correction Integrated Domain Adaptation from Single- to Multi-label Aerial Images

Author: Yu, Tianze; Lin, Jianzhe; Mou, Lichao; Hua, Yuansheng; Zhu, Xiaoxiang; Wang, Z. Jane
Abstract: Most publicly available datasets for image classification are with single labels, while images are inherently multi-labeled in our daily life. Such an annotation gap makes many pre-trained single-label classification models fail in practical scenarios. This annotation issue is more concerned for aerial images: Aerial data collected from sensors naturally cover a relatively large land area with multiple labels, while annotated aerial datasets, which are publicly available (e.g., UCM, AID), are single-labeled. As manually annotating multi-label aerial images would be time/labor-consuming, we propose a novel self-correction integrated domain adaptation (SCIDA) method for automatic multi-label learning. SCIDA is weakly supervised, i.e., automatically learning the multi-label image classification model from using massive, publicly available single-label images. To achieve this goal, we propose a novel Label-Wise self-Correction (LWC) module to better explore underlying label correlations. This module also makes the unsupervised domain adaptation (UDA) from single- to multi-label data possible. For model training, the proposed model only uses single-label information yet requires no prior knowledge of multi-labeled data; and it predicts labels for multi-label aerial images. In our experiments, trained with single-labeled MAI-AID-s and MAI-UCM-s datasets, the proposed model is tested directly on our collected Multi-scene Aerial Image (MAI) dataset.
Published: 2021
Full Text: View/download PDF

10. Aerial Scene Understanding in The Wild: Multi-Scene Recognition via Prototype-based Memory Networks

Author: Hua, Yuansheng; Mou, Lichao; Lin, Jianzhe; Heidler, Konrad; Zhu, Xiao Xiang
Abstract: Aerial scene recognition is a fundamental visual task and has attracted an increasing research interest in the last few years. Most of current researches mainly deploy efforts to categorize an aerial image into one scene-level label, while in real-world scenarios, there often exist multiple scenes in a single image. Therefore, in this paper, we propose to take a step forward to a more practical and challenging task, namely multi-scene recognition in single images. Moreover, we note that manually yielding annotations for such a task is extraordinarily time- and labor-consuming. To address this, we propose a prototype-based memory network to recognize multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. To be more specific, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. Afterwards, a multi-head attention-based memory retrieval module is devised to retrieve scene prototypes relevant to query multi-scene images for final predictions. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate the progress of aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on variant dataset configurations demonstrate the effectiveness of our network. Our dataset and codes are publicly available.
Published: 2021

11. Reciprocal Landmark Detection and Tracking with Extremely Few Annotations

Author: Lin, Jianzhe; Sahebzamani, Ghazal; Luong, Christina; Dezaki, Fatemeh Taheri; Jafari, Mohammad; Abolmaesumi, Purang; Tsang, Teresa
Abstract: Localization of anatomical landmarks to perform two-dimensional measurements in echocardiography is part of routine clinical workflow in cardiac disease diagnosis. Automatic localization of those landmarks is highly desirable to improve workflow and reduce interobserver variability. Training a machine learning framework to perform such localization is hindered given the sparse nature of gold standard labels; only few percent of cardiac cine series frames are normally manually labeled for clinical use. In this paper, we propose a new end-to-end reciprocal detection and tracking model that is specifically designed to handle the sparse nature of echocardiography labels. The model is trained using few annotated frames across the entire cardiac cine sequence to generate consistent detection and tracking of landmarks, and an adversarial training for the model is proposed to take advantage of these annotated frames. The superiority of the proposed reciprocal model is demonstrated using a series of experiments.
Published: 2021

12. Discriminative Feature Alignment: Improving Transferability of Unsupervised Domain Adaptation by Gaussian-guided Latent Alignment

Author: Wang, Jing; Chen, Jiahong; Lin, Jianzhe; Sigal, Leonid; de Silva, Clarence W.
Abstract: In this study, we focus on the unsupervised domain adaptation problem where an approximate inference model is to be learned from a labeled data domain and expected to generalize well to an unlabeled data domain. The success of unsupervised domain adaptation largely relies on the cross-domain feature alignment. Previous work has attempted to directly align latent features by the classifier-induced discrepancies. Nevertheless, a common feature space cannot always be learned via this direct feature alignment especially when a large domain gap exists. To solve this problem, we introduce a Gaussian-guided latent alignment approach to align the latent feature distributions of the two domains under the guidance of the prior distribution. In such an indirect way, the distributions over the samples from the two domains will be constructed on a common feature space, i.e., the space of the prior, which promotes better feature alignment. To effectively align the target latent distribution with this prior distribution, we also propose a novel unpaired L1-distance by taking advantage of the formulation of the encoder-decoder. The extensive evaluations on nine benchmark datasets validate the superior knowledge transferability through outperforming state-of-the-art methods and the versatility of the proposed method by improving the existing work significantly.
Comment: 14 pages, 11 figures
Published: 2020

13. DT-LET: Deep Transfer Learning by Exploring where to Transfer

Author: Lin, Jianzhe; Wang, Qi; Ward, Rabab; Wang, Z. Jane
Abstract: Previous transfer learning methods based on deep networks assume the knowledge should be transferred between the same hidden layers of the source domain and the target domains. This assumption doesn't always hold true, especially when the data from the two domains are heterogeneous with different resolutions. In such cases, the most suitable numbers of layers for the source domain data and the target domain data would differ. As a result, the high level knowledge from the source domain would be transferred to the wrong layer of target domain. Based on this observation, "where to transfer" proposed in this paper should be a novel research frontier. We propose a new mathematical model named DT-LET to solve this heterogeneous transfer learning problem. In order to select the best matching of layers to transfer knowledge, we define a specific loss function to estimate the corresponding relationship between high-level features of data in the source domain and the target domain. To verify this proposed cross-layer model, experiments for two cross-domain recognition/classification tasks are conducted, and the achieved superior results demonstrate the necessity of layer correspondence searching.
Comment: Conference paper submitted to AAAI 2019
Published: 2018

14. Exploring AI-Based Video Segmentation and Saliency Computation To Optimize Imagery-Acquisition From Moving Vehicles

Author: United States. Department of Transportation. University Transportation Centers (UTC) Program; United States. Department of Transportation. Office of the Assistant Secretary for Research and Technology; United States. Department of Transportation. Federal Highway Administration; Silva, Claudio T; Ozbay, Kaan; Rulff de Costa, Joao; Lin, Jianzhe; Hosseini, Maryam; Tokuda, Eric; Connected Cities for Smart Mobility toward Accessible and Resilient Transportation Center (C2SMART); New York University. Tandon School of Engineering
Abstract: 69A3551747119, In this study, a new dataset and tool were generated describing and mapping street-level infrastructure. The presented dataset, StreetAware, is generated from more than 7 hours of synchronized data collected at urban intersections by specialized Reconfigurable Environmental Intelligence Platform (REIP) sensors developed by the Visualization and Data Analytics (VIDA) Research Center at NYU. To demonstrate these key features of the data, we present four uses for the data that are not possible on many existing datasets. (1) to track objects using the multiple perspectives of multiple cameras from both audio (sound-based localization) and visual modes, (2) to associate audio events with their respective visual representations using audio and video, (3) to track the amount of each type of object in a scene over time, i.e., occupancy, and (4) to measure the speed of a pedestrian while crossing a street using multiple synchronized views and the high-resolution capability of the cameras.
