263 results for "Liu, Jihao"
Search Results
2. Flop between algebraically integrable foliations on potentially klt varieties
- Author: Chen, Yifei, Liu, Jihao, and Wang, Yanze
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: We prove that for any two minimal models of an lc algebraically integrable foliated triple on potentially klt varieties, there exist small birational models that are connected by a sequence of flops. In particular, any two minimal models of lc algebraically integrable foliated triples on $\mathbb Q$-factorial klt varieties are connected by a sequence of flops. We also discuss the connection between minimal models for possibly non-algebraically integrable foliations on threefolds, assuming the minimal model program for generalized foliated quadruples.
- Comment: 12 pages
- Published: 2024
3. ACC for local volumes
- Author: Han, Jingjun, Liu, Jihao, and Qi, Lu
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14B05
- Abstract: We prove the ACC conjecture for local volumes. Moreover, when the local volume is bounded away from zero, we prove Shokurov's ACC conjecture for minimal log discrepancies.
- Comment: 22 pages, remove the assumption "Q-Gorenstein" in Theorem 1.7
- Published: 2024
4. Minimal model program for algebraically integrable adjoint foliated structures
- Author: Cascini, Paolo, Han, Jingjun, Liu, Jihao, Meng, Fanjun, Spicer, Calum, Svaldi, Roberto, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: For $\mathbb Q$-factorial klt algebraically integrable adjoint foliated structures, we prove the cone theorem, the contraction theorem, and the existence of flips. Therefore, we deduce the existence of the minimal model program for such structures. We also prove the base-point-freeness theorem for such structures of general type and establish an adjunction formula and the existence of $\mathbb Q$-factorial quasi-dlt modifications for algebraically integrable adjoint foliated structures.
- Comment: 50 pages
- Published: 2024
5. MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
- Author: Liu, Jihao, Huang, Xin, Zheng, Jinliang, Liu, Boxiao, Wang, Jia, Yoshie, Osamu, Liu, Yu, and Li, Hongsheng
- Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Machine Learning
- Abstract: This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs). While existing visual instruction datasets often focus on question-answering, they struggle to generalize to broader application scenarios such as creative writing, summarization, or image analysis. To address these limitations, we propose a novel approach to constructing MM-Instruct that leverages the strong instruction-following capabilities of existing LLMs to generate novel visual instruction data from large-scale but conventional image captioning datasets. MM-Instruct first leverages ChatGPT to automatically generate diverse instructions from a small set of seed instructions through augmentation and summarization. It then matches these instructions with images and uses an open-sourced large language model (LLM) to generate coherent answers to the instruction-image pairs. Throughout the answer generation process, the LLM is grounded in the detailed text descriptions of the images to guarantee the alignment of the instruction data. Moreover, we introduce a benchmark based on the generated instruction data to evaluate the instruction-following capabilities of existing LMMs. We demonstrate the effectiveness of MM-Instruct by training a LLaVA-1.5 model on the generated data, denoted as LLaVA-Instruct, which exhibits significant improvements in instruction-following capabilities compared to LLaVA-1.5 models. The MM-Instruct dataset, benchmark, and pre-trained models are available at https://github.com/jihaonew/MM-Instruct.
- Comment: Dataset and models are available at https://github.com/jihaonew/MM-Instruct
- Published: 2024
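The three-stage pipeline in the MM-Instruct abstract above (instruction expansion, instruction-image matching, caption-grounded answer generation) can be summarized in a short sketch. All names below (`chat_llm`, `match_images`, `open_llm`) are hypothetical placeholders, not the authors' code:
```python
# Hypothetical sketch of an MM-Instruct-style data pipeline; the callables
# `chat_llm`, `match_images`, and `open_llm` are placeholders, not real APIs.

def expand_instructions(seed_instructions, chat_llm, n=1000):
    """Stage 1: augment and summarize a small seed set into diverse instructions."""
    prompt = "Rewrite and diversify these instructions:\n" + "\n".join(seed_instructions)
    return chat_llm(prompt, num_samples=n)

def build_instruction_data(instructions, captioned_images, match_images, open_llm):
    """Stages 2-3: match instructions to images, then generate grounded answers."""
    dataset = []
    for inst in instructions:
        for image, caption in match_images(inst, captioned_images):
            # Grounding: the LLM answers from the detailed caption, so the
            # generated answer stays aligned with the actual image content.
            answer = open_llm(f"Image description: {caption}\nInstruction: {inst}\nAnswer:")
            dataset.append({"image": image, "instruction": inst, "answer": answer})
    return dataset
```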
6. Volume of algebraically integrable foliations and locally stable families
- Author: Han, Jingjun, Jiao, Junpeng, Li, Mengchu, and Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: In this paper, we study the volume of algebraically integrable foliations and locally stable families. We show that, for any canonical algebraically integrable foliation, its volume belongs to a discrete set depending only on its rank and the volume of its general leaves. In particular, if the foliation is of general type, then its volume has a positive lower bound depending only on its rank and the volume of its general leaves. This implies some special cases of a question posed by Cascini, Hacon, and Langer. As a consequence, we show that the relative volume of a stable family with a normal generic fiber belongs to a discrete set if the dimension and the volume of its general fibers are bounded. Log versions of the aforementioned theorems are also proved.
- Comment: 24 pages
- Published: 2024
7. Exceptional Fano varieties with small minimal log discrepancy
- Author: Esser, Louis, Liu, Jihao, and Wang, Chengxi
- Subjects: Mathematics - Algebraic Geometry; 14J40, 14J45 (primary), 14C20, 14E30, 14J17 (secondary)
- Abstract: We construct exceptional Fano varieties with the smallest known minimal log discrepancies in all dimensions. These varieties are well-formed hypersurfaces in weighted projective space. Their minimal log discrepancies decay doubly exponentially with dimension, and achieve the optimal value in dimension 2.
- Comment: 28 pages
- Published: 2024
8. Instruction-Guided Visual Masking
- Author: Zheng, Jinliang, Li, Jianxiong, Cheng, Sijie, Zheng, Yinan, Li, Jiaming, Liu, Jihao, Liu, Yu, Liu, Jingjing, and Zhan, Xianyuan
- Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Robotics
- Abstract: Instruction following is crucial in contemporary LLMs. However, when extended to the multimodal setting, it often suffers from misalignment between specific textual instructions and the targeted local regions of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models. By constructing visual masks for instruction-irrelevant regions, IVM-enhanced multimodal models can effectively focus on task-relevant image regions to better align with complex instructions. Specifically, we design a visual masking data generation pipeline and create an IVM-Mix-1M dataset with 1 million image-instruction pairs. We further introduce a new learning technique, Discriminator Weighted Supervised Learning (DWSL), for preferential IVM training that prioritizes high-quality data samples. Experimental results on generic multimodal tasks such as VQA and embodied robotic control demonstrate the versatility of IVM, which, as a plug-and-play tool, significantly boosts the performance of diverse multimodal models, yielding new state-of-the-art results across challenging multimodal benchmarks. Code, model and data are available at https://github.com/2toinf/IVM.
- Comment: NeurIPS 2024
- Published: 2024
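The DWSL idea in the IVM abstract above lends itself to a compact sketch: per-sample supervised losses are reweighted by a discriminator's quality score so that high-quality annotations dominate training. The mask shapes, the binary cross-entropy objective, and the discriminator interface are illustrative assumptions, not the paper's exact formulation:
```python
import torch
import torch.nn.functional as F

def dwsl_loss(pred_masks, target_masks, discriminator):
    # pred_masks, target_masks: (B, H, W) soft masks in [0, 1] (assumed shapes).
    per_sample = F.binary_cross_entropy(pred_masks, target_masks, reduction="none")
    per_sample = per_sample.mean(dim=(1, 2))                  # one loss per sample
    with torch.no_grad():
        # Assumed interface: the discriminator scores annotation quality, (B,).
        weights = torch.sigmoid(discriminator(target_masks))
        weights = weights / weights.sum()                     # normalize over batch
    return (weights * per_sample).sum()
```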
9. Enhancing Vision-Language Model with Unmasked Token Alignment
- Author: Liu, Jihao, Zheng, Jinliang, Liu, Boxiao, Liu, Yu, and Li, Hongsheng
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract: Contrastive pre-training on image-text pairs, exemplified by CLIP, has become a standard technique for learning multi-modal visual-language representations. Although CLIP has demonstrated remarkable performance, training it from scratch on noisy web-scale datasets is computationally demanding. On the other hand, mask-then-predict pre-training approaches, like Masked Image Modeling (MIM), offer efficient self-supervised learning for single-modal representations. This paper introduces Unmasked Token Alignment (UTA), a method that leverages existing CLIP models to further enhance their vision-language representations. UTA trains a Vision Transformer (ViT) by aligning unmasked visual tokens to the corresponding image tokens from a frozen CLIP vision encoder, which automatically aligns the ViT model with the CLIP text encoder. The pre-trained ViT can be directly applied for zero-shot evaluation even without training on image-text pairs. Compared to MIM approaches, UTA does not suffer from training-finetuning inconsistency and is much more training-efficient because it avoids using the extra [MASK] tokens. Extensive experimental results demonstrate that UTA can enhance CLIP models and outperform existing MIM methods on various uni- and multi-modal benchmarks. Code and models are available at https://github.com/jihaonew/UTA.
- Comment: Accepted by TMLR; code and models are available at https://github.com/jihaonew/UTA
- Published: 2024
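The core objective described in the UTA abstract above reduces to a few lines: a trainable ViT sees the full, unmasked image, and each of its tokens is pulled toward the corresponding token of a frozen CLIP vision encoder. The cosine-distance loss below is an assumption; only the unmasked-token, frozen-teacher structure comes from the abstract:
```python
import torch
import torch.nn.functional as F

def uta_loss(student_vit, frozen_clip_visual, images):
    student_tokens = student_vit(images)              # (B, N, D), no [MASK] tokens
    with torch.no_grad():
        teacher_tokens = frozen_clip_visual(images)   # (B, N, D) frozen CLIP targets
    s = F.normalize(student_tokens, dim=-1)
    t = F.normalize(teacher_tokens, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()         # mean (1 - cosine similarity)
```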
10. GLID: Pre-training a Generalist Encoder-Decoder Vision Model
- Author: Liu, Jihao, Zheng, Jinliang, Liu, Yu, and Li, Hongsheng
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract: This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks. While self-supervised pre-training approaches, e.g., Masked Autoencoder, have shown success in transfer learning, task-specific sub-architectures still have to be appended for different downstream tasks, and these cannot enjoy the benefits of large-scale pre-training. GLID overcomes this challenge by allowing the pre-trained generalist encoder-decoder to be fine-tuned on various vision tasks with minimal task-specific architecture modifications. In the GLID training scheme, the pre-training pretext task and the downstream tasks are all modeled as "query-to-answer" problems. We pre-train a task-agnostic encoder-decoder with query-mask pairs. During fine-tuning, GLID maintains the pre-trained encoder-decoder and queries, only replacing the topmost linear transformation layer with task-specific linear heads. This minimizes the pretrain-finetune architecture inconsistency and enables the pre-trained model to better adapt to downstream tasks. GLID achieves competitive performance on various vision tasks, including object detection, image segmentation, pose estimation, and depth estimation, outperforming or matching specialist models such as Mask2Former, DETR, ViTPose, and BinsFormer.
- Comment: CVPR 2024
- Published: 2024
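The fine-tuning recipe in the GLID abstract above — keep the pre-trained encoder-decoder and queries, swap only the topmost linear layer — might look roughly like the following; the module interfaces are hypothetical:
```python
import torch.nn as nn

class GLIDForTask(nn.Module):
    """Illustrative wrapper: reuse the pre-trained parts, replace only the head."""
    def __init__(self, pretrained, task_out_dim):
        super().__init__()
        self.encoder = pretrained.encoder      # reused as-is
        self.decoder = pretrained.decoder      # reused as-is
        self.queries = pretrained.queries      # learned queries, reused as-is
        hidden = pretrained.head.in_features   # assumes the old head is nn.Linear
        self.head = nn.Linear(hidden, task_out_dim)  # the only new parameters

    def forward(self, x):
        feats = self.encoder(x)
        answers = self.decoder(self.queries, feats)   # "query-to-answer" decoding
        return self.head(answers)
```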
11. Minimal model program for algebraically integrable foliations on klt varieties
- Author: Liu, Jihao, Meng, Fanjun, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: For lc algebraically integrable foliations on klt varieties, we prove the base-point-freeness theorem, the contraction theorem, and the existence of flips. The first result resolves a conjecture of Cascini and Spicer, while the latter two results strengthen a result of Cascini and Spicer by removing their assumption on the termination of flips. Moreover, we prove the existence of the minimal model program for lc algebraically integrable foliations on klt varieties and the existence of good minimal models or Mori fiber spaces for lc algebraically integrable foliations polarized with ample divisors on klt varieties. As a consequence, we show that $\mathbb{Q}$-factorial klt varieties with lc algebraically integrable Fano foliation structures are Mori dream spaces. We also show the existence of a Shokurov-type polytope for lc algebraically integrable foliations.
- Comment: 56 pages. New results added and exposition improved. We additionally prove the base-point-freeness theorem and the finite generation of polarized canonical rings. As a corollary, we also show that every $\mathbb{Q}$-factorial klt variety with an lc algebraically integrable Fano foliation structure is a Mori dream space.
- Published: 2024
12. Infrasound Event Classification Fusion Model Based on Multiscale SE-CNN and BiLSTM
- Author: Li, Hongru, Li, Xihai, Tan, Xiaofeng, Niu, Chao, Liu, Jihao, and Liu, Tianyou
- Published: 2024
- Full Text: View/download PDF
13. DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
- Author: Li, Jianxiong, Zheng, Jinliang, Zheng, Yinan, Mao, Liyuan, Hu, Xiao, Cheng, Sijie, Niu, Haoyi, Liu, Jihao, Liu, Yu, Liu, Jingjing, Zhang, Ya-Qin, and Zhan, Xianyuan
- Subjects: Computer Science - Robotics; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
- Abstract: Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representations; 3) capturing trajectory-level language grounding. Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align it with language instructions. We discover that via implicit preferences, where a visual trajectory inherently aligns better with its corresponding language instruction than with mismatched pairs, the popular Bradley-Terry model can be transformed into representation learning through proper reward reparameterizations. The resulting framework, DecisionNCE, mirrors an InfoNCE-style objective but is distinctively tailored for decision-making tasks, providing an embodied representation learning framework that elegantly extracts both local and global task progression features, with temporal consistency enforced through implicit time contrastive learning, while ensuring trajectory-level instruction grounding via multimodal joint encoding. Evaluations on both simulated and real robots demonstrate that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning. Project page: https://2toinf.github.io/DecisionNCE/
- Comment: ICML 2024
- Published: 2024
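One way to read the DecisionNCE abstract above as code: the implicit reward of a trajectory segment is the change in image embedding projected onto the language embedding, and matched (segment, instruction) pairs are contrasted against in-batch mismatches, InfoNCE-style. This is a hedged reading of the abstract, not the authors' exact loss:
```python
import torch
import torch.nn.functional as F

def decision_nce_loss(img_enc, lang_enc, first_frames, last_frames, instructions, tau=0.1):
    delta = img_enc(last_frames) - img_enc(first_frames)   # (B, D) task progression
    lang = lang_enc(instructions)                          # (B, D)
    delta = F.normalize(delta, dim=-1)
    lang = F.normalize(lang, dim=-1)
    logits = delta @ lang.t() / tau                        # (B, B) similarity matrix
    labels = torch.arange(logits.size(0), device=logits.device)
    # Symmetric InfoNCE: diagonal entries are the matched pairs.
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```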
14. On the equivalence between the effective adjunction conjectures of Prokhorov-Shokurov and of Li
- Author: Han, Jingjun, Liu, Jihao, and Xue, Qingyuan
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: Prokhorov and Shokurov introduced the famous effective adjunction conjecture, also known as the effective base-point-freeness conjecture. This conjecture asserts that the moduli component of an lc-trivial fibration is effectively base-point-free. Li proposed a variation of this conjecture, known as the $\Gamma$-effective adjunction conjecture, and proved that a weaker version of his conjecture is implied by the original Prokhorov-Shokurov conjecture. In this paper, we establish the equivalence of Prokhorov-Shokurov's and Li's effective adjunction conjectures. The key to our proof is the formulation of a uniform rational polytope for canonical bundle formulas, which relies on recent developments in the minimal model program theory of algebraically integrable foliations by Ambro-Cascini-Shokurov-Spicer and Chen-Han-Liu-Xie.
- Comment: 13 pages. arXiv admin note: text overlap with arXiv:2309.15823
- Published: 2023
15. On explicit bounds of Fano threefolds
- Author: Birkar, Caucher and Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry; 14J30, 14J45, 14E30, 14C20
- Abstract: In this paper, we study the explicit geometry of threefolds, in particular Fano varieties. We find an explicitly computable positive integer $N$ such that all but a bounded family of Fano threefolds have $N$-complements. This result has many applications to finding explicit bounds of algebraic invariants for threefolds. We provide explicit lower bounds for the first gap of the $\mathbb R$-complementary thresholds for threefolds, the first gap of the global lc thresholds, the smallest minimal log discrepancy of exceptional threefolds, and the volume of log threefolds with reduced boundary and ample log canonical divisor. We also provide an explicit upper bound for the anti-canonical volume of exceptional threefolds. While the bounds in this paper are not expected to be optimal, they are the first explicit bounds of these invariants in dimension three.
- Comment: 49 pages
- Published: 2023
16. Minimal model program for algebraically integrable foliations and generalized pairs
- Author: Chen, Guodu, Han, Jingjun, Liu, Jihao, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: By systematically introducing and studying the structure of algebraically integrable generalized foliated quadruples, we establish the minimal model program for $\mathbb Q$-factorial foliated dlt algebraically integrable foliations and lc generalized pairs by proving their cone theorems, contraction theorems, and the existence of flips. We also provide numerous applications to their birational geometry and resolve a conjecture of Cascini and Spicer.
- Comment: 137 pages. Minor change: remove a redundant paragraph in the introduction
- Published: 2023
17. The minimal volume of surfaces of log general type with non-empty non-klt locus
- Author: Liu, Jihao and Liu, Wenfei
- Subjects: Mathematics - Algebraic Geometry; 14J29, 14B05, 14E30
- Abstract: We show that the minimal volume of surfaces of log general type, with non-empty non-klt locus on the ample model, is $\frac{1}{825}$. Furthermore, the ample model $V$ achieving the minimal volume is determined uniquely up to isomorphism. The canonical embedding presents $V$ as a degree $86$ hypersurface of $\mathbb P(6,11,25,43)$. This motivates a one-parameter deformation of $V$ to klt stable surfaces within the weighted projective space. Consequently, we identify a $\textit{complete}$ rational curve in the corresponding moduli space $M_{\frac{1}{825}}$. As an important application, we deduce that the smallest accumulation point of the set of volumes for projective log canonical surfaces equals $\frac{1}{825}$.
- Comment: 24 pages
- Published: 2023
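A quick consistency check of the stated value (standard adjunction for quasi-smooth hypersurfaces in weighted projective space; quasi-smoothness of $V$ is an assumption here, not stated in this record): for $V = V_{86} \subset \mathbb P(6,11,25,43)$, one gets $K_V = \mathcal O_V(86-(6+11+25+43)) = \mathcal O_V(1)$, hence $\mathrm{vol}(K_V) = K_V^2 = \frac{86\cdot 1^2}{6\cdot 11\cdot 25\cdot 43} = \frac{86}{70950} = \frac{1}{825}$, matching the minimal volume in the abstract.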
18. ACC for lc thresholds for algebraically integrable foliations
- Author: Das, Omprokash, Liu, Jihao, and Mascharak, Roktim
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: We prove the ACC for lc thresholds and the global ACC for algebraically integrable foliations and provide applications.
- Comment: 25 pages. arXiv admin note: text overlap with arXiv:2306.00330
- Published: 2023
19. Uniform rational polytopes of foliated threefolds and the global ACC
- Author: Liu, Jihao, Meng, Fanjun, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 37F75
- Abstract: In this paper, we show the existence of uniform rational lc polytopes for foliations with functional boundaries in dimension $\leq 3$. As an application, we prove the global ACC for foliated threefolds with arbitrary DCC coefficients. We also provide applications on the accumulation points of lc thresholds of foliations in dimension $\leq 3$.
- Comment: 25 pages
- Published: 2023
- Full Text: View/download PDF
20. Classification of Small Sample Nuclear Explosion Seismic Events based on MSSA–XGBoost
- Author: Li, Hongru, Li, Xihai, Tan, Xiaofeng, Liu, Tianyou, Zhang, Yun, Liu, Jihao, and Niu, Chao
- Published: 2024
- Full Text: View/download PDF
21. Optimal bounds on surfaces
- Author: Liu, Jihao and Shokurov, V. V.
- Subjects: Mathematics - Algebraic Geometry; 14J26, 14B05, 14J29, 14E30, 14C20, 14J30
- Abstract: We prove that the first gap of $\mathbb R$-complementary thresholds of surfaces is $\frac{1}{13}$. More precisely, the largest $\mathbb R$-complementary threshold for surfaces that is strictly less than $1$ is $\frac{12}{13}$. This result has many applications in the explicit birational geometry of surfaces and threefolds and allows us to find several other optimal bounds on surfaces. We show that the first gap of global log canonical thresholds for surfaces is $\frac{1}{13}$, answering a question of V. Alexeev and W. Liu. We show that the minimal volume of log surfaces with reduced boundary and ample log canonical divisor is $\frac{1}{462}$, answering a question of J. Koll\'ar. We show that the smallest minimal log discrepancy (mld) of exceptional surfaces is $\frac{1}{13}$. As a special case, we show that the smallest mld of klt Calabi-Yau surfaces is $\frac{1}{13}$, reproving a recent result of L. Esser, B. Totaro, and C. Wang. Via a more detailed classification, we determine all exceptional del Pezzo surfaces that are not $\frac{1}{11}$-lt, and show that the smallest mld of exceptional del Pezzo surfaces is $\frac{3}{35}$. We also obtain better upper bounds on $n$-complements and Tian's $\alpha$-invariants for surfaces. Finally, as an analogue of our main theorem in high dimensions, we propose a question associating the gaps of $\mathbb R$-complementary thresholds with the gaps of mlds and study some special cases of this question.
- Comment: 46 pages, 11 tables
- Published: 2023
22. Vanishing theorems for generalized pairs
- Author: Chen, Bingyi, Liu, Jihao, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14B05
- Abstract: We establish the Kodaira vanishing theorem and the Kawamata-Viehweg vanishing theorem for lc generalized pairs. As a consequence, we provide a new proof of the base-point-freeness theorem for lc generalized pairs. This new approach allows us to prove the contraction theorem for lc generalized pairs without using Koll\'ar's gluing theory.
- Comment: 12 pages
- Published: 2023
23. Complements, index theorem, and minimal log discrepancies of foliated surface singularities
- Author: Liu, Jihao, Meng, Fanjun, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75, 14B05
- Abstract: We present an extension of several results on pairs and varieties to foliated surface pairs. We prove the boundedness of local complements, the local index theorem, and the uniform boundedness of minimal log discrepancies (mlds), as well as establishing the existence of uniform rational lc polytopes. Furthermore, we address two questions posed by P. Cascini and C. Spicer on foliations, providing negative responses. We also demonstrate that the Grauert-Riemenschneider type vanishing theorem generally fails for lc foliations on surfaces. In addition, we determine the set of minimal log discrepancies for foliated surface pairs with specific coefficients, which leads to the recovery of Y.-A. Chen's proof of the ascending chain condition conjecture for mlds for foliated surfaces.
- Comment: 29 pages
- Published: 2023
- Full Text: View/download PDF
24. On global ACC for foliated threefolds
- Author: Liu, Jihao, Luo, Yujie, and Meng, Fanjun
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Dynamical Systems; 14E30, 37F75
- Abstract: In this paper, we prove the rational coefficient case of the global ACC for foliated threefolds. Specifically, we consider any lc foliated log Calabi-Yau triple $(X,\mathcal{F},B)$ of dimension $3$ whose coefficients belong to a set $\Gamma$ of rational numbers satisfying the descending chain condition, and prove that the coefficients of $B$ belong to a finite set depending only on $\Gamma$. To prove our main result, we introduce the concept of generalized foliated quadruples, which is a mixture of foliated triples and Birkar-Zhang's generalized pairs. With this concept, we establish a canonical bundle formula for foliations in any dimension. As for applications, we extend Shokurov's global index conjecture in the classical MMP to foliated triples and prove this conjecture for threefolds with nonzero boundaries and for surfaces. Additionally, we introduce the theory of rational polytopes for functional divisors on foliations and prove some miscellaneous results.
- Comment: 22 pages. Add a paragraph on pages 3-4. Proposition 6.4 and Lemma 7.2 strengthened. Small modification of the proof of 8.1. Reference updated
- Published: 2023
25. GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
- Author: Liu, Jihao, Wang, Tai, Liu, Boxiao, Zhang, Qihang, Liu, Yu, and Li, Hongsheng
- Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract: Multi-view camera-based 3D detection is a challenging problem in computer vision. Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network. However, we argue that there is a major domain gap between the LiDAR BEV features and the camera-based BEV features, as they have different characteristics and are derived from different sources. In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving multi-view camera-based 3D detection. GeoMIM is a multi-camera vision transformer with Cross-View Attention (CVA) blocks that uses LiDAR BEV features encoded by the pretrained BEV model as learning targets. During pretraining, GeoMIM's decoder has a semantic branch that completes dense perspective-view features and a geometry branch that reconstructs dense perspective-view depth maps. The depth branch is designed to be camera-aware by taking the camera's parameters as input, for better transfer capability. Extensive results demonstrate that GeoMIM outperforms existing methods on the nuScenes benchmark, achieving state-of-the-art performance for camera-based 3D object detection and 3D segmentation. Code and pretrained models are available at https://github.com/Sense-X/GeoMIM.
- Comment: Released code: https://github.com/Sense-X/GeoMIM
- Published: 2023
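The two-branch pretraining objective in the GeoMIM abstract above can be sketched as a sum of a cross-modal feature-matching term and a depth-reconstruction term; the module interface and the specific losses (MSE, L1) are illustrative assumptions:
```python
import torch.nn.functional as F

def geomim_loss(student, images, camera_params, lidar_bev_target, depth_gt):
    # The student is camera-aware: it takes the camera parameters as input.
    sem_feats, pred_depth = student(images, camera_params)  # two decoder branches
    sem_loss = F.mse_loss(sem_feats, lidar_bev_target)      # match frozen LiDAR BEV features
    depth_loss = F.l1_loss(pred_depth, depth_gt)            # geometry (depth) branch
    return sem_loss + depth_loss
```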
26. On effective log Iitaka fibrations and existence of complements
- Author: Chen, Guodu, Han, Jingjun, and Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry
- Abstract: We study the relationship between Iitaka fibrations and the conjecture on the existence of complements, assuming the good minimal model conjecture. In one direction, we show that the conjecture on the existence of complements implies the effective log Iitaka fibration conjecture. As a consequence, the effective log Iitaka fibration conjecture holds in dimension $3$. In the other direction, for any Calabi-Yau type variety $X$ such that $-K_X$ is nef, we show that $X$ has an $n$-complement for some universal constant $n$ depending only on the dimension of $X$ and two natural invariants of a general fiber of an Iitaka fibration of $-K_X$. We also formulate the decomposable Iitaka fibration conjecture, a variation of the effective log Iitaka fibration conjecture which is closely related to the structure of ample models of pairs with non-rational coefficients, and study its relationship with the aforementioned conjectures.
- Comment: 26 pages, comments are very welcome!
- Published: 2023
27. 2D/3D Reconstruction of Patient-Specific Surface Models and Uncertainty Estimation via Posterior Shape Models
- Author: Sun, Wenyuan, Zhao, Yuyun, Liu, Jihao, and Zheng, Guoyan
- Editors: Wang, Guangzhi, Yao, Dezhong, Gu, Zhongze, Peng, Yi, Tong, Shanbao, and Liu, Chengyu; Series Editor: Magjarević, Ratko; Associate Editors: Ładyżyński, Piotr, Ibrahim, Fatimah, Lackovic, Igor, and Rock, Emilio Sacristan
- Published: 2024
- Full Text: View/download PDF
28. Semi-ampleness of NQC generalized log canonical pairs
- Author: Liu, Jihao and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14C20, 14E05
- Abstract: We establish a Koll\'ar-type gluing theory for NQC generalized log canonical pairs and use it to prove semi-ampleness results for NQC generalized pairs. As consequences, we prove the existence of flips for any NQC generalized log canonical pair and show that NQC generalized log canonical singularities are Du Bois.
- Comment: 26 pages. Final version. Title and abstract changed as suggested by the referee
- Published: 2022
- Full Text: View/download PDF
29. On termination of flips and exceptionally non-canonical singularities
- Author: Han, Jingjun and Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry
- Abstract: We reduce the termination of flips to the termination of terminal flips and the ACC conjecture for minimal log discrepancies (mlds) for exceptionally non-canonical (enc) pairs, a class of very restrictive singularities. As a consequence, the ACC conjecture for enc pairs implies the termination of flips in dimension $4$. We also show that the termination of flips follows from the lower-semicontinuity of mlds for terminal pairs, together with the ACC for mlds for terminal and enc pairs. Moreover, in dimension $3$, we give a rough classification of enc singularities and prove the ACC for mlds for enc pairs. These two results provide a proof of the termination of flips in dimension $3$ which does not rely on any difficulty function.
- Comment: 64 pages, comments are very welcome!
- Published: 2022
30. Infinitesimal structure of log canonical thresholds
- Author: Liu, Jihao, Meng, Fanjun, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14B05
- Abstract: We show that log canonical thresholds of fixed dimension are standardized. More precisely, we show that any sequence of log canonical thresholds in fixed dimension $d$ accumulates in a way which is either i) similar to how standard and hyperstandard sets accumulate, or ii) converging to log canonical thresholds in dimension $\leq d-2$. This provides an accurate description of the infinitesimal structure of the set of log canonical thresholds. We also discuss similar behaviors of minimal log discrepancies, canonical thresholds, and K-semistable thresholds.
- Comment: 22 pages
- Published: 2022
- Full Text: View/download PDF
31. Remark on complements on surfaces
- Author: Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14B05
- Abstract: We give an explicit characterization of the singularities of exceptional pairs in any dimension. In particular, we show that any exceptional Fano surface is $\frac{1}{42}$-lc. As corollaries, we show that any $\mathbb R$-complementary surface $X$ has an $n$-complement for some integer $n\leq 192\cdot 84^{128\cdot 42^5}\approx 10^{10^{10.5}}$, and that Tian's alpha invariant for any surface is $\leq 3\sqrt{2}\cdot 84^{64\cdot 42^5}\approx 10^{10^{10.2}}$. Although the latter two values are expected to be far from optimal, they are the first explicit upper bounds of these two algebraic invariants for surfaces.
- Comment: 7 pages. Final version. One estimation number changed. Add postscript
- Published: 2022
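A quick order-of-magnitude check of the first bound (plain arithmetic, not taken from the paper): $128\cdot 42^5\approx 1.67\times 10^{10}$, so $\log_{10}\bigl(192\cdot 84^{128\cdot 42^5}\bigr)\approx 1.67\times 10^{10}\cdot\log_{10}84\approx 3.2\times 10^{10}\approx 10^{10.5}$, which recovers the stated $\approx 10^{10^{10.5}}$; the same computation with the exponent $64\cdot 42^5$ gives $\approx 10^{10^{10.2}}$ for the alpha-invariant bound.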
32. Uniform rational polytopes for Iitaka dimensions
- Author: Chen, Guodu, Han, Jingjun, and Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14B05
- Abstract: In this paper, we continue to develop the theory of functional pairs and uniform rational polytopes. We show that there is a uniform perturbation for Iitaka dimensions of pseudo-effective lc pairs of fixed dimension with DCC coefficients, assuming the non-vanishing conjecture. We also show the existence of uniform rational polytopes for Iitaka dimensions of pseudo-effective lc pairs, assuming the non-vanishing conjecture.
- Comment: 17 pages. Dedicated to Prof. Vyacheslav V. Shokurov on the occasion of his seventieth birthday
- Published: 2022
33. Relative Nakayama-Zariski decomposition and minimal models of generalized pairs
- Author: Liu, Jihao and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14C20, 14E05, 14J17
- Abstract: We prove some basic properties of the relative Nakayama-Zariski decomposition. We apply them to the study of lc generalized pairs. We prove the existence of log minimal models or Mori fiber spaces for (relative) lc generalized pairs polarized by an ample divisor. This extends a result of Hashizume-Hu to generalized pairs. We also show that, for any lc generalized pair $(X,B+A,{\bf{M}})/Z$ such that $K_X+B+A+{\bf{M}}_X\sim_{\mathbb R,Z}0$ and $B\geq 0,A\geq 0$, $(X,B,{\bf{M}})/Z$ has either a log minimal model or a Mori fiber space. This is an analogue of a result of Birkar/Hacon-Xu and Hashizume in the category of generalized pairs, and is later shown to be crucial to the proof of the existence of lc generalized flips in full generality.
- Comment: 39 pages. Final version. Correction made in Section 3. Main theorems remain unaffected. To appear in Peking Math. J.
- Published: 2022
34. TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers
- Author: Liu, Jihao, Liu, Boxiao, Zhou, Hang, Li, Hongsheng, and Liu, Yu
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract: CutMix is a popular augmentation technique commonly used for training modern convolutional and transformer vision networks. It was originally designed to encourage Convolutional Neural Networks (CNNs) to focus more on an image's global context instead of local information, which greatly improves the performance of CNNs. However, we find it to have limited benefits for transformer-based architectures, which naturally have a global receptive field. In this paper, we propose a novel data augmentation technique, TokenMix, to improve the performance of vision transformers. TokenMix mixes two images at the token level by partitioning the mixing region into multiple separated parts. Moreover, we show that the mixed learning target in CutMix, a linear combination of a pair of ground truth labels, may be inaccurate and sometimes counter-intuitive. To obtain a more suitable target, we propose to assign the target score according to the content-based neural activation maps of the two images from a pre-trained teacher model, which does not need to have high performance. Through extensive experiments on various vision transformer architectures, we show that TokenMix helps vision transformers focus on the foreground area to infer the classes and enhances their robustness to occlusion, with consistent performance gains. Notably, we improve DeiT-T/S/B by +1% ImageNet top-1 accuracy. TokenMix also benefits from longer training, achieving 81.2% top-1 accuracy on ImageNet with DeiT-S trained for 400 epochs. Code is available at https://github.com/Sense-X/TokenMix.
- Comment: ECCV 2022; code: https://github.com/Sense-X/TokenMix
- Published: 2022
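A minimal sketch of the two ideas in the TokenMix abstract above: mixing at the token level over several separated parts, and deriving the soft target from a teacher's activation maps rather than the area ratio. The region sampling and the activation-mass target below are simplifying assumptions:
```python
import torch

def tokenmix(tokens_a, tokens_b, cam_a, cam_b, ratio=0.5, num_parts=4):
    # tokens_*: (N, D) patch tokens; cam_*: (N,) teacher activation per token.
    N = tokens_a.size(0)
    mask = torch.zeros(N, dtype=torch.bool)
    part = max(1, int(N * ratio / num_parts))
    for _ in range(num_parts):                  # several (possibly disjoint) parts
        start = int(torch.randint(0, N - part + 1, (1,)))
        mask[start:start + part] = True
    mixed = torch.where(mask.unsqueeze(-1), tokens_b, tokens_a)
    # Content-based target: share of teacher activation mass contributed by b.
    mass_b = cam_b[mask].sum()
    score_b = mass_b / (cam_a[~mask].sum() + mass_b)
    return mixed, (1.0 - score_b, score_b)      # soft label weights for (a, b)
```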
35. UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
- Author: Liu, Jihao, Huang, Xin, Song, Guanglu, Li, Hongsheng, and Liu, Yu
- Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract: Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks. However, how to effectively combine these operators to form high-performance hybrid visual architectures remains a challenge. In this work, we study the learnable combination of convolution, transformer, and MLP by proposing a novel unified architecture search approach. Our approach contains two key designs for finding high-performance networks. First, we model the very different searchable operators in a unified form, which enables the operators to be characterized by the same set of configuration parameters. In this way, the overall search space size is significantly reduced, and the total search cost becomes affordable. Second, we propose context-aware downsampling modules (DSMs) to mitigate the gap between the different types of operators. Our proposed DSMs are able to better adapt features from different types of operators, which is important for identifying high-performance hybrid architectures. Finally, we integrate configurable operators and DSMs into a unified search space and search with a reinforcement learning-based algorithm to fully explore the optimal combination of the operators. We search a baseline network and scale it up to obtain a family of models, named UniNets, which achieve much better accuracy and efficiency than previous ConvNets and Transformers. In particular, our UniNet-B5 achieves 84.9% top-1 accuracy on ImageNet, outperforming EfficientNet-B7 and BoTNet-T7 with 44% and 55% fewer FLOPs, respectively. By pretraining on ImageNet-21K, our UniNet-B6 achieves 87.4%, outperforming Swin-L with 51% fewer FLOPs and 41% fewer parameters. Code is available at https://github.com/Sense-X/UniNet.
- Comment: ECCV 2022, code at https://github.com/Sense-X/UniNet. arXiv admin note: substantial text overlap with arXiv:2110.04035
- Published: 2022
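The "unified form" idea in the UniNet abstract above amounts to describing convolution, transformer, and MLP blocks with one configuration schema, so that a single controller can sample any hybrid network. The fields below are illustrative guesses, and a uniform sampler stands in for the paper's RL controller:
```python
import random

OP_TYPES = ["conv", "transformer", "mlp"]

def sample_stage():
    # One shared configuration schema for all three operator families.
    return {
        "op": random.choice(OP_TYPES),
        "depth": random.choice([2, 3, 4]),            # blocks in the stage
        "width": random.choice([64, 128, 256, 512]),  # channels / embedding dim
        "expansion": random.choice([2, 4, 6]),
        "downsample": random.choice(["strided", "context_aware_dsm"]),
    }

def sample_architecture(num_stages=4):
    # In the paper an RL-based search algorithm replaces this uniform sampler.
    return [sample_stage() for _ in range(num_stages)]
```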
36. Second largest accumulation point of minimal log discrepancies of threefolds
- Author: Liu, Jihao and Luo, Yujie
- Subjects: Mathematics - Algebraic Geometry; Mathematics - Combinatorics; 14E30, 14J17, 14J30, 14M25, 52B20, 52C07
- Abstract: The second largest accumulation point of the set of minimal log discrepancies of threefolds is $\frac{5}{6}$. In particular, the minimal log discrepancies of $\frac{5}{6}$-lc threefolds satisfy the ACC.
- Comment: 38 pages
- Published: 2022
37. Complements, index theorem, and minimal log discrepancies of foliated surface singularities
- Author: Liu, Jihao, Meng, Fanjun, and Xie, Lingyao
- Published: 2024
- Full Text: View/download PDF
38. MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers
- Author: Liu, Jihao, Huang, Xin, Zheng, Jinliang, Liu, Yu, and Li, Hongsheng
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract: In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers. Existing masked image modeling (MIM) methods for hierarchical Vision Transformers replace a random subset of input tokens with a special [MASK] symbol and aim at reconstructing original image tokens from the corrupted image. However, we find that using the [MASK] symbol greatly slows down training and causes pretraining-finetuning inconsistency, due to the large masking ratio (e.g., 60% in SimMIM). On the other hand, MAE does not introduce [MASK] tokens at its encoder at all but is not applicable to hierarchical Vision Transformers. To solve the issue and accelerate the pretraining of hierarchical models, we replace the masked tokens of one image with visible tokens of another image, i.e., creating a mixed image. We then conduct dual reconstruction to reconstruct the two original images from the mixed input, which significantly improves efficiency. While MixMAE can be applied to various hierarchical Transformers, this paper explores using a Swin Transformer with a large window size and scales it up to a huge model size (600M parameters). Empirical results demonstrate that MixMAE can learn high-quality visual representations efficiently. Notably, MixMAE with Swin-B/W14 achieves 85.1% top-1 accuracy on ImageNet-1K by pretraining for 600 epochs. Moreover, its transfer performance on 6 other datasets shows that MixMAE has a better FLOPs/performance tradeoff than previous popular MIM methods. Code is available at https://github.com/Sense-X/MixMIM.
- Comment: CVPR 2023. Code: https://github.com/Sense-X/MixMIM
- Published: 2022
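The mixing step at the heart of the MixMAE abstract above fits in a few lines: masked token positions of one image are filled with the visible tokens of another, and the model later reconstructs both originals from the single mixed input. The shapes and the random mask are illustrative assumptions:
```python
import torch

def mix_tokens(tokens_a, tokens_b, mask_ratio=0.5):
    # tokens_*: (B, N, D) patch embeddings of two different images.
    B, N, _ = tokens_a.shape
    mask = torch.rand(B, N, device=tokens_a.device) < mask_ratio  # True: a is masked
    mixed = torch.where(mask.unsqueeze(-1), tokens_b, tokens_a)   # no [MASK] symbol
    # Dual reconstruction: the decoder recovers image a at the masked positions
    # and image b at the unmasked ones, from this single mixed input.
    return mixed, mask
```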
39. On the fixed part of pluricanonical systems for surfaces
- Author: Liu, Jihao and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14B05
- Abstract: We show that $|mK_X|$ defines a birational map and has no fixed part for some bounded positive integer $m$ for any $\frac{1}{2}$-lc surface $X$ such that $K_X$ is big and nef. For every positive integer $n\geq 3$, we construct a sequence of projective surfaces $X_{n,i}$, such that $K_{X_{n,i}}$ is ample, ${\rm{mld}}(X_{n,i})>\frac{1}{n}$ for every $i$, $\lim_{i\rightarrow+\infty}{\rm{mld}}(X_{n,i})=\frac{1}{n}$, and for any positive integer $m$, there exists $i$ such that $|mK_{X_{n,i}}|$ has non-zero fixed part. These results answer the surface case of a question of Xu.
- Comment: 20 pages, v1
- Published: 2022
40. On generalized lc pairs with $\mathrm{\textbf b}$-log abundant nef part
- Author: Jiao, Junpeng, Liu, Jihao, and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14C20, 14E05, 14J17
- Abstract: We study the behavior of generalized lc pairs with $\mathrm{\textbf b}$-log abundant nef part, a meticulously designed structure on algebraic varieties. We show that this structure is preserved under the canonical bundle formula and sub-adjunction formulas, and is also compatible with the non-vanishing conjecture and the abundance conjecture in the classical minimal model program.
- Comment: 20 pages, v2. Reference update
- Published: 2022
41. Meta Knowledge Distillation
- Author: Liu, Jihao, Liu, Boxiao, Li, Hongsheng, and Liu, Yu
- Subjects: Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition
- Abstract: Recent studies have pointed out that knowledge distillation (KD) suffers from two degradation problems, the teacher-student gap and the incompatibility with strong data augmentations, making it inapplicable to training state-of-the-art models, which are trained with advanced augmentations. However, we observe that a key factor, i.e., the temperatures in the softmax functions for generating probabilities of both the teacher and student models, was mostly overlooked in previous methods. With properly tuned temperatures, such degradation problems of KD can be much mitigated. Instead of relying on a naive grid search, which shows poor transferability, we propose Meta Knowledge Distillation (MKD) to meta-learn the distillation with learnable meta temperature parameters. The meta parameters are adaptively adjusted during training according to the gradients of the learning objective. We validate that MKD is robust to different dataset scales, different teacher/student architectures, and different types of data augmentation. With MKD, we achieve the best performance with popular ViT architectures among compared methods that use only ImageNet-1K as training data, ranging from tiny to large models. With ViT-L, we achieve 86.5% with 600 epochs of training, 0.6% better than MAE, which trains for 1,650 epochs.
- Comment: preprint
- Published: 2022
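The core of the MKD abstract above, making the distillation temperatures learnable instead of grid-searched, can be sketched as follows. The surrounding meta-update loop (which adjusts the temperatures from the gradients of the learning objective) is omitted, and the initial value is an arbitrary assumption:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MKDLoss(nn.Module):
    def __init__(self, init_t=4.0):
        super().__init__()
        # Learnable meta temperature parameters for teacher and student.
        self.teacher_t = nn.Parameter(torch.tensor(init_t))
        self.student_t = nn.Parameter(torch.tensor(init_t))

    def forward(self, student_logits, teacher_logits):
        p_t = F.softmax(teacher_logits / self.teacher_t, dim=-1)
        log_p_s = F.log_softmax(student_logits / self.student_t, dim=-1)
        return F.kl_div(log_p_s, p_t, reduction="batchmean")
```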
42. ACC for minimal log discrepancies of terminal threefolds
- Author: Han, Jingjun, Liu, Jihao, and Luo, Yujie
- Subjects: Mathematics - Algebraic Geometry
- Abstract: We prove that the ACC conjecture for minimal log discrepancies holds for threefolds in $[1-\delta,+\infty)$, where $\delta>0$ depends only on the coefficient set. We also study Reid's general elephant for pairs, and show Shokurov's conjecture on the existence of $(\epsilon,n)$-complements for threefolds for any $\epsilon\geq 1$. As a key step, we prove the uniform boundedness of divisors computing minimal log discrepancies for terminal threefolds. We also show the ACC for threefold canonical thresholds, and that the set of accumulation points of threefold canonical thresholds is equal to $\{0\}\cup\{\frac{1}{n}\}_{n\in\mathbb Z_{\ge 2}}$.
- Comment: 87 pages, v2. References of [Che22] updated. Typos fixed. Introduction revised and new references added thanks to suggestions of Prof. Shokurov
- Published: 2022
43. INTERN: A New Learning Paradigm Towards General Vision
- Author: Shao, Jing, Chen, Siyu, Li, Yangguang, Wang, Kun, Yin, Zhenfei, He, Yinan, Teng, Jianing, Sun, Qinghong, Gao, Mengya, Liu, Jihao, Huang, Gengshi, Song, Guanglu, Wu, Yichao, Huang, Yuming, Liu, Fenggang, Peng, Huan, Qin, Shuo, Wang, Chengyu, Wang, Yujie, He, Conghui, Liang, Ding, Liu, Yu, Yu, Fengwei, Yan, Junjie, Lin, Dahua, Wang, Xiaogang, and Qiao, Yu
- Subjects: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
- Abstract: Enormous waves of technological innovation over the past several years, marked by advances in AI technologies, are profoundly reshaping industry and society. However, down the road, a key challenge awaits us: our capability of meeting rapidly growing scenario-specific demands is severely limited by the cost of acquiring a commensurate amount of training data. This difficult situation is in essence due to limitations of the mainstream learning paradigm: we need to train a new model for each new scenario, based on a large quantity of well-annotated data and commonly from scratch. In tackling this fundamental problem, we move beyond it and develop a new learning paradigm named INTERN. By learning with supervisory signals from multiple sources in multiple stages, the model being trained will develop strong generalizability. We evaluate our model on 26 well-known datasets that cover four categories of tasks in computer vision. In most cases, our models, adapted with only 10% of the training data in the target domain, outperform the counterparts trained with the full set of data, often by a significant margin. This is an important step towards a promising prospect where such a model with general vision capability can dramatically reduce our reliance on data, thus expediting the adoption of AI technologies. Furthermore, revolving around our new paradigm, we also introduce a new data system, a new architecture, and a new benchmark, which together form a general vision ecosystem to support its future development in an open and inclusive manner. See the project website at https://opengvlab.shlab.org.cn.
- Published: 2021
44. UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
- Author: Liu, Jihao, Li, Hongsheng, Song, Guanglu, Huang, Xin, and Liu, Yu
- Subjects: Computer Science - Computer Vision and Pattern Recognition
- Abstract: Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks. A few works have investigated manually combining those operators to design visual network architectures, and can achieve satisfactory performance to some extent. In this paper, we propose to jointly search the optimal combination of convolution, transformer, and MLP for building a series of all-operator network architectures with high performance on visual tasks. We empirically identify that the widely used strided convolution or pooling-based downsampling modules become the performance bottleneck when the operators are combined to form a network. To better handle the global context captured by the transformer and MLP operators, we propose two novel context-aware downsampling modules, which can better adapt to the global information encoded by transformer and MLP operators. To this end, we jointly search all operators and downsampling modules in a unified search space. Notably, our searched network UniNet (Unified Network) outperforms the state-of-the-art pure convolution-based architecture, EfficientNet, and the pure transformer-based architecture, Swin Transformer, on multiple public visual benchmarks: ImageNet classification, COCO object detection, and ADE20K semantic segmentation.
- Comment: technical report
- Published: 2021
45. Relative Nakayama–Zariski Decomposition and Minimal Models of Generalized Pairs
- Author: Liu, Jihao and Xie, Lingyao
- Published: 2023
- Full Text: View/download PDF
46. LatentPCN: latent space-constrained point cloud network for reconstruction of 3D patient-specific bone surface models from calibrated biplanar X-ray images
- Author: Sun, Wenyuan, Zhao, Yuyun, Liu, Jihao, and Zheng, Guoyan
- Published: 2023
- Full Text: View/download PDF
47. Existence of flips for generalized lc pairs
- Author: Hacon, Christopher D. and Liu, Jihao
- Subjects: Mathematics - Algebraic Geometry; 14E30, 14C20, 14E05, 14J17, 14J30, 14J35
- Abstract: We prove the existence of flips for $\mathbb Q$-factorial NQC generalized lc pairs, and the cone and contraction theorems for NQC generalized lc pairs. This answers a question of C. Birkar which was conjectured by J. Han and Z. Li. As an immediate application, we show that we can run the minimal model program for $\mathbb Q$-factorial NQC generalized lc pairs. In particular, we complete the minimal model program for $\mathbb Q$-factorial NQC generalized lc pairs in dimension $\leq 3$ and pseudo-effective $\mathbb Q$-factorial NQC generalized lc pairs in dimension $4$.
- Comment: 38 pages, v3. Comments are welcome. Proof of Theorem 1.1 has been greatly simplified
- Published: 2021
48. FNAS: Uncertainty-Aware Fast Neural Architecture Search
- Author: Liu, Jihao, Zhang, Ming, Sun, Yangting, Liu, Boxiao, Song, Guanglu, Liu, Yu, and Li, Hongsheng
- Subjects: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition
- Abstract: Reinforcement learning (RL)-based neural architecture search (NAS) generally guarantees better convergence than gradient-based approaches, yet suffers from huge computational resource requirements due to the rollout bottleneck -- exhaustive training for each sampled generation on proxy tasks. In this paper, we propose a general pipeline to accelerate the convergence of both the rollout process and the RL process in NAS. It is motivated by the interesting observation that both the architecture and the parameter knowledge can be transferred between different experiments and even different tasks. We first introduce an uncertainty-aware critic (value function) in Proximal Policy Optimization (PPO) to utilize the architecture knowledge from previous experiments, which stabilizes the training process and reduces the searching time by 4 times. Further, an architecture knowledge pool together with a block similarity function is proposed to utilize parameter knowledge and reduce the searching time by 2 times. This is the first work to introduce block-level weight sharing in RL-based NAS. The block similarity function guarantees a 100% hitting ratio with strict fairness. Besides, we show that a simply designed off-policy correction factor used in the "replay buffer" in RL optimization can further reduce half of the searching time. Experiments on the Mobile Neural Architecture Search (MNAS) search space show that the proposed Fast Neural Architecture Search (FNAS) accelerates the standard RL-based NAS process by ~10x (e.g., ~256 2x2 TPUv2 x days / 20,000 GPU x hours -> 2,000 GPU x hours for MNAS) and guarantees better performance on various vision tasks.
- Published: 2021
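The block-level weight-sharing pool from the FNAS abstract above can be pictured as follows: previously trained block weights are stored keyed by configuration, and every new block retrieves the most similar stored entry, which is what makes a 100% hitting ratio possible once the pool is non-empty. The similarity function over numeric config fields is a toy assumption:
```python
def similarity(cfg_a, cfg_b):
    # Toy similarity: negative L1 distance over shared numeric config fields.
    keys = [k for k in cfg_a if k in cfg_b and isinstance(cfg_a[k], (int, float))]
    return -sum(abs(cfg_a[k] - cfg_b[k]) for k in keys)

class BlockWeightPool:
    def __init__(self):
        self.pool = []   # list of (block_config, trained_weights)

    def fetch(self, block_cfg):
        # Every lookup "hits" once the pool is non-empty: the most similar
        # stored block donates its weights to warm-start the new block.
        if not self.pool:
            return None
        cfg, weights = max(self.pool, key=lambda cw: similarity(cw[0], block_cfg))
        return weights

    def store(self, block_cfg, weights):
        self.pool.append((block_cfg, weights))
```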
49. Number of singular points on projective surfaces
- Author: Liu, Jihao and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry
- Abstract: The number of singular points on a klt Fano surface $X$ is $\leq 2\rho(X)+2$.
- Comment: 10 pages
- Published: 2021
50. Divisors computing minimal log discrepancies on lc surfaces
- Author: Liu, Jihao and Xie, Lingyao
- Subjects: Mathematics - Algebraic Geometry
- Abstract: Let $(X\ni x,B)$ be an lc surface germ. If $X\ni x$ is klt, we show that there exists a divisor computing the minimal log discrepancy of $(X\ni x,B)$ that is a Koll\'ar component of $X\ni x$. If $B\not=0$ or $X\ni x$ is not Du Val, we show that any divisor computing the minimal log discrepancy of $(X\ni x,B)$ is a potential lc place of $X\ni x$.
- Comment: 19 pages. Add an example thanks to Ziquan Zhuang. Comments are welcome
- Published: 2020