Author: "Song, Xinhang" / Publication Type: Academic Journals - Searchworks@Jio Institute Digital Library Search Results

1. Automated Segmentation and Classification of Knee Synovitis Based on MRI Using Deep Learning

Author: Wang, Qizheng, Yao, Meiyi, Song, Xinhang, Liu, Yandong, Xing, Xiaoying, Chen, Yongye, Zhao, Fangbo, Liu, Ke, Cheng, Xiaoguang, Jiang, Shuqiang, and Lang, Ning
Published: 2024
Full Text: View/download PDF

2. Molecular phylogeography of Hipposideros pratti in China.

Author: LIU, Wei, WANG, Jinhe, HAO, Yan, SONG, Xinhang, YANG, Yaping, LI, Jing, HE, Jingying, BU, Yanzhen, and NIU, Hongxing
Subjects: ANIMAL migration, ANIMAL dispersal, GENETIC variation, QUATERNARY Period, ALPINE glaciers, PHYLOGEOGRAPHY
Abstract: The article "Molecular phylogeography of Hipposideros pratti in China" discusses the genetic diversity and population dynamics of the bat species Hipposideros pratti in China. The study found that H. pratti has low genetic diversity and is divided into two clades, the central-western clade and the eastern clade. The research did not detect a clear east-to-west dispersal route, and the eastern clade spread outward from one population to another while the central-western clade spread gradually. The study also highlighted the importance of preserving bat populations and the need for in situ conservation measures. [Extracted from the article]
Published: 2024
Full Text: View/download PDF

3. Image captioning via semantic element embedding

Author: Zhang, Xiaodan, He, Shengfeng, Song, Xinhang, Lau, Rynson W.H., Jiao, Jianbin, and Ye, Qixiang
Published: 2020
Full Text: View/download PDF

4. A facile route to graphite-tungsten nitride and graphite-molybdenum nitride nanocomposites and their ORR performances

Author: Pan, Xiaolong, Song, Xinhang, Lin, Sen, Bi, Ke, Hao, Yanan, Du, Yinxiao, Liu, Jun, Fan, Dongyu, Wang, Yonggang, and Lei, Ming
Published: 2016
Full Text: View/download PDF

5. Category co-occurrence modeling for large scale scene recognition

Author: Song, Xinhang, Jiang, Shuqiang, Herranz, Luis, Kong, Yan, and Zheng, Kai
Published: 2016
Full Text: View/download PDF

6. Exploring the endangerment mechanisms of Hipposideros pomona based on molecular phylogeographic methods.

Author: Liu, Wei, Hao, Yan, Song, Xinhang, Ma, Liqun, Li, Jing, He, Jingying, Bu, Yanzhen, and Niu, Hongxing
Subjects: MICROSATELLITE repeats, LAST Glacial Maximum, QUATERNARY Period, GENETIC correlations, GENE flow, MITOCHONDRIAL DNA, HETEROZYGOSITY
Abstract: The endangerment mechanisms of various species are a focus of studies on biodiversity and conservation biology. Hipposideros pomona is an endangered species, but the reasons behind its endangerment remain unclear. We investigated the endangerment mechanisms of H. pomona using mitochondrial DNA, nuclear DNA, and microsatellite loci markers. The results showed that the nucleotide diversity of mitochondrial DNA and heterozygosity of microsatellite markers were high (π = 0.04615, HO = 0.7115), whereas the nucleotide diversity of the nuclear genes was low (THY: π = 0.00508, SORBS2: π = 0.00677, ACOX2: π = 0.00462, COPS7A: π = 0.00679). The phylogenetic tree and median‐joining network based on mitochondrial DNA sequences clustered the species into three clades, namely North Vietnam‐Fujian, Myanmar‐West Yunnan, and Laos‐Hainan clades. However, joint analysis of nuclear genes did not exhibit clustering. Analysis of molecular variance revealed a strong population genetic structure; IMa2 analysis did not reveal significant gene flow between all groups (p > .05), and isolation‐by‐distance analysis revealed a significant positive correlation between genetic and geographic distances (p < .05). The mismatch distribution analysis, neutral test, and Bayesian skyline plots revealed that the H. pomona population was relatively stable and exhibited a contraction trend. The results implied that H. pomona exhibits female philopatry and male‐biased dispersal. The Hengduan Mountains could have acted as a geographical barrier for gene flow between the North Vietnam‐Fujian clade and the Myanmar‐West Yunnan clade, whereas the Qiongzhou Strait may have limited interaction between the Hainan populations and other clades. The warm climate during the second interglacial Quaternary period (c. 0.33 Mya) could have been responsible for species differentiation, whereas the cold climate during the late Quaternary last glacial maximum (c. 10 ka BP) might have caused the overall contraction of species. The lack of significant gene flow in nuclear microsatellite loci markers among the different populations investigated reflects recent habitat fragmentation due to anthropogenic activities; thus, on‐site conservation of the species and restoration of gene flow corridors among populations need immediate implementation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

7. Relative image similarity learning with contextual information for Internet cross-media retrieval

Author: Jiang, Shuqiang, Song, Xinhang, and Huang, Qingming
Published: 2014
Full Text: View/download PDF

8. Evaluation of Deep Learning-Based Automated Detection of Primary Spine Tumors on MRI Using the Turing Test.

Author: Ouyang, Hanqiang, Meng, Fanyu, Liu, Jianfang, Song, Xinhang, Li, Yuan, Yuan, Yuan, Wang, Chunjie, Lang, Ning, Tian, Shuai, Yao, Meiyi, Liu, Xiaoguang, Yuan, Huishu, Jiang, Shuqiang, and Jiang, Liang
Subjects: TURING test, SPINE, MAGNETIC resonance imaging, ARTIFICIAL intelligence, DEEP learning
Abstract: Background: Recently, the Turing test has been used to investigate whether machines have intelligence similar to humans. Our study aimed to assess the ability of an artificial intelligence (AI) system for spine tumor detection using the Turing test. Methods: Our retrospective study data included 12179 images from 321 patients for developing AI detection systems and 6635 images from 187 patients for the Turing test. We utilized a deep learning-based tumor detection system with Faster R-CNN architecture, which generates region proposals by Region Proposal Network in the first stage and corrects the position and the size of the bounding box of the lesion area in the second stage. Each choice question featured four bounding boxes enclosing an identical tumor. Three were detected by the proposed deep learning model, whereas the other was annotated by a doctor; the results were shown to six doctors as respondents. If the respondent did not correctly identify the image annotated by a human, his answer was considered a misclassification. If all misclassification rates were >30%, the respondents were considered unable to distinguish the AI-detected tumor from the human-annotated one, which indicated that the AI system passed the Turing test. Results: The average misclassification rates in the Turing test were 51.2% (95% CI: 45.7%–57.5%) in the axial view (maximum of 62%, minimum of 44%) and 44.5% (95% CI: 38.2%–51.8%) in the sagittal view (maximum of 59%, minimum of 36%). The misclassification rates of all six respondents were >30%; therefore, our AI system passed the Turing test. Conclusion: Our proposed intelligent spine tumor detection system has a similar detection ability to annotation doctors and may be an efficient tool to assist radiologists or orthopedists in primary spine tumor detection. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

9. Scene Recognition With Prototype-Agnostic Scene Layout.

Author: Chen, Gongwei, Song, Xinhang, Zeng, Haitao, and Jiang, Shuqiang
Subjects: *CONVOLUTIONAL neural networks, *REPRESENTATIONS of graphs, *BUILDING layout, *IMAGE representation, *STRUCTURAL models
Abstract: Exploiting the spatial structure in scene images is a key research direction for scene recognition. Due to the large intra-class structural diversity, building and modeling flexible structural layout to adapt various image characteristics is a challenge. Existing structural modeling methods in scene recognition either focus on predefined grids or rely on learned prototypes, which all have limited representative ability. In this paper, we propose Prototype-agnostic Scene Layout (PaSL) construction method to build the spatial structure for each image without conforming to any prototype. Our PaSL can flexibly capture the diverse spatial characteristic of scene images and have considerable generalization capability. Given a PaSL, we build Layout Graph Network (LGN) where regions in PaSL are defined as nodes and two kinds of independent relations between regions are encoded as edges. The LGN aims to incorporate two topological structures (formed in spatial and semantic similarity dimensions) into image representations through graph convolution. Extensive experiments show that our approach achieves state-of-the-art results on widely recognized MIT67 and SUN397 datasets without multi-model or multi-scale fusion. Moreover, we also conduct the experiments on one of the largest scale datasets, Places365. The results demonstrate the proposed method can be well generalized and obtains competitive performance. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

10. Image Representations With Spatial Object-to-Object Relations for RGB-D Scene Recognition.

Author: Song, Xinhang, Jiang, Shuqiang, Wang, Bohan, Chen, Chengpeng, and Chen, Gongwei
Subjects: *IMAGE representation, *OBJECT recognition (Computer vision), *RECURRENT neural networks
Abstract: Scene recognition is challenging due to the intra-class diversity and inter-class similarity. Previous works recognize scenes either with global representations or with the intermediate representations of objects. In contrast, we investigate more discriminative image representations of object-to-object relations for scene recognition, which are based on (object, relation, object) triplets obtained with detection techniques. Particularly, two types of representations, including co-occurring frequency of object-to-object relation (denoted as COOR) and sequential representation of object-to-object relation (denoted as SOOR), are proposed to describe objects and their relative relations in different forms. COOR is represented as the intermediate representation of co-occurring frequency of objects and their relations, with a third-order tensor that can be fed to a scene classifier without further embedding. SOOR is represented in a more explicit and freer form that sequentially describes image contents with local captions. A sequence encoding model (e.g., recurrent neural network (RNN)) is implemented to encode SOOR into features for feeding the classifiers. In order to better capture the spatial information, the proposed COOR and SOOR are adapted to RGB-D data, where an RGB-D proposal fusion method is proposed for RGB-D object detection. With the proposed approaches COOR and SOOR, we obtain the state-of-the-art results of RGB-D scene recognition on SUN RGB-D and NYUD2 datasets. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

11. Spatio-Temporal Memory Attention for Image Captioning.

Author: Ji, Junzhong, Xu, Cheng, Zhang, Xiaodan, Wang, Boyue, and Song, Xinhang
Subjects: LANGUAGE models, ATTENTION, LANGUAGE policy, MEMORY
Abstract: Visual attention has been successfully applied in image captioning to selectively incorporate the most relevant areas to the language generation procedure. However, the attention in current image captioning methods is only guided by the hidden state of language model, e.g. LSTM (Long-Short Term Memory), indirectly and implicitly, and thus the attended areas are weakly relevant at different time steps. Besides the spatial relationship of attention areas, the temporal relationship in attention is crucial for image captioning according to the attention transmission mechanism of human vision. In this paper, we propose a new spatio-temporal memory attention (STMA) model to learn the spatio-temporal relationship in attention for image captioning. The STMA introduces the memory mechanism to the attention model through a tailored LSTM, where the new cell is used to memorize and propagate the attention information, and the output gate is used to generate attention weights. The attention in STMA transmits with memory adaptively and dependently, which builds strong temporal connections of attentions and learns the spatio-temporal relationship of attended areas simultaneously. Besides, the proposed STMA is flexible to combine with attention-based image captioning frameworks. Experiments on MS COCO dataset demonstrate the superiority of the proposed STMA model in exploring the spatio-temporal relationship in attention and improving the current attention-based image captioning. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

12. Learning Scene Attribute for Scene Recognition.

Author: Zeng, Haitao, Song, Xinhang, Chen, Gongwei, and Jiang, Shuqiang
Abstract: Scene recognition has been a challenging task in the field of computer vision and multimedia for a long time. The current scene recognition works often extract object features and scene features through CNN, and combine these two types of features to obtain complementary and discriminative scene representations. However, when the scene categories are visually similar, the object features might lack of discriminations. Therefore, it may be debatable to consider only object features. In contrast to the existing works, in this paper, we discuss the discrimination of scene attributes in local regions and utilize scene attributes as the complementary features of object and scene features. We extract these visual features from two individual CNN branches, one extracting the global features of the image while the other extracting the features of local regions. Through contextual modeling framework, we aggregate these features and generate more discriminative scene representations, which achieve better performance than the feature aggregation of object and scene. Moreover, we achieve the new state-of-the-art performance on both standard scene recognition benchmarks by aggregating more complementary visual features: MIT67 (88.06%) and SUN397 (74.12%). [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

13. Multi-Scale Multi-Feature Context Modeling for Scene Recognition in the Semantic Manifold.

Author: Song, Xinhang, Jiang, Shuqiang, and Herranz, Luis
Subjects: *PATTERN recognition systems, *MANIFOLDS (Mathematics), *ARTIFICIAL neural networks, *GAUSSIAN mixture models, *MARKOV random fields
Abstract: Before the big data era, scene recognition was often approached with two-step inference using localized intermediate representations (objects, topics, and so on). One of such approaches is the semantic manifold (SM), in which patches and images are modeled as points in a semantic probability simplex. Patch models are learned resorting to weak supervision via image labels, which leads to the problem of scene categories co-occurring in this semantic space. Fortunately, each category has its own co-occurrence patterns that are consistent across the images in that category. Thus, discovering and modeling these patterns are critical to improve the recognition performance in this representation. Since the emergence of large data sets, such as ImageNet and Places, these approaches have been relegated in favor of the much more powerful convolutional neural networks (CNNs), which can automatically learn multi-layered representations from the data. In this paper, we address many limitations of the original SM approach and related works. We propose discriminative patch representations using neural networks and further propose a hybrid architecture in which the semantic manifold is built on top of multiscale CNNs. Both representations can be computed significantly faster than the Gaussian mixture models of the original SM. To combine multiple scales, spatial relations, and multiple features, we formulate rich context models using Markov random fields. To solve the optimization problem, we analyze global and local approaches, where a top–down hierarchical algorithm has the best performance. Experimental results show that exploiting different types of contextual relations jointly consistently improves the recognition accuracy. [ABSTRACT FROM AUTHOR]
Published: 2017
Full Text: View/download PDF

14. Inelastic interaction between dark solitons for fourth-order variable-coefficient nonlinear Schrödinger equation.

Author: Song, Xinhang, Yang, Chunyu, Yu, Weitian, Zhang, Yujia, Liu, Mengli, Lei, Ming, and Liu, Wen-Jun
Subjects: *SOLITONS, *SOLITON collisions, *SCHRODINGER equation, *OPTICAL switching, *LOGIC circuits
Abstract: Dark solitons have the advantages of high stability, long transmission distance, and low time jitter, so they can be used in the fields of high-precision measurement, optical communication, and nonlinear optics. In this paper, inelastic interactions between dark solitons are investigated. With the analytic dark soliton solutions for the fourth-order variable-coefficient nonlinear Schrödinger equation, various types of inelastic interactions between dark solitons are presented. Influences of corresponding parameters are analyzed. Results may be applied in optical switching devices and the design of logic gates. [ABSTRACT FROM PUBLISHER]
Published: 2017
Full Text: View/download PDF

15. Synthesis of hollow porous ZnCo2O4 microspheres as high-performance oxygen reduction reaction electrocatalyst.

Author: Wang, Hao, Song, Xinhang, Wang, Haiyan, Bi, Ke, Liang, Ce, Lin, Sen, Zhang, Ru, Du, Yinxiao, Liu, Jun, Fan, Dongyu, Wang, Yonggang, and Lei, Ming
Subjects: *ZINC compounds, *OXYGEN reduction, *ELECTROCATALYSTS, *INORGANIC synthesis, *SOLUTION (Chemistry), *CALCINATION (Heat treatment)
Abstract: Hollow porous ZnCo2O4 microspheres have been successfully prepared by a simple solution-based assembly followed by calcination under an air atmosphere using zinc acetylacetonate Zn(C5H7O2)2 and cobalt acetylacetonate Co(C5H7O2)3 as raw materials. Scanning electron microscopy (SEM) and Transmission electron microscope (TEM) are used to reveal the synthesis mechanism of hollow porous structure, meanwhile, BET is used to analyze the specific surface area and pore size distribution. In the oxygen reduction reaction (ORR) test, hollow porous ZnCo2O4 microspheres exhibit enhanced ORR performance than bulk ZnCo2O4, mainly owing to the hollow porous structure which has more catalytic sites and higher efficiency of reactant exchange. Moreover, such catalyst also exhibits superior methanol tolerance ability and durability over commercial Pt/C catalyst. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

16. Geolocalized Modeling for Dish Recognition.

Author: Xu, Ruihan, Herranz, Luis, Jiang, Shuqiang, Wang, Shuang, Song, Xinhang, and Jain, Ramesh
Abstract: Food-related photos have become increasingly popular, due to social networks, food recommendations, and dietary assessment systems. Reliable annotation is essential in those systems, but unconstrained automatic food recognition is still not accurate enough. Most works focus on exploiting only the visual content while ignoring the context. To address this limitation, in this paper we explore leveraging geolocation and external information about restaurants to simplify the classification problem. We propose a framework incorporating discriminative classification in geolocalized settings and introduce the concept of geolocalized models, which, in our scenario, are trained locally at each restaurant location. In particular, we propose two strategies to implement this framework: geolocalized voting and combinations of bundled classifiers. Both models show promising performance, and the latter is particularly efficient and scalable. We collected a restaurant-oriented food dataset with food images, dish tags, and restaurant-level information, such as the menu and geolocation. Experiments on this dataset show that exploiting geolocation improves the recognition performance by around 30%, and geolocalized models contribute an additional 3–8% absolute gain, while they can be trained up to five times faster. [ABSTRACT FROM PUBLISHER]
Published: 2015
Full Text: View/download PDF

18. Composite Object Relation Modeling for Few-Shot Scene Recognition.

Author: Song X, Liu C, Zeng H, Zhu Y, Chen G, Qin X, and Jiang S
Abstract: The goal of few-shot image recognition is to classify different categories with only one or a few training samples. Previous works of few-shot learning mainly focus on simple images, such as object or character images. Those works usually use a convolutional neural network (CNN) to learn the global image representations from training tasks, which are then adapted to novel tasks. However, there are many more abstract and complex images in the real world, such as scene images, consisting of many object entities with flexible spatial relations among them. In such cases, global features can hardly obtain satisfactory generalization ability due to the large diversity of object relations in the scenes, which may hinder the adaptability to novel scenes. This paper proposes a composite object relation modeling method for few-shot scene recognition, capturing the spatial structural characteristic of scene images to enhance adaptability on novel scenes, considering that objects commonly co-occur in different scenes. In different few-shot scene recognition tasks, the objects in the same images usually play different roles. Thus we propose a task-aware region selection module (TRSM) to further select the detected regions in different few-shot tasks. In addition to detecting object regions, we mainly focus on exploiting the relations between objects, which are more consistent to the scenes and can be used to cleave apart different scenes. Objects and relations are used to construct a graph in each image, which is then modeled with graph convolutional neural network. The graph modeling is jointly optimized with few-shot recognition, where the loss of few-shot learning is also capable of adjusting graph based representations. Typically, the proposed graph based representations can be plugged in different types of few-shot architectures, such as metric-based and meta-learning methods. Experimental results of few-shot scene recognition show the effectiveness of the proposed method.
Published: 2023
Full Text: View/download PDF

19. Multi-Object Navigation Using Potential Target Position Policy Function.

Author: Zeng H, Song X, and Jiang S
Abstract: Visual object navigation is an essential task of embodied AI, which is letting the agent navigate to the goal object under the user's demand. Previous methods often focus on single-object navigation. However, in real life, human demands are generally continuous and multiple, requiring the agent to implement multiple tasks in sequence. These demands can be addressed by repeatedly performing previous single task methods. However, by dividing multiple tasks into several independent tasks to perform, without the global optimization between different tasks, the agents' trajectories may overlap, reducing the efficiency of navigation. In this paper, we propose an efficient reinforcement learning framework with a hybrid policy for multi-object navigation, aiming to maximally eliminate noneffective actions. First, the visual observations are embedded to detect the semantic entities (such as objects). And the detected objects are memorized and projected into semantic maps, which can also be regarded as a long-term memory of the observed environment. Then a hybrid policy consisting of exploration and long-term planning strategies is proposed to predict the potential target position. In particular, when the target is directly oriented, the policy function makes long-term planning for the target based on the semantic map, which is implemented by a sequence of motion actions. In the alternative, when the target is not oriented, the policy function estimates an object's potential position toward exploring the most possible objects (positions) that have close relations to the target. The relation between different objects is obtained with prior knowledge, which is used to predict the potential target position by integrating with the memorized semantic map. And then a path to the potential target is planned by the policy function. 
We evaluate our proposed method on two large-scale 3D realistic environment datasets, Gibson and Matterport3D, and the experimental results demonstrate the effectiveness and generalization of the proposed method.
Published: 2023
Full Text: View/download PDF

20. Dataset Bias in Few-Shot Image Recognition.

Author: Jiang S, Zhu Y, Liu C, Song X, Li X, and Min W
Abstract: The goal of few-shot image recognition (FSIR) is to identify novel categories with a small number of annotated samples by exploiting transferable knowledge from training data (base categories). Most current studies assume that the transferable knowledge can be well used to identify novel categories. However, such transferable capability may be impacted by the dataset bias, and this problem has rarely been investigated before. Besides, most of few-shot learning methods are biased to different datasets, which is also an important issue that needs to be investigated deeply. In this paper, we first investigate the impact of transferable capabilities learned from base categories. Specifically, we use the relevance to measure relationships between base categories and novel categories. Distributions of base categories are depicted via the instance density and category diversity. The FSIR model learns better transferable knowledge from relevant training data. In the relevant data, dense instances or diverse categories can further enrich the learned knowledge. Experimental results on different sub-datasets of Imagenet demonstrate category relevance, instance density and category diversity can depict transferable bias from distributions of base categories. Second, we investigate performance differences on different datasets from the aspects of dataset structures and different few-shot learning methods. Specifically, we introduce image complexity, intra-concept visual consistency, and inter-concept visual similarity to quantify characteristics of dataset structures. We use these quantitative characteristics and eight few-shot learning methods to analyze performance differences on multiple datasets. Based on the experimental analysis, some insightful observations are obtained from the perspective of both dataset structures and few-shot learning methods. We hope these observations are useful to guide future few-shot learning research on new datasets or tasks. 
Our data is available at http://123.57.42.89/dataset-bias/dataset-bias.html.
Published: 2023
Full Text: View/download PDF

21. Image Representations with Spatial Object-to-Object Relations for RGB-D Scene Recognition.

Author: Song X, Jiang S, Wang B, Chen C, and Chen G
Abstract: Scene recognition is challenging due to the intra-class diversity and inter-class similarity. Previous works recognize scenes either with global representations or with the intermediate representations of objects. In contrast, we investigate more discriminative image representations of object-to-object relations for scene recognition, which are based on (object, relation, object) triplets obtained with detection techniques. Particularly, two types of representations, including co-occurring frequency of object-to-object relation (denoted as COOR) and sequential representation of object-to-object relation (denoted as SOOR), are proposed to describe objects and their relative relations in different forms. COOR is represented as the intermediate representation of co-occurring frequency of objects and their relations, with a third-order tensor that can be fed to a scene classifier without further embedding. SOOR is represented in a more explicit and freer form that sequentially describes image contents with local captions. A sequence encoding model (e.g., recurrent neural network (RNN)) is implemented to encode SOOR into features for feeding the classifiers. In order to better capture the spatial information, the proposed COOR and SOOR are adapted to RGB-D data, where an RGB-D proposal fusion method is proposed for RGB-D object detection. With the proposed approaches COOR and SOOR, we obtain the state-of-the-art results of RGB-D scene recognition on SUN RGB-D and NYUD2 datasets.
Published: 2019
Full Text: View/download PDF

22. Learning Effective RGB-D Representations for Scene Recognition.

Author: Song X, Jiang S, Herranz L, and Chen C
Abstract: Deep convolutional networks (CNN) can achieve impressive results on RGB scene recognition thanks to large datasets such as Places. In contrast, RGB-D scene recognition is still underdeveloped, due to two limitations of RGB-D data we address in this paper. The first limitation is the lack of depth data for training deep learning models. Rather than fine tuning or transferring RGB-specific features, we address this limitation by proposing an architecture and a two-step training approach that directly learns effective depth-specific features using weak supervision via patches. The resulting RGB-D model also benefits from more complementary multimodal features. Another limitation is the short range of depth sensors (typically 0.5m to 5.5m), resulting in depth images not capturing distant objects in the scenes that RGB images can. We show that this limitation can be addressed by using RGB-D videos, where more comprehensive depth information is accumulated as the camera travels across the scenes. Focusing on this scenario, we introduce the ISIA RGB-D video dataset to evaluate RGB-D scene recognition with videos. Our video recognition architecture combines convolutional and recurrent neural networks (RNNs) that are trained in three steps with increasingly complex data to learn effective features (i.e. patches, frames and sequences). Our approach obtains state-of-the-art performances on RGB-D image (NYUD2 and SUN RGB-D) and video (ISIA RGB-D) scene recognition.
Published: 2018
Full Text: View/download PDF

22 results on '"Song, Xinhang"'

1. Automated Segmentation and Classification of Knee Synovitis Based on MRI Using Deep Learning

2. Molecular phylogeography of Hipposideros pratti in China.

3. Image captioning via semantic element embedding

4. A facile route to graphite-tungsten nitride and graphite-molybdenum nitride nanocomposites and their ORR performances

5. Category co-occurrence modeling for large scale scene recognition

6. Exploring the endangerment mechanisms of Hipposideros pomona based on molecular phylogeographic methods.

7. Relative image similarity learning with contextual information for Internet cross-media retrieval

8. Evaluation of Deep Learning-Based Automated Detection of Primary Spine Tumors on MRI Using the Turing Test.

9. Scene Recognition With Prototype-Agnostic Scene Layout.

10. Image Representations With Spatial Object-to-Object Relations for RGB-D Scene Recognition.

11. Spatio-Temporal Memory Attention for Image Captioning.

12. Learning Scene Attribute for Scene Recognition.

13. Multi-Scale Multi-Feature Context Modeling for Scene Recognition in the Semantic Manifold.

14. Inelastic interaction between dark solitons for fourth-order variable-coefficient nonlinear Schrödinger equation.

15. Synthesis of hollow porous ZnCo2O4 microspheres as high-performance oxygen reduction reaction electrocatalyst.

16. Geolocalized Modeling for Dish Recognition.

17. Composite Object Relation Modeling for Few-shot Scene Recognition_supp1-3321475.pdf

18. Composite Object Relation Modeling for Few-Shot Scene Recognition.

19. Multi-Object Navigation Using Potential Target Position Policy Function.

20. Dataset Bias in Few-Shot Image Recognition.

21. Image Representations with Spatial Object-to-Object Relations for RGB-D Scene Recognition.

22. Learning Effective RGB-D Representations for Scene Recognition.
