1,269 results
Search Results
2. Computing and analyzing decision boundaries from shortest path maps.
- Author
-
Sharma, Ritesh and Kallmann, Marcelo
- Subjects
- CIVILIAN evacuation; SCALAR field theory; EMERGENCY management; TOPOLOGICAL fields; DATA visualization
- Abstract
This paper proposes a methodology for computing, visualizing, and analyzing critical decision boundaries for the selection of shortest paths in a given environment. Decision boundaries are defined as the points in a map from which two or more different shortest paths exist towards a destination. This paper introduces the problem of visualizing their evolution, taking into account moving obstacles, moving goals, as well as multiple goals. The proposed visualizations enable analyzing which paths should be taken and at which departure times, such that a destination can be reached by the shortest possible path when taking into account a moving target or time-varying areas to be avoided. The proposed techniques are also applied to the analysis and improvement of exit placement in a given environment, in order to improve the evacuation flow in emergency situations. [Display omitted] • This research presents a unique method for detecting decision boundaries in a given environment, based on the analysis of the generator points of the Shortest Path Map (SPM) rather than employing traditional scalar field topological methods relying on cell neighborhood information, which can be affected by the representation resolution. • The proposed approach introduces tools and techniques to visualize the evolution of decision boundaries when considering dynamically-changing obstacles and targets, and to design exit placement to equalize the escape flow distribution. • This novel approach supports decision-making applications related to navigation and environment modeling in emergency evacuation planning. • By analyzing and visualizing SPM decision boundaries, the lengths of globally-optimal Euclidean shortest paths are taken into account, instead of the grid-based accumulated distances used in other approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2023
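The boundary definition above can be illustrated with a minimal sketch, assuming an obstacle-free grid where shortest-path length reduces to plain Euclidean distance (the paper's method instead analyzes generator points of the Shortest Path Map; the function name and tolerance here are illustrative assumptions):

```python
import numpy as np

# Hypothetical illustration, not the paper's algorithm: with no obstacles,
# the decision boundary between two goals g1 and g2 is the set of grid
# cells that are (nearly) equidistant from both, i.e. cells from which two
# different shortest paths of equal length exist.
def decision_boundary(shape, g1, g2, tol=0.5):
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d1 = np.hypot(ys - g1[0], xs - g1[1])   # distance field to goal 1
    d2 = np.hypot(ys - g2[0], xs - g2[1])   # distance field to goal 2
    return np.abs(d1 - d2) < tol            # near-equidistant cells

mask = decision_boundary((9, 9), (4, 1), (4, 7))
```

For two symmetric goals the boundary is the vertical bisector column; with real obstacles the equidistant set must be derived from the SPM itself, which is what the paper's generator-point analysis provides.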
3. GRSI Best Paper Award 2021.
- Subjects
- AWARDS
- Published
- 2022
4. GRSI Best Paper Award.
- Published
- 2021
5. A photogrammetry-based verification of assumptions applied in the interpretation of paper architecture
- Author
-
Shih, Naai-Jung and Tsai, Yu-Tun
- Subjects
- PHOTOGRAMMETRY; COMPUTER-aided design; COMPUTER-generated imagery
- Abstract
This research investigated the space compositions of Chernikhov’s 101 Architectural Fantasies via computer-aided simulation to interpret the relationships between architectural components and spatial organization. An algorithmic approach and a perception approach were tested. Traditional analysis emphasized the simulation of corresponding objects by perspective deconstruction methods, which might not show the exact spatial relationship between objects. This research adopted photogrammetry to investigate the non-orthogonal spatial construction of 3D objects in 2D pictures. Research results showed that the algorithmic approach may derive different degrees of angles of parallel or intersected objects, and that observers tend to be misled by the effect of “orthogonal assumption” in terms of their own visual experiences. This finding revealed that Chernikhov had created unreasonable descriptions of space. This result was verified by the existence of false parallel and orthogonal relationships between drawn building parts. Three tests were conducted. Observers used a reverse verification process to analyze three-dimensional objects re-built in simulation. The verification mirrored a two-way construction relationship between 2D perspectives and 3D models. [Copyright © Elsevier]
- Published
- 2002
6. Image deraining based on dual-channel component decomposition.
- Author
-
Lin, Xiao, Xu, Duojiu, Tan, Peiwen, Ma, Lizhuang, and Wang, Zhi-Jie
- Subjects
- IMAGE reconstruction; IMAGE processing; VISIBILITY
- Abstract
Image deraining aims to remove rain streaks from images and reduce the information loss in outdoor images caused by rain. As a fundamental task in image processing, image deraining not only enhances the visibility of images but also provides necessary image restoration for advanced vision tasks. Existing image deraining models mostly train end-to-end models by minimizing the difference between the model's output image and the rain-free ground truth. Although these methods have achieved significant results, they often perform poorly in the face of dense and changing rain streak scenes. In this paper, we propose a novel method, called the Dual-Channel Component Decomposition Network (DCD-Net). The basic idea of DCD-Net is to leverage the separability prior of rainy images, treating the rain-free background layer and the rain streak mask layer as two parallel component extraction tasks. To this end, it builds a dual-branch parallel network whose branches extract the rain-free background image and decouple the reconstruction information of the rain streak mask, respectively. It finally applies composite multi-level contrastive supervision to the outputs of this dual-branch parallel network, thereby achieving rain streak removal. Extensive experiments on various datasets demonstrate that the proposed model outperforms existing methods in deraining dense rain streak images. [Display omitted] • This paper proposes an image deraining method, called the Dual-Channel Component Decomposition Network (DCD-Net). • DCD-Net treats the rain-free background layer and the rain streak mask layer as two parallel component extraction tasks. • DCD-Net obtains competitive performance in deraining complex and dense rain streak images. [ABSTRACT FROM AUTHOR]
- Published
- 2023
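The separability prior that DCD-Net builds on can be sketched with stand-in branches (illustrative only; in the paper both branches are learned networks trained under multi-level contrastive supervision): a rainy image O is treated as a background layer B plus a rain-streak layer R, and two parallel extractors each recover one component.

```python
import numpy as np

# Stand-ins for the two branches (assumptions for illustration, not
# DCD-Net): a box-mean smoothing filter plays the background branch B,
# and the residual plays the rain-streak branch R, so O = B + R holds
# by construction.
def dual_channel_decompose(rainy, k=3):
    pad = k // 2
    padded = np.pad(rainy, pad, mode="edge")
    bg = np.empty_like(rainy, dtype=float)
    for i in range(rainy.shape[0]):
        for j in range(rainy.shape[1]):
            bg[i, j] = padded[i:i + k, j:j + k].mean()  # background estimate
    return bg, rainy - bg                               # (B, R)

O = np.random.default_rng(0).random((8, 8))
B, R = dual_channel_decompose(O)
```

The consistency O = B + R is what makes the two extraction tasks parallel and complementary; the learned version replaces both stand-ins with trained sub-networks.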
7. W2GAN: Importance Weight and Wavelet feature guided Image-to-Image translation under limited data.
- Author
-
Yang, Qiuxia, Pu, Yuanyuan, Zhao, Zhengpeng, Xu, Dan, and Li, Siqi
- Subjects
- GENERATIVE adversarial networks; MACHINE translating
- Abstract
Image-to-Image (I2I) translation methods based on generative adversarial networks (GANs) require large amounts of training data, without which they suffer from over-fitting and training divergence, and the trained models are sub-optimal. In addition, it is very difficult for such models to synthesize high-frequency signals, deteriorating the synthesis quality. To address these issues, this paper proposes W2GAN, which introduces the ideas of the Importance Weight and Wavelet transformation to achieve I2I translation trained on limited data. Concretely, this paper first alleviates over-fitting and training divergence through an adversarial loss with importance weights, which increases the influence of high-quality generated images during generator training, thus strengthening the generator's ability to deceive the discriminator. Then, the high-frequency features of the wavelet transformation are applied to the decoder, and wavelet-AdaIN normalization is proposed to prevent a deficiency of high-frequency information; it adaptively integrates high-frequency statistical characteristics from generated features with real-image high-frequency information. Qualitative and quantitative results on the AFHQ and CelebA-HQ datasets demonstrate the merits of W2GAN. Notably, this paper achieves state-of-the-art FID and KID on the AFHQ and CelebA-HQ datasets. [Display omitted] • In this paper, GANs are trained under limited data. This paper overcomes discriminator over-fitting during training, which leads to model divergence and degraded results, thereby stabilizing the training process and achieving higher-quality results. From the qualitative and quantitative perspectives, this paper achieves competitive results in current research. • This paper introduces a learnable importance weight into the adversarial loss so that high-quality images exert greater influence during generator training. This relieves training divergence and over-fitting. • This paper proposes a Wavelet-AdaIN Normalization to learn high-frequency features, which adaptively integrates high-frequency statistical characteristics from generated features and real-image high-frequency information. It encourages the generator to produce precise high-frequency signals with fine details. [ABSTRACT FROM AUTHOR]
- Published
- 2023
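How an importance-weighted adversarial loss can favor high-quality generations is sketched below. The paper's weight is learnable and its form is not given in this abstract, so the softmax-over-discriminator-scores weighting here is an assumption:

```python
import numpy as np

# Hypothetical importance weighting (not W2GAN's exact learnable weight):
# fake samples that the discriminator already scores highly receive larger
# weights, so the convincing generations dominate the generator update.
def weighted_gen_loss(d_scores_fake):
    w = np.exp(d_scores_fake - d_scores_fake.max())
    w = w / w.sum()                               # normalized importance weights
    probs = 1.0 / (1.0 + np.exp(-d_scores_fake))  # D's "real" probability
    return -(w * np.log(probs + 1e-12)).sum()     # weighted non-saturating loss
```

A batch the discriminator already finds convincing yields a smaller loss, and within a mixed batch the convincing samples carry most of the gradient weight.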
8. LBARNet: Lightweight bilateral asymmetric residual network for real-time semantic segmentation.
- Author
-
Hu, Xuegang and Zhou, Baoman
- Subjects
- DATA mining; COMPUTER vision; IMAGE segmentation; VISUAL fields; PIXELS; LEARNING modules
- Abstract
Real-time semantic segmentation, as a key technique for scene understanding, has been an important research topic in the field of computer vision in recent years. However, existing models are unable to achieve good segmentation accuracy on mobile devices due to their huge computational overhead, which makes it difficult to meet actual industrial requirements. To address the problems faced by current semantic segmentation tasks, this paper proposes a lightweight bilateral asymmetric residual network (LBARNet) for real-time semantic segmentation. First, we propose the bilateral asymmetric residual (BAR) module. This module learns multi-scale feature representations with strong semantic information at different stages of the semantic information extraction branch, thus improving pixel classification performance. Secondly, the spatial information extraction (SIE) module is constructed in the spatial detail extraction branch to capture multi-level local features of the shallow network to compensate for the geometric information lost in the downsampling stage. At the same time, we design the attention mechanism perception (AMP) module in the skip-connection part to enhance the contextual representation. Finally, we design the dual branch feature fusion (DBF) module to exploit the correspondence between higher-order and lower-order features to fuse spatial and semantic information appropriately. The experimental results show that LBARNet, without any pre-training or pre-processing and using only 0.6M parameters, achieves 70.8% mIoU and 67.2% mIoU on the Cityscapes and CamVid datasets, respectively. LBARNet maintains high segmentation accuracy while using fewer parameters than most existing state-of-the-art models.
[Display omitted] • This paper proposes a Bilateral Asymmetric Residual (BAR) module, a Spatial Information Extraction (SIE) module, an Attention Mechanism Perception (AMP) module and a Dual Branch Feature Fusion (DBF) module. • A Lightweight Bilateral Asymmetric Residual Network (LBARNet) for real-time image semantic segmentation is proposed in this article. • The experimental results show that LBARNet achieves 70.8% and 67.2% segmentation accuracy on two challenging datasets (Cityscapes and CamVid) using only 0.6M parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2023
9. Deep learning of curvature features for shape completion.
- Author
-
Hernández-Bautista, Marina and Melero, Francisco Javier
- Subjects
- DEEP learning; CURVATURE; GEOMETRIC surfaces; SURFACE reconstruction; INPAINTING; PARAMETERIZATION; SURFACE geometry
- Abstract
The paper presents a novel solution to the issue of incomplete regions in 3D meshes obtained through digitization. Traditional methods for estimating the surface of missing geometry and topology often yield unrealistic outcomes for intricate surfaces. To overcome this limitation, the paper proposes a neural network-based approach that generates points in areas where geometric information is lacking. The method employs 2D inpainting techniques on color images obtained from the original mesh parameterization and curvature values. The network used in this approach can reconstruct the curvature image, which then serves as a reference for generating a polygonal surface that closely resembles the predicted one. The paper's experiments show that the proposed method effectively fills complex holes in 3D surfaces with a high degree of naturalness and detail. This paper improves on the previous work with a more in-depth explanation of the different stages of the approach as well as an extended results section with exhaustive experiments. [Display omitted] • We perform 3D surface reconstructions using generative inpainting techniques. • 2D representation of a 3D surface geometry based on its curvature. • Application of a general-purpose neural network for inpainting. • Our approach does not require a dataset or training time. • Results outperform the state of the art in quality and naturalness of the reconstructions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
10. A systematic review on open-set segmentation.
- Author
-
Nunes, Ian, Laranjeira, Camila, Oliveira, Hugo, and dos Santos, Jefersson A.
- Subjects
- COMPUTER vision; REMOTE sensing; RESEARCH personnel; AUTONOMOUS vehicles; VISUAL learning
- Abstract
Open-set semantic segmentation remains a challenging task, owing not only to the inherent challenges of pixel-wise classification but also to the precise segmentation of categories not seen during training. The pursuit of this task is rapidly growing in the Computer Vision community, urging the need to organize the literature. In this paper, we extend our previous work by conducting a more comprehensive systematic mapping of the open-set segmentation literature between January 2001 and January 2023 and proposing a novel taxonomy. Our goal is to provide a broad understanding of current trends for the open-set semantic segmentation (OSS) task defined by existing approaches that may influence future methods. By characterizing methodologies in terms of open-set identification strategies, data inputs, and other relevant aspects, we present a structured view of how researchers are advancing in the field of open-set semantic segmentation. To the best of the authors' knowledge, this is the first systematic review of OSS methods. Moreover, we apply the proposed taxonomy to selected methods for open-set recognition, outlining important similarities and differences with such a closely related field. [Display omitted] • Systematic review of papers related to open-set semantic segmentation over the past 20 years. • The proposed taxonomy aims to organize the literature on open-set segmentation. • Seminal papers on open-set recognition are classified under the proposed taxonomy. • Applications like autonomous driving and remote sensing were found to commonly resort to the open-set strategy. • Methods tackling the open world are becoming more common. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. LFPeers: Temporal similarity search and result exploration.
- Author
-
Sachdeva, Madhav, Burmeister, Jan, Kohlhammer, Jörn, and Bernard, Jürgen
- Subjects
- CONCEPT learning; VISUAL analytics; BEHAVIORAL assessment; COVID-19 pandemic; PEERS; NATURAL gas prospecting
- Abstract
In this paper, we introduce a general concept for the analysis of temporal and multivariate data and the system LFPeers that applies this concept to temporal similarity search and result exploration. The conceptual workflow divides the analysis into two phases: a search phase to find the most similar objects to a query object before a time point t₀ in the temporal data, and an exploration phase to analyze and contextualize this subset of objects after t₀. LFPeers enables users to search for peers through interactive similarity search and filtering, explore interesting behavior of this peer group, and learn from peers through the assessment of diverging behaviors. We present the conceptual workflow to learn from peers and the LFPeers system with novel interfaces for search and exploration in temporal and multivariate data. An earlier workshop publication for LFPeers included a usage scenario targeting epidemiologists and the public who want to learn from the Covid-19 pandemic and distinguish successful from ineffective measures. In this extended paper, we now show how our concept is generalized and applied by domain experts in two case studies, including a novel case on stocks data. Finally, we reflect on the new state of development and on the insights gained by the experts in the case studies on the search and exploration of temporal data to learn from peers. [Display omitted] • Conceptual framework for temporal and multimodal similarity search and exploration. • Visual Analytics system for cause–effect analysis on temporal data. • Interactive user-defined search and exploration of data objects by similarity. [ABSTRACT FROM AUTHOR]
- Published
- 2023
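The two-phase workflow (search before t₀, explore after t₀) reduces to a few lines in its simplest form; the function name, the Euclidean distance choice, and k are illustrative assumptions, not LFPeers internals:

```python
import numpy as np

# Illustrative search phase: rank candidate time series by distance to
# the query restricted to the window before t0. The exploration phase
# would then inspect what the top-k peers do after t0.
def find_peers(query, candidates, t0, k=2):
    dists = np.linalg.norm(candidates[:, :t0] - query[:t0], axis=1)
    return np.argsort(dists)[:k]               # indices of the k nearest peers

series = np.stack([np.arange(10.0) + 5.0,      # a distant candidate
                   np.arange(10.0),            # identical to the query
                   np.arange(10.0) + 1.0])     # a close candidate
peers = find_peers(np.arange(10.0), series, t0=6)
```

In the system, this ranking is interactive (users filter and weight attributes) rather than a fixed distance over a single series.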
12. Freeform digital ink annotations in electronic documents: A systematic mapping study.
- Author
-
Sutherland, Craig J., Luxton-Reilly, Andrew, and Plimmer, Beryl
- Subjects
- ELECTRONIC paper; ELECTRONIC records; ANNOTATIONS; RAPID prototyping; INFORMATION display systems
- Abstract
A variety of different approaches have been used to add digital ink annotations to text-based documents. While the majority of research in this field has focused on annotation support for static documents, a small number of studies have investigated support for documents in which the underlying content is changed. Although the approaches used to annotate static documents have been relatively successful, the annotation of dynamic text documents poses significant challenges which remain largely unsolved. However, it is difficult to clearly identify the successful techniques and the remaining challenges since there has not yet been a comprehensive review of digital ink annotation research. This paper reports the results of a systematic mapping study of existing work, and presents a taxonomy categorizing digital ink annotation research. [ABSTRACT FROM AUTHOR]
- Published
- 2016
13. Collaborative use of mobile augmented reality with paper maps
- Author
-
Morrison, Ann, Mulloni, Alessandro, Lemmelä, Saija, Oulasvirta, Antti, Jacucci, Giulio, Peltonen, Peter, Schmalstieg, Dieter, and Regenbrecht, Holger
- Subjects
- AUGMENTED reality; GAME theory; MOBILE communication systems; COMPUTER graphics; COLLECTIVE action
- Abstract
The popularity of augmented reality (AR) applications on mobile devices is increasing, but there is as yet little research on their use in real settings. We review data from two pioneering field trials where MapLens, a magic lens that augments paper-based city maps, was used in small-group collaborative tasks. The first study compared MapLens to a digital version akin to Google Maps; the second looked at using one shared mobile device vs. using multiple devices. The studies find place-making and the use of artefacts to communicate and establish common ground as predominant modes of interaction in AR-mediated collaboration, with users working on tasks together despite not needing to. [Copyright © Elsevier]
- Published
- 2011
14. COMPUTERS & GRAPHICS BEST PAPER AWARD
- Published
- 2006
15. Computers & Graphics best paper award (2004)
- Published
- 2005
16. Frequency-aware network for low-light image enhancement.
- Author
-
Shang, Kai, Shao, Mingwen, Qiao, Yuanjian, and Liu, Huan
- Subjects
- IMAGE intensifiers; IMAGE enhancement (Imaging systems); IMAGE reconstruction; FREQUENCY-domain analysis
- Abstract
Low-light images often suffer from severe visual degradation, affecting both human perception and high-level computer vision tasks. Most existing methods process images in the spatial domain, making it challenging to simultaneously improve brightness while suppressing noise. In this paper, we present a novel perspective to enhance images based on frequency domain characteristics. Specifically, we reveal that the low-frequency components are closely related to luminance and color, whereas the high-frequency components are not. Based on this observation, we propose the Frequency-aware Network (FaNet) for low-light image enhancement. By selectively adjusting low-frequency components, FaNet preserves more high-frequency details while achieving low-light image enhancement. Additionally, we employ a multi-scale framework and selective fusion for effective feature learning and image reconstruction. Experimental results demonstrate the superiority of the proposed method. [Display omitted] • We reveal that the luminance is closely related to low-frequency components. • We design a frequency-aware network to utilize frequency domain features. • A multi-scale framework and selective fusion is proposed for feature learning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
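The observation that luminance lives in the low-frequency components can be checked with a small FFT experiment (a sketch of the frequency-domain perspective, not FaNet, which learns the adjustment; function name and radius are assumptions):

```python
import numpy as np

# Scale only the FFT coefficients near DC: this raises overall luminance
# while leaving high-frequency detail (edges, texture, noise) mostly
# untouched, which is the selective adjustment the abstract describes.
def brighten_low_freq(img, gain=1.5, radius=2):
    F = np.fft.fftshift(np.fft.fft2(img))       # DC moved to the center
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    low = np.hypot(ys - h // 2, xs - w // 2) <= radius
    F[low] *= gain                              # boost low frequencies only
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

out = brighten_low_freq(np.full((16, 16), 0.2))  # a uniformly dark "image"
```

For a constant dark image all energy sits at DC, so the output is uniformly brighter by exactly the gain; on real images the same operation raises brightness without amplifying high-frequency noise.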
17. Mixed reality human teleoperation with device-agnostic remote ultrasound: Communication and user interaction.
- Author
-
Black, David, Nogami, Mika, and Salcudean, Septimiu
- Subjects
- MIXED reality; REMOTE control; ULTRASONIC imaging; TELECOMMUNICATION systems; TELEROBOTICS; HUMAN beings
- Abstract
For many applications, remote guidance and telerobotics provide great advantages. For example, tele-ultrasound can bring much-needed expert healthcare to isolated communities. However, existing tele-guidance methods have serious limitations including either low precision for video conference-based systems, or high complexity and cost for telerobotics. A new concept called human teleoperation leverages mixed reality, haptics, and high-speed communication to provide tele-guidance that gives an expert nearly-direct remote control without requiring a robot. This paper provides an overview of the human teleoperation concept and its application to tele-ultrasound. The concept and its impact are discussed. A new approach to remote streaming and control of point-of-care ultrasound systems independent of their manufacturer is described, as is a high-speed communication system for the HoloLens 2 that is compatible with ResearchMode API sensor stream access. Details of these systems are shown in supplementary video demonstrations. Novel interaction methods enabled by HoloLens 2-based pose tracking are also introduced and tests of the communication and user interaction are presented. The results show continued improvement of the system compared to previous work in instrumentation, HCI, and communication. The system thus has good potential for tele-ultrasound, as well as possible other applications of human teleoperation including remote maintenance, inspection, and training. The remote ultrasound streaming and control application is made available open source. [Display omitted] • System improvements to human teleoperation demonstrate its feasibility. • Other devices such as the Nreal Light can be used for implementation. • New device-agnostic remote ultrasound streaming and control demonstrated. • HoloLens pose tracking enables human teleoperation with limited compute resources. • HoloLens-based communication system provides effective sensor data streaming. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Voting-based patch sequence autoregression network for adaptive point cloud completion.
- Author
-
Wu, Hang and Miao, Yubin
- Subjects
- POINT cloud; PETRI nets; NETWORK performance
- Abstract
Point cloud completion aims to estimate the whole shapes of objects from their partial scans, and one of the main obstacles preventing current methods from being applied in real-world scenarios is the variety of structural losses in real-scanned objects, which can hardly be fully included and reflected by the training samples. In this paper, we introduce the Patch Sequence Autoregression Network (PSA-Net), a learning-based method that can be trained without partial point clouds in the dataset and is inherently adaptable to input scans with different levels of shape incompleteness: it makes restoring the unseen parts of objects equivalent to predicting the missing tokens in local patch embedding sequences, and such prediction can start from any initial state. Specifically, we first introduce a Sequential Patch AutoEncoder that reconstructs complete point clouds from quantized patch feature sequences. Second, we establish a Mixed Patch Autoregression pipeline that can flexibly infer the whole sequence from any number of known tokens at any positions. Third, we propose a Voting-Based Mapping module that makes input points softly vote for their possible related tokens in sequences based on their local areas, which transforms partial point clouds into masked sequences at test time. Quantitative and qualitative evaluations on two synthetic and four real-world datasets illustrate the competitive performance of our network compared with existing approaches. [Display omitted] • A Sequential Patch AutoEncoder for shape generation from quantized feature sequences. • A Mixed Patch Autoregression pipeline for token prediction from any initial state. • A Voting-based Mapping module for transformation from partial shapes to sequences. • Competitive performance on two synthetic and four real-world datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Developing an immersive virtual farm simulation for engaging and effective public education about the dairy industry.
- Author
-
Nguyen, Anh, Francis, Michael, Windfeld, Emma, Lhermie, Guillaume, and Kim, Kangsoo
- Subjects
- DAIRY industry; PUBLIC education; DAIRY farms; AGRICULTURAL education; DAIRY processing
- Abstract
Growing public interest in understanding the origins and production methods of dairy products, driven by concerns related to environmental impact, local sourcing, and ethics, highlights an important trend. Nevertheless, a knowledge-trust gap persists between consumers and the dairy industry. Addressing this gap, in this paper, we developed an immersive virtual farm simulation to provide realistic on-farm experiences to the public. Within the virtual farm, users can explore various sites where dairy cows are raised and gain insights into dairy production processes using a head-mounted display (HMD). This simulation was demonstrated at local libraries, involving 48 public participants. We collected and analyzed participants' feedback on various aspects, including usability and their overall perceptions, to assess the simulation's effectiveness as an agricultural education tool. We investigated the impact of the virtual experience on participants' perceived knowledge gain and their awareness of the dairy industry. The results indicate that our dairy farm simulation was positively received as an effective tool for public education. Emphasizing the potential of virtual reality (VR) simulations in agricultural education and the industry, we discuss our key findings and future plans. [Display omitted] • Development process to build a realistic immersive simulation of a dairy farm for public education purposes. • Findings on the usability and user perception of the immersive experience for education purposes. • The system is an effective and useful tool for learning and the provision of information about the dairy industry. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Linear time manageable edge-aware filtering on complementary tree structures.
- Author
-
Bu, Penghui, Wang, Hang, Yang, Tao, and Zhao, Hong
- Subjects
- TIME complexity; IMAGE denoising; GEODESIC distance; SPANNING trees; TREES
- Abstract
Typical non-local edge-aware filtering methods build long-range connections by deriving a minimum spanning tree (MST) from the input image. Each pixel on the MST connects to only a subset of the pixels in its 8-connected neighborhood, resulting in piecewise-constant output with fake edges among sub-trees, owing to the unbalanced information propagation along the eight directions. In this paper, we propose two complementary spatial trees to incorporate information from the entire image. The structure of each tree depends on the spatial relationships of neighboring pixels. The distances between any two pixels in both the spatial and intensity domains are the shortest distances on each tree. We introduce an efficient algorithm to recursively compute the output and the normalization constant on each tree with linear time complexity. For each pixel, we first calculate the outputs from eight subtrees and then fuse them to obtain the result on each tree structure. The final filtering output of our method is the weighted average of the results from the two complementary spatial trees. Moreover, we present a distance mapping scheme to adjust the intensity distance between neighboring pixels, enabling our method to filter out a manageable degree of low-amplitude structures while sharpening major edges. Extensive experiments in graphic applications, such as image denoising, JPEG artifact removal, tone mapping, detail enhancement, and colorization, demonstrate the effectiveness and versatility of our method. [Display omitted] • Novel complementary trees to estimate the geodesic distance between any two pixels. • Efficient algorithms with linear time complexity to compute the weighted average of all pixels in the input image. • A distance mapping scheme in intensity space to manageably filter out low-amplitude structures. • Quantitative evaluation and qualitative comparison on various graphic applications to show the effectiveness and versatility of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2024
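A 1D chain is the simplest tree, so the linear-time recursive aggregation described above can be illustrated there (an analogy, not the paper's 2D complementary-tree construction): edge weights decay with the intensity jump across each edge, so a pixel's influence decays with geodesic distance, and one forward plus one backward pass yields the exact normalized weighted average of all pixels.

```python
import numpy as np

# 1D analogue of tree-based edge-aware filtering: the numerator and the
# normalization constant are both accumulated recursively in O(n).
def edge_aware_filter_1d(x, sigma=0.5):
    x = np.asarray(x, dtype=float)
    a = np.exp(-np.abs(np.diff(x)) / sigma)  # per-edge decay weights
    n = len(x)
    L, Ln = x.copy(), np.ones(n)             # forward accumulators
    for i in range(1, n):
        L[i] += a[i - 1] * L[i - 1]
        Ln[i] += a[i - 1] * Ln[i - 1]
    R, Rn = x.copy(), np.ones(n)             # backward accumulators
    for i in range(n - 2, -1, -1):
        R[i] += a[i] * R[i + 1]
        Rn[i] += a[i] * Rn[i + 1]
    return (L + R - x) / (Ln + Rn - 1.0)     # each pixel counted once
```

A constant signal is reproduced exactly, while a sharp step is preserved because the near-zero weight across the step blocks propagation between the two sides, which is the edge-aware behavior the paper generalizes to 2D trees.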
21. Human image animation via semantic guidance.
- Author
-
Guo, Congwei, Ke, Yongken, Wan, Zhenkai, Jia, Minrui, Wang, Kai, and Yang, Shuai
- Subjects
- PARSING (Computer grammar); HUMAN body; HUMAN beings
- Abstract
Image animation creates visually compelling effects by animating still source images according to driving videos. Recent work performs animation on arbitrary objects using unsupervised methods and can relatively robustly perform motion transfer on human bodies. However, the complex representation of motion and unknown correspondence between human bodies often lead to issues such as distorted limbs and missing semantics, which make human animation challenging. In this paper, we propose a semantically guided, unsupervised method of motion transfer, which uses semantic information to model motion and appearance. Specifically, we use a pre-trained human parsing network to encode the rich and diverse foreground semantic information, thus generating fine details. Secondly, we use a cross-modal attention layer to learn the semantic region's correspondence between human bodies to guide the network in selecting appropriate input features, prompting the network to generate accurate results. Experiments demonstrate that our method outperforms state-of-the-art methods in motion-related metrics, while effectively addressing the problems of semantic missing and unclear limb structures prevalent in human motion transfer. These improvements can facilitate its applications in various fields, such as education and entertainment. [Display omitted] • Proposes a novel framework for human image animation using semantic features and cross-modal attention. • Introduces semantic segmentation features to represent motion and identity information. • Employs a cross-modal attention mechanism to establish correspondences between semantic regions. • Achieves state-of-the-art performance on complex human motions like TaiChi poses. • Reduces issues of missing semantics and distorted limbs common in human animation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. α-Curves: Extended log-aesthetic curves with variable shape parameter.
- Author
-
Tsuchie, Shoichi and Yoshida, Norimasa
- Subjects
- *
CURVES , *AUTOMOBILE industry , *CURVATURE - Abstract
This paper proposes a novel curve called the α-curve, which is specified by a variable shape parameter α and can be applied in curve reconstruction involving curve fairing, fitting, and segmentation. In contrast to classical log-aesthetic curves in which α is constant, this study contributes a novel formulation of α that varies monotonically within a specified segment. By introducing the variable α, the logarithmic curvature graph (LCG) becomes piecewise linear and parabolic, or any other function if necessary. Consequently, in comparison to conventional log-aesthetic curves, whose rigidity limits their range of use, α-curves have various LCGs and flexible representations, thereby opening up many practical applications. Experimental results and comparative studies on curves created by CAD experts in the automotive industry demonstrate the theoretical and practical validity of α-curves. [Display omitted] • α-Curves are proposed by extending the mathematical framework of log-aesthetic curves. • An α-curve is specified by a variable shape parameter. • α-Curves have various log-curvature graphs (LCGs) and flexible representations. • α-Curves are applied to a new fairing method, avoiding conventional issues of side effects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Local geometry-perceptive mesh convolution with multi-ring receptive field.
- Author
-
Liu, Shanghuan, Chen, Xunhao, Gai, Shaoyan, and Da, Feipeng
- Subjects
- *
COMPUTER vision - Abstract
Learning 3D mesh representations is necessary for many computer vision and graphics tasks. Recently, some works have studied convolution methods for directly processing input meshes. However, these methods are usually weak at extracting local geometry information because of disadvantages such as isotropic filters, neglect of mesh topology, and small convolution fields. In this paper, we introduce a local geometry-perceptive mesh convolution, which attends to irregular mesh structures to efficiently capture geometry features in a multi-ring receptive field. Specifically, we define dynamic neighbor-attention weights for each template node, used in multiple attention aggregation operations to obtain local mesh change information of different vertices in the multi-ring field. After each aggregation, a shared anisotropic filter maps the concatenation of each new vertex and its neighbors to extract geometry features of the current ring. Then, complete local geometry features of each vertex in its large local field are obtained by summing the mapped results of each aggregation. Moreover, the position features of each vertex are added to its local geometry features to form the final representation vector of the vertex. We demonstrate the proposed mesh convolution method's strong ability to model 3D mesh shapes. [Display omitted] • Utilizing local irregular mesh structures, dynamic neighbor-attention weights of the template's nodes perceive local mesh change information of different vertices. • Multiple attention aggregations and shared anisotropic filters help extract geometry features of large local receptive fields with multiple rings. • With the proposed method, our deep 3D morphable models (3DMMs) are smaller than previous models and achieve state-of-the-art performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. GMM-ICQ: A GMM vertex-optimization-based implicitly-connected quadrilateral format for 3D mesh storage.
- Author
-
Lin, Dayong, Zhao, Chunhui, Tian, Qihang, Xu, Yunfei, Wang, Ruilin, and Qu, Zonghua
- Subjects
- *
GAUSSIAN mixture models , *MICROSOFT Surface (Computer) , *QUADRILATERALS , *COMPUTER graphics , *STORAGE - Abstract
3D meshes are commonly utilized and may be considered to be the most popular surface representation in computer graphics due to their simplicity, efficiency and flexibility. However, the explicit storage of mesh vertices and connectivity, as in widely-used PLY and OBJ file formats, leads to substantial memory consumption. This, in turn, directly affects the processing and transmission in downstream applications. Though mesh simplification and mesh compression are common strategies to lessen memory consumption, they exhibit inherent limitations either in maintaining a balance between accuracy, efficiency, memory usage and mesh quality, or breaking the simplicity of explicit storage and struggling with optimizing the trade-off between compression performance and computational resource consumption. To overcome these limitations, inspired by the Gaussian Mixture Model (GMM), this paper proposes a GMM vertex-optimization-based implicitly-connected quadrilateral format for 3D mesh storage, named GMM-ICQ. Extensive qualitative and quantitative evaluations demonstrate that the GMM-ICQ format achieves efficient compression by retaining only a small amount of vertex information, while preserving sharp features and maintaining relatively high mesh quality. It also exhibits a certain degree of robustness in the presence of noise interference. Furthermore, benefiting from the inherent grid-based connectivity, the GMM-ICQ format maintains the simplicity of explicit storage and can be implemented as a progressive variant without incurring additional computational overhead. [Display omitted] • We present a GMM vertex-optimization-based implicitly-connected quadrilateral format for 3D mesh storage. • Simultaneously balances accuracy, efficiency, memory usage, and mesh quality. • Preserves the simplicity of explicit storage (such as PLY and OBJ). • No additional computational overhead needed for progressive variant implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
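For readers unfamiliar with the Gaussian Mixture Model that inspires GMM-ICQ above, the core expectation–maximization fit can be sketched in a few lines. This is a generic 1-D NumPy illustration of GMM fitting, not the paper's vertex-optimization scheme; the component count, iteration budget, and quantile-based initialization are arbitrary assumptions made for the sketch.

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=60):
    """Minimal EM for a 1-D Gaussian mixture: alternate soft assignment
    (E-step) with weighted mean/variance re-estimation (M-step)."""
    # Deterministic init: spread the means over the data's quantile range.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, np.var(x))
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        n = r.sum(axis=0)
        pi = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-9
    return mu, var, pi
```

On two well-separated clusters the recovered means land on the cluster centers within a few hundredths; GMM-ICQ applies the same family of models to optimize stored vertex positions rather than raw 1-D samples.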
25. RepDehazeNet: Dual subnets image dehazing network based on structural re-parameterization.
- Author
-
Luo, Xiaozhong, Zhong, Han, Lu, Junjie, Meng, Chen, and Han, Xu
- Subjects
- *
DECODING algorithms , *HAZE , *DEEP learning , *PARAMETERIZATION - Abstract
In recent times, there has been notable and swift advancement in the field of image dehazing. Several deep learning techniques have demonstrated remarkable proficiency in resolving homogeneous dehazing issues. Nonetheless, current dehazing approaches are generally formulated to deal with homogeneous haze, an assumption that is often undermined in real-world scenarios due to uncertain haze dispersion. In this paper, we propose a dehazing model named RepDehazeNet by combining a structurally reparameterized Encoder-Decoder subnet and a Full-Resolution Attention subnet. To be specific, the structural reparameterization idea is introduced into the encoder–decoder subnet to strengthen the feature extraction of dehazed images and improve the feature extraction speed. RepDehazeNet is compared with seven SOTA models on different datasets in terms of PSNR, SSIM, parameter quantity, and inference time. Compared to the DW-GAN model, the proposed RepDehazeNet model reduces the number of parameters by 2.7 million and improves the inference speed by 90.3%, while achieving a 0.5 dB higher PSNR on the NH-Haze2021 dataset. The experimental results demonstrate that the proposed RepDehazeNet model can effectively improve the real-time performance and accuracy of dehazing for both synthesized and nonhomogeneous haze images. [Display omitted] • Structural reparameterization dehazenet: outstanding performance, faster speed. • Replacing Tanh with ReLU leads to better results. • Transfer learning addresses the problem of insufficient samples. • The dual-subnet method proves highly effective on datasets of different scales. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. 3D face reconstruction from a single image based on hybrid-level contextual information with weak supervision.
- Author
-
Liu, Yang, Ran, Teng, Yuan, Liang, Lv, Kai, and Zheng, Guoquan
- Subjects
- *
TRANSFORMER models , *SUPERVISION - Abstract
Currently, deep learning-based 3D face reconstruction methods have shown promising results. However, they ignore the contextual information of the face, which is a topologically unified entirety. This paper proposes a 3D face reconstruction approach based on hybrid-level contextual information. Firstly, we suggest a regression network with contextual modeling capability at the feature level, PPR-CNet, which adopts a preferential parameter regression to regress the 3DMM parameters dynamically based on their various impacts on the reconstructed 3D face. Furthermore, we design a contextual landmark loss to constrain the face geometry at the landmark level. We introduce a differentiable renderer combined with multiple loss functions for weakly-supervised training. Quantitative experiments on two benchmarks show our method outperforms several SOTA methods. Extensive qualitative experiments indicate that our method performs efficiently in realism, facial proportion, and occlusion. [Display omitted] • We propose an approach to reconstruct a 3D face from a single image based on hybrid-level contextual information. • We propose a regression network, PPR-CNet, with contextual modeling capability, which regresses 3DMM parameters dynamically. • We design a contextual landmark loss to constrain face geometry employing landmark contextual information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Computers & Graphics best paper award (2003)
- Published
- 2004
- Full Text
- View/download PDF
28. Lightweight fully connected network-based fast CU size decision for video-based point cloud compression.
- Author
-
Que, Shicheng and Li, Yue
- Subjects
- *
VIDEO coding , *POINT cloud , *FEATURE extraction , *DECISION making - Abstract
Video-based point cloud compression (V-PCC) utilizes high efficiency video coding (HEVC) to compress geometry and attribute videos generated from dynamic point cloud projection. However, the HEVC exhaustive coding unit (CU) size decision process is complex and hinders the real-time application of V-PCC. To reduce the coding complexity of V-PCC, this paper proposes a method that combines hand-crafted features and a lightweight neural network to accurately predict the best CU partition in advance. First, we extract hand-crafted features, including direct features (DFs) and an indirect feature (IF), as mixed features. DFs are simple and require no additional calculation, while the IF is obtained indirectly by transforming the global and local distortions of the CU extracted before the size decision is determined. Second, we propose a lightweight fully connected network (LFCN) as the backbone network; the two feature types are used as inputs to the LFCN to predict whether the CU should be split into sub-CUs, and the LFCN can be fully integrated into the encoder with only about 1.58 KB of additional parameters. Experimental results show that the proposed method reduces coding complexity by an average of 51.2% while Luma's BD-TotalRate only increases by 0.1% on average under the All Intra (AI) configuration. [Display omitted] • A lightweight neural network-based fast coding method is proposed for V-PCC. • Direct features and an indirect feature are jointly extracted for fast CU partitioning. • The proposed method can be effectively used in I frames and P frames. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
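The scale of the LFCN described in the entry above can be illustrated with a toy forward pass. The layer widths and the eight-feature input below are assumptions chosen only to land near the stated ~1.58 KB parameter budget (8·32 + 32 + 32 + 1 = 321 weights, roughly 1.3 KB at float32); the paper's actual features and topology are not reproduced here.

```python
import numpy as np

def init_lfcn(n_features=8, hidden=32, seed=0):
    """Illustrative two-layer parameter set: 321 weights total,
    i.e. ~1.3 KB at float32 -- the same order as the paper's 1.58 KB."""
    rng = np.random.default_rng(seed)
    return (rng.normal(0, 0.1, (n_features, hidden)), np.zeros(hidden),
            rng.normal(0, 0.1, (hidden, 1)), np.zeros(1))

def predict_split(features, params):
    """Forward pass: P(split) for one CU from its hand-crafted features."""
    W1, b1, W2, b2 = params
    h = np.maximum(features @ W1 + b1, 0.0)        # ReLU hidden layer
    logit = (h @ W2 + b2)[0]
    return 1.0 / (1.0 + np.exp(-logit))            # sigmoid -> probability
```

In a real encoder this prediction replaces the exhaustive rate-distortion search: the CU is split only when the predicted probability crosses a threshold.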
29. Self-report user interfaces for patients with Rheumatic and Musculoskeletal Diseases: App review and usability experiments with mobile user interface components.
- Author
-
Nunes, Francisco, Rato Grego, Petra, Araújo, Ricardo, and Silva, Paula Alexandra
- Subjects
- *
RHEUMATISM , *USER interfaces , *MUSCULOSKELETAL system diseases , *PATIENT reported outcome measures , *SELF-evaluation , *IPHONE (Smartphone) - Abstract
Rheumatic and Musculoskeletal Diseases (RMDs) affect 120 million Europeans and are responsible for joint inflammation, stiffness, pain, and fatigue. Patient-Reported Outcome Measures (PROMs), essential to diagnosis and treatment adjustments, are expected to revolutionise rheumatology care if mobile apps reach clinical practice. However, patients often experience finger dexterity issues that can hinder their interaction with mobile apps. This paper investigates the interaction of patients with RMDs with mobile apps for self-report. We started by reviewing existing iPhone and Android apps for RMDs, to identify common user interface (UI) components, and conducted usability experiments with 20 patients with RMDs to record their performance. The usability experiments showed that in-line selectors are the best-performing UI component and that column selectors are considered the most usable by patients. Sliders perform worse than in-line selectors, with significant differences. Results also showed little difference between test conditions aligned with mobile UI design guidelines and those that provided larger or more spaced targets, leading us to conclude that following existing Apple Human Interface Guidelines and Android Material Design will lead to apps with UIs that are appropriate for patients with RMDs. [Display omitted] • In-line selectors are the UI component that affords the best user performance for patients with Rheumatic and Musculoskeletal Diseases. • Column selectors are perceived as the most usable by patients with Rheumatic and Musculoskeletal Diseases. • Sliders perform worse than in-line selectors, with significant differences. • Following UI Apple and Android guidelines is appropriate for patients with RMDs. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
30. A multimodal smartwatch-based interaction concept for immersive environments.
- Author
-
Lang, Matěj, Strobel, Clemens, Weckesser, Felix, Langlois, Danielle, Kasneci, Enkelejda, Kozlíková, Barbora, and Krone, Michael
- Subjects
- *
SMARTWATCHES , *EYE tracking , *PERSONAL computers , *UNITS of measurement , *USER experience , *VIRTUAL reality - Abstract
Augmented and Virtual Reality (AR/VR) environments require user interaction concepts beyond the traditional mouse-and-keyboard setup for seated desktop computer usage. Although advanced input modalities such as hand or gaze tracking have been developed, they have yet to be widely adopted in available hardware. Modern smartwatches have been shown to provide a powerful and intuitive means of input, thereby overcoming the limitations of current AR/VR headsets. They typically offer a set of interesting input modalities, such as a touchscreen, rotary buttons, and an Inertial Measurement Unit (IMU), which can be used for mid-air gesture recognition. Compared to other input devices, they have the benefit that the user's hands are free as soon as interaction stops, since the watch is attached to the wrist. Although many concepts have been proposed, comparative evaluations of their effectiveness and user-friendliness are still rare. In this paper, we evaluate the usability of two commonly found approaches for using a smartwatch as an interaction device, specifically in immersive environments provided by AR/VR HMDs: using the physical inputs of the watch (touchscreen, rotary buttons) or mid-air gestures. We conducted a user study with 20 participants in which they tested both interaction methods, and we compared their usability and performance. Based on a prototypical AR application, we evaluated the performance and user experience of these two smartwatch-based interaction concepts. We found that input using the touchscreen and buttons was generally favored by the participants and led to shorter task completion times. [Display omitted] • We assessed smartwatch interaction: buttons, touchscreens, and gestures for AR/VR. • We implemented an AR app with common concepts and ran a user study (20 participants). • Users preferred touchscreens over gestures; they're faster and less taxing. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
31. Weighted guided image filtering with entropy evaluation weighting.
- Author
-
Jia, Hongbin, Yin, Qingbo, and Lu, Mingyu
- Subjects
- *
IMAGE fusion , *IMAGE denoising , *EXTREME value theory , *HYPERBOLIC functions , *REGULARIZATION parameter , *STATISTICAL weighting , *ENTROPY - Abstract
Although the guided image filter (GIF) is an excellent edge-preserving filter, it generally suffers from halo artifacts due to its local property and fixed regularization parameter. To address this problem, the weighted guided image filter (WGIF) was proposed by incorporating an edge-aware weighting into the GIF. In the filtering process, WGIF employs an averaging strategy for the edge-aware weighting. Although the averaging strategy is highly efficient, it is susceptible to extreme values and tends to obscure critical factors, so it often leads to inaccurate results. Consequently, the quality of the WGIF output is often degraded. To remedy this deficiency, a weighted guided image filter with entropy evaluation weighting (EEW-WGIF) is proposed in this paper. EEW-WGIF employs an edge-aware weighting strategy based on an entropy evaluation method to detect edges more accurately, and incorporates an explicit constraint based on gradient variation to better preserve edges. To verify the filtering effectiveness of EEW-WGIF, it was applied to edge-preserving smoothing, exposure image fusion, single-image detail enhancement, structure-transferring filtering, and image denoising. Experimental results show that the proposed filter achieves excellent performance in both visual quality and objective evaluation. [Display omitted] • An edge-aware weighting strategy based on an entropy evaluation method is proposed, which is more reliable in calculating the importance of each edge-aware factor. • The proposed EEW-WGIF incorporates an explicit constraint based on gradient variation and a hyperbolic function to handle edges so that they are better preserved. • EEW-WGIF was applied to edge-preserving smoothing, exposure image fusion, single-image detail enhancement, structure-transferring filtering, and image denoising. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
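The baseline guided image filter that WGIF and EEW-WGIF in the entry above build on fits, in every local window, a linear model q = a·I + b in closed form from local first and second moments. The fixed regularizer `eps` in the sketch below is exactly the term that WGIF-style methods replace with an edge-aware weight. A minimal NumPy version, using a naive box filter for clarity rather than the O(1) integral-image form:

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window with edge padding."""
    p = np.pad(img, r, mode='edge')
    h, w = img.shape
    k = 2 * r + 1
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)

def guided_filter(guide, src, r=2, eps=1e-2):
    """Baseline GIF (He et al.): solve a = cov(I,p)/(var(I)+eps),
    b = mean(p) - a*mean(I) per window, then average a and b."""
    mean_I, mean_p = box_mean(guide, r), box_mean(src, r)
    var_I = box_mean(guide * guide, r) - mean_I ** 2
    cov_Ip = box_mean(guide * src, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)   # fixed eps: the term edge-aware variants reweight
    b = mean_p - a * mean_I
    return box_mean(a, r) * guide + box_mean(b, r)
```

Because `a` shrinks toward zero in flat regions (small variance) and toward one at strong edges, the filter smooths while preserving edges; the halo problem the abstract describes arises precisely when the single global `eps` is wrong for a given edge.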
32. An overview of Eulerian video motion magnification methods.
- Author
-
Ahmed, Ahmed Mohamed, Abdelrazek, Mohamed, Aryal, Sunil, and Nguyen, Thanh Thi
- Subjects
- *
CONVOLUTIONAL neural networks , *RANGE of motion of joints , *IMAGE processing , *MOTION - Abstract
The concept of video motion magnification has become increasingly relevant due to its ability to detect small and invisible motions that can be of great value in a variety of applications. A variety of approaches have been developed to magnify these motions and variations. While both Eulerian and Lagrangian processing methods are widely used for motion magnification, Eulerian approaches are more commonly employed due to their lower computational cost. This paper provides an overview of the powerful Eulerian motion magnification techniques. We begin with a brief introduction to technical concepts associated with Eulerian motion techniques such as pyramids and filters in image processing. Additionally, we provide a comparison between the Lagrangian and Eulerian perspectives, followed by a comprehensive overview of the various Eulerian motion magnification (EVM) techniques available. Finally, we present implementation results and a comparative analysis of some of the Eulerian motion techniques. [Display omitted] • Detecting Imperceptible Motions: Explore motion magnification's role in revealing subtle movements. • Demystifying Eulerian Processing: Simplify complex concepts, mathematical foundations. • Real-World Applications: Illustrate motion magnification's utility in healthcare, construction, etc. • Eulerian Advantages: Compare and emphasize the strength of Eulerian over Lagrangian methods. • Comprehensive Survey: Cover a range of Eulerian motion magnification techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
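The linear Eulerian idea the overview above covers reduces to three steps: temporally band-pass each pixel's intensity signal, amplify the band, and add it back. The sketch below uses an ideal FFT band-pass on raw pixel values; published methods additionally decompose frames into spatial pyramids and vary amplification per level, which is omitted here for brevity.

```python
import numpy as np

def eulerian_magnify(frames, alpha=10.0, lo=0.3, hi=0.7, fps=30.0):
    """Minimal Eulerian magnification: band-pass every pixel's intensity
    time series in the frequency domain, scale the result, add it back."""
    frames = np.asarray(frames, dtype=float)          # shape (T, H, W)
    spec = np.fft.fft(frames, axis=0)                 # per-pixel temporal FFT
    freqs = np.fft.fftfreq(frames.shape[0], d=1.0 / fps)
    keep = (np.abs(freqs) >= lo) & (np.abs(freqs) <= hi)
    spec[~keep] = 0.0                                 # ideal temporal band-pass
    motion = np.fft.ifft(spec, axis=0).real           # the tiny variation signal
    return frames + alpha * motion                    # linear amplification
```

Feeding it a video with a sub-pixel-scale 0.5 Hz intensity oscillation and alpha = 10 makes the oscillation roughly eleven times larger while leaving the static background untouched, which is the Eulerian effect the survey describes.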
33. Efficient boundary surface reconstruction from multi-label volumetric data with mathematical morphology.
- Author
-
N'Guyen, Franck, Kanit, Toufik, Maisonneuve, F., and Imad, Abdellatif
- Subjects
- *
MATHEMATICAL morphology , *SURFACE reconstruction , *LEANNESS , *CUBES , *AMBIGUITY - Abstract
This paper proposes a new, fully automatic and robust approach to generating triangular meshes directly from volumetric data (scanned images), particularly when these images contain multiple adjacent labels. Current meshing techniques produce a number of mesh elements directly related to the number of components (voxels) in the image. This number can be considerable if the image is large. The proposed methodology is significantly less dense in terms of the number of elements compared to marching cubes methods. The proposed method presents no configuration ambiguity and is faithful to the original morphology of the images regardless of the thinness of the topologies or the presence of possible erratic morphological configurations that may lead to geometric interpretation indecisions. [Display omitted] • New fully automatic and robust approach to generate triangular meshes directly from volumetric data, in particular when these images contain multiple adjacent labels. • The proposed methodology is significantly less dense in terms of the number of elements compared to marching cubes methods. • The proposed method presents no configuration ambiguity and is faithful to the original morphology of the images regardless of the thinness of the topologies or the presence of possible erratic morphological configurations that may lead to geometric interpretation indecisions. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
34. Perceptual thresholds of visual size discrimination in augmented and virtual reality.
- Author
-
Wang, Liwen, Cai, Shaoyu, and Sandor, Christian
- Subjects
- *
DISCRIMINATION against overweight persons , *VISUAL discrimination , *VIRTUAL reality , *AUGMENTED reality , *COMPUTING platforms - Abstract
The perception of the size of virtual objects in Augmented Reality (AR) and Virtual Reality (VR) is not a trivial issue, as the effectiveness of manipulating and interacting with virtual content depends on the accuracy of size perception. However, straightforward comparisons between VR and AR in terms of size perception, needed for a deep understanding of perceptual differences, are missing. Understanding these perceptual differences can inform designers on how to adapt content when transitioning between these two spatial computing platforms. In this paper, we conducted two psychophysical experiments to measure the perceptual thresholds of size discrimination for virtual objects. Our results indicate that users are more sensitive to size changes in VR than in video see-through AR, suggesting that size differences are easier to perceive in VR than in AR. Additionally, for increases or decreases of size, the accuracy of judgments showed an asymmetric trend in video see-through AR. [Display omitted] • A comparative experiment to understand the difference in size perception in AR vs VR. • The thresholds for size discrimination in AR and VR are not the same. • The accuracy of judgments is asymmetric for increases and decreases of size in AR. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
35. Visualising geospatial time series datasets in realtime with the Digital Earth Viewer.
- Author
-
Buck, Valentin, Stäbler, Flemming, Mohrmann, Jochen, González, Everardo, and Greinert, Jens
- Subjects
- *
TIME series analysis , *UNDERWATER exploration , *INTERNET servers , *CONFERENCE papers , *VISUALIZATION , *GEOSPATIAL data - Abstract
A comprehensive study of the Earth System and its different environments requires understanding of multi-dimensional data acquired with a multitude of different sensors or produced by various models. Here we present a component-wise scalable web-based framework for simultaneous visualisation of multiple data sources. It helps contextualise mixed observation and simulation data in time and space. This work is an extended version of the conference paper (Buck et al., 2021). [Display omitted] • Open-source (EUPL) hybrid application for realtime visualisation of 4D geoscientific data. • Split into native server and web client to utilize the strengths of both platforms. • Desktop builds are released for Windows, Linux, and MacOS. • Viewer used on expedition cruises to plan underwater exploration missions • Presentation and data validation capabilities used by GLODAP [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Computers & graphics best paper award 2002
- Published
- 2003
- Full Text
- View/download PDF
37. Best paper award 2001
- Published
- 2002
- Full Text
- View/download PDF
38. A framework for the efficient enhancement of non-uniform illumination underwater image using convolution neural network.
- Author
-
Zhang, Wenbo, Liu, Weidong, Li, Le, Jiao, Huifeng, Li, Yanli, Guo, Liwei, and Xu, Jingming
- Subjects
- *
CONVOLUTIONAL neural networks , *GENERATIVE adversarial networks , *LIGHT sources , *IMAGE enhancement (Imaging systems) , *COLOR in nature , *GAUSSIAN function - Abstract
In this paper, the non-uniform illumination enhancement problem of underwater images under artificial light sources is investigated based on Convolutional Neural Networks (CNNs). First, we propose a trainable end-to-end enhancer, called NUIENet, for enhancing the non-uniform illumination of underwater images. The proposed model consists of a correction network and fusion layers. The correction network adopts an encoder–decoder structure with skip connections to enhance the features of different channels in the HSV domain, and these enhanced features are then fused by the fusion layers to obtain the desired high-quality images. Second, we built an underwater image dataset using a Generative Adversarial Network (GAN) and a Gaussian function. Finally, both qualitative and quantitative experimental results show that the proposed method produces better performance than other state-of-the-art enhancement methods on both real-world and synthetic underwater datasets. • This paper proposes a CNN-based non-uniform illumination enhancer which uses an encoder–decoder structure with skip connections to enhance underwater images with NUI into the desired high-quality images. • To boost underwater image processing, we construct a dataset of underwater images with NUI based on the GAN and Gaussian function, containing NUI images and their corresponding high-quality reference images. • Compared with other state-of-the-art NUIE methods, the proposed network achieves natural color correction and superior or equivalent visibility improvement. [Display omitted] [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
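The Gaussian-function half of the dataset synthesis in the entry above can be sketched simply: modulate a clean image with a 2-D Gaussian to mimic a spotlight-like artificial source. The function below, including its `strength` parameter, is an assumption made for illustration; the paper's exact synthesis pipeline (and its GAN component) is not reproduced.

```python
import numpy as np

def add_gaussian_light(img, cx, cy, sigma, strength=0.8):
    """Simulate non-uniform artificial illumination: darken the image
    everywhere except inside a Gaussian light spot centred at (cx, cy).
    img is float HxWx3 in [0, 1]; strength in [0, 1] controls falloff."""
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    g = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    gain = 1.0 - strength + strength * g      # 1 at the spot, (1-strength) far away
    return np.clip(img * gain[..., None], 0.0, 1.0)
```

Pairing such degraded images with their clean originals is one plausible way to obtain the (NUI image, reference image) pairs the abstract mentions for supervised training.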
39. Teaching the basics of computer graphics in virtual reality.
- Author
-
Heinemann, Birte, Görzen, Sergej, and Schroeder, Ulrik
- Subjects
- *
VIRTUAL reality , *COMPUTERS in education , *TECHNOLOGICAL innovations , *COMPUTER graphics , *SCHOOL environment , *TELEPORTATION - Abstract
New technology such as virtual reality can help computer graphics education, for example, by providing the opportunity to illustrate challenging 3D procedures. RePiX VR is a virtual reality tool for computer graphics education that focuses on teaching the core ideas of the rendering pipeline. This paper describes its development and two initial evaluations, which aimed to strengthen usability, review requirements of different stakeholders, and build infrastructure for learning analytics and research. The integration of learning analytics raises the question of appropriate indicators, which is approached through exploratory data analysis. In addition to learning analytics, the evaluation includes quantitative techniques to gain insights about usability and didactical feedback. This paper discusses advanced aspects of learning in VR and looks specifically at movement behavior. According to the evaluations, even learners without prior experience can utilize the VR tool to pick up the fundamentals of computer graphics. [Display omitted] • Evaluated educational VR environment for teaching Computer Graphics. • Teaching Computer Graphics in Virtual Reality is promising. • Learners have various movement and teleportation patterns and different interaction behaviors. • A comparison of Desktop and VR users shows differences between groups, as does a comparison of novices and experts. • The evaluation contains Multimodal Learning Analytics, Quantitative Feedback, and Usability aspects. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
40. DCU-NET: Self-supervised monocular depth estimation based on densely connected U-shaped convolutional neural networks.
- Author
-
Zheng, Qiumei, Yu, Tao, and Wang, Fenghua
- Subjects
- *
CONVOLUTIONAL neural networks , *MONOCULARS , *INFORMATION networks - Abstract
Depth estimation is crucial for scene understanding and downstream tasks, with self-supervised training methods showing great potential. The overall structure and local details of the scene are essential for improving the quality of depth estimation. The proposal of Monodepth2 has led to significant progress in self-supervised monocular depth estimation. However, Monodepth2 uses the most basic encoder–decoder architecture. The limited data-flow information of the network leads to a large semantic gap between the encoder and the decoder, which reduces the accuracy of the network for fine-grained feature recognition. Monodepth2 adopts ResNet18 pre-trained on the ImageNet dataset as the encoder. This traditional convolution-and-pooling structure results in a loss of pixel information in the network at every scale. To solve this problem, this paper proposes an improved DepthNet. The network adopts HRNet from semantic segmentation as the base encoder, which applies an advanced multi-scale fusion method throughout, thus avoiding the loss of pixel information. An additional densely connected U-Net is employed at the decoder side to provide more information flow. Furthermore, the semantic gap between the encoder and decoder is reduced by adding different numbers of residual connections and channel attention on each layer. The network structure can be regarded as a collection of fully convolutional networks. Since the deep features of the network have a higher correlation with the vertical position, we add a spatial location attention module to the deep-level network to reduce this semantic gap. The approach performs very well on the KITTI dataset benchmark, with several performance criteria comparable to supervised monocular depth inference methods. • This work presents a depth estimation network for scene reconstruction and scene understanding, redesigning the self-supervised monocular depth framework from an entirely new perspective. The network uses HRNet from the field of semantic segmentation as the base encoder, which employs a progressive multi-scale fusion approach throughout, thus avoiding the loss of pixel information. • An additional densely connected U-Net is used at the decoder side to provide further information flow. To reduce the semantic gap between the encoder and decoder, we add a different number of residual connections and channel attention on each layer. The network is not trained with the help of other auxiliary networks, and the performance of depth estimation is improved only by stimulating the network's potential. • This work achieves best-in-class accuracy in monocular depth estimation. When the model in this paper is used for 3D scene reconstruction, it can perform a complete recovery of the scene structure. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
41. A survey of deep learning methods and datasets for hand pose estimation from hand-object interaction images.
- Author
-
Woo, Taeyun, Park, Wonjung, Jeong, Woohyun, and Park, Jinah
- Subjects
- *
POSE estimation (Computer vision) , *DEEP learning , *JOINTS (Anatomy) , *IMPLICIT functions , *VIRTUAL reality , *COMPUTER vision - Abstract
The research topic of estimating hand pose from images of hand-object interaction has the potential to replicate natural hand behavior in many practical applications of virtual reality and robotics. However, the intricacy of hand-object interaction, combined with mutual occlusion and the need for physical plausibility, brings many challenges to the problem. This paper provides a comprehensive survey of state-of-the-art deep learning-based approaches for estimating hand pose (joints and shape) in the context of hand-object interaction. We discuss various deep learning-based approaches to image-based hand tracking, including hand joint and shape estimation. In addition, we review the hand-object interaction dataset benchmarks that are widely used in hand joint and shape estimation methods. Deep learning has emerged as a powerful technique for solving many problems, including hand pose estimation. While we cover extensive research in the field, we also discuss the remaining challenges that lead to future research directions. [Display omitted] • Deep learning is effectively used for estimating hand pose from images. • The correlation between a hand and an object helps in estimating hand-object pose. • A hand model helps estimate hand shape, but it restricts estimates to the model's prior. • Implicit function methods have emerged in hand-object pose estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
42. Computational design of planet regolith sampler based on Bayesian optimization.
- Author
-
Li, Mingyu, Zhu, Lifeng, Yan, Yibing, Zhao, Ziyi, and Song, Aiguo
- Subjects
- *
REGOLITH , *STRAINS & stresses (Mechanics) , *GRANULAR materials , *COMPUTER-aided design , *SPACE exploration , *ENGINEERING design - Abstract
Regolith sampling is one of the core missions in deep space exploration. Designing, optimizing, and fabricating samplers that meet the requirements of deep space exploration is challenging, often necessitating complex modeling with computer-aided design tools and demanding the expertise of experienced space engineers over lengthy design iterations. We propose an interactive design framework in which designers collaborate with an optimization tool to streamline the design process. As the operator adjusts the design goals, Bayesian optimization automatically suggests the next set of parameters to explore. This approach suits optimization scenarios in which the design goals cannot be well expressed as analytical functions and only a few design iterations are affordable. In this paper, we design and optimize the core structure of the sampler under both stress analysis and discrete element analysis, targeting lower stress, greater sampling volume per unit power consumption, and smaller size. Both simulation and physical experimental results show that the design proposed by our framework outperforms existing designs within a small number of design iterations. [Display omitted] • A computational design framework for searching forms of planetary regolith samplers. • Bayesian optimization is introduced to reduce the cost of optimizing the shape for efficient interaction with granular material. • Simulation and physical experiments validate the design proposed by our framework. [ABSTRACT FROM AUTHOR]
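The suggest-the-next-parameters loop this abstract describes can be illustrated with a minimal Bayesian-optimization sketch. This is not the authors' framework: the kernel-regression surrogate, the distance-based uncertainty term (standing in for a full Gaussian process), the UCB acquisition, and the single scalar design parameter are all simplifying assumptions.

```python
import math
import random

def suggest_next(observed, bounds, n_candidates=200, kappa=2.0, seed=0):
    """Suggest the next design parameter to evaluate via UCB over a toy surrogate.

    observed: list of (x, y) pairs already evaluated (y is to be maximized).
    bounds: (lo, hi) range of the scalar design parameter.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def surrogate(x):
        # Kernel-weighted mean of nearby observations; uncertainty decays
        # with the total kernel mass (high where no data is nearby).
        ws = [math.exp(-((x - xi) / (0.1 * (hi - lo))) ** 2) for xi, _ in observed]
        tot = sum(ws)
        mean = sum(w * yi for w, (_, yi) in zip(ws, observed)) / tot if tot > 1e-12 else 0.0
        sigma = math.exp(-tot)
        return mean, sigma

    best_x, best_ucb = lo, -float("inf")
    for _ in range(n_candidates):
        x = rng.uniform(lo, hi)
        mean, sigma = surrogate(x)
        ucb = mean + kappa * sigma  # upper confidence bound acquisition
        if ucb > best_ucb:
            best_x, best_ucb = x, ucb
    return best_x
```

A hypothetical design loop then alternates: evaluate the suggested parameters in simulation, append the result to `observed`, and ask for the next suggestion, which is how few-iteration optimization of expensive, non-analytical objectives proceeds.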
- Published
- 2023
- Full Text
- View/download PDF
43. Simulating hyperelastic materials with anisotropic stiffness models in a particle-based framework.
- Author
-
Wang, Tiancheng, Xu, Yanrui, Li, Ruolan, Wang, Haoping, Xiong, Yuege, and Wang, Xiaokun
- Subjects
- *
POISSON'S ratio , *HUMAN mechanics , *ENERGY function , *MUSCLE contraction , *ELASTIC scattering - Abstract
We present a particle-based smoothed particle hydrodynamics (SPH) framework for simulating hyperelastic materials with anisotropic stiffness models. While most elastic simulations predominantly rely on mesh-based approaches, such as the Finite Element method, the relationship between Lamé's first parameter and Poisson's ratio complicates the strict enforcement of volume conservation, making it challenging to stabilize simulations for common biological tissues like fat and muscle. In this paper, we couple an implicit divergence-free SPH solver with particle-based deformation gradient computation and apply various elastic energy functions to achieve incompressible elastic simulations. The incompressibility of elastic objects and collisions between different bodies are managed by the implicit SPH algorithm. We further incorporate anisotropic energy functions, constructed from the extrapolation of Cauchy–Green invariants, to introduce anisotropic properties to the objects. By integrating activation and contraction coefficients into the energy functions, particles can simulate muscle contractions and lift heavy objects. Our method can effectively represent elastic objects with varying mechanical properties across different directions and be further employed to mimic muscle contractions. Experiments demonstrate that our approach provides realistic simulations for a wide range of animal and human body movements. [Display omitted] • Leveraging a Lagrangian-based approach for the simulation of anisotropic elasticity. • Integration of Smoothed Particle Hydrodynamics with anisotropic energy functions for advanced modeling. • Adaptable simulation of muscle contraction, effectively mimicking a diverse range of movement behaviors. • Enforcing strict incompressibility in the simulation of muscle-like tissues, ensuring highly accurate representations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
44. Arbitrary style transfer based on Attention and Covariance-Matching.
- Author
-
Peng, Haiyuan, Qian, Wenhua, Cao, Jinde, and Tang, Shan
- Subjects
- *
VISUAL fields , *COMPUTER vision , *IMAGE registration - Abstract
Arbitrary style transfer has broad application prospects and important research value, and it is a research hotspot in the field of computer vision. Many studies have shown that arbitrary style transfer can achieve remarkable results. However, existing methods may produce artifacts and sometimes distort the content structure. To overcome this limitation, this paper proposes a novel Attention-wise and Covariance-Matching Module (ACMM) that preserves the content structure well without unpleasant artifacts. First, our method uses global attention covariance matching to match the global information of style features with content features, thereby obtaining pleasing stylized images. Then, to enable the model to better match the global statistics, a histogram loss is introduced to improve the saturation and stability of the resulting colors. Because our method preserves the content structure, appearance transfer can be achieved with simple adjustments to the model. The effectiveness of the proposed method is demonstrated through qualitative and quantitative experiments comparing it with state-of-the-art arbitrary style transfer methods. [Display omitted] • The Attention-wise and Covariance-Matching Module preserves content structure well without unpleasant artifacts. • The histogram loss improves the color saturation of the generated results and enhances their stability. • The proposed framework can be extended to achieve photorealistic style transfer. • The visual performance of the proposed framework is comparable to other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
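The histogram loss idea mentioned in this abstract (matching global color statistics between two images) can be sketched as an L1 distance between normalized intensity histograms. This is only an illustration of the general concept, not the paper's loss; the flat intensity lists and bin count are assumptions.

```python
def histogram_loss(img_a, img_b, bins=16):
    """L1 distance between normalized intensity histograms of two images.

    img_a, img_b: flat lists of intensities in [0, 1]. A toy stand-in for a
    histogram loss used to match global color statistics in style transfer.
    """
    def hist(img):
        counts = [0] * bins
        for v in img:
            counts[min(int(v * bins), bins - 1)] += 1
        n = len(img)
        return [c / n for c in counts]

    ha, hb = hist(img_a), hist(img_b)
    return sum(abs(a - b) for a, b in zip(ha, hb))
```

Identical images give a loss of zero, and fully disjoint histograms give the maximum L1 value of 2, so minimizing this term pushes the stylized output's color distribution toward the target's.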
- Published
- 2023
- Full Text
- View/download PDF
45. An enhanced interactive endoscope model based on position-based dynamics and Cosserat rods for colonoscopy simulation.
- Author
-
Morais, Lucas Zanusso, Bergmann, Victor Kunde, Carvalho, Eduarda Abreu, Zimmer, Raquel, Martins, Marcelo Gomes, Nedel, Luciana Porcher, Maciel, Anderson, and Torchelsen, Rafael Piccin
- Subjects
- *
COLONOSCOPY , *VIRTUAL colonoscopy , *TRAINING of surgeons , *VIDEO games , *COLON (Anatomy) - Abstract
Virtual simulators play a significant role in training surgeons for endoscopic procedures, improving their abilities in real-world scenarios while reducing patient exposure. However, due to the complexity of the organs' behavior, building realistic virtual simulators is still a challenge. In this paper, we propose an approach to modeling the endoscope and its interaction with the colon with a focus on compatibility between colon simulation, endoscope simulation, and user interaction. Instead of using a traditional physics-based approach, which is computationally expensive, we rely on efficient position-based methods that were originally built for video games, where high accuracy is traded for interactive plausibility. We propose a Cosserat rod constraint to model the endoscope and designed a user-centered interface that allows the simulator to run on commodity computers instead of dedicated training hardware. We implemented our model and interface on Unity and evaluated our virtual endoscope for compatibility with a deformable colon. Results show that the model performs in real-time, the dynamic elongation of the scope is stable, and the typical maneuvers made in total colonoscopy can be effectively made with the interface. [Display omitted] • We present an endoscope model and provide data demonstrating relevant aspects. • Compatibility with a deformable colon. • Real-time performance that allows for both visual and force-feedback fidelity. • We present an interface that increases the access and availability of the simulator. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
46. Unsupervised method for identifying shape instances on 3D CAD models.
- Author
-
Figueiredo, Lucas, Ivson, Paulo, and Celes, Waldemar
- Subjects
- *
SUPERVISED learning , *DEEP learning - Abstract
Increasingly complex 3D CAD models are essential during different life-cycle stages of modern engineering projects. Even though these models contain many repeated geometries, instancing information is often not available, resulting in increased requirements for storage, transmission, and rendering. Previous research has successfully applied shape matching techniques to identify repeated geometries and thus reduce memory requirements and improve rendering performance. However, these approaches require consistent vertex topology, prior knowledge about the scene, and/or the laborious creation of labeled datasets. In this paper, we present an unsupervised deep-learning method that overcomes these limitations and is capable of identifying repeated geometries and computing their instancing transformations. The method also guarantees a maximum visual error and preserves intrinsic characteristics of surfaces. Results on real-world 3D CAD models demonstrate the effectiveness of our approach: the datasets are reduced by up to 83.93% in size. Our approach achieves better results than previous work that does not rely on supervised learning. The proposed method is applicable to any kind of 3D scene and geometry. [Display omitted] • Unsupervised deep learning method for 3D shape registration. • Does not require any prior knowledge of the 3D geometries. • Does not require a labeled dataset for supervised training. • Guarantees an upper bound on visual error. • Generalizes to any 3D scene and geometry. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
47. CR-Net: A robust craniofacial registration network by introducing Wasserstein distance constraint and geometric attention mechanism.
- Author
-
Dai, Zhenyu, Zhao, Junli, Deng, Xiaodan, Duan, Fuqing, Li, Dantong, Pan, Zhenkuan, and Zhou, Mingquan
- Subjects
- *
PETRI nets , *GAUSSIAN mixture models , *RECORDING & registration , *POINT cloud - Abstract
Accurate registration of three-dimensional (3D) craniofacial data is fundamental for craniofacial reconstruction and analysis. Complex topology and low-quality 3D models make craniofacial registration challenging in the iterative optimization process. In this paper, we propose a craniofacial registration network (CR-Net) that automatically learns the registration parameters of the non-rigid thin plate spline (TPS) transformation from the training data sets and performs the required geometric transformations to align craniofacial point clouds. The proposed CR-Net employs an improved point cloud encoder architecture with a specially designed attention mechanism that can perceive the geometric structure of the point cloud. To align the source and target data, a Wasserstein distance loss is introduced and combined with Chamfer loss and Gaussian Mixture Model (GMM) loss as an unsupervised loss function dedicated to improving registration accuracy. After efficient training, the network automatically generates the transformation parameters for registration, transforming the reference craniofacial data to the target craniofacial data without manual calibration of feature points or an iterative optimization process. Experimental results show that our method achieves high registration accuracy and is robust to low-quality models. [Display omitted] • A neural network for robust craniofacial point cloud registration. • A geometric attention mechanism to perceive the geometric structure of point clouds. • A Wasserstein distance loss to constrain unsupervised training. [ABSTRACT FROM AUTHOR]
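The combined alignment objective this abstract describes pairs a Chamfer term with a Wasserstein term. As a rough illustration (not CR-Net's loss), the sketch below uses 2D point sets, replaces the full Wasserstein distance with per-axis 1D Wasserstein on equal-size samples, and omits the GMM term; the weighting `lam` is an assumed hyperparameter.

```python
def chamfer(P, Q):
    """Symmetric Chamfer distance between two 2D point sets (lists of (x, y))."""
    def sq(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    a = sum(min(sq(p, q) for q in Q) for p in P) / len(P)
    b = sum(min(sq(q, p) for p in P) for q in Q) / len(Q)
    return a + b

def wasserstein_1d(u, v):
    """1D Wasserstein-1 distance between equal-size samples: sort and average."""
    su, sv = sorted(u), sorted(v)
    return sum(abs(x - y) for x, y in zip(su, sv)) / len(su)

def registration_loss(P, Q, lam=0.5):
    """Toy combined loss: Chamfer plus per-axis (sliced) Wasserstein terms."""
    w = sum(wasserstein_1d([p[i] for p in P], [q[i] for q in Q]) for i in range(2))
    return chamfer(P, Q) + lam * w
```

Chamfer alone only matches nearest neighbors, while the distributional Wasserstein term penalizes global mass mismatch; combining both is the general rationale for such mixed losses.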
- Published
- 2023
- Full Text
- View/download PDF
48. AdaptMVSNet: Efficient Multi-View Stereo with adaptive convolution and attention fusion.
- Author
-
Jiang, Pengfei, Yang, Xiaoyan, Chen, Yuanjie, Song, Wenjie, and Li, Yang
- Subjects
- *
SOURCE code , *GEOMETRIC approach , *COMPUTER vision , *DEEP learning , *PYRAMIDS - Abstract
Multi-View Stereo (MVS) is a crucial technique for reconstructing the geometric structure of a scene given known camera parameters. Previous deep learning-based MVS methods have mainly focused on improving reconstruction quality but have overlooked running efficiency during actual deployment. For example, deformable convolutions have been introduced to further improve the accuracy of reconstruction results; however, their unsuitability for parallel optimization causes low inference speed. In this paper, we propose AdaptMVSNet, which is device-friendly and reconstruction-efficient while preserving the original results. To this end, adaptive convolution is introduced, significantly improving speed and metrics compared to current methods. In addition, an attention fusion module is proposed to blend features from the adaptive convolution and the feature pyramid network. Our experiments demonstrate that the proposed approach achieves state-of-the-art performance and is almost 2× faster than the recent fastest MVS method. We will release our source code. [Display omitted] [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
49. Random screening-based feature aggregation for point cloud denoising.
- Author
-
Wang, Weijia, Pan, Wei, Liu, Xiao, Su, Kui, Rolfe, Bernard, and Lu, Xuequan
- Subjects
- *
POINT cloud - Abstract
Raw point clouds captured by sensing devices are often contaminated with noise, which perturbs the fidelity of the original geometric information. Point cloud denoising is therefore an indispensable post-processing step, aiming to remove the noise in the point clouds. Existing point cloud denoising approaches are typically trained on datasets that have uniform point distributions and densities, making them unsuitable for effectively denoising point clouds with severe noise or irregular point distributions. In this paper, we introduce a novel random screening-based feature aggregation method for point cloud denoising. Our key insight is that merging the features of dense and sparse points enhances the quality of point cloud denoising results. Specifically, our approach randomly screens the features of local point patches and fuses the richer geometric information of denser points into sparser point representations. Comprehensive experiments demonstrate that our method achieves state-of-the-art performance in the point cloud denoising task on both synthetic and real-world datasets. [Display omitted] • Random screening-based point feature aggregation is useful for point cloud denoising. • Merging features of dense points into sparser points boosts point cloud denoising. • Our pipeline demonstrates robustness to noise and saves processing time. • Our technique achieves impressive results on severe noise and irregular points. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
50. TSNeRF: Text-driven stylized neural radiance fields via semantic contrastive learning.
- Author
-
Wang, Yi, Cheng, Jing-Song, Feng, Qiao, Tao, Wen-Yuan, Lai, Yu-Kun, and Li, Kun
- Subjects
- *
RADIANCE , *DATABASES , *SAMPLING methods - Abstract
3D scene stylization aims to generate impressive stylized images from arbitrary novel views based on a stylistic reference. Existing image-driven 3D scene stylization methods require a specific style reference to be given and lack the ability to produce diverse stylization results by combining style information from different aspects. In this paper, we propose a text-driven 3D scene stylization method based on semantic contrastive learning, which takes Neural Radiance Fields (NeRF) as the 3D scene representation and generates diverse 3D stylized scenes by leveraging the semantic capabilities of the Contrastive Language-Image Pre-Training (CLIP) model. To comprehensively exploit semantic knowledge and generate finely stylized results, we design a CLIP-based semantic contrast estimation loss, which avoids both the global stylistic inconsistency caused by the NeRF ray sampling method and the tendency to stylize toward neutral descriptions caused by the semantic averaging of the CLIP space. In addition, to reduce the memory burden arising from NeRF ray sampling, we propose a novel ray sampling method with gradient accumulation to optimize the NeRF rendering process. Experimental results indicate that our method generates high-quality and plausible results with cross-view consistency. Moreover, our method enables the creation of new styles that match the target text by combining multiple domains. The code will be available at. • The text-driven 3D implicit stylization method can intuitively and diversely stylize 3D scenes. • The stylization transfer direction in the CLIP semantic space can be controlled with the contrastive learning method, stylizing the 3D scene more accurately. • Gradient accumulation for NeRF ray sampling ensures that the stylization losses are calculated on full-size rendered images and prevents the memory overload caused by NeRF's intensive sampling. 
• The target stylized semantics are fine-tuned on a specialized art database using a nearest-neighbor semantic similarity searcher, enabling the generation of multi-domain stylized scenes from that database. [Display omitted] [ABSTRACT FROM AUTHOR]
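The gradient-accumulation idea in this abstract, computing a full-image loss without holding every ray batch in memory at once, rests on the fact that the gradient of a mean loss equals the mean of per-chunk gradient sums. The sketch below demonstrates this equivalence on a toy scalar loss with an analytic gradient; it is not the authors' NeRF renderer, and the "rays" are just assumed scalar targets.

```python
def grad_full(theta, rays):
    """Gradient of the mean squared loss sum((theta - r)^2)/N over all rays at once."""
    n = len(rays)
    return sum(2.0 * (theta - r) for r in rays) / n

def grad_accumulated(theta, rays, chunk):
    """Same gradient computed by accumulating over small ray chunks, as done to
    keep a NeRF-style full-image loss within memory limits."""
    n, acc = len(rays), 0.0
    for i in range(0, n, chunk):
        batch = rays[i:i + chunk]
        acc += sum(2.0 * (theta - r) for r in batch)  # accumulate, don't step yet
    return acc / n  # single optimizer step with the averaged gradient
```

Because the accumulated gradient is exactly the full-batch gradient, each optimizer step behaves as if the whole image had been rendered at once, at a fraction of the peak memory.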
- Published
- 2023
- Full Text
- View/download PDF