Author: "Stricker A" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Stricker A"' showing total 28,717 results

1. MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

Author: Sinha, Sankalp, Khan, Mohammad Sadil, Usama, Muhammad, Sam, Shino, Stricker, Didier, Ali, Sk Aziz, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Machine Learning
Abstract: Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of the existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from source datasets into our annotation pipeline to add domain-specific information in our annotation and reduce VLM hallucinations. Additionally, we develop MARVEL-FX3D, a two-stage text-to-3D pipeline. We fine-tune Stable Diffusion with our annotations and use a pretrained image-to-3D network to generate 3D textured meshes within 15s. Extensive evaluations show that MARVEL-40M+ significantly outperforms existing datasets in annotation quality and linguistic diversity, achieving win rates of 72.41% by GPT-4 and 73.40% by human evaluators.
Published: 2024

2. Modality-Incremental Learning with Disjoint Relevance Mapping Networks for Image-based Semantic Segmentation

Author: Hegde, Niharika, Muralidhara, Shishir, Schuster, René, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In autonomous driving, environment perception has significantly advanced with the utilization of deep learning techniques for diverse sensors such as cameras, depth sensors, or infrared sensors. The diversity in the sensor stack increases the safety and contributes to robustness against adverse weather and lighting conditions. However, the variance in data acquired from different sensors poses challenges. In the context of continual learning (CL), incremental learning is especially challenging for considerably large domain shifts, e.g. different sensor modalities. This amplifies the problem of catastrophic forgetting. To address this issue, we formulate the concept of modality-incremental learning and examine its necessity, by contrasting it with existing incremental learning paradigms. We propose the use of a modified Relevance Mapping Network (RMN) to incrementally learn new modalities while preserving performance on previously learned modalities, in which relevance maps are disjoint. Experimental results demonstrate that the prevention of shared connections in this approach helps alleviate the problem of forgetting within the constraints of a strict continual learning framework.
Comment: Accepted at WACV 2025
Published: 2024

3. AnonyNoise: Anonymizing Event Data with Smart Noise to Outsmart Re-Identification and Preserve Privacy

Author: Bendig, Katharina, Schuster, René, Thiemer, Nicole, Joisten, Karen, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The increasing capabilities of deep neural networks for re-identification, combined with the rise in public surveillance in recent years, pose a substantial threat to individual privacy. Event cameras were initially considered as a promising solution since their output is sparse and therefore difficult for humans to interpret. However, recent advances in deep learning prove that neural networks are able to reconstruct high-quality grayscale images and re-identify individuals using data from event cameras. In our paper, we contribute a crucial ethical discussion on data privacy and present the first event anonymization pipeline to prevent re-identification not only by humans but also by neural networks. Our method effectively introduces learnable data-dependent noise to cover personally identifiable information in raw event data, reducing attackers' re-identification capabilities by up to 60%, while maintaining substantial information for performing downstream tasks. Moreover, our anonymization generalizes well on unseen data and is robust against image reconstruction and inversion attacks. Code: https://github.com/dfki-av/AnonyNoise
Comment: Accepted at WACV25
Published: 2024

4. Composition-property extrapolation for compositionally complex solid solutions based on word embeddings

Author: Zhang, Lei, Banko, Lars, Schuhmann, Wolfgang, Ludwig, Alfred, and Stricker, Markus
Subjects: Condensed Matter - Materials Science
Abstract: Mastering the challenge of predicting properties of unknown materials with multiple principal elements (high entropy alloys/compositionally complex solid solutions) is crucial for the speedup in materials discovery. We show and discuss three models, using property data from two ternary systems (Ag-Pd-Ru; Ag-Pd-Pt), to predict material performance in the shared quaternary system (Ag-Pd-Pt-Ru). First, we apply Gaussian Process Regression (GPR) based on composition, which includes both Ag and Pd, achieving an initial correlation coefficient for the prediction (r) of 0.63 and a determination coefficient (r^2) of 0.08. Second, we present a version of the GPR model using word embedding-derived materials vectors as representations. Using materials-specific embedding vectors significantly improves the predictive capability, evident from an improved r^2 of 0.65. The third model is based on a 'standard vector method' which synthesizes weighted vector representations of material properties and then creates a reference vector that results in a very good correlation with the quaternary system's material performance (resulting r of 0.89). Our approach demonstrates that existing experimental data combined with latent knowledge of word embedding-based representations of materials can be used effectively for materials discovery where data is typically sparse.
Comment: 17 pages, 12 figures, pre-print
Published: 2024
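
Note on record 4: the abstract reports a correlation coefficient of 0.63 next to a determination coefficient of 0.08 for the same model. This is only consistent if the determination coefficient is computed as 1 - SS_res/SS_tot rather than as the square of the correlation (0.63 squared is about 0.40). The abstract does not give its formulas, so the sketch below is an assumption about the usual definitions, with made-up toy numbers, showing how predictions can correlate perfectly with the targets and still score a much lower determination coefficient.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation: measures linear association only."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Penalizes bias and wrong scale, which Pearson r ignores."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy data (hypothetical): predictions follow the trend exactly
# but are compressed toward the middle of the range.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 3.0, 3.5, 4.0, 4.5]

print("r   =", pearson_r(y_true, y_pred))
print("R^2 =", r_squared(y_true, y_pred))
```

With these toy values the correlation is perfect while the determination coefficient is noticeably lower, which is the same qualitative gap the abstract reports.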

5. SurgeoNet: Realtime 3D Pose Estimation of Articulated Surgical Instruments from Stereo Images using a Synthetically-trained Network

Author: Aboukhadra, Ahmed Tawfik, Robertini, Nadia, Malik, Jameel, Elhayek, Ahmed, Reis, Gerd, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Surgery monitoring in Mixed Reality (MR) environments has recently received substantial focus due to its importance in image-based decisions, skill assessment, and robot-assisted surgery. Tracking hands and articulated surgical instruments is crucial for the success of these applications. Due to the lack of annotated datasets and the complexity of the task, only a few works have addressed this problem. In this work, we present SurgeoNet, a real-time neural network pipeline to accurately detect and track surgical instruments from a stereo VR view. Our multi-stage approach is inspired by state-of-the-art neural-network architectural design, like YOLO and Transformers. We demonstrate the generalization capabilities of SurgeoNet in challenging real-world scenarios, achieved solely through training on synthetic data. The approach can be easily extended to any new set of articulated surgical instruments. SurgeoNet's code and data are publicly available.
Published: 2024

6. Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies

Author: Sarode, Shalini, Khan, Muhammad Saif Ullah, Shehzadi, Tahira, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition, I.2.6
Abstract: We propose ClassroomKD, a novel multi-mentor knowledge distillation framework inspired by classroom environments to enhance knowledge transfer between student and multiple mentors. Unlike traditional methods that rely on fixed mentor-student relationships, our framework dynamically selects and adapts the teaching strategies of diverse mentors based on their effectiveness for each data sample. ClassroomKD comprises two main modules: the Knowledge Filtering (KF) Module and the Mentoring Module. The KF Module dynamically ranks mentors based on their performance for each input, activating only high-quality mentors to minimize error accumulation and prevent information loss. The Mentoring Module adjusts the distillation strategy by tuning each mentor's influence according to the performance gap between the student and mentors, effectively modulating the learning pace. Extensive experiments on image classification (CIFAR-100 and ImageNet) and 2D human pose estimation (COCO Keypoints and MPII Human Pose) demonstrate that ClassroomKD significantly outperforms existing knowledge distillation methods. Our results highlight that a dynamic and adaptive approach to mentor selection and guidance leads to more effective knowledge transfer, paving the way for enhanced model performance through distillation.
Published: 2024

7. Continual Human Pose Estimation for Incremental Integration of Keypoints and Pose Variations

Author: Khan, Muhammad Saif Ullah, Khan, Muhammad Ahmed Ullah, Afzal, Muhammad Zeshan, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: This paper reformulates cross-dataset human pose estimation as a continual learning task, aiming to integrate new keypoints and pose variations into existing models without losing accuracy on previously learned datasets. We benchmark this formulation against established regularization-based methods for mitigating catastrophic forgetting, including EWC, LFL, and LwF. Moreover, we propose a novel regularization method called Importance-Weighted Distillation (IWD), which enhances conventional LwF by introducing a layer-wise distillation penalty and dynamic temperature adjustment based on layer importance for previously learned knowledge. This allows for a controlled adaptation to new tasks that respects the stability-plasticity balance critical in continual learning. Through extensive experiments across three datasets, we demonstrate that our approach outperforms existing regularization-based continual learning strategies. IWD shows an average improvement of 3.60% over the state-of-the-art LwF method. The results highlight the potential of our method to serve as a robust framework for real-world applications where models must evolve with new data without forgetting past knowledge.
Published: 2024
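
Note on record 7: the abstract describes IWD as an extension of LwF-style distillation with a per-layer penalty and a dynamic temperature. The sketch below shows only the generic ingredient it builds on, a temperature-softened distillation loss between a frozen old model's outputs and the new model's outputs. It is not the authors' IWD; the function names, the fixed temperature, and the example logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(old_logits, new_logits, temperature=2.0):
    """Cross-entropy between the softened outputs of the old (frozen)
    model and the new model. Minimizing it keeps the new model's
    predictions close to what the old model produced."""
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits, temperature)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))

# Hypothetical logits for one sample on a previously learned task.
old = [2.0, 0.5, -1.0]
drifted = [0.2, 1.8, -0.5]  # new model has drifted away from the old one

print("no drift:", distillation_loss(old, old))
print("drifted :", distillation_loss(old, drifted))
```

The loss is smallest when the new model reproduces the old outputs, so adding it to the new-task loss discourages forgetting. A method with dynamic temperature, as the abstract describes, would presumably vary the `temperature` argument rather than fix it.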

8. Text2CAD: Generating Sequential CAD Models from Beginner-to-Expert Level Text Prompts

Author: Khan, Mohammad Sadil, Sinha, Sankalp, Sheikh, Talha Uddin, Stricker, Didier, Ali, Sk Aziz, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Graphics
Abstract: Prototyping complex computer-aided design (CAD) models in modern software can be very time-consuming. This is due to the lack of intelligent systems that can quickly generate simpler intermediate parts. We propose Text2CAD, the first AI framework for generating text-to-parametric CAD models using designer-friendly instructions for all skill levels. Furthermore, we introduce a data annotation pipeline for generating text prompts based on natural language instructions for the DeepCAD dataset using Mistral and LLaVA-NeXT. The dataset contains ~170K models and ~660K text annotations, from abstract CAD descriptions (e.g., generate two concentric cylinders) to detailed specifications (e.g., draw two circles with center (x, y) and radius r_1, r_2, and extrude along the normal by d...). Within the Text2CAD framework, we propose an end-to-end transformer-based auto-regressive network to generate parametric CAD models from input texts. We evaluate the performance of our model through a mixture of metrics, including visual quality, parametric precision, and geometrical accuracy. Our proposed framework shows great potential in AI-aided design applications. Our source code and annotations will be publicly available.
Comment: Accepted in NeurIPS 2024 (Spotlight)
Published: 2024

9. BRep Boundary and Junction Detection for CAD Reverse Engineering

Author: Ali, Sk Aziz, Khan, Mohammad Sadil, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Multimedia
Abstract: In the machining process, 3D reverse engineering of the mechanical system is an integral, highly important, and yet time-consuming step to obtain parametric CAD models from 3D scans. Therefore, deep learning-based Scan-to-CAD modeling can offer designers enormous editability to quickly modify a CAD model, being able to parse all its structural compositions and design steps. In this paper, we propose a supervised boundary representation (BRep) detection network BRepDetNet from 3D scans of the CC3D and ABC datasets. We have carefully annotated the 50K and 45K scans of the two datasets with appropriate topological relations (e.g., next, mate, previous) between the geometrical primitives (i.e., boundaries, junctions, loops, faces) of their BRep data structures. The proposed solution decomposes the Scan-to-CAD problem into Scan-to-BRep, ensuring the right step towards feature-based modeling, and therefore, leveraging other existing BRep-to-CAD modeling methods. Our proposed Scan-to-BRep neural network learns to detect BRep boundaries and junctions by minimizing focal-loss and non-maximal suppression (NMS) during training time. Experimental results show that our BRepDetNet with NMS-Loss achieves impressive results.
Comment: 6 pages, 5 figures
Published: 2024
Full Text: View/download PDF
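
Note on record 9: the abstract mentions training with a focal loss for boundary and junction detection. As a reminder of what that term usually means, here is a minimal sketch of the standard binary focal loss for a single prediction. The `gamma` and `alpha` defaults are commonly used values, not values taken from this paper, and the per-point framing is an assumption.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one prediction.
    p: predicted probability of the positive class (e.g. 'is a boundary point')
    y: ground-truth label, 1 or 0
    The (1 - p_t)**gamma factor shrinks the loss of easy, well-classified
    examples so that rare positives are not swamped by the many negatives."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1).
print("easy positive:", focal_loss(0.9, 1))
print("hard positive:", focal_loss(0.1, 1))
```

Setting `gamma=0` reduces the expression to alpha-weighted cross-entropy, which is a quick way to check an implementation.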

10. ShapeAug++: More Realistic Shape Augmentation for Event Data

Author: Bendig, Katharina, Schuster, René, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The novel Dynamic Vision Sensors (DVSs) gained a great amount of attention recently as they are superior compared to RGB cameras in terms of latency, dynamic range and energy consumption. This is particularly of interest for autonomous applications since event cameras are able to alleviate motion blur and allow for night vision. One challenge in real-world autonomous settings is occlusion where foreground objects hinder the view on traffic participants in the background. The ShapeAug method addresses this problem by using simulated events resulting from objects moving on linear paths for event data augmentation. However, the shapes and movements lack complexity, making the simulation fail to resemble the behavior of objects in the real world. Therefore in this paper, we propose ShapeAug++, an extended version of ShapeAug which involves randomly generated polygons as well as curved movements. We show the superiority of our method on multiple DVS classification datasets, improving the top-1 accuracy by up to 3.7% compared to ShapeAug.
Comment: accepted in Lecture Notes in Computer Science (LNCS)
Published: 2024

11. GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Author: Oehri, Sven, Ebert, Nikolas, Abdullah, Ahmed, Stricker, Didier, and Wasenmüller, Oliver
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent studies showcase the competitive accuracy of Vision Transformers (ViTs) in relation to Convolutional Neural Networks (CNNs), along with their remarkable robustness. However, ViTs demand a large amount of data to achieve adequate performance, which makes their application to small datasets challenging, falling behind CNNs. To overcome this, we propose GenFormer, a data augmentation strategy utilizing generated images, thereby improving transformer accuracy and robustness on small-scale image classification tasks. In our comprehensive evaluation we propose Tiny ImageNetV2, -R, and -A as new test set variants of Tiny ImageNet by transferring established ImageNet generalization and robustness benchmarks to the small-scale data domain. Similarly, we introduce MedMNIST-C and EuroSAT-C as corrupted test set variants of established fine-grained datasets in the medical and aerial domain. Through a series of experiments conducted on small datasets of various domains, including Tiny ImageNet, CIFAR, EuroSAT and MedMNIST datasets, we demonstrate the synergistic power of our method, in particular when combined with common train and test time augmentations, knowledge distillation, and architectural design choices. Additionally, we prove the effectiveness of our approach under challenging conditions with limited training data, demonstrating significant improvements in both accuracy and robustness, bridging the gap between CNNs and ViTs in the small-scale dataset domain.
Comment: This paper has been accepted at International Conference on Pattern Recognition (ICPR), 2024
Published: 2024

12. G3FA: Geometry-guided GAN for Face Animation

Author: Javanmardi, Alireza, Pagani, Alain, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Animating human face images aims to synthesize a desired source identity in a natural-looking way mimicking a driving video's facial movements. In this context, Generative Adversarial Networks have demonstrated remarkable potential in real-time face reenactment using a single source image, yet are constrained by limited geometry consistency compared to graphic-based approaches. In this paper, we introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation. Our novel approach empowers the face animation model to incorporate 3D information using only 2D images, improving the image generation capabilities of the talking head synthesis model. We integrate inverse rendering techniques to extract 3D facial geometry properties, improving the feedback loop to the generator through a weighted average ensemble of discriminators. In our face reenactment model, we leverage 2D motion warping to capture motion dynamics along with orthogonal ray sampling and volume rendering techniques to produce the ultimate visual output. To evaluate the performance of our G3FA, we conducted comprehensive experiments using various evaluation protocols on VoxCeleb2 and TalkingHead benchmarks to demonstrate the effectiveness of our proposed framework compared to the state-of-the-art real-time face animation methods.
Comment: BMVC 2024, Accepted
Published: 2024

13. Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer

Author: Shehzadi, Tahira, Ifza, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The impressive advancements in semi-supervised learning have driven researchers to explore its potential in object detection tasks within the field of computer vision. Semi-Supervised Object Detection (SSOD) leverages a combination of a small labeled dataset and a larger, unlabeled dataset. This approach effectively reduces the dependence on large labeled datasets, which are often expensive and time-consuming to obtain. Initially, SSOD models encountered challenges in effectively leveraging unlabeled data and managing noise in generated pseudo-labels for unlabeled data. However, numerous recent advancements have addressed these issues, resulting in substantial improvements in SSOD performance. This paper presents a comprehensive review of 27 cutting-edge developments in SSOD methodologies, from Convolutional Neural Networks (CNNs) to Transformers. We delve into the core components of semi-supervised learning and its integration into object detection frameworks, covering data augmentation techniques, pseudo-labeling strategies, consistency regularization, and adversarial training methods. Furthermore, we conduct a comparative analysis of various SSOD models, evaluating their performance and architectural differences. We aim to ignite further research interest in overcoming existing challenges and exploring new directions in semi-supervised learning for object detection.
Published: 2024

14. CLEO: Continual Learning of Evolving Ontologies

Author: Muralidhara, Shishir, Bukhari, Saqib, Schneider, Georg, Stricker, Didier, and Schuster, René
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Continual learning (CL) addresses the problem of catastrophic forgetting in neural networks, which occurs when a trained model tends to overwrite previously learned information, when presented with a new task. CL aims to instill the lifelong learning characteristic of humans in intelligent systems, making them capable of learning continuously while retaining what was already learned. Current CL problems involve either learning new domains (domain-incremental) or new and previously unseen classes (class-incremental). However, general learning processes are not just limited to learning information, but also refinement of existing information. In this paper, we define CLEO - Continual Learning of Evolving Ontologies, as a new incremental learning setting under CL to tackle evolving classes. CLEO is motivated by the need for intelligent systems to adapt to real-world ontologies that change over time, such as those in autonomous driving. We use Cityscapes, PASCAL VOC, and Mapillary Vistas to define the task settings and demonstrate the applicability of CLEO. We highlight the shortcomings of existing CIL methods in adapting to CLEO and propose a baseline solution, called Modelling Ontologies (MoOn). CLEO is a promising new approach to CL that addresses the challenge of evolving ontologies in real-world applications. MoOn surpasses previous CL approaches in the context of CLEO.
Comment: Accepted to ECCV 2024
Published: 2024

15. EgoFlowNet: Non-Rigid Scene Flow from Point Clouds with Ego-Motion Support

Author: Battrawy, Ramy, Schuster, René, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent weakly-supervised methods for scene flow estimation from LiDAR point clouds are limited to explicit reasoning on object-level. These methods perform multiple iterative optimizations for each rigid object, which makes them vulnerable to clustering robustness. In this paper, we propose our EgoFlowNet - a point-level scene flow estimation network trained in a weakly-supervised manner and without object-based abstraction. Our approach predicts a binary segmentation mask that implicitly drives two parallel branches for ego-motion and scene flow. Unlike previous methods, we provide both branches with all input points and carefully integrate the binary mask into the feature extraction and losses. We also use a shared cost volume with local refinement that is updated at multiple scales without explicit clustering or rigidity assumptions. On realistic KITTI scenes, we show that our EgoFlowNet performs better than state-of-the-art methods in the presence of ground surface points.
Comment: This paper is published in BMVC2023 (pp. 441-443)
Published: 2024

16. RMS-FlowNet++: Efficient and Robust Multi-Scale Scene Flow Estimation for Large-Scale Point Clouds

Author: Battrawy, Ramy, Schuster, René, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The proposed RMS-FlowNet++ is a novel end-to-end learning-based architecture for accurate and efficient scene flow estimation that can operate on high-density point clouds. For hierarchical scene flow estimation, existing methods rely on expensive Farthest-Point-Sampling (FPS) to sample the scenes, must find large correspondence sets across the consecutive frames and/or must search for correspondences at a full input resolution. While this can improve the accuracy, it reduces the overall efficiency of these methods and limits their ability to handle large numbers of points due to memory requirements. In contrast to these methods, our architecture is based on an efficient design for hierarchical prediction of multi-scale scene flow. To this end, we develop a special flow embedding block that has two advantages over the current methods: First, a smaller correspondence set is used, and second, the use of Random-Sampling (RS) is possible. In addition, our architecture does not need to search for correspondences at a full input resolution. Exhibiting high accuracy, our RMS-FlowNet++ provides a faster prediction than state-of-the-art methods, avoids high memory requirements and enables efficient scene flow on dense point clouds of more than 250K points at once. Our comprehensive experiments verify the accuracy of RMS-FlowNet++ on the established FlyingThings3D data set with different point cloud densities and validate our design choices. Furthermore, we demonstrate that our model has a competitive ability to generalize to the real-world scenes of the KITTI data set without fine-tuning.
Comment: This version of the article has been accepted by International Journal of Computer Vision (IJCV), and published on 23.05.2024
Published: 2024
Full Text: View/download PDF
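
Note on record 16: the abstract contrasts Farthest-Point-Sampling (FPS) with Random-Sampling (RS) as ways to subsample a point cloud. The sketch below illustrates the cost difference in plain Python: greedy FPS needs a distance pass over all points for every sample it picks, while RS is a single draw. This is a generic illustration, not the paper's implementation; the function names and the toy cloud are assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the set chosen
    so far. Cost is roughly N * k distance evaluations."""
    rng = random.Random(seed)
    start = rng.randrange(len(points))
    chosen = [start]
    # nearest[i] = squared distance from point i to its closest chosen point
    nearest = [squared_distance(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], squared_distance(p, points[nxt]))
    return chosen

def random_sampling(points, k, seed=0):
    """RS: draw k distinct indices uniformly, with no distance computations."""
    return random.Random(seed).sample(range(len(points)), k)

# Toy point cloud (hypothetical): 500 random points in the unit cube.
rng = random.Random(42)
cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
fps_idx = farthest_point_sampling(cloud, 16)
rs_idx = random_sampling(cloud, 16)
print(len(fps_idx), len(rs_idx))
```

FPS gives more even coverage of the scene, which is why it is popular, but its cost grows with both the cloud size and the number of samples. That trade-off is what makes an architecture that tolerates RS attractive for very dense clouds.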

17. Continuous Associations between Remote Self-Administered Cognitive Measures and Imaging Biomarkers of Alzheimer’s Disease

Author: Boots, E. A., Frank, R. D., Fan, W. Z., Christianson, T. J., Kremers, W. K., Stricker, J. L., Machulda, M. M., Fields, J. A., Hassenstab, J., Graff-Radford, J., Vemuri, P., Jack, C. R., Knopman, D. S., Petersen, R. C., and Stricker, Nikki H.
Published: 2024
Full Text: View/download PDF

18. Unlocking the Potential of Operations Research for Multi-Graph Matching

Author: Kahl, Max, Stricker, Sebastian, Hutschenreiter, Lisa, Bernard, Florian, and Savchynskyy, Bogdan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We consider the incomplete multi-graph matching problem, which is a generalization of the NP-hard quadratic assignment problem for matching multiple finite sets. Multi-graph matching plays a central role in computer vision, e.g., for matching images or shapes, so that a number of dedicated optimization techniques have been proposed. While the closely related NP-hard multi-dimensional assignment problem (MDAP) has been studied for decades in the operations research community, it only considers complete matchings and has a different cost structure. We bridge this gap and transfer well-known approximation algorithms for the MDAP to incomplete multi-graph matching. To this end, we revisit respective algorithms, adapt them to incomplete multi-graph matching, and propose their extended and parallelized versions. Our experimental validation shows that our new method substantially outperforms the previous state of the art in terms of objective and runtime. Our algorithm matches, for example, 29 images with more than 500 keypoints each in less than two minutes, whereas the fastest considered competitor requires at least half an hour while producing far worse results.
Published: 2024

19. Shape2.5D: A Dataset of Texture-less Surfaces for Depth and Normals Estimation

Author: Khan, Muhammad Saif Ullah, Sinha, Sankalp, Stricker, Didier, Liwicki, Marcus, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Reconstructing texture-less surfaces poses unique challenges in computer vision, primarily due to the lack of specialized datasets that cater to the nuanced needs of depth and normals estimation in the absence of textural information. We introduce "Shape2.5D," a novel, large-scale dataset designed to address this gap. Comprising 1.17 million frames spanning over 39,772 3D models and 48 unique objects, our dataset provides depth and surface normal maps for texture-less object reconstruction. The proposed dataset includes synthetic images rendered with 3D modeling software to simulate various lighting conditions and viewing angles. It also includes a real-world subset comprising 4,672 frames captured with a depth camera. Our comprehensive benchmarks demonstrate the dataset's ability to support the development of algorithms that robustly estimate depth and normals from RGB images and perform voxel reconstruction. Our open-source data generation pipeline allows the dataset to be extended and adapted for future research. The dataset is publicly available at https://github.com/saifkhichi96/Shape25D.
Comment: Accepted for publication in IEEE Access
Published: 2024
Full Text: View/download PDF

20. Dislocation cartography: Representations and unsupervised classification of dislocation networks with unique fingerprints

Author: Udofia, Benjamin, Jogi, Tushar, and Stricker, Markus
Subjects: Condensed Matter - Materials Science, Computer Science - Machine Learning
Abstract: Detecting structure in data is the first step to arrive at meaningful representations for systems. This is particularly challenging for dislocation networks evolving as a consequence of plastic deformation of crystalline systems. Our study employs Isomap, a manifold learning technique, to unveil the intrinsic structure of high-dimensional density field data of dislocation structures from different compression axes. The resulting maps provide a systematic framework for quantitatively comparing dislocation structures, offering unique fingerprints based on density fields. Our novel, unbiased approach contributes to the quantitative classification of dislocation structures which can be systematically extended.
Comment: 26 pages, 7 figures
Published: 2024

21. Enhanced Bank Check Security: Introducing a Novel Dataset and Transformer-Based Approach for Detection and Verification

Author: Khan, Muhammad Saif Ullah, Shehzadi, Tahira, Noor, Rabeya, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Automated signature verification on bank checks is critical for fraud prevention and ensuring transaction authenticity. This task is challenging due to the coexistence of signatures with other textual and graphical elements on real-world documents. Verification systems must first detect the signature and then validate its authenticity, a dual challenge often overlooked by current datasets and methodologies focusing only on verification. To address this gap, we introduce a novel dataset specifically designed for signature verification on bank checks. This dataset includes a variety of signature styles embedded within typical check elements, providing a realistic testing ground for advanced detection methods. Moreover, we propose a novel approach for writer-independent signature verification using an object detection network. Our detection-based verification method treats genuine and forged signatures as distinct classes within an object detection framework, effectively handling both detection and verification. We employ a DINO-based network augmented with a dilation module to detect and verify signatures on check images simultaneously. Our approach achieves an AP of 99.2 for genuine and 99.4 for forged signatures, a significant improvement over the DINO baseline, which scored 93.1 and 89.3 for genuine and forged signatures, respectively. This improvement highlights our dilation module's effectiveness in reducing both false positives and negatives. Our results demonstrate substantial advancements in detection-based signature verification technology, offering enhanced security and efficiency in financial document processing.
Comment: Accepted for publication in 16th IAPR International Workshop on Document Analysis Systems 2024
Published: 2024

22. Situational Instructions Database: Task Guidance in Dynamic Environments

Author: Khan, Muhammad Saif Ullah, Sinha, Sankalp, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Situational Instructions Database (SID) addresses the need for enhanced situational awareness in artificial intelligence (AI) systems operating in dynamic environments. By integrating detailed scene graphs with dynamically generated, task-specific instructions, SID provides a novel dataset that allows AI systems to perform complex, real-world tasks with improved context sensitivity and operational accuracy. This dataset leverages advanced generative models to simulate a variety of realistic scenarios based on the 3D Semantic Scene Graphs (3DSSG) dataset, enriching it with scenario-specific information that details environmental interactions and tasks. SID facilitates the development of AI applications that can adapt to new and evolving conditions without extensive retraining, supporting research in autonomous technology and AI-driven decision-making processes. This dataset is instrumental in developing robust, context-aware AI agents capable of effectively navigating and responding to unpredictable settings. Available for research and development, SID serves as a critical resource for advancing the capabilities of intelligent systems in complex environments. Dataset available at https://github.com/mindgarage/situational-instructions-database.
Comment: 9 pages, 6 figures
Published: 2024

23. UnSupDLA: Towards Unsupervised Document Layout Analysis

Author: Sheikh, Talha Uddin, Shehzadi, Tahira, Hashmi, Khurram Azeem, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Document layout analysis is a key area in document research, involving techniques like text mining and visual analysis. Despite various methods developed to tackle layout analysis, a critical but frequently overlooked problem is the scarcity of labeled data needed for analyses. With the rise of internet use, an overwhelming number of documents are now available online, making the process of accurately labeling them for research purposes increasingly challenging and labor-intensive. Moreover, the diversity of documents online presents a unique set of challenges in maintaining the quality and consistency of these labels, further complicating document layout analysis in the digital era. To address this, we employ a vision-based approach for analyzing document layouts designed to train a network without labels. Instead, we focus on pre-training, initially generating simple object masks from the unlabeled document images. These masks are then used to train a detector, enhancing object detection and segmentation performance. The model's effectiveness is further amplified through several unsupervised training iterations, continuously refining its performance. This approach significantly advances document layout analysis, particularly in precision and efficiency, without labels.
Comment: ICDAR 2024 - Workshop
Published: 2024

24. Unifying atoms and colloids near the glass transition through bond-order topology

Author: Stricker, Laura, Derlet, Peter M., Demirörs, Ahmet Faik, Vutukuri, Hanumantha Rao, and Vermant, Jan
Subjects: Condensed Matter - Soft Condensed Matter, 82D30
Abstract: In this combined experimental and simulation study, we utilize bond-order topology to quantitatively match particle volume fraction in mechanically uniformly compressed colloidal suspensions with temperature in atomistic simulations. The obtained mapping temperature is above the dynamical glass transition temperature, indicating that the colloidal systems examined are structurally most like simulated undercooled liquids. Furthermore, the structural mapping procedure offers a unifying framework for quantifying relaxation in arrested colloidal systems.
Comment: Main: 6 pages, 3 figures. Supplementary Material: 10 pages, 14 figures
Published: 2024
Full Text: View/download PDF

25. Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach

Author: Khan, Muhammad Saif Ullah, Limbachiya, Dhavalkumar, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Human pose estimation is a key task in computer vision with various applications such as activity recognition and interactive systems. However, the lack of consistency in the annotated skeletons across different datasets poses challenges in developing universally applicable models. To address this challenge, we propose a novel approach integrating multi-teacher knowledge distillation with a unified skeleton representation. Our networks are jointly trained on the COCO and MPII datasets, containing 17 and 16 keypoints, respectively. We demonstrate enhanced adaptability by predicting an extended set of 21 keypoints, 4 (COCO) and 5 (MPII) more than original annotations, improving cross-dataset generalization. Our joint models achieved an average accuracy of 70.89 and 76.40, compared to 53.79 and 55.78 when trained on a single dataset and evaluated on both. Moreover, we also evaluate all 21 predicted points by our two models by reporting an AP of 66.84 and 72.75 on the Halpe dataset. This highlights the potential of our technique to address one of the most pressing challenges in pose estimation research and application - the inconsistency in skeletal annotations.
Comment: 15 pages (with references)
Published: 2024

26. End-to-End Semi-Supervised approach with Modulated Object Queries for Table Detection in Documents

Author: Ehsan, Iqraa, Shehzadi, Tahira, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Table detection, a pivotal task in document analysis, aims to precisely recognize and locate tables within document images. Although deep learning has shown remarkable progress in this realm, it typically requires an extensive dataset of labeled data for proficient training. Current CNN-based semi-supervised table detection approaches use the anchor generation process and Non-Maximum Suppression (NMS) in their detection process, limiting training efficiency. Meanwhile, transformer-based semi-supervised techniques adopted a one-to-one match strategy that provides noisy pseudo-labels, limiting overall efficiency. This study presents an innovative transformer-based semi-supervised table detector. It improves the quality of pseudo-labels through a novel matching strategy combining one-to-one and one-to-many assignment techniques. This approach significantly enhances training efficiency during the early stages, ensuring superior pseudo-labels for further training. Our semi-supervised approach is comprehensively evaluated on benchmark datasets, including PubLayNet, ICDAR-19, and TableBank. It achieves new state-of-the-art results, with a mAP of 95.7% and 97.9% on TableBank (word) and PubLayNet with 30% label data, marking a 7.4 and 7.6 point improvement over previous semi-supervised table detection approaches, respectively. The results clearly show the superiority of our semi-supervised approach, surpassing all existing state-of-the-art methods by substantial margins. This research represents a significant advancement in semi-supervised table detection methods, offering a more efficient and accurate solution for practical document analysis tasks.
Comment: ICDAR-IJDAR 2024
Published: 2024

27. CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification

Author: Sinha, Sankalp, Khan, Muhammad Saif Ullah, Sheikh, Talha Uddin, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Zero-shot learning has been extensively investigated in the broader field of visual recognition, attracting significant interest recently. However, the current work on zero-shot learning in document image classification remains scarce. The existing studies either focus exclusively on zero-shot inference, or their evaluation does not align with the established criteria of zero-shot evaluation in the visual recognition domain. We provide a comprehensive document image classification analysis in Zero-Shot Learning (ZSL) and Generalized Zero-Shot Learning (GZSL) settings to address this gap. Our methodology and evaluation align with the established practices of this domain. Additionally, we propose zero-shot splits for the RVL-CDIP dataset. Furthermore, we introduce CICA (pronounced 'ki-ka'), a framework that enhances the zero-shot learning capabilities of CLIP. CICA consists of a novel 'content module' designed to leverage any generic document-related textual information. The discriminative features extracted by this module are aligned with CLIP's text and image features using a novel 'coupled-contrastive' loss. Our module improves CLIP's ZSL top-1 accuracy by 6.7% and GZSL harmonic mean by 24% on the RVL-CDIP dataset. Our module is lightweight and adds only 3.3% more parameters to CLIP. Our work sets the direction for future research in zero-shot document classification.
Comment: 18 Pages, 4 Figures and Accepted in ICDAR 2024
Published: 2024

28. Towards End-to-End Semi-Supervised Table Detection with Semantic Aligned Matching Transformer

Author: Shehzadi, Tahira, Sarode, Shalini, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Table detection within document images is a crucial task in document processing, involving the identification and localization of tables. Recent strides in deep learning have substantially improved the accuracy of this task, but it still heavily relies on large labeled datasets for effective training. Several semi-supervised approaches have emerged to overcome this challenge, often employing CNN-based detectors with anchor proposals and post-processing techniques like non-maximal suppression (NMS). However, recent advancements in the field have shifted the focus towards transformer-based techniques, eliminating the need for NMS and emphasizing object queries and attention mechanisms. Previous research has focused on two key areas to improve transformer-based detectors: refining the quality of object queries and optimizing attention mechanisms. However, increasing object queries can introduce redundancy, while adjustments to the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach employing SAM-DETR, a novel approach for precise alignment between object queries and target features. Our approach demonstrates remarkable reductions in false positives and substantial enhancements in table detection performance, particularly in complex documents characterized by diverse table structures. This work provides more efficient and accurate table detection in semi-supervised settings.
Comment: ICDAR 2024
Published: 2024

29. A Hybrid Approach for Document Layout Analysis in Document images

Author: Shehzadi, Tahira, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Document layout analysis involves understanding the arrangement of elements within a document. This paper navigates the complexities of understanding various elements within document images, such as text, images, tables, and headings. The approach employs an advanced Transformer-based object detection network as an innovative graphical page object detector for identifying tables, figures, and displayed elements. We introduce a query encoding mechanism to provide high-quality object queries for contrastive learning, enhancing efficiency in the decoder phase. We also present a hybrid matching scheme that integrates the decoder's original one-to-one matching strategy with the one-to-many matching strategy during the training phase. This approach aims to improve the model's accuracy and versatility in detecting various graphical elements on a page. Our experiments on PubLayNet, DocLayNet, and PubTables benchmarks show that our approach outperforms current state-of-the-art methods. It achieves an average precision of 97.3% on PubLayNet, 81.6% on DocLayNet, and 98.6 on PubTables, demonstrating its superior performance in layout analysis. These advancements not only enhance the conversion of document images into editable and accessible formats but also streamline information retrieval and data extraction processes.
Comment: ICDAR 2024
Published: 2024

30. Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection

Author: Shehzadi, Tahira, Hashmi, Khurram Azeem, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this paper, we address the limitations of the DETR-based semi-supervised object detection (SSOD) framework, particularly focusing on the challenges posed by the quality of object queries. In DETR-based SSOD, the one-to-one assignment strategy provides inaccurate pseudo-labels, while the one-to-many assignment strategy leads to overlapping predictions. These issues compromise training efficiency and degrade model performance, especially in detecting small or occluded objects. We introduce Sparse Semi-DETR, a novel transformer-based, end-to-end semi-supervised object detection solution to overcome these challenges. Sparse Semi-DETR incorporates a Query Refinement Module to enhance the quality of object queries, significantly improving detection capabilities for small and partially obscured objects. Additionally, we integrate a Reliable Pseudo-Label Filtering Module that selectively filters high-quality pseudo-labels, thereby enhancing detection accuracy and consistency. On the MS-COCO and Pascal VOC object detection benchmarks, Sparse Semi-DETR achieves a significant improvement over current state-of-the-art methods, highlighting its effectiveness in semi-supervised object detection, particularly in challenging scenarios involving small or partially obscured objects.
Comment: CVPR2024
Published: 2024

31. Continental-scale nutrient and contaminant delivery by Pacific salmon

Author: Brandt, Jessica E., Wesner, Jeff S., Ruggerone, Gregory T., Jardine, Timothy D., Eagles-Smith, Collin A., Ruso, Gabrielle E., Stricker, Craig A., Voss, Kristofor A., and Walters, David M.
Published: 2024
Full Text: View/download PDF

32. SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks

Author: Xie, Yaxu, Pagani, Alain, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
Abstract: Scene graphs have been recently introduced into 3D spatial understanding as a comprehensive representation of the scene. The alignment between 3D scene graphs is the first step of many downstream tasks such as scene graph aided point cloud registration, mosaicking, overlap checking, and robot navigation. In this work, we treat 3D scene graph alignment as a partial graph-matching problem and propose to solve it with a graph neural network. We reuse the geometric features learned by a point cloud registration method and associate the clustered point-level geometric features with the node-level semantic feature via our designed feature fusion module. Partial matching is enabled by using a learnable method to select the top-k similar node pairs. Subsequent downstream tasks such as point cloud registration are achieved by running a pre-trained registration network within the matched regions. We further propose a point-matching rescoring method that uses the node-wise alignment of the 3D scene graph to reweight the matching candidates from a pre-trained point cloud registration method. It reduces the false point correspondences estimated, especially in low-overlapping cases. Experiments show that our method improves the alignment accuracy by 10-20% in low-overlap and random transformation scenarios and outperforms the existing work in multiple downstream tasks.
Comment: 16 pages, 10 figures
Published: 2024

33. Employing constrained non-negative matrix factorization for microstructure segmentation

Author: Chauniyal, Ashish, Thome, Pascal, and Stricker, Markus
Subjects: Condensed Matter - Materials Science
Abstract: Materials characterization using electron backscatter diffraction (EBSD) requires indexing the orientation of the measured region from Kikuchi patterns. The quality of Kikuchi patterns can degrade due to pattern overlaps arising from two or more orientations, in the presence of defects or grain boundaries. In this work we employ constrained non-negative matrix factorization to segment a microstructure with small grain misorientations (< 1 degree) and predict the amount of pattern overlap. First, we implement the method on mixed simulated patterns, which replicate a pattern overlap scenario, and demonstrate the resolution limit of pattern mixing or factorization resolution using a weight metric. Subsequently, we segment a single-crystal dendritic microstructure and compare the results with high resolution EBSD. By utilizing weight metrics across a low angle grain boundary we demonstrate how very small misorientations/low-angle grain boundaries can be resolved at a pixel level. Our approach constitutes a versatile and robust tool, complementing other fast indexing methods for microstructure characterization.
Comment: 22 pages, 7 figures
Published: 2024

34. Human Pose Descriptions and Subject-Focused Attention for Improved Zero-Shot Transfer in Human-Centric Classification Tasks

Author: Khan, Muhammad Saif Ullah, Naeem, Muhammad Ferjad, Tombari, Federico, Van Gool, Luc, Stricker, Didier, and Afzal, Muhammad Zeshan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: We present a novel LLM-based pipeline for creating contextual descriptions of human body poses in images using only auxiliary attributes. This approach facilitates the creation of the MPII Pose Descriptions dataset, which includes natural language annotations for 17,367 images containing people engaged in 410 distinct activities. We demonstrate the effectiveness of our pose descriptions in enabling zero-shot human-centric classification using CLIP. Moreover, we introduce the FocusCLIP framework, which incorporates Subject-Focused Attention (SFA) in CLIP for improved text-to-image alignment. Our models were pretrained on the MPII Pose Descriptions dataset and their zero-shot performance was evaluated on five unseen datasets covering three tasks. FocusCLIP outperformed the baseline CLIP model, achieving an average accuracy increase of 8.61% (33.65% compared to CLIP's 25.04%). Notably, our approach yielded improvements of 3.98% in activity recognition, 14.78% in age classification, and 7.06% in emotion recognition. These results highlight the potential of integrating detailed pose descriptions and subject-level guidance into general pretraining frameworks for enhanced performance in downstream tasks.
Published: 2024

35. MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

Author: Chang, Chun-Peng, Wang, Shaoxiang, Pagani, Alain, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D visual grounding involves matching natural language descriptions with their corresponding objects in 3D spaces. Existing methods often face challenges with accuracy in object recognition and struggle in interpreting complex linguistic queries, particularly with descriptions that involve multiple anchors or are view-dependent. In response, we present the MiKASA (Multi-Key-Anchor Scene-Aware) Transformer. Our novel end-to-end trained model integrates a self-attention-based scene-aware object encoder and an original multi-key-anchor technique, enhancing object recognition accuracy and the understanding of spatial relationships. Furthermore, MiKASA improves the explainability of decision-making, facilitating error diagnosis. Our model achieves the highest overall accuracy in the Referit3D challenge for both the Sr3D and Nr3D datasets, particularly excelling by a large margin in categories that require viewpoint-dependent descriptions.
Published: 2024

36. Chitchat as Interference: Adding User Backstories to Task-Oriented Dialogues

Author: Stricker, Armand and Paroubek, Patrick
Subjects: Computer Science - Computation and Language
Abstract: During task-oriented dialogues (TODs), human users naturally introduce chitchat that is beyond the immediate scope of the task, interfering with the flow of the conversation. To address this issue without the need for expensive manual data creation, we use few-shot prompting with Llama-2-70B to enhance the MultiWOZ dataset with user backstories, a typical example of chitchat interference in TODs. We assess the impact of this addition by testing two models: one trained solely on TODs and another trained on TODs with a preliminary chitchat interaction. Our analysis demonstrates that our enhanced dataset poses a challenge for these systems. Moreover, we demonstrate that our dataset can be effectively used for training purposes, enabling a system to consistently acknowledge the user's backstory while also successfully moving the task forward in the same turn, as confirmed by human evaluation. These findings highlight the benefits of generating novel chitchat-TOD scenarios to test TOD systems more thoroughly and improve their resilience to natural user interferences.
Comment: Accepted @ LREC-COLING 2024
Published: 2024

37. Speech foundation models in healthcare: Effect of layer selection on pathological speech feature prediction

Author: Wiepert, Daniela A., Utianski, Rene L., Duffy, Joseph R., Stricker, John L., Barnard, Leland R., Jones, David T., and Botha, Hugo
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Accurately extracting clinical information from speech is critical to the diagnosis and treatment of many neurological conditions. As such, there is interest in leveraging AI for automatic, objective assessments of clinical speech to facilitate diagnosis and treatment of speech disorders. We explore transfer learning using foundation models, focusing on the impact of layer selection for the downstream task of predicting pathological speech features. We find that selecting an optimal layer can greatly improve performance (~15.8% increase in balanced accuracy per feature as compared to worst layer, ~13.6% increase as compared to final layer), though the best layer varies by predicted feature and does not always generalize well to unseen data. A learned weighted sum offers comparable performance to the average best layer in-distribution (only ~1.2% lower) and had strong generalization for out-of-distribution data (only 1.5% lower than the average best layer).
Comment: Accepted to INTERSPEECH 2024
Published: 2024

38. A Unified Approach to Emotion Detection and Task-Oriented Dialogue Modeling

Author: Stricker, Armand and Paroubek, Patrick
Subjects: Computer Science - Computation and Language
Abstract: In current text-based task-oriented dialogue (TOD) systems, user emotion detection (ED) is often overlooked or is typically treated as a separate and independent task, requiring additional training. In contrast, our work demonstrates that seamlessly unifying ED and TOD modeling brings about mutual benefits, and is therefore an alternative to be considered. Our method consists in augmenting SimpleToD, an end-to-end TOD system, by extending belief state tracking to include ED, relying on a single language model. We evaluate our approach using GPT-2 and Llama-2 on the EmoWOZ benchmark, a version of MultiWOZ annotated with emotions. Our results reveal a general increase in performance for ED and task results. Our findings also indicate that user emotions provide useful contextual conditioning for system responses, and can be leveraged to further refine responses in terms of empathy.
Comment: Accepted @ IWSDS 2024
Published: 2024

39. ShapeAug: Occlusion Augmentation for Event Camera Data

Author: Bendig, Katharina, Schuster, René, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, Dynamic Vision Sensors (DVSs) sparked a lot of interest due to their inherent advantages over conventional RGB cameras. These advantages include low latency, high dynamic range, and low energy consumption. Nevertheless, the processing of DVS data using Deep Learning (DL) methods remains a challenge, particularly since the availability of event training data is still limited. This leads to a need for event data augmentation techniques in order to improve accuracy as well as to avoid over-fitting on the training data. Another challenge, especially in real-world automotive applications, is occlusion, meaning one object is hindering the view onto the object behind it. In this paper, we present a novel event data augmentation approach, which addresses this problem by introducing synthetic events for randomly moving objects in a scene. We test our method on multiple DVS classification datasets, resulting in a relative improvement of up to 6.5% in top-1 accuracy. Moreover, we apply our augmentation technique on the real-world Gen1 Automotive Event Dataset for object detection, where we especially improve the detection of pedestrians by up to 5%.
Comment: Accepted at ICPRAM 2024
Published: 2024

40. A multi-center international study to evaluate the safety, functional and oncological outcomes of irreversible electroporation for the ablation of prostate cancer

Author: Zhang, Kai, Stricker, Phillip, Löhr, Martin, Stehling, Michael, Suberville, Michel, Cussenot, Olivier, Lunelli, Luca, Ng, Chi-Fai, Teoh, Jeremy, Laguna, Pilar, and de la Rosette, Jean
Published: 2024
Full Text: View/download PDF

41. Associations of continuum beliefs with personality disorder stigma: correlational and experimental evidence

Author: Stricker, Johannes, Jakob, Louisa, and Pietrowsky, Reinhard
Published: 2024
Full Text: View/download PDF

42. A Relational Frame Theory-Based Intervention for Improving Reading and Mathematical Competencies Among School Children

Author: Stricker, Charles, Mao, Jin, Cassidy, Sarah, Colbert, Dylan, and Roche, Bryan
Published: 2024
Full Text: View/download PDF

43. Delphi consensus project on prostate-specific membrane antigen (PSMA)–targeted surgery—outcomes from an international multidisciplinary panel

Author: Berrens, Anne-Claire, Scheltema, Matthijs, Maurer, Tobias, Hermann, Ken, Hamdy, Freddie C., Knipper, Sophie, Dell’Oglio, Paolo, Mazzone, Elio, de Barros, Hilda A., Sorger, Jonathan M., van Oosterom, Matthias N., Stricker, Philip D., van Leeuwen, Pim J., Rietbergen, Daphne D. D., Valdes Olmos, Renato A., Vidal-Sicart, Sergi, Carroll, Peter R., Buckle, Tessa, van der Poel, Henk G., and van Leeuwen, Fijs W. B.
Published: 2024
Full Text: View/download PDF

44. Learned Fusion: 3D Object Detection using Calibration-Free Transformer Feature Fusion

Author: Fürst, Michael, Jakkamsetty, Rahul, Schuster, René, and Stricker, Didier
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The state of the art in 3D object detection using sensor fusion heavily relies on calibration quality, which is difficult to maintain in large scale deployment outside a lab environment. We present the first calibration-free approach for 3D object detection, thus eliminating the need for complex and costly calibration procedures. Our approach uses transformers to map the features between multiple views of different sensors at multiple abstraction levels. In an extensive evaluation for object detection, we not only show that our approach outperforms single modal setups by 14.1% in BEV mAP, but also that the transformer indeed learns the mapping. By showing calibration is not necessary for sensor fusion, we hope to motivate other researchers to follow the direction of calibration-free fusion. Additionally, the resulting approaches have a substantial resilience against rotation and translation changes.
Comment: 11 pages, 5 figures
Published: 2023

45. Question Answering in Natural Language: the Special Case of Temporal Expressions

Author: Stricker, Armand
Subjects: Computer Science - Computation and Language
Abstract: Although general question answering has been well explored in recent years, temporal question answering is a task which has not received as much focus. Our work aims to leverage a popular approach used for general question answering, answer extraction, in order to find answers to temporal questions within a paragraph. To train our model, we propose a new dataset, inspired by SQuAD, specifically tailored to provide rich temporal information. We chose to adapt the corpus WikiWars, which contains several documents on history's greatest conflicts. Our evaluation shows that a deep learning model trained to perform pattern matching, often used in general question answering, can be adapted to temporal question answering, provided that we restrict ourselves to questions whose answers are directly present within a text.
Comment: Accepted at Student Research Workshop associated with RANLP-2021
Published: 2023
Full Text: View/download PDF

46. Searching for Snippets of Open-Domain Dialogue in Task-Oriented Dialogue Datasets

Author: Stricker, Armand and Paroubek, Patrick
Subjects: Computer Science - Computation and Language
Abstract: Most existing dialogue corpora and models have been designed to fit into two predominant categories: task-oriented dialogues portray functional goals, such as making a restaurant reservation or booking a plane ticket, while chit-chat/open-domain dialogues focus on holding a socially engaging talk with a user. However, humans tend to seamlessly switch between modes and even use chitchat to enhance task-oriented conversations. To bridge this gap, new datasets have recently been created, blending both communication modes into conversation examples. The approaches used tend to rely on adding chit-chat snippets to pre-existing, human-generated task-oriented datasets. Given the tendencies observed in humans, we wonder, however, whether the latter do not already hold chit-chat sequences. By using topic modeling and searching for topics which are most similar to a set of keywords related to social talk, we explore the training sets of Schema-Guided Dialogues and MultiWOZ. Our study shows that sequences related to social talk are indeed naturally present, motivating further research on ways chitchat is combined into task-oriented dialogues.
Published: 2023

47. Enhancing Task-Oriented Dialogues with Chitchat: a Comparative Study Based on Lexical Diversity and Divergence

Author: Stricker, Armand and Paroubek, Patrick
Subjects: Computer Science - Computation and Language
Abstract: As a recent development, task-oriented dialogues (TODs) have been enriched with chitchat in an effort to make dialogues more diverse and engaging. This enhancement is particularly valuable as TODs are often confined to narrow domains, making the mitigation of repetitive and predictable responses a significant challenge. This paper presents a comparative analysis of three chitchat enhancements, aiming to identify the most effective approach in terms of diversity. Additionally, we quantify the divergence between the added chitchat, the original task-oriented language, and chitchat typically found in chitchat datasets, highlighting the top 20 divergent keywords for each comparison. Our findings drive a discussion on future enhancements for augmenting TODs, emphasizing the importance of grounding dialogues beyond the task to achieve more diverse and natural exchanges.
Comment: Accepted @ ASRU 2023. Code: https://github.com/armandstrickernlp/Task-Chitchat-Entropy
Published: 2023

48. HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Author: Lin, Yongliang, Su, Yongzhi, Nathan, Praveen, Inuganti, Sandeep, Di, Yan, Sundermeyer, Martin, Manhardt, Fabian, Stricker, Didier, Rambach, Jason, and Zhang, Yu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: In this work, we present a novel dense-correspondence method for 6DoF object pose estimation from a single RGB-D image. While many existing data-driven methods achieve impressive performance, they tend to be time-consuming due to their reliance on rendering-based refinement approaches. To circumvent this limitation, we present HiPose, which establishes 3D-3D correspondences in a coarse-to-fine manner with a hierarchical binary surface encoding. Unlike previous dense-correspondence methods, we estimate the correspondence surface by employing point-to-surface matching and iteratively constricting the surface until it becomes a correspondence point while gradually removing outliers. Extensive experiments on public benchmarks LM-O, YCB-V, and T-Less demonstrate that our method surpasses all refinement-free methods and is even on par with expensive refinement-based approaches. Crucially, our approach is computationally efficient and enables real-time critical applications with high accuracy requirements. Comment: CVPR 2024
Published: 2023

49. MatNexus: A Comprehensive Text Mining and Analysis Suite for Materials Discovery

Author: Zhang, Lei and Stricker, Markus
Subjects: Condensed Matter - Materials Science, Computer Science - Computation and Language, Physics - Chemical Physics, H.4, H.5, I.5, I.7, J.2
Abstract: MatNexus is a specialized software for the automated collection, processing, and analysis of text from scientific articles. Through an integrated suite of modules, MatNexus facilitates the retrieval of scientific articles, processes textual data for insights, generates vector representations suitable for machine learning, and offers visualization capabilities for word embeddings. With the vast volume of scientific publications, MatNexus stands out as an end-to-end tool for researchers aiming to gain insights from scientific literature in materials science, making the exploration of materials, such as the electrocatalyst examples we show here, efficient and insightful. Comment: 15 pages, 6 figures, submission to SoftwareX
Published: 2023

50. Microscopic insights on field induced switching and domain wall motion in orthorhombic ferroelectrics

Author: Khachaturyan, Ruben, Yang, Yijing, Teng, Sheng-Han, Udofia, Benjamin, Stricker, Markus, and Grünebohm, Anna
Subjects: Condensed Matter - Materials Science
Abstract: Surprisingly little is known about the microscopic processes that govern ferroelectric switching in orthorhombic ferroelectrics. To study microscopic switching processes we combine ab initio-based molecular dynamics simulations and data science on the prototypical material BaTiO$_3$. We reveal two different field regimes: For moderate field strengths, the switching is dominated by domain wall motion while a fast bulk-like switching can be induced for large fields. Switching in both field regimes follows a multi-step process via polarization directions perpendicular to the applied field. In the former case, the moving wall is of Bloch character and hosts dipole vortices due to nucleation, growth, and crossing of two-dimensional 90$^{\circ}$ domains. In the second case, the local polarization shows a continuous correlated rotation via an intermediate tetragonal multidomain state. Comment: 12 pages, 14 figures
Published: 2023
