15,180 results for "deep Neural Networks"
Search Results
2. Analysis of data-driven approaches for radar target classification
- Author
- Coşkun, Aysu and Bilicz, Sándor
- Published
- 2024
- Full Text
- View/download PDF
3. A self-attention-based deep architecture for online handwriting recognition.
- Author
- Molavi, Seyed Alireza and BabaAli, Bagher
- Subjects
- ARTIFICIAL neural networks, AUTOMATIC speech recognition, NATURAL language processing, RECURRENT neural networks, ARTIFICIAL intelligence
- Abstract
The self-attention mechanism has in recent years been the most frequent and efficient way of processing and learning sequences in numerous domains of artificial intelligence, including natural language processing, automatic speech recognition, and computer vision. It has a strong ability to learn the dependencies between the points of the input sequence, particularly those separated by a distance, and it also allows for parallel processing of the sequence. As a result, when used for processing sequences, this mechanism extracts an appropriate representation from the input sequence at a faster rate than other approaches such as recurrent neural networks. Despite the benefits of the self-attention mechanism, recurrent neural networks along with feature engineering have been the most commonly employed approaches to online handwriting recognition. This study introduces an end-to-end online handwriting recognition system that incorporates the self-attention mechanism into three different modeling methods: CTC-based, RNN-T, and encoder–decoder. The proposed system recognizes handwritten scripts without the need for feature engineering. The system's performance was evaluated using the Arabic Online-KHATT dataset and the English IAM-OnDB dataset. On the former, it achieved a character error rate (CER) of 4.78% and a word error rate (WER) of 20.63%; on the latter, a CER of 4.10% and a WER of 14.31%, both noticeably better than previously reported results. Additionally, the Persian Online Handwriting Database was utilized for experimental validation, resulting in a CER of 8.03% and a WER of 28.39%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
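The self-attention abstract above rests on one computation: each sequence point attends to every other point via scaled dot products, which is what lets the model capture long-range dependencies in parallel. A minimal pure-Python sketch of single-head scaled dot-product attention (a generic illustration, not the paper's model):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of vectors.
    Each query attends to all keys, so distant points interact directly."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, V))
                    for i in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform and the output is the mean of the values; with a key matching the query, nearly all weight concentrates on that position.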
4. Enhanced speech emotion recognition using averaged valence arousal dominance mapping and deep neural networks.
- Author
- Rizhinashvili, Davit, Sham, Abdallah Hussein, and Anbarjafari, Gholamreza
- Abstract
This study delves into advancements in speech emotion recognition (SER) by establishing a novel approach for emotion mapping and prediction using the Valence-Arousal-Dominance (VAD) model. Central to this research is the creation of reliable emotion-to-VAD mappings, achieved by averaging outcomes from multiple pre-trained networks applied to the RAVDESS dataset. This approach adeptly resolves prior inconsistencies in emotion-to-VAD mappings and establishes a dependable framework for SER. The study also introduces a refined SER model, integrating the pre-trained Wave2Vec 2.0 with Long Short-Term Memory (LSTM) networks and linear layers, culminating in an output layer representing valence, arousal, and dominance. Notably, this model exhibits commendable accuracy across various datasets, such as RAVDESS, EMO-DB, CREMA-D, and TESS, thereby showcasing its robustness and adaptability, an improvement over earlier models susceptible to dataset-specific overfitting. The research further unveils a comprehensive speech analysis application, adept at denoising, segmenting, and profiling emotions in speech segments. This application features interactive emotion tracking and sentiment reports, illustrating its practicality in diverse applications. The study recognizes ongoing challenges in SER, especially in managing the subjective nature of emotion perception and integrating multimodal data. Although the research marks a progression in SER technology, it underscores the need for continuous research and careful consideration of ethical aspects in deploying such technologies. This work contributes to the SER domain by introducing a dependable method for emotion mapping, a robust model for emotion recognition, and a user-friendly application for practical implementations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Dual-Stream CoAtNet models for accurate breast ultrasound image segmentation.
- Author
- Zaidkilani, Nadeem, Garcia, Miguel Angel, and Puig, Domenec
- Subjects
- ARTIFICIAL neural networks, BREAST ultrasound, TRANSFORMER models, ULTRASONIC imaging, IMAGE segmentation, BREAST
- Abstract
The CoAtNet deep neural model has been shown to achieve state-of-the-art performance by stacking convolutional and self-attention layers. In particular, the initial layers of CoAtNet apply efficient convolutions for extracting local features out of the input image and the initial fine-resolution feature maps. In turn, the final layers apply more cumbersome Transformers in order to extract global features from the coarse-resolution feature maps. The model's outcome directly depends on those final global features. This paper proposes an extension of the original CoAtNet model based on the introduction of a dual stream of convolution and self-attention blocks applied at the final layers of CoAtNet. In this way, those final layers automatically aggregate both local and global features extracted from the initial feature maps. Two dual-stream topologies have been proposed and evaluated. This Dual-Stream CoAtNet model exhibits a significant improvement on the segmentation accuracy of breast ultrasound images, thus contributing to the development of more robust tumor detection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Bridging auditory perception and natural language processing with semantically informed deep neural networks.
- Author
- Esposito, Michele, Valente, Giancarlo, Plasencia-Calaña, Yenisel, Dumontier, Michel, Giordano, Bruno L., and Formisano, Elia
- Abstract
Sound recognition is effortless for humans but poses a significant challenge for artificial hearing systems. Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have recently surpassed traditional machine learning in sound classification. However, current DNNs map sounds to labels using binary categorical variables, neglecting the semantic relations between labels. Cognitive neuroscience research suggests that human listeners exploit such semantic information besides acoustic cues. Hence, our hypothesis is that incorporating semantic information improves DNN's sound recognition performance, emulating human behaviour. In our approach, sound recognition is framed as a regression problem, with CNNs trained to map spectrograms to continuous semantic representations from NLP models (Word2Vec, BERT, and CLAP text encoder). Two DNN types were trained: semDNN with continuous embeddings and catDNN with categorical labels, both with a dataset extracted from a collection of 388,211 sounds enriched with semantic descriptions. Evaluations across four external datasets confirmed the superiority of semantic labeling from semDNN compared to catDNN, preserving higher-level relations. Importantly, an analysis of human similarity ratings for natural sounds showed that semDNN approximated human listener behaviour better than catDNN, other DNNs, and NLP models. Our work contributes to understanding the role of semantics in sound recognition, bridging the gap between artificial systems and human auditory perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
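When sound recognition is framed as regression onto continuous embeddings, as in the abstract above, a label is recovered by finding the label embedding nearest to the network's output, typically by cosine similarity. A toy sketch of that retrieval step; the three-dimensional vectors and label names here are made-up stand-ins for real Word2Vec/BERT/CLAP embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_label(pred, label_embeddings):
    """Return the label whose embedding is most similar to the
    network's predicted continuous representation."""
    return max(label_embeddings, key=lambda name: cosine(pred, label_embeddings[name]))

# hypothetical label embeddings and a hypothetical network output
labels = {"dog bark": [0.9, 0.1, 0.0],
          "siren":    [0.0, 0.2, 0.95],
          "speech":   [0.3, 0.9, 0.1]}
pred = [0.8, 0.25, 0.05]
```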
7. Rescaling large datasets based on validation outcomes of a pre-trained network.
- Author
- Nguyen, Thanh Tuan and Nguyen, Thanh Phuong
- Published
- 2024
- Full Text
- View/download PDF
8. Distributed edge to cloud ensemble deep learning architecture to diagnose Covid-19 from lung image in IoT based e-Health system.
- Author
- Zamani, Mohammadreza and Sharifian, Saeed
- Subjects
- DEEP learning, LUNGS, ARTIFICIAL neural networks, COVID-19, ARTIFICIAL intelligence, COVID-19 pandemic, INTERNET of things
- Abstract
Today, with the expansion of technology and new deep learning architectures, the accuracy of artificial intelligence methods in diagnosing diseases has increased. On the other hand, with the spread of new pandemic diseases such as Covid-19, timely and accurate diagnosis has become more important. Recently proposed deep learning methods diagnose Covid-19 with acceptable accuracy but have a high computational cost, and so cannot be distributed to and implemented on edge devices. Sometimes the type of disease can be diagnosed by small models with few parameters. These small models can be placed in fog or edge devices, and if they detect the disease locally with high confidence, the investigation request is not sent to the cloud, where the comprehensive main trained model is located. Based on this idea, we propose an ensemble of two deep learning models using a boosting schema, named mobile COVID-Net. First, a lightweight MobileNet model is designed and embedded in fog devices to diagnose pneumonia and Covid-19, which have similar symptoms, with low computational cost and high confidence. If the embedded model fails to diagnose, a modified ResNet-based neural network in the second layer, designed to diagnose only Covid-19 with high precision, runs in the cloud. The distributed edge-to-cloud ensemble of neural network models, trained and tested on a publicly available dataset, achieved a total accuracy of 93.8% for detection of Covid-19, compared to the 92.4% and 92% accuracy of the COVID-Net and Inception algorithms, respectively. The most challenging part of the work is the accurate diagnosis of Covid-19 and pneumonia from one another with the least amount of error. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
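The edge-to-cloud idea described above is a confidence-threshold cascade: answer locally when the lightweight model is sure, otherwise escalate. A minimal sketch, where the stub lambdas, input strings, and 0.9 threshold are illustrative assumptions standing in for the MobileNet and ResNet models:

```python
def cascade_diagnose(x, edge_model, cloud_model, threshold=0.9):
    """Run the lightweight edge/fog model first; forward the case to
    the cloud model only when edge confidence is below the threshold."""
    label, conf = edge_model(x)
    if conf >= threshold:
        return label, "edge"          # answered locally, no cloud round-trip
    label, _ = cloud_model(x)
    return label, "cloud"

# hypothetical stand-ins for the edge (MobileNet) and cloud (ResNet) models
edge_model = lambda x: ("pneumonia", 0.97) if x == "clear-case" else ("covid-19", 0.55)
cloud_model = lambda x: ("covid-19", 0.93)
```

The design trade-off is that a higher threshold sends more cases to the cloud (more accuracy, more latency and bandwidth), while a lower one keeps more decisions on the edge.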
9. SRU-Net: a novel spatiotemporal attention network for sclera segmentation and recognition.
- Author
- Mashayekhbakhsh, Tara, Meshgini, Saeed, Rezaii, Tohid Yousefi, and Makouei, Somayeh
- Abstract
Segmenting sclera images for effective recognition under non-cooperative conditions poses a significant challenge due to the prevalent noise. While U-Net-based methods have shown success, their limitations in accurately segmenting objects with varying shapes necessitate innovative approaches. This paper introduces the spatiotemporal residual encoding and decoding network (SRU-Net), featuring multi-spatiotemporal feature integration (Ms-FI) modules and attention-pool mechanisms to enhance segmentation accuracy and robustness. Ms-FI modules within SRU-Net's encoders and decoders identify salient feature regions and prune responses, while attention-pool modules improve segmentation robustness. To assess the proposed SRU-Net, we conducted experiments using six datasets, employing precision, recall, and F1-score metrics. The experimental results demonstrate the superiority of SRU-Net over state-of-the-art methods. Specifically, SRU-Net achieves F1-score values of 94.58%, 98.31%, 98.49%, 97.52%, 95.3%, 97.47%, and 93.11% for MSD, MASD, SVBPI, MASD+MSD, UBIRIS.v1, UBIRIS.v2, and MICHE, respectively. Recognition tasks were further evaluated on the six datasets using metrics such as AUC, EER, VER@0.1%FAR, and VER@1%FAR. The proposed pipeline, comprising SRU-Net and autoencoders (AE), outperforms previous research on all datasets. Particularly noteworthy is the comparison of EER, where SRU-Net + AE exhibits the best recognition results, achieving an EER of 9.42%, 3.81%, and 5.73% for the MSD, MASD, and MICHE datasets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Development and evaluation of a deep neural network model for orthokeratology lens fitting.
- Author
- Yang, Hsiu‐Wan Wendy, Liang, Chih‐Kai Leon, Chou, Shih‐Chi, Wang, Hsin‐Hui, and Chiang, Huihua Kenny
- Subjects
- ARTIFICIAL neural networks, DEEP learning, CORNEAL topography, MACHINE learning, ORTHOKERATOLOGY
- Abstract
Purpose: To optimise the precision and efficacy of orthokeratology, this investigation evaluated a deep neural network (DNN) model for lens fitting. The objective was to refine the standardisation of fitting procedures and curtail subjective evaluations, thereby augmenting patient safety in the context of increasing global myopia. Methods: A retrospective study of successful orthokeratology treatment was conducted on 266 patients, with 449 eyes being analysed. A DNN model with an 80%–20% training-validation split predicted lens parameters (curvature, power and diameter) using corneal topography and refractive indices. The model featured two hidden layers for precision. Results: The DNN model achieved mean absolute errors of 0.21 D for alignment curvature (AC), 0.19 D for target power (TP) and 0.02 mm for lens diameter (LD), with R2 values of 0.97, 0.95 and 0.91, respectively. Accuracy decreased for myopia of less than 1.00 D, astigmatism exceeding 2.00 D and corneal curvatures >45.00 D. Approximately 2% of cases with unique physiological characteristics showed notable prediction variances. Conclusion: While exhibiting high accuracy, the DNN model's limitations in specific myopia, cylinder power and corneal curvature cases highlight the need for algorithmic refinement and clinical validation in orthokeratology practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
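The study above reports its regression quality as mean absolute error (MAE) and the coefficient of determination (R²). These are standard definitions, not the paper's code; a short sketch of both metrics:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus the ratio of residual
    to total variance (1.0 is a perfect fit, 0.0 matches the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```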
11. Learning Scatter Artifact Correction in Cone-Beam X-Ray CT Using Incomplete Projections with Beam Hole Array.
- Author
- Hattori, Haruki, Yatagawa, Tatsuya, Ohtake, Yutaka, and Suzuki, Hiromasa
- Abstract
X-ray cone-beam computed tomography (CBCT) is a powerful tool for nondestructive testing and evaluation, yet the CT image quality can be compromised by artifacts due to X-ray scattering within dense materials such as metals. This problem leads to the need for hardware- and software-based scatter artifact correction to enhance the image quality. Recently, deep learning techniques have emerged as a promising approach to obtain scatter-free images efficiently. However, these deep learning techniques rely heavily on training data, often gathered through simulation. Simulated CT images, unfortunately, do not accurately reproduce the real properties of objects, and physically accurate X-ray simulation still requires significant computation time, hindering the collection of a large number of CT images. To address these problems, we propose a deep learning framework for scatter artifact correction using projections obtained solely by real CT scanning. To this end, we utilize a beam-hole array (BHA) to block the X-rays deviating from the primary beam path, thereby capturing scatter-free X-ray intensity at certain detector pixels. As the BHA shadows a large portion of detector pixels, we incorporate several regularization losses to enhance the training process. Furthermore, we introduce radiographic data augmentation to mitigate the need for long scanning times, which is a concern as CT devices equipped with a BHA require two series of CT scans. Experimental validation showed that the proposed framework outperforms a baseline method that learns from simulated projections in which the entire image is visible and free of scattering artifacts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Local Interpretations for Explainable Natural Language Processing: A Survey.
- Author
- Luo, Siwen, Ivison, Hamish, Han, Soyeon Caren, and Poon, Josiah
- Published
- 2024
- Full Text
- View/download PDF
13. Automated barcodeless product classifier for food retail self-checkout images.
- Author
- Ciapas, Bernardas and Treigys, Povilas
- Subjects
- SELF-service stores, ARTIFICIAL neural networks, RETAIL stores, PLASTIC bags, CONSUMERS
- Abstract
The growing popularity of self-service in retail stores and the increasing associated shrinkage present an urgent need for computer-vision-based product recognition at self-checkouts. The article focuses on individual product recognition using an automated workflow in images collected from retail store self-checkouts. The interest of this research lies exclusively in the recognition of barcodeless products, which present the challenge of being identified quickly and accurately at self-checkouts. To the authors' knowledge, image sets representative of retail store product distributions did not exist at the time of writing. Images collected from self-checkout events often contain products partially covered by customer body parts, inside semi-transparent plastic bags, or not present in the area of interest. Due to the huge assortment of products, which varies between stores and changes frequently, manual image labeling, filtering, and long training times are impractical. The proposed method investigates the automated steps needed to eliminate empty images and images where product visibility is unsatisfactory. The authors achieved 80.5±1.2% classification accuracy on a real-world dataset of 194 products using the automatic workflow. Ablation studies proved the need for image filtering in both training and inference workflows. The neural network architecture tuned to the self-checkout dataset outperforms well-known networks: the suggested architecture's training time is a fraction of that of ImageNet's best EfficientNet, and its accuracy is slightly better. The suggested method's generalization is demonstrated on the comparable product dataset Fruits 360, where 99.6% accuracy was achieved, comparable to or better than other reported results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Fast continuous patch-based artistic style transfer for videos.
- Author
- Wu, Bing, Dong, Qingshuang, and Sun, Wenqing
- Subjects
- ARTISTIC style, ARTIFICIAL neural networks, OPTICAL flow, VIDEOS
- Abstract
Convolutional neural network-based image style transfer models often suffer from temporal inconsistency when applied to video. Although several video style transfer models have been proposed to improve temporal consistency, they often trade off processing speed, perceptual style quality, and temporal consistency. In this work, we propose a novel approach for fast continuous patch-based arbitrary video style transfer that achieves high-quality transfer results while maintaining temporal coherence. Our approach begins with stylizing the first frame as a standalone single image using patch propagation within the content activation. Subsequent frames are computed based on the key insight that the optical flow field evaluated from neighboring content activations provides meaningful information to preserve temporal coherence efficiently. To address the problems introduced by the optical flow stage, we additionally incorporate a correction procedure as a post-process to ensure a high-quality stylized video. Finally, we demonstrate that our method can transfer arbitrary styles on a set of examples and illustrate that our approach exhibits superior performance both qualitatively and quantitatively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Learning-based data-driven optimal deployment control of tethered space robot.
- Author
- Jin, Ao, Zhang, Fan, and Huang, Panfeng
- Subjects
- ARTIFICIAL neural networks, ROBOT dynamics, DEEP learning, OPTIMAL control theory, LINEAR systems, OPERATOR theory, ROBOTS, SPACE robotics
- Abstract
• A data-driven optimal control framework is proposed for TSR deployment.
• A linear representation of TSR's dynamics is derived with the Koopman operator.
• An enhanced deep learning method is proposed for finding embedding functions.
To avoid the complex constraints of traditional nonlinear methods for tethered space robot (TSR) deployment, a data-driven optimal control framework with an improved deep-learning-based Koopman operator is proposed in this work. In consideration of the nonlinearity of tethered space robot dynamics, its finite-dimensional global linear representation, called the lifted linear system, is derived with Koopman operator theory. A deep learning scheme is adopted to find the embedding functions associated with the Koopman operator, and an auxiliary neural network is developed to encode the nonlinear control term of the finite-dimensional lifted system. A controllability constraint is then considered for learning a controllable lifted linear system. In addition, two loss functions, relating to the reconstruction and prediction ability of the lifted linear system, are designed for training the deep neural network. With the learned lifted linear dynamics, a Linear Quadratic Regulator (LQR) is applied to derive the optimal control policy for tethered space robot deployment. Finally, simulation results verify the effectiveness of the proposed framework and show that it deploys the tethered space robot more quickly with less swing of the in-plane angle. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
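The payoff of the Koopman lifting described above is that once the dynamics are linear, standard LQR applies. As a minimal illustration, here is a scalar stand-in for the lifted system x' = a·x + b·u, with the LQR gain obtained by fixed-point iteration of the discrete Riccati equation; the numbers are invented for illustration, and the paper's actual lifted system is a learned matrix system:

```python
def dlqr_scalar(a, b, q, r, iters=1000):
    """LQR gain for the scalar discrete system x' = a*x + b*u with
    stage cost q*x^2 + r*u^2, via Riccati fixed-point iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# hypothetical unstable open-loop dynamics (|a| > 1)
a, b = 1.2, 1.0
k = dlqr_scalar(a, b, q=1.0, r=1.0)
a_closed = a - b * k   # closed-loop pole under the feedback u = -k*x
```

The feedback moves the unstable pole at 1.2 inside the unit circle, which is exactly the role LQR plays on the learned lifted dynamics.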
16. AESA Antennas using Machine Learning with Reduced Dataset.
- Author
- ZAIB, Alam, MASOOD, Abdur Rehman, ABDULLAH, Muhammad Asad, KHATTAK, Shahid, SALEEM, Aasim Bin, and ULLAH, Irfan
- Subjects
- ARTIFICIAL neural networks, PHASED array antennas, COMPUTER engineering, ANTENNAS (Electronics), MACHINE learning
- Abstract
This paper proposes a deep neural network (DNN)-based approach for radiation pattern synthesis of an 8-element phased array antenna. For this purpose, 181 points of a desired radiation pattern are fed as input to the DNN, and the phases of the array elements are extracted as the outputs. Existing DNN techniques for radiation pattern synthesis are not directly applicable to higher-order arrays, as the dataset size grows exponentially with array dimensions. To overcome this bottleneck, we propose novel and efficient methods of generating datasets for the DNN. Specifically, by leveraging the constant phase-shift characteristic of the phased array antenna, the dataset size is reduced by several orders of magnitude and made independent of the array size. This has considerable advantages in terms of speed and complexity, especially in real-time applications, as the DNN can immediately learn and synthesize the desired patterns. The performance of the proposed methods is validated by using an ideal square beam and an optimal array pattern as reference inputs to the DNN. The results, generated in MATLAB as well as in CST, demonstrate the effectiveness of the proposed methods in synthesizing the desired radiation patterns. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
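The constant phase-shift property the abstract above exploits comes from the uniform linear array factor: a fixed progressive phase shift between adjacent elements steers the main beam. A textbook sketch of that array factor (standard antenna theory, not the paper's code; the 8-element, half-wavelength numbers are illustrative):

```python
import math

def array_factor(n, d_over_lambda, beta, theta_deg):
    """|AF| of an n-element uniform linear array with element spacing
    d (in wavelengths) and progressive phase shift beta (radians)
    between adjacent elements, at observation angle theta."""
    psi = 2 * math.pi * d_over_lambda * math.cos(math.radians(theta_deg)) + beta
    re = sum(math.cos(k * psi) for k in range(n))
    im = sum(math.sin(k * psi) for k in range(n))
    return math.hypot(re, im)

# steering an 8-element half-wavelength-spaced array toward 60 degrees:
beta = -2 * math.pi * 0.5 * math.cos(math.radians(60.0))
```

At the steered angle the per-element phases cancel (psi = 0), so the factor peaks at n, which is why a single scalar phase shift characterizes the whole excitation.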
17. Influence of Temperature on Brushless Synchronous Machine Field Winding Interturn Fault Severity Estimation.
- Author
- Pascual, Rubén, Rivero, Eduardo, Guerrero, José M., Mahtani, Kumar, and Platero, Carlos A.
- Abstract
There are numerous methods for detecting interturn faults (ITFs) in the field winding of synchronous machines (SMs). One effective approach is based on comparing theoretical and measured excitation currents. This method is unaffected by rotor temperature in static excitation SMs. However, this paper investigates the influence of rotor temperature in brushless synchronous machines (BSMs), where rotor temperature significantly impacts the exciter excitation current. Extensive experimental tests were conducted on a special BSM with measurable rotor temperature. Given the challenges of measuring rotor temperature in industrial machines, this paper explores the feasibility of using stator temperature in the exciter field current estimation model. The theoretical exciter field current is calculated using a deep neural network (DNN), which incorporates electrical brushless synchronous generator output values and stator temperature, and it is subsequently compared with the measured exciter field current. This method achieves an error rate below 0.5% under healthy conditions, demonstrating its potential for simple implementation in industrial BSMs for ITF detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Transient Fault Detection in Tensor Cores for Modern GPUs.
- Author
- Hafezan, Mohammad Hassan and Atoofian, Ehsan
- Abstract
Deep neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, the great success comes with the cost of excessive computation. The Volta graphics processing unit (GPU) from NVIDIA introduced a specialized hardware unit called tensor core (TC) aiming at meeting the growing computation demand needed by DNNs. Most previous studies on TCs have focused on performance improvement through the utilization of the TC's high degree of parallelism. However, as DNNs are deployed into security-sensitive applications such as autonomous driving, the reliability of TCs is as important as performance. In this work, we exploit the unique architectural characteristics of TCs and propose a simple and implementation-efficient hardware technique called fault detection in tensor core (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning as well as sparse activations arising from the common ReLU operator to verify tensor operations. The high level of sparsity in tensors allows FDTC to run original and verifying products simultaneously, leading to zero performance penalty. For applications with a low sparsity rate, FDTC relies on temporal redundancy to re-execute effectual products. FDTC schedules the execution of verifying products only when multipliers are idle. Our experimental results reveal that FDTC offers 100% fault coverage with no performance penalty and small energy overhead in TCs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Fast Loosely-Timed Deep Neural Network Models with Accurate Memory Contention.
- Author
- Arasteh, Emad M. and Dömer, Rainer
- Abstract
The emergence of data-intensive applications, such as Deep Neural Networks (DNN), exacerbates the well-known memory bottleneck in computer systems and demands early attention in the design flow. Electronic System-Level (ESL) design using SystemC Transaction Level Modeling (TLM) enables effective performance estimation, design space exploration (DSE), and gradual refinement. However, memory contention is often only detectable after detailed TLM-2.0 approximately-timed or cycle-accurate RTL models are developed. A memory bottleneck detected at such a late stage can severely limit the available design choices or even require costly redesign. In this work, we propose a novel TLM-2.0 loosely-timed contention-aware (LT-CA) modeling style that offers high-speed simulation close to traditional loosely-timed (LT) models, yet shows the same accuracy for memory contention as low-level approximately-timed (AT) models. Thus, our proposed LT-CA modeling breaks the speed/accuracy tradeoff between regular LT and AT models and offers fast and accurate observation and visualization of memory contention. Our extensible SystemC model generator automatically produces desired TLM-1 and TLM-2.0 models from a DNN architecture description for design space exploration focusing on memory contention. We demonstrate our approach with a real-world industry-strength DNN application, GoogLeNet. The experimental results show that the proposed LT-CA modeling is 46× faster in simulation than equivalent AT models with an average error of less than 1% in simulated time. Early detection of memory contentions also suggests that local memories close to computing cores can eliminate memory contention in such applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. OpBench: an operator-level GPU benchmark for deep learning.
- Author
- Gu, Qingwen, Fan, Bo, Liu, Zhengning, Cao, Kaicheng, Zhang, Songhai, and Hu, Shimin
- Abstract
Operators (such as Conv and ReLU) play an important role in deep neural networks. Every neural network is composed of a series of differentiable operators. However, existing AI benchmarks mainly focus on assessing the model training and inference performance of deep learning systems on specific models. To help GPU hardware find computing bottlenecks and intuitively evaluate GPU performance on specific deep learning tasks, this paper focuses on evaluating GPU performance at the operator level. We statistically analyze the information of operators on 12 representative deep learning models from six prominent AI tasks and provide an operator dataset to show the differing importance of various types of operators in different networks. An operator-level benchmark, OpBench, is proposed on the basis of this dataset, allowing users to choose from a given range of models and set the input sizes according to their demands. This benchmark offers a detailed operator-level performance report for AI and hardware developers. We also evaluate four GPU models on OpBench and find that their performances differ on various types of operators and are not fully consistent with the performance metric FLOPS (floating point operations per second). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
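Operator-level benchmarking as described above boils down to timing a single operator at a chosen input size and converting the floating-point operation count into achieved FLOPS. A toy CPU-side sketch using a naive matmul operator (the operator choice, sizes, and repetition count are illustrative, not OpBench's methodology):

```python
import time

def matmul(a, b):
    """Naive dense GEMM on nested lists; the 'operator' under test."""
    p = len(b[0])
    return [[sum(ai[k] * b[k][j] for k in range(len(b))) for j in range(p)]
            for ai in a]

def _timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

def measure_flops(n=32, reps=3):
    """Time an n*n*n matmul and report achieved FLOPS, counting
    2*n^3 floating-point operations (one multiply + one add each)."""
    a = [[1.0] * n for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    best = min(_timed(lambda: matmul(a, b)) for _ in range(reps))
    return 2 * n ** 3 / best
```

Taking the best of several repetitions reduces timer and scheduling noise; a real GPU benchmark would additionally synchronize the device before stopping the clock.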
21. An End-to-End Workflow to Efficiently Compress and Deploy DNN Classifiers on SoC/FPGA.
- Author
- Molina, Romina Soledad, Morales, Ivan Rene, Crespo, Maria Liz, Costa, Veronica Gil, Carrato, Sergio, and Ramponi, Giovanni
- Abstract
Machine learning (ML) models have demonstrated discriminative and representative learning capabilities over a wide range of applications, even at the cost of high-computational complexity. Due to their parallel processing capabilities, reconfigurability, and low-power consumption, systems on chip based on a field programmable gate array (SoC/FPGA) have been used to face this challenge. Nevertheless, SoC/FPGA devices are resource-constrained, which implies the need for optimal use of technology for the computation and storage operations involved in ML-based inference. Consequently, mapping a deep neural network (DNN) architecture to a SoC/FPGA requires compression strategies to obtain a hardware design with a good compromise between effectiveness, memory footprint, and inference time. This letter presents an efficient end-to-end workflow for deploying DNNs on an SoC/FPGA by integrating hyperparameter tuning through Bayesian optimization (BO) with an ensemble of compression techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. KALT: generating adversarial explainable Chinese legal texts.
- Author
- Zhang, Yunting, Li, Shang, Ye, Lin, Zhang, Hongli, Chen, Zhe, and Fang, Binxing
- Subjects
- ARTIFICIAL neural networks, NATURAL language processing, CHINESE language
- Abstract
Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are well-designed input samples with imperceptible perturbations. Existing methods generate AEs to evaluate the robustness of DNN-based natural language processing models. However, the AE attack performance significantly degrades in some verticals, such as law, due to overlooking essential domain knowledge. To generate explainable Chinese legal adversarial texts, we introduce legal knowledge and propose a novel black-box approach, knowledge-aware law tricker (KALT), in the framework of adversarial text generation based on word importance. Firstly, we invent a legal knowledge extraction method based on KeyBERT. The knowledge contains unique features from each category and shared features among different categories. Additionally, we design two perturbation strategies, Strengthen Similar Label and Weaken Original Label, to selectively perturb the two types of features, which can significantly reduce the classification accuracy of the target model. These two perturbation strategies can be regarded as components, which can be conveniently integrated into any perturbation method to enhance attack performance. Furthermore, we propose a strong hybrid perturbation method to introduce perturbation into the original texts. The perturbation method combines seven representative perturbation methods for Chinese. Finally, we design a formula to calculate interpretability scores, quantifying the interpretability of adversarial text generation methods. Experimental results demonstrate that KALT can effectively generate explainable Chinese legal adversarial texts that can be misclassified with high confidence and achieve excellent attack performance against the powerful Chinese BERT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
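The word-importance framework that KALT plugs into can be sketched roughly as follows. This is a generic black-box heuristic, not the paper's actual components: the scoring function and substitution rule below are hypothetical stand-ins for a real target classifier and a real Chinese perturbation method.

```python
# Generic word-importance-based adversarial text generation sketch.
# `score_fn` stands in for the target model's confidence in the original
# label; `substitute` stands in for any perturbation method.

def word_importance(tokens, score_fn):
    """Importance of each token = drop in the original-label score
    when that token is deleted (a common black-box heuristic)."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

def perturb_most_important(tokens, score_fn, substitute, k=1):
    """Perturb the k most important tokens using `substitute`."""
    imp = word_importance(tokens, score_fn)
    ranked = sorted(range(len(tokens)), key=lambda i: imp[i], reverse=True)
    out = list(tokens)
    for i in ranked[:k]:
        out[i] = substitute(out[i])
    return out
```

In KALT's setting, the perturbation would additionally be steered by the extracted legal knowledge (strengthening similar-label features, weakening original-label features).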
23. Improving interpretability via regularization of neural activation sensitivity.
- Author
-
Moshe, Ofir, Fidel, Gil, Bitton, Ron, and Shabtai, Asaf
- Subjects
ARTIFICIAL neural networks ,STIMULUS generalization ,TRUST ,CONFIDENCE - Abstract
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited by two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs' security and generalization in real-world conditions, while the latter, opaqueness, directly impacts interpretability. The lack of interpretability diminishes user trust, as it is challenging to have confidence in a model's decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs' interpretability that is based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method to that of standard models and models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are superior to standard models, and that models trained using our proposed method are even better than adversarially robust models in terms of interpretability. (Code provided in supplementary material.) [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
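The general idea of penalizing activation sensitivity can be illustrated as below. This is an illustrative sketch, not the authors' exact formulation: the layer is a toy tanh layer, and the input-to-activation Jacobian is estimated by finite differences rather than autograd so the example stays self-contained.

```python
import numpy as np

def activation(x, W):
    """Toy single layer whose sensitivity we regularize."""
    return np.tanh(W @ x)

def sensitivity_penalty(x, W, eps=1e-4):
    """Squared Frobenius norm of d(activation)/d(input),
    estimated column-by-column with finite differences."""
    base = activation(x, W)
    jac = np.zeros((base.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        jac[:, j] = (activation(xp, W) - base) / eps
    return float(np.sum(jac ** 2))

def regularized_loss(task_loss, x, W, lam=0.1):
    """Task loss plus the sensitivity regularizer, weighted by lam."""
    return task_loss + lam * sensitivity_penalty(x, W)
```

Training against such a penalty pushes the network toward activations that change smoothly with the input, which is the intuition linking this family of regularizers to both robustness and interpretability.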
24. Sparse oblique decision trees: a tool to understand and manipulate neural net features.
- Author
-
Hada, Suryabhan Singh, Carreira-Perpiñán, Miguel Á., and Zharmagambetov, Arman
- Subjects
ARTIFICIAL neural networks ,DECISION trees ,TREES - Abstract
The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods perform prediction. Much work has focused on understanding what part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which of the internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the neural net with an oblique decision tree having sparse weight vectors at the decision nodes. Using the recently proposed Tree Alternating Optimization (TAO) algorithm, we are able to learn trees that are both highly accurate and interpretable. Such trees can faithfully mimic the part of the neural net they replaced, and hence they can provide insights into the deep net black box. Further, we show we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. These insights and manipulations apply globally to the entire training and test set, not just at a local (single-instance) level. We demonstrate this robustly on the MNIST and ImageNet datasets with LeNet5 and VGG networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
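A single node of such a tree can be sketched as follows: routing is by the sign of a *sparse* linear combination of the neural-net features, so only the few features with nonzero weight influence the decision, which is what makes the tree readable. (A minimal sketch; the TAO-learned trees have many such nodes with weights fit jointly.)

```python
# One sparse oblique decision node: `weights` maps feature index -> weight,
# so the routing decision depends only on the features it names.

def oblique_route(features, weights, bias):
    """Route right (1) if the sparse linear score is positive, else left (0)."""
    s = sum(w * features[i] for i, w in weights.items()) + bias
    return 1 if s > 0 else 0
```

Reading off which indices appear in `weights` on the path to a leaf is exactly the kind of feature-level explanation (and feature-level attack surface) the abstract describes.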
25. On the applications of neural ordinary differential equations in medical image analysis.
- Author
-
Niu, Hao, Zhou, Yuxiang, Yan, Xiaohao, Wu, Jun, Shen, Yuncheng, Yi, Zhang, and Hu, Junjie
- Abstract
Medical image analysis tasks are characterized by high noise, volumetric data, and multi-modality, posing challenges for any model that attempts to learn robust features from the input images. Over the last decade, deep neural networks (DNNs) have achieved enormous success in medical image analysis tasks, which can be attributed to their powerful feature representation capability. Despite the promising results reported in numerous works, DNNs are also criticized for several pivotal limits, one of which is a lack of safety. Safety plays an important role in the applications of DNNs during clinical practice, helping the model defend against potential attacks and preventing silent prediction failures. The recently proposed neural ordinary differential equation (NODE), a continuous model bridging the gap between DNNs and ODEs, provides a significant advantage in ensuring the model’s safety. Among the variants of NODE, the neural memory ordinary differential equation (nmODE) possesses a theoretically guaranteed global attractor, exhibiting superior performance and robustness in applications. While NODE and its variants have been widely used in medical image analysis tasks, there is no comprehensive review of their applications, hindering an in-depth understanding of NODE’s working principle and its potential applications. To mitigate this limitation, this paper thoroughly reviews the literature on the applications of NODE in medical image analysis from the following five aspects: segmentation, reconstruction, registration, disease prediction, and data generation. We also summarize both the strengths and downsides of the applications of NODE, followed by possible research directions. To the best of our knowledge, this is the first review regarding the applications of NODE in the field of medical image analysis.
We hope this review can draw the researchers’ attention to the great potential of NODE and its variants in medical image analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
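The core NODE idea the review surveys can be sketched in a few lines: the hidden state evolves as dh/dt = f(h, t; θ), so the "depth" of the network is replaced by an ODE solve. A minimal sketch using fixed-step Euler integration; real NODE implementations use adaptive solvers and the adjoint method for memory-efficient gradients.

```python
import numpy as np

def neural_ode_solve(f, h0, t0, t1, steps=1000):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler.
    `f` stands in for a small neural network parameterizing the dynamics."""
    h, t = np.asarray(h0, dtype=float), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)
        t += dt
    return h
```

Because the state follows a continuous flow, properties of the dynamics (such as the nmODE's global attractor mentioned above) can be analyzed with ODE theory, which is the source of the safety arguments in the abstract.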
26. A Generalized Attention Mechanism to Enhance the Accuracy Performance of Neural Networks.
- Author
-
Jiang, Pengcheng, Neri, Ferrante, Xue, Yu, and Maulik, Ujjwal
- Subjects
- *
ARTIFICIAL neural networks , *MACHINE learning , *DEEP learning , *CONVOLUTIONAL neural networks , *CLASSIFICATION - Abstract
In many modern machine learning (ML) models, attention mechanisms (AMs) play a crucial role in processing data and identifying significant parts of the inputs, whether these are text or images. This selective focus enables subsequent stages of the model to achieve improved classification performance. Traditionally, AMs are applied as a preprocessing substructure before a neural network, such as in encoder/decoder architectures. In this paper, we extend the application of AMs to intermediate stages of data propagation within ML models. Specifically, we propose a generalized attention mechanism (GAM), which can be integrated before each layer of a neural network for classification tasks. The proposed GAM allows, at each layer/step of the ML architecture, identification of the most relevant sections of the intermediate results. Our experimental results demonstrate that incorporating the proposed GAM into various ML models consistently enhances the accuracy of these models. This improvement is achieved with only a marginal increase in the number of parameters, which does not significantly affect the training time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
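The idea of gating each layer's input with attention can be sketched as below. This is a hedged illustration of the general pattern, not the paper's exact GAM: a learned scoring matrix produces a softmax weight per intermediate feature, and the reweighted vector is fed to the next layer.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_gate(x, W_score):
    """Reweight an intermediate feature vector by attention weights
    computed from a (hypothetical) learned scoring matrix."""
    attn = softmax(W_score @ x)   # one weight per feature, sums to 1
    return attn * x

def gam_forward(x, layers, score_mats):
    """Insert an attention gate before every layer of a toy tanh MLP."""
    for W, Ws in zip(layers, score_mats):
        x = np.tanh(W @ attention_gate(x, Ws))
    return x
```

Since each gate is just one extra matrix per layer, the parameter overhead is small, consistent with the abstract's claim of only a marginal increase in parameters.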
27. Constitutive Modeling of High‐Temperature Deformation Behavior of Nonoriented Electrical Steels as Compared to Machine Learning.
- Author
-
Mishra, Gyanaranjan, Pasco, Jubert, McCarthy, Thomas, Nyamuchiwa, Kudakwashe, He, Youliang, and Aranas, Clodualdo
- Subjects
- *
ARTIFICIAL neural networks , *SILICON steel , *HOT rolling , *STRAIN rate , *ELECTRICAL steel , *MACHINE learning - Abstract
Hot rolling is a critical thermomechanical processing step for nonoriented electrical steel (NOES) to achieve optimal mechanical and magnetic properties. Depending on the silicon and carbon contents, the electrical steel may or may not undergo austenite–ferrite phase transformation during hot rolling, which requires different process controls, as austenite and ferrite show different flow stresses at high temperatures. Herein, the high‐temperature flow behaviors of two nonoriented electrical steels with silicon contents of 1.3 and 3.2 wt% are investigated through hot compression tests. The hot deformation temperature is varied from 850 to 1050 °C, and the strain rate is varied from 0.01 to 1.0 s−1. The measured stress‐strain data are fitted using various constitutive models (combined with optimization techniques), namely, Johnson–Cook, modified Johnson–Cook, Zener–Hollomon, Hensel–Spittel, modified Hensel–Spittel, and modified Zerilli–Armstrong. The results are also compared with a model based on a deep neural network (DNN). It is shown that the Hensel–Spittel model yields the smallest average absolute relative error among all the constitutive models, and the DNN model can track almost all the experimental flow stresses over the entire ranges of temperature, strain rate, and strain. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Cardio vascular disease prediction by deep learning based on IOMT: review.
- Author
-
C, Deepti and J, Nagaraja
- Subjects
- *
ARTIFICIAL neural networks , *EPIDEMIOLOGICAL transition , *GLOBAL burden of disease , *HEART diseases ,DEVELOPING countries - Abstract
The global burden of disease caused by cardiovascular diseases (CVDs) is increasing despite technical advancements in healthcare, owing to a dramatic rise in developing nations that are experiencing rapid health transitions. The World Health Organization (WHO) estimates that 17.9 million deaths worldwide in 2021, or 32% of all deaths, were connected to CVDs. Since ancient times, people have experimented with methods to extend their lives, yet current technology is still a long way from attaining the aim of lessening mortality rates. Early detection and proactive management of CVD risk factors are crucial for reducing the burden of these diseases. In recent years, researchers have been exploring the potential of deep learning (DL) methods for predicting cardiovascular disease risk from data collected by IoMT devices, and several DL techniques have been implemented for efficient CVD prediction. A DL-based CVD prediction pipeline involves several steps: IoT sensors and deep learning techniques are used to process large amounts of patient-related biomedical data, enabling doctors to closely monitor their patients and make choices in real time. After a discussion of cardiac disease and its existing treatments, an outline of the IoT, sensors, and deep learning is provided, followed by a complete analysis of the current and pertinent deep-learning techniques for heart disease prediction. The results present performance metrics comparing the different deep learning approaches. Drawing on 44 papers published between 2020 and 2023, this review provides a thorough statistical analysis. Finally, this survey will be beneficial for CVD prediction researchers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Research on Data-Driven Methods for Solving High-Dimensional Neutron Transport Equations.
- Author
-
Peng, Zhiqiang, Lei, Jichong, Ni, Zining, Yu, Tao, Xie, Jinsen, Hong, Jun, and Hu, Hong
- Subjects
- *
ARTIFICIAL neural networks , *TRANSPORT equation , *ENGINEERING standards , *ARTIFICIAL intelligence , *ENGINEERING design , *NEUTRON transport theory - Abstract
With the continuous development of computer technology, artificial intelligence has been widely applied across various industries. To address the issues of high computational cost and inefficiency in traditional numerical methods, this paper proposes a data-driven artificial intelligence approach for solving high-dimensional neutron transport equations. Based on the AFA-3G assembly model, a neutron transport equation solving model is established using deep neural networks, considering factors that influence the neutron transport process in real engineering scenarios, such as varying temperature, power, and boron concentration. Comparing the model's predicted values with reference values, the average error in the infinite multiplication factor kinf of the assembly is found to be 145.71 pcm (10−5), with a maximum error of 267.10 pcm. The maximum relative error is less than 3.5%, all within the engineering error standards of 500 pcm and 5%. This preliminary validation demonstrates the feasibility of using data-driven artificial intelligence methods to solve high-dimensional neutron transport equations, offering a new option for engineering design and practical engineering computations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. The face inversion effect through the lens of deep neural networks.
- Author
-
Tousi, Ehsan and Mur, Marieke
- Subjects
- *
ARTIFICIAL neural networks , *HUMAN information processing , *FEEDFORWARD neural networks , *ARTIFICIAL intelligence , *MACHINE learning , *DEEP learning , *FUSIFORM gyrus - Abstract
This article examines whether the brain is organized based on specialized mechanisms for processing specific types of stimuli or if it is organized around mechanisms that can be applied to different types of stimuli. The authors conducted a study using deep neural networks to investigate the face inversion effect, which is a difficulty in recognizing faces when they are upside-down. They found that similar inversion effects exist for other types of stimuli, but these effects do not extend to unseen stimuli. The results suggest that the face inversion effect may be due to specialized processing mechanisms rather than general processing mechanisms. The article also discusses the potential of deep learning as a computational model for understanding human vision and the need for further research on the development of processing systems in the brain. [Extracted from the article]
- Published
- 2024
- Full Text
- View/download PDF
31. Communication-Efficient Wireless Traffic Prediction with Federated Learning.
- Author
-
Gao, Fuwei, Zhang, Chuanting, Qiao, Jingping, Li, Kaiqiang, and Cao, Yi
- Subjects
- *
ARTIFICIAL neural networks , *FEDERATED learning , *INTELLIGENT networks , *DATA protection , *RESOURCE allocation - Abstract
Wireless traffic prediction is essential to developing intelligent communication networks that facilitate efficient resource allocation. Along this line, decentralized wireless traffic prediction under the paradigm of federated learning is becoming increasingly significant. Compared to traditional centralized learning, federated learning satisfies network operators' requirements for sensitive data protection and reduces the consumption of network resources. In this paper, we propose a novel communication-efficient federated learning framework, named FedCE, by developing a gradient compression scheme and an adaptive aggregation strategy for wireless traffic prediction. FedCE achieves gradient compression through top-K sparsification and can largely relieve the communication burdens between local clients and the central server, making it communication-efficient. An adaptive aggregation strategy is designed by quantifying the different contributions of local models to the global model, making FedCE aware of spatial dependencies among various local clients. We validate the effectiveness of FedCE on two real-world datasets. The results demonstrate that FedCE can improve prediction accuracy by approximately 27% with only 20% of communications in the baseline method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
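The top-K sparsification step that makes FedCE communication-efficient can be sketched as follows: each client transmits only the K largest-magnitude gradient entries as (index, value) pairs, and the server reconstructs a sparse update. A minimal sketch; production systems typically also accumulate the dropped residual locally across rounds, which is omitted here.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep the k largest-magnitude gradient entries; return (indices, values)."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

def desparsify(idx, vals, size):
    """Server side: rebuild the dense update, zeros elsewhere."""
    out = np.zeros(size)
    out[idx] = vals
    return out
```

Sending k index/value pairs instead of the full vector is where the roughly 20%-of-communications figure in the abstract comes from (the exact ratio depends on k and the model size).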
32. Exploring adversarial examples and adversarial robustness of convolutional neural networks by mutual information.
- Author
-
Zhang, Jiebao, Qian, Wenhua, Cao, Jinde, and Xu, Dan
- Subjects
- *
ARTIFICIAL neural networks , *CONVOLUTIONAL neural networks , *INFORMATION networks - Abstract
Convolutional neural networks (CNNs) are susceptible to adversarial examples, which are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective defense method to improve the robustness of CNNs to adversarial examples. Many works explore the mechanism behind adversarial examples and adversarial training. However, mutual information is rarely present in the interpretation of these counter-intuitive phenomena. This work investigates similarities and differences between normally trained CNNs (NT-CNNs) and adversarially trained CNNs (AT-CNNs) from the mutual information perspective. We show that although mutual information trends of NT-CNNs and AT-CNNs are similar throughout training for original and adversarial examples, there exists an obvious difference. Compared with NT-CNNs, AT-CNNs achieve a lower clean accuracy and extract less information from the input. CNNs trained with different methods have different preferences for certain types of information; NT-CNNs tend to extract texture-based information from the input, while AT-CNNs prefer shape-based information. The reason why adversarial examples mislead CNNs may be that they contain more texture-based information about other classes. Furthermore, we also analyze the mutual information estimators used in this work and find that they outline the geometric properties of the middle layer's output. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Physics-Informed Neural Networks for Cantilever Dynamics and Fluid-Induced Excitation.
- Author
-
Lee, Jeongsu, Park, Keunhwan, and Jung, Wonjong
- Subjects
ARTIFICIAL neural networks ,FLUID-induced excitation ,STRUCTURAL dynamics ,DIFFERENTIAL equations ,DIFFERENTIABLE functions - Abstract
Physics-informed neural networks (PINNs) represent a continuous and differentiable mapping function, approximating solution curves for given differential equations. Recent studies have demonstrated the significant potential of PINNs as an alternative or complementary approach to conventional numerical methods. However, their application in structural dynamics, such as cantilever dynamics and fluid-induced excitations, poses challenges. In particular, limited accuracy and robustness in resolving high-order differential equations, including fourth-order differential equations encountered in structural dynamics, are major problems with PINNs. To address these challenges, this study explores optimal strategies for constructing PINNs in the context of cantilever dynamics: (1) performing scaling analysis for the configuration, (2) incorporating the second-order non-linear term of the input variables, and (3) utilizing a neural network architecture that reflects a series solution of decomposed bases. These proposed methods have significantly enhanced the predictive capabilities of PINNs, showing an order-of-magnitude improvement in accuracy compared to standard PINNs in resolving the dynamic oscillation of cantilevers and fluid-induced excitation driven by added mass forces. Furthermore, this study extends to the domain of fluid-induced excitation in cantilever dynamics, representing an extreme case of coupled dynamics in fluid–structure interaction. This research is expected to establish crucial baselines for the further development of PINNs in structural dynamics, with potential applicability to high-order coupled differential equations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
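The fourth-order physics residual that makes cantilever problems hard for PINNs can be illustrated with a toy loss. This sketch evaluates an Euler–Bernoulli-like residual u'''' − f on a grid using a five-point finite-difference stencil instead of autograd, so it is only a stand-in for the actual PINN training loss, where derivatives of the network output are taken by automatic differentiation.

```python
import numpy as np

def fourth_derivative(u, dx):
    """Five-point central stencil for u'''' on an evenly spaced grid."""
    return (u[:-4] - 4 * u[1:-3] + 6 * u[2:-2] - 4 * u[3:-1] + u[4:]) / dx**4

def physics_residual_loss(u, forcing, dx):
    """Mean squared residual of u'''' = f at interior collocation points."""
    r = fourth_derivative(u, dx) - forcing[2:-2]
    return float(np.mean(r ** 2))
```

The high power of dx in the denominator shows why fourth-order residuals amplify approximation error, which is the accuracy/robustness difficulty the abstract's strategies (scaling analysis, augmented inputs, series-solution architectures) aim to tame.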
34. Landslide Assessment Classification Using Deep Neural Networks Based on Climate and Geospatial Data.
- Author
-
Tynchenko, Yadviga, Kukartsev, Vladislav, Tynchenko, Vadim, Kukartseva, Oksana, Panfilova, Tatyana, Gladkov, Alexey, Nguyen, Van, and Malashin, Ivan
- Abstract
This study presents a method for classifying landslide triggers and sizes using climate and geospatial data. The landslide data were sourced from the Global Landslide Catalog (GLC), which identifies rainfall-triggered landslide events globally, regardless of size, impact, or location. Compiled from 2007 to 2018 at NASA Goddard Space Flight Center, the GLC includes various mass movements triggered by rainfall and other events. Climatic data for the 10 years preceding each landslide event, including variables such as rainfall amounts, humidity, pressure, and temperature, were integrated with the landslide data. This dataset was then used to classify landslide triggers and sizes using deep neural networks (DNNs) optimized through genetic algorithm (GA)-driven hyperparameter tuning. The optimized DNN models achieved accuracies of 0.67 and 0.82 on the trigger and size multiclass classification tasks, respectively. This research demonstrates the effectiveness of GA-driven tuning for enhancing landslide disaster risk management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
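GA-driven hyperparameter tuning of the kind described above can be sketched generically as follows. This is a hedged illustration, not the paper's implementation: individuals are tuples of hidden-layer widths, and the fitness function stands in for the DNN's validation accuracy.

```python
import random

def evolve(fitness, init_pop, generations=10, rng=random):
    """Simple elitist GA: keep the fitter half, breed the rest by
    one-point crossover plus a point mutation on one gene."""
    pop = list(init_pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: max(2, len(pop) // 2)]   # elitism: keep top half
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))       # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(len(child))        # point mutation
            child[i] = max(1, child[i] + rng.choice([-8, 8]))
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=fitness)
```

Because the elite individuals are carried over unchanged, the best fitness found never decreases across generations, which is why GA tuning is a safe wrapper around an expensive training-and-validation loop.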
35. Machine-Learning-Based Anomaly Detection for GOOSE in Digital Substations.
- Author
-
Nhung-Nguyen, Hong, Girdhar, Mansi, Kim, Yong-Hwa, and Hong, Junho
- Subjects
- *
ARTIFICIAL neural networks , *INTRUSION detection systems (Computer security) , *INFORMATION & communication technologies , *CYBERTERRORISM , *SUPPORT vector machines , *CYBER physical systems - Abstract
Digital substations have adopted a high amount of information and communication technology (ICT) and cyber–physical systems (CPSs) for monitoring and control. As a result, cyber attacks on substations have been increasing and have become a major concern. An intrusion-detection system (IDS) could be a solution to detect and identify the abnormal behaviors of hackers. In this paper, a Deep Neural Network (DNN)-based IDS is proposed to detect malicious generic object-oriented substation event (GOOSE) communication over the process and station bus network, followed by the multiclassification of the cyber attacks. For training, both the abnormal and the normal substation networks are monitored, captured, and logged, and then the proposed algorithm is applied for distinguishing normal events from abnormal ones within the network communication packets. The designed system is implemented and tested with a real-time IEC 61850 GOOSE message dataset using two different approaches. The experimental results show that the proposed system can successfully detect intrusions with an accuracy of 98%. In addition, a comparison is performed in which the proposed IDS outperforms the support vector machine (SVM)-based IDS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Landslide susceptibility evaluation based on a sample optimization strategy (基于样本优化策略的滑坡易发性评价).
- Author
-
吴宏阳, 周 超, 梁 鑫, 王 悦, 袁鹏程, and 吴立星
- Subjects
- *
ARTIFICIAL neural networks , *LANDSLIDE prediction , *LANDSLIDES , *LANDSLIDE hazard analysis , *STATISTICAL decision making , *PROBLEM solving - Abstract
Objectives: Accurate susceptibility evaluation results can help prevent and control the dangers caused by landslides. Sample optimization is an important method for landslide susceptibility evaluation, which can effectively solve the problem of decision boundary offset caused by unbalanced samples and improve the accuracy of landslide susceptibility evaluation. Methods: Taking the southeast area of Wanzhou District of Chongqing, China as an example, ten influencing factors such as strata, land use and elevation were selected to construct a landslide susceptibility evaluation index system, and the relationship between landslides and the indices was quantitatively analyzed by the frequency ratio method. On this basis, a deep neural network (DNN) model, a synthetic minority oversampling technique-DNN model (SMOTE-DNN), a one-class support vector machine-DNN coupling model (OS-DNN), and an OS-DNN-K-means clustering coupling model (OS-DNN-K-means) were used to evaluate landslide susceptibility. Results: The results show that the distance from the road, land use and strata are the main controlling factors for landslide development in the study area. The accuracy evaluation shows that OS-DNN-K-means (95.61%) and OS-DNN (91.16%) improve landslide prediction accuracy more effectively than SMOTE-DNN (87.97%) and DNN (81.40%). Conclusions: Sample optimization through mixed sampling and semi-supervised classification can effectively solve the problem of sample imbalance in the study area and provide new technical support for spatial prediction of landslide disasters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
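The frequency ratio method used above to relate landslides to the influencing factors reduces to a simple formula: FR = (share of landslides falling in a factor class) / (share of total area occupied by that class), so FR > 1 marks a class positively associated with landslide occurrence.

```python
def frequency_ratio(landslides_in_class, total_landslides,
                    class_area, total_area):
    """FR = (landslide share of the class) / (area share of the class).
    FR > 1: the class is over-represented among landslide sites."""
    return (landslides_in_class / total_landslides) / (class_area / total_area)
```

For example, a land-use class covering a quarter of the study area but containing half the recorded landslides gets FR = 2, flagging it as a controlling factor candidate.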
37. Proof of the Theory-to-Practice Gap in Deep Learning via Sampling Complexity Bounds for Neural Network Approximation Spaces.
- Author
-
Grohs, Philipp and Voigtlaender, Felix
- Subjects
- *
ARTIFICIAL neural networks , *APPROXIMATION algorithms , *COMPUTATIONAL complexity , *ALGORITHMS , *HARDNESS , *DEEP learning - Abstract
We study the computational complexity of (deterministic or randomized) algorithms based on point samples for approximating or integrating functions that can be well approximated by neural networks. Such algorithms (most prominently stochastic gradient descent and its variants) are used extensively in the field of deep learning. One of the most important problems in this field concerns the question of whether it is possible to realize theoretically provable neural network approximation rates by such algorithms. We answer this question in the negative by proving hardness results for the problems of approximation and integration on a novel class of neural network approximation spaces. In particular, our results confirm a conjectured and empirically observed theory-to-practice gap in deep learning. We complement our hardness results by showing that error bounds of a comparable order of convergence are (at least theoretically) achievable. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Transfer Accent Identification Learning for Enhancing Speech Emotion Recognition.
- Author
-
Priya Dharshini, G. and Sreenivasa Rao, K.
- Abstract
Emotional speech has some dependency on language, and within a language itself there are certain variations due to accents. The presence of accents degrades the performance of speech emotion recognition (SER) systems. A pre-trained accent identification system (AID) can effectively capture the characteristics of accent variations in emotional speech, which is an important factor in developing a more reliable SER system. In this work, we investigate the dependencies between accent identification and emotion recognition to enhance the performance of SER. This paper proposes a novel transfer learning-based approach utilizing accent identification knowledge for SER. In the proposed method, a deep neural network (DNN) is used to model the accent identification system, which uses statistical aggregation functions (mean, std, median, etc.) of spectral subband centroid (SSC) features and Mel-frequency discrete wavelet coefficients (MFDWC). To build the SER, a deep convolutional recurrent autoencoder produces the attention-based latent representation, and the acoustic features are extracted by the openSMILE toolkit. A separate DNN model is used to learn the mapping between attention features and acoustic features for SER. In addition, a priori knowledge of accent can improve the SER, which is made possible through transfer learning (TL). The performance of the proposed method is assessed using the accented emotional speech utterances of the Crema-D dataset and compared with state-of-the-art techniques. The experimental results show that transferring AID learning improves the recognition rate of the SER, yielding around 8% relative improvement in accuracy compared to existing SER techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Deep Learning-Based Boolean, Time Series, Error Detection, and Predictive Analysis in Container Crane Operations.
- Author
-
Awasthi, Amruta, Krpalkova, Lenka, and Walsh, Joseph
- Subjects
- *
ARTIFICIAL neural networks , *RECURRENT neural networks , *CRANES (Machinery) , *COVARIANCE matrices , *DATA analytics - Abstract
Deep learning is crucial in marine logistics and in container crane error detection, diagnosis, and prediction. A novel deep learning technique using Long Short-Term Memory (LSTM) detected and anticipated errors in a system with imbalanced data. The LSTM model was trained on real operational error data from container cranes. The custom algorithm employs the Synthetic Minority Oversampling TEchnique (SMOTE) to balance the imbalanced operational error data (i.e., too few minority-class samples). The implementation was written in Python. Pearson, Spearman, and Kendall correlation matrices and covariance matrices are presented. The model's training and validation loss is shown, and the remaining data are predicted. The test set (30% of the actual data) and the forecasted data had an RMSE of 0.065. A heatmap of a confusion matrix was created using Matplotlib and Seaborn. Additionally, the error outputs for the time series for the next n seconds were projected, with n input by the user. According to the evaluation criteria, accuracy was 0.996, precision was 1.00, recall was 0.500, and the F1 score was 0.667. Experiments demonstrated that the technique is capable of identifying critical elements. Thus, future work will improve the model's structure to forecast industrial big-data errors. A key advantage is that the technique can handle imbalanced data, which most industries have, and the model can be further improved with additional data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
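The SMOTE balancing step mentioned above synthesizes new minority-class samples by interpolating between a real sample and one of its nearest neighbors. A minimal sketch: the real SMOTE picks the neighbor via k-nearest neighbors; here the neighbor is supplied directly for brevity.

```python
import random

def smote_sample(x, neighbor, rng=random):
    """Synthesize one minority-class sample on the line segment between
    `x` and `neighbor`, at a random fraction lam in [0, 1)."""
    lam = rng.random()
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]
```

Generating such samples until the classes are balanced gives the LSTM a training set where the rare error events are no longer drowned out by the majority class.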
40. A Noisy Sample Selection Framework Based on a Mixup Loss and Recalibration Strategy.
- Author
-
Zhang, Qian, Yu, De, Zhou, Xinru, Gong, Hanmeng, Li, Zheng, Liu, Yiming, and Shao, Ruirui
- Subjects
- *
ARTIFICIAL neural networks , *SUPERVISED learning , *IMAGE recognition (Computer vision) , *GAUSSIAN mixture models , *PRIOR learning - Abstract
Deep neural networks (DNNs) have achieved breakthrough progress in various fields, largely owing to the support of large-scale datasets with manually annotated labels. However, obtaining such datasets is costly and time-consuming, making high-quality annotation a challenging task. In this work, we propose an improved noisy sample selection method, termed "sample selection framework", based on a mixup loss and recalibration strategy (SMR). This framework enhances the robustness and generalization abilities of models. First, we introduce a robust mixup loss function to pre-train two models with identical structures separately. This approach avoids additional hyperparameter adjustments and reduces the need for prior knowledge of noise types. Additionally, we use a Gaussian Mixture Model (GMM) to divide the entire training set into labeled and unlabeled subsets, followed by robust training using semi-supervised learning (SSL) techniques. Furthermore, we propose a recalibration strategy based on cross-entropy (CE) loss to prevent the models from converging to local optima during the SSL process, thus further improving performance. Ablation experiments on CIFAR-10 with 50% symmetric noise and 40% asymmetric noise demonstrate that the two modules introduced in this paper improve the accuracy of the baseline (i.e., DivideMix) by 1.5% and 0.5%, respectively. Moreover, the experimental results on multiple benchmark datasets demonstrate that our proposed method effectively mitigates the impact of noisy labels and significantly enhances the performance of DNNs on noisy datasets. For instance, on the WebVision dataset, our method improves the top-1 accuracy by 0.7% and 2.4% compared to the baseline method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Incorporating causality in energy consumption forecasting using deep neural networks.
- Author
-
Sharma, Kshitij, Dwivedi, Yogesh K., and Metri, Bhimaraya
- Subjects
- *
ARTIFICIAL neural networks , *ENERGY consumption forecasting , *SHORT-term memory , *LONG-term memory , *MACHINE learning - Abstract
Forecasting energy demand has been a critical process in various decision support systems regarding consumption planning, distribution strategies, and energy policies. Traditionally, methods for forecasting energy consumption or demand included trend analyses, regression, and auto-regression. With advancements in machine learning, algorithms such as support vector machines, artificial neural networks, and random forests became prevalent. More recently, with an unprecedented improvement in computing capabilities, deep learning algorithms are increasingly used to forecast energy consumption/demand. In this contribution, a relatively novel approach based on long short-term memory is employed. Weather data was used to forecast the energy consumption from three datasets, with an additional piece of information incorporated into the deep learning architecture. This additional information carries the causal relationships between the weather indicators and energy consumption. The architecture carrying this causal information is termed entangled long short-term memory. The results show that the entangled long short-term memory outperforms the state-of-the-art deep learning architecture (bidirectional long short-term memory). The theoretical and practical implications of these results are discussed in terms of decision-making and energy management systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
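Entry 41's abstract does not specify how the causal relationships are encoded, so the sketch below uses one simple stand-in: the lagged cross-correlation between a weather indicator and consumption, which could be appended to the LSTM input as an extra feature. The data, the two-step delay, and the `lagged_corr` helper are all hypothetical.

```python
import numpy as np

# The abstract does not specify how causal information is encoded; this sketch
# derives one simple stand-in: the lagged cross-correlation between a weather
# indicator (e.g., temperature) and energy consumption, which could be
# appended as an extra input channel to an LSTM.
rng = np.random.default_rng(1)
temperature = rng.normal(20, 5, size=400)
# Synthetic consumption that reacts to temperature with a 2-step delay.
consumption = 100 + 3 * np.roll(temperature, 2) + rng.normal(0, 1, size=400)

def lagged_corr(cause, effect, max_lag=5):
    """Correlation of effect[t] with cause[t - lag] for each candidate lag."""
    corrs = {}
    for lag in range(1, max_lag + 1):
        c, e = cause[:-lag], effect[lag:]
        corrs[lag] = float(np.corrcoef(c, e)[0, 1])
    return corrs

corrs = lagged_corr(temperature, consumption)
best_lag = max(corrs, key=lambda k: abs(corrs[k]))   # strongest causal lag
```

A found lag and its correlation strength could then be fed to the forecaster alongside the raw weather series, in the spirit of the entangled architecture described above.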
42. Icg: intensity and color gradient operator on RGB images for visual object tracking.
- Author
-
Dasari, Mohana Murali and Gorthi, Rama Krishna
- Subjects
- *
DEEP learning , *ARTIFICIAL neural networks , *OBJECT tracking (Computer vision) , *CONVOLUTIONAL neural networks , *FEATURE extraction - Abstract
The design of digital filters is now mostly automated with convolutional neural networks (CNNs). State-of-the-art tracking methods, including the well-known correlation and deep Siamese trackers, use features from such CNNs. However, deep learning requires huge amounts of data, high computational resources, and long training times. Hence, smart and simple alternative feature extraction strategies are needed for embedded applications. In this direction, a method is proposed for obtaining enriched "intensity and color gradient features" using the "three-dimensional gradient operator" on color images. This work considered the popular first-order gradient operator ([-1 0 1]) and the outer product operator to generate various intensity and color gradients. The generated features contain rich information, including edges, color, and mid-level segmentation-like features. This simple yet effective operator involves no learnable parameters, yet its performance is comparable to that of lightweight learned CNNs such as MobileNet. The efficacy of the resultant features is demonstrated for the visual tracking task using well-known tracking datasets, namely GOT-10k, LaSOT, OTB2015, UAV123, and VOT2018. The proposed features, combined with other deep learning features, boost performance over the baseline efficient convolution operator (ECO) tracker. This work will open new frontiers in designing hybrid features for visual tracking and other visual computation tasks. The code is available at https://github.com/dasari4321/ICG_tracker.git. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
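Entry 42's operator combines the first-order gradient [-1 0 1] with an outer product over color channels. A minimal numpy sketch of one plausible reading follows; the exact arrangement of the paper's three-dimensional operator is an assumption here, and the image is random stand-in data.

```python
import numpy as np

# Sketch of the first-order [-1 0 1] gradient applied per RGB channel, plus an
# outer product of the per-channel gradient vectors to mix color gradients.
# The exact layout of the paper's 3-D operator is an assumption here.
rng = np.random.default_rng(2)
img = rng.random((32, 32, 3))          # stand-in RGB image in [0, 1]

def channel_gradients(image):
    """Horizontal and vertical [-1 0 1] gradients for each color channel."""
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1, :] = image[:, 2:, :] - image[:, :-2, :]   # x: right - left
    gy[1:-1, :, :] = image[2:, :, :] - image[:-2, :, :]   # y: below - above
    return gx, gy

gx, gy = channel_gradients(img)

# Outer product of the 3-vector of channel gradients at each pixel yields a
# 3x3 color-gradient feature (9 channels) capturing cross-channel structure.
color_grad = np.einsum('hwc,hwd->hwcd', gx, gy).reshape(32, 32, 9)
```

As in the abstract, nothing here is learned: the feature map comes purely from fixed differencing and channel mixing, which is what makes it attractive for embedded trackers.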
43. ZRDNet: zero-reference image defogging by physics-based decomposition–reconstruction mechanism and perception fusion.
- Author
-
Li, Zi-Xin, Wang, Yu-Long, Han, Qing-Long, and Peng, Chen
- Subjects
- *
ARTIFICIAL neural networks , *PROBLEM solving - Abstract
This paper investigates challenging fully unsupervised defogging problems, i.e., how to remove fog by feeding only foggy images into deep neural networks rather than using paired or unpaired synthetic images, and how to overcome the problems of insufficient structure and detail recovery in existing unsupervised defogging methods. For this purpose, a zero-reference image defogging method (ZRDNet) is proposed to solve these two problems. Specifically, we develop an unsupervised defogging network consisting of a layer decomposition network and a perceptual fusion network, which are separately optimized by a joint multiple-loss based on stage-wise learning. The decomposition network guides the image decomposition-reconstruction process by rationally constructing loss functions. The fusion network further enhances the details and contrast of the defogged images by fusing the decomposition-reconstruction results. The joint multiple-loss optimization strategy based on stage-wise learning guides the decomposition and fusion tasks, which are completed stage by stage. Additionally, a non-reference loss is constructed to prevent artifacts and distortion induced by transmission value deviation. Our method is completely unsupervised, and training relies only on fog images and information derived from the fog images themselves. Experiments demonstrate that our ZRDNet, which overcomes the problems of insufficient structure and detail recovery as well as the domain shift induced by using synthetic images, achieves favorable performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Towards assessing the synthetic-to-measured adversarial vulnerability of SAR ATR.
- Author
-
Peng, Bowen, Peng, Bo, Xia, Jingyuan, Liu, Tianpeng, Liu, Yongxiang, and Liu, Li
- Subjects
- *
ARTIFICIAL neural networks , *AUTOMATIC target recognition , *SYNTHETIC aperture radar , *COMPUTER vision , *REMOTE sensing - Abstract
Recently, there has been increasing concern about the vulnerability of deep neural network (DNN)-based synthetic aperture radar (SAR) automatic target recognition (ATR) to adversarial attacks, where a DNN could be easily deceived by clean input with imperceptible but aggressive perturbations. This paper studies the synthetic-to-measured (S2M) transfer setting, where an attacker generates adversarial perturbation based solely on synthetic data and transfers it against victim models trained with measured data. Compared with the current measured-to-measured (M2M) transfer setting, our approach does not need direct access to the victim model or the measured SAR data. We also propose the transferability estimation attack (TEA) to uncover the adversarial risks in this more challenging and practical scenario. The TEA makes full use of the limited similarity between the synthetic and measured data pairs for blind estimation and optimization of S2M transferability, leading to feasible surrogate model enhancement without mastering the victim model and data. Comprehensive evaluations based on the publicly available synthetic and measured paired labeled experiment (SAMPLE) dataset demonstrate that the TEA outperforms state-of-the-art methods and can significantly enhance various attack algorithms in computer vision and remote sensing applications. Codes and data are available at https://github.com/scenarri/S2M-TEA. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Automatic Segmentation of Mediastinal Lymph Nodes and Blood Vessels in Endobronchial Ultrasound (EBUS) Images Using Deep Learning.
- Author
-
Ervik, Øyvind, Tveten, Ingrid, Hofstad, Erlend Fagertun, Langø, Thomas, Leira, Håkon Olav, Amundsen, Tore, and Sorger, Hanne
- Subjects
ARTIFICIAL neural networks ,IMAGE analysis ,LYMPH nodes ,BLOOD vessels ,ANATOMICAL variation ,DEEP learning - Abstract
Endobronchial ultrasound (EBUS) is used in the minimally invasive sampling of thoracic lymph nodes. In lung cancer staging, the accurate assessment of mediastinal structures is essential but challenged by variations in anatomy, image quality, and operator-dependent image interpretation. This study aimed to automatically detect and segment mediastinal lymph nodes and blood vessels employing a novel U-Net architecture-based approach in EBUS images. A total of 1161 EBUS images from 40 patients were annotated. For training and validation, 882 images from 30 patients and 145 images from 5 patients were utilized. A separate set of 134 images was reserved for testing. For lymph node and blood vessel segmentation, respectively, the mean ± standard deviation (SD) metric values were: Dice similarity coefficient 0.71 ± 0.35 and 0.76 ± 0.38; precision 0.69 ± 0.36 and 0.82 ± 0.22; sensitivity 0.71 ± 0.38 and 0.80 ± 0.25; specificity 0.98 ± 0.02 and 0.99 ± 0.01; F1 score 0.85 ± 0.16 and 0.81 ± 0.21. The average processing and segmentation run-time per image was 55 ± 1 ms (mean ± SD). The new U-Net architecture-based approach (EBUS-AI) could automatically detect and segment mediastinal lymph nodes and blood vessels in EBUS images. The method performed well and was feasible and fast, enabling real-time automatic labeling. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
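The overlap metrics reported in entry 45 (Dice, precision, sensitivity, specificity) are standard and can be computed directly from confusion counts on binary masks; the toy masks below are illustrative, not EBUS data.

```python
import numpy as np

# Standard overlap metrics between a predicted and a reference binary mask,
# as reported for the EBUS segmentation study. Toy masks, not EBUS data.
pred = np.zeros((8, 8), dtype=bool)
ref = np.zeros((8, 8), dtype=bool)
pred[2:6, 2:6] = True                  # 16-pixel predicted region
ref[3:7, 3:7] = True                   # 16-pixel reference, shifted by 1

tp = np.sum(pred & ref)                # true positives (overlap)
fp = np.sum(pred & ~ref)               # predicted but not in reference
fn = np.sum(~pred & ref)               # missed reference pixels
tn = np.sum(~pred & ~ref)              # correctly empty background

dice = 2 * tp / (2 * tp + fp + fn)
precision = tp / (tp + fp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

For these toy masks the 3×3 overlap out of two 16-pixel regions gives Dice = precision = sensitivity = 0.5625, which illustrates why Dice is sensitive to small spatial shifts even when region sizes match.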
46. Advanced UAV Design Optimization Through Deep Learning-Based Surrogate Models.
- Author
-
Karali, Hasan, Inalhan, Gokhan, and Tsourdos, Antonios
- Subjects
ARTIFICIAL neural networks ,MULTIDISCIPLINARY design optimization ,AEROSPACE engineering ,STRUCTURAL optimization ,DRONE aircraft ,DEEP learning - Abstract
The conceptual design of unmanned aerial vehicles (UAVs) presents significant multidisciplinary challenges requiring the optimization of aerodynamic and structural performance, stealth, and propulsion efficiency. This work addresses these challenges by integrating deep neural networks with a multiobjective genetic algorithm to optimize UAV configurations. The proposed framework enables a comprehensive evaluation of design alternatives by estimating key performance metrics required for different operational requirements. The design process resulted in a significant improvement in computational time over traditional methods by more than three orders of magnitude. The findings illustrate the framework's capability to optimize UAV designs for a variety of mission scenarios, including specialized tasks such as intelligence, surveillance, and reconnaissance (ISR), combat air patrol (CAP), and Suppression of Enemy Air Defenses (SEAD). This flexibility and adaptability was demonstrated through a case study, showcasing the method's effectiveness in tailoring UAV configurations to meet specific operational requirements while balancing trade-offs between aerodynamic efficiency, stealth, and structural weight. Additionally, these results underscore the transformative impact of integrating AI into the early stages of the design process, facilitating rapid prototyping and innovation in aerospace engineering. Consequently, the current work demonstrates the potential of AI-driven optimization to revolutionize UAV design by providing a robust and effective tool for solving complex engineering problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
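Entry 46 couples a DNN surrogate with a multiobjective genetic algorithm. The single-objective sketch below shows only the basic surrogate-assisted loop: the `surrogate` function is a toy quadratic standing in for the trained performance model, and the selection/crossover/mutation scheme is a deliberately simple assumption.

```python
import numpy as np

# Minimal sketch of surrogate-assisted design search: a cheap stand-in
# surrogate scores candidate designs, and a simple genetic loop evolves them.
# The surrogate here is a toy quadratic, not the paper's trained DNN.
rng = np.random.default_rng(3)

def surrogate(x):
    """Stand-in performance model: lower is better, optimum at x = [1, 2]."""
    return np.sum((x - np.array([1.0, 2.0])) ** 2, axis=-1)

pop = rng.uniform(-5, 5, size=(40, 2))          # candidate design parameters
for _ in range(60):
    scores = surrogate(pop)
    parents = pop[np.argsort(scores)[:20]]      # select the better half
    # Crossover: average random parent pairs; mutation: small Gaussian noise.
    pairs = rng.integers(0, 20, size=(40, 2))
    pop = (parents[pairs[:, 0]] + parents[pairs[:, 1]]) / 2
    pop += rng.normal(0, 0.1, size=pop.shape)

best = pop[np.argmin(surrogate(pop))]           # best design found
```

The speedup claimed in the abstract comes from exactly this substitution: every fitness evaluation in the loop calls the cheap surrogate instead of a full aerodynamic/structural simulation.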
47. Improved organs at risk segmentation based on modified U‐Net with self‐attention and consistency regularisation.
- Author
-
Manko, Maksym, Popov, Anton, Gorriz, Juan Manuel, and Ramirez, Javier
- Subjects
CHEST (Anatomy) ,ARTIFICIAL neural networks ,COMPUTED tomography ,RETINAL blood vessels ,IMAGE segmentation ,HEART ,ESOPHAGUS - Abstract
Cancer is one of the leading causes of death in the world, with radiotherapy as one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in Computed Tomography (CT) images is presented. The proposed approach is based on a modified U-Net architecture with a ResNet-34 encoder, which is the baseline adopted in this work. A new two-branch CS-SA U-Net architecture is proposed, consisting of two parallel U-Net models in which self-attention blocks using cosine similarity as the query-key similarity function (CS-SA blocks) are inserted between the encoder and decoder, enabling the use of consistency regularisation. The proposed solution demonstrates state-of-the-art performance for the problem of OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of the Dice coefficient (oesophagus—0.8714, heart—0.9516, trachea—0.9286, aorta—0.9510) and Hausdorff distance (oesophagus—0.2541, heart—0.1514, trachea—0.1722, aorta—0.1114), and it significantly outperforms the baseline. The current approach is demonstrated to be viable for improving the quality of OAR segmentation for radiotherapy planning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
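The CS-SA block in entry 47 replaces the dot-product query-key score with cosine similarity. A minimal single-head sketch follows; the shapes and the temperature `tau` are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of self-attention where the query-key score is cosine similarity
# rather than a scaled dot product, as in the CS-SA block described above.
# Shapes and the temperature value are illustrative assumptions.
rng = np.random.default_rng(4)
n, d = 6, 8                              # sequence length, feature dim
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
v = rng.normal(size=(n, d))

def cosine_attention(q, k, v, tau=0.1):
    """Attention weights from cosine similarity of queries and keys."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    scores = qn @ kn.T / tau             # cosine similarity, temperature-scaled
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

out, weights = cosine_attention(q, k, v)
```

Normalising queries and keys bounds the raw scores to [-1, 1], so the temperature rather than the feature dimension controls how peaked the attention distribution is.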
48. Tjong: A transformer‐based Mahjong AI via hierarchical decision‐making and fan backward.
- Author
-
Li, Xiali, Liu, Bo, Wei, Zhi, Wang, Zhaoqi, and Wu, Licheng
- Subjects
ARTIFICIAL neural networks ,ARTIFICIAL intelligence ,REINFORCEMENT learning ,DECISION making ,DEEP learning - Abstract
Mahjong, a complex game with hidden information and sparse rewards, poses significant challenges. Existing Mahjong AIs require substantial hardware resources and extensive datasets to enhance AI capabilities. The authors propose a transformer-based Mahjong AI (Tjong) via hierarchical decision-making. By utilising self-attention mechanisms, Tjong effectively captures tile patterns and game dynamics, and it decouples the decision process into two distinct stages: action decision and tile decision. This design reduces decision complexity considerably. Additionally, a fan backward technique is proposed to address the sparse rewards by allocating reversed rewards for actions based on winning hands. Tjong consists of 15M parameters and was trained on approximately 0.5M samples over 7 days of supervised learning on a single server with 2 GPUs. The action decision achieved an accuracy of 94.63%, while the claim decision attained 98.55% and the discard decision reached 81.51%. In a tournament format, Tjong outperformed other AIs (CNN, MLP, RNN, ResNet, ViT), achieving scores up to 230% higher than its opponents. Furthermore, after 3 days of reinforcement learning training, it ranked within the top 1% on the leaderboard on the Botzone platform. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
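Entry 48's fan backward technique allocates reversed rewards to earlier actions based on the winning hand, but the abstract does not give the exact rule. The sketch below shows one plausible reading: distributing a terminal fan score backward with exponential decay so later actions receive larger shares; the `decay` value and scores are hypothetical.

```python
import numpy as np

# Hypothetical reading of "fan backward": densify the sparse terminal reward
# by assigning exponentially decayed shares of the winning-hand fan score to
# each earlier action in the episode. The decay rate is an assumption.
def fan_backward(n_actions, fan_score, decay=0.9):
    """Assign decayed shares of the terminal score to each earlier action."""
    steps = np.arange(n_actions)
    weights = decay ** (n_actions - 1 - steps)   # later actions weigh more
    return fan_score * weights

rewards = fan_backward(n_actions=5, fan_score=8.0)
```

This mirrors the standard discounted-return trick for sparse-reward games: the final action keeps the full fan score, while earlier actions get geometrically smaller credit.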
49. Hybrid convolutional long short‐term memory models for sales forecasting in retail.
- Author
-
de Castro Moraes, Thais, Yuan, Xue‐Ming, and Chew, Ek Peng
- Subjects
SALES forecasting ,CONVOLUTIONAL neural networks ,ARTIFICIAL neural networks ,RETAIL industry ,DEEP learning ,COMPUTATIONAL complexity - Abstract
This study proposes novel sales forecasting approaches that merge deep learning methods in a hybrid model. Long short-term memory (LSTM) is adopted for modeling the temporal characteristics of the data, whereas the convolutional neural network (CNN) focuses on identifying and extracting relevant exogenous information. We propose stacked (S-CNN-LSTM) and parallel (P-CNN-LSTM) hybrid architectures to understand complex time series data with varying seasonal patterns and multiple product correlations. The performance drivers of both architectures were empirically tested on a real-world multivariate retail dataset, and both architectures outperformed simple neural network architectures and standard autoregressive methods for short- and long-term forecasting horizons. When compared with traditional predictive approaches, the proposed hybrid models reduce computational complexity while providing flexibility and robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Optimizing credit card fraud detection: a deep learning approach to imbalanced datasets.
- Author
-
Ndama, Oussama, Bensassi, Ismail, and En-Naimi, El Mokhtar
- Subjects
ARTIFICIAL neural networks ,CREDIT card fraud ,FRAUD investigation ,CREDIT risk ,MACHINE learning - Abstract
Imbalanced datasets pose a significant challenge in credit card fraud detection, hindering the training effectiveness of models due to the scarcity of fraudulent cases. This study addresses the critical problem of data imbalance through an in-depth exploration of techniques, including cross-entropy loss minimization, weighted optimization, and synthetic minority oversampling technique (SMOTE)-based resampling, coupled with deep neural networks (DNNs). The urgent need to combat class imbalances in credit card fraud datasets is underscored, emphasizing the creation of reliable detection models. The research method delves into the application of DNNs, strategically optimizing and resampling the dataset to enhance model performance. The study employs a dataset from October 2018, containing 284,807 transactions, of which a mere 492 are classified as fraudulent. Various resampling techniques, such as undersampling and SMOTE oversampling, are evaluated alongside weighted optimization. The results showcase the effectiveness of SMOTE oversampling, achieving an accuracy of 99.83% without any false negatives. The study concludes by advocating for flexible strategies, integrating cutting-edge machine learning methods, and developing adaptive defenses to safeguard against emerging financial risks in credit card fraud detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
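Entry 50 relies on SMOTE to rebalance the fraud class. The sketch below implements the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) in plain numpy rather than calling the imbalanced-learn library; the data and parameters are illustrative.

```python
import numpy as np

# Minimal SMOTE-style oversampling sketch: synthesize minority samples by
# interpolating between a minority point and one of its nearest minority
# neighbours. This is a simplification of the SMOTE used in the study.
rng = np.random.default_rng(5)
minority = rng.normal(loc=[3, 3], scale=0.5, size=(20, 2))   # rare class

def smote_like(X, n_new, k=5, rng=rng):
    """Generate n_new synthetic points along minority-minority segments."""
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        # k nearest minority neighbours of X[i] (excluding itself).
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]
        j = rng.choice(nn)
        lam = rng.random()                      # interpolation factor in [0, 1]
        synth.append(X[i] + lam * (X[j] - X[i]))
    return np.array(synth)

synthetic = smote_like(minority, n_new=100)
```

Because each synthetic point is a convex combination of two minority samples, the generated data stays inside the minority region rather than duplicating existing points as plain oversampling would.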