15,142 results for "deep Neural Networks"
Search Results
2. DONN: leveraging heterogeneous outer products for CTR prediction.
- Author
Kim, Tae-Suk
- Subjects
ARTIFICIAL neural networks, MATRIX decomposition, RECOMMENDER systems, QUADRATIC forms, VANILLA, DEEP learning
- Abstract
A primary strategy for constructing click-through rate models based on deep learning involves combining a multi-layer perceptron (MLP) with custom networks that can effectively capture the interactions between different features. This is due to the widespread recognition that relying solely on a vanilla MLP network is not effective in acquiring knowledge about multiplicative feature interactions. These custom networks often employ product methods, such as inner, Hadamard, and outer products, to construct dedicated architectures for this purpose. Among these methods, the outer product has shown superiority in capturing feature interactions. However, the resulting quadratic form from the outer product operation limits the conveyance of informative higher-order interactions to the MLP. Efforts to address this limitation have led to models attempting to increase interaction degrees to higher orders. However, utilizing matrix factorization techniques to reduce learning parameters has resulted in information loss and decreased performance. Furthermore, previous studies have constrained the MLP's potential by providing it with inputs consisting of homogeneous outer products, thus limiting available information diversity. To overcome these challenges, we introduce DONN, a model that leverages a composite-wise bilinear module incorporating factorized bilinear pooling to mitigate information loss and facilitate higher-order interaction development. Additionally, DONN utilizes a feature-wise bilinear module for outer product computations between feature pairs, augmenting the MLP with combined information. By employing heterogeneous outer products, DONN enhances the MLP's prediction capabilities, enabling the recognition of additional nonlinear interdependencies. Our evaluation on two benchmark datasets demonstrates that DONN surpasses state-of-the-art models in terms of performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
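The DONN abstract above (entry 2) contrasts inner, Hadamard, and outer products as ways of exposing feature interactions to an MLP. A minimal numpy sketch, not the DONN architecture itself, of why the outer product hands the MLP a full quadratic form rather than a single scalar or an element-wise vector (field names and dimensions are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                     # embedding size (illustrative)
e_user = rng.normal(size=d)               # embedding of a hypothetical "user" field
e_item = rng.normal(size=d)               # embedding of a hypothetical "item" field

inner = float(e_user @ e_item)            # one scalar summarizes the interaction
hadamard = e_user * e_item                # d element-wise products
outer = np.outer(e_user, e_item)          # d x d quadratic form: every pairwise product

# A vanilla MLP only sees what is concatenated into its input vector, so feeding it
# the flattened outer product exposes every cross-term explicitly.
mlp_input = np.concatenate([e_user, e_item, outer.ravel()])
print(hadamard.shape, outer.shape, mlp_input.shape)
```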
3. Nonlinear continuum thermo-mechanics of the composite thick panel used in concrete ceilings via general expression of the polynomial series theory.
- Author
Chen, Haojie, Zeng, Jie, El-Meligy, Mohammed A., and Sharaf, Mohamed
- Subjects
ARTIFICIAL neural networks, CONCRETE durability, VIBRATION of buildings, NONLINEAR equations, SHEAR (Mechanics)
- Abstract
Proper concrete vibration is vital to the final quality and durability of concrete structures. There is a lack of objective methods to assess conformance to the vibration-behavior requirements of concrete systems used in various engineering structures. Therefore, in the current work, nonlinear vibrations of a composite thick panel made of concrete materials are presented for the first time. The GPLs are used to reinforce the thick concrete panel in the length direction, and, according to the higher-order shear deformation shell theory, the effects of the von Kármán strain-displacement kinematic nonlinearity are included in the constitutive laws of the shell. The nonlinear governing equations for various nonlinear boundary edges are solved via discretization of the equations on the space domain, derivation of Duffing-type equations, and Hadamard and Kronecker products. The results are validated by comparing them with deep neural networks (DNN) and open-source results in the literature. For the DNN, a supervised neural network based on physical information is introduced to predict the vibrational behavior of the current system; in this context, data-driven solutions and data-driven discovery are presented to solve the problem of determining the nonlinear frequency. Consequently, new results are presented to show the effects of the Pasternak modulus, the Winkler modulus, and the GPLs' weight fraction on the nonlinear vibrations of concrete panels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Dynamic training for handling textual label noise.
- Author
Cheng, Shaohuan, Chen, Wenyu, Liu, Wanlong, Zhou, Li, Zhao, Honglin, Kong, Weishan, Qu, Hong, and Fu, Mingsheng
- Subjects
ARTIFICIAL neural networks, NOISE, MEMORIZATION, GENERALIZATION, SPINE
- Abstract
Label noise causes deep neural networks to gradually memorize incorrect labels, leading to a decline in generalization. In this paper, based on three observations from learning behavior in textual noise scenarios, we propose a dynamic training method to enhance model robustness and generalization against textual label noise. This method corrects noisy labels by dynamically incorporating the model's predictions. The combination weight of the original labels is a decay function on training time, which relates to the learning dynamics. Additionally, our method introduces r-drop and prior regularization terms to ensure that the single-model backbone generates reliable predictions, thereby obtaining accurate corrected labels. This design removes the stage splitting and data segmentation required by existing SOTA methods and effectively mitigates the adverse impact of erroneous labels without introducing additional dependencies. Experimental results on four text classification datasets demonstrate that dynamic training outperforms strong baselines designed for class-conditional and instance-dependent noises within the common noise range. Our code is available at https://github.com/shaohuancheng/noisy_label_for_exp_decay. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
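Entry 4 above corrects noisy labels by blending the given label with the model's current prediction, with the original label's weight decaying over training time. A minimal sketch of that general idea; the exponential schedule, decay rate, and function names are assumptions rather than the paper's exact formulation:

```python
import numpy as np

def corrected_label(one_hot_label, model_prob, step, decay_rate=0.01):
    """Blend the (possibly noisy) given label with the model's prediction.

    The weight on the given label decays with training time, so early in training
    the provided label dominates and later the model's own prediction takes over.
    """
    w = np.exp(-decay_rate * step)            # illustrative decay schedule
    return w * one_hot_label + (1.0 - w) * model_prob

noisy = np.array([0.0, 1.0, 0.0])             # label as given (possibly wrong)
pred = np.array([0.7, 0.2, 0.1])              # current model prediction
for step in (0, 50, 200):
    print(step, np.round(corrected_label(noisy, pred, step), 3))
```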
5. Weaker number sense accounts for impaired numerosity perception in dyscalculia: Behavioral and computational evidence.
- Author
Dolfi, Serena, Decarli, Gisella, Lunardon, Maristella, De Filippo De Grazia, Michele, Gerola, Silvia, Lanfranchi, Silvia, Cossu, Giuseppe, Sella, Francesco, Testolin, Alberto, and Zorzi, Marco
- Subjects
ARTIFICIAL neural networks, EXECUTIVE function, ARITHMETIC mean, ACALCULIA, NUMBER systems, COGNITION
- Abstract
Impaired numerosity perception in developmental dyscalculia (low "number acuity") has been interpreted as evidence of reduced representational precision in the neurocognitive system supporting non-symbolic number sense. However, recent studies suggest that poor numerosity judgments might stem from stronger interference from non-numerical visual information, in line with alternative accounts that highlight impairments in executive functions and visuospatial abilities in the etiology of dyscalculia. To resolve this debate, we used a psychophysical method designed to disentangle the contribution of numerical and non-numerical features to explicit numerosity judgments in a dot comparison task and we assessed the relative saliency of numerosity in a spontaneous categorization task. Children with dyscalculia were compared to control children with average mathematical skills matched for age, IQ, and visuospatial memory. In the comparison task, the lower accuracy of dyscalculics compared to controls was linked to weaker encoding of numerosity, but not to the strength of non-numerical biases. Similarly, in the spontaneous categorization task, children with dyscalculia showed a weaker number-based categorization compared to the control group, with no evidence of a stronger influence of non-numerical information on category choice. Simulations with a neurocomputational model of numerosity perception showed that the reduction of representational resources affected the progressive refinement of number acuity, with little effect on non-numerical bias in numerosity judgments. Together, these results suggest that impaired numerosity perception in dyscalculia cannot be explained by increased interference from non-numerical visual cues, thereby supporting the hypothesis of a core number sense deficit. Research Highlights: A strongly debated issue is whether impaired numerosity perception in dyscalculia stems from a deficit in number sense or from poor executive and visuospatial functions. Dyscalculic children show reduced precision in visual numerosity judgments and weaker number-based spontaneous categorization, but no increased reliance on continuous visual properties. Simulations with deep neural networks demonstrate that reduced neural/computational resources affect the developmental trajectory of number acuity and account for impaired numerosity judgments. Our findings show that weaker number acuity in developmental dyscalculia is not necessarily related to increased interference from non-numerical visual cues. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Object-centric Learning with Capsule Networks: A Survey.
- Author
De Sousa Ribeiro, Fabio, Duarte, Kevin, Everett, Miles, Leontidis, Georgios, and Shah, Mubarak
- Published
- 2024
- Full Text
- View/download PDF
7. Low-dimensional intrinsic dimension reveals a phase transition in gradient-based learning of deep neural networks.
- Author
Tan, Chengli, Zhang, Jiangshe, Liu, Junmin, and Zhao, Zixiang
- Abstract
Deep neural networks complete a feature extraction task by propagating the inputs through multiple modules. However, how the representations evolve with the gradient-based optimization remains unknown. Here we leverage the intrinsic dimension of the representations to study the learning dynamics and find that the training process undergoes a phase transition from expansion to compression under disparate training regimes. Surprisingly, this phenomenon is ubiquitous across a wide variety of model architectures, optimizers, and data sets. We demonstrate that the variation in the intrinsic dimension is consistent with the complexity of the learned hypothesis, which can be quantitatively assessed by the critical sample ratio that is rooted in adversarial robustness. Meanwhile, we mathematically show that this phenomenon can be analyzed in terms of the mutable correlation between neurons. Although the evoked activities obey a power-law decaying rule in biological circuits, we identify that the power-law exponent of the representations in deep neural networks predicted adversarial robustness well only at the end of the training but not during the training process. These results together suggest that deep neural networks are prone to producing robust representations by adaptively eliminating or retaining redundancies. The code is publicly available at https://github.com/cltan023/learning2022. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
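Entry 7 tracks the intrinsic dimension of hidden representations during training. The abstract does not say which estimator is used; one common choice for neural representations is the TwoNN estimator of Facco et al., which needs only the ratio of each point's first and second nearest-neighbour distances. A sketch under that assumption:

```python
import numpy as np

def twonn_intrinsic_dimension(X):
    """TwoNN estimate: mu = r2 / r1 is Pareto-distributed with shape equal to the
    intrinsic dimension, whose maximum-likelihood estimate is N / sum(log mu)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nearest = np.sort(dists, axis=1)[:, :2]          # r1, r2 for every point
    mu = nearest[:, 1] / nearest[:, 0]
    return len(X) / np.sum(np.log(mu))

# A 3-D Gaussian cloud linearly embedded in 20 ambient dimensions: the estimate
# should come out close to 3 even though each vector has 20 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 20))
print(round(twonn_intrinsic_dimension(X), 2))
```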
8. PRE-DNNOFF: ON-DEMAND DNN MODEL OFFLOADING METHOD FOR MOBILE EDGE COMPUTING.
- Author
LIN ZUO
- Subjects
ARTIFICIAL neural networks, MOBILE computing, EDGE computing, INTELLIGENT networks, GENETIC algorithms
- Abstract
Deep Neural Networks (DNNs) are critical for modern intelligent processing but cause significant latency and energy consumption issues on mobile devices due to their high computational demands. Moreover, different tasks have different accuracy demands for DNN inference. To balance latency and accuracy across various tasks, we introduce PreDNNOff, a method that offloads DNNs at a layer granularity within the Mobile Edge Computing (MEC) environment. PreDNNOff utilizes a binary stochastic programming model and Genetic Algorithms (GAs) to optimize the expected latency for multiple exit points based on the distribution of task inference accuracy and layer latency regression models. Compared to the existing method Edgent, PreDNNOff achieves a reduction of about 10% in expected total latency and, because it accounts for different tasks' varying accuracy requirements, offers broader applicability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
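Entry 8 partitions a DNN at layer granularity between a mobile device and an edge server. The paper's actual formulation (binary stochastic programming over multiple exit points, searched with a genetic algorithm) is richer than this, but the toy sketch below shows the underlying trade-off: pick the split layer that minimizes device time plus upload time plus edge time. All latency and size numbers are invented:

```python
# Toy layer-partitioning sketch; every number below is invented for illustration.
device_ms = [5.0, 8.0, 40.0, 60.0, 30.0]        # per-layer latency on the mobile device
edge_ms = [0.5, 0.8, 4.0, 6.0, 3.0]             # per-layer latency on the edge server
transfer_mb = [8.0, 2.0, 1.0, 4.0, 0.5, 0.0]    # data shipped if the first i layers stay local
bandwidth_mb_per_s = 10.0

def total_latency(split):
    """Run layers [0, split) on the device, upload the intermediate tensor, finish at the edge."""
    upload_ms = transfer_mb[split] / bandwidth_mb_per_s * 1000.0
    return sum(device_ms[:split]) + upload_ms + sum(edge_ms[split:])

best = min(range(len(device_ms) + 1), key=total_latency)
print("best split after layer", best, "->", round(total_latency(best), 1), "ms")
```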
9. NeuroTAP: Thermal and Memory Access Pattern-Aware Data Mapping on 3D DRAM for Maximizing DNN Performance.
- Author
Pandey, Shailja and Panda, Preeti Ranjan
- Subjects
ARTIFICIAL neural networks, VERTICAL integration, DATA mapping, POWER density, DYNAMIC random access memory, MEMORY
- Abstract
Deep neural networks (DNNs) have been widely adopted, owing to break-through performance and high accuracy. DNNs exhibit varying memory behavior involving specific and recognizable memory access patterns and access intensity, depending on the selected data reuse in different layers. Such applications have high memory bandwidth demands due to aggressive computations, performing several billion-floating-point-operations-per-second (BFLOPs). 3D DRAMs, providing very high memory access bandwidth, are extensively employed to break the memory wall, bridging the gap between compute and memory while running DNNs. However, the vertical integration in 3D DRAM introduces serious thermal issues, resulting from high power density and close proximity of memory cells, and requires dynamic thermal management (DTM). To unleash the true potential of 3D DRAM and exploit the enormous bandwidth under thermal constraints, there is a need to intelligently map the DNN application's data across memory channels, pseudo-channels, and banks, minimizing the effective memory latency and reducing the thermal-induced application slowdown. The specific memory access patterns exhibited by a DNN layer execution are crucial to determine a favorable data mapping method for 3D DRAM dies that potentially causes minimal thermal impact and also maximizes DRAM bandwidth utilization. In this work, we propose an application-aware and thermal-sensitive data mapping that intelligently assigns portions of the 3D DRAM to DNN layers, leveraging the knowledge about layer's memory access patterns and minimizing DTM-induced performance overheads. Additionally, we also deploy a DRAM low-power states based DTM mechanism to keep the 3D DRAM within safe thermal limits. Using our proposal, we observe a performance improvement of 1% to 61%, and memory energy savings of 1% to 55% for popular DNNs over state-of-the-art DTM strategies while running DNN inference. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Optimization Strategies for Urban Waterlogging Warning in Complex Environments: Based on Particle Swarm Optimization and Deep Neural Networks.
- Author
Hu, Xiande, Gu, Fenfei, and Fan, Xueping
- Subjects
ARTIFICIAL neural networks, COMPUTER input design, PARTICLE swarm optimization, BAYESIAN analysis, NONLINEAR equations
- Abstract
Waterlogging warning has gradually become an important means of urban waterlogging prevention and control. However, current urban waterlogging warning models still suffer from problems such as low accuracy, poor real-time performance, and poor model convergence. To better address these issues, this article combines particle swarm optimization (PSO) and deep neural networks (DNN) to explore the construction of early warning models in depth. First, the influencing factors of urban waterlogging were analyzed, and the PSO algorithm was used to determine the influencing factors considered in this study; then, the selected influencing factors were used as input data to design a backpropagation (BP) neural network (NN) structure, and several representative waterlogging points were selected to construct a BP NN model and perform fitting analysis. Afterward, the constructed model was trained and optimized in combination with the PSO algorithm. The old town of Hefei City is used as the experimental area, and the constructed model is used to conduct early-warning research on waterlogging. The study's findings indicate that the PSO + BP model's average accuracy in 10 early warning tests is as high as 97.95%, with a response time of only 0.022 ms; the average accuracy and response time of the BP model are 89.06% and 0.255 ms, respectively, and those of the Bayesian network (BN) model are 82.78% and 0.275 ms. Through the analysis of actual cases in the old urban area of Hefei City, the advantages of this model in practical application were verified, and a new intelligent warning method for urban waterlogging prevention and control was provided, demonstrating its effectiveness and potential in dealing with complex nonlinear problems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
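Entry 10 uses particle swarm optimization both to pick input factors and to tune the BP network. A bare-bones PSO loop on a toy objective, standing in for "tune the network's weights"; the inertia and acceleration constants are generic textbook values, not the paper's settings:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer with a global-best topology."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(n_particles, dim))     # particle positions
    v = np.zeros_like(x)                                     # particle velocities
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# Toy stand-in for the real objective: minimize a shifted sphere function.
sol, val = pso_minimize(lambda p: np.sum((p - 1.7) ** 2), dim=5)
print(np.round(sol, 3), round(float(val), 6))
```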
11. Deep neural network-based robotic visual servoing for satellite target tracking.
- Author
Ghiasvand, Shayan, Xie, Wen-Fang, and Mohebbi, Abolfazl
- Subjects
ARTIFICIAL neural networks, ROBOT vision, ARTIFICIAL satellite tracking, DEEP learning, SPACE stations
- Abstract
In response to the costly and error-prone manual satellite tracking on the International Space Station (ISS), this paper presents a deep neural network (DNN)-based robotic visual servoing solution to the automated tracking operation. This innovative approach directly addresses the critical issue of motion decoupling, which poses a significant challenge in current image moment-based visual servoing. The proposed method uses DNNs to estimate the manipulator's pose, resulting in a significant reduction of coupling effects, which enhances control performance and increases tracking precision. Real-time experimental tests are carried out using a 6-DOF Denso manipulator equipped with an RGB camera and an object, mimicking the targeting pin. The test results demonstrate a 32.04% reduction in pose error and a 21.67% improvement in velocity precision compared to conventional methods. These findings demonstrate that the method has the potential to improve efficiency and accuracy significantly in satellite target tracking and capturing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Artificial intelligence-Enabled deep learning model for multimodal biometric fusion.
- Author
Byeon, Haewon, Raina, Vikas, Sandhu, Mukta, Shabaz, Mohammad, Keshta, Ismail, Soni, Mukesh, Matrouk, Khaled, Singh, Pavitar Parkash, and Lakshmi, T. R. Vijaya
- Subjects
ARTIFICIAL neural networks, ARTIFICIAL intelligence, INFORMATION technology security, DEEP learning, DATA quality, BIOMETRIC identification, MULTIMODAL user interfaces, BIOMETRY
- Abstract
The goal of information security is to prevent unauthorized access to data. There are several conventional ways to confirm user identity, such as using a password, user name, and keys. These conventional methods are rather limited; they can be stolen, lost, copied, or cracked. Because multimodal biometric identification systems are more secure and have higher recognition efficiency than unimodal biometric systems, they have attracted considerable attention. Single-modal biometric recognition systems perform poorly in real-world public security operations because of poor biometric data quality. Some of the drawbacks of current multimodal fusion methods include low generalization and single-level fusion. This study presents a novel multimodal biometric fusion model that significantly enhances accuracy and generalization through the power of artificial intelligence. Various fusion methods, encompassing pixel-level, feature-level, and score-level fusion, are seamlessly integrated through deep neural networks. At the pixel level, we employ spatial, channel, and intensity fusion strategies to optimize the fusion process. On the feature level, modality-specific branches and jointly optimized representation layers establish robust dependencies between modalities through backpropagation. Finally, intelligent fusion techniques, such as Rank-1 and modality evaluation, are harnessed to blend matching scores at the score level. To validate the model's effectiveness, we construct a virtual homogeneous multimodal dataset using simulated operational data. Experimental results showcase significant improvements compared to single-modal algorithms, with a remarkable 2.2 percentage point increase in accuracy achieved through multimodal feature fusion. The score fusion method surpasses single-modal algorithms by 3.5 percentage points, reaching an impressive retrieval accuracy of 99.6%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. A Stock Prediction Method Based on Multidimensional and Multilevel Feature Dynamic Fusion.
- Author
Dong, Yuxin and Hao, Yongtao
- Subjects
ARTIFICIAL neural networks, STOCK prices, FINANCIAL markets, PRICES, GOVERNMENT policy
- Abstract
Stock price prediction has long been a topic of interest in academia and the financial industry. Numerous factors influence stock prices, such as a company's performance, industry development, national policies, and other macroeconomic factors. These factors are challenging to quantify, making predicting stock price movements difficult. This paper presents a novel deep neural network framework that leverages the dynamic fusion of multi-dimensional and multi-level features for stock price prediction, which means we utilize fundamental trading data and technical indicators as multi-dimensional data and local and global multi-level information. Firstly, the model dynamically assigns weights to multi-dimensional features of stocks to capture the impact of each feature on stock prices. Next, it applies the Fourier transform to the global features to capture the long-term trends of the global environment and dynamically fuses these with local and global features of the stocks to capture the overall market environment's impact on individual stocks. Finally, temporal features are captured using an attention layer and an RNN-based model, which incorporates historical price data to forecast future prices. Experiments on stocks from various industries within the Chinese CSI 300 index reveal that the proposed model outperforms traditional methods and other deep learning approaches in terms of stock price prediction. This paper proposes a method that facilitates the dynamic integration of multi-dimensional and multi-level features in an efficient manner and experimental results show that it improves the accuracy of stock price predictions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Strategic safeguarding: A game theoretic approach for analyzing attacker-defender behavior in DNN backdoors.
- Author
Kallas, Kassem, Le Roux, Quentin, Hamidouche, Wassim, and Furon, Teddy
- Subjects
ARTIFICIAL neural networks, ZERO sum games, UTILITY functions, NASH equilibrium, GAME theory
- Abstract
Deep neural networks (DNNs) are fundamental to modern applications like face recognition and autonomous driving. However, their security is a significant concern due to various integrity risks, such as backdoor attacks. In these attacks, compromised training data introduce malicious behaviors into the DNN, which can be exploited during inference or deployment. This paper presents a novel game-theoretic approach to model the interactions between an attacker and a defender in the context of a DNN backdoor attack. The contribution of this approach is multifaceted. First, it models the interaction between the attacker and the defender using a game-theoretic framework. Second, it designs a utility function that captures the objectives of both parties, integrating clean data accuracy and attack success rate. Third, it reduces the game model to a two-player zero-sum game, allowing for the identification of Nash equilibrium points through linear programming and a thorough analysis of equilibrium strategies. Additionally, the framework provides varying levels of flexibility regarding the control afforded to each player, thereby representing a range of real-world scenarios. Through extensive numerical simulations, the paper demonstrates the validity of the proposed framework and identifies insightful equilibrium points that guide both players in following their optimal strategies under different assumptions. The results indicate that fully using attack or defense capabilities is not always the optimal strategy for either party. Instead, attackers must balance inducing errors and minimizing the information conveyed to the defender, while defenders should focus on minimizing attack risks while preserving benign sample performance. These findings underscore the effectiveness and versatility of the proposed approach, showcasing optimal strategies across different game scenarios and highlighting its potential to enhance DNN security against backdoor attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
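Entry 14 reduces the attacker-defender interaction to a two-player zero-sum game whose Nash equilibrium is found by linear programming. For a plain matrix game (the payoff matrix below is invented, not taken from the paper), the row player's equilibrium mixed strategy and the game value follow from the standard LP formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Invented payoffs: entry [i, j] is the row player's gain for pure strategies (i, j).
A = np.array([[1.0, -2.0, 3.0],
              [-1.0, 3.0, -2.0],
              [2.0, -1.0, 0.0]])
n_rows, n_cols = A.shape

# Variables (x_1..x_n, v): maximize v subject to A^T x >= v, sum(x) = 1, x >= 0.
c = np.zeros(n_rows + 1)
c[-1] = -1.0                                           # linprog minimizes, so minimize -v
A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])         # -A^T x + v <= 0 for every column
b_ub = np.zeros(n_cols)
A_eq = np.concatenate([np.ones(n_rows), [0.0]])[None, :]
b_eq = [1.0]
bounds = [(0, None)] * n_rows + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
strategy, value = res.x[:-1], res.x[-1]
print("equilibrium row strategy:", np.round(strategy, 3), "game value:", round(value, 3))
```

The column player's equilibrium strategy comes from the analogous LP that minimizes the value subject to A y <= v.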
15. Information FOMO: The Unhealthy Fear of Missing Out on Information—A Method for Removing Misleading Data for Healthier Models.
- Author
Pickering, Ethan and Sapsis, Themistoklis P.
- Subjects
ARTIFICIAL neural networks, KRIGING, MACHINE learning, DATA modeling, EXPERIMENTAL design
- Abstract
Misleading or unnecessary data can have out-sized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset while ignoring data that are either misleading or bring unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data lead to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent". We find these instabilities are a result of the complexity of the underlying map and are linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. One-dimensional deep learning driven geospatial analysis for flash flood susceptibility mapping: a case study in North Central Vietnam.
- Author
Hoa, Pham Viet, Binh, Nguyen An, Hong, Pham Viet, An, Nguyen Ngoc, Thao, Giang Thi Phuong, Hanh, Nguyen Cao, Ngo, Phuong Thao Thi, and Bui, Dieu Tien
- Subjects
ARTIFICIAL neural networks, GEOSPATIAL data, SUPPORT vector machines, DEEP learning, NATURAL disasters
- Abstract
Flash floods rank among the most catastrophic natural disasters worldwide, inflicting severe socio-economic, environmental, and human impacts. Consequently, accurately identifying areas at potential risk is of paramount importance. This study investigates the efficacy of Deep 1D-Convolutional Neural Networks (Deep 1D-CNN) in spatially predicting flash floods, with a specific focus on the frequent tropical cyclone-induced flash floods in Thanh Hoa province, North Central Vietnam. The Deep 1D-CNN was structured with four convolutional layers, two pooling layers, one flattened layer, and two fully connected layers, employing the ADAM algorithm for optimization and Mean Squared Error (MSE) for loss calculation. A geodatabase containing 2540 flash flood locations and 12 influencing factors was compiled using multi-source geospatial data. The database was used to train and check the model. The results indicate that the Deep 1D-CNN model achieved high predictive accuracy (90.2%), along with a Kappa value of 0.804 and an AUC (Area Under the Curve) of 0.969, surpassing the benchmark models such as SVM (Support Vector Machine) and LR (Logistic Regression). The study concludes that the Deep 1D-CNN model is a highly effective tool for modeling flash floods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
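Entry 16 spells out the Deep 1D-CNN's layout: four convolutional layers, two pooling layers, one flatten layer, and two fully connected layers, trained with ADAM on an MSE loss over 12 influencing factors. A PyTorch sketch matching that description; the channel widths, kernel sizes, and output activation are my assumptions:

```python
import torch
import torch.nn as nn

class Flood1DCNN(nn.Module):
    """1-D CNN: 4 conv layers, 2 pooling layers, flatten, 2 fully connected layers."""
    def __init__(self, n_factors=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                                   # 12 factors -> 6
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),                                   # 6 -> 3
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(64 * (n_factors // 4), 32), nn.ReLU(),
                                  nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, x):                                      # x: (batch, 1, n_factors)
        return self.head(self.features(x))

model = Flood1DCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)      # ADAM, as in the abstract
loss_fn = nn.MSELoss()                                         # MSE loss, as in the abstract

x = torch.randn(8, 1, 12)                                      # 8 locations x 12 factors
y = torch.rand(8, 1)                                           # susceptibility targets in [0, 1]
loss = loss_fn(model(x), y)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```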
17. Improving Localization in Wireless Sensor Networks for the Internet of Things Using Data Replication-Based Deep Neural Networks.
- Author
Esheh, Jehan and Affes, Sofiene
- Subjects
ARTIFICIAL neural networks, WIRELESS sensor networks, DATA augmentation, STANDARD deviations, WIRELESS localization, LOCALIZATION (Mathematics)
- Abstract
Localization is one of the most challenging problems in wireless sensor networks (WSNs), primarily driven by the need to develop an accurate and cost-effective localization system for Internet of Things (IoT) applications. While machine learning (ML) algorithms have been widely applied in various WSN-based tasks, their effectiveness is often compromised by limited training data, leading to issues such as overfitting and reduced accuracy, especially when the number of sensor nodes is low. A key strategy to mitigate overfitting involves increasing both the quantity and diversity of the training data. To address the limitations posed by small datasets, this paper proposes an intelligent data augmentation strategy (DAS)-based deep neural network (DNN) that enhances the localization accuracy of WSNs. The proposed DAS replicates the estimated positions of unknown nodes generated by the Dv-hop algorithm and introduces Gaussian noise to these replicated positions, creating multiple modified datasets. By combining the modified datasets with the original training data, we significantly increase the dataset size, which leads to a substantial reduction in normalized root mean square error (NRMSE). The experimental results demonstrate that this data augmentation technique significantly improves the performance of DNNs compared to the traditional Dv-hop algorithm at a low number of nodes while maintaining an efficient computational cost for data augmentation. Therefore, the proposed method provides a scalable and effective solution for enhancing the localization accuracy of WSNs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
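Entry 17's DAS replicates the Dv-hop position estimates and jitters the copies with Gaussian noise before training the DNN. A numpy sketch of that augmentation step plus the NRMSE metric; the noise level, number of replicas, and normalization constant are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_positions(est_positions, n_copies=5, sigma=0.5):
    """Replicate Dv-hop estimates and perturb each replica with Gaussian noise."""
    replicas = [est_positions + rng.normal(scale=sigma, size=est_positions.shape)
                for _ in range(n_copies)]
    return np.vstack([est_positions] + replicas)       # original + noisy replicas

def nrmse(pred, true, norm):
    """Root-mean-square localization error, normalized (here by the field diagonal)."""
    return np.sqrt(np.mean(np.sum((pred - true) ** 2, axis=1))) / norm

dvhop_estimates = rng.uniform(0, 100, size=(40, 2))    # 40 unknown nodes in a 100 x 100 field
training_inputs = augment_positions(dvhop_estimates)
print(dvhop_estimates.shape, "->", training_inputs.shape)
print(round(nrmse(dvhop_estimates + 1.0, dvhop_estimates, np.hypot(100, 100)), 4))
```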
18. Θ-Net: A Deep Neural Network Architecture for the Resolution Enhancement of Phase-Modulated Optical Micrographs In Silico.
- Author
Kaderuppan, Shiraz S., Sharma, Anurag, Saifuddin, Muhammad Ramadan, Wong, Wai Leong Eugene, and Woo, Wai Lok
- Subjects
ARTIFICIAL neural networks, PHASE-contrast microscopy, MICROSCOPY, OPTICAL engineering, IMAGE denoising, NEAR-field microscopy, INTERFERENCE microscopy
- Abstract
Optical microscopy is widely regarded as an indispensable tool in healthcare and manufacturing quality-control processes, although its inability to resolve structures separated by a lateral distance under ~200 nm has led to the emergence of a new field named fluorescence nanoscopy; this, too, is prone to several caveats (namely phototoxicity, interference caused by exogenous probes, and cost). In this regard, we present a triplet string of concatenated O-Net ('bead') architectures (termed 'Θ-Net' in the present study) as a cost-efficient and non-invasive approach to enhancing the resolution of non-fluorescent phase-modulated optical microscopy images in silico. The quality of the afore-mentioned enhanced resolution (ER) images was compared with that obtained via other popular frameworks (such as ANNA-PALM, BSRGAN and 3D RCAN), with the Θ-Net-generated ER images depicting an increased level of detail (unlike previous DNNs). In addition, the use of cross-domain (transfer) learning to enhance the capabilities of models trained on differential interference contrast (DIC) datasets [where phasic variations are not as prominently manifested as amplitude/intensity differences in the individual pixels, unlike phase-contrast microscopy (PCM)] has resulted in the Θ-Net-generated images closely approximating the expected (ground truth) images for both the DIC and PCM datasets. This demonstrates the viability of our current Θ-Net architecture in attaining highly resolved images under poor signal-to-noise ratios while eliminating the need for a priori PSF and OTF information, thereby potentially impacting several engineering fronts (particularly biomedical imaging and sensing, precision engineering and optical metrology). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Ptychographic phase retrieval via a deep‐learning‐assisted iterative algorithm.
- Author
Yamada, Koki, Akaishi, Natsuki, Yatabe, Kohei, and Takayama, Yuki
- Subjects
ARTIFICIAL neural networks, STAR maps (Astronomy), ACQUISITION of data, ALGORITHMS, LIGHTING
- Abstract
Ptychography is a powerful computational imaging technique with microscopic imaging capability and adaptability to various specimens. To obtain an imaging result, it requires a phase-retrieval algorithm whose performance directly determines the imaging quality. Recently, deep neural network (DNN)-based phase retrieval has been proposed to improve the imaging quality from the ordinary model-based iterative algorithms. However, the DNN-based methods have some limitations because of the sensitivity to changes in experimental conditions and the difficulty of collecting enough measured specimen images for training the DNN. To overcome these limitations, a ptychographic phase-retrieval algorithm that combines model-based and DNN-based approaches is proposed. This method exploits a DNN-based denoiser to assist an iterative algorithm like ePIE in finding better reconstruction images. This combination of DNN and iterative algorithms allows the measurement model to be explicitly incorporated into the DNN-based approach, improving its robustness to changes in experimental conditions. Furthermore, to circumvent the difficulty of collecting the training data, it is proposed that the DNN-based denoiser be trained without using actual measured specimen images but using a formula-driven supervised approach that systemically generates synthetic images. In experiments using simulation based on a hard X-ray ptychographic measurement system, the imaging capability of the proposed method was evaluated by comparing it with ePIE and rPIE. These results demonstrated that the proposed method was able to reconstruct higher-spatial-resolution images with half the number of iterations required by ePIE and rPIE, even for data with low illumination intensity. Also, the proposed method was shown to be robust to its hyperparameters. In addition, the proposed method was applied to ptychographic datasets of a Siemens star chart and ink toner particles measured at SPring-8 BL24XU, which confirmed that it can successfully reconstruct images from measurement scans with a lower overlap ratio of the illumination regions than is required by ePIE and rPIE. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Deep learning for ultrasound medical images: artificial life variant.
- Author
Karunanayake, Nalan and Makhanov, Stanislav S.
- Subjects
ARTIFICIAL neural networks, COMPUTER-assisted image analysis (Medicine), IMAGE segmentation, BREAST ultrasound, DIAGNOSTIC imaging, DEEP learning
- Abstract
Segmentation of tumors in the ultrasound (US) images of the breast is a critical problem in medical imaging. Due to the poor quality of the US images and varying specifications of the US machines, the segmentation and classification of the abnormalities present difficulties even for trained radiologists. Nevertheless, the US remains one of the most reliable and inexpensive tests. Recently, an artificial life (ALife) model based on tracing agents and fusion of the US and the elasticity images (F-ALife) has been proposed and analyzed. Under certain conditions, F-ALife outperforms state-of-the-art including the selected deep learning (DL) models, deformable models, machine learning, contour grouping and superpixels. Apart from the improved accuracy, F-ALife requires smaller training sets. The strongest competitors of the F-ALife are hybrids of the DL with conventional models. However, the current DL methods require a large amount of data (thousands of annotated images), which often is not available. Moreover, the hybrids require that the conventional model is properly integrated into the DL. Therefore, we offer a new DL-based hybrid with ALife. It is characterized by a high accuracy, requires a relatively small dataset, and is capable of handling previously unseen data. The new ideas include (1) a special image mask to guide ALife. The mask is generated using DL and the distance transform, (2) modification of ALife for segmentation of the US images providing a high accuracy. (These ideas are motivated by the "vehicles" of Braitenberg (Vehicles, experiments in synthetic psychology, MIT Press, Cambridge, 1984) and ALife proposed in Karunanayake et al. (Pattern Recognit 108838, 2022), (3) a two-level genetic algorithm which includes training by an individual image and by the entire set of images. The training employs an original categorization of the images based on the properties of the edge maps. The efficiency of the algorithm is demonstrated on complex tumors. The method combines the strengths of the DL neural networks with the speed and interpretability of ALife. The tests based on the characteristics of the edge map and complexity of the tumor shape show the advantages of the proposed DL-ALife. The model outperforms 14 state-of-the-art algorithms applied to the US images characterized by a complex geometry. Finally, the novel classification allows us to test and analyze the limitations of the DL for the processing of the unseen data. The code is applicable to breast cancer diagnostics (Automated Breast Ultra Sound), US-guided biopsies as well as to projects related to automatic breast scanners. A video demo is at https://tinyurl.com/3xthedff. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Development of a Machine Learning (ML)-Based Computational Model to Estimate the Engineering Properties of Portland Cement Concrete (PCC).
- Author
Polo-Mendoza, Rodrigo, Martinez-Arguelles, Gilberto, Peñabaena-Niebles, Rita, and Duque, Jose
- Subjects
ARTIFICIAL neural networks, CONCRETE construction, CONSTRUCTION materials, PORTLAND cement, ENGINEERING models
- Abstract
Portland cement concrete (PCC) is the construction material most used worldwide. Hence, its proper characterization is fundamental for the daily-basis engineering practice. Nonetheless, the experimental measurements of the PCC's engineering properties (i.e., Poisson's Ratio -v-, Elastic Modulus -E-, Compressive Strength -ComS-, and Tensile Strength -TenS-) consume considerable amounts of time and financial resources. Therefore, the development of high-precision indirect methods is fundamental. Accordingly, this research proposes a computational model based on deep neural networks (DNNs) to simultaneously predict the v, E, ComS, and TenS. For this purpose, the Long-Term Pavement Performance database was employed as the data source. In this regard, the mix design parameters of the PCC are adopted as input variables. The performance of the DNN model was evaluated with 1:1 lines, goodness-of-fit parameters, Shapley additive explanations assessments, and running time analysis. The results demonstrated that the proposed DNN model exhibited an exactitude higher than 99.8%, with forecasting errors close to zero (0). Consequently, the machine learning-based computational model designed in this investigation is a helpful tool for estimating the PCC's engineering properties when laboratory tests are not attainable. Thus, the main novelty of this study is creating a robust model to determine the v, E, ComS, and TenS by solely considering the mix design parameters. Likewise, the central contribution to the state-of-the-art achieved by the present research effort is the public launch of the developed computational tool through an open-access GitHub repository, which can be utilized by engineers, designers, agencies, and other stakeholders. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Deep recommendation with iteration directional adversarial training.
- Author
Paul, Agyemang, Wan, Yuxuan, Wu, Zhefu, Chen, Boyu, and Gong, Shufeng
- Subjects
ARTIFICIAL neural networks, RECOMMENDER systems, CONSUMER preferences, COMPUTER vision, CONSUMER goods, DEEP learning
- Abstract
Deep neural networks are vulnerable to attacks, posing significant security concerns across various applications, particularly in computer vision. Adversarial training has demonstrated effectiveness in improving the robustness of deep learning models by incorporating perturbations into the input space during training. Recently, adversarial training has been successfully applied to deep recommender systems. In these systems, user and item embeddings are perturbed through a minimax game, with constraints on perturbation directions, to enhance the model's robustness and generalization. However, they still fail to defend against iterative attacks, which have shown an over 60% increase in effectiveness in the computer vision domain. Deep recommender systems may therefore be more susceptible to iterative attacks, which might lead to generalization failures. In this paper, we adapt iterative examples for deep recommender systems. Specifically, we propose a Deep Recommender with Iteration Directional Adversarial Training (DRIDAT) that combines an attention mechanism and directional adversarial training for recommendation. Firstly, we establish consumer-product collaborative attention to capture each consumer's different preferences for the products they are interested in and the distinct preferences of different consumers for the same product. Secondly, we train the DRIDAT objective function using adversarial learning to minimize the impact of iterative attacks. In addition, a maximum-direction attack could push the embedding vectors of attacked inputs towards instances with distinct labels; we mitigate this problem by imposing suitable constraints on the attack direction. Finally, we perform a series of evaluations on two prominent datasets. The findings show that our methodology outperforms all other methods on all metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. A Novel Non-Iterative Deep Convolutional Neural Network with Kernelized Classification for Robust Face Recognition.
- Author
Vishwakarma, Virendra P., Gupta, Reena, and Yadav, Abhay Kumar
- Abstract
Deep Convolutional Neural Networks (DCNNs) are very useful for image-based pattern classification problems because of their efficient feature extraction capabilities. Although DCNNs have good generalization performance, their applicability is limited due to slow learning speed, as they are based on iterative weight-update algorithms. This study presents a new noniterative DCNN that can be trained in real-time. The fundamental block of the proposed DCNN is fixed real number-based filters for convolution operations for multi-feature extraction. After a finite number of feature extraction layers, nonlinear kernel mapping along with pseudo-inverse is used for the classification of extracted feature vectors. The proposed DCNN, named Deep Convolutional Kernelized Classification (DCKC), is noniterative, as the mask coefficients of its convolution operations are fixed real numbers. The kernel function with predefined parameters of DCKC does a nonlinear mapping of extracted features, and pseudo-inverse is used to find its output weights. The proposed noniterative DCKC was evaluated on benchmark face recognition databases, achieving better results and establishing its superiority. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. BTSC: Binary tree structure convolution layers for building interpretable decision‐making deep CNN.
- Author
Wang, Yuqi, Dai, Dawei, Liu, Da, Xia, Shuyin, and Wang, Guoyin
- Abstract
Although deep convolutional neural networks (DCNNs) have achieved great success in the computer vision field, such models are considered to lack interpretability in decision-making. One of the fundamental issues is that the decision mechanism is considered a "black-box" operation. The authors design the binary tree structure convolution (BTSC) module and control the activation level of particular neurons to build an interpretable DCNN model. First, the authors design a BTSC module, in which each parent node generates two independent child layers, and then integrate it into a normal DCNN model. The main advantages of the BTSC are as follows: 1) child nodes of different parent nodes do not interfere with each other; 2) parent and child nodes can inherit knowledge. Second, considering the activation level of neurons, the authors design an information coding objective to guide neural nodes to learn the particular information coding that is expected. Through the experiments, the authors verify that: 1) the decision-making made by both the ResNet and DenseNet models can be explained well based on the "decision information flow path" (known as the decision-path) formed in the BTSC module; 2) the decision-path can reasonably interpret the decision reversal mechanism (robustness mechanism) of the DCNN model; 3) the credibility of decision-making can be measured by the matching degree between the actual and expected decision-path. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Learning Compressed Artifact for JPEG Manipulation Localization Using Wide-Receptive-Field Network.
- Author
Li, Fengyong, Zhai, Huajun, Liu, Teng, Zhang, Xinpeng, and Qin, Chuan
- Subjects
ARTIFICIAL neural networks, JPEG (Image coding standard), LEARNING modules, FORGERY, LOCALIZATION (Mathematics), GENERALIZATION
- Abstract
JPEG image manipulation localization aims to accurately classify and locate tampered regions in JPEG images. Existing image manipulation localization schemes usually consider diverse spatial-domain data streams, e.g., noise inconsistency and local content inconsistency. They, however, easily ignore a practical scenario: spatial-domain data-stream features are hard to apply directly to compressed image formats such as JPEG, because tampered JPEG images may contain severe re-compression inconsistency and re-compression artifacts when they are re-compressed to JPEG format. As a result, traditional localization schemes relying on general spatial-domain data streams may produce a large number of false detections of tampered regions in JPEG images. To address the above problem, we propose a new JPEG image manipulation localization scheme, in which a wide-receptive-field attention network is designed to effectively learn JPEG compression artifacts. We first introduce the wide-receptive-field attention mechanism to reconstruct the U-Net network, which can effectively capture contextual information of JPEG images and analyze tampering traces from different image regions. Furthermore, a flexible JPEG compression artifact learning module is designed to capture the image noise caused by JPEG compression, in which the weights can be adjusted flexibly based on image quality, without the need for decompression operations on JPEG images. Our proposed method significantly strengthens the detection model's ability to differentiate tampered from non-tampered regions. A series of experiments are performed over different image sets, and the results demonstrate that the proposed scheme achieves strong overall localization performance for multi-scale JPEG manipulation regions and outperforms most state-of-the-art schemes in terms of detection accuracy, generalization, and robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. A Non-Invasive Fetal QRS Complex Detection Method Based on a Multi-Feature Fusion Neural Network.
- Author
Huang, Zhuya, Yu, Junsheng, Shan, Ying, and Wang, Xiangqing
- Subjects
ARTIFICIAL neural networks, BLIND source separation, FETAL heart rate, SIGNAL denoising, DATA quality, FETAL heart
- Abstract
Fetal heart monitoring, as a crucial part of fetal monitoring, can accurately reflect the fetus's health status in a timely manner. To address the issues of high computational cost, inability to observe fetal heart morphology, and insufficient accuracy associated with the traditional method of calculating the fetal heart rate using a four-channel maternal electrocardiogram (ECG), a method for extracting fetal QRS complexes from a single-channel non-invasive fetal ECG based on a multi-feature fusion neural network is proposed. Firstly, a signal entropy data quality detection algorithm based on the blind source separation method is designed to select maternal ECG signals that meet the quality requirements from all channel ECG data, followed by data preprocessing operations such as denoising and normalization on the signals. After being segmented by the sliding window method, the maternal ECG signals are calculated as data in four modes: time domain, frequency domain, time–frequency domain, and data eigenvalues. Finally, the deep neural network using three multi-feature fusion strategies—feature-level fusion, decision-level fusion, and model-level fusion—achieves the effect of quickly identifying fetal QRS complexes. Among the proposed networks, the one with the best performance has an accuracy of 95.85% and sensitivity of 97%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Diagnosis of glaucoma using multi‐scale attention block in convolution neural network and data augmentation techniques.
- Author
Khajeha, Hamid Reza, Fateh, Mansoor, and Abolghasemi, Vahid
- Subjects
ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, DATA augmentation, MEDICAL personnel, VISION disorders
- Abstract
Glaucoma is defined as an eye disease leading to vision loss due to optic nerve damage. It is often asymptomatic; thus, timely diagnosis and treatment are crucial. In this article, we propose a novel approach for diagnosing glaucoma using deep neural networks trained on fundus images. Our proposed approach involves several key steps, including data sampling, pre-processing, and classification. To address the data imbalance issue, we employ a combination of suitable image augmentation techniques and the Multi-Scale Attention Block (MAS Block) architecture in our deep neural network model. The MAS Block is a specific architectural design for CNNs that allows multiple convolutional filters of various sizes to capture features at several scales in parallel. This helps prevent over-fitting and increases detection accuracy. Through extensive experiments with the ACRIMA dataset, we demonstrate that our proposed approach achieves high accuracy in diagnosing glaucoma. Notably, we recorded the highest accuracy (97.18%) among previous studies. The results from this study reveal the potential of our approach to improve early detection of glaucoma and offer more effective treatment strategies for doctors and clinicians in the future. Timely diagnosis plays a crucial role in managing glaucoma since it is often asymptomatic. Our proposed method utilizing deep neural networks shows promise in enhancing diagnostic accuracy and aiding healthcare professionals in making informed decisions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
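Entry 27 describes the MAS Block as running convolutional filters of several sizes in parallel so that features are captured at multiple scales. Without the paper's exact attention design, a generic multi-scale parallel-convolution block in PyTorch conveys the structural idea; the filter sizes and channel counts are assumptions and the attention component is omitted:

```python
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    """Parallel 3x3 / 5x5 / 7x7 convolutions whose outputs are concatenated."""
    def __init__(self, in_ch, branch_ch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2),
                          nn.BatchNorm2d(branch_ch), nn.ReLU())
            for k in (3, 5, 7)
        ])

    def forward(self, x):
        # Every branch sees the same input at a different receptive-field size.
        return torch.cat([branch(x) for branch in self.branches], dim=1)

block = MultiScaleBlock(in_ch=3)
fundus_batch = torch.randn(2, 3, 224, 224)       # dummy stand-in for fundus images
print(block(fundus_batch).shape)                 # torch.Size([2, 48, 224, 224])
```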
28. Exploring Learning Strategies for Training Deep Neural Networks Using Multiple Graphics Processing Units.
- Author
Nien-Tsu Hu, Ching-Chien Huang, Chih-Chieh Mo, and Chien-Lin Huang
- Subjects
ARTIFICIAL neural networks, LEARNING strategies, COMPUTERS, SPEECH perception, COMPUTER algorithms
- Abstract
Neural network algorithms are becoming more commonly used to model big data, such as images and speech. Although they often offer superior performance, they require more training time than traditional approaches. Graphics processing units (GPUs) are an excellent solution for reducing training time, and using multiple GPUs rather than a single GPU can further improve computing power. However, selecting an appropriate learning strategy for training DNNs with the available algorithmic and hardware support can be challenging. In this work, we investigate various learning strategies for training DNNs using multiple GPUs. Experimental data show that using six GPUs with the suggested approach results in a speed-up of approximately four times over a single GPU. Moreover, the precision of the suggested method using six GPUs is similar to that of using a single GPU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
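Entry 28 does not name the framework or the exact parallelization scheme it benchmarks. The simplest data-parallel strategy in PyTorch, replicating the model and splitting each batch across the visible GPUs, looks like the sketch below (it falls back to CPU when no GPU is present); for multi-node or higher-throughput training, DistributedDataParallel is the usual choice instead:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Each forward pass splits the batch across all visible GPUs and gathers the
    # outputs on the default device; gradients are accumulated during backward.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print("GPUs visible:", torch.cuda.device_count(), "loss:", float(loss))
```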
29. A hybrid deep learning using reptile dragonfly search algorithm for reducing the PAPR in OFDM systems.
- Author
Raveen, Panchireddi and Ratna Kumari, Uppalapati Venkata
- Subjects
ARTIFICIAL neural networks, MACHINE learning, ORTHOGONAL frequency division multiplexing, RECURRENT neural networks, MULTI-carrier modulation
- Abstract
Orthogonal frequency division multiplexing (OFDM) is a popular multi-carrier modulation technique, as it offers a wide range of features such as robustness against multi-path fading, higher bandwidth efficiency, and higher data rates. However, OFDM has its own challenges. Among them, the high peak-to-average power ratio (PAPR) of the transmitted signal is the major problem in OFDM. In recent years, deep learning has drastically enhanced PAPR-reduction performance. However, excessive training data and high computational complexity remain considerable issues in OFDM systems. Thus, this paper implements a new PAPR reduction scheme in OFDM systems through hybrid deep learning algorithms. A new optimized hybrid deep learning model, termed O-DNN + RNN, is implemented by integrating deep neural networks (DNN) and recurrent neural networks (RNN), where the parameters of both the DNN and the RNN are optimized using the Hybrid Reptile Dragonfly Search Algorithm (HR-DSA). The new deep learning model is adopted for determining the constellation mapping and demapping of symbols on each subcarrier. This optimized hybrid deep learning model helps reduce the PAPR and maximizes performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Context-aware cross feature attentive network for click-through rate predictions.
- Author
Lee, Soojin and Hwang, Sangheum
- Subjects
ARTIFICIAL neural networks, RECOMMENDER systems, INTERNET advertising, SOCIAL interaction, FORECASTING
- Abstract
Click-through rate (CTR) prediction aims to estimate the likelihood that a user will interact with an item. It has gained significant attention in areas such as online advertising and e-commerce. Existing studies have verified that feature interactions play a crucial role in CTR prediction, highlighting the need for efficient modeling of these interactions. However, most existing approaches in CTR prediction tend to overlook specific feature characteristics, relying instead on deep neural networks or advanced attention mechanisms to learn meaningful feature interactions. In real-world scenarios, features can be categorized into groups based on prior information, which motivates the explicit consideration of interactions between groups of features. For example, the unique context of an item often has a substantial correlation with a particular user, and a specific item often has a strong relationship with a particular user demographic. An efficient model, therefore, requires an appropriate inductive bias to learn these relationships. To address this issue, we present a Context-aware Cross Feature Attentive Network (CCFAN) that explicitly considers the relationship or association between items and users. We categorize input variables into four groups: user, item, user context, and item context, which allows learning significant interactions between (user)-(item context) and (item)-(user context) in an explicit way. These interactions are learned using a multi-head self-attention network that includes modules for user-item interaction and cross-feature interaction. To demonstrate the effectiveness of CCFAN, we conduct experiments on two public benchmark datasets, MovieLens1M and Frappe, and one real-world dataset from an educational service provider, WJTB. The experimental results show that CCFAN not only outperforms previous state-of-the-art CTR methods but also offers a high degree of explainability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
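Entry 30 groups the input fields into user, item, user-context, and item-context embeddings and lets multi-head self-attention learn the cross-group interactions. A stripped-down PyTorch sketch of that grouping-plus-attention step; the embedding size, head count, and the pooling into a CTR logit are my assumptions, not the published CCFAN architecture:

```python
import torch
import torch.nn as nn

class GroupedFieldAttention(nn.Module):
    """Self-attention over four field-group embeddings: user, item, user ctx, item ctx."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(4 * dim, 1)        # pool the 4 attended groups into one logit

    def forward(self, user, item, user_ctx, item_ctx):
        groups = torch.stack([user, item, user_ctx, item_ctx], dim=1)    # (B, 4, dim)
        attended, _ = self.attn(groups, groups, groups)                  # cross-group mixing
        return torch.sigmoid(self.head(attended.flatten(1)))             # CTR estimate

model = GroupedFieldAttention()
batch, dim = 8, 32
ctr = model(*(torch.randn(batch, dim) for _ in range(4)))
print(ctr.shape)                                 # torch.Size([8, 1])
```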
31. Comprehensive comparisons of gradient-based multi-label adversarial attacks.
- Author
-
Chen, Zhijian, Luo, Wenjian, Naseem, Muhammad Luqman, Kong, Linghao, and Yang, Xiangkai
- Subjects
ARTIFICIAL neural networks , CLASSIFICATION algorithms , ALGORITHMS - Abstract
Adversarial examples, which mislead deep neural networks through well-crafted perturbations, have become a major threat to classification models. Gradient-based white-box attack algorithms are widely used to generate adversarial examples. However, most of them are designed for multi-class models, and only a few gradient-based attacks have been designed specifically for multi-label classification models. Because multiple labels are correlated, the performance of these gradient-based algorithms in generating adversarial examples for multi-label classification deserves comprehensive analysis and evaluation. In this paper, we first transplant five typical gradient-based adversarial attack algorithms from the multi-class setting to the multi-label setting. Second, we comprehensively compare the performance of these five attacks and four existing multi-label adversarial attack algorithms through experiments on six different attack types, and evaluate the transferability of the adversarial examples generated by all algorithms under two attack types. Experimental results show that, across attack types, most multi-step attack algorithms achieve higher attack success rates than one-step algorithms. Additionally, these gradient-based algorithms find it harder to augment labels than to hide them. In the transfer experiments, the adversarial examples generated by all attack algorithms exhibit weaker transferability when attacking other models. [ABSTRACT FROM AUTHOR] (An illustrative one-step attack sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
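The abstract above compares one-step and multi-step gradient attacks on multi-label classifiers. As a hedged illustration only (not any of the nine algorithms studied), the sketch below applies an FGSM-style one-step perturbation to a multi-label model trained with a binary cross-entropy loss; the toy model, input shape, and epsilon budget are all assumptions.

```python
import torch
import torch.nn as nn

# Assumed toy multi-label classifier: 3 labels over a 16-dimensional input.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(1, 16, requires_grad=True)   # clean input
y = torch.tensor([[1.0, 0.0, 1.0]])          # ground-truth label set

# One-step FGSM-style perturbation: ascend the loss with respect to the input.
loss = loss_fn(model(x), y)
loss.backward()
epsilon = 0.05                               # assumed perturbation budget
x_adv = (x + epsilon * x.grad.sign()).detach()

with torch.no_grad():
    print("clean  labels:", (torch.sigmoid(model(x)) > 0.5).int().tolist())
    print("attack labels:", (torch.sigmoid(model(x_adv)) > 0.5).int().tolist())
```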
32. S2HConv-Caps-BiGRU: Deep Learning-Based Heterogeneous Face Recognition Model with Divergent Stages.
- Author
-
Balayesu, Narasimhula and Reddy, Avuthu Avinash
- Subjects
- *
ARTIFICIAL neural networks , *STANDARD deviations , *FEATURE extraction , *DATABASES , *PUBLIC safety - Abstract
Face images captured in real time in different spectral bands are considered heterogeneous images. Heterogeneous Face Recognition (HFR) matches faces across such domains and is crucial to public safety. This paper proposes an HFR approach based on Deep Neural Networks (DNN). Feature maps are extracted from image pairs, such as gallery and sketch images, using a Squirrel Search Heterogeneous Convolutional-Capsule-Bidirectional Gated Recurrent Unit (S2HConv-Caps-BiGRU). For efficient face recognition, a coupled representation similarity metric (CRSM) measures the similarity of the two feature maps. The experimental results are evaluated against state-of-the-art (SOTA) methods using statistical measures including accuracy, recall, Jaccard score, Dice score, mean square error (MSE), image similarity, performance, and root mean square error (RMSE). Compared to other SOTA methods, the model produces the best results, reaching 98.7% accuracy on the CUFS dataset. [ABSTRACT FROM AUTHOR] (A minimal feature-map similarity sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
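The abstract above scores a gallery feature map against a sketch feature map with a coupled representation similarity metric. The paper's CRSM is not specified here, so the sketch below uses plain cosine similarity between flattened feature maps as an assumed stand-in for how such a matcher could score a pair.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flattened feature maps."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Assumed feature-map shape (channels x height x width) from some extractor.
rng = np.random.default_rng(1)
gallery_features = rng.standard_normal((64, 8, 8))
sketch_features = gallery_features + 0.1 * rng.standard_normal((64, 8, 8))

score = cosine_similarity(gallery_features, sketch_features)
print(f"similarity score: {score:.3f}")   # close to 1.0 -> likely the same identity
```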
33. A self-attention-based deep architecture for online handwriting recognition.
- Author
-
Molavi, Seyed Alireza and BabaAli, Bagher
- Subjects
- *
ARTIFICIAL neural networks , *AUTOMATIC speech recognition , *NATURAL language processing , *RECURRENT neural networks , *ARTIFICIAL intelligence - Abstract
In recent years, the self-attention mechanism has been the most frequently used and efficient way to process and learn sequences in numerous domains of artificial intelligence, including natural language processing, automatic speech recognition, and computer vision. It is well suited to learning dependencies between points of the input sequence, particularly those separated by a long distance, and it allows the sequence to be processed in parallel. As a result, it can extract an appropriate representation from the input sequence faster than approaches such as recurrent neural networks. Despite these benefits, recurrent neural networks combined with feature engineering have remained the most commonly employed approach to online handwriting recognition. This study introduces an end-to-end online handwriting recognition system that incorporates the self-attention mechanism into three different modeling methods: CTC-based, RNN-T, and encoder–decoder. The proposed system recognizes handwritten scripts without the need for feature engineering. Its performance was evaluated on the Arabic Online-KHATT dataset and the English IAM-OnDB dataset. On the former, it achieved a character error rate (CER) of 4.78% and a word error rate (WER) of 20.63%; on the latter, a CER of 4.10% and a WER of 14.31%, both noticeably better than previously reported results. Additionally, the Persian Online Handwriting Database was used for experimental validation, yielding a CER of 8.03% and a WER of 28.39%. [ABSTRACT FROM AUTHOR] (A minimal self-attention-plus-CTC sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
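The abstract above combines self-attention encoders with a CTC objective, among other decoders. As a hedged sketch under assumed dimensions (not the paper's architecture), the snippet below runs a small Transformer encoder over a sequence of pen-trajectory features and scores it with PyTorch's CTC loss.

```python
import torch
import torch.nn as nn

# Assumed sizes: 3-dim pen features (dx, dy, pen-up), 64-dim model, 20 symbols + CTC blank.
feat_dim, d_model, n_symbols = 3, 64, 20
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
model = nn.Sequential(
    nn.Linear(feat_dim, d_model),                  # project pen features to model width
    nn.TransformerEncoder(encoder_layer, num_layers=2),
    nn.Linear(d_model, n_symbols + 1),             # +1 for the CTC blank (index 0)
)

x = torch.randn(2, 150, feat_dim)                  # batch of 2 trajectories, 150 points each
log_probs = model(x).log_softmax(dim=-1).transpose(0, 1)   # CTC expects (T, N, C)

targets = torch.randint(1, n_symbols + 1, (2, 12))          # assumed label sequences
input_lengths = torch.full((2,), 150, dtype=torch.long)
target_lengths = torch.full((2,), 12, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print("CTC loss:", float(loss))
```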
34. Deep neural network models for cell type prediction based on single-cell Hi-C data.
- Author
-
Zhou, Bing, Liu, Quanzhong, Wang, Meili, and Wu, Hao
- Subjects
- *
ARTIFICIAL neural networks , *CHROMOSOME structure , *DATA libraries , *CELL anatomy , *DRUG development - Abstract
Background: Cell type prediction is crucial to cell type identification in genomics, cancer diagnosis, and drug development, and it can alleviate the time-consuming and difficult problem of cell classification in biological experiments. A computational method is therefore urgently needed to classify and predict cell types from single-cell Hi-C data, and previous studies lack a convenient and accurate method for doing so. Deep neural networks can form complex representations of single-cell Hi-C data and make it possible to handle multidimensional and sparse biological datasets. Results: We compare the performance of SCANN with existing methods and analyze the model using five different evaluation metrics. When using only the ML1 and ML3 datasets, the ARI and NMI values of SCANN increase by 14% and 11% over those of scHiCluster, respectively. When using all six libraries of data, the ARI and NMI values of SCANN increase by 63% and 88% over those of scHiCluster, respectively. These findings show that SCANN is highly accurate in predicting the type of independent cell samples from single-cell Hi-C data. Conclusions: SCANN improves training speed and requires fewer resources for predicting cell types. In addition, when the number of cells in different cell types is extremely unbalanced, SCANN shows higher stability and flexibility in cell classification and cell type prediction from single-cell Hi-C data. This prediction method can assist biologists in studying differences in chromosome structure between cell types. [ABSTRACT FROM AUTHOR] (A brief ARI/NMI example follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
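The abstract above reports gains in ARI and NMI, two standard clustering-agreement metrics. As a small illustration (the labels below are made up, not from the paper), the snippet computes both with scikit-learn.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Assumed toy example: true cell types vs. predicted cluster assignments.
true_types = ["mESC", "mESC", "HeLa", "HeLa", "HAP1", "HAP1", "K562", "K562"]
predicted = [0, 0, 1, 1, 2, 3, 3, 3]

print("ARI:", round(adjusted_rand_score(true_types, predicted), 3))
print("NMI:", round(normalized_mutual_info_score(true_types, predicted), 3))
```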
35. Enhanced speech emotion recognition using averaged valence arousal dominance mapping and deep neural networks.
- Author
-
Rizhinashvili, Davit, Sham, Abdallah Hussein, and Anbarjafari, Gholamreza
- Abstract
This study delves into advancements in speech emotion recognition (SER) by establishing a novel approach for emotion mapping and prediction using the Valence-Arousal-Dominance (VAD) model. Central to this research is the creation of reliable emotion-to-VAD mappings, achieved by averaging outcomes from multiple pre-trained networks applied to the RAVDESS dataset. This approach adeptly resolves prior inconsistencies in emotion-to-VAD mappings and establishes a dependable framework for SER. The study also introduces a refined SER model, integrating the pre-trained Wave2Vec 2.0 with Long Short-Term Memory (LSTM) networks and linear layers, culminating in an output layer representing valence, arousal, and dominance. Notably, this model exhibits commendable accuracy across various datasets, such as RAVDESS, EMO-DB, CREMA-D, and TESS, thereby showcasing its robustness and adaptability, an improvement over earlier models susceptible to dataset-specific overfitting. The research further unveils a comprehensive speech analysis application, adept at denoising, segmenting, and profiling emotions in speech segments. This application features interactive emotion tracking and sentiment reports, illustrating its practicality in diverse applications. The study recognizes ongoing challenges in SER, especially in managing the subjective nature of emotion perception and integrating multimodal data. Although the research marks a progression in SER technology, it underscores the need for continuous research and careful consideration of ethical aspects in deploying such technologies. This work contributes to the SER domain by introducing a dependable method for emotion mapping, a robust model for emotion recognition, and a user-friendly application for practical implementations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Dual-Stream CoAtNet models for accurate breast ultrasound image segmentation.
- Author
-
Zaidkilani, Nadeem, Garcia, Miguel Angel, and Puig, Domenec
- Subjects
- *
ARTIFICIAL neural networks , *BREAST ultrasound , *TRANSFORMER models , *ULTRASONIC imaging , *IMAGE segmentation , *BREAST - Abstract
The CoAtNet deep neural model has been shown to achieve state-of-the-art performance by stacking convolutional and self-attention layers. In particular, the initial layers of CoAtNet apply efficient convolutions for extracting local features out of the input image and the initial fine-resolution feature maps. In turn, the final layers apply more cumbersome Transformers in order to extract global features from the coarse-resolution feature maps. The model's outcome directly depends on those final global features. This paper proposes an extension of the original CoAtNet model based on the introduction of a dual stream of convolution and self-attention blocks applied at the final layers of CoAtNet. In this way, those final layers automatically aggregate both local and global features extracted from the initial feature maps. Two dual-stream topologies have been proposed and evaluated. This Dual-Stream CoAtNet model exhibits a significant improvement in the segmentation accuracy of breast ultrasound images, thus contributing to the development of more robust tumor detection methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Bridging auditory perception and natural language processing with semantically informed deep neural networks.
- Author
-
Esposito, Michele, Valente, Giancarlo, Plasencia-Calaña, Yenisel, Dumontier, Michel, Giordano, Bruno L., and Formisano, Elia
- Subjects
- *
ARTIFICIAL neural networks , *NATURAL language processing , *HUMAN behavior , *CONVOLUTIONAL neural networks , *AUDITORY perception - Abstract
Sound recognition is effortless for humans but poses a significant challenge for artificial hearing systems. Deep neural networks (DNNs), especially convolutional neural networks (CNNs), have recently surpassed traditional machine learning in sound classification. However, current DNNs map sounds to labels using binary categorical variables, neglecting the semantic relations between labels. Cognitive neuroscience research suggests that human listeners exploit such semantic information besides acoustic cues. Hence, our hypothesis is that incorporating semantic information improves DNN's sound recognition performance, emulating human behaviour. In our approach, sound recognition is framed as a regression problem, with CNNs trained to map spectrograms to continuous semantic representations from NLP models (Word2Vec, BERT, and CLAP text encoder). Two DNN types were trained: semDNN with continuous embeddings and catDNN with categorical labels, both on a dataset extracted from a collection of 388,211 sounds enriched with semantic descriptions. Evaluations across four external datasets confirmed the superiority of semantic labeling from semDNN compared to catDNN, preserving higher-level relations. Importantly, an analysis of human similarity ratings for natural sounds showed that semDNN approximated human listener behaviour better than catDNN, other DNNs, and NLP models. Our work contributes to understanding the role of semantics in sound recognition, bridging the gap between artificial systems and human auditory perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. A neural network pruning and quantization algorithm for hardware deployment.
- Author
-
WANG Peng, ZHANG Jia-cheng, and FAN Yu-yang
- Abstract
Due to their superior performance, deep neural networks have been widely applied in fields such as image recognition and object detection. However, they contain a large number of parameters and require immense computational power, which makes them hard to deploy on mobile edge devices that demand low latency and low power consumption. To address this issue, a compression algorithm that replaces multiplication operations with bit-shifting and addition is proposed. The algorithm compresses neural network parameters to low bit-widths through pruning and quantization, reducing the difficulty of hardware deployment under limited multiplication resources, meeting the low-latency and low-power requirements of mobile edge devices, and improving operational efficiency. Experiments on classical neural networks with the ImageNet dataset show that when the parameters are compressed to 4 bits, accuracy remains essentially unchanged compared with the full-precision networks; for ResNet18, ResNet50, and GoogleNet, the Top-1/Top-5 accuracies even improve by 0.38%/0.22%, 0.35%/0.21%, and 1.14%/0.57%, respectively. When the eighth convolutional layer of VGG16 was deployed on a Zynq7035 and tested, the compressed network reduced inference time by 51.1% and power consumption by 46.7%, while using 43% fewer DSP resources. [ABSTRACT FROM AUTHOR] (A minimal power-of-two quantization sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
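The abstract above replaces multiplications with shifts and adds, which implies weights quantized to signed powers of two. The paper's exact pruning and quantization procedure is not given here, so the sketch below only shows the basic idea under that assumption: rounding each weight to the nearest power of two, so that multiplying by it reduces to a bit shift.

```python
import numpy as np

def quantize_power_of_two(w: np.ndarray, min_exp: int = -8, max_exp: int = 0) -> np.ndarray:
    """Round each weight to the nearest signed power of two within an exponent range."""
    sign = np.sign(w)
    magnitude = np.abs(w)
    exponent = np.clip(np.round(np.log2(magnitude + 1e-12)), min_exp, max_exp)
    quantized = sign * (2.0 ** exponent)
    # Prune weights too small to represent (below half the smallest power) to zero.
    return np.where(magnitude < 2.0 ** (min_exp - 1), 0.0, quantized)

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.2, size=8)
q_weights = quantize_power_of_two(weights)

# Multiplying an activation by 2**e is just a shift of its fixed-point representation.
print(np.round(weights, 3))
print(q_weights)
```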
39. A Physics-Informed Neural Network Based on the Boltzmann Equation with Multiple-Relaxation-Time Collision Operators.
- Author
-
Liu, Zhixiang, Zhang, Chenkai, Zhu, Wenhao, and Huang, Dongmei
- Subjects
- *
ARTIFICIAL neural networks , *BOLTZMANN'S equation , *DISTRIBUTION (Probability theory) , *MULTISCALE modeling , *DECOMPOSITION method , *DEEP learning - Abstract
The Boltzmann equation with multiple-relaxation-time (MRT) collision operators has been widely employed in kinetic theory to describe the behavior of gases and liquids at the macro-level. Given the successful development of deep learning and the availability of data analytic tools, it is feasible to solve the Boltzmann-MRT equation with a neural network-based method. Based on canonical polyadic decomposition, a new physics-informed neural network describing the Boltzmann-MRT equation, named the network for MRT collision (NMRT), is proposed in this paper for solving the Boltzmann-MRT equation. Tensor decomposition of the Boltzmann-MRT equation is used to combine the collision matrix with discrete distribution functions within the moment space. Multiscale modeling is adopted to accelerate the convergence of high frequencies for the equations, the micro–macro decomposition method is applied to improve learning efficiency, and a problem-dependent loss function is proposed to balance the weight of the function for different conditions at different velocities. These strategies greatly improve the accuracy of the network. Numerical experiments, including the advection–diffusion problem and the wave propagation problem, show that the network-based method attains an accuracy on the order of $O(10^{-3})$. [ABSTRACT FROM AUTHOR] (A general MRT collision form is sketched after this record.)
- Published
- 2024
- Full Text
- View/download PDF
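The abstract above concerns the Boltzmann equation with a multiple-relaxation-time collision operator. As a reference point only (the paper's precise NMRT formulation is not reproduced here), a commonly used general form relaxes the moments of the distribution function toward equilibrium through a relaxation matrix:

```latex
% General transport-plus-relaxation form, assumed for illustration:
% f is the particle distribution, M maps it to moment space, and
% S is a diagonal matrix of relaxation rates (one per moment).
\begin{equation}
  \frac{\partial f}{\partial t} + \boldsymbol{v} \cdot \nabla_{\boldsymbol{x}} f
  = -\, M^{-1} S \left[ M f - m^{\mathrm{eq}} \right]
\end{equation}
% With S = (1/\tau) I this reduces to the single-relaxation-time (BGK) operator.
```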
40. Parallel PSO for Efficient Neural Network Training Using GPGPU and Apache Spark in Edge Computing Sets.
- Author
-
Capel, Manuel I., Salguero-Hidalgo, Alberto, and Holgado-Terriza, Juan A.
- Subjects
- *
ARTIFICIAL neural networks , *PARTICLE swarm optimization , *ELECTRONIC data processing , *PARALLEL processing , *EDGE computing - Abstract
The training phase of a deep learning neural network (DLNN) is a computationally demanding process, particularly for models comprising multiple layers of intermediate neurons. This paper presents a novel approach to accelerating DLNN training using the particle swarm optimisation (PSO) algorithm, which exploits the GPGPU architecture and the Apache Spark analytics engine for large-scale data processing tasks. PSO is a bio-inspired stochastic optimisation method that iteratively improves candidate solutions to a (usually complex) problem with respect to a given objective. The expensive fitness evaluation and updating of particle positions can be supported more effectively by parallel processing. Nevertheless, parallelising an efficient PSO is not simple, owing to the complexity of the computations performed on the swarm of particles and the iterative execution of the algorithm until a solution close to the objective with minimal error is reached. In this study, two forms of parallelisation have been developed for the PSO algorithm, both designed for a distributed execution environment. The synchronous parallel PSO implementation guarantees consistency but may incur idle time due to global synchronisation. In contrast, the asynchronous parallel PSO approach reduces the need for global synchronisation, improving execution time and making it more appropriate for large datasets and distributed environments such as Apache Spark. The two variants distribute the computational load of the algorithm across the executor nodes of the Spark cluster to achieve coarse-grained parallelism, resulting in a significant performance improvement over current sequential variants of PSO. [ABSTRACT FROM AUTHOR] (A minimal serial PSO sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
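The abstract above parallelises particle swarm optimisation; the standard serial update it starts from is shown below as a small, assumed example (sphere objective, hand-picked inertia and acceleration coefficients), not the paper's Spark implementation.

```python
import numpy as np

def sphere(x: np.ndarray) -> float:
    """Toy objective: minimum 0 at the origin."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
n_particles, dim = 20, 5
w, c1, c2 = 0.7, 1.5, 1.5            # assumed inertia and acceleration coefficients

pos = rng.uniform(-5, 5, (n_particles, dim))
vel = np.zeros((n_particles, dim))
pbest = pos.copy()
pbest_val = np.array([sphere(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(100):
    r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
    # Classic velocity update: inertia + cognitive pull + social pull.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([sphere(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best value found:", round(pbest_val.min(), 6))
```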
41. Circumpolar Transport and Overturning Strength Inferred From Satellite Observables Using Deep Learning in an Eddying Southern Ocean Channel Model.
- Author
-
Meng, Shuai, Stewart, Andrew L., and Manucharyan, Georgy
- Subjects
- *
MERIDIONAL overturning circulation , *ARTIFICIAL neural networks , *CONVOLUTIONAL neural networks , *ANTARCTIC Circumpolar Current , *OCEAN bottom - Abstract
The Southern Ocean connects the ocean's major basins via the Antarctic Circumpolar Current (ACC) and closes the global meridional overturning circulation (MOC). Observing these transports is challenging because three-dimensional, mesoscale-resolving measurements of currents, temperature, and salinity are required to calculate transport in density coordinates. Previous studies have proposed to circumvent these limitations by inferring subsurface transports from satellite measurements using data-driven methods. However, it is unclear whether these approaches can identify the signatures of subsurface transport in the Southern Ocean, which exhibits an energetic mesoscale eddy field superposed on a highly heterogeneous mean stratification and circulation. This study employs Deep Learning techniques to link the transports of the ACC and the upper and lower branches of the MOC to sea surface height (SSH) and ocean bottom pressure (OBP), using an idealized channel model of the Southern Ocean as a test bed. A key result is that a convolutional neural network produces skillful predictions of the ACC transport and MOC strength (skill scores of ~0.74 and ~0.44, respectively). The skill of these predictions is similar across timescales ranging from daily to decadal but decreases substantially if SSH or OBP is omitted as a predictor. Using a fully connected or linear neural network yields similarly accurate predictions of the ACC transport but substantially less skillful predictions of the MOC strength. Our results suggest that Deep Learning offers a route to linking the Southern Ocean's zonal transport and overturning circulation to remote measurements, even in the presence of pronounced mesoscale variability. Plain Language Summary: Monitoring changes in the strengths of Southern Ocean current systems is challenging due to their vast size and the region's relative inaccessibility. This study explores the potential for remotely monitoring these currents via satellite measurements. Neural networks are used to "learn" the relationship between satellite-measurable ocean properties and the strengths of Southern Ocean currents, using a simplified simulation as a test case. A key question is whether the circulation can be inferred from satellite measurements when the ocean hosts a vigorous field of mesoscale eddies (horizontal swirls of fluid that reach hundreds of kilometers in diameter). Three neural network (NN) frameworks are trained to predict the simulated ocean circulation strength from the simulated satellite measurements, and their performance is then evaluated using a separate segment of the simulation data. It is shown that this approach yields accurate predictions of all of the targeted components of the Southern Ocean circulation strength, provided that the NNs use a "convolutional" filter, which enhances their ability to identify spatial patterns in the simulated satellite measurements and thus to infer movements of ocean water induced by the eddies. These findings serve to guide future indirect approaches to observing the Southern Ocean using remote sensing. Key Points: Deep Learning methods link sea surface height and ocean bottom pressure to transport variability in an eddying Southern Ocean channel model; a convolutional neural network captures sub-annual and interannual variance in both circumpolar transport and overturning strength; predicting overturning variability requires a convolutional kernel to capture eddy-induced meridional transports. [ABSTRACT FROM AUTHOR] (An illustrative CNN-regression sketch follows this record.)
- Published
- 2024
- Full Text
- View/download PDF
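The abstract above maps SSH and OBP fields to scalar transport estimates with a convolutional network. The sketch below is a generic two-channel CNN regressor in PyTorch, with all sizes assumed; it only illustrates the input/output shape of such a mapping, not the study's architecture.

```python
import torch
import torch.nn as nn

# Assumed grid: two input channels (SSH, OBP) on a 64 x 64 model grid,
# two regression targets (ACC transport, MOC strength).
model = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),          # collapse the spatial dimensions
    nn.Flatten(),
    nn.Linear(32, 2),                 # [ACC transport, MOC strength]
)

ssh_obp_fields = torch.randn(8, 2, 64, 64)    # batch of 8 synthetic snapshots
predictions = model(ssh_obp_fields)
print(predictions.shape)                      # torch.Size([8, 2])
```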
42. Rescaling large datasets based on validation outcomes of a pre-trained network.
- Author
-
Nguyen, Thanh Tuan and Nguyen, Thanh Phuong
- Subjects
- *
ARTIFICIAL neural networks - Published
- 2024
- Full Text
- View/download PDF
43. Distributed edge to cloud ensemble deep learning architecture to diagnose Covid-19 from lung image in IoT based e-Health system.
- Author
-
Zamani, Mohammadreza and Sharifian, Saeed
- Subjects
- *
DEEP learning , *LUNGS , *ARTIFICIAL neural networks , *COVID-19 , *ARTIFICIAL intelligence , *COVID-19 pandemic , *INTERNET of things - Abstract
Today, with the expansion of technology and new deep learning architectures, the accuracy of artificial intelligence methods in diagnosing diseases has increased. At the same time, the spread of new pandemic diseases such as Covid-19 has made timely and accurate diagnosis more important. Recently proposed deep learning methods diagnose Covid-19 with acceptable accuracy but carry an expensive computational cost, so they cannot be distributed and implemented on edge devices. Sometimes the disease can be diagnosed by small models with few parameters; such models can be placed on fog or edge devices, and if they detect the disease locally with high confidence, the investigation request does not need to be sent to the cloud where the comprehensive main trained model is located. Based on this idea, we propose an ensemble of two deep learning models in a boosting scheme, named mobile COVID-Net (see the cascade sketch after this record). First, a lightweight MobileNet model is designed and embedded in fog devices to diagnose pneumonia and Covid-19, which have similar symptoms, with low computational cost and high confidence. If the embedded model fails to diagnose, a modified ResNet-based neural network in the second layer, designed to diagnose only Covid-19 with high precision, is invoked in the cloud. The distributed edge-to-cloud ensemble of neural network models, trained and tested on a publicly available dataset, achieves a total accuracy of 93.8% for Covid-19 detection, compared with 92.4% and 92% for the COVID-Net and Inception algorithms, respectively. The most challenging part of the work is distinguishing Covid-19 from pneumonia with the least amount of error. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
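The abstract above routes a case to the cloud model only when the edge model is not confident. The sketch below shows that confidence-threshold cascade in outline; the threshold value and the two `predict_*` functions are placeholders standing in for the MobileNet and ResNet models described in the abstract.

```python
from typing import Tuple

CONFIDENCE_THRESHOLD = 0.9   # assumed cutoff for trusting the edge model

def predict_on_edge(image) -> Tuple[str, float]:
    """Placeholder for the lightweight edge model (e.g., a MobileNet variant)."""
    return "pneumonia", 0.62                      # (label, confidence), dummy values

def predict_in_cloud(image) -> Tuple[str, float]:
    """Placeholder for the heavier cloud model (e.g., a modified ResNet)."""
    return "covid-19", 0.97                       # dummy values

def diagnose(image) -> Tuple[str, float, str]:
    """Return (label, confidence, where-it-was-decided)."""
    label, confidence = predict_on_edge(image)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label, confidence, "edge"          # confident: no cloud round-trip needed
    label, confidence = predict_in_cloud(image)   # otherwise escalate to the cloud model
    return label, confidence, "cloud"

print(diagnose(object()))                         # ('covid-19', 0.97, 'cloud')
```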
44. SRU-Net: a novel spatiotemporal attention network for sclera segmentation and recognition.
- Author
-
Mashayekhbakhsh, Tara, Meshgini, Saeed, Rezaii, Tohid Yousefi, and Makouei, Somayeh
- Abstract
Segmenting sclera images for effective recognition under non-cooperative conditions poses a significant challenge due to the prevalent noise. While U-Net-based methods have shown success, their limitations in accurately segmenting objects with varying shapes necessitate innovative approaches. This paper introduces the spatiotemporal residual encoding and decoding network (SRU-Net), featuring multi-spatiotemporal feature integration (Ms-FI) modules and attention-pool mechanisms to enhance segmentation accuracy and robustness. Ms-FI modules within SRU-Net’s encoders and decoders identify salient feature regions and prune responses, while attention-pool modules improve segmentation robustness. To assess the proposed SRU-Net, we conducted experiments using six datasets, employing precision, recall, and F1-score metrics. The experimental results demonstrate the superiority of SRU-Net over state-of-the-art methods. Specifically, SRU-Net achieves F1-score values of 94.58%, 98.31%, 98.49%, 97.52%, 95.3%, 97.47%, and 93.11% for MSD, MASD, SVBPI, MASD+MSD, UBIRIS.v1, UBIRIS.v2, and MICHE, respectively. Recognition performance was further evaluated with metrics such as AUC, EER, VER@0.1%FAR, and VER@1%FAR on the six datasets. The proposed pipeline, comprising SRU-Net and autoencoders (AE), outperforms previous research on all datasets. Particularly noteworthy is the comparison of EER, where SRU-Net + AE achieves the best recognition results, with EERs of 9.42%, 3.81%, and 5.73% on the MSD, MASD, and MICHE datasets, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Development and evaluation of a deep neural network model for orthokeratology lens fitting.
- Author
-
Yang, Hsiu‐Wan Wendy, Liang, Chih‐Kai Leon, Chou, Shih‐Chi, Wang, Hsin‐Hui, and Chiang, Huihua Kenny
- Subjects
- *
ARTIFICIAL neural networks , *DEEP learning , *CORNEAL topography , *MACHINE learning , *ORTHOKERATOLOGY - Abstract
Purpose: To optimise the precision and efficacy of orthokeratology, this investigation evaluated a deep neural network (DNN) model for lens fitting. The objective was to refine the standardisation of fitting procedures and curtail subjective evaluations, thereby augmenting patient safety in the context of increasing global myopia. Methods: A retrospective study of successful orthokeratology treatment was conducted on 266 patients, with 449 eyes being analysed. A DNN model with an 80%–20% training‐validation split predicted lens parameters (curvature, power and diameter) using corneal topography and refractive indices. The model featured two hidden layers for precision. Results: The DNN model achieved mean absolute errors of 0.21 D for alignment curvature (AC), 0.19 D for target power (TP) and 0.02 mm for lens diameter (LD), with R2 values of 0.97, 0.95 and 0.91, respectively. Accuracy decreased for myopia of less than 1.00 D, astigmatism exceeding 2.00 D and corneal curvatures >45.00 D. Approximately, 2% of cases with unique physiological characteristics showed notable prediction variances. Conclusion: While the DNN model exhibits high accuracy, its limitations in certain myopia, cylinder power and corneal curvature cases highlight the need for algorithmic refinement and clinical validation in orthokeratology practice. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Learning Scatter Artifact Correction in Cone-Beam X-Ray CT Using Incomplete Projections with Beam Hole Array.
- Author
-
Hattori, Haruki, Yatagawa, Tatsuya, Ohtake, Yutaka, and Suzuki, Hiromasa
- Abstract
X-ray cone-beam computed tomography (CBCT) is a powerful tool for nondestructive testing and evaluation, yet CT image quality can be compromised by artifacts caused by X-ray scattering within dense materials such as metals. This problem leads to the need for hardware- and software-based scatter artifact correction to enhance the image quality. Recently, deep learning techniques have emerged as a promising approach to obtaining scatter-free images efficiently. However, these techniques rely heavily on training data, often gathered through simulation. Simulated CT images, unfortunately, do not accurately reproduce the real properties of objects, and physically accurate X-ray simulation still requires significant computation time, hindering the collection of a large number of CT images. To address these problems, we propose a deep learning framework for scatter artifact correction using projections obtained solely by real CT scanning. To this end, we utilize a beam-hole array (BHA) to block the X-rays deviating from the primary beam path, thereby capturing scatter-free X-ray intensity at certain detector pixels. As the BHA shadows a large portion of detector pixels, we incorporate several regularization losses to enhance the training process. Furthermore, we introduce radiographic data augmentation to mitigate the need for long scanning times, which is a concern because CT devices equipped with a BHA require two series of CT scans. Experimental validation showed that the proposed framework outperforms a baseline method that learns from simulated projections in which the entire image is visible and does not contain scattering artifacts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Local Interpretations for Explainable Natural Language Processing: A Survey.
- Author
-
Luo, Siwen, Ivison, Hamish, Han, Soyeon Caren, and Poon, Josiah
- Published
- 2024
- Full Text
- View/download PDF
48. Automated barcodeless product classifier for food retail self-checkout images.
- Author
-
Ciapas, Bernardas and Treigys, Povilas
- Subjects
- *
SELF-service stores , *ARTIFICIAL neural networks , *RETAIL stores , *PLASTIC bags , *CONSUMERS - Abstract
The growing popularity of self-service in retail stores and the associated increase in shrinkage present an urgent need for computer-vision-based product recognition at self-checkouts. The article focuses on individual product recognition, using an automated workflow, in images collected from retail store self-checkouts. This research is concerned exclusively with the recognition of barcodeless products, which are challenging to identify quickly and reliably at self-checkouts. To the authors' knowledge, image sets representative of retail store product distribution did not exist at the time of writing. Images collected from self-checkout events often contain products partially covered by customer body parts, inside semi-transparent plastic bags, or not present in the area of interest. Because the huge assortment of products varies between stores and changes frequently, manual image labeling, filtering, and long training times are impractical. The proposed method investigates the need for automated steps to eliminate empty images and images where product visibility is unsatisfactory. The authors achieved 80.5±1.2% classification accuracy on a real-world dataset of 194 products using the automatic workflow. Ablation studies confirmed the need for image filtering in both training and inference workflows. The neural network architecture tuned to the self-checkout dataset outperforms well-known networks: its training time is a fraction of that of ImageNet's best EfficientNet and its accuracy is slightly better. The generalization of the suggested method is demonstrated on the comparable Fruits 360 product dataset, where 99.6% accuracy was achieved, comparable to or better than other authors' results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Fast continuous patch-based artistic style transfer for videos.
- Author
-
Wu, Bing, Dong, Qingshuang, and Sun, Wenqing
- Subjects
- *
ARTISTIC style , *ARTIFICIAL neural networks , *OPTICAL flow , *VIDEOS - Abstract
Convolutional neural network-based image style transfer models often suffer from temporal inconsistency when applied to video. Although several video style transfer models have been proposed to improve temporal consistency, they often trade off processing speed, perceptual style quality, and temporal consistency. In this work, we propose a novel approach for fast, continuous, patch-based arbitrary video style transfer that achieves high-quality transfer results while maintaining temporal coherence. Our approach begins by stylizing the first frame as a standalone single image using patch propagation within the content activation. Subsequent frames are computed based on the key insight that the optical flow field evaluated from neighboring content activations provides meaningful information for preserving temporal coherence efficiently. To address the problems introduced by the optical flow stage, we additionally incorporate a correction procedure as a post-process to ensure a high-quality stylized video. Finally, we demonstrate that our method can transfer arbitrary styles on a set of examples and show that it exhibits superior performance both qualitatively and quantitatively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Learning-based data-driven optimal deployment control of tethered space robot.
- Author
-
Jin, Ao, Zhang, Fan, and Huang, Panfeng
- Subjects
- *
ARTIFICIAL neural networks , *ROBOT dynamics , *DEEP learning , *OPTIMAL control theory , *LINEAR systems , *OPERATOR theory , *ROBOTS , *SPACE robotics - Abstract
• A data-driven optimal control framework is proposed for TSR deployment. • A linear representation of the TSR's dynamics is derived with the Koopman operator. • An enhanced deep learning method is proposed for finding the embedding functions. To avoid the complex constraints of traditional nonlinear methods for tethered space robot (TSR) deployment, a data-driven optimal control framework with an improved deep learning-based Koopman operator is proposed in this work. To handle the nonlinearity of the TSR dynamics, a finite-dimensional global linear representation, called the lifted linear system, is derived with Koopman operator theory. A deep learning scheme is adopted to find the embedding functions associated with the Koopman operator, and an auxiliary neural network is developed to encode the nonlinear control term of the finite-dimensional lifted system. A controllability constraint is then considered so that a controllable lifted linear system is learned. In addition, two loss functions related to the reconstruction and prediction ability of the lifted linear system are designed for training the deep neural network. With the learned lifted linear dynamics, a Linear Quadratic Regulator (LQR) is applied to derive the optimal control policy for tethered space robot deployment (see the LQR sketch after this record). Finally, simulation results verify the effectiveness of the proposed framework and show that it deploys the tethered space robot more quickly with less swing of the in-plane angle. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
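The abstract above applies an LQR to the learned lifted linear model. The lifted dynamics come from the paper's learned embeddings, so the sketch below only shows the final LQR step on an assumed small discrete-time linear system (A, B, Q, R made up for illustration), using SciPy's discrete algebraic Riccati solver.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed lifted linear dynamics z[k+1] = A z[k] + B u[k] (placeholder matrices).
A = np.array([[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.1],
              [0.0, 0.0, 0.9]])
B = np.array([[0.0], [0.0], [0.1]])
Q = np.eye(3)            # state cost
R = np.array([[0.1]])    # control cost

# Discrete-time LQR: solve the Riccati equation, then K = (R + B'PB)^-1 B'PA.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Closed-loop rollout from an assumed initial lifted state.
z = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    u = -K @ z
    z = A @ z + B @ u
print("final lifted state norm:", round(float(np.linalg.norm(z)), 4))
```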