Author: "Ali Borji" / Publisher: ieee - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Ali Borji"' showing total 23 results

Start Over Author "Ali Borji" Publisher ieee

23 results on '"Ali Borji"'

1. Human Attention in Image Captioning: Dataset and Analysis

Author: Hamed R. Tavakoli, Sen He, Ali Borji, and Nicolas Pugeault
Subjects: FOS: Computer and information sciences, Closed captioning, Artificial neural network, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Speech recognition, media_common.quotation_subject, 05 social sciences, Computer Science - Computer Vision and Pattern Recognition, Eye movement, 02 engineering and technology, 050105 experimental psychology, Task (computing), Feature (computer vision), Perception, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 0501 psychology and cognitive sciences, Artificial intelligence, business, Encoder, media_common
Abstract: In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images. Using this data, we study the differences in human attention during free-viewing and image captioning tasks. We look into the relationship between human attention and language constructs during perception and sentence articulation. We also analyse attention deployment mechanisms in the top-down soft attention approach that is argued to mimic human attention in captioning tasks, and investigate whether visual saliency can help image captioning. Our study reveals that (1) human attention behaviour differs in free-viewing and image description tasks. Humans tend to fixate on a greater variety of regions under the latter task, (2) there is a strong relationship between described objects and attended objects ($97\%$ of the described objects are being attended), (3) a convolutional neural network as feature encoder accounts for human-attended regions during image captioning to a great extent (around $78\%$), (4) soft-attention mechanism differs from human attention, both spatially and temporally, and there is low correlation between caption scores and attention consistency scores. These indicate a large gap between humans and machines in regards to top-down attention, and (5) by integrating the soft attention model with image saliency, we can significantly improve the model's performance on Flickr30k and MSCOCO benchmarks. The dataset can be found at: https://github.com/SenHe/Human-Attention-in-Image-Captioning., Comment: To appear at ICCV 2019
Published: 2019

2. Understanding and Visualizing Deep Visual Saliency Models

Author: Ali Borji, Yang Mi, Nicolas Pugeault, Hamed R. Tavakoli, and Sen He
Subjects: FOS: Computer and information sciences, Artificial neural network, Contextual image classification, business.industry, Computer science, Computer Vision and Pattern Recognition (cs.CV), Deep learning, Feature extraction, Computer Science - Computer Vision and Pattern Recognition, Cognitive neuroscience of visual object recognition, 020207 software engineering, Pattern recognition, 02 engineering and technology, Data visualization, Salience (neuroscience), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), 020201 artificial intelligence & image processing, Artificial intelligence, business, Set (psychology)
Abstract: Recently, data-driven deep saliency models have achieved high performance and have outperformed classical saliency models, as demonstrated by results on datasets such as the MIT300 and SALICON. Yet, there remains a large gap between the performance of these models and the inter-human baseline. Some outstanding questions include what have these models learned, how and where they fail, and how they can be improved. This article attempts to answer these questions by analyzing the representations learned by individual neurons located at the intermediate layers of deep saliency models. To this end, we follow the steps of existing deep saliency models, that is borrowing a pre-trained model of object recognition to encode the visual features and learning a decoder to infer the saliency. We consider two cases when the encoder is used as a fixed feature extractor and when it is fine-tuned, and compare the inner representations of the network. To study how the learned representations depend on the task, we fine-tune the same network using the same image set but for two different tasks: saliency prediction versus scene classification. Our analyses reveal that: 1) some visual regions (e.g. head, text, symbol, vehicle) are already encoded within various layers of the network pre-trained for object recognition, 2) using modern datasets, we find that fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g. head) over some others (e.g. text), 3) although deep models of saliency outperform classical models on natural images, the converse is true for synthetic stimuli (e.g. pop-out search arrays), an evidence of significant difference between human and data-driven saliency models, and 4) we confirm that, after-fine tuning, the change in inner-representations is mostly due to the task and not the domain shift in the data., Comment: To appear in CVPR2019, camera ready version
Published: 2019
Full Text: View/download PDF

3. Salient Object Detection With Pyramid Attention and Salient Edges

Author: Steven C. H. Hoi, Wenguan Wang, Jianbing Shen, Shuyang Zhao, and Ali Borji
Subjects: 0209 industrial biotechnology, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Object (computer science), Convolutional neural network, Edge detection, 020901 industrial engineering & automation, Salience (neuroscience), Salient, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Segmentation, Computer vision, Enhanced Data Rates for GSM Evolution, Pyramid (image processing), Artificial intelligence, Representation (mathematics), business
Abstract: This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). The proposed network, named PAGE-Net, offers two key contributions. The first is the exploitation of an essential pyramid attention structure for salient object detection. This enables the network to concentrate more on salient regions while considering multi-scale saliency information. Such a stacked attention design provides a powerful tool to efficiently improve the representation ability of the corresponding network layer with an enlarged receptive field. The second contribution lies in the emphasis on the importance of salient edges. Salient edge information offers a strong cue to better segment salient objects and refine object boundaries. To this end, our model is equipped with a salient edge detection module, which is learned for precise salient boundary estimation. This encourages better edge-preserving salient object segmentation. Exhaustive experiments confirm that the proposed pyramid attention and salient edges are effective for salient object detection. We show that our deep saliency model outperforms state-of-the-art approaches for several benchmarks with a fast processing speed (25fps on one GPU).
Published: 2019
Full Text: View/download PDF

4. Salient Object Detection Driven by Fixation Prediction

Author: Ali Borji, Wenguan Wang, Jianbing Shen, and Xingping Dong
Subjects: 0209 industrial biotechnology, Artificial neural network, Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, Salient object detection, Image segmentation, Object detection, 020901 industrial engineering & automation, Salience (neuroscience), Fixation (visual), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Segmentation, Artificial intelligence, business, Visual saliency
Abstract: Research in visual saliency has been focused on two major types of models namely fixation prediction and salient object detection. The relationship between the two, however, has been less explored. In this paper, we propose to employ the former model type to identify and segment salient objects in scenes. We build a novel neural network called Attentive Saliency Network (ASNet)1 that learns to detect salient objects from fixation maps. The fixation map, derived at the upper network layers, captures a high-level understanding of the scene. Salient object detection is then viewed as fine-grained object-level saliency segmentation and is progressively optimized with the guidance of the fixation map in a top-down manner. ASNet is based on a hierarchy of convolutional LSTMs (convLSTMs) that offers an efficient recurrent mechanism for sequential refinement of the segmentation map. Several loss functions are introduced for boosting the performance of the ASNet. Extensive experimental evaluation shows that our proposed ASNet is capable of generating accurate segmentation maps with the help of the computed fixation map. Our work offers a deeper insight into the mechanisms of attention and narrows the gap between salient object detection and fixation prediction.
Published: 2018
Full Text: View/download PDF

5. Detect Globally, Refine Locally: A Novel Approach to Saliency Detection

Author: Gang Yang, Shuo Wang, Ali Borji, Tiantian Wang, Xiang Ruan, Huchuan Lu, and Lihe Zhang
Subjects: Artificial neural network, business.industry, Computer science, Feature extraction, 020207 software engineering, Pattern recognition, 02 engineering and technology, Object (computer science), Convolutional neural network, Object detection, Visualization, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Adaptive learning, business
Abstract: Effective integration of contextual information is crucial for salient object detection. To achieve this, most existing methods based on 'skip' architecture mainly focus on how to integrate hierarchical features of Convolutional Neural Networks (CNNs). They simply apply concatenation or element-wise operation to incorporate high-level semantic cues and low-level detailed information. However, this can degrade the quality of predictions because cluttered and noisy information can also be passed through. To address this problem, we proposes a global Recurrent Localization Network (RLN) which exploits contextual information by the weighted response map in order to localize salient objects more accurately. Particularly, a recurrent module is employed to progressively refine the inner structure of the CNN over multiple time steps. Moreover, to effectively recover object boundaries, we propose a local Boundary Refinement Network (BRN) to adaptively learn the local contextual information for each spatial position. The learned propagation coefficients can be used to optimally capture relations between each pixel and its neighbors. Experiments on five challenging datasets show that our approach performs favorably against all existing methods in terms of the popular evaluation metrics.
Published: 2018
Full Text: View/download PDF

6. Learning to Promote Saliency Detectors

Author: Yu Zeng, Ali Borji, Lihe Zhang, Huchuan Lu, and Mengyang Feng
Subjects: Pixel, Artificial neural network, Computer science, business.industry, 020208 electrical & electronic engineering, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, 02 engineering and technology, Salience (neuroscience), Salient, 0202 electrical engineering, electronic engineering, information engineering, Embedding, 020201 artificial intelligence & image processing, Artificial intelligence, business, Classifier (UML)
Abstract: The categories and appearance of salient objects vary from image to image, therefore, saliency detection is an image-specific task. Due to lack of large-scale saliency training data, using deep neural networks (DNNs) with pretraining is difficult to precisely capture the image-specific saliency cues. To solve this issue, we formulate a zero-shot learning problem to promote existing saliency detectors. Concretely, a DNN is trained as an embedding function to map pixels and the attributes of the salient/background regions of an image into the same metric space, in which an image-specific classifier is learned to classify the pixels. Since the image-specific task is performed by the classifier, the DNN embedding effectively plays the role of a general feature extractor. Compared with transferring the learning to a new recognition task using limited data, this formulation makes the DNN learn more effectively from small data. Extensive experiments on five data sets show that our method significantly improves accuracy of existing methods and compares favorably against state-of-the-art approaches.
Published: 2018
Full Text: View/download PDF

7. A Stagewise Refinement Model for Detecting Salient Objects in Images

Author: Tiantian Wang, Lihe Zhang, Pingping Zhang, Ali Borji, and Huchuan Lu
Subjects: Artificial neural network, Computer science, business.industry, Pooling, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 020207 software engineering, Pattern recognition, Context (language use), 02 engineering and technology, Image segmentation, Convolutional neural network, Object detection, Salience (neuroscience), 0202 electrical engineering, electronic engineering, information engineering, Feedforward neural network, 020201 artificial intelligence & image processing, Pyramid (image processing), Artificial intelligence, business
Abstract: Deep convolutional neural networks (CNNs) have been successfully applied to a wide variety of problems in computer vision, including salient object detection. To detect and segment salient objects accurately, it is necessary to extract and combine high-level semantic features with low-levelfine details simultaneously. This happens to be a challenge for CNNs as repeated subsampling operations such as pooling and convolution lead to a significant decrease in the initial image resolution, which results in loss of spatial details and finer structures. To remedy this problem, here we propose to augment feedforward neural networks with a novel pyramid pooling module and a multi-stage refinement mechanism for saliency detection. First, our deep feedward net is used to generate a coarse prediction map with much detailed structures lost. Then, refinement nets are integrated with local context information to refine the preceding saliency maps generated in the master branch in a stagewise manner. Further, a pyramid pooling module is applied for different-region-based global context aggregation. Empirical evaluations over six benchmark datasets show that our proposed method compares favorably against the state-of-the-art approaches.
Published: 2017
Full Text: View/download PDF

8. Deeply Supervised Salient Object Detection with Short Connections

Author: Ming-Ming Cheng, Zhuowen Tu, Philip H. S. Torr, Qibin Hou, Ali Borji, and Xiaowei Hu
Subjects: Boundary detection, FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Feature extraction, Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, 02 engineering and technology, Machine learning, computer.software_genre, Convolutional neural network, Edge detection, Object-class detection, Artificial Intelligence, 0202 electrical engineering, electronic engineering, information engineering, Segmentation, Artificial neural network, business.industry, Applied Mathematics, 020207 software engineering, Pattern recognition, Image segmentation, Object detection, Computational Theory and Mathematics, Feature (computer vision), 020201 artificial intelligence & image processing, Computer Vision and Pattern Recognition, Artificial intelligence, Enhanced Data Rates for GSM Evolution, business, computer, Software
Abstract: Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and saliency detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still a large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on salience detection is not obvious. In this paper, we propose a new method for saliency detection by introducing short connections to the skip-layer structures within the HED architecture. Our framework provides rich multi-scale feature maps at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.15 seconds per image), effectiveness, and simplicity over the existing algorithms., Comment: IEEE TPAMI 2018 (IEEE CVPR 2017)
Published: 2017
Full Text: View/download PDF

9. Saliency Revisited: Analysis of Mouse Movements Versus Fixations

Author: Jorma Laaksonen, Hamed R. Tavakoli, Fawad Ahmed, and Ali Borji
Subjects: FOS: Computer and information sciences, Computer science, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Gaze tracking, 02 engineering and technology, Mouse tracking, Machine learning, computer.software_genre, 050105 experimental psychology, Field (computer science), Mice, Databases, Salience (neuroscience), 0202 electrical engineering, electronic engineering, information engineering, Context modeling, 0501 psychology and cognitive sciences, Computer vision, Analytical models, Visualization, ta113, Measurement, Context model, business.industry, 05 social sciences, Image segmentation, Eye tracking, 020201 artificial intelligence & image processing, Artificial intelligence, User interface, business, computer
Abstract: This paper revisits visual saliency prediction by evaluating the recent advancements in this field such as crowd-sourced mouse tracking-based databases and contextual annotations. We pursue a critical and quantitative approach towards some of the new challenges including the quality of mouse tracking versus eye tracking for model training and evaluation. We extend quantitative evaluation of models in order to incorporate contextual information by proposing an evaluation methodology that allows accounting for contextual factors such as text, faces, and object attributes. The proposed contextual evaluation scheme facilitates detailed analysis of models and helps identify their pros and cons. Through several experiments, we find that (1) mouse tracking data has lower inter-participant visual congruency and higher dispersion, compared to the eye tracking data, (2) mouse tracking data does not totally agree with eye tracking in general and in terms of different contextual regions in specific, and (3) mouse tracking data leads to acceptable results in training current existing models, and (4) mouse tracking data is less reliable for model selection and evaluation. The contextual evaluation also reveals that, among the studied models, there is no single model that performs best on all the tested annotations.
Published: 2017
Full Text: View/download PDF

10. iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning

Author: Saeed Izadi, Ali Borji, and Laurent Itti
Subjects: Property (programming), business.industry, Computer science, Deep learning, Cognitive neuroscience of visual object recognition, 02 engineering and technology, 010501 environmental sciences, Object (computer science), 01 natural sciences, Convolutional neural network, Field (computer science), Set (abstract data type), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Computer vision, Artificial intelligence, business, Focus (optics), 0105 earth and related environmental sciences
Abstract: Tolerance to image variations (e.g., translation, scale, pose, illumination, background) is an important desired property of any object recognition system, be it human or machine. Moving towards increasingly bigger datasets has been trending in computer vision especially with the emergence of highly popular deep learning models. While being very useful for learning invariance to object inter-and intra-class shape variability, these large-scale wild datasets are not very useful for learning invariance to other parameters urging researchers to resort to other tricks for training models. In this work, we introduce a large-scale synthetic dataset, which is freely and publicly available, and use it to answer several fundamental questions regarding selectivity and invariance properties of convolutional neural networks. Our dataset contains two parts: a) objects shot on a turntable: 15 categories, 8 rotation angles, 11 cameras on a semi-circular arch, 5 lighting conditions, 3 focus levels, variety of backgrounds (23.4 per instance) generating 1320 images per instance (about 22 million images in total), and b) scenes: in which a robotic arm takes pictures of objects on a 1:160 scale scene. We study: 1) invariance and selectivity of different CNN layers, 2) knowledge transfer from one object category to another, 3) systematic or random sampling of images to build a train set, 4) domain adaptation from synthetic to natural scenes, and 5) order of knowledge delivery to CNNs. We also discuss how our analyses can lead the field to develop more efficient deep learning methods.
Published: 2016
Full Text: View/download PDF

11. Fixation prediction with a combined model of bottom-up saliency and vanishing point

Author: Mengyang Feng, Huchuan Lu, and Ali Borji
Subjects: FOS: Computer and information sciences, Computational model, Computer science, business.industry, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Eye movement, Top-down and bottom-up design, Gaze, Visualization, Visual processing, Fixation (visual), Computer vision, Artificial intelligence, Vanishing point, business
Abstract: By predicting where humans look in natural scenes, we can understand how they perceive complex natural scenes and prioritize information for further high-level visual processing. Several models have been proposed for this purpose, yet there is a gap between best existing saliency models and human performance. While many researchers have developed purely computational models for fixation prediction, less attempts have been made to discover cognitive factors that guide gaze. Here, we study the effect of a particular type of scene structural information, known as the vanishing point, and show that human gaze is attracted to the vanishing point regions. We record eye movements of 10 observers over 532 images, out of which 319 have vanishing points. We then construct a combined model of traditional saliency and a vanishing point channel and show that our model outperforms state of the art saliency models using three scores on our dataset., Comment: arXiv admin note: text overlap with arXiv:1512.01722
Published: 2016
Full Text: View/download PDF

12. Studying the added value of computational saliency in objective image quality assessment

Author: Fuzheng Yang, Ali Borji, Ping Jiang, Wei Zhang, and Hantao Liu
Subjects: business.industry, Computer science, Image quality, media_common.quotation_subject, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Context (language use), Machine learning, computer.software_genre, Metric (mathematics), Added value, Eye tracking, Visual attention, Computer vision, Quality (business), Artificial intelligence, business, computer, media_common
Abstract: Advances in image quality assessment have shown the potential added value of including visual attention aspects in objective quality metrics. Numerous models of visual saliency are implemented and integrated in different quality metrics; however, their ability of improving a metric's performance in predicting perceived image quality is not fully investigated. In this paper, we conduct an exhaustive comparison of 20 state-of-the-art saliency models in the context of image quality assessment. Experimental results show that adding computational saliency is beneficial to quality prediction in general terms. However, the amount of performance gain that can be obtained by adding saliency in quality metrics highly depends on the saliency model and on the metric.
Published: 2014
Full Text: View/download PDF

13. Human vs. Computer in Scene and Object Recognition

Author: Ali Borji and Laurent Itti
Subjects: business.industry, Computer science, Color normalization, Generalization, 3D single-object recognition, Cognitive neuroscience of visual object recognition, Scene statistics, Image-based modeling and rendering, Machine learning, computer.software_genre, Histogram of oriented gradients, Bag-of-words model in computer vision, Histogram, Computer vision, Artificial intelligence, business, computer
Abstract: Several decades of research in computer and primate vision have resulted in many models (some specialized for one problem, others more general) and invaluable experimental data. Here, to help focus research efforts onto the hardest unsolved problems, and bridge computer and human vision, we define a battery of 5 tests that measure the gap between human and machine performances in several dimensions (generalization across scene categories, generalization from images to edge maps and line drawings, invariance to rotation and scaling, local/global information with jumbled images, and object recognition performance). We measure model accuracy and the correlation between model and human error patterns. Experimenting over 7 datasets, where human data is available, and gauging 14 well-established models, we find that none fully resembles humans in all aspects, and we learn from each test which models and features are more promising in approaching humans in the tested dimension. Across all tests, we find that models based on local edge histograms consistently resemble humans more, while several scene statistics or "gist" models do perform well with both scenes and objects. While computer vision has long been inspired by human vision, we believe systematic efforts, such as this, will help better identify shortcomings of models and find new paths forward.
Published: 2014
Full Text: View/download PDF

14. Analysis of Scores, Datasets, and Models in Visual Saliency Prediction

Author: Laurent Itti, Dicky N. Sihite, Ali Borji, and Hamed R. Tavakoli
Subjects: Computer science, Salience (neuroscience), business.industry, Visual attention, Eye movement, Artificial intelligence, Fixation (psychology), Machine learning, computer.software_genre, business, computer, Visual saliency
Abstract: Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confounding factors. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scan path sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scan path sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.
Published: 2013
Full Text: View/download PDF

15. Adaptive object tracking by learning background context

Author: Simone Frintrop, Dicky N. Sihite, Laurent Itti, and Ali Borji
Subjects: Rest (physics), Computer science, business.industry, Frame (networking), ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Representation (systemics), Context (language use), Object (computer science), Tracking (particle physics), Video tracking, Computer vision, Artificial intelligence, Particle filter, business
Abstract: One challenge when tracking objects is to adapt the object representation depending on the scene context to account for changes in illumination, coloring, scaling, etc. Here, we present a solution that is based on our earlier approach for object tracking using particle filters and component-based descriptors. We extend the approach to deal with changing backgrounds by using a quick training phase with user interaction at the beginning of an image sequence. During this phase, some background clusters are learned along with object representations for those clusters. Next, for the rest of the sequence the best fitting background cluster is determined for each frame and the corresponding object representation is used for tracking. Experiments show a particle filter adapting to background changes can efficiently track objects and persons in natural scenes and results in higher tracking results than the basic approach. Additionally, using an object tracker to follow the main character in video games, we were able to explain a large amount of eye fixations higher than other saliency models in terms of NSS score proving that tracking is an important top-down attention component.
Published: 2012
Full Text: View/download PDF

16. Exploiting local and global patch rarities for saliency detection

Author: Ali Borji and Laurent Itti
Subjects: Channel (digital image), Computer science, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Pattern recognition, Color space, Visualization, Image (mathematics), RGB color space, Kadir–Brady saliency detector, Color model, Lab color space, RGB color model, Saliency map, Computer vision, Artificial intelligence, business, ComputingMethodologies_COMPUTERGRAPHICS
Abstract: We introduce a saliency model based on two key ideas. The first one is considering local and global image patch rarities as two complementary processes. The second one is based on our observation that for different images, one of the RGB and Lab color spaces outperforms the other in saliency detection. We propose a framework that measures patch rarities in each color space and combines them in a final map. For each color channel, first, the input image is partitioned into non-overlapping patches and then each patch is represented by a vector of coefficients that linearly reconstruct it from a learned dictionary of patches from natural scenes. Next, two measures of saliency (Local and Global) are calculated and fused to indicate saliency of each patch. Local saliency is distinctiveness of a patch from its surrounding patches. Global saliency is the inverse of a patch's probability of happening over the entire image. The final saliency map is built by normalizing and fusing local and global saliency maps of all channels from both color systems. Extensive evaluation over four benchmark eye-tracking datasets shows the significant advantage of our approach over 10 state-of-the-art saliency models.
Published: 2012
Full Text: View/download PDF

17. Boosting bottom-up and top-down visual features for saliency estimation

Author: Ali Borji
Subjects: Boosting (machine learning), business.industry, Computer science, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Image processing, Pattern recognition, Image segmentation, Visualization, Support vector machine, Salience (neuroscience), Video tracking, Segmentation, Computer vision, AdaBoost, Artificial intelligence, business
Abstract: Despite significant recent progress, the best available visual saliency models still lag behind human performance in predicting eye fixations in free-viewing of natural scenes. Majority of models are based on low-level visual features and the importance of top-down factors has not yet been fully explored or modeled. Here, we combine low-level features such as orientation, color, intensity, saliency maps of previous best bottom-up models with top-down cognitive visual features (e.g., faces, humans, cars, etc.) and learn a direct mapping from those features to eye fixations using Regression, SVM, and AdaBoost classifiers. By extensive experimenting over three benchmark eye-tracking datasets using three popular evaluation scores, we show that our boosting model outperforms 27 state-of-the-art models and is so far the closest model to the accuracy of human model for fixation prediction. Furthermore, our model successfully detects the most salient object in a scene without sophisticated image processings such as region segmentation.
Published: 2012
Full Text: View/download PDF

18. Probabilistic learning of task-specific visual attention

Author: Laurent Itti, Ali Borji, and Dicky N. Sihite
Subjects: Visual search, business.industry, Computer science, Eye movement, Context (language use), Stimulus (physiology), Machine learning, computer.software_genre, Gaze, Task (project management), Visualization, Salience (neuroscience), Margin (machine learning), Video tracking, Visual attention, Artificial intelligence, business, computer
Abstract: Despite a considerable amount of previous work on bottom-up saliency modeling for predicting human fixations over static and dynamic stimuli, few studies have thus far attempted to model top-down and task-driven influences of visual attention. Here, taking advantage of the sequential nature of real-world tasks, we propose a unified Bayesian approach for modeling task-driven visual attention. Several sources of information, including global context of a scene, previous attended locations, and previous motor actions, are integrated over time to predict the next attended location. Recording eye movements while subjects engage in 5 contemporary 2D and 3D video games, as modest counterparts of everyday tasks, we show that our approach is able to predict human attention and gaze better than the state-of-the-art, with a large margin (about 15% increase in prediction accuracy). The advantage of our approach is that it is automatic and applicable to arbitrary visual tasks.
Published: 2012
Full Text: View/download PDF

19. Modeling the influence of action on spatial attention in visual interactive environments

Author: Laurent Itti, Dicky N. Sihite, and Ali Borji
Subjects: GAZE FIXATION, Focus (computing), Visual perception, Multimedia, Computer science, computer.software_genre, Test (assessment), Visualization, Eye position, Action (philosophy), Salience (neuroscience), Human–computer interaction, Mental state, Visual attention, computer
Abstract: A large number of studies have been reported on top-down influences of visual attention. However, less progress have been made in understanding and modeling its mechanisms in real-world tasks. In this paper, we propose an approach for learning spatial attention taking into account influences of physical actions on top-down attention. For this purpose, we focus on interactive visual environments (video games) which are modest real-world simulations, where a player has to attend to certain aspects of visual stimuli and perform actions to achieve a goal. The basic idea is to learn a mapping from current mental state of the game player, represented by past actions and observations, to its gaze fixation. A data-driven approach is followed where we train a model from the data of some players and test it over a new subject. In particular, two contributions this paper makes are: 1) employing multi-modal information including mean eye position, gist of a scene, physical actions, bottom-up saliency, and tagged events for state representation and 2) analysis of different methods of combining bottom-up and top-down influences. Comparing with other top-down task-driven and bottom-up spatio-temporal models, our approach shows higher NSS scores in predicting eye positions.
Published: 2012
Full Text: View/download PDF

20. Scene classification with a sparse set of salient regions

Author: Laurent Itti and Ali Borji
Subjects: Contextual image classification, Matching (graph theory), business.industry, Computer science, Feature extraction, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Scale-invariant feature transform, Pattern recognition, Visualization, Set (abstract data type), Salient, Computer vision, Artificial intelligence, business
Abstract: This work proposes an approach for scene classification by extracting and matching visual features only at the focuses of visual attention instead of the entire scene. Analysis over a database of natural scenes demonstrates that regions proposed by the saliency-based model of visual attention are robust to image transformations. Using a nearest neighbor classifier and a distance measure defined over the salient regions, we obtained 97.35% and 78.28% classification rates with SIFT and C2 features from the HMAX model at 5 salient regions covering at most 31% of the image. Classification with features extracted from the entire image results in 99.3% and 82.32% using SIFT and C2 features, respectively. Comparing attentional and adhoc approaches shows that classification rate of the first approach is 0.95 of the second. Overall, our results prove that efficient scene classification, in terms of reducing the complexity of feature extraction is possible without a significant drop in performance.
Published: 2011
Full Text: View/download PDF

21. Simultaneous learning of spatial visual attention and physical actions

Author: Babak Nadjar Araabi, Majid Nili Ahmadabadi, and Ali Borji
Subjects: Class (computer programming), Computer science, business.industry, media_common.quotation_subject, Feature extraction, Cognitive neuroscience of visual object recognition, Machine learning, computer.software_genre, Visualization, Perception, Generalization (learning), Reinforcement learning, Artificial intelligence, business, computer, media_common
Abstract: This paper introduces a new method for learning top-down and task-driven visual attention control along with physical actions in interactive environments. Our method is based on the Reinforcement Learning of Visual Classes(RLVC) algorithm and adapts it for learning spatial visual selection in order to reduce computational complexity. Proposed algorithm also addresses aliasings due to not knowing previous actions and perceptions. Continuing learning shows our method is robust to perturbations in perceptual information. Our method also allows object recognition when class labels are used instead of physical actions. We have tried to gain maximum generalization while performing local processing. Experiments over visual navigation and object recognition tasks show that our method is more efficient in terms of computational complexity and is biologically more plausible.
Published: 2010
Full Text: View/download PDF

22. Learning sequential visual attention control through dynamic state space discretization

Author: Ali Borji, Majid Nili Ahmadabadi, and Babak Nadjar Araabi
Subjects: business.industry, Computer science, Cognitive neuroscience of visual object recognition, Attentional control, Sensory system, Mobile robot, Robotics, Saccadic masking, Saccade, State space, Visual attention, Motion planning, Artificial intelligence, Aliasing (computing), business
Abstract: Similar to humans and primates, artificial creatures like robots are limited in terms of allocation of their resources to huge sensory and perceptual information. Serial processing mechanisms used in the design of such creatures demands engineering attentional control mechanisms. In this paper, we present a new algorithm for learning top-down sequential visual attention control for agents acting in interactive environments. Our method is based on the key idea, that attention can be learned best in concert with visual representations through automatic construction and discretization of the visual state space. The tree representing the top-down attention is incrementally refined whenever aliasing occurs by selecting the most appropriate saccadic direction. The proposed approach is evaluated on action-based object recognition and urban navigation tasks, where obtained results support applicability and usefulness of developed saccade movement method for robotics.
Published: 2009
Full Text: View/download PDF

23. CLPSO-based Fuzzy Color Image Segmentation

Author: Ali Borji, A.M.E. moghadam, and Mandana Hamidi
Subjects: Adaptive neuro fuzzy inference system, Fuzzy classification, business.industry, Fuzzy number, Fuzzy set operations, Particle swarm optimization, Pattern recognition, Artificial intelligence, Image segmentation, business, Defuzzification, Fuzzy logic, Mathematics
Abstract: A new method for color image segmentation using fuzzy logic is proposed in this paper. Our aim here is to automatically produce a fuzzy system for color classification and image segmentation with least number of rules and minimum error rate. Particle swarm optimization is a sub class of evolutionary algorithms that has been inspired from social behavior of fishes, bees, birds, etc, that live together in colonies. We use comprehensive learning particle swarm optimization (CLPSO) technique to find optimal fuzzy rules and membership functions because it discourages premature convergence. Here each particle of the swarm codes a set of fuzzy rules. During evolution, a population member tries to maximize a fitness criterion which is here high classification rate and small number of rules. Finally, particle with the highest fitness value is selected as the best set of fuzzy rules for image segmentation. Our results, using this method for soccer field image segmentation in Robocop contests shows 89% performance. Less computational load is needed when using this method compared with other methods like ANFIS, because it generates a smaller number of fuzzy rules. Large train dataset and its variety, makes the proposed method invariant to illumination noise.
Published: 2007
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

23 results on '"Ali Borji"'

1. Human Attention in Image Captioning: Dataset and Analysis

2. Understanding and Visualizing Deep Visual Saliency Models

3. Salient Object Detection With Pyramid Attention and Salient Edges

4. Salient Object Detection Driven by Fixation Prediction

5. Detect Globally, Refine Locally: A Novel Approach to Saliency Detection

6. Learning to Promote Saliency Detectors

7. A Stagewise Refinement Model for Detecting Salient Objects in Images

8. Deeply Supervised Salient Object Detection with Short Connections

9. Saliency Revisited: Analysis of Mouse Movements Versus Fixations

10. iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning

11. Fixation prediction with a combined model of bottom-up saliency and vanishing point

12. Studying the added value of computational saliency in objective image quality assessment

13. Human vs. Computer in Scene and Object Recognition

14. Analysis of Scores, Datasets, and Models in Visual Saliency Prediction

15. Adaptive object tracking by learning background context

16. Exploiting local and global patch rarities for saliency detection

17. Boosting bottom-up and top-down visual features for saliency estimation

18. Probabilistic learning of task-specific visual attention

19. Modeling the influence of action on spatial attention in visual interactive environments

20. Scene classification with a sparse set of salient regions

21. Simultaneous learning of spatial visual attention and physical actions

22. Learning sequential visual attention control through dynamic state space discretization

23. CLPSO-based Fuzzy Color Image Segmentation

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Journal

Database

23 results on '"Ali Borji"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources