The salient region is the most important part of an image, and it attracts the most attention when people search large-scale image datasets. However, considering only the most salient object is insufficient for improving image retrieval accuracy, because the background also influences retrieval results. To address this issue, this paper proposes a novel concept called the extended salient region (ESR). First, the salient region of an input image is detected using the Region Contrast (RC) algorithm. Then, a polar coordinate system is constructed with the centroid of the salient region as its pole. Next, the regions surrounding the salient region are identified by traversing its neighboring regions in a counterclockwise direction. The combination of the salient region and its surrounding regions is defined as the ESR. We extract the visual content of the ESR using the well-known Bag of Words (BoW) model based on Gabor, SIFT, and HSVH features, and we represent the input image as a graph whose nodes are these visual-content descriptors. We then design a novel algorithm to match two images and define a new similarity measure that combines the similarities of the salient region and the surrounding regions with weights. Finally, to better evaluate image retrieval accuracy, an improved measure called the mean label average precision (MLAP) is proposed. Experiments on three benchmark datasets (Corel, TU Darmstadt, and Caltech 101) demonstrate that the proposed ESR model and region-matching algorithm are highly effective for image retrieval and achieve more accurate query results than current state-of-the-art methods.
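To make the two central ideas of the abstract concrete, the following minimal Python sketch illustrates (a) ordering the regions around the salient-region centroid counterclockwise by polar angle, and (b) combining the salient-region similarity with the surrounding-region similarities using weights. The abstract gives no formulas, so the function names, the cosine-free scalar similarities, and the weight split (`w_salient` plus an equal share for each surrounding region) are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def order_surrounding_regions(salient_centroid, neighbor_centroids):
    """Order the regions adjacent to the salient region counterclockwise.

    A polar coordinate system is placed at the salient-region centroid
    (the pole); neighbors are sorted by their polar angle.  The paper's
    exact traversal rule may differ -- this is only a sketch.
    """
    cx, cy = salient_centroid
    angles = [np.arctan2(y - cy, x - cx) % (2 * np.pi)
              for x, y in neighbor_centroids]
    return [idx for idx, _ in sorted(enumerate(angles), key=lambda t: t[1])]

def esr_similarity(sim_salient, sims_surrounding, w_salient=0.6):
    """Combine salient-region and surrounding-region similarities.

    The abstract only states that the two parts are weighted; giving
    w_salient to the salient region and splitting the remainder equally
    among the surrounding regions is an assumed scheme for illustration.
    """
    if not sims_surrounding:
        return sim_salient
    w_surround = (1.0 - w_salient) / len(sims_surrounding)
    return w_salient * sim_salient + w_surround * sum(sims_surrounding)

# Example: three surrounding regions matched against a query image.
order = order_surrounding_regions((120.0, 80.0), [(200, 90), (60, 40), (110, 160)])
score = esr_similarity(0.82, [0.41, 0.37, 0.55])
print(order, round(score, 3))
```

In practice, each per-region similarity would come from comparing BoW histograms of the Gabor, SIFT, and HSVH features described in the paper; the scalar inputs above simply stand in for those comparisons.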