Recent advances in sensor technology have led to an increased availability of hyperspectral remote sensing data at very high both spectral and spatial resolutions. Many techniques are developed to explore the spectral information and the spatial information of these data. In particular, feature extraction (FE) aimed at reducing the dimensionality of hyperspectral data while keeping as much spectral information as possible is one of methods to preserve the spectral information, while morphological profile analysis is the most popular methods used to explore the spatial information. Hyperspectral sensors collect information as a set of images represented by hundreds of spectral bands. While offering much richer spectral information than regular RGB and multispectral images, the high dimensional hyperspectal data creates also a challenge for traditional spectral data processing techniques. Conventional classification methods perform poorly on hyperspectral data due to the curse of dimensionality (i.e. the Hughes phenomenon: for a limited number of training samples, the classification accuracy decreases as the dimension increases). Classification techniques in pattern recognition typically assume that there are enough training samples available to obtain reasonably accurate class descriptions in quantitative form. However, the assumption that enough training samples are available to accurately estimate the class description is frequently not satisfied for hyperspectral remote sensing data classification, because the cost of collecting ground-truth of observed data can be considerably difficult and expensive. In contrast, techniques making accurate estimation by using only small training samples can save time and cost considerably. The small sample size problem therefore becomes a very important issue for hyperspectral image classification. Very high-resolution remotely sensed images from urban areas have recently become available. The classification of such images is challenging because urban areas often comprise a large number of different surface materials, and consequently the heterogeneity of urban images is relatively high. Moreover, different information classes can be made up of spectrally similar surface materials. Therefore, it is important to combine spectral and spatial information to improve the classification accuracy. In particular, morphological profile analysis is one of the most popular methods to explore the spatial information of the high resolution remote sensing data. When using morphological profiles (MPs) to explore the spatial information for the classification of hyperspectral data, one should consider three important issues. Firstly, classical morphological openings and closings degrade the object boundaries and deform the object shapes, while the morphological profile by reconstruction leads to some unexpected and undesirable results (e.g. over-reconstruction). Secondly, the generated MPs produce high-dimensional data, which may contain redundant information and create a new challenge for conventional classification methods, especially for the classifiers which are not robust to the Hughes phenomenon. Last but not least, linear features, which are used to construct MPs, lose too much spectral information when extracted from the original hyperspectral data. In order to overcome these problems and improve the classification results, we develop effective feature extraction algorithms and combine morphological features for the classification of hyperspectral remote sensing data. The contributions of this thesis are as follows. As the first contribution of this thesis, a novel semi-supervised local discriminant analysis (SELD) method is proposed for feature extraction in hyperspectral remote sensing imagery, with improved performance in both ill-posed and poor-posed conditions. The proposed method combines unsupervised methods (Local Linear Feature Extraction Methods (LLFE)) and supervised method (Linear Discriminant Analysis (LDA)) in a novel framework without any free parameters. The underlying idea is to design an optimal projection matrix, which preserves the local neighborhood information inferred from unlabeled samples, while simultaneously maximizing the class discrimination of the data inferred from the labeled samples. Our second contribution is the application of morphological profiles with partial reconstruction to explore the spatial information in hyperspectral remote sensing data from the urban areas. Classical morphological openings and closings degrade the object boundaries and deform the object shapes. Morphological openings and closings by reconstruction can avoid this problem, but this process leads to some undesirable effects. Objects expected to disappear at a certain scale remain present when using morphological openings and closings by reconstruction, which means that object size is often incorrectly represented. Morphological profiles with partial reconstruction improve upon both classical MPs and MPs with reconstruction. The shapes of objects are better preserved than classical MPs and the size information is preserved better than in reconstruction MPs. A novel semi-supervised feature extraction framework for dimension reduction of generated morphological profiles is the third contribution of this thesis. The morphological profiles (MPs) with different structuring elements and a range of increasing sizes of morphological operators produce high-dimensional data. These high-dimensional data may contain redundant information and create a new challenge for conventional classification methods, especially for the classifiers which are not robust to the Hughes phenomenon. To the best of our knowledge the use of semi-supervised feature extraction methods for the generated morphological profiles has not been investigated yet. The proposed generalized semi-supervised local discriminant analysis (GSELD) is an extension of SELD with a data-driven parameter. In our fourth contribution, we propose a fast iterative kernel principal component analysis (FIKPCA) to extract features from hyperspectral images. In many applications, linear FE methods, which depend on linear projection, can result in loss of nonlinear properties of the original data after reduction of dimensionality. Traditional nonlinear methods will cause some problems on storage resources and computational load. The proposed method is a kernel version of the Candid Covariance-Free Incremental Principal Component Analysis, which estimates the eigenvectors through iteration. Without performing eigen decomposition on the Gram matrix, our approach can reduce the space complexity and time complexity greatly. Our last contribution constructs MPs with partial reconstruction on nonlinear features. Traditional linear features, on which the morphological profiles usually are built, lose too much spectral information. Nonlinear features are more suitable to describe higher order complex and nonlinear distributions. In particular, kernel principal components are among the nonlinear features we used to built MPs with partial reconstruction, which led to significant improvement in terms of classification accuracies. The experimental analysis performed with the novel techniques developed in this thesis demonstrates an improvement in terms of accuracies in different fields of application when compared to other state of the art methods.