About the editors ............................................ xiii
List of authors ................................................ xv
Preface ....................................................... xix
Acknowledgments ............................................. xxiii
List of symbols ............................................... xxv
List of abbreviations ....................................... xxvii
I. Introduction .............................................. 1
1. Machine learning techniques in remote sensing
data analysis ............................................... 3
Björn Waske, Mathieu Fauvel, Jon Atli Benediktsson
and Jocelyn Chanussot
1.1. Introduction .......................................... 3
1.1.1. Challenges in remote sensing .................. 3
1.1.2. General concepts of machine learning .......... 4
1.1.3. Paradigms in remote sensing ................... 6
1.2. Supervised classification: algorithms and
applications ......................................... 10
1.2.1. Bayesian classification strategy ............. 10
1.2.2. Neural networks .............................. 11
1.2.3. Support Vector Machines (SVM) ................ 13
1.2.4. Use of multiple classifiers .................. 17
1.3. Conclusion ........................................... 20
Acknowledgments ............................................ 21
References ................................................. 21
2. An introduction to kernel learning algorithms .............. 25
Peter V. Gehler and Bernhard Schölkopf
2.1. Introduction ......................................... 25
2.2. Kernels .............................................. 26
2.2.1. Measuring similarity with kernels ............ 26
2.2.2. Positive definite kernels .................... 27
2.2.3. Constructing the reproducing kernel
Hilbert space ................................ 29
2.2.4. Operations in RKHS ........................... 31
2.2.5. Kernel construction .......................... 32
2.2.6. Examples of kernels .......................... 33
2.3. The representer theorem .............................. 36
2.4. Learning with kernels ................................ 37
2.4.1. Support vector classification ................ 38
2.4.2. Support vector regression .................... 39
2.4.3. Gaussian processes ........................... 39
2.4.4. Multiple kernel learning ..................... 40
2.4.5. Structured prediction using kernels .......... 42
2.4.6. Kernel principal component analysis .......... 43
2.4.7. Applications of support vector algorithms .... 44
2.4.8. Available software ........................... 44
2.5. Conclusion ........................................... 45
References ................................................. 45
II. Supervised image classification ........................... 49
3. The Support Vector Machine (SVM) algorithm for supervised
classification of hyperspectral remote sensing data ........ 51
J. Anthony Gualtieri
3.1. Introduction ......................................... 52
3.2. Aspects of hyperspectral data and its acquisition .... 53
3.3. Hyperspectral remote sensing and supervised
classification ....................................... 56
3.4. Mathematical foundations of supervised
classification ....................................... 57
3.4.1. Empirical risk minimization .................. 58
3.4.2. General bounds for a new risk minimization
principle .................................... 58
3.4.3. Structural risk minimization ................. 61
3.5. From structural risk minimization to a support
vector machine algorithm ............................. 63
3.5.1. SRM for hyperplane binary classifiers ........ 63
3.5.2. SVM algorithm ................................ 64
3.5.3. Kernel method ................................ 66
3.5.4. Hyperparameters .............................. 68
3.5.5. A toy example ................................ 68
3.5.6. Multi-class classifiers ...................... 68
3.5.7. Data centring ................................ 69
3.6. Benchmark hyperspectral data sets .................... 70
3.6.1. The 4 class subset scene ..................... 70
3.6.2. The 16 class scene ........................... 71
3.6.3. The 9 class scene ............................ 71
3.7. Results .............................................. 72
3.7.1. SVM implementation ........................... 72
3.7.2. Effect of hyperparameter d ................... 72
3.7.3. Measure of accuracy of results ............... 73
3.7.4. Classifier results for the 4 class subset
scene and the 16 class full scene ............ 74
3.7.5. Results for the 9 class scene and
comparison of SVM with other classifiers ..... 74
3.7.6. Effect of training set size .................. 75
3.7.7. Effect of simulated noisy data ............... 75
3.8. Using spatial coherence .............................. 77
3.9. Why do SVMs perform better than other methods? ....... 78
3.10. Conclusions .......................................... 79
References ................................................. 79
4. On training and evaluation of SVM for remote sensing
applications ............................................... 85
Giles M. Foody
4.1. Introduction ......................................... 85
4.2. Classification for thematic mapping .................. 86
4.3. Overview of classification by an SVM ................. 88
4.4. Training stage ....................................... 90
4.4.1. General recommendations on sample size ....... 91
4.4.2. Training an SVM .............................. 94
4.4.3. Summary on training .......................... 97
4.5. Testing stage ........................................ 97
4.5.1. General issues in testing .................... 98
4.5.2. Specific issues for SVM classification ...... 103
4.6. Conclusion .......................................... 103
Acknowledgments ........................................... 104
References ................................................ 104
5. Kernel Fisher's Discriminant with heterogeneous kernels ... 111
M. Murat Dundar and Glenn Fung
5.1. Introduction ........................................ 111
5.2. Linear Fisher's Discriminant ........................ 112
5.3. Kernel Fisher Discriminant .......................... 114
5.3.1. Mathematical programming formulation ....... 114
5.4. Kernel Fisher's Discriminant with heterogeneous
kernels ............................................. 116
5.5. Automatic kernel selection KFD algorithm ............ 118
5.6. Numerical results ................................... 119
5.6.1. Dataset used: Purdue Campus data ............ 119
5.6.2. Classifier design ........................... 120
5.6.3. Analysis of the results ..................... 121
5.7. Conclusion .......................................... 123
References ................................................ 123
6. Multi-temporal image classification with kernels .......... 125
Jordi Muñoz-Marí, Luis Gómez-Chova, Manel
Martínez-Ramón, José Luis Rojo-Álvarez, Javier
Calpe-Maravilla and Gustavo Camps-Valls
6.1. Introduction ........................................ 126
6.1.1. Multi-temporal classification methods ....... 126
6.1.2. Change detection methods .................... 127
6.1.3. The proposed kernel-based framework ......... 128
6.2. Multi-temporal classification and change detection
with kernels ........................................ 129
6.2.1. Problem statement and notation .............. 129
6.2.2. Properties of Mercer's kernels .............. 130
6.2.3. Composite kernels for multi-temporal
classification .............................. 131
6.2.4. Composite kernels for change detection ...... 133
6.3. Contextual and multi-source data fusion with
kernels ............................................. 134
6.3.1. Composite kernels for integrating
contextual information ...................... 134
6.3.2. Composite kernels for dealing with
multi-source data ........................... 134
6.3.3. Remarks ..................................... 134
6.4. Multi-temporal/-source urban monitoring ............. 135
6.4.1. Model development and free parameter
selection ................................... 135
6.4.2. Data collection and feature extraction ...... 135
6.4.3. Multi-temporal image classification ......... 138
6.4.4. Change detection ............................ 138
6.4.5. Classification maps ......................... 141
6.5. Conclusions ......................................... 141
Acknowledgments ........................................... 143
References ................................................ 143
7. Target detection with kernels ............................. 147
Nasser M. Nasrabadi
7.1. Introduction ........................................ 147
7.2. Kernel learning theory .............................. 149
7.3. Linear subspace-based anomaly detectors and their
kernel versions ..................................... 150
7.3.1. Principal component analysis ................ 151
7.3.2. Kernel PCA subspace-based anomaly
detection ................................... 152
7.3.3. Fisher linear discriminant analysis ......... 154
7.3.4. Kernel Fisher discriminant analysis ......... 154
7.3.5. Eigenspace separation transform ............. 156
7.3.6. Kernel eigenspace separation transform ...... 157
7.3.7. RX algorithm ................................ 159
7.3.8. Kernel RX algorithm ......................... 160
7.4. Results ............................................. 161
7.4.1. Simulated toy data .......................... 162
7.4.2. Hyperspectral imagery ....................... 163
7.5. Conclusion .......................................... 166
References ................................................ 166
8. One-class SVMs for hyperspectral anomaly detection ........ 169
Amit Banerjee, Philippe Burlina and Chris Diehl
8.1. Introduction ........................................ 169
8.2. Deriving the SVDD ................................... 172
8.2.1. The linear SVDD ............................. 172
8.2.2. The kernel-based SVDD ....................... 173
8.3. SVDD function optimization .......................... 176
8.4. SVDD algorithms for hyperspectral anomaly
detection ........................................... 177
8.4.1. Outline of algorithms ....................... 177
8.4.2. Dimensions for the background window ........ 179
8.4.3. Estimating sigma ............................ 179
8.4.4. Normalized SVDD test statistic .............. 181
8.5. Experimental results ................................ 183
8.6. Conclusions ......................................... 190
References ................................................ 191
III. Semi-supervised image classification .................... 193
9. A domain adaptation SVM and a circular validation
strategy for land-cover maps updating ..................... 195
Mattia Marconcini and Lorenzo Bruzzone
9.1. Introduction ........................................ 195
9.2. Literature survey ................................... 198
9.2.1. Learning under sample selection bias:
transductive and semi-supervised methods .... 198
9.2.2. Domain adaptation: partially-unsupervised
methods ..................................... 200
9.3. Proposed domain adaptation SVM ...................... 200
9.3.1. DASVM: problem definition and assumptions ... 201
9.3.2. DASVM: formulation .......................... 201
9.4. Proposed circular validation strategy ............... 208
9.4.1. Circular validation strategy: rationale ..... 208
9.4.2. Circular validation strategy: formulation ... 209
9.5. Experimental results ................................ 210
9.6. Discussions and conclusion .......................... 218
References ................................................ 219
10. Mean kernels for semi-supervised remote sensing
image classification ...................................... 223
Luis Gómez-Chova, Javier Calpe-Maravilla, Lorenzo
Bruzzone and Gustavo Camps-Valls
10.1. Introduction ........................................ 224
10.2. Semi-supervised classification with mean kernels .... 225
10.2.1. Learning from labelled samples .............. 225
10.2.2. Image clustering ............................ 226
10.2.3. Cluster similarity and the mean map ......... 226
10.2.4. Composite sample-cluster kernels ............ 228
10.2.5. Sample selection bias and the soft
mean map .................................... 229
10.2.6. Summary of composite mean kernel methods .... 231
10.3. Experimental results ................................ 232
10.3.1. Model development ........................... 232
10.3.2. Results on synthetic data ................... 232
10.3.3. Results on real data ........................ 233
10.4. Conclusions ......................................... 243
Acknowledgments ........................................... 243
References ................................................ 244
IV. Function approximation and regression .................... 247
11. Kernel methods for unmixing hyperspectral imagery ......... 249
Joshua Broadwater, Amit Banerjee and Philippe Burlina
11.1. Introduction ........................................ 249
11.2. Mixing models ....................................... 250
11.2.1. Areal mixtures .............................. 251
11.2.2. Intimate mixtures ........................... 251
11.3. Proposed kernel unmixing algorithm .................. 252
11.3.1. Support vector data description for
endmember extraction ........................ 254
11.3.2. Rate-distortion theory ...................... 255
11.3.3. Kernel fully constrained least squares
abundance estimates ......................... 256
11.3.4. Outline of full algorithm ................... 258
11.4. Experimental results of the kernel unmixing
algorithm ........................................... 258
11.4.1. RELAB data results .......................... 259
11.4.2. AVIRIS data results ......................... 261
11.4.3. Processing times ............................ 264
11.5. Development of physics-based kernels for
unmixing ............................................ 265
11.5.1. Simplification of the albedo to
reflectance transform ....................... 265
11.5.2. Kernel approximation of intimate
mixtures .................................... 265
11.6. Physics-based kernel results ........................ 266
11.7. Summary ............................................. 268
References ................................................ 268
12. Kernel-based quantitative remote sensing inversion ........ 271
Yanfei Wang, Changchun Yang and Xiaowen Li
12.1. Introduction ........................................ 272
12.2. Typical kernel-based remote sensing inverse
problems ............................................ 273
12.2.1. Aerosol inverse problems .................... 274
12.2.2. Land surface parameter retrieval problem .... 275
12.3. Well-posedness and ill-posedness .................... 276
12.4. Regularization ...................................... 278
12.4.1. Imposing a priori constraints on
the solution ................................ 278
12.4.2. Tikhonov variational regularization ......... 278
12.4.3. Direct regularization ....................... 282
12.4.4. Statistical regularization .................. 284
12.5. Optimization techniques ............................. 285
12.5.1. Sparse inversion in l1 space ................ 285
12.5.2. Optimization methods for l2 minimization
model ....................................... 286
12.6. Kernel-based BRDF model inversion ................... 288
12.6.1. Inversion by NTSVD .......................... 288
12.6.2. Tikhonov regularized solution ............... 288
12.6.3. Land surface parameter retrieval results .... 289
12.7. Aerosol particle size distribution function
retrieval ........................................... 293
12.8. Conclusion .......................................... 296
Acknowledgments ........................................... 296
References ................................................ 296
13. Land and sea surface temperature estimation by support
vector regression ......................................... 301
Gabriele Moser and Sebastiano B. Serpico
13.1. Introduction ........................................ 302
13.2. Previous work ....................................... 303
13.2.1. LST and SST estimation from satellite
data ........................................ 303
13.2.2. Parameter optimization and error modelling
for SVR ..................................... 305
13.3. Methodology ......................................... 306
13.3.1. SVR for LST and SST estimation .............. 306
13.3.2. Automatic parameter optimization for SVR .... 307
13.3.3. Pointwise statistical modelling of the
SVR error ................................... 309
13.4. Experimental results ................................ 311
13.4.1. Data sets and experimental set-up ........... 311
13.4.2. Parameter-optimization results .............. 313
13.4.3. Results on the estimation of regression-
error variance .............................. 318
13.5. Conclusions ......................................... 320
Acknowledgments ........................................... 322
References ................................................ 322
V. Kernel-based feature extraction ......................... 327
14. Kernel multivariate analysis in remote sensing
feature extraction ........................................ 329
Jerónimo Arenas-García and Kaare Brandt Petersen
14.1. Introduction ........................................ 329
14.2. Multivariate analysis methods ...................... 332
14.2.1. Principal component analysis (PCA) .......... 333
14.2.2. Partial least squares ....................... 335
14.2.3. Canonical correlation analysis .............. 337
14.2.4. Orthonormalized partial least squares ....... 338
14.3. Kernel multivariate analysis ........................ 339
14.3.1. Kernel PCA .................................. 340
14.3.2. Kernel PLS .................................. 341
14.3.3. Kernel CCA .................................. 342
14.3.4. Kernel OPLS ................................. 343
14.3.5. Some considerations about Kernel MVA
methods ..................................... 344
14.4. Sparse Kernel OPLS .................................. 344
14.5. Experiments: pixel-based hyperspectral image
classification ...................................... 346
14.5.1. Data set description and experimental
setup ....................................... 346
14.5.2. Results description ......................... 347
14.6. Conclusions ......................................... 350
Acknowledgments ........................................... 351
References ................................................ 351
15. KPCA algorithm for hyperspectral target/anomaly
detection ................................................. 353
Yanfeng Gu
15.1. Introduction ........................................ 353
15.2. Motivation .......................................... 354
15.2.1. Feature extraction of hyperspectral
images ...................................... 354
15.2.2. Introducing KM for hyperspectral image
processing .................................. 355
15.2.3. Hyperspectral images for numerical
experiments ................................. 356
15.3. Kernel-based feature extraction in hyperspectral
images .............................................. 357
15.3.1. Principal component analysis ................ 357
15.3.2. Kernel mapping .............................. 358
15.3.3. Kernel Principal Component Analysis
(KPCA) ...................................... 358
15.4. Kernel-based target detection in hyperspectral
images .............................................. 360
15.4.1. The concept of target detection ............. 361
15.4.2. Invariant subpixel material detector ........ 361
15.4.3. Kernel invariant subpixel detection ......... 362
15.5. Kernel-based anomaly detection in hyperspectral
images .............................................. 364
15.5.1. The concept of anomaly detection ............ 364
15.5.2. RX detector ................................. 366
15.5.3. Selective KPCA feature extraction for
anomaly detection ........................... 367
15.6. Conclusions ......................................... 372
Acknowledgments ........................................... 372
References ................................................ 372
16. Remote sensing data classification with kernel
nonparametric feature extractions ......................... 375
Bor-Chen Kuo, Jinn-Min Yang and Cheng-Hsuan Li
16.1. Introduction ........................................ 376
16.2. Related feature extractions ......................... 377
16.2.1. Linear discriminant analysis ................ 377
16.2.2. Generalized discriminant analysis ........... 378
16.2.3. Nonparametric weighted feature extraction ... 380
16.2.4. Fuzzy linear feature extraction ............. 382
16.3. Kernel-based NWFE and FLFE .......................... 383
16.3.1. Kernel-based NWFE ........................... 383
16.3.2. Kernel-based FLFE ........................... 386
16.4. Eigenvalue resolution with regularization ........... 388
16.5. Experiments ......................................... 389
16.5.1. Data sets ................................... 389
16.5.2. Experiment design ........................... 392
16.5.3. Experiment results .......................... 392
16.6. Comments and conclusions ............................ 398
References ................................................ 398
Index ......................................................... 401