| Section | Page |
|---|---|
| Preface | 6 |
| Acknowledgments | 7 |
| Contents | 8 |
| Contributing Authors | 13 |
| Symbol Index | 15 |
| OVERVIEW OF BOOK | 17 |
| 1. Introduction | 17 |
| 2. Scope of Book | 18 |
| 3. Book Organization | 19 |
| STOCHASTIC CONTROL THEORY FOR SENSOR MANAGEMENT | 22 |
| 1. Introduction | 22 |
| 2. Markov Decision Problems | 25 |
| 2.1 Dynamic Programming | 27 |
| 2.2 Stationary Problems | 28 |
| 2.3 Algorithms for MDPs | 33 |
| 3. Partially Observed Markov Decision Problems | 34 |
| 3.1 MDP Representation of POMDPs | 36 |
| 3.2 Dynamic Programming for POMDPs | 39 |
| 4. Approximate Dynamic Programming | 41 |
| 5. Example | 42 |
| 6. Conclusion | 47 |
| INFORMATION THEORETIC APPROACHES TO SENSOR MANAGEMENT | 48 |
| 1. Introduction | 48 |
| 2. Background | 50 |
| 2.1 α-Entropy, α-Conditional Entropy, and α-Divergence | 51 |
| 2.2 Relations Between Information Divergence and Risk | 53 |
| 2.3 Fisher Information and Information Divergence | 55 |
| 3. Information-Optimal Policy Search | 55 |
| 4. Information Gain Via Classification Reduction | 58 |
| 5. A Near Universal Proxy | 59 |
| 6. Information Theoretic Sensor Management for Multi-target Tracking | 62 |
| 6.1 The Model Multi-target Tracking Problem | 63 |
| 6.2 Rényi Divergence for Sensor Scheduling | 64 |
| 6.3 Multi-target Tracking Experiment | 65 |
| 6.4 On the Choice of α | 65 |
| 6.5 Sensitivity to Model Mismatch | 66 |
| 6.6 Information Gain vs Entropy Reduction | 67 |
| 7. Terrain Classification in Hyperspectral Satellite Imagery | 68 |
| 7.1 Optimal Waveform Selection | 69 |
| 8. Conclusion and Perspectives | 72 |
| JOINT MULTI-TARGET PARTICLE FILTERING | 73 |
| 1. Introduction | 73 |
| 2. The Joint Multi-target Probability Density | 76 |
| 2.1 General Bayesian Filtering | 78 |
| 2.2 Non-Linear Bayesian Filtering for a Single Target | 79 |
| 2.3 Accounting for Target Birth and Death | 80 |
| 2.4 Computing Rényi Divergence | 81 |
| 2.5 Sensor Modeling | 82 |
| 3. Particle Filter Implementation of JMPD | 85 |
| 3.1 The Single Target Particle Filter | 86 |
| 3.2 The Multi-target Particle Filter | 87 |
| 3.3 Permutation Symmetry and Improved Importance Densities for JMPD | 88 |
| 3.4 Multi-target Particle Proposal Via Individual Target Proposals | 89 |
| 3.5 Multi-target Particle Proposal Via Joint Sampling | 93 |
| 3.6 Partition Ordering | 95 |
| 3.7 Estimation | 97 |
| 3.8 Resampling | 99 |
| 4. Multi-target Tracking Experiments | 99 |
| 4.1 Adaptive Proposal Results | 100 |
| 4.2 Partition Swapping | 103 |
| 4.3 The Value of Not Thresholding | 103 |
| 4.4 Unknown Number of Targets | 104 |
| 5. Conclusions | 105 |
| POMDP APPROXIMATION USING SIMULATION AND HEURISTICS | 108 |
| 1. Introduction | 108 |
| 2. Motivating Example | 110 |
| 3. Basic Principle: Q-value Approximation | 111 |
| 3.1 Optimal Policy | 111 |
| 3.2 Q-values | 112 |
| 3.3 Stationary policies | 113 |
| 3.4 Receding horizon | 113 |
| 3.5 Approximating Q-values | 113 |
| 4. Control Architecture | 114 |
| 4.1 Controller | 115 |
| 4.2 Measurement filter | 115 |
| 4.3 Action selector | 116 |
| 5. Q-value Approximation Methods | 117 |
| 5.1 Basic approach | 117 |
| 5.2 Monte Carlo sampling | 117 |
| 5.3 Relaxation of optimization problem | 118 |
| 5.4 Heuristic approximation | 119 |
| 5.5 Parametric approximation | 120 |
| 5.6 Action-sequence approximations | 122 |
| 5.7 Rollout | 123 |
| 5.8 Parallel rollout | 124 |
| 5.9 Control architecture in the Monte Carlo case | 124 |
| 5.10 Belief-state simplification | 127 |
| 5.11 Reward surrogation | 128 |
| 6. Simulation Result | 129 |
| 7. Summary and Discussion | 131 |
| MULTI-ARMED BANDIT PROBLEMS | 133 |
| 1. Introduction | 133 |
| 2. The Classical Multi-armed Bandit | 134 |
| 2.1 Problem Formulation | 135 |
| 2.2 On Forward Induction | 137 |
| 2.3 Key Features of the Classical MAB Problem and the Nature of its Solution | 139 |
| 2.4 Computational Issues | 142 |
| 3. Variants of the Multi-armed Bandit Problem | 146 |
| 3.1 Superprocesses | 146 |
| 3.2 Arm-acquiring Bandits | 149 |
| 3.3 Switching Penalties | 150 |
| 3.4 Multiple Plays | 152 |
| 3.5 Restless Bandits | 154 |
| 3.6 Discussion | 159 |
| 4. Example | 160 |
| 5. Chapter Summary | 163 |
| APPLICATION OF MULTI-ARMED BANDITS TO SENSOR MANAGEMENT | 164 |
| 1. Motivating Application and Overview | 164 |
| 1.1 Introduction | 164 |
| 1.2 SM Example of Multi-armed Bandit | 165 |
| 1.3 Organization and Notation of This Chapter | 166 |
| 2. Application to Sensor Management | 166 |
| 2.1 Application of the Classical MAB | 167 |
| 2.2 Single Sensor with Multiple Modes | 170 |
| 2.3 Detecting New Targets | 171 |
| 2.4 Sensor Switching Delays | 171 |
| 2.5 Multiple Sensors | 172 |
| 2.6 Application of Restless Bandits | 173 |
| 3. Example Application | 173 |
| 3.1 MAB Formulation of SM Tracking Problem | 174 |
| 3.2 Index Rule Solution of MAB | 175 |
| 3.3 Numerical Results and Comparison to Other Solutions | 177 |
| 4. Summary and Discussion | 184 |
| ACTIVE LEARNING AND SAMPLING | 187 |
| 1. Introduction | 187 |
| 1.1 Some Motivating Examples | 188 |
| 2. A Simple One-dimensional Problem | 189 |
| 3. Beyond 1d - Piecewise Constant Function Estimation |