: Marina Axelson-Fisk
: Comparative Gene Finding: Models, Algorithms and Implementation
: Springer-Verlag
: 9781849961042
: 1
: CHF 116.30
: Biology
: English
: 304
: Watermark/DRM
: PC/MAC/eReader/Tablet
: PDF
Comparative genomics is a new and emerging field, and with the explosion of available biological sequences the requests for faster, more efficient and more robust algorithms to analyze all this data are immense. This book is meant to serve as a self-contained instruction of the state-of-the-art of computational gene finding in general and of comparative approaches in particular. It is meant as an overview of the various methods that have been applied in the field, and a quick introduction into how computational gene finders are built in general. A beginner to the field could use this book as a guide through to the main points to think about when constructing a gene finder, and the main algorithms that are in use. On the other hand, the more experienced gene finder should be able to use this book as a reference to different methods and to the main components incorporated in these methods. I have focused on the main uses of the covered methods and avoided much of the technical details and general extensions of the models. In exchange I have tried to supply references to more detailed accounts of the different research areas touched upon. The book, however, makes no claim on being comprehensive.
Preface
Acknowledgments
Contents
Acronyms
Introduction
Some Basic Genetics
The Central Dogma
The Structure of a Gene
How Many Genes Do We Have?
Problems of Gene Definitions
The Gene Finding Problem
Comparative Gene Finding
History of Algorithm Development
To Build a Gene Finder
References
Single Species Gene Finding
Hidden Markov Models (HMMs)
Markov Chains
Discrete-Time Markov Chains
Stationarity and Reversibility
Continuous-Time Markov Chains
Hidden Markov Models
Dynamic Programming
Silent Begin and End States
The Forward Algorithm
The Backward Algorithm
The Viterbi Algorithm
EasyGene: A Prokaryotic Gene Finder
Posterior Decoding
Statistical Significance of Predictions
Generalized Hidden Markov Models (GHMMs)
Preliminaries
The Forward and Backward Algorithms
The Forward Variables
The Backward Variables
The Viterbi Algorithm
Genscan: A GHMM-Based Gene Finder
Sequence Generation Algorithm
Reducing Computational Complexity
Exon Probabilities
Interpolated Markov Models (IMMs)
Preliminaries
Linear and Rational Interpolation
GLIMMER: A Microbial Gene Finder
Gene Prediction
Training the IMM
GlimmerM
Neural Networks
Biological Neurons
Artificial Neurons and the Perceptron
Multi-Layer Neural Networks
GRAIL: A Neural Network-Based Gene Finder
Decision Trees
Classification
Decision Tree Learning
MORGAN: A Decision Tree-Based Gene Finder
References
Sequence Alignment
Pairwise Sequence Alignment
Dot Plot Matrix
Nucleotide Substitution Models
The Jukes-Cantor Model
The Kimura Model
The Felsenstein Model
The Tamura and Nei Model
General Time-Reversible (GTR) Model
Amino Acid Substitution Models
The PAM Matrix
The BLOSUM Matrix
The GONNET Matrix
Gap Models
The Needleman-Wunsch Algorithm
Needleman-Wunsch Using Affine Gaps
The Smith-Waterman Algorithm
Pair Hidden Markov Models (PHMMs)
Preliminaries
The Forward, Backward, and Viterbi Algorithms
Database Similarity Searches
FASTA
BLAST
Gapped BLAST
PSI-BLAST
The Significance of Alignment Scores
Multiple Sequence Alignment
Scoring Schemes
Sum-of-Pairs (SP)
Weighted Sum-of-Pairs (WSP)
Minimum Entropy
Gap Costs
Phylogenetic Trees
The Neighbor-Joining Method
Fitch-Margoliash
Dynamic Programming
The MSA Package
Progressive Alignments
Iterative Methods
Hidden Markov Models
SAM-Sequence Alignment and Modeling
Genetic Algorithms
Simulated Annealing
Alignment Profiles
Standard Profiles
Profile HMMs
Scoring a New Sequence
References
Comparative Gene Finding
Similarity-Based Gene Finding
GenomeScan: GHMM-Based Gene Finding Using Homology
Twinscan: GHMM-Based Gene Finding Using Informant Sequences
Heuristic Cross-Species Gene Finding
ROSETTA
Pair Hidden Markov Models (PHMMs)
DoubleScan: A PHMM-Based Comparative Gene Finder
The State Space
The Stepping Stone Algorithm
Generalized Pair Hidden Markov Models (GPHMMs)
Preliminaries
The Forward, Backward and Viterbi Algorithms
SLAM: A GPHMM-Based Comparative Gene Finder
The State Space
Reducing Computational Complexity
Gene Mapping
Projector: A Gene Mapping Tool
GeneMapper-Reference Based Annotation
Multiple Sequence Gene Finding
N-SCAN: A Multiple Informant-Based Gene Finder
References
Gene Structure Submodels
The State Space
The Exon States
Splice Sites
Introns and Intergenic Regions
Untranslated Regions (UTRs)
Promoters and PolyA-signals
State Length Distributions
Geometric and Negative Binomial Lengths
Empirical Length Distributions
Acyclic Discrete Phase Type Distributions
Sequence Content Sensors
GC-Content Binning
Start Codon Recognition
Codon and Amino Acid Usage
K-Tuple Freq