
Methods of Microarray Data Analysis II

Editors:	Simon M. Lin, Kimberly F. Johnson (Eds.)
Title:	Methods of Microarray Data Analysis II
Publisher:	Kluwer Academic Publishers
ISBN:	9780306475986
Edition:	1
Price:	CHF 96.80

Category:	Natural sciences
Language:	English

Pages:	214
Copy protection:	DRM
Devices:	PC/MAC/eReader/Tablet
Format:	PDF

Microarray technology is a major experimental tool for functional genomic explorations, and will continue to be a major tool throughout this decade and beyond. The recent explosion of this technology threatens to overwhelm the scientific community with massive quantities of data. Because microarray data analysis is an emerging field, very few analytical models currently exist. Methods of Microarray Data Analysis II is the second book in this pioneering series dedicated to this exciting new field. In a single reference, readers can learn about the most up-to-date methods, ranging from data normalization, feature selection, and discriminative analysis to machine learning techniques.

Currently, there are no standard procedures for the design and analysis of microarray experiments. Methods of Microarray Data Analysis II focuses on a single data set, using a different method of analysis in each chapter. Real examples expose the strengths and weaknesses of each method for a given situation, aimed at helping readers choose appropriate protocols and utilize them for their own data set. In addition, web links are provided to the programs and tools discussed in several chapters. This book is an excellent reference not only for academic and industrial researchers, but also for core bioinformatics/genomics courses in undergraduate and graduate programs.

Written for: Academic and industrial researchers

5 EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES (p. 81-82)

Charless Fowlkes (1), Qun Shan (2), Serge Belongie (3), and Jitendra Malik (1)
Departments of Computer Science (1) and Molecular Cell Biology (2), University of California at Berkeley; Department of Computer Science and Engineering (3), University of California at San Diego

Abstract: We have developed a program, GENECUT, for analyzing datasets from gene expression profiling. GENECUT is based on a pairwise clustering method known as Normalized Cut (Shi and Malik, 1997). GENECUT extracts global structures by progressively partitioning datasets into well-balanced groups, performing an intuitive k-way partitioning at each stage in contrast to commonly used 2-way partitioning schemes. By making use of the Nyström approximation, it is possible to perform clustering on very large genomic datasets.

Key words: gene expression profiles, clustering analysis, spectral partitioning
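
The Normalized Cut criterion named in the abstract can be illustrated with a small sketch. It uses the standard two-group definition from Shi and Malik, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), and is not the GENECUT implementation: the function names `ncut_value` and `best_bipartition` are hypothetical, and the exhaustive search is only feasible for toy inputs, whereas a practical method would rely on a spectral relaxation.

```python
from itertools import combinations


def ncut_value(W, group_a):
    """Normalized cut of the bipartition (group_a, rest).

    W is a symmetric matrix (list of lists) of pairwise similarities.
    """
    n = len(W)
    a = set(group_a)
    b = set(range(n)) - a
    if not a or not b:
        raise ValueError("both groups must be non-empty")
    # Total similarity crossing the partition boundary.
    cut = sum(W[i][j] for i in a for j in b)
    # Total similarity from each group to every node in the dataset.
    assoc_a = sum(W[i][j] for i in a for j in range(n))
    assoc_b = sum(W[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(W):
    """Exhaustively search all 2-way partitions (toy sizes only)."""
    n = len(W)
    best = None
    for size in range(1, n // 2 + 1):
        for group in combinations(range(n), size):
            score = ncut_value(W, group)
            if best is None or score < best[0]:
                best = (score, set(group))
    return best


if __name__ == "__main__":
    # Two tightly linked pairs of genes, weakly linked to each other.
    W = [
        [0.0, 1.0, 0.1, 0.1],
        [1.0, 0.0, 0.1, 0.1],
        [0.1, 0.1, 0.0, 1.0],
        [0.1, 0.1, 1.0, 0.0],
    ]
    print(best_bipartition(W))
```

Because the cut is normalized by each group's total association, splitting off a single gene scores worse than the balanced split, which is the "well-balanced groups" property the abstract refers to.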

1. INTRODUCTION

DNA microarray technology empowers biologists to analyze thousands of mRNA transcripts in parallel, providing insights about the cellular states of tumor cells, the effect of mutations and knockouts, progression of the cell cycle, and reaction to environmental stresses or drug treatments. Gene expression profiles also provide the necessary raw data to interrogate cellular transcription regulation networks. Efforts have been made to identify cis-acting elements based on the assumption that co-regulated genes have a higher probability of sharing transcription factor binding sites. There is a well-recognized need for tools that allow biologists to explore public domain microarray datasets and integrate the insights gained into their own research. One important approach for structuring the exploration of gene expression data is to find coherent clusters of both genes and experimental conditions. The association of unknown genes with functionally well-characterized genes will guide the formation of hypotheses and suggest experiments to uncover the function of these unknown genes. Similarly, experimental conditions that cluster together may affect the same regulatory pathway.

Unsupervised clustering is a classical data analysis problem that remains an active area of research in the computer science and statistics communities (Ripley, 1996). Broadly speaking, the goal of clustering is to partition a set of feature vectors into k groups such that the partition is "good" according to some cost function. In the case of genes, the feature vector is usually the degree of induction or suppression over some set of experimental conditions. As yet, there is no clear consensus as to which algorithms are most suitable for gene expression data.
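
To make the notion of a cost function concrete, here is a minimal sketch that scores a partition by its within-cluster sum of squared distances to each cluster mean. This particular cost is an assumption chosen for illustration (the text only says "some cost function"), and the name `within_cluster_cost` and the example profiles are hypothetical.

```python
def within_cluster_cost(vectors, labels):
    """Sum of squared distances from each vector to its cluster mean.

    vectors: list of equal-length lists (e.g. expression ratios per condition)
    labels:  cluster label for each vector; lower cost means tighter clusters
    """
    groups = {}
    for vec, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vec)

    cost = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Mean profile of the cluster, one value per experimental condition.
        centroid = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        cost += sum((v[d] - centroid[d]) ** 2 for v in members for d in range(dim))
    return cost


if __name__ == "__main__":
    # Hypothetical profiles: two induced genes and two suppressed genes.
    profiles = [[2.0, 2.1], [1.9, 2.0], [-1.0, -1.2], [-1.1, -0.9]]
    print(within_cluster_cost(profiles, [0, 0, 1, 1]))  # coherent grouping
    print(within_cluster_cost(profiles, [0, 1, 0, 1]))  # mixed grouping
```

Under this cost, the grouping that keeps induced and suppressed genes apart scores lower than the mixed one, so a clustering algorithm minimizing it would prefer the former.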

Clustering methods generally fall into one of two categories: central or pairwise (Buhmann, 1995). Central clustering is based on the idea of prototypes, wherein one finds a small number of prototypical feature vectors to serve as "cluster centers". Feature vectors are then assigned to the most similar cluster center. Pairwise methods are based directly on the distances between all pairs of feature vectors in the data set. Pairwise methods don’t require one to solve for prototypes, which provides certain advantages over central methods. For example, when the clusters are not simple, compact clouds in feature space, central methods are ill-suited, while pairwise methods perform well because similarity is allowed to propagate in a transitive fashion from neighbor to neighbor. A family of genes related by a series of small mutations might well exhibit this sort of structure, particularly when features are based on sequence data. Clustering algorithms can also often be characterized as greedy or global in nature. The agglomerative clustering method used by Eisen et al. (1998) to order microarray data is an example of a greedy pairwise method: it starts with a full matrix of pairwise distances, locates the smallest value, merges the corresponding pair, and repeats until the whole dataset has been merged into a single cluster. Because this type of process considers only the closest pair of data points at each step, global structure present in the data may not be handled properly.
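
The greedy agglomerative procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the method of Eisen et al.: the text does not say how distances to a merged cluster are updated, so average linkage is assumed here, and the function name `agglomerate` is hypothetical.

```python
def agglomerate(dist):
    """Greedy pairwise clustering on a symmetric distance matrix.

    Returns the merges in order as (members_a, members_b, distance).
    Average linkage between clusters is an assumption of this sketch.
    """
    clusters = {i: (i,) for i in range(len(dist))}  # cluster id -> member indices
    merges = []
    next_id = len(dist)

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        ids = sorted(clusters)
        # Greedy step: only the currently closest pair is considered.
        d, a, b = min(
            (linkage(a, b), a, b)
            for k, a in enumerate(ids)
            for b in ids[k + 1:]
        )
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


if __name__ == "__main__":
    # Hypothetical one-dimensional "profiles" for four genes.
    positions = [0.0, 1.0, 5.0, 6.0]
    dist = [[abs(p - q) for q in positions] for p in positions]
    for members_a, members_b, d in agglomerate(dist):
        print(members_a, members_b, d)
```

Each iteration commits to the single closest pair and never revisits that decision, which is why early merges driven by noise can distort the larger groupings found later.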

	Contents
	Contributors
	Acknowledgements
	Preface
	INTRODUCTION
	CAMDA 2001 Data Sets
	Feature Selection and Extraction
	Clustering Strategies
	Modeling Complex Systems
	Ontologies, Semantic Understanding, and Functional Genomics
	A standard protocol?
	Web Companion
	1 AN INTRODUCTION TO DNA MICROARRAYS
	1. INTRODUCTION TO FUNCTIONAL GENOMICS
	2. MICROARRAY TECHNOLOGY
	3. MICROARRAY DATA
	4. MICROARRAY EXPERIMENT GOALS
	5. MICROARRAY EXPERIMENTAL DESIGN
	6. MICROARRAY DATA ANALYSIS
	7. RESULT VALIDATION
	7.1 Sample and Data Triage
	7.2 Statistical Validation
	7.3 Biological Validation
	8. CONCLUSION
	REFERENCES
	2 EXPERIMENTAL DESIGN FOR GENE MICROARRAY EXPERIMENTS AND DIFFERENTIAL EXPRESSION ANALYSIS
	1. INTRODUCTION
	2. DESIGN OF MICROARRAY EXPERIMENTS
	2.1 Biological variation
	2.2 Technological variations
	2.3 Microarray quality checklist
	3. EXPERIMENTAL DESIGNS THAT INCORPORATE BIOLOGICAL AND TECHNOLOGICAL VARIATION
	3.1 Block designs
	3.2 Randomization
	3.3 Loop designs
	3.4 Split plot designs
	3.5 Optimal designs
	4. DESIGN OF MICROARRAYS
	5. NORMALIZATION MODELS
	5.1 Data transformation and background removal
	5.2 Linear vs. non-linear effects
	5.3 Random vs. fixed effects
	5.4 Ordinary least squares vs. orthogonal regression
	5.5 Means vs. medians
	5.6 Self-consistency
	5.7 Flagging outliers
	6. DIFFERENTIAL EXPRESSION
	6.1 Error models
	6.2 Bayesian approach
	6.3 Adjustment for multiple comparisons and power considerations
	7. FINAL REMARKS
	ACKNOWLEDGEMENTS
	REFERENCES
	3 MICROARRAY DATA PROCESSING AND ANALYSIS
	1. INTRODUCTION
	2. DESIGN OF THE ARRAY
	3. DATA ACQUISITION AND IMAGE ANALYSIS
	4. NORMALISATION AND FILTERING
	5. DATA STORAGE
	6. ADDRESSING BIOLOGICAL QUESTIONS
	7. DATA ANALYSIS
	7.1 Two conditions comparison
	7.2 Multiple conditions comparison
	7.3 Gene networks
	8. CONCLUSIONS AND FUTURE PROSPECTS
	REFERENCES
	4 BIOLOGY-DRIVEN CLUSTERING OF MICROARRAY DATA
	1. INTRODUCTION
	2. THE ANNOTATION PROBLEM
	2.1 Reannotating the Spots
	2.2 Finding Functional Categories
	3. PRELIMINARY ANALYSIS
	3.1 Data Preprocessing
	3.2 Updating Cell Line Classifications
	3.3 Choosing a Distance Metric
	4. CHROMOSOMAL CLUSTERING
	5. FUNCTIONAL CLUSTERING
	6. CONCLUSIONS
	ACKNOWLEDGEMENTS
	REFERENCES
	5 EXTRACTING GLOBAL STRUCTURE FROM GENE EXPRESSION PROFILES
	1. INTRODUCTION
	2. CLUSTERING WITH NORMALIZED CUT
	2.1 The NCut Criterion
	2.2 K-Way Partitioning
	2.3 Clustering Large Datasets
	3. RESULTS
	4. CONCLUSIONS
	REFERENCES
	6 SUPERVISED NEURAL NETWORKS FOR CLUSTERING CONDITIONS IN DNA ARRAY DATA AFTER REDUCING NOISE BY CLUSTERING GENE EXPRESSION PROFILES
	1. INTRODUCTION
	2. COMPARATIVE PERFORMANCES OF CLUSTERING METHODS
	2.1 Data set used
	2.2 Comparative runtimes
	2.3 Comparative accuracy
	2.4 Conclusions on comparative performances
	3. CLUSTERING OF CONDITIONS
	3.1 The problem of noisy patterns
	3.2 Clustering of conditions and noise reduction
	4. CONCLUSIONS
	ACKNOWLEDGEMENTS
	REFERENCES
	7 BAYESIAN DECOMPOSITION ANALYSIS OF GENE EXPRESSION IN YEAST DELETION MUTANTS
	1. INTRODUCTION
	1.1 The Development of Cancer
	1.2 Microarray Measurements and Analysis