Spring 2002

BILD94: Seminar in Biotechnology

Suresh Subramani, Instructor

Douglas W. Smith

2130 Bonner Hall; 5254 Muir Biology Building

x42620; dsmith@ucsd.edu

 

 

Introduction to Bioinformatics

 

Web page: http://elcapitan.ucsd.edu/bild94/

Search Google et al for "bild94 bioinformatics"

 




Topics:

I. What is Bioinformatics?
 
II. Sequence Databases and Their Use
A. Primary Sequence Databases
B. Uses of Sequence Databases
C. Retrieve Information from Sequence Databases
D. Analysis using Sequences: finding Homologues
1. BLAST: Basics, Filters, Variations
2. FASTA
E. Analysis using Sequences: finding Genes in DNA
1. Grail
2. BCM Gene Finder - BCM Search Launcher
3. GeneMark
F. Analysis using Sequences: finding Motifs
1. Motifs
2. Protein Family Classifications
G. Multiple Sequence Analysis
1. Clustal W
2. Web Databases of Multiple Sequence Alignments
H. Phylogenetics
1. Basics and Methods
2. Confidence Levels
 
III. Whole Genomes
A. Implications
B. TIGR
 
IV. Organism and Other Databases
A. Need for Organism Databases
B. Paradigm: ACeDB
C. Advantages of ACeDB for Organism DBs
D. ACeDB on Web: WebAce and AceBrowser
1. ACeDB: Disadvantages
2. WebAce and AceBrowser
E. Example of ACeDB: DictyDB
1. Graphics, Text Displays
2. WebAce and AceBrowser
F. Other Web Organism Databases
 
V. Problems ... Directions to Go
A. Problems
B. Need: "smart" Analysis Packages
C. "Training" of "smart" Analysis Packages
 
VI. Additional Materials
A. Books
B. Selected Recent Review Articles

 

 

 

I. What is Bioinformatics?

Bioinformatics:

 

Overall Aim of Bioinformatics:

 

Databases:

 

Software tools (computer programs):

 

Key words from some of the review articles listed at end:

 

The Need for Bioinformatics:

 

 

Bioinformatics is New, Hot, and Growing:

Chronology of Review Articles:

446 hits at PubMed for 'bioinformatics AND review'.

101 for 'bioinformatics AND review AND 2002' (partial year)
257 for 'bioinformatics AND review AND 2001'
144 for 'bioinformatics AND review AND 2000'
67 for 'bioinformatics AND review AND 1999'
52 for 'bioinformatics AND review AND 1998'
18 for 'bioinformatics AND review AND 1997'
13 for 'bioinformatics AND review AND 1996'
6 for 'bioinformatics AND review AND 1995'
7 for 'bioinformatics AND review AND 1994'
1 for 'bioinformatics AND review AND 1993'
2 for 'bioinformatics AND review AND 1992'
0 for 'bioinformatics AND review AND 1991'
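
The growth in the yearly counts above can be made explicit with a few lines of Python. This is only a sketch: the names `review_hits` and `growth_ratios` are illustrative, and the 2002 count is left out on the assumption that it covers only part of the year.

```python
# PubMed hit counts for 'bioinformatics AND review AND <year>',
# copied from the list above (2002 omitted: assumed to be a partial year).
review_hits = {
    1991: 0, 1992: 2, 1993: 1, 1994: 7, 1995: 6, 1996: 13,
    1997: 18, 1998: 52, 1999: 67, 2000: 144, 2001: 257,
}

def growth_ratios(hits):
    """Return {year: hits[year] / hits[year - 1]} where the previous year is present and nonzero."""
    ratios = {}
    for year in sorted(hits):
        prev = hits.get(year - 1)
        if prev:  # skips a missing or zero previous year
            ratios[year] = hits[year] / prev
    return ratios

ratios = growth_ratios(review_hits)
for year, ratio in ratios.items():
    print(f"{year}: {ratio:.2f}x the previous year")
```

With these numbers the count more than doubles from 1997 to 1998 and again from 1999 to 2000.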

 

 

II. Sequence Databases and Their Use:

A. Primary Sequence Databases:

 

B. Uses of Sequence Databases:

 

C. Retrieve Info from Sequence Databases:

 

 

D. Sequence Analysis: finding Homologues

 

 

 

1. BLAST:

Basic BLAST Algorithm:
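
The seed-and-extend idea generally associated with BLAST can be sketched in Python. This is a simplified illustration, not the NCBI implementation: it seeds only from exact word matches (BLASTP also seeds from "neighborhood" words that score above a threshold), it extends without gaps, and every name and parameter here (`find_seeds`, `extend_ungapped`, `word_size`, `match`, `mismatch`, `x_drop`) is hypothetical.

```python
from collections import defaultdict

def find_seeds(query, subject, word_size=4):
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            seeds.append((i, j))
    return seeds

def extend_ungapped(query, subject, qpos, spos,
                    word_size=4, match=1, mismatch=-2, x_drop=3):
    """Extend a seed in both directions without gaps.

    Extension in a direction stops once the running score falls more than
    x_drop below the best score seen so far.
    Returns (best_score, query_start, query_end), with query_end exclusive.
    """
    score = best = word_size * match
    # Extend to the right of the seed.
    qi, si = qpos + word_size, spos + word_size
    q_end = qi
    while qi < len(query) and si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        qi += 1
        si += 1
        if score > best:
            best, q_end = score, qi
        elif best - score > x_drop:
            break
    # Extend to the left, starting again from the best score so far.
    score = best
    qi, si = qpos - 1, spos - 1
    q_start = qpos
    while qi >= 0 and si >= 0:
        score += match if query[qi] == subject[si] else mismatch
        if score > best:
            best, q_start = score, qi
        elif best - score > x_drop:
            break
        qi -= 1
        si -= 1
    return best, q_start, q_end

seeds = find_seeds("GGACGTACC", "TTACGTACAA")
hsp = extend_ungapped("GGACGTACC", "TTACGTACAA", *seeds[0])
```

In this toy case the first seed (the word ACGT) is extended to the six-letter segment ACGTAC with score 6.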

 

BLAST Programs:

 

Filters:

 

Variations on basic BLAST algorithm:

 

Output: example of BLASTP with filter, defaults here

 

 

 

 

2. FASTA:

Rapid, heuristic (not guaranteed to find the optimal alignment) local alignment search

FASTA 4-Step Algorithm:
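
As FASTA is commonly described (Pearson and Lipman), the four steps are: (1) find exact k-tuple (word) matches and pick the diagonals richest in them, (2) rescore the best regions with a substitution matrix, (3) try to join nearby regions, allowing gaps, and (4) run a banded dynamic-programming alignment around the best region. The Python sketch below illustrates step 1 only; the function names and the `ktup` and `top` parameters are illustrative, not FASTA's own code.

```python
from collections import Counter, defaultdict

def diagonal_hits(query, subject, ktup=2):
    """Count exact k-tuple matches per diagonal (diagonal = subject_pos - query_pos)."""
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    counts = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], []):
            counts[j - i] += 1
    return counts

def best_diagonals(query, subject, ktup=2, top=10):
    """Return up to `top` (diagonal, hit_count) pairs, most hits first."""
    return diagonal_hits(query, subject, ktup).most_common(top)

hits = diagonal_hits("HEAGAWGHEE", "PAWHEAE")
best = best_diagonals("HEAGAWGHEE", "PAWHEAE", top=1)
```

Matches that fall on the same diagonal line up without gaps, which is why counting hits per diagonal is a cheap first filter before any scoring matrix is applied.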

Output: example here

 

 

E. Sequence Analysis: finding Genes in DNA

Methods:

 

Grail:



BCM Gene Finder:



GeneMark: more info here.



Other Programs:

 

 

 

 

F. Sequence Analysis: finding Motifs

Motifs:

 

DNA Motifs:

 

Protein Motifs:

 

Basic Methods:

 

 

Protein Family Classifications

Prosite:
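
PROSITE patterns use a compact syntax: elements are separated by "-", "x" stands for any residue, [..] lists allowed residues, {..} lists excluded residues, and (n) or (n,m) gives a repeat count. A minimal Python sketch of translating such a pattern into a regular expression follows. The function names are illustrative, and less common syntax (for example, anchors inside square brackets) is not handled. The example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the test sequence is made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex string."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")   # '<' anchors to the N-terminus
        anchored_end = element.endswith(">")       # '>' anchors to the C-terminus
        element = element.strip("<>")
        # Split off a repeat count such as x(2,4) or A(3).
        repeat = ""
        if element.endswith(")"):
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                              # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"       # excluded residues
        else:
            core = element                          # one residue or a [..] class
        parts.append(("^" if anchored_start else "") + core + repeat
                     + ("$" if anchored_end else ""))
    return "".join(parts)

def find_motif(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]

sites = find_motif("N-{P}-[ST]-{P}", "MNGTEGPNFTQ")
```

The lookahead wrapper is a design choice so that overlapping occurrences are reported rather than skipped.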

 

BLOCKS:

 

 

DOMO:

 

 

Pfam:

 

 

 

Protein Structural Classifications

 

SCOP:

 

 

CATH:

 

 

G. Multiple Sequence Analysis

Basics

 

 

Clustal W:

 

 

Other Web Programs

 

Web Databases of Multiple Sequence Alignments

 

 

 

 

H. Phylogenetics

Basics:

1, 2, 3, 4, 5:   Taxa or External Nodes (or OTUs)
X, Y, Z:         Internal Nodes
R1:              Root
a, b, c, d, e:   External Branches
f, g:            Internal Branches
h:               Internal Branch ONLY IF tree is Rooted;
                       else h is part of e
Outgroup:        Taxon 5 ... used to "root trees"
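
The labels above can be captured in a small data structure. The Python sketch below stores each node's parent and branch label, then shows why h counts as an internal branch only in the rooted tree. The topology (which taxa join at X, Y, and Z) is an assumption made for illustration; only the labels come from the list above. The function names are hypothetical, and `unroot` assumes the outgroup is a single taxon attached directly to the root.

```python
# child -> (parent, branch label). Assumed topology: (((1,2),3),4) with outgroup 5.
rooted = {
    "1": ("X", "a"), "2": ("X", "b"),
    "3": ("Y", "c"), "4": ("Z", "d"),
    "5": ("R1", "e"),
    "X": ("Y", "f"), "Y": ("Z", "g"),
    "Z": ("R1", "h"),
}

def classify_branches(tree):
    """Split branch labels into external (ending at a taxon) and internal."""
    parents = {parent for parent, _ in tree.values()}
    external = sorted(label for child, (_, label) in tree.items()
                      if child not in parents)
    internal = sorted(label for child, (_, label) in tree.items()
                      if child in parents)
    return external, internal

def unroot(tree, root="R1"):
    """Drop the root: its two branches merge into one joining its two children.

    In the result, the 'parent' entries are just neighbors; the node that
    appears only as a parent (here Z) is an arbitrary reference point.
    """
    parents = {parent for parent, _ in tree.values()}
    (c1, l1), (c2, l2) = [(c, l) for c, (p, l) in tree.items() if p == root]
    # Keep the label of the branch that leads to the taxon (e absorbs h).
    leaf, other, label = (c1, c2, l1) if c1 not in parents else (c2, c1, l2)
    unrooted = {c: pl for c, pl in tree.items() if pl[0] != root}
    unrooted[leaf] = (other, label)
    return unrooted

rooted_branches = classify_branches(rooted)
unrooted_branches = classify_branches(unroot(rooted))
```

The rooted tree has eight branches (a to e external; f, g, h internal), while the unrooted tree has seven, because h has become part of e.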

 

Methods:

 

 

 

 

 

Confidence - "How good is the Tree?"

 

 

 

 

III. Whole Genomes

A. Implications

 

 

 

B. TIGR - The Institute for Genomic Research

 

 

 

 

IV. Organism and Other Databases

A. Need for Organism Databases

 

B. Paradigm: ACeDB

 

C. Advantages of ACeDB for Organism DBs

 

D. ACeDB on Web: WebAce and AceBrowser

1. ACeDB - Disadvantages

 

2. However: WebAce and AceBrowser

 

 

 

E. Example of ACeDB: DictyDB

1. MacAce: Graphics, Text Displays

 

2. Web ACeDB: WebAce and AceBrowser at Cornell

 

 

 

 

F. Other Web Organism Databases

 

 

V. Problems ... Directions to Go

A. Problems:

 

B. Need: "smart" Analysis Packages

 

 

C. "Training" of "smart" Analysis Packages:

 

 

 

VI. Additional Materials:

A. Books:

43 hits for "books" and "bioinformatics" at amazon.com in March 2002 ...

 

1. "Bioinformatics: Sequence and Genome Analysis." David W. Mount. Cold Spring Harbor Laboratory Press, 2001.

Recent, authoritative, most emphasis on Sequence Analysis, some on genomics and organismal databases.

 

2. "Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins." Second Edition. Ed., Andreas Baxevanis and B.F. Francis Ouellette. John Wiley, 2000.

Recent text emphasizing how to use Web ... database searches ... software tools available.

 

3. "Bioinformatics Basics: Applications in Biological Science and Medicine." Hooman Rashidi and Lukas Buehler. 1999.

Recent text providing introduction to many topics, by two UCSD Biology personnel.

 

4. "Computational Molecular Biology: An Algorithmic Approach." Pavel A. Pevzner. 1999.

Advanced computational approach to algorithms used in bioinformatics and molecular biology, by UCSD CSE professor.

 

5. "Biological Sequence Analysis." R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Cambridge University Press, 1998.

Recent text emphasizing automata and hidden Markov model (HMM) approaches.

 

6. "Bioinformatics: The Machine Learning Approach." Pierre Baldi and Soren Brunak. MIT Press, 1998.

Another recent text emphasizing automata and HMMs.

 

7. "Sequence Analysis Primer." Ed., Michael Gribskov and John Devereux. Oxford University Press, 1992.

Bit dated, but still good ... GCG program emphasis ... detailed sequence analysis example ... Gribskov now at SDSC.

 

New Bioinformatics books are now coming out regularly
Check at Amazon.com under "Books" and "bioinformatics"
for latest ... (amazon.com lists 5 books about to come out ...)

 

Others:

"Intro to Computational Molecular Biology." Joao Meidanis and Joao Carlos Setubal. PWS Publishing, Boston, 1997.

"Computational Methods in Molecular Biology." Ed., S. Salzberg, D. Searls, and S. Kasif. Elsevier Science, 1999.

"Intro to Computational Biology: Maps, Sequences, and Genomes." Michael S. Waterman. Chapman and Hall, 1997. Math and algorithm intensive.

"Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology." Dan Gusfield. Cambridge Univ Press, 1997. Definitive algorithm text ... very math intensive.

"The Secrets of Life: A Mathematician's Introduction to Molecular Biology." Ed., Eric S. Lander and Michael S. Waterman. Natl Acad Sci Press, 1998. Very readable ...

"DNA Sequencing: From Experimental Methods to Bioinformatics." Luke Alphey. Springer Verlag, 1997.

"Computer Methods for Macromolecular Sequence Analysis." Ed., Russell F. Doolittle. Methods in Enzymology, Vol 266. 1996.

"Computer Analysis of Sequence Data," Parts I and II. Ed., Annette M. Griffin and Hugh G. Griffin. Humana Press, 1994.

"Biocomputing: Bioinformatics and Genome Projects." Ed., Douglas W. Smith. Academic Press, 1993.

"Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences." Ed., Russell F. Doolittle. Methods in Enzymology, Vol 183. 1990.

"Of URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences." Russell F. Doolittle. University Science Books. 1986.

 

 

B. Selected Recent Review Articles on Bioinformatics:

"Molecular Biologist's Guide to Proteomics." Graves, PR, and Haystead, TA. 2002. Microbiol Mol Biol Rev 66: 39-63.

"A Genomic Regulatory Network for Development." Davidson, EH, et al. (26 authors). 2002. Science 295: 1669-1678.

"Systems Biology: A Brief Overview." Kitano, H. 2002. Science 295: 1662-1664.

"Biological data becomes computer literate: new advances in bioinformatics." Goodman, N. 2002. Curr Opin Biotechnol 13: 68-71.

"Insights into Protein Function through Large-Scale Computational Analysis of Sequence and Structure." Weir, M, Swindells, M, and Overington, J. 2001. Trends Biotechnol 19(10 Suppl): S1-S6.

"Exploring the Protein Interactome using Comprehensive Two-Hybrid Projects." Ito, T, Chiba, T, and Yoshida, M. 2001. Trends Biotechnol 19(10 Suppl): S23-S27.

"A Genomic View of Alternative Splicing." Modrek, B, and Lee, C. 2001. Nat Genet 30: 13-19.

"Recent Advances in Computational Genomics." Claverie, JM, Abergel, C, Audic, S, and Ogata, H. 2001. Pharmacogenomics 2: 361-372.

"The Impact of Microbial Genomics on Antimicrobial Drug Development." Tang, CM, and Moxon, ER. 2001. Annu Rev Genomics Hum Genet 2: 259-269.

"Bioinformatics Tools for Whole Genomes." Searls, DB. 2000. Annu Rev Genomics Hum Genet 1: 251-279.

"Of Mice and Genome Sequence." Hamilton, BA, and Frankel, WN. 2001. Cell 107: 13-16.

"A Tour of Structural Genomics." Brenner, SE. 2001. Nat Rev Genet 2: 801-809.

"Gene Expression Data Analysis." Brazma, A, and Vilo, J. 2001. Microbes Infect 3: 823-829.

"Analysing Gene Expression Data from DNA Microarrays to Identify Candidate Genes." Wu, TD. 2001. J Pathol 195: 53-65.

"What is Bioinformatics? A Proposed Definition and Overview of the Field." Luscombe, NM, Greenbaum, D, and Gerstein, M. 2001. Methods Inf Med 40: 346-358.

"Sequencing the entire genomes of free-living organisms: the foundation of pharmacology in the new millennium." Broder, S, and Venter, JC. 2000. Annu Rev Pharmacol Toxicol 40: 97-132.

"Protein function in the post-genomic era." Eisenberg, D, Marcotte, EM, Xenarios, I, and Yeates, TO. 2000. Nature 405: 823-826.

"Who's your neighbor? New computational approaches for functional genomics." Galperin, MY, and Koonin, EV. 2000. Nature Biotech 18: 609-613.

"Structural genomics: beyond the human genome project." Burley, SK, et al. 1999. Nat Genet 23: 151-157.

"Computational methods for the identification of differential and coordinated gene expression." Claverie, JM. 1999. Hum Mol Genet 8: 1821-1832.

"Multiple sequence alignment: algorithms and applications." Gotoh, O. 1999. Adv Biophys 36: 159-206.

"Mapping regulatory networks in microbial cells." VanBogelen, RA, et al. 1999. Trends Microbiol 7: 320-328.

"How will bioinformatics influence metabolic engineering?" Edwards, JS, and Palsson, B. 1998. Biotechnol Bioeng 58: 162-169. Bernhard Palsson group in Bioengineering.

"Computational aspects of expression data." Vingron, M, et al. 1999. J Mol Med 77: 3-7.

"Functional genomics: going forwards from the databases." Rastan, S, and Beeley, LJ. 1997. Curr Opin Genet Dev 7: 777-783.

"Informatics--genome and genetic databases." Ashburner, M, and Goodman, N. 1997. Curr Opin Genet Dev 7: 750-756.

"Bioinformatics: from genome data to biological knowledge." Andrade, MA, and Sander, C. 1997. Curr Opin Biotechnol 8: 675-683.

"Functional Genomics - Bioinformatics is Ready for the Challenge." Smith, TF. 1998. Trends Genet 14: 291-293.

"Bioinformatics in a post-genomics age." Gershon et al. 1997. Nature 389: 417-422.

"Bioinformatics." Boguski, MS. 1994. Curr Opin Genet Dev 4: 383-388.

 

... and many more ...

 

 





If you have problems or comments, send email to Doug Smith (dsmith@ucsd.edu).