Unusual ?? |
AUG-site at 5001 bp |
CAP-site at 5001 bp |
CAP-site at 5001 bp |
Workbook |
| ABCB1.gbAUG.old |
ABCB1.htm |
||||
| ACE.gbAUG.old |
ACE.htm |
||||
| ACHE.gbAUG.old |
ACHE.htm |
||||
| ADD1.gbAUG.old |
ADD1.htm |
||||
| ADRA1B.gbAUG.old |
ADRA1B.htm |
||||
| ADRA1D.gbAUG.old |
ADRA1D.htm |
||||
| ADRA2A.gbAUG.old |
ADRA2A.htm |
||||
| ADRA2B.gbAUG.old |
ADRA2B.htm |
||||
| ADRA2C.gbAUG.old |
ADRA2C.htm |
||||
| ADRB1.gbAUG.old |
ADRB1.htm |
||||
| ADRB2.gbAUG.old |
|||||
| ADRB3.gbAUG.old |
ADRB3.gbCAP.old | ADRB3.faNeg.old |
ADRB3.htm |
||
| AGT.gbAUG.old |
AGT.htm |
||||
| AGTR1.gbAUG.old |
AGTR1.htm |
||||
| ANGPT1 |
ANGPT1.gbAUG.old |
ANGPT1.htm |
|||
| BCHE.gbAUG.old |
BCHE.htm |
||||
| CACNA1S.gbAUG.old |
CACNA1S.htm |
||||
| CBS.gbAUG.old |
CBS.htm |
||||
| CHAT.gbAUG.old |
CHAT.htm |
||||
| CHGA.gbAUG.old |
|||||
| CHGB.gbAUG.old |
|||||
| CHRM2.gbAUG.old |
CHRM2.htm |
||||
| CHRM3.gbAUG.old |
CHRM3.gbCAP.old |
CHRM3.fasta.old |
CHRM3.htm |
||
| CHRNA3.gbAUG.old |
CHRNA3.htm |
||||
| CHRNA5.gbAUG.old |
CHRNA5.htm |
||||
| CHRNA7.gbAUG.old |
CHRNA7.htm |
||||
| CHRNB4.gbAUG.old |
CHRNB4.htm |
||||
| COMT.gbAUG.old |
COMT.htm |
||||
| CTSL.gbAUG.old |
CTSL.htm |
||||
| CYB561.gbAUG.old |
CYB561.htm |
||||
| CYBA.gbAUG.old |
CYBA.htm |
||||
| CYP11B2.gbAUG.old |
CYP11B2.htm |
||||
| CYP3A4.gbAUG.old |
CYP3A4.htm |
||||
| DBH.gbAUG.old |
DBH.htm |
||||
| DRD1.gbAUG.old |
DRD1.htm |
||||
| DRD1IP |
DRD1IP.gbAUG.old |
DRD1IP.gbCAP.old |
DRD1IP.faNeg.old |
DRD1IP.htm |
|
| FMO2.gbAUG.old |
FMO2.htm |
||||
| FMO3 |
FMO3.gbAUG.old |
FMO3.htm |
|||
| GNAS.gbAUG.old |
GNAS.htm |
||||
| GNB3.gbAUG.old |
GNB3.htm |
||||
| GPRK2L |
GRK4.gbAUG.old |
GRK4.htm |
|||
| GSTT1.gbAUG.old |
GSTT1.htm |
||||
| HSD11B1.gbAUG.old |
HSD11B1.htm |
||||
| HSD11B2.gbAUG.old |
HSD11B2.htm |
||||
| ITGAL.gbAUG.old |
ITGAL.htm |
||||
| KCNA5.gbAUG.old |
KCNA5.htm |
||||
| KCNB1.gbAUG.old |
KCNB1.htm |
||||
| KCNMB1.gbAUG.old |
KCNMB1.htm |
||||
| KLK1.gbAUG.old |
KLK1.htm |
||||
| MAOA.gbAUG.old |
MAOA.htm |
||||
| MAOB.gbAUG.old |
MAOB.htm |
||||
| MTHFR.gbAUG.old |
MTHFR.htm |
||||
| MTR.gbAUG.old |
MTR.htm |
||||
| NET1.gbAUG.old |
NET1.htm |
||||
| NOS3.gbAUG.old |
NOS3.htm |
||||
| NPY.gbAUG.old |
NPY.htm |
||||
| NPY1R.gbAUG.old |
NPY1R.htm |
||||
| NPY2R.gbAUG.old |
NPY2R.htm |
||||
| NR3C2.gbAUG.old |
NR3C2.htm |
||||
| PEMT |
PEMT.gbAUG.old |
PEMT.gbCAP.old |
PEMT.faNeg.old |
PEMT.htm |
|
| PNMT.gbAUG.old |
PNMT.htm |
||||
| PYY.gbAUG.old |
PYY.htm |
||||
| REN.gbAUG.old |
REN.htm |
||||
| RGS1.gbAUG.old |
RGS1.htm |
||||
| SCG2.gbAUG.old |
SCG2.htm |
||||
| SLC18A1.gbAUG.old |
SLC18A1.htm |
||||
| SLC18A2.gbAUG.old |
SLC18A2.htm |
||||
| SLC9A3.gbAUG.old |
SLC9A3.htm |
||||
| SLC9A3R1.gbAUG.old |
SLC9A3R1.htm |
||||
| RGSPX1 |
SNX13.gbAUG.old |
SNX13.htm |
|||
| TH.gbAUG.old |
TH.htm |
||||
| XDH.gbAUG.old |
XDH.htm |
Files for each Locus of interest include: 1) six Locus-specific Sequence Files, and 2) a Locus-specific Excel spreadsheet.
1. Locus-specific Sequence files
01.06.2005:
Locus-specific sequence files now include six text files.
Unless there are "unusual" Locus properties, these include two
copies, an old (*.old files) and a new (*.doc files), of each of three
files:
1) LOCUS.gbAUG.doc: This text file is a GenBank-annotated
sequence file for the locus DNA sequence plus varying amounts of sequence
5' (upstream) and 3' (downstream) of the locus. In the usual case, the
AUG protein translation start site is at position 5001. The annotation
is typical GenBank description information, including dbSNP SNP information
and all Exon-Intron junction position information.
If the locus is transcribed off the complementary DNA strand to that of
the NP sequence, the LOCUS.gbAUG.doc file shows annotation and sequence
of the complementary (or NEGative) strand and the corresponding FASTA-formatted
file is called LOCUS.faNeg.doc.
2) LOCUS.gbCAP.doc: This
text file is a GenBank-annotated sequence file for the locus DNA sequence
plus varying amounts of sequence 5' (upstream) and 3' (downstream) of
the locus. In the usual case, the CAP mRNA transcription start site is
at position 5001. The sequence and annotation in this file are thus identical
to those found in the LOCUS.gbAUG.doc file except for a translation of
coordinate system. The annotation is typical GenBank description information,
including dbSNP SNP information and all Exon-Intron junction position
information.
If the locus is transcribed off the complementary DNA strand to that of
the NP sequence, the LOCUS.gbCAP.doc file shows annotation and sequence
of the complementary (or NEGative) strand and the corresponding FASTA-formatted
file is called LOCUS.faNeg.doc.
3) LOCUS.fasta.doc: This
text file is a FASTA-formatted sequence file of the same sequence as found
in the LOCUS.gbCAP.doc file, the GenBank-annotated sequence file with
position 5001 located at the CAP transcription start site.
If the locus is transcribed off the complementary DNA strand to that of
the NP sequence, the LOCUS.fasta.doc file shows the sequence of the complementary
(or NEGative) strand and the file is called LOCUS.faNeg.doc.
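For loci on the NEGative strand, the relationship between the two strands can be sketched in Python. This is a minimal illustration, assuming that "complementary strand" here means the reverse complement read 5' to 3' (the usual convention, but not stated explicitly in this document); the function name is hypothetical and is not part of the PPG files.

```python
# Assumption: the faNeg sequence is the reverse complement of the forward strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATGC"))  # GCAT
```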
These sequence files provide a coordinate system that completely covers the gene, with an additional ~5000 bp of sequence 5' of the gene and an additional ~2000 bp 3' of the gene. The annotation in the LOCUS.gbAUG.old and LOCUS.gbCAP.old files includes all dbSNP SNPs with alleles and position. Thus, one can easily use these files to place any given SNP relative to other structural features of the Locus and relative to dbSNP SNPs. The LOCUS.fasta.old file provides a FASTA-formatted version of the same sequence, for convenient use in BLAST2SEQ and other analysis programs.
Old (*.old) files vs New (*.doc) Sequence Files: Two versions of each of the three types of Sequence Files are available. The first is the current file (current at the time of the latest update) and is the *.doc file. The second is the immediately previous file (the file that was replaced at the time of the latest update) and is the *.old file.
NCBI continues to update the Human Genome coordinate system via new Genome Assemblies; this update occurs roughly every three months! The result is a new version of the current RefSeq NT genomic DNA sequence, or, on occasion, generation of a new NT sequence for a given Locus. Most of these updates result in changes in the NT coordinates for the Locus. However, relatively few result in changes in SNP or Intron/Exon positional coordinates RELATIVE TO a start position within the Locus, e.g. position 5001 at the CAP site. This is because most of the Exon sequences are now complete and no longer changing, and relatively few of the Introns (mainly only large ones) are still changing in sequence with new assemblies.
However, when such changes DO occur relative to the CAP site or AUG site, the SNP ID Name may change, since the SNP ID Name contains position information.
Thus the purpose of having BOTH old *.old files and new *.doc files is to permit the User to compare whether changes have taken place for their Locus of interest, and then to take appropriate action.
The History information (click on the "Last Update" date link) compares the lengths of the LOCUS.gbCAP.old and LOCUS.gbCAP.doc sequences. If the lengths are the same, then there have been no changes. If they are different, use BLAST2SEQ to compare the two FASTA-formatted sequences (LOCUS.fasta.old and LOCUS.fasta.doc), and then compare the result to the GenBank annotation to find where within the gene the sequence changes took place.
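The length check described above can also be run locally on downloaded files. The following Python sketch is only an illustration under stated assumptions: each file holds a single FASTA record, and the function names are hypothetical. It additionally reports the first position at which the two sequences differ, which would also catch a substitution that leaves the length unchanged; it does not replace a BLAST2SEQ alignment for locating insertions or deletions precisely.

```python
def read_fasta_sequence(text):
    """Return the sequence from single-record FASTA text (header lines dropped)."""
    lines = [line.strip() for line in text.splitlines()]
    return "".join(l for l in lines if l and not l.startswith(">")).upper()

def compare_versions(old_text, new_text):
    """Compare the *.old and *.doc versions of a FASTA sequence."""
    old_seq = read_fasta_sequence(old_text)
    new_seq = read_fasta_sequence(new_text)
    first_diff = None
    # Walk both sequences in parallel; positions are 1-based, as in the files.
    for pos, (a, b) in enumerate(zip(old_seq, new_seq), start=1):
        if a != b:
            first_diff = pos
            break
    # One sequence may simply be a longer extension of the other.
    if first_diff is None and len(old_seq) != len(new_seq):
        first_diff = min(len(old_seq), len(new_seq)) + 1
    return {
        "old_length": len(old_seq),
        "new_length": len(new_seq),
        "same_length": len(old_seq) == len(new_seq),
        "first_difference": first_diff,
    }
```

With real files, the two arguments would be the contents of, for example, LOCUS.fasta.old and LOCUS.fasta.doc read with open(path).read().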
NOTE: if you find that any SNP coordinates in the current SNP ID Names Excel file are out of date, please inform Doug Smith!
"Unusual" Properties of a Locus: As indicated above, for "usual" loci, the position of the AUG protein translation start site is set to be 5001 in the LOCUS.gbAUG.doc file, and the position of the CAP mRNA transcription start site is set to be 5001 in the LOCUS.gbCAP.doc and LOCUS.fasta.doc files. This however does not work well for some loci. Here are two examples:
ACE has 3 isoforms, with CAP sites for the mRNA species at 5001, 12744, and 12744, and AUG sites for the coding sequence at 5023, 12796, and 12796. In this case, the ACE.gbCAP.doc sequence contains 5000 bp upstream of the CAP site for mRNA 1 and 2000 bp downstream of the polyA site for mRNA 2, the longest of the three mRNA species.
ABCB1 has only one isoform, with the CAP site for the mRNA species at 5001 in the ABCB1.gbCAP.doc file. However, the AUG site for the coding sequence in this file is at position 118065! Thus, if one wishes to include roughly the same nucleotides in both the ABCB1.gbCAP.doc and ABCB1.gbAUG.doc files, the AUG site position in the ABCB1.gbAUG.doc file cannot be at position 5001; it must be at least at position 118000 or so.
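The ABCB1 case can be checked with a little arithmetic, since moving between the gbCAP and gbAUG coordinate systems is a constant shift. The Python sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes both files are meant to cover the same nucleotides.

```python
def cap_to_aug_position(pos_in_cap_file, aug_pos_in_cap_file, aug_pos_in_aug_file=5001):
    """Translate a gbCAP file position into gbAUG file coordinates.

    Returns None when the position would fall before position 1 of the
    gbAUG file, i.e. when it cannot be represented there.
    """
    shift = aug_pos_in_aug_file - aug_pos_in_cap_file
    new_pos = pos_in_cap_file + shift
    return new_pos if new_pos >= 1 else None

# ABCB1 figures quoted above: CAP site at 5001, AUG site at 118065 in ABCB1.gbCAP.doc.
print(cap_to_aug_position(5001, 118065))          # None: CAP site would be lost
print(cap_to_aug_position(5001, 118065, 118065))  # 5001: CAP site is retained
```

With the AUG site forced to position 5001, the CAP site and the whole promoter would fall before position 1, which is why the AUG site must sit at roughly 118000 or higher if both files are to include the same nucleotides.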
When a given Locus has such "unusual" properties, the word "yes" appears in the "Last Update" column of the Table above, under the Update Date; otherwise the word "no" appears. If "yes" appears, the nature of the "unusual" Locus properties is summarized with the History information; just click on the locus "Last Update" link to go to this information.
Uses of the Locus-specific Sequence Files:
These Sequence Files are very useful for a variety
of tasks.
Two types of Sequence Files: The Locus-specific Sequence Files are of two types:
The two types of files thus complement each other vis a vis types of use.
Sequence Reference Position is at Position 5001
"usually", for either the CAP site or for the AUG site:
The LOCUS.gbCAP.doc and LOCUS.fasta.doc
files contain the complete gene plus 5000 bp upstream and 2000 bp downstream.
Thus, promoter sequences and sequences past the polyA site are present,
permitting examination and analysis of SNPs and other features in these
regions. Further, there is no problem with negative position coordinates
in the promoter region.
The LOCUS.gbAUG.doc file is identical in annotation to the LOCUS.gbCAP.doc file, and contains similar sequences. However, the positions have been shifted (mathematically "translated") such that the AUG site is at position 5001 for "usual" loci, and at position 10001 or another convenient position for some of the "unusual" loci (click on the "Last Update" links for details for a given Locus). This readily permits identification of SNP position relative to the protein AUG translation-start position rather than to the mRNA CAP transcription-start position: simply subtract 5000 from the file position. If the position is a promoter position and the result is zero or negative, subtract one more (e.g. -100 becomes -101, and 0 becomes -1) to account for the absence of a position zero, i.e. numbering goes directly from +1 to -1.
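The subtraction rule, including the skipped position zero, can be written out as a short Python sketch. The function names are hypothetical; the default reference position of 5001 applies to "usual" loci and would need to be changed (e.g. to 10001) for loci that use another reference position.

```python
def relative_position(file_pos, reference_pos=5001):
    """Convert a file position to a position relative to the reference site.

    The reference site itself is +1, the base just upstream is -1,
    and there is no position zero.
    """
    offset = file_pos - reference_pos
    return offset + 1 if offset >= 0 else offset

def file_position(rel_pos, reference_pos=5001):
    """Inverse of relative_position; rel_pos must not be zero."""
    if rel_pos == 0:
        raise ValueError("there is no position zero in this numbering")
    return reference_pos + rel_pos - 1 if rel_pos > 0 else reference_pos + rel_pos

print(relative_position(5001))  # 1
print(relative_position(5000))  # -1
print(relative_position(4900))  # -101
```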
2. Locus-specific Spreadsheets
The Locus-specific Spreadsheets contain considerable locus-specific information absent in the timeline spreadsheets. Their precise format is still under development as of June, 2003.
01.06.2005 - Note:
Construction of these Locus-specific Spreadsheets has been difficult to
automate. Rather, a combination of the Locus-specific Sequence
Files described above and the
UCSD "Standardized SNP ID Names" Excel
file provides nearly all of this Locus-specific information. The manually-constructed
Locus-specific Spreadsheets for the ADRB2, CHGA, and CHGB loci are still
available here.
a. Recommendations for Use of the Locus-specific Spreadsheets:
Display vs Download:
The *.htm Web Page files are Web page versions of
the Locus-specific spreadsheets. Their display is still rather finicky
... display requires Internet Explorer 5.2 ... and sometimes requires
a few "refresh the screen" ...
Also, IE 5.2 on Mac OS X tends to hang up ...
Use of the *.doc Sequence files and the *.xls Excel
files requires that they be downloaded to disk.
The files then of course are "yours" and can be modified, sorted,
etc as desired.
Such downloading is usually the best way to go ...
Recommendations for Web Display of the *.htm Files: