UCSD NHLBI PPG Program

Sympathetic Neuroeffector Junctions and Blood Pressure

Human Essential Hypertension

Recent Changes in Hypertension Web Site

Recent Changes

31 Oct 2006: Complete Renewal and Update of SNP ID Names
SNP ID Names have now been assigned to all SNPs known to be of interest to Hypertension personnel as of May, 2006, including those analysed by Sequenom, UCSD resequencing, TCGA, and Weber. Names are provided based on CAP site (transcription start), on AUG site (translation start), or on Mature Peptide start site for nonCDS SNPs, or on Protein or Mature Peptide protein start sites for CDS SNPs. Locus information provided includes four coordinate systems (Locus start site; 5000 bp upstream of locus start site; NT start site, as in NCBI SNP coordinates; and NC start site, with comparison to UCSC chromosomal coordinates), structure coordinates, locus cartoon, and summary of locus function. Complete data are provided for each transcriptional Variant and protein Isoform having data at NCBI. Flanking sequence is provided for all SNPs. For two loci of interest (CHGA and TH), complete data are also provided in spreadsheet "SNPnames-dbSNPall" for all SNPs present in dbSNP at NCBI as of Sept, 2006, including 5000 bp into the promoter (upstream) and 2000 bp 3' (downstream) of the locus. TH with its 3 transcriptional variants is used as an example for comparison of SNP ID names between the 3 variants. More information is available here.

7 Sept 2005: CHI-square Analysis of Select Subgroup Pairs
CHI-square analyses of Sequenom data for pairs of subgroups of the nnnn.Unrelated population are now available. CHI-sq values were calculated in two ways, with identical results. The pairs of subgroups are:

  • Ethnicity: White vs Black
  • Gender: Male vs Female
  • Blood Pressure Status: Normotensive vs Hypertensive
  • Normotensive and Hypertensive subgroups of Black, White, Male, and Female subgroups
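
As an illustration only, a subgroup-pair comparison of this kind can be sketched as a Pearson CHI-square on a contingency table of genotype counts. The page does not say which two calculation methods were used, nor whether genotype or allele counts were compared, so the function name, the table layout, and the counts below are all assumptions.

```python
def chi_square_contingency(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows, one row per subgroup, one column per genotype.
    This layout is an assumption, not the format of the PPG workbooks.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("table contains no counts")

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells in empty rows or columns
                chi2 += (observed - expected) ** 2 / expected

    # Empty rows and columns carry no information, so exclude them from dof.
    rows = sum(1 for t in row_totals if t > 0)
    cols = sum(1 for t in col_totals if t > 0)
    return chi2, (rows - 1) * (cols - 1)


# Hypothetical genotype counts (Hom1, Het, Hom2) for one SNP in two subgroups.
normotensive = [30, 50, 20]
hypertensive = [20, 50, 30]
chi2, dof = chi_square_contingency([normotensive, hypertensive])
```

The same function applies unchanged to the second-order pairs, for example Black normotensives vs Black hypertensives, given the corresponding counts.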

13 July 2005: Biological Protocols of the PPG Program
Biological or laboratory protocols or "standard operating procedures" (SOPs) have been documented and recorded here, to assist in maintaining a consistent and stable operating environment.

3 June 2005: Subset Analyses of Sequenom Resequencing Data
We have begun analyses of subgroups of the resequencing data received from Sequenom. Initial analyses are for the nnnn.Unrelated population, and include data analysis of the following subgroups of the total data (data received Nov, 2003, Jan, 2004, and Feb, 2004):

  • All data
  • Ethnicity: White, Black, Hispanic, Asian
  • Gender: Male, Female
  • Blood Pressure Status: Normotensive, Hypertensive
  • Normotensive and Hypertensive subgroups of each of the 4 Ethnicity subgroups
  • Normotensive and Hypertensive subgroups of each of the 2 Gender subgroups

Results for each subgroup are presented in Excel worksheets having the same columnar format as developed for analysis of the UCSD Data Workbooks described below. In addition to these worksheets of complete information, worksheets containing major comparative data columns from each subgroup are developed. These worksheets present 8 columns of data for each subgroup in side-by-side style, permitting relatively easy direct comparison between different subgroups; for example, comparison of SNP data for black normotensives vs black hypertensives, or for white hypertensives vs black hypertensives. These types of subgroup analyses can be readily extended to other populations and to data obtained from other sources. Data from different sources can also be combined, and new subgroups can be analysed. Note that these subgroups are of zero order (all data), first order (ethnicity, gender, BP status), and second order (Ethnicity-BP, Gender-BP); others can be developed as desired. For more information see Subgroup Data Workbooks.

10 May 2005: Presentation to Nik Schork Group on "Primary Analysis of new SNP Genotype Data"
A Powerpoint presentation was given to the Nik Schork group in May, 2005, on the rationale, analysis methodology, quality control (QC) methodology, format of resulting SNP data analysis workbooks, and usage of the analysis and data files available for downloading from this Web site. The presentation emphasized the following topics: standard format of raw genotype data, a format independent of sequencing source; format of the analysis worksheets (one row per SNP, issues of unique SNP ID name); QC analyses; additional data present in analysis worksheets; types of SNP ID names; analysis of subsets or subgroups, eg specific gender or ethnic group, from within a total population. More information is available here.

4 May 2005: Analysis of TCGA Resequencing Data
To determine the ease of using TCGA as an outsourcing center for generation of resequencing data, data were obtained for the CHRNA5 locus from the Tnnn.Twin population. These data were converted into a facsimile of the format obtained from Sequenom and analysed. For more information see TCGA Data Workbooks.

24 Apr 2005: Further Analyses of UCSD Resequencing Data
"Step 1" analysis of resequencing or genotyping data from 7 loci (ADCY6, ADORA2B, DBH, RGS2, RGS4, SNX13, SNX14) is now available. "Step 1" analysis is analysis of the SNP genotype data, but not necessarily with position-determined SNP ID Names assigned. In lieu of such names, a SNP Number SNP ID name is used; this name is based on 1) Locus, 2) Source or Sequencer name, and 3) chronological Number of the SNP. The design of this name permits sorting on it in Excel files. Analysis includes raw data in the SNPdata spreadsheet and analysis in the SNPseqSumAlphaQC spreadsheets. Data are presented for the nnnn, Tnnn, and Hnnn populations, in separate spreadsheets. Data analysis includes Quality Control (QC) determinations of CHI-square analysis of Hardy-Weinberg Equilibrium as well as a second statistic suggested by Miguel Robinson. The downloadable single Excel workbook includes reanalysis and QC analysis of the data on the 10 loci made available in Dec, 2004. More information is available under UCSD Data Workbooks, including detailed description of the spreadsheets and columns of information.
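
A minimal sketch of how a sortable SNP Number name could be built from the three parts named above. The separator, the field order, and the four-digit zero padding are assumptions for illustration; the actual name format used in the workbooks is not given here.

```python
def snp_number_name(locus, source, number, width=4):
    """Build a hypothetical sortable name from Locus, Source, and SNP Number.

    Zero padding keeps text sorting consistent with numeric order,
    so that SNP 2 sorts before SNP 10 within the same locus and source.
    """
    if number < 0:
        raise ValueError("SNP number must be non-negative")
    return f"{locus}_{source}_{number:0{width}d}"


# Hypothetical entries, deliberately out of order.
names = [
    snp_number_name("DBH", "UCSD", 10),
    snp_number_name("DBH", "UCSD", 2),
    snp_number_name("ADCY6", "UCSD", 1),
]
ordered = sorted(names)
```

Without the padding, a plain text sort would place "DBH_UCSD_10" ahead of "DBH_UCSD_2", which is the problem a sortable name design has to avoid.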

24 Apr 2005: Analysis of Population Subsets: Gender, Ethnicity, Age, BP status, Fam Hist
As a test case, genotype data analysis has been done, including QC analysis of HWE, for subsets of the RGS2 data generated by Kenton (13 SNPs on 79 nnnn.Unrelateds). The data subsets include: All data; Gender (Male vs Female); Ethnicity (White, Black, Hispanic, Filipino, Asian); Age (ages 19-29, 30-49, 50-69, 70-82); Blood Pressure (Normotensive vs Hypertensive); Family History: have or do not have (control data: genotype should be independent of whether the individual has a family history or not). Excel Workbook format is the same as for UCSD resequencing data. More information is available here.

11 Jan 2005: Update and Extension of Locus-specific Sequence Files
The Locus-specific Sequence Files have been updated and extended to include old and new versions of 3 types of files, two of which are GenBank-formatted, to provide nearly all of NCBI information about a given locus, and one of which is FASTA-formatted, to provide sequence for analysis purposes. The reference position differs in the two GenBank-formatted files; one has position 5001 at the CAP site, and the other has position 5001 at the AUG site (if the AUG site is distant from the CAP site, this latter position can be 10001, 50001, etc). Thus, it is now easy to determine, for example, position of any dbSNP SNP relative to either CAP or AUG sites for any Locus of interest. The "old" versions are retained because NCBI continues to update the RefSeq genomic NT sequences, the basis for these Sequence files, with each new Genome Assembly, which can be as often as every 3 months. Thus, one can compare "old" with "new", eg via BLAST2SEQ, to see where changes have occurred in a given locus. Further, info is included about "unusual" loci, to alert the user to multiple, eg alternatively spliced, mRNA species and to presence of different protein isoforms; information on all of these mRNA and protein species is inherently part of the NCBI GenBank annotation. More information is provided under Locus-specific Sequence Files.
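
The offset arithmetic implied by the 5001 convention can be sketched as follows. This assumes the reference site itself is reported as offset 0 and upstream positions as negative numbers; the program's own numbering convention (for example, whether a position 0 exists) is not stated here, and the function name is hypothetical.

```python
def relative_position(file_position, reference_position=5001):
    """Offset of a sequence-file position from the reference site.

    `reference_position` is the file coordinate of the CAP or AUG site:
    5001 by default, or 10001, 50001, etc for files with a distant AUG site.
    Assumes the reference site itself maps to offset 0.
    """
    if file_position < 1:
        raise ValueError("file positions are 1-based")
    return file_position - reference_position


upstream = relative_position(4901)         # 100 bp upstream of the CAP site
at_site = relative_position(5001)          # the CAP site itself
in_gene = relative_position(10250, 10001)  # file with its AUG site at 10001
```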

15 Dec 2004: Initial Analysis of UCSD Resequencing Data
Two Excel workbooks are now available under UCSD Data Workbooks containing raw data and initial analyses of ten loci of some 25 human loci for which resequencing has been done under this program. The analysis is patterned after that for the Sequenom data, and attempts to present the raw data and the analyses in formats that preserve a consistent look and feel independent of resequencing source. Two files are available, one for nnnn.Random population data (loci CHGA, CHGB, TH, KCNMB1, NPY2R, PMX2B, PNMT, PYY, and SCG2) and one for TH Tnnn.Twin population data. In preparing these data, additional SNPs have been added to the SNP ID Name Excel workbook, and standards are being developed for the handling of Loci which encode multiple isoforms. The TH locus, with its three isoforms (a, b, c), is being used as an initial test case, as documented under UCSD SNP ID Names.

8 Dec 2004: Enhancements to SNPnames.xls file
Additional columns were added to Summary Information, for Locus Name, SNP Posn relative to CAP site (mRNA start site), NCBI NT sequence name and SNP posn relative to this NT sequence. The Locus and SNP posn data permit sorting on Locus and Locus posn to yield progressive positions of SNPs within a given locus. The NT data permit correlation of the SNPs and SNP positions here with those documented at NCBI dbSNP.
Additional columns were added for Sequencing Source information, to permit adding SNPs found by UCSD investigators and other sources (TCGA, UCLAS, Harvard, etc) to those found by Sequenom. Finally, a column was added to indicate how the SNP position was determined (by SNP sequencing person or de novo from flanking sequence via NCBI BLAST2SEQ determinations).

3 Oct 2004: Enhancements to Sequenom SNP Analysis Workbooks
The following enhancements were made to the Sequenom SNP Population-specific Excel workbooks: 1) all Sequenom data (from Nov, 2003, Jan, 2004, and Feb, 2004) are present in one workbook; 2) three new columns for Locus information were added adjacent to the SNP ID Name column; 3) groupings of columns containing similar types of data were improved. The three new columns contain 1) locus name; 2) SNP locus position relative to CAP site; and 3) SNP position relative to locus "item" (exon, intron, promoter, etc). These new columns enhance sorting operations and visualization of SNP locus position directly.

15 Sept 2004: Addition of Quality Control Methodologies
Three methodologies for Quality Control were added to the data analysis Excel workbooks, namely: 1) CHI-square and P-value analysis of HWE values; 2) deviation and RMS analysis of <Het> / SQRT (<Hom1> * <Hom2>) from 2.00; and 3) error rates found in repeat data, data from repeated SNP analysis for same individuals. "Grades" were assigned to each of these QC analyses, complementing the Ambiguity analysis.
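
The first two QC quantities can be sketched for a single SNP as below. Under Hardy-Weinberg Equilibrium the expected genotype counts are p²N, 2pqN, and q²N, so Het / SQRT(Hom1 * Hom2) equals 2 exactly, which is why deviation from 2.00 serves as a check. The function name, the returned fields, and the use of a 1-degree-of-freedom P-value are assumptions; the RMS summary and the "grade" thresholds are not specified here and are not reproduced.

```python
import math


def hwe_qc(hom1, het, hom2):
    """HWE chi-square, P-value, and Het/sqrt(Hom1*Hom2) ratio for one SNP."""
    n = hom1 + het + hom2
    if n == 0:
        raise ValueError("no genotype calls")

    # Allele frequencies estimated from the observed genotype counts.
    p = (2 * hom1 + het) / (2 * n)
    q = 1.0 - p

    observed = (hom1, het, hom2)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # The ratio is undefined when either homozygote class is empty.
    if hom1 > 0 and hom2 > 0:
        ratio = het / math.sqrt(hom1 * hom2)
        deviation = ratio - 2.0
    else:
        ratio = deviation = None

    return {"p": p, "q": q, "chi2": chi2, "p_value": p_value,
            "ratio": ratio, "deviation": deviation}


# Hypothetical counts: the first set is in exact equilibrium, the second is not.
in_equilibrium = hwe_qc(25, 50, 25)
het_deficit = hwe_qc(40, 20, 40)
```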

23 May 2004: Clarification of "Duplicate" Data
Some data from SNPs initially assigned different SNP ID Names, and hence thought to be duplicate data, were in fact data from different individuals in the population. These issues are now corrected, and data reanalysed. See Introduction spreadsheets of any Data Analysis Excel workbook.

18 May 2004: More Summary Info on Additional Sequenom Info
Additional timeline date information was added, together with Sequenom plate number info, to the data analysis sheets.

12 May 2004: Coalescence of Duplicate Entries plus More Seq Info
Duplicate SNP entries and redundancies in SNP ID Names were removed and/or coalesced. New spreadsheets containing all returned data, with Allele 1 defined alphabetically, were created for each population. Timeline dates are now included in the data analysis files.

6 May 2004: Standardized SNP ID Names plus dbSNP rs Names
Determination of "Standardized" SNP ID Names, based on BLAST analyses of SNP flanking sequences, for all Sequenom-analysed SNPs was completed. These determinations provide considerable locus structure information about each SNP. In addition, the data analysis Excel workbooks now contain these SNP ID names, the cognate dbSNP rsSNP name, and more complete individual and genotype call information for each SNP in each population.

13 April 2004: SNR "Cluster Plots" for Seq Jan, Feb Data
Cluster plots of SNR1 vs SNR2 for each SNP in the Sequenom January and February, 2004, returned data were created, as both linear and log-log plots. These plots provide a ready visual display of the quality of the genotype calls, as well as the noCalls, made by Sequenom. noCall information was also added to the data analysis Excel files.

5 Mar 2004: Analysis of Sequenom SNP Data of Feb 2004
Analysis of SNP data returned by Sequenom in February, 2004, was completed and made available. Analysis is similar to that for the Nov 2003 data, except that Sequenom is now using a Signal to Noise Ratio (SNR) method for genotype determination. As a result, the analysis is based solely on the Sequenom genotype calls.

5 Feb 2004: Analysis of Sequenom SNP Data of Jan 2004
Analysis of SNP data returned by Sequenom in January, 2004, was completed and made available. Analysis is similar to that for the Nov 2003 data, except that Sequenom is now using a Signal to Noise Ratio (SNR) method for genotype determination. As a result, the analysis is based solely on the Sequenom genotype calls.

12 Dec 2003: Summary Analysis of Sequenom SNP Data for all Populations
A summary sample Excel spreadsheet was prepared containing data for all twelve populations for three data items: 1) total assayed individuals in each population; 2) data quality "grades" assigned; and 3) alleles: minor/MAJOR for each population. These summary data exemplify a few ways in which the data can be manipulated.

10 Dec 2003: Analysis of Sequenom SNP Data returned Nov 2003
Analysis of SNP data returned by Sequenom in November, 2003, was completed and made available. Analysis includes genotype values for all individuals in a given population for a given SNP, together with p,q and HWE values. SNP genotype calls were ambiguous in some cases, with data values falling within "ambiguous regions". Five such ambiguous regions were defined, yielding genotype calls for comparison with the Sequenom genotype calls. Based on these comparisons, qualitative "grades" were assigned to assess quality of the data. Data are presented in MS Excel workbooks.


Latest modification: October, 2006

If you have comments or queries, send email to Doug Smith