This page contains links to genetics and statistics resources, and to computer programs for linkage and association analysis, including my own. It was last updated on 01-Oct-2008.
David L. Duffy, MBBS PhD.
Queensland Institute of Medical Research,
300 Herston Road,
Herston, Queensland 4029, Australia.
Email: davidD@qimr.edu.au.
Some photos from our Tasmanian holiday 2000.
Some photos showing climbing 2001-6.
Paintings.
| QIMR and departmental links |
| My software |
| My CV/Publications |
| About asthma |
| About genetics |
| A genetic map |
| My links to other sites: genetics etc. |
These are periodically updated reviews of the genetics of atopy and asthma. The articles include:
These chapters are based on my doctoral thesis, which is available in PDF format here.
Our publications on the genetics of allergic disease include:
This table (last updated 20060618 18:40) contains interpolated genetic map positions for 128115 marker loci. Two sets of positions are given. The first is in "Rutgers" cM (Kong X, Murphy K, Raj T, He C, White PS, Matise TC. A combined linkage-physical map of the human genome. Am J Hum Genet 2004; 75:1143-1148), estimated by locally weighted linear regression (lo(w)ess) of the published Rutgers genetic map positions on the Build 35.1 (and 34.3) physical map positions (R code here). The second is in "Oxstats" cM (Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science 2005; 310: 321-324), obtained by linear interpolation. A major difference between these two metrics is the model for recombination across the centromere.
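The lo(w)ess step can be illustrated with a minimal Python sketch. It is not the R code mentioned above: the function name `local_linear_cm`, the span `frac` and the single pass without robustness iterations are illustrative assumptions. It fits a tricube-weighted straight line to the anchor markers (known bp and cM positions) nearest the query position and evaluates that line at the query.

```python
import math

def local_linear_cm(query_bp, anchors, frac=0.3):
    """Estimate a genetic position (cM) at query_bp from (bp, cM) anchors.

    Single-pass locally weighted linear regression (no robustness
    iterations): the k nearest anchors (k = frac * n, at least 2) receive
    tricube weights, and a weighted straight line is fitted and evaluated.
    """
    n = len(anchors)
    k = max(2, int(math.ceil(frac * n)))
    # The k anchors physically closest to the query position
    nearest = sorted(anchors, key=lambda a: abs(a[0] - query_bp))[:k]
    h = max(abs(bp - query_bp) for bp, _ in nearest) or 1.0
    h *= 1.0001  # keep a small non-zero weight on the farthest neighbour
    w = [(1 - (abs(bp - query_bp) / h) ** 3) ** 3 for bp, _ in nearest]
    sw = sum(w)
    xbar = sum(wi * bp for wi, (bp, _) in zip(w, nearest)) / sw
    ybar = sum(wi * cm for wi, (_, cm) in zip(w, nearest)) / sw
    sxx = sum(wi * (bp - xbar) ** 2 for wi, (bp, _) in zip(w, nearest))
    sxy = sum(wi * (bp - xbar) * (cm - ybar) for wi, (bp, cm) in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (query_bp - xbar)
```

A larger `frac` gives a smoother map. R's `lowess()` additionally runs robustness iterations that down-weight outlying markers.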
For the pseudoautosomal region, I have interpolated a male map based on the sperm typing data of Lien et al. [2000]; this map is provided as a separate file.
A research note describing this map is:
Duffy DL (2006). An integrated genetic map for linkage analysis. Behavior Genetics 36: 4-6.
Duplicate markers have been removed. Beware of modified marker names: for example, Marshfield names often carry letter-code suffixes such as Z, meaning "Primer moved from its initial position; the allele size changed". Each row of the map table contains the following columns:
| Column | Contents |
| 1 | Marker name |
| 2 | Alternative name |
| 3 | Alternative name 2 |
| 4 | Locus type (STS or SNP) |
| 5 | Decode marker (D or .) |
| 6 | Marshfield marker (M or .) |
| 7 | Chromosome |
| 8 | Alternative chromosome (occasionally differs!) |
| 9 | Decode order number (1-5136) |
| 10 | Marshfield order number (1-8010) |
| 11 | Decode physical position on chromosome (bp) |
| 12 | Decode genetic map position (cM) |
| 13 | Marshfield genetic map position (cM) |
| 14 | Rutgers genetic map position (cM) |
| 15 | Rutgers male genetic map position (cM) |
| 16 | Rutgers female genetic map position (cM) |
| 17 | Build 34.3 physical map position (bp) |
| 17 | Build 35.1 physical map position (bp) |
| 18 | Interpolated physical map position (bp) |
| 19 | Interpolated Rutgers genetic map position (cM) |
| 20 | Interpolated Oxstats genetic map position (cM) |
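A minimal sketch of reading one row of the map table in Python. The whitespace delimiter, the use of "." for every missing value, and the dictionary keys are assumptions for illustration. Taking the two interpolated positions from the end of the row avoids depending on the exact count of the physical-position columns that precede them.

```python
def parse_map_line(line):
    """Extract a few fields from one row of the map table.

    Assumes whitespace-separated fields in the column order listed above,
    with "." marking a missing value (both are assumptions).
    """
    fields = line.split()

    def num(s):
        # "." marks a missing value; anything else is read as a number
        return None if s == "." else float(s)

    return {
        "marker": fields[0],
        "locus_type": fields[3],
        "chromosome": fields[6],
        "rutgers_interp_cm": num(fields[-2]),
        "oxstats_interp_cm": num(fields[-1]),
    }
```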
I have taken the chromosome band data used by NCBI Mapview to draw ideograms, and interpolated the band positions onto the above map (as opposed to the perhaps more logical approach of mapping linkage findings to a physical map!):
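Placing a band boundary (given in bp) on the genetic map comes down to interpolating between the flanking markers whose bp and cM positions are known. A minimal Python sketch of linear interpolation follows; the function name, the toy map and the clamping at the chromosome ends are assumptions, and this is not necessarily how the band file was built.

```python
from bisect import bisect_left

def interpolate_cm(bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) for physical position bp.

    map_bp must be sorted ascending, with map_cm holding the matching cM
    values. Positions beyond either end are clamped to the end values
    (an assumption; extrapolation is another option).
    """
    if bp <= map_bp[0]:
        return map_cm[0]
    if bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_left(map_bp, bp)  # map_bp[i - 1] < bp <= map_bp[i]
    if map_bp[i] == bp:
        return map_cm[i]
    x0, x1 = map_bp[i - 1], map_bp[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (bp - x0) / (x1 - x0)
```

For example, `interpolate_cm(3_000_000, [1_000_000, 5_000_000, 9_000_000], [2.0, 10.0, 12.0])` returns `6.0`.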
I am making Fortran 95 source code and some binaries for my program SIB-PAIR available for download. Sib-pair is now close to a state I consider suitable for general release, and is available as a beta-test version. The Metropolis algorithm used for IBD estimation (among other things) has been incrementally improved, and seems to give sensible answers. It is now also used to give MLEs of marker allele frequencies that agree closely with those from programs such as MENDEL and PAP. Loading data could still be improved for large datasets, but analysis of data once in memory is fairly fast, so the program can be used to handle and analyse genome-wide association study (GWAS) data.
The most recent version of SIB-PAIR (1.00b) is dated 26th September 2008 (see the list of new features). As for how urgently to upgrade: with SIB-PAIR it is always a good idea! For example, on 2008-09-23 I fixed two errors introduced on 2008-05-28, one of which led to intermittent program crashes under Windows.
As of 29th September 2006, the Fortran 95 version of Sib-pair is the "stable" code. The executable (as of 9th March 2007) is called sib-pair or sib-pair.exe, and the old versions (Fortran 77) have been renamed sib-pair77, sp77.exe and so forth. Precompiled executables are available for Linux and for Windows, but there should be no problems compiling and running on platforms that have a Fortran 95 compiler.
The pedigree data are now held permanently in memory, with the expected gains in analysis speed. There are no hard-coded limits on the number of loci, the number of pedigree members or the number of alleles at a marker (provided your computer has enough memory). Almost all of the original Sib-pair commands are implemented and give the same or even better answers (;)). A new quantitative trait TDT (following Gauderman 2003) has already been added, and Mendelian errors are tabulated by pedigree and by locus. The regression command can give gene-dropped P-values, allowing genetic survival analysis in pedigrees, and gene dropping can now be conditional on IBD, giving an association test conditional on linkage. The GLM regression now also allows multiple imputation of unobserved genotypes. The new "join" command allows you to (re)merge pedigrees, and "update" allows you to merge in data from a secondary data file. Please let me know if you encounter any problems with any of these new commands.
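Gene-dropped P-values rest on a simple idea: simulate marker genotypes down the real pedigree structure under Mendelian transmission, recompute the test statistic on each replicate, and count how often it reaches the observed value. A minimal Python sketch follows; the pedigree layout, the function names and the (hits + 1)/(n + 1) estimate are assumptions, and this unconditional version ignores the IBD-conditional option described above.

```python
import random

def gene_drop(pedigree, allele_freqs, rng):
    """Simulate genotypes at one marker by gene dropping.

    pedigree: list of (id, father, mother), parents listed before their
    children; founders have father = mother = None (both or neither known).
    allele_freqs: dict allele -> frequency, used to draw founder alleles.
    Returns dict id -> (paternal allele, maternal allele).
    """
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    geno = {}
    for ind, father, mother in pedigree:
        if father is None and mother is None:
            geno[ind] = tuple(rng.choices(alleles, weights=weights, k=2))
        else:
            # Mendelian transmission: one randomly chosen allele per parent
            geno[ind] = (rng.choice(geno[father]), rng.choice(geno[mother]))
    return geno

def empirical_p(observed, simulate_stat, n_rep, rng):
    """Proportion of simulated statistics at least as large as observed.

    simulate_stat(rng) is a hypothetical callback that gene-drops once and
    returns the test statistic for that replicate.
    """
    hits = sum(simulate_stat(rng) >= observed for _ in range(n_rep))
    return (hits + 1) / (n_rep + 1)
```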
One feature of the g95 compiler I am using is checkpointing: you can save the current state of a long Sib-pair job by sending it the QUIT signal (Ctrl-\). This creates a new program called "dump" which, when run, continues from that state, so you can safely stop the original Sib-pair run and resume it later. Run "sib-pair --g95" to see information about this and other runtime options.
Using the japi library, a graphical file picker/directory browser now works under Windows and Linux. An alternative uses the GTK+ 2-based pilib library. If neither is activated (I am still experimenting), a simple text-based file chooser is used as a fallback.
Program SIB-PAIR performs a number of simple analyses of family data that tend to be "nonparametric" or "robust" in nature. The name is a misnomer in that Sib-pair is actually for the analysis of arbitrary pedigrees. It is modelled to some extent on the Genetic Analysis System [Young, 1995] in terms of the command language and types of analysis. Included are routines for:
The most recent releases of SIB-PAIR add flexible manipulation of pedigree data, MLE of allele frequencies, segregation analysis, variance components (linkage) analyses that now allow multiple fixed effects including measured genotypes, the combined sibship/transmission disequilibrium score test for allelic association, a quantitative trait TDT, mixtures of normal (and other) distributions for quantitative traits, generalized (mixed) linear models, and simple classical twin analysis.
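The combined sibship/TDT score test and the quantitative TDT are more elaborate, but the classical TDT for a binary trait shows the basic idea: among heterozygous parents, compare how often a given allele is transmitted with how often it is not. A minimal Python sketch; the function name is illustrative, and this is not Sib-pair's implementation.

```python
import math

def tdt(b, c):
    """Classical (McNemar-type) TDT for one allele.

    b: times the allele was transmitted by a heterozygous parent
    c: times it was not transmitted by a heterozygous parent
    Returns (chi-square on 1 df, P-value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chisq = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chisq / 2))
    return chisq, p
```

For example, 30 transmissions against 10 non-transmissions give a chi-square of 10.0 (P about 0.0016).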
There is also a program, BINNING, for binning the STR allele lengths produced by gel-reading software.
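Gel-reading software reports fragment sizes as non-integer lengths that scatter around each true allele, so binning groups nearby sizes under a single allele label. A minimal Python sketch using a simple gap rule; the gap threshold, the labelling by bin mean and the function name are assumptions, not the algorithm used by BINNING.

```python
def bin_allele_sizes(sizes, max_gap=0.5):
    """Group raw fragment sizes (bp) into allele bins.

    Sizes are sorted, and a new bin starts whenever the gap to the
    previous size exceeds max_gap. Returns dict raw size -> bin label,
    where the label is the bin mean rounded to 0.1 bp.
    """
    ordered = sorted(sizes)
    bins = [[ordered[0]]] if ordered else []
    for s in ordered[1:]:
        if s - bins[-1][-1] > max_gap:
            bins.append([s])  # gap too large: start a new allele bin
        else:
            bins[-1].append(s)
    labels = {}
    for b in bins:
        label = round(sum(b) / len(b), 1)
        for s in b:
            labels[s] = label
    return labels
```

A pure gap rule can chain adjacent alleles together when sizes drift across a run; taking the repeat unit length of the STR into account is one way to guard against this.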
This program generates nuclear families, a proportion of which contain monozygotic twins, in which multiple quantitative trait loci are segregating. One of these QTLs is linked to multiple markers. Families can be selected for high and/or low values of the quantitative or ordinal trait.
This is a Basic program that performs a number of simple statistical analyses of contingency tables useful in epidemiology and genetics. One can estimate tetrachoric correlations and odds ratios for 2x2 tables (with exact confidence intervals), combine multiple 2x2 tables via Mantel-Haenszel and maximum likelihood procedures (jackknife standard error for pooled MLE odds ratio), test for symmetry and quasi-symmetry in square contingency tables, and obtain exact (Pearson-Clopper) 95% confidence intervals on a proportion. A calculator (double precision) with scientific functions including inht(), fact(), and ran() is also accessible via the same menu.
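The Mantel-Haenszel pooling mentioned above combines the stratum tables by summing ad/n and bc/n across strata. A minimal Python sketch (the Basic program is not reproduced here; the function name and the (a, b, c, d) layout are assumptions):

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio over several 2x2 tables.

    Each table is (a, b, c, d), laid out as
        a  b
        c  d
    so that a single table's odds ratio is (a * d) / (b * c).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den
```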
rcexact. A program that calculates Fisher exact P-values for RxC contingency tables. It was written in Fortran 77 by Mehta and Patel (Algorithm 643 from the ACM); I have altered the driving program slightly.
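For RxC tables rcexact relies on the network algorithm of Algorithm 643, but the 2x2 special case is short enough to write directly, which can be handy for checking results. A minimal Python sketch; the function name is illustrative, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one, with a small tolerance for ties) is one common convention.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the count in cell a is hypergeometric.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that cell a equals x, given the margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))
```

For the table [[3, 1], [1, 3]] this returns 34/70, about 0.486.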
drawhap.sh. Takes a SIMWALK2 haplotyping output file and draws the pedigree, with haplotypes, as a marriage-node graph using Graphviz (needs sh, awk and dot). The placement of haplotypes on the drawing is not completely satisfactory, and colouring is by allele rather than by haplotype.
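In a marriage-node graph each mating is a small node of its own: both parents point to it, and it points to each child. A minimal Python sketch that writes such a graph in Graphviz DOT; it covers only the layout idea, not SIMWALK2 parsing or haplotype colouring, and the pedigree layout and function name are assumptions rather than anything taken from drawhap.sh.

```python
def pedigree_to_dot(pedigree):
    """Write a pedigree as a Graphviz marriage-node graph (DOT text).

    pedigree: list of (id, father, mother); founders have None parents.
    """
    lines = ["digraph pedigree {"]
    for ind, _, _ in pedigree:
        lines.append(f'  "{ind}" [shape=box];')
    matings = {}  # (father, mother) -> mating node name
    for ind, father, mother in pedigree:
        if father is None or mother is None:
            continue
        key = (father, mother)
        if key not in matings:
            node = f"m{len(matings)}"
            matings[key] = node
            lines.append(f"  {node} [shape=point];")
            lines.append(f'  "{father}" -> {node};')
            lines.append(f'  "{mother}" -> {node};')
        lines.append(f'  {matings[key]} -> "{ind}";')
    lines.append("}")
    return "\n".join(lines)
```

Saving the output as `ped.dot` and running `dot -Tpdf ped.dot -o ped.pdf` renders it.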
join.unsorted. Works just like (Unix) join, but the files do not have to be sorted. The output follows the order of the keys in the first named file:
Usage: join.unsorted [OPTION]... FILE1 FILE2
For each pair of input lines with identical join fields, write a line to
standard output. The default join field is the first, delimited
by whitespace. When FILE1 or FILE2 (not both) is -, read standard input.
-a FILENUM print unpairable lines coming from file FILENUM, where
FILENUM is 1 or 2, corresponding to FILE1 or FILE2
-e EMPTY replace missing input fields with EMPTY
-i, --ignore-case ignore differences in case when comparing fields
-j FIELD equivalent to -1 FIELD -2 FIELD
-o FORMAT obey FORMAT while constructing output line
-t CHAR use CHAR as input and output field separator
-v FILENUM like -a FILENUM, but suppress joined output lines
-1 FIELD join on this FIELD of file 1
-2 FIELD join on this FIELD of file 2
--help display this help and exit
--version output version information and exit
Unless -t CHAR is given, leading blanks separate fields and are ignored,
else fields are separated by CHAR. Any FIELD is a field number counted
from 1. FORMAT is one or more comma or blank separated specifications,
each being FILENUM.FIELD or 0. Default FORMAT outputs the join field,
the remaining fields from FILE1, the remaining fields from FILE2, all
separated by CHAR.
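Indexing the second file in a hash table is what removes the need to sort. A minimal Python sketch of that idea, covering only the default behaviour (no -a, -e, -i, -o, -t or -v); the function name and arguments are hypothetical, and this is not the join.unsorted source.

```python
def join_unsorted(lines1, lines2, field1=1, field2=1):
    """Join two lists of whitespace-delimited lines on a key field.

    Neither input needs to be sorted: file 2 is indexed in a dict, and
    output follows the order of lines in file 1. Each output line is the
    join field, then the remaining fields of file 1, then those of file 2
    (the default FORMAT). Unpairable lines are dropped.
    """
    index = {}
    for line in lines2:
        fields = line.split()
        if len(fields) >= field2:
            index.setdefault(fields[field2 - 1], []).append(fields)
    out = []
    for line in lines1:
        f1 = line.split()
        if len(f1) < field1:
            continue
        key = f1[field1 - 1]
        rest1 = f1[:field1 - 1] + f1[field1:]
        # Every matching line of file 2 pairs with this line of file 1
        for f2 in index.get(key, []):
            rest2 = f2[:field2 - 1] + f2[field2:]
            out.append(" ".join([key] + rest1 + rest2))
    return out
```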
fscheme. A port of the TinyScheme (and MiniScheme) small Scheme interpreters to Fortran 95. Hopefully useful as an embedded interpreter.
fortransockets. A minimal Fortran 95 sockets library for Linux, with enough functionality for a simple server. It includes wrappers for socket(), setsockopt(), bind(), listen(), accept(), send(), recv(), close() and gethostbyname(). This code was last updated 2007-12-19.