SOME NOTES ON HUMAN GENETICS AND GENETIC EPIDEMIOLOGY

Overview of genetic epidemiology

The introductory slides came from here.

Recurrence Risks

Familial aggregation of a trait is suggestive of, but not proof of, a genetic contribution, because relatives share environment as well as genes.

The recurrence risk is the probability that a person expresses a disease given that they have an affected relative of a given type. For example, the population lifetime risk of developing psoriasis is approximately 2%, but for someone with an affected parent or sibling the recurrence risk is 20%.

The sibling recurrence risk ratio (PRRsib, often written λs, "lambda-s") is the recurrence risk in siblings divided by the population risk; in this case it is 20%/2% = 10. The larger the PRR, the more easily linkage studies can detect a gene underlying the trait.
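As a minimal sketch, the ratio is simply one risk divided by the other. The function name below is our own, chosen for illustration, and the inputs are the psoriasis figures quoted above.

```python
def recurrence_risk_ratio(relative_risk: float, population_risk: float) -> float:
    """Recurrence risk in a class of relatives divided by the population risk.

    For siblings this is lambda_s (PRRsib). Risks are given as proportions.
    """
    if not 0.0 < population_risk <= 1.0:
        raise ValueError("population_risk must be in (0, 1]")
    if not 0.0 <= relative_risk <= 1.0:
        raise ValueError("relative_risk must be in [0, 1]")
    return relative_risk / population_risk


# Psoriasis example: 20% recurrence risk in sibs vs 2% population risk.
lambda_s = recurrence_risk_ratio(0.20, 0.02)
print(f"lambda_s = {lambda_s:.1f}")  # lambda_s = 10.0
```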

Classical twin study

More on this here.

The classical twin design dates from the 1920s, when identical (monozygotic, MZ) and nonidentical (dizygotic, DZ) twins could first be reliably distinguished. It is assumed that twins in the same household are exposed to shared environmental risk factors to the same extent whether they are MZ or DZ. Under this assumption, MZ and DZ twins differ only in the proportion of their genes they share: all of them for MZ twins, on average half for DZ twins. Recurrence risks of disease in the cotwin of a case can then be interpreted as follows:

Pattern                   Interpretation
Rpop = RDZ = RMZ          No familial factors involved
Rpop < RDZ = RMZ          Family environmental factors
Rpop < RDZ < RMZ          Genetic factors +/- family environment
Rpop < RDZ << RMZ         Epistatic polygenes and/or gene-environment interaction

(Rpop: population risk; RDZ, RMZ: recurrence risk in the cotwin of a DZ or MZ case.)

Pooling two twin studies of psoriasis (Australian and Dutch):

Twin type   Recurrence risk   95% confidence interval   PRR
MZ          60%               47-71%                    30
DZ          22%               10-37%                    11

Twin concordance and recurrence risk

Estimation of recurrence risk in families is complicated by the ascertainment mechanism, that is, the method by which families are sampled or recruited.

Example: the total population of twins (in the Australian NHMRC Twin Registry, say):

                      Twin 1
                      Aff    UnA
Twin 2      Aff       a      b
            UnA       c      d

The recurrence risk is a/(a+b) (given that twin 2 is affected) or a/(a+c) (given that twin 1 is affected), or better still the "average" a/(a+0.5(b+c)).

In the usual situation the twins are not labelled (there is no natural "twin 1" and "twin 2"), so one instead observes:

Pair type   Aff-Aff   Aff-UnA   UnA-UnA
Count       a         b+c       d
Label       C         D         U

The correct formula for the recurrence risk is then:

R = C/(C+0.5 D) = 2C/(2C+D).

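I belabour the point because the pairwise concordance, C/(C+D), is often quoted in the literature, but it does not estimate the recurrence risk in the way this probandwise concordance does. Since 2C/(2C+D) ≥ C/(C+D), with equality only when C or D is zero, the pairwise figure understates the recurrence risk.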
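A small sketch comparing the measures, using hypothetical counts from the labelled 2x2 table above. It shows that the labelled "average" a/(a+0.5(b+c)) and the probandwise 2C/(2C+D) coincide, while the pairwise concordance is lower. The function name and the counts are illustrative only.

```python
def twin_concordances(a: int, b: int, c: int) -> dict:
    """Concordance measures from the labelled 2x2 twin table.

    a: both twins affected; b, c: the two kinds of discordant pair.
    d (both unaffected) does not enter any of these measures.
    """
    C = a        # concordant affected pairs, once labels are dropped
    D = b + c    # discordant pairs, once labels are dropped
    if C + D == 0:
        raise ValueError("no affected twins in the table")
    return {
        "average_labelled": a / (a + 0.5 * (b + c)),
        "probandwise": 2 * C / (2 * C + D),
        "pairwise": C / (C + D),
    }


# Hypothetical counts, for illustration only.
result = twin_concordances(a=10, b=12, c=8)
print(result)  # probandwise 0.5 (same as the labelled average), pairwise about 0.333
```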

The general rule for dealing with ascertainment

A proband is a family member whose disease status brought the family into the study. If a family contains two probands, then it should contribute twice to the estimation of recurrence risk. In the case of complete ascertainment, every affected person is a proband. This is the type of ascertainment one would see in a population survey where all members of a family are automatically examined.

For the twin example above, every family with two affected twins is counted twice, and every family with one affected twin is counted only once, so

R = (number of times a cotwin is affected, given the proband is affected) / (total number of families, counting those with two probands twice) = 2C/(2C+D)

If one is carrying out a twin study in a clinic, often only one twin will be the proband (the patient). If the cotwin is subsequently found to be affected, s/he is a "secondary" case, and the family is counted only once. This is single ascertainment.
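The general rule (each proband contributes one observation: is the cotwin affected?) can be sketched as below. The TwinPair class, the function name and the sample sizes are hypothetical. Under complete ascertainment the sketch reproduces 2C/(2C+D). Under single ascertainment each concordant pair contributes only once.

```python
from dataclasses import dataclass


@dataclass
class TwinPair:
    affected: tuple  # (twin A affected?, twin B affected?)
    proband: tuple   # (twin A is a proband?, twin B is a proband?)


def proband_recurrence_risk(pairs) -> float:
    """Proportion of probands whose cotwin is affected."""
    cotwin_affected = 0
    n_probands = 0
    for pair in pairs:
        for i in (0, 1):
            if pair.proband[i]:
                if not pair.affected[i]:
                    raise ValueError("a proband must be affected")
                n_probands += 1
                cotwin_affected += pair.affected[1 - i]  # bool counts as 0/1
    if n_probands == 0:
        raise ValueError("no probands")
    return cotwin_affected / n_probands


# Hypothetical sample: 10 concordant and 20 discordant affected pairs.
# Complete ascertainment: every affected twin is a proband.
complete = ([TwinPair((True, True), (True, True))] * 10
            + [TwinPair((True, False), (True, False))] * 20)
# Single ascertainment: only one twin of each concordant pair is a proband.
single = ([TwinPair((True, True), (True, False))] * 10
          + [TwinPair((True, False), (True, False))] * 20)

print(proband_recurrence_risk(complete))  # 0.5, i.e. 2C/(2C+D)
print(proband_recurrence_risk(single))
```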

Caveats to the classical twin method

Cotwin control study

One commonly matches cases and controls in observational and intervention studies to control confounding. MZ twins are well matched for genotype and family environment, so only small sample sizes are needed. Some examples:

Mitochondrial inheritance

Prototypic mitochondrial diseases

Leber's hereditary optic neuropathy (LHON, MIM 535000)

Mitochondrial myopathy, encephalopathy, lactic acidosis, and strokelike episodes (MELAS, MIM 540000)

Myoclonic epilepsy associated with ragged-red fibres (MERRF, MIM 545000)

Parent-of-origin effects and imprinting

Examples of parent-of-origin effects

IGF2 (MIM 147470)

Prader-Willi and Angelman ("happy puppet") syndromes

Atopic disease

Linkage and association analysis

There is much material on this here.

Lod score linkage analysis

Lod score linkage analysis requires a roughly correct model for transmission of the trait (mode of inheritance, trait allele frequency and penetrances). This model is used to infer the likely trait-locus genotypes of the members of each family.

The recombination fraction (and hence genetic distance) between the trait locus and one or more known marker loci can then be estimated. The strength of evidence for linkage (recombination fraction < 50%) is traditionally expressed as the base-10 logarithm of the odds (likelihood ratio) of linkage versus no linkage: the lod score. Given the size of the human genome, a lod score of 3 (equivalent to a P-value of about 2 × 10^-4) is regarded as significant evidence for linkage. A lod score of -2 is regarded as definitively excluding linkage at the recombination fraction tested.
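A minimal sketch of the lod score, assuming the simplest possible data: phase-known, fully informative meioses, each scored as recombinant or not. Real pedigrees rarely look like this and require full likelihood calculations under the trait model. All function names are our own. The last function converts a lod score to a pointwise P-value by treating 2 ln(10) × lod as chi-square with 1 df. This reproduces the figure of about 2 × 10^-4 for a lod of 3.

```python
import math


def lod(theta: float, n_meioses: int, n_recomb: int) -> float:
    """log10[L(theta) / L(0.5)] for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n_nonrecomb = n_meioses - n_recomb

    def log10_term(count: int, p: float) -> float:
        # count * log10(p), using the convention 0 * log10(0) = 0
        if count == 0:
            return 0.0
        if p == 0.0:
            return -math.inf
        return count * math.log10(p)

    log_l_theta = log10_term(n_recomb, theta) + log10_term(n_nonrecomb, 1.0 - theta)
    log_l_null = n_meioses * math.log10(0.5)  # theta = 0.5: no linkage
    return log_l_theta - log_l_null


def max_lod(n_meioses: int, n_recomb: int, steps: int = 500):
    """Maximise the lod score over a grid of theta values in [0, 0.5]."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod(t, n_meioses, n_recomb), t) for t in grid)


def lod_to_p(z: float) -> float:
    """Pointwise P-value, treating 2*ln(10)*lod as chi-square with 1 df."""
    chi2 = 2.0 * math.log(10.0) * max(z, 0.0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


# Hypothetical data: 10 informative meioses, none recombinant.
z, theta_hat = max_lod(10, 0)
print(round(z, 2), theta_hat)   # 3.01 0.0
print(f"{lod_to_p(3.0):.1e}")   # about 2.0e-04
```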

Nonparametric linkage analysis

A number of statistical methods have been developed that do not require a trait locus model to be specified. The most common approach one will encounter in the literature is the affected sib pair method. This requires families containing two affected siblings, and can be shown to be equivalent to a lod score linkage analysis assuming a completely recessive model for the trait locus.
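The text does not say which affected sib pair statistic is meant. As one common choice, here is a sketch of the "mean test" of allele sharing identical by descent (IBD). It assumes the IBD status of each pair is known exactly. Under no linkage, a sib pair shares 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4 (mean 1, variance 1/2). The test therefore looks for excess sharing using a normal approximation. The function name and counts are hypothetical.

```python
import math


def asp_mean_test(n0: int, n1: int, n2: int):
    """Mean IBD-sharing test for affected sib pairs.

    n0, n1, n2: numbers of affected sib pairs sharing 0, 1 or 2 alleles IBD
    at the marker (assumed known exactly, i.e. a fully informative marker).
    Returns (z, one-sided P-value) from a normal approximation.
    """
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("no sib pairs")
    shared = n1 + 2 * n2                 # total alleles shared IBD
    z = (shared - n) / math.sqrt(n / 2)  # expected n, variance n/2 under H0
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided


# Hypothetical counts: 100 affected sib pairs each.
print(asp_mean_test(25, 50, 25))  # (0.0, 0.5): no excess sharing
print(asp_mean_test(10, 50, 40))  # excess sharing, z about 4.24
```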

Haseman-Elston regression and variance component linkage analysis are nonparametric methods that can be applied to continuous traits (e.g. blood pressure).
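A sketch of the original form of Haseman-Elston regression for sib pairs follows. The squared trait difference of each pair is regressed on the estimated proportion of alleles the pair shares IBD at a marker. A negative slope (pairs sharing more alleles IBD are more alike) is taken as evidence of linkage. Later versions of the method use other pair statistics. The function name and the data are hypothetical, and no significance test is included.

```python
def haseman_elston(trait1, trait2, ibd_prop):
    """Ordinary least squares of (x1 - x2)^2 on the IBD proportion.

    trait1, trait2: trait values of the two sibs in each pair.
    ibd_prop: estimated proportion of alleles shared IBD (0, 0.5 or 1, or
    an estimate in between). Returns (slope, intercept).
    """
    if not len(trait1) == len(trait2) == len(ibd_prop):
        raise ValueError("inputs must have equal length")
    y = [(x1 - x2) ** 2 for x1, x2 in zip(trait1, trait2)]
    x = list(ibd_prop)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 sib pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("IBD proportions do not vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical sib-pair data (e.g. blood pressure in arbitrary units).
sib1 = [10.0, 12.0, 11.0, 9.0, 13.0, 10.5]
sib2 = [14.0, 13.0, 11.5, 12.0, 13.0, 10.0]
ibd = [0.0, 0.5, 1.0, 0.0, 1.0, 0.5]
slope, intercept = haseman_elston(sib1, sib2, ibd)
print(slope)  # negative: pairs sharing more IBD have smaller differences
```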

Transmission-disequilibrium test

The transmission-disequilibrium test (TDT) tests the transmission ratio of alleles at a marker locus from a heterozygous parent to an affected child. The ratio departs from the Mendelian 50:50 only if the marker locus is both linked to the trait locus and in allelic association (linkage disequilibrium) with it. If a marker locus is close to (tightly linked to) the trait locus, then allelic association may be observed, but association does not automatically imply tight linkage.
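For a diallelic marker, the usual TDT statistic is the McNemar-type (b - c)^2/(b + c). Here b and c count heterozygous parents who transmitted one allele or the other to an affected child. The statistic is referred to a chi-square distribution with 1 df. The sketch below uses hypothetical counts; the function name is our own.

```python
import math


def tdt(b: int, c: int):
    """McNemar-type TDT statistic for a diallelic marker.

    b: heterozygous parents who transmitted allele M1 to the affected child
    c: heterozygous parents who transmitted the other allele, M2
    Without linkage and association, b and c are expected to be equal (50:50).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p


# Hypothetical counts: 60 of 100 heterozygous parents transmitted M1.
chi2, p = tdt(60, 40)
print(chi2, round(p, 4))  # 4.0 0.0455
```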

Genetic marker loci: SNPs, RFLPs, microsatellites

To be useful for detecting cosegregation of alleles at a marker locus with a trait, the marker needs to be polymorphic. Polymorphism is usually measured as the expected heterozygosity of the marker: the proportion of individuals expected to be heterozygous, given the allele frequencies and assuming Hardy-Weinberg equilibrium.
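Under Hardy-Weinberg equilibrium, expected heterozygosity is H = 1 - Σ p_i², where p_i are the allele frequencies. The sketch below uses illustrative frequencies to compare a diallelic SNP with a multiallelic marker.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2): probability that a random individual is heterozygous
    under Hardy-Weinberg equilibrium, given allele frequencies p_i."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# A diallelic SNP can never exceed H = 0.5 ...
print(expected_heterozygosity([0.5, 0.5]))   # 0.5
print(expected_heterozygosity([0.9, 0.1]))
# ... whereas a marker with 8 equally common alleles reaches 0.875.
print(expected_heterozygosity([0.125] * 8))  # 0.875
```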

Simple Sequence Repeat polymorphisms (SSRs)

These are common polymorphic loci whose alleles differ in the number of tandem copies of a short sequence motif of 2, 3, 4 or more nucleotides. They are also known as microsatellites or short tandem repeat polymorphisms. After PCR amplification of a stretch of DNA that includes the SSR, the differently sized fragments are separated by gel electrophoresis. They are currently the most widely used genetic markers for linkage analysis.
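To illustrate why SSR alleles can be told apart by fragment size, the sketch counts tandem copies of a motif in two hypothetical (CA)n alleles with identical flanking sequence. The sequences and the function name are illustrative only. Each extra repeat lengthens the amplified fragment by the motif length.

```python
def tandem_repeat_count(seq: str, motif: str) -> int:
    """Longest run of consecutive copies of `motif` anywhere in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    k = len(motif)
    for start in range(len(seq)):
        count = 0
        pos = start
        while seq.startswith(motif, pos):
            count += 1
            pos += k
        best = max(best, count)
    return best


# Two hypothetical (CA)n alleles with identical flanking sequence.
allele_short = "GGT" + "CA" * 4 + "TTAG"
allele_long = "GGT" + "CA" * 7 + "TTAG"
for allele in (allele_short, allele_long):
    print(tandem_repeat_count(allele, "CA"), len(allele))
# 4 15
# 7 21
```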

Single nucleotide polymorphisms (SNPs)

These are simply point mutations that have reached a sufficiently high frequency to be useful for linkage and association analysis. The nucleotide-wise mutation rate is approximately 10^-6, and mutations occur at random within coding and noncoding regions, so that there is approximately one SNP per 1000 base pairs. Because of this density, SNPs are more useful for fine mapping of trait loci already known to lie within a certain chromosomal region.

Restriction Fragment Length Polymorphisms (RFLPs)

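Here this refers to a method of genotyping SNPs. PCR is used to amplify a segment of DNA that includes the SNP. A restriction enzyme is chosen whose recognition site is present for only one of the two variants at the SNP. It therefore cleaves the amplified DNA only when that variant is present, so each amplified copy yields either one fragment (uncut) or two fragments (cut). On a gel, homozygotes show one pattern or the other, and heterozygotes show both.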
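A toy simulation of the digest follows. The amplicon sequences are hypothetical, and the enzyme defaults mimic EcoRI, which recognises GAATTC and cuts between the G and the first A. Only one strand is modelled, ignoring overhangs. The sketch shows the band patterns expected for each genotype.

```python
def digest(amplicon: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Fragment lengths after cutting `amplicon` at every occurrence of `site`.

    Defaults mimic EcoRI (G^AATTC: cut 1 nt into the site); single strand only.
    """
    fragments = []
    start = 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


def band_pattern(allele1: str, allele2: str) -> list:
    """Distinct fragment sizes seen on a gel for a genotype (largest first)."""
    return sorted(set(digest(allele1) + digest(allele2)), reverse=True)


# Hypothetical 20-bp amplicon; the SNP is the A/G at the third base of the site.
cut_allele = "TTACG" + "GAATTC" + "AGGCTTTAC"
uncut_allele = "TTACG" + "GAGTTC" + "AGGCTTTAC"
print(band_pattern(cut_allele, cut_allele))      # [14, 6]
print(band_pattern(uncut_allele, uncut_allele))  # [20]
print(band_pattern(cut_allele, uncut_allele))    # [20, 14, 6]
```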