SOME NOTES ON HUMAN GENETICS AND GENETIC EPIDEMIOLOGY
Overview of genetic epidemiology
The introductory slides came from here.
Recurrence Risks
Familial aggregation of a trait is suggestive of, but does not prove, a genetic
contribution.
The recurrence risk is the probability a person expresses a disease
given they have an affected relative (of a given type). For example,
the population lifetime risk of developing psoriasis is approximately
2%, but if one has an affected parent or sibling the recurrence risk is
20%.
The sibling recurrence risk ratio (PRR_sib, often
referred to as λ_s) in this case is 10. The size of the PRR
determines how easily linkage studies can detect a gene underlying a
trait.
Classical twin study
More on this here.
The classical twin design dates from the 1920s, when diagnosis of identical (MZ) and
nonidentical (DZ) twins became reliable. It is assumed that twins in
the same household are exposed to shared environmental risk factors to
the same extent whether they are MZ or DZ. Then, MZ and DZ twins differ
only in the proportion of genes they share. Recurrence risks of disease
in a cotwin of a case can then be interpreted as follows:
| Pattern of recurrence risks | Interpretation |
|---|---|
| R_pop = R_DZ = R_MZ | No familial factors involved |
| R_pop < R_DZ = R_MZ | Family environmental factors |
| R_pop < R_DZ < R_MZ | Genetic factors +/- family environment |
| R_pop < R_DZ << R_MZ | Epistatic polygenes and/or gene-environment interaction |
Pooling two twin studies of psoriasis (Australian and Dutch):
| Twin type | Recurrence Risk | 95% Confidence Interval | PRR |
|---|---|---|---|
| MZ | 60% | 47-71% | 30 |
| DZ | 22% | 10-37% | 11 |
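The PRR values quoted so far can be checked with a few lines of arithmetic. This is only a sketch: the function name `recurrence_risk_ratio` is made up for illustration, and the inputs are the approximate risks quoted in these notes (2% population risk, 20% for a parent or sibling, 22% and 60% for DZ and MZ cotwins).

```python
def recurrence_risk_ratio(recurrence_risk, population_risk):
    """Risk in relatives of a case divided by the population risk."""
    if not 0 < population_risk <= 1:
        raise ValueError("population_risk must be in (0, 1]")
    return recurrence_risk / population_risk


POPULATION_RISK = 0.02  # approximate lifetime risk of psoriasis quoted in the notes

# Recurrence risks quoted in the text and in the pooled twin table
risks = {"parent or sibling": 0.20, "DZ cotwin": 0.22, "MZ cotwin": 0.60}

prr = {relative: recurrence_risk_ratio(risk, POPULATION_RISK)
       for relative, risk in risks.items()}

for relative, ratio in prr.items():
    print(f"{relative}: PRR = {ratio:.0f}")
```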
Twin concordance and recurrence risk
Estimation of recurrence risk in families is complicated by the ascertainment
mechanism, that is, the method of sampling or recruitment.
Example: The total population of twins (in the Australian NHMRC
Twin Registry say)
| | Twin 1 Aff | Twin 1 UnA |
|---|---|---|
| Twin 2 Aff | a | b |
| Twin 2 UnA | c | d |
The recurrence risk is a/(a+b) or a/(a+c),
or better still the "average" a/(a+0.5(b+c)).
In the usual situation, twins don't have a label, so one instead sees:
| | Aff-Aff | Aff-UnA | UnA-UnA |
|---|---|---|---|
| Count | a | b+c | d |
| Symbol | C | D | U |
The correct formula for the recurrence risk is then:
R = C/(C+0.5 D) = 2C/(2C+D).
I belabour the point because the pairwise concordance
(C/(C+D)) is
often quoted in the literature, and does not estimate the recurrence risk
in the same way as does this probandwise concordance.
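A small numerical comparison may make the difference concrete. The counts below are hypothetical, chosen only for illustration, and the function names are not from any library.

```python
def pairwise_concordance(c, d):
    """C / (C + D): proportion of pairs with an affected member that are concordant."""
    return c / (c + d)


def probandwise_concordance(c, d):
    """2C / (2C + D): recurrence risk in the cotwin of an affected twin."""
    return 2 * c / (2 * c + d)


# Hypothetical counts: 30 concordant (Aff-Aff) and 40 discordant (Aff-UnA) pairs
C, D = 30, 40

pairwise = pairwise_concordance(C, D)
probandwise = probandwise_concordance(C, D)

print(f"pairwise concordance    = {pairwise:.3f}")
print(f"probandwise concordance = {probandwise:.3f}")
```

With these counts the pairwise figure is about 0.43, while the probandwise figure is 0.60, the same value as a/(a+0.5(b+c)) with a = 30 and b+c = 40.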
The general rule for dealing with ascertainment
A proband is a family member whose disease status brought the family
into the study. If a family contains two probands, then it should
contribute twice to the estimation of recurrence risk. In the case of
complete ascertainment, every affected person is a proband.
This is the type of ascertainment one would see in a population survey where
all members of a family are automatically examined.
For the twin example above, every family with two affected twins is
counted twice, and every family with one affected twin is only counted
once, so,
| Quantity | Count |
|---|---|
| Number of times a cotwin is affected given the proband is affected | 2C |
| Total number of families (counting those with two probands twice) | 2C+D |
If one is carrying out a twin study in a clinic, often only one twin
will be the proband (patient). If the cotwin is subsequently found to
be affected, s/he will be a "secondary" case, and this family will
only be counted once. This is single ascertainment.
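The proband-counting rule can be written down directly for twin pairs. The sketch below assumes each pair is summarised as a tuple (number of affected twins, number of probands). That representation and the function name are illustrative choices, and the counts are hypothetical.

```python
def recurrence_risk_from_probands(pairs):
    """Count each pair once per proband; return the share of probands with an affected cotwin.

    pairs: iterable of (n_affected, n_probands) tuples, one per twin pair.
    """
    probands = 0
    affected_cotwins = 0
    for n_affected, n_probands in pairs:
        if not 1 <= n_probands <= n_affected <= 2:
            raise ValueError("need 1 <= n_probands <= n_affected <= 2")
        probands += n_probands
        if n_affected == 2:
            # every proband in a concordant pair has an affected cotwin
            affected_cotwins += n_probands
    return affected_cotwins / probands


# Hypothetical sample: 30 concordant and 40 discordant pairs
# Complete ascertainment: both affected twins in a concordant pair are probands
complete = recurrence_risk_from_probands([(2, 2)] * 30 + [(1, 1)] * 40)
# Single ascertainment: only one twin per pair is a proband
single = recurrence_risk_from_probands([(2, 1)] * 30 + [(1, 1)] * 40)

print(f"complete ascertainment: {complete:.3f}")  # 2C / (2C + D)
print(f"single ascertainment:   {single:.3f}")    # C / (C + D)
```

The two calls reuse the same counts only to show how the counting differs. In practice a population survey and a clinic series would not yield the same C and D.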
Caveats to the classical twin method
- Discordance of X-inactivation (MZ discordance for highly penetrant
X-linked disorders)
- Increased maternal hormone levels (breast and ovarian cancer possibly
increased in twins)
- Sharing a uterus (birth weight, complications, "fetal programming")
- Sharing a placenta (monochorionic versus dichorionic MZ twins)
- Inequality of parental treatment (MZ versus DZ).
- Greater MZ than DZ sharing of environmental exposures ("social
contagion", education, job choice)
Cotwin control study
One commonly matches cases and controls in observational and
intervention studies to control confounding. MZ twins are well matched
for genes and family environment, so only small sample sizes are
needed. Some examples:
- The shorter twin has a higher rate of atherosclerotic heart disease
than the taller cotwin.
- The twin with tibiofemoral osteophytes weighs 4 kg more than the
twin without radiographic changes.
- The smoking twin has lower bone density, larger carotid plaques etc.
- The twin who served in Vietnam has nine times the rate of PTSD.
- The schizophrenic twin has smaller hippocampi.
- The twin taking ascorbic acid gets just as many colds.
Mitochondrial inheritance
- Each cell and mitochondrion contains multiple copies of the
mitochondrial genome (the latter 2-10 copies).
- Mitochondria are transmitted from mother to child, so that
mitochondrial diseases are inherited matrilineally.
- The mitochondrial genome contains 37 genes (13 respiratory chain
proteins, 22 tRNAs, 2 rRNAs) in 16569 bp.
- The mutation rate of mitochondrial DNA is high (repair mechanisms
are inferior to those for nuclear DNA, and recombination is rare or absent).
- Several hundred mutations leading to disease are known: OMIM has
240 entries for mitochondrial inheritance, and lists 18 mutations
leading to Leber's hereditary optic neuropathy.
- Heteroplasmy refers to the fact that multiple different
mitochondrial genotypes can be present throughout one organism.
- Most disease mutations are "recessive" in nature, in that
they have to make up 70% or more of all mtDNA before disease occurs in that
organ (heteroplasmy can vary by tissue).
- The disease phenotype is quite variable within families due to
heteroplasmy.
- Mitochondrial diseases can affect all systems, but have a
predilection for CNS and muscular (including cardiac muscle) manifestations.
- A number of mitochondrial mutations increase with age through
replicative errors during mitosis, eg the "common deletion" (nt 8470-13447).
Prototypic mitochondrial diseases
Leber's hereditary optic neuropathy
(LHON, MIM 535000)
- Midlife central vision loss, cardiac conduction defects (WPW, LGL),
multiple sclerosis, more widespread CNS degeneration (putamen, basal ganglia).
- Several hundred families worldwide (including Australia).
- Male preponderance in Europeans (confusing the mode of inheritance).
- Three point mutations explain 90% of cases: at nucleotides
11778, 3460, and 14484. A better visual prognosis is associated with the
bp 14484 mutation (NADH dehydrogenase subunit 6, Met64Val).
Mitochondrial myopathy, encephalopathy, lactic acidosis, and
strokelike episodes (MELAS,
MIM 540000)
- Episodic vomiting, seizures, recurrent cerebral infarcts.
- Diabetes mellitus, sensorineural hearing impairment,
hypertrophic cardiomyopathy, ataxia, basal ganglia calcification,
ophthalmoplegia, cognitive decline, short stature.
- Ragged-red muscle fibre changes are seen occasionally.
- The 3243A>G mutation (tRNA-leu) has an estimated frequency of 16 per
100,000 in northern Finland.
- 3243A>G mutation is present in 1% of unselected
patients with diabetes mellitus, and 2-5% of those with a matrilineal
history of diabetes.
- 3243A>G mutation is present in 0.07% of unselected patients
with deafness, and 7% of those with a matrilineal history of
sensorineural deafness.
Myoclonic epilepsy associated with ragged-red fibres
(MERRF,
MIM 545000)
- Familial myoclonic epilepsy.
- 90% of cases are due to the point mutation 8344A>G.
- Histological muscle changes.
Parent-of-origin effects and imprinting
- In crosses between breeds/species (eg horse with donkey, lion with tiger),
it is known that the phenotype in the offspring depends on the specific
parental phenotype: imprinting.
- Several clusters of autosomal genes act differently depending on
which parent transmits them. Methylation of the CpG rich
imprinting control region (ICR) upstream from the
target loci leads to binding of methyl-CpG-binding proteins (eg
MeCP2) and subsequent compaction of chromatin and thus
gene silencing. In this context, imprinting refers to
inactivation of that specific parental allele in the
offspring. A side effect is asynchronous replication at the
locus (the imprinted allele replicates early).
- Different parentally derived alleles at loci in the same
imprinted region may be inactivated.
- Other mechanisms can give rise to the same parent-specific
transmission of phenotype. Maternal effects mediated by the hormonal
milieu of the fetus can reinforce a phenotype only when the child
carries the same genotype as the mother at a locus.
- Parent-of-origin effects in the triplet expansion diseases are
mediated by differential expansion of repeat number. For example in
Huntington's disease
(MIM 143100),
paternally transmitted alleles expand more than maternal alleles, so that
offspring inheriting the disease from a father are more severely affected,
and at an earlier age.
Examples of parent-of-origin effects
- Only the paternally derived IGF2 allele is expressed normally
after the 8 cell stage in embryonic development: maternal imprinting.
- At the neighbouring H19 locus, paternal imprinting is the norm.
- Both Beckwith-Wiedemann syndrome
(MIM 130650)
and a pure somatic overgrowth syndrome are associated with biallelic expression of IGF2.
Prader-Willi and Angelman ("happy puppet") syndrome
- Both loci are within a 2 Mb imprinting region on 15q11-13.
- Deletion of the promoter and first exon of the SNRPN gene along
with a small upstream ICR on the paternally derived chromosome leads to
Prader-Willi syndrome (MIM 600161),
and a similar deletion of the maternally derived chromosome to Angelman syndrome
(MIM 105830).
Small nuclear ribonucleoprotein polypeptide N (snRPN,
MIM 182279)
is predominantly expressed in brain.
- Uniparental disomy is the other mechanism whereby alleles derived
from only one parent are present in the region.
- Some Angelman families have simple point mutations in SNRPN.
Atopic disease
- Loci on chromosome 11q13 (FCER1B), 13q, and 3q have been
found to exhibit linkage only through one parent and not the other to
atopy, atopic dermatitis and asthma.
- FCER1B is in a region homologous to mouse imprinting regions, but
true imprinting has not been documented.
Linkage and association analysis
There is much material on this here.
Lod score linkage analysis
Lod score linkage analysis requires a roughly correct model for transmission
of the trait. This is used to estimate the most likely trait locus genotype
each member of a family carries.
The recombination distance between the trait locus and one or more
known marker loci can then be estimated. The strength of evidence for linkage
(recombination fraction < 50%) is traditionally expressed as the decimal log
odds ratio: the lod score. Given the size of the human genome, a lod
score of 3 (equivalent to a P-value of 2 x 10^-4) is regarded as
significant evidence for linkage, and a lod score of -2 as definitively
excluding linkage.
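For the simplest case, where every meiosis can be scored unambiguously as recombinant or nonrecombinant (phase known), the lod score has a closed form. The sketch below uses that simplification, which real pedigrees rarely satisfy. The conversion to a P-value assumes the usual large-sample chi-square approximation with one degree of freedom, which is an assumption on my part and not something these notes derive. Function names are illustrative.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** nonrecombinants
    if likelihood == 0.0:
        return -math.inf  # theta = 0 is impossible once a recombinant is seen
    return math.log10(likelihood) - meioses * math.log10(0.5)


def lod_to_p_value(lod):
    """Approximate two-sided P-value, treating 2*ln(10)*lod as chi-square on 1 df."""
    chi_square = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(chi_square / 2.0))


print(f"{lod_score(0, 10, 0.0):.2f}")   # ten meioses, no recombinants
print(f"{lod_score(1, 10, 0.1):.2f}")   # one recombinant in ten, theta = 0.1
print(f"{lod_to_p_value(3.0):.1e}")     # close to the 2 x 10^-4 quoted above
```

Ten nonrecombinant meioses are just enough to reach a lod score of 3 at a recombination fraction of zero. Halving the P-value gives the one-sided figure of about 10^-4 that is also quoted in the literature.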
Nonparametric linkage analysis
A number of statistical methods have been developed that do not require
a trait locus model to be specified. The most common approach one will
encounter in the literature is the affected sib pair
method. This requires families containing two affected siblings, and can be
shown to be equivalent to a lod score linkage analysis assuming a
completely recessive model for the trait locus.
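One simple form of the affected sib pair test compares the number of pairs sharing 0, 1 or 2 alleles identical by descent (IBD) at the marker with the 1:2:1 proportions expected when there is no linkage. The sketch below assumes IBD status is known without ambiguity for every pair, which needs fully informative parental genotypes. The counts are hypothetical and the function name is illustrative.

```python
import math


def asp_ibd_chi_square(n0, n1, n2):
    """Goodness-of-fit test of IBD sharing counts against 1/4, 1/2, 1/4."""
    observed = (n0, n1, n2)
    total = sum(observed)
    if total == 0:
        raise ValueError("no sib pairs supplied")
    expected = (total / 4.0, total / 2.0, total / 4.0)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # The chi-square survival function on 2 degrees of freedom is exp(-x/2)
    p_value = math.exp(-chi_square / 2.0)
    return chi_square, p_value


# Hypothetical data: 100 affected sib pairs sharing 0, 1 and 2 alleles IBD
chi_square, p_value = asp_ibd_chi_square(15, 45, 40)
print(f"chi-square = {chi_square:.1f}, P = {p_value:.4f}")
```

Excess sharing (more pairs with 2 alleles IBD and fewer with 0 than expected) is the pattern that points to linkage. This two-degree-of-freedom test does not itself check the direction of the deviation.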
Haseman-Elston regression and variance component linkage analysis are
nonparametric methods that can be applied to continuous traits (eg blood
pressure).
Transmission-disequilibrium test
The transmission-disequilibrium test (TDT) tests the transmission
ratio of alleles at a marker locus from a heterozygous parent to an
affected child. There is distortion of the ratio from
the Mendelian 50:50 only if the marker locus is linked to the trait
locus and exhibits allelic association. If a marker
locus is close (tightly linked) to the trait locus, then allelic
association may be observed, but association does not automatically
imply tight linkage.
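In its usual form for a biallelic marker, the TDT statistic is a McNemar-type chi-square on the counts of transmitted and untransmitted copies of one allele from heterozygous parents. The sketch below uses hypothetical counts, an illustrative function name, and the large-sample chi-square approximation with one degree of freedom.

```python
import math


def tdt(transmitted, untransmitted):
    """TDT chi-square and approximate P-value for one allele.

    transmitted / untransmitted: number of heterozygous parents who did / did not
    pass the allele of interest to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Survival function of chi-square on 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical data: allele transmitted 60 times and not transmitted 40 times
chi_square, p_value = tdt(60, 40)
print(f"chi-square = {chi_square:.2f}, P = {p_value:.3f}")
```

Only heterozygous parents contribute, since a homozygous parent transmits the same allele whatever the child's disease status.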
Genetic marker loci: SNPs, RFLPs, microsatellites
To be useful in the detection of cosegregation of alleles at a marker locus
with a trait, the marker needs to be polymorphic. This is measured
usually as the expected heterozygosity of the marker
(proportion of individuals expected to be heterozygous based on the allele
frequencies).
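Under random mating, the expected heterozygosity is one minus the sum of squared allele frequencies. A minimal sketch, with hypothetical allele frequencies and an illustrative function name:

```python
def expected_heterozygosity(allele_frequencies):
    """1 - sum(p_i^2), assuming Hardy-Weinberg proportions."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Hypothetical markers
snp = expected_heterozygosity([0.5, 0.5])         # biallelic marker at its maximum
rare_snp = expected_heterozygosity([0.9, 0.1])    # biallelic marker, rare allele
microsatellite = expected_heterozygosity([0.1] * 10)  # ten equally common alleles

print(f"{snp:.2f} {rare_snp:.2f} {microsatellite:.2f}")
```

A biallelic marker can never exceed 0.5, whereas a marker with many alleles of similar frequency approaches 1, which is why multiallelic markers are more informative per locus.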
Simple Sequence Repeat polymorphisms (SSRs)
These are common polymorphic loci where alleles differ in the number of
repeats of a sequence of 2, 3, 4 or more nucleotides. Also known as
microsatellites or Short Tandem Repeat Polymorphisms. After PCR
amplification of a length of DNA that includes the SSR, the different
size fragments are identified by gel electrophoresis. They are
currently the most utilised genetic markers for linkage analysis.
Single nucleotide polymorphisms (SNPs)
These are simply point mutations of a sufficiently high frequency to be
useful for linkage and association analysis. The nucleotide-wise
mutation rate is approximately 10^-6, and mutations occur randomly
within coding or noncoding regions, so that there is approximately one
SNP per 1000 base pairs. Because of this density, they are more useful for
fine mapping of trait loci known to be within a certain
chromosomal region.
Restriction Fragment Length Polymorphisms (RFLPs)
This refers to a method of genotyping SNPs. PCR is used to amplify a
segment of DNA that includes the SNP. A restriction enzyme is chosen that
recognises one of the two variants at a SNP, and will cleave the DNA strand
only when that variant is present, so that either one fragment (uncut) or
two fragments (cut) will be formed.
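The idea can be mimicked on a toy sequence. The sketch below is an illustration only: the amplicon sequences are invented, only one strand is modelled, and the enzyme is taken to be EcoRI (recognition site GAATTC, cutting after the G).

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of the site.

    cut_offset: number of bases into the recognition site at which the cut falls.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Invented 46 bp amplicons that differ at a single nucleotide inside the site
FLANK_LEFT = "ACGT" * 5
FLANK_RIGHT = "TTGCA" * 4
allele_cut = FLANK_LEFT + "GAATTC" + FLANK_RIGHT    # site present
allele_uncut = FLANK_LEFT + "GAGTTC" + FLANK_RIGHT  # site abolished by the SNP

print(digest(allele_cut, "GAATTC", 1))
print(digest(allele_uncut, "GAATTC", 1))
```

On a gel, a homozygote for the cut allele would show the two short bands, a homozygote for the uncut allele the single full-length band, and a heterozygote all three.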