5. Mutations and selection in mouse

 

Alec MacAndrew

 

 

The draft mouse genome was published on 6th December 2002 (Waterston et al., Nature 420, 520-562).

Note that this is a 43-page paper (Nature averages 2-3 pages per paper) with around 200 authors and 330 references. It is all new to science, and if one includes the references, the volume of material is greater than that of a very fat textbook. The detail is published not in a single paper but in about six related papers, which occupy more than half of the super-fat 6th December issue of Nature.

Selection in the Mouse genome

Deletions

We turn now to the two principal mechanisms of evolution, mutation and selection, as evidenced in the mouse genome.

First, comparison of the two genomes by detailed nucleotide alignment indicates that 40% of the human genome can be aligned to the mouse genome. At first sight this percentage seems low, since we have already seen that more than 98% of mouse genes have a homologue in man and that, within proteins, there is ~70% identity between mouse and man. But those two percentages refer to conserved, functional protein-coding genes. The figure of only 40% from detailed nucleotide alignment refers to the entire genome, including the vast tracts of functionless DNA that are not under selection.

The 40% can be explained as follows. By identifying lineage-specific repeat sequences (see the article on repeat sequences), we know that some 700Mb of such sequence has been acquired in the human lineage since the mouse and human lineages diverged. But it seems likely that the common ancestor had a genome of similar length to those of extant mammals (about 2.9Gb). In that case there must have been about 700Mb of deletions in the human lineage since divergence to keep the human genome size constant, implying that 76% of the common ancestor's genome survived and 24% was lost. It would also imply a gross loss of about 1300Mb in mouse, offsetting the known gain of 900Mb of lineage-specific repeat sequences (since the net loss in mouse is 400Mb). For mouse this would imply 55% retention and 45% deletion of the genome of the common human-mouse ancestor.
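The genome-size bookkeeping above can be written out as a small calculation. This is a sketch that treats the round figures quoted in the text (a 2.9Gb ancestral genome, 700Mb of human-specific repeats with no net size change, 900Mb of mouse-specific repeats and a 400Mb net loss in mouse) as exact inputs; the function name `retention` is purely illustrative.

```python
def retention(ancestral_mb, gained_mb, net_change_mb):
    """Fraction of the ancestral genome surviving in one lineage.

    Gross deletion = sequence gained minus the net change in genome size;
    whatever was not deleted was retained.
    """
    deleted_mb = gained_mb - net_change_mb
    return (ancestral_mb - deleted_mb) / ancestral_mb, deleted_mb


ANCESTRAL_MB = 2900  # assumed ancestral genome size (~2.9Gb)

# Human: ~700Mb of lineage-specific repeats gained, genome size roughly unchanged.
human_kept, human_deleted = retention(ANCESTRAL_MB, gained_mb=700, net_change_mb=0)
# Mouse: ~900Mb of lineage-specific repeats gained, net loss of ~400Mb.
mouse_kept, mouse_deleted = retention(ANCESTRAL_MB, gained_mb=900, net_change_mb=-400)

print(f"human: {human_deleted} Mb deleted, {human_kept:.0%} retained")
print(f"mouse: {mouse_deleted} Mb deleted, {mouse_kept:.0%} retained")
```

With these inputs the sketch reproduces the 76% and 55% retention figures (and the 700Mb and 1300Mb gross deletions) given in the text.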

The expected proportion of the ancestral genome retained in both genomes is thus 76% x 55% ≈ 42%, very close to the 40% alignment observed. Taking the divergence of the mouse and human lineages to have occurred 75 million years ago gives a rate of deletion in the human lineage of approximately 9Mb per million years. This rate is supported by comparisons of the human and baboon genomes: the deletion of DNA in the human lineage since the divergence of man from baboon is 174Mb in about 22 million years, or roughly 8Mb per million years.

The rate of deletion in the mouse lineage appears to be roughly twice that in the human lineage (about 1300Mb against 700Mb), which is similar to the ratio of rates observed for nucleotide substitutions.
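The remaining figures in this argument follow from simple arithmetic. A minimal sketch, assuming that the two lineages lose ancestral sequence independently (so the fraction surviving in both is the product of the per-lineage fractions) and taking the text's per-lineage retention of 76% and 55%, gross deletions of 700Mb and 1300Mb, and split times of 75 and 22 million years as exact:

```python
HUMAN_KEPT, MOUSE_KEPT = 0.76, 0.55  # per-lineage retention, from the text
SPLIT_HUMAN_MOUSE_MY = 75            # assumed divergence time, millions of years
SPLIT_HUMAN_BABOON_MY = 22

# Independent losses: sequence alignable in both genomes must survive in both.
shared = HUMAN_KEPT * MOUSE_KEPT
print(f"expected alignable fraction: {shared:.0%}")

# Deletion rates in Mb per million years.
human_rate = 700 / SPLIT_HUMAN_MOUSE_MY      # human lineage, from mouse comparison
baboon_check = 174 / SPLIT_HUMAN_BABOON_MY   # human lineage, from baboon comparison
mouse_rate = 1300 / SPLIT_HUMAN_MOUSE_MY     # mouse lineage
print(f"human lineage: {human_rate:.1f} Mb/My (baboon-based estimate {baboon_check:.1f})")
print(f"mouse/human deletion ratio: {mouse_rate / human_rate:.1f}")
```

The independence assumption is the key simplification: if the same ancestral segments tended to be lost in both lineages, the shared fraction would be higher than the simple product.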

Neutral Substitutions

Neutral substitutions are mutations that do not affect the organism because they have no influence on the phenotype. They are therefore not affected by natural selection, which in most cases acts to eliminate deleterious mutations from the species gene pool and to favour beneficial ones. Their rate of fixation in the genome is therefore more or less constant over time.

The neutral substitution rate can be established by measuring the differences between orthologous ancestral repeat sequences (repeats inserted before the mouse and human lineages split, and therefore present in both genomes). This analysis indicates a neutral divergence of 0.46 to 0.47 substitutions per neutral site since the divergence of mouse from man. Since substitutions have occurred about twice as fast in the mouse lineage as in the human lineage, this divergence splits into roughly 0.31 substitutions per site in mouse (4 x 10^-9 per site per year) and 0.16 per site in man (2 x 10^-9 per site per year). Note that this is slightly lower than the number quoted on the previous page in this series, as lineage-specific repeat sequences are used to derive this number.
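The split of the total neutral divergence between the two lineages, and its conversion to a per-year rate, can be sketched as follows. The sketch assumes an exact 2:1 mouse:human rate ratio, uses the midpoint (0.465) of the quoted 0.46-0.47 range, and takes a 75-million-year divergence; all three are simplifications of the figures in the text.

```python
TOTAL_DIVERGENCE = 0.465  # substitutions per neutral site, mouse + human branches
RATE_RATIO = 2.0          # mouse lineage rate / human lineage rate (assumed exact)
YEARS = 75e6              # assumed time since the human-mouse split

# The two branches share the total in proportion RATE_RATIO : 1.
human_branch = TOTAL_DIVERGENCE / (1 + RATE_RATIO)
mouse_branch = TOTAL_DIVERGENCE - human_branch

for name, branch in (("mouse", mouse_branch), ("human", human_branch)):
    per_year = branch / YEARS  # substitutions per site per year
    print(f"{name}: {branch:.3f} per site, {per_year:.1e} per site per year")
```

This gives about 0.31 and 0.155 substitutions per site, or roughly 4 x 10^-9 and 2 x 10^-9 per site per year, in line with the rounded figures above.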

How does this compare with a different class of site which should also be neutral? This class of site is called four-fold degenerate, and it occurs within functional genes. We will, of course, remember that there are 20 amino acids but 64 possible codons of three bases (three of which are stop codons) that code for them. There is therefore redundancy in coding: several different codons represent the same amino acid. (We will also remember that a mutation that changes one codon to another coding for the same amino acid is called a synonymous mutation: it has no effect on the amino acid, the protein or the organism.) Because of the way these synonymous codons are arranged (see the table below, in which the four-fold degenerate cases are highlighted), there are eight cases where the third base of the codon can be any one of the four bases and we still get the same amino acid. These sites are four-fold degenerate: it doesn't matter what the base at these sites is, because the amino acid, protein and organism are not affected. Because these sites are in the middle of functional genes, they provide a superb check on the neutral rate as measured in the definitely non-functional ancestral repeat sequences. And the answer? The estimated number of substitutions per four-fold degenerate site is 0.46 to 0.47, practically identical to the 0.46 to 0.47 substitutions per site in ancestral repeat sequences. This is a very beautiful result, since these rates are measured at sites in very different genomic environments, and it can only be explained by mutation and selection acting over millions of years, according to evolutionary theory.

 

First  Second base
base   U            C            A            G
U      UUU Phe      UCU Ser *    UAU Tyr      UGU Cys
       UUC Phe      UCC Ser *    UAC Tyr      UGC Cys
       UUA Leu      UCA Ser *    UAA STOP     UGA STOP
       UUG Leu      UCG Ser *    UAG STOP     UGG Trp
C      CUU Leu *    CCU Pro *    CAU His      CGU Arg *
       CUC Leu *    CCC Pro *    CAC His      CGC Arg *
       CUA Leu *    CCA Pro *    CAA Gln      CGA Arg *
       CUG Leu *    CCG Pro *    CAG Gln      CGG Arg *
A      AUU Ile      ACU Thr *    AAU Asn      AGU Ser
       AUC Ile      ACC Thr *    AAC Asn      AGC Ser
       AUA Ile      ACA Thr *    AAA Lys      AGA Arg
       AUG Met      ACG Thr *    AAG Lys      AGG Arg
G      GUU Val *    GCU Ala *    GAU Asp      GGU Gly *
       GUC Val *    GCC Ala *    GAC Asp      GGC Gly *
       GUA Val *    GCA Ala *    GAA Glu      GGA Gly *
       GUG Val *    GCG Ala *    GAG Glu      GGG Gly *

The table of mRNA triplet codons and the amino acids they code for. Rows give the first base of each codon and columns the second base. Note that in RNA, the T (thymine) of DNA is replaced by U (uracil); mRNA is transcribed from the antisense (template) strand of DNA. Codons whose third base is four-fold degenerate are highlighted with *.
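The count of eight four-fold degenerate codon families can be checked mechanically. The sketch below encodes the standard genetic code as a 64-character string of one-letter amino acid codes (the compact layout used for NCBI's translation table 1, with bases in the order U, C, A, G and `*` for a stop codon) and looks for two-base prefixes whose four codons all give the same amino acid. The helper name `fourfold_families` is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acid codes; codons ordered with the
# first, second and third bases each running through U, C, A, G ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def fourfold_families(code):
    """Return (prefix, amino acid) for each two-base prefix whose third base is free."""
    families = []
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        encoded = {code[prefix + third] for third in BASES}
        # Four-fold degenerate: all four third-base choices give one amino acid.
        if len(encoded) == 1 and "*" not in encoded:
            families.append((prefix, encoded.pop()))
    return families


families = fourfold_families(CODE)
print(len(families))  # 8
for prefix, aa in families:
    print(prefix + "N", aa)
```

The eight families found are Ser (UCN), Leu (CUN), Pro (CCN), Arg (CGN), Thr (ACN), Val (GUN), Ala (GCN) and Gly (GGN), matching the starred codons in the table.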

Conservation within genes

We can now focus on conservation within genes. By looking at the substitution rate along genes, we can see which parts of genes are more conserved than others.

The coding regions of genes are distinctive: they show 0.165 substitutions per nucleotide site since the mouse-human divergence (compared with 0.47 substitutions per neutral site), and alignment gaps are tenfold less common than in non-coding sites.
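One rough way to read these numbers is as a ratio: coding sites have accumulated only about a third of the substitutions seen at neutral sites. The sketch below takes that ratio and, as a simplifying assumption, attributes the whole shortfall to selection removing mutations; it ignores the fact that some coding sites (such as the four-fold degenerate ones) are themselves nearly neutral, so it is a crude average rather than a measured quantity.

```python
CODING_SUBS_PER_SITE = 0.165   # coding regions, since the human-mouse split
NEUTRAL_SUBS_PER_SITE = 0.47   # neutral sites (ancestral repeats)

# Observed coding rate relative to the neutral expectation.
constraint_ratio = CODING_SUBS_PER_SITE / NEUTRAL_SUBS_PER_SITE
# Crude estimate: the shortfall is attributed to selection (an assumption).
removed = 1 - constraint_ratio

print(f"coding/neutral ratio: {constraint_ratio:.2f}")
print(f"share of mutations apparently removed: {removed:.0%}")
```

On these figures, coding sites change at about 35% of the neutral rate, so roughly two thirds of the mutations arising there appear not to have survived.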

Introns (the non-coding DNA sequences that interrupt the exons, or coding sequences, of genes) have substitution rates and alignment-gap frequencies very similar to those of the general non-functional portions of the genome. It seems, therefore, that most intronic sequence has no function encoded within it, contrary to some recent hypotheses.

5' UTRs and 3' UTRs (untranslated regions) lie midway between coding and neutral regions with regard to conservation, as do promoter regions (although these are somewhat less conserved than UTRs). The very highest conservation is at the splice sites, where the introns are removed and the exons are joined together to form messenger RNA. The polyadenylation signal AATAAA (which directs the addition of the poly-A tail) is also centred on a region of high conservation. This is very much compatible with evolutionary theory.

The conservation of gene structure between mouse and man is truly remarkable. Of orthologous genes, 86% have the same number of coding exons and 46% have an identical coding sequence length, and 91% of orthologous human-mouse exon pairs have identical exon length. By contrast, only 1% of orthologous introns have identical length. This is very strong evidence for conservation in coding exons and for the lack of conservation in non-coding introns, and it can only be explained by common ancestry and the action of mutation and selection.

How much of the Genome is under Selection

An analysis of 100bp windows of aligned human-mouse sequence indicates that the conserved functional subset of the genomes (the part of the two genomes that is under selection) makes up about 5% of the genome.

This is a very important (and unexpected) finding. Coding regions make up only 1.5% of the genome, so about 3.5% of the mammalian genome is under selection outside coding regions, a much higher proportion than expected. Some of this can be explained: about 1% comes from the untranslated regions of coding genes, which brings the expected conserved sequence to 2.5%. The remaining 2.5% may include regions that control gene expression, non-protein-coding RNAs and elements that contribute to chromosome structure. Nevertheless, a significant amount of conserved DNA sequence remains unexplained, and more work is required to unravel the reason for the higher-than-expected percentage of conserved sequence.
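The accounting in the previous paragraph is a simple budget of percentages. A minimal sketch using only the figures quoted there; the category labels are descriptive, not formal annotations.

```python
# Percentages of the genome, as quoted in the text.
UNDER_SELECTION = 5.0
budget = {
    "protein-coding sequence": 1.5,
    "untranslated regions of coding genes": 1.0,
}

explained = sum(budget.values())
unexplained = UNDER_SELECTION - explained

for category, share in budget.items():
    print(f"{category}: {share:.1f}%")
print(f"accounted for: {explained:.1f}%")
print(f"left for regulatory elements, non-coding RNAs, chromosome structure, etc.: {unexplained:.1f}%")
```

Half of the conserved 5% is accounted for by known gene features, which is why the remaining 2.5% stands out.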

Variation in rate over the genome

The data indicate that the mutation rate differs significantly from place to place in the genome. For example, the lowest neutral substitution rate is found on the X chromosome. That is consistent with the finding that the mutation rate is lower (perhaps by half) in the female germ line than in the male germ line. (The germ line is the lineage of cells, passing through meiosis, that gives rise to eggs and sperm.) Because the proportion of time spent in the female germ line is 2/3 for the X chromosome and 1/2 for autosomes (non-sex chromosomes), the substitution rate for the X chromosome should be 8/9, or about 89%, of the autosomal rate: (2/3 x 0.5 + 1/3 x 1.0)/(1/2 x 0.5 + 1/2 x 1.0) = 8/9. And what is it? It is 87% for four-fold degenerate sites and 92% for ancestral repeat sites. Another gorgeous result.
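The expected X-to-autosome ratio can be computed exactly with rational arithmetic. This sketch assumes, as the text does, that the female germ-line mutation rate is exactly half the male rate and that the sex ratio is equal; the function name `relative_rate` is illustrative.

```python
from fractions import Fraction


def relative_rate(female_share, female_rate=Fraction(1, 2), male_rate=Fraction(1)):
    """Mean mutation rate for a chromosome, given the share of time it spends in
    the female germ line (the remaining time is spent in the male germ line)."""
    return female_share * female_rate + (1 - female_share) * male_rate


# X: two of every three copies are carried by females; autosomes: half in each sex.
x_rate = relative_rate(Fraction(2, 3))
autosome_rate = relative_rate(Fraction(1, 2))
ratio = x_rate / autosome_rate

print(ratio, f"({float(ratio):.0%})")  # 8/9 (89%)
```

Changing `female_rate` shows how sensitive the prediction is to the assumed male-female difference: the stronger the male bias, the lower the expected X-to-autosome ratio.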

Another fascinating result is that the rate of substitution at a particular place in the genome is correlated with other evolutionary rates, such as the rate of insertion of repeat sequences. Most interestingly, regions with high rates of substitution and insertion tend to coincide with regions of high recombination. It seems that these features are caused by underlying local variations in mutation rate, and therefore by variations in DNA metabolism along the genome.

Conclusion

The sequencing of the mouse genome and its comparison with the human genome have yielded a wealth of fascinating findings, many of which are entirely consistent with the common ancestry of human and mouse and the action of mutation and selection, and which cannot be explained by other mechanisms.


 
