Understanding the Code of Life is key to understanding how our bodies function and how they are evolving.  This second post in a series explains the Code of Life.


Scientists predicted, based on a vestigial organ in the human body, that the human genome should contain a broken version of a particular reptile gene.  This predicted gene has now been discovered, in the hidden 'comment' lines of the human genetic code.



Part 1 - the code of computers

Part 2 - the code of life (below)

Part 3 - why they're similar

Part 4 - and why that's significant


The Code of Life

This is a eukaryotic cell (image from gene therapy review):


The human body contains about 10,000,000,000,000 of them.  Each one is only about 0.01 mm in length, but it contains an immensely varied chemical soup.  Everything the human body does is controlled by the way the cells move or the chemicals the cells release, and that's determined by which chemicals in the soup react with each other.

The cell controls which reactions happen in two ways.  Firstly it uses membranes to divide the cell into different 'reaction chambers'.  And secondly it uses proteins as catalysts.  Each specific catalyst has a dented surface that exactly matches the particular chemicals in the soup that it wants to react.  Without the catalyst the chemicals bump together at random, only reacting if they happen to hit each other facing in exactly the right direction.  With the catalyst the correct orientation happens every time, and the speed of the reaction is increased by a factor of a million or even a billion.

But it wasn't until 1953 that we began to understand where the cell stores the information about which protein catalysts to make, and it took until the 1960s to crack the code itself.


Locating the Code

It was known that the nucleus of the cell contained structures called chromosomes (chromosome images from Wikipedia):

Humans have 23 pairs of them:

In 1953, James Watson and Francis Crick worked out that the DeoxyriboNucleic Acid (DNA) our chromosomes are made of is arranged in a double-helix structure (image from Wikipedia).  Read more about Watson & Crick, the basics of DNA and how it "does its thing" in these past journal posts.

Genetic information is encoded in this twisted ladder using the sequence of 'base pairs' that make up the rungs of the ladder.  There are 4 bases, which are known by their initial letters: T, C, A & G.  Each rung is a pair of them (A always pairs with T, and C with G), so reading the letters along one side of the ladder is enough to recover the whole sequence.

Chromosomes vary in length, but average on the order of 100,000,000 base pairs (source).  Each base pair is about 0.34 nanometers long, which means that the DNA in each cell, if fully unwound, would stretch to somewhere between 1.5 and 2 meters.
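As a rough check of that arithmetic, here is a small sketch that uses only the figures quoted above (23 pairs of chromosomes, an average of 100,000,000 base pairs each, 0.34 nanometers per base pair).  The function and variable names are my own, and the average is a round number, so treat the answer as an order-of-magnitude estimate rather than a measurement.

```python
# Rough estimate of the total length of DNA in one human cell,
# using the rounded figures quoted in the text.
CHROMOSOMES_PER_CELL = 23 * 2      # 23 pairs
AVG_BASE_PAIRS = 100_000_000       # rounded average per chromosome
NM_PER_BASE_PAIR = 0.34            # length of one base pair, in nanometers


def dna_length_meters(chromosomes=CHROMOSOMES_PER_CELL,
                      avg_base_pairs=AVG_BASE_PAIRS,
                      nm_per_base_pair=NM_PER_BASE_PAIR):
    """Return the end-to-end length of a cell's DNA in meters."""
    total_base_pairs = chromosomes * avg_base_pairs
    return total_base_pairs * nm_per_base_pair * 1e-9  # nanometers -> meters


if __name__ == "__main__":
    print(f"{dna_length_meters():.2f} m")  # 1.56 m
```

With these rounded inputs the estimate comes out at roughly one and a half meters.  A somewhat larger average chromosome length (an assumption, not a figure from the text) brings it closer to 2 meters.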


Cracking the Code

In 1961, Francis Crick and Sydney Brenner demonstrated that these base pairs are arranged into groups of three, known as "codons", such as: TGG or CTA, each of which corresponds to a particular amino acid (image from NASA):

Actually working out which codon corresponded to which amino acid was done in a massive race by labs around the world, triggered by Crick and Brenner's announcement.  Each lab synthesised a sequence of base pairs, such as "CCC-CCC-CCC-CCC" and then fed it into the right part of a cell, to see which amino acid then got produced.

Not all the codons simply correspond to an amino acid.  Some combinations are interpreted as "STOP SEQUENCE", and one codon that normally encodes an amino acid also serves as "START SEQUENCE".  These START and STOP signals are used to divide the chromosome into regions called "genes", each gene sequence corresponding to a particular sequence of amino acids that forms a specific protein catalyst.
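To make the idea of reading codons between a START and a STOP concrete, here is a minimal sketch in Python.  The table is the standard genetic code written with one-letter amino acid abbreviations and '*' for STOP.  The function name and the short test sequence are made up for illustration, and the sketch assumes an upper-case string of T, C, A and G.  It treats ATG (which also encodes the amino acid methionine) as the START signal, and it ignores many details of how real cells find genes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks STOP.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from the first ATG (START) up to the first STOP codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""                      # no START signal found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":          # STOP: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Made-up sequence: GG | ATG TGG CTA TAA | GG
    print(translate("GGATGTGGCTATAAGG"))   # MWL
    print(CODON_TABLE["CCC"])              # P (proline)
```

The table has 64 entries: 61 codons that encode one of 20 amino acids, and 3 that mean STOP.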

(If you'd like to know more about the code, and how it was discovered, I recommend as a good starting place this Introduction to the subject. )



A huge thanks to Clairwil for this special series that will walk us through using human genetics to establish the ancestry of our species.  For more from Clairwil, or discussion on the topic of evolution, see her new group Debate Evolution vs Creationsim.  It's not just for debate - by reading, you'll learn a lot about evolution, genetics and what's new in our scientific understanding of where we came from.



Jul. 20, 2011 at 7:27 PM

The above journal entry was originally published by science_spot

I have archived it here, in case CafeMom's changes to journals make the original inaccessible.


Message Friend Invite (Original Poster)

Jul. 20, 2011 at 7:30 PM

The Third Discoverer

Rosalind Franklin, by the way, also worked out that DNA had a double helix structure, and might have shared the Nobel Prize, if she hadn't tragically died of cancer at the age of 37.

WikiPedia relates:

By January 1953, Franklin had reconciled her conflicting data and had started to write a series of three draft manuscripts, two of which included a double helical DNA backbone. Her two A form manuscripts reached Acta Crystallographica in Copenhagen on 6 March 1953, one day before Crick and Watson had completed their model. Franklin must have mailed them while the Cambridge team was building their model, and certainly had written them before she knew of their work. On 8 July 1953 she modified one of these "in proof" Acta articles "in light of recent work" by the King's and Cambridge research teams.

You can read more about her story, and the sexism she faced, at: (WikiPedia) (spartacus) and (sdsc)


Jul. 20, 2011 at 7:31 PM

By the way, if you like beautiful results, check out:

Perez,  Jean-Claude [2010] "Codon Populations in Single-stranded Whole Human Genome DNA Are Fractal and Fine-tuned by the Golden Ratio 1.618"

He has linked the frequency with which the codons appear to a fractal, known formally as the 'paper folding' or 'dragon' curve, based around the Golden Ratio (approximately 1.618) beloved of artists and appearing many times in nature (such as spiral shells) because it derives from a fundamental property of recursive relationships, and nature loves recursion.

You'll realise how it got its name when you see the fractal itself:

It does look a bit like a Chinese dragon, doesn't it?


Jul. 20, 2011 at 7:31 PM

The code described above is nearly universal, but there are a few exceptions, and these can tell us a great deal about our history.  For example, it has long been suggested that one particular part of the cell (the mitochondrion, which is the cell's power plant) started off as a separate life form which got subsumed into the cell via symbiosis.  Mitochondria have their own set of genetic info, not kept in the cell's nucleus.  From (source):

Mitochondrial genes

When mitochondrial mRNA from animals or microorganisms (but not from plants) is placed in a test tube with the cytosolic protein-synthesizing machinery (amino acids, enzymes, tRNAs, ribosomes) it fails to be translated into a protein.

The reason: these mitochondria use UGA to encode tryptophan (Trp) rather than as a chain terminator. When translated by cytosolic machinery, synthesis stops where Trp should have been inserted.

In addition, most

  • animal mitochondria use AUA for methionine not isoleucine and
  • all vertebrate mitochondria use AGA and AGG as chain terminators.
  • Yeast mitochondria assign all codons beginning with CU to threonine instead of leucine (which is still encoded by UUA and UUG as it is in cytosolic mRNA).
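The effect described in that quote can be sketched in a few lines of Python.  The standard table below is the usual genetic code written in RNA letters.  The vertebrate mitochondrial table applies only the four differences the quote lists for vertebrates (UGA, AUA, AGA, AGG).  The function name and the short message are made up for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a chain terminator.
STANDARD = dict(zip(
    ("".join(codon) for codon in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

# The differences quoted above for vertebrate mitochondria.
VERTEBRATE_MITO = {**STANDARD,
                   "UGA": "W",    # tryptophan instead of STOP
                   "AUA": "M",    # methionine instead of isoleucine
                   "AGA": "*",    # STOP instead of arginine
                   "AGG": "*"}    # STOP instead of arginine


def translate(mrna, table):
    """Read codons from the first letter until a chain terminator."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    message = "AUGUGAAUAGGCUAA"   # made-up: AUG UGA AUA GGC UAA
    print(translate(message, STANDARD))          # M
    print(translate(message, VERTEBRATE_MITO))   # MWMG
```

Read with the standard table, the message is cut short at UGA, which is the failure the test-tube experiment shows.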


Jul. 20, 2011 at 7:31 PM

As mentioned above in connection with the Dragon curve, the code contains redundancy (the same amino acid can be encoded in several ways), and not all these ways are equally likely.  From (source):

All but two of the amino acids (Met and Trp) can be encoded by from 2 to 6 different codons. However, the genome of most organisms reveals that certain codons are preferred over others. In humans, for example, alanine is encoded by GCC four times as often as by GCG. This probably reflects a greater translation efficiency by the translation apparatus for certain codons over their synonyms.

  • At the start of translation, two or more of a set of synonymous codons (e.g., the 6 codons that incorporate leucine in the growing protein) are used alternately. The need to locate first one and then another tRNA for that amino acid slows down the rate of translation.
    • This may aid in keeping ribosomes from bumping into each other on the polysome.
    • It may also provide more time for the nascent protein to begin to fold correctly as it emerges from the ribosome.
  • Once translation is well underway (after 30-50 amino acids have been added), one particular codon tends to be chosen each time its amino acid is called for. Presumably this now increases the efficiency, i.e., speed, of translation.
  • Most organisms have more than the 61 genes needed to encode a tRNA for each of the 61 codons (we have 270 tRNA genes). The presence of multiple genes for tRNAs with an identical anticodon increases the concentration of tRNAs able to bind a particular codon. Messenger RNAs - especially those of active genes - tend to favor codons that correspond to abundant tRNAs carrying the anticodon.

Codon bias even extends to pairs of codons: wherever a human protein contains the amino acids Ala-Glu, the gene encoding those amino acids is seven times as likely to use the codons GCAGAG rather than the synonymous GCCGAA.
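Measuring this kind of bias is mostly a matter of counting.  The sketch below tallies how often each synonymous codon is used for one amino acid in a coding sequence.  The function name and the short sequence are made up for illustration (real figures come from whole genomes), and the sketch assumes an upper-case string of T, C, A and G that is already in the correct reading frame.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks STOP.
CODON_TABLE = dict(zip(
    ("".join(codon) for codon in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def codon_usage(coding_sequence, amino_acid):
    """Return the share of each synonymous codon used for one amino acid."""
    codons = [coding_sequence[i:i + 3]
              for i in range(0, len(coding_sequence) - 2, 3)]
    counts = Counter(c for c in codons if CODON_TABLE[c] == amino_acid)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


if __name__ == "__main__":
    # Made-up sequence: ATG GCC GCG GCC GAA GCC GCT GCC TAA
    toy_gene = "ATGGCCGCGGCCGAAGCCGCTGCCTAA"
    for codon, share in sorted(codon_usage(toy_gene, "A").items()):
        print(codon, round(share, 2))
```

In this toy sequence GCC accounts for four of the six alanine codons.  That says nothing about real genomes, it only shows the bookkeeping.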


However, despite this, there is still a good amount of variation in how a particular protein gets encoded by different animals, and it is possible to construct a phylogenetic tree just by considering how similar the encoding is between different pairs of animals. (read more)
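The first step of that idea, comparing encodings pair by pair, can be sketched as follows.  The three sequences are invented (each encodes the same four amino acids, Ala-Glu-Gly-Leu, with different synonymous codons) and the species names are placeholders.  Real phylogenetic methods use proper alignment and statistical models, so this is only the simplest possible distance measure.

```python
from itertools import combinations


def differences(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def closest_pair(encodings):
    """Return the two names whose encodings differ at the fewest positions."""
    return min(combinations(sorted(encodings), 2),
               key=lambda pair: differences(encodings[pair[0]],
                                            encodings[pair[1]]))


if __name__ == "__main__":
    # Hypothetical encodings of the same protein fragment in three species.
    toy = {
        "species_a": "GCCGAAGGTCTG",
        "species_b": "GCCGAGGGTCTG",
        "species_c": "GCAGAGGGACTC",
    }
    print(closest_pair(toy))   # ('species_a', 'species_b')
```

Grouping the most similar pair first, and then repeating the comparison with the remaining sequences, is the basic intuition behind building a tree from such distances.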


Jul. 20, 2011 at 7:31 PM

Finally, yet another aside that never made it into the journal because, although I find it fascinating, it is utterly irrelevant to the thrust of the 4 part journal arc:


When a cell divides, sometimes (very rarely - on average once in every 10,000,000 base pairs) the DNA isn't copied exactly, and a mutation occurs.  Some of these are point mutations (a single base pair is altered), and these may have no effect if they happen at a point where the codon has some degeneracy (meaning that whichever of the 4 bases appears at that spot, the same amino acid is encoded, because the code is redundant).  You can see, when you look at the code in detail, that this fourfold pattern is quite common, and this is the reason for it.
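How common the fourfold pattern is can be counted directly from the standard code table.  The sketch below (the function name is my own) looks at each of the 16 possible first-two-letter combinations and checks whether all four choices of third letter give the same amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks STOP.
CODON_TABLE = dict(zip(
    ("".join(codon) for codon in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def fourfold_prefixes(table=CODON_TABLE):
    """Map each two-letter prefix whose third letter never matters
    to the single amino acid it encodes."""
    result = {}
    for first, second in product(BASES, repeat=2):
        prefix = first + second
        encoded = {table[prefix + third] for third in BASES}
        if len(encoded) == 1:          # all four third letters agree
            result[prefix] = encoded.pop()
    return result


if __name__ == "__main__":
    families = fourfold_prefixes()
    print(len(families), "of 16 prefixes")   # 8 of 16 prefixes
    print(families)
```

Half of the prefixes (TC, CT, CC, CG, AC, GT, GC and GG) are fourfold degenerate, so for 32 of the 64 codons a point mutation in the third position changes nothing in the protein.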

But not all mutations are point mutations.  Sometimes a single base pair is added or removed rather than changed in place, and when that happens the whole frame is shifted and every codon from that point on is misinterpreted.  Most often this causes massive problems (as in Tay-Sachs disease), but it also opens the possibility of massive changes in functionality from a single mutation.  It is probably the reason why there is only one START signal, but multiple encodings of STOP.
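The difference between the two kinds of mutation is easy to see in a small sketch.  The gene below is made up, and the function simply reads codons from the first letter until it meets a STOP (or runs out of letters).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks STOP.
CODON_TABLE = dict(zip(
    ("".join(codon) for codon in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(dna):
    """Read codons from the first letter until a STOP or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    original = "ATGGCCGAAGGTCTGTAA"   # made-up: ATG GCC GAA GGT CTG TAA
    # Point mutation: third letter of the second codon, GCC -> GCA.
    point = original[:5] + "A" + original[6:]
    # Frame shift: one extra letter inserted after the fourth letter.
    shifted = original[:4] + "T" + original[4:]

    print(translate(original))   # MAEGL
    print(translate(point))      # MAEGL  (silent: GCC and GCA both give A)
    print(translate(shifted))    # MVRRSV (everything after the insert changes)
```

In the shifted version the original STOP is no longer read as a codon at all, so the chain carries on until some later triplet happens to spell STOP.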
