Molecular Level of Genetics

So far in this tutorial, chromosomes and genes have been described broadly without saying precisely what they are composed of and how they function.  In order to better understand them, we need to look at the molecular level of the cell.

Water is by far the most common type of molecule in the human body, normally amounting to 50-60% of our total adult weight.  Most of the remaining molecules fall into one of four categories:

1.    carbohydrates (sugars and starches)
2.    lipids (fats, oils, waxes, fatty acids, triglycerides and cholesterol)
3.    proteins
4.    nucleic acids

Proteins are large chain-like molecules that are twisted and folded back on themselves in complex patterns.  They serve as structural material for the body, gas transporters in blood, hormones, antibodies, neurotransmitters, and enzymes.  When looking at someone, you mostly see proteins since skin, hair, and muscles are primarily made of them.  Proteins acting as enzymes are particularly important substances because they trigger and control the chemical reactions by which carbohydrates, lipids, and other substances are created.  Our human bodies produce about 90,000 different kinds of proteins, all of which consist of simpler units called amino acids.  Proteins in all organisms are mostly composed of just 20 kinds of amino acids.  Proteins differ in the number, sequence, and kinds of amino acids.  Our bodies create some of these amino acids, while others come directly from food that we consume.

  alanine         glutamic acid   leucine         serine
  arginine        glutamine       lysine          threonine
  asparagine      glycine         methionine      tryptophan
  aspartic acid   histidine       phenylalanine   tyrosine
  cysteine        isoleucine      proline         valine
(NOTE: Eight of these amino acids (isoleucine, leucine, lysine,
methionine, phenylalanine, threonine, tryptophan, and valine) cannot
be synthesized by the human body and must be obtained from food.
They are referred to as "essential amino acids."  Arginine and
histidine evidently are essential only for infants and are, therefore,
considered to be semi-essential.)

Amino acids, and consequently proteins, are mostly made up of just four elements: carbon, oxygen, hydrogen, and nitrogen.  In fact, 96.3% of a human body is composed of these common elements.

The largest molecules in people and other organisms are nucleic acids.  Like proteins, they consist of very long chains of simpler units.  However, the components, shapes, and functions of nucleic acids differ significantly from those of proteins.  There are two basic forms of nucleic acids: DNA (deoxyribonucleic acid) and RNA (ribonucleic acid).  Both play critical roles in the production of proteins.  Some kinds of RNA perform other critical functions in our cells not related to protein synthesis.

A chromosome consists mainly of a single very long DNA molecule and proteins called histones that densely package the DNA.  There are many small RNA molecules surrounding the DNA as well.  Each of our DNA molecules contains the genetic codes, or genes, for the synthesis of many different kinds of proteins and for the regulation of other genes.  In a sense, a DNA molecule contains a sequence of permanently stored blueprints or recipes that are used mostly by our cells to assemble proteins out of amino acids.  In other words, chromosomes are packets of information encoded by DNA.  If all of the DNA molecules in a single human cell were straightened out and arranged end to end, they would form a thread about 6 feet long but only a few atoms across.  If the DNA in all of a normal adult human's cells were arranged in this way, the thread could reach the moon about 6,000 times.

DNA molecules have the shape of a double helix, which is like a twisted ladder.  The sides of the ladder are composed of sugar and phosphate units, while the rungs consist of complementary pairs of four different chemical bases.  The base guanine only bonds to cytosine and adenine only bonds to thymine.  Each combined sugar, phosphate, and base subunit is a nucleotide.
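The pairing rule described above can be sketched in a few lines of Python.  This is only an illustration: the function name and the sample sequence are made up for the example, and the sketch pairs bases position by position without modeling the direction of each strand.

```python
# Pairing rules from the text: guanine <-> cytosine, adenine <-> thymine.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the bases that sit opposite each base of `strand` on the ladder."""
    return "".join(PAIRS[base] for base in strand)


example = "GATTACA"  # hypothetical sequence, for illustration only
partner = complementary_strand(example)

# Print one "rung" of the ladder per line.
for left, right in zip(example, partner):
    print(f"{left} - {right}")
```

Because each base has exactly one partner, applying the function twice returns the starting sequence, which is why either side of the ladder is enough to reconstruct the other.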

[Figure: Section of a DNA molecule showing the double helix molecular shape and the relative location of sugars, phosphates, and bases (guanine, cytosine, adenine, and thymine)]

[Figure: Relationship of a DNA molecule to a chromosome]

The linear sequence of base pairs along the length of a DNA molecule is the genetic code for the assembly of particular amino acids to make specific types of proteins.  Therefore, a gene is essentially a recipe consisting of a sequence of some of these base pairs.  The sequence is usually not continuous but is in several different sections of a DNA molecule.  Sections of the same gene can be assembled in different ways resulting in recipes for different kinds of proteins.  This "alternative splicing" process is common, occurring in 92-94% of human genes.

Only 1.1-1.5% of the approximately 3 billion base pairs in human DNA actually code for proteins.  These meaningful code sequences are called exons.  The remaining 98+% of our DNA base pairs were in the past thought to consist merely of genetic junk, including meaningless code section duplications and ancient remnants of parasitic DNA that invaded our ancestors' cells many millions of years ago.  However, it is now becoming clear that much of this "junk" actually has important functions.  These non-protein coding sections are referred to here as introns (strictly speaking, that term applies to the non-coding segments that lie within genes).  At least 80% of the intron regions of human DNA contain switches to turn the genes on and off.  There are at least 4 million of these switches.  Many of them are involved in monitoring and controlling cell functions throughout the body.  Some also may be involved in a wide range of diseases including autoimmune disorders (multiple sclerosis, lupus, rheumatoid arthritis, Crohn's disease, and celiac disease).
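To put the percentages above into absolute terms, the short Python calculation below converts them into base pair counts.  It uses only the figures quoted in the text (about 3 billion base pairs, 1.1-1.5% coding); the variable and function names are invented for the example.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # "approximately 3 billion" from the text


def coding_base_pairs(fraction: float) -> int:
    """Number of base pairs covered by the given fraction of the genome."""
    return round(GENOME_BASE_PAIRS * fraction)


low = coding_base_pairs(0.011)   # 1.1% of the genome
high = coding_base_pairs(0.015)  # 1.5% of the genome
print(f"{low:,} to {high:,} base pairs code for proteins")
```

In other words, the protein recipes occupy only a few tens of millions of base pairs out of roughly 3 billion.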

  Don't Throw It Out: 'Junk DNA' Essential In Evolution--new evidence of the importance
       of introns           (length = 3 mins, 45 secs)

Textbooks written before 2001 most often stated that there are 100,000 human genes.  This estimate was significantly reduced to about 30,000 as a result of completion of the initial phase of the Human Genome Project in that year.  The number was further revised downward in 2004 by the National Human Genome Research Institute (NHGRI).  It is now known that there are about 20,500 still functioning human protein-coding genes.  Since the human genome "parts list" was compiled, research has largely focused on what these parts do--i.e., what proteins they code for and what those proteins do in our bodies.  In addition, there is intense ongoing research to understand the short RNA molecules that are coded for by introns.

  Extract Your Own DNA--simple how-to instructions to separate out your own DNA at home.
       (length = 2 mins, 45 secs)

There is far more DNA variation between people than was thought possible in 2001 when the human genome was first worked out.  The variations mostly show up in 1,500 DNA regions (about 12% of the total).  The differences are commonly in the form of deletions, duplications, and reversals of DNA segments and individual bases.  Research is still at the beginning of the process of determining the significance of these differences.  Understanding them very likely will shed considerable light on whether or not individuals might develop particular diseases.  Current estimates are that 65-80% of people have copy number variants (CNVs) that are at least 100,000 base units long.

About half of the sequences of base units in human nuclear DNA are "mobile elements."  They move around inserting new copies of themselves.  When these insertions go into a gene, the result is a changed protein recipe.  This is a major source of new genetic variation and potentially a cause of genetic diseases such as hemophilia and muscular dystrophy in families that did not previously have these defective genes.

[Figure: Black and white photo of human chromosomes with telomeres shown in white]

NOTE:  At both ends of chromosomes there are telomere caps, which consist of several thousand repeats of a short DNA base sequence (TTAGGG, TTAGGG, TTAGGG, etc.) that are not part of genes.  Telomeres help protect critical DNA sequences from being changed by recombination.  However, the telomeres become slightly shorter every time cell division occurs.  When they have only around 77 base units remaining, they become unstable, leading to the fusion of chromosomes and cell death.  This is a normal consequence of aging.  Very short telomeres are also associated with early stages of cancer.

Not all of our DNA is in the cell nucleus.  A small amount is in a circular looping chromosome in mitochondria--organelles located in the cytoplasm.  Mitochondria use glucose, a sugar, derived from the food we eat to produce adenosine triphosphate (ATP).  This is the fuel for many cell functions.  Without ATP, cell activity would cease and we would die.  There are about 500 mitochondria in each human cell.  Muscle cells have more because they need additional ATP to function.  The 16,569 base units of human mitochondrial DNA (mtDNA) have only 37 genes of which 13 code for proteins.

[Figure: Generalized animal cell with mitochondria highlighted]

[Figure: Sperm cell]

Unlike nuclear DNA (nDNA) in chromosomes, mtDNA is inherited almost exclusively from our mothers.  In sperm cells, the mitochondria are located in coils between the flagellum tail and the head.  They provide the fuel that allows sperm to move.  It has been widely assumed that, at conception, the flagellum along with the mitochondria do not enter the ovum.  We now know that this is not always true in the case of humans and other mammals.  However, less than 0.1% of the mitochondrial DNA in a zygote normally comes from the father, and the genes in the paternal mtDNA apparently get turned off.  Consequently, for all intents and purposes, we only inherit maternal mtDNA.

The second type of nucleic acid, RNA, consists of molecules that are single stranded copies of nuclear DNA segments.  They are smaller than DNA molecules and do not have the double helix shape.  In addition, the DNA base thymine is replaced by the base uracil in RNA molecules.  The sugar component is also somewhat different.  RNA is found in both the cell nucleus and the cytoplasm.  The different kinds of RNA are described below.

Protein Synthesis

The genetic code is central to two critical functions within our cells.  They are the creation of new proteins and the duplication of chromosomes during cell division.  The former process is actually an assembly of existing amino acids in their proper sequences for each type of protein.  Consequently, it is referred to as protein synthesis rather than production.  In this process, the recipe for a needed protein is first copied.  Specifically, the relevant DNA code sequence is transcribed, or copied, to RNA.  The process begins with a section of a DNA molecule unwinding and then unzipping in response to several interacting enzymes.  The separation occurs between the bases, as shown below.

[Figure: DNA molecule partially unwinding and unzipping between the base pairs]

Free RNA nucleotides in the nucleus are attracted to complementary bases on the exposed DNA strands (guanine to cytosine, adenine to thymine, and uracil, which takes the place of thymine in RNA, to adenine).  The result of this sequential bonding of base units is the formation of a messenger RNA (mRNA) molecule that is a copy, or transcription, of specific sections of the nuclear DNA molecule corresponding to a gene.  Many identical copies are made, one right after another.  In the process of mRNA formation, the non-coding sections are removed and the remaining exons are spliced together.
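As a rough sketch of this copying step, the Python snippet below builds an mRNA sequence from an exposed DNA strand by pairing each DNA base with its RNA partner, with uracil standing in for thymine as noted elsewhere in this tutorial.  The function name and the sample DNA fragment are hypothetical, and the sketch ignores the removal of non-coding sections and the enzymes involved.

```python
# DNA base on the exposed strand -> RNA base that is attracted to it.
# RNA has no thymine, so a DNA adenine pairs with uracil (U).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(dna_template: str) -> str:
    """Build the mRNA sequence that pairs with the exposed DNA strand."""
    return "".join(DNA_TO_RNA[base] for base in dna_template)


template = "TACAAAGTCATT"  # hypothetical DNA fragment, for illustration only
mrna = transcribe(template)
print(mrna)
```

Read three bases at a time, the DNA triplets TAC, AAA, GTC, and ATT become the mRNA triplets AUG, UUU, CAG, and UAA.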

[Figure: Free nucleotides attracted to exposed bases of a partially unzipped DNA molecule]

These new identical messenger RNA molecules then leave the nucleus and go out into the cytoplasm where the protein they code for is actually synthesized or assembled.

[Figure: mRNA migrating out of the cell nucleus]

Specifically, the messenger RNA molecules migrate from the chromosomes to the ribosomes, which are small granule-like organelles in the cytoplasm.  Some ribosomes are on the surface of long membrane networks called endoplasmic reticula, while others are free ribosomes.  Assembly of proteins takes place at the site of the ribosomes.

[Figure: Generalized animal cell with ribosomes and endoplasmic reticula highlighted]

Protein synthesis begins as ribosomes move along the messenger RNA strand and attach transfer RNA (tRNA) anticodons (each with 3 bases) to triplets of complementary bases on the mRNA.

[Figure: Protein synthesis at the ribosomes initiated by mRNA momentarily bonding with tRNA]
(This schematic representation is a simplification of the actual shapes and processes.)

Each transfer RNA attracts and brings a specific amino acid along with it.  As a ribosome translates (or activates) the messenger RNA code, a protein chain is assembled, one amino acid at a time.  Each kind of amino acid has a particular codon that specifies it.  A codon is a sequence of three nucleotide components chemically bound together (illustrated below).  As mentioned earlier, every nucleotide consists of a sugar, a phosphate, and a base.  Codons differ in terms of the sequence of their three bases.  For example, the sequence CAG (cytosine-adenine-guanine) is a code for the amino acid glutamine.

[Figure: Sugar-phosphate-base chemical bond of a generalized codon]
(Simplified representation rather than actual molecular shape)

This genetic code permits 64 different codons because each of the 3 nucleotides can have 1 of the 4 bases (4 x 4 x 4 = 64).  Because there are only 20 kinds of amino acids, far fewer than 64, the code system has built-in redundancy--most amino acids can be attracted by transfer RNA having several different base triplet combinations.  In other words, some codons are functionally equivalent, as shown in the table below.  For instance, asparagine is specified with the mRNA codon sequence AAU (adenine-adenine-uracil).  However, AAC (adenine-adenine-cytosine) also works.
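The count of 64 can be checked by simply listing every possible triplet.  The Python sketch below does this with the standard library's itertools.product; the variable names are chosen for the example only.

```python
from itertools import product

RNA_BASES = "ACGU"  # adenine, cytosine, guanine, uracil

# Every ordered choice of 3 bases, each drawn from the 4 RNA bases.
codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

print(len(codons))  # 4 x 4 x 4 possible triplets
print("AAU" in codons, "AAC" in codons)  # both asparagine codons are among them
```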


Amino acids                 Corresponding DNA codons    Corresponding RNA codons
asparagine                  TTA, TTG                    AAU, AAC
aspartic acid               CTA, CTG                    GAU, GAC
cysteine                    ACA, ACG                    UGU, UGC
glutamic acid               CTT, CTC                    GAA, GAG
glutamine                   GTT, GTC                    CAA, CAG
histidine                   GTA, GTG                    CAU, CAC
isoleucine                  TAA, TAG, TAT               AUU, AUC, AUA
lysine                      TTT, TTC                    AAA, AAG
methionine (start codon)    TAC                         AUG
phenylalanine               AAA, AAG                    UUU, UUC
tryptophan                  ACC                         UGG
tyrosine                    ATA, ATG                    UAU, UAC
(stop codon)                ATT, ATC, ACT               UAA, UAG, UGA
NOTE:  the DNA base thymine (T) is replaced with uracil (U) in RNA.

Not all codons specify amino acid components to be included in a protein.  For instance, a start codon appears in DNA at the beginning of the code sequence for each gene and a stop codon is at the end.  In other words, they indicate where a protein recipe begins and ends.
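The way a ribosome reads from a start codon to a stop codon can be sketched as a simple table lookup in Python.  The dictionary below contains only the RNA codons listed in this tutorial's partial table, so any other codon is reported as "?".  The function name and the sample mRNA sequence are invented for the example, and the sketch leaves out tRNA and the chemistry of the process.

```python
# RNA codons taken from the partial table in this tutorial.
CODON_TABLE = {
    "AAU": "asparagine", "AAC": "asparagine",
    "GAU": "aspartic acid", "GAC": "aspartic acid",
    "UGU": "cysteine", "UGC": "cysteine",
    "GAA": "glutamic acid", "GAG": "glutamic acid",
    "CAA": "glutamine", "CAG": "glutamine",
    "CAU": "histidine", "CAC": "histidine",
    "AUU": "isoleucine", "AUC": "isoleucine", "AUA": "isoleucine",
    "AAA": "lysine", "AAG": "lysine",
    "AUG": "methionine",
    "UUU": "phenylalanine", "UUC": "phenylalanine",
    "UGG": "tryptophan",
    "UAU": "tyrosine", "UAC": "tyrosine",
}
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Read codons from the first start codon until a stop codon is reached."""
    start = mrna.find(START_CODON)
    if start == -1:
        return []  # no start codon, so nothing is assembled
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        chain.append(CODON_TABLE.get(codon, "?"))  # "?" = not in the partial table
    return chain


print(translate("AUGUUUAAAUGGUAA"))  # hypothetical mRNA, for illustration only
```

For this sample sequence, the chain is methionine, phenylalanine, lysine, and tryptophan; the final triplet UAA is a stop codon and adds nothing.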

  From DNA to Protein--video clip illustrating protein synthesis  (length = 3 mins, 25 secs)

Most plant and animal cells, including those of humans, have thousands to millions of ribosomes.  Many ribosomes simultaneously translate identical strands of messenger RNA.  As a result, the synthesis of proteins can be rapid and massive.  These same processes usually occur at the same time in millions of cells when a particular protein is needed.  Inevitably, there are some errors in the code copying process.  The result is mutations that cause the creation of incorrect proteins.  However, there is an error checking and correction process carried out by enzymes that catches many of these errors.  Recent research suggests that DNA to RNA copy mistakes are common.

Most of our genes code for several different proteins.  This should not be surprising since we have far fewer genes than kinds of proteins (about 20,500 protein-coding genes compared to about 90,000 kinds of proteins).  Some genes act in concert with other genes to produce the same protein.  Some genes do not code for proteins at all.  Recent research has shown that at least 200-255 of our genes code for micro RNA (miRNA) molecules instead.  These are not messenger RNAs that transport the recipe for protein synthesis.  Rather, they are small RNA molecules consisting of only around 20-25 base units.  They perform important functions similar to enzymes in regulating chemical reactions in our cells, especially in the embryonic stage at the beginning of life.  It is thought that at least 1/3 of human genes are controlled in some way by micro-RNA molecules.

DNA Replication

In addition to keeping the blueprints for protein synthesis, DNA has one further function.  It replicates, or duplicates, itself.  This occurs whenever a chromosome is duplicated for cell division.  At the beginning of this process, the DNA molecule of a stretched out chromosome unwinds and unzips along its bases beginning at one end.  Then in response to an enzyme, free nucleotides pair up with corresponding bases on both of the DNA strands, as illustrated below.  This results in the formation of two exact copies of the original molecule.  Nuclear DNA replication occurs in the rest period (interphase) just before mitosis and meiosis.
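The reason replication yields two exact copies can be shown with a small Python sketch: once the strands separate, each one dictates its own new partner through the base-pairing rules.  The function names and the sample sequence are hypothetical, and the sketch leaves out the enzymes and strand direction.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def build_partner(template: str) -> str:
    """Free nucleotides pair up with the bases of one exposed strand."""
    return "".join(PAIRS[base] for base in template)


def replicate(strand_a: str, strand_b: str):
    """Unzip a double-stranded molecule and rebuild a partner for each strand."""
    copy_1 = (strand_a, build_partner(strand_a))  # old strand a + new partner
    copy_2 = (build_partner(strand_b), strand_b)  # new partner + old strand b
    return copy_1, copy_2


original = ("GATTACA", "CTAATGT")  # hypothetical paired strands
copy_1, copy_2 = replicate(*original)
print(copy_1 == original, copy_2 == original)
```

Each resulting molecule keeps one strand from the original and gains one newly assembled strand, yet both carry the same sequence as the molecule they came from.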


[Figure: DNA molecule replicating]

Occasionally, an error is made in DNA replication.  For example, an incorrect base pair may be included.  This constitutes a mutation.  If it occurs in the formation of sex cells, the mutation may be inherited and passed on in future generations.  Such errors are the principal source of new genetic variation for a species and, consequently, play a major role in evolution.  DNA replication errors are also responsible for changes in somatic cells that result in the uncontrolled tumor growths of cancers.
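A single incorrect base can change which amino acid a codon specifies.  The Python sketch below illustrates this with two entries from this tutorial's codon table: the DNA triplet GTC (mRNA CAG, glutamine) and the DNA triplet GTA (mRNA CAU, histidine).  The function names are invented for the example, and the particular substitution is chosen only for illustration.

```python
# DNA base -> RNA base that pairs with it during transcription.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Two RNA codons taken from the table in this tutorial.
AMINO_ACIDS = {"CAG": "glutamine", "CAU": "histidine"}


def transcribe(dna_codon: str) -> str:
    """Return the mRNA codon that pairs with a DNA triplet."""
    return "".join(DNA_TO_RNA[base] for base in dna_codon)


def substitute(sequence: str, index: int, new_base: str) -> str:
    """Return a copy of `sequence` with one base replaced (a point mutation)."""
    return sequence[:index] + new_base + sequence[index + 1:]


original_dna = "GTC"
mutated_dna = substitute(original_dna, 2, "A")  # C replaced by A -> "GTA"

for dna in (original_dna, mutated_dna):
    rna = transcribe(dna)
    print(dna, "->", rna, "->", AMINO_ACIDS[rna])
```

Not every substitution has this effect.  Because the code is redundant, changing GTC to GTT would still yield glutamine according to the same table.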

  How DNA replicates--video clip illustrating DNA replication  (length = 48 secs)

Universality of the Genetic Code

The DNA code system of humans is not unique but is shared by all living things on our planet.  The same codons code for the same amino acids in people, dogs, fleas, and even bacteria.  In addition, we share many genes with other creatures.  For instance, about 90% of human genes are identical to those of a mouse.  Even more surprising is the fact that more than 1/3 of our genes are shared with a primitive group of worm species known as nematodes.  The universal nature of the genetic code is compelling evidence for the evolution of all organisms from the same early life forms.

  The Common Genetic Code--evidence that humans share a common genetic code with
        other organisms              (length = 4 mins, 20 secs)


Whether a gene is copied for protein synthesis and what the product of that copy becomes is largely determined by signals from proteins acting as markers and switches along the DNA double helix structure.  This chemical signaling system, referred to as the epigenome, is very likely as important as the DNA itself in determining the phenotype of individuals.  It is becoming increasingly clear that epigenetic signals can be altered by the environment.  This partly explains why monozygotic twins progressively become different from one another as they grow older despite the fact that they are genetically identical.

Epigenetic changes can be passed on to future generations because epigenome proteins are inherited along with DNA in chromosomes from parents.  Unlike mutations in DNA sequences, however, epigenetic alterations potentially can be reversible.  For example, if a change in an epigenetic protein causes a disease, it could be possible to alter that protein and bring about a cure.  As a consequence, learning more about the human epigenome is likely to be an important area for future medical research. 

  Epigenetics--excerpt from the PBS series Nova Science Now (July 24, 2007)
        (length = 13 mins, 2 secs)

The effects of changes in an individual's epigenome can be far reaching.  Not only can they affect physical appearance and health, but they can also alter behavior.  For instance, the babies of rat mothers who are unusually nurturing are likely to have altered expressions of the genes involved with stress responses.  These changes allow them to handle stress better as adults.  It is thought that a similar connection occurs in humans.  Likewise, some researchers have suggested that children of parents who were severely undernourished at the time of conception have a greater chance of developing schizophrenia.  The assumption is that famine conditions resulted in epigenome alterations responsible for this severe mental disorder in future children.

When sperm cells enter an egg at conception, they bring along with them substantial amounts of paternal RNA and epigenetic proteins which apparently play an important role in the development of embryos.

NOTE:  If you had difficulty in understanding these complex processes, you may wish to go over them again.  Keep in mind that the graphic shapes used here to model the components of DNA, RNA, protein, and amino acid are not accurate representations of the real shapes and the relative sizes.  You can also review this information with the animated tutorials on DNA, genes, chromosomes, and proteins at the Learn Genetics web site linked below: 

  Learn Genetics: Tour of the Basics--animated tutorial showing the relationship between
        DNA, genes, chromosomes, and proteins.  Produced by the Genetic Science Learning
        Center of the University of Utah.

NOTE:  There is an RNA-based defense system in the cytoplasm of cells that functions to prevent the genetic material of invading viruses from being replicated.  This is known as RNA interference or RNAi.  The video linked below provides a simple explanation of how this system works.

  RNAi explained--excerpt from the PBS series Nova Science Now (July 25, 2005)
        (length = 15 mins)



Copyright 1998-2014 by Dennis O'Neil. All rights reserved.