So far in this tutorial, chromosomes and genes have been described broadly without saying precisely what they are composed of and how they function. In order to better understand them, we need to look at the molecular level of the cell.Most of the molecules found in humans and other living organisms fall into one of four categories:
1. carbohydrates (sugars and starches)
2. lipids (fats, oils, and waxes)
3. proteins 4. nucleic acids Proteins are large chain-like molecules that are twisted and folded back on themselves in complex patterns. They serve as structural material for the body, gas transporters in blood, hormones
, antibodies
, neurotransmitters
, and enzymes
. When looking at someone, you mostly see proteins since skin, hair, and muscles are primarily made of them. Proteins acting as enzymes are particularly important substances because they trigger and control the chemical reactions by which carbohydrates, lipids, and other substances are created. Our human bodies produce about 90,000 different kinds of proteins, all of which consist of simpler units called amino acids
. Proteins in all organisms are mostly composed of just 20 kinds of amino acids. Proteins differ in the number, sequence, and kinds of amino acids. Our bodies create some of these amino acids, while others come directly from food that we consume.
AMINO ACIDS alanine glutamic acid leucine serine arginine glutamine lysine threonine asparagine glycine methionine tryptophan aspartic acid histidine phenylalanine tyrosine cysteine isoleucine proline valine
(NOTE: The 8 amino acids in red cannot be synthesized by the human body and must be obtained from food. They are referred to as "essential amino acids." Arginine and histidine evidently are essential only for children.) Proteins, and subsequently amino acids, are mostly made up of just four elements: carbon, oxygen, hydrogen, and nitrogen. In fact, 96.3% of a human body is composed of these common elements.
The largest molecules in people and other organisms are nucleic acids. Like proteins, they consist of very long chains of simpler units. However, the components, shapes, and functions of nucleic acids differ significantly from those of proteins. There are two basic forms of nucleic acids: DNA (deoxyribonucleic acid
) and RNA (ribonucleic acid
). Both play critical roles in the production of proteins. Some kinds of RNA perform other critical functions in our cells not related to protein synthesis.
A chromosome consists mainly of a single very long DNA molecule and proteins called histones
that densely package the DNA. There are many small RNA molecules surrounding the DNA as well. Each of our DNA molecules contains the genetic codes, or genes, for the synthesis of many different kinds of proteins and for the regulation of other genes. In a sense, a DNA molecule contains a sequence of permanently stored blueprints or recipes that are used mostly by our cells to assemble proteins out of amino acids. In other words, chromosomes are packets of information encoded by DNA. If all of the DNA molecules in a single human cell were straightened out and arranged end to end, it would form a thread about 6½ feet long but only a few atoms across. If the DNA in all of a normal adult human's cells were arranged in this way, the thread could reach the moon about 6,000 times.
DNA molecules have the shape of a double helix
, which is like a twisted ladder. The sides of the ladder are composed of sugar and phosphate units, while the rungs consist of complementary pairs of four different chemical bases. The base guanine only bonds to cytosine and adenine only bonds to thymine. Each combined sugar, phosphate, and base subunit is a nucleotide
.
Section of a DNA molecule showing the double helix molecular shape
The linear sequence of base pairs along the length of a DNA molecule is the genetic code for the assembly of particular amino acids to make specific types of proteins. Therefore, a gene is essentially a recipe consisting of a sequence of some of these base pairs. The sequence is usually not continuous but is in several different sections of a DNA molecule. Sections of the same gene can be assembled in different ways resulting in recipes for different kinds of proteins. This "alternative splicing" process is common, occurring in 92-94% of human genes.
Only 1.1-1.5% of the approximately 3 billion base pairs in human DNA actually code for proteins. These meaningful code sequences are called exons
. The remaining 98+% of our DNA base pairs were in the past thought to consist merely of genetic junk, including meaningless code section duplications and ancient remnants of parasitic DNA that invaded our ancestors' cells many millions of years ago. However, it is now becoming clear that this "junk" actually has important functions. These non-protein coding sections are referred to as introns
. Apparently most are translated into short RNA molecules which are involved in monitoring and controlling virtually every cell function. Some of these RNA's also act as enhancers or suppressors of genes.
Textbooks written before 2001 most often stated that there are 100,000 human genes. This estimate was significantly reduced to about 30,000 as a result of completion of the initial phase of the Human Genome Project in that year. The number was further revised downward in 2004 by the National Human Genome Research Institute (NHGRI). It is now known that there are about 20,000-25,000 still functioning human protein-coding genes. Since the human genome "parts list" was compiled, research has largely focused on what these parts do--i.e., what proteins they code for and what those proteins do in our bodies. In addition, there is intense ongoing research to understand the short RNA molecules that are coded for by introns.
Not all of our DNA is in the cell nuclei. A small amount is in a circular looping chromosome in mitochondria
--organelles located in the cytoplasm. Mitochondria use glucose, a sugar, derived from the food we eat to produce adenosine triphosphate (ATP). This is the fuel for many cell functions. Without ATP, cell activity would cease and we would die. There are about 500 mitochondria in each human cell. Muscle cells have more because they need additional ATP to function. The 16,569 base units of human mitochondrial DNA (mtDNA) have only 37 genes of which 13 code for proteins.
Generalized animal cell
Sperm cellUnlike nuclear DNA (nDNA) in chromosomes, mtDNA is inherited almost exclusively from our mothers. In sperm cells, the mitochondria are located in coils between the flagellum tail and the head. They provide the fuel that allows sperm to move. It has been widely assumed that, at conception, the flagellum along with the mitochondria do not enter the ovum. We now know that this is not always true in the case of humans and other mammals. However, less than .1% of the mitochondrial DNA in a zygote normally comes from the father, and the genes in the paternal mtDNA apparently get turned off. Subsequently, for all intent purposes, we only inherit maternal mtDNA.
The second type of nucleic acid, RNA, consists of molecules that are single stranded copies of nuclear DNA segments. They are smaller than DNA molecules and do not have the double helix shape. In addition, the DNA base thymine is replaced by the base uracil in RNA molecules. The sugar component is also somewhat different. RNA is found in both the cell nucleus and the cytoplasm. The different kinds of RNA are described below.
Protein SynthesisThe genetic code is central to two critical functions within our cells. They are the creation of new proteins and the duplication of chromosomes during cell division. The former process is actually an assembly of existing amino acids in their proper sequences for each type of protein. Subsequently, it is referred to as protein synthesis rather than production. In this process, the recipe for a needed protein is first copied. Specifically, the appropriate DNA code sequence is transcribed, or copied, to RNA. The process begins by a section of a DNA molecule unwinding and then unzipping in response to several interacting enzymes. The separation occurs between the bases, as shown below.
DNA molecule partially unwinding and unzipping
Free nucleotides in the nucleus are attracted to complimentary bases on the exposed DNA strands (e.g., guanine to cytosine). The result of this sequential bonding of base units is the formation of a messenger RNA (mRNA) molecule that is a copy, or transcription, of specific sections of the nuclear DNA molecule corresponding to a gene. Many identical copies are made, one right after another. In the process of mRNA formation, the non-coding sections are removed and the remaining exons are spliced together.
Free nucleotides attracted to exposed bases of a partially unzipped DNA molecule
These new identical messenger RNA molecules then leave the nucleus and go out into the cytoplasm where the protein they are coded for is actually synthesized or assembled.
mRNA migrating out of the cell nucleusSpecifically, the messenger RNA molecules migrate from the chromosomes to the ribosomes
, which are small granule-like organelles in the cytoplasm. Some ribosomes are on the surface of long membrane networks called endoplasmic reticula
, while others are free ribosomes. Assembly of proteins takes place at the site of the ribosomes.
Generalized animal cell
Protein synthesis begins as ribosomes move along the messenger RNA strand and attach transfer RNA (tRNA) anticodons
(each with 3 bases) to triplets of complementary bases on the mRNA.
Protein synthesis at the ribosomes initiated by mRNA momentarily bonding with tRNA
(This schematic representation is a simplification of the actual shapes and processes.)
Each transfer RNA attracts and brings a specific amino acid along with it. As a ribosome translates (or activates) the messenger RNA code, a protein is assembled, one amino acid at a time. Each kind of amino acid has a particular codon that specifies it. A codon is a sequence of three nucleotide components chemically bound together (illustrated below). As mentioned earlier, every nucleotide consists of a sugar, a phosphate, and a base. Codons differ in terms of the sequence of their three bases. For example, the sequence CAG (cytosine-adenine-guanine) is a code for the amino acid glutamine.
Sugar-phosphate-base chemical bond of a codon
(Simplified representation rather than actual molecular shape)This genetic code permits 64 different codons because each of the 3 nucleotides can have 1 of the 4 bases ( 4 X 4 X 4 = 64 ). Because there are many fewer than 64 amino acids, the code system has built in redundancy--most amino acids can be attracted by transfer RNA having several different base triplet combinations. In other words, some codons are functionally equivalent, as shown in the table below. For instance, asparagine is specified with the mRNA condon sequence AAU (adenine-adenine-uracil). However, AAC (adenine-adenine-cytosine) also works.
|
DNA AND RNA CODES FOR AMINO ACIDS |
||
| Amino acids |
DNA triplets |
mRNA codons |
|---|---|---|
| alanine | CGA, CGG, CGT, CGC | GCU, GCC, GCA, GCG |
| arginine | GCA, GCG, GCT, GCC, TCT, TCC | CGU, CGC, CGA, CGG, AGA, AGG |
| asparagine | TTA, TTG | AAU, AAC |
| aspartic acid | CTA, CTG | GAU, GAC |
| cysteine | ACA, ACG | UGU, UGC |
| glutamic acid | CTT,CTC | GAA, GAG |
| glutamine | GTT, GTC | CAA, CAG |
| glycine | CCA, CCG, CCT, CCC | GGU, GGC, GGA, GGG |
| histidine | GTA, GTG | CAU, CAC |
| isoleucine | TAA, TAG, TAT | AUU, AUC, AUA |
| leucine | AAT, AAC, GAA, GAG, GAT, GAC | UUA, UUG, CUU, CUC, CUA, CUG |
| lysine | TTT, TTC | AAA, AAG |
| methionine (start codon) | TAC | AUG |
| phenylalanine | AAA, AAG | UUU, UUC |
| proline | GGA, GGG, GGT, GGC | CCU, CCC, CCA, CCG |
| serine | AGA, AGG, AGT, AGC, TCA, TCG | UCU, UCC, UCA, UCG, AGU, AGC |
| threonine | TGA, TGG, TGT, TGC | ACU, ACC, ACA, ACG |
| tryptophan | ACC | UGG |
| tyrosine | ATA, ATG | UAU, UAC |
| valine | CAA, CAG, CAT, CAC | GUU, GUC, GUA, GUG |
| (stop codon) | ATT, ATC, ACT | UAA, UAG, UGA |
| NOTE: the DNA base thymine (T) is replaced with uracil (U) in the formation of mRNA. |
Not all codons specify amino acid components to be included in a protein. For instance, a start codon appears in DNA at the beginning of the code sequence for each gene and a stop codon is at the end. In other words, they indicate where a protein recipe begins and ends.
Most plant and animal cells, including those of humans, have tens of thousands of ribosomes. Many ribosomes simultaneously translate identical strands of messenger RNA. As a result, the synthesis of proteins can be rapid and massive. These same processes can occur at the same time in millions of cells when a particular protein is needed. Inevitably, there are some errors in the code copying process. The result is the creation of incorrect proteins. However, there is an error checking and correction process carried out by enzymes that catches most of these potentially harmful mistakes.
From DNA to Protein--video clip illustrating protein synthesis
View in: QuickTime or Windows Media Player (length = 3 mins, 25 secs)
DNA ReplicationIn addition to keeping the blueprints for protein synthesis, DNA has one further function. It replicates, or duplicates, itself. This occurs whenever a chromosome is duplicated for cell division. At the beginning of this process, the DNA molecule of a stretched out chromosome unwinds and unzips along its bases beginning at one end. Then in response to an enzyme, free nucleotides pair up with corresponding bases on both of the DNA strands, as illustrated below. This results in the formation of two exact copies of the original molecule. Nuclear DNA replication occurs in the rest period (interphase) just before mitosis and meiosis.
DNA replication
Occasionally, an error is made in DNA replication. For example, an incorrect base pair may be included. This constitutes a mutation. If it occurs in the formation of sex cells, the mutation may be inherited and passed on in future generations. Such errors are the principal source of new genetic variation for a species and, subsequently, play a major role in evolution. DNA replication errors are also responsible for changes in somatic cells that result in the uncontrolled tumor growths of cancers.
How DNA replicates--video clip illustrating DNA replication
View in: QuickTime or Windows Media Player (length = 48 secs)DNA Workshop Activity--hands-on simulations of protein synthesis
and DNA replication. This link takes you to an external website.
To return here, click the "back" button on your browser program.
Universality of the Genetic CodeThis genetic code system of humans is not unique but is shared by all living things on our planet. The same codons code for the same amino acids in people, dogs, fleas, and even bacteria. In addition, we share many genes with other creatures. For instance, about 90% of human genes are identical to those of a mouse. Even more surprising is the fact that more than 1/3 of our genes are shared with a primitive group of worm species known as nematodes. The universal nature of the genetic code is compelling evidence for the evolution of all organisms from the same early life forms.
The Common Genetic Code--evidence that humans share a common genetic code with
other organisms. This link takes you to an external website. To return here, you must
click the "back" button on your browser program. (length = 4 mins, 20 secs)
New Understanding of the Molecular Level
of Genetics Gained from Recent ResearchMost of our genes code for several different proteins. This should not be surprising since we have only about a third as many genes as we do proteins. Some genes act in consort with other genes to produce the same protein. Some genes do not code for proteins at all. Recent research has shown that at least 200-255 of our genes code for micro RNA (miRNA) molecules instead. These are not messenger RNA's that transport the recipe for protein synthesis. Rather, they are small RNA molecules consisting of around 20-25 base units. They perform important functions similar to enzymes in regulating chemical reactions in our cells, especially in the embryonic stage at the beginning of life. It is thought that at least 1/3 of human genes are controlled in some way by micro-RNA molecules.
About half of the sequences of base units in human DNA are "mobile elements." They move around inserting new copies of themselves. When these insertions go into a gene, the result is a changed protein recipe. This is a major source of new genetic variation and potentially a cause of genetic diseases such as hemophilia and muscular dystrophy in families that did not previously have these defective genes.
Whether a gene is copied for protein synthesis and what the product of that copy becomes is largely determined by signals from proteins acting as markers and switches along the DNA double helix structure. This chemical signaling system, referred to as the epigenome, is very likely as important as the DNA itself in determining the phenotype of individuals. It is becoming increasingly clear that epigenetic signals can be altered by the environment and that these changes can then be passed on to future generations because the epigenome proteins are inherited along with DNA in chromosomes from parents. Unlike mutations in DNA sequences, however, epigenetic changes potentially can be reversible. For example, if an alteration in an epigenetic protein causes a disease, it could be possible to alter that protein and bring about a cure. As a consequence, learning more about the human epigenome is likely to be an important area for future research.
Epigenetics--excerpt from the PBS series Nova Science Now (July 24, 2007)
This link takes you to an external website. To return here, you must click the "back" button
on your browser program. (length = 13 mins, 2 secs)When sperm cells enter an egg at conception, they bring along with them substantial amounts of paternal RNA and epigenetic proteins which apparently play an important role in the development of embryos.
There is an RNA-based defense system in the cytoplasm of cells that functions to prevent the genetic material of invading viruses from being replicated. This is known as RNA interference or RNAi. The video linked below provides a simple explanation of how this system works.
RNAi explained--excerpt from the PBS series Nova Science Now (July 25, 2005)
This link takes you to an external website. To return here, you must click the "back" button
on your browser program. (length = 15 mins)There is far more DNA variation between people than was thought possible in 2001 when the human genome was first worked out. The variations mostly show up in 1500 DNA regions (about 12% of the total). The differences are commonly in the form of deletions, duplications, and reversals of DNA segments and individual bases. Research is still at the beginning of the process of determining the significance of these differences. Understanding them very likely will shed considerable light on whether or not we might develop particular diseases. These variations are far more common and extensive than previously assumed. Current estimates are that 65-80% of people have copy number variants (CNV's) that are at least 100,000 base units long.
NOTE: If you had difficulty in understanding these complex processes, you may wish to go over them again. Keep in mind that the graphic shapes used here to model the components of DNA, RNA, protein, and amino acid are not accurate representations of the real shapes and the relative sizes. If you particularly had difficulty understanding the protein synthesis process, you should start reviewing by looking at the following video clip:
The Human Genome Project--3D Animation Introduction--video showing the steps of protein synthesis
This link takes you to an external website. (length = 3 mins, 33 secs)NEWS: On January 26, 2008 an international team of researchers announced the beginning of a 3 year effort known as the "1000 Genomes Project". The goal is to work out the genome of 1000 people from around the world in order to create a master catalog of human diversity. The 30-50 million dollar estimated cost of the project is being paid for by the U.S. National Human Genome Research Institute (NHGRI) in Bethesda Maryland.
This page was last updated on
Friday, May 01, 2009.
Copyright © 1998-2009 by Dennis
O'Neil.
All rights reserved.
illustration credits