E. coli Up Close and Personal: Scientific Rockstar and Public Enigma

This is an article created as a guest post for Kitchen Table Science.

It seems nothing puts fear in the hearts of the masses like the mention of E. coli. Most people think of the disease-causing germ that contaminates everything from spinach to beef. I agree that the strain Escherichia coli O157:H7 and its cousins O26, O145, STEC O104:H4, and others are a wretched bunch that give the whole species a bad reputation. What makes these strains so vile are the extra proteins encoded within their genomes. For example, E. coli O157:H7 has a larger genome coding for 5561 proteins, while the parent strain E. coli W codes for 4739 proteins. Such is the life of a bacterium. Because there are so many bacteria, they are usually in close proximity to each other. Physical contact between bacteria, not just those of the same species, allows genetic material to be transferred from one cell to another (horizontal gene transfer), the closest thing to sexual reproduction you will find in prokaryotes. If the transferred genes give the recipient an advantage or a new ability that helps it compete and thrive in its environment, they will remain in the genome. Otherwise, they will be discarded through genome compaction.

Most E. coli are completely harmless and, in fact, beneficial. If the general public knew more than what is broadcast on the 24-hour news channels, they would see the tiny rockstar that scientists have known about for some time now. Research on E. coli began in earnest in the 1950s, and the organism is easily and very cheaply cultured in laboratories. Its quick generation time (20 min. at optimum temperature) made it a great model organism to study in many fields of science and medicine. This organism is the workhorse of biotechnology due to the relative ease of manipulating its genome or adding complete genetic circuits into the cell using plasmids.


Even after 50 years of intense research, E. coli still holds many unknowns beyond the reach of our knowledge. Like all other sequenced genomes, its genome contains a number of “hypothetical proteins” and “proteins of unknown function”. This means that, to the best of our abilities, we can locate the parts of the genome that code for proteins, but that does not mean we understand the function of a particular protein.

Image courtesy of Predrag Radivojac. Thanks, Pedra.

The image above shows just how much work is left to understand the biological capabilities of Mother Nature. Short version: there are over 40 million gene sequences in databases, but the number whose function we know is holding steady around 500,000, and the number of solved protein structures is over 100,000. The gap between the known and the unknown is growing.

Where would we be without E. coli?

One advantage of E. coli is its effect on our immune system. Some may find this counter-intuitive, but E. coli can lower the workload of our immune system when pathogens are present, especially in the intestine. When E. coli attaches to the GI wall, it changes the acidity of the lining, making infection by other bacteria less likely. Another benefit is in overall digestion. E. coli promotes better breakdown of food, preventing the accumulation of waste that is a major cause of bloating and constipation.

Many outside the scientific community may not be aware of how integral E. coli is to the advancement of many fields, including medicine, pharmacology, biology, and even human physiology. Another reason not to believe the hype.

Repeat after me: There is no newly discovered hidden code in DNA.

It is very sad and unfortunate when newly released research findings are hyped and overstated. This week the University of Washington Office of News & Information issued a press release embarrassingly titled “Scientists discover double meaning in genetic code”. Since then, the release has been picked up by websites across the globe. In that way, the press release did its job. Unfortunately, the statements within the release, along with the title, have done a world of harm. I can only hope it was unintended.

The release starts by stating scientists discovered a second code hiding within DNA.

This second code contains information that changes how scientists read the instructions contained in DNA and interpret mutations to make sense of health and disease.

This ‘second code’ will not change anything scientists do regarding studying DNA. This ‘hidden second code’ has been known and studied for decades.

Since the genetic code was deciphered in the 1960s, scientists have assumed that it was used exclusively to write information about proteins. UW scientists were stunned to discover that genomes use the genetic code to write two separate languages. One describes how proteins are made, and the other instructs the cell on how genes are controlled. One language is written on top of the other, which is why the second language remained hidden for so long.

Let me rewrite this paragraph to make it factual:

Since the genetic code was deciphered in the 1960s, scientists have continued to find additional layers of complexity in the regulation of how genes are transcribed to make proteins. The current study from UW scientists has added further knowledge to this growing field.

This is the most unfortunate part:

“For over 40 years we have assumed that DNA changes affecting the genetic code solely impact how proteins are made,” said Stamatoyannopoulos. “Now we know that this basic assumption about reading the human genome missed half of the picture. These new findings highlight that DNA is an incredibly powerful information storage device, which nature has fully exploited in unexpected ways.”

This release was written by writers in a news department as a marketing piece, but it is very sad when the scientist also grossly exaggerates the findings. As Emily Willingham said in Forbes, “I can only hope that Stamatoyanopoulos didn’t really say that”. Scientists have not made any such assumption and have decades of evidence to the contrary.

The study shows that changes in the DNA sequence can have two-fold consequences for the protein made from it. A change can alter the amino acid sequence of the protein, and it can alter which proteins bind to the DNA to help transcribe it into the RNA used to create the protein. This is not new. The finding that made this study worthy of the prestige of publication in Science is how frequently the DNA code is also used to determine which proteins bind to the DNA to create the right form of the protein. These proteins, known as transcription factors, have been known for decades and bind to a number of DNA sequences to ensure the cell creates the exact protein needed.

As is common in press releases, the last part of the piece tries to explain DNA and the language of genes. In this aspect, the release does an even worse job:

The genetic code uses a 64-letter alphabet called codons.

The genetic code uses 64 different combinations of three nucleotides, called codons, most of which code for one of the twenty amino acids needed to make a protein.
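To see where the 64 comes from, here is a minimal sketch that enumerates every three-letter combination of the four RNA bases. It assumes the standard genetic code, in which three codons (UAA, UAG, UGA) act as stop signals; the variable names are mine, for illustration only.

```python
from itertools import product

# The four RNA bases. Codons are read three bases at a time.
BASES = "ACGU"

# Stop codons in the standard genetic code (an assumption of this sketch;
# a few organisms and organelles use slightly different codes).
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every ordered combination of three bases: 4 * 4 * 4 possibilities.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons that specify an amino acid rather than "stop".
sense_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons))        # 64
print(len(sense_codons))  # 61
```

So 61 codons are shared among twenty amino acids, which is why most amino acids can be spelled more than one way.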

I could keep going, but I’m exhausted by trying to set the record straight.

Never send to know for whom the budget tolls, it tolls for thee: an open letter Part II.

The sixty years between World War II and September 11, 2001 were unparalleled for discovery and innovation even though they were fueled by fear: first, fear of Japan and Germany, and later of Russia. After the war was over, a new ominous threat emerged that (in the eyes of most) threatened our future as a country if not defeated: communism. ‘Necessity breeds invention’ certainly held true during the Cold War, and our Research & Development infrastructure became the envy of other countries.

On the tragic day that 2,977 innocent people died, our nation changed. We awoke to a new, hidden enemy with no national boundaries. It brought us together like nothing before. We were united. But some quickly turned to ideology and misinformation, leading us into constant military offensives with no real way to fund them. One of the first schemes was to increase the maximum allowable interest rate on student loans. Personally, mine went from 2% to 6% overnight.

R&D spending initially rose after 2001. However, this was due mostly to increases in the Defense Department R&D budget.

Later on, other sources were needed to continue funding war campaigns. Although most of this funding was borrowed against our future generations, the rest came from discretionary spending. One of the major discretionary spending bills is the one for science R&D and energy R&D. When the Human Genome Project was completed in April 2003, America’s largest scientific spending project in history was over. Those funds were not redirected to other science and technology programs, and no other big science project has come to fruition since (the Spallation Neutron Source began construction prior to 9-11-01).

Infographic: You vs. Your Microbiome


A simple illustration of what your genome is up against. This is a representation of the proportion of your DNA (in red) in relation to that of the 10,000 or so species of bacteria that live in or on you (in black). If you are keeping score, the microbes win 100 to 1.

The biggest ring under the little Big Top: The bacterial circus revisited

Continuing on the theme that bacteria are Nature’s smallest circus, I want to highlight the most glaring problem with our knowledge of these 2000-ring circuses. We have discussed how proteins encoded by genes within a microbe’s genome often work together to carry out their function, i.e. in pathways (or rings). To date, according to the NCBI genome site, 4019 bacterial genomes have been sequenced to the point that we know the number of genes and proteins each organism contains. This equates to 7,309,205 genes in total, or roughly 1818 genes per genome. These are astonishing numbers. To show our futility as experts of all things natural, over 30% of these genes are considered hypothetical or uncharacterized. In some genomes, these genes make up 60% of the total. These terms are a technical way of saying “hell if we know what they do”. Computers have recognized them as genes or open reading frames, but the gene itself isn’t similar enough to known or characterized genes for scientists or computers to call it “the same”. If the functions of these gene products (proteins) are unknown, they cannot be assigned to a ring in the circus, which makes the “unknown” ring by far the largest ring in any bacterial circus.

Rings of the Bacterial Circus
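
For anyone who wants to check the arithmetic, here is a quick sketch using only the NCBI figures quoted above. The 30% is treated as a lower bound ("over 30%"), and the variable names are mine.

```python
# Figures quoted in the post (from the NCBI genome site at the time).
total_genomes = 4019
total_genes = 7_309_205

# "Over 30%" of genes are hypothetical or uncharacterized, so treat
# 30% as a lower bound for this back-of-the-envelope estimate.
hypothetical_percent = 30

# Average number of genes in one sequenced bacterial genome.
genes_per_genome = total_genes / total_genomes

# Integer arithmetic avoids floating-point rounding surprises.
hypothetical_genes = total_genes * hypothetical_percent // 100

print(f"Average genes per genome: {genes_per_genome:.1f}")
print(f"Genes with unknown function (at least): {hypothetical_genes:,}")
```

That is well over two million genes sitting in the “unknown” ring.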

What is in a genome?

English language svg version of Image:Plasmid (numbers).svg Description : This image shows a line drawing of a bacterium with its chromosomal DNA and several plasmids within it. The bacterium is drawn as a large oval. Within the bacterium, small to medium size circles illustrate the plasmids, and one long thin closed line that intersects itself repeatedly illustrates the chromosomal DNA. (Photo credit: Wikipedia)

I first have to apologize. The mission of this blog is to inform those who are curious about science and nature. My ADD gets the best of me sometimes and I digress towards policy and advocacy.

So…what is in a genome? A broad question with lots of answers. Let’s start with the ‘simple’ example of a genome: bacteria. Unlike humans and other animals, bacteria typically have only one true chromosome, which is circular. Many bacteria, however, have extra DNA not on the chromosome. This extra DNA is also circular and usually called a plasmid. Many bacteria have several plasmids, and some even have very large plasmids called megaplasmids.

There is not a lot of room within a bacterial cell, so there is not a lot of ‘junk’ DNA in its chromosome. If an average gene is 1000 base pairs (bp), then a 7 Mbp (7,000,000 bp) genome usually has about 6500 genes. This means bacteria pack a big punch into a small cellular blueprint. Other than genes, bacteria contain DNA elements that help regulate which genes are actively transcribed into RNA, and when, to produce functional proteins. Promoters are areas of DNA upstream of genes that are attractive places for some proteins to interact with. Some proteins activate gene transcription while others repress it. This ensures only the proteins needed by the cell are being produced, since making and degrading unneeded proteins costs energy.
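
The round numbers above can be turned into a rough coding density. This is only a sketch built on the illustrative figures in this post (1000 bp per gene, 7 Mbp genome, 6500 genes), not measurements from a real organism.

```python
# Illustrative round numbers from the paragraph above.
genome_size_bp = 7_000_000
average_gene_bp = 1_000
gene_count = 6_500

# Base pairs taken up by genes, and what is left over for everything else
# (promoters, other regulatory elements, and any 'junk').
coding_bp = gene_count * average_gene_bp
noncoding_bp = genome_size_bp - coding_bp

# Fraction of the genome occupied by genes.
coding_fraction = coding_bp / genome_size_bp

print(f"Coding: {coding_bp:,} bp ({coding_fraction:.0%} of the genome)")
print(f"Non-coding: {noncoding_bp:,} bp")
```

With these numbers, roughly 93% of the genome is genes, leaving only about half a million base pairs for regulation and everything else.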

What about plants and animals?

I’ll leave plants out since I’m not knowledgeable enough to write about them. Animals have very elaborate genomes. The number of chromosomes varies from organism to organism, and the chromosomes are not circular. For simplicity, I will discuss humans. Humans have matching pairs of each chromosome, except for the pair that determines a person’s sex. Even matching chromosomes differ in the characteristics of individual genes (see dominant and recessive alleles). Strands of DNA are wrapped around proteins known as histones, which interact to compact the size of the chromosome.

The major chromatin structures. (Photo credit: Wikipedia)

Human genes have MANY ways of being regulated. The histones themselves can undergo modification by enzymes, which affects how compact they are and how attractive they are to proteins regulating gene transcription. Like bacterial genes, human genes can be regulated by promoters. However, unlike in bacteria, not all of a human gene is used to make a protein. Human genes are composed of introns, regions not translated into protein, and exons, regions that are translated into protein. Human messenger RNA is processed after transcription, which removes the intron sequences and leaves only the exons to be shuttled out of the nucleus for protein synthesis. To make this more complicated, many genes can undergo something called alternative splicing during processing. This means that as the mRNA is being processed, even some exons can be removed, resulting in different versions of a protein!
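
Splicing is easier to picture with a toy example. The sketch below uses made-up sequences and a hypothetical `splice` helper of my own; real splicing depends on signals within the RNA and a large molecular machine, none of which is modeled here.

```python
# A toy pre-mRNA as an ordered list of (kind, sequence) segments.
# The sequences are invented for illustration only.
segments = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUCCAG"),
    ("exon", "GAAUUC"),
    ("intron", "GUGAGUUUAG"),
    ("exon", "UGCUAA"),
]


def splice(segments, skipped_exons=frozenset()):
    """Join the exons in order, dropping introns and any skipped exons.

    Exons are numbered from 1 in the order they appear.
    """
    mrna_parts = []
    exon_number = 0
    for kind, sequence in segments:
        if kind != "exon":
            continue  # introns are always removed
        exon_number += 1
        if exon_number in skipped_exons:
            continue  # alternative splicing: this exon is left out
        mrna_parts.append(sequence)
    return "".join(mrna_parts)


# All three exons kept.
full_length = splice(segments)

# Same gene, but exon 2 is skipped, giving a shorter message.
exon2_skipped = splice(segments, skipped_exons={2})

print(full_length)    # AUGGCCGAAUUCUGCUAA
print(exon2_skipped)  # AUGGCCUGCUAA
```

One gene, two different messages, and so potentially two different versions of the protein.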

Other elements can regulate gene transcription besides promoters. Animals have DNA elements called enhancers and insulators that may or may not be located close to actual genes. Enhancers and insulators can intricately interact to regulate gene expression.

Organization of DNA into a chromosome. National Human Genome Research (USA) (Photo credit: Wikipedia)


I will leave it at this. I hope you enjoyed my little ramble about genomes. Let me know what you think…please…