Showing posts with label metagenomics. Show all posts

Tuesday, May 15, 2012

Useful comparative analysis of sequence classification systems w/ a few questionable bits

There is a useful new publication just out: BMC Bioinformatics | Abstract | A comparative evaluation of sequence classification programs by Adam L Bazinet and Michael P Cummings.  In the paper the authors attempt to do a systematic comparison of tools for classifying DNA sequences according to the taxonomy of the organism from which they come.

I have been interested in such activities since, well, since 1989 when I started working in Colleen Cavanaugh's lab at Harvard sequencing rRNA genes to do classification.  And I have known one of the authors, Michael Cummings, for almost as long.

Their abstract does a decent job of summing up what they did:


Background
A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ’barcoding genes’ like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. 
Results
We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. 
Conclusions
We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs.

The three main categories of methods they identified are

  • Programs that primarily utilize sequence similarity search
  • Programs that primarily utilize sequence composition models (like CompostBin from my lab)
  • Programs that primarily utilize phylogenetic methods (like AMPHORA & STAP from my lab)
The paper has some detailed discussion and comparison of some of the methods in each category.  They even made a tree of the methods

Figure 1. Program clustering. A neighbor-joining tree that clusters the classification programs based on their similar attributes. From here.
In some ways - I love this figure.  Since, well, I love trees.  But in other ways I really really really do not like it.  I don't like it because they use an explicitly phylogenetic method (neighbor joining, which is designed to infer phylogenetic trees and not to simply cluster entities by their similarity) to cluster entities that do not have a phylogenetic history.  Why use neighbor-joining here?  What is the basis for using this method to cluster methods?  It is cute, sure.  But I don't get it.  What do deep branches represent in this case?  It drives me a bit crazy when people throw a method designed to represent branching history at a situation where clustering by similarity is needed.  Similarly it drives me crazy when similarity based clustering methods are used when history is needed.
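To make the distinction concrete, here is a minimal sketch of what plain similarity clustering of programs could look like: average-linkage clustering on Jaccard distances between attribute sets. The program names and attributes below are made up for illustration and are not taken from the paper, and this is not the procedure the authors used.

```python
from itertools import combinations

# Hypothetical programs and attributes, invented for this sketch.
programs = {
    "prog_A": {"similarity_search", "web_server"},
    "prog_B": {"similarity_search", "web_server", "marker_genes"},
    "prog_C": {"composition", "unsupervised"},
    "prog_D": {"composition", "supervised"},
    "prog_E": {"phylogenetic", "marker_genes"},
}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|: 0 for identical attribute sets, 1 for disjoint ones."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_linkage(items):
    """Agglomerative clustering with average linkage; returns the merges in order."""
    clusters = [(name,) for name in items]
    merges = []

    def dist(c1, c2):
        # Mean pairwise Jaccard distance between members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(items[x], items[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        merges.append((c1, c2, round(dist(c1, c2), 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = average_linkage(programs)
for left, right, d in merges:
    print(left, right, d)
```

In a dendrogram built this way, a deep split only means "these groups share few attributes on average". It makes no claim about ancestry or branching history, which is the claim a neighbor-joining tree implicitly carries.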

Not to take away from the paper too much since this is definitely worth a read for those working on methods to classify sequences as well as for those using such methods.  They even go so far as to test various web servers (e.g., MG-RAST) and discuss time to get results.  They also test the methods for their precision and sensitivity.  Very useful bits of information here.

So - overall I like the paper.  But one other thing in here sticks in my craw.  The discussion of "marker genes."  Below is some of the introductory text on the topic.  I have labelled some bits I do not like too much:

It is important to note that some supervised learning methods will only classify sequences that contain “marker genes”. Marker genes are ideally present in all organisms, and have a relatively high mutation rate that produces significant variation between species. The use of marker genes to classify organisms is commonly known as DNA barcoding. The 16S rRNA gene has been used to greatest effect for this purpose in the microbial world (green genes [6], RDP [7]). For animals, the mitochondrial COI gene is popular [8], and for plants the chloroplast genes rbcL and matK have been used [9]. Other strategies have been proposed, such as the use of protein-coding genes that are universal, occur only once per genome (as opposed to 16S rRNA genes that can vary in copy number), and are rarely horizontally transferred [10]. Marker gene databases and their constitutive multiple alignments and phylogenies are usually carefully curated, so taxonomic and functional assignments based on marker genes are likely to show gains in both accuracy and speed over methods that analyze input sequences less discriminately. However, if the sequencing was not specially targeted [11], reads that contain marker genes may only account for a small percentage of a metagenomic sample.  
I think I will just leave these highlighted sections uncommented upon and leave it to people to imagine what I don't like about them .. for now.

Anyway - again - the paper is worth checking out.  And if you want to know more about methods used for classifying sequences see this Mendeley collection which focuses on metagenomic analysis but has many additional papers on top of the ones discussed in this paper.

Friday, April 27, 2012

Phylogenetic analysis of metagenomic data - Mendeley group ...

Just a little plug for a Mendeley reference collection I have been helping make on "Phylogenetic and related analyses of metagenomic data." If you want to know more about such studies you can find a growing list of publications at the group collection.

Phylogenetic and related analyses of metagenomic data is a group in Biological Sciences on Mendeley.

Tuesday, March 20, 2012

OMICS Driven Microbial Ecology ...

Quick post here.  Just discovered a nice review paper by Suenaga on targeted metagenomics: Targeted metagenomics: a high-resolution metagenomics approach for specific gene clusters in complex microbial communities - Suenaga - 2011 - Environmental Microbiology

This "Special Issue" on "OMICS Driven Microbial Ecology" has a series of papers, all of which seem to be freely available, of potential interest to readers of this blog including:

and more

Oh, and a paper of mine (with Alex Worden and other members of her lab as well as multiple others)

Thursday, February 23, 2012

PCR amplification and pyrosequencing of rpoB as complement to rRNA

Figure 1. Number of OTUs as a function of fractional sequence difference (OTU cut-off) for the 16S rRNA marker gene (A) and the rpoB marker gene (B).

Interesting new paper in PLoS One: PLoS ONE: A Comparison of rpoB and 16S rRNA as Markers in Pyrosequencing Studies of Bacterial Diversity

In the paper they test and use PCR amplification and pyrosequencing of the rpoB gene for studies of the diversity of bacteria. Due to the lower level of conservation of rpoB than rRNA genes at the DNA level they focused on proteobacteria. It seems that with a little perseverance one can get PCR for protein coding genes to work reasonably well for even reasonably broad taxonomic groups (not totally new here but I am not aware of too many papers doing this with pyrosequencing). Anyway, the paper is worth a look.
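For readers less familiar with the idea of plotting the number of OTUs against the OTU cut-off, a rough sketch follows. It uses a simple greedy clustering on toy, pre-aligned reads that are invented for illustration. It is not the pipeline used in the paper, and real analyses rely on proper alignment and noise removal.

```python
def fractional_difference(a, b):
    """Fraction of mismatched positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def count_otus(seqs, cutoff):
    """Greedy clustering: a read joins the first OTU whose seed is within the cutoff."""
    seeds = []
    for s in seqs:
        if not any(fractional_difference(s, seed) <= cutoff for seed in seeds):
            seeds.append(s)  # no existing OTU is close enough, so start a new one
    return len(seeds)

# Toy aligned reads, made up for this sketch (10 positions each).
reads = [
    "ACGTACGTAC",
    "ACGTACGTAT",  # 1 difference from the first read
    "ACGTACGGAT",  # 2 differences from the first read
    "TTGTACGTAC",  # 2 differences from the first read
    "TTGAACCTGG",  # 6 differences from the first read
]

otu_counts = [count_otus(reads, c) for c in (0.0, 0.1, 0.2, 0.6)]
print(otu_counts)  # [5, 4, 2, 1]
```

The count can only stay flat or drop as the cut-off loosens, which is the general shape of the curves in the figure. How quickly it drops depends on how variable the marker gene is.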


Citation:
 Vos M, Quince C, Pijl AS, de Hollander M, Kowalchuk GA (2012) A Comparison of rpoB and 16S rRNA as Markers in Pyrosequencing Studies of Bacterial Diversity. PLoS ONE 7(2): e30600. doi:10.1371/journal.pone.0030600

Monday, February 13, 2012

Cool paper from DerisiLab on viruses in unknown tropical febrile illnesses #metagenomics #viroarray

Quick post:

Figure 3. Circovirus-like NI sequence coverage and phylogeny.
Cool new paper from Joe Derisi's lab: PLoS Neglected Tropical Diseases: Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

Full citation: Yozwiak NL, Skewes-Cox P, Stenglein MD, Balmaseda A, Harris E, et al. (2012) Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing. PLoS Negl Trop Dis 6(2): e1485. doi:10.1371/journal.pntd.0001485

They used a combination of a viral microarray and metagenomic sequencing to characterize viruses in various samples from patients with febrile illness.  And they found some semi-novel viruses in the sample.  Definitely worth a look.

Note - here are some other posts of mine about Derisi:

See some follow up discussion on Google+ here.

Sunday, February 12, 2012

Microbial metaomics discussion group this week: metatranscriptomics and biogeography

A visiting student at my lab, Lea Benedicte Skov Hansen, will be leading our "metaomics" discussion group this week.  We will be discussing a combination of metatranscriptomics and biogeography and the papers of the week are:

Metatranscriptomics paper:

Microbial community gene expression in ocean surface waters. Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW, Delong EF. Proc Natl Acad Sci U S A. 2008 Mar 11;105(10):3805-10. Epub 2008 Mar 3.



Some related papers of potential interest from DeLong

We are also discussing:

Drivers of bacterial beta-diversity depend on spatial scale. Martiny JB, Eisen JA, Penn K, Allison SD, Horner-Devine MC.
Proc Natl Acad Sci U S A. 108(19):7850-4.  (NOTE I am an author on this one - but the meat of the ideas/work was done by Jen Martiny, Claire Horner-Devine and others).

Related papers of possible interest by Jen Martiny and Claire Horner-Devine include:

Will let everyone know how the discussions go.  

Friday, February 3, 2012

Interesting new metagenomics paper w/ one big big big caveat - critical software not available

Very very strange.  There is an interesting new metagenomics paper that has come out in Science this week.  It is titled "Untangling Genomes from Metagenomes: Revealing an Uncultured Class of Marine Euryarchaeota" and it is from the Armbrust lab at U. Washington.

One of the main points of this paper is that the lab has developed software that apparently can help assemble the complete genomes of organisms that are present in low abundance in a metagenomic sample.  At some point I will comment on the science in the paper (which seems very interesting), though as the paper is not Open Access I feel uncomfortable doing so since many of the readers of this blog will not be able to read it.

But something else relating to this paper is worth noting and it is disturbing to me.  In a Nature News story on the paper by Virginia Gewin there is some detail about the computational method used in the paper:
"He developed a computational method to break the stitched metagenome into chunks that could be separated into different types of organisms. He was then able to assemble the complete genome of Euryarchaeota, even though it was rare within the sample. He plans to release the software over the next six months."
What?  It is imperative that software that is so critical to a publication be released in association with the paper.  It is really unacceptable for the authors to say "we developed a novel computational method" and then to say "we will make it available in six months".  I am hoping the authors change their mind on this but I find it disturbing that Science would allow publication of a paper highlighting a new method and then not have the method be available.  If the methods and results in a paper are not usable how can one test/reproduce the work?

Saturday, December 31, 2011

Draft blog post cleanup #2: Metagenomics meets animals

OK - I am cleaning out my draft blog post list.  I start many posts and don't finish them and then they sit in the draft section of blogger.  Well, I am going to try to clean some of that up by writing some mini posts.  Here is #2:

Saw an interesting story on Genome Web: 'Denizens' of the Deep | The Daily Scan | GenomeWeb.  I have not been able to get the original article yet, but it seems that what they have done can basically be considered metagenomics for animals.  They collected sloughed off cells and other material from a lake and surveyed it for animal DNA.  This seems like a very cool derivative of metagenomic approaches and has enormous potential.  But alas, I never got down to getting access to the paper: Monitoring endangered freshwater biodiversity using environmental DNA so this will have to stay as a mini post.  Damn non open access journals ...


Monday, October 31, 2011

Fun with a $1300 3D printer - featuring @ryneches in my lab

Just a quick one here.  I am posting some links to videos and blog posts about efforts by a student in my lab - Russell Neches - to use 3D printing to help with carrying out high throughput studies of microbial diversity. Basically the idea is that we can use new very cheap 3D printer technologies to help with normalizing sample volumes by printing, in essence, microtiter dishes with variable well depth. For more on this see some of the links/videos/etc below:

From Russell's blog:
Some of Russell's videos







Aggie TV news story about Russell's work on this:

 

Sunday, October 30, 2011

Single cell genomics even has its own software: SmashCell

Somehow I was not aware of this software called SmashCell even though it came out a while ago.  But it is worth checking out if you are interested in single-cell genomics.  There is an open access paper describing the software: SmashCell: a software framework for the analysis of single-cell amplified genome sequences.

Single cell genomics is becoming more and more important in studies of environmental microbiology as well as other fields like cancer biology.  One of the challenges with single cell genomics is that the amplification processes used to make copies of the genome from single cells are not completely accurate or efficient so you frequently end up with partial, somewhat messed up samples of genomes.  If you then sequence these amplified genomes it can be hard to make sense out of the data.  Hopefully this software may be of use to some doing this type of work.

Note - for those interested in more see this PLoS One paper I am a coauthor on:

Assembling the Marine Metagenome, One Cell at a Time



Friday, September 30, 2011

Guest post from Antarctica: Joe Grzymski (@grzymski) on "The Story Behind Nitrogen Cost-Minimization"

Well, this is getting really fun. I have been doing "The Story Behind the Paper" posts for my own papers for a while and recently opened this up to guest posts. And the one today is coming to us from the true wilds - Antarctica. Joe Grzymski (aka @grzymski on Twitter) is out there doing field work (yes, microbiologists have the best field sites ...). For more on the field project see the Desert Research Institute's "Mission Antarctica" site. Joe responded to my request for more guest posts and wrote up a really nice discussion of a recent open access paper of his from the ISME Journal. If anyone else is interested in writing a guest post on an open access paper or an issue in open access, let me know ... without any further ado -- below is Joe's post


I thoroughly enjoy reading Jonathan’s posts detailing – far beyond what can possibly be included in published papers – the who, what, where, when, why and how of science. The story behind the potential fourth domain of life article in PLOS ONE provides great detail about how science is done. After reading Matthew Hahn’s insightful history and commentary on his ortholog conjecture paper I was happy to reply to the request for more "stories" and am chiming in from Antarctica (where I am currently doing field research) to discuss the story behind our recent paper in ISME J, "The significance of nitrogen cost minimization in the proteomes of marine microorganisms". I hope it will provide another example of how a lot of science is lost in final, streamlined, published versions. Also, it is work that was largely done by an undergraduate and was vigorously and carefully reviewed – the improvements and expansion of ideas because of great reviewers highlights the best of the review process. What started out as a short two-page paper morphed into a larger piece of research – not things you can properly detail in a manuscript.



What was the origin of the idea?

The story behind this paper begins in 1997 when I was in graduate school at Rutgers University. Paul Falkowski joined the faculty right around the time when he published a seminal paper, “The evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean.” Paul’s office was across from an office I shared with Jay Cullen (who will factor into the story later); Paul was on my committee and influential in how and what I studied in grad school and as a PostDoc. He constantly kept us on our toes (to say the least). Many of the implications of our recent paper were guided by his thoughts and original work on evolution of the nitrogen cycle and many papers on the functional and ecological factors that dictate the structure of phytoplankton communities. There are many papers here by Paul and the awesome Oscar Schofield- my primary dissertation adviser. Incidentally, I overlapped with Felisa Wolfe-Simon at Rutgers for a few years; she was in the science news recently [#arseniclife], and we had common advisers.

Paul’s paper was pre-genomics – but its scope and breadth are strengthened by recent work on isolates, environmental genomes and transcriptomes from the ocean. Simple mass balance says that the reason why we have oil buried deep in the earth and oxygen in the atmosphere is because photosynthesis (net carbon fixation and oxygenation of the atmosphere) exceeds respiration. During long periods of time, organisms draw down CO2, and it gets sequestered from the atmosphere. In his paper, Paul details an inextricable link between the ratios of nitrogen fixation and denitrification (across geological periods) to the potential draw down of CO2 by particulate organic carbon (namely, large sinking diatoms). That is, if nitrogen fixation is abundant and denitrification is zero, there is more available inorganic nitrogen (in the form of nitrate) in the surface ocean for phytoplankton to utilize and carbon sequestration increases. His paper further details why fixed nitrogen is limiting in the ocean surface across geological scales. It boils down to iron limitation, the specialization required to harness the beastly, triple-bond cracking but woefully inefficient nitrogenase enzyme (which has a high Fe requirement) and also the easier, multiple evolution of the process of denitrification. All of this is articulately summarized here.



How did this work advance?

Fast forward to 2001 and publication of the paper by Baudouin-Cornu et al. In this paper, links between environmental imprinting from fluctuating nutrient availability and atomic composition of assimilatory proteins are quantified. Using genome sequences from E. coli and S. cerevisiae, the authors show that carbon and sulfur assimilatory proteins have amino acid sequences that are depleted in carbon and sulfur side chains, respectively. This makes sense. Proteins high in carbon or nitrogen hardly would provide added fitness to an organism that often struggles to find enough of the nutrient to satisfy other fundamental cellular processes. Similar logic also explains why organisms tend to utilize smaller amino acids more frequently than larger ones: it takes more ATP to make a tyrosine than an alanine. Conversely, the pressure to “cost minimize” is less in organisms, like gut dwelling microbes, that have easy access to amino acids. It is not a perfect rule, but most of the time thermodynamic arguments explain a lot about why organisms do what they do. Fast forward again to Craig Venter’s genomic survey of select surface ocean sites (GOS). This (and now other) sequence data sets provided access to genomic information on organisms that inhabit various surface ocean biomes and, crucially, are largely difficult to isolate in pure culture.

What motivated the writing of the paper?

Last summer, I was sitting in my office writing a proposal. I can’t remember the specific topic, but I was thinking about cost-minimization mostly from the perspective of building proteins in cold environments and the challenges organisms face when it is cold: there is little access to organic carbon (food), and other environmental conditions hamper optimal living. I was re-reading Baudouin-Cornu, and there is a specific sentence in the paper in which the authors hypothesize that the phenomenon of cost-minimization might be a broader evolutionary strategy in resource-limited environments. I figured that organisms that did well in the oligotrophic parts of the ocean probably had mechanisms to reduce nitrogen usage, and an easy place to start reducing nitrogen is by not making so many proteins or at the very least reducing the usage of arginine, histidine, lysine, asparagine, tryptophan and glutamine – amino acids with at least one added nitrogen on their side chains.
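As a rough illustration of the quantity involved, the sketch below counts nitrogen atoms per residue side chain for a protein sequence, using standard side-chain chemistry for the six amino acids listed above. The function name and the toy peptides are made up for this example, and this is not the code used for the paper.

```python
# Nitrogen atoms in each amino acid side chain (the backbone nitrogen is excluded).
# Any residue not listed here has no side-chain nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

def n_per_side_chain(protein):
    """Average number of side-chain nitrogen atoms per residue in a protein sequence."""
    # Drop stop symbols, gaps and the ambiguity code X before counting.
    residues = [aa for aa in protein.upper() if aa.isalpha() and aa != "X"]
    if not residues:
        raise ValueError("no residues to count")
    return sum(SIDE_CHAIN_N.get(aa, 0) for aa in residues) / len(residues)

# Toy peptides, invented for illustration.
nitrogen_rich = n_per_side_chain("MKRHAL")  # K (1) + R (3) + H (2) over 6 residues
nitrogen_poor = n_per_side_chain("MAVLGS")  # no side-chain nitrogen at all
print(nitrogen_rich, nitrogen_poor)  # 1.0 0.0
```

Averaging this value over every predicted protein in a genome or metagenome gives a single number per sample that can then be compared across environments.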

This is a good spot to introduce my co-author, Alex Dussaq.

Co-author, Alex Dussaq

Alex completed his honors undergraduate work in mathematics and biochemistry and was working with me on some coding and analysis projects. To follow Matthew’s example, the conversation that started this paper went like this:

Joe: Alex, I have an interesting idea I want to discuss in a proposal… do you think you can download all the GOS data and calculate the nitrogen, C, H and S atoms per residue side chain as in this paper (hand him Baudouin-Cornu) and then correlate those values with chlorophyll (a proxy for phytoplankton and thus primary productivity), NO3 and Fe. This would be just one figure in the proposal.

Alex: OK, sure that should be pretty easy.

Joe: My proposal is due next week so I need the numbers quickly.

Alex: Yeah, yeah.

Alex codes easier than most people write in their native language. By the way, Alex has moved on to a combined Ph.D./M.D. program at UAB through which he hopes to combine genomics research with new approaches to medicine. I have no doubt he will do unbelievably well in science.

I think that downloading organized data was initially more difficult than it should have been - we spend so much money generating data and so little taking care of it - but we had average values after a few days for several oligotrophic GOS sites and some coastal ocean GOS sites that were convincing enough to put in the proposal. Unfortunately, there are no great metadata – especially physical and chemical characterization of the GOS sites – so we used the “distance to continental land mass” as a proxy for nitrate concentration and oligotrophy (this stung us at first in review). After a week, Alex analyzed all the GOS data and a few important isolated, single organism genomes that factor in the story. After a little less than a month, we had a draft of a two-page brevia that we submitted to Science. It was a simple story that showed data from coastal and open-ocean GOS sites. We found a clear relationship between frequency of nitrogen atoms in side chains of proteins and distance from continental land mass (a proxy for nutrient availability as there are lots of nutrients running off our land). The main conclusion of the paper was that organisms living in oligotrophic oceans tend to have reduced nitrogen content of proteins. Kudos to Alex for some great work.

What was the larger context for the initial findings?

We tried to write the paper from a broader evolutionary and biogeochemical perspective (and used the aforementioned paper by Paul Falkowski as a model). We talked about the implications of organisms in the ocean that are under selective pressure to cost minimize with respect to nitrogen. I’d be happy to share the original submission with anyone who wants to see the evolution of a paper; just contact me. I’d post it here, but Jonathan might charge me for the bytes given how long this is turning out to be. Great reviews make good stories that are decently executed a lot better.

How did the reviewers react?

When reviews of a paper are longer than the original submission, you have an indication that the paper prompted some thought. We received three comprehensive reviews to a two-page paper that contained one main figure and some supplemental material. Given that I didn’t think we could spend time on the subject, we attempted to be brief, too brief especially when compared to the final open access result in ISME. Next, I’ll review some criticisms of the nitrogen cost-minimization hypothesis (having our paper handy will be helpful):

1. Nitrogen cost minimization by simply looking at the predicted proteomes of organisms or environmental genomes assumes that all proteins are made de novo when salvage pathways and dissolved free amino acids (DFAAs) and higher mol. weight/energy compounds are utilized.

Looking at predicted proteomes is indeed a simplification in much the same way that analyzing codon usage frequencies was a simple way to identify with varying degrees of certainty highly expressed genes. No doubt, organisms have multiple methods to acquire the energy they need – especially when under rate-limiting conditions. For example, the pervasive transfer of proteorhodopsin to many different marine microbes presumably helps overcome some nutrient limitation situations by providing added energy from the sun (in the form of a proton gradient), perhaps to aid in transport. The predicted proteome analysis just says that organisms that live in low N waters have lower frequencies of N in their side chains than organisms in the coastal ocean (or in say a sludge metagenome). It doesn’t discount the importance of gene expression, the fact that cells are not “averages” of the genome, etc. None of that really fits into a two-page paper.


2. In our paper, we used the diazotroph Trichodesmium as a model open-ocean organism that was severely N-cost-minimized and compared this to similar success of the SAR11 organism, Pelagibacter ubique. We were criticized because N-fixation should help an organism overcome any N stress.

This was clarified in our next, longer draft. As was shown in the elegant paper by Baudouin-Cornu, assimilatory proteins reflect the “history” of an organism trying to compete for the very atom or molecule they are trying to assimilate. Thus, Trichodesmium would hardly bother to break the triple bond of dinitrogen costing 16 ATP to make ammonia if they were swimming in a vat of inorganic nitrogen. Or put differently, the nitrogenase operon should be nitrogen-cost-minimized reflecting the assimilatory costs of acquiring N. This is, indeed, the case.

3. Why not calculate the bio-energetic costs associated with changes in N content?

We ended up doing this by proxy in the ISME paper. But it raised a far more interesting point that we pursued in further detail and a chicken/egg argument that was pursued subsequently by another reviewer. If you simply plot N atoms per amino acid side chain versus GC, you get a relationship that looks like this:





This is neither surprising nor novel. But it highlights well the "cost" of having a high GC versus low GC genome in terms of added nitrogen atoms in proteins. These data plotted are all marine microbes but the result is universal.
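One way to see where this relationship comes from is the genetic code itself. The sketch below groups the sense codons of the standard code by how many G or C bases they contain and averages the side-chain nitrogen of the encoded amino acid. It assumes every sense codon is used equally often, which no real genome does, so it only illustrates the direction of the trend and is not an analysis from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI translation table 1); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Nitrogen atoms in each amino acid side chain; unlisted residues have none.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

def mean_n_by_codon_gc():
    """Mean side-chain N of the encoded residue for codons with 0, 1, 2 or 3 G/C bases."""
    totals = {gc: [0, 0] for gc in range(4)}  # gc count -> [sum of N atoms, codon count]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons encode no residue
        gc = sum(base in "GC" for base in codon)
        totals[gc][0] += SIDE_CHAIN_N.get(aa, 0)
        totals[gc][1] += 1
    return {gc: n / count for gc, (n, count) in totals.items()}

means = mean_n_by_codon_gc()
for gc, value in means.items():
    print(gc, round(value, 3))
```

Under that equal-usage assumption the mean rises from about 0.29 nitrogen atoms per side chain for codons with no G or C to 0.75 for all-GC codons, largely because four of the six arginine codons are GC-rich.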

Furthermore, if you plot GC versus median mass of amino acids in the predicted proteome of organisms you get this:



The relationship between GC and the average mass of amino acids is strong. And, this is one of the places where the story gets interesting. Organisms that have low GC genomes have inherently heavier proteins… i.e., All resources being equal and all metabolic pathways being the same (rare, I know), a low GC organism is going to invest more ATP and NADH to make the same protein as a high GC organism. Let’s ignore why this might not matter if you are Helicobacter pylori and quite comfortable acquiring amino acids from your host but focus on ocean microbes. There is a trade-off for all organisms simply based on the GC content of the genome. If you have a low GC genome, you have (on average) larger proteins and less N in your proteins than a high GC genome. Is this trade-off the reason why many of the most successful organisms in the ocean have low GC content? Probably not, but it has to be considered a contributing factor. Constant low nitrogen has to be a major selective pressure given the recent biogeochemical history of the ocean as pointed out in Falkowski (1997). In the final version of the ISME paper, we model differences in the nitrogen budgets of various “model” organisms based on some trade-offs. It was a decent first step, showing that N-cost minimization actually matters.

4. How do you make a quantifiable association between organisms that are so diversely located in space/time and environmental forcing like N availability?

This is a fundamental question in microbial ecology (example, and another). How do we tackle why and when organisms are going to be abundant? Here, I think there are two approaches worth taking. First, what specific genome/metabolic characteristics determine success under specific conditions? For example, what are the characteristics of SAR11 that enable them to “thrive” in oligotrophic waters while their alphaproteobacteria neighbors, the Roseobacter, tend to do better in waters that are more hyper-variable (like the coastal ocean)? Lauro et al. define the characteristics that can be found in genomes of oligotrophic versus copiotrophic organisms. Second, given specific global biogeochemical patterns and environmental forcing constraints, how do we predict organisms will respond? Put in the context of nitrogen cost-minimization, we can ask, “Over geological time will low N waters continue to exert pressure on organisms such that either organisms with N-cost-minimized genomes will thrive or will organisms be forced on a downward GC content trajectory to ease some of this burden?” In our paper, we suggest that the evolutionary history of organisms hints at the impacts nutrient limitations are having on organisms. And this, of course, is by no means new. A beautiful example (albeit not open access).





The divergence of the cyanobacteria Synechococcus and Prochlorococcus during the rise of the diatoms – the most important phytoplankton group in the ocean – suggests the impact of biogeochemical changes on marine microbes. The diversification and proliferation of diatoms in the oceans marginalized cyanobacteria. Diatoms are the workhorses of the ocean biogenic carbon cycle – in comparison to cyanobacteria, they grow quickly and sink faster – thus they sequester fixed CO2, N and Fe that all other surface ocean microbes need. The diatoms changed the ocean, thus putting pressure on cyanobacteria. A result (because many other things also happened) was the genome streamlining and niche adaptation of the lineage. The best example is the high-light adapted MED4 strain of Prochlorococcus. This particular strain has a small genome, low GC and is nitrogen-cost-minimized, as detailed in our paper. Diatoms marginalized cyanobacteria forcing them into specific niches (e.g., high-light, low Fe, low N, low P) where they are successful and well adapted (like these clades that live in iron poor water).

Where are we heading?

What are the implications of cost-minimization in the genomes of ocean microbes? Could it alter the overall nutrient pools in the surface ocean (and thus affect the potential CO2 draw down by phytoplankton)? These are questions we are now pursuing using modeling approaches in an attempt to bolster our understanding of biogeochemistry through genomics and microbial ecology. We are teaming up with Jay Cullen, a chemical oceanography professor, good friend and super smart guy to figure out if cost-minimization and other metabolic changes in microbes might be having more of an effect on biogeochemical cycles than we think. Stay tuned.

Tuesday, September 27, 2011

Blast from the past: video of a talk I gave in 2006 #metagenomics

Just re-found this video and posted it to YouTube.  It is from a talk I gave at the first "International Metagenomics Meeting" in 2006.

I think one may still be able to view videos from the CalIT2/UCSD page here. But I thought it might be better to have this talk on YouTube than at the CalIT site so I posted it ... hope they don't sue me.

Note - I wrote a blog post about the meeting here:
The Tree of Life: Metagenomics 2006

Tuesday, September 13, 2011

Storification of my notes/tweets from #UCDavis CLIMB Symposium "The infant gut microbiome: prebiotics, probiotics and establishment"

I made a Storify posting for the CLIMB Symposium I participated in yesterday. First I am reposting my summary of what the symposium was about which I posted the day before the meeting:
There is a symposium tomorrow at UC Davis organized by undergraduates in the CLIMB program.  CLIMB stands for "Collaborative Learning at the Interface of Mathematics and Biology" and is a program that emphasizes hands-on training using mathematics and computation to answer state-of-the-art questions in biology.  A select group of undergraduates participate in the program and this summer the students had to do some sort of modelling project.  Somehow I managed to convince them to do work on human gut microbes.  And they have done a remarkable job.
As part of their summer work, they organized a symposium on the topic and their symposium takes place tomorrow.  Details are below. 
The Infant Gut Microbiome: Prebiotics, Probiotics, & Establishment 
  • Jonathan Eisen, UC Davis “DNA and the hidden world of microbes”
  • Mark Underwood, UC Davis “Dysbiosis and necrotizing enterocolitis”
  • Ruth Ley, Cornell University “Host-microbial interactions and metabolic syndrome” 
  • CLIMB 2010 cohort “Breast milk metabolism and bacterial coexistence in the infant microbiome”
  • David Relman, Stanford University “Early days: assembly of the human gut microbiome during childhood”
  • Bruce German, UC Davis
The only major issue for me is I am losing my voice.  So we will see how this goes.  Though I note I have gotten some very sage advice on how to treat my voice problem via the magic of twitter.  If I do not collapse I will also be tweeting/posting about the other talks during the day. 


Anyway - here is the storification:

Sunday, September 11, 2011

Coming Monday at #UCDavis "The Infant Gut Microbiome: Prebiotics, Probiotics, & Establishment"

Just a little announcement here.  There is a symposium tomorrow at UC Davis organized by undergraduates in the CLIMB program.  CLIMB stands for "Collaborative Learning at the Interface of Mathematics and Biology" and is a program that emphasizes hands-on training using mathematics and computation to answer state-of-the-art questions in biology.  A select group of undergraduates participate in the program and this summer the students had to do some sort of modelling project.  Somehow I managed to convince them to do work on human gut microbes.  And they have done a remarkable job.

As part of their summer work, they organized a symposium on the topic and their symposium takes place tomorrow.  Details are below.
The Infant Gut Microbiome: Prebiotics, Probiotics, & Establishment

Monday, 12 September 2011, 9am-4pm

Life Sciences 1022

UC Davis

9:00-9:10 Introduction

9:10-9:40 Jonathan Eisen, UC Davis

“DNA and the hidden world of microbes”

9:40-10:40 Mark Underwood, UC Davis

“Dysbiosis and necrotizing enterocolitis”

10:40-10:50 break

10:50-11:50 Ruth Ley, Cornell University

“Host-microbial interactions and metabolic syndrome”

11:50-12:00 general discussion

12:00-1:00 lunch

1:00-2:00 CLIMB 2010 cohort

“Breast milk metabolism and bacterial coexistence in the infant microbiome”

2:00-2:10 break

2:10-3:10 David Relman, Stanford University

“Early days: assembly of the human gut microbiome during childhood”

3:10-3:40 Bruce German, UC Davis

3:40-4:00 next steps
The only major issue for me is I am losing my voice.  So we will see how this goes.  Though I note I have gotten some very sage advice on how to treat my voice problem via the magic of twitter.  If I do not collapse I will also be tweeting/posting about the other talks during the day.





Friday, September 9, 2011

A Forest (Rohwer that is) on Black Reefs, Shipwrecks and Coral Reef Conservation

Well Forest Rohwer is at it again.  He just is always doing something I find worth paying attention to.  

First, he does fascinating and pioneering science on viruses in the environment.  For example, consider that he was one of if not the first to do random shotgun metagenomics from environmental samples.  See his lab's 2001 and 2002 papers on the topic (Production of shotgun libraries using random amplification and Genomic analysis of uncultured marine viral communities) which I note came out before the Sargasso and Acid Mine Drainage papers which most cite as the first environmental shotgun sequencing pubs.  



In fact, you could say in many ways we do very similar work, except he focuses on viruses.  Not that we always agree mind you. I once gave a talk after him at a meeting and I changed my title to "Seeing the Forest and Missing the Trees" in a little dig at his not using phylogenetic methods in his approach to metagenomic analysis.  But I digress.

What I want to write about today is a new paper from his lab: Black reefs: iron-induced phase shifts on coral reefs.


Alas, it is not freely available as it is in ISME but is not published under their "open" option.  Am working on getting a link to an available PDF ... will let everyone know.


Here is the abstract:
The Line Islands are calcium carbonate coral reef platforms located in iron-poor regions of the central Pacific. Natural terrestrial run-off of iron is non-existent and aerial deposition is extremely low. However, a number of ship groundings have occurred on these atolls. The reefs surrounding the shipwreck debris are characterized by high benthic cover of turf algae, macroalgae, cyanobacterial mats and corallimorphs, as well as particulate-laden, cloudy water. These sites also have very low coral and crustose coralline algal cover and are called black reefs because of the dark-colored benthic community and reduced clarity of the overlying water column. Here we use a combination of benthic surveys, chemistry, metagenomics and microcosms to investigate if and how shipwrecks initiate and maintain black reefs. Comparative surveys show that the live coral cover was reduced from 40 to 60% to <10% on black reefs on Millennium, Tabuaeran and Kingman. These three sites are relatively large (>0.75 km²). The phase shift occurs rapidly; the Kingman black reef formed within 3 years of the ship grounding. Iron concentrations in algae tissue from the Millennium black reef site were six times higher than in algae collected from reference sites. Metagenomic sequencing of the Millennium Atoll black reef-associated microbial community was enriched in iron-associated virulence genes and known pathogens. Microcosm experiments showed that corals were killed by black reef rubble through microbial activity. Together these results demonstrate that shipwrecks and their associated iron pose significant threats to coral reefs in iron-limited regions.
Forest and others have recently been studying the Line Islands because they are relatively undisturbed reefs. Here is a short video about the work there (the work in general, not this specific study per se):

Anyway, the new paper does something very different.  It focuses on shipwrecks and the impact of these wrecks on reefs.  This is of particular interest because, as indicated in the abstract, the reefs are very low in iron.  And many shipwrecks introduce massive amounts of iron.  What they conclude in this new paper is that the iron from the shipwrecks leads to algal blooms and to rapid killing of / damage to the pristine reefs.

For more on the paper there is an article in National Geographic Newswatch by Enric Sala worth checking out.

Forest also wrote me some information by email.  He states:

Black reefs are associated with shipwrecks or other debris in this region of the world. These sites are interesting both from a conservation and scientific point of view. As a conservation issue, they are amazingly destructive. Kingman, one of the jewels of the USA coral reefs, has lost >1 km of the lagoon in less than 3 years. An old wreck on Fanning atoll has killed about 10% of their reef.
Visually, the black reefs are some of the eeriest places I've ever seen. The bottom is completely covered in different algae (including cyanobacterial mats), the water is filled with marine snow, and there is dark precipitate on the benthos (probably sulfur). We just published a paper in ISME where we have recreated the precipitate, cloudiness, and coral death in microcosms by combining rubble from the black reefs with corals and an iron addition. Addition of antibiotics blocks the coral death, precipitate, and marine snow, suggesting a microbial role.
The black reefs are probably caused by iron-enrichment from the wrecks and debris. We think black reefs are specific to non-emergent coral reefs, where iron is a limiting nutrient. Our current model is that iron stimulation of algae leads to increased microbial activity and coral death. In support of this, metagenomic analysis of the microbial community showed an enrichment of iron-related pathogenicity factors.
Forest also adds a plea to help in conservation of these reefs.

If you are interested in conservation, then please help us petition Congress to support removal of the wrecks and debris. Please contact Emily Douce at the Marine Conservation Biology Institute.
I encourage people to contact her.

Wednesday, September 7, 2011

My science communication hero/heroine of the month - Dr. Kiki @drkiki

Been working on revising my lab's web site and was looking for some videos of talks I have given online to post there.  And I discovered/rediscovered this video of an interview I did for Dr. Kiki's Science Hour.  Here it is:

NOTE - AT LEAST TEMPORARILY REMOVING THE VIDEO DUE TO MALWARE INFECTION OF TWIT.TV SITE

Now I know - this is over a year old. But I just watched the full video. Not so bad I think.

As many of you know, I like to talk.  And talk.  And talk.  But I would like to say that as an interviewer, Dr. Kiki is pretty frigging awesome.  Don't know how she does it.  But I am going to post this video on the new lab page and point people to it if they want to know what my lab does and what I am interested in.

But enough about me.  I want to thank Dr. Kiki for this great interview by saying a little bit about her.  Or, well, her work in science communication.



As some of you may know, I listen to podcasts of TWIS - This Week in Science frequently on my bike rides to work.  And I really recommend anyone/everyone out there give it a whirl.  It is sort of like Science Friday but it is a bit edgier, a bit funnier, a bit goofier, and a bit sciencier (is that a word?)  Dr. Kiki and Justin on it are great and it is so good that I frequently sit outside my building listening to the end of a show if I take the short ride to work which is less than an hour.  So if you like Science - you really should check out the TWIS web site and find some way to listen such as what I do by subscribing to their podcasts at iTunes.

And I guess now I will be checking out "Dr. Kiki's Science Hour" more after rewatching this video.  There are many many more shows at twit.tv/kiki.  I have not checked out as many as TWIS shows but the ones I have watched are great.

And if you want to follow her more directly check out her Blog: The Bird's Brain, or her twitter feed  (@drkiki)  or her  Google+ feed.

Very proud that she is a UC Davis alum ... and just want to say thanks to her for giving me a video I can share with others that says more about me and my lab than almost anything I have written.

Thursday, September 1, 2011

I think that I shall never see - metagenomic analysis as lovely as a tree #PhylogenyRules #PLoSOne

Figure 2. Phylogenetic tree linking metagenomic sequences from 31 gene families along an oceanic depth gradient at the HOT ALOHA site
I am a co-author on a new paper that came out in PLoS One yesterday.  The paper is PLoS ONE: The Phylogenetic Diversity of Metagenomes and the full citation is Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214.

The first author is Steven Kembel, a brilliant post doc at the University of Oregon.  You can follow him on twitter here. This paper is a product of the iSEEM ("integrating statistical, ecological and evolutionary approaches to metagenomics") collaboration between my lab and the labs of Jessica Green at U. Oregon and Katie Pollard at UCSF.  For more on iSEEM see http://iseem.org.  iSEEM was supported by the Gordon and Betty Moore Foundation.



Anyway - the paper focuses on developing and using a new method for assessing the phylogenetic diversity of microbes in samples via analysis of metagenomic data.  Phylogenetic diversity (aka PD) is measured by building evolutionary trees and summing up the total length of branches in such trees.  It is an important diversity metric and is complementary to metrics such as "species richness", which is a measure of the number of species in a sample. When one counts species in a sample, one ends up ignoring the evolutionary distances between species and thus one may get an incomplete picture of the diversity of organisms in a sample simply by counting species.  For example, a sample that contains 500 different species in the genus Escherichia would have the same "richness" as a sample that contained one representative of each of 500 different Orders of bacteria.  For many purposes it is useful to know whether one has a phylogenetically diverse sample or not.  (And of course, if one just focuses on species richness it is also important to not simply ignore some set of organisms in the samples, as has sort of been done in a recent paper estimating the total species richness on the planet).  But that is not the point here - the point here is that counting species, even if done correctly, can give an incomplete picture of the diversity of organisms in a sample.
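To make the richness-versus-PD contrast concrete, here is a minimal Python sketch. It is not the code from the paper: the toy tree, the taxon names and the branch lengths are all invented, and PD is computed under one common convention (sum the branches on the paths from the sampled tips down to the root).

```python
# Toy rooted tree: each node maps to (parent, length of the branch leading to it).
# Every name and branch length here is made up for illustration.
TREE = {
    "root": (None, 0.0),
    "cladeA": ("root", 1.0),
    "cladeB": ("root", 1.2),
    "a1": ("cladeA", 0.1),
    "a2": ("cladeA", 0.1),
    "a3": ("cladeA", 0.2),
    "b1": ("cladeB", 0.3),
    "c1": ("root", 2.0),
}


def species_richness(sample):
    """Richness is just the number of distinct taxa in the sample."""
    return len(set(sample))


def phylogenetic_diversity(sample, tree=TREE):
    """Sum the lengths of all branches on the paths from the sampled tips to the root.

    A branch shared by several tips is counted only once.
    """
    branches = set()
    for tip in sample:
        node = tip
        while tree[node][0] is not None:  # walk up until we reach the root
            branches.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in branches)


close_relatives = ["a1", "a2", "a3"]    # three tips from one clade
distant_relatives = ["a1", "b1", "c1"]  # three tips from three deep lineages

for name, sample in [("close", close_relatives), ("distant", distant_relatives)]:
    print(name, species_richness(sample), round(phylogenetic_diversity(sample), 2))
```

Both toy samples have a richness of 3, but the sample spread across deep lineages covers far more branch length, which is exactly the information that counting species throws away.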

For many years researchers have been attempting to measure phylogenetic diversity of various organisms in various samples.  And to do this one needs an evolutionary tree of the organisms in order to then measure branch length in the tree.  There is actually a relatively rich history of researchers attempting to look at PD in studies of microbes - especially in cases where one has access to a rRNA tree for the organisms / samples in question.  Examples of past work on this include:
What we wanted to do here was use metagenomic data to assess phylogenetic diversity of samples.  And in particular we wanted to do this with genes other than rRNA genes (e.g., protein coding genes).  There were multiple challenges in being able to do this (e.g., see a blog post I made about this issue a few years ago asking for community input).  Fortunately, Kembel has worked previously on multiple issues relating to phylogenetic diversity and phylogenetic ecology and his work led to this paper.

I note, as an aside, I have created a Mendeley group focusing on phylogenetic analysis of metagenomes and have added a diversity of papers to the collection:


In the paper Steve basically started with some of the notions and the code from AMPHORA which was designed by Martin Wu (when he was in my lab).  AMPHORA automatically infers phylogenetic trees of a set of 31 protein coding genes - and it can do this from genomic or metagenomic data. 

AMPHORA was designed to build phylogenetic trees of metagenomic sequences individually - in order to classify reads from samples to infer from what organism they likely came.


But that is not what Steven wanted to do here.  What he wanted to do was infer phylogenetic trees from metagenomic samples where ALL the organisms in the sample were included in the same tree.  This was / is challenging for many reasons and this is what I had written the blog post about previously.  One issue we had was the fact that sequences might not overlap with each other and thus including them in a single phylogenetic tree together was complicated.  

From my earlier post:

The challenge with this is really two things. First, we want to analyze just the reads themselves (i.e., we do not want to use assemblies you can make from this type of data). Second, and more importantly, we want to include in our analysis sequence reads that only cover small, not necessarily overlapping regions of the "full length" sequence alignments for the family. 

The alignment would look something like
    sequence 1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 1 XXXXXXXXX-------------------------
    sequence 2 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 2 ---------XXXXXXXXXXXX-------------
    fragment 3 ---------------------XXXXXXXXXXXXX
    fragment 4 ----XXXXXXXXXXXXXXXXXX------------
    sequence 3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sequence 4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sequence 5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 5 -----------------------XXXXXXXXXXX- 
    where Xs are the regions covered by the sequences/fragments (could be DNA or amino acids)

We want to build trees from these alignments with the hope of using them to learn lots of cool things about the evolution of the fragments and the species from which they come. I can provide more information but really the key part for the phylogenetics here is the nature of the alignment.

In the past, I have decided to constrain my analyses to NOT deal with this type of alignment. I have either analyzed each fragment on its own or we have built a multiple alignment but only included fragments that cover more than 3/4 of the full length sequence and thus the matrix is much more filled out. Such an alignment would look like this


    sequence 1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 1 XXXXXXXXXXXXXXXXXXXXXXXXXXX-------
    sequence 2 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 2 --XXXXXXXXXXXXXXXXXXXXXXXX--------
    fragment 3 -----XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 4 ----XXXXXXXXXXXXXXXXXXXXXXXXXXXX--
    sequence 3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sequence 4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sequence 5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 5 --XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX- 
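
The 3/4 cutoff described above is easy to sketch in Python. This is only an illustration of the idea, not the script we actually used; the function names, the toy rows and the assumption that gaps are written as "-" are all mine.

```python
def coverage(row):
    """Fraction of alignment columns in a row that are not gaps ('-')."""
    return sum(ch != "-" for ch in row) / len(row)


def keep_long_fragments(alignment, min_coverage=0.75):
    """Keep only rows that cover more than min_coverage of the alignment width."""
    return {name: row for name, row in alignment.items() if coverage(row) > min_coverage}


# Toy 20-column alignment built with string multiplication so the widths are unambiguous.
alignment = {
    "sequence 1": "X" * 20,                       # full length reference
    "fragment 1": "X" * 16 + "-" * 4,             # covers 16 of 20 columns
    "fragment 2": "-" * 5 + "X" * 6 + "-" * 9,    # covers 6 of 20 columns
    "fragment 3": "-" * 2 + "X" * 17 + "-" * 1,   # covers 17 of 20 columns
}

kept = keep_long_fragments(alignment)
print(sorted(kept))
```

In this toy case fragment 2 is dropped, which is precisely the kind of short read we would rather not lose.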
But we really want to include the smaller fragments in our analysis. And we are just not certain how to best do this. We know LOTS of people out there think of similar problems in terms of sparse matrices, supermatrices, supertrees, EST data, etc. And we have ideas about how to do this and are asking some phylogenetics gurus we know by email. But I thought it might be fun to have the discussion on a blog rather than by email.

So again, how might one best build phylogenetic trees from data that looks like this?

    sequence 1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 1 XXXXXXXXX-------------------------
    sequence 2 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 2 ---------XXXXXXXXXXXX-------------
    fragment 3 ---------------------XXXXXXXXXXXXX
    fragment 4 ----XXXXXXXXXXXXXXXXXX------------
    sequence 3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sequence 4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sequence 5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    fragment 5 -----------------------XXXXXXXXXXX


And from these trees we want to place each fragment relative to (1) the full length sequences and (2) to each other if possible. We also, of course, want branch lengths to reflect some sort of amount of evolution and thus do not just want a cladogram.

So what Steven decided to do in the end was create a method that took all of the AMPHORA markers and concatenated them together into a single mega alignment and then built a reference tree of this mega alignment from available genomes.  Then he searched for matches to any of these genes in metagenomic data and built a tree for each sequence that placed it relative to the reference data.  
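
The concatenation step can be sketched roughly as follows. This is a toy Python illustration under my own assumptions, not Steven's pipeline: the two marker names, the genome names and the sequences are invented, gaps are written as "-", and the actual placement of reads onto the reference tree (done with existing phylogenetic placement software in the paper) is not shown.

```python
def marker_widths(marker_alignments):
    """Number of columns in each marker alignment (all rows of a marker share one width)."""
    return {m: len(next(iter(rows.values()))) for m, rows in marker_alignments.items()}


def concatenate_markers(marker_alignments, marker_order):
    """Join per-gene alignments into one row per genome.

    A genome that lacks a marker gets a run of gaps of that marker's width.
    """
    widths = marker_widths(marker_alignments)
    genomes = sorted({g for m in marker_order for g in marker_alignments[m]})
    return {
        g: "".join(marker_alignments[m].get(g, "-" * widths[m]) for m in marker_order)
        for g in genomes
    }


def pad_read(marker, aligned_read, marker_alignments, marker_order):
    """Express a read already aligned to one marker in mega alignment coordinates."""
    widths = marker_widths(marker_alignments)
    if len(aligned_read) != widths[marker]:
        raise ValueError("read must already be aligned to its marker's columns")
    return "".join(aligned_read if m == marker else "-" * widths[m] for m in marker_order)


# Hypothetical two-marker example (the real analysis used 31 gene families).
markers = {
    "rplA": {"genomeA": "MKV-L", "genomeB": "MRVAL"},
    "rpoB": {"genomeA": "GT-K", "genomeC": "GSEK"},
}
order = ["rplA", "rpoB"]

mega = concatenate_markers(markers, order)
read_row = pad_read("rpoB", "-SE-", markers, order)
print(mega)
print(read_row)
```

The point of the padding is that a short read matching a single gene ends up with the same number of columns as the reference rows, so it can be handed to a placement tool alongside the reference tree even though it overlaps only a small part of the mega alignment.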
Figure 1. Conceptual overview of approach to infer phylogenetic relationships among sequences from metagenomic data sets.
This pipeline allowed him to place many sequences from metagenomic samples onto a single tree such as this one:

Phylogenetic tree linking metagenomic sequences from 31 gene families along an oceanic depth gradient at the HOT ALOHA site
And from that he could calculate PD for metagenomic samples.  We then used the PD calculations to compare and contrast PD with other information, in particular from the HOT ALOHA metagenomic data set of Ed Delong, Steve Karl and others.

Figure 3. Taxonomic diversity and standardized phylogenetic diversity versus depth in environmental samples along an oceanic depth gradient at the HOT ALOHA site.
For more detail on what we did from there on - read the paper.  It is open access so all can see it / download it / play with it / whatever.  But rather than blather on and on as usual I thought I would email Steve some questions and then post his answers.  These are below:

1. Can you provide any background to how this work got started and why you ended up doing it?
This work got started as a collaboration between the Eisen, Green, and Pollard labs as part of the iSEEM project ("Integrating Statistical Evolutionary & Ecological Approaches to Metagenomics"), which was funded by the Moore Foundation to figure out ways to address ecological and evolutionary questions using metagenomic data. I had a background in using phylogenetic and evolutionary information to understand ecological communities, and one of the things I wanted to do at iSEEM was to try to think about ways that we could apply methods from ecophylogenetics or phylogenetic community ecology to metagenomic data sets. In conversations among the co-authors, we realized that if we could build phylogenetic hypotheses for organisms based on metagenomic data, we could apply a huge body of ecological and evolutionary theory and use these data sets to improve our understanding of microbial communities and their dynamics.

2. How did you end up working on microbes with your background in larger organisms?

The transition from working on macro-organisms to working on microbes actually wasn't that big of a leap, since my research has generally been question driven rather than study-system or study-organism driven. My previous research involved using phylogenetic information to better understand community assembly in plants and animals. The increasing availability of phylogenetic information for entire communities of plants and animals drove the development of the field of 'ecophylogenetics', and it always seemed to me that microbes would be the ideal system for this type of approach due to the greater availability of sequence data and phylogenetic information for microbes. Also, the development of high-throughput sequencing methods meant that the size of microbial community data sets would quickly become really, really large... the prospect of working on data sets with hundreds of millions of observations was really exciting. As my first postdoc was wrapping up, I collaborated on a study looking at phylogenetic diversity of the rhizobacterial symbionts of plant roots that got me interested in microbial ecology. Right around that time I came across the opportunity to work on the iSEEM project, so it seemed like the perfect opportunity to try a new study system.

Having studied the community ecology of both micro- and macro-organisms, I find it interesting that the fields of microbial and non-microbial phylogenetic community ecology have been fairly insulated from one another until recently. For example, the two fields independently developed phylogenetic approaches to community ecology, each field having its own set of favored statistical methods and software packages, with almost no cross-citation, despite addressing very similar questions. In microbiology the emphasis on phylogenetic diversity measures seems to have been driven by the empirical difficulty of defining microbial 'species' and other taxonomic units that macro-organismal ecologists are comfortable with, as well as the availability of phylogenetic and sequence data for microbes. Conversely, for macroorganisms the field of ecophylogenetics was driven by a desire to apply a large body of theory on the links between ecological and evolutionary dynamics to empirical data sets, but was relatively data poor in terms of phylogenetic information about individual species.

3. What was the biggest challenge in this work?

For me the biggest challenge was convincing myself and others that we could infer anything about organismal phylogenies from metagenomic data.  People had built phylogenies for individual genes from metagenomic data sets, but there was a lot of skepticism about how and whether it would be possible to infer a phylogeny for multiple genes given the short, non-overlapping nature of metagenomic sequences. A post on your blog provided a lot of useful feedback. In the end this challenge was overcome both through the availability of software packages for placement of short sequences onto reference phylogenies, as well as simulation and bootstrap analyses to make sure that the results we were finding were robust.

4. Any additional things left out of the paper that you would like to mention here? Other acknowledgements?  Annoyances?

There were a number of people involved in the iSEEM project, including Samantha Riesenfeld and Aaron Darling, who did simulations that were very helpful in figuring out when and whether we could make inferences about phylogenetic relationships among metagenomic reads.

Our paper makes use of a large number of open-source software packages and I'd like to thank the people who made their code available for re-use in this way. In particular the short sequence placement methods implemented in packages like RAxML and pplacer made this study possible.

5. What (in general) are your current and future plans?

Right now I'm working at the Biology & the Built Environment Center on a number of projects studying the phylogenetic and functional diversity of microbes in indoor environments, trying to understand the interaction between architectural design and microbial diversity indoors, and the role indoor microbes play in human health and well being. I am still interested in plant biology, and I have an ongoing project looking at the diversity and function of microbial communities on plant leaves (the 'phyllosphere') in tropical and temperate forests.
Kembel, S., Eisen, J., Pollard, K., & Green, J. (2011). The Phylogenetic Diversity of Metagenomes PLoS ONE, 6 (8) DOI: 10.1371/journal.pone.0023214

Friday, June 3, 2011

Crosspost from http://microBE.net: New, massive volumes on #metagenomics coming out soon

For those interested in microbial diversity and/or metagenomics there are two volumes that are coming out soon that are of interest:
Edited by Frans J. de Bruijn these two volumes are the most comprehensive coverage of metagenomics out there right now. The chapters are almost overwhelming (full disclosure, I have two chapters in here - both of which are republications of Open Access papers I have published on metagenomics).  See below for full chapter lists.

Order from Amazon:
Volume I: Metagenomics and Complementary Approaches
  • 1. Introduction (Frans J. de Bruijn).
  • Background Chapters.
    • 2. DNA reassociation yields broad-scale information on metagenome complexity and microbial diversity (V. Torsvik).
    • 3. Diversity of 23S rRNA genes within individual prokaryotic genomes (Zhiheng Pei).
    • 4. Use of the rRNA operon and genomic repetitive sequences for the identification of bacteria (A. Nascimento).
    • 5. Use of different PCR primer-based strategies for characterization of natural microbial communities (James Prosser).
    • 6. Horizontal gene transfer and recombination shape mesorhizobial populations in the gene center of the host plants Astragalus luteolus and Astragalus ernestii in Sichuan, China (Xiaoping Zhang).
    • 7. Amplified rDNA restriction analysis (ARDRA) for identification and phylogenetic placement of 16S-rDNA clones (Menachim Sklarz).
    • 8. Clustering-based peak alignment algorithm for objective and quantitative analysis of DNA fingerprinting data (Satoshi Ishii).
  • The Species Concept.
    • 9. Population genomics informs our understanding of the bacterial species concept (Margaret Riley).
    • 10. Genome analysis of Streptococcus agalactiae: Implication for the microbial “pan-genome” (Rino Rappuoli).
    • 11. Metagenomic insights into bacterial species (Kostas Konstantinidis).
    • 12. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology (Erko Stackebrandt).
    • 13. Metagenomic Approaches for the Identification of Microbial Species (David Ward).
  • Metagenomics.
    • 14. Microbial Ecology in the age of metagenomics (Jianping Xu).
    • 15. The enduring legacy of small rRNA in microbiology (Susan Tringe).
    • 16. Pitfalls of PCR-based rRNA gene sequence analysis:  an update on some parameters (Erko Stackebrandt).
    • 17. Empirical testing of 16S rRNA gene PCR primer pairs reveals variance in target specificity and efficacy not suggested by in silico analysis (Sergio Morales and Bill Holben).
    • 18. The impact of next-generation sequencing technologies on (meta)genomics (George Weinstock).
    • 19. Accuracy and quality of massively parallel DNA pyrosequencing (Susan Huse and David Mark Welch).
    • 20. Environmental shotgun sequencing: Its potential and challenges for studying the hidden world of microbes (Jonathan Eisen).
    • 21. Comparison of random sequence reads versus 16S rDNA sequences for estimating the biodiversity of a metagenomic library (C. Manichanh).
    • 22. Metagenomic libraries for functional screening (Svein Valla).
    • 23. GC Fractionation Allows Comparative Total Microbial Community Analysis, Enhances Diversity Assessment, and Facilitates Detection of Minority Populations of Bacteria (Bill Holben).
    • 24. Enriching plant microbiota for a metagenomic library construction (Ying Zeng).
    • 25. Towards Automated Phylogenomic Inference (Wu and Eisen).
    • 26. Integron first gene cassettes: a target to find adaptive genes in metagenomes (Christine Cagnon).
    • 27. High-resolution metagenomics: assessing specific functional types in complex microbial communities (Chistoserdova).
    • 28. Gene-targeted-metagenomics (GT-metagenomics) to explore the extensive diversity of genes of interest in microbial communities (J. Tiedje).
    • 29. Phylogenetic screening of metagenomic libraries using homing endonuclease restriction and marker insertion (Torsten Thomas).
    • 30. ArrayOme- & tRNAcc-facilitated mobilome discovery: comparative genomics approaches for identifying rich veins of novel bacterial DNA sequences (Hong-Yu OU).
    • 31. Sequence-Based Characterization of Microbiomes by Serial Analysis of Ribosomal Sequence Tags (SARST) (Zhongtang Yu).
  • Consortia and Databases.
    • 32. The metagenomics of plant pathogen-suppressive soils (J.D. Van Elsas).
    • 33. Soil Metagenomic Exploration of the Rare Biosphere (Pascal Simonet and Timothy Vogel).
    • 34. The BIOSPAS consortium: Soil Biology and agricultural production (Luis Wall).
    • 35. The Human Microbiome Project (George Weinstock).
    • 36. The Ribosomal Database Project: sequences and Software for high-throughput rRNA analysis (J. R. Cole, G. M. Garrity and Jim Tiedje).
    • 37. The metagenomics RAST server- a public resource for the automatic phylogenetic and functional analysis of metagenomes (Folker Meyer).
    • 38. The EBI Metagenomics Archive, Integration and Analysis resource (Apweiler).
  • Computer Assisted Analysis.
    • 39. Comparative metagenome analysis using MEGAN (Suparna Mitra and Daniel Huson).
    • 40. Phylogenetic binning of metagenome sequence samples (Alice C. McHardy).
    • 41. Gene prediction in metagenomic fragments with Orphelia: A large scale machine learning approach (Katharina Hoff).
    • 42. Binning metagenomic sequences using seeded GSOM (Sen-Lin Tang).
    • 43. Iterative read mapping and assembly allows the use of a more distant reference in metagenomic assembly (Bas E. Dutilh).
    • 44. Ribosomal RNA identification in metagenomic and metatranscriptomic datasets (Li).
    • 45. SILVA: comprehensive databases for quality checked and aligned ribosomal RNA sequence data compatible with ARB (Frank Gloeckner).
    • 46. ARB; a software environment for sequence data (Wolfgang Ludwig).
    • 47. The Phyloware Project: A software framework for phylogenomic virtue (Daniel Frank).
    • 48. Metasim- A sequencing simulator for genomics and metagenomics (Daniel Richter).
    • 49. ClustScan: an integrated program package for the detection and semi-automatic annotation of secondary metabolite clusters in genomic and metagenomic DNA datasets (Daslav Hranueli).
    • 50. MetaGene; Prediction of prokaryotic and phage genes in metagenomic sequences (Noguchi).
    • 51. primers4clades, a web server to design lineage-specific PCR primers for gene-targeted metagenomics (Pablo Vinuesa).
    • 52. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes (Y. Ye).
    • 53. ESPRIT: estimating species richness using large collections of 16S rRNA data (Yijun Sun).
  • Complementary Approaches.
    • 54. (Meta) genomics approaches in systems biology (Manuel Ferrer).
    • 55. Towards “focused metagenomics”: a case study combining DNA stable-isotope probing, multiple displacement amplification and metagenomics (J. Colin Murrell).
    • 56. Galbraith, E. A., D. A. Antonopoulos, K. E. Nelson, and B. A. White. Suppressive subtractive hybridization reveals extensive horizontal transfer in the rumen metagenome (Bryan White).
  • Microarrays.
    • 57. GeoChip: A high throughput metagenomics technology for dissecting microbial community functional structure (J. Zhou).
    • 58. Phylogenetic microarrays (PhyloChips) for analysis of complex microbial communities (Eoin Brodie).
    • 59. Phenomics and Phenotype MicroArrays: Applications Complementing Metagenomics (Barry Bochner).
    • 60. Microbial persistence in low biomass, extreme environments: The great unknown (Kasthuri Venkateswaran).
    • 61. Application of phylogenetic oligonucleotide microarrays in microbial analysis (Nian Wang).
  • Metatranscriptomics.
    • 62. Isolation of mRNA from environmental microbial communities for metatranscriptomic analyses (P. Schenk).
    • 63. Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre (Rachel Poretsky).
    • 64. The “double RNA” approach to simultaneously assess the structure and function of environmental microbial communities by meta-transcriptomics (Tim Urich and Christa Schleper).
    • 65. Soil eukaryotic diversity, a metatranscriptomic approach (Marmeisse).
  • Metaproteomics.
    • 66. Proteomics for the analysis of environmental stress responses in prokaryotes (Ksenia Groh, Victor Nesatyy and Marc Suter).
    • 67. Microbial community proteomics (Paul Wilmes).
    • 68. Synchronicity between population structure and proteome profiles: A metaproteomic analysis of Chesapeake Bay bacterial communities (Feng Chen).
    • 69. High-Throughput Cyanobacterial Proteomics: Systems-level Proteome Identification and Quantitation (Phillip Wright).
    • 70. Protein Expression Profile of an Environmentally Important Bacterial Strain: the Chromate Response of Arthrobacter sp. strain FB24 (K. Henne).
  • Metabolomics.
    • 71. The small molecule dimension: Mass spectrometry based metabolomics, enzyme assays, and imaging (Trent R. Northen).
    • 72. Metabolomics: high resolution tools offer to follow bacterial growth on a molecular level (Lucio Marianna and Philipp Schmitt-Kopplin).
    • 73. Metabolic profiling of plant tissues by electrospray mass spectrometry (Heather Walker).
    • 74. Metabolite identification, pathways and omic integration using online databases and tools (Matthew Davey).
  • Single cell analysis.
    • 75. Application of cytomics to separate natural microbial communities by their physiological properties (Susann Müller).
    • 76. Capturing microbial populations for environmental genomics (A. Pernthaler/Wendeberg).
    • 77. Microscopic single-cell isolation and multiple displacement amplification of genomes from uncultured prokaryotes (Peter Westermann).
Volume 2: Metagenomics in Different Habitats
  • 1. Introduction (Frans J. de Bruijn).
  • Viral Genomes.
    • 2. Viral metagenomics (Shannon Williamson).
    • 3. Methods in Viral Metagenomics (Thurber).
    • 4. Metagenomic contrasts of viruses in soil and aquatic environments (Eric Wommack).
    • 5. Biodiversity and biogeography of phages in modern stromatolites and thrombolites (Christelle Desnues).
    • 6. Assembly of Viral Metagenomes from Yellowstone Hot Springs Reveals Phylogenetic Relationships and Host Co-Evolution (Thomas Schoenfeld).
    • 7. Next-generation sequencing and metagenomic analysis; a universal diagnostic tool in plant pathology (Ian Adams).
    • 8. Direct Metagenomic Detection of Viral Pathogens in Human Specimens Using an Unbiased High-throughput Sequencing Approach (T. Nakaya).
  • The Soil Habitat.
    • 9. Soil based Metagenomics (R. Daniel).
    • 10. Methods in Metagenomic DNA, RNA and Protein Isolation from Soil (P. Gunasharan).
    • 11. Soil Microbial DNA Purification Strategies for Multiple Metagenomic Applications (Mark Liles).
    • 12. Application of PCR-DGGE and metagenome walking to retrieve full-length functional genes from soil (Morimoto).
    • 13. Actinobacterial diversity associated with Antarctic Dry Valley mineral soils (Cowan).
    • 14. Targeting major soil-borne bacterial lineages using large-insert metagenomic approaches (G. Kowalchuk).
    • 15. Novelty and uniqueness patterns of rare members of the soil biosphere (M. Elshahed).
    • 16. Extensive phylogenetic analysis of a soil bacterial community illustrates extreme taxon evenness and the effects of amplicon length, degree of coverage, and DNA fractionation on classification and ecological parameters (Holben WE).
    • 17. The Antibiotic Resistome: Origins, Diversity, and Future Prospects (Gerard Wright).
  • The Digestive Tract.
    • 18. Functional Intestinal Metagenomics (Michael Kleerebezem).
    • 19. Assessment and improvement of methods for microbial DNA preparation from fecal Samples (M. Hattori).
    • 20. Role of dysbiosis in inflammatory bowel diseases (Johan Dicksved).
    • 21. Culture independent analysis of the human gut microbiota and its activities (Kieran Tuohi).
    • 22. Complete genome of an uncultured endosymbiont coupling nitrogen fixation to cellulolysis with protist cells in termite gut (Hongo).
    • 23. Cloning and identification of genes encoding acidic cellulases from metagenomes of buffalo rumen (Feng).
  • Marine and Lakes.
    • 24. Microbial diversity in the deep seas and the underexplored “rare biosphere” (David Mark Welch and Susan Huse).
    • 25. Bacterial Community Structure and Dynamics in a Seasonally Anoxic Fjord (Steven J. Hallam).
    • 26. Adaptation to nutrient availability in marine microorganisms by gene gain and loss (A. Martini).
    • 27. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities (Jack Gilbert).
    • 28. Metagenomic approach studying the taxonomic and functional diversity of the bacterial community in a lacustrine ecosystem (Didier Debroas).
    • 29. Metagenomics of the marine subsurface: the first glimpse from the Peru Margin, ODP Site 122 (Jennifer Biddle).
    • 30. A targeted metagenomic approach to determine the ‘population genome’ of marine Synechococcus (D. J. Scanlan).
    • 31. Diversity and role of bacterial integron/gene cassette metagenome in extreme marine environments (Hosam Easa Elsaied and Akihiko Maruyama).
  • Other Habitats.
    • 32. The Olavius algarvensis metagenome revisited: lessons learned from the analysis of the low diversity microbial consortium of a gutless marine worm (Nicole Dubilier).
    • 33. Microbiome diversity in human saliva (Ivan Nasidze).
    • 34. Approaches to understanding population level functional diversity in a microbial community (D. Bhaya).
    • 35. A functional metagenomic approach for discovering nickel resistance genes from the rhizosphere of an acid mine drainage environment (Jose Gonzales-Pastor).
    • 36. The Microbiome of Leaf-cutter Ant Fungus Gardens (Garret Suen).
    • 37. Diversity of archaea in terrestrial hot springs and role in ammonia oxidation (Chuanlun Zhang).
    • 38. Colonization of nascent, deep-sea hydrothermal vents by a novel Archaeal and Nanoarchaeal assemblage (S. Craig Cary).
    • 39. Analysis of the Metagenome from a biogas-producing microbial Community by means of Bioinformatics Methods (Andreas Schlueter).
    • 40. Amplicon pyrosequencing analysis of endosymbiont population structure (Colleen Cavanaugh).
    • 41. Investigating bacterial diversity along alkaline hot spring thermal gradients by barcoded pyrosequencing (Scott Miller and Michael Welzer).
    • 42. Genetic characterization of microbial communities living at the surface of building stones (J. C. Salvado).
    • 43. Novel aromatic degradation pathway genes and their organization as revealed by metagenomic analysis (Kentaro).
    • 44. Functional screening of a wide host-range metagenomic library from a wastewater treatment plant yields a novel alcohol/aldehyde dehydrogenase (Wexler).
    • 45. Aromatic hydrocarbon degradation genes from chronically polluted Subantarctic marine sediments (H. M. Dionisi).
    • 46. Isolation and characterization of alkane hydroxylases from a metagenomic library of Pacific deep-sea sediment (Fengping Wang).
  • Biocatalysts and Natural Products.
    • 47. Emerging Fields in Functional Metagenomics and its Industrial Relevance - Overcoming Limitations and Redirecting the Search for Novel Biocatalysts (Wolfgang Streit).
    • 48. Carboxylesterases and Lipases from Metagenomes (Chow and Wolfgang Streit).
    • 49. Expanding small molecule functional metagenomics through parallel screening of broad host-range cosmid environmental DNA libraries in diverse Proteobacteria (Sean Brady).
    • 50. Biomedicinals from the microbial metagenomes of marine invertebrates (Walter Dunlap).
    • 51. Molecular characterization of TEM-type beta-Lactamases identified in Cold-seep sediments of Edison Seamount (South of Lihir Island).
    • 52. Identification of Novel Bioactive Compounds from the Metagenome of the Marine (David Lejon).
    • 53. Functional Viral Metagenomics and the Development of New Enzymes for DNA and RNA Amplification and Sequencing (Thomas Schoenfeld).
  • Summary.
    • 54. Future of metagenomics, metatranscriptomics, metabolomics, metaproteomics and single cell analysis: A perspective (J. Tiedje).
    • 55. Darwin in the 21st Century: Natural Selection, Molecular Biology, and Species Concepts (Francisco Ayala).