Showing posts with label bacteria. Show all posts

Wednesday, March 14, 2012

Diabetes & H. pylori - a correlation but no known causation despite authors' claims

Am having a hard time right now with the comments from the authors of this new paper showing a correlation between H. pylori presence and both type II diabetes and blood glucose levels.  As far as I can tell, the paper does not show any causal connection.  That is, they do not determine if H. pylori infection is a cause of blood sugar issues or a consequence of blood sugar issues.

Yet the authors of the paper, one of whom (Martin Blaser) is a very respected H. pylori expert, are saying things like:
This study provides further evidence of late-in-life cost to having H. pylori,
And they suggest that antibiotic treatment for the elderly may help prevent diabetes.

This to me seems to be a bit over the top.  Yes, it makes sense that H. pylori could cause these issues.  And they have a model for how it might.  But they really should be more careful with their words until a causal connection is established.  After all, we have many well known negative effects of antibiotic overuse, including some shown by Blaser.  The last thing we need is people going out and dosing up on antibiotics in the hope that it will prevent type II diabetes.  But I can guarantee that is what will happen if this story gets overplayed.

At least a few sources report on the lack of anything showing a causal connection (e.g. see US News and World Report):
An expert not involved with the study said that while it did not show a cause-and-effect relationship between the bacterium and diabetes, the findings suggest certain possibilities
But I am worried that that is not enough skepticism to counteract the claims of the authors here. The study is certainly interesting.  And their model for a causal connection is fine.  But they probably need to do a little bit of toning down of their claims here.

Sunday, March 4, 2012

Yes, Virginia, Cell Phones Have Bacteria on Them ... And this means????

A new report is out with a discussion of microbes on cell phones: Study: Cellphones can be more germ-infested than toilet handle | News - Home. Not sure what was done in the study but regardless it seems to be focused on culture-based work. And as usual, the finding of some microbes related to ones known to cause disease leads to the inevitable conclusion that we must kill everything on the phones.

It seems to me that we need a bit more detail on what microbes are found on cell phones before bringing out the cleaners and the irradiation and such.

Saturday, December 31, 2011

Draft blog post cleanup #1: Divide and Conquer to Find Orthologs

OK - I am cleaning out my draft blog post list.  I start many posts and don't finish them and then they sit in the draft section of blogger.  Well, I am going to try to clean some of that up by writing some mini posts.  Here is the first ---

Saw an interesting paper worth checking out:
PLoS ONE: Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach



It describes not only a way to speed up continual ortholog annotation in bacterial and archaeal genomes but also is linked to an ongoing open code development project.

Here is the abstract:
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
Definitely worth checking out.
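
To get a rough feel for why all-against-all searching scales so badly, here is a minimal sketch that simply counts genome-pair comparisons. The genome totals (1,288 bacterial plus 94 archaeal) come from the abstract; the grouped example uses purely hypothetical group sizes, and the functions are an illustration of the general divide-and-conquer idea, not a description of how xBASE-Orth is actually implemented.

```python
from math import comb


def all_vs_all_pairs(n_genomes: int) -> int:
    """Number of unordered genome pairs in an all-against-all comparison."""
    return comb(n_genomes, 2)


def grouped_pairs(group_sizes: list[int]) -> int:
    """Pairs compared when genomes are first compared only within a group
    (say, a species) and then a single proxy per group is compared across
    groups. This is a toy model with one level of grouping (an assumption)."""
    within = sum(comb(size, 2) for size in group_sizes)
    between = comb(len(group_sizes), 2)
    return within + between


# Genome counts quoted in the abstract.
n = 1288 + 94
print("all-against-all genome pairs:", all_vs_all_pairs(n))

# Hypothetical toy case: 100 genomes split into 10 groups of 10.
toy_groups = [10] * 10
print("toy all-against-all:", all_vs_all_pairs(sum(toy_groups)))
print("toy grouped:", grouped_pairs(toy_groups))
```

Even in this toy case the grouped count is about a tenth of the all-against-all count, which is the general flavor of the savings the abstract describes.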

Wednesday, December 7, 2011

Twisted Tree of Life Award #12: Billion Year Old Smart Bacteria That Perfectly Treat Cancer

OMG - for crying out loud. In the following story Billion-year-old Bacteria Could be Medical Goldmine Fox News discusses studies of marine cyanobacteria at the University of Florida. It is so wrong in so many ways I do not know where to begin. Watch the video first for layers of trouble. Then, if you dare, read the article. Among the painful parts:

All cyanobacteria are basically lumped together into a single entity

Cyanobacteria are the oldest organisms on earth - billions of years old - which means they must have evolved amazing chemistry to deter predators. Wow - by the same logic - bacteria in general should be even better - because bacteria are even older than cyanobacteria. And therefore - if one focused on ALL bacteria, we should find even better predator deterring chemicals. Wait - actually - why not target all life. Surely, if cyanobacteria have perfected the art of deterring predators by the fact that they are billions of years old, surely the existence of life is proof that there must be some protection against predators and therefore "living organisms" have the best deterrence systems.

Then they make the leap from cyanobacteria surviving for billions of years by deterring predators to - wait for it - wait - hold on - be patient - wait for it - to - yes that is right - deterring "a devastating human predator - cancer." At least they did not reveal that cancer is also billions of years old.

And then, without any further detail, they leap from this insight to the claim that the researchers have apparently found that the cyanobacteria make a nearly perfect anticancer drug that "has a 1-2 punch", inhibiting both growth factors and receptors, which makes it extremely potent.

Furthermore they tell us that these cyanobacteria "are valuable because unlike similar species they are smart - targeting bad cells and sparing healthy ones." That is right, the cyanobacteria have been smart enough to target their drugs to human cancer cells - something they must encounter frequently in their marine life.

Oh for f3#*$# sake. I can't even write about this anymore.

I will just give out a well deserved "Twisted Tree of Life" award here. Not sure though who should get it - because it is unclear if this material came from U. Florida or if the station somehow came to it itself.

Past winners include

    Friday, October 7, 2011

    The story behind Pseudomonas syringae comparative genomics / pathogenicity paper; guest post by David Baltrus (@surt_lab)


    More fun from the community.  Today I am very happy to have another guest post in my "Story behind the paper" series.  This one comes to us from David Baltrus, an Assistant Professor at University of Arizona.  For more on David see his lab page here and his twitter feed here.  David has a very nice post here about a paper on the "Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates" which was published in PLoS Pathogens in July.  There is some fun/interesting stuff in the paper, including analysis of the "core" and "pan" genome of this species.  Anyway - David saw my request for posts and I am very happy that he responded.  Without further ado - here is his story (I note - I added a few links and Italics but otherwise he wrote the whole thing ...).

    ---------------------------------------
    I first want to thank Jonathan for giving me this opportunity. I am a big fan of “behind the science” stories, a habit I fed in grad school by reading every Perspectives (from the journal Genetics) article that I could get a hold of. Science can be rough, but I remember finding solace in stories about the false starts and triumphs of other researchers and how randomness and luck manage to figure into any discovery. If anything I hope to use this space to document this as it is fresh in my mind so that (inevitably) when the bad science days roll around I can have something to look back on. At the very least, I'm looking forward to mining this space in the future for quotes to prove just how little I truly understood about my research topics in 2011. It took a village to get this paper published, so apologies in advance to those that I fail to mention. I also want to mention this upfront: Marc Nishimura is my co-author and had a hand in every single aspect of this paper.


    Joining the Dangl Lab

    This project really started way back in 2006, when I interviewed for a postdoc with Jeff Dangl at UNC Chapel Hill. In grad school I had focused on understanding microbial evolution and genetics but I figured that the best use of my postdoc would be to learn and understand genomics and bioinformatics. I was just about to finish up my PhD and was lucky enough to have some choices when it came around to choosing what to do next. I actually had no clue about Dangl’s research until stumbling across one of his papers in Genetics, which gave me the impression that he was interested in bringing an evolutionary approach to studies of the plant pathogen Pseudomonas syringae. I was interested in plant pathogens because, while I wanted to study host/pathogen evolution, my grad school projects on Helicobacter pylori showed me just how much fun it is dealing with the bureaucracy of handling human pathogens. There is extensive overlap in the mechanisms of pathogenesis between plant and human pathogens, but no one really cares how many Arabidopsis plants you infect or if you dispose of them humanely (so long as the transgenes remain out of nature!). By the time I interviewed with Jeff I was leaning towards joining a different lab, but the visit to Chapel Hill went very well and by the end I was primed for Dangl’s sales pitch. This went something along the lines of “look, you can go join another lab and do excellent work that would be the same kinds of things that you did in grad school...or you can come here and be challenged by jumping into the unknown”. How can you turn that down? Jeff sold me on continuing a project started by Jeff Chang (now a PI at Oregon State), on categorizing the diversity of virulence proteins (type III effector proteins to be exact) that were translocated into hosts by the plant pathogen Pseudomonas syringae. 
Type III effectors are one of the main determinants of virulence in numerous gram negative plant and animal pathogens and are translocated into host cells to ultimately disrupt immune functions (I'm simplifying a lot here). Chang had already created genomic libraries and had screened through random genomic fragments of numerous P. syringae genomes to identify all of the type III effectors within 8 or so phylogenetically diverse strains. The hope was that they would find a bunch of new effectors by screening strains from different hosts. Although this method worked well for IDing potential effectors, I was under the impression that it was going to be difficult to place and verify these effectors without more genomic information. I was therefore brought in to figure out a way to sequence numerous P. syringae genomes without burning through a Scrooge McDuckian money bin worth of grant money. We had a thought that some type of grand pattern would emerge after pooling all this data but really we were taking a shot in the dark.


    Tomato leaves after 10 days of infection by the tomato pathogen P. syringae DC3000 (left) as well as a less virulent strain (right). Disease symptoms are dependent on a type III secretion system.

    Moments of Randomness that Shape Science

    When I actually started the postdoc, next generation sequencing technologies were just beginning to take off. It was becoming routine to use 454 sequencing to generate bacterial genome sequences, although Sanger sequencing was still necessary to close these genomes. Dangl had it in his mind that there had to be a way to capitalize on the developing Solexa (later Illumina) technology in order to sequence P. syringae genomes. There were a couple of strokes of luck here that conspired to make this project completely worthwhile. I arrived at UNC about a year before the UNC Genome Analysis core facility came online. Sequencing runs during the early years of this core facility were subsidized by UNC, so we were able to sequence many Illumina libraries very cheaply. This gave us the opportunity to play around with sequencing options at low cost, so we could explore parameter space and find the best sequencing strategy. This also meant that I was able to learn the ins and outs of making libraries at the same time as those working in the core facility (Piotr Mieczkowski was a tremendous resource). Secondly, I started this postdoc without knowing a lick of UNIX or Perl and knew that I was going to have to learn these if I had any hope of assembling and analyzing genomes. I was very lucky to have Corbin Jones and his lab 3 floors above me in the same building to help work through my kindergarten level programming skills. Corbin was really instrumental to all of these projects as well as in keeping me sane and I doubt that these projects would have turned out anywhere near as well without him. Lastly, plant pathogens in general, and P. syringae in particular, were poised to greatly benefit from next generation sequencing in 2006. 
While there was ample funding to completely sequence (close) genomes for numerous human pathogens, lower funding opportunities for plant pathogens meant that we were forced to be more creative if we were going to pull off sequencing a variety of P. syringae strains. This pushed us into trying a NGS approach in the first place. I suspect that it’s no coincidence that, independently of our group, the NGS assembler Velvet was first utilized for assembling P. syringae isolates.

    The Frustrations of Library Making

    Through a collaboration with Elaine Mardis’s group at Washington University St. Louis, we got some initial data back that suggested it would be difficult to make sense of bacterial genomes at that time using only Illumina (the paired end kits weren’t released until later). There simply wasn’t good enough coverage of the genome to create quality assemblies with the assemblers available at this time (SSAKE and VCAKE, our own (really Will Jeck’s) take on SSAKE). Therefore we decided to try a hybrid approach, combining low coverage 454 runs (initially separate GS Flex runs with regular reads and paired ends, and later one run with long paired ends) with Illumina reads to fill in the gaps and leveraging this data to correct for any biases inherent in the different sequencing technologies. Since there was no core facility at UNC when I started making libraries, I had to travel around in order to find the necessary equipment. The closest place that I could find a machine to precisely shear DNA was Fred Dietrich’s lab at Duke. More than a handful of mornings were spent riding a TTA bus from UNC to Duke, with a cooler full of genomic DNA on dry ice (most times having to explain to the bus drivers how I wasn’t hauling anything dangerous), spending a couple of hours on Fred’s hydroshear, then returning to UNC hoping that everything worked well. There really is no feeling like spending a half a day travelling/shearing only to find out that the genomic DNA ended up the wrong size. We were actually planning to sequence one more strain of P. syringae, and already had Illumina data, but left this one out because we filled two plates of 454 sequencing and didn’t have room for a ninth strain. In the end there were two very closely related strains (P.syringae aptata or P. syringae atrofaciens) left to make libraries for and the aptata genome sheared better on the last trip than atrofaciens. 
If you’ve ever wondered why researchers pick certain strains to analyze, know that sometimes it just comes down to which strain worked first. Sometimes there were problems even when the DNA was processed correctly. I initially had trouble making the 454 libraries correctly in that, although I would follow the protocol exactly, I would lose the DNA somewhere before the final step. I was able to trace down the problem to using an old (I have no clue when the Dangl lab bought it, but it looked as useable as salmon sperm ever does) bottle of salmon sperm DNA during library prep. There were also a couple of times that I successfully constructed Illumina libraries only to have the sequencing runs dominated by few actual sequences. These problems ultimately stemmed from trying to use homebrew kits (I think) for constructing Illumina libraries. Once these problems were resolved, Josie Reinhardt managed to pull everything together and create a pipeline for hybrid genome assembly and we published our first hybrid genome assembly in Genome Research. At that moment it was a thrill that we could actually assemble a genome for such a low cost. It definitely wasn’t a completely sequenced genome, but it was enough to make calls about the presence or absence of genes.

    Waiting for the Story to Emerge

    There are multiple ways to perform research. We are all taught about how important it is to define testable hypotheses and to set up appropriate experiments to falsify these educated guesses. Lately, thanks to the age of genomics, it has become easier and feasible to accumulate as much genomic data as possible and find stories within that data. We took this approach with the Pseudomonas syringae genome sequences because we knew that there was going to be a wealth of information, and it was just a matter of what to focus on. Starting my postdoc I was optimistic that our sampling scheme would allow us to test questions about how host range evolves within plant pathogens (and conversely, identify the genes that control host range) because the strains we were going to sequence were all isolated from a variety of diseased hosts. My naive viewpoint was that we were going to be able to categorize virulence genes across all these strains, compare suites of virulence genes from strains that were pathogens of different hosts, and voila...we would understand host range evolution. The more I started reading about plant pathology the more I became convinced that this approach was limited. The biggest problem is that, unlike some pathogens, P. syringae can persist in a variety of environments with strains able to survive or flourish on a variety of hosts. Sure we had strains that were known pathogens of certain host plants, but you can’t just assume that these are the only relevant hosts. Subjective definitions are not your friend when wading into the waters of genomic comparisons.

    We were quite surprised that, although type III effectors are gained and lost rapidly across P. syringae and our sequenced strains were isolated from diverse hosts, we only managed to identify a handful of new effector families. I should also mention here that Artur Romanchuk came on board and did an extensive amount of work analyzing gene repertoires across strains. A couple of nice stories did ultimately emerge by comparing gene sequences across strains and matching these up with virulence in planta (we are able to show how mutation and recombination altered two different virulence genes across strains), but my two favorite stories from this paper came about from my habit of persistently staring at genome sequences and annotations. As I said above, a major goal of this paper was to categorize the suites of a particular type of virulence gene (type III effectors) across P. syringae. I was staring at gene repertoires across strains when I noticed that two of the strains had very few of these effectors (10 or so) compared to most of the other strains (20-30). When I plotted total numbers of effectors across strains, a phylogenetic pattern arose where genomes from a subset of closely related P. syringae strains possessed lower numbers of effectors. I then got the idea to survey for other classes of virulence genes, and sure enough, strains with the lowest numbers of effectors all shared pathways for the production of well characterized toxins (non-ribosomal peptide synthetase (NRPS) toxins are secreted out of P. syringae cells and are virulence factors, but are not translocated through the type III secretion system). One exception did arise across this handful of strains (a pea pathogen isolate from pathovar pisi) in that this strain has lost each of these conserved toxin pathways and also contains the highest number of effectors within this phylogenetic group. 
The relationship between effector number and toxin presence remains a correlation at the present time, but I’m excited to be able to try and figure out what this means in my own lab.
    Modified Figure 3 from the paper. Strain names are listed on the left and are color coded for phylogenetic similarity. Blue boxes indicate that the virulence gene/toxin pathway is present, green indicates that the pathway is likely present but the sequence was truncated or incomplete, while a white box indicates absence. I have circled the group II strains, which have the lowest numbers of type III effectors while also having two conserved toxin pathways (syringomycin and syringolin). Note that the Pisi strain (Ppi R6) lacks these toxin pathways.

    The other story was a complete stroke of luck. P. syringae genomes are typically 6Mb (6 million base pairs) in size, but one strain that we sequenced (a cucumber pathogen) contained an extra 1Mb of sequence. Moreover, the two largest assembled contigs from this strain were full of genes that weren’t present in any other P. syringae strain. After some similarity comparisons, I learned that there was a small bit of overlap between each of these contigs and performed PCR to confirm this. Then, as a hunch, I designed primers facing out of each end of the contig and was able to confirm that this extra 1Mb of sequence was circular in conformation and likely separate from the chromosome. I got a bit lucky here because there was a small bit (500bp or so) of sequence that was not assembled with either of these two contigs that closed the circle (a lot more and I wouldn’t have gotten the PCR to work at all). We quickly obtained 3 other closely related strains and were able to show that only a subset of strains contain this extra 1Mb and that it doesn’t appear to be directly involved in virulence on cucumber. So it turns out that a small number (2 so far) of P. syringae strains have acquired an extra 1Mb of DNA, and we don’t quite know what any of these ~700 extra genes do. There are no obvious pathways present aside from additional chromosomal maintenance genes, extra tRNAs in the same ratio as the chromosomal copies, and a couple of secretion systems. So somehow we managed to randomly pick the right strain to capture a very recent event that increased the genome size of this one strain by 15% or so. We’ve made some headway on this megaplasmid story since I started my lab, but I’ll save that for future blog posts.
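
The in silico half of the overlap check described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the procedure used in the paper: the function names and toy sequences are hypothetical, matches must be exact, and a real analysis would also need to handle sequencing errors, reverse-complement orientation, and unassembled gaps like the ~500bp one mentioned above (which is why PCR was needed).

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that exactly matches a prefix
    of `b`. Returns 0 if no overlap of at least `min_len` bases exists."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def looks_circular(contig_a: str, contig_b: str, min_len: int = 3) -> bool:
    """Two contigs are consistent with one circular molecule if the end of
    A overlaps the start of B and the end of B overlaps the start of A."""
    return (suffix_prefix_overlap(contig_a, contig_b, min_len) > 0
            and suffix_prefix_overlap(contig_b, contig_a, min_len) > 0)


# Toy example (made-up sequence): cut a small "circle" into two contigs
# whose ends overlap by 4 bases on each side.
circle = "ATGCCGTAAGCTTGACCA"
contig_a = circle[0:12]
contig_b = circle[8:] + circle[:4]

print(suffix_prefix_overlap(contig_a, contig_b))
print(suffix_prefix_overlap(contig_b, contig_a))
print(looks_circular(contig_a, contig_b))
```

An overlap found this way is only a hint; outward-facing primers and PCR, as described above, are what actually confirm that the ends join.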

    Modified Figure S12 from the paper. Strains that contain the 1Mb megaplasmid (Pla7512 and Pla107) are slightly less virulent during growth in cucumber than strains lacking the megaplasmid (PlaYM8003, PlaYM7902). This growth defect is also measurable in vitro. In case you are wondering, I used blue and yellow because those were the colors of my undergrad university, the University of Delaware.

    Reviewer Critiques

    We finally managed to get this manuscript written up by the summer of 2010 and submitted it to PLoS Biology. I figured that (as always) it would take a bit of work to address reviewers’ critiques, but we would nonetheless be able to publish without great difficulty. I was at a conference on P. syringae at Oxford in August of 2010 when I got the reviews back and learned that our paper had gotten rejected. Everyone has stories about reviewer comments and so I’d like to share one of my own favorites thus far. I don’t think it ever gets easier to read reviews when your paper has been rejected, but I was knocked back by the main critique of one reviewer:

    “I realize that the investigators might not typically work in the field of bacterial genomics, but when looking at divergent strains (as opposed to resequencing to uncover SNPs among strains) it is really necessary to have complete, not draft, genomes. I realize that this might sound like a lot to ask, but if they look at comparisons of, for example, bacterial core and pan-genomes, such as the other paper on this that they cite (and numerous other examples exist), they are based on complete genome sequences. If this group does not wish to come up to the standards applied to even the most conventional bacterial genomics paper, it is their prerogative; however, they should be aware of the expectations of researchers in this field.”

    So this reviewer was basically asking us to spend an extra 50k to finish the genomes for these strains before they were scientifically useful. Although I do understand the point, this paper was never about getting things perfect but about demonstrating what is possible with draft genomes. I took the part about working in the field of bacterial genomics a bit personally I have to admit (c'mon, that's harsh), but I got over that feeling by downing a few pints in Oxford with other researchers that (judging by their research and interest in NGS) also failed to grasp the importance of spending time and money to close P. syringae genomes. We managed to rewrite this paper to address most of the other reviewers' critiques and finally were able to submit to PLoS Pathogens.


    Friday, September 23, 2011

    Crosspost from http://microbe.net: A very misleading “bacteria in buildings” advertisement presented as “news”


    Am crossposting this from http://microbe.net where I posted it earlier. See original post here: A very misleading “bacteria in buildings” advertisement presented as “news”
    Wow this “story” (which is really an ad) is just so incredibly bad I do not know what to say: Dangerous Bacteria Isolated in Healthcare HVAC Evaporator Coils. I do not even know where to begin with criticism so I will just go step by step through some of the advertisement.
    1. Title: Dangerous Bacteria Isolated in Healthcare HVAC Evaporator Coils
    There is no evidence that the bacteria being looked at here are dangerous.
    2. First sentence ”A recent study suggests that doctors may want to monitor the environmental condition of their air conditioners evaporator coil before surgery to help prevent the spread of bacterial infections”
    No evidence is presented anywhere that monitoring AC coils has any even remote potential value here.
    3. Second sentence: Dr. Rajiv Sahay, Laboratory Director at Environmental Diagnostics Laboratory (EDLab) and his colleagues sampled evaporator coils in healthcare air handling systems and isolated Pseudomonas aeruginosa a known noscocomial pathogen.
    Well, Pseudomonas aeruginosa is indeed a known pathogen.  However, there is no evidence presented that all the things they detect are indeed pathogenic/virulent.  In fact, later in the article they report their results as being for “Pseudomonas sp” which suggests that their typing was very broad.  It is very possible that many of the cells they detected are not pathogenic.
    4. Ignore the middle part.  It is just saying that Pseudomonas aeruginosa can be nasty in compromised patients.
    5. They then go on to discuss their study more “In the study, over 560,000 colony forming units (CFU)/gram of Pseudomonas sp were isolated from deep within the evaporator coil system.”
    What study?  No data is presented.  No methods.  No results.  Nothing.
    6. They then say “Potential aerosolization of these micro-organisms from the infested coil is immense due to a discharge of air stream with 6 miles/hours (commonly observed) across the evaporator coils”
    Not so sure about that.  Would have been much better to study ACTUAL aerosolization.
    7. Then we find out that the person who conducted the study, Dr. Rajiv Sahay, is also the one selling the cleaning service to clean your air coils.  That does not instill confidence in me.
    So a person selling HVAC cleaning reports unpublished results that they claim suggest if you do not clean your HVACs in hospitals you put all your patients at risk.  I am on board with the need to study microbes in hospitals more.  I am on board with the potential risks of microbes in AC systems.  I am not on board with not presenting data, and with getting the science wrong.

    Monday, September 19, 2011

    Interested in sex? How about in bacteria? Then these #PLoSGenetics papers are for you

    Well I was torn about this. Should I title the post " ICE, ICE, Bacterial BABIES" or say something about sex? I settled on sex, but not sure if that was wise.

    Anyway - quick post to say that there are two papers from PLoS Genetics last month that caught my eye. They are
    The latter is a "review" paper linked to the first one which is a research paper. The papers together provide both a good background and a window into modern studies of "ICEs" or integrative conjugative elements in bacteria.

    I like the summary from the first paper:
    Some mobile genetic elements spread genetic information horizontally between prokaryotes by conjugation, a mechanism by which DNA is transferred directly from one cell to the other. Among the processes allowing genetic transfer between cells, conjugation is the one allowing the simultaneous transfer of larger amounts of DNA and between the least related cells. As such, conjugative systems are key players in horizontal transfer, including the transfer of antibiotic resistance to and between many human pathogens. Conjugative systems are encoded both in plasmids and in chromosomes. The latter are called Integrative Conjugative Elements (ICE); and their number, identity, and mechanism of conjugation were poorly known. We have developed an approach to identify and characterize these elements and found more ICEs than conjugative plasmids in genomes. While both ICEs and plasmids use similar conjugative systems, there are remarkable preferences for some systems in some elements. Our evolutionary analysis shows that plasmid conjugative systems have often given rise to ICEs and vice versa. Therefore, ICEs and conjugative plasmids should be regarded as one and the same, the differences in their means of existence in cells probably the result of different requirements for stabilization and/or transmissibility of the genetic information they contain.

    That should be enough to get people started. And that is alas all I have time to write about here.

    Tuesday, August 23, 2011

    Bacteria & archaea don't get no respect from interesting but flawed #PLoSBio paper on # of species on the planet

    Uggh. Double uggh. No no. My first blog quadruple uggh. There is an interesting new paper in PLoS Biology published today. Entitled "How Many Species Are There on Earth and in the Ocean?" PLoS Biol 9(8): e1001127 - it is by Camilo Mora, Derek Tittensor, Sina Adl, Alastair Simpson and Boris Worm. It is accompanied by a commentary by none other than Robert May, one of the greatest ecologists of all time: PLoS Biology: Why Worry about How Many Species and Their Loss?

    I note - I found out about this paper from Carl Zimmer who asked me if I had any comments.  Boy did I.  And Zimmer has a New York Times article today discussing the paper: How Many Species on Earth? It’s Tricky.  Here are my thoughts that I wrote down without seeing Carl's article, which I will look at in a minute.

    The new paper takes a novel approach to estimating the number of species. I would summarize it but May does a pretty good job:
    "Mora et al. [4] offer an interesting new approach to estimating the total number of distinct eukaryotic species alive on earth today. They begin with an excellent survey of the wide variety of previous estimates, which give a range of different numbers in the broad interval 3 to 100 million species"

    ....

    "Mora et al.'s imaginative new approach begins by looking at the hierarchy of taxonomic categories, from the details of species and genera, through orders and classes, to phyla and kingdoms. They documented the fact that for eukaryotes, the higher taxonomic categories are “much more completely described than lower levels”, which in retrospect is perhaps not surprising. They also show that, within well-known taxonomic groups, the relative numbers of species assigned to phylum, class, order, family, genus, and species follow consistent patterns. If one assumes these predictable patterns also hold for less well-studied groups, the more secure information about phyla and class can be used to estimate the total number of distinct species within a given group."
    The approach is novel and shows what appears to be some promise and robustness for certain multicellular eukaryotes. For example, analysis of animals shows a reasonable leveling off for many taxonomic levels:



    Figure 1. Predicting the global number of species in Animalia from their higher taxonomy. (A–F) The temporal accumulation of taxa (black lines) and the frequency of the multimodel fits to all starting years selected (graded colors). The horizontal dashed lines indicate the consensus asymptotic number of taxa, and the horizontal grey area its consensus standard error. (G) Relationship between the consensus asymptotic number of higher taxa and the numerical hierarchy of each taxonomic rank. Black circles represent the consensus asymptotes, green circles the catalogued number of taxa, and the box at the species level indicates the 95% confidence interval around the predicted number of species (see Materials and Methods).
    From Mora C, Tittensor DP, Adl S, Simpson AGB, Worm B (2011) How Many Species Are There on Earth and in the Ocean? PLoS Biol 9(8): e1001127. doi:10.1371/journal.pbio.1001127
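
    For readers who want the idea behind panel G in concrete form, here is a rough sketch of extrapolating from higher taxa down to species. It is only an illustration under my own assumptions: it fits a plain straight line to log10(number of taxa) against rank number, which may well differ from the models the authors actually used, and the function names (`fit_line`, `predict_species`) and the example counts are made up for this post, not taken from the paper.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_species(ranks, counts, species_rank=6):
    """Extrapolate a species count from counts of higher taxa.

    Assumption (mine, not the paper's): log10(count) grows linearly
    with the numeric rank (1 = phylum ... 5 = genus, 6 = species).
    """
    logs = [math.log10(c) for c in counts]
    slope, intercept = fit_line(ranks, logs)
    return 10 ** (slope * species_rank + intercept)

if __name__ == "__main__":
    # Hypothetical, invented counts for phylum, class, order, family, genus.
    ranks = [1, 2, 3, 4, 5]
    counts = [30, 100, 500, 5000, 100000]
    print(f"Extrapolated species count: {predict_species(ranks, counts):,.0f}")
```

    The point of the sketch is simply that the species-level number is never counted directly; it is read off a trend fitted to the better-known higher ranks. That works only if those higher ranks have actually leveled off, which matters a lot for what comes next.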

    They also do a decent job of testing their use of higher taxon discovery to estimate number of species.  Figure 2 shows this pretty well.

    Figure 2. Validating the higher taxon approach. We compared the number of species estimated from the higher taxon approach implemented here to the known number of species in relatively well-studied taxonomic groups as derived from published sources [37]. We also used estimations from multimodel averaging from species accumulation curves for taxa with near-complete inventories. Vertical lines indicate the range of variation in the number of species from different sources. The dotted line indicates the 1∶1 ratio. Note that published species numbers (y-axis values) are mostly derived from expert approximations for well-known groups; hence there is a possibility that those estimates are subject to biases arising from synonyms.

    So all seems hunky dory and pretty interesting.  That is, until we get to the bacteria and archaea.  For example, check out Table 2:

    Table 2. Currently catalogued and predicted total number of species on Earth and in the ocean.

    Their approach leads to an estimate of 455 ± 160 Archaea on Earth and 1 in the ocean.  Yes, one in the ocean.  Amazing.  Completely silly too.  Bacteria are a little better.  An estimate of 9,680 ± 3,470 on Earth and 1,320 ± 436 in the oceans.  Still completely silly.

    Now the authors do admit to some challenges with bacteria and archaea. For example:
    We also applied the approach to prokaryotes; unfortunately, the steady pace of description of taxa at all taxonomic ranks precluded the calculation of asymptotes for higher taxa (Figure S1). Thus, we used raw numbers of higher taxa (rather than asymptotic estimates) for prokaryotes, and as such our estimates represent only lower bounds on the diversity in this group. Our approach predicted a lower bound of ~10,100 species of prokaryotes, of which ~1,320 are marine. It is important to note that for prokaryotes, the species concept tolerates a much higher degree of genetic dissimilarity than in most eukaryotes [26],[27]; additionally, due to horizontal gene transfers among phylogenetic clades, species take longer to isolate in prokaryotes than in eukaryotes, and thus the former species are much older than the latter [26],[27]; as a result the number of described species of prokaryotes is small (only ~10,000 species are currently accepted).
    But this is not remotely good enough from my point of view. Their estimates of ~ 10,000 or so bacteria and archaea on the planet are so completely out of touch in my opinion that this calls into question the validity of their method for bacteria and archaea at all. 

    Now you may ask - why do I think this is out of touch? Well, because reasonable estimates are more on the order of millions or hundreds of millions, not tens of thousands. To help people feel their way through the literature on this I have created a Mendeley group where I am posting some references worth checking out.




    I think it is definitely worth looking at those papers.  But just for the record, some quotes might be useful.  For example, Dan Dykhuizen writes
    we estimate that there are about 20,000 common species and 500,000 rare species in a small quantity of soil or about a half million species.
    And Curtis et al write:
    We are also able to speculate about diversity at a larger scale, thus the entire bacterial diversity of the sea may be unlikely to exceed 2 × 10^6, while a ton of soil could contain 4 × 10^6 different taxa.
    Are their estimates perfect?  No surely not.  But I think without a doubt the number of bacterial and archaeal species on the planet is in the range of millions upon millions upon millions.  10,000 is clearly not even close.  Sure, we do not all agree on what a bacterial or archaeal species is.  But with just about ANY definition I have heard, I think we would still count millions.

    Given how horribly horribly off their estimates are for bacteria and archaea, I think it would have been better to be more explicit in admitting that their method probably simply does not work for such taxa right now.  Instead, they took the approach of saying this is a "lower bound".  Sure.  That is one way of dealing with this.  But that is like saying "Dinosaurs lived at least 500 years ago" or "There are at least 10 people living in New York City" or "Hiking the Appalachian Trail will take at least two days."  Lower bounds are only useful when they provide some new insight.  This lower bound did not provide any.

    Mind you, I like the paper.  The parts on eukaryotes seem quite novel and useful.  But the parts on bacteria and archaea are painful.  Really really painful.

    Mora, C., Tittensor, D., Adl, S., Simpson, A., & Worm, B. (2011). How Many Species Are There on Earth and in the Ocean? PLoS Biology, 9 (8) DOI: 10.1371/journal.pbio.1001127

    Friday, August 19, 2011

    Get to know Jack & the story behind the paper by @gilbertjacka "Defining seasonal marine microbial community dynamics"

    A few days ago I became aware of the publication of a cool new paper: "Defining seasonal marine microbial community dynamics" by Jack A. Gilbert, Joshua A Steele, J Gregory Caporaso, Lars Steinbrück, Jens Reeder, Ben Temperton, Susan Huse, Alice C McHardy, Rob Knight, Ian Joint, Paul Somerfield, Jed A Fuhrman and Dawn Field.  The paper was published in the ISME Journal and is freely available using the ISME Open option. If you want to know more about Jack (in case you don't know Jack, or don't know jack about Jack) check out some of his material on the web, like his Google Scholar page, his twitter feed, his LinkedIn page, and his U. Chicago page. But rather than tell you about Jack or the paper, I thought I would send some questions to the first author, Jack Gilbert, and see if I could get some of the "story behind the paper" out of him.  Since Jack likes to talk (and email and do things on the web), I figured it was highly likely I could get some good answers.  And indeed I was right. Here are his answers to my quickly written up questions (I have been out of the office due to family illness).


    1. Can you provide some detail about the history of the project ... How did it start ? What were the original plans ? (not this much sequencing I am sure)
    The Western English Channel has been studied for over 100 years, and in fact it is the longest studied marine site in the world. It is essentially the home of the Marine Biological Association, and has a long history. The idea to start contextualizing the abundant metadata (www.westernchannelobservatory.org) was started in 2003 by Ian Joint, a senior researcher at Plymouth Marine Laboratory (www.pml.ac.uk), who saw the benefit of collecting microbial life on filters and storing these at -80C. It was his vision to create and maintain this collection that enabled us to go back through this frozen time series and explore microbial life. I started working for PML in 2005, and basically was charged with trying to identify a potential technique to characterize the microbial life in these samples. Initially we got funding through the International Census of Marine Life to perform 16S rDNA V6 pyrosequencing on 12 samples. We chose 2007 as the first year, almost arbitrarily, and published that work in Environmental Microbiology in 2009 (http://onlinelibrary.wiley.com/doi/10.1111/j.1462-2920.2009.02017.x/abstract). However, we had already decided to go ahead, and with help from Dawn Field (Center for Ecology and Hydrology, UK) we were able to secure funding to pyrosequence 60 further amplicon samples; essentially we did 2003-2008. We deposited all these in the ICoMM dataset (link below) and it quickly became the largest study in the series. This was also a gold standard study for the Genomic Standards Consortium's MIMARKS checklist (http://www.nature.com/nbt/journal/v29/n5/full/nbt.1823.html). We published the first analysis of these data in Nature Precedings in 2010 (http://precedings.nature.com/documents/4406/version/1).
    We continued to characterize the microbial communities of the L4 sampling site in the Western English Channel by employing metagenomics and metatranscriptomics alongside more 16S rRNA V6 pyrosequencing across diel and seasonal time scales throughout 2008 (the final year of the 6 year time series). This study was published in PLoS ONE, also in 2010 (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0015545). This study also included our first analysis of archaeal diversity in the English Channel, which was also funded through the ICoMM initiative. We owe a lot to Mitch Sogin's group for the first attempts at data analysis for the 16S rDNA profiles. We had a lot of difficulty getting the message right for the 6-year paper that was recently published in ISME J. Basically it was an issue of sequencing data as Natural History: we were generating data catalogs, and not doing enough to characterize the ecological interactions that occurred there.  So we reached out to the community, and found research groups who could help us plug that gap. Those involved Rob Knight's team, Alice McHardy's team, and Jed Fuhrman's team. We worked a lot on improving this paper, and had some valuable help from a wide selection of other researchers, including Steven Giovannoni and Doug Bartlett, among many others.
    The publication of this study however, is just the start. 
    2. Who collected the samples? Any good field stories?
    Samples were all collected by the fantastic boat staff at Plymouth Marine Laboratory, who routinely go out every Monday morning to collect water and specific samples for the whole laboratory. They were the life blood of that organization. One story I always like to relate is that during the 2008 sampling season, which generated samples for both the new ISME J paper (http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2011107a.html) and the 2010 PLoS ONE paper (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0015545), we wanted to get diel sampling effort during the winter, spring and summer. Unfortunately the only time I could convince my group to go out sampling for 24 hours was during the summer....sometimes science is limited by enthusiasm ;-). Also, the site is outside the Plymouth Sea Wall - which I think is still the largest concrete structure in the UK and was built in the 19th century, so taking people out to see the site (for what it was worth ;-)) meant taking them into usually very choppy water....which made people quite sick sometimes. In May 2009, J. Craig Venter and his crew came through to start the European leg of his Global Ocean Sampling expedition at L4, specifically the Western English Channel. Together, our team at PML on our fishing boat, Plymouth Quest, and his team on board the 100ft yacht, Sorcerer II, sampled L4 and E1 (another monitoring site) in the Western English Channel. Excitingly these data form the first part of the attempt to start cataloguing the viral and eukaryotic metagenomic and metatranscriptomic analysis of these communities. This analysis is also being further characterized using meta-metabolomics run by Carole Llewelyn at PML and Mark Viant at University of Birmingham, increasing the multi'omic nature of these data.
    3. Can you give some web links for data, people involved , etc?
    • People on the paper - not an exhaustive list of those involved....this is a huge community effort.
    4. What else do you want people to know ?
    We have recently started to model the English Channel from both a taxonomic and functional perspective. I have attached a presentation that has cool gifs that demonstrate this; people can email me and request the gifs if necessary. These are generated by Peter Larsen at Argonne National Laboratory. This modelling is being driven by two new tools: (1) Predicted Relative Metabolic Turnover, which uses functional annotations from metagenomes to create predicted metabolomes, which enable us to accurately predict the turnover (relative consumption or production) of more than 1000 metabolites in the English Channel (http://www.microbialinformaticsj.com/content/1/1/4). (2) Microbial Assemblage Prediction, which enables the prediction of the relative abundance of every bacterial taxon at any given location and time; the predictions are driven by in situ or remotely modeled environmental parameter data. We used satellite data to produce the figures above, truly BUGS FROM SPAAAAACCCCCEEEE..... This is the new paradigm - creating information and predictive models from data - no longer will metagenomics be descriptive Natural History - it is now becoming ECOLOGY. These tools will form the cornerstone of the Earth Microbiome Project's (www.earthmicrobiome.org) data analytical initiative to create predictive models of microbial taxonomic community abundance structure and functional capability, defined as the ability of a community to turn over metabolites.
    Note - as a bit of a side story - I am disappointed in the ISME Journal's "Open" option for publishing. Though it uses a Creative Commons license, it is a pretty narrow one that says, for example, "You may not alter, transform, or build upon this work." That is pretty limiting.  It means, for example, that the text cannot be reworded into a database of full text of papers where one uses intelligent language processing methods to play with the text.  It also means technically I probably cannot take the figures and modify them in any way to, for example, make an interesting movie using them.  Imagine if Genbank worked this way.  Imagine if you could only look at sequences but could not make alignments of them.  It is, well, not very open. So really this should be called the ISME "No charge" option or something like that, since this is not "open access" to me - I think "open access" should really be reserved for material that is free of charge and free of most/all use restrictions (I prefer the broader version of the "open access" definition described by Peter Suber).  Sure - the fact that ISME makes some stuff available at no charge is nice.  And that they use CC licenses is good too, since these are very straightforward to interpret compared to other licenses.  But their use of the no derivatives option seems silly. Anyway - nice paper.  And I hope some of the story behind the paper is useful to people.

    Reference:
    Gilbert JA, Steele JA, Caporaso JG, Steinbrück L, Reeder J, Temperton B, Huse S, McHardy AC, Knight R, Joint I, Somerfield P, Fuhrman JA, & Field D (2011). Defining seasonal marine microbial community dynamics. The ISME journal PMID: 21850055

    Friday, July 29, 2011

    Fun with Google Books - Old Books on Bacteria

    After discovering a copy of this great 100 year old book on "Bacteria in relation to Country Life" on Google Books (see Microbiology of the Built Environment – as of ~ 100 years ago: Bacteria in relation to country life), I decided to snoop around for other old books on bacteria:

    The Bacteria by Antoine Magnin in 1880






    Lectures on bacteria - Page 1 - by Anton Bary 1887

    If you expand the search to "microbes" you get some other interesting ones



    I am sure there are many others that are fascinating there. It is always interesting to me to see what people were thinking about in terms of microbes in the past. And Google Books is one heck of a convenient way to do this.

    Saturday, July 23, 2011

    Rare, not rare, rare, not rare (and this is not about burgers, it's about ocean bacteria)

    Nice little press release/story for bacteria and ecology lovers out there: 'Rare' bacteria in the ocean ain't necessarily so, researchers report. This is about work by some colleagues of mine including in particular Barbara Campbell at the University of Delaware. I worked with her on a genome project a few years ago (of a bacterium found on the surfaces of some deep sea worms - see Adaptations to submarine hydrothermal environments exemplified ....). Alas, the new paper, in PNAS, is not Open Access but the story covers it reasonably well.

    Some relevant links:

    Friday, May 27, 2011

    Updated Again: Compilation of articles, news, blogs about the "arsenic bacteria" NASA study

    Lots of new stuff on the arsenic-bacteria front.  For those interested I am compiling some of the more useful links here:

    News stories:
    Blogs:
    • A Bacterium That Can Grow by Using Arsenic Instead of Phosphorus
      • Felisa Wolfe-Simon
      • Jodi Switzer Blum
      • Thomas R. Kulp
      • Gwyneth W. Gordon
      • Shelley E. Hoeft
      • Jennifer Pett-Ridge
      • John F. Stolz
      • Samuel M. Webb
      • Peter K. Weber
      • Paul C. W. Davies
      • Ariel D. Anbar
      • and Ronald S. Oremland