Tattoo World life: bioinformatics


Thursday, May 3, 2012

Summary of responses to question about metrics for density in phylogenetic trees

I posted a question to Twitter and Facebook about metrics for assessing density in a phylogenetic tree. Here is a "Storification" of the responses. Thanks for the help all.
Any other suggestions welcome in comments ...

View the story "Metrics to quantify density of taxa/sampling in a phylogenetic tree" on Storify

Monday, January 9, 2012

Draft post cleanup #11: Tree Hugging

Yet another post in my "draft blog post cleanup" series. Here is #11 from September.

Just a quick one. In August a nice review article came out on phylogenetic analysis software: Learning to Become a Tree Hugger | The Scientist. Written by Amy Maxmen, it is "A guide to free software for constructing and assessing species relationships". Definitely worth checking out.

Among the key links & tools discussed:

Saturday, December 31, 2011

Draft blog post cleanup #1: Divide and Conquer to Find Orthologs

OK - I am cleaning out my list of draft blog posts. I start many posts, don't finish them, and then they sit in the draft section of Blogger. Well, I am going to try to clean some of that up by writing some mini posts. Here is the first ---

Saw an interesting paper worth checking out:
PLoS ONE: Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach

It not only describes a way to speed up continual ortholog annotation in bacterial and archaeal genomes but is also linked to an ongoing open code development project.

Here is the abstract:

Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
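To put the abstract's point about all-against-all searching in perspective, here is a back-of-the-envelope count of how many unordered pairs an exhaustive comparison would involve. This is my own illustrative Python sketch, not code from xBASE-Orth. It assumes each unordered pair is compared once (all-vs-all BLAST runs are often done in both directions, which roughly doubles the work) and uses 4 million as a round lower bound for the "more than 4 million" coding sequences mentioned in the abstract.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Genome counts taken from the xBASE-Orth abstract.
n_genomes = 1288 + 94   # bacterial + archaeal complete genomes
n_cds = 4_000_000       # assumption: round lower bound for "more than 4 million" CDSs

print(f"genome pairs:  {pairwise_comparisons(n_genomes):,}")
print(f"protein pairs: {pairwise_comparisons(n_cds):,}")
```

That works out to about 950 thousand genome pairs and roughly 8 trillion protein pairs. The count grows with the square of the input, which is why shrinking the input at each taxonomic level with pan-genome proxies, as the abstract describes, pays off so quickly.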

Definitely worth checking out.

Saturday, September 3, 2011

Playing around with CloVR - cloud computing bioinformatics system

Nice new tool/resource available out there for metagenomic and genomic analysis called CloVR: CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

It is available at http://clovr.org and it should be useful to many people out there doing genomics and metagenomics if you want to make use of cloud computing resources.

CloVR is brought to us by Florian Fricke, Owen White, Sam Angiuoli and others from the University of Maryland (full disclosure - many of the authors are ex-colleagues of mine from TIGR).

Not only is CloVR available openly and freely but they even have a CloVR blog: http://clovr.org/category/blog/ ... though it does not seem to be heavily used. Kudos to this team for producing and releasing this software for others to use. And kudos to NSF, USDA and NIH for funding its development -- I have a feeling many people will use it.

Saturday, August 6, 2011

New paper from my lab (& the Facciotti lab): Mauve Assembly Metrics #Halophiles #Genomics

Just a quick post here. A new paper from my lab has come out in Bioinformatics. The paper is relatively simple. Titled "Mauve Assembly Metrics", it reports work by Aaron Darling and Andrew Tritt (with some minor contributions from me and Marc Facciotti). Aaron wrote the program Mauve when he was a student in Nicole Perna's lab at Wisconsin: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Over the years he and others have continued to develop the program and have written a few papers on it, including, for example, one describing progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. This new paper basically reports a system of scripts for measuring assembly quality. Here is the abstract:

High throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high quality reference genome, and measuring assembly accuracy using some metrics. We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
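For readers new to assembly scoring, one of the most widely used summary metrics when comparing assemblies is N50. Below is a generic Python sketch of that one metric, assuming contig lengths are already in hand as a list of integers. It is not code from the Mauve Assembly Metrics scripts, and it is not meant to represent the full set of scores the paper computes.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings us to half the total.
        if 2 * running >= total:
            return length
    # Unreachable: the running sum always reaches the total.
    raise AssertionError("unreachable")

print(n50([100, 80, 50, 30, 20, 10]))  # prints 80
```

N50 on its own rewards long contigs even when they are misassembled, which is one reason a reference-based system that also counts errors is useful when tuning assembler parameters.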

Check out the paper: Mauve Assembly Metrics. Download the scripts/code from http://ngopt.googlecode.com, along with Mauve, then play around and let me know what you think.

Note this paper was supported by a grant from the National Science Foundation (ER 0949453). That grant is focused on comparative genomics (sequencing and analysis) of halophilic archaea. Stay tuned for more on that project as we are writing up a series of papers ....

Some related links:

Monday, June 13, 2011

Yes, I am a #RedSox & #PLoS fan; & this video sort of is proof #BenFranklinAward #OpenScience

Just saw this posted on YouTube. Did not know it was coming ... but am happy they recorded it.

And here are the slides I used. Will try to sync them with the video.

Ben Franklin Award Slides


For more on this award see

Sunday, April 17, 2011

Boston, Bioinformatics & Ben Franklin Award wrap up from #BioIT11

Photo by Mark Gabrenya

Well, just got back from Boston where I went to the BioIT World convention to pick up the "Benjamin Franklin Award" for contributions to Open Science from Bioinformatics.Org. A quick round up of the trip:

Flew to Boston early Tuesday AM. Only thing of note - during a layover in Chicago I saw a bookstore selling autographed copies of "The Immortal Life of Henrietta Lacks" by Rebecca Skloot.

Dropped off my stuff at the Seaport Hotel - had a nice view from my room.

Called up my friend Ashlee Earl, who currently works at the Broad, and arranged to meet her at Kenmore Square in an hour. I collaborated with Ashlee many years ago on analyzing her expression studies of the Deinococcus radiodurans genome and we have been friends ever since. Took the T to Kenmore Square, met Ashlee and then went into this "Fenway Park" place to see a baseball game. (I was born in Boston and am a Red Sox fan ...) Had the best baseball seats ever - front row Green Monster seats - which I had bought from Stubhub.com. Watched the Sox lose while Ashlee and I discussed genome centers. I note - to those in the Broad Public Affairs office - Ashlee makes the Broad sound like a great place to work. I tried to get some dirt out of her but she did not provide much.

Photo by Ashlee Earl

Took the T back to the hotel after the game and went to sleep. Got up very early to think about my "acceptance speech" for when I was to pick up my Ben Franklin Award. I made some quick slides on my iPad (this was the first time I have gone to a meeting w/o my laptop) and during the talk before the award ceremony I emailed them to one of the organizers and we got things set up.

Then I was introduced, and Jeff Bizzaro read a mini statement about why I won the award, something like what they put on the Bioinformatics.org web site:

Jonathan uses his high visibility in social media to advocate for open access by sharing links to discussions, mentioning open access articles and initiatives, and pushing for the opening up of popular closed access articles. This culture is shared with his students, who advocate for "open access" peer reviewing and created a peer-to-peer service for sharing bioinformatics material (articles, software and datasets). He is the academic editor in chief of PLoS Biology [1] and voices his opinions and support for open access publication and open data sharing on his "Tree of Life" blog [2]. In addition to just voicing his opinion, he also practices what he preaches, by refusing to publish in non-open access journals. With respect to bioinformatics, he has been involved with many software packages that are freely available, such as the recent AMPHORA [3] and PhyloOTU [4]. Lastly, Jonathan helped release a new open data sharing tool for scientists called BioTorrents [5]. This is just another step in encouraging all scientists to share their data and results more openly.

References:

1. http://www.plosbiology.org/

2. http://phylogenomics.blogspot.com/

3. http://genomebiology.com/2008/9/10/R151

4. http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001061

5. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010071

Note - am proud to get this award. It is given for contributions to Open Science and previous winners are an esteemed crew: Michael Eisen (my brother), Alex Bateman, Michael Ashburner, Jim Kent, Robert Gentleman, Phil Bourne, Lincoln Stein, Ewan Birney, and Sean Eddy (see full details here).

Then I gave my mini talk, focusing on a brief history of how I got into Open Science. Here are my slides:

Ben Franklin Award Slides


Note the awkward typo where I introduced the "Public Library of Science". Oops. Anyway, I talked for a few minutes, wearing my Red Sox PLoS 1 shirt, I note.

Photo by Jeff Bizzaro

Photo by Mark Gabrenya

And then there was a break. They took some pictures during the break and eventually I wandered around to the booths.

Photo by Mark Gabrenya

I saw Nat Pearson, who now works for Knome, and went to lunch with him to discuss my "exome", which Knome has sequenced. I note, Nat was a student in a class I TA'd at Stanford -- good to see how far he has come.

And then back to the meeting, where I wandered around again for a while. Saw an old friend from TIGR, Xiaoying Lin, who now works at Life Technologies, and discussed the Ion Torrent with him.

Was pleased to see a booth giving away free Red Sox tickets as a prize.

Then I headed out to Brookline for dinner with my aunt, uncle and cousin and eventually made my way back to the hotel, where I had a few drinks.

The next day I got up a bit late and eventually made my way to Logan Airport, where the trip home was a disaster. My outbound flight was late and I missed my connection. Then the flight I was on was held up for others to make their connections. At least I got a few hours to wander around Denver Airport. Finally made it home after 1 AM.
