I posted a question to Twitter and Facebook about metrics for assessing density in a phylogenetic tree. Here is a "Storification" of the responses. Thanks for the help all.
Any other suggestions welcome in comments ...
Showing posts with label bioinformatics. Show all posts
Thursday, May 3, 2012
Monday, January 9, 2012
Draft post cleanup #11: Tree Hugging
Yet another post in my "draft blog post cleanup" series. Here is #11 from September.
Just a quick one. In August a nice review paper came out on phylogenetic analysis software: Learning to Become a Tree Hugger | The Scientist. By Amy Maxmen, it is "A guide to free software for constructing and assessing species relationships". Definitely worth checking out.
Among the key links & tools discussed:
- Clustal
- RAxML
- MrBayes
- BEAST
- TNT
- BEAST user group
- FigTree
- Mesquite
- Dendroscope
- Paloverde
- GeoPhylo
- MUSCLE
- MAFFT
- T-Coffee
- SeaView
- Phylogeny.fr
- European Bioinformatics Institute
- CIPRES
- Evolutionary Analysis Mesquite
- Biodiverse
- Bayes Traits
- Lagrange
Saturday, December 31, 2011
Draft blog post cleanup #1: Divide and Conquer to Find Orthologs
OK - I am cleaning out my draft blog post list. I start many posts and don't finish them, and then they sit in the draft section of Blogger. Well, I am going to try to clean some of that up by writing some mini posts. Here is the first ---
Saw an interesting paper worth checking out:
PLoS ONE: Calculating Orthologs in Bacteria and Archaea: A Divide and Conquer Approach
It describes not only a way to speed up continual ortholog annotation in bacterial and archaeal genomes but also is linked to an ongoing open code development project.
Here is the abstract:
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a “divide and conquer” approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
Definitely worth checking out.
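To get a feel for why the "divide and conquer" idea in the abstract cuts down the number of alignments, here is a toy counting sketch. It is not the xBASE-Orth code or algorithm. Everything in it is an assumption made up for illustration: the names `pairs`, `flat_cost`, `hierarchical_cost` and `TOY`, the two made-up species, and above all the shortcut of treating genes with the same label as identical, so that a species "pan-genome" is just the union of its gene labels.

```python
# Toy model only: counts pairwise comparisons, performs no real alignments.
# Assumption: genes with the same label are "the same" within a species, so a
# pan-genome is the set union of gene labels. Real pan-genome construction
# needs sequence comparison; this is just to show the arithmetic.

def pairs(n: int) -> int:
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def flat_cost(species_to_genomes: dict) -> int:
    """All-against-all: every coding sequence is compared with every other."""
    total = sum(len(g) for genomes in species_to_genomes.values() for g in genomes)
    return pairs(total)


def hierarchical_cost(species_to_genomes: dict) -> int:
    """Compare within each species first, then compare pan-genomes across species."""
    cost = 0
    pan_sizes = []
    for genomes in species_to_genomes.values():
        n = sum(len(g) for g in genomes)
        cost += pairs(n)                   # all-against-all inside one species
        pan = set().union(*genomes)        # toy pan-genome used as a proxy
        pan_sizes.append(len(pan))
    # Cross-species step: pairs between different pan-genomes only.
    cost += pairs(sum(pan_sizes)) - sum(pairs(s) for s in pan_sizes)
    return cost


# Hypothetical data: two species, two genomes each, three genes per genome.
TOY = {
    "species_a": [{"dnaA", "recA", "gyrB"}, {"dnaA", "recA", "rpoB"}],
    "species_b": [{"dnaA", "recA", "ftsZ"}, {"dnaA", "ftsZ", "rpoB"}],
}

print(flat_cost(TOY))          # 66 comparisons for all-against-all
print(hierarchical_cost(TOY))  # 46 comparisons for the two-level scheme
```

Even in this tiny example the two-level scheme needs fewer comparisons, and the gap should widen as more genomes of the same species share genes, since only the pan-genome proxies get carried up to the next level.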
Saturday, September 3, 2011
Playing around with CloVR - cloud computing bioinformatics system
There is a nice new tool/resource out there for metagenomic and genomic analysis called CloVR. The paper: "CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing".
It is available at http://clovr.org and should be useful to anyone doing genomics or metagenomics who wants to make use of cloud computing resources.
CloVR is brought to us by Florian Fricke, Owen White, Sam Angiuoli and others from the University of Maryland (full disclosure - many of the authors are ex-colleagues of mine from TIGR).
Not only is CloVR available openly and freely, but they even have a CloVR blog: http://clovr.org/category/blog/ ... though it does not seem to be heavily used. Kudos to this team for producing and releasing this software for others to use. And kudos to NSF, USDA and NIH for funding its development -- I have a feeling many people will use it.
Saturday, August 6, 2011
New paper from my lab (& the Facciotti lab): Mauve Assembly Metrics #Halophiles #Genomics
Just a quick post here. A new paper from my lab has come out in Bioinformatics. The paper is relatively simple. Titled "Mauve Assembly Metrics", it reports the work of Aaron Darling and Andrew Tritt (with some minor contributions from me and Marc Facciotti). Aaron wrote the program Mauve when he was a student in Nicole Perna's lab at Wisconsin (see "Mauve: multiple alignment of conserved genomic sequence with rearrangements"). Over the years he (and others) have continued to develop the program and have written a few more papers, including, for example, "progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement". This new paper basically reports a system/set of scripts to measure assembly quality. Here is the abstract:
High throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high quality reference genome, and measuring assembly accuracy using some metrics. We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
Check out the paper: Mauve Assembly Metrics. Download the scripts/code from http://ngopt.googlecode.com, along with Mauve, play around, and let me know what you think.
Note this paper was supported by a grant from the National Science Foundation (ER 0949453). That grant is focused on comparative genomics (sequencing and analysis) of halophilic archaea. Stay tuned for more on that project as we are writing up a series of papers ....
Some related links:
Monday, June 13, 2011
Yes, I am a #RedSox & #PLoS fan; & this video sort of is proof #BenFranklinAward #OpenScience
Just saw this posted on YouTube. Did not know it was coming ... but am happy they recorded it.
And here are the slides I used. Will try to synch
For more on this award see
Sunday, April 17, 2011
Boston, Bioinformatics & Ben Franklin Award wrap up from #BioIT11
Photo by Mark Gabrenya
Flew to Boston early Tuesday AM. Only thing of note - during a layover in Chicago I saw a bookstore selling autographed copies of "The Immortal Life of Henrietta Lacks" by Rebecca Skloot.
Dropped off my stuff at the Seaport Hotel - had a nice view from my room.
Called up my friend Ashlee Earl, who currently works at the Broad, and arranged to meet her at Kenmore Square in an hour. I collaborated with Ashlee many years ago on analyzing her expression studies of the Deinococcus radiodurans genome and we have been friends ever since. Took the T to Kenmore Square, met Ashlee, and then went into this "Fenway Park" place to see a baseball game. (I was born in Boston and am a Red Sox fan ...) Had the best baseball seats ever - front row Green Monster seats - which I had bought from Stubhub.com. Watched the Sox lose while Ashlee and I discussed genome centers. I note - to those in the Broad Public Affairs office, Ashlee makes the Broad sound like a great place to work. I tried to get some dirt out of her but she did not provide much.
Took the T back to the hotel after the game and went to sleep. Got up very early to think about my "acceptance speech" for when I was to pick up my Ben Franklin Award. I made some quick slides on my iPad (this was the first time I have gone to a meeting w/o my laptop) and during the talk before the award ceremony I emailed them to one of the organizers and we got things set up.
Photo by Ashlee Earl
Then I was introduced and Jeff Bizzaro read a mini statement about why I won the award. Something like what they put on the Bioinformatics.org web site:
Jonathan uses his high visibility in social media to advocate for open access by sharing links to discussions, mentioning open access articles and initiatives, and pushing for the opening up of popular closed access articles. This culture is shared with his students, who advocate for "open access" peer reviewing and created a peer-to-peer service for sharing bioinformatics material (articles, software and datasets). He is the academic editor in chief of PLoS Biology [1] and voices his opinions and support for open access publication and open data sharing on his "Tree of Life" blog [2]. In addition to just voicing his opinion, he also practices what he preaches, by refusing to publish in non-open access journals. With respect to bioinformatics, he has been involved with many software packages that are freely available, such as the recent AMPHORA [3] and PhyloOTU [4]. Lastly, Jonathan helped release a new open data sharing tool for scientists called BioTorrents [5]. This is just another step in encouraging all scientists to share their data and results more openly.
References:
1. http://www.plosbiology.org/
2. http://phylogenomics.blogspot.com/
3. http://genomebiology.com/2008/9/10/R151
4. http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1001061
5. http://www.plosone.org/article/info:doi/10.1371/journal.pone.0010071
Note - am proud to get this award. It is given for contributions to Open Science and previous winners are an esteemed crew: Michael Eisen (my brother), Alex Bateman, Michael Ashburner, Jim Kent, Robert Gentleman, Phil Bourne, Lincoln Stein, Ewan Birney, and Sean Eddy (see full details here).
Then I gave my mini talk focusing on a brief history of how I got into Open Science. Here are my slides
Note the awkward typo where I introduced the "Public Library of Science". Oops. Anyway - talked for a few minutes. While wearing my Red Sox PLoS 1 shirt, I note.
Photo by Jeff Bizzaro
Photos by Mark Gabrenya
And then back to the meeting, where I wandered around again for a while. Saw an old friend from TIGR, Xiaoying Lin, who now works at Life Technologies, and discussed the Ion Torrent with him.
Was pleased to see a booth giving away free RedSox tickets as a prize.
Then I headed out to Brookline for dinner with my Aunt and Uncle and cousin and eventually made my way back to the hotel where I had a few drinks.
The next day I got up a bit late, and eventually made my way to Logan Airport, where the trip home was a disaster. My outbound flight was late. Missed my connection. Then the flight I was on was held up for others to make their connection. Though I did get a few hours in Denver Airport to wander around. Finally made it home after 1 AM.
Labels: baseball, bioinformatics, open science