Tattoo World life: phylogeny

Tuesday, April 3, 2012

More phylogeny fun from Rod Page: TreeBASE -> Genome Browser

More phylogeny fun from Rod Page. I have been reading his blog post, iPhylo: Browsing TreeBASE using a genome browser-like interface. It seems very cool.

This looks useful: Online Phylogeny Course from Rod Page

If you have an interest in phylogeny, this is definitely worth checking out. Rod Page has an online phylogeny course: Phylogeny. It has some nice links to other online resources, some videos of talks, and various other phylogeny materials.

Tuesday, February 7, 2012

Converting repeated emails into FAQs: Today's = How to Get Figures/Details from 2009 GEBA paper

OK, I have now officially been driven completely insane by email. As part of my attempt to reduce email communication with people, I am going to start turning some of the emails I get often into FAQ posts.

Today's email relates to the 2009 paper on a "Phylogeny driven genomic encyclopedia of bacteria and archaea" for which I was the senior and corresponding author. The email asks for higher-resolution versions of the figures published in the paper. This person and many others have asked for a higher-res version of our "genome tree", which was Figure 1. Here is the version from the paper:

But alas, since it is a JPG, you can't read the text very well when you zoom in. About 30 people, maybe more, have asked for a higher-res version. The simplest way to get this figure with legible fonts when zoomed in is to get the PDF of the paper and zoom in on that. But that may not work for everyone, so here is a link to a PDF of the figure that I posted on Posterous (Blogger does not allow PDF uploads). I also posted PDFs of the other figures.

Many people also ask for the treefile (basically a text-encoded version of the phylogenetic tree for viewing and analysis). I am posting the treefile directly below and have also submitted it to TreeBASE (which we should have done before). Enjoy ... and in the future I will point people to this page when they ask for the figure or treefile. Not sure this will have saved me any time, but I am sick of writing a lot of this in emails back to people ...
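
For anyone who would rather script against a treefile than open it in a tree viewer, here is a minimal sketch in Python of pulling the taxon names and tip branch lengths out of a NEXUS tree block. It is not how the tree in the paper was built or analyzed. The function names (`extract_newick`, `leaf_branch_lengths`) and the three-taxon sample are made up for illustration, and the sketch assumes single-quoted labels without spaces. For real work, a proper parser such as Biopython's `Bio.Phylo` is likely the better choice.

```python
import re

# Matches a single-quoted leaf label followed by its branch length,
# e.g. 'Taxon_A':1.7E-4 (internal nodes start with ")" and are skipped).
LEAF_PATTERN = re.compile(r"'([^']+)':([0-9.]+(?:[eE][-+]?[0-9]+)?)")


def extract_newick(nexus_text):
    """Return the first tree string from a NEXUS 'TREE name = ...;' statement."""
    match = re.search(r"\bTREE\s+\S+\s*=\s*(.*?);", nexus_text,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no TREE statement found")
    # Long trees are often wrapped across lines, sometimes mid-name, so all
    # whitespace is dropped. This assumes labels contain no spaces.
    return "".join(match.group(1).split())


def leaf_branch_lengths(newick):
    """Return (leaf_name, branch_length) pairs in the order they appear."""
    return [(name, float(length))
            for name, length in LEAF_PATTERN.findall(newick)]


# Hypothetical three-taxon tree (not taken from the paper), wrapped mid-name
# on purpose to show that the line break is repaired.
sample = """#NEXUS
BEGIN trees;
TREE 'Tree1' = (('Taxon_A':1.7E-4,'Taxon_B':1.71E-4):1.7E-4,'Taxon_
C':0.001028);
END;
"""

leaves = leaf_branch_lengths(extract_newick(sample))
for name, length in leaves:
    print(name, length)
```

The same two functions should work on the full treefile if you paste it into a text file and read it in as one string, as long as the file is complete and ends with the closing semicolon.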

#NEXUS
BEGIN trees;
TREE 'Tree1' = ((((('GEBA_Thermanaerovibrio_acidaminovorans':0.190689,'GEBA_Dethiosulfovibrio_peptidovorans':0.263143):0.276658,(((((((((((((((((((((((((((((((((((((((('Escherichia_coli_O157_H7_str_Sakai':0.0,'Escherichia_coli_str_K12_substr_MG1655':0.0):1.51E-4,'Escherichia_coli_str_K12_substr_DH10B':0.0):2.0E-5,('Escherichia_coli_ATCC_8739':1.7E-4,'Escherichia_coli_HS':1.71E-4):1.7E-4):0.0,('Escherichia_coli_536':0.001196,'Escherichia_coli_APEC_O1':0.0):0.0):0.0,(('Shigella_flexneri_2a_str_301':0.001883,'Shigella_flexneri_2a_str_2457T':0.0):1.71E-4,'Shigella_flexneri_5_str_8401':3.44E-4):5.12E-4):0.0,('Shigella_boydii_CDC_3083_94':5.13E-4,'Shigella_boydii_Sb227':8.57E-4):3.42E-4):0.0,'Escherichia_coli_SMS_3_5':0.0):0.0,'Escherichia_coli_O157_H7_EDL933':0.0):0.0,'Escherichia_coli_CFT073':0.0):0.0,'Escherichia_coli_UTI89':0.0):0.0,'Escherichia_coli_E24377A':0.0):0.0,'Shigella_dysenteriae_Sd197':0.001025):1.69E-4,'Shigella_sonnei_Ss046':0.001028):0.012167,((((((('Salmonella_enterica_subsp_enterica_serovar_Typhi_str_Ty2':0.0,'Salmonella_enterica_subsp_enterica_serovar_Typhi_str_CT18':1.7E-4):3.4E-4,'Salmonella_typhimurium_LT2':0.0):0.0,'Salmonella_enterica_subsp_enterica_serovar_Paratyphi_A_str_ATCC_9150':8.5E-4):2.0E-6,'Salmonella_enterica_subsp_enterica_serovar_Paratyphi_B_str_SPB7':1.8E-4):1.69E-4,'Salmonella_enterica_subsp_enterica_serovar_Choleraesuis_str_SC_B67':3.48E-4):0.001774,'Salmonella_enterica_subsp_arizonae_serovar_62_z4_z23_':0.001673):0.008533,'Citrobacter_koseri_ATCC_BAA_895':0.007106):0.003103):0.00437,'Klebsiella_pneumoniae_subsp_pneumoniae_MGH_78578':0.012094):0.004392,'Enterobacter_sakazakii_ATCC_BAA_894':0.019781):0.007021,'Enterobacter_sp_638':0.019532):0.027075,('Erwinia_tasmaniensis':0.033979,'Serratia_proteamaculans_568':0.033607):0.013319):0.008604,'Erwinia_carotovora_subsp_atroseptica_SCRI1043':0.03122):0.012206,(((((('Yersinia_pestis_KIM':0.0,'Yersinia_pseudotuberculosis_IP_31758':0.0):0.0,'Yersinia_pestis_CO92':0.0):0.0,'Yersinia_pseudotu
berculosis_IP_32953':0.0):0.0,(((('Yersinia_pseudotuberculosis_PB1_':0.0,'Yersinia_pestis_Pestoides_F':0.0):0.0,('Yersinia_pseudotuberculosis_YPIII':0.0,'Yersinia_pestis_Nepal516':0.0):0.0):0.0,'Yersinia_pestis_Antiqua':1.72E-4):0.0,'Yersinia_pestis_biovar_Microtus_str_91001':0.0):0.0):0.0,'Yersinia_pestis_Angola':3.44E-4):0.00575,'Yersinia_enterocolitica_subsp_enterocolitica_8081':0.009468):0.030284):0.008222,(((('Candidatus_Blochmannia_pennsylvanicus_str_BPEN':0.111832,'Candidatus_Blochmannia_floridanus':0.199319):0.14508,((('Buchnera_aphidicola_str_APS_Acyrthosiphon_pisum_':0.081646,'Buchnera_aphidicola_str_Sg_Schizaphis_graminum_':0.078489):0.090786,'Buchnera_aphidicola_str_Bp_Baizongia_pistaciae_':0.232468):0.043831,('Wigglesworthia_glossinidia_endosymbiont_of_Glossina_brevipalpis':0.30099,'Buchnera_aphidicola_str_Cc_Cinara_cedri_':0.303142):0.063766):0.064196):0.054991,'Baumannia_cicadellinicola_str_Hc_Homalodisca_coagulata_':0.166451):0.081445,'Sodalis_glossinidius_str_morsitans_':0.026696):0.024889):0.01239,'Photorhabdus_luminescens_subsp_laumondii_TTO1':0.048195):0.050763,(((('Haemophilus_somnus_129PT':9.73E-4,'Haemophilus_somnus_2336':7.28E-4):0.038446,'Pasteurella_multocida_subsp_multocida_str_Pm70':0.033371):0.011204,(('Mannheimia_succiniciproducens_MBEL55E':0.02899,'Actinobacillus_succinogenes_130Z':0.035825):0.013874,((('Haemophilus_influenzae_Rd_KW20':0.002043,'Haemophilus_influenzae_PittGG':0.00105):5.49E-4,'Haemophilus_influenzae_86_028NP':5.15E-4):2.24E-4,'Haemophilus_influenzae_PittEE':0.001026):0.040304):0.008478):0.011217,((('Actinobacillus_pleuropneumoniae_L20':0.00137,'Actinobacillus_pleuropneumoniae_serovar_7_str_AP76':3.39E-4):1.74E-4,'Actinobacillus_pleuropneumoniae_serovar_3_str_JL03':0.003062):0.011765,'Haemophilus_ducreyi_35000HP':0.024577):0.038512):0.080775):0.047571,((((('Vibrio_cholerae_O1_biovar_eltor_str_N16961':3.64E-4,'Vibrio_cholerae_O395':8.38E-4):0.041305,('Vibrio_vulnificus_CMCP6':5.27E-4,'Vibrio_vulnificus_YJ016':1.72E-4):0.
020829):0.011445,('Vibrio_parahaemolyticus_RIMD_2210633':0.00625,'Vibrio_harveyi_ATCC_BAA_1116':0.010839):0.011481):0.027253,'Vibrio_fischeri_ES114':0.05376):0.027779,'Photobacterium_profundum_SS9':0.072105):0.054993):0.023142,('Aeromonas_hydrophila_subsp_hydrophila_ATCC_7966':0.006284,'Aeromonas_salmonicida_subsp_salmonicida_A449':0.0108):0.109795):0.020707,(((((('Shewanella_halifaxensis_HAW_EB4':0.008905,'Shewanella_pealeana_ATCC_700345':0.003805):0.025082,('Shewanella_sediminis_HAW_EB3':0.019306,'Shewanella_woodyi_ATCC_51908':0.012326):0.017175):0.016987,'Shewanella_loihica_PV_4':0.021173):0.024752,(((((('Shewanella_sp_ANA_3':0.002092,'Shewanella_sp_MR_4':9.8E-4):7.54E-4,'Shewanella_sp_MR_7':3.48E-4):0.003036,'Shewanella_oneidensis_MR_1':0.002265):0.010027,('Shewanella_sp_W3_18_1':3.26E-4,'Shewanella_putrefaciens_CN_32':1.85E-4):0.001996):0.006463,(('Shewanella_baltica_OS155':0.001236,'Shewanella_baltica_OS195':0.0):5.23E-4,'Shewanella_baltica_OS185':6.84E-4):0.009966):0.015626,('Shewanella_denitrificans_OS217':0.017657,'Shewanella_frigidimarina_NCIMB_400':0.02512):0.022711):0.020469):0.018183,'Shewanella_amazonensis_SB2B':0.035265):0.092511,'Psychromonas_ingrahamii_37':0.170059):0.021406):0.025447,(('Idiomarina_loihiensis_L2TR':0.129215,'Pseudoalteromonas_atlantica_T6c':0.121204):0.026755,('Pseudoalteromonas_haloplanktis_TAC125':0.120366,'Colwellia_psychrerythraea_34H':0.144993):0.031198):0.015669):0.082415,'GEBA_Kangiella_koreensis':0.172853):0.0486,(((((('Acinetobacter_baumannii_ACICU':8.84E-4,'Acinetobacter_baumannii_ATCC_17978':0.003413):5.27E-4,('Acinetobacter_baumannii_SDF':8.48E-4,'Acinetobacter_baumannii_AYE':1.7E-4):0.0):0.020056,'Acinetobacter_sp_ADP1':0.027905):0.107102,(('Psychrobacter_arcticus_273_4':0.006888,'Psychrobacter_cryohalolentis_K5':0.006086):0.066403,'Psychrobacter_sp_PRwf_1':0.060503):0.121017):0.15381,'Alcanivorax_borkumensis_SK2':0.175254):0.040455,((('Chromohalobacter_salexigens_DSM_3043':0.167612,'Marinomonas_sp_MWYL1':0.175719):0.03
8094,('Marinobacter_aquaeolei_VT8':0.1125,'Hahella_chejuensis_KCTC_2396':0.130701):0.053467):0.019075,((((((((('Pseudomonas_syringae_pv_syringae_B728a':0.001801,'Pseudomonas_syringae_pv_phaseolicola_1448A':0.002321):0.002597,'Pseudomonas_syringae_pv_tomato_str_DC3000':0.003918):0.02921,'Pseudomonas_fluorescens_Pf_5':0.014609):0.00751,'Pseudomonas_fluorescens_PfO_1':0.010753):0.026949,(((('Pseudomonas_putida_KT2440':1.71E-4,'Pseudomonas_putida_F1':0.0):0.001622,'Pseudomonas_putida_GB_1':0.003034):0.003248,'Pseudomonas_putida_W619':0.007868):0.003675,'Pseudomonas_entomophila_L48':0.007306):0.02498):0.030505,'Pseudomonas_mendocina_ymp':0.030084):0.018411,'Pseudomonas_stutzeri_A1501':0.035067):0.021645,(('Pseudomonas_aeruginosa_PAO1':0.0,'Pseudomonas_aeruginosa_UCBPP_PA14':1.71E-4):0.001113,'Pseudomonas_aeruginosa_PA7':0.001132):0.032184):0.118537,('Saccharophagus_degradans_2_40':0.107494,'Cellvibrio_japonicus_Ueda107':0.109149):0.090952):0.024456):0.029903):0.041423):0.029921,((('Legionella_pneumophila_str_Corby':0.001506,'Legionella_pneumophila_subsp_pneumophila_str_Philadelphia_1':5.43E-4):3.1E-4,('Legionella_pneumophila_str_Lens':0.00222,'Legionella_pneumophila_str_Paris':0.001831):3.76E-4):0.255271,(('Coxiella_burnetii_RSA_331':0.00102,'Coxiella_burnetii_RSA_493':5.1E-4):0.001141,'Coxiella_burnetii_Dugway_5J108_111':0.001413):0.286937):0.063734):0.020357,(((((((('Francisella_tularensis_subsp_tularensis_SCHU_S4':0.0,'Francisella_tularensis_subsp_tularensis_FSC198':0.0):0.001507,'Francisella_tularensis_subsp_tularensis_WY96_3418':3.35E-4):3.34E-4,(('Francisella_tularensis_subsp_holarctica_OSU18':0.0,'Francisella_tularensis_subsp_holarctica':5.02E-4):0.0,'Francisella_tularensis_subsp_holarctica_FTNF002_00':3.35E-4):0.002852):0.0,'Francisella_tularensis_subsp_mediasiatica_FSC147':0.002516):0.001333,'Francisella_tularensis_subsp_novicida_U112':6.81E-4):0.02679,'Francisella_philomiragia_subsp_philomiragia_ATCC_25017':0.028537):0.343835,'Dichelobacter_nodosus_VCS1703A':0.
35117):0.058577,(('Candidatus_Ruthia_magnifica_str_Cm_Calyptogena_magnifica_':0.048117,'Candidatus_Vesicomyosocius_okutanii_HA':0.070088):0.299069,'Thiomicrospira_crunogena_XCL_2':0.218606):0.069384):0.027865):0.019953,((('Halorhodospira_halophila_SL1':0.175731,'Alkalilimnicola_ehrlichei_MLHE_1':0.10505):0.103584,'Nitrosococcus_oceani_ATCC_19707':0.225193):0.031953,(((((('Xanthomonas_campestris_pv_campestris_str_8004':3.43E-4,'Xanthomonas_campestris_pv_campestris_str_ATCC_33913':5.15E-4):3.45E-4,'Xanthomonas_campestris_pv_campestris':1.7E-4):0.006192,((('Xanthomonas_oryzae_pv_oryzae_KACC10331':1.75E-4,'Xanthomonas_oryzae_pv_oryzae_MAFF_311018':0.0):0.0,'Xanthomonas_oryzae_pv_oryzae_PXO99A':0.002957):0.008947,('Xanthomonas_campestris_pv_vesicatoria_str_85_10':0.001996,'Xanthomonas_axonopodis_pv_citri_str_306':0.00127):0.004934):0.005782):0.031343,'Stenotrophomonas_maltophilia_K279a':0.060056):0.030521,((('Xylella_fastidiosa_M23':0.0,'Xylella_fastidiosa_Temecula1':1.74E-4):0.005446,'Xylella_fastidiosa_M12':0.003419):0.005622,'Xylella_fastidiosa_9a5c':0.003688):0.09616):0.251782,'Methylococcus_capsulatus_str_Bath':0.208011):0.028921):0.02517):0.056558,(((((('Nitrosomonas_europaea_ATCC_19718':0.04084,'Nitrosomonas_eutropha_C91':0.04959):0.144982,'Nitrosospira_multiformis_ATCC_25196':0.08516):0.076439,'Thiobacillus_denitrificans_ATCC_25259':0.145726):0.023371,'Methylobacillus_flagellatus_KT':0.155554):0.027233,((('Azoarcus_sp_BH72':0.052172,'Azoarcus_sp_EbN1':0.058776):0.062917,'Dechloromonas_aromatica_RCB':0.113139):0.05407,(((((('Polynucleobacter_sp_QLW_P1DMWA_1':0.011,'Polynucleobacter_necessarius_STIR1':0.023235):0.136644,(((('Cupriavidus_taiwanensis':0.005601,'Ralstonia_eutropha_H16':0.005264):0.006279,'Ralstonia_eutropha_JMP134':0.007605):0.010799,'Ralstonia_metallidurans_CH34':0.013934):0.029247,('Ralstonia_solanacearum_GMI1000':0.02203,'Ralstonia_pickettii_12J':0.021784):0.025195):0.037629):0.022794,((('Burkholderia_phytofirmans_PsJN':0.004098,'Burkholderia_xenov
orans_LB400':0.002629):0.017129,'Burkholderia_phymatum_STM815':0.017185):0.015186,((((('Burkholderia_ambifaria_AMMD':0.001201,'Burkholderia_ambifaria_MC40_6':1.65E-4):0.00415,((('Burkholderia_cenocepacia_AU_1054':0.0,'Burkholderia_cenocepacia_HI2424':1.71E-4):1.71E-4,'Burkholderia_cenocepacia_MC0_3':0.0):0.002317,'Burkholderia_sp_383':0.007333):0.002889):0.00261,'Burkholderia_vietnamiensis_G4':0.003346):0.015421,'Burkholderia_multivorans_ATCC_17616':0.005208):0.005853,(((((((('Burkholderia_mallei_NCTC_10247':0.0,'Burkholderia_mallei_NCTC_10229':1.72E-4):3.43E-4,'Burkholderia_mallei_ATCC_23344':1.91E-4):0.0,'Burkholderia_mallei_SAVP1':0.0):1.71E-4,'Burkholderia_pseudomallei_K96243':5.16E-4):0.0,'Burkholderia_pseudomallei_1710b':0.0):0.0,'Burkholderia_pseudomallei_668':0.0):0.0,'Burkholderia_pseudomallei_1106a':1.71E-4):0.004036,'Burkholderia_thailandensis_E264':0.005011):0.007936):0.019911):0.063072):0.026718,('Herminiimonas_arsenicoxydans':0.021781,'Janthinobacterium_sp_Marseille':0.013485):0.104015):0.024622,(('Methylibium_petroleiphilum_PM1':0.071447,'Leptothrix_cholodnii_SP_6':0.085431):0.041536,(((('Delftia_acidovorans_SPH_1':0.061373,'Acidovorax_sp_JS42':0.025396):0.016116,'Acidovorax_avenae_subsp_citrulli_AAC00_1':0.025052):0.01365,'Verminephrobacter_eiseniae_EF01_2':0.072504):0.026641,(('Polaromonas_naphthalenivorans_CJ2':0.031581,'Polaromonas_sp_JS666':0.022265):0.044515,'Rhodoferax_ferrireducens_T118':0.06856):0.022175):0.065841):0.119404):0.025568,(((('Bordetella_bronchiseptica_RB50':1.72E-4,'Bordetella_parapertussis_12822':0.001377):1.18E-4,'Bordetella_pertussis_Tohama_I':0.001433):0.021534,'Bordetella_avium_197N':0.031614):0.010355,'Bordetella_petrii_DSM_12804':0.021324):0.132685):0.071258):0.027059):0.037078,((('Neisseria_gonorrhoeae_FA_1090':5.0E-4,'Neisseria_gonorrhoeae_NCCP11945':0.001485):0.006103,((('Neisseria_meningitidis_MC58':0.001366,'Neisseria_meningitidis_Z2491':0.00101):7.87E-4,'Neisseria_meningitidis_FAM18':0.001575):5.57E-4,'Neisseria_meni
ngitidis_053442':0.002602):0.001738):0.167934,'Chromobacterium_violaceum_ATCC_12472':0.09855):0.087498):0.124859):0.205339,((((((((((((('Rhizobium_etli_CIAT_652':0.003437,'Rhizobium_etli_CFN_42':0.005094):0.006043,'Rhizobium_leguminosarum_bv_viciae_3841':0.011645):0.036083,'Agrobacterium_tumefaciens_str_C58':0.045233):0.016839,('Sinorhizobium_meliloti_1021':0.006124,'Sinorhizobium_medicae_WSM419':0.006258):0.029756):0.070623,((((('Bartonella_henselae_str_Houston_1':0.020444,'Bartonella_quintana_str_Toulouse':0.029388):0.009808,'Bartonella_tribocorum_CIP_105476':0.025338):0.031387,'Bartonella_bacilliformis_KC583':0.065286):0.098029,((((('Brucella_canis_ATCC_23365':6.88E-4,'Brucella_suis_1330':1.72E-4):6.89E-4,'Brucella_suis_ATCC_23445':0.001033):0.0,((('Brucella_abortus_S19':1.76E-4,'Brucella_melitensis_biovar_Abortus_2308':1.72E-4):0.0,'Brucella_abortus_biovar_1_str_9_941':0.0):6.84E-4,'Brucella_melitensis_16M':0.002611):1.71E-4):0.0,'Brucella_ovis_ATCC_25840':0.001206):0.013975,'Ochrobactrum_anthropi_ATCC_49188':0.01275):0.047742):0.034988,('Mesorhizobium_loti_MAFF303099':0.077568,'Mesorhizobium_sp_BNC1':0.07419):0.027327):0.021349):0.086016,(((((('Methylobacterium_extorquens_PA1':0.011368,'Methylobacterium_populi_BJ001':0.004799):0.03256,'Methylobacterium_radiotolerans_JCM_2831':0.045067):0.041026,'Methylobacterium_sp_4_46':0.058056):0.074338,'Beijerinckia_indica_subsp_indica_ATCC_9039':0.149061):0.026115,('Xanthobacter_autotrophicus_Py2':0.058512,'Azorhizobium_caulinodans_ORS_571':0.044976):0.079042):0.02072,(((('Bradyrhizobium_sp_ORS278':0.007711,'Bradyrhizobium_sp_BTAi1':0.006207):0.029183,'Bradyrhizobium_japonicum_USDA_110':0.033032):0.020122,((('Rhodopseudomonas_palustris_TIE_1':0.0,'Rhodopseudomonas_palustris_CGA009':3.43E-4):0.030733,('Rhodopseudomonas_palustris_BisB5':0.012815,'Rhodopseudomonas_palustris_HaA2':0.011805):0.010984):0.018577,('Rhodopseudomonas_palustris_BisB18':0.037297,'Rhodopseudomonas_palustris_BisA53':0.031869):0.014739):0.023424):0.01122
,('Nitrobacter_winogradskyi_Nb_255':0.018532,'Nitrobacter_hamburgensis_X14':0.017835):0.041031):0.111343):0.056949):0.041219,'Parvibaculum_lavamentivorans_DS_1':0.17312):0.029751,(('Maricaulis_maris_MCS10':0.176054,'Hyphomonas_neptunium_ATCC_15444':0.253873):0.037937,('Caulobacter_sp_K31':0.046364,'Caulobacter_crescentus_CB15':0.03404):0.177012):0.067981):0.024054,((((('Silicibacter_pomeroyi_DSS_3':0.054186,'Silicibacter_sp_TM1040':0.048369):0.026318,'Roseobacter_denitrificans_OCh_114':0.076151):0.022514,'Jannaschia_sp_CCS1':0.099178):0.015502,'Dinoroseobacter_shibae_DFL_12':0.062324):0.048422,((('Rhodobacter_sphaeroides_ATCC_17029':3.54E-4,'Rhodobacter_sphaeroides_2_4_1':7.92E-4):0.0133,'Rhodobacter_sphaeroides_ATCC_17025':0.013986):0.076691,'Paracoccus_denitrificans_PD1222':0.091793):0.030158):0.202704):0.030795,((('Novosphingobium_aromaticivorans_DSM_12444':0.086704,'Erythrobacter_litoralis_HTCC2594':0.121244):0.045873,'Sphingopyxis_alaskensis_RB2256':0.085216):0.03517,('Sphingomonas_wittichii_RW1':0.074921,'Zymomonas_mobilis_subsp_mobilis_ZM4':0.113715):0.03071):0.187787):0.027917,(('Rhodospirillum_rubrum_ATCC_11170':0.169238,'Magnetospirillum_magneticum_AMB_1':0.125062):0.055149,((('Gluconobacter_oxydans_621H':0.101123,'Gluconacetobacter_diazotrophicus_PAl_5':0.059111):0.060663,'Granulibacter_bethesdensis_CGDNIH1':0.085514):0.032014,'Acidiphilium_cryptum_JF_5':0.120348):0.145099):0.055072):0.09899,((((('Rickettsia_bellii_RML369_C':6.4E-4,'Rickettsia_bellii_OSU_85_389':6.24E-4):0.053585,((('Rickettsia_typhi_str_Wilmington':0.015422,'Rickettsia_prowazekii_str_Madrid_E':0.012042):0.039247,(('Rickettsia_felis_URRWXCal2':0.00684,'Rickettsia_akari_str_Hartford':0.024586):0.00502,((('Rickettsia_rickettsii_str_Iowa':0.0,'Rickettsia_rickettsii_str_Sheila_Smith_':0.0):0.00663,'Rickettsia_conorii_str_Malish_7':0.002765):0.00386,'Rickettsia_massiliae_MTU5':0.00794):0.008218):0.002179):0.00446,'Rickettsia_canadensis_str_McKiel':0.030982):0.03239):0.199956,('Orientia_tsutsug
amushi_Boryong':0.012903,'Orientia_tsutsugamushi_str_Ikeda':0.003049):0.386268):0.180586,(((('Anaplasma_phagocytophilum_HZ':0.123173,'Anaplasma_marginale_str_St_Maries':0.120008):0.159306,(('Ehrlichia_ruminantium_str_Gardel':0.002292,'Ehrlichia_ruminantium_str_Welgevonden':0.002453):0.053871,('Ehrlichia_canis_str_Jake':0.027749,'Ehrlichia_chaffeensis_str_Arkansas':0.027236):0.031378):0.134774):0.151991,(('Wolbachia_pipientis':0.072061,'Wolbachia_endosymbiont_of_Drosophila_melanogaster':0.04265):0.018425,'Wolbachia_endosymbiont_strain_TRS_of_Brugia_malayi':0.077353):0.269684):0.141145,'Neorickettsia_sennetsu_str_Miyayama':0.697366):0.171539):0.064736,'Candidatus_Pelagibacter_ubique_HTCC1062':0.549831):0.056614):0.089191,'Magnetococcus_sp_MC_1':0.339568):0.063213):0.093791,(('Acidobacteria_bacterium_Ellin345':0.221486,'Solibacter_usitatus_Ellin6076':0.235193):0.269435,(((((('Geobacter_sulfurreducens_PCA':0.05188,'Geobacter_metallireducens_GS_15':0.04331):0.045927,'Geobacter_uraniireducens_Rf4':0.08407):0.035503,('Pelobacter_propionicus_DSM_2379':0.107415,'Geobacter_lovleyi_SZ':0.104313):0.062144):0.11137,'Pelobacter_carbinolicus_DSM_2380':0.218315):0.102231,((((((('Desulfovibrio_vulgaris_subsp_vulgaris_DP4':2.75E-4,'Desulfovibrio_vulgaris_subsp_vulgaris_str_Hildenborough':2.47E-4):0.096556,'Desulfovibrio_desulfuricans_subsp_desulfuricans_str_G20':0.130391):0.041935,'Lawsonia_intracellularis_PHE_MN1_00':0.194671):0.108731,('GEBA_Desulfohalobium_retbaense':0.206905,'GEBA_Desulfomicrobium_baculatum':0.21572):0.039647):0.204436,'Desulfotalea_psychrophila_LSv54':0.368324):0.05185,('Desulfococcus_oleovorans_Hxd3':0.325853,'Syntrophobacter_fumaroxidans_MPOB':0.267991):0.045811):0.045242,'Syntrophus_aciditrophicus_SB':0.335728):0.039815):0.036306,((('Sorangium_cellulosum_So_ce_56_':0.340758,'GEBA_Haliangium_ochraceum':0.326975):0.069132,(('Anaeromyxobacter_sp_Fw109_5':0.05281,'Anaeromyxobacter_dehalogenans_2CP_C':0.046187):0.14976,'Myxococcus_xanthus_DK_1622':0.199974):0.1212
35):0.065096,'Bdellovibrio_bacteriovorus_HD100':0.487944):0.045793):0.04331):0.033782):0.033693,'GEBA_Denitrovibrio_acetiphilus':0.52279):0.036282,((((((('Sulfurimonas_denitrificans_DSM_1251':0.23347,'Arcobacter_butzleri_RM4018':0.208618):0.040632,'Sulfurovum_sp_NBC37_1':0.230443):0.030751,((((('Campylobacter_jejuni_subsp_jejuni_NCTC_11168':0.001263,'Campylobacter_jejuni_RM1221':9.27E-4):6.61E-4,(('Campylobacter_jejuni_subsp_doylei_269_97':0.006051,'Campylobacter_jejuni_subsp_jejuni_81116':3.16E-4):0.001123,'Campylobacter_jejuni_subsp_jejuni_81_176':9.26E-4):3.5E-4):0.119606,(('Campylobacter_curvus_525_92':0.03145,'Campylobacter_concisus_13826':0.037212):0.061724,'Campylobacter_fetus_subsp_fetus_82_40':0.097271):0.023098):0.020182,'Campylobacter_hominis_ATCC_BAA_381':0.16927):0.075991,'GEBA_Sulfurospirillum_deleyianum':0.136694):0.07072):0.026032,(((((('Helicobacter_pylori_HPAG1':0.003032,'Helicobacter_pylori_26695':0.004293):9.24E-4,'Helicobacter_pylori_Shi470':0.004956):0.002035,'Helicobacter_pylori_J99':0.00625):0.008252,'Helicobacter_acinonychis_str_Sheeba':0.010673):0.199339,'Helicobacter_hepaticus_ATCC_51449':0.121696):0.062387,'Wolinella_succinogenes_DSM_1740':0.107844):0.108511):0.052813,'Nitratiruptor_sp_SB155_2':0.132215):0.401479,('Aquifex_aeolicus_VF5':0.291771,'Sulfurihydrogenibium_sp_YO3AOP1':0.270137):0.202377):0.049843,'Elusimicrobium_minutum_Pei191':0.667875):0.034753):0.023181,((((((('GEBA_Dyadobacter_fermentans':0.115999,'GEBA_Spirosoma_linguale':0.124662):0.074785,'Cytophaga_hutchinsonii_ATCC_33406':0.182703):0.054101,'Candidatus_Amoebophilus_asiaticus_5a2':0.332349):0.041954,(('GEBA_Pedobacter_heparinus':0.181859,'GEBA_Chitinophaga_pinensis':0.30193):0.037197,((((('Flavobacterium_psychrophilum_JIP02_86':0.059237,'Flavobacterium_johnsoniae_UW101':0.052582):0.072716,'Gramella_forsetii_KT0803':0.133872):0.034419,'GEBA_Capnocytophaga_ochracea':0.123672):0.087924,'Candidatus_Sulcia_muelleri_GWSS':0.665368):0.057908,(((('Bacteroides_fragilis_YCH46':0.
0,'Bacteroides_fragilis_NCTC_9343':0.0):0.024113,'Bacteroides_thetaiotaomicron_VPI_5482':0.02726):0.02733,'Bacteroides_vulgatus_ATCC_8482':0.051865):0.063951,(('Porphyromonas_gingivalis_ATCC_33277':8.58E-4,'Porphyromonas_gingivalis_W83':0.001013):0.151787,'Parabacteroides_distasonis_ATCC_8503':0.06466):0.046245):0.18136):0.057273):0.041539):0.194704,('GEBA_Rhodothermus_marinus':0.154571,'Salinibacter_ruber_DSM_13855':0.312844):0.159944):0.062353,(((('Chlorobium_tepidum_TLS':0.032082,'Chlorobaculum_parvum_NCIB_8327':0.032712):0.058237,((('Prosthecochloris_vibrioformis_DSM_265':0.059493,'Pelodictyon_luteolum_DSM_273':0.0478):0.035359,'Chlorobium_chlorochromatii_CaD3':0.105899):0.014378,('Chlorobium_limicola_DSM_245':0.051253,'Chlorobium_phaeobacteroides_DSM_266':0.062034):0.016101):0.038256):0.042465,'Chlorobium_phaeobacteroides_BS1':0.115118):0.129491,'Chloroherpeton_thalassium_ATCC_35110':0.162073):0.253695):0.148638,(((('Akkermansia_muciniphila_ATCC_BAA_835':0.396966,'Opitutus_terrae_PB90_1':0.451463):0.059359,'Methylacidiphilum_infernorum_V4':0.427955):0.105972,((((('Chlamydia_trachomatis_D_UW_3_CX':0.002254,'Chlamydia_trachomatis_A_HAR_13':0.002736):0.004197,('Chlamydia_trachomatis_434_Bu':0.0,'Chlamydia_trachomatis_L2b_UCH_1_proctitis':3.44E-4):0.002396):0.021326,'Chlamydia_muridarum_Nigg':0.019605):0.087368,((('Chlamydophila_abortus_S26_3':0.030285,'Chlamydophila_caviae_GPIC':0.023437):0.007338,'Chlamydophila_felis_Fe_C_56':0.022986):0.042096,(('Chlamydophila_pneumoniae_AR39':0.0,'Chlamydophila_pneumoniae_J138':1.71E-4):1.71E-4,('Chlamydophila_pneumoniae_CWL029':3.42E-4,'Chlamydophila_pneumoniae_TW_183':3.44E-4):0.0):0.088464):0.034515):0.269463,'Candidatus_Protochlamydia_amoebophila_UWE25':0.26373):0.294159):0.072779,('Rhodopirellula_baltica_SH_1':0.328464,'GEBA_Planctomyces_limnophilus':0.334605):0.359784):0.051776):0.041976):0.020277,(((((('Borrelia_afzelii_PKo':0.011852,'Borrelia_garinii_PBi':0.016443):0.006519,'Borrelia_burgdorferi_B31':0.013496):0.093741,
'Borrelia_hermsii_DAH':0.090331):0.391295,(('Treponema_pallidum_subsp_pallidum_SS14':0.0,'Treponema_pallidum_subsp_pallidum_str_Nichols':0.0):0.267368,'Treponema_denticola_ATCC_35405':0.151221):0.231493):0.155836,'GEBA_Brachyspira_murdochii':0.52097):0.062841,((('Leptospira_borgpetersenii_serovar_Hardjo_bovis_L550':2.2E-4,'Leptospira_borgpetersenii_serovar_Hardjo_bovis_JB197':0.0):0.017229,('Leptospira_interrogans_serovar_Lai_str_56601':1.81E-4,'Leptospira_interrogans_serovar_Copenhageni_str_Fiocruz_L1_130':8.38E-4):0.020842):0.144917,('Leptospira_biflexa_serovar_Patoc_strain_Patoc_1_Paris_':0.0,'Leptospira_biflexa_serovar_Patoc_strain_Patoc_1_Ames_':1.71E-4):0.165532):0.395767):0.1053):0.036183,(((((((('Mycoplasma_hyopneumoniae_J':0.002353,'Mycoplasma_hyopneumoniae_7448':0.003734):0.002271,'Mycoplasma_hyopneumoniae_232':0.003699):0.430983,(((('Mycoplasma_synoviae_53':0.261036,'Mycoplasma_agalactiae_PG2':0.263813):0.099129,'Mycoplasma_pulmonis_UAB_CTIP':0.278073):0.046282,'Mycoplasma_arthritidis_158L3_1':0.383762):0.035874,'Mycoplasma_mobile_163K':0.328229):0.046689):0.262618,((('Ureaplasma_parvum_serovar_3_str_ATCC_700970':1.66E-4,'Ureaplasma_parvum_serovar_3_str_ATCC_27815':0.0):0.453388,'Mycoplasma_penetrans_HF_2':0.366678):0.056306,(('Mycoplasma_pneumoniae_M129':0.106972,'Mycoplasma_genitalium_G37':0.117431):0.32371,'Mycoplasma_gallisepticum_R':0.299834):0.137813):0.242873):0.06613,(('Mycoplasma_mycoides_subsp_mycoides_SC_str_PG1':0.018411,'Mycoplasma_capricolum_subsp_capricolum_ATCC_27343':0.013116):0.146931,'Mesoplasma_florum_L1':0.163211):0.286922):0.11052,((('Onion_yellows_phytoplasma_OY_M':0.016914,'Aster_yellows_witches_broom_phytoplasma_AYWB':0.018909):0.202409,'Candidatus_Phytoplasma_mali':0.277597):0.160881,'Acholeplasma_laidlawii_PG_8A':0.249163):0.180723):0.122978,((('GEBA_Streptobacillus_moniliformis':0.173665,'GEBA_Leptotrichia_buccalis':0.08471):0.045217,'GEBA_Sebaldella_termitidis':0.115851):0.114134,'Fusobacterium_nucleatum_subsp_nucleatum_ATCC_2
5586':0.22874):0.263641):0.093021,(((((((('Desulfitobacterium_hafniense_Y51':0.199501,'Heliobacterium_modesticaldum_Ice1':0.161842):0.040662,'Moorella_thermoacetica_ATCC_39073':0.189602):0.025151,(((('Desulfotomaculum_reducens_MI_1':0.141921,'GEBA_Desulfotomaculum_acetoxidans':0.173079):0.026808,'Pelotomaculum_thermopropionicum_SI':0.119406):0.031051,'Candidatus_Desulforudis_audaxviator_MP104C':0.239396):0.041262,'Carboxydothermus_hydrogenoformans_Z_2901':0.192197):0.033883):0.022877,'Syntrophomonas_wolfei_subsp_wolfei_str_Goettingen':0.340927):0.029485,'Symbiobacterium_thermophilum_IAM_14863':0.302816):0.024169,('Natranaerobius_thermophilus_JW_NM_WN_LF':0.328273,'GEBA_Veillonella_parvula':0.318108):0.03302):0.020798,(((((((((((((('Bacillus_thuringiensis_serovar_konkukian_str_97_27':3.38E-4,'Bacillus_thuringiensis_str_Al_Hakam':1.69E-4):1.69E-4,(('Bacillus_anthracis_str_Sterne':0.0,'Bacillus_anthracis_str_Ames_Ancestor_':0.0):0.0,'Bacillus_anthracis_str_Ames':0.0):0.001183):6.51E-4,'Bacillus_cereus_E33L':5.32E-4):0.001131,('Bacillus_cereus_ATCC_14579':0.002251,'Bacillus_weihenstephanensis_KBAB4':0.013987):0.004118):0.001515,'Bacillus_cereus_ATCC_10987':2.57E-4):0.020967,'Bacillus_cereus_subsp_cytotoxis_NVH_391_98':0.013733):0.085584,((('Bacillus_subtilis_subsp_subtilis_str_168':0.015012,'Bacillus_amyloliquefaciens_FZB42':0.015183):0.020109,'Bacillus_pumilus_SAFR_032':0.040049):0.011797,'Bacillus_licheniformis_ATCC_14580':0.023456):0.065389):0.021459,('Geobacillus_thermodenitrificans_NG80_2':0.011367,'Geobacillus_kaustophilus_HTA426':0.01873):0.094956):0.020796,(((((((('Staphylococcus_aureus_subsp_aureus_Mu3':0.0,'Staphylococcus_aureus_subsp_aureus_Mu50':1.67E-4):1.67E-4,'Staphylococcus_aureus_subsp_aureus_N315':1.67E-4):0.0,('Staphylococcus_aureus_subsp_aureus_JH9':0.001001,'Staphylococcus_aureus_subsp_aureus_JH1':0.0):3.34E-4):5.16E-4,((('Staphylococcus_aureus_subsp_aureus_MRSA252':6.67E-4,'Staphylococcus_aureus_RF122':3.35E-4):0.0,((('Staphylococcus_aureus_subsp_a
ureus_NCTC_8325':6.77E-4,'Staphylococcus_aureus_subsp_aureus_USA300_TCH1516':0.0):0.0,'Staphylococcus_aureus_subsp_aureus_USA300':0.0):0.0,('Staphylococcus_aureus_subsp_aureus_COL':6.68E-4,'Staphylococcus_aureus_subsp_aureus_str_Newman':3.33E-4):3.34E-4):1.67E-4):0.0,('Staphylococcus_aureus_subsp_aureus_MSSA476':0.0,'Staphylococcus_aureus_subsp_aureus_MW2':0.0):3.34E-4):1.51E-4):0.032187,(('Staphylococcus_epidermidis_ATCC_12228':3.35E-4,'Staphylococcus_epidermidis_RP62A':1.66E-4):0.023543,'Staphylococcus_haemolyticus_JCSC1435':0.025193):0.012868):0.014679,'Staphylococcus_saprophyticus_subsp_saprophyticus_ATCC_15305':0.045225):0.208482,(((((((((('Streptococcus_pyogenes_SSI_1':1.69E-4,'Streptococcus_pyogenes_MGAS315':3.35E-4):3.86E-4,((((('Streptococcus_pyogenes_M1_GAS':3.34E-4,'Streptococcus_pyogenes_MGAS5005':1.67E-4):3.34E-4,((('Streptococcus_pyogenes_MGAS8232':6.78E-4,'Streptococcus_pyogenes_MGAS10750':8.62E-4):1.67E-4,'Streptococcus_pyogenes_str_Manfredo':5.01E-4):3.34E-4,'Streptococcus_pyogenes_MGAS10394':5.01E-4):0.0):1.67E-4,('Streptococcus_pyogenes_MGAS9429':5.01E-4,'Streptococcus_pyogenes_MGAS2096':0.0):5.01E-4):0.0,'Streptococcus_pyogenes_MGAS10270':5.09E-4):3.34E-4,'Streptococcus_pyogenes_MGAS6180':5.01E-4):2.83E-4):0.032705,(('Streptococcus_agalactiae_2603V_R':5.02E-4,'Streptococcus_agalactiae_A909':0.0):5.02E-4,'Streptococcus_agalactiae_NEM316':0.0):0.032531):0.009744,(('Streptococcus_thermophilus_LMG_18311':4.96E-4,'Streptococcus_thermophilus_CNRZ1066':6.75E-4):0.001002,'Streptococcus_thermophilus_LMD_9':8.46E-4):0.042346):0.010989,'Streptococcus_mutans_UA159':0.053241):0.014154,(('Streptococcus_suis_98HAH33':1.41E-4,'Streptococcus_suis_05ZYH33':0.003069):0.045053,(((('Streptococcus_pneumoniae_D39':0.0,'Streptococcus_pneumoniae_R6':0.0):0.001089,('Streptococcus_pneumoniae_TIGR4':7.57E-4,'Streptococcus_pneumoniae_CGSP14':0.001274):5.04E-4):1.79E-4,'Streptococcus_pneumoniae_Hungary19A_6':5.78E-4):0.025464,('Streptococcus_gordonii_str_Challis_substr_CH1':0
.013866,'Streptococcus_sanguinis_SK36':0.013168):0.014048):0.01685):0.015608):0.079689,(('Lactococcus_lactis_subsp_cremoris_SK11':0.001565,'Lactococcus_lactis_subsp_cremoris_MG1363':8.56E-4):0.005333,'Lactococcus_lactis_subsp_lactis_Il1403':0.005458):0.157584):0.124114,'Enterococcus_faecalis_V583':0.096988):0.030082,((((('Lactobacillus_acidophilus_NCFM':0.017053,'Lactobacillus_helveticus_DPC_4571':0.025039):0.047802,('Lactobacillus_gasseri_ATCC_33323':0.008251,'Lactobacillus_johnsonii_NCC_533':0.00468):0.081515):0.035749,('Lactobacillus_delbrueckii_subsp_bulgaricus_ATCC_BAA_365':0.001075,'Lactobacillus_delbrueckii_subsp_bulgaricus_ATCC_11842':9.47E-4):0.095385):0.189762,(('Lactobacillus_casei_BL23':1.67E-4,'Lactobacillus_casei_ATCC_334':1.71E-4):0.13496,'Lactobacillus_sakei_subsp_sakei_23K':0.116042):0.029638):0.037981,((((('Lactobacillus_brevis_ATCC_367':0.113171,'Lactobacillus_plantarum_WCFS1':0.09989):0.023598,'Pediococcus_pentosaceus_ATCC_25745':0.140539):0.018317,('Lactobacillus_fermentum_IFO_3956':0.075472,'Lactobacillus_reuteri_F275':0.062813):0.114981):0.022679,(('Leuconostoc_mesenteroides_subsp_mesenteroides_ATCC_8293':0.031056,'Leuconostoc_citreum_KM20':0.035589):0.124121,'Oenococcus_oeni_PSU_1':0.234536):0.131959):0.021585,'Lactobacillus_salivarius_UCC118':0.124879):0.034651):0.077386):0.111668,((('Listeria_monocytogenes_EGD_e':0.00208,'Listeria_monocytogenes_str_4b_F2365':7.75E-4):0.001511,'Listeria_welshimeri_serovar_6b_str_SLCC5334':0.003974):0.001047,'Listeria_innocua_Clip11262':0.001031):0.107265):0.033808):0.044404,'Lysinibacillus_sphaericus_C3_41':0.152339):0.036331):0.021337,'Oceanobacillus_iheyensis_HTE831':0.18896):0.019902,('Bacillus_clausii_KSM_K16':0.093196,'Bacillus_halodurans_C_125':0.066964):0.053506):0.027554,'Exiguobacterium_sibiricum_255_15':0.211271):0.115095,'GEBA_Alicyclobacillus_acidocaldarius':0.223192):0.06559,((((('Clostridium_acetobutylicum_ATCC_824':0.123954,'Clostridium_novyi_NT':0.114099):0.023517,(((('Clostridium_perfringens
_ATCC_13124':5.13E-4,'Clostridium_perfringens_str_13':0.001376):4.86E-4,'Clostridium_perfringens_SM101':0.00102):0.09186,(('Clostridium_botulinum_E3_str_Alaska_E43':0.003782,'Clostridium_botulinum_B_str_Eklund_17B':0.003812):0.055674,'Clostridium_beijerinckii_NCIMB_8052':0.067314):0.054513):0.057627,(((((('Clostridium_botulinum_A_str_ATCC_3502':0.0,'Clostridium_botulinum_A_str_ATCC_19397':0.0):0.0,'Clostridium_botulinum_A_str_Hall':1.67E-4):0.001673,('Clostridium_botulinum_F_str_Langeland':0.001211,'Clostridium_botulinum_B1_str_Okra':0.003486):0.001309):0.001791,'Clostridium_botulinum_A3_str_Loch_Maree':0.007218):0.098641,'Clostridium_kluyveri_DSM_555':0.121865):0.020585,'Clostridium_tetani_E88':0.115493):0.020236):0.012084):0.136573,'Clostridium_phytofermentans_ISDg':0.312208):0.036695,(('GEBA_Anaerococcus_prevotii':0.28787,'Finegoldia_magna_ATCC_29328':0.211317):0.173769,(('Alkaliphilus_metalliredigens_QYMF':0.12868,'Alkaliphilus_oremlandii_OhILAs':0.107044):0.082634,'Clostridium_difficile_630':0.211875):0.045791):0.027696):0.039355,(((('Thermoanaerobacter_pseudethanolicus_ATCC_33223':0.004904,'Thermoanaerobacter_sp_X514':0.004594):0.034053,'Thermoanaerobacter_tengcongensis_MB4':0.041493):0.149566,'Caldicellulosiruptor_saccharolyticus_DSM_8903':0.23409):0.036154,'Clostridium_thermocellum_ATCC_27405':0.176618):0.036512):0.061826):0.019474):0.069313,(((((((((((((('GEBA_Sanguibacter_keddieii':0.064063,'GEBA_Jonesia_denitrificans':0.099947):0.030002,'GEBA_Xylanimonas_cellulosilytica':0.094672):0.01975,'GEBA_Cellulomonas_flavigena':0.082405):0.035617,'GEBA_Beutenbergia_cavernae':0.115349):0.051352,(('GEBA_Brachybacterium_faecium':0.196353,'GEBA_Kytococcus_sedentarius':0.168724):0.036428,(((('Arthrobacter_aurescens_TC1':0.020568,'Arthrobacter_sp_FB24':0.028887):0.039891,'Renibacterium_salmoninarum_ATCC_33209':0.068094):0.050028,'Kocuria_rhizophila_DC2201':0.130484):0.065937,(('Clavibacter_michiganensis_subsp_michiganensis_NCPPB_382':0.003996,'Clavibacter_michiganensis_s
ubsp_sepedonicus':0.003983):0.084663,'Leifsonia_xyli_subsp_xyli_str_CTCB07':0.089834):0.133992):0.04523):0.013623):0.017241,'Kineococcus_radiotolerans_SRS30216':0.154573):0.033958,((((((('GEBA_Nocardiopsis_dassonvillei':0.093463,'Thermobifida_fusca_YX':0.072507):0.074571,'GEBA_Thermomonospora_curvata':0.099223):0.020272,('GEBA_Thermobispora_bispora':0.072121,'GEBA_Streptosporangium_roseum':0.083245):0.048392):0.038128,'Acidothermus_cellulolyticus_11B':0.153492):0.027393,((('Frankia_sp_CcI3':0.019914,'Frankia_alni_ACN14a':0.017049):0.027318,'Frankia_sp_EAN1pec':0.039855):0.137522,(((((((((('Mycobacterium_gilvum_PYR_GCK':0.025373,'Mycobacterium_vanbaalenii_PYR_1':0.014398):0.020335,(('Mycobacterium_sp_KMS':0.0,'Mycobacterium_sp_MCS':0.0):2.54E-4,'Mycobacterium_sp_JLS':4.38E-4):0.029295):0.012695,'Mycobacterium_smegmatis_str_MC2_155':0.027998):0.015961,(((((((('Mycobacterium_bovis_AF2122_97':0.0,'Mycobacterium_bovis_BCG_str_Pasteur_1173P2':3.48E-4):5.22E-4,'Mycobacterium_tuberculosis_F11':3.48E-4):0.0,'Mycobacterium_tuberculosis_H37Ra':0.0):0.0,'Mycobacterium_tuberculosis_H37Rv':0.0):1.74E-4,'Mycobacterium_tuberculosis_CDC1551':3.48E-4):0.028503,('Mycobacterium_marinum_M':2.83E-4,'Mycobacterium_ulcerans_Agy99':0.005107):0.026919):0.008581,'Mycobacterium_leprae_TN':0.050438):0.008421,('Mycobacterium_avium_104':1.71E-4,'Mycobacterium_avium_subsp_paratuberculosis_K_10':0.001878):0.024032):0.041807):0.021744,'Mycobacterium_abscessus':0.054563):0.05338,(('GEBA_Gordonia_bronchialis':0.080084,'GEBA_Tsukamurella_paurometabola':0.095182):0.023186,('Nocardia_farcinica_IFM_10152':0.066164,'Rhodococcus_sp_RHA1':0.059553):0.015357):0.01873):0.081897,(('GEBA_Saccharomonospora_viridis':0.098036,'Saccharopolyspora_erythraea_NRRL_2338':0.100342):0.021978,'GEBA_Actinosynnema_mirum':0.097173):0.027185):0.031045,'GEBA_Nakamurella_multipartita':0.151197):0.032028,'GEBA_Geodermatophilus_obscurus':0.140115):0.027068,(('Salinispora_arenicola_CNS_205':0.012069,'Salinispora_tropica_CNB_440':0.0
10019):0.097618,'GEBA_Stackebrandtia_nassauensis':0.170224):0.084563):0.033468):0.03169):0.026445,((('Streptomyces_avermitilis_MA_4680':0.027139,'Streptomyces_coelicolor_A3_2_':0.027351):0.021484,'Streptomyces_griseus_subsp_griseus_NBRC_13350':0.037328):0.090414,'GEBA_Catenulispora_acidiphila':0.137426):0.05113):0.0286,(('GEBA_Kribbella_flavida':0.118628,'Nocardioides_sp_JS614':0.133604):0.026332,'Propionibacterium_acnes_KPA171202':0.242956):0.044904):0.019561):0.056195,(((('Corynebacterium_glutamicum_R':4.77E-4,'Corynebacterium_glutamicum_ATCC_13032':0.001073):0.033614,'Corynebacterium_efficiens_YS_314':0.038791):0.051561,'Corynebacterium_diphtheriae_NCTC_13129':0.068116):0.044622,('Corynebacterium_urealyticum_DSM_7109':0.065695,'Corynebacterium_jeikeium_K411':0.054499):0.056416):0.221144):0.045301,(('Bifidobacterium_longum_DJO10A':1.88E-4,'Bifidobacterium_longum_NCC2705':3.39E-4):0.040933,'Bifidobacterium_adolescentis_ATCC_15703':0.037617):0.303161):0.039753,('Tropheryma_whipplei_TW08_27':8.95E-4,'Tropheryma_whipplei_str_Twist':6.79E-4):0.478476):0.120846,'GEBA_Acidimicrobium_ferrooxidans':0.43829):0.090081,((('GEBA_Cryptobacterium_curtum':0.142154,'GEBA_Eggerthella_lenta':0.088587):0.047811,'GEBA_Slackia_heliotrinireducens':0.117231):0.104268,'GEBA_Atopobium_parvulum':0.269623):0.198082):0.036936,('GEBA_Conexibacter_woesei':0.383152,'Rubrobacter_xylanophilus_DSM_9941':0.357989):0.109409):0.087352,((((((((('Synechococcus_sp_WH_7803':0.023543,'Synechococcus_sp_CC9311':0.044194):0.014221,(('Synechococcus_sp_CC9605':0.022222,'Synechococcus_sp_WH_8102':0.025798):0.006423,'Synechococcus_sp_CC9902':0.032326):0.022152):0.016325,((('Prochlorococcus_marinus_subsp_marinus_str_CCMP1375':0.058394,'Prochlorococcus_marinus_str_MIT_9211':0.053438):0.019947,(('Prochlorococcus_marinus_str_NATL2A':0.00276,'Prochlorococcus_marinus_str_NATL1A':0.002109):0.070906,(('Prochlorococcus_marinus_str_MIT_9515':0.013908,'Prochlorococcus_marinus_subsp_pastoris_str_CCMP1986':0.016702):0.023743,
((('Prochlorococcus_marinus_str_AS9601':0.005031,'Prochlorococcus_marinus_str_MIT_9301':0.005416):0.003975,'Prochlorococcus_marinus_str_MIT_9215':0.011185):0.004908,'Prochlorococcus_marinus_str_MIT_9312':0.010415):0.020455):0.117983):0.022565):0.040798,('Prochlorococcus_marinus_str_MIT_9313':0.00429,'Prochlorococcus_marinus_str_MIT_9303':0.003453):0.043557):0.012995):0.046129,'Synechococcus_sp_RCC307':0.067465):0.195828,('Synechococcus_elongatus_PCC_7942':3.21E-4,'Synechococcus_elongatus_PCC_6301':0.00142):0.098101):0.061386,((((('Nostoc_sp_PCC_7120':0.004827,'Anabaena_variabilis_ATCC_29413':0.003501):0.036361,'Nostoc_punctiforme_PCC_73102':0.042386):0.086848,'Trichodesmium_erythraeum_IMS101':0.151445):0.024405,((('Cyanothece_sp_ATCC_51142':0.088208,'Microcystis_aeruginosa_NIES_843':0.108904):0.018975,'Synechocystis_sp_PCC_6803':0.111161):0.027275,'Synechococcus_sp_PCC_7002':0.127272):0.053965):0.023778,('Acaryochloris_marina_MBIC11017':0.127345,'Thermosynechococcus_elongatus_BP_1':0.138774):0.035212):0.017944):0.063385,('Synechococcus_sp_JA_3_3Ab':0.023735,'Synechococcus_sp_JA_2_3B_a_2_13_':0.025648):0.172906):0.056598,'Gloeobacter_violaceus_PCC_7421':0.211642):0.301851,((((('Roseiflexus_castenholzii_DSM_13941':0.02822,'Roseiflexus_sp_RS_1':0.017828):0.118476,'Chloroflexus_aurantiacus_J_10_fl':0.154341):0.066504,'Herpetosiphon_aurantiacus_ATCC_23779':0.219302):0.124086,('GEBA_Sphaerobacter_thermophilus':0.242004,'GEBA_Thermobaculum_terrenum':0.272557):0.055029):0.055648,(('Dehalococcoides_sp_CBDB1':5.54E-4,'Dehalococcoides_sp_BAV1':8.51E-4):0.017643,'Dehalococcoides_ethenogenes_195':0.014048):0.50406):0.083393):0.032459):0.039409):0.035188):0.022061):0.039665):0.054226,((('Thermosipho_melanesiensis_BI429':0.127803,'Fervidobacterium_nodosum_Rt17_B1':0.188887):0.074834,((('Thermotoga_maritima_MSB8':0.007577,'Thermotoga_sp_RQ2':5.3E-4):0.005057,'Thermotoga_petrophila_RKU_1':0.006656):0.159923,'Thermotoga_lettingae_TMO':0.230211):0.037393):0.080613,'Petrotoga_mobilis_S
J95':0.362128):0.246177):0.287154,(('Thermus_thermophilus_HB8':0.001498,'Thermus_thermophilus_HB27':0.001224):0.144321,('GEBA_Meiothermus_ruber':0.114423,'GEBA_Meiothermus_silvanus':0.077947):0.086289):0.12529):0.1212895,('Deinococcus_geothermalis_DSM_11300':0.043888,'Deinococcus_radiodurans_R1':0.07314):0.1212895);
END;
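
For anyone who wants to poke at the treefile without a tree viewer, here is a minimal Python sketch (not part of the original analysis) that pulls the quoted taxon labels out of a Newick string and counts the ones carrying the GEBA_ prefix. It assumes every leaf label is single-quoted, as in the tree above; the function name and the toy tree with its branch lengths are made up for illustration.

```python
import re

def taxon_labels(newick):
    """Return the single-quoted leaf labels found in a Newick string, in order."""
    return re.findall(r"'([^']+)'", newick)

# Toy tree in the same style as the treefile (hypothetical branch lengths).
toy = (
    "(('GEBA_Meiothermus_ruber':0.11,'Thermus_thermophilus_HB8':0.0015):0.14,"
    "'Deinococcus_radiodurans_R1':0.07);"
)

labels = taxon_labels(toy)
geba = [name for name in labels if name.startswith("GEBA_")]
print(len(labels), "taxa,", len(geba), "with the GEBA_ prefix")
```

If the tree has been wrapped across several lines, as it is on this page, join the lines back together before parsing so that labels split by a line break are recovered whole. For anything beyond listing names, a proper phylogenetics library is likely the better tool.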

New openaccess paper from my lab on "Zorro" software for automated masking of sequence alignments

A new Open Access paper from my lab was just published in PLoS One: Accounting For Alignment Uncertainty in Phylogenomics. Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288

The paper describes the software "Zorro" which is used for automated "masking" of sequence alignments. Basically, if you have a multiple sequence alignment you would like to use to infer a phylogenetic tree, in some cases it is desirable to block out regions of the alignment that are not reliable. This blocking is called "masking."

Masking is thought by many to be important because sequence alignments are in essence a hypothesis about the common ancestry of specific residues in different genes/proteins/regions of the genome. This "positional homology" is not always easy to assign and for regions where positional homology is ambiguous it may be better to ignore such regions when inferring phylogenetic trees from alignments.

Historically, masking has been done by hand/eye looking for columns in a multiple sequence alignment that seem to have issues and then either eliminating those columns or giving them a lower weight and using a weighting scheme in the phylogenetic analysis.

Zorro removes much of the subjectivity from this process by generating automated masking patterns for sequence alignments. It does this by assigning a confidence score to each column in a multiple sequence alignment. These scores can then be used to account for alignment accuracy in phylogenetic inference pipelines.
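
To make the idea of masking by column confidence concrete, here is a minimal Python sketch. It is not Zorro's code and does not read Zorro's output format; the scores, the threshold of 5.0, the taxon names and the function name are all assumptions for illustration. Given one score per alignment column, it simply drops the columns that fall below the threshold.

```python
def mask_alignment(alignment, scores, threshold):
    """Keep only alignment columns whose confidence score is >= threshold."""
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(scores)}:
        raise ValueError("need exactly one score per alignment column")
    # Indices of the columns considered reliable enough to keep.
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

# Hypothetical three-sequence alignment with made-up per-column scores.
alignment = {
    "taxonA": "MK-LVA",
    "taxonB": "MKTLIA",
    "taxonC": "MR--VA",
}
scores = [9.8, 9.1, 2.0, 1.5, 8.7, 9.9]

masked = mask_alignment(alignment, scores, threshold=5.0)
for name, seq in masked.items():
    print(name, seq)
```

A hard cutoff like this is the simplest option. The scores could instead be passed along as column weights, which is closer to the weighting schemes mentioned above.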

The software is available at Sourceforge: ZORRO – probabilistic masking for phylogenetics. It was written primarily by Martin Wu (who is now a Professor at the University of Virginia) and Sourav Chatterji with a little help here and there from Aaron Darling I think. The development of Zorro was part of my "iSEEM" project that was supported by the Gordon and Betty Moore Foundation.

In the interest of sharing, since the paper is fully open access, ~~I am posting it here below the fold~~. UPDATE 2/9 - decided to remove this since it got in the way of getting to the comments ...

Sunday, January 29, 2012

One old, one new - a few phylogeny papers worth checking out

Just a quick one here. A few days ago in my lab we were discussing some challenges with doing phylogenetic diversity (PD) measurements in very very large phylogenetic trees. PD is a measure of total branch length in a phylogenetic tree for a group of taxa ... and we use it for many purposes.
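
As a small illustration of what a PD calculation involves, here is a Python sketch on a toy tree. It assumes the rooted flavor of PD (every branch on a path from the root to a chosen taxon is counted once); the tree, node names and branch lengths are invented, and this is not the code we actually use.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum the lengths of all branches on root-to-tip paths for the given taxa.

    parent maps each non-root node to (its parent, length of the branch above it).
    """
    seen = set()
    total = 0.0
    for tip in taxa:
        node = tip
        # Walk toward the root, stopping once we hit a branch already counted.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Toy tree: ((A:1.0,B:2.0)n1:0.5,C:3.0)root;
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}

print(phylogenetic_diversity(tree, ["A", "B"]))       # 3.5
print(phylogenetic_diversity(tree, ["A", "C"]))       # 4.5
print(phylogenetic_diversity(tree, ["A", "B", "C"]))  # 6.5
```

Note that the shared branch above n1 is counted only once when both A and B are in the set, which is what makes PD different from just adding up tip-to-root distances.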

For many of our applications we have been using an algorithm described by Mike Steel in "Phylogenetic diversity and the Greedy Algorithm". But alas, it is not keeping up with the massive tree sets we are dealing with. Fortunately Aaron Darling in my lab found an alternative paper with a perfect sounding title for us: Phylogenetic Diversity within Seconds from Minh, Klaere, and von Haeseler. This seems like it will do the trick. I note - Kudos to Systematic Biology for making some older papers freely available. Not sure of their general policies on this but good to see.
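
The greedy idea itself is simple to sketch: to pick k taxa with high PD, repeatedly add whichever taxon contributes the most new branch length. The Python below is a naive illustration of that idea only. It is not the implementation from either paper, it assumes rooted PD, and the toy tree and function names are hypothetical.

```python
def path_gain(parent, covered, tip):
    """Branch length from tip toward the root that is not yet in `covered`."""
    gain, node, new_nodes = 0.0, tip, []
    while node in parent and node not in covered:
        new_nodes.append(node)
        up, length = parent[node]
        gain += length
        node = up
    return gain, new_nodes

def greedy_pd(parent, tips, k):
    """Pick up to k tips, each time adding the one with the largest PD gain."""
    covered, chosen, total = set(), [], 0.0
    candidates = sorted(tips)
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda t: path_gain(parent, covered, t)[0])
        gain, new_nodes = path_gain(parent, covered, best)
        covered.update(new_nodes)
        chosen.append(best)
        total += gain
        candidates.remove(best)
    return chosen, total

# Toy tree: ((A:1.0,B:2.0)n1:0.5,C:3.0)root;
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}

print(greedy_pd(tree, ["A", "B", "C"], 2))
```

This version rescans every remaining candidate at every step, so the work grows quickly with the number of taxa, and it would not be a sensible choice for very large trees.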

Anyway - back to the grind ...

Sunday, January 8, 2012

Announcement: Workshop on Multiple Sequence Alignment and Phylogeny Estimation

Posting this for Tandy Warnow

Workshop on Advances in Multiple Sequence Alignment and Phylogeny Estimation

May 20 and 21, 2012, Smithsonian Institution, Washington, DC

The workshop is funded by the National Science Foundation through grant DEB 0733029 to the University of Texas. Registration is required, and attendance is limited to 40 participants. The workshop will include presentations of new methods for multiple sequence alignment and phylogeny estimation, also training in the use of these methods, and personal assistance in analyzing datasets using the SATé software (see this page). Applications for the workshop (and for travel support) are due by February 15, 2012, and will be responded to by March 1. We expect to be able to provide support to all attendees. Please click here for the application form. For more information, please send an email to Tandy Warnow (see below).

Letter from Tandy explaining workshop:

Dear Colleagues,

We are writing to let you know about a workshop and symposium that we will hold on May 20-22, 2012, at the Smithsonian Institution in Washington, DC. The workshop will provide training in advanced methods for multiple sequence alignment and phylogeny estimation, and will take place on May 20 and 21; the symposium will follow immediately and will feature research presentations on the same topic. This workshop is funded by:

NSF DEB 0733029
Large-scale simultaneous multiple alignment and phylogeny estimation
Project Webpage: http://www.cs.utexas.edu/users/tandy/ATOL-MSA.html

The workshop will include presentations of new methods for maximum likelihood phylogeny estimation of large sequence alignments (including GARLI and FastTree), for comparing different alignments of the same dataset, for phylogenetic analyses of datasets that include partial sequences (e.g., short reads generated in a metagenomic analysis), for supertree estimation, and for simulating sequence evolution. However, a main focus is to train participants in both basic and advanced use of the SATé software (Liu et al. 2009, Science, Vol. 324, no. 5934, pp. 1561-1564) for simultaneous estimation of alignments and trees (SATé software available for download at http://phylo.bio.ku.edu/software/sate/sate.html ).

Workshop participants are expected to bring laptops with them to the workshop, so that they can perform alignment and phylogenetic tree estimations. We will provide test datasets for you to learn how to use SATé, but strongly encourage you to bring your own datasets to analyze.

Attendance at the workshop is limited to 40 participants, and registration is required. If you are interested in attending the workshop, whether or not you are requesting travel support, please fill out the Word document available at http://www.cs.utexas.edu/users/tandy/workshop-application.doc, and return it to Laurie Alvarez (lauriea@austin.utexas.edu) by February 15, 2012. We will respond to requests for registration by March 1, 2012.

For more information on the workshop, please contact me (Tandy Warnow), at tandy@cs.utexas.edu. For more information on the Symposium, please contact Mike Braun (braunm@si.edu). We look forward to seeing you at the Smithsonian workshop and symposium!

Regards,

Tandy Warnow and Mike Braun

On behalf of the AToL project team:

Michael Braun, The Smithsonian Institution
Mark Holder, The University of Kansas
Jim Leebens-Mack, The University of Georgia
Randy Linder, The University of Texas
Etsuko Moriyama, The University of Nebraska
Tandy Warnow, The University of Texas

Tuesday, November 8, 2011

I am phylogeny obsessed but this is too much to me: phylogeny of cancer subtypes

Just because you have data that could be plugged into a phylogenetic analysis does not mean it makes sense to do so. Case in point - the following paper:

A Differentiation-Based Phylogeny of Cancer Subtypes by Riester M, Stephan-Otto Attolini C, Downey RJ, Singer S, Michor F.

In this paper the authors take gene expression data from various cancer samples/cell lines and then they build phylogenetic trees from the data. See example below:

Figure 2. A phylogeny of acute myeloid leukemia (AML) subtypes. According to the French-American-British (FAB) classification, AML samples are classified into seven different types according to their level of differentiation (see Table 1). Expression data from 362 AML patients and 7 Myelodysplastic Syndrome (MDS-AML) patients is used to construct a phylogeny of these leukemias. We include expression data of human embryonic stem cells (hESCs), CD34+ cells from bone marrow (CD34 BM) and peripheral blood (CD34 PB), and mononuclear cells from bone marrow (BM) and peripheral blood (PB). The differentiation pathway from hESCs to mononuclear cells from peripheral blood is represented in purple, and the common ancestors of subtypes are shown as pink dots. The bootstrap values of branches are indicated by boxed numbers, representing the percentage of bootstrapping trees containing this branch. The ranking of AML subtypes identified by the phylogenetic algorithm corresponds with the differentiation status indicated by the FAB classification. The M6 subtype, represented by only 10 samples in our dataset, has the least stable branch, leading to lower bootstrap values for those branches where it can alternatively be located.

The pictures are pretty. They make some sense biologically. The paper has some very interesting parts and I do not want to suggest that the paper is not useful. But it makes no sense to me to use a phylogenetic approach to analyze this data. Phylogenetic methods are about reconstructing the history of evolutionary lineages. They should not be doing that here as far as I can tell, since the cancers are from different people with different histories, and what they may be looking at is convergent / developmental similarities in the cancer samples. But they are not looking at history per se. And thus it is not appropriate to use algorithms that use phylogenetic methods.

It just makes no sense to me to use a phylogenetic method instead of some sort of clustering method in the step where it says "construct tree" in their flow diagram. Sure, phylogenetic methods can make nice pictures. But they should only be used when the underlying data has a history that is reflected in the model/assumptions of the phylogenetic method. I could, for example, build a phylogeny of cities based on various metrics. But would that make sense? Most likely not. Don't let the fact that similar things group together in the same part of a phylogenetic tree fool you into thinking that a phylogenetic model is right for your data.

I may be obsessed with phylogeny but that obsession applies to applying phylogenetic methods to data with histories that are approximated by the methods being used ... and this paper seems to not be doing that ...

Hat tip to Eric Lowe, an undergrad in my lab, for showing me this paper.

I note - this does not mean that phylogenetic methods cannot be applied to cancer studies. Case in point - this paper:

Estimation of rearrangement phylogeny for cancer genomes by Greenman CD, Pleasance ED, Newman S, Yang F, Fu B, Nik-Zainal S, Jones D, Lau KW, Carter N, Edwards PA, Futreal PA, Stratton MR, Campbell PJ.

In this paper the authors focus on mutations in cancer cells and they use phylogenetic methods to determine the order in which genomic changes happen in these cancer cells. This seems to be an excellent use of phylogenetic / phylogenomic methods.

So - lesson of the day - phylogenetic methods should be used on data with a phylogenetic history. Not so complicated. But pretty important.

Friday, October 7, 2011

The story behind Pseudomonas syringae comparative genomics / pathogenicity paper; guest post by David Baltrus (@surt_lab)

More fun from the community. Today I am very happy to have another guest post in my "Story behind the paper" series. This one comes to us from David Baltrus, an Assistant Professor at University of Arizona. For more on David see his lab page here and his twitter feed here. David has a very nice post here about a paper on the "Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates" which was published in PLoS Pathogens in July. There is some fun/interesting stuff in the paper, including analysis of the "core" and "pan" genome of this species. Anyway - David saw my request for posts and I am very happy that he responded. Without further ado - here is his story (I note - I added a few links and Italics but otherwise he wrote the whole thing ...).

---------------------------------------

I first want to thank Jonathan for giving me this opportunity. I am a big fan of “behind the science” stories, a habit I fed in grad school by reading every Perspectives (from the journal Genetics) article that I could get a hold of. Science can be rough, but I remember finding solace in stories about the false starts and triumphs of other researchers and how randomness and luck manage to figure into any discovery. If anything I hope to use this space to document this as it is fresh in my mind so that (inevitably) when the bad science days roll around I can have something to look back on. At the very least, I'm looking forward to mining this space in the future for quotes to prove just how little I truly understood about my research topics in 2011. It took a village to get this paper published, so apologies in advance to those that I fail to mention. I also want to mention upfront that Marc Nishimura is my co-author and had a hand in every single aspect of this paper.

Joining the Dangl Lab

This project really started way back in 2006, when I interviewed for a postdoc with Jeff Dangl at UNC Chapel Hill. In grad school I had focused on understanding microbial evolution and genetics but I figured that the best use of my postdoc would be to learn and understand genomics and bioinformatics. I was just about to finish up my PhD and was lucky enough to have some choices when it came around to choosing what to do next. I actually had no clue about Dangl’s research until stumbling across one of his papers in Genetics, which gave me the impression that he was interested in bringing an evolutionary approach to studies of the plant pathogen Pseudomonas syringae. I was interested in plant pathogens because, while I wanted to study host/pathogen evolution, my grad school projects on Helicobacter pylori showed me just how much fun it is dealing with the bureaucracy of handling human pathogens. There is extensive overlap in the mechanisms of pathogenesis between plant and human pathogens, but no one really cares how many Arabidopsis plants you infect or if you dispose of them humanely (so long as the transgenes remain out of nature!). By the time I interviewed with Jeff I was leaning towards joining a different lab, but the visit to Chapel Hill went very well and by the end I was primed for Dangl’s sales pitch. This went something along the lines of “look, you can go join another lab and do excellent work that would be the same kinds of things that you did in grad school...or you can come here and be challenged by jumping into the unknown”. How can you turn that down? Jeff sold me on continuing a project started by Jeff Chang (now a PI at Oregon State), on categorizing the diversity of virulence proteins (type III effector proteins to be exact) that were translocated into hosts by the plant pathogen Pseudomonas syringae. 
Type III effectors are one of the main determinants of virulence in numerous Gram-negative plant and animal pathogens and are translocated into host cells to ultimately disrupt immune functions (I'm simplifying a lot here). Chang had already created genomic libraries and had screened through random genomic fragments of numerous P. syringae genomes to identify all of the type III effectors within 8 or so phylogenetically diverse strains. The hope was that they would find a bunch of new effectors by screening strains from different hosts. Although this method worked well for IDing potential effectors, I was under the impression that it was going to be difficult to place and verify these effectors without more genomic information. I was therefore brought in to figure out a way to sequence numerous P. syringae genomes without burning through a Scrooge McDuckian money bin worth of grant money. We had a thought that some type of grand pattern would emerge after pooling all this data but really we were taking a shot in the dark.

Tomato leaves after 10 days of infection by the tomato pathogen P. syringae DC3000 (left) as well as a less virulent strain (right). Disease symptoms are dependent on a type III secretion system.

Moments of Randomness that Shape Science

When I actually started the postdoc, next generation sequencing technologies were just beginning to take off. It was becoming routine to use 454 sequencing to generate bacterial genome sequences, although Sanger sequencing was still necessary to close these genomes. Dangl had it in his mind that there had to be a way to capitalize on the developing Solexa (later Illumina) technology in order to sequence P. syringae genomes. There were a couple of strokes of luck here that conspired to make this project completely worthwhile. I arrived at UNC about a year before the UNC Genome Analysis core facility came online. Sequencing runs during the early years of this core facility were subsidized by UNC, so we were able to sequence many Illumina libraries very cheaply. This gave us the opportunity to play around with sequencing options at low cost, so we could explore parameter space and find the best sequencing strategy. This also meant that I was able to learn the ins and outs of making libraries at the same time as those working in the core facility (Piotr Mieczkowski was a tremendous resource). Secondly, I started this postdoc without knowing a lick of UNIX or perl and knew that I was going to have to learn these if I had any hope of assembling and analyzing genomes. I was very lucky to have Corbin Jones and his lab 3 floors above me in the same building to help work through my kindergarten level programming skills. Corbin was really instrumental to all of these projects as well as in keeping me sane and I doubt that these projects would have turned out anywhere near as well without him. Lastly, plant pathogens in general, and P. syringae in particular, were poised to greatly benefit from next generation sequencing in 2006.
While there was ample funding to completely sequence (close) genomes for numerous human pathogens, lower funding opportunities for plant pathogens meant that we were forced to be more creative if we were going to pull off sequencing a variety of P. syringae strains. This pushed us into trying a NGS approach in the first place. I suspect that it’s no coincidence that, independently of our group, the NGS assembler Velvet was first utilized for assembling P. syringae isolates.

The Frustrations of Library Making

Through a collaboration with Elaine Mardis’s group at Washington University St. Louis, we got some initial data back that suggested it would be difficult to make sense of bacterial genomes at that time using only Illumina (the paired end kits weren’t released until later). There simply wasn’t good enough coverage of the genome to create quality assemblies with the assemblers available at this time (SSAKE and VCAKE, our own (really Will Jeck’s) take on SSAKE). Therefore we decided to try a hybrid approach, combining low coverage 454 runs (initially separate GS Flex runs with regular reads and paired ends, and later one run with long paired ends) with Illumina reads to fill in the gaps and leveraging this data to correct for any biases inherent in the different sequencing technologies. Since there was no core facility at UNC when I started making libraries, I had to travel around in order to find the necessary equipment. The closest place that I could find a machine to precisely shear DNA was Fred Dietrich’s lab at Duke. More than a handful of mornings were spent riding a TTA bus from UNC to Duke, with a cooler full of genomic DNA on dry ice (most times having to explain to the bus drivers how I wasn’t hauling anything dangerous), spending a couple of hours on Fred’s hydroshear, then returning to UNC hoping that everything worked well. There really is no feeling like spending a half a day travelling/shearing only to find out that the genomic DNA ended up the wrong size. We were actually planning to sequence one more strain of P. syringae, and already had Illumina data, but left this one out because we filled two plates of 454 sequencing and didn’t have room for a ninth strain. In the end there were two very closely related strains (P.syringae aptata or P. syringae atrofaciens) left to make libraries for and the aptata genome sheared better on the last trip than atrofaciens. 
If you’ve ever wondered why researchers pick certain strains to analyze, know that sometimes it just comes down to which strain worked first. Sometimes there were problems even when the DNA was processed correctly. I initially had trouble making the 454 libraries correctly in that, although I would follow the protocol exactly, I would lose the DNA somewhere before the final step. I was able to trace down the problem to using an old (I have no clue when the Dangl lab bought it, but it looked as useable as salmon sperm ever does) bottle of salmon sperm DNA during library prep. There were also a couple of times that I successfully constructed Illumina libraries only to have the sequencing runs dominated by few actual sequences. These problems ultimately stemmed from trying to use homebrew kits (I think) for constructing Illumina libraries. Once these problems were resolved, Josie Reinhardt managed to pull everything together and create a pipeline for hybrid genome assembly and we published our first hybrid genome assembly in Genome Research. At that moment it was a thrill that we could actually assemble a genome for such a low cost. It definitely wasn’t a completely sequenced genome, but it was enough to make calls about the presence or absence of genes.

Waiting for the story to Emerge

There are multiple ways to perform research. We are all taught about how important it is to define testable hypotheses and to set up appropriate experiments to falsify these educated guesses. Lately, thanks to the age of genomics, it has become easier and more feasible to accumulate as much genomic data as possible and find stories within that data. We took this approach with the Pseudomonas syringae genome sequences because we knew that there was going to be a wealth of information, and it was just a matter of what to focus on. Starting my postdoc I was optimistic that our sampling scheme would allow us to test questions about how host range evolves within plant pathogens (and conversely, identify the genes that control host range) because the strains we were going to sequence were all isolated from a variety of diseased hosts. My naive viewpoint was that we were going to be able to categorize virulence genes across all these strains, compare suites of virulence genes from strains that were pathogens of different hosts, and voila...we would understand host range evolution. The more I started reading about plant pathology the more I became convinced that this approach was limited. The biggest problem is that, unlike some pathogens, P. syringae can persist in a variety of environments, with strains able to survive or flourish on a variety of hosts. Sure we had strains that were known pathogens of certain host plants, but you can’t just assume that these are the only relevant hosts. Subjective definitions are not your friend when wading into the waters of genomic comparisons.

We were quite surprised that, although type III effectors are gained and lost rapidly across P. syringae and our sequenced strains were isolated from diverse hosts, we only managed to identify a handful of new effector families. I should also mention here that Artur Romanchuk came on board and did an extensive amount of work analyzing gene repertoires across strains. A couple of nice stories did ultimately emerge by comparing gene sequences across strains and matching these up with virulence in planta (we are able to show how mutation and recombination altered two different virulence genes across strains), but my two favorite stories from this paper came about from my habit of persistently staring at genome sequences and annotations. As I said above, a major goal of this paper was to categorize the suites of a particular type of virulence gene (type III effectors) across P. syringae. I was staring at gene repertoires across strains when I noticed that two of the strains had very few of these effectors (10 or so) compared to most of the other strains (20-30). When I plotted total numbers of effectors across strains, a phylogenetic pattern arose where genomes from a subset of closely related P. syringae strains possessed lower numbers of effectors. I then got the idea to survey for other classes of virulence genes, and sure enough, strains with the lowest numbers of effectors all shared pathways for the production of well characterized toxins (nonribosomal peptide synthetase (NRPS) toxins are secreted out of P. syringae cells and are virulence factors, but are not translocated through the type III secretion system). One exception did arise across this handful of strains (a pea pathogen isolate from pathovar pisi) in that this strain has lost each of these conserved toxin pathways and also contains the highest number of effectors within this phylogenetic group.
The relationship between effector number and toxin presence remains a correlation at the present time, but I’m excited to be able to try and figure out what this means in my own lab.

Modified Figure 3 from the paper. Strain names are listed on the left and are color coded for phylogenetic similarity. Blue boxes indicate that the virulence gene/toxin pathway is present, green indicates that the pathway is likely present but the sequence was truncated or incomplete, and a white box indicates absence. I have circled the group II strains, which have the lowest numbers of type III effectors while also having two conserved toxin pathways (syringomycin and syringolin). Note that the Pisi strain (Ppi R6) lacks these toxin pathways.

The other story was a complete stroke of luck. P. syringae genomes are typically 6Mb (6 million base pairs) in size, but one strain that we sequenced (a cucumber pathogen) contained an extra 1Mb of sequence. Moreover, the two largest assembled contigs from this strain were full of genes that weren’t present in any other P. syringae strain. After some similarity comparisons, I learned that there was a small bit of overlap between each of these contigs and performed PCR to confirm this. Then, on a hunch, I designed primers facing out of each end of the contig and was able to confirm that this extra 1Mb of sequence was circular in conformation and likely separate from the chromosome. I got a bit lucky here because there was a small bit (500bp or so) of sequence that was not assembled with either of these two contigs that closed the circle (a lot more and I wouldn’t have gotten the PCR to work at all). We quickly obtained 3 other closely related strains and were able to show that only a subset of strains contain this extra 1Mb and that it doesn’t appear to be directly involved in virulence on cucumber. So it turns out that a small number (2 so far) of P. syringae strains have acquired an extra 1Mb of DNA, and we don’t quite know what any of these ~700 extra genes do. There are no obvious pathways present aside from additional chromosomal maintenance genes, extra tRNAs in the same ratio as the chromosomal copies, and a couple of secretion systems. So somehow we managed to randomly pick the right strain to capture a very recent event that increased the genome size of this one strain by 15% or so. We’ve made some headway on this megaplasmid story since I started my lab, but I’ll save that for future blog posts.

Modified Figure S12 from the paper. Strains that contain the 1Mb megaplasmid (Pla7512 and Pla107) are slightly less virulent during growth in cucumber than strains lacking the megaplasmid (PlaYM8003, PlaYM7902). This growth defect is also measurable in vitro. In case you are wondering, I used blue and yellow because those were the colors of my undergrad university, the University of Delaware.

Reviewer Critiques

We finally managed to get this manuscript written up by the summer of 2010 and submitted it to PLoS Biology. I figured that (as always) it would take a bit of work to address reviewers’ critiques, but we would nonetheless be able to publish without great difficulty. I was at a conference on P. syringae at Oxford in August of 2010 when I got the reviews back and learned that our paper had gotten rejected. Everyone has stories about reviewer comments and so I’d like to share one of my own favorites thus far. I don’t think it ever gets easier to read reviews when your paper has been rejected, but I was knocked back by the main critique of one reviewer:

“I realize that the investigators might not typically work in the field of bacterial genomics, but when looking at divergent strains (as opposed to resequencing to uncover SNPs among strains) it is really necessary to have complete, not draft, genomes. I realize that this might sound like a lot to ask, but if they look at comparisons of, for example, bacterial core and pan-genomes, such as the other paper on this that they cite (and numerous other examples exist), they are based on complete genome sequences. If this group does not wish to come up to the standards applied to even the most conventional bacterial genomics paper, it is their prerogative; however, they should be aware of the expectations of researchers in this field.”

So this reviewer was basically asking us to spend an extra 50k to finish the genomes for these strains before they were scientifically useful. Although I do understand the point, this paper was never about getting things perfect but about demonstrating what is possible with draft genomes. I took the part about working in the field of bacterial genomics a bit personally, I have to admit (c'mon, that's harsh), but I got over that feeling by downing a few pints in Oxford with other researchers who (judging by their research and interest in NGS) also failed to grasp the importance of spending time and money to close P. syringae genomes. We managed to rewrite this paper to address most of the other reviewers’ critiques and finally were able to submit to PLoS Pathogens.

Baltrus DA, Nishimura TM, Reinhardt JA, Romanchuk A, Chang JH, Mukhtar MS, Cherkis K, Roach J, Grant SR, Jones CD, Dangl JL “Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates” PLoS Pathogens 7(7):e1002132

Baltrus Lab Website

Dangl Lab Website

Jones Lab Website

Tuesday, September 27, 2011

Blast from the past: video of a talk I gave in 2006 #metagenomics

Just re-found this video and posted it to YouTube. It is from a talk I gave at the first "International Metagenomics Meeting" in 2006.

I think one may still be able to view videos from the CalIT2/UCSD page here. But I thought it might be better to have this talk on YouTube than at the CalIT site so I posted it ... hope they don't sue me.

Note - I wrote a blog post about the meeting here:
The Tree of Life: Metagenomics 2006

Thursday, September 15, 2011

Great paper showing the potential power of comparative and evolutionary genomics in #PLoS Genetics

There is a wonderful paper that has just appeared in PLoS Genetics I want to call people's attention to: PLoS Genetics: Emergence and Modular Evolution of a Novel Motility Machinery in Bacteria

In the paper, researchers from CNRS and Aix-Marseille in France used some nice comparative and evolutionary genomics analyses along with experimental work to characterize the function and evolution of gliding motility in bacteria.

Their summary of their work:

Motility over solid surfaces (gliding) is an important bacterial mechanism that allows complex social behaviours and pathogenesis. Conflicting models have been suggested to explain this locomotion in the deltaproteobacterium Myxococcus xanthus: propulsion by polymer secretion at the rear of the cells as opposed to energized nano-machines distributed along the cell body. However, in absence of characterized molecular machinery, the exact mechanism of gliding could not be resolved despite several decades of research. In this study, using a combination of experimental and computational approaches, we showed for the first time that the motility machinery is composed of large macromolecular assemblies periodically distributed along the cell envelope. Furthermore, the data suggest that the motility machinery derived from an ancient gene cluster also found in several non-gliding bacterial lineages. Intriguingly, we find that most of the components of the gliding machinery are closely related to a sporulation system, suggesting unsuspected links between these two apparently distinct biological processes. Our findings now pave the way for the first molecular studies of a long mysterious motility mechanism.

Basically, they started with some genetic and functional studies in Myxococcus xanthus. They analyzed these in the context of the genome sequence (note - I was a co-author on the original genome paper). And then they did some extensive comparative and evolutionary analysis of these genes, producing some wonderful figures along the way such as:

Figure 2. Taxonomic distribution of the closest homologues of the 14 genes composing the G1, G2, and M1 clusters, and genetic organization of the core complex. (A) For a given gene, the number of homologues in the corresponding genome is indicated by the numbers within arrows. The relationships between the species carrying the different homologues of the genes are indicated by the phylogeny on the left. Based on their taxonomic distribution, the 14 genes can be divided into Group A (grey background) and Group B (white background). (B) In all non Deltaproteobacteria and in Geobacter, the Group B genes clustered in a single genomic region. doi:10.1371/journal.pgen.1002268.g002

Based on their analysis they then came up with some hypotheses as to which genes were involved in key parts of gliding motility and what their biochemical functions were and they then went and confirmed this with experiments. I am not going to go into detail on the functional work they did but you can read their paper for more details.

They wrapped up their paper by proposing a model for the evolutionary history of gliding motility. I am not sure I buy all components of their model since our sampling of genomes right now is still very poor, but they have a pretty detailed theory captured in part in this figure:

Figure 8. Evolution and structure of the Myxococcus gliding motility machinery. A) Evolutionary scenario describing the emergence and evolution of the gliding motility machinery in M. xanthus. The relationships between organisms carrying close homologues of the 14 genes encoding putative components of the gliding machinery in M. xanthus are represented by the phylogeny. Green and red arrows respectively indicate gene acquisition and gene loss. The number of gene copies that were acquired or lost is indicated within arrows. The purple dotted arrows represent horizontal gene transfer events of one or several components. WGD marks the putative whole genome duplication event that occurred in the ancestor of Myxococcales. For each gene, locus_tag, former (agm/agl/agn) and new (glt and agl) names are provided. The number of complete genomes that contain homologues of glt and agl genes compared to the total number of complete genomes available at the beginning of this study are indicated in brackets. (B) The Myxococcus gliding machinery. The diagram compiles data from this work and published literature. Components were added based on bioinformatic predictions, mutagenesis, interaction and localization studies. Exhaustive information is not available for all proteins and thus the diagram largely is subject to modifications once more data will be available. Known interactions within the complex from experimental evidence are AglR-GltG, AglZ-MglA and interactions within the AglRQS molecular motor [13], [15]. For clarity, the proteins were colour-coded as in the rest of the manuscript

Anyway - I don't have much time right now to provide more detail on the paper. But it is definitely worth checking out.

Tuesday, September 6, 2011

More on 'phylogenomics' - as in functional prediction w/ phylogeny

There is a new paper out: Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium in Briefings in Bioinformatics.

The paper is interesting and presents a new general approach to using phylogeny for functional prediction of uncharacterized genes. I am interested in this for many reasons, including that I was one of the first, if not the first, to lay this out as a concept. In a series of papers from 1995-1998 I outlined how phylogenetic analysis could be used to aid in functional prediction for all the genes that were starting to be sequenced in genome projects without any associated functional studies (at the time, I referred to all these ESTs and other sequences as an "onslaught" - little did I know what was to come).

My first paper on this topic was in 1995: Evolution of the SNF2 family of proteins: subfamilies with distinct sequences and functions. The abstract is below:

The SNF2 family of proteins includes representatives from a variety of species with roles in cellular processes such as transcriptional regulation (e.g. MOT1, SNF2 and BRM), maintenance of chromosome stability during mitosis (e.g. lodestar) and various aspects of processing of DNA damage, including nucleotide excision repair (e.g. RAD16 and ERCC6), recombinational pathways (e.g. RAD54) and post-replication daughter strand gap repair (e.g. RAD5). This family also includes many proteins with no known function. To better characterize this family of proteins we have used molecular phylogenetic techniques to infer evolutionary relationships among the family members. We have divided the SNF2 family into multiple subfamilies, each of which represents what we propose to be a functionally and evolutionarily distinct group. We have then used the subfamily structure to predict the functions of some of the uncharacterized proteins in the SNF2 family. We discuss possible implications of this evolutionary analysis on the general properties and evolution of the SNF2 family.

I note - I am annoyed that when I went to the Nucleic Acids Research site for my paper I discovered that, for some bizarre reason, they are now trying to charge for access to it even though it is in Pubmed Central and used to be freely available on the NAR site. WTF? Is this just an IT issue like the #OpenGate complaints I made for a while about Nature Genome papers?

Anyway - in that paper in 1995 I basically showed that, at least for this family, phylogenetic analysis could be used as a tool in making functional predictions by allowing one to better identify orthology relationships and subfamilies within the SNF2 superfamily. I think this was at least a little bit novel, but others at the time were also looking into using various analyses to identify orthology relationships across genomes.

Shortly thereafter I started working on the concept that one could use the phylogenetic tree more explicitly in making functional predictions, and eventually I laid out the concept of treating function as a character state and doing character state reconstruction using a gene tree to then infer functions for uncharacterized genes. I called this approach "phylogenomics" in a paper in 1997 in Nature Medicine (the editor asked us to give it a name ... and thus my own contribution to the omics word game began). Alas, somehow the title of our paper became "Gastrogenomic delights: a movable feast" since we were writing about the E. coli and H. pylori genomes, so I added yet another omics term at the same time. In the paper I showed how phylogenetic analysis of the MutS family of proteins could help in interpreting one of the findings in the H. pylori genome paper:

In this paper we showed why blast searches were not ideal for inferring relationships among sequences (because blast measures similarity NOT evolutionary history per se). A bit annoyed still that other papers then sort of claimed they were the first to show blast was not ideal for inferring evolutionary relatedness, but whatever. This still did not fully describe the phylogeny driven approach that I was working on so I then wrote up an outline of this approach for a paper in Genome Research: Phylogenomics: Improving Functional Prediction for Uncharacterized Genes by Evolutionary Analysis. This paper really laid out the idea in more detail:

It also gave detailed examples of how similarity searches could be misleading and how phylogenetic analysis should in principle be better.
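
To make the "function as a character state" idea concrete, here is a minimal sketch in Python. It is not the procedure from any of the papers above, just a simplified Fitch-style parsimony pass: a bottom-up pass collects candidate functions at each internal node (the intersection of the children's sets, or the union if they disagree), and a top-down pass hands each uncharacterized gene the states of its nearest informative ancestor. The gene names, the tree, and the function labels are all hypothetical.

```python
# Simplified Fitch-style reconstruction of "function" on a gene tree.
# Everything here (tree shape, gene names, function labels) is made up
# for illustration; it is a sketch, not a published method.

def up_pass(node, known, sets):
    """Bottom-up pass: record the candidate function set for every node.

    Leaves are strings, internal nodes are tuples of children.
    None means "no information" (an uncharacterized gene).
    """
    if isinstance(node, str):
        states = {known[node]} if node in known else None
    else:
        child_sets = [up_pass(child, known, sets) for child in node]
        informative = [s for s in child_sets if s is not None]
        if not informative:
            states = None
        else:
            common = set.intersection(*informative)
            # Children agree: keep what they share. Otherwise keep everything.
            states = common if common else set.union(*informative)
    sets[node] = states
    return states


def down_pass(node, parent_states, known, sets, predictions):
    """Top-down pass: resolve each node against its parent and
    predict functions for uncharacterized leaves."""
    own = sets[node]
    if own is None:
        final = parent_states
    elif parent_states is not None and own & parent_states:
        final = own & parent_states
    else:
        final = own
    if isinstance(node, str):
        if node not in known:
            predictions[node] = final
    else:
        for child in node:
            down_pass(child, final, known, sets, predictions)


def predict_functions(tree, known):
    """Return {uncharacterized gene: set of predicted functions}."""
    sets, predictions = {}, {}
    up_pass(tree, known, sets)
    down_pass(tree, None, known, sets, predictions)
    return predictions


# Two hypothetical subfamilies, each with one uncharacterized member.
tree = ((("geneA", "geneB"), "geneX"), (("geneC", "geneD"), "geneY"))
known = {
    "geneA": "repair",
    "geneB": "repair",
    "geneC": "recombination",
    "geneD": "recombination",
}

for gene, functions in sorted(predict_functions(tree, known).items()):
    print(gene, "->", sorted(functions))
```

The point of the toy example is that geneX and geneY get different predictions because of where they sit in the tree, not because of how similar they are to any one characterized sequence. A real analysis would also need branch lengths, duplication events, and some way to express uncertainty.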

I note - I am very very proud of this paper. But it did not do a lot of things. Really it was about laying out a concept of using tools from phylogenetics in functional prediction. But it did not provide software for example. I later developed some of my own scripts for doing this when I was at TIGR but really the software for phylogeny driven functional predictions would come later from others like Kimmen Sjolander, Sean Eddy, and Steven Brenner. Each method laid out in these tools and in other papers had its own flavors and I continued to explore various approaches and applications to phylogeny driven functional prediction. Examples of my subsequent work are listed below (with links to the Mendeley pages for these papers):

Plus we (at TIGR) used phylogenetic analysis as a tool in annotation of many many genomes as well as metagenomes.

Anyway, enough of history for a bit. What is interesting about this new paper is that they take a slightly different approach to phylogeny driven functional prediction in that they make use of Gene Ontology functional annotations as their key parameter to trace on evolutionary trees. They lay out the differences in their method quite well in the introduction:

Our general approach is similar to the ‘phylogenomic’ method proposed by Eisen [6] and further developed into a probabilistic form by Engelhardt et al. [7], but with important differences. Eisen proposed a conceptual approach for predicting protein function using a phylogenetic tree together with available experimental knowledge of proteins. The original approach relied on manual curation to identify gene duplication events and to find and assimilate the literature for characterized members of the family. Engelhardt et al. used automated reconciliation with the species tree [8] to identify gene duplication events, and experimental GO terms (MF only) to capture the experimental literature. Using this information, they defined a probabilistic model of evolution of MF involving transitions between different molecular functions.

From these previous studies, we adopt the basic approach of function evolution through a phylogenetic tree and the use of GO annotations to represent function. However, unlike these other phylogenomic methods, we represent the evolution in terms of discrete gain and loss events. In Eisen's original model, an annotation does not necessarily represent a gain of function (it could have been inherited from an earlier ancestor), and losses are not explicitly annotated. The transition-based model of Engelhardt et al. assumes replacement of one function by another (gain of one function coupled to the loss of another), and does not capture uncoupled events, which is particularly important for BP annotations and cases where a protein has multiple molecular functions (see examples below). In addition, we make no a priori assumptions about conservation of function within versus between orthologous groups, or about the relationship between evolutionary distance and functional conservation (as the distance may not necessarily reflect every given function). While, as described below, gene duplication events and relatively long tree branches are important clues for curators to locate functional divergence (gain and/or loss), in our paradigm an ancestral function can be inherited by both descendants following a duplication (resulting in paralogs with the same function) or gained/lost by one descendant following a speciation event (resulting in orthologs with different functions). Evolution of each function is evaluated on a case-by-case basis, using many different sources of information about a given protein family
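
As I read it, the gain/loss bookkeeping they describe can be sketched roughly like this in Python. This is my own toy illustration and not code from PAINT: the tree, the gene names, and the function terms are hypothetical, and the only assumption is that each branch can carry explicit "gain" and "loss" events that are applied to whatever was inherited from the parent.

```python
# Toy propagation of functions through a gene tree using explicit
# gain and loss events. All names and annotations are hypothetical.

def propagate(node, children, events, inherited=frozenset(), leaves=None):
    """Walk the tree from `node` down, applying gain/loss events.

    children: {internal node: [child, ...]}
    events:   {node: {"gain": set, "loss": set}} for the branch leading to node
    Returns {leaf: set of functions}.
    """
    if leaves is None:
        leaves = {}
    branch = events.get(node, {})
    current = (set(inherited) | branch.get("gain", set())) - branch.get("loss", set())
    kids = children.get(node, [])
    if not kids:
        leaves[node] = current
    for kid in kids:
        propagate(kid, children, events, current, leaves)
    return leaves


# "dup" is a duplication node; geneA and geneB are paralogs,
# and geneC is their ortholog in another lineage.
children = {"root": ["dup", "geneC"], "dup": ["geneA", "geneB"]}
events = {
    "root": {"gain": {"DNA binding"}},       # ancestral function
    "geneB": {"gain": {"ATPase activity"}},  # gain uncoupled from any loss
    "geneC": {"loss": {"DNA binding"}},      # ortholog that lost the function
}

for leaf, functions in sorted(propagate("root", children, events).items()):
    print(leaf, "->", sorted(functions))
```

In this toy case both paralogs keep the ancestral function after the duplication, one of them gains a second function without losing the first, and the ortholog loses the function after a speciation, which are exactly the kinds of cases the quoted passage says a replacement-only model does not capture.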

I note - Paul Thomas, one of the authors here, has also been developing phylogeny driven functional prediction methods for many years and has done some cool things previously. This new approach seems novel and useful and their paper is worth looking at. I like too that they focus on MutS homologs for some of their examples:

Anyway - their paper is worth a read and some of their software tools may be of use including PAINT: http://sourceforge.net/projects/pantherdb/ and http://pantree.org

Good to see continuous developments in phylogeny driven functional predictions. If you want to learn more - check out the Mendeley Group I have created:

Phylogenetics assisted functional prediction is a group in Biological Sciences on Mendeley.

And please contribute to it. Below are some previous posts of mine of possible interest: