Which type of evidence is least likely to result in changes to a phylogenetic tree?

The Elements of Life

In biology, the elements of life are the essential building blocks that make up living things: carbon, hydrogen, nitrogen, oxygen, phosphorus, and sulfur. The first four are the most abundant in living cells. Together, these elements form the major macromolecules of life: carbohydrates, lipids, nucleic acids and proteins. Carbon forms the backbone of all four classes, and of the energy-rich molecules adenosine triphosphate (ATP) and guanosine triphosphate (GTP). Cell membranes, for example, are built mainly from lipids and proteins. Hydrogen occurs in water and in all organic compounds, including ATP and GTP. Nitrogen is found in amino acids (and therefore proteins), nucleic acids, ATP and GTP. Oxygen occurs in water, carbohydrates, lipids, proteins, nucleic acids, ATP and GTP. Phosphorus is found in nucleic acids, phospholipids, ATP and GTP.

Journal Article

Anup Som

Anup Som is an assistant professor at the Center of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad, India—a public university and an institution of national importance. His research interests include evolutionary bioinformatics and systems biology.

Received: 24 January 2014

Abstract

Phylogenetic analysis is used to recover the evolutionary history of species, genes or proteins. Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. It also has wide applications, including understanding genome organization, epidemiological investigations, predicting protein functions, and deciding which genes to analyze in comparative studies. Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that create uncertainty with respect to the true evolutionary relationships of the species or genes analyzed. One of the most notable difficulties is the widespread occurrence of incongruence among methods and also among individual genes or different genomic regions. Widespread incongruence hinders both the recovery of evolutionary relationships and the applications of phylogenetic analysis. In this article, I concisely review the factors that cause incongruence in molecular phylogenies and the advances in the field that have resolved some of them, and explore the unresolved factors along with possible ways of tackling them.

INTRODUCTION

Recovering the true historical relationships among any group of species, genes or proteins is one of the principal goals of evolutionary research. Knowledge of phylogeny tells us about the pattern of evolutionary relationships, revealing the historical pattern of speciation and divergence and enabling us to classify life according to an evolutionary scheme. It also has wide applications throughout biology, for example in studies of human diseases: for epidemiological investigations [1–3], for identifying and characterizing newly discovered pathogens [4, 5] and for identifying and tracking natural reservoirs of zoonotic diseases [6]. In addition, phylogenetic analysis has been widely used to test the a priori hypothesis of epidemiological clustering in suspected transmission chains of HIV [7]. Phylogenetic analysis of molecular sequences is also one of the principal interpretive tools for understanding the organization and evolution of genes and genomes [8]. However, success in revealing evolutionary relationships and in applying phylogenetic analysis depends greatly on the quality of the inferred phylogeny.

Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that create uncertainty with respect to the true historical relationships of the organisms or genes analyzed [9]. One of the most notable difficulties is the widespread occurrence of incongruence (see glossary) among methods [10, 11] and also among individual genes or different genomic regions [12–15], and such pervasive incongruence impedes the primary goals of evolutionary research. Evolutionary biologists expected that large volumes of data, combined with sophisticated computational tools, would resolve the problem. Surprisingly, the advent of molecular and genomic data has increased the variety of classifications rather than reducing it [10, 16]. Thus, despite the rise of phylogenomics [17] and the immense progress made with large datasets and efficient algorithms, many important nodes remain unresolved [15, 18, 19]. This raises a critical question: why do important nodes remain unresolved, and how can we overcome this problem? Molecular data are accumulating at a rapid pace, analytical methods are being refined and new ones developed, yet incongruence remains a problem that scientists are still trying to tackle.

In this review, I emphasize the various factors that cause incongruence in molecular phylogenies and the advances in the field that have resolved some of them, and explore the unresolved factors along with possible ways of tackling them. Reviewing the causes, consequences and solutions of phylogenetic incongruence is a difficult task, and may be too big to be addressed in a single article. Some of these causes, consequences and solutions have already been reviewed separately in the literature [11, 19–21], but no single article has yet concisely discussed all the possible causes and their consequences together.

PRESENCE OF INCONGRUENCE: A SURVEY

The widespread occurrence of phylogenetic incongruence among methods and also among genes is well known [10, 11]. Although there is no doubt about the existence of statistically significant incongruence among different tree reconstruction methods, statistically significant incongruence among genes has been questioned [10]. However, it has long been accepted that gene trees could be incongruent with each other because of genetic drift in natural populations, gene duplication, horizontal gene transfer or lineage sorting [10, 22]. Rokas et al. [13] have shown that single-gene phylogenies reveal extensive incongruence when a particular tree reconstruction method is used. They analyzed 106 genes of eight yeast species and showed that there are several strongly supported (defined here as a bootstrap value >70%) alternative phylogenies. Analyses of the 106 genes resulted in >20 alternative maximum likelihood (ML) or maximum parsimony (MP) trees. In a similar study, Kopp and True [12] reported that single-gene datasets of the Drosophila melanogaster species group produce several strongly supported conflicting clades, which were also consistently seen in combined analyses. Lack of resolution in combined data analyses stems from conflicting phylogenetic signal between individual genes, either because of real differences in their evolutionary history or because of different statistical biases [25]. Many studies have also reported several strongly supported trees among different genes [23–25]. On the other hand, Jeffroy et al. [10] analyzed the same 106 genes assembled by Rokas et al. [13] from 14 yeast species and showed that although individual genes recover different trees, the differences are not statistically significant (i.e. no incongruence among the 106 individual genes), and stated ‘all single gene trees are different because of the predominant effect of stochastic error’. This result stands in sharp contrast to other results, which indicate that gene trees differ not only because of statistical biases, but also because of real differences in their evolutionary history [26, 27].

To complement the evidence of incongruence in recent studies, a further investigation was conducted using nucleotide sequence data published in the scientific literature (Table 1). These multigene datasets had been used for testing different evolutionary hypotheses, and therefore the quality of the data had been verified. Reanalysis of the multigene datasets reveals that phylogenetic incongruence is present in both cases (i.e. among methods and also among genes) even when robust reconstruction methods (i.e. ML and Bayesian methods) are used. Incongruence among methods is very common and statistically robust (Figure 1), in agreement with other observations. Figure 1 shows subtrees of phylogenies based on the concatenation of four genes from the Pryer et al. [28] data, an alignment of 5072 nucleotide characters from 35 taxa. Trees were inferred using neighbor-joining (NJ) and ML methods under the generalized time reversible (GTR) model with rate heterogeneity (α = 0.2652). Instead of the full 35-taxon trees, Figure 1 shows two six-taxon subtrees where the incongruence is significant. The ML tree shows that Pinus and Gnetum are monophyletic with 78% bootstrap support, whereas the NJ tree shows that Pinus and Gnetum are paraphyletic. This is a distinct example of statistically significant incongruence among methods. Topological differences among genes are also widespread, but most of them lack statistical support. Figure 2 shows phylogenies obtained from single-gene datasets of the Springer et al. [29] data. Gene trees were obtained from the ATP7A (690 bp, 42 taxa and rate heterogeneity α = 0.751) and mt-RNA (1647 bp, 42 taxa and α = 0.751) genes by using the neighbor-joining–maximum composite likelihood (NJ-MCL) method of phylogenetic inference [36–38]. The ATP7A tree shows that Elephant and Hyrax are monophyletic with 65% bootstrap support, whereas the mt-RNA tree shows that Elephant and Sirenian are monophyletic with 81% bootstrap support.
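The bootstrap percentages quoted above come from resampling alignment columns with replacement and rebuilding a tree from each pseudo-replicate. The Python sketch below shows only the resampling and tallying steps; the tree-building step is omitted. The function names, the toy alignment and the hard-coded replicate clades are illustrative assumptions, not the MEGA4 or PAUP procedure used for the figures.

```python
import random

def bootstrap_columns(alignment, rng):
    """Return one pseudo-replicate: alignment columns drawn with replacement."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}

def clade_support(clade, replicate_clades):
    """Percentage of replicate trees whose clade set contains `clade`."""
    clade = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

# Toy alignment (made-up sequences, for illustration only).
alignment = {
    "Elephant": "ACGTACGTAC",
    "Hyrax":    "ACGTACGTTC",
    "Sirenian": "ACGAACGTAC",
}
replicate = bootstrap_columns(alignment, random.Random(1))

# Clades recovered from four hypothetical replicate trees.
replicate_clades = [
    {frozenset({"Elephant", "Hyrax"})},
    {frozenset({"Elephant", "Sirenian"})},
    {frozenset({"Elephant", "Hyrax"})},
    {frozenset({"Elephant", "Hyrax"})},
]
support = clade_support({"Elephant", "Hyrax"}, replicate_clades)
```

In a real analysis each pseudo-replicate would be passed to the tree-building method, and the clades of the resulting trees would replace the hard-coded list.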

Figure 1:

Incongruence among phylogenetic methods (Pinus and Gnetum case). Phylogenies based on the combined dataset (concatenation of four genes) containing an alignment of 5072 nucleotide characters from 35 taxa [28]. Trees were inferred using (A) NJ and (B) ML methods under the GTR model with rate heterogeneity (α = 0.2652). The bootstrap support values of each tree are indicated to the left of the corresponding node; 1000 bootstrap replicates were performed for the NJ method using MEGA4 [38] and 500 bootstrap replicates for the ML method using PAUP [35]. The ML tree shows that Pinus and Gnetum are monophyletic with 78% bootstrap support, whereas the NJ tree shows that Pinus and Gnetum are paraphyletic. This result shows statistically significant incongruence among methods. [Note: instead of the full 35-taxon trees, this figure shows two six-taxon subtrees where the incongruence was significant.]

Figure 2:

Incongruence among genes. Phylogenies obtained from single-gene datasets of Springer et al. [29]. Gene trees were obtained from the (A) ATP7A (690 bp, 42 taxa and rate heterogeneity α = 0.751) and (B) mt-RNA (1647 bp, 42 taxa and α = 0.751) genes by using the NJ method of phylogenetic inference under the MCL model with rate heterogeneity. The bootstrap support values of each tree are indicated to the left of the corresponding node; 1000 replicates were performed for NJ-MCL using MEGA4 [38]. The ATP7A tree shows that Elephant and Hyrax are monophyletic with 65% bootstrap support, whereas the mt-RNA tree shows that Elephant and Sirenian are monophyletic with 81% bootstrap support.

Table 1:

Information about the datasets used in this study, including source paper, number of taxa, sequence length, genes, organism and evolutionary model

Dataset   Reference  Taxa  Length  Gene(s)                                Organism               Evolution model
Pryer     [28]       35    5072    atpB + rbcL + rps4 + SSUrRNA           Seed plant             GTR + G
Springer  [29]       42    19397   19 nuclear & three mtRNA genes (a)     Placental mammal       F84 + G
Murphy    [30]       66    9779    15 nuclear & three mtDNA genes (a)     Placental mammal       K2P, HKY + G
Teeling   [31]       29    7187    A2AB + BRCA1 + RAG1 + RAG2             Rhinolophoid microbat  GTR + G + I
Yoder     [32]       31    4454    Cytochrome b, ND2, IRBP                Malagasy carnivore     GTR + G + I
Madsen    [33]       30    6518    A2AB + IRBP + vWF + rRNA               Placental mammal       HKY85
James     [34]       214   6436    18S + 28S + 5.8S + EF1α + RPB1 + RPB2  Fungus                 GTR + I + G

Note: (a) Gene names can be found in the corresponding reference article.

FACTORS THAT CAUSE PHYLOGENETIC INCONGRUENCE

Analytical factors

The factors that negatively affect phylogenetic reconstruction are either analytical or biological. Analytical factors include: (i) the choice of optimality criterion [39], (ii) taxon sampling [40, 41] and (iii) specific assumptions in the modeling of sequence evolution [42]. The existing probability-based optimality criteria (i.e. ML and Bayesian methods) and substitution models are statistically robust, but they are limited in how well they represent reality [43–45], and even small departures from reality may cause problems. Models of sequence evolution are designed to capture the most essential aspects of the processes shaping sequence evolution. However, these models rely on assumptions that render them sensitive to a number of influences. For example, extreme variability in rates of evolution among lineages has been shown to lead to long-branch attraction [46]. Complex models of sequence evolution may better approximate the evolution of the sequences, and therefore might be expected to give more accurate results. On the other hand, more complex models require the estimation of more parameters, each of which is subject to some error. Several studies have reported that simple models sometimes perform better than complex models [47–49], a finding that highlights the complexity of phylogeny reconstruction and calls for more theoretical work on statistical models [50, 51].
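To make the parameter trade-off concrete, the sketch below counts the free substitution-model parameters for the model names that appear in Table 1. The counts are the standard ones for these models (relative rates plus free base frequencies, with one extra parameter each for the gamma shape and the proportion of invariable sites). The function name and the parsing of the model strings are assumptions made for this illustration, and branch lengths are deliberately not counted.

```python
# Free parameters of each base substitution model (branch lengths excluded).
BASE_PARAMETERS = {
    "JC69": 0,   # equal rates, equal base frequencies
    "K2P": 1,    # transition/transversion ratio
    "F84": 4,    # one rate parameter + 3 free base frequencies
    "HKY": 4,    # kappa + 3 free base frequencies
    "HKY85": 4,  # same model as HKY
    "GTR": 8,    # 5 relative rates + 3 free base frequencies
}

def free_parameters(model):
    """Count free parameters for a model string such as 'GTR + G + I'."""
    base, *extras = [part.strip() for part in model.split("+")]
    count = BASE_PARAMETERS[base]
    for extra in extras:
        if extra not in ("G", "I"):
            raise ValueError(f"unknown model component: {extra!r}")
        # G adds the gamma shape (alpha); I adds the invariable-site proportion.
        count += 1
    return count

for name in ["K2P", "HKY85", "F84 + G", "GTR + G", "GTR + G + I"]:
    print(f"{name:12s} {free_parameters(name)} free parameters")
```

Moving from K2P to GTR + G + I raises the count from 1 to 10, and each additional parameter has to be estimated from the same alignment.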

Appropriate and extensive taxon sampling is one of the most important determinants of accurate phylogenetic estimation. In phylogenetic analyses, the inclusion or exclusion of certain taxa may influence the accuracy of phylogenetic estimations. Considerable bodies of work have addressed the effect of taxon sampling on phylogenetic reconstruction, especially in relation to density of sampling, and have stated that an adequate increase of taxon sampling strongly improves the phylogenetic estimation [40, 52–55]. Heath et al. [56] performed a comprehensive study on taxon sampling that included an exploration of the effect and strategies of taxon sampling and concluded that there are many benefits of dense taxon sampling and advised evolutionary biologists to appreciate taxon sampling when interpreting phylogenetic analysis results to explain any unexpected patterns. For further details about the issue of taxon sampling and phylogenetic analysis readers are advised to go through the comprehensive review article by Nabhan and Sarkar [21].

Biological factors

Biological factors cause incongruence between two phylogenies because of (i) violations of orthology due to three major mechanisms, namely, lineage sorting, hidden paralogy and horizontal gene transfer [57, 58]; (ii) stochastic error or character sampling bias related to the length of the genes; and (iii) systematic error due to the presence of a nonphylogenetic signal in the data [10].

A growing corpus of studies has revealed conflicting evolutionary histories among genes [11, 59]. These conflicts among gene trees can arise mainly from coalescent stochasticity [11, 22, 60]: when incomplete sorting of ancestral polymorphism occurs during successive speciation events, gene genealogies can differ from the species phylogeny (Figure 3). This phenomenon, called ‘incomplete lineage sorting’, has been detected in several different taxa [24, 60], including hominids [61]. A recent genome-wide analysis of hominid primates indicated that, as a consequence of incomplete lineage sorting, roughly 30% of our genome supports the [chimpanzee, (human, gorilla)] or [human, (chimpanzee, gorilla)] branching order, i.e. topologies different from the true [gorilla, (human, chimpanzee)] species tree [60, 62]. A second, very different reason why gene trees can be truly incongruent is gene duplication and extinction. If a dataset includes paralogous copies, then the true phylogeny will partly reflect the duplication history of the gene, which is independent of the species divergence history. The third mechanism is horizontal gene transfer. If genetic exchanges occur between species, then the phylogeny of individual genes will be influenced by the number and nature of the transfers they have undergone. It is well known that horizontal gene transfers are an important source of conflict between gene trees in bacteria [63, 64]. Quantifying the amount of phylogenetic conflict caused by horizontal gene transfer versus other factors would appear worthwhile for a correct interpretation of bacterial phylogenomic data: how frequently, and under which conditions, should we invoke horizontal gene transfer? Horizontal gene transfer is expected to be very rare or absent in animals. Accordingly, phylogenetic agreement in animals is much more common than disagreement, indicating that horizontal gene transfer is not widespread enough to erase the vertical signal [11, 15]. Besides these three major mechanisms, hybridization, recombination and natural selection are cited as potential causes of gene tree incongruence [11]. Figure 4 summarizes the negative factors that cause incongruence in species and gene trees.

Figure 3:

An illustration of the incomplete lineage sorting problem. The history of a gene (colored lines) is drawn in the context of a species tree (bars). New lineages arising from new polymorphisms in the gene are drawn in different colors. Consequently, the gene tree topology is (((AB)C)D), whereas the species tree topology is ((A(BC))D). A colour version of this figure is available at BIB online: //bib.oxfordjournals.org.
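The conflict between the gene tree and the species tree in this figure can be checked mechanically by listing the groupings each tree contains. The sketch below is a minimal Python illustration: it treats both trees as rooted, encodes them as nested tuples, and compares their sets of clades. The encoding and the function name are assumptions of this example; comparing bipartitions of unrooted trees follows the same idea.

```python
def clades(tree):
    """Return the set of clades (leaf sets of internal nodes) of a rooted tree
    given as nested tuples, e.g. ((("A", "B"), "C"), "D")."""
    found = set()

    def visit(node):
        if isinstance(node, str):          # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)                  # record this internal node's clade
        return leaves

    visit(tree)
    return found

gene_tree = ((("A", "B"), "C"), "D")       # gene tree topology from the caption
species_tree = (("A", ("B", "C")), "D")    # species tree: B and C are sisters

gene_only = clades(gene_tree) - clades(species_tree)
species_only = clades(species_tree) - clades(gene_tree)
incongruent = bool(gene_only or species_only)

print("only in gene tree:   ", [sorted(c) for c in gene_only])
print("only in species tree:", [sorted(c) for c in species_only])
print("incongruent:", incongruent)
```

The two trees share the clades {A, B, C} and {A, B, C, D} but disagree on whether B groups with A or with C, which is exactly the kind of conflicting node that defines incongruence.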

Figure 4:

A summary of the negative factors that cause incongruence in species and gene trees.

INCONGRUENCE DUE TO SEQUENCE ALIGNMENTS

The reconstruction of phylogenetic history is predicated on being able to accurately establish hypotheses of character homology, and mistaken hypotheses of homology are a primary source of error in evolutionary studies. Phylogenies are often thought to depend more on the specifics of the sequence alignment than on the method of reconstruction [65–67]. In one of the earlier studies of the effects of nucleotide sequence alignment on phylogeny, Morrison and Ellis [68] examined the 18S rDNAs of Apicomplexa and concluded that many of the disagreements in the literature concerning the phylogeny of the Apicomplexa are probably based on differences in sequence alignment strategies rather than on differences in data or tree-building methods.

Over the past decade, continuous efforts have been made to improve the quality of multiple sequence alignments, and several improved alignment algorithms have been developed as a result, such as SATCHMO [69], RASCAL [70], MUSCLE [71], MAVID [72] and CLUSTAL OMEGA [73]. Different alignment algorithms claim high alignment accuracy, but recent studies have found pitfalls in the existing algorithms and raised questions about the quality of multiple alignments and their effects on subsequent analyses [74, 75]. Multiple alignments are typically produced by feeding sequences into the alignment programs from left to right. Landan and Graur [74] instructed existing alignment programs to read the sequences both from left to right and from right to left, and found that the two resulting alignments can be quite different and consequently produce different trees, even though both alignments start with identical information and are processed by identical algorithms. Artifacts in multiple sequence alignments are therefore one potential source of incongruence in phylogeny. So what are the possible solutions for resolving incongruence caused by alignment artifacts? The problem is hard, and so far no perfect remedy exists. However, a good approach is to choose highly conserved, recently diverged and long sequences, for which the alignments in the two directions seem to differ only slightly. Furthermore, for alignments that are not too short, removal of problematic regions (i.e. regions that cannot be aligned with confidence) leads to better trees [76, 77].
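The left-to-right versus right-to-left comparison can be sketched as follows. The Python example does not call any alignment program: the two alignments are hard-coded, hypothetical outputs for the toy sequences AAT and AT, and the function names and the column-agreement score are assumptions made for illustration rather than the exact measure used by Landan and Graur [74].

```python
def column_set(alignment):
    """Describe each column by the residue index it holds for every sequence
    (None for a gap), so columns can be compared between alignments."""
    counters = [0] * len(alignment)
    columns = set()
    for column in zip(*alignment):
        signature = []
        for row, char in enumerate(column):
            if char == "-":
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        columns.add(tuple(signature))
    return columns

def heads_tails_agreement(heads, tails):
    """Fraction of columns of the left-to-right alignment that are recovered
    when the sequences are aligned in reversed orientation."""
    restored = [row[::-1] for row in tails]   # back to original orientation
    heads_cols = column_set(heads)
    return len(heads_cols & column_set(restored)) / len(heads_cols)

# Hypothetical aligner output for sequences "AAT" and "AT" (left to right).
heads = ["AAT",
         "A-T"]
# Hypothetical output when the reversed sequences "TAA" and "TA" are aligned.
tails = ["TAA",
         "TA-"]

score = heads_tails_agreement(heads, tails)
print(f"column agreement: {score:.2f}")
```

Here only one of the three columns is shared between the two orientations, which flags the gap placement as unreliable even though both alignments are equally good under a simple scoring scheme.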

INCONGRUENCE CAUSED BY HETEROTACHY

Heterotachy refers to the variation in the evolutionary rate of a given site (i.e. nucleotide/amino acid) of a gene or protein through time. It has been convincingly demonstrated that the evolutionary rate of a given site is not always constant throughout time. Functional constraints on sites in a gene sequence often change through time, causing shifts in site-specific evolutionary rates, a phenomenon called heterotachy [78]. This phenomenon has been recently confirmed as an important process of sequence evolution [79], and can lead to phylogenetic reconstruction artifacts [80]. Unlike other types of biases, heterotachy does not leave any evident footprints in sequences, and therefore leads to artifacts that are particularly difficult to detect [81, 82].

Recently, several simulation studies have compared the relative performance of phylogenetic inference methods under heterotachous conditions. These studies have shown that ML outperforms MP, but that in the majority of cases both methods fail to recover the correct topology under heterotachous conditions [51, 80, 83–86]. The recent controversies over the relative performance of the ML and MP methods do not address the important issue of how to reconstruct a correct phylogeny when the concatenated data suffer from heterotachy. In particular, adopting a genome-scale approach increases the chance of heterotachy, because genes and genomes are unlikely to evolve in a similar way when we concatenate sequence data from species that are extremely distantly related or that differ greatly in their environmental habitat, biological composition and life history traits.

So, what are the implications of these findings? Are all previously accepted multigene trees in need of reinspection? To find an acceptable answer, a simulation study was conducted to measure the performance of phylogenetic inference methods when different levels of heterotachy are present in the data. The simulation strategy of Kolaczkowski and Thornton [80] was used to introduce heterotachy into the data. Traditional (i.e. ML, MP and NJ) and Bayesian inference (BI) methods were evaluated under the Hasegawa, Kishino and Yano (HKY) model of sequence evolution. The results in Figure 5 show that when the level of heterotachy is very strong, all methods fail to reconstruct the true tree; when the level of heterotachy is very weak, all methods are very efficient; and when the level of heterotachy is intermediate, ML and BI are very efficient (BI slightly more so than ML), whereas the MP and NJ methods perform very poorly. When a large-scale phylogeny is reconstructed by concatenating several genes, there is a fair chance that heterotachy will occur. The simulation experiments nevertheless show that the situation is not so dire: for a set of conserved homologous genes, the level of heterotachy is expected to be low or intermediate, and under such conditions ML and BI are the preferred methods for avoiding incongruence caused by heterotachous signal. Recently, two models of sequence evolution have been proposed for dealing with heterotachy. Pagel and Meade [87] proposed a general likelihood-based covarion ‘mixture model’ for inferring phylogenetic trees from multigene data that suffer from heterotachy. The other is a site-heterogeneous mixture model, named CAT [88]. However, it has been claimed that although these models accommodate cases in which different sites in the alignment evolve at different rates, they cannot detect or overcome heterotachy [30].
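One way to picture how heterotachy can be introduced into simulated data is to evolve two partitions on the same four-taxon tree, with the long and short terminal branches swapped between partitions, and then concatenate them. The Python sketch below is loosely modelled on that idea and should be read as an assumption-laden illustration, not as the simulation reported here: it uses the Jukes-Cantor (JC69) model instead of HKY for brevity, and the branch lengths, the tree ((A,B),(C,D)) and all function names are choices made for this example.

```python
import math
import random

BASES = "ACGT"

def evolve_jc69(seq, branch_length, rng):
    """Evolve a sequence along one branch under JC69; the branch length is
    the expected number of substitutions per site."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_partition(n_sites, tips, internal, rng):
    """Simulate one partition on the tree ((A,B),(C,D))."""
    left = "".join(rng.choice(BASES) for _ in range(n_sites))  # ancestor of A, B
    right = evolve_jc69(left, internal, rng)                   # ancestor of C, D
    parents = {"A": left, "B": left, "C": right, "D": right}
    return {t: evolve_jc69(parents[t], tips[t], rng) for t in "ABCD"}

def simulate_heterotachy(n_sites, long, short, internal, seed=0):
    """Concatenate two partitions whose long/short terminal branches are swapped."""
    rng = random.Random(seed)
    part1 = simulate_partition(
        n_sites, {"A": long, "B": short, "C": long, "D": short}, internal, rng)
    part2 = simulate_partition(
        n_sites, {"A": short, "B": long, "C": short, "D": long}, internal, rng)
    return {t: part1[t] + part2[t] for t in "ABCD"}

# Illustrative branch lengths only; they are not taken from the study.
alignment = simulate_heterotachy(n_sites=500, long=0.75, short=0.05, internal=0.15)
```

Each site in the first half of the concatenated alignment evolves fast in A and C, while each site in the second half evolves fast in B and D, so no single set of branch lengths describes the whole dataset. Making the long and short branches more or less similar changes the strength of the heterotachous signal.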

Figure 5:

Histogram showing the performance of the phylogenetic methods as the strength of the heterotachous signal varies (see text). The simulation strategy of Kolaczkowski and Thornton [80] was used to introduce heterotachy in the data. A colour version of this figure is available at BIB online: //bib.oxfordjournals.org.

Molecular phylogenetics: the study of evolutionary relationships among organisms or genes by a combination of molecular biology and statistical techniques

Phylogenetic incongruence: two (or more) phylogenetic trees are said to be incongruent when they exhibit conflicting branching orders (i.e. topologies) and cannot be superimposed. This implies that at least one node (also known as a bipartition) present in one tree is not found in the other(s), where it is replaced by alternative groupings of taxa.

Phylogenetic signals: tendency for related taxa to resemble each other more than they resemble species drawn at random from the tree. The strength of the phylogenetic signal is proportional to the number of substitutions occurring along the branch.

Phylogenomics: reconstruction of phylogenies using a large number of genes or genomic regions. Two fundamentally different approaches are used for reconstructing phylogenies from multiple datasets. In one, the ‘supermatrix’ approach, the phylogenetic reconstruction is done after the gene sequences are concatenated head-to-tail to form a super-gene alignment. In the other, the ‘supertree’ approach, phylogenies are inferred separately for each gene and the resulting gene trees are used to generate a majority rule consensus phylogeny. The size of homologous sequence datasets has increased dramatically in recent years, and many of these datasets now involve a few hundred to several thousand species or genes [16, 18, 108].

Model of sequence evolution: a statistical description of the process of substitution in nucleotide or amino acid sequences. Complex models better approximate the evolutionary process but at the expense of more parameters and computational time. As parameter-rich models require more data to behave properly, they have become really useful with the advent of phylogenomic datasets.

Gene tree: a phylogenetic tree that shows the evolutionary relationships between a group of genes.

Species tree: a phylogenetic tree that shows the evolutionary relationships between a group of species.

Paralogy: two genes are said to be paralogous when they derive from a gene duplication event.

Horizontal gene transfer: the transfer of genetic material from one organism to another one that is not its offspring.

Incomplete lineage sorting: the failure of two or more lineages in a population to coalesce, leading to the possibility that at least one of the lineages first coalesces with a lineage from a less closely related population.

Ancestral polymorphism: the variants that arose by mutation prior to the speciation event that generated the species in which they segregate.

Coalescent: the process of joining ancestral lineages when the genealogical relationships of a random sample of sequences from a modern population are traced back.

Heterotachy: refers to the fact that the evolutionary rate of a given nucleotide/amino acid position varies throughout time.

Long-branch attraction: the tendency of species at the ends of long branches in a phylogenetic tree to be grouped artificially close to each other. Long branches can result from a fast evolutionary rate and/or a long time span. In particular, the branch of the outgroup is, in essence, long and often artifactually attracts fast-evolving lineages.

Stochastic error: the error in phylogenetic estimation caused by the finite length of the sequences used in the inference. As the size of the sequences ($n$) increases, the magnitude of the error decreases ($\propto 1/\sqrt{n}$).

Systematic error: the error in phylogenetic estimation that is due to the failure of the reconstruction method to account fully for the properties of the data.

Natural selection: the process in nature by which, according to Darwin's theory of evolution, only the organisms best adapted to their environment tend to survive and produce more offspring while those less adapted tend to be eliminated.

Hybridization (DNA): the process of combining two complementary single-stranded DNA or RNA molecules and allowing them to form a single double-stranded molecule through base pairing.

Recombination: the exchange of a segment of DNA between two homologous chromosomes during meiosis leading to a novel combination of genetic material in the offspring.

Monophyletic: monophyletic taxa include all the species that are derived from a single common ancestor.

Polyphyletic: a polyphyletic taxon is composed of unrelated organisms descended from more than one ancestor.

Optimality criteria: an objective function that is used to evaluate a given tree. The tree that maximizes or minimizes the function is chosen as the best estimate of phylogeny.

Phylogenetic networks: a phylogenetic network is any graph used to visualize evolutionary relationships (either abstractly or explicitly) between nucleotide sequences, genes, chromosomes, genomes or species. They are used when reticulate events such as hybridization, horizontal gene transfer, recombination or gene duplication and loss are believed to be involved.

Probabilistic methods (i.e. BI and ML) are very efficient up to certain levels of heterotachous signal, but the existence of heterotachy still compels us to improve phylogenetic reconstruction methods. So far, there is no established way to accommodate heterotachous signal in phylogenetic inference methods, and it is well known that heterotachy dramatically affects phylogenetic accuracy [80]. Phylogenetic incongruence might therefore be caused by heterotachy. Inference methods should be improved so that they can accommodate within-site rate variation (i.e. heterotachous signals); successful modeling of this variation will help to resolve many unresolved nodes.

RESOLVING INCONGRUENCE USING PHYLOGENOMICS APPROACH

To resolve the effect of analytical and biological factors, evolutionary biologists have tried to reconstruct a true (or at least a fair) evolutionary history by analyzing concatenated datasets [13, 16, 29, 89–93]. Concatenated datasets offer two major advantages. First, our confidence in the inferred relationships is greatly improved when they are supported by several independent datasets. Second, each individual dataset may not contain sufficient information to resolve all relationships; therefore, combining the data may increase the amount of phylogenetic signal and provide the necessary resolution. A problem occurs, however, when different genes support conflicting phylogenies. Perfect agreement among the gene-specific phylogenies is rare, and therefore phylogenetic analyses of different sets of concatenated genes often do not converge on the same tree [26, 27]. There is growing evidence that trees built from concatenated sequences have to be treated with caution [14, 27, 93, 94]. Some studies have yielded results at odds with widely accepted phylogenies [95].
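The conflict among gene-specific phylogenies described above can be quantified. The following is a minimal sketch, assuming rooted trees are written as nested tuples of taxon names (a representation chosen here purely for illustration). It collects the clades of each tree and counts those found in only one of the two trees, a rooted analogue of the Robinson-Foulds distance. The function names and the example trees are hypothetical and do not come from the studies cited.

```python
def clades(tree):
    """Return the set of non-trivial clades (frozensets of taxa) in a rooted tree.

    A tree is either a taxon name (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(leaves(child) for child in node))
        found.add(taxa)
        return taxa

    all_taxa = leaves(tree)
    found.discard(all_taxa)  # the root clade is shared by every tree on these taxa
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Two hypothetical gene trees that disagree on the closest relative of chimp.
gene_tree_1 = ((("human", "chimp"), "gorilla"), "orangutan")
gene_tree_2 = ((("chimp", "gorilla"), "human"), "orangutan")

print(clade_distance(gene_tree_1, gene_tree_1))  # 0: identical trees
print(clade_distance(gene_tree_1, gene_tree_2))  # 2: one conflicting clade in each tree
```

A distance of zero means the two gene trees are congruent; larger values indicate more conflicting groupings. Real analyses would normally parse Newick files with an established phylogenetics library rather than hand-written tuples.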

Over the past two decades, tremendous progress in this field has resolved various analytical and biological factors that cause incongruence in phylogenies. Improved phylogenetic inference methods, probabilistic models of sequence evolution and the genome-scale approach (i.e. phylogenetic reconstruction performed after the gene sequences are concatenated head-to-tail to form a super gene alignment) have resolved a significant amount of incongruence, but a large number of incongruences still remain among methods and among genes [10–15]. Adopting the phylogenomics approach resolved two factors: violation of the orthology assumption and stochastic error. Nonorthologous comparisons are gene-specific and are buffered in a multigene analysis, whereas stochastic error is related to gene length and naturally vanishes as more and more genes are concatenated. On the other hand, systematic errors (i.e. nonhistorical signals that arise where reality differs from simplified models) are not expected to disappear with the addition of data. Moreover, longer sequences exacerbate the potential for biases (systematic error) to be positively misleading. Systematic errors result from nonphylogenetic signals present in the data: compositional signal due to heterogeneity of nucleotide composition among genes or species, rate-heterogeneity signal (i.e. rate variation across lineages) and heterotachous signal due to within-site rate variation [19]. The bias causing systematic errors creates an erroneous signal that can dominate the true phylogenetic signal, causing the tree reconstruction method to be inconsistent and to yield an incorrect but highly supported tree [96].
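The head-to-tail concatenation mentioned above can be illustrated with a short sketch. It assumes each gene alignment is a dictionary mapping a taxon name to its aligned sequence, and it pads taxa that were not sampled for a gene with gap characters. The function name, the padding choice and the toy sequences are assumptions made for illustration, not the procedure of any particular study.

```python
def concatenate(gene_alignments, missing="-"):
    """Join per-gene alignments head-to-tail into one super gene alignment.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    Taxa absent from a gene are padded with the `missing` character.
    """
    taxa = sorted({taxon for gene in gene_alignments for taxon in gene})
    supermatrix = {taxon: [] for taxon in taxa}
    for gene in gene_alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon].append(gene.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}


# Toy data: sp2 was not sampled for the second gene.
gene_a = {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTA"}
gene_b = {"sp1": "GG", "sp3": "GC"}

for taxon, seq in concatenate([gene_a, gene_b]).items():
    print(taxon, seq)
```

Concatenation lengthens the alignment, which is why it reduces stochastic error; it does nothing by itself to remove compositional, rate-heterogeneity or heterotachous signal.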

APPROACHES TO OVERCOME SYSTEMATIC ERRORS

Phylogenetic inference from nucleotides or amino acids can be misled by both sampling (stochastic) error and systematic error. Systematic errors are due to heterogeneity of nucleotide composition, rate variation across lineages and within-site rate variation. Heterogeneity of nucleotide composition causes incorrect phylogenetic inference because unrelated clades with similar nucleotide frequencies (due to convergence rather than shared ancestral frequencies) will show greater similarity and may group together in a phylogenetic analysis, sometimes with strong statistical support. Several methods have been developed to account for compositional heterogeneity among sequences, including distance approaches [97–99], parsimony approaches [100] and maximum-likelihood approaches [101]. More recently, the RY-coding strategy (coding the nucleotides as purines and pyrimidines) was recognized as a way to deal with compositional signal [96]. Indeed, the G + C content can be extremely variable among homologous sequences from various organisms, whereas the frequency of purines is remarkably homogeneous [102]. RY-coding is therefore a method of choice for avoiding incongruence that results from compositional bias. A recent study reported that GC-rich genes induced a higher amount of conflict among gene trees and performed worse than AT-rich genes in retrieving well-supported, consensual nodes on the placental tree [103]. This GC effect is mainly a consequence of genome-wide variation in recombination rate. Indeed, recombination is known to drive GC-content evolution through GC-biased gene conversion and might be problematic for phylogenetic reconstruction, for instance, in an incomplete lineage sorting context. Besides compositional heterogeneity, rate variation across lineages has been well studied and successfully modeled by the use of the gamma distribution [104, 105]. These implementations improved the reconstruction of phylogenies, especially when very divergent species or genes were included and long-branch attraction artifacts were widespread.
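The RY-coding strategy described above is simple to express in code. The sketch below recodes purines (A, G) as R and pyrimidines (C, T, and U for RNA) as Y, leaving gaps and ambiguity codes untouched. The helper names and the two toy sequences are illustrative assumptions; they are constructed so that the sequences differ completely in G + C content yet become identical after recoding.

```python
# Purines (A, G) -> R; pyrimidines (C, T, U) -> Y. Other characters pass through.
RY_CODE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_recode(sequence):
    """Recode a nucleotide sequence into the two-state purine/pyrimidine alphabet."""
    return sequence.upper().translate(RY_CODE)


def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (0.0 if there are none)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


gc_rich = "GGCCGGCC"
at_rich = "AATTAATT"

print(gc_content(gc_rich), gc_content(at_rich))  # 1.0 0.0
print(ry_recode(gc_rich))                        # RRYYRRYY
print(ry_recode(at_rich))                        # RRYYRRYY
```

After recoding, transitions (A/G and C/T changes) are no longer visible, so compositional differences driven by G + C content cannot attract unrelated sequences; the cost is that some genuine phylogenetic signal is discarded along with the bias.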

As part of the continuous effort to resolve systematic errors, several complementary approaches have recently been applied, namely (i) variation in taxon sampling, (ii) removal of fast-evolving species, genes or sites [106–108] and (iii) more efficient detection of multiple substitutions. In addition, coding amino acids into functional categories has the potential to overcome some types of systematic error in genome-scale datasets [30].
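Coding amino acids into functional categories can be sketched in the same way as nucleotide recoding. The example below uses the six Dayhoff groups, one commonly used scheme; this grouping is an assumption made for illustration and is not necessarily the set of categories used in [30]. The function name and the toy peptides are likewise hypothetical.

```python
# Six Dayhoff categories (assumed scheme; other groupings are possible).
DAYHOFF_GROUPS = {
    "AGPST": "1",  # small
    "DENQ": "2",   # acidic and their amides
    "HKR": "3",    # basic
    "ILMV": "4",   # aliphatic hydrophobic
    "FWY": "5",    # aromatic
    "C": "6",      # cysteine
}
RECODE = str.maketrans(
    {aa: code for group, code in DAYHOFF_GROUPS.items() for aa in group}
)


def recode_amino_acids(sequence):
    """Replace each amino acid by its category label; gaps and X are unchanged."""
    return sequence.upper().translate(RECODE)


# Two toy peptides that differ at every site but only within categories.
print(recode_amino_acids("MKV"))  # 434
print(recode_amino_acids("LRI"))  # 434
```

Substitutions within a category become invisible after recoding, which reduces saturation and compositional effects at the price of a smaller alphabet and less signal per site.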

Although recent studies indicate that reconstructing phylogeny from alignments of concatenated genes greatly reduces stochastic error, the potential for systematic error remains, heightening the need for reliable methods to analyze multigene datasets. Recently, a new methodology called ‘phylogenetic networks’ has been adopted to deal with incongruence among gene trees [109, 110]. Phylogenetic networks provide an alternative to phylogenetic trees when analyzing datasets whose evolution involves significant amounts of reticulate events such as horizontal gene transfer, hybridization or recombination [111, 112]. It has been claimed that the phylogenetic networks approach can visualize contradictory evidence for species phylogenies and incorporate additional information [113]. However, the approach evidently needs more theoretical work to establish an efficient network algorithm, as well as comprehensive tests on large-scale data to assess its accuracy. These further analyses will reveal the potential applications and pitfalls of the phylogenetic networks approach.
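One way to see why a single tree cannot always display the evidence from several gene trees is to test splits for pairwise compatibility. The sketch below relies on the standard criterion that two splits of the same taxon set fit on one tree only if at least one of the four intersections between their sides is empty. It is a minimal illustration of the idea behind network methods, not an implementation of any published network algorithm; the function names and example splits are assumptions.

```python
from itertools import combinations


def compatible(split_a, split_b, taxa):
    """True if the two splits can be displayed on the same tree.

    Each split is given as one of its two sides; the other side is taxa - side.
    """
    a1, b1 = set(split_a), set(split_b)
    a2, b2 = taxa - a1, taxa - b1
    return any(not (x & y) for x in (a1, a2) for y in (b1, b2))


def conflicting_pairs(splits, taxa):
    """All pairs of splits that cannot coexist on a single tree."""
    return [(a, b) for a, b in combinations(splits, 2)
            if not compatible(a, b, taxa)]


taxa = frozenset({"human", "chimp", "gorilla", "orangutan"})
splits = [
    frozenset({"human", "chimp"}),             # supported by one gene tree
    frozenset({"chimp", "gorilla"}),           # supported by another gene tree
    frozenset({"human", "chimp", "gorilla"}),  # supported by both
]

for a, b in conflicting_pairs(splits, taxa):
    print(sorted(a), "conflicts with", sorted(b))
```

When such conflicting pairs exist, a tree has to discard one of the splits, whereas a network can display both.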

CONCLUSIONS, IMPLICATIONS AND FUTURE RESEARCH

Although a highly resolved tree of life is yet to be recovered, the wealth of genomic data, together with probabilistic tree reconstruction methods and complex models of sequence evolution, has taken phylogenetic analysis to a new height. Furthermore, cautious application of several contemporary approaches, such as variation of taxon sampling, removal of fast-evolving species (genes or sites), removal of ambiguously aligned regions or use of amino acid sequences instead of nucleotides, has resolved several uncertainties. For further improvement, accurate probabilistic modeling of heterotachy (i.e. within-site rate variation) is needed. In the future, it should be examined whether the phylogenomics approach itself can be corrected in the presence of heterotachy. To achieve this goal, it would be necessary to define a model of sequence evolution that considers heterotachy and to implement it in phylogenetic reconstruction methods. Secondly, accurate establishment of character homology (i.e. multiple sequence alignment) is an essential requirement. Evolutionary biologists should pay great attention to improving the quality of multiple sequence alignments, because recent findings have demonstrated that existing alignment algorithms fail to establish character homology accurately and that mistaken hypotheses of homology are a primary source of error in phylogenetic analysis.

Besides the several known factors, there are possibly other hidden factors that cause incongruence and prevent the achievement of Darwin’s dream of having ‘ … fairly true genealogical trees for each great kingdom of Nature … ’. Evolutionary biologists should explore the possibility of such hidden negative factors and find the best possible solutions to them.

  • Resolving phylogenetic incongruence is not an easy problem; it becomes more complicated when an attempt to resolve one negative factor introduces a new negative factor that causes incongruence.

  • Adopting the phylogenomics approach resolved a few negative factors that caused phylogenetic incongruence, but simultaneously increased the chance of systematic error, particularly from ‘heterotachous signal’, which is not expected to disappear with the addition of data.

  • Toward achieving the goal of recovering a highly resolved tree of life, accurate establishment of character homology (i.e. multiple sequence alignment) is an essential requirement.

  • Besides the known negative factors, there are possibly other hidden factors that may cause incongruence; therefore, evolutionary biologists should explore the possibility of hidden negative factors and find solutions to them.

FUNDING

The work was supported by the Center of Bioinformatics, University of Allahabad.

Acknowledgements

I thank Dr. Sudhir Kumar of Arizona State University and Dr. Dan Graur of University of Houston for useful discussions. I also thank the anonymous reviewers for their constructive comments.

References

1. Molecular epidemiology of HIV transmission in a dental practice, Science, 1992, vol. 256 (pg. 1165-71)
2. Molecular epidemiology: HIV-1 and HCV sequences from Libyan outbreak, Nature, 2006, vol. 444 (pg. 836-7)
3. Inconsistencies in estimating the age of HIV-1 subtypes due to heterotachy, Mol Biol Evol, 2012, vol. 29 (pg. 451-6)
4. Genetic identification of a hantavirus associated with an outbreak of acute respiratory illness, Science, 1993, vol. 262 (pg. 914-17)
5. A novel hantavirus associated with an outbreak of fatal respiratory disease in the southwestern United States: evolutionary relationships to known hantaviruses, J Virol, 1994, vol. 68 (pg. 592-6)
6. New aspects of rabies with emphasis on epidemiology, diagnosis, and prevention of the disease in the United States, Clin Microbiol Rev, 1996, vol. 9 (pg. 166-76)
7. Source identification in two criminal cases using phylogenetic analysis of HIV-1 DNA sequences, Proc Natl Acad Sci USA, 2010, vol. 107 (pg. 21242-7)
8. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events, Nature, 2003, vol. 422 (pg. 433-8)
9. Accommodating phylogenetic uncertainty in evolutionary studies, Science, 2000, vol. 288 (pg. 2349-50)
10. Phylogenomics: the beginning of incongruence?, Trends Genet, 2006, vol. 22 (pg. 225-31)
11. Gene tree discordance, phylogenetic inference and the multispecies coalescent, Trends Ecol Evol, 2009, vol. 24 (pg. 332-40)
12. Phylogeny of the oriental Drosophila melanogaster species group: a multilocus reconstruction, Syst Biol, 2002, vol. 51 (pg. 786-805)
13. Genome-scale approaches to resolving incongruence in molecular phylogenies, Nature, 2003, vol. 425 (pg. 798-804)
14. Genome-scale data, angiosperm relationships, and ‘ending incongruence’: a cautionary tale in phylogenetics, Trends Plant Sci, 2004, vol. 9 (pg. 477-83)
15. Dealing with incongruence in phylogenomic analyses, Philos Trans R Soc Lond B Biol Sci, 2008, vol. 363 (pg. 4023-9)
16. Toward automatic reconstruction of a highly resolved tree of life, Science, 2006, vol. 311 (pg. 1283-7)
17. Phylogenomics: intersection of evolution and genomics, Science, 2003, vol. 300 (pg. 1706-7)
18. Rooting the eutherian tree: the power and pitfalls of phylogenomics, Genome Biol, 2007, vol. 8 pg. R199
19. Phylogenomics and the reconstruction of the tree of life, Nat Rev Genet, 2005, vol. 6 (pg. 361-75)
20. The supermatrix approach to systematics, Trends Ecol Evol, 2007, vol. 22 (pg. 34-41)
21. The impact of taxon sampling on phylogenetic inference: a review of two decades of controversy, Brief Bioinform, 2011, vol. 13 (pg. 122-34)
22. Relationships between gene trees and species trees, Mol Biol Evol, 1988, vol. 5 (pg. 568-83)
23. Molecular phylogeny of the Drosophila melanogaster species subgroup, J Mol Evol, 2003, vol. 57 (pg. 562-73)
24. Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting, PLoS Genet, 2006, vol. 2 pg. e173
25. Phylogenetic incongruence in the Drosophila melanogaster species group, Mol Phylogenet Evol, 2007, vol. 43 (pg. 1138-50)
26. Phylogenomics revives traditional views on deep animal relationships, Curr Biol, 2009, vol. 19 (pg. 706-12)
27. Resolving difficult phylogenetic questions: why more sequences are not enough, PLoS Biol, 2011, vol. 9 pg. e1000602
28. Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants, Nature, 2001, vol. 409 (pg. 618-22)
29. Placental mammal diversification and the Cretaceous–Tertiary boundary, Proc Natl Acad Sci USA, 2003, vol. 100 (pg. 1056-61)
30. Detecting and overcoming systematic errors in genome-scale phylogenies, Syst Biol, 2007, vol. 56 (pg. 389-99)
31. Microbat paraphyly and the convergent evolution of a key innovation in Old World rhinolophoid microbats, Proc Natl Acad Sci USA, 2002, vol. 99 (pg. 1431-6)
32. Single origin of Malagasy carnivora from an African ancestor, Nature, 2003, vol. 421 (pg. 734-7)
33. Parallel adaptive radiations in two major clades of placental mammals, Nature, 2001, vol. 409 (pg. 610-14)
34. Reconstructing the early evolution of Fungi using a six-gene phylogeny, Nature, 2006, vol. 443 (pg. 818-22)
35. PAUP*: phylogenetic analysis using parsimony *and other methods Version 4, 2002, Sunderland, MA, Sinauer Associates
36. Prospects for inferring very large phylogenies by using the Neighbor-Joining method, Proc Natl Acad Sci USA, 2004, vol. 101 (pg. 11030-5)
37. ML or NJ-MCL? A comparison between two robust phylogenetic methods, Comput Biol Chem, 2009, vol. 33 (pg. 373-8)
38. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0, Mol Biol Evol, 2007, vol. 24 (pg. 1596-9)
39. Performance of phylogenetic methods in simulation, Syst Biol, 1995, vol. 44 (pg. 17-48)
40. Increased taxon sampling greatly reduces phylogenetic error, Syst Biol, 2002, vol. 51 (pg. 588-98)
41. More genes or more taxa: the relative contribution of gene number and taxon number to phylogenetic accuracy, Mol Biol Evol, 2005, vol. 22 (pg. 1337-44)
42. Model selection in Phylogenetics, Annu Rev Ecol Evol Syst, 2005, vol. 36 (pg. 445-66)
43. Phylogeny estimation: traditional and Bayesian approaches, Nat Rev Genet, 2003, vol. 4 (pg. 275-84)
44. Inferring Phylogenies, 2004, Sunderland, MA, Sinauer Associates
45. Molecular phylogenetics: principles and practice, Nat Rev Genet, 2012, vol. 13 (pg. 304-14)
46. Cases in which parsimony or compatibility methods will be positively misleading, Syst Zool, 1978, vol. 27 (pg. 401-10)
47. Efficiencies of maximum likelihood methods of phylogenetics inferences when different substitution models are used, Mol Phylogenet Evol, 2004, vol. 31 (pg. 865-73)
48. A new approach for estimating the efficiencies of the nucleotide substitution models, Theory Biosci, 2007, vol. 125 (pg. 133-45)
49. How often do wrong models produce better phylogenies, Mol Biol Evol, 1997, vol. 17 (pg. 1251-8)
50. Theoretical foundation to estimate the relative efficiencies of the JC+gamma model and the JC model in obtaining the correct phylogenetic tree, Gene, 2006, vol. 385 (pg. 103-10)
51. The effect of heterotachy in multigene analysis using the neighbor joining method, Mol Phylogenet Evol, 2009, vol. 52 (pg. 846-51)
52. Taxonomic sampling, phylogenetic accuracy, and investigator bias, Syst Biol, 1998, vol. 47 (pg. 3-8)
53. Resolution of phylogenetic conflict in large data sets by increased taxon sampling, Syst Biol, 2006, vol. 55 pg. 522
54. The effect of taxonomic sampling on accuracy of phylogenetic estimation: test case of a known phylogeny, Mol Biol Evol, 1998, vol. 15 (pg. 1086-90)
55. Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships, Mol Biol Evol, 2010, vol. 27 (pg. 1983-7)
56. Taxon sampling and the accuracy of phylogenetic analyses, J Syst Evol, 2008, vol. 46 (pg. 239-57)
57. Gene trees in species trees, Syst Biol, 1997, vol. 46 (pg. 523-36)
58. Phylogenetic Incongruence: window into genome history and molecular evolution, Molecular Systematics of Plants II: DNA Sequencing, 1998, Netherlands, Kluwer Academic Publishers (pg. 265-96)
59. Discordance of species trees with their most likely gene trees, PLoS Genet, 2006, vol. 2 (pg. 762-8)
60. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection, Genome Res, 2011, vol. 21 (pg. 349-56)
61. Genetic evidence for complex speciation of humans and chimpanzees, Nature, 2006, vol. 441 (pg. 1103-8)
62. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model, PLoS Genet, 2007, vol. 3 pg. e7
63. Horizontal gene transfer and phylogenetics, Curr Opin Microbiol, 2003, vol. 6 (pg. 498-505)
64. Horizontal gene transfer in evolution: facts and challenges, Proc Biol Sci, 2010, vol. 277 (pg. 819-27)
65. Multiple sequence alignment for phylogenetic purposes, Aust Syst Bot, 2006, vol. 19 (pg. 479-539)
66. Multiple sequence alignment accuracy and phylogenetic inference, Syst Biol, 2006, vol. 55 (pg. 314-28)
67. Is multiple-sequence alignment required for accurate inference of phylogeny?, Syst Biol, 2007, vol. 56 (pg. 206-21)
68. Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18s rDNAs of apicomplexa, Mol Biol Evol, 1997, vol. 14 (pg. 428-41)
69. SATCHMO: sequence alignment and tree construction using hidden Markov models, Bioinformatics, 2003, vol. 19 (pg. 1404-11)
70. RASCAL: rapid scanning and correction of multiple sequence alignments, Bioinformatics, 2003, vol. 19 (pg. 1155-61)
71. MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res, 2004, vol. 32 (pg. 1792-97)
72. MAVID: constrained ancestral alignment of multiple sequences, Genome Res, 2004, vol. 14 (pg. 693-9)
73. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega, Mol Syst Biol, 2011, vol. 7 pg. 539
74. Heads or tails: a simple reliability check for multiple sequence alignments, Mol Biol Evol, 2007, vol. 24 (pg. 1380-3)
75. Recent evolutions of multiple sequence alignment algorithms, PLoS Comput Biol, 2007, vol. 3 pg. e123
76. Missing data and the design of phylogenetic analyses, J Biomed Inform, 2006, vol. 39 (pg. 34-42)
77. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments, Syst Biol, 2007, vol. 56 (pg. 564-77)
78. On the conservation of protein sequences in evolution, Trends Biochem Sci, 2001, vol. 26 (pg. 414-16)
79. Heterotachy, an important process of protein evolution, Mol Biol Evol, 2002, vol. 19 (pg. 1-7)
80. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous, Nature, 2004, vol. 431 (pg. 980-4)
81. Covarion shifts cause a long-branch attraction artifact that unites Microsporidia and Archaebacteria in EF-1α phylogenies, Mol Biol Evol, 2004, vol. 21 (pg. 1340-9)
82. An improved statistical method for detecting heterotachy in nucleotide sequences, Mol Biol Evol, 2006, vol. 23 (pg. 1397-405)
83. Maximum likelihood outperforms maximum parsimony even when evolutionary rates are heterotachous, Mol Biol Evol, 2005, vol. 22 (pg. 2139-41)
84. A call for likelihood phylogenetics even when the process of sequence evolution is heterogeneous, Mol Phylogenet Evol, 2005, vol. 37 (pg. 928-31)
85. Heterotachy and long-branch attraction in phylogenetics, BMC Evol Biol, 2005, vol. 5 pg. 50
86. Likelihood, parsimony, and heterogeneous evolution, Mol Biol Evol, 2005, vol. 22 (pg. 1161-4)
87. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data, Syst Biol, 2004, vol. 53 (pg. 571-81)
88. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process, Mol Biol Evol, 2004, vol. 21 (pg. 1095-109)
89. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology, Nature, 1999, vol. 402 (pg. 402-4)
90. Molecular phylogenetics and the origins of placental mammals, Nature, 2001, vol. 409 (pg. 614-18)
91. Obtaining maximal concatenated phylogenetic data sets from large sequence databases, Mol Biol Evol, 2003, vol. 20 (pg. 1036-42)
92. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba, Proc Natl Acad Sci USA, 2002, vol. 99 (pg. 1414-19)
93. Alternative methods for concatenation of core genes indicate a lack of resolution in deep nodes of the prokaryotic phylogeny, Mol Biol Evol, 2008, vol. 25 (pg. 83-91)
94. Genome-scale approach and the performance of phylogenetic methods, J Phylogen Evol Biol, 2013, vol. 1 pg. 116
95. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences, Syst Biol, 1998, vol. 47 (pg. 61-76)
96. Genome-scale phylogeny and the detection of systematic biases, Mol Biol Evol, 2004, vol. 21 (pg. 1455-8)
97. Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances, Proc Natl Acad Sci USA, 1994, vol. 91 (pg. 1455-9)
98. Inferring phylogenies from DNA sequences of unequal base compositions, Proc Natl Acad Sci USA, 1995, vol. 92 (pg. 11317-21)
99. Estimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution, Proc Natl Acad Sci USA, 1998, vol. 95 (pg. 5899-905)
100. A frequency-dependent significance test for parsimony, Mol Phylogenet Evol, 1995, vol. 4 (pg. 64-71)
101. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis, Mol Biol Evol, 1998, vol. 15 (pg. 871-9)
102. Archaeal phylogeny: reexamination of the phylogenetic position of Archaeoglobus fulgidus in light of certain composition-induced artifacts, Syst Appl Microbiol, 1991, vol. 14 (pg. 364-71)
103. Less is more in mammalian phylogenomics: AT-rich genes minimize tree conflicts and unravel the root of placental mammals, Mol Biol Evol, 2013, vol. 30 (pg. 2134-44)
104. Maximum likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites, Mol Biol Evol, 1993, vol. 10 (pg. 1396-401)
105. Maximum-likelihood models for combined analyses of multiple sequence data, J Mol Evol, 1996, vol. 42 (pg. 587-96)
106. Microsporidia are related to fungi: evidence from the largest subunit of RNA polymerase II and other proteins, Proc Natl Acad Sci USA, 1999, vol. 96 (pg. 580-5)
107. An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics, Syst Biol, 2005, vol. 54 (pg. 743-57)
108. Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees, Syst Biol, 2011, vol. 60 (pg. 117-25)
109. Consensus networks: a method for visualising incompatibilities in collections of trees, Algorithms in Bioinformatics, 2003, Berlin, Springer-Verlag (pg. 165-76)
110. Application of phylogenetic networks in evolutionary studies, Mol Biol Evol, 2006, vol. 23 (pg. 254-67)
111. A survey of combinatorial methods for phylogenetic networks, Genome Biol Evol, 2011, vol. 3 (pg. 23-35)
112. Phylogenetic networks: a tool to display character conflict and demographic history, Afr J Biotechnol, 2011, vol. 10 (pg. 12799-803)
113. Using consensus networks to visualize contradictory evidence for species phylogeny, Mol Biol Evol, 2004, vol. 21 (pg. 1459-61)

© The Author 2014. Published by Oxford University Press. For Permissions, please email:


Which of the following would be least useful as evidence in constructing a phylogenetic tree?

So, the correct answer is 'Environment comparisons'

What type of evidence is primarily used to make phylogenetic trees?

Any DNA, RNA, or protein sequence can be used to generate a phylogenetic tree. But DNA sequences are most commonly used in generating trees today.

What evidence can be used to determine phylogeny?

The fossil record is often used to determine the phylogeny of groups containing hard body parts; it is also used to date divergence times of species in phylogenies that have been constructed on the basis of molecular evidence.

What 4 types of evidence do we use to construct the tree?

Evidence for building trees:

  • DNA and RNA sequences (as well as other molecular traits like the amino acid sequences of proteins). ...

  • Morphology (i.e., structure and form). ...

  • Physiology and biochemical pathways. ...

  • Behavioral and ecological traits.
