Why are monophyletic groups important




















Loosely, a monophyletic taxon is one that includes a group of organisms descended from a single ancestor, whereas a polyphyletic taxon is composed of unrelated organisms descended from more than one ancestor. These loose definitions fail to recognize that all organisms are related; therefore, any conceivable group is logically "monophyletic".

In modern usage, a monophyletic taxon is defined as one that includes the most recent common ancestor of a group of organisms and all of its descendants [as in a]. Such groups are sometimes called holophyletic. It is also possible to recognize a paraphyletic taxon as one that includes the most recent common ancestor, but not all of its descendants [as in c]. Although its ubiquity has been debated Ellstrand et al. Below the species level, barriers to gene flow are generally believed to be weak; thus, the potential for movement of alleles between taxa such as populations is obvious.

One consequence of gene flow between taxa is the production of reticulate phylogenetic relationships. The cycles present in these graphs are a result of the algorithm used to summarize the set of acceptable alternative solutions for a single data set. Cycles within a network can arise as a consequence of conflicting phylogenetic signal in the underlying data set. Conflicting signal may be due to reticulate processes such as hybridization or recombination but may also be caused by routine homoplasy, present in virtually all data sets.

It is difficult to distinguish true reticulate signal from routine homoplasy using networks. In spite of their promise, most network methods have not been evaluated for their ability to reveal correct relationships when applied to data sets with known reticulate history.

Cassens et al. Moreover, tree-based methods continue to be routinely used for the analysis of potentially reticulate data sets despite numerous, clearly articulated concerns. It is therefore important (1) to determine the degree and manner in which an arguably illegitimate application of tree-based analysis may bias conclusions, and (2) to test whether alternative methods of analysis offer a satisfactory solution.

Past studies have explored the consequences of hybridization on tree-based, cladistic analysis (Bremer and Wanntorp; Nelson and Platnick; Funk). The remainder of the time, the position of the hybrid in the tree and its effect on overall tree topology were less predictable. Although essential to our present understanding of the phenomenon, these studies provide incomplete guidance if the goal is to distinguish terminal monophyletic lineages from taxa with a reticulate history because (1) they relied on morphological characters, which are prone to nonindependence, unpredictable patterns of inheritance in hybrids (Rieseberg and Ellstrand), and other features that make them undesirable for phylogeny reconstruction (Scotland et al.).

In cases where postzygotic isolating mechanisms operate to prevent gene flow between taxa e. However, complex backcrossed individuals will predominate over F1 hybrids when biological barriers to reproduction are weak e. This is because the majority of potential mates for a rare F1 hybrid will be members of the local population and because selection may favor resident, as opposed to migrant, genotypes (Stebbins). Because F1 hybrids will often be rare relative to backcrossed individuals, understanding the effect of F1 hybrids on phylogeny reconstruction is only one component of verifying the applicability of tree-based methods to reticulate data.

Two studies have considered the effect of including simulated recombinant DNA sequences in phylogenetic analyses (Schierup and Hein; Posada and Crandall). These studies are relevant because a single recombinant DNA sequence mimics the chimeric assemblage of character states that would be expected if many unlinked loci were scored in an individual with reticulate ancestry. Schierup and Hein found that the inclusion of recombinant sequences in distance-based and maximum likelihood (ML)-based phylogenetic analyses caused a predictable change in tree shape, with longer terminal branches and shorter internal branches than were found when recombinant sequences were not present.

They did not, however, consider the effect of recombinant sequences on the accuracy of the reconstructed phylogeny. Posada and Crandall examined the accuracy of tree-based analytical methods when recombinant sequences with characteristics of F1 hybrids, as well as back-crossed individuals, were analyzed. Sequences with characteristics of back-crossed individuals were shown to be less likely to impact the accuracy of phylogeny reconstruction.

However, the conclusions of Posada and Crandall are not easily extended to natural populations because the fate of only a single recombinant sequence within a larger tree of non-recombinant sequences was considered. The effect of reticulate taxa as opposed to single individuals or sequences on phylogeny reconstruction is not well understood.

Reticulate taxa are likely to pose special problems for empirical studies at the species level and below because the distribution of character states among individuals of such taxa is not readily predictable unlike with F1 hybrids. In the simplified case of unidirectional gene flow between two taxa, hybridization may be followed by introgression of character states from a donor taxon into a recipient taxon, resulting in conflicting phylogenetic signal.

Depending on random genetic drift and the strength of selection, only a small number of the character states from the donor taxon that are found in the F1 hybrid individual would be expected to become fixed in the recipient taxon. Accordingly, the magnitude of conflicting signal present in individuals of a recipient taxon will be less than that present in an F1 hybrid individual, potentially making taxa with reticulate histories difficult to identify, in spite of significant gene flow.

Furthermore, during the intermediate period between the hybridization event and fixation of the introgressed character states, variation in the proportion of character states that are traceable to the donor taxon will be observed among individuals in the recipient taxon.

Variation in conflicting signal levels among individuals within a single taxon is potentially problematic for methods that rely on conflicting signal to indicate historical reticulation. The presence of interindividual variation in conflicting signal levels suggests that many individuals per taxon must be included in an analysis to avoid conclusions that are an artifact of inadequate sampling. In this study, we evaluate the ability of six analytical procedures to distinguish terminal monophyletic groups from reticulate taxa in data sets simulated with a predefined pattern of historical reticulation.

We employ a population-level model for simulating multilocus data that allows multiple, complex back-crossed individuals to be sampled during a continuous process of introgressive hybridization. We include an assessment of tree-based methods as well as two phenetic methods that do not assume data fit a tree-like structure.

In addition, we evaluate the performance of two network methods and reverse successive weighting (RSW; Trueman), a tree-based procedure intended to identify conflicting phylogenetic signal in data sets derived from reticulate historical processes.

A five-taxon tree was used because it is the simplest tree for which the effect of gene flow on topological relationships between taxa other than those directly involved in gene exchange can be evaluated. Four topologically distinct alternatives for unidirectional gene flow between taxa were considered. Sim1 modeled hybridization between sister taxa and Sim2 and Sim3 explored hybridization between increasingly divergent taxa, both in terms of genetic distance and the topological relationship between them.

Sim4, in which gene flow occurred between the same taxa as Sim2, but in the opposite direction, was used to examine whether the direction of gene flow within a tree affected the ability to infer a reticulate history for the recipient taxon. Each taxon included 10 individuals. Substitution probability for starting data sets is shown along internodes.

Arrows indicate direction of gene flow from donor to recipient taxon. Hybridization, which produced a single F1 individual in the recipient taxon, was followed by 10 generations of random mating within taxa, after which data sets were sampled.

Five-taxon starting trees were created by simulating four character data sets using Seq-Gen (Rambaut and Grassly). The number of substitutions per site was set to 0. The substitution model used was equivalent to the Jukes-Cantor model (Jukes and Cantor), but with only two character states allowed so that the resulting matrices contained binary data. Although this model was developed to describe DNA sequence evolution, the characters were not subsequently treated as a linked DNA sequence but rather as haploid, unlinked loci, subject to recombination and genetic drift during simulated reproduction.
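A two-state analogue of the Jukes-Cantor model can be illustrated with a short sketch. This is not Seq-Gen's implementation; it only shows what a symmetric two-state model implies, assuming a branch length d measured in expected substitutions per site. The function names `change_probability` and `evolve_along_branch` are hypothetical.

```python
import math
import random


def change_probability(d):
    """Probability that a binary character differs at the two ends of a branch.

    Assumes a symmetric two-state model with branch length d given in
    expected substitutions per site: p = 0.5 * (1 - exp(-2 * d)).
    """
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def evolve_along_branch(states, d, rng):
    """Return a copy of a binary character vector after evolving along one branch.

    Each character is treated independently and flips with the probability
    given by change_probability(d).
    """
    p = change_probability(d)
    return [1 - s if rng.random() < p else s for s in states]


if __name__ == "__main__":
    rng = random.Random(1)
    ancestor = [0] * 20
    # The branch length 0.1 is an arbitrary illustrative value.
    print(evolve_along_branch(ancestor, 0.1, rng))
```

The change probability approaches 0.5 on very long branches, which is the saturation point for binary data.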

The population size (N) of each taxon was set to 10 by replicating the starting haplotypes derived from Seq-Gen. Reproduction followed a Wright-Fisher model with constant population size, fully random mating (including selfing), and nonoverlapping generations. The simulation proceeded as follows:

1. Randomly choose one haplotype from the recipient taxon (Parent 1) to be hybridized with the donor taxon (Parent 2).
2. Produce a progeny haplotype by randomly selecting, for each character, the character state present in either Parent 1 or Parent 2.
3. Produce nine additional progeny by choosing both parents at random from within the recipient taxon. Assemble progeny haplotypes as in step 2.
4. Allow 10 generations of random mating within the recipient taxon, then acquire data set for analysis.
5. Repeat steps 1 to 4 an additional times so that data sets corresponding to total hybridization events during consecutive generations are acquired per run.
6. Repeat steps 1 to 5 so that five replicate runs are performed for each of the four starting data sets for all topological models of gene flow.

All data sets contained five taxa and 50 individuals. Two parsimony-based procedures were used to analyze the data. Five hundred bootstrap replicates were performed.
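The simulation steps listed above can be sketched in Python as follows. This is a minimal illustration and not the authors' code. It assumes that haplotypes are lists of 0/1 states at unlinked loci and that Parent 2 is one haplotype drawn at random from the donor taxon. The function names and the `n_events` parameter are hypothetical, and the number of hybridization events per run is left to the caller because it is not recoverable from the text.

```python
import random


def recombine(parent1, parent2, rng):
    """Build one haploid progeny: each unlinked locus comes from either parent."""
    return [rng.choice((a, b)) for a, b in zip(parent1, parent2)]


def random_mating(population, rng, size=None):
    """One Wright-Fisher generation: parents drawn with replacement (selfing allowed)."""
    size = len(population) if size is None else size
    return [
        recombine(rng.choice(population), rng.choice(population), rng)
        for _ in range(size)
    ]


def hybridization_event(recipient, donor, rng, drift_generations=10):
    """Steps 1 to 4: one F1 hybrid, nine resident progeny, then random mating."""
    parent1 = rng.choice(recipient)          # step 1: recipient haplotype
    parent2 = rng.choice(donor)              # step 1: donor haplotype (assumed)
    f1 = recombine(parent1, parent2, rng)    # step 2
    progeny = [f1] + random_mating(recipient, rng, size=len(recipient) - 1)  # step 3
    for _ in range(drift_generations):       # step 4
        progeny = random_mating(progeny, rng)
    return progeny


def run(recipient, donor, n_events, seed=0):
    """Step 5: repeat hybridization and keep one sampled data set per event."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_events):
        recipient = hybridization_event(recipient, donor, rng)
        samples.append([haplotype[:] for haplotype in recipient])
    return samples


if __name__ == "__main__":
    recipient = [[0] * 20 for _ in range(10)]
    donor = [[1] * 20 for _ in range(10)]
    for t, sample in enumerate(run(recipient, donor, n_events=5, seed=1), start=1):
        donor_states = sum(sum(h) for h in sample)
        print(t, donor_states)
```

Replicate runs (step 6) correspond to calling `run` with different seeds on the same starting data.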

Genetic distances were calculated using mean character differences. Two network methods were used. Split networks were calculated using SplitsTree 4. For simplicity, splits are named here using only the smaller of the two possible subsets of taxon names e. Two nonhierarchical statistical procedures were used: Fst, and principal coordinate analysis followed by nonparametric modal clustering (PCO-MC).

Although the choice of distance coefficient will affect the calculation of principal coordinates, Jaccard distances were used because they have been argued to be appropriate for binary multi-locus data (Landry and LaPointe) and are commonly used for analysis of dominant marker data sets.
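The two distance measures mentioned above can be illustrated for a pair of binary haplotypes. This is a sketch under the assumption that state 1 marks presence of a marker and that the Jaccard distance is one minus the Jaccard similarity. The function names are hypothetical, and the convention of returning 0.0 when neither haplotype carries any 1 state is an assumption as well.

```python
def mean_character_difference(x, y):
    """Proportion of characters at which two haplotypes differ."""
    if len(x) != len(y):
        raise ValueError("haplotypes must have the same number of characters")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def jaccard_distance(x, y):
    """One minus shared presences divided by presences in either haplotype.

    Shared absences (0, 0) are ignored, which is why this coefficient is
    often chosen for dominant marker data.
    """
    if len(x) != len(y):
        raise ValueError("haplotypes must have the same number of characters")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if either == 0:
        return 0.0  # assumed convention for two all-zero haplotypes
    return 1.0 - both / either


if __name__ == "__main__":
    x = [1, 1, 0, 0]
    y = [1, 0, 1, 0]
    print(mean_character_difference(x, y))  # 2 of 4 characters differ
    print(jaccard_distance(x, y))           # 1 shared presence out of 3
```

The two measures disagree whenever shared absences are common, because only the mean character difference counts them as agreement.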

Simulate five-taxon, 50-terminal starting data sets as described above. For each PCO data set, determine the minimum value for R at which four clusters are present.

Values of R that are smaller than this minimum value cause the correct inference of five clusters in a given starting data set. Find the smallest value in the list of R values obtained in step 3. This is the value for R used in this study.

To examine the level of conflicting phylogenetic signal in the simulated data sets using a tree-based approach, the program RSW1. For this study, the cutoff percent value in RSW1. In this way, levels of conflicting support could be measured for all clades at all time points.

Bootstrap searches conducted by RSW1. Conclude that overall data set contains conflicting signal when trees from step 3 are significantly different from step 1 tree. In order to identify regions in which type I error (the erroneous rejection of the null hypothesis, H0: the recipient taxon is not a distinct evolutionary lineage) occurred, a cutoff criterion was established for each analytical procedure, beyond which it was claimed that a reticulate history could be inferred.

In practice, adopting this criterion meant that a reticulate history was inferred for the recipient taxon whenever one or more members was found nested within the donor taxon, or vice versa. For Fst analyses, reticulate history was inferred when Fst dropped below 0. The effect of this test was to identify the point at which donor and recipient taxa were no longer statistically distinct from one another based on the minimum spanning network.

The time to inferred reticulate history (TIRH) was measured as the number of hybridization events required before cutoff criteria were met. TIRH is therefore a measure of the rate of erosion of type I error caused by persistent gene flow. For a given procedure, differences in TIRH observed between runs could be due to the topological model of gene flow, the starting data set, or the stochastic processes of recombination and drift built into the model.
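As a rough illustration of how TIRH could be read off a simulated time course, the sketch below scans a series of per-sample results and reports the first hybridization event at which a cutoff criterion is met. It assumes that sample k is taken after k hybridization events. The function names are hypothetical, and the cutoff passed to `tirh_from_values` is a placeholder because the threshold values used in the study are not recoverable from the text.

```python
def time_to_inferred_reticulate_history(criterion_met):
    """Return the 1-based index of the first sample meeting the criterion.

    criterion_met is a sequence of booleans, one per sampled data set,
    ordered by number of hybridization events. Returns None when the
    criterion is never met during the run.
    """
    for events, met in enumerate(criterion_met, start=1):
        if met:
            return events
    return None


def tirh_from_values(values, cutoff):
    """TIRH for a statistic that signals reticulation by dropping below a cutoff."""
    return time_to_inferred_reticulate_history(v < cutoff for v in values)


if __name__ == "__main__":
    # Illustrative values only, not results from the study.
    statistic = [0.9, 0.7, 0.4, 0.2, 0.1]
    print(tirh_from_values(statistic, cutoff=0.5))
```

Returning None for runs that never meet the criterion keeps such runs distinguishable from runs that meet it at the final sample.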

NeighborNet and RSW can also indicate reticulation by exposing conflicting signal in a data set. However, this evidence of reticulation could not be fairly evaluated using a single cutoff criterion.

A different approach, involving sensitivity analysis of type I and type II error across the simulation time course, was used. For NeighborNet, bootstrap support values for the non-trivial splits described previously were collected.

Support values for the corresponding clades from the overall and secondary signal partitions of RSW analyses were likewise collected. The relative support values for two contradictory splits or for contradictory clades in the overall and secondary RSW data partitions indicate the magnitude of conflicting phylogenetic signal. The conclusion that taxon A was as closely related to taxon B as it was to taxon C, and hence that taxon A may have a reticulate history, would then be supported.

Type I error occurred when such evidence of reticulate history was not present (causing the null hypothesis to be falsely rejected), and type II error occurred when conflicting signal suggested that a taxon known to be monophyletic had a reticulate history. For each cutoff percent and sampled time point, the number of correct inferences of reticulate history was determined for the 20 data sets available.

Similarly, the number of incorrect inferences i. The probability of success correctly identifying the reticulate taxon , the probability of failure determining that a monophyletic taxon had a reticulate history, i.

The effect of gene flow between taxa on tree-based analyses is summarized using results from the parsimony analysis of Sim3 data. The topological conditions for gene flow examined in Sim1, Sim2, and Sim4 produced a subset of the responses seen in the more complex Sim3.

Results from NJ methods were qualitatively similar to parsimony. Differences in the topology of the strict consensus trees were observed between replicate runs at some sampling points. Given this variation, majority-rule consensus trees were used to determine the most frequently recovered clades over the time course and are used as a tool to generalize the topological effects of gene flow between taxa.

Figure 2 shows the topological changes that occurred during parsimony analysis of Sim3 data. The first observed change from the starting condition Fig. The two alternative topologies at this time point either contained polytomy ABC or were like the tree given in Figure 2c. At this point, taxon A had assumed a new position, forcing the creation of clade BC.

No topological changes occurred in Sim1 because donor and recipient were sister taxa.

Topological effect of ongoing, unidirectional gene flow between taxa on parsimony analysis. The initial topology and direction of gene flow are shown in (a), and the final, stable topology is shown in (f). The recipient taxon A is marked by a bold branch. Arrows in (b) and (c) indicate the tree position to which taxon A moved during the next topological change. Time (t), measured in number of hybridization events, is shown at the bottom of each figure.

Majority-rule consensus trees are labeled with clade frequency at nodes. Topological rearrangements among internal branches occurred soon after the start of hybridization and were followed by a longer period where support for the donor and recipient taxa as monophyletic groups gradually eroded. Alternate topologies included clade A nested within clade D, clade D nested within clade A, and a topology like Figure 2f.

After this point, the majority of the strict consensus trees from the 20 runs showed an unresolved polytomy containing all individuals from taxa A and D. At this point, taxa A and D were no longer identified as distinct monophyletic groups in any of the replicate data sets. Although the exact timing of topological changes varied between replicate runs, all showed an ordered progression through the phases described above.

Figure 3 shows trees from a single representative Sim3 run taken at time points described for Figure 2. Characters introduced to recipient taxon A from donor taxon D caused the appearance of false hierarchical structure within taxon A. Time series showing the signature of reticulate evolution on four analytical methods. Curved arrows indicate direction of gene flow between taxa.

Data sets were sampled at time points indicated at top. Taxa (populations) are indicated with capital letters; individuals are in lowercase and are numbered. Relevant bootstrap support values are shown for parsimony and NeighborNet analyses. Relevant branch lengths are shown for MSN analyses. Bold branches in the MSN panels show connections between taxa. These branches were typically 10 to times longer than the branches within recipient taxon A.

The first three principal coordinates are plotted for PCO analyses. Bootstrap support values were calculated for all data sets to determine statistical support for the tree structure during the simulations. Figure 4 shows that support for the monophyly of donor and recipient taxa, on average, eroded gradually, subsequent to internal topological changes. During the period of topological change (which occurred soon after gene flow began), internal branches received high bootstrap support Fig.

Support for selected clades after parsimony analysis of Sim2 and Sim3 data. Criterion used for searches shown at bottom. In all graphs, x-axis is number of hybridization events since start of simulation.

Citation: Baum, D. Nature Education 1(1). Phylogenies are a fundamental tool for organizing our knowledge of the biological diversity we observe on our planet. But how exactly do we understand and use these devices?

The Lexicon of Phylogenetic Inference

A node represents a branching point from the ancestral population. Terminals occur at the topmost part of each branch, and they are labeled by the taxa of the population represented by that branch. Figure 4: A monophyletic group, sometimes called a clade, includes an ancestral taxon and all of its descendants.
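The definition of a monophyletic group given above can be checked mechanically on a small rooted tree. The sketch below assumes a tree written as nested tuples with tip names as strings. A set of tips is treated as monophyletic when it equals the full set of tips descended from the group's most recent common ancestor. The function names and the example tree are hypothetical.

```python
def tips(node):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= tips(child)
    return found


def tips_of_mrca(node, group):
    """Return all tips descended from the most recent common ancestor of group."""
    children = [] if isinstance(node, str) else node
    for child in children:
        if group <= tips(child):
            # The whole group sits inside this child, so the ancestor is deeper.
            return tips_of_mrca(child, group)
    return tips(node)


def is_monophyletic(tree, group):
    """True when group contains every descendant of its common ancestor."""
    group = set(group)
    if not group or not group <= tips(tree):
        raise ValueError("group must be a non-empty subset of the tree's tips")
    return tips_of_mrca(tree, group) == group


if __name__ == "__main__":
    tree = (("A", "B"), ("C", ("D", "E")))
    print(is_monophyletic(tree, {"D", "E"}))       # D and E share an exclusive ancestor
    print(is_monophyletic(tree, {"C", "D"}))       # E descends from the same ancestor
    print(is_monophyletic(tree, {"C", "D", "E"}))
```

A group that fails this check leaves out at least one descendant of its common ancestor, so it cannot be isolated from the rest of the tree by cutting a single branch.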

A monophyletic group can be separated from the root with a single cut, whereas a non-monophyletic group needs two or more cuts.

Figure 6: Types of phylogenetic trees. These trees depict equivalent relationships despite being different in.

Figure 7: Relationships on a phylogenetic tree can be depicted in multiple ways. These trees depict equivalent relationships despite the fact that certain internal branches have been rotated so that the order of the tip labels is different.
