Hi, I have been working on a phylogenomic analysis for quite some time now, and I conducted a progressive removal of the fastest-evolving sites in my alignment. For this, I inferred site-specific substitution rates using IQ-TREE v2.0.3 (flag --rate, substitution model LG+C60+G).

However, the profile of the inferred rates appears to follow the Gamma categories (see "Rates from IQTREE"). To be more precise, the rate values of the plateaus exactly match the Gamma category rates inferred during the tree search, while other sites have values in between the category rates.

I also tried the ML-based rate inference method (flag --mlrate). There the distribution of values is closer to what I expected, following a roughly exponential curve (see "ML Rates from IQTREE"), although there is a plateau that I find odd. These rates also do not seem comparable across inferences. I inferred ML rates on the same sites for the slow- and fast-evolving taxa in my species tree separately and compared them. I find that more sites evolve faster in the slow-evolving taxa than in the fast-evolving taxa than the other way around (see "ML Rates from IQTREE comparison").

When I compared these rates with the ones inferred using Dist_Est (https://www.mathstat.dal.ca/~tsusko/doc/dist_est.pdf), the latter looked more plausible and more comparable (see "Rates from Dist_Est" and "Rates from Dist_Est comparison").

All of this to ask: is there something wrong in my command for inferring rates with IQ-TREE (I am thinking of removing Gamma from the substitution model), and have you seen such a distribution of rates before?

Sorry for the humongous message and for any confusion. Don't hesitate to ask for anything (explanation or raw data). Thank you very much for your time and your help! :)
Replies: 1 comment 1 reply
That's an interesting observation indeed! I wasn't aware of Dist_Est until now, so thanks for that. I can only comment on the two options you are using in IQ-TREE, noting that I don't really have any expectations, as we don't know the ground truth.
I have no knowledge of Dist_Est and thus can't comment on the top-right figure. I also don't know which taxa are the fast- and slow-evolving ones, so I can't comment on the other figures either. In summary, I don't see anything unexpected [about the options of IQ-TREE].

The --rate option (or -wsr in IQ-TREE 1) applies the "empirical Bayesian approach": it computes the site rates from the posterior probabilities of each site belonging to each of the rate categories (4 in this case, as you are using +G, which is shorthand for +G4). From the output .rate file you can grab either the posterior mean or the posterior mode. You are plotting the posterior mean here (top-left figure). This is not unexpected, as there…
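To make the posterior mean versus posterior mode distinction concrete, here is a minimal Python sketch. It is not IQ-TREE's code: the function names, the four category rates, and the posterior vectors are all hypothetical values chosen for illustration. It shows that a site whose posterior sits entirely on one category gets exactly that category's rate (a plateau), while a site with an ambiguous posterior gets a value in between two category rates.

```python
def posterior_mean_rate(posteriors, category_rates):
    """Posterior mean: category rates weighted by the site's posteriors."""
    return sum(p * r for p, r in zip(posteriors, category_rates))


def posterior_mode_rate(posteriors, category_rates):
    """Posterior mode: rate of the category with the highest posterior."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return category_rates[best]


# Hypothetical rates for a 4-category discrete Gamma (+G4); not real output.
category_rates = [0.1, 0.5, 1.0, 2.4]

# Hypothetical per-site posteriors over the four categories.
confident_site = [0.0, 0.0, 0.0, 1.0]   # all weight on the fastest category
ambiguous_site = [0.0, 0.0, 0.5, 0.5]   # split between two categories

# The confident site lands exactly on a category rate (a plateau).
print(posterior_mean_rate(confident_site, category_rates))
# The ambiguous site lands between the rates of categories 3 and 4.
print(posterior_mean_rate(ambiguous_site, category_rates))
# The mode always returns one of the category rates.
print(posterior_mode_rate(confident_site, category_rates))
```

With many sites having posteriors concentrated on a single category, a sorted plot of posterior means would show flat stretches at the category rates joined by short ramps, which is consistent with the pattern described in the question.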