Can AliSim start simulation from a user-defined ancestral sequence? (& how does mimicking real alignments work?) #212
Hi there,
Thank you for using AliSim. Here are my answers to your questions.
1. "Does this mean that the simulated sequences will be somewhat/reasonably
similar to the input sequences?"
=> Yes, AliSim tries to simulate sequences that are as similar as possible
to the input sequences. Internally, AliSim runs IQ-TREE to infer a
tree and a model (which is defined by amino acid frequencies and
substitution rates). AliSim first generates an ancestral sequence at the
root of the tree using the amino acid frequencies from the model. Then,
AliSim simulates new sequences evolving along the tree using the model. So
we can expect that the frequencies of amino acids among simulated sequences
are similar to those of real sequences from the input alignment.
2. "Alternatively, does AliSim support using a user-defined ancestral
sequence as the starting point?"
=> Yes, you can use the option '--root-seq <ALN_FILE>,<SEQ_NAME>' where
<ALN_FILE> specifies the alignment that contains the root sequence, and
<SEQ_NAME> is the name of the root sequence. For more detail, please have a
look at our user manual:
http://www.iqtree.org/doc/AliSim#command-reference
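As a minimal sketch of the '--root-seq' syntax described above: the file names (ancestors.fasta, tree.nwk), the sequence name (ancestor_1), the output prefix, and the WAG model are all hypothetical placeholders, not values from this thread. The snippet only prints the command so it can be checked before running.

```shell
#!/bin/sh
# Hypothetical names; replace them with your own files and sequence name.
ROOT_ALN="ancestors.fasta"   # alignment file that contains the root sequence
ROOT_NAME="ancestor_1"       # name of the root sequence inside ROOT_ALN

# Simulate along a given tree, starting from the user-defined root sequence.
CMD="iqtree2 --alisim simulated_from_root -t tree.nwk -m WAG --root-seq ${ROOT_ALN},${ROOT_NAME}"

echo "$CMD"    # dry run: show the command
# $CMD         # uncomment to actually run it
```

The two parts of '--root-seq' are joined by a comma with no spaces, matching the '<ALN_FILE>,<SEQ_NAME>' form given above.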
I hope this is helpful. Please feel free to ping us if you have any questions
while using AliSim/IQ-TREE.
Thanks,
Nhan
On Wed, 29 May 2024 at 4:19 PM, sz-1002 wrote:
Hi!
I have a question about AliSim, specifically the mimicking real alignments
part.
The documentation says it simulates alignments that are of the same length
as the input, using the inferred substitution model and tree. Does this
mean that the simulated sequences will be somewhat/reasonably similar to
the input sequences (e.g. having similar amino acid distributions at each
site)?
Alternatively, does AliSim support using a user-defined ancestral sequence
as the starting point?
Thank you very much for your help!
Answer selected by
roblanf
Hi,
"Does ALN_FILE have to be the same file that is supplied by "-s" (in other
words, am I just picking the root position within the input alignment to
mimic, or can I supply a different sequence not in the input alignment)?"
=> No, the <ALN_FILE> does not have to be the alignment specified by "-s";
it can be a different file.
"Also I assume if I supply a rooted tree, AliSim will start simulation from
the root node, but if the supplied tree is unrooted (or when no tree is
supplied) AliSim will try to root the tree first?"
=> If provided with a rooted tree, AliSim will start the simulation from
the root node. If the tree (specified by users, randomly generated, or
inferred from an input alignment) is unrooted, AliSim will first create a
"fake" root. For example, given a tree: (A:0.1,B:0.2,C:0.5), AliSim first
creates a "fake" root as the parent of nodes A, B, and C.
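To see which case a given tree falls into, one rough check is to count the children of the outermost Newick node. The sketch below assumes the common convention that three or more top-level children (as in the example above) means the tree is read as unrooted, while exactly two means it is read as rooted. The helper name top_level_children and the second, rooted example tree are made up for illustration, and the parser is deliberately simple: it does not handle quoted labels that contain commas or parentheses.

```shell
#!/bin/sh
# Count the children of the outermost node of a one-line Newick string.
top_level_children() {
  printf '%s\n' "$1" | awk '
    {
      depth = 0; commas = 0
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "(") depth++
        else if (c == ")") depth--
        else if (c == "," && depth == 1) commas++   # separator at top level
      }
      print commas + 1
    }'
}

top_level_children "(A:0.1,B:0.2,C:0.5);"           # prints 3 (read as unrooted)
top_level_children "((A:0.1,B:0.2):0.05,C:0.45);"   # prints 2 (read as rooted)
```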
Cheers,
Nhan
On Thu, May 30, 2024 at 1:57 PM sz-1002 wrote:
Hi Nhan,
Thank you for the quick and helpful response!
Just to clarify, in the option "--root-seq <ALN_FILE>,<SEQ_NAME>", does
ALN_FILE have to be the same file that is supplied by "-s" (in other words,
am I just picking the root position within the input alignment to mimic,
or can I supply a different sequence not in the input alignment)?
Also I assume if I supply a rooted tree, AliSim will start simulation from
the root node, but if the supplied tree is unrooted (or when no tree is
supplied) AliSim will try to root the tree first?
Thank you so much!
Hi,
I have a few comments regarding your command:
1. "-t rooted_tree.nwk"
=> Here, IQ-TREE uses your tree as a starting tree for the tree search.
This means that the final maximum likelihood tree, used for simulating new
sequences, could differ from your input tree. To fix the topology of the
maximum likelihood tree (i.e., to skip the tree search), you can use "-te
rooted_tree.nwk". If you want to keep the branch lengths unchanged as well,
you can add "-blfix". In summary, to keep your tree unchanged (both
topology and branch lengths), use "-te rooted_tree.nwk -blfix".
2. About your input rooted tree
=> You are using the feature to simulate new alignments that mimic your
input alignment. Therefore, AliSim first runs IQ-TREE to infer a tree and
model parameters, then uses that tree and model to simulate new sequences.
In this case, if you input a rooted tree, IQ-TREE will first unroot your
tree to perform the inference. Given the unrooted tree from IQ-TREE, AliSim
will create a "fake" root (as I described in the previous email) before
simulating the sequences. Even if you use "-te rooted_tree.nwk -blfix", we
can only keep the topology and branch lengths unchanged, but your tree will
be unrooted. If the root is crucial to your research, you might consider:
(1) manually running IQ-TREE to infer the model parameters (fixing tree
topology and branch lengths), then (2) running AliSim to simulate
alignments from your (rooted) tree and the model parameters inferred by
IQ-TREE.
3. "Is this because I'm specifying an indel rate that is too high or too
low? (What is a good range of indel rates in general?)"
=> I don't have a specific "good" range for indel rates; you may need to
check the literature. However, according to Cartwright, 2009
<https://academic.oup.com/mbe/article/26/2/473/1034550>, "The estimated
relative rates are about 12–16 indels per 100 substitutions". If so, you
may consider specifying the insertion and deletion rates so that their sum
is between 0.12 and 0.16.
4. AliSim crashes
=> I found a bug in AliSim when users simulate multiple alignments with
indels. When simulating the second alignment, it attempts to reuse a
variable that was deleted after the first alignment. I have fixed this bug,
and it will be included in the next release of IQ-TREE. For the current
version, you can remove the option "--num-alignments 3" and run AliSim
three times (with three different seed numbers). If this is not convenient,
please let me know, and I can send you an unofficial build of IQ-TREE (with
the bug fix included) to use while waiting for the next release.
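Putting points 1, 3, and 4 together, the workaround could look roughly like the sketch below: one AliSim run per seed instead of "--num-alignments 3", with "-te rooted_tree.nwk -blfix" in place of "-t rooted_tree.nwk". The other options are copied from the original command. The seeds 1002 and 1003, the per-seed output names, and the equal split of the indel rates (0.07 + 0.07 = 0.14) are illustrative assumptions only. By default the script prints the commands without running them.

```shell
#!/bin/sh
INDEL="0.07,0.07"        # hypothetical insertion,deletion rates (sum 0.14)
SEEDS="1001 1002 1003"   # hypothetical seeds; any three distinct values work

# Check that the two indel rates sum to a value between 0.12 and 0.16.
indel_sum_check() {
  printf '%s\n' "$INDEL" | awk -F, '{
    s = $1 + $2
    r = (s >= 0.12 && s <= 0.16) ? "in range" : "out of range"
    print r
  }'
}

# Print one AliSim command for the given seed ($1).
alisim_cmd() {
  printf 'iqtree2 --alisim test_alignment_mimic_seed%s -s test.aln.fasta -m "WAG+F+I+G4" -te rooted_tree.nwk -blfix --out-format fasta --write-all --seed %s --no-unaligned -redo --indel "%s"\n' "$1" "$1" "$INDEL"
}

indel_sum_check
for seed in $SEEDS; do
  alisim_cmd "$seed"                 # dry run: print the command only
  # eval "$(alisim_cmd "$seed")"     # uncomment to actually run it
done
```

Giving each run its own output name keeps the three simulated alignments from overwriting one another.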
Hope it helps!
Cheers,
Nhan
On Fri, May 31, 2024 at 8:45 PM sz-1002 wrote:
Hi Nhan,
Thanks a lot for the reply! Sorry I have yet another question...
I tested AliSim on some sequences using the command iqtree2 --alisim
test_alignment_mimic -s test.aln.fasta -m "WAG+F+I+G4" -t rooted_tree.nwk
--num-alignments 3 --out-format fasta --write-all --seed 1001
--no-unaligned -redo --indel "0.0001,0.0003", and got the following
error: (copying the last few lines in the log file)
Model of rate heterogeneity: Invar+Gamma with 4 categories
Proportion of invariable sites: 0.000
Gamma shape alpha: 0.952
Category Relative_rate Proportion
1 0.127 0.250
2 0.462 0.250
3 0.992 0.250
4 2.419 0.250
Relative rates are computed as MEAN of the portion of the Gamma distribution falling in the category.
ERROR: Opps! Insertion occurs at an invalid position. There is something wrong!
In another run (with larger indel rates), I got:
ERROR: genometree.cpp:298: void GenomeTree::exportReadableCharacters(vector<short> &, int, vector<std::string> &, std::string &): Assertion `(num_sites_per_state == 1 ? (pos_new + node->length) : ((pos_new + node->length) * num_sites_per_state)) <= output.length()' failed.
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: test.aln.fasta.log
ERROR: *** Alignment files (if possible)
run_alisim.sh: line 3: 1504 Aborted iqtree2 --alisim test_alignment_mimic -s test.aln.fasta -m "WAG+F+I+G4" -t rooted_tree.nwk --num-alignments 3 --out-format fasta --write-all --seed 1001 --no-unaligned -redo --indel "0.001,0.003"
These errors seem to be related to the "--indel" option, as everything
finishes successfully if I don't include this option. Is this because I'm
specifying an indel rate that is too high or too low? (What is a good range
of indel rates in general?)
Here are the sequence and tree files used:
https://drive.google.com/drive/folders/1YOibeqjosbZ1H1qb-sFUJeZryG32rCY0?usp=sharing
Would you mind taking a look at what could be wrong here? Thank you very
much!