Hi,
I'm using "IQ-TREE multicore version 2.3.4 COVID-edition for Linux x86 64-bit built Apr 26 2024" to identify the optimal model of sequence evolution for a set of coronavirus genomes. To do so, I'm using the following command:
iqtree2 -s coronaviridae_n2529_catcore_aln_trim20.fasta --seqtype DNA -m MF --mrate E,I,G,*G,I+G,R,*R,I+R,H,*H -cmax 15 --mtree --merit BIC --safe -T 16
Most of the more complex models were processed successfully, but K81*H4 caused the program to abort with the following lines:
The original log file was 9.88 GB, so I had to crop it. I did so by removing many thousands of lines with the following text:
What might be the problem? Is there a need to repeat the warning mentioned above many times for each model?
All the best,
Lars
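For a log of this size, collapsing runs of identical lines is one way to shrink it without deleting anything by hand. The following is a minimal Python sketch, not an IQ-TREE feature; the function names are illustrative, and it assumes the repeated warnings appear as consecutive, byte-identical lines.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def collapse_repeats(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (line, count) for each run of identical consecutive lines."""
    stripped = (raw.rstrip("\n") for raw in lines)
    for line, run in groupby(stripped):
        yield line, sum(1 for _ in run)


def write_collapsed(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, keeping one copy of each repeated run."""
    with open(src_path, errors="replace") as src, open(dst_path, "w") as dst:
        for line, count in collapse_repeats(src):
            suffix = f"  [repeated {count} times]" if count > 1 else ""
            dst.write(f"{line}{suffix}\n")
```

Because the file is read as a stream, memory use stays small even for a multi-gigabyte log. A call such as `write_collapsed("run.log", "run.collapsed.log")` (both file names hypothetical) would produce the reduced copy.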
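It can be due to the optimiser being unable to fit the branch-length mixture model (*H or +H). Your data set is very large:
Alignment has 2529 sequences with 12074 columns, 10858 distinct patterns
Each +H class will add 2*2529-3 = 5055 parameters (for branch lengths), so an H4 model means 5055*4 = 20220 parameters (!). That's overkill for IQ-TREE. I'd recommend not using such complex models. You have "only" ~12K sites, which is not enough to resolve such a complex model. That's why there are lots of WARNINGs about numerical issues (they are printed many times for a reason).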
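As a quick check of that arithmetic, here is a minimal Python sketch. It assumes an unrooted, fully resolved tree (which has 2n-3 branches for n sequences) and counts only the branch-length parameters, ignoring substitution rates and class weights; the function names are illustrative and not part of IQ-TREE.

```python
def branch_params_per_class(n_seqs: int) -> int:
    """Number of branches in an unrooted, fully resolved tree with n_seqs tips."""
    return 2 * n_seqs - 3


def heterotachy_branch_params(n_seqs: int, n_classes: int) -> int:
    """Branch-length parameters when every class has its own set of branch lengths."""
    return branch_params_per_class(n_seqs) * n_classes


n_seqs, n_sites = 2529, 12074  # values from the alignment summary
per_class = branch_params_per_class(n_seqs)
total_h4 = heterotachy_branch_params(n_seqs, 4)
sites_per_param = n_sites / total_h4

print(per_class, total_h4, round(sites_per_param, 2))
```

The ratio comes out below one, that is, fewer alignment columns than branch-length parameters for an H4 model on this data set.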
Minh,
That is a valid point, and one that I have considered. It is perplexing, though, that the code did not keel over with the more complex substitution models, like the GTR model. Why the K81 model? Is it just by chance that this model was the one to cause the software to abort?
All the best,
Lars
On Sunday, 26 May 2024 at 12:42, Bui Quang Minh wrote:
It can be due to the optimiser being unable to fit the branch-length mixture model (*H or +H). Your data set is very large:
Alignment has 2529 sequences with 12074 columns, 10858 distinct patterns
Each +H class will add 2*2529-3 = 5055 parameters (for branch lengths), so an H4 model means 5055*4 = 20220 parameters (!). That's overkill for IQ-TREE. I'd recommend not using such complex models. You have "only" ~12K sites, which is not enough to resolve such a complex model. That's why there are lots of WARNINGs about numerical issues (they are printed many times for a reason).
From your log file, I highlight some important lines:
Rate parameters: A-C: 1.00000 A-G: 1.63867 A-T: 0.93132 C-G: 0.93132 C-T: 1.63867 G-T: 1.00000
As you can see, mixture classes 3 and 4 have the A-G and C-T rates hitting the lower bound, whereas the A-T and C-G rates hit the upper bound. Warnings about this are also printed. Classes 3 and 4 also happen to have exactly the same parameter values. The "heterotachy" weights are extremely low, 0.002 and 0.001, which means these classes contribute very little to the likelihood. And since their class likelihoods are low, they cause numerical trouble. Therefore, this *H4 model does not fit the data; it is an overparameterisation. The +H4 model didn't have this problem because the K81 parameters are "linked". I hope that explains it.
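One rough way to see why such low weights cause trouble: in a mixture model, a class weight can be read as the expected fraction of sites belonging to that class. The Python sketch below is only a back-of-the-envelope illustration and not how IQ-TREE diagnoses anything. It multiplies the two reported weights by the 12074 alignment columns and sets the result against the 5055 branch lengths that each class carries. The variable names are hypothetical.

```python
n_sites = 12074               # alignment columns
branch_params = 2 * 2529 - 3  # branch lengths per heterotachy class
low_weights = [0.002, 0.001]  # the two very low class weights reported in the log

# Expected number of sites attributable to each low-weight class.
expected_sites = [w * n_sites for w in low_weights]

for weight, sites in zip(low_weights, expected_sites):
    print(f"weight {weight}: ~{sites:.0f} expected sites "
          f"for {branch_params} branch lengths")
```

Under this reading, each of the two classes is informed by a few dozen sites at most while carrying thousands of branch lengths, which is consistent with the overparameterisation described above.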