how to set the "TVM+F+I+R3" model? #282

Open
liamxg opened this issue Jul 15, 2023 · 4 comments

liamxg commented Jul 15, 2023

@pontus @viklund @olas @eryl @msuchard


Zshuyun commented Jul 21, 2023

Hello, I have also encountered the same problem, have you solved it? How should I set up the “JTT+I+G+F” model?

Author

liamxg commented Jul 21, 2023

@Zshuyun Sorry, no one has replied to me.


Zshuyun commented Jul 21, 2023

Okay, thank you

Collaborator

nylander commented Feb 2, 2024

Dear @liamxg

As for the "TVM"

From the help on lset:

Nst -- Sets the number of substitution types: "1" constrains all of
       the rates to be the same (e.g., a JC69 or F81 model); "2" all-
       ows transitions and transversions to have potentially different
       rates (e.g., a K80 or HKY85 model); "6" allows all rates to
       be different, subject to the constraint of time-reversibility
       (e.g., a GTR model). Finally, 'nst' can be set to 'mixed', which
       results in the Markov chain sampling over the space of all poss-
       ible reversible substitution models, including the GTR model and
       all models that can be derived from it model by grouping the six
       rates in various combinations. This includes all the named models
       above and a large number of others, with or without name.

For a nucleotide ("4-by-4") setup, you specify the number of substitution
types with lset nst=, choosing one of the options 1, 2, 6, or Mixed. Setting
nst=1 means AC=AG=AT=CG=CT=GT, and nst=6 lets all six rates (AC, AG, AT, CG,
CT, GT) differ. Using nst=2 will set AC=AT=CG=GT and AG=CT. "TVM" would leave
AC, AT, CG, and GT free with AG=CT, but you cannot specify this rate
configuration in MrBayes (there is no nst=5, for example).
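
As a minimal sketch of the options described above, the three numeric nst settings could be written in a MrBayes block as follows. Only one lset nst line would be active in a real analysis; the alternatives are shown as NEXUS comments in square brackets.

```nexus
begin mrbayes;
    [ nst=1: one rate for all substitution types, e.g. JC69 or F81 ]
    [ lset nst=1; ]

    [ nst=2: transitions and transversions differ, e.g. K80 or HKY85 ]
    [ lset nst=2; ]

    [ nst=6: all six rates free, i.e. GTR; the closest starting point for TVM ]
    lset nst=6;
end;
```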

However, one may try to "emulate" a TVM model by setting lset nst=6 and then
using the prset command to place a highly informative prior on the
substitution rates (Revmatpr). From the help on prset:

Revmatpr -- This parameter sets the prior for the substitution rates
            of the GTR model for nucleotide data. The options are:
              prset revmatpr = dirichlet(<number>,<number>,...,<number>)
              prset revmatpr = fixed(<number>,<number>,...,<number>)

               The program assumes that the six substitution rates
               are independent gamma-distributed random variables with the
               same scale parameter when dirichlet is selected. The six
               numbers in brackets each corresponds to a particular substi-
               tution type. Together, they determine the shape of the prior
               The six rates are in the order A<->C, A<->G, A<->T, C<->G,
               C<->T, and G<->T. If you want an uninformative prior you can
               use dirichlet(1,1,1,1,1,1), also referred to as a 'flat'
               Dirichlet. This is the default setting. If you wish a prior
               where the C<->T rate is 5 times and the A<->G rate 2 times
               higher, on average, than the transversion rates, which are
               all the same, then you should use a prior of the form
               dirichlet(x,2x,x,x,5x,x), where x determines how much the
               prior is focused on these particular rates. For more info,
               see tratiopr. The fixed option allows you to fix the substi-
               tution rates to particular values.
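
A rough sketch of how this could look, using the rate order from the help text (A<->C, A<->G, A<->T, C<->G, C<->T, G<->T). All numbers are hypothetical placeholders, not recommendations. Giving the second and fifth entries the same value centers A<->G and C<->T on the same rate. Note that a Dirichlet prior also sets the expected size of every rate relative to the others, so this only approximates TVM rather than enforcing AG=CT.

```nexus
begin mrbayes;
    lset nst=6;

    [ Option A: fix the rates to TVM-shaped values, with AG equal to CT.       ]
    [ Placeholder numbers; in practice they might come from another estimate. ]
    prset revmatpr=fixed(1.2,4.0,0.8,1.1,4.0,1.0);

    [ Option B, alternative to A: informative Dirichlet with equal AG and CT. ]
    [ Larger values tighten the prior around the implied rate proportions.    ]
    [ prset revmatpr=dirichlet(10,40,10,10,40,10); ]
end;
```

With option A the rates are not estimated at all, whereas option B still lets the data move the rates away from the prior, including away from AG=CT.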

As for the "+F" and "+R3"

"+F" is probably the syntax used in iqtree2 for applying "Empirically counted
frequencies from alignment" when estimating the state frequencies. MrBayes
uses MCMC to integrate over all possible state frequencies, and the settings
for this can be changed with the prset Statefreqpr command (see output from
help prset).
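
If the aim is to mimic empirically counted frequencies rather than integrate over them, one possible setting is the fixed(empirical) option of Statefreqpr. This is a sketch, and the output of help prset remains the reference for the available options.

```nexus
begin mrbayes;
    [ Fix state frequencies to the values observed in the alignment ]
    prset statefreqpr=fixed(empirical);
end;
```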

"+R3" is probably the syntax used in iqtree2 for applying "the FreeRate model
with 3 categories" for modelling rate heterogeneity among sites. In MrBayes
(v3.2.7a), a "FreeRate"-model can be applied by using lset rates=kmixture.
See the output from help lset.
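
A minimal sketch for three rate categories, assuming that Nmixtcat is the lset option that controls the number of mixture components in v3.2.7a. That option name is not given in this thread, so please confirm it with help lset before relying on it.

```nexus
begin mrbayes;
    [ k-mixture model of among-site rate variation, "FreeRate"-like ]
    [ nmixtcat=3 is assumed to set three categories, as in +R3      ]
    lset rates=kmixture nmixtcat=3;
end;
```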

Currently, the models in MrBayes (v3.2.7a) are not set up to (easily) combine
the +I (or +G) with +Rn.

A related comment

Because different programs implement different models, some model-selection
tools offer program-specific subsets of models, restricted to what a given
program supports, for easier comparison (e.g., MrModeltest2, Modeltest-NG,
IQ-tree, ...). These can be useful for many purposes.

Yours

Johan
