# HG changeset patch
# User malex
# Date 1341523132 14400
# Node ID 681e9bb51cc4c771607a2efc32745854d39e1f15
# Parent 9ce35d2d9937e3bc9e6ab81cb11509066eb5896f
Clean help, fix option descriptions, add genthreshfortopoterm, change filetypes to txt to make it more flexible.

diff -r 9ce35d2d9937 -r 681e9bb51cc4 garli.xml
--- a/garli.xml Fri Dec 02 17:07:27 2011 -0500
+++ b/garli.xml Thu Jul 05 17:18:52 2012 -0400
@@ -5,14 +5,14 @@
 ## Arguments to the wrapper beyond the config file are just for Galaxy's benefit - all filenames are hardcoded
 garli_wrapper.py $garli_conf $best_all_tre $best_tre $log00_log $screen_log
-
+
-
+
@@ -25,7 +25,7 @@
+ value="1" label="Number of bootstrap replicates">
@@ -44,7 +44,7 @@
-
+
+
+
+
@@ -342,7 +346,7 @@
 outputeachbettertopology = 0
 outputcurrentbesttopology = 0
 enforcetermconditions = 1
-genthreshfortopoterm = 20000
+genthreshfortopoterm = ${genthreshfortopoterm}
 scorethreshforterm = 0.05
 significanttopochange = 0.01
 outputphyliptree = 0
@@ -408,461 +412,8 @@
 Garli is written and maintained by Derrick Zwickl
 Configuration options are adapted from
-https://www.nescent.org/wg_garli/GARLI_Configuration_Settings
-
------
-
-**Detailed description of the configuration options**
-
-
-**Analysis Type**
-
- Specify whether to perform a maximum likelihood search for the best tree, or
- a bootstrap analysis.
-
-
-**Number of replicates**
-
- Number of independent search replicates to run.
-
-
-**Relative size of resample data**
-
- This setting allows for bootstrap-like resampling, but with the
- psuedoreplicate datasets having the number of alignment columns different
- from the real data. Setting values below 1.0 is somewhat similar to
- jackknifing, but not identical.
-
-
-**Attachment branches evaluated per taxon (min=1)**
-
- The number of attachment branches evaluated for each taxon to be added to
- the tree during the creation of an ML stepwise-addition starting tree.
- Briefly, stepwise addition is an algorithm used to make a tree, and involves
- adding taxa in a random order to a growing tree. For each taxon to be added,
- a number of randomly chosen attachment branches are tried and scored, and
- then the best scoring one is chosen as the location of that taxon. This
- setting controls how many attachment points are evaluated for each taxon to
- be added. A value of one is equivalent to a completely random tree (only one
- randomly chosen location is evaluated). A value of greater than 2 times the
- number of taxa in the dataset means that all attachment points will be
- evaluated for each taxon, and will result in very good starting trees (but
- may take a while on large datasets). Even fairly small values (less than 10)
- can result in starting trees that are much, much better than random, but
- still fairly different from one another.
-
-
-**Constraint file**
-
- Select a file containing constraint specifications.
-
-
-**Random seed**
-
- Random see can have a value of -1 or a positive integer. The random number
- seed used by the random number generator. Specify “–1” to have a seed chosen
- for you. Specifying the same seed number in multiple runs will give exactly
- identical results, if all other parameters and settings are also identical.
-
-
-**Available memory**
-
- This lets GARLI determine how much system memory it may be able to use to
- store computations for reuse.
-
-
-**Perform initial rough optimization**
-
- Specifies whether some initial rough optimization is performed on the
- starting branch lengths and rate heterogeneity parameters. This is always
- recommended.
-
-
-**Outgroup taxa numbers**
-
- The outgroup option allows for orienting tree topologies in a consistent way
- when they are written to a file. Note that this has NO effect whatsoever on
- the actual inference and the specified outgroup is NOT constrained to be
- present in the inferred trees. If multiple outgroup taxa are specified and
- they do not form a monophyletic group, this setting will be ignored. If you
- specify a single outgroup taxon it will always be present, and the tree will
- always be consistently oriented. To specify an outgroup consisting of taxa
- 1, 3 and 5 the format is this: outgroup = 1 3 5. Dashes are used for ranges
- e.g. 1-3 5.
-
-
-**Collapse branches**
-
- Before version 1.0, all trees that are returned were fully resolved. This is
- true even if the maximum-likelihood estimate of some internal branch lengths
- are effectively zero (or GARLI's minimum, which is 1e-8). In such cases,
- collapsing the branch into a polytomy would be a better representation. Note
- that GARLI will never return a tree with an actual branch length of zero,
- but rather with its minimum value of 1.0e-8. The drawback of always
- returning fully resolved trees is that what is effectively a polytomy can be
- resolved in three ways, and different independent searches may randomly
- return one of those resolutions. Thus, if you compare the trees by topology
- only, they will look different. If you pay attention to the branch lengths
- and likelihood scores of the trees it will be apparent that they are
- effectively the same. I think that collapsing of branches is particularly
- important when bootstrapping, since no support should be given to a branch
- that doesn't really exist, i.e., that is a random resolution of a polytomy.
- Collapsing is also good when calculating tree to tree distances such as the
- symmetric tree distance, for example when calculating phylogenetic error to
- a known target tree. Zero-length branches would add to the distances
- (~error) although they really should not.
-
-
-**Model type**
-
- The codon-aminoacid datatype means that the data will be supplied as a
- nucleotide alignment, but will be internally translated and analyzed using
- an amino acid model. The codon and codon-aminoacid datatypes require
- nucleotide sequence that is aligned in the correct reading frame. In other
- words, all gaps in the alignment should be a multiple of 3 in length, and
- the alignment should start at the first position of a codon. If the
- alignment has extra columns at the start, middle or end, they should be
- removed or excluded with a Nexus exset (see the FAQ for an example of exset
- usage). The correct Genetic Code must also be set.
-
-
-
-
-**Datatype - nucleotide**
-
-**Rate matrix**
-
- The number of relative substitution rate parameters (note that the number of
- free parameters is this value minus one). Equivalent to the “nst” setting in
- PAUP* and MrBayes. 1rate assumes that substitutions between all pairs of
- nucleotides occur at the same rate (JC model), 2rate allows different rates
- for transitions and transversions (K2P or HKY models), and 6rate allows a
- different rate between each nucleotide pair (GTR). These rates are estimated
- unless the fixed option is chosen. Since version 0.96, parameters for any
- submodel of the GTR model may be estimated. The format for specifying this
- is very similar to that used in the “rclass’ setting of PAUP*. Within
- parentheses, six letters are specified, with spaces between them. The six
- letters represent the rates of substitution between the six pairs of
- nucleotides, with the order being A-C, A-G, A-T, C-G, C-T and G-T. Letters
- within the parentheses that are the same mean that a single parameter is
- shared by multiple nucleotide pairs.
-
-
-**State frequences**
-
- Specifies how the equilibrium state frequencies (A, C, G and T) are treated.
- The empirical setting fixes the frequencies at their observed proportions,
- and the other options should be self-explanatory.
-
-
-**Datatype - nucleotide or amino-acid**
-
-
-**Treatment of proportion of invariable sites parameter**
-
- Specifies whether a parameter representing the proportion of sites that are
- unable to change (i.e. have a substitution rate of zero) will be included.
- This is typically referred to as 'invariant sites', but would better be
- termed 'invariable sites'.
-
-
-**Rate heterogeneity type**
-
- (none, gamma, gammafixed) – The model of rate heterogeneity assumed.
- “gammafixed” requires that the alpha shape parameter is provided, and a
- setting of “gamma” estimates it.
-
-
-**Number of discrete dN/dS categories**
-
- The number of categories of variable rates (not including the invariant site
- class if it is being used). Must be set to 1 if ratehetmodel is set to none.
- Note that runtimes and memory usage scale linearly with this setting.
-
-
-**Datatype - amino-acid or codon-aminoacid**
-
-**Rate matrix**
-
- (poisson, jones, dayhoff, wag, mtmam, mtrev) – The fixed amino acid rate
- matrix to use. You should use the matrix that gives the best likelihood, and
- could use a program like PROTTEST (very much like MODELTEST, but for amino
- acid models) to determine which fits best for your data. Poisson assumes a
- single rate of substitution between all amino acid pairs, and is a very poor
- model.
-
-
-**Equilibrium Base Frequences **
-
- (equal, empirical, estimate, fixed, jones, dayhoff, wag, mtmam, mtrev) –
- Specifies how the equilibrium state frequencies of the 20 amino acids are
- treated. The “empirical” option fixes the frequencies at their observed
- proportions (when describing a model this is often termed '+F').
-
-
-**Number of discrete dN/dS categories**
-
- The number of categories of variable rates (not including the invariant site
- class if it is being used). Must be set to 1 if ratehetmodel is set to none.
- Note that runtimes and memory usage scale linearly with this setting.
-
-
-**Treatment of proportion of invariable sites parameter**
-
- Specifies whether a parameter representing the proportion of sites that are
- unable to change (i.e. have a substitution rate of zero) will be included.
- This is typically referred to as 'invariant sites', but would better be
- termed 'invariable sites'.
-
-
-**Datatype - codon**
-
-
-**Rate matrix**
-
- (1rate, 2rate, 6rate, fixed, custom string) – This determines the relative
- rates of nucleotide substitution assumed by the codon model. The options are
- exactly the same as those allowed under a normal nucleotide model. A codon
- model with ratematrix = 2rate specifies the standard Goldman and Yang (1994)
- model, with different substitution rates for transitions and transversions.
-
-
-**State frequences**
-
- The options are to use equal codon frequencies (not a good option), the
- frequencies observed in your dataset (termed “empirical” in GARLI), or the
- codon frequencies implied by the “F1x4” or “F3x4” methods (using PAML
- terminology). These last two options calculate the codon frequencies as the
- product of the frequencies of the three nucleotides that make up each codon.
- In the “F1x4” case the nucleotide frequencies are those observed in the
- dataset across all codon positions, while the “F3x4” option uses the
- nucleotide frequencies observed in the data at each codon position
- separately.
-
-
-**Rate Heterogeneity Type**
-
- For codon models, the default is to infer a single dN/dS parameter.
- Alternatively, a model can be specified that infers a given number of dN/dS
- categories, with the dN/dS values and proportions falling in each category
- estimated (ratehetmodel = nonsynonymous). This is the 'discrete' or 'M3'
- model of Yang et al., 2000.
-
-
-**Number of discrete dN/dS categories**
-
- When ratehetmodel = nonsynonymous, this is the number of dN/dS parameter
- categories.
-
-
-**Datatype - codon or codon-aminoacid**
-
-
-**Genetic code**
-
- The genetic code to be used in translating codons into amino acids.
-
-
-**Population Settings**
-
-
-**Number of individuals in population**
-
- The number of individuals in the population. This may be increased, but
- doing so is generally not beneficial. Note that typical genetic algorithms
- tend to have much, much larger population sizes than GARLI defaults.
-
-
-**Unmutated copies of best individual**
-
- The number of times the best individual is copied to the next generation
- with no chance of mutation. It is best not to mess with this setting.
-
-
-**Strength of selection**
-
- Controls the strength of selection, with larger numbers denoting stronger
- selection. The relative probability of reproduction of two individuals
- depends on the difference in their log likelihoods (ΔlnL) and is formulated
- very similarly to the procedure of calculating Akaike weights.
-
-
-**Fitness handicap for the best individual**
-
- This can be used to bias the probability of reproduction of the best
- individual downward. Because the best individual is automatically copied
- into the next generation, it has a bit of an unfair advantage and can cause
- all population variation to be lost due to genetic drift, especially with
- small populations sizes. The value specified here is subtracted from the
- best individual’s lnL score before calculating the probabilities of
- reproduction. It seems plausible that this might help maintain variation,
- but I have not seen it cause a measurable effect.
-
-
-**Maximum number of generations to run**
-
- Use if automatic termination is desired to prevent a runaway process.
-
-
-**Maximum time to run**
-
- The maximum number of seconds for the run to continue. Use if automatic
- termination is desired to prevent a runaway process.
-
-
-**Branch-length optimization settings**
-
-
-**Minimal optimization precision**
-
- The minimum allowed value of the optimization precision - must not be larger
- then the Starting optimization precision.
-
-
-**Number of steps down from Start Precision to Minimum Precision**
-
- Specify the number of steps that it will take for the optimization precision
- to decrease (linearly) from startoptrec to minoptprec.
-
-
-**Tree rejection threshold**
-
- This setting controls which trees have more extensive branch-length
- optimization applied to them. All trees created by a branch swap receive
- optimization on a few branches that directly took part in the rearrangement.
- If the difference in score between the partially optimized tree and the best
- known tree is greater than treerejectionthreshold, no further optimization
- is applied to the branches of that tree. Reducing this value can
- significantly reduce runtimes, often with little or no effect on results.
- However, it is possible that a better tree could be missed if this is set
- too low. In cases in which obtaining the very best tree per search is not
- critical (e.g., bootstrapping), setting this lower (~20) is probably safe.
-
-
-**Settings controlling the proportions of the mutation types**
-
-
-**Weight on topology mutations**
-
- The prior weight assigned to the class of topology mutations (NNI, SPR and
- limSPR). Note that setting this to 0.0 turns off topology mutations, meaning
- that the tree topology is fixed for the run. This used to be a way to have
- the program estimate only model parameters and branch-lengths, but the
- optimizeinputonly setting is now a better way to go.
-
-
-**Weight on model parameter mutations**
-
- The prior weight assigned to the class of model mutations. Note that setting
- this at 0.0 fixes the model during the run.
-
-
-**Weight on branch-length parameter mutations**
-
- The prior weight assigned to branch-length mutations. The same procedure
- used above to determine the proportion of Topology:Model:Branch-Length
- mutations is also used to determine the relative proportions of the three
- types of topological mutations (NNI:SPR:limSPR), controlled by the following
- three weights. Note that the proportion of mutations applied to each of the
- model parameters is not user controlled.
-
-
-**Weight on NNI topology changes**
-
- The prior weight assigned to NNI mutations
-
-
-**Weight on SPR topology changes**
-
- The prior weight assigned to random SPR mutations. For very large datasets
- it is often best to set this to 0.0, as random SPR mutations essentially
- never result in score increases.
-
-
-**Weight on localized SPR topology changes**
-
- The prior weight assigned to SPR mutations with the reconnection branch
- limited to being a maximum of limsprrange branches away from where the
- branch was detached.
-
-
-**Interval Length**
-
- The number of generations in each interval during which the number and
- benefit of each mutation type are stored.
-
-
-**Number of intervals to store**
-
- The number of intervals to be stored. Thus, records of mutations are kept
- for the last (intervallength x intervalstostore) generations. Every
- intervallength generations the probabilities of the mutation types are
- updated by the scheme described above.
-
-
-**Settings controlling mutation details**
-
-
-**Max range for localized SPR topology changes**
-
- The maximum number of branches away from its original location that a branch
- may be reattached during a limited SPR move. Setting this too high (> 10)
- can seriously degrade performance, but if you do so in conjunction with a
- large increase in genthreshfort.
-
-
-**Settings controlling mutation details**
-
- The mean of the binomial distribution from which the number of branch
- lengths mutated is drawn during a branch length mutation.
-
-
-**Magnitude of branch-length mutations**
-
- The shape parameter of the gamma distribution (with a mean of 1.0) from
- which the branch-length multipliers are drawn for branch-length mutations.
- Larger numbers cause smaller changes in branch lengths. (Note that this has
- nothing to do with gamma rate heterogeneity.)
-
-
-**Magnitude of model parameter mutations**
-
- The shape parameter of the gamma distribution (with a mean of 1.0) from
- which the model mutation multipliers are drawn for model parameters
- mutations. Larger numbers cause smaller changes in model parameters. (Note
- that this has nothing to do with gamma rate heterogeneity.)
-
-
-**Relative weight assigned to already attempted branch swaps**
-
- With version 0.95 and later, GARLI keeps track of which branch swaps it has
- attempted on the current best tree. Because swaps are applied randomly, it
- is possible that some swaps are tried twice before others are tried at all.
- This option allows the program to bias the swaps applied toward those that
- have not yet been attempted. Each swap is assigned a relative weight
- depending on the number of times that it has been attempted on the current
- best tree. This weight is equal to (uniqueswapbias) raised to the (# times
- swap attempted) power. In other words, a value of 0.5 means that swaps that
- have already been tried once will be half as likely as those not yet
- attempted, swaps attempted twice will be ¼ as likely, etc. A value of 1.0
- means no biasing. Use of this option may allow the use of somewhat larger
- values of limsprrange.
-
-
-**Relative weight assigned to branch swaps based on locality**
-
- This option is similar to uniqueswapbias, except that it biases toward
- certain swaps based on the topological distance between the initial and
- rearranged trees. The distance is measured as in the limsprrange, and is
- half the the Robinson-Foulds distance between the trees. As with
- uniqueswapbias, distanceswapbias assigns a relative weight to each potential
- swap. In this case the weight is (distanceswapbias) raised to the
- (reconnection distance - 1) power. Thus, given a value of 0.5, the weight of
- an NNI is 1.0, the weight of an SPR with distance 2 is 0.5, with distance 3
- is 0.25, etc. Note that values less than 1.0 bias toward more localized
- swaps, while values greater than 1.0 bias toward more extreme swaps. Also
- note that this bias is only applied to limSPR rearrangements. Be careful in
- setting this, as extreme values can have a very large effect.
+https://www.nescent.org/wg_garli/GARLI_Configuration_Settings. Please see that
+page for more details.
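
Reviewer note (not part of the patch): the help text removed above describes the uniqueswapbias and distanceswapbias weights only in words. The arithmetic can be sketched in Python as below. This is an illustration of the formulas as stated in the removed text, not GARLI's actual implementation; the function names are hypothetical and exist only for this example.

```python
# Hypothetical helpers illustrating the swap weights described in the removed
# help text. These are NOT GARLI functions; the names are made up for this note.


def unique_swap_weight(uniqueswapbias: float, times_attempted: int) -> float:
    """Relative weight of a swap already attempted `times_attempted` times.

    Per the help text: weight = uniqueswapbias ** (number of times attempted).
    """
    if times_attempted < 0:
        raise ValueError("times_attempted must be non-negative")
    return uniqueswapbias ** times_attempted


def distance_swap_weight(distanceswapbias: float, reconnection_distance: int) -> float:
    """Relative weight of a swap at a given reconnection distance.

    Per the help text: weight = distanceswapbias ** (reconnection distance - 1),
    so an NNI (distance 1) always has weight 1.0.
    """
    if reconnection_distance < 1:
        raise ValueError("reconnection_distance must be at least 1")
    return distanceswapbias ** (reconnection_distance - 1)


if __name__ == "__main__":
    # With a bias of 0.5, each extra attempt (or extra branch of distance)
    # halves the relative weight.
    print([unique_swap_weight(0.5, n) for n in range(3)])
    print([distance_swap_weight(0.5, d) for d in (1, 2, 3)])
```

With a bias of 1.0 both functions return 1.0 for every input, which matches the statement that 1.0 means no biasing. Values above 1.0 make the distance weight grow with distance, which is the "more extreme swaps" case the help text warns about.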