# HG changeset patch
# User kaymccoy
# Date 1481493696 18000
# Node ID 9bc8cfd2ab08880ae2ed057ee0c2a546df9e3574
# Parent 5ff57a3d0af2b71de6b9df95182fb4649d7b0687
Uploaded

diff -r 5ff57a3d0af2 -r 9bc8cfd2ab08 aggregate.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/aggregate.xml Sun Dec 11 17:01:36 2016 -0500
@@ -0,0 +1,102 @@
+
+ fitness calculations by gene
+
+ biopython
+
+
+ aggregate.py
+ #if $mark.certain == "yes":
+ -m $mark.genes
+ #end if
+ #if $weighted.algorithms == "yes":
+ -w 1
+ #end if
+ -x $cutoff
+ -l $weightceiling
+ #if $blank.count == "yes":
+ -b $blank.custom_blanks
+ #end if
+ #if $blank.count == "no":
+ -f $blank.txt_blanks
+ #end if
+ -c $ref
+ -o $output
+ $input
+ #for $a in $additionalcsv
+ ${a.input2}
+ #end for
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+This tool calculates the aggregate fitness values of mutations by gene.
+
+**The options explained**
+
+The csv fitness file(s): These are the csv (comma separated values) files containing the fitness values you want to aggregate by gene. Since they should have been produced by the "Calculate Fitness" tool, each line besides the header should represent the following information for an insertion location: position,strand,count_1,count_2,ratio,mt_freq_t1,mt_freq_t2,pop_freq_t1,pop_freq_t2,gene,D,W,nW
+
+GenBank reference genome: the reference genome of whatever model you're working with, which needs to be in standard genbank format. For more on that format see the genbank website.
+
+Marking certain genes: If you chose to mark certain genes, those genes will have an "M" under the M column of the resulting aggregate file.
+
+Using weighted algorithms: Recommended. If you chose to use weighted algorithms, scores will be weighted by the number of reads their insertion location has, as insertions with more reads tend to be more accurate.
+
+Weight ceiling: The maximum weight that any single fitness value can receive. It's only relevant if you're using weighted algorithms.
+
+Cutoff3: Fitness scores are ignored for any insertion location whose average count (the sum of its t1 and t2 counts, divided by 2) is below this value.
+
+Bottleneck value: The percentage of insertions randomly lost, which will be discounted for all genes (for example, 20% would be entered as 0.20; default 0 if entered by hand). Alternatively, you can use the blank % that calc_fit calculated from the normalization genes by entering its txt output file.
+
+The name of your output file: self-explanatory. Remember to have it end in ".csv".
+
+**Additional notes**
+
+Each line of the output file (besides the header) should represent the following information for a particular gene: locus,mean,var,sd,se,gene,Total,Blank,Not Blank,Blank Removed,M
+
+
+
\ No newline at end of file
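
The help text describes three interacting options: the average-count cutoff, weighting by read count, and the weight ceiling. The sketch below is one possible reading of how they could combine. It is not taken from `aggregate.py`, whose actual logic is not shown in this changeset. The function name `aggregate_by_gene`, the choice of the `W` column as the fitness value, the use of the average count as the weight, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def aggregate_by_gene(rows, cutoff=0.0, weight_ceiling=None, weighted=True):
    """Return {gene: mean fitness}. Hypothetical helper, not the tool's code."""
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for row in rows:
        # Average count as described in the help text: (t1 + t2) / 2.
        avg_count = (float(row["count_1"]) + float(row["count_2"])) / 2
        if avg_count < cutoff:
            continue  # insertion location falls below the cutoff
        # Assumption: the weight is the average read count, capped by the ceiling.
        weight = avg_count if weighted else 1.0
        if weighted and weight_ceiling is not None:
            weight = min(weight, weight_ceiling)
        weighted_sums[row["gene"]] += weight * float(row["W"])
        weight_totals[row["gene"]] += weight
    return {g: weighted_sums[g] / weight_totals[g]
            for g in weighted_sums if weight_totals[g] > 0}


# Made-up rows using the column layout listed in the help text.
SAMPLE = """position,strand,count_1,count_2,ratio,mt_freq_t1,mt_freq_t2,pop_freq_t1,pop_freq_t2,gene,D,W,nW
100,+,10,30,0,0,0,0,0,geneA,0,1.0,1.0
150,-,100,300,0,0,0,0,0,geneA,0,0.5,0.5
200,+,2,2,0,0,0,0,0,geneB,0,0.9,0.9
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = aggregate_by_gene(rows, cutoff=5, weight_ceiling=50)
unweighted = aggregate_by_gene(rows, cutoff=5, weighted=False)
print(result)
print(unweighted)
```

In this made-up sample, `geneB` is dropped because its only insertion has an average count of 2, below the cutoff of 5. The second `geneA` insertion has an average count of 200, which the ceiling caps at 50 so that a single heavily sequenced site cannot dominate the gene's mean.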