annotate maaslin-4450aa4ecc84/README.md @ 1:a87d5a5f2776
Uploaded the version running on the prod server
author | george-weingart |
---|---|
date | Sun, 08 Feb 2015 23:08:38 -0500 |
MaAsLin User Guide v3.1
=======================

September 2013 - Updated April 2014 for Galaxy

Timothy Tickle and Curtis Huttenhower

Table of Contents
-----------------

* A. Introduction to MaAsLin
* B. Related Projects and Scripts
* C. Installing MaAsLin
* D. MaAsLin Inputs
* E. Process Flow Overview
* F. Process Flow Detail
* G. Expected Output Files
* H. Troubleshooting
* I. Installation as an Automated Pipeline
* J. Commandline Options (Modifying Process and Figures)

# A. Introduction to MaAsLin

MaAsLin is a multivariate statistical framework that finds
associations between clinical metadata and potentially
high-dimensional experimental data. MaAsLin performs boosted additive
general linear models between one group of data (the metadata, i.e.
the predictors) and another group (in our case, relative taxonomic
abundances, i.e. the response). In our context, we use it to discover
associations between clinical metadata and microbial community
relative abundance or function; however, it is applicable to other
data types.

Metagenomic data are sparse, and boosting is used to select metadata
that show some potential to be useful in a linear model between the
metadata and abundances. In the context of metadata and community
abundance, the samples' metadata are boosted against the abundance of
one Operational Taxonomic Unit (OTU) (Yi) at a time. The metadata
selected by boosting are then used in a general linear model, with the
metadata as predictors and the OTU abundance as the response variable.
This is repeated for every OTU and metadata combination. Because we
work with proportional data, the abundances (Yi) are transformed as
`arcsin(sqrt(Yi))`. The final formula is as follows:

![MaAsLin final model formula](https://bitbucket.org/biobakery/maaslin/downloads/maaslinformula2.png)

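The per-OTU modeling and the `arcsin(sqrt(Yi))` transform described above can be illustrated with a small Python sketch. This is a simplified illustration, not MaAsLin's implementation: it skips the boosting step and fits one ordinary least-squares model per (OTU, metadata) pair with a single numeric predictor, whereas MaAsLin fits a general linear model on the boosted metadata. The function names (`arcsin_sqrt`, `ols_slope`, `univariate_slopes`) and the example values are hypothetical.

```python
import math


def arcsin_sqrt(p):
    """Arcsine square-root transform of one proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"expected a proportion in [0, 1], got {p}")
    return math.asin(math.sqrt(p))


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def univariate_slopes(metadata, abundances):
    """Slope of arcsin(sqrt(Yi)) on each metadata variable, for every OTU.

    metadata:   {metadata name: [numeric value per sample]}
    abundances: {OTU name: [relative abundance per sample]}, same sample order
    Returns {(OTU, metadata name): slope}.
    """
    results = {}
    for otu, proportions in abundances.items():
        y = [arcsin_sqrt(p) for p in proportions]  # transform the response
        for name, x in metadata.items():
            results[(otu, name)] = ols_slope(x, y)
    return results


# Illustrative values only: four samples, one numeric metadata variable, two OTUs.
metadata = {"age": [20.0, 35.0, 50.0, 65.0]}
abundances = {
    "otu1": [0.05, 0.10, 0.20, 0.30],  # rises with age
    "otu2": [0.40, 0.30, 0.20, 0.10],  # falls with age
}
for (otu, name), slope in univariate_slopes(metadata, abundances).items():
    print(otu, name, round(slope, 4))
```

Here `otu1` gets a positive slope and `otu2` a negative one. The arcsine square-root transform is monotonic, so it keeps the direction of an association while stabilizing the variance of proportions.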
For more information about MaAsLin, please visit
[http://huttenhower.sph.harvard.edu/maaslin](http://huttenhower.sph.harvard.edu/maaslin).

# B. Related Projects and Scripts

Other projects hosted on Bitbucket (bitbucket.org) may help with your
analysis:

* **QiimeToMaAsLin** is a project that reformats abundance files from
  Qiime for MaAsLin. Several formats of Qiime consensus lineages are
  supported. To download it, please visit
  [https://bitbucket.org/timothyltickle/qiimetomaaslin](https://bitbucket.org/timothyltickle/qiimetomaaslin).

* **merge_metadata.py** is a script included in the MaAsLin project to
  generically merge a metadata file with a table of microbial (or
  other) measurements. This script is located in `maaslin/src` and
  is documented in `maaslin/doc/Merge_Metadata_Read_Me.txt`.

# C. Installing MaAsLin

R Libraries: The following libraries need to be installed in R:

* agricolae, gam, gamlss, gbm, glmnet, inlinedocs, logging, MASS, nlme, optparse, outliers, penalized, pscl, robustbase, testthat, vegan

You can install them by starting R in a terminal (type `R`) and running
the `install.packages` command:

    install.packages(c('agricolae', 'gam', 'gamlss', 'gbm', 'glmnet', 'inlinedocs', 'logging', 'MASS', 'nlme', 'optparse', 'outliers', 'penalized', 'pscl', 'robustbase', 'testthat', 'vegan'))

# D. MaAsLin Inputs

There are three input files for each project: the "\*.read.config" file, the "\*.pcl" file, and the "\*.R" script. If you use the sfle automated pipeline, the "\*" in the file names can be anything but must be identical for all three files, and all three files must be placed in the `../sfle/input/maaslin/input` folder (this folder is only required when using sfle). Details of each file follow:

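When running under sfle, a quick check that every study has all three files with the same prefix can save a failed run. The Python sketch below is a hypothetical helper (not part of MaAsLin or sfle) that groups the files in a folder by their shared prefix and reports which of the three expected extensions are missing. Since the .R script is optional outside sfle, a missing ".R" only matters for the sfle pipeline.

```python
import tempfile
from pathlib import Path

# Extensions expected for each study prefix.
SUFFIXES = (".read.config", ".pcl", ".R")


def missing_inputs(folder):
    """Map each file-name prefix found in folder to its missing extensions.

    An empty list means the prefix has all three input files.
    """
    found = {}
    for path in Path(folder).iterdir():
        for suffix in SUFFIXES:
            if path.name.endswith(suffix) and len(path.name) > len(suffix):
                prefix = path.name[: -len(suffix)]
                found.setdefault(prefix, set()).add(suffix)
                break
    return {
        prefix: [s for s in SUFFIXES if s not in present]
        for prefix, present in sorted(found.items())
    }


# Demo in a throwaway folder: "study" is complete, "pilot" is not.
with tempfile.TemporaryDirectory() as folder:
    for name in ("study.pcl", "study.read.config", "study.R", "pilot.pcl"):
        (Path(folder) / name).touch()
    print(missing_inputs(folder))  # {'pilot': ['.read.config', '.R'], 'study': []}
```

Pointed at your own input folder, for example `missing_inputs("../sfle/input/maaslin/input")`, it lists each prefix with the extensions still to be added.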
### 1\. "\*.pcl"

Required input file. A PCL file contains all the data and metadata.
It is formatted so that metadata and data (OTUs or bugs) are rows and
samples are columns. All metadata rows must come before any abundance
data. The file should be a tab-delimited text file with the extension
".pcl".

### 2\. "\*.read.config"

Required input file. A read config file lets you indicate which data are read from a PCL file without having to change the pcl file or the code. This means your pcl file can be a superset of metadata and abundances that includes data you are not interested in for a given run. This file is a text file with ".read.config" as its extension. It is described in detail in section **F. Process Flow Detail**, subsection **4. Create your read.config file**.

### 3\. "\*.R"

Optional input file. The R script uses a callback programming pattern that lets you add or modify specific code to customize the analysis without touching the main MaAsLin engine. A generic R script, "maaslin_demo2.R", is provided and can be renamed and used for any study. The R script can be modified to add quality control or formatting of the data, add ecological measurements, or make other changes to the underlying data before MaAsLin runs on it. This file is not required to run MaAsLin.

# E. Process Flow Overview

1. Obtain your abundance or relative function table.
2. Obtain your metadata.
3. Format and combine your abundance table and metadata as a pcl file for MaAsLin.
4. Create your read.config file.
5. Create your R script or use the default.
6. Place the .pcl, .read.config, and .R files in `../sfle/input/maaslin/input/` (sfle only).
7. Run MaAsLin.
8. Discover amazing associations in your results!

# F. Process Flow Detail

### 1\. Obtain your abundance or relative function table.

Abundance tables are normally derived from sequence data using
*Mothur*, *Qiime*, *HUMAnN*, or *MetaPhlAn*. Please refer to their documentation
for further details.

### 2\. Obtain your metadata.

Metadata are information about the samples in the study. For
instance, one may analyze a case/control study. In this study, you
may have a disease and a healthy group (disease state), the sex of the
patients (patient demographics), medication use (chemical treatment),
smoking (patient lifestyle), or other types of data. All of these
would be study metadata. This section can contain any type of data
(factor, ordered factor, continuous, integer, or logical
variables). If a metadata value is missing for a sample, please write
NA. It is preferable to write NA so that, when looking at the data, it
is clear that the value is missing and that its absence is intentional
and not a mistake. Investigators are often interested in genetic
measurements, which may also be placed in the metadata section to be
associated with bugs.

If you do not want to add metadata to your abundance table manually,
you may be interested in the associated tools or scripts that help
combine your abundance table and metadata into your pcl file. Both
require a specific format for your metadata file. Please see the
documentation for *QiimeToMaaslin* or *merge_metadata.py* (for more
details, see section B).

### 3\. Format and combine your abundance table and metadata as a pcl file for *MaAsLin*.

Please note that two tools have been developed to help you! If you are
working from a Qiime OTU output and have a metadata text file, try using
*QiimeToMaaslin*, found on Bitbucket. If you have a tab-delimited file
that matches the .pcl description below (for instance, MetaPhlAn
output), use the merge_metadata.py script provided in this project
(`maaslin/src/merge_metadata.py`) and documented in
`maaslin/doc/Merge_Metadata_Read_Me.txt`.

### PCL format description:

i. Row 1 is expected to contain the sample IDs, with the first column holding a feature name that identifies the row, for example "ID".

ii. Rows of metadata. Each row is one metadata variable: the first
column entry is the name of the metadata and each following column is
the metadata value for each sample.

iii. Rows of taxa/OTU abundance. Each row is one taxon/OTU: the first
column entry is the name of the taxon/OTU, followed by the abundances
of that taxon/OTU per sample.

iv. Abundances should be normalized by dividing each abundance measurement by the sum of the column (sample) abundances.

v. Here is an example of the contents of an extremely small pcl file;
another example can be found in this project at
`maaslin/input/maaslin_demo.pcl`.

    ID          Sample1  Sample2  Sample3  Sample4
    metadata1   True     True     False    False
    metadata2   1.23     2.34     3.22     3.44
    metadata3   Male     Female   Male     Female
    taxa1       0.022    0.014    0.333    0.125
    taxa2       0.406    0.029    0.166    0.300
    taxa3       0.571    0.955    0.500    0.575

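If your counts and metadata are already in Python, the layout described above (sample ID row first, then metadata rows, then abundance rows normalized by each sample's total, with NA for missing metadata) can be produced in a few lines. This is a hypothetical helper, not the behavior of merge_metadata.py or QiimeToMaAsLin; the function name, the argument shapes, and the six-significant-digit formatting are assumptions.

```python
def build_pcl(sample_ids, metadata, counts):
    """Return tab-delimited PCL text: ID row, metadata rows, abundance rows.

    sample_ids: column order, e.g. ["Sample1", "Sample2"]
    metadata:   {metadata name: {sample_id: value}}; absent samples become "NA"
    counts:     {feature: [raw count per sample, in sample_ids order]}
    """
    # Column (sample) totals used to turn counts into relative abundances.
    totals = [sum(column) for column in zip(*counts.values())]
    if any(total == 0 for total in totals):
        raise ValueError("a sample has zero total abundance")

    lines = ["\t".join(["ID", *sample_ids])]  # row 1: sample IDs
    for name, values in metadata.items():     # metadata rows come first
        lines.append("\t".join([name, *(str(values.get(s, "NA")) for s in sample_ids)]))
    for feature, row in counts.items():       # then abundance rows
        relative = [value / total for value, total in zip(row, totals)]
        lines.append("\t".join([feature, *(f"{r:.6g}" for r in relative)]))
    return "\n".join(lines) + "\n"


pcl = build_pcl(
    ["Sample1", "Sample2"],
    {"metadata3": {"Sample1": "Male"}},  # Sample2 has no value, so it becomes NA
    {"taxa1": [1, 3], "taxa2": [3, 1]},  # illustrative raw counts
)
print(pcl, end="")
```

With these illustrative counts each sample totals 4, so `taxa1` becomes 0.25 and 0.75 and `taxa2` becomes 0.75 and 0.25, and each column of abundances sums to 1.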
### 4\. Create your read.config file.

A \*.read.config file is a structured text file used to indicate which
data in a \*.pcl file should be read into MaAsLin and used for
analysis. This lets you keep your \*.pcl file intact while varying the
analysis, which is intended to avoid errors that may be introduced
while manipulating the pcl files.

Here is an example of the contents of a \*.read.config file:

    Matrix: Metadata
    Read_PCL_Columns: Sample2-Sample15
    Read_PCL_Rows: Age-Height,Weight,Sex,Cohort-Profession

    Matrix: Abundance
    Read_PCL_Columns: Sample2-Sample15
    Read_PCL_Rows: Bacteria-Bug100

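The structure of this file is simple enough to read programmatically, for example to check a config before a run. The Python sketch below is a hypothetical reader, not MaAsLin's own parser; it assumes one "Key: value" pair per line, that each "Matrix:" line starts a new block, and that blank lines only separate blocks.

```python
def parse_read_config(text):
    """Split read.config text into one {key: value} dict per Matrix block."""
    blocks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines just separate blocks
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Key: value', got {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "Matrix":
            blocks.append({"Matrix": value})  # start a new block
        elif not blocks:
            raise ValueError(f"{key!r} appears before any 'Matrix:' line")
        else:
            blocks[-1][key] = value
    return blocks


example = """\
Matrix: Metadata
Read_PCL_Columns: Sample2-Sample15
Read_PCL_Rows: Age-Height,Weight,Sex,Cohort-Profession

Matrix: Abundance
Read_PCL_Columns: Sample2-Sample15
Read_PCL_Rows: Bacteria-Bug100
"""
for block in parse_read_config(example):
    print(block)
```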
The minimal requirements for a MaAsLin .read.config file are as
follows. "Matrix:" must be specified, with the value "Metadata" for
the metadata section and "Abundance" for the abundance section.
"Read\_PCL\_Rows:" indicates which rows of data or metadata are to be
analyzed. Rows are identified by their metadata/data ID; separate IDs
with commas. If there is a consecutive group of metadata/data, a range
of rows can be defined by giving the first and last ID separated by a
"-". If the first or last ID around a "-" is omitted, the rows are
read from the beginning of the pcl file or to its end, respectively.

209 A minimal example is shown here: |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
210 |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
211 Matrix: Metadata |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
212 Read\_PCL\_Rows: -Weight |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
213 |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
214 Matrix: Abundance |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
215 Read_PCL_Rows: Bacteria- |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
216 |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
217 With this minimal example, the delimiter of the file is assumed to be |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
218 a tab, all columns are read (since they are not indicated |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
219 here). Metadata are read as all rows from the beginning of the pcl |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
220 file (skipping the first Sample ID row) to Weight; all data are read |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
221 as all rows from Bacteria to the end of the pcl file. This example |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
222 refers to the default input files given in the MaAsLin download as |
a87d5a5f2776
Uploaded the version running on the prod server
george-weingart
parents:
diff
changeset
|
223 maaslin_demo2.\*. |
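To make the range rules concrete, here is a minimal Python sketch of how a `Read_PCL_Rows:` value could be interpreted. It is a hypothetical illustration, not MaAsLin's own parser: it assumes row ids never contain a "-", that the first (Sample ID) row has already been skipped, and the row names in the example are made up.

```python
# Hypothetical sketch of interpreting a Read_PCL_Rows value.
# Assumes row ids never contain "-" and the Sample ID row is already skipped.

def parse_row_spec(spec):
    """Split a Read_PCL_Rows value into (start, end) id pairs.

    A single id becomes (id, id); an omitted bound around "-" becomes None.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, _, end = part.partition("-")
            ranges.append((start.strip() or None, end.strip() or None))
        else:
            ranges.append((part, part))
    return ranges


def select_rows(row_ids, spec):
    """Return the ids selected by spec, in their original file order."""
    position = {rid: i for i, rid in enumerate(row_ids)}
    keep = set()
    for start, end in parse_row_spec(spec):
        first = 0 if start is None else position[start]
        last = len(row_ids) - 1 if end is None else position[end]
        keep.update(range(first, last + 1))
    return [row_ids[i] for i in sorted(keep)]


# Made-up row ids, in file order, after the Sample ID row.
rows = ["Age", "Cohort", "Weight", "Bacteria",
        "Bacteria|Firmicutes", "Bacteria|Bacteroidetes"]

print(select_rows(rows, "-Weight"))    # ['Age', 'Cohort', 'Weight']
print(select_rows(rows, "Bacteria-"))  # ['Bacteria', 'Bacteria|Firmicutes', 'Bacteria|Bacteroidetes']
```

An omitted bound is mapped to the first or last row position, which is exactly the "read from the beginning / to the end" behavior described above.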
### 5. Optionally, create your R script.

The R script is used to add code that manipulates your data before analysis and to customize the multifactorial analysis figure. A default `*.R` script is available with the default MaAsLin project at `maaslin/input/maaslin_demo2.R`. This is an expert option and should only be used by someone very comfortable with the R language.
### 6. Optional step if using the sfle analysis pipeline

Place the .pcl, .read.config, and optional .R files in `../sfle/input/maaslin/input`.
### 7. Run.

**By running the commandline script:** Call the Maaslin.R script from the commandline. Please refer to the help (`-h`, `--help`) for commandline options. When running from the commandline, the PCL file must first be transposed; a script is included with MaAsLin for your convenience (`src/transpose.py`). An example call for the demo data, run from the Maaslin folder and including the transpose step, could be as follows:

```
./src/transpose.py < input/maaslin_demo2.pcl > maaslin_demo2.tsv
./src/Maaslin.R -i input/maaslin_demo2.read.config demo.text maaslin_demo2.tsv
```

**When using sfle:** Go to `../sfle` and type `scons output/maaslin`.
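For readers curious what the transpose step does, here is a minimal Python sketch of an equivalent stdin-to-stdout transpose. It assumes a rectangular, tab-delimited PCL file and is not the bundled `src/transpose.py`, whose exact behavior may differ; the script name in the comment is hypothetical.

```python
import sys


def transpose_lines(lines):
    """Return the transposed rows of a tab-delimited table.

    Blank lines are skipped; every remaining row must have the same
    number of fields, otherwise a ValueError is raised.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return []
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {number} has {len(row)} fields, expected {width}")
    return [list(column) for column in zip(*rows)]


def main():
    """Read a table on stdin and write its transpose to stdout."""
    for row in transpose_lines(sys.stdin):
        sys.stdout.write("\t".join(row) + "\n")


if __name__ == "__main__":
    # Hypothetical usage, mirroring the redirection in the example call:
    #   python transpose_sketch.py < input/maaslin_demo2.pcl > maaslin_demo2.tsv
    main()
```

Refusing ragged rows is a deliberate choice: silently padding or truncating would shift samples out of alignment with their metadata.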
### 8. Discover amazing associations in your results!
# G. Expected Output Files

The following files are generated by each MaAsLin run. In the listing below, the term projectname refers to the name of your "\*.pcl" file without the extension.

### Output files that are always created

**projectname_log.txt**

This file contains the details from the statistical engine. It can be useful for detailed troubleshooting.

**projectname-metadata.txt**

Each metadata has its own file of associations. Any associations indicated to be performed after the initial variable selection (boosting) are recorded here. The file includes the information from the final general linear model (performed after the boosting) and the FDR-corrected p-value (q-value). It can be opened as a text file or spreadsheet.

**projectname-metadata.pdf**

Any association with a q-value less than or equal to the given significance threshold is plotted here (the default is 0.25; this can be changed using the commandline argument `-d`). If this file does not exist, projectname-metadata.txt should not have any entry with a q-value less than or equal to the threshold. Factor data are plotted as notched box plots; continuous data are plotted as scatter plots with a line of best fit. Two plots are given for the MaAsLin methodology: the left is a raw data plot, and the right is the corresponding partial residual plot.

**projectname.pdf**

Contains the biplot visualization. This visualization is presented as a build and can be modified through the R script or commandline options.

**projectname.txt**

A collection of all entries in projectname-metadata.pdf. It can be opened as a text file or spreadsheet.
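The q-value threshold that decides what appears in projectname-metadata.pdf can be illustrated with a short Python sketch. It assumes Benjamini-Hochberg FDR correction (this guide only says "FDR corrected"), and the association names and p-values are made up; it shows how a threshold such as the default 0.25 (`-d`) filters associations.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so
    # q-values stay monotone in p.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    qvalues = [0.0] * n
    running_min = 1.0
    for rank_from_top, i in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending p order
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


def significant(associations, threshold=0.25):
    """Keep (name, q) pairs whose q-value is <= threshold."""
    names = [name for name, _ in associations]
    qs = bh_qvalues([p for _, p in associations])
    return [(name, q) for name, q in zip(names, qs) if q <= threshold]


# Made-up (association, p-value) pairs.
tests = [("Weight/Bacteria", 0.01), ("Age/Firmicutes", 0.04),
         ("Cohort/Bacteroidetes", 0.03), ("Age/Bacteria", 0.20)]
print(significant(tests))                  # all four pass at 0.25
print(significant(tests, threshold=0.05))  # only Weight/Bacteria
```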
### Additional troubleshooting files when using the commandline

**data.tsv**

The data matrix that was read in (transposed). Useful for making sure the correct data was read in.

**data.read.config**

Can be used to read in data.tsv.

**metadata.tsv**

The metadata that was read in (transposed). Useful for making sure the correct metadata was read in.

**metadata.read.config**

Can be used to read in metadata.tsv.

**read_merged.tsv**

The data and metadata merged (transposed). Useful for making sure the merging occurred correctly.

**read_merged.read.config**

Can be used to read in read_merged.tsv.

**read_cleaned.tsv**

The data after being read in, merged, and cleaned, written to this file for reference if needed.

**read_cleaned.read.config**

Can be used to read in read_cleaned.tsv.

**ProcessQC.txt**

Contains quality control information for the MaAsLin analysis, including the magnitude of outlier removal.

**Run_Parameters.txt**

Contains an account of all the options used when running MaAsLin, so that the exact methodology can be recreated if needed.
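The data.tsv and metadata.tsv files are meant for checking that the right samples were read in. A small Python sketch of one such check follows. It assumes each transposed file has a header row followed by one row per sample, with the sample ID in the first column; confirm this against your own files. The helper names are hypothetical.

```python
import csv


def first_column(path):
    """Return the first-column values of a tab-delimited file, skipping its header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        return [row[0] for row in reader if row]


def compare_sample_ids(data_ids, metadata_ids):
    """Report sample IDs that appear in only one of the two tables."""
    data_set, metadata_set = set(data_ids), set(metadata_ids)
    return {
        "only_in_data": sorted(data_set - metadata_set),
        "only_in_metadata": sorted(metadata_set - data_set),
    }


# Example, run from the directory holding the troubleshooting files:
# print(compare_sample_ids(first_column("data.tsv"), first_column("metadata.tsv")))
```

Empty lists in both entries mean every sample that was read has both data and metadata.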
# H. Other Analysis Flows

### 1. All versus All

The all-versus-all analysis flow is a way of controlling how metadata are used. In this method, one group of metadata is always evaluated, and the remaining metadata are added to it one at a time. For a more concrete example: you may have the metadata cage, diet, and treatment. You may always want the association with abundance evaluated while controlling for cage, but otherwise look at the metadata one at a time. In this case, the cage metadata is the "forced" part of the evaluation, while the others are not forced and are evaluated in series. The appropriate commandline arguments follow (place them in your args file if using sfle; otherwise add them to the commandline call):

> -a -F cage

`-a` indicates that all versus all is used, and `-F` indicates which metadata are forced (multiple metadata can be given comma-delimited, for example `-F metadata1,metadata2,metadata3`). This does not bypass feature selection, so the metadata that are not forced are subject to feature selection and may be removed before evaluation. If you want all of the non-forced metadata to be evaluated in series, you need to turn off feature selection, giving the combined commandline:

> -a -F cage -s none
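The set of models this produces for each feature can be pictured with a short Python sketch. It is schematic: the formulas use R-style notation, the response name `abundance` is a placeholder, and it assumes feature selection is off (`-s none`) so no non-forced metadata are dropped. MaAsLin's internal model construction may differ in detail.

```python
def all_versus_all_models(metadata, forced_arg, response="abundance"):
    """List the model formulas evaluated for one feature in all-versus-all mode.

    forced_arg uses the -F syntax (comma-delimited names). Every model keeps
    all forced metadata and adds exactly one non-forced metadata.
    """
    forced = [name.strip() for name in forced_arg.split(",") if name.strip()]
    free = [name for name in metadata if name not in forced]
    return [f"{response} ~ {' + '.join(forced + [name])}" for name in free]


for formula in all_versus_all_models(["cage", "diet", "treatment"], "cage"):
    print(formula)
# abundance ~ cage + diet
# abundance ~ cage + treatment
```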
# I. Troubleshooting

### 1. (Only valid if using sfle) ImportError: No module named sfle

When using the command `scons output/maaslin/...` to run my projects, I get the message:

```
ImportError: No module named sfle:
  File "/home/user/sfle/SConstruct", line 2:
    import sfle
```

**Solution:** You need to update your path. In a Linux or macOS terminal, in the sfle directory, type the following:

```
export PATH=/usr/local/bin:`pwd`/src:$PATH
export PYTHONPATH=$PATH
```
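To confirm that the two `export` lines took effect, you can ask Python whether it can now locate the `sfle` module. This is a hypothetical helper, not part of sfle. Run it from the same shell session after the exports, since PYTHONPATH is read when the interpreter starts.

```python
import importlib.util
import os


def pythonpath_entries():
    """Return the PYTHONPATH entries, in order, with empty entries dropped."""
    return [d for d in os.environ.get("PYTHONPATH", "").split(os.pathsep) if d]


def sfle_importable():
    """True if the current interpreter can locate a top-level module named sfle."""
    return importlib.util.find_spec("sfle") is not None


if __name__ == "__main__":
    print("PYTHONPATH entries:", pythonpath_entries())
    print("sfle importable:", sfle_importable())
```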
### 2. When trying to run a script, I am told I do not have permission even though file permissions have been set for myself.

**Solution:** Most likely, you need to make the main MaAsLin script (Maaslin.R) executable, for example by running `chmod +x src/Maaslin.R` from the maaslin directory.
# J. Installation as an Automated Pipeline

SflE (pronounced "souffle") is a framework for automation and parallelization on a multiprocessor machine. MaAsLin has been developed to be compatible with this framework. More information can be found at [http://huttenhower.sph.harvard.edu/sfle](http://huttenhower.sph.harvard.edu/sfle). To install MaAsLin in a SflE environment, first install SflE, then download or move the complete maaslin directory into `sfle/input`. After setup, place all maaslin input files in `sfle/input/maaslin/input`. To run the automated pipeline and analyze all files in the `sfle/input/maaslin/input` directory, type `scons output/maaslin` in a terminal in the sfle directory. This will produce output in the `sfle/output/maaslin` directory.
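Because the pipeline picks up every project in the input directory, it can help to check that each .pcl file has its companions before running scons. The Python sketch below is a hypothetical helper, not part of sfle or MaAsLin. It assumes the naming convention used in this guide (projectname.pcl with projectname.read.config, which the guide treats as required, plus optional projectname.R and projectname.args).

```python
from pathlib import Path

COMPANION_EXTENSIONS = (".read.config", ".R", ".args")


def companion_report(filenames):
    """Map each projectname.pcl to which companion files share its name."""
    names = set(filenames)
    report = {}
    for filename in sorted(names):
        if filename.endswith(".pcl"):
            project = filename[: -len(".pcl")]
            report[project] = {ext: project + ext in names
                               for ext in COMPANION_EXTENSIONS}
    return report


def check_input_dir(input_dir):
    """Run companion_report over the regular files in input_dir."""
    return companion_report(p.name for p in Path(input_dir).iterdir() if p.is_file())


# Example: flag projects that lack the required .read.config.
# for project, found in check_input_dir("sfle/input/maaslin/input").items():
#     if not found[".read.config"]:
#         print(f"{project}: missing {project}.read.config")
```

Keeping the name matching in a pure function over file names makes it easy to test without touching the file system.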
# K. Commandline Options (Modifying Process and Figures)

Although we recommend using the default options, commandline arguments exist to modify both the MaAsLin methodology and the figures. To see an up-to-date listing of argument usage, type `./Maaslin.R -h` in a terminal in the `maaslin/src` directory.

An additional input file (the args file) can be used to apply commandline arguments to a MaAsLin run. This is useful when using MaAsLin as an automated pipeline (using SflE) and is a way to document which commandline arguments are used for different projects. The args file should have the same name as the \*.pcl file but with the extension .args. This file should be placed in the `maaslin/input` directory with the other matching project input files. The file should contain one line of arguments and values (if needed; some arguments are logical flags and do not require a value), each separated by a space. The contents of this file are added directly to the commandline call for Maaslin.R. An example of the contents of an args file is given here.

**Example.args:**

```
-v DEBUG -d 0.1 -b 5
```

In this example, MaAsLin is modified to produce verbose output for debugging (`-v DEBUG`), to plot in the pdfs only associations with a q-value less than or equal to 0.1 (`-d 0.1`), and to plot 5 data (bug) features in the biplot (`-b 5`).
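Because the args file's contents are simply added to the Maaslin.R call, the effect can be sketched in Python with `shlex.split`. Where the arguments land (right after the script name, before `-i`) is an assumption, as are the helper names and the args file path; the other file names come from the demo call in step 7 of the run instructions.

```python
import shlex
from pathlib import Path


def read_args_file(path):
    """Return the args-file text, or an empty string if the file is absent."""
    args_path = Path(path)
    return args_path.read_text() if args_path.exists() else ""


def maaslin_command(read_config, output_file, data_tsv, args_text=""):
    """Build the Maaslin.R argv with the args-file options spliced in."""
    return (["./src/Maaslin.R"] + shlex.split(args_text)
            + ["-i", read_config, output_file, data_tsv])


if __name__ == "__main__":
    command = maaslin_command(
        "input/maaslin_demo2.read.config", "demo.text", "maaslin_demo2.tsv",
        read_args_file("input/maaslin_demo2.args"))  # hypothetical args file
    # Print the shell-quoted call; pass `command` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in command))
```

Using `shlex.split` rather than `str.split` keeps any quoted argument values intact, the same way the shell would treat them.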