annotate maaslin-4450aa4ecc84/README.md @ 1:a87d5a5f2776

Uploaded the version running on the prod server
author george-weingart
date Sun, 08 Feb 2015 23:08:38 -0500
MaAsLin User Guide v3.1
=======================

September 2013 - Updated April 2014 for Galaxy

Timothy Tickle and Curtis Huttenhower

Table of Contents
-----------------

A. Introduction to MaAsLin
B. Related Projects and Scripts
C. Installing MaAsLin
D. MaAsLin Inputs
E. Process Flow Overview
F. Process Flow Detail
G. Expected Output Files
H. Troubleshooting
I. Installation as an Automated Pipeline
J. Commandline Options (Modifying Process and Figures)

# A. Introduction to MaAsLin

MaAsLin is a multivariate statistical framework that finds
associations between clinical metadata and potentially
high-dimensional experimental data. MaAsLin performs boosted additive
general linear models between one group of data (the metadata, or
predictors) and another group (in our case relative taxonomic
abundances, the response). In our context we use it to discover
associations between clinical metadata and microbial community
relative abundance or function; however, it is applicable to other
data types.

Metagenomic data are sparse, so boosting is used to select metadata
that show some potential to be useful in a linear model between the
metadata and abundances. In the context of metadata and community
abundance, a sample's metadata is boosted for one Operational
Taxonomic Unit (OTU) (Yi). The metadata that are selected by boosting
are then used in a general linear model, with each combination of
metadata (as predictors) and OTU abundance (as the response
variable). This occurs for every OTU and metadata combination. Because
we work with proportional data, the Yi (abundances) are
`arcsin(sqrt(Yi))` transformed. The final formula is as follows:

![](https://bitbucket.org/biobakery/maaslin/downloads/maaslinformula2.png)

For more information about MaAsLin please visit
[http://huttenhower.sph.harvard.edu/maaslin](http://huttenhower.sph.harvard.edu/maaslin).


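As a small illustration of the `arcsin(sqrt(Yi))` transformation mentioned in this section, the following Python sketch applies it to a list of relative abundances. It is not MaAsLin's own code (MaAsLin itself runs in R); the function name `arcsin_sqrt` and the sample values are hypothetical and chosen only for this example.

```python
import math


def arcsin_sqrt(proportions):
    """Apply the arcsine square-root transform to values in [0, 1]."""
    transformed = []
    for p in proportions:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"expected a proportion in [0, 1], got {p}")
        transformed.append(math.asin(math.sqrt(p)))
    return transformed


# Relative abundances of one hypothetical taxon across four samples.
abundances = [0.0, 0.25, 0.5, 1.0]
print(arcsin_sqrt(abundances))
```

The transform maps proportions in [0, 1] onto [0, pi/2], which is why the input check rejects values outside that range.
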
# B. Related Projects and Scripts

Other projects exist at bitbucket.org that may help in your
analysis:

* **QiimeToMaAsLin** is a project that reformats abundance files from
Qiime for MaAsLin. Several formats of Qiime consensus lineages are
supported. To download it, please visit
[https://bitbucket.org/timothyltickle/qiimetomaaslin](https://bitbucket.org/timothyltickle/qiimetomaaslin).

* **merge_metadata.py** is a script included in the MaAsLin project to
generically merge a metadata file with a table of microbial (or
other) measurements. This script is located in `maaslin/src` and
is documented in `maaslin/doc/Merge_Metadata_Read_Me.txt`.


# C. Installing MaAsLin

R Libraries: Several libraries need to be installed in R. They are
the following:

* agricolae, gam, gamlss, gbm, glmnet, inlinedocs, logging, MASS, nlme, optparse, outliers, penalized, pscl, robustbase, testthat, vegan

You can install them by typing R in a terminal and using the
install.packages command:

    install.packages(c('agricolae', 'gam', 'gamlss', 'gbm', 'glmnet', 'inlinedocs', 'logging', 'MASS', 'nlme', 'optparse', 'outliers', 'penalized', 'pscl', 'robustbase', 'testthat', 'vegan'))

# D. MaAsLin Inputs

There are 3 input files for each project: the "\*.read.config" file, the "\*.pcl" file, and the "\*.R" script. (If using the sfle automated pipeline, the "\*" in the file names can be anything but needs to be identical for all three files, and all three files need to be in the `../sfle/input/maaslin/input` folder. The folder requirement applies only when using sfle.) Details of each file follow:

### 1\. "\*.pcl"

Required input file. A PCL file contains all the data
and metadata. It is formatted so that metadata and data (OTUs or
bugs) are rows and samples are columns. All metadata rows should come
before any abundance data. The file should be a tab-delimited
text file with the extension ".pcl".

### 2\. "\*.read.config"

Required input file. A read config file indicates which data are read from a PCL file without having to change the PCL file or change code. This means one can have a PCL file that is a superset of metadata and abundances, including data you are not interested in for the run. This file is a text file with ".read.config" as an extension. It is described in detail in section **F. Process Flow Detail**, subsection **4. Create your read.config file**.

### 3\. "\*.R"

Optional input file. The R script uses a callback programming pattern that allows one to add or modify specific code to customize the analysis without touching the main MaAsLin engine. A generic R script, "maaslin_demo2.R", is provided and can be renamed and used for any study. The R script can be modified to add quality control or formatting of data, add ecological measurements, or make other changes to the underlying data before MaAsLin runs on it. This file is not required to run MaAsLin.

# E. Process Flow Overview

1. Obtain your abundance or relative function table.
2. Obtain your metadata.
3. Format and combine your abundance table and metadata as a pcl file for MaAsLin.
4. Create your read.config file.
5. Create your R script or use the default.
6. Place .pcl, .read.config, .R files in `../sfle/input/maaslin/input/` (sfle only).
7. Run.
8. Discover amazing associations in your results!

# F. Process Flow Detail

### 1\. Obtain your abundance or relative function table.

Abundance tables are normally derived from sequence data using
*Mothur*, *Qiime*, *HUMAnN*, or *MetaPhlAn*. Please refer to their documentation
for further details.

### 2\. Obtain your metadata.

Metadata are information about the samples in the study. For
instance, one may analyze a case/control study. In this study, you
may have a disease and healthy group (disease state), the sex of the
patients (patient demographics), medication use (chemical treatment),
smoking (patient lifestyle), or other types of data. All of the aforementioned
data would be study metadata. This section can have any type of data
(factor, ordered factor, continuous, integer, or logical
variables). If a metadata value is missing for a sample, please
write NA. It is preferable to write NA so that, when
looking at the data, it is understood that the value is missing and its
absence is intentional and not a mistake. Often investigators are
interested in genetic measurements; these may also be placed in the
metadata section to associate with bugs.

If you do not want to manually add metadata to your abundance
table, you may be interested in associated tools or scripts that help
combine your abundance table and metadata to create your pcl
file. Both require a specific format for your metadata file. Please
see the documentation for *QiimeToMaAsLin* or *merge_metadata.py* (for
more details see section B).

### 3\. Format and combine your abundance table and metadata as a pcl file for *MaAsLin*.

Please note that two tools have been developed to help you! If you are
working from a Qiime OTU output and have a metadata text file, try using
*QiimeToMaAsLin*, found at Bitbucket. If you have a tab-delimited file
that matches the .pcl description below (for instance MetaPhlAn
output), use the merge_metadata.py script provided in this project
(`maaslin/src/merge_metadata.py`) and documented in
`maaslin/doc/Merge_Metadata_Read_Me.txt`.

### PCL format description:

i. Row 1 is expected to be sample IDs, with the first column holding a feature name to identify the row, for example "ID".

ii. Rows of metadata. Each row is one metadata, the first column entry
being the name of the metadata and each following column being the
metadata value for each sample.

iii. Rows of taxa/OTU abundance. Each row is one taxon/OTU, the first
column entry being the name of the taxon/OTU followed by abundances of
the taxon/OTU per sample.

iv. Abundances should be normalized by dividing each abundance measurement by the sum of the column (sample) abundances.

v. Here is an example of the contents of an extremely small pcl file;
another example can be found in this project at
`maaslin/input/maaslin_demo.pcl`.


    ID         Sample1  Sample2  Sample3  Sample4
    metadata1  True     True     False    False
    metadata2  1.23     2.34     3.22     3.44
    metadata3  Male     Female   Male     Female
    taxa1      0.022    0.014    0.333    0.125
    taxa2      0.406    0.029    0.166    0.300
    taxa3      0.571    0.955    0.500    0.575


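The PCL layout and the column-sum normalization described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of MaAsLin and not a replacement for *merge_metadata.py*: the helper names `normalize_columns` and `write_pcl`, and the tiny data set, are hypothetical.

```python
import csv
import io


def normalize_columns(rows):
    """Divide each value by the sum of its column (sample)."""
    n_samples = len(next(iter(rows.values())))
    totals = [sum(values[i] for values in rows.values()) for i in range(n_samples)]
    if any(total == 0 for total in totals):
        raise ValueError("a sample has zero total abundance")
    return {
        name: [value / totals[i] for i, value in enumerate(values)]
        for name, values in rows.items()
    }


def write_pcl(handle, sample_ids, metadata, abundances):
    """Write a tab-delimited PCL: ID row, metadata rows, then abundance rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID"] + list(sample_ids))
    for name, values in metadata.items():
        # Missing metadata values are written explicitly as NA.
        writer.writerow([name] + ["NA" if v is None else v for v in values])
    for name, values in abundances.items():
        writer.writerow([name] + [f"{v:.3f}" for v in values])


# Hypothetical raw counts for two taxa in two samples.
samples = ["Sample1", "Sample2"]
metadata = {"metadata1": [True, False], "metadata3": ["Male", None]}
counts = {"taxa1": [1, 3], "taxa2": [3, 1]}

buffer = io.StringIO()
write_pcl(buffer, samples, metadata, normalize_columns(counts))
print(buffer.getvalue())
```

Writing to a file instead of an in-memory buffer only requires passing an open file handle (for example one opened with `newline=""`) in place of `buffer`.
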
### 4\. Create your read.config file.

A \*.read.config file is a structured text file used to indicate which
data in a \*.pcl file should be read into MaAsLin and used for
analysis. This allows one to keep the \*.pcl file intact while
varying the analysis, which helps avoid errors that may be introduced
while manipulating the pcl files.

Here is an example of the contents of a \*.read.config file.

    Matrix: Metadata
    Read_PCL_Columns: Sample2-Sample15
    Read_PCL_Rows: Age-Height,Weight,Sex,Cohort-Profession

    Matrix: Abundance
    Read_PCL_Columns: Sample2-Sample15
    Read_PCL_Rows: Bacteria-Bug100

The minimal requirement for a MaAsLin .read.config file is as
follows. The "Matrix:" entry should be specified; it needs to be named
"Metadata" for the metadata section and "Abundance" for the abundance
section. "Read\_PCL\_Rows:" is used to indicate which rows are data or
metadata to be analyzed. Rows can be identified by their metadata/data
id. Separate ids by commas. If there is a consecutive group of
metadata/data, a range of rows can be defined by indicating the first
and last id separated by a "-". If the beginning or ending id
surrounding the "-" is missing, the rows are read from the beginning or to
the end of the pcl file, respectively.

A minimal example is shown here:

    Matrix: Metadata
    Read_PCL_Rows: -Weight

    Matrix: Abundance
    Read_PCL_Rows: Bacteria-

With this minimal example, the file delimiter is assumed to be a tab
and all columns are read (since none are specified here). Metadata are
read as all rows from the beginning of the pcl file (skipping the first
Sample ID row) through Weight; data are read as all rows from Bacteria
to the end of the pcl file. This example refers to the default input
files given in the MaAsLin download as maaslin_demo2.\*.

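The range rules described above can be sketched in a few lines of Python. This is only an illustration of the described semantics, not MaAsLin's own parser. The function name `resolve_rows` and the row ids other than Weight and Bacteria are hypothetical, and the sketch assumes that ids contain no commas or hyphens and that the Sample ID row has already been dropped.

```python
def resolve_rows(spec, row_ids):
    """Expand a Read_PCL_Rows-style spec into an ordered list of row ids.

    Illustrative only: assumes ids contain no ',' or '-' characters and
    that row_ids excludes the leading Sample ID row.
    """
    selected = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            first, last = (part.strip() for part in token.split("-", 1))
            # A missing id means "from the beginning" or "to the end".
            start = row_ids.index(first) if first else 0
            stop = row_ids.index(last) if last else len(row_ids) - 1
            selected.extend(row_ids[start:stop + 1])
        else:
            if token not in row_ids:
                raise ValueError(f"unknown row id: {token}")
            selected.append(token)
    return selected


# Hypothetical row order for a small pcl file (Sample ID row removed).
rows = ["Age", "Sex", "Weight", "Bacteria", "Bacteria|Firmicutes"]

metadata_rows = resolve_rows("-Weight", rows)
abundance_rows = resolve_rows("Bacteria-", rows)
print(metadata_rows)   # ['Age', 'Sex', 'Weight']
print(abundance_rows)  # ['Bacteria', 'Bacteria|Firmicutes']
```

An open-ended range such as `-Weight` therefore depends on the row order in the pcl file, which is why metadata rows are usually grouped at the top.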
### 5\. Optionally, create your R script.

The R script is used to add code that manipulates your data before
analysis and to adjust the multifactorial analysis figure. A default
"\*.R" script is available with the default MaAsLin project at
maaslin/input/maaslin_demo2.R. This is an expert option and should
only be used by someone very comfortable with the R language.

### 6\. Optional step if using the sfle analysis pipeline. Place .pcl, .read.config, and optional .R files in `../sfle/input/maaslin/input`

### 7\. Run.

By running the commandline script:
Call the Maaslin.R script from the commandline; see the help (-h, --help) for the available options. When running from the commandline, the PCL file must first be transposed. A script for this is included with MaAsLin (src/transpose.py), and the example here includes such a call. An example call from the MaAsLin folder for the demo data could be as follows.

    ./src/transpose.py < input/maaslin_demo2.pcl > maaslin_demo2.tsv
    ./src/Maaslin.R -i input/maaslin_demo2.read.config demo.text maaslin_demo2.tsv

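For orientation, transposing a tab-delimited table amounts to swapping its rows and columns. The sketch below shows the idea in Python. It is not the bundled src/transpose.py; the function name `transpose_tsv` and the toy table are hypothetical, and every row is assumed to have the same number of fields.

```python
import csv
import io


def transpose_tsv(text):
    """Return tab-delimited text with rows and columns swapped.

    Assumes every row has the same number of fields.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(zip(*rows))
    return out.getvalue()


# Toy pcl-style table: samples in columns, one row per metadata/feature.
pcl = "sample\tS1\tS2\nWeight\t70\t82\nBacteria\t0.4\t0.6\n"
tsv = transpose_tsv(pcl)
print(tsv)
```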
When using sfle:
Go to ../sfle and type the following: `scons output/maaslin`

### 8\. Discover amazing associations in your results!

# G. Expected Output Files

The following files are generated for each MaAsLin run. In this
listing, projectname refers to the name of your "\*.pcl" file without the extension.

### Output files that are always created:

**projectname_log.txt**

This file contains detailed logging from the statistical engine, which
can be useful for in-depth troubleshooting.

**projectname-metadata.txt**

Each metadata has its own file of associations. Every association
tested after the initial variable selection (boosting) is recorded
here. Included is the information from the final general linear model
(performed after the boosting) and the FDR-corrected p-value
(q-value). The file can be opened as a text file or spreadsheet.

**projectname-metadata.pdf**

Any association with a q-value less than or equal to the significance
threshold is plotted here (the default is 0.25; it can be changed
using the commandline argument -d). If this file does not exist,
projectname-metadata.txt should contain no entry with a q-value less
than or equal to the threshold. Factor data are plotted as notched box
plots; continuous data are plotted as a scatter plot with a line of
best fit. Two plots are given for the MaAsLin methodology: the left is
a raw data plot and the right is the corresponding partial residual
plot.

**projectname.pdf**

Contains the biplot visualization. This visualization is presented as a build and can be modified through the R script or through commandline options.

**projectname.txt**

A collection of all entries in the projectname-metadata.pdf files. It
can be opened as a text file or spreadsheet.

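The naming scheme and the threshold rule above can be summarized in a short Python sketch. It is an illustration rather than part of MaAsLin: the function name `expected_outputs` and the example q-values are hypothetical, and it assumes that "metadata" in the file names stands for the name of each metadata.

```python
def expected_outputs(project, q_values_by_metadata, threshold=0.25):
    """List the output file names described above for one run.

    q_values_by_metadata maps each metadata name to the q-values of its
    associations. Assumes "metadata" in the file names stands for the
    metadata's own name.
    """
    files = [f"{project}_log.txt", f"{project}.pdf", f"{project}.txt"]
    for name, q_values in q_values_by_metadata.items():
        files.append(f"{project}-{name}.txt")
        # The per-metadata PDF is expected only when at least one
        # association meets the threshold (q-value <= threshold).
        if any(q <= threshold for q in q_values):
            files.append(f"{project}-{name}.pdf")
    return files


# Hypothetical q-values for two metadata of the demo project.
demo = {"Weight": [0.03, 0.40], "Sex": [0.60]}
for name in expected_outputs("maaslin_demo2", demo):
    print(name)
```

With these made-up values a PDF is listed for Weight but not for Sex, since no Sex association reaches the default threshold of 0.25.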
### Additional troubleshooting files when using the commandline:

**data.tsv**

The data matrix that was read in (transposed). Useful for making sure
the correct data was read in.

**data.read.config**

Can be used to read in the data.tsv.

**metadata.tsv**

The metadata that was read in (transposed). Useful for making sure the
correct metadata was read in.

**metadata.read.config**

Can be used to read in the metadata.tsv.

**read_merged.tsv**

The data and metadata merged (transposed). Useful for making sure the
merging occurred correctly.

**read_merged.read.config**

Can be used to read in the read_merged.tsv.

**read_cleaned.tsv**

The data read in, merged, and then cleaned. After this process the
data is written to this file for reference if needed.

**read_cleaned.read.config**

Can be used to read in read_cleaned.tsv.

**ProcessQC.txt**

Contains quality control for the MaAsLin analysis. This includes
information on the magnitude of outlier removal.

**Run_Parameters.txt**

Contains an account of all the options used when running MaAsLin so the exact methodology can be recreated if needed.

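When troubleshooting, it can help to check quickly which of the files listed above were actually written. The following Python sketch does that for a given directory. The function name is hypothetical, and the directory name `output` is only a placeholder, since this guide does not state where a commandline run writes these files.

```python
from pathlib import Path

# File names as listed in this section.
TROUBLESHOOTING_FILES = [
    "data.tsv", "data.read.config",
    "metadata.tsv", "metadata.read.config",
    "read_merged.tsv", "read_merged.read.config",
    "read_cleaned.tsv", "read_cleaned.read.config",
    "ProcessQC.txt", "Run_Parameters.txt",
]


def missing_troubleshooting_files(directory):
    """Return the listed file names that are not present in directory."""
    directory = Path(directory)
    return [name for name in TROUBLESHOOTING_FILES
            if not (directory / name).is_file()]


# "output" is a placeholder; point it at wherever your run wrote its files.
print(missing_troubleshooting_files("output"))
```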
# H. Other Analysis Flows

### 1\. All versus All

The all versus all analysis flow is a way of manipulating how metadata are used. In this method one group of metadata is always evaluated, while the remaining metadata are added to that group one at a time. As a concrete example, suppose you have the metadata cage, diet, and treatment. You may want the association with abundance always evaluated controlling for cage, while otherwise looking at the metadata one at a time. In this case cage is the "forced" part of the evaluation, while the others are not forced and are evaluated in serial. The commandline arguments for this follow (placed in your args file if using sfle, otherwise added to the commandline call):

> -a -F cage

-a indicates that all versus all is being used, and -F indicates which metadata are forced (multiple metadata can be given comma-delimited, for example -F metadata1,metadata2,metadata3). This does not bypass feature selection, so metadata that are not forced are still subject to feature selection and may be removed before evaluation. If you want all of the non-forced metadata to be evaluated in serial, you will need to turn off feature selection, giving the final combined commandline shown here:

> -a -F cage -s none

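If the arguments are assembled by a script, the two variants above can be built as follows. This is a minimal Python sketch that uses only the flags named in this section (-a, -F, and -s none); the function name `all_versus_all_args` is hypothetical, and at least one forced metadata name is assumed.

```python
def all_versus_all_args(forced, feature_selection=True):
    """Build the argument list for an all versus all run.

    forced is a non-empty list of metadata names, passed comma-delimited
    to -F. With feature_selection=False, "-s none" is appended so that
    the non-forced metadata are not removed by feature selection.
    """
    args = ["-a", "-F", ",".join(forced)]
    if not feature_selection:
        args += ["-s", "none"]
    return args


print(" ".join(all_versus_all_args(["cage"])))
# -a -F cage
print(" ".join(all_versus_all_args(["cage"], feature_selection=False)))
# -a -F cage -s none
```

The resulting line can be placed in the args file when using sfle, or added to the commandline call otherwise.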
# I. Troubleshooting

### 1\. (Only valid if using Sfle) ImportError: No module named sfle

When using the command "scons output/maaslin/..." to run my projects I
get the message:

    ImportError: No module named sfle:
      File "/home/user/sfle/SConstruct", line 2:
        import sfle

**Solution:** You need to update your path. On a Linux or macOS
terminal, in the sfle directory, type the following:

    export PATH=/usr/local/bin:`pwd`/src:$PATH
    export PYTHONPATH=$PATH

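The same two settings can be computed from Python, for example when launching scons from a wrapper script. The sketch below mirrors the two export lines for a POSIX system; the function name `sfle_environment` is hypothetical, and `/home/user/sfle` is simply the path from the error message above.

```python
import os


def sfle_environment(sfle_dir, base_env=None):
    """Return a copy of the environment with PATH and PYTHONPATH set
    the same way as the two export lines above (POSIX ':' separator)."""
    env = dict(os.environ if base_env is None else base_env)
    parts = ["/usr/local/bin", f"{sfle_dir}/src"]
    if env.get("PATH"):
        parts.append(env["PATH"])
    env["PATH"] = ":".join(parts)
    env["PYTHONPATH"] = env["PATH"]
    return env


env = sfle_environment("/home/user/sfle", {"PATH": "/usr/bin"})
print(env["PATH"])        # /usr/local/bin:/home/user/sfle/src:/usr/bin
print(env["PYTHONPATH"])  # same value as PATH
```

The returned mapping could be passed as the `env` argument of `subprocess.run`.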
### 2\. When trying to run a script I am told I do not have permission even though file permissions have been set for myself.

**Solution:** Most likely, you need to make the main MaAsLin script
(Maaslin.R) executable.

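Setting the executable bit is normally done with `chmod` in a terminal; the equivalent in Python is sketched below. The function name `make_executable` is hypothetical, and the script path is the one used in the example call in this guide, so adjust it to your own checkout.

```python
import os
import stat


def make_executable(path):
    """Add the execute bit for the file's owner, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


# Path as used in the example call in this guide; adjust to your checkout.
script = "src/Maaslin.R"
if os.path.exists(script):
    make_executable(script)
```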
# J. Installation as an Automated Pipeline

SflE (pronounced souffle) is a framework for automation and
parallelization on a multiprocessor machine. MaAsLin has been
developed to be compatible with this framework. More information can
be found at
[http://huttenhower.sph.harvard.edu/sfle](http://huttenhower.sph.harvard.edu/sfle).
To install MaAsLin in a SflE environment, first install SflE, then
download or move the complete maaslin directory into `sfle/input`.
Once this is set up, place all MaAsLin input files in
`sfle/input/maaslin/input`. To run the automated pipeline and analyze
all files in the `sfle/input/maaslin/input` directory, type
`scons output/maaslin` in a terminal in the sfle directory. This will
produce output in the `sfle/output/maaslin` directory.

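Placing a project's input files can also be scripted. The Python sketch below copies whichever of a project's files exist into `sfle/input/maaslin/input`. It is an illustration under stated assumptions: the function name `stage_project` is hypothetical, the file extensions are the ones named in this guide (.R and .args being optional), and the maaslin directory is assumed to be in place already.

```python
import shutil
from pathlib import Path

# Extensions named in this guide; .R and .args are optional.
PROJECT_EXTENSIONS = [".pcl", ".read.config", ".R", ".args"]


def stage_project(project, source_dir, sfle_dir):
    """Copy a project's input files into sfle/input/maaslin/input.

    Returns the names of the files that were copied.
    """
    target = Path(sfle_dir) / "input" / "maaslin" / "input"
    if not target.is_dir():
        # The maaslin directory must already be installed under sfle/input.
        raise FileNotFoundError(f"missing directory: {target}")
    copied = []
    for ext in PROJECT_EXTENSIONS:
        source = Path(source_dir) / f"{project}{ext}"
        if source.is_file():
            shutil.copy(source, target / source.name)
            copied.append(source.name)
    return copied
```

A call such as `stage_project("maaslin_demo2", "maaslin/input", "../sfle")` (paths are placeholders) would then be followed by running `scons output/maaslin` in the sfle directory as described above.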
# K. Commandline Options (Modifying Process and Figures)

Although we recommend the use of default options, commandline
arguments exist to modify both MaAsLin methodology and figures. To see
an up-to-date listing of argument usage, in a terminal in the
`maaslin/src` directory type `./Maaslin.R -h`.

An additional input file (the args file) can be used to apply
commandline arguments to a MaAsLin run. This is useful when using
MaAsLin as an automated pipeline (using SflE) and is a way to document
which commandline arguments are used for different projects. The args
file should have the same name as the \*.pcl file but with the
extension .args. Place it in the `maaslin/input` directory with the
other matching project input files. The file should contain a single
line of arguments and values (where needed; some arguments are logical
flags and do not require a value), each separated by a space. The
contents of this file are added directly to the commandline call for
Maaslin.R. An example of the contents of an args file is given here.

**Example.args:**

    -v DEBUG -d 0.1 -b 5

In this example MaAsLin is set to produce verbose output for
debugging (-v DEBUG), to change the threshold for making pdfs to a
q-value equal to or less than 0.1 (-d 0.1), and to plot
5 data (bug) features in the biplot (-b 5).

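Because the args file is a single line of space-separated tokens, it can be read with shell-style splitting. The Python sketch below shows one way to do this. The function name `read_args_line` is hypothetical, and how MaAsLin or SflE reads the file internally is not described here.

```python
import shlex


def read_args_line(text):
    """Return the tokens from the first non-empty line of an args file."""
    for line in text.splitlines():
        if line.strip():
            return shlex.split(line)
    return []


tokens = read_args_line("-v DEBUG -d 0.1 -b 5\n")
print(tokens)  # ['-v', 'DEBUG', '-d', '0.1', '-b', '5']
```

These tokens correspond to the arguments that would be added to the Maaslin.R call.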