--- a/readme.md	Fri May 12 11:50:14 2017 -0400
+++ b/readme.md	Thu May 18 14:09:32 2017 -0400
@@ -5,14 +5,42 @@

 GeneSeqToFamily is an open-source Galaxy workflow based on the [Ensembl GeneTrees](http://www.ensembl.org/info/genome/compara/homology_method.html) pipeline. The Ensembl GeneTrees pipeline [1] infers the evolutionary history of gene families, represented as gene trees. It is a computational pipeline that comprises clustering, multiple sequence alignment, and tree generation (using [TreeBeST](http://treesoft.sourceforge.net/treebest.shtml)) to discover familial relationships.

+## Installation
+
+To use this workflow, please [install](https://galaxyproject.org/admin/tools/add-tool-from-toolshed-tutorial/) the required tools (listed below) into Galaxy from the Galaxy ToolShed. Also [install and import](https://galaxyproject.org/toolshed/workflow-sharing/#finding-workflows-in-toolshed-repositories) the workflow from the Galaxy ToolShed.
+
+### List of required tools
+The 3 workflows in this repository require Galaxy tools from the following ToolShed repositories:
+
+* [emboss_5](https://toolshed.g2.bx.psu.edu/view/devteam/emboss_5/)
+* [ncbi_blast_plus](https://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/)
+* [blast_parser](https://toolshed.g2.bx.psu.edu/view/earlhaminst/blast_parser/)
+* [hcluster_sg](https://toolshed.g2.bx.psu.edu/view/earlhaminst/hcluster_sg/)
+* [hcluster_sg_parser](https://toolshed.g2.bx.psu.edu/view/earlhaminst/hcluster_sg_parser/)
+* [t_coffee](https://toolshed.g2.bx.psu.edu/view/earlhaminst/t_coffee/)
+* [filter_by_fasta_ids](https://toolshed.g2.bx.psu.edu/view/galaxyp/filter_by_fasta_ids/)
+* [treebest_best](https://toolshed.g2.bx.psu.edu/view/earlhaminst/treebest_best)
+* [gafa](https://toolshed.g2.bx.psu.edu/view/earlhaminst/gafa/)
+* [fasta_to_tabular](https://toolshed.g2.bx.psu.edu/view/devteam/fasta_to_tabular/)
+* [text_processing](https://toolshed.g2.bx.psu.edu/view/bgruening/text_processing/)
+* [uniprot_rest_interface](https://toolshed.g2.bx.psu.edu/view/bgruening/uniprot_rest_interface/)
+* [suite_ensembl_rest](https://toolshed.g2.bx.psu.edu/view/earlhaminst/suite_ensembl_rest/)
+
+Helper tools for data preparation:
+
+* [ensembl_longest_cds_per_gene](https://toolshed.g2.bx.psu.edu/view/earlhaminst/ensembl_longest_cds_per_gene/)
+* [ete](https://toolshed.g2.bx.psu.edu/view/earlhaminst/ete/)
+* [gstf_preparation](https://toolshed.g2.bx.psu.edu/view/earlhaminst/gstf_preparation/) - to convert gene feature files from GFF3 and/or JSON format to SQLite and format CDS sequence headers
+
+
 ## Workflow inputs and steps

 ### Inputs
 GeneSeqToFamily requires the following inputs:

-* The Coding Sequence (CDS)
-* a species tree
-* gene feature information in JSON format
+* the coding sequences (CDS) in FASTA format (these can be prepared with the GeneSeqToFamily preparation tool)
+* gene feature information in SQLite format (this can be generated with the GeneSeqToFamily preparation tool)
+* a species tree in Newick format (this can be generated with the ete tool in Galaxy)

 ### Steps

@@ -29,7 +57,7 @@

 ### Helper tools:

-We have developed various tools to help with data preparation for the workflow. This includes tools for retrieving sequences, and features from Ensembl using its REST API, and tools to parse Ensembl results into the required formats for the workflow. We also developed a tool to merge gene feature files and convert them from GFF3 (Gene Feature File) to JSON format, which is then used to generate the Aequatus dataset.
+We have developed various tools to help with data preparation for the workflow. These include tools for retrieving sequences and features from Ensembl using its REST API, and tools to parse Ensembl results into the formats required by the workflow. We also developed a tool to merge gene feature files and convert them from GFF3 (General Feature Format) and/or JSON format to an SQLite database, which is then used to generate the Aequatus dataset.


 ## Results
@@ -38,28 +66,7 @@

 The Aequatus.js plugin provides an interactive visual representation of the phylogenetic and structural relationships among the homologous genes, using a shared colour scheme for coding regions to represent homology in internal gene structure alongside their corresponding gene trees. It is also able to indicate insertions and deletions in homologous genes with respect to shared ancestors.

-## List of tools
-GeneSeqToFamily requires the following tools to run the workflow successfully:

-* Transeq
-* Filter by FASTA IDs
-* BLAST
-* BLAST parser
-* hcluster_sg
-* hcluster_sg parser
-* T-Coffee
-* Tranalign
-* TreeBeST
-* Gene Alignment and Family Aggregator (GAFA)
-
-Some tools for data conversion during workflow:
-
-* cut
-
-Helper tools for data preparation:
-
-* Ensembl REST API - tools for retrieving sequences, and features from Ensembl using its [REST API](http://rest.ensembl.org/)
-* gff3-to-json - to merge gene feature files and convert them from GFF3 (Gene Feature File) to JSON format


 ## References
@@ -67,10 +74,14 @@
 1. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E (2009) [EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.](http://genome.cshlp.org/content/19/2/327) *Genome Res.* 19(2):327–335, doi: 10.1101/gr.073585.107
 2. Thanki AS, Ayling S, Herrero J, Davey RP (2016) [Aequatus: An open-source homology browser.](http://biorxiv.org/content/early/2016/06/01/055632) *bioRxiv*, doi: 10.1101/055632

+## Pre-print
+
+The pre-print for this work is available on [bioRxiv](http://biorxiv.org/content/early/2017/04/19/096529).
+
 ## Project contacts:

 * Anil Thanki <Anil.Thanki@earlham.ac.uk>
 * Nicola Soranzo <Nicola.Soranzo@earlham.ac.uk>
 * Robert Davey <Robert.Davey@earlham.ac.uk>

-Copyright &copy; 2016 Earlham Institute, Norwich, UK
+Copyright &copy; 2016-2017 Earlham Institute, Norwich, UK