comparison tutorial.md @ 6:266800d51605 draft

"planemo upload for repository https://github.com/TGAC/earlham-galaxytools/tree/master/workflows/GeneSeqToFamily commit 3dbeddc06c9d15aadcc66a7eb7376c29da9233a3"
author earlhaminst
date Thu, 10 Jun 2021 16:18:59 +0000
5:06470b2e491f 6:266800d51605
1 # Introduction
2
3 This tutorial explains how to use the GeneSeqToFamily Galaxy workflow, published in Thanki et al. (2018), "GeneSeqToFamily: a Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline", https://doi.org/10.1093/gigascience/giy005
4
5 ## Galaxy
6 If you are new to Galaxy, familiarise yourself with it using these [slides](https://training.galaxyproject.org/training-material/topics/introduction/slides/introduction.html#1) and this [hands-on tutorial](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html).
7
8 ## GeneSeqToFamily workflow
9
10 The GeneSeqToFamily workflow can be either installed from the Galaxy ToolShed, or downloaded from https://github.com/TGAC/earlham-galaxytools/tree/master/workflows/GeneSeqToFamily and then imported into a local Galaxy or a public instance where the necessary tools are installed, e.g. [Galaxy Europe](https://usegalaxy.eu).
11
12
13 # Importing input data
14
15 ### Hands-on: Data upload
16 1. Make sure you have an empty analysis history. Give it a name.
17 ### Tip: Starting a new history
18 * Click the gear icon at the top of the history panel
19 * Select the option Create New from the menu
20
21
22 2. Import the sample data
23 * FASTA file: [`CDS.fasta`](https://doi.org/10.5281/zenodo.1256760)
24 * JSON file: [`gene.json`](https://doi.org/10.5281/zenodo.1256762)
25 * Species tree: [`species.nhx`](https://doi.org/10.5281/zenodo.1256753)
26 ### Tip: Importing data via links
27 * Copy the link locations
28 * Open the Galaxy Upload Manager
29 * Select Paste/Fetch Data
30 * Paste the link into the text field
31 * Press Start
32 ### Tip: Change the file type to nhx once the data file is in your history
33 * Click the pencil button displayed on your dataset in the history
34 * Choose Datatype at the top
35 * Select nhx
36 * Press Save
37
38 ### Tip: Renaming a dataset
39 By default, when data is imported via its link, Galaxy names the dataset after its URL.
40
41 Rename the dataset to something more meaningful, e.g. “First dataset”.
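
The upload steps above can also be scripted. The following is a minimal sketch, assuming the BioBlend Python library is installed and that you have an API key for your Galaxy instance. The helper name `upload_by_url`, the history name, and the URLs in the usage comment are illustrative placeholders, not part of the tutorial data. The client object is passed in as a parameter, so the function only relies on the `histories.create_history` and `tools.put_url` methods of a BioBlend `GalaxyInstance`.

```python
from typing import Any, Dict


def upload_by_url(gi: Any, history_name: str,
                  datasets: Dict[str, Dict[str, str]]) -> str:
    """Create a new history and ask Galaxy to fetch each dataset from its URL.

    `gi` is expected to behave like bioblend.galaxy.GalaxyInstance.
    `datasets` maps a dataset name to {"url": ..., "file_type": ...};
    "file_type" is optional and defaults to "auto" (Galaxy guesses the type).
    Returns the ID of the newly created history.
    """
    history = gi.histories.create_history(name=history_name)
    for name, spec in datasets.items():
        gi.tools.put_url(
            spec["url"],
            history["id"],
            file_name=name,  # avoids the dataset being named after its URL
            file_type=spec.get("file_type", "auto"),
        )
    return history["id"]


# Hypothetical usage (URL, key and links are placeholders):
# from bioblend.galaxy import GalaxyInstance
# gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")
# upload_by_url(gi, "GeneSeqToFamily tutorial", {
#     "CDS.fasta": {"url": "<direct link to CDS.fasta>"},
#     "species.nhx": {"url": "<direct link to species.nhx>", "file_type": "nhx"},
# })
```

Setting `file_name` and `file_type` at upload time covers the renaming and datatype tips in one step.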
42
43 # Data Preparation
44
45 The uploaded data must first be converted into the format accepted by the GeneSeqToFamily workflow.
46
47 ## GeneSeqToFamily preparation
48 GeneSeqToFamily preparation is a Galaxy tool that converts genomic information from GFF/JSON format to SQLite format for easy access during the workflow. It can also add species information to the header line of the FASTA sequences.
49
50 ### Hands-on: Run GeneSeqToFamily preparation on the imported GFF/JSON and FASTA files
51 1. GeneSeqToFamily preparation
52 * Select the JSON and/or GFF files
53 * Add the species name (in the case of GFF files)
54 * Corresponding CDS datasets in FASTA format: select all FASTA datasets
55 * Which transcripts to keep: Only canonical transcripts (or longest CDS per gene)
56 * Change the header line of the FASTA sequences to the following format: TranscriptId_species
57 * Comma-separated list of region IDs (e.g. chromosomes or scaffolds) for which FASTA sequences should be filtered:
58 * Run tool
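
To make the "longest CDS per gene" and `TranscriptId_species` options more concrete, here is a minimal sketch of the idea in plain Python. It is not the tool's implementation: the function names, the transcript-to-gene mapping passed in as a dictionary, and the species label are all assumptions made for illustration. The real tool reads this information from the GFF/JSON input.

```python
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the first whitespace-delimited token of the header.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def prepare_cds(fasta_text: str, transcript_to_gene: Dict[str, str],
                species: str) -> str:
    """Keep the longest CDS per gene and rename it to TranscriptId_species.

    `transcript_to_gene` is a hypothetical mapping supplied by the caller.
    """
    longest: Dict[str, Tuple[str, str]] = {}
    for transcript_id, seq in read_fasta(fasta_text):
        gene_id = transcript_to_gene[transcript_id]
        current = longest.get(gene_id)
        if current is None or len(seq) > len(current[1]):
            longest[gene_id] = (transcript_id, seq)
    # Genes are written in the order in which they were first seen.
    return "".join(f">{tid}_{species}\n{seq}\n" for tid, seq in longest.values())
```

For example, given two transcripts of the same gene, only the longer one is kept, and its header becomes the transcript ID followed by an underscore and the species label.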
59
60
61
62 # Running workflow
63
64 1. GeneSeqToFamily workflow
65 * Select the CDS dataset generated by the GeneSeqToFamily preparation tool
66 * Select the gene feature information: the SQLite dataset generated by the GeneSeqToFamily preparation tool
67 * Select the species tree
68 * The species tree can be generated using the `ete_species_tree` generator tool
69 * Run the workflow
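
Before running the workflow, it can help to check that the species labels in the FASTA headers also appear in the species tree. The sketch below assumes, without confirmation from this tutorial, that the two need to match and that headers follow the `TranscriptId_species` format with no underscore inside the species label. The function names are illustrative, and the Newick parsing is a simple regular-expression approach that does not handle quoted labels.

```python
import re
from typing import Iterable, Set


def newick_leaves(newick: str) -> Set[str]:
    """Return the leaf labels of a Newick/NHX tree string."""
    # Drop bracketed comments such as NHX annotations: [&&NHX:...]
    stripped = re.sub(r"\[[^\]]*\]", "", newick)
    # A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    labels = re.findall(r"[(,]\s*([^(),:;]+)", stripped)
    return {label.strip() for label in labels if label.strip()}


def species_missing_from_tree(fasta_ids: Iterable[str], newick: str) -> Set[str]:
    """Return species labels used in FASTA IDs but absent from the tree."""
    leaves = newick_leaves(newick)
    # Assumption: the species label is everything after the last underscore.
    species = {seq_id.rsplit("_", 1)[1] for seq_id in fasta_ids if "_" in seq_id}
    return species - leaves
```

An empty result means every species label found in the sequence IDs is present as a leaf of the tree.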
70
71
72 # Visualisation
73
74 ## Aequatus visualisation Plugin
75
76 The SQLite database generated by the GAFA tool, the final step of the workflow, can be rendered using the Aequatus.js visualisation plugin. The Aequatus.js library, developed as part of the Aequatus project, has been configured for use within Galaxy to visualise homologous gene structure and gene family relationships.
77
78 ### Hands-on: Aequatus visualisation plugin
79 1. Aequatus visualisation plugin
80 * In the history panel, expand the dataset generated by the previous step
81 * Choose GeneTree from the side panel
82 * Visualise different GeneTrees
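
If you download the SQLite dataset, you can also inspect it outside Galaxy. The following sketch uses only Python's standard `sqlite3` module and makes no assumption about the schema: it simply lists every table with its row count. The function name and the file name in the usage comment are hypothetical.

```python
import sqlite3
from typing import Dict


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return {table_name: row_count} for every table in an SQLite database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for table in tables:
        # Table names come from the database itself; quote them for safety.
        counts[table] = conn.execute(
            f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    return counts


# Hypothetical usage with a downloaded dataset:
# conn = sqlite3.connect("gafa_output.sqlite")
# print(table_row_counts(conn))
# conn.close()
```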
83
84
85
86 # Conclusion
87
88
89 This tutorial covered the steps of the GeneSeqToFamily workflow, using the default parameters for each step. These parameters may need to be adjusted for other sources of data.