annotate README.md @ 2:f537d3e00eb8 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_star_index_builder commit 811025ef8d9f60f4b02d66f7218127669fb42e0b
author iuc
date Fri, 23 Jun 2017 04:26:07 -0400
parents cdc4d8a998e1
children
cdc4d8a998e1 planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_star_index_builder commit 0d434bca5083e908114d93e11094e48f49b98ed1
iuc
parents:
diff changeset
##What it does##

This is a Galaxy data manager for the RNA STAR gap-aware RNA aligner. It's a hack of Dan Blankenberg's BWA data manager
and works on any FASTA file you have already downloaded with the all fasta data manager - start there!

Warning - this is not well tested and there are some complexities to do with splice junction annotation in RNA STAR
indexes - feedback welcomed. Send code.

Note: currently you'll need a small patch to prevent an error when you try to generate splice junction indexes. It is described at
https://bitbucket.org/galaxy/galaxy-central/pull-request/510/fix-for-data-manager-failure-to-update-a#comment-3265356

Please read the fine manual - that and the Google group are the places to learn about the options above.
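For orientation, building a STAR index comes down to a single genome-generation run of STAR. The Python sketch below assembles such a command line. It is an illustration under assumptions, not this data manager's actual code: the function name `build_star_index_command` and the example paths are hypothetical, while the flags are standard STAR options described in the STAR manual.

```python
def build_star_index_command(fasta_path, index_dir, gtf_path=None,
                             sjdb_overhang=100, threads=1):
    """Assemble an argument list for a STAR genome-generation run.

    Hypothetical helper for illustration only; it builds the command
    but does not run it.
    """
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", index_dir,
        "--genomeFastaFiles", fasta_path,
        "--runThreadN", str(threads),
    ]
    if gtf_path is not None:
        # Splice junction annotation is optional; the overhang is only
        # passed when a GTF file is supplied.
        cmd += ["--sjdbGTFfile", gtf_path,
                "--sjdbOverhang", str(sjdb_overhang)]
    return cmd


# Placeholder paths, shown only to illustrate the resulting command line.
example = build_star_index_command("genome.fa", "star_index",
                                   gtf_path="genes.gtf")
print(" ".join(example))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting.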

*Note on sjdbOverhang*

From https://groups.google.com/forum/#!topic/rna-star/h9oh10UlvhI::

James is right, using large enough --sjdbOverhang is safer and should not generally cause any problems with reads of varying length. If your reads are very short, <50b, then I would strongly recommend using optimum --sjdbOverhang=mateLength-1
By mate length I mean the length of one of the ends of the read, i.e. it's 100 for 2x100b PE or 1x100b SE. For longer reads you can simply use generic --sjdbOverhang 100.
It is a bit confusing because of the way I named this parameter. --sjdbOverhang Noverhang is only used at the genome generation step for constructing the reference sequence out of the annotations.
Basically, the Noverhang exonic bases from the donor site and Noverhang exonic bases from the acceptor site are spliced together for each of the junctions, and these spliced sequences are added to the genome sequence.

At the mapping stage, the reads are aligned to both genomic and splice sequences simultaneously. If a read maps to one of the spliced sequences and crosses the "junction" in the middle of it, the coordinates of two spliced pieces are translated back to genomic space and added to the collection of mapped pieces, which are then all "stitched" together to form the final alignment. Since in the process of "maximal mapped length" search the read is split into pieces of no longer than --seedSearchStartLmax (=50 by default) bases, even if the read (mate) is longer than --sjdbOverhang, it can still be mapped to the spliced reference, as long as --sjdbOverhang > --seedSearchStartLmax.

Cheers
Alex
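The advice quoted above can be turned into a small helper. The following Python sketch is only an illustration, not part of STAR or of this data manager. `suggest_sjdb_overhang` and `spliced_insert` are hypothetical names; the 50-base cutoff and the generic value of 100 are taken from the quoted post; and `spliced_insert` is a toy model that assumes a single intron whose flanking exons are each at least `overhang` bases long.

```python
def suggest_sjdb_overhang(mate_length, short_read_cutoff=50,
                          generic_overhang=100):
    """Suggest a --sjdbOverhang value from the length of one mate.

    Very short reads (below the cutoff) get mate_length - 1;
    anything longer gets the generic value.
    """
    if mate_length < 2:
        raise ValueError("mate_length must be at least 2")
    if mate_length < short_read_cutoff:
        return mate_length - 1
    return generic_overhang


def spliced_insert(genome_seq, intron_start, intron_end, overhang):
    """Join `overhang` exonic bases from each side of one intron.

    The intron is given as a 0-based, half-open interval
    [intron_start, intron_end) within genome_seq. Toy model only:
    it does not handle flanking exons shorter than the overhang.
    """
    donor_side = genome_seq[max(0, intron_start - overhang):intron_start]
    acceptor_side = genome_seq[intron_end:intron_end + overhang]
    return donor_side + acceptor_side


# Toy genome: a 10-base exon, an 8-base intron, then a 10-base exon.
toy_genome = "AAAAACCCCC" + "GTNNNNAG" + "GGGGGTTTTT"
print(suggest_sjdb_overhang(36))
print(spliced_insert(toy_genome, 10, 18, overhang=3))
```

Under this rule 2x36b reads would get 35, while 2x100b reads would get the generic 100.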

*Note on gene model requirements for splice junctions*

From https://groups.google.com/forum/#!msg/rna-star/3Y_aaTuzBrE/lUylTB8h5vMJ::

When you generate a genome with annotations, you need to specify --sjdbOverhang value, which ideally should be equal to (oneMateLength-1), or you could use a generic value of ~100.

Your gtf lines look fine to me. STAR needs 3 features from a GTF file:
1. Chromosome names in col.1 that agree with chromosome names in genome .fasta files. If you have "chr2L" names in the genome .fasta files, and "2L" in the .gtf file, then you need to use --sjdbGTFchrPrefix chr option.
2. 'exon' in col.3 for the exons of all transcripts (this name can be changed with --sjdbGTFfeatureExon)
3. 'transcript_id' attribute that assigns each exon to a transcript (this name can be changed with --sjdbGTFtagExonParentTranscript)

Cheers
Alex