Inputs:

 - A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence
 - fasta files for the reference sequences
 - gff or gtf for mapping the reference sequences to a genome
 - reference genome fasta
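
A minimal sketch of reading the tabular input, assuming a tab-delimited file with the peptide in the first column and the reference identifier in the second. The function name read_peptide_table, the column positions and the '#' comment convention are illustrative assumptions, not part of these notes.

```python
import csv
import io


def read_peptide_table(handle, peptide_col=0, ref_id_col=1, delimiter="\t"):
    """Yield (peptide, reference_id) pairs from a delimited text stream.

    The column indexes are assumptions; adjust them to the real file layout.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        # Skip blank lines and comment/header lines that start with '#'.
        if not row or row[0].startswith("#"):
            continue
        # Skip rows that are too short to hold both columns.
        if len(row) <= max(peptide_col, ref_id_col):
            continue
        yield row[peptide_col].strip(), row[ref_id_col].strip()


# Made-up example data, used only to show the call.
example = io.StringIO("#peptide\treference\nELVISLIVES\tENST00000000001\n")
pairs = list(read_peptide_table(example))
```

Each pair can then be routed to whichever mapping strategy matches the identifier type.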

Ensembl transcript_id files: Homo_sapiens.GRCh37.71.gtf, GRCh37.fa
  get transcript sequence from gtf + reference
  map peptide to 3-frame translation of transcript
  map to reference genome with ensembl gtf
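
To map through the Ensembl GTF, the exon records of each transcript have to be collected first. The sketch below assumes the usual 9-column GTF layout with a transcript_id "..." attribute; the helper name exons_by_transcript is hypothetical.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def exons_by_transcript(gtf_lines):
    """Collect exon intervals per transcript from GTF text lines.

    Returns {transcript_id: [(chrom, start, end, strand), ...]} with
    1-based inclusive coordinates, listed in transcript (5' to 3') order.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed exon records.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )
    for blocks in exons.values():
        # Minus-strand transcripts run from high to low genomic positions.
        blocks.sort(key=lambda b: b[1], reverse=(blocks[0][3] == "-"))
    return dict(exons)
```

The result is keyed by transcript_id, so the identifier column of the tabular input can be used for lookup directly.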

ECGene ec_id files: ECgene_hg18_b1_low.fa, GRCh37.fa
  transcript from ecgene.fa
  map peptide to 3-frame translation of transcript
  map transcript to reference genome with blat
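
BLAT writes its alignments in PSL format by default. A rough sketch of pulling the aligned blocks out of one PSL row follows; it assumes the standard 21-column layout, and the function name psl_blocks and the example row are made up. Minus-strand rows need extra query-coordinate handling that is not shown here.

```python
def psl_blocks(psl_line):
    """Return (query_name, target_name, strand, blocks) for one PSL row.

    blocks is a list of (query_start, target_start, size) tuples using the
    0-based starts stored in the file.
    """
    f = psl_line.rstrip("\n").split("\t")
    if len(f) != 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    strand, q_name, t_name = f[8], f[9], f[13]
    # The last three columns are comma-terminated lists, one entry per block.
    sizes = [int(x) for x in f[18].split(",") if x]
    q_starts = [int(x) for x in f[19].split(",") if x]
    t_starts = [int(x) for x in f[20].split(",") if x]
    return q_name, t_name, strand, list(zip(q_starts, t_starts, sizes))


# Synthetic plus-strand row: a 30-base transcript aligned as two blocks.
row = "\t".join([
    "30", "0", "0", "0", "0", "0", "1", "90", "+",
    "ec_example", "30", "0", "30",
    "chr1", "1000", "100", "220",
    "2", "10,20,", "0,10,", "100,200,",
])
q_name, t_name, strand, blocks = psl_blocks(row)
```

Each block plays the role of an exon, so a peptide position on the transcript can be carried over to the genome block by block.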

Augustus id files: ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
  map peptide to augustus protein fasta
  map to reference genome with GFF3
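
A small sketch of the protein lookup: read the protein fasta, find the peptide, and convert the amino acid offset to a nucleotide offset within the coding sequence. It assumes the protein starts at the first CDS base (phase 0); read_fasta and locate_in_protein are illustrative names.

```python
def read_fasta(lines):
    """Parse FASTA text lines into {record_id: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record id.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def locate_in_protein(proteins, protein_id, peptide):
    """Return (aa_offset, cds_start, cds_end) or None if not found.

    cds_start/cds_end are 0-based, half-open nucleotide offsets within the
    coding sequence, assuming translation starts at CDS base 0.
    """
    seq = proteins.get(protein_id)
    if seq is None:
        return None
    aa = seq.find(peptide)
    if aa == -1:
        return None
    return aa, 3 * aa, 3 * (aa + len(peptide))
```

The CDS offsets can then be walked along the CDS features of the GFF3 to obtain genomic coordinates.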

EEJ files: Homo_sapiens.GRCh37.71.gtf, eej_sus_scrofa_core_70_102.fa
  map peptide to eej fasta
  parse id to find exon names and junc_pos
  map to reference genome with exon_id in ensembl GTF


Output:
  a GFF3 file that specifies the position of the peptide in a reference genome
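
One possible shape for the output step is sketched below. A peptide that spans several exons is written as several lines sharing one ID, which GFF3 allows for discontinuous features. The source value "peptide_map", the type "region" and the function name are placeholders chosen for illustration.

```python
def gff3_lines(seqid, segments, strand, peptide, feature_id,
               source="peptide_map", ftype="region"):
    """Format genomic segments of one peptide as GFF3 feature lines.

    segments: iterable of (start, end), 1-based inclusive.
    source and ftype are placeholder values, not prescribed by these notes.
    """
    attrs = f"ID={feature_id};Name={peptide}"
    lines = []
    for start, end in sorted(segments):
        # Columns: seqid, source, type, start, end, score, strand, phase, attributes.
        lines.append("\t".join([
            seqid, source, ftype, str(start), str(end), ".", strand, ".", attrs,
        ]))
    return lines


output = ["##gff-version 3"]
output += gff3_lines("1", [(201, 205), (106, 110)], "-", "MAK", "pep1")
```

Joining output with newlines gives the file content.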


Mapping:
  find transcript in cDNA fasta:
  find transcript in translated fasta:


  peptide to transcript:
    translate transcript to amino acid sequence and search for peptide
      tblastn
      Biopython
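
The translate-and-search option can be sketched in plain Python as follows. This is a stand-in for the tblastn or Biopython routes, not a replacement for them; it uses the standard genetic code, forward frames only, and exact matching. The names translate and find_peptide are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate seq in forward frame 0, 1 or 2; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_peptide(transcript, peptide):
    """Return (frame, start, end) for every exact match of peptide.

    start/end are 0-based, half-open nucleotide offsets on the transcript.
    """
    hits = []
    for frame in range(3):
        protein = translate(transcript, frame)
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((frame, start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits
```

For example, find_peptide("GATGGCTAAATGA", "MAK") reports one hit in frame 1 covering transcript offsets 1 to 10.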

  transcript to genome:
    If the fasta id lines contain the genomic mapping, use that
    Otherwise, map transcript to reference genome with BLAT
    check whether the peptide crosses exon boundaries
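
Once the exon (or alignment block) coordinates of a transcript are known, projecting a transcript interval onto the genome also shows whether the peptide crosses an exon boundary: more than one returned segment means it does. This is a sketch under the assumption that exons are given in transcript order with 1-based inclusive coordinates; transcript_to_genome is a hypothetical helper name.

```python
def transcript_to_genome(exons, strand, t_start, t_end):
    """Project a transcript interval onto genomic exon coordinates.

    exons: list of (start, end), 1-based inclusive, in transcript order
           (descending genomic order for the minus strand).
    t_start, t_end: 0-based, half-open interval on the spliced transcript.
    Returns a list of (start, end) genomic segments, 1-based inclusive.
    """
    segments = []
    offset = 0  # transcript offset of the first base of the current exon
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        lo = max(t_start, offset)
        hi = min(t_end, offset + length)
        if lo < hi:  # the interval overlaps this exon
            if strand == "+":
                segments.append((g_start + lo - offset, g_start + hi - offset - 1))
            else:
                segments.append((g_end - (hi - offset) + 1, g_end - (lo - offset)))
        offset += length
    return segments


plus = transcript_to_genome([(101, 110), (201, 210)], "+", 5, 15)
minus = transcript_to_genome([(201, 210), (101, 110)], "-", 5, 15)
crosses_boundary = len(plus) > 1
```

Here both calls return two segments, so the example interval spans an exon-exon junction on either strand.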