
Inputs:

- A tabular file with one column containing peptide sequences and another containing the identifier of a reference sequence
- FASTA file(s) for the reference sequences
- A GFF or GTF file that maps the reference sequences to a genome
- The reference genome FASTA
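A minimal sketch of reading the peptide/identifier table. It assumes a tab-separated file with a header row, the peptide in the first column and the reference ID in the second. `read_peptide_table` and its column defaults are hypothetical and should be pointed at the real columns.

```python
import csv
import io


def read_peptide_table(handle, peptide_col=0, id_col=1, has_header=True):
    """Yield (peptide, reference_id) pairs from a tab-separated table.

    peptide_col and id_col are 0-based column indices (hypothetical defaults).
    """
    # QUOTE_NONE: treat quote characters as ordinary text, as in Galaxy tabular files
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if len(row) <= max(peptide_col, id_col):
            continue  # skip blank or short rows
        peptide = row[peptide_col].strip().upper()
        ref_id = row[id_col].strip()
        if peptide and ref_id:
            yield peptide, ref_id


# Illustrative input; the column layout is an assumption.
table = io.StringIO("sequence\taccession\npeptidek\tTX0001\n\nLLSAR\tg1.t1\n")
pairs = list(read_peptide_table(table))
```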

Ensembl transcript_id	files:  Homo_sapiens.GRCh37.71.gtf, GRCh37.fa
  transcript sequence from GTF + reference genome
  map peptide to 3-frame translation of transcript
  map to reference genome with Ensembl GTF
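For the Ensembl case, a sketch of the GTF step. It assumes the peptide has already been located as a 0-based, half-open nucleotide interval on the transcript. The sketch collects the transcript's `exon` rows by `transcript_id`, orders them 5' to 3', and walks them to convert transcript offsets into genomic coordinates. The function names are hypothetical. Strand handling relies on the GTF convention that exon start/end are always given on the forward strand, 1-based and inclusive.

```python
import re

# GTF column 9 holds key "value"; pairs.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def exons_for_transcript(gtf_lines, transcript_id):
    """Return (chrom, strand, exons) for one transcript.

    Exons are 1-based, inclusive (start, end) pairs in transcript order:
    ascending on '+', descending on '-'.
    """
    chrom, strand, exons = None, None, []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        if dict(ATTR_RE.findall(f[8])).get("transcript_id") != transcript_id:
            continue
        chrom, strand = f[0], f[6]
        exons.append((int(f[3]), int(f[4])))
    exons.sort(reverse=(strand == "-"))
    return chrom, strand, exons


def transcript_to_genome(exons, strand, tx_start, tx_end):
    """Map a 0-based, half-open transcript interval onto genomic segments.

    Returns 1-based, inclusive (start, end) pieces in transcript order;
    more than one piece means the interval crosses an exon boundary.
    """
    segments, offset = [], 0
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        lo, hi = max(tx_start, offset), min(tx_end, offset + length)
        if lo < hi:  # part of the interval falls in this exon
            if strand == "+":
                segments.append((g_start + lo - offset, g_start + hi - offset - 1))
            else:
                segments.append((g_end - (hi - offset) + 1, g_end - (lo - offset)))
        offset += length
    return segments


# Illustrative two-exon transcript on the '+' strand.
gtf = [
    '1\tensembl\texon\t100\t109\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "1";',
    '1\tensembl\texon\t200\t209\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; exon_number "2";',
]
chrom, strand, exons = exons_for_transcript(gtf, "T1")
segments = transcript_to_genome(exons, strand, 8, 13)
```

Returning a list of segments rather than a single span means the exon-boundary question falls out of the same walk.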

ECgene  ec_id           files:  ECgene_hg18_b1_low.fa, GRCh37.fa
  transcript from ECgene_hg18_b1_low.fa
  map peptide to 3-frame translation of transcript
  map transcript to reference genome with BLAT
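For ECgene transcripts aligned with BLAT, one way to map a transcript position onto the genome is to read the alignment blocks from BLAT's PSL output (0-based coordinates; block sizes and starts in columns 19 to 21). This sketch assumes a nucleotide-to-nucleotide alignment with a single-character strand in column 9. The helper names are hypothetical.

```python
def _int_list(field):
    """Parse a PSL comma list such as '10,10,' into [10, 10]."""
    return [int(x) for x in field.rstrip(",").split(",") if x]


def parse_psl(line):
    """Pick out the PSL columns needed to map transcript positions."""
    f = line.rstrip("\n").split("\t")
    return {
        "strand": f[8],
        "q_name": f[9],
        "q_size": int(f[10]),
        "t_name": f[13],
        "block_sizes": _int_list(f[18]),
        "q_starts": _int_list(f[19]),
        "t_starts": _int_list(f[20]),
    }


def query_to_target(psl, q_pos):
    """Map a 0-based transcript position to a 0-based genomic position.

    Returns None when the position falls outside every aligned block.
    """
    # For '-' alignments, PSL qStarts count from the start of the
    # reverse-complemented query, while tStarts stay on the '+' strand.
    q = psl["q_size"] - 1 - q_pos if psl["strand"] == "-" else q_pos
    for size, q_start, t_start in zip(psl["block_sizes"], psl["q_starts"], psl["t_starts"]):
        if q_start <= q < q_start + size:
            return t_start + (q - q_start)
    return None


# Illustrative 20-nt transcript aligned as two 10-nt blocks.
fields = ["20", "0", "0", "0", "0", "0", "1", "990", "+", "tx1", "20", "0", "20",
          "chr1", "5000", "1000", "2010", "2", "10,10,", "0,10,", "1000,2000,"]
psl = parse_psl("\t".join(fields))
```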

Augustus id  		files:  ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
  map peptide to Augustus protein FASTA
  map to reference genome with GFF3
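For Augustus predictions, one possible approach is to find the peptide in the predicted protein, multiply its offsets by 3, and read the CDS segments in transcript order. This assumes that the protein FASTA IDs (e.g. `g1.t1`) match the `Parent` of the CDS rows in the GFF3 and that translation starts at the first CDS base. Walking the ordered CDS segments with the resulting nucleotide interval then yields genomic coordinates. The helpers below are hypothetical.

```python
def cds_segments(gff3_lines, transcript_id):
    """Return (strand, [(start, end), ...]) for CDS rows whose Parent is transcript_id.

    Segments are 1-based, inclusive, and ordered 5'->3' along the transcript.
    """
    strand, segs = None, []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        # GFF3 column 9: key=value pairs separated by ';'
        attrs = dict(kv.strip().split("=", 1) for kv in f[8].split(";") if "=" in kv)
        if transcript_id in attrs.get("Parent", "").split(","):
            strand = f[6]
            segs.append((int(f[3]), int(f[4])))
    segs.sort(reverse=(strand == "-"))
    return strand, segs


def peptide_to_cds_interval(protein, peptide):
    """Return the 0-based, half-open CDS nucleotide interval encoding peptide, or None."""
    i = protein.find(peptide)
    if i < 0:
        return None
    return 3 * i, 3 * (i + len(peptide))


# Illustrative GFF3 rows for a two-CDS gene on the '-' strand.
gff = [
    "##gff-version 3",
    "chr1\tAUGUSTUS\tgene\t100\t400\t.\t-\t.\tID=g1",
    "chr1\tAUGUSTUS\tCDS\t100\t150\t.\t-\t1\tID=g1.t1.cds;Parent=g1.t1",
    "chr1\tAUGUSTUS\tCDS\t300\t400\t.\t-\t0\tID=g1.t1.cds;Parent=g1.t1",
]
strand, segs = cds_segments(gff, "g1.t1")
interval = peptide_to_cds_interval("MKPEPTIDER", "PEPTIDE")
```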

EEJ (exon-exon junction)	files:  Homo_sapiens.GRCh37.71.gtf, eej_sus_scrofa_core_70_102.fa
  map peptide to EEJ FASTA
  parse id to find exon names and junc_pos
  map to reference genome with exon_id in Ensembl GTF
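The EEJ ID format is not specified here, so this sketch skips the ID parsing and starts from its results: the two exon names and `junc_pos`. It assumes `junc_pos` is the 0-based offset of the first downstream-exon base in the junction sequence. `split_at_junction` is a hypothetical helper that assigns each part of the peptide's nucleotide interval to an exon. The exon names can then be looked up via `exon_id` in the GTF.

```python
def split_at_junction(upstream_exon, downstream_exon, junc_pos, start, end):
    """Split a peptide's nucleotide interval on an exon-exon junction sequence.

    start/end: 0-based, half-open interval of the peptide within the EEJ sequence.
    junc_pos: assumed 0-based offset of the first downstream-exon base.
    Returns (exon_name, part_start, part_end) tuples. The upstream part keeps
    junction-sequence coordinates; the downstream part is an offset from the
    start of the downstream exon. Two tuples mean the peptide spans the junction.
    """
    parts = []
    if start < junc_pos:
        parts.append((upstream_exon, start, min(end, junc_pos)))
    if end > junc_pos:
        parts.append((downstream_exon, max(start, junc_pos) - junc_pos, end - junc_pos))
    return parts


# Hypothetical exon names; the peptide covers nucleotides 24..38 of the EEJ sequence.
parts = split_at_junction("E1", "E2", 30, 24, 39)
```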


Output:
A GFF3 file that specifies the position of each peptide in the reference genome


Mapping:
  find transcript in cDNA FASTA
  find transcript in translated (protein) FASTA
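To look up a transcript by ID, the FASTA only needs to be loaded into a dictionary. Below is a minimal stdlib sketch. It assumes IDs are the first whitespace-delimited token of each header, which is the same convention Biopython's `SeqIO.index` uses for `record.id`. `SeqIO.index` is an option for files too large to hold in memory.

```python
import io


def read_fasta(handle):
    """Return {id: sequence}, keyed on the first whitespace-delimited header token."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


fasta = io.StringIO(">T1 cdna:known\nATGGCC\nTGGAAG\n>T2\nATGTAA\n")
seqs = read_fasta(fasta)
```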


  peptide to transcript:
    translate transcript to amino acid sequence and search for peptide
    tools: tblastn, Biopython
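This sketch shows the translate-and-search approach with a hand-written standard codon table; Biopython's `Seq.translate()` could stand in for `translate`. Only the three forward frames are searched, matching the 3-frame translation described above. Hits come back as transcript nucleotide intervals. tblastn would suit approximate matches; that path is not shown.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string codon by codon; '*' = stop, 'X' = unknown codon."""
    nt = nt.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_peptide_in_frames(transcript, peptide):
    """Return (frame, nt_start, nt_end) for every hit of peptide in forward frames 0-2.

    nt_start/nt_end are a 0-based, half-open interval on the transcript.
    """
    hits = []
    for frame in range(3):
        protein = translate(transcript[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            nt_start = frame + 3 * pos
            hits.append((frame, nt_start, nt_start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


hits = find_peptide_in_frames("GATGGCCTGGAAGTAA", "AWK")
```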

  transcript to genome:
    If the fasta id lines contain the genomic mapping, use that
    Map transcript to reference genome with BLAT
    check whether the peptide crosses exon boundaries
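If the headers carry a genomic location, it can be read directly. This sketch assumes an Ensembl-style token such as `chromosome:GRCh37:1:1000:5000:1` (coordinate system, assembly, sequence region, start, end, strand). Such a token gives only the transcript's overall span, so the exon-boundary check still needs exon lengths in transcript order. Both helpers are hypothetical.

```python
import re

# Ensembl-style location token: coord_system:assembly:seq_region:start:end:strand
LOC_RE = re.compile(
    r"\b(chromosome|scaffold|supercontig):([^:\s]*):([^:\s]+):(\d+):(\d+):(-?1)\b"
)


def header_location(header):
    """Return the genomic span encoded in a FASTA header, or None."""
    m = LOC_RE.search(header)
    if m is None:
        return None
    _, assembly, seqid, start, end, strand = m.groups()
    return {"assembly": assembly, "seqid": seqid, "start": int(start),
            "end": int(end), "strand": "+" if strand == "1" else "-"}


def crosses_exon_boundary(exon_lengths, tx_start, tx_end):
    """True if a 0-based, half-open transcript interval spans an exon junction.

    exon_lengths must be listed in transcript (5'->3') order.
    """
    boundary = 0
    for length in exon_lengths[:-1]:
        boundary += length
        if tx_start < boundary < tx_end:
            return True
    return False


# Illustrative header with hypothetical IDs and coordinates.
hdr = "TX0001 cdna:known chromosome:GRCh37:1:1000:5000:1 gene:GENE0001"
loc = header_location(hdr)
```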
author	Jim Johnson <jj@umn.edu>
date	Mon, 15 Jun 2015 15:44:54 -0500 (2015-06-15)
parents	cec60c540546