view README @ 3:42c89c1bbda9 default tip

Update output labels
author Jim Johnson <jj@umn.edu>
date Mon, 15 Jun 2015 15:44:54 -0500
parents cec60c540546

Inputs:

- A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence 
- FASTA files for the reference sequences
- A GFF or GTF file for mapping the reference sequences to a genome
- A reference genome FASTA file

Ensembl transcript_id 	files:  Homo_sapiens.GRCh37.71.gtf,GRCh37.fa
  transcript   gtf+reference
  map peptide to 3-frame translation of transcript
  map to reference genome with ensembl gtf

ECGene  ec_id           files:  ECgene_hg18_b1_low.fa,GRCh37.fa 
  transcript from ecgene.fa 
  map peptide to 3-frame translation of transcript
  map transcript to reference genome with blat
  
Augustus id  		files:  ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
  map peptide to augustus protein fasta
  map to reference genome with GFF3 

EEJ			files:  Homo_sapiens.GRCh37.71.gtf,eej_sus_scrofa_core_70_102.fa
  map peptide to eej fasta
  parse id to find exon names and junc_pos
  map to reference genome with exon_id in ensembl GTF


Output:
a GFF3 file that specifies the position of the peptide in a reference genome
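
A minimal sketch of writing one output record, assuming the standard nine
tab-separated GFF3 columns (seqid, source, type, start, end, score, strand,
phase, attributes) with 1-based inclusive coordinates. The function name and
the default source and type values are hypothetical placeholders, not the
labels this tool actually emits.

```python
def gff3_line(seqid, start, end, strand, peptide,
              source="peptide_mapping", feature_type="region"):
    """Format one GFF3 feature line for a mapped peptide.

    start and end are 1-based and inclusive, as GFF3 requires.
    source and feature_type are illustrative defaults (assumptions).
    """
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    attributes = "Name={}".format(peptide)
    # Score and phase are left undefined (".") in this sketch.
    columns = [seqid, source, feature_type, str(start), str(end),
               ".", strand, ".", attributes]
    return "\t".join(columns)


if __name__ == "__main__":
    print("##gff-version 3")
    print(gff3_line("chr1", 105, 110, "+", "AKV"))
```

A peptide that spans several exons would need one line per genomic segment,
typically tied together with a shared ID or Parent attribute.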


Mapping:
  find transcript in cDNA fasta:
  find transcript in translated fasta:


  peptide to transcript:
   translate transcript to amino acid sequence and search for peptide
   tblastn
   Biopython
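
The translate-and-search option can be sketched in plain Python as below. This
is an illustration under stated assumptions, not the tool's implementation: it
uses the standard genetic code, searches only the three forward frames, and
the function names are hypothetical. Biopython or tblastn would replace this
in practice.

```python
from itertools import product

# Standard genetic code in the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate seq starting at offset frame (0, 1 or 2).

    Unknown codons (for example those containing N) become 'X'.
    """
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_peptide(transcript, peptide):
    """Find peptide in the three forward-frame translations of transcript.

    Returns a list of (frame, aa_offset, nt_start, nt_end) tuples, where the
    nucleotide coordinates are 0-based and half-open on the transcript.
    """
    hits = []
    for frame in range(3):
        protein = translate(transcript, frame)
        pos = protein.find(peptide)
        while pos != -1:
            nt_start = frame + 3 * pos
            hits.append((frame, pos, nt_start, nt_start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    print(find_peptide("GATGGCTAAAGTTTGA", "AKV"))
```

The nucleotide interval returned here is what the transcript-to-genome step
needs as input.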

  transcript to genome:
    If the fasta id lines contain the genomic mapping, use that
    Map transcript to reference genome with BLAT
    Check whether the peptide crosses exon boundaries
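
A rough sketch of the exon-boundary check, assuming the transcript's exon
blocks on the genome are already known (from the GTF/GFF or from a BLAT
alignment). Coordinates are 0-based and half-open, and the function name and
input layout are assumptions made for illustration only.

```python
def transcript_to_genome(exons, strand, t_start, t_end):
    """Map a transcript interval onto genomic exon blocks.

    exons: list of (genome_start, genome_end) tuples, 0-based half-open,
           sorted in ascending genomic order.
    strand: '+' or '-'; on '-' the transcript starts at the last exon.
    t_start, t_end: 0-based half-open interval on the transcript.

    Returns genomic (start, end) segments in ascending order. More than one
    segment means the interval crosses an exon boundary.
    """
    ordered = exons if strand == "+" else exons[::-1]
    segments = []
    offset = 0  # transcript coordinate at which the current exon begins
    for g_start, g_end in ordered:
        length = g_end - g_start
        lo = max(t_start, offset)
        hi = min(t_end, offset + length)
        if lo < hi:  # the interval overlaps this exon
            if strand == "+":
                segments.append((g_start + lo - offset, g_start + hi - offset))
            else:
                segments.append((g_end - (hi - offset), g_end - (lo - offset)))
        offset += length
    return sorted(segments)


if __name__ == "__main__":
    exons = [(100, 110), (200, 220)]
    segments = transcript_to_genome(exons, "+", 4, 13)
    print(segments, "crosses exon boundary:", len(segments) > 1)
```

Each returned segment would become its own feature line in the GFF3 output,
after converting to 1-based inclusive coordinates.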