diff README @ 0:cec60c540546

Uploaded
author galaxyp
date Wed, 26 Jun 2013 15:56:16 -0400
parents
children
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/README	Wed Jun 26 15:56:16 2013 -0400
@@ -0,0 +1,46 @@
+Inputs:
+
+- A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence 
+- fasta files for the reference sequences
+- gff or gtf for mapping the reference sequences to a genome
+- reference genome fasta 
+
+Ensembl transcript_id 	files:  Homo_sapiens.GRCh37.71.gtf,GRCh37.fa
+  transcript   gtf+reference
+  map peptide to 3-frame translation of transcript
+  map to reference genome with ensembl gtf
+
+ECGene  ec_id           files:  ECgene_hg18_b1_low.fa,GRCh37.fa 
+  transcript from ecgene.fa 
+  map peptide to 3-frame translation of transcript
+  map transcript to reference genome with blat
+  
+Augustus id  		files:  ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
+  map peptide to augustus protein fasta
+  map to reference genome with GFF3 
+
+EEJ			files:  Homo_sapiens.GRCh37.71.gtf,eej_sus_scrofa_core_70_102.fa
+  map peptide to eej fasta
+  parse id to find exon names and junc_pos
+  map to reference genome with exon_id in ensembl GTF
+
+
+Output:
+a GFF3 file that specifies the position of the peptide in a reference genome
+
+
+Mapping:
+  find transcript in cDNA fasta:
+  find transcript in translated fasta:
+
+
+  peptide to transcript:
+   translate transcript to amino acid sequence and search for peptide
+   tblastn
+   Biopython
+
+  transcript to genome:
+    If the fasta id lines contain the genomic mapping, use that
+    Map transcript to reference genome with BLAT
+    check whether the peptide crosses exon boundaries
+
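The Mapping steps in the README can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: it uses plain Python instead of tblastn or Biopython, handles forward-strand transcripts only, and takes the exon coordinates as an already-parsed list (in the tool they would come from the GTF/GFF or from BLAT). All function names, the `peptide_map` source label, and the example coordinates are hypothetical.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(seq, frame):
    """Translate a nucleotide sequence in one forward frame (0, 1 or 2)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        for i in range(frame, len(seq) - 2, 3)
    )


def find_peptide(transcript, peptide):
    """Search the 3-frame translation for the peptide.

    Returns (frame, nt_start, nt_end) tuples in 0-based, half-open
    transcript coordinates.
    """
    hits = []
    for frame in range(3):
        protein = translate(transcript, frame)
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((frame, start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits


def transcript_to_genome(exons, start, end):
    """Project transcript interval [start, end) onto the genome.

    exons: (genome_start, genome_end) pairs, 0-based half-open, listed in
    transcript order on the forward strand (an assumption of this sketch).
    More than one returned segment means the peptide crosses an exon boundary.
    """
    segments = []
    offset = 0  # transcript position where the current exon begins
    for g_start, g_end in exons:
        length = g_end - g_start
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:
            segments.append((g_start + lo - offset, g_start + hi - offset))
        offset += length
    return segments


def gff3_lines(seqid, peptide_id, segments):
    """Format one 9-column GFF3 line per segment (1-based, inclusive)."""
    return [
        "\t".join([seqid, "peptide_map", "peptide", str(s + 1), str(e),
                   ".", "+", ".", "ID=" + peptide_id])
        for s, e in segments
    ]


# Made-up example: the peptide MKV is encoded starting at transcript offset 2.
transcript = "GGATGAAAGTTTGA"
exons = [(100, 106), (200, 220)]
hits = find_peptide(transcript, "MKV")
frame, nt_start, nt_end = hits[0]
segments = transcript_to_genome(exons, nt_start, nt_end)
lines = gff3_lines("chr1", "pep1", segments)
```

In this made-up example the peptide spans both exons, so two GFF3 lines are produced. A reverse-strand transcript would additionally need its coordinates flipped and the strand column set to `-`.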