Mercurial > repos > galaxyp > peptide_to_gff
diff README @ 0:cec60c540546
Uploaded
author | galaxyp |
---|---|
date | Wed, 26 Jun 2013 15:56:16 -0400 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README Wed Jun 26 15:56:16 2013 -0400
@@ -0,0 +1,46 @@
+Inputs:
+
+- A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence
+- fasta files for the reference sequences
+- gff or gtf for mapping the reference sequences to a genome
+- reference genome fasta
+
+Ensembl transcript_id files: Homo_sapiens.GRCh37.71.gtf,GRCh37.fa
+  transcript gtf+reference
+  map peptide to 3-frame translation of transcript
+  map to reference genome with ensembl gtf
+
+ECGene ec_id files: ECgene_hg18_b1_low.fa,GRCh37.fa
+  transcript from ecgene.fa
+  map peptide to 3-frame translation of transcript
+  map transcript to reference genome with blat
+
+Augustus id files: ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
+  map peptide to augustus protein fasta
+  map to reference genome with GFF3
+
+EEJ files: Homo_sapiens.GRCh37.71.gtf,eej_sus_scrofa_core_70_102.fa
+  map peptide to eej fasta
+  parse id to find exon names and junc_pos
+  map to reference genome with exon_id in ensembl GTF
+
+
+Output:
+a GFF3 file that specifies the position of the peptide in a reference genome
+
+
+Mapping:
+  find transcript in cDNA fasta:
+  find transcript in translated fasta:
+
+
+  peptide to transcript:
+    translate transcript to amino acid sequence and search for peptide
+    tblastn
+    Biopython
+
+  transcript to genome:
+    If the fasta id lines contain the genomic mapping, use that
+    Map transcript to reference genome with BLAT
+    check whether the peptide crosses exon boundaries
+
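
The "peptide to transcript" and "transcript to genome" steps listed in the README can be sketched in plain Python. This is only an illustration under stated assumptions, not the tool's actual code: the function names `translate`, `find_peptide`, and `transcript_to_genome` are hypothetical, the standard genetic code is assumed, and the exon list is assumed to be already parsed from a GTF for a plus-strand transcript.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate a nucleotide sequence in forward frame 0, 1, or 2."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are translated as 'X'.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def find_peptide(transcript, peptide):
    """Search the 3 forward frames for the peptide.

    Returns (frame, start, end) with 1-based inclusive transcript
    nucleotide coordinates, or None if the peptide is not found.
    """
    for frame in range(3):
        idx = translate(transcript, frame).find(peptide)
        if idx >= 0:
            start = frame + 3 * idx + 1
            end = start + 3 * len(peptide) - 1
            return frame, start, end
    return None


def transcript_to_genome(exons, start, end):
    """Map a transcript range onto genomic segments.

    exons: (genomic_start, genomic_end) pairs, 1-based inclusive, in
    transcript order, plus strand only (an assumption of this sketch).
    More than one returned segment means the range crosses an exon boundary.
    """
    segments = []
    offset = 0  # transcript bases covered by earlier exons
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        t_start, t_end = offset + 1, offset + length
        lo, hi = max(start, t_start), min(end, t_end)
        if lo <= hi:
            segments.append((g_start + lo - t_start, g_start + hi - t_start))
        offset += length
    return segments


# Made-up example data: a 16-nt transcript split across two 8-nt exons.
transcript = "CCATGGCTAAAGGTTT"
hit = find_peptide(transcript, "AKG")
segments = transcript_to_genome([(100, 107), (200, 207)], hit[1], hit[2])
```

In this made-up example the peptide is found in frame 2 and maps to two genomic segments, so it spans the exon junction. A minus-strand transcript would need the exon order and coordinate arithmetic reversed, which this sketch does not handle.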