neuma: NEUMA-1.2.1/README annotate

annotate NEUMA-1.2.1/README @ 0:c44c43d185ef draft default tip

NEUMA-1.2.1 Uploaded

author	chawhwa
date	Thu, 08 Aug 2013 00:46:13 -0400

###########################
# README for NEUMA v1.2.1 #
###########################

## Additional information and files can be obtained from the NEUMA website, http://neuma.kobic.re.kr.
## Inquiries can be written to duplexa@gmail.com.

# This version has removed the extra tab in the iNIR file. (version 1.2.1)
# This version has fixed the problem of missing some reads at the 5' end in paired-end data. (version 1.2.1)

# This version has the following changes. (version 1.2.0)
## The intermediate cmbt and MA files are no longer produced; mapping stat, insertlendis and read counts are computed directly from the bowtie output files. This drastically reduces memory use and run time and uses less hard disk space.
## A subdirectory named 'readcount' will be generated in which gNIR, iNIR and gReadcount files will be placed.
## A new option generates all read counts (not only gene-wise or isoform-wise informative reads) for each gene (--gReadcount option). It is possible to get only these numbers and not compute NIR (--noNIR), EUMA, FVKM and LVKM (--noNEUMA).
## auto_NEUMA_PE.pl can be used for the initial run that determines the maximum insert length (--only_init), for the after-initial run (--skip_init), or for both (default).
## Mismatches are allowed (--mm), but the alignments are filtered for all best-matching alignments.


# This version can handle the newer fastq format for the paired-end case. (version 1.1.5)
# This version can handle the newer fastq format with a space in the sequence ID. (version 1.1.4)
# This version includes two distinct strand-specificity options (S for forward and R for reverse strand). (version 1.1.3)
# This version includes a script that generates a merged LVKM file and NIR files that can be used for diffNEUMA, to identify differentially expressed genes/isoforms. (version 1.1.2)
# This version runs dos2unix for gene2NM and gene2symbol files in the beginning. (version 1.1.1)
# This version handles strand-specific data. (version 1.1.1)
# This version allows a multi-thread option for bowtie. (version 1.1.0)
# This version uses a different way to take arguments (usage has changed). (version 1.1.0)
# This version fixed a bug in reading the mapping stat file in the Ensembl mode. (version 1.0.5)
# This version handles Ensembl data. (version 1.0.4)
# This version handles SOLiD colorspace data. (version 1.0.3)


** Table of contents **

1. Installation
2. Bowtie
3. Ingredients
4. How to run
5. Output files
6. Preliminary run
7. Merging output files

***************************





1. Installation

No installation is required. Simply create a directory (eg. neumadir) and extract the .tar.gz file into it. Then make all the perl scripts executable with the following command:

chmod a+x *.pl
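
The steps above can be sketched as a small shell function. The function name install_neuma and the archive name in the example call are assumptions for illustration; use the name of the file you actually downloaded.

```shell
#!/bin/sh
# Sketch only: unpack a NEUMA archive into its own directory and make the
# perl scripts executable. 'install_neuma' is a hypothetical helper.
install_neuma() {
    archive=$1      # path to the downloaded .tar.gz
    target=$2       # eg. neumadir
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" || return 1
    # Same effect as 'chmod a+x *.pl' run inside the directory, but it also
    # reaches scripts that sit inside a top-level folder of the archive.
    find "$target" -name '*.pl' -exec chmod a+x {} +
}

# Example call (commented out; adjust both paths):
# install_neuma NEUMA-1.2.1.tar.gz neumadir
```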


2. Bowtie

Please download and install bowtie from the bowtie website, http://bowtie-bio.sourceforge.net/index.shtml, following the developers' instructions. The reference index file must be created using the same fasta file that was used for generating the gU and iU tables (gEUMA and iEUMA tables). The fasta file can be obtained from the NEUMA website.
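
A minimal sketch of the index-building step follows. bowtie-build takes the reference fasta followed by the index prefix, and its -C flag builds a colorspace index. The fasta name hg19.refMrna.fa is a placeholder (not a name confirmed by this README), and bowtie_build_cmd is a hypothetical helper that only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch only: print the bowtie-build command for a reference fasta.
# 'bowtie_build_cmd' is a hypothetical helper; file names are placeholders.
bowtie_build_cmd() {
    bowtie_dir=$1   # directory holding the bowtie executables
    fasta=$2        # same fasta used for the gU/iU (gEUMA/iEUMA) tables
    prefix=$3       # index prefix, later passed to NEUMA as --bi
    coding=${4:-n}  # n = nucleotide, c = colorspace (SOLiD)
    if [ "$coding" = "c" ]; then
        echo "$bowtie_dir/bowtie-build -C $fasta $prefix"
    else
        echo "$bowtie_dir/bowtie-build $fasta $prefix"
    fi
}

bowtie_build_cmd bin/bowtie-0.12.7 hg19.refMrna.fa ebwt/hg19.RefmRNA
```

The prefix given here is the same string that is later passed to NEUMA through --bi.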



3. Ingredients

Before you run NEUMA, you need the following files.

* a fasta or fastq file containing raw sequences (single-end)
  or a pair of fasta or fastq files containing raw sequences, each mate pair having the same unique ID (paired-end).
* bowtie reference index file
* gEUMA and iEUMA tables (single-end)
  or gU table and iU table (paired-end)
* gene2NM file
* gene2symbol file


Note that the gene2NM file must be matched to the initial reference fasta file that the gU and iU tables (gEUMA and iEUMA tables) were created from. The gU and iU tables (gEUMA and iEUMA tables), along with the gene2NM and gene2symbol files, can be obtained from the NEUMA website.



4. How to run

Two scripts, auto_NEUMA_PE.pl (paired-end) and auto_NEUMA_SE.pl (single-end), are all you need; they run the other scripts automatically.


## paired-end case
usage: ./auto_NEUMA_PE.pl [options] -L=<read_length> -D=<maxdist> -1=<input_file1(mate1)> -2=<input_file2(mate2)> -U=<Utable_prefix(fullpath, before .gU.table or .iU.table)> --g2m=<gene2NM_file> --g2s=<gene2symbol_file> -b=<bowtie_dir(eg.bin/bowtie-0.12.7)> --bi=<bowtieindex> -o=<outputdir> -s=<sample_name>

The order of arguments and options can be arbitrary.

required arguments
* -L=<read_length> : read_length(eg.36) : sequenced length of a read mate (/not/ the insert length or sum of the two mate lengths) (no default : L must be specified)
* -D=<maxdist> : maxdist(eg.400) : maximum outer distance between mates (insert size). This must be identical to the maxdist used for generating the gU and iU tables. (no default : D must be specified)
* -1=<input_file1(mate1)> : fasta or fastq file of a series of read mate 1
* -2=<input_file2(mate2)> : fasta or fastq file of a series of read mate 2. For fasta files, the ID of each mate 2 must match that of mate 1.
* -U=<Utable_prefix(fullpath, before .gU.table or .iU.table)> : the path to the gU.table and iU.table files, without their extension. If you placed your gU.table and iU.table files in /home/paired-end/Utable/ and the file names are hg19.refMrna.L36.D250.gU.table and hg19.refMrna.L36.D250.iU.table, then the value for this argument is '/home/paired-end/Utable/hg19.refMrna.L36.D250'. The files must be matched to the read length, maxdist and reference transcriptome sequence.
* --g2m=<gene2NM_file> : gene2NM file, matched to the reference transcriptome model used for generating the gU and iU tables.
* --g2s=<gene2symbol_file> : gene2symbol file, containing at least all of the genes in the reference transcriptome sequence.
* -b=<bowtie_dir(eg.bin/bowtie-0.12.7)> : directory in which the bowtie executable is installed. Either a full path or a relative path must be used, and '~' must be avoided.
* --bi=<bowtieindex> : the index file prefix of the reference transcriptome sequence, created by bowtie (bowtie-build). It is the same string that is given as the reference index argument to the bowtie program. (see bowtie manual)
* -o=<outputdir> : a directory that will contain all the output files. If the directory does not exist, the program will create it automatically.
* -s=<sample_name> : name of the sample that will be used as the prefix of all the output files.
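
The -U value can be derived from either table's file name by dropping the extension. A small sketch using POSIX shell parameter expansion, with the example path from the -U description above (utable_prefix is a hypothetical helper name):

```shell
#!/bin/sh
# Strip the .gU.table (or .iU.table) extension to get the -U prefix.
# 'utable_prefix' is a hypothetical helper, not part of NEUMA.
utable_prefix() {
    p=$1
    p=${p%.gU.table}
    p=${p%.iU.table}
    echo "$p"
}

utable_prefix /home/paired-end/Utable/hg19.refMrna.L36.D250.gU.table
# prints /home/paired-end/Utable/hg19.refMrna.L36.D250
```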

options
* -f=<file_type> : fasta(f)_or_fastq(q) : f if input files are in fasta format; q if in fastq format. (default : q)
* -c=<coding_option> : nucleotide(n)_or_colorspace(c) : n if input files are in DNA sequence (A,C,G,T); c if in colorspace (SOLiD platform). Note that if colorspace is used, the colorspace version of the bowtie index file must be used. (default : n)
* -t=<euma_cut> : EUMAcut(eg.50) : The EUMA cutoff that determines measurable genes and transcripts. (default : 50)
* -d=<data_type> : RefSeq data(R) or Ensembl data(E). The R option uses the 'NM' and 'NR' prefixes in RefSeq data to discriminate between mRNA and ncRNA. If your reference doesn't have these prefixes, use the E option. (default : R)
* -p=<num_cpu> : number of CPUs to use for Bowtie. (default : 1)
* --str=<strand_specificity> : Strand-specific(S) vs Non-Strand-specific(N). For strand-specific data, strand-specific U tables must be used. (default : N)
* --mm=<number_of_mismatches_allowed> : number of mismatches allowed (default : 0)
* --gReadcount : Compute the number of all reads (not just gene-wise and isoform-wise informative reads) for each gene.
* --noNIR : do not compute NIR. (This option must be used together with --noNEUMA)
* --noNEUMA : do not compute EUMA, FVKM and LVKM (but compute NIR unless used with --noNIR).
* --only_init : run only the initial part of bowtie-mapping and insert-lendis calculation, preferably with a relatively large -D value (well above the expected maximum, eg. 1000). A more detailed description can be found below in section 6.
* --skip_init : once the maximum insert length to be used has been decided, skip the initial bowtie mapping and insert-lendis part and use the output files from the previous run. (There is no need to run bowtie again with the new -D value because length filtering is done in the downstream steps.)
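
The rule that --noNIR must be accompanied by --noNEUMA can be checked before a long run is launched. The following is only a sketch; check_neuma_flags is a hypothetical helper and not part of NEUMA, and it says nothing about how the NEUMA scripts themselves validate their options.

```shell
#!/bin/sh
# Sketch only: check the --noNIR / --noNEUMA rule before launching a run.
# 'check_neuma_flags' is a hypothetical helper, not part of NEUMA.
check_neuma_flags() {
    no_nir=0
    no_neuma=0
    for arg in "$@"; do
        case $arg in
            --noNIR)   no_nir=1 ;;
            --noNEUMA) no_neuma=1 ;;
        esac
    done
    if [ "$no_nir" -eq 1 ] && [ "$no_neuma" -eq 0 ]; then
        echo "--noNIR must be used together with --noNEUMA" >&2
        return 1
    fi
    return 0
}

# Read counts only: no NIR and no EUMA/FVKM/LVKM.
check_neuma_flags --gReadcount --noNIR --noNEUMA && echo "flags ok"
```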


## single-end case
usage: ./auto_NEUMA_SE.pl [options] -L=<read_length> -i=<input_file> -U=<EUMA_prefix(fullpath, before .gEUMA or .iEUMA)> --g2m=<gene2NM_file> --g2s=<gene2symbol_file> -b=<bowtie_dir(eg.bin/bowtie-0.12.5)> --bi=<bowtieindex> -o=<outputdir> -s=<sample_name>

The order of arguments and options can be arbitrary.

required arguments
* -L=<read_length> : read_length(eg.36) : sequenced length of a read mate (/not/ the insert length or sum of the two mate lengths) (no default : L must be specified)
* -i=<input_file> : fasta or fastq file of a series of sequenced reads
* -U=<EUMA_prefix(fullpath, before .gEUMA or .iEUMA)> : the path to the gEUMA and iEUMA files, without their extension. If you placed your gEUMA and iEUMA files in /home/single-end/EUMA/ and the file names are hg19.refMrna.L36.single.gEUMA and hg19.refMrna.L36.single.iEUMA, then the value for this argument is '/home/single-end/EUMA/hg19.refMrna.L36.single'.
* --g2m=<gene2NM_file> : same as paired-end case
* --g2s=<gene2symbol_file> : same as paired-end case
* -b=<bowtie_dir(eg.bin/bowtie-0.12.5)> : same as paired-end case
* --bi=<bowtieindex> : same as paired-end case
* -o=<outputdir> : same as paired-end case
* -s=<sample_name> : same as paired-end case

options
* -f=<file_type> : fasta(f)_or_fastq(q) : f if input files are in fasta format; q if in fastq format. (default : q)
* -c=<coding_option> : nucleotide(n)_or_colorspace(c) : n if input files are in DNA sequence (A,C,G,T); c if in colorspace (SOLiD platform). Note that if colorspace is used, the colorspace version of the bowtie index file must be used. (default : n)
* -t=<euma_cut> : EUMAcut(eg.50) : The EUMA cutoff that determines measurable genes and transcripts. (default : 50)
* -d=<data_type> : RefSeq data(R) or Ensembl data(E). The R option uses the 'NM' and 'NR' prefixes in RefSeq data to discriminate between mRNA and ncRNA. If your reference doesn't have these prefixes, use the E option. (default : R)
* -p=<num_cpu> : number of CPUs to use for Bowtie. (default : 1)
* --str=<strand_specificity> : Strand-specific, forward (S); strand-specific, reverse (R); or Non-Strand-specific (N). For strand-specific data, strand-specific EUMA tables must be used. (default : N)
* --mm=<number_of_mismatches_allowed> : number of mismatches allowed (default : 0)
* --gReadcount : Compute the number of all reads (not just gene-wise and isoform-wise informative reads) for each gene.
* --noNIR : do not compute NIR. (This option must be used together with --noNEUMA)
* --noNEUMA : do not compute EUMA, FVKM and LVKM (but compute NIR unless used with --noNIR).


Example usages are as follows:

# paired-end
neumadir/auto_NEUMA_PE.pl --only_init -f=f -L=36 -D=1000 --mm=2 -1=MKN28.1.fa -2=MKN28.2.fa -U=U.tables/hg19.refMrna.L36.D250 --g2m=gene2NM.human.fastafiltered --g2s=gene2symbol.human -b=bin/bowtie-0.12.7 --bi=ebwt/hg19.RefmRNA -o=MKN28.hg19 -s=MKN28
# This command will run the initial bowtie mapping and compute the mapping stat and insert length distribution.
neumadir/auto_NEUMA_PE.pl --skip_init --gReadcount --noNEUMA --mm=2 -f=f -L=36 -D=250 -1=MKN28.1.fa -2=MKN28.2.fa -U=U.tables/hg19.refMrna.L36.D250 --g2m=gene2NM.human.fastafiltered --g2s=gene2symbol.human -b=bin/bowtie-0.12.7 --bi=ebwt/hg19.RefmRNA -o=MKN28.hg19 -s=MKN28
# This command will skip bowtie-mapping and the insert length distribution, use the files from the previous (eg. above) run, and report counts of all reads (gReadcount), gNIR and iNIR. It will not compute EUMA, FVKM or LVKM.
# Only reads with insert length up to 250 will be used, although the bowtie output from the previous run contains larger insert lengths.


# single-end
auto_NEUMA_SE.pl --noNEUMA --mm=2 -L=36 -t=30 -p=2 -i=MKN28.txt -U=hg19.refMrna.L36.single --g2m=gene2NM.human.fastafiltered --g2s=gene2symbol.human -b=../bin/bowtie-0.12.5 --bi=ebwt/hg19.RefmRNA -o=MKN28.hg19 -s=MKN28
# This command will only do bowtie-mapping and report the mapping stat, gNIR and iNIR.



5. Output files

## paired-end

For RefSeq, the final log2(x+1) transformed values of FVKM (LVKM) can be found in outputdir/LVKM/samplename.ebwtname.maxinsMAX_DIST.mm0.-EUMAcut.-NR.gLVKM (gene-wise) and outputdir/LVKM/samplename.ebwtname.maxinsMAX_DIST.mm0.-EUMAcut.-NR.iLVKM (isoform-wise).
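
As a worked illustration of the transform, LVKM = log2(FVKM + 1), so an FVKM of 0 maps to 0 and an FVKM of 3 maps to 2. A quick check with awk (lvkm is a hypothetical helper name used only for this illustration):

```shell
#!/bin/sh
# log2(x + 1) computed with awk's natural logarithm.
# 'lvkm' is a hypothetical helper, not part of NEUMA.
lvkm() {
    awk -v x="$1" 'BEGIN { printf "%.4f\n", log(x + 1) / log(2) }'
}

lvkm 0   # prints 0.0000
lvkm 3   # prints 2.0000
```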

Files containing '-NR' exclude genes with no mRNA (eg. rRNA, tRNA, snRNA, snoRNA and other small RNA genes). If your sample is poly-A-selected, these RNAs cannot be quantified along with mRNAs. If your sample did not go through any enrichment step, and mRNAs and ncRNAs are represented in the same proportions as in the cell, you can use the LVKM files without the '-NR' tag. The ncRNAs are removed only at the last step because reads mapping to both an mRNA and an ncRNA must be considered multi-reads. For Ensembl, '-NR' versions are not provided (ncRNAs are not removed).



The unadjusted and adjusted FVKM values (as 'FVK' and 'FVKM' for before and after sample-normalization, respectively) can be found in the LVKM files as well, along with the gEUMA and iEUMA values, the number of isoforms and the number of measurable isoforms.

The mapping stat can be found in outputdir/mapping_stat.samplename.

The NIR and/or readcount values can be found in outputdir/readcount/samplename.ebwtname.maxinsMAX_DIST.mm0.gNIR, outputdir/readcount/samplename.ebwtname.maxinsMAX_DIST.mm0.iNIR and/or outputdir/readcount/samplename.ebwtname.maxinsMAX_DIST.mm0.gReadcount.

The EUMA values can be found in outputdir/EUMA/samplename.ebwtname.maxinsMAX_DIST.mm0.gEUMA and outputdir/EUMA/samplename.ebwtname.maxinsMAX_DIST.mm0.iEUMA.

The insert length distribution can be found in outputdir/insertlendis/samplename.ebwtname.maxinsMAX_DIST.mm0.i.insertlendis.

If the --mm option was used, replace 'mm0' in the above file names with 'mm1' or 'mm2'.
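
The paired-end file name patterns above can be assembled mechanically. The sketch below covers the readcount, EUMA and insertlendis files only. neuma_pe_path is a hypothetical helper, and the ebwtname value in the example (hg19.RefmRNA) is an assumption: this README does not state how ebwtname is derived from the --bi argument, so it is passed in explicitly.

```shell
#!/bin/sh
# Sketch only: assemble paired-end output paths from the patterns above.
# 'neuma_pe_path' is a hypothetical helper, not part of NEUMA.
neuma_pe_path() {
    outputdir=$1; sample=$2; ebwtname=$3; maxdist=$4; mm=$5; kind=$6
    stem="$sample.$ebwtname.maxins$maxdist.mm$mm"
    case $kind in
        gNIR|iNIR|gReadcount) echo "$outputdir/readcount/$stem.$kind" ;;
        gEUMA|iEUMA)          echo "$outputdir/EUMA/$stem.$kind" ;;
        insertlendis)         echo "$outputdir/insertlendis/$stem.i.insertlendis" ;;
        *) echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

# ebwtname below is an assumed value for illustration.
neuma_pe_path MKN28.hg19 MKN28 hg19.RefmRNA 250 2 gNIR
# prints MKN28.hg19/readcount/MKN28.hg19.RefmRNA.maxins250.mm2.gNIR
```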


## single-end

The formats of the LVKM and NIR files (gMA & iMA) are the same as in the paired-end case. The LVKM file names are outputdir/LVKM/samplename.ebwtname.maxinsMAX_DIST.mm0.single.-EUMAcut.-NR.gLVKM (gene-wise) and outputdir/LVKM/samplename.ebwtname.maxinsMAX_DIST.mm0.single.-EUMAcut.-NR.iLVKM (isoform-wise). The NIR values can be found in outputdir/MA/samplename.ebwtname.maxinsMAX_DIST.mm0.single.all.gMA (the #uniq.common column) and outputdir/MA/samplename.ebwtname.maxinsMAX_DIST.mm0.single.all.iMA (the #uniq column).

The mapping stat can be found in outputdir/mapping_stat.samplename.single.




6. Preliminary run

Given a new data set, users can run the first part of auto_NEUMA_PE.pl, without the U.tables, to find out the insert length distribution and mapping stats. Based on the length distribution, the user can determine a safe MAXDIST and request us to build U.tables (through the NEUMA website). Once the U.tables are ready, the user can run the latter part of auto_NEUMA_PE.pl. Note that -U, --g2m and --g2s are not required for the preliminary run.

* Usage:

auto_NEUMA_PE.pl --only_init [options] -L=<read_length> -D=<maxdist> -1=<input_file1(mate1)> -2=<input_file2(mate2)> -b=<bowtie_dir(eg.bin/bowtie-0.12.7)> --bi=<bowtieindex> -o=<outputdir> -s=<sample_name>

required arguments
* -L=<read_length> : read_length(eg.36) : sequenced length of a read mate (/not/ the insert length or sum of the two mate lengths) (no default : L must be specified)
* -D=<maxdist> : maxdist(eg.400) : maximum outer distance between mates (insert size). For the preliminary run this can be a large value (eg. 1000); only the later --skip_init run must use the maxdist that the gU and iU tables were built with. (no default : D must be specified)
* -1=<input_file1(mate1)> : fasta or fastq file of a series of read mate 1
* -2=<input_file2(mate2)> : fasta or fastq file of a series of read mate 2. For fasta files, the ID of each mate 2 must match that of mate 1.
* -b=<bowtie_dir(eg.bin/bowtie-0.12.7)> : directory in which the bowtie executable is installed. Either a full path or a relative path must be used, and '~' must be avoided.
* --bi=<bowtieindex> : the index file prefix of the reference transcriptome sequence, created by bowtie (bowtie-build). It is the same string that is given as the reference index argument to the bowtie program. (see bowtie manual)
* -o=<outputdir> : a directory that will contain all the output files. If the directory does not exist, the program will create it automatically.
* -s=<sample_name> : name of the sample that will be used as the prefix of all the output files.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	210
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	211 options
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	212 * -f=<file_type> : fasta(f)_or_fastq(q) : f if input files are in fasta format; q if in fastq format. (default : q)
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	213 * -c=<coding_option> : nucleotide(n)_or_colorspace(c) : n if input files are in DNA sequence (A,C,G,T); c if in colorspace (SOLiD platform). Note that if colorspace is used, the colorspace version of bowtie index file must be used. (default : n)
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	214 * -p=<num_cpu> : number of cpu's to use for Bowtie. (default : 1)
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	215 * --str=<strand_specificity> : Strand-specific(S) vs Non-Strand-specific(N). (default : N)
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	216 * --mm=<number_of_mismatches_allowed> : number of mismatches allowed. (default : 0)
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	217
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	218 * The maxdist for this run can be set to a very large value, e.g. 1000. For the real runs using the --skip_init option, however, maxdist must be identical to the value used to build the U.tables.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	219
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	220 * For single-end reads, insert length distribution does not have to be pre-determined. Simply tell us the read length to request the EUMA tables for single-end data.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	221
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	222 output
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	223 The quickest way to check the output insert length distribution is to look at the insertlendis directory under outputdir.
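
As a convenience, the contents of that directory can be listed programmatically. A minimal sketch in Python; list_insertlendis_files is a hypothetical helper, and it makes no assumption about how the files inside insertlendis are named:

```python
from pathlib import Path


def list_insertlendis_files(outputdir):
    """Return the sorted names of the files in <outputdir>/insertlendis."""
    directory = Path(outputdir) / "insertlendis"
    return sorted(p.name for p in directory.iterdir() if p.is_file())
```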
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	224
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	225
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	226
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	227
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	228
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	229 7. Merging output files
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	230
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	231
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	232 1) merge_LVKM.pl
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	233
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	234 The script produces a text file that contains gLVKM or iLVKM values for all samples for all genes/isoforms.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	235
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	236 usage: ./merge_LVKM.pl genewise/isoformwise[g/i] EUMAcut LVKM_out_dir NR(1/0) > output.gLVKM(iLVKM).merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	237
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	238 Example usage:
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	239 ./merge_LVKM.pl g 50 data/LVKM 0 > data/LVKM/all.gLVKM.merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	240 ./merge_LVKM.pl i 50 data/LVKM 0 > data/LVKM/all.iLVKM.merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	241
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	242 * genewise/isoformwise[g/i] : put g for gLVKM and i for iLVKM.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	243 * EUMAcut : put the same EUMA cutoff used to generate the LVKM files. This will be used to recognize the files.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	244 * LVKM_out_dir : This is usually your_basedir/LVKM, which contains all the LVKM files generated. All the LVKM files in this directory that match the specified EUMAcut will be included in the merged table.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	245 * NR(1/0) : 1 means the script must use the LVKM files generated after the noncoding RNAs starting with the NR prefix are removed. 0 means otherwise.
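
When both the gene-wise and isoform-wise tables are needed, the two calls can be scripted. The following is a minimal sketch in Python, not part of NEUMA: merge_lvkm_argv and run_merge are hypothetical helper names, and the argument order is taken from the usage line above.

```python
import subprocess


def merge_lvkm_argv(kind, eumacut, lvkm_out_dir, nr, script="./merge_LVKM.pl"):
    """Build the argument list in the order given by the usage line."""
    if kind not in ("g", "i"):
        raise ValueError("kind must be 'g' or 'i'")
    if nr not in (0, 1):
        raise ValueError("nr must be 0 or 1")
    return [script, kind, str(eumacut), lvkm_out_dir, str(nr)]


def run_merge(argv, out_path):
    """Run the script and write its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(argv, stdout=out, check=True)


# Usage (requires the script and the LVKM directory to exist):
# for kind in ("g", "i"):
#     run_merge(merge_lvkm_argv(kind, 50, "data/LVKM", 0),
#               f"data/LVKM/all.{kind}LVKM.merged")
```

Checking the g/i and 1/0 values before the call catches a mistyped argument early, instead of producing an empty merged file.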
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	246
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	247
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	248 2) merge_LVKM_readcount.pl
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	249
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	250
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	251 This script produces a gNIR / iNIR file that contains the read counts for all samples for all genes/isoforms. The gNIR / iNIR files can be fed to diffNEUMA (http://neuma.kobic.re.kr), for identification of differentially expressed genes/isoforms. This script is identical to merge_NEUMA_readcount.pl in previous versions.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	252
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	253 usage: ./merge_LVKM_readcount.pl genewise/isoformwise[g/i] EUMAcut LVKM_out_dir NR(1/0) > output.gNIR(iNIR)
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	254
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	255 Example usage:
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	256 ./merge_LVKM_readcount.pl g 10 data/LVKM 1 > data/LVKM/all.gNIR
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	257 ./merge_LVKM_readcount.pl i 10 data/LVKM 1 > data/LVKM/all.iNIR
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	258
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	259 * genewise/isoformwise[g/i] : put g for gLVKM and i for iLVKM.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	260 * EUMAcut : put the same EUMA cutoff used to generate the LVKM files. This will be used to recognize the files.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	261 * LVKM_out_dir : This is usually your_basedir/LVKM, which contains all the LVKM files generated. All the LVKM files in this directory that match the specified EUMAcut will be included in the NIR file.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	262 * NR(1/0) : 1 means the script must use the LVKM files generated after the noncoding RNAs starting with the NR prefix are removed. 0 means otherwise.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	263
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	264
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	265
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	266 3) merge_readcount.pl
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	267
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	268
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	269 This script produces a gNIR.merged / iNIR.merged / gReadcount.merged file that contains the read counts for all samples. The gNIR / iNIR / gReadcount files can be fed to diffNEUMA (http://neuma.kobic.re.kr) for identification of differentially expressed genes/isoforms. Unlike the output of merge_LVKM_readcount.pl, the result is not filtered by EUMA cutoff or NR.
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	270
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	271 usage: ./merge_readcount.pl type[gNIR/iNIR/gReadcount] readcount_dir > output.gNIR(iNIR/gReadcount).merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	272
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	273 Example usage:
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	274 ./merge_readcount.pl gNIR data/readcount > data/readcount/all.gNIR.merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	275 ./merge_readcount.pl iNIR data/readcount > data/readcount/all.iNIR.merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	276
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	277 * type (gNIR/iNIR/gReadcount) : the type of the read counts to be merged
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	278 * readcount_dir : This is usually your_basedir/readcount, which contains all the readcount files generated.
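
To merge all three read count types in one go, the calls can be generated in a loop. A minimal sketch in Python, not part of NEUMA: merge_readcount_jobs and run_jobs are hypothetical helper names, and the all.<type>.merged output names are an assumption that simply follows the pattern of the examples above.

```python
import subprocess

READCOUNT_TYPES = ("gNIR", "iNIR", "gReadcount")


def merge_readcount_jobs(readcount_dir, script="./merge_readcount.pl"):
    """Return one (argv, output_path) pair per read count type."""
    return [
        ([script, kind, readcount_dir], f"{readcount_dir}/all.{kind}.merged")
        for kind in READCOUNT_TYPES
    ]


def run_jobs(jobs):
    """Run each command, redirecting its standard output to its output file."""
    for argv, out_path in jobs:
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


# Usage (requires the script and the readcount directory to exist):
# run_jobs(merge_readcount_jobs("data/readcount"))
```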
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	279
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	280
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	281 //
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	282
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	283
c44c43d185ef NEUMA-1.2.1 Uploaded chawhwa parents: diff changeset	284
