view export_iprscan_to_Excel/readme.txt @ 0:a9762cd6e2e3 draft default tip

Uploaded
author basfplant
date Tue, 05 Mar 2013 04:00:19 -0500

Installation of iprscanToExcel
------------------------------

1) iprscanToExcel requires InterProScan and the corresponding Galaxy wrapper to be installed; it does not work without them.

2) If required, change the paths in the <command> part of the Galaxy wrapper "interproscan.xml" to the paths on your system:

	${GALAXY_ROOT_DIR}/tools/iprscan/iprscanToExcel_v20.jar 

3) Installation of iprscanToExcel_v20.jar, iprscanToExcel.props and the Galaxy XML wrapper iprscanToExcel.xml

- Copy the wrapper file "iprscanToExcel.xml", the program "iprscanToExcel_v20.jar" and its properties file "iprscanToExcel.props" to the same directory, namely the Galaxy tools directory "iprscan": ${GALAXY_ROOT_DIR}/tools/iprscan
- Make Galaxy aware of the new tool: Galaxy reads the list of installed tools (and what to display in the left pane) from the file ${GALAXY_ROOT_DIR}/tool_conf.xml.
Use a text editor to add a line for the iprscanToExcel.xml wrapper next to the interproscan.xml entry, e.g. in the Sequence Annotation section.

	<label text="My Tools" id="My tools" />  
	<section name="Sequence Annotation" id="sequence_annotation" >      
	<tool file="iprscan/interproscan.xml" />  
	<tool file="iprscan/iprscanToExcel.xml" />  
	</section>  

- Restart Galaxy, open it in the web browser and test the tool.
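
The tool_conf.xml edit described above can also be scripted. The following is a minimal Python sketch and not part of iprscanToExcel: the function name register_tool is hypothetical, and the sketch assumes that tool_conf.xml is well-formed XML whose <section> elements carry the id attribute shown in the example. ElementTree drops XML comments when it re-serializes a document, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def register_tool(tool_conf_text, section_id, section_name, tool_file):
    """Return the tool_conf XML text with a <tool file="..."/> entry added.

    Hypothetical helper: the section is looked up by its id attribute and
    created at the end of the document if it does not exist yet.
    """
    root = ET.fromstring(tool_conf_text)

    # Find the target section, e.g. id="sequence_annotation".
    section = None
    for candidate in root.iter("section"):
        if candidate.get("id") == section_id:
            section = candidate
            break
    if section is None:
        section = ET.SubElement(
            root, "section", {"name": section_name, "id": section_id}
        )

    # Add the tool only once, so the function can be re-run safely.
    known = [tool.get("file") for tool in section.findall("tool")]
    if tool_file not in known:
        ET.SubElement(section, "tool", {"file": tool_file})

    return ET.tostring(root, encoding="unicode")
```

To apply it, read ${GALAXY_ROOT_DIR}/tool_conf.xml into a string, call register_tool(text, "sequence_annotation", "Sequence Annotation", "iprscan/iprscanToExcel.xml"), and write the result back before restarting Galaxy.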


iprscanToExcel functionality
----------------------------

iprscanToExcel is a Java program that converts the raw and/or XML output files of the InterProScan program to Excel format (xlsx). Three modes of operation are available: convert both the XML and the raw iprscan output files to Excel, convert only the XML output file to Excel, or convert only the raw output file to Excel.

The XML output file of the InterProScan program contains the source data for the Excel tabsheet "summary tables". For each protein family, these summary tables list the detailed matches, the parent, the child_list, the found_in entries, the GO terms, etc.

The raw output file of the InterProScan program contains the source data for the Excel tabsheet "iprscan results", which contains an overview table with the columns proteinID, protein crc64, protein length, match dbname, classification id, classification description, start, end, score, status, date, interproID, interpro name, followed by n repetitions of (title, GO number, description). The columns can be sorted and filtered via the filters in the column headers.
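
To illustrate the column layout listed above, here is a minimal Python sketch that splits one line of a raw file into named fields. It is not the code used by iprscanToExcel (which is a Java program): the names RAW_COLUMNS and parse_raw_line are hypothetical, and the sketch assumes that the raw file is tab-separated with the columns in the order given above. Any fields after "interpro name" are returned unparsed.

```python
# Fixed leading columns, in the order listed in this readme.
RAW_COLUMNS = [
    "proteinID",
    "protein crc64",
    "protein length",
    "match dbname",
    "classification id",
    "classification description",
    "start",
    "end",
    "score",
    "status",
    "date",
    "interproID",
    "interpro name",
]

# Columns that are expected to hold whole numbers.
INT_COLUMNS = ("protein length", "start", "end")


def parse_raw_line(line):
    """Split one tab-separated raw line into a dict keyed by column name.

    Assumption: fields are separated by tabs. Trailing GO-related fields
    are kept as a list of strings under the key "go annotations".
    """
    fields = line.rstrip("\r\n").split("\t")
    record = dict(zip(RAW_COLUMNS, fields))
    for name in INT_COLUMNS:
        if name in record:
            record[name] = int(record[name])
    record["go annotations"] = fields[len(RAW_COLUMNS):]
    return record
```

Each returned dict corresponds to one row of the "iprscan results" overview table.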

The program requires raw and/or XML files to be available in the Galaxy history. These files can be generated with the application "Interproscan functional predictions" (under the header Sequence Annotation).
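
The relation between the three modes of operation and the two tabsheets can be summarized in a few lines of Python. This is only an illustration of the behaviour described above, not the program's actual logic; the function name planned_tabsheets is hypothetical and the order of the returned names is arbitrary.

```python
def planned_tabsheets(xml_file=None, raw_file=None):
    """Return the names of the tabsheets expected for the given inputs.

    The XML file feeds "summary tables"; the raw file feeds
    "iprscan results". At least one of the two inputs is required.
    """
    sheets = []
    if xml_file:
        sheets.append("summary tables")
    if raw_file:
        sheets.append("iprscan results")
    if not sheets:
        raise ValueError("provide an XML file, a raw file, or both")
    return sheets
```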


Galaxy workflow example
-----------------------

The file "Galaxy-Workflow-Export_xml_and_raw_output_from_iprscan_to_Excel.ga" stores a workflow. In the first two sections, a sequence file (fasta) can be uploaded and all InterProScan applications will be executed to generate an XML and a raw InterProScan output file. In the third section of the workflow, those two InterProScan output files will be used as input for the iprscanToExcel program, resulting in an Excel file (.xlsx) with two tab pages.
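
Before importing the workflow, its steps can be inspected with a short script. The sketch below is an illustration only and rests on assumptions: that the .ga file is a JSON document with a top-level "steps" object keyed by step number, and that each step has "name" and "tool_id" entries. The function name list_workflow_steps is hypothetical.

```python
import json


def list_workflow_steps(ga_text):
    """Return (step number, name, tool_id) tuples in step order.

    Assumes the workflow text is JSON with a "steps" mapping whose keys
    are numeric strings; missing "name" or "tool_id" entries become None.
    """
    workflow = json.loads(ga_text)
    steps = workflow.get("steps", {})
    ordered = sorted(steps.items(), key=lambda item: int(item[0]))
    return [
        (int(key), step.get("name"), step.get("tool_id"))
        for key, step in ordered
    ]
```

Reading the .ga file into a string and passing it to list_workflow_steps should list the upload, InterProScan and iprscanToExcel sections described above, provided the assumptions hold.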


Author and affiliation
----------------------

Katrien Bernaerts and Domantas Motiejunas
corresponding author: gb-ctk-open-source-support@basf.com
10/06/2012

CropDesign N.V., a BASF Plant Science Company - Technologiepark 3, 9052 Zwijnaarde - Belgium


Terms of use
------------
iprscanToExcel - Copyright (C) 2012 CropDesign N.V. - this software may be used, copied and redistributed, with or without modification freely, without advance permission, provided that the above Copyright statement is reproduced with each copy. 
THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE (INCLUDING NEGLIGENCE OR OTHERWISE).

(R)Excel is a registered trademark of Microsoft Corporation in the United States and/or other countries.