changeset 0:cdd489d98766
Migrated tool version 1.0.0 from old tool shed archive to new tool shed repository
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/carpet-src-1/COPYING Tue Jun 07 16:50:41 2011 -0400
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/INSTALL Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,39 @@ +Prerequisites: +- a working installation of galaxy +- R/Bioconductor +- Ringo bioconductor library +- rpy (version 1.0.x) +- a working C++ compiler + +Installation: +given a galaxy dir $GALAXYTOPDIR +- extract the content of the archive into $GALAXYTOPDIR: + + tar xvjf carpet.tar.bz2 -C $GALAXYTOPDIR + +- build the "comuni" executable: + + cd $GALAXYTOPDIR/tools/CARPET + g++ com_uni.cpp -o comuni + + (do NOT use optimizations if possible) + +- paste the content of add_to_tool_conf.xml into the tool_conf.xml +file (part of the galaxy distribution), where needed and between the +<toolbox></toolbox> tags + +- restart galaxy + +NOTES: +- rpy2 is not supported by carpet yet. You can download rpy (1.0.3) here: + +http://sourceforge.net/projects/rpy/files/rpy/1.0.3/rpy-1.0.3.tar.gz/download + +- You can install bioconductor's Ringo library by issuing the following within +your R console: + + source("http://bioconductor.org/biocLite.R") + biocLite("Ringo") + +- carpet has been deployed and tested on galaxy build 1349. There is no guarantee that it +works on different builds (although it should). \ No newline at end of file
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/add_to_tool_conf.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,12 @@ + <section name="CARPET: tiling analysis" id="mTools"> + <tool file="CARPET/Raw_data.xml" /> + <tool file="CARPET/norm_rep.xml" /> + <tool file="CARPET/gff2bed_v2.xml" /> + <tool file="CARPET/PeakPeaker.xml" /> + <tool file="CARPET/common_unique_probe.xml" /> + <tool file="CARPET/MapToExon_RefSeqMat.xml" /> + <tool file="CARPET/TSS_distance.xml" /> + <tool file="CARPET/annotation_expr.xml" /> + <tool file="CARPET/calcolo_p_v4_norm.xml" /> + <tool file="CARPET/genecentrico.xml" /> + </section>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/suite_config.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,33 @@ +<suite id="CARPET_toolsuite" name="CARPET" version="1.0.0"> + <description>This suite contains all the CARPET tools created for Galaxy</description> + <tool id="view_chip" name="ChipView" version="1.0.0"> + <description>looking into the chip</description> + </tool> + <tool id="normalization" name="PreProcess for Tiling" version="1.0.0"> + <description>normalizing data</description> + </tool> + <tool id="gff to bed wiggle" name="Gff2Wig" version="1.1.0"> + <description>easy UCSC visualization of your raw-data</description> + </tool> + <tool id="Find peaks" name="PeakPicker" version="1.0.0"> + <description>Finding Peaks in a GFF Nimblegen File</description> + </tool> + <tool id="common unique" name="Com&Uni" version="1.1.0"> + <description>easy way to compare results</description> + </tool> + <tool id="Annotation_RefSeq" name="GIN" version="1.2.0"> + <description>Gene Intervals Notator</description> + </tool> + <tool id="Annotation visualization" name="GIN visualizator" version="1.0.0"> + <description>of peaks distribution</description> + </tool> + <tool id="Annotation_Expr" name="ENO" version="1.0.0"> + <description>Expression NOtator</description> + </tool> + <tool id="expressions" name="TEA" version="1.0.0"> + <description>Tiling Expression Analyzer</description> + </tool> + <tool id="BECorrelation" name="BEC" version="1.0.0"> + <description>Binding-Expression-Correlation</description> + </tool> + </suite>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/MapToExon_RefSeqMat.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,99 @@ +<tool id="Annotation_RefSeq" name="GIN" version="1.2.0"> + <description>Gene Intervals Notator</description> + <command interpreter="perl">MapToExon_RefSeqMat_new.pl $input1 $input2 $promoter $3prime $priority $output</command> + <inputs> + <param format="tabular" name="input1" type="data" label="GFF file"/> + <param format="tabular" name="input2" type="data" label="Annotation table"/> + <param name="promoter" type="integer" size="10" value="-2000" label="Promoter definition (bp)"/> + <param name="3prime" type="integer" size="10" value="2000" label="3prime extension (bp)"/> + <param name="priority" type="select" label="Annotation priority"> + <option value="prom">promoter</option> + <option value="gene">gene</option> + </param> + + </inputs> + <outputs> + <data format="tabular" name="output"/> + </outputs> + + <help> + .. class:: infomark + +**What it does** + +GIN annotates peak queries (GFF files) with user-defined transcript annotation tables (e.g. RefSeq, UCSC genes, Ensembl genes). +It calculates the position of each peak relative to the associated features (e.g. promoter, exon, intron, intergenic). + +For more detailed information, please refer to the CARPET user manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +-------- + +.. class:: warningmark + +**Annotation Table** + +The annotation table can be downloaded directly from the **"Get Data"** section (**"UCSC Main table browser"** link). +Be sure to choose the right output format (**"all fields from selected table"**) and to check **"send output to Galaxy"**. + +Many different annotation tables can be downloaded, covering different organisms and databases such as RefSeq, UCSC gene, FlyBase, EST, etc. 
+ +**All annotation tables must have headers.** + +Click here_ to download an example Annotation Table file. + +.. _here: /static/example_file/UCSC_hs_refGene_chr19.zip + +-------- + +.. class:: warningmark + +**Custom annotation table** + + .. class:: infomark + + + **About format** + + The annotation table format must be the same as the one downloadable from UCSC. For this tool, the following fields must be present: + + 1. **chrom** - The name of the chromosome (e.g. chr1, chrY_random). + 2. **chromStart** - The starting position in the chromosome. (The first base in a chromosome is numbered 0.) + 3. **chromEnd** - The ending position in the chromosome, plus 1 (i.e., a half-open interval). + 4. **name** - The name of the BED line. + 5. **strand** - Defines the strand - either + or - . + 6. **blockCount** - The number of blocks (exons) in the BED line. + 7. **blockSizes** - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount. + 8. **blockStarts** - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount. + + +The table **must** have headers + + +-------- + +**Options** + +- **Promoter definition:** extent of the sequence upstream of the TSS (Transcription Start Site), in base pairs. +- **Annotation priority:** + - if **promoter**: GIN tries to locate a peak in a promoter locus as first choice. If more than one promoter is found, the peak is associated with the closest transcriptional unit + - if **gene**: GIN tries to locate a peak in an exon as first choice. + +-------- + +**How does it work?** + +**- Flowchart** + +.. image:: static/images/CARPET/floowchart.png + + +**- Output** + +.. image:: static/images/CARPET/output_ann.png + + + </help> +</tool>
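The BED-style block fields listed in the help text can be turned into absolute exon intervals as in the following minimal sketch. It follows the help text's definition (blockStarts relative to chromStart, half-open intervals). The helper name `bed_blocks_to_exons` and the sample values are hypothetical and are not part of CARPET.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CARPET): convert BED-style block fields
# into absolute, half-open exon intervals [start, end).
sub bed_blocks_to_exons {
    my ($chrom_start, $block_sizes, $block_starts) = @_;

    # UCSC lists often end with a trailing comma; drop empty items.
    my @sizes  = grep { length } split /,/, $block_sizes;
    my @starts = grep { length } split /,/, $block_starts;
    die "blockSizes and blockStarts differ in length\n" if @sizes != @starts;

    return map {
        my $start = $chrom_start + $starts[$_];   # blockStarts are relative to chromStart
        [ $start, $start + $sizes[$_] ]
    } 0 .. $#sizes;
}

# Example with made-up coordinates: two blocks on a feature starting at 1000.
my @exons = bed_blocks_to_exons(1000, "100,200,", "0,500,");
print join("\t", @$_), "\n" for @exons;
```

Each returned pair is an exon start and end, which is the form an overlap test against a peak's start and stop needs.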
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/MapToExon_RefSeqMat_new.pl Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,412 @@ +#!/usr/bin/perl + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; ; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + +$|=1; +my $infile = $ARGV[0]; +my $infile2=$ARGV[1]; +my $definition_pro=$ARGV[2]; +my $definition_tre=$ARGV[3]; +my $priority=$ARGV[4]; +my $file_output=$ARGV[5]; + +open (INFILE, "<$infile"); +open (INFILE2, "<$infile2"); +open (OUTFILE1, ">$file_output") or die "Cannot find file $file_output\n"; + +$campi_t=0; +$definition_out=$definition_pro-2000; + +while (defined (my $line_down = <INFILE2>)) { + $line_down=~ s/\#//g; + chomp($line_down); + $campi_t++; + my @tmp_down=split(/\s+/, $line_down); + if($campi_t==1){ + $z=0; + foreach $campo_t(@tmp_down){ + if(($campo_t eq "name") || ($campo_t eq "qName") || ($campo_t eq "repClass")){ + $zRef=$z; + } + if(($campo_t eq "txStart") || ($campo_t eq "tStart") || ($campo_t eq "chromStart")|| ($campo_t eq "genoStart")){ + $ztxStart=$z; + } + if(($campo_t eq "txEnd") || ($campo_t eq "tEnd") || ($campo_t eq "chromEnd")|| ($campo_t eq "genoEnd")){ + $ztxEnd=$z; + } + if($campo_t eq "strand"){ + $zstrand=$z; + } + if(($campo_t eq "chrom") || ($campo_t eq "tName")|| ($campo_t eq "genoName")){ + $zchrom=$z; + } + if(($campo_t eq "exonStarts") || ($campo_t eq "tStarts")){ + $zexonstart=$z; + } + if($campo_t eq "exonEnds"){ + $zexonend=$z; + } + if($campo_t eq "blockSizes"){ + $zblocksize=$z; + } + 
if(($campo_t eq "name2") || ($campo_t eq "repFamily")){ + $zname=$z; + } + if(($campo_t eq "exonCount")||($campo_t eq "blockCount")){ + $zcount=$z; + } + $z++; + } + if(!$zname){ + $zname=$zRef; + } + if(!$zexonstart){ + $zexonstart=$ztxStart; + } + if(!$zexonend){ + $zexonend=$ztxEnd; + } + if(($zRef eq "") || ($ztxStart eq "") || ($zstrand eq "") || ($zchrom eq "")){ + print "Annotation file is not in the accepted format\n"; + exit; + }else{print "promoter=$definition_pro, priority=$priority";} + next; + } + chomp $tmp_down[$zchrom]; + $tab_ann{$tmp_down[$zchrom]}.="$line_down\n"; +} + +while (defined (my $line_down = <INFILE>)) { + my @tmp_down = split("\t", $line_down); + chomp $tmp_down[0]; + $tab_probe{$tmp_down[0]}.=$line_down; +} + +@chrom_probes= keys(%tab_probe); + +&chip; + +exit 0; + +########### +#subrutine# +########### + +sub chip +{ +foreach $chromosoma (@chrom_probes){ + +@file1=split("\n", $tab_probe{$chromosoma}); + +foreach $line(@file1) { + chomp $line; + #chop $line; + if ($line=~/track/g){next;} + if ($line=~/#/g){next;} + if ($line=~/^\s+$/g){next;} + my @Line=split(/\t/, $line); + my $Chrom=$Line[0]; + my $Start=$Line[3]; + my $Stop=$Line[4]; + #my $value=$Line[5]; + my $ProbeName=$Line[5]; + my $feature="ciccio"; + my $check=0; + my $relative_dist=10000000; + $double_check=0; + @file2=split("\n", $tab_ann{$chromosoma}); + foreach $linea(@file2) { + chomp $linea; + $linea=~ s/\#//g; + my @kEle=split("\t", $linea); + $ref=$kEle[$zRef]; + $chrom=$kEle[$zchrom]; + $strand=$kEle[$zstrand]; + $transcriptStart=$kEle[$ztxStart]; + $transcriptStop=$kEle[$ztxEnd]; + if($zcount){ + $exoncount=$kEle[$zcount]; + } + else + { + $exoncount=1; + } + $geneName=$kEle[$zname]; + $exonStartref=$kEle[$zexonstart]; + my $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand"; + + my @exonStart=split(",", $exonStartref); + + if (!$zblocksize){ + $exonEndref=$kEle[$zexonend]; + } + else { + @blockStop=split(",", $kEle[$zblocksize]); + 
$exonEndref=""; + for ($jj=0; $jj<=$#exonStart; $jj++){ + $end_block=$exonStart[$jj]+$blockStop[$jj]; + $exonEndref.="$end_block,"; + + } + } + + my @exonStop=split(",", $exonEndref); + + #print "$ref / $chrom / $strand / $transcriptStart/$transcriptStop/$exoncount/$geneName [$#exonStop]\n"; + #print "pippo $exonStart[0] - $exonStop[0],$exonStart[1] - $exonStop[1],$exonStart[2] - $exonStop[2] \n"; + + + if ($Chrom eq $chrom) { + #print "cazzo"; + if ($strand eq "+"){ + $promotore=$transcriptStart+$definition_pro; + $distanzaTSS=int((($Start+$Stop)/2)-$transcriptStart); + $trepr=$transcriptStop+$definition_tre; + $rel_start=$promotore; + $rel_stop=$trepr; + } + if ($strand eq "-"){ + $promotore=$transcriptStop-$definition_pro; + $distanzaTSS=int($transcriptStop-(($Start+$Stop)/2)); + $trepr=$transcriptStart-$definition_tre; + $rel_start=$trepr; + $rel_stop=$promotore; + } + #print OUTFILE1 "$ref\t$distanzaTSS\n"; + + #if(($Start<=$transcriptStart && $Stop>$transcriptStart) || ($Start>=$promotore && $Stop<=$transcriptStart) || ($Start>=$transcriptStop && $Stop<=$promotore) || ($Start<=$promotore && $Stop>$promotore) || ($Start<$transcriptStop && $Stop>=$transcriptStop) || ($Start>=$transcriptStart && $Stop<=$transcriptStop) ){ + #print "sono entrato con start $Start stop $Stop e $transcriptStart e $transcriptStop\n"; + + if($Start<=$rel_stop && $rel_start<=$Stop){ + + for(my $i=0;$i<=$#exonStart;$i++) { + + if ($strand eq "+"){ + $exoncount1=$i+1; + $exoncount2=$exoncount1; + $introncount=$exoncount1; + if($i==$#exonStart) { + $exoncount2="last"; + $introncount="last"; + } + if($i==$#exonStart-1) { + $introncount="last"; + } + } + if ($strand eq "-"){ + $exoncount1=($#exonStart+1)-$i; + $exoncount2=$exoncount1; + $introncount=$exoncount1-1; + #if(($i==0) && ($#exonStart != 0)) {$exoncount2="last";} + if($i==0) { + $exoncount2="last"; + $introncount="last"; + } + } + #print "esone start $exonStart[$i] e $exonStop[$i] e start $Start\n"; + + #print OUTFILE1 
"$ref\t$exonStart[$i]\t$exonStop[$i]\t$Start\t$Stop\t$i\t$#exonStart\t$priority\n"; + + + + #if($priority eq "prom" && $check==1){ + # last; + #} + #if($priority eq "gene" && $check==2){ + # next; + #} + + + if(($Start<=$exonStart[$i]) && ($Stop>$exonStart[$i])){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tintronexon $exoncount2\t$distanzaTSS"; + if($priority eq "prom"){ + $check=2; + } + else{ + $check=1; + #last; + } + } + if(($Start>=$exonStart[$i]) && ($Stop<=$exonStop[$i])){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\texon $exoncount2\t$distanzaTSS"; + if($priority eq "prom"){ + $check=2; + } + else{ + $check=1; + #last; + } + } + if(($Start<$exonStop[$i]) && ($Stop>$exonStop[$i])){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\texonintron $introncount\t$distanzaTSS"; + if($priority eq "prom"){ + $check=2; + } + else{ + $check=1; + #last; + } + } + if($priority eq "prom"){ + if(($Start>=$exonStop[$i]) && ($Stop<=$exonStart[$i+1]) && ($check==0)){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tintron $introncount\t$distanzaTSS"; + #print "intron\n"; + $check=2; + } + } + else{ + if(($Start>=$exonStop[$i]) && ($Stop<=$exonStart[$i+1])){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tintron $introncount\t$distanzaTSS"; + #print "intron\n"; + $check=2; + } + } + + + + if (($strand eq "+") && ($Start>=$transcriptStop)) { + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\t3_prime\t$distanzaTSS"; + $distanzaTSS=int($transcriptStop-(($Start+$Stop)/2)); + if($priority eq "prom"){ + $check=2; + #last; + } + else{$check=1;} + } + if (($strand eq "-") && ($Stop<=$transcriptStart)) { + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\t3_prime\t$distanzaTSS"; + $distanzaTSS=int((($Start+$Stop)/2)-$transcriptStart); 
+ if($priority eq "prom"){ + $check=2; + #last; + } + else{$check=1;} + } + + + + + if (($strand eq "+") && (($Start<=$promotore && $Stop>$promotore) || ($Start>$promotore && $Stop<=$transcriptStart))) { + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tpromoter\t$distanzaTSS"; + if($priority eq "prom"){ + $check=1; + #last; + } + else{$check=2;} + } + if (($strand eq "-") && (($Start<=$promotore && $Stop>=$promotore) || ($Start>=$transcriptStop && $Stop<$promotore))) { + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tpromoter\t$distanzaTSS"; + if($priority eq "prom"){ + $check=1; + #last; + } + else{$check=2;} + } + + if(($Start<=$exonStart[$i]) && ($i==0) && ($strand eq "+") && ($Stop>=$exonStart[$i])){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tprom_exon $exoncount2\t$distanzaTSS"; + #print "prom-exon e $exoncount1\n"; + if($priority eq "prom"){ + $check=1; + #last; + } + else{$check=2;} + } + if(($Start<=$exonStop[$i]) && ($i==$#exonStart) && ($strand eq "-") && ($Stop>=$exonStop[$i])){ + $feature="$ref\t$geneName\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tprom_exon $exoncount2\t$distanzaTSS"; + if($priority eq "prom"){ + $check=1; + #last; + } + else{$check=2;} + } + + } + #if ($check==1){ + # print "$chrom\t$Start\t$Stop\t$geneName\t$exoncount\t$ref\t$strand\t$feature\n"; + #} + }else{next;} + #exit; + }else{next;} + + #print OUTFILE1 "$ref\t$feature\n"; + #print "$geneName\t$transcriptStart\t$Chrom\t$Start\t$Stop\t$distanzaTSS\t$feature\n"; + + if (($priority eq "gene") && ($relative_dist>abs($distanzaTSS))){ + $relative_dist=abs($distanzaTSS); + $stampa="$Chrom\t$Start\t$Stop\t$ProbeName\t$feature\n"; + $double_check=1; + } + if (($check==1) && ($priority eq "prom") && ($relative_dist>abs($distanzaTSS))){ + $relative_dist=abs($distanzaTSS); + $double_check=1; + $stampa="$Chrom\t$Start\t$Stop\t$ProbeName\t$feature\n"; + } + } + + + + 
if($double_check==1){ + print OUTFILE1 "$stampa"; + next; + } + if (($check==2) && ($priority eq "prom")){ + print OUTFILE1 "$Chrom\t$Start\t$Stop\t$ProbeName\t$feature\n"; + next; + } + if($check==0){ + print OUTFILE1 "$Chrom\t$Start\t$Stop\t$ProbeName\tintergenic\tintergenic\t$Start\t$Stop\t+\t0\tOUT\t$definition_out\n"; + next; + } +} +} +close INFILE; +close INFILE2; +} + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
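The strand-dependent arithmetic that the script above uses for `$distanzaTSS` and `$promotore` can be isolated as in this minimal sketch. The function names `tss_distance` and `promoter_boundary` are illustrative and do not exist in the script. The promoter definition is assumed to be a negative offset, as in the tool's default of -2000.

```perl
use strict;
use warnings;

# Illustrative helpers (not part of the script): same arithmetic as
# $distanzaTSS and $promotore, factored out for readability.

# Signed distance from the peak midpoint to the TSS; negative means upstream.
sub tss_distance {
    my ($start, $stop, $tx_start, $tx_end, $strand) = @_;
    my $mid = ($start + $stop) / 2;
    return $strand eq '-' ? int($tx_end - $mid)      # TSS is at txEnd on the minus strand
                          : int($mid - $tx_start);   # TSS is at txStart on the plus strand
}

# Outer boundary of the promoter region for a given (negative) promoter definition.
sub promoter_boundary {
    my ($tx_start, $tx_end, $strand, $promoter_def) = @_;
    return $strand eq '-' ? $tx_end - $promoter_def
                          : $tx_start + $promoter_def;
}

# Made-up transcript spanning 1500..3000, with a peak 550 bp upstream of its TSS.
print tss_distance(900, 1000, 1500, 3000, '+'), "\n";
print promoter_boundary(1500, 3000, '+', -2000), "\n";
```

Keeping the sign convention in one place makes it easier to check that upstream peaks get negative distances on both strands.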
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/PeakPeaker.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,139 @@ +<tool id="Find peaks" name="PeakPicker" version="1.0.0"> + <description>Finding Peaks in a GFF Nimblegen File</description> + <command interpreter="perl">PeakPeaker2.pl --in $input --out $output --t $type --dist_peaks $dist_peaks --col3 $col3 --log $log --perc $perc --num $num --dist $dist --w $window --f_pv $output2</command> + <inputs> + <param format="tabular" name="input" type="data" label="Source file"/> + <param name="col3" size="20" type="text" value="Analisys" label="Analysis name"/> + <param name="type" type="select" label="Analysis type"> + <option value="p">p-value</option> + <option value="s">score</option> + </param> + <param name="perc" size="4" type="text" value="0.95" label="percentile value"/> + <param name="log" size="2" type="text" value="7" label="-log p-value cutoff"/> + <param name="num" size="2" type="text" value="3" label="minimal number of probes"/> + <param name="dist" size="4" type="text" value="100" label="max distance between two probes"/> + <param name="dist_peaks" size="4" type="text" value="200" label="min distance between two peaks"/> + <param name="window" size="4" type="text" value="500" label="window length"/> + </inputs> + <outputs> + <data format="bed" name="output" /> + <data format="gff" name="output2" /> + </outputs> + + + <help> + .. class:: infomark + +**What it does** + +This tool takes NimbleGen ratio files in GFF format as input and provides a table of the computed peaks in the same GFF format. 
+ +-------- + +**Parameters:** + +- **Analysis type:** + - **p-value** analysis determines peaks based on p-value inference + - **score** analysis determines peaks based on a scoring system +- **Percentile value:** percentile of the dataset distribution used to calculate the threshold that filters out background +- **-log p-value cutoff:** (required only for p-value based analysis) integer cutoff used to identify a significant peak +- **minimal # of probes:** minimal number of consecutive probes used to define a peak +- **max distance 2 probes:** greatest nucleotide distance (bp) at which two probes are still considered adjacent +- **min distance 2 peaks:** minimum nucleotide distance (bp) required to consider two peaks as separate entities +- **window length:** length in bp of the window used for statistical analysis + +-------- + + +**INPUT FILE** + +NimbleGen returns a GFF file with the coordinates of each probe and the normalized signal value, log2(Cy5/Cy3). + +Click here_ to download a GFF file example. + +.. _here: /static/example_file/GFF_file_norm.txt.zip + +Example of Nimblegen GFF format:: + + chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name + chr19 Nimblegen tiling_array 100100 1000151 2.9 + . probe_name + +.. class:: warningmark + +The sixth column **must** contain the normalized log2(Cy5/Cy3) value that NimbleGen returns after the experiment + + +--------- + +.. class:: infomark + +**How does it work?** + +**Two assumptions:** + + +- data are enriched for signal in the positive direction ("one-tailed") +- a peak (or enriched region) is represented by multiple probes that are genomically located close to each other + + +**Statistical approach: sliding window** + + +A window centered at each probe of the array moves probe by probe. In each window, chi-squared is calculated + + +.. 
image:: static/images/CARPET/chi_squared.png + + +by building a contingency table for each probe, and a p-value is assigned + + +.. image:: static/images/CARPET/centered.png + + +**"-log10(p-value)"** is associated with each probe. This value takes into account the effect of the neighbouring probes. +This approach dramatically decreases the background signal. + + +.. image:: static/images/CARPET/background.png + + +The new values are used to define an enriched locus + + +.. image:: static/images/CARPET/pvalue.png + + +Moreover, a score is calculated taking into account the length and the raw signal of the peak + + +.. image:: static/images/CARPET/pvalue_score.png + + +Output is a GFF file + + +.. image:: static/images/CARPET/table_pv.png + + +**NON Statistical approach: score** + + +Only the raw signal of each probe is considered. Only the regions with a number of consecutive probes above the defined threshold are selected + + +.. image:: static/images/CARPET/score.png + + +Output is a GFF file + + +.. image:: static/images/CARPET/table_score.png + + +and a GFF file with the p-values associated with each probe + + </help> + +</tool> +
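The per-window statistic described in the help text above can be sketched as a one-degree-of-freedom chi-squared comparing observed and expected counts of probes above the threshold. This is a minimal illustration, not the tool's own code. The helper name `window_chi_squared` and the sample numbers are assumptions. Converting the statistic into a p-value needs a chi-squared distribution function, which is left out so that the sketch depends only on core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: chi-squared statistic for one sliding window.
#   $n_probes - number of probes falling inside the window
#   $n_above  - how many of them are at or above the signal threshold
#   $p_above  - expected fraction of probes above the threshold
#               (e.g. 1 - percentile, so 0.05 for a 0.95 percentile)
sub window_chi_squared {
    my ($n_probes, $n_above, $p_above) = @_;

    # The statistic is undefined for an empty window or a degenerate expectation.
    return undef if $n_probes == 0 || $p_above <= 0 || $p_above >= 1;

    my $exp_above = $p_above * $n_probes;
    my $exp_below = (1 - $p_above) * $n_probes;
    my $obs_below = $n_probes - $n_above;

    my $chi_sq = (($n_above   - $exp_above) ** 2) / $exp_above
               + (($obs_below - $exp_below) ** 2) / $exp_below;
    return $chi_sq;
}

# Made-up window: 10 probes, 4 above a threshold that 5% of probes exceed.
printf "%.3f\n", window_chi_squared(10, 4, 0.05);
```

A window containing many more above-threshold probes than expected gives a large statistic, and hence a small p-value and a large negative-log score for the probe at its centre.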
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/PeakPeaker2.pl Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,432 @@ +#! /usr/bin/perl + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + +use Getopt::Long; + +GetOptions ( + "help" => \$OPT{help}, + "in=s" => \$OPT{fname}, + "perc=s" => \$OPT{value}, + "fc=s" => \$OPT{fc_value}, + "log=s" => \$OPT{valore_log}, + "t=s" => \$OPT{type_peak}, + "num=s" => \$OPT{num_probe}, + "dist=s" => \$OPT{dist_max}, + "dist_peaks=s" => \$OPT{dist_max_peaks}, + "w=s"=> \$OPT{s_windows}, + "col3=s" => \$OPT{col3}, + "f_pv=s" => \$OPT{file_pv}, + "out=s"=> \$OPT{out} + +)|| printusage(); + +# command-line options +my $value=$OPT{value}; +my $fc_value=$OPT{fc_value}; +my $valore_log=$OPT{valore_log}; +my $type_peak=$OPT{type_peak}; +my $num_probe=$OPT{num_probe}; +my $dist_max=$OPT{dist_max}; +my $dist_max_peaks=$OPT{dist_max_peaks}; +my $fname=$OPT{fname}; +my $help=$OPT{help}; +my $s_windows=$OPT{s_windows}; +my $col3=$OPT{col3}; +my $outfile=$OPT{out}; +my $outfile_pv=$OPT{file_pv}; + +# print usage if help was requested +$help and printusage(); + +# print usage if required options are missing +if (!$s_windows || !$fname || (!$value && !$fc_value) || !$type_peak || !$num_probe || !$dist_max || (($type_peak eq "p") && !$valore_log)){ + &printusage() +}; + + +qx {sort -k 1,1 -k 4,4n $fname >$fname.sortato}; + + +open(FILE, "<$fname.sortato") or die "Cannot find file $fname.sortato\n"; +open(pvalue, ">$outfile_pv") or die 
"Cannot open file $outfile_pv: $!\n"; +#open(buonitutti, "> buoni_$fname") or die "Cannot find file $fname: $!\n"; +# loop through the file line by line and push each data line onto @array +print pvalue "#p-value track ($value) \n"; + + +@array= (); +$conto_tutti_sopra=0; +$conto_tutti=0; +while ($line = <FILE>){ + chomp ($line); + if ($line=~/track/g){next;} + if ($line=~/#/g){next;} + if ($line=~/^\s+$/g){next;} + push (@array,$line); + @value_perc=split("\t",$line); + push (@percentile,$value_perc[5]); + if ($value_perc[5]>=$fc_value){ + $conto_tutti_sopra++; + } + elsif ($value_perc[5]<$fc_value){ + $conto_tutti++; + } +} + +close (FILE); + +if (!$fc_value){ +# sort numerically: the default sort compares values as strings +@perc_ordine=sort { $a <=> $b } @percentile; +$position=int(($value*($#perc_ordine+1))-1); +$valore_percentile=$perc_ordine[$position]; + #print "the percentile is $valore_percentile\n"; + #print "the max is $perc_ordine[$#perc_ordine]\n"; + #print "the min is $perc_ordine[1]\n"; +$probabilita=1-$value; +} +else { + $valore_percentile=$fc_value; + $tuttitutti=($conto_tutti+$conto_tutti_sopra); + #print"the total number of probes is $tuttitutti\n"; + #print"the total number of probes above the threshold is $conto_tutti_sopra\n"; + $probabilita=$conto_tutti_sopra/($conto_tutti+$conto_tutti_sopra); + #print "the probability= $probabilita\n"; + } + +#print buonitutti"the percentile is $valore_percentile\n"; + + + +($one_ref,$two_ref)=&probe_cutoff(@array); + +if ($type_peak eq "p"){ + @forse_niente=&peak(@$one_ref); + } +elsif ($type_peak eq "s"){ + @forse_niente=&peak(@$two_ref); + } + +&double_peak(@forse_niente); + +# +# End process +# +close OUTFILE; +unlink "$fname.sortato"; +exit 0; + +################################################################################################ +######################## SUBROUTINE ############################### +################################################################################################ + +sub probe_cutoff +{ + +$c=-6; + +foreach $linea(@array){ + 
@subarray1 = split("\t",$linea); + $inizio=(((($subarray1[4]-$subarray1[3])/2)+$subarray1[3])-($s_windows/2)); #per modificare la windows cambia il 500 e il 1000 + $fine=$inizio+$s_windows; #windows 500 -->250 e 500 windows 1000 --> 500 e 1000 + $counter=0; + $counter_value=0; + + if($subarray1[5]>=$valore_percentile){ + push (@array_probe,[@subarray1]); + print buonitutti "$linea\n"; + } + + for($i=$c;$i<=$#array;$i++) { + if ($i<0){ + next; + } + @subarray = split("\t",$array[$i]); + if (($subarray[3]>=$inizio)&&($subarray[3]<$fine)&&("$subarray[0]" eq "$subarray1[0]")){ + $counter++; + if ($subarray[5]>= $valore_percentile){ + $counter_value++; + } + } + if (($subarray[3]>$fine)||("$subarray[0]" ne "$subarray1[0]")){ + $cazzo=$subarray[3]; + last; + } + + } +$c++; + if ($counter !=0){ + $chi_sq = (((($counter_value - ($probabilita*$counter))**2)/($probabilita*$counter))+(((($counter-$counter_value)-((1-$probabilita)*$counter))**2)/((1-$probabilita)*$counter))); + } + else{ + $chi_sq = NA; + } + +use Statistics::Distributions; +$chisprob=Statistics::Distributions::chisqrprob (1,$chi_sq); +if ($chisprob == 0){ + $log_10=100; +} +else { + $log_10 = -log($chisprob)/log(10); +} + +print pvalue "$subarray1[0]\t$subarray1[1]\tpv_$subarray1[2]\t$subarray1[3]\t$subarray1[4]\t$log_10\t$subarray1[6]\t$subarray1[7]\t$subarray1[5]\n"; + + if ($log_10>=$valore_log){ + #print"$subarray1[3],$inizio,$fine,$counter,$counter_value,$chi_sq,$chisprob,$log_10\n"; + @proviamo=($subarray1[0],$subarray1[1],$subarray1[2],$inizio,$fine,$log_10,$subarray1[6],$subarray1[7],$subarray1[5]); + push (@matrix_pvalue,[@proviamo]); + + } +} +@result_table=@matrix_pvalue; +@result_table2=@array_probe; +return(\@result_table,\@result_table2); + +} + + + + + +################################################################################################ + + + +sub peak +{ +@matrix_value=@_; + +$stop=$#matrix_value; + +for ($cio=0; $cio<=$stop;$cio++) { + @log_ratio=(); + @chilosa=(); + 
$inizio=$matrix_value[$cio][3]; + $somma=0; + $media=0; + $sommachilo=0; + $mediachilo=0; + $diviso=0; + push(@log_ratio,$matrix_value[$cio][5]); + if ($matrix_value[$cio][8]>=$valore_percentile){ + push(@chilosa,$matrix_value[$cio][8]); + } + for ($j=0; $j<=$stop;$j++){ + $distanza=($matrix_value[$cio+1][3]-$matrix_value[$cio][4]); + if (($distanza > $dist_max) || ($cio==$stop)||(!("$matrix_value[$cio+1][0]" eq "$matrix_value[$cio][0]"))){ + $j=$stop; + } + elsif (($distanza <= $dist_max)&&("$matrix_value[$cio+1][0]" eq "$matrix_value[$cio][0]")){ + $cio++; + push(@log_ratio,$matrix_value[$cio][5]); + $fine=$matrix_value[$cio][4]; + $inizio3=$inizio; + $j++; + if ($matrix_value[$cio][8]>=$valore_percentile){ + push(@chilosa,$matrix_value[$cio][8]); + } + } + + + } + $numeroprobes=($#chilosa+1); + if ((($#log_ratio+1)>=$num_probe) && (($type_peak eq "s") ||(($#chilosa+1)>=$num_probe))){ + foreach $n (@log_ratio) { + $somma+=$n; + } + if($type_peak eq "s"){ + @chilosa=@log_ratio; + } + foreach $nchilo (@chilosa) { + $sommachilo+=$nchilo; + } + $diviso=$#log_ratio+1; + $media = $somma/(@log_ratio); + $mediachilo=$sommachilo/(@chilosa); + #$somma_media_chilo=((sqrt($#chilosa+1))+$mediachilo); + #$moltipli=(($#log_ratio+1)*$media); + #$somma_media=((sqrt($#log_ratio+1))+$media); + if ($type_peak eq "p"){ + $inizio3=int($inizio3+(($s_windows/2)-25)); + $fine=int($fine-(($s_windows/2)-25)); + } + @proviamo_double=($matrix_value[$cio][0],$matrix_value[$cio][1],$col3,$inizio3,$fine,$media,$matrix_value[$cio][6],$matrix_value[$cio][7],$mediachilo,$diviso); + push (@matrix_for_double,[@proviamo_double]); + + } +} + +@result_table=@matrix_for_double; +return(@result_table); +} + + +################################################################################################ + + + +sub double_peak +{ +@matrix_value=@_; + +$stop=$#matrix_value; +$i=0; +$j=0; +#print $stop; + +# +# Print output file +# +if ($outfile){ + open (OUTFILE, ">$outfile"); +} + +# +# print File 
Header +# +my $header=<<e0c6654; +\# infile: $fname +\# percentile value: $value=$valore_percentile +\# fold change: $fc_value +\# type of analysis: $type_peak +\# log(pval analysis): $valore_log +\# num (probe defining a peak): $num_probe +\# dist: $dist_max +\# window length: $s_windows +track name=$col3 description="peaks find" visibility=2 +e0c6654 + +if ($outfile){ + print OUTFILE "$header"; +}else{ + print "$header"; +} +if ($type_peak eq "p"){ + print "perc value=$value=$valore_percentile, #probes=$num_probe (dist=$dist_max), window=$s_windows, type analisys =p-value (log=$valore_log), dist peaks=$dist_max_peaks"; +} +if ($type_peak eq "s"){ + print "perc value=$value=$valore_percentile, #probes=$num_probe (dist=$dist_max), window=$s_windows, type analisys=score, dist peaks=$dist_max_peaks"; +} +for ($i=0; $i<=$stop;$i++) { + @log_ratio=(); + @log_ratio2=(); + @diviso2=(); + $inizio=$matrix_value[$i][3]; + $somma=0; + $somma2=0; + $media=0; + $media2=0; + $diviso2=0; + push(@log_ratio,$matrix_value[$i][5]); + push(@log_ratio2,$matrix_value[$i][8]); + push(@diviso2,$matrix_value[$i][9]); + #print "giro $i\n"; + for ($j=0; $j<=$stop;$j++){ + # if ("$matrix_value[$i+1][0]" eq "$matrix_value[$i][0]"){ + $distanza=($matrix_value[$i+1][3]-$matrix_value[$i][4]); + #print "distanza fra i due $distanza\n"; + #exit; + if (($distanza > $dist_max_peaks) || ($i==$stop)||(!("$matrix_value[$i+1][0]" eq "$matrix_value[$i][0]"))){ + $j=$stop; + $fine=$matrix_value[$i][4]; + } + elsif (($distanza <= $dist_max_peaks)&&("$matrix_value[$i+1][0]" eq "$matrix_value[$i][0]")){ + $i++; + push(@log_ratio,$matrix_value[$i][5]); + push(@log_ratio2,$matrix_value[$i][8]); + push(@diviso2,$matrix_value[$i][9]); + $fine=$matrix_value[$i][4]; + $j++; + } + + # } + } + foreach $n (@log_ratio) { + $somma+=$n; + } + foreach $n1 (@log_ratio2) { + $somma2+=$n1; + } + foreach $n2 (@diviso2) { + $diviso2+=$n2; + } + $media = $somma/(@log_ratio); + $media2 = $somma2/(@log_ratio2); + 
$somma_media=(sqrt($diviso2)+$media); + $somma_score=(sqrt($diviso2)+$media2); + $somma_media = sprintf "%.2f", $somma_media; + $somma_score = sprintf "%.2f", $somma_score; + #$somma_score = $somma_score.$i + #$somma_media_chilo=((sqrt($#chilosa+1))+$mediachilo); + if ($outfile){ + print OUTFILE "$matrix_value[$i][0]\t$matrix_value[$i][1]\t$matrix_value[$i][2]\t$inizio\t$fine\t$somma_media\t$matrix_value[$i][6]\t$matrix_value[$i][7]\t$somma_score$i\n"; + }else{ + print "$matrix_value[$i][0]\t$matrix_value[$i][1]\t$matrix_value[$i][2]\t$inizio\t$fine\t$somma_media\t$matrix_value[$i][6]\t$matrix_value[$i][7]\t$somma_score$i\n"; + } + + +} +} + +#################################################################################################################################### + +sub printusage { + + print<<eoc22334; + + + + *************************************************** + :: N i m b l e G e n C h i p A n a l y s i s :: + *************************************************** + + + USAGE SUMMARY + --------------------------------------------------------------------------------- + This program utilizes NimbleGen ratio files in gff format as INPUT FILE and + provides a table of the computed picks in the same gff file format. + + + + --in [input filename] + --perc [percentile value, it is used to calculate the threshold rate based + on dataset distribution to filter out background ratio; i.e. 0.99] + OR + --fc [fold change value, it is used as fixed threshold to filter out + background ratio, i.e. 2] + --t [type of analysis; p, performs peaks determination based on p-value inference; + s, performs peaks determination based on a scoring system] + --log (required only for p-value based analysis) + [log2(p-value), cutoff integer to be used to identify a significant peak; i.e. 5] + --num [minimal number of consecutive probes used to define a peak; i.e. 
3] + --dist [greatest nucleotide distance between two probes or between two peaks that + still allows the signals to be treated as belonging to the same peak] + --w [window length] + + --out [output filename (optional)] + ----------------------------------------------------------------------------------------- + EXAMPLES: + + perl program.pl --in 75340_ratio.gff --perc 0.98 --t s --num 3 --dist 250 --w 500 +OR + perl program.pl --in 75340_ratio.gff --perc 0.98 --t p --log 5 --num 3 --dist 250 --w 500 + + ----------------------------------------------------------------------------------------- + +eoc22334 + exit 0; + +}
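A note for readers of the peak-finding script that ends here: its per-window test compares the number of probes above the threshold with the number expected under `$probabilita`, using a chi-square statistic with one degree of freedom, and reports `-log10` of the p-value (set to 100 when the p-value is exactly zero). The usage text describes the `--log` cutoff as log2(p-value), whereas the code takes a base-10 logarithm. The sketch below restates that calculation in Python 3 under stated assumptions: the function and argument names are illustrative, the one-degree-of-freedom tail probability is computed with `math.erfc` instead of `Statistics::Distributions`, and an empty window returns `None` rather than the script's `NA`.

```python
import math


def window_score(n_probes, n_above, p_expected):
    """Return -log10(p) for one window, or None when the window is empty.

    n_probes   -- probes falling inside the window
    n_above    -- probes in the window at or above the threshold
    p_expected -- expected fraction above the threshold (assumed 0 < p < 1)
    """
    if n_probes == 0:
        return None
    expected_above = p_expected * n_probes
    expected_below = (1.0 - p_expected) * n_probes
    n_below = n_probes - n_above
    # Two-cell goodness-of-fit statistic, same form as $chi_sq in the script.
    chi_sq = ((n_above - expected_above) ** 2 / expected_above
              + (n_below - expected_below) ** 2 / expected_below)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    if p_value == 0.0:
        return 100.0  # same cap the script applies when the p-value is zero
    return -math.log10(p_value)
```

A window whose count matches the expectation scores close to 0, while a strongly enriched window such as `window_score(10, 5, 0.05)` lands well above a cutoff of 5.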
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/Raw_data.py Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,130 @@ +#!/usr/bin/env python + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; ; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +. + +import sys +from rpy import * + + +def stop_err(msg): + sys.stderr.write(msg) + sys.exit() + +def main(): + + # Handle input params + in_fname = sys.argv[1] + out_fname = sys.argv[2] + #sys.stdout=open('log.txt','a') + try: + column = int( sys.argv[3] ) - 1 + column_x = int( sys.argv[4] ) - 1 + column_y = int( sys.argv[5] ) - 1 + except: + stop_err( "..Column not specified, your query does not contain a column of numerical data." 
) + + + title = sys.argv[6] + + skipped_lines = 0 + first_invalid_line = 0 + invalid_value = '' + + riga = [] + for tuo in range(1,1025): + riga.append(int(0)) + #print riga + + matrice = [] + for mio in range(1,769): + #print mio + matrice.append(riga) + + #print matrix + matrix1 = array(matrice) + #print matrix1 + for i, line in enumerate( file( in_fname ) ): + valid = True + line = line.rstrip('\r\n') + # Skip comments + if line and not line.startswith( '#' ): + # Extract values and convert to floats + row = [] + val = 0 + val_x = 0 + val_y = 0 + try: + fields = line.split( "\t" ) + val = fields[column] + val_x = (int(fields[column_x])-1) + val_y = (int(fields[column_y])-1) + matrix1[val_x][val_y]=float(val) + + + except: + valid = False + skipped_lines += 1 + if not first_invalid_line: + first_invalid_line = i+1 + else: + try: + row.append( float( val ) ) + except ValueError: + valid = False + skipped_lines += 1 + if not first_invalid_line: + first_invalid_line = i+1 + invalid_value = fields[column] + else: + valid = False + skipped_lines += 1 + if not first_invalid_line: + first_invalid_line = i+1 + + output_prima=sys.stdout + fsock=open('log.txt','w') + sys.stdout=fsock + + + for i in range(768): + for j in range(1024): + if j<1023: + print "%s\t" %matrix1[i][j], + if j==1023: + print "%s" %matrix1[i][j] + sys.stdout=output_prima + fsock.close() + + set_default_mode(NO_CONVERSION) + if skipped_lines < i: + #print "..on column %s" %sys.argv[3] + a=r.read_table("log.txt") + b=r.as_matrix(a) + #r.print_(b) + #b=r.cbind(a[1],a[1]) + r.pdf( out_fname, 8, 8 ) + r.image(z=r.log2(b),col=r.terrain_colors(100000),main=title, xlab="X", ylab="Y") + r.dev_off() + + else: + print "..all values in column %s are non-numeric." %sys.argv[3] + + if skipped_lines > 0: + print "..skipped %d invalid lines starting with line #%d. Value '%s' is not numeric." % ( skipped_lines, first_invalid_line, invalid_value ) + + r.quit( save="no" ) + +if __name__ == "__main__": + main() +
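The script above writes the chip matrix to a temporary text file, one row per line, before handing it to R. A minimal Python 3 sketch of that step follows. It is not the tool's code (which targets Python 2 and `rpy`), and `blank_matrix` and `format_rows` are hypothetical helper names. Building each row with a list comprehension keeps the rows independent of one another, and joining a whole row at once emits every column without per-index checks, which are easy to get off by one.

```python
def blank_matrix(n_rows, n_cols, fill=0.0):
    """Return n_rows independent rows, each holding n_cols copies of fill."""
    return [[fill] * n_cols for _ in range(n_rows)]


def format_rows(matrix):
    """Yield one tab-separated text line per matrix row, keeping every column."""
    for row in matrix:
        yield "\t".join(str(value) for value in row)
```

For a chip of 768 by 1024 positions, `blank_matrix(768, 1024)` gives the empty grid, a probe at 1-based position (X, Y) is stored at `[X - 1][Y - 1]`, and `"\n".join(format_rows(grid))` is the text that a table reader can load back.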
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/Raw_data.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,58 @@ +<tool id="view_chip" name="ChipView" version="1.0.0"> + <description>looking into the chip</description> + <command interpreter="python">Raw_data.py $input $output $numerical_column_PM $numerical_column_x $numerical_column_y $title</command> + <inputs> + <param format="tabular" name="input" type="data" label="Source file (*.pair file)"/> + <param name="numerical_column_PM" type="data_column" data_ref="input" numerical="False" value="c10" label="Numerical column for PM" /> + <param name="numerical_column_x" type="data_column" data_ref="input" numerical="False" value="c6" label="Numerical column for x axis" /> + <param name="numerical_column_y" type="data_column" data_ref="input" numerical="False" value="c7" label="Numerical column for y axis" /> + <param name="title" type="text" size="30" value="Image" label="Plot title"/> + </inputs> + <outputs> + <data format="pdf" name="output" /> + </outputs> + <help> +.. class:: infomark + +**What it does** + +This tool creates the image of the array to make sure that no artifacts are present on the surface. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +----- + +.. class:: warningmark + +This tool requires at least three numerical column **"X position"**, **"Y position"** (the position coordinates on the chip of each probe) and **"PM value"** (the raw signal of each probe). + +----- + +**Example** + +- On **Get Data** section it is possible to upload your files clicking on **"Upload File from your computer"**. + + Click here_ to download a pair_file example. + +.. _here: /static/example_file/Pair_file.txt.zip + +- Input dataset pair_file from NimbleGen contained all these informations (eleven columns: c1, c2, c3, c4, c5... 
c11):: + + c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 + IMAGE_ID GENE_EXPR_OPTION SEQ_ID PROBE_ID POSITION X Y MATCH_INDEX SEQ_URL PM MM + 1251702_635 FORWARD CHR19 CHR1900P000011001 11001 565 381 64160375 ____ 5459.89 0.00 + 1251702_635 FORWARD CHR19 CHR1900P000011050 11050 610 656 64160376 ____ 865.75 0.00 + + +- Choose column **c10** for "PM value", **c6** for "X position" and **c7** for "Y position". + + + +.. image:: static/images/CARPET/chip.jpg + + </help> +</tool> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/TSS_distance.py Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,106 @@ +#!/usr/bin/env python + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; ; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + +import sys +from rpy import * + + +def stop_err(msg): + sys.stderr.write(msg) + sys.exit() + +def main(): + + # Handle input params + in_fname = sys.argv[1] + out_fname = sys.argv[2] + try: + column = int( sys.argv[3] ) - 1 + except: + stop_err( "..Column not specified, your query does not contain a column of numerical data." 
) + title = sys.argv[4] + xlab = sys.argv[5] + breaks = int( sys.argv[6] ) + if breaks == 0: breaks = "Sturges" + if sys.argv[7] == "true": density = True + else: density = False + + + + matrix = [] + skipped_lines = 0 + first_invalid_line = 0 + invalid_value = '' + + for i, line in enumerate( file( in_fname ) ): + valid = True + line = line.rstrip('\r\n') + # Skip comments + if line and not line.startswith( '#' ): + # Extract values and convert to floats + row = [] + try: + fields = line.split( "\t" ) + val = fields[column] + if val.lower() == "na": + row.append( float( "nan" ) ) + if float(val) > float(xlab): + val = (float(xlab)+2000) + + row.append( float( val ) ) + except: + valid = False + skipped_lines += 1 + if not first_invalid_line: + first_invalid_line = i+1 + else: + try: + row.append( float( val ) ) + except ValueError: + valid = False + skipped_lines += 1 + if not first_invalid_line: + first_invalid_line = i+1 + invalid_value = fields[column] + else: + valid = False + skipped_lines += 1 + if not first_invalid_line: + first_invalid_line = i+1 + + if valid: + matrix.append( row ) + + if skipped_lines < i: + print "..on column %s" %sys.argv[3] + try: + a = array( matrix ) + r.pdf( out_fname, 8, 8 ) + r.hist( a, probability=True, main=title, xlab="TSS distance", breaks=breaks ) + if density: + r.lines( r.density( a ) ) + r.dev_off() + except Exception, exc: + stop_err("Building histogram resulted in error: %s." %str( exc )) + else: + print "..all values in column %s are non-numeric." %sys.argv[3] + + if skipped_lines > 0: + print "..skipped %d invalid lines starting with line #%d. Value '%s' is not numeric." % ( skipped_lines, first_invalid_line, invalid_value ) + + r.quit( save="no" ) + +if __name__ == "__main__": + main()
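The histogram script caps values at the zoom limit: anything above the limit (passed by the tool as `$window` and read into the variable `xlab`) is replaced by the limit plus 2000, so that it lands in the last histogram cell. A small Python 3 sketch of that rule is shown here. `clamp_distance` is a hypothetical name, and returning NaN for `NA` before any numeric conversion is an assumption about the intended behaviour, not something the listing guarantees.

```python
import math


def clamp_distance(text, limit, overflow=2000.0):
    """Convert one TSS-distance field to a float for the histogram.

    Values above `limit` are moved to `limit + overflow`, so they all
    fall into the last histogram cell. "NA" (any case) becomes NaN.
    """
    if text.strip().lower() == "na":
        return math.nan
    value = float(text)  # raises ValueError for non-numeric fields
    if value > limit:
        return limit + overflow
    return value
```

A caller would typically catch `ValueError` per line and count that line as skipped, in the same spirit as the `skipped_lines` counter in the script.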
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/TSS_distance.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,61 @@ +<tool id="Annotation visualization" name="GIN visualizator" version="1.0.0"> + <description>of peaks distribution</description> + <command interpreter="python">TSS_distance.py $input $out_file1 $numerical_column "$title" $window $breaks $density</command> + <inputs> + <param name="input" type="data" format="tabular" label="Dataset" help="Query missing? See TIP below"/> + <param name="numerical_column" type="data_column" data_ref="input" numerical="True" label="Numerical column for x axis" /> + <param name="breaks" type="integer" size="4" value="20" label="Number of breaks (bars)"/> + <param name="title" type="text" size="30" value="Histogram" label="Plot title"/> + <param name="window" type="integer" size="10" value="4000" label="Zoom visualitazion"/> + <param name="density" type="boolean" checked="yes" label="Include smoothed density"/> + </inputs> + <outputs> + <data format="pdf" name="out_file1" /> + </outputs> + <help> + +.. class:: infomark + +**What it does** + +This tool generates a distribution of peaks with respect to their distance from TSS. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +----- + +.. class:: warningmark + +This tool requires at least 1 numerical column **"distance from TSS"**. + +----- + +**Syntax** + +- All invalid, blank and comment lines in the query are skipped. The number of skipped lines is displayed in the resulting history item. +- **Numerical column for x axis** - only numerical columns are possible. +- **Number of breaks(bars)** - breakpoints between histogram cells. Value of '0' will determine breaks automatically. +- **Plot title** - the histogram title. +- **Label for x axis** - the label of the x axis for the histogram. +- **Zoom visualization** - Limit of the X axis. 
All the peaks falling beyond this limit are plotted in the last histogram cell. +- **Include smoothed density** - if checked, a smoothed density curve is drawn over the histogram. + +----- + +**Example** + +- Input dataset ann_file from GIN (twelve columns: c1, c2, c3, c4, c5... c12): + +.. image:: static/images/CARPET/output_ann.png + + +- Choose column **c12** ("Distance TSS"). + +.. image:: static/images/CARPET/hist_ann.png + +</help> +</tool> +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/annotation_expr.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,74 @@ +<tool id="Annotation_Expr" name="ENO" version="1.0.0"> + <description>Expression NOtator</description> + <command interpreter="perl">annotation_expr_intron.pl $input1 $input2 $output</command> + <inputs> + <param format="tabular" name="input1" type="data" label="Expression file"/> + <param format="tabular" name="input2" type="data" label="Annotation table"/> + </inputs> + <outputs> + <data format="tabular" name="output"/> + </outputs> + <help> + .. class:: infomark + +**What it does** + +ENO assigns each exon of a transcript the relative matching probes of the array. If a probe matches with more than one transcript, it is associated to every transcript. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +-------- + +.. class:: warningmark + +**Annotation Table** + +Annotation table was directly downloadable from **"Get Data"** section (**"UCSC Main table browser"** link). +Pay attention to choose the right output format (**"all field from selected table"**) and check **"send output to Galaxy"**. + +It is possible to download many different annotation tables coming from different organisms and database such as RefSeq, UCSC gene, FlyBase, EST, etc etc... + +**All annotation tables must have headers.** + + +-------- + +.. class:: warningmark + +**Custom annotation table** + + .. class:: infomark + + + **About format** + + Annotation table format must be the same downlodable from UCSC. In the specific case of this tool the following fields must be present: + + 1. **chrom** - The name of the chromosome (e.g. chr1, chrY_random). + 2. **chromStart** - The starting position in the chromosome. (The first base in a chromosome is numbered 0.) + 3. 
**chromEnd** - The ending position in the chromosome, plus 1 (i.e., a half-open interval). + 4. **name** - The name of the BED line. + 5. **strand** - Defines the strand - either + or - . + 6. **blockCount** - The number of blocks (exons) in the BED line. + 7. **blockSizes** - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount. + 8. **blockStarts** - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount. + + +The table **must** have headers + + +--------- + +.. class:: infomark + +**How does it work?** + +.. image:: static/images/CARPET/Eno.png + + + + </help> +</tool>
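The field list in the ENO help above implies a simple way to recover exon intervals from a BED-style row, and the `annotation_expr_intron.pl` script in this changeset then labels each probe by how it overlaps those exons. The Python 3 sketch below illustrates both ideas under stated assumptions: the function names are hypothetical, the exon coordinates follow the field description (blockStarts relative to chromStart) rather than the script's own column handling, and the strand-specific promoter cases of the Perl code are left out.

```python
def exons_from_blocks(chrom_start, block_sizes, block_starts):
    """Turn BED blockSizes/blockStarts strings into (start, end) exon pairs."""
    sizes = [int(x) for x in block_sizes.split(",") if x.strip()]
    offsets = [int(x) for x in block_starts.split(",") if x.strip()]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and blockStarts differ in length")
    # blockStarts are offsets from chromStart; ends are half-open.
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


def classify_probe(start, stop, exons):
    """Label a probe by its overlap with the exons (sorted by start)."""
    for i, (ex_start, ex_stop) in enumerate(exons):
        if start <= ex_start < stop:
            return "intronexon"   # probe crosses the exon's start
        if start >= ex_start and stop <= ex_stop:
            return "exon"         # probe lies entirely inside the exon
        if start < ex_stop < stop:
            return "exonintron"   # probe crosses the exon's end
        if i + 1 < len(exons) and start >= ex_stop and stop <= exons[i + 1][0]:
            return "intron"       # probe lies between this exon and the next
    return None                   # probe lies outside the transcript's exons
```

The order of the checks mirrors the order of the corresponding conditions in the Perl loop, so a probe that covers a whole exon is reported as `intronexon`.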
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/annotation_expr_intron.pl Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,232 @@ +#!/usr/bin/perl + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# + +# This program is free software; ; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + + +$|=1; +my $infile = $ARGV[0]; +my $infile2=$ARGV[1]; +my $file_output=$ARGV[2]; + +open (INFILE, "<$infile"); +open (INFILE2, "<$infile2"); +open (OUTFILE1, ">$file_output") or die "Cannot find file $file_output\n"; + +$campi_t=0; +while (defined (my $line_down = <INFILE2>)) { + $line_down=~ s/\#//g; + chomp($line_down); + $campi_t++; + my @tmp_down=split(/\s+/, $line_down); + if($campi_t==1){ + $z=0; + foreach $campo_t(@tmp_down){ + if(($campo_t eq "name") || ($campo_t eq "qName")){ + $zRef=$z; + } + if(($campo_t eq "txStart") || ($campo_t eq "tStart") || ($campo_t eq "chromStart")){ + $ztxStart=$z; + } + if(($campo_t eq "txEnd") || ($campo_t eq "tEnd") || ($campo_t eq "chromEnd")){ + $ztxEnd=$z; + } + if($campo_t eq "strand"){ + $zstrand=$z; + } + if(($campo_t eq "chrom") || ($campo_t eq "tName")){ + $zchrom=$z; + } + if(($campo_t eq "exonStarts") || ($campo_t eq "tStarts")){ + $zexonstart=$z; + } + if($campo_t eq "exonEnds"){ + $zexonend=$z; + } + if($campo_t eq "blockSizes"){ + $zblocksize=$z; + } + if($campo_t eq "name2"){ + $zname=$z; + } + if(($campo_t eq "exonCount")||($campo_t eq "blockCount")){ + $zcount=$z; + } + $z++; + } + if(!$zname){ + $zname=$zRef; + } + if(!$zexonstart){ + $zexonstart=$ztxStart; + } + if(!$zexonend){ + 
$zexonend=$ztxEnd; + } + if(($zRef eq "") || ($ztxStart eq "") || ($zstrand eq "") || ($zchrom eq "")){ + print "Annotation file is not in the accepted format\n"; + exit; + }else{print "Expression chip annotation";} + next; + } + chomp $tmp_down[$zchrom]; + $tab_ann{$tmp_down[$zchrom]}.="$line_down\n"; +} + +while (defined (my $line_down = <INFILE>)) { + my @tmp_down = split("\t", $line_down); + chomp $tmp_down[0]; + $tab_probe{$tmp_down[0]}.=$line_down; +} + +@chrom_probes= keys(%tab_probe); + +&expression; + + +exit 0; + + +########### +#subrutine# +########### + +sub expression +{ +foreach $chromosoma (@chrom_probes){ + %gene_cen=""; + @file2=split("\n", $tab_ann{$chromosoma}); + foreach $linea(@file2) { + chomp $linea; + $linea=~ s/#//g; + my @kEle=split(/\s+/, $linea); + $ref=$kEle[$zRef]; + $chrom=$kEle[$zchrom]; + $strand=$kEle[$zstrand]; + $transcriptStart=$kEle[$ztxStart]; + $transcriptStop=$kEle[$ztxEnd]; + if($zcount){ + $exoncount=$kEle[$zcount]; + } + else + { + $exoncount=1; + } + $geneName=$kEle[$zname]; + $exonStartref=$kEle[$zexonstart]; + + my @exonStart=split(",", $exonStartref); + + if (!$zblocksize){ + $exonEndref=$kEle[$zexonend]; + } + else { + @blockStop=split(",", $kEle[$zblocksize]); + $exonEndref=""; + for ($jj=0; $jj<=$#exonStart; $jj++){ + $end_block=$exonStart[$jj]+$blockStop[$jj]; + $exonEndref.="$end_block,"; + + } + } + + my @exonStop=split(",", $exonEndref); + + #print @exonStart; + + @file1=split("\n",$tab_probe{$chromosoma}); + + foreach $line(@file1) { + chomp $line; + #chop $line; + if ($line=~/track/g){next;} + if ($line=~/#/g){next;} + if ($line=~/^\s+$/g){next;} + my @Line=split(/\t/, $line); + my $Chrom=$Line[0]; + my $Start=$Line[3]; + my $Stop=$Line[4]; + my $ProbeName=$Line[5]; + my $feature="ciccio"; + if ($Chrom eq $chrom) { + if(($Start<=$transcriptStart && $Stop>$transcriptStart) || ($Start<$transcriptStop && $Stop>=$transcriptStop) || ($Start>=$transcriptStart && $Stop<=$transcriptStop) ){ + #print "sono entrato con 
start $Start stop $Stop e $transcriptStart e $transcriptStop\n"; + + for($i=0;$i<=$#exonStart;$i++) { + if ($strand eq "+"){ + $exoncount1=$i+1; + $exoncount2=$exoncount1; + if($i==$#exonStart) {$exoncount2="last";} + } + if ($strand eq "-"){ + $exoncount1=($#exonStart+1)-$i; + $exoncount2=$exoncount1; + if($i==0) {$exoncount2="last";} + } + + if(($Start<=$exonStart[$i]) && ($i==0) && ($strand eq "+")){ + $feature="$chrom\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tprom_exon $exoncount2\t$exoncount1"; + $gene_cen{"$ref\t$geneName"}{$feature}.="$Chrom-$Start,"; + last; + } + if(($Start<=$exonStop[$i]) && ($i==$#exonStart) && ($strand eq "-") && ($Stop>=$exonStop[$i])){ + $feature="$chrom\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tprom_exon $exoncount2\t$exoncount1"; + $gene_cen{"$ref\t$geneName"}{$feature}.="$Chrom-$Start,"; + last; + } + if(($Start<=$exonStart[$i]) && ($Stop>$exonStart[$i])){ + $feature="$chrom\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\t*intronexon $exoncount2\t$exoncount1"; + $gene_cen{"$ref\t$geneName"}{$feature}.="$Chrom-$Start,"; + last; + } + if(($Start>=$exonStart[$i]) && ($Stop<=$exonStop[$i])){ + $feature="$chrom\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\texon $exoncount2\t$exoncount1"; + $gene_cen{"$ref\t$geneName"}{$feature}.="$Chrom-$Start,"; + last; + } + if(($Start<$exonStop[$i]) && ($Stop>$exonStop[$i])){ + $feature="$chrom\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\t*exonintron $exoncount2\t$exoncount1"; + $gene_cen{"$ref\t$geneName"}{$feature}.="$Chrom-$Start,"; + last; + } + if(($Start>=$exonStop[$i]) && ($Stop<=$exonStart[$i+1]) && ($check==0)){ + $feature="$chrom\t$transcriptStart\t$transcriptStop\t$strand\t$exoncount\tintron $exoncount2\t$exoncount1"; + $gene_cen{"$ref\t$geneName"}{$feature}.="$Chrom-$Start,"; + last; + } + + } + + + } + + } + + } + } + +foreach $nome (keys %gene_cen){ + foreach $description (keys %{$gene_cen {$nome}}) { + print OUTFILE1 
"$nome\t$description\t$gene_cen{$nome}{$description}\n"; + + } + +} +} +close INFILE; +close INFILE2; +} + + +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/calcolo_p_v4_norm.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,187 @@ +<tool id="expressions" name="TEA" version="1.0.0"> + <description>Tiling Expression Analizer</description> + <command interpreter="perl">calcolo_p_v4_norm_intron.pl -opt ${ann_choice.type1} -log ${ann_choice.type_data} -ann ${ann_choice.ann} -wt ${ann_choice.wt} -treat ${ann_choice.treat} -out $output -fdr ${ann_choice.fdr} -pv ${ann_choice.pv} -norm ${ann_choice.norm} -rc ${ann_choice.rc} -fc ${ann_choice.summary_choice.fc} -exon ${ann_choice.type} -sum_met ${ann_choice.summary_choice.fc_output}</command> + <inputs> + <conditional name="ann_choice"> + <param name="type1" type="select" label="Analysis type"> + <option value="comp">comparison</option> + <option value="expr">expression</option> + </param> + + <when value="comp"> + <param format="tabular" name="ann" type="data" label="annotation file"/> + <param format="tabular" name="wt" type="data" label="expression chip condition A"/> + <param format="tabular" name="treat" type="data" label="expression chip condition B"/> + <param name="type_data" type="select" label="Data type"> + <option value="log">log2 value</option> + <option value="no_log">raw value</option> + </param> + <param name="norm" type="select" label="Normalization"> + <option value="yes">quantile-normalization</option> + <option value="no">no normalization</option> + </param> + <param name="type" type="select" label="probes selection"> + <option value="internal_exon">internal exon</option> + <option value="all_exon">all exon</option> + <option value="last_exon">last exon</option> + </param> + <conditional name="summary_choice"> + <param name="fc_output" type="select" label="summary method"> + <option value="mean">mean</option> + <option value="median">median</option> + <option value="both">both</option> + </param> + <when value="mean"> + <param name="fc" size="3" type="text" value="1.5" label="Fold 
change cutoff"/> + </when> + <when value="median"> + <param name="fc" size="3" type="text" value="1.5" label="Fold change cutoff"/> + </when> + <when value="both"> + <param name="fc" size="12" type="text" value="NOT-NEEDED" label="Fold change cutoff"/> + </when> + </conditional> + <param name="rc" size="2" type="text" value="7" label="raw value cutoff (log2)"/> + <param name="fdr" type="select" label="FDR correction"> + <option value="yes">yes</option> + <option value="no">no</option> + </param> + <param name="pv" size="4" type="text" value="0.05" label="p-value cutoff"/> + + + + </when> + <when value="expr"> + <param format="tabular" name="ann" type="data" label="annotation file"/> + <param format="tabular" name="wt" type="data" label="expression chip"/> + <param format="tabular" name="treat" type="data" label="NOT NEEDED"/> + <param name="type_data" type="select" label="Data type"> + <option value="log">log2 value</option> + <option value="no_log">raw value</option> + </param> + <param name="type" type="select" label="probes selection"> + <option value="internal_exon">internal exon</option> + <option value="all_exon">all exon</option> + <option value="last_exon">last exon</option> + </param> + <conditional name="summary_choice"> + <param name="fc_output" type="select" label="summary method"> + <option value="mean">mean</option> + <option value="median">median</option> + <option value="both">both</option> + </param> + <when value="mean"> + <param name="fc" size="12" type="text" value="NOT-NEEDED" label="Fold change cutoff"/> + </when> + <when value="median"> + <param name="fc" size="12" type="text" value="NOT-NEEDED" label="Fold change cutoff"/> + </when> + <when value="both"> + <param name="fc" size="12" type="text" value="NOT-NEEDED" label="Fold change cutoff"/> + </when> + </conditional> + <param name="norm" size="12" type="text" value="NOT-NEEDED" label="Normalization"/> + <param name="pv" size="12" type="text" value="NOT-NEEDED" label="p-value cutoff"/> + 
<param name="fdr" size="12" type="text" value="NOT-NEEDED" label="FDR correction"/> + <param name="rc" size="12" type="text" value="NOT-NEEDED" label="raw value cutoff (log2)"/> + </when> + </conditional> + + + </inputs> + <outputs> + <data format="tabular" name="output" /> + </outputs> + + <help> + .. class:: infomark + +**What it does** + +TEA utilizes NimbleGen expression files in gff format and annotated table by ENO as INPUT FILES and generates a table with the expression value. When comparing two different conditions a Fold Change and a p-value are calculated. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +-------- + +**Parameters:** + +- **Analysis type:** + - **expression** analysis calculates an expression value for each transcript coming from the mean or the median of all matching probes + - **comparison** analysis calculate an expression value for each transcript in both conditions, then calculates a Fold Change for each transcript and a p-value based on t-Test distribution. +- **Data type:** + - **log2 value:** the data is not converted in log2 + - **raw value:** the data is converted in log2 +- **Normalization:** quantile normalization between the two chips (not necessary if analysis type is "expression") +- **probe selection:** + - **internal exon:** only probes annotated as exons are used to calculate expression value. + - **all exon:** also probes annotated at the boundaries of introns/exons are used to calculate the expression value. (probes in intronexon position usually have lower signal) + - **last exon:** only probes in the last exon are used to calculate expression value. This analysis can be preferred for cDNA generated by oligo-dT RT, since 3' of transcripts are generally better represented. 
+- **summary method:** + - **mean:** fold change for each gene is calculated based on the mean value + - **median:** fold change for each gene is calculated based on the median value + - **both:** both are used +- **Fold change cutoff:** only transcripts with FC higher than cutoff are kept +- **FDR:** + - **yes:** False Discovery Rate correction is applied (as described in Storie et al. 2002) + - **no:** No correction --> raw p-value +- **p-value cutoff:** only transcripts with p-value less than cutoff are kept +- **raw value cutoff (log2):** only transcripts with raw value higher than cutoff at least in one experiment are kept + +-------- + + +**INPUT FILE** + +Nimblegen gives you back a GFF file with the coordinates of each probe and the signal raw value. + +Click here_ to download a GFF file example. + +.. _here: /static/example_file/Expression_analysis_files.zip + + +Example of Nimblegen Expression GFF format:: + + chr19 Nimblegen tiling_array 100000 1000051 20459 + . probe_name + chr19 Nimblegen tiling_array 100100 1000151 1394 + . probe_name + +.. class:: warningmark + +The sixth column **must** contain the raw signal (**NOT** log2) that Nimblegen gives you back after the experiment + +The annotation table **MUST** be created usign ENO tool, before running TEA. + +--------- + +.. class:: infomark + +**How does it work?** + +For each gene is built the signal distibution of the probes matching Exon. + +-In an expression experiment, the mean or the median of the distribution represents the result of Tea. + +-In a comparison experiment the distibution of the gene exon signal is compared between the two conditions (1 and 2) and a t-test is performed with the possibility to introduce FDR correction. + +.. image:: static/images/CARPET/Tea.png + + + + +**OUTPUT** + +- **expresion** + +.. image:: static/images/CARPET/expression.png + +- **comparison** + +.. image:: static/images/CARPET/comparison.png + + </help> + + +</tool>
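The **Normalization** option in the TEA help above applies quantile normalization between the two chips. As a rough illustration of that technique, here is a generic two-sample version in Python 3. It is a sketch, not TEA's implementation: the function name is hypothetical, both chips are assumed to hold the same probes in the same order, and ties are broken by input order rather than averaged.

```python
def quantile_normalize_pair(chip_a, chip_b):
    """Give two equal-length signal lists the same value distribution.

    The k-th smallest value of each chip is replaced by the mean of the
    two k-th smallest values, so ranks within each chip are preserved.
    """
    if len(chip_a) != len(chip_b):
        raise ValueError("both chips must contain the same number of probes")
    order_a = sorted(range(len(chip_a)), key=lambda i: chip_a[i])
    order_b = sorted(range(len(chip_b)), key=lambda i: chip_b[i])
    norm_a = [0.0] * len(chip_a)
    norm_b = [0.0] * len(chip_b)
    for idx_a, idx_b in zip(order_a, order_b):
        shared = (chip_a[idx_a] + chip_b[idx_b]) / 2.0
        norm_a[idx_a] = shared
        norm_b[idx_b] = shared
    return norm_a, norm_b
```

After normalization the two lists contain the same set of values, which is what makes per-transcript means or medians comparable between conditions.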
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/calcolo_p_v4_norm_intron.pl Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,636 @@ +#! /usr/bin/perl + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; ; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + + +# takes all the probes that match at least partly inside the exons + +use Statistics::PointEstimation; +use Statistics::TTest; +#use Statistics::Test::WilcoxonRankSum; +use Getopt::Long; + +GetOptions ( + "help" => \$OPT{help}, + "ann=s" => \$OPT{ann}, + "wt=s" => \$OPT{wt}, + "treat=s" => \$OPT{treat}, + "out=s" => \$OPT{out}, + "pv=s" => \$OPT{prob}, + "rc=s" => \$OPT{raw_cut}, + "fc=s" => \$OPT{fc_cut}, + "exon=s" => \$OPT{exon}, + "opt=s" => \$OPT{option}, + "norm=s" =>\$OPT{norm_q}, + "sum_met=s" =>\$OPT{sum_met}, + "fdr=s" =>\$OPT{fdr}, + "log=s" =>\$OPT{log_t}, +)|| printusage(); + +# command-line options +my $file_ann=$OPT{ann}; +my $file_wt=$OPT{wt}; +my $file_treat=$OPT{treat}; +my $file_output=$OPT{out}; +my $pv_cut=$OPT{prob}; +my $media_cut=$OPT{raw_cut}; +my $FC_cut=$OPT{fc_cut}; +my $tipo=$OPT{exon}; +my $option=$OPT{option}; +my $normalization_q=$OPT{norm_q}; +my $sum_meth=$OPT{sum_met}; +my $corretction=$OPT{fdr}; +my $data_log=$OPT{log_t}; + +$OPT{help} and printusage(); + + +if($option eq "comp"){ +if ($sum_meth eq "both"){ +$header=<<e0c6654; +\# annotation_file: $file_ann +\# wt_file: $file_wt +\# treat_file: $file_treat +\# p-value cutoff: $pv_cut +\# raw data cutoff cutoff: $media_cut +\# summary method: 
$sum_meth +\# fold change cutoff: $FC_cut +\# type of analysis: $tipo +\# headers: name RefSeq Chr txStart txEnd strand mean_chip1 mean_chip2 FC_mean median_chip1 median_chip2 FC_median num_probes_in_gene p-value FDR(q-value) +e0c6654 + +} +if ($sum_meth eq "mean"){ +$header=<<e0c6654; +\# annotation_file: $file_ann +\# wt_file: $file_wt +\# treat_file: $file_treat +\# p-value cutoff: $pv_cut +\# raw data cutoff: $media_cut +\# summary method: $sum_meth +\# fold change cutoff: $FC_cut +\# type of analysis: $tipo +\# headers: name RefSeq Chr txStart txEnd strand mean_chip1 mean_chip2 FC_mean num_probes_in_gene p-value FDR(q-value) +e0c6654 + +} +if ($sum_meth eq "median"){ +$header=<<e0c6654; +\# annotation_file: $file_ann +\# wt_file: $file_wt +\# treat_file: $file_treat +\# p-value cutoff: $pv_cut +\# raw data cutoff: $media_cut +\# summary method: $sum_meth +\# fold change cutoff: $FC_cut +\# type of analysis: $tipo +\# headers: name RefSeq Chr txStart txEnd strand median_chip1 median_chip2 FC_median num_probes_in_gene p-value FDR(q-value) +e0c6654 + +} + + + +# print usage if the required options are missing +if (!$file_ann || !$file_wt || !$file_treat) { + &printusage() +} + +#if ($pv_cut == 1){$pv_cut=0.9999;} +if (!$media_cut){$media_cut=0;} +if (!$FC_cut){$FC_cut=0;} + +my @r1=(); +my @r2=(); + +#my $confident=(1-$pv_cut)*100; +#supply in order: annotation table, wt values table, treated values table, p-value cutoff, raw signal cutoff + + +open (annotation,"<$file_ann") || die "$file_ann not open:$!\n"; +open (wt,"<$file_wt") || die "$file_wt not open:$!\n"; +open (treated,"<$file_treat") || die "$file_treat not open:$!\n"; +open (output,">$file_output") || die "$file_output not open:$!\n"; + +print output "$header"; + +print "Comparison --> probe_selection=$tipo, Normalization=$normalization_q, summary method=$sum_meth Filters: p-value=$pv_cut, raw value=$media_cut, FC=$FC_cut"; + +while (defined (my $line_down = <annotation>)) { + chomp $line_down; 
+ my @tmp_down = split("\t", $line_down); + my @probe_match=split("\,", $tmp_down[9]); + foreach $probes (@probe_match){ + my @coord=split("-", $probes); + $tab_tot{$tmp_down[0]}{"$coord[0]\t$coord[1]"}.="$tmp_down[1]\n$tmp_down[7]\n$tmp_down[0]\n$tmp_down[2]\n$tmp_down[3]\n$tmp_down[4]\n$tmp_down[5]\n$tmp_down[6]\n"; + push(@ciclo,"$tmp_down[0]\t$coord[0]\t$coord[1]"); + } +} + +while (defined (my $line_down = <wt>)) { + chomp $line_down; + my @tmp_down = split("\t", $line_down); + chomp $tmp_down[0]; + $tab_tot_wt{"$tmp_down[0]\t$tmp_down[3]"}=$tmp_down[5]; +} + + +while (defined (my $line_down = <treated>)) { + chomp $line_down; + my @tmp_down = split("\t", $line_down); + chomp $tmp_down[0]; + $tab_tot_treat{"$tmp_down[0]\t$tmp_down[3]"}=$tmp_down[5]; +} + +if($normalization_q eq "yes"){ + + @sort_wt=sort{$tab_tot_wt{$a} <=> $tab_tot_wt{$b}} keys %tab_tot_wt; + @sort_treat=sort{$tab_tot_treat{$a} <=> $tab_tot_treat{$b}} (keys %tab_tot_treat); + + for($i=0;$i<=$#sort_wt;$i++){ + $media=($tab_tot_wt{$sort_wt[$i]}+$tab_tot_treat{$sort_treat[$i]})/2; + $tab_tot_value{$sort_wt[$i]}.="$media\n"; + } + + for($i=0;$i<=$#sort_treat;$i++){ + $media=($tab_tot_wt{$sort_wt[$i]}+$tab_tot_treat{$sort_treat[$i]})/2; + $tab_tot_value{$sort_treat[$i]}.="$media\n"; + } + +} + + +foreach $key_value(@ciclo){ + + @keys_values=split("\t",$key_value); + @array1= split("\n",$tab_tot{$keys_values[0]}{"$keys_values[1]\t$keys_values[2]"}); + if($normalization_q eq "yes"){ + @array2= split("\n",$tab_tot_value{"$keys_values[1]\t$keys_values[2]"}); + $value1=$array2[0]; + $value2=$array2[1]; + } + else{ + $value1=$tab_tot_wt{"$keys_values[1]\t$keys_values[2]"}; + $value2=$tab_tot_treat{"$keys_values[1]\t$keys_values[2]"}; + } + if ($tipo eq "internal_exon"){ + $prendo = "^exon"; + } + if ($tipo eq "all_exon"){ + $prendo = "exon"; + } + if ($tipo eq "last_exon"){ + $prendo = "exon last"; + } + + if ((!($array1[0] eq "")) && ($array1[1] =~ /$prendo/g)) { + if ($data_log eq "no_log"){ + 
$log1=log($value1)/log(2); + $log2=log($value2)/log(2); + } + if ($data_log eq "log"){ + $log1=$value1; + $log2=$value2; + } + $tab_gene_wt{"$array1[0]\t$array1[2]\t$array1[3]\t$array1[4]\t$array1[5]\t$array1[6]"}.="$log1\n"; + $tab_gene_tratted{"$array1[0]\t$array1[2]\t$array1[3]\t$array1[4]\t$array1[5]\t$array1[6]"}.="$log2\n"; + } + +} + +foreach $key (keys %tab_gene_wt) { + @r1=split("\n",$tab_gene_wt{$key}); + @r2=split("\n",$tab_gene_tratted{$key}); + + @r1 = sort {$a <=> $b} @r1; + @r2 = sort {$a <=> $b} @r2; + + if ($#r1>0){ + my $ttest = new Statistics::TTest; + $ttest->set_significance(95); + $ttest->load_data(\@r1,\@r2); + my $s1=$ttest->{s1}; + my $s2=$ttest->{s2}; + $media1=$s1->{mean}; + $media2=$s2->{mean}; + $total_probes=$#r1+1; + $potenza_a=$media2-$media1; + $FCa=((2)**$potenza_a); + if( (@r1 % 2) == 1 ) { + $median1 = $r1[((@r1+1) / 2)-1]; + $median2 = $r2[((@r2+1) / 2)-1]; + } else { + $median1 = ($r1[(@r1 / 2)-1] + $r1[@r1 / 2]) / 2; + $median2 = ($r2[(@r2 / 2)-1] + $r2[@r2 / 2]) / 2; + } + $potenza_m=$median2-$median1; + $FCm=((2)**$potenza_m); + + + if ($FCa<1){ + $FCa=(-(1/$FCa)); + } + + + if ($FCm<1){ + $FCm=(-(1/$FCm)); + } + if ($sum_meth eq "mean"){ + if ((($media1<$media_cut) && ($media2<$media_cut)) || ($FC_cut>=abs($FCa))){ + next; + } + $p_value{"$key\t$media1\t$media2\t$FCa\t$total_probes"}=$ttest->{t_prob}; + + #print output "$key\t$media1\t$media2\t$FCa\t",$ttest->{t_prob},"\t$total_probes\n"; + } + if ($sum_meth eq "median"){ + if ((($media1<$media_cut) && ($media2<$media_cut)) || ($FC_cut>=abs($FCm))){ + next; + } + #print output "$key\t$median1\t$median2\t$FCm\t",$ttest->{t_prob},"\t$total_probes\n"; + $p_value{"$key\t$median1\t$median2\t$FCm\t$total_probes"}=$ttest->{t_prob}; + } + if ($sum_meth eq "both"){ + if (($media1<$media_cut) && ($media2<$media_cut)){ + next; + } + #print output "$key\t$media1\t$media2\t$FCa\t$median1\t$median2\t$FCm\t",$ttest->{t_prob},"\t$total_probes\n"; + 
$p_value{"$key\t$media1\t$media2\t$FCa\t$median1\t$median2\t$FCm\t$total_probes"}=$ttest->{t_prob}; + } + } + if ($#r1==0){ + $media1=$r1[0]; + $media2=$r2[0]; + $median1=$r1[0]; + $median2=$r2[0]; $total_probes=1; + $potenza=$media2-$media1; + $FCa=((2)**$potenza); + $FCm=((2)**$potenza); + if ($FCa<1){ + $FCa=(-(1/$FCa)); + } + if ($FCm<1){ + $FCm=(-(1/$FCm)); + } + + if ($sum_meth eq "both"){ + if (($media1<$media_cut) && ($media2<$media_cut)){ + next; + } + $p_value{"$key\t$media1\t$media2\t$FCa\t$median1\t$median2\t$FCm\t$total_probes"}=1; + } + if ($sum_meth eq "mean"){ + if ((($media1<$media_cut) && ($media2<$media_cut)) || ($FC_cut>=abs($FCa))){ + next; + } + $p_value{"$key\t$media1\t$media2\t$FCa\t$total_probes"}=1; + } + if ($sum_meth eq "median"){ + if ((($media1<$media_cut) && ($media2<$media_cut)) || ($FC_cut>=abs($FCm))){ + next; + } + $p_value{"$key\t$median1\t$median2\t$FCm\t$total_probes"}=1; + } + } +} + +@sort_pvalue=sort{$p_value{$a} <=> $p_value{$b}} keys %p_value; + +if($corretction eq "yes"){ + for($i=0;$i<=$#sort_pvalue;$i++){ + $qvalue=$p_value{$sort_pvalue[$i]}*(($#sort_pvalue+1)/($i+1)); + push(@QVALUE, $qvalue); + } + + @qvalue_sort = sort {$a <=> $b} @QVALUE; + + for($i=0;$i<=$#sort_pvalue;$i++){ + #$FDR=((($i+1)*0.05)/($#sort_pvalue+1)); + $qvalue=shift(@qvalue_sort); + if($qvalue>1){$qvalue=1;} + if($pv_cut>=$qvalue){ + print output "$sort_pvalue[$i]\t$p_value{$sort_pvalue[$i]}\t$qvalue\n"; + } + } +} +if($corretction eq "no"){ + for($i=0;$i<=$#sort_pvalue;$i++){ + if($pv_cut>=$p_value{$sort_pvalue[$i]}){ + print output "$sort_pvalue[$i]\t$p_value{$sort_pvalue[$i]}\n"; + } + } +} + + + + + +} +if($option eq "expr"){ +if ($sum_meth eq "median"){ +$header=<<e0c6654; +\# annotation_file: $file_ann +\# expression_file: $file_wt +\# type of analysis: $tipo +\# summary method: $sum_meth +\# headers: name RefSeq Chr txStart txEnd strand median_chip num_probes_in_gene +e0c6654 + +} +if ($sum_meth eq "mean"){ +$header=<<e0c6654; +\# annotation_file: $file_ann +\# 
expression_file: $file_wt +\# type of analysis: $tipo +\# summary method: $sum_meth +\# headers: name RefSeq Chr txStart txEnd strand mean_chip num_probes_in_gene +e0c6654 + +} +if ($sum_meth eq "both"){ +$header=<<e0c6654; +\# annotation_file: $file_ann +\# expression_file: $file_wt +\# type of analysis: $tipo +\# summary method: $sum_meth +\# headers: name RefSeq Chr txStart txEnd strand mean_chip median_chip num_probes_in_gene +e0c6654 + +} + if (!$file_ann || !$file_wt) { + &printusage() + } +my @r1=(); +my @r2=(); + + +open (annotation,"<$file_ann") || die "file_ann not open:$!\n"; +open (wt,"<$file_wt") || die "$file_wt not open:$!\n"; +open (output,">$file_output") || die "bed_$file_wt not open:$!\n"; + +print output "$header"; + +print "Expression --> probe_selection=$tipo summary method=$sum_meth"; + +while (defined (my $line_down = <annotation>)) { + chomp $line_down; + my @tmp_down = split("\t", $line_down); + my @probe_match=split("\,", $tmp_down[9]); + foreach $probes (@probe_match){ + my @coord=split("-", $probes); + $tab_tot{$tmp_down[0]}{"$coord[0]\t$coord[1]"}.="$tmp_down[1]\n$tmp_down[7]\n$tmp_down[0]\n$tmp_down[2]\n$tmp_down[3]\n$tmp_down[4]\n$tmp_down[5]\n$tmp_down[6]\n"; + push(@ciclo,"$tmp_down[0]\t$coord[0]\t$coord[1]"); + } +} + +while (defined (my $line_down = <wt>)) { + chomp $line_down; + $line_down=~ s/ //g; + my @tmp_down = split("\t", $line_down); + chomp $tmp_down[0]; + $tab_tot_wt{"$tmp_down[0]\t$tmp_down[3]"}=$tmp_down[5]; +} +#@sort_wt=sort{$tab_tot_wt{$a} <=> $tab_tot_wt{$b}} keys %tab_tot_wt; + + +#print "ciclo = $ciclo[0]\t$ciclo[1]\n"; + +foreach $key_value(@ciclo){ + + @keys_values=split("\t",$key_value); + @array1= split("\n",$tab_tot{$keys_values[0]}{"$keys_values[1]\t$keys_values[2]"}); + @array2= split("\n",$tab_tot_wt{"$keys_values[1]\t$keys_values[2]"}); + + if ($tipo eq "internal_exon"){ + $prendo = "^exon"; + } + if ($tipo eq "all_exon"){ + $prendo = "exon"; + } + if ($tipo eq "last_exon"){ + $prendo = "exon last"; + } 
+ + if ((!($array1[0] eq "")) && ($array1[1] =~ /$prendo/g)) { + + if ($data_log eq "no_log"){ + $log1=log($array2[0])/log(2); + } + if ($data_log eq "log"){ + $log1=$array2[0]; + } + + $tab_gene_wt{"$array1[0]\t$array1[2]\t$array1[3]\t$array1[4]\t$array1[5]\t$array1[6]"}.="$log1\n"; + } + if ((!($array1[0] eq "")) && ($array1[1] =~ /intron/g)) { + if ($data_log eq "no_log"){ + $log1=log($array2[0])/log(2); + } + if ($data_log eq "log"){ + $log1=$array2[0]; + } + $tab_intron_wt{"$array1[0]\t$array1[2]\t$array1[3]\t$array1[4]\t$array1[5]\t$array1[6]"}.="$log1\n"; + } + +} + + +foreach $key (keys %tab_gene_wt) { + @r1=split("\n",$tab_gene_wt{$key}); + if (exists $tab_intron_wt{$key}){ + @r2=split("\n",$tab_intron_wt{$key}); + @r2 = sort {$a <=> $b} @r2; + }else{ + @r2=("no"); + } + @r1 = sort {$a <=> $b} @r1; + if ($#r2>0) { + if ($#r1>0) { + my $ttest = new Statistics::TTest; + $ttest->set_significance(95); + $ttest->load_data(\@r1,\@r2); + my $s1=$ttest->{s1}; + my $s2=$ttest->{s2}; + $media1=$s1->{mean}; + $media2=$s2->{mean}; + $total_probes=$#r1+1; + $total_probes2=$#r2+1; + if( (@r1 % 2) == 1 ) { + $median1 = $r1[((@r1+1) / 2)-1]; + } else { + $median1 = ($r1[(@r1 / 2)-1] + $r1[@r1 / 2]) / 2; + } + if( (@r2 % 2) == 1 ) { + $median2 = $r2[((@r2+1) / 2)-1]; + } else { + $median2 = ($r2[(@r2 / 2)-1] + $r2[@r2 / 2]) / 2; + } + if($media1>=$media2){ + $pvalue=$ttest->{t_prob}; + } + if($media1<$media2){ + $pvalue=1; + } + if ($sum_meth eq "mean"){ + print output "$key\t$media1\t$media2\t$pvalue\t$total_probes\t$total_probes2\n"; + #print output "$key\t$media1\t$total_probes\n"; + } + if ($sum_meth eq "median"){ + print output "$key\t$median1\t$median2\t$pvalue\t$total_probes\t$total_probes2\n"; + #print output "$key\t$median1\t$total_probes\n"; + } + if ($sum_meth eq "both"){ + print output "$key\t$media1\t$media2\t$median1\t$median2\t$pvalue\t$total_probes\t$total_probes2\n"; + #print output "$key\t$media1\t$median1\t$total_probes\n"; + } + } + if ($#r1==0) { + @r1=@r2; + 
my $ttest = new Statistics::TTest; + $ttest->set_significance(95); + $ttest->load_data(\@r1,\@r2); + my $s2=$ttest->{s2}; + $media2=$s2->{mean}; + $total_probes=1; + $total_probes2=$#r2+1; + $media1=$r1[0]; + $median1=$r1[0]; + if( (@r2 % 2) == 1 ) { + $median2 = $r2[((@r2+1) / 2)-1]; + } else { + $median2 = ($r2[(@r2 / 2)-1] + $r2[@r2 / 2]) / 2; + } + if ($sum_meth eq "mean"){ + print output "$key\t$media1\t$media2\tNA\t$total_probes\t$total_probes2\n"; + #print output "$key\t$media1\t$total_probes\n"; + } + if ($sum_meth eq "median"){ + print output "$key\t$median1\t$median2\tNA\t$total_probes\t$total_probes2\n"; + #print output "$key\t$median1\t$total_probes\n"; + } + if ($sum_meth eq "both"){ + print output "$key\t$media1\t$media2\t$median1\t$median2\tNA\t$total_probes\t$total_probes2\n"; + #print output "$key\t$media1\t$median1\t$total_probes\n"; + } + } + + + } + + if ($r2[0] eq "no") { + if ($#r1>0) { + @r2=@r1; + my $ttest = new Statistics::TTest; + $ttest->set_significance(95); + $ttest->load_data(\@r1,\@r2); + my $s1=$ttest->{s1}; + $media1=$s1->{mean}; + $total_probes=$#r1+1; + if( (@r1 % 2) == 1 ) { + $median1 = $r1[((@r1+1) / 2)-1]; + } else { + $median1 = ($r1[(@r1 / 2)-1] + $r1[@r1 / 2]) / 2; + } + } + if ($#r1==0){ + $media1=$r1[0]; + $median1=$r1[0]; + $total_probes=$#r1+1; + } + $media2=0; + $median2=0; + if ($sum_meth eq "both"){ + print output "$key\t$media1\t$media2\t$median1\t$median2\tNA\t$total_probes\t0\n"; + #print output "$key\t$media1\t$median1\t$total_probes\n"; + } + if ($sum_meth eq "mean"){ + print output "$key\t$media1\t$media2\tNA\t$total_probes\t0\n"; + #print output "$key\t$media1\t$total_probes\n"; + } + if ($sum_meth eq "median"){ + print output "$key\t$median1\t$median2\tNA\t$total_probes\t0\n"; + #print output "$key\t$median1\t$total_probes\n"; + } + } + if (($#r2 == 0) && ($r2[0] ne "no") && ($#r1==0)){ + $media1=$r1[0]; + $media2=$r2[0]; + $median1=$r1[0]; + $median2=$r2[0]; + if ($sum_meth eq "both"){ + print output 
"$key\t$media1\t$media2\t$median1\t$median2\tNA\t1\t1\n"; + #print output "$key\t$media1\t$median1\t1\n"; + } + if ($sum_meth eq "mean"){ + print output "$key\t$media1\t$media2\tNA\t1\t1\n"; + #print output "$key\t$media1\t1\n"; + } + if ($sum_meth eq "median"){ + print output "$key\t$median1\t$median2\tNA\t1\t1\n"; + #print output "$key\t$median1\t1\n"; + } + } +} + + + + +} + + + + + +#################################################################################################################################### + +sub printusage { + + print<<eoc22334; + + + + *************************************************************** + :: N i m b l e G e n E x p r e s s i o n A n a l y s i s :: + *************************************************************** + + + USAGE SUMMARY + --------------------------------------------------------------------------------- + This program utilizes NimbleGen ratio files in gff format as INPUT FILE and + provides a table of the computed picks in the same gff file format. + + + --ann annotation file + --wt wild type file + --treat treated file + --pv p-value cut-off + --rc raw signal cut-off + --fc fold change cut-toff + + ----------------------------------------------------------------------------------------- + EXAMPLES: + + perl program.pl --in 75340_ratio.gff --perc 0.98 --t s --num 3 --dist 250 --w 500 +OR + perl program.pl --in 75340_ratio.gff --perc 0.98 --t p --log 5 --num 3 --dist 250 --w 500 + + ----------------------------------------------------------------------------------------- + +eoc22334 + exit 0; + +} + + +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/com_uni.cpp Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,352 @@ +/* +* Copyright 2009 Matteo Cesaroni, Lucilla Luzi +* +* This program is free software; ; you can redistribute it and/or modify +* it under the terms of the GNU Lesser General Public License as published by +* the Free Software Foundation; either version 3 of the License, or (at your +* option) any later version. +* +* This program is distributed in the hope that it will be useful, +* but WITHOUT ANY WARRANTY; without even the implied warranty of +* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +* GNU General Public License for more details. + +*/ + +#include <iostream> +#include <fstream> +#include <string> +#include <vector> +#include <algorithm> +#include <sstream> +#include <deque> +#include <map> +#include <ctime> +#include <cstdlib> + +using namespace std; + +inline void Tokenize(const string& str, + vector<string>& tokens, + const string& delimiters = " ") +{ + // Skip delimiters at beginning. + string::size_type lastPos = str.find_first_not_of(delimiters, 0); + // Find first "non-delimiter". + string::size_type pos = str.find_first_of(delimiters, lastPos); + + while (string::npos != pos || string::npos != lastPos) + { + // Found a token, add it to the vector. + tokens.push_back(str.substr(lastPos, pos - lastPos)); + // Skip delimiters. 
Note the "not_of" + lastPos = str.find_first_not_of(delimiters, pos); + // Find next "non-delimiter" + pos = str.find_first_of(delimiters, lastPos); + } +} + + +typedef struct { + int inizioprobe; + int fineprobe; + string score; + string campo1; + string campo2; + string strand; + string index; + int file; +} Probe; + +struct Comparatore2 { + bool operator()(const Probe& s1, const Probe& s2) const { + if (s1.inizioprobe < s2.inizioprobe) { + return true; + } + else if (s1.inizioprobe == s2.inizioprobe) { + if (s1.fineprobe < s2.fineprobe) { + return true; + } + else if (s1.fineprobe == s2.fineprobe){ + return false; // equal elements must not compare as less (strict weak ordering for std::sort) + } + else { + return false; + } + + } + else { + return false; + } + + } +}; + +int main (int argc, char * const argv[]) { + + string concatenate=argv[3]; + int dist_t=atoi(argv[1]); + string choice=argv[2]; + + string name_out= argv[6]; + ofstream resfile; + resfile.open (name_out.c_str()); + + + + if (concatenate=="no" && choice!="union"){ + + //print "flank=$win , type=$type col6=%overlap, concatenate=$concatenate"; + if (choice=="common"){ + cout<<"flank="<<dist_t<<" , type="<<choice<<" , col6=%overlap, concatenate="<<concatenate; + } + if (choice=="unique"){ + cout<<"flank="<<dist_t<<" , type="<<choice<<" , col6=score o p-value"; + } + + string line; + Probe thisprobe; + Probe thisanno; + int overlap=0; + vector<string> arraypro; + map<string, vector<Probe> > seq; + map<string, vector<Probe> > annotation; + map<string, vector<Probe> >::iterator itseq; + + ifstream seque_file(argv[4]); + while (getline(seque_file, line)) { + string s4; + s4.assign(line, 0, 1); + if (line=="" || s4=="#"){ + continue; + } + arraypro.clear(); + Tokenize(line, arraypro, "\t"); + string chr2 = (arraypro[0].c_str()); + thisprobe.inizioprobe=atoi(arraypro[3].c_str()); + thisprobe.fineprobe=atoi(arraypro[4].c_str()); + thisprobe.campo1=(arraypro[1].c_str()); + thisprobe.campo2=(arraypro[2].c_str()); + thisprobe.score=(arraypro[5].c_str()); + 
thisprobe.strand=(arraypro[6].c_str()); + thisprobe.index=(arraypro[8].c_str()); + seq[chr2].push_back(thisprobe); + } + + + ifstream anno_file(argv[5]); + while (getline(anno_file, line)) { + string s4; + s4.assign(line, 0, 1); + if (line=="" || s4=="#"){ + continue; + } + arraypro.clear(); + Tokenize(line, arraypro, "\t"); + string chr3= (arraypro[0].c_str()); + thisanno.inizioprobe=atoi(arraypro[3].c_str()); + thisanno.fineprobe=atoi(arraypro[4].c_str()); + thisanno.campo1=(arraypro[1].c_str()); + thisanno.campo2=(arraypro[2].c_str()); + thisanno.score=(arraypro[5].c_str()); + thisanno.strand=(arraypro[6].c_str()); + thisanno.index=(arraypro[8].c_str()); + annotation[chr3].push_back(thisanno); + } + + + for ( itseq=seq.begin() ; itseq != seq.end(); itseq++ ){ + + vector <Probe> seq_chr = (*itseq).second; + vector <Probe> anno_chr = annotation[(*itseq).first]; + if(anno_chr.size()==0 && choice=="unique"){ + for (int i=0; i<seq_chr.size();i++){ + resfile<<(*itseq).first<<"\t"<<seq_chr[i].campo1<<"\t"<<seq_chr[i].campo2<<"\t"<<seq_chr[i].inizioprobe<<"\t"<<seq_chr[i].fineprobe<<"\t"<<seq_chr[i].score<<"\t"<<seq_chr[i].strand<<"\t.\t"<<seq_chr[i].index<<endl; + } + continue; + } + if(anno_chr.size()==0 && choice=="common"){ + continue; + } + sort (seq_chr.begin(),seq_chr.end(),Comparatore2()); + sort (anno_chr.begin(),anno_chr.end(),Comparatore2()); + + int finefine=0; + + for (int i=0; i<anno_chr.size();i++){ + if(anno_chr[i].fineprobe<=finefine){ + anno_chr[i].fineprobe=finefine; + } + if(anno_chr[i].fineprobe>finefine){ + finefine=anno_chr[i].fineprobe; + } + } + + for (int i=0; i<seq_chr.size();i++){ + int start_array=0; + int fine_array=anno_chr.size(); + int pos=1; + int trovato=0; + + while (pos>0){ + pos=(fine_array-start_array)/2; + int position=start_array+pos; + + if((seq_chr[i].inizioprobe-dist_t)<anno_chr[position].inizioprobe){ + fine_array=position; + } + if((seq_chr[i].inizioprobe-dist_t)>anno_chr[position].inizioprobe){ + start_array=position; + } + 
if((seq_chr[i].inizioprobe-dist_t)<=anno_chr[position].fineprobe && (seq_chr[i].fineprobe+dist_t)>=anno_chr[position].inizioprobe){ + if (choice=="common"){ + if (seq_chr[i].inizioprobe<=anno_chr[position].inizioprobe && seq_chr[i].fineprobe<=anno_chr[position].fineprobe){ + overlap=(seq_chr[i].fineprobe-anno_chr[position].inizioprobe)*100/(seq_chr[i].fineprobe-seq_chr[i].inizioprobe); + } + if (seq_chr[i].inizioprobe<=anno_chr[position].inizioprobe && seq_chr[i].fineprobe>=anno_chr[position].fineprobe){ + overlap=(anno_chr[position].fineprobe-anno_chr[position].inizioprobe)*100/(seq_chr[i].fineprobe-seq_chr[i].inizioprobe); + } + if (seq_chr[i].inizioprobe>=anno_chr[position].inizioprobe && seq_chr[i].fineprobe>=anno_chr[position].fineprobe){ + overlap=(anno_chr[position].fineprobe-seq_chr[i].inizioprobe)*100/(seq_chr[i].fineprobe-seq_chr[i].inizioprobe); + } + if (seq_chr[i].inizioprobe>=anno_chr[position].inizioprobe && seq_chr[i].fineprobe<=anno_chr[position].fineprobe){ + overlap=100; + } + if (overlap<0){ + overlap=-1; + } + resfile<<(*itseq).first<<"\t"<<seq_chr[i].campo1<<"\t"<<seq_chr[i].campo2<<"\t"<<seq_chr[i].inizioprobe<<"\t"<<seq_chr[i].fineprobe<<"\t"<<overlap<<"\t"<<seq_chr[i].strand<<"\t.\t"<<"ValueA:"<<seq_chr[i].score<<"~"<<"ValueB:"<<anno_chr[position].score<<endl; + } + trovato=1; + break; + } + } + if (choice=="unique" && trovato==0){ + resfile<<(*itseq).first<<"\t"<<seq_chr[i].campo1<<"\t"<<seq_chr[i].campo2<<"\t"<<seq_chr[i].inizioprobe<<"\t"<<seq_chr[i].fineprobe<<"\t"<<seq_chr[i].score<<"\t"<<seq_chr[i].strand<<"\t.\t"<<seq_chr[i].index<<endl; + } + } + } + } + + if (concatenate=="yes" || choice == "union"){ + + cout<<"flank="<<dist_t<<" , type="<<choice<<" , col6=#overlaping regions, concatenate="<<concatenate; + + string line; + Probe thisprobe; + Probe thisanno; + vector<string> arraypro; + map<string, vector<Probe> > seq; + map<string, vector<Probe> > annotation; + map<string, vector<Probe> >::iterator itseq; + string 
concatenate=argv[3]; + int dist_t=atoi(argv[1]); + string choice=argv[2]; + + ifstream seque_file(argv[4]); + while (getline(seque_file, line)) { + string s4; + s4.assign(line, 0, 1); + if (line=="" || s4=="#"){ + continue; + } + arraypro.clear(); + Tokenize(line, arraypro, "\t"); + string chr2 = (arraypro[0].c_str()); + thisprobe.inizioprobe=atoi(arraypro[3].c_str()); + thisprobe.fineprobe=atoi(arraypro[4].c_str()); + thisprobe.campo2=(arraypro[2].c_str()); + thisprobe.campo1=(arraypro[1].c_str()); + thisprobe.score=(arraypro[5].c_str()); + thisprobe.index=(arraypro[8].c_str()); + thisprobe.strand=(arraypro[6].c_str()); + thisprobe.file=1; + seq[chr2].push_back(thisprobe); + } + + ifstream anno_file(argv[5]); + while (getline(anno_file, line)) { + string s4; + s4.assign(line, 0, 1); + if (line=="" || s4=="#"){ + continue; + } + arraypro.clear(); + Tokenize(line, arraypro, "\t"); + string chr3= (arraypro[0].c_str()); + thisanno.inizioprobe=atoi(arraypro[3].c_str()); + thisanno.fineprobe=atoi(arraypro[4].c_str()); + thisanno.campo2=(arraypro[2].c_str()); + thisanno.campo1=(arraypro[1].c_str()); + thisanno.score=(arraypro[5].c_str()); + thisanno.index=(arraypro[8].c_str()); + thisanno.strand=(arraypro[6].c_str()); + thisanno.file=2; + seq[chr3].push_back(thisanno); + } + + int inizio; + int fine; + string annot; + int overlap; + int inizio_ann; + int fine_ann; + + for ( itseq=seq.begin() ; itseq != seq.end(); itseq++ ){ + + vector <Probe> seq_chr = (*itseq).second; + sort (seq_chr.begin(),seq_chr.end(),Comparatore2()); + + for (int i=0; i<seq_chr.size();i++){ + inizio = seq_chr[i].inizioprobe; + fine=seq_chr[i].fineprobe; + int file_t=0; + int file_t2=0; + int entrato=1; + int z=1; + if(seq_chr[i].file==1){ + file_t=1; + } + if(seq_chr[i].file==2){ + file_t2=1; + } + if(i==(seq_chr.size()-1)){ + if (choice=="union"){ + 
resfile<<(*itseq).first<<"\tfile_"<<seq_chr[i].file<<"\tunique\t"<<seq_chr[i].inizioprobe<<"\t"<<seq_chr[i].fineprobe<<"\t"<<seq_chr[i].score<<"\t"<<seq_chr[i].strand<<"\t.\t"<<seq_chr[i].index<<endl; + } + } + + //cout<<"x"<<(*itseq).first<<"\t"<<seq_chr[i].inizioprobe<<"\t"<<seq_chr[i].fineprobe<<"\t"<<seq_chr[i].file<<endl; + for (int y=i+1; y<seq_chr.size(); y++){ + if((inizio-dist_t)<=seq_chr[y].fineprobe && (fine+dist_t)>=seq_chr[y].inizioprobe){ + if(seq_chr[y].file==1){ + file_t=1; + } + if(seq_chr[y].file==2){ + file_t2=1; + } + if(seq_chr[y].fineprobe>fine){ + fine=seq_chr[y].fineprobe; + } + entrato=2; + i++; + z++; + } + if(seq_chr[y].inizioprobe>fine || y==seq_chr.size()-1){ + if (choice == "union" && entrato==1){ + resfile<<(*itseq).first<<"\tfile_"<<seq_chr[i].file<<"\tunique\t"<<inizio<<"\t"<<fine<<"\t"<<seq_chr[i].score<<"\t"<<seq_chr[i].strand<<"\t.\t"<<seq_chr[i].index<<endl; + } + if (choice == "union" && entrato==2){ + resfile<<(*itseq).first<<"\tcommon\tcommon\t"<<inizio<<"\t"<<fine<<"\t"<<z<<"\t.\t.\tcommon"<<endl; + } + if (choice == "common" && entrato==2 && file_t == 1 && file_t2 == 1 && concatenate=="yes"){ + resfile<<(*itseq).first<<"\t"<<seq_chr[i].campo1<<"\t"<<seq_chr[i].campo2<<"\t"<<inizio<<"\t"<<fine<<"\t"<<z<<"\t"<<seq_chr[i].strand<<"\t.\t"<<seq_chr[i].index<<endl; + } + break; + } + } + } + } + } +} +
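The flanked overlap test and the percent-overlap value written to column 6 in the "common" branch above can be condensed into a few small functions. The sketch below is an illustration under assumptions, not a drop-in replacement: `Interval`, `by_start_then_end`, `overlaps_with_flank` and `percent_overlap` are hypothetical names, and returning -1 for a zero-length interval is a choice made here (the file above does not handle that case).

```cpp
#include <algorithm>
#include <cassert>

struct Interval {
    int start;
    int end;
};

// Strict weak ordering by (start, end): two equal intervals compare
// false in both directions, as std::sort requires.
bool by_start_then_end(const Interval& a, const Interval& b) {
    if (a.start != b.start) return a.start < b.start;
    return a.end < b.end;
}

// True when `a`, widened by `flank` on both sides, touches `b`.
bool overlaps_with_flank(const Interval& a, const Interval& b, int flank) {
    return (a.start - flank) <= b.end && (a.end + flank) >= b.start;
}

// Percentage of `a` covered by `b`, using integer division.
// Returns -1 when the intervals do not intersect or `a` has no length.
int percent_overlap(const Interval& a, const Interval& b) {
    const int length = a.end - a.start;
    const int shared = std::min(a.end, b.end) - std::max(a.start, b.start);
    if (length <= 0 || shared < 0) return -1;
    // Widen before multiplying so long regions cannot overflow int.
    return static_cast<int>(static_cast<long long>(shared) * 100 / length);
}
```

The four positional cases in the original code (left overhang, containment either way, right overhang) all reduce to the single intersection length `min(end) - max(start)` used here.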
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/common_unique_probe.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,76 @@ +<tool id="common unique" name="Com&amp;Uni" version="1.1.0"> + <description>easy way to compare results</description> + <command>/data/galaxy/tools/CARPET/comuni $window ${match_choice.type} ${match_choice.conc} $input1 $input2 $output</command> + <inputs> + <param format="tabular" name="input1" type="data" label="Principal table"/> + <param format="tabular" name="input2" type="data" label="Secondary table"/> + <param name="window" type="integer" size="7" value="0" label="flank"/> + <conditional name="match_choice"> + <param name="type" type="select" label="Analysis type"> + <option value="common">common</option> + <option value="unique">unique</option> + <option value="union">union</option> + </param> + <when value="common"> + <param name="conc" type="select" label="coordinate common"> + <option value="yes">merge</option> + <option value="no">Principal table</option> + </param> + </when> + <when value="unique"> + <param name="conc" type="select" label="coordinate common"> + <option value="no">Principal table</option> + </param> + </when> + <when value="union"> + <param name="conc" type="select" label="coordinate common"> + <option value="yes">merge</option> + </param> + </when> + </conditional> + </inputs> + <outputs> + <data format="bed" name="output" file="common.dat"/> + </outputs> + + <tests> + <test> + <param name="input" value="1.gff"/> + <output name="output" file="wig-gff2bed.dat"/> + </test> + </tests> + <help> +.. class:: infomark + +**What it does** + +This tool evaluates the co-occurrence of peaks between two GFF files. Common and/or unique peaks can be extracted as shown in the examples below. A flanking region can be added to the coordinates of the original peaks. + +-------- + +**Example:** + +- **Common merge** + +.. 
image:: static/images/CARPET/common_merg.png + +-------- + +- **Common Principal table** + +.. image:: static/images/CARPET/common_princ.png + +-------- + +- **Unique** + +.. image:: static/images/CARPET/unique.png + +-------- + +- **Union** + +.. image:: static/images/CARPET/union.png + + </help> +</tool>
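The difference between the "common" and "unique" analysis types described in this help text can be illustrated with a short sketch. It assumes all peaks lie on one chromosome and uses a plain nested scan for clarity; the names `Peak`, `Partition` and `split_common_unique` are hypothetical and not taken from the tool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Peak {
    int start;
    int end;
};

struct Partition {
    std::vector<Peak> common;  // principal peaks with a partner in the secondary table
    std::vector<Peak> unique;  // principal peaks without any partner
};

// Split the principal table into common and unique peaks.
// A principal peak is "common" when, after widening it by `flank`
// on both sides, it touches at least one secondary peak.
Partition split_common_unique(const std::vector<Peak>& principal,
                              const std::vector<Peak>& secondary,
                              int flank) {
    Partition out;
    for (const Peak& p : principal) {
        bool matched = false;
        for (const Peak& s : secondary) {
            if ((p.start - flank) <= s.end && (p.end + flank) >= s.start) {
                matched = true;
                break;
            }
        }
        (matched ? out.common : out.unique).push_back(p);
    }
    return out;
}
```

With a flank of 0 only physically overlapping peaks count as common; raising the flank lets nearby peaks be treated as co-occurring.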
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/genecentrico.pl Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,101 @@ +#! /usr/bin/perl + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; ; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + +$file_all=$ARGV[0]; +$file_sub=$ARGV[1]; +$down=$ARGV[2]; +$up=$ARGV[6]; +$scelta=$ARGV[3]; +$result=$ARGV[4]; +$output=$ARGV[5]; + + +open(FILESUB, "<$file_sub") or die "Cannot find file $file_sub\n"; +open(FILEALL, "<$file_all") or die "Cannot open file $file_all: $!\n"; +open(FILEOUT, ">$output") or die "Cannot create file $output: $!\n"; + +print "Analysis Type=$scelta, Promoter def:$down/$up, output=$result\n"; + +@sub=<FILESUB>; +@all=<FILEALL>; +foreach $lines_all(@all){ + chomp $lines_all; + #chop $lines_all; + if ($lines_all=~/#/g){next;} + @line_all=split("\t",$lines_all); + $chr=$line_all[2]; + $refseq=$line_all[1]; + $refStart=$line_all[3]; + $refStop=$line_all[4]; + $name=$line_all[0]; + #$exon_count=$line_all[8]; + #@exonStart=split(",",$line_all[9]); + #@exonStop=split(",",$line_all[10]); + $strand=$line_all[5]; + if ($scelta eq "promoter"){ + if ($strand eq "+"){ + $prom_start=$refStart+$down; + $prom_stop=$refStart+$up; + } + if ($strand eq "-"){ + $prom_start=$refStop-$up; + $prom_stop=$refStop-$down; + } + } + if ($scelta eq "all"){ + if ($strand eq "+"){ + $prom_start=$refStart+$down; + $prom_stop=$refStop; + } + if ($strand eq "-"){ + $prom_start=$refStart; + $prom_stop=$refStop-$down; + } + } + + + + @promotore=(); + $max_prom=0; 
+ foreach $lines_sub(@sub){ + chomp $lines_sub; + if ($lines_sub=~/#/g){next;} + @line_sub=split("\t",$lines_sub); + $chr_p=$line_sub[0]; + $peakStart=$line_sub[3]; + $peakStop=$line_sub[4]; + $value=$line_sub[5]; + #print "$peakStart\t$peakStop\t$prom_start\t$prom_stop\n"; + #print "$refStart\t$refStop\t$peakStart\t$peakStop\n"; + if (((($peakStart>=$prom_start) && ($peakStart<=$prom_stop)) || (($peakStart<=$prom_start) && ($peakStop>=$prom_start))) && ("$chr" eq "$chr_p")){ + push(@promotore,$value); + #print "match"; + + } + } + $i=0; + foreach $valore(@promotore){ + $i++; + if($max_prom<$valore){ + $max_prom=$valore; + } + } + if ($result eq "max_value"){ + print FILEOUT "$lines_all\t$max_prom\n"; + } + else{ + print FILEOUT "$lines_all\t$i\n"; + } +}
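The strand-dependent window computed at the top of this script (the `promoter` and `all` branches) can be summarized in one function. This is a sketch in C++ rather than the script's Perl, with hypothetical names (`Window`, `target_window`, `offset_start`, `offset_end`); it also assumes that any strand other than `+` is treated as `-`, which the script itself does not do.

```cpp
#include <cassert>

struct Window {
    long start;
    long end;
};

// Region in which peaks are searched for one transcript.
//   offset_start / offset_end: promoter offsets relative to the TSS
//     (for example -2000 and 1000, the defaults of the BEC tool form).
//   whole_gene: false = promoter only, true = promoter plus gene body.
// ASSUMPTION: every strand other than '+' is handled as '-'.
Window target_window(long tx_start, long tx_end, char strand,
                     long offset_start, long offset_end, bool whole_gene) {
    Window w{tx_start, tx_end};
    if (strand == '+') {
        // TSS is tx_start; upstream means smaller coordinates.
        w.start = tx_start + offset_start;
        w.end = whole_gene ? tx_end : tx_start + offset_end;
    } else {
        // TSS is tx_end; upstream means larger coordinates.
        w.start = whole_gene ? tx_start : tx_end - offset_end;
        w.end = tx_end - offset_start;
    }
    return w;
}
```

For a gene at 10000-20000 with offsets -2000 and 1000, the promoter window is 8000-11000 on the plus strand and 19000-22000 on the minus strand.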
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/genecentrico.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,75 @@ +<tool id="BECorrelation" name="BEC" version="1.0.0"> + <description>Binding-Expression-Correlation</description> + <command interpreter="perl">genecentrico.pl $expr $chip_gff ${prom_choice.prom_start} ${prom_choice.type} $result $output ${prom_choice.prom_end}</command> + <inputs> + <param format="tabular" name="expr" type="data" label="expression file"/> + <param format="bed" name="chip_gff" type="data" label="ChIP on chip GFF results"/> + <conditional name="prom_choice"> + <param name="type" type="select" label="Analysis type"> + <option value="promoter">only promoter</option> + <option value="all">all gene</option> + </param> + <when value="promoter"> + <param name="prom_start" type="integer" size="10" value="-2000" label="Promoter start"/> + <param name="prom_end" type="integer" size="10" value="1000" label="Promoter end"/> + </when> + <when value="all"> + <param name="prom_start" type="integer" size="10" value="-2000" label="Promoter start"/> + <param name="prom_end" type="text" size="12" value="NOT-NEEDED" label="Promoter end"/> + </when> + </conditional> + <param name="result" type="select" label="result output"> + <option value="number"># of matches</option> + <option value="max_value">max value</option> + </param> + </inputs> + <outputs> + <data format="tabular" name="output" /> + </outputs> + <help> + .. class:: infomark + +**What it does** + +BEC integrates the results of expression analysis and ChIP-chip analysis. For each transcript the number of peaks matching in the promoter and/or within the gene body is calculated. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. 
_download: /static/example_file/CARPET_userManual.zip + +----- + +**Parameters:** + +- **Analysis type:** + - **promoter:** only the peaks matching in the promoter region (defined by the user) are associated with the transcript + - **all gene:** peaks matching in the promoter and within the gene body are associated with the transcript +- **result output:** + - **# of matches:** the number of matching peaks is reported + - **max value:** the highest score among all matching peaks is reported + +-------- + +.. class:: warningmark + +If a peak matches more than one transcript, it is associated with each of them. + +----- + +**INPUT FILES** + +- Expression file: file created by TEA +- ChIP on chip GFF results: file created by PeakPicker (or any GFF file) + +**OUTPUT FILES** + +.. image:: static/images/CARPET/bec_output.png + +.. class:: infomark + +This results table can be used again as the input expression file to add another ChIP-chip experiment. + + </help> + +</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/gff2bed_v2.pl Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,44 @@ +#! /usr/bin/perl + +# Copyright 2009 Matteo Cesaroni, Lucilla Luzi +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU Lesser General Public License as published by +# the Free Software Foundation; either version 3 of the License, or (at your +# option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + + +$fname=$ARGV[0]; # reads the name of the file passed after the command +$col3=$ARGV[1]; + +qx {sort -k 1,1 -k 4,4n $fname >$fname.sortato}; + +open(FILE, "< $fname.sortato") or die "Cannot find file $fname\n"; + +@array=<FILE>; + +print "track type=wiggle_0 name=\"$col3\" description=\"raw_data ratio\" visibility=full autoscale=off maxHeightPixels=100:50:20 color=200,100,0 altColor=0,100,200 \n"; + +for ($i=0;$i<$#array;$i++){ + @array_new= split("\t",$array[$i]); + @array_new2=split("\t",$array[$i+1]); + $dist=$array_new[4]-$array_new2[3]; + if (($array_new[4]>=$array_new2[3])&&("$array_new[0]"eq"$array_new2[0]")){ + $fine=$array_new2[3]-1; + } + else { + $fine=$array_new[4]; + + } + print "$array_new[0]\t$array_new[3]\t$fine\t$array_new[5]\n"; + + } + @array_fine=split("\t", $array[$#array]); + print "$array_fine[0]\t$array_fine[3]\t$array_fine[4]\t$array_fine[5]\n"; + + close FILE;
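Commentary on the script above: after sorting by chromosome and start, it walks consecutive probe pairs and, when a probe's end reaches the start of the next probe on the same chromosome, trims that end to one base before the next start. The last probe is emitted unchanged. Below is a minimal Python sketch of that trimming step, assuming the rows are already sorted; the function name `trim_overlaps` and the tuple layout are hypothetical and not part of the tool.

```python
def trim_overlaps(rows):
    """rows: list of (chrom, start, end, score), sorted by chrom then start.

    Returns a new list in which every probe that reaches the next probe
    on the same chromosome ends one base before that probe starts.
    """
    trimmed = []
    for current, following in zip(rows, rows[1:]):
        chrom, start, end, score = current
        next_chrom, next_start = following[0], following[1]
        if chrom == next_chrom and end >= next_start:
            end = next_start - 1
        trimmed.append((chrom, start, end, score))
    if rows:
        # The last probe has no successor, so it is kept as is.
        trimmed.append(tuple(rows[-1]))
    return trimmed
```

Trimming keeps the intervals non-overlapping, which is presumably why the script does it before writing the wiggle track; the source does not state the reason.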
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/gff2bed_v2.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,83 @@ +<tool id="gff to bed wiggle" name="Gff2Wig" version="1.1.0"> + <description>easy UCSC visualization of your raw-data</description> + <command interpreter="perl">gff2bed_v2.pl $input $col3 >$output</command> + <inputs> + <param format="gff" name="input" type="data" label="Source file"/> + <param name="col3" size="20" type="text" value="Analysis" label="Analysis name"/> + </inputs> + <outputs> + <data format="bed" name="output" file="wig-gff2bed.dat"/> + </outputs> + + <tests> + <test> + <param name="input" value="1.gff"/> + <output name="output" file="wig-gff2bed.dat"/> + </test> + </tests> + <help> +.. class:: infomark + +**What it does** + +This tool converts data from GFF format to WIGGLE format. This format allows the visualization of raw intensity signals in the UCSC Genome Browser. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +-------- + +.. class:: infomark + +About formats + +**GFF** (General Feature Format) is a format for describing genes and other features associated with DNA, RNA and Protein sequences. GFF lines have nine tab-separated fields: + +1. seqname - Must be a chromosome or scaffold. +2. source - The program that generated this feature. +3. feature - The name of this type of feature. Some examples of standard feature types are "CDS", "start_codon", "stop_codon", and "exon". +4. start - The starting position of the feature in the sequence. The first base is numbered 1. +5. end - The ending position of the feature (inclusive). +6. score - A score or signal. If there is no score value, enter ".". +7. strand - Valid entries include '+', '-', or '.' (for don't know/care). +8. 
frame - If the feature is a coding exon, frame should be a number between 0 and 2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'. +9. group - All lines with the same group are linked together into a single item. + +-------- + + +**Example** + + +Nimblegen returns a GFF file with the coordinates of each probe and the normalized signal value, log2(Cy5/Cy3), in the sixth column. + + +Click here_ to download a GFF file example. + +.. _here: /static/example_file/GFF_file_norm.txt.zip + +The following data in GFF format:: + + chr19 Nimblegen tiling_array 100000 1000051 -1.2 + . probe_name + chr19 Nimblegen tiling_array 100100 1000151 2.9 + . probe_name + +will be converted to WIG as shown below (note that a header line is added to the file):: + + track type=wiggle_0 name="Analysis name" description="raw_data ratio" visibility=full autoscale=off maxHeightPixels=100:50:20 color=200,100,0 altColor=0,100,200 + chr19 1000000 1000050 -1.2 + chr19 1000100 1000150 2.9 + +.. class:: infomark + +"Analysis name" will be shown in the UCSC Genome Browser as the track name and can be defined by the user. + +Visualize chip raw intensity: + +.. image:: static/images/CARPET/ucsc2.jpg + + + + </help> +</tool>
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/norm_rep.xml Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,310 @@ +<tool id="normalization" name="PreProcess for Tiling" version="1.0.0"> + <description>normalizing data</description> + <command interpreter="bash">r_wrapper2.sh $script_file</command> + + <inputs> + <param name="type" type="select" label="Normalization"> + <option value="bwm" selected="true">Bi-weight function</option> + <option value="quantile">Quantile</option> + <option value="none">None</option> + </param> + <param name="sum" type="select" label="Summarization"> + <option value="mean" selected="true">Mean</option> + <option value="median">Median</option> + <option value="none">None</option> + </param> + <repeat name="series" title="Chip"> + <param name="input" type="data" format="tabular" label="Dataset"/> + <param name="header" type="select" label="Headers"> + <option value="T" selected="true">TRUE</option> + <option value="F">FALSE</option> + </param> + <param name="chrom_col" type="data_column" data_ref="input" label="Column for chr value (chr1,etc)"/> + <param name="start_col" type="data_column" data_ref="input" label="Column for start position"/> + + <conditional name="fine_col"> + <param name="si_o_no" type="select" label="End column"> + <option value="si_ce" selected="true">End column present</option> + <option value="no_ce">End column NOT present</option> + </param> + <when value="si_ce"> + <param name="end_col" type="data_column" data_ref="input" label="Column for end position"/> + </when> + <when value="no_ce"> + <param name="end_col" type="text" value="50" size="4" label="average length of the probes"/> + </when> + </conditional> + + <conditional name="data"> + <param name="data_type" type="select" label="Data type"> + <option value="log" selected="true">log2(ratio)</option> + <option value="no_log">one color raw data</option> + <option value="raw">Cy3-Cy5 raw data</option> + </param> + <when value="log"> + 
<param name="value_col" type="data_column" data_ref="input" label="Column for log2(ratio)"/> + <param name="value_col_cy3" type="text" value="NOT-NEEDED" size="12" label="Column for Cy3"/> + <param name="value_col_cy5" type="text" value="NOT-NEEDED" size="12" label="Column for Cy5"/> + </when> + <when value="no_log"> + <param name="value_col" type="data_column" data_ref="input" label="Column for raw data"/> + <param name="value_col_cy3" type="text" value="NOT-NEEDED" size="12" label="Column for Cy3"/> + <param name="value_col_cy5" type="text" value="NOT-NEEDED" size="12" label="Column for Cy5"/> + </when> + <when value="raw"> + <param name="value_col" type="text" value="NOT-NEEDED" size="12" label="Column for log2(ratio)"/> + <param name="value_col_cy3" type="data_column" data_ref="input" label="Column for Cy3"/> + <param name="value_col_cy5" type="data_column" data_ref="input" label="Column for Cy5"/> + </when> + </conditional> + <param name="col" type="select" label="Line Color"> + <option value="1">Black</option> + <option value="2">Red</option> + <option value="3">Green</option> + <option value="4">Blue</option> + <option value="5">Cyan</option> + <option value="6">Magenta</option> + <option value="7">Yellow</option> + <option value="8">Gray</option> + </param> + </repeat> + </inputs> + + <configfiles> + <configfile name="script_file"> + ## Setup R error handling to go to stderr + options( show.error.messages=F, + error = function () { cat( geterrmessage(), file=stderr() ); q( "no", 1, F ) } ) + ## Determine range of all series in the plot + options(scipen=999) + ciccioo=library(Ringo) + pdf( "${out_file1}" ) + xrange = c( NULL, NULL ) + xrange_norm = c( NULL, NULL ) + #for $i, $s in enumerate( $series ) + s${i} = read.table( "${s.input.file_name}",sep="\t",header=$s.header) + #if $i == 0 + firma=matrix(c("GALAXY","CARPET"),length(s${i}[,${s.chrom_col}]),2,byrow=T) + fine=matrix(c(".",".","Cesaroni_et_al."),length(s${i}[,${s.chrom_col}]),3,byrow=T) + + if 
("${s.fine_col.si_o_no}"== "no_ce"){ + coord_gff=cbind(as.character(s${i}[,${s.chrom_col}]),firma,s${i}[,${s.start_col}],as.numeric(s${i}[,${s.start_col}])+${s.fine_col.end_col}) + } + if ("${s.fine_col.si_o_no}"== "si_ce"){ + coord_gff=cbind(as.character(s${i}[,${s.chrom_col}]),firma,s${i}[,${s.start_col}],s${i}[,${s.fine_col.end_col}]) + } + if ("${s.data.data_type}" == "raw") { + totali=log2(as.numeric(s${i}[,${s.data.value_col_cy5}])/as.numeric(s${i}[,${s.data.value_col_cy3}])) + } + if ("${s.data.data_type}" == "log") { + totali=s${i}[,${s.data.value_col}] + } + if ("${s.data.data_type}" == "no_log") { + totali=log2(as.numeric(s${i}[,${s.data.value_col}])) + } + + #elif $i >0 + if ("${s.data.data_type}" == "raw") { + totali=cbind(totali,log2(as.numeric(s${i}[,${s.data.value_col_cy5}])/as.numeric(s${i}[,${s.data.value_col_cy3}]))) + } + if ("${s.data.data_type}" == "log") { + totali=cbind(totali,s${i}[,${s.data.value_col}]) + } + if ("${s.data.data_type}" == "no_log") { + totali=cbind(totali,log2(as.numeric(s${i}[,${s.data.value_col}]))) + } + #end if + #end for + + + + print (paste("number of chips =",$i+1,sep=" "),quote=F) + tukey.biweight = function(x, c = 5, epsilon = 1e-04) { + m = median(x) + s = median(abs(x - m)) + u = (x - m)/(c * s + epsilon) + w = rep(0, length(x)) + ii = abs(u) <= 1 + w[ii] = ((1 - u^2)^2)[ii] + t.bi = sum(w * x)/sum(w) + return(t.bi) + } + totali=as.data.frame(totali) + if ("${type}" == "bwm"){ + totali.tbw = apply(totali, 2, tukey.biweight) + totali_norm = totali - matrix(totali.tbw, nrow = nrow(totali), ncol = ncol(totali), byrow = TRUE) + for (i in 1:length(totali.tbw)){ + print(paste("bi-weight_mean rep",i,"=",format(totali.tbw[i],digits=3),sep=" "),quote=F) + } + } + if ("${type}" == "quantile"){ + if (length(totali) == 1) { + print ("Quantile normalization is not feasible with one sample",quote=F) + q() + } + totali_norm=normalizeBetweenArrays(as.matrix(totali), method="quantile") + } + if ("${type}" == "none"){ + 
totali_norm=totali + } + + for (j in 1:length(as.data.frame(totali_norm))) + xrange_norm=range(totali_norm[,j],xrange_norm) + + for (jj in 1:length(totali)) + xrange=range(totali[,jj],xrange) + + plot( NULL, type="n", xlim=xrange, ylim=c(0,1.2), main="Raw signal distribution", xlab="log2(ratio)",ylab="Density") + ## Plot each series + #for $i, $s in enumerate( $series ) + lines(density(totali[,${i}+1]), col="${s.col}" ) + #if $i == 0 + colori="${s.col}" + #elif $i >0 + colori=rbind(colori,"${s.col}") + #end if + #end for + legend((xrange[1]), 1.2,pch="-", col=as.vector(colori),legend=paste("rep",c(1:(${i}+1)),sep="_")) + + + plot( NULL, type="n", xlim=xrange_norm, ylim=c(0,1.2), main="Normalized signal distribution", xlab="log2(ratio)",ylab="Density") + ## Plot each series + #for $i, $s in enumerate( $series ) + lines(density(totali_norm[,${i}+1]), col="${s.col}" ) + #end for + legend((xrange_norm[1]), 1.2,pch="-", col=as.vector(colori),legend=paste("rep",c(1:(${i}+1)),"norm",sep="_")) + + + + if (${i} > 0){ + corPlot(as.matrix(totali_norm),grouping=paste("rep",c(1:(${i}+1)),"norm",sep="_")) + } + devname = dev.off() + totali_norm=as.data.frame(totali_norm) + if ("${sum}" == "mean"){ + total_sum=apply(totali_norm,1,mean) + } + if ("${sum}" == "median"){ + total_sum=apply(totali_norm,1,median) + } + if ("${sum}" == "none"){ + total_sum=totali_norm + } + total_sum=round(total_sum,digits=3) + total_gff=cbind(coord_gff,total_sum,fine) + cazzolina=sub("CHR","chr",total_gff[,1]) + total_gff[,1]=as.vector(cazzolina) + write.table(total_gff,"${out_file2}",sep="\t",quote=F,col.names=F,row.names=F) + + </configfile> + </configfiles> + + <outputs> + <data format="pdf" name="out_file1" /> + <data format="tabular" name="out_file2" /> + </outputs> + +<help> + .. class:: infomark + +**What it does** + +PPT normalizes single ChIP-chip or multi ChIP-chip experiments. 
+PPT also compares the correlation between replicates, produces several plots to help assess the quality of the experiment, and creates a GFF file suitable for PeakPicker analysis. + +PLEASE, for more detailed information refer to the CARPET user Manual: +click to download_ it. + +.. _download: /static/example_file/CARPET_userManual.zip + +-------- + +**Parameters:** + +- **Normalization:** + - **Bi-weight function:** bi-weight function is used to scale all the chips (Standard Nimblegen normalization). + - **Quantile:** quantile normalization is performed between all the chips. + - **None:** no normalization is performed. + +- **Summarization:** + - **Mean:** the final value of each probe is the mean across all the chips. + - **Median:** the final value of each probe is the median across all the chips. + - **None:** all the values of each probe are returned. +- **Chips:** + - **Dataset:** input data file. + - **Headers:** whether headers are present in the dataset file. + - **Column for chr value:** the column with the probe Chromosome numbers. + - **Column for start position:** the column with the probe start positions. + - **End column:** whether the end position of the probes is present. + - **Column for end position:** the column with the probe end positions. + - **average length of the probes:** the average length of the probes (only for custom chip). + - **Data type:** choose between log2(ratio), raw value (NOT log-transformed) or Cy3-Cy5 raw values, according to the data format. + - **Column for log2(ratio):** the column with probe log2(ratio) values. + - **Column for raw data:** the column with probe raw values (NOT log-transformed). + - **Column for Cy3:** the column with probe Cy3 raw value. + - **Column for Cy5:** the column with probe Cy5 raw value. + - **Line Color:** the line colors for graphs created by the script. + + + +----- + +.. 
class:: warningmark + +This tool requires at least the following fields in each file or dataset: + - Chromosome number in this format: chr1, chr2, etc. + - Start position + - one column with log2(ratio) or two columns with Cy3 and Cy5 raw values + +-------- + + + +**INPUT FILE** + +This tool accepts any tabular file that contains at least the fields described above. + +Click here (pair_file_) to download a Cy3-Cy5 pair file example. + +.. _pair_file: /static/example_file/all_pair.txt.zip + +Click here (raw_value_file_) to download a one-color example. + +.. _raw_value_file: /static/example_file/raw_value.txt.zip + +Click here (GFF_file_) to download a GFF log2(ratio) file example. + +.. _GFF_file: /static/example_file/log2ratio_file.txt.zip + + +--------- + +.. class:: infomark + +**How does it work?** + +For each chip the log2 of Cy5/Cy3 is calculated (if not already present). +All the chips are then normalized, according to the type of normalization selected. + + - **bi-weight** procedure shifts all the probe log2(ratio) values to center the data around zero. This is done by subtracting the bi-weight mean of the log2(ratio) values, computed over all features on the array, from each log2(ratio) value. + - **quantile** procedure normalizes the distributions of the probe log2(ratio) values of each chip with a quantile normalization. + +Moreover, the correlations between chips are calculated and graphs are produced as shown in the following figures. + +.. image:: static/images/CARPET/distribution.png + +.. image:: static/images/CARPET/correlation.png + +The first two graphs are produced using the density function implemented in R. +The last graph is produced using the corPlot function implemented in the Ringo package. +(The last graph is created only if more than one chip is uploaded.) + + +**OUTPUT FILE** + +- If a summarization method is selected or only one chip is uploaded, a GFF file (ready to be used with PeakPicker) is created. 
+- If NO summarization method is selected and more than one file is uploaded, the output looks like the table below: + + .. image:: static/images/CARPET/output_no_sum.png +</help> +</tool>
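Commentary on the `tukey.biweight` function defined in the configfile above: it computes a robust center for one chip by down-weighting values far from the median, and the "bwm" normalization then subtracts that center from every value of the chip. The following is a minimal Python sketch that follows the same steps as the R code (median, median absolute deviation, weights `(1 - u^2)^2` for `|u| <= 1`). The names `tukey_biweight` and `center` are illustrative, and this is not a drop-in replacement for the R script.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey bi-weight mean, with the same defaults as the R code."""
    m = median(values)
    s = median([abs(v - m) for v in values])  # median absolute deviation
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * s + epsilon)
        # Values more than c MADs from the median get zero weight.
        w = (1 - u * u) ** 2 if abs(u) <= 1 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def center(values):
    """Subtract the bi-weight mean so the chip is centered around zero."""
    t = tukey_biweight(values)
    return [v - t for v in values]
```

In the R script the same subtraction is applied column by column, one column per chip.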
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/carpet-src-1/tools/CARPET/r_wrapper2.sh Tue Jun 07 16:50:41 2011 -0400 @@ -0,0 +1,27 @@ +#!/bin/sh + +### Run R on the script named by $1 (passed with -f), forwarding +### the remaining arguments on the command line + +# Function that writes a message to stderr and exits +fail() +{ + echo "$@" >&2 + exit 1 +} + +# Ensure R executable is found +which R > /dev/null || fail "'R' is required by this tool but was not found on path" + +# Extract first argument +infile=$1; shift + +# Ensure the file exists +test -f "$infile" || fail "R input file '$infile' does not exist" + +# Keep a debug copy of the script, then run R on the file named by the first argument +cat "$infile" > /tmp/sticazzi.R + +R --vanilla --slave -f "$infile" "$@" 2> /dev/null + +