# HG changeset patch # User lgueguen # Date 1610017921 0 # Node ID 05c9b1a7f44e35346e3be70f2b61f4555b8b734d # Parent de6d0b7c17afda1ef605530750dd6389a1848fe8 Uploaded new release 1.7.3 diff -r de6d0b7c17af -r 05c9b1a7f44e README.md --- a/README.md Mon Oct 01 05:07:56 2018 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,52 +0,0 @@ --------------------------------------------------------------------------------------- -SARTools-Galaxy: a galaxy wrapper for SARTools (Statistical Analysis of RNA-Seq Tools) --------------------------------------------------------------------------------------- - -[![Build Status](https://travis-ci.org/PF2-pasteur-fr/SARTools-Galaxy.svg?branch=master)](https://travis-ci.org/PF2-pasteur-fr/SARTools-Galaxy) - -Description: ------------- - SARTools is a R package dedicated to the differential analysis of RNA-seq data. - - SARTools provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of a HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis. Note that SARTools does not intend to replace DESeq2 or edgeR: it simply provides an environment to go with them. For more details about the methodology behind DESeq2 or edgeR, the user should read their documentations and papers. - -Requirements: -------------- - These Galaxy tools need: - - R and the following R packages: SARTools, DESeq2, edgeR, genefilter, xtable and knitr. - - Rscript and optparse package - - SARTools can be downloaded on github (https://github.com/PF2-pasteur-fr/SARTools). More information about installation can be found at this url. 
- -Requirements using Conda: -------------------------- -[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/r-sartools/README.html) - -[Conda](http://conda.pydata.org/) is package manager that among many other things can be used to manage Python packages. - - -``` -#To install miniconda2 -#http://conda.pydata.org/miniconda.html -#To install the SARTools R library using conda: -conda install r-sartools -#To set an environment: -conda create -n r-sartools r-sartools` -#To activate the environment: -. activate r-sartools -``` - - -Test: ------ - -`planemo test` using conda: passed - - -References: ------------ - The SARTools package has been developped at PF2 - Institut Pasteur by M.-A. Dillies and H. Varet (hugo.varet@pasteur.fr). - Thanks to cite H. Varet, L. Brillet-Guéguen, J.-Y. Coppee and M.-A. Dillies, SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data, PLoS One, 2016, doi: http://dx.doi.org/10.1371/journal.pone.0157022 when using this tool for any analysis published. - - The Galaxy wrapper and scripts have been developped by Loraine Brillet-Guéguen, Institut Français de Bioinformatique - diff -r de6d0b7c17af -r 05c9b1a7f44e README.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Thu Jan 07 11:12:01 2021 +0000 @@ -0,0 +1,57 @@ +====================================================================================== +SARTools-Galaxy: a galaxy wrapper for SARTools (Statistical Analysis of RNA-Seq Tools) +====================================================================================== + +|Build Status| + +Description: +============ + + +SARTools is a R package dedicated to the differential analysis of RNA-seq data. 
+ +SARTools provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of a HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis. Note that SARTools does not intend to replace DESeq2 or edgeR: it simply provides an environment to go with them. For more details about the methodology behind DESeq2 or edgeR, the user should read their documentations and papers. + +SARTools can be downloaded on github (https://github.com/PF2-pasteur-fr/SARTools). More information about installation can be found at this url. + + +Requirements using Conda: +========================= + +|install with bioconda| + +`Conda`_ is a package manager that among many other things can be used to manage Python packages. + +.. code-block:: bash + + #To install miniconda + #http://conda.pydata.org/miniconda.html + #To install the SARTools R library using conda: + conda install r-sartools + #To set an environment: + conda create -n r-sartools r-sartools` + #To activate the environment: + . activate r-sartools + +Test: +===== + +``planemo test`` using conda: passed + +References: +=========== + + +The SARTools package has been developped at PF2 - Institut Pasteur by M.-A. Dillies and H. Varet (hugo.varet@pasteur.fr). + +Thanks to cite H. Varet, L. Brillet-Guéguen, J.-Y. Coppee and M.-A. Dillies, SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data, PLoS One, 2016, doi: http://dx.doi.org/10.1371/journal.pone.0157022 when using this tool for any analysis published. + +The Galaxy wrapper and scripts have been developped by Loraine Brillet-Guéguen, Institut Français de Bioinformatique + +.. _Conda: http://conda.pydata.org/ + +.. 
|Build Status| image:: https://travis-ci.org/PF2-pasteur-fr/SARTools-Galaxy.svg?branch=master + :target: https://travis-ci.org/PF2-pasteur-fr/SARTools-Galaxy + +.. |install with bioconda| image:: https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat + :target: http://bioconda.github.io/recipes/r-sartools/README.html diff -r de6d0b7c17af -r 05c9b1a7f44e abims_sartools_deseq2.xml --- a/abims_sartools_deseq2.xml Mon Oct 01 05:07:56 2018 -0400 +++ b/abims_sartools_deseq2.xml Thu Jan 07 11:12:01 2021 +0000 @@ -1,114 +1,96 @@ - + - Compare two or more biological conditions in a RNA-Seq framework with DESeq2 - macros.xml + macros.xml - - - - - - - - + +
- - - - - + - - +--> - + ]]> - - - @ARTICLE{Cook77, - author = {R.-D. Cook}, - title = {Detection of Influential Observation in Linear Regression}, - journal = {Technometrics}, - year = {1977}, - month = {February} - } - @ARTICLE{Bourgon10, - author = {R. Bourgon, R. Gentleman, and W. Huber}, - title = {Independent filtering increases detection power for high-throughput experiments}, - journal = {PNAS}, - year = {2010}, - volume = {107}, - number = {21}, - pages = {9546–9551}, - note = {URL: http://www.pnas.org/content/107/21/9546.long} - } - + + + @ARTICLE{Cook77, + author = {R.-D. Cook}, + title = {Detection of Influential Observation in Linear Regression}, + journal = {Technometrics}, + year = {1977}, + month = {February} + } + @ARTICLE{Bourgon10, + author = {R. Bourgon, R. Gentleman, and W. Huber}, + title = {Independent filtering increases detection power for high-throughput experiments}, + journal = {PNAS}, + year = {2010}, + volume = {107}, + number = {21}, + pages = {9546–9551}, + note = {URL: http://www.pnas.org/content/107/21/9546.long} + } +
diff -r de6d0b7c17af -r 05c9b1a7f44e abims_sartools_deseq2_wrapper.py --- a/abims_sartools_deseq2_wrapper.py Mon Oct 01 05:07:56 2018 -0400 +++ b/abims_sartools_deseq2_wrapper.py Thu Jan 07 11:12:01 2021 +0000 @@ -67,7 +67,7 @@ report_html=args.report_html log=args.log #Print the parameters selected - print("Wrapper arguments: %s") %(args) + print("Wrapper arguments: %s" %(args)) #Get the working directory path working_directory = os.getcwd() @@ -104,7 +104,7 @@ if forceCairoGraph: cmd+="--forceCairoGraph %s " % (forceCairoGraph) cmd+="> %s 2>&1" % (log) - print("Rscript command: %s") % (cmd) + print("Rscript command: %s" % (cmd)) os.system(cmd) #Get output files diff -r de6d0b7c17af -r 05c9b1a7f44e abims_sartools_edger.xml --- a/abims_sartools_edger.xml Mon Oct 01 05:07:56 2018 -0400 +++ b/abims_sartools_edger.xml Thu Jan 07 11:12:01 2021 +0000 @@ -1,84 +1,68 @@ - + - Compare two or more biological conditions in a RNA-Seq framework with edgeR - macros.xml + macros.xml - - - - - - - +
- - - - + - @@ -86,24 +70,22 @@ - - +--> - @@ -111,22 +93,20 @@ - - +--> - + ]]> diff -r de6d0b7c17af -r 05c9b1a7f44e abims_sartools_edger_wrapper.py --- a/abims_sartools_edger_wrapper.py Mon Oct 01 05:07:56 2018 -0400 +++ b/abims_sartools_edger_wrapper.py Thu Jan 07 11:12:01 2021 +0000 @@ -63,7 +63,7 @@ report_html=args.report_html log=args.log #Print the parameters selected - print("Wrapper arguments: %s") %(args) + print("Wrapper arguments: %s" %(args)) #Get the working directory path working_directory = os.getcwd() @@ -96,7 +96,7 @@ if forceCairoGraph: cmd+="--forceCairoGraph %s " % (forceCairoGraph) cmd+="> %s 2>&1" % (log) - print("Rscript command: %s") % (cmd) + print("Rscript command: %s" % (cmd)) os.system(cmd) #Get output files diff -r de6d0b7c17af -r 05c9b1a7f44e macros.xml --- a/macros.xml Mon Oct 01 05:07:56 2018 -0400 +++ b/macros.xml Thu Jan 07 11:12:01 2021 +0000 @@ -1,11 +1,10 @@ - 1.6.3 + 1.7.3 - r-sartools - r-optparse + r-sartools @@ -23,101 +22,106 @@ - - --projectName $projectName - --author $author - --targetFile $targetFile - --rawDir $rawDir - --featuresToRemove $featuresToRemove - --varInt $varInt - --condRef $condRef - + +--projectName '$projectName' +--author '$author' +--targetFile '$targetFile' +--rawDir '$rawDir' +--featuresToRemove '$featuresToRemove' +--varInt '$varInt' +--condRef '$condRef' + - - #if $advanced_parameters.batch_condition.condition: - --batch $advanced_parameters.batch_condition.batch - #else: + + #if $advanced_parameters.batch_condition.condition: + --batch '$advanced_parameters.batch_condition.batch' + #else: --batch NULL - #end if - + #end if + - - --figures_html $figures_html - --figures_html_files_path $figures_html.files_path - --tables_html $tables_html - --tables_html_files_path $tables_html.files_path - --rdata $rdata - --report_html $report_html - --log $log - + +--figures_html '$figures_html' +--figures_html_files_path '$figures_html.files_path' +--tables_html '$tables_html' +--tables_html_files_path 
'$tables_html.files_path' +--rdata '$rdata' +--report_html '$report_html' +--log '$log' + - + - \S+ - + \S+ + - \S+ - + \S+ + - \S+ - + \S+ + - \S+ - + \S+ + - \S+ - - + \S+ + + - - - - - - - - - - - + + + + + + + + + + + + + + + - - - + + + + + + + + + + + - - - - - - - - - - - + + + \S+ + + + + + + + - - - \S+ - - + + + - - - - - + - + - + .. class:: infomark **Authors** M.-A. Dillies and H. Varet @@ -132,14 +136,14 @@ | Contact support.abims@sb-roscoff.fr for any questions or concerns about the Galaxy implementation of this tool. --------------------------------------------------- - + - + | SARTools is a R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of a HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis. | Note that SARTools does not intend to replace DESeq2 or edgeR: it simply provides an environment to go with them. For more details about the methodology behind DESeq2 or edgeR, the user should read their documentations and papers. - + - + .. class:: warningmark If the counts and the target files are not supplied in the required formats, the workflow will probably crash and will not be able to run the analysis. @@ -155,12 +159,12 @@ Design/target file: - | The user has to supply a tab delimited file which describes the experiment, i.e. which contains the name of the biological condition associated with each sample. This file is called ”target” as a reference to the target file needed when using the limma package [1]. This file has one row per sample and is composed of at least three columns with headers: + | The user has to supply a tab delimited file which describes the experiment, i.e. 
which contains the name of the biological condition associated with each sample. This file is called ”target” as a reference to the target file needed when using the limma package [1]. This file has one row per sample and is composed of at least three columns with headers: - * column 1 : unique names of the samples (short but informative as they will be displayed on all the figures); - * column 2 : name of the count files; - * column 3 : biological conditions; - * optional columns : further information about the samples (day of library preparation for example). + * column 1 : unique names of the samples (short but informative as they will be displayed on all the figures); + * column 2 : name of the count files; + * column 3 : biological conditions; + * optional columns : further information about the samples (day of library preparation for example). - Example of a target file:: @@ -173,96 +177,96 @@ Zip file containing raw counts files: - | The statistical analysis assumes that reads have already been mapped and that counts per feature (gene or transcript) are available. If counting has been done with HTSeq-count [2, 3], output files are ready to be loaded in R with the dedicated SARTools function. If not, the user must supply, in a zip file, one count file per sample with two tab delimited columns without header: + | The statistical analysis assumes that reads have already been mapped and that counts per feature (gene or transcript) are available. If counting has been done with HTSeq-count [2, 3], output files are ready to be loaded in R with the dedicated SARTools function. If not, the user must supply, in a zip file, one count file per sample with two tab delimited columns without header: - * column 1 : the unique IDs of the features; - * column 2 : the raw counts associated with these features (null or positive integers). - + * column 1 : the unique IDs of the features; + * column 2 : the raw counts associated with these features (null or positive integers). 
+ - - * **projectName:** name of the project; - * **author:** author of the analysis; - * **featuresToRemove:** character vector containing the IDs of the features to remove before running the analysis (default are "alignment not unique", "ambiguous", "no feature", "not aligned", "too low aQual" to remove HTSeq-count specific rows); - * **varInt:** variable of interest, i.e. biological condition, in the target file ("group" by default); - * **condRef:** reference biological condition used to compute fold-changes (no default, must be one of the levels of varInt); - + + * **projectName:** name of the project; + * **author:** author of the analysis; + * **featuresToRemove:** character vector containing the IDs of the features to remove before running the analysis (default are "alignment not unique", "ambiguous", "no feature", "not aligned", "too low aQual" to remove HTSeq-count specific rows); + * **varInt:** variable of interest, i.e. biological condition, in the target file ("group" by default); + * **condRef:** reference biological condition used to compute fold-changes (no default, must be one of the levels of varInt); + - + **Report:** - | Give details about the methodology, the different steps and the results. It displays all the figures produced and the most important results of the differential analysis as the number of up- and down-regulated features. - | The user should read the full HTML report and closely analyze each figure to check that the analysis ran smoothly. + | Give details about the methodology, the different steps and the results. It displays all the figures produced and the most important results of the differential analysis as the number of up- and down-regulated features. + | The user should read the full HTML report and closely analyze each figure to check that the analysis ran smoothly. 
**Tables:** - * **TestVsRef.complete.txt:** contains all the features studied; - * **TestVsRef.down.txt:** contains only significant down-regulated features, i.e. less expressed in Test than in Ref; - * **TestVsRef.up.txt:** contains only significant up-regulated features i.e. more expressed in Test than in Ref. + * **TestVsRef.complete.txt:** contains all the features studied; + * **TestVsRef.down.txt:** contains only significant down-regulated features, i.e. less expressed in Test than in Ref; + * **TestVsRef.up.txt:** contains only significant up-regulated features i.e. more expressed in Test than in Ref. **Figures:** - * **MAplot.png:** MA-plot for each comparison (log ratio of the means vs intensity). - * **PCA.png:** first and second factorial planes of the PCA on the samples based on VST or rlog data; - * **barplotNull.png:** percentage of null counts per sample; - * **barplotTC.png:** total number of reads per sample; - * **cluster.png:** hierachical clustering of the samples (based on VST or rlog data); - * **countsBoxplot.png:** boxplots on raw and normalized counts; - * **densplot.png:** estimation of the density of the counts for each sample; - * **diagSizeFactorsHist.png:** diagnostic of the estimation of the size factors; - * **diagSizeFactorsTC.png:** plot of the size factors vs the total number of reads; - * **dispersionsPlot.png:** graph of the estimations of the dispersions and diagnostic of log-linearity of the dispersions; - * **majSeq.png:** percentage of reads caught by the feature having the highest count in each sample; - * **pairwiseScatter.png:** pairwise scatter plot between each pair of samples and SERE values; - * **rawpHist.png:** histogram of the raw p-values for each comparison; - * **volcanoPlot.png:** vulcano plot for each comparison (− log10 (adjusted P value) vs log ratio of the means). + * **MAplot.png:** MA-plot for each comparison (log ratio of the means vs intensity). 
+ * **PCA.png:** first and second factorial planes of the PCA on the samples based on VST or rlog data; + * **barplotNull.png:** percentage of null counts per sample; + * **barplotTC.png:** total number of reads per sample; + * **cluster.png:** hierachical clustering of the samples (based on VST or rlog data); + * **countsBoxplot.png:** boxplots on raw and normalized counts; + * **densplot.png:** estimation of the density of the counts for each sample; + * **diagSizeFactorsHist.png:** diagnostic of the estimation of the size factors; + * **diagSizeFactorsTC.png:** plot of the size factors vs the total number of reads; + * **dispersionsPlot.png:** graph of the estimations of the dispersions and diagnostic of log-linearity of the dispersions; + * **majSeq.png:** percentage of reads caught by the feature having the highest count in each sample; + * **pairwiseScatter.png:** pairwise scatter plot between each pair of samples and SERE values; + * **rawpHist.png:** histogram of the raw p-values for each comparison; + * **volcanoPlot.png:** vulcano plot for each comparison (− log10 (adjusted P value) vs log ratio of the means). **R log file:** - | Give the R console outputs. + | Give the R console outputs. **R objects (.RData file):** - | Give all the R objects created during the analysis is saved: it may be used to perform downstream analyses. - + | Give all the R objects created during the analysis is saved: it may be used to perform downstream analyses. + - - 10.1371/journal.pone.0157022 - @INBOOK{Smyth05, - author = {G.-K. Smyth}, - editor = {R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, and W. Huber}, - chapter = {Limma: linear models for microarray data}, - title = {Bioinformatics and Computational Biology Solutions Using R and Bioconductor}, - publisher = {Springer}, - year = {2005}, - pages = {397–420} - } - 10.1093/bioinformatics/btu638 - @ARTICLE{Benjamini95, - author = {Y. Benjamini and Y. 
Hochberg}, - title = {Controlling the false discovery rate: a practical and powerful approach to multiple testing}, - journal = {Journal of the Royal Statistical Society B}, - year = {1995}, - volume = {57}, - pages = {289–300} - } - @ARTICLE{Benjamini01, - author = {Y. Benjamini and D. Yekutieli}, - title = {The control of the false discovery rate in multiple testing under dependency}, - journal = {Ann. Statist.}, - year = {2001}, - volume = {29}, - number = {4}, - pages = {1165–1188} - } - + + 10.1371/journal.pone.0157022 + @INBOOK{Smyth05, + author = {G.-K. Smyth}, + editor = {R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, and W. Huber}, + chapter = {Limma: linear models for microarray data}, + title = {Bioinformatics and Computational Biology Solutions Using R and Bioconductor}, + publisher = {Springer}, + year = {2005}, + pages = {397–420} + } + 10.1093/bioinformatics/btu638 + @ARTICLE{Benjamini95, + author = {Y. Benjamini and Y. Hochberg}, + title = {Controlling the false discovery rate: a practical and powerful approach to multiple testing}, + journal = {Journal of the Royal Statistical Society B}, + year = {1995}, + volume = {57}, + pages = {289–300} + } + @ARTICLE{Benjamini01, + author = {Y. Benjamini and D. Yekutieli}, + title = {The control of the false discovery rate in multiple testing under dependency}, + journal = {Ann. 
Statist.}, + year = {2001}, + volume = {29}, + number = {4}, + pages = {1165–1188} + } + diff -r de6d0b7c17af -r 05c9b1a7f44e make_html.py --- a/make_html.py Mon Oct 01 05:07:56 2018 -0400 +++ b/make_html.py Thu Jan 07 11:12:01 2021 +0000 @@ -68,6 +68,6 @@ html+=fin_html -htmlf = file(output_html,'w') +htmlf = open(output_html,'w') htmlf.write(html) htmlf.close() diff -r de6d0b7c17af -r 05c9b1a7f44e pre_sartools.py --- a/pre_sartools.py Mon Oct 01 05:07:56 2018 -0400 +++ b/pre_sartools.py Thu Jan 07 11:12:01 2021 +0000 @@ -51,8 +51,8 @@ filename_base = basename(filename) # For RSEM files we process files as HTSeq count output tmpdir = tempfile.mkdtemp() - with open(filename, 'rb') as csvfile: - with open(join(tmpdir, basename(filename)), 'wb') as out: + with open(filename, 'rt') as csvfile: + with open(join(tmpdir, basename(filename)), 'wt') as out: spamwriter = csv.writer(out, delimiter='\t') reader = csv.DictReader(csvfile, delimiter='\t', skipinitialspace=True) if len(reader.fieldnames) > 2: diff -r de6d0b7c17af -r 05c9b1a7f44e pre_sartools.xml --- a/pre_sartools.xml Mon Oct 01 05:07:56 2018 -0400 +++ b/pre_sartools.xml Thu Jan 07 11:12:01 2021 +0000 @@ -1,10 +1,10 @@ - + generate design/target file and archive for SARTools inputs - - pre_sartools.py + + ]]> @@ -101,7 +101,7 @@ - + @@ -115,4 +115,7 @@ * In input of SARTools, don't change the "factor of interest", by default is group but you need to change the "Reference biological condition" in order to correspond to one name of the groups. 
* If you add a blocking factor, you need to specify it in "Advanced Parameters" of SARTools + + 10.1371/journal.pone.0157022 + diff -r de6d0b7c17af -r 05c9b1a7f44e repository_dependencies.xml --- a/repository_dependencies.xml Mon Oct 01 05:07:56 2018 -0400 +++ b/repository_dependencies.xml Thu Jan 07 11:12:01 2021 +0000 @@ -1,4 +1,4 @@ - + - - + + \ No newline at end of file diff -r de6d0b7c17af -r 05c9b1a7f44e template_script_DESeq2_CL.r --- a/template_script_DESeq2_CL.r Mon Oct 01 05:07:56 2018 -0400 +++ b/template_script_DESeq2_CL.r Thu Jan 07 11:12:01 2021 +0000 @@ -184,6 +184,7 @@ save.image(file=paste0(projectName, ".RData")) # generating HTML report +Sys.setenv(HOME = getwd()) writeReport.DESeq2(target=target, counts=counts, out.DESeq2=out.DESeq2, summaryResults=summaryResults, majSequences=majSequences, workDir=workDir, projectName=projectName, author=author, targetFile=targetFile, rawDir=rawDir, featuresToRemove=featuresToRemove, varInt=varInt, diff -r de6d0b7c17af -r 05c9b1a7f44e test-data/test_output_target.html --- a/test-data/test_output_target.html Mon Oct 01 05:07:56 2018 -0400 +++ b/test-data/test_output_target.html Thu Jan 07 11:12:01 2021 +0000 @@ -1,5 +1,5 @@ label files group -group1_rep1 dataset_1.dat group1 -group1_rep2 dataset_2.dat group1 -group2_rep1 dataset_3.dat group2 -group2_rep2 dataset_4.dat group2 +group1_rep1 dataset_[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}.dat group1 +group1_rep2 dataset_[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}.dat group1 +group2_rep1 dataset_[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}.dat group2 +group2_rep2 dataset_[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}.dat group2
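Note on the `test-data/test_output_target.html` hunk above: the expected `files` column changes from fixed names (`dataset_1.dat`) to a pattern for UUID-style names. The sketch below shows how such an expected line could be checked against actual output. It assumes the comparison is a per-line regular-expression match and that the columns are tab-separated; the helper `line_matches` and the sample UUID are hypothetical and are not part of this changeset.

```python
import re

# Expected line copied from the updated test file.
# Assumption: the three columns are separated by tabs.
EXPECTED = (
    "group1_rep1\t"
    "dataset_[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}"
    "-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}.dat\t"
    "group1"
)


def line_matches(expected: str, actual: str) -> bool:
    """Hypothetical helper: treat the expected line as a regex for the whole line."""
    return re.fullmatch(expected, actual) is not None


# Made-up sample lines for illustration only.
new_style = "group1_rep1\tdataset_0f3a9c1e-7b2d-4e5f-8a6b-1c2d3e4f5a6b.dat\tgroup1"
old_style = "group1_rep1\tdataset_1.dat\tgroup1"

print(line_matches(EXPECTED, new_style))
print(line_matches(EXPECTED, old_style))
```

The `.` before `dat` is unescaped in the test file, so under this reading it matches any single character, which is slightly looser than a literal dot.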