annotate utils.py @ 12:8362c6cda4ef draft
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
| author | bgruening | 
|---|---|
| date | Fri, 17 Aug 2018 12:29:29 -0400 | 
| parents | |
| children | 1e02b574f5c0 | 
import sys
import os
import pandas
import re
import pickle
import warnings
import numpy as np
import xgboost
import scipy
import scipy.stats  # SafeEval reads scipy.stats.distributions
import sklearn
import ast
from asteval import Interpreter, make_symbol_table
from scipy.io import mmread  # get_X_y calls mmread for non-tabular input
from sklearn import metrics, model_selection, ensemble, svm, linear_model, naive_bayes, tree, neighbors
from sklearn import feature_selection  # feature_selector looks classes up on this submodule

# Number of parallel jobs, taken from the GALAXY_SLOTS environment variable (default 1).
N_JOBS = int(os.environ.get('GALAXY_SLOTS', 1))

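The `N_JOBS` line above raises `ValueError` if `GALAXY_SLOTS` is set to something that is not an integer. A minimal sketch of a more forgiving parse follows; the helper name `slots_from_env` and the fall-back-to-default policy are assumptions for illustration, not part of the original module.

```python
import os


def slots_from_env(environ=None, default=1):
    """Return a positive worker count read from GALAXY_SLOTS.

    Hypothetical helper: falls back to `default` when the variable is
    missing, not an integer, or not positive.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get('GALAXY_SLOTS', default)
    try:
        n = int(raw)
    except (TypeError, ValueError):
        return default
    return n if n > 0 else default


# Example calls with explicit mappings instead of the real environment.
from_string = slots_from_env({'GALAXY_SLOTS': '4'})
from_missing = slots_from_env({})
from_garbage = slots_from_env({'GALAXY_SLOTS': 'many'})
```

Passing the mapping explicitly keeps the helper easy to exercise without touching the process environment.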
def read_columns(f, c=None, c_option='by_index_number', return_df=False, **args):
    """Read a delimited file and return the selected columns as a NumPy array.

    `c` is a list of 1-based column indices for the index-number options,
    or a comma-separated string of header names for the header-name options.
    Any other `c_option` value keeps all columns.
    """
    data = pandas.read_csv(f, **args)
    if c_option == 'by_index_number':
        cols = [x - 1 for x in c]  # 1-based -> 0-based
        data = data.iloc[:, cols]
    elif c_option == 'all_but_by_index_number':
        cols = [x - 1 for x in c]  # 1-based -> 0-based
        data.drop(data.columns[cols], axis=1, inplace=True)
    elif c_option == 'by_header_name':
        cols = [e.strip() for e in c.split(',')]
        data = data[cols]
    elif c_option == 'all_but_by_header_name':
        cols = [e.strip() for e in c.split(',')]
        data.drop(cols, axis=1, inplace=True)
    y = data.values
    if return_df:
        return y, data
    return y

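The four column-selection modes handled by `read_columns` can be illustrated without pandas. The sketch below is a simplified stand-in that works on a header list and plain row lists; the function name `select_columns` and the sample data are hypothetical, and it is not a replacement for the pandas-based function.

```python
def select_columns(header, rows, c, c_option='by_index_number'):
    """Pick columns from `rows` using the same four modes as read_columns.

    Hypothetical, pandas-free illustration: `header` is a list of column
    names and `rows` is a list of equally long lists.
    """
    if c_option in ('by_index_number', 'all_but_by_index_number'):
        picked = [i - 1 for i in c]  # 1-based indices -> 0-based
    else:
        names = [e.strip() for e in c.split(',')]
        picked = [header.index(name) for name in names]

    if c_option.startswith('all_but'):
        # Invert the selection: keep every column that was not named.
        excluded = set(picked)
        picked = [i for i in range(len(header)) if i not in excluded]

    return [[row[i] for i in picked] for row in rows]


header = ['a', 'b', 'c']
rows = [[1, 2, 3], [4, 5, 6]]

by_index = select_columns(header, rows, [1, 3], 'by_index_number')
all_but_index = select_columns(header, rows, [1], 'all_but_by_index_number')
by_name = select_columns(header, rows, 'c, a', 'by_header_name')
all_but_name = select_columns(header, rows, 'b', 'all_but_by_header_name')
```

Note that the index-based modes take 1-based column numbers, while the name-based modes take one comma-separated string.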
# Generate an instance of one of the sklearn.feature_selection classes.
def feature_selector(inputs):
    selector = inputs["selected_algorithm"]
    selector = getattr(sklearn.feature_selection, selector)
    options = inputs["options"]

    if inputs['selected_algorithm'] == 'SelectFromModel':
        if not options['threshold'] or options['threshold'] == 'None':
            options['threshold'] = None
        if inputs['model_inputter']['input_mode'] == 'prefitted':
            model_file = inputs['model_inputter']['fitted_estimator']
            with open(model_file, 'rb') as model_handler:
                fitted_estimator = pickle.load(model_handler)
            new_selector = selector(fitted_estimator, prefit=True, **options)
        else:
            estimator_json = inputs['model_inputter']["estimator_selector"]
            estimator = get_estimator(estimator_json)
            new_selector = selector(estimator, **options)

    elif inputs['selected_algorithm'] == 'RFE':
        estimator = get_estimator(inputs["estimator_selector"])
        new_selector = selector(estimator, **options)

    elif inputs['selected_algorithm'] == 'RFECV':
        options['scoring'] = get_scoring(options['scoring'])
        options['n_jobs'] = N_JOBS
        options['cv'] = get_cv(options['cv'].strip())
        estimator = get_estimator(inputs["estimator_selector"])
        new_selector = selector(estimator, **options)

    elif inputs['selected_algorithm'] == "VarianceThreshold":
        new_selector = selector(**options)

    else:
        score_func = inputs["score_func"]
        score_func = getattr(sklearn.feature_selection, score_func)
        new_selector = selector(score_func, **options)

    return new_selector

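The `prefitted` branch of `feature_selector` calls `pickle.load` on a user-supplied file, and unpickling can import and call arbitrary globals. One possible mitigation, following the "restricting globals" pattern from the Python `pickle` documentation, is sketched below. The names `AllowListUnpickler`, `restricted_load`, and `ALLOWED_MODULES` are hypothetical, and the allow-list shown is a stand-in so the example runs with the standard library only. A real tool would have to decide which modules to trust, and an allow-list reduces the risk rather than removing it.

```python
import io
import pickle
from collections import OrderedDict
from fractions import Fraction

# Hypothetical allow-list; only globals from these modules may be loaded.
ALLOWED_MODULES = ("collections",)


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses globals from modules outside the allow-list."""

    def find_class(self, module, name):
        for allowed in ALLOWED_MODULES:
            if module == allowed or module.startswith(allowed + "."):
                return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is not allowed" % (module, name))


def restricted_load(handle):
    """Drop-in analogue of pickle.load(handle) with the allow-list applied."""
    return AllowListUnpickler(handle).load()


# An object from an allowed module loads normally.
loaded = restricted_load(io.BytesIO(pickle.dumps(OrderedDict(a=1))))

# An object from any other module is rejected before it is constructed.
try:
    restricted_load(io.BytesIO(pickle.dumps(Fraction(1, 2))))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```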
def get_X_y(params, file1, file2):
    input_type = params["selected_tasks"]["selected_algorithms"]["input_options"]["selected_input"]
    if input_type == "tabular":
        header = 'infer' if params["selected_tasks"]["selected_algorithms"]["input_options"]["header1"] else None
        column_option = params["selected_tasks"]["selected_algorithms"]["input_options"]["column_selector_options_1"]["selected_column_selector_option"]
        if column_option in ["by_index_number", "all_but_by_index_number", "by_header_name", "all_but_by_header_name"]:
            c = params["selected_tasks"]["selected_algorithms"]["input_options"]["column_selector_options_1"]["col1"]
        else:
            c = None
        X = read_columns(
            file1,
            c=c,
            c_option=column_option,
            sep='\t',
            header=header,
            parse_dates=True
        )
    else:
        X = mmread(file1)

    header = 'infer' if params["selected_tasks"]["selected_algorithms"]["input_options"]["header2"] else None
    column_option = params["selected_tasks"]["selected_algorithms"]["input_options"]["column_selector_options_2"]["selected_column_selector_option2"]
    if column_option in ["by_index_number", "all_but_by_index_number", "by_header_name", "all_but_by_header_name"]:
        c = params["selected_tasks"]["selected_algorithms"]["input_options"]["column_selector_options_2"]["col2"]
    else:
        c = None
    y = read_columns(
        file2,
        c=c,
        c_option=column_option,
        sep='\t',
        header=header,
        parse_dates=True
    )
    y = y.ravel()
    return X, y

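`get_X_y` repeats the same nested dictionary lookup for both input files. As a sketch of how that lookup could be factored out, the helper below returns the `header`, `column_option`, and `c` values for file 1 or file 2. The helper name `column_spec`, the example dictionary, and the `all_columns` option value are assumptions made for illustration; only the key names are taken from the function above.

```python
# Option values that carry an explicit column list (as checked in get_X_y).
WITH_COLUMNS = ("by_index_number", "all_but_by_index_number",
                "by_header_name", "all_but_by_header_name")


def column_spec(params, which):
    """Return (header, column_option, c) for input file 1 or 2."""
    opts = params["selected_tasks"]["selected_algorithms"]["input_options"]
    selector = opts["column_selector_options_%d" % which]
    # The option key carries a "2" suffix only for the second file.
    option_key = "selected_column_selector_option" + ("" if which == 1 else "2")
    column_option = selector[option_key]
    c = selector["col%d" % which] if column_option in WITH_COLUMNS else None
    header = 'infer' if opts["header%d" % which] else None
    return header, column_option, c


# Hypothetical parameter dictionary, shaped like the lookups in get_X_y.
example_params = {"selected_tasks": {"selected_algorithms": {"input_options": {
    "selected_input": "tabular",
    "header1": True,
    "column_selector_options_1": {
        "selected_column_selector_option": "all_but_by_index_number",
        "col1": [5],
    },
    "header2": False,
    "column_selector_options_2": {
        "selected_column_selector_option2": "all_columns",  # hypothetical value
    },
}}}}

spec_X = column_spec(example_params, 1)
spec_y = column_spec(example_params, 2)
```

When the option is not one of the four listed values, `c` is `None` and `read_columns` keeps every column.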
class SafeEval(Interpreter):

    def __init__(self, load_scipy=False, load_numpy=False):

        # File opening and other unneeded functions could be dropped
        unwanted = ['open', 'type', 'dir', 'id', 'str', 'repr']

        # Allowed symbol table. Add more if needed.
        new_syms = {
            'np_arange': getattr(np, 'arange'),
            'ensemble_ExtraTreesClassifier': getattr(ensemble, 'ExtraTreesClassifier')
        }

        syms = make_symbol_table(use_numpy=False, **new_syms)

        if load_scipy:
            scipy_distributions = scipy.stats.distributions.__dict__
            for key in scipy_distributions.keys():
                if isinstance(scipy_distributions[key], (scipy.stats.rv_continuous, scipy.stats.rv_discrete)):
                    syms['scipy_stats_' + key] = scipy_distributions[key]

        if load_numpy:
            from_numpy_random = ['beta', 'binomial', 'bytes', 'chisquare', 'choice', 'dirichlet', 'division',
                                 'exponential', 'f', 'gamma', 'geometric', 'gumbel', 'hypergeometric',
 bgruening parents: diff
changeset | 142 'laplace', 'logistic', 'lognormal', 'logseries', 'mtrand', 'multinomial', | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 143 'multivariate_normal', 'negative_binomial', 'noncentral_chisquare', 'noncentral_f', | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 144 'normal', 'pareto', 'permutation', 'poisson', 'power', 'rand', 'randint', | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 145 'randn', 'random', 'random_integers', 'random_sample', 'ranf', 'rayleigh', | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 146 'sample', 'seed', 'set_state', 'shuffle', 'standard_cauchy', 'standard_exponential', | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 147 'standard_gamma', 'standard_normal', 'standard_t', 'triangular', 'uniform', | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 148 'vonmises', 'wald', 'weibull', 'zipf' ] | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 149 for f in from_numpy_random: | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 150 syms['np_random_' + f] = getattr(np.random, f) | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 151 | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 152 for key in unwanted: | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 153 syms.pop(key, None) | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 154 | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 155 super(SafeEval, self).__init__( symtable=syms, use_numpy=False, minimal=False, | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 156 no_if=True, no_for=True, no_while=True, no_try=True, | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 157 no_functiondef=True, no_ifexp=True, no_listcomp=False, | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 158 no_augassign=False, no_assert=True, no_delete=True, | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 159 no_raise=True, no_print=True) | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 160 | 
| 
8362c6cda4ef
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/sklearn commit d00173591e4a783a4c1cb2664e4bb192ab5414f7
 bgruening parents: diff
changeset | 161 | 
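The constructor above exposes library functions to the evaluator under prefixed names (`np_random_*`, `scipy_stats_*`) and then drops unwanted entries. The following is a minimal sketch of that whitelisting idea only, using the standard-library `random` module so it runs without NumPy or asteval. The helper `build_symbol_table` and the chosen names are hypothetical and not part of `utils.py`; note also that the original pops unprefixed names such as `open`, while this sketch pops a prefixed one for illustration.

```python
import random


def build_symbol_table(module, prefix, names, unwanted=()):
    """Expose selected attributes of `module` under prefixed names.

    Hypothetical helper: only names listed in `names` are reachable,
    and anything listed in `unwanted` is removed afterwards.
    """
    table = {prefix + name: getattr(module, name) for name in names}
    for key in unwanted:
        # pop with a default so a missing key is not an error
        table.pop(key, None)
    return table


syms = build_symbol_table(
    random, 'random_', ['gauss', 'uniform', 'seed'],
    unwanted=['random_seed'],
)
print(sorted(syms))
```

Because lookups go through the table rather than the module, an expression can only reach what was listed explicitly.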
def get_search_params(params_builder):
    search_params = {}
    safe_eval = SafeEval(load_scipy=True, load_numpy=True)

    for p in params_builder['param_set']:
        search_p = p['search_param_selector']['search_p']
        if search_p.strip() == '':
            continue
        param_type = p['search_param_selector']['selected_param_type']

        lst = search_p.split(":")
        assert (len(lst) == 2), "Error, make sure there is one and only one colon in search parameter input."
        literal = lst[1].strip()
        ev = safe_eval(literal)
        if param_type == "final_estimator_p":
            search_params["estimator__" + lst[0].strip()] = ev
        else:
            search_params["preprocessing_" + param_type[5:6] + "__" + lst[0].strip()] = ev

    return search_params

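To make the key naming in `get_search_params` concrete, here is a small sketch of the same `name: expression` splitting and prefixing. It is not the original implementation: `ast.literal_eval` stands in for `SafeEval` (so only plain literals are accepted), a `ValueError` replaces the `assert`, and the parameter type value `prep_1_p` is an assumption about what the tool form sends.

```python
import ast


def build_search_params(param_set):
    """Sketch of the key-building logic, with literal_eval as a stand-in evaluator."""
    search_params = {}
    for p in param_set:
        selector = p['search_param_selector']
        search_p = selector['search_p']
        if search_p.strip() == '':
            continue  # empty rows are skipped, as in the original
        parts = search_p.split(':')
        if len(parts) != 2:
            raise ValueError('expected exactly one colon in %r' % search_p)
        name, literal = parts[0].strip(), parts[1].strip()
        value = ast.literal_eval(literal)
        param_type = selector['selected_param_type']
        if param_type == 'final_estimator_p':
            key = 'estimator__' + name
        else:
            # character 5 of e.g. 'prep_1_p' is the preprocessing step number
            key = 'preprocessing_' + param_type[5:6] + '__' + name
        search_params[key] = value
    return search_params


example = build_search_params([
    {'search_param_selector': {'search_p': 'n_estimators: [10, 50, 100]',
                               'selected_param_type': 'final_estimator_p'}},
    {'search_param_selector': {'search_p': 'k: [3, 5]',
                               'selected_param_type': 'prep_1_p'}},
    {'search_param_selector': {'search_p': '   ',
                               'selected_param_type': 'final_estimator_p'}},
])
print(example)
```

The resulting keys follow scikit-learn's `step__parameter` convention for addressing parameters of pipeline steps.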
def get_estimator(estimator_json):
    estimator_module = estimator_json['selected_module']
    estimator_cls = estimator_json['selected_estimator']

    if estimator_module == "xgboost":
        cls = getattr(xgboost, estimator_cls)
    else:
        module = getattr(sklearn, estimator_module)
        cls = getattr(module, estimator_cls)

    estimator = cls()

    estimator_params = estimator_json['text_params'].strip()
    if estimator_params != "":
        try:
            params = safe_eval('dict(' + estimator_params + ')')
        except ValueError:
            sys.exit("Unsupported parameter input: `%s`" %estimator_params)
        estimator.set_params(**params)
    if 'n_jobs' in estimator.get_params():
        estimator.set_params( n_jobs=N_JOBS )

    return estimator

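`get_estimator` turns free-text parameters such as `n_estimators=100, criterion='gini'` into keyword arguments by wrapping them in `dict(...)` and evaluating the result. The sketch below shows one way to get a similar effect with the standard library only. It is an illustration, not the asteval-based approach used above: the helper name `parse_keyword_text` is hypothetical, and it accepts literal values only, which is stricter than `SafeEval`.

```python
import ast


def parse_keyword_text(text):
    """Parse "a=1, b='x'" into a dict, accepting literal values only."""
    call = ast.parse('dict(' + text + ')', mode='eval').body
    if not isinstance(call, ast.Call) or call.args:
        raise ValueError('only keyword arguments are supported')
    params = {}
    for kw in call.keywords:
        if kw.arg is None:
            raise ValueError('** unpacking is not supported')
        # literal_eval raises ValueError for anything that is not a literal,
        # e.g. a function call such as open('f')
        params[kw.arg] = ast.literal_eval(kw.value)
    return params


params = parse_keyword_text("n_estimators=100, criterion='gini', max_depth=None")
print(params)
```

A dict built this way can be passed on as `estimator.set_params(**params)`.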
def get_cv(literal):
    safe_eval = SafeEval()
    if literal == "":
        return None
    if literal.isdigit():
        return int(literal)
    m = re.match(r'^(?P<method>\w+)\((?P<args>.*)\)$', literal)
    if m:
        my_class = getattr( model_selection, m.group('method') )
        args = safe_eval( 'dict('+ m.group('args') + ')' )
        return my_class( **args )
    sys.exit("Unsupported CV input: %s" %literal)

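The three input shapes that `get_cv` distinguishes (empty, integer, and `Splitter(args)`) can be seen in isolation with the same regular expression. This sketch only classifies the literal and returns the parsed pieces; it does not look anything up in `model_selection` or evaluate the arguments. The function name `classify_cv_literal` and the tag strings are hypothetical.

```python
import re

# Same pattern as in get_cv: a word, then everything between the outer parentheses
CV_PATTERN = re.compile(r'^(?P<method>\w+)\((?P<args>.*)\)$')


def classify_cv_literal(literal):
    """Return a (kind, payload) pair describing how the literal would be handled."""
    if literal == "":
        return ("default", None)
    if literal.isdigit():
        return ("folds", int(literal))
    m = CV_PATTERN.match(literal)
    if m:
        return ("splitter", (m.group('method'), m.group('args')))
    return ("unsupported", literal)


print(classify_cv_literal("5"))
print(classify_cv_literal("KFold(n_splits=5, shuffle=True)"))
print(classify_cv_literal("KFold"))
```

A bare class name without parentheses falls through to the unsupported branch, which corresponds to the `sys.exit` call in the original.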
def get_scoring(scoring_json):
    def balanced_accuracy_score(y_true, y_pred):
        C = metrics.confusion_matrix(y_true, y_pred)
        with np.errstate(divide='ignore', invalid='ignore'):
            per_class = np.diag(C) / C.sum(axis=1)
        if np.any(np.isnan(per_class)):
            warnings.warn('y_pred contains classes not in y_true')
            per_class = per_class[~np.isnan(per_class)]
        score = np.mean(per_class)
        return score

    if scoring_json['primary_scoring'] == "default":
        return None

    my_scorers = metrics.SCORERS
    if 'balanced_accuracy' not in my_scorers:
        my_scorers['balanced_accuracy'] = metrics.make_scorer(balanced_accuracy_score)

    if scoring_json['secondary_scoring'] != 'None'\
            and scoring_json['secondary_scoring'] != scoring_json['primary_scoring']:
        scoring = {}
        scoring['primary'] = my_scorers[ scoring_json['primary_scoring'] ]
        for scorer in scoring_json['secondary_scoring'].split(','):
            if scorer != scoring_json['primary_scoring']:
                scoring[scorer] = my_scorers[scorer]
        return scoring

    return my_scorers[ scoring_json['primary_scoring'] ]

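The fallback `balanced_accuracy_score` above divides the diagonal of the confusion matrix by its row sums, which is the recall of each true class, and then averages, ignoring classes that appear only in `y_pred`. The following pure-Python restatement is a sketch for checking small cases by hand; it avoids NumPy and scikit-learn and is not the code the tool runs. The function name `balanced_accuracy` is hypothetical.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes that occur in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # classes seen only in y_pred never enter `totals`, mirroring the NaN filter
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# class 0: 2 of 3 correct, class 1: 1 of 1 correct
score = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])
print(score)
```

Plain accuracy for the same example is 3/4, while the balanced score is 5/6, because the minority class counts as much as the majority class.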
