comparison table_compute.xml @ 3:60ff16842fcd draft

"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/table_compute commit 5c7c463baf40edda673a569e91b2c2a5e3b6b4f8"
author iuc
date Fri, 18 Oct 2019 06:22:51 -0400
parents 02c3e335a695
children 93a3ce78ce55
2:02c3e335a695 3:60ff16842fcd
1295 </conditional> 1295 </conditional>
1296 </conditional> 1296 </conditional>
1297 </test> 1297 </test>
1298 </tests> 1298 </tests>
1299 <help><![CDATA[ 1299 <help><![CDATA[
1300 This tool computes table expressions on the element, row, and column basis. It can sub-select, 1300 Table Compute
1301 duplicate, as well as perform general and custom expressions on rows, columns or elements. 1301 -------------
1302
1303 This tool is a Galaxy wrapper for the `Pandas Data Analysis Library <https://pandas.pydata.org/>`_ in Python,
1304 for manipulating and computing expressions upon tabular data and matrices. It can perform functions on the
1305 element, row, and column basis, as well as sub-select, duplicate, replace, and evaluate general and custom
1306 expressions on rows, columns, and elements.
1307
1302 1308
1303 .. class:: infomark 1309 .. class:: infomark
1304 1310
1305 Only a single operation can be performed on the data. Multiple operations 1311 Only a single operation can be performed on the data. Multiple operations
1306 can be performed by chaining successive runs of this tool. This is to 1312 can be performed by chaining successive runs of this tool. This is to
1307 provide a more transparent workflow for complex operations. 1313 provide a more transparent workflow for complex operations.
1308 1314
1315
1316 Many of the examples given below relate to common research use-cases such as filtering large matrices for
1317 specific values, counting unique instances of elements, conditionally manipulating the data, and replacing
1318 unwanted values. Full table operations such as normalisation can be easily performed by scaling the data via
1319 mean/median/min/max (and many other) metrics, and general expressions can even be computed across multiple
1320 tables.
1309 1321
1310 1322
1311 Examples 1323 Examples
1312 ======== 1324 ========
1313 1325
1323 g2 3 6 9 1335 g2 3 6 9
1324 g3 4 8 12 1336 g3 4 8 12
1325 g4 81 6 3 1337 g4 81 6 3
1326 === === === === 1338 === === === ===
1327 1339
1328 and we want to duplicate c1 and remove c2. We also want to select g1 to g3 and append g2 at the end. This would result in the output table: 1340 and we want to duplicate c1 and remove c2. We also want to select g1 to g3 and append g2 at the end. This
1341 would result in the output table:
1329 1342
1330 === === === === 1343 === === === ===
1331 . c1 c1 c3 1344 . c1 c1 c3
1332 === === === === 1345 === === === ===
1333 g1 10 10 30 1346 g1 10 10 30
1339 In Galaxy we would select the following: 1352 In Galaxy we would select the following:
1340 1353
1341 * *Input Single or Multiple Tables* → **Single Table** 1354 * *Input Single or Multiple Tables* → **Single Table**
1342 * *Column names on first row?* → **Yes** 1355 * *Column names on first row?* → **Yes**
1343 * *Row names on first column?* → **Yes** 1356 * *Row names on first column?* → **Yes**
1344 * *Type of table operation* → **Drop, keep or duplicate rows and columns** 1357 * *Type of table operation* → **Drop, keep or duplicate rows and columns**
1345 1358
1346 * *List of columns to select* → **1,1,3** 1359 * *List of columns to select* → ``1,1,3``
1347 * *List of rows to select* → **1:3,2** 1360 * *List of rows to select* → ``1:3,2``
1348 * *Keep duplicate columns* → **Yes** 1361 * *Keep duplicate columns* → **Yes**
1349 * *Keep duplicate rows* → **Yes** 1362 * *Keep duplicate rows* → **Yes**
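The selection lists above use 1-based positions, where ``a:b`` is read as an inclusive range (``1:3`` picks g1 to g3) and repeating a position duplicates that row or column. The following is a minimal plain-Python sketch of that behaviour, roughly what positional ``iloc`` indexing does in pandas. It assumes a table stored as a header list plus (row name, values) pairs; ``parse_selection`` and ``select`` are hypothetical helpers, not the tool's own code, and the g1 value in c2 is a placeholder because that cell is not shown above.

```python
def parse_selection(spec):
    """Turn a spec such as "1:3,2" into 0-based indices [0, 1, 2, 1].

    Assumption: "a:b" is an inclusive, 1-based range, as in "1:3" -> g1..g3.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            start, end = (int(x) for x in part.split(":"))
            indices.extend(range(start - 1, end))
        else:
            indices.append(int(part) - 1)
    return indices


def select(header, rows, col_spec, row_spec):
    """Keep (and duplicate) the requested columns and rows, in the given order."""
    cols = parse_selection(col_spec)
    new_header = [header[c] for c in cols]
    new_rows = []
    for r in parse_selection(row_spec):
        name, values = rows[r]
        new_rows.append((name, [values[c] for c in cols]))
    return new_header, new_rows


header = ["c1", "c2", "c3"]
rows = [
    ("g1", [10, 20, 30]),  # c2 = 20 is a placeholder; c2 is dropped anyway
    ("g2", [3, 6, 9]),
    ("g3", [4, 8, 12]),
    ("g4", [81, 6, 3]),
]
new_header, new_rows = select(header, rows, "1,1,3", "1:3,2")
print(new_header)  # ['c1', 'c1', 'c3']
for name, values in new_rows:
    print(name, values)
```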
1350 1363
1351 Example 2: Filter for rows with row sums less than 50 1364 Example 2: Filter for rows with row sums less than 50
1352 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1365 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1374 In Galaxy we would select the following: 1387 In Galaxy we would select the following:
1375 1388
1376 * *Input Single or Multiple Tables* → **Single Table** 1389 * *Input Single or Multiple Tables* → **Single Table**
1377 * *Column names on first row?* → **Yes** 1390 * *Column names on first row?* → **Yes**
1378 * *Row names on first column?* → **Yes** 1391 * *Row names on first column?* → **Yes**
1379 * *Type of table operation* → **Filter rows or columns by their properties** 1392 * *Type of table operation* → **Filter rows or columns by their properties**
1380 1393
1381 * *Filter* → **Rows** 1394 * *Filter* → **Rows**
1382 * *Filter Criterion* → **Result of function applied to columns/rows** 1395 * *Filter Criterion* → **Result of function applied to columns/rows**
1383 1396
1384 * *Keep column/row if its observed* → **Sum** 1397 * *Keep column/row if its observed* → **Sum**
1385 * *is* → **< (Less Than)** 1398 * *is* → **< (Less Than)**
1386 * *this value* → **50** 1399 * *this value* → ``50``
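The filter boils down to computing one summary value per row and keeping the rows for which the comparison holds; in pandas terms this is roughly ``table[table.sum(axis=1) < 50]``. The following is a minimal plain-Python sketch, with ``filter_rows`` as a hypothetical helper and a few illustrative rows as input:

```python
import operator


def filter_rows(rows, func, op, value):
    """Keep the rows for which op(func(values), value) is true."""
    return [(name, values) for name, values in rows if op(func(values), value)]


rows = [
    ("g2", [3, 6, 9]),   # sum 18
    ("g3", [4, 8, 12]),  # sum 24
    ("g4", [81, 6, 3]),  # sum 90
]
kept = filter_rows(rows, sum, operator.lt, 50)
print([name for name, _ in kept])  # ['g2', 'g3']
```

Passing the summary function and the comparison operator separately mirrors the two form fields (*Keep column/row if its observed* and *is*), so other criteria such as a maximum above a threshold only change the arguments.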
1387 1400
1388 1401
1389 Example 3: Count the number of values per row smaller than a specified value 1402 Example 3: Count the number of values per row smaller than a specified value
1390 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1403 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1391 1404
1415 In Galaxy we would select the following: 1428 In Galaxy we would select the following:
1416 1429
1417 * *Input Single or Multiple Tables* → **Single Table** 1430 * *Input Single or Multiple Tables* → **Single Table**
1418 * *Column names on first row?* → **Yes** 1431 * *Column names on first row?* → **Yes**
1419 * *Row names on first column?* → **Yes** 1432 * *Row names on first column?* → **Yes**
1420 * *Type of table operation* → **Manipulate selected table elements** 1433 * *Type of table operation* → **Manipulate selected table elements**
1421 1434
1422 * *Operation to perform* → **Custom** 1435 * *Operation to perform* → **Custom**
1423 1436
1424 * *Custom Expression on 'elem'* → **elem < 10** 1437 * *Custom Expression on 'elem'* → ``elem < 10``
1425 1438
1426 * *Operate on elements* → **All** 1439 * *Operate on elements* → **All**
1427 1440
1428 **Note:** *There are actually simpler ways to achieve our purpose, but here we are demonstrating the use of a custom expression.* 1441 **Note:** *There are actually simpler ways to achieve our purpose, but here we are demonstrating
1442 the use of a custom expression.*
1429 1443
1430 After executing, we would then be presented with a table like so: 1444 After executing, we would then be presented with a table like so:
1431 1445
1432 === ===== ===== ===== 1446 === ===== ===== =====
1433 . c1 c2 c3 1447 . c1 c2 c3
1441 To get to our desired table, we would then process this table with the tool again: 1455 To get to our desired table, we would then process this table with the tool again:
1442 1456
1443 * *Input Single or Multiple Tables* → **Single Table** 1457 * *Input Single or Multiple Tables* → **Single Table**
1444 * *Column names on first row?* → **Yes** 1458 * *Column names on first row?* → **Yes**
1445 * *Row names on first column?* → **Yes** 1459 * *Row names on first column?* → **Yes**
1446 * *Type of table operation* → **Compute Expression across Rows or Columns** 1460 * *Type of table operation* → **Compute Expression across Rows or Columns**
1447 1461
1448 * *Calculate* → **Sum** 1462 * *Calculate* → **Sum**
1449 * *For each* → **Row** 1463 * *For each* → **Row**
1450 1464
1451 Executing this will sum all the 'True' values in each row. Note that the values must have no extra whitespace in them for this to work (e.g. 'True ' or ' True' will not be parsed correctly). 1465 Executing this will sum all the 'True' values in each row. Note that the values must have no
1466 extra whitespace in them for this to work (e.g. 'True ' or ' True' will not be parsed correctly).
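The two runs above can be mimicked in plain Python, which corresponds roughly to ``(table < 10).sum(axis=1)`` in pandas. The first step replaces every element with the boolean result of ``elem < 10``. The second step sums each row, counting ``True`` as 1. The row values are illustrative. ``parse_bool`` is a hypothetical, deliberately strict reader for the intermediate file, included only to show why stray whitespace breaks the second step. It is not how Galaxy or pandas actually parse the file.

```python
def mark_elements(rows, predicate):
    """First run: replace every element with the result of the custom expression."""
    return [(name, [predicate(elem) for elem in values]) for name, values in rows]


def row_sums(rows):
    """Second run: sum each row; True counts as 1 and False as 0."""
    return {name: sum(values) for name, values in rows}


def parse_bool(text):
    """Hypothetical strict parser: only the exact strings 'True' and 'False'."""
    if text == "True":
        return True
    if text == "False":
        return False
    raise ValueError(f"not a boolean: {text!r}")


rows = [("g1", [10, 20, 30]), ("g2", [3, 6, 9]), ("g4", [81, 6, 3])]
marked = mark_elements(rows, lambda elem: elem < 10)
print(row_sums(marked))  # {'g1': 0, 'g2': 3, 'g4': 2}

# Values padded with whitespace are rejected instead of being counted.
for text in ("True", "True ", " True"):
    try:
        print(repr(text), "->", parse_bool(text))
    except ValueError as err:
        print(repr(text), "->", err)
```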
1452 1467
1453 1468
1454 Example 4: Perform a scaled log-transformation conditionally 1469 Example 4: Perform a scaled log-transformation conditionally
1455 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1470 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1456 1471
1457 We want to perform a scaled log transformation on all values greater than 5, and set all other values to 1. 1472 We want to perform a scaled log transformation on all values greater than 5, and set all
1473 other values to 1.
1458 1474
1459 We have the following table: 1475 We have the following table:
1460 1476
1461 === === === === 1477 === === === ===
1462 . c1 c2 c3 1478 . c1 c2 c3
1481 In Galaxy we would select the following: 1497 In Galaxy we would select the following:
1482 1498
1483 * *Input Single or Multiple Tables* → **Single Table** 1499 * *Input Single or Multiple Tables* → **Single Table**
1484 * *Column names on first row?* → **Yes** 1500 * *Column names on first row?* → **Yes**
1485 * *Row names on first column?* → **Yes** 1501 * *Row names on first column?* → **Yes**
1486 * *Type of table operation* → **Manipulate selected table elements** 1502 * *Type of table operation* → **Manipulate selected table elements**
1487 1503
1488 * *Operation to perform* → **Custom** 1504 * *Operation to perform* → **Custom**
1489 1505
1490 * *Custom Expression* → :: 1506 * *Custom Expression* → ``(math.log(elem) / elem) if (elem > 5) else 1``
1491
1492 (math.log(elem) / elem) if (elem > 5) else 1
1493 1507
1494 * *Operate on elements* → **All** 1508 * *Operate on elements* → **All**
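The custom expression is an ordinary Python conditional expression evaluated once per element. The following plain-Python sketch applies it to a couple of illustrative rows. Only the expression itself comes from the example; the function name and data are made up.

```python
import math


def scaled_log(elem):
    """The custom expression from this example: log(x)/x for x > 5, otherwise 1."""
    return (math.log(elem) / elem) if (elem > 5) else 1


rows = [("g1", [10, 20, 30]), ("g2", [3, 6, 9])]
transformed = [(name, [scaled_log(v) for v in values]) for name, values in rows]
for name, values in transformed:
    print(name, [round(v, 4) for v in values])
```

Because ``math.log`` is only evaluated when ``elem > 5``, zero or negative elements never reach the logarithm and simply become 1.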
1495 1509
1496 1510
1497 Example 5: Perform a full table operation 1511 Example 5: Perform a full table operation
1506 g2 3 10 9 1520 g2 3 10 9
1507 g3 4 8 10 1521 g3 4 8 10
1508 g4 81 10 10 1522 g4 81 10 10
1509 === === === === 1523 === === === ===
1510 1524
1511 and we want to subtract from each column the mean of that column divided by its standard deviation, to yield: 1525 and we want to subtract from each column the mean of that column divided by its standard
1526 deviation, to yield:
1512 1527
1513 1528
1514 === ========= ========= ========= 1529 === ========= ========= =========
1515 . c1 c2 c3 1530 . c1 c2 c3
1516 === ========= ========= ========= 1531 === ========= ========= =========
1526 * *Column names on first row?* → **Yes** 1541 * *Column names on first row?* → **Yes**
1527 * *Row names on first column?* → **Yes** 1542 * *Row names on first column?* → **Yes**
1528 * *Type of table operation* → **Perform a Full Table Operation** 1543 * *Type of table operation* → **Perform a Full Table Operation**
1529 1544
1530 * *Operation* → **Custom** 1545 * *Operation* → **Custom**
1531 1546 * *Custom Expression on 'table' along axis (0 or 1)* → ``table - table.mean(0)/table.std(0)``
1532 * *Custom Expression on 'table' along axis (0 or 1)* → ::
1533
1534 table - table.mean(0)/table.std(0)
1535 1547
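Division binds more tightly than subtraction. As a result, ``table - table.mean(0)/table.std(0)`` subtracts the single number ``mean/std`` of each column from every element of that column. It does not compute the z-score ``(table - table.mean(0)) / table.std(0)``. The following plain-Python sketch shows the column-wise arithmetic. It assumes that ``statistics.stdev`` (sample standard deviation, n - 1) matches the pandas default ``std`` with ``ddof=1``. The column values are illustrative.

```python
import statistics


def subtract_mean_over_std(columns):
    """Column-wise version of: table - table.mean(0)/table.std(0).

    Each column is shifted by mean/std; this is not the z-score (x - mean)/std.
    """
    result = {}
    for name, values in columns.items():
        shift = statistics.mean(values) / statistics.stdev(values)  # sample std, n - 1
        result[name] = [v - shift for v in values]
    return result


columns = {"c1": [1.0, 3.0, 5.0], "c2": [2.0, 2.0, 8.0]}
print(subtract_mean_over_std(columns))
```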
1536 1548
1537 Example 6: Perform operations on multiple tables 1549 Example 6: Perform operations on multiple tables
1538 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1550 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1539 1551
1656 * *Column names on first row?* → **Yes** 1668 * *Column names on first row?* → **Yes**
1657 * *Row names on first column?* → **Yes** 1669 * *Row names on first column?* → **Yes**
1658 * *Type of table operation* → **Perform a Full Table Operation** 1670 * *Type of table operation* → **Perform a Full Table Operation**
1659 1671
1660 * *Operation* → **Melt** 1672 * *Operation* → **Melt**
1661 * *Variable IDs* → "A" 1673 * *Variable IDs* → ``A``
1662 * *Unpivoted IDs* → "B,C" 1674 * *Unpivoted IDs* → ``B,C``
1663 1675
1664 This unpivots the "B" and "C" columns into variable/value pairs, keeping "A" as the identifier column. 1676 This unpivots the "B" and "C" columns into variable/value pairs, keeping "A" as the identifier column.
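The following plain-Python sketch shows what melting does. It assumes that *Variable IDs* and *Unpivoted IDs* correspond to the ``id_vars`` and ``value_vars`` arguments of ``pandas.melt``. ``melt`` here is a hypothetical helper working on a list of dicts, and the data are illustrative.

```python
def melt(rows, id_var, value_vars):
    """Unpivot: one output row per (original row, value column) pair.

    Output tuples are (id value, variable name, value), grouped by variable
    first, which is the order pandas.melt produces.
    """
    out = []
    for var in value_vars:
        for row in rows:
            out.append((row[id_var], var, row[var]))
    return out


rows = [
    {"A": "a", "B": 1, "C": 2},
    {"A": "b", "B": 3, "C": 4},
    {"A": "c", "B": 5, "C": 6},
]
for record in melt(rows, "A", ["B", "C"]):
    print(record)
```

With this input, the three "B" entries (a/1, b/3, c/5) come first, followed by the three "C" entries. A table with 3 rows and 2 unpivoted columns therefore becomes 6 long-format rows.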
1665 1677
1666 1678
1667 Example 8: Pivot 1679 Example 8: Pivot
1695 * *Column names on first row?* → **Yes** 1707 * *Column names on first row?* → **Yes**
1696 * *Row names on first column?* → **Yes** 1708 * *Row names on first column?* → **Yes**
1697 * *Type of table operation* → **Perform a Full Table Operation** 1709 * *Type of table operation* → **Perform a Full Table Operation**
1698 1710
1699 * *Operation* → **Pivot** 1711 * *Operation* → **Pivot**
1700 * *Index* → "foo" 1712 * *Index* → ``foo``
1701 * *Column* → "bar" 1713 * *Column* → ``bar``
1702 * *Values* → "baz" 1714 * *Values* → ``baz``
1703 1715
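1704 This reshapes the table so that the values of "foo" become the row labels and the values of "bar" become the column labels, filled with the values from "baz". Header values may contain extra information. 1716 This reshapes the table so that the values of "foo" become the row labels and the values of "bar"
1717 become the column labels, filled with the values from "baz". Header values may contain extra information.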
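The following plain-Python sketch shows pivoting. It assumes that the *Index*, *Column* and *Values* fields map onto the ``index``, ``columns`` and ``values`` arguments of ``pandas.DataFrame.pivot``. The nested dict stands in for the output grid, and the data are illustrative.

```python
def pivot(rows, index, columns, values):
    """Reshape long rows into a grid: index -> row label, columns -> column label.

    Like pandas.DataFrame.pivot, a repeated (index, column) pair is an error
    rather than being aggregated.
    """
    grid = {}
    for row in rows:
        cell = grid.setdefault(row[index], {})
        key = row[columns]
        if key in cell:
            raise ValueError(f"duplicate entry for ({row[index]!r}, {key!r})")
        cell[key] = row[values]
    return grid


rows = [
    {"foo": "one", "bar": "A", "baz": 1},
    {"foo": "one", "bar": "B", "baz": 2},
    {"foo": "two", "bar": "A", "baz": 3},
    {"foo": "two", "bar": "B", "baz": 4},
]
print(pivot(rows, "foo", "bar", "baz"))
# {'one': {'A': 1, 'B': 2}, 'two': {'A': 3, 'B': 4}}
```

If the same "foo"/"bar" combination occurs twice, pivoting cannot decide which "baz" value to keep. In that case, aggregating the duplicates first (for example by summing them) is necessary.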
1705 1718
1706 1719
1707 Example 9: Replacing text in specific rows or columns 1720 Example 9: Replacing text in specific rows or columns
1708 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1721 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1709 1722
1737 1750
1738 * *Type of table operation* → **Manipulate selected table elements** 1751 * *Type of table operation* → **Manipulate selected table elements**
1739 1752
1740 * *Operation to perform* → **Replace values** 1753 * *Operation to perform* → **Replace values**
1741 1754
1742 * *Replacement value* → :: 1755 * *Replacement value* → ``chr{elem:.0f}``
1743 1756
1744 chr{elem:.0f} 1757 Here, the placeholder ``{elem}`` lets us refer to each element's current value,
1745 1758 while the ``:.0f`` part is a format specifier that makes sure numbers are printed
1746 Here, the placeholder ``{elem}`` lets us refer to each element's 1759 without decimals (for a complete description of the available syntax see the
1747 current value, while the ``:.0f`` part is a format specifier that makes
1748 sure numbers are printed without decimals (for a complete description of
1749 the available syntax see the
1750 `Python Format Specification Mini-Language <https://docs.python.org/3/library/string.html#formatspec>`_). 1760 `Python Format Specification Mini-Language <https://docs.python.org/3/library/string.html#formatspec>`_).
1751 1761
1752 * *Operate on elements* → **Specific Rows and/or Columns** 1762 * *Operate on elements* → **Specific Rows and/or Columns**
1753 * *List of columns to select* → "2" 1763 * *List of columns to select* → ``2``
1754 * *List of rows to select* → "2,4" 1764 * *List of rows to select* → ``2,4``
1755 * *Inclusive Selection* → "No" 1765 * *Inclusive Selection* → ``No``
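The replacement value is a Python format string in which ``elem`` is filled in for each selected element. The following sketch shows why ``:.0f`` is useful. It assumes that the tool may have read a numeric column as floating point, as pandas does when a column contains missing values; a bare ``{elem}`` would then print ``3.0`` instead of ``3``. ``replace_value`` is a hypothetical helper.

```python
def replace_value(elem, template="chr{elem:.0f}"):
    """Fill the format template with the element's current value."""
    return template.format(elem=elem)


print(replace_value(3.0))                        # chr3
print(replace_value(3.0, template="chr{elem}"))  # chr3.0
print(replace_value(81))                         # chr81
```

Note that ``.0f`` rounds rather than truncates, so a non-integer value such as ``2.7`` would become ``chr3``.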
1756 1766
1757 1767
1758 If we instead wanted to add "chr" to ALL elements in column 2 and rows 2 and 4, we would repeat the steps above but set the *Inclusive Selection* to "Yes", to give: 1768 If we instead wanted to add "chr" to ALL elements in column 2 and rows 2 and 4, we
1769 would repeat the steps above but set the *Inclusive Selection* to "Yes", to give:
1759 1770
1760 === ===== ===== ===== 1771 === ===== ===== =====
1761 . c1 c2 c3 1772 . c1 c2 c3
1762 === ===== ===== ===== 1773 === ===== ===== =====
1763 g1 10 chr20 30 1774 g1 10 chr20 30
1764 g2 chr3 chr3 chr9 1775 g2 chr3 chr3 chr9
1765 g3 4 8 12 1776 g3 4 8 12
1766 g4 chr81 chr6 chr3 1777 g4 chr81 chr6 chr3
1767 === ===== ===== ===== 1778 === ===== ===== =====
1768 1779
1769
1770
1771 ]]></help> 1780 ]]></help>
1772 <citations></citations> 1781 <citations></citations>
1773 </tool> 1782 </tool>