annotate bin/maf-sort @ 2:f274c166e738 default tip

remove comments in bsfcall_wrapper.xml
author yutaka-saito
date Sun, 19 Apr 2015 23:02:04 +0900
parents 06f8460885ff
children
#! /bin/sh

# Sort MAF-format alignments by sequence name, then strand, then start
# position, then end position, of the top sequence (or of the n-th
# sequence, if option "-n" is specified). Also, merge identical
# alignments. Comment lines starting with "#" are written at the top,
# in unchanged order. If option "-d" is specified, then alignments
# that appear only once are omitted (like uniq -d).

# Minor flaws that do not matter for typical MAF input:
# 1) It might not work if the input includes TABs.
# 2) Preceding whitespace is considered part of the sequence name. I
#    want to use sort -b, but it seems to be broken in different ways for
#    different versions of sort!
# 3) Alignments with differences in whitespace are considered
#    non-identical.

# This script uses perl instead of specialized commands like uniq.
# The reason is that, on some systems (e.g. Mac OS X), uniq doesn't
# work with long lines.

# Make "sort" use a standard ordering:
LC_ALL=C
export LC_ALL

uniqOpt=1
whichSequence=1
while getopts hdn: opt
do
    case $opt in
        h)  cat <<EOF
Usage: $(basename "$0") [options] my-alignments.maf

Options:
  -h  show this help message and exit
  -d  only print duplicate alignments
  -n  sort by the n-th sequence (default: 1)
EOF
            exit
            ;;
        d)  uniqOpt=2
            ;;
        n)  whichSequence="$OPTARG"
            ;;
    esac
done
shift $((OPTIND - 1))

baseField=$((6 * $whichSequence))
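# Worked example of the field arithmetic (a sketch; the alignment is
# made up for illustration). Once each alignment has been joined into
# one line, an alignment such as
#   a score=10
#   s chr1 100 50 + 1000 ACGT
#   s chr2 200 50 - 2000 ACGT
# looks like this, writing <BEL> and <BS> for the \a and \b placeholders:
#   a<BEL>score=10<BS>s chr1 100 50 + 1000 ACGT<BS>s chr2 200 50 - 2000 ACGT<BS>
# Splitting on blanks, sort sees these fields:
#   1: a<BEL>score=10<BS>s   2: chr1   3: 100   4: 50   5: +   6: 1000
#   7: ACGT<BS>s             8: chr2   9: 200  10: 50  11: -  12: 2000
# Each "s" line therefore contributes 6 fields, and for the n-th sequence:
#   6n-4 = name, 6n-3 = start, 6n-2 = aligned size, 6n-1 = strand.
# For equal starts, ordering by size is the same as ordering by end.
# With -n 2, baseField is 12 and the sort keys become
#   -k8,8 -k11,11 -k9,9n -k10,10n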
a=$(($baseField - 4))  # sequence name
a=$a,$a
b=$(($baseField - 1))  # strand
b=$b,$b
c=$(($baseField - 3))  # start position
c=$c,$c
d=$(($baseField - 2))  # aligned size (orders by end position when starts are equal)
d=$d,$d

# 1) Add digits to "#" lines, so that sorting won't change their order.
# 2) Replace spaces with \a placeholders, except in "s" lines.
# 3) Join each alignment into one big line (newlines become \b).
perl -pe '
s/^#/sprintf("#%.9d",$c++)/e;
y/ /\a/ unless /^s/;
y/\n/\b/ if /^\w/;
' "$@" |

sort -k$a -k$b -k${c}n -k${d}n | # sort the lines

# Print only the first (or second) of each run of identical lines:
perl -ne '$c = 0 if $x ne $_; $x = $_; print if ++$c == '$uniqOpt |

# 1) Remove the digits from "#" lines.
# 2) Restore spaces and newlines.
perl -pe '
s/^#.{9}/#/;
y/\a\b/ \n/;
'
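
# Example invocations (a sketch based on the usage message above; the
# file names are hypothetical):
#   maf-sort alignments.maf > sorted.maf            # sort and merge duplicates
#   maf-sort -d alignments.maf > duplicates.maf     # keep only duplicated alignments
#   maf-sort -n 2 alignments.maf > by-second.maf    # sort by the second sequence
# How the duplicate filter behaves on a sorted run of joined lines
# X X Y: by default (uniqOpt=1) the first line of each run is printed,
# giving X Y; with -d (uniqOpt=2) only the second line of a run is
# printed, giving X, so alignments that appear once are dropped.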