annotate 2.4/man/man3/String::Approx.3pm @ 13:e3609c8714fb draft

Uploaded
author plus91-technologies-pvt-ltd
date Fri, 30 May 2014 03:37:55 -0400
.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.16)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. \*(C+ will
.\" give a nicer C++. Capital omega is used to do unbreakable dashes and
.\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.ie \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.el \{\
. de IX
..
.\}
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "Approx 3pm"
.TH Approx 3pm "2013-01-22" "perl v5.14.2" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
String::Approx \- Perl extension for approximate matching (fuzzy matching)
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\& use String::Approx \*(Aqamatch\*(Aq;
\&
\& print if amatch("foobar");
\&
\& my @matches = amatch("xyzzy", @inputs);
\&
\& my @catches = amatch("plugh", [\*(Aq2\*(Aq], @inputs);
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
String::Approx lets you match and substitute strings approximately.
With this you can emulate errors: typing errorrs, speling errors,
closely related vocabularies (colour color), genetic mutations (\s-1GAG\s0
\&\s-1ACT\s0), abbreviations (McScot, MacScot).
.PP
\&\s-1NOTE:\s0 String::Approx suits the task of \fBstring matching\fR, not
\&\fBstring comparison\fR, and it works for \fBstrings\fR, not for \fBtext\fR.
.PP
If you want to compare strings for similarity, you probably just want
the Levenshtein edit distance (explained below), as provided by the
Text::Levenshtein and Text::LevenshteinXS modules in \s-1CPAN\s0. See also
Text::WagnerFischer and Text::PhraseDistance. (String::Approx has
functions for this too, e.g. \fIadist()\fR, but their results sometimes
differ from those of plain Levenshtein and the other modules above.)
.PP
If you want to compare things like text or source code, consisting of
\&\fBwords\fR or \fBtokens\fR and \fBphrases\fR and \fBsentences\fR, or
\&\fBexpressions\fR and \fBstatements\fR, you should probably use a tool
other than String::Approx, for example the standard \s-1UNIX\s0 \fIdiff\fR\|(1)
tool or the Algorithm::Diff module from \s-1CPAN\s0.
.PP
The measure of \fBapproximateness\fR is the \fILevenshtein edit distance\fR.
It is the total number of \*(L"edits\*(R": insertions,
.PP
.Vb 1
\& word world
.Ve
.PP
deletions,
.PP
.Vb 1
\& monkey money
.Ve
.PP
and substitutions
.PP
.Vb 1
\& sun fun
.Ve
.PP
required to transform one string into another. For example, to
transform \fI\*(L"lead\*(R"\fR into \fI\*(L"gold\*(R"\fR, you need three edits:
.PP
.Vb 1
\& lead gead goad gold
.Ve
.PP
The edit distance of \*(L"lead\*(R" and \*(L"gold\*(R" is therefore three, or 75%
of the four characters of \*(L"lead\*(R".
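.PP
For illustration only, the edit distance can be computed with the classic
dynamic-programming (Wagner-Fischer) algorithm. The sketch below is plain
Perl and does not use String::Approx. The function name \fIlevenshtein()\fR
is hypothetical, and String::Approx does not necessarily compute its
matches this way internally. The sketch reproduces the four examples above.
.PP
.Vb 40
```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(min);

# Levenshtein edit distance via dynamic programming (Wagner-Fischer):
# the minimum number of single-character insertions, deletions and
# substitutions needed to turn $from into $to. Only the previous row of
# the table is kept, so memory is proportional to length($to).
sub levenshtein {
    my ($from, $to) = @_;
    my @f = split //, $from;
    my @t = split //, $to;
    my @prev = (0 .. scalar @t);    # "" -> each prefix of $to: j insertions
    for my $i (1 .. scalar @f) {
        my @cur = ($i);             # first $i chars of $from -> "": $i deletions
        for my $j (1 .. scalar @t) {
            my $cost = $f[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # delete $f[$i - 1]
                $cur[$j - 1] + 1,         # insert $t[$j - 1]
                $prev[$j - 1] + $cost,    # substitute (free if equal)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# The examples from the text above.
for my $pair (["word", "world"], ["monkey", "money"], ["sun", "fun"], ["lead", "gold"]) {
    my ($from, $to) = @$pair;
    say "$from -> $to: ", levenshtein($from, $to);
}

# 3 edits relative to the 4-character "lead" gives the 75% quoted above.
say 100 * levenshtein("lead", "gold") / length("lead"), "%";
```
.Ve
.PP
The sketch prints 1 for each of the first three pairs, then 3 for
\*(L"lead\*(R" and \*(L"gold\*(R", and finally 75%.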
.PP
\&\fBString::Approx\fR uses the Levenshtein edit distance as its measure, but
String::Approx is not well suited for comparing strings of different
lengths. In other words, if you want a \*(L"fuzzy eq\*(R", see above.
String::Approx is more like regular expressions or \fIindex()\fR: it finds
substrings that are close matches.
.SH "MATCH"
.IX Header "MATCH"
.Vb 1
\& use String::Approx \*(Aqamatch\*(Aq;
\&
\& $matched = amatch("pattern")
\& $matched = amatch("pattern", [ modifiers ])
\&
\& $any_matched = amatch("pattern", @inputs)
\& $any_matched = amatch("pattern", [ modifiers ], @inputs)
\&
\& @match = amatch("pattern")
\& @match = amatch("pattern", [ modifiers ])
\&
\& @matches = amatch("pattern", @inputs)
\& @matches = amatch("pattern", [ modifiers ], @inputs)
.Ve
.PP
Match \fBpattern\fR approximately. In list context, return the matched
\&\fB\f(CB@inputs\fB\fR. If no inputs are given, match against \fB\f(CB$_\fB\fR. In scalar
context, return true if \fIany\fR of the inputs match and false if none match.
.PP
Notice that the pattern is a string, not a regular expression. None
of the regular expression notations (^, ., *, and so on) work. They
are characters just like any others. A note on this note: some limited form
of \fI\*(L"regular expressionism\*(R"\fR is planned for the future, for example
character classes ([abc]) and \fIany-chars\fR (.). But that feature will
be turned on by a special \fImodifier\fR (just a guess: \*(L"r\*(R"), so there
should be no backward compatibility problem.
.PP
Notice also that matching is not symmetric. The inputs are matched
against the pattern, not the other way round. In other words, the
pattern can be a substring, a submatch, of an input element. An input
element is always a superstring of the pattern.
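.PP
The following minimal plain-Perl sketch makes the asymmetry concrete. It
computes the smallest number of edits needed to turn the pattern into
\fIsome substring\fR of an input, using a Sellers-style variant of the
edit-distance table in which a match may start and end anywhere in the
input. It is not String::Approx's own algorithm, and the function name
\fIapprox_substring_distance()\fR is hypothetical.
.PP
.Vb 30
```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(min);

# Fewest edits that turn $pattern into SOME substring of $input.
# Row 0 of the table is all zeros, so the match may start anywhere in
# $input; taking the minimum of the last row lets it end anywhere.
sub approx_substring_distance {
    my ($pattern, $input) = @_;
    my @p = split //, $pattern;
    my @t = split //, $input;
    my @prev = (0) x (@t + 1);
    for my $i (1 .. scalar @p) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $p[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return min(@prev);
}

say approx_substring_distance("foobar", "a fobar b");       # 1: delete one "o"
say approx_substring_distance("foobar", "the foobar baz");  # 0: exact substring
say approx_substring_distance("the foobar baz", "foobar");  # 8: roles swapped
```
.Ve
.PP
Under the documented 10% default, rounded up, the six-character
\*(L"foobar\*(R" allows one edit. By this measure it is therefore within
reach of both inputs. With the roles swapped, the fourteen-character
pattern allows two edits but is eight edits away from \*(L"foobar\*(R".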
.SS "\s-1MODIFIERS\s0"
.IX Subsection "MODIFIERS"
With the modifiers you can control the amount of approximateness and
certain other control variables. The modifiers are one or more
strings, for example \fB\*(L"i\*(R"\fR. A single string may also hold several
modifiers separated by whitespace. The modifiers are inside an anonymous
array: the \fB[ ]\fR in the syntax are not notational, they really do mean
\&\fB[ ]\fR, for example \fB[ \*(L"i\*(R", \*(L"2\*(R" ]\fR. \fB[\*(L"2 i\*(R"]\fR would be identical.
.PP
The implicit default approximateness is 10%, rounded up. In other
words, every tenth character in the pattern may be an error, an edit.
You can explicitly set the maximum approximateness by supplying a
modifier like
.PP
.Vb 2
\& number
\& number%
.Ve
.PP
Examples: \fB\*(L"3\*(R"\fR, \fB\*(L"15%\*(R"\fR. A plain number is a maximum
number of edits. A percentage is relative to the length of the pattern.
.PP
Note that \f(CW\*(C`0%\*(C'\fR is not rounded up; it is equal to \f(CW0\fR.
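.PP
The rounding rule above can be sketched as follows. The helper
\fImax_edits()\fR is hypothetical and written only for illustration; it is
not part of String::Approx's interface. It turns a \*(L"number\*(R" or
\*(L"number%\*(R" modifier, or the 10% default, into a maximum edit count
for a pattern of a given length.
.PP
.Vb 25
```perl
use strict;
use warnings;
use feature 'say';
use POSIX qw(ceil);

# Hypothetical helper: maximum number of edits allowed for a pattern of
# $length characters. "N" means N edits; "N%" means N percent of the
# pattern length, rounded up (so "0%" stays 0). No modifier means 10%.
sub max_edits {
    my ($length, $spec) = @_;
    $spec = "10%" unless defined $spec;
    my ($n, $percent) = $spec =~ /^([0-9]+)(%?)$/
        or die "not a number or percentage: $spec";
    return $percent ? ceil($length * $n / 100) : $n + 0;
}

say max_edits(length("foobar"));          # 10% of 6 = 0.6, rounded up: 1
say max_edits(length("foobar"), "15%");   # 0.9, rounded up: 1
say max_edits(length("foobar"), "3");     # absolute: 3
say max_edits(length("foobar"), "0%");    # 0, not rounded up
```
.Ve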
.PP
Using a similar syntax, you can separately control the maximum number
of insertions, deletions, and substitutions by prefixing the numbers
with I, D, or S, like this:
.PP
.Vb 6
\& Inumber
\& Inumber%
\& Dnumber
\& Dnumber%
\& Snumber
\& Snumber%
.Ve
.PP
Examples: \fB\*(L"I2\*(R"\fR, \fB\*(L"D20%\*(R"\fR, \fB\*(L"S0\*(R"\fR.
.PP
You can ignore case (\fB\*(L"A\*(R"\fR becomes equal to \fB\*(L"a\*(R"\fR and vice versa)
by adding the \fB\*(L"i\*(R"\fR modifier.
.PP
For example,
.PP
.Vb 1
\& [ "i 25%", "S0" ]
.Ve
.PP
means \fIignore case\fR, \fIallow every fourth character to be \*(L"an edit\*(R"\fR,
but allow \fIno substitutions\fR. (See \s-1NOTES\s0 about disallowing
substitutions or insertions.)
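.PP
The following plain-Perl sketch shows one way to read such a modifier list.
Each string is split on whitespace, so \fB[ \*(L"i\*(R", \*(L"2\*(R" ]\fR
and \fB[\*(L"2 i\*(R"]\fR yield the same result. The function
\fIparse_modifiers()\fR and the hash keys it returns are hypothetical and
chosen for illustration; they are not String::Approx internals.
Percentages are kept as written rather than converted to edit counts.
.PP
.Vb 35
```perl
use strict;
use warnings;

# Hypothetical parser for an amatch()-style modifier list. Each element
# may hold several whitespace-separated modifiers. Recognized here:
#   i            ignore case
#   N or N%      overall maximum number of edits
#   IN, DN, SN   maximum insertions, deletions, substitutions (also with %)
sub parse_modifiers {
    my @tokens = map { split(' ', $_) } @_;   # split ' ' splits on whitespace
    my %mod;
    for my $token (@tokens) {
        if ($token eq 'i') {
            $mod{ignore_case} = 1;
        } elsif ($token =~ /^([IDS]?)([0-9]+%?)$/) {
            my $kind = $1 eq '' ? 'edits' : $1;   # I, D, S or overall
            $mod{$kind} = $2;
        } else {
            die "unrecognized modifier: $token";
        }
    }
    return %mod;
}

my %separate = parse_modifiers("i", "2");
my %combined = parse_modifiers("2 i");
my %example  = parse_modifiers("i 25%", "S0");
# %separate and %combined both hold (ignore_case => 1, edits => "2");
# %example holds (ignore_case => 1, edits => "25%", S => "0").
```
.Ve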
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
285 .PP
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
286 \&\s-1NOTE:\s0 setting \f(CW\*(C`I0 D0 S0\*(C'\fR is not equivalent to using \fIindex()\fR.
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
287 If you want to use \fIindex()\fR, use \fIindex()\fR.
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
.SH "SUBSTITUTE"
.IX Header "SUBSTITUTE"
.Vb 1
\& use String::Approx \*(Aqasubstitute\*(Aq;
\&
\& @substituted = asubstitute("pattern", "replacement")
\& @substituted = asubstitute("pattern", "replacement", @inputs)
\& @substituted = asubstitute("pattern", "replacement", [ modifiers ])
\& @substituted = asubstitute("pattern", "replacement",
\&                            [ modifiers ], @inputs)
.Ve
.PP
Substitute approximate \fBpattern\fR with \fBreplacement\fR and return, as a
list, \fIcopies\fR of \fB\f(CB@inputs\fB\fR with the substitutions made on the
elements that matched the pattern. If no inputs are given,
substitute in \fB\f(CB$_\fB\fR. The replacement can contain the magic strings
\&\fB$&\fR, \fB$`\fR, and \fB$'\fR, which stand for the matched string, the string
before it, and the string after it, respectively. All the other
arguments are as in \f(CW\*(C`amatch()\*(C'\fR, plus one additional modifier, \fB\*(L"g\*(R"\fR,
which means substitute globally (all the matches in an element, not
just the first one, which is the default).
.PP
See \*(L"\s-1BAD\s0 \s-1NEWS\s0\*(R" about the unfortunate stinginess of \f(CW\*(C`asubstitute()\*(C'\fR.
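.PP
The magic strings have the same meaning as Perl's own match variables
of the same names. As an illustration only (this snippet uses a plain
Perl regular expression and an exact match, not \f(CW\*(C`asubstitute()\*(C'\fR),
the following shows what each of them holds after a match:
.PP
.Vb 2
\& use strict;
\& use warnings;
\&
\& my $text = "hello world";
\& my ($pre, $match, $post) = ("", "", "");
\& if ($text =~ /wor/) {
\&     # $` = text before the match, $& = the match itself,
\&     # $\*(Aq = text after the match
\&     ($pre, $match, $post) = ($`, $&, $\*(Aq);
\& }
\& print "[$pre] [$match] [$post]\en";    # prints "[hello ] [wor] [ld]"
.Ve
.PP
When passing magic strings to \f(CW\*(C`asubstitute()\*(C'\fR, write the replacement
in single quotes (for example \f(CW\*(C`\*(Aq<$&>\*(Aq\*(C'\fR): inside double quotes Perl
interpolates its own variables before the call is made.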
.SH "INDEX"
.IX Header "INDEX"
.Vb 1
\& use String::Approx \*(Aqaindex\*(Aq;
\&
\& $index = aindex("pattern")
\& @indices = aindex("pattern", @inputs)
\& $index = aindex("pattern", [ modifiers ])
\& @indices = aindex("pattern", [ modifiers ], @inputs)
.Ve
.PP
Like \f(CW\*(C`amatch()\*(C'\fR but returns the index/indices at which the pattern
matches approximately. In list context, and if \f(CW@inputs\fR are given,
returns a list of indices, one index for each input element.
If there is no approximate match, \f(CW\*(C`\-1\*(C'\fR is returned as the index.
.PP
\&\s-1NOTE:\s0 if there is character repetition (e.g. \*(L"aa\*(R") either in
the pattern or in the text, the returned index might start
\&\*(L"too early\*(R". This is consistent with the goal of the module
of matching \*(L"as early as possible\*(R", just like regular expressions
(that there might be a \*(L"less approximate\*(R" match starting later is
somewhat irrelevant).
.PP
There is also \f(CW\*(C`arindex()\*(C'\fR, which scans backwards.
.SH "SLICE"
.IX Header "SLICE"
.Vb 1
\& use String::Approx \*(Aqaslice\*(Aq;
\&
\& ($index, $size) = aslice("pattern")
\& ([$i0, $s0], ...) = aslice("pattern", @inputs)
\& ($index, $size) = aslice("pattern", [ modifiers ])
\& ([$i0, $s0], ...) = aslice("pattern", [ modifiers ], @inputs)
.Ve
.PP
Like \f(CW\*(C`aindex()\*(C'\fR but also returns the size (length) of the match.
If the match fails, returns an empty list (when matching against \f(CW$_\fR)
or an empty anonymous list corresponding to the particular input.
.PP
\&\s-1NOTE:\s0 the size of the match will very probably be something you did not
expect (such as longer than the pattern, or a negative number). This
may or may not be fixed in future releases. Also, as with \fIaindex()\fR
(see above), the beginning of the match may differ from what you expect.
.PP
If the modifier
.PP
.Vb 1
\& "minimal_distance"
.Ve
.PP
is used, the minimal possible edit distance is returned as the
third element:
.PP
.Vb 2
\& ($index, $size, $distance) = aslice("pattern", [ modifiers ])
\& ([$i0, $s0, $d0], ...) = aslice("pattern", [ modifiers ], @inputs)
.Ve
.SH "DISTANCE"
.IX Header "DISTANCE"
.Vb 1
\& use String::Approx \*(Aqadist\*(Aq;
\&
\& $dist = adist("pattern", $input);
\& @dist = adist("pattern", @inputs);
.Ve
.PP
Return the \fIedit distance\fR or distances between the pattern and the
input or inputs. Zero edit distance means an exact match. (Remember
that the match can 'float' in the inputs: the match is a substring
match.) If the pattern is longer than the input or inputs, the
returned distance (or distances) will be negative.
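.PP
To make the \*(L"floating\*(R" substring semantics concrete, here is a small
pure-Perl sketch. It is an illustration, not how String::Approx computes
distances (the module uses its own algorithm written in C), and the
helper \f(CW\*(C`float_dist()\*(C'\fR is hypothetical. It also does not reproduce
the negative results returned when the pattern is longer than the input.
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use List::Util qw(min);
\&
\& # Hypothetical helper: smallest edit distance between $pat and any
\& # substring of $text. Row 0 is all zeros, so the match may start
\& # anywhere; taking the minimum of the last row lets it end anywhere.
\& sub float_dist {
\&     my ($pat, $text) = @_;
\&     my @prev = (0) x (length($text) + 1);
\&     for my $i (1 .. length $pat) {
\&         my @cur = ($i);    # $i pattern characters against empty text
\&         my $pc  = substr($pat, $i \- 1, 1);
\&         for my $j (1 .. length $text) {
\&             my $cost = $pc eq substr($text, $j \- 1, 1) ? 0 : 1;
\&             $cur[$j] = min($prev[$j] + 1,            # skip a pattern char
\&                            $cur[$j \- 1] + 1,         # skip a text char
\&                            $prev[$j \- 1] + $cost);   # match or substitute
\&         }
\&         @prev = @cur;
\&     }
\&     return min(@prev);
\& }
\&
\& print float_dist("abc", "xxabcxx"), "\en";    # 0: exact substring match
\& print float_dist("abc", "xxabdxx"), "\en";    # 1: one edit needed
.Ve
.PP
With this sketch, \f(CW\*(C`float_dist("abcde", "axbcde")\*(C'\fR is 1 whether the
\&\*(L"x\*(R" is counted as an insertion or as a substitution, which is the
effect described in \*(L"\s-1NOTES\s0\*(R".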
.PP
.Vb 1
\& use String::Approx \*(Aqadistr\*(Aq;
\&
\& $dist = adistr("pattern", $input);
\& @dist = adistr("pattern", @inputs);
.Ve
.PP
Return the \fBrelative\fR \fIedit distance\fR or distances between the
pattern and the input or inputs. Zero relative edit distance means an
exact match, and one means completely different. (Remember that the
match can 'float' in the inputs: the match is a substring match.) If
the pattern is longer than the input or inputs, the returned distance
(or distances) will be negative.
.PP
You can use \fIadist()\fR or \fIadistr()\fR to sort the inputs according to their
approximateness:
.PP
.Vb 3
\& my %d;
\& @d{@inputs} = map { abs } adistr("pattern", @inputs);
\& my @d = sort { $d{$a} <=> $d{$b} } @inputs;
.Ve
.PP
Now \f(CW@d\fR contains the inputs sorted so that the input most like \f(CW"pattern"\fR comes first.
.SH "CONTROLLING THE CACHE"
.IX Header "CONTROLLING THE CACHE"
\&\f(CW\*(C`String::Approx\*(C'\fR maintains a \s-1LU\s0 (least-used) cache that holds the
\&'matching engines' for each instance of a \fIpattern+modifiers\fR. The
cache is intended to help in the case where you match a small set of
patterns against a large set of strings. However, the more engines you
cache, the more memory you use. If you have a lot of different
patterns or if you have a lot of memory to burn, you may want to
control the cache yourself. For example, allowing a larger cache
consumes more memory but probably runs a little bit faster since the
cache fills (and needs flushing) less often.
.PP
The cache has two parameters: \fImax\fR and \fIpurge\fR. The first one
is the maximum size of the cache and the second one is the cache
flushing ratio: when the number of cache entries exceeds \fImax\fR,
\&\fImax\fR times \fIpurge\fR cache entries are flushed. The default
values are 1000 and 0.75, respectively, which means that when
the 1001st entry would be cached, the 750 least used entries will
be removed from the cache. To access the parameters you can
use the following calls:
.PP
.Vb 2
\& $now_max = String::Approx::cache_max();
\& String::Approx::cache_max($new_max);
\&
\& $now_purge = String::Approx::cache_purge();
\& String::Approx::cache_purge($new_purge);
\&
\& $limit = String::Approx::cache_n_purge();
.Ve
.PP
To be honest, there are actually \fBtwo\fR caches: the first one is used
for the patterns with no modifiers, the second one for the patterns
with pattern modifiers. Using the standard parameters you will
therefore actually cache up to 2000 entries. The above calls control
both caches at once.
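.PP
The flushing rule can be modelled in a few lines of plain Perl. This is
a hypothetical sketch of the \fImax\fR/\fIpurge\fR arithmetic described above,
not the module's internal code: the hash of use counts, the tie-breaking
by name, and the helper \f(CW\*(C`cache_engine()\*(C'\fR are all assumptions.
.PP
.Vb 2
\& use strict;
\& use warnings;
\&
\& my $max   = 4;       # plays the role of cache_max() (default 1000)
\& my $purge = 0.75;    # plays the role of cache_purge() (default 0.75)
\& my %uses;            # hypothetical cache: key => number of uses
\&
\& # Hypothetical helper: count a use of the engine for $key. If $key is
\& # new and the cache is already full, first flush the int($max * $purge)
\& # least used entries.
\& sub cache_engine {
\&     my ($key) = @_;
\&     if (!exists $uses{$key} && keys(%uses) >= $max) {
\&         my @by_use = sort { $uses{$a} <=> $uses{$b} || $a cmp $b } keys %uses;
\&         delete @uses{ @by_use[0 .. int($max * $purge) \- 1] };
\&     }
\&     $uses{$key}++;
\& }
\&
\& cache_engine($_) for qw(a a a b c c d);   # full: a=3, b=1, c=2, d=1
\& cache_engine("e");                        # would be entry 5: flush b, d, c
\& print join(",", sort keys %uses), "\en";   # prints "a,e"
.Ve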
.PP
To disable caching completely, use
.PP
.Vb 1
\& String::Approx::cache_disable();
.Ve
.PP
Note that this does not flush any existing cache entries;
to do that, use
.PP
.Vb 1
\& String::Approx::cache_flush_all();
.Ve
.SH "NOTES"
.IX Header "NOTES"
Because matching is by \fIsubstrings\fR, not by whole strings, insertions
and substitutions often produce very similar results: \*(L"abcde\*(R" matches
\&\*(L"axbcde\*(R" either by insertion \fBor\fR substitution of \*(L"x\*(R".
.PP
The maximum edit distance is also the maximum number of edits.
That is, the \fB\*(L"I2\*(R"\fR in
.PP
.Vb 1
\& amatch("abcd", ["I2"])
.Ve
.PP
is useless because the maximum edit distance is (implicitly) 1.
You may have meant to say
.PP
.Vb 1
\& amatch("abcd", ["2D1S1"])
.Ve
.PP
or something like that.
.PP
If you want to simulate transposes, such as
.PP
.Vb 1
\& feet fete
.Ve
.PP
you need to allow an edit distance of at least two, because in terms of
our edit primitives a transpose is first one deletion and then one
insertion.
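.PP
The cost of a transpose can be checked with a small pure-Perl sketch of
whole-string edit distance. With \f(CW$transpose\fR false it uses only
insertions, deletions, and substitutions; with it true, swapping two
adjacent characters also counts as one edit (the variant known as
optimal string alignment distance). This is an illustration only:
String::Approx has no transposition primitive, and the helper
\&\f(CW\*(C`edit_dist()\*(C'\fR is hypothetical.
.PP
.Vb 3
\& use strict;
\& use warnings;
\& use List::Util qw(min);
\&
\& # Hypothetical helper: edit distance between whole strings $s and $t.
\& sub edit_dist {
\&     my ($s, $t, $transpose) = @_;
\&     my ($m, $n) = (length $s, length $t);
\&     my @d;
\&     $d[$_][0] = $_ for 0 .. $m;    # delete all of $s
\&     $d[0][$_] = $_ for 0 .. $n;    # insert all of $t
\&     for my $i (1 .. $m) {
\&         for my $j (1 .. $n) {
\&             my $cost = substr($s, $i \- 1, 1) eq substr($t, $j \- 1, 1) ? 0 : 1;
\&             $d[$i][$j] = min($d[$i \- 1][$j] + 1,             # deletion
\&                              $d[$i][$j \- 1] + 1,             # insertion
\&                              $d[$i \- 1][$j \- 1] + $cost);   # substitution
\&             if ($transpose && $i > 1 && $j > 1
\&                 && substr($s, $i \- 1, 1) eq substr($t, $j \- 2, 1)
\&                 && substr($s, $i \- 2, 1) eq substr($t, $j \- 1, 1)) {
\&                 # swap of two adjacent characters counts as one edit
\&                 $d[$i][$j] = min($d[$i][$j], $d[$i \- 2][$j \- 2] + 1);
\&             }
\&         }
\&     }
\&     return $d[$m][$n];
\& }
\&
\& print edit_dist("feet", "fete", 0), "\en";    # 2: no single edit swaps "et"
\& print edit_dist("feet", "fete", 1), "\en";    # 1: one transposition
.Ve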
.SS "\s-1TEXT\s0 \s-1POSITION\s0"
.IX Subsection "TEXT POSITION"
The starting and ending positions of matching, substituting, indexing, or
slicing can be changed from the beginning and end of the input(s) to
some other positions by using either or both of the modifiers
.PP
.Vb 2
\& "initial_position=24"
\& "final_position=42"
.Ve
.PP
or both of the modifiers
.PP
.Vb 2
\& "initial_position=24"
\& "position_range=10"
.Ve
.PP
By setting \fB\*(L"position_range\*(R"\fR to zero you can limit
(anchor) the operation to happen only once (if a match is possible)
at the initial position.
.SH "VERSION"
.IX Header "VERSION"
Major release 3.
.SH "CHANGES FROM VERSION 2"
.IX Header "CHANGES FROM VERSION 2"
.SS "\s-1GOOD\s0 \s-1NEWS\s0"
.IX Subsection "GOOD NEWS"
.IP "Version 3 is 2\-3 times faster than version 2" 4
.IX Item "Version 3 is 2-3 times faster than version 2"
.PD 0
.IP "No pattern length limitation" 4
.IX Item "No pattern length limitation"
.PD
The algorithm is independent of the pattern length: its time
complexity is \fIO(kn)\fR, where \fIk\fR is the number of edits and \fIn\fR the
length of the text (input). The preprocessing of the pattern will of
course take some \fIO(m)\fR (\fIm\fR being the pattern length) time, but
\&\f(CW\*(C`amatch()\*(C'\fR and \f(CW\*(C`asubstitute()\*(C'\fR cache the result of this
preprocessing so that it is done only once per pattern.
.SS "\s-1BAD\s0 \s-1NEWS\s0"
.IX Subsection "BAD NEWS"
.IP "You do need a C compiler to install the module" 4
.IX Item "You do need a C compiler to install the module"
Perl's regular expressions are no longer used; instead a faster and more
scalable algorithm written in C is used.
.ie n .IP """asubstitute()"" is now always stingy" 4
.el .IP "\f(CWasubstitute()\fR is now always stingy" 4
.IX Item "asubstitute() is now always stingy"
The string matched and substituted is now always stingy, as short
as possible. It used to be as long as possible. This is an unfortunate
change stemming from switching the matching algorithm. Example: with
an edit distance of two, substituting for \fB\*(L"word\*(R"\fR in \fB\*(L"cork\*(R"\fR and
\&\fB\*(L"wool\*(R"\fR previously matched \fB\*(L"cork\*(R"\fR and \fB\*(L"wool\*(R"\fR. Now it
matches \fB\*(L"or\*(R"\fR and \fB\*(L"wo\*(R"\fR: as little as possible, or, in other words,
with as much approximateness (as many edits) as possible. Because
there is no \fIneed\fR to match the \fB\*(L"c\*(R"\fR of \fB\*(L"cork\*(R"\fR, it is not matched.
.ie n .IP "no more ""aregex()"" because regular expressions are no longer used" 4
.el .IP "no more \f(CWaregex()\fR because regular expressions are no longer used" 4
.IX Item "no more aregex() because regular expressions are no longer used"
.PD 0
.ie n .IP "no more ""compat1"" for String::Approx version 1 compatibility" 4
.el .IP "no more \f(CWcompat1\fR for String::Approx version 1 compatibility" 4
.IX Item "no more compat1 for String::Approx version 1 compatibility"
.PD
552 .SH "ACKNOWLEDGEMENTS"
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
553 .IX Header "ACKNOWLEDGEMENTS"
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
554 The following people have provided valuable test cases, documentation
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
555 clarifications, and other feedback:
e3609c8714fb Uploaded
plus91-technologies-pvt-ltd
parents:
diff changeset
.PP
Jared August, Arthur Bergman, Anirvan Chatterjee, Steve A. Chervitz,
Aldo Calpini, David Curiel, Teun van den Dool, Alberto Fontaneda,
Rob Fugina, Dmitrij Frishman, Lars Gregersen, Kevin Greiner,
B. Elijah Griffin, Mike Hanafey, Mitch Helle, Ricky Houghton,
\&'idallen', Helmut Jarausch, Damian Keefe, Ben Kennedy, Craig Kelley,
Franz Kirsch, Dag Kristian, Mark Land, J. D. Laub, John P. Linderman,
Tim Maher, Juha Muilu, Sergey Novoselov, Andy Oram, Ji Y Park,
Eric Promislow, Nikolaus Rath, Stefan Ram, Slaven Rezic,
Dag Kristian Rognlien, Stewart Russell, Slaven Rezic, Chris Rosin,
Pasha Sadri, Ilya Sandler, Bob J.A. Schijvenaars, Ross Smith,
Frank Tobin, Greg Ward, Rich Williams, Rick Wise.
.PP
The matching algorithm was developed by Udi Manber, Sun Wu, and Burra
Gopal in the Department of Computer Science, University of Arizona.
.SH "AUTHOR"
.IX Header "AUTHOR"
Jarkko Hietaniemi <jhi@iki.fi>
.SH "COPYRIGHT AND LICENSE"
.IX Header "COPYRIGHT AND LICENSE"
Copyright 2001\-2013 by Jarkko Hietaniemi
.PP
This library is free software; you can redistribute it and/or modify
it under the terms of either the Artistic License 2.0 or the \s-1GNU\s0 Library
General Public License, Version 2. See the files Artistic and \s-1LGPL\s0
for more details.
.PP
Furthermore: no warranties or obligations of any kind are given, and
the separate file \fI\s-1COPYRIGHT\s0\fR must be included intact in all copies
and derived materials.