<ul><h2><a name=31> * SUBSTITUTION AND TRANSLATION * </a></h2></ul>


   As well as matching regular expressions, Perl can make substitutions
   based on those matches. This is done with the s function, which is
   designed to mimic the way substitution is done in the vi text editor.
   Once again the match operator (=~) is used, and once again if it is
   omitted then the substitution takes place on the $_ variable.

   To replace an occurrence of london by London in the string $sentence
   we use the expression

$sentence =~ s/london/London/

   and to do the same thing with the $_ variable just

s/london/London/

   Notice that the regular expression (london) and its replacement
   string (London) are surrounded by a total of three slashes. The result
   of this expression is the number of substitutions made, so it is
   either 0 (false) or 1 (true) in this case.
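
   The return value described above can be checked directly. The
   following is a minimal sketch; the sample text and the variable names
   $changed and $unchanged are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text containing the word twice.
my $sentence = "london calling, london answering";

# Without the g option only the first occurrence is replaced.
my $changed = ($sentence =~ s/london/London/);

# Nothing matches here, so the result is a false value.
my $unchanged = ($sentence =~ s/paris/Paris/);

print "$sentence\n";
print "first substitution changed something\n" if $changed;
print "second substitution changed nothing\n" unless $unchanged;
```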


<ul><h2><a name=32>Options</a></h2></ul>

   This example only replaces the first occurrence of the string, and it
   may be that there will be more than one such string we want to
   replace. To make a global substitution the last slash is followed by a
   g as follows:

s/london/London/g

   which of course works on the $_ variable. Again the expression returns
   the number of substitutions made, which is 0 (false) or something
   greater than 0 (true).

   If we want to also replace occurrences of lOndon, lonDON, LoNDoN and
   so on then we could use

s/[Ll][Oo][Nn][Dd][Oo][Nn]/London/g

   but an easier way is to use the i option (for "ignore case"). The
   expression

s/london/London/gi

   will make a global substitution ignoring case. The i option is also
   used in the basic /.../ regular expression match.
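
   As a rough illustration of how the two options combine, the sketch
   below runs both forms on copies of the same made-up string and keeps
   the counts that the substitutions return.

```perl
use strict;
use warnings;

# Hypothetical sample text with three spellings of the same word.
my $text = "london, LONDON and lOndon";

# g alone: every match is replaced, but matching is case-sensitive.
my $global   = $text;
my $n_global = ($global =~ s/london/London/g);

# g and i together: every match is replaced, whatever its case.
my $both   = $text;
my $n_both = ($both =~ s/london/London/gi);

print "$n_global: $global\n";
print "$n_both: $both\n";
```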


<ul><h2><a name=33>Remembering patterns</a></h2></ul>

   It's often useful to remember patterns that have been matched so that
   they can be used again. It just so happens that anything matched in
   parentheses gets remembered in the variables $1,...,$9. These strings
   can also be used in the same regular expression (or substitution) by
   using the special RE codes \1,...,\9. For example

$_ = "Lord Whopper of Fibbing";
s/([A-Z])/:\1:/g;
print "$_\n";

   will replace each upper case letter by that letter surrounded by
   colons. It will print :L:ord :W:hopper of :F:ibbing. The variables
   $1,...,$9 are read-only variables; you cannot alter them yourself.
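
   In the replacement part of a substitution the same text is also
   available as $1, and Perl's warnings suggest that form there. The
   sketch below uses it, and also shows $1 and $2 being read after an
   ordinary match; the names $marked, $who and $where are illustrative
   only.

```perl
use strict;
use warnings;

my $title = "Lord Whopper of Fibbing";

# Copy the string, then wrap every upper case letter in colons.
my $marked = $title;
$marked =~ s/([A-Z])/:$1:/g;
print "$marked\n";

# After a successful match, $1 and $2 hold the two bracketed parts.
my ($who, $where) = ("", "");
if ($title =~ /(\w+) of (\w+)/) {
    ($who, $where) = ($1, $2);
    print "$who comes from $where\n";
}
```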

   As another example, the test

if (/(\b.+\b) \1/)
{
        print "Found $1 repeated\n";
}

   will identify any words repeated. Each \b represents a word boundary
   and the .+ matches any non-empty string, so \b.+\b matches anything
   between two word boundaries. This is then remembered by the
   parentheses and stored as \1 for regular expressions and as $1 for the
   rest of the program.
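
   Because .+ matches any characters, including spaces, the test above
   is quite loose. A stricter variant, given here only as a sketch and
   not taken from the text above, captures a single word with \w+ and
   requires a word boundary after the repeat. The sample line is made
   up.

```perl
use strict;
use warnings;

# Hypothetical sample line containing a doubled word.
my $line     = "Paris in the the spring";
my $repeated = "";

# \w+ restricts the capture to one word; \1 must then match it again.
if ($line =~ /\b(\w+) \1\b/) {
    $repeated = $1;
    print "Found $repeated repeated\n";
}
```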

   The following swaps the first and last characters of a line in the $_
   variable:

s/^(.)(.*)(.)$/\3\2\1/

   The ^ and $ match the beginning and end of the line. The \1 code
   stores the first character; the \2 code stores everything else up to
   the last character, which is stored in the \3 code. Then the whole
   line is replaced with \1 and \3 swapped round.
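
   A small sketch of the swap on a made-up string, written with $1, $2
   and $3 in the replacement. It also shows that a one-character line is
   left alone, since the pattern needs at least two characters to match.

```perl
use strict;
use warnings;

# Hypothetical sample line.
my $line    = "abcdef";
my $swapped = $line;
$swapped =~ s/^(.)(.*)(.)$/$3$2$1/;
print "$swapped\n";

# A single character cannot fill both (.) groups, so nothing changes.
my $single  = "a";
my $matched = ($single =~ s/^(.)(.*)(.)$/$3$2$1/);
print "single character left as '$single'\n" unless $matched;
```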

   After a match, you can use the special read-only variables $`, $&
   and $' to find what came before the match, what was matched, and what
   came after it. So after

$_ = "Lord Whopper of Fibbing";
/pp/;

   all of the following are true. (Remember that eq is the
   string-equality test.)

$` eq "Lord Who";
$& eq "pp";
$' eq "er of Fibbing";
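
   The three variables can be copied into ordinary variables straight
   after the match, as in this sketch. The names $before, $matched and
   $after are chosen here for illustration.

```perl
use strict;
use warnings;

my ($before, $matched, $after) = ("", "", "");

$_ = "Lord Whopper of Fibbing";
if (/pp/) {
    # Copy the values at once; a later match would overwrite them.
    ($before, $matched, $after) = ($`, $&, $');
}

print "before: '$before'\n";
print "match:  '$matched'\n";
print "after:  '$after'\n";
```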


   Finally on the subject of remembering patterns it's worth knowing that
   inside of the slashes of a match or a substitution variables are
   interpolated. So

$search = "the";
s/$search/xxx/g;

   will replace every occurrence of the with xxx. If you want to replace
   every occurrence of there then you cannot do s/$searchre/xxx/ because
   this will be interpolated as the variable $searchre. Instead you
   should put the variable name in curly braces so that the code becomes

$search = "the";
s/${search}re/xxx/;
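
   The difference between the two forms shows up on a string containing
   both the and there. This is a sketch on made-up text; note that under
   use strict the unbraced s/$searchre/xxx/ would not even compile,
   because $searchre is not declared.

```perl
use strict;
use warnings;

my $search = "the";
my $text   = "there is the theory";   # hypothetical sample text

# Plain interpolation: every "the" is replaced, even inside other words.
my $all = $text;
$all =~ s/$search/xxx/g;

# Braces end the variable name, so the pattern becomes "there".
my $longer = $text;
$longer =~ s/${search}re/xxx/;

print "$all\n";
print "$longer\n";
```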


<ul><h2><a name=34>Translation</a></h2></ul>

   The tr function allows character-by-character translation. The
   following expression replaces each a with e, each b with d, and each c
   with f in the variable $sentence. The expression returns the number of
   substitutions made.

$sentence =~ tr/abc/edf/


   Most of the special RE codes do not apply in the tr function. For
   example, the statement here counts the number of asterisks in the
   $sentence variable and stores that in the $count variable.

$count = ($sentence =~ tr/*/*/);

   However, the dash is still used to mean "between". This statement
   converts $_ to upper case.

tr/a-z/A-Z/;
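
   The three uses of tr described in this section can be tried together
   on one made-up string. This is only a sketch; each translation works
   on its own copy so that the results stay separate.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $sentence = "a*b*c cab";

# Translating * to itself changes nothing but still counts the asterisks.
my $stars = ($sentence =~ tr/*/*/);

# Character-by-character mapping: a becomes e, b becomes d, c becomes f.
my $mapped = $sentence;
$mapped =~ tr/abc/edf/;

# The dash gives a range, so this converts lower case to upper case.
my $upper = $sentence;
$upper =~ tr/a-z/A-Z/;

print "$stars asterisks\n";
print "$mapped\n";
print "$upper\n";
```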


<ul><h2><a name=35>Exercise</a></h2></ul>

   Your current program should count lines of a file which contain a
   certain string. Modify it so that it counts lines with double letters
   (or any other double character). Modify it again so that these double
   letters appear also in parentheses. For example your program would
   produce a line like this among others:

023 Amp, James Wa(tt), Bob Transformer, etc. These pion(ee)rs conducted many

   Try to get it so that all pairs of letters are in parentheses, not
   just the first pair on each line.

   For a slightly more interesting program you might like to try the
   following. Suppose your program is called countlines. Then you would
   call it with

./countlines

   However, if you call it with several arguments, as in

./countlines first second etc

   then those arguments are stored in the array @ARGV. In the above
   example $ARGV[0] is first, $ARGV[1] is second and $ARGV[2] is etc.
   Modify your program so that it accepts one argument and counts only
   those lines with that string. It should also put occurrences of this
   string in parentheses. So

./countlines the

   will output something like this line among others:

019 But (the) greatest Electrical Pioneer of (the)m all was Thomas Edison, who


     _________________________________________________________________