A very useful function in Perl is split, which splits up a string and
places it into an array. The function uses a regular expression and as
usual works on the $_ variable unless otherwise specified.
The split function is used like this:
$info = "Caine:Michael:Actor:14, Leafy Drive";
@personal = split(/:/, $info);
which has the same overall effect as
@personal = ("Caine", "Michael", "Actor", "14, Leafy Drive");
If we have the information stored in the $_ variable then we can just
use this instead
@personal = split(/:/);
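To check that the two forms really behave the same, a small self-contained sketch along these lines can be run. The variable names @explicit and @implicit are made up for this illustration.

```perl
use strict;
use warnings;

my $info = "Caine:Michael:Actor:14, Leafy Drive";

# Form 1: the string to split is passed explicitly.
my @explicit = split(/:/, $info);

# Form 2: the string is taken from the default variable $_.
$_ = $info;
my @implicit = split(/:/);

# Print each field on its own line, numbered from 0.
for my $i (0 .. $#explicit) {
    print "$i: $explicit[$i]\n";
}
print "same result\n" if "@explicit" eq "@implicit";
```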
If the fields are divided by any number of colons then we can use the
RE codes to get round this. The code
$_ = "Capes:Geoff::Shot putter:::Big Avenue";
@personal = split(/:+/);
is the same as
@personal = ("Capes", "Geoff",
"Shot putter", "Big Avenue");
But this:
$_ = "Capes:Geoff::Shot putter:::Big Avenue";
@personal = split(/:/);
would be like
@personal = ("Capes", "Geoff", "",
"Shot putter", "", "", "Big Avenue");
A word can be split into characters, a sentence split into words and a
paragraph split into sentences:
@chars = split(//, $word);
@words = split(/ /, $sentence);
@sentences = split(/\./, $paragraph);
In the first case the null string is matched between each character,
and that is why the @chars array is an array of characters, i.e. an
array of strings of length 1.
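As a rough illustration, the three splits can be tried on some sample data. The strings below are invented for this sketch. Note that splitting on a full stop leaves the space that followed it at the start of the next sentence.

```perl
use strict;
use warnings;

my $word      = "frog";
my $sentence  = "the frog kicked";
my $paragraph = "It kicked. It hopped. It stopped.";

my @chars     = split(//, $word);          # one element per character
my @words     = split(/ /, $sentence);     # one element per space-separated word
my @sentences = split(/\./, $paragraph);   # one element per full stop

# Bracket each element so leading spaces are visible.
print join("", map { "[$_]" } @chars), "\n";
print join("", map { "[$_]" } @words), "\n";
print join("", map { "[$_]" } @sentences), "\n";
```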
A useful tool in natural language processing is the concordance. This
allows a specific string to be displayed in its immediate context
wherever it appears in a text. For example, a concordance program
identifying the target string "the" might produce some of the following
output. Notice how the occurrences of the target string line up
vertically.
discovered (this is the truth) that when he
t kinds of metal to the leg of a frog, an e
rrent developed and the frog's leg kicked,
 longer attached to the frog, which was dea
normous advances in the field of amphibian
ch it hop back into the pond -- almost.  Bu
ond -- almost.  But the greatest Electrical
ectrical Pioneer of them all was Thomas Edi
This exercise is to write such a program. Here are some tips:
* Read the entire file into an array (this obviously isn't useful in
general because the file may be extremely large, but we won't
worry about that here). Each item in the array will be a line of
the file.
* When the chop function is used on an array it chops off the last
character of every item in the array.
* Recall that you can join the whole array together with a statement
like $text = "@lines";
* Use the target string as the delimiter for splitting the text. (I.e.,
use the target string in place of the colon in our previous
examples.) You should then have an array of all the strings
between the target strings.
* For each array element in turn, print it out, print the target
string, and then print the next array element.
* Recall that the last element of an array @food has index $#food.
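The tips above might be put together roughly as follows. This is only a sketch under some assumptions: the lines are given inline rather than read from a file, the names @pieces and @output are invented here, and \Q...\E is used so that a target string containing special characters is still matched literally.

```perl
use strict;
use warnings;

# Stand-in for the lines of a file; a real program would read them
# from a filehandle instead.
my @lines = (
    "he attached the wire to the leg\n",
    "of a frog and the frog kicked\n",
);
my $target = "the";    # hypothetical target string

chop(@lines);          # chop the trailing newline off every line
my $text = "@lines";   # join the lines, separated by single spaces

# Split on the target, giving the strings between its occurrences.
my @pieces = split(/\Q$target\E/, $text);

# Each piece, then the target, then the piece that follows it.
my @output;
for my $i (0 .. $#pieces - 1) {
    push @output, $pieces[$i] . $target . $pieces[$i + 1];
}
print "$_\n" for @output;
```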
As it stands this would be a pretty good program, but the target
strings won't line up vertically. To tidy up the strings you'll need
the substr function. Here are three examples of its use.
substr("Once upon a time", 3, 4); # returns "e up"
substr("Once upon a time", 7); # returns "on a time"
substr("Once upon a time", -6, 5); # returns "a tim"
The first example returns a substring of length 4 starting at position
3. Remember that the first character of a string has index 0. The
second example shows that missing out the length gives the substring
right to the end of the string. The third example shows that you can
also index from the end using a negative index. It returns the
substring that starts at the 6th character from the end and has length
5.
If you use a negative index that extends beyond the beginning of the
string then Perl will return nothing or give a warning. To avoid this
happening you can pad out the string by using the x operator mentioned
earlier. The expression (" "x30) produces 30 spaces, for example.
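One possible way to combine the padding with substr is sketched below. The context width of 20 and the sample strings are assumptions made for this illustration. Padding on the left before taking the last 20 characters means the negative index can never reach beyond the beginning of the string.

```perl
use strict;
use warnings;

my $width  = 20;       # assumed number of context characters on each side
my $target = "the";

# Hypothetical (before, after) contexts for two occurrences of the target.
my @pairs = (
    [ "he attached ", " wire to the leg" ],
    [ "a ",           " frog" ],
);

my @aligned;
for my $pair (@pairs) {
    my ($before, $after) = @$pair;
    # Pad on the left, then keep only the last $width characters.
    my $left  = substr((" " x $width) . $before, -$width);
    # Pad on the right, then keep only the first $width characters.
    my $right = substr($after . (" " x $width), 0, $width);
    push @aligned, $left . $target . $right;
}
print "$_\n" for @aligned;
```

Every output line then has the same length, with the target string starting at the same column.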