====== Length of words in a list ======

Task: given a list of words, how many of them consist of one, two, three ... n characters?

Additional information: the characters belong to the Unicode set.

===== Everyday job: read in a file =====

First we make sure Perl is dealing with Unicode characters, then we read the file into an array and chomp the array (i.e. remove the trailing newline from each element):

  #!/usr/bin/perl -w
  # cntstr.pl -- count characters in Unicode string
  # usage: perl cntstr.pl filename
  use strict;
  use warnings;
  use utf8;
  binmode STDOUT, ":utf8";
  my $filename = $ARGV[0];
  open my $fh, "< :encoding(UTF-8)", $filename or die "open: $!";
  # file into array:
  my @str = <$fh>;
  # chomp array:
  chomp (@str);

===== Magic: sort words by length =====

We sort the words by length to get the range (from shortest to longest); out of curiosity, we also print the shortest, the second-longest, and the longest word. Recipe [[http://stackoverflow.com/questions/13372784/sorting-by-length-in-perl|found on Stack Overflow]].

  # sort by length, ascending (from the shortest string to the longest)
  my @sorted = sort { length $a <=> length $b } @str;
  # negative indices count from the end of the array
  print "Shortest word: ", $sorted[0], ", ", length($sorted[0]), "\n";
  print "Second-longest word: ", $sorted[-2], ", ", length($sorted[-2]), "\n";
  print "Longest word: ", $sorted[-1], ", ", length($sorted[-1]), "\n";

===== Challenge: array of arrays =====

Once we know the range, we want a separate list for words with one character, another for words with two characters, then three, four ... all the way to 27. Then we want to count the elements in each list. In Perl, a list of lists is called an [[http://www.perlhowto.com/array_of_arrays|array of arrays]]. Quite a challenge: understanding how it works was not the hard part; following how it is actually done was.

  # we want an array of arrays: 5s, 6s, 7s etc.
  # initialize top array:
  my @wordlengths = ();
  # create top array, holding 27 lists:
  foreach my $i ( 0 .. 26 ) {
      # loop over the indices of the sorted list:
      foreach my $singleword ( 0 .. scalar(@sorted) - 1 ) {
          # test whether the word's length fits the current category:
          if ( length($sorted[$singleword]) == $i + 1 ) {
              # if so, push it into the current subarray;
              # mind the curly brackets!
              push @{ $wordlengths[$i] }, $sorted[$singleword];
          }
      }
  }

===== Let's see what we have =====

Finally, we print something to see where we are and what we've got.

  # $n rather than $b: Perl uses $a and $b in sort blocks
  foreach my $n ( 0 .. 26 ) {
      # a length with no words leaves its slot undefined; fall back to an empty list
      my @words = @{ $wordlengths[$n] || [] };
      print "Number of words with ", ($n + 1), " characters: ", scalar(@words), "\n";
      # only print a first word if there is one
      print "First word with ", ($n + 1), " characters: ", $words[0], "\n" if @words;
  }

I think I'll dream about foreach loops, and about accessing the array of arrays. I'll have to do it several more times to get used to it.

====== The original script ======

Here's what I originally wrote, with Croatian variable names and messages. The messages say the same as the English ones above. Ca. 35 lines of code.

  #!/usr/bin/perl -w
  # cntstr.pl -- count characters in Unicode string
  use strict;
  use warnings;
  use utf8;
  binmode STDOUT, ":utf8";
  my $filename = $ARGV[0];
  open my $fh, "< :encoding(UTF-8)", $filename or die "open: $!";
  my @str = <$fh>;
  # chomp array:
  chomp (@str);
  # sort by length, ascending (from the shortest string to the longest)
  my @sorted = sort { length $a <=> length $b } @str;
  print "Najkraća riječ: ", $sorted[0], ", ", length($sorted[0]), "\n";
  print "Predzadnja najduža riječ: ", $sorted[scalar(@sorted) - 2], ", ", length($sorted[scalar(@sorted) - 2]), "\n";
  print "Najduža riječ: ", $sorted[scalar(@sorted) - 1], ", ", length($sorted[scalar(@sorted) - 1]), "\n";
  # here we should have an array of arrays: 5s, 6s, 7s etc.
  # initialize top array
  my @brojevi = ();
  foreach my $i ( 0 .. 26 ) {
      foreach my $duzina ( 0 .. scalar(@sorted) - 1 ) {
          if ( length($sorted[$duzina]) == $i + 1 ) {
              push @{ $brojevi[$i] }, $sorted[$duzina];
          }
      }
  }
  foreach my $b ( 0 .. 26 ) {
      print "Broj riječi od ", ($b + 1), " slova: ", scalar(@{$brojevi[$b]}), "\n";
      print "Prva riječ s ", ($b + 1), " slova: ", $brojevi[$b][0], "\n";
  }
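
====== Sketch: grouping in a single pass ======

The nested loops above compare every word against every length from 1 to 27. As a minimal sketch (not part of the original script), the same array of arrays could probably be built in one pass by using each word's length as the index. The subroutine name group_by_length and the sample words are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ":utf8";

# Hypothetical helper (not in the original script): returns an array of
# arrays in which slot $len - 1 holds all words that are $len characters long.
sub group_by_length {
    my @words = @_;
    my @groups;
    foreach my $word (@words) {
        next if length($word) == 0;    # skip empty lines
        # the word's own length picks the subarray; Perl creates it on demand
        push @{ $groups[ length($word) - 1 ] }, $word;
    }
    return @groups;
}

# Made-up sample data; in the script above this would be @str.
my @groups = group_by_length(qw(riječ a list of words što));

foreach my $n ( 0 .. $#groups ) {
    # lengths with no words stay undefined, so fall back to an empty list
    my @words = @{ $groups[$n] || [] };
    next unless @words;
    print "Number of words with ", ($n + 1), " characters: ", scalar(@words), "\n";
    print "First word with ", ($n + 1), " characters: ", $words[0], "\n";
}
```

Because the index comes from the data, this version needs neither the sort nor a hard-coded upper bound of 27. The price is that lengths with no words leave undefined slots, which is why the printing loop falls back to an empty list.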