====== Search CroALa, get back number of occurrences ======

  * Transform a list of words, removing endings, to prepare it for a PhiloLogic regex search (to cover orthographic variants)
  * Send the list of words to CroALa and transform the results into a list with the number of occurrences found
  * Transform the result list into an HTML list with live links querying CroALa

===== Transform a list of words, removing endings =====

This is done with a bash script and a Perl script; the Perl script uses [[http://search.cpan.org/~xern/Lingua-LA-Stemmer-0.01/Stemmer.pm|Lingua::LA::Stemmer]].

==== zacroala.sh bash script ====

Here is the ''zacroala.sh'' script:

<code bash>
#!/bin/bash
# Jovanovic, 2012-10, format a list of words for CroALa orthographic search
# usage: ./zacroala.sh filename

# take argument, find file
file=${1:?usage: ./zacroala.sh filename}

# Pipeline stages:
#   stem the words, sort the stems by length, strip spaces, uppercase;
#   J, Y -> I and V -> U;
#   AE / OE -> [AO]?E (the A or O becomes optional);
#   doubled consonant -> first one optional (LL -> L?L);
#   H -> H? (optional H); T -> TH? unless followed by T, H or ?;
#   append * to every stem.
./lastem.pl "${file}" \
| awk '{ print length(), $0 }' \
| sort -n \
| awk '{ $1=""; print $0 }' \
| sed 's/ //g' \
| tr '[:lower:]' '[:upper:]' \
| tr 'JY' 'II' \
| tr 'V' 'U' \
| sed 's/[AO]E/[AO]?E/g' \
| sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' \
| sed 's/H/H?/g' \
| sed 's/T\([^TH?]\)/TH?\1/g' \
| sed 's/$/*/' >> "${file}-zacroala"
</code>

==== lastem.pl script for stemming Latin ====

Here is the ''lastem.pl'' script, called from ''zacroala.sh'':

<code perl>
#!/usr/bin/perl
# lastem.pl - read in a file of Latin words, turn it into an array, print out the stems
# synopsis: lastem.pl somefile
# Jovanovic, 5/10/2012

use strict;
use warnings;
use Lingua::LA::Stemmer;
use File::Slurp 'read_file';

# give us a file:
my $fname = shift or die "usage: lastem.pl somefile\n";
# turn it into an array with slurp, one word per line:
my @words = read_file $fname;
# strip the trailing newlines so the stemmer sees bare words:
chomp @words;
# stem the array (hard reference...):
my $stems = Lingua::LA::Stemmer::stem(\@words);
# print the result:
print "$_ \n" for (@$stems);
</code>

==== Send the list to CroALa, process the resulting HTML ====

See the script **localcaula.sh**, [[http://gss.srce.hr/pithos/rest/njovanovic@ffzg.hr/files/labor/localcaula-sh.html|here]] ([[http://emacswiki.org/emacs/Htmlize|htmlized]]).
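
Before the prepared list is sent to CroALa, it can be useful to check which search patterns the ''tr''/''sed'' stages of ''zacroala.sh'' produce for a few stems. The following is a minimal sketch, not part of the original scripts: the function name ''to_pattern'' is hypothetical, and the sample stems are typed by hand instead of coming from ''lastem.pl''.

```shell
#!/bin/bash
# Hypothetical helper (not one of the original scripts): applies the
# tr/sed stages of zacroala.sh to stems read from standard input,
# skipping the stemming and length-sorting steps.
to_pattern() {
  tr '[:lower:]' '[:upper:]' \
  | tr 'JY' 'II' \
  | tr 'V' 'U' \
  | sed 's/[AO]E/[AO]?E/g' \
  | sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' \
  | sed 's/H/H?/g' \
  | sed 's/T\([^TH?]\)/TH?\1/g' \
  | sed 's/$/*/'
}

# Sample stems, typed by hand for illustration only.
printf '%s\n' autor poen tyrann mitt | to_pattern
# prints:
#   AUTH?OR*
#   P[AO]?EN*
#   TH?IRAN?N*
#   MIT?T*
```

The first pattern is intended to cover both //autor// and //author//, the second //poen-//, //paen-// and //pen-//. Note that a ''T'' at the very end of a stem is left as it is, because the ''T'' rule only fires when another character follows.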
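==== Transform the result list with occurrences into an HTML list with live links ====

Here is the **zacr-rez.sh** script:

<code bash>
#!/bin/bash
# Jovanovic, 2012-10, transforms a list of results into live links for CroALa
# usage: ./zacr-rez.sh filename

# take argument, find file
file=${1:?usage: ./zacr-rez.sh filename}

# sort the results and wrap every "pattern =..." line in a list item;
# "..." stands for the CroALa query URL
sort "${file}" \
| sed 's#^\(.*\)\( =.*\)#<li><a href="...">\1</a>\2</li>#' > "${file}.html"
# end of script
</code>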
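
The link-building step can be tried in isolation with the sketch below. It rests on several assumptions: each result line is taken to have the form ''pattern = count'', ''BASE_URL'' is a placeholder and not the real CroALa query address, and the function name ''results_to_html'' and the surrounding ''<ul>'' element are additions made for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: BASE_URL is a placeholder, NOT the real CroALa
# query address; replace it with the actual search URL before use.
# It must not contain '#' or '&', which are special in the sed call below.
BASE_URL='https://example.org/search?word='

# Reads "pattern = count" lines (assumed format) from standard input
# and prints them, sorted, as an HTML list with one link per pattern.
results_to_html() {
  echo '<ul>'
  sort \
  | sed "s#^\(.*\)\( =.*\)#<li><a href=\"${BASE_URL}\1\">\1</a>\2</li>#"
  echo '</ul>'
}

# Sample input, invented for illustration only.
printf '%s\n' 'P[AO]?EN* = 7' 'AUTH?OR* = 12' | results_to_html
```

The pattern is pasted into the link unchanged. Characters such as ''?'', ''['' and ''*'' may need percent-encoding, depending on how the search form expects its query.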