====== Search CroALa, get back number of occurrences ======
  * Transform a list of words by stripping the endings and rewriting each stem as a PhiloLogic regex search pattern that covers orthographic variants
  * Send the list of words to CroALa and turn the results into a list giving the number of occurrences found
  * Transform the result list into an HTML list with live links that query CroALa
===== Transform a list of words, removing endings =====
This is done with a bash script and a Perl script (the Perl script uses [[http://search.cpan.org/~xern/Lingua-LA-Stemmer-0.01/Stemmer.pm|Lingua::LA::Stemmer]]).
==== zacroala.sh bash script ====
Here is the ''zacroala.sh'' script:
#!/bin/bash
# Jovanovic, 2012-10, format a list of words for CroALa orthographic search
# usage: ./zacroala.sh filename
# take argument, find file
file="$1"
./lastem.pl "${file}" \
| awk '{ print length(), $0}' \
| sort -n \
| awk '{$1=""; print $0}' \
| sed 's/ //g' \
| tr '[:lower:]' '[:upper:]' \
| tr "JY" "II" \
| tr "V" "U" \
| sed 's/[AO]E/[AO]?E/g' \
| sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' \
| sed 's/H/H?/g' \
| sed 's/T\([^TH?]\)/TH?\1/g' \
| sed 's/\(.*\)/\1*/g' - >> "${file}-zacroala"
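
To make the effect of the character rewrites easier to follow, here is a minimal sketch that applies only the ''tr'' and ''sed'' stages to words read from standard input. The function name ''to_pattern'' is made up for this illustration, and the stemming and length-sorting stages are deliberately left out, so this is not a replacement for ''zacroala.sh''.

```shell
#!/bin/bash
# Illustrative sketch only: the orthographic rewrites from zacroala.sh,
# without stemming or sorting. "to_pattern" is a hypothetical helper name.
to_pattern() {
  tr '[:lower:]' '[:upper:]' \
  | tr 'JY' 'II' \
  | tr 'V' 'U' \
  | sed 's/[AO]E/[AO]?E/g' \
  | sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' \
  | sed 's/H/H?/g' \
  | sed 's/T\([^TH?]\)/TH?\1/g' \
  | sed 's/$/*/'
}

# uppercase; J,Y -> I; V -> U; AE/OE -> [AO]?E; doubled consonant -> X?X;
# H -> H?; T before another letter -> TH?; trailing * as wildcard
printf '%s\n' caelum littera hymn | to_pattern
```

With these three sample words the sketch prints ''C[AO]?ELUM*'', ''LIT?TH?ERA*'' and ''H?IMN*'', so a single pattern covers spellings such as //caelum/coelum/celum// or //litera/littera//.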
==== lastem.pl script for stemming Latin ====
Here is the ''lastem.pl'' script, called from ''zacroala.sh'':
#!/usr/bin/perl
# lastem.pl - read in a file of latin words, turn it into an array, print out the stems
# synopsis: lastem.pl somefile
# Jovanovic, 5/10/2012
use strict;
use warnings;
use Lingua::LA::Stemmer;
use File::Slurp 'read_file';
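# give us a file:
my $fname = shift or die "usage: lastem.pl somefile\n";
# turn it into an array (one word per line) with slurp:
my @words = read_file $fname;
# drop the trailing newlines so the stemmer sees bare words:
chomp @words;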
# stem the array (hard reference...):
my $stems = Lingua::LA::Stemmer::stem(\@words);
# print the result:
print "$_ \n" for (@$stems);
===== Send the list to CroALa, process the resulting HTML =====
See the script **localcaula.sh**, [[http://gss.srce.hr/pithos/rest/njovanovic@ffzg.hr/files/labor/localcaula-sh.html|here]] ([[http://emacswiki.org/emacs/Htmlize|htmlized]]).
===== Transform the result list with occurrences into an HTML list with live links =====
Here is the **zacr-rez.sh** script:
#!/bin/bash
# Jovanovic, 2012-10, transforms a list of results into live links for CroALa
# usage: ./zacr-rez.sh filename
# take argument, find file
file="$1"
sort "${file}" \
| sed 's#^\(.*\)\( =.*\)#\1\2#g' > "${file}.html"
# end of script
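
As a rough illustration of the idea, the sketch below wraps each result line in an HTML list item with a link. It assumes input lines of the form ''PATTERN = COUNT'', which is inferred from the regular expression in the script. ''BASE_URL'' is a placeholder and **not** the real CroALa search address, and ''to_html_list'' is a hypothetical helper name. The pattern is inserted into the link unescaped; a working version would probably need to URL-encode characters such as ''?'' and ''[''.

```shell
#!/bin/bash
# Hypothetical sketch: turn "PATTERN = COUNT" lines into an HTML list of links.
# BASE_URL is a placeholder, not the actual CroALa query URL.
BASE_URL='http://example.org/search?word='

to_html_list() {
  echo '<ul>'
  # \1 = search pattern, \2 = " = COUNT"; the pattern becomes link target and link text
  sort | sed "s#^\(.*\)\( =.*\)#<li><a href=\"${BASE_URL}\1\">\1</a>\2</li>#"
  echo '</ul>'
}

printf '%s\n' 'LIT?TH?ERA* = 12' 'C[AO]?ELUM* = 40' | to_html_list
```

The counts in the sample input are invented for the example. Each output line keeps the count visible next to the link, so the page can be scanned for frequent and rare forms before clicking through.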