Search CroALa, get back the number of occurrences

  • Transform a list of words by removing their endings, and prepare it for PhiloLogic regex search (to cover orthographic variants)
  • Send the list of words to CroALa and transform the results into a list with the number of occurrences found
  • Transform the result list into an HTML list with live links that query CroALa

Transform a list of words, removing endings

This is done with a Bash script, zacroala.sh, which calls a Perl script, lastem.pl; the Perl script does the stemming with the Lingua::LA::Stemmer module.

zacroala.sh bash script

Here is the zacroala.sh script:

#!/bin/bash
# Jovanovic, 2012-10, format a list of words for CroALa orthographic search
# usage: ./zacroala.sh filename
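# the file name is the only argument; stop with a usage message if it is missing
file="$1"
[ -n "$file" ] || { echo "usage: $0 filename" >&2; exit 1; }
# stem the words with lastem.pl, then sort the stems by length, shortest first
./lastem.pl "$file" |
  awk '{ print length(), $0 }' | sort -n | awk '{ $1 = ""; print $0 }' |
  # remove spaces, switch to upper case, spell J and Y as I and V as U
  sed 's/ //g' | tr '[:lower:]' '[:upper:]' | tr 'JY' 'II' | tr 'V' 'U' |
  # AE, OE and plain E are interchangeable: [AO]?E
  sed 's/\([AO]\)E/[AO]?E/g' |
  # a doubled consonant may also be written single: T?T
  sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' |
  # H is optional: H?
  sed 's/H/H?/g' |
  # T may also be written TH (skipped before T, H or ?, and at the end of a stem)
  sed 's/T\([^TH?]\)/TH?\1/g' |
  # add a wildcard at the end of every pattern; results are appended to filename-zacroala
  sed 's/$/*/' >> "$file-zacroala"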
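To see what the orthographic rules do to a single stem, the transformation part of zacroala.sh (everything after the length sort) can be wrapped in a small Bash function. This is a sketch for experimenting, not part of the original toolchain: the name croala_pattern is hypothetical, and the function skips the stemming and sorting steps, so it expects an already stemmed word.

```bash
#!/bin/bash
# Hypothetical helper: applies the orthographic rules of zacroala.sh
# to one already stemmed word and prints the CroALa search pattern.
croala_pattern() {
  printf '%s\n' "$1" |
    tr '[:lower:]' '[:upper:]' |            # work in upper case
    tr 'JY' 'II' |                          # J and Y are spelled as I
    tr 'V' 'U' |                            # V is spelled as U
    sed 's/[AO]E/[AO]?E/g' |                # AE, OE or plain E
    sed 's/\([BCDFGHLMNPRST]\)\1/\1?\1/g' | # doubled consonant may be single
    sed 's/H/H?/g' |                        # H is optional
    sed 's/T\([^TH?]\)/TH?\1/g' |           # T may also be written TH
    sed 's/$/*/'                            # wildcard at the end
}

croala_pattern vita       # prints UITH?A*
croala_pattern littera    # prints LIT?TH?ERA*
croala_pattern aetas      # prints [AO]?ETH?AS*
croala_pattern johannes   # prints IOH?AN?NES*
```

Note that a T at the very end of a stem is left as it is, because the TH rule needs a following character to match.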

lastem.pl script for stemming Latin

Here is lastem.pl, which is called from zacroala.sh:

#!/usr/bin/perl
# lastem.pl - read in a file of Latin words, one per line, and print out their stems
# synopsis: lastem.pl somefile
# Jovanovic, 5/10/2012
use strict;
use warnings;
use Lingua::LA::Stemmer;
use File::Slurp 'read_file';

# the input file is the first command-line argument:
my $fname = shift or die "usage: $0 filename\n";
# read the file into an array, one line per element:
my @words = read_file $fname;
# strip the trailing newlines and skip empty lines, so only the words reach the stemmer:
chomp @words;
@words = grep { length } @words;
# stem the array (the stemmer takes and returns an array reference):
my $stems = Lingua::LA::Stemmer::stem(\@words);
# print one stem per line:
print "$_\n" for @$stems;
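If Lingua::LA::Stemmer is not at hand, the idea of "removing endings" can be illustrated with a crude suffix stripper in Bash. This is a swapped-in technique, not the algorithm Lingua::LA::Stemmer uses: the list of endings is an assumption chosen for the example, the function name crude_stem is hypothetical, and the lowercase expansion needs Bash 4 or later.

```bash
#!/bin/bash
# Crude Latin suffix stripper, for illustration only: the ending list is an
# assumption and is NOT what Lingua::LA::Stemmer does internally.
crude_stem() {
  local w=${1,,}   # lowercase the word (needs bash 4+)
  local s
  # longer endings are listed first, so "orum" wins over "um"
  for s in ibus orum arum us um ae is os as am em a e i o; do
    # strip an ending only if at least two letters of stem remain
    if [[ $w == *"$s" ]] && (( ${#w} >= ${#s} + 2 )); then
      printf '%s\n' "${w%"$s"}"
      return
    fi
  done
  printf '%s\n' "$w"   # no known ending: print the word unchanged
}

crude_stem dominorum   # prints domin
crude_stem ROSAE       # prints ros
```

To process a whole file with one word per line: `while IFS= read -r w; do [ -n "$w" ] && crude_stem "$w"; done < words.txt`. Its stems will differ from those of the real stemmer.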

Send the list to CroALa, process the resulting HTML

See the script localcaula.sh, here (htmlized).
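localcaula.sh itself is not reproduced on this page. As a sketch of the first half of that step, the Bash functions below build a CroALa query URL for one pattern, percent-encoding the characters ?, *, [ and ] that the patterns contain. The search3t address and its parameters are the ones used in zacr-rez.sh; the function names are hypothetical, the encoder handles ASCII only, and the 2012 address may no longer respond. Fetching the page (for example with curl) and extracting the count depend on the result HTML, which is not shown here, so they are left out.

```bash
#!/bin/bash
# Hypothetical helpers: build a CroALa (PhiloLogic) query URL for one pattern.

# percent-encode a string character by character; unreserved characters
# pass through unchanged (sketch: correct for ASCII input only)
urlencode() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# the search3t address and parameters as used in zacr-rez.sh
croala_url() {
  printf 'http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala&word=%s&OUTPUT=TF\n' \
    "$(urlencode "$1")"
}

croala_url 'UITH?A*'
# prints http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala&word=UITH%3FA%2A&OUTPUT=TF
```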

Transform the result list with occurrences into an HTML list with live links

Here is the zacr-rez.sh script:

#!/bin/bash
# Jovanovic, 2012-10, transforms a list of results into live links for CroALa
# usage: ./zacr-rez.sh filename
# expects lines like: PATTERN = count (the text before " =" becomes the link)
file="$1"
sort "$file" \
| sed 's#^\(.*\)\( =.*\)#<li><a href="http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala\&word=\1\&OUTPUT=TF">\1</a>\2#g' > "$file.html"
 
# end of script
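The output of zacr-rez.sh is a run of <li> lines without a surrounding list element, and the & separators in the links are written raw, which strict XHTML rejects. Below is a sketch of a variant that reads the results on standard input, wraps the items in <ul>, closes each <li> and writes & as &amp; inside the href. The function name results_to_html is hypothetical; the input format (pattern, " =", count) is the one the original sed expression expects, and the count in the example is made up.

```bash
#!/bin/bash
# Hypothetical variant of zacr-rez.sh: results on stdin, complete HTML list on stdout.
results_to_html() {
  echo '<ul>'
  # same substitution as zacr-rez.sh, plus &amp; and a closing </li>
  sort |
    sed 's#^\(.*\)\( =.*\)#<li><a href="http://www.ffzg.unizg.hr/klafil/croala/cgi-bin/search3t?dbname=croala\&amp;word=\1\&amp;OUTPUT=TF">\1</a>\2</li>#'
  echo '</ul>'
}

# made-up input line, for illustration
printf '%s\n' 'UITH?A* = 3' | results_to_html
```

To rank the patterns by frequency instead of alphabetically, `sort` can be replaced with `sort -t'=' -k2,2nr`.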
 
z/croala-list-occur.txt · Last modified: 28. 10. 2012. 11:28 by njovanov
 