Number XML elements with Perl

Well, this turned out to be much simpler than I thought. I had a half-finished text document that was to be transformed into XML. It contained a lot of empty pb (pagenumber) elements, like this: <pb n=""/>. I had to insert page numbers as values of @n; the numbers had to increase by 2, starting at 50 and ending at 512.

At first I got confused just trying to imagine a program that would do this, but eventually the following short script did the trick:

#!/usr/bin/perl
# stat.perl - number xml elements, starting from a given number
# usage: perl stat.perl filename startingnumber
use strict;
use warnings;

# get the values from commandline: name of file, first page number:
my ($filein, $pagenum) = @ARGV;
die "usage: perl stat.perl filename startingnumber\n"
    unless defined $filein && defined $pagenum;

# avoid "wide character in print" warning:
binmode STDOUT, ':encoding(UTF-8)';
# open $filein:
open(my $fh, '<:encoding(UTF-8)', $filein)
    or die "Could not open '$filein': $!\n";
# read every line of file, find pb:
while (my $line = <$fh>) {
    # fill the first empty pb on this line; increase the counter
    # for the next match (50 52 54...) only if something was filled:
    if ($line =~ s/(<pb n=)""/$1"$pagenum"/) {
        $pagenum += 2;
    }
    # print the line whether it was changed or not:
    print $line;
}
close($fh);
# done!
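
One limitation worth noting: the script fills at most one element per line, so two empty pb elements on the same line would not both get a number. A minimal sketch of a variant that handles this, assuming the elements are always written exactly as <pb n=""/>; the helper name number_pb and its $step parameter are made up for this illustration and are not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# number_pb (hypothetical helper): fill every empty <pb n=""/> in $text,
# starting at $start and advancing by $step for each element filled.
# Returns the new text and the next unused number.
sub number_pb {
    my ($text, $start, $step) = @_;
    my $n = $start;
    # /g visits every match; /e evaluates the replacement as Perl code,
    # so the counter can advance once per element rather than once per line.
    $text =~ s{<pb n=""/>}{
        my $filled = qq{<pb n="$n"/>};
        $n += $step;
        $filled;
    }ge;
    return ($text, $n);
}

# two empty elements on one line:
my ($out, $next) = number_pb(qq{<pb n=""/>text<pb n=""/>\n}, 50, 2);
print $out;               # <pb n="50"/>text<pb n="52"/>
print "next: $next\n";    # next: 54
```

In the line-by-line loop, the returned counter would be passed back in as the starting number for the next line.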
 
z/perl-number-elements.txt · Last modified: 12. 01. 2013. 23:30 by njovanov
 