3 Matching Annotations
- Apr 2017
-
dm.ubiquitypress.com dm.ubiquitypress.com
-
# script to process library-output # files for consistently # semicolon-delimited countries: USA, # Italy, Germany $in=shift; # take input file from command line $out = shift; # take output filename from command line open IN, $in or die "Cannot open $in for read:$!"; open OUT, ">$out" or die "Cannot open $out for write:$!"; print OUT "istc_number\tlocations\tcount\n"; while (<IN>) { $copycount=0; /^(i.\d{8})\t(.*)$/; $istc_number=$1; $locations=$2; @libraries=split /;/, $locations; foreach $library (@libraries) { while ($library=~/\((?:\D|\d+[^,])[^\(]*?\)/) { $library=~s/\((?:\D|\d+[^,])[^\(]*?\)//g; } #get rid of nested parentheses $library=~s/\((\d{1,2})[^\(]*\)/\(\1\)/g; #replace (3, 1 torn) with (3) if ($library=~/\((\d{1,2})\)/) {$copycount+=$1} else {$copycount++} } print OUT "$istc_number\t$locations\t$copycount\n"; }
Dan check.
-
$batch="istc.txt"; #name of file to search open BATCH, $batch or die "Cannot open $batch for read:$!"; while (<BATCH>) { if (/^[ ]*USA:\t(.*?)$/) { $match = $1; $hit=1; } if (/^ISTC.*(i.\d{8})/ and ($hit == 1)) { $hit = 0; $istc_number = $1; print "$istc_number\t$match\n"; } }
Check with Dan
-
# This script takes as input a # tab-delimited table of istc # numbers and imprint fields, # assumed here to be named 'imprint.txt'. # This script outputs the istc number # again as an index, followed by the # first imprint field only, then fields # containing the city and printer. Then # it outputs the years: the average # of all years in all imprint fields, # the earliest and then the latest such # year. The last column contains three # flags, either + or -. Signed cities, # printers, and dates appear as +++, # while the opposite would be ---. Years # appearing in single quotes ('1401') # have been ignored. # set imprint data file $batch="imprint.txt"; # open the file to process, or give an error # code open BATCH, $batch or die "Cannot open $batch for read:$!"; # create column titles print "istc_number\timprint\tcity\tprinter\tavg_year\tfirst_year\tlast_year\tflags\n"; while (<BATCH>) { # first, reset all variables undef @allyears; undef @sort; $firstyear=0; $lastyear=0; $avgyear = 0; $yearcount = 0; $flags='+++'; # save the input line as $record for later # use $record=$_; # get first two tab-delimited fields, the # ISTC Number and first imprint line /^(.*?)\t(.*?)\t/; $istc_number=$1; $imprint=$2; # search the imprint line for an optional # opening bracket, then the city, then a # colon, then the rest of the line $imprint=~/^(\[|)(.*?)(?:\]: |: )(.*)$/; $rightpart=$3; $city=$2; # if an opening bracket was found, flag #the city as unsigned if ($1) {substr $flags, 0, 1, "-"} # split the rest of the line by commas, # forming the array @printer @printer=split /, /, $rightpart; # fix 3 defective records: if there's # no comma found in the rest # of the line, and there's no number # to be found, add a dummy, # empty date element to array # fix defective imprint lines not # handled correctly: ip01005630 # (no year,), ic00216715 (no year,), ir00334450 if ($#printer==0 and $printer[0]!~/\d/) {push @printer, " "} # fix for two defective records with no # imprint data: print the # istc number and then skip the rest of the loop if ($record=~/^([^\t]*?)\t$/) { $istc_number=$1; print "$istc_number\n"; next; } # remove the last element of @printer array; # it's usually the date field $date = pop @printer; # fix for two deficient records containing neither # city nor printer, just dates if ($imprint !~/:/) { $date = $imprint; undef @printer; $city = ""; } # remove all brackets to test for a date; we # need to find the ca. 150 records of the # anomalous form 'City: printer, year, month # and day' $_ = $date; s/[\[\]]//g; $xdate=$_; # remove all brackets from current last element # of @printer array $ydate=@printer[-1]; $ydate=~s/[\[\]]//g; # if $date doesn't contain a year, then check # the last element of @printer; if it does, # pop it onto the front of $date if ($xdate !~/1[45]\d{2}|undated/i and $ydate=~/1[45]\d{2}/) { $date=pop(@printer).$date; $_ = $date; s/[\[\]]//g; $xdate=$_; } # now obliterate dates in single quotes regarded # as false $xdate=~s/'.*?'//g; # match a year 1400 to 1599 $xdate=~/(1[45]\d{2})/; # if we find it, use it, otherwise we have nothing # to test if ($1) {$testyear=$1} else {$testyear=""} # if we have a date to test, get the last two digits if ($testyear) {$yeardigits=substr $testyear, 2, 2} else {$yeardigits='####'} # if the last two digits are surrounded by brackets, # flag the date as unsigned. [14]94 is treated # as signed, 14[9]4 as unsigned $_ = $imprint; if (/\[[^\]]*$yeardigits[^\]]*\]|\[$yeardigits|$yeardigits\]/ or $yeardigits eq '####') { substr $flags, 2, 1, "-"; } # split the input line again on the tabs @checkdates = split /\t/, $record; # but discard the first two tabs $null=shift @checkdates; $null=shift @checkdates; # and add the date field previously identified unshift (@checkdates, $date); # this next loop extracts all years from each # imprint field in turn foreach $possibledate (@checkdates) { $_=$possibledate; # remove brackets, get rid of '1401' dates s/\[|\]|'.*?'//g; # find simple years, like 1493, 1494-, # 1498-1505 @simple_years=/(1[45]\d{2})/g; # add the years found to the list push (@allyears, @simple_years); # find dates like 1476-80 $_=$possibledate; @complex_years=/(1[45]\d{2}[\-\/]\d{2})\D/g; # first count the simple years in the next loop foreach $simpleyear(@simple_years) { $avgyear+=$simpleyear; $yearcount++; } # and add the second part to the list of years # in the following loop foreach $complexyear (@complex_years) { # find the element to split on: either - or / $split=substr($complexyear,4,1); # ignoring @temp[0], as it is already a simple_year @temp = split /$split/, $complexyear; @temp[1]=substr(@temp[0],0,2).@temp[1]; push (@allyears, @temp[1]); $avgyear+=@temp[1]; $yearcount++; } } # round to nearest year if ($yearcount) { $avgyear=int(($avgyear/$yearcount)+.5); } else { $avgyear = ""; } # now sort the years numerically @sort = sort { $a <=> $b } @allyears; $firstyear=@sort[0]; $lastyear= @sort[-1]; # put the printer back together $printer=join ', ', @printer; # add missing front or back brackets for aesthetics only $_ = $printer; if (/^[^\[]+\]/) {$printer='['.$printer} if (/\[[^\]]+$/) {$printer=$printer.']'} # now get rid of all brackets and store as $xprinter $_ = $printer; s/[\[\]]//g; # if the printer is enclosed in brackets, or begins with a # bracket, flag as unsigned $xprinter=$_; if ($imprint=~/\[[^\]]*\Q$xprinter\E[^\]]*/ or $printer=~/^\[/) { substr $flags, 1, 1, "-"; } # output the information and continue on to the next # record print "$istc_number\t$imprint\t$city\t$xprinter\t"; print "$avgyear\t$firstyear\t$lastyear\t$flags\n"; }
Check this with Dan.
-