{"id":297,"date":"2007-08-12T17:48:53","date_gmt":"2007-08-12T22:48:53","guid":{"rendered":"http:\/\/www.mccambridge.org\/blog\/2007\/08\/playing-with-perl\/"},"modified":"2022-09-11T00:40:41","modified_gmt":"2022-09-11T00:40:41","slug":"playing-with-perl","status":"publish","type":"post","link":"http:\/\/www.mccambridge.org\/blog\/2007\/08\/playing-with-perl\/","title":{"rendered":"Playing with PERL"},"content":{"rendered":"
Warning: The following content is of a ridiculously nerdy nature, and probably unsuited for most of the viewing audience. That said, I had a lot of fun writing it, so here it is \ud83d\ude42<\/em><\/p>\n My roommate Dave IMed me the other day with a problem: He needed a program in 30 minutes that would search through a text file for any occurrences of a list of CAPITALIZED words, and convert them to lowercase, wherever they occurred.<\/p>\n I received his message 20 minutes later, set to work, and in the last 9 minutes, developed a script for him, along with an extensive test case. PERL, of course, was born to solve this problem.<\/p>\n Here’s the test case I made up, and the correct output, to give an idea of what I needed to do:<\/p>\n input.txt<\/p>\n There once was a Chicken named EGgS. output.txt<\/p>\n There once was a chicken named eggs. (I later realized this missed one rather important case… BUG:<\/strong> see if you can figure out what it is, and what the error in the first three drafts below is)<\/p>\n Here’s the first draft, which works correctly (nearly… see bug note above) and was submitted within the prescribed 30 minutes :-):<\/p>\n munge1.pl<\/p>\n But then I thought to myself… “Self, this is PERL. Surely there is a shorter way?” munge2.pl<\/p>\n Better, shorter, PERL-ier \ud83d\ude42<\/p>\n But still not really PERL. I mean, come on. There were 3 entire statements there. Laaame.<\/p>\n So I played a bit, and moved the first map around to compact two statements into one (admittedly, the map is just there to make sure the user’s list of TERMS to lowercase is *actually* lowercase, but I wanted to keep that bit of functionality):<\/p>\n munge3.pl<\/p>\n Nice. But still not PERL-y. \ud83d\ude09 munge.pl<\/p>\n Ahhh… that<\/em> is PERL :-). One line of file-munging goodness. 
Use only as directed:<\/p>\n \/usr\/bin\/env perl munge.pl input.txt > output.txt<\/p>\n Note:<\/strong> I know<\/em> this is not good coding practice, it was just fun to reduce to as short of a program as possible. And there’s something to be said for brevity, as well. (… is the soul of wit…)<\/em><\/p>\n If you’re concerned about the error-checking of said code, tack this on the end \ud83d\ude00<\/p>\n or die “Caught teh 3rrorz!!1”; # \ud83d\ude09<\/p>\n","protected":false},"excerpt":{"rendered":" Warning: The following content is of a ridiculously nerdy nature, and probably unsuited for most of the viewing audience. That said, I had a lot of fun writing it, so here it is \ud83d\ude42 My roommate Dave IMed me the other day with a problem: He needed a program in 30 minutes that would search […]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts\/297"}],"collection":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/comments?post=297"}],"version-history":[{"count":1,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts\/297\/revisions"}],"predecessor-version":[{"id":1633,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts\/297\/revisions\/1633"}],"wp:attachment":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/media?parent=297"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/categories?post=297"},{"taxonomy":"post_tag","embeddable":true,"href
":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/tags?post=297"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
\nIt lived STRINGS in a barn.
\nThe CHICKEN was afraid of EGGSNITCHERS.
\nChicken likes Eggs served on STRINGS<\/p>\n
\nIt lived strings in a barn.
\nThe chicken was afraid of EGGSNITCHERS.
\nchicken likes eggs served on strings<\/p>\n#!\/usr\/bin\/env perl\nuse strict;\nmy @patterns = (\n\"chicken\",\n\"eggS\",\n\"strings\"\n);\n\n# Make sure user used lowercase\nmap { tr\/[A-Z]\/[a-z]\/; } @patterns;\n\nmy $input_file = $ARGV[0] or die \"Usage: go.pl \\n\";\nopen (FH, $input_file) or die \"Could not read from file: $input_file\\n\";\n\nwhile (my $line = <FH>) {\nforeach (@patterns) {\n$line =~ s\/^$_(\\W)\/$_$1\/i;\n$line =~ s\/(\\W)$_$\/$1$_\/i;\n$line =~ s\/(\\W)$_(\\W)\/$1$_$2\/i;\n}\nprint $line;\n}\n\nclose FH;\n\nexit 0;\n<\/pre>\n
\nRemoving some “useless” error-checking and file parsing code in favor of a shell-out, I came up with this:<\/p>\n#!\/usr\/bin\/env perl\nmy @patterns = (\n\"chicken\",\n\"eggS\",\n\"strings\"\n);\n\nmap { tr\/[A-Z]\/[a-z]\/; } @patterns;\n\nmap {\nforeach $a (@patterns) {\ns\/^$a(\\W)\/$a$1\/i;\ns\/(\\W)$a$\/$1$a\/i;\ns\/(\\W)$a(\\W)\/$1$a$2\/i;\n}\nprint;\n} `cat $ARGV[0]`;\n<\/pre>\n
map { tr\/[A-Z]\/[a-z]\/; } (@patterns = (\"chicken\",\"eggS\",\"strings\"));\nmap { foreach $a (@patterns) { s\/^$a(\\W)\/$a$1\/i; s\/(\\W)$a$\/$1$a\/i; s\/(\\W)$a(\\W)\/$1$a$2\/i; } print; } `cat $ARGV[0]`;<\/pre>\n
\nThen it hit me: why am I wasting an entire statement to create an array that I will only use in one other statement? Oh, and while we’re at it, let’s cut those 3 regexps down to 1, courtesy of a good insight from Jason. (Use \\b to match word boundaries; \\B is its opposite and matches everywhere else.) AND, in a throw-back to my early days of escaping URL strings in CGI (like: s\/(\\W)\/sprintf(\"%%%02x\",ord($1))\/eg<\/code>) let’s move the lowercase-ifying inside the regexp as well, eliminating the first map altogether.<\/p>\n
map { foreach $b(\"chicken\",\"eggS\",\"strings\"){s\/\\b$b\\b\/lc $b\/ieg;} print;} `cat $ARGV[0]`;<\/pre>\n
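For anyone who wants to poke at the one-liner without squinting, here is a minimal, spelled-out sketch of the same idea. It is not part of the original script: the lowercase_terms helper and the sample sentence are made up for illustration, and the \Q...\E quoting is an extra precaution that assumes the terms are plain words rather than regexps.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): lowercase every
# whole-word, case-insensitive occurrence of each term in $text.
sub lowercase_terms {
    my ($text, @terms) = @_;
    for my $term (@terms) {
        # \b on both sides leaves EGGSNITCHERS alone.
        # \Q...\E treats the term as literal text, not as a regexp.
        # /i ignores case, /e evaluates "lc $term" as code, and
        # /g replaces every occurrence rather than only the first.
        $text =~ s/\b\Q$term\E\b/lc $term/ieg;
    }
    return $text;
}

# Made-up sample line, in the spirit of input.txt above.
print lowercase_terms(
    "The CHICKEN saw EGGS, Eggs and EGGSNITCHERS.\n",
    "chicken", "eggS", "strings",
);
# prints: The chicken saw eggs, eggs and EGGSNITCHERS.
```

Same regexp trick as munge.pl, just with strict turned on and the loop variable given a name that is not also one of sort's special variables.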