{"id":297,"date":"2007-08-12T17:48:53","date_gmt":"2007-08-12T22:48:53","guid":{"rendered":"http:\/\/www.mccambridge.org\/blog\/2007\/08\/playing-with-perl\/"},"modified":"2022-09-11T00:40:41","modified_gmt":"2022-09-11T00:40:41","slug":"playing-with-perl","status":"publish","type":"post","link":"http:\/\/www.mccambridge.org\/blog\/2007\/08\/playing-with-perl\/","title":{"rendered":"Playing with PERL"},"content":{"rendered":"<p><em>Warning: The following content is of a ridiculously nerdy nature, and probably unsuited for most of the viewing audience.  That said, I had a lot of fun writing it, so here it is \ud83d\ude42<\/em><\/p>\n<p>My roommate Dave IMed me the other day with a problem:  He needed a program in 30 minutes that would search through a text file for any occurrences of a list of CAPITALIZED words, and convert them to lowercase, wherever they occurred.<\/p>\n<p>I received his message 20 minutes later, set to work, and in the last 9 minutes, developed a script for him, along with an extensive test case.  
PERL, of course, was born to solve this problem.<\/p>\n<p>Here&#8217;s the test case I made up, and the correct output, to get an idea of what I needed to do:<\/p>\n<p>input.txt<\/p>\n<p class=\"code\">There once was a Chicken named EGgS.<br \/>\nIt lived STRINGS in a barn.<br \/>\nThe CHICKEN was afraid of EGGSNITCHERS.<br \/>\nChicken likes Eggs served on STRINGS<\/p>\n<p>output.txt<\/p>\n<p class=\"code\" lang=\"text\">There once was a chicken named eggs.<br \/>\nIt lived strings in a barn.<br \/>\nThe chicken was afraid of EGGSNITCHERS.<br \/>\nchicken likes eggs served on strings<\/p>\n<p>(I later realized this missed one rather important case&#8230; <strong>BUG:<\/strong> see if you can figure out what it is, and what the error in the first three drafts below is)<\/p>\n<p>Here&#8217;s the first draft, which works (nearly&#8230;see bug note above) correctly and was submitted within the prescribed 30 minutes :-):<\/p>\n<p>munge1.pl<\/p>\n<pre class=\"code\" lang=\"perl\">#!\/usr\/bin\/env perl\nuse strict;\nmy @patterns = (\n\"chicken\",\n\"eggS\",\n\"strings\"\n);\n\n# Make sure user used lowercase\nmap { tr\/[A-Z]\/[a-z]\/; } @patterns;\n\nmy $input_file = $ARGV[0] or die \"Usage: go.pl \\n\";\nopen (FH, $input_file) or die \"Could not read from file: $input_file\\n\";\n\nwhile (my $line = &lt;FH&gt;) {\nforeach (@patterns) {\n$line =~ s\/^$_(\\W)\/$_$1\/i;\n$line =~ s\/(\\W)$_$\/$1$_\/i;\n$line =~ s\/(\\W)$_(\\W)\/$1$_$2\/i;\n}\nprint $line;\n}\n\nclose FH;\n\nexit 0;\n<\/pre>\n<p>But then I thought to myself&#8230; &#8220;Self, this is PERL.  
Surely there is a shorter way?&#8221;<br \/>\nRemoving some &#8220;useless&#8221; error-checking and file parsing code in favor of a shell-out, I came up with this:<\/p>\n<p>munge2.pl<\/p>\n<pre class=\"code\" lang=\"perl\">#!\/usr\/bin\/env perl\nmy @patterns = (\n\"chicken\",\n\"eggS\",\n\"strings\"\n);\n\nmap { tr\/[A-Z]\/[a-z]\/; } @patterns;\n\nmap {\nforeach $a (@patterns) {\ns\/^$a(\\W)\/$a$1\/i;\ns\/(\\W)$a$\/$1$a\/i;\ns\/(\\W)$a(\\W)\/$1$a$2\/i;\n}\nprint;\n} `cat $ARGV[0]`;\n<\/pre>\n<p>Better, shorter, PERL-ier \ud83d\ude42<\/p>\n<p>But still not really PERL.  I mean, come on.  There were 3 entire statements there.  Laaame.<\/p>\n<p>So I played a bit, and moved the first map around to compact two statements into one (admittedly, the map is just there to make sure the user&#8217;s list of TERMS to lowercase is *actually* lowercase, but I wanted to keep that bit of functionality):<\/p>\n<p>munge3.pl<\/p>\n<pre class=\"code\" lang=\"perl\">map { tr\/[A-Z]\/[a-z]\/; } (@patterns =  (\"chicken\",\"eggS\",\"strings\"));\nmap { foreach $a (@patterns) { s\/^$a(\\W)\/$a$1\/i; s\/(\\W)$a$\/$1$a\/i; s\/(\\W)$a(\\W)\/$1$a$2\/i; } print; } `cat $ARGV[0]`;<\/pre>\n<p>Nice.  But still not PERL-y. \ud83d\ude09<br \/>\nThen it hit me: why am I wasting an entire statement to create an array that I will only use in one other statement?  Oh, and while we&#8217;re at it, let&#8217;s cut those 3 regexps down to 1, courtesy of a good insight from Jason.  (Use \\b to match word boundaries.) AND, in a throw-back to my early days of escaping URL strings in CGI (like: <code>s\/(\\W)\/sprintf(\"%%%02x\",ord($1))\/eg<\/code>) let&#8217;s move the lowercase-ifying inside the regexp as well, eliminating the first map altogether.<\/p>\n<p>munge.pl<\/p>\n<pre class=\"code\" lang=\"perl\">map { foreach $b(\"chicken\",\"eggS\",\"strings\"){s\/\\b$b\\b\/lc $b\/ieg;} print;} `cat $ARGV[0]`;<\/pre>\n<p>Ahhh&#8230; <em>that<\/em> is PERL :-).  One line of file-munging goodness.  
Use only as directed:<\/p>\n<p class=\"code\">\/usr\/bin\/env perl munge.pl input.txt &gt; output.txt<\/p>\n<p><strong>Note:<\/strong> I <em>know<\/em> this is not good coding practice, it was just fun to reduce to as short of a program as possible.  And there&#8217;s something to be said for brevity, as well.  <em>(&#8230; is the soul of wit&#8230;)<\/em><\/p>\n<p>If you&#8217;re concerned about the error-checking of said code, tack this on the end \ud83d\ude00<\/p>\n<p class=\"code\" lang=\"perl\">or die &#8220;Caught teh 3rrorz!!1&#8221;;  # \ud83d\ude09<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Warning: The following content is of a ridiculously nerdy nature, and probably unsuited for most of the viewing audience. That said, I had a lot of fun writing it, so here it is \ud83d\ude42 My roommate Dave IMed me the other day with a problem: He needed a program in 30 minutes that would search [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts\/297"}],"collection":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/comments?post=297"}],"version-history":[{"count":1,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts\/297\/revisions"}],"predecessor-version":[{"id":1633,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/posts\/297\/revisions\/1633"}],"wp:attachment":[{"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/media?parent=297"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"ht
tp:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/categories?post=297"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.mccambridge.org\/blog\/wp-json\/wp\/v2\/tags?post=297"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}