Wednesday, March 25, 2015

csveach: Simple, Yet Powerful CSV File Processing On The Command Line

Ahhh yes, the CSV format: it looks so easy to process. Yet an embedded newline here, a stray quote there, and it becomes an awful pain to work with. To that end, I give you csveach, a small perl script that attempts to make working with CSV files on the command line headache free.

Here's how it works. Suppose you've got some data:

# file: foo.csv
Alice,555-1212,"To Be,
Or not to Be,
That is the question!"
Bob,546-7777,"Better to keep your mouth shut and be thought a fool,
then to open it and remove all doubt."
Charlie,930-9808,"Faster.
Just run Faster."

(Note the annoying embedded newlines)

First, you craft a short shell script that's going to get executed for each row in the CSV file. Each column is handed to the script as a parameter. $1 is the row number in the CSV file, $2 is the first column, $3 the second, and so on. Here's a trivial shell script that would extract just the name and phone number from the above:

# file: name.sh
#!/bin/bash

echo "$2: $3"

Finally, you run:

  csveach ./name.sh foo.csv

Which gives the output:

Alice: 555-1212
Bob: 546-7777
Charlie: 930-9808

The shell script can just as easily work with the newline embedded text. For example:

# file: quote.sh
#!/bin/bash

quote=`echo "$4" | tr '\n' '|'`
len=`echo $quote  | wc -c`
echo "$2's $len words of wisdom: $quote"

This can be run as:

 csveach ./quote.sh foo.csv

and gives the output:

Alice's 44 words of wisdom: To Be,|Or not to Be,|That is the question!|
Bob's 93 words of wisdom: Better to keep your mouth shut and be thought a fool,|then to open it and remove all doubt.|
Charlie's 26 words of wisdom: Faster.|Just run Faster.|

By letting a shell script (or really any executable) do the heavy lifting, it's possible to reformat or process CSV data any way you want. And best of all, you can ignore all those exceptional cases that make CSV such a pain.

Enjoy!

#!/bin/perl

##
## For each row of a CSV file kick off a system command
##

use Text::CSV;

$script = shift(@ARGV);

my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
my $row_num = 0;

foreach my $file (@ARGV) {
  open my $io, "<", $file or die "$file: $!";

  while (my $row = $csv->getline ($io)) {
    $row_num++;
    my @args = @$row;
    unshift(@args, $row_num);
    (system $script, @args) == 0 or die "system @command failed: $?";
  }
}

2 comments: