Well, I hate to do this, but I think I basically solved your problem. I got so wrapped up in trying to find the best way that I ended up going through the code and just writing it down. I also used a Perl IDE (Komodo), which helped with debugging most of the additional cases.
First, I split each line on the commas. The problem is that a quoted field can have commas inside it. So I look for an opening quote, and if I find one, I figure out which piece holds the closing quote. Then I print everything in between the quotes, followed by a |, and move on. For regular unquoted "stuff", I just print it followed by a | (except at the end of the line, where a trailing | would be silly).
I know there are cases it won't cover where the data is malformed and all freaked out (a doubled quote inside a quoted field, for instance). Give it a shot and let me know if it does what you were really looking for. Here is the sample input I used:
input wrote:
"i like cheese",cheese,is,good
"well, maybe",cheese,is,the,best
cheese,tastes,yummy
what,"are you, up to",cheese,man
today,is,"cheese, cheese, day",foo
today,is,"cheese, cheese, day"
whoops,"i, forgot, my, quote
i have,"two quotes",i have,"two quotes",ok
And here is the output:
output wrote:
i like cheese|cheese|is|good
well, maybe|cheese|is|the|best
cheese|tastes|yummy
what|are you, up to|cheese|man
today|is|cheese, cheese, day|foo
today|is|cheese, cheese, day
whoops|i, forgot, my, quote
i have|two quotes|i have|two quotes|ok
Here's the source:
Code:
#!/usr/bin/perl
open(INFILE,"<infile") or die "cannot open infile: $!";
open(OUTFILE,">outfile") or die "cannot open outfile: $!";
while(<INFILE>)
{
    my $line = $_;
    chomp($line);
    my @commas = split(/,/,$line);
    for(my $i = 0; $i <= $#commas; $i++)
    {
        if($commas[$i] =~ /"/)
        {
            # strip off the opening quote so we can find out if it
            # is only one item
            $commas[$i] =~ s/"//;
            my $j = $i;
            while($commas[$j] !~ /"/)
            {
                # just keep lookin
                if($j >= $#commas)
                {
                    print "i could not find a matching quote, this line is borked! ";
                    print "i am going to assume it should have been at the end.\n";
                    $j = $#commas;
                    # add the missing closing quote so the loop stops
                    $commas[$j] .= '"';
                }
                else
                {
                    # put a comma back since split removed it
                    $commas[$j] .= ",";
                    $j++;
                }
            }
            # we know i is the start, and j is the end,
            # so everything in between is only one item.
            $commas[$j] =~ s/"//;
            for(my $start = $i; $start <= $j; $start++)
            {
                print OUTFILE $commas[$start];
                if($start == $j && $j != $#commas)
                {
                    print OUTFILE "|";
                }
            }
            # skip i ahead to j, since we already got all that
            $i = $j;
        }
        else
        {
            print OUTFILE $commas[$i];
            print OUTFILE "|" unless $i == $#commas; # don't print one at the end
        }
    }
    print OUTFILE "\n";
}
close(INFILE);
close(OUTFILE);
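For comparison, here is a rough sketch of a different way to get the same result: instead of splitting on commas and gluing the quoted pieces back together, walk the line one character at a time and keep track of whether you are inside quotes. This is not the script above. The helper name split_quoted_line is made up for illustration, and it assumes quotes are never escaped or doubled inside a field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): split one line into
# fields by scanning it a character at a time.
# Assumption: quotes are never escaped or doubled inside a field.
sub split_quoted_line {
    my ($line) = @_;
    my @fields   = ('');
    my $in_quote = 0;
    for my $ch (split //, $line) {
        if ($ch eq '"') {
            $in_quote = !$in_quote;   # toggle the state and drop the quote
        }
        elsif ($ch eq ',' && !$in_quote) {
            push @fields, '';         # an unquoted comma starts a new field
        }
        else {
            $fields[-1] .= $ch;       # anything else belongs to the current field
        }
    }
    # like the script above, assume a missing quote belongs at the end
    warn "no matching quote, assuming it closes at end of line: $line\n"
        if $in_quote;
    return @fields;
}

# two of the sample lines from the input above
for my $line ('what,"are you, up to",cheese,man',
              'whoops,"i, forgot, my, quote') {
    print join('|', split_quoted_line($line)), "\n";
}
```

On those two lines it prints the same thing as the sample output (plus a warning on STDERR for the second one). One difference to be aware of: split(/,/) in the script above silently drops empty fields at the end of a line, while this version keeps them, so a line ending in a comma gets a trailing |.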