csv.sh

parse CSV files with pure POSIX shell!

CSV files have weird quoting rules, so awk or cut won't cut it on their own. but we can convert CSV into a format that shell utilities handle well:

the short way

(see csv-min.sh)

to convert from a csv file to lines of tab-separated values:

LC_ALL=C sed -n 's/'"$(printf "\r")"'$//;s/\\/\\\\/g;s/'"$(printf "\t")"'/\\t/g;H;x;h;s/^\n//;s/\n/\\n/g;s/,/,,/g;s/$/,/;s/^/,/;s/,\([^",]*\("[^"]*\(""[^"]*\)*"[^",]*\)*\),/\1'"$(printf "\t")"'/g;/,$/d;s/.$//;s/,,/,/g;s/"\([^"]*\(""[^"]*\)*\)"/\1/g;s/""/"/g;p;s/.*//;h'

tabs, newlines, and backslashes are escaped into \t, \n, and \\, respectively.

foo,bar,"baz ""quuz"" \etc"

foo[TAB]bar[TAB]baz "quuz" \\etc

now you can parse with regular shell tools:

  • cut -f2
  • awk -F'\t' '{print $2 + $3}'
  • etc.
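
a minimal sketch of those tools on some made-up rows, plus one way to turn the escapes in a single field back into real characters. the sample data and the helper name unescape are assumptions for illustration only.

```shell
# hypothetical sample rows, already in the escaped tab-separated form.
rows=$(printf 'widget\t2\t3\ngadget\t5\t7')

# second column of every row.
printf '%s\n' "$rows" | cut -f2

# sum of columns 2 and 3 for every row.
printf '%s\n' "$rows" | awk -F'\t' '{print $2 + $3}'

# unescape: hypothetical helper that turns \t, \n and \\ inside one field
# back into a literal tab, newline and backslash, using printf's %b.
unescape() {
	printf '%b' "$1"
}

unescape 'baz "quuz" \\etc'
```

cut prints 2 and 5, awk prints 5 and 12, and unescape prints baz "quuz" \etc. %b is safe here because every backslash in the converted data is either doubled or part of \t or \n, so no other escape sequence can show up.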

to convert back to CSV:

LC_ALL=C sed 's/"/""/g;s/'"$(printf "\t")"'/","/g;s/^/"/;s/$/"/;s/\\\\/& /g;s/\\n/\n/g;s/\\t/'"$(printf "\t")"'/g;s/\\\\ /\\/g;'

(this program doesn't output CR LF line endings, but you can add them by piping the result through: sed 's/$/'"$(printf "\r")"'/')
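
as a rough check of the way back, this sketch wraps the command above in a function (tsv_to_csv is a made-up name for this example) and runs the tab-separated example line through it, once as is and once with CR LF endings added. fields containing \n depend on sed turning \n in a replacement into a newline, which GNU sed does; other seds may differ.

```shell
# tsv_to_csv: hypothetical wrapper around the command above.
# reads escaped tab-separated lines on stdin, writes CSV on stdout.
tsv_to_csv() {
	LC_ALL=C sed 's/"/""/g;s/'"$(printf "\t")"'/","/g;s/^/"/;s/$/"/;s/\\\\/& /g;s/\\n/\n/g;s/\\t/'"$(printf "\t")"'/g;s/\\\\ /\\/g;'
}

# the tab-separated line from the earlier example, fed back through.
printf 'foo\tbar\tbaz "quuz" \\\\etc\n' | tsv_to_csv

# a second line, with CR LF endings added by the extra sed stage.
printf 'foo\tbar\n' | tsv_to_csv | sed 's/$/'"$(printf "\r")"'/'
```

the first command prints "foo","bar","baz ""quuz"" \etc". note that every field comes back quoted, so the result is equivalent CSV rather than a byte-for-byte copy of the original file.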

what?

see csv.sh.

disclaimer

you shouldn't trust yourself to verify a CSV parser, let alone trust me to write one!

CSV is an amalgamation of formats, loosely described by RFC 4180. i try to be slightly more lenient than RFC 4180, and i tried my parser on output from a variety of programs, but i don't guarantee correctness for weird files.

license

made by Natalia Posting in 2023.

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.

THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.