| field | value | date |
|---|---|---|
| author | equa <equaa@protonmail.com> | 2023-07-27 16:45:21 -0400 |
| committer | equa <equaa@protonmail.com> | 2023-07-27 16:46:22 -0400 |
| commit | 1cdf6c38216e47efd2884cb43fcd5239a876e588 | |
| tree | 4458b2d92365a7222e9cd63cbae61d9b358f0b1d /README.md | |
| parent | 77ec331c7dacad6b5028783288eb705bdb9ad22e | |
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 15 |
1 files changed, 14 insertions, 1 deletions
````diff
diff --git a/README.md b/README.md
index 806cb1d..91972e3 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,14 @@ parse CSV files with pure POSIX shell!
 
+CSV files have weird quoting rules, so parsing them with `awk` or `cut`
+won't cut it on its own. but we can convert them to a format that shell utilities like:
+
 ## the short way (see `csv-min.sh`)
 
-to convert from csv to tab-separated strings:
+to convert from a csv file to lines of tab-separated values:
 
 ```
 LC_ALL=C sed -n 's/'"$(printf "\r")"'$//;s/\\/\\\\/g;s/'"$(printf "\t")"'/\\t/g;H;x;h;s/^\n//;s/\n/\\n/g;s/,/,,/g;s/$/,/;s/^/,/;s/,\([^",]*\("[^"]*\(""[^"]*\)*"[^",]*\)*\),/\1'"$(printf "\t")"'/g;/,$/d;s/.$//;s/,,/,/g;s/"\([^"]*\(""[^"]*\)*\)"/\1/g;s/""/"/g;p;s/.*//;h'
@@ -14,6 +17,16 @@ LC_ALL=C sed -n 's/'"$(printf "\r")"'$//;s/\\/\\\\/g;s/'"$(printf "\t")"'/\\t/g;
 
 tabs, newlines, and backslashes are escaped into `\t`, `\n`, and `\\`, respectively.
 
+> → `foo,bar,"baz ""quuz"" \etc"`
+
+> ← `foo[TAB]bar[TAB]baz "quuz" \\etc`
+
+now you can parse with regular shell tools:
+
+- `cut -f2`
+- `awk -F'\t' '{print $2 + $3}'`
+- etc.
+
 to convert back to CSV:
 
 ```
````
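a minimal usage sketch of the sed one-liner added in this diff, assuming a POSIX shell with `sed`, `cut`, and `printf`. `csv_to_tsv` and `decode_field` are hypothetical helper names for illustration, not functions from this repo. the sed script is the one from the diff. decoding a field with `printf '%b'` is an assumption of this sketch: it works only because the encoder doubles every literal backslash, so the only escapes left in a field are `\\`, `\t`, and `\n`.

```shell
#!/bin/sh

# hypothetical wrapper: the README's csv -> tsv sed script, reading stdin or files
csv_to_tsv() {
	LC_ALL=C sed -n 's/'"$(printf "\r")"'$//;s/\\/\\\\/g;s/'"$(printf "\t")"'/\\t/g;H;x;h;s/^\n//;s/\n/\\n/g;s/,/,,/g;s/$/,/;s/^/,/;s/,\([^",]*\("[^"]*\(""[^"]*\)*"[^",]*\)*\),/\1'"$(printf "\t")"'/g;/,$/d;s/.$//;s/,,/,/g;s/"\([^"]*\(""[^"]*\)*\)"/\1/g;s/""/"/g;p;s/.*//;h' "$@"
}

# hypothetical decoder: turns \\, \t, \n in one field back into a backslash,
# a tab, and a newline. printf %b is safe here only because every original
# backslash was doubled, so no other escape sequence can appear.
decode_field() {
	printf '%b' "$1"
}

# a quoted field with doubled quotes and a backslash
printf '%s\n' 'foo,bar,"baz ""quuz"" \etc"' | csv_to_tsv | cut -f3
# prints: baz "quuz" \\etc

# a quoted field containing a newline spans two input lines but comes out
# as one tsv line, with the newline escaped as \n
field=$(printf 'a,"b\nc"\n' | csv_to_tsv | cut -f2)
decode_field "$field"; echo
# prints b and c on separate lines
```

note that `$(...)` strips trailing newlines, so a decoded field that ends in a newline loses it when captured this way.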