powershell - Merge bunch of CSVs

published-date: 02 Feb 2023 19:45 +0700
categories: dumber-day-by-day snippets
tags: powershell

Preface

CSV is a tabular data exchange format and is used dangerously extensively by data scientists, engineers, or analyst alike. Looking at the wiki, its feature is actually quite limited as a format for exchanging tabular data in plain-text. There are various alternatives such as to xml which supports schema, json with their inherit javascript object semantics, and various other formats which I don’t mention because I’m too tired of writing preface just to fill the post summary and is just hunk of bs at this point. sorry.

There might be times where you’d need to join bunch of csv fils with common headers, and copy-pasting everything or building a parsing and writer script is waay too much to ask. Turns out, powershell do have built-in command just to do CSV utility command!

First and Foremost

Assuming you’re using windows, of course. And powershell at least version 5.1

One Liners Snippet

Get-ChildItem "path/to/folder" -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv "path/to/merge.csv" -NoTypeInformation -Append

The cmdlet pipes that took place are:

  1. Get-ChildItem iterates over a directory, if flag -Filter is declared, then it will do some pattern-matching with the child items in that directory. Which is not only important, but you can do fancy stuff with it.
  2. Select-Object -ExpandProperty FullName takes previous stream and returns full absolute path to current child item.
  3. Import-Csv imports the csv into the stream and
  4. Export-Csv with -Append flag creates (if the file does not exist) and append previous CSV import from the stream.

Keep in mind that if you call this command repeatedly, it will bloat target path. -Append flag does not overwrite.

Variations

Say few of you CSVs fragment are updated quite frequently and you need to remove previous merge versions, just rm (or Remove-File if you prefer) previous file!

$mergedFileName = "conc.csv"; 
$targetDir = "."
if (TestPath $mergedFileName) {
    Remove-Item $mergedFileName
} 
Get-ChildItem $targetDir -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv $mergedFileName -NoTypeInformation -Append

Or only merge specific CSVs with prefix in its filename

Get-ChildItem . -Filter "prefix_somethingsomething*.csv" | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv "conc.csv" -NoTypeInformation -Append

And you can add flag -Recurse into Get-ChildItem to recursively get child item within subfolders.

And -Delimiter [your-csv-delimiter] into Import-Csv in case you were’nt using comma as delimiter. Make sure to refer the Documentation, also keep in mind different powershell version might have different flags available to them.

And that’s all.