Home] [About] [Posts] [Resources]
dir: Home /
Posts /
powershell - Merge CSVs
published-date: 02 Feb 2023 19:45 +0700
categories: [misc] [quickie]
tags: [powershell]
CSV is a tabular data exchange format and is used dangerously extensively by data scientists, engineers, or analyst alike. Looking at the wiki, its feature is actually quite limited as a format for exchanging tabular data in plain-text. There are various alternatives such as to xml
which supports schema, json
with their inherit javascript object semantics, and various other formats which I don’t mention because I’m too tired of writing preface just to fill the post summary and is just hunk of bs at this point. sorry.
There might be times where you’d need to join bunch of csv fils with common headers, and copy-pasting everything or building a parsing and writer script is waay too much to ask. Turns out, powershell
do have built-in command just to do CSV utility command!
Assuming you’re using windows, of course. And powershell at least version 5.1
1Get-ChildItem "path/to/folder" -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv "path/to/merge.csv" -NoTypeInformation -Append
The cmdlet pipes that took place are:
Get-ChildItem
iterates over a directory, if flag -Filter
is declared, then it will do some pattern-matching with the child items in that directory. Which is not only important, but you can do fancy stuff with it.Select-Object -ExpandProperty FullName
takes previous stream and returns full absolute path to current child item.Import-Csv
imports the csv into the stream andExport-Csv
with -Append
flag creates (if the file does not exist) and append previous CSV import from the stream.Keep in mind that if you call this command repeatedly, it will bloat target path.
-Append
flag does not overwrite.
Say few of you CSVs fragment are updated quite frequently and you need to remove previous merge versions, just rm
(or Remove-File
if you prefer) previous file!
1$mergedFileName = "conc.csv";
2$targetDir = "."
3if (TestPath $mergedFileName) {
4 Remove-Item $mergedFileName
5}
6Get-ChildItem $targetDir -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv $mergedFileName -NoTypeInformation -Append
Or only merge specific CSVs with prefix in its filename
1Get-ChildItem . -Filter "prefix_somethingsomething*.csv" | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv "conc.csv" -NoTypeInformation -Append
And you can add flag -Recurse
into Get-ChildItem
to recursively get child item within subfolders.
And -Delimiter [your-csv-delimiter]
into Import-Csv
in case you were’nt using comma as delimiter. Make sure to refer the Documentation, also keep in mind different powershell version might have different flags available to them.
And that’s all.
Built with Hugo | previoip (c) 2024