When is a text file not a text file?

When it’s in the wrong encoding, apparently.

Allow me to explain. I was using Cygwin grep to parse a text file that I had created by copying the output of PsExec (one of the Sysinternals tools) and running it through a PowerShell script, which pulled out the information I needed and reformatted it so that other tools could parse it easily. The problem was that grep kept failing to match anything. I even tried matching against /./, and all I got for my trouble was the message “Binary file matches”, which was very puzzling considering it was a text file.

After some searching and pondering, it occurred to me to check the character encoding. I opened the file in Notepad++, and sure enough, it was UCS-2 instead of UTF-8. I converted it to UTF-8 and re-saved it, and grep could then process the text file as a text file.
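
For anyone who would rather script the conversion than open Notepad++, here is a minimal Python sketch of the same fix. It assumes the file starts with a byte order mark (BOM), which is what Notepad++ typically reports for files like this; the function names `detect_bom` and `to_utf8` are made up for illustration. The likely reason grep called the file binary is that plain ASCII text stored as UCS-2/UTF-16 has a NUL byte in every other position, and grep treats files containing NUL bytes as binary.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known byte order mark.

    Only UTF-8 and UTF-16 BOMs are checked; UTF-32 is deliberately
    ignored to keep this sketch short.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to pick the byte order
        # and drops it from the decoded text.
        return "utf-16"
    return None


def to_utf8(data: bytes) -> bytes:
    """Re-encode data as UTF-8 without a BOM.

    Assumption: a file with no BOM is already UTF-8 (or plain ASCII).
    """
    encoding = detect_bom(data) or "utf-8"
    return data.decode(encoding).encode("utf-8")
```

To convert a file on disk, read it in binary mode, pass the bytes through `to_utf8`, and write the result back out in binary mode. A file without a BOM that is not actually UTF-8 will raise a `UnicodeDecodeError` rather than be silently mangled, which seemed like the safer behavior for a sketch.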

Just wanted to share this experience in the hopes that someone else will find it useful.