Handling UTF-8-sig for encoding? #102

@b-meson


Hi all,

I was recently trying to use csvlink to filter the data from two well-formatted data sets. I tried to follow the documentation, but it was not working at all. I repeatedly got the following error despite the field "sample" being present in both of my files.

csvlink: error: Could not find field 'sample' in input

Ultimately, I was able to dump the CSV and noticed my header was printing as \ufeffsample, which led me to figure out that this was a byte order mark (BOM) issue. I made the following change to csvlink.py and the code ran for me.

-                self.input_1 = open(self.configuration['input'][0], encoding='utf-8').read()
+                self.input_1 = open(self.configuration['input'][0], encoding='utf-8-sig').read()
             except IOError:
                 raise self.parser.error("Could not find the file %s" %
                                    (self.configuration['input'][0], ))

             try:
-                self.input_2 = open(self.configuration['input'][1], encoding='utf-8').read()
+                self.input_2 = open(self.configuration['input'][1], encoding='utf-8-sig').read()
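
For anyone who wants to reproduce this outside csvlink, here is a minimal sketch of the behavior. It is not csvlink's code: the sample bytes and the `header_fields` helper are made up for illustration. It only shows how Python's `utf-8` and `utf-8-sig` codecs treat a leading BOM.

```python
import codecs
import csv
import io

# Hypothetical sample data: a CSV whose bytes start with the UTF-8 BOM,
# as some spreadsheet tools write on export.
raw = codecs.BOM_UTF8 + b"sample,value\nA1,10\n"


def header_fields(data, encoding):
    """Decode the bytes and return the field names from the first CSV row."""
    text = data.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


# 'utf-8' keeps the BOM as a U+FEFF character glued to the first field name.
plain = header_fields(raw, "utf-8")

# 'utf-8-sig' drops a leading BOM if one is present.
stripped = header_fields(raw, "utf-8-sig")

print(plain)     # ['\ufeffsample', 'value']
print(stripped)  # ['sample', 'value']
```

Since `utf-8-sig` also decodes input that has no BOM, the change above should not affect files that already worked.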
