Welcome to the CSC Q&A.

After I made some modifications to my CSV file, pandas could not read it anymore. It keeps raising this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 5: invalid start byte

Does anyone know what causes this error and how to fix it?

Asked in CSC320.

1 Answer

Best answer

Yeah, that's annoying.

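Most likely the file was saved in some other encoding instead of UTF-8 when you modified it. In UTF-8, byte 0x81 is a continuation byte: it can only follow a lead byte and can never begin a character, which is exactly what "invalid start byte" means.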
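To see where the error comes from, here is a small sketch that reproduces it on a made-up byte string (not the asker's actual file) and checks the bit pattern of 0x81. UTF-8 continuation bytes have the form 10xxxxxx (0x80 to 0xBF). The helper names find_utf8_error and is_continuation_byte are hypothetical, chosen for illustration.

```python
def find_utf8_error(raw: bytes):
    """Return (position, byte value, reason) for the first UTF-8 decoding error, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, raw[err.start], err.reason
    return None


def is_continuation_byte(b: int) -> bool:
    """True if b matches the UTF-8 continuation pattern 10xxxxxx (0x80-0xBF)."""
    return b & 0b1100_0000 == 0b1000_0000


# Hypothetical bytes: "name," followed by 0x81 at index 5.
sample = b"name,\x81bc\n"
print(find_utf8_error(sample))     # (5, 129, 'invalid start byte')
print(is_continuation_byte(0x81))  # True
```

A stray continuation byte at position 5 gives the same message as in the question, which is a strong hint that the file's bytes were written by a non-UTF-8 encoder.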

You might try this to export the file from Excel in UTF-8 format:
https://www.ibm.com/support/knowledgecenter/en/SSWU4L/Data/imc_Data/Data_q_a_watson_assistant/A_Simple_Way_to_UTF-8_Encode_your_CSV_fi191.html
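If you would rather not go back through Excel, a rough Python alternative is to re-encode the file yourself. This is a sketch under one big assumption: you know (or can guess) the encoding the file was actually saved in. Mac Roman ("mac_roman") is used below purely as an example, because 0x81 happens to be "Å" in that encoding; the function name convert_to_utf8 and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode a text file from source_encoding to UTF-8.

    newline="" disables newline translation, so line endings are copied unchanged.
    """
    with open(src, encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)


# Demo with a hypothetical Mac Roman file ("Å" is byte 0x81 in Mac Roman).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data_macroman.csv"
    dst = Path(tmp) / "data_utf8.csv"
    src.write_bytes("city,count\nÅland,3\n".encode("mac_roman"))
    convert_to_utf8(src, dst, "mac_roman")
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```

If the guessed source encoding is wrong, the conversion may still succeed but produce the wrong characters, so it is worth eyeballing the result.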

Or, you could try reading it with pandas using a different character encoding (pass the encoding argument to read_csv); see https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python
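A minimal sketch of that idea: try a short list of candidate encodings with read_csv's encoding argument and keep the first one that decodes without error. The candidate list and the helper name read_csv_guessing_encoding are assumptions for illustration. Note that "decodes without error" is not the same as "correct": latin-1 maps every byte to some character, so it never fails, which is why it sits last as a fallback.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Candidate encodings, tried in order (an assumption; adjust for your data).
# latin-1 never raises, so it only works as a last resort.
CANDIDATES = ("utf-8", "utf-8-sig", "cp1252", "mac_roman", "latin-1")


def read_csv_guessing_encoding(path, candidates=CANDIDATES):
    """Return (DataFrame, encoding) for the first candidate that decodes without error."""
    for enc in candidates:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError:
            continue  # this encoding can't decode the file; try the next one
    raise ValueError(f"none of {candidates} could decode {path}")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.csv"  # hypothetical file saved in Mac Roman
    path.write_bytes("city,count\nÅland,3\n".encode("mac_roman"))
    df, used = read_csv_guessing_encoding(path)

print(used)
print(df)
```

Once you know which encoding works, it is simpler to pass it directly, e.g. pd.read_csv(path, encoding="mac_roman").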

Or, if those ideas fail, a workaround is to save the file in .xlsx format instead and read it with pandas' read_excel function:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
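A rough sketch of the .xlsx route, assuming the openpyxl package is installed (pandas uses it to read and write .xlsx files). Since there is no real workbook here, the example writes a small stand-in file with to_excel and reads it back; the file name is hypothetical. No encoding argument is needed because .xlsx stores its text as Unicode internally.

```python
import tempfile
from pathlib import Path

import pandas as pd  # .xlsx support also requires the openpyxl package

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.xlsx"  # hypothetical file name
    # Stand-in for the workbook you would save from Excel.
    pd.DataFrame({"city": ["Åland"], "count": [3]}).to_excel(path, index=False)
    # sheet_name=0 reads the first sheet.
    df = pd.read_excel(path, sheet_name=0)

print(df)
```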
