Welcome to the CSC Q&A, on our server named in honor of Ada Lovelace. Write great code! Get help and give help!


+7 votes

After I made some modifications to my CSV file, pandas can no longer read it. It keeps raising this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 5: invalid start byte

Does anyone know what causes this error and how to fix it?

asked in CSC320 by (1 point)

1 Answer

+1 vote
 
Best answer

Yeah, that's annoying.

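It's possible that the file was saved in some other encoding instead of UTF-8. In UTF-8, byte 0x81 can never begin a character (it is a continuation byte), which is why the default UTF-8 decoder used by pandas reports an "invalid start byte".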
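To see why 0x81 in particular trips up the decoder, here is a small sketch in plain Python. The byte string is hypothetical, built so that 0x81 lands at position 5 like in the error above; the helper `is_continuation_byte` is also just for illustration.

```python
# Hypothetical bytes: position 5 holds 0x81, matching the error in the question.
data = b"name,\x81value"

def is_continuation_byte(b: int) -> bool:
    """True if b has the bit pattern 10xxxxxx. UTF-8 reserves that pattern for
    the 2nd-4th bytes of a multi-byte character, so it cannot start one."""
    return (b & 0b1100_0000) == 0b1000_0000

try:
    data.decode("utf-8")
    message = None
except UnicodeDecodeError as err:
    message = str(err)  # keep the text; `err` is unbound after the except block

print(message)  # 'utf-8' codec can't decode byte 0x81 in position 5: invalid start byte
print(is_continuation_byte(0x81))  # True
```

So the file almost certainly contains bytes written by a non-UTF-8 encoder, and the fix is either to re-save it as UTF-8 or to tell pandas which encoding it really uses.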

You might try the steps here to export the file from Excel in UTF-8 format:
https://www.ibm.com/support/knowledgecenter/en/SSWU4L/Data/imc_Data/Data_q_a_watson_assistant/A_Simple_Way_to_UTF-8_Encode_your_CSV_fi191.html
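If Excel isn't handy, a similar conversion can be done in Python. This is a minimal sketch, assuming you already know (or can guess) the file's original encoding. The function name `reencode_to_utf8`, the file names, and the choice of "mac_roman" in the demo are all hypothetical. A wrong guess may raise an error, or it may silently produce the wrong characters.

```python
import tempfile
from pathlib import Path

def reencode_to_utf8(src, dst, source_encoding):
    """Rewrite the text file src as UTF-8 into dst (hypothetical helper).

    source_encoding is an assumption you must supply. Working on raw bytes,
    instead of read_text/write_text, leaves line endings exactly as they were.
    """
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(raw.decode(source_encoding).encode("utf-8"))

# Demo on throwaway files; "mac_roman" is only a hypothetical guess.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in.csv", Path(tmp) / "out.csv"
    src.write_bytes(b"name\n\x81\n")         # 0x81 decodes to "Å" in Mac Roman
    reencode_to_utf8(src, dst, "mac_roman")
    converted = dst.read_bytes()

print(converted)  # b'name\n\xc3\x85\n'  ("Å" encoded as UTF-8)
```

After converting, `pandas.read_csv` should read the new file with its default UTF-8 encoding.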

Or, you could read it with pandas using a different character encoding, by passing the encoding argument to pandas.read_csv; see https://stackoverflow.com/questions/18171739/unicodedecodeerror-when-reading-csv-file-in-pandas-with-python
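If you don't know which encoding the file uses, one option is to try a few candidates in turn. Below is a sketch, assuming the candidate list `("utf-8", "cp1252", "latin-1")`. That list and the helper name `read_csv_with_fallback` are my own choices, not part of pandas. Note that "latin-1" maps every byte to some character, so it never raises. Putting it last makes it a catch-all, but it can "succeed" with garbled text if the real encoding is something else.

```python
import tempfile
from pathlib import Path

import pandas as pd

def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try pd.read_csv with each candidate encoding (hypothetical helper).

    Returns (DataFrame, encoding_used). Raises ValueError if none works.
    """
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc), enc
        except UnicodeDecodeError:
            continue  # this encoding can't decode the file; try the next one
    raise ValueError(f"none of {encodings} could decode {path}")

# Demo on a throwaway file containing byte 0x81 (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(b"name,city\nA\x81B,Oslo\n")
    df, used = read_csv_with_fallback(sample)

print(used)  # latin-1 (utf-8 and cp1252 both reject byte 0x81)
```

It's worth printing the encoding that was used and eyeballing a few rows, since a successful read doesn't guarantee the characters are right.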

Or, if those ideas fail, as a workaround you could save the file in .XLSX format instead and read it with the pandas read_excel function.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

answered by (508 points)