unicode – Kev's Development Toolbox

August 19, 2018

Python: UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0xc2 in position 806040: ordinal not in range(128)

I was trying to parse a file that had some unusual characters in it. I spent a while trying to work out whether the file was in an unusual encoding, or whether it was a CR vs CRLF issue, but no: the file really did contain some non-ASCII characters, which the default ‘ascii’ codec cannot decode. I removed the offending characters and now all is good.
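If deleting the characters isn't an option, it can help to locate them first and then decode with an explicit encoding. The snippet below is only a rough sketch: the `find_non_ascii` helper and the sample bytes are made up for illustration, and it assumes the data is UTF-8, which the original file may or may not have been. (In UTF-8, byte 0xc2 is the lead byte of a two-byte sequence, for example 0xc2 0xa0 for a no-break space.)

```python
def find_non_ascii(data: bytes):
    """Return (offset, byte value) pairs for every byte outside the ASCII range."""
    return [(offset, byte) for offset, byte in enumerate(data) if byte >= 0x80]


# Hypothetical sample data: 0xC2 0xA0 is the UTF-8 encoding of a no-break space.
sample = b"price: 10\xc2\xa0EUR"

# Offsets here correspond to the "position" reported in the error message.
offenders = find_non_ascii(sample)

# Decoding as ASCII reproduces the error from the title.
try:
    sample.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with an explicit encoding (assumed to be UTF-8 here) succeeds.
text = sample.decode("utf-8")
```

For a real file, the same idea applies by reading it in binary mode (`open(path, "rb")`) or by passing `encoding="utf-8"` to `open()` instead of relying on the default codec.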

December 18, 2012

Things you always forget: XML Character Entities

Certain characters break XML or need to be escaped in HTML so that they are rendered literally rather than interpreted as markup. There are a few predefined character entities that are commonly used:

< : &lt;
> : &gt;
& : &amp;
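When the markup is generated from code, these three characters don't have to be escaped by hand. As a small sketch in Python (the sample string is invented for illustration), the standard library's `html.escape` replaces them with the predefined entities:

```python
from html import escape

# Hypothetical text containing all three special characters.
snippet = "if a < b && b > c"

# quote=False leaves " and ' alone and escapes only &, < and >.
escaped = escape(snippet, quote=False)
```

The ampersand is escaped first, so the `&` inside the generated entities is not escaped a second time.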

For other characters, though, you sometimes need to use a numeric character reference based on their Unicode value. For example, to literally display { and } in a JSP or JSF page (since they are used in the EL syntax ${} and #{}), you can use their Unicode values like this:

{ : &#123;
} : &#125;
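The numeric form is just the character's code point wrapped in `&#` and `;`, so it can be computed rather than looked up. A minimal Python sketch, with helper names (`to_numeric_entity`, `escape_el_braces`) that are invented here for illustration:

```python
import html


def to_numeric_entity(ch: str) -> str:
    """Build a decimal numeric character reference for a single character."""
    return f"&#{ord(ch)};"


def escape_el_braces(text: str) -> str:
    """Replace { and } with numeric references so EL does not interpret them."""
    return text.replace("{", to_numeric_entity("{")).replace("}", to_numeric_entity("}"))


# Hypothetical EL-looking string that should be shown literally on the page.
escaped = escape_el_braces("${user.name}")

# html.unescape turns the references back into the original characters.
round_trip = html.unescape(escaped)
```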

There are many Unicode reference charts online; here’s one that I’ve used:

http://en.wikibooks.org/wiki/Unicode/Character_reference/0000-0FFF