
About charset UTF-8

The first line of the head element of an HTML document is usually

<meta charset="UTF-8" />   

What does it do?

Unicode

Unicode lists characters. Each character in the list corresponds to a number (its code point). The encoding UTF-8 determines how these numbers are stored as bytes in files and in memory.

Example 1: The letter z is listed as number 122. It is stored as a single byte:

01111010

The first 128 characters (basic Latin) are stored the same way in UTF-8, ASCII and most legacy encodings such as ISO-8859-1 and Windows-1252, so the letter z would be safe without a charset specification.
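A minimal Python sketch to check this for the letter z. The list of encodings is just a sample picked for illustration, and UTF-16 is included as a counterexample: it is not ASCII-compatible, so the claim does not hold for every encoding in existence.

```python
# Encode the letter z under a few ASCII-compatible encodings.
# The selection of encodings is an illustrative sample, not a complete list.
ascii_compatible = ["utf-8", "ascii", "iso-8859-1", "windows-1252"]
encoded = {name: "z".encode(name) for name in ascii_compatible}

for name, data in encoded.items():
    print(name, data.hex())

# UTF-16 (little endian, no byte order mark) needs two bytes for z.
utf16 = "z".encode("utf-16-le")
print("utf-16-le", utf16.hex())
```

All four ASCII-compatible encodings produce the single byte 7a (01111010), while UTF-16 produces two bytes.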

Example 2: The character é is number 233 in the Unicode list. It is stored as two bytes:

11000011 10101001

Example 3: The Kannada letter ಊ has number 3210. It’s stored as three bytes:

11100000 10110010 10001010
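The three examples can be reproduced with a few lines of Python. This is only a sketch: utf8_bits is a helper name made up for this post, not a standard function.

```python
def utf8_bits(char: str) -> str:
    """Return the UTF-8 bytes of char as space-separated groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in char.encode("utf-8"))


# The Unicode numbers used in examples 1 to 3.
for number in (122, 233, 3210):
    char = chr(number)  # look up the character that belongs to the number
    print(number, char, utf8_bits(char))
```

The leading bits of the first byte tell how long the sequence is: 0 for one byte, 110 for two bytes, 1110 for three bytes. Each following byte starts with 10.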

HTML-codes and entities

HTML-codes (numeric character references) always refer to Unicode numbers, independent of the character encoding used. So if you type &#3210; you’ll get ಊ even if you omit the charset meta tag or choose another one, like ISO-8859-1.

For é you may also use the HTML-code &#233; or the HTML entity &eacute;.
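To see what these codes resolve to without opening a browser, Python's standard html module can decode them. This is a sketch of the lookup only; a browser does the same resolution while parsing the page.

```python
import html

# Numeric codes point straight at the Unicode number.
kannada = html.unescape("&#3210;")
e_acute_numeric = html.unescape("&#233;")

# The named entity resolves to the same character as &#233;.
e_acute_named = html.unescape("&eacute;")

print(kannada, e_acute_numeric, e_acute_named)

# The code itself consists of basic Latin characters only,
# so it can be stored in any ASCII-compatible encoding.
print("&#3210;".encode("iso-8859-1"))
```

That last line is the reason the codes work regardless of the charset: the file never contains the character itself, only its number written in basic Latin.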

Is UTF-8 the default?

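It’s sometimes stated that UTF-8 is the default character encoding for HTML5. It isn’t, at least not in the sense that it will be active if you don’t specify it: without a declaration the browser has to guess, and it may fall back to a legacy encoding. So make sure the tag is always present, preferably as the first child of head, so that it falls within the first 1024 bytes of the document, which is where browsers look for it.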
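What goes wrong without the tag can be imitated in Python. The sketch below assumes the browser falls back to Windows-1252, which is a common fallback for Western European locales but not the only possible one.

```python
# The file on disk contains é saved as UTF-8 (two bytes).
raw = chr(233).encode("utf-8")

# Assumed fallback: the browser reads the bytes as Windows-1252.
wrong = raw.decode("windows-1252")

# With the charset declared, the same bytes are read as UTF-8.
right = raw.decode("utf-8")

print(raw.hex(), wrong, right)
```

Read as Windows-1252, each of the two bytes becomes a separate character, so é shows up as Ã©.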

Some links

https://unicode-table.com

https://blog.whatwg.org/the-road-to-html-5-character-encoding