How to Specify Character Encoding in HTML5?

Problem

You may want to make sure that the text and special characters in your HTML document display correctly in browsers. To do that, you need to specify the correct character encoding.

Solution

The Code:
To specify the character set of your HTML document, use one of the following two methods inside the head element of your HTML page.

  1. Add the charset attribute to a meta element

    <meta charset="UTF-8" />
  2. Add a meta element whose http-equiv attribute has the value “Content-Type”

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


Explanation:
The character encoding is specified in the meta element to let the browser know how to interpret the bytes of the HTML page and display its content correctly.

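In the first method above, the character encoding UTF-8 is given directly as the value of the charset attribute. In the second method, it is given as the charset parameter inside the value of the content attribute, after the media type text/html. The two methods are equivalent, so use only one of them in a document; the first is the shorter HTML5 form.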
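
To show where the declaration sits in a complete page, here is a minimal sketch of an HTML5 document using the first method. The title and body text are placeholders chosen for illustration, not part of the technique.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Encoding declaration, placed inside the head element -->
  <meta charset="UTF-8" />
  <title>Example page</title>
</head>
<body>
  <!-- Non-ASCII text that depends on the declared encoding -->
  <p>Café, naïve, 日本語</p>
</body>
</html>
```

The same page would work with the second method by swapping in the http-equiv form of the meta element.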

Things you need to know about character encoding

  • What is character encoding?
    A character is the smallest unit of text. A character set (or repertoire) is a set of characters grouped for a particular purpose. Each character in a coded character set is assigned a number called a code point. A character encoding is the method of mapping those code points to sequences of bytes, so that computers can store and transmit them. For example, in Unicode the letter é has the code point U+00E9, which UTF-8 encodes as the two bytes C3 A9.
  • Always specify character encoding
    When a web server sends a web page to the browser, it attaches additional information called HTTP headers. The character encoding can be specified there, in the charset parameter of the Content-Type header. If you don’t specify the character encoding in your HTML code, the browser uses the character encoding specified in the HTTP header. If no character encoding is specified in either the HTTP header or the HTML document, the browser has to guess and may not display the content properly. So, always declare the character encoding in your HTML document.
  • The character encoding declaration should come first inside the head element
    The declaration must be contained within the first 1024 bytes of the HTML page, because the browser scans only that part of the page for it before it starts parsing. So, declare it right after the opening <head> tag. Note that a character encoding sent in the HTTP Content-Type header takes precedence over the one in the meta element, so make sure the web server does not send a conflicting encoding.
  • UTF-8 is the most common character encoding
    Unicode-based encodings like UTF-8 support almost every language in the world.
  • Make sure your editor saves the HTML file with the correct character encoding
    Your editor may save your HTML file in a different character encoding from the one you declare. So, when saving your HTML file, make sure that UTF-8 is selected as the encoding.
  • Test your character encoding
    To check which character encoding a web page is using, enter its URL in the W3C Internationalization Checker at http://validator.w3.org/i18n-checker/