Unicode is gaining wider acceptance
Unicode, which is kept in step with the Universal Coded Character Set
(ISO/IEC 10646), is widely used on all major platforms. Unicode text is
stored using multibyte encodings. UTF-8 is supported on all major
operating systems, and some also support UTF-16 and UTF-32. UTF stands
for Unicode Transformation Format. The number after UTF gives the size
in bits of the code unit used to represent characters: 8 bits (one byte)
for UTF-8, 16 for UTF-16 and 32 for UTF-32. In UTF-8, a character takes
one to four bytes. The Devanagari script uses code points U+0900 to
U+097F.
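To make the byte counts concrete, here is a small Python sketch that compares the encoded length of a few sample characters in UTF-8, UTF-16 and UTF-32. The helper names `encoded_lengths` and `is_devanagari` and the sample characters are made up for this illustration. The little-endian codec variants are used so that no byte order mark is counted.

```python
# Sample characters: Latin A, e with acute accent, Devanagari Om, an emoji.
samples = ["A", "\u00e9", "\u0950", "\U0001F600"]

def encoded_lengths(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

def is_devanagari(ch):
    """True if ch lies in the Devanagari block U+0900..U+097F."""
    return 0x0900 <= ord(ch) <= 0x097F

for ch in samples:
    # Prints the code point, the three byte lengths, and the block check.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch), is_devanagari(ch))
```

For U+0950 this reports 3 bytes in UTF-8, 2 in UTF-16 and 4 in UTF-32.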
Indian languages are coming onto a common platform
The code point for the Devanagari character Om (ॐ) is U+0950. How it is stored in UTF-8 is worth working through.
First Step – Convert the code point to binary
0 9 5 0
0000 1001 0101 0000
Second Step – Split the bits into groups of six, starting from the right
0000 1001 0101 0000
0000 100101 010000
Third Step – Prefix each group of six with 10, and prefix the leading group of four with 1110
11100000 10100101 10010000
1110 0000 1010 0101 1001 0000
Fourth Step – Convert each byte to hexadecimal
E0 A5 90
Binary value : 11100000-10100101-10010000
Hex value : 0xE0 0xA5 0x90
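The four steps can be sketched in Python with a few shifts and masks. This is a minimal illustration that only covers the three-byte range (U+0800 to U+FFFF, excluding surrogates), not a full UTF-8 encoder. The function name `utf8_encode_3byte` is hypothetical. The result is checked against Python's built-in encoder.

```python
def utf8_encode_3byte(code_point):
    """Encode a code point in U+0800..U+FFFF by hand as three UTF-8 bytes."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles the three-byte range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    low = code_point & 0x3F          # rightmost six bits
    mid = (code_point >> 6) & 0x3F   # next six bits
    high = code_point >> 12          # remaining four bits
    # Prefix 1110 on the leading byte, 10 on each continuation byte.
    return bytes([0xE0 | high, 0x80 | mid, 0x80 | low])

om = 0x0950
encoded = utf8_encode_3byte(om)
print(" ".join(f"{b:08b}" for b in encoded))  # 11100000 10100101 10010000
print(" ".join(f"{b:02X}" for b in encoded))  # E0 A5 90

# Cross-check against the standard library encoder.
assert encoded == chr(om).encode("utf-8")
```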
In UTF-8, each Devanagari character takes three bytes.
UTF-8 is the default encoding on many platforms, so it is usually the better choice for storing data.
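Both points can be checked with a short Python sketch. It assumes nothing beyond the standard library. The helper names `devanagari_utf8_lengths` and `roundtrip_utf8` and the file name `sample.txt` are invented for this example. The encoding is passed explicitly when opening the file rather than relying on the platform default.

```python
import os
import tempfile

def devanagari_utf8_lengths():
    """Return the set of UTF-8 byte lengths seen across U+0900..U+097F."""
    return {len(chr(cp).encode("utf-8")) for cp in range(0x0900, 0x0980)}

def roundtrip_utf8(text):
    """Write text to a temporary file as UTF-8; return (file size, text read back)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        size = os.path.getsize(path)
        with open(path, encoding="utf-8") as f:
            return size, f.read()

print(devanagari_utf8_lengths())   # {3}
size, text = roundtrip_utf8("\u0950")
print(size, text == "\u0950")      # 3 True
```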