Friends i am B.TECH Engineer(I.T) . Hopefully these notes may help u.
Encoding and Decoding
American Standard Code for Information Interchange (ASCII) is still the foundation for existing encoding types. ASCII assigned characters to 7-bit bytes using the numbers 0 through 127.
More and more, ASCII and ISO 8859 encoding types are being replaced by Unicode. Unicode is a massive code page with tens of thousands of characters that support most languages and scripts, including Latin, Greek, Cyrillic, Hebrew, Arabic, Chinese, and Japanese (and many other scripts).
Unicode itself does not specify an encoding type; however, there are several standards for encoding Unicode. The .NET Framework uses Unicode UTF-16 (Unicode Transformation Format, 16-bit encoding form) to represent characters. In some cases, the .NET Framework uses UTF-8 internally. The System.Text namespace provides classes that allow you to encode and decode characters. System.Text encoding support includes the following encodings:
■ Unicode UTF-32 encoding Unicode UTF-32 encoding represents Unicode characters as sequences of 32-bit integers. You can use the UTF32Encoding class to convert characters to and from UTF-32 encoding.
■ Unicode UTF-16 encoding Unicode UTF-16 encoding represents Unicode characters as sequences of 16-bit integers. You can use the UnicodeEncoding class to convert characters to and from UTF-16 encoding.
Unicode UTF-8 encoding Unicode UTF-8 uses 8-bit, 16-bit, 24-bit, and up to 48-bit encoding. Values 0 through 127 use 8-bit encoding and exactly match ASCII values, providing some degree of interoperability. Values from 128 through 2047 use 16-bit encoding and provide support for Latin, Greek, Cyrillic, Hebrew, and Arabic alphabets. Values 2048 through 65535 use 24-bit encoding for Chinese, Japanese, Korean, and other languages that require large numbers of values. You can use the UTF8Encoding class to convert characters to and from UTF-8 encoding.
Using the Encoding Class
You can use the System.Text.Encoding.GetEncoding method to return an encoding object for a specified encoding. You can use the Encoding.GetBytes method to convert a Unicode string to its byte representation in a specified encoding. The following code example uses the Encoding.GetEncoding method to create a target encoding object for the Korean code page. The code calls the Encoding.GetBytes method to convert a Unicode string to its byte representation in the Korean encoding. The code then displays
the byte representations of the strings in the Korean code page.
// C#
// Get Korean encoding
Encoding e = Encoding.GetEncoding("Korean");
// Convert ASCII bytes to Korean encoding
byte[] encoded;
encoded = e.GetBytes("Hello, world!");
// Display the byte codes
for (int i = 0; i < encoded.Length; i++)
Console.WriteLine("Byte {0}: {1}", i, encoded[i]); How to Specify the Encoding Type when Writing a File
To specify the encoding type when writing a file, use an overloaded Stream constructor that accepts an Encoding object. For example, the following code sample creates several files with different encoding types:
StreamWriter swUtf7 = new StreamWriter("utf7.txt", false, Encoding.UTF7);
swUtf7.WriteLine("Hello, World!");
swUtf7.Close();
StreamWriter swUtf8 = new StreamWriter("utf8.txt", false, Encoding.UTF8);
swUtf8.WriteLine("Hello, World!");
swUtf8.Close();
StreamWriter swUtf16 = new StreamWriter("utf16.txt", false, Encoding.Unicode);
swUtf16.WriteLine("Hello, World!");
swUtf16.Close();
StreamWriter swUtf32 = new StreamWriter("utf32.txt", false, Encoding.UTF32);
swUtf32.WriteLine("Hello, World!");
swUtf32.Close();
How to Specify the Encoding Type when Reading a File
Typically, you do not need to specify an encoding type when reading a file. The .NET Framework automatically decodes most common encoding types. However, you can specify an encoding type using an overloaded Stream constructor, as the following sample shows:
string fn = "file.txt";
StreamWriter sw = new StreamWriter(fn, false, Encoding.UTF7);
sw.WriteLine("Hello, World!");
sw.Close();
StreamReader sr = new StreamReader(fn, Encoding.UTF7);
Console.WriteLine(sr.ReadToEnd());
sr.Close();
No comments:
Post a Comment