It is well known that a central problem in constructing XSS attacks is how to obfuscate the malicious code. In the following paragraphs, Cheng attempts to explain the concept of bypassing script filters with variable-width encodings, and discloses how this concept applies to the Hotmail and Yahoo! Mail web-based mail services.

A variable-width encoding (a.k.a. variable-length encoding) is a type of character encoding scheme in which codes of differing lengths are used to encode a character set. The most common variable-width encodings are multibyte encodings, which use varying numbers of bytes to encode different characters. Multibyte encodings were first used for Chinese, Japanese and Korean, whose character sets far exceed 256 characters. The Unicode standard has two variable-width encodings: UTF-8 and UTF-16. The most commonly used multibyte codes are two-byte codes; the EUC-CN form of GB2312, EUC-JP and EUC-KR are examples of such two-byte EUC codes. There are also some three-byte and four-byte codes.
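
To make the idea of differing code lengths concrete, the following is a minimal Python sketch that counts how many bytes a single character occupies under several of the encodings named above. The helper name `encoded_lengths` and the sample characters are assumptions chosen for illustration; they do not come from the original article. The codec names are those shipped with Python's standard library.

```python
# Illustrative sketch: byte length of one character under several encodings.
# encoded_lengths and the sample characters are hypothetical, for demonstration only.

CODECS = ("utf-8", "utf-16-le", "gb2312", "euc_jp", "euc_kr")


def encoded_lengths(ch):
    """Return {codec: byte count}, or None where the codec cannot encode ch."""
    result = {}
    for codec in CODECS:
        try:
            result[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            # The character is outside this encoding's character set.
            result[codec] = None
    return result


if __name__ == "__main__":
    # An ASCII letter, a CJK ideograph, and a character outside the
    # Basic Multilingual Plane.
    for ch in ("A", "中", "😀"):
        print(repr(ch), encoded_lengths(ch))
```

Running the script shows that an ASCII letter takes a single byte in UTF-8 and in the EUC-style encodings, while a CJK ideograph takes more, which is the property that makes these encodings variable-width.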

The link to this article at SecuriTeam is no longer available.