Bypassing script filters via variable-width encodings

Author: Cheng Peng Su (applesoup_at_gmail.com)
Date: August 7, 2006

We all know that the main problem in constructing XSS attacks is how to obfuscate malicious code so that it bypasses script filters. In the following paragraphs I will explain the concept of bypassing script filters via variable-width encodings, and disclose applications of this concept to the Hotmail and Yahoo! Mail web-based mail services.

Variable-width encoding Introduction
====================================

A variable-width encoding (a.k.a. variable-length encoding) is a type of character encoding scheme in which codes of different lengths are used to encode a character set. The most common variable-width encodings are multibyte encodings, which use varying numbers of bytes to encode different characters. Multibyte encodings were first used for Chinese, Japanese and Korean, which have large character sets in excess of 256 characters. The Unicode standard has two variable-width encodings: UTF-8 and UTF-16. The most commonly used codes are two-byte codes. The EUC-CN form of GB2312, plus EUC-JP and EUC-KR, are examples of such two-byte EUC codes. There are also some three-byte and four-byte codes.

Example and Discussion
======================

Here is a PHP file with which I will introduce my idea.

------------------------------example.php--------------------------------
not " ."" // NOTE: 5 whitespace characters following the last \"
."available\r\n\r\n
\r\n\r\n"; } ?>
-------------------------------------------------------------------------

For most values of $i, Internet Explorer 6.0 (SP2) will display "Char XXX is not available". When $i is between 192 (0xC0) and 255 (0xFF), you will see "Char XXX is available". Let's take $i=0xC0 as an example, and consider the following code:

Char 192 is not available

0xC0 is one of the 32 first bytes of 2-byte sequences (0xC0-0xDF) in UTF-8. So when IE parses the code above, it treats the 0xC0 byte and the following quote as one sequence, and therefore these two pairs of FONT elements will become one with font = "xyz[0xC0]">not not message.

Supposing there is a forum encoded in UTF-8, we can attack it by sending

[font=xyz[0xC0]]buried[/font][font=abc onmouseover=alert() fake_parameter=[0xC0]]exploited[/font]

And it will be transformed into

buriedexploited

Again, the way to exploit is very flexible. This FONT-FONT example is just an illustrative one. The exploitation of Yahoo! Mail described below is quite different from it.

Disclosure
==========

Using this method, I found two XSS vulnerabilities, one in each of the Hotmail and Yahoo! Mail web-based mail services. I informed Yahoo and Microsoft on April 30 and May 12 respectively, and they have patched the vulnerabilities.

Yahoo! Mail XSS
---------------

Before the flaw was fixed, the Yahoo! Mail filtering engine could block "expression()" syntax in a CSS attribute, even if the attacker used a comment to break up the expression ( expr/* */ession() ). I used [0x81] together with the following asterisk to form a sequence, so that the second */ would close the comment. The filtering engine, however, considered the first two comment symbols as a pair.

--------------------------------------------------------------------
MIME-Version: 1.0
From: user
Content-Type: text/html; charset=GB2312
Subject: example

exploited
.
--------------------------------------------------------------------

Hotmail XSS
-----------

This exploit is almost the same as the one in example.php.
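
The lead-byte effect behind example.php can be sketched in Python. This is a minimal sketch under stated assumptions, not IE's actual parser: lenient_decode is a hypothetical decoder that lets any byte in 0xC0-0xDF (the range given in the text) swallow the byte after it, first_attr_value is a hypothetical helper, and the sample markup is made up for illustration rather than taken from the original example.php.

```python
# Hypothetical model of a lenient decoder: any UTF-8 2-byte lead byte
# (0xC0-0xDF) consumes the next byte, even when that byte is not a
# valid continuation byte.
LEAD_2BYTE = range(0xC0, 0xE0)


def lenient_decode(data: bytes) -> str:
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in LEAD_2BYTE and i + 1 < len(data):
            out.append("\ufffd")  # the pair collapses into one unknown character
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


def first_attr_value(text: str) -> str:
    """Return the text between the first pair of double quotes."""
    start = text.index('"') + 1
    return text[start:text.index('"', start)]


# Illustrative markup (an assumption, not the original example.php output).
page = b'<font face="xyz\xc0">not </font><font face=" onmouseover=alert(1) x=\xc0" >'

# A byte-oriented filter treats the quote after 0xC0 as the end of the value.
filter_view = first_attr_value(page.decode("latin-1"))
# The lenient decoder swallows that quote, so the value runs on to the next one.
browser_view = first_attr_value(lenient_decode(page))

print(repr(filter_view))
print(repr(browser_view))
```

With this input, the byte-oriented view ends the first FACE value right after 0xC0, while the lenient view carries it through to the quote that opens the second FACE value, which leaves onmouseover outside any quoted string.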
--------------------------------------------------------------------
MIME-Version: 1.0
From: user
Content-Type: text/html; charset=SHIFT_JIS
Subject: example

exploited
.
--------------------------------------------------------------------

Reference
=========

Wikipedia: Variable-width encoding (http://en.wikipedia.org/wiki/Variable-width_encoding)
RFC 3629, the UTF-8 standard (http://tools.ietf.org/html/rfc3629)
RSnake: XSS Cheat Sheet (http://ha.ckers.org/xss.html)

( Original text: http://applesoup.googlepages.com/bypass_filter.txt )