hack website using XSS

XSS vulnerability related

1 Definition and principle


XSS (cross-site scripting) is an attack in which the browser treats content supplied by an attacker as script and executes it. Because the malicious code runs in the victim's browser, it is an attack against the user's browser rather than against the server.
It is mainly divided into three types:
• Reflected
• Stored
• DOM-based

XSS hazards:
◇ Cookie theft
◇ Account theft
◇ Malware downloads
◇ Keylogging
◇ Traffic diversion to advertisements

2 Reflected XSS

2.1 Principle

The application or API includes unvalidated and unescaped user input directly as part of the HTML output. A successful attack allows the attacker to execute arbitrary HTML and JavaScript in the victim's browser.
Feature: non-persistent; it is triggered only when the user opens a link that carries the crafted parameters.
Scope of influence: only the users who open the link and execute the script.
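
As a rough illustration of the escaping step that a vulnerable application skips, the sketch below encodes the HTML-significant characters before user input is placed in the output. The names escapeHtml and renderGreeting are hypothetical and not taken from any framework; a real application would normally rely on the auto-escaping of its template engine.

```javascript
// Hypothetical helper: encode the characters that are significant in
// HTML text and in quoted attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(input) {
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical page fragment that reflects a "name" request parameter.
function renderGreeting(name) {
  return "<p>Hello, " + escapeHtml(name) + "</p>";
}

// The script tag arrives in the page as inert text, not as markup.
console.log(renderGreeting("<script>alert(1)</script>"));
```

This covers only the HTML text and quoted-attribute contexts; input placed inside a script block or a URL needs the encoding that matches that context.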

3 Stored XSS

3.1 Principle

Stored XSS means that the application receives untrusted data through a web request and stores it in the database without checking whether it contains XSS code. When the data is later read back from the database and written into a page without filtering, the XSS code executes again, so a stored XSS keeps attacking the users who view that page.
Where stored XSS commonly appears:
◇ Message boards
◇ Comment areas
◇ Profile pictures
◇ Signatures
◇ Blogs

4 DOM-based XSS

4.1 Principle

4.1.1 DOM

The DOM represents a document as a logical tree. Each branch of the tree ends in a node, and each node contains objects. DOM methods let you manipulate the tree programmatically; with them you can change the structure, style, or content of the document.


4.1.2 DOM XSS

DOM XSS is essentially a special type of reflected XSS: JavaScript writes data into the page dynamically by manipulating the DOM tree, without the data having to be submitted to the server. It is a vulnerability rooted in the DOM (Document Object Model).


<html>
    <body>
        <script>
            document.write("<script>alert(1)<\/script>")
        </script>
    </body>
</html>

4.1.3 Example

The following case is a DOM XSS. The cause is that the JS code dynamically concatenates a string like this:

$("head").append("<meta>"+text+"</meta>")

Take the following POC as an example.

You can see that the code in the div is HTML-entity encoded, yet the alert still pops up in the end.

By contrast, script code inserted through innerHTML is not executed. For example, you can dynamically insert a DOM node as follows:


<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>DOM XSS POC</title>
</head>
<body>
    <div id="demo">&lt;script&gt;alert`1`&lt;/script&gt;</div>
    <script src="https://libs.baidu.com/jquery/2.1.1/jquery.min.js"></script>
    <br>
    <div id="test"></div>
    <script>
        document.getElementById("test").innerHTML = document.getElementById("demo").innerHTML + "";
    </script>
</body>
</html>

You will find that the script written into <div id="test"> is not executed. A framework such as jQuery, however, evaluates script nodes when it inserts them, so the same content does run when it is inserted with append(). The append() method is designed to make the inserted elements take effect, and there is a legitimate need for that behavior.

4.1.4 Similarities, differences and harms with reflective XSS

Similarities:
User input is not properly controlled, and the JavaScript it contains is inserted into the HTML page as output.
Differences:
In reflected XSS, the payload passes through the back-end language and takes effect when the page renders the back-end output.
In DOM XSS, the payload is inserted into the page by JS that manipulates the DOM tree directly.
Harm:
Because DOM XSS is handled entirely on the front end, the payload may never pass through the back end and so is not inspected by a WAF.
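
One concrete reason a server-side filter can miss a DOM XSS payload is that the fragment part of a URL (everything after #) is not sent in the HTTP request. The sketch below uses the standard URL class to separate the two parts; the helper name splitUrl and the example.com address are placeholders chosen for illustration.

```javascript
// Split a URL into the part that goes out in the HTTP request and the
// fragment, which stays in the browser and is visible only to page JS.
function splitUrl(urlString) {
  const url = new URL(urlString);
  return {
    sentToServer: url.pathname + url.search,
    keptInBrowser: decodeURIComponent(url.hash.slice(1)),
  };
}

const parts = splitUrl("https://example.com/page?q=1#<img src=x onerror=alert(1)>");
console.log(parts.sentToServer);  // path and query only
console.log(parts.keptInBrowser); // the payload, readable via location.hash
```

If page script copies the fragment into the DOM through an HTML-parsing sink, the payload takes effect without ever appearing in server logs or WAF traffic.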

5 Pseudo-protocol and encoding bypass

5.1 Pseudo-protocols

Pseudo-protocols differ from the protocols widely used on the Internet, such as http://, https://, and ftp://. They appear in a URL to perform specific functions.
Data pseudo-protocol:
data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTs8L3NjcmlwdD4=
JavaScript pseudo-protocol:
javascript:alert("1")
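
To see what the data pseudo-protocol example carries, the sketch below parses it with the standard URL class and decodes the Base64 body. The helper names decodeBase64 and describeUrl are made up for this illustration.

```javascript
// Decode Base64 with atob where available, otherwise with Node's Buffer.
function decodeBase64(b64) {
  return typeof atob === "function"
    ? atob(b64)
    : Buffer.from(b64, "base64").toString("latin1");
}

// Return the scheme and the remainder of a URL that has no authority part.
function describeUrl(href) {
  const url = new URL(href);
  return { scheme: url.protocol, rest: url.pathname };
}

const dataUrl = "data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTs8L3NjcmlwdD4=";
const info = describeUrl(dataUrl);
const payload = decodeBase64(info.rest.split(",")[1]);

console.log(info.scheme); // "data:"
console.log(payload);     // "<script>alert(1);</script>"
console.log(describeUrl('javascript:alert("1")').scheme); // "javascript:"
```

A link filter that only looks for the literal text <script> would not see it in the Base64 form, which is why checking the URL scheme against an allow-list is the more reliable test.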

5.2 Encoding bypass

5.2.1 UNICODE encoding

ISO (International Organization for Standardization) developed a character set intended to cover the letters and symbols of all the world's writing systems. Early versions represented each character with two bytes; current Unicode assigns code points from U+0000 to U+10FFFF.
Unicode is only a character set. It specifies the numeric code of each symbol but not how that code is stored. Storage is defined by encodings such as UTF-8 and UTF-16.
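
The difference between a code point and its stored form can be checked directly. The sketch below prints the code point, the UTF-8 bytes, and the UTF-16 code units of a string; the helper names are hypothetical, while TextEncoder, codePointAt, and charCodeAt are standard.

```javascript
// Code points of a string, formatted as U+XXXX.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// Bytes of the UTF-8 encoding, as two-digit hex strings.
function utf8Bytes(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).padStart(2, "0"));
}

// 16-bit code units of the UTF-16 encoding, as four-digit hex strings.
function utf16Units(text) {
  return Array.from({ length: text.length }, (_, i) =>
    text.charCodeAt(i).toString(16).padStart(4, "0"));
}

const sample = "\u4e2d"; // a single CJK character
console.log(codePoints(sample)); // [ 'U+4E2D' ]
console.log(utf8Bytes(sample));  // [ 'e4', 'b8', 'ad' ]
console.log(utf16Units(sample)); // [ '4e2d' ]
```

The same code point takes three bytes in UTF-8 and one 16-bit unit in UTF-16.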

5.2.2 Browser decoding

Three main processes are involved in handling an HTML document: HTML parsing (which creates the DOM tree), URL parsing, and JavaScript parsing. Each parser decodes and parses its own part of the HTML document, and the order in which they run varies.

5.2.3 HTML parsing process

5.2.3.1 Analysis process

HTML has 5 types of elements:
1. Void elements, including area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr, etc.

2. Raw text elements, including <script> and <style>

3. RCDATA elements, including <textarea> and <title>

4. Foreign elements, such as elements in the MathML namespace or SVG namespace

5. Normal elements, that is, all elements other than the above 4 types

The differences between the five types of elements are as follows:
1. Void elements cannot contain any content (they have no closing tag, so nothing can be placed between an opening tag and a closing tag).

2. Raw text elements can hold text.

3. RCDATA elements can hold text and character references.

4. Foreign elements can hold text, character references, CDATA sections, other elements, and comments.

5. Normal elements can hold text, character references, other elements, and comments.

The HTML parser operates as a state machine. It consumes characters from the document input stream and switches to different states according to its conversion rules.


Take the following code as an example:

<html>
 <body>
   This is Geekby's blog
 </body>
</html>

1. The initial state is the "Data" state. When the parser encounters the < character, the state changes to "Tag open". Reading an a-z character creates a start tag token, and the state changes to "Tag name". The parser stays in this state, appending each character to the token name, until > is read. In the example, an html token is created.

2. When > is read, the current token is complete. The state returns to "Data", and the <body> tag goes through the same process. At this point both the html and body tags have been recognized. Back in the "Data" state, each character of "This is Geekby's blog" is read and produces a character token.

3. This continues until the < of </body> is encountered. The parser is now back in "Tag open"; it reads the next character, /, and enters "End tag open". It creates an end tag token and moves to the "Tag name" state, where it stays until it encounters >. The new tag token is then emitted and the parser returns to the "Data" state. The closing </html> tag is processed in the same way.
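
The three steps above can be sketched as a very small state machine. This is a simplified illustration with only four states, not the real HTML tokenizer: it ignores attributes, comments, and character references, and it merges consecutive characters into one trimmed text token instead of emitting one token per character. The function name tokenize and the token shapes are assumptions made for this sketch.

```javascript
// Simplified tokenizer with four states: data, tagOpen, endTagOpen, tagName.
function tokenize(input) {
  const tokens = [];
  let state = "data";
  let current = null; // tag token under construction
  let text = "";      // pending character data

  // Emit pending text as a single token, skipping whitespace-only runs.
  const flushText = () => {
    if (text.trim() !== "") tokens.push({ type: "chars", data: text.trim() });
    text = "";
  };

  for (const ch of input) {
    switch (state) {
      case "data":
        if (ch === "<") { flushText(); state = "tagOpen"; }
        else text += ch;
        break;
      case "tagOpen":
        if (ch === "/") {
          state = "endTagOpen";
        } else if (/[a-zA-Z]/.test(ch)) {
          current = { type: "startTag", name: ch.toLowerCase() };
          state = "tagName";
        } else {
          text += "<" + ch; // not a tag after all: treat as data
          state = "data";
        }
        break;
      case "endTagOpen":
        if (/[a-zA-Z]/.test(ch)) {
          current = { type: "endTag", name: ch.toLowerCase() };
          state = "tagName";
        } else {
          state = "data";
        }
        break;
      case "tagName":
        if (ch === ">") { tokens.push(current); current = null; state = "data"; }
        else current.name += ch.toLowerCase();
        break;
    }
  }
  flushText();
  return tokens;
}

const doc = "<html>\n <body>\n   This is Geekby's blog\n </body>\n</html>";
console.log(tokenize(doc).map((t) => t.type + ":" + (t.name || t.data)));
```

For the sample document this yields two start tag tokens, one text token, and two end tag tokens, in that order.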
Note
When the HTML parser is in the Data state, the RCDATA state, or the Attribute Value state, character entities are decoded into the corresponding characters.
Example
<div>&#60;img src=x onerror=alert(4)&#62;</div>

The < and > characters are encoded as the character entities &#60; and &#62;. When the HTML parser finishes parsing <div>, it emits the tag token and enters the Data state. When it then parses &#60;, the entity is decoded to < because the parser is in the Data state, and &#62; is decoded to > in the same way.



Question
After decoding, will the img be parsed as an HTML tag and cause JS to execute?

No. The parser does not switch to the Tag Open state after consuming a character reference, and a tag token can only be created from the Tag Open state. Therefore no new HTML tag is created; the decoded characters are processed only as data.
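
To make the decoding step concrete, the sketch below decodes numeric character references the way the Data state would, producing the text that ends up in the div's text node. It is a deliberately small stand-in (numeric references only, no named entities, no error handling for out-of-range values), and decodeNumericEntities is a hypothetical name.

```javascript
// Decode decimal (&#60;) and hexadecimal (&#x3c;) character references.
function decodeNumericEntities(text) {
  return text.replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    return String.fromCodePoint(codePoint);
  });
}

const textNode = decodeNumericEntities("&#60;img src=x onerror=alert(4)&#62;");
console.log(textNode); // <img src=x onerror=alert(4)>
```

The result is a string of characters held in a text node. It looks like markup when printed, but the tokenizer never saw a literal < and so never created an img element.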

5.2.3.2 Several special cases

◇ Raw text elements

In HTML, two tags are raw text elements: script and style. Everything inside such a tag belongs to that tag as raw text.
Character entities inside a raw text element are not decoded by the HTML parser. When the parser processes the content of a script tag it enters the Script Data state (for style, the RAWTEXT state), which is not one of the three states mentioned earlier that decode character entities.
Therefore, in <script>&#97;&#108;&#101;&#114;&#116;&#40;&#57;&#41;&#59;</script> the character entities are not decoded, and the JS is not executed.
◇ RCDATA elements

In HTML, two tags are RCDATA elements: textarea and title.
RCDATA elements can contain text and character entities.
When the parser processes the content of a textarea or title tag, it enters the RCDATA state.
As mentioned earlier, character entities are decoded by the parser in the RCDATA state.
Example
<textarea>&#60;script&#62;alert(5)&#60;/script&#62;</textarea>

The parser decodes the character entities as it parses them.

The JS is still not executed, however. As before, decoding a character entity does not move the state machine into the Tag Open state, so the resulting <script> is not interpreted as an HTML tag.

5.2.4 JavaScript parsing

Whether Unicode escape sequences such as \uXXXX, or hex encoding, are decoded depends on where they appear.
There are three places where a Unicode escape sequence can appear in JavaScript:
1. Strings
When a Unicode escape sequence appears in a string, it is interpreted only as an ordinary character and cannot break out of the string context.
E.g., <script>alert("\u0031\u0030");</script>
The escaped part is 10, inside a string, so it is decoded normally and the JS code is executed.
2. Identifiers
If the Unicode escape sequence appears in an identifier, that is, a name such as a variable or function name, it is decoded.
E.g., <script>\u0061\u006c\u0065\u0072\u0074(10);</script>
The escaped part is alert, a function name and therefore an identifier, so it is decoded normally and the JS code is executed.
3. Control characters
If a Unicode escape sequence stands in for a control character, it is never interpreted as that control character; the parser tries to read it as part of an identifier or of a string instead.
The control characters are ', ", ( ), and so on.
For example, in <script>alert\u0028"xss");</script> the ( is Unicode-escaped, so it is not treated as the opening parenthesis of a call but as a continuation of the identifier alert.
Therefore, control characters such as the parentheses of a function call do not work once they are Unicode-escaped.
Example
<script>\u0061\u006c\u0065\u0072\u0074\u0028\u0031\u0031\u0029</script>

The encoded part is alert(11). The JS in this example will not be executed because the control characters are encoded.

<script>\u0061\u006c\u0065\u0072\u0074(\u0031\u0032)</script>

The encoded part is alert and the 12 inside the parentheses. The JS in this example will not be executed, because the escaped part inside the parentheses cannot be interpreted as a number. Either write the digits in plain ASCII, or wrap them in "" or '' to make them a string, in which case the escapes are decoded as ordinary characters.

<script>alert('13\u0027)</script>

The encoded part is '. The JS in this example will not be executed: because the control character is encoded, the decoded ' becomes part of the string instead of terminating it. The string is therefore incomplete, since there is no ' to end it.

<script>alert('14\u000a')</script>

The JS in this example will be executed, because the encoded part is inside the string; it is interpreted only as an ordinary character (a line feed) and does not break out of the string context.
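
The examples in this section can be checked outside a browser by handing each script body to the JavaScript parser. The sketch below does this with new Function and a stub in place of alert; tryScript is a hypothetical helper, and String.raw is used only to keep the backslashes intact in the test sources.

```javascript
// Parse and run a script body with a stub "alert"; report whether it ran.
function tryScript(source) {
  const calls = [];
  const alertStub = (value) => { calls.push(value); };
  try {
    new Function("alert", source)(alertStub);
    return { executed: true, calls };
  } catch (err) {
    return { executed: false, error: err.name };
  }
}

const samples = [
  String.raw`alert("\u0031\u0030");`,                  // escape inside a string
  String.raw`\u0061\u006c\u0065\u0072\u0074(10);`,     // escaped identifier
  String.raw`alert\u0028"xss");`,                      // escaped parenthesis
  String.raw`\u0061\u006c\u0065\u0072\u0074(\u0031\u0032)`, // escaped number
  String.raw`alert('13\u0027)`,                        // escaped closing quote
  String.raw`alert('14\u000a')`,                       // escaped line feed in string
];

for (const source of samples) {
  console.log(source, "=>", JSON.stringify(tryScript(source)));
}
```

The first, second, and last samples call the stub; the other three fail to parse, which matches the rule that escapes are honoured in strings and identifier names but cannot replace punctuation or numeric literals.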


5.2.5 URL parsing

The URL parser is also modeled as a state machine, and the characters in the input stream move it between states.
First of all, note that the scheme part of the URL must consist of ASCII characters, that is, it cannot be encoded in any way; otherwise the state machine of the URL parser enters the No Scheme state.
Example
<a href="%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29"></a>

The URL-encoded part is javascript:alert(1). The JS will not be executed, because the javascript string that forms the scheme is encoded, which puts the URL parser's state machine in the No Scheme state.

The : of the URL cannot be encoded in any way either, or the URL parser's state machine will enter the No Scheme state.
Example
<a href="javascript%3aalert(3)"></a>

Because the : is URL-encoded as %3a, the URL state machine enters the No Scheme state, and the JS code cannot be executed.

Example
<a href="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;:%61%6c%65%72%74%28%32%29">

Here the javascript string is entity-encoded, the : is not encoded, and alert(2) is URL-encoded. This link executes successfully.

First, when the HTML state machine is in the Attribute Value state, character entities are decoded. The payload sits in the href attribute, so the entity-encoded javascript string is decoded.

Second, HTML parsing happens before URL parsing, so by the time the URL is parsed the javascript string in the scheme part has already been decoded and is no longer entity-encoded.
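
The three href values in this section can be run through the standard URL class, which follows the same scheme rules. The sketch assumes the HTML parser has already decoded any character entities, so the third value is given in its post-entity-decoding form; the base address example.com and the helper schemeOf are placeholders for illustration.

```javascript
// Placeholder base, needed so that input without a scheme resolves as a
// relative URL instead of throwing.
const BASE = "https://example.com/";

function schemeOf(href) {
  return new URL(href, BASE).protocol;
}

// Scheme percent-encoded: no scheme is recognised, so the value is relative.
console.log(schemeOf("%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29"));

// Colon percent-encoded: again no scheme, so the value is relative.
console.log(schemeOf("javascript%3aalert(3)"));

// Scheme and colon literal, body percent-encoded: the scheme is recognised.
const third = new URL("javascript:%61%6c%65%72%74%28%32%29", BASE);
console.log(third.protocol);                     // "javascript:"
console.log(decodeURIComponent(third.pathname)); // "alert(2)"
```

The first two values resolve against the base and keep its https: scheme, so nothing is handed to the JavaScript engine. Only the third is recognised as a javascript: URL, and its percent-encoded body decodes to the call.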

5.2.6 Parsing order

First, when the browser receives an HTML document, it triggers the HTML parser to tokenize the document. This step performs HTML decoding and builds the DOM tree.
Next, the JavaScript parser steps in to parse inline scripts; this step performs JS decoding.
When the browser encounters a context that requires a URL, the URL parser steps in to perform URL decoding. Where the URL parser runs depends on the location of the URL: it may run before or after the JavaScript parser. HTML parsing is always the first step; the relative order of URL parsing and JavaScript parsing depends on the situation.

Example
<a href="UserInput"></a>

In this example, the HTML parser first decodes the character entities in UserInput.

Then the URL parser decodes UserInput as a URL. If the scheme of the URL is javascript, the JavaScript parser then parses the rest of UserInput. So the parsing order is: HTML parsing -> URL parsing -> JavaScript parsing.

Example
<a href=# onclick="window.open('UserInput')"></a>

In this example, the HTML parser first decodes the character entities in UserInput.

Then the JavaScript parser parses and executes the JS in the onclick attribute.

When the JS runs, the argument of window.open('UserInput') is used as a URL, so the URL parser decodes UserInput.

Therefore, the parsing order is: HTML parsing -> JavaScript parsing -> URL parsing.

Example
<a href="javascript:window.open('UserInput')">

In this example, the HTML parser first decodes the character entities in UserInput. The URL parser then parses the value of the href attribute. Because the scheme is javascript, the rest is then parsed as JavaScript.

When the JS is parsed and executed, the argument of window.open('UserInput') is used as a URL, so it is parsed by the URL parser again.

So the parsing order is: HTML parsing -> URL parsing -> JavaScript parsing -> URL parsing.
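
The decoding order also suggests how to encode safely: apply the encodings in the reverse of the order in which the browser decodes them. The sketch below does this for the onclick="window.open('UserInput')" case (decoded as HTML, then JavaScript, then URL). It is a minimal illustration under the assumption that the input is meant to be a single URL component; the function name is hypothetical.

```javascript
// Encode for: HTML attribute -> JS string literal -> URL component.
// Encoding runs innermost layer first, i.e. the reverse of decoding.
function encodeForWindowOpenInOnclick(userInput) {
  // 1. URL layer (decoded last by the browser).
  const urlLayer = encodeURIComponent(userInput);
  // 2. JS string layer: escape every non-alphanumeric character as \uXXXX.
  const jsLayer = urlLayer.replace(/[^A-Za-z0-9]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"));
  // 3. HTML attribute layer (decoded first). After step 2 no HTML-significant
  //    characters remain, so this pass is kept only to show the ordering.
  return jsLayer.replace(/[&<>"']/g, (ch) => "&#" + ch.charCodeAt(0) + ";");
}

const encoded = encodeForWindowOpenInOnclick("');alert(1)//");
console.log(`<a href=# onclick="window.open('${encoded}')"></a>`);
```

After the browser undoes the HTML and JS layers, window.open receives a percent-encoded string, so the quote and parenthesis in the input never act as JavaScript syntax.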

reference
https://mp.weixin.qq.com/s/liODgY4NjYqdWg3JgPXMdA

6 New HTML5 features and corresponding security analysis

6.1 SVG

SVG stands for Scalable Vector Graphics, a way of defining images in XML.
JS in SVG

<?xml version="1.0" standalone="no"?>
<svg width="100%" height="100%" version="1.1" xmlns="http://www.w3.org/2000/svg">
	<rect width="100" height="100" style="fill:rgb(0,0,255);stroke-width:1;stroke:rgb(0,0,0)" />
  <script>alert(1)</script>
</svg>


When the image defined by the above file is opened, an alert box pops up.
Phishing with SVG

Overall process

6.2 Web Storage

Web Storage consists of two parts: sessionStorage and localStorage.
◇ sessionStorage: stores data for a single session locally. The data can be accessed only by pages in the same session and is destroyed when the session ends.
◇ localStorage: persistent local storage for the user. The data never expires unless it is actively deleted.

The new HTML5 Web Storage API allows web developers to store approximately 5 megabytes of data on the user's computer (while only 4KB of data is allowed in cookies).
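
Both storage objects expose the same small interface (length, key, getItem, setItem, removeItem, clear). The sketch below enumerates every entry through that interface. So that it can run outside a browser, it uses MemoryStorage, a hypothetical in-memory stand-in written for this example; in a page, localStorage or sessionStorage could be passed to listEntries instead.

```javascript
// Hypothetical in-memory stand-in that mimics the Storage interface.
class MemoryStorage {
  constructor() { this.items = new Map(); }
  get length() { return this.items.size; }
  key(index) {
    const keys = Array.from(this.items.keys());
    return index < keys.length ? keys[index] : null;
  }
  getItem(key) { return this.items.has(key) ? this.items.get(key) : null; }
  setItem(key, value) { this.items.set(String(key), String(value)); }
  removeItem(key) { this.items.delete(key); }
  clear() { this.items.clear(); }
}

// Collect [key, value] pairs using only length, key(), and getItem().
function listEntries(storage) {
  const entries = [];
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    entries.push([key, storage.getItem(key)]);
  }
  return entries;
}

const store = new MemoryStorage();
store.setItem("theme", "dark");
store.setItem("visits", 3); // values are always stored as strings
console.log(listEntries(store));
```

Any script running in the page's origin can read these entries in the same way, which is why sensitive values such as session tokens are risky to keep in Web Storage when the site has an XSS flaw.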
Use SVG to steal localStorage

<?xml version="1.0" standalone="no"?>
<svg version="1.1" xmlns="http://www.w3.org/2000/svg" >
    <rect width="100" height="100" />
    <script>
        if(localStorage.length)
        {
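            // for...in also visits inherited members such as length and
            // getItem; the getItem() check below filters those out.
            for(const key in localStorage)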
            {
                if(localStorage.getItem(key))
                {
                    console.log(key);
                    console.log(localStorage.getItem(key));
                }
            }
        }
    </script>
</svg>






Hope you enjoyed the article!😊