XSS Vulnerabilities
1 Definition and principle
XSS (Cross-Site Scripting) is an attack in which the browser treats content supplied by a user as script and executes it, so malicious code runs in the victim's browser. This kind of attack against the user's browser is called cross-site scripting.
It is mainly divided into three types:
• Reflected
• Stored
• DOM-based
Harms of XSS:
◇ Stealing cookies
◇ Hijacking accounts
◇ Downloading malware
◇ Keylogging
◇ Redirecting traffic for advertising
2 Reflected XSS
2.1 Principle
The application or API includes unvalidated and unescaped user input directly in its HTML output. A successful attack allows the attacker to execute arbitrary HTML and JavaScript in the victim's browser.
Feature: non-persistent; the attack is triggered only when the user clicks a link carrying specially crafted parameters.
Scope of impact: only the users who open the crafted link and thus execute the script.
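As a minimal sketch of the reflected case (Node.js; renderSearchUnsafe, renderSearchSafe, escapeHtml and the example.com URL are hypothetical names, not from any framework), a handler that drops a query parameter straight into HTML reproduces the problem, while escaping the HTML-significant characters on output turns the payload into inert text:

```javascript
// Hypothetical handler: reflects the q query parameter into the page as-is.
function renderSearchUnsafe(q) {
  return `<p>Results for: ${q}</p>`; // user input concatenated without escaping
}

// Escape the characters that are significant in HTML text and quoted attributes.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function renderSearchSafe(q) {
  return `<p>Results for: ${escapeHtml(q)}</p>`;
}

// What a crafted link (hypothetical URL) delivers to the handler.
const q = new URL(
  "https://example.com/search?q=%3Cscript%3Ealert(1)%3C/script%3E"
).searchParams.get("q");

console.log(renderSearchUnsafe(q)); // the script tag reaches the page intact
console.log(renderSearchSafe(q));   // the same input rendered as text
```

This escaping only covers HTML body text and quoted attribute values; URLs, inline JS and CSS contexts each need their own encoding.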
3 Stored XSS
3.1 Principle
In stored XSS, the application receives untrusted data through web requests and stores it in the database without checking it for XSS code. When the data is later read back from the database, the program does not filter it, so the page executes the XSS code again; stored XSS therefore keeps attacking every user who views the affected page.
Places where stored XSS commonly appears:
◇ Message boards
◇ Comment sections
◇ Profile pictures
◇ Signatures
◇ Blogs
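The persistence that distinguishes stored XSS can be sketched the same way, assuming a hypothetical in-memory array in place of the database (postComment, renderCommentsUnsafe and renderCommentsSafe are illustrative names): the payload is submitted once and then served to every later visitor, and escaping at render time neutralizes it even though the stored data is unchanged.

```javascript
// Hypothetical in-memory "database" standing in for a comments table.
const comments = [];

// Write path: stores the comment exactly as submitted (no XSS check).
function postComment(author, text) {
  comments.push({ author, text });
}

// Read path, vulnerable: every later page view re-emits the stored markup.
function renderCommentsUnsafe() {
  return comments.map((c) => `<li>${c.author}: ${c.text}</li>`).join("\n");
}

// Read path, escaped on output: the same stored data is rendered as text.
const ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
const escapeHtml = (s) => String(s).replace(/[&<>"']/g, (ch) => ESCAPES[ch]);

function renderCommentsSafe() {
  return comments
    .map((c) => `<li>${escapeHtml(c.author)}: ${escapeHtml(c.text)}</li>`)
    .join("\n");
}

postComment("mallory", "<img src=x onerror=alert(document.cookie)>");
postComment("alice", "Nice post!");

// Two separate visitors receive the payload without clicking any crafted link.
for (const visitor of ["bob", "carol"]) {
  console.log(`--- page served to ${visitor} ---`);
  console.log(renderCommentsUnsafe());
}
console.log("--- escaped rendering ---");
console.log(renderCommentsSafe());
```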
4 DOM-based XSS
4.1 Principle
4.1.1 DOM
The DOM (Document Object Model) represents a document as a logical tree. Each branch ends in a node, and each node contains objects. DOM methods let you manipulate the tree programmatically, changing the structure, style or content of the document.
4.1.2 DOM XSS
DOM XSS is essentially a special type of reflected XSS: JS writes data into the page dynamically by manipulating the DOM tree, without the data having to be submitted to the server. It is a vulnerability rooted in the Document Object Model.
4.1.3 Example
The following is a DOM XSS because the JS code dynamically concatenates input into markup. Take the following POC as an example:

You can see that the code in the div is HTML-entity encoded, yet the alert still pops up in the end.
Note that a <script> element inserted through innerHTML is not executed.
For example, you can dynamically insert a DOM node as follows:
You will find that the script tag inserted into
<div id=test>
is not executed. Frameworks such as jQuery, however, eval the script elements of a node when inserting it, so they do run: the append() method is designed to execute the inserted elements, because there is a real need for that behavior.
4.1.4 Similarities, differences and harms compared with reflected XSS
Similarity: in both, input is not properly controlled, and JavaScript supplied as input ends up in the HTML page as output.
Differences:
Reflected XSS takes effect after the input has passed through the back end: the page renders the back end's output.
DOM XSS is inserted into the page by front-end JS that manipulates the DOM tree directly.
Harm:
When the front end and back end are separated, the payload can be processed entirely in the browser, so it never passes through WAF inspection.
5 Pseudo-protocols and encoding bypasses
5.1 Pseudo-protocols
Pseudo-protocols differ from the protocols widely used on the Internet, such as http://, https:// and ftp://. They are used in URLs to perform specific functions.
Data pseudo-protocol:
data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTs8L3NjcmlwdD4=
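The base64 part of the data URL above is just the payload <script>alert(1);</script>. A quick Node.js check with the built-in Buffer (a sketch run outside the browser, not browser code) confirms the round trip:

```javascript
// Encode the payload into a data: URL, then decode it back.
const payload = "<script>alert(1);</script>";
const encoded = Buffer.from(payload, "utf8").toString("base64");
const dataUrl = `data:text/html;base64,${encoded}`;

// Everything after the first comma is the base64 body.
const body = dataUrl.slice(dataUrl.indexOf(",") + 1);
const decoded = Buffer.from(body, "base64").toString("utf8");

console.log(dataUrl);
console.log(decoded);
```

Keep in mind that current browsers block page-initiated top-level navigation to data: URLs and give data: documents an opaque origin, so such a document cannot read the embedding site's cookies directly.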
JavaScript pseudo-protocol:
javascript:alert("1")
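Filters that block this pseudo-protocol by plain string matching are easy to get wrong, because the URL parser normalizes the input before it looks at the scheme: the scheme is case-insensitive, leading spaces are trimmed, and tab or newline characters are removed. The Node.js sketch below (naiveIsDangerous and schemeOf are hypothetical helpers) compares a literal prefix check with the scheme reported by the WHATWG URL parser, exposed as the global URL class:

```javascript
// A naive filter: only catches hrefs that literally start with "javascript:".
const naiveIsDangerous = (href) => href.startsWith("javascript:");

// Let the WHATWG URL parser normalize the input, then read the scheme.
function schemeOf(href) {
  try {
    return new URL(href).protocol; // lower-cased, e.g. "javascript:"
  } catch {
    return null; // relative or invalid URL: no scheme of its own
  }
}

const candidates = [
  'javascript:alert("1")',
  'JaVaScRiPt:alert("1")',    // scheme matching is case-insensitive
  ' java\tscript:alert("1")', // leading space trimmed, tab removed
  "data:text/html;base64,PHNjcmlwdD5hbGVydCgxKTs8L3NjcmlwdD4=",
];

for (const href of candidates) {
  console.log(JSON.stringify(href), naiveIsDangerous(href), schemeOf(href));
}
```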
5.2 Encoding bypass
5.2.1 Unicode encoding
ISO (International Organization for Standardization) developed a universal character set (ISO/IEC 10646, kept in sync with Unicode) that covers all letters and symbols of all the world's writing systems. It was originally designed to use two bytes per character, but code points now range up to U+10FFFF.
Unicode is only a symbol set: it assigns each symbol a number (code point) but does not specify how that number is stored in binary. Storage is defined by encodings such as UTF-8 and UTF-16.
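To make the difference between a code point and its stored form concrete, a small Node.js sketch (describeChar is a hypothetical helper) prints the code point, the number of UTF-16 code units a JS string uses, and the UTF-8 bytes for three characters:

```javascript
// One code point, two storage forms: UTF-16 (JS strings) and UTF-8 (bytes).
function describeChar(ch) {
  const cp = ch.codePointAt(0);
  return {
    codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    utf16Units: ch.length, // JS string length counts UTF-16 code units
    utf8Bytes: Buffer.from(ch, "utf8").toString("hex"),
  };
}

console.log(describeChar("<"));         // ASCII: one unit, one byte
console.log(describeChar("\u00e9"));    // é: one UTF-16 unit, two UTF-8 bytes
console.log(describeChar("\u{1F600}")); // beyond two bytes: a surrogate pair in UTF-16
```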
5.2.2 Browser decoding
Parsing an HTML document involves three main processes: HTML parsing (which builds the DOM tree), URL parsing and JavaScript parsing. Each parser decodes and parses its own part of the HTML document, and the parsers run in different orders.
5.2.3 HTML parsing process
5.2.3.1 Parsing process
HTML has 5 types of elements:
1. Void elements, including area, base, br, col, command, embed, hr, img, input, keygen, link, meta, param, source, track, wbr, etc.
2. Raw text elements, including <script> and <style>
3. Escapable raw text elements (RCDATA elements): <textarea> and <title>
4. Foreign elements, such as elements in the MathML namespace or SVG namespace
5. Normal elements: all elements other than the four types above
The differences between the five types of elements are as follows:
1. Void elements cannot have any content (they have no end tag, so nothing can be placed between a start tag and an end tag).
2. Raw text elements can contain text.
3. RCDATA elements can contain text and character references.
4. Foreign elements can contain text, character references, CDATA sections, other elements and comments.
5. Normal elements can contain text, character references, other elements and comments.
The HTML parser operates as a state machine. It consumes characters from the document input stream and switches between states according to its transition rules.
Take the following code as an example:
1. The initial state is the "Data" state. When the < character is encountered, the state changes to "Tag open". Reading an a-z character creates a start tag token, and the state changes to "Tag name". The parser stays in this state until > is read, appending each character to the token's name. In the example, an html token is created.
2. When > is read, the current token is complete and the state returns to "Data". The <body> tag goes through the same process, so both the html and body tags are now recognized. Back in the "Data" state, the parser reads each character of "This is Geekby's blog" and emits a character token for each.
3. This continues until the < of </body> is encountered. The parser returns to "Tag open", reads the next character /, enters "End tag open", creates an end tag token and moves to "Tag name", staying there until it reaches >. It then emits the new tag token and returns to "Data". The remaining end tags are processed the same way.
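The walkthrough can be mirrored by a toy state machine. This is a simplified sketch, not the full HTML tokenizer: tokenize is a hypothetical function that implements only the Data, Tag open, End tag open and Tag name states, ignores attributes, comments and character references, and is run on a one-line document of the kind described above.

```javascript
// Toy tokenizer covering only the four states from the walkthrough.
function tokenize(input) {
  const tokens = [];
  let state = "Data";
  let tag = null; // tag token currently being built

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    switch (state) {
      case "Data":
        if (ch === "<") state = "Tag open";
        else tokens.push({ type: "Character", data: ch }); // one token per character
        break;
      case "Tag open":
        if (ch === "/") {
          state = "End tag open";
        } else if (/[A-Za-z]/.test(ch)) {
          tag = { type: "StartTag", name: ch.toLowerCase() };
          state = "Tag name";
        } else {
          // Not a tag: emit "<" as text and reconsume ch in the Data state.
          tokens.push({ type: "Character", data: "<" });
          state = "Data";
          i--;
        }
        break;
      case "End tag open":
        if (/[A-Za-z]/.test(ch)) {
          tag = { type: "EndTag", name: ch.toLowerCase() };
          state = "Tag name";
        } else {
          state = "Data"; // simplified: the real parser reports an error here
        }
        break;
      case "Tag name":
        if (ch === ">") {
          tokens.push(tag); // tag token complete
          tag = null;
          state = "Data";
        } else {
          tag.name += ch.toLowerCase(); // no attribute handling in this sketch
        }
        break;
    }
  }
  return tokens;
}

const tokens = tokenize("<html><body>This is Geekby's blog</body></html>");
console.log(tokens.filter((t) => t.type !== "Character").map((t) => `${t.type}:${t.name}`));
console.log(tokens.filter((t) => t.type === "Character").map((t) => t.data).join(""));
```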
5.2.3.2 Several special cases
◇ Raw text elements
In HTML, two tags are raw text elements: script and style. Everything between the start and end tags of a raw text element is treated as that element's content.
Character entities inside raw text elements are not decoded by the HTML parser. When the parser reads the content (data) of script and style tags, it enters the Script data state (for script) or the RAWTEXT state (for style), neither of which is among the three states mentioned earlier that decode character entities.
Therefore, in
<script>&#97;&#108;&#101;&#114;&#116;&#40;&#57;&#41;&#59;</script>
the character entities are not decoded, and the JS is not executed.
◇ RCDATA elements
In HTML, there are two tags belonging to RCDATA: textarea and title.
RCDATA elements can contain text content and character entities.
When the parser reads the content of textarea and title tags, it enters the RCDATA state.
As mentioned earlier, character entities are decoded in the RCDATA state. The decoded characters are treated only as text, however: a decoded < does not open a new tag, so an entity-encoded payload inside textarea or title does not execute.
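The two special cases can be contrasted in a small Node.js sketch. runInline, decodeCharRefs and parseTextarea are hypothetical helpers: the first hands script text to the JS engine untouched, as the Script data state does; the second is a deliberately incomplete character-reference decoder; the third mimics the RCDATA state for a textarea. The alert seen by the snippets is a stand-in that records its argument, since Node has no alert().

```javascript
// Compile and run a snippet with a stand-in alert that records its argument.
function runInline(source) {
  const calls = [];
  try {
    new Function("alert", source)((x) => calls.push(x));
    return { ok: true, calls };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// Minimal decoder: numeric references plus a few named ones (the real HTML
// named-reference table is far larger).
const NAMED = new Map([["lt", "<"], ["gt", ">"], ["amp", "&"], ["quot", '"'], ["apos", "'"]]);
function decodeCharRefs(s) {
  return s.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (m, ref) => {
    if (ref[0] !== "#") return NAMED.get(ref) ?? m;
    const hex = ref[1] === "x" || ref[1] === "X";
    const cp = parseInt(ref.slice(hex ? 2 : 1), hex ? 16 : 10);
    return cp > 0x10ffff ? "\uFFFD" : String.fromCodePoint(cp);
  });
}

// Raw text (<script>): the entities reach the JS parser as-is and fail to compile.
const scriptContent = "&#97;&#108;&#101;&#114;&#116;&#40;&#57;&#41;&#59;";
console.log(runInline(scriptContent));                 // SyntaxError
console.log(runInline(decodeCharRefs(scriptContent))); // what decoding would have produced

// RCDATA (<textarea>): references are decoded, but only into text. Only a literal
// "</textarea>" ends the element (simplified: the spec also accepts other letter
// cases and whitespace or "/" after the tag name).
function parseTextarea(html) {
  const open = "<textarea>";
  const close = "</textarea>";
  if (!html.startsWith(open)) throw new Error("expected <textarea>");
  const end = html.indexOf(close, open.length);
  const raw = html.slice(open.length, end === -1 ? html.length : end);
  return { type: "Text", data: decodeCharRefs(raw) };
}

const textarea = parseTextarea(
  "<textarea>&lt;/textarea&gt;&#60;script&#62;alert(1)&lt;/script&gt;</textarea>"
);
console.log(textarea); // one text node: the encoded end tag did not close the element
```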
5.2.4 JavaScript parsing
Whether Unicode escape sequences such as \uXXXX, or hex encoding, are decoded depends on where they appear.
There are three places where Unicode escape sequences can appear in JavaScript:
1. String
When a Unicode escape sequence appears in a string, it is interpreted as an ordinary character and does not break out of the string context.
For example,
<script>alert("\u0031\u0030");</script>
The escaped part is 10, which is inside a string, so it is decoded normally and the JS code runs.
2. Identifier
If a Unicode escape sequence appears in an identifier, such as a variable or function name, it is decoded.
For example,
<script>\u0061\u006c\u0065\u0072\u0074(10);</script>
The escaped part is alert, the function name. Because it is an identifier, it is decoded normally and the JS code runs.
3. Control characters
If a Unicode escape sequence is used for a control character, it is decoded but not interpreted as that control character; instead it is treated as part of an identifier or a string.
Control characters here include ', ", ( and ).
For example,
<script>alert\u0028"xss");</script>
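Here the ( has been Unicode-escaped, so after decoding it is no longer treated as a control character: the parser tries to read it as part of the alert identifier, and because ( is not a valid identifier character the script fails with a syntax error. Therefore, control characters such as a function call's parentheses cannot be written as Unicode escapes.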
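The three cases can be checked in Node.js, since V8 follows the same JavaScript rules. In this sketch run is a hypothetical helper that compiles each snippet with new Function and passes in a stand-in alert that records its argument; String.raw keeps the backslashes, so it is the JS parser compiling the snippet, not this file's string literals, that sees the \uXXXX escapes.

```javascript
// Compile a snippet with a stand-in alert (Node has none) and report the result.
function run(source) {
  const calls = [];
  try {
    new Function("alert", source)((x) => calls.push(x));
    return { ok: true, calls };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

const cases = {
  string: String.raw`alert("\u0031\u0030");`,                // escape inside a string
  identifier: String.raw`\u0061\u006c\u0065\u0072\u0074(10);`, // escaped identifier
  control: String.raw`alert\u0028"xss");`,                     // escaped "("
};

for (const [name, src] of Object.entries(cases)) {
  console.log(name, run(src));
}
```

The string case records the string "10", the identifier case records the number 10, and the control-character case never runs because compilation fails.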
5.2.5 URL parsing
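The URL parser is also modeled as a state machine: characters from the input stream drive it into different states.
First, note that the scheme (protocol) part of the URL, such as javascript:, must be written as literal ASCII characters. It cannot be encoded in any way at the URL-parsing stage (for example, percent-encoded); otherwise the URL parser's state machine enters the "No scheme" state. HTML character references in an attribute value are a different matter, because the HTML parser decodes them before the URL parser runs.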
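With Node.js's WHATWG URL class (a sketch; tryParse is a hypothetical wrapper), a percent-encoded scheme makes parsing fail outright when no base URL is given, while the same encoding in the body of a javascript: URL is accepted and only decoded later:

```javascript
// Parse without a base URL and report either the scheme and body or the error.
function tryParse(input) {
  try {
    const u = new URL(input);
    return { protocol: u.protocol, body: u.pathname };
  } catch (e) {
    return { error: e.name }; // Node throws a TypeError (code ERR_INVALID_URL)
  }
}

const encodedScheme = "%6a%61%76%61%73%63%72%69%70%74:%61%6c%65%72%74%28%31%29";
const literalScheme = "javascript:%61%6c%65%72%74%28%31%29";

console.log(tryParse(encodedScheme)); // "No scheme" state with no base: failure
const parsed = tryParse(literalScheme);
console.log(parsed, decodeURIComponent(parsed.body));
```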
5.2.6 Parsing order
When the browser receives an HTML document, the HTML parser first tokenizes it; this step performs HTML decoding and builds the DOM tree.
Next, the JavaScript parser parses the inline scripts, which performs JS decoding.
When the browser encounters a context that requires a URL, the URL parser steps in to perform URL decoding. Depending on where the URL appears, this can happen before or after JavaScript parsing. In short, HTML parsing is always the first step, while the relative order of URL parsing and JavaScript parsing depends on the context.
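For the common case of a javascript: URL in an href attribute, the order works out as HTML decoding, then URL parsing and percent-decoding, then JavaScript parsing. The sketch below simulates that pipeline in Node.js; htmlDecode and followHref are hypothetical helpers, the HTML step handles only numeric character references, and the alert passed in is a stand-in that records its argument.

```javascript
// Stage 1, HTML parser: decode numeric character references in the attribute value.
const htmlDecode = (s) =>
  s.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (_, n) =>
    String.fromCodePoint(
      n[0] === "x" || n[0] === "X" ? parseInt(n.slice(1), 16) : parseInt(n, 10)
    )
  );

// Simulate following <a href="...">: HTML decode -> URL parse -> percent-decode -> JS.
function followHref(attrValue, fakeAlert) {
  const afterHtml = htmlDecode(attrValue);
  const url = new URL(afterHtml);                  // Stage 2, URL parser
  if (url.protocol !== "javascript:") return { ran: false, afterHtml };
  const script = decodeURIComponent(url.pathname); // Stage 3, percent-decoding of the body
  new Function("alert", script)(fakeAlert);        // Stage 4, JS parser
  return { ran: true, afterHtml, script };
}

const calls = [];
const result = followHref("&#106;avascript:%61%6c%65%72%74(1)", (x) => calls.push(x));
console.log(result, calls);
```

Because each layer decodes only its own encoding, a payload that survives has its layers nested in the same order: the HTML encoding is outermost and the JS is innermost.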
https://mp.weixin.qq.com/s/liODgY4NjYqdWg3JgPXMdA
6 New HTML5 features and their security analysis
6.1 SVG
SVG (Scalable Vector Graphics) is an XML-based format for defining images.
JS in SVG
When the image defined by the file above is opened directly in the browser, a pop-up window appears.
Phishing with SVG
Overall process
6.2 Web Storage
Web Storage consists of two parts: sessionStorage and localStorage.
◇ sessionStorage: stores data locally for a single session. The data can only be accessed by pages in the same session and is destroyed when the session ends.
◇ localStorage: persistent local storage for the user; the data never expires unless it is explicitly deleted.
The HTML5 Web Storage API lets web developers store roughly 5 MB of data per origin on the user's computer in most browsers, while a cookie can hold only about 4 KB.
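Both storage objects implement the same Storage interface (length, key(), getItem(), setItem(), removeItem(), clear()). Outside a browser, a hypothetical in-memory MemoryStorage class can stand in for it. The sketch shows that values are stored as strings and that a few lines using only length, key() and getItem() (here in a hypothetical dumpStorage helper) can read out everything, which is what an injected script running in the page's origin can do:

```javascript
// Hypothetical in-memory stand-in for the Storage interface.
class MemoryStorage {
  #map = new Map();
  get length() { return this.#map.size; }
  // Real browsers define key order per user agent; a Map keeps insertion order.
  key(i) { return [...this.#map.keys()][i] ?? null; }
  getItem(k) { return this.#map.has(String(k)) ? this.#map.get(String(k)) : null; }
  setItem(k, v) { this.#map.set(String(k), String(v)); } // values become strings
  removeItem(k) { this.#map.delete(String(k)); }
  clear() { this.#map.clear(); }
}

// Read every entry using only length, key() and getItem().
function dumpStorage(storage) {
  const out = {};
  for (let i = 0; i < storage.length; i++) {
    const k = storage.key(i);
    out[k] = storage.getItem(k);
  }
  return out;
}

const store = new MemoryStorage();
store.setItem("token", "abc123"); // hypothetical session token
store.setItem("visits", 3);       // stored as the string "3"
console.log(dumpStorage(store));
```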
Using SVG to steal localStorage