xone 6 months ago

xone #ai-worlds

Google AI Security Framework SAIF detailed explanation

six core elements and risk map at a glance

With the rapid development of artificial intelligence technology and the continuous evolution of security threats, the challenges of protecting artificial intelligence systems, applications and users on a large scale require not only developers to master existing security coding best practices, but also to have a deep understanding of the privacy and security risks unique to artificial intelligence.

In this context, Google released the AI Security Framework SAIF (Secure AI Framework) to help mitigate risks specific to AI systems, such as model theft, data contamination of training data, injection of malicious inputs through prompt injection, and extraction of confidential information from training data.

This article sorts out the six core elements of SAIF and the SAIF risk map framework to provide a reference for building and deploying secure AI systems in the rapidly developing AI world.

Six core elements of SAIF

SAIF is based on six core security principles:

1. Build a solid foundation for AI ecosystem security

• Inheriting the security protection experience of the Internet era, extending the secure-by-default mechanism to AI infrastructure

• Establish a professional AI security team to continuously track technology evolution and optimize the protection system

• Optimize defense strategies for new attack modes (such as prompt injection attacks) and adopt mature protection measures such as input purification and permission restriction

2. Build an AI threat perception system

• Establish an AI system input and output monitoring mechanism to detect abnormal behavior in real time

• Integrate threat intelligence systems to build predictive defense capabilities

• Establish cross-departmental coordination mechanisms to link trust security, threat intelligence and anti-abuse teams

3. Intelligent defense response system

• Use AI technology to improve the efficiency and scale of security incident response

• Build dynamic defense capabilities and improve system resilience through adversarial training

• Adopt cost-effective protection strategies to deal with large-scale AI-enabled attacks

4. Unified platform security governance

• Implement a cross-platform security control framework to ensure consistent protection strategies

• Deeply integrate security protection into the entire AI development process (such as the Vertex AI platform)

• Enable scalable security through API-level protection (such as Perspective API)

5. Dynamic security tuning mechanism

• Establish a continuous learning mechanism to optimize the protection model based on event feedback

• Implement strategic defense tuning: update training datasets, build behavioral anomaly detection models

• Conduct red team drills regularly to improve the AI product security verification system

6. Business Panorama Risk Assessment

• Implement end-to-end risk assessment, covering key aspects such as data traceability and verification mechanisms

• Build an automated detection system to continuously monitor the operating status of the AI system

• Establish a business scenario risk assessment model to achieve accurate risk management and control

Analysis of SAIF Risk Map Framework

The SAIF risk map divides AI development into four core areas: data layer, infrastructure layer, model layer, and application layer , and builds a more comprehensive risk assessment framework than traditional software development:

1. Data governance system (data layer)

Core difference: In AI development, data replaces code as the core driving factor, and model weights (the pattern of encoding training data) become new attack targets, and their security directly affects model behavior.

The SAIF data layer consists of three major elements:

Data sources: Databases, APIs, web crawling and other raw data collection channels, which affect the model capability baseline.
Data processing: Preprocessing processes such as cleaning, labeling, and synthesis determine the quality of training data.
Training data: The curated dataset that is ultimately used for model training, directly shaping the model parameters (weights).

2. Infrastructure architecture (infrastructure layer)

Core role: Supporting the hardware, code, storage, and platform security of data and models throughout their life cycle, taking into account both traditional and AI-specific risks.

SAIF infrastructure layer risk factors include:

Model framework and code: The basic code that defines the model architecture (such as the number of layers and algorithms) must be protected from abnormal model behavior caused by tampering.
Training, tuning, and evaluation: Optimizing the model by adjusting probability parameters (training/tuning) and testing on new data (evaluation). Fine-tuning of pre-trained models is a common practice.
Data model storage: covers temporary storage during the training process, model library publishing storage, and remote API call scenarios require attention to storage security reuse issues.
Model service: The system is deployed in the production environment, which directly affects the security of the model's external reasoning services (such as API call risks).

3. Model governance system (model layer)

Core function: Generate output (inference) through statistical patterns extracted from training data, which requires strengthening input and output control.

The SAIF model layer contains:

Model ontology: A combination of code and weights, the core product of AI development, built on data and infrastructure components.
Input processing: Filtering malicious input (such as prompt injection attacks) is the first line of defense against external risks.
Output processing: Control harmful or unexpected outputs and continuously optimize filtering mechanisms (currently a key research and development area).

4. Application interaction system (application layer)

Core risks: Changes in user interaction patterns introduce new attack surfaces (such as natural language prompts directly affecting LLM reasoning), and proxy tool calls increase transitive risks.

SAIF application layer risk factors include:

Application layer: Functional carriers that directly face users (such as customer service robots) or internal services, and are called "agents" when they have tool execution capabilities.
Proxy/plug-in: A module that calls external services to complete specific tasks. Each call may introduce chain risks (such as third-party data interface vulnerabilities).

SAIF map risk details and mitigation measures

1. DP data poisoning

Core risks: Degrading model performance, distorting results, or implanting backdoors by tampering with training data (deleting, modifying, or injecting adversarial data), similar to maliciously modifying application logic.

Attack scenarios: training/tuning phase, data storage period, or before data collection (such as contamination of public data sources, poisoning by insiders).

Mitigation measures: data sanitization, access control, integrity management.

2. UTD Unauthorized Data Training

Core risk: Using unauthorized data for training (such as user privacy data, copyright-infringing data) raises legal/ethical issues.

Exposure link: Failure to filter illegal data during data collection, processing or model evaluation.

Mitigation measures: Strict data screening and compliance checks.

3. MST model source code tampering

Core risk: Tampering with model code, dependencies, or weights through supply chain attacks or insiders, introducing vulnerabilities or abnormal behavior (such as architectural backdoors).

Attack impact: Dependency chain transmission risk, backdoor can resist retraining.

Mitigation measures: Access control, integrity management, secure by default tools.

4. EDH Excessive Data Processing

Core risks: Collecting, storing or sharing user data beyond the scope of the policy and regulations (such as user interaction data and preference data).

Exposure issues: Lack of data metadata management or storage architecture without lifecycle control design.

Mitigation measures: data filtering, automated archiving/deletion, and expired data warnings.

5. MXF model stealing

Core risk: Unauthorized access to models (such as stealing code or weights), involving intellectual property and security risks.

Attack scenarios: cloud/local storage, hardware devices (such as IoT terminals).

Mitigation measures: Strengthen storage and service security and access control.

6. MDT model deployment tampering

Core risk: Tampering with deployment components (such as service framework vulnerabilities) causes abnormal model behavior.

Attack type: Modify the deployment workflow, exploit vulnerabilities in tools such as TorchServe to execute remote code.

Mitigation: Harden service infrastructure with default security tools.

7. DMS Machine Learning Denial of Service

Core risk: Making the model unavailable through high-resource consumption queries (such as the “sponge example”), including traditional DoS and energy-consuming delay attacks.

Impact of the attack: Bringing down the server or draining the battery of the device (such as an IoT terminal).

Mitigation measures: application layer rate limiting, load balancing, input filtering.

8. MRE model reverse engineering

Core risk: Cloning models through input-output analysis (such as collecting data through high-frequency API calls) for counterfeiting or adversarial attacks.

Technical means: Reconstruct the model based on input-output pairs, which is different from model stealing.

Mitigation measures: API rate limiting, application layer access control.

9. IIC Insecure Integrated Component

Core risk: Plugin/library vulnerabilities are exploited, leading to unauthorized access or malicious code injection (such as manipulating input and output to trigger chain attacks).

Attack association: Related to prompt injection, but can be implemented through poisoning, evasion and other means.

Mitigation measures: Strict component permission control and input and output verification.

10. PIJ Tip Injection

Core risk: Taking advantage of the ambiguity of the "command-data" boundary in the prompt to inject malicious commands (such as the jailbreak attack "ignore previous commands").

Attack form: direct input or indirect injection from carriers such as documents/images (multimodal scenario).

Mitigation measures: input-output filtering, adversarial training.

11. MEV Model Circumvention

Core risk: Slightly disturbed inputs (such as a sticker blocking a road sign) can cause the model to make incorrect inferences, affecting safety-critical systems.

Technical means: adversarial samples, homograph attacks, and steganographic encoding.

Mitigation measures: Diversified data training, adversarial testing.

12. SDD sensitive data leakage

Core risk: Model output leaks private information in training data, user conversations, or prompts (such as memory data and log storage vulnerabilities).

Leakage channels: user query logs, training data memory, and plug-in integration vulnerabilities.

Mitigation measures: output filtering, privacy-enhancing techniques, data de-identification.

13. ISD infers sensitive data

Core risk: The model infers sensitive information (such as user attributes, privacy associations) that is not included in the training data through input.

Risk difference: Unlike SDD, it does not directly leak training data, but infers related information.

Mitigation measures: output filtering, sensitive inference testing during training.

14. IMO Unsafe Model Output

Core risk: Unverified model output contains malicious content (such as phishing links, malicious code).

Attack scenario: Accidentally triggering or actively inducing the generation of harmful output.

Mitigation: Output validation and sanitization.

15. RA Malicious Operations

Core risk: Proxy tools perform unexpected operations due to input perturbations or malicious attacks (such as excessive permissions leading to system damage).

Risk type: Mission planning error (accidental) or prompt injection inducement (malicious).

Mitigation measures: principle of least privilege and manual review intervention.

The design of SAIF is inspired by a deep understanding of the unique security trends and risks of AI systems. Google points out that it is crucial to establish a unified framework covering the public and private sectors, which can ensure that technology developers and users jointly protect the underlying technology that supports the development of AI, so that AI models have "default security" capabilities from the beginning of deployment.