Motivation
Navigating the rise of Artificial Intelligence (AI) is like steering a ship into uncharted waters. There's a lot of potential out there, but also unknown risks. As AI becomes more capable and more unpredictable, we probably need a "big red button" that only we as humans can press when things go wrong. But how? Here's the core of what we're thinking about:
- AI's Getting Too Clever: As their intelligence grows, AI agents might cause harmful consequences we never planned for. Such behavior could be triggered by malicious attackers through methods we have already seen, such as jailbreaking and prompt hacking, or initiated by future AI agents that actively pursue goals that run against human values.
- The Ideal AI Agent - Impossible: We'd love an AI that just gets it - one that understands and pursues what humans collectively want (e.g., human prosperity). However, we currently have no way to translate human values into inputs an algorithm can consume (we can't express "human prosperity" as an objective for gradient descent).
- Software Safeguards - Not Enough: Current solutions rely on software to keep AI on the right track (e.g., NeMo Guardrails, Reinforcement Learning from Human Feedback (RLHF), and Constitutional AI), but software can be hacked. These safeguards can easily be bypassed by bad actors exactly when we need them most.
- A Hardware Lifeline? What if we had a non-programmable safeguard embedded in the very hardware that AI runs on? Think of it as an indestructible emergency stop we could activate when needed - essentially a kill switch.
Our proposal is based on the following assumptions:
- Trust in Hardware Manufacturing: Hardware production remains secure and is mostly beyond the reach of malicious interference.
- Immutable Hardware Architecture: AI "lives" on the hardware as a piece of software (its "mind") and cannot make changes to the underlying hardware (its "body"). It is also nearly impossible for malicious individuals to alter the hardware.
Therefore, embedding safety precautions in hardware seems to be a promising way to make them attack-resistant.
Desired Features of the Hardware "Stop Button"
What are the target use cases and key features of this hardware precaution?
- Targeting Intelligence: We are only concerned with applications that have intelligence beyond a certain level and can pose real danger. Intelligence, in this context, implies the ability to infer bad or misaligned outputs from benign training inputs (assuming humans can easily filter out glaringly harmful training inputs). People typically quantify intelligence in two ways:
- Computing Resources Needed: Estimates are usually based on #FLOPs, anchored to biological quantities such as the time taken by human evolution and the number of neurons in the human brain (an interesting article discusses these biological anchors).
- Memory Requirements: AI applications typically place heavy demands on memory. Scaling laws suggest that a model's performance (intelligence) grows with its size; thus, more memory supports stronger models. Large models also require high memory bandwidth to perform the matrix multiplications at the heart of the transformer architecture (see the back-of-the-envelope sketch after this list).
Governments and policymakers also use these characteristics to develop policies that constrain hardware capabilities in order to control AI performance. Recent U.S. export controls (Oct 2022, Oct 2023) specifically target the computing performance and interconnect bandwidth of exported GPUs.
- Targeting Inference: Training usually occurs in a controlled environment where each layer's results are consumed internally. It also requires massive hardware resources that are not easily accessible to individuals. It is at the inference stage that the AI agent might be granted external resources - tools that could cause harmful consequences in the real world if misused. For example, given permission to spawn a new process or request additional hardware resources, an AI might maliciously occupy all of them. The inference stage is also less costly and more accessible, which increases the chance of attack by bad actors.
- Selective Intervention: Ideally, we want the switch to intervene only in potentially dangerous AI applications, while having minimal impact on non-AI applications running on the same GPUs. This is why it is better than simply "pressing the power button," which would shut down all other operations.
- Universal Design: This isn't a specialized design for one system; we need the switch to be adaptable to the various large-scale platforms that run AI.
- Efficiency: The kill switch should do its job without eating up too much power or hardware resources. It needs to be as efficient as it is effective.
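To make the memory-requirement point concrete, here is a back-of-the-envelope sketch of the bandwidth that autoregressive transformer inference demands. The model size, precision, and generation rate below are illustrative assumptions, not measurements of any particular system.

```python
# Back-of-the-envelope sizing for a hypothetical dense transformer.
# All numbers are illustrative assumptions, not measurements.

PARAMS = 70e9          # assumed model size: 70B parameters
BYTES_PER_PARAM = 2    # FP16 weights
TOKENS_PER_SEC = 20    # assumed interactive generation rate at batch size 1

# Rough rule of thumb: ~2 FLOPs per parameter per generated token.
flops_per_token = 2 * PARAMS

# At batch size 1, every weight is streamed from memory once per token.
bytes_per_token = PARAMS * BYTES_PER_PARAM

print(f"compute per token: {flops_per_token / 1e9:.0f} GFLOPs")
print(f"memory traffic per token: {bytes_per_token / 1e9:.0f} GB")
print(f"bandwidth to sustain {TOKENS_PER_SEC} tokens/s: "
      f"{bytes_per_token * TOKENS_PER_SEC / 1e12:.1f} TB/s")
```

Even this crude estimate shows that low-batch inference moves far more bytes than the FLOP count alone would suggest, which is why both compute and memory show up in hardware-side measures of AI capability.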
Methodology
After analyzing the necessary features of the hardware stop button, we realized that we need a hardware characteristic we can adjust to hinder AI performance while leaving non-AI applications largely unaffected. Here are some of the architectural features we can tune:
- Single-Device Memory Bandwidth: By adjusting the memory read and write ports.
- Device-to-Device Interconnect: By adjusting the number of hops in the network, or the bandwidth of the interconnects.
- Number of Computing Units: By adjusting the number of active FPUs in the GPU.
- Cache Size: By adjusting the number of available cache lines.
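A roofline-style performance model is a convenient way to reason about how the first and third knobs cap a kernel's throughput. The sketch below is a simplification: interconnect hops and cache size would need a more detailed model, and the peak-compute and bandwidth figures are assumptions, not the specs of any real GPU.

```python
# Minimal roofline sketch: attainable throughput is capped by whichever of
# the compute roof or the memory roof is lower. The *_scale arguments model
# turning the corresponding hardware knob down (1.0 = untouched).

def attainable_gflops(intensity_flops_per_byte: float,
                      peak_gflops: float = 300_000,   # assumed FP16 peak
                      bandwidth_gbs: float = 2_000,    # assumed HBM bandwidth
                      compute_scale: float = 1.0,      # fraction of FPUs enabled
                      bandwidth_scale: float = 1.0) -> float:
    """Roofline model: min(compute roof, bandwidth x arithmetic intensity)."""
    compute_roof = peak_gflops * compute_scale
    memory_roof = bandwidth_gbs * bandwidth_scale * intensity_flops_per_byte
    return min(compute_roof, memory_roof)

# A memory-bound kernel (intensity ~1 FLOP/byte, typical of batch-1 decode):
print(attainable_gflops(1.0))                        # 2,000 GFLOP/s
print(attainable_gflops(1.0, bandwidth_scale=0.25))  # 500 GFLOP/s: 4x slower
print(attainable_gflops(1.0, compute_scale=0.25))    # 2,000 GFLOP/s: unaffected
```

Throttling bandwidth slows this kernel in direct proportion, while disabling compute units leaves it untouched; the situation reverses for a compute-bound kernel. That asymmetry is what the sensitivity comparison below exploits.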
Here are some applications that commonly run on GPUs; each is sensitive to different hardware resources:
- AI (Transformers, MatMul, etc.): Memory bandwidth. Low-batch inference streams the full set of model weights for every token.
- Ray Tracing: Cache size. Memory latency is usually the bottleneck.
- Scientific Computing: Number of computing units. Highly parallel, compute-heavy workloads.
- Graphics: Graphics pipeline. Balancing every stage of the pipeline is critical for utilization efficiency and fast rendering.
- Cryptography: Computing efficiency. Crypto applications require massive parallelism with minimal data dependencies.
After this sensitivity study, we found that memory bandwidth is the limiting factor unique to AI applications. Therefore, the best design for the AI "stop button" is probably to adjust the memory bandwidth.
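To illustrate this conclusion, here is a small self-contained sketch that compares arithmetic intensities against an assumed machine balance point. The hardware figures and matrix sizes are assumptions chosen for illustration, and latency-bound ray tracing or pipeline-bound graphics workloads are not captured by this simple compute-versus-bandwidth model.

```python
# A workload is bandwidth-limited when its arithmetic intensity (FLOPs per
# byte of memory traffic) falls below the machine balance point
# (peak FLOPs / memory bandwidth). All numbers are assumptions.

PEAK_FLOPS = 300e12   # assumed FP16 peak, FLOP/s
BANDWIDTH = 2e12      # assumed HBM bandwidth, B/s
BALANCE = PEAK_FLOPS / BANDWIDTH   # ~150 FLOPs per byte

def gemv_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """Batch-1 transformer decode is dominated by matrix-vector products:
    ~2*n*n FLOPs while streaming the n*n weight matrix once."""
    return (2 * n * n) / (n * n * bytes_per_elem)

def gemm_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """A well-blocked n-by-n dense GEMM (typical of compute-heavy scientific
    kernels): ~2*n^3 FLOPs over roughly 3*n^2 elements of memory traffic."""
    return (2 * n ** 3) / (3 * n * n * bytes_per_elem)

for name, ai in [("transformer decode (GEMV)", gemv_intensity(8192)),
                 ("dense GEMM, n=4096", gemm_intensity(4096))]:
    kind = "memory-bound" if ai < BALANCE else "compute-bound"
    print(f"{name:26s} ~{ai:7.1f} FLOP/B -> {kind} "
          f"(balance point ~{BALANCE:.0f} FLOP/B)")
```

Because the decode workload sits roughly two orders of magnitude below the balance point while the dense GEMM sits well above it, capping memory bandwidth slows the former almost in proportion to the cut yet leaves the latter's compute roof untouched - exactly the selective intervention the stop button is meant to provide.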