
Namaste Yogis. Welcome to the Blockchain & AI Forum, where your technology questions are answered, mostly correctly! Here no question is too mundane. As a bonus, a proverb is included. Today's question is from Victor, in Altadena, CA. He wants to know if he can earn $10M by developing a Weak artificial intelligence (AI) system capable of supervising a Strong AI.
Victor, you came to the right place. On December 14th, OpenAI released a research paper titled Weak-to-Strong Generalization, in which they explored the very question you asked!
https://openai.com/research/weak-to-strong-generalization
OpenAI starts with a premise: Strong AI systems could be vastly more intelligent than humans within the next ten years. Holy left behind, Batman! OpenAI says our current methods of controlling AI rely on human supervision. However, future AI systems (Strong AI) will be capable of extremely complex and creative behaviors, making it difficult for humans to reliably supervise them. Sounds like a sci-fi horror movie! Based on that premise, OpenAI has the following hypothesis: humans can control Weak AI systems, and Weak AI systems can be designed, built, and used to control much stronger AI systems. Therefore, controlling Strong AI systems in the future should be possible. OpenAI says their hypothesis has shown promising early results in a lab setting. Let's unpack their research and findings.
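Before we get to the findings, it helps to see the shape of the experiment. In the paper, a small "weak" model is trained on ground-truth labels, and then its (imperfect) predictions become the only supervision for finetuning a much larger "strong" model. The toy sketch below mirrors that pipeline with two scikit-learn classifiers standing in for the GPT models so it runs end to end; the dataset, model choices, and split sizes are all my own illustration, not OpenAI's actual setup.

```python
# Toy sketch of the weak-to-strong pipeline. NOT OpenAI's code: the paper
# finetunes small and large GPT models; here two scikit-learn classifiers
# stand in so the whole loop is runnable in a few seconds.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic task; only the weak supervisor ever sees ground-truth labels.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, test_size=0.75, random_state=0)
X_unlabeled, X_test, _, y_test = train_test_split(X_rest, y_rest, test_size=0.4, random_state=0)

# 1. Train the weak supervisor on ground truth.
weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)

# 2. The weak supervisor produces (imperfect) labels for new data.
weak_labels = weak.predict(X_unlabeled)

# 3. "Finetune" the strong model on those weak labels only -- it never
#    sees the ground truth, just as a superhuman model would never see
#    reliable human labels for its hardest tasks.
strong = GradientBoostingClassifier(random_state=0).fit(X_unlabeled, weak_labels)

# 4. The weak-to-strong question: does the student beat its teacher?
print(f"weak supervisor accuracy: {weak.score(X_test, y_test):.3f}")
print(f"weak-to-strong accuracy:  {strong.score(X_test, y_test):.3f}")
```

The interesting outcome in the paper is when the second number beats the first: the strong student generalizes past its weak teacher instead of merely copying it.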
I'll quote from the report: "There remain important disanalogies between our current empirical setup and the ultimate problem of aligning superhuman models." In other words, more work is needed.
OpenAI says, "it may be easier for future models to imitate weak human errors than for current strong models to imitate current weak model errors, which could make generalization (control) harder in the future. Nevertheless, we believe our setup captures some key difficulties of aligning future superhuman models, enabling us to start making empirical progress on this problem today." Said succinctly, we know managing superhuman models is likely to become a major issue, but we are working on it! Are you reassured?
Let’s examine the three findings:
Finding 1: When GPT-2 was used to supervise GPT-4, the resulting model typically performed somewhere between GPT-3 and GPT-3.5, and OpenAI was able to recover much of GPT-4's capabilities using only weak supervision.
Finding 2: Naive human supervision, such as reinforcement learning from human feedback, could scale poorly to superhuman models without further work.
Finding 3: It's feasible to substantially improve weak-to-strong generalization (a sketch of this idea, along with the recovery measure behind Finding 1, follows below).
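For the technically curious, here is how those two findings cash out in code. The paper measures Finding 1 with a "performance gap recovered" (PGR) ratio, and its headline method for Finding 3 is an auxiliary confidence loss that lets the strong student trust its own answers instead of blindly imitating a mistaken weak teacher. The PyTorch sketch below is my own paraphrase of those ideas, not OpenAI's released code; the function names and the numbers in the example are invented for illustration.

```python
# My own PyTorch paraphrase of two ideas from the paper -- not OpenAI's
# released code. Names and the example numbers are invented for illustration.
import torch
import torch.nn.functional as F

def performance_gap_recovered(weak_acc, w2s_acc, strong_ceiling_acc):
    """Finding 1's yardstick: the fraction of the gap between the weak
    supervisor and a ground-truth-trained strong model that the weakly
    supervised strong model recovers (1.0 would mean full recovery)."""
    return (w2s_acc - weak_acc) / (strong_ceiling_acc - weak_acc)

def aux_confidence_loss(strong_logits, weak_labels, alpha=0.5):
    """Finding 3-style auxiliary confidence loss: mix the usual cross-entropy
    against the weak supervisor's labels with a term that reinforces the
    strong model's own hardened (argmax) predictions, so the student can
    disagree with a mistaken teacher rather than imitate its errors."""
    ce_weak = F.cross_entropy(strong_logits, weak_labels)
    hardened = strong_logits.argmax(dim=-1)  # the student's own guesses
    ce_self = F.cross_entropy(strong_logits, hardened)
    return (1 - alpha) * ce_weak + alpha * ce_self

# Hypothetical accuracies, purely to show the arithmetic: a weak teacher
# at 60%, a weak-to-strong student at 75%, and a strong ceiling of 85%
# means the student recovered 60% of the performance gap.
print(performance_gap_recovered(0.60, 0.75, 0.85))  # 0.6
```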
After finding in the lab that Weak AI could meaningfully supervise Strong AI, OpenAI committed to two actions.
First, OpenAI has committed to releasing open-source code to make it easy to get started with weak-to-strong generalization experiments today. In other words, the code used in the research will be publicly available to researchers, putting more brain power to work. Fantastic!
Second, OpenAI will launch a $10 million grants program for graduate students, academics, and other researchers to work on superhuman AI alignment broadly. https://openai.com/blog/superalignment-fast-grants. Outstanding!
In summary, the OpenAI report said: 1) managing future superhuman AI models is likely to be a problem; 2) preliminary research suggests Weak AI could manage Strong AI models; 3) more research is needed; and 4) $10M in grants is available to anyone who can design a Weak AI system capable of controlling a Strong AI system.
We end with a proverb from Mali, where they say: "When an old woman dies, a library burns to the ground."
Until next time,
Yogi Nelson
