Late-night parenting duties can sometimes lead to extraordinary ideas.
That's precisely what happened to OthersideAI developer Josh Bickett, who, while feeding his newborn, conceived a revolutionary "self-operating computer framework".
Bickett's brainwave was triggered by the capabilities of GPT-4 vision. He realised that the technology could be used to operate a computer, just like a human would. The framework he sketched out allows AI to control both the mouse and keyboard, based on visual inputs from screenshots.
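As a rough illustration of that screenshot-in, mouse-and-keyboard-out loop, here is a minimal Python sketch. It is not the framework's actual code: the `Action` type, `run_agent`, `FakeDesktop` and `scripted_model` are all hypothetical names, the "screenshot" is a fake byte string, and a scripted function stands in for the vision model so the example runs without any real screen control.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical schema)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_agent(
    objective: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Look at the screen, ask the model for an action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        action = decide(objective, screenshot, history)
        history.append(action)
        if action.kind == "done":
            break
        perform(action)
    return history


class FakeDesktop:
    """Stand-in for a real screen, mouse and keyboard (assumption for the demo)."""

    def __init__(self) -> None:
        self.focused = False
        self.typed = ""

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; here the state is encoded as text.
        return f"focused={self.focused};typed={self.typed}".encode()

    def perform(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.typed += action.text


def scripted_model(objective: str, screenshot: bytes, history: List[Action]) -> Action:
    """Stand-in for a vision model: reads the fake screenshot and picks a step."""
    state = screenshot.decode()
    if "focused=False" in state:
        return Action("click", x=100, y=200)
    if objective not in state:
        return Action("type", text=objective)
    return Action("done")


desktop = FakeDesktop()
steps = run_agent("hello", desktop.screenshot, scripted_model, desktop.perform)
print([step.kind for step in steps], desktop.typed)
```

In a real system the `decide` callable would send the screenshot to a vision-capable model and parse its reply, and `perform` would drive the actual mouse and keyboard; the loop structure is the only part this sketch tries to capture.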
OthersideAI CEO Matt Shumer was quick to recognise the potential of this innovation. He compared it to the development of self-driving cars, saying, "We have the sensors now. We have the LIDAR systems. Next, we build the intelligence."
This new approach is a significant leap from previous methods that relied solely on APIs. Shumer explained that APIs have limitations and don't extend to all operations. If you want a truly autonomous system, it needs to operate like a human, because, as he put it, "the world is built for people."
The real potential of this framework lies in the advanced computer vision and reasoning models that can be plugged into it. Bickett envisions a "plug and play" scenario where you can simply insert a better model, and the system improves.
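One way to picture that "plug and play" idea is an agent that depends only on a small model interface, so a stronger model can be dropped in without touching the rest. The Python sketch below is an assumption-laden illustration, not the framework's design: `VisionModel`, `WeakModel`, `StrongerModel` and `Agent` are hypothetical, and the "screenshot" is a fake byte string listing labelled buttons.

```python
from typing import Optional, Protocol, Tuple


class VisionModel(Protocol):
    """Anything that can find a labelled element on a screenshot (hypothetical)."""

    def locate(self, screenshot: bytes, label: str) -> Optional[Tuple[int, int]]:
        ...


class WeakModel:
    """Placeholder for an older model that fails to find anything."""

    def locate(self, screenshot: bytes, label: str) -> Optional[Tuple[int, int]]:
        return None


class StrongerModel:
    """Placeholder for a better model; parses the fake 'label@x,y' screenshot."""

    def locate(self, screenshot: bytes, label: str) -> Optional[Tuple[int, int]]:
        for entry in screenshot.decode().split(";"):
            name, _, coords = entry.partition("@")
            if name == label and coords:
                x, y = coords.split(",")
                return int(x), int(y)
        return None


class Agent:
    """The agent code stays the same whichever model is plugged in."""

    def __init__(self, model: VisionModel) -> None:
        self.model = model

    def plan_click(self, screenshot: bytes, label: str) -> str:
        point = self.model.locate(screenshot, label)
        if point is None:
            return f"cannot find {label!r}"
        return f"click {label!r} at {point}"


fake_screen = b"Save@40,12;Cancel@120,12"
print(Agent(WeakModel()).plan_click(fake_screen, "Save"))
print(Agent(StrongerModel()).plan_click(fake_screen, "Save"))
```

Only the object passed to `Agent` changes between the two calls, which is the sense in which a better model improves the system without other changes.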
Shumer predicts that this technology will eventually replace traditional computer interfaces. He envisages a world where people can delegate tasks they dislike to their computer, using simple conversational commands.
The open-source nature of the framework is expected to accelerate progress, with developers worldwide experimenting with new applications. However, achieving the vision of truly intelligent computer agents will require significant resources and innovation.
This is where AI research company Imbue comes in. They recently secured a $150 million partnership with Dell to build a powerful AI training platform. The focus of their work is on developing AI capable of reasoning, a critical component for autonomous functioning.
While the self-operating computer framework is just the first step, Bickett and Shumer see it as the dawn of a new era in which sophisticated AI agents replace traditional computer interfaces entirely.
Made with TRUST_AI - see the Charter: https://www.modelprop.co.uk/trust-ai