Report: Google is developing artificial intelligence that can control computers
Google is developing an artificial intelligence codenamed "Project Jarvis" that is designed to take over users' browsers and help with daily tasks such as research, shopping, and booking flights. It is expected to be previewed in December and will be powered by the next-generation Gemini large language model. Despite Google's depth in foundational AI research, it still trails OpenAI in reasoning capabilities, leaving its Gemini chatbot at a competitive disadvantage.
On October 26th, according to The Information, Google is developing an artificial intelligence that can control computers, with plans to preview this new AI product as early as December.
The product, also known as a "Computer Usage Agent," is designed to take over users' browsers and help consumers with everyday tasks such as researching products, making purchases, or booking flights. Three sources cited by The Information said the project is codenamed "Project Jarvis" and resembles a product Anthropic announced earlier this week.
They also disclosed that in December, Google will release its next-generation flagship Gemini large language model, which will power Jarvis.
Aiming to Catch Up with OpenAI, Customized for Chrome
However, the release schedule for Jarvis indicates that while Google has built a substantial research foundation in AI, it is still playing catch-up with its competitors. Google is still developing AI with so-called "reasoning capabilities," a feature OpenAI introduced as early as September.
Analysts suggest that Google's Gemini chatbot has fallen significantly behind OpenAI's ChatGPT, prompting companies to turn to OpenAI's large language models (LLMs) and making it difficult for Gemini to catch up. To improve the efficiency of its AI development, Google last week merged the team responsible for the Gemini chatbot into DeepMind, its main AI unit.
It is worth noting that AI developers now regard "agents" (AI systems capable of completing complex tasks without human supervision) as the industry's next stage. Companies such as Salesforce, Microsoft, and Workday have been purchasing LLMs from OpenAI and other providers and racing to build AI agents on top of this technology.
Anthropic and Google are attempting to take the concept of AI agents a step further by having the software interact directly with personal computers or browsers. OpenAI has also been developing similar software for most of this year.
Sources said Google's AI agent works much like the one Anthropic launched: both repeatedly capture what is on the user's screen and interpret the screenshots before taking actions such as clicking buttons or entering text in response to user commands.
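Neither company has published implementation details, but the reporting describes a screenshot-driven loop: capture the screen, have a multimodal model interpret it, then click or type. The sketch below is a hypothetical illustration of that pattern only; the function names, the Action schema, and the model call are assumptions for clarity, not Google's or Anthropic's actual API.

```python
# Illustrative sketch of a screenshot -> interpret -> act loop.
# All names here are hypothetical; they do not describe Jarvis's internals.
import time
from dataclasses import dataclass


@dataclass
class Action:
    kind: str        # e.g. "click", "type", or "done"
    x: int = 0       # screen coordinates for click actions
    y: int = 0
    text: str = ""   # text to enter for type actions


def capture_screenshot() -> bytes:
    """Grab the current browser/screen contents (placeholder)."""
    raise NotImplementedError


def call_vision_model(screenshot: bytes, goal: str) -> Action:
    """Ask a multimodal LLM to interpret the screenshot and choose the next step (placeholder)."""
    raise NotImplementedError


def perform(action: Action) -> None:
    """Execute the chosen click or keystrokes in the browser (placeholder)."""
    raise NotImplementedError


def run_agent(goal: str, max_steps: int = 20) -> None:
    """Repeatedly screenshot, interpret, and act until the task is judged complete."""
    for _ in range(max_steps):
        shot = capture_screenshot()
        action = call_vision_model(shot, goal)  # each model call takes a few seconds,
        if action.kind == "done":               # which is why such agents feel slow
            break
        perform(action)
        time.sleep(0.5)                          # let the page settle before the next screenshot
```

Under these assumptions, automating a task would amount to a call like `run_agent("book a flight from SFO to JFK next Friday")`, with the model deciding each click from the latest screenshot.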
However, there are key differences between the agent products of the two companies:
Anthropic claims its product can operate across the different applications installed on a computer, whereas Jarvis currently works only in the browser and has been "customized" for Google's Chrome browser. Insiders also indicated that, at least for now, Jarvis targets users who want to automate everyday web tasks. At Google's developer conference this spring, CEO Sundar Pichai hinted that a future Gemini version could autonomously perform multi-step operations, such as helping users return a pair of shoes.
Slow Product Response Time, Security Questioned
Insiders also noted that the plan for "Jarvis" is tentative and subject to change. Reports suggest that Google may initially release the product to a small number of early testers to help identify and address its shortcomings. The agent currently operates relatively slowly, as the model needs a few seconds to think before taking each action.
Furthermore, to complete tasks or make purchases on a customer's behalf across different websites, Google needs access to private information such as login passwords and credit card details.
Analysts point out that Google must therefore convince people that its AI agent can handle their personal data securely, which is essential for carrying out such tasks.
In addition, LLMs share common weaknesses, such as a tendency to generate incorrect answers. Google previously added LLM-generated conversational answers to its search engine, which produced many obvious errors.