Google's GPT Killer: "The Most Powerful Model" Gemini Shows Its Potential, Expected to Be Released This Autumn

Wallstreetcn
2023.08.17 08:45

Media reports claim that Google's new trump card, Gemini, combines the capabilities of three major models: GPT-4, Midjourney, and Stable Diffusion. It can also provide analytical charts, create graphics from text descriptions, and control software using text or voice commands.

Google's new killer weapon Gemini is about to meet the world!

It is rumored that Gemini not only has the ability to engage in text conversations like GPT-4, but also integrates the capabilities of Midjourney and Stable Diffusion to generate images.

In order to compete with OpenAI, Google CEO Sundar Pichai took an extraordinary step in April this year: merging Google Brain and DeepMind, two teams with completely different cultures and codebases.

Now, the Google Avengers, consisting of hundreds of engineers, are ready and working day and night, aiming to strike down OpenAI's GPT-4 and regain the leading position in the field of AI.

Google co-founder Sergey Brin has also returned to the battlefield and personally taken charge of training Gemini. It is said that Gemini will be unveiled this autumn, and Google's challenge is also imminent.

The Avengers Roster Revealed

Betting on Gemini to Build the Most Powerful Rival to GPT-4

According to insiders, Gemini combines the text capabilities of large language models with image-generation capabilities. In other words, it is a combination of GPT-4 and Midjourney/Stable Diffusion.

This is also the first time the outside world has heard that Gemini has such powerful graphic capabilities. In addition, it can provide analytical charts, create graphics with text descriptions, and control software using text or voice commands.

In late June, Google DeepMind CEO Demis Hassabis revealed that Gemini will combine techniques from AlphaGo with large language models, and that Google DeepMind is prepared to invest tens, or even hundreds, of millions of dollars.

Gemini will integrate the reinforcement learning and tree search used in AlphaGo, as well as technologies from the fields of robotics, neuroscience, and more.
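For readers unfamiliar with the tree-search technique AlphaGo popularized, here is a minimal, self-contained sketch of Monte Carlo tree search (UCT) on a toy game where players alternately take 1 or 2 stones and whoever takes the last stone wins. The game, names, and parameters are all illustrative assumptions; none of this is taken from Gemini's or AlphaGo's actual code.

```python
import math
import random

def legal_moves(pile):
    # In this toy game you may take 1 or 2 stones, never more than remain.
    return [m for m in (1, 2) if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile = pile              # stones remaining at this position
        self.parent = parent
        self.move = move              # move that led here from the parent
        self.children = []
        self.untried = legal_moves(pile)
        self.wins = 0                 # wins for the player who moved INTO this node
        self.visits = 0

    def best_child(self, c=1.4):
        # UCT rule: exploitation term plus an exploration bonus.
        return max(self.children, key=lambda ch:
                   ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(pile):
    """Random playout; returns 1 if the player to move at `pile` wins."""
    turn = 0
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        turn ^= 1
    return 1 if turn == 1 else 0      # the player who took the last stone won

def mcts(root_pile, iters=3000):
    root = Node(root_pile)
    for _ in range(iters):
        node = root
        # 1. Selection: descend via UCT until a node with untried moves.
        while not node.untried and node.children:
            node = node.best_child()
        # 2. Expansion: add one new child, if possible.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.pile - m, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new position.
        result = rollout(node.pile)
        # 4. Backpropagation: flip the winner's perspective at each level.
        while node is not None:
            node.wins += 1 - result
            node.visits += 1
            result = 1 - result
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

With enough iterations, the visit counts converge toward the minimax-optimal move; AlphaGo's contribution was to guide this same kind of search with learned neural networks instead of purely random rollouts.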

It can be said that Google is betting heavily on Gemini, which will power the Bard chatbot and drive enterprise applications such as Google Docs and Slides. In addition, Google hopes to charge developers for access to Gemini through its cloud services.

Currently, Google Cloud sells access to Google AI models through the Vertex AI product. If these new features are implemented, Google is likely to catch up with Microsoft.

After all, Microsoft is already ahead in AI products: AI features are built into Office 365 applications, and it sells its customers access to OpenAI's models.

James Cham, who invests in AI startups at Bloomberg Beta, the venture capital arm of Bloomberg, said: "For the past nine months, everyone has been asking: when will a company emerge that looks capable of surpassing OpenAI? Now, it finally seems there is a model that can compete with GPT-4."

Google Forced to Step Out of Its Comfort Zone

With the rise of OpenAI, Google has had to try out new technologies while protecting its core search business.

According to insiders, Google is likely to test Gemini in certain products before its public launch.

In the past, Google used relatively simple models to improve search, but products like Bard and Gemini require analyzing large amounts of images and text to generate more human-like responses. The potential cost of massive amounts of data and servers is something Google must control.

The updated Bard is more powerful.

Leveraging the Advantage of YouTube

According to The Information, Google has been training Gemini on a large number of YouTube videos. Moreover, Gemini is designed to handle audio and video natively, giving it the multimodal capabilities that many researchers consider the next frontier of AI.

For example, a model trained on YouTube videos could help mechanics diagnose car problems from a video, or generate software code from a sketch of the website or app a user wants to build. OpenAI President Greg Brockman has demonstrated GPT-4's ability to read an image and write web code, but that feature has yet to ship.

YouTube content could also help Google develop more advanced text-to-video software that automatically generates detailed videos from a user's description.

This is similar to technology being developed by RunwayML, a startup backed by Google. Content creators in Hollywood are watching this technology closely.

Google DeepMind launches a comprehensive counterattack

In 2011, Google established Google Brain, aiming to build its own AI to improve search results, target ads more precisely, and power features such as auto-complete in Gmail.

DeepMind, based in London, was more dedicated to academic research. In 2016, its AlphaGo defeated Lee Sedol 4 to 1, which was hailed as an important milestone on the path to artificial general intelligence (AGI). Yet aside from software DeepMind developed to improve the efficiency of Google's data centers, its work had little impact on Google's core products.

But everything changed at the end of last year.

In November 2022, OpenAI released ChatGPT. Within a few weeks its user base soared into the tens of millions, and it became the fastest product ever to reach 100 million users.

Within a few months, OpenAI's revenue reached hundreds of millions of dollars; Microsoft invested $10 billion, and capital poured in from all directions. OpenAI's valuation and reputation reached unprecedented heights.

It was at this time that Google realized that its leadership position in the field of AI was in jeopardy.

Google Brain + DeepMind = ?

In April of this year, Google, forced onto the back foot, made a major move: Google Brain and DeepMind officially merged! That these two departments, long kept apart like "two kings who never meet," would merge surprised the public.

The merged Google DeepMind will be led by CEO Demis Hassabis, while former Google AI head Jeff Dean will take on the role of Chief Scientist.

Currently, at least 26 experts are responsible for the development of Gemini, including researchers who have previously worked at Google Brain and DeepMind. Insiders revealed that DeepMind's two executives, Oriol Vinyals and Koray Kavukcuoglu, will be responsible for Gemini's development alongside former Google Brain head Jeff Dean. They will oversee hundreds of employees involved in Gemini's development.

In addition, Google co-founder Sergey Brin has also made a comeback.

Sergey Brin and Larry Page have been evaluating the Gemini model and helping employees train it. According to leaks, Brin also took part in the technical decision to retrain the model after the team discovered it had unexpectedly absorbed potentially offensive content.

The "unexpected marriage" has its challenges.

With the merger of Google Brain and DeepMind, the new team quickly ran into a very practical problem: how should the code be merged, and which software should be used for development? The two departments' code repositories had, after all, been completely independent.

Eventually, both sides reached a compromise after mutual concessions:

  • During the model's pre-training phase, the teams will use Pax, Google Brain's software for training machine-learning models.

  • In later stages, they will switch to Core Model Strike, DeepMind's model-development software.

However, according to insiders, many employees remain unhappy about having to use software they are not familiar with. Moreover, both Google Brain and DeepMind had built their own ChatGPT rivals: DeepMind had launched a project codenamed Goodall, aiming to compete with ChatGPT using variants of an undisclosed model called Chipmunk, while Google Brain had begun developing Gemini.

In the end, DeepMind decided to abandon its own efforts and collaborate with Google Brain on the development of Gemini. Interestingly, it is said that Google Brain's attitude towards remote work policies is more lenient than that of DeepMind.

Internal Struggles, Turmoil, Counterattack

While OpenAI appeared to be thriving, Google was mired in exhausting internal struggles. First, several senior technical experts, including Liam Fedus, Barret Zoph, and Luke Metz, left to join OpenAI. Google did win back some talent, rehiring Jacob Devlin and Jack Rae; still, Devlin had criticized Bard's development and left for OpenAI in January of this year, and Rae, a former DeepMind researcher, had joined OpenAI in 2022.

Devlin had earlier raised concerns with executives including Dean and Pichai about the Bard team's use of ChatGPT data for training, and subsequently resigned. To counter ChatGPT's dominance and reclaim its leadership in artificial intelligence, Google hastily released the Bard chatbot in February of this year. The launch event, however, turned into a disaster: a basic factual error wiped billions of dollars off the company's market value overnight. Google's first counterattack ended in turmoil.

In May, at the Google I/O conference, a brand new PaLM 2 model was released, greatly improving Bard's ability to answer questions and generate code.

Also announced was the Search Generative Experience (SGE), which combines generative AI with Google's traditional search service. Simply put, SGE is an AI-powered search service similar to Bing Chat, but instead of using a separate chat window, the AI-generated content is presented to users within the search results.

In other words, while conducting a search, Google will use AI to provide explanations for the search content, answer user questions, and assist users in travel planning, among other things.

Users no longer need to compare multiple links or work out the credibility of each one, because the relevant content is consolidated into the AI-generated reply.

In a recent update, Google added the ability for SGE to include images and videos in the AI-generated replies, helping users gain a more intuitive understanding of the knowledge and information they are searching for.

Similar to Bing Chat, the AI responses from SGE also include links with timestamps to support the AI-generated replies. If users are interested in related information, they can click on the links to gain a more comprehensive understanding of the specific content.

Within the AI-generated replies, users can hover over many terms and concepts to see precise definitions.

This feature has now been launched for AI replies related to science, history, economics, and other knowledge-based questions.

For users who need to wade through lengthy web pages, SGE has also introduced an in-browser AI summarization feature called "SGE while browsing."

This feature serves as a "summary generator" that is at the user's disposal at any time.

For any lengthy web page content, users can use it to generate summaries and quickly grasp the key points. In the "Explore the Page" section below, users can also see questions related to the page content. If users are interested in a question, they can simply click on it to see how the article answers these questions.

However, owing to Google's conservative go-to-market strategy, SGE is currently available only to users in the United States, who must apply for testing through a waitlist.

Therefore, most users may not even be aware that Google has launched such a service. In any case, it is reported that at least 21 generative AI tools have been tested after the merger of the two departments, including tools that provide life advice and psychological counseling to users.

Google, which last year fired an engineer for claiming its chat AI was conscious, is now exploring this "sensitive" territory, which shows it is truly determined to gamble.

The Gemini Project: Current Situation

Still, the merger of the two teams came as a big surprise to some of the engineers responsible for the Gemini project.

James Molloy and Tom Hennigan, who previously worked at DeepMind, are responsible for the infrastructure together with Paul Barham, a senior researcher at Google.

Timothy Lillicrap previously worked at DeepMind on chess and Go systems, while Emily Pitler, a Google Brain researcher, leads a team focused on enabling LLMs to handle specialized tasks such as mathematics and web search.

Beyond personnel arrangements within the merged organization, the Gemini team also faces significant development challenges, such as determining what data can be used for model training.

As a result, Google's lawyers have been closely scrutinizing the training work. In one case, they asked researchers to remove training data drawn from textbooks, out of concern about objections from copyright holders, even though that data could have helped the model answer questions in fields such as astronomy or biology.

Still, Aydin Senkut, a former Google executive and founder of the VC firm Felicis Ventures, praised Google's push on Gemini, saying it shows "Google's determination to once again be at the forefront, rather than being extremely conservative."

Aydin Senkut also strongly agrees with Google's decision: "This is the right direction. In the end, they will succeed."