
Apple's AI-powered iOS debuts with a bang: Instantly boosts conversational EQ, large models become the ultimate mouthpiece, Siri undergoes a magnificent transformation

Apple's AI-powered iOS had a hot first day: Siri has been fully transformed into "Apple Intelligence & Siri," capable of polishing tweets and gracefully rewriting rude language. Apple's large model was trained on Google TPU clusters and surpasses GPT-4 on some tasks. Ruoming Pang, head of Apple's foundation model team, said these foundation models support a wide range of functions, including summarization, writing assistance, tool use, and coding.

It's here, it's here! Apple Intelligence finally meets Apple fans!

With the launch of iOS 18.1 Beta, registered developers can now experience some of Apple's AI features.

The most obvious change is the comprehensive transformation of Siri, which has become Apple Intelligence & Siri.

Another major update is the writing tools, which can polish tweets and effortlessly dress up your phrasing.

Even profanity can be turned into polite, friendly wording in no time:

After activating Apple Intelligence, Apple's self-developed on-device large model will be downloaded to the device.

According to early hands-on reports from netizens, unlike some other AI services, Apple's AI rarely refuses to answer.

At the same time, Apple's in-house large model report has also been released, revealing a wealth of technical detail.

The report shows that on tasks such as instruction following and text summarization, Apple's cloud-side large model achieves results surpassing GPT-4.

Ruoming Pang, head of Apple's foundation model team, also stated that their models are competitive with some of the best models in the same class.

Pang holds a Ph.D. in computer science from Princeton University, with bachelor's and master's degrees from Shanghai Jiao Tong University and the University of Southern California, respectively. He joined Apple in 2021 after 15 years as an engineer at Google.

The main conversational features of Apple Intelligence are supported by the models developed by his team.

He also emphasized that these foundation models are "not chatbots" but support a wide range of functions, including summarization, writing assistance, tool use, and coding.

Additionally, Apple has developed many in-house algorithms to enhance model performance, with specific information disclosed in the report.

Some attentive netizens have also discovered a detail:

The training of Apple's large models uses Google's TPU cluster, with zero involvement from NVIDIA.

Siri Upgrade, but ChatGPT Not Yet Integrated

To experience Apple's Apple Intelligence, there are several requirements that need to be met.

First, the iOS 18.1 beta that supports it is currently limited to registered developers ($99 per year), so ordinary users will have to wait.

As mentioned earlier, it only supports M-series chips and the A17 Pro, which among iPhones means only the 15 Pro and 15 Pro Max can run it.

Apart from the hardware and developer-account requirements, system settings also need adjusting: the region must be set to the United States, and the device and Siri languages to English.

After meeting all these requirements, you can... join the waiting queue.

The Apple Intelligence launched this time is a partial feature, mainly focusing on text generation, Siri, and the Photos module.

Starting with text generation: as an important part of Apple's AI, this feature is not limited to official Apple applications. As long as an app uses the standard text-input system, text summarization, proofreading, and rewriting can be used in third-party applications as well.

In addition, combined with the audio transcription feature already available in Voice Memos in the iOS 18 beta, the text generation system can also produce summaries of recordings.

The second important update is Siri.

In the interface, the new Siri is no longer a circular icon but a colorful glow that pulses around the edge of the screen while it works.

It also offers a text-based mode for users who prefer not to speak: double-tapping the bottom of the screen brings up the keyboard for typed conversations with Siri.

In terms of content, the new version of Siri will be able to answer questions related to Apple products and help users troubleshoot.

Furthermore, the new Siri can understand the context from one query to the next, for example, asking Siri to create a calendar event and then requesting to create a reminder without restating the topic being discussed.

However, the previously introduced screen-aware feature is not included in this Siri update.

The update to the Photos app allows users to search for specific photos using natural language, even specific moments in videos.

The above covers the AI-related content of this developer beta. Note that this is only part of what was shown at the earlier event; many features have yet to ship.

In particular, the previously mentioned integration of ChatGPT has not been included in this update.

Decrypting Apple's Large Model

Apple has stated that ChatGPT is not a mandatory part of Apple's AI; the main functionality is driven by its in-house large model.

Regarding this model, Apple also released a comprehensive technical report at the same time it was launched.

The model's name is straightforward, called the Apple Foundation Model (AFM), with both on-device and server versions.

The on-device model has around 3 billion parameters, while the server-side model's exact size was not disclosed, only that it is larger than the on-device one; both have a 32k context window.

The training process has 0 NVIDIA content

The model training is carried out using the in-house AXLearn framework based on JAX, employing strategies such as tensor parallelism and pipeline parallelism.

The hardware used is Google TPU, with 8192 TPUv4 chips on the cloud side and 2048 TPUv5p chips on the edge side, resulting in 0 NVIDIA content.
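AXLearn itself is built on JAX's sharding machinery. As a rough, minimal sketch of what combining data and tensor parallelism looks like in raw JAX (the mesh shape, axis names, and dimensions below are illustrative assumptions, not AFM's actual configuration, and pipeline parallelism is omitted):

```python
# Minimal JAX sketch of sharding a projection layer across a TPU mesh.
# Mesh shape, axis names, and sizes are illustrative assumptions, not
# AFM's actual AXLearn configuration; assumes an 8-chip slice.
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Arrange 8 chips as 2-way data parallelism x 4-way tensor parallelism.
mesh = Mesh(mesh_utils.create_device_mesh((2, 4)), axis_names=("data", "model"))

# Column-shard the weight over the "model" axis (tensor parallelism);
# row-shard the activations over the "data" axis (data parallelism).
w = jax.device_put(jnp.zeros((1024, 4096)), NamedSharding(mesh, P(None, "model")))
x = jax.device_put(jnp.ones((16, 1024)), NamedSharding(mesh, P("data", None)))

@jax.jit
def forward(x, w):
    # XLA inserts whatever collectives the shardings imply.
    return x @ w

y = forward(x, w)  # output ends up sharded as P("data", "model")
```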

The data mainly comes from web pages crawled by Applebot, as well as publicly licensed code and mathematical datasets.

It is worth mentioning that none of the datasets selected by Apple are under the GPL license; they are all under more permissive licenses such as MIT, Apache, and CC0.

In terms of process, the pre-training process of AFM is divided into three stages - core training, continued training, and context extension.

During the core training stage, the cloud-side version is trained on 6.3T tokens with a window length of 4096, and the edge-side version is distilled from it.

During continued training, the weights of low-quality data are reduced, and mathematical, code, and authorized high-quality data are used to enhance the model's capabilities.

This process uses data of 1T tokens, and the window length is increased from 4096 to 8192.

In the next stage, the window length is further extended to 32k, involving long sequence text and synthetic data, with a total of 100B tokens.
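Putting the report's numbers in one place, the cloud-side schedule looks roughly like this (a plain-Python restatement of the figures above; the class and field names are ours, not Apple's):

```python
# Plain-Python summary of AFM's three pre-training stages as described in
# the report; the class and field names are our own, not Apple's code.
from dataclasses import dataclass

@dataclass
class PretrainStage:
    name: str
    tokens: float        # total training tokens in the stage
    context_window: int  # sequence length in tokens

AFM_SERVER_SCHEDULE = [
    PretrainStage("core", tokens=6.3e12, context_window=4096),
    PretrainStage("continued", tokens=1.0e12, context_window=8192),
    PretrainStage("context-lengthening", tokens=1.0e11, context_window=32768),
]

for stage in AFM_SERVER_SCHEDULE:
    print(f"{stage.name}: {stage.tokens:.1e} tokens @ {stage.context_window}-token window")
```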

Innovative reinforcement learning algorithms

The post-training of AFM includes Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF).

In the SFT stage, synthetic data and human-annotated data are used, with synthetic data mainly focusing on mathematics, tool usage, and code.
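For a sense of what such data can look like, here is a purely hypothetical tool-use SFT record; the schema is invented for illustration and is not Apple's actual data format:

```python
# Hypothetical example of a synthetic tool-use SFT record; the schema is
# our illustration, not Apple's actual training-data format.
sft_record = {
    "prompt": "What's 17% of 2,340?",
    "tools": [{"name": "calculator", "args_schema": {"expression": "str"}}],
    "target": [
        {"role": "assistant",
         "tool_call": {"name": "calculator", "args": {"expression": "0.17 * 2340"}}},
        {"role": "tool", "content": "397.8"},
        {"role": "assistant", "content": "17% of 2,340 is 397.8."},
    ],
}
print(sft_record["target"][-1]["content"])
```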

In the RLHF stage, Apple has created two reinforcement learning algorithms, iTeC and MDLOO.

iTeC, short for Iterative Teaching Committee, is a reinforcement learning algorithm used in post-training that aims to optimize the model's performance through multiple rounds of iteration. The core idea is to combine different preference optimization algorithms, including rejection sampling (RS) and direct preference optimization (DPO), so the model can benefit from multiple optimization strategies, improving its adaptability and performance on specific tasks.

In each iteration, iTeC selects a group of the best-performing recent models, forming a "model committee." These models are trained with different methods such as SFT, RS, DPO/IPO, and RL.

After collecting a batch of human preference feedback on model responses, iTeC refreshes its reward model and uses it to train a new set of models, repeating this cycle over multiple rounds to gradually improve performance.
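As a toy, self-contained illustration of the rejection-sampling step inside such a committee loop (the committee, its responses, and the reward function below are trivial stand-ins, not Apple's implementation):

```python
# Toy sketch of rejection sampling in an iTeC-style loop: a "committee" of
# models proposes responses, a reward model scores them, and the best
# response per prompt becomes new SFT data. All components are stand-ins.
import random

def committee_generate(prompt: str, n_models: int = 3) -> list[str]:
    # Stand-in for sampling one response per committee model.
    return [f"{prompt} -> candidate {i} ({random.random():.2f})" for i in range(n_models)]

def reward(response: str) -> float:
    # Stand-in reward model; a real one is trained on human preferences.
    return -abs(len(response) - 40)  # pretend ~40 characters is ideal

def rejection_sample(prompts: list[str]) -> list[tuple[str, str]]:
    sft_data = []
    for p in prompts:
        candidates = committee_generate(p)
        best = max(candidates, key=reward)  # keep the highest-reward response
        sft_data.append((p, best))          # (prompt, chosen response) pair
    return sft_data

print(rejection_sample(["summarize this memo", "draft a polite reply"]))
```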

MDLOO is an online reinforcement learning algorithm specifically designed to optimize the quality of model responses.

As an online algorithm, it can decode responses in real-time during model training and apply RL algorithms to maximize rewards.

In other words, this method allows the model to continuously learn and adjust its strategy during training to generate responses that better align with human preferences.

In terms of implementation, it combines the advantages of Leave-One-Out (LOO) estimator and Mirror Descent Policy Optimization (MDPO) to achieve more stable and effective policy updates.
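The leave-one-out part is easy to make concrete: with k sampled responses per prompt, each response's baseline is the mean reward of the other k-1 samples, in the style of RLOO. A minimal sketch with made-up reward numbers, omitting the mirror-descent policy update:

```python
# Self-contained sketch of a leave-one-out (LOO) advantage estimator of the
# kind MDLOO builds on; rewards are made-up numbers and the mirror-descent
# policy-update step is omitted.
import numpy as np

def loo_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: (num_prompts, k) rewards for k sampled responses per prompt."""
    k = rewards.shape[-1]
    total = rewards.sum(axis=-1, keepdims=True)
    baseline = (total - rewards) / (k - 1)  # mean reward of the other samples
    return rewards - baseline

rewards = np.array([[1.0, 0.2, 0.6, 0.4]])  # 4 responses to one prompt
print(loo_advantages(rewards))  # positive = better than its peers
```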

Edge-side Mixed Precision Quantization

To make edge-side models run more efficiently while avoiding excessive memory usage, Apple has quantized the edge-side version of AFM.

Specifically, Apple has adopted a mixed precision quantization approach, using different quantization precisions for different stages.

The approach used by Apple is called the "palette" strategy. In palette quantization, weights are not quantized individually but grouped together, with weights within the group sharing the same quantization constant.

For projection weights, every 16 columns/rows share the same quantization constant, and 4-bit quantization is performed using the K-means algorithm.

For embedding layers, which are shared between input and output, 8-bit integer quantization is used for each channel, and certain less important layers are further compressed to 2-bit quantization.
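A toy version of such palette quantization might look like the following, where blocks of 16 rows share one 16-entry palette learned with 1-D K-means; the grouping and K-means details are simplifications, not Apple's exact scheme:

```python
# Toy 4-bit "palette" quantization: blocks of 16 rows share a 2^4 = 16-entry
# codebook learned with 1-D K-means; each weight stores only a 4-bit index.
# Grouping and K-means details are simplified assumptions.
import numpy as np

def palettize(W: np.ndarray, rows_per_group: int = 16, n_bits: int = 4, iters: int = 10):
    k = 2 ** n_bits
    groups = W.reshape(-1, rows_per_group * W.shape[1])  # one row = one group
    codes = np.empty(groups.shape, dtype=np.uint8)       # 4-bit indices
    palettes = np.empty((groups.shape[0], k), dtype=W.dtype)
    for g, vals in enumerate(groups):
        centroids = np.quantile(vals, np.linspace(0, 1, k))  # quantile init
        for _ in range(iters):                               # Lloyd iterations
            idx = np.abs(vals[:, None] - centroids[None, :]).argmin(axis=1)
            for c in range(k):
                if (idx == c).any():
                    centroids[c] = vals[idx == c].mean()
        codes[g], palettes[g] = idx, centroids
    return codes, palettes

def dequantize(codes, palettes, shape):
    return np.take_along_axis(palettes, codes.astype(np.int64), axis=1).reshape(shape)

W = np.random.randn(64, 32).astype(np.float32)  # 64 rows -> 4 groups of 16 rows
codes, palettes = palettize(W)
print("mean abs error:", np.abs(W - dequantize(codes, palettes, W.shape)).mean())
```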

To recover the performance lost to quantization and maintain the quality and accuracy of the model's output, Apple has also introduced accuracy-recovery adapters.

These adapters are small neural network modules inserted into specific layers of the pre-trained model and fine-tuned on top of the quantized model to learn to compensate for the effects of quantization.
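One common way to realize such an adapter is a low-rank (LoRA-style) correction trained on top of the frozen quantized weight; a minimal sketch, with illustrative rank and sizes rather than the report's actual settings:

```python
# Toy accuracy-recovery adapter: a low-rank (LoRA-style) correction on top
# of a frozen, dequantized weight. Rank and dimensions are illustrative
# assumptions, not the report's settings.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 32, 64, 4

W_q = rng.standard_normal((d_in, d_out)).astype(np.float32)        # frozen quantized path
A = (rng.standard_normal((d_in, rank)) * 0.01).astype(np.float32)  # trainable
B = np.zeros((rank, d_out), dtype=np.float32)                      # trainable, zero-init

def forward(x: np.ndarray) -> np.ndarray:
    # Quantized projection plus the adapter's low-rank correction.
    return x @ W_q + (x @ A) @ B

x = rng.standard_normal((8, d_in)).astype(np.float32)
print(forward(x).shape)  # (8, 64); with B zero-initialized, output == quantized path
```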

Surpassing GPT-4 in Some Tasks

After applying a series of optimization techniques, it is time to evaluate the model's performance.

In this process, Apple adopts a strategy that combines human evaluation with automated evaluation.

Starting with human evaluation: evaluators design a variety of questions covering analysis, reasoning, brainstorming, chatbot-style dialogue, and more, and have the model generate responses. The same questions are also posed to the comparison models, and assessors then judge which model's output is better.

As a result, both the cloud-side and edge-side models had at least a 60% chance of not losing to comparison models such as Llama 3 and GPT-4.

The rest of the tests mainly rely on datasets.

On instruction following, Apple ran IFEval, where cloud-side AFM outperformed GPT-4 at both the instruction level and the prompt level, becoming the new SOTA.

The performance of edge-side models also surpassed models of similar sizes such as Llama 3-8B and Mistral-7B.

In AlpacaEval, both edge-side and cloud-side AFM also achieved second place.

Looking at specific tasks, AFM achieved SOTA on the summarization task of the writing benchmark and was also close to first place on the composition task.

In terms of mathematics, Apple evaluated with two datasets, GSM8K and MATH.

The results showed that the edge-side model was inferior to Llama 3-8B and Microsoft's Phi 3 mini on GSM8K, while the cloud-side model was surpassed by GPT-4 and Llama 3-70B but performed better than GPT-3.5.

Performance on MATH was relatively better, with the edge-side version leading same-size models and the cloud-side version surpassing Llama 3-70B.

Beyond performance, safety also matters: Apple manually evaluated AFM's ability to resist adversarial attacks.

The results showed that when facing adversarial prompts, AFM's violation rate was significantly lower than that of other open-source and commercial models.

Those are some highlights of Apple's large model technical report; for more details, see the original report.

One More Thing

Although Apple Intelligence has been provided to developers for testing, Bloomberg reported that the official version may be delayed.

Indeed, going by Apple's past release pattern, new iPhones ship with the x.0 release, so the 18.1 version number also suggests these features will not launch alongside the new iPhone in September.

Analyst Gene Munster suggested that Apple should consider postponing the release date of iPhone 16 to align with Apple Intelligence.

As for whether Cook will consider this suggestion, we will have to wait and see.

Author: Kresi, Source: Quantum Bit (QbitAI). Original title: "Apple's AI iOS Runs Hot on Day One: Chats Instantly Gain EQ, the Large Model Becomes the Ultimate Mouthpiece, and Siri Gets a Glorious Makeover"