
Apple Foundation Model on device

25 June 2025

Since iOS/macOS 26, developers can use Apple's built-in language model in their apps via Swift APIs.
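
To give a sense of the developer experience, here's a minimal sketch using the FoundationModels framework. The names (LanguageModelSession, respond(to:)) follow Apple's documentation, but treat the exact signatures as assumptions:

```swift
import FoundationModels

// Ask the on-device model a question in a few lines.
// Runs in any async context (e.g., a Task or an async function).
let session = LanguageModelSession()
let response = try await session.respond(to: "Suggest three titles for a note about hiking.")
print(response.content)
```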

This is the same model that powers Apple Intelligence. It is local and private.

Here, I explore the model's features, capabilities, and benchmarks to provide a high-level overview. For a deep dive, see the Apple Intelligence Foundation Language Models paper by Apple and the 2025 update from the Apple ML team.

Following best practices, Apple did a terrible job naming its own models. The best nomenclature I could find is AFM-on-device and AFM-server for the local and server Apple models, respectively. There seems to be no versioning scheme.

Note: here, I'm mostly interested in the on-device model. There isn't much public info on the server model, and it's not that interesting overall (there are better models).

Key

The model is proprietary and closed-weight.

Most likely, the same model is used across all Apple devices and is part of the software updates (i.e., newer versions of iOS/macOS will get newer/updated models).
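
Since the model ships with the OS rather than with the app, availability should be checked at runtime. Here's a sketch of that check, assuming the SystemLanguageModel API as documented by Apple:

```swift
import FoundationModels

// Availability depends on device support, OS version, and whether
// Apple Intelligence is enabled, so check before creating a session.
switch SystemLanguageModel.default.availability {
case .available:
    print("On-device model is ready to use.")
case .unavailable(let reason):
    print("On-device model unavailable: \(reason)")
}
```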

Capabilities

No multimodality: Both the text and image models are single-modal.

No reasoning: For advanced reasoning, Apple suggests using large reasoning models.

Model Dimensions

| Param | Value |
| --- | --- |
| Model dimension | 3072 |
| Head dimension | 128 |
| Num query heads | 24 |
| Num key/value heads | 8 |
| Num layers | 26 |
| Num non-embedding params (B) | 2.58 |
| Num embedding params (B) | 0.15 |
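
These numbers are self-consistent: 24 query heads × a head dimension of 128 gives the model dimension of 3072, and 8 key/value heads shared across 24 query heads means grouped-query attention with 3 query heads per KV head. A quick arithmetic check:

```swift
// Sanity-check the published dimensions (values from the table above).
let headDim = 128
let queryHeads = 24
let kvHeads = 8

let modelDim = queryHeads * headDim   // 24 * 128 = 3072, matches the table
let gqaGroup = queryHeads / kvHeads   // 3 query heads share each KV head

// Rough total size: non-embedding + embedding parameters, in billions.
let totalParamsB = 2.58 + 0.15        // ~2.73B, i.e. a ~3B-class model
print(modelDim, gqaGroup, totalParamsB)
```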

Training data

As per the paper, Apple used:

- Data licensed from publishers
- Curated publicly available and open-source datasets
- Public web pages crawled by Applebot

No personal data from Apple users was used.

The model was trained from scratch (i.e., it’s not based on any existing open-source model).

Benchmarks

Pre-training

| Metric | AFM-on-device | AFM-server |
| --- | --- | --- |
| MMLU (5-shot) | 61.4 | 75.4 |

| Metric | AFM-server |
| --- | --- |
| MMLU (5-shot) | 75.3 |
| GSM8K (5-shot) | 72.4 |
| ARC-c (25-shot) | 69.7 |
| HellaSwag (10-shot) | 86.9 |
| Winogrande (5-shot) | 79.2 |

| Metric | AFM-server |
| --- | --- |
| Narrative QA | 77.5 |
| Natural Questions (open) | 73.8 |
| Natural Questions (closed) | 43.1 |
| Openbook QA | 89.6 |
| MMLU | 67.2 |
| MATH-CoT | 55.4 |
| GSM8K | 72.3 |
| LegalBench | 67.9 |
| MedQA | 64.4 |
| WMT 2014 | 18.6 |

Human Evaluation

Side-by-side evaluation of AFM-on-device and AFM-server against comparable models

Instruction Following

Instruction-following capability (measured with IFEval)

Tool Use

Berkeley Function Calling Leaderboard evaluation results on the Function Calling API
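
For context, here's roughly what tool use looks like with the FoundationModels framework. WeatherTool and its argument type are hypothetical, and the Tool protocol details follow Apple's WWDC sessions, so treat the exact shape of the API as an assumption:

```swift
import FoundationModels

// Hypothetical tool; the Tool protocol, @Generable, and @Guide
// follow Apple's documented API, but exact signatures may differ.
struct WeatherTool: Tool {
    let name = "getWeather"
    let description = "Returns the current temperature for a city."

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // A real implementation would query a weather service here.
        "It is 21°C in \(arguments.city)."
    }
}

// The session invokes the tool automatically when the model requests it.
let session = LanguageModelSession(tools: [WeatherTool()])
let answer = try await session.respond(to: "What's the weather in Lisbon?")
print(answer.content)
```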

Writing

Writing ability on internal summarization and composition benchmarks

Math

Math benchmarks for AFM-on-device and AFM-server alongside relevant comparison models. GSM8K is 8-shot and MATH is 4-shot.

Safety

Fraction of preferred responses in side-by-side evaluation of Apple’s foundation model against comparable models on safety prompts.

Conclusion

Overall, the model seems to be on par with small open-source models.

On iPhone (and likely iPad), it now makes little sense to bundle third-party models (installing and managing them results in a worse UX).

On macOS, I’d expect to have the option to run third-party on-device and hosted models (especially for pro users and complex use cases), while using AFM by default.