Apple Foundation Model on device
Since iOS/macOS 26, developers can use a built-in language model provided by Apple in their apps via Swift APIs.
This is the same model that powers Apple Intelligence. It is local and private.
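Usage is meant to be a few lines of Swift. Here's a minimal sketch of what calling the model looks like, based on the `FoundationModels` framework introduced with iOS/macOS 26 (requires a device with Apple Intelligence enabled; treat the exact API surface as an approximation):

```swift
import FoundationModels

// Sketch: open a session with the on-device model and send a prompt.
// LanguageModelSession uses the system (on-device) model by default.
let session = LanguageModelSession()

// respond(to:) is async and can throw (e.g., if the model is unavailable).
let response = try await session.respond(to: "Summarize this note in one sentence: …")
print(response.content)
```

The session keeps conversational context, so follow-up `respond(to:)` calls on the same session see earlier turns.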
Here, I explore the model’s features, capabilities, and benchmarks to provide a high-level overview. For a deep-dive, see the Apple Intelligence Foundation Language Models ↗ by Apple and the 2025 update ↗ from the Apple ML team.
True to form, Apple did a terrible job naming its own models. The best nomenclature I could find is AFM-on-device and AFM-server for the local and server Apple models respectively. There seems to be no versioning scheme.
Note: here, I’m mostly interested in the on-device model. There is little public information about the server model, and it’s not that interesting overall (there are better hosted models).
Key specs
- 3 billion parameters
- 2 bits/weight quantization
- context size: 4096 tokens ↗ (input + output)
The model is proprietary and closed-weight.
Most likely, the same model is used across all Apple devices and is part of the software updates (i.e., newer versions of iOS/macOS will get newer/updated models).
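Because the model ships with the OS and may be unavailable (ineligible device, Apple Intelligence turned off, model still downloading), apps are expected to check availability before use. A sketch, assuming the `SystemLanguageModel` API from the `FoundationModels` framework:

```swift
import FoundationModels

// Sketch: check whether the on-device model can actually be used
// before creating a session.
let model = SystemLanguageModel.default

switch model.availability {
case .available:
    print("AFM-on-device is ready")
case .unavailable(let reason):
    // Reasons include: device not eligible, Apple Intelligence
    // not enabled, or the model assets not yet downloaded.
    print("Model unavailable: \(reason)")
}
```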
Capabilities
- tool calling
- structured outputs
- supports 15 languages
- lightweight, task-specific fine-tuning
No multimodality: Both the text and image models are single-modal.
No reasoning: For advanced reasoning, Apple suggests using large reasoning models.
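Structured outputs are worth a closer look, since they're arguably the most practical of these capabilities. A sketch using the `@Generable` macro, which lets the model fill a Swift type directly instead of returning free-form text (the `Recipe` type and prompt are my own illustration, not Apple's):

```swift
import FoundationModels

// Sketch: @Generable marks a type the model can generate;
// @Guide gives the model per-field instructions.
@Generable
struct Recipe {
    @Guide(description: "Name of the dish")
    var name: String
    @Guide(description: "Up to five ingredients")
    var ingredients: [String]
}

let session = LanguageModelSession()

// respond(to:generating:) constrains the output to the Recipe type,
// so no manual JSON parsing is needed.
let response = try await session.respond(
    to: "Suggest a quick pasta recipe",
    generating: Recipe.self
)
print(response.content.name)
```

Tool calling follows a similar pattern: you conform a type to the framework's `Tool` protocol with `@Generable` arguments, and pass it to the session.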
Model Dimensions
Param | Value |
---|---|
Model dimension | 3072 |
Head dimension | 128 |
Num query heads | 24 |
Num key/value heads | 8 |
Num layers | 26 |
Num non-embedding params (B) | 2.58 |
Num embedding params (B) | 0.15 |
Training data
As per the paper ↗, Apple used:
- data licensed from publishers
- publicly available curated datasets
- open-sourced datasets
- open source repositories on GitHub
- information crawled by Applebot, Apple’s web crawler
No personal data from Apple users was used.
The model was trained from scratch (i.e., it’s not based on any existing open-source model).
Benchmarks
Pre-training
Metric | AFM-on-device | AFM-server |
---|---|---|
MMLU (5 shot) | 61.4 | 75.4 |
Metric | AFM-server |
---|---|
MMLU (5-shot) | 75.3 |
GSM8K (5-shot) | 72.4 |
ARC-c (25-shot) | 69.7 |
HellaSwag (10-shot) | 86.9 |
Winogrande (5-shot) | 79.2 |
Metric | AFM-server |
---|---|
Narrative QA | 77.5 |
Natural Questions (open) | 73.8 |
Natural Questions (closed) | 43.1 |
Openbook QA | 89.6 |
MMLU | 67.2 |
MATH-CoT | 55.4 |
GSM8K | 72.3 |
LegalBench | 67.9 |
MedQA | 64.4 |
WMT 2014 | 18.6 |
Human Evaluation
The papers also report human evaluations covering instruction following, tool use, writing, math, and safety (charts omitted here).
Conclusion
Overall, the model seems to be on par with small open-source models.
On iPhone (and likely iPad), there is now little reason to ship third-party models, since installing and managing them makes for a worse user experience.
On macOS, I’d expect to have the option to run third-party on-device and hosted models (especially for pro users and complex use cases), while using AFM by default.