Destiner's Notes

FoundationModels: Tool Calling for an Assistant App

23 August 2025

The app is open-sourced

Previously, I made a reading app with article summarization powered by Apple’s on-device model.

This time, I wanted to see how good the model is at tool calling. For this, I built a chat-like assistant app that lets you search through your data.

Chat UI

In our case, the user will interact with the model via a chat interface.

There’s nothing specific about my implementation.

I’ve used NavigationSplitView with the chat list in the sidebar and the chat UI in the detail view. The user can create new chats and switch between them. I’ve used SwiftData to persist the chats across sessions.
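
The skeleton looks roughly like this (Chat and its fields are simplified stand-ins for illustration, not the app’s actual types):

import SwiftData
import SwiftUI

@Model
final class Chat {
    var title: String
    var createdAt: Date

    init(title: String, createdAt: Date = .now) {
        self.title = title
        self.createdAt = createdAt
    }
}

struct ContentView: View {
    // Chats are persisted with SwiftData and sorted newest-first.
    @Query(sort: \Chat.createdAt, order: .reverse) private var chats: [Chat]
    @State private var selectedChat: Chat?

    var body: some View {
        NavigationSplitView {
            // Sidebar: the chat list.
            List(chats, selection: $selectedChat) { chat in
                Text(chat.title)
                    .tag(chat)
            }
        } detail: {
            // Detail: the chat UI for the selected chat.
            if let selectedChat {
                Text(selectedChat.title) // Placeholder for the actual chat view.
            } else {
                Text("Select a chat")
            }
        }
    }
}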

One feature that I’ve found tricky to get right is the chat title generation. The usual approach is to pass the first message in a separate LLM request. The problem here is that the model is not very bright, as it tries to fulfill the user’s requests instead of providing you with a title. Here’s a prompt that worked well for me:

let titleSession = LanguageModelSession(instructions: "You are a helpful assistant that generates concise chat titles. Generate a short, concise title (3-5 words) based on the user's message. Do not respond to the user's message directly. Do not output anything else. ONLY OUTPUT THE TITLE")

let stream = titleSession.streamResponse {
    "USER MESSAGE START"
    userMessage
    "USER MESSAGE END"
}

To improve the output, you can also provide some examples or use structured outputs.
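
With structured outputs, that could look roughly like this (ChatTitle is a type I’m introducing here for illustration; it’s not from the app):

@Generable
struct ChatTitle {
    @Guide(description: "A short, concise title of 3-5 words")
    let title: String
}

let response = try await titleSession.respond(generating: ChatTitle.self) {
    "USER MESSAGE START"
    userMessage
    "USER MESSAGE END"
}
let title = response.content.title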

Note: at this point, the app can already serve as a general-purpose assistant. The output quality is dubious at best, but at least it’s fully private and works offline.

Chat view

To start, I’ve set out to implement a local map search with MapKit. By using MKLocalSearch.Request, you can search for places using natural language, perfect for our use case.

Here’s my tool:

final class SearchMapTool: Tool {
    let name = "searchMap"
    let description = "Performs a local search using the map to find nearby places"
    
    @Generable
    struct Arguments {
        @Guide(description: "The search query (e.g., 'restaurants', 'gas stations', 'coffee shops')")
        let query: String
    }
    
    func call(arguments: Arguments) async throws -> String {
        let request = MKLocalSearch.Request()
        request.naturalLanguageQuery = arguments.query
        request.resultTypes = [.pointOfInterest, .address]
        
        let search = MKLocalSearch(request: request)
        
        do {
            let response = try await search.start()
            let results = response.mapItems.prefix(5)
            
            if results.isEmpty {
                return "No results found for '\(arguments.query)'"
            }
            
            let formattedResults = results.map { item in
                let name = item.name ?? "N/A"
                let address = item.address?.shortAddress ?? "N/A"
                
                return """
                Name: \(name)
                Address: \(address)
                """
            }.joined(separator: "\n\n")
            
            return formattedResults
        } catch {
            return "Search failed: \(error.localizedDescription)"
        }
    }
}

When called, the tool makes a local search request and converts the results into an LLM-friendly format.

We initialize the session with our tool:

let localSearchTool = SearchMapTool()
let session = LanguageModelSession(
    tools: [localSearchTool],
    instructions: "You are a helpful assistant. Respond naturally and conversationally."
)

Now, whenever it’s relevant, the model will use the tool to answer the user’s request.

Map search

Generative UI

Now, the above is great and all, but it’s too basic.

We know that the output of that prompt would be a list of places. It’s a great opportunity to craft a bespoke UI to render them nicely.

This concept is not new; it’s been known as generative UI for years.

Implementing this for FoundationModels is new.

Transcript

My first idea was to use chat transcripts to extract the tool call outputs. Transcripts store the entire conversation in a structured format, so you can access every user input, every model response, and, of course, every tool call, including its output.

Using Transcript is relatively simple. The downside is that you’d need to parse the tool output: you could make the tool return its results as JSON to simplify parsing, but JSON is usually not great for LLMs to process.

I’d make the tool to output the search results as an object:

@Model
final class LocalSearchResult {
    var name: String?
    var address: String?
    
    init(name: String?, address: String?) {
        self.name = name
        self.address = address
    }
}

@Generable
struct LocalSearchResultData: Codable {
    var name: String?
    var address: String?
}

@Generable
struct LocalSearchToolOutput: Codable {
    var results: [LocalSearchResultData]
}

final class LocalSearchTool: Tool {
    // …
    
    func call(arguments: Arguments) async throws -> LocalSearchToolOutput {
        // …
        // Every code path now has to produce a LocalSearchToolOutput,
        // so errors are rethrown instead of being returned as text.
        let response = try await search.start()
        let results = response.mapItems.prefix(5)

        if results.isEmpty {
            // An empty list signals "no results" to the model.
            return LocalSearchToolOutput(results: [])
        }

        let toolResults = results.map { item in
            LocalSearchResultData(
                name: item.name,
                address: item.address?.shortAddress
            )
        }
        return LocalSearchToolOutput(results: toolResults)
    }
}

I’d then access the tool outputs via the chat transcript:

private func extractChatMessages(from transcript: Transcript) -> [ChatMessage] {
    var messages: [ChatMessage] = []
    var currentLocalSearchResults: [LocalSearchResult] = []
    // Extract messages from stable transcript
    for entry in transcript {
        switch entry {
        case .prompt(let prompt):
            // …
        case .toolOutput(let output):
            if output.toolName == "localSearch" {
                currentLocalSearchResults = extractLocalSearchResults(from: output)
            }
        case .response(let response):
            // …
        case .toolCalls(_), .instructions(_):
            break
        @unknown default:
            break
        }
    }
    // …
    return messages
}

private func extractLocalSearchResults(from toolOutput: Transcript.ToolOutput) -> [LocalSearchResult] {
    var searchResults: [LocalSearchResult] = []
    
    for segment in toolOutput.segments {
        switch segment {
        case .structure(let structuredSegment):
            if structuredSegment.source == "localSearch" {
                do {
                    let toolOutput = try structuredSegment.content.value(LocalSearchToolOutput.self)
                    searchResults = toolOutput.results.map { resultData in
                        LocalSearchResult(
                            name: resultData.name,
                            address: resultData.address
                        )
                    }
                } catch {
                    // Handle parsing error silently
                }
            }
        case .text(let textSegment):
            if let jsonData = textSegment.content.data(using: .utf8) {
                do {
                    let toolOutput = try JSONDecoder().decode(LocalSearchToolOutput.self, from: jsonData)
                    searchResults = toolOutput.results.map { resultData in
                        LocalSearchResult(
                            name: resultData.name,
                            address: resultData.address
                        )
                    }
                } catch {
                    // Handle JSON decoding error silently
                }
            }
        @unknown default:
            break
        }
    }
    
    return searchResults
}

There’s another issue with transcripts. In our implementation, they are read only after the response is finished, which means the user will only see the custom UI once the model has finished responding.

Structured Outputs

Another option would be to get the model to output the search results as a Generable struct. This is a simple solution, and it supports streaming.

It’s kinda redundant, though: the data is first encoded into text as the tool output, then converted back into an object by the model. There’s also a chance the model will mess up the data along the way.

Finally, the output would still take some time to generate.
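
If you went this route, the flow would look roughly like this (just a sketch; I didn’t ship this variant):

let response = try await session.respond(
    generating: LocalSearchToolOutput.self
) {
    "Find a few coffee shops near me"
}
let places = response.content.results

// A streaming variant also exists via streamResponse(generating:),
// which yields partially generated values as the fields get filled in.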

Tool Callback

Finally, the solution I’ve settled on is to pass the output via a callback.

This way, we don’t need to parse anything; the output still goes to the model as plain text, and the custom UI can appear instantly:

@Model
final class MapLocation {
    var name: String?
    var address: String?
    
    init(name: String?, address: String?) {
        self.name = name
        self.address = address
    }
}

struct MapLocationData: Codable {
    var name: String?
    var address: String?
}

final class SearchMapTool: Tool {
    // …
    
    var onResults: (([MapLocationData]) -> Void)
    
    init(onResults: @escaping ([MapLocationData]) -> Void) {
        self.onResults = onResults
    }

    // …
    
    func call(arguments: Arguments) async throws -> String {
        // …        
        do {
            // …
            let toolResults = results.map { item in
                let name = item.name
                let address = item.address?.shortAddress
                return MapLocationData(name: name, address: address)
            }
            
            // Call the callback with results
            onResults(toolResults)
            
            let formattedResults = toolResults.enumerated().map { index, result in
                let name = result.name ?? "N/A"
                let address = result.address ?? "N/A"
                
                return """
                Result \(index + 1):
                Name: \(name)
                Address: \(address)
                """
            }.joined(separator: "\n\n")
            
            return formattedResults
        } catch {
            return "Search failed: \(error.localizedDescription)"
        }
    }
}

In the target view, we simply handle the callback with whatever logic we need (e.g., pass the tool output to the view model and conditionally render the custom view based on it).
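
A minimal sketch of that wiring might look like this (ChatViewModel and MapResultsView are illustrative names, not the app’s actual types):

import FoundationModels
import Observation

@Observable
final class ChatViewModel {
    var mapResults: [MapLocationData] = []

    @ObservationIgnored
    private(set) lazy var session: LanguageModelSession = {
        let searchMapTool = SearchMapTool { [weak self] results in
            // Hop to the main actor before touching state that drives the UI.
            Task { @MainActor in
                self?.mapResults = results
            }
        }
        return LanguageModelSession(
            tools: [searchMapTool],
            instructions: "You are a helpful assistant. Respond naturally and conversationally."
        )
    }()
}

The chat view can then conditionally show a custom MapResultsView (a hypothetical view) whenever viewModel.mapResults is non-empty.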

With either approach, the resulting UI is much nicer.

Generative UI

More tools

Once I polished the flow for a single tool, I went ahead and added more. Specifically, I implemented “search events” and “search reminders” tools using EventKit.

To filter the returned reminders, I’ve used the @Guide macro:

// GetRemindersTool
@Generable
struct Arguments {
    @Guide(description: "Filter reminders by completion status", .anyOf(["all", "completed", "incomplete"]))
    let status: String
    
    @Guide(description: "Number of reminders to return", .range(1...10))
    let count: Int?
}

Reminders tool
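
For reference, here’s a rough sketch of what the tool body could look like with EventKit (the names and formatting are illustrative, not the app’s actual implementation; you’d also need the reminders usage description in Info.plist):

import EventKit
import FoundationModels

final class GetRemindersTool: Tool {
    let name = "getReminders"
    let description = "Returns reminders from the Reminders app"

    @Generable
    struct Arguments {
        @Guide(description: "Filter reminders by completion status", .anyOf(["all", "completed", "incomplete"]))
        let status: String

        @Guide(description: "Number of reminders to return", .range(1...10))
        let count: Int?
    }

    func call(arguments: Arguments) async throws -> String {
        let store = EKEventStore()

        // Reminders access has to be granted first (iOS 17+ API).
        guard try await store.requestFullAccessToReminders() else {
            return "Access to reminders was not granted."
        }

        // Pick a predicate based on the requested completion status.
        let predicate: NSPredicate
        switch arguments.status {
        case "completed":
            predicate = store.predicateForCompletedReminders(
                withCompletionDateStarting: nil, ending: nil, calendars: nil)
        case "incomplete":
            predicate = store.predicateForIncompleteReminders(
                withDueDateStarting: nil, ending: nil, calendars: nil)
        default:
            predicate = store.predicateForReminders(in: nil)
        }

        // fetchReminders is completion-based, so bridge it into async/await.
        let reminders = await withCheckedContinuation { (continuation: CheckedContinuation<[EKReminder], Never>) in
            store.fetchReminders(matching: predicate) { reminders in
                continuation.resume(returning: reminders ?? [])
            }
        }

        let limited = reminders.prefix(arguments.count ?? 10)
        if limited.isEmpty {
            return "No reminders found."
        }

        return limited.map { reminder in
            "Title: \(reminder.title ?? "N/A"), completed: \(reminder.isCompleted)"
        }.joined(separator: "\n")
    }
}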

Tool Selection

One thing I quickly noticed is that the model is not perfect when it comes to picking a tool. Sometimes it doesn’t call a tool when it should, and sometimes it’s the other way around.

In general, when it comes to smaller LLMs, the best trick I’ve personally found is to remove the wiggle room for the model as much as possible. For example, making the prompt more specific works very well.

In this case, instead of providing all the tools at all times, I’ve decided to make the available tools conditional on the user’s input.

When composing a message, the user can see the available tools and pick one.

private let availableTools = [
    ToolPickerItem(name: "Map", icon: "map", description: "Search for local places"),
    ToolPickerItem(name: "Calendar", icon: "calendar", description: "View upcoming calendar events"),
    ToolPickerItem(name: "Reminders", icon: "checklist", description: "View reminders from the reminders app")
]

private var filteredTools: [ToolPickerItem] {
    if toolQuery.isEmpty {
        return availableTools
    }
    return availableTools.filter { $0.name.localizedCaseInsensitiveContains(toolQuery) }
}

private var toolPickerView: some View {
    VStack(alignment: .leading, spacing: 0) {
        ForEach(Array(filteredTools.enumerated()), id: \.element.name) { index, tool in
            HStack {
                Image(systemName: tool.icon)
                
                VStack(alignment: .leading, spacing: 2) {
                    Text(tool.name)
                    Text(tool.description)
                }
                
                Spacer()
            }
        }
    }
}

If a tool is chosen, we provide that tool (and that tool only) to the LanguageModelSession, along with a specific instruction that nags the model about tool usage:

var tools: [any Tool] = []
var toolInstruction = ""

switch tool.name {
case "Map":
    tools = [localSearchTool]
    toolInstruction = "Please use the localSearch tool to help answer this request."
case "Calendar":
    tools = [eventsTool]
    toolInstruction = "Please use the calendarEvents tool to help answer this request."
case "Reminders":
    tools = [remindersTool]
    toolInstruction = "Please use the reminders tool to help answer this request."
default:
    tools = [localSearchTool, eventsTool, remindersTool]
    toolInstruction = "Please use the appropriate tool to help answer this request."
}

responseSession = LanguageModelSession(
    tools: tools
) {
    "You are a helpful assistant."
    
    "Respond naturally and conversationally."
    
    toolInstruction
}

Note: it could be an interesting experiment to use the .contentTagging model use case to match the user prompt with one of the available tools as a preliminary step.
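
As a very rough illustration of that idea (ToolTag is made up, and I haven’t verified how well the content-tagging adapter handles this kind of schema):

@Generable
struct ToolTag {
    @Guide(description: "The tool category that best matches the user's request", .anyOf(["map", "calendar", "reminders", "none"]))
    let tool: String
}

let taggingModel = SystemLanguageModel(useCase: .contentTagging)
let taggingSession = LanguageModelSession(model: taggingModel)

let tag = try await taggingSession.respond(generating: ToolTag.self) {
    userMessage
}
// tag.content.tool could then drive the same switch as the manual tool picker.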

Tool selection UI

Troubleshooting

Generation Errors

I’ve been getting GenerationError on and off, specifically the guardrailViolation case, without any clear pattern.

I’ve seen other developers find tool calling unreliable as well.

In my case, since this was happening after the tool was called, I could still get the tool results via callback and render them as if nothing happened.
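
In practice, that just means catching the error around the response call and keeping whatever the callback already delivered, roughly like this:

do {
    let stream = session.streamResponse {
        userMessage
    }
    for try await _ in stream {
        // Update the streamed text in the UI as usual.
    }
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // The tool has already run and delivered its results via the callback,
    // so we can keep the custom UI and show a generic fallback message instead.
} catch {
    // Handle other errors as usual.
}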

Large Tool Output

For some tools (e.g., when getting all the reminders), the output can be large, with hundreds or even thousands of items. Usually, the best approach is to sort the items and only return the most relevant ones.
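
Continuing the reminders sketch from above, that could be as simple as sorting by due date and capping the item count before formatting:

// Keep the tool output small: most relevant (soonest due) items first.
let limited = reminders
    .sorted {
        ($0.dueDateComponents?.date ?? .distantFuture) < ($1.dueDateComponents?.date ?? .distantFuture)
    }
    .prefix(arguments.count ?? 10)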

Wrong Tool-Calling Decisions

As mentioned above, the foolproof way to solve this is to let the user choose the tool to use (if any).

Based on the selected tool, update the system prompt to nag the model about tool usage and to provide more context on the tool itself.

Conclusion

While I found the tool calling to work quite well, as with my previous app, I don’t think AI provides enough utility here. In most cases, a better way to access my reminders and calendar events would be to open up Raycast. As for the map locations, just opening the Maps app would have been easier.

An LLM-powered app like this one could be helpful if I need to search through my notes, emails, or texts. For those cases, there is no quick and easy solution, and the built-in search features in those apps usually suck. Unfortunately, integrating those data sources would take much more effort.

Still, I think the app works quite well for its intended purpose. The LLM is decent at tool calling (especially with some hints from the user), and the resulting UI is quite cool.

Demo