Edge AI: exploring the capabilities of Apple’s VLM

Demonstrate Apple’s on‑device VLM, a quantized Qwen model, evaluating Q&A accuracy, responsiveness to prompting, multilingual support for both visual and textual input, and resource use across different hardware.

iOS macOS Xcode Swift Qwen

Overview

Apple has released a vision-language model (VLM): a quantized, fine-tuned version of Qwen that they've optimized for Apple Silicon devices running iOS and macOS. I want to show some experiments on where it works and where it fails. For example, how good is it at Q&A? How responsive is it to prompting? Which languages can it handle, both visually and textually? Does resource usage differ across hardware? What tunability does Apple offer by default?

Tech stack