Turn your iPhone camera into a multilingual vocabulary teacher. Point at any object in the real world — VisualVocab identifies it and teaches you its name in 11 languages, instantly.
Open the camera and let YOLO detect objects around you in real time at up to 24 FPS, with no internet required. Bounding boxes appear instantly around every recognized object in view.
See something interesting? Tap it for instant AI identification. Or drag to draw a precise region around any object — perfect for isolating one item in a crowded scene.
Every identified word is saved to your personal vocabulary, automatically translated into your target language. Track encounter counts, confidence scores, and watch your dictionary grow as you explore the world.
Three powerful ways to discover and learn vocabulary from the world around you. No flashcards. No memorization exercises. Just your camera and the world.
Your camera detects and labels objects at up to 24 FPS. A 3.25 MB model runs directly on your device, identifying up to 20 objects simultaneously from 80 COCO classes — bounding boxes appear instantly with labels translated in real time.
Every object you identify gets saved to your personal vocabulary, deduplicated and tracked. See encounter counts, confidence scores, and credits spent. Your vocabulary grows as you explore, synced to the cloud across all your devices.
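The deduplication and encounter tracking described above could be sketched roughly as follows. This is an illustrative sketch only, not the app's actual data model: the `VocabEntry` shape, the `normalize` rule, and the `recordEncounter` helper are all hypothetical names introduced for this example.

```typescript
// Hypothetical shape of one saved vocabulary entry (an assumption, not the app's schema).
interface VocabEntry {
  word: string;           // normalized source-language label
  encounters: number;     // how many times the word has been identified
  bestConfidence: number; // highest confidence seen so far, in the range 0..1
}

type Vocabulary = Record<string, VocabEntry>;

// Assumed normalization: trim and lowercase, so "Coffee Cup " and "coffee cup" match.
function normalize(label: string): string {
  return label.trim().toLowerCase();
}

// Record one identification. The first sighting creates the entry;
// later sightings bump the counter and keep the best confidence.
function recordEncounter(vocab: Vocabulary, label: string, confidence: number): VocabEntry {
  const key = normalize(label);
  const known = Object.prototype.hasOwnProperty.call(vocab, key);
  const existing = known ? vocab[key] : undefined;
  if (existing !== undefined) {
    existing.encounters += 1;
    existing.bestConfidence = Math.max(existing.bestConfidence, confidence);
    return existing;
  }
  const entry: VocabEntry = { word: key, encounters: 1, bestConfidence: confidence };
  vocab[key] = entry;
  return entry;
}
```

Keying the store by the normalized label is what keeps repeat sightings from creating duplicate words; a real implementation would likely also store the translation and a timestamp.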
Drag a custom bounding box around exactly what you want. Perfect for isolating a specific item on a crowded table or a particular plant in a garden.
Spanish, Japanese, French, German, Korean, Chinese, Portuguese, Norwegian, Hindi, Russian — plus English. Swap source ↔ target with one tap.
YOLO and Apple AI modes require zero internet. All 80 object classes have hand-verified translations in every supported language, baked into the app.
Choose the right mode for your needs. Switch instantly between on-device speed, cloud-powered precision, and Apple's private on-device intelligence.
YOLOv26n runs directly on your iPhone at 15–24 FPS, detecting up to 20 objects simultaneously from 80 trained classes. Zero network calls. Bounding boxes rendered at 60 FPS via Skia GPU canvas.
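One plausible way to enforce the 20-object cap is to post-process each frame's raw detections: drop low-confidence boxes, then keep the strongest 20. The sketch below assumes this approach. The `Detection` shape, the `selectDetections` helper, and the 0.25 confidence threshold are assumptions for illustration, not details taken from the app.

```typescript
// Hypothetical detection record for one frame (an assumed shape).
interface Detection {
  label: string;                         // one of the 80 trained class names
  confidence: number;                    // model score in the range 0..1
  box: [number, number, number, number]; // x, y, width, height
}

const MAX_DETECTIONS = 20;   // from the text: up to 20 objects at once
const MIN_CONFIDENCE = 0.25; // assumed threshold, not stated in the source

// Keep only confident detections, strongest first, capped at `max`.
// `filter` returns a new array, so the caller's input is not reordered.
function selectDetections(
  raw: Detection[],
  max: number = MAX_DETECTIONS,
  minConf: number = MIN_CONFIDENCE
): Detection[] {
  return raw
    .filter((d) => d.confidence >= minConf)
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, max);
}
```

Capping after sorting means that in a crowded scene the boxes that survive are the ones the model is most sure about.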
Tap or draw on any object to send a cropped image to Anthropic's Claude Haiku 4.5 vision model. Recognizes brands, dishes, species, architectural styles — anything a world-class AI can see. Context-aware and highly accurate.
A two-stage pipeline that runs entirely on your device. Apple Vision classifies the region, then Apple Foundation Models reasons over the candidate labels and translates the result. No data ever leaves your phone.
Choose any combination of source and target language. All 80 YOLO object classes have hand-verified translations in every supported language. Claude AI and Apple AI generate dynamic translations for any object — no class limits.
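For the YOLO classes, a bundled translation table makes the source and target lookup, and the swap, a pure in-memory operation. The following is a minimal sketch under stated assumptions: the table holds only two classes and four languages for brevity, and `TRANSLATIONS`, `LanguagePair`, `swapPair`, and `translate` are hypothetical names rather than the app's real identifiers.

```typescript
// Subset of language codes for illustration; the app supports 11 languages.
type Lang = "en" | "es" | "fr" | "de";

// Tiny illustrative table. Per the text, the shipped table covers all 80 classes.
const TRANSLATIONS: Record<string, Record<Lang, string>> = {
  dog: { en: "dog", es: "perro", fr: "chien", de: "Hund" },
  chair: { en: "chair", es: "silla", fr: "chaise", de: "Stuhl" },
};

interface LanguagePair {
  source: Lang;
  target: Lang;
}

// Swapping is just exchanging the two fields; no data needs to be reloaded.
function swapPair(pair: LanguagePair): LanguagePair {
  return { source: pair.target, target: pair.source };
}

// Look up both sides of a label for a detected class.
// Returns undefined for classes outside the bundled table.
function translate(
  classKey: string,
  pair: LanguagePair
): { source: string; target: string } | undefined {
  if (!Object.prototype.hasOwnProperty.call(TRANSLATIONS, classKey)) {
    return undefined;
  }
  const row = TRANSLATIONS[classKey];
  if (row === undefined) {
    return undefined;
  }
  return { source: row[pair.source], target: row[pair.target] };
}
```

Because every class row carries every language, any source and target combination resolves with the same lookup, which is consistent with the offline behavior described for YOLO mode.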
YOLO and Apple AI are always free — no account needed, no credits consumed. Credits unlock Claude AI's cloud-powered precision for identifying anything your camera can see.
Everything you need to know about VisualVocab. Can't find an answer? Reach out via the app or our support page.