Education startup (UX / parent app)
| Field | Value |
|---|---|
| Status | Active |
| Type | Work |
Description
UX and product build for a parent-facing education startup app focused on asynchronous feedback: parents get coaching-style commentary on an activity with their child (often video-based).
Boxy: on-device video/audio POC (exploratory)
Goal: Test whether lightweight, rules-based coaching — combining caregiver speech with basic infant vocalization cues — is enough to unlock parent participation before heavier investment. Not positioned as the final technical approach.
Privacy / scope: Native iOS POC. Parents import clips (e.g. from Photos) and all analysis runs on device, an intentional choice to test feasibility. The tradeoffs are TestFlight-style access for testers and analysis that takes minutes per video (an async UX is under exploration).
Iteration 1: Vision + speech
- SwiftUI app: local library, Apple Vision (framing/proximity from sampled frames) + on-device speech (pacing, pauses, rough turn-taking, transcript sentiment proxy), Settings to wipe data, DEBUG re-analyze; AVAudioSession + unmuted AVPlayer for audible playback.
- Result: Pipeline worked technically, but models missed infant / parentese-like audio — mostly adult speech only.
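The pacing and pause signals mentioned above can be derived from timed transcript segments alone. The following is a minimal sketch, not the app's actual Swift code: the `Segment` shape, the function names, and the 1-second pause threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance with times in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


def words_per_minute(segments):
    """Speaking rate over time actually spent talking, ignoring silence."""
    spoken = sum(s.end - s.start for s in segments)
    words = sum(len(s.text.split()) for s in segments)
    return 60.0 * words / spoken if spoken > 0 else 0.0


def pauses(segments, min_gap=1.0):
    """Gaps of at least min_gap seconds between consecutive utterances.

    min_gap is an assumed threshold; a real value would need tuning on clips.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start - prev.end >= min_gap:
            gaps.append((prev.end, nxt.start))
    return gaps
```

Long gaps are a rough proxy for turn-taking space: they mark where a child could have responded, which is why this signal alone cannot tell whether the child actually did.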
Iteration 2: Speech-only + custom coo detector (current POC)
- Vision off; speech-only interaction analysis with heuristic coach copy from transcript + audio cues; privacy/UI copy updated.
- Caregiver speech: WhisperKit on-device (medium / large class models); speedups from smaller “distilled” Whisper options + a perf bug fix.
- Infant cues: Custom baby-coo classifier, trained offline and shipped as Core ML (`BabyCooClassifier.mlpackage`), with the same mel frontend as training (`log_mel` per the export JSON); `class_logits` → `[not_coo, coo]` (softmax for score), run over sliding chunks of clip audio alongside Whisper.
- Output: Rules-based coaching blends what the parent said with “possible coo here” moments; timeline of coo-like regions; notes tie to parent lines before/after, quiet / turn-taking space, and slightly more specific copy when data allows.
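To make the scoring step concrete, here is a small sketch of how two-class logits can become a coo score and how per-chunk scores can be merged into timeline regions. It is written in Python for readability, not taken from the app; the window length, hop, threshold, and all function names are assumptions, and the model call itself is left out.

```python
import math


def softmax_coo_score(class_logits):
    """Turn [not_coo, coo] logits into a coo probability (stable softmax)."""
    not_coo, coo = class_logits
    m = max(not_coo, coo)
    e_not, e_coo = math.exp(not_coo - m), math.exp(coo - m)
    return e_coo / (e_not + e_coo)


def chunk_starts(duration_s, window_s=1.0, hop_s=0.5):
    """Start times of sliding windows that fit fully inside the clip.

    window_s and hop_s are assumed values, not the POC's settings.
    """
    if duration_s < window_s:
        return []
    count = int((duration_s - window_s) / hop_s) + 1
    return [i * hop_s for i in range(count)]


def coo_regions(scores, starts, window_s=1.0, threshold=0.5):
    """Merge overlapping or touching above-threshold windows into regions."""
    regions = []
    for score, start in zip(scores, starts):
        if score < threshold:
            continue
        end = start + window_s
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # extend the open region
        else:
            regions.append((start, end))
    return regions
```

The resulting `(start, end)` regions are what a rules layer could compare against Whisper segment timestamps to find the parent lines just before or after a possible coo.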
Training / tooling (offline)
- FastAPI labeling + training pipeline: Distil-Whisper LoRA, baby-coo CNN, and optional Core ML export for the coo model. End-to-end runs work on a small set of home / usability videos after stack fixes: venv setup, ffmpeg on PATH for subprocess calls, Starlette/Jinja TemplateResponse, and transformers/PEFT API drift (e.g. removing SEQ_2_SEQ_LM so Whisper + LoRA runs).
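The ffmpeg PATH issue is a common failure mode when a server process shells out to ffmpeg. The sketch below shows one way to guard against it, assuming ffmpeg is used to pull mono 16 kHz audio out of a video before training; the function names and that use of ffmpeg are assumptions, not details recorded in the notes.

```python
import shutil
import subprocess


def ffmpeg_extract_cmd(src, dst, ffmpeg="ffmpeg", sample_rate=16000):
    """Build an ffmpeg command that writes mono audio from a video file."""
    return [
        ffmpeg, "-y",             # overwrite the output if it exists
        "-i", str(src),           # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # mono
        "-ar", str(sample_rate),  # resample (16 kHz is what Whisper expects)
        str(dst),
    ]


def extract_audio(src, dst, sample_rate=16000):
    """Resolve ffmpeg on PATH first so a missing binary fails with a clear error."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        raise FileNotFoundError(
            "ffmpeg not found on PATH; install it or extend PATH for the server process"
        )
    subprocess.run(
        ffmpeg_extract_cmd(src, dst, exe, sample_rate),
        check=True,           # raise CalledProcessError on a non-zero exit
        capture_output=True,  # keep ffmpeg's log output off the server console
    )
    return dst
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.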
Caveats (explicit)
- Latency: Analysis still takes several minutes per video on device; this may improve with further processing optimization, or be absorbed by an async product framing.
- Reach: Native iOS POC → TestFlight for external testing.
Full verbatim notes: raw/education-startup-boxy-poc.md.
Gemini (cloud): primary product prototype
After the on-device path proved too weak for full coaching quality (and Gemini could cover the coaching job the small local stack was chasing), the main prototype moved cloud-side:
- Video analysis: Gemini on an uploaded clip — high-quality coaching feedback in about 16 seconds for a 2:30 video, minimal tuning.
- Toy photos + style prompt: Activity ideas + stylized paper-cut illustration; quality described as very strong.
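For orientation, a request for video coaching feedback might be shaped as below. This is a sketch under stated assumptions: it follows the public Gemini `generateContent` REST body layout for a file that has already been uploaded, and the prompt wording, function name, and defaults are hypothetical, not the prototype's actual code.

```python
# Hypothetical prompt; the project's real prompt is not recorded in these notes.
COACHING_PROMPT = (
    "You are a supportive parenting coach. Watch this clip of a parent and "
    "child and give two or three specific, encouraging observations."
)


def build_generate_content_body(file_uri, mime_type="video/mp4",
                                prompt=COACHING_PROMPT):
    """Request body referencing an already-uploaded video plus a text prompt."""
    return {
        "contents": [{
            "parts": [
                {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                {"text": prompt},
            ]
        }]
    }
```

The upload step, model choice, and authentication are deliberately left out here, since none of them are specified in the notes.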
Current prototype (split hosting)
| Track | Hosting / infra | AI |
|---|---|---|
| Video | GitLab Pages (web) + Google Cloud (video upload) | Gemini API — video analysis / coaching feedback |
| Photo / activities | Vercel | Gemini API — activity generation + image generation |
(Public URLs for Pages/Vercel not captured in wiki yet — add to raw/ when stable.)
Product themes
- Async parent feedback (not necessarily live sessions).
- Multimodal: video of parent–child activity; optional toy-photo → activities + illustration.
- Historical thread: Boxy explored distillation-flavored smaller Whisper + custom detector for on-device coaching cues; Gemini is the current bet for quality on video.
Tech stack (summary)
- Explored: iOS (SwiftUI, WhisperKit, Core ML coo model, Apple Vision in v1), FastAPI training service, Distil-Whisper + LoRA, CNN coo classifier.
- Now: Web prototype; Gemini API; Google Cloud (uploads); GitLab Pages; Vercel.
Raw sources
- `raw/projects.md` (early stub)
- `raw/education-startup-gemini-prototype.md` (Gemini + prototype detail)
- `raw/education-startup-boxy-poc.md` (Boxy POC narrative + technical summary, 2026-04-11 ingest)
Related
- What-Im-Working-On
- ai-model-distillation — personal study (Moonshot / papers) that informed the distillation-minded on-device attempt; Boxy is the work application of that intuition before the Gemini pivot.
- tools-and-repos (add deploy URLs when known)
Known issues / notes
- Boxy: POC only; slow on-device analysis; TestFlight for broader tests.
- Product: Consolidation story (one surface vs two deploys) TBD.