Education startup (UX / parent app)

Field	Value
Status	Active
Type	Work

Description

UX and product build for a parent-facing education startup app focused on asynchronous feedback: parents receive coaching-style commentary on an activity they did with their child, often based on a video of it.

Boxy: on-device video/audio POC (exploratory)

Goal: Test whether lightweight, rules-based coaching, combining caregiver speech with basic infant vocalization cues, is enough to unlock parent participation before heavier investment. It is not positioned as the final technical approach.

Privacy / scope: Native iOS POC. Parents import clips (e.g. from Photos) and all analysis runs on device, intentionally, for the feasibility test. The tradeoffs are TestFlight-style access for testers and minutes-long analysis per video (an async UX is under exploration).

Iteration 1: Vision + speech

SwiftUI app with a local clip library. Apple Vision estimates framing/proximity from sampled frames, and on-device speech recognition provides pacing, pauses, rough turn-taking, and a transcript-based sentiment proxy. Settings can wipe stored data, a DEBUG-only action re-analyzes clips, and AVAudioSession plus an unmuted AVPlayer make playback audible.
Result: The pipeline worked technically, but the models missed infant and parentese-like audio and picked up mostly adult speech.
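The speech-side signals listed above (pacing, pauses, rough turn-taking) can be derived from timestamped transcript segments. The sketch below is a hypothetical Python illustration, not the SwiftUI app's code: the `Segment` shape, the `speech_metrics` helper, and the 1 s / 2 s thresholds are all assumptions. Because the models heard mostly adult speech, turn-taking here can only be inferred from silences after the caregiver stops talking.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A recognized caregiver utterance with start/end times in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def speech_metrics(segments, pause_s=1.0, turn_gap_s=2.0):
    """Rough pacing, pause, and turn-taking signals from timestamped speech.

    pause_s and turn_gap_s are illustrative thresholds, not the POC's values.
    """
    if not segments:
        return {"words_per_min": 0.0, "pauses": [], "turn_gaps": 0}
    segs = sorted(segments, key=lambda s: s.start)
    words = sum(len(s.text.split()) for s in segs)
    span_s = max(s.end for s in segs) - segs[0].start
    # Pacing: words per minute over the whole speaking span (silences included).
    wpm = 60.0 * words / span_s if span_s > 0 else 0.0
    # Pauses: silences between consecutive utterances that are at least pause_s long.
    pauses = [(a.end, b.start) for a, b in zip(segs, segs[1:]) if b.start - a.end >= pause_s]
    # Rough turn-taking: longer silences count as space left for the child to respond.
    turn_gaps = sum(1 for start, end in pauses if end - start >= turn_gap_s)
    return {"words_per_min": round(wpm, 1), "pauses": pauses, "turn_gaps": turn_gaps}


m = speech_metrics([
    Segment("look at the ball", 0.0, 1.5),
    Segment("can you roll it", 4.0, 5.0),
    Segment("yes", 5.5, 6.0),
])
# m == {"words_per_min": 90.0, "pauses": [(1.5, 4.0)], "turn_gaps": 1}
```

Counting silences is the cheapest proxy available when only adult speech is recognized, which is also why it says nothing about whether the child actually vocalized.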

Iteration 2: Speech-only + custom coo detector (current POC)

Vision off; interaction analysis is speech-only, with heuristic coaching copy generated from the transcript plus audio cues. Privacy and UI copy were updated to match.
Caregiver speech: WhisperKit on device (medium- and large-class models); speedups came from smaller “distilled” Whisper variants plus a performance bug fix.
Infant cues: A custom baby-coo classifier, trained offline and shipped as Core ML (BabyCooClassifier.mlpackage). It uses the same mel frontend as training (log_mel computed per the export JSON), and its class_logits output maps to [not_coo, coo], with a softmax giving the coo score. It runs over sliding chunks of the clip audio alongside Whisper.
Output: Rules-based coaching blends what the parent said with “possible coo here” moments. A timeline shows coo-like regions, and notes tie each region to the parent's lines before and after it and to quiet / turn-taking space, with slightly more specific copy when the data allows.
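The coo path described above (per-chunk logits, a softmax to get a coo score, coo-like regions tied to the parent's lines before and after) can be sketched in Python as follows. This is an illustration under stated assumptions, not the shipped Swift/Core ML code: `logits_fn` is a hypothetical stand-in for running BabyCooClassifier on one chunk's log_mel features, and the window, hop, threshold, and gap values plus the coaching copy are made up for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax; maps [not_coo, coo] logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coo_regions(duration_s, logits_fn, window_s=1.0, hop_s=0.5, threshold=0.6):
    """Slide a window over the clip, score P(coo) per chunk, merge hits into regions.

    logits_fn(start_s, end_s) is a hypothetical stand-in for running the Core ML
    model on that slice of audio; it must return [not_coo, coo] logits.
    """
    regions = []  # list of (start_s, end_s, peak_p_coo)
    start = 0.0
    while start + window_s <= duration_s + 1e-9:
        end = start + window_s
        p_coo = softmax(logits_fn(start, end))[1]
        if p_coo >= threshold:
            if regions and start <= regions[-1][1]:
                # Overlaps the previous coo-like chunk: extend that region.
                r_start, _, r_peak = regions[-1]
                regions[-1] = (r_start, end, max(r_peak, p_coo))
            else:
                regions.append((start, end, p_coo))
        start += hop_s
    return regions


def coaching_notes(regions, parent_lines, min_gap_s=1.0):
    """Tie each coo-like region to the parent's lines just before and after it.

    parent_lines: (start_s, end_s, text) tuples, e.g. from the Whisper transcript.
    min_gap_s and the copy strings are illustrative, not the app's real rules.
    """
    lines = sorted(parent_lines)
    notes = []
    for r_start, r_end, _ in regions:
        before = [l for l in lines if l[1] <= r_start]
        after = [l for l in lines if l[0] >= r_end]
        prev_text = before[-1][2] if before else None
        next_line = after[0] if after else None
        # Quiet time between the end of the coo-like region and the parent's next line.
        gap_s = next_line[0] - r_end if next_line else None
        copy = f"Possible coo around {r_start:.1f}s"
        if prev_text:
            copy += f', right after you said "{prev_text}"'
        copy += "."
        if gap_s is not None and gap_s >= min_gap_s:
            copy += f" You left {gap_s:.1f}s of quiet before replying: nice turn-taking space."
        elif gap_s is not None:
            copy += " Try pausing a beat longer so your baby can answer."
        notes.append({
            "start_s": r_start,
            "end_s": r_end,
            "before": prev_text,
            "after": next_line[2] if next_line else None,
            "gap_s": gap_s,
            "copy": copy,
        })
    return notes
```

Merging overlapping windows keeps one note per vocalization instead of one per chunk, which matters with a hop shorter than the window.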

Training / tooling (offline)

FastAPI labeling + training pipeline: Distil-Whisper LoRA fine-tuning, a baby-coo CNN, and optional Core ML export for the coo model. It ran end to end on a small set of home / usability videos after some stack work: venv setup, ffmpeg on PATH and invoked via subprocess, Starlette/Jinja TemplateResponse, and transformers/PEFT API drift (e.g. removing SEQ_2_SEQ_LM so Whisper + LoRA runs).
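One way the ffmpeg PATH + subprocess step can look, as a hypothetical sketch rather than the project's code: resolve ffmpeg explicitly with `shutil.which` so a missing PATH entry in the venv or server process fails loudly, then extract 16 kHz mono WAV, the input format Whisper models expect. The helper names are assumptions; the ffmpeg flags (`-y`, `-i`, `-vn`, `-ac`, `-ar`, `-f`) are standard options.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video_path, wav_path, sample_rate=16_000, ffmpeg="ffmpeg"):
    """Build an ffmpeg argv that pulls mono WAV audio out of a video clip."""
    return [
        ffmpeg,
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (Whisper expects 16 kHz)
        "-f", "wav",
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg via subprocess, failing early with a clear error if it is not on PATH."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        raise FileNotFoundError(
            "ffmpeg not found on PATH; install it or expose it to the venv/server process"
        )
    cmd = ffmpeg_extract_cmd(video_path, wav_path, ffmpeg=ffmpeg)
    # check=True raises CalledProcessError on a non-zero exit; stderr is captured for logging.
    subprocess.run(cmd, check=True, capture_output=True)
    return Path(wav_path)
```

Passing an argv list (rather than a shell string) avoids quoting problems with filenames from uploaded home videos.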

Caveats (explicit)

Latency: Analysis still takes several minutes per video on device; this may improve with processing changes or be mitigated by async product framing.
Reach: Native iOS POC → TestFlight for external testing.

Full verbatim notes: raw/education-startup-boxy-poc.md.

Gemini (cloud): primary product prototype

After the on-device path proved too weak for full coaching quality (and Gemini could cover the coaching job the small local stack was chasing), the main prototype moved cloud-side:

Video analysis: Gemini on an uploaded clip returned high-quality coaching feedback in about 16 seconds for a 2:30 video, with minimal tuning.
Toy photos + style prompt: Activity ideas + stylized paper-cut illustration; quality described as very strong.
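The cloud video path can be sketched with the google-genai Python SDK. This is a hedged illustration, not the prototype's code (the web prototype's implementation and exact upload flow are not recorded here): it uses the Gemini Files API instead of the Google Cloud upload the prototype uses, and the model name `gemini-2.5-flash`, the prompt, and the `analyze_clip` helper are assumptions. The client is passed in so the function can be exercised with a stub.

```python
import time

# Hypothetical coaching prompt; the prototype's real prompt is not recorded.
COACHING_PROMPT = (
    "You are a warm, practical parenting coach. Watch this clip of a parent "
    "playing with their child. Give asynchronous feedback: two things that went "
    "well, one thing to try next time, and timestamps for each moment you mention."
)


def analyze_clip(client, video_path, model="gemini-2.5-flash", poll_s=2.0, timeout_s=300.0):
    """Upload a clip via the Gemini Files API, wait until it is ACTIVE, then ask for coaching.

    `client` is expected to be a google-genai `genai.Client()` (or a stub with the
    same files/models methods). Model name and prompt are assumptions.
    """
    video = client.files.upload(file=video_path)
    deadline = time.monotonic() + timeout_s
    # Uploaded videos are processed asynchronously before they can be referenced.
    while video.state is None or video.state.name != "ACTIVE":
        if video.state is not None and video.state.name == "FAILED":
            raise RuntimeError(f"processing failed for {video.name}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"{video.name} still not ACTIVE after {timeout_s}s")
        time.sleep(poll_s)
        video = client.files.get(name=video.name)
    response = client.models.generate_content(model=model, contents=[video, COACHING_PROMPT])
    return response.text
```

With the real SDK, the call would look like `analyze_clip(genai.Client(), "clip.mp4")` after `from google import genai`; the client picks up the API key from the environment.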

Current prototype (split hosting)

Track	Hosting / infra	AI
Video	GitLab Pages (web) + Google Cloud (video upload)	Gemini API — video analysis / coaching feedback
Photo / activities	Vercel	Gemini API — activity generation + image generation

(Public URLs for Pages/Vercel are not captured in the wiki yet; add them to raw/ when stable.)

Product themes

Async parent feedback (not necessarily live sessions).
Multimodal: video of parent–child activity; optional toy-photo → activities + illustration.
Historical thread: Boxy explored smaller, distillation-flavored Whisper models plus a custom detector for on-device coaching cues; Gemini is the current bet for quality on video.

Tech stack (summary)

Explored: iOS (SwiftUI, WhisperKit, Core ML coo model, Apple Vision in v1), FastAPI training service, Distil-Whisper + LoRA, CNN coo classifier.
Now: Web prototype; Gemini API; Google Cloud (uploads); GitLab Pages; Vercel.

Raw sources

raw/projects.md (early stub)
raw/education-startup-gemini-prototype.md (Gemini + prototype detail)
raw/education-startup-boxy-poc.md (Boxy POC narrative + technical summary, 2026-04-11 ingest)

What-Im-Working-On
ai-model-distillation — personal study (Moonshot / papers) that informed the distillation-minded on-device attempt; Boxy is the work application of that intuition before the Gemini pivot.
tools-and-repos (add deploy URLs when known)

Known issues / notes

Boxy: POC only; slow on-device analysis; TestFlight for broader tests.
Product: Consolidation story (one surface vs two deploys) TBD.