Education startup (UX / parent app)

Field | Value
Status | Active
Type | Work

Description

UX and product build for a parent-facing education startup app focused on asynchronous feedback: parents get coaching-style commentary on an activity with their child (often video-based).

Boxy — on-device video/audio POC (exploratory)

Goal: Test whether lightweight, rules-based coaching — combining caregiver speech with basic infant vocalization cues — is enough to unlock parent participation before heavier investment. Not positioned as the final technical approach.

Privacy / scope: Native iOS POC. Parents import clips (e.g. from Photos) and all analysis runs on device, an intentional choice to test feasibility. The tradeoffs are TestFlight-style access for testers and analysis that takes minutes per video (async UX under exploration).

Iteration 1 — Vision + speech

  • SwiftUI app: local library, Apple Vision (framing/proximity from sampled frames) + on-device speech (pacing, pauses, rough turn-taking, transcript sentiment proxy), Settings to wipe data, DEBUG re-analyze; AVAudioSession + unmuted AVPlayer for audible playback.
  • Result: Pipeline worked technically, but models missed infant / parentese-like audio — mostly adult speech only.
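The notes do not record how pacing and pauses were derived from the on-device speech output. As a minimal sketch only, assuming the recognizer yields timed transcript segments (the Segment type, the function name, and the 1-second pause threshold below are all hypothetical), the two cues could be computed like this:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed transcript segment (hypothetical shape, not the app's type)."""
    start: float  # seconds from clip start
    end: float    # seconds from clip start
    text: str


def pacing_and_pauses(segments, min_pause_s=1.0):
    """Return (words per minute over spoken time, list of (gap_start, gap_end)).

    min_pause_s is an assumed threshold for what counts as a pause.
    """
    ordered = sorted(segments, key=lambda s: s.start)
    spoken_s = sum(max(0.0, s.end - s.start) for s in ordered)
    words = sum(len(s.text.split()) for s in ordered)
    wpm = words / (spoken_s / 60.0) if spoken_s > 0 else 0.0

    pauses = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start - prev.end
        if gap >= min_pause_s:
            # A long silence between caregiver lines is potential turn-taking space.
            pauses.append((prev.end, nxt.start))
    return wpm, pauses
```

Gaps between caregiver segments are only a rough proxy for turn-taking: they show that space was left, not that the child filled it.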

Iteration 2 — Speech-only + custom coo detector (current POC)

  • Vision off; speech-only interaction analysis with heuristic coach copy from transcript + audio cues; privacy/UI copy updated.
  • Caregiver speech: WhisperKit on-device (medium / large class models); speedups from smaller “distilled” Whisper options + a perf bug fix.
  • Infant cues: Custom baby-coo classifier — trained offline, shipped as Core ML (BabyCooClassifier.mlpackage), same mel frontend as training, log_mel per export JSON, class_logits[not_coo, coo] (softmax for score), sliding chunks over clip audio alongside Whisper.
  • Output: Rules-based coaching blends what the parent said with “possible coo here” moments; timeline of coo-like regions; notes tie to parent lines before/after, quiet / turn-taking space, and slightly more specific copy when data allows.
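The sliding-chunk scoring and the timeline of coo-like regions described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped pipeline: the classify callable stands in for the mel frontend plus Core ML model and is assumed to return [not_coo, coo] logits, and the chunk length, hop, and threshold values are hypothetical.

```python
import math


def softmax_coo_score(logits):
    """Probability of the coo class from [not_coo, coo] logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def chunk_starts(n_samples, chunk, hop):
    """Start indices of full-length chunks sliding over the clip."""
    if n_samples < chunk:
        return []
    return list(range(0, n_samples - chunk + 1, hop))


def coo_regions(audio, sample_rate, classify,
                chunk_s=1.0, hop_s=0.5, threshold=0.5):
    """Return merged (start_s, end_s, peak_score) regions scored as coo-like.

    classify: callable taking a chunk of samples and returning
    [not_coo, coo] logits (a stand-in for the real model).
    chunk_s, hop_s, and threshold are assumed values.
    """
    chunk = max(1, int(chunk_s * sample_rate))
    hop = max(1, int(hop_s * sample_rate))
    regions = []
    for start in chunk_starts(len(audio), chunk, hop):
        score = softmax_coo_score(classify(audio[start:start + chunk]))
        if score < threshold:
            continue
        t0 = start / sample_rate
        t1 = (start + chunk) / sample_rate
        if regions and t0 <= regions[-1][1]:
            # Overlaps or touches the previous region: extend it.
            prev = regions[-1]
            regions[-1] = (prev[0], t1, max(prev[2], score))
        else:
            regions.append((t0, t1, score))
    return regions
```

Merging overlapping chunks gives one timeline entry per coo-like stretch rather than one per chunk, which is the shape a rules-based note ("possible coo here, right after this parent line") would need.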

Training / tooling (offline)

  • FastAPI labeling + training pipeline: Distil-Whisper LoRA, baby-coo CNN, optional Core ML export for the coo model. It runs end to end on a small set of home / usability videos now that the stack issues are fixed (venv, ffmpeg PATH + subprocess, Starlette/Jinja TemplateResponse, transformers/PEFT API drift, e.g. SEQ_2_SEQ_LM removal so Whisper + LoRA runs).
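The ffmpeg PATH + subprocess item is the kind of issue that shows up when a service started from a venv cannot find the binary. The actual fix is not recorded here; the sketch below shows one common pattern, assuming audio is extracted to 16 kHz mono WAV for Whisper-style models. The function names are hypothetical.

```python
import shutil
import subprocess


def build_ffmpeg_cmd(ffmpeg_bin, video_path, wav_path, sample_rate=16000):
    """Build the argument list for extracting mono audio from a video."""
    return [
        ffmpeg_bin,
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # mono
        "-ar", str(sample_rate), # resample (16 kHz assumed here)
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Resolve ffmpeg explicitly, then run it; fail loudly if it is missing."""
    ffmpeg_bin = shutil.which("ffmpeg")
    if ffmpeg_bin is None:
        raise RuntimeError("ffmpeg not found on PATH for this process")
    cmd = build_ffmpeg_cmd(ffmpeg_bin, video_path, wav_path, sample_rate)
    # Passing a list (no shell=True) avoids quoting problems in file names.
    subprocess.run(cmd, check=True, capture_output=True)
    return wav_path
```

Resolving the binary with shutil.which up front turns a confusing FileNotFoundError deep inside a request handler into a clear error about the environment.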

Caveats (explicit)

  • Latency: Analysis still takes several minutes per video on device; this may improve with further processing optimization, or be absorbed by async product framing.
  • Reach: Native iOS POC → TestFlight for external testing.

Full verbatim notes: raw/education-startup-boxy-poc.md.

Gemini (cloud) — primary product prototype

After the on-device path proved too weak for full coaching quality, and Gemini turned out to cover the coaching job the small local stack was chasing, the main prototype moved to the cloud:

  • Video analysis: Gemini on an uploaded clip — high-quality coaching feedback in about 16 seconds for a 2:30 video, minimal tuning.
  • Toy photos + style prompt: Activity ideas + stylized paper-cut illustration; quality described as very strong.

Current prototype (split hosting)

Track | Hosting / infra | AI
Video | GitLab Pages (web) + Google Cloud (video upload) | Gemini API: video analysis / coaching feedback
Photo / activities | Vercel | Gemini API: activity generation + image generation

(Public URLs for Pages/Vercel not captured in wiki yet — add to raw/ when stable.)

Product themes

  • Async parent feedback (not necessarily live sessions).
  • Multimodal: video of parent–child activity; optional toy-photo → activities + illustration.
  • Historical thread: Boxy explored smaller distilled Whisper models plus a custom coo detector for on-device coaching cues; Gemini is the current bet for quality on video.

Tech stack (summary)

  • Explored: iOS (SwiftUI, WhisperKit, Core ML coo model, Apple Vision in v1), FastAPI training service, Distil-Whisper + LoRA, CNN coo classifier.
  • Now: Web prototype; Gemini API; Google Cloud (uploads); GitLab Pages; Vercel.

Raw sources

  • raw/projects.md (early stub)
  • raw/education-startup-gemini-prototype.md (Gemini + prototype detail)
  • raw/education-startup-boxy-poc.md (Boxy POC narrative + technical summary, 2026-04-11 ingest)

Known issues / notes

  • Boxy: POC only; slow on-device analysis; TestFlight for broader tests.
  • Product: Consolidation story (one surface vs two deploys) TBD.