Realtime AI video, open-source SUNO, next-level AI agents, realtime text-to-speech: AI NEWS

The AI Search (YouTube) · January 18, 2026 · Original link

This episode is a “demo reel” style roundup of new repos and papers. The fastest way to get value from it is to pick one item below and try to reproduce the author’s core demo yourself.

  • ShowUI‑Aloha: browser/UI agent work focused on realistic multi-step interactions.
  • NovaSR: speech recognition improvements packaged as a repo you can try locally.
  • VerseCrafter: structured video generation, pitched as “verse → video” style workflows.
  • UniSH: an additional “helper” model/pipeline (worth scanning for training/data choices).
  • ShowUI‑pi: another UI-agent variant from the ShowLab ecosystem.
  • AnyDepth: depth estimation improvements; useful for “3D-ish” pipelines and editing.
  • Flux2 Klein (tutorial): a walkthrough-style entry, good for quickly benchmarking quality.
  • VIBE: video-editing / identity / motion work (good to compare against other edit models).
  • ShapeR: 3D / reconstruction work from Meta Research.
  • HeartMula: a research-y exploration with an interesting demo (but expect sharp edges).
  • RigMo: motion/rigging-focused work that fits into animation/editing pipelines.
  • Pocket TTS: lightweight, repo-first text-to-speech.
  • PixVerse R1: a production-facing video tool; good for “what can users do today?” comparisons.