Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB
Z80-μLM is a delightfully pragmatic take on “tiny language models”: instead of chasing fluency, it focuses on personality-driven, short responses that can run on extremely constrained hardware. The project trains a small neural network in Python with quantization-aware training, then exports the inference code and weights as a single CP/M .COM program that runs on a 1970s-era Z80 within its classic 64KB address space.
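To make the “quantization-aware training plus integer-only export” idea concrete, here is a minimal NumPy sketch of the two pieces such a pipeline typically needs: a fake-quantize step that rounds float weights to a signed 2-bit range during training, and an integer-only dense layer that replaces float rescaling with an arithmetic right-shift (the kind of operation a Z80 can actually do cheaply). The level set {-2,…,1}, the per-tensor scale, and the shift-based rescale are illustrative assumptions, not the repo’s exact scheme.

```python
import numpy as np

def fake_quant_2bit(w):
    """Round float weights to the signed 2-bit range {-2, -1, 0, 1}.

    Returns the integer codes plus the per-tensor scale, so training can
    'see' the quantized weights in the forward pass (assumed QAT scheme).
    """
    scale = max(float(np.abs(w).max()), 1e-8) / 2.0
    q = np.clip(np.round(w / scale), -2, 1).astype(np.int8)
    return q, scale

def int_only_layer(x_int8, w_q, out_shift):
    """Integer-only dense layer: int8 activations x 2-bit weights.

    Accumulate in int32, then rescale with an arithmetic right-shift
    instead of a float multiply -- cheap on an 8-bit CPU.
    """
    acc = x_int8.astype(np.int32) @ w_q.astype(np.int32)
    return np.clip(acc >> out_shift, -128, 127).astype(np.int8)
```

At export time, only the int8 weight codes, scales folded into shift amounts, and the loop above need to be translated to Z80 code; no floating point survives into the .COM binary.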
Technically, it’s closer to a clever character-level classifier than a modern chat model: user input is hashed into trigram buckets (typo-tolerant and mostly word-order invariant), and the network uses 2-bit weights and integer-only math to keep inference feasible. The repo includes examples like a terse “tinychat” bot and a 20-questions-style game, plus tooling to generate and balance training data. If you like retrocomputing or embedded ML, a fun first experiment is to train a model on your own short Q&A or “yes/no/maybe” dataset and run it in a CP/M emulator; it’s striking how “alive” these interaction patterns can feel even without any deep context.
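The trigram-bucket featurization is easy to sketch. In the version below (an illustrative reconstruction, not the repo’s exact code), each whitespace-separated word is padded and split into character trigrams, and each trigram is hashed into a fixed-size count vector with FNV-1a; the bucket count and hash choice are assumptions. Hashing per word makes the features exactly word-order invariant, and a one-letter typo still shares most trigrams with the correct spelling, which is where the typo tolerance comes from.

```python
def trigram_buckets(text, n_buckets=256):
    """Hash character trigrams of each word into a fixed-size count vector.

    Illustrative sketch: pad each word with ^/$ so short words still yield
    trigrams, then bucket each trigram with a 32-bit FNV-1a hash.
    """
    vec = [0] * n_buckets
    for word in text.lower().split():
        padded = f"^{word}$"
        for i in range(len(padded) - 2):
            h = 2166136261  # FNV-1a offset basis
            for ch in padded[i:i + 3]:
                h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
            vec[h % n_buckets] += 1
    return vec
```

Because words are hashed independently, “hello world” and “world hello” produce identical vectors, and a misspelling like “helo” still overlaps “hello” on several buckets -- exactly the “fuzzy intent matching” behavior that makes a tiny classifier feel responsive.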