OLMo-2 1B Distilled (reasoning traces)
hbfreed/Olmo-2-1B-Distilled is a small (1B parameter) text model that’s explicitly trained to produce reasoning traces. Rather than being a generic instruct checkpoint, it’s positioned as an experiment in “reasoning distillation”: the model learns to emit an explicit <think>...</think> reasoning block before the final answer.
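Since the model emits its reasoning inside a <think>...</think> block, downstream code usually wants to separate the trace from the final answer. A minimal sketch of that split, assuming the completion contains at most one such block (the helper name `split_trace` is illustrative, not part of the model's tooling):

```python
import re

def split_trace(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning trace, final answer).

    Assumes at most one <think>...</think> block, per the output
    format described on the model card.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        # No trace found: treat the whole completion as the answer.
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer

trace, answer = split_trace("<think>2 + 2 = 4</think>The answer is 4.")
# trace == "2 + 2 = 4", answer == "The answer is 4."
```

A regex with `re.DOTALL` is enough here because the trace can span multiple lines; a streaming setting would instead watch for the closing tag token by token.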
The model card gives a concrete training story. It starts from allenai/OLMo-2-0425-1B-Instruct, runs an SFT stage on examples in the <think> format, then an on-policy distillation loop in which the student generates rollouts and a larger teacher (allenai/Olmo-3-7B-Think) provides token-level supervision via reverse KL. If you’re evaluating small models for agent-like tasks, this is a useful checkpoint precisely because it makes multi-step behavior visible: prompt it for step-by-step reasoning on a few math/logic questions, then compare output stability and verbosity against a “plain” 1B instruct model.
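The reverse-KL objective mentioned above is worth spelling out: the student’s own distribution weights the log-ratio, so the student is pushed toward the teacher’s modes rather than spreading mass over everything the teacher might say. A minimal per-token sketch over a toy vocabulary (this is an illustration of the general formula, not the repository’s training code):

```python
import math

def reverse_kl(student_probs, teacher_probs):
    """Token-level reverse KL: KL(student || teacher).

    = sum_v p_s(v) * (log p_s(v) - log p_t(v)).
    Mode-seeking: the student is penalized for putting probability
    where the teacher has little, which suits distilling a teacher's
    reasoning style onto a small student.
    """
    return sum(
        p_s * (math.log(p_s) - math.log(p_t))
        for p_s, p_t in zip(student_probs, teacher_probs)
        if p_s > 0  # terms with zero student mass contribute nothing
    )

# Identical distributions diverge by zero; a confident student that
# disagrees with a uniform teacher pays a positive penalty.
print(reverse_kl([0.5, 0.5], [0.5, 0.5]))  # → 0.0
print(reverse_kl([0.9, 0.1], [0.5, 0.5]) > 0)  # → True
```

In the on-policy loop described on the card, this quantity would be computed at each position of a student-generated rollout, with the teacher scoring the same tokens.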
Quick stats from the listing feed: pipeline: text-generation · 0 likes · 211 downloads.
Source listing: https://huggingface.co/models?sort=modified