AWS Neuron optimum model cache

Hugging Face · December 17, 2025 · aws-neuron/optimum-neuron-cache

This is not a model checkpoint, but a “plumbing” repo: it hosts cached AWS Neuron compilation artifacts for popular Hugging Face Hub models. If you’re deploying LLMs on AWS Inferentia or Trainium, a common bottleneck is the compile step (exporting a model and producing Neuron-compatible artifacts). This cache is meant to make that workflow much more repeatable and less time-consuming, but coverage is configuration-specific (model + hardware + sequence length), so you still need to verify your exact setup is included.
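For context, here is roughly what the compile step looks like with the optimum-cli Neuron exporter; when a matching configuration exists in the cache, optimum-neuron fetches the precompiled artifacts instead of rebuilding them. This is a sketch: the model ID, output directory, and parameter values are illustrative placeholders, not a configuration confirmed to be cached.

```bash
# Sketch of a Neuron export (hypothetical model ID and parameters).
# If this exact configuration is present in the public cache,
# optimum-neuron reuses the cached artifacts instead of recompiling.
optimum-cli export neuron \
  --model meta-llama/Meta-Llama-3-8B \
  --batch_size 1 \
  --sequence_length 4096 \
  --num_cores 2 \
  --auto_cast_type fp16 \
  ./llama3-neuron/
```

The cache lookup keys on these export parameters, which is why the same model can be a cache hit at one sequence length and a miss at another.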

In practice, it plugs into the existing AWS Neuron ecosystem: optimum-neuron and NeuronX TGI can look up and reuse cached configs instead of compiling from scratch, and the model page supports a “Deploy → AWS SageMaker” flow with an Inferentia/Trainium-specific snippet. If you’re experimenting, a good first step is to run optimum-cli neuron cache lookup for the model you care about and see whether there’s a cached configuration that matches your target hardware and sequence length.
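A minimal version of that lookup (the model ID below is a placeholder):

```bash
# Query the public compilation cache for a given Hub model.
# The output lists cached configurations (roughly: batch size,
# sequence length, core count, precision), which you can compare
# against your target Inferentia/Trainium setup before deploying.
optimum-cli neuron cache lookup meta-llama/Meta-Llama-3-8B
```

If no entry matches, you can still compile locally; the cache only saves you the wait when your configuration happens to be covered.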

Quick stats from the listing feed: 28 likes.

View on Hugging Face

Source listing: https://huggingface.co/models?sort=modified