An 800M-parameter ASR model from Tongyi Lab focused on low-latency transcription, with strong Chinese dialect coverage and a sibling checkpoint for 31-language recognition.
A large MoE text-generation model from MiniMax (230B total / 10B active) positioned for coding and agent workflows, released under a modified MIT license.
A ComfyUI-friendly packaging of the Qwen Image Layered diffusion weights (bf16 + fp8mixed) plus a matching VAE, ready to drop into a node-based image pipeline.
An ASR stack that pairs Whisper Large v3 Turbo’s encoder with SmolLM3-3B, but swaps the plain adapter for a shared MoE projector (4 experts, 2 active per token) and ships custom `transformers` code.
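For intuition, here is a minimal sketch of what a top-2-of-4 MoE projector between an audio encoder and a language model can look like; the dimensions, expert shapes, and routing details are assumptions for illustration, not the repo's actual implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class MoEProjector(nn.Module):
    """Hypothetical top-2-of-4 MoE projector bridging audio features to LM embeddings."""

    def __init__(self, enc_dim=1280, lm_dim=2048, num_experts=4, top_k=2):
        super().__init__()
        self.lm_dim, self.top_k = lm_dim, top_k
        self.router = nn.Linear(enc_dim, num_experts)  # per-token routing logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(enc_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                              # x: (batch, time, enc_dim)
        gates = F.softmax(self.router(x), dim=-1)      # (batch, time, num_experts)
        weights, chosen = gates.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)   # renormalize the kept gates
        out = x.new_zeros(*x.shape[:-1], self.lm_dim)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = chosen[..., k] == e             # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```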
A big menu of GGUF quant files for beyoru/Luna, including imatrix-weighted IQ quants. Useful if you run llama.cpp / Ollama-style local inference and want to trade quality for VRAM.
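If you haven't run GGUF quants locally before, inference is a few lines once a file is downloaded; the file name below is a placeholder, and `llama-cpp-python` is just one of several compatible runners.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Path and quant level are placeholders; pick the GGUF file that fits your VRAM budget.
llm = Llama(
    model_path="./Luna.IQ4_XS.gguf",  # smaller IQ/Q2 files trade quality for memory
    n_ctx=4096,                        # context window
    n_gpu_layers=-1,                   # offload as many layers as fit to the GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```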
A recently updated Llama-style CausalLM (16 layers, 128k context, very large vocab). The card is still boilerplate, so treat it as experimental and verify license/training data before building on it.
A PyTorch checkpoint for a sequence-based trading agent (conv + Transformer encoder, 26 input features, 3 actions). There’s no model card yet, so you’ll need your own loading code and serious backtesting before using it.
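With no card, any loading code is guesswork. The sketch below only shows the general pattern of reconstructing a module and loading a raw `state_dict`; the `TradingPolicy` class and file name are hypothetical and would need to be rewritten after inspecting the checkpoint's keys.

```python
import torch
import torch.nn as nn

class TradingPolicy(nn.Module):
    """Hypothetical conv + Transformer-encoder policy (26 features in, 3 actions out)."""

    def __init__(self, n_features=26, d_model=128, n_actions=3):
        super().__init__()
        self.conv = nn.Conv1d(n_features, d_model, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, x):                       # x: (batch, seq_len, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h = self.encoder(h)
        return self.head(h[:, -1])              # action logits from the last timestep

# Inspect the checkpoint first; the key names reveal the real module structure.
state = torch.load("checkpoint.pt", map_location="cpu")
print(list(state.keys())[:10])

model = TradingPolicy()
model.load_state_dict(state, strict=False)      # tolerate mismatches while reverse-engineering
model.eval()
```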
A cheap-to-train ASR model: frozen Whisper encoder + small trained projector + frozen SmolLM3-3B decoder. Trained in ~24h on one A40 (~$12) and reports ~12% WER on LoquaciousSet.
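The recipe (freeze both pretrained ends, fit only a small bridge) is simple enough to sketch; the checkpoint ids, projector shape, and dimensions below are illustrative assumptions rather than the repo's exact setup.

```python
import torch.nn as nn
from transformers import WhisperModel, AutoModelForCausalLM

encoder = WhisperModel.from_pretrained("openai/whisper-large-v3-turbo").get_encoder()
decoder = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Freeze both pretrained ends; only the projector learns.
for p in encoder.parameters():
    p.requires_grad = False
for p in decoder.parameters():
    p.requires_grad = False

# Small trainable bridge from audio features to the LM's embedding space
# (layer shapes are assumptions; check the real configs).
projector = nn.Sequential(
    nn.Linear(encoder.config.d_model, decoder.config.hidden_size),
    nn.GELU(),
    nn.Linear(decoder.config.hidden_size, decoder.config.hidden_size),
)

trainable = sum(p.numel() for p in projector.parameters())
print(f"trainable params: {trainable:,}")  # only a few million, hence the cheap training run
```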
A ComfyUI workflow pack for `Qwen/Qwen-Image-Edit-2509` that demos controllable edits (line/depth, pose, masks, outpainting, try-on) and lists the required custom nodes + a companion Lightning LoRA.
A static GGUF quant pack of `huihui-ai/Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated`, curated by `mradermacher`. Useful if you want ready-made Q4/Q2 artifacts for llama.cpp-style local inference.
A BS-RoFormer checkpoint for music source separation (vocals vs instrumental) with MVSEP-reported metrics and “v2” weights for both stems. Best suited for people already running a BS-RoFormer pipeline.
A curated bundle of Wan 2.1 text-to-video checkpoints packaged for Wan2GP, aiming to make open-source video generation usable on lower-VRAM (even older) GPUs via a simple web UI.
A GGUF-quantized packaging of Magic-Wan-Image V2.0 for text-to-image, aimed at easier/lighter local inference. Useful if you specifically want GGUF artifacts instead of a standard Diffusers checkpoint.
A text-to-speech model targeting clear 48kHz audio with very high throughput (claims 100x realtime via lmdeploy + batching) while fitting in ~6GB VRAM. Worth a look if you need fast, resource-light TTS for apps.
Prebuilt sherpa-onnx native bundles (incl. Android + ONNX Runtime variants) so you can ship offline speech features without building native deps from scratch.
A grab-bag of RWKV weights packaged for mobile/web runtimes (WebRWKV `.st`/`.prefab`) plus GGUF quantizations for llama.cpp-style runners.
Tencent’s HY-World 1.5 “WorldPlay” aims to stream an interactive world model in real time, with long-horizon geometric consistency and explicit keyboard/mouse action control.
A cache of precompiled AWS Neuron artifacts so popular Hub models deploy much faster on Inferentia/Trainium (via `optimum-neuron` / NeuronX TGI).
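The point of the cache is that compilation is skipped when a matching precompiled artifact already exists. A rough sketch of the usual `optimum-neuron` loading path follows; the model id is a placeholder and the exact keyword arguments depend on your `optimum-neuron` version.

```python
from optimum.neuron import NeuronModelForCausalLM

# If a matching artifact is in the Neuron cache, export reuses it instead of recompiling.
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",   # placeholder model id
    export=True,
    batch_size=1,
    sequence_length=4096,
    num_cores=2,
    auto_cast_type="bf16",
)
```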
An AGPL-3.0-licensed English abstractive summarizer based on DistilBART (CNN/DM), positioned for fast, lightweight news/article summarization.
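Usage is the standard `transformers` summarization pipeline; the repo id below is a placeholder for the actual model name.

```python
from transformers import pipeline

# Replace the placeholder with the summarizer's actual repo id.
summarizer = pipeline("summarization", model="<org>/<distilbart-summarizer>")

article = "Long news article text goes here ..."
summary = summarizer(article, max_length=130, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```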
A GGUF-packaged T5-XXL text encoder (from `google/t5-v1_1-xxl`) intended for local pipelines that load encoders from `./models/text_encoders` (Apache-2.0).
A grab-bag Hugging Face repo that looks like a practical ComfyUI-friendly bundle: LoRAs, upscalers, and assorted model/tool artifacts you can mix into image/video workflows.
A Marian-style checkpoint that appears to be Hindi↔English translation, uploaded as raw training checkpoints; useful if you’re experimenting with custom MT or fine-tuning rather than looking for a polished, packaged release.
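If you want to poke at the weights, Marian-style checkpoints load through the standard `transformers` Marian classes; the repo id below is a placeholder, and the translation direction should be confirmed against the tokenizer config.

```python
from transformers import MarianMTModel, MarianTokenizer

repo = "<org>/<hi-en-checkpoint>"   # placeholder; point at the actual checkpoint
tokenizer = MarianTokenizer.from_pretrained(repo)
model = MarianMTModel.from_pretrained(repo)

batch = tokenizer(["नमस्ते, आप कैसे हैं?"], return_tensors="pt", padding=True)
out = model.generate(**batch, max_new_tokens=64)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```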
An experimental-looking repo that ships artifacts from multiple training epochs (GGUF weights plus a fine-tuned wav2vec checkpoint), suggesting a “sync” pipeline that mixes audio features with a GGUF model.
A 6B bilingual (Chinese/English) text-to-image model focused on legible text rendering, photorealism, and efficient deployment.
A compact 1.1B-parameter LLM released as GGUF for llama.cpp-style local inference (CC-BY-NC-2.0).
Text-to-video weights + inference code targeting native 4K generation, reporting a 10× speedup over naive 4K video generation (Apache-2.0 + Wan license).