Models

Fresh model releases and updates pulled from Hugging Face.

Charformer Turkish base model (character-level T5)
Hugging Face · Feb 19, 2026

A Turkish encoder-decoder model that operates at the character level (Charformer-style), skipping tokenizers. Potentially useful for noisy text and spelling variations, but still marked as in-development.

Cirilla 0.3B 4E (tiny MoE Witcher lore model)
text-generation · Feb 19, 2026

A 229M-parameter sparse MoE model trained on Witcher lore and synthetic instruction data. Not compatible with vanilla Transformers; it runs via the author’s `cirilla` Python package.

DualTowerVLM bootstrap checkpoint (dual-tower VLM)
image-text-to-text · Feb 19, 2026

An early DualTowerVLM checkpoint: separate vision and text towers, fused later for multimodal generation. Interesting if you’re experimenting with VLM architecture and representation fusion.

Kimi K2.5 (GGUF imatrix quants)
text-generation · Feb 17, 2026

GGUF imatrix quant pack of moonshotai/Kimi-K2.5 for llama.cpp-style runners (LM Studio, KoboldCpp, etc.).

ERNIE-4.5-VL 28B-A3B Thinking
image-text-to-text · Feb 17, 2026

Apache-2.0 multimodal MoE model focused on visual reasoning, grounding, and tool use, with a Transformers quickstart.

MiniMax-M2.5 (EXL3 quant pack)
text-generation · Feb 17, 2026

EXL3 quant pack for MiniMax-M2.5 (2–8 bpw), built with ExLlamaV3 for fast GPU inference and easy quality/VRAM tradeoffs.
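The 2–8 bpw range maps directly to weight-memory footprints. A minimal sketch of that arithmetic (the 7B parameter count is a hypothetical example, and KV cache and activations are ignored):

```python
def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate weight memory in GiB for a model quantized to
    `bpw` bits per weight (weights only; no KV cache/activations)."""
    return n_params * bpw / 8 / 2**30

# A hypothetical 7B model across the 2-8 bpw range:
for bpw in (2, 4, 6, 8):
    print(f"{bpw} bpw -> {weight_gib(7e9, bpw):.1f} GiB")
```

Halving bpw halves the weight footprint, which is the whole quality/VRAM tradeoff in one line.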

SoulX-FlashHead 1.3B (real-time talking heads)
image-to-video · Feb 17, 2026

Talking-head generator aimed at real-time streaming; ships Lite/Pro checkpoints plus an open dataset (VividHead) and inference code.

Xoron-Dev MultiMoe (multimodal MoE)
any-to-any · Feb 17, 2026

Ambitious multimodal MoE claiming unified text/image/video/audio generation plus tool use; useful as a research reference even if early.

MiniCPM-o 4.5 (9B full-duplex omnimodal)
any-to-any · Feb 03, 2026

An end-to-end 9B vision+speech model that can see, listen, and speak in real time (full-duplex), with demos for continuous audio/video streaming and multiple deployment paths like vLLM, SGLang, Ollama, and llama.cpp.

Qwen3-Coder 30B-A3B Heretic (imatrix GGUF)
text-generation · Feb 03, 2026

An imatrix-weighted GGUF quant pack for `dpankros/Qwen3-Coder-30B-A3B-Instruct-Heretic`, with IQ and Q*K variants intended for llama.cpp-style runners and a downloadable `.imatrix.gguf` for producing your own quants.

OLMo-2 1B Distilled (reasoning traces)
text-generation · Feb 03, 2026

A 1B-parameter OLMo-2 model trained via on-policy distillation to emit `<think>...</think>` reasoning traces, using `allenai/Olmo-3-7B-Think` as a teacher for token-level supervision.

MewZoom 4X (super-resolution)
image-to-image · Feb 01, 2026

A lightweight 4× super-resolution model that upscales while deblurring/denoising, with a Python package (`ultrazoom`) and optional “control” variants to tune enhancement strength.

MiniMax-M2.1-REAP-30 (imatrix GGUF quants)
text-generation · Feb 01, 2026

A llama.cpp-friendly set of GGUF quants (plus an imatrix file) for `0xSero/MiniMax-M2.1-REAP-30`, with practical guidance on which IQ/Q4/Q6 quant types to try first.

Thai Wav2Vec2 with CommonVoice V8 (newmm tokenizer) + language model
automatic-speech-recognition · Feb 01, 2026

A Thai ASR wav2vec2-large-xlsr-53 fine-tune using Common Voice V8 (+ earlier Thai CV splits), with reported WER/CER and a language model that materially improves decoding quality.

ACE-V1.1 (MRI brain tumor detection)
object-detection · Jan 19, 2026

A YOLOv11 fine-tune for detecting brain tumors in MRI images, with author-reported mAP50 ~0.9 and a focus on reducing false positives; non-commercial CC-BY-NC-4.0 license.

Chroma1 Radiance (architecture release)
Hugging Face · Jan 19, 2026

A lightweight model card that mainly ships an architecture diagram for “Radiance,” and points to a growing ecosystem of adapters, finetunes, and quantizations built on top of the base checkpoint.

Wraith 8B (GGUF imatrix quants)
text-generation · Jan 19, 2026

Imatrix-weighted GGUF quant pack for `vanta-research/wraith-8b`, with many IQ/Q* variants for llama.cpp-style local inference across different memory/quality tradeoffs.

Asterisk (hybrid ASPP-attention SmolLM2)
text-generation · Jan 18, 2026

A small research LLM that mixes a graph-style ASPP operator with standard attention, aiming for better structured reasoning than plain SmolLM2-135M (Apache-2.0).

Havelock Orality Analyzer (oral vs literate classifier)
text-classification · Jan 18, 2026

A BERT-based classifier that scores text on an oral→literate spectrum (Walter Ong), plus span-level marker classifiers for category/subtype labeling (MIT).

QuickMT fr→en (fast CTranslate2 translation)
translation · Jan 18, 2026

A 200M-parameter French→English MT model exported to CTranslate2 for speed, with reported FLORES devtest metrics and a simple Python API via `quickmt`.

Wan video (GGUF pack)
text-to-video · Jan 17, 2026

A grab-and-go Hugging Face repo that aggregates GGUF quants and related assets for Wan 2.1/2.2 video generation (T2V/I2V), plus a big index of practical ComfyUI / diffusion tutorials from the SECourses channel.

LightingRemap Alpha (Qwen Image Edit LoRA)
text-to-image · Jan 17, 2026

An alpha LoRA for Qwen Image Edit that relights an image using simple colored blocks as “virtual lights,” then removes the blocks in the final output (handy for quick cinematic lighting experiments).

LeRobot xvla-base (robot policy)
robotics · Jan 17, 2026

A baseline robotics policy repo for Hugging Face LeRobot using the `xvla` policy type, with copy-pastable commands to train on your dataset and run evaluation episodes (Apache-2.0).

GigaCheck-Detector-Multi
token-classification · Jan 16, 2026

A multilingual, span-level AI text detector from the LLMTrace project, designed to localize which parts of a document look AI-written (not just classify the whole thing).
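Span-level detection means turning per-token labels into contiguous flagged regions rather than one document-level verdict. An illustrative sketch of that post-processing step (the "AI"/"H" label names are an assumption, not GigaCheck's actual scheme):

```python
def labels_to_spans(labels: list[str]) -> list[tuple[int, int]]:
    """Merge per-token labels into contiguous (start, end) token
    spans flagged as AI-written. Illustrative only; the real
    detector's label scheme may differ."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "AI" and start is None:
            start = i                      # span opens
        elif lab != "AI" and start is not None:
            spans.append((start, i))       # span closes
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans

print(labels_to_spans(["H", "AI", "AI", "H", "AI"]))  # -> [(1, 3), (4, 5)]
```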

SmolVLA Piper
robotics · Jan 16, 2026

A LeRobot-trained robotics policy built on SmolVLA, intended for a Piper arm setup and packaged as a ready-to-run vision-language-action checkpoint.

Tiny Audio
automatic-speech-recognition · Jan 16, 2026

A lightweight “large audio model” aimed at practical ASR and audio captioning, with a dual-headed architecture (audio encoder + text decoder) and an MIT license.

Paged Attention kernels (vLLM + mistral.rs)
Hugging Face · Jan 15, 2026

A small but useful kernel drop: paged-attention implementations pulled from vLLM and mistral.rs, handy if you’re building or benchmarking your own GPU inference runtime.
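The core idea behind these kernels is storing the KV cache in fixed-size blocks and indirecting through a per-sequence block table. A toy sketch of that lookup (block size and table values are made up for illustration):

```python
# The KV cache lives in fixed-size physical blocks; a per-sequence
# block table maps logical token positions to physical blocks, so
# blocks need not be contiguous in GPU memory.
BLOCK_SIZE = 16

def lookup(block_table: list[int], pos: int) -> tuple[int, int]:
    """Map a logical token position to (physical_block, offset)."""
    return block_table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

# A sequence whose 3 logical blocks live at scattered physical slots:
table = [7, 2, 9]
print(lookup(table, 0))   # -> (7, 0)
print(lookup(table, 17))  # -> (2, 1)
```

The real kernels do this indexing inside the attention computation itself, but the memory model is the same.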

LongCat Flash Thinking 2601
text-generation · Jan 15, 2026

An MIT-licensed 560B MoE “thinking” model geared for agentic tool use, with a chat template that supports interleaved reasoning and function calling while keeping token budgets under control.

Ministral 3 14B Instruct 2512
text-generation · Jan 15, 2026

Mistral’s edge-leaning 14B FP8 instruct model adds vision, function calling, and a 256k context window—strong chat/tool use that can fit in ~24GB VRAM (less with quantization).

Kanana-2 30B-A3B Instruct
text-generation · Jan 14, 2026

Kakao’s Kanana-2 30B-A3B instruct model targets agentic use cases with stronger tool calling and reasoning, using an MoE design (30B total / ~3B active) and a native 32k context window.

LTX-2 Model Card
image-to-video · Jan 14, 2026

LTX-2 is an open-weights diffusion (DiT) model for generating video with synchronized audio, with multiple checkpoints (full, fp8/fp4, distilled) plus spatial/temporal upscalers and ComfyUI/Diffusers integrations.

RSI-AI V1.0 (GGUF imatrix quants)
text-generation · Jan 14, 2026

This is a set of imatrix-weighted GGUF quantizations for `EpistemeAI/RSI-AI-V1.0`, intended for llama.cpp-style local inference with size/speed tradeoffs across many quant variants.

LTXV2 for ComfyUI (video VAE fix)
text-to-video · Jan 13, 2026

ComfyUI-ready LTXV2 video model files plus a corrected video VAE that should improve detail vs earlier extracted checkpoints.

Qwen3-Next 80B-A3B Thinking (AWQ 4-bit)
text-generation · Jan 13, 2026

AWQ 4-bit quant of Qwen3-Next-80B-A3B-Thinking (80B total / ~3B active) cutting weight memory to ~46 GB for long-context reasoning.

extBanglaT5 (BanglaT5 fine-tune)
translation · Jan 12, 2026

A lightweight BanglaT5 fine-tune for Bengali (bn) ↔ English (en) translation-style text2text tasks, published under Apache-2.0.

Qwen3-Next 80B-A3B Thinking (GGUF)
text-generation · Jan 12, 2026

GGUF quant pack for Qwen3-Next-80B-A3B-Thinking: a high-sparsity MoE reasoning model (80B total / ~3B active) with 262k native context.

Complexity Deep 150M
text-generation · Jan 11, 2026

An early 150M experimental LM exploring deterministic token-routed MoE and a robotics-inspired control layer; interesting ideas, but the authors say it’s not coherent yet.

Big GPT OSS i1 GGUF
text-generation · Jan 11, 2026

Imatrix-weighted GGUF quant set of `suayptalha/big-gpt-oss` for llama.cpp-style runtimes, including an imatrix file for making your own quants and recommended Q4_K_M/Q6_K options.

StanislavKo28 music moods classification
audio-classification · Jan 11, 2026

An Apache-2.0 wav2vec2-based classifier that predicts 14 music moods from 30-second clips, with a public Kaggle dataset + notebooks and reported eval metrics.

Sopro TTS
text-to-speech · Jan 10, 2026

A 169M-parameter English TTS model with streaming and zero-shot voice cloning; runs on CPU, but long generations can hallucinate.

ALIA-40B Instruct
text-generation · Jan 09, 2026

A 40B multilingual assistant (35 European languages) tuned for very long context (~160k) and released under Apache-2.0, with training scripts available.

Reasoner Llama 3.1 70B V2 (imatrix GGUF)
text-generation · Jan 09, 2026

Imatrix-weighted GGUF quants of a Reasoner-tuned Llama 3.1 70B checkpoint, giving llama.cpp users a menu of sizes from IQ1 to Q6.

VAETKI-VL 7B-A1B (GGUF)
image-text-to-text · Jan 09, 2026

A 7.6B MoE vision-language model (1.2B active) converted to GGUF for llama.cpp, with separate text weights + `mmproj` for Korean/English multimodal prompts.

VieNeu-TTS 0.3B
text-to-speech · Jan 08, 2026

A fast Vietnamese text-to-speech model (0.3B) built for offline/on-device synthesis and instant voice cloning, with GGUF Q4/Q8 variants for CPU/mobile.

LFM2.5-1.2B-Instruct
text-generation · Jan 07, 2026

A 1.2B-parameter on-device chat model tuned for fast local inference, long-context extraction/RAG, and agent-style workflows; not intended for heavy coding or deep-knowledge tasks.

Gemma 3 4B T1-it (GGUF collection)
text-generation · Jan 07, 2026

Ready-to-run GGUF quantizations of the Taiwan-focused `twinkle-ai/gemma-3-4B-T1-it` so you can choose a speed/quality tradeoff and run Gemma 3 locally with `llama.cpp`.

Whisper-medium fine-tuned for Teochew ASR
automatic-speech-recognition · Jan 07, 2026

A `whisper-medium` fine-tune for Teochew (潮州话) ASR using a custom orthography to reduce ambiguity across dialectal variants, trained on the open `teochew_wild` dataset.

VAETKI 112B-A10B (MoE)
text-generation · Jan 06, 2026

A 112B-parameter (10B active) MoE LLM from NC-AI’s 13-org consortium, with 32k context, Korean/English/Chinese/Japanese support, and an MIT license.

Prosty Język
text-generation · Jan 06, 2026

A Polish “plain language” assistant built on Bielik-4.5B, packaged as a local/offline llamafile app for rewriting bureaucratic text into simpler Polish (Apache-2.0).

Qwen3 0.6B GGUF (high-fidelity quants)
text-generation · Jan 06, 2026

A GGUF release of Qwen3-0.6B with multiple high-fidelity quants (including Q3_HIFI) for llama.cpp/LM Studio—useful when you need an ultra-small, offline model on limited hardware.

GLM-4.7 GGUF (imatrix)
text-generation · Jan 05, 2026

Imatrix (importance-matrix) GGUF quants of ZhipuAI’s GLM-4.7, packaged for llama.cpp-style local inference across a wide range of sizes.

DistilBART news summarizer
summarization · Jan 05, 2026

A distilled BART (CNN/DM-style) model tuned for abstractive news summarization—small enough for quick local runs, with a straightforward Transformers usage snippet.

T5 small spoken-typo corrector
text-generation · Jan 05, 2026

A fine-tuned T5-small for fixing missing spaces and common typos in short, conversational English—useful for chat logs, AAC phrases, and spoken-text cleanup.

GPT-OSS 20B BalitaNLP CPT
text-generation · Jan 02, 2026

A continuous-pretraining (CPT) run that adapts GPT-OSS 20B toward Filipino/Tagalog using the BalitaNLP news dataset; meant as a language-adaptation checkpoint, not an instruction-tuned assistant.

VulnLLM-R-7B (MLX 6-bit)
text-generation · Jan 02, 2026

A 6-bit MLX conversion of UCSB-SURFI's VulnLLM-R-7B (Qwen2.5-based), tuned for vulnerability detection and code analysis on Apple Silicon.

Nerdsking Python Coder 3B-i
text-generation · Jan 02, 2026

A 3B Python-focused code model with a reported 88.41 HumanEval pass@1 (bf16, zero-shot), positioned as a developer-oriented / partially uncensored assistant.

Charformer Turkish v0.3
text-generation · Dec 29, 2025

A character-level decoder-only Transformer for Turkish Wikipedia with a tiny 105-char vocab and 2K context, useful for experimenting with char-level generation.
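A character-level model's "tokenizer" is just a char→id table over a small fixed alphabet. A sketch of how a ~105-symbol vocabulary like this one might be built (the exact character set below is an assumption, not the model's actual alphabet):

```python
# Build a tiny char-level vocabulary; the symbol set here is a
# hypothetical stand-in for the model's real ~105-char alphabet.
chars = sorted(set("abcçdefgğhıijklmnoöprsştuüvyz ABC0123456789.,!?"))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

def encode(text: str) -> list[int]:
    return [stoi[c] for c in text if c in stoi]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

print(decode(encode("merhaba")))  # -> merhaba
```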

Calibri Flux
text-to-image · Dec 29, 2025

A FLUX-compatible Diffusers checkpoint (“Calibri Flux”) with a minimal `DiffusionPipeline` example for fast 1024px generations.

Fin.AI (from-scratch GPT-2-style, 30M)
text-generation · Dec 29, 2025

An experimental 30M-parameter GPT-2-style model trained from scratch with frequent automated updates (CPU-only), plus public training logs.

AiARTiST ACG Playground SDXL V6 (Hyper)
text-to-image · Dec 28, 2025

SDXL checkpoint tuned for bright, detailed ACG/anime-style images, with an included Hyper-SDXL LoRA for fast 8-10 step generation.

Gemma 3 12B IT Heretic X GGUF (imatrix)
text-generation · Dec 28, 2025

Imatrix-weighted GGUF quants of LastRef's Gemma 3 12B instruction model (Heretic X variant) for llama.cpp-style local inference, with a wide range of IQ/Q2-Q6 sizes.

GLM-4.6V GGUF (imatrix)
image-text-to-text · Dec 28, 2025

Imatrix-weighted GGUF quants of zai-org's GLM-4.6V vision-language model for llama.cpp-compatible runtimes, including an imatrix file and multiple IQ/Q quant sizes.

GLM-4.6 GGUF (Unsloth Dynamic 2.0)
text-generation · Dec 27, 2025

Quantized GGUF builds of Zhipu’s GLM-4.6 for llama.cpp-style local inference (200K context).

Arunav Flux (FLUX.1-dev LoRA)
text-to-image · Dec 26, 2025

A FLUX.1-dev LoRA with a simple trigger word, usable via diffusers or ComfyUI for style/subject steering.

Llama 3 Meerkat 70B imatrix GGUF
text-generation · Dec 26, 2025

Imatrix-weighted GGUF quants of llama-3-meerkat-70B, aimed at higher quality local inference in llama.cpp.

Norwegian NER (nb-bert-base fine-tuned)
token-classification · Dec 26, 2025

A Norwegian (nb/nn) NER model fine-tuned from NbAiLab/nb-bert-base with ~0.93 F1 across PER/ORG/LOC/MISC.

Fun-ASR
Hugging Face · Dec 23, 2025

An 800M-parameter ASR model from Tongyi Lab focused on low-latency transcription, with strong Chinese dialect coverage and a sibling checkpoint for 31-language recognition.

Meet MiniMax-M2
text-generation · Dec 23, 2025

A large MoE text-generation model from MiniMax (230B total / 10B active) positioned for coding and agent workflows, released under a modified MIT license.
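The 230B-total / 10B-active split is why sparse MoE is attractive at inference time. A back-of-envelope sketch, assuming the common rule of thumb that per-token decode compute scales as roughly 2 × active parameters:

```python
# Rough MoE inference arithmetic: per-token compute tracks the
# *active* parameter count, while storage tracks the *total* count.
total, active = 230e9, 10e9
flops_per_token = 2 * active            # ~2N rule of thumb
compute_ratio = active / total          # vs a dense 230B model

print(f"per-token decode FLOPs ~ {flops_per_token:.1e}")
print(f"compute vs dense-230B: {compute_ratio:.1%}")
```

In other words, roughly the per-token compute of a 10B dense model, at the memory cost of storing 230B weights.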

Qwen Image Layered (ComfyUI weights)
text-to-image · Dec 22, 2025

A ComfyUI-friendly packaging of the Qwen Image Layered diffusion weights (bf16 + fp8mixed) plus a matching VAE, ready to drop into a node-based image pipeline.

Tiny Audio MoE (shared projector)
automatic-speech-recognition · Dec 22, 2025

An ASR stack that pairs Whisper Large v3 Turbo’s encoder with SmolLM3-3B, swapping the usual adapter for a shared MoE projector (4 experts, 2 active per token), and ships custom `transformers` code.

Luna (imatrix GGUF quants)
text-generation · Dec 22, 2025

A big menu of GGUF quant files for beyoru/Luna, including imatrix-weighted IQ quants. Useful if you run llama.cpp / Ollama-style local inference and want to trade quality for VRAM.

llasa phoneme finetune v3 (Llama CausalLM)
text-generation · Dec 21, 2025

A recently updated Llama-style CausalLM (16 layers, 128k context, very large vocab). The card is still boilerplate, so treat it as experimental and verify license/training data before building on it.

DidulaThavishaPro/ppo-gtc-trading-agent
Hugging Face · Dec 21, 2025

A PyTorch checkpoint for a sequence-based trading agent (conv + Transformer encoder, 26 input features, 3 actions). There’s no model card yet, so you’ll need your own loading code and serious backtesting before using it.

Tiny Audio
automatic-speech-recognition · Dec 21, 2025

A cheap-to-train ASR model: frozen Whisper encoder + small trained projector + frozen SmolLM3-3B decoder. Trained in ~24h on one A40 (~$12) and reports ~12% WER on LoquaciousSet.
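The ~12% WER figure is the standard word error rate: word-level edit distance divided by reference length. A minimal reference implementation of the metric:

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance over the
    reference word count (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))       # row for 0 reference words
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            sub = prev[j - 1] + (r[i - 1] != h[j - 1])
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, sub)
        prev = cur
    return prev[-1] / len(r)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion
```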

Qwen-Edit 2509 ComfyUI workflow
image-to-image · Dec 20, 2025

A ComfyUI workflow pack for `Qwen/Qwen-Image-Edit-2509` that demos controllable edits (line/depth, pose, masks, outpainting, try-on) and lists the required custom nodes + a companion Lightning LoRA.

Huihui Qwen3-Next 80B-A3B (abliterated) GGUF
text-generation · Dec 20, 2025

A static GGUF quant pack of `huihui-ai/Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated`, curated by `mradermacher`. Useful if you want ready-made Q4/Q2 artifacts for llama.cpp-style local inference.

BS-RoFormer HyperACE (music source separation)
Hugging Face · Dec 20, 2025

A BS-RoFormer checkpoint for music source separation (vocals vs instrumental) with MVSEP-reported metrics and “v2” weights for both stems. Best suited for people already running a BS-RoFormer pipeline.

DeepBeepMeep/Wan2.1
text-to-video · Dec 19, 2025

A curated bundle of Wan 2.1 text-to-video checkpoints packaged for Wan2GP, aiming to make open-source video generation usable on lower-VRAM (even older) GPUs via a simple web UI.

Magic-Wan-Image V2.0
text-to-image · Dec 19, 2025

A GGUF-quantized packaging of Magic-Wan-Image V2.0 for text-to-image, aimed at easier/lighter local inference. Useful if you specifically want GGUF artifacts instead of a standard Diffusers checkpoint.

MiraTTS (fast 48kHz text-to-speech)
text-to-speech · Dec 19, 2025

A text-to-speech model targeting clear 48kHz audio with very high throughput (claims 100x realtime via lmdeploy + batching) while fitting in ~6GB VRAM. Worth a look if you need low-latency TTS for apps.

Sherpa-ONNX prebuilt libs
Hugging Face · Dec 18, 2025

Prebuilt sherpa-onnx native bundles (incl. Android + ONNX Runtime variants) so you can ship offline speech features without building native deps from scratch.

RWKV mobile model pack (WebRWKV + GGUF)
text-generation · Dec 18, 2025

A grab-bag of RWKV weights packaged for mobile/web runtimes (WebRWKV `.st`/`.prefab`) plus GGUF quantizations for llama.cpp-style runners.

HY-WorldPlay (interactive world model)
text-to-3d · Dec 18, 2025

Tencent’s HY-World 1.5 “WorldPlay” aims to stream an interactive world model with real-time latency, long-horizon geometric consistency, and explicit keyboard/mouse action control.

AWS Neuron optimum model cache
Hugging Face · Dec 17, 2025

Cache of precompiled AWS Neuron artifacts so popular Hub models deploy much faster on Inferentia/Trainium (via `optimum-neuron` / NeuronX TGI).

News summarizer (DistilBART)
summarization · Dec 17, 2025

AGPL-3.0 English abstractive summarizer based on DistilBART (CNN/DM), positioned for fast, lightweight news/article summarization.

T5-XXL text encoder (GGUF)
feature-extraction · Dec 17, 2025

A GGUF-packaged T5-XXL text encoder (from `google/t5-v1_1-xxl`) intended for local pipelines that load encoders from `./models/text_encoders` (Apache-2.0).

Aitrepreneur/FLX
Hugging Face · Dec 16, 2025

A grab-bag Hugging Face repo that looks like a practical ComfyUI-friendly bundle: LoRAs, upscalers, and assorted model/tool artifacts you can mix into image/video workflows.

aman-singh154/hindi-english-final-model
Hugging Face · Dec 16, 2025

A likely Hindi↔English translation checkpoint (Marian-style) uploaded as training checkpoints; useful if you’re experimenting with custom MT or fine-tuning rather than looking for a polished packaged release.

argildotai/syncforge
Hugging Face · Dec 16, 2025

An experimental-looking repo that ships multiple epoch artifacts (GGUF weights + a finetuned wav2vec checkpoint), suggesting a “sync” pipeline that mixes audio features with a GGUF model.

LongCat-Image
text-to-image · Dec 16, 2025

A 6B bilingual (Chinese/English) text-to-image model focused on legible text rendering, photorealism, and efficient deployment.

Riko-1.1B (GGUF)
text-generation · Dec 16, 2025

A compact 1.1B-parameter LLM released as GGUF for llama.cpp-style local inference (CC-BY-NC-2.0).

T3-Video (native 4K text-to-video)
text-to-video · Dec 16, 2025

Text-to-video weights + inference code targeting native 4K generation, reporting a 10× speedup over naive 4K video generation (Apache-2.0 + Wan license).