A Turkish encoder-decoder model that operates at the character level (Charformer-style), skipping tokenizers. Potentially useful for noisy text and spelling variations, but still marked as in-development.
A 229M-parameter sparse MoE model trained on Witcher lore and synthetic instruction data. Not compatible with vanilla Transformers; it runs via the author’s `cirilla` Python package.
An early DualTowerVLM checkpoint: separate vision and text towers, fused later for multimodal generation. Interesting if you’re experimenting with VLM architecture and representation fusion.
GGUF imatrix quant pack of moonshotai/Kimi-K2.5 for llama.cpp-style runners (LM Studio, KoboldCpp, etc.).
Apache-2.0 multimodal MoE model focused on visual reasoning, grounding, and tool use, with a Transformers quickstart.
EXL3 quant pack for MiniMax-M2.5 (2–8 bpw), built with ExLlamaV3 for fast GPU inference and easy quality/VRAM tradeoffs.
Talking-head generator aimed at real-time streaming; ships Lite/Pro checkpoints plus an open dataset (VividHead) and inference code.
Ambitious multimodal MoE claiming unified text/image/video/audio generation plus tool use; useful as a research reference even if early.
An end-to-end 9B vision+speech model that can see, listen, and speak in real time (full-duplex), with demos for continuous audio/video streaming and multiple deployment paths like vLLM, SGLang, Ollama, and llama.cpp.
An imatrix-weighted GGUF quant pack for `dpankros/Qwen3-Coder-30B-A3B-Instruct-Heretic`, with IQ and Q*_K variants intended for llama.cpp-style runners and a downloadable `.imatrix.gguf` for producing your own quants.
A 1B-parameter OLMo-2 model trained via on-policy distillation to emit `<think>...</think>` reasoning traces, using `allenai/Olmo-3-7B-Think` as a teacher for token-level supervision.
A lightweight 4× super-resolution model that upscales while deblurring/denoising, with a Python package (`ultrazoom`) and optional “control” variants to tune enhancement strength.
A llama.cpp-friendly set of GGUF quants (plus an imatrix file) for `0xSero/MiniMax-M2.1-REAP-30`, with practical guidance on which IQ/Q4/Q6 quant types to try first.
A Thai ASR wav2vec2-large-xlsr-53 fine-tune using Common Voice V8 (+ earlier Thai CV splits), with reported WER/CER and a language model that materially improves decoding quality.
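Reported WER/CER numbers like these reduce to a Levenshtein edit distance over words (WER) or characters (CER) divided by reference length. A minimal stdlib sketch of that computation (illustrative only, not the card's own evaluation script):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-row DP)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (or match)
            prev, d[j] = d[j], cur
    return d[-1]

def wer(ref, hyp):
    """Word error rate: word-level edits / reference word count."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: char-level edits / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```

Real evaluations usually normalize text first (casing, punctuation, digit spelling), which can shift the numbers noticeably; the card's language-model rescoring improves the hypothesis before this distance is ever taken.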
A YOLOv11 fine-tune for detecting brain tumors in MRI images, with author-reported mAP50 ~0.9 and a focus on reducing false positives; non-commercial CC-BY-NC-4.0 license.
A lightweight model card that mainly ships an architecture diagram for “Radiance,” plus a growing ecosystem of adapters, finetunes, and quantizations built on top of the base checkpoint.
Imatrix-weighted GGUF quant pack for `vanta-research/wraith-8b`, with many IQ/Q* variants for llama.cpp-style local inference across different memory/quality tradeoffs.
A small research LLM that mixes a graph-style ASPP operator with standard attention, aiming for better structured reasoning than plain SmolLM2-135M (Apache-2.0).
A BERT-based classifier that scores text on an oral→literate spectrum (Walter Ong), plus span-level marker classifiers for category/subtype labeling (MIT).
A 200M-parameter French→English MT model exported to CTranslate2 for speed, with reported FLORES devtest metrics and a simple Python API via `quickmt`.
A grab-and-go Hugging Face repo that aggregates GGUF quants and related assets for Wan 2.1/2.2 video generation (T2V/I2V), plus a big index of practical ComfyUI / diffusion tutorials from the SECourses channel.
An alpha LoRA for Qwen Image Edit that relights an image using simple colored blocks as “virtual lights,” then removes the blocks in the final output (handy for quick cinematic lighting experiments).
A baseline robotics policy repo for Hugging Face LeRobot using the `xvla` policy type, with copy-pastable commands to train on your dataset and run evaluation episodes (Apache-2.0).
A multilingual, span-level AI text detector from the LLMTrace project, designed to localize which parts of a document look AI-written (not just classify the whole thing).
A LeRobot-trained robotics policy built on SmolVLA, intended for a Piper arm setup and packaged as a ready-to-run vision-language-action checkpoint.
A lightweight “large audio model” aimed at practical ASR and audio captioning, with a dual-headed architecture (audio encoder + text decoder) and an MIT license.
A small but useful kernel drop: paged-attention implementations pulled from vLLM and mistral.rs, handy if you’re building or benchmarking your own GPU inference runtime.
An MIT-licensed 560B MoE “thinking” model geared for agentic tool use, with a chat template that supports interleaved reasoning and function calling while keeping token budgets under control.
Mistral’s edge-leaning 14B FP8 instruct model adds vision, function calling, and a 256k context window—strong chat/tool use that can fit in ~24GB VRAM (less with quantization).
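The ~24GB figure covers the FP8 weights; filling a long context adds KV cache on top, and a quick back-of-envelope shows why that matters. The config below (40 layers, 8 GQA KV heads, head_dim 128, fp8 cache) is hypothetical for illustration, not Mistral's published architecture:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem):
    # One K entry and one V entry per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

# Hypothetical GQA config with an fp8 (1-byte) cache at full 256k context:
cache = kv_cache_bytes(40, 8, 128, 256 * 1024, 1)  # ~21.5 GB
```

Under these assumptions a fully used 256k window roughly doubles the memory footprint, which is why long-context sessions on a 24GB card typically need weight quantization, a quantized KV cache, or a shorter effective window.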
Kakao’s Kanana-2 30B-A3B instruct model targets agentic use cases with stronger tool calling and reasoning, using an MoE design (30B total / ~3B active) and a native 32k context window.
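The "30B total / ~3B active" split is what makes MoE decoding cheap: by the standard rule of thumb, a forward pass costs about 2 FLOPs per *active* parameter (one multiply + one add per weight), ignoring attention-over-context and router overhead. A sketch of that estimate:

```python
def flops_per_token(n_active_params):
    # Rule-of-thumb forward cost: ~2 FLOPs per active parameter.
    return 2 * n_active_params

total, active = 30e9, 3e9
moe_cost = flops_per_token(active)    # 6e9 FLOPs/token
dense_cost = flops_per_token(total)   # 60e9 FLOPs/token if all experts fired
speedup = dense_cost / moe_cost       # ~10x fewer FLOPs per decoded token
```

Memory is the flip side: all 30B parameters must still be resident, so the model loads like a 30B but decodes closer to a 3B.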
LTX-2 is an open-weights diffusion (DiT) model for generating video with synchronized audio, with multiple checkpoints (full, fp8/fp4, distilled) plus spatial/temporal upscalers and ComfyUI/Diffusers integrations.
This is a set of imatrix-weighted GGUF quantizations for `EpistemeAI/RSI-AI-V1.0`, intended for llama.cpp-style local inference with size/speed tradeoffs across many quant variants.
ComfyUI-ready LTXV2 video model files plus a corrected video VAE that should improve detail vs earlier extracted checkpoints.
AWQ 4-bit quant of Qwen3-Next-80B-A3B-Thinking (80B total / ~3B active) cutting weight memory to ~46 GB for long-context reasoning.
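The ~46 GB figure is easy to sanity-check with bits-per-weight arithmetic. Group quantization stores a little metadata per group of weights (the group size and metadata layout below are typical assumptions, e.g. an fp16 scale + zero-point per 128 weights, not the exact checkpoint format):

```python
def quantized_weight_bytes(n_params, weight_bits, group_size=128, meta_bits=32):
    # Effective bits-per-weight = payload bits + amortized group metadata.
    bpw = weight_bits + meta_bits / group_size
    return n_params * bpw / 8

# MoE sparsity doesn't help memory: all 80B weights must be resident.
est = quantized_weight_bytes(80e9, 4)  # 42.5e9 bytes, i.e. ~42.5 GB
```

Real AWQ checkpoints keep some tensors (embeddings, norms, sometimes the router) in higher precision, which pushes the total from this ~42.5 GB estimate toward the reported ~46 GB.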
A lightweight BanglaT5 fine-tune for Bengali (bn) ↔ English (en) translation-style text2text tasks, published under Apache-2.0.
GGUF quant pack for Qwen3-Next-80B-A3B-Thinking: a high-sparsity MoE reasoning model (80B total / ~3B active) with 262k native context.
An early 150M experimental LM exploring deterministic token-routed MoE and a robotics-inspired control layer; interesting ideas, but the authors say it’s not coherent yet.
Imatrix-weighted GGUF quant set of `suayptalha/big-gpt-oss` for llama.cpp-style runtimes, including an imatrix file for making your own quants and recommended Q4_K_M/Q6_K options.
An Apache-2.0 wav2vec2-based classifier that predicts 14 music moods from 30-second clips, with a public Kaggle dataset + notebooks and reported eval metrics.
169M English TTS with streaming + zero-shot voice cloning; runs on CPU, but long generations can hallucinate.
A 40B multilingual assistant (35 European languages) tuned for very long context (~160k) and released under Apache-2.0, with training scripts available.
Imatrix-weighted GGUF quants of a Reasoner-tuned Llama 3.1 70B checkpoint, giving llama.cpp users a menu of sizes from IQ1 to Q6.
A 7.6B MoE vision-language model (1.2B active) converted to GGUF for llama.cpp, with separate text weights + `mmproj` for Korean/English multimodal prompts.
A fast Vietnamese text-to-speech model (0.3B) built for offline/on-device synthesis and instant voice cloning, with GGUF Q4/Q8 variants for CPU/mobile.
A 1.2B-parameter on-device chat model tuned for fast local inference, long-context extraction/RAG, and agent-style workflows; not meant for heavy coding or deep-knowledge tasks.
Ready-to-run GGUF quantizations of the Taiwan-focused `twinkle-ai/gemma-3-4B-T1-it` so you can choose a speed/quality tradeoff and run Gemma 3 locally with `llama.cpp`.
A `whisper-medium` fine-tune for Teochew (潮州话) ASR using a custom orthography to reduce ambiguity across dialectal variants, trained on the open `teochew_wild` dataset.
A 112B-parameter (10B active) MoE LLM from NC-AI’s 13-org consortium, with 32k context, Korean/English/Chinese/Japanese support, and an MIT license.
A Polish “plain language” assistant built on Bielik-4.5B, packaged as a local/offline llamafile app for rewriting bureaucratic text into simpler Polish (Apache-2.0).
A GGUF release of Qwen3-0.6B with multiple high-fidelity quants (including Q3_HIFI) for llama.cpp/LM Studio—useful when you need an ultra-small, offline model on limited hardware.
Imatrix (importance-matrix) GGUF quants of ZhipuAI’s GLM-4.7, packaged for llama.cpp-style local inference across a wide range of sizes.
A distilled BART (CNN/DM-style) model tuned for abstractive news summarization—small enough for quick local runs, with a straightforward Transformers usage snippet.
A fine-tuned T5-small for fixing missing spaces and common typos in short, conversational English—useful for chat logs, AAC phrases, and spoken-text cleanup.
A continuous-pretraining (CPT) run that adapts GPT-OSS 20B toward Filipino/Tagalog using the BalitaNLP news dataset; meant as a language-adaptation checkpoint, not an instruction-tuned assistant.
A 6-bit MLX conversion of UCSB-SURFI's VulnLLM-R-7B (Qwen2.5-based), tuned for vulnerability detection and code analysis on Apple Silicon.
A 3B Python-focused code model with a reported 88.41 HumanEval pass@1 (bf16, zero-shot), positioned as a developer-oriented / partially uncensored assistant.
A character-level decoder-only Transformer for Turkish Wikipedia with a tiny 105-char vocab and 2K context, useful for experimenting with char-level generation.
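The appeal of the char-level setup is that the whole tokenizer fits in a few lines: the vocabulary is just the set of characters seen in the corpus. An illustrative sketch (the model's actual 105-char vocab ships with its checkpoint; this builds a toy one from scratch):

```python
class CharTokenizer:
    """Toy character-level tokenizer: vocab = distinct chars in the corpus."""

    def __init__(self, corpus):
        self.chars = sorted(set(corpus))                  # the entire vocab
        self.stoi = {c: i for i, c in enumerate(self.chars)}
        self.itos = {i: c for i, c in enumerate(self.chars)}

    def encode(self, text):
        return [self.stoi[c] for c in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("merhaba dünya")  # Turkish "hello world"
ids = tok.encode("dünya")
```

With ~105 symbols the embedding and output layers are tiny, but sequences are several times longer than with subword tokenization, which is why the 2K context buys less text than it would for a BPE model.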
A FLUX-compatible Diffusers checkpoint (“Calibri Flux”) with a minimal `DiffusionPipeline` example for fast 1024px generations.
An experimental 30M-parameter GPT-2-style model trained from scratch with frequent automated updates (CPU-only), plus public training logs.
SDXL checkpoint tuned for bright, detailed ACG/anime-style images, with an included Hyper-SDXL LoRA for fast 8-10 step generation.
Imatrix-weighted GGUF quants of LastRef's Gemma 3 12B instruction model (Heretic X variant) for llama.cpp-style local inference, with a wide range of IQ/Q2-Q6 sizes.
Imatrix-weighted GGUF quants of zai-org's GLM-4.6V vision-language model for llama.cpp-compatible runtimes, including an imatrix file and multiple IQ/Q quant sizes.
Quantized GGUF builds of Zhipu’s GLM-4.6 for llama.cpp-style local inference (200K context).
A FLUX.1-dev LoRA with a simple trigger word, usable via diffusers or ComfyUI for style/subject steering.
Imatrix-weighted GGUF quants of llama-3-meerkat-70B, aimed at higher quality local inference in llama.cpp.
A Norwegian (nb/nn) NER model fine-tuned from NbAiLab/nb-bert-base with ~0.93 F1 across PER/ORG/LOC/MISC.
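Entity-level F1 scores like the ~0.93 here usually count a prediction as correct only on an exact span *and* label match. A minimal sketch of that metric (illustrative; evaluation suites like seqeval add BIO decoding on top):

```python
def span_f1(gold, pred):
    """Entity-level P/R/F1; each entity is a (start, end, label) tuple."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # exact span+label matches
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Note how strict this is: a correct span with the wrong label (say ORG vs LOC) costs both a false positive and a false negative.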
An 800M-parameter ASR model from Tongyi Lab focused on low-latency transcription, with strong Chinese dialect coverage and a sibling checkpoint for 31-language recognition.
A large MoE text-generation model from MiniMax (230B total / 10B active) positioned for coding and agent workflows, released under a modified MIT license.
A ComfyUI-friendly packaging of the Qwen Image Layered diffusion weights (bf16 + fp8mixed) plus a matching VAE, ready to drop into a node-based image pipeline.
An ASR stack that pairs Whisper Large v3 Turbo’s encoder with SmolLM3-3B — but swaps the adapter for a shared MoE projector (4 experts, 2 active per token) and ships custom `transformers` code.
A big menu of GGUF quant files for beyoru/Luna, including imatrix-weighted IQ quants. Useful if you run llama.cpp / Ollama-style local inference and want to trade quality for VRAM.
A recently updated Llama-style CausalLM (16 layers, 128k context, very large vocab). The card is still boilerplate, so treat it as experimental and verify license/training data before building on it.
A PyTorch checkpoint for a sequence-based trading agent (conv + Transformer encoder, 26 input features, 3 actions). There’s no model card yet, so you’ll need your own loading code and serious backtesting before using it.
A cheap-to-train ASR model: frozen Whisper encoder + small trained projector + frozen SmolLM3-3B decoder. Trained in ~24h on one A40 (~$12) and reports ~12% WER on LoquaciousSet.
A ComfyUI workflow pack for `Qwen/Qwen-Image-Edit-2509` that demos controllable edits (line/depth, pose, masks, outpainting, try-on) and lists the required custom nodes + a companion Lightning LoRA.
A static GGUF quant pack of `huihui-ai/Huihui-Qwen3-Next-80B-A3B-Instruct-abliterated`, curated by `mradermacher`. Useful if you want ready-made Q4/Q2 artifacts for llama.cpp-style local inference.
A BS-RoFormer checkpoint for music source separation (vocals vs instrumental) with MVSEP-reported metrics and “v2” weights for both stems. Best suited for people already running a BS-RoFormer pipeline.
A curated bundle of Wan 2.1 text-to-video checkpoints packaged for Wan2GP, aiming to make open-source video generation usable on lower-VRAM (even older) GPUs via a simple web UI.
A GGUF-quantized packaging of Magic-Wan-Image V2.0 for text-to-image, aimed at easier/lighter local inference. Useful if you specifically want GGUF artifacts instead of a standard Diffusers checkpoint.
A text-to-speech model targeting clear 48kHz audio with very high throughput (claims 100x realtime via lmdeploy + batching) while fitting in ~6GB VRAM. Worth a look if you need low-latency TTS for apps.
Prebuilt sherpa-onnx native bundles (incl. Android + ONNX Runtime variants) so you can ship offline speech features without building native deps from scratch.
A grab-bag of RWKV weights packaged for mobile/web runtimes (WebRWKV `.st`/`.prefab`) plus GGUF quantizations for llama.cpp-style runners.
Tencent’s HY-World 1.5 “WorldPlay” aims to stream an interactive world model with real-time latency, long-horizon geometric consistency, and explicit keyboard/mouse action control.
A cache of precompiled AWS Neuron artifacts so popular Hub models deploy much faster on Inferentia/Trainium (via `optimum-neuron` / NeuronX TGI).
AGPL-3.0 English abstractive summarizer based on DistilBART (CNN/DM), positioned for fast, lightweight news/article summarization.
A GGUF-packaged T5-XXL text encoder (from `google/t5-v1_1-xxl`) intended for local pipelines that load encoders from `./models/text_encoders` (Apache-2.0).
A grab-bag Hugging Face repo that looks like a practical ComfyUI-friendly bundle: LoRAs, upscalers, and assorted model/tool artifacts you can mix into image/video workflows.
What appears to be a Hindi↔English Marian-style translation model uploaded as raw training checkpoints; useful if you’re experimenting with custom MT or fine-tuning rather than looking for a polished packaged release.
An experimental-looking repo that ships multiple epoch artifacts (GGUF weights + a finetuned wav2vec checkpoint), suggesting a “sync” pipeline that mixes audio features with a GGUF model.
A 6B bilingual (Chinese/English) text-to-image model focused on legible text rendering, photorealism, and efficient deployment.
A compact 1.1B-parameter LLM released as GGUF for llama.cpp-style local inference (CC-BY-NC-2.0).
Text-to-video weights + inference code targeting native 4K generation, reporting a 10× speedup over naive 4K video generation (Apache-2.0 + Wan license).