22 Feb, 2026
Georgian KenLM Language Model (3-gram)
KenLM 3-gram language model trained on Georgian (ქართული) text data, intended for ASR decoding and general language modeling.
- Language: ka (Georgian)
- Model type: KenLM n-gram
- n-gram size: 3-gram
- Format: .arpa
- Tooling: KenLM
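For reference, this is the standard back-off scoring that KenLM applies to an ARPA 3-gram model. It describes generic ARPA semantics, not anything specific to this model. Here w_0 = <s>, the context for i = 1 is only w_0, b(·) is the log10 back-off weight listed for a context (0 if the context is not listed), and the bigram term backs off to the unigram in the same way.

```latex
\log_{10} P(w_1 \dots w_n) \approx \sum_{i=1}^{n+1} \log_{10} P(w_i \mid w_{i-2}, w_{i-1}),
\qquad w_0 = \langle s \rangle,\; w_{n+1} = \langle /s \rangle

\log_{10} P(w_i \mid w_{i-2}, w_{i-1}) =
\begin{cases}
  \log_{10} p(w_{i-2}\, w_{i-1}\, w_i) & \text{if the trigram is listed} \\
  b(w_{i-2}\, w_{i-1}) + \log_{10} P(w_i \mid w_{i-1}) & \text{otherwise}
\end{cases}

\mathrm{PPL}(w_1 \dots w_n) = 10^{-\frac{1}{n+1} \log_{10} P(w_1 \dots w_n)}
```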
View on Hugging Face
📂 Files
- ge_model9.arpa: ARPA plain-text format
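The sketch below is a minimal pure-Python illustration of what the ARPA format holds and how a sentence is scored with back-off. `load_arpa`, `log10_prob`, `score_sentence`, `perplexity`, and the toy model are hypothetical and are not part of this release. The numbers in `TOY_ARPA` are made up. Everything is loaded into a dict, which suits a toy file but not a full-size one. The -100 floor for a missing `<unk>` entry is also an assumption.

```python
from typing import Dict, Tuple

# Hypothetical in-memory table: n-gram tuple -> (log10 probability, log10 back-off).
NgramTable = Dict[Tuple[str, ...], Tuple[float, float]]

# Toy 3-gram model in ARPA layout (illustrative numbers, not from ge_model9.arpa).
TOY_ARPA = r"""
\data\
ngram 1=5
ngram 2=3
ngram 3=1

\1-grams:
-1.5 <unk>
-99 <s> -0.5
-1.0 </s>
-0.8 გამარჯობა -0.3
-0.9 მსოფლიო -0.2

\2-grams:
-0.4 <s> გამარჯობა -0.1
-0.3 გამარჯობა მსოფლიო -0.1
-0.2 მსოფლიო </s>

\3-grams:
-0.1 <s> გამარჯობა მსოფლიო

\end\
"""


def load_arpa(text: str) -> NgramTable:
    """Parse ARPA text; fields are split on any whitespace (tabs or spaces)."""
    table: NgramTable = {}
    order = 0  # n of the current N-grams section; 0 while still in the \data\ header
    for raw in text.splitlines():
        line = raw.strip()
        if line == "\\end\\":
            break
        if line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:line.index("-")])
            continue
        if not line or order == 0:
            continue
        fields = line.split()
        words = tuple(fields[1:1 + order])
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[words] = (float(fields[0]), backoff)
    return table


def log10_prob(table: NgramTable, context: Tuple[str, ...], word: str) -> float:
    """Use the longest listed n-gram, adding back-off weights of skipped contexts."""
    ngram = context + (word,)
    if ngram in table:
        return table[ngram][0]
    if not context:
        # Out-of-vocabulary word: use <unk>, or an assumed -100 floor if it is missing.
        return table.get(("<unk>",), (-100.0, 0.0))[0]
    backoff = table.get(context, (0.0, 0.0))[1]  # unlisted context -> back-off 0
    return backoff + log10_prob(table, context[1:], word)


def score_sentence(table: NgramTable, sentence: str, order: int = 3) -> float:
    """Total log10 probability of a whitespace-tokenized sentence, with <s> and </s>."""
    history = ["<s>"]
    total = 0.0
    for word in sentence.split() + ["</s>"]:
        context = tuple(history[-(order - 1):])  # at most the last order-1 words
        total += log10_prob(table, context, word)
        history.append(word)
    return total


def perplexity(table: NgramTable, sentence: str, order: int = 3) -> float:
    """Per-token perplexity; </s> counts as a scored token."""
    n_tokens = len(sentence.split()) + 1
    return 10 ** (-score_sentence(table, sentence, order) / n_tokens)
```

With the toy model, "გამარჯობა მსოფლიო" scores -0.4 + -0.1 + (-0.1 + -0.2) = -0.8 in log10. The final `</s>` step backs off from the unlisted trigram to the listed bigram. For the real file, KenLM's Python binding (`kenlm.Model(path).score(text, bos=True, eos=True)`, which also returns log10) is the practical choice.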
📚 Training Data
Trained on a curated collection of Georgian text from multiple domains:
- News articles
- Subtitles
- Books and web content
Data was cleaned, whitespace-tokenized, and normalized to standard Georgian orthography.
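The post does not describe the exact cleaning pipeline. One hypothetical way to produce whitespace-tokenized text with one sentence per line, the input that KenLM's `lmplz` reads, is sketched below. The character whitelist (the 33 modern Mkhedruli letters, U+10D0 ა to U+10F0 ჰ) is an assumption, as are the choice to drop digits and Latin text and the helper names. This is not the author's procedure.

```python
import re
import unicodedata
from typing import Iterable, Iterator

# Assumption: "standard orthography" approximated as modern Mkhedruli letters only;
# every other character run (punctuation, digits, Latin) becomes a single space.
_NON_GEORGIAN = re.compile(r"[^\u10D0-\u10F0]+")


def normalize_line(line: str) -> str:
    """NFC-normalize, lowercase, drop non-Georgian characters, collapse whitespace."""
    line = unicodedata.normalize("NFC", line)
    # On Unicode 11+ (Python 3.7+), lower() maps Mtavruli capitals to Mkhedruli.
    line = line.lower()
    return " ".join(_NON_GEORGIAN.sub(" ", line).split())


def to_kenlm_corpus(lines: Iterable[str]) -> Iterator[str]:
    """Yield one normalized sentence per line, skipping lines that end up empty."""
    for line in lines:
        cleaned = normalize_line(line)
        if cleaned:
            yield cleaned
```

Lines produced this way could then be piped into a command such as `lmplz -o 3 < corpus.txt > model.arpa` to build a 3-gram ARPA file.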
💬 Intended Use
Ideal for:
- Beam search decoding in ASR systems (e.g., Whisper, DeepSpeech, Vosk)
- Scoring and reranking ASR hypotheses
- Basic text modeling or Georgian spelling correction
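To rescore ASR hypotheses, a common approach is to add a weighted LM score and a word-count bonus to each hypothesis's acoustic score, then sort. The sketch below shows this; `rerank` and the default `alpha`/`beta` values are hypothetical. The LM is passed in as any callable that returns a log10 score, such as `kenlm.Model(path).score(text, bos=True, eos=True)`. The LM score is converted to natural log so it matches an acoustic log-probability.

```python
import math
from typing import Callable, List, Tuple

LN10 = math.log(10.0)  # converts log10 scores to natural log


def rerank(
    hypotheses: List[Tuple[str, float]],
    lm_log10: Callable[[str], float],
    alpha: float = 0.5,
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Sort (text, acoustic natural-log prob) pairs by a fused score, best first.

    fused = acoustic + alpha * LM (as natural log) + beta * word count.
    alpha and beta are hypothetical defaults; tune them on held-out data.
    """
    scored = []
    for text, am_logprob in hypotheses:
        fused = am_logprob + alpha * lm_log10(text) * LN10 + beta * len(text.split())
        scored.append((text, fused))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a toy LM that prefers "გამარჯობა მსოფლიო", that hypothesis moves ahead of an acoustically slightly better but less fluent one. With `alpha=0.0`, the acoustic order is kept.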
Quick integration note
If you’re using a CTC decoder (e.g., pyctcdecode), you typically pair this .arpa file with a tokenizer/lexicon and plug it into beam search scoring.
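The sketch below shows one way this wiring might look with pyctcdecode. `build_ctcdecoder(labels, kenlm_model_path=..., alpha=..., beta=...)` and `decoder.decode(logits, beam_width=...)` are pyctcdecode calls, and using them requires both pyctcdecode and kenlm to be installed. The label set is an assumption: a CTC blank as `""`, a space as the word delimiter, then the 33 modern Mkhedruli letters. It must be replaced with your acoustic model's actual vocabulary, in column order. The helper names and alpha/beta values are also hypothetical.

```python
from typing import List


def georgian_ctc_labels() -> List[str]:
    """Hypothetical vocabulary: CTC blank "", word delimiter " ", then U+10D0..U+10F0."""
    letters = [chr(cp) for cp in range(0x10D0, 0x10F1)]  # ა .. ჰ (33 letters)
    return ["", " "] + letters


def build_decoder(kenlm_path: str = "ge_model9.arpa", alpha: float = 0.5, beta: float = 1.0):
    """Build a pyctcdecode beam-search decoder that scores beams with the KenLM model."""
    # Imported lazily so georgian_ctc_labels() works without pyctcdecode installed.
    from pyctcdecode import build_ctcdecoder

    return build_ctcdecoder(
        georgian_ctc_labels(),
        kenlm_model_path=kenlm_path,
        alpha=alpha,  # LM weight (hypothetical starting value)
        beta=beta,    # word insertion bonus (hypothetical starting value)
    )


# Usage, assuming `logits` is a (time, vocab) numpy array of log-probabilities
# whose columns follow georgian_ctc_labels():
#     decoder = build_decoder()
#     text = decoder.decode(logits, beam_width=100)
```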
22 Feb, 2026, 15:29