Paasaa

Paasaa is an Elixir library for detecting the natural language and script of a text. It works by statistical analysis of character n-grams and Unicode script properties, without relying on AI. This is useful in text processing, natural language understanding, and internationalization, wherever the writing system and human language of a given text need to be identified.
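To make the two signals mentioned above more concrete, here is a minimal sketch of counting character trigrams and measuring how much of a string belongs to one Unicode script. The module name NgramSketch and both function names are hypothetical, and this is not Paasaa's actual code or data. It only illustrates the general idea under simple assumptions (lowercasing, space padding, ratio taken over all characters).

```elixir
defmodule NgramSketch do
  @moduledoc "Illustrative only; hypothetical module, not part of Paasaa."

  # Count overlapping character trigrams in a lowercased, space-padded string.
  def trigram_counts(text) do
    padded = " " <> String.downcase(text) <> " "

    padded
    |> String.graphemes()
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.map(&Enum.join/1)
    |> Enum.frequencies()
  end

  # Fraction of all characters in `text` that match `script_regex`,
  # for example ~r/\p{Latin}/u or ~r/\p{Cyrillic}/u.
  def script_ratio(text, script_regex) do
    chars = String.graphemes(text)

    case chars do
      [] -> 0.0
      _ -> Enum.count(chars, &Regex.match?(script_regex, &1)) / length(chars)
    end
  end
end

NgramSketch.trigram_counts("aba")
# => %{" ab" => 1, "aba" => 1, "ba " => 1}
```

A real detector would compare such trigram counts against per-language reference profiles. That comparison step is left out here because the source does not describe it.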

API Documentation | Hex Package

Installation

Add paasaa to your list of dependencies in mix.exs:

def deps do
  [{:paasaa, "~> 1.0.0"}]
end

Then run mix deps.get in your shell to fetch Paasaa. It is compiled the next time you build your project.

Usage

Detect a language:

iex> Paasaa.detect("Detect this!")
"eng"

Detect the language and return a scored list of candidate languages:

iex> Paasaa.all("Detect this!")
[
  {"eng", 1.0},
  {"sco", 0.8230731943771207},
  {"nob", 0.6030053320407174},
  {"nno", 0.5525933107125545},
  ...
]
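A scored list like the one above is often narrowed down before use. The following is a small sketch of one way to do that. CandidateFilter is a hypothetical helper, not part of Paasaa, and it assumes the list is already ordered from highest to lowest score, as in the sample output.

```elixir
defmodule CandidateFilter do
  # Hypothetical helper, not part of Paasaa: keeps at most `limit` candidates
  # whose score is at least `min_score`, preserving the input order.
  def top(scored, limit, min_score) do
    scored
    |> Enum.filter(fn {_lang, score} -> score >= min_score end)
    |> Enum.take(limit)
  end
end

# Sample data copied from the output above (the elided tail is left out).
scored = [
  {"eng", 1.0},
  {"sco", 0.8230731943771207},
  {"nob", 0.6030053320407174},
  {"nno", 0.5525933107125545}
]

CandidateFilter.top(scored, 3, 0.6)
# => [{"eng", 1.0}, {"sco", 0.8230731943771207}, {"nob", 0.6030053320407174}]
```

With Paasaa installed, the same helper could be fed directly from Paasaa.all/2, for example Paasaa.all(text) |> CandidateFilter.top(3, 0.6).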

Detect a script:

iex> Paasaa.detect_script("Detect this!")
{"Latin", 0.8333333333333334}

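Advanced Usage with Options

The detect/2 and all/2 functions accept a keyword list of options to control their behavior.

Whitelist and Blacklist Languages

You can restrict the set of possible languages. This is useful if you already know the text must be in one of a few languages, or if you want to exclude a common false positive.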

# Exclude English to find the next most likely language
iex> Paasaa.detect("Detect this!", blacklist: ["eng"])
"sco"

# Only consider Polish and Serbian
iex> text = "Pošto je priznavanje urođenog dostojanstva i jednakih i neotuđivih prava..."
iex> Paasaa.detect(text, whitelist: ["pol", "srp"])
"srp"

Set Minimum Text Length

By default, Paasaa returns "und" for very short strings. You can adjust this threshold with :min_length.

iex> Paasaa.detect("Привет", min_length: 10)
"und"

iex> Paasaa.detect("Привет", min_length: 6)
"rus"
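Because short input can yield "und", calling code usually needs to decide what to do in that case. Here is a minimal sketch of a fallback wrapper. DetectionFallback and detect_or_default/3 are hypothetical names, not part of Paasaa, and the detector is passed in as a function so the helper works with any options.

```elixir
defmodule DetectionFallback do
  # Hypothetical helper, not part of Paasaa. `detector` is any one-argument
  # function that returns a language code string, or "und" when undetermined.
  def detect_or_default(text, detector, default) do
    case detector.(text) do
      "und" -> default
      code -> code
    end
  end
end

# With Paasaa installed, the detector could be a capture of Paasaa.detect/2:
# DetectionFallback.detect_or_default("Привет", &Paasaa.detect(&1, min_length: 10), "rus")
```

Passing the detector as an argument keeps the fallback logic independent of the chosen :min_length, whitelist, or blacklist.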

Supported Languages

For a full list of supported languages, please see LANGUAGES.md.

Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request on GitHub.

If you are updating the language data, you can regenerate the necessary modules with the following command:

mix run script/generate_language_data.exs

Derivation

Paasaa is a derivative work of Franc (JavaScript, MIT) by Titus Wormer.

License

MIT © Egor Kislitsyn
