Whatlang strikes back

Serhii Potapov April 18, 2021 #rust #whatlang #library

I am happy to announce a release of a new version (0.12.0) of whatlang.

It's the biggest release since the crate was published in 2016. Its internals were refactored and restructured to support multiple detection methods, so the library can provide better-quality results.

The full list of changes can be seen in the changelog.

Along with the release, I also created a homepage for Whatlang: https://whatlang.org/. This is a small single-page application implemented fully in Rust (thanks to Seed), where you can play with the library interactively and take a look inside it (by clicking the "Debug" tab).

Who uses whatlang?

There is at least one major user of whatlang that I am very proud of: Sonic, a fast and lightweight search backend, also written in Rust.

Story behind the release

I'll be honest, the trigger for the release was the announcement of Lingua. More precisely, it was the comments I read there about Whatlang. Because I do not actively use Whatlang for my own needs (I do not eat my own dog food), I was not really aware that many users were so disappointed with the library's results on short text inputs. So I decided to spend 2 weeks of my vacation in January on improvements. Whatlang still lags behind Lingua in accuracy benchmarks for short texts, but it's one leap forward.

Whatlang VS Lingua comparison

One may ask how Whatlang differs from Lingua, and why someone should consider using Whatlang if Lingua produces better results. Here are some points:

For text inputs over 100 characters, Whatlang and Lingua produce more or less similarly reliable results.
Whatlang is very lightweight and has only one dependency (hashbrown), while Lingua has over 10 dependencies.
Whatlang adds only 3.5 MB to your binary, and only ~530 KB when compiled to WASM. Lingua adds a little more than 100 MB.
Whatlang works immediately out of the box, while Lingua needs to "warm up": it takes time and memory to unzip all its heavy language models.
Whatlang is meant to be fast, almost as fast as possible; a lot of effort was put into its optimization. I haven't benchmarked Lingua's performance, but I wouldn't be surprised to discover that it's 5-10 times slower than Whatlang.

Considering this, I believe there is a place for both libraries in the ecosystem, depending on the particular use case.

Happy language detection!
