Context-sensitive Spell Correction with Deep Learning

Seeking to Spell with seq2spell

“lemon meringue pie” or “lemon merengue pie”?

Though those two phrases are only one letter apart, they are actually quite different in meaning. The first refers to a delicious dessert while the second refers to a dancing pie!

In our MyFitnessPal product, spelling mistakes have long been a pain point for both our users and data scientists. Since we have the largest crowdsourced food & nutrition database in the world, we naturally have a lot of typos in our textual data. From the user perspective, these typos are annoying. From the data science perspective, they affect the quality of any analysis or modelling we do involving text.

Many of these typos are fairly easy to fix with basic spell correction techniques, but a large portion, like the meringue/merengue example above, are harder to detect and correct. Inspired by the approach taken by Tal Weiss, we set out to create our own context-sensitive spell correction system to address the problem of intelligent spell correction.

The Problem

Don’t spell checkers already exist?

Absolutely. Unfortunately, existing spell checkers actually don’t work too well on MyFitnessPal textual data for two main reasons: lack of domain specificity and lack of context sensitivity.

  1. Lack of domain specificity: Existing spell checkers are designed for general English and are not targeted towards food-related corrections. For example, consider the following misspelled food description: “low tat milk”. A traditional spell checker might correct it to “low tar milk”, which is an inappropriate suggestion for the domain of food text.
  2. Lack of context sensitivity: Traditional spell checkers also do not take the rest of the line into account when making corrections. A traditional spell checker does not spot any error in the description “Milk, 1 up”, even though, given the context, it should be “Milk, 1 cup”.
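To see why a word-by-word check misses “Milk, 1 up”, here is a minimal sketch of a dictionary lookup. The tiny word list and the function name are made up for illustration and are not how any particular spell checker is implemented.

```python
import re

# Toy dictionary standing in for a general-English word list (an assumption
# for illustration; real spell checkers ship far larger dictionaries).
GENERAL_ENGLISH = {"milk", "cup", "up", "low", "fat", "tar"}


def flag_unknown_words(line, dictionary=GENERAL_ENGLISH):
    """Return the words in `line` that are not in the dictionary.

    Each word is checked in isolation, so the surrounding words never
    influence the result.
    """
    words = re.findall(r"[a-z]+", line.lower())
    return [w for w in words if w not in dictionary]


print(flag_unknown_words("Milk, 1 up"))    # [] because "up" is a valid word
print(flag_unknown_words("low tat milk"))  # ['tat'], but "tar" and "fat" are equally close
```

Since “up” is a real word, nothing is flagged in the first line. In the second, the typo is found, but nothing in a word-level lookup favors “fat” over “tar”.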

On a technical note, spell checkers suggest proper spellings while spell correctors choose the proper spelling for a typo. Since our goal was to automatically correct typos in our food database in an unsupervised way, the suggestion-based nature of spell checkers is less applicable to this use case.

The Model: From seq2seq to seq2spell

So how did we address these limitations and make a more intelligent spell correction system? Enter deep learning. Specifically, we used a model known as seq2seq, which is short for sequence to sequence learning. Seq2seq is most commonly used for translation tasks (in fact, Google uses a version of it in Google Translate), but it can be used for other language tasks as well.

How does seq2seq work? At a high level, it mimics the way we as humans process language. Seq2seq reads in a sentence, gets a sense of the entire sentence’s “meaning” or context, and then performs its assigned task.

For translating from, say, Chinese to English, that means reading the whole Chinese sentence, encoding its “meaning”, and then writing out the English sentence.

We used this same intuition in developing our context-sensitive spell correction system, which we aptly named seq2spell. Here, instead of translating from one language to another, we “translate” from possibly misspelled lines in English to their corrected versions. Seq2spell reads in a possibly misspelled line, encodes it into a representation of its “meaning” and then outputs the corrected line.
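To make the “translation” framing a little more concrete, here is a sketch of how one (possibly misspelled, corrected) pair could be turned into the integer sequences an encoder-decoder model consumes. The character-level vocabulary, the special tokens and all function names are assumptions for illustration; they are not a description of how seq2spell itself represents text.

```python
# Special tokens and the character-level vocabulary are assumptions made for
# this sketch, not details taken from seq2spell.
PAD, START, END = "<pad>", "<s>", "</s>"


def build_vocab(lines):
    """Map every character seen in `lines` to an integer id."""
    chars = sorted({ch for line in lines for ch in line})
    return {tok: i for i, tok in enumerate([PAD, START, END] + chars)}


def encode(line, vocab):
    """Turn a line into START + character ids + END."""
    return [vocab[START]] + [vocab[ch] for ch in line] + [vocab[END]]


def decode(ids, vocab):
    """Invert `encode`, dropping the special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in (PAD, START, END))


source, target = "milk, 1 up", "milk, 1 cup"  # (possibly misspelled, corrected)
vocab = build_vocab([source, target])
encoder_input = encode(source, vocab)   # what the model would read
decoder_target = encode(target, vocab)  # what the model would learn to emit
print(encoder_input)
print(decode(decoder_target, vocab))    # milk, 1 cup
```

The source and target sequences have different lengths here, which is exactly the situation sequence to sequence models are built to handle.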

Model Training

How does seq2spell learn? Seq2spell actually starts out with no knowledge of the English language whatsoever, and we never program in any rules for how it should handle spelling mistakes. Instead, seq2spell learns to spell by looking at examples. A lot of them. In training seq2spell to correct typos in MyFitnessPal food descriptions, we gave it over 147 million examples of possible misspelling and corrected spelling pairs.

How did we get so many training examples? Indeed, in many applications of deep learning, having to acquire huge amounts of training data is the main bottleneck. In this case however, we were able to generate our own training data for free! First, we isolated a relatively typo-free portion of English MyFitnessPal food descriptions by using criteria such as the number of times the food was logged as implicit signals of spelling quality. This relatively clean subset consisted of almost 3 million food descriptions. Treating this subset as ground truth, we then inserted random typos (think randomly replacing, inserting, deleting or swapping characters) on the fly during the training process. To illustrate this process, taking the correctly spelled description, “korean bbq shrimp skewers”, we can generate a misspelling for the model to learn from by deleting a space and replacing the second “k” with a “c” to yield “korean bbq shrimpscewers”.
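The on-the-fly typo generation can be sketched roughly as follows. This is a simplified illustration, not our production code: the function names, the uniform choice between the four edit types and the cap of two edits per line are all assumptions.

```python
import random
import string


def add_typo(text, rng):
    """Apply one random character-level edit: replace, insert, delete or swap."""
    if len(text) < 2:
        return text
    op = rng.choice(["replace", "insert", "delete", "swap"])
    i = rng.randrange(len(text) - 1)  # leaves room for a swap with i + 1
    letter = rng.choice(string.ascii_lowercase)
    if op == "replace":
        return text[:i] + letter + text[i + 1:]
    if op == "insert":
        return text[:i] + letter + text[i:]
    if op == "delete":
        return text[:i] + text[i + 1:]
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]  # swap neighbours


def make_pair(clean, rng, max_typos=2):
    """Return a (noisy, clean) training pair with 0..max_typos edits."""
    noisy = clean
    for _ in range(rng.randint(0, max_typos)):
        noisy = add_typo(noisy, rng)
    return noisy, clean


rng = random.Random(0)  # seeded only to make the sketch reproducible
for _ in range(3):
    print(make_pair("korean bbq shrimp skewers", rng))
```

Because the noise is drawn fresh every time a clean line is visited, the same 3 million descriptions can yield far more distinct training pairs. Allowing zero edits (an assumption in this sketch) also shows the model lines that should be left alone.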

During training, seq2spell picked up on many of the complex rules of English that we as humans might take for granted, and most importantly learned to generalize in applying these rules to typos not directly seen in training.

The Results

To test the performance of seq2spell, we benchmarked it against Hunspell, the most popular open-source spell checker, by using both to correct food description typos and ingredient name typos not seen during the training process. Since Hunspell often offers multiple correction suggestions, for experimental purposes we always took the first one it offered. While there are high-quality spell correctors out there that use context like seq2spell, they tend to be closed source, so we chose Hunspell for comparison purposes.

Our main evaluation metric was line-level precision, which is the number of proper corrections made out of the total number of corrections made. Note that this is different from accuracy, which also counts lines that were left alone and hence depends on the proportion of errors in the dataset. To give some intuition for why accuracy can be misleading, a spell corrector that does absolutely nothing on a dataset that is 99% error-free “achieves” 99% accuracy without making a single proper correction, so its precision is 0% by convention (strictly speaking, it is undefined, since no corrections were made). More practically, remember that since the purpose of the tool was to automatically clean up typos in our database, we wanted to be extremely sure of any corrections we proposed. Precision gives us a measure of how sure we can be that this is the case.
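The difference between the two metrics is easy to check with a few lines of code. The sketch below is an illustration only (the function and variable names are ours for this example); it reports precision as None when no corrections were made, which is the case often treated as 0%.

```python
def evaluate(originals, predictions, references):
    """Line-level precision and accuracy for a spell corrector.

    originals   -- lines as found in the data (possibly misspelled)
    predictions -- the corrector's output for each line
    references  -- the correct spelling of each line
    """
    # A "correction" is any line the corrector changed.
    changed = [(p, r) for o, p, r in zip(originals, predictions, references) if p != o]
    proper = sum(p == r for p, r in changed)
    # Precision is undefined when no corrections were made; report None.
    precision = proper / len(changed) if changed else None
    accuracy = sum(p == r for p, r in zip(predictions, references)) / len(references)
    return precision, accuracy


# 99 clean lines plus one typo, and a "corrector" that changes nothing.
references = ["milk, 1 cup"] * 100
originals = ["milk, 1 cup"] * 99 + ["milk, 1 up"]
do_nothing = list(originals)
print(evaluate(originals, do_nothing, references))  # (None, 0.99)
```

The do-nothing corrector scores 99% accuracy despite never fixing anything, which is why we did not use accuracy as the headline metric.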

Seq2spell completely knocks Hunspell out when it comes to precision on both datasets. Most importantly, since seq2spell has above 95% precision, it is robust enough to use in a production setting. Since Hunspell makes more false corrections than correct ones (if we take its first suggestion as the correction), it is not reliable enough to use for automated cleanup of our database.

As astounding as these results may look, we do have to remember it was not a very fair fight to begin with! After all, since Hunspell is a spell checker (more suited to offering suggestions) and is context-dumb (since it takes a word-by-word approach), its poor showing is understandable. Nevertheless, this comparison is still useful as it gives us an idea of where seq2spell stands, and seq2spell’s high precision speaks for itself.

Here are some corrections to actual MyFitnessPal food descriptions that seq2spell made:

(Mis)spelling          | seq2spell Correction   | Hunspell Correction
Milk, 1 up             | Milk, 1 cup            | Milk, 1 up
Lemon merengue pie     | Lemon meringue pie     | Lemon merengue pie
Grilled chicken breadt | Grilled chicken breast | Grilled chicken bread
Wheta Hamburger Bun    | Wheat Hamburger Bun    | Theta Hamburger Bun
Hoisin Prok Tenderloin | Hoisin Pork Tenderloin | Hoister Grok Tenderloin

The key takeaway here is that seq2spell uses context and knowledge of the food domain to catch errors that Hunspell either miscorrects or misses completely.

Future Work

We’re still in the process of refining seq2spell to make it even better. One main area for improvement is increasing recall, the number of mistakes recovered by seq2spell out of all the mistakes in a dataset. At the moment we slightly sacrifice recall to maximize precision, but ideally we would like to increase recall as well. We’re also considering different ways of integrating seq2spell into the MyFitnessPal app from a product perspective.
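For reference, line-level recall can be computed in much the same way as precision. The sketch below uses a hypothetical, overly cautious corrector (not seq2spell's actual output) that leaves one of three typos untouched; the function name is ours for this example.

```python
def recall(originals, predictions, references):
    """Fraction of misspelled lines that the corrector fixed."""
    fixed = [p == r for o, p, r in zip(originals, predictions, references) if o != r]
    return sum(fixed) / len(fixed) if fixed else None


originals = ["Milk, 1 up", "Lemon merengue pie", "Grilled chicken breadt"]
references = ["Milk, 1 cup", "Lemon meringue pie", "Grilled chicken breast"]
# A hypothetical cautious corrector that leaves the first line untouched.
predictions = ["Milk, 1 up", "Lemon meringue pie", "Grilled chicken breast"]
print(recall(originals, predictions, references))  # 0.6666666666666666
```

Both corrections this cautious corrector does make are proper, so its precision is 100% even though it recovers only two of the three mistakes. That is the shape of the trade-off described above.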

More broadly, beyond spell correction, the data science team at Under Armour is hard at work thinking of other ways to creatively apply deep learning in solving tough problems. We’re excited about the potential of deep learning in improving the MyFitnessPal user experience and our other products, and seq2spell is just the beginning!