Taste Labeling and Analysis

Sandwiches or burritos for lunch? Popcorn or candy at a movie? Anyone who has faced these kinds of decisions can attest to how personal the sense of taste can be, and taste remains one of the main deciding factors in what all of us eat. As we strive to better understand the logging patterns of our users, taste stands out as a dimension from which we can gain strong insights.

The problem, of course, is that we have no real basis or labels for taste in the MyFitnessPal database, and it is easy to imagine how obnoxious the app would become if we forced users to label the taste of every food themselves each time they logged something. As in many cases in this digital age, a potential solution can be found in machine learning. Taking only a dataset of food names and (not always accurate) nutrition information, we set out to build an accurate and scalable model for predicting unknown food tastes.

The Data

As the majority of data within MyFitnessPal’s ecosystem originates from user input, we find many variations of similar foods, both in food names and in nutritional contents. For example, some MyFitnessPal users rely only on calorie tracking, and do not enter macro- and micronutrients for the foods they create. The data science team has taken on a number of initiatives to help identify the most accurate data, but we wanted a training set in which we were absolutely sure that both the food name and the nutritional contents were valid. For this purpose, we turned to an external source: the USDA food database. Since the USDA can be considered a standard for nutritional accuracy, this allows us to train our models on both the text information in food descriptions and the vector of nutritional contents.

The only remaining question then is: how exactly does one add taste labels to this data, given that the USDA does not label foods with flavors? One possible solution is to base labels on context clues found in food names. Certain foods contain hints at their tastes (“Sweet Tea” can probably be considered sweet), and some basic example foods of each taste can be chosen as a basis (for example, variants of meat like chicken, steak, etc., can all probably be considered savory/umami). In the end, we filtered the USDA database down to roughly 1,300 examples of foods with reasonably likely taste labels. This training set contained representative samples of all six flavors we were interested in classifying: sweet, savory/umami, salty, bitter, sour, and spicy.
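As a rough illustration of this kind of name-based labeling, here is a minimal sketch. The keyword lists and the `label_by_keywords` helper are hypothetical, chosen only for the example; they are not the rules we actually used to filter the USDA data.

```python
import re

# Hypothetical keyword lists, for illustration only.
TASTE_KEYWORDS = {
    "sweet": ["sweet", "candy", "honey"],
    "savory": ["chicken", "steak", "beef"],
    "salty": ["salted", "pretzel"],
    "bitter": ["coffee"],
    "sour": ["sour", "lemon"],
    "spicy": ["spicy", "jalapeno"],
}


def label_by_keywords(food_name):
    """Return the set of tastes whose keywords appear as whole words in the name."""
    name = food_name.lower()
    tastes = set()
    for taste, keywords in TASTE_KEYWORDS.items():
        for keyword in keywords:
            # \b keeps "sweet" from matching inside "unsweetened".
            if re.search(r"\b" + re.escape(keyword) + r"\b", name):
                tastes.add(taste)
                break
    return tastes


# Keep only foods that pick up at least one taste label.
foods = ["Sweet Tea", "Grilled Chicken Breast", "Water", "Unsweetened Almond Milk"]
training_set = {}
for food in foods:
    tastes = label_by_keywords(food)
    if tastes:
        training_set[food] = tastes
```

Foods whose names carry no usable hint simply drop out, which is consistent with ending up with a small labeled subset of a much larger database.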

The Model

Armed with our training set, we turned to a supervised machine learning algorithm to classify each food in the MyFitnessPal database. By leveraging Google’s Word2Vec, which turns words into numeric representations, we were able to use each food’s description and nutritional contents as a set of features for two separate, but parallel, weighted K-Nearest Neighbors models. The key advantage of wKNN is that it enabled us not just to make a label prediction, but to assign each food a probability of being any given flavor. This meant that we could also capture foods that might be more than one flavor, like how “caramel popcorn” should probably be considered both sweet and salty.
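To make the probability idea concrete, here is a minimal sketch of a distance-weighted KNN that returns a probability per taste rather than a single label. The inverse-distance weighting, the value of `k`, and the toy two-dimensional vectors (standing in for Word2Vec or nutrition features) are all assumptions made for the example, not a description of our production model.

```python
import math
from collections import defaultdict


def wknn_probabilities(query, examples, k=3, eps=1e-9):
    """Distance-weighted KNN over (feature_vector, label) pairs.

    Each of the k nearest neighbors votes for its label with weight
    1 / (distance + eps); votes are normalized so they sum to 1.
    """
    neighbors = sorted(
        ((math.dist(query, vector), label) for vector, label in examples),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += 1.0 / (distance + eps)
    total = sum(votes.values())
    return {label: weight / total for label, weight in votes.items()}


# Toy training examples: 2-D vectors stand in for real feature vectors.
examples = [
    ((0.0, 0.0), "sweet"),
    ((0.0, 1.0), "sweet"),
    ((4.0, 0.0), "salty"),
    ((5.0, 5.0), "bitter"),
]
probs = wknn_probabilities((1.0, 0.0), examples, k=3)
```

Here the query sits closest to the two "sweet" examples, so "sweet" gets most of the probability mass, while the one "salty" neighbor still contributes a smaller share. That leftover share is what lets a food register as more than one flavor.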

To test this taste labeling model’s accuracy, as a team we hand-labeled a new set of 1,200 randomly selected foods with the tastes we thought matched best. A wKNN run only on the foods’ nutrition data achieved just 55% accuracy. A wKNN on our Word2Vec-derived features did somewhat better, at 68% accuracy. It was only when combining the text and nutrition features that we saw large gains, reaching an impressive 87% accuracy! Each model helps cover the weaknesses of the other, and we’re left with a strong machine learning technique. The question still remains, however, as to whether it is useful. To that end, we decided to go a step further and evaluate how our taste labels compare to the current state of the academic literature on taste, and what new insights we might be able to glean from these machine-labeled tastes in MyFitnessPal.
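One simple way to combine two probabilistic models is to average their per-taste probabilities and then keep every taste above a cutoff. The sketch below shows that idea only; the equal weighting, the 0.3 threshold, and the example probabilities are assumptions for illustration, and the exact combination scheme we used is not spelled out here.

```python
def combine_probabilities(text_probs, nutrition_probs, text_weight=0.5):
    """Weighted average of two {taste: probability} dicts."""
    labels = set(text_probs) | set(nutrition_probs)
    return {
        label: text_weight * text_probs.get(label, 0.0)
        + (1.0 - text_weight) * nutrition_probs.get(label, 0.0)
        for label in labels
    }


def tastes_above(probs, threshold=0.3):
    """Return every taste whose combined probability clears the threshold."""
    return {label for label, p in probs.items() if p >= threshold}


# Hypothetical model outputs for a food like "caramel popcorn".
text_probs = {"sweet": 0.6, "salty": 0.4}
nutrition_probs = {"sweet": 0.5, "salty": 0.3, "savory": 0.2}

combined = combine_probabilities(text_probs, nutrition_probs)
labels = tastes_above(combined)
```

With these made-up numbers, both "sweet" and "salty" clear the cutoff while "savory" does not, which is the kind of multi-flavor result described above for caramel popcorn.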

Findings

We first took a sample of around 100,000 active MyFitnessPal users in order to deep-dive into their logging behavior with respect to taste. Together, this sample of users accounted for hundreds of millions of food entries in our database, comprising millions of unique foods. This is an entirely different kind of taste dataset from any created before, so a natural first step is to check it against what previous studies have to say about taste behaviors. If our large-scale sample could replicate findings from clinical and survey research, that would help validate our efforts.

When looking for established effects, we were led to previous research on the relationship between age and taste. Work by one of the main researchers in taste, Adam Drewnowski [A. Drewnowski and P. Monsivais. Present Knowledge in Nutrition, chapter 60. Taste and Food Choices, pages 1027–1041. John Wiley & Sons, Inc, 10th edition, 2012.], has found some pretty clear relationships between age and changes in taste perception. Simply put, as people age, their tolerance for bitterness increases, while their tolerance for sweetness decreases.

Our new taste data almost perfectly mirrors what this previous research has found. We see a 6.32% relative decrease in sweet foods logged, and a whopping 43% relative increase in bitter foods logged, from the 18-25 age group to the 55+ age group. Looking at what our model actually labels, this effect makes even more sense: the foods most commonly labeled as “bitter” are caffeinated beverages, like tea and coffee, as well as alcoholic drinks, like beer. These are all things that most everyone would agree are increasingly consumed with higher age (at least, hopefully).
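For readers unfamiliar with the term, a relative change compares the shift in a share to its starting value, rather than reporting the raw difference in percentage points. A minimal sketch, using made-up shares rather than our actual data:

```python
def relative_change(old_share, new_share):
    """Percent change of new_share relative to old_share."""
    return (new_share - old_share) / old_share * 100.0


# Hypothetical shares of logged foods labeled "bitter" in two age groups.
bitter_young = 0.10   # assumed share for the younger group
bitter_older = 0.143  # assumed share for the older group

change = relative_change(bitter_young, bitter_older)
```

In this made-up case the share rises by only 4.3 percentage points, but relative to the starting share that is an increase of roughly 43%.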

Some findings did leave a lot more room for interpretation, however, such as the correlations between user BMI group and taste preference. This is a connection on which even the scientific literature does not agree, as the traditional logic that a diet dominated by sweet foods leads to a higher BMI is increasingly open to scrutiny [P. Togo, M. Osler, T. Sørensen, and B. Heitmann. Food intake patterns and body mass index in observational studies. International journal of obesity and related metabolic disorders: journal of the International Association for the Study of Obesity, 25(12):1741–1751, 2001]. So what does our taste data have to say about BMI? While we cannot claim that taste preferences cause BMI increases, we can get a good look at the correlation between BMI and taste preference.

The figures above show that, in our active user subset, it is more common for users in a higher BMI group to log salty and/or savory tasting foods. We also see a clear decrease in sweet taste logging as BMI group increases. You might now be picturing all the thin people snacking on cookies and ice cream in their free time, but it’s important to realize that no distinction is being made here between a “healthy” kind of sweet food and an “unhealthy” one. Thus we cannot differentiate between sweet-tooth loggers munching on apples and blueberries and those digging into molten chocolate cake. That’s certainly an effect we hope to explore more in the future as we expand this research.

Finally, we also reach a point where our data can start to address questions that no previous taste research has really been able to answer. A main advantage of having large scale app-collected data is that it provides a window into how people actually tend to behave, when they’re only entering in data for themselves. For example, one interesting question is whether or not meals tend to have different taste distributions.

While breakfast is dominated by sweet and savory tastes, we also see a relatively high level of bitter foods compared to other meals. Why bitter? Well, again, when do you think most people will be logging their teas and coffees of the day? Lunch and dinner, unsurprisingly, cater to our users’ savory sides. We also see the most spicy foods logged during lunch and dinner. Finally, snacks take the cake when it comes to sweet-tasting foods, and they also have the highest number of salty foods, from all those chips people might be munching on between meals.

Further Work

While we’ve already gained many valuable insights into user taste logging patterns, there is still much to explore and expand upon on the data science front. How do our users’ taste preferences vary by region? What kinds of distributions of unique foods do we see across each taste? Can we find a pattern in the tastes that dominate the diets of users who succeed in reaching their goals? These are a few of the many questions we hope to answer in future analysis.
