10.25394/PGS.8040866.v1
Amruthavarshini Talikoti
ESTIMATING PHENYLALANINE OF COMMERCIAL FOODS : A COMPARISON BETWEEN A MATHEMATICAL APPROACH AND A MACHINE LEARNING APPROACH
2019
Purdue University Graduate School
PKU
Phenylalanine
Machine Learning
nutrition facts label
Dietary Management
2019-05-14 17:39:58
article
https://hammer.figshare.com/articles/ESTIMATING_PHENYLALANINE_OF_COMMERCIAL_FOODS_A_COMPARISON_BETWEEN_A_MATHEMATICAL_APPROACH_AND_A_MACHINE_LEARNING_APPROACH/8040866
<p>Phenylketonuria (PKU) is an inherited metabolic
disorder affecting 1 in every 10,000 to 15,000 newborns in the United States
every year. Caused by a genetic mutation, PKU results in an excessive buildup
of the amino acid phenylalanine (Phe) in the body, leading to symptoms including
but not limited to intellectual disability, hyperactivity, psychiatric
disorders and seizures. Most PKU patients must follow a strict diet limited in
Phe. The aim of this research study is to formulate, implement and compare
techniques for Phe estimation in commercial foods using the information on the
food label (Nutrition Facts label and ordered ingredient list). Ideally, the
techniques should be both accurate and amenable to a user-friendly
implementation as a Phe calculator that would help PKU patients monitor their
dietary Phe intake.</p>
<p>The first approach to this problem is a mathematical one that comprises three
steps. The three steps were originally proposed as separate methods by Jieun Kim
in her dissertation. It was assumed that the third method, which is the most
computationally expensive, was also the most accurate. However, by performing
the three methods in sequence as three steps and combining the
results, we obtained better results than by using the third
method alone.</p>
<p>The first step makes use of the protein content of the food and Phe:protein multipliers.
The second step enumerates all the ingredients in the food and uses the minimum
and maximum Phe:protein multipliers of the ingredients along with the protein
content. The third step uses the fact that ingredients are listed in decreasing order of their
weights, which gives rise to inequality constraints. These constraints hold
assuming that there is no loss in the preparation process. The problem defined by
these constraints is solved numerically in two phases. The first phase involves
nutrient content estimation by approximating the ingredient amounts. The second
phase is a refinement of these estimates using the Simplex algorithm. The
final Phe range is obtained by performing an interval intersection of the
results of the three steps. We implemented all three steps as web applications.
Our proposed three-step method yields a high accuracy of Phe estimation (error within ±13.04 mg Phe per serving for 90% of foods).</p>
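<p>The multiplier-based bound and the final interval intersection can be illustrated with a minimal Python sketch. It is not the implementation from the thesis: the function names, the multiplier values, and the ranges standing in for the first and third steps are all hypothetical and chosen only to make the arithmetic easy to follow.</p>

```python
from typing import Iterable, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) Phe bound in mg per serving


def phe_bounds_from_multipliers(protein_g: float,
                                multipliers_mg_per_g: Sequence[float]) -> Interval:
    """Bound Phe using the smallest and largest Phe:protein multipliers
    among the listed ingredients (in the spirit of the second step)."""
    return (protein_g * min(multipliers_mg_per_g),
            protein_g * max(multipliers_mg_per_g))


def intersect(intervals: Iterable[Interval]) -> Optional[Interval]:
    """Return the overlap of all intervals, or None if they do not overlap."""
    intervals = list(intervals)
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    return (lower, upper) if lower <= upper else None


# Hypothetical inputs: 2 g protein per serving and made-up multipliers
# (mg Phe per g protein) for two ingredients.
step2 = phe_bounds_from_multipliers(2.0, [30.0, 50.0])

# Hypothetical ranges standing in for the outputs of the other two steps.
step1 = (50.0, 95.0)
step3 = (70.0, 120.0)

final_range = intersect([step1, step2, step3])
print(final_range)
```

<p>With these made-up inputs, the second-step bound is 60 to 100 mg and the intersected range is 70 to 95 mg, narrower than any single step's range. Returning <code>None</code> for non-overlapping intervals is a design choice of this sketch, not something described in the source.</p>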
<p>The above mathematical procedure is contrasted with a machine learning approach that
uses the data in an existing database as training data to infer the Phe in any
given food. Specifically, we use the K-Nearest Neighbors (K-NN) classification
method with a feature vector containing the (rounded) nutrient data. In other
words, the Phe content of the test food is a weighted average of the Phe values
of its closest neighbors, with the nutrient values used as attributes.
Four-fold cross-validation is carried out to determine the hyperparameters, and
the training is performed using the United States Department of Agriculture
(USDA) food nutrient database. Our tests indicate that this approach is not
very accurate for general foods (error within ±50 mg Phe per 100 g in about 38%
of the foods tested). However, for low-protein foods, which are typically
consumed by PKU patients, the accuracy increases significantly (error within ±50 mg Phe per 100 g in over 77% of foods).</p>
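<p>The weighted-average idea behind the K-NN estimate can be sketched as follows. This is an illustration only: the Euclidean distance, the inverse-distance weighting, the value of <code>k</code>, and the toy training rows are assumptions, since the abstract does not state which distance or weighting scheme was used.</p>

```python
import math
from typing import List, Sequence, Tuple

# Each training row: (nutrient feature vector, Phe in mg per 100 g)
TrainingRow = Tuple[Sequence[float], float]


def knn_weighted_phe(query: Sequence[float],
                     training: List[TrainingRow],
                     k: int = 3,
                     eps: float = 1e-9) -> float:
    """Estimate Phe as the inverse-distance-weighted mean of the k nearest foods."""
    # Euclidean distance from the query to every training food (assumed metric).
    nearest = sorted((math.dist(query, features), phe)
                     for features, phe in training)[:k]
    # Closer neighbors get larger weights; eps avoids division by zero.
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    weighted_sum = sum(w * phe for w, (_, phe) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Toy data with a single made-up feature (e.g. protein in g per 100 g).
toy_training = [((0.0,), 10.0), ((2.0,), 30.0), ((10.0,), 100.0)]
estimate = knn_weighted_phe((1.0,), toy_training, k=2)
print(round(estimate, 3))
```

<p>In this toy case the two nearest foods are equally distant from the query, so the estimate is simply the mean of their Phe values, 20 mg per 100 g. In practice the features would be the rounded nutrient values from the label, and <code>k</code> would be selected by cross-validation as described above.</p>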
<p>The machine learning approach is more user-friendly than the mathematical approach.
It is convenient, fast and easy to use, as it takes into account just the
nutrient information. In contrast, the mathematical method additionally takes
as input a detailed ingredient list, which is cumbersome to locate in a
food database and enter as input. However, the mathematical method has the
added advantage of providing error bounds for the Phe estimate. It is also more
accurate than the ML method. This may be because the nutrition facts alone
are not sufficient for the ML method to estimate Phe, and
additional information such as the ingredient list is required.</p>