Last time we got to the point of having an app that aggregates nutritional values for some ingredients and displays them as a % of the recommended daily amounts for the user given their age and gender. This time we'll take a critical look at our app and its shortfalls.

Free play

We'll start our search by playing with the app freely, trying out scenarios that we'd expect to work or not to work.

Cases that we'd expect not to work are easy enough to generate:

  • missing ingredient name
  • invalid ingredient name
  • invalid unit
  • invalid number
  • multiple amounts
  • multiple ingredient quantities

Cases that we'd expect to work are a bit harder. There are many possibilities. One approach is to try real recipes, fix issues and track whether the rate of new issues declines. First let's try the failure cases.

Missing ingredient name

If we enter:

4 tbsp

no results are displayed. Arguably this is correct behaviour because no ingredients can be matched. What about...?

1 potato
4 tbsp

We get matched ingredients:

  • 1 x potato
  • 4 tbsp

This is clearly wrong. Firstly, 4 tbsp is not an ingredient. Secondly, it's inconsistent with the previous result where 4 tbsp was unmatched.

We also get invalid nutrient values - all the numbers are poisoned with NaN.

It looks like unmatched lines are not being handled gracefully and are trashing the calculation. We need to filter them out before they can wreak havoc.

Invalid ingredient name

Let's try to match a non-existent ingredient:

3 cups bilge

Again no results - looks reasonable. However, add in a spud and we're in trouble again:

1 potato
3 cups bilge

As with the missing ingredient name, we get the quantity appearing as a matched ingredient and NaNs for all the nutrient values.

Invalid unit

What happens if we use a unit that isn't recognised by the classifier? How about the following?

10 parsecs zucchini

The result is actually ok... the unrecognised unit is ignored and only the number and ingredient are matched:

  • 10 x zucchini

It's not what was asked for, but we don't know what a parsec is and we didn't poison the calculation by feeding it NaNs.

Invalid number

Let's try and break the number classification:

.5 potato
1..5 potato
one potato
two potato
1. potato

This produces some incorrect matches:

  • 5 x potato
  • 5 x potato
  • 1 x potato
  • 1 x potato
  • 1 x potato

but again, they haven't poisoned the calculation. Arguably word numbers up to ten should be understood and so should decimals with no leading zero. 1..5 is plainly invalid input so we can just be thankful it didn't break anything.

Multiple amounts

Let's try the following:

2 cups 4 tbsp spread

The first amount is taken and the second is ignored:

  • 2 cup margarine

Not exactly correct but it's sensible and didn't cause any bizarre behaviour.

Multiple ingredient quantities

Finally, let's put two ingredient quantities on the same line:

1 potato 2 tbsp sugar

This produces the following match:

  • 2 tbsp potato

It's taken the first ingredient but the most specific amount, resulting in a cross-over between the two ingredients. Again though, there's no nasty contamination of the calculation and the user can easily fix the issue by editing the input.

Recipe ingredients

Next we'll try using ingredients from some recipes on the web.

2 cups cooked and drained adzuki beans
½ grated onion 
¼ cup toasted sunflower and pumpkin seeds 
1 tbsp tamari or shoyu (natural soya sauce) 
1 tbsp dried or fresh sage, coriander or herbs of your choice (leaves only) finely diced 
Pinch of sea salt and pepper (optional) 
Buckwheat flour
Organic sunflower frying oil

(https://uk.veganuary.com/recipes/azduki-bean-burgers)

This fails quite badly as the page shows the loading indicator ad infinitum...

Veebe stuck loading

Issues

We've collected enough issues for now - let's summarise and fix them before going any further with the bug hunt:

  • ingredients that could not be matched are only shown if another ingredient is present
  • ingredients that could not be matched cause the aggregation calculation to be poisoned with NaNs
  • word numbers not understood
  • decimals with no leading zero not understood
  • the above Veganuary recipe causes the loading indicator to be displayed forever

Inconsistent mismatch reporting

The source of this bug is clear from the part of the render method that outputs the matches and nutrient values (details elided):

          {this.results.aggregated.length > 0 ? <div>
            <h2>Matched Ingredients</h2>
            ...
            
            {this.nutrients.map(n => this.buildNutrientListItem(n, 1))}
          </div> : null}

Both the matched ingredients and the aggregated nutrients are only displayed if this.results.aggregated is not empty. We can see from the aggregation calculation that if nutrientCounts is empty then the list of aggregated nutrients will be empty also:

    let aggregated: INutrientValue[] = [];
    for (let [n, c] of nutrientCounts.entries())
      if (c == componentCount && fdcNames.has(n))
        aggregated.push({name: n, value: 0.0, unit: nutrientUnits.get(n)});

And if there are no matched FDC components then nutrientCounts will be empty:

    let componentCount = 0;
    let nutrientCounts: Map<string, number> = new Map();
    let nutrientUnits: Map<string, string> = new Map();
    for (let iq of classified)
      for (let c of iq.components) {
        componentCount++;
        for (let n of c.foodNutrients) {
          nutrientCounts.set(n.name, nutrientCounts.get(n.name) ? nutrientCounts.get(n.name) + 1 : 1);
          nutrientUnits.set(n.name, n.unitName);
        }
      }
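
The chain from "no matched components" to "nothing aggregated" can be reproduced outside the app. Below is a minimal Python sketch of the same counting logic, assuming plain dicts in place of the app's TypeScript objects and leaving out the fdcNames check. The name shared_nutrients is hypothetical.

```python
from collections import Counter


def shared_nutrients(classified):
    """Return the names of nutrients reported by every matched component."""
    component_count = 0
    nutrient_counts = Counter()
    for iq in classified:
        for component in iq["components"]:
            component_count += 1
            for nutrient in component["foodNutrients"]:
                nutrient_counts[nutrient["name"]] += 1
    # A nutrient is kept only if every component reports it
    return [name for name, count in nutrient_counts.items()
            if count == component_count]


# An unmatched line has no components, so nothing is counted
unmatched_only = [{"components": []}]
print(shared_nutrients(unmatched_only))  # []
```

With an empty result here, a render condition based on the aggregated list hides the ingredient list too.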

The solution is to split the two parts to show the matched ingredients if the list of matched ingredients is not empty and show the aggregated nutrients if the list of aggregated nutrients is not empty:

          {this.results.classified.length > 0 ? <div>
            <h2>Matched Ingredients</h2>
            ...
          </div> : null}
          
          {this.results.aggregated.length > 0 ? <div>
            {this.nutrients.map(n => this.buildNutrientListItem(n, 1))}
          </div> : null}

This now consistently reports all matches - even partial matches where the ingredient wasn't matched but the quantity was. Aggregated nutrients are only shown where at least one component was matched.

Poisoned calculation

Looking at the aggregation calculation, there is clearly a bug on this line:

const nutrientDensity = sum / iq.components.length;

Where the ingredient is unmatched, iq.components.length is zero, so the above line divides by zero.

The numerator (sum) is also zero in this situation, so the result is 0 / 0, which evaluates to NaN. Since the numerator is zero anyway, it's safe to limit the minimum length to 1:

const nutrientDensity = sum / Math.max(iq.components.length, 1);

We also need to avoid including unmatched ingredients in the aggregation calculation as these will be missing nutrition values and thus propagate NaNs. We can do this with a simple filter at the top of the calcAggregated method:

  calcAggregated(classifiedAll: IIngredientQuantityEnriched[]): INutrientValue[] {

    // Only aggregate ingredients that we successfully matched
    let classified = classifiedAll.filter(iq => iq.componentIds.length > 0);

    // ...
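
The same two guards carry over to other languages. In Python, 0 / 0 raises ZeroDivisionError rather than producing NaN, but the fix has the same shape. Here is a rough sketch, assuming each ingredient quantity is a dict with a componentIds list. The helpers mean_density and matched_only are hypothetical names, not part of the app.

```python
def mean_density(values):
    """Average the values, treating an empty list as zero instead of dividing by zero."""
    return sum(values) / max(len(values), 1)


def matched_only(classified_all):
    """Keep only the ingredient quantities that matched at least one component."""
    return [iq for iq in classified_all if len(iq["componentIds"]) > 0]


lines = [
    {"name": "potato", "componentIds": ["a", "b"]},
    {"name": "bilge", "componentIds": []},
]
print([iq["name"] for iq in matched_only(lines)])  # ['potato']
print(mean_density([]))  # 0.0
```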

Decimals with no leading zero not understood

Fixing this is simply a case of updating our number rule to accept no leading zero by adding a new case \.\d+:

	rule(r'{(?<number>(?:\d* )?\d+ ?\/ ?\d+|\d*\s?[½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞]|\d+(\.\d+)?|\.\d+)}', '<<number>>'),

Word numbers not understood

Similarly, the issue here is that the only word numbers understood are a and an:

	rule(r'{(?<number>an?)}', '<<number>>'),

We need to add in the additional word numbers:

	rule(r'{(?<number>an?|one|two|three|four|five|six|seven|eight|nine|ten)}', '<<number>>'),
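
To check what the two rule changes accept, the alternations can be tried on individual tokens with Python's re module. This is only a sketch: it copies the alternatives from the rules above, but the rule helper, the surrounding braces and the named-group syntax belong to the classifier and are not reproduced here. The accepts function and the constant names are hypothetical.

```python
import re

FRACTIONS = '½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞'

# Number alternatives before and after adding the \.\d+ case
OLD_NUMBER = r'(?:\d* )?\d+ ?/ ?\d+|\d*\s?[' + FRACTIONS + r']|\d+(?:\.\d+)?'
NEW_NUMBER = OLD_NUMBER + r'|\.\d+'

# Word-number alternatives before and after the change
OLD_WORDS = r'an?'
NEW_WORDS = r'an?|one|two|three|four|five|six|seven|eight|nine|ten'


def accepts(pattern, token):
    """True if the whole token matches the pattern."""
    return re.fullmatch(pattern, token) is not None


print(accepts(OLD_NUMBER, '.5'), accepts(NEW_NUMBER, '.5'))  # False True
print(accepts(OLD_WORDS, 'two'), accepts(NEW_WORDS, 'two'))  # False True
print(accepts(NEW_NUMBER, '1..5'))  # False
```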

Loading forever

Examining the console output we can see that an exception has occurred because the server response did not contain the necessary CORS headers to allow access to the data:

Veebe CORS failure

That's a bit strange - why does this recipe cause problems with the CORS headers? The answer is that our error handling on the server isn't very good and an exception has caused it to bypass our code that applies the headers and return a default error response. So this tells us nothing about the underlying failure but we should fix it to avoid confusion.

Let's bung a try-except block around the ingredient quantity parsing and if an exception occurs we'll return an empty ingredient quantity with an error attached:

def parse_ingredient_quantity(iq_str, ingredients):
  try:
    # ...

  except Exception:
    # Any parsing failure becomes an empty ingredient quantity with an error attached
    return {
      'name': iq_str,
      'number': 0,
      'unit': 'default',
      'componentIds': [],
      'error': 'failed to parse'
    }
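
The pattern can be tried in isolation. The sketch below wraps an arbitrary parser and also logs the traceback, so the underlying failure is not lost when the error record is returned. Both parse_or_error and strict_parse are hypothetical stand-ins, not the real classifier code.

```python
import logging

logger = logging.getLogger(__name__)


def parse_or_error(iq_str, parse):
    """Call parse(iq_str); on failure return an error record instead of raising."""
    try:
        return parse(iq_str)
    except Exception:
        # Record the traceback so the cause can still be investigated
        logger.exception("failed to parse %r", iq_str)
        return {
            'name': iq_str,
            'number': 0,
            'unit': 'default',
            'componentIds': [],
            'error': 'failed to parse',
        }


def strict_parse(iq_str):
    """Stand-in parser (an assumption): '<number> <name>' with a plain float number."""
    number, name = iq_str.split(' ', 1)
    return {'name': name, 'number': float(number), 'unit': 'default',
            'componentIds': ['1']}


print(parse_or_error('1 potato', strict_parse)['number'])  # 1.0
print(parse_or_error('½ onion', strict_parse)['error'])  # failed to parse
```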

Now we just need to modify our app to handle the error. We'll highlight ingredients with an error in red and output the error text instead of the number, unit and ingredient:

              <ion-item color={iq.componentIds.length == 0 ? 'danger' : ''} onClick={() => this.toggleExpanded(iq)}>{iq.error ? <span>{iq.error}</span> : <span>{iq.number} {iq.unit == 'default' ? 'x' : iq.unit} {iq.name}</span>}</ion-item>

Let's also output (unknown ingredient) if there were no component foods found:

              <ion-item color={iq.componentIds.length == 0 ? 'danger' : ''} onClick={() => this.toggleExpanded(iq)}>{iq.error ? <span>{iq.error}</span> : <span>{iq.number} {iq.unit == 'default' ? 'x' : iq.unit} {iq.componentIds.length > 0 ? iq.name : '(unknown ingredient)'}</span>}</ion-item>

We'll change the heading from <h2>Matched Ingredients</h2> to <h2>Ingredients</h2> as any number of them may not be matched. Now we see the following:

Veebe with error handling

Having handled the errors better we can now investigate the source of the problem. This is most easily done by running locally in the debugger. Doing so produces the following exception:

ingredient_classifier.py", line 185, in parse_ingredient_quantity
    'number': float(number),
ValueError: could not convert string to float: '½'

We are naively converting our number string to a float, which only works for a subset of the formats we have allowed in our regex. We need a conversion function that can handle fraction characters, slash fractions and word numbers:

number_float_map = {
	'½': 1/2,
	'⅓': 1/3,
	'⅔': 2/3,
	'¼': 1/4,
	'¾': 3/4,
	'⅕': 1/5,
	'⅖': 2/5,
	'⅗': 3/5,
	'⅘': 4/5,
	'⅙': 1/6,
	'⅚': 5/6,
	'⅛': 1/8,
	'⅜': 3/8,
	'⅝': 5/8,
	'⅞': 7/8,
	'a': 1,
	'an': 1,
	'one': 1,
	'two': 2,
	'three': 3,
	'four': 4,
	'five': 5,
	'six': 6,
	'seven': 7,
	'eight': 8,
	'nine': 9,
	'ten': 10}
	
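def number_to_float(n):
	# Plain integers and decimals, including '.5' and '1.'
	if re.fullmatch(r'\d+\.\d*|\.\d+|\d+', n):
		return float(n)
	
	# Slash fractions with an optional whole part, e.g. '1/2' or '1 1/2'.
	# The whole part must be followed by a space, otherwise '11/2' would be
	# read as 1 + 1/2.
	m = re.fullmatch(r'(?:(\d+) )?(\d+) ?/ ?(\d+)', n)
		
	if m:
		a = float(m.group(1) or '0')
		b = float(m.group(2))
		c = float(m.group(3))
		return a + b / c
		
	# Fraction characters and word numbers
	if n in number_float_map:
		return number_float_map[n]
		
	raise ValueError('Unrecognised number format: ' + repr(n))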
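
One form that the number rule appears to accept but the lookup table does not cover is a whole number followed by a fraction character, such as 1½. A possible extension is sketched below. It uses unicodedata.numeric from the standard library instead of a hand-written table. The helper mixed_fraction_to_float is hypothetical and not part of the original code.

```python
import re
import unicodedata


def mixed_fraction_to_float(n):
    """Convert '½', '1½' or '1 ½' to a float; return None for any other form."""
    m = re.fullmatch(r'(\d*)\s?(\D)', n)
    if not m:
        return None
    # unicodedata.numeric returns the default (None here) for characters
    # that have no numeric value, such as letters
    fraction = unicodedata.numeric(m.group(2), None)
    if fraction is None or not 0 < fraction < 1:
        return None
    return int(m.group(1) or 0) + fraction


print(mixed_fraction_to_float('1½'))  # 1.5
print(mixed_fraction_to_float('x'))   # None
```

A caller could try this before falling through to the error at the end of number_to_float.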

We can then replace the float(number) with number_to_float(number) in parse_ingredient_quantity:

    for name in names:
      return {
        'name': name,
        'number': number_to_float(number),
        'unit': unit,
        'componentIds': ingredients[name].component_ids,
        'unitMass': ingredients[name].unit_mass,
        'density': ingredients[name].density
      }

The only remaining issue is that the last ingredient was not matched:

Organic sunflower frying oil

We'd expect it to match this entry in our ingredient map:

	'sunflower-oil': ing(['sunflower oil'], ['172338', '171017', '171025', '172328'], 0.0, 0.9129789081583796),

The trouble is it's trying to match sunflower oil as a complete phrase. We need to rework our classify_ingredients function to match all the keywords separately. Firstly, let's rename keywords to key_phrases in the ing class. We'll use the term 'key phrase' for a group of 'key words':

class ing:
	def __init__(self, key_phrases, component_ids, unit_mass, density):
		self.key_phrases = key_phrases
		self.component_ids = component_ids
		self.unit_mass = unit_mass
		self.density = density

Now we'll update classify_ingredients. Firstly we need to change it to match a key phrase if all of the key words in it are found:

# Substitution helpers
def classify_ingredients(s, ingredients):
	candidates = []
	names = {}
	for name, ingredient in ingredients.items():
		for key_phrase in ingredient.key_phrases:
			match = True
			for key_word in key_phrase.split(' '):
				if key_word not in s:
					match = False
					break
			if match:
				candidates.append(key_phrase)
				names[key_phrase] = name
				
	sorted_candidates = sorted(candidates, key=len)
	
	if len(sorted_candidates) == 0:
		return s
		
	key_phrase = sorted_candidates[-1]
	name = names[key_phrase]
    
	classified_key_phrase = '<ingredient><' + name + '>' + key_phrase + '</' + name + '></ingredient>'

Then we try to substitute the classified key phrase for the original key phrase:

	if key_phrase in s:
		return s.replace(key_phrase, classified_key_phrase)

If it doesn't exist as a whole phrase in the original string then we strip out words that match the key words in the key phrase and append the classified key phrase to the end:

	for key_word in key_phrase.split(' '):
		s = s.replace(key_word, '')
		
	return s + ' ' + classified_key_phrase
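
One thing to watch with the key_word not in s test is that it is a substring check, so a key word such as oil would also be found inside boiled. If that turns out to cause false matches, a whole-word check is one option. Here is a sketch, assuming the input has already been lower-cased. The helpers contains_word and phrase_matches are hypothetical and not part of the classifier.

```python
import re


def contains_word(text, word):
    """True if word appears in text as a whole word rather than inside another word."""
    return re.search(r'\b' + re.escape(word) + r'\b', text) is not None


def phrase_matches(text, key_phrase):
    """True if every key word of the key phrase appears in text as a whole word."""
    return all(contains_word(text, w) for w in key_phrase.split())


print(phrase_matches('organic sunflower frying oil', 'sunflower oil'))  # True
print('oil' in 'boiled sunflower seeds')                                # True
print(phrase_matches('boiled sunflower seeds', 'sunflower oil'))        # False
```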

Next time

We'll improve the ingredient data.