Last time we improved our ingredient data. This time we’ll carry on with that by calculating more accurate density values.

FoodData Central gives the mass in grams for particular portions of food in the file `food_portion.csv`. Where the portion is given as a known volumetric unit, we can use the mass and volume to determine the density.

Let’s begin by reading in the food portion data and computing the density for each row:

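``````
import csv

# Map from food ID (fdc_id) to density in g/ml
density = {}
with open('food_portion.csv') as f:
    rows = csv.DictReader(f)
    for r in rows:
        d = calc_single_density(
            r['amount'],
            r['measure_unit_id'],
            r['modifier'],
            r['gram_weight'])
        # A density of 0.0 means no volumetric unit was recognised
        if d > 0.0:
            density[r['fdc_id']] = d
``````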

Here we read in four columns and pass them to `calc_single_density`:

• `amount` - the number of units contained in the portion
• `measure_unit_id` - the ID of the unit used or `'9999'` if the unit is specified as free text
• `modifier` - free text unit description, if applicable
• `gram_weight` - the mass in grams of the portion
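
To make the column layout concrete, here is a small sketch that parses two made-up rows with `csv.DictReader`. The rows and their values are invented for illustration and are not taken from the real file. The point is only that every field arrives as a string, which is why the code that follows converts values with `float`.

```python
import csv
import io

# Two invented rows in the shape described above (not real data).
sample = io.StringIO(
    "fdc_id,amount,measure_unit_id,modifier,gram_weight\n"
    "111,1,1000,,240\n"
    "222,2,9999,tbsp,30\n"
)

sample_rows = list(csv.DictReader(sample))

for r in sample_rows:
    # Every value is a str, including the numeric-looking ones.
    print(r['fdc_id'], repr(r['amount']), repr(r['modifier']), repr(r['gram_weight']))
```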

From these values we generate a density value by converting the portion unit to `ml`, multiplying by `amount` and computing `g / ml`:

``````
def calc_single_density(amount, unit_id, modifier, grams):
    unit = find_unit(unit_id, modifier)
    # No recognised volumetric unit: return 0.0 so the caller skips the row
    if unit is None:
        return 0.0
    return float(grams) / (clean_amount(amount) * unit['ml'])
``````
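
As a quick check of the arithmetic, here is a minimal sketch with a hypothetical portion of 2 tbsp weighing 30 g. The helper name `density_g_per_ml` and the portion itself are assumptions made for this example; only the tablespoon volume comes from the post.

```python
TBSP_ML = 14.7868  # volume of one tablespoon in ml, as used in this post

def density_g_per_ml(grams, amount, unit_ml):
    """Mass divided by total volume (amount x volume of one unit)."""
    return grams / (amount * unit_ml)

# Hypothetical portion: 2 tbsp weighing 30 g, i.e. 30 g / 29.5736 ml
example = density_g_per_ml(30.0, 2.0, TBSP_ML)
print(round(example, 3))
```

The result is a little over 1 g/ml, so this made-up ingredient would be slightly denser than water.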

The `clean_amount` function simply handles bad input values (zero or empty string), coercing them to sensible values:

``````
def clean_amount(amount):
    # Treat a missing amount as a single unit
    a = float(amount) if amount != '' else 1.0
    # Treat zero (or negative) amounts as a single unit too
    return a if a > 0.0 else 1.0
``````
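
A few sample inputs show the coercion in action. This sketch repeats the function so that it runs on its own; the input strings are just illustrative.

```python
def clean_amount(amount):
    a = float(amount) if amount != '' else 1.0
    return a if a > 0.0 else 1.0

# Empty, zero and negative amounts all fall back to 1.0;
# anything positive passes through unchanged.
for raw in ['', '0', '-1', '2.5']:
    print(repr(raw), '->', clean_amount(raw))
```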

The `find_unit` function determines the unit that the portion is denominated in by first checking for a free text description and otherwise looking at the unit ID. It then returns a dictionary for that unit containing its name and volume in `ml`:

``````
units = {
    '1000': {
        'name': 'cup',
        'ml': 236.588
    },
    '1001': {
        'name': 'tbsp',
        'ml': 14.7868
    },
    '1002': {
        'name': 'tsp',
        'ml': 4.92892
    },
    '1003': {
        'name': 'l',
        'ml': 1000.0
    },
    '1004': {
        'name': 'ml',
        'ml': 1.0
    },
    '1005': {
        'name': 'cu in',
        'ml': 16.3871
    },
    '1006': {
        'name': 'cc',
        'ml': 1.0
    },
    '1007': {
        'name': 'gal',
        'ml': 3785.41
    },
    '1008': {
        'name': 'pt',
        'ml': 473.176
    },
    '1009': {
        'name': 'fl oz',
        'ml': 29.5735
    }
}

def find_unit(unit_id, modifier):
    if unit_id == '9999' and modifier != '':
        for unit in units.values():
            if modifier.startswith(unit['name']):
                return unit
    elif unit_id in units:
        return units[unit_id]
    return None
``````
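
One thing to watch with prefix matching is that a short unit name such as `l` matches any free text that merely begins with that letter. The sketch below contrasts the prefix match with a stricter, hypothetical variant that requires the unit name (optionally pluralised) to end at a word boundary. The function names, the cut-down table and the sample modifiers are all assumptions for illustration; this is not what the script above does.

```python
import re

# A cut-down copy of the units table, enough to show the behaviour.
demo_units = {
    '1000': {'name': 'cup', 'ml': 236.588},
    '1003': {'name': 'l', 'ml': 1000.0},
}

def find_unit_prefix(modifier):
    """Prefix match, mirroring the free text branch of find_unit."""
    for unit in demo_units.values():
        if modifier.startswith(unit['name']):
            return unit
    return None

def find_unit_strict(modifier):
    """Hypothetical variant: the name, optionally plural, must end at a word boundary."""
    for unit in demo_units.values():
        if re.match(re.escape(unit['name']) + r's?\b', modifier):
            return unit
    return None

for text in ['cup, chopped', 'cups', 'large']:
    print(repr(text), find_unit_prefix(text), find_unit_strict(text))
```

Whether the stricter match is worth having depends on what the free text in the data actually looks like, so it is worth inspecting the distinct `modifier` values before changing anything.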

Now that we have a lookup table from ingredient ID to density, we can write a function to take a list of component food IDs and compute their mean density:

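``````
import statistics

def calc_density(ids):
    # Only IDs with a known density contribute to the mean
    densities = [density[i] for i in ids if i in density]
    # Fall back to 1.0 g/ml when none of the IDs has a known density
    return statistics.mean(densities) if len(densities) > 0 else 1.0
``````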
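
Here is a small, self-contained sketch of that behaviour with an invented lookup table. The name `calc_density_demo`, the IDs and the density values are assumptions for illustration; the table is passed in as a parameter so the example does not depend on the global `density` dictionary.

```python
import statistics

# Hypothetical lookup table; the IDs and values are invented.
demo_density = {'1': 1.0, '2': 0.5}

def calc_density_demo(ids, table):
    found = [table[i] for i in ids if i in table]
    return statistics.mean(found) if found else 1.0

both_known = calc_density_demo('1+2'.split('+'), demo_density)  # mean of 1.0 and 0.5
none_known = calc_density_demo(['3'], demo_density)             # falls back to 1.0
print(both_known, none_known)
```

The fallback of 1.0 g/ml is roughly the density of water, and IDs missing from the table are simply left out of the mean.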

With this in place it is now trivial to knock up a simple script to read rows from our ingredient CSV, calculate the mean density and write out the rows to a new ingredient CSV file:

``````
import csv

with open('manual-ingredients.csv', encoding='utf-8') as fi:
    rows_in = csv.DictReader(fi)

    # newline='' stops the csv module writing extra blank lines on some platforms
    with open('ingredients-with-density.csv', mode='w', encoding='utf-8', newline='') as fo:
        rows_out = csv.writer(fo)

        # header row
        rows_out.writerow([
            'keywords',
            'component_ids',
            'density',
            'unit_mass',
            'name'])

        # data rows
        for r in rows_in:
            # skip rows that have no keywords
            if r['keywords'].strip() == '':
                continue
            ids = r['component_ids'].split('+')
            rows_out.writerow([
                r['keywords'],
                r['component_ids'],
                calc_density(ids),
                r['unit_mass'],
                r['name']])

print('done.')
``````

Running this and feeding the new data into our ingredient classifier produces the following output:

``````
2 courgettes (zucchini)
number: 2.0, unit: default, names: ['zucchini'], grams: 392.0, Protein: 10.623199999999999, Fat: 1.568, Carbohydrate: 12.191199999999998

1 carrot
number: 1.0, unit: default, names: ['carrot'], grams: 61.0, Protein: 0.4148, Fat: 0.1780090909090909, Carbohydrate: 4.404754545454545

number: 1.0, unit: default, names: ['avocado'], grams: 136.0, Protein: 2.8061333333333334, Fat: 18.19226666666667, Carbohydrate: 11.3288

1 bunch basil
number: 1.0, unit: default, names: ['basil'], grams: 0.0, Protein: 0.0, Fat: 0.0, Carbohydrate: 0.0

1 tbsp lemon juice
number: 1.0, unit: tbsp, names: ['lemon-juice'], grams: 0.8523936, Protein: 0.0029833775999999995, Fat: 0.00204574464, Carbohydrate: 0.058815158400000005

2 tbsp nutritional yeast
number: 2.0, unit: tbsp, names: ['yeast'], grams: 43.2342987916217, Protein: 10.324350551439261, Fat: 0.3891086891245953, Carbohydrate: 8.828443813249152

10 olives, sliced
number: 10.0, unit: default, names: [], grams: 0.0, Protein: 0.0, Fat: 0.0, Carbohydrate: 0.0

4 garlic cloves, roasted
number: 4.0, unit: default, names: ['garlic'], grams: 12.0, Protein: 0.7632000000000001, Fat: 0.06, Carbohydrate: 3.9672

2 tomatoes, roasted
number: 2.0, unit: default, names: ['tomato'], grams: 246.0, Protein: 2.4354, Fat: 1.190025, Carbohydrate: 12.825825000000002

Pinch of chilli powder or smoked paprika
number: 1.0, unit: pinch, names: ['chili-powder'], grams: 0.4003137933832878, Protein: 0.05388223658939054, Fat: 0.057164809695133496, Carbohydrate: 0.19895595531149407

Total mass: 891.4870061850049
Protein: 27.42394949896199
Fat: 21.636620001035485
Carbohydrate: 53.80399447241519
``````

Next time

We’ll think about how to implement our ingredient classifier as a web service.