Last time we improved our ingredient data. This time we’ll carry on with that by calculating more accurate density values.
FoodData Central gives the mass in grams for particular portions of food in the file food_portion.csv. Where the portion is given as a known volumetric unit, we can use the mass and volume to determine the density.
Let’s begin by reading in the food portion data and computing the density for each row:
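import csv

# Lookup table from food ID (fdc_id) to density in g/ml
density = {}
with open('food_portion.csv') as f:
    rows = csv.DictReader(f)
    for r in rows:
        d = calc_single_density(
            r['amount'],
            r['measure_unit_id'],
            r['modifier'],
            r['gram_weight'])
        # a density of 0.0 means the portion was not a known volumetric unit
        if d > 0.0:
            density[r['fdc_id']] = d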
Here we read in four columns and pass them to calc_single_density:
- amount - the number of units contained in the portion
- measure_unit_id - the ID of the unit used, or '9999' if the unit is specified as free text
- modifier - free text unit description, if applicable
- gram_weight - the mass in grams of the portion
From these values we generate a density value by converting the portion unit to ml, multiplying by amount and computing g / ml:
def calc_single_density(amount, unit_id, modifier, grams):
    unit = find_unit(unit_id, modifier)
    if unit is None:
        # not a volumetric unit we recognise
        return 0.0
    return float(grams) / (clean_amount(amount) * unit['ml'])
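As a worked example of the arithmetic: a portion of 1 cup (236.588 ml) weighing 240 g gives 240 / 236.588, or roughly 1.014 g/ml. The sketch below repeats that calculation in isolation. The 240 g figure is made up for illustration, and the function name density_g_per_ml is hypothetical rather than part of the script.

```python
CUP_ML = 236.588  # US cup in millilitres, the value this post uses for a cup

def density_g_per_ml(grams, amount, unit_ml):
    """Density = mass / volume, where volume = amount * unit size in ml."""
    return grams / (amount * unit_ml)

# Illustrative numbers only: 1 cup weighing 240 g
print(round(density_g_per_ml(240.0, 1.0, CUP_ML), 3))
# Two cups weighing 480 g describe the same density
print(round(density_g_per_ml(480.0, 2.0, CUP_ML), 3))
```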
The clean_amount function simply handles bad input values (zero, negative or empty string), coercing them to a default of 1.0:
def clean_amount(amount):
    a = float(amount) if amount != '' else 1.0
    return a if a > 0.0 else 1.0
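To make the coercion concrete, here is a small self-contained check of how the function treats a few inputs. The function is repeated so the snippet runs on its own, and the input values are arbitrary.

```python
def clean_amount(amount):
    # Empty string means "no amount given": treat it as one unit
    a = float(amount) if amount != '' else 1.0
    # Zero or negative amounts are also replaced by one unit
    return a if a > 0.0 else 1.0

for raw in ['', '0', '0.5', '2', '-1']:
    print(repr(raw), '->', clean_amount(raw))
```

Note that non-numeric text such as 'abc' is not handled here: float() would raise a ValueError.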
The find_unit function determines the unit that the portion is denominated in by first checking for a free text description and otherwise looking at the unit ID. It then returns a dictionary for that unit containing its name and volume in ml:
units = {
    '1000': {'name': 'cup', 'ml': 236.588},
    '1001': {'name': 'tbsp', 'ml': 14.7868},
    '1002': {'name': 'tsp', 'ml': 4.92892},
    '1003': {'name': 'l', 'ml': 1000.0},
    '1004': {'name': 'ml', 'ml': 1.0},
    '1005': {'name': 'cu in', 'ml': 16.3871},
    '1006': {'name': 'cc', 'ml': 1.0},
    '1007': {'name': 'gal', 'ml': 3785.41},
    '1008': {'name': 'pt', 'ml': 473.176},
    '1009': {'name': 'fl oz', 'ml': 29.5735},
}
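def find_unit(unit_id, modifier):
    if unit_id == '9999' and modifier != '':
        # free text unit: look for a known unit name at the start of the text
        for unit in units.values():
            if modifier.startswith(unit['name']):
                return unit
    elif unit_id in units:
        return units[unit_id]
    # no volumetric unit found
    return None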
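One caveat with prefix matching: startswith is true for any modifier that merely begins with the same letters as a unit name, so a free text modifier such as 'large' begins with 'l' and would be treated as litres. A possible refinement, sketched here as an assumption rather than the approach this post takes, is to require the unit name to be a whole word at the start of the modifier. The name find_unit_by_word and the trimmed-down units table are hypothetical.

```python
import re

# Trimmed-down copy of the units table, enough to show the idea
units = {
    '1000': {'name': 'cup', 'ml': 236.588},
    '1003': {'name': 'l', 'ml': 1000.0},
    '1009': {'name': 'fl oz', 'ml': 29.5735},
}

def find_unit_by_word(modifier):
    """Return the unit whose name is a whole word at the start of modifier.

    Hypothetical variant of find_unit: an optional plural 's' is allowed,
    and the name must end at a word boundary.
    """
    text = modifier.strip().lower()
    for unit in units.values():
        if re.match(re.escape(unit['name']) + r's?\b', text):
            return unit
    return None

print(find_unit_by_word('cup, chopped'))  # matches 'cup'
print(find_unit_by_word('cups'))          # plural still matches 'cup'
print(find_unit_by_word('large'))         # no unit: 'l' is not a whole word here
```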
Now that we have a lookup table from ingredient ID to density, we can write a function to take a list of component food IDs and compute their mean density, falling back to 1.0 if none of the IDs has a known density:
import statistics

def calc_density(ids):
    # keep only the component foods we have a density for
    densities = [density[fdc_id] for fdc_id in ids if fdc_id in density]
    return statistics.mean(densities) if len(densities) > 0 else 1.0
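For example, with a made-up lookup table, IDs that have no density are skipped, and the fallback of 1.0 g/ml (roughly the density of water) applies only when none of the IDs are known. The snippet repeats calc_density so it runs on its own; the IDs and densities are invented for illustration.

```python
import statistics

# Invented IDs and densities, for illustration only
density = {'111': 0.5, '222': 1.0}

def calc_density(ids):
    densities = [density[i] for i in ids if i in density]
    return statistics.mean(densities) if densities else 1.0

print(calc_density(['111', '222']))  # mean of 0.5 and 1.0
print(calc_density(['111', '999']))  # '999' is unknown, so only 0.5 is used
print(calc_density(['999']))         # nothing known: falls back to 1.0
```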
With this in place it is now trivial to knock up a simple script to read rows from our ingredient csv, calculate the mean density and write out the rows to a new ingredient csv file:
with open('manual-ingredients.csv', encoding='utf-8') as fi:
    rows_in = csv.DictReader(fi)
    # newline='' stops the csv module writing extra blank lines on some platforms
    with open('ingredients-with-density.csv', mode='w', encoding='utf-8', newline='') as fo:
        rows_out = csv.writer(fo)
        # header row
        rows_out.writerow([
            'keywords',
            'component_ids',
            'density',
            'unit_mass',
            'name'])
        # data rows
        for r in rows_in:
            # skip rows with no keywords
            if r['keywords'].strip() == '':
                continue
            ids = r['component_ids'].split('+')
            rows_out.writerow([
                r['keywords'],
                r['component_ids'],
                calc_density(ids),
                r['unit_mass'],
                r['name']])
print('done.')
Running this and feeding the new data into our ingredient classifier produces the following output:
2 courgettes (zucchini)
number: 2.0, unit: default, names: ['zucchini'], grams: 392.0, Protein: 10.623199999999999, Fat: 1.568, Carbohydrate: 12.191199999999998
1 carrot
number: 1.0, unit: default, names: ['carrot'], grams: 61.0, Protein: 0.4148, Fat: 0.1780090909090909, Carbohydrate: 4.404754545454545
1 avocado
number: 1.0, unit: default, names: ['avocado'], grams: 136.0, Protein: 2.8061333333333334, Fat: 18.19226666666667, Carbohydrate: 11.3288
1 bunch basil
number: 1.0, unit: default, names: ['basil'], grams: 0.0, Protein: 0.0, Fat: 0.0, Carbohydrate: 0.0
1 tbsp lemon juice
number: 1.0, unit: tbsp, names: ['lemon-juice'], grams: 0.8523936, Protein: 0.0029833775999999995, Fat: 0.00204574464, Carbohydrate: 0.058815158400000005
2 tbsp nutritional yeast
number: 2.0, unit: tbsp, names: ['yeast'], grams: 43.2342987916217, Protein: 10.324350551439261, Fat: 0.3891086891245953, Carbohydrate: 8.828443813249152
10 olives, sliced
number: 10.0, unit: default, names: [], grams: 0.0, Protein: 0.0, Fat: 0.0, Carbohydrate: 0.0
4 garlic cloves, roasted
number: 4.0, unit: default, names: ['garlic'], grams: 12.0, Protein: 0.7632000000000001, Fat: 0.06, Carbohydrate: 3.9672
2 tomatoes, roasted
number: 2.0, unit: default, names: ['tomato'], grams: 246.0, Protein: 2.4354, Fat: 1.190025, Carbohydrate: 12.825825000000002
Pinch of chilli powder or smoked paprika
number: 1.0, unit: pinch, names: ['chili-powder'], grams: 0.4003137933832878, Protein: 0.05388223658939054, Fat: 0.057164809695133496, Carbohydrate: 0.19895595531149407
Total mass: 891.4870061850049
Protein: 27.42394949896199
Fat: 21.636620001035485
Carbohydrate: 53.80399447241519
Next time
We’ll think about how to implement our ingredient classifier as a web service.