Last time we improved our ingredient data. This time we’ll carry on with that by calculating more accurate density values.

FoodData Central gives the mass in grams for particular portions of food in the file food_portion.csv. Where the portion is given as a known volumetric unit, we can use the mass and volume to determine the density.

Let’s begin by reading in the food portion data and computing the density for each row:

import csv

density = {}
with open('food_portion.csv') as f:
	rows = csv.DictReader(f)
	for r in rows:
		d = calc_single_density(
			r['amount'],
			r['measure_unit_id'],
			r['modifier'],
			r['gram_weight'])
		if d > 0.0:
			density[r['fdc_id']] = d

Here we read in four columns and pass them to calc_single_density:

  • amount - the number of units contained in the portion
  • measure_unit_id - the ID of the unit used or '9999' if the unit is specified as free text
  • modifier - free text unit description, if applicable
  • gram_weight - the mass in grams of the portion
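To make the shape of these columns concrete, here is a minimal sketch that reads two rows from an in-memory CSV instead of the real file. The rows are invented for illustration and the header is cut down to the columns used in this post; the real food_portion.csv has more.

```python
import csv
import io

# Invented sample rows; not real FoodData Central data.
sample = io.StringIO(
    'fdc_id,amount,measure_unit_id,modifier,gram_weight\n'
    '111,1,1000,,240\n'
    '222,2,9999,"tbsp, heaped",30\n'
)

rows = list(csv.DictReader(sample))
for r in rows:
    # DictReader yields every field as a string, including the numeric ones.
    print(r['fdc_id'], repr(r['amount']), repr(r['measure_unit_id']),
          repr(r['modifier']), repr(r['gram_weight']))
```

Note that an empty modifier arrives as an empty string and the numbers arrive as text, which is why the functions below convert with float() and check for ''.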

From these values we generate a density value by converting the portion unit to ml, multiplying by amount to get the volume of the portion, and dividing the mass by that volume to get g / ml:

def calc_single_density(amount, unit_id, modifier, grams):
	unit = find_unit(unit_id, modifier)
	return 0.0 if unit is None else float(grams) / (clean_amount(amount) * unit['ml'])
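As a worked example of the arithmetic, the sketch below computes the density for a hypothetical portion of half a cup weighing 120 g. The helper name density_g_per_ml and the portion itself are made up for illustration; only the cup volume comes from the units table in this post.

```python
# Volume of one cup in millilitres, as used in the units table in this post.
CUP_ML = 236.588

def density_g_per_ml(grams, amount, unit_ml):
    """Mass divided by total volume, where volume = amount * unit volume."""
    return grams / (amount * unit_ml)

# Hypothetical portion: 0.5 cup weighing 120 g.
example = density_g_per_ml(120.0, 0.5, CUP_ML)
print(round(example, 3))
```

The result comes out slightly above 1 g / ml, which is what you would expect for a portion a little denser than water.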

The clean_amount function simply handles bad input values (zero or an empty string), coercing them to 1.0:

def clean_amount(amount):
	a = float(amount) if not amount == '' else 1.0
	return a if a > 0.0 else 1.0
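A quick way to see what this does is to run it over a few representative inputs. The function is repeated here so the snippet runs on its own; the input values are just examples.

```python
def clean_amount(amount):
    a = float(amount) if not amount == '' else 1.0
    return a if a > 0.0 else 1.0

# Empty and zero amounts fall back to 1.0; valid amounts pass through.
for raw in ['', '0', '0.5', '2']:
    print(repr(raw), '->', clean_amount(raw))
```

Treating a missing or zero amount as one unit also avoids a division by zero in calc_single_density.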

The find_unit function determines the unit that the portion is denominated in by first checking for a free text description and otherwise looking at the unit ID. It then returns a dictionary for that unit containing its name and volume in ml. If no known unit is found it returns None, which calc_single_density turns into a density of 0.0:

units = {
	'1000': {
		'name': 'cup',
		'ml': 236.588
	},
	'1001': {
		'name': 'tbsp',
		'ml': 14.7868
	},
	'1002': {
		'name': 'tsp',
		'ml': 4.92892
	},
	'1003': {
		'name': 'l',
		'ml': 1000.0
	},
	'1004': {
		'name': 'ml',
		'ml': 1.0
	},
	'1005': {
		'name': 'cu in',
		'ml': 16.3871
	},
	'1006': {
		'name': 'cc',
		'ml': 1.0
	},
	'1007': {
		'name': 'gal',
		'ml': 3785.41
	},
	'1008': {
		'name': 'pt',
		'ml': 473.176
	},
	'1009': {
		'name': 'fl oz',
		'ml': 29.5735
	}
}

def find_unit(unit_id, modifier):
	if unit_id == '9999' and modifier != '':
		for unit in units.values():
			if modifier.startswith(unit['name']):
				return unit
	elif unit_id in units:
		return units[unit_id]
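One thing to be aware of: because the free text check uses startswith, any modifier beginning with a unit name matches, so a modifier such as 'large' would be read as litres ('l'). If that turns out to matter for your data, a stricter check is possible. The following is only a sketch of one option, using a hypothetical find_unit_strict and a cut-down copy of the units table; it is not the version used to produce the results in this post.

```python
import re

# Cut-down copy of the units table, for illustration only.
units = {
    '1000': {'name': 'cup', 'ml': 236.588},
    '1003': {'name': 'l', 'ml': 1000.0},
    '1009': {'name': 'fl oz', 'ml': 29.5735},
}

def find_unit_strict(unit_id, modifier):
    """Hypothetical variant of find_unit that only matches whole words."""
    if unit_id == '9999' and modifier != '':
        for unit in units.values():
            # Unit name, optional plural 's', then a word boundary,
            # so 'l' no longer matches 'large' but 'cups' still matches 'cup'.
            pattern = re.escape(unit['name']) + r's?\b'
            if re.match(pattern, modifier):
                return unit
        return None
    return units.get(unit_id)

print(find_unit_strict('9999', 'large'))
print(find_unit_strict('9999', 'cups, chopped'))
```

The trade-off is that stricter matching may reject some free text descriptions that the prefix match accepts, so it is worth comparing the two on the real data before switching.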

Now that we have a lookup table from ingredient ID to density, we can write a function to take a list of component food IDs and compute their mean density:

import statistics

def calc_density(ids):
	densities = [density[id] for id in ids if id in density]
	return statistics.mean(densities) if len(densities) > 0 else 1.0
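To illustrate how the averaging and the fallback behave, here is a small self-contained example. The lookup table, IDs and density values are invented; the function body matches the one above apart from the loop variable name.

```python
import statistics

# Hypothetical lookup table; IDs and densities are made up for illustration.
density = {'111': 0.5, '222': 1.0}

def calc_density(ids):
    densities = [density[i] for i in ids if i in density]
    return statistics.mean(densities) if len(densities) > 0 else 1.0

# '333' has no known density, so it is ignored in the mean.
print(calc_density('111+222+333'.split('+')))

# With no known densities at all we fall back to 1.0 g / ml.
print(calc_density(['333']))
```

The fallback of 1.0 g / ml is the density of water, so an ingredient with no usable portion data is treated as if it were water when converting volumes to mass.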

With this in place it is now trivial to knock up a simple script to read rows from our ingredient CSV, calculate the mean density and write out the rows to a new ingredient CSV file:

with open('manual-ingredients.csv', encoding='utf-8') as fi:
	rows_in = csv.DictReader(fi)
	
	with open('ingredients-with-density.csv', mode='w', encoding='utf-8', newline='') as fo:
		
		rows_out = csv.writer(fo)
		
		# header row
		rows_out.writerow([
			'keywords',
			'component_ids',
			'density',
			'unit_mass',
			'name'])
		
		# data rows
		for r in rows_in:
			if r['keywords'].strip() == '':
				continue
			ids = r['component_ids'].split('+')
			rows_out.writerow([
				r['keywords'],
				r['component_ids'],
				calc_density(ids),
				r['unit_mass'],
				r['name']])
				
print('done.')

Running this and feeding the new data into our ingredient classifier produces the following output:

2 courgettes (zucchini)  
number: 2.0, unit: default, names: ['zucchini'], grams: 392.0, Protein: 10.623199999999999, Fat: 1.568, Carbohydrate: 12.191199999999998

1 carrot  
number: 1.0, unit: default, names: ['carrot'], grams: 61.0, Protein: 0.4148, Fat: 0.1780090909090909, Carbohydrate: 4.404754545454545

1 avocado  
number: 1.0, unit: default, names: ['avocado'], grams: 136.0, Protein: 2.8061333333333334, Fat: 18.19226666666667, Carbohydrate: 11.3288

1 bunch basil  
number: 1.0, unit: default, names: ['basil'], grams: 0.0, Protein: 0.0, Fat: 0.0, Carbohydrate: 0.0

1 tbsp lemon juice  
number: 1.0, unit: tbsp, names: ['lemon-juice'], grams: 0.8523936, Protein: 0.0029833775999999995, Fat: 0.00204574464, Carbohydrate: 0.058815158400000005

2 tbsp nutritional yeast  
number: 2.0, unit: tbsp, names: ['yeast'], grams: 43.2342987916217, Protein: 10.324350551439261, Fat: 0.3891086891245953, Carbohydrate: 8.828443813249152

10 olives, sliced  
number: 10.0, unit: default, names: [], grams: 0.0, Protein: 0.0, Fat: 0.0, Carbohydrate: 0.0

4 garlic cloves, roasted  
number: 4.0, unit: default, names: ['garlic'], grams: 12.0, Protein: 0.7632000000000001, Fat: 0.06, Carbohydrate: 3.9672

2 tomatoes, roasted  
number: 2.0, unit: default, names: ['tomato'], grams: 246.0, Protein: 2.4354, Fat: 1.190025, Carbohydrate: 12.825825000000002

Pinch of chilli powder or smoked paprika
number: 1.0, unit: pinch, names: ['chili-powder'], grams: 0.4003137933832878, Protein: 0.05388223658939054, Fat: 0.057164809695133496, Carbohydrate: 0.19895595531149407


Total mass: 891.4870061850049
Protein: 27.42394949896199
Fat: 21.636620001035485
Carbohydrate: 53.80399447241519

Next time

We’ll think about how to implement our ingredient classifier as a web service.