Last time we finished getting our ingredient data ship-shape. We now have a function that does a pretty good job of mapping a rag-bag of free-text ingredients to specific items in our database and calculating their aggregate nutritional values. Today we’ll look at how we might present this as a web service.

Hosting

The architecture for this is quite simple. We want:

  • a web page that provides the interface for a user to supply the ingredient list and see the nutrition output
  • an HTTP end-point the page can query to run our classifier and return the results

Let’s focus on the end-point to begin with. A nice way to create it is with a cloud function. That avoids all the unpleasant rigmarole of setting up web servers, load balancing and so forth: all we have to do is write a function and upload the code.

There are various providers of cloud function hosting, but we’ll go with Google for ours because they support Python functions and the upload process is very simple: it can be done using just the web browser. We won’t go into the details of uploading a Python cloud function in this article as Google already have a fairly comprehensive tutorial.

Performance

Before we go ahead and wrap up our classifier as a cloud function, it’s worth pausing to think about performance.

The classification and aggregation processing is fast, but at present we read over 36 MB of food nutrient data from CSV into memory on each invocation of the function, most of which we’ll never use.

We want to avoid this unnecessary work because:

  • a slow response degrades the user experience
  • CPU time costs money

Instead of loading all the nutrient data from CSV, we’ll fetch just the bits we need by querying the FDC database.

However, it’s not quite that simple, because access to the FDC database is throttled per IP address. If all the requests came from our server, from a single IP, we could quickly exceed the permitted rate. The data must therefore be queried by the client, which can then either pass it to our cloud function or perform the subsequent part of the calculation itself.

In this case, the subsequent calculation (aggregating the nutrient values) is trivial and it makes sense to do this in the client. We may also want to let the user tweak some of the numbers that feed into the aggregation so quick recalculation without going back and forth to the server is beneficial.
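
To make the shape of that concrete, here’s a rough sketch of fetching one food’s nutrients and summing them across ingredients. It’s written in Python to match the rest of the article, though the real version will run as JavaScript in the browser, and the endpoint URL, the DEMO_KEY placeholder and the response field names are assumptions about the public FDC API rather than code from our project:

import json
import urllib.request

# Assumed FDC endpoint; a real key from api.data.gov would replace DEMO_KEY.
FDC_URL = 'https://api.nal.usda.gov/fdc/v1/food/{}?api_key=DEMO_KEY'

def fetch_nutrients(fdc_id):
	# One small request per matched ingredient instead of a 36 MB CSV load.
	with urllib.request.urlopen(FDC_URL.format(fdc_id)) as resp:
		food = json.load(resp)
	# FDC reports nutrient amounts per 100 g of the food.
	return {n['nutrient']['name']: n.get('amount', 0.0)
		for n in food['foodNutrients']}

def aggregate(matched):
	# matched: (fdc_id, grams) pairs derived from the classifier's output.
	totals = {}
	for fdc_id, grams in matched:
		for name, per_100g in fetch_nutrients(fdc_id).items():
			totals[name] = totals.get(name, 0.0) + per_100g * grams / 100.0
	return totals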

The other data sets our function uses (ingredient keywords and units) are much smaller and we need to load them completely each time as we’re likely to use all of their contents.

The process

The complete process from end to end looks like this:

  • [Client] load page with form to submit free-text ingredient entry
  • [Server] execute classifier cloud function with free-text ingredients and return structured ingredients
  • [Client] display structured form with structured ingredients
  • [Client] calculate aggregate nutrient values
  • [Client] display aggregate nutrient values
  • [Client] on change to data in structured form, update calculated nutrient values

The client-side code will be JavaScript so that it can run in a web browser and interact with the web page. We won’t be doing anything complicated on the client side, though: just aggregating values and some plumbing to link up with the UI controls.

Refactoring

Before we begin, let’s take our program and refactor it so it’s easier to move into the cloud. We want to:

  • Separate the classification from the aggregation because only the classification will go to the cloud
  • Move data loading to the end of the classification module so we can dump the data in there directly and keep the code itself at the top, for convenience

Here is the refactored source code:

Note that, to avoid the aggregation depending on our ingredient database (which will live in the cloud), the classification now returns the component_ids, unit_mass and density in the ingredient_quantity class for each matched ingredient.
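
The full refactored listing isn’t reproduced inline here, but to give an idea of the shape, the record returned for each matched ingredient might now look roughly like this; the name, quantity and unit fields are assumptions carried over from the earlier parsing work, and only the last three are the additions described above:

class ingredient_quantity:
	# Carries everything the aggregation step needs, so the client never has
	# to consult the ingredient database that lives with the cloud function.
	def __init__(self, name, quantity, unit, component_ids, unit_mass, density):
		self.name = name                    # matched ingredient name
		self.quantity = quantity            # amount parsed from the free text
		self.unit = unit                    # unit parsed from the free text
		self.component_ids = component_ids  # FDC IDs to fetch nutrients for
		self.unit_mass = unit_mass          # mass in grams of one unit/item
		self.density = density              # for converting volumes to masses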

Loading ingredient data

Currently, we’re loading our ingredient data from ingredients.json. We can’t load a file quite as easily once we move to the cloud, and as there are only about 700 entries we can cheat, for now at least, and put the data directly in the source. To do this, we need to modify our gen-ingredient-json.py script a tad:

import csv

# Reads the ingredient spreadsheet and prints Python source for an ing_map
# dict that can be pasted straight into the classifier module.
with open('ingredients-with-density.csv', encoding='utf-8') as f:
	rows = csv.DictReader(f)

	print('ing_map = {')
	for r in rows:
		# Skip rows with no keywords defined
		if r['keywords'].strip() == '':
			continue

		# The first keyword, hyphenated, becomes the dict key
		name = r['keywords'].split('|')[0].strip().replace(' ', '-')
		keywords = "'" + r['keywords'].replace('|', "', '") + "'"
		component_ids = "'" + r['component_ids'].replace('+', "', '") + "'"
		args = ('[' + keywords + '], [' + component_ids + '], '
			+ r['unit_mass'] + ', ' + r['density'])
		print("	'" + name + "': ing(" + args + '),')

print('}')

This generates a dict like this:

ing_map = {
	'abiyuch': ing(['abiyuch'], ['167782'], 100.0, 0.9637006103437199),
	'acerola-juice': ing(['acerola juice'], ['171687'], 0.0, 1.0211845063993101),
	'acerola': ing(['acerola'], ['171686'], 100.0, 0.4142221921652831),

	...
}

This depends on a new class, ing, to hold the ingredient data:

class ing:
	def __init__(self, keywords, component_ids, unit_mass, density):
		self.keywords = keywords
		self.component_ids = component_ids
		self.unit_mass = unit_mass
		self.density = density
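
With the generated ing_map pasted into the classifier source, the ingredient data is available as soon as the module is imported, with no file I/O at all. For instance, using the acerola entry shown above:

# Look an ingredient up directly from the inlined data
acerola = ing_map['acerola']
print(acerola.component_ids)  # ['171686']
print(acerola.unit_mass)      # 100.0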

We can follow Google’s tutorial to create a Python cloud function, paste our code in, and modify the boilerplate to call our parse_recipe function with the input recipe from the JSON request, then build a response containing the result (the list of matched ingredients).
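
As a rough sketch, the entry point might end up looking something like this. The function name classify_recipe and the request field names are illustrative; request is the Flask request object that Google’s Python runtime passes to an HTTP function, and parse_recipe is our classifier from the refactored code:

import json

def classify_recipe(request):
	# HTTP entry point: pull the free-text recipe out of the JSON request.
	body = request.get_json(silent=True) or {}
	matched = parse_recipe(body.get('recipe', ''))
	# ingredient_quantity objects -> plain dicts for the JSON response.
	result = [vars(iq) for iq in matched]
	return (json.dumps(result), 200, {'Content-Type': 'application/json'})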

Next time

We’ll think about the client app that calls our shiny new cloud function!