Build your own budget Tensorflow-lite AI Cloud with Google’s Coral EdgeTPU Technology

Nam Vu
Jan 7, 2020

January 6, 2020

TL;DR: In this article you will learn how to create a budget AI Cloud that provides an Object Detection REST API, accelerated by Google’s Coral EdgeTPU technology. Scroll to the end of this article for a challenge!!

Ahhhh, the Cloud…

When creating a back-end server application, we often think in terms of performance. While performance can be improved with highly optimized software, bottlenecks are often caused by limited hardware capabilities. Add AI into the mix, and we’re talking extremely power-hungry mega-tron GPU(s) on top of your multi-extreme-core processors.

Now picture being able to set up your own personal, high-performance AI Cloud server with just a single piece of plug-and-play USB hardware that draws a peak current of about 900mA at maximum load. Well, we can stop daydreaming now, because this is made possible by Google’s Coral EdgeTPU technology. The piece of hardware that we will check out in this tutorial is the Coral USB Accelerator. Let’s follow our imagination and build our own budget AI Cloud!!!

Coral USB Accelerator

** Note that any Coral product can be used for this project; I’m only featuring the USB Accelerator because it is plug-and-play.

Edge TPU benchmark by Google

Requirements


  • The Coral USB Accelerator
  • A Debian-based x86_64 host machine that you can use to build the code (I’m running Debian 10 on my home desktop and Ubuntu 18.04 on my laptop; I haven’t tried building on other machines)
  • A server to run the binary on; it can be arm64 or x86_64, and it can be the same machine as your build machine
  • I suggest going through the Getting Started Guide on Coral’s webpage (you’ll also need to install the libedgetpu package from that guide on the machine where you run the server)
  • This guide assumes that readers have some basic knowledge of AI, TensorFlow, Linux, backend applications, REST APIs, and git.

Introduction to project restor

restor is a personal project of mine that provides a REST API for Object Detection accelerated by the EdgeTPU Platform (I thought the name is kind of punny). You can visit the project from here:

restor is written in modern C++ for better performance and depends on Google’s EdgeTPU Object Detection Engine. Currently, restor has 4 endpoints, but more will be added:

restor’s available endpoints as of Jan 6, 2020

The POST method is what a client uses to send image data to the server. The “data” field of the JSON string should hold the base64-encoded contents of a .bmp image; the restor server then processes this data to detect objects. Don’t worry, I will include some example client applications too :)
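
To make the request format concrete, here is a minimal Python sketch that builds such a JSON body from a .bmp file. The function name build_detect_payload and the "BM" magic-byte check are my own additions for illustration; they are not part of restor.

```python
import base64
import json
from pathlib import Path


def build_detect_payload(bmp_path):
    """Return the JSON request body for POST /detects as bytes."""
    raw = Path(bmp_path).read_bytes()
    # BMP files start with the two-byte magic "BM"; fail early on other formats.
    if raw[:2] != b"BM":
        raise ValueError(f"{bmp_path} does not look like a .bmp file")
    # The server expects the base64 text in the "data" field of a JSON object.
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({"data": encoded}).encode("utf-8")
```

Calling build_detect_payload("maximum.bmp") should give the same kind of body that the shell one-liner in the preview below writes to /tmp/maximum.json.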

  • Here is a preview of what restor can do:

Let’s take a look at this beautiful image of my cat, Maximum, along with me and some friends at the crag last weekend:


Let’s do some detection magic…

  • the preparation
# First, let's install some deps
$ sudo apt update
$ sudo apt install wget curl jq imagemagick
# Second, you can download the same image
$ wget
# Then this image needs to be converted to a .bmp file
$ mogrify -format bmp maximum.png
# At this point, a maximum.bmp file should have been generated.
# Next, turn this image data into a base64-encoded JSON string and save it as a tmp file:
$ echo "{\"data\":\"$(base64 -w0 maximum.bmp)\"}" > /tmp/maximum.json
  • the send off
# send it to the server (you need to set the server up first)
$ curl -d@/tmp/maximum.json -H "Content-Type: application/json" -X POST http://localhost:8888/detects | jq
{
  "req_id": 627,
  "result1": {
    "candidate": "dog",
    "score": 0.91015625
  },
  "result2": {
    "candidate": "person",
    "score": 0.66015625
  },
  "result3": {
    "candidate": "person",
    "score": 0.41796875
  }
}

Oh wait, did I say “cat” earlier?!? Because restor definitely believes Maximum is a dog o_0. In this case, I think restor may actually be correct ¯\_(ツ)_/¯
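
The same request can also be issued from a script. Below is a rough Python equivalent of the curl call, using only the standard library. The helper names, the 0.5 score threshold, and the assumption that every detection lives under a top-level key starting with "result" are mine, inferred from the output shown above rather than from restor’s source.

```python
import json
import urllib.request


def build_detect_request(payload, host="localhost", port=8888):
    """Build (but do not send) the POST request that the curl command issues."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/detects",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request to a running restor server and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)


def confident_results(response, min_score=0.5):
    """Return (candidate, score) pairs at or above min_score, best first."""
    hits = [
        (value["candidate"], value["score"])
        for key, value in response.items()
        # Assumed layout: detections sit under "result1", "result2", ...
        if key.startswith("result") and isinstance(value, dict)
    ]
    hits = [hit for hit in hits if hit[1] >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

With the server running, something like confident_results(send(build_detect_request(open("/tmp/maximum.json", "rb").read()))) would keep the dog and the first person from the response above and drop the low-scoring second person.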

Set up restor

  • First, let’s install the build-essential package to build the project:
# If you are running the server on the same machine as your build machine, install this:
$ sudo apt-get install -y build-essential
# If you are running the server on another machine that does not have the same CPU architecture, install this:
$ sudo apt-get install -y crossbuild-essential-armhf crossbuild-essential-arm64
  • Second, let’s set up the project:
# clone the project:
$ git clone && cd restor
# download some dependencies:
$ python3
  • Next, build the project:
# If you are running the server on the same machine:
$ make
# If you are running the server on another machine, specify the CPU architecture of that machine, for example:
$ make CPU=aarch64
** Currently we have CPU=k8 for amd64 and CPU=aarch64 for arm64. The binary can also be built for 32-bit ARM with CPU=armv7a; however, libedgetpu no longer supports 32-bit architectures, so you may have to install an older version of

If everything builds correctly, you should see a restor binary at “out/{k8|aarch64|armv7a}/restor”, relative to the root of the project.
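
If you script your deployment, the output path can be derived from the machine type. The sketch below is only an illustration: the mapping from platform.machine() values to the Makefile’s CPU names is my assumption based on typical Linux values, and restor_binary_path is a hypothetical helper, not something the project ships.

```python
import platform
from pathlib import Path

# Assumed mapping from platform.machine() strings to the Makefile's CPU values.
_CPU_BY_MACHINE = {
    "x86_64": "k8",
    "amd64": "k8",
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "armv7l": "armv7a",
}


def restor_binary_path(project_root=".", machine=None):
    """Return the expected location of the restor binary for a machine type."""
    machine = (machine or platform.machine()).lower()
    try:
        cpu = _CPU_BY_MACHINE[machine]
    except KeyError:
        raise ValueError(f"no known CPU value for machine type {machine!r}") from None
    return Path(project_root) / "out" / cpu / "restor"
```

Passing machine explicitly is useful when you cross-compile, since the build host and the target server then report different machine types.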

Run the Server

  • Configurations

restor can be configured either through CLI arguments or through a YAML config file. There is an example config in config/restor.yaml:

modelFile: test_data/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
labelFile: test_data/coco_labels.txt
numResults: 5
numThreads: 4
port: 8888

The field names are quite self-explanatory, so I won’t go over them. For modelFile and labelFile, though, I suggest giving the full path, in case you run the binary from a different directory.
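
One way to apply the full-path advice without editing the file by hand is a small script like the one below. It is a sketch that treats the config as flat “key: value” lines, as in the example above, instead of using a real YAML parser; absolutize_config is a hypothetical helper of mine.

```python
from pathlib import Path

# Config keys whose values are file paths, taken from the example config.
PATH_KEYS = ("modelFile", "labelFile")


def absolutize_config(text, base_dir):
    """Rewrite modelFile/labelFile values in a flat config as absolute paths."""
    base = Path(base_dir).resolve()
    out = []
    for line in text.splitlines():
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in PATH_KEYS:
            path = Path(value.strip())
            if not path.is_absolute():
                path = base / path
            line = f"{key.strip()}: {path}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, absolutize_config(Path("config/restor.yaml").read_text(), ".") returns the config text with both paths anchored at the current directory; other fields such as port are left untouched.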

** Note: the model used in this example is taken straight from google-coral’s edgetpu repo and is capable of detecting 90 different objects.

  • Running the restor server

To run the server natively, just give it a config_path. Here is an example run with the above config:

$ ./out/k8/restor --config_path config/restor.yaml
I0106 14:13:02.776759 53679] RESTOR
I0106 14:13:02.806629 53679] found 1 TPU(s)
I0106 14:13:02.806658 53679] config: config/restor.yaml
I0106 14:13:05.524688 53679] Engine initialized
I0106 14:13:05.524776 53679] model: test_data/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite
I0106 14:13:05.524821 53679] label: test_data/coco_labels.txt
I0106 14:13:05.524849 53679] num_results: 5
I0106 14:13:05.524873 53679] num_threads: 4
I0106 14:13:05.524894 53679] input_tensor_shape: [1,300,300,3]
I0106 14:13:05.524971 53679] End points registered
I0106 14:13:05.525341 53679] Serving on port: 8888

To run it on another machine, just copy the model, label file, config, and binary to the machine you wish to run it on and then run it.
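
Either way, a client script may want to wait until the server has reached the “Serving on port” stage before sending anything. Here is a minimal sketch that polls the TCP port; wait_for_port is a hypothetical helper, and an open port only tells you that something is listening, not that the model loaded correctly.

```python
import socket
import time


def wait_for_port(host="localhost", port=8888, timeout=10.0, interval=0.2):
    """Return True once a TCP connection succeeds, False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The engine took a few seconds to initialize in the log above, so a timeout of around 10 seconds seems like a reasonable starting point.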

That is all folks, you have built your own personal AI Cloud that can tackle object detection tasks. There are many like it, but this one is yours :)

You can loop back to the preview section above to test out the server. The tutorial ends here if you are only concerned with setting up your personal AI Cloud; however, I’ve also included a couple of client programs, as well as a /metrics monitoring endpoint. If you want to check them out, continue reading the BONUS section!

BONUS


  • Metric Monitoring

For the more advanced users, you may want some type of monitoring system. Collecting metrics is one way of finding out what kind of load your server is dealing with. For that, I’ve built in a Prometheus /metrics endpoint. When the restor server is running, this endpoint is available by default:

$ curl localhost:8888/metrics
# HELP server_request_total Number of total http requests handled
# TYPE server_request_total counter
server_request_total{endpoint="metrics",method="GET"} 63.000000
server_request_total{endpoint="version",method="GET"} 84.000000
server_request_total{endpoint="detects",method="POST"} 84.000000
# HELP process_open_fds Number of open file descriptors
# TYPE process_open_fds gauge
process_open_fds 18.000000
# HELP process_resident_memory_bytes Process resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 441085952.000000
# HELP process_virtual_memory_bytes Process virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 704843776.000000
# HELP process_virtual_memory_max_bytes Process peak virtual memory size in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 911925248.000000
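
If you just want these numbers in a script without running Prometheus, the text format is easy to read. The sketch below handles only what the output above contains (comment lines and “name value” samples with no trailing timestamp); parse_metrics is a hypothetical helper, not a full parser for the exposition format.

```python
def parse_metrics(text):
    """Map each sample (metric name plus labels, as written) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples
```

Feeding it the body returned by /metrics gives a dictionary keyed by strings such as process_open_fds or server_request_total{endpoint="detects",method="POST"}.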

For a better-looking representation of these metrics, you can set up a Prometheus Docker image. I have provided a prometheus.yaml config and a Dockerfile here. Please change the IP address values in prometheus.yaml to match your server’s IP address. Then:

$ cd config
# modify the prometheus.yaml here
$ docker build -t prometheus .
$ docker run -p 9090:9090 prometheus

If everything is successful, you should be able to visit localhost:9090/graph with your web browser to see something like this:

  • Example cpp client for sending any image to the server

I have also written an example client that can access all of restor’s endpoints here: cpp_cli_client. Quick usage:

# go to it
$ cd example_client/cpp_cli_client
# install some deps
$ make install
... not showing this output ...
# build
$ make
g++ -Iinclude -std=c++17 -Wall -Wextra -pthread -o restor_client
# should see all these files
$ ls
include Makefile restor_client
# example run
$ ./restor_client --host localhost --port 8888 --method post --endpoint /detects --image /tmp/maximum.bmp
Sending POST @data={"data": base64_encode(/tmp/maximum.bmp)} to localhost:8888/detects
{
  "req_id": 1,
  "result1": {
    "candidate": "dog",
    "score": 0.91015625
  },
  "result2": {
    "candidate": "person",
    "score": 0.66015625
  },
  "result3": {
    "candidate": "person",
    "score": 0.41796875
  }
}

** restor_client can also query the other endpoints, but I’ll leave it up to you to figure that out :)

  • Example Python client for taking images and sending them to the server

The example Python client depends on OpenCV to take a picture and send it to the restor server for detection.

# go to it
$ cd example_client/cv_client
# run it
$ python3 --host localhost --port 8888

That is all folks, you have now successfully built your own AI Cloud using a plug-and-play device. Check out the whole source code on my GitHub repo. Send me some feature requests, issues, PRs…

Last but not least, the challenge:

I’ve walked you through the step-by-step process for setting up your machine with the USB Accelerator. Can you set up restor on the Dev Board?!?