Building a tRUSTworthy web service 🦀
I wrote this post while interning at Holmusk, a digital health company. The product mentioned below is what I worked on with Rust during my time there.
I worked on FoodDX, a service that helps people get insights on how to improve their diet with the help of proprietary AI technology and personalized feedback from nutrition experts. In its current stage, it takes images of food captured through an app, gives each one a score from 1 to 5, and provides personalized tips for the food.
It's a big project with a lot of components - the AI models, the app, and the backend infrastructure which handles it all. We'll be taking a look at the backend in this article.
Before we dive into the infrastructure, it would help to take a look at what an image goes through once it enters our system.
s3 bucket.
Haskell is used for the client-facing API, as well as for other ad-hoc tasks such as reading from and writing to a database.
Rust is used for image preprocessing, model inference, and sending the results back. How the results were sent differed between the two approaches outlined below. Rust was chosen for its high efficiency, small executable footprint, and absence of a garbage collector. It also offers a strong type system, speed, and a relatively actively maintained Tensorflow (client) library.
The internal organization of the Rust service in this architecture is outlined above. It had 3 main parts, each running concurrently on its own tokio[2] runtime: polling SQS, preprocessing and running inference on the images, and cleanup tasks (such as writing results to Redis and notifying SQS that the image can now be taken off the queue).
The external processes related to this architecture are outlined below.
The main gripe we had was with the S3 to SQS[3] upload event notification. In our benchmarks, it was very slow. The AWS docs acknowledge this as well, stating that "Typically, event notifications are delivered in seconds but can sometimes take a minute or longer." We aim for the service to have very low latency, with the goal being that every image that comes into the system should be scored/rated in under 1 second. Because of the way the system was designed, getting closer to that goal meant a pretty big makeover on the Rust side and some tweaks on the Haskell side.
As mentioned above, the main reason for redesigning the architecture was to avoid the S3 to SQS upload event notification, since low latency is a high priority in this project. In the process, we found that we had actually simplified the architecture by removing unnecessary moving parts.
Internally, the Rust service now also has a web server. The client-facing API (written in Haskell) proxies the HTTP requests it receives to the Rust server via a load balancer (AWS ELB). In this version of the architecture, we completely eliminate the use of a queue (SQS).
We chose warp[4] for the web server implementation in Rust.
We have 3 tokio runtimes running simultaneously and somewhat independently of one another. They communicate by passing messages over bounded channels. The "messages" are custom structs we define for this communication.
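To make the message-passing idea concrete, here is a minimal sketch rather than the service's actual code. It uses only the standard library: plain threads stand in for the tokio runtimes, and `std::sync::mpsc::sync_channel` stands in for the async bounded channels. The `ImageJob` and `Scored` structs, the `fake_score` function, and the stage names are all hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Hypothetical message sent from the web-server stage to the inference stage.
#[derive(Debug)]
struct ImageJob {
    id: u64,
    bytes: Vec<u8>,
}

/// Hypothetical message sent from the inference stage to the cleanup stage.
#[derive(Debug, PartialEq)]
struct Scored {
    id: u64,
    score: u8,
}

/// Stand-in for model inference: derives a 1-5 score from the payload length.
fn fake_score(job: &ImageJob) -> u8 {
    (job.bytes.len() % 5) as u8 + 1
}

/// Pushes jobs through two bounded channels and returns what the last stage received.
fn run_pipeline(jobs: Vec<ImageJob>, capacity: usize) -> Vec<Scored> {
    let (job_tx, job_rx) = sync_channel::<ImageJob>(capacity);
    let (scored_tx, scored_rx) = sync_channel::<Scored>(capacity);

    // Inference stage: receives jobs and forwards scores.
    let inference = thread::spawn(move || {
        for job in job_rx {
            let scored = Scored { id: job.id, score: fake_score(&job) };
            if scored_tx.send(scored).is_err() {
                break; // the downstream stage hung up
            }
        }
        // `scored_tx` is dropped here, which closes the downstream channel.
    });

    // Cleanup stage: collects every result until its channel closes.
    let cleanup = thread::spawn(move || scored_rx.into_iter().collect::<Vec<_>>());

    for job in jobs {
        // `send` blocks while the queue is full, which is the back-pressure.
        job_tx.send(job).expect("inference stage is running");
    }
    drop(job_tx); // close the channel so the inference loop ends

    inference.join().unwrap();
    cleanup.join().unwrap()
}

/// Shows the non-blocking view of back-pressure: a full queue rejects `try_send`.
fn queue_is_full_after(capacity: usize) -> bool {
    let (tx, _rx) = sync_channel::<u64>(capacity);
    for i in 0..capacity as u64 {
        tx.try_send(i).unwrap();
    }
    matches!(tx.try_send(0), Err(TrySendError::Full(_)))
}
```

The point of bounding the channels is that a slow stage pushes back on the stage before it instead of letting the queue grow without limit. In an async setting the sender would await free capacity rather than block a thread.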
Finally, because each request handler needs the result for its own image, the handler first creates a oneshot channel for receiving its result, and this is passed along as metadata for the image. Once the image has been inferred as part of a batch, the result is sent back to the image's corresponding request handler so it can be returned.
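The request/reply flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the real implementation: it uses std threads and std channels instead of tokio, a regular `std::sync::mpsc` channel used exactly once stands in for the oneshot, and `InferenceRequest`, `infer_batch`, `batching_task`, `handle_request`, and `spawn_batcher` are hypothetical names. The way the batch is collected (wait for one request, then drain whatever else is queued with `try_recv`) is also an assumption about how batching could work.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, Sender, SyncSender};
use std::thread;

/// Hypothetical request message: the image plus the reply channel of its handler.
struct InferenceRequest {
    image: Vec<u8>,
    reply: Sender<u8>, // used exactly once, as a oneshot stand-in
}

/// Stand-in for batched model inference: one score (1-5) per image.
fn infer_batch(images: &[&[u8]]) -> Vec<u8> {
    images.iter().map(|img| (img.len() % 5) as u8 + 1).collect()
}

/// Batching task: waits for one request, then drains up to `max_batch` without blocking.
fn batching_task(rx: Receiver<InferenceRequest>, max_batch: usize) {
    // Block until a request arrives; the loop ends once every sender is gone.
    while let Ok(first) = rx.recv() {
        let mut batch = vec![first];
        while batch.len() < max_batch {
            match rx.try_recv() {
                Ok(req) => batch.push(req),
                Err(_) => break, // nothing else queued: run with what we have
            }
        }
        let images: Vec<&[u8]> = batch.iter().map(|r| r.image.as_slice()).collect();
        let scores = infer_batch(&images);
        for (req, score) in batch.iter().zip(scores) {
            // Each result goes back over the channel that arrived with its image.
            // The error is ignored in case the handler has already gone away.
            let _ = req.reply.send(score);
        }
    }
}

/// What a request handler does: create the reply channel, enqueue, wait for the result.
fn handle_request(queue: &SyncSender<InferenceRequest>, image: Vec<u8>) -> Option<u8> {
    let (reply_tx, reply_rx) = channel();
    queue.send(InferenceRequest { image, reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}

/// Spawns the batching task and returns the queue handle that handlers clone.
fn spawn_batcher(capacity: usize, max_batch: usize) -> SyncSender<InferenceRequest> {
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || batching_task(rx, max_batch));
    tx
}
```

Because the reply channel travels with the image, the batching task never needs a lookup table to work out which handler a result belongs to, no matter how requests end up grouped into batches.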
As mentioned, we have completely avoided the use of SQS in this architecture. The external architecture around the Rust service now looks like this:
We use a couple of different channels to communicate between different parts of the Rust service. Check out this chapter from the Rust book for some more context on how they work!
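As a small illustration of the synchronous case, here is a sketch of a blocking `std::sync::mpsc` channel used as a "models loaded" signal for synchronous code such as `main`. The `load_models` function, the model names, and the timeout are hypothetical; the real service may signal differently.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for loading model weights; returns the model names.
fn load_models() -> Vec<String> {
    vec!["food-classifier".to_string(), "food-scorer".to_string()]
}

/// Loads models on a background thread and blocks the synchronous caller until done.
/// Returns the number of models loaded, or an error if the signal never arrives.
fn wait_for_models(timeout: Duration) -> Result<usize, RecvTimeoutError> {
    let (loaded_tx, loaded_rx) = channel::<usize>();

    thread::spawn(move || {
        let models = load_models();
        // Signal the waiting caller; it may be gone already if it timed out.
        let _ = loaded_tx.send(models.len());
    });

    // Blocking is fine here because the caller is plain synchronous code.
    loaded_rx.recv_timeout(timeout)
}
```

Blocking like this is only appropriate outside the async runtime. Inside it, a blocking `recv` would stall the worker thread, which is why the remaining channels are async.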
- std::sync::mpsc: This is the only sync channel we use (the rest are async). We use it to tell the main function that the models have been loaded. Since the main function is sync, we use the built-in synchronous channel Rust provides.
The other channels are async, meaning they don't block the runtime while awaiting a result. Instead, they pass control back to the async runtime (tokio in this case) so that other tasks can run. The async channels are:
- tokio::sync::oneshot: A oneshot is a channel with exactly one receiver and one sender. The handler keeps the Receiver and sends its Sender around the program. Once processing is finished (a batch of requests is processed at a time), the oneshot is used to send the result back to the handler of that specific request, maintaining the required one-to-one mapping between request and response.
- async_channel::bounded, which we use like an MPSC (Multi Producer, Single Consumer) channel for communication between tasks, for example to pass data from the many request handlers to the batching task. We'd like to use tokio::sync::mpsc, but:
  - Tokio 1.x removed try_recv due to some errors. They plan on adding it back later. This was a function we had to use.
  - async-channel is recommended as an alternative for the time being.
These are our takeaways for using Rust in this project!
Pros :
Cons :
- rusoto library. Because this library depended on Tokio 0.1.15, we couldn't migrate to Tokio 1.x for a really long time. We were able to migrate later, once rusoto was updated, but we had expected such a critical library to stay up to date. Things are looking good, however, with AWS announcing that they are working on an official SDK for Rust.
We also have some general takeaways and gotchas we encountered in this project:
- C bindings were not built/available for a large variety of the GPU instances we use in AWS. This proved a little tedious to fix, as we had to manually compile Tensorflow for the systems we use in production; without doing so, we experienced slow inference and model loading times.
[1] The image hash is calculated and used to check for duplicates.
[2] Tokio is an asynchronous runtime for the Rust programming language. A lot of languages have a built-in async runtime. Rust allows you to choose whichever runtime you require. Tokio is the most popular option in the Rust ecosystem. Check out this resource for more insight into async and the Rust async ecosystem!
[3] Amazon SQS is a fully managed queue which we were using to distribute messages to different Rust service instances.
[4] We used warp because of its excellent tokio interoperability and flexible Filter system.
Connect and reach out to me!