My First R Production Models

In my past roles, I never had the chance to deploy a machine learning (ML) model, so I was ecstatic to get the opportunity to do so at MakuSafe. It also helped that I am my own data science team and can make the call as to when a model is ready to be deployed. Life in a start-up is quite different from life at a Fortune 500 company; it’s all about innovating quickly and executing swiftly.

I’m pleased to say that on November 11, 2020, I deployed not one, not two, but THREE distinct models to production. And the icing on the cake was that all of the essential work was done in R!

So What Do These Models Do?

The three models each identify whether a subject has:

  • slipped, tripped, or fallen
  • experienced repetitive motion
  • pulled/pushed an object

While I am proficient in both Python and R, I’ve always had a stronger affinity for coding in R because several of its packages make coding more fun and easier than in Python. I’m definitely not advocating that one language is better than the other. Far from it! Rather, I just prefer to code in R and use RStudio as my IDE (which can also be used for Python!).

Last year, I had the privilege of attending a few conferences (related to stats & machine learning) across the US, and one of the statements I kept hearing was that R cannot be used in production and that Python is way better. So in my job, I took it upon myself to prove that R could be used in a production setting.

And I proved just that!

How I did it (at a very high level)

Believe it or not, my overall methodology (in terms of R packages) was not very complex. It essentially consisted of the following 5 steps:

  1. Data wrangling & analysis with {tidyverse}

  2. Data modeling with {tidymodels} & the associated packages for models such as:

    • random forest: {ranger}
    • support vector machines: {e1071}
    • deep learning: {keras}, {reticulate}
    • xgboost: {xgboost}
  3. Develop package & create bundle

    • this is a key step
  4. Generate APIs with {plumber} to interact with the R functions within the package

  5. Deploy via Docker
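
To make steps 1 and 2 a little more concrete, here is a minimal sketch of a {tidymodels} workflow with a {ranger} random forest. It is only an illustration: the built-in iris data stands in for the real sensor data, and the seed, split proportion, and number of trees are assumptions, not the settings of the production models.

```r
library(tidymodels)  # attaches rsample, parsnip, workflows, yardstick, dplyr, ...

set.seed(2020)

# Stand-in data: iris replaces the real (non-public) sensor data.
split      <- initial_split(iris, prop = 0.8, strata = Species)
train_data <- training(split)
test_data  <- testing(split)

# Model specification: a random forest fitted by the {ranger} engine.
rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# A workflow bundles the formula (or recipe) and the model together.
rf_workflow <- workflow() %>%
  add_formula(Species ~ .) %>%
  add_model(rf_spec)

rf_fit <- fit(rf_workflow, data = train_data)

# Predict on held-out data and keep the true label alongside for scoring.
predictions <- predict(rf_fit, new_data = test_data) %>%
  bind_cols(test_data %>% select(Species))

accuracy(predictions, truth = Species, estimate = .pred_class)
```

Swapping in another engine (for example {xgboost}) mostly means changing the model specification; the rest of the workflow stays the same.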

Lesson Learned: Creating a Package is Key

By creating an internal package (i.e., not releasing it to CRAN), I was able to put all of the necessary functions and the appropriate documentation in one bundle. This keeps reliable documentation at the source, so I don’t have to maintain separate documentation on another system (which would increase the amount of maintenance required from me).
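
As a rough illustration of documentation living next to the code, here is what a function inside such a package could look like with roxygen2 comments. The function name and its purpose are made up for this example; they are not taken from the actual package.

```r
#' Compute acceleration magnitude
#'
#' Hypothetical helper used only to illustrate documentation at the source.
#'
#' @param x,y,z Numeric vectors of acceleration along each axis.
#' @return A numeric vector of magnitudes, one per observation.
#' @examples
#' accel_magnitude(3, 4, 0)
#' @export
accel_magnitude <- function(x, y, z) {
  stopifnot(is.numeric(x), is.numeric(y), is.numeric(z))
  sqrt(x^2 + y^2 + z^2)
}
```

With the package set up for roxygen2, running devtools::document() turns these comments into help pages, and devtools::build() produces the bundle (a .tar.gz file) that can be committed or installed elsewhere.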

Deploying the package via Docker enables me to keep the models up to date with minimal post-modeling work. For instance, if I need to update a model, I simply commit an updated package bundle to the specific git repo, and an automated workflow kicks off and tells the Docker image to use the newer package. Internally, we keep track of which model version created each prediction.
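
One simple way to track which model version produced a prediction is to attach the package version to every result. The sketch below is an assumption about how that could look, not a description of the actual setup; tag_prediction() and the package name in the comment are hypothetical.

```r
# Hypothetical helper: wrap a prediction with the model version and a timestamp.
tag_prediction <- function(prediction, model_version) {
  list(
    prediction    = prediction,
    model_version = as.character(model_version),
    predicted_at  = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  )
}

# Inside a real package, the version could be read from its DESCRIPTION file:
#   utils::packageVersion("safetymodels")   # "safetymodels" is a made-up name
# Here a version is constructed by hand so the sketch runs on its own.
tagged <- tag_prediction("slip_trip_fall", package_version("1.2.0"))
str(tagged)
```

Because the version comes from the package itself, bumping the version in DESCRIPTION when a model is retrained is enough to make old and new predictions distinguishable.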

Lesson Learned: Working with Dev team early on prevented many issues

For my plumber APIs to be effective, I worked with the dev team early on to understand what information would be sent to my models and how. This ensured that I built in the appropriate steps to handle errors and transformations so that my models could run error-free. Without this early interaction, I could not have understood what my API needed to do or how it needed to do it. This is also how I learned that Docker containers would be the most prudent way to get a stable, consistent, and predictable environment for my APIs. Furthermore, I knew exactly what the expected outputs needed to be, so I could optimize my code to run quickly & efficiently. In the end, once data are received by my API, the outputs are available in less than 300 milliseconds, which is more than acceptable for our business needs.
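
To show the kind of error handling and transformation described above, here is a minimal sketch of a plumber file. Everything specific in it is an assumption for illustration: the field names (accel_x, accel_y, accel_z), the helper names, and especially classify_event(), which is a made-up threshold rule standing in for a real model. It also assumes plumber 1.0 or later, where the parsed request body is available as req$body.

```r
# plumber.R (hypothetical file; field names and the scoring rule are assumptions)

# Check that the payload has every required field and coerce it to numeric.
# Returns list(data = <one-row data frame>) or list(error = <message>).
prepare_input <- function(body, required = c("accel_x", "accel_y", "accel_z")) {
  missing_fields <- setdiff(required, names(body))
  if (length(missing_fields) > 0) {
    return(list(error = paste("Missing fields:",
                              paste(missing_fields, collapse = ", "))))
  }
  values <- suppressWarnings(
    vapply(body[required], function(v) as.numeric(v)[1], numeric(1))
  )
  if (anyNA(values)) {
    return(list(error = "All fields must be numeric"))
  }
  list(data = as.data.frame(as.list(values)))
}

# Placeholder for the real model: a made-up threshold rule so the sketch runs.
classify_event <- function(data) {
  magnitude <- sqrt(data$accel_x^2 + data$accel_y^2 + data$accel_z^2)
  if (magnitude > 2) "possible_slip_trip_fall" else "no_event"
}

#* Classify a single motion event
#* @post /predict
#* @serializer unboxedJSON
function(req, res) {
  started <- Sys.time()
  input <- prepare_input(req$body)
  if (!is.null(input$error)) {
    res$status <- 400  # bad request: tell the caller what was wrong
    return(list(error = input$error))
  }
  list(
    prediction = classify_event(input$data),
    elapsed_ms = as.numeric(difftime(Sys.time(), started, units = "secs")) * 1000
  )
}

# To serve locally (requires {plumber}):
# plumber::plumb("plumber.R")$run(port = 8000)
```

Keeping validation in its own function means it can be unit tested without starting the API, and returning elapsed_ms with each response gives a cheap way to check the latency against a target such as 300 milliseconds.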

R can be used for production

Anyone who says that R cannot be used in a production setting doesn’t really understand how it can be done. I’m not arguing that R models are the best to use, but they can be deployed. In fact, Jacqueline Nolis from T-Mobile and her team were able to deploy a TensorFlow model using R & Docker. Now that is awesome!