Rise of the Machine Learning Engineer

February 01, 2018
2 Minute Read

Photo by Andy Kelly on Unsplash

DRAFT

To make great products: do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.

– Martin Zinkevich

Over the past few years, data science has rightfully been the most hyped career track. Numerous online courses, graduate programs, bootcamps and books have been launched on the topic. Most of them silo the role into business intelligence and statistical modelling, focused mostly on building models in sandboxes. This is far from the original ambition of the craft: people building products that leverage data.

Most recently, fast.ai and Andrew Ng’s Deep Learning Specialization on Coursera have received well-deserved attention.

Despite all the hard work being put into spreading machine learning knowledge, we see companies (those that are not AMZN, GOOG, FB, MSFT or the like) struggle to build machine-learning-backed features and products.

At most companies, the machine learning lifecycle consists of data scientists developing models offline and handing them off to data engineers, who productionize them and integrate them with the rest of the (microservice) ecosystem, at times using a completely different implementation technology (language or compute framework). This slows down the entire process for the company.

The data engineers want to automate these engineering handoffs, deployments and integrations with the rest of the enterprise ecosystem. The data scientists want confidence in the model’s effectiveness and its performance at scale. Most importantly, both data scientists and data engineers want to move up the machine learning maturity scale so they can adapt to the changing environment of the problem:

Today, ‘bias’ is a bug in data-driven products, and models need to be secure, transparent and localized. We need machine learning to monitor models at scale.

Over the past few years, machine learning has been dominated by quant-heavy researchers developing novel methods. However, as data sizes increase exponentially, these models can be trained

If you are based in Vancouver, Canada, and are interested in applying core computer science to run machine learning systems at scale in production, drop by one of our regular Vancouver Production ML Meetups. We have recently covered tools such as Apache Spark, ModelDB, MLeap and Apache MXNet, and we are trying to untangle the cutting-edge research being done in this area by pioneers such as RISELab, the Google Brain team and Stanford DAWN. Our goal is to reduce the time it takes teams to get their models into production at scale, and eventually to build pipelines where models are trained, deployed, tuned and selected in an online manner, following best practices and staying wary of “The High-Interest Credit Card of Technical Debt”.