MLflow in Five

MLflow Explained With Five Questions, In Five Minutes, Like You’re Five

Do you think the machine learning lifecycle sounds like a type of very intelligent bike? Were you once an AI artificer but just haven’t dusted off the old scikit in a while? Before you type ‘pip install mlflow’ into your console and jump into MLflow’s documentation, take a moment to read this quick refresher that covers the What, Who, Why, When, and Where of MLflow.

What?

MLflow is an open-source, open-interface, end-to-end Machine Learning (ML) platform that helps engineers manage the ML lifecycle, including for projects built on distributed datasets.

Who?

Initially developed by Matei Zaharia, the creator of Apache Spark™, MLflow was expanded by a specialist Machine Learning and AI team at Databricks. The project has since joined the non-profit Linux Foundation, allowing for further community-led open collaboration.

Why?

There are hundreds of tools for every phase of the ML lifecycle, and a thorough engineering job often means trying many algorithms to see which ones improve results. This means productionising multiple libraries and keeping track of which code, data, and parameters went into which test-case model. Combine this with the broad spectrum of environments your model will need to deploy in, and you have a recipe for wasted time, overspend, and risky, unstable deployments.

MLflow tackles these issues with its five key components:

  1. MLflow Tracking logs and compares run parameters, metrics, and outputs to ease initial experimentation.
  2. MLflow Projects packages data science code in a reusable, reproducible format that can be run straight from a Git repo.
  3. MLflow Models packages ML models in multiple formats, known as flavours, for quick and flexible deployment.
  4. MLflow Model Registry organises models into a central repository with easy API access and a UI built for collaborative working.
  5. MLflow Pipelines (still experimental) uses predefined model templates, auto-reruns, and a Git-integrated pipeline to move models quickly from development to deployment.

Further, native Databricks integration means MLflow slots right into Azure stacks through Azure Databricks, allowing model data ingestion or output delivery via flexible cloud infrastructure that preserves performance and controls costs. This also makes it simpler to work with distributed datasets and teams.

Finally, a commitment to open computer science through partnership with the Linux Foundation begets transparency and peer review. This means that MLflow is free and accessible, and it helps the tool stay cutting edge, with problems quickly addressed through community engagement.

When?

MLflow is typically used across the entire ML lifecycle.

  1. First, MLflow allows developers, scientists, and engineers to track model experimentation in the early stages of construction. 
  2. Next, models built with a range of libraries can be packaged into projects to allow sharing and transfer to production.
  3. These production-stage models can then easily be served to users by hosting them as REST endpoints. 
  4. This model lifecycle is easily managed using a registry, letting users codify and monitor model development and versioning from staging to production. 

Where?

The major use cases for MLflow are in the world of Data Science. Big Data projects need MLflow Tracking to log and compare model outcomes from potentially large and dispersed teams of scientists. MLflow Projects and MLflow Models are then needed to package, share, and deploy models with their dependent libraries across teams. Individual data scientists are also key MLflow users, benefitting from its organisational capabilities even on smaller-scale projects.

In the analytics industry, MLflow is also favoured by large organisations and their production engineers. MLflow Model Registry permits easy collaboration between and within different project verticals, facilitating thorough peer review and expediting reuse in productised data analytics pipelines, which are themselves streamlined via the aptly named MLflow Pipelines.

Summary

To summarise, MLflow is a powerful platform from the Databricks team and the Linux Foundation that enables end-to-end management of ML projects and is used by a diverse userbase. This next-gen tool champions transparent and open code, standards, and software whilst revolutionising the ML lifecycle. MLflow is a must-have for a modern ML project, and a game-changing secret weapon for any ML engineer.

To learn more about the MLflow platform, follow the project on Twitter or head to its public Slack channel.

