Structure of a machine learning project

submitted by
Style Pass
2021-07-15 23:30:07

Building Python Machine Learning projects that are both maintainable and easy to deploy is a hard job. Managing data pipelines, training multiple models, versioning, and deploying to production can each become a pain in the neck, and everything can get out of control very quickly if not handled properly.

Thankfully, there are now many cloud services to choose from for training models and deploying them into production, such as AWS SageMaker, Google’s AI and Machine Learning, and Azure’s Machine Learning. However, all of these services encourage the use of Jupyter Notebooks and isolated Python scripts, which leads to projects that are hard to maintain, full of repeated boilerplate code, and lacking good practices for teamwork and collaboration.

In this article, we share an opinionated overview of how to structure the code and processes of an ML project while following good old-fashioned software engineering practices. In particular, we focus on using AWS SageMaker for training and production inference. Even if you don’t use AWS SageMaker, we believe you’ll still find the content of this article handy.