Drift Detection and Automated Model Versioning in AWS SageMaker
Samon Daniel
Ladoke Akintola University of Technology
DOI: 10.63665/ijmlaidse-y1f2a005
Abstract
Production machine learning models must be continuously monitored and managed to sustain their performance and reliability. This paper presents a unified methodology for automated drift detection and model version management within the AWS SageMaker ecosystem. We describe how the SageMaker Model Registry can be used to manage multiple versions of a model, and we review how automated pipelines can streamline deployment. We demonstrate how SageMaker Model Monitor can detect both data drift and model drift, enabling quicker responses to performance degradation. The research further shows that these components can be integrated into a low-cost, scalable system that maintains strong model governance and operational resilience. Experimental evaluation indicates that the proposed automation helps preserve model accuracy and stability over time. This work offers practical insights for practitioners seeking to leverage AWS services to automate the complete machine learning lifecycle using modern MLOps best practices.
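The baseline-versus-live comparison that Model Monitor automates can be illustrated with a small, self-contained sketch using the Population Stability Index (PSI), a common data-drift statistic. The bin count, threshold values, and sample data below are illustrative assumptions, not part of the paper's method or the SageMaker API:

```python
# Minimal PSI drift check -- a hedged sketch of the kind of
# baseline-vs-current distribution comparison Model Monitor performs.
import math

def psi(baseline, current, bins=10):
    """Population Stability Index over equal-width bins of a numeric feature."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    width = (hi - lo) / bins or 1.0

    def frac(sample, b):
        # Fraction of the sample falling in bin b (last bin includes hi).
        count = sum(
            1 for x in sample
            if lo + b * width <= x < lo + (b + 1) * width
            or (b == bins - 1 and x == hi)
        )
        return max(count / len(sample), 1e-6)  # floor avoids log(0)

    return sum(
        (frac(current, b) - frac(baseline, b))
        * math.log(frac(current, b) / frac(baseline, b))
        for b in range(bins)
    )

baseline = [0.1 * i for i in range(100)]        # training-time distribution
shifted = [0.1 * i + 4.0 for i in range(100)]   # live traffic with shifted mean

print(psi(baseline, baseline) < 0.10)  # identical samples: no drift
print(psi(baseline, shifted) > 0.25)   # 0.25 is a commonly cited drift threshold
```

In a SageMaker deployment, a statistic like this (computed against a baseline captured at training time) would feed the alerting that triggers re-evaluation or re-registration of a new model version.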
Keywords
Automated Model Versioning, Drift Detection, AWS SageMaker, Machine Learning Operations (MLOps), Model Registry, Model Monitoring, Data Drift, Concept Drift, Continuous Integration/Continuous Deployment (CI/CD), Model Governance