A summary of our experience constructing, hacking, and maintaining the machine learning infrastructure for our R&D team of 10 people. What’s the difference between a research ML environment and a production ML environment? The evolution of our workflow management. Encouraging a transparent and collaborative ML development culture. What’s better, the cloud or homebrew? Constrained resources, Docker, Kubernetes, and you. How we solved complex problems, e.g. enabling GPU overcommitting for JupyterHub. Deploying the scientific Python stack, TensorFlow, and PyTorch properly.