Remember when modern infrastructure meant provisioning software in a few virtual machines with Chef? Or managing the lifecycle of a couple of VMs using Terraform? As an industry, we don’t live in that world anymore.
Today’s most successful development teams have moved beyond managing a dozen or a hundred cloud infrastructure components, and instead have to think about thousands of cloud resources. In the modern world of containers and Kubernetes, environments are huge in scale and complexity, the rate of change is far faster, and the division between application and infrastructure has become blurred.
In my August 2020 article, “How to choose a cloud machine learning platform,” my first guideline for choosing a platform was, “Be close to your data.” Keeping the code near the data is necessary to keep the latency low, since the speed of light limits transmission speeds. After all, machine learning, especially deep learning, tends to go through all your data multiple times (each time through is called an epoch).
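To put rough numbers on the latency argument, here is a back-of-the-envelope sketch in Python. It assumes light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), and that every batch is fetched with one blocking round trip, with no prefetching or pipelining. The distance, batch count, and epoch count are hypothetical values chosen only for illustration, and the function names are my own. Bandwidth, routing, and processing overhead are ignored, so real delays would be higher.

```python
# Approximate speed of light in optical fiber (assumption: ~2/3 of c).
FIBER_SPEED_KM_PER_S = 200_000.0


def round_trip_seconds(distance_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2.0 * distance_km / FIBER_SPEED_KM_PER_S


def total_wait_seconds(distance_km: float, batches_per_epoch: int, epochs: int) -> float:
    """Propagation delay accumulated when each batch costs one blocking round trip."""
    return round_trip_seconds(distance_km) * batches_per_epoch * epochs


# Hypothetical scenario: data 4,000 km from the training code,
# 10,000 batches per epoch, 50 epochs.
rtt = round_trip_seconds(4_000)
total = total_wait_seconds(4_000, 10_000, 50)

print(f"Round trip: {rtt * 1000:.0f} ms")
print(f"Total propagation wait: {total / 3600:.1f} hours")
```

Under these assumptions, a 40 ms round trip repeated across every batch of every epoch adds up to several hours of pure waiting, which is the cost that building the model where the data resides avoids.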
I said at the time that the ideal case for very large data sets is to build the model where the data already resides, so that no mass data transmission is needed. Several databases support that to a limited extent. The natural next question is, which databases support internal machine learning, and how do they do it? I’ll discuss those databases in alphabetical order.