We recently hosted a call with Karthik Bharathy, Director of AI/ML services at AWS, who spoke on recent strategic announcements and AWS capabilities – specifically on how AWS’ AI/ML offerings look, how you can leverage them, and how to best work with the AWS team as a startup.
At AWS, the primary goal today is figuring out how businesses can take advantage of generative AI, beyond just simple text and chat use cases. What led to this moment was the massive proliferation of data and compute: both became available at low cost and at very large scale. As a result, machine learning has seen a ton of innovation over the past 2-3 years, which has tremendously accelerated prior efforts.
Generative AI is a fundamental paradigm shift from the AI of the past – you’re trying to generate new content, powered by foundation models, which are pre-trained ML models leveraging vast amounts of data. What’s important is that these foundation models can be customized for specific use cases, giving them more power and relevance.
From an AWS standpoint, ML innovation is in their DNA – going all the way back to e-commerce recommendations, or picking the routes where packages can be stored, Alexa, Prime Air, or even Amazon Go, where you have a physical retail experience. These products already incorporate machine learning and are backed by foundation models.
Today, there are over 100K customers across different geographies and verticals, all using AWS for machine learning, and all of these customers are already in production.
There are four key considerations for startups when spinning up ML capabilities:
Amazon Bedrock is a fully managed service that offers high-performing foundation models from companies like AI21 Labs, Cohere, Anthropic, and Stability AI, as well as Amazon’s own foundation models (Titan models).
This is a serverless offering, meaning you don’t have to manage any infrastructure when you access these models – all you have to do is use APIs to interact with them. You can also customize these models with your own data (e.g. fine-tuning or RAG).
You can use an on-demand mode, where pricing is driven by the number of input and output tokens, so you can project costs from your current application needs (vs. future application needs). Whether the underlying hardware comes from NVIDIA, AWS Inferentia, etc. – it’s all under the hood – you’re not exposed to instances. There are even capabilities like an instance recommender that can suggest the best instance for a given model – it really just comes down to the use case.
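To make the "just use APIs" point concrete, here is a minimal sketch of calling a Bedrock-hosted model from Python. It assumes the boto3 `bedrock-runtime` client and its `invoke_model` call; the model ID and the Titan-style request and response fields are assumptions for illustration (each model family defines its own body format), and the helper names are hypothetical.

```python
import json


def build_titan_request(prompt, max_tokens=256, temperature=0.5):
    """Build a JSON body in the shape assumed here for Titan text models."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": max_tokens,
            "temperature": temperature,
        },
    })


def generate_text(client, prompt, model_id="amazon.titan-text-express-v1"):
    """Send a prompt through a Bedrock runtime client and return the text.

    `client` is expected to behave like boto3.client("bedrock-runtime").
    The model ID is an assumption; use one enabled in your own account.
    """
    response = client.invoke_model(
        modelId=model_id,
        body=build_titan_request(prompt),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream of JSON bytes.
    payload = json.loads(response["body"].read())
    return payload["results"][0]["outputText"]
```

With boto3 installed and AWS credentials configured, this would be called as `generate_text(boto3.client("bedrock-runtime"), "Summarize our Q3 notes")`. Passing the client in as a parameter keeps the function easy to exercise against a stub, since no infrastructure is involved on the caller's side.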
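As a rough illustration of projecting on-demand cost from token counts, the sketch below multiplies input and output tokens by per-1,000-token rates. The rates in the usage note are made-up placeholders, not real AWS prices, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Per-1,000-token rates in dollars (placeholder values, not AWS prices)."""
    input_per_1k: float
    output_per_1k: float


def estimate_request_cost(pricing, input_tokens, output_tokens):
    """Cost of one request: input and output tokens are billed separately."""
    input_cost = (input_tokens / 1000) * pricing.input_per_1k
    output_cost = (output_tokens / 1000) * pricing.output_per_1k
    return input_cost + output_cost


def estimate_monthly_cost(pricing, requests_per_day, avg_input_tokens,
                          avg_output_tokens, days=30):
    """Project a monthly bill from current average traffic."""
    per_request = estimate_request_cost(
        pricing, avg_input_tokens, avg_output_tokens
    )
    return requests_per_day * days * per_request
```

For example, with placeholder rates of `TokenPricing(input_per_1k=0.001, output_per_1k=0.002)`, a request with 2,000 input tokens and 500 output tokens comes to about $0.003, and 1,000 such requests per day projects to roughly $90 over 30 days.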
There are many different models today, and this list will continue to expand. You can try them all in a sandbox environment via the AWS console.
The Bedrock service is HIPAA compliant, so you can use the models in GA along with Bedrock for your production use cases.
Getting started with Bedrock:
In terms of pricing there are three different dimensions:
If you’re an ML practitioner who wants to try out an open-source model and have access to the instances, you can use SageMaker JumpStart. This is a model hub where you can access Hugging Face models (or other models) directly within SageMaker and deploy them to an inference endpoint. Fine-tuning is different from the Bedrock experience – you work via a notebook where you make changes to the model in a more hands-on way, so if you’re an ML practitioner who is very familiar with the different techniques of fine-tuning, it gives you a lot of knobs and flexibility on how you can build, train, and deploy models on SageMaker.
Foundation models, however, are just one piece of the puzzle. From a process perspective, there’s a lot more to the orchestration piece. Users typically want a task to be accomplished (vs. just interacting with data), so if you have an application, but you just want to book your vacation, that’s going to involve a series of steps (e.g. understanding the different prices, selecting the different options, etc.). So, it’s a process in and of itself, and it involves more than just interaction with the model – you must also interact with the data, a bunch of APIs on the back-end, and so on. At the same time, you want to ensure that security is tight, because while there’s orchestration, you also want to meet your enterprise cloud policies. This can take a number of weeks if you do it on your own, and Amazon Bedrock has just announced “Agents” to make this a lot simpler.
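To illustrate why orchestration is more than a single model call, here is a toy sketch of the vacation example: a task is a sequence of steps, each step calls a back-end function, and a policy check gates which steps may run. This is not how Bedrock Agents are implemented; every name below is hypothetical, and the "APIs" are stand-in functions.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]


def compare_prices(ctx: Context) -> Context:
    # Hypothetical back-end call: choose the cheapest of the known offers.
    offers = ctx["offers"]
    return {**ctx, "chosen": min(offers, key=offers.get)}


def book_trip(ctx: Context) -> Context:
    # Hypothetical booking API: record a confirmation for the chosen offer.
    return {**ctx, "confirmation": f"booked:{ctx['chosen']}"}


def run_task(steps: List[Step], allowed: Set[str], ctx: Context) -> Context:
    """Run steps in order, refusing any step the policy does not allow."""
    for name, action in steps:
        if name not in allowed:
            raise PermissionError(f"step '{name}' is not permitted by policy")
        ctx = action(ctx)
    return ctx
```

Running `run_task` with both steps allowed and `{"offers": {"A": 500, "B": 420}}` as the starting context selects offer `B` and returns a context containing `"confirmation": "booked:B"`. Doing this by hand means writing and securing each step yourself, which is the work a managed agent service aims to take on.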
How do Amazon Bedrock Agents work to enable generative AI applications to complete tasks in just a few clicks?
Differentiating with your data is key – it’s pretty evident, but while foundation models can do a lot of things out of the box, their impact is vastly amplified once they are fine-tuned with your data sources. Net-net, data is your differentiator. You’ll get higher accuracy for the specific tasks that you’re after. All you need to do is point to the repository of your custom data, then point that to the foundation model. A training run then produces a fine-tuned version of the model that you can use in your application.
The customer data you provide to fine-tune the model is only used for your own, newly made, fine-tuned model. It won’t ever be used to improve the underlying model that Amazon is providing. AWS can’t access your data and doesn’t intend to use it for improving its own service. Everything is generated in your VPC, so there are enough guardrails in place on who can access the data or the model. In fact, whenever a model is being used, e.g. a proprietary model, the model weights are protected so that consumers of the model don’t get access to the weights. At the same time, the data that’s being used to fine-tune the model is not available to AWS or the model provider.
It’s important to have a comprehensive data strategy that augments your gen AI applications. You have a variety of different data sources, and in the case of structured or vector data, you may sometimes want to label the data. There are services in AWS for labeling data, which will give you more accurate results when you fine-tune your model. Then, of course, you may also need to integrate multiple datasets. There are capabilities for ETL, so you can connect all your different data sources. Data and ML governance capabilities are also available as you build out your application.
When you’re trying to run foundation models, it’s important that you have the most performant infrastructure at the lowest cost. AWS supports many of the foundation models with a choice of hardware: for hosting and training foundation models you can use GPUs like the A100s and the H100s, and there is also custom AWS silicon available, AWS Inferentia2 and Trainium.
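As a sketch of what "pointing to a repository of your custom data" can involve, the snippet below turns prompt/completion pairs into JSON Lines text, a common layout for fine-tuning datasets. The `prompt` and `completion` field names are an assumption; check the format your chosen model expects. The function name is hypothetical, and uploading the result to your own storage location is left out.

```python
import json


def to_jsonl(examples):
    """Serialize prompt/completion pairs as one JSON object per line.

    Field names are assumed for illustration; confirm the schema that
    the model you are fine-tuning requires.
    """
    lines = []
    for i, example in enumerate(examples):
        prompt = example["prompt"].strip()
        completion = example["completion"].strip()
        if not prompt or not completion:
            # Empty fields add noise to training, so fail fast instead.
            raise ValueError(f"example {i} has an empty prompt or completion")
        record = {"prompt": prompt, "completion": completion}
        lines.append(json.dumps(record, ensure_ascii=False))
    # Each record ends with a newline; an empty dataset yields an empty string.
    return "".join(line + "\n" for line in lines)
```

Calling `to_jsonl` on a list of two examples yields two lines of JSON, which you would then write to a file in the data repository that the fine-tuning job reads from.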
Another fantastic way to leverage Amazon is via targeted applications that enhance productivity. These are a few popular options:
Amazon has heavily leveraged Coursera to provide a comprehensive suite of offerings on the topic.
You can choose the service that you want to learn about and your role in your organization, and it gives you a bunch of resources to learn more. And should you have specific asks on a Bedrock workshop or want to learn about SageMaker JumpStart, AWS has hands-on workshops where their specialists will engage with you and help set things up. AWS is also investing a large amount of money into their Generative AI Innovation Center, which will connect you with ML and AI experts, so that if you have an idea, they can help you transform it into a generative AI solution.
Karthik is a product and engineering leader with experience in product management, product strategy, execution, and launch. He manages a team of product managers, TPMs, UX designers, Business Development Managers, Solution Specialists, and engineers on Amazon SageMaker. He has incubated and grown several businesses, including Amazon Neptune at AWS and PowerApps, Windows Azure BizTalk Services, and SQL Server at Microsoft, and has shipped across all layers of the technology stack – graphs, relational databases, middleware, and low-code/no-code app development.
Karthik holds a master’s degree in business and an undergraduate degree in computer engineering.