Best way to deploy Open Source LLM Models

Paras Madan

4 min read

Deploying open-source models can be a challenging task, especially when you factor in privacy, security, and cost-effectiveness. With the right knowledge and platforms, however, the process becomes significantly easier. I did extensive research on pricing, speed, and several other factors such as privacy and control, and this post lays out the best ways I found to deploy an open-source LLM (as of May 2024).


I am classifying deployments into two categories: self-managed deployments and hosted APIs.

Self-Managed Deployment for LLMs

Deploying Large Language Models (LLMs) in a self-managed environment can be a challenging yet rewarding task. This approach provides full control over the model, giving you the flexibility to modify, train, or fine-tune it. The challenges include higher latency, extra cost, and the MLOps expertise required for a successful deployment.

How does this method work?

This method usually involves provisioning a GPU instance yourself (such as a large EC2 or Azure VM) and then following a manual deployment process: installing Python, downloading the model weights from Hugging Face, installing all the dependencies, configuring firewall or nginx routes, and finally exposing an endpoint that is ready for use. A sketch of that final serving step follows below.
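For concreteness, here is a minimal sketch of what that serving step might look like, using Hugging Face transformers with FastAPI. This is an illustration under assumptions, not a prescribed setup: the model name, port, and endpoint path are placeholders, and any open-source causal LM works the same way.

```python
# Minimal self-managed serving sketch (illustrative, not production-ready).
# Loads an open-source model from Hugging Face and exposes it over HTTP.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Downloads the weights on first run; needs a GPU with enough VRAM.
# The model name is a placeholder -- swap in whichever open-source LLM you use.
# device_map="auto" requires the accelerate package.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 256

@app.post("/generate")
def generate(prompt: Prompt):
    out = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": out[0]["generated_text"]}

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
# then put nginx or firewall rules in front, as described above.
```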

Pros and Cons of this method:

Pros:

1. Full Control: Self-managed deployment provides full control over the model, allowing for greater flexibility to modify, train, or fine-tune the model.

2. Deeper Understanding: Running the model yourself gives you firsthand insight into its strengths and limitations, which helps you leverage its capabilities effectively and build innovative applications across diverse fields.

Cons:

1. Cost and Latency: Longer prompts increase the cost of inference, while longer outputs directly increase latency. Keep in mind that any cost and latency analysis for LLMs can quickly become outdated because the field evolves so rapidly.

2. Resource Intensive: Self-managed deployment of LLMs can be resource-intensive, requiring significant computational power and storage capacity. This might not be feasible for all organizations, especially smaller ones with limited resources.

Hosted Open Source LLM APIs

This method involves using platforms such as Together AI and Replicate, which provide APIs for these open-source models at a very effective price. All you need to do is use their code block and change the name of the model you want to use; that's it. So whether a new model is released tomorrow or the day after, your code stays the same (see the sketch after this list). Here are some hosted open-source LLM APIs:

1. Together AI

2. Replicate

3. Deep Infra

4. Perplexity

5. AWS Bedrock
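To make this concrete, here is a small sketch of the hosted-API approach using Together AI's OpenAI-compatible endpoint; the other providers above work similarly. Treat the model name and environment variable as assumptions: switching models is a one-line change.

```python
# Hosted-API sketch using Together AI's OpenAI-compatible endpoint.
# Assumes your API key is exported as TOGETHER_API_KEY; the model name
# is illustrative -- changing that one line is all it takes to try a newer model.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # point at another provider to switch hosts
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",  # the only line that changes per model
    messages=[{"role": "user", "content": "Explain self-managed vs hosted LLM deployment in two sentences."}],
)
print(response.choices[0].message.content)
```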

Pros and Cons of Hosted Open Source LLM APIs

Pros:

1. Ease of Use: Hosted APIs provide a platform for developers to leverage the power of LLMs without the need to manage the underlying infrastructure. This allows developers to focus on building applications and services that utilize the capabilities of LLMs.

2. Flexibility: These APIs often support a wide range of programming languages and platforms, making them versatile for different development environments.

3. Cost-Effective: Using hosted APIs can be more cost-effective than building and maintaining your own infrastructure for deploying LLMs.

Cons:

1. Limited Control: While hosted APIs provide ease of use, they may not offer the same level of control over the model as self-managed deployments. This could limit the ability to modify, train, or fine-tune the model.

2. Dependency: There is a dependency on the service provider for the availability and performance of the API. Any downtime or performance issues with the service provider can directly impact the applications and services using the API.

What's best for you?

Your final decision depends on your use case and overall cost. But if you ask me, my personal advice would be the following:

Hosted Open Source LLM APIs for:

— Personal projects

— Company projects where safety and privacy are not a huge concern

— Consumer-facing applications where speed matters and the budget allocated to the project is small

— Bootstrapped startups

Self-Managed Deployment for LLMs for:

— Projects involving financial and banking data (the BFSI sector)

— Projects where a company's internal documents are used and privacy and security are major concerns


If you like this blog, you should also check out the videos I make on Instagram: https://www.instagram.com/parasmadan.in/


In case of any queries, feel free to reach out to me on parasmadan555@gmail.com

Llm

Mlops

Open Source

Llama 2

artificial intelligence

© 2024 Paras Madan