Select providers
Continue makes it easy to use different providers for serving your chat, autocomplete, and embeddings models.
To select the ones you want to use, add them to your config.json.
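Each of these roles gets its own entry in config.json. Below is a minimal sketch of the overall shape; the field names ("models", "tabAutocompleteModel", "embeddingsProvider") reflect Continue's config schema, while the provider identifiers and model names are placeholders you would swap for your own choices:

```json
{
  "models": [
    {
      "title": "Chat model",
      "provider": "ollama",
      "model": "llama3"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete model",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```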
Self-hosted
Local
You can run a model on your local computer using:
- Ollama
- LM Studio
- Llama.cpp
- KoboldCpp (OpenAI compatible server)
- llamafile (OpenAI compatible server)
- LocalAI (OpenAI compatible server)
- Text generation web UI (OpenAI compatible server)
- FastChat (OpenAI compatible server)
- llama-cpp-python (OpenAI compatible server)
- TensorRT-LLM (OpenAI compatible server)
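Because many of these servers (KoboldCpp, llamafile, LocalAI, and so on) expose an OpenAI-compatible API, they can typically be configured through the generic OpenAI-compatible provider by pointing apiBase at the local server. A rough sketch, assuming the server listens on port 8000; the model name is a placeholder:

```json
{
  "models": [
    {
      "title": "Local OpenAI-compatible server",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8000/v1"
    }
  ]
}
```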
Remote
You can deploy a model in your AWS, GCP, Azure, or other clouds using:
- HuggingFace TGI
- vLLM
- SkyPilot
- Anyscale Private Endpoints (OpenAI compatible API)
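A deployed vLLM server, for example, also serves an OpenAI-compatible API, so the same pattern applies with apiBase pointed at your cloud endpoint. A sketch, with the URL and model name as placeholders:

```json
{
  "models": [
    {
      "title": "vLLM on my cloud",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "https://my-vllm-endpoint.example.com/v1"
    }
  ]
}
```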
SaaS
Open-source models
You can deploy open-source LLMs on a service using:
- Together
- HuggingFace Inference Endpoints
- Anyscale Endpoints (OpenAI compatible API)
- Replicate
- Deepinfra
- Groq (OpenAI compatible API)
- AWS Bedrock
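Hosted open-source models are configured the same way, with the service's provider identifier and an API key. A sketch using Together as an example; the provider value and model name here are illustrative and should be replaced with the identifiers your service documents:

```json
{
  "models": [
    {
      "title": "Llama 3 70B (Together)",
      "provider": "together",
      "model": "llama3-70b",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```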
Commercial models
You can use commercial LLMs via APIs using:
- Anthropic API
- OpenAI API
- Azure OpenAI Service (OpenAI compatible API)
- Google Gemini API
- Mistral API
- Voyage AI API
- Cohere API
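Commercial APIs follow the same pattern, with an apiKey in place of a self-hosted endpoint. A sketch assuming the "anthropic" provider identifier; the model name is a placeholder for whichever model you choose:

```json
{
  "models": [
    {
      "title": "Claude 3 Sonnet",
      "provider": "anthropic",
      "model": "claude-3-sonnet-20240229",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```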
In addition to selecting providers, you will also need to decide which models to use.