Select providers
Continue makes it easy to use different providers for serving your chat, autocomplete, and embeddings models.
To select the ones you want to use, add them to your config.json.
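Each of these roles gets its own entry in config.json. Below is a minimal sketch of the overall shape; the field names ("models", "tabAutocompleteModel", "embeddingsProvider") reflect Continue's config schema, while the provider identifiers and model names are placeholders you would swap for your own choices:

```json
{
  "models": [
    {
      "title": "Chat model",
      "provider": "ollama",
      "model": "llama3"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete model",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```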
Self-hosted
Local
You can run a model on your local computer using:
- Ollama
- LM Studio
- Llama.cpp
- KoboldCpp (OpenAI compatible server)
- llamafile (OpenAI compatible server)
- LocalAI (OpenAI compatible server)
- Text generation web UI (OpenAI compatible server)
- FastChat (OpenAI compatible server)
- llama-cpp-python (OpenAI compatible server)
- TensorRT-LLM (OpenAI compatible server)
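Because many of these servers (KoboldCpp, llamafile, LocalAI, and so on) expose an OpenAI-compatible API, they can typically be configured through the generic OpenAI-compatible provider by pointing apiBase at the local server. A rough sketch, assuming the server listens on port 8000; the model name is a placeholder:

```json
{
  "models": [
    {
      "title": "Local OpenAI-compatible server",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8000/v1"
    }
  ]
}
```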
Remote
You can deploy a model in your AWS, GCP, Azure, or other clouds using:
- HuggingFace TGI
- vLLM
- SkyPilot
- Anyscale Private Endpoints (OpenAI compatible API)
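A deployed vLLM server, for example, also serves an OpenAI-compatible API, so the same pattern applies with apiBase pointed at your cloud endpoint. A sketch, with the URL and model name as placeholders:

```json
{
  "models": [
    {
      "title": "vLLM on my cloud",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "https://my-vllm-endpoint.example.com/v1"
    }
  ]
}
```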
SaaS
Open-source models
You can deploy open-source LLMs on a service using:
- Together
- HuggingFace Inference Endpoints
- Anyscale Endpoints (OpenAI compatible API)
- Replicate
- Deepinfra
- Groq (OpenAI compatible API)
- AWS Bedrock
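Hosted open-source models are configured the same way, with the service's provider identifier and an API key. A sketch using Together as an example; the provider value and model name here are illustrative and should be replaced with the identifiers your service documents:

```json
{
  "models": [
    {
      "title": "Llama 3 70B (Together)",
      "provider": "together",
      "model": "llama3-70b",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```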
Commercial models
You can use commercial LLMs via APIs using:
- Anthropic API
- OpenAI API
- Azure OpenAI Service (OpenAI compatible API)
- Google Gemini API
- Mistral API
- Voyage AI API
- Cohere API
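Commercial APIs follow the same pattern, with an apiKey in place of a self-hosted endpoint. A sketch assuming the "anthropic" provider identifier; the model name is a placeholder for whichever model you choose:

```json
{
  "models": [
    {
      "title": "Claude 3 Sonnet",
      "provider": "anthropic",
      "model": "claude-3-sonnet-20240229",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```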
In addition to selecting providers, you will also need to decide which models to use.