Jack Pearce

Deploy Open Source Models on Paperspace with OpenAI API Compatibility

Here’s a straightforward method to set up your own OpenAI-compatible API endpoint on Paperspace Gradient. This will allow you to deploy open source models and integrate them with tools built for the OpenAI API.

Paperspace is a cloud-based machine learning platform that offers GPU-powered virtual machines and a Kubernetes-based container service. Paperspace Deployments are a containers-as-a-service offering that lets you run container images and serve machine learning models through a high-performance, low-latency service with a RESTful API.

  1. Set Up Your Paperspace Account:

    • Sign in to Paperspace, then navigate to Gradient > Deployments > Create.
  2. Select Your Hardware:

    • Choose a GPU for your deployment, such as the P4000, available at $0.51/hr. Remember, you can start and stop the GPU as needed to manage costs.
  3. Configure Your Docker Image:

    • Use the Docker image ollama/ollama:latest. For more details on this image, visit the Ollama GitHub page.
  4. Set the Ports:

    • Specify the port as 11434 for your deployment.
  5. Deployment and Access:

    • Upon deployment, you’ll receive an HTTPS endpoint.
    • Pull a new model image, for example, llama3, using the following command:
      curl https://<yourendpoint>.paperspacegradient.com/api/pull -d '{"name": "llama3"}'
  6. You’re done! You now have an OpenAI-compatible API endpoint available at your Gradient URL, ready to use with any tool that speaks the OpenAI API.
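To illustrate the final step, here is a minimal sketch of calling the endpoint as an OpenAI-style chat completion using only Python's standard library. It assumes recent Ollama builds, which expose OpenAI-compatible routes under /v1, and that the llama3 model pulled above is available; the base URL is a placeholder you should replace with your own Gradient endpoint.

```python
import json
import urllib.request

# Placeholder: substitute the HTTPS endpoint Paperspace assigns to your deployment.
BASE_URL = "https://<yourendpoint>.paperspacegradient.com"


def build_chat_request(base_url, model, messages):
    """Build an OpenAI-style chat completion request for the Ollama endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, messages):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, model, messages)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat(BASE_URL, "llama3", [{"role": "user", "content": "Hello!"}]))
```

Because the request is plain OpenAI-format JSON, the official OpenAI client libraries also work here if you point their base URL at your endpoint's /v1 path.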

Paperspace Deployment Configuration JSON:

  "apiVersion": "v1",
  "image": "ollama/ollama:latest",
  "name": "ollama",
  "enabled": false,
  "resources": {
    "machineType": "RTX4000",
    "replicas": 1,
    "ports": [