Introduction

Rubeus is an unopinionated edge worker for building with Large Language Models (LLMs). Catering to a range of LLM providers, Rubeus extends beyond a unified API, becoming a powerful ally that expertly handles retries, fallbacks, and load distribution. The essence of Rubeus isn't merely about initiating requests; it's about ensuring those requests are handled intelligently and efficiently. With Rubeus, you're harnessing the power of language models, Axios-style! 💼🚀

Key Features

Supported Providers

ProviderSupport StatusSupported Endpoints
OpenAI Supported/completion, /embed
Anthropic Supported/complete
Cohere Supportedgenerate, embed
Google Bard Coming Soon
LocalAI Coming Soon

Getting Started

npm install
npm run dev # To run locally
npm run deploy # To deploy to Cloudflare

The local server runs on http://localhost:8787 by default; this is the base URL for all requests.

Usage

Interoperability

Rubeus allows you to switch between large language models from various providers, making it a highly flexible tool. The following example shows a request to OpenAI, but you could change the provider name to cohere, anthropic, or others, and Rubeus will automatically handle everything else.

curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data-raw '{
    "config": {
        "provider": "openai"
    },
    "params": {
        "prompt": "What are the top 10 happiest countries in the world?",
        "max_tokens": 50,
        "model": "text-davinci-003",
        "user": "jbu3470"
    }
}'

Fallback Strategies

In case one provider fails, Rubeus is designed to automatically switch to another, ensuring uninterrupted service.

# Fallback to anthropic if openai fails (this request uses the default text-davinci-003 and claude-v1 models)
curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data-raw '{
    "config": {
        "mode": "fallback",
        "options": [
          {"provider": "openai"}, 
          {"provider": "anthropic"}
        ]
    },
    "params": {
        "prompt": "What are the top 10 happiest countries in the world?",
        "max_tokens": 50,
        "user": "jbu3470"
    }
}'


# Fallback to gpt-3.5-turbo when gpt-4 fails
curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data-raw '{
    "config": {
        "mode": "fallback",
        "options": [
          {"provider": "openai", "params_to_override": {"model": "gpt-4"} }, 
          {"provider": "anthropic", "params_to_override": {"model": "gpt-3.5-turbo"} }
        ]
    },
    "params": {
        "messages": {"role": "user", "content": "What are the top 10 happiest countries in the world?"},
        "max_tokens": 50,
        "user": "jbu3470"
    }
}'

Retry Strategies

Rubeus has a built-in mechanism to retry failed requests, eliminating the need for manual re-runs.

# Add the retry configuration to enable exponential back-off retries
curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data-raw '{
    "config": {
        "mode": "single",
        "options": [{
            "provider": "openai",
            "retry": {
                "attempts": 3,
                "onStatusCodes": [429,500,504,524]
            }
        }]
    },
    "params": {
        "prompt": "What are the top 10 happiest countries in the world?",
        "max_tokens": 50,
        "model": "text-davinci-003",
        "user": "jbu3470"
    }
}'
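
The retry block above maps naturally to exponential back-off: wait roughly twice as long before each successive attempt, and retry only on the listed status codes. A minimal sketch of that logic (illustrative only, not Rubeus's actual internals):

```typescript
// Status codes worth retrying, mirroring the "onStatusCodes" list above.
const RETRY_STATUS_CODES = [429, 500, 504, 524];

// Exponential back-off: attempt 0 waits baseMs, attempt 1 waits 2*baseMs, etc.
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt;
}

// Retry only while attempts remain and the failure status is retryable.
function shouldRetry(status: number, attempt: number, maxAttempts = 3): boolean {
  return attempt < maxAttempts && RETRY_STATUS_CODES.includes(status);
}
```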

Load Balancing

Manage your workload effectively with Rubeus's custom weight-based distribution across multiple API keys or providers.

# Load balance 50-50 between gpt-3.5-turbo and claude-v1
curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data '{
    "config": {
        "mode": "loadbalance",
        "options": [{
            "provider": "openai",
            "weight": 0.5,
            "params_to_override": { "model": "gpt-3.5-turbo" }
        }, {
            "provider": "anthropic",
            "weight": 0.5,
            "params_to_override": { "model": "claude-v1" }
        }]
    },
    "params": {
        "messages": {"role": "user","content":"What are the top 10 happiest countries in the world?"},
        "max_tokens": 50,
        "user": "jbu3470"
    }
}'
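
One way to picture weight-based distribution: draw a uniform random number and walk the options until the cumulative weight passes it. A sketch of the idea (an illustration, not Rubeus's actual implementation):

```typescript
type Option = { provider: string; weight: number };

// Pick an option with probability proportional to its weight.
// r is a uniform random number in [0, 1); pass Math.random() in real use.
function pickOption(options: Option[], r: number): Option {
  const total = options.reduce((sum, o) => sum + o.weight, 0);
  let threshold = r * total;
  for (const o of options) {
    threshold -= o.weight;
    if (threshold < 0) return o;
  }
  return options[options.length - 1]; // guard against floating-point drift
}
```

With the 0.5/0.5 weights above, each provider is chosen about half the time.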

Unified API Signature

If you're familiar with OpenAI's API, you'll find Rubeus's API easy to use due to its unified signature.

# OpenAI query
curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data-raw '{
    "config": {
        "provider": "openai"
    },
    "params": {
        "prompt": "What are the top 10 happiest countries in the world?",
        "max_tokens": 50,
        "user": "jbu3470"
    }
}'

# Anthropic Query
curl --location 'http://127.0.0.1:8787/complete' \
--header 'Content-Type: application/json' \
--data-raw '{
    "config": {
        "provider": "anthropic"
    },
    "params": {
        "prompt": "What are the top 10 happiest countries in the world?",
        "max_tokens": 50,
        "user": "jbu3470"
    }
}'
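
Because the signature is unified, calling Rubeus from code only requires swapping the provider string. A minimal fetch wrapper (the body shape mirrors the curl examples above; the helper names here are our own, not part of Rubeus):

```typescript
// Build the JSON body Rubeus expects: a config naming the provider plus the
// provider-agnostic params (shape taken from the curl examples above).
function buildBody(provider: string, prompt: string): string {
  return JSON.stringify({
    config: { provider },
    params: { prompt, max_tokens: 50, user: "jbu3470" },
  });
}

// POST the completion request to a locally running Rubeus instance.
async function complete(provider: string, prompt: string): Promise<unknown> {
  const res = await fetch("http://127.0.0.1:8787/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildBody(provider, prompt),
  });
  return res.json();
}
```

Switching `complete("openai", ...)` to `complete("anthropic", ...)` is the only change needed to target a different provider.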

Roadmap

  1. Support for more providers, including Google Bard and LocalAI.
  2. Enhanced load balancing features to optimize resource use across different models and providers.
  3. More robust fallback and retry strategies to further improve the reliability of requests.
  4. Increased customizability of the unified API signature to cater to more diverse use cases.

πŸ’¬ Participate in Roadmap discussions here.

Contributing

  • Checkout Good First Issue to start contributing!
  • Bug Report? File here.
  • Feature Request? File here.
  • Reach out to the developers directly: Rohit | Ayush

License

Rubeus is licensed under the MIT License. See the LICENSE file for more details.