Leeroo Orchestrator

For more details, refer to the paper Leeroo Orchestrator: Elevating LLMs Performance Through Model Integration.

Project layout

  • orchestrator.py - The Orchestrator, the central wrapper object that coordinates the expert models.
  • server_manager.py - Manages the servers implemented in the ./servers/ directory.
  • test_orchestrator.py - Provides an example of running the orchestrator for testing purposes.
  • abstract_classes.py - Base classes that the server implementations build on (a hypothetical sketch of this interface appears at the end of this document).
  • ./configs/* - Configuration files used by the orchestrator (a hypothetical example appears below).
  • ./servers/* - All server implementations, each dedicated to a specific serving strategy/machine, e.g. EC2 vLLM, SageMaker, etc.
  • ./utils/* - Utility functions shared by the servers.
  • .env - Specifies the environment variables required for the system to run:

AWS_SAGEMAKER_ROLE_NAME = "*****"   # IAM role used for SageMaker connections.
SECURITY_GROUP_ID       = "****"    # Security group used when creating EC2 instances; make sure the inference port is open for this group.
AWS_ACCESS_KEY_ID       = "*****"   # AWS credentials used to provision and call AWS resources.
AWS_SECRET_ACCESS_KEY   = "*****"   #
HUGGING_FACE_HUB_TOKEN  = "*****"   # Required for authenticating with Hugging Face to download checkpoints.
OPENAI_ORGANIZATION     = "*****"   # OpenAI organization ID (if OpenAI models are used as experts).
OPENAI_API_KEY          = "*****"   # OpenAI API key (if OpenAI models are used as experts).

Run the Orchestrator server
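The orchestrator is driven by a JSON config from ./configs/; the script below loads app/configs/demo_orch_sagemaker_mix.json. The real schema is defined by the repository's config files, so the dict below is only a hypothetical sketch in which every key name is an illustrative assumption:

# Hypothetical config shape; key names are illustrative, not the repo's actual schema.
config = {
    "orchestrator": {
        "provider": "sagemaker",                  # which server implementation serves the orchestrator model
        "model_id": "leeroo/orchestrator-model",  # hypothetical model identifier
        "instance_type": "ml.g5.2xlarge",
    },
    "experts": [
        {
            "provider": "sagemaker",
            "model_id": "mistralai/Mistral-7B-Instruct-v0.2",
            "instance_type": "ml.g5.2xlarge",
        },
        {"provider": "openai", "model_id": "gpt-4"},  # OpenAI experts need OPENAI_API_KEY in .env
    ],
}

With the .env variables and a config in place, the end-to-end flow is: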

import json
import time
from app.orchestrator import Orchestrator


# Load the orchestrator configuration
with open("app/configs/demo_orch_sagemaker_mix.json", "r") as f:
    config = json.load(f)

# Initialize the orchestrator from the config
leeroo_orchestrator = Orchestrator(config)

# Boot the machines that host the orchestrator and expert models
leeroo_orchestrator.load_orchestrator_server()
leeroo_orchestrator.load_experts_server()

# Start the inference endpoints; max_wait_time bounds how long to wait for startup
leeroo_orchestrator.start_inference_endpoints(max_wait_time=120)


# Poll until all endpoints report ready
while True:
    print("Checking server status...")
    if leeroo_orchestrator.check_servers_state():
        print("Servers are running...")
        break
    time.sleep(30)

# Smoke-test get_response on each expert server individually
for expert in leeroo_orchestrator.experts.values():
    print(expert.model_id)
    print(expert.get_response("hello"))

# Test get_response for the complete pipeline (the query is routed through the orchestrator)
response = leeroo_orchestrator.get_response("What is the capital of India?")
print(response)

# Turn off the machines: stop the orchestrator server, then each expert server
leeroo_orchestrator.orchestrator.stop_server()
for expert_id, expert in leeroo_orchestrator.experts.items():
    res = expert.stop_server()
    print(res)

print("done!")