AWS EC2 VLLM
Bases: BaseServerObject
Serve an expert model using VLLM on an AWS EC2 machine. Expert models operate independently and can be hosted on separate or shared machines. They may include closed-source models such as GPT-4. Refer to `app/configs/demo_orch_ec2_mix.json` for configuration details.
Source code in app/servers/vllm_ec2_server.py
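As a rough illustration of how an orchestration config might enumerate experts across backends (the key names below are hypothetical and are not the actual schema of `demo_orch_ec2_mix.json`):

```python
import json

# Hypothetical config fragment -- the key names are illustrative only,
# not the real schema of app/configs/demo_orch_ec2_mix.json.
CONFIG = """
{
  "experts": [
    {"name": "local-llama", "backend": "vllm_ec2", "region": "us-east-1"},
    {"name": "gpt-4",       "backend": "openai"}
  ]
}
"""

def experts_by_backend(raw: str) -> dict:
    """Group expert entries by their serving backend."""
    grouped: dict[str, list[str]] = {}
    for expert in json.loads(raw)["experts"]:
        grouped.setdefault(expert["backend"], []).append(expert["name"])
    return grouped

print(experts_by_backend(CONFIG))
# {'vllm_ec2': ['local-llama'], 'openai': ['gpt-4']}
```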
__init__(**kwargs)
Source code in app/servers/vllm_ec2_server.py
get_response(message, stream=False)
Generate text using the expert LLM.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `message` | `str` | Input query | *required* |
| `stream` | `bool` | Get a response stream. Defaults to `False`. | `False` |
Returns:

| Name | Type | Description |
|---|---|---|
| `response` | `str` | Output text generation |
Source code in app/servers/vllm_ec2_server.py
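A sketch of how `get_response`'s arguments could map onto a request to vLLM's OpenAI-compatible `/v1/completions` endpoint. The host, port, and payload wiring here are assumptions for illustration; the actual call lives in `app/servers/vllm_ec2_server.py`.

```python
def build_completion_request(host: str, message: str, stream: bool = False) -> tuple[str, dict]:
    """Assemble an illustrative request for a vLLM OpenAI-compatible server."""
    url = f"http://{host}:8000/v1/completions"  # port 8000 is vLLM's default
    payload = {
        "prompt": message,   # the input query
        "stream": stream,    # when True, the server sends incremental chunks
        "max_tokens": 256,   # illustrative generation budget
    }
    return url, payload

url, payload = build_completion_request("10.0.0.5", "Explain EC2 spot pricing.")
```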
start_inference_endpoint(max_wait_time=120)
Starts a new tmux session named 'tmux_session_name' and activates the 'pytorch' environment.
NOTE: we provide an AWS AMI that has the required conda environment and VLLM installed and ready to use. To change the environment, refer to `app/utils/ssh_utils.py: start_vllm_server`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `max_wait_time` | `int` | Maximum time, in seconds, to wait for the endpoint to come up. Defaults to `120`. | `120` |
Source code in app/servers/vllm_ec2_server.py
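The `max_wait_time` parameter suggests a poll-until-ready loop. A minimal sketch of that pattern, where `probe` is a stand-in for an actual health check against the vLLM endpoint:

```python
import time

def wait_until_ready(probe, max_wait_time: int = 120, interval: float = 2.0) -> bool:
    """Poll `probe` until it returns True or `max_wait_time` seconds elapse."""
    deadline = time.monotonic() + max_wait_time
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False  # endpoint never came up within the window

# Stand-in probe: succeeds on the third attempt.
attempts = iter([False, False, True])
ready = wait_until_ready(lambda: next(attempts), max_wait_time=30, interval=0.01)
```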
start_server()
Starts a dedicated EC2 instance. The 'instance_name' serves as a unique identifier: if an instance tagged with 'instance_name' is already present in the given region, the operation is aborted. The unique identifier can be edited in `app/server_manager.py: get_server`.
Source code in app/servers/vllm_ec2_server.py
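The abort-if-already-tagged behavior boils down to a simple membership check. In this sketch, `existing_tags` stands in for name tags gathered from a real `DescribeInstances` lookup against the region:

```python
def should_launch(instance_name: str, existing_tags: set[str]) -> bool:
    """Return True only when no instance in the region already carries this name tag."""
    return instance_name not in existing_tags

# An instance tagged 'vllm-expert-1' already exists, so the launch is skipped.
launch = should_launch("vllm-expert-1", existing_tags={"vllm-expert-1", "other-host"})
```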
stop_server()
Stops the EC2 server.
Source code in app/servers/vllm_ec2_server.py