[FEAT] Llama Stack support via its Python client #530
Merged: dot-agi merged 74 commits into AgentOps-AI:main from thaddavis:tduval/feature/llamaStackClientSupport on Dec 10, 2024.
Commits (74 total; changes below shown from 25 commits)
52c132e: llama_stack_client LLM provider (www)
ec8445d: ruff (teocns)
2fac7a0: refining and documenting the llama stack integration support & proces… (www)
4299bcb: fixing error in the core_manual_test for Llama Stack (www)
5fb0b36: removing unnecessary elif branch in llama_stack_client.py llm provider (www)
cfa1899: updating llama stack examples & documentation (www)
af70799: updating llama_stack_client_examples.ipynb (www)
0b56e40: saving tweaks to Llama Stack client examples and related README.md af… (www)
f855873: saving v1 of patching of Llama Stack Agent.create_turn method (www)
6bf54e5: save progress to testing Llama Stack Agent class and Inference class (www)
3dc0d2f: minor edits (www)
b815ef3: removing unneeded code (www)
888b635: format line (www)
1c7c1de: adding support for monitoring tools (www)
187963b: for completeness (www)
c1a58f2: remove logs (areibman)
ac3e01e: implemeting code review (www)
8122c3f: saving progress of getting agent monkeypatch tested in the ipynb (www)
b131246: saving testing scaffold and preliminary fireworks setup/support (www)
ae572ba: remove Fireworks API key (www)
0a12c5c: removing uneeded global (www)
9a43d74: enhance(compose): remove deprecate version attr (teocns)
13950fc: Removing some redundancies (teocns)
fe06a44: saving tweak to custom docker-compose.yaml for llama stack (www)
65a5ab4: saving solid docker-compose for spinning up ollama with a llama-stack (www)
0114ede: adding documentation for Llama Stack integration (www)
fa80099: rename compose.yaml files to follow the standard docker compose format (www)
b1e4335: minor tweaks (www)
dd27a37: add disclaimer in the Fireworks docker compose file (www)
9c4ab6e: pushing for Alex (www)
978d4f0: saving changes to track Llama Stack Agent events with a stack data st… (www)
d235643: removing commented code (www)
3ee63cc: tweak handle_stream_chunk in handle_response function of Llama Stack … (www)
44494e1: removing comments (www)
bcf22a8: inference_canary 1 and 2 now for clarity (www)
a4fec78: organizing canaries (www)
7319616: not a big deal (teocns)
034e25c: readme (teocns)
796b6bc: maintain filename standards under /root (teocns)
8f79564: Merge branch 'main' into tduval/feature/llamaStackClientSupport (dot-agi)
3a8ca51: ollama: healthcheck on localhost rather; healthcheck relaxed (teocns)
998231e: progress, api hitting, network ok (teocns)
e6d2200: Is this path relevant? (teocns)
eef3730: INFERENCE_MODEL default (teocns)
537f950: unused network (teocns)
0c2d9d9: host:port instead of URL in run.yaml (teocns)
68218d2: seems fixed (teocns)
ecee563: on non Apple silicon: must try between llamastack-local-cpu and disti… (teocns)
3f7fc68: providers.config.url | ollama HOST (teocns)
95878e2: save (teocns)
42e4c4f: env.tpl (teocns)
a3a81fc: right configs (teocns)
ebb2ea1: DONE (teocns)
77a98a3: pushing enhancement before merge (www)
21261a3: Merge branch 'main' into tduval/feature/llamaStackClientSupport (dot-agi)
626f87a: Merge branch 'main' into tduval/feature/llamaStackClientSupport (dot-agi)
c77339e: add `Instruct` to model name (dot-agi)
2539b7b: clean and increase memory (dot-agi)
d541d43: test cleanup (dot-agi)
f9a6b07: clean notebook (dot-agi)
51847e7: updated examples readme and notebook renamed (dot-agi)
81651fa: updated docs (dot-agi)
d76b784: clean integration code (dot-agi)
cdf33a7: linting (dot-agi)
e66c878: linting tests (dot-agi)
efeffd0: fix generator bug (www)
27db3df: saving working agent_canary test - works in Python notebook AND in sc… (www)
35ce1db: clean notebook and remove commented code (dot-agi)
acc41f1: Merge branch 'main' into tduval/feature/llamaStackClientSupport (dot-agi)
72921f5: Merge branch 'main' into tduval/feature/llamaStackClientSupport (dot-agi)
e3d75d7: deleting llama-stack test (dot-agi)
1777fed: add llama stack to examples (dot-agi)
637df3f: ruff (dot-agi)
f946881: fix import (dot-agi)
New file: `llama_stack_client.py` (+236 lines)
```python
import inspect
import pprint
import sys
from typing import Dict, Optional

from agentops.event import LLMEvent, ErrorEvent, ToolEvent
from agentops.session import Session
from agentops.log_config import logger
from agentops.helpers import get_ISO_time, check_call_stack_for_agent_id
from agentops.llms.instrumented_provider import InstrumentedProvider


class LlamaStackClientProvider(InstrumentedProvider):
    original_complete = None
    original_create_turn = None

    def __init__(self, client):
        super().__init__(client)
        self._provider_name = "LlamaStack"

    def handle_response(
        self, response, kwargs, init_timestamp, session: Optional[Session] = None, metadata: Optional[Dict] = {}
    ) -> dict:
        """Handle responses for LlamaStack"""
        try:
            accum_delta = None
            accum_tool_delta = None

            def handle_stream_chunk(chunk: dict):
                llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
                if session is not None:
                    llm_event.session_id = session.session_id

                # NOTE: prompt/completion usage not returned in response when streaming
                # We take the first ChatCompletionResponseStreamChunkEvent and accumulate the deltas
                # from all subsequent chunks to build one full chat completion
                if llm_event.returns is None:
                    llm_event.returns = chunk.event

                try:
                    nonlocal accum_delta
                    llm_event.agent_id = check_call_stack_for_agent_id()
                    llm_event.model = kwargs["model_id"]
                    llm_event.prompt = kwargs["messages"]

                    # NOTE: We assume for completion only choices[0] is relevant
                    if chunk.event.event_type == "start":
                        accum_delta = chunk.event.delta
                    elif chunk.event.event_type == "progress":
                        accum_delta += chunk.event.delta
                    elif chunk.event.event_type == "complete":
                        llm_event.prompt = [
                            {"content": message.content, "role": message.role} for message in kwargs["messages"]
                        ]
                        llm_event.agent_id = check_call_stack_for_agent_id()
                        llm_event.prompt_tokens = None
                        llm_event.completion = accum_delta
                        llm_event.completion_tokens = None
                        llm_event.end_timestamp = get_ISO_time()
                        self._safe_record(session, llm_event)

                except Exception as e:
                    self._safe_record(session, ErrorEvent(trigger_event=llm_event, exception=e))

                    kwargs_str = pprint.pformat(kwargs)
                    chunk = pprint.pformat(chunk)
                    logger.warning(
                        f"Unable to parse a chunk for LLM call. Skipping upload to AgentOps\n"
                        f"chunk:\n {chunk}\n"
                        f"kwargs:\n {kwargs_str}\n"
                    )

            def handle_stream_agent(chunk: dict):
                # NOTE: prompt/completion usage not returned in response when streaming
                # We take the first ChatCompletionResponseStreamChunkEvent and accumulate the deltas
                # from all subsequent chunks to build one full chat completion
                llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
                tool_event = ToolEvent(init_timestamp=init_timestamp, params=kwargs)

                if session is not None:
                    llm_event.session_id = session.session_id

                if llm_event.returns is None:
                    llm_event.returns = chunk.event

                try:
                    if chunk.event.payload.event_type == "turn_start":
                        pass
                    elif chunk.event.payload.event_type == "step_start":
                        pass
                    elif chunk.event.payload.event_type == "step_progress":
                        if chunk.event.payload.step_type == "inference" and chunk.event.payload.text_delta_model_response:
                            nonlocal accum_delta
                            delta = chunk.event.payload.text_delta_model_response
                            llm_event.agent_id = check_call_stack_for_agent_id()
                            llm_event.prompt = kwargs["messages"]

                            if accum_delta:
                                accum_delta += delta
                            else:
                                accum_delta = delta
                        elif chunk.event.payload.step_type == "inference" and chunk.event.payload.tool_call_delta:
                            if chunk.event.payload.tool_call_delta.parse_status == "started":
                                tool_event.name = "ToolExecution - started"
                                self._safe_record(session, tool_event)
                            elif chunk.event.payload.tool_call_delta.parse_status == "in_progress":
                                nonlocal accum_tool_delta
                                delta = chunk.event.payload.tool_call_delta.content
                                if accum_tool_delta:
                                    accum_tool_delta += delta
                                else:
                                    accum_tool_delta = delta
                            elif chunk.event.payload.tool_call_delta.parse_status == "success":
                                tool_event.name = "ToolExecution - success"
                                tool_event.params["completion"] = accum_tool_delta
                                self._safe_record(session, tool_event)
                            elif chunk.event.payload.tool_call_delta.parse_status == "failure":
                                self._safe_record(
                                    session,
                                    ErrorEvent(trigger_event=tool_event, exception=Exception("ToolExecution - failure")),
                                )

                    elif chunk.event.payload.event_type == "step_complete":
                        if chunk.event.payload.step_type == "inference":
                            llm_event.prompt = [
                                {"content": message["content"], "role": message["role"]} for message in kwargs["messages"]
                            ]
                            llm_event.agent_id = check_call_stack_for_agent_id()
                            llm_event.model = metadata.get("model_id", "Unable to identify model")
                            llm_event.prompt_tokens = None
                            llm_event.completion = accum_delta or kwargs["completion"]
                            llm_event.completion_tokens = None
                            llm_event.end_timestamp = get_ISO_time()
                            self._safe_record(session, llm_event)
                        elif chunk.event.payload.step_type == "tool_execution":
                            tool_event.name = "ToolExecution - complete"
                            tool_event.params["completion"] = accum_tool_delta
                            self._safe_record(session, tool_event)
                    elif chunk.event.payload.event_type == "turn_complete":
                        pass

                except Exception as e:
                    self._safe_record(session, ErrorEvent(trigger_event=llm_event, exception=e))

                    kwargs_str = pprint.pformat(kwargs)
                    chunk = pprint.pformat(chunk)
                    logger.warning(
                        f"Unable to parse a chunk for LLM call. Skipping upload to AgentOps\n"
                        f"chunk:\n {chunk}\n"
                        f"kwargs:\n {kwargs_str}\n"
                    )

            if kwargs.get("stream", False):

                def generator():
                    for chunk in response:
                        handle_stream_chunk(chunk)
                        yield chunk

                return generator()
            elif inspect.isasyncgen(response):

                async def async_generator():
                    async for chunk in response:
                        handle_stream_agent(chunk)
                        yield chunk

                return async_generator()
            else:
                llm_event = LLMEvent(init_timestamp=init_timestamp, params=kwargs)
                if session is not None:
                    llm_event.session_id = session.session_id

                llm_event.returns = response
                llm_event.agent_id = check_call_stack_for_agent_id()
                llm_event.model = kwargs["model_id"]
                llm_event.prompt = [{"content": message.content, "role": message.role} for message in kwargs["messages"]]
                llm_event.prompt_tokens = None
                llm_event.completion = response.completion_message.content
                llm_event.completion_tokens = None
                llm_event.end_timestamp = get_ISO_time()

                self._safe_record(session, llm_event)
        except Exception as e:
            self._safe_record(session, ErrorEvent(trigger_event=llm_event, exception=e))
            kwargs_str = pprint.pformat(kwargs)
            response = pprint.pformat(response)
            logger.warning(
                f"Unable to parse response for LLM call. Skipping upload to AgentOps\n"
                f"response:\n {response}\n"
                f"kwargs:\n {kwargs_str}\n"
            )

        return response

    def _override_complete(self):
        from llama_stack_client.resources import InferenceResource

        global original_complete
        original_complete = InferenceResource.chat_completion

        def patched_function(*args, **kwargs):
            # Call the original function with its original arguments
            init_timestamp = get_ISO_time()
            session = kwargs.get("session", None)
            if "session" in kwargs.keys():
                del kwargs["session"]
            result = original_complete(*args, **kwargs)
            return self.handle_response(result, kwargs, init_timestamp, session=session)

        # Override the original method with the patched one
        InferenceResource.chat_completion = patched_function

    def _override_create_turn(self):
        from llama_stack_client.lib.agents.agent import Agent

        self.original_create_turn = Agent.create_turn

        def patched_function(*args, **kwargs):
            # Call the original function with its original arguments
            init_timestamp = get_ISO_time()
            session = kwargs.get("session", None)
            if "session" in kwargs.keys():
                del kwargs["session"]
            result = self.original_create_turn(*args, **kwargs)
            return self.handle_response(
                result,
                kwargs,
                init_timestamp,
                session=session,
                metadata={"model_id": args[0].agent_config.get("model")},
            )

        # Override the original method with the patched one
        Agent.create_turn = patched_function

    def override(self):
        self._override_complete()
        self._override_create_turn()

    def undo_override(self):
        if self.original_complete is not None:
            from llama_stack_client.resources import InferenceResource

            InferenceResource.chat_completion = self.original_complete

        if self.original_create_turn is not None:
            from llama_stack_client.lib.agents.agent import Agent

            Agent.create_turn = self.original_create_turn
```
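For context, here is a minimal sketch of how the patched inference path might be exercised once this provider is registered by `agentops.init()`. The base URL, port, and model id below are placeholders (not part of this PR), and the sketch assumes `AGENTOPS_API_KEY` is set in the environment and a Llama Stack distribution is already serving the chosen model.

```python
# Hypothetical usage sketch; base_url and model_id are placeholders.
import agentops
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

agentops.init()  # assumes AGENTOPS_API_KEY is set; patches llama_stack_client when it is installed

client = LlamaStackClient(base_url="http://localhost:5000")  # placeholder host/port

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    messages=[UserMessage(role="user", content="Write a three word poem about llamas.")],
    stream=False,  # non-streaming: handle_response records one LLMEvent from completion_message
)
print(response.completion_message.content)

agentops.end_session("Success")
```

With `stream=True` instead, the wrapped call returns the `generator()` defined above, which accumulates the chunk deltas into a single `LLMEvent`.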
New file (+9 lines): notes on running the Fireworks-backed Llama Stack server
##

https://github.com/meta-llama/llama-stack/blob/main/distributions/fireworks/run.yaml

##

```sh
docker-compose -f fireworks-server-config.yaml up
```
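Assuming the compose stack above exposes the Llama Stack API on a local port (the host, port, and model id below are placeholders; match them to your compose file and distribution), the streaming path handled by `handle_stream_chunk` could be smoke-tested roughly like this:

```python
# Hypothetical smoke test; base_url and model_id are placeholders.
import agentops
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage

agentops.init()  # assumes AGENTOPS_API_KEY is set
client = LlamaStackClient(base_url="http://localhost:5000")  # placeholder; match the compose port mapping

stream = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # placeholder; use a model served by this distribution
    messages=[UserMessage(role="user", content="Name three uses for a llama.")],
    stream=True,  # chunk deltas are accumulated into a single LLMEvent by the provider
)
for chunk in stream:
    if chunk.event.event_type == "progress":
        print(chunk.event.delta, end="", flush=True)

agentops.end_session("Success")
```

The `Agent.create_turn` path is patched the same way and is exercised separately by the PR's agent_canary test.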