Save LLM interactions' metadata#62

Open
FranciscoTGouveia wants to merge 3 commits into main from save-llm-interaction-info

Conversation

@FranciscoTGouveia
Collaborator

Self-explanatory.

self.current.prompt = relative
return

@require_active
Collaborator Author

I am not sure this is needed; I've followed the rationale in save_prompt.

Collaborator

@frediramos Mar 31, 2026

Yes, it is needed.

SaveScrapers provides a way to save scraper descriptions, prompts, etc., without the client code needing to know whether save mode is active or not.

Client code just calls methods like this:

self.bench.save_prompt(prompt)

Then @require_active handles it.
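For readers outside the project, a minimal sketch of how such a decorator could work. The `active` flag, the in-memory `saved` list, and the class shape here are assumptions for illustration, not the project's actual code:

```python
import functools

def require_active(method):
    """Run the wrapped save method only when save mode is active.

    Assumes the instance carries a boolean `active` attribute; the
    real attribute name in SaveScrapers may differ.
    """
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "active", False):
            return None  # save mode off: skip silently
        return method(self, *args, **kwargs)
    return wrapper

class SaveScrapers:
    def __init__(self, active=False):
        self.active = active
        self.saved = []

    @require_active
    def save_prompt(self, prompt):
        # The real method would write to disk; record in memory here.
        self.saved.append(prompt)
        return prompt
```

With this shape, client code calls `self.bench.save_prompt(prompt)` unconditionally and the decorator decides whether anything happens.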

# Save LLM interaction log if any interactions occurred
if llm.interaction_log and settings.stats:
name = website.__class__.__name__.lower()
ts = current_ts()
Collaborator Author

Is this too much information?
I believe that naming the file after the scraper alone would constantly cause overwrites, which, IMO, we want to avoid, so I have also appended the timestamp.

Collaborator

Obviously, this code cannot stay in main.py.
But it is also not really needed.

The SaveStats class already offers a mk_topdir() method to create a unique folder with the current timestamp.

As a suggestion, you could make topdir static and therefore all SaveStats subclasses could share it.

For reference, see the save_json() code in Scrapers.py
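A sketch of the suggested shared, timestamped top directory. The method name follows the comment; the timestamp format and directory layout are assumptions:

```python
import os
import time

class SaveStats:
    # Class-level (static) attribute: one timestamped top directory
    # shared by every SaveStats subclass, created lazily on first use.
    topdir = None

    @classmethod
    def mk_topdir(cls, root):
        """Create (once) and return the shared timestamped folder."""
        if SaveStats.topdir is None:
            ts = time.strftime("%Y%m%d-%H%M%S")
            SaveStats.topdir = os.path.join(root, ts)
            os.makedirs(SaveStats.topdir, exist_ok=True)
        return SaveStats.topdir

class SaveScrapers(SaveStats):
    pass

class SaveLLM(SaveStats):
    pass
```

Because `topdir` lives on the base class, all subclasses write into the same per-run folder, so appending timestamps to individual file names in main.py becomes unnecessary.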

Collaborator

@frediramos left a comment

The actual LLM metrics extraction looks quite good.
It just needs some housekeeping.

More details below.

return

@require_active
def save_llm_log(self, llm):
Collaborator

All this logic should be moved to a webcap/lib/bench/llm.py file
in a class named like class SaveLLM(SaveStats):
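A sketch of what that could look like. Only the class name and module path come from the comment; the SaveStats interface, the LLMInteraction fields, and the file layout are assumptions:

```python
import json
import os
from dataclasses import asdict, dataclass

@dataclass
class LLMInteraction:
    # Field names are illustrative; the real dataclass is in the diff.
    prompt: str
    response: str
    usage: dict

class SaveStats:
    """Stand-in for the project's base class (assumed interface)."""
    def __init__(self, name, save_path=None):
        self.name = name
        self.save_path = save_path
        self.active = save_path is not None

class SaveLLM(SaveStats):
    """Per the review, this would live in webcap/lib/bench/llm.py."""
    def save_llm_log(self, interactions):
        if not self.active:
            return None
        path = os.path.join(self.save_path, f"{self.name}_llm.json")
        with open(path, "w") as f:
            json.dump([asdict(i) for i in interactions], f, indent=2)
        return path
```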

self.content = source.html_string


@dataclass
Collaborator

All this logic should be moved to the SaveLLM class

def __init__(self, api_key: Optional[str] = None):
"""Initialize the LLM client."""
self.api_key = api_key
self.interaction_log: List[LLMInteraction] = []
Collaborator

Most of this new logic can also be moved to the SaveLLM class

The idea is that the LLMClient is instantiated just like ScraperGen:

def __init__(self, api_key, save_path: str = None):
    self.bench = SaveLLM(llm_name, save_path)

@@ -146,6 +148,7 @@ def operation():
# post operation
self._add_user_message_to_history(message)
self._add_assistant_response_to_history(response)
Collaborator

@frediramos Mar 31, 2026

The llm client should then just call something like:

self.bench.save_llm_stats(*args)
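Putting the two suggestions together, the client-side wiring could look roughly like this. Only the `save_llm_stats` name comes from the review; its signature, the field names, and the placeholder response are assumptions:

```python
class SaveLLM:
    """Minimal stand-in for the suggested bench class."""
    def __init__(self, name, save_path=None):
        self.name = name
        self.active = save_path is not None
        self.records = []

    def save_llm_stats(self, prompt, response, usage):
        # Silently a no-op when save mode is off.
        if self.active:
            self.records.append((prompt, response, usage))

class LLMClient:
    def __init__(self, api_key=None, save_path=None):
        self.api_key = api_key
        # All benchmarking state lives in the bench object, so the
        # client never checks whether save mode is on.
        self.bench = SaveLLM("llm", save_path)

    def ask(self, prompt):
        response = f"echo: {prompt}"  # placeholder for the real API call
        self.bench.save_llm_stats(prompt, response, {"tokens": 0})
        return response
```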

return None

# post operation
self._record_interaction(prompt, self.response_to_text(response), self.extract_usage(response))
Collaborator

Same as above, verbatim.
