Save LLM interactions' metadata#62

Open
FranciscoTGouveia wants to merge 3 commits into main from save-llm-interaction-info

Conversation

@FranciscoTGouveia
Collaborator

Self-explanatory.

self.current.prompt = relative
return

@require_active
Collaborator Author

I am not sure this is needed; I've followed the rationale in save_prompt.

Collaborator

@frediramos Mar 31, 2026

Yes, it is needed.

SaveScrapers provides a way to save scraper descriptions, prompts, etc., without the client code needing to know whether save mode is active or not.

Client code just calls methods like this:

self.bench.save_prompt(prompt)

Then @require_active handles it.
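For readers outside the project, a minimal sketch of how such a decorator could work. The `active` flag, the in-memory `saved` list, and the class shape here are assumptions for illustration, not the project's actual code:

```python
import functools

def require_active(method):
    """Run the wrapped save method only when save mode is active.

    Assumes the instance carries a boolean `active` attribute; the
    real attribute name in SaveScrapers may differ.
    """
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, "active", False):
            return None  # save mode off: skip silently
        return method(self, *args, **kwargs)
    return wrapper

class SaveScrapers:
    def __init__(self, active=False):
        self.active = active
        self.saved = []

    @require_active
    def save_prompt(self, prompt):
        # The real method would write to disk; record in memory here.
        self.saved.append(prompt)
        return prompt
```

With this shape, client code calls `self.bench.save_prompt(prompt)` unconditionally and the decorator decides whether anything happens.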

# Save LLM interaction log if any interactions occurred
if llm.interaction_log and settings.stats:
name = website.__class__.__name__.lower()
ts = current_ts()
Collaborator Author

Is this too much information?
I believe that naming the file after the scraper alone would constantly cause overwrites, which, IMO, we want to avoid, so I have also appended the timestamp.

Collaborator

Obviously, this code cannot stay in main.py.
But it is also not really needed.

The SaveStats class already offers a mk_topdir() method to create a unique folder with the current timestamp.

As a suggestion, you could make topdir static and therefore all SaveStats subclasses could share it.

For reference, see the save_json() code in Scrapers.py
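A sketch of the suggested shared, timestamped top directory. The method name follows the comment; the timestamp format and directory layout are assumptions:

```python
import os
import time

class SaveStats:
    # Class-level (static) attribute: one timestamped top directory
    # shared by every SaveStats subclass, created lazily on first use.
    topdir = None

    @classmethod
    def mk_topdir(cls, root):
        """Create (once) and return the shared timestamped folder."""
        if SaveStats.topdir is None:
            ts = time.strftime("%Y%m%d-%H%M%S")
            SaveStats.topdir = os.path.join(root, ts)
            os.makedirs(SaveStats.topdir, exist_ok=True)
        return SaveStats.topdir

class SaveScrapers(SaveStats):
    pass

class SaveLLM(SaveStats):
    pass
```

Because `topdir` lives on the base class, all subclasses write into the same per-run folder, so appending timestamps to individual file names in main.py becomes unnecessary.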

Collaborator

@frediramos left a comment

The actual LLM metrics extraction looks quite good.
It just needs some housekeeping.

More details below.

return

@require_active
def save_llm_log(self, llm):
Collaborator

All this logic should be moved to a webcap/lib/bench/llm.py file
in a class named like class SaveLLM(SaveStats):
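A sketch of what that could look like. Only the class name and module path come from the comment; the SaveStats interface, the LLMInteraction fields, and the file layout are assumptions:

```python
import json
import os
from dataclasses import asdict, dataclass

@dataclass
class LLMInteraction:
    # Field names are illustrative; the real dataclass is in the diff.
    prompt: str
    response: str
    usage: dict

class SaveStats:
    """Stand-in for the project's base class (assumed interface)."""
    def __init__(self, name, save_path=None):
        self.name = name
        self.save_path = save_path
        self.active = save_path is not None

class SaveLLM(SaveStats):
    """Per the review, this would live in webcap/lib/bench/llm.py."""
    def save_llm_log(self, interactions):
        if not self.active:
            return None
        path = os.path.join(self.save_path, f"{self.name}_llm.json")
        with open(path, "w") as f:
            json.dump([asdict(i) for i in interactions], f, indent=2)
        return path
```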

self.content = source.html_string


@dataclass
Collaborator

All this logic should be moved to the SaveLLM class

def __init__(self, api_key: Optional[str] = None):
"""Initialize the LLM client."""
self.api_key = api_key
self.interaction_log: List[LLMInteraction] = []
Collaborator

Most of this new logic can also be moved to the SaveLLM class

The idea is that the LLMClient is instantiated just like ScraperGen:

def __init__(self, api_key, save_path: str = None):
    self.bench = SaveLLM(llm_name, save_path)

@@ -146,6 +148,7 @@ def operation():
# post operation
self._add_user_message_to_history(message)
self._add_assistant_response_to_history(response)
Collaborator

@frediramos Mar 31, 2026

The llm client should then just call something like:

self.bench.save_llm_stats(*args)
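Putting the two suggestions together, the client-side wiring could look roughly like this. Only the `save_llm_stats` name comes from the review; its signature, the field names, and the placeholder response are assumptions:

```python
class SaveLLM:
    """Minimal stand-in for the suggested bench class."""
    def __init__(self, name, save_path=None):
        self.name = name
        self.active = save_path is not None
        self.records = []

    def save_llm_stats(self, prompt, response, usage):
        # Silently a no-op when save mode is off.
        if self.active:
            self.records.append((prompt, response, usage))

class LLMClient:
    def __init__(self, api_key=None, save_path=None):
        self.api_key = api_key
        # All benchmarking state lives in the bench object, so the
        # client never checks whether save mode is on.
        self.bench = SaveLLM("llm", save_path)

    def ask(self, prompt):
        response = f"echo: {prompt}"  # placeholder for the real API call
        self.bench.save_llm_stats(prompt, response, {"tokens": 0})
        return response
```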

return None

# post operation
self._record_interaction(prompt, self.response_to_text(response), self.extract_usage(response))
Collaborator

Same as above, verbatim.
