[Bug]: CosineStrategy not generating results #1424

@lmseidler

Description

crawl4ai version

0.7.4

Expected Behavior

extractor = CosineStrategy("some filter")
clusters = extractor.extract(url, html)  # should return a list containing the clustering results

Current Behavior

clusters = extractor.extract(url, html)  # always returns an empty list

It only works if DEL is overridden first, e.g. extractor.DEL = "\n"

The cause appears to be that CosineStrategy.DEL is not set to a delimiter that actually occurs in the HTML. As a result, the html.split call at the beginning of extract produces only a single item. filter_document_embeddings then returns an empty list, because at_least_k = 1 // 2 = 0, and extract goes on to return an empty list as its output.
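The failure path described above can be illustrated with a small standalone sketch. It does not import or reproduce crawl4ai's code: the delimiter value, the helper names split_into_chunks and keep_top_half, and the sample HTML are all hypothetical, and the real filter_document_embeddings ranks chunks by similarity rather than taking the first ones. Only the split and the at_least_k = len // 2 cutoff are modeled.

```python
# Simplified stand-in for the failure path; this is NOT crawl4ai's implementation.
DEL = "<|DEL|>"  # hypothetical delimiter that never occurs in the raw HTML


def split_into_chunks(html: str, delimiter: str) -> list[str]:
    """Split the document on the delimiter and drop empty pieces."""
    return [piece for piece in html.split(delimiter) if piece.strip()]


def keep_top_half(chunks: list[str]) -> list[str]:
    """Keep len(chunks) // 2 chunks, mimicking only the at_least_k cutoff."""
    at_least_k = len(chunks) // 2
    return chunks[:at_least_k]


html = "<p>first paragraph</p>\n<p>second paragraph</p>\n<p>third</p>\n<p>fourth</p>"

# Delimiter absent from the text: one chunk, at_least_k = 1 // 2 = 0, nothing survives.
broken = keep_top_half(split_into_chunks(html, DEL))

# Delimiter present in the text: four chunks, at_least_k = 4 // 2 = 2.
working = keep_top_half(split_into_chunks(html, "\n"))

print(broken)
print(working)
```

With a delimiter that is present in the input, the split yields several chunks and the cutoff is no longer zero, which is consistent with the observation that setting extractor.DEL = "\n" makes extract return results.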

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Take any URL and its HTML content and call CosineStrategy.extract on them. It always returns an empty list.

Code snippets

OS

linux

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Labels

🐞 Bug (Something isn't working), 📌 Root caused (identified the root cause of bug)
