Status: Open
Labels: 🐞 Bug, 📌 Root caused
Description
crawl4ai version
0.7.4
Expected Behavior
from crawl4ai.extraction_strategy import CosineStrategy
extractor = CosineStrategy("some filter")
clusters = extractor.extract(url, html)  # should return a list containing the clustering results
Current Behavior
clusters = extractor.extract(url, html)  # always returns an empty list
It only works if the delimiter is overridden, e.g. extractor.DEL = "\n".
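A plain-Python illustration of why overriding the delimiter changes the outcome: str.split returns the whole string as a single item when the separator never occurs in it. The HTML snippet and the "<|DEL|>" separator below are placeholders chosen for illustration only; they are not taken from the issue and are not claimed to be crawl4ai's actual DEL value.

```python
# Hypothetical HTML input whose blocks are separated by newlines.
html = "<p>First block</p>\n<p>Second block</p>\n<p>Third block</p>"

# Placeholder for a delimiter that never appears in the HTML
# (stands in for an unsuitable DEL value; not crawl4ai's real constant).
absent_delimiter = "<|DEL|>"

chunks_absent = html.split(absent_delimiter)  # separator not found
chunks_newline = html.split("\n")             # separator found twice

print(len(chunks_absent))   # 1: the whole document stays in one chunk
print(len(chunks_newline))  # 3: one chunk per block
```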
The root cause appears to be that CosineStrategy.DEL is not set to a delimiter that actually occurs in the HTML, so the html.split call at the beginning of extract produces only a single item. In filter_document_embeddings, at_least_k then evaluates to 1 // 2 = 0, so an empty list is returned, and extract in turn returns an empty list as its output.
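A simplified model of the filtering arithmetic described above. It assumes, as the issue's "1 // 2 = 0" suggests, that the number of chunks kept is len(documents) // 2. The function name filter_documents and the precomputed scores are hypothetical; real crawl4ai computes embedding similarities, and this sketch is not its implementation.

```python
from typing import List


def filter_documents(documents: List[str], scores: List[float]) -> List[str]:
    """Hypothetical stand-in for filter_document_embeddings.

    Keeps the len(documents) // 2 highest-scoring documents.
    Embedding similarity is replaced by precomputed scores.
    """
    at_least_k = len(documents) // 2  # 1 // 2 == 0 for a single chunk
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:at_least_k]]


# One unsplit chunk: at_least_k is 0, so nothing survives the filter.
single = filter_documents(["<p>whole page</p>"], [0.9])

# Three chunks (e.g. after splitting on "\n"): at_least_k is 1.
several = filter_documents(["<p>a</p>", "<p>b</p>", "<p>c</p>"], [0.2, 0.8, 0.5])

print(single)   # []
print(several)  # ['<p>b</p>']
```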
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Call CosineStrategy.extract with any URL and HTML content; it always returns an empty list.
Code snippets
OS
linux
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response