Strange behaviour of `do_tfidf` #841

I am trying to reproduce the [example from the manual](https://exploratory.io/reference/#do_tfidf):

```
library(dplyr)        # provides %>%
library(exploratory)  # assumed to provide do_tokenize() and do_tfidf()

res <- data.frame("text" = c("this is what it is", "which is better")) %>%
  do_tokenize(text) %>%
  do_tfidf(document_id, token)
```

which is expected to result in:

| document_id | token | count_per_doc | count_of_docs | tfidf |
| --- | --- | --- | --- | --- |
| 1 | is | 2 | 2 | 0.0000000 |
| 1 | it | 1 | 1 | 0.5773503 |
| 1 | this | 1 | 1 | 0.5773503 |
| 1 | what | 1 | 1 | 0.5773503 |
| 2 | better | 1 | 1 | 0.7071068 |
| 2 | is | 1 | 2 | 0.0000000 |
| 2 | which | 1 | 1 | 0.7071068 |

However, I obtain:

| document_id | token | count_per_doc | count_of_docs | tfidf |
| --- | --- | --- | --- | --- |
| 1 | is | 2 | 2 | 0.0000000 |
| 1 | it | 1 | 1 | 0.0000000 |
| 1 | this | 1 | 1 | 0.7071068 |
| 1 | what | 1 | 1 | 0.7071068 |
| 2 | better | 1 | 1 | 0.7071068 |
| 2 | is | 1 | 2 | 0.0000000 |
| 2 | which | 1 | 1 | 0.7071068 |


Another strange result is the following:

```
data.frame("text" = c("good it was", "is nice she", "good is she")) %>%
  do_tokenize(text) %>%
  do_tfidf(document_id, token)
```

which gives:

| document_id | token | count_per_doc | count_of_docs | tfidf |
| --- | --- | --- | --- | --- |
| 1 | good | 1 | 2 | 0.327 |
| 1 | it | 1 | 1 | 0.327 |
| 1 | was | 1 | 1 | 0.887 |
| 2 | is | 1 | 2 | 0.327 |
| 2 | nice | 1 | 1 | 0.327 |
| 2 | she | 1 | 2 | 0.887 |
| 3 | good | 1 | 2 | 0.327 |
| 3 | is | 1 | 2 | 0.327 |
| 3 | she | 1 | 2 | 0.887 |

Here I would expect identical values for "it" and "was": both occur once in document 1 and in no other document, yet they get 0.327 and 0.887 respectively.
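For comparison, here is a minimal base-R sketch of the values I would expect. `manual_tfidf` is my own hypothetical helper, not part of the package, and the formula is an assumption on my part rather than something taken from the package source: `tfidf = count_per_doc * log(n_docs / count_of_docs)`, with each document's vector then scaled to unit L2 length. Under that assumption the first example reproduces the numbers from the manual (1/sqrt(3) ≈ 0.5773503 and 1/sqrt(2) ≈ 0.7071068), and in the second example "it" and "was" come out identical (about 0.684).

```r
# Hand-rolled TF-IDF for comparison only.
# Assumption: tfidf = count_per_doc * log(n_docs / count_of_docs),
# followed by L2 normalisation within each document.
manual_tfidf <- function(texts) {
  # Naive tokenizer: lower-case, split on anything that is not a letter
  tokens <- strsplit(tolower(texts), "[^a-z]+")
  long <- data.frame(
    document_id = rep(seq_along(tokens), lengths(tokens)),
    token = unlist(tokens),
    stringsAsFactors = FALSE
  )
  # count_per_doc: occurrences of each token within each document
  res <- aggregate(count_per_doc ~ document_id + token,
                   data = cbind(long, count_per_doc = 1), FUN = sum)
  res$token <- as.character(res$token)
  # count_of_docs: number of documents that contain the token
  res$count_of_docs <- as.vector(table(res$token)[res$token])
  raw <- res$count_per_doc * log(length(texts) / res$count_of_docs)
  # L2 norm of the raw scores, computed per document
  norm <- sqrt(ave(raw^2, res$document_id, FUN = sum))
  res$tfidf <- ifelse(norm > 0, raw / norm, 0)
  res[order(res$document_id, res$token), ]
}

manual_tfidf(c("this is what it is", "which is better"))
manual_tfidf(c("good it was", "is nice she", "good is she"))
```

If `do_tfidf` uses a different weighting or normalisation, the absolute numbers may legitimately differ, but tokens with the same `count_per_doc` and `count_of_docs` inside one document should still receive the same score.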





