
Commit bfdb3cb

Merge pull request #116 from OpenPecha/hot-fix-sent-tokenizer

fix: tokenizing sentence by verb is failing.

2 parents cf84da0 + 0c6b7f1

1 file changed: 3 additions & 3 deletions

File: botok/tokenizers/sentencetokenizer.py
@@ -107,9 +107,9 @@ def get_sentence_indices(tokens):
     sentence_idx = piped_sentencify(sentence_idx, tokens, is_verb_n_punct)

     # 4. find verbs followed by clause boundaries
-    sentence_idx = piped_sentencify(
-        sentence_idx, tokens, is_verb_n_clause_boundary, threshold=30
-    )  # max size to check
+    # sentence_idx = piped_sentencify(
+    #     sentence_idx, tokens, is_verb_n_clause_boundary, threshold=30
+    # )  # max size to check

     # joining the sentences without verbs to either the one preceding them or following them
     sentence_idx = join_no_verb_sentences(sentence_idx, tokens)
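For context, the disabled call follows a predicate-driven splitting pattern: each `piped_sentencify` pass takes the current sentence spans and re-splits them wherever a token predicate matches. The sketch below is a hypothetical illustration of that pattern only; `split_on_predicate`, the token shape, and the `threshold` interpretation (re-split only spans above a size limit) are assumptions, not botok's actual internals.

```python
def split_on_predicate(sentence_idx, tokens, predicate, threshold=None):
    """Split each (start, end) half-open span wherever predicate(tokens, i) holds.

    If threshold is set, spans of at most `threshold` tokens are left whole
    (an assumed reading of the diff's `threshold=30  # max size to check`).
    """
    result = []
    for start, end in sentence_idx:
        if threshold is not None and end - start <= threshold:
            result.append((start, end))
            continue
        cur = start
        for i in range(start, end):
            if predicate(tokens, i):
                # Close the current sentence just after the matching token.
                result.append((cur, i + 1))
                cur = i + 1
        if cur < end:
            result.append((cur, end))  # keep any trailing remainder
    return result


tokens = ["he", "went", ".", "she", "stayed", "."]
is_punct = lambda toks, i: toks[i] == "."
print(split_on_predicate([(0, len(tokens))], tokens, is_punct))
# [(0, 3), (3, 6)]
```

Commenting out the verb-plus-clause-boundary pass, as this commit does, simply removes one such splitting stage from the pipeline; the later `join_no_verb_sentences` step still merges any resulting verbless spans into a neighbor.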
