can't load word2vec model when running example code #33

@estathop

I am trying to run the "LSTM to LSTM auto-encoder with word embedding (RNN to RNN architecture)" example. I have already trained my own word2vec model with gensim and saved it with:
model.save('/home/estathop/Documents/word2vecmodel/w2v1model')  # save model
When I then try to load it with

# load Gensim word2vec from word2vec_model_path
word2vec = GensimWord2vec('/home/estathop/Documents/word2vecmodel/w2v1model')

the following error occurs:

Traceback (most recent call last):

File "", line 5, in
word2vec = GensimWord2vec('/home/estathop/Documents/word2vecmodel/w2v1model')

File "/home/estathop/anaconda2/envs/tensorflow/lib/python2.7/site-packages/seq2vec/word2vec/gensim_word2vec.py", line 9, in __init__
model_path, binary=True

File "/home/estathop/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 1120, in load_word2vec_format
limit=limit, datatype=datatype)

File "/home/estathop/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gensim/models/utils_any2vec.py", line 174, in _load_word2vec_format
header = utils.to_unicode(fin.readline(), encoding=encoding)

File "/home/estathop/anaconda2/envs/tensorflow/lib/python2.7/site-packages/gensim/utils.py", line 359, in any2unicode
return unicode(text, encoding, errors=errors)

File "/home/estathop/anaconda2/envs/tensorflow/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

Any ideas how to fix or work around this?
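
One possible reading of the traceback: `model.save()` writes a pickled gensim object, and pickle streams (protocol 2 and later) begin with the byte 0x80, while the traceback shows `GensimWord2vec` calling `load_word2vec_format(model_path, binary=True)`, which expects the word2vec C format with a text header line. The sketch below reproduces the decode failure with the standard library only and shows a conversion that may resolve it. `export_word2vec_binary` and its output path are hypothetical names, and the conversion assumes a gensim version where `Word2Vec.load` and `model.wv.save_word2vec_format` are available.

```python
import pickle


def fails_as_utf8(data):
    """Return True if the bytes cannot be decoded as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# A pickle written with protocol 2 starts with the PROTO opcode, byte 0x80.
# Reading such a file as a word2vec text header fails on that first byte.
pickled = pickle.dumps({"vectors": [0.1, 0.2]}, protocol=2)
starts_with_0x80 = pickled[:1] == b"\x80"
header_is_unreadable = fails_as_utf8(pickled)


def export_word2vec_binary(saved_model_path, output_path):
    """Hypothetical helper: re-export a model saved with model.save()
    in the word2vec binary format that load_word2vec_format can read."""
    # Imported here so the demonstration above runs without gensim.
    from gensim.models import Word2Vec

    model = Word2Vec.load(saved_model_path)
    model.wv.save_word2vec_format(output_path, binary=True)
```

If this reading is right, calling `export_word2vec_binary` on the saved model and passing the exported file's path to `GensimWord2vec` should avoid the `UnicodeDecodeError`. Saving directly with `model.wv.save_word2vec_format(..., binary=True)` after training would be an equivalent route.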
