Releases: sfarhat/Convolutional-Forced-Alignment
Trained TIMIT Model + Forced Alignment + GRAD-CAM
A deep CNN inspired by Zhang et al. (2016), trained on the TIMIT dataset with a loss other than CTC. Built around the idea of an "ideal alignment" and a slight modification to the data preprocessing, the model classifies phonemes with a Phoneme Error Rate of around 22% on the TIMIT test set, and achieves a 67 ms Average Alignment Error on both the train and test sets.
The hyperparameters are:
Adam learning rate: 10e-5
Batch size: 3
Epochs: 15
Activation: PReLU
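As a sketch, the hyperparameters above could be wired up in PyTorch roughly as follows. The model here is a placeholder stand-in, not the repository's actual architecture, and the variable names are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Placeholder network; the actual architecture follows Zhang et al. (2016).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.PReLU(),  # PReLU activation, per the release notes
    nn.Flatten(),
)

# Adam optimizer with the stated learning rate of 10e-5 (i.e. 1e-4)
optimizer = torch.optim.Adam(model.parameters(), lr=10e-5)

EPOCHS = 15      # training epochs, per the release notes
BATCH_SIZE = 3   # batch size, per the release notes
```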
In addition, forced alignment can be run on any provided input file, and class activation maps (Grad-CAM) can be generated for desired phonemes/words.
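Grad-CAM's core computation, once a conv layer's activations and their gradients with respect to the target class score are in hand, is a small weighted sum. A NumPy sketch on synthetic arrays (illustrative only, not the repository's code):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap for one target class.

    activations: (K, H, W) feature maps from the chosen conv layer.
    gradients:   (K, H, W) gradients of the target score w.r.t. those maps.
    """
    # Importance weight per channel: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization (guard against an all-zero map).
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Synthetic example: 4 channels over an 8-frame x 6-bin spectrogram patch.
rng = np.random.default_rng(0)
heatmap = grad_cam(rng.standard_normal((4, 8, 6)),
                   rng.standard_normal((4, 8, 6)))
```

The resulting heatmap has the spatial shape of the feature maps and highlights the time-frequency regions that most increased the target phoneme's score.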
Trained Librispeech Model
A network similar to that described in the paper by Zhang et al. (2017) is fully implemented and trained. The details of this trained model that differ from the one above are as follows:
- PReLU activation
- batch size of 3
- Adam optimizer with LR = 10e-5
- trained for 50 epochs on the train-clean-100 dataset of LibriSpeech
The trained model weights are attached. The model achieves a 29.97% Character Error Rate based on greedy decoding.
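For context, "greedy decoding" here means taking the argmax character at each frame, collapsing repeats, and dropping blanks (CTC-style); the Character Error Rate is then the edit distance to the reference transcript, normalized by its length. A pure-Python sketch of both steps (illustrative, not the repository's implementation):

```python
def greedy_decode(frame_probs, alphabet, blank=0):
    """Collapse per-frame argmax predictions CTC-style into a string."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:  # drop repeats and blank symbols
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)

def cer(ref, hyp):
    """Character Error Rate: Levenshtein distance / len(ref), single-row DP."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / len(ref)

# Frames: "a", "a" (repeat), blank, "b"  ->  decodes to "ab"
probs = [[0.1, 0.9, 0.0], [0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
decoded = greedy_decode(probs, alphabet=["-", "a", "b"])  # -> "ab"
```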