Mask R-CNN is a convolution-based neural network for the task of object instance segmentation. The paper describing the model can be found here. This repository provides a script to train Mask R-CNN for TensorFlow on Habana Gaudi, and is an optimized version of the implementation in NVIDIA's Mask R-CNN for Gaudi.
Changes in the model:
- Support for Habana device was added
- Horovod support
Please visit this page for performance information. For more information about training deep learning models on Gaudi, visit developer.habana.ai.
Mask R-CNN builds on Faster R-CNN by adding a mask head for the task of image segmentation.
The architecture consists of the following:
- ResNet-50 backbone with Feature Pyramid Network (FPN)
- Region proposal network (RPN) head
- RoI Align
- Bounding box and classification head
- Mask head
Please follow the instructions in the Gaudi Setup and Installation Guide to set up
the environment, including the $PYTHON environment variable. Answer the questions
in the guide according to your preferences. The guide walks you through the process
of setting up your system to run the model on Gaudi.
In the Docker container, clone this repository and switch to the branch that
matches your SynapseAI version. (Run the hl-smi utility to determine the SynapseAI version.)
git clone -b [SynapseAI version] https://github.com/HabanaAI/Model-References
Go to the Mask R-CNN directory:
cd Model-References/TensorFlow/computer_vision/maskrcnn
This repository provides scripts to download and preprocess the COCO 2017 dataset. If you already have the data, you do not need to run the following script; proceed to Download the pre-trained weights.
The following script will save TFRecords files to the data directory, /data.
cd dataset
bash download_and_preprocess_coco.sh /data
By default, the data directory is organized into the following structure:
<data_directory>
    raw-data/
        train2017/
        train2017.zip
        test2017/
        test2017.zip
        val2017/
        val2017.zip
        annotations_trainval2017.zip
        image_info_test2017.zip
    annotations/
        instances_train2017.json
        person_keypoints_train2017.json
        captions_train2017.json
        instances_val2017.json
        person_keypoints_val2017.json
        image_info_test2017.json
        image_info_test-dev2017.json
        captions_val2017.json
    train-*.tfrecord
    val-*.tfrecord
This repository also provides scripts to download the pre-trained weights of the ResNet-50 backbone.
The script will make a new directory with the name weights in the current directory and download the pre-trained weights in it.
./download_and_process_pretrained_weights.sh
Ensure that the weights folder created has a resnet folder in it.
Inside the resnet folder there should be 3 folders for checkpoints and weights: extracted_from_maskrcnn, resnet-nhwc-2018-02-07 and resnet-nhwc-2018-10-14.
Before moving to the next step, ensure resnet-nhwc-2018-02-07 is not empty.
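The check described above can be scripted. The following is a minimal sketch, assuming the weights directory sits in the current working directory as the download script creates it; the helper name check_weights is hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names are taken from the text above; the helper itself is hypothetical.
EXPECTED_RESNET_DIRS = (
    "extracted_from_maskrcnn",
    "resnet-nhwc-2018-02-07",
    "resnet-nhwc-2018-10-14",
)
REQUIRED_NON_EMPTY = "resnet-nhwc-2018-02-07"


def check_weights(weights_dir="weights"):
    """Return a list of problems; an empty list means the layout looks right."""
    resnet_dir = Path(weights_dir) / "resnet"
    if not resnet_dir.is_dir():
        return [f"missing directory: {resnet_dir}"]

    problems = []
    for name in EXPECTED_RESNET_DIRS:
        if not (resnet_dir / name).is_dir():
            problems.append(f"missing directory: {resnet_dir / name}")

    # The checkpoint used for training lives here, so it must contain files.
    required = resnet_dir / REQUIRED_NON_EMPTY
    if required.is_dir() and not any(required.iterdir()):
        problems.append(f"directory is empty: {required}")
    return problems


if __name__ == "__main__":
    issues = check_weights()
    print("weights look OK" if not issues else "\n".join(issues))
```

Run it from the directory where the download script was executed; any reported problem suggests the download should be repeated before training.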
As a prerequisite, the root of this repository must be added to PYTHONPATH. For example:
export PYTHONPATH=$PYTHONPATH:$HOME/Model-References
Both the demo_mask_rcnn.py and mask_rcnn_main.py scripts can be used to run training. demo_mask_rcnn.py is a thin wrapper for mask_rcnn_main.py that reduces boilerplate.
Using demo_mask_rcnn.py with all default parameters:
$PYTHON demo_mask_rcnn.py
With example parameters:
$PYTHON demo_mask_rcnn.py train --dataset /data -d bf16
Equivalent command using mask_rcnn_main.py:
TF_BF16_CONVERSION=/path/to/Model-References/TensorFlow/common/bf16_config/full.json $PYTHON mask_rcnn_main.py --mode=train --checkpoint="weights/resnet/resnet-nhwc-2018-02-07/model.ckpt-112603" --eval_samples=5000 --init_learning_rate=0.005 --learning_rate_steps=240000,320000 --model_dir="results" --num_steps_per_eval=29568 --total_steps=360000 --training_file_pattern="/data/train-*.tfrecord" --validation_file_pattern="/data/val-*.tfrecord" --val_json_file="/data/annotations/instances_val2017.json"
When demo_mask_rcnn.py is used, the hvd_workers parameter can be set. HCL configuration and hyperparameter adjustment are then performed automatically.
$PYTHON demo_mask_rcnn.py train --dataset /data -d bf16 --hvd_workers 8
It's possible to invoke mpirun directly, both using demo_mask_rcnn.py and mask_rcnn_main.py. Please note that if you are training the model in a Kubernetes environment you need to use the following mpirun command. In this case, create a valid HCL configuration file and provide the file path using the HCL_CONFIG_PATH environment variable.
For the HCL configuration file format, read the API Reference Guides page.
Do not use the hvd_workers parameter in this case, and remember to adjust the hyperparameters (typically by multiplying init_learning_rate by the number of workers and dividing learning_rate_steps, num_steps_per_eval and total_steps by it).
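The adjustment described above can be sketched as a small calculation. This is an illustration only: the function name scale_for_workers is hypothetical, the single-card defaults are copied from the single-card mask_rcnn_main.py command earlier in this document, and the use of integer (floor) division is an assumption.

```python
def scale_for_workers(
    num_workers,
    init_learning_rate=0.005,
    learning_rate_steps=(240000, 320000),
    num_steps_per_eval=29568,
    total_steps=360000,
):
    """Scale single-card hyperparameters for num_workers Horovod workers.

    The learning rate is multiplied by the worker count; every step-based
    value is divided by it (floor division, an assumption of this sketch).
    """
    if num_workers < 1:
        raise ValueError("num_workers must be a positive integer")
    return {
        "init_learning_rate": init_learning_rate * num_workers,
        "learning_rate_steps": [s // num_workers for s in learning_rate_steps],
        "num_steps_per_eval": num_steps_per_eval // num_workers,
        "total_steps": total_steps // num_workers,
    }


if __name__ == "__main__":
    for name, value in scale_for_workers(8).items():
        print(f"--{name} {value}")
```

For 8 workers this gives a learning rate of 0.04, learning rate steps of 30000 and 40000, 3696 steps per evaluation and 45000 total steps, which are the values used in the 8-card mpirun commands in this document.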
Multi-card training in bf16 over mpirun using demo_mask_rcnn.py:
mpirun --allow-run-as-root --np 8 $PYTHON demo_mask_rcnn.py train --dataset /data -d bf16 --init_learning_rate 0.04 --learning_rate_steps 30000,40000 --num_steps_per_eval 3696 --total_steps 45000
Equivalent command using mask_rcnn_main.py:
TF_BF16_CONVERSION=/path/to/Model-References/TensorFlow/common/bf16_config/full.json mpirun --allow-run-as-root --np 8 $PYTHON mask_rcnn_main.py --mode=train --training_file_pattern="/data/train-*.tfrecord" --validation_file_pattern="/data/val-*.tfrecord" --val_json_file="/data/annotations/instances_val2017.json" --init_learning_rate=0.04 --learning_rate_steps=30000,40000 --num_steps_per_eval=3696 --total_steps=45000 --checkpoint="weights/resnet/resnet-nhwc-2018-02-07/model.ckpt-112603" --model_dir="results"
You can train the model with a different data type by setting the TF_BF16_CONVERSION environment variable. For more details on the mixed precision training JSON recipe files, please refer to the TensorFlow Mixed Precision Training on Gaudi documentation.
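The values of TF_BF16_CONVERSION used by the mask_rcnn_main.py commands in this section can be summarised in a small lookup. This is a sketch only: the function name bf16_conversion_value is hypothetical, and /path/to/Model-References is the same placeholder used in the commands, to be replaced with the actual clone location.

```python
import posixpath


def bf16_conversion_value(dtype, model_references_root="/path/to/Model-References"):
    """Return the TF_BF16_CONVERSION value matching a data type name.

    The mapping mirrors the commands in this section: full.json for bf16,
    basic.json for bf16-basic, and "0" to train in fp32.
    """
    config_dir = posixpath.join(
        model_references_root, "TensorFlow", "common", "bf16_config"
    )
    mapping = {
        "bf16": posixpath.join(config_dir, "full.json"),
        "bf16-basic": posixpath.join(config_dir, "basic.json"),
        "fp32": "0",
    }
    if dtype not in mapping:
        raise ValueError(f"unknown dtype {dtype!r}; expected one of {sorted(mapping)}")
    return mapping[dtype]


if __name__ == "__main__":
    for dtype in ("bf16", "bf16-basic", "fp32"):
        print(f"{dtype}: TF_BF16_CONVERSION={bf16_conversion_value(dtype)}")
```

The returned string is what would be placed in front of the mpirun or $PYTHON command, as in the examples in this section.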
For multi-card training in bf16-basic over mpirun using mask_rcnn_main.py:
TF_BF16_CONVERSION=/path/to/Model-References/TensorFlow/common/bf16_config/basic.json mpirun --allow-run-as-root --np 8 $PYTHON mask_rcnn_main.py --mode=train --training_file_pattern="/data/train-*.tfrecord" --validation_file_pattern="/data/val-*.tfrecord" --val_json_file="/data/annotations/instances_val2017.json" --init_learning_rate=0.04 --learning_rate_steps=30000,40000 --num_steps_per_eval=3696 --total_steps=45000 --checkpoint="weights/resnet/resnet-nhwc-2018-02-07/model.ckpt-112603" --model_dir="results"
For multi-card training in fp32 over mpirun using mask_rcnn_main.py:
TF_BF16_CONVERSION=0 mpirun --allow-run-as-root --np 8 $PYTHON mask_rcnn_main.py --mode=train --training_file_pattern="/data/train-*.tfrecord" --validation_file_pattern="/data/val-*.tfrecord" --val_json_file="/data/annotations/instances_val2017.json" --init_learning_rate=0.04 --learning_rate_steps=30000,40000 --num_steps_per_eval=3696 --total_steps=45000 --checkpoint="weights/resnet/resnet-nhwc-2018-02-07/model.ckpt-112603" --model_dir="results"
You can modify the training behavior through the various flags in the demo_mask_rcnn.py and mask_rcnn_main.py scripts.
Flags in the demo_mask_rcnn.py script are as follows:
- command: Run train, train_and_eval or eval on MS COCO. train_and_eval by default.
- dataset: Dataset directory.
- checkpoint: Path to model checkpoint.
- model_dir: Model directory.
- total_steps, s: The number of steps to use for training; should be adjusted according to the train_batch_size flag. Note that performance is not reported for the first 100 steps (-1 is shown instead).
- dtype, d: Data type: fp32, bf16 or bf16-basic. bf16 and bf16-basic automatically convert the appropriate ops to the bfloat16 format. This approach is similar to Automatic Mixed Precision of TensorFlow, which can reduce memory requirements and speed up training. bf16-basic allows only matrix multiplications and convolutions to be converted.
- hvd_workers: Number of Horovod workers, disabled by default.
- train_batch_size, bs, b: Batch size for training.
- no_eval_after_training, t: Disable single evaluation steps after training when the train command is used.
- clean_model_dir, c: Clean model directory before execution.
- use_fake_data, f: Use fake input.
- pyramid_roi_impl: Implementation to use for PyramidRoiAlign. habana (default), habana_fp32 and gather can be used.
- eval_samples: Number of eval samples. Number of steps will be divided by eval_batch_size.
- eval_batch_size: Batch size for evaluation.
- num_steps_per_eval: Number of steps used for evaluation.
- deterministic: Enable deterministic behavior.
- save_summary_steps: Steps between saving summaries to TensorBoard.
- map_by_socket: MPI maps processes to sockets. Used only when hvd_workers is set.
- device: Device type. CPU and HPU can be used.
- profile: Gather TensorBoard profiling data.
- init_learning_rate: Initial learning rate.
- learning_rate_steps: Warmup learning rate decay factor. Expected format: "first_value,second_value".
| Command | Notes |
|---|---|
| $PYTHON demo_mask_rcnn.py train --dataset /data -d bf16 | Single-card training in bf16 |
| $PYTHON demo_mask_rcnn.py train --dataset /data -d fp32 | Single-card training in fp32 |
| $PYTHON demo_mask_rcnn.py eval --dataset /data | Single-card evaluation in bf16 |
| $PYTHON demo_mask_rcnn.py train --dataset /data -d bf16 --hvd_workers 8 | 8-card training in bf16 |
| $PYTHON demo_mask_rcnn.py train --dataset /data -d fp32 --hvd_workers 8 | 8-card training in fp32 |
| $PYTHON demo_mask_rcnn.py eval --dataset /data --hvd_workers 8 | 8-card evaluation in bf16 |
| mpirun --allow-run-as-root --np 8 $PYTHON demo_mask_rcnn.py --dataset /data --init_learning_rate 0.04 --learning_rate_steps 30000,40000 --num_steps_per_eval 3696 --total_steps 45000 | 8-card training and evaluation in bf16 |