Start with Example
Graphormer provides example scripts to train your own models on several datasets. For example, to train a Graphormer-slim on ZINC-500K on a single GPU card:
> cd examples/property_prediction/
> bash zinc.sh
The content of zinc.sh is simply a fairseq-train command:
CUDA_VISIBLE_DEVICES=0 fairseq-train \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name zinc \
--dataset-source pyg \
--task graph_prediction \
--criterion l1_loss \
--arch graphormer_slim \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --clip-norm 5.0 --weight-decay 0.01 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates 60000 --total-num-update 400000 \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size 64 \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 80 \
--encoder-ffn-embed-dim 80 \
--encoder-attention-heads 8 \
--max-epoch 10000 \
--save-dir ./ckpts
CUDA_VISIBLE_DEVICES specifies the GPUs to use. With multiple GPUs, the GPU IDs should be separated by commas.
A fairseq-train command with the Graphormer model is used to launch training. See Command-line Tools for detailed explanations of the parameters.
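To see how the --warmup-updates, --total-num-update, --lr, and --end-learning-rate flags interact under the polynomial_decay scheduler with --power 1, here is a rough Python sketch of the resulting learning-rate curve. This is an illustration of the schedule's shape, not fairseq's exact implementation:

```python
def polynomial_decay_lr(step, peak_lr=2e-4, end_lr=1e-9,
                        warmup_updates=60000, total_updates=400000, power=1.0):
    """Sketch of a polynomial-decay schedule with linear warmup.

    During warmup the LR grows linearly from 0 to peak_lr; afterwards it
    decays polynomially toward end_lr, reaching it at total_updates.
    Defaults mirror the flags in zinc.sh above.
    """
    if step < warmup_updates:
        return peak_lr * step / warmup_updates
    if step >= total_updates:
        return end_lr
    remaining = 1 - (step - warmup_updates) / (total_updates - warmup_updates)
    return (peak_lr - end_lr) * remaining ** power + end_lr

# The peak learning rate is reached exactly at the end of warmup,
# then decays linearly (power=1) down to --end-learning-rate.
print(polynomial_decay_lr(60000))
print(polynomial_decay_lr(400000))
```

With --power 1 the decay is linear; a larger power would front-load the decay toward the end of warmup.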
Similarly, to train a Graphormer-base on the PCQM4M dataset on multiple GPU cards:
> cd examples/property_prediction/
> bash pcqv1.sh
By running the instructions in the scripts, Graphormer will automatically download and pre-process the needed datasets.
Evaluate Pre-trained Models
Graphormer provides pre-trained models so that users can easily evaluate and fine-tune them.
To evaluate a pre-trained model, use the script graphormer/evaluate/evaluate.py:
python evaluate.py \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name pcqm4m \
--dataset-source ogb \
--task graph_prediction \
--criterion l1_loss \
--arch graphormer_base \
--num-classes 1 \
--batch-size 64 \
--pretrained-model-name pcqm4mv1_graphormer_base \
--load-pretrained-model-output-layer \
--split valid \
--seed 1
--pretrained-model-name specifies the pre-trained model to be evaluated; it will be downloaded automatically. --load-pretrained-model-output-layer is set so that the weights of the final fully connected layer in the pre-trained model are loaded. --split specifies the split of the dataset to be evaluated, which can be train or valid.
Fine-tuning Pre-trained Models
To fine-tune pre-trained models, use --pretrained-model-name
to set the model name. For example, the script examples/property_prediction/hiv_pre.sh
fine-tunes our model pcqm4mv1_graphormer_base
on the ogbg-molhiv
dataset. The command for fine-tuning is:
fairseq-train \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name ogbg-molhiv \
--dataset-source ogb \
--task graph_prediction_with_flag \
--criterion binary_logloss_with_flag \
--arch graphormer_base \
--num-classes 1 \
--attention-dropout 0.1 --act-dropout 0.1 --dropout 0.0 \
--optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-8 --weight-decay 0.0 \
--lr-scheduler polynomial_decay --power 1 --warmup-updates $warmup_updates --total-num-update $tot_updates \
--lr 2e-4 --end-learning-rate 1e-9 \
--batch-size $batch_size \
--fp16 \
--data-buffer-size 20 \
--encoder-layers 12 \
--encoder-embed-dim 768 \
--encoder-ffn-embed-dim 768 \
--encoder-attention-heads 32 \
--max-epoch $max_epoch \
--save-dir ./ckpts \
--pretrained-model-name pcqm4mv1_graphormer_base \
--flag-m 3 \
--flag-step-size 0.01 \
--flag-mag 0 \
--seed 1 \
--pre-layernorm
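The graph_prediction_with_flag task trains with FLAG-style adversarial augmentation of node features; --flag-m, --flag-step-size, and --flag-mag set the number of ascent steps, the step size, and the initial perturbation magnitude. As a toy scalar illustration of the idea (a simplified sign-ascent sketch, not the actual FLAG algorithm or Graphormer's implementation):

```python
def flag_ascent(x, grad_fn, loss_fn, flag_m=3, step_size=0.01):
    """Toy sketch of FLAG-style augmentation on a scalar input.

    Takes flag_m gradient-ascent steps on a perturbation delta and
    averages the loss over the perturbed inputs. grad_fn(v) gives
    d(loss)/d(input) at v. With --flag-mag 0, as in hiv_pre.sh, the
    perturbation starts at zero.
    """
    delta = 0.0                       # --flag-mag 0 => zero init
    total_loss = 0.0
    for _ in range(flag_m):
        total_loss += loss_fn(x + delta) / flag_m
        # Ascend the loss w.r.t. the perturbation (sign of the gradient).
        g = grad_fn(x + delta)
        delta = delta + step_size * (1.0 if g >= 0 else -1.0)
    return total_loss

# Example with a quadratic loss: the perturbation drifts uphill,
# so the averaged loss is slightly above the clean loss of 1.0.
loss = lambda v: v * v
grad = lambda v: 2 * v
avg = flag_ascent(1.0, grad, loss, flag_m=3, step_size=0.01)
```

In the real setting the perturbation is a tensor added to node embeddings and the ascent uses the actual training-loss gradients; the sketch only conveys the averaged multi-step structure that the three flags control.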
After fine-tuning, use graphormer/evaluate/evaluate.py
to evaluate the performance of all checkpoints:
python evaluate.py \
--user-dir ../../graphormer \
--num-workers 16 \
--ddp-backend=legacy_ddp \
--dataset-name ogbg-molhiv \
--dataset-source ogb \
--task graph_prediction \
--arch graphormer_base \
--num-classes 1 \
--batch-size 64 \
--save-dir ../../examples/property_prediction/ckpts/ \
--split test \
--metric auc \
--seed 1 \
--pre-layernorm
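Conceptually, this sweeps the checkpoints under --save-dir, scores each on the chosen split, and reports the metric per checkpoint; picking the best model then reduces to a max over (checkpoint, metric) pairs. A hypothetical sketch of that final step (the file names and AUC values below are made up):

```python
# Hypothetical per-checkpoint AUC results, as one might collect from
# the evaluation output; the numbers here are illustrative only.
results = {
    "checkpoint_best.pt": 0.792,
    "checkpoint_last.pt": 0.781,
    "checkpoint20.pt": 0.803,
}

# Select the checkpoint with the highest metric value.
best_ckpt = max(results, key=results.get)
print(best_ckpt, results[best_ckpt])
```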
Training a New Model
We take OC20 as an example to show how to train a new model on your own datasets.
First, download IS2RE train, validation, and test data in LMDB format by:
> cd examples/oc20/ && mkdir data && cd data/
> wget -c https://dl.fbaipublicfiles.com/opencatalystproject/data/is2res_train_val_test_lmdbs.tar.gz && tar -xzvf is2res_train_val_test_lmdbs.tar.gz
Create a ckpt folder to save checkpoints during training:
> cd ../ && mkdir ckpt/
Now we train a 48-layer graphormer-3D architecture, which has 4 blocks, each containing 12 Graphormer layers. The parameters are shared across blocks. The total number of training steps is 1 million, and we warm up the learning rate over the first 10 thousand steps.
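This block structure means only 12 distinct sets of layer parameters exist even though 48 layer applications are performed. A minimal Python sketch of this kind of weight sharing (placeholder classes and names, not the actual model code):

```python
# Sketch of cross-block parameter sharing: 4 blocks reuse the same
# 12 layer objects, so 48 layer applications share 12 parameter sets.
class Layer:
    def __init__(self, idx):
        self.idx = idx            # stands in for the layer's parameters
    def __call__(self, x):
        return x + 1              # placeholder computation

shared_layers = [Layer(i) for i in range(12)]    # --layers 12
blocks = [shared_layers] * 4                     # --blocks 4

# Flatten the per-block layer lists into the full forward schedule.
schedule = [layer for block in blocks for layer in block]
print(len(schedule))                             # → 48 layer applications
print(len({id(layer) for layer in schedule}))    # → 12 unique parameter sets
```

Sharing parameters this way gives the depth of a 48-layer forward pass at the memory cost of 12 layers' worth of weights.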
> fairseq-train --user-dir ../../graphormer \
./data/is2res_train_val_test_lmdbs/data/is2re/all --valid-subset val_id,val_ood_ads,val_ood_cat,val_ood_both --best-checkpoint-metric loss \
--num-workers 0 --ddp-backend=c10d \
--task is2re --criterion mae_deltapos --arch graphormer3d_base \
--optimizer adam --adam-betas '(0.9, 0.98)' --adam-eps 1e-6 --clip-norm $clip_norm \
--lr-scheduler polynomial_decay --lr 3e-4 --warmup-updates 10000 --total-num-update 1000000 --batch-size 4 \
--dropout 0.0 --attention-dropout 0.1 --weight-decay 0.001 --update-freq 1 --seed 1 \
--fp16 --fp16-init-scale 4 --fp16-scale-window 256 --tensorboard-logdir ./tsbs \
--embed-dim 768 --ffn-embed-dim 768 --attention-heads 48 \
--max-update 1000000 --log-interval 100 --log-format simple \
--save-interval-updates 5000 --validate-interval-updates 2500 --keep-interval-updates 30 --no-epoch-checkpoints \
--save-dir ./ckpt --layers 12 --blocks 4 --required-batch-size-multiple 1 --node-loss-weight 15
Please note that --batch-size 4 requires at least 32GB of GPU memory. If an out-of-memory error occurs, one may reduce the batch size and train with more GPU cards, or increase --update-freq to accumulate gradients.
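One way to reason about this trade-off: the effective batch size per optimizer step is the per-GPU batch size times the number of GPUs times --update-freq, so halving --batch-size while doubling --update-freq keeps it constant. A quick sanity check in Python:

```python
def effective_batch_size(per_gpu_batch, num_gpus, update_freq):
    """Effective batch size seen by each optimizer step:
    per-GPU batch * number of GPUs * gradient-accumulation steps."""
    return per_gpu_batch * num_gpus * update_freq

# --batch-size 4 on 8 GPUs with --update-freq 1 ...
print(effective_batch_size(4, 8, 1))   # → 32
# ... matches --batch-size 2 on 8 GPUs with --update-freq 2,
# but the latter needs roughly half the activation memory per GPU.
print(effective_batch_size(2, 8, 2))   # → 32
```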