All commands used in our Reproducibility Challenge submission experiments

Running Experiments

After installing our library, an experiment is launched with run_exp TASK_NAME PARAMS. TASK_NAME selects the data and model to use, while PARAMS adjusts model and training settings as well as logging.
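For example, a trimmed-down version of the synthetic-task command from Section 4.2 below looks like this (illustrative only; the full commands we used are listed further down):

run_exp "synt" --n_epochs=750 --bs=64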

Experiment Task Names

When running the experiment script, each experiment type has its own "task", which defines the data and model to be used as well as the training schedule. Most model settings are hard-coded in configs to mirror the configurations used in the Reformer paper. All model and training hyperparameters used in training can be found in Experiments/Configs. Depending on the task, a limited number of config values, such as seq_len or n_layers, can be overridden from the command line (see the example after the task list below).

  • synt : Language Modelling with Synthetic Data
  • lm_XX : The lm_base argument trains a baseline TransformerLM, lm_rev trains a ReversibleLM, and lm_shared_qk trains a baseline Transformer with shared query-key attention. All training is on the enwik8 dataset.
  • n_hashes : Trains an LSH-LM on the enwik8 data
  • n_layers : Trains an LSH-LM on the enwik8 data
  • wmt_XX : The wmt_base argument trains a classic Transformer on the WMT-14 dataset, while wmt_rev trains a ReversibleTransformer on the same dataset.
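For example, to train the baseline language model with a shorter sequence length than the default config, the relevant flags can be overridden on the command line (the values below are purely illustrative and are not the settings used in our experiments):

run_exp "lm_base" --max_seq_len=1024 --n_epochs=1 --bs=1 --do_wandb_logging=False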

Commands Used

Language Model Experiments with enwik8

Section 4.2 - LSH attention analysis on synthetic task

run_exp "synt" --n_epochs=750 --bs=64 --save_model=True --seed=123 --do_wandb_logging=False --n_hashes=4

Section 4.3 & 4.4 - Effect of sharing QK & Effect of reversible layers

The language modelling experiments outlined in sections 4.3 and 4.4 of the paper were run with the following command. The lm_base argument was passed to train a baseline TransformerLM, lm_rev to train a ReversibleLM, and lm_shared_qk for a baseline Transformer with shared query-key values.

run_exp "lm_base" --n_epochs=10 --bs=1 --max_seq_len=4096 --grad_accum=8 --save_model=True --clip=1.0 --seed=42 --do_wandb_logging=False

Section 4.5 - Reversible Transformer on translation task experiment

ReversibleTransformer on WMT-14:

run_exp "wmt_rev" --lr=1e-4 --n_epochs=2 --bs=64 --n_layers=6 --max_seq_len=256 --do_wandb_logging=False --save_model=True --clip=1.0 --seed=8230 --precision=2

Run the above command with the "wmt_base" task to train a baseline Transformer.
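Both translation runs can also be launched back to back (a convenience sketch; the settings are copied from the command above):

for task in wmt_rev wmt_base; do
  run_exp "$task" --lr=1e-4 --n_epochs=2 --bs=64 --n_layers=6 --max_seq_len=256 --do_wandb_logging=False --save_model=True --clip=1.0 --seed=8230 --precision=2
done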

Section 4.6 - Effect of number of hashing rounds on the performance

run_exp "n_hashes" --n_hashes=2 --n_epochs=10 --bs=8 --max_seq_len=4096 --do_wandb_logging=True --wandb_group='n_hashes' --wandb_notes='performance as function of n_hashes (2)' --wandb_tags='lm exp lsh nhashes' --grad_accum=8  --clip=1.0 --seed=2

Section 4.7 - LSH attention evaluation speed

The LSH-LM evaluation speed experiment used the same functions as the experiment script but was carried out in the "LSH evaluation speed" notebook.

Section 4.8 - Deep Reformer models

run_exp "n_layers" --n_layers=6 --n_epochs=8 --bs=2 --max_seq_len=16384 --do_wandb_logging=True --wandb_group='n_layers' --wandb_notes='performance as function of n_layers (6)' --wandb_tags='lm exp lsh nlayers' --grad_accum=8 --clip=1.0 --seed=48 --save_model=True