Problem Description
Once #545 is resolved, benchmark configurations will live in YAML files and can be loaded/validated/executed via BenchmarkConfig/BenchmarkLauncher. The next step is to add a script that automatically executes a benchmark from a config YAML file or a set of pre-defined parameters.
Expected behavior
Provide a simple way to execute benchmarks from a set of parameters:

```bash
invoke benchmark-launcher \
  --modality single_table \
  --timeout 345600 \
  --datasets 'adult,alarm,census' \
  --synthesizers 'UniformSynthesizer,TVAESynthesizer' \
  --num_instances 4 \
  --output_destination 's3://sdgym-benchmark/Debug'
```
Or from a config file:

```bash
invoke benchmark-launcher --config_filepath 'config.yaml'
```
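Since a config file and explicit parameters are meant to be mutually exclusive, the entry point could guard its inputs with a small check. This is only a sketch; `validate_launcher_args` and its parameter names are hypothetical, not part of the SDGym API:

```python
def validate_launcher_args(config_filepath=None, **parameters):
    """Hypothetical guard: reject a config file combined with explicit parameters."""
    # Collect only the parameters the user actually passed.
    given = {name: value for name, value in parameters.items() if value is not None}
    if config_filepath is not None and given:
        names = ', '.join(sorted(given))
        raise ValueError(
            f'Use either --config_filepath or explicit parameters, not both (got: {names}).'
        )
```

A launcher task would call this once before loading the `BenchmarkConfig`, so the error surfaces before any instances are started.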
Additional context
- It should not be possible to give a `config_filepath` and parameters together.
- If `num_instances` is given, then this exact number of instances must be launched.
- For now we can apply a simple rule to build the `instance_jobs` split, which is:
  - Try to split by synthesizer first and keep all datasets together.
  - If we still need more splits to reach the number of instances, split the dataset list for a given synthesizer.
- This logic ensures the correct number of instances is launched without job redundancy.
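The splitting rule above could be sketched as follows. The function name and job representation are assumptions for illustration, not the eventual implementation:

```python
def build_instance_jobs(synthesizers, datasets, num_instances):
    """Sketch of the split rule: one job is a (synthesizer, dataset_list) pair.

    1. Split by synthesizer first, keeping all datasets together.
    2. If more instances are needed, split the dataset list of a synthesizer.
    """
    # Start with one job per synthesizer, each covering every dataset.
    jobs = [(synth, list(datasets)) for synth in synthesizers]
    # Split dataset lists until we reach the requested instance count.
    while len(jobs) < num_instances:
        # Split the job with the most datasets; stop if nothing is splittable.
        idx = max(range(len(jobs)), key=lambda i: len(jobs[i][1]))
        synth, job_datasets = jobs[idx]
        if len(job_datasets) < 2:
            break
        mid = len(job_datasets) // 2
        jobs[idx] = (synth, job_datasets[:mid])
        jobs.append((synth, job_datasets[mid:]))
    return jobs
```

With the example parameters above (2 synthesizers, 3 datasets, `num_instances=4`), this yields four jobs covering all six synthesizer/dataset pairs exactly once, so no work is duplicated across instances.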