Source code for the paper "Evolving Generalizable Parallel Algorithm Portfolios via Domain-Agnostic Instance Generation".
This page describes how to configure the environment for the source code and how to run it.
conda create -n test_env -q -y python=3.8
conda activate test_env
pip install -r requirements.txt

# You need to set $DACE to the root path of this project
# export DACE="the path of this project on your server"
# The gcc version we used is 11.2/11.4
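The gcc requirement above can be sanity-checked with a small version test. The version literal below is only a stand-in; on your machine you would substitute `$(gcc -dumpversion)`:

```shell
# Sketch: check the gcc major version matches the 11.x used by the authors.
# Replace the literal with "$(gcc -dumpversion)" to test your real compiler.
ver=11.4.0
case $ver in
  11.*) echo "gcc 11.x OK" ;;
  *)    echo "untested gcc version: $ver" ;;
esac
```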
# Install required packages via apt
sudo apt install make gcc g++ libeigen3-dev libssl-dev swig git libboost-dev libasio-dev
# Download and install cmake>=3.14
cd $DACE #The root path of the project
wget https://cmake.org/files/v3.22/cmake-3.22.4.tar.gz
tar xf cmake-3.22.4.tar.gz
cd cmake-3.22.4
./bootstrap --parallel=48
make -j 255
sudo make install
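After installation, you can confirm the CMake version satisfies the >=3.14 requirement with a `sort -V` comparison (GNU coreutils). This sketch hardcodes the 3.22.4 version built above; in practice you would substitute the output of `cmake --version`:

```shell
# Sketch: version_ge A B succeeds when version A >= version B.
# sort -V -C checks that its input is already in ascending version order.
version_ge() { printf '%s\n%s\n' "$2" "$1" | sort -V -C; }

# 3.22.4 is the version built above; swap in your real cmake version, e.g.
# "$(cmake --version | head -n1 | awk '{print $3}')".
version_ge 3.22.4 3.14 && echo "cmake version OK"
```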
# Download and install pybind11
cd $DACE #The root path of the project
git clone https://github.com/pybind/pybind11.git
cd pybind11
mkdir build
cd build
cmake ..
make check -j 255
sudo make install
# Download and install Boost
cd $DACE #The root path of the project
wget https://archives.boost.io/release/1.84.0/source/boost_1_84_0.tar.gz
tar xf boost_1_84_0.tar.gz
cd boost_1_84_0
./bootstrap.sh
sudo ./b2 install --prefix=/usr toolset=gcc threading=multi
# Update Dynamic Lib list
sudo ldconfig
# Build the binary lib
# Important: make sure the correct Python environment is active
conda activate test_env
cd $DACE/src/cpp/com_imp/
cmake . -DCMAKE_BUILD_TYPE=Release && make

You need to run the following commands to download the dataset for compiler arguments optimization.
ck pull repo:ck-env
ck pull repo:ck-autotuning
ck pull repo:ctuning-programs
ck pull repo:ctuning-datasets-min

Create a new directory named /tmp_ck:
sudo mkdir /tmp_ck
sudo chown -R ${USER} /tmp_ck

To avoid blocking on disk I/O, we recommend creating a RAM disk and mounting it at /tmp_ck:
sudo mount -t tmpfs -o size=32G tmpfs /tmp_ck

Then copy the runtime files into the /tmp_ck directory:
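Before copying data in, it may be worth confirming that the tmpfs mount actually succeeded. The sample string below is only a stand-in for the real output of `mount | grep /tmp_ck` on your machine:

```shell
# Sketch: verify /tmp_ck is tmpfs-backed. The sample line stands in for
# "$(mount | grep ' /tmp_ck ')" so the check logic can be shown in isolation.
mount_line="tmpfs on /tmp_ck type tmpfs (rw,relatime,size=33554432k)"
case $mount_line in
  "tmpfs on /tmp_ck type tmpfs"*) echo "/tmp_ck is tmpfs" ;;
  *) echo "warning: /tmp_ck is not tmpfs; disk I/O may block" ;;
esac
```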
cp -r ~/CK /tmp_ck
cp -r ~/CK-TOOLS /tmp_ck
cp -r ~/CK/ctuning-programs/program/* /tmp_ck

There are 3 problem classes in this repo, some of which are generated from existing datasets. The datasets for the
complementary influence maximization problem and the compiler arguments optimization problem have been uploaded to the folder data/dataset.
The Facebook/Wiki/Epinions datasets for the complementary influence maximization problem are located in the folder data/dataset/com_imp.
The dataset for the compiler arguments optimization problem is located in the folder data/dataset/compiler_args.
Set the environment variable PYTHONPATH as follows, where $DACE is the root path of this project:

# You need to set $DACE to the root path of this project
# export DACE="the path of this project on your server"
export PYTHONPATH=$DACE:$DACE/src
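To confirm the variable is set as intended, a quick membership check can help; `/opt/dace-project` below is a placeholder for your real $DACE path:

```shell
# Sketch: check that $DACE/src actually ended up on PYTHONPATH.
# /opt/dace-project is a placeholder; use your own project root.
DACE=/opt/dace-project
PYTHONPATH=$DACE:$DACE/src
case ":$PYTHONPATH:" in
  *":$DACE/src:"*) echo "PYTHONPATH contains $DACE/src" ;;
  *) echo "PYTHONPATH is missing $DACE/src" ;;
esac
```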
Run experiment_problem.py in the src path:
cd $DACE/src/experiments
python experiment_problem.py

cd $DACE/src/experiments
python experiment_build_surrogate.py

cd $DACE/src/experiments
python generate_initail_cofing_set.py

cd $DACE/src/pap
python dace.py --problem_domain contamination_problem --problem_dim 30
python dace.py --problem_domain compiler_args_selection_problem --problem_dim 80 --add_recommend_config_set True
python dace.py --problem_domain com_influence_max_problem --problem_dim 80

cd $DACE/src/pap
python ceps.py --problem_domain contamination_problem --problem_dim 30 --max_parallel_num 300
python ceps.py --problem_domain compiler_args_selection_problem --problem_dim 80 --add_recommend_config_set True --max_parallel_num 300
python ceps.py --problem_domain com_influence_max_problem --problem_dim 80 --max_parallel_num 20

cd $DACE/src/pap
python global.py --problem_domain contamination_problem --problem_dim 30 --max_parallel_num 300 --distributed False --time_limit 12 --config_batch_size 200 --smac_max_try 51200
python global.py --problem_domain compiler_args_selection_problem --problem_dim 80 --max_parallel_num 300 --distributed True --time_limit 12 --config_batch_size 100 --smac_max_try 6400
python global.py --problem_domain com_influence_max_problem --problem_dim 80 --max_parallel_num 50 --distributed True --time_limit 12 --config_batch_size 75 --smac_max_try 6400

cd $DACE/src/pap
python parhydra.py --problem_domain contamination_problem --problem_dim 30 --max_parallel_num 300 --distributed False --time_limit 12 --config_batch_size 200 --smac_max_try 25600
python parhydra.py --problem_domain compiler_args_selection_problem --problem_dim 80 --max_parallel_num 300 --distributed True --time_limit 12 --config_batch_size 100 --smac_max_try 4800
python parhydra.py --problem_domain com_influence_max_problem --problem_dim 80 --max_parallel_num 50 --distributed True --time_limit 12 --config_batch_size 20 --smac_max_try 12800

Test the performance of the PAPs constructed by DACE and CEPS:
cd $DACE/src/experiments
python -u experiment_run_pap.py --method DACE --problem_domain contamination_problem --problem_dim 30 --max_parallel_num 200 --repeat_time 20
python -u experiment_run_pap.py --method CEPS --problem_domain contamination_problem --problem_dim 30 --max_parallel_num 200 --repeat_time 20

cd $DACE/src/experiments
python -u experiment_run_default_pap.py --repeat_time 20

cd $DACE/src/experiments
python experiment_test_result.py

cd $DACE/src/pap
python dace_no_reg.py --problem_domain contamination_problem --problem_dim 30
python dace_no_reg.py --problem_domain compiler_args_selection_problem --problem_dim 80 --add_recommend_config_set True
python dace_no_reg.py --problem_domain com_influence_max_problem --problem_dim 80

cd $DACE/src/experiments
python experiment_vis.py

You can run some experiments in distributed mode, such as training a PAP and evaluating performance on the test set for DACE and CEPS.
In src/pap/ceps.py, src/pap/global.py, src/pap/parhydra.py, and src/experiments/experiment_run_pap.py, you can run the code in distributed mode by adding the script parameter "--distributed true". Then you need to start the evaluator script on the machines in the cluster:
cd $DACE/src/distribution
python -u start_evaluator.py --pap $PAP_TYPE --max_parallel_num $PARALLEL_NUM --server_host $IP_ADDRESS

$PAP_TYPE has two candidate values, ceps and base: ceps is used in the PAP construction process of CEPS, and base is used in the performance evaluation process.
$PARALLEL_NUM is the number of evaluation processes run in parallel.
$IP_ADDRESS is the IP address of the master machine that runs the script with the parameter "--distributed true".
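The per-machine launches can also be scripted. The sketch below only prints the commands (a dry run); the hostnames and master IP are placeholders for your own cluster:

```shell
# Sketch: print the evaluator launch command for each worker (dry run).
# worker1/worker2 and the master IP are placeholders, not real hosts.
MASTER_IP=172.18.18.18
for host in worker1 worker2; do
  echo "ssh $host 'cd \$DACE/src/distribution && python -u start_evaluator.py --pap ceps --max_parallel_num 300 --server_host $MASTER_IP'"
done
```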
For example, suppose you want to run CEPS to train a PAP on the compiler arguments optimization problem in distributed mode.
First, run ceps.py on a machine with IP address 172.18.18.18:
cd $DACE/src/pap
python ceps.py --problem_domain compiler_args_selection_problem --problem_dim 80 --add_recommend_config_set True --max_parallel_num 300 --distributed true

Then, on each machine in your cluster:
cd $DACE/src/distribution
python -u start_evaluator.py --pap ceps --max_parallel_num 300 --server_host "172.18.18.18"