CSID-DGU/remote_boot

Remote Boot

Sends Wake-on-LAN magic packets to the configured targets automatically whenever this desktop boots.

Files

  • script/common.sh: shared Ansible and server-ID helpers used by the remote boot scripts
  • script/wake_targets.sh: send magic packets for one or more named targets
  • script/run_remote_boot.sh: boot entrypoint that loads config and runs the target script
  • script/create_test_container.sh: create a temporary GPU container without touching the DB
  • script/delete_test_container.sh: remove the temporary test container
  • script/check_server_boot_health.sh: verify the share mount, host GPU, test container creation, SSH service, and GPU access inside the container
  • script/wait_for_priority_servers.sh: retry health checks on the priority servers until they pass or the timeout expires, before the remaining targets are woken
  • script/restart_all_remote_containers.sh: run docker restart $(docker ps -aq) on the selected servers, with retry
  • script/integration_smoke_test.sh: manual Ansible/Docker/GPU smoke test to run before enabling boot automation
  • script/install_remote_boot_service.sh: install and enable the systemd boot service
  • config/remote_boot.local.env: local defaults used at boot time
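For illustration, the per-server half of restart_all_remote_containers.sh might look roughly like the sketch below. It assumes the function runs on each selected server (for example through an Ansible task); the function name restart_all_containers and the default attempt count and delay are hypothetical, not taken from the repository.

```bash
#!/usr/bin/env bash
# Sketch only: the per-server part of a "restart every container" step.
# Assumes it runs on each selected server (e.g. through Ansible);
# the default attempt count and delay are hypothetical.

# restart_all_containers [ATTEMPTS] [DELAY_SECONDS]
restart_all_containers() {
  local attempts="${1:-3}" delay="${2:-5}" ids i
  local -a id_list
  ids=$(docker ps -aq) || return 1          # -a: include stopped, -q: IDs only
  if [ -z "$ids" ]; then
    echo "no containers on ${HOSTNAME:-this server}; nothing to restart"
    return 0
  fi
  mapfile -t id_list <<<"$ids"              # one array element per container ID
  for ((i = 1; i <= attempts; i++)); do
    if docker restart "${id_list[@]}" >/dev/null; then
      echo "restarted ${#id_list[@]} container(s)"
      return 0
    fi
    echo "docker restart attempt $i/$attempts failed" >&2
    ((i < attempts)) && sleep "$delay"      # no sleep after the final attempt
  done
  return 1
}
```

An empty docker ps -aq result is treated as success, matching the note that a server with no containers is logged and skipped.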

Quick start

  1. Copy the example config and edit only your local file:
cp config/remote_boot.example.env config/remote_boot.local.env
  2. Fill in the server-specific values in config/remote_boot.local.env.
  3. Review config/remote_boot.local.env.
  4. Install the boot service:
./script/install_remote_boot_service.sh
  5. Reboot, or start the service once manually:
sudo systemctl start remote-boot.service
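A boot entrypoint such as script/run_remote_boot.sh has to read the local config before doing anything else. The sketch below shows one common way to do that, assuming the file uses plain KEY=value shell syntax; load_env_file and require_command are hypothetical helper names, not functions from script/common.sh.

```bash
#!/usr/bin/env bash
# Sketch only: loading the local config at boot.
# load_env_file and require_command are hypothetical helper names.

# load_env_file FILE: source FILE and export every variable it assigns
load_env_file() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file; copy config/remote_boot.example.env first" >&2
    return 1
  fi
  set -a          # mark every variable assigned from here on for export
  # shellcheck source=/dev/null
  . "$file"
  set +a
}

# require_command NAME: fail early if a needed tool such as wakeonlan is missing
require_command() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing command: $1" >&2; return 1; }
}
```

Called as load_env_file config/remote_boot.local.env && require_command wakeonlan, it stops with a readable message instead of sending packets with empty settings.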

Manual usage

List available targets:

./script/wake_targets.sh --list-targets

Wake a group manually:

./script/wake_targets.sh all
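Group names and per-target settings could be resolved along these lines. This sketch assumes the group "all" expands to REMOTE_BOOT_FARM_TARGETS plus REMOTE_BOOT_LAB_TARGETS, adds two hypothetical groups "farm" and "lab", and applies the default broadcast addresses and REMOTE_BOOT_MAC_<TARGET> lookup described under Notes. It is not the repository's wake_targets.sh.

```bash
#!/usr/bin/env bash
# Sketch only: group resolution and magic-packet sending.
# The "farm" and "lab" group names are hypothetical.

# resolve_targets NAME...: expand groups to target names, drop blanks and duplicates
resolve_targets() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      all)  printf '%s\n' $REMOTE_BOOT_FARM_TARGETS $REMOTE_BOOT_LAB_TARGETS ;;
      farm) printf '%s\n' $REMOTE_BOOT_FARM_TARGETS ;;
      lab)  printf '%s\n' $REMOTE_BOOT_LAB_TARGETS ;;
      *)    printf '%s\n' "$arg" ;;
    esac
  done | awk 'NF && !seen[$0]++'
}

# broadcast_for_target TARGET: default broadcast address by name prefix
broadcast_for_target() {
  case "$1" in
    LAB*)  echo "192.168.1.255" ;;
    FARM*) echo "192.168.2.255" ;;
    *)     echo "unknown target prefix: $1" >&2; return 1 ;;
  esac
}

# wake_target TARGET: send a magic packet to REMOTE_BOOT_MAC_<TARGET>
wake_target() {
  local var="REMOTE_BOOT_MAC_$1" bcast mac
  bcast=$(broadcast_for_target "$1") || return 1
  mac="${!var:-}"   # indirect expansion: the value of the variable named by $var
  if [ -z "$mac" ]; then
    echo "$var is not set" >&2
    return 1
  fi
  wakeonlan -i "$bcast" "$mac"   # -i sets the destination (broadcast) address
}
```

Usage would be a loop such as: for t in $(resolve_targets all); do wake_target "$t"; done.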

Boot orchestration with staged wake-up:

  • targets listed in REMOTE_BOOT_PRIORITY_TARGETS (for example "FARM1 LAB1") are woken first
  • if REMOTE_BOOT_ENABLE_GATE=true, the script waits for the priority servers to pass their health checks
  • the gate retries for up to REMOTE_BOOT_GATE_TIMEOUT_SECONDS seconds (for example 360)
  • once the gate passes, the remaining selected targets are woken
  • finally, if REMOTE_BOOT_ENABLE_CONTAINER_RESTART=true, every Docker container on all selected servers is restarted
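The gate step amounts to a retry loop bounded by a timeout. A minimal sketch, assuming a health check is any command that exits 0 on success; wait_for_gate and its polling-interval argument are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: a timeout-bounded retry loop like the priority-server gate.
# The function name and the interval argument are assumptions.

# wait_for_gate TIMEOUT INTERVAL CMD...: rerun CMD until it succeeds, giving up
# once the next attempt would start at or after TIMEOUT seconds from now
wait_for_gate() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$((SECONDS + timeout))   # SECONDS: bash's running seconds counter
  until "$@"; do
    if ((SECONDS + interval >= deadline)); then
      echo "gate: '$*' did not pass within ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

In the orchestration above, CMD would be a health check for one priority server, such as script/check_server_boot_health.sh with whatever arguments it actually takes, and TIMEOUT would come from REMOTE_BOOT_GATE_TIMEOUT_SECONDS.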

Standalone test container commands:

./script/create_test_container.sh --server-id FARM1
./script/delete_test_container.sh --server-id FARM1
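The create/delete pair can be approximated with plain Docker. In this sketch the container name pattern remote-boot-test-<server-id>, the REMOTE_BOOT_TEST_IMAGE variable, and its default CUDA image tag are all assumptions; --gpus all requires the NVIDIA Container Toolkit on the host.

```bash
#!/usr/bin/env bash
# Sketch only: a temporary GPU container created and removed with plain Docker.
# Hypothetical image; any image that ships nvidia-smi would do.
TEST_IMAGE="${REMOTE_BOOT_TEST_IMAGE:-nvidia/cuda:12.2.0-base-ubuntu22.04}"

# test_container_name SERVER_ID: hypothetical naming scheme
test_container_name() { printf 'remote-boot-test-%s\n' "$1"; }

# create_test_container SERVER_ID: start a long-lived container, then list its GPUs
create_test_container() {
  local name
  name=$(test_container_name "$1")
  docker run -d --name "$name" --gpus all "$TEST_IMAGE" sleep infinity >/dev/null &&
    docker exec "$name" nvidia-smi -L
}

# delete_test_container SERVER_ID: -f stops the container first if it is running
delete_test_container() {
  docker rm -f "$(test_container_name "$1")" >/dev/null
}
```

Nothing here touches a database; the container exists only between the two calls.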

Recommended manual integration test:

./script/integration_smoke_test.sh --scope priority

Git

  • config/remote_boot.local.env is ignored by .gitignore
  • commit config/remote_boot.example.env and keep real server-specific values, including MAC addresses, only in config/remote_boot.local.env
  • when a server is added, update REMOTE_BOOT_FARM_TARGETS or REMOTE_BOOT_LAB_TARGETS plus the matching REMOTE_BOOT_MAC_<TARGET> value in config/remote_boot.local.env
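Before enabling the boot service, it can help to confirm that every listed target has a matching MAC variable. A sketch, assuming MAC addresses are written as six colon-separated hex pairs; check_macs is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: verify REMOTE_BOOT_MAC_<TARGET> for every configured target.
# check_macs is hypothetical; the colon-separated MAC format is an assumption.

# check_macs: report each target without a well-formed MAC; fail if any are missing
check_macs() {
  local target var mac missing=0
  local mac_re='^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
  for target in $REMOTE_BOOT_FARM_TARGETS $REMOTE_BOOT_LAB_TARGETS; do
    var="REMOTE_BOOT_MAC_${target}"
    mac="${!var:-}"   # indirect expansion: value of the variable named in $var
    if [[ ! $mac =~ $mac_re ]]; then
      echo "$var is missing or malformed: '${mac}'" >&2
      missing=1
    fi
  done
  return "$missing"
}
```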

Notes

  • wakeonlan must be installed on this desktop.
  • wake_targets.sh reads MAC addresses from REMOTE_BOOT_MAC_<TARGET> variables in config/remote_boot.local.env.
  • By default, LAB* targets use the broadcast address 192.168.1.255 and FARM* targets use 192.168.2.255.
  • remote scripts can use the inventory given in REMOTE_BOOT_ANSIBLE_INVENTORY, or fall back to the default inventory configured in your existing ansible.cfg.
  • host mount checks expect 100.100.100.100:/294t/dcloud/share for LAB and 100.100.100.120:/volume1/share for FARM.
  • boot health checks create a temporary GPU test container directly via Docker and remove it without writing to the DB.
  • test container share mounts can use REMOTE_BOOT_TEST_SHARE_SOURCE_TEMPLATE="/home/tako%s/share/user-share/"; %s is replaced with the server number, so FARM1 and LAB1 both use /home/tako1/share/user-share/.
  • container restart uses docker ps -aq; if a server has no containers, it logs and continues.
  • If the network is not ready at boot, increase REMOTE_BOOT_PRE_DELAY_SECONDS.
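The mount and share-path notes can be expressed as small helpers. This sketch assumes /proc/mounts lists the NFS export (server:/path) in its first field, as Linux does for NFS mounts, that the server number is the trailing digits of the target name, and that the template from the share-mount note is the default when REMOTE_BOOT_TEST_SHARE_SOURCE_TEMPLATE is unset; all function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: helpers mirroring the mount and share-path notes.
# Function names are hypothetical; the expected sources come from the README.

# expected_mount_source TARGET: NFS export the host should have mounted
expected_mount_source() {
  case "$1" in
    LAB*)  echo "100.100.100.100:/294t/dcloud/share" ;;
    FARM*) echo "100.100.100.120:/volume1/share" ;;
    *)     return 1 ;;
  esac
}

# has_mount_source SOURCE [MOUNTS_FILE]: succeed if SOURCE is a mounted device
has_mount_source() {
  awk -v src="$1" '$1 == src { found = 1 } END { exit (found ? 0 : 1) }' "${2:-/proc/mounts}"
}

# server_number TARGET: trailing digits of the name (FARM12 -> 12)
server_number() { printf '%s\n' "${1##*[!0-9]}"; }

# share_source_for_target TARGET: substitute the server number for %s
share_source_for_target() {
  local template="${REMOTE_BOOT_TEST_SHARE_SOURCE_TEMPLATE:-/home/tako%s/share/user-share/}"
  # shellcheck disable=SC2059  # the template is intentionally the format string
  printf "${template}\n" "$(server_number "$1")"
}
```

With the default template, FARM1 and LAB1 both map to /home/tako1/share/user-share/, as the note describes.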

About

A program to run on the management desktop of the Dongguk University GPU server room.
