Move TPP backend to ubuntu VM #46

Closed
bloodearnest opened this issue Dec 15, 2021 · 4 comments
@bloodearnest
Member

bloodearnest commented Dec 15, 2021

We need to move to using a linux VM we can fully manage, rather than the pre-baked VM that comes with Docker for Windows.

Benefits

  • We can allocate more RAM (currently only 248G out of a possible 1024G) and more disk (currently only 500GB) to docker than Docker for Windows allows
  • We can use linux docker in the VM, rather than Docker for Windows, which avoids various issues:
    • Docker volume IO is slow on Docker for Windows
    • Docker for Windows is not officially supported on Windows Server
    • Upcoming licensing changes in Docker for Windows may complicate things (https://www.docker.com/pricing/faq)
  • It is consistent with all our other backends.
  • It provides a cleaner and more flexible administration boundary between TPP and Datalab. TPP would just need to manage the Windows host users, the firewall, and the Hyper-V config, and Datalab can manage everything else with admin access inside the VM.
  • It's an additional security layer in the case of compromised credentials, as there's an additional set of credentials involved (the SSH keys needed to log in to the VM from the Windows host).
  • It provides an opportunity to clean up level 3 access for early access researchers.
  • It enables us to move away from the global level 4 sync currently maintained by TPP, and provide fully audited per-workspace access via the release-hatch service in the VM.

The basic proposal is:

  • Datalab build a Hyper-V image on Windows and give it to TPP to deploy as a base install (see build-image.sh)
  • TPP deploy the image and bump its resources:
    • 512GB RAM to start, increased once Docker for Windows is gone.
    • As much disk as possible, probably as additional disks in the VM that Datalab can map internally as needed (e.g. long term storage, docker storage)
  • TPP to configure firewall
    • allow access from VM to archive.ubuntu.com:80 and security.ubuntu.com:80 so we can keep the VM up to date.
    • allow port 443 access from level 4 to the new ubuntu VM, so that level 4 users can access the release-hatch service on the new VM to facilitate releasing.
  • Test the new VM setup with a new temporary backend.
    • Datalab to set up an SMB fileshare for level 4 files from VM to host
  • Arrange downtime for switch over:
    • Copy current files over
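
Once the firewall rules are in place, something like the following could be run inside the VM to confirm that the two Ubuntu mirrors are reachable. This is only a sketch: the function name `check_endpoints`, the injectable `connect` parameter, and the 5 second timeout are illustrative assumptions, not part of any existing tooling.

```python
import socket

# Endpoints named in the proposal above.
REQUIRED_ENDPOINTS = [
    ("archive.ubuntu.com", 80),
    ("security.ubuntu.com", 80),
]


def check_endpoints(endpoints, connect=socket.create_connection, timeout=5.0):
    """Return a dict mapping (host, port) to True if a TCP connection succeeded.

    `connect` is injectable so the check can be exercised without network
    access; the timeout value is an assumption, not a measured figure.
    """
    results = {}
    for host, port in endpoints:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers refused connections, timeouts and DNS failures.
            results[(host, port)] = False
        else:
            conn.close()
            results[(host, port)] = True
    return results
```

Calling `check_endpoints(REQUIRED_ENDPOINTS)` from the VM returns one boolean per endpoint. A `False` entry suggests the firewall rule (or DNS) still needs attention, although a TCP connect alone does not prove that apt itself will work.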

Implications

File storage is now on the VM.

This gives us the best possible IO, but changes how the files will be accessed.

Note: we will need to copy the current Windows-based level 3 and level 4 directories into the VM as part of the switch over.

Only users with ssh access will be able to log in to view the level 3 files.

Level 4 access

Ideally, release-hatch running on the VM will provide mediated access to the level 4 files for viewing by level 4 researchers. However, if that's not sufficient in the short term, then the VM can expose an SMB fileshare of the level 4 files, which can be used by the TPP-maintained sync script to copy to the TPP Level 4 VM for a while.

Level 3 access

In theory, only the tech team need access to the level 3 files, which they can get via ssh. However, if we need to provide non-ssh access to view level 3 files, then we could possibly expose an SMB fileshare to the Windows host, but would really rather not.

@bloodearnest
Member Author

Update:

We have a VM at 192.168.201.4, currently with ~236G of RAM and 180G of disk.

I've submitted some jobs to the backend, they ran, and we can view the outputs on Level 4 using the browser there to talk to the VM.

Next steps:

  • More tests of real jobs.
  • expose /srv/medium_privacy as an SMB share and mount it on the host, so that TPP's sync scripts can carry on working.
  • Add a self-signed HTTPS certificate provided by TPP for release-hatch.
  • clean up old stuff off E: to make sure we have enough room.

Switching over:

  • Schedule a switch over day, where we:
    • stop docker and current job-runner.
    • bump disk quota on VM
    • copy /e/{high,medium}_privacy to the VM
    • point sync script src at new SMB share
    • switch config on VM's jobrunner to become the TPP jobrunner.
    • gogogo
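
After the copy step, it may be worth checking that the copied trees match the originals before switching the jobrunner config. The sketch below compares two directory trees by SHA-256 digest. The function names are hypothetical, and hashing terabytes of data will be slow, so in practice this might only be run over a sample of workspaces.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(src, dst):
    """Return (missing, mismatched) relative paths, checking dst against src."""
    src_digests = tree_digests(src)
    dst_digests = tree_digests(dst)
    # Files present in the source but absent from the destination.
    missing = sorted(set(src_digests) - set(dst_digests))
    # Files present in both places whose contents differ.
    mismatched = sorted(
        name
        for name in src_digests
        if name in dst_digests and src_digests[name] != dst_digests[name]
    )
    return missing, mismatched
```

For example, `compare_trees(old_dir, new_dir)` would be called once for each of the high and medium privacy directories, with both paths being whatever locations the old and new copies are visible at. Two empty lists mean every source file exists in the destination with identical contents.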

@bloodearnest
Member Author

bloodearnest commented Feb 21, 2022

Related: we need proper TLS for release-hatch: #50

@bloodearnest
Member Author

Update:

Work ongoing:

  • TPP are upgrading D:\ from 900G to 3.2T this week.
  • working through archiving workspaces, slow and tedious.
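
For illustration, archiving a single workspace directory into a tarball could look roughly like this. It is a sketch under assumptions: the thread does not say how the archives are actually produced, so the gzip compression, the `.tar.gz` naming, and the `archive_workspace` helper are all hypothetical.

```python
import tarfile
from pathlib import Path


def archive_workspace(workspace_dir, archive_dir):
    """Write <archive_dir>/<workspace name>.tar.gz and return its path.

    gzip compression and the naming scheme are assumptions for this sketch.
    """
    workspace_dir = Path(workspace_dir)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{workspace_dir.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the tarball relative to the workspace name.
        tar.add(workspace_dir, arcname=workspace_dir.name)
    return archive_path
```

The original workspace directory is left untouched, so removing it after checking the tarball would be a separate, deliberate step.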

Next: plan and schedule the migration.

@bloodearnest
Member Author

bloodearnest commented Mar 24, 2022

Update:

  • D:\ is now 3.4TB and has a Hyper-V drive on it that is mounted at /archive inside the VM
  • 1.2TB of workspace files archived into ~170GB of archived tarballs, 2.2TB of live data
  • Enabled Hyper-V guest services, so files can be copied quickly without going over the network.
  • Testing the above to get an idea of how long the copy would take (wip)
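
A rough way to turn a measured throughput into a downtime estimate is sketched below. The 2.2TB and ~170GB figures come from the update above. The throughput values are placeholders only, since the real copy speed is still being tested, and decimal units (1 TB = 10^12 bytes) are assumed.

```python
TB = 10**12  # decimal units assumed
GB = 10**9
MB = 10**6

LIVE_DATA_BYTES = 2.2 * TB  # live data, from the update above
ARCHIVE_BYTES = 170 * GB    # archived tarballs, from the update above


def estimate_copy_hours(data_bytes, throughput_bytes_per_sec):
    """Return the hours needed to copy data_bytes at a sustained throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_sec / 3600


# Placeholder throughputs in MB/s; replace with the measured figure.
for rate_mb_per_sec in (50, 100, 200):
    hours = estimate_copy_hours(LIVE_DATA_BYTES + ARCHIVE_BYTES, rate_mb_per_sec * MB)
    print(f"{rate_mb_per_sec} MB/s: about {hours:.1f} hours")
```

This assumes sustained throughput and ignores per-file overhead, so many small files would take longer than the estimate suggests.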

We need to schedule a date after Easter for the migration, and communicate to users.

The actual migration plan has been moved to a separate ticket:

#58
