Home
Welcome to the Cluster Builder Manual!
This manual is licensed under the Creative Commons Attribution-ShareAlike CC BY-SA license
The world of clusters can be a very broad subject. Some need computational power for research or business, while others might be interested in this guide as a means to learn parallel coding, to learn administration, or just for fun. I enjoy building clusters for fun, research, and profit. This instructional guide walks you through the process of building a cluster from the ground up, including the basics of administration and management. I am in the process of expanding the notes to cover a variety of possibilities and configurations.
How you build a cluster can vary wildly depending on how much money you are willing to invest. This guide assumes a smaller cluster on a standard Gigabit network.
The frontend should have at least an 80GB hard drive, but the more room you have for your /home partition the better. Because your users will be logging into the frontend to compile, launch jobs, and do work, it really should have several GB of memory and multiple cores. A section on creating a Login Node for users will come later. The frontend should also have two network cards: the first has access to the public network or the internet, while the second has access to the private network that is reserved specifically for the nodes.
The compute nodes should be as beefy as possible. The type of nodes depends completely on the type of work. Large data set jobs may require more memory than processing power, while rendering jobs may require more GPUs than anything else. If you are just doing this for the education, then whatever you have works. My first personal cluster had ancient hardware because I could only afford $100 for 5 compute nodes. My current personal playground setup consists of clearance-sale refurb boxes from Newegg; they cost me very little and are surprisingly powerful. When building a cluster for a job, tailoring the compute nodes to the job is very important but completely dependent on the job type.
In the 'simple' setup, this guide assumes that you will be exporting your /home directory from the frontend via NFS. This is not the only option that you have. It is not uncommon to have a SAN or NAS on which /home is stored. Some build their own while others buy one. This is a more advanced topic that is outside the scope of this guide at this time.
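As a rough illustration of what exporting /home over NFS involves, the sketch below builds a single line in the /etc/exports format that shares /home with the private network. The /24 prefix and the `rw,sync` options are assumptions chosen for the example, not settings this guide prescribes, and the helper name `exports_line` is hypothetical.

```python
import ipaddress


def exports_line(path="/home", frontend_private="10.10.10.10",
                 prefix=24, options=("rw", "sync")):
    """Build one /etc/exports-style line for the private network.

    Assumptions: the private network is a /24 around the frontend's
    private address, and the export options are only an example.
    """
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network(f"{frontend_private}/{prefix}", strict=False)
    return f"{path} {network}({','.join(options)})"


if __name__ == "__main__":
    print(exports_line())
```

The output is only a candidate line; review the options against your own security needs before placing anything in /etc/exports on the frontend.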
This guide assumes the following setup.
Internet <-> Frontend01 <-> nodes
The frontend is known as frontend01.cluster.domain. It has a public side address of 192.168.1.201 and a private side address of 10.10.10.10.
The nodes are known as node01, node02, node03, and node04.
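To make the naming scheme concrete, here is a minimal sketch that prints /etc/hosts-style entries for the private side of this layout. The frontend's private address and the host names come from the setup above; the node addresses (10.10.10.11 through 10.10.10.14) are an assumption made for illustration, since the guide has not assigned them yet, and `hosts_entries` is a hypothetical helper.

```python
import ipaddress

FRONTEND_PRIVATE = ipaddress.ip_address("10.10.10.10")  # from the setup above
NODE_COUNT = 4  # node01 .. node04


def hosts_entries(frontend_ip=FRONTEND_PRIVATE, node_count=NODE_COUNT):
    """Return /etc/hosts-style lines for the frontend and the nodes.

    Assumption: the nodes take the addresses that directly follow the
    frontend's private address. The guide itself does not state this.
    """
    lines = [f"{frontend_ip}\tfrontend01.cluster.domain frontend01"]
    for i in range(1, node_count + 1):
        # Adding an int to an IPv4Address yields the next address.
        lines.append(f"{frontend_ip + i}\tnode{i:02d}")
    return lines


if __name__ == "__main__":
    print("\n".join(hosts_entries()))
```

If your nodes are numbered differently, only the loop body needs to change.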
The example cluster that this guide uses is based on a 64-bit system, but the guide will attempt to keep the commands generic for those with 32-bit systems. Any time the variable $ARCH is shown, substitute it with either i386 or x86_64.
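The $ARCH substitution can also be done mechanically. The sketch below maps the machine type reported by Python's `platform.machine()` onto the two values this guide uses and expands `$ARCH` in a string. The helper names `guide_arch` and `expand` and the package file name are hypothetical, used only to show the substitution.

```python
import platform
from string import Template

# Machine names reported by the OS, mapped onto the guide's two values.
ARCH_BY_MACHINE = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "i386": "i386",
    "i486": "i386",
    "i586": "i386",
    "i686": "i386",
}


def guide_arch(machine=None):
    """Return 'i386' or 'x86_64' for the given (or local) machine type."""
    machine = (machine or platform.machine()).lower()
    if machine not in ARCH_BY_MACHINE:
        raise ValueError(f"architecture not covered by this guide: {machine}")
    return ARCH_BY_MACHINE[machine]


def expand(text, arch):
    """Replace $ARCH in text with the chosen architecture string."""
    return Template(text).substitute(ARCH=arch)


if __name__ == "__main__":
    # Hypothetical file name, used only to demonstrate the substitution.
    print(expand("example-package.$ARCH.rpm", guide_arch("x86_64")))
```

Raising an error for unknown machine types is a deliberate choice here: silently falling back to i386 on, say, an ARM box would produce commands that cannot work.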
If you are building a cluster of any significant size, you will be grabbing the same packages many times. This can be very time consuming over a slow internet connection and places a heavy load on a community repository. In these situations it is often very useful to have a local repository from which you can pull your packages. Here is one way of building a local repository.
[CreateRepo](wiki/CreateRepo_SL) (still in progress).
Start with the installation of the frontend. The choice of operating system is very important. Many prefer Red Hat, CentOS, or Scientific Linux, but there are many good reasons for choosing a Debian-based system as well.
Scientific Linux
Debian
Ubuntu
First login to configure the new installation: On Scientific Linux.
Configure NFS for /home.
Configure a DHCP/TFTP server. (still work in progress)
Configure local repo '(coming soon)'
Kickstart '(coming soon)'
Hardware resource manager Torque '(coming soon)'
User resource manager Maui '(coming soon)'
-or-
Open Grid Scheduler '(coming soon)'
Puppet '(coming soon)'
Modules '(coming soon)'
Add users and push their logins to the nodes
Add a development user for creating packages for the cluster