Tuesday, November 3, 2020

Running Openshift at Home - Part 4/4 Deploying Openshift 4 on Proxmox VE

Part 4/4 - Deploying Openshift/OKD 4.5 on Proxmox VE Homelab

This is the last part of a 4-part series on Running Openshift at Home. Some information here references earlier parts of the series. Please find the links to the previous posts below.

Part 1/4 - Homelab Hardware
Part 2/4 - Building a Silent Server Cabinet
Part 3/4 - Installing a Two-Node Proxmox VE Cluster

The installation process uses OKD 4.5, the upstream community project of Openshift. OKD is to Openshift what Fedora Linux is to Red Hat Enterprise Linux.

Openshift has two types of installation: IPI (installer-provisioned infrastructure), a fully automated deployment on supported cloud providers such as AWS, Azure and GCP; and UPI (user-provisioned infrastructure), a partially automated process, which is what we will cover here.

The installation itself is driven by bootstrapping. A bootstrap machine is a temporary machine used by Openshift/Kubernetes to host the services required during the bootstrap procedure. The bootstrap machine creates an etcd cluster and starts a few Kubernetes services. The master machines then join the etcd cluster through ignition, and the Kubernetes services are transferred from the bootstrap machine to the master nodes as soon as they become ready. In the last step of the bootstrap process, the bootstrap machine is removed from the etcd cluster. At this point, the bootstrap machine can be shut down and deleted forever.

Though the bootstrap process is automatic, the preparation for the installation has to be done manually. Note that 4.6, which was released a couple of days ago, adds support for automated installation on bare-metal infrastructure using IPMI/BMC. We will not cover this here.

Infrastructure

As described in the previous post, my homelab infrastructure looks like this.


The servers are running Proxmox Virtualization Environment, an open-source hypervisor. I also have a physical router and a physical DNS server. We will configure these devices as well for the OKD bootstrap process to work. You will need a good amount of RAM on the host to run the following configuration. In Proxmox, we can over-provision RAM, so even if the total RAM of the VMs is 100GB, the setup should run if you have at least 64GB of RAM available on the host. In my case, the total RAM usage after installing a 5-node Openshift cluster was around 56GB.

Virtual Machines

For a 5-node Openshift/OKD cluster you will need to spin up 6x Fedora CoreOS VMs and 1x CentOS 8 VM, assuming you have a physical router and an external DNS server. Otherwise, you may also run your router and DNS server in a VM, but this will eat up even more RAM on your host.

Start by creating the following VMs in Proxmox VE as detailed in the following sections and take note of their MAC addresses after creation. We will use the table below as a reference for VM creation and DHCP address reservation configuration.

VM Name              | Role                                 | IP Address    | OS             | vCPU | RAM (GB) | Storage (GB)
okd4-bootstrap       | bootstrap                            | 192.168.1.200 | Fedora CoreOS  | 4    | 16       | 120
okd4-control-plane-1 | master                               | 192.168.1.201 | Fedora CoreOS  | 4    | 16       | 120
okd4-control-plane-2 | master                               | 192.168.1.202 | Fedora CoreOS  | 4    | 16       | 120
okd4-control-plane-3 | master                               | 192.168.1.203 | Fedora CoreOS  | 4    | 16       | 120
okd4-compute-1       | worker                               | 192.168.1.204 | Fedora CoreOS  | 4    | 16       | 120
okd4-compute-2       | worker                               | 192.168.1.205 | Fedora CoreOS  | 4    | 16       | 120
okd4-services        | Load Balancer, DNS Server, Web, NFS  | 192.168.1.210 | CentOS 8       | 4    | 4        | 100


Download the OS Images

Download the latest Fedora CoreOS installer from https://getfedora.org/en/coreos/download. Select the Bare Metal & Virtualized tab and download the Bare Metal ISO.


Upload the installer to the local storage of the Proxmox node where you will create the VMs (just in case you have multiple nodes).

Create the VMs

From the Proxmox VE web interface, right-click on the node and select Create VM. Name the VM according to the table above, starting with okd4-bootstrap.


Select the Fedora CoreOS image we uploaded earlier.

 

Leave the System tab at its default values; Proxmox VE pre-selects the optimal settings for the chosen guest OS type. Set the size of the disk according to the table above.

 

Select 4 cores in the CPU tab as per the table above. Leave the rest of the settings at their defaults, unless you know what you are doing.

 

Set the Memory to 16GB (16384 MiB).

If you followed the above instructions correctly, you should see the following values in the confirmation screen. Then just click finish.

After the VM is created, Proxmox will generate a MAC address for the virtual network interface card. Take note of the MAC address. Create a table similar to the one above, but with a MAC Address column. You will need this later.
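If you prefer the command line over the wizard, roughly the same bootstrap VM can be created with qm on the Proxmox host shell. This is only a sketch; the VM ID (200), storage name (local-lvm) and ISO filename are assumptions you should adjust to your environment.

# Sketch: create the bootstrap VM from the Proxmox host shell (adjust ID, storage and ISO name)
qm create 200 --name okd4-bootstrap --memory 16384 --cores 4 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci --scsi0 local-lvm:120 \
  --ide2 local:iso/fedora-coreos-live.x86_64.iso,media=cdrom --ostype l26
# The generated MAC address can be read back without opening the GUI
qm config 200 | grep net0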


Repeat the above procedure for the rest of the VMs. Take note that the last VM in the table, okd4-services, is a CentOS 8 VM. You need to download the CentOS 8 installer ISO and upload it to Proxmox local storage. The latest CentOS 8 release can be downloaded here: http://isoredirect.centos.org/centos/8/isos/x86_64.

Download the one that ends with dvd1.iso.

 

Upload this file to the local storage of Proxmox VE and create the okd4-services VM following the same procedure as above. Note that this VM only gets 4GB of RAM, not 16GB.

You should have the following list of VMs at the end.



DHCP Address Reservation

Using the list of MAC Addresses of the VMs created earlier, we need to assign IP addresses to these MAC Addresses through DHCP Address reservation. Depending on the router, the process may be slightly different.

In my case, I have an ASUS router, and this is what the address reservation looks like: just a table of MAC addresses and their pre-assigned IP addresses.


When the VMs are started, they will get these IP Addresses via DHCP.

If you do not have a physical router or don't want to use your home router, you can run a pfSense router in a VM and put the above VMs behind it. You will then need to configure the same DHCP address reservations there. We will also need to revisit this router configuration after setting up a DNS server.


OKD Services

The okd4-services VM will run several things required to install and run Openshift/OKD.

  1. DNS Server -  If you do not have an external/Raspberry Pi DNS server
  2. HA Proxy - Load Balancer
  3. Apache Web Server (httpd) - to host the OS images and ignition files during PXE booting. This service can be stopped after the installation.
  4. NFS server - to be used as Persistent Volume by Openshift Image Registry

Start the okd4-services VM. Navigate to console.

 

In the Installation Destination option, select Custom, then Done.

Delete the /home partition and leave the desired capacity for / empty.

Select Network and Host Name. Enable the Ethernet adapter, set the hostname, and tick "Automatically connect".

Then click Begin Installation and set the root password.

After installation is complete,  run the following to add the EPEL repository to DNF/yum and update the OS.

sudo dnf install -y epel-release
sudo dnf update -y
sudo systemctl reboot

From this point on, we will do the rest of the installation and configuration from this VM. SSH to this new VM.


Create a DNS Server

The Openshift bootstrap procedure uses FQDNs to address the nodes, so we need to set up a DNS server. I use an old Raspberry Pi as my DNS server, but for simplicity, we will run the DNS server on the okd4-services VM.

SSH to the okd4-services VM; it should have picked up the IP address 192.168.1.210 if the DHCP address reservation was configured correctly.

Install git and clone the git repo https://github.com/rossbrigoli/okd4_files

cd
sudo dnf install -y git
git clone https://github.com/rossbrigoli/okd4_files.git

This repo contains the configuration files we need.

The default domain name and cluster name for this installation are okd.home.lab, where okd is the cluster name and home.lab is the base domain. I created a script to help you update the configuration files if you want to change the domain name and cluster name. The following command, for example, updates the configuration to mycloud.mydomain.com.

cd okd4_files
./setdomain.sh mycloud mydomain.com

Then we need to install named as our DNS server, copy over the configuration files, and open firewall port 53/UDP for DNS.

#DNS Server
cd
sudo dnf -y install bind bind-utils
sudo cp okd4_files/named.conf /etc/named.conf
sudo cp okd4_files/named.conf.local /etc/named/
sudo mkdir /etc/named/zones
sudo cp okd4_files/db* /etc/named/zones
sudo systemctl enable named
sudo systemctl start named
sudo systemctl status named
sudo firewall-cmd --permanent --add-port=53/udp
sudo firewall-cmd --reload
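Before moving on, it is worth checking that the zone actually resolves. Assuming you kept the default okd.home.lab domain, a quick sanity check from the okd4-services VM could look like the commands below (dig comes with bind-utils, which we just installed). The record names follow the standard OKD UPI DNS layout (api, api-int, a *.apps wildcard, and one record per node), so adjust them if the zone files in the repo differ.

# Query the local named instance directly; these should return the IPs from the table above
dig +short @127.0.0.1 api.okd.home.lab
dig +short @127.0.0.1 okd4-control-plane-1.okd.home.lab
dig +short @127.0.0.1 console-openshift-console.apps.okd.home.lab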

Now we need to go back to our router/DHCP server and set this as the DNS server handed out by the router, so that all devices connected to the router use it.


Create a Load Balancer

Install HAProxy, copy the HAProxy configuration file to /etc/haproxy, and start the HAProxy service.

cd
sudo dnf install -y haproxy
sudo cp okd4_files/haproxy.cfg /etc/haproxy/haproxy.cfg
sudo setsebool -P haproxy_connect_any 1
sudo systemctl enable haproxy
sudo systemctl start haproxy
sudo systemctl status haproxy

Open TCP ports for Openshift/etcd clustering.

sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=22623/tcp
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
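At this point HAProxy should be listening on the load balancer ports even though none of the backends are up yet. A quick way to confirm:

# HAProxy should show listeners on 6443 (API), 22623 (machine config), 80 and 443 (ingress)
sudo ss -tlnp | grep haproxy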


Serve the Installation Files Over HTTP

Now we need to install the Apache web server (httpd). We will use it to host the files needed to boot the nodes over the network, on HTTP port 8080.

sudo dnf install -y httpd
sudo sed -i 's/Listen 80/Listen 8080/' /etc/httpd/conf/httpd.conf
sudo setsebool -P httpd_read_user_content 1
sudo systemctl enable httpd
sudo systemctl start httpd
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --reload
curl localhost:8080


A quick recap of what we just did:

  1. We created the VMs we need for the cluster
  2. We configured a router/DHCP server
  3. We installed CentOS 8 on the okd4-services VM
  4. We configured a DNS server (named) on the okd4-services VM
  5. We created a load balancer using HAProxy running on the okd4-services VM
  6. We set up an Apache web server to host installation files on the okd4-services VM


Installing OKD

Now that the infrastructure is ready, it's time to install OKD, the open-source upstream project of Openshift. The latest OKD release can be found at https://github.com/openshift/okd/releases. Update the links below accordingly to get the latest version.

Download the Openshift installer and the oc client, extract the downloaded files, and move the extracted binaries to /usr/local/bin. SSH to the okd4-services VM and execute the following.

cd
wget https://github.com/openshift/okd/releases/download/4.5.0-0.okd-2020-10-15-235428/openshift-client-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz
wget https://github.com/openshift/okd/releases/download/4.5.0-0.okd-2020-10-15-235428/openshift-install-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz

#Extract the okd version of the oc client and openshift-install:
tar -zxvf openshift-client-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz
tar -zxvf openshift-install-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz

#Move the kubectl, oc, and openshift-install to /usr/local/bin and show the version:
sudo mv kubectl oc openshift-install /usr/local/bin/
#Test oc client and openshift-install command
oc version
openshift-install version

You may also check the status of the release builds at https://origin-release.apps.ci.l2s4.p1.openshiftapps.com/.


Setup the Openshift Installer

If you haven't done so, create an SSH key without a passphrase. We will provide the SSH key in install-config.yaml so that we can log in to the nodes without password prompts.

#Generate an SSH key if you do not already have one.
ssh-keygen

Your SSH public key is usually located at ~/.ssh/id_rsa.pub.

Create an installation directory and copy the installer config file into it. We will use this directory to hold the files generated by the openshift-install command.

cd
mkdir install_dir
cp okd4_files/install-config.yaml ./install_dir

Get a Red Hat pull secret by logging in to https://cloud.redhat.com. Navigate to Cluster Manager > Create Cluster > Red Hat Openshift Container Platform > Run on Baremetal > User-Provisioned Infrastructure > Copy pull secret.

Edit install_dir/install-config.yaml. Replace the value of the pullSecret field with your pull secret, or leave it as is if you don't have a Red Hat pull secret. Then replace the value of the sshKey field with your SSH public key.

The last two lines of your install-config.yaml file should look roughly like the example below.
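(The values shown here are placeholders, not working credentials — paste in your own pull secret and SSH public key.)

pullSecret: '{"auths":{"fake":{"auth":"aWQ6cGFzcwo="}}}'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2E...your-public-key... core@okd4-services'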


Generate the Ignition Files

Run the installer to generate the ignition files, then host them with httpd.

cd
openshift-install create manifests --dir=install_dir/
# This disables scheduling of application pods on the master nodes
sed -i 's/mastersSchedulable: true/mastersSchedulable: False/' install_dir/manifests/cluster-scheduler-02-config.yml
openshift-install create ignition-configs --dir=install_dir/

sudo rm -rf /var/www/html/okd4
sudo mkdir /var/www/html/okd4

sudo cp -R install_dir/* /var/www/html/okd4/
sudo chown -R apache: /var/www/html/
sudo chmod -R 755 /var/www/html/
curl localhost:8080/okd4/metadata.json

Download the Fedora CoreOS image and signature, rename them with shorter names, and host them in httpd under the okd4 directory.

cd /var/www/html/okd4/
sudo wget https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/32.20201004.3.0/x86_64/fedora-coreos-32.20201004.3.0-metal.x86_64.raw.xz
sudo wget https://builds.coreos.fedoraproject.org/prod/streams/stable/builds/32.20201004.3.0/x86_64/fedora-coreos-32.20201004.3.0-metal.x86_64.raw.xz.sig
sudo mv fedora-coreos-32.20201004.3.0-metal.x86_64.raw.xz fcos.raw.xz
sudo mv fedora-coreos-32.20201004.3.0-metal.x86_64.raw.xz.sig fcos.raw.xz.sig
sudo chown -R apache: /var/www/html/
sudo chmod -R 755 /var/www/html/

The latest Fedora CoreOS release is available at https://getfedora.org/coreos/download?tab=cloud_launchable&stream=stable. Update the wget links accordingly to get the latest versions.
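Before booting any VMs, it is a good idea to confirm that the image and ignition files are reachable over the network, using the okd4-services IP from the table above:

# Both requests should return HTTP 200 from the httpd instance on okd4-services
curl -I http://192.168.1.210:8080/okd4/fcos.raw.xz
curl -I http://192.168.1.210:8080/okd4/bootstrap.ign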


Starting the VMs

Now that the ignition files are generated, it's time to start the VMs. Select the okd4-bootstrap VM, navigate to Console and start the VM. When you see the Fedora CoreOS startup screen, press TAB on the keyboard to edit the kernel boot parameters. In the command line at the bottom of the screen, append the following arguments. This will install Fedora CoreOS to the /dev/sda disk over the network, using the image file and the ignition file we hosted over HTTP in the earlier steps.

Bootstrap Node

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.1.210:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.1.210:8080/okd4/bootstrap.ign

In the console screen, it should look like this.

This step is painful if you make typos, so review what you typed before pressing Enter. You also cannot copy-paste the arguments because Proxmox VE has no way of forwarding the clipboard to the VNC console session.

Master Nodes

Repeat the above step for all the other VMs, starting with the master nodes, okd4-control-plane-X. For the master nodes, replace the ignition file name in the last argument with master.ign.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.1.210:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.1.210:8080/okd4/master.ign

Worker Nodes

Repeat the above steps for worker nodes.

coreos.inst.install_dev=/dev/sda
coreos.inst.image_url=http://192.168.1.210:8080/okd4/fcos.raw.xz
coreos.inst.ignition_url=http://192.168.1.210:8080/okd4/worker.ign


Bootstrap Progress

You can monitor the installation progress by running the following command.

cd
openshift-install --dir=install_dir/ wait-for bootstrap-complete --log-level=info

Once the bootstrap process completes, you should see the following messages.


Removing the Bootstrap Node

You can now shut down/stop the okd4-bootstrap VM. Then we need to remove or comment out the bootstrap node from the load balancer so that API requests no longer get routed to the bootstrap IP.

Edit the /etc/haproxy/haproxy.cfg file and reload the HAProxy configuration.

sudo sed -i '/ okd4-bootstrap /s/^/#/' /etc/haproxy/haproxy.cfg
sudo systemctl reload haproxy


Approving Certificate Signing Requests

We need to interact with the API in order to approve CSRs and check the status/readiness of the cluster operators. Under install_dir/auth there is a generated kubeconfig file. We will use this to access the API.

#login to the cluster
export KUBECONFIG=~/install_dir/auth/kubeconfig
oc whoami
oc get nodes
oc get csr

You should see only the master nodes when you run oc get nodes. This is because the worker nodes are still waiting for their CSRs to be approved.

To quickly approve all CSRs at once instead of doing it one by one, we will install jq and use it to feed oc adm certificate approve.

wget -O jq https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64
chmod +x jq
sudo mv jq /usr/local/bin/
jq --version
oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | \
xargs oc adm certificate approve

Approve the pending CSRs. You may need to run this multiple times. Regularly run oc get csr; if you see a Pending CSR, run the command below again.

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | \
xargs oc adm certificate approve
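If you prefer not to babysit this, a small loop can keep approving pending CSRs for a while as the workers register — a minimal sketch, assuming oc and jq are on the PATH and KUBECONFIG is still exported:

# Approve any pending CSRs once a minute for roughly 15 minutes
for i in $(seq 1 15); do
  oc get csr -ojson | jq -r '.items[] | select(.status == {}) | .metadata.name' | \
    xargs --no-run-if-empty oc adm certificate approve
  sleep 60
done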


Check the Status of the Cluster Operators

You can check the status of the cluster operators with the following command.

oc get clusteroperators

You should see an output like this.

Wait for the console to become available. Once it is, point a browser to https://console-openshift-console.apps.clustername.domain.name (with the defaults in this guide, that is https://console-openshift-console.apps.okd.home.lab).
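Alternatively, you can let the installer do the waiting; it exits once the installation finishes and prints the console URL along with the kubeadmin credentials:

cd
openshift-install --dir=install_dir/ wait-for install-complete --log-level=info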

You will get an SSL error because the certificate is not valid for this domain. That's normal. Just bypass the SSL error.


You will have to bypass the SSL error twice because both the web console and the OAuth domains have an invalid certificate. After ignoring the SSL errors for both the web console and OAuth, you should see a login screen.



Login with the user "kubeadmin". You can find the kubeadmin password in a file generated during the installation.

cat install_dir/auth/kubeadmin-password

Et voila!



Create a Cluster Admin User


The kubeadmin user we used in the previous step is temporary. We need to create a permanent cluster administrator user. The fastest way to do this is by using htpasswd as an authentication provider. We will create a secret under the openshift-config namespace and add an htpasswd provider to the cluster OAuth.

cd
cd okd4_files
htpasswd -c -B -b users.htpasswd testuser testpassword
oc create secret generic htpass-secret --from-file=htpasswd=users.htpasswd -n \
openshift-config
oc apply -f htpasswd_provider.yaml
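For reference, htpasswd_provider.yaml applies an OAuth custom resource along these lines. This is the standard htpasswd identity provider shape rather than a copy of the repo's file, so treat the file in the repo as authoritative:

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret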

The next time you login to the web console, you should have an option to choose an authentication provider and there should be an htpasswd option like below.


Log in with the credentials testuser/testpassword. After authentication succeeds, we need to give this new user the cluster-admin role. Note that the command below does not work until the user has logged in for the first time.

oc adm policy add-cluster-role-to-user cluster-admin testuser

Now that we have a proper cluster admin user, we can delete the temporary kubeadmin user.

oc delete secrets kubeadmin -n kube-system

You should not see the kube:admin login option the next time you log in.


Setup Image Registry


To complete the setup, we need to create persistent storage that will be exposed as a persistent volume for the Openshift image registry.

On the okd4-services VM, run the following to install the NFS server, start it, and open up some ports in the firewall.

#Setting up NFS server
sudo dnf install -y nfs-utils
sudo systemctl enable nfs-server rpcbind
sudo systemctl start nfs-server rpcbind
sudo mkdir -p /var/nfsshare/registry
sudo chmod -R 777 /var/nfsshare
sudo chown -R nobody:nobody /var/nfsshare

echo '/var/nfsshare 192.168.1.0/24(rw,sync,no_root_squash,no_all_squash,no_wdelay)' | \
sudo tee /etc/exports

sudo setsebool -P nfs_export_all_rw 1
sudo systemctl restart nfs-server
sudo firewall-cmd --permanent --zone=public --add-service mountd
sudo firewall-cmd --permanent --zone=public --add-service rpc-bind
sudo firewall-cmd --permanent --zone=public --add-service nfs
sudo firewall-cmd --reload
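A quick way to confirm the export is visible on the network (showmount ships with nfs-utils):

# Should list /var/nfsshare exported to 192.168.1.0/24
showmount -e 192.168.1.210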

Create a persistent volume in Openshift.

cd
oc create -f okd4_files/registry_pv.yaml
oc get pv

You should see an unclaimed PV.
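For reference, registry_pv.yaml defines an NFS-backed volume roughly like the sketch below. This mirrors the usual 100Gi NFS PV from community OKD guides and points at the export we just created; the file in the repo is the source of truth.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 192.168.1.210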


Now change the managementState to Managed and set the claim to blank.

oc edit configs.imageregistry.operator.openshift.io

Change these fields under spec as shown below and save it.

  managementState: Managed
  storage:
    pvc:
      claim:
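If you prefer a non-interactive route, the same change can be made with oc patch; this should be equivalent to the edit above:

oc patch configs.imageregistry.operator.openshift.io cluster --type merge \
  --patch '{"spec":{"managementState":"Managed","storage":{"pvc":{"claim":""}}}}'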

If you run oc get pv again, you should see that the PV has been claimed by the image-registry-storage PVC.


Now you have an image registry backed by persistent storage via NFS.


Deploying our First Application

 
Now let's deploy a demo Java application called Spring Pet Clinic. The source code is available in the GitHub repo https://github.com/spring-projects/spring-petclinic. We will use Openshift's S2I (Source-to-Image) feature to build and deploy a git repository into a running application.

First, create a project/namespace, then create an openshift app from the git source repo.

# Deploy Spring Pet Clinic
oc new-project spring-petclinic
oc new-app registry.redhat.io/ubi8/openjdk-11~https://github.com/spring-projects/spring-petclinic.git
oc expose svc/spring-petclinic
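While the build runs, you can follow it from the CLI instead of the console, and grab the route once it is exposed:

# Stream the S2I build logs and show the route created by oc expose
oc logs -f bc/spring-petclinic
oc get route spring-petclinic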

Open the web console and select the spring-petclinic project. Click the Administrator dropdown and switch to the Developer view. In the Topology tab, you should see an application called spring-petclinic as shown below.


In the screenshot above, you can see that the application is being built. This may take a while depending on how fast your internet connection is. This is a Java Spring application, which is known for having hundreds, maybe thousands, of dependencies, and these are downloaded from Maven Central during the build process.

Once the build and deployment are successful, the application icon will turn blue, and you should be able to navigate to the route URL.



We have just deployed our first application to our newly provisioned Openshift cluster running on Proxmox VE.

This concludes the "Running Openshift at Home" series. Check out the links below for the earlier parts.


3 comments:

  1. You actually make it look so easy with your performance but I find this matter to be actually something which I think I would never comprehend. It seems too complicated and extremely broad for me. I'm looking forward for your next post, I’ll try to get the hang of it! Applications for Qualification

  2. Thanks for the great guide. Just got my new home server and used this guide to setup my own Openshift cluster so now I can bring my work home with me!!!
    Just a couple of corrections for you.
    1) You're missing the 'sudo dnf install haproxy' command, no biggy as this is fairly obvious
    But the one that had me scratching my head was
    2) You have the console url as https://console-openshift-console.clustername.domain.name and NOT https://console-openshift-console.apps.clustername.domain.name. I spent an hour or so scratching my head checking the routes and services, making sure all the pods in openshift-console and openshift-authentication were happy before I noticed the missing apps in the url.
    But now it's all working I just wanted to say again. Thanks for the great guide

  3. Thanks a ton Ross.. Got my 4.11.0-0.okd-2022-12-02-145640 cluster running with 3 control-plane and 3 worker nodes running on Proxmox VE 7.3 primarily based on the guidance you provided here. I'm using VyOS 1.4 nightly as the DNS server (it supports being a full fledged DNS server using powerDNS and I've already got automated deployment / config backup scripts so deploying it in an automated Infra as Code way with the required records in a git tracked config file is super useful) and an LXC container with HAProxy as the services node.

    Was a fun project.

