Nutanix: Error – Foundation service running on one of the nodes

In the past I ran into two different scenarios where a running Foundation process caused errors. In the first, I could not run LCM; in the second, I could not upgrade Foundation via Prism.

Under normal operation, the foundation service is stopped on all cluster nodes. Only when you destroy a cluster is the foundation service started, and it then keeps running until you create a new cluster or add the nodes to an existing cluster.

As far as I know, the only other component within AOS that uses the foundation service is LCM, which leverages it for certain hardware update tasks such as a BIOS update. This is also the most common reason why a foundation process is started or still running in a “normal” cluster: a previous LCM action that failed.

To check whether and where Foundation is running, ssh into one of the CVMs and run the following command:

allssh 'genesis status | grep foundation'

In my output, it was running on the CVM with the .24 IP address. The process IDs in the brackets indicate that the process is up and running.
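On a larger cluster it can help to reduce that output to just the affected CVMs. The following is a minimal sketch, not an official Nutanix tool: the helper name foundation_running_nodes is hypothetical, and it assumes that allssh prints a header line of the form "===== <CVM IP> =====" before each node's output and that a stopped service shows up as "foundation: []". Check both assumptions against your own output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the IP of every CVM where foundation
# lists at least one process ID.
#
# Assumed input format (an assumption, verify on your cluster):
#   ================== 10.0.0.24 =================
#     foundation: [1234, 1235]
# A stopped service is assumed to appear as "foundation: []".
foundation_running_nodes() {
    awk '
        # Header line written per node: remember the current CVM IP.
        /^=+ / { ip = $2; next }
        # A digit right after the opening bracket means PIDs are listed.
        /foundation: \[[0-9]/ { print ip }
    '
}
```

Assuming the format above, it would be used by piping the check into it, for example: allssh 'genesis status | grep foundation' | foundation_running_nodes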

To stop the foundation process, ssh to the affected CVM and run:

genesis stop foundation

The output shows directly that the service is now stopped.

Now run the LCM or Foundation upgrade again, and the pre-checks should succeed.

NOTE: Please use the commands above only if you know what you are doing, and at your own risk. If you are uncertain, I strongly recommend involving Nutanix Support.
