Removing Inaccessible Objects in vSAN
During the recent Chinese New Year break, I decided to get a proper rack for my lab equipment, moving it from my trusty work desk to an 'IKEA' shelf. Following best practice, I planned to put my hosts into Maintenance Mode and shut vSAN down gracefully.
Long story short, I put one vSAN node into Maintenance Mode and went to unplug its power. While doing so, I accidentally tugged on the power cables of the two remaining vSAN nodes. The gist of it: the entire cluster went down, and not gracefully as I had intended.
When the cluster came back up, I noticed that I had a bunch of inaccessible objects. I tried "Repair objects immediately", but it didn't help.
18 Objects Inaccessible
There are a few reasons objects can become inaccessible. One is orphaned objects, such as stale .vswp files; another is objects that are currently suffering more failures than they were configured to tolerate.
So here are a few steps we can take to remediate this and to figure out what those objects are.
Step 1. SSH into the VCSA and log in as root, then start RVC from the VCSA console. If you are not using the default vsphere.local domain, try: user@domain.com@localhost [Note: thanks to Michael Garito for the tip!]
Log in as root, and then RVC into the VCSA
Step 2. Change directories and get into the Cluster folder. There you will see your vSAN Cluster name (optional). In this case, my cluster name is "VSAN".
Locate the name of your vSAN cluster by traversing the folders
Step 3. Run vsan.check_state -r cluster-name. This checks the state of the cluster and tries to refresh the objects in it. It will then list all the inaccessible objects.
List inaccessible objects in the cluster
Step 4. We can now try to purge the swap objects. Vswp objects get regenerated when the VM is powered on, so it is totally okay to delete them. Run vsan.purge_inaccessible_vswp_objects cluster-name. From the output below, I have 18 inaccessible objects, of which 2 could be vswp.
No vswp objects found, but there are 2 objects that could be vswp and not associated to any objects
Step 5. We still have 16 objects that are unclassified. Run vsan.check_state -r cluster-name again, and note down the UUID of the remaining inaccessible objects.
Run vsan.cmmds_find -u UUID cluster-name. The details will show you where the object is hosted and what kind of object it is.
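With 16 objects to look up, copying UUIDs by hand gets tedious. Here is a minimal sketch of a helper that pulls UUID-shaped tokens out of pasted text and prints one vsan.cmmds_find line per object. It assumes the UUIDs use the usual 8-4-4-4-12 hex layout; the function names and the sample text are made up for illustration, and the sample is not real vsan.check_state output.

```python
import re

# Assumption: object UUIDs use the usual 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(r"\b[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}\b")


def extract_uuids(text):
    """Return unique UUID-shaped tokens from text, in order of first appearance."""
    seen = []
    for match in UUID_RE.findall(text):
        uuid = match.lower()
        if uuid not in seen:
            seen.append(uuid)
    return seen


def cmmds_find_commands(text, cluster="cluster-name"):
    """Build one 'vsan.cmmds_find -u UUID cluster' line per UUID found in text."""
    return [f"vsan.cmmds_find -u {uuid} {cluster}" for uuid in extract_uuids(text)]


# Hypothetical pasted text; NOT real vsan.check_state output.
SAMPLE = """
inaccessible object 11111111-2222-3333-4444-555555555555
inaccessible object aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
inaccessible object 11111111-2222-3333-4444-555555555555
"""

if __name__ == "__main__":
    # Print the commands so they can be pasted into the RVC session.
    for line in cmmds_find_commands(SAMPLE, cluster="VSAN"):
        print(line)
```

The helper only prints commands to paste into RVC. It does not talk to vCenter itself.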
For my scenario, I know these objects can be deleted, because I made some funky policy changes on the cluster while the third node was in Maintenance Mode, possibly causing some orphaned objects. Not to mention, all my VMs are running fine. I will go ahead and delete these items in the next step.
Details of the particular object, owner and type
Step 6. To delete, you will need to SSH into the respective owner nodes. Delete the objects using the UUID that we noted earlier. Run /usr/lib/vmware/osfs/bin/objtool delete -u UUID -f -v 10
Deleting inaccessible objects from the node
Step 7. Repeat the process for each of the objects.
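Since the same objtool command has to be repeated for every object, here is a rough sketch that turns a list of UUIDs, grouped by owner node, into the delete commands to run on each node. The host names, UUIDs and function name below are hypothetical placeholders. The command line itself is the one from Step 6.

```python
import re

# Path and flags taken from Step 6 of this post.
OBJTOOL = "/usr/lib/vmware/osfs/bin/objtool"

# Assumption: object UUIDs use the usual 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def delete_commands(uuids_by_owner):
    """Map each owner node to the objtool delete commands to run on it."""
    plan = {}
    for owner, uuids in uuids_by_owner.items():
        commands = []
        for uuid in uuids:
            # Refuse anything that is not UUID-shaped, so a bad paste never
            # ends up inside a delete command.
            if not UUID_RE.fullmatch(uuid):
                raise ValueError(f"not a UUID: {uuid!r}")
            commands.append(f"{OBJTOOL} delete -u {uuid} -f -v 10")
        plan[owner] = commands
    return plan


# Hypothetical owner nodes and UUIDs, for illustration only.
EXAMPLE = {
    "esxi-01": ["11111111-2222-3333-4444-555555555555"],
    "esxi-02": ["aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"],
}

if __name__ == "__main__":
    for owner, commands in delete_commands(EXAMPLE).items():
        print(f"# run on {owner}")
        for command in commands:
            print(command)
```

The sketch deliberately prints the commands instead of running them. Deleting objects is irreversible, so it is worth reading each line before pasting it into the SSH session on the owner node.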
Hopefully the above steps will help your cluster get back into shape.