Failure description
1. Several of the customer's host systems reported storage disk errors at the same time,
so a fault on the storage array was initially suspected.
2. Engineers arrived on site and found errors on an EMC CX3-80 storage array: some LUNs were reported as Bound Unassigned.
The affected LUNs were spread across different RAID groups, and the array reported no hard disk failures.
3. Further checks showed that the affected LUNs no longer had an owner (Current Owner was N/A) and were offline. In addition, the write cache on both controllers was disabled.
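The symptoms in items 2 and 3 can be expressed as a simple filter over a LUN inventory. The sketch below is illustrative only: the record fields, the sample values, and the function name find_suspect_luns are assumptions made for this example, not the output or API of any EMC tool.

```python
from dataclasses import dataclass


@dataclass
class LunRecord:
    alu: int            # LUN ID as normally shown to the administrator
    raid_group: int     # RAID group the LUN belongs to
    state: str          # e.g. "Bound" or "Bound Unassigned"
    current_owner: str  # "SP A", "SP B", or "N/A"


def find_suspect_luns(luns):
    """Return LUNs that are 'Bound Unassigned' or have no current owner."""
    return [
        lun for lun in luns
        if lun.state == "Bound Unassigned" or lun.current_owner == "N/A"
    ]


# Hypothetical inventory, made up for illustration.
inventory = [
    LunRecord(alu=10, raid_group=0, state="Bound", current_owner="SP A"),
    LunRecord(alu=11, raid_group=1, state="Bound Unassigned", current_owner="N/A"),
    LunRecord(alu=12, raid_group=2, state="Bound Unassigned", current_owner="N/A"),
]

suspects = find_suspect_luns(inventory)
affected_groups = sorted({lun.raid_group for lun in suspects})

# Several RAID groups affected with no disk faults points away from a
# single failed drive and towards a controller- or cache-level problem.
spans_multiple_groups = len(affected_groups) > 1
print([lun.alu for lun in suspects], affected_groups, spans_multiple_groups)
```

With the sample data, LUNs 11 and 12 are flagged and they sit in two different RAID groups, which mirrors the observation that the fault was not tied to one set of disks.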
Failure Analysis
SPcollect data was gathered and the TRiiAGE_Analysis.html file was analyzed. A BugCheck caused SPA to restart abnormally, and while SPA was restarting,
SPB also malfunctioned. This failure scenario is similar to a power-down scenario: the write cache still held data that had not been written to disk, which is the dirty cache problem.
Troubleshooting
1. Communicate with the customer before the operation. Clearing the dirty cache of a LUN discards the data
of that LUN still held in the cache, which may cause database problems on the affected systems.
2. Find a host that can connect to the storage, install the EMC remote software, and log in to the underlying Windows system
of both controllers to clear the dirty cache.
3. In the underlying Windows system of each controller, view the list of LUNs that have dirty cache.
4. Check the LUN numbers in the TRiiAGE_Analysis.html file: ALU is the LUN ID normally seen, and FLU is the system-internal ID. Once the check is complete,
clear the dirty cache, performing the operation on each of the two controllers separately.
After all clearing was completed, Navisphere showed that the LUNs were online, the storage status was normal,
and the disk status on the hosts was normal.
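Because the dirty cache has to be cleared for every affected LUN on both controllers, a small checklist can help avoid missing one. The following is a bookkeeping sketch only and does not communicate with the array. The ALU and FLU values, the helper names, and the assumption that the low-level operation is tracked by FLU are all hypothetical.

```python
# Hypothetical ALU -> FLU mapping, as it might be copied by hand from the
# analysis report. These numbers are invented for illustration.
alu_to_flu = {11: 4107, 12: 4108}

CONTROLLERS = ("SPA", "SPB")


def new_checklist(mapping):
    """Create one pending entry per (FLU, controller) pair."""
    return {(flu, sp): False for flu in mapping.values() for sp in CONTROLLERS}


def mark_cleared(checklist, flu, sp):
    """Record that the dirty cache for this FLU was cleared on this controller."""
    if (flu, sp) not in checklist:
        raise KeyError(f"unknown FLU/controller pair: {flu}, {sp}")
    checklist[(flu, sp)] = True


def pending(checklist):
    """Return the (FLU, controller) pairs that still need clearing."""
    return sorted(key for key, done in checklist.items() if not done)


checklist = new_checklist(alu_to_flu)
mark_cleared(checklist, 4107, "SPA")
mark_cleared(checklist, 4107, "SPB")
mark_cleared(checklist, 4108, "SPA")
print(pending(checklist))  # [(4108, 'SPB')]
```

An empty result from pending() would indicate that every listed LUN has been handled on both controllers, at which point the online status can be verified in Navisphere.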
Lessons learned
1. When you encounter a dirty cache problem on storage, consult the relevant documentation and operate with caution.
2. Before clearing the dirty cache of a LUN, be sure to explain to the customer that the operation may cause the loss of that LUN's cached data,
so that the operation can proceed safely and smoothly.
For more information, please visit Antute's official website: 54z9.pearltele.com