How to check if the datastore is corrupted?
In a cluster:
The procedure below only checks the datastore of the member where it is executed. It must be executed on every member, including the coordinator. The members can be checked in parallel or one after another.
If multi generational datastore is disabled: the procedure below requires stopping the services of the member where it is executed.
If multi generational datastore is enabled: the procedure below checks the entire datastore (all partitions) of the current member only if all the services are stopped. Otherwise, it checks only a part of the datastore: the "read-only partitions".
Procedure to check if the datastore of one member is corrupted:
1- If the TKU is older than Jul22, install tw_support_tool
2- Open an ssh session with the linux account "tideway" and execute the command below:
screen -S mySession
3- Execute the command below:
tw_support_tool --check-db-corruptions --restart-services
It will stop the services (which implies downtime on a standalone appliance or in a non-FT cluster), check for db corruptions, and restart the services afterwards.
Note:
- If tw_support_tool looks blocked/stuck, use the article below:
Discovery: tw_support_tool looks stuck
- This procedure will collect your customization, IP addresses, hostnames and user names (PII).
Alternative method 1: stop the services of the current member yourself and execute the command
tw_support_tool --check-db-corruptions
Alternative method 2: if you can't stop the services, execute the command
tw_support_tool --check-db-corruptions
It will list the checks that can be done (if any) and will tell you how to check more. It won't be able to check anything if the services are up and multi generational datastore is disabled.
4- Wait for the step above to finish. This can take a long time, sometimes more than 12 hours with a large datastore. The duration can't be predicted: it depends on many parameters, including the performance of the appliance, the datastore volume, and the performance of the storage.
5- When BMC Customer Support asks for the execution of this procedure, attach the file /tmp/SendThisFileToBMCSupport-<hostname>.tgz to your support case.
6- Review the section "db_verify" of /tmp/tw_support_tool.latest:
sed -n "/db_verify/,/---/p" /tmp/tw_support_tool.latest
This file is also included in /tmp/SendThisFileToBMCSupport-<hostname>.tgz
If no corruptions were found, the db_verify section will look like this:
--------------------------- db_verify
[...]
1334 checkable db files
They were all checked
Unexpected messages (10 first ones): <= unexpected messages do not mean "db corruption"
Warning ABC
ERROR 123
If a corruption was found, it will look like this:
--------------------------- db_verify
[...]
1 db corruptions detected
1 index file(s) corrupted
0 history file(s) corrupted
0 state file(s) corrupted
0 rels file(s) corrupted
Errors found (10 first ones):
db_verify: p0003_rInference_pidx: BDB0090 DB_VERIFY_BAD: Database verification failed
What if ...?
- ... some corruptions are reported
Review the articles below:
- ... the section "db_verify" contains the messages below
db_verify: BDB3018 unknown: unwritable page 51406 remaining in the cache after error 28 (error 28 is ENOSPC)
or
db_verify: BDB0137 write: 0x246c2f0, 1024: No space left on device
db_verify: BDB3015 unknown: write failed for page 1428795
db_verify: BDB3018 unknown: unwritable page 1428795 remaining in the cache after error 28
This means that the /usr filesystem filled up during the execution of tw_support_tool. If you have to delete cores (if any) or log files to free space, back them up first. Re-execute the procedure with more free disk space.
- ... tw_support_tool was interrupted by a timeout (12h by default)
The message will also explain how to increase the timeout.
- ... you want to interrupt the script
Use CTRL-C. If you can no longer see what you type afterwards, type "stty echo" and press Enter. If db_verify processes are still running 1h later, reboot.
- ... the ssh session was disconnected while tw_support_tool was running
Open a new ssh session on the host and execute "screen -r mySession". If the session can't be found, the tool has finished and the result is in /tmp/SendThisFileToBMCSupport-<hostname>.tgz
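The review in step 6 and the disk-space case above can be sketched as a small shell helper. This is only an illustration, not part of tw_support_tool: the function name classify_db_verify is hypothetical, and the sketch assumes that the report contains the marker strings exactly as they are quoted in this article ("db corruptions detected", "They were all checked", "No space left on device", "after error 28"). Always confirm the result by reading the db_verify section yourself.

```shell
#!/bin/sh
# Hypothetical helper: classify the db_verify section of a tw_support_tool report.
# Assumption: the marker strings match the examples quoted in this article.

classify_db_verify() {
    report="${1:-/tmp/tw_support_tool.latest}"
    [ -r "$report" ] || { echo "missing"; return 2; }

    # Same extraction as in step 6 of the procedure.
    section=$(sed -n "/db_verify/,/---/p" "$report")

    # A full filesystem invalidates the run, so test for it first.
    if printf '%s\n' "$section" | grep -Eq 'No space left on device|after error 28'; then
        echo "disk-full"; return 3
    fi

    # A non-zero count in front of "db corruptions detected" means corruption.
    if printf '%s\n' "$section" | grep -Eq '(^|[^0-9])[1-9][0-9]* db corruptions detected'; then
        echo "corrupted"; return 1
    fi

    # All checkable files were verified and no corruption was counted.
    if printf '%s\n' "$section" | grep -q 'They were all checked'; then
        echo "clean"; return 0
    fi

    # Anything else (partial check, timeout, interruption) needs a manual review.
    echo "inconclusive"; return 4
}
```

Once the function is defined in the current shell, calling classify_db_verify with no argument reads /tmp/tw_support_tool.latest. A result of "inconclusive" is deliberate: when only the read-only partitions were checked or the tool timed out, the sketch does not guess.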
Please refer to the following video.