We experienced a disk failure today and replaced the disk without any problems or manual commands.
This morning we lost a disk in our Exadata. We received an alert and an email stating “Hard disk status changed to predictive failure: critical”. The email also included a drawing showing the location of the failed disk.
The cell and ASM logs clearly show that the grid disks on the failing disk were dropped from ASM and that a rebalance operation was started to make sure all the data had 2 copies again.
We did not wait for an Oracle/Sun engineer to come and replace the disk. Our system admins replaced it, and Exadata automatically recognized the new disk and started a new rebalance operation without any manual commands.
/* cell triggers the drop operation */
Sat Feb 05 11:50:31 2011
Received subopcode 6 in publish ASM Query on 3 guids.
NOTE: Initiating ASM Instance operation: ASM DROP critical disk on 3 disks
DATA_CD_08_cel11 [00000xxxx-yyyy-zzzz-0000-000000000000]
RECO_CD_08_cel11 [00000xxxx-yyyy-zzzz-0000-000000000000]
SYSTEMDG_CD_08_cel11 [00000xxxx-yyyy-zzzz-0000-000000000000]

/* the corrupt disk has been replaced with the spare one */
Sat Feb 05 16:40:44 2011
Drop celldisk CD_08_cel11 (options: force, from memory only) - begin
Drop celldisk CD_08_cel11 - end
Sat Feb 05 16:40:44 2011
Open received invalid device name SYSTEMDG_CD_08_cel11
Sat Feb 05 16:40:44 2011
Open received invalid device name SYSTEMDG_CD_08_cel11
Sat Feb 05 16:42:44 2011
create CELLDISK CD_08_cel11 on device /dev/sdi
Sat Feb 05 16:42:44 2011
create GRIDDISK DATA_CD_08_cel11 on CELLDISK CD_08_cel11
Griddisk DATA_CD_08_cel11 - number is (248)
NOTE: Initiating ASM instance operation: Operation: DROP and ADD of ASM disk for Grid disk guid=00000xxxx-yyyy-zzzz-0000-000000000000
Received subopcode 4 in publish ASM Query on 1 guids.
NOTE: Initiating ASM Instance operation: ASM DROP ADD disk on 1 disks
DATA_CD_08_cel11 [00000xxxx-yyyy-zzzz-0000-000000000000]
Storage Index Allocation for GridDisk DATA_CD_08_cel11 successful

Sat Feb 05 16:42:44 2011
create GRIDDISK RECO_CD_08_cel11 on CELLDISK CD_08_cel11
Griddisk RECO_CD_08_cel11 - number is (252)
NOTE: Initiating ASM instance operation: Operation: DROP and ADD of ASM disk for Grid disk guid=00000xxxx-yyyy-zzzz-0000-000000000000
Received subopcode 4 in publish ASM Query on 1 guids.
NOTE: Initiating ASM Instance operation: ASM DROP ADD disk on 1 disks
RECO_CD_08_cel11 [00000xxxx-yyyy-zzzz-0000-000000000000]
Storage Index Allocation for GridDisk RECO_CD_08_cel11 successful

Sat Feb 05 16:42:44 2011
create GRIDDISK SYSTEMDG_CD_08_cel11 on CELLDISK CD_08_cel11
Griddisk SYSTEMDG_CD_08_cel11 - number is (256)
NOTE: Initiating ASM instance operation: Operation: DROP and ADD of ASM disk for Grid disk guid=00000xxxx-yyyy-zzzz-0000-000000000000
Received subopcode 4 in publish ASM Query on 1 guids.
NOTE: Initiating ASM Instance operation: ASM DROP ADD disk on 1 disks
SYSTEMDG_CD_08_cel11 [00000xxxx-yyyy-zzzz-0000-000000000000]
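
If you want to pull the timeline out of a cell alert log like the one above, here is a rough Python sketch. It is not an Oracle tool. It assumes each timestamp sits on its own line in the format shown in the excerpt and that month and day names are in English. The abridged LOG string and the function names parse_events and recreated_griddisks are made up for illustration.

```python
import re
from datetime import datetime

# Abridged copy of the cell alert log lines quoted in this post.
LOG = """\
Sat Feb 05 11:50:31 2011
NOTE: Initiating ASM Instance operation: ASM DROP critical disk on 3 disks
Sat Feb 05 16:40:44 2011
Drop celldisk CD_08_cel11 (options: force, from memory only) - begin
Sat Feb 05 16:42:44 2011
create CELLDISK CD_08_cel11 on device /dev/sdi
create GRIDDISK DATA_CD_08_cel11 on CELLDISK CD_08_cel11
create GRIDDISK RECO_CD_08_cel11 on CELLDISK CD_08_cel11
create GRIDDISK SYSTEMDG_CD_08_cel11 on CELLDISK CD_08_cel11
"""

# Assumption: timestamps look like "Sat Feb 05 11:50:31 2011" on a line of their own.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
GRIDDISK_RE = re.compile(r"^create GRIDDISK (\S+) on CELLDISK")


def parse_events(text):
    """Return (timestamp, message) pairs; a message gets the last timestamp seen."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if TS_RE.match(line):
            current = datetime.strptime(line, TS_FORMAT)
        elif current is not None:
            events.append((current, line))
    return events


def recreated_griddisks(events):
    """List the grid disk names that appear in 'create GRIDDISK' messages."""
    names = []
    for _, message in events:
        match = GRIDDISK_RE.match(message)
        if match:
            names.append(match.group(1))
    return names


events = parse_events(LOG)
drop_time = next(ts for ts, msg in events if "ASM DROP critical disk" in msg)
create_time = next(ts for ts, msg in events if msg.startswith("create CELLDISK"))

print("Time from ASM drop to new cell disk:", create_time - drop_time)
print("Grid disks recreated:", recreated_griddisks(events))
```

For the excerpt above this gives a gap of 4 hours 52 minutes 13 seconds between the ASM drop and the creation of the new cell disk, and lists the three grid disks that were recreated on it.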
