SolarCurve's House of Flamp: 2012

I've been chasing down an issue where my backup software (Vembu Storegrid, whose support team is terrific) would hang and become unresponsive. Their support team logged in and helped me figure out that part of the disk appears to become unresponsive under heavy I/O (lots of reading from and writing to the disk), such as during backups. They suggested running a repair.

So I logged in, planning to shut things down and unmount the drive first to prevent corruption.

(First I check the name of the device I want to unmount: /data, which is /dev/sdb1)

[root@sys-util-1 /]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        68G   45G   21G  69% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
/dev/sdb1       2.8T  2.6T  195G  94% /data
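
If you would rather not read the device name off the df output by eye, a small helper can look it up. This is only a sketch: device_for_mount is a made-up name, and it assumes a Linux-style mount table at /proc/mounts (the optional second argument lets you point it at another file in the same format).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the device that
# backs a mount point, reading a table in /proc/mounts format
# (device, mount point, filesystem type, ...).
device_for_mount() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    awk -v mp="$mount_point" '
        $2 == mp { dev = $1 }    # keep the last match: the most recent mount wins
        END { if (dev != "") print dev; else exit 1 }
    ' "$mounts_file"
}
```

On the system above, device_for_mount /data should print /dev/sdb1. The function returns non-zero when nothing is mounted at that path.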

(Now I try to unmount it, but it shows as busy)
[root@sys-util-1 /]# umount /data
umount: /data: device is busy
umount: /data: device is busy

(So now I try to force the unmount, with no luck)

[root@sys-util-1 /]# umount -f /data
umount2: Device or resource busy
umount: /data: device is busy
umount2: Device or resource busy
umount: /data: device is busy

(Next I ran a "lazy" unmount, which detaches the filesystem from the directory tree right away and finishes the actual unmount once nothing is using it anymore)

[root@sys-util-1 /]# umount -l /data
[root@sys-util-1 /]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        68G   45G   21G  69% /
tmpfs           2.0G     0  2.0G   0% /dev/shm

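(df no longer lists the device, so it looks unmounted. But a lazy unmount only detaches it from the directory tree; the processes that still have files open on it keep the filesystem active. So when I try to run the repair, it fails with the error below)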

[root@sys-util-1 /]# xfs_repair /dev/sdb1
xfs_repair: /dev/sdb1 contains a mounted filesystem
fatal error -- couldn't initialize XFS library

(I decided to try a basic check first instead, but that failed as well)

[root@sys-util-1 /]# xfs_check /dev/sdb1
xfs_check: /dev/sdb1 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
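
A side note on the check itself: newer versions of xfsprogs deprecate xfs_check and point to xfs_repair -n (no-modify mode) for a read-only check. Here is a rough sketch of a wrapper that refuses to start while the device is still in the mount table. The function names are made up for illustration, and /proc/mounts is assumed. Keep in mind that, as the transcript above shows, a lazily unmounted filesystem can vanish from the table while still being in use, so passing this guard proves nothing on its own.

```shell
#!/bin/sh
# Hypothetical guard (illustrative name): succeeds if the device appears in
# a mount table in /proc/mounts format.
is_listed_as_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical wrapper: run a read-only check only when the device is not
# listed as mounted. xfs_repair -n scans and reports without modifying.
check_xfs_readonly() {
    dev=$1
    mounts_file=${2:-/proc/mounts}
    if is_listed_as_mounted "$dev" "$mounts_file"; then
        echo "$dev is still mounted; unmount it first" >&2
        return 1
    fi
    xfs_repair -n "$dev"
}
```

Usage would be something like check_xfs_readonly /dev/sdb1, run as root.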

(So I mounted /data again and ran the fuser command to find out which processes were holding files open on the drive. Then I killed them and ran fuser a second time to confirm that they went peacefully)

[root@sys-util-1 /]# mount /data
[root@sys-util-1 /]# fuser -vm /dev/sdb1
                     USER        PID ACCESS COMMAND
/dev/sdb1:           root       4567 f.... nautilus
                     root       4590 f.... trashapplet
                     root       4890 ..c.. bash
[root@sys-util-1 /]# kill 4567
[root@sys-util-1 /]# kill 4590
[root@sys-util-1 /]# kill 4890
[root@sys-util-1 /]# fuser -vm /dev/sdb1
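
The kill-and-confirm step above can be sketched as one helper. terminate_pids is a hypothetical name, and the one-second wait is an arbitrary choice. It sends the same default SIGTERM that plain kill sends, then reports anything that did not exit.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): ask each PID to exit
# with SIGTERM, wait briefly, then report any that are still alive.
# Returns non-zero if at least one process survived.
terminate_pids() {
    survivors=""
    for pid in "$@"; do
        kill -TERM "$pid" 2>/dev/null
    done
    sleep 1    # arbitrary grace period for the processes to exit
    for pid in "$@"; do
        # kill -0 sends no signal; it only tests whether the PID exists
        if kill -0 "$pid" 2>/dev/null; then
            survivors="$survivors $pid"
        fi
    done
    if [ -n "$survivors" ]; then
        echo "still running:$survivors" >&2
        return 1
    fi
    return 0
}
```

One way to feed it is terminate_pids $(fuser -m /data 2>/dev/null), since fuser prints the PIDs on stdout and everything else on stderr. Be careful with that: if your own shell is sitting in a directory under /data, its PID will be on the list too.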

(Next I unmounted the drive again and ran the repair. We are in business.)

[root@sys-util-1 /]# umount /data
[root@sys-util-1 /]# xfs_repair /dev/sdb1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6

I won't bother you with the rest, but after googling around I didn't find anyone who had clearly laid out how to deal with these errors. I wanted to put something good out into the universe and hopefully help someone else.
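
For reference, the sequence that finally worked can be collected into a small script. Treat it as a sketch: MOUNT_POINT, DEVICE, DRY_RUN, run and repair_plan are all illustrative names, and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the working sequence from this post. The variable and function
# names are illustrative, not anything the tools themselves require.
MOUNT_POINT=${MOUNT_POINT:-/data}
DEVICE=${DEVICE:-/dev/sdb1}
DRY_RUN=${DRY_RUN:-1}    # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop at the first step that fails, so the repair never runs on a
# filesystem that could not be unmounted.
repair_plan() {
    run fuser -vm "$DEVICE" &&      # see what is holding the filesystem open
    run umount "$MOUNT_POINT" &&    # plain unmount, no -f or -l
    run xfs_repair "$DEVICE"        # repair the now-unmounted device
}

repair_plan
```

Running it with DRY_RUN=0 as root executes the commands for real. You still have to deal with whatever fuser lists before the umount step can succeed.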

Wednesday, January 25, 2012

Nerding Out: File system issues
