SCSI – Hot add, remove, rescan of SCSI devices

Finding information about SCSI devices

The first problem when working with SCSI devices might be to map the information you get from /proc/scsi/scsi to the device names the kernel uses, such as /dev/sda and so on.

If you use Novell SUSE Linux Enterprise Server this might not be a problem, because SLES comes with a nice little command called lsscsi. Here's an example:

op710-1-lpar1:/ # lsscsi
[0:0:1:0]    disk    AIX      VDASD                  /dev/sda
[0:0:2:0]    disk    AIX      VDASD                  /dev/sdb
[0:0:3:0]    cd/dvd  AIX      VOPTA                  /dev/sr0
[0:0:4:0]    disk    AIX      VDASD                  /dev/sdc
[0:0:5:0]    disk    AIX      VDASD                  /dev/sdd

But let's assume you don't use SUSE, or you use other distributions alongside it. The first place to look is the good old /proc filesystem – namely /proc/scsi/scsi.

op710-1-lpar1:/ # cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: AIX      Model: VDASD            Rev:
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: AIX      Model: VDASD            Rev:
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: AIX      Model: VDASD            Rev:
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: AIX      Model: VDASD            Rev:
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: AIX      Model: VOPTA            Rev:
  Type:   CD-ROM                           ANSI SCSI revision: 04

The first line of each stanza is the important one. For example, Host: scsi0 Channel: 00 Id: 01 Lun: 00 is written 0:0:1:0 in short form – the first device on the first SCSI channel of the first SCSI adapter. This is the SCSI-ID of the device.

OK, to find out which device (sda, for instance) is hiding behind this information, you must follow me on a journey into the /sys filesystem. Whether you like it or not, /sys is very important and our friend.

The device name the Linux kernel uses for a specific SCSI-ID can be found in the /sys/bus/scsi/drivers/sd/<SCSI-ID> subdirectory, as shown in the following example:

op710-1-lpar1:/ # ll /sys/bus/scsi/drivers/sd/0\:0\:1\:0/
total 0
drwxr-xr-x  2 root root    0 Aug 17 14:38 .
drwxr-xr-x  7 root root    0 Aug 17 16:33 ..
lrwxrwxrwx  1 root root    0 Aug 17 14:38 block -> ../../../../../block/sda
--w-------  1 root root 4096 Aug 17 14:38 delete
-rw-r--r--  1 root root 4096 Aug 17 14:38 detach_state
-r--r--r--  1 root root 4096 Aug 17 14:38 device_blocked
lrwxrwxrwx  1 root root    0 Aug 17 14:38 generic -> ../../../../../class/scsi_generic/sg0
-r--r--r--  1 root root 4096 Aug 17 14:38 model
-rw-r--r--  1 root root 4096 Aug 17 14:38 online
-r--r--r--  1 root root 4096 Aug 17 14:38 queue_depth
--w-------  1 root root 4096 Aug 17 14:38 rescan
-r--r--r--  1 root root 4096 Aug 17 14:38 rev
-r--r--r--  1 root root 4096 Aug 17 14:38 scsi_level
-rw-r--r--  1 root root 4096 Aug 17 14:38 timeout
-r--r--r--  1 root root 4096 Aug 17 14:38 type
-r--r--r--  1 root root 4096 Aug 17 14:38 vendor

Now we know that the device with the SCSI-ID 0:0:1:0 is known to the kernel as sda. Well done.
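
If you need this mapping for many devices at once, a small shell loop over the sysfs entries saves some typing. This is only a sketch and assumes the sysfs layout shown above, where block is a symlink (on newer kernels it is a directory instead):

for dev in /sys/bus/scsi/drivers/sd/*:*:*:*; do
    # prints "SCSI-ID -> block device", e.g. "0:0:1:0 -> sda"
    echo "$(basename $dev) -> $(basename $(readlink $dev/block))"
done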

Note…
Please note that there are some differences between Novell's SUSE Linux Enterprise Server and Red Hat Enterprise Linux in the way they work with /sys. The method described above works on both distributions, though.

Rescan of a SCSI bus

Let's assume you've added a SCSI disk to a SCSI bus (either a physical or a virtual device). One way to make it known to the system would be to reboot the server or partition. But no, that is not the preferred way. The easiest way is to rescan the whole SCSI bus, which lets the Linux kernel detect the new devices!

To issue a SCSI bus rescan you must know on which bus you've added the device. If you don't know which bus it was and there are multiple buses on the system, you can rescan each bus in turn – somewhat annoying, but it will not interrupt the system.
To initiate a SCSI bus rescan type echo "- - -" > /sys/class/scsi_host/hostX/scan, where X stands for the SCSI bus (host adapter) you want to scan.

op710-1-lpar1:/ # lsscsi
[0:0:1:0]    disk    AIX      VDASD                  /dev/sda
[0:0:2:0]    disk    AIX      VDASD                  /dev/sdb
[0:0:3:0]    cd/dvd  AIX      VOPTA                  /dev/sr0
[0:0:4:0]    disk    AIX      VDASD                  /dev/sdc
[0:0:5:0]    disk    AIX      VDASD                  /dev/sdd
op710-1-lpar1:/ # echo "- - -" > /sys/class/scsi_host/host0/scan
op710-1-lpar1:/ # lsscsi
[0:0:1:0]    disk    AIX      VDASD                  /dev/sda
[0:0:2:0]    disk    AIX      VDASD                  /dev/sdb
[0:0:3:0]    cd/dvd  AIX      VOPTA                  /dev/sr0
[0:0:4:0]    disk    AIX      VDASD                  /dev/sdc
[0:0:5:0]    disk    AIX      VDASD                  /dev/sdd
[0:0:6:0]    disk    AIX      VDASD                  /dev/sde

Well done, here's our new device. By the way, this works not only for disks but also for SCSI CD/DVD devices.
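
If you are not sure which host adapter the new device hangs off, you can simply loop over all of them – a sketch; the list of hostX entries of course depends on the adapters present in your system:

for host in /sys/class/scsi_host/host*; do
    # "- - -" is a wildcard for channel, target and LUN: scan everything on this host
    echo "- - -" > $host/scan
done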

Note…
Please note that a rescan of the SCSI bus will only detect new devices added to the bus. It will not re-detect the state of an already known device – e.g. whether it has failed or is online.

Deletion of a SCSI Device

There might be situations where you must remove a SCSI device from the system. This is easy using
echo 1 > /sys/bus/scsi/drivers/sd/<SCSI-ID>/delete. Here's an example:

op710-1-lpar1:/ # lsscsi
[0:0:1:0]    disk    AIX      VDASD                  /dev/sda
[0:0:2:0]    disk    AIX      VDASD                  /dev/sdb
[0:0:3:0]    cd/dvd  AIX      VOPTA                  /dev/sr0
[0:0:4:0]    disk    AIX      VDASD                  /dev/sdc
[0:0:5:0]    disk    AIX      VDASD                  /dev/sdd
[0:0:6:0]    disk    AIX      VDASD                  /dev/sde
op710-1-lpar1:/ # echo 1 > /sys/bus/scsi/drivers/sd/0\:0\:6\:0/delete
op710-1-lpar1:/ # lsscsi
[0:0:1:0]    disk    AIX      VDASD                  /dev/sda
[0:0:2:0]    disk    AIX      VDASD                  /dev/sdb
[0:0:3:0]    cd/dvd  AIX      VOPTA                  /dev/sr0
[0:0:4:0]    disk    AIX      VDASD                  /dev/sdc
[0:0:5:0]    disk    AIX      VDASD                  /dev/sdd

Note…
To re-add the device, simply rescan the whole SCSI bus.
Note…
Please note that deleting and re-adding the device will change the device name – i.e. if you delete /dev/sde and re-add it via a SCSI bus rescan, it will come back as /dev/sdf!
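
By the way, the same delete attribute is also reachable through /sys/block if you only know the block device name and not the SCSI-ID. For the /dev/sde we just removed, the equivalent command would have been (just a sketch):

echo 1 > /sys/block/sde/device/delete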

Rescan of a SCSI Device

The problem with a SCSI bus rescan is that it only detects new devices. But let's assume the following situation.

You have a client partition on a system with two VIO servers. Each VIO server exports several virtual SCSI devices to the client partition, and the client builds several RAID1 arrays on top of those devices. This setup keeps the client up and running even if one VIO server fails.
This scenario is shown in the following example:

570-lpar9:~ # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1[0] sde1[1]
      52428672 blocks [2/2] [UU]

md2 : active raid1 sdd1[1] sdf1[0]
      26214272 blocks [2/2] [UU]

md0 : active raid1 sdm2[2](F) sdb2[1]
      62896064 blocks [2/1] [_U]

unused devices: <none>

As you can see, the RAID device /dev/md0 has one faulty member. Let's assume you've corrected the problem with this device (/dev/sdm) and you know it works correctly again. But hot-removing and hot-adding the device to the RAID array does not work. Even worse, the kernel posts a lot of errors in /var/log/messages, and the device stays in the faulty state. One possibility is, of course, to reboot the system – but hey, this is our SAP system, which cannot be rebooted easily.

The solution is to rescan the device itself. This tells the Linux kernel that the device is up and running again. To rescan a particular device, use the command
echo 1 > /sys/bus/scsi/drivers/sd/<SCSI-ID>/block/device/rescan.

570-lpar9:/sys/bus/scsi/drivers/sd/0:0:1:0/block/device # echo 1 > rescan
570-lpar9:/sys/bus/scsi/drivers/sd/0:0:1:0/block/device # tail -f /var/log/messages
...
Aug 17 16:17:49 570-lpar9 kernel: SCSI device sdm: 125829120 512-byte hdwr sectors (64425 MB)
Aug 17 16:17:49 570-lpar9 kernel: sdm: cache data unavailable
Aug 17 16:17:49 570-lpar9 kernel: sdm: assuming drive cache: write through
...

570-lpar9:/sys/bus/scsi/drivers/sd/0:0:1:0/block/device # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdc1[0] sde1[1]
      52428672 blocks [2/2] [UU]

md2 : active raid1 sdd1[1] sdf1[0]
      26214272 blocks [2/2] [UU]

md0 : active raid1 sdm2[2](F) sdb2[1]
      62896064 blocks [2/1] [_U]

unused devices: <none>
570-lpar9:/sys/bus/scsi/drivers/sd/0:0:1:0/block/device # mdadm /dev/md0 -r /dev/sdm2 -a /dev/sdm2
mdadm: hot removed /dev/sdm2
mdadm: hot added /dev/sdm2
570-lpar9:/sys/bus/scsi/drivers/sd/0:0:1:0/block/device # cat /proc/mdstat      
Personalities : [raid1]
md1 : active raid1 sdc1[0] sde1[1]
      52428672 blocks [2/2] [UU]

md2 : active raid1 sdd1[1] sdf1[0]
      26214272 blocks [2/2] [UU]

md0 : active raid1 sdm2[2] sdb2[1]
      62896064 blocks [2/1] [_U]
      [>....................]  recovery =  0.3% (211736/62896064) finish=14.7min speed=70578K/sec
unused devices: <none>

As you can see, the rescan told the kernel that the device is up and running again, and now the RAID1 resync works. Well done.
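
While the resync is running you can keep an eye on its progress, e.g. with watch (a sketch; the two-second interval is arbitrary):

watch -n 2 cat /proc/mdstat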

Now let's assume you've created a virtual device with 10 GB capacity, but soon after you're facing the problem that this space is no longer enough. Since we're in a virtual environment, you can extend the logical volume on the VIO server. Then you must tell the client that it has more space: simply rescan the device and you can use the additional space. Here's an example:

op710-1-lpar1:/ # fdisk -l
...
Disk /dev/sde: 10.7 GB, 10737418240 bytes
64 heads, 32 sectors/track, 10240 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       10240    10485744   83  Linux
op710-1-lpar1:/ # echo 1 > /sys/bus/scsi/drivers/sd/0\:0\:6\:0/rescan
op710-1-lpar1:/ # fdisk -l
...
Disk /dev/sde: 16.1 GB, 16106127360 bytes
64 heads, 32 sectors/track, 15360 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       10240    10485744   83  Linux
...
DOING SOME FDISK PARTITIONING HERE
...
op710-1-lpar1:/ # fdisk -l
...
Disk /dev/sde: 16.1 GB, 16106127360 bytes
64 heads, 32 sectors/track, 15360 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1       10240    10485744   83  Linux
/dev/sde2           10241       15360     5242880   83  Linux

The new capacity will be added at the end of the “old” device. Wow.
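
To actually use the new partition you would put a filesystem on it and mount it, for example like this (a sketch; the label and the mount point /data2 are made up):

mkfs.ext3 -L data2 /dev/sde2   # create an ext3 filesystem on the new partition
mkdir -p /data2                # create a mount point
mount /dev/sde2 /data2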

Be careful!
Although it sounds great to resize a device online, there are some caveats to consider:

  • First, resizing of the device only works while the device is not mounted. Well, not really online anymore…
  • Second, as far as I've tested it, online resizing does not work with LVM devices.
  • One exception to the second point (!!!) is using md devices – see below!

The kernel itself discovers the new size of the device during the rescan, but as long as the device is mounted somehow, somewhere, or carries an LVM physical volume, it will not be able to use the additional space at all – you must reboot the system or partition!
Nevertheless, the resizing itself works even in a virtualized environment – e.g. resize the LUN on the FC disk subsystem, run cfgdev on the VIOS and then use the method described above on the Linux LPAR.

For production systems this is not acceptable. The better solution is to use LVM and add additional LUNs to a volume group instead of thinking about resizing an existing LUN. You can save yourself a lot of time and headaches…
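
Here is a sketch of that approach, assuming the new LUN shows up as /dev/sdf and the volume group and logical volume are called datavg and datalv (all three names are made up):

pvcreate /dev/sdf                     # initialize the new LUN as a physical volume
vgextend datavg /dev/sdf              # add it to the existing volume group
lvextend -L +10G /dev/datavg/datalv   # grow the logical volume by 10 GB
# finally grow the filesystem on top of it, e.g. with ext2online (RHEL4) or
# resize2fs, depending on your distribution and whether the filesystem is mounted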

Online resizing of md devices with LVM2

As stated above, resizing the device itself is not the issue; the tricky part is making Linux not only detect the new size but also use it. Nevertheless, it works if you use a software RAID on top of the devices – so read on…

Please note!
The following description works on RHEL4u4. I have not tested SLES9/SLES10 yet.
That said, SLES9 comes with the necessary command pvresize, but the system simply tells me that this function is not implemented yet.

…ok, assuming the following situation:

  • Two SAN devices are exported to a Linux partition using a VIOS.
  • One md device was created on top of these two LUNs (i.e. SW RAID1).
  • Using LVM2 on the md device.

Let's have a look at the steps so far. My system has two additional SCSI devices, /dev/sdb and /dev/sdc. I want to use those devices to hold very mission-critical data, so I decided to create a software RAID as shown below.

[root@bc1-js21-2-lpar1 RPMS]# mdadm --create  /dev/md0 -l 1 -n 2 /dev/sdb /dev/sdc
mdadm: array /dev/md0 started.
[root@bc1-js21-2-lpar1 RPMS]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc[1] sdb[0]
      10485696 blocks [2/2] [UU]
      [>....................]  resync =  1.4% (148600/10485696) finish=5.7min speed=29720K/sec
unused devices: <none>

Now to be more flexible I decided to use LVM2 on /dev/md0.

[root@bc1-js21-2-lpar1 RPMS]# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created
[root@bc1-js21-2-lpar1 RPMS]# vgcreate testvg /dev/md0
  Volume group "testvg" successfully created
[root@bc1-js21-2-lpar1 RPMS]# lvcreate --size 5G -n testlv /dev/testvg
  Logical volume "testlv" created
[root@bc1-js21-2-lpar1 RPMS]# mkfs.ext3 -L testlv /dev/testvg/testlv
mke2fs 1.35 (28-Feb-2004)
Filesystem label=testlv
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
655360 inodes, 1310720 blocks
65536 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1342177280
40 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 31 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

OK, I am saving a little space here and hope you'll trust me when I say that the next steps included mounting the filesystem and creating the appropriate /etc/fstab entry.
So everyone is happy now, but suddenly the provided disk space is not enough anymore. One way of increasing the size of my testlv is to add additional LUNs to the volume group testvg. Another way is to increase the size of the original LUNs. OK, let's do it…

  • Increase the size on the SAN disk subsystem.
  • Use cfgdev on the VIOS – now the VIOS knows about the new size of the drives.
  • Rescan the devices on the Linux LPARs
    [root@bc1-js21-2-lpar1 ~]# echo 1 > /sys/bus/scsi/drivers/sd/0\:0\:2\:0/rescan
    [root@bc1-js21-2-lpar1 ~]# echo 1 > /sys/bus/scsi/drivers/sd/0\:0\:3\:0/rescan
    Check…
    Check /var/log/messages to see whether the devices were successfully rescanned!

    At this stage you might remember what I mentioned earlier about resizing a disk. But – DON'T PANIC! – in this scenario everything is a little bit easier.
    As a next step it is necessary to remove one device from the RAID array and re-add it to the same array again. This can be done in one line.

    [root@bc1-js21-2-lpar1 0:0:2:0]# mdadm /dev/md0 -f /dev/sdc -r /dev/sdc -a /dev/sdc
    mdadm: set /dev/sdc faulty in /dev/md0
    mdadm: hot removed /dev/sdc
    mdadm: hot added /dev/sdc

    And now do a fdisk -l to see what happened.

    [root@bc1-js21-2-lpar1 0:0:2:0]# fdisk -l /dev/sdc
    
    Disk /dev/sdc: 12.8 GB, 12884901888 bytes
    64 heads, 32 sectors/track, 12288 cylinders
    Units = cylinders of 2048 * 512 = 1048576 bytes

    Hey, that sounds good – seems it worked! At this point it is absolutely necessary that you wait until the RAID device has finished its resync. Otherwise you will lose data!!!! Check /proc/mdstat.

    [root@bc1-js21-2-lpar1 ~]# while true; do clear; cat /proc/mdstat; sleep 2; done
    Personalities : [raid1]
    md0 : active raid1 sdc[2] sdb[0]
          10485696 blocks [2/1] [U_]
          [=>...................]  recovery =  9.8% (1031248/10485696) finish=5.1min speed=30682K/sec
    unused devices: <none>

    After the resync has finished, do the same thing for the other disk (set faulty, remove, re-add and wait).
    Great, now both members of the md device have been enlarged. But – hey, wait – pvdisplay still shows something totally different.

    [root@bc1-js21-2-lpar1 ~]# pvdisplay /dev/md0
      --- Physical volume ---
      PV Name               /dev/md0
      VG Name               testvg
      PV Size               10.00 GB / not usable 0
      Allocatable           yes
      PE Size (KByte)       4096
      Total PE              2559
      Free PE               1279
      Allocated PE          1280
      PV UUID               1uDhon-C9hb-Yd0a-fYfs-RAhF-CYpH-X3Gklu

    Ok, ok, ok – we’re not finished yet.

First we must know the new size of the drives.

[root@bc1-js21-2-lpar1 ~]# sfdisk -s /dev/sdb
12582912
[root@bc1-js21-2-lpar1 ~]# sfdisk -s /dev/sdc
12582912

The next step involves mdadm, because we must tell the system to grow the RAID array – but please note:

The size to use with mdadm is the size reported by sfdisk -s minus 120,
i.e. the size I use for mdadm is 12582792 (12582912 – 120).
Presumably this accounts for the space reserved at the end of each member device for the md superblock, but anyone who can explain exactly why the reported size must be reduced by 120 is very welcome!

So reduce the reported size by 120 and use mdadm to resize (grow) the md device.

[root@bc1-js21-2-lpar1 ~]# mdadm --grow --size 12582792 /dev/md0
[root@bc1-js21-2-lpar1 ~]# for i in sdb sdc; do mdadm --examine /dev/$i; done
/dev/sdb:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : db908d4e:3568aef7:f981fce7:9ae7839c
  Creation Time : Fri Sep  8 19:26:02 2006
     Raid Level : raid1
    Device Size : 12582792 (11.100 GiB 12.88 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
...
/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : db908d4e:3568aef7:f981fce7:9ae7839c
  Creation Time : Fri Sep  8 19:26:02 2006
     Raid Level : raid1
    Device Size : 12582792 (11.100 GiB 12.88 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
...

OK, now our md device is larger – next it is necessary to resize the physical volume using pvresize.

[root@bc1-js21-2-lpar1 ~]# pvdisplay /dev/md0
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               testvg
  PV Size               10.00 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              2559
  Free PE               1279
  Allocated PE          1280
  PV UUID               1uDhon-C9hb-Yd0a-fYfs-RAhF-CYpH-X3Gklu
[root@bc1-js21-2-lpar1 ~]# pvresize /dev/md0
  Physical volume "/dev/md0" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
[root@bc1-js21-2-lpar1 ~]# pvdisplay /dev/md0
  --- Physical volume ---
  PV Name               /dev/md0
  VG Name               testvg
  PV Size               12.00 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              3071
  Free PE               1791
  Allocated PE          1280
  PV UUID               1uDhon-C9hb-Yd0a-fYfs-RAhF-CYpH-X3Gklu
[root@bc1-js21-2-lpar1 ~]# vgdisplay testvg
  --- Volume group ---
  VG Name               testvg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               12.00 GB
  PE Size               4.00 MB
  Total PE              3071
  Alloc PE / Size       1280 / 5.00 GB
  Free  PE / Size       1791 / 7.00 GB
  VG UUID               RJQb7I-niYp-zvQz-EEPZ-FMJH-fVs3-nRWQW8

That's it – our RAID device, and more importantly our physical volume and its volume group, are larger than before, and we can now extend our logical volumes to use this extra space or add additional logical volumes.
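
For example, to hand 2 GB of the new space to our testlv (a sketch; on RHEL4 a mounted ext3 filesystem is grown online with ext2online, other setups may need resize2fs on the unmounted device):

lvextend -L +2G /dev/testvg/testlv   # grow the logical volume by 2 GB
ext2online /dev/testvg/testlv        # grow the mounted ext3 filesystem online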
