The customer has 3PARdata LUNs configured in ALUA peer persistent mode. With this configuration, half of the SCSI sub-paths come from the active/optimized controller and the other half from the standby controller. The paths from the standby controller show up as "active ghost running" in multipath -ll output.
HPE/3PAR support confirmed that paths from the standby controller remain in a path group marked as "enabled". These paths do not support READ/WRITE SCSI commands and only process TUR commands. They become active during a controller failover.
For example:
[root@host1 ~]# sg_rtpg -vvd /dev/sdhl
open /dev/sdhl with flags=0x802
report target port groups cdb: a3 0a 00 00 00 00 00 00 04 00 00 00
report target port group: pass-through requested 1024 bytes but got 100 bytes
Report list length = 100
Report target port groups:
target port group id : 0x102 , Pref=1
target port group asymmetric access state : 0x00 (active/optimized) <<---------- Paths from active/optimized controller.
T_SUP : 1, O_SUP : 0, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 0, AO_SUP : 1
status code : 0x00 (no status available)
vendor unique status : 0x00
target port count : 0a
Relative target port ids:
0x7d1
0x7d2
0x7e6
0x7e7
0x7e8
0x835
0x836
0x849
0x84b
0x84c
target port group id : 0x101 , Pref=0
target port group asymmetric access state : 0x02 (standby) <<---------- Paths from standby controller.
T_SUP : 1, O_SUP : 0, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 0, AO_SUP : 1
status code : 0x00 (no status available)
vendor unique status : 0x00
target port count : 0a
Relative target port ids:
0x3e9
0x3ea
0x3fe
0x3ff
0x400
0x44d
0x44e
0x461
0x463
0x464
Issue:
Because the above paths from the standby controller are in the "Not Ready" state, a TUR command on these paths returns the following messages, as expected:
kernel: sd 0:0:2:27: [sdae] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
kernel: sd 0:0:2:27: [sdae] tag#0 CDB: Test Unit Ready 00 00 00 00 00 00
kernel: sd 0:0:2:27: [sdae] tag#0 Sense Key : Not Ready [current]
kernel: sd 0:0:2:27: [sdae] tag#0 Add. Sense: Logical unit not accessible, target port in standby state
Since the TUR command completed with "Sense Key : Not Ready", the sg_turs command exits with a return code of 2.
[root@host1 ~]# sg_turs -vvv /dev/sdae
open /dev/sdae with flags=0x800
test unit ready cdb: 00 00 00 00 00 00
duration=0 ms
test unit ready: Fixed format, current; Sense key: Not Ready
Additional sense: Logical unit not accessible, target port in standby state
Raw sense data (in hex):
70 00 02 00 00 00 00 0a 00 00 00 00 04 0b 00 00
00 00
device not ready
[root@host1 ~]# echo $?
2 <<----------
The return code of 2 is expected for paths in standby or "Not Ready" state:
$ man sg3_utils
[...]
EXIT STATUS
To aid scripts that call these utilities, the exit status is set to indicate
success (0) or failure (1 or more). Note that some of the lower values
correspond to the SCSI sense key values. The exit status values are:
0 success
1 syntax error. Either illegal command line options, options with bad
arguments or a combination of options that is not permitted.
2 the DEVICE reports that it is not ready for the operation requested.
The device may be in the process of becoming ready (e.g. spinning up
but not at speed) so the utility may work after a wait.
However, this return code of 2 from sg_turs causes a problem in the following code block in the rescan-scsi-bus.sh script, which keeps iterating through the same while loop 8 times. Each iteration of this loop adds a 1-second delay.
$ vi rescan-scsi-bus.sh
[...]
while test $RC = 2 -o $RC = 6 && test $ctr -le 8; do <------
if test $RC = 2 -a "$RMB" != "1"; then echo -n "."; let LN+=1; sleep 1 <------
else usleep 20000; fi
let ctr+=1
sg_turs /dev/$SGDEV &>/dev/null
RC=$?
[...]
The rescan-scsi-bus.sh script keeps looping in the above while loop because sg_turs returns 2 for paths in the "Not Ready" state.
A system with approximately 10 LUNs, each with 4 paths in standby mode, has a total of 40 paths in ghost/standby state. For each of these paths the while loop adds an 8-second delay, for a total of 8*40 = 320 seconds. The customer is planning to increase the number of LUNs from 10 to ~50-60, which would make the re-scan delays even longer.
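To make the scaling concrete, here is a small sketch of the arithmetic above. The function name `estimate_rescan_delay` is made up for illustration and is not part of sg3_utils. It assumes 8 seconds of added delay per standby path, as described in this report, and 4 standby paths per LUN.

```shell
#!/bin/bash
# Hypothetical helper (not part of sg3_utils): estimate the extra re-scan
# delay caused by standby paths, using the figures from this report.
#   $1 = number of LUNs
#   $2 = standby paths per LUN
#   $3 = seconds of delay added per standby path (assumed 8 by default)
estimate_rescan_delay() {
    local luns=$1
    local standby_paths_per_lun=$2
    local seconds_per_path=${3:-8}
    echo $(( luns * standby_paths_per_lun * seconds_per_path ))
}

estimate_rescan_delay 10 4   # current setup: prints 320
estimate_rescan_delay 50 4   # planned lower bound: prints 1600
estimate_rescan_delay 60 4   # planned upper bound: prints 1920
```

With the planned 50-60 LUNs, the same loop would add roughly 27 to 32 minutes to a single re-scan under these assumptions.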
Steps to Reproduce:
Assign LUNs from 3PAR storage configured in peer-persistent mode.
Verify that half of the paths in multipath -ll output are in "active ghost running" state.
Re-scan the paths using the rescan-scsi-bus.sh script; it shows excessive delay in scanning.
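For the verification step, a quick way to count the ghost paths could look like the sketch below. The function name `count_ghost_paths` is hypothetical. It only assumes that each affected path line in the multipath -ll output contains the string "active ghost running", as quoted in this report.

```shell
#!/bin/bash
# Hypothetical helper: count path lines reported as "active ghost running".
# Reads multipath -ll style text on stdin and prints the number of matches.
count_ghost_paths() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so mask the exit status to stay safe under "set -e".
    grep -c 'active ghost running' || true
}
```

It could be used as `multipath -ll | count_ghost_paths`, and the result compared against the total number of paths.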
Actual results:
Excessive delays in re-scanning the paths from 3PAR LUNs configured in peer-persistent mode.
Expected results:
The rescan-scsi-bus.sh script should handle paths in the "Not Ready" state better, to avoid excessive delays in re-scan.
For now we have worked around the issue by commenting out the sleep 1 command in the following while loop in the rescan-scsi-bus.sh script:
$ vi rescan-scsi-bus.sh
[...]
while test $RC = 2 -o $RC = 6 && test $ctr -le 8; do
if test $RC = 2 -a "$RMB" != "1"; then echo -n "."; let LN+=1; # sleep 1 <------ Commenting out "sleep 1"
else usleep 20000; fi
let ctr+=1
sg_turs /dev/$SGDEV &>/dev/null
RC=$?
[...]
My suggestion is either to make the script aware of the "ghost" paths and skip scanning them (or fail gracefully without the timeout), or to allow the timeouts to be lowered globally (via an extra argument).
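One possible shape for the first suggestion is sketched below. This is not the upstream implementation. The function names `is_standby_sense` and `should_retry_tur` are hypothetical, and the sketch assumes the script captures the verbose sg_turs output (as in the `sg_turs -vvv` transcript above) so that the "target port in standby state" text is available to inspect.

```shell
#!/bin/bash
# Hypothetical helpers (not in rescan-scsi-bus.sh): decide whether the TUR
# retry loop should keep waiting on a device.

# Return 0 if the captured sg_turs text indicates an ALUA standby port.
# Assumes the wording shown in the sg_turs -vvv output in this report.
is_standby_sense() {
    case "$1" in
        *"target port in standby state"*) return 0 ;;
        *) return 1 ;;
    esac
}

# $1 = sg_turs exit status, $2 = captured sg_turs output.
# Return 0 (retry) for exit status 2 or 6, mirroring the existing loop
# condition, except when status 2 is caused by a standby port: waiting
# will not make such a path ready, so do not retry.
should_retry_tur() {
    local rc=$1
    local sense_text=$2
    if [ "$rc" -eq 2 ] && is_standby_sense "$sense_text"; then
        return 1
    fi
    [ "$rc" -eq 2 ] || [ "$rc" -eq 6 ]
}

# Sketch of use inside the loop (SGDEV as in the existing script):
#   out=$(sg_turs -vvv "/dev/$SGDEV" 2>&1); RC=$?
#   while should_retry_tur "$RC" "$out" && test $ctr -le 8; do ... done
```

Matching on the message text is fragile. Checking the additional sense code bytes in the raw sense data, or querying the port group state with sg_rtpg, might be more robust, but that is left open here.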
(forwarded and anonymized for public use)
Additional comments:
Below are more details for the paths in "active ghost running" state for ALUA setup with standby controller: https://access.redhat.com/solutions/1172323
HPE Guide on ghost/standby paths:
Page 158: https://support.hpe.com/hpesc/public/docDisplay?docId=c04448818#_OPENTOPIC_TOC_PROCESSING_d94e26609
(Note that the code excerpts above are from sg3_utils-1.37; while the code has changed slightly, the functionality is roughly the same: https://github.com/hreinecke/sg3_utils/blob/master/scripts/rescan-scsi-bus.sh#L275)