enable_group Segmentation fault #105

Open
linzhanglong opened this issue Nov 24, 2024 · 39 comments

Comments

@linzhanglong

linzhanglong commented Nov 24, 2024

Hello. In the do_check_path function, the following code:

if (pp->mpp->synced_count == 0) {  
    do_sync_mpp(vecs, pp->mpp);  
    /* if update_multipath_strings orphaned the path, quit early */  
    if (!pp->mpp)  
        return CHECK_PATH_SKIPPED;  
}

Should it be changed to this?

if (pp->mpp->synced_count == 0) {  
    do_sync_mpp(vecs, pp->mpp);  
    /* if update_multipath_strings orphaned the path, quit early */  
    if (!pp->mpp || pp->mpp->need_reload)  /* <-- proposed change */
        return CHECK_PATH_SKIPPED;  
}

Otherwise, the subsequent code in the enable_group function may use pp->pgindex to access the pathgroup vector out of bounds:

static void
enable_group(struct path * pp)
{
    struct pathgroup * pgp;

    if (!pp->mpp->pg || !pp->pgindex)
        return;

    pgp = VECTOR_SLOT(pp->mpp->pg, pp->pgindex - 1); /* <-- here */

    if (pgp->status == PGSTATE_DISABLED) {
        condlog(2, "%s: enable group #%i", pp->mpp->alias, pp->pgindex);
        dm_enablegroup(pp->mpp->alias, pp->pgindex);
    }
}
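
The hazard can be illustrated with a minimal sketch. The types below are hypothetical, simplified stand-ins and not the real libmultipath structures; `pg_lookup()` is an invented helper. It assumes only what the code above shows: pgindex is 1-based and 0 means "no pathgroup known". A stale index that is larger than the current number of pathgroups is rejected instead of being used:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libmultipath types. */
struct pathgroup { int status; };

struct pgvec {
    struct pathgroup **slot;  /* array of pathgroup pointers */
    int allocated;            /* number of valid slots */
};

/*
 * Return the pathgroup for a 1-based pgindex, or NULL if the index is
 * 0 ("unknown") or no longer fits the vector.
 */
static struct pathgroup *pg_lookup(const struct pgvec *pg, int pgindex)
{
    if (!pg || pgindex <= 0 || pgindex > pg->allocated)
        return NULL;
    return pg->slot[pgindex - 1];
}
```

A caller would then have to check the result for NULL before reading its status field.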
@mwilck
Contributor

mwilck commented Nov 25, 2024

Thanks for the report.

  1. Which multipath-tools version are you using?
  2. Can you share the multipathd logs preceding the crash with us, please?
  3. Are you able to reproduce the crash (for testing fixes)?

I've reviewed the code, and while it's true that update_pathvec_from_dm() may modify pp->mpp, I am not sure if your fix is correct. The problem is our lax handling of pp->pgindex in general. I'm going to send a patch to dm-devel.

@linzhanglong
Author

Hello, I'm not at the office right now; I will provide the information later. This issue occurs with very low probability in Active-Standby mode when you modify the LUN ID and then change it back. When multipathd detects a change in the path's WWID, update_pathvec_from_dm will remove the path and also remove the empty pathgroup, which can lead to this issue.

@mwilck
Copy link
Contributor

mwilck commented Nov 25, 2024

No problem, take your time.

@mwilck
Contributor

mwilck commented Nov 25, 2024

I've sent a patch to [email protected]. Subject is "libmultipath: fix handling of pp->pgindex". You can inspect it here, too.

@linzhanglong
Author

Hello. Could the following code in the update_pathvec_from_dm function also lead to the removal of paths and empty path groups? Could it cause the same issue?

/* If this fails, the device is not in sysfs */
pp->udev = get_udev_device(pp->dev_t, DEV_DEVT);

if (!pp->udev) {
    condlog(2, "%s: discarding non-existing path %s",
        mpp->alias, pp->dev_t);
    vector_del_slot(pgp->paths, j--);
    free_path(pp);
    must_reload = true;
    continue;
}

The logs related to my previous question:
2024-11-11 17:49:31.653829 err [multipathd:] 36b46e0810052656500146efa00000005: path 66:1312 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-11-11 17:49:31.653846 notice [multipathd:] 36b46e0810052656500146efa00000005: removing empty pathgroup 5
2024-11-11 17:49:31.653853 warning [multipathd:] 36b46e0810052656500146efa00000005: sdbbk - rdac checker reports path is up
2024-11-11 17:49:31.653860 notice [multipathd:] 128:1376: reinstated
2024-11-11 17:49:31.653867 notice [multipathd:] 36b46e0810052656500146efa00000005: remaining active paths: 6
2024-11-11 17:49:47.588389 notice [multipathd:] --------start up--------

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Nov 27, 2024
pp->pgindex is set in disassemble_map() when a map is parsed.
There are various possibilities for this index to become invalid.
pp->pgindex is only used in enable_group() and followover_should_fallback(),
and both callers take no action if it is 0, which is the right
thing to do if we don't know the path's pathgroup.

Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm().
- when a pathgroup is deleted in update_pathvec_from_dm(). In this
  case, pgindex needs to be invalidated for all paths in all pathgroups
  after the one that was deleted.

The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
because we're looping over pg->paths in the former and over pg->pgp in
the latter, I think it's better to play safe.

Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
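
The last bullet of the commit message can be pictured with a small sketch. It is not the patch itself: the fixed-size arrays, `MAX_PATHS` and `delete_pathgroup()` are hypothetical simplifications. It assumes, as described above, that pgindex is 1-based and that 0 marks it as unknown. When one pathgroup is removed, every path in the groups behind it would otherwise keep an index that points one slot too far:

```c
#include <stddef.h>

#define MAX_PATHS 8

struct path { int pgindex; };   /* 1-based; 0 means "unknown" */

struct pathgroup {
    struct path *paths[MAX_PATHS];
    int npaths;
};

/*
 * Remove pgs[del] from an array of n pathgroups and reset pgindex for
 * every path in the groups that followed it, because their positions
 * have shifted. Returns the new number of pathgroups.
 */
static int delete_pathgroup(struct pathgroup *pgs, int n, int del)
{
    int i, j;

    if (del < 0 || del >= n)
        return n;

    /* paths of the deleted group no longer belong to any group */
    for (j = 0; j < pgs[del].npaths; j++)
        pgs[del].paths[j]->pgindex = 0;

    /* shift the later groups down and invalidate their paths */
    for (i = del; i < n - 1; i++) {
        pgs[i] = pgs[i + 1];
        for (j = 0; j < pgs[i].npaths; j++)
            pgs[i].paths[j]->pgindex = 0;
    }
    return n - 1;
}
```

Paths in groups before the deleted one keep their index, since their position does not change.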
@mwilck
Contributor

mwilck commented Nov 27, 2024

Yes. @bmarzins pointed out the same thing on dm-devel.

I have posted another patch series to the dm-devel mailing list ("[PATCH v2 0/8] multipath-tools fixes") with an updated fix that should cover your case.

The set is also on my tip branch.

@mwilck
Contributor

mwilck commented Nov 27, 2024

2024-11-11 17:49:31.653829 err [multipathd:] 36b46e0810052656500146efa00000005: path 66:1312 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map

You (or we) should investigate how it comes to pass that this path with a wrong WWID is in this map. The log history should provide some clue about it.

multipathd tries to work around this, but it represents some rather evil problem that has probably external reasons (a path has changed its WWID without being deleted and re-added).

@linzhanglong
Author

You (or we) should investigate how it comes to pass that this path with a wrong WWID is in this map. The log history should provide some clue about it.

multipathd tries to work around this, but it represents some rather evil problem that has probably external reasons (a path has changed its WWID without being deleted and re-added).

Okay.
When I modify the LUN ID or unmap this LUN on the storage side, the WWID of this LUN changes when viewed from the host side.

@mwilck
Contributor

mwilck commented Nov 28, 2024

What storage type is it? You can't just have unmapped the LUN, you must have unmapped and re-mapped it, otherwise it wouldn't show up in the host any more, or am I missing something?

Anyway, testing the current patch set is more important now.

@mwilck
Contributor

mwilck commented Nov 29, 2024

When I modify the LUN ID or unmap this LUN on the storage side, the WWID of this LUN changes when viewed from the host side.

And there are no uevents on the host when this happens?

@linzhanglong
Author

linzhanglong commented Nov 29, 2024

When I modify the LUN ID or unmap this LUN on the storage side, the WWID of this LUN changes when viewed from the host side.

And there are no uevents on the host when this happens?

Yes, the storage LUN is not mounted, but it is mapped to the host. I modified the LUN ID on the storage side and then ran udevadm monitor --kernel --property, but there were no uevents. The storage is HUAWEI XSG1.

@mwilck
Contributor

mwilck commented Nov 29, 2024

Are there any kernel messages about changed LUNs or the like?

It is highly dangerous to swap a SCSI device in this way while a host is accessing it. If it isn't mounted, there's no immediate threat of data corruption, but still, I would strongly discourage doing it.

@linzhanglong
Author

linzhanglong commented Nov 29, 2024

Yes, this issue occurred when I modified the LUN ID, or unmapped the LUN, and then rolled the change back.
These LUN ID change and LUN unmap tests were conducted while I was analyzing that issue.

If the kernel issues a SCSI command (TUR) to the storage while the LUN ID is being modified, the kernel receives the return code from the storage and detects that the LUN data has changed. In this scenario a uevent is triggered, but this is a matter of chance. I tested many times and only occasionally received the uevent; in most cases I did not.

static void scsi_report_sense(struct scsi_device *sdev,
			      struct scsi_sense_hdr *sshdr)
{
	enum scsi_device_event evt_type = SDEV_EVT_MAXBITS;	/* i.e. none */

	if (sshdr->sense_key == UNIT_ATTENTION) {
		if (sshdr->asc == 0x3f && sshdr->ascq == 0x03) {
			evt_type = SDEV_EVT_INQUIRY_CHANGE_REPORTED;
			sdev_printk(KERN_WARNING, sdev,
				    "Inquiry data has changed");
		} else if (sshdr->asc == 0x3f && sshdr->ascq == 0x0e) {
			evt_type = SDEV_EVT_LUN_CHANGE_REPORTED;
			scsi_report_lun_change(sdev);  
			sdev_printk(KERN_WARNING, sdev,
				    "Warning! Received an indication that the "
				    "LUN assignments on this target have "
				    "changed. The Linux SCSI layer does not "
				    "automatically remap LUN assignments.\n");
		} else if (sshdr->asc == 0x3f)
			sdev_printk(KERN_WARNING, sdev,
				    "Warning! Received an indication that the "
				    "operating parameters on this target have "
				    "changed. The Linux SCSI layer does not "
				    "automatically adjust these parameters.\n");

		if (sshdr->asc == 0x38 && sshdr->ascq == 0x07) {
			evt_type = SDEV_EVT_SOFT_THRESHOLD_REACHED_REPORTED;
			sdev_printk(KERN_WARNING, sdev,
				    "Warning! Received an indication that the "
				    "LUN reached a thin provisioning soft "
				    "threshold.\n");
		}

		if (sshdr->asc == 0x29) {
			scsi_disk_reset_handler(sdev);
			evt_type = SDEV_EVT_POWER_ON_RESET_OCCURRED;
			sdev_printk(KERN_WARNING, sdev,
				    "Power-on or device reset occurred\n");
		}

		if (sshdr->asc == 0x2a && sshdr->ascq == 0x01) {
			evt_type = SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED;
			sdev_printk(KERN_WARNING, sdev,
				    "Mode parameters changed");
		} else if (sshdr->asc == 0x2a && sshdr->ascq == 0x06) {
			evt_type = SDEV_EVT_ALUA_STATE_CHANGE_REPORTED;
			sdev_printk(KERN_WARNING, sdev,
				    "Asymmetric access state changed");
		} else if (sshdr->asc == 0x2a && sshdr->ascq == 0x09) {
			evt_type = SDEV_EVT_CAPACITY_CHANGE_REPORTED;
			sdev_printk(KERN_WARNING, sdev,
				    "Capacity data has changed");
		} else if (sshdr->asc == 0x2a)
			sdev_printk(KERN_WARNING, sdev,
				    "Parameters changed");
	}

	if (evt_type != SDEV_EVT_MAXBITS) {
		set_bit(evt_type, sdev->pending_events);
		schedule_work(&sdev->event_work);
	}
}
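
To make the decision table in the function above easier to read, here is a small user-space sketch that mirrors only its event selection for a UNIT ATTENTION. `ua_event_name()` is a hypothetical helper, not a kernel function; the event names are the enum constants from the quoted code, returned as strings. ASC 0x3f with ASCQ 0x0e is the "LUN assignments have changed" case discussed here:

```c
#include <stddef.h>

/*
 * Return the name of the event that the quoted scsi_report_sense()
 * would queue for a UNIT ATTENTION with the given ASC/ASCQ, or NULL
 * if it would only log a message (or do nothing).
 */
static const char *ua_event_name(unsigned char asc, unsigned char ascq)
{
    const char *evt = NULL;

    if (asc == 0x3f && ascq == 0x03)
        evt = "SDEV_EVT_INQUIRY_CHANGE_REPORTED";
    else if (asc == 0x3f && ascq == 0x0e)
        evt = "SDEV_EVT_LUN_CHANGE_REPORTED";

    if (asc == 0x38 && ascq == 0x07)
        evt = "SDEV_EVT_SOFT_THRESHOLD_REACHED_REPORTED";

    if (asc == 0x29)
        evt = "SDEV_EVT_POWER_ON_RESET_OCCURRED";

    if (asc == 0x2a && ascq == 0x01)
        evt = "SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED";
    else if (asc == 0x2a && ascq == 0x06)
        evt = "SDEV_EVT_ALUA_STATE_CHANGE_REPORTED";
    else if (asc == 0x2a && ascq == 0x09)
        evt = "SDEV_EVT_CAPACITY_CHANGE_REPORTED";

    return evt;
}
```

Other ASC 0x3f or 0x2a combinations only produce a kernel log message in the quoted code, so the sketch returns NULL for them.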

@mwilck
Contributor

mwilck commented Nov 29, 2024

Ok, I understand.

Do you still observe the crash with the current patch set?

@linzhanglong
Author

linzhanglong commented Nov 30, 2024

Ok, I understand.

Do you still observe the crash with the current patch set?

Ok. I am testing. I have a small question: if this issue occurs and pgindex is invalidated, where do we ensure that a map reload is triggered?

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 2, 2024
update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
settings. In this case, the map should be reloaded, but so far we don't
do this reliably. Add a call to reload_and_sync_map() to do_sync_mpp() to
clear this kind of inconsistency. In order to avoid endless reload loops,
limit the number of retries to 1.

Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Contributor

mwilck commented Dec 2, 2024

I have a small question: if this issue occurs and pgindex is invalidated, where do we ensure that a map reload is triggered?

That question isn't small. We currently don't. It's a subtle matter because we must avoid spurious map reloads, and endless reload loops. The difficult part is where we can safely reload the map. I've double-checked, and I think that do_sync_mpp() is the correct place to attempt a reload like this.
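
The loop-avoidance concern can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code in the commit: `struct map`, `sync_map()`, `reload_map()` and `sync_with_bounded_reload()` are hypothetical stand-ins, while the real logic lives in do_sync_mpp() and reload_and_sync_map(). The point is that a reload is attempted only when the sync flags an inconsistency, and that the number of attempts is capped:

```c
#include <stdbool.h>

/* Hypothetical, simplified map state; not the real struct multipath. */
struct map {
    bool need_reload;   /* set when the sync finds an inconsistency */
    int reloads;        /* counts reload attempts, for illustration */
    int broken_syncs;   /* how many more syncs will find a problem */
};

/* Stand-in for syncing multipathd's state with the kernel's. */
static void sync_map(struct map *m)
{
    m->need_reload = m->broken_syncs > 0;
    if (m->broken_syncs > 0)
        m->broken_syncs--;
}

/* Stand-in for reloading the map in the kernel. */
static void reload_map(struct map *m)
{
    m->reloads++;
}

/*
 * Sync, and if the map is flagged inconsistent, reload and sync again.
 * At most max_retries reloads are attempted. Returns true if the map
 * is consistent afterwards.
 */
static bool sync_with_bounded_reload(struct map *m, int max_retries)
{
    int retries = 0;

    sync_map(m);
    while (m->need_reload && retries < max_retries) {
        reload_map(m);
        retries++;
        sync_map(m);
    }
    return !m->need_reload;
}
```

With max_retries set to 1, a map that stays inconsistent is reloaded once and then left alone until the next regular sync.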

I have just pushed another commit (ab60145) to my "tip" branch. I'm curious to see if it works for your case (i.e. causes a map reload for the broken map).

@bmarzins, your opinion about that commit would also be highly appreciated, as you've made lots of changes around the checkerloop recently.

@mwilck
Contributor

mwilck commented Dec 2, 2024

Do you observe kernel messages like this?

LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.

Also, could you run udevadm monitor -k -p -s scsi when the LUN assignment changes, and see if the kernel sends any notification about the changed LUN assignments to user space?

@mwilck
Contributor

mwilck commented Dec 2, 2024

Note that the fact that multipathd receives no notifications about SCSI Unit Attention (UA) events is a long-standing problem. We have missing links on multiple levels here.
We might get a UA, but not necessarily on the path that had changed; it can be some other path device belonging to the same SCSI target. Even if the UA is received, it doesn't trigger a target rescan by the kernel, and even if the rescan is done, it doesn't trigger a block-level uevent, even if the device ID changes.

@mwilck
Contributor

mwilck commented Dec 2, 2024

@linzhanglong, can you describe your test procedure in detail? You change a LUN assignment on the storage side, and then what do you do on the host side?

@linzhanglong
Author

linzhanglong commented Dec 2, 2024

Do you observe kernel messages like this?

LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.

Also, could you run udevadm monitor -k -p -s scsi when the LUN assignment changes, and see if the kernel sends any notification about the changed LUN assignments to user space?

Hello, I just tested a single LUN. The reason I previously reported not receiving any uevents was that I was filtering the output by sdX device name. In fact, the uevents were received; they are as follows:

KERNEL[22778.730765] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-13/target15:0:7/15:0:7:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-13/target15:0:7/15:0:7:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=18718
SUBSYSTEM=scsi
UDEV_LOG=6
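
Since the event above is emitted for the SCSI device and carries no sd device name, a filter has to match on the properties instead. The following is a minimal sketch of such a check over the KEY=VALUE block printed by udevadm monitor -p; `uevent_has()` and `is_lun_change_event()` are hypothetical helpers written for illustration and are not part of udev or multipath-tools:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Look up KEY in a newline-separated "KEY=VALUE" block, as printed by
 * udevadm monitor -p. Returns true if KEY is present with exactly VALUE.
 */
static bool uevent_has(const char *props, const char *key, const char *value)
{
    size_t klen = strlen(key), vlen = strlen(value);
    const char *line = props;

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len == klen + 1 + vlen &&
            strncmp(line, key, klen) == 0 &&
            line[klen] == '=' &&
            strncmp(line + klen + 1, value, vlen) == 0)
            return true;
        line = eol ? eol + 1 : NULL;
    }
    return false;
}

/* True for a SCSI-level "reported LUNs changed" unit attention event. */
static bool is_lun_change_event(const char *props)
{
    return uevent_has(props, "SUBSYSTEM", "scsi") &&
           uevent_has(props, "SDEV_UA", "REPORTED_LUNS_DATA_HAS_CHANGED");
}
```

Matching on SUBSYSTEM and SDEV_UA rather than on a block device name avoids missing events like the one shown here.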

multipathd.log, 36b46e08100526565029487c700000108 is the WWID that the LUN changes to after I unmap it.

2024-12-02 22:39:47.929573 notice [multipathd:] sync_map_state: failing sdet state 2 dmstate 2
2024-12-02 22:39:47.929576 notice [multipathd:] sync_map_state: failing sdeg state 2 dmstate 2
2024-12-02 22:39:47.929580 notice [multipathd:] sync_map_state: failing sdeh state 2 dmstate 2
2024-12-02 22:39:47.929590 notice [multipathd:] sync_map_state: failing sdem state 2 dmstate 2
2024-12-02 22:39:47.929593 notice [multipathd:] sync_map_state: failing sdel state 2 dmstate 2
2024-12-02 22:39:47.929595 notice [multipathd:] sync_map_state: failing sden state 2 dmstate 2
2024-12-02 22:39:47.929606 notice [multipathd:] sync_map_state: failing sdei state 2 dmstate 2
2024-12-02 22:39:47.929610 notice [multipathd:] sync_map_state: failing sdej state 2 dmstate 2
2024-12-02 22:39:47.929613 notice [multipathd:] sync_map_state: failing sdek state 2 dmstate 2
2024-12-02 22:39:47.929623 notice [multipathd:] 36b46e08100526565029487c700000108: devmap dm-1 registered
2024-12-02 22:39:48.068179 err [multipathd:] 36b46e08100526565029487c700000108: path 8:16 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:48.068211 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:48.068215 err [multipathd:] 36b46e08100526565029487c700000108: path 69:192 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:48.068232 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:48.068235 err [multipathd:] 36b46e08100526565029487c700000108: path 70:32 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:48.068250 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:48.068255 err [multipathd:] 36b46e08100526565029487c700000108: path 67:224 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:48.068266 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:48.068270 err [multipathd:] 36b46e08100526565029487c700000108: path 68:64 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:48.068284 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:48.707487 err [multipathd:] 36b46e08100526565029487c700000108: path 8:16 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:48.707493 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:48.707496 err [multipathd:] 36b46e08100526565029487c700000108: path 69:192 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
...
2024-12-02 22:39:50.068846 err [multipathd:] 36b46e08100526565029487c700000108: path 68:64 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
2024-12-02 22:39:50.068850 notice [multipathd:] 36b46e08100526565029487c700000108: removing empty pathgroup 0
2024-12-02 22:39:50.068852 notice [multipathd:] checker failed path 129:0 in map 36b46e08100526565029487c700000108
2024-12-02 22:39:50.068894 notice [multipathd:] 36b46e08100526565029487c700000108: sdeo - rdac checker reports path is down: lun not connected
2024-12-02 22:39:50.068904 notice [multipathd:] 36b46e08100526565029487c700000108: switch to path group #1
2024-12-02 22:39:50.069057 err [multipathd:] 36b46e08100526565029487c700000108: path 8:16 WWID 36b46e0810052656512345678000103e8 doesn't match, removing from map
 # multipathd  show topo
36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:3:1    sdf  8:80   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:5:1    sdj  8:144  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:1:1    sdbk 67:224 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:5:1   sdas 66:192 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:2:1   sdaa 65:160 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:3:1   sdag 66:0   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:2:1    sdbq 68:64  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:0:1   sdh  8:112  active ready running
`-+- policy='round-robin 0' prio=50 status=enabled
  `- 15:0:6:1   sday 67:32  active ready running

@linzhanglong
Author

linzhanglong commented Dec 2, 2024

I retested by unmapping a LUN; these are the uevents for the unmapping. My previous environment had 256 LUNs.

 # udevadm  monitor -k -p -s scsi
custom logging function 0x1fe6010 registered
selinux=0
runtime dir '/run/udev'
calling: monitor
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[23808.067273] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-3/target15:0:2/15:0:2:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-3/target15:0:2/15:0:2:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19279
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.083982] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-4/target0:0:3/0:0:3:162 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-4/target0:0:3/0:0:3:162
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19281
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.098779] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-4/target15:0:3/15:0:3:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-4/target15:0:3/15:0:3:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19283
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.116540] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-9/target0:0:5/0:0:5:162 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-9/target0:0:5/0:0:5:162
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19285
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.131014] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-0/target0:0:0/0:0:0:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19287
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.134275] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-9/target15:0:4/15:0:4:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-9/target15:0:4/15:0:4:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19289
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.149047] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-10/target0:0:4/0:0:4:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-10/target0:0:4/0:0:4:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19291
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.156474] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-12/target15:0:5/15:0:5:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-12/target15:0:5/15:0:5:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19293
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.160778] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-12/target0:0:6/0:0:6:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-12/target0:0:6/0:0:6:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19295
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.171297] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-13/target0:0:7/0:0:7:1 (scsi)
ACTION=change
DEVPATH=/devices/pci0000:3a/0000:3a:00.0/0000:3b:00.0/host0/rport-0:0-13/target0:0:7/0:0:7:1
DEVTYPE=scsi_device
DRIVER=sd
MODALIAS=scsi:t-0x00
SDEV_UA=REPORTED_LUNS_DATA_HAS_CHANGED
SEQNUM=19297
SUBSYSTEM=scsi
UDEV_LOG=6

KERNEL[23808.180670] change   /devices/pci0000:3a/0000:3a:00.0/0000:3b:00.1/host15/rport-15:0-10/target15:0:6/15:0:6:1 (scsi)
ACTION=change
DEVPATH=/devices

  1. Test the unmap operation on a LUN.

1.1 before unmap LUN:

 # multipathd  show topo
36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:5:1  sdg 8:96  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:6:1  sdh 8:112 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:7:1  sdi 8:128 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:0:1  sdb 8:16  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:4:1  sdf 8:80  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:1:1  sdc 8:32  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:2:1  sdd 8:48  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:3:1  sde 8:64  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:0:1 sdj 8:144 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:6:1 sdw 65:96 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:1:1 sdk 8:160 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:5:1 sdv 65:80 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:3:1 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=50 status=enabled
  `- 15:0:2:1 sdl 8:176 active ready running

1.2 after unmap LUN:

36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 15:0:3:1 sdm 8:192  failed faulty running <------------------------------

36b46e0810052656512345678000103e8 dm-2 HUAWEI,XSG1
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 15:0:1:1 sdk 8:160  failed faulty running
  |- 0:0:4:1  sdf 8:80   failed faulty running
  |- 15:0:2:1 sdl 8:176  failed faulty running
  |- 0:0:3:1  sde 8:64   failed faulty running
  |- 15:0:0:1 sdj 8:144  failed faulty running
  |- 0:0:0:1  sdb 8:16   failed faulty running
  |- 15:0:6:1 sdw 65:96  failed faulty running
  |- 0:0:6:1  sdh 8:112  failed faulty running
  |- 15:0:4:1 sdu 65:64  failed faulty running
  |- 0:0:1:1  sdc 8:32   failed faulty running
  |- 15:0:5:1 sdv 65:80  failed faulty running
  |- 0:0:5:1  sdg 8:96   failed faulty running
  |- 15:0:7:1 sdx 65:112 failed faulty running
  |- 0:0:2:1  sdd 8:48   failed faulty running
  `- 0:0:7:1  sdi 8:128  failed faulty running
 
----- After some time:
36b46e0810052656512345678000103e8 dm-2 HUAWEI,XSG1
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 15:0:1:1 sdk 8:160  failed faulty running
  |- 0:0:4:1  sdf 8:80   failed faulty running
  |- 15:0:2:1 sdl 8:176  failed faulty running
  |- 0:0:3:1  sde 8:64   failed faulty running
  |- 15:0:0:1 sdj 8:144  failed faulty running
  |- 0:0:0:1  sdb 8:16   failed faulty running
  |- 15:0:6:1 sdw 65:96  failed faulty running
  |- 0:0:6:1  sdh 8:112  failed faulty running
  |- 15:0:4:1 sdu 65:64  failed faulty running
  |- 0:0:1:1  sdc 8:32   failed faulty running
  |- 15:0:5:1 sdv 65:80  failed faulty running
  |- 0:0:5:1  sdg 8:96   failed faulty running
  |- 15:0:7:1 sdx 65:112 failed faulty running
  |- 0:0:2:1  sdd 8:48   failed faulty running
  |- 15:0:3:1 sdm 8:192  failed faulty running
  `- 0:0:7:1  sdi 8:128  failed faulty running

1.3 remap LUN to host

36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 15:0:7:1 sdx 65:112 active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:5:1 sdv 65:80  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:7:1  sdi 8:128  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:4:1 sdu 65:64  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:1:1 sdk 8:160  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:4:1  sdf 8:80   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:2:1 sdl 8:176  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:3:1  sde 8:64   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:0:1 sdj 8:144  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:0:1  sdb 8:16   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:6:1 sdw 65:96  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:6:1  sdh 8:112  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:1:1  sdc 8:32   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:5:1  sdg 8:96   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:2:1  sdd 8:48   active ready running
`-+- policy='round-robin 0' prio=50 status=enabled
  `- 15:0:3:1 sdm 8:192  active ready running
  2. Test the LUN ID change operation on a LUN.

2.1 before change:
36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:7:1 sdx 65:112 failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:5:1 sdv 65:80  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:7:1  sdi 8:128  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:4:1 sdu 65:64  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:1:1 sdk 8:160  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:4:1  sdf 8:80   failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:2:1 sdl 8:176  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:3:1  sde 8:64   failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:0:1 sdj 8:144  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 15:0:6:1 sdw 65:96  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:6:1  sdh 8:112  failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:1:1  sdc 8:32   failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:5:1  sdg 8:96   failed faulty running
|-+- policy='round-robin 0' prio=0 status=enabled
| `- 0:0:2:1  sdd 8:48   failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 15:0:3:1 sdm 8:192  failed faulty running

----- After some time:
36b46e0810052656512345678000103e8 dm-2 HUAWEI,XSG1
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 0:0:0:1  sdb 8:16   failed faulty running
  |- 15:0:0:1 sdj 8:144  failed faulty running
  |- 0:0:1:1  sdc 8:32   failed faulty running
  |- 15:0:1:1 sdk 8:160  failed faulty running
  |- 0:0:2:1  sdd 8:48   failed faulty running
  |- 15:0:2:1 sdl 8:176  failed faulty running
  |- 0:0:3:1  sde 8:64   failed faulty running
  |- 15:0:3:1 sdm 8:192  failed faulty running
  |- 0:0:4:1  sdf 8:80   failed faulty running
  |- 15:0:4:1 sdu 65:64  failed faulty running
  |- 0:0:5:1  sdg 8:96   failed faulty running
  |- 15:0:5:1 sdv 65:80  failed faulty running
  |- 0:0:6:1  sdh 8:112  failed faulty running
  |- 15:0:6:1 sdw 65:96  failed faulty running
  |- 0:0:7:1  sdi 8:128  failed faulty running
  `- 15:0:7:1 sdx 65:112 failed faulty running

2.2 change back

36b46e0810052656512345678000103e8 dm-2 HUAWEI,XSG1
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 15:0:0:1 sdj 8:144  failed faulty running
  |- 0:0:1:1  sdc 8:32   failed faulty running
  |- 15:0:1:1 sdk 8:160  failed faulty running
  |- 0:0:2:1  sdd 8:48   failed faulty running
  |- 15:0:2:1 sdl 8:176  failed faulty running
  |- 0:0:3:1  sde 8:64   failed faulty running
  |- 15:0:3:1 sdm 8:192  failed faulty running
  |- 0:0:4:1  sdf 8:80   failed faulty running
  |- 15:0:4:1 sdu 65:64  failed faulty running
  |- 0:0:5:1  sdg 8:96   failed faulty running
  |- 15:0:5:1 sdv 65:80  failed faulty running
  |- 0:0:6:1  sdh 8:112  failed faulty running
  |- 15:0:6:1 sdw 65:96  failed faulty running
  |- 0:0:7:1  sdi 8:128  failed faulty running
  `- 15:0:7:1 sdx 65:112 failed faulty running
create: 36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  `- 0:0:0:1  sdb 8:16   active ready  running

----- After some time:
36b46e08100526565029487c700000108 dm-1 HUAWEI,XSG1
size=20G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:1  sdb 8:16   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:5:1  sdg 8:96   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:4:1  sdf 8:80   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:3:1  sde 8:64   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:2:1  sdd 8:48   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:1:1  sdc 8:32   active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:6:1  sdh 8:112  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 0:0:7:1  sdi 8:128  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:0:1 sdj 8:144  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:1:1 sdk 8:160  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:2:1 sdl 8:176  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:3:1 sdm 8:192  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:4:1 sdu 65:64  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:5:1 sdv 65:80  active ready running
|-+- policy='round-robin 0' prio=50 status=enabled
| `- 15:0:6:1 sdw 65:96  active ready running
`-+- policy='round-robin 0' prio=50 status=enab

@linzhanglong
Author

linzhanglong commented Dec 3, 2024

I will merge the patch later today and test it in the environment with 256 LUNs. From reading the code, it looks like it should solve the problem.

I noticed that there are related operations for reload_and_sync_map at the end of the do_check_path function. Would it be more appropriate to reload the map there?

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 3, 2024
pp->pgindex is set in disassemble_map() when a map is parsed.
There are various possibilities for this index to become invalid.
pp->pgindex is only used in enable_group() and followover_should_fallback(),
and both callers take no action if it is 0, which is the right
thing to do if we don't know the path's pathgroup.

Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm().
- when a pathgroup is deleted in update_pathvec_from_dm(). In this
  case, pgindex needs to be invalidated for all paths in all pathgroups
  after the one that was deleted.

The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
because we're looping over pg->paths in the former and over pg->pgp in
the latter, I think it's better to play safe.

Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
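
A minimal sketch of the invalidation rule described in this commit message. It assumes simplified stand-in types (`demo_path`, `demo_group`) and plain arrays instead of the real multipath-tools structures and vectors; every name here is hypothetical. Resetting the deleted group's own paths and the upper-bound check in `demo_group_of()` are choices made for this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PATHS 4

/* Hypothetical, simplified types; not the multipath-tools structs. */
struct demo_path {
    int pgindex;                 /* 1-based group index, 0 = unknown */
};

struct demo_group {
    struct demo_path *paths[MAX_PATHS];
    size_t npaths;
};

/*
 * Delete group `del` (0-based) from groups[0..*ngroups).
 * Groups after `del` move down by one slot, so the 1-based index stored
 * in each of their paths is stale and is reset to 0 ("unknown").
 */
static void demo_delete_group(struct demo_group *groups, size_t *ngroups,
                              size_t del)
{
    size_t i, j;

    assert(groups && ngroups && del < *ngroups);

    /* Paths of the deleted group no longer belong to any group. */
    for (j = 0; j < groups[del].npaths; j++)
        groups[del].paths[j]->pgindex = 0;

    for (i = del + 1; i < *ngroups; i++) {
        for (j = 0; j < groups[i].npaths; j++)
            groups[i].paths[j]->pgindex = 0;
        groups[i - 1] = groups[i];   /* shift the group down one slot */
    }
    (*ngroups)--;
}

/*
 * Lookup guarded like enable_group(): an index of 0 means "do nothing".
 * The upper-bound check is an extra safety net added for this sketch.
 */
static struct demo_group *demo_group_of(struct demo_group *groups,
                                        size_t ngroups,
                                        const struct demo_path *pp)
{
    if (pp->pgindex <= 0 || (size_t)pp->pgindex > ngroups)
        return NULL;
    return &groups[pp->pgindex - 1];
}
```

With three single-path groups, deleting the middle one leaves the first path's index untouched and resets the other two to 0, so a later lookup for them returns NULL instead of indexing a slot that has moved or no longer exists.
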
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 3, 2024
update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
settings. In this case, the map should be reloaded, but so far we don't
do this reliably. Add a call to reload_and_sync_map() to do_sync_mpp() to
clear this kind of inconsistency. In order to avoid endless reload loops,
limit the number of retries to 1.

Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 4, 2024
pp->pgindex is set in disassemble_map() when a map is parsed.
There are various possibilities for this index to become invalid.
pp->pgindex is only used in enable_group() and followover_should_fallback(),
and both callers take no action if it is 0, which is the right
thing to do if we don't know the path's pathgroup.

Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm().
- when a pathgroup is deleted in update_pathvec_from_dm(). In this
  case, pgindex needs to be invalidated for all paths in all pathgroups
  after the one that was deleted.

The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
because we're looping over pg->paths in the former and over pg->pgp in
the latter, I think it's better to play safe.

Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 4, 2024
update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
settings. In this case, the map should be reloaded, but so far we don't
do this reliably. Add a call to reload_and_sync_map() to do_sync_mpp() to
clear this kind of inconsistency. In order to avoid endless reload loops,
limit the number of retries to 1.

Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Contributor

mwilck commented Dec 4, 2024

I noticed that there are related operations for reload_and_sync_map at the end of the do_check_path function. Would it be more appropriate to reload the map there?

The need_reload flag is set in update_pathvec_from_dm(), which is called in the update_multipath_strings() code path. It makes sense to do the reload closely after that. do_sync_mpp() is where this check logically belongs [1]. The calls late in do_check_path() are meant to fix priorities.

However, I think now that calling reload_and_sync_map() in do_sync_mpp(), like in my patch, is not ideal. We'll be pointlessly repeating ioctls. It's fine to test this. I think I'll post an update to the patch.

Footnotes

  1. We have lots of similar code paths for refreshing some properties of the maps either in multipathd or in the kernel, and this is quite confusing even for people who've been working with this code base for years. Some day we'll need to clean this up.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 4, 2024
update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
settings. In this case, the map should be reloaded, but so far we don't
do this reliably. Add a call to reload_map() to do_sync_mpp() to
clear this kind of inconsistency. In order to avoid endless reload loops,
limit the number of retries to 1.

Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Contributor

mwilck commented Dec 4, 2024

However, I think now that calling reload_and_sync_map() in do_sync_mpp(), like in my patch, is not ideal.

Indeed, this was wrong. reload_and_sync_map() may actually end up removing the map we're just working on.

I've pushed a new commit, 01ec4fa 15747c2, that replaces the call to reload_and_sync_map() with one to just reload_map(). This is sufficient because we'll call update_multipath_strings() again in do_sync_mpp().

@bmarzins, your feedback would be appreciated.

@mwilck
Contributor

mwilck commented Dec 4, 2024

Sorry for posting the wrong commit. 01ec4fa is correct.

@linzhanglong
Author

linzhanglong commented Dec 4, 2024

Sorry for posting the wrong commit. 01ec4fa is correct.

Okay, I will test the new patch.

@bmarzins
Contributor

bmarzins commented Dec 5, 2024

@mwilck, is there a big benefit to retrying immediately that I'm overlooking? Is this to deal with the case where reload_map() fails, or are you worried about successfully reloading the map, but having update_multipath_strings() still flag it as needing to be reloaded again? It seems to me that instead of immediately retrying, we could just wait till the next path check to try again.

Also, I think we do want to call reload_and_sync_map(). Right now, every call to domap() in multipathd will call setup_multipath() and sync_map_state() afterwards. I think we want to keep that for all cases. I posted a patchset that contains an alternate version of this patch.

@linzhanglong
Author

Is there a big benefit to retrying immediately that I'm overlooking? Is this to deal with the case where reload_map() fails, or are you worried about successfully reloading the map, but having update_multipath_strings() still flag it as needing to be reloaded again? It seems to me that instead of immediately retrying, we could just wait till the next path check to try again.

Also, I think we do want to call reload_and_sync_map(). Right now, every call to domap() in multipathd will call setup_multipath() and sync_map_state() afterwards. I think we want to keep that for all cases. I posted a patchset that contains an alternate version of this patch.

Hello, regarding the patch set, where can I find it?

@mwilck
Contributor

mwilck commented Dec 5, 2024

Is there a big benefit to retrying immediately that I'm overlooking?

In my mind, need_reload indicates an inconsistent state in the kernel. While we do our best to make sure that the kernel won't actually use this path for I/O, I think that we should attempt to fix this situation rather sooner than later.

@bmarzins
Contributor

bmarzins commented Dec 5, 2024

@linzhanglong, the patches are available here:
https://lore.kernel.org/dm-devel/[email protected]/

@bmarzins
Contributor

bmarzins commented Dec 5, 2024

Is there a big benefit to retrying immediately that I'm overlooking?

In my mind, need_reload indicates an inconsistent state in the kernel. While we do our best to make sure that the kernel won't actually use this path for I/O, I think that we should attempt to fix this situation rather sooner than later.

Oops. I misread your code. I thought that you retried reload_map() once immediately, but you just loop to call update_multipath_strings() again. I still think that reload_and_sync_map() makes more sense, but you can ignore my question about retries.

My code should do the reload_and_sync_map() as often as yours. I just moved it after the prio refresh so that if we're going to do a reload because of that anyways, we won't reload the device twice (unless the device gets set to need_reload when we are syncing after the prio changed reload).

In my version of the patch, perhaps we could set a flag when need_reload is still set after the new call to reload_and_sync_map() in checkerloop, and clear it whenever the map is reloaded. Then we could check that flag and only require the mpp->synced_count > 0 check if the last reload of the map was solely because of need_reload, and it didn't help.

That would make multipathd respond to a new need_reload within a checker tick. If the reload didn't fix the problem, we would wait till the next time a path in the device is checked before trying again, which is the same speed as yours.

@mwilck
Contributor

mwilck commented Dec 5, 2024

I'd also prefer to do this kind of reload no more than once per tick.
But in your version of the patch, checkerloop() could release the lock before actually reloading the map after update_multipath_strings() detects an inconsistency. That sort of contradicts my "rather sooner than later" idea. I agree that possibly having to reload the map twice during a single tick is not beautiful, but given that such inconsistencies are very rare, IMO it shouldn't hurt much.

The idea of the retry was to check if update_pathvec_from_dm() still reports an inconsistency (which includes the case in which the reload failed, because in that case we'd have failed to fix the situation). I thought one immediate retry was warranted, given that this is a rare but serious error condition. But I've no idea about the likelihood that such an immediate retry would succeed.

I'll respond in the dm-devel thread.

@mwilck
Contributor

mwilck commented Dec 5, 2024

Another thought: we could attempt a single reload in do_sync_mpp() without an immediate retry (leaving need_reload set), and use your patch on top. This way, if the reload in do_sync_mpp() failed, we'd retry at the end of the tick, with all path properties adjusted, which probably makes more sense than retrying immediately.
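
The two-stage idea in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `tick_map`, `tick_reload()` and `tick_run()` are hypothetical names, and the `early_ok` / `late_ok` parameters simply simulate whether each reload attempt succeeds. It is not the code from either patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a map with a pending inconsistency. */
struct tick_map {
    bool need_reload;   /* set when the sync step found an inconsistency */
    int reloads;        /* number of reload attempts made */
};

/*
 * Simulated reload. `succeeds` models whether the attempt worked;
 * need_reload is cleared only on success.
 */
static bool tick_reload(struct tick_map *m, bool succeeds)
{
    m->reloads++;
    if (succeeds)
        m->need_reload = false;
    return succeeds;
}

/*
 * One checker tick: a single early attempt right after the sync step,
 * and a late attempt at the end of the tick only if the flag is still set.
 */
static void tick_run(struct tick_map *m, bool early_ok, bool late_ok)
{
    assert(m);

    if (m->need_reload)
        tick_reload(m, early_ok);   /* early: no immediate retry on failure */

    /* ... path checking and priority refresh would happen here ... */

    if (m->need_reload)
        tick_reload(m, late_ok);    /* late: retry with updated path state */
}
```

In this model a successful early reload costs one reload per tick, and a failed early reload is retried exactly once at the end of the same tick.
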

@bmarzins
Contributor

bmarzins commented Dec 5, 2024

I thought one immediate retry was warranted, given that this is a rare but serious error condition. But I've no idea about the likelihood that such an immediate retry would succeed.

But your code checks retry++ < MAX_RETRIES so that chunk will only run once and won't ever call reload_map() after jumping to try_again (retry will be 1 and MAX_RETRIES is 1). Even if mpp->need_reload gets set when it tries again, it will skip the code to reload the map. Right? That's what I didn't notice originally.
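
To make the counter behaviour concrete, here is a small self-contained sketch. It assumes a loop shaped like the one described in this comment (a `try_again` label, a `retry++ < MAX_RETRIES` check, `MAX_RETRIES` set to 1); the `fake_*` helpers and `sync_with_retry()` are hypothetical stand-ins, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RETRIES 1

/* Hypothetical stand-in for the daemon's map state. */
struct fake_map {
    bool need_reload;   /* set by the simulated sync step */
    int reload_calls;   /* how often the simulated reload ran */
    int sync_calls;     /* how often the simulated sync ran */
};

/* Simulated update_multipath_strings(): always reports an inconsistency. */
static void fake_sync(struct fake_map *m)
{
    m->sync_calls++;
    m->need_reload = true;
}

/* Simulated reload_map(): only counts the call. */
static void fake_reload(struct fake_map *m)
{
    m->reload_calls++;
}

static void sync_with_retry(struct fake_map *m)
{
    int retry = 0;

    assert(m);
try_again:
    fake_sync(m);
    /* First pass: 0 < 1 is true. Second pass: 1 < 1 is false. */
    if (m->need_reload && retry++ < MAX_RETRIES) {
        fake_reload(m);
        goto try_again;
    }
}
```

Even when the sync step keeps reporting need_reload, the reload runs once and the sync step runs twice: the second pass re-reads the state but skips the reload branch.
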

@mwilck
Contributor

mwilck commented Dec 5, 2024

Right, now I misinterpreted my own code :-)

Indeed the idea was just to retry reading the kernel parameters. I didn't intend to reload multiple times.

mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 6, 2024
pp->pgindex is set in disassemble_map() when a map is parsed.
There are various possibilities for this index to become invalid.
pp->pgindex is only used in enable_group() and followover_should_fallback(),
and both callers take no action if it is 0, which is the right
thing to do if we don't know the path's pathgroup.

Make sure pp->pgindex is reset to 0 in various places:
- when it's orphaned,
- before (re)grouping paths,
- when we detect a bad mpp assignment in update_pathvec_from_dm().
- when a pathgroup is deleted in update_pathvec_from_dm(). In this
  case, pgindex needs to be invalidated for all paths in all pathgroups
  after the one that was deleted.

The hunk in group_paths is mostly redundant with the hunk in free_pgvec(), but
because we're looping over pg->paths in the former and over pg->pgp in
the latter, I think it's better to play safe.

Fixes: 99db1bd ("[multipathd] re-enable disabled PG when at least one path is up")
Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
Reviewed-by: Benjamin Marzinski <[email protected]>
mwilck added a commit to openSUSE/multipath-tools that referenced this issue Dec 6, 2024
update_pathvec_from_dm() may set mpp->need_reload if it finds inconsistent
settings. In this case, the map should be reloaded, but so far we don't
do this reliably. A previous patch added a call to reload_and_sync_map()
in the CHECKER_FINISHED state, but in the meantime the checker may have
waited for checker threads to finish, and may have dropped and re-acquired the
vecs lock. As mpp->need_reload is a serious but rare condition, also try
to fix it early in the checker loop. Because of the previous patch, we
can call reload_and_sync_map() here.

Fixes: opensvc#105
Signed-off-by: Martin Wilck <[email protected]>
@mwilck
Contributor

mwilck commented Dec 9, 2024

My tip branch now contains Ben's fixes plus an improved version of mine above, plus some cleanup.

@linzhanglong, please provide feedback.

@linzhanglong
Author

Sorry for posting the wrong commit. 01ec4fa is correct.

Okay, I will test the new patch.

I have merged the changes from this patch into the test environment without any issues. If there are any new changes, I will merge and test them today.

@linzhanglong
Author

Test Okay

@linzhanglong
Author

Hello, when will the new version be released? Is the branch at https://github.com/openSUSE/multipath-tools/tree/tip the next version?
