MPIBackend error on Windows 10: "Unknown option: --use-hwthread-cpus" #589

Open
rythorpe opened this issue Jan 19, 2023 · 13 comments · May be fixed by #590
Comments

@rythorpe
Contributor

init network
drive type is Rhythmic, location=proximal
drive type is Rhythmic, location=distal
drive type is Evoked, location=distal
drive type is Evoked, location=proximal
drive type is Evoked, location=proximal
drive type is Evoked, location=proximal
drive type is Poisson, location=proximal
start simulation
MPI will run 2 trial(s) sequentially by distributing network neurons over 11 processes.
Unknown option: --use-hwthread-cpus

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
File ~\Documents\GitHub\hnn-core\hnn_core\gui\gui.py:1307, in run_button_clicked(widget_simulation_name, log_out, drive_widgets, all_data, dt, tstop, ntrials, backend_selection, mpi_cmd, n_jobs, params, simulation_status_bar, simulation_status_contents, connectivity_sliders, viz_manager)
   1305     with backend:
   1306         simulation_status_bar.value = simulation_status_contents['running']
-> 1307         simulation_data[_sim_name]['dpls'] = simulate_dipole(
   1308             simulation_data[_sim_name]['net'],
   1309             tstop=tstop.value,
   1310             dt=dt.value,
   1311             n_trials=ntrials.value)
   1313         simulation_status_bar.value = simulation_status_contents[
   1314             'finished']
   1316 viz_manager.reset_fig_config_tabs()

File ~\Documents\GitHub\hnn-core\hnn_core\dipole.py:100, in simulate_dipole(net, tstop, dt, n_trials, record_vsec, record_isec, postproc)
     95 if postproc:
     96     warnings.warn('The postproc-argument is deprecated and will be removed'
     97                   ' in a future release of hnn-core. Please define '
     98                   'smoothing and scaling explicitly using Dipole methods.',
     99                   DeprecationWarning)
--> 100 dpls = _BACKEND.simulate(net, tstop, dt, n_trials, postproc)
    102 return dpls

File ~\Documents\GitHub\hnn-core\hnn_core\parallel_backends.py:717, in MPIBackend.simulate(self, net, tstop, dt, n_trials, postproc)
    712 print(f"MPI will run {n_trials} trial(s) sequentially by "
    713       f"distributing network neurons over {self.n_procs} processes.")
    715 env = _get_mpi_env()
--> 717 self.proc, sim_data = run_subprocess(
    718     command=self.mpi_cmd, obj=[net, tstop, dt, n_trials], timeout=30,
    719     proc_queue=self.proc_queue, env=env, cwd=os.getcwd(),
    720     universal_newlines=True)
    722 dpls = _gather_trial_data(sim_data, net, n_trials, postproc)
    723 return dpls

File ~\Documents\GitHub\hnn-core\hnn_core\parallel_backends.py:174, in run_subprocess(command, obj, timeout, proc_queue, *args, **kwargs)
    171 if not sent_network:
    172     # Send network object to child so it can start
    173     try:
--> 174         _write_net(proc.stdin, pickled_obj)
    175     except BrokenPipeError:
    176         # child failed during _write_net(). get the
    177         # output and break out of loop on the next
    178         # iteration
    179         warn("Received BrokenPipeError exception. "
    180              "Child process failed unexpectedly")

File ~\Documents\GitHub\hnn-core\hnn_core\parallel_backends.py:475, in _write_net(stream, pickled_net)
    473 stream.flush()
    474 stream.write('@start_of_net@')
--> 475 stream.write(pickled_net.decode())
    476 stream.write('@end_of_net:%d@\n' % len(pickled_net))
    477 stream.flush()

OSError: [Errno 22] Invalid argument
@rythorpe rythorpe changed the title MPIBackend on Windows 10: "Unknown option: --use-hwthread-cpus MPIBackend on Windows 10: "Unknown option: --use-hwthread-cpus" Jan 19, 2023
@rythorpe rythorpe changed the title MPIBackend on Windows 10: "Unknown option: --use-hwthread-cpus" MPIBackend error on Windows 10: "Unknown option: --use-hwthread-cpus" Jan 19, 2023
@rythorpe
Contributor Author

@dylansdaniels can you check to see if you get the same thing with a development installation off of the master branch on Windows?

@jasmainak
Collaborator

@rythorpe what distribution of MPI did you install?

@jasmainak
Collaborator

I think this logic has to be OS-dependent or MPI-dependent, since the --use-hwthread-cpus option is only a feature of OpenMPI, not MSMPI.
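
As a rough illustration of what that OS-/MPI-dependent logic could look like (a minimal sketch, not hnn-core's actual implementation; the `ompi_info`-based detection and the `build_mpi_cmd` helper name are assumptions):

```python
import platform
import shutil


def build_mpi_cmd(n_procs):
    """Sketch: only pass the OpenMPI-specific flag when OpenMPI is in use."""
    cmd = ['mpiexec', '-np', str(n_procs)]
    # `--use-hwthread-cpus` is an OpenMPI option; MSMPI on Windows rejects it
    # with "Unknown option". Detecting OpenMPI via its `ompi_info` utility is
    # one possible heuristic (an assumption, not what hnn-core does today).
    if platform.system() != 'Windows' and shutil.which('ompi_info') is not None:
        cmd.append('--use-hwthread-cpus')
    return cmd
```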

@rythorpe
Contributor Author

Oh shoot, I think you might be right 🤦‍♂️ There are probably many aspects of our MPIBackend that are currently incompatible with Windows. For instance, the feature I was trying to implement in #506 will probably need a bit of work before it runs on a Windows platform.

@dylansdaniels
Collaborator

@rythorpe just getting to this. Do you still want me to test? Or is it a moot point since Windows doesn't support OpenMPI? I'll go ahead and do a fresh fork and get it set up on my Windows computer in any case.

@rythorpe
Contributor Author

It's always nice to have more eyes on it, but I wouldn't worry about testing it for now. Maybe once we get the present issue resolved we can both do separate installations and try to break it :)

@dylansdaniels
Collaborator

Sounds good, I'll be ready for it :)

@jasmainak
Collaborator

@rythorpe I would suggest that we copy these lines from the Neuron CIs so we avoid regressions in the future. The CIs will initially fail but you can work backwards making the CI pass like in TDD

@jasmainak
Collaborator

It would also be nice to update this document once we figure out how to make it work: https://jonescompneurolab.github.io/hnn-core/stable/parallel.html#mpi

@rythorpe
Contributor Author

@rythorpe I would suggest that we copy these lines from the Neuron CIs so we avoid regressions in the future. The CIs will initially fail but you can work backwards making the CI pass like in TDD

I'm pretty sure MS-MPI is distributed with NEURON and is thus automatically installed during our unit test CIs. That's how it ended up on my Windows installation, at least.

I'm guessing the reason it doesn't show up is that the --use-hwthread-cpus option somehow isn't getting passed in our tests.

@jasmainak
Collaborator

Umm... I don't think so. See here for the last Windows CI run on master:

[screenshot of the Windows CI test results on master]

See the skipped test on parallel backends.

@rythorpe
Contributor Author

Ugh. Remind me again why we set up the MPIBackend tests to fail silently? I think we should consider reverting that, since the MPIBackend will most likely be the default for new users in workshops, etc.

@jasmainak
Collaborator

No, they don't fail. They just get skipped if MPI is not installed. It keeps the barrier low for new developers.
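
For context, the skip-rather-than-fail behavior typically looks something like the sketch below (illustrative only; the exact marker and test names in hnn-core may differ):

```python
import shutil

import pytest

# Illustrative sketch: skip the MPI backend tests when no `mpiexec` is on the
# PATH, so developers without an MPI installation see a skip, not a failure.
requires_mpi = pytest.mark.skipif(shutil.which('mpiexec') is None,
                                  reason='mpiexec (MPI) not installed')


@requires_mpi
def test_mpi_backend_simulation():
    ...
```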

@jasmainak jasmainak linked a pull request Jan 20, 2023 that will close this issue
asoplata added a commit to asoplata/hnn-core that referenced this issue Dec 13, 2024
This takes George's old GUI-specific `_available_cores()` method, moves
it, and greatly expands it to include updates to the logic about cores
and hardware-threading which was previously inside
`MPIBackend.__init__()`. This was necessary due to the number of common
but different outcomes based on platform, architecture,
hardware-threading support, and user choice. These changes do not
involve very many lines of code, but a good amount of thought and
testing has gone into them. Importantly, these `MPIBackend` API changes
are backwards-compatible, and no changes to current usage code are
needed. I suggest you read the long comments in
`parallel_backends.py::_determine_cores_hwthreading()` outlining how
each variation is handled.

Previously, if the user did not provide the number of MPI Processes they
wanted to use, `MPIBackend` assumed that the number of detected
"logical" cores would suffice. As George previously showed, this does
not work for HPC environments like on OSCAR, where the only true number
of cores that we are allowed to use is found by
`psutil.Process().cpu_affinity()`, the "affinity" core number. There is
a third type of number of cores besides "logical" and "affinity" which
is important: "physical". However, there was an additional problem here
that was still unaddressed: hardware-threading. Different platforms and
situations report different numbers of logical, affinity, and physical
CPU cores. One of the factors that affects this is if there is
hardware-threading present on the machine, such as Intel
Hyperthreading. In the case of an example Linux laptop having an Intel
chip with Hyperthreading, the logical and physical core numbers will
report different values with respect to each other: logical includes
Hyperthreads
(e.g. `psutil.cpu_count(logical=True)` reports 8 cores), but physical
does not
(e.g. `psutil.cpu_count(logical=False)` reports 4 cores). If we tell MPI
to use 8 cores ("logical"), then we ALSO need to tell it to also enable
the hardware-threading option. However, if the user does not want to
enable hardware-threading, then we need to make this an option, tell MPI
to use 4 cores
("physical"), and tell MPI to not use the hardware-threading option. The
"affinity" core number makes things even more complicated, since in the
Linux laptop example, it is equal to the logical core number. However,
on OSCAR, it is very different from the logical core number, and on
macOS, it is not present at all.
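
For reference, the three counts discussed above can be queried with `psutil` roughly as follows (the example values in the comments assume the 4-core Hyperthreaded Linux laptop described earlier):

```python
import psutil

logical = psutil.cpu_count(logical=True)    # e.g. 8: includes hardware threads
physical = psutil.cpu_count(logical=False)  # e.g. 4: physical cores only
try:
    # Cores this process is actually allowed to use (the relevant number on
    # HPC schedulers such as OSCAR); cpu_affinity() is not available on macOS.
    affinity = len(psutil.Process().cpu_affinity())
except AttributeError:
    affinity = None
```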

In `_determine_cores_hwthreading()`, if you read the lengthy comments, I
have thought through each common scenario, and I believe resolved what
to do for each, with respect to the number of cores to use and whether
or not to use hardware-threading. These scenarios include: the user
choosing to use hardware-threading (default) or not, across macOS
variations with and without hardware-threading, Linux local computer
variations with and without hardware-threading, and Linux
HPC (e.g. OSCAR) variations which appear to never support
hardware-threading. In the Windows case, due to both jonescompneurolab#589 and the
currently-untested MPI integration on Windows, I always report the
machine as not having hardware-threading.

Additionally, previously, if the user did provide a number for MPI
Processes, `MPIBackend` used some "heuristics" to decide whether to use
MPI oversubscription and/or hardware-threading, but the user could not
override these heuristics. Now, when a user instantiates an `MPIBackend`
with `__init__()` and uses the defaults, hardware-threading is detected
more robustly and enabled by default, and oversubscription is enabled
based on its own heuristics; this is the case when the new arguments
`hwthreading` and `oversubscribe` are set to their default value of
`None`. However, if the user knows what they're doing, they can also
pass either `True` or `False` to either of these options to force them
on or off. Furthermore, in the case of `hwthreading`, if the user
indicates they do not want to use it, then
`_determine_cores_hwthreading()` correctly returns the number of
NON-hardware-threaded cores for MPI's use, instead of the core number
including hardware-threads.
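
In terms of usage, the new keyword arguments are passed at construction; a minimal sketch, assuming the usual public import path:

```python
from hnn_core import MPIBackend

# Defaults (both None): core count, hardware-threading, and oversubscription
# are determined automatically by the heuristics described above.
backend = MPIBackend()

# Or override the heuristics explicitly, e.g. disable hardware-threading and
# force oversubscription on.
backend = MPIBackend(hwthreading=False, oversubscribe=True)
```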

I have also modified and expanded the appropriate testing to compensate
for these changes.

Note that this does NOT change the default number of jobs to use for the
GUI if MPI is detected. Such a change breaks the current `test_gui.py`
testing: see jonescompneurolab#960