Skip to content

Commit

Permalink
Fix dynamo issue (#6527)
Browse files Browse the repository at this point in the history
Dynamo use faketensor to trace tensor ops. In some case, the mechanism
break compiling with deepspeed.

An example could be found at
https://gist.github.com/oraluben/9b8240c2fe482eb4382453d6c97a5f76, to
see issues, install deepspeed==0.14.4 instead of my fork

without this PR, llama cannot be compiled.

Detailed explanation:

1. `ZeROOrderedDict`
dynamo use deepcopy to copy tensors, which will call
`object.__reduce__`. When copying `ZeROOrderedDict`, the default
implementation do not copy its `_parent_module` and will lead to
failure.
2. `param` maybe faketensor and do not have `ds_status` yet, but during
tracing it's ok to just skip the `register_external_parameter`, it
should be done ways before.

---------

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
  • Loading branch information
4 people authored Oct 25, 2024
1 parent 6e6563d commit 3d5cf73
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion deepspeed/runtime/zero/parameter_offload.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ def _apply_forward_and_backward_to_tensors_only(module, forward_function, backwa

class ZeROOrderedDict(OrderedDict):

def __init__(self, parent_module=None, *args, **kwargs):
def __init__(self, parent_module, *args, **kwargs):
"""A replacement for ``collections.OrderedDict`` to detect external ZeRO params.
Args:
Expand All @@ -49,13 +49,18 @@ def __init__(self, parent_module=None, *args, **kwargs):
self._parent_module = parent_module
self._in_forward = False

def __reduce__(self):
r0, _, *r2 = super().__reduce__()
return (r0, (self._parent_module, )) + r2

This comment has been minimized.

Copy link
@keskival

keskival Nov 26, 2024

Contributor

This needs to be:

(r0, (self._parent_module, )) + tuple(r2)

Because r2 is a list, even though __reduce__() returns a tuple.

As it is, it gives:

TypeError: can only concatenate tuple (not "list") to tuple

@oraluben


def __getitem__(self, key):
param = super().__getitem__(key)

# Params can be registered as None (e.g., bias)
if param is None:
return param

# TODO: only weaken this check during compilation
if hasattr(param, "ds_status") and param.ds_status == ZeroParamStatus.NOT_AVAILABLE:
if self._parent_module._parameters._in_forward:
register_external_parameter(FWD_MODULE_STACK[-1], param)
Expand Down

0 comments on commit 3d5cf73

Please sign in to comment.