evaluate_functional_correctness can't run #18

Closed
BoyuanJackChen opened this issue Nov 28, 2022 · 9 comments

@BoyuanJackChen

BoyuanJackChen commented Nov 28, 2022

I created a conda environment with Python 3.7 using the exact command from the docs. Then I used OpenAI's text-davinci-002 to generate a samples.jsonl file with 3 results for each problem.

Running evaluate_functional_correctness samples.jsonl produced the error below. I also tried evaluating the example samples with evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl and got the same error.

How can I fix it?

Error message:
Reading samples...
6it [00:00, 7427.93it/s]
Running test suites...
0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/codex/bin/evaluate_functional_correctness", line 33, in <module>
    sys.exit(load_entry_point('human-eval', 'console_scripts', 'evaluate_functional_correctness')())
  File "/opt/miniconda3/envs/codex/bin/evaluate_functional_correctness", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/opt/miniconda3/envs/codex/lib/python3.8/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/opt/miniconda3/envs/codex/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluate_functional_correctness.py", line 30, in <module>
    sys.exit(main())
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluate_functional_correctness.py", line 27, in main
    fire.Fire(entry_point)
  File "/opt/miniconda3/envs/codex/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/miniconda3/envs/codex/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/miniconda3/envs/codex/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluate_functional_correctness.py", line 22, in entry_point
    results = evaluate_functional_correctness(sample_file, k, n_workers, timeout, problem_file)
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/evaluation.py", line 75, in evaluate_functional_correctness
    result = future.result()
  File "/opt/miniconda3/envs/codex/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/opt/miniconda3/envs/codex/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/miniconda3/envs/codex/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/boyuanchen/Desktop/human-eval/human_eval/execution.py", line 77, in check_correctness
    p.start()
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/miniconda3/envs/codex/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'check_correctness.<locals>.unsafe_execute'
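
The last frames of the traceback suggest the cause. The child process is started through popen_spawn_posix, that is, the "spawn" start method (the default on macOS since Python 3.8 and the only one available on Windows). With spawn, the Process object and its target are pickled, and the standard pickle module cannot serialize a function defined inside another function, which is what unsafe_execute is inside check_correctness. The sketch below reproduces only that pickling failure; the names outer, inner and can_pickle are hypothetical and not part of human-eval.

```python
import json
import pickle


def outer():
    # Hypothetical stand-in for check_correctness: it defines a nested function,
    # just as check_correctness defines unsafe_execute.
    def inner():
        return "ok"

    return inner


def can_pickle(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Depending on the Python version, a nested function raises either
        # AttributeError ("Can't pickle local object ...") or PicklingError.
        return False


# A module-level function is pickled by reference (module name + function name).
print("module-level function picklable:", can_pickle(json.dumps))
# A nested function has no importable name, so pickling it fails.
print("nested function picklable:", can_pickle(outer()))
```

This is consistent with the fixes suggested in the replies: either make the target importable by name, or use the multiprocess package, which serializes with dill and can handle nested functions.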

@ghost

ghost commented Dec 13, 2022

If you are on Windows, this problem can be solved by declaring unsafe_execute as a global variable just above the function.

After that you will hit an AttributeError for unsafe_execute, which can be solved by installing the multiprocess library and replacing multiprocessing with multiprocess.

Another problem on Windows is a result of pass@=0 instead of 0.4999. For the sample sanity check only, this can be solved by removing the timeout function in execution.py. For testing generated samples you still need a timeout, so you have to find an alternative to the setitimer() function, which is what causes the pass@=0 issue.

@WelliJohn

When I run it on a Mac, I get the same error. Can anyone help? Thanks a lot.

@jacob1017

Reinstall multiprocess, replace the uses of multiprocessing in human_eval/execution.py with it, and you are good to go.

@tianzhaotju

When I run it on a Mac, I get the same error. Can anyone help? Thanks a lot.

So sad! I have the same problem too. Did you solve it, my friend? Thanks.

@davide221

davide221 commented Jun 27, 2023

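After installing multiprocess and switching the imports from multiprocessing to multiprocess, what worked for me was the following.

Windows machine:

import contextlib
import threading

class TimeoutException(Exception):
    pass

@contextlib.contextmanager
def time_limit(seconds: float):
    # Windows has no signal.setitimer or SIGALRM, so a timer thread is used instead.
    # Note: the exception is raised inside the timer thread, so it does not
    # interrupt the code running in the `with` body.
    timer = threading.Timer(seconds, lambda: (_ for _ in ()).throw(TimeoutException("Timed out!")))
    timer.start()
    try:
        yield
    finally:
        # Stop the timer if the body finished before the deadline.
        timer.cancel()

Linux machine:

import contextlib
import signal

class TimeoutException(Exception):
    pass

@contextlib.contextmanager
def time_limit(seconds: float):
    def signal_handler(signum, frame):
        raise TimeoutException("Timed out!")
    # Install the handler before arming the timer so SIGALRM is never
    # delivered while the default action is still in place.
    signal.signal(signal.SIGALRM, signal_handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer whether or not the body timed out.
        signal.setitimer(signal.ITIMER_REAL, 0)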
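
One caveat with a threading.Timer based time_limit is that an exception raised in the timer callback stays in the timer thread and never reaches the code inside the with block. A possible alternative, sketched below under the assumption that the guarded code runs Python bytecode (it cannot interrupt a blocking C call), is to ask the interpreter to raise the exception in the thread that entered the context manager, using the CPython C API function PyThreadState_SetAsyncExc through ctypes. This is not what human-eval ships; busy_wait is a hypothetical helper used only for the demonstration.

```python
import contextlib
import ctypes
import threading
import time


class TimeoutException(Exception):
    pass


@contextlib.contextmanager
def time_limit(seconds: float):
    # Remember which thread entered the `with` block.
    target_thread_id = threading.get_ident()

    def raise_in_target() -> None:
        # Schedule TimeoutException to be raised in the target thread.
        # CPython delivers it the next time that thread runs Python bytecode.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(target_thread_id),
            ctypes.py_object(TimeoutException),
        )

    timer = threading.Timer(seconds, raise_in_target)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    try:
        yield
    finally:
        # Cancels a timer that has not fired yet. If it fires at the very
        # moment the body finishes, the exception can still arrive afterwards.
        timer.cancel()


def busy_wait(seconds: float) -> None:
    """Hypothetical stand-in for a long-running, pure-Python computation."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


if __name__ == "__main__":
    try:
        with time_limit(0.2):
            busy_wait(5.0)
    except TimeoutException:
        print("timed out")
```

Because this relies only on ctypes and threading, it behaves the same on Windows, macOS and Linux, at the cost of being CPython-specific.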

@pnewhook

pnewhook commented Oct 6, 2023

On a Mac with Python 3.10, installing multiprocess and replacing the imports and usages of multiprocessing with it worked for me.

haesleinhuepf added a commit to haesleinhuepf/human-eval-bia that referenced this issue Mar 23, 2024
@RyanLoil

RyanLoil commented Mar 28, 2024

For Windows users, a few more things need to be done:

  1. Add if __name__ == "__main__": before the sys.exit(main()) call; otherwise you will get an error that recommends using "freeze_support()".
  2. Install multiprocess and replace the imports and usages of multiprocessing with it, or you will get a "Ran out of input" error. Ignore the IDE's "can't find **()" warnings: the multiprocess package doesn't declare those names in __init__.py, but it still works.
  3. Changing the time_limit function is necessary, as the original one doesn't work properly.
  4. The TimeoutException class definition is optional.
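
Item 1 matters because, with the spawn start method, each child process imports the main module again. Any unguarded module-level call, such as sys.exit(main()), would then run again inside the child, and Python aborts with the error that mentions freeze_support(). A minimal sketch of the guarded layout follows; child_task and main are hypothetical names, and the real script calls fire.Fire(entry_point) inside main.

```python
import multiprocessing


def child_task() -> None:
    # Defined at module level so a spawned child can import it by name.
    print("hello from the child process")


def main() -> int:
    process = multiprocessing.Process(target=child_task)
    process.start()
    process.join(timeout=10)
    # exitcode is 0 on success and None if the child is still running.
    return process.exitcode


if __name__ == "__main__":
    # Only the parent process runs this block; a spawned child that
    # re-imports this module skips it instead of starting children itself.
    multiprocessing.freeze_support()  # only has an effect in frozen Windows executables
    print("child exit code:", main())
```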

@NA-Wen

NA-Wen commented May 20, 2024

Reinstall multiprocess, replace the uses of multiprocessing in human_eval/execution.py with it, and you are good to go.

This works well for me on Windows 11. Thank you so much!

@BoyuanJackChen (Author)

Thanks all! I have resolved the issue with the advice given.
