The repo contains code for the following paper:
Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge (NeurIPS D&B Track 2024)
Authors: Weihua Du*, Qiushi Lyu*, Jiaming Shan, Zhenting Qi, Hongxin Zhang, Sunli Chen, Andi Peng, Tianmin Shu, Kwonjoon Lee, Behzad Dariush, Chuang Gan
You can view the [Project Page] for video demos.
We introduce Constrained Human-AI Cooperation (CHAIC), an inclusive embodied social intelligence challenge designed to test social perception and cooperation in embodied agents. In CHAIC, the goal is for an embodied agent equipped with egocentric observations to assist a human who may be operating under physical constraints—e.g., unable to reach high places or confined to a wheelchair—in performing common household or outdoor tasks as efficiently as possible. To achieve this, a successful helper must: (1) infer the human's intents and constraints by following the human and observing their behaviors (social perception), and (2) make a cooperative plan tailored to the human user to solve the task as quickly as possible, working together as a team (cooperative planning).
To benchmark this challenge, we create four new agents with real physical constraints and eight long-horizon tasks featuring both indoor and outdoor scenes with various constraints, emergency events, and potential risks. We benchmark planning- and learning-based baselines on the challenge and introduce a new method that leverages Large Language Models and behavior modeling. Empirical evaluations demonstrate the effectiveness of our benchmark in enabling systematic assessment of key aspects of machine social intelligence.
Step 1: Run the following commands step by step to set up the environment:
conda create -n CHAIC python=3.9
conda activate CHAIC
pip3 install -e .
pip3 install torch==1.13.1+cu117 torchvision==0.14.1+cu117 \
torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
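If you want a quick sanity check that the pinned PyTorch build can see your GPU (optional, not part of the official setup), you can run:

```python
# Optional sanity check that the pinned PyTorch build sees the GPU.
import torch

print(torch.__version__)           # expected: 1.13.1+cu117
print(torch.cuda.is_available())   # True on a working CUDA setup
```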
If you are running on a remote server without a screen, please refer to the documentation on running TDW on a server.
After that, you can run the demo scene to verify your setup:
python demo/demo_scene.py
Step 2: Install and download pre-trained perception models:
pip install -U openmim
mim install mmengine
mim install mmcv==2.1.0
mim install mmdet
pip install mmaction2
bash detection_pipeline/download_ckpt.sh
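Before moving on, you can optionally verify that the OpenMMLab packages import cleanly (a quick check we suggest here, not part of the official setup):

```python
# Optional sanity check that the OpenMMLab stack imports cleanly.
import mmengine
import mmcv
import mmdet
import mmaction  # installed by `pip install mmaction2`

for pkg in (mmengine, mmcv, mmdet, mmaction):
    print(pkg.__name__, pkg.__version__)
```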
Step 3: Download the assets needed for the environment:
bash scripts/download_local_asset.sh
After that, you can run the perception demos to verify the models:
python tdw-gym/detection.py
python tdw-gym/behavior.py
Notice: the mmaction package may contain internal bugs; if you run into trouble, refer to the GitHub issue for a fix.
Some important folders and their corresponding functions are listed here.
|__ tdw-gym/ Main code
|
|__ scenes/ Code for dataset generation
|
|__ dataset/ Dataset configuration and storage
|
|__ transport_challenge_multi_agent/ Low level controller
|
|__ scripts/ Scripts for running experiments
|
|__ detection_pipeline/ Code for perception models
|
|__ LM_agent/ LLM & VLM Prompt
We provide all the experiment scripts under the scripts folder. For example, to run experiments with the Random Helper in the high-thing setting, you can use the following command:
bash scripts/random_helper/test_high_thing_random_helper.sh
By adding --gt_mask or --gt_behavior in the scripts, the environment will provide ground-truth object segmentation masks or ground-truth behaviors of the partner, respectively.
Notice: If you want to test the LLM+BM helper or the VLM helper, you need to fill in your AzureOpenAI settings or OPENAI_API_KEY at lines 73-77 in LM_agent/LLM.py or lines 74-78 in LM_agent/VLM.py.
Agents may take different numbers of frames to finish (or fail) an action. An environment step completes as soon as any agent's action leaves the ongoing status, at which point the current observation is returned to all agents. All agents are then asked for a new action; an agent whose current action is still ongoing switches to the new action only if the action changes.
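To make the semantics concrete, here is a schematic of that loop in Python. This is illustration only: env, its methods, and the episode structure are hypothetical stand-ins, not the benchmark's actual API.

```python
# Schematic of the stepping semantics described above. Illustration only:
# `env` and its methods are hypothetical stand-ins, not the benchmark's API.
def run_episode(env, agents):
    obs, info = env.reset()
    for i, agent in enumerate(agents):
        agent.reset(obs[i], info)
    # Every agent proposes an action; the environment then advances frame by
    # frame until at least one agent's action leaves the "ongoing" status.
    actions = [agent.act(obs[i]) for i, agent in enumerate(agents)]
    done = False
    while not done:
        obs, done = env.step(actions)
        # All agents are queried again; an agent whose action is still
        # ongoing switches only if the newly returned action differs.
        actions = [agent.act(obs[i]) for i, agent in enumerate(agents)]
```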
Different types of agents have different capability scopes, and agents with different capabilities must work together to achieve a common goal. Although the task goal is the same for all agents, the constrained agent and the helper receive different information about it: the constrained agent knows the exact goal of the task, while the helper must perceive the constrained agent's behavior and infer the true goal.
One goal of CHAIC is to mimic real life as closely as possible. Therefore, we provide only raw RGB-D images as the main observation (the benchmark also supports many other observation types), which makes the benchmark challenging and broadly applicable.
First, you should understand the details of the observation. The environment returns each agent's observation every step as a dictionary with the following items (a short usage sketch follows the list):
- RGB: RGB image of the current agent's view
- depth: depth image of the current agent's view
- camera_matrix: the camera matrix of the current agent's ego camera
- FOV: the field of view of the current agent's ego camera
- agent: a list of length 6 that contains the current position (x, y, z) and forward (fx, fy, fz) of the agent, formatted as [x, y, z, fx, fy, fz].
- held_objects: all the objects the current agent is holding. It is a list of length 2 containing the information of the objects held in the agent's two hands. Each object's information contains its name, type, and unique id; if the object is a container, it also includes information about the objects inside it.
- status: the status of the current action, which is a number from 0 to 2. 0 for 'ongoing', 1 for 'failure', and 2 for 'success'.
- current_frames: the number of frames passed
- valid: whether the last action of the agent is valid
- previous_action & previous_status: all previous actions of the agent and their corresponding statuses
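For illustration, here is a minimal sketch of reading these fields inside an agent; the key names follow the list above, while the exact value layouts are assumptions:

```python
# Illustrative reader for one agent's observation; key names follow the
# list above, while the exact value layouts are assumptions.
def summarize_obs(obs):
    x, y, z, fx, fy, fz = obs["agent"]       # position and forward vector
    status = obs["status"]                   # 0 ongoing, 1 failure, 2 success
    left_hand, right_hand = obs["held_objects"]
    print(f"position=({x:.2f}, {y:.2f}, {z:.2f}), forward=({fx:.2f}, {fy:.2f}, {fz:.2f})")
    print(f"action status: {status}, frames elapsed: {obs['current_frames']}")
    print(f"last action valid: {obs['valid']}")
    print(f"holding: left={left_hand}, right={right_hand}")
```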
To create a new agent, add a Python file in a folder named 'agent' in the root directory of the repository and implement your agent in it. You must define a class named 'PlanAgent' that implements the following two functions:
def reset(self, obs, info):
def act(self, obs):
The function reset initializes the agent at the beginning of an episode. It receives two arguments: 'obs', the agent's initial observation, and 'info', the task information, which contains the names of all possible objects, the goal location, rooms, etc.
The function act is the core of the agent: it determines the agent's next action. It receives the current observation from the environment and returns an action. Each action is a dictionary whose "type" key is an integer between 0 and 8, each referring to a specific action type (a minimal agent sketch follows the list):
- 0: move forward by 0.5 meters
- 1: turn left by 15 degrees
- 2: turn right by 15 degrees
- 3: pick up an object; the dictionary should also contain a key named 'object' whose value is the id of the object to pick up, together with a key named 'arm' indicating which hand to use ('left' or 'right').
- 4: put the object in one hand into the container in the other hand.
- 5: put the object in one hand on a surface; the dictionary should also contain a key named 'object' whose value is the id of the object on whose surface to place it, together with a key named 'arm' indicating which hand to use.
- 6: send a message (not used in this challenge).
- 7: remove an obstacle; the dictionary should also contain a key named 'object' whose value is the id of the obstacle, together with a key named 'arm' indicating which hand to use.
- 8: wait for several frames; the dictionary should contain a key named 'delay' indicating the number of frames to wait.
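Putting the interface together, here is a minimal sketch of a PlanAgent. The action encoding follows the list above, but the wandering policy is a placeholder, not one of the paper's baselines:

```python
import random

class PlanAgent:
    """A minimal sketch of the required interface: a helper that wanders.

    Action dictionaries follow the encoding listed above; the policy
    itself is a placeholder, not one of the paper's baselines.
    """

    def reset(self, obs, info):
        # 'info' carries the task information: possible object names,
        # the goal location, rooms, etc.
        self.info = info

    def act(self, obs):
        # If the last action was invalid, turn to try a new direction.
        if not obs["valid"]:
            return {"type": random.choice([1, 2])}  # 1: turn left, 2: turn right
        # Otherwise mostly move forward, occasionally turning.
        return {"type": random.choices([0, 1, 2], weights=[0.6, 0.2, 0.2])[0]}
```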
We provide a complete example in agent/example_agent.py. If you have any other questions, please refer to that example first.
To evaluate your agent on a certain task, create a script like the following.
bash scripts/test.sh
This script evaluates the example agent above on the High Container task. Change the second item of the 'agents' argument, which represents the helper's name, to the name of the Python file of your implemented agent. Then run the script to get the result. You can also change the 'output_dir' argument to customize where the result is saved.
We report the average Transport Rate (TR), Efficiency Improvement (EI), Goal Inference Accuracy (IA), Completion Ratio of the Helper (CR), and Standard Error of the Transport Rate (STD_TR). w/o means the constrained agent performs the task alone without a helper. For the shopping task we also report the Emergency Rate (ER).
The table below shows the quantitative results for TR and EI, the most important metrics for measuring the helper's efficiency.
Entries are TR (EI), higher is better (↑). Normal through Wheelchair are indoor tasks; Shopping and Furniture are outdoor.

| Helper Agent | Normal | High Target | High Container | High Goalplace | Lowthing | Wheelchair | Shopping | Furniture | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| w/o | 0.53 | 0.30 | 0.37 | 0.28 | 0.51 | 0.07 | 0.37 | 0.17 | 0.33 |
| Random | 0.52(-0.02) | 0.27(-0.05) | 0.36(0.00) | 0.33(0.10) | 0.50(-0.01) | 0.21(0.56) | 0.39(0.05) | 0.48(0.68) | 0.38(0.16) |
| RHP | 0.64(0.15) | 0.35(0.11) | 0.45(0.19) | 0.35(0.18) | 0.66(0.23) | 0.44(0.77) | 0.49(0.22) | 0.65(0.72) | 0.50(0.32) |
| RL | 0.45(-0.19) | 0.26(-0.16) | 0.28(-0.25) | 0.25(-0.22) | 0.43(-0.16) | 0.11(0.07) | 0.32(-0.13) | 0.67(0.74) | 0.35(-0.04) |
| SmartHelp | 0.46(-0.12) | 0.24(-0.17) | 0.26(-0.28) | 0.31(0.01) | 0.49(-0.04) | 0.13(0.11) | 0.32(-0.13) | 0.57(0.70) | 0.35(0.01) |
| VLM | 0.63(0.14) | 0.33(0.06) | 0.43(0.12) | 0.26(-0.20) | 0.69(0.26) | 0.40(0.86) | 0.50(0.25) | 0.70(0.78) | 0.49(0.28) |
| LLM+BM | 0.65(0.17) | 0.38(0.19) | 0.49(0.24) | 0.36(0.23) | 0.70(0.27) | 0.42(0.89) | 0.58(0.33) | 0.69(0.77) | 0.53(0.39) |
| Oracle | 0.77(0.31) | 0.49(0.37) | 0.69(0.47) | 0.61(0.56) | 0.82(0.38) | 0.60(0.87) | 0.61(0.39) | 0.76(0.80) | 0.67(0.52) |
You can find the results of other metrics here.
You can submit your helper's results by opening a GitHub Issue, which should include your code and results. A detailed submission guideline will be released soon.
We found that the wheelchair agent may sometimes get stuck at corners due to its model shape, so we replaced the original model with another one (a limping person with the same capabilities as the wheelchair agent) to reduce the variance of the results.