Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs (Zhang et al. 2023).
PASTA allows a user to improve LLM controllability by simply emphasizing part of the prompt (e.g. the instruction) that the LLM should focus on. It requires no changes to LLM weights and no increase in inference time.
- Install pastalib:
pip install pastalib
# Alternatively, clone the repository and run: pip install -e .
# or install directly from GitHub:
# pip install git+https://github.com/QingruZhang/PASTA
- Initialize a pre-trained LLM and PASTA.
from pastalib.pasta import PASTA
from transformers import AutoModelForCausalLM, AutoTokenizer
# Initialize pre-trained LLM
name = "huggyllama/llama-7b"
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
# Select the attention heads to be steered,
# following the format of {'layer_id': [head_ids]}:
head_config = {
    "3": [17, 7, 6, 12, 18], "8": [28, 21, 24], "5": [24, 4],
    "0": [17], "4": [3], "6": [14], "7": [13], "11": [16],
}
# Initialize the PASTA steerer
pasta = PASTA(
    model=model,
    tokenizer=tokenizer,
    head_config=head_config,
    alpha=0.01,  # scaling coefficient
    scale_position="exclude",  # downweighting unselected tokens
)
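The head_config format above, {'layer_id': [head_ids]}, is easy to get wrong when written by hand. As a minimal sketch, a sanity check could look like the following. The helper name validate_head_config is hypothetical and is not part of pastalib; the sizes of 32 layers and 32 heads are those of LLAMA-7B, and in practice they could be read from model.config.num_hidden_layers and model.config.num_attention_heads.

```python
def validate_head_config(head_config, num_layers, num_heads):
    """Return the number of (layer, head) pairs; raise ValueError on bad entries.

    Hypothetical helper for illustration only, not a pastalib function.
    """
    total = 0
    for layer_id, head_ids in head_config.items():
        layer = int(layer_id)  # keys are strings such as "3"
        if not 0 <= layer < num_layers:
            raise ValueError(f"layer {layer} out of range [0, {num_layers})")
        if len(set(head_ids)) != len(head_ids):
            raise ValueError(f"duplicate head ids in layer {layer}")
        for head in head_ids:
            if not 0 <= head < num_heads:
                raise ValueError(f"head {head} out of range in layer {layer}")
        total += len(head_ids)
    return total


head_config = {
    "3": [17, 7, 6, 12, 18], "8": [28, 21, 24], "5": [24, 4],
    "0": [17], "4": [3], "6": [14], "7": [13], "11": [16],
}
n_steered = validate_head_config(head_config, num_layers=32, num_heads=32)
print(n_steered)  # number of attention heads that would be steered
```

Running such a check before constructing PASTA catches out-of-range indices early, instead of failing inside a forward hook.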
- Select specific input spans to emphasize, and then run inference as normal.
# Model Input
texts = ["Mary is a doctor. She obtains her bachelor degree from UCSD. Answer the occupation of Mary and generate the answer as json format."]
# ===== Without PASTA =====
# inputs = tokenizer(texts, return_tensors="pt")
# outputs = model.generate(**inputs)
# ---------------------
# ["The answer should be in json format."] # returns answer in the wrong format
# ===== With PASTA =====
inputs, offset_mapping = pasta.inputs_from_batch(texts)
# User highlights specific input spans
emphasized_texts = ["Answer the occupation of Mary and generate the answer as json format"]
# PASTA registers the pre_forward_hook to edit attention
with pasta.apply_steering(
    model=model,
    strings=texts,
    substrings=emphasized_texts,
    model_input=inputs,
    offsets_mapping=offset_mapping
) as steered_model:
    outputs = steered_model.generate(**inputs, max_new_tokens=128)
# -------------------------------
# ['{"name": "Mary", "occupation": "Doctor", ...}'] # returns answer in the correct format
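The offset_mapping passed to apply_steering is what allows an emphasized substring to be translated into token positions. The following is a rough sketch of that idea, not pastalib's actual implementation. It assumes each token comes with a (start, end) character span, as Hugging Face fast tokenizers return with return_offsets_mapping=True, and it uses a toy whitespace tokenizer (toy_offsets, a hypothetical stand-in) so that it runs without downloading a model.

```python
import re


def toy_offsets(text):
    """Whitespace tokenizer stand-in: returns (start, end) character spans."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def token_indices_for_substring(text, substring, offsets):
    """Indices of tokens whose character span overlaps the substring.

    Hypothetical helper for illustration only, not a pastalib function.
    """
    start = text.find(substring)
    if start < 0:
        raise ValueError("substring not found in text")
    end = start + len(substring)
    # A token overlaps the substring if the two half-open spans intersect.
    return [i for i, (s, e) in enumerate(offsets) if s < end and e > start]


text = "Mary is a doctor. Answer the occupation of Mary."
offsets = toy_offsets(text)
emphasized = token_indices_for_substring(text, "Answer the occupation", offsets)
print(emphasized)  # token positions covered by the emphasized span
```

The substring must appear verbatim in the input string for this lookup to succeed, which is why the emphasized text in the example above is copied directly from the prompt.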
- pastalib works with any model that applies causal attention by adding the attention mask to the query-key inner products, and can be applied to LLMs in a plug-and-play manner. Examples are LlamaForCausalLM and GPTJForCausalLM from transformers, whose attention modules apply attention masks following torch.matmul(query, key) + attention_mask. However, pastalib currently supports only LLAMA, LLAMA-2 and GPT-J (more models in progress!).
- We provide different options of head_config for LLAMA-7B and GPT-J in the folder config/head_config, including multi-task, task-agnostic and task-specific settings. Please see the detailed discussion in our paper.
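The requirement that attention be computed as torch.matmul(query, key) + attention_mask matters because scaling post-softmax attention weights and renormalizing is equivalent to adding a constant to the pre-softmax scores. The sketch below illustrates that equivalence in plain Python. It assumes the steering rule described in the paper (scale the attention on non-emphasized tokens by alpha, then renormalize) and the "exclude" setting shown earlier; the function names are hypothetical and this is not pastalib's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def steer_post_softmax(scores, emphasized, alpha):
    """Scale non-emphasized attention weights by alpha, then renormalize."""
    probs = softmax(scores)
    scaled = [p if i in emphasized else alpha * p for i, p in enumerate(probs)]
    z = sum(scaled)
    return [p / z for p in scaled]


def steer_via_mask(scores, emphasized, alpha):
    """Add log(alpha) to the scores of non-emphasized tokens before softmax."""
    mask = [0.0 if i in emphasized else math.log(alpha) for i in range(len(scores))]
    return softmax([s + m for s, m in zip(scores, mask)])


scores = [1.2, -0.3, 0.8, 2.0]  # one query's scores over four key tokens
emphasized = {2, 3}             # token positions the user highlighted
a = steer_post_softmax(scores, emphasized, alpha=0.01)
b = steer_via_mask(scores, emphasized, alpha=0.01)
print(all(math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b)))
```

Because the two forms agree, steering can be expressed as an edit to the additive attention mask, which fits the note above that no model weights are changed.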
The overview of this repo is as follows:
- pastalib: contains the source code of the PASTA library, which can be applied to models from huggingface transformers.
- evaluation: consists of evaluation pipelines for different tasks, including data/model preprocessing and task evaluators/metrics.
- config: includes the head_config for steering attention modules of LLAMA-7B and GPT-J with PASTA.
- scripts: consists of running scripts for four tasks: JSON Formatting, Pronouns Changing, Bias in Bios, and CounterFact.
The evaluation pipelines are mainly refactored from the REMEDI repo. Please see more details there.
Set up the environment with the following commands:
conda create -n pasta python=3.10
conda activate pasta
pip install -r requirements.txt
pip install -e .
python -m spacy download en_core_web_sm
python -W ignore -m nltk.downloader punkt cmudict
By default, the preprocessed datasets, models, and results are saved in the local directories ./data, ./models, and ./results. You can change these directories by setting the following environment variables:
export CM_DATA_DIR=<data path>
export CM_MODELS_DIR=<models path>
export CM_RESULTS_DIR=<results path>
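How the repository reads these variables internally is not shown here. As a minimal sketch of the behavior described above (use the environment variable when it is set, otherwise fall back to the local default), one could write the following. The helper name resolve_dir is hypothetical.

```python
import os
from pathlib import Path


def resolve_dir(env_var, default):
    """Use the environment variable if it is set and non-empty, else the default.

    Hypothetical helper for illustration only, not taken from the repository.
    """
    return Path(os.environ.get(env_var) or default)


data_dir = resolve_dir("CM_DATA_DIR", "./data")
models_dir = resolve_dir("CM_MODELS_DIR", "./models")
results_dir = resolve_dir("CM_RESULTS_DIR", "./results")
print(data_dir, models_dir, results_dir)
```

Printing the resolved paths before a long evaluation run is a cheap way to confirm that outputs will land where expected.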
- For CounterFact, our scripts can automatically download the dataset.
- For Bias in Bios, we cannot release the dataset without authorization. The dataset must be downloaded from the official release. After downloading the data examples into the BIOS.pkl file, you can run the following script:
python reformat_dataset.py \
--biasbios_raw_path <path of BIOS.pkl> \
--biasbios_save_file biasbios.json
Then, the preprocessed biasbios dataset file will be saved in CM_DATA_DIR/biasbios.json.
Choose any head_config file from config/head_config and evaluate the performance of PASTA with one of the following commands.
JSON Formatting
python -m scripts.eval_biasbios_instruction \
--task json \
--apply_pasta \
--emphasized_text instruct \
--alpha 0.01 \
--scale_position exclude \
--pasta_head_config <head_config_path> \
--model huggyllama/llama-7b \
--prompt_idx 0 \
--batch_size 16 \
--max_new_tokens 128 \
--experiment_name llama_evaluation \
--device cuda
Pronouns Changing
python -m scripts.eval_biasbios_instruction \
--task pronchange \
--apply_pasta \
--emphasized_text instruct \
--alpha 0.01 \
--scale_position exclude \
--pasta_head_config <head_config_path> \
--prompt_idx 0 \
--model huggyllama/llama-7b \
--max_new_tokens 128 \
--batch_size 16 \
--experiment_name llama_evaluation \
--device cuda
Bias in Bios
python -m scripts.eval_bias_gen \
--model huggyllama/llama-7b \
--apply_pasta \
--alpha 0.01 \
--scale_position exclude \
--pasta_head_config <head_config_path> \
--max_length 256 \
--batch_size 16 \
--experiment_name llama_evaluation \
--device cuda
CounterFact
python -m scripts.eval_fact_gen \
--model huggyllama/llama-7b \
--apply_pasta \
--alpha 0.01 \
--scale_position exclude \
--pasta_head_config <head_config_path> \
--add_unmediated_fact True \
--benchmarks efficacy paraphrase generation \
--experiment_name llama_evaluation
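To compare several head_config files, the commands above can be generated in a loop. The sketch below assembles the JSON Formatting command for every file under config/head_config, using only the flags shown above. It assumes the head_config files are stored as .json files, which is not confirmed here; the helper name build_json_eval_command and the experiment naming scheme are hypothetical.

```python
import shlex
from pathlib import Path


def build_json_eval_command(head_config_path, experiment_name):
    """Assemble the JSON Formatting command shown above as an argv list."""
    return [
        "python", "-m", "scripts.eval_biasbios_instruction",
        "--task", "json",
        "--apply_pasta",
        "--emphasized_text", "instruct",
        "--alpha", "0.01",
        "--scale_position", "exclude",
        "--pasta_head_config", str(head_config_path),
        "--model", "huggyllama/llama-7b",
        "--prompt_idx", "0",
        "--batch_size", "16",
        "--max_new_tokens", "128",
        "--experiment_name", experiment_name,
        "--device", "cuda",
    ]


# Assumption: head_config files are .json files somewhere under this folder.
config_dir = Path("config/head_config")
for path in sorted(config_dir.glob("**/*.json")):
    cmd = build_json_eval_command(path, experiment_name=f"llama_{path.stem}")
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Giving each run its own experiment name keeps the results of different head_config files from overwriting one another.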
Please contact us or post an issue if you have any questions:
- Qingru Zhang
- Chandan Singh
- Liyuan Liu
- Xiaodong Liu
@misc{zhang2023tell,
title={Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs},
author={Qingru Zhang and Chandan Singh and Liyuan Liu and Xiaodong Liu and Bin Yu and Jianfeng Gao and Tuo Zhao},
year={2023},
eprint={2311.02262},
archivePrefix={arXiv},
primaryClass={cs.CL}
}