
fix steering baseline #701

Open
xwh12345-user wants to merge 1 commit into zjunlp:main from xwh12345-user:issue1


@xwh12345-user

Contributor

This PR fixes ori_generate() / ori_vllm_generate() so original predictions are generated without active interventions.

What Changed

  • Added shared generation-state helpers in steer/models/model_wrapper.py:

    • _save_and_clear_generation_state()
    • _restore_generation_state()
  • ori_generate() now temporarily clears and restores both:

    • add_activations_dict
    • intervention_dict
  • ori_vllm_generate() now applies the same cleanup/restore logic.

  • GPTWrapper.ori_generate() was also updated to use the shared helper logic.

This ensures orig_pred is generated from the base model state, without being affected by RePS/SFT/SPILT-style interventions stored in intervention_dict.
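
For reviewers, the save/clear/restore pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ToyWrapper`, its `generate()` body, and the helper signatures are assumptions made for the example. Only the helper names and the two dictionary names come from the description.

```python
from typing import Any, Dict, Tuple

SavedState = Tuple[Dict[Any, Any], Dict[Any, Any]]


class ToyWrapper:
    """Hypothetical stand-in for the wrapper in steer/models/model_wrapper.py."""

    def __init__(self) -> None:
        self.add_activations_dict: Dict[Any, Any] = {}
        self.intervention_dict: Dict[Any, Any] = {}

    def _save_and_clear_generation_state(self) -> SavedState:
        # Keep references to the original dicts, then swap in empty ones
        # so generation sees no active steering state.
        saved = (self.add_activations_dict, self.intervention_dict)
        self.add_activations_dict = {}
        self.intervention_dict = {}
        return saved

    def _restore_generation_state(self, saved: SavedState) -> None:
        # Put back the exact same objects that were saved.
        self.add_activations_dict, self.intervention_dict = saved

    def generate(self, prompt: str) -> str:
        # Toy behavior (assumption): output depends only on whether
        # any steering state is currently active.
        steered = bool(self.add_activations_dict or self.intervention_dict)
        return "steered" if steered else "base"

    def ori_generate(self, prompt: str) -> str:
        saved = self._save_and_clear_generation_state()
        try:
            return self.generate(prompt)
        finally:
            # Restore even if generation raises.
            self._restore_generation_state(saved)
```

Wrapping the call in `try`/`finally` is one way to guarantee the interventions come back even when generation fails; whether the PR does the same is not stated in the description.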

What Was Tested

  • Static checks:

    • python -m py_compile steer/models/model_wrapper.py
    • git diff --check
  • Unit-style state restoration test:

    • Verified add_activations_dict and intervention_dict are both cleared before generation.
    • Verified both are restored after generation.
    • Verified intervention objects are restored as the same original objects.
  • Real-model tests on GPU 0:

    • /disk4/xuweihong/models/Qwen2.5-7B-Instruct
    • /disk4/xuweihong/models/Qwen3-VL-4B-Instruct
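
The unit-style state restoration test listed above could look something like the sketch below. `RecordingWrapper` and `check_state_restoration` are hypothetical names invented for this example; the real test in the PR is not shown in the description and may differ.

```python
class RecordingWrapper:
    """Hypothetical stand-in that records the state visible during generation."""

    def __init__(self):
        self.add_activations_dict = {}
        self.intervention_dict = {}
        self.seen_during_generation = None

    def _generate(self, prompt):
        # Snapshot both dicts as they look while generation is running.
        self.seen_during_generation = (
            dict(self.add_activations_dict),
            dict(self.intervention_dict),
        )
        return prompt

    def ori_generate(self, prompt):
        # Save references, clear, generate, then restore the same objects.
        saved = (self.add_activations_dict, self.intervention_dict)
        self.add_activations_dict, self.intervention_dict = {}, {}
        try:
            return self._generate(prompt)
        finally:
            self.add_activations_dict, self.intervention_dict = saved


def check_state_restoration(wrapper):
    activation, intervention = object(), object()
    wrapper.add_activations_dict = {0: activation}
    wrapper.intervention_dict = {0: intervention}

    wrapper.ori_generate("prompt")

    # Both dicts were empty while generating.
    assert wrapper.seen_during_generation == ({}, {})
    # Both dicts are restored, holding the same original objects.
    assert wrapper.add_activations_dict[0] is activation
    assert wrapper.intervention_dict[0] is intervention
    return True
```

The identity checks (`is`) mirror the third bullet of the unit-style test: the restored intervention objects are the originals, not copies.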

Results

  • Qwen2.5-7B-Instruct

    • Baseline ori_generate(): Paris
    • Normal generate() with intervention: London
    • Fixed ori_generate() with intervention: Paris
    • Intervention restored after generation: yes
  • Qwen3-VL-4B-Instruct

    • Normal generate() was affected by intervention.
    • Fixed ori_generate() matched the baseline output.
    • Intervention restored after generation: yes
