Hi, I have been running the zero-shot evaluation on several models. For models that don't already have a universal top-head list, I compute the indirect-effect scores for each abstractive task and, for the tasks whose 10-shot performance is greater than the majority baseline, aggregate them. I then pass the top-k heads (with k roughly proportional to the number of heads in the model) to the compute_universal_function_vector function and evaluate using those heads. For most models this gives good results. However, for Gemma 2 (at all sizes) and Mistral 7B v0.1, I almost always get a top-1 accuracy below 10%, and the score barely changes across layers. Do you know why the tests don't work with these models?
Thanks!
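
For reference, the aggregation and head-selection procedure described above can be sketched roughly like this (a minimal illustration only, assuming mean aggregation across tasks; `aggregate_top_heads` and its dict inputs are hypothetical names I'm using here, not part of the repo's API):

```python
import numpy as np

def aggregate_top_heads(task_scores, task_accuracies, majority_baseline, k):
    """Aggregate per-task indirect-effect scores and pick the top-k heads.

    task_scores: dict mapping task name -> (n_layers, n_heads) array of
        indirect-effect scores for that task.
    task_accuracies: dict mapping task name -> 10-shot accuracy.
    majority_baseline: dict mapping task name -> majority-label baseline.
    k: number of heads to keep (roughly proportional to the model's
        total head count).
    Returns a list of (layer, head) tuples, highest mean score first.
    """
    # Keep only tasks where 10-shot performance beats the majority baseline.
    kept = [t for t, acc in task_accuracies.items()
            if acc > majority_baseline[t]]
    if not kept:
        raise ValueError("no task beat its majority baseline")

    # Mean indirect effect across the retained tasks.
    mean_scores = np.mean([task_scores[t] for t in kept], axis=0)

    # Flatten and take the k highest-scoring (layer, head) pairs.
    flat = mean_scores.ravel()
    top = np.argsort(flat)[::-1][:k]
    n_heads = mean_scores.shape[1]
    return [(int(i) // n_heads, int(i) % n_heads) for i in top]
```

The resulting list is what I then hard-code as the top-head list for compute_universal_function_vector.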