Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models

submitted by
Style Pass
2024-11-28 06:30:09

Original caption: Figure 1: Overview of the methods employed in this paper. We pair all games and scenarios to generate 20 unique combinations, which form the backbone of our dataset. We then submit each combination to each model and obtain 300 observations per combination. For LLaMa2-70b, we ask for both an answer and a motivation; we ask the other models only for their answers. We use the answers from LLaMa2-70b to perform LoRA fine-tuning on a small, pre-trained LLaMa2-7b. The fine-tuned model is then queried in the same way as the pre-trained model, after which we collect all data and measure the impact of fine-tuning on preferences.
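The pipeline described above amounts to knowledge distillation: the larger model's answers become the fine-tuning targets for the smaller one via LoRA. The paper's caption does not specify the LoRA hyperparameters, so the values and target modules below are illustrative assumptions, sketched with the Hugging Face `peft` and `transformers` libraries:

```python
# Sketch: attach LoRA adapters to a pre-trained LLaMa2-7b for fine-tuning
# on LLaMa2-70b answers. Rank, alpha, dropout, and target modules are
# illustrative assumptions, not values reported in the paper.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                  # low-rank adapter dimension (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,                     # adapter dropout (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)

# Only the small adapter matrices are trained; the 7b base weights stay frozen.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

The fine-tuning step itself would then run a standard causal-LM training loop over prompt/answer pairs, where each answer is the one LLaMa2-70b produced for that game-scenario combination.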

Original caption: Figure 2: Preliminary investigation of differences in responses between LLaMa2-7b and LLaMa2-70b, grouped by context and game. Clockwise, from the left: propensity to cooperate in LLaMa2-7b grouped by context; propensity to cooperate in LLaMa2-7b grouped by game; propensity to cooperate in LLaMa2-70b grouped by context; propensity to cooperate in LLaMa2-70b grouped by game. Notably, LLaMa2-7b is almost entirely indifferent to context and game and displays a remarkable bias toward choosing cooperation, whereas LLaMa2-70b adapts to new contexts and game structures to a remarkable extent.

Original caption: Table 1: Difference-in-proportions z-score test for propensity to cooperate in within-sample and out-of-sample games for the fine-tuned LLaMa2-7b model. For each scenario, we report: the proportion of cooperative choices in the within-sample game, the proportion of cooperative choices in the out-of-sample game, the difference in proportions, the standard error, the z-score, and the associated p-value. Reported significance levels follow standard practice: one asterisk (*) for significance at the 0.05 level, two asterisks (**) for significance at the 0.01 level, and three asterisks (***) for significance at the 0.001 level.
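The test described in the caption is the standard two-proportion z-test with a pooled standard error. As a sketch of how each row of such a table could be computed (the counts below are made up for illustration, not taken from the paper):

```python
# Two-proportion z-test with pooled standard error, as used for
# within-sample vs. out-of-sample cooperation rates.
from math import sqrt, erf

def two_proportion_z(coop1: int, n1: int, coop2: int, n2: int):
    """Return (diff, se, z, two-sided p-value) for two cooperation counts."""
    p1, p2 = coop1 / n1, coop2 / n2
    pooled = (coop1 + coop2) / (n1 + n2)          # pooled cooperation rate
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, Phi(x) = (1 + erf(x/sqrt(2)))/2
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p1 - p2, se, z, p_value

# Hypothetical example: 240/300 cooperative choices within-sample
# vs. 210/300 out-of-sample.
diff, se, z, p = two_proportion_z(240, 300, 210, 300)
print(f"diff={diff:.3f}, se={se:.4f}, z={z:.3f}, p={p:.4f}")
```

With 300 observations per combination (as in Figure 1), each proportion in the table would be estimated from a sample of that size.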
