This repository provides code and instructions for reproducing the figures and tables presented in the paper:
Title: What Makes a Good Natural Language Prompt?
https://arxiv.org/pdf/2506.06950
In particular, it reproduces the figures and tables that report the performance of various large language models on instruction-following tasks.
Setup Instructions
- Clone the repository:

```
git clone https://github.com/dxlong2000/NLPromptEval.git
```
- Install dependencies.
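The install command itself is not listed in this README. Assuming the repository ships a requirements.txt at its root (an assumption; check the repository before running), a typical setup would be:

```
# Assumes a requirements.txt at the repository root (not confirmed by this README)
cd NLPromptEval
pip install -r requirements.txt
```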
Reproducing the figures and tables
To reproduce Figure 1 (Correlations of properties evaluated by GPT-4o), first run

```
python src_gpt_4o_multiturn_final.py
```

and then run

```
python analysis.py
```

To reproduce Table 2 (Performance of models (%) on various tasks under different configurations), change into the inference directory:

```
cd inference-codes
```

and then run the script for the corresponding model and dataset, as illustrated below.
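The exact script names and arguments vary by model and dataset and are not listed in this README; the invocation below is a hypothetical sketch only (the script name and flags are placeholders, not confirmed by the repository):

```
# Hypothetical placeholder -- substitute the actual script from
# inference-codes/ for your chosen model and dataset
python run_inference.py --model qwen2.5-7b-it --dataset your_dataset
```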
To reproduce Table 3 (Performance of two fine-tuned Qwen-2.5-7B-it models), run

```
python finetuning-codes/finetuning_qwen.py
```

Citations
If you use this repository in your work, please cite the original paper:
```
@article{long2025makes,
  title={What Makes a Good Natural Language Prompt?},
  author={Long, Do Xuan and Dinh, Duy and Nguyen, Ngoc-Hai and Kawaguchi, Kenji and Chen, Nancy F and Joty, Shafiq and Kan, Min-Yen},
  journal={arXiv preprint arXiv:2506.06950},
  year={2025}
}
```

License
This repository is licensed under the MIT License - see the LICENSE file for details.