Refining Text Generation for Realistic Conversational Recommendation via Direct Preference Optimization
This repository contains the implementation code for the research paper "Refining Text Generation for Realistic Conversational Recommendation via Direct Preference Optimization". We propose a method to improve text generation in Conversational Recommender Systems (CRSs) using Direct Preference Optimization (DPO).
View high-resolution diagram (PDF)
Traditional Conversational Recommender Systems (CRSs) face challenges such as hasty recommendations in short conversations and insufficient integration of implicit information. This research extends SumRec by applying DPO to optimize both dialogue summary generation and item recommendation information generation models, achieving more realistic and natural conversational recommendation.
- Extension of SumRec using DPO to propose a recommendation method suitable for realistic conversational recommendation datasets
- Demonstration of superior recommendation performance through comparison with baseline methods and the original SumRec
- Stage 1: Pre-training of score predictor (DeBERTa)
- Stage 2: DPO training for dialogue summary generation and item recommendation information generation models
- Base Model: Llama-3.1-Swallow-8B-v0.1
- Score Predictor: DeBERTa-v3-japanese-large
- Optimization Method: Direct Preference Optimization (DPO)
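Stage 2 performs DPO training with the TRL library listed under the dependencies. Below is a minimal training sketch, assuming a JSON preference dataset with `prompt`/`chosen`/`rejected` fields; the data path and hyperparameter values are illustrative placeholders rather than the repository's actual settings (those are tuned with Optuna, see the hyperparameter optimization section below).

```python
# Minimal DPO training sketch (TRL 0.12 API). Paths and hyperparameters are
# placeholders; the actual scripts are dpo_summary_llm.py / dpo_recommendation_llm.py.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical preference file (e.g. produced by create_dataset_3.py for summary DPO).
train_dataset = load_dataset("json", data_files="data/Tabidachi/datasets_3/train.json", split="train")

config = DPOConfig(
    output_dir="dpo-summary-results_1",
    beta=0.1,                         # DPO beta; searched with Optuna in the actual code
    learning_rate=5e-6,               # placeholder within the searched range
    per_device_train_batch_size=2,
    num_train_epochs=1,               # single-epoch training, as noted under Training Times
)
trainer = DPOTrainer(model=model, args=config, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```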
- Description: Travel agent recommendation dialogues (realistic long conversations)
- Download and Placement:
  - Download the dataset from https://www.nii.ac.jp/dsc/idr/rdata/Tabidachi/
  - Place the downloaded files in `data/Tabidachi/annotation_data/`
Directory structure after placement:
data/Tabidachi/
└── annotation_data/
    ├── annotations/       # Directory containing dialogue annotation files
    │   └── *.json         # Individual dialogue session files
    ├── spot_info.json     # Tourist spot information
    └── タグ一覧.docx       # Tag list document
- Description: Multi-category recommendation dialogues (for comparison)
- Download and Placement:
  - Download the dataset from the SumRec GitHub repository (https://github.com/Ryutaro-A/SumRec)
  - Place the downloaded files in `data/ChatRec/chat_and_rec/`
Directory structure after placement:
data/ChatRec/
└── chat_and_rec/
    ├── except_for_travel/   # Non-travel recommendation dialogues
    │   └── *.json
    ├── travel/              # Travel recommendation dialogues
    │   └── *.json
    └── no_restriction/      # General recommendation dialogues
        └── *.json
- Llama-3.1-Swallow-8B-v0.1: https://huggingface.co/tokyotech-llm/Llama-3.1-Swallow-8B-v0.1
- DeBERTa-v3-japanese-large: https://huggingface.co/globis-university/deberta-v3-japanese-large
- Python 3.10.12
- CUDA-compatible GPU environment
- GPU: 4 Γ Nvidia A100 80GB
# Create virtual environment
python -m venv .venv
# Activate environment
source .venv/bin/activate # Linux/Mac
# or .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt

- PyTorch 2.4.1
- Transformers 4.46.2
- TRL (Transformer Reinforcement Learning) 0.12.1
- Optuna 4.1.0
- Datasets 3.1.0
- See `requirements.txt` for complete details
This project uses four types of dataset formats:
- datasets_1: Recommendation datasets (for dialogue summary generation and item recommendation information generation)
- datasets_2: Score predictor training datasets
- datasets_3: DPO datasets for dialogue summary generation models
- datasets_4: DPO datasets for item recommendation information generation models
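For datasets_3 and datasets_4, each training example is expected to follow the prompt/chosen/rejected layout consumed by TRL's `DPOTrainer`. The record below is a hand-written illustration of that layout, not an actual sample from the generated data.

```python
# Illustrative DPO preference record (invented content, TRL-style field names).
dpo_example = {
    "prompt": "Summarize the following dialogue between a customer and a travel agent: ...",
    "chosen": "The customer is planning a two-day family trip and prefers spots reachable by train ...",
    "rejected": "The customer talked about travel.",
}
```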
- Splitting Method: User ID-based (a minimal split sketch follows this list)
- Data Composition:
- Training data: 20 adults, 7 elderly, 15 children
- Validation data: 2 adults, 1 elderly, 2 children
- Test data: 3 adults, 2 elderly, 3 children
- User ID Ranges:
- Adults: 101-125 (25 users)
- Elderly: 201-210 (10 users)
- Children: 301-320 (20 users)
- Scoring: Actually recommended items = 1, others = 0
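The sketch below illustrates the user-ID-based split with placeholder ID assignments that match the counts above; the actual ID-to-split assignment is defined in the Tabidachi dataset creation scripts.

```python
# Illustrative user-ID-based split (placeholder ID sets, consistent with the
# counts listed above; the real assignment lives in the create_dataset scripts).
TRAIN_IDS = set(range(101, 121)) | set(range(201, 208)) | set(range(301, 316))  # 20 adults, 7 elderly, 15 children
VALID_IDS = {121, 122, 208, 316, 317}                                           # 2 adults, 1 elderly, 2 children
TEST_IDS  = {123, 124, 125, 209, 210, 318, 319, 320}                            # 3 adults, 2 elderly, 3 children

def split_of(user_id: int) -> str:
    """Route every dialogue session of a user to a single split."""
    if user_id in TRAIN_IDS:
        return "train"
    if user_id in VALID_IDS:
        return "validation"
    return "test"
```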
- Splitting Method: Category-based
- Data Count by Category:
- except_for_travel: 223 items (train: 178, test: 33, validation: 12)
- travel: 237 items (train: 189, test: 35, validation: 13)
- no_restriction: 545 items (train: 436, test: 81, validation: 28)
- Score Conversion:
- Human-predicted scores (1-5 scale, by 5 third-party workers) → scores < 3 converted to 0 (dislike), scores ≥ 3 converted to 1 (like); a minimal sketch of this rule follows the list
- This conversion unifies the scoring with the binary classification used for the Tabidachi corpus
- Ratio: Train:Test:Validation = 8:1.5:0.5
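A minimal sketch of the like/dislike binarization rule referenced above (how the five workers' ratings are aggregated around this step is handled by the ChatRec preprocessing scripts and is not shown here):

```python
def binarize(score: float) -> int:
    """Map a 1-5 human-predicted rating to the binary label used for training."""
    return 1 if score >= 3 else 0  # >= 3 -> like (1), < 3 -> dislike (0)
```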
# Download Tabidachi dataset and place in data/Tabidachi/annotation_data/
# Then preprocess the raw data
python src/Tabidachi/data_preprocessing.py

# Execute in order to create all dataset formats
python src/Tabidachi/create_dataset_1.py # Basic recommendation dataset
python src/Tabidachi/create_dataset_2.py # Score predictor training data
python src/Tabidachi/create_dataset_3.py # DPO data for dialogue summary generation
python src/Tabidachi/create_dataset_4.py # DPO data for item recommendation information

# Train DeBERTa with method selection
# Method options: proposal, baseline1, baseline2
# - proposal/baseline1: Uses 3 inputs (dialogue summary, item recommendation information, candidate info)
# - baseline2: Uses 2 inputs (dialogue summary, candidate info)
python src/Tabidachi/train_deberta.py --method "proposal&baseline1"
# Or:
python src/Tabidachi/train_deberta.py --method baseline2
# Output: Model saved to src/Tabidachi/deberta_best_model_[method]/

# Train dialogue summary generation model with hyperparameter optimization
python src/Tabidachi/dpo_summary_llm.py # Creates model 1
python src/Tabidachi/dpo_summary_llm_more.py # Creates models 2-5
# Output: Models saved as dpo-summary-results_1/, dpo-summary-results_2/, etc.
# Train item recommendation information generation model with hyperparameter optimization
python src/Tabidachi/dpo_recommendation_llm.py # Creates model 1
python src/Tabidachi/dpo_recommendation_llm_more.py # Creates models 2-5
# Output: Models saved as dpo-recommendation-results_1/, dpo-recommendation-results_2/, etc.

# Proposed method (using DPO-trained models)
python src/Tabidachi/create_recommend_data_proposal.py
# Baseline methods for comparison
python src/Tabidachi/create_recommend_data_baseline1.py # baseline1 = SumRec (from paper)
python src/Tabidachi/create_recommend_data_baseline2.py # baseline2 = Baseline (from paper)
# Ablation studies (optional)
python src/Tabidachi/create_recommend_data_ablation1.py # w/o Rec-DPO
python src/Tabidachi/create_recommend_data_ablation2.py # w/o Sum-DPO

# Evaluate with method selection
# For proposal/ablation methods: automatically evaluates models 1-5 and computes average
# For baseline methods: evaluates single dataset
# Proposed method (evaluates all 5 models)
python src/Tabidachi/evaluate_from_recommend_data.py --method proposal
# Baseline methods (single dataset each)
python src/Tabidachi/evaluate_from_recommend_data.py --method baseline1
python src/Tabidachi/evaluate_from_recommend_data.py --method baseline2
# Ablation studies (evaluates all 5 models)
python src/Tabidachi/evaluate_from_recommend_data.py --method ablation1
python src/Tabidachi/evaluate_from_recommend_data.py --method ablation2
# Output: HR@k and MRR@k metrics for selected method

# Download ChatRec dataset and place in data/ChatRec/chat_and_rec/
# Then preprocess the raw data
python src/ChatRec/data_preprocessing.py

# Execute in order
python src/ChatRec/create_dataset_1.py # Basic recommendation dataset
python src/ChatRec/create_dataset_2.py # Score predictor training data
python src/ChatRec/create_dataset_3.py # DPO data for dialogue summary generation
python src/ChatRec/create_dataset_4.py # DPO data for item recommendation information

# Train DeBERTa with method selection
# Method options: proposal, baseline1, baseline2
# - proposal/baseline1: Uses 3 inputs (dialogue summary, item recommendation information, candidate info)
# - baseline2: Uses 2 inputs (dialogue summary, candidate info)
python src/ChatRec/train_deberta.py --method "proposal&baseline1"
# Or:
python src/ChatRec/train_deberta.py --method baseline2
# Output: Model saved to src/ChatRec/ChatRec_deberta_best_model_[method]/

# DPO Training
python src/ChatRec/dpo_summary_llm.py # Creates model 1
python src/ChatRec/dpo_summary_llm_more.py # Creates models 2-5
python src/ChatRec/dpo_recommendation_llm.py # Creates model 1
python src/ChatRec/dpo_recommendation_llm_more.py # Creates models 2-5

# Generate recommendation data
python src/ChatRec/create_recommend_data_proposal.py # Proposed method
python src/ChatRec/create_recommend_data_baseline1.py # baseline1 = SumRec (from paper)
python src/ChatRec/create_recommend_data_baseline2.py # baseline2 = Baseline (from paper)
# Evaluate performance with method selection
# For proposal: automatically evaluates models 1-5 and computes average
# For baseline methods: evaluates single dataset
# Proposed method (evaluates all 5 models)
python src/ChatRec/evaluate_from_recommend_data.py --method proposal
# Baseline methods (single dataset each)
python src/ChatRec/evaluate_from_recommend_data.py --method baseline1
python src/ChatRec/evaluate_from_recommend_data.py --method baseline2

# Train model with different data split for crowdworker evaluation
python src/Tabidachi/dpo_summary_llm_cloudworker.py
# Output: Model saved as src/Tabidachi/dpo-summary-results_cloudworker/

# For baseline method (SumRec - without DPO)
python src/Tabidachi/create_cloudworker_dataset.py --method baseline1
# Output: data/Tabidachi/cloudworker-dataset-baseline1/
# For proposed method (with DPO-trained models)
python src/Tabidachi/create_cloudworker_dataset.py --method proposal
# Output: data/Tabidachi/cloudworker-dataset-proposal/
# Note: The script automatically selects appropriate models:
# - baseline1: Uses base Llama-Swallow models without DPO
# - proposal: Uses DPO-trained models (dpo-recommendation-results_1 and dpo-summary-results_cloudworker)

- Distribute generated datasets to crowdworkers
- Collect preference ratings and qualitative feedback
- Save results in `cloud_worker_tabidachi_datasets.xlsx`
# Analyze crowdworker evaluation results
python metrics_sentence.py
# Computes:
# - Average character length of summaries and recommendations
# - Distinct-1/2 scores for text diversity
# - BLEU and ROUGE scores for text similarity
# - Statistical significance tests

- Baseline 1 (baseline1 in code = SumRec in paper): Original SumRec method without DPO optimization
- Baseline 2 (baseline2 in code = Baseline in paper): Simple baseline using only dialogue summaries without item recommendation information
- Proposed Method: Full implementation with DPO for both summarization and recommendation
Note: In the source code, baseline1 corresponds to "SumRec" and baseline2 corresponds to "Baseline" as described in the paper.
- w/o Rec-DPO: Proposed method without DPO for item recommendation information generation
- w/o Sum-DPO: Proposed method without DPO for dialogue summary generation
- Hit Rate (HR)@k: Proportion of test cases where correct item appears in top-k recommendations
- Mean Reciprocal Rank (MRR)@k: Average of reciprocal ranks of correct items
- Evaluated at k = {1, 3, 5}
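As a reference, HR@k and MRR@k can be computed from ranked candidate lists as sketched below; the repository's own implementation is in `evaluate_from_recommend_data.py`, so this is only an illustration of the metric definitions.

```python
# Reference sketch of HR@k and MRR@k. `ranked` is a list of candidate IDs sorted
# by predicted score (best first); `relevant` is the set of ground-truth items.
def hit_rate_at_k(ranked, relevant, k):
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def mrr_at_k(ranked, relevant, k):
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

# Averaging over test cases gives the reported HR@k / MRR@k, e.g.:
cases = [(["a", "b", "c"], {"b"}), (["x", "y", "z"], {"z"})]
hr3 = sum(hit_rate_at_k(r, g, 3) for r, g in cases) / len(cases)   # 1.0
mrr3 = sum(mrr_at_k(r, g, 3) for r, g in cases) / len(cases)       # (1/2 + 1/3) / 2 ≈ 0.42
```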
- Human preference ratings comparing baseline and proposed methods
- Text quality assessment for naturalness and informativeness
- Analyzed using `metrics_sentence.py` for statistical validation
- Tool: Optuna for automatic hyperparameter search (a minimal search sketch follows this list)
- Optimized Parameters:
- Learning rate: [1e-7 to 5e-5]
- Batch size: [1, 2, 4, 8]
- DPO Ξ² parameter: [0.01 to 0.5]
- Optimization Trials: 5 trials per model
- Selection Criteria: Best validation performance
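A minimal Optuna sketch of the search space above; the objective shown is a stand-in, whereas the actual DPO scripts run a full training with the sampled values and return the validation performance.

```python
import optuna

def run_dpo_training(learning_rate: float, batch_size: int, beta: float) -> float:
    """Hypothetical stand-in: the real scripts train a DPO model with these
    hyperparameters and return its validation performance."""
    return 0.0  # placeholder so the sketch is executable

def objective(trial: optuna.Trial) -> float:
    learning_rate = trial.suggest_float("learning_rate", 1e-7, 5e-5, log=True)  # range listed above
    batch_size = trial.suggest_categorical("batch_size", [1, 2, 4, 8])
    beta = trial.suggest_float("beta", 0.01, 0.5)                               # DPO beta range
    return run_dpo_training(learning_rate, batch_size, beta)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=5)  # 5 trials per model
print(study.best_params)
```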
- Tabidachi Corpus: Superior performance across all metrics (HR@1,3,5, MRR@1,3,5) compared to existing methods
- ChatRec Corpus: Consistently achieved best performance in MRR
- DPO training for dialogue summary generation is particularly important: Confirmed through ablation studies
- Qualitative improvement in generated text: More detailed text generation focusing on recommendation-related information
- Improvement in top ranks: Notable performance improvement especially in HR@1, MRR@1
├── src/
│   ├── Tabidachi/                             # Main implementation for Tabidachi
│   │   ├── data_preprocessing.py              # Raw data preprocessing
│   │   ├── create_dataset_1.py                # Basic recommendation dataset creation
│   │   ├── create_dataset_2.py                # Score predictor dataset creation
│   │   ├── create_dataset_3.py                # DPO dataset for dialogue summary generation
│   │   ├── create_dataset_4.py                # DPO dataset for item recommendation information
│   │   ├── train_deberta.py                   # DeBERTa score predictor training
│   │   ├── inference_deberta.py               # DeBERTa inference utilities
│   │   ├── dpo_summary_llm.py                 # DPO training for dialogue summary generation model (creates model 1)
│   │   ├── dpo_summary_llm_more.py            # Additional DPO training for dialogue summary generation models (creates models 2-5)
│   │   ├── dpo_summary_llm_cloudworker.py     # DPO training for crowdworker eval
│   │   ├── dpo_recommendation_llm.py          # DPO training for item recommendation information generation model (creates model 1)
│   │   ├── dpo_recommendation_llm_more.py     # Additional DPO training for item recommendation information generation models (creates models 2-5)
│   │   ├── create_recommend_data_proposal.py  # Proposed method recommendation
│   │   ├── create_recommend_data_baseline1.py # Baseline 1 (SumRec)
│   │   ├── create_recommend_data_baseline2.py # Baseline 2 (Simple)
│   │   ├── create_recommend_data_ablation1.py # Ablation: w/o Rec-DPO
│   │   ├── create_recommend_data_ablation2.py # Ablation: w/o Sum-DPO
│   │   ├── create_cloudworker_dataset.py      # Crowdworker evaluation dataset
│   │   ├── evaluate_from_recommend_data.py    # HR@k, MRR@k evaluation
│   │   ├── oss_llm.py                         # Base LLM wrapper
│   │   ├── rec_model.py                       # Item recommendation information generation model wrapper
│   │   ├── summary_model.py                   # Dialogue summary generation model wrapper
│   │   └── [utility scripts]                  # create_csv*.py, count_*.py, etc.
│   └── ChatRec/                               # ChatRec implementation
│       ├── data_preprocessing.py              # Raw data preprocessing
│       ├── create_dataset_1.py                # Basic recommendation dataset
│       ├── create_dataset_2.py                # Score predictor dataset
│       ├── create_dataset_3.py                # DPO dataset for dialogue summary generation
│       ├── create_dataset_4.py                # DPO dataset for item recommendation information
│       ├── train_deberta.py                   # DeBERTa training
│       ├── inference_deberta.py               # DeBERTa inference
│       ├── dpo_summary_llm.py                 # DPO training for dialogue summary generation model (creates model 1)
│       ├── dpo_summary_llm_more.py            # Additional DPO training for dialogue summary generation models (creates models 2-5)
│       ├── dpo_recommendation_llm.py          # DPO training for item recommendation information generation model (creates model 1)
│       ├── dpo_recommendation_llm_more.py     # Additional DPO training for item recommendation information generation models (creates models 2-5)
│       ├── create_recommend_data_proposal.py  # Proposed method
│       ├── create_recommend_data_baseline1.py # Baseline 1
│       ├── create_recommend_data_baseline2.py # Baseline 2
│       ├── evaluate_from_recommend_data.py    # Evaluation
│       ├── oss_llm.py                         # Base LLM wrapper
│       ├── rec_model.py                       # Item recommendation information generation model
│       ├── summary_model.py                   # Dialogue summary generation model
│       └── [utility scripts]                  # count_*.py, logger.py
├── data/
│   ├── Tabidachi/                             # Tabidachi datasets
│   │   ├── annotation_data/                   # Original downloaded dataset
│   │   ├── processed_data/                    # Preprocessed data
│   │   ├── datasets_1/                        # Recommendation datasets
│   │   ├── datasets_2/                        # Score predictor training datasets
│   │   ├── datasets_3/                        # DPO datasets for summarization
│   │   ├── datasets_4/                        # DPO datasets for recommendation
│   │   ├── recommend_data_*/                  # Generated recommendation results
│   │   └── cloudworker-dataset*/              # Crowdworker evaluation data
│   └── ChatRec/                               # ChatRec datasets
│       ├── chat_and_rec/                      # Original downloaded dataset
│       ├── processed_data/                    # Preprocessed data
│       ├── datasets_1/                        # Recommendation datasets
│       ├── datasets_2/                        # Score predictor training datasets
│       ├── datasets_3/                        # DPO datasets for summarization
│       ├── datasets_4/                        # DPO datasets for recommendation
│       └── recommend_data_*/                  # Generated recommendation results
├── images/                                    # Documentation images
│   ├── proposal_flow.pdf                      # Method flow diagram (PDF)
│   └── proposal_flow.png                      # Method flow diagram (PNG)
├── metrics_sentence.py                        # Text quality analysis tool
├── cloud_worker_tabidachi_datasets.xlsx       # Crowdworker evaluation results
├── requirements.txt                           # Python dependencies
└── README.md                                  # This file
- `data_preprocessing.py`: Converts raw annotation data into a structured format for experiments
- `create_dataset_1.py`: Creates the basic recommendation dataset with dialogue-candidate pairs
- `create_dataset_2.py`: Generates training data for the score predictor (DeBERTa)
- `create_dataset_3.py`: Creates preference pairs for dialogue summary generation DPO training
- `create_dataset_4.py`: Creates preference pairs for item recommendation information DPO training
- `train_deberta.py`: Trains the DeBERTa-based score predictor for recommendation scoring
  - Supports the command-line argument `--method [proposal&baseline1|baseline2]`
  - Automatically configures METHOD_FLAG: True for proposal&baseline1, False for baseline2
  - Creates method-specific output directories
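A rough sketch of the input difference between the two settings; the exact prompt template, separators, and truncation are defined inside `train_deberta.py`, so the builder below is a hypothetical illustration of the 3-input vs. 2-input distinction only.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("globis-university/deberta-v3-japanese-large")

def build_input(summary: str, candidate_info: str, rec_info: str | None = None) -> dict:
    """Hypothetical input builder: proposal/baseline1 concatenate dialogue summary,
    item recommendation information, and candidate info; baseline2 omits the
    item recommendation information."""
    parts = [summary, candidate_info] if rec_info is None else [summary, rec_info, candidate_info]
    return tokenizer(tokenizer.sep_token.join(parts), truncation=True)
```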
- `dpo_summary_llm.py`: DPO training for the dialogue summary generation model with Optuna hyperparameter optimization (creates model 1)
- `dpo_summary_llm_more.py`: Additional DPO training for dialogue summary generation models (creates models 2-5)
- `dpo_summary_llm_cloudworker.py`: Special training for crowdworker evaluation with a different data split
- `dpo_recommendation_llm.py`: DPO training for the item recommendation information generation model with Optuna (creates model 1)
- `dpo_recommendation_llm_more.py`: Additional DPO training for item recommendation information generation models (creates models 2-5)
- `create_recommend_data_proposal.py`: Generates recommendations using the proposed DPO-trained models
- `create_recommend_data_baseline1.py`: baseline1 = SumRec implementation from the paper (no DPO)
- `create_recommend_data_baseline2.py`: baseline2 = Baseline from the paper (simple method without item recommendation information)
- `create_recommend_data_ablation1.py`: Ablation study without recommendation DPO (w/o Rec-DPO)
- `create_recommend_data_ablation2.py`: Ablation study without summarization DPO (w/o Sum-DPO)
- `create_cloudworker_dataset.py`: Creates the specialized dataset for human evaluation
- `evaluate_from_recommend_data.py`: Computes HR@k and MRR@k metrics for all methods
  - Supports the command-line argument `--method` for method selection
  - For Tabidachi: proposal, baseline1, baseline2, ablation1, ablation2
  - For ChatRec: proposal, baseline1, baseline2
  - Automatically reads datasets 1-5 for proposal/ablation methods
- `metrics_sentence.py`: Analyzes text quality with multiple metrics:
  - Average character length of summaries and recommendations
  - Distinct-1/Distinct-2 scores for diversity measurement
  - BLEU and ROUGE scores for text similarity
  - Processes crowdworker evaluation results from `cloud_worker_tabidachi_datasets.xlsx`
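For reference, a sketch of the Distinct-n diversity score (unique n-grams divided by total n-grams over the generated texts); `metrics_sentence.py` may tokenize differently (e.g. with a Japanese tokenizer), so this is illustrative only.

```python
def distinct_n(texts: list[str], n: int) -> float:
    """Unique n-grams divided by total n-grams across all texts (whitespace tokens)."""
    total, unique = 0, set()
    for text in texts:
        tokens = text.split()
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

# distinct_n(summaries, 1) -> Distinct-1; distinct_n(summaries, 2) -> Distinct-2
```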
- `oss_llm.py`: Base wrapper for the Llama-3.1-Swallow model
- `rec_model.py`: Wrapper for the item recommendation information generation model
- `summary_model.py`: Wrapper for the dialogue summary generation model
- `inference_deberta.py`: Utilities for DeBERTa score prediction
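As a rough illustration of the generation side, the snippet below loads the base model with Hugging Face Transformers and generates a summary from a placeholder prompt; the actual wrappers define their own (Japanese) prompt templates and generation settings, so treat the prompt and parameters here as assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model used by the wrappers (see "Models Used" above).
model_name = "tokyotech-llm/Llama-3.1-Swallow-8B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Placeholder prompt; the real prompt templates live in the wrapper classes.
prompt = "Summarize the following dialogue:\n<dialogue text>\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```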
# 1. Prepare environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 2. Download and place dataset
# Download from https://www.nii.ac.jp/dsc/idr/rdata/Tabidachi/
# Place in data/Tabidachi/annotation_data/
# 3. Run complete pipeline
cd src/Tabidachi
bash ../../scripts/run_tabidachi_experiments.sh # If script exists
# Or run manually:
python data_preprocessing.py
python create_dataset_1.py && python create_dataset_2.py
python create_dataset_3.py && python create_dataset_4.py
python train_deberta.py --method "proposal&baseline1"  # or baseline2
python dpo_summary_llm.py # Creates model 1
python dpo_summary_llm_more.py # Creates models 2-5
python dpo_recommendation_llm.py # Creates model 1
python dpo_recommendation_llm_more.py # Creates models 2-5
python create_recommend_data_proposal.py
python create_recommend_data_baseline1.py
python create_recommend_data_baseline2.py
python evaluate_from_recommend_data.py --method proposal # Evaluate proposed method

# 1. Download and place dataset
# Clone from https://github.com/Ryutaro-A/SumRec
# Place data in data/ChatRec/chat_and_rec/
# 2. Run complete pipeline
cd src/ChatRec
python data_preprocessing.py
python create_dataset_1.py && python create_dataset_2.py
python create_dataset_3.py && python create_dataset_4.py
python train_deberta.py --method "proposal&baseline1"  # or baseline2
python dpo_summary_llm.py # Creates model 1
python dpo_summary_llm_more.py # Creates models 2-5
python dpo_recommendation_llm.py # Creates model 1
python dpo_recommendation_llm_more.py # Creates models 2-5
python create_recommend_data_proposal.py
python create_recommend_data_baseline1.py
python create_recommend_data_baseline2.py
python evaluate_from_recommend_data.py --method proposal # Evaluate proposed method

- DeBERTa: `src/Tabidachi/deberta_best_model_proposal&baseline1/` or `deberta_best_model_baseline2/`
- DPO Summary: `src/Tabidachi/dpo-summary-results_[1-5]/`
- DPO Recommendation: `src/Tabidachi/dpo-recommendation-results_[1-5]/`
- Cloudworker Model: `src/Tabidachi/dpo-summary-results_cloudworker/`
- DeBERTa: `src/ChatRec/ChatRec_deberta_best_model_proposal&baseline1/` or `ChatRec_deberta_best_model_baseline2/`
- DPO Summary: `src/ChatRec/dpo-summary-results_[1-5]/`
- DPO Recommendation: `src/ChatRec/dpo-recommendation-results_[1-5]/`
- Tabidachi recommendations: `data/Tabidachi/recommend_data_[method]/`
- ChatRec recommendations: `data/ChatRec/recommend_data_[method]/`
- Evaluation results: Console output and wandb logs
- GPU: Nvidia A100 80GB Γ 4
- Training Times (with 4 Γ A100 80GB):
- Llama-3.1-Swallow-8B DPO: ~24 hours per epoch (single epoch training)
- DeBERTa: ~4 hours per epoch (hyperparameter tuning with 1, 4, and 10 epochs)
To be added
Note: This implementation is created for research purposes. For commercial use, please verify the licenses of each model and dataset.