Enhancing AI Recommendations: A Study on ChatGPT’s Conversational Refinement and Bias Mitigation

Prompt design largely determines how precise and relevant the results are when working with chatbot AIs such as ChatGPT and Character AI. A recent paper, “ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback” by Kyle Dylan Spurlock, Cagla Acun, and Esin Saka, analyzes how Large Language Models (LLMs) like ChatGPT can enhance recommendation systems. It evaluates ChatGPT as a top-n conversational recommendation system and explores strategies to improve recommendation relevancy and mitigate popularity bias.

The study also reviews the current state of automated recommendation systems. It argues that existing models are limited because they lack direct user interaction and interpret their data only superficially, and that the conversational abilities of LLMs like ChatGPT can make interaction with AI systems more intuitive and user-friendly.


The methodology has six components:

Data Source: The study uses the HetRec2011 dataset, an extension of the MovieLens10M dataset with additional movie information from IMDB and Rotten Tomatoes.

Content Analysis: Different levels of content are created for movie embeddings, ranging from basic information to detailed Wikipedia data, to analyze the impact of content depth on recommendation relevancy.

User and Item Selection: The study uses a small, representative user sample to minimize variance and ensure reproducibility.

Prompt Creation: Different prompting strategies, including zero-shot, one-shot, and Chain-of-Thought (CoT), are employed to guide ChatGPT in recommendation generation.

Relevancy Matching: The relevancy of recommendations to user preferences is a key focus, with feedback used to refine ChatGPT’s outputs.

Evaluation: The study employs various metrics, such as Precision, nDCG, and MAP, to evaluate the quality of recommendations.
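To make the evaluation step concrete, here is a minimal sketch of the three named metrics in Python. It assumes binary relevance (an item is either relevant to the user or not) and a fixed cutoff k; the function names are illustrative and the paper's exact formulations may differ.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def average_precision_at_k(recommended, relevant, k):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)  # best achievable number of hits in the top k
    return total / denom if denom else 0.0

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance nDCG: DCG of the list divided by the DCG of an ideal list."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Toy example with made-up item IDs: two of four recommendations are relevant.
recommended = ["A", "B", "C", "D"]
relevant = {"A", "C"}
print(precision_at_k(recommended, relevant, 4))
print(average_precision_at_k(recommended, relevant, 4))
print(ndcg_at_k(recommended, relevant, 4))
```

MAP is then the mean of `average_precision_at_k` over all evaluated users. Unlike Precision, nDCG and MAP reward placing relevant items near the top of the list.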


The paper conducts experiments to answer three research questions:

Impact of Conversation on Recommendation: Analyzing how ChatGPT’s conversational ability influences its recommendation effectiveness.

Performance as a Top-n Recommender: Comparing ChatGPT’s performance to baseline models in typical recommendation scenarios.

Popularity Bias in Recommendations: Investigating ChatGPT’s tendency towards popularity bias and strategies to mitigate it.
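The idea behind the first question, refining recommendations by reprompting with feedback, can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: `ask_model` is a hypothetical stand-in for a call to ChatGPT that returns a list of titles, `is_relevant` is a hypothetical relevancy check, and the prompt wording is invented for the example.

```python
def build_initial_prompt(liked_titles, n):
    """Hypothetical first prompt: describe the user's tastes and ask for n titles."""
    liked = "; ".join(liked_titles)
    return f"I enjoyed these movies: {liked}. Recommend {n} movies I have not listed."

def build_reprompt(good, bad, n):
    """Hypothetical follow-up prompt: tell the model which suggestions worked."""
    parts = []
    if good:
        parts.append("These recommendations were good: " + "; ".join(good) + ".")
    if bad:
        parts.append("These were not relevant: " + "; ".join(bad) + ".")
    parts.append(f"Recommend {n} different movies, avoiding all titles mentioned so far.")
    return " ".join(parts)

def refine(ask_model, liked_titles, is_relevant, n=5, rounds=2):
    """Ask once, then reprompt up to `rounds` times using relevancy feedback."""
    prompt = build_initial_prompt(liked_titles, n)
    accepted = []
    for _ in range(rounds + 1):
        recs = ask_model(prompt)
        good = [title for title in recs if is_relevant(title)]
        bad = [title for title in recs if not is_relevant(title)]
        accepted.extend(title for title in good if title not in accepted)
        if not bad:  # nothing to correct, so stop early
            break
        prompt = build_reprompt(good, bad, n)
    return accepted
```

In an offline evaluation, `is_relevant` would be derived from held-out user data rather than from a live user, which is what makes the loop reproducible.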

Key Findings and Implications

The study highlights several key findings:

Content Depth’s Influence: Introducing more content in embeddings improves the discriminative ability of the model, though a limit exists to this improvement.

ChatGPT vs. Baseline Models: ChatGPT performs comparably to traditional recommender systems, underscoring its robust domain knowledge in zero-shot tasks.

Managing Popularity Bias: Modifying prompts to seek less popular recommendations significantly improves novelty, indicating a strategy to counteract popularity bias. However, this approach involves a trade-off between novelty and performance.
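One common way to quantify novelty is mean self-information, where an item rated by fewer users scores higher. The sketch below assumes that definition; the article does not say which novelty measure the paper uses, and the prompt suffix in `add_novelty_instruction` is an invented example of asking for less popular titles.

```python
import math

def mean_self_information(recommended, rating_counts, num_users):
    """Average of -log2(share of users who rated each item); higher means less popular."""
    scores = []
    for item in recommended:
        count = rating_counts.get(item, 0)
        # Treat unrated items as rated once to avoid log(0).
        share = max(count, 1) / num_users
        scores.append(-math.log2(share))
    return sum(scores) / len(scores) if scores else 0.0

def add_novelty_instruction(prompt):
    """Hypothetical prompt modification that asks for less popular titles."""
    return prompt + " Prefer lesser-known titles over very popular ones."

# Made-up popularity data: one blockbuster and one niche title among 1024 users.
rating_counts = {"Hit": 512, "Niche": 4}
print(mean_self_information(["Hit"], rating_counts, 1024))
print(mean_self_information(["Niche"], rating_counts, 1024))
```

Comparing this score alongside an accuracy metric such as Precision, before and after the prompt change, is one way to observe the trade-off between novelty and performance.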


The paper presents a promising direction for incorporating conversational AI, like ChatGPT, in recommendation systems. It shows that refining recommendations through reprompting and feedback can improve relevancy, and that prompt changes can reduce popularity bias, while ChatGPT performs comparably to traditional models. This research contributes to the ongoing development of more intuitive, user-centric AI recommendation systems.
