The Trustworthiness of Large Language Models in Long Context Recall
Primary Investigator:
Julia (Taylor) Rayz
Yifei Hu
Abstract
Long-context question answering (QA) has become a common use case for generative large language models (LLMs). We challenge LLMs with a practical long-context recall and reasoning task: repeating specific sentences from a given long document. Our results suggest that even state-of-the-art LLMs still cannot perfectly recall the original text in this setting.
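To make the recall task concrete, the following is a minimal sketch of how a repeated sentence might be checked against the original. It is an illustration under stated assumptions, not the scoring procedure used in this work: the helper names `normalize` and `score_recall`, the whitespace normalization, the character-level similarity measure, and the sample sentences are all hypothetical.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences are not counted as errors."""
    return " ".join(text.split())


def score_recall(target: str, response: str) -> dict:
    """Compare a model response with the sentence it was asked to repeat.

    Assumption: exact match after whitespace normalization is the strict
    criterion; the similarity ratio is only a rough measure of partial recall.
    """
    t, r = normalize(target), normalize(response)
    return {
        "exact_match": t == r,
        "similarity": difflib.SequenceMatcher(None, t, r).ratio(),
    }


# Hypothetical document, split into sentences (invented for illustration).
document_sentences = [
    "The river froze early that year.",
    "Nobody in the village owned a boat.",
    "The mill kept running through the winter.",
]
target = document_sentences[1]

# A verbatim repetition (extra space only) versus a paraphrased one.
perfect = score_recall(target, "Nobody in the village  owned a boat.")
flawed = score_recall(target, "No one in the village owned a boat.")
```

Under these assumptions, a paraphrase such as the second response fails the exact-match check even though its meaning is preserved, which is the kind of imperfect recall the task is meant to expose.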