Data Contamination or Genuine Generalization? Disentangling LLM Performance on Benchmarks