
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions
Code
Collaboration
Evaluation
Reasoning
Tooling
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models
Reasoning
Pre-Training
Data
Interpretability
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Safety
Reasoning
Interpretability