
Research Seminar

January 30, 2026

Location: SB 156

Time: 12:00 pm

Presenter: Hadley Wickham

LLMs for data science

Abstract: In this talk, I explore how we can integrate Large Language Models (LLMs) into data science workflows using R and the ellmer package. I start by addressing the reality that LLMs often kind of suck: they fail at simple computations like multiplying large numbers, they struggle to count characters in a word, their output is stochastic, and they rarely admit doubt. Despite these limitations, I demonstrate that they can still be incredibly useful for tasks ranging from creative writing and game generation to solving practical coding problems when we approach them with the right tools.

I focus on two main technical concepts to make these models reliable: structured data and tool calling. I show how to force LLMs to return structured R objects, like lists or data frames, rather than unstructured text, which allows us to perform tasks like extracting names and ages from data efficiently. I also explain “tool calling,” where we register R functions as tools—effectively giving the model a way to interact with the world. By defining tools, we can create agents that can check the date, perform accurate math, or even execute system commands like deleting files (though perhaps with a bit of caution!).

Finally, I discuss how LLMs are excellent for “jumpstarting” work and translating code. Whether it is translating SQL to dplyr, converting images to Shiny apps, or writing unit tests and documentation, LLMs can handle the “bad first draft” that compels us to fix and refine our code. I conclude by briefly touching on valid concerns regarding cost, environmental impact, and privacy, while emphasizing that for individual R developers, these tools offer a powerful way to accelerate our day-to-day programming tasks.


Hadley Wickham is Chief Scientist at Posit PBC, winner of the 2019 COPSS award, and a member of the R Foundation. He builds tools (both computational and cognitive) to make data science easier, faster, and more fun. His work includes packages for data science (like the tidyverse, which includes ggplot2, dplyr, and tidyr) and principled software development (e.g., roxygen2, testthat, and pkgdown). He is also a writer, educator, and speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.

A light lunch will be served.