The `lm()` function in R

statistics

linear models

computation

Author

Ethan Tse

Published

December 21, 2025

Modified

December 24, 2025

Preamble

Linear models are among the most commonly used tools in biomedical research, and I think many researchers would benefit from a better understanding of such a fundamental tool.

Linear models also sit at the core of many statistical methods and machine learning models, which is why they appear so prominently in statistics and data science courses. Although they appear simple, they are deceptively deep, and many statisticians spend their entire careers learning and developing techniques to improve them.

Decades of methodological work have expanded their flexibility, allowing them to handle surprisingly complex experimental designs while remaining relatively easy to interpret. This balance between expressiveness and interpretability is a major reason they continue to be used so widely.

Throughout my training, I have generally been encouraged to start with linear models and to move on to more complex approaches only when simpler ones are clearly inadequate. In practice, especially in biomedical research, more sophisticated models often need to perform substantially better than linear models to justify the added complexity and loss of interpretability. This trade-off becomes particularly important when the goal is inference—understanding relationships in the data—rather than prediction.

In this post, I focus on the most basic case: linear regression, which can be viewed as a building block for more advanced linear modeling frameworks.

Because the emphasis here is on [computation], I will keep the theoretical discussion light. Readers interested in the statistical foundations of linear models can refer to upcoming posts in the [theory] series.

We will work primarily with the stats::lm() function in R (a language I find very well suited to classical statistical analysis), exploring how it behaves in practice, what its many arguments do, and how to interpret its output.

Linear models in R

Code

# load libraries
library(ggplot2)
library(titanic)

Code

# preview the first rows of the dataset

knitr::kable(head(titanic_train))

PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Ticket	Fare	Cabin	Embarked
1	0	3	Braund, Mr. Owen Harris	male	22	1	A/5 21171	7.2500		S
2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Thayer)	female	38	1	PC 17599	71.2833	C85	C
3	1	3	Heikkinen, Miss. Laina	female	26	0	STON/O2. 3101282	7.9250		S
4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35	1	113803	53.1000	C123	S
5	0	3	Allen, Mr. William Henry	male	35	0	373450	8.0500		S
6	0	3	Moran, Mr. James	male	NA	0	330877	8.4583		Q

Code

# fit a linear model

fit <- stats::lm(Fare ~ Age, data = titanic_train)
fit


Call:
stats::lm(formula = Fare ~ Age, data = titanic_train)

Coefficients:
(Intercept)          Age  
      24.30         0.35
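
The data preview above shows a missing Age (passenger 6), so it is worth checking how many rows a fit actually uses. Below is a minimal sketch on a small made-up data frame; the name `toy` and all of its values are invented for illustration and are not taken from the Titanic data. It assumes R's default settings, under which lm() drops rows that have a missing value in any model variable.

```r
# Toy data (hypothetical values, not from titanic_train); one Age is missing
toy <- data.frame(
  Age  = c(22, 38, 26, 35, NA, 54),
  Fare = c(7.25, 71.28, 7.93, 53.10, 8.46, 51.86)
)

# na.action = na.omit is spelled out here for clarity;
# it is also what R uses by default (see getOption("na.action"))
toy_fit <- stats::lm(Fare ~ Age, data = toy, na.action = na.omit)

coef(toy_fit)     # named vector with "(Intercept)" and "Age"
nobs(toy_fit)     # number of rows actually used in the fit
nrow(toy)         # number of rows in the data, including the incomplete one
summary(toy_fit)  # standard errors, t-tests, R-squared
```

The same accessors work on the `fit` object from this post, so comparing nobs(fit) with nrow(titanic_train) should reveal how many passengers were dropped because their Age is missing.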

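When we call stats::lm() in R, we are calling the lm() function from the stats package, which ships with base R. When we “fit” a linear model, we are estimating the coefficients (here, an intercept and a slope for Age) that minimize the sum of squared differences between the observed and fitted values of Fare.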
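
To make "fitting" more concrete, here is a rough sketch that reproduces lm()'s coefficients by solving the least-squares normal equations by hand. The data are simulated, and the variable names and the "true" values 24 and 0.35 are arbitrary choices for illustration. Note that lm() itself uses a QR decomposition internally rather than the normal equations, so this is an equivalent calculation, not a copy of its implementation.

```r
set.seed(1)  # reproducible simulated data (hypothetical, for illustration only)
n <- 100
x <- runif(n, min = 0, max = 80)
y <- 24 + 0.35 * x + rnorm(n, sd = 10)

# Fit with lm()
sim_fit <- stats::lm(y ~ x)

# Design matrix: a column of ones for the intercept, then the predictor
X <- cbind(1, x)

# Solve the normal equations (X'X) b = X'y for the coefficient vector b
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Compare with lm(); the two should agree up to floating-point error
all.equal(unname(coef(sim_fit)), as.vector(beta_hat))
```

For well-behaved data like this, both routes give the same estimates; the QR approach is generally preferred in practice because it is numerically more stable when predictors are strongly correlated.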

Disclaimer

This writeup represents my knowledge of the topic, and I by no means claim to be an expert.

Please email me with any comments.