Conformal prediction made easy with 👋puncc
🎯 This article aims to introduce newcomers to the topic of conformal prediction. Don’t worry; it might sound complicated, but it isn’t (if I do my job correctly, that is 👊 🔥).
I will intentionally avoid technical details to focus on providing the reader with definitions and concepts surrounding the questions of what conformal prediction is, and why one would want to use it.
🔍Uncertainty quantification
Conformal prediction is a family of uncertainty quantification techniques. In other words, it aims to answer this type of question:
“How confident am I that … ?”
It can be used in the context of various machine learning problems, but here, I will focus on the two most basic applications: classification and regression problems.
Notice that both are prediction problems (remember, we are doing conformal prediction). I will refer to the first as conformal classification, and to the second as conformal regression.
Now, what is the difference between these two flavours of conformal prediction?
- conformal classification makes it so that the final output is a set of classes (called a conformal prediction set) as opposed to a single class.
- conformal regression makes it so that the final output is an interval (called a conformal prediction interval) as opposed to a point-prediction.
You can see a trend emerging here. Conformal predictions do not yield a single definitive answer; instead, they consist of multiple plausible ones.
📚 Conformal story-time
It might be easier to understand with examples, so below are two short “conformal stories” (this is not a real word, so don’t quote me on that please 😸):
#1 — 🍕
Me: “What do I want to eat tonight?”
(Also) me: “A pizza.”
VS
Conformal me: “Hmm, a pizza, a hamburger or fried chicken would all be fine.”
The above can be seen as a conformal classification story, where the classes are food items, and the features are dietary restrictions, healthiness-related goals, etc…
#2 — 🚴
Me: “At what time will the food be delivered?”
[Your favourite food app]: “At 9 PM.”
VS
[Your favourite conformal food app]: “Hmm, between 8:50 PM and 9:10 PM.”
The above can be seen as a conformal regression story, where the target is an ETA (estimated time of arrival), and the features are the GPS position of the delivery worker handling the order, the state of the traffic, etc…
Now, why would I want to have a prediction set (or an interval in the context of a regression problem) as opposed to a single class (or a point-prediction)? If you are like me, you usually want a straight answer to your questions, not multiple plausible ones…
💰🏡 Coverage and conformal regression on the selling price of houses
There is still a big piece of the puzzle I have yet to uncover: the concept of “coverage”.
Conformal techniques ensure that the prediction sets (or intervals) contain the true class (or value) a chosen percentage of the time. This percentage is the coverage, and it is set by the user.
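For the mathematically inclined, this guarantee is usually written as below (a standard formulation, not specific to any library; α is the error level, so a 90% coverage corresponds to α = 0.1):

```latex
% Marginal coverage guarantee of conformal prediction:
% the prediction set C(X) contains the true target Y
% with probability at least 1 - alpha (e.g. at least 90% when alpha = 0.1).
\mathbb{P}\left( Y_{\text{new}} \in C(X_{\text{new}}) \right) \;\geq\; 1 - \alpha
```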
Once again, to explain with an example:
- The user wants to predict the selling price of a set of houses, using various data such as the location, the size of the house, etc…
- The user plugs a regression model (for instance, a linear model from sklearn) into a conformal regression algorithm and asks for a coverage of 90%.
- Provided that a set of conditions is met, the user will obtain prediction intervals that contain the true value 90% of the time*.
For instance: “I am confident that there is a 90% chance that this particular house’s selling price is between $400,000 and $600,000.” 😃
*In reality, the observed coverage (i.e. the ratio of intervals that truly contain the selling price to the overall number of houses evaluated) will be approximately 90% rather than exactly 90%. The nitty-gritty technical stuff that explains this discrepancy is beyond the scope of this article, so we won’t delve into it here.
For the sake of completeness, here is the set of conditions mentioned in the example above:
- One needs a holdout dataset of annotated data. This data is used to calibrate the model, so that it can produce conformal predictions.
- The calibration data above and the data on which we use our model to produce conformal predictions must be exchangeable*.
*If data points are independent and identically distributed (i.i.d.), they are exchangeable. In simpler and more practical terms, if the distribution of the calibration data does not differ too much from the distribution of the data on which we use our model to produce conformal predictions, you can safely use conformal prediction and get the desired coverage.
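To make all of this concrete, below is a minimal from-scratch sketch of one popular conformal method, split conformal prediction, applied to a house-pricing regression problem. It uses scikit-learn and NumPy; the dataset, the linear model and the split sizes are arbitrary choices for illustration, not a prescription.

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

alpha = 0.1  # error level; 1 - alpha = 90% target coverage

# Load a housing dataset and split it into fit / calibration / test sets
X, y = fetch_california_housing(return_X_y=True)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 1. Fit any point-prediction model on the fit set
model = LinearRegression().fit(X_fit, y_fit)

# 2. Compute nonconformity scores (absolute residuals) on the holdout calibration set
scores = np.abs(y_calib - model.predict(X_calib))

# 3. Take the (1 - alpha) quantile of the scores (with a finite-sample correction)
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# 4. Build prediction intervals around the point predictions on new data
y_pred = model.predict(X_test)
lower, upper = y_pred - q_hat, y_pred + q_hat

# 5. Check the observed coverage: it should be close to the requested 90%
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Observed coverage: {coverage:.2%}")
```

If the calibration and test data are exchangeable, as discussed above, the printed observed coverage will land around 90%, give or take the approximation mentioned earlier.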
Now that you have a more complete picture of what conformal prediction is, we can tackle the “why”.
🩺Why use conformal prediction
Conformal prediction becomes particularly valuable in situations where understanding and managing uncertainty are critical.
Consider a scenario in which a machine learning model is developed to predict the presence or absence of a certain medical condition based on patient data, such as medical imaging. In a conventional approach, the model might output a single prediction, say positive/negative.
Now, let’s introduce uncertainty quantification using conformal prediction. Instead of providing a single prediction, the model with conformal prediction provides a prediction set that comes with a measure of confidence (or uncertainty depending on how you see it).
Here’s how it might work in this scenario:
1. Conventional Approach — possible outcomes:
a/ Model predicts: “Patient has Condition X.”
b/ Model predicts: “Patient does not have Condition X.”
Notice that without further work, we do not have a sense of confidence attached to these predictions.
2. Conformal Prediction Approach — possible outcomes:
a/ Model predicts: “There is a 90% chance that the patient has Condition X.”
b/ Model predicts: “There is a 90% chance that the patient does not have Condition X.”
c/ Model predicts: “There is a 90% chance that the patient has Condition X or does not have it.” (prediction set of size 2)
d/ Model predicts: “…” (prediction set of size 0)
There are several interesting things to notice here.
Firstly, we have a sense of confidence attached to our predictions, which is nice.
Secondly, one can obtain a prediction set of size 2, which is quite useless, as you have surely realized!
Thirdly, one can obtain a prediction set of size 0! This peculiar behaviour usually happens when the requested coverage is too low relative to the performance of the model. I will perhaps address this phenomenon in more detail in another notebook.
This goes to show that conformal prediction is not without flaws. Ideally, one wants to get small — but not empty — prediction sets (or intervals), to have actionable insights.
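To see how such prediction sets, including the empty and size-2 ones, can arise in practice, here is a small from-scratch sketch using the simple “naive” softmax-score recipe on a synthetic binary classification problem. The dataset and the logistic regression model are arbitrary stand-ins for the medical scenario, chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

alpha = 0.1  # error level; 1 - alpha = 90% target coverage

# Synthetic binary problem (a stand-in for "Condition X" vs "no Condition X")
X, y = make_classification(n_samples=2000, n_informative=5, random_state=0)
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 1. Fit a probabilistic classifier on the fit set
clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)

# 2. Nonconformity score on the calibration set: 1 - probability of the true class
calib_proba = clf.predict_proba(X_calib)
scores = 1.0 - calib_proba[np.arange(len(y_calib)), y_calib]

# 3. Threshold = (1 - alpha) quantile of the scores (with a finite-sample correction)
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# 4. Prediction set = every class whose score stays below the threshold
#    (boolean matrix: one row per test sample, one column per class)
prediction_sets = (1.0 - clf.predict_proba(X_test)) <= q_hat

# Sets may contain 0, 1 or 2 classes, exactly as in the outcomes listed above
sizes = prediction_sets.sum(axis=1)
print("Prediction set sizes (count of size 0, 1, 2):", np.bincount(sizes, minlength=3))
print("Observed coverage:", prediction_sets[np.arange(len(y_test)), y_test].mean())
```

With a well-performing model and a modest coverage target, most sets end up containing a single class, which is exactly the actionable situation one hopes for.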
You should by now have a high-level understanding of what conformal prediction is.
Next, let’s get your hands dirty!
Getting your hands dirty with 👋puncc
👋puncc is an open-source Python library that provides various conformal prediction algorithms for several machine learning problems: classification, regression and anomaly detection.
To understand how to use it, I invite you to explore the following:
- Repository
- Documentation
- Quickstart
🕺 Thank you
Thank you for reading!
Please tell me if this helped and do not hesitate to leave remarks 😄
Before you leave… Going a step further ⚡️
If you want to get a deeper understanding of the topic, here are a few things you could consider doing:
- Watch this video
- Code by hand (no libraries) the following cases:
Conformal regression (Split CP, CQR, CV+, LACP, …) with a simple dataset such as California house pricing
Conformal classification (naive, APS, RAPS, …) with a simple dataset such as MNIST
- Read the following articles
Angelopoulos et al. 2022, https://arxiv.org/pdf/2107.07511.pdf, “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification”
Lei et al. 2018, https://arxiv.org/pdf/1604.04173.pdf, “Distribution-Free Predictive Inference For Regression”
Romano et al. 2019, https://arxiv.org/abs/1905.03222, “Conformalized Quantile Regression” (CQR)
Romano et al. 2020, https://arxiv.org/abs/2006.02544, “Classification with Valid and Adaptive Coverage” (APS)
Angelopoulos et al. 2020, https://arxiv.org/abs/2009.14193, “Uncertainty Sets for Image Classifiers using Conformal Prediction” (RAPS)
Barber et al. 2019, https://arxiv.org/abs/1905.02928, “Predictive inference with the jackknife+” (jackknife+, CV+)
Tibshirani et al. 2019, https://arxiv.org/abs/1904.06019, “Conformal Prediction Under Covariate Shift”