Recommendation system with collaborative filtering
Recommendation Systems
Predicting what a customer might want to consume next has significant economic value. However, you can only suggest a limited number of items at a time, so it’s crucial that your recommendations are as relevant as possible.
Recommendation systems can be found almost everywhere — on video streaming platforms such as Netflix, Disney+, and YouTube; audio streaming services like Spotify; and e-commerce platforms including Amazon, AliExpress, and Bol.com. In most of these cases, the systems are highly advanced and made up of many different components.
In this article, we will demonstrate a simple recommendation system based on the playtime of games played by users on Steam.
Feel free to jump ahead to the recommendation system demo if you want.
Collaborative filtering vs content-based filtering
Recommendation systems can generally be divided into two main approaches:
- Collaborative filtering: you are likely to enjoy items that similar users have enjoyed.
- Content-based filtering: you are likely to enjoy items that share similar properties with those you have already consumed.
Collaborative filtering focuses on the relationships between users and items. It doesn’t require any knowledge about the items themselves; only information about user behaviour, such as ratings, purchases, or engagement. If two users have shown similar preferences in the past, the system assumes their future interests will also align.
In contrast, content-based filtering relies on the characteristics or features of the items. For example, in a movie recommendation system, it might use metadata such as genre, director, cast, or description. The system then recommends other items with similar attributes to what the user has already liked.
While collaborative filtering can uncover unexpected relationships between users and items, it struggles with new users or new items (the cold start problem). Content-based filtering, on the other hand, can make immediate recommendations for new users as long as their preferences are known; but it may fail to suggest diverse or surprising options outside the user’s existing interests.
Project plan
This project was carried out in several steps:
- Data cleaning (not shown in the visual)
- Data preparation: applying logarithmic transformation and IDF scaling
- Output 1: UMAP data visualisation
- Output 2: KNN applied on SVD features
These steps are broken down in the following sections.
Data
In this article, we will explore the power of collaborative filtering using data from Steam. After cleaning the dataset, we obtained information on 3,307 users and 4,851 games.
The dataset contains the playtime (in hours) per user per game, which represents each user’s level of interaction with a game. Throughout this article, I will refer to the consumer as customer, user, or player, and to the item as a game or product.
Based on this data, we can predict how much time a customer is likely to spend on a new game, given what other, similar customers have played.
The number of customers in this dataset is lower than ideal; ideally, we would have several times more users than items. Because of this imbalance, some recommendations may appear slightly unusual when a few users exhibit atypical behaviour. However, collecting and preparing this data was already quite time-consuming, so it will serve well enough for demonstration purposes.
| | user_id | game | playtime (hours) |
|---|---|---|---|
| 507 | 2280 | Undertale | 10.0 |
| 818 | 2206 | Half-Life 2: Deathmatch | 2.7 |
| 452 | 2280 | The Wolf Among Us | 26.8 |
| 368 | 95 | Tales of Berseria | 0.0 |
| 242 | 95 | The Witcher: Enhanced Edition | 9.2 |
| 929 | 2542 | Batman: Arkham Asylum GOTY Edition | 33.6 |
| 262 | 95 | Call of Duty: Modern Warfare Remastered (2017) | 4.9 |
| 810 | 2206 | Sid Meier's Civilization VI | 6.2 |
| 318 | 95 | Arma: Cold War Assault | 0.0 |
| 49 | 2486 | BioShock Remastered | 18.0 |
Data Preprocessing
Before building our recommendation model, we need to clean and prepare the data so that it’s more suitable for generating meaningful recommendations.
Removing Rarely Played Games
Our dataset contains many more games than users, and a large number of these games are rarely played. To ensure data quality, I applied two filters:
- Games must have an average playtime greater than 2 hours.
- Games must have been played by at least 5 players.
Additionally, a player must have played a game for at least 1 hour for it to be considered. While this threshold is minimal, even a small amount of playtime can provide useful information about player interest.
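As a rough sketch, these three filters can be expressed in pandas. The toy data below is made up for illustration; the article's actual cleaning code is not shown, so treat this as one plausible implementation:

```python
import pandas as pd

# Toy playtime records (user_id, game, playtime in hours); values invented.
df = pd.DataFrame({
    "user_id":  [1, 2, 3, 4, 5, 1, 2, 1],
    "game":     ["A", "A", "A", "A", "A", "B", "B", "C"],
    "playtime": [3.0, 5.0, 2.5, 4.0, 6.0, 0.5, 1.5, 10.0],
})

# 1. A user-game record only counts if the player spent at least 1 hour.
df = df[df["playtime"] >= 1.0]

# 2. Keep games with average playtime above 2 hours
#    and at least 5 distinct players.
stats = df.groupby("game")["playtime"].agg(["mean", "count"])
keep = stats[(stats["mean"] > 2.0) & (stats["count"] >= 5)].index
df = df[df["game"].isin(keep)]

print(df["game"].unique())  # only game "A" survives all three filters
```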
Transforming Playtime
Raw playtime data can be quite erratic. For example, if one game takes around 10 hours to complete and a user decides to replay it ten times, does that mean they like it ten times more than someone who finishes it once? Or if one user has 20 hours of free time each week and another has only 10 hours per month to spare, does that mean that the first user likes the game more than the second user?
These inconsistencies make raw playtime misleading. To reduce skewness and make the data more comparable, we apply a natural logarithmic transformation. In other words, consuming roughly 2.7 times more playtime increases the transformed value by +1. For instance, a player who spends 10 hours on a game receives a value of 2.3, while another who plays 100 hours gets 4.6. This transformation prevents heavy players from dominating the data and gives moderate players more balanced representation in the algorithm.
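A minimal sketch of the transform, assuming NumPy (zero-playtime entries were already filtered out, so a plain natural log is safe here):

```python
import numpy as np

hours = np.array([1.0, 10.0, 100.0])

# Natural log compresses heavy playtimes: each ~2.7x increase adds +1.
log_hours = np.log(hours)
print(np.round(log_hours, 1))  # [0.  2.3 4.6]
```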
Scaling Across Games and Players
You could also consider scaling playtime per game or per player. Some games naturally require more time to complete; for example, Portal might take around 20 hours, whereas an MMORPG can easily exceed 100 hours without reaching any final objective. Likewise, some players generally spend much more time gaming than others. Although the logarithmic transformation already mitigates this effect, we can apply per-user scaling to further normalise players onto the same scale.
I experimented with several scaling techniques on both players and games, including normalisation (max-abs scaling), standardisation (without centring around the mean), quantile scaling, and TF-IDF without the TF component (that is, using only inverse document frequency (IDF)).
Of these methods, only the IDF approach produced somewhat promising results. However, it behaves quite differently from traditional scaling methods; it boosts games that are more rarely played, giving additional weight to niche items that fewer users engage with.
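To make this concrete, here is one common IDF variant applied to a toy interaction matrix; the exact weighting used in the demo may differ:

```python
import numpy as np

# Binary user x game interaction matrix (1 = played); toy values.
played = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
])
n_users, _ = played.shape

# Classic IDF: rarely played games get a larger weight than ubiquitous ones.
idf = np.log(n_users / played.sum(axis=0))

# Re-weight each game column; column 0 (played by everyone) drops to zero.
weighted = played * idf
print(np.round(idf, 2))  # [0.   0.69 1.39]
```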
For this version, no additional scaling has been applied. This means that longer and shorter games, as well as players who play more or less, are treated equally in the dataset.
Visual recommendations
One way to explore which products might suit you best is by visually inspecting which items are used by the same customers. However, this becomes challenging when each customer interacts with hundreds or even thousands of products — the data then exists in an extremely high-dimensional space. To make it interpretable, we need to reduce it to fewer dimensions, preferably two, so that it can be plotted and visually explored.
Techniques such as PCA (Principal Component Analysis) or SVD (Singular Value Decomposition) can be used for dimensionality reduction, but they are limited to linear combinations of the original features. If you want clearer separation between groups of similar items, UMAP (Uniform Manifold Approximation and Projection) is a very good alternative. Although UMAP is less intuitive in how it arranges points compared to PCA or SVD, it generally creates more distinct clusters, which helps reveal patterns in user behaviour.
Below you can see a projection of the product space, where IDF was applied across players before performing UMAP dimensionality reduction. This combination produced the most distinct and visually meaningful groupings.
Checking for Similar Consumers
Another way to recommend games is by directly examining what similar users have played. A simple approach is to look at what a certain number of other people — who enjoyed the same games — have also played. This technique is known as the nearest neighbours method.
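A minimal sketch of the idea using scikit-learn's NearestNeighbors on a toy user-game matrix. The cosine metric is my assumption here, chosen because it compares taste profiles rather than total hours played; the demo's actual setup may differ:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy user x game playtime matrix (rows: users, columns: games).
X = np.array([
    [5.0, 0.0, 3.0, 0.0],
    [4.0, 0.0, 2.0, 1.0],
    [0.0, 6.0, 0.0, 4.0],
    [0.0, 5.0, 1.0, 3.0],
])

# Cosine distance ignores how *much* a user plays overall and
# focuses on which games they spend their time on.
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
dist, idx = nn.kneighbors(X[:1])
print(idx)  # user 0's nearest neighbours: itself, then user 1
```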
However, our dataset contains only about 3,307 users and 4,851 games. While that might sound substantial, it actually means that many possible user–game combinations are missing. As a result, this approach can be quite hit or miss, since the system often lacks enough overlap between users to make confident recommendations.
Checking for Similar Archetypes
Our data is very sparse, meaning it contains many zero values and only a few recorded interactions per user. This sparsity caused problems with the previous nearest neighbours approach, where it was difficult to find enough overlapping users.
However, many games are naturally similar, and similar players might have played slightly different ones. For example, perhaps you played Satisfactory while your friend played Factorio. Although these games are not identical, they are clearly more alike than, say, Counter-Strike 2.
So instead of comparing users directly, we can look for player archetypes — groups of players who share similar underlying preferences. We can create these archetypes through dimensionality reduction, compressing the consumer space into a smaller, latent “archetype” space. Once this transformation is applied, it becomes much easier to compare games and identify meaningful relationships between them.
Try it out below:
There are several design choices involved here — for example, which scaling method and which dimensionality reduction technique to use. Through experimentation, the combination of TF-IDF (without the TF component) and Singular Value Decomposition (SVD) with 27 dimensions produced the most intuitive results.
It’s also fun to experiment with combinations of games — when you select multiple games, the system calculates the mean vector of their embeddings, effectively blending their archetypal features to suggest related titles.
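The archetype pipeline and the blending trick can be sketched with scikit-learn's TruncatedSVD. This is toy data with 2 components instead of the 27 used in the demo, and the game titles are just reused from the Factorio example above:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy game x user playtime matrix (rows: games, columns: users).
games = ["Factorio", "Satisfactory", "Counter-Strike 2"]
X = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 0.0, 5.0, 6.0],
])

# Compress the user space into a small latent "archetype" space.
svd = TruncatedSVD(n_components=2, random_state=0)
emb = svd.fit_transform(X)  # each game becomes a dense archetype vector

# Blending: selecting several games averages their embeddings,
# then every title is ranked by cosine similarity to that mean vector.
query = emb[[0, 1]].mean(axis=0)
sims = cosine_similarity(query.reshape(1, -1), emb)[0]
print(games[int(np.argmax(sims))])  # one of the two factory games
```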
Discussion
The demo appears to work quite well overall. The visualisation is engaging, and the recommendations are reasonably good.
That said, there is still room for improvement:
- No content-based information has been incorporated yet.
- The number of users in the dataset should ideally be much larger.
- A non-linear dimensionality reduction technique, such as an autoencoder, might yield better results.
- Scaling games by expected engagement could also be beneficial if implemented carefully.
Of these points, the sample size is likely the most impactful. As with most machine learning models, the quality and quantity of data matter more than the complexity of the algorithm. No matter how sophisticated your model is, it cannot perform “magic” without enough meaningful data. Some of the recommendations still feel a bit random, and that’s probably due to the small dataset.
At some point, I’ll probably reuse this dataset in my machine learning course, which I’m currently developing. It’s a fun and practical dataset — an excellent example for illustrating recommendation techniques.
All in all, this prototype works well enough for the scope of this article. I hope you enjoyed it and perhaps even found it useful.