How I Built the World's Largest Database of LEGO Minifigure Ratings

6 November 2024 by Graham

It All Started with Mark Zuckerberg and FaceMash

Whenever someone asks me about Brickelo, which very rarely happens I must admit, I begin by asking them if they've seen The Social Network. Perhaps the coolest scene from that movie is the one where we see Mark Zuckerberg build FaceMash, a website for rating female students at Harvard.

While the website was very questionable, the ELO algorithm behind it has always interested me, as it seemed like the most scientific approach to generating a ranked list of items. Far better than getting people to rate something on a numerical scale or getting people to pick their top five etc.

The ELO algorithm was first devised to rank chess players and it works by increasing or decreasing a player's rating based on whether they win or lose a match. The size of the increase or decrease depends on how the player's rating compared with their opponent's prior to the match.

To give you an example from the tennis world, let's say Roger Federer plays the world number 100. Federer is arguably the greatest player of all time, so he would be expected to beat someone ranked much lower than himself, who is presumably much worse. Therefore, Federer's rating wouldn't increase much if he won the match, and likewise, his opponent's rating would only decrease slightly. Conversely, if the world number 100 beat Federer, the result suggests that the opponent is better than Federer and far above the level of his own current world ranking. As such, the opponent's rating would increase by a lot and Federer's would decrease by the same amount. Using this system, a player who's beaten 50 players ranked far below them can have a lower overall rating than a player who's played fewer matches but has beaten better opponents.
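To make the Federer example concrete, here is a minimal sketch of the standard Elo update. The scale factor of 400 and the K-factor of 32 are common chess conventions and are assumptions here, as are the example ratings of 2000 and 1500; the values Brickelo actually uses may well differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Score that A is expected to achieve against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one match."""
    # The less the winner was expected to win, the more points change hands.
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


# Hypothetical ratings: a strong favourite and a much lower-rated outsider.
favourite, outsider = 2000.0, 1500.0

# The favourite wins as expected: very few points change hands.
fav_after_win, out_after_loss = update_ratings(favourite, outsider)

# The upset: the outsider wins, so many points change hands.
out_after_upset, fav_after_upset = update_ratings(outsider, favourite)

print(f"Expected result: favourite gains {fav_after_win - favourite:.1f} points")
print(f"Upset: outsider gains {out_after_upset - outsider:.1f} points")
```

With these assumed numbers the favourite gains roughly 1.7 points for the expected win, while the outsider gains roughly 30.3 points for the upset. In both cases the loser drops by exactly the amount the winner gains.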

Proof of Concept

Outside of sports, I couldn't find many uses of the algorithm, so for a while I'd thought about applying it elsewhere. The world of LEGO was to provide me with a suitable opportunity. It's a hobby I'd been getting back into, largely because of my brother who was also undergoing a LEGO renaissance, and it had since become somewhat of an obsession. Together we built Brick Ranker, a website primarily aimed at tracking the value of LEGO sets and minifigures. It was while working on this project that I realised there was no definitive list of the “best” LEGO minifigures. On Brick Ranker we have lists of the most expensive minifigures, but are these people's favourites? I highly doubted there would be much correlation between the two and so the idea of putting together a robust list of the best LEGO minifigures was born. More than anything, I was curious to see how my personal favourites compared to everyone else's.

There are several community generated lists of the best LEGO sets, including our own at Brick Ranker, which is determined by our users' ratings of sets on a 10-point scale. Given that there are a lot more minifigures than sets, the disadvantage of using this system for minifigures is that it's likely that a large number of minifigures would go unrated. This is because most people will only rate something if they feel strongly about it (i.e., very good or very bad). Assigning a rating from 1 to 10 also takes time as people need to think about what each number means and potentially use their other ratings as a reference point. If you have thousands of minifigures to compare, you want users to be able to rate them as quickly as possible.

An ELO algorithm on the other hand seemed like a perfect approach, as unlike sets, a judgement on which of two minifigures is better can be made by just looking at them. Selecting your favourite of two minifigures is also simple and can be done very quickly. This low barrier to entry was going to be key to making the site a success because I would need a lot of data to generate robust results.

If we take a look at the Bricklink catalogue, there are around 17,000 minifigures at the time of writing. Not all of these are minifigures in the traditional sense, as according to LEGO, minifigures are those that use at least two of the three standard head, torso, and legs parts. This official definition technically rules out most Star Wars droids, skeletons, and baby minifigures to name but a few. More clear cut exclusions are Duplo figures, mini-doll figures (i.e. figures from LEGO Friends, Elves, etc), and brick-built figures.

Let's assume that if we exclude all these we're left with 10,000 minifigures for users to compare (although in reality this figure is much higher). This equates to just under 50 million different 1 vs. 1 combinations. To get robust ratings, we'd want each minifigure to be compared against every other minifigure multiple times, meaning upwards of 100 million user ratings would be needed. At the time of writing we have just over half a million ratings, meaning that in theory around 1% of the possible combinations have occurred. I soon realised this was one of the main challenges in using an ELO algorithm to rank such a large dataset.
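The number of distinct 1 vs. 1 matchups among n items is n(n-1)/2, which is easy to check. The short sketch below uses the round figures from the text (10,000 minifigures and 500,000 ratings); the function and variable names are just for illustration.

```python
import math


def pair_count(n_items: int) -> int:
    """Number of distinct unordered pairs that can be formed from n_items."""
    return math.comb(n_items, 2)


minifigures = 10_000  # assumed round figure from the text
ratings = 500_000     # "just over half a million" ratings

pairs = pair_count(minifigures)
# Upper bound on coverage: it assumes no matchup was ever shown twice.
coverage = ratings / pairs

print(f"{pairs:,} possible matchups")
print(f"At most {coverage:.1%} of matchups covered by {ratings:,} ratings")
```

This gives 49,995,000 possible matchups, so half a million ratings can cover at most about 1% of them, and less in practice because some pairs will have been shown more than once.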

However, generating that level of data was not unachievable. As I mentioned earlier, I made the barrier to entry as low as possible, so a user can visit the site and within five minutes have easily compared 50 minifigures. The potential audience that would be interested in using the tool is also very large. The LEGO subreddit has nearly 2 million members, and Brickset, one of the largest LEGO websites, received nearly 12 million visitors last year. Of course, I could only dream of that level of traffic for Brickelo, but my point is the LEGO community is large enough for this approach to work.

Designing Brickelo

With the proof of concept in place, the next step was to build the thing. I had no web development experience, except for some brief dabbling in HTML, CSS, and JavaScript. My input into Brick Ranker was primarily non-technical, so this would be my first solo project. My brother is a web developer by profession, so fortunately I could ask him for some steer. He recommended using a JavaScript framework called React, which has gained popularity in recent times, and is used by many large organisations. Amongst its benefits are its relative simplicity and its speed, so it seemed a good choice. Little did I know, there were large drawbacks that might have affected my choice of framework had I known about them from the beginning.

As I was starting from almost zero, it was a couple of months before I had the first version of the website ready to launch. A large part of that time was spent learning React and putting together the backend, but a considerable amount of time was also spent thinking about how the website should look. I settled on a dark theme as this brought the minifigures into focus. I also chose to present the user with the minimum amount of information when asking them to compare minifigures, as I wanted to remove as much bias as possible.

Initially, I planned to present users with two minifigures randomly selected from the whole database. This meant they would often be presented with quite obscure minifigures to choose between. I thought this might put some people off because if someone is only interested in Star Wars minifigures, they could get fed up if they don't see these, given how large the database was. I decided to add the ability to filter based on popular themes, such as Pirates of the Caribbean, The Lord of the Rings, and The Hobbit, so users could see their favourites more often.
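As a rough illustration of random matchup selection with an optional theme filter, here is a minimal sketch. The record layout, the placeholder figure names, and the `pick_matchup` function are all hypothetical; this is not how Brickelo itself is implemented, just one simple way the idea could work.

```python
import random

# Hypothetical records: placeholder names, with themes mentioned in the article.
MINIFIGURES = [
    {"name": "Figure A", "theme": "Star Wars"},
    {"name": "Figure B", "theme": "Star Wars"},
    {"name": "Figure C", "theme": "The Hobbit"},
    {"name": "Figure D", "theme": "The Lord of the Rings"},
    {"name": "Figure E", "theme": "Pirates of the Caribbean"},
]


def pick_matchup(minifigures, theme=None, rng=random):
    """Pick two distinct minifigures at random, optionally from a single theme."""
    pool = [m for m in minifigures if theme is None or m["theme"] == theme]
    if len(pool) < 2:
        raise ValueError("need at least two minifigures to form a matchup")
    # sample() draws without replacement, so a figure never faces itself.
    left, right = rng.sample(pool, 2)
    return left, right


# Unfiltered: any two figures from the whole (toy) database.
print(pick_matchup(MINIFIGURES))

# Filtered: both figures come from the chosen theme.
print(pick_matchup(MINIFIGURES, theme="Star Wars"))
```

Passing a seeded `random.Random` instance as `rng` makes the selection repeatable, which is handy for testing.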

By adding this feature, minifigures from the aforementioned themes would appear in far more matchups than average and therefore would have an advantage. Ideally, you want all minifigures to have undergone a similar number of matchups, because minifigures that have undergone more matchups may have a higher rating just because they've had more opportunities to accumulate points and not because they are better than those that have undergone fewer matchups. In theory, this is only a short-term problem, because there will be a threshold number of matchups, after which a minifigure's rating stabilises. I had no idea what this threshold number was though.
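The effect can be seen in a toy simulation. The sketch below assumes a minifigure that wins three of every four matchups against opponents whose rating is held fixed at 1500, with a K-factor of 32. All of these are simplifying assumptions for illustration, not Brickelo's real data or settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Score that A is expected to achieve against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def rating_after(matches: int, start: float = 1500.0,
                 opponent: float = 1500.0, k: float = 32.0) -> float:
    """Rating after a repeating win, win, win, loss pattern against a fixed opponent."""
    rating = start
    for i in range(matches):
        score = 0.0 if i % 4 == 3 else 1.0  # every fourth matchup is a loss
        rating += k * (score - expected_score(rating, opponent))
    return rating


# Same minifigure, same win rate, different numbers of matchups.
for n in (4, 40, 400, 800):
    print(f"after {n:>3} matchups: {rating_after(n):.0f}")
```

In this toy setup the rating after 40 matchups is well above the rating after 4, even though the win rate is identical, while the ratings after 400 and 800 matchups are practically the same. That is the stabilisation threshold described above, although where it falls for real data depends on the K-factor and on how varied the opponents are.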

I had a lot of fun in the testing phase of the project, which I thought was a good sign. I found making comparisons quite addictive and I realised it was a great tool for discovering minifigures. My LEGO knowledge was pretty good, but I was coming across minifigures I never knew existed. Once I'd made sure everything worked, it was time to let everyone else know about it…

Going Live!

Brickelo went live in August 2023, much to my pride and excitement. I believed I'd created something brilliant and naturally assumed it would instantly take off. However, August turned to November without any traffic whatsoever and I quickly learnt that you can build the best website in the world, but if you don't tell anyone about it, no one will use it.

I needed to get the word out somehow, but I didn't have any social media presence or the know-how or motivation to build one. I tried Hacker News, but my post went largely unnoticed. I tried emailing LEGO employees, which I thought was a genius tactic, and while it generated some traffic there was close to zero engagement. I found it odd and disenchanting that LEGO employees have no interest in LEGO. Reddit did however prove to be a happy hunting ground. I posted several times and on each occasion got a lot of positive and encouraging feedback. One post led to around 2,700 visitors and around 210,000 ratings.

As Brickelo celebrated its 1-year anniversary, there were a little over half a million ratings. This seems like a large number, but I was hoping the figure would be at least double that. However, it is enough to have produced some compelling results. The top 100 list also seemed to be very credible and I was pleased to see it wasn't dominated by the most expensive minifigures or those of the most popular characters. It does feel like there's an over-representation of minifigures from Pirates of the Caribbean, The Lord of the Rings, and The Hobbit, and it will be interesting to see if it stays that way.

The Road Ahead

To bring long-term success, I needed the site to be self-sustaining. Unfortunately, React isn't an SEO-friendly framework. Without going too much into the technical details, very little of the website's content is pre-rendered, meaning it's effectively invisible to search engines. This meant that, except for times when I posted on Reddit, I was getting around 10 users and a couple of hundred ratings on a fairly good day, and there'd been little change over time. This led me to give up a little on the project at times as it felt like flogging a dead horse, so since I launched Brickelo, I've mainly just kept it up-to-date and haven't rolled out many new features.

However, I cared too much to drop Brickelo completely so I went away and learnt about SEO and what I could do to make my site more visible, without the need to completely rebuild it. With a regular user base I believe Brickelo could become a powerful tool and I'd love to see what kind of results are produced from significantly more data. I have a lot more developments planned, and it feels like Brickelo is just at the beginning of its journey, so watch this space…

Final Thoughts

I'll finish by saying that I started this with the main aim of finding out if other AFOLs (adult fans of LEGO) shared my opinion on which LEGO minifigures are the best. My own list of favourites has evolved over the process of building Brickelo, but by and large I agree that most minifigures occupying the top 100 deserve their place there. Where exactly my top 10 rank is an article for another day…