In the first part, we discussed what automatic difficulty adjustment algorithms can look like. Using seeds, we generated tile layouts for a level and ran an A/B test in which, based on a few hypotheses, we gave players the same level at different difficulty settings.
Based on the test results, we obtained some metrics. Let's see what that might look like.
I should mention right away that I am not showing real levels, metrics, or problems; they are all made up. My goal is to show how to act in a given situation and what to pay attention to.
What Metrics Should We Look At?
I'm sure your company has its own set of metrics that you constantly monitor, but for analyzing match3 level A/B tests, the following are sufficient:
- Monetization — the amount of coins (in-game currency or its real-money equivalent) spent to pass the level. We count not only real-money purchases but all bonuses and coins spent on the level, then divide the total by the number of players. This gives a per-player measure of the level's monetization.
- Difficulty — each company has its own approach; this can be attempts per level completion, win rate, or fail rate. Measure difficulty separately for attempts played without bonuses or extra moves and for attempts played with them.
- Churn (7-day) — the percentage of players who started the level and did not return to the game within the following 7 days (a 3-day or 5-day window also works).
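To make the three metrics concrete, here is a minimal sketch of how they could be computed from per-player data for one level. The `PlayerLevelStats` schema and all field names are my assumptions for illustration, not a real analytics format, and difficulty is expressed here as win rate, which is only one of the options listed above.

```python
from dataclasses import dataclass


@dataclass
class PlayerLevelStats:
    """One player's aggregated activity on one level (hypothetical schema)."""
    coins_spent: float            # all coins and bonuses spent, in coin equivalent
    attempts_plain: int           # attempts without bonuses or extra moves
    wins_plain: int
    attempts_boosted: int         # attempts with bonuses or extra moves
    wins_boosted: int
    returned_within_7_days: bool  # came back to the game within the churn window


def win_rate(wins: int, attempts: int) -> float:
    """Share of attempts that ended in a win; 0.0 when there were no attempts."""
    return wins / attempts if attempts else 0.0


def level_metrics(players: list[PlayerLevelStats]) -> dict[str, float]:
    """Compute monetization, difficulty (as win rate) and 7-day churn for a level."""
    if not players:
        raise ValueError("need at least one player")
    n = len(players)
    return {
        # Total spend on the level divided by the number of players.
        "monetization": sum(p.coins_spent for p in players) / n,
        # Difficulty is tracked separately with and without bonuses.
        "win_rate_plain": win_rate(
            sum(p.wins_plain for p in players),
            sum(p.attempts_plain for p in players),
        ),
        "win_rate_boosted": win_rate(
            sum(p.wins_boosted for p in players),
            sum(p.attempts_boosted for p in players),
        ),
        # Share of players who started the level and did not come back.
        "churn_7d": sum(not p.returned_within_7_days for p in players) / n,
    }


# Made-up data, in the spirit of the rest of this article.
sample_players = [
    PlayerLevelStats(100, 4, 1, 1, 1, True),
    PlayerLevelStats(0, 5, 0, 0, 0, False),
    PlayerLevelStats(50, 1, 1, 1, 0, True),
    PlayerLevelStats(50, 0, 0, 2, 1, True),
]
print(level_metrics(sample_players))
```

Computing the same dictionary for group A and group B of the test gives you the numbers to compare side by side.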
What conclusion can we draw from sample A/B test data? Suppose that with automatic difficulty adjustment enabled, earnings on the level dropped significantly and churn increased significantly. In this situation, we can conclude that the level became worse. (Keep in mind that you cannot evaluate a specific level in isolation from the other levels; you should look at metrics for a range of levels.)
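Whether a change in churn counts as "significant" depends on the statistical test your analysts use. As one possible illustration, here is a sketch of a two-proportion z-test on churn between the two groups. This is a technique I am swapping in for the example, not necessarily what any particular team uses; the function names and the numbers are made up.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(
    churned_a: int, players_a: int, churned_b: int, players_b: int
) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in churn rates B - A."""
    if players_a <= 0 or players_b <= 0:
        raise ValueError("both groups need at least one player")
    rate_a = churned_a / players_a
    rate_b = churned_b / players_b
    # Pooled churn rate under the hypothesis that both groups behave the same.
    pooled = (churned_a + churned_b) / (players_a + players_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / players_a + 1.0 / players_b))
    if std_err == 0.0:
        # Nobody churned or everybody churned in both groups: no difference to test.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Made-up example: group A is the control, group B has difficulty adjustment on.
z, p = two_proportion_z_test(churned_a=100, players_a=1000, churned_b=150, players_b=1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value only says the difference is unlikely to be noise. It does not say why the level got worse, which is the question the next paragraph turns to.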
Why could this happen? One reason might be the moment at which our algorithm started serving the easier version of the level: maybe it happened too early, before the player was ready to spend bonuses or coins on passing the level. Or maybe you were unlucky, and the seed you fixed is a bad one. Or the level is indeed difficult, but players lose with a lot of level goals still remaining.
Every Company Has Its Own Set of Metrics
Some look only at monetization and churn, while others additionally analyze a lot of other things, for example:
- Number of reshuffles on the level — reshuffles are unpleasant, players don't like them. Anything that results in 2 or more reshuffles on a level is a reason to re-evaluate the level.
- Number of bonuses collected — players love levels with lots of bonuses; they are enjoyable to play.
- Number of remaining goals upon losing — track how much of the level was completed at the moment of losing.
- Fuuu factor.
- Number of remaining moves upon winning — average or median value.
- Level time — our level is limited by the number of moves, but time is spent thinking about a move and on bonus animations.
- Distribution of collected goals throughout the level.
How do these metrics help? They are additional characteristics of a level. Applied correctly, they let you collect statistics across existing levels and improve the levels where you find deviations. At the same time, some levels will deviate in these characteristics yet show great monetization or low churn. This is normal, and you should not let dry statistics limit you here. Accept exceptions, try to draw conclusions, and share your findings with the team.
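As a rough illustration of "finding deviations", here is a sketch that flags levels whose value for some metric sits far from the rest, plus a check for the reshuffle rule mentioned above. The z-score approach, the threshold of 2.0, and the choice to apply the reshuffle limit to a per-level average are all my assumptions; only the "2 or more reshuffles" limit comes from the list itself.

```python
from statistics import mean, pstdev


def flag_deviating_levels(
    metric_by_level: dict[int, float], z_threshold: float = 2.0
) -> list[int]:
    """Return level ids whose metric is more than z_threshold std devs from the mean."""
    values = list(metric_by_level.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All levels have the same value, so nothing stands out.
        return []
    return sorted(
        level
        for level, value in metric_by_level.items()
        if abs(value - mu) / sigma > z_threshold
    )


def flag_reshuffle_heavy_levels(
    avg_reshuffles_by_level: dict[int, float], limit: float = 2.0
) -> list[int]:
    """Return level ids with `limit` or more reshuffles (assumed: average per attempt)."""
    return sorted(
        level for level, count in avg_reshuffles_by_level.items() if count >= limit
    )


# Made-up data: median remaining moves upon winning, per level id.
remaining_moves = {1: 5, 2: 6, 3: 5, 4: 6, 5: 5, 6: 20}
print(flag_deviating_levels(remaining_moves))

# Made-up data: average reshuffles per attempt, per level id.
reshuffles = {1: 0.1, 2: 2.0, 3: 2.5}
print(flag_reshuffle_heavy_levels(reshuffles))
```

A flagged level is a candidate for review, not a verdict. As noted above, a level can deviate on these characteristics and still monetize well or retain players.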
We need to realize that the 'fuuu factor' and similar metrics are not the main indicators of success. Every level provides a unique player experience — it's built on emotions. And can you really measure emotions?
A huge scope for analysis and improvement opens up before you. This should be a continuous process. You must analyze player metrics, analyze new levels, and create them according to the information that is relevant to your project, to your audience.
What About New Elements?
Okay, we have an algorithm, we have information on thousands of levels from real players. What about new elements? The algorithm doesn't know how to work with them. Yes, that's a problem, and when a new element appears, the algorithm will make mistakes, but the more data you have, the smaller the deviation will be.
This brings us to the main question. Is it mandatory to use automatic difficulty adjustment algorithms when you are just starting to build a new project? My opinion is that you should not spend time on this. You already have a huge number of problems to solve. Focus your efforts on pleasant gameplay, beautiful effects and animations, think through the style and rules of level design, and set up live ops. Only once you are sure that your project can be profitable should you grow the team and work on improvements. At the initial stage, you need level designers who understand how to create quality and interesting levels. At the same time, if you have problems in other parts of the game, even high-quality levels will not be able to take the project to a completely different level.
I want to send a big hello to all the guys with whom we once worked on these or similar issues. You are true professionals and heroes. I am grateful to you for the interesting conversations and your efforts.

