Anthem Safer Play

Responsible Gambling with Data Science: The Next Steps

Posted by Dr Edmond Mitchell, Lead Data Scientist, on November 20, 2020

In October 2019, in an effort to raise player protection standards, the United Kingdom Gambling Commission (UKGC) instructed gaming operators to make progress on four areas:

  • Markers of Harm
  • Customer Interaction
  • Affordability
  • Advertising

Improving customer interaction, affordability and advertising is relatively straightforward, as companies have bodies of previous work to build on. Protecting players by understanding Markers of Harm, however, requires a more academic approach.


Markers of Harm

But before we jump into the nuts and bolts of Markers of Harm, let's first explain why using markers of harm can help protect players. The vast majority of operators employ machine learning models that take a supervised learning approach, which means the models need labelled examples of players playing in a problematic or unsafe manner. Generally, the playing data preceding a customer's self-exclusion is selected as the behaviour the machine learning algorithms aim to detect. While these models do a very important job in surfacing problematic play to customer service teams, and thus protecting players, they have some blind spots.
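To make the labelling step concrete, here is a minimal Python sketch of how sessions might be labelled from self-exclusion dates. The 30-day look-back window, the `Session` record and the `label_sessions` helper are hypothetical illustrations, not Anthem's pipeline; each operator chooses its own window and data schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical look-back: sessions in the 30 days before a self-exclusion
# are treated as the "unsafe" behaviour the supervised model should detect.
LOOKBACK = timedelta(days=30)


@dataclass
class Session:
    player_id: str
    start: datetime


def label_sessions(sessions, exclusion_dates, lookback=LOOKBACK):
    """Return (session, label) pairs: label 1 if the session started within
    `lookback` before that player's self-exclusion, otherwise 0."""
    labelled = []
    for s in sessions:
        excluded_at = exclusion_dates.get(s.player_id)
        positive = (
            excluded_at is not None
            and excluded_at - lookback <= s.start < excluded_at
        )
        labelled.append((s, int(positive)))
    return labelled


sessions = [
    Session("a", datetime(2020, 10, 1)),
    Session("a", datetime(2020, 10, 25)),
    Session("b", datetime(2020, 10, 25)),
]
exclusions = {"a": datetime(2020, 11, 1)}  # only player "a" self-excluded
print([label for _, label in label_sessions(sessions, exclusions)])  # [0, 1, 0]
```

Every label in this scheme hinges on whether and when a player self-excluded, which is exactly where the blind spots come from.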

These blind spots can be broken down into two areas: not every person who has a problem self-excludes, and not every person who self-excludes has a problem. The first statement, “not every person who has a problem self-excludes”, reflects the difficult reality that not every player will recognise that they have a problem and self-exclude. These players might continue playing in a problematic manner indefinitely, causing difficulties in their finances and interpersonal relationships. They might also gamble such large amounts that they have exhausted all their savings and credit, and therefore perceive little benefit in choosing to self-exclude.

The second statement, “not every person who self-excludes has a problem”, might seem improbable on first reading. Why would someone self-exclude if they didn’t have a problem? Generally, it comes down to two scenarios. The first is when a player has a particularly unlucky day and, in a fit of heightened emotion and ill feeling towards that particular gaming operator, chooses to self-exclude to make sure they never play with them again. The second common reason why people who are not exhibiting problem gambling tendencies might choose to self-exclude is to avoid the spam emails they expect to receive from an operator.

While these two self-exclusion scenarios are not common, they do pose a problem for the data scientists who build the prediction models, as they add noise to the training labels. Operators do try to remove this noise from their datasets; however, it is a very difficult and time-consuming task.

One solution to this issue is to employ markers of harm in conjunction with an unsupervised machine learning approach.


Supervised learning vs unsupervised learning

In contrast to supervised learning, unsupervised learning does not use labelled data. Simply put, it groups similar data points into clusters based on the features selected.

For example, here at Anthem we have clustered players on multiple characteristics, such as tenure, weekly spend and number of games played, in order to thoroughly understand the player base.
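As a rough illustration of this kind of player segmentation, the sketch below standardises three features and runs plain k-means (Lloyd's algorithm) in pure Python. The post does not say which clustering algorithm Anthem uses; k-means, the made-up feature values and the helper names are assumptions chosen for clarity. In practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import random
import statistics


def standardise(rows):
    """Z-score each column so features on different scales (days, GBP,
    counts) contribute comparably to Euclidean distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels


# (tenure in days, weekly spend in GBP, games played per week): made-up values
players = [
    (30, 10.0, 5), (45, 12.0, 6), (40, 9.0, 4),
    (900, 200.0, 80), (850, 220.0, 90), (1000, 180.0, 75),
]
labels = kmeans(standardise(players), k=2)
print(labels)
```

Standardising first matters here: without it, tenure measured in days would dominate the distance calculation.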

We have also used this approach to cluster similar types of games in order to improve game recommendations to players.

In the context of player protection, clustering players on markers of harm allows us to identify problematic play without using labelled data. Here at Anthem we build upon the work of Braverman and Shaffer [1], who identified four key markers of problem gambling:

  1. Frequency, which relates to the number of sessions a player has during a specified period.
  2. Intensity, which relates to the duration of a player’s sessions and the number of bets per minute within them.
  3. Variability, which relates to how gambling behaviour varies over time.
  4. Trajectory, which relates to whether the other three markers are increasing or decreasing over time.
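One possible way to turn these four definitions into features is sketched below. The specific choices are our own assumptions rather than Braverman and Shaffer's exact operationalisation: frequency is sessions per 7-day period, intensity is bets per minute, variability is the coefficient of variation of session stakes, and trajectory is the least-squares slope of each of the other three markers across periods.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Session:
    day: int          # days since the start of the observation window
    minutes: float    # session duration
    bets: int         # bets placed during the session
    stake: float      # total amount staked during the session


def slope(ys):
    """Ordinary least-squares slope of ys against 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_bar, y_bar = (n - 1) / 2, statistics.fmean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def period_markers(sessions):
    """Frequency, intensity and variability for one period's sessions."""
    frequency = len(sessions)
    minutes = sum(s.minutes for s in sessions)
    intensity = sum(s.bets for s in sessions) / minutes if minutes else 0.0
    stakes = [s.stake for s in sessions]
    mean_stake = statistics.fmean(stakes) if stakes else 0.0
    # Coefficient of variation: spread of stakes relative to the typical stake.
    variability = statistics.pstdev(stakes) / mean_stake if mean_stake else 0.0
    return frequency, intensity, variability


def markers_of_harm(sessions, period_days=7):
    """Latest-period markers plus each marker's trajectory across periods.
    `sessions` must be non-empty."""
    buckets = defaultdict(list)
    for s in sessions:
        buckets[s.day // period_days].append(s)
    per_period = [period_markers(buckets[i]) for i in range(max(buckets) + 1)]
    names = ("frequency", "intensity", "variability")
    result = dict(zip(names, per_period[-1]))
    result["trajectory"] = {
        name: slope(list(col)) for name, col in zip(names, zip(*per_period))
    }
    return result


history = [
    Session(day=0, minutes=30, bets=60, stake=20.0),
    Session(day=8, minutes=40, bets=100, stake=30.0),
    Session(day=9, minutes=45, bets=120, stake=50.0),
    Session(day=15, minutes=60, bets=200, stake=40.0),
    Session(day=16, minutes=90, bets=300, stake=120.0),
    Session(day=18, minutes=50, bets=180, stake=60.0),
]
m = markers_of_harm(history)
print(m["trajectory"])
```

Average session duration could be tracked as a second intensity feature in exactly the same way.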

Traditional markers of harm fall under these four categories, which we internally call the pillars of problematic play. For example, playing extremely long sessions would be detected by the intensity marker, loss chasing by the variability marker, deviating from your regular playstyle by the trajectory marker, and playing many games at the same time by the frequency marker.

Scoring highly in one of these areas does not indicate a problem per se; however, scoring highly across all four is a cause for concern. Careful use of unsupervised learning methodologies, along with some advanced feature engineering, allows us to identify a problematic cluster made up entirely of players who score highly across the four key markers of harm.
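Once players are clustered on standardised marker scores, one simple way to pick out the cluster that scores highly on all four pillars, rather than on just one, is to rank clusters by their weakest centroid value. This is a hypothetical heuristic, not Anthem's documented method, and the `min_score` threshold of 0.5 is an illustrative assumption.

```python
import statistics
from collections import defaultdict


def flag_problem_cluster(z_scores, labels, min_score=0.5):
    """Return the cluster whose weakest mean marker z-score is highest,
    i.e. the cluster scoring high on all four pillars at once, or None
    if no cluster clears the (hypothetical) `min_score` threshold."""
    members = defaultdict(list)
    for row, label in zip(z_scores, labels):
        members[label].append(row)
    best_label, best_score = None, min_score
    for label, rows in members.items():
        centroid = [statistics.fmean(col) for col in zip(*rows)]
        weakest = min(centroid)  # a cluster is only as "high" as its lowest marker
        if weakest > best_score:
            best_label, best_score = label, weakest
    return best_label


# Standardised scores in the order (frequency, intensity, variability, trajectory)
z = [
    (2.1, 1.8, 1.5, 1.2),
    (1.9, 2.2, 1.7, 0.9),
    (2.5, 0.1, -0.3, -0.2),   # plays often but is otherwise unremarkable
    (-0.5, -0.4, -0.6, 0.0),
]
print(flag_problem_cluster(z, labels=[0, 0, 1, 2]))  # 0
```

Cluster 1 has the highest frequency score of all, yet it is not flagged, because one high pillar alone is not treated as a cause for concern.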

From a technical standpoint, the goal is to achieve a high degree of separation between the clusters.
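Separation can be quantified in several ways; a common choice is the mean silhouette coefficient, which compares each point's average distance to its own cluster with its average distance to the nearest other cluster. The pure-Python version below is for illustration only; scikit-learn's `silhouette_score` computes the same quantity.

```python
import math
import statistics


def silhouette(points, labels):
    """Mean silhouette coefficient (needs at least 2 clusters): close to 1
    means tight, well-separated clusters; near 0 or negative means overlap."""
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of p's own cluster.
        same = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == own and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = statistics.fmean(same)
        # b: mean distance to the nearest *other* cluster.
        b = min(
            statistics.fmean([math.dist(p, q) for q, l in zip(points, labels) if l == c])
            for c in clusters if c != own
        )
        scores.append((b - a) / max(a, b))
    return statistics.fmean(scores)


points = [(0, 0), (0, 1), (100, 0), (100, 1)]
print(round(silhouette(points, [0, 0, 1, 1]), 3))  # well separated: close to 1
print(round(silhouette(points, [0, 1, 0, 1]), 3))  # poor split: negative
```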

Factors for cluster analysis

Remember that cluster analysis is not negatively affected by heteroscedasticity, but the clusters can be distorted by multicollinearity among the features/variables you build, so consider using principal components to remove redundant information.
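A minimal NumPy sketch of the principal components step: standardise the features, take a singular value decomposition, and keep the fewest components that explain a chosen share of the variance. The 95% threshold and the example features (weekly spend, weekly deposits, sessions) are assumptions for illustration.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project standardised features onto the fewest principal components
    that together explain at least `variance_to_keep` of the variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features stay at zero instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal axes; S**2 is proportional to explained variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(explained))
    return Z @ Vt[:k].T, explained[:k]


# Made-up weekly features: spend, deposits (exactly 2x spend), sessions
features = [[10, 20, 3], [20, 40, 1], [30, 60, 4], [40, 80, 1], [50, 100, 5]]
reduced, explained = pca_reduce(features)
print(reduced.shape, explained.round(3))
```

Because deposits are constructed here as exactly twice the spend, the two columns carry the same information and three features collapse to two components.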

In addition, analyse your data to identify the correct window length for your particular player base. Finally, careful choice of the clustering algorithm itself is key.
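The post does not describe how the window length is chosen. As one hypothetical heuristic, the sketch below picks the shortest candidate window in which a target share of players have enough sessions for the markers to be meaningful; the candidate windows, `min_sessions` and `target` values are all illustrative assumptions.

```python
def coverage(session_days, window_days, min_sessions):
    """Share of players with at least `min_sessions` sessions in the
    `window_days` ending at their most recent session."""
    covered = 0
    for days in session_days.values():
        last = max(days)
        in_window = sum(1 for d in days if last - d < window_days)
        covered += in_window >= min_sessions
    return covered / len(session_days)


def choose_window(session_days, candidates=(7, 14, 28, 56),
                  min_sessions=5, target=0.8):
    """Shortest candidate window meeting the coverage target, else the longest."""
    for window in sorted(candidates):
        if coverage(session_days, window, min_sessions) >= target:
            return window
    return max(candidates)


# Hypothetical days on which each player had a session
session_days = {
    "a": [0, 1, 2, 3, 4],        # plays daily
    "b": [0, 3, 6, 9, 12],       # plays every few days
    "c": [0, 10, 20, 30, 40],    # plays roughly every ten days
}
print(choose_window(session_days, min_sessions=5, target=0.6))  # 14
```

Shorter windows react faster to changes in behaviour, while longer ones give the markers more data per player; a heuristic like this makes that trade-off explicit.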

Factors that should be considered include:

  1. Data Types
    • Independent vs dependent data
  2. Parametric vs non-parametric clustering
  3. Quality of Clustering
  4. Software/Package/Language Availability
  5. Features of the Methods
    • Computing averages (sometimes incalculable or too slow)
    • Stability analysis
    • Cluster properties
    • Speed
    • Memory
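Stability analysis, one of the factors listed above, can be sketched by re-clustering random subsamples and measuring agreement with the full-data clustering. This version uses the Rand index, which ignores arbitrary cluster numbering; `cluster_fn` stands for whatever clustering algorithm is under evaluation, and the 80% subsample size and 20 runs are assumptions.

```python
import random
import statistics
from itertools import combinations


def rand_index(a, b):
    """Share of point pairs on which two labelings agree about being in the
    same cluster or not; invariant to how clusters are numbered."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, cluster_fn, runs=20, frac=0.8, seed=0):
    """Mean Rand index between full-data labels and labels obtained by
    re-clustering random subsamples; values near 1 suggest stable clusters."""
    rng = random.Random(seed)
    reference = cluster_fn(points)
    scores = []
    for _ in range(runs):
        idx = sorted(rng.sample(range(len(points)), int(frac * len(points))))
        sub_labels = cluster_fn([points[i] for i in idx])
        scores.append(rand_index([reference[i] for i in idx], sub_labels))
    return statistics.fmean(scores)


def split_at_50(pts):
    """Toy stand-in clusterer: splits points on the first coordinate."""
    return [int(p[0] > 50) for p in pts]


points = [(x, 0) for x in (1, 2, 3, 4, 60, 70, 80, 90)]
print(stability(points, split_at_50))  # 1.0
```

A deterministic threshold is trivially stable; algorithms with random initialisation, such as k-means, will typically score below 1 and are where this check earns its keep.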


Traditional supervised learning still has a part to play in this process. Once we can accurately place players into safe and problematic clusters, it makes sense to try to predict which cluster each player will belong to in the future, using the cluster assignments as labels. Deep learning algorithms such as Recurrent Neural Networks (RNNs) have been shown to be successful at recognising a dataset's sequential attributes, and can use bet patterns to predict a player's next likely behaviour.
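To show the shape of such a model, here is a minimal NumPy forward pass of an Elman-style recurrent network that reads a sequence of per-bet feature vectors and outputs a probability for each cluster. The weights are random and training (backpropagation through time, typically done in a framework such as PyTorch or JAX) is omitted; the per-bet features and layer sizes are hypothetical.

```python
import numpy as np


class ElmanRNN:
    """Minimal, untrained Elman RNN: a sequence of per-bet feature vectors
    in, a probability distribution over player clusters out."""

    def __init__(self, n_features, n_hidden, n_clusters, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.W_hy = rng.normal(0.0, 0.1, (n_hidden, n_clusters))
        self.b_y = np.zeros(n_clusters)

    def predict_proba(self, bets):
        """bets: array-like of shape (time_steps, n_features)."""
        h = np.zeros(self.W_hh.shape[0])
        for x_t in np.asarray(bets, dtype=float):
            # The hidden state carries a running summary of the bet sequence.
            h = np.tanh(x_t @ self.W_xh + h @ self.W_hh + self.b_h)
        logits = h @ self.W_hy + self.b_y
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()


model = ElmanRNN(n_features=3, n_hidden=8, n_clusters=2)
# Hypothetical per-bet features: stake, seconds since previous bet, win amount
bets = [[1.0, 12.0, 0.0], [2.0, 8.0, 0.5], [5.0, 3.0, 0.0]]
print(model.predict_proba(bets))  # untrained, so the values are arbitrary
```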


Future Anthem's approach

Anthem's models examine spin-by-spin data to understand exactly what kind of behaviour a player is exhibiting in real time. This requires a state-of-the-art cloud technology platform that can ingest game event data in real time, create and cache relevant metadata, feed that data into the models and push out the model results in real time.
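The ingest, cache, score and publish loop can be sketched in a few lines of Python. This is a single-process illustration only: the `SpinEvent` schema, the rolling window, and the `model` and `publish` callables are hypothetical stand-ins for a real streaming platform and model service.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class SpinEvent:
    player_id: str
    timestamp: float  # seconds since epoch
    stake: float


class RealTimeScorer:
    """Ingest spin events, keep cached per-player metadata for a rolling
    window, score every spin and push the result downstream."""

    def __init__(self, model, publish, window_seconds=3600):
        self.model = model          # callable: feature dict -> score
        self.publish = publish      # callable: (player_id, score) -> None
        self.window = window_seconds
        self.cache = defaultdict(deque)  # player_id -> recent (timestamp, stake)

    def ingest(self, event):
        recent = self.cache[event.player_id]
        recent.append((event.timestamp, event.stake))
        # Evict spins that have fallen out of the rolling window.
        while recent and event.timestamp - recent[0][0] > self.window:
            recent.popleft()
        features = {
            "spins_in_window": len(recent),
            "stake_in_window": sum(stake for _, stake in recent),
        }
        score = self.model(features)
        self.publish(event.player_id, score)
        return score


outputs = []
scorer = RealTimeScorer(
    model=lambda f: f["spins_in_window"] / 100,  # toy stand-in for a real model
    publish=lambda player_id, score: outputs.append((player_id, score)),
    window_seconds=60,
)
for t in (0, 10, 100):
    scorer.ingest(SpinEvent("p1", timestamp=t, stake=1.0))
print(outputs)  # [('p1', 0.01), ('p1', 0.02), ('p1', 0.01)]
```

Recomputing the window sum on every spin keeps the sketch short; a production system would more likely maintain running totals so each spin is scored in constant time.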

This allows action to be taken much sooner than with conventional approaches, which limits the harm that can be done. However you approach this problem, this should be your ultimate goal.


[1] Braverman, J. and Shaffer, H.J., 2012. How do gamblers start gambling: Identifying behavioural markers for high-risk internet gambling. The European Journal of Public Health, 22(2), pp.273-278.