Automated Smurf Detection

TTTPPP
Posts: 331
Joined: Wed Sep 05, 2018 8:25 pm

Post by TTTPPP »

I thought it would be interesting to see whether it was possible to identify players from replay files by the in-game actions they took. One obvious use of this is to associate smurf accounts with the original player. I hacked something together using a combination of bμg's replay parser and the statistics tool I use to create analysis posts about RAGL. Before I go through the results I wanted to go through some of the limitations that the approach has.

Firstly, bμg's tool can only cope with replays from 2021 onwards. This immediately means that the approach can't shed light on the identity of Misery from RAGL S08 or Archangel (which happy claimed a long time ago).

Another reason I can't identify Archangel is that I'm limited to replays that I have, or have downloaded from the ladder (and I don't have enough free disk space/patience to download everything from the ladder!).

The classification algorithm I've put together is pretty crude. It computes different metrics for each replay, averages these per player account and then compares the absolute difference between the metric scores. This could definitely be improved by filtering out metrics which are adding noise or by computing an optimal weighting for the different metrics. There are probably other metrics that could be included to improve the score too. Having said all this, using a 2:1 train/test split, the algorithm does seem to match accounts correctly.
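As a rough illustration of the approach described above, here is a minimal sketch that averages per-replay metrics for each account and then compares two accounts. The metric values are made up, the metrics are assumed to be on comparable scales already, and combining the per-metric absolute differences by taking their mean is an assumption (the post does not say how they are combined). The names account_profiles and distance are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def account_profiles(replays):
    """Average each metric over all replays belonging to the same account."""
    grouped = defaultdict(list)
    for account, metrics in replays:
        grouped[account].append(metrics)
    return {
        account: tuple(mean(column) for column in zip(*rows))
        for account, rows in grouped.items()
    }

def distance(profile_a, profile_b):
    """Mean absolute difference across metrics (the aggregation is an assumption)."""
    return mean(abs(a - b) for a, b in zip(profile_a, profile_b))

# Made-up metric values for illustration only.
replays = [
    ("alice", (0.2, 0.8)),
    ("alice", (0.4, 0.6)),
    ("alice_smurf", (0.3, 0.8)),
    ("bob", (0.9, 0.1)),
]
profiles = account_profiles(replays)
print(distance(profiles["alice"], profiles["alice_smurf"]))
print(distance(profiles["alice"], profiles["bob"]))
```

A smaller distance means a more similar play style, so in this toy data alice_smurf sits much closer to alice than bob does.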

Since I have a limited data set, the script can only guess at players who are within it. For example, since there are no LorryDriver replays (because he played before 2021), it will never guess that an account is a Lorry smurf.

There are a number of factors which I deliberately did NOT use:
  • Player chat
  • Player names
  • Skill level
  • IP address
  • Time of day/day of week
  • Game count (some players play lots more games than others)
  • List of opponents (since it's hard for a smurf to play against themself)
Anybody who is manually trying to identify a smurf will definitely use some or all of these things, but I thought it gave a more interesting result to base predictions purely on playstyle. All of these things can therefore be used to crosscheck the predictions made by the script.

A final note before we get on to some results in the next post: if you start a witch hunt then you're going to find witches. The script simply points out accounts that play in similar ways - therefore it will definitely find similar accounts. This does not mean that the players are smurfs of each other (and in many cases there is plenty of evidence that they are not smurfs of each other).

Re: Automated Smurf Detection

Post by TTTPPP »

The first results I wanted to discuss were what happened when making a train/test split. I randomly partitioned off two thirds of the games as "training" data and the other third of the games as "test" data.* I removed all games from players who had fewer than ten games in the "test" set. This resulted in 100 players.
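The split described above could look something like the following sketch. It assumes each record is one player's side of one game, stored as a (player, metrics) pair, which is a simplification; split_games and its parameters are hypothetical names.

```python
import random
from collections import Counter

def split_games(games, min_test_games=10, seed=None):
    """Send roughly one third of the games to "test", then drop sparse players."""
    rng = random.Random(seed)
    shuffled = list(games)
    rng.shuffle(shuffled)
    cut = len(shuffled) // 3
    test, train = shuffled[:cut], shuffled[cut:]
    # Keep only players with enough games in the test portion.
    test_counts = Counter(player for player, _ in test)
    keep = {player for player, n in test_counts.items() if n >= min_test_games}
    train = [game for game in train if game[0] in keep]
    test = [game for game in test if game[0] in keep]
    return train, test

# Made-up records: player "a" has 60 games, player "b" only 6.
games = [("a", i) for i in range(60)] + [("b", i) for i in range(6)]
train, test = split_games(games, seed=1)
print(len(train), len(test))
```

Because the split is random, the exact sizes change with the seed, but player "b" can never reach ten test games here and is always dropped.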

For each of the 200 groups (100 players for each of train and test) I ran the set of metrics to produce a tuple of values. Next I compared the tuples to find the closest train tuple to each test tuple. Here's an example of the output:

Code: Select all

For Ekanim guessed Ekanim 0.21, Goremented 0.67, Eugenator 0.68 (test sample size: 125)
So for the Ekanim test data, the closest match was the Ekanim train data, followed by the Goremented train data and the Eugenator train data. The script correctly guessed Ekanim since the distance was 0.21, which was significantly less than the distances of 0.67 and 0.68 from the next closest players. Overall there were 125 games from Ekanim in the test data set (which suggests there were about 250 games from Ekanim in the train data set).
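A line like the one above could be produced by ranking every train profile by its distance from a test profile. This is only a sketch: the profiles are made up, the distance function (mean absolute difference) is an assumption, and rank_matches and format_guess are hypothetical names.

```python
from statistics import mean

def distance(profile_a, profile_b):
    # Assumed aggregation: mean absolute difference across metrics.
    return mean(abs(a - b) for a, b in zip(profile_a, profile_b))

def rank_matches(test_profile, train_profiles, top=3):
    """Return the `top` closest train accounts as (name, distance) pairs."""
    scored = sorted(
        (distance(test_profile, profile), name)
        for name, profile in train_profiles.items()
    )
    return [(name, dist) for dist, name in scored[:top]]

def format_guess(player, matches, sample_size):
    """Mimic the output format quoted in the post."""
    guesses = ", ".join(f"{name} {dist:.2f}" for name, dist in matches)
    return f"For {player} guessed {guesses} (test sample size: {sample_size})"

# Made-up profiles for illustration only.
train_profiles = {"alice": (0.30, 0.70), "bob": (0.90, 0.10), "carol": (0.50, 0.50)}
matches = rank_matches((0.35, 0.65), train_profiles)
line = format_guess("alice", matches, sample_size=12)
print(line)
```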

The final score by this method changed each time I ran it (because the train/test split was random), but generally it produced a result of about 90-95% accuracy. The times when it guessed incorrectly were generally when there were fewer games in the data set - for example:

Code: Select all

For porenut guessed Upps 0.60, toiletbreakbrb 0.63, porenut 0.64 (test sample size: 10)
This already gave me some interesting results - for example:

Code: Select all

For Fazzz guessed Fazzz 0.24, Fazzar 0.63, IronScion 0.70 (test sample size: 75)
At some point Fazzar changed his forum name from Fazzar to Fazzz. This resulted in his games being split into two "people". We can see that the script thought the best guess for Fazzz (test) was Fazzz (train), followed by Fazzar (train). The values are quite significantly different suggesting that Fazzar's play style has changed over time too.

To put these distances in perspective - the final average distance from the correct guess was 0.402, so a distance of 0.2 gives a high confidence while a distance of 0.6 gives a lower confidence.
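The two summary numbers mentioned so far (the accuracy and the average distance from the correct guess) could be computed along these lines. This is a sketch with made-up profiles; it assumes every test player also appears in the train data, and evaluate is a hypothetical name.

```python
from statistics import mean

def distance(profile_a, profile_b):
    # Assumed aggregation: mean absolute difference across metrics.
    return mean(abs(a - b) for a, b in zip(profile_a, profile_b))

def evaluate(test_profiles, train_profiles):
    """Return (accuracy, average distance between each test profile and its own train profile)."""
    correct = 0
    true_distances = []
    for name, profile in test_profiles.items():
        best = min(train_profiles, key=lambda other: distance(profile, train_profiles[other]))
        correct += best == name
        true_distances.append(distance(profile, train_profiles[name]))
    return correct / len(test_profiles), mean(true_distances)

# Made-up profiles: carol's test games look more like bob's train games.
train_profiles = {"alice": (0.30, 0.70), "bob": (0.90, 0.10), "carol": (0.50, 0.50)}
test_profiles = {"alice": (0.35, 0.65), "bob": (0.80, 0.20), "carol": (0.85, 0.15)}
accuracy, avg_true_distance = evaluate(test_profiles, train_profiles)
print(accuracy, avg_true_distance)
```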

I don't think the wrong guesses include any smurfs. They were almost entirely players with low game counts. In fact - the script even guessed the correct player where there were known smurfs as an alternative option. That means the script could distinguish between a player playing as themselves, or playing as a smurf. I expect this is for the same reason it can distinguish between Fazzz and Fazzar - players have changed their play style over time. It could potentially also be due to players behaving differently when smurfing.

* Although I intended to, I didn't actually perform any training with the training set. I've just kept the name train and test because that's what they're called in the code.

Re: Automated Smurf Detection

Post by TTTPPP »

The next part of my investigation was to try measuring the distance between different accounts. To do this I didn't split the data (train/test) as I was doing above, but rather lumped it all together to produce a tuple of values for each account. Another change from above was that I didn't filter out accounts that had played a low number of games. This second change means that there will be results with a large amount of error. I'll therefore try to include the number of games each account has played in my data set.
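A sketch of this all-pairs comparison, carrying the game count along with each account so that low-sample results are easy to spot. The profiles and game counts are made up, the distance function is an assumption, and closest_pairs is a hypothetical name.

```python
from itertools import combinations
from statistics import mean

def distance(profile_a, profile_b):
    # Assumed aggregation: mean absolute difference across metrics.
    return mean(abs(a - b) for a, b in zip(profile_a, profile_b))

def closest_pairs(profiles, game_counts):
    """List every pair of accounts, closest first, with game counts in brackets."""
    rows = sorted(
        (distance(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    )
    return [
        f"{dist:.2f}: {a} ({game_counts[a]}) {b} ({game_counts[b]})"
        for dist, a, b in rows
    ]

# Made-up data for illustration only.
profiles = {"alice": (0.30, 0.70), "alice_smurf": (0.30, 0.80), "bob": (0.90, 0.10)}
game_counts = {"alice": 120, "alice_smurf": 8, "bob": 45}
report = closest_pairs(profiles, game_counts)
for row in report:
    print(row)
```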

Possibly the player who is most famous (or infamous) for smurfing is Margot Honecker. Going by the Discord chat history alone, Margot has been accused of being behind all of the following accounts (some were joke suggestions): peep, RoboCody, despro (by himself), Ruckus (also by himself), Antarctica, creo, KushBuddy, Elitecommander (this doesn't even seem to be a real account) and General Carl.

If we take the distance from Margot's account then we get the following ordering of accounts (from most likely to be a Margot smurf, to least likely):

Code: Select all

0.47: Margot Honecker (77) RoboCody (28)
0.50: Margot Honecker (77) KushBuddy (20)
0.57: Margot Honecker (77) General Carl (33)
0.93: Margot Honecker (77) peep (12)
0.94: Margot Honecker (77) creo (111)
0.95: Margot Honecker (77) Rossie (52) (Ruckus)
0.97: Margot Honecker (77) Antarctica (48)
1.00: Margot Honecker (77) despro (122)
This shows that the first three are reasonable accusations, but the rest are probably unfounded.

If we compare the RAGL ladder ratings for these accounts then we get:

Code: Select all

1710: despro
1600: creo
1316: RoboCody
1279: peep
1195: Antarctica
1156: Margot Honecker
1116: General Carl
1013: KushBuddy
897: Rossie
Assuming that Margot plays at roughly 1156, then it's plausible they would get a small boost by being anonymous (e.g. RoboCody), or that they could use the smurf account to try out a new strategy and do terribly with it. However it seems unlikely they could suddenly improve to despro/creo levels with a new account. This generally backs up the assertion above that RoboCody, KushBuddy and General Carl could be Margot smurfs.
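The rating argument above can be written as a simple cross-check. The ratings are the ones listed in this post, but the 200-point allowance for how much better a smurf might plausibly do is an arbitrary number picked for illustration, not something the script uses; plausible_by_rating is a hypothetical name.

```python
def plausible_by_rating(main_rating, candidate_rating, max_gain=200):
    """A smurf may do much worse than the main account, but only slightly better."""
    return candidate_rating - main_rating <= max_gain

# RAGL ladder ratings as listed above.
ratings = {
    "despro": 1710, "creo": 1600, "RoboCody": 1316, "peep": 1279,
    "Antarctica": 1195, "Margot Honecker": 1156, "General Carl": 1116,
    "KushBuddy": 1013, "Rossie": 897,
}
main = ratings["Margot Honecker"]
plausible = sorted(
    name for name, rating in ratings.items()
    if name != "Margot Honecker" and plausible_by_rating(main, rating)
)
print(plausible)
```

On its own this check only rules accounts out (despro and creo here). It still needs the play style distance to rule anything in.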

Performing the search in reverse - looking at the accounts that are most similar to Margot we get:

Code: Select all

0.47: Margot Honecker (77) RoboCody (28)
0.50: KushBuddy (20) Margot Honecker (77)
0.57: General Carl (33) Margot Honecker (77)
0.67: 808MANN (23) Margot Honecker (77)
0.70: Margot Honecker (77) Mo (159)
0.75: Margot Honecker (77) Sigil (142)
0.75: Margot Honecker (77) Orb (22)
0.78: Margot Honecker (77) Odinn (20)
0.80: Margot Honecker (77) goldie (32)
This implicates the same three accounts we found earlier. Below that, 808MANN has a RAGL ladder rating of 663, and actually lost to RoboCody, so seems unlikely to be a Margot smurf. Meanwhile Mo, Sigil and Orb are established players. From this it seems that the cut-off for similar play styles is somewhere around 0.6-0.7, which matches what we saw in a previous post.
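Applying a cut-off to a list of distances is straightforward. The sketch below uses some of the Margot distances quoted above and a cut-off of 0.6; the neighbours helper is a hypothetical name.

```python
def neighbours(pairs, target, cutoff):
    """Return (distance, other account) for every pair involving target within the cut-off."""
    found = []
    for dist, a, b in pairs:
        if target in (a, b) and dist <= cutoff:
            found.append((dist, b if a == target else a))
    return sorted(found)

# Distances copied from the list above.
pairs = [
    (0.47, "Margot Honecker", "RoboCody"),
    (0.50, "KushBuddy", "Margot Honecker"),
    (0.57, "General Carl", "Margot Honecker"),
    (0.67, "808MANN", "Margot Honecker"),
    (0.70, "Margot Honecker", "Mo"),
]
close = neighbours(pairs, "Margot Honecker", cutoff=0.6)
print(close)
```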
Last edited by TTTPPP on Sat May 25, 2024 6:42 am, edited 1 time in total.

Re: Automated Smurf Detection

Post by TTTPPP »

There are two players in my data set who I know have played under two different names. The first is Fazzar (Fazzz and Fazzar) and the second is happy (Clockwork and then not happy). I already gave some details about Fazzar above, but let's have a look at the closest accounts using the method we used for Margot previously. Here are all accounts that are within a distance of 0.8 of Fazzz.

Code: Select all

0.59: Fazzar (418) Fazzz (234)
0.70: Fazzz (234) IronScion (45)
0.75: Fazzz (234) porenut (23)
0.76: Fazzz (234) bete (498)
0.78: Fazzz (234) toiletbreakbrb (407)
0.78: Fazzz (234) f0rk (367)
0.79: Fazzz (234) biasao (604)
This correctly picks out Fazzar as a likely alter-ego of Fazzz, although the distance is much higher than when comparing part of the Fazzz data set against the other part. As mentioned earlier I think this is because Fazzar has changed his play style over time.

Performing the same analysis with not happy we get:

Code: Select all

0.62: not happy (73) Starforce (7)
0.64: not happy (73) Time (9)
0.65: not happy (73) morkel (277)
0.65: not happy (73) Blackened (231)
0.65: not happy (73) Pvt_Leaf (37)
0.68: not happy (73) MASTER (35)
0.68: not happy (73) Kav (358)
...[11 more accounts]...
0.76: not happy (73) FiveAces (84)
0.76: not happy (73) maths (76)
0.76: not happy (73) Sigil (142)
0.76: not happy (73) Clockwork (103)
I deliberately put the cut-off at Clockwork to show how different it thinks the play styles are. In fact the algorithm prefers matching up not happy with any of morkel, Blackened, MASTER, Kav (and many more) rather than matching it with Clockwork. It seems that happy 2.0 really is a different evolution!

One thing that I did notice here is that not happy seems quite similar to Starforce, Time and possibly maths, all of which may be smurf accounts (that is - no one with those names seems to have emerged as a distinct personality). If we use a cut-off of 0.7 then we get the following lists of potential matches for these accounts:

Code: Select all

0.62: Starforce (7) not happy (73)
0.65: Starforce (7) Time (9)

Code: Select all

0.64: Time (9) not happy (73)
0.65: Time (9) Starforce (7)

Code: Select all

0.68: maths (76) Blackened (231)
0.69: maths (76) Kav (358)
Starforce and Time both play with a skill level reminiscent of happy, so these seem like reasonable accusations. The results from maths led me to check the closest players to Kav and Blackened, and in both cases they are each other's closest account:

Code: Select all

0.53: Blackened (231) Kav (358)
This is a good reminder to take the whole analysis with a pinch of salt. The algorithm can point out players with similar features in their games, but it cannot prove that two players are the same person.
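The "each other's closest account" check can be sketched as a mutual nearest neighbour test. The three distances are copied from the lists above (so this is only a small subset of the real data), and nearest and mutual_nearest are hypothetical names.

```python
def nearest(pairs, target):
    """Return the account with the smallest distance to target."""
    candidates = [
        (dist, b if a == target else a)
        for dist, a, b in pairs
        if target in (a, b)
    ]
    return min(candidates)[1]

def mutual_nearest(pairs, first, second):
    """True when each account is the other's closest match."""
    return nearest(pairs, first) == second and nearest(pairs, second) == first

# Distances copied from the lists above.
pairs = [
    (0.53, "Blackened", "Kav"),
    (0.68, "maths", "Blackened"),
    (0.69, "maths", "Kav"),
]
print(mutual_nearest(pairs, "Kav", "Blackened"))
print(mutual_nearest(pairs, "maths", "Blackened"))
```

A mutual match is a stronger signal than a one-way match, but as the Kav and Blackened case shows, it still doesn't prove the accounts belong to one person.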

tux
Posts: 17
Joined: Sun Nov 14, 2021 3:07 pm

Re: Automated Smurf Detection

Post by tux »

Fantastic stuff TripT - now if you'd just smack your spaghetti onto GitHub so I could play with it for a bit... ;)

Re: Automated Smurf Detection

Post by TTTPPP »

One of the cool things about this system is that it isn't based on skill level. We saw above that it was happy to claim that happy and Pvt_Leaf were similar (0.65 distance), but they certainly don't feel similar to play against.

Consequently I wondered whether anyone plays similarly to goat, who has topped the RAGL ladder for two and a half years and counting. Here are all players within a distance of 0.8 of goat:

Code: Select all

0.47: goat (110) sup (94)
0.50: goat (110) i like men (97)
0.69: goat (110) realpeep (7)
0.75: goat (110) creo (111)
0.77: goat (110) dang_shot (152)
0.80: goat (110) Goremented (767)
The immediate standout name on the list is sup. For quite a while sup played on ladder and goat didn't, while goat played in RAGL and sup didn't. Both of them play from the same IP address and this was explained because they are (/were) housemates. They both play at an extremely high level. The obvious explanation for all of the above is that they were the same player, however my understanding is that most people are convinced they were distinct (possibly there was a game between them?) Well here's some random analysis reopening that can of worms.

Potentially closing the can a little is the fact that goat and ilm also play in a really similar way. These two have played against each other in RAGL, as well as both being active on Discord. This potentially points to high tier players converging on a particular play style. The inclusion of creo and dang_shot in the list provides further evidence for this as both are in the top twenty players of all time. Also worth mentioning is realpeep - but more on that account in a future post.

Looking instead at who is similar to sup we get:

Code: Select all

0.40: sup (94) i like men (97)
0.47: sup (94) goat (110)
0.79: sup (94) TiTo (474)
0.80: sup (94) realpeep (7)
So although sup is very similar to goat, they're actually even more similar to ilm! After that there's a big gap before getting to TiTo.

To complete the analysis, here's the list of accounts close to i like men:

Code: Select all

0.40: i like men (97) sup (94)
0.50: i like men (97) goat (110)
0.75: i like men (97) Kernel Panic (130)
0.78: i like men (97) pepe the frog (17)
So we see again that the players most similar to ilm are sup and goat. At the start I pointed out that this isn't based on skill level, but these three players are all very highly skilled, and it seems that no lower skill players play like them. We saw in a previous post that Kav and Blackened also form a cluster (which is ~1.1 distance away from this one), so it doesn't seem that there is one right way to play the game.
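One simple way to pull clusters like these out of a list of distances is to link any two accounts that are within a cut-off and then collect the connected groups. This is a swapped-in technique for illustration rather than something the script does. The distances are copied from the posts above, the 0.6 cut-off is the lower end of the range suggested earlier, and clusters is a hypothetical name.

```python
def clusters(pairs, cutoff):
    """Group accounts that are connected by a chain of distances within the cut-off."""
    graph = {}
    for dist, a, b in pairs:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if dist <= cutoff:
            graph[a].add(b)
            graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this account.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Distances copied from the posts above.
pairs = [
    (0.47, "goat", "sup"),
    (0.50, "goat", "i like men"),
    (0.40, "sup", "i like men"),
    (0.53, "Blackened", "Kav"),
    (0.75, "goat", "creo"),
    (0.79, "sup", "TiTo"),
]
groups = clusters(pairs, cutoff=0.6)
print(groups)
```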

Re: Automated Smurf Detection

Post by TTTPPP »

Another claimed smurf account is Death-Sentence. This account popped up in 2021 and achieved a Master's level rating within a couple of weeks. Since then Moods (ROCKhardFISTnips) has claimed the account, and it played another batch of games in 2022 which significantly lowered its rating.

Code: Select all

0.59: Death-Sentence (23) ROCKhardFISTnips (97)
0.66: Death-Sentence (23) Blackened (231)
0.66: Death-Sentence (23) MASTER (35)
0.68: Death-Sentence (23) poop (57)
0.68: Death-Sentence (23) Pvt_Leaf (37)
0.69: Death-Sentence (23) TermX (176)
From the assigned distances we can see that Moods is indeed the most likely match. I think something interesting here is the limitation of relying on player skill level to identify smurf accounts. The RAGL ratings in May showed that Death-Sentence was way better than Moods - it would be more natural to match the account with Blackened or NoobX:

Code: Select all

Pos  Player            RAGL Rating
18   Blackened         1547
24   Death-Sentence    1505
35   TermX             1449
74   ROCKhardFISTnips  1192
80   poop              1160
--   MASTER            [Unrated]
--   Pvt_Leaf          [Unrated]
If we compare that to the latest ratings, now that both players have had more games:

Code: Select all

Pos  Player            RAGL Rating
8    Blackened         1633
18   MASTER            1468
37   ROCKhardFISTnips  1336
39   TermX             1330
72   Death-Sentence    1176
149  poop              958
212  Pvt_Leaf          820
Over time both accounts have moved towards a similar rating.

During my investigation I initially had about half the number of games in the data set, and I found that the best match was for MASTER. At the time the system seemed keen to match up any and all smurf accounts with MASTER (well, actually just Time, Death-Sentence, WhatEverMan and Starforce). Since increasing the data set it no longer thinks most of these are good matches, with MASTER just being similar to:

Code: Select all

0.66: MASTER (35) Death-Sentence (23)
0.68: MASTER (35) not happy (73)
0.68: MASTER (35) poop (57)
0.69: MASTER (35) kazu. (24)
0.70: MASTER (35) morkel (277)
and those other smurf accounts being further away. (nb. We covered Starforce and Time in a previous post above)

Code: Select all

0.73: MASTER (35) Starforce (7)
0.73: MASTER (35) Time (9)
0.77: MASTER (35) WhatEverMan (21)

Re: Automated Smurf Detection

Post by TTTPPP »

Another known smurf account is damn_shot. The real player dang_shot has 152 games in my data set and shows up as having a similar play style to several other well known players:

Code: Select all

0.65: dang_shot (152) creo (111)
0.67: dang_shot (152) spetsnaz84 (173)
0.70: dang_shot (152) Yara (30)
Meanwhile damn_shot is a smurf account claimed by tux with only a few games. This account does not match up with any other player, including tux. Listing all accounts up to a distance of 1.0 (i.e. not at all similar):

Code: Select all

0.97: damn_shot (7) Socan (1)
tux himself has played a lot more games but also doesn't come close to any other accounts. The nearest account is almost distance 1 away:

Code: Select all

0.99: tux (329) potato (18)
I spoke to tux and he came up with a list of old accounts, most of which have no games in my dataset. One account from the list was General Carl which he shared with Margot, and which may explain why it was less similar to Margot than the other smurf accounts we looked at above. I found it quite interesting that not only tux, but also damn_shot were completely different from all other accounts in the data set. They were even very far apart from each other - having a distance of 1.07. Weirdly even dang_shot shows up closer to this smurf account:

Code: Select all

1.05: damn_shot (7) dang_shot (152)

Re: Automated Smurf Detection

Post by TTTPPP »

To put these conjectures into perspective I thought it would be useful to run the same analysis against myself. I know that I haven't been using smurf accounts and that I don't share my account with anyone else. Looking up to a distance of 0.8 we get:

Code: Select all

0.68: TTTPPP (215) Jur (122)
0.71: TTTPPP (215) maths (76)
0.72: TTTPPP (215) milkman (763)
0.72: TTTPPP (215) Tailix Killa Mentor (52)
0.73: TTTPPP (215) Duke Bones (172)
0.74: TTTPPP (215) barf_openra (26)
0.79: TTTPPP (215) Rush Al (24)
0.79: TTTPPP (215) Gargamel (31)
0.79: TTTPPP (215) fiwo (44)
0.80: TTTPPP (215) maceman (812)
0.80: TTTPPP (215) gargraval (16)
So anecdotally Jur's playstyle is similar enough that he would be identified as a smurf of mine. Jur plays a lot better than I do (he has a rating about 350 better than me), and we've also played each other several times (for example in RAGL Season 12 when Jur won Minions).

Checking in the other direction, Jur is closer to a lot of other accounts. Going up to a distance of 0.7:

Code: Select all

0.57: Jur (122) Duke Bones (172)
0.60: Jur (122) maceman (812)
0.62: Jur (122) milkman (763)
0.63: Jur (122) barf_openra (26)
0.64: Jur (122) Tailix Killa Mentor (52)
0.64: Jur (122) WhoCares (233)
0.65: Jur (122) Rush Al (24)
0.66: Jur (122) Gargamel (31)
0.66: Jur (122) SlickSloth (29)
0.67: Jur (122) Blackened (231)
0.67: Jur (122) netnazgul (52)
0.68: Jur (122) TTTPPP (215)
0.69: Jur (122) WhatEverMan (21)
I don't think any of these look at all likely to be smurf accounts for Jur. The best match is Duke Bones, who does have a similar RAGL rating, but they have played each other before in RAGL S12. The other top results are all well known players too.
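Since none of the accounts in Jur's list look like smurfs, the list gives a rough feel for how many false alarms a given cut-off would raise. The sketch below counts how many of the distances quoted above fall within a few candidate cut-offs; the cut-off values themselves are just examples, and false_alarms is a hypothetical name.

```python
def false_alarms(distances, cutoff):
    """Count accounts that would be flagged at this cut-off."""
    return sum(1 for dist in distances if dist <= cutoff)

# Jur's distances to other accounts, copied from the list above.
jur_distances = [
    0.57, 0.60, 0.62, 0.63, 0.64, 0.64, 0.65,
    0.66, 0.66, 0.67, 0.67, 0.68, 0.69,
]
counts = {cutoff: false_alarms(jur_distances, cutoff) for cutoff in (0.60, 0.65, 0.70)}
print(counts)
```

Even at the stricter end of the range, a player with a common play style can pick up a couple of lookalikes, which is why the other evidence (ratings, games played against each other) matters.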
