SHEIN.com Data Breach Analysis

BreachDirectory
6 min read · Nov 9, 2022

Foreword

SHEIN, also spelled SheIn, is a US-based online women’s fashion retailer that apparently suffered a data breach sometime in June 2018, although the company only discovered the breach in late August 2018. SHEIN stated that the intruders managed to gain access to customers’ email addresses and encrypted passwords.

What data is at risk?

When the data breach was discovered, SHEIN stated that the hackers managed to gain access to email addresses and encrypted passwords that were stored in the system. The leaked data, however, does not show any signs of encryption, so it is likely that the passwords were decrypted before the data was published.

Email addresses

In this data breach, there is a very wide array of email providers in use. Let’s take a look:

The Email Providers In the SheIn Data Breach

The length of the email addresses in this data breach also varies widely. Ordered from the smallest count to the largest:

  • 7 email addresses were 100 characters or longer (the smallest group);
  • 11 email addresses were 5 characters or shorter;
  • 13 email addresses were 90 characters or longer;
  • 25 email addresses were 80 characters or longer;
  • 117 email addresses were 70 characters or longer;
  • 178 email addresses were 60 characters or longer;
  • 385 email addresses were 50 characters or longer;
  • 10,183 email addresses were 40 characters or longer;
  • 16,755 email addresses were 10 characters or shorter;
  • 843,073 email addresses were 30 characters or longer;
  • 9,848,312 email addresses were 20 characters or shorter;
  • 22,322,666 email addresses were 20 characters or longer.
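The thresholds in the list above are cumulative, so an address of exactly 20 characters is counted in both the "20 or shorter" and the "20 or longer" group. A minimal sketch of how such counts could be produced is shown here; the function name, the default thresholds, and the sample addresses are assumptions made for illustration, not the tooling actually used for this analysis.

```python
from typing import Iterable


def length_buckets(values: Iterable[str],
                   at_most: tuple[int, ...] = (5, 10, 20),
                   at_least: tuple[int, ...] = (20, 30, 40, 50)) -> dict[str, int]:
    """Count how many strings are at or below / at or above each threshold."""
    counts = {f"<={n}": 0 for n in at_most}
    counts.update({f">={n}": 0 for n in at_least})
    for value in values:
        size = len(value)
        for n in at_most:
            if size <= n:
                counts[f"<={n}"] += 1
        for n in at_least:
            if size >= n:
                counts[f">={n}"] += 1
    return counts


# Made-up addresses, used only to exercise the function.
sample = ["a@b.c", "jane@example.com", "a.very.long.local.part@example.org"]
buckets = length_buckets(sample)
print(buckets)
```

The same function works unchanged on a list of passwords, since it only looks at string length.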

Looking at the top-level domains (TLDs), we can also create a list of countries that SheIn users were using the service from:

SheIn TLDs
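As a rough illustration of how provider and TLD tallies like these might be built, the sketch below splits each address at the last "@" and takes the final dot-separated label of the domain as the TLD. The helper name and the sample addresses are hypothetical, and a country-code TLD is at best a hint about where a user is located.

```python
from collections import Counter


def domain_and_tld(email: str) -> tuple[str, str]:
    """Return (domain, tld), lower-cased; ('', '') if the address is malformed."""
    _, sep, domain = email.strip().lower().rpartition("@")
    if not sep or "." not in domain:
        return "", ""
    return domain, domain.rsplit(".", 1)[1]


# Made-up addresses, used only to exercise the helper.
sample = ["Jane@Example.com", "joe@mail.example.de", "broken-address"]
providers = Counter(domain_and_tld(e)[0] for e in sample)
tlds = Counter(domain_and_tld(e)[1] for e in sample)
print(providers.most_common())
print(tlds.most_common())
```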

Here are the letters that email addresses begin with. When the analysis is run on the database with duplicates, 29,026,175 email addresses begin with a letter. The most popular letter is R, followed by A and then S. Email addresses beginning with letters account for 99.05978747356848% of the entire user base:

The letters email addresses begin with

Now that letters have been covered, we can also take a look at the numbers. Email addresses beginning with numbers are far less prevalent than those beginning with letters: combined, there are just 213,390 of them, which is 0.7282519329186425% of the total entries in the SheIn data breach.

SheIn.com — Email Addresses by Numbers

0.2119605935128775% of the email addresses in the SheIn data breach did not start with a number or a letter. That is exactly 62,108 accounts when the records are checked against the database with duplicate entries. Applying the same percentage to the database without duplicate entries gives an estimate of roughly 58,457 accounts (58,457.41329595996 before rounding).
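The three percentages quoted for the database with duplicates can be reproduced from the three counts in the text, assuming the total is simply their sum (29,301,673). The sketch below does that arithmetic and also shows one possible way to classify a first character; the classifier is an assumption, since the exact rules used for the analysis are not stated (for instance, `str.isalpha` also accepts non-ASCII letters).

```python
# Counts quoted in the text (database with duplicates).
counts = {"letter": 29_026_175, "digit": 213_390, "other": 62_108}

# Assumption: the total is the sum of the three groups.
total = sum(counts.values())
shares = {group: 100 * n / total for group, n in counts.items()}

for group, share in shares.items():
    print(f"{group}: {share:.4f}%")


def first_char_class(value: str) -> str:
    """Classify a string by its first character (hypothetical helper)."""
    if not value:
        return "other"
    first = value[0]
    if first.isalpha():
        return "letter"
    if first.isdigit():
        return "digit"
    return "other"
```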

Passwords

The password distribution in the SheIn data breach is very interesting: hundreds of different passwords were each used by many different people. There are the ordinary combinations, of course, but there are also thousands of passwords like “sheinside”, which suggests that the users who chose it thought of it on the spot, or “shein18” and “Shein2018”, which suggests that those users created their accounts in 2018. There were also 293,688 users who used multiple empty spaces as their passwords. Here’s the list:

SheIn.com Data — Password Analysis

It should also be noted that the system contained 3,294 one-character passwords, so it is probably safe to assume that SheIn did not enforce many password-strength rules.

Judging by the passwords that users chose, we can reasonably assume that the service has been in operation since at least 2015 and has grown steadily since then: “shein2015” was chosen by 699 users, “shein2016” by 2,522 users, “shein2017” by 3,682 users, and “shein2018” by 4,715 users.

From these counts, the number of users choosing a year-based password grew by 1,823 in 2016, by 1,160 in 2017, and by 1,033 in 2018. The average growth per year is about 1,338.67 users. Adding that average to the 2018 increase of 1,033 yields a rough estimate of approximately 2,372 new users choosing year-based passwords in 2019, and adding it once more to that rounded figure yields approximately 3,711 in 2020. This is a crude extrapolation, since the yearly increases were actually shrinking over the period.
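The year-over-year arithmetic above can be checked with a few lines of Python. The counts come from the text; the variable names are illustrative only.

```python
# Users per year-based password, as quoted in the text.
year_counts = {2015: 699, 2016: 2_522, 2017: 3_682, 2018: 4_715}

years = sorted(year_counts)
# Increase relative to the previous year's password count.
increases = {
    year: year_counts[year] - year_counts[prev]
    for prev, year in zip(years, years[1:])
}
average_increase = sum(increases.values()) / len(increases)

print(increases)                   # {2016: 1823, 2017: 1160, 2018: 1033}
print(round(average_increase, 3))  # 1338.667
```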

More interesting password choices include very short passwords like “&”, “S”, “43”, and “(”. The word “sonnenschein” was used 1,356 times, “papillon” 1,131 times, “1q2w3e4r5t” 1,065 times, and “ritinhasantos4” 1,021 times.

We can also see that there are multiple passwords that have been used the same number of times:

Repeating Passwords In the SheIn.com Data Breach

Our best guess is that these passwords were created by users who had more than one account in the system, in which case the number of times a password repeats would match the number of accounts that user had.
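One way to surface passwords that share the same usage count is to tally every password and then group the passwords by their tally. The sketch below assumes the passwords are available as a plain list of strings; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def passwords_sharing_a_count(passwords: list[str]) -> dict[int, list[str]]:
    """Map each usage count to the passwords used exactly that many times,
    keeping only counts shared by more than one password."""
    frequency = Counter(passwords)
    grouped: dict[int, list[str]] = defaultdict(list)
    for password, times_used in frequency.items():
        grouped[times_used].append(password)
    return {n: sorted(pws) for n, pws in grouped.items() if len(pws) > 1}


# Made-up sample, used only to exercise the function.
sample = ["sheinside", "sheinside", "papillon", "papillon", "shein18"]
shared = passwords_sharing_a_count(sample)
print(shared)
```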

Apart from this, many passwords begin with letters of the alphabet, and many begin with numbers. Here is the list of passwords that begin with letters:

SheIn.com Data Breach — Passwords That Begin with Letters

Here is the list of passwords that begin with numbers:

SheIn.com Data Breach — Passwords Beginning with Numbers

Password lengths in the data dump break down as follows:

  • 408,406 passwords were 5 characters or shorter;
  • 20,919,888 passwords were 10 characters or shorter;
  • 29,187,461 passwords were 20 characters or shorter;
  • 65,519 passwords were 20 characters or longer;
  • 40,642 passwords were 30 characters or longer;
  • 48 passwords were 40 characters or longer.

It is very likely that the passwords that are 20 characters or longer were generated by password managers.

Summary

To summarize, the SheIn data breach, although small compared to the largest breaches, did a lot of damage to the company and to its customers. The good news is that SheIn notified all of its customers that their data was at risk. The company also worked with cybersecurity investigators who monitored the network and tried to ensure that future data breaches could be prevented.

BreachDirectory can help your company ensure it does not become a target of such data breaches, now or in the future. Its powerful API capability will help protect the employees in your company from identity theft, and its data breach search engine can help keep your employees, friends, and loved ones safe by informing them if their account appears in a data breach. A data breach notification service is also available: it informs people whose credentials might be exposed in the next data breach uploaded to BreachDirectory.

Make sure to further your online security by running a search on BreachDirectory and integrating the BreachDirectory API into your company’s infrastructure. Until next time!
