
From Data Breach to Dependable Alert


Every internet user has filled out a web form that asks for personally identifiable information (PII). Organizations promise to safeguard this data, but it doesn’t always go as planned. An entire underground economy is built around monetizing stolen PII and credentials in the form of breach packages.

The truth is, your PII is very valuable to malicious actors, and despite considerable efforts to keep your personal information private, organizations of all sizes are frequently targeted and infiltrated by hackers. Unfortunately, some organizations have less-than-mature security and privacy practices and inadvertently expose your data through misconfigured software, careless security practices, or distribution to an unintended recipient.

You’ve probably heard about Identity Theft Protection services that monitor the deep and dark web for your exposed information. Subscribing to such a service is a great way to protect yourself from becoming a victim of cybercrime, but not all deep and dark web monitoring service providers are created equal.

The Steps Between Data Breach and Alert Delivery

Let’s peek under the hood of Constella Intelligence’s Breach Ingestion Engine to find out how the industry’s leading threat intelligence provider delivers verified, validated and correctly attributed identity exposure alerts that enable an optimal and dependable end-user experience.

1. Data Breach

Data breaches happen–there’s no cybersecurity secret-sauce that can guarantee breach prevention. Hackers are highly motivated to exfiltrate user data, because it enables highly lucrative fraud–identity theft, account takeover, and wire fraud, to name a few.

Threat actors sell and exchange breach data in underground communities with varying levels of sophistication. Breach packages are priced by various methods (e.g., size, rank, or type of stolen data). As a result, threat actors aren’t shy about fabricating data or mixing in old data with fresh breach data to falsely inflate the sale price of the breach package in underground markets. Less skilled hackers will even combine breached credentials from various sources into “password combo lists”.

As these data packages begin to change hands, chatter about the breach circulates in underground communities, and providers like Constella are able to collect the breach data.

2. Hunting

Timely collection of relevant breaches, including small breaches that don’t make headlines, is not a trivial task. Constella’s breach hunters have been navigating underground communities for over 10 years. Operating in multiple online arenas, they have established expertise in ethical data sourcing and collection and stay apprised of threat actor activity, which allows us to capture breached data in a way that is fully compliant with US Government guidelines.

3. Normalization

After a breach package is captured, the data must be normalized before it is ingested into the Constella Identity Data Lake. Before normalization, the captured breach data is in a source-specific format, since every website may have its own way of storing user data. This means field names could be proprietary (e.g., sites might name the field for email address “email”, “email_address”, or “user_id”), data formats will vary (e.g., a phone number might be stored as “123-456-7890” or “1234567890”), and files are structured differently. While certain data types like email addresses and credit card numbers are easy to identify programmatically, other data types like phone numbers, government ID numbers, and physical addresses, which vary in format from region to region, can be very difficult to identify with certainty without context.

Constella’s Breach Ingestion Engine has been built to identify these format variances using AI technology, sometimes with human analyst help. This allows us to map fields from the raw data to Constella’s standardized field names, adapt the raw data into a common format, and ultimately ingest it (insert the records into our data lake) for further analysis. Thanks to the power of the Breach Ingestion Engine and analyst team, Constella boasts the greatest breadth of data coverage in the industry, recognizing over 200 types of data attributes. This is critical, as we are able to normalize all data associated with a breach, not just email and password, and alert clients to it.
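To make the normalization step concrete, here is a minimal sketch of mapping raw field names to standardized ones and unifying phone number formats. It is an illustration only, not Constella’s actual engine: the `FIELD_ALIASES` table, the function names, and the choice to skip unknown fields are all assumptions made for this example.

```python
import re

# Hypothetical alias table: raw field names seen in breach files -> standardized names.
# The raw names come from the examples above; the standardized names are assumptions.
FIELD_ALIASES = {
    "email": "email",
    "email_address": "email",
    "user_id": "email",
    "phone": "phone",
    "phone_number": "phone",
}


def normalize_phone(value: str) -> str:
    """Keep digits only, so '123-456-7890' and '1234567890' compare equal."""
    return re.sub(r"\D", "", value)


def normalize_record(raw: dict) -> dict:
    """Map one raw record onto standardized field names and value formats."""
    normalized = {}
    for key, value in raw.items():
        field = FIELD_ALIASES.get(key.strip().lower())
        if field is None:
            # Unknown field: this sketch skips it; a real pipeline would
            # likely route it to an analyst or a classifier instead.
            continue
        value = value.strip()
        if field == "email":
            value = value.lower()
        elif field == "phone":
            value = normalize_phone(value)
        normalized[field] = value
    return normalized


record = normalize_record({"Email_Address": " Alice@Example.com ", "phone": "123-456-7890"})
```

A static lookup table like this only handles field names that have been seen before, which is why the harder cases described above (region-specific phone numbers, government IDs, addresses) need context and analyst input.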

4. Data Ingestion

Constella’s Breach Ingestion Engine is programmed to read the file format produced in the normalization step and insert the normalized data into our repository of exposed PII and credentials. After breach data has been ingested, it must still pass through a rigorous verification process before being delivered to our partners or software platforms.

5. Verification

The verification process begins with an analyst-led company analysis. This process is intended to establish how confident we can be that the data is authentic rather than fabricated, and the extent to which a breach can be attributed to the organization purported to have lost the data. Of course, the strongest attribution is possible when the victim organization discloses the breach. When it does not, our process still guides us in assessing the authenticity of the data. This may include considering contextual information, such as public details of the victim organization like location, category, and traffic rankings.

Prior to delivering any alerts, Constella takes multiple steps to check the integrity of the breach data. We begin by removing duplicate data from the breach package along with any records we have identified as fabricated. Next, we classify the breach based on type, attribution, and overall confidence in the data. These classifications help Constella’s customers and partners determine the types of alerts they deliver to their end users. Some partners may only want to provide high-confidence, attributed alerts; these are alerts from a breach where the source is known and verified, and the data is authentic. Other partners wish to engage their users frequently and choose to also send unattributed alerts, where the source of the data exposure is either not known or not verifiable, but the data is believed to be authentic.
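The integrity checks described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Constella’s implementation: the `Record` shape, the `fabricated` flag, the duplicate key, and the `"rejected"` label for inauthentic data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    email: str
    password: str
    fabricated: bool = False  # hypothetical flag set by earlier analysis


def clean_records(records):
    """Drop records flagged as fabricated, then drop duplicates, keeping order."""
    seen = set()
    cleaned = []
    for record in records:
        if record.fabricated:
            continue
        # Assumption: two records are duplicates when email (case-insensitive)
        # and password both match.
        key = (record.email.lower(), record.password)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned


def classify_breach(source_verified: bool, data_authentic: bool) -> str:
    """Attributed: source known and verified, data authentic.
    Unattributed: source unknown or unverifiable, data believed authentic."""
    if not data_authentic:
        return "rejected"  # hypothetical label; the article does not name this case
    return "attributed" if source_verified else "unattributed"


records = [
    Record("alice@example.com", "hunter2"),
    Record("Alice@example.com", "hunter2"),        # duplicate of the first
    Record("fake@example.com", "x", fabricated=True),
    Record("bob@example.com", "pa55word"),
]
cleaned = clean_records(records)
```

In practice, the confidence assessment is analyst-led and far richer than two booleans; the point here is only the order of operations: filter, deduplicate, then classify.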

6. Alert Delivery

After Constella’s rigorous verification process, ingested breaches are classified as an attributed breach, an unattributed breach, or a password combo breach, and attribution, authenticity, and overall confidence scores are assigned. Our partners choose the types of breach alerts they’d like to receive based on attribution and confidence, and the Constella system pushes out alerts accordingly. This complex process yields dependable alerts for our partners, allowing you to maximize the value you deliver to your end users while reducing your operational costs (customer support and call center, in particular), leading to the best possible customer experience.
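As a rough illustration of how partner preferences might drive alert selection, the sketch below filters classified breaches by type and a minimum confidence score. The class names, the 0.0 to 1.0 confidence scale, and the `breaches_to_alert` function are hypothetical and are not part of the Constella Intelligence API.

```python
from dataclasses import dataclass


@dataclass
class Breach:
    name: str
    breach_type: str   # "attributed", "unattributed", or "combo"
    confidence: float  # hypothetical score on a 0.0 to 1.0 scale


@dataclass
class PartnerPreferences:
    accepted_types: set   # breach types this partner wants alerts for
    min_confidence: float  # lowest confidence score the partner accepts


def breaches_to_alert(breaches, prefs):
    """Return only the breaches that match a partner's chosen types and confidence."""
    return [
        b for b in breaches
        if b.breach_type in prefs.accepted_types and b.confidence >= prefs.min_confidence
    ]


breaches = [
    Breach("shop-site", "attributed", 0.9),
    Breach("unknown-dump", "unattributed", 0.8),
    Breach("old-forum", "attributed", 0.4),
]

# A partner that only wants high-confidence, attributed alerts.
strict = PartnerPreferences(accepted_types={"attributed"}, min_confidence=0.7)
# A partner that also accepts unattributed alerts to engage users more often.
broad = PartnerPreferences(accepted_types={"attributed", "unattributed"}, min_confidence=0.7)

strict_names = [b.name for b in breaches_to_alert(breaches, strict)]
broad_names = [b.name for b in breaches_to_alert(breaches, broad)]
```

Keeping the preferences as data rather than hard-coded rules means each partner can tune alert volume without any change to the delivery logic.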

How Can Constella Help?

It is imperative that you protect your users’ data and keep their identities from being stolen. Constella Intelligence’s vast data lake of curated identity exposures brings industry-leading quality to deep and dark web identity exposure alerts. Carefully validated identity records ensure delivery of high-quality, actionable alerts. Seven of the top 10 identity theft protection providers trust Constella to monitor over 195 million partner assets, providing access to over 66 billion compromised identity records. Constella supports monitoring of common PII attributes such as email address, SSN, telephone number, credit card number, name, and address, while also providing support and data for less common attributes such as Gamertags, medical insurance account numbers, and IP addresses.

If you’re ready to protect your assets, your customers, or employees from the depths of the dark web, give Constella Intelligence a try today.




Keon Ramezani

Sales Engineer

