ID Fusion: See the Forest Through the Trees

In the identity theft protection world, consumers’ personal information is monitored for exposure on the deep and dark web, and the results often come through as a series of disjointed data points. A person may have an exposed email address and password from a gaming site, an exposed address and name from a forum, an exposed credit card from an e-commerce site, and an exposed email address and employment information from a social media platform. Individually, these exposures may not reveal much beyond the inherent risks they bring (e.g., an exposed credit card may lead to financial fraud, and exposed passwords may lead to account takeover); however, when considered together, these disjointed identity attributes start to paint a more complete picture of this person’s identity. And yet, there is likely more data about this individual circulating in the hacking community that the consumer isn’t aware of.

Nearly one in five consumers have more data exposures on the deep and dark web than they have been alerted to by their identity theft protection service. And it’s the fault of neither the consumer nor the identity theft protection provider. Imagine you just signed up for identity monitoring: you’re likely to provide a current email or two, your name, your address, and a few other current details for your provider to monitor. But you probably weren’t thinking of that email address you stopped using a few years ago, or your former phone number. Unfortunately, there may be data exposures linked to these “forgotten” attributes that never come to your attention but may still benefit a cybercriminal in their pursuit to defraud you. Additionally, you may have created “throwaway” email addresses to sign up for something fun or less serious, and then forgotten to monitor them for exposures. These tend to be among the most exposed email addresses, and cybercriminals know this. Constella ID Fusion can help protect your identity in places where you may have forgotten you’ve left breadcrumbs.

To a business assessing risk (insider risk, fraud risk, insurance risk, and so forth), these uncovered exposures become even more critical. Consider an organization assessing its insider threat risk: the risk posed by an insider of the organization, an employee or contractor, who may engage in risky online behaviors or, worse, pose a malicious threat to their own employer. The employer has a limited understanding of how the employee spends their time online, as it is only privy to the employee’s activity on work devices and the exposures linked to their work email. Most employees know better than to misuse their work equipment or email addresses, but they may be a lot more relaxed with how they use their personal computer and email address. Suppose this employee has some exposed passwords tied to their personal email but practices poor password hygiene, and one of their exposed passwords is the same one they use to log in to their work environment. This leaves the door wide open for an attentive malicious actor but leaves the organization completely in the dark, as it may see no exposed passwords associated with the employee’s work email address. How can the organization bridge this gap?

What is Identity Fusion?

You may have one known piece of information about a person, like a work email address, and the identity exposures tied directly to that email address may not be very telling of who this individual is. Suppose this work email appears in a breach that also reveals a phone number, but nothing more. Certainly, this individual must have other exposures out there, and if we could somehow link them, the true identity of this person would start to come into focus. Once we’ve made these links, you could say we’ve fused this set of attributes into a unified identity. Suppose that, through a series of high-confidence links, we’ve identified a personal email, a home address, and a name: we can now associate a person’s identity with this collection of exposed data. Now we know a few things about this person, including what types of websites they visit, and we can draw conclusions that help mitigate risk.

In short, Identity Fusion is the process of associating multiple separate attributes and datapoints with the same person or identity.

How Does it Work?

The concept behind Identity Fusion is very similar to how cyber investigators and security researchers, such as journalist Brian Krebs, use tools like Constella’s investigation platform, Hunter, to traverse identity exposure data: they start with a suspected hacker’s moniker or username and uncover actionable intelligence such as the hacker’s name, address, telephone number, or IP address. But the power of Identity Fusion lies in the ability to automate this process at scale, using machine learning and artificial intelligence to discard weak, low-confidence links so that any fused identity elements are high-confidence.

We leverage a concept known as “pivoting.” Let’s consider an example we mentioned above—a person’s work email appears in just one breach record, and in that record, we have an associated telephone number. Through this search, we have found one new piece of information we previously didn’t have—the phone number. We can use this phone number to invoke a new search and might discover a series of new breach records that contain this phone number, along with additional associated data. Through automated analysis, we can determine which resulting records are most likely to relate to the subject of our search. Similarly, we can leverage the fact that humans tend to re-use passwords, and therefore we can link multiple records by common password.
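To make the idea concrete, here is a minimal sketch of pivoting as a breadth-first search over breach records. It is an illustration only, not Constella’s implementation: the record layout, the `pivot` function, the `max_depth` limit, and every value in `RECORDS` are assumptions invented for this example.

```python
from collections import deque

# Hypothetical toy breach records; every value here is invented.
RECORDS = [
    {"email": "j.doe@corp.example", "phone": "555-0100"},
    {"phone": "555-0100", "email": "jd_home@mail.example", "name": "J. Doe"},
    {"email": "jd_home@mail.example", "username": "jdoe77"},
    {"email": "other@mail.example", "phone": "555-0199"},
]


def pivot(seed, records, max_depth=3):
    """Breadth-first pivot from a seed (field, value) pair.

    Returns a dict mapping each discovered (field, value) pair to the
    number of pivots needed to reach it from the seed.
    """
    found = {seed: 0}
    queue = deque([seed])
    while queue:
        attr = queue.popleft()
        depth = found[attr]
        if depth >= max_depth:
            continue  # do not search further out than max_depth pivots
        field, value = attr
        for record in records:
            if record.get(field) != value:
                continue  # this record does not contain the attribute
            # Every other attribute in a matching record is a new lead.
            for other in record.items():
                if other not in found:
                    found[other] = depth + 1
                    queue.append(other)
    return found


fused = pivot(("email", "j.doe@corp.example"), RECORDS)
for (field, value), depth in fused.items():
    print(depth, field, value)
```

Starting from the work email, the search finds the phone number after one pivot, the personal email and name after two, and the username after three, while the unrelated record is never reached. This sketch follows every link it sees; deciding which links deserve to be followed is a separate question.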

Finally, we need to make determinations about the quality of our data links and figure out which ones we can rely on to lead us to new information, and which ones should be discarded. For instance, let’s look at pivoting on an exposed password. A quick thought experiment tells us that this only produces reliable results when the exposed password is unique. Imagine the millions of internet users who use “password”, “qwerty123” or something similar for their password; these are far too common and will not help resolve the identity of our subject. Constella’s algorithms understand when a password is too common or simple to produce reliable results, and therefore, pivots on these passwords are discarded.
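One simple way to approximate this filter is to count how many distinct accounts share each password and pivot only on passwords shared by a small handful. The sketch below assumes that heuristic; the `MAX_ACCOUNTS` cutoff, the `pivotable_passwords` function, and the sample records are hypothetical and are not a description of Constella’s algorithms.

```python
# Hypothetical exposure records; every value here is invented.
RECORDS = [
    {"email": "a@x.example", "password": "password"},
    {"email": "b@x.example", "password": "password"},
    {"email": "e@x.example", "password": "password"},
    {"email": "f@x.example", "password": "password"},
    {"email": "c@x.example", "password": "qwerty123"},
    {"email": "d@x.example", "password": "qwerty123"},
    {"email": "g@x.example", "password": "qwerty123"},
    {"email": "work@corp.example", "password": "Tr1ck-y!Owl#42"},
    {"email": "home@mail.example", "password": "Tr1ck-y!Owl#42"},
]

# Assumed cutoff: a password seen on more accounts than this is treated
# as too common to identify a single person.
MAX_ACCOUNTS = 2


def pivotable_passwords(records, max_accounts=MAX_ACCOUNTS):
    """Return passwords that link at least two accounts but are rare."""
    accounts = {}
    for record in records:
        accounts.setdefault(record["password"], set()).add(record["email"])
    return {
        password
        for password, emails in accounts.items()
        if 2 <= len(emails) <= max_accounts
    }


print(pivotable_passwords(RECORDS))
```

With this toy data, only the unusual password survives, linking the work and personal addresses; “password” and “qwerty123” are shared by too many accounts and are dropped. A production system would need a far larger corpus and a more careful notion of rarity than a fixed count.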

Even when pivoting on a more distinctive data element like a telephone number, we must be careful to ensure the reliability of the linkage. Suppose our starting point is a work email address built from the owner’s name, John Smith, and an exposure record containing this email address also contains a phone number. When we pivot on the phone number, we may find other associated data, such as a name. If the associated name is “John Smith”, we can proceed with confidence that this linkage is related to our subject, as we understand that professional email addresses often include the owner’s name, and we can see that the discovered name matches what we have in the email address. However, if the associated name comes back as “Bob Wilson,” we’re not left very confident about this linkage, and it is discarded. Constella’s algorithms are trained to follow these principles and are even smart enough to understand that “Jonathan Smith” is a probable owner of that address, or that the usernames “guitar_player_1963” and “gtrPlyr63” are related.
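The two checks described above can be approximated with simple hand-written rules. The sketch below is only a stand-in for the trained algorithms the article refers to: the address `john.smith@corp.example` is a hypothetical placeholder, the `first.last` address format is assumed, and the scores 1.0 and 0.6 are arbitrary illustrative values.

```python
import re


def name_link_score(name, email):
    """Score how well a discovered name fits a first.last style address.

    Assumed scale: 1.0 for an exact match, 0.6 when the surname matches
    and the first initial agrees, 0.0 otherwise.
    """
    local_part = email.split("@", 1)[0].lower()
    local_tokens = [t for t in re.split(r"[^a-z]+", local_part) if t]
    name_parts = [t for t in re.split(r"[^a-z]+", name.lower()) if t]
    if len(name_parts) < 2 or len(local_tokens) < 2:
        return 0.0
    first, last = name_parts[0], name_parts[-1]
    if last != local_tokens[-1]:
        return 0.0
    if first == local_tokens[0]:
        return 1.0
    if first[0] == local_tokens[0][0]:
        return 0.6
    return 0.0


def skeleton(username):
    """Split a username into its consonant skeleton and its digits."""
    lowered = username.lower()
    letters = re.sub(r"[^a-z]", "", lowered)
    digits = re.sub(r"[^0-9]", "", lowered)
    return re.sub(r"[aeiou]", "", letters), digits


def usernames_related(a, b):
    """True when consonant skeletons match and the digits are compatible."""
    (cons_a, dig_a), (cons_b, dig_b) = skeleton(a), skeleton(b)
    if cons_a != cons_b:
        return False
    if dig_a == dig_b:
        return True
    # Treat "63" as a shortened form of "1963" (and vice versa).
    return bool(dig_a and dig_b) and (
        dig_a.endswith(dig_b) or dig_b.endswith(dig_a)
    )


email = "john.smith@corp.example"  # hypothetical address
print(name_link_score("John Smith", email))      # 1.0
print(name_link_score("Jonathan Smith", email))  # 0.6
print(name_link_score("Bob Wilson", email))      # 0.0
print(usernames_related("guitar_player_1963", "gtrPlyr63"))  # True
```

Rules like these are brittle: they miss nicknames, transliterations, and addresses that do not follow the assumed format. That brittleness is a good argument for learned models over hand-written heuristics.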

With these powerful link analysis algorithms in place, we can rank the confidence level we have in each link, choose to only traverse links with a confidence level above a minimum threshold, and deliver fused identity elements with an associated confidence ranking, empowering our partners to decide if the resolved identity elements fall within their risk tolerance.
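As a rough illustration of threshold-based traversal, the sketch below walks a small graph of scored links, skips any link below a minimum confidence, and ranks each discovered element by the best path leading to it. Multiplying link confidences along a path is our assumption for this example, as are the `fuse` function, the `min_link_conf` parameter, and the scores in `LINKS`.

```python
import heapq

# Hypothetical scored links between identity elements (invented values).
LINKS = {
    "work_email": [("phone", 0.9), ("common_password", 0.2)],
    "phone": [("personal_email", 0.8), ("name", 0.95)],
    "common_password": [("stranger_email", 0.9)],
    "personal_email": [("username", 0.7)],
}


def fuse(seed, links, min_link_conf=0.5):
    """Rank elements reachable from seed via links at or above a threshold.

    Assumption: the confidence of an element is the product of the link
    confidences along the best path to it from the seed.
    """
    best = {seed: 1.0}
    heap = [(-1.0, seed)]  # max-heap via negated confidence
    while heap:
        neg_conf, node = heapq.heappop(heap)
        conf = -neg_conf
        if conf < best.get(node, 0.0):
            continue  # stale entry; a better path was already found
        for neighbor, link_conf in links.get(node, []):
            if link_conf < min_link_conf:
                continue  # weak link: never traversed
            path_conf = conf * link_conf
            if path_conf > best.get(neighbor, 0.0):
                best[neighbor] = path_conf
                heapq.heappush(heap, (-path_conf, neighbor))
    ranked = [(n, c) for n, c in best.items() if n != seed]
    return sorted(ranked, key=lambda item: -item[1])


ranked = fuse("work_email", LINKS)
for element, confidence in ranked:
    print(f"{element}: {confidence:.3f}")
```

Because the weak link to the common password is never traversed, the unrelated email behind it does not appear in the results. A partner with a stricter risk tolerance could then apply its own cutoff to the ranked output, for example keeping only elements scored above 0.7.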

Benefits of Identity Fusion

In the identity protection space, user engagement is the key to retention—namely, the greater the volume of alerts delivered to your end user, the higher the value the end-user receives. Unfortunately, the bottleneck in alert volume often comes from sparse monitoring inputs provided by the user. Most users have old, forgotten email addresses they don’t use anymore that still have exposures linked to them, or personal information linked to exposures they don’t feel comfortable providing to their identity protection service. If an identity protection service can uncover exposures tied to these email addresses and attributes not provided by their user, the service can alert the user to their unknown exposures, ask permission to add those newly found attributes to their monitoring profile and further protect them. This is a tremendous value-add for the user and instrumental in customer retention.

In the risk assessment space, one of the biggest challenges is starting with very limited information about the subject of the assessment. For example, assessing insider risk at an organization, or considering the risk level of an organization’s entire email domain, can be tricky because an employee’s online presence extends beyond their corporate email address. In fact, the majority of a person’s online behavior that poses a risk to their employer happens on personal time with their personal accounts. The ability to resolve an employee’s personal email address from their corporate email provides tremendous insight into previously undiscovered risk, insight that could save an organization millions of dollars.

Why Constella?

Constella leads the industry with the largest and most robust data lake of identity exposure data from the deep and dark web, surface web, phishing sites, and botnets. This unmatched data coverage provides a deep tree of links between identity elements, not only maximizing the probability of finding a link, but also the probability of a high-quality link. And beyond breadth and volume of data, Constella’s focus on data accuracy significantly augments the quality of these Identity Fusion links.

A well-established aspect of Constella’s data curation is a rigorous validation process. As we know all too well, malicious actors sometimes manipulate the data they exfiltrate in a breach. Usually this means false records are inserted into the breach package so the hacker can make more money from the sale of the data. Sometimes the entire data package is fabricated, or the malicious actor claims to have stolen data from a very high-profile organization when in reality they’ve just mashed together data that has been circulating in the community for a while. Constella’s validation process assesses two things: the attribution of a breach (whether the leaked data actually came from the purportedly breached site) and the authenticity of the breached data (whether the data found in the breach package accurately represents the data that was stored by the breached site).

Constella’s validation process allows us to begin our Identity Fusion journey from an elevated platform—knowing that we’re starting with high-quality, reliable data allows us to proceed with confidence when making assessments about the quality of each link.

Partner with Constella today to enable your solution with the most accurate and most actionable identity intelligence available.

Keon Ramezani
Sales Engineer