At the center of many of these challenges is the issue of duplicate records. Identifying and linking customer profiles that belong to the same person is often referred to as record linkage, a.k.a. merge/dedupe or entity resolution. Because of the complexities involved, it is no surprise that this is a very active area of research in computer and data science.
For hotels today, accurately merging multiple sources of data (data from multiple property management systems, uploaded file content, etc.) is extremely challenging. For example, a hotelier may upload a new list from an on-property event, and that list may contain a guest profile whose name differs slightly from a record already in the property management system (PMS), even though the rest of the data clearly describes the same person. In this example, multiple records are created for the same person, and the problem quickly multiplies across the properties in a portfolio. The risk is that hotels may not recognize returning guests, may unintentionally send duplicate communications to the same person and consequently damage brand reputation, or may calculate aggregated data like spend or stays incorrectly.
Given data challenges like this, many hotels are outgrowing the capabilities of a traditional hotel technology stack, especially hotels fortunate enough to already have a large database of past and incoming guests.
Savvy hotel operators don't want to be in the business of data processing or warehousing, and they are even less likely to be pursuing more sophisticated machine learning approaches to data management. And they shouldn't have to. The good news is that forward-thinking companies are working to resolve guest profile-related data challenges for hotels, and are already receiving and managing data for millions of guests in widely varied formats.
Here's a brief look at how we at Revinate manage the record linkage problem to provide value for our clients.
1. Normalize the data.
Before we can start to integrate data from the many disparate sources that make up a hotel's technology stack, we have to make the different data formats fit together seamlessly.
For example, consider the customer data that we might receive from a property management system (PMS). Although a reservation is usually for a guest (or two, or three), it may also be for a wedding party, for a corporation or club, or it may be a placeholder reservation with a name like "walk-in" or "guest."
Additionally, we run up against common issues like name misspellings, especially when names are dictated over the phone. A first name like "José" will only be a partial match with "Jose." Shortened names must also be accounted for, so that "William" is a match with "Bill", "Margaret" with "Meg", etc. Similarly, if a reservation was made over the phone, it is entirely possible that "Julie N." is the same person as "Julian." Phone numbers and addresses can also be particularly troublesome to normalize, especially when dealing with guest data from many countries.
Through a combination of available normalization libraries and our own proprietary code, Revinate has built a system that makes this data more consistent, so that it can be integrated. This is an ongoing process that we are continually working to perfect for our customers.
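To make the name issues above concrete, here is a minimal Python sketch of first-name normalization. It is an illustration only, not Revinate's system: the nickname table, the placeholder list, and the function names are all hypothetical, and a production version would need far larger lookup tables plus separate handling for phone numbers and addresses.

```python
import unicodedata

# Hypothetical, deliberately tiny nickname table (assumption for illustration).
NICKNAMES = {"bill": "william", "will": "william", "meg": "margaret"}

# Hypothetical placeholder names that do not identify a real guest.
PLACEHOLDERS = {"walk-in", "walk in", "guest"}


def strip_accents(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_first_name(raw: str) -> str:
    """Trim, lowercase, strip accents, and map known nicknames to one form."""
    cleaned = strip_accents(raw).strip().lower()
    return NICKNAMES.get(cleaned, cleaned)


def is_placeholder(raw: str) -> bool:
    """Return True for names such as "walk-in" that are not real guests."""
    return strip_accents(raw).strip().lower() in PLACEHOLDERS


# After normalization, "José" and "Jose" compare equal, as do "Bill" and "William".
same_jose = normalize_first_name("José") == normalize_first_name("Jose")
same_william = normalize_first_name("Bill") == normalize_first_name("William")
```

Cases like "Julie N." versus "Julian" are not covered by a lookup like this; they generally call for fuzzy or phonetic comparison at the matching stage.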
2. Match data across systems.
Once the data is all in the same format, profiles can be matched across systems, ensuring that there are no duplicate guest profiles. This can be a very complicated process. For instance, at a property with 50,000 profiles, comparing every profile against every other means 50,000 × 50,000 = 2.5 billion comparisons (about 1.25 billion distinct pairs). At Revinate, we tackle this with a technique known as "blocking," which groups profiles so that only pairs that could plausibly match are evaluated, greatly reducing the volume of data that needs to be processed, with a focus on data quality first and foremost. For a more in-depth technical explanation, read this article.
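The idea behind blocking can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Revinate's scheme: the blocking key (first letter of the last name plus postal code), the field names, and the sample profiles are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def block_key(profile: dict) -> str:
    """Hypothetical blocking key: last-name initial plus postal code."""
    return f"{profile['last_name'][:1].lower()}|{profile['postal_code']}"


def candidate_pairs(profiles: list[dict]) -> list[tuple[int, int]]:
    """Return only the profile-id pairs that share a blocking key."""
    blocks = defaultdict(list)
    for profile in profiles:
        blocks[block_key(profile)].append(profile["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


# Made-up sample data.
profiles = [
    {"id": 1, "last_name": "Faulkner", "postal_code": "38655"},
    {"id": 2, "last_name": "Faulkner", "postal_code": "38655"},
    {"id": 3, "last_name": "Fitzgerald", "postal_code": "38655"},
    {"id": 4, "last_name": "Morrison", "postal_code": "10001"},
]

pairs = candidate_pairs(profiles)      # 3 pairs instead of all 6
all_pairs = comb(len(profiles), 2)     # 6 distinct pairs without blocking
full_scale = comb(50_000, 2)           # 1,249,975,000 distinct pairs
```

The trade-off is that two records for the same guest are never compared if they land in different blocks (for example, when the postal code was mistyped), which is one reason normalization comes first and why real systems often combine several blocking keys.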
3. Merge duplicates.
Now that duplicate guest profiles have been identified, it's time to merge the data to create one seamless database. This functionality will be very different for each use case. Generally, it is important to have a systematic way of evaluating how each field is to be treated. When names are combined (e.g. "William F" and "Will Faulkner" become "William Faulkner"), how does one choose which first and last name takes precedence?
For other fields, like emails, phone numbers, and addresses, we want to ensure we retain all distinct values. An address that is 99% similar to another is almost certainly a duplicate, so in a case like this, we need to evaluate which address is the most complete and keep only that one. Other fields, like sales or revenue data, may need to be aggregated.
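A field-by-field merge policy along these lines might look like the following Python sketch. Every rule here is an assumption chosen for illustration (longest name part wins, emails are compared case-insensitively, addresses above a 0.9 similarity ratio are treated as duplicates, spend is summed), and the field names are hypothetical; the right rules depend on the use case.

```python
from difflib import SequenceMatcher


def merge_addresses(addresses: list[str], threshold: float = 0.9) -> list[str]:
    """Keep distinct addresses; for near-duplicates keep the longer one."""
    kept: list[str] = []
    for addr in addresses:
        for i, existing in enumerate(kept):
            ratio = SequenceMatcher(None, addr.lower(), existing.lower()).ratio()
            if ratio >= threshold:
                # Assumption: the longer string is the more complete address.
                kept[i] = max(existing, addr, key=len)
                break
        else:
            kept.append(addr)
    return kept


def merge_profiles(a: dict, b: dict) -> dict:
    """Merge two profiles already judged to be the same guest."""
    return {
        # Names: keep the longer (assumed more complete) value for each part.
        "first_name": max(a["first_name"], b["first_name"], key=len),
        "last_name": max(a["last_name"], b["last_name"], key=len),
        # Emails: retain all distinct values, ignoring case and whitespace.
        "emails": sorted({e.strip().lower() for e in a["emails"] + b["emails"]}),
        # Addresses: collapse near-duplicates, keep the rest.
        "addresses": merge_addresses(a["addresses"] + b["addresses"]),
        # Revenue: aggregate across both records.
        "total_spend": a["total_spend"] + b["total_spend"],
    }


# Made-up sample records.
a = {"first_name": "William", "last_name": "F", "emails": ["wf@example.com"],
     "addresses": ["12 Main St"], "total_spend": 420.0}
b = {"first_name": "Will", "last_name": "Faulkner",
     "emails": ["WF@example.com", "will@example.org"],
     "addresses": ["12 Main St.", "5 Oak Ave"], "total_spend": 180.0}

merged = merge_profiles(a, b)
```

With these sample records, the merged profile is named "William Faulkner", keeps two distinct emails and two distinct addresses, and has a total spend of 600.0. "Longest wins" is a crude stand-in for completeness; a real policy might also weigh which source system is more trusted or which record was updated most recently.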
At Revinate, the methods outlined above have allowed us to deal with very large volumes of data in a sophisticated, flexible, and scalable way. Hoteliers are already facing an avalanche of guest data, and these approaches help our customers better manage this data and get deeper insights into their guests, freeing them to do what hoteliers do best: provide a stellar guest experience!