You’ve seen the ads -- they’re everywhere -- for the online love brokers: Match.com, eHarmony, JDate, POF.com, OkCupid, and so many others. The very personal act of finding a mate is coupled with rather impersonal technology, where sophisticated algorithms match people up based on data gathered through extensive questionnaires.
Big data is master in this realm. Accurate predictors require prospective partners to answer a plethora of questions about appearance, sexual preferences, leisure activities, etc. The amount of data being collected by these companies is enormous.
For online dating, the value of these growing data stores lies primarily in the ability to make a good match. Online dating companies can now use data from millions of successful matches as training data to determine more precisely which characteristics matter most to particular groups.
What are the attributes that attract men between 30 and 45, or formerly married women, or longtime bachelors, or artists? The list is endless, but only some attributes are highly correlated with success in matching, so the online dating service can modify its questionnaire, tailor it for a specific clientele, and continue to tweak it as additional matching results come in. An important tenet of machine learning is to keep making your learning algorithm more accurate.
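The idea of keeping only the attributes that correlate with matching success can be sketched in a few lines of Python. This is a minimal illustration, not any dating site's actual method: the attribute names (`shared_hobbies`, `both_want_kids`, `age_gap`), the `matched` outcome flag, and the six toy records are all hypothetical, and plain Pearson correlation stands in for whatever feature-selection technique a real service would use.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no variance, so report zero correlation.
    return cov / (sx * sy) if sx and sy else 0.0

def rank_attributes(rows, outcome_key):
    """Rank attributes by the absolute strength of their correlation with the outcome."""
    outcomes = [row[outcome_key] for row in rows]
    attributes = [key for key in rows[0] if key != outcome_key]
    scores = {a: pearson([row[a] for row in rows], outcomes) for a in attributes}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical questionnaire data: one record per introduced couple.
pairs = [
    {"shared_hobbies": 4, "age_gap": 1, "both_want_kids": 1, "matched": 1},
    {"shared_hobbies": 3, "age_gap": 8, "both_want_kids": 0, "matched": 1},
    {"shared_hobbies": 0, "age_gap": 2, "both_want_kids": 0, "matched": 0},
    {"shared_hobbies": 1, "age_gap": 9, "both_want_kids": 0, "matched": 0},
    {"shared_hobbies": 5, "age_gap": 3, "both_want_kids": 1, "matched": 1},
    {"shared_hobbies": 1, "age_gap": 4, "both_want_kids": 0, "matched": 0},
]

ranking = rank_attributes(pairs, "matched")
for name, r in ranking:
    print(f"{name:>15}: r = {r:+.2f}")
```

Questions whose attributes land at the bottom of such a ranking are candidates for removal from the questionnaire, and the ranking can be recomputed as new matching results arrive.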
OkCupid's own website carries a quote from the Boston Globe describing the company as the “Google of online dating.” That’s an accurate description. With more than 3.5 million active members, and more than 7 million unique logins per month, OkCupid (acquired by Match.com in 2011) epitomizes big data.
One interesting facet of online dating sites is the significant sample size for even the most personal questions. Some questions have been answered over a million times. This speaks to statistical validity. Old media could only get a few thousand people to answer a poll about President Obama, but they felt it was enough to call the election with confidence. OkCupid, on the other hand, can ask the most personal questions and get hundreds of thousands of answers.
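To put numbers on the sample-size point, here is a small Python sketch of the standard margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the sample sizes are illustrative round figures, not numbers reported by any pollster or dating site.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion.

    p=0.5 is the worst case (widest interval); z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# Illustrative sample sizes: a typical poll versus large questionnaire response counts.
for n in (2_000, 500_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

The formula only covers sampling error. Dating-site members are self-selected, so a very large sample narrows the interval without making the respondents representative of the general population.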
The analytics of online dating
The analytical techniques used to crunch the numbers and surface the patterns for online dating tend to vary. In the early days, when the velocity of online dating data was modest, tools like Excel sufficed. But as the industry rapidly grew, it was not uncommon for surveys to generate responses from a half million users.
It quickly became clear that a more robust solution was required. Enter open-source statistical packages like R, which add power to the analytics toolbox. Data scientists at OkCupid are big on open source. R can handle the larger and more complex data sets generated by OkCupid’s growing base of members. That makes R a good choice for data scientists who are interested in pushing the envelope of traditional statistical analysis. For example, the company used R packages for a study comparing gay and straight dating habits. There was a tremendous amount of data, and using R, the patterns emerged quickly.
The OkCupid blog OkTrends shows how big data and data science can hold a mirror to society, revealing its strengths and weaknesses, with a number of fascinating data analysis experiments.
Big data as the Holy Grail
Online dating sites are crafted with big data in mind. eHarmony maintains many terabytes of data in an internal data warehouse. They use Hadoop for distributed processing, along with BI software from MicroStrategy. The company mines large amounts of data on its members, mainly from their activities on the website. Member behavior is an integral part of how eHarmony works, as it helps predict success on the site. This includes the number of logins, the number of photos posted, and the number of words in the profile.
eHarmony has a data analytics team, as well as a group dedicated to matching members. The matching team works on algorithms that deal with the data, and they try to find ways to optimize best match potential. eHarmony's algorithm varies from country to country, and the matching team regularly tweaks it based on member behavior. That is what causes huge computational and mathematical challenges for the company. You are essentially comparing a huge number of people to one another across hundreds of different variables, trying to identify the best matches.
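The scale of that comparison problem is easy to see in a brute-force sketch. The Python below is an illustration only and makes several assumptions: each member is reduced to a short numeric trait vector (a real profile would have hundreds of variables), cosine similarity stands in for the compatibility score, and the member names and values are invented. It is not eHarmony's algorithm.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two trait vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_matches(members, top_k=1):
    """Score every pair once, then keep each member's top_k most similar partners."""
    scores = {name: [] for name in members}
    for (name_a, vec_a), (name_b, vec_b) in combinations(members.items(), 2):
        s = cosine_similarity(vec_a, vec_b)
        scores[name_a].append((s, name_b))
        scores[name_b].append((s, name_a))
    return {
        name: [other for _, other in sorted(candidates, reverse=True)[:top_k]]
        for name, candidates in scores.items()
    }

# Hypothetical members, each described by three questionnaire-derived scores.
members = {
    "ana": [0.9, 0.1, 0.8],
    "ben": [0.8, 0.2, 0.9],
    "cy":  [0.1, 0.9, 0.2],
    "dee": [0.2, 0.8, 0.1],
}

matches = best_matches(members)
print(matches)
```

Comparing every pair takes n(n-1)/2 similarity computations, which is roughly 5 x 10^11 for one million members. That quadratic growth is one reason distributed processing becomes attractive at this scale.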
The matching team develops hypotheses and analyzes data to see which variables should be massaged. Big data also shapes eHarmony's marketing efforts, telling the company when to send out promotions to members. eHarmony is looking at ways to be smarter about processing its big data, and is considering doing more computational work in the cloud.
If you’ve used one of these sites, can you comment on the accuracy of their predictions? Did it work for you?
eHarmony's algorithm attacked. As an intriguing follow-up to Big Dater, check out this Feb. 12 New York Times article, "Skepticism as eHarmony Defends Its Matchmaking Algorithm." The question is one of feature engineering: which data elements will be the most relevant to the machine learning algorithm? An "Agreeableness" indicator, huh?
Re: An ironic success model. The opportunities are endless in the sense of how many data points you're able to capture when someone's captivated by your site. Sparkology.com says it does this, tracking a combination of responses, conscious behavior, and unintentional behavior (how long you view a profile).
Rhetorically, where is the room for randomness, though, and serendipity? (Or is serendipity just a series of parallel causes that we cannot track?)
User Rank: Petabyte Pathfinder 2/15/2013 | 3:22:23 AM
Re: An ironic success model. You have a point. For one, how do you come up with metrics to gauge how a date went? Or just how close or how far someone is to finding true love? Unless you have a very dedicated user base, I think few would be open to the idea of answering surveys after every date or every milestone of the relationship. Then there's the question about what would constitute a milestone or not.
Re: An ironic success model. Maybe they are basing these decisions on whether or not customers come back and keep looking. That would keep the front-end analytics working towards this scenario - of course, there should be an exit interview/form when the customer leaves just to make sure they haven't given up/given their heart to someone in a non-digital realm...
User Rank: Blogger 2/11/2013 | 9:24:37 AM
Re: An ironic success model. @technetronics
Marketwatch's post on online dating mentions the book Daniel did and the persistent problem of imperfect algorithms:
Dating sites pride themselves on the wizardry of their algorithms, but even the most sophisticated dating site can't always screen for jerks. "It's very early in the online dating industry," says Dan Slater, author of "Love in the Time of Algorithms: What Technology Does to Meeting and Mating." Sites have gotten better at cross-referencing what people say and do, "but there's still a lot of room for improvement," he says.
Match.com CEO Mandy Ginsberg says the site does its best to suggest people based on the information they supply. The site cross-references users' preferences and also tracks what profiles they click on, in an effort to ensure that their online habits jibe with their stated preferences. eHarmony, in turn, says its team of data scientists and psychologists look at multiple "points of compatibility" between applicants. Prospective members fill out psychological tests based on categories like emotional status, character, self-perception and conflict resolution.
The sites also point to the tools they've introduced in an effort to improve results: In one Match.com feature, for instance, a multiple choice question like "When it comes to style, I like a man who dresses like this" is followed up with a list of photographs of men with various styles. Other questions let members choose from a range of voices and photographs of celebrities.
Re: An ironic success model. I can take a stab and say that this is another situation of the quality of data coming in...two different people might have very different views of the date so I think you'd be hard-pressed to find an algorithmic way to integrate date feedback unless you limit it to very specific and well-defined metrics (like in a post-date survey) or do word analysis.
User Rank: Blogger 2/9/2013 | 10:26:25 PM
Re: An ironic success model. @technetronic, yes, for the most part, how accurate your portrayal is is up to you. Some of Amy Webb's advice on gaming the analytics of online dating is to leave certain things out of a profile:
Interests and activities are fine, but avoid ones that are not self-explanatory or that can backfire. She illustrates with her own experience: "'I have a black belt in aikido.' (I actually do, and I put it on my profile at one point, which prompted some men to challenge me to a fight on the first date, which was as horrible and awkward as it sounds.)"
She includes some advice that she admits sounds "regressive."
Don't mention work, especially if your job is difficult to explain. You may have the most amazing career on the planet, but it can inadvertently intimidate someone looking at your profile. I realize this sounds horribly regressive, but during my experiment I found that women were attracted to men with high-profile careers, while the majority of men were turned off by powerful women.
Oh, and the woman with the silky, straight hair may, in fact, have curly-hair genes but be following this piece of advice:
Women with curly hair are at a distinct disadvantage online. I have no idea whether men prefer blondes, but I can say definitively that most men prefer women with healthy, long, straight hair. If you have curls and feel comfortable (and look good) straightening your hair, give that a try.
User Rank: Blogger 2/9/2013 | 10:15:35 PM
Re: Something I was always curious about these sites... alvb1227 they must not do any kind of check if Amy Webb was able to create 10 fake male accounts for JDate. I have heard of some dating sites that are mediated by matchmakers. I'd imagine those charge higher fees and have less freedom to browse and contact profiles.