I recently had the good fortune to host a small-group discussion on personalization and recommendation systems with two technical experts with years of experience at FAANG and other web-scale companies.
Raghavendra Prabhu (RVP) is Head of Engineering and Research at Covariant, a Series C startup building a universal AI platform for robotics starting in the logistics industry. Prabhu is the former CTO at home services website Thumbtack, where he led a 200-person team and rebuilt the consumer experience using ML-powered search technology. Prior to that, Prabhu was head of core infrastructure at Pinterest. Prabhu has also worked in search and data engineering roles at Twitter, Google, and Microsoft.
Nikhil Garg is CEO and co-founder of Fennel AI, a startup working on building the future of real-time machine learning infrastructure. Prior to Fennel AI, Garg was a Senior Engineering Manager at Facebook, where he led a team of 100+ ML engineers responsible for ranking and recommendations for multiple product lines. Garg also ran a group of 50+ engineers building the open-source ML framework, PyTorch. Before Facebook, Garg was Head of Platform and Infrastructure at Quora, where he supported a team of 40 engineers and managers and was responsible for all technical efforts and metrics. Garg also blogs regularly on real-time data and recommendation systems – read and subscribe here.
They shared with a small group of our customers the lessons they have learned in real-time data, search, personalization/recommendation, and machine learning from their years of hands-on experience at cutting-edge companies.
Below I share some of the most interesting insights from Prabhu, Garg, and a select group of customers we invited to this talk.
By the way, this expert roundtable was the third such event we held this summer. My co-founder at Rockset and CEO Venkat Venkataramani hosted a panel of data engineering experts who tackled the topic of SQL versus NoSQL databases in the modern data stack. You can read the TLDR blog to get a summary of the highlights and view the recording.
And my colleague Chief Product Officer and SVP of Marketing Shruti Bhat hosted a discussion on the merits, challenges and implications of batch data versus streaming data for companies today. View the blog summary and video here.
How recommendation engines are like Tinder.
Thumbtack is a marketplace where you can hire home professionals like a gardener or someone to assemble your IKEA furniture. The core experience is less like Uber and more like a dating site. It’s a double opt-in model: consumers want to hire someone to do their job, which a pro may or may not want to do. In our first phase, the consumer would describe their job in a semi-structured way, which we would syndicate behind the scenes to match with pros in their location. There were two problems with this model. First, it required the pro to invest a lot of time and energy to look through the requests and pick which ones they wanted to do. That was one bottleneck to our scale. Second, this created a delay for consumers just at the time consumers were starting to expect almost-instant feedback to every online transaction. What we ended up creating was something called Instant Results that could make this double opt-in – this matchmaking – happen immediately. Instant Results makes two types of predictions. The first is the list of home professionals that the consumer might be interested in. The second is the list of jobs that the pro will be interested in. This was tricky because we had to collect detailed info across hundreds of thousands of different categories. It was a very manual process, but eventually we did it. We also started with some heuristics and then, as we got enough data, we applied machine learning to get better predictions. This was possible because our pros tend to be on our platform several times a day. Thumbtack became a model of how to build this type of real-time matching experience.
The challenge of building machine learning products and infrastructure that can be applied to multiple use cases.
In my last role at Facebook overseeing a 100-person ML product team, I got a chance to work on a couple dozen different ranking and recommendation problems. After you work on enough of them, every problem starts feeling similar. Sure, there are some differences here and there, but they are more similar than not. The right abstractions just started emerging on their own. At Quora, I ran an ML infrastructure team that started with 5-7 employees and grew from there. We would invite our customer teams to our inner team meetings every week so we could hear about the challenges they were running into. It was more reactive than proactive. We looked at the challenges they were experiencing, worked backwards from there, and then applied our system engineering to figure out what needed to be done. The actual ranking and personalization engine is not only the most complex service but also mission-critical. It’s a ‘fat’ service with a lot of business logic in it as well. It is usually written in high-performance C++ or Java. You’re mixing a lot of concerns, and so it becomes really, really hard for people to get into that and contribute. A lot of what we did was simply breaking that apart as well as rethinking our assumptions, such as how modern hardware was evolving and how to leverage that. And our goal was to make our customer teams more productive, more efficient, and to let them try out more complex ideas.
The difference between personalization and machine learning.
Personalization is not the same as ML. Taking Thumbtack as an example, I could write a rule-based system to surface all jobs in a category for which a home professional has high reviews. That’s not machine learning. Conversely, I could apply machine learning in a way so that my model is not about personalization. For instance, when I was at Facebook, we used ML to understand what is the most-trending topic right now. That was machine learning, but not personalization.
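To make the distinction concrete, here is a minimal sketch of the rule-based approach Prabhu describes. The `Job` class, the `jobs_for_pro` function, the `min_rating` threshold of 4.5 and the sample data are all hypothetical and chosen for illustration only. The output is personalized to each pro, yet nothing in it is learned from data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    category: str


def jobs_for_pro(jobs, avg_rating_by_category, min_rating=4.5):
    """Return the jobs in categories where this pro's average review meets the bar.

    The threshold is a hand-written rule, not a trained model:
    the result is personalized, but it is not machine learning.
    """
    strong_categories = {
        category
        for category, rating in avg_rating_by_category.items()
        if rating >= min_rating
    }
    return [job for job in jobs if job.category in strong_categories]


# Hypothetical sample data.
jobs = [
    Job("j1", "gardening"),
    Job("j2", "furniture assembly"),
    Job("j3", "gardening"),
]
pro_ratings = {"gardening": 4.8, "furniture assembly": 3.9}

matched = jobs_for_pro(jobs, pro_ratings)
print([job.job_id for job in matched])  # ['j1', 'j3']
```

Replacing the fixed threshold with a score predicted by a trained model would turn the same personalization feature into an ML one.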
How to draw the line between the infrastructure of your recommendation or personalization system and its actual business logic.
As an industry, unfortunately, we are still figuring out how to separate the concerns. In a lot of companies, what happens is that the actual infrastructure you created and all of your business logic are written into the same binaries. There are no real layers enabling some people to own this part of the core business and other people to own the other part. It’s all mixed up. For some organizations, what I’ve seen is that the lines start emerging when your personalization team grows to about 6-7 people. Organically, 1-2 of them or more will gravitate towards infrastructure work. There will be other people who don’t think about how many nines of availability you have, or whether this should be on SSD or RAM. Other companies like Facebook or Google have started figuring out how to structure this so you have an independent driver with no business logic, and the business logic all lives in some other realm. I think we’re still going back and learning lessons from the database field, which figured out how to separate things a long time ago.
Real-time personalization systems are less costly and more efficient because in a batch analytics system most pre-computations don’t get used.
You have to do a lot of computation, and you have to use a lot of storage. And most of your pre-computations are not going to be used because most users are not logging into your platform (in the time frame). Let’s say you have n users on your platform and you do an n-choose-2 computation once a day. What fraction of those pairs are relevant on any given day, since only a minuscule fraction of users are logging in? At Facebook, our retention ratio is off-the-charts compared to any other product in the history of civilization. Even then, pre-computation is too wasteful.
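The n-choose-2 arithmetic can be worked through with a rough back-of-the-envelope sketch. The function name `precompute_usage`, the user count and the 5% daily-active figure below are made up for illustration, and the sketch assumes a precomputed pair is "relevant" if at least one of its two users logs in that day. A stricter definition (both users active) would make the used fraction even smaller.

```python
from math import comb


def precompute_usage(n_users, daily_active_fraction):
    """Estimate how many precomputed user pairs are touched on a given day.

    Assumption: a pair counts as used when at least one of its two
    users is active that day.
    """
    total_pairs = comb(n_users, 2)
    active_users = round(n_users * daily_active_fraction)
    inactive_users = n_users - active_users
    # Pairs made up of two inactive users are never looked at.
    unused_pairs = comb(inactive_users, 2)
    used_pairs = total_pairs - unused_pairs
    return total_pairs, used_pairs, used_pairs / total_pairs


# Hypothetical numbers: 1,000 users, 5% of them active on a given day.
total, used, fraction = precompute_usage(1_000, 0.05)
print(f"{used:,} of {total:,} pairs used ({fraction:.1%})")
```

With these illustrative inputs, roughly nine out of ten precomputed pairs go unused that day, and the total number of pairs grows quadratically with the user count.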
The best way to go from batch to real time is to pick a new product to build or problem to solve.
Product companies are always focused on product goals – as they should be. So if you frame your migration proposal as ‘We’ll do this now, and many months later we’ll deliver this awesome value!’ you’ll never get it (approved). You have to figure out how to frame the migration. One way is to take a new product problem and build with a new infrastructure. Take Pinterest’s migration from an HBase batch feed. To build a more real-time feed, we used RocksDB. Don’t worry about migrating your legacy infrastructure. Migrating legacy stuff is hard, because it has evolved to solve a long tail of issues. Instead, start with new technology. In a fast-growth environment, in a few years your new infrastructure will dominate everything. Your legacy infrastructure won’t matter much. If you end up doing a migration, you want to deliver end user or customer value incrementally. Even if you’re framing it as a one-year migration, expect every quarter to deliver some value. I’ve learned the hard way not to do big migrations. At Twitter, we tried to do one big infrastructure migration. It didn’t work out very well. The pace of growth was tremendous. We ended up having to keep the legacy system evolving, and do a migration on the side.
Many products have users who are active only very occasionally. When you have fewer data points in your user history, real-time data is even more important for personalization.
Obviously, there are some parts, like the actual ML model training, that have to be offline, but almost all the serving logic has become real-time. I recently wrote a blog post on the seven different reasons why real-time ML systems are replacing batch systems. One reason is cost. Also, every time we made part of our ML system real-time, the overall system got better and more accurate. The reason is that most products have some sort of long-tail user distribution. Some people use the product a lot. Some just come a couple of times over a long period. For them, you have almost no data points. But if you can quickly incorporate data points from a minute ago to improve your personalization, you will have a much larger amount of data.
Why it is much easier for developers to iterate, experiment on and debug real-time systems than batch ones.
Large batch analysis was the best way to do big data computation. And the infrastructure was available. But it is also highly inefficient and not actually natural to the product experience you want to build your system around. The biggest problem is that you fundamentally constrain your developers: you constrain the pace at which they can build products, and you constrain the pace at which they can experiment. If you have to wait several days for the data to propagate, how can you experiment? The more real-time it is, the faster you can evolve your product, and the more accurate your systems. That is true whether your product is fundamentally real-time, like Twitter, or not, like Pinterest.
People assume that real-time systems are harder to work with and debug, but if you architect them the right way they are much easier. Imagine a batch system with a jungle of pipelines behind it. How would we go about debugging that? The hard part in the past was scaling real-time systems efficiently; this required a lot of engineering work. But now platforms have developed where you can do real time easily. Nobody does large batch recommendation systems anymore to my knowledge.
I cry inside every time I see a team that decides to deploy offline analysis first because it’s faster. ‘We’ll just throw this in Python. We know it is not multi-threaded, it’s not fast, but we’ll manage.’ Six to nine months down the line, they have a very costly architecture that every day holds back their innovation. What is unfortunate is how predictable this mistake is. I’ve seen it happen a dozen times. If someone took a step back to plan properly, they would not choose a batch or offline system today.
On the relevance and cost-effectiveness of indexes for personalization and recommendation systems.
Building an index for a Google search is different from building one for a consumer transactional system like Airbnb, Amazon, or Thumbtack. A consumer starts off by expressing an intent through keywords. Because it starts with keywords that are basically semi-structured data, you can build an inverted index-type of keyword search with the ability to filter. Taking Thumbtack, consumers can search for gardening professionals but then quickly narrow it down to the one pro who is really good with apple trees, for example. Filtering is super-powerful for consumers and service providers. And you build that with a system with both search capabilities and inverted index capabilities. Search indexes are the most flexible for product velocity and developer experience.
Even for modern ranking, recommendation and personalization systems, old-school indexing is a key component. If you’re doing things in real time, which I believe we all should, you can only rank a few hundred things while the user is waiting. You have a latency budget of 400-500 milliseconds, no more than that. You cannot be ranking a million things with an ML model. If you have a 100,000-item inventory, you have no choice but to use some sort of retrieval step where you go from 100,000 items to 1,000 items based on scoring the context of that request. This selection of candidates quite literally ends up using an index, usually an inverted index, even though you are not starting with keywords as with a conventional text search. For instance, you might say return a list of items about a given topic that have at least 50 likes. That is the intersection of two different term lists in some index somewhere. You can get away with a weaker indexing solution than what is used by the Googles of the world. But I still think indexing is a core part of any recommendation system. It’s not indexing versus machine learning.
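The retrieve-then-rank pattern Garg describes can be sketched in a few lines. This is an illustrative toy, not how any of the companies mentioned implement it: the term names (`topic:...`, `likes>=50`), the helper functions and the sample items are all hypothetical, and sorting by like count stands in for a real ML ranking model.

```python
from collections import defaultdict


def build_index(items):
    """Build a tiny inverted index: term -> set of ids of items carrying it."""
    index = defaultdict(set)
    for item_id, item in items.items():
        index[f"topic:{item['topic']}"].add(item_id)
        if item["likes"] >= 50:
            index["likes>=50"].add(item_id)
    return index


def retrieve(index, terms):
    """Candidate retrieval: intersect the term lists for all requested terms."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def rank(candidates, items, score, k):
    """Score only the retrieved candidates and keep the top k."""
    return sorted(candidates, key=lambda i: score(items[i]), reverse=True)[:k]


# Hypothetical inventory.
items = {
    "a": {"topic": "gardening", "likes": 120},
    "b": {"topic": "gardening", "likes": 10},
    "c": {"topic": "plumbing", "likes": 75},
    "d": {"topic": "gardening", "likes": 60},
}
index = build_index(items)

# "Items about a given topic that have at least 50 likes."
candidates = retrieve(index, ["topic:gardening", "likes>=50"])
# Stand-in for an ML model: score each candidate by its like count.
top = rank(candidates, items, score=lambda item: item["likes"], k=1)
print(sorted(candidates), top)  # ['a', 'd'] ['a']
```

The expensive scoring function only ever sees the handful of candidates that survive the index lookup, which is what keeps the request inside the latency budget.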
How to avoid the traps of over-repetition and polarization in your personalization model.
Injecting diversity is a very common tool in ranking systems. You could do an A/B test measuring what fraction of users saw at least one story about an important international topic. Using that diversity metric, you can avoid too much personalization. While I agree over-personalization can be a problem, I think too many people use this as a reason not to build ML or advanced personalization into their products, even though I think constraints can be applied at the evaluation level, before the optimization level.
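As a rough illustration of the diversity metric described above, the sketch below computes the fraction of users who saw at least one item on a target topic, for each arm of an A/B test. The function name `diversity_coverage`, the data layout (user id mapped to the topics of the items shown) and the sample impressions are assumptions made for this example.

```python
def diversity_coverage(impressions, target_topic):
    """Fraction of users who saw at least one item about target_topic.

    impressions maps a user id to the list of topics of the items
    that user was shown.
    """
    if not impressions:
        return 0.0
    covered = sum(1 for topics in impressions.values() if target_topic in topics)
    return covered / len(impressions)


# Hypothetical impressions for the two arms of an A/B test.
control = {
    "u1": ["sports", "international"],
    "u2": ["sports"],
    "u3": ["music"],
    "u4": ["international"],
}
treatment = {
    "u1": ["sports", "sports"],
    "u2": ["sports"],
    "u3": ["music"],
    "u4": ["international"],
}

print(diversity_coverage(control, "international"))    # 0.5
print(diversity_coverage(treatment, "international"))  # 0.25
```

A drop like the one in this made-up treatment arm is the kind of signal that can be used as a guardrail at evaluation time, independent of how the ranking model itself is optimized.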
There are certainly levels of personalization. Take Thumbtack. Consumers typically only do a few home projects a year. The personalization we’d apply might only be around their location. For our home professionals that use the platform many times a day, we would use their preferences to personalize the user experience more heavily. You still need to build some randomness into any model to encourage exploration and engagement.
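One common way to build randomness into a ranking is an epsilon-greedy policy. That technique is not named in the discussion, so treat the sketch below as one possible approach rather than what Thumbtack used. The function name, the default `epsilon` of 0.1 and the sample list are hypothetical.

```python
import random


def pick_with_exploration(ranked_items, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over an already-ranked list.

    With probability epsilon, return a uniformly random item (explore);
    otherwise return the top-ranked item (exploit).
    """
    if not ranked_items:
        raise ValueError("ranked_items must not be empty")
    if rng.random() < epsilon:
        return rng.choice(ranked_items)
    return ranked_items[0]


# Hypothetical ranked list, best match first.
ranked = ["pro_17", "pro_4", "pro_92"]
rng = random.Random(0)  # seeded so repeated runs behave the same way
choice = pick_with_exploration(ranked, epsilon=0.1, rng=rng)
print(choice)
```

Setting `epsilon` to 0 disables exploration entirely, while larger values trade short-term relevance for more data about items the model would otherwise never show.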
On deciding whether the north star metric for your customer recommendation system should be engagement or revenue.
Personalization in ML is ultimately an optimization technology. But what it should optimize towards, that needs to be provided. The product teams need to give the vision and set the product goals. If I gave you two versions of ranking and you had no idea where they came from – ML or not? Real-time or batch? – how would you decide which is better? That is the job of product management in an ML-focused environment.