Artsy Engineering Radio

Improving Artwork Recommendations

September 08, 2022 Artsy Engineering Season 2 Episode 21

Recommending artworks to collectors is tricky but having four different ways to do it… Listen as Sarah Haq and Jon Allured talk about how they are improving the artwork recommendations at Artsy by establishing a single source of truth and more!

Jon Allured:

Hello, everyone, welcome to Artsy Engineering Radio. I'm Jon, I'm your host for today. I'm an engineer here at Artsy and I have a very special guest today. It's Sarah. You've probably heard Sarah before; she's been on the podcast. So why don't you quickly introduce yourself?

Sarah Haq:

Hello, thank you so much for inviting me. I'm Sarah Haq. We have many Sarahs on the Artsy team, but hopefully my voice is distinct enough. I'm a machine learning engineer, currently on the Find and Explore team at Artsy, and I'm working on something super exciting: the field of recommendations.

Jon Allured:

Yeah, so we wanted to get together because we've been working on this topic of artwork recommendations for quite a while, and we've reached some milestones, so it felt like a good time to talk about the process and about what's worked well. We've had some hiccups, and we can talk about those too. I'm on the Growth team and you're on a different team, but we overlap in the sense that my team helps the marketing team. The marketing team sends an email, I think on Thursdays, and it has artwork recommendations for users. So I'm coming at it from the point of view of, hey, the marketing team wants to send recommendation emails, and you're coming at it from the point of view of, you're going to do some lab-coat computer science work and try to figure out what artworks are going to be good for a user. And so we've found a seam, a way to decide where your work ends and my work begins. And that can be a cool way for us to work together.

Sarah Haq:

Yeah, I guess we can start with the history of recommendations at Artsy. You mentioned the email use case, but the use case I picked up when I joined was on the app: it was a rail on the app that I was trying to rebuild. For some bizarre reason, we've had several surfaces showcasing something that should be the same. But I think you have definitely done an incredible job uniting everything and providing the single source of truth around recommendations. Currently, this is surfaced on the app, on the website, and in emails. So for me, it's really cool to see it all being used, and all the exciting feedback we've received. But yeah, it's been a journey.

Jon Allured:

Yeah, so I think one of the early things you hit upon was: let's use some user behavior and calculate an affinity score for an artist. So given a user and some things they've done, and given what we know about some artists, how can we calculate some number that represents how likely a user is to like an artist? Once we have that data point, we can use it to find published artworks from our gallery partners that maybe also fit some other criteria. So that's kind of how this evolved. And you started that work with a rail on the app home screen. You launch the Artsy app on your phone, and the first screen you see is what we call the home feed. The first rail of the home feed, scrolling left to right, is our artwork recommendations. And so when you were on the CX team, I think that's when that stuff was kicking around.

Sarah Haq:

Yeah, so when I joined, we had lots of new hires, this whole mass hiring. I was in the Collector Experience team at the time, and one of the first tickets or tasks was assigned by Pierre Luca. We wanted to replace this New Works by Artists You Follow rail, where the signal was very explicit, with analytics data, and the data team had built this affinity model. It felt like a good use case. I paired a lot with Ola, and together we built this rail, which replaced New Works by Artists You Follow with New Works for You, which, as you said, considers affinity data. What's cool about affinity data is that it takes the signals a user is explicitly giving us, but also, for example, the more you browse an artist, it will capture that sort of behavior as well. I find a lot of recommendation algorithms in industry tend to look at only one perspective: they look at sales or views or something very clear. But I really liked this idea of building on all these different signals. If you browse something once, that's maybe very weak, but if you really love an artist or an artwork, you're going to keep going back to that page, especially because of the market that we're in. So this affinity model that we've put in production across all the different platforms, I feel, works really well for the art market.

Jon Allured:

Yeah, there's something foundational here that I want to make sure we highlight, which is that we had to capture the signals first. We have a pretty good foundation there; Segment is our provider for this. Across our app and our website, whenever a user is going about their business, we're triggering events, and then those events get dumped into Redshift, our data lake. And so because you had that foundational pool of signals, you could tinker with what to boost, and then arrive at some calculation that we feel confident in.

Sarah Haq:

That's a good way of phrasing it. So yeah, currently it looks at follow behavior, saving behavior, browsing behavior, inquiring behavior, bidding behavior, orders, My Collection; there are so many areas where we can capture the information. We are moving towards this auto-follow idea, but at the time that didn't exist. And I think only 20% or 30% of our users showed follow behavior, which meant we weren't able to showcase meaningful artworks to all our users or collectors. So definitely, yeah, it was one step in the right direction. And it has also seen lots of different iterations.
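As a rough illustration of combining several behavioral signals into one number per user and artist, here is a minimal sketch. The signal names cover only a subset of those mentioned, and the weights and the function `affinity_scores` are hypothetical; the episode does not say how the real model boosts each signal.

```python
from collections import defaultdict

# Hypothetical weights: weak signals (a single browse) count a little,
# strong signals (an order, a work in the user's collection) count a lot.
SIGNAL_WEIGHTS = {
    "browse": 1.0,
    "save": 3.0,
    "follow": 5.0,
    "inquiry": 8.0,
    "order": 10.0,
    "collection": 10.0,
}


def affinity_scores(events):
    """Sum weighted events into one score per (user_id, artist_id) pair."""
    scores = defaultdict(float)
    for user_id, artist_id, signal in events:
        # Unknown signal types contribute nothing in this sketch.
        scores[(user_id, artist_id)] += SIGNAL_WEIGHTS.get(signal, 0.0)
    return dict(scores)


# Repeated browsing plus a save outweighs a single glance at another artist.
events = [
    ("u1", "artist-a", "browse"),
    ("u1", "artist-a", "browse"),
    ("u1", "artist-a", "save"),
    ("u1", "artist-b", "browse"),
]
scores = affinity_scores(events)
```

Because the score accumulates, returning to the same artist page repeatedly raises the affinity, which matches the point above about repeated browsing being a stronger signal than a single view.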

Jon Allured:

And we'll keep iterating.

Sarah Haq:

Exactly, we'll keep iterating and make sure we provide something that's meaningful. But Jon, how did you fall into all of this? Because this was something that was in Collector Experience, and you were on a different team. I still don't know how this all started.

Jon Allured:

I think what happened is that you, or the CX team in general, were doing A/B tests on this in the app, and they were successful. The approach you were taking was causing more commercial actions, more engagement, whatever the thing is. And so when marketing found out about that, they were like, I want that for my email. Because the Growth team works so closely with marketing, that turned into a project for me. And then I'm learning all this stuff from Sarah about what you've been up to, and I do an audit. I look at the rail on the homepage, the New for You page, the rail on the app, all these things, and I realize that they're all being powered by different things. So what we figured out is that you might get an email with recommendations that has six artworks, and you click on a "view all" and that goes to maybe a screen in the app or a page on the website, and what you see there is different. That inconsistency was one of the first things I tried to highlight: before we can really move forward on improving and evolving our algorithm, we should start by making that experience consistent and establishing a single source of truth. So that's kind of where I started. I said to Kathy, the Director of Engineering, hey, I would like to spend some focus time here, and there's a roadmap where I migrate all of these surfaces to a single source of truth. Then once we've hit some of these milestones, some of our ideas about other experiments we can do with Sarah's modeling, with filtering out certain works, whatever, it would be so much easier to do all those experiments once we have this consistent experience.

Sarah Haq:

Yeah, I think it's really important to mention that any feature we've tried to release, we've tested thoroughly; we've had a few weeks of testing. Creating that testing framework was a huge challenge, and I'm so happy that we have this flow in place where we can try different versions of affinity recs. We try, we succeed, we fail, we keep iterating. And I really like this process. I think this is just a good way to work: having a good experimental framework, testing quickly, getting feedback quickly, and then trying something else, as opposed to spending lots of time designing and planning. Just ship it and see what happens.

Jon Allured:

Yeah, one thing to note is that we also set up a recommendations channel in Slack, and then found a few folks who could be a recommendations working group, so to speak. Maybe you and I were the ones mostly doing the actual engineering work, but we had marketing's input and our curatorial team's input and even other engineering input. So anyway, we built this multidisciplinary team to try to understand: What's our goal here? What's the outcome? How do we stay focused on the user? Because it's so easy to get really into the weeds technically on this stuff.

Sarah Haq:

Yeah, of course. And again, as we keep saying, it affects emails, website, and app, and it's across different teams, as you mentioned: not just the collector product area but also the marketing area. And yeah, creating this sort of informal team has been a lot of fun. It also had its own challenges, I think, as with everything. But it's really nice that we're at a place where we have these recommendations being surfaced in all the areas, we're able to test correctly, we're able to get feedback quickly, and we're able to just keep improving. And that is something that we should really celebrate.

Jon Allured:

Yeah, I think so too. So let's get into some of the technical details. I wrote an outline here, and the next topic I was thinking about is how we calculate the recommendations. I mentioned before that there's an integration point between your work and my work. What we established is that there's a database on one of our services. Where you leave off is that you send me a blob of SQL (sometimes you write it yourself), and then that goes into this system, and it generates the recommendations for users daily. So there's a daily job: it runs, it sends this SQL blob off to our data lake, and what ends up being stored in the database is essentially a user ID, an artwork ID, and a score. Once you've done that work, I can pick it up from there. I can surface that in API endpoints, and then I can go to clients and update them to hit that endpoint and consume those recommendations. So that was my way of outlining how things work. But I don't know as much about what's done on your side. So how do you come up with the SQL? What's your process like?
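The hand-off described here (a SQL query run daily against the warehouse, with the result stored as user ID, artwork ID, and score, then read by an API) can be sketched as follows. This is only an illustration under stated assumptions: SQLite in-memory databases stand in for both the data lake and the service database, and every table, column, and function name is hypothetical.

```python
import sqlite3

# Stand-in for the "blob of SQL". Table and column names are hypothetical.
RECOMMENDATION_SQL = """
SELECT a.user_id, w.artwork_id, a.score
FROM artist_affinities AS a
JOIN artworks AS w ON w.artist_id = a.artist_id
"""


def run_daily_job(warehouse, app_db):
    """Run the query against the warehouse and replace the stored rows."""
    rows = warehouse.execute(RECOMMENDATION_SQL).fetchall()
    with app_db:  # commits on success, rolls back on error
        app_db.execute("DELETE FROM recommendations")
        app_db.executemany(
            "INSERT INTO recommendations (user_id, artwork_id, score) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


def recommendations_for(app_db, user_id, limit=6):
    """What an API endpoint might read: highest score first."""
    cursor = app_db.execute(
        "SELECT artwork_id, score FROM recommendations "
        "WHERE user_id = ? ORDER BY score DESC LIMIT ?",
        (user_id, limit),
    )
    return cursor.fetchall()


# Tiny in-memory stand-ins for the warehouse and the service database.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE artist_affinities (user_id TEXT, artist_id TEXT, score REAL);
CREATE TABLE artworks (artwork_id TEXT, artist_id TEXT);
INSERT INTO artist_affinities VALUES ('u1', 'artist-a', 9.0), ('u1', 'artist-b', 2.0);
INSERT INTO artworks VALUES ('w1', 'artist-a'), ('w2', 'artist-b');
""")
app_db = sqlite3.connect(":memory:")
app_db.execute(
    "CREATE TABLE recommendations (user_id TEXT, artwork_id TEXT, score REAL)"
)

stored = run_daily_job(warehouse, app_db)
top = recommendations_for(app_db, "u1")
```

Keeping the query as data and the job as a thin runner is what makes the seam work: the modeling side can change the SQL without the API or the clients changing.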

Sarah Haq:

Yeah, so the initial starting point is the relationship between users and their interactions with artists or artworks. For our affinity recommendation algorithms, we're looking at users and the artists they have affinity for. This ranges: an artist that you have in your collection, that you're constantly browsing, that you're obsessed with, will get a very high score. Whereas an artist that maybe you just looked at once, realized you could never afford, or just glanced at in passing would have a very low score, because you just didn't engage so much. So the starting point of all our recommendation algorithms is this affinity dataset, which lives in Redshift. Then, based on that, we can fetch artworks that we feel are relevant, and that's the part we're doing by trial and error. Right now, what we're looking at is: these users have affinity with these artists, so what are the works most recently published in the last 30 days? Showcase those to the user. Originally, we were actually showcasing just recent artworks, so we weren't ordering by affinity. That meant an artist you had a low affinity for, but who had a newer work, would be ranked higher. So the fact that we sorted this so that you would see artists you have the most affinity for first is pretty cool, because I could see my favorite artist being shown, and I bought something a few days ago. And I was like, yeah, it works. Because I am so obsessed with this artist, and I keep looking, and it takes time to buy. I think something that gets misunderstood is that art is not an immediate purchase, just because it's so expensive. So it takes time. And obviously, the more you're interacting with it, the more likely you are to want to buy it as well. So that's something that gets captured in this SQL query that you're talking about.
But the starting point is the user-artist pairing: the higher the score, the more affinity you have. And then we try to subset relevant artworks. For example, recently we excluded merchandise from the affinity recs algorithm. We could potentially weight certain works higher. So lots of improvements can be made, but we test it and see how it works. We recently tried to incorporate budget, so we were trying to pick artworks within a budget. We got mixed feedback: one test did well, one didn't. And so the conclusion was that we might not move forward with it. Maybe budget doesn't make sense. Maybe affinity is a clear enough signal. And that's something that we have to just keep trying and testing. But from my perspective as a collector, I like the affinity recs because they also pick up artists that I just didn't think to follow, that for whatever reason I didn't get round to following. They also surface artists that I love but have sort of moved on from to somebody else. So it's really great to see a new work come up. And if it's something super trendy, it's even more amazing. I think the important thing is that we start with these user affinity scores, and then, based on that, we're fetching artworks based on some criteria. And that's what is live right now.
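The selection logic described in this answer (works published in the last 30 days by artists the user has affinity for, merchandise excluded, ordered by affinity) could look roughly like the sketch below. The `Artwork` fields, the `is_merch` flag, and the `recommend` function are assumptions made for illustration; the production version is a SQL query whose details are not given in the episode.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Artwork:
    id: str
    artist_id: str
    published_on: date
    is_merch: bool = False  # hypothetical flag for merchandise


def recommend(affinities, artworks, today, window_days=30):
    """Recent, non-merch works by artists with affinity, best match first.

    `affinities` maps artist_id -> affinity score for a single user.
    """
    cutoff = today - timedelta(days=window_days)
    candidates = [
        work
        for work in artworks
        if work.artist_id in affinities
        and work.published_on >= cutoff
        and not work.is_merch
    ]
    # Sort DESCENDING: highest affinity first, newest first as a tiebreak.
    # Sorting only by publish date would let a low-affinity artist's newer
    # work outrank a favorite artist's slightly older one.
    return sorted(
        candidates,
        key=lambda work: (affinities[work.artist_id], work.published_on),
        reverse=True,
    )


today = date(2022, 9, 8)
affinities = {"artist-a": 9.0, "artist-b": 2.0}
artworks = [
    Artwork("w1", "artist-b", date(2022, 9, 7)),                 # newest, low affinity
    Artwork("w2", "artist-a", date(2022, 9, 1)),                 # older, high affinity
    Artwork("w3", "artist-a", date(2022, 7, 1)),                 # outside the window
    Artwork("w4", "artist-a", date(2022, 9, 5), is_merch=True),  # excluded as merch
]
ranked_ids = [work.id for work in recommend(affinities, artworks, today)]
```

Here the high-affinity work comes first even though it is older, which is the ordering change described above.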

Jon Allured:

Yeah. And so there are two kinds of changes I heard you talk about. One was suppressing what we call merch. That was a change that we made and pushed to production without doing much testing on it. We just trusted our instincts there, and we did maybe some qualitative reviewing of how it looked to see if we thought it was a good approach. But then we do test things; you mentioned budget. So I'll quickly talk about how I set that up. We mentioned this blob of SQL, right? What I did was add one more column to our database and call it version or something, and it has either an A or a B. So we split up by users, and we calculate their A recommendations or their B recommendations. Then when it comes time for someone in our marketing department to create the email, they can use their existing A/B testing framework to split their users into A and B cohorts. And then, voila, we have an email that goes out that splits between these two approaches, and we can get some feedback pretty quickly that way. And then, like you say, it wasn't a slam dunk of "yes, budget is super impactful here." So we were able to roll that back and learn from it. And as future tests roll out, I'll use that same technique.
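A minimal sketch of the version-column idea: each stored recommendation row carries an A or B tag, and the consumer asks for the rows matching a user's cohort. The row shape, the `recs_for` lookup, and the hash-based `cohort_for` assignment are all assumptions for illustration; in the episode the cohort split itself is handled by the marketing team's existing A/B testing framework.

```python
import hashlib
from typing import NamedTuple


class RecRow(NamedTuple):
    user_id: str
    artwork_id: str
    score: float
    version: str  # "A" (control) or "B" (variant, e.g. affinity plus budget)


def cohort_for(user_id: str) -> str:
    """Hypothetical deterministic split that is roughly 50/50 across users."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def recs_for(rows, user_id, version):
    """Artwork IDs for one user and one version, highest score first."""
    matches = [
        row for row in rows if row.user_id == user_id and row.version == version
    ]
    ordered = sorted(matches, key=lambda row: row.score, reverse=True)
    return [row.artwork_id for row in ordered]


rows = [
    RecRow("u1", "w1", 9.0, "A"),
    RecRow("u1", "w5", 3.0, "A"),
    RecRow("u1", "w2", 7.5, "B"),
    RecRow("u2", "w3", 4.0, "A"),
    RecRow("u2", "w4", 6.0, "B"),
]
u1_control = recs_for(rows, "u1", "A")
u1_variant = recs_for(rows, "u1", "B")
```

Rolling back a variant that did not test well then amounts to no longer generating or requesting the B rows; the clients and the endpoint stay the same.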

Sarah Haq:

Yeah, and we split the users 50/50, right? So 50% is our control group and 50% is our test group. Our first test started with the original email's logic, and we tested it against our affinity recs. I saw some crazy numbers the other day: we had something like a 285% CVR. I don't even know exactly what that is, but it looked really fancy. In layman's terms, I think we've doubled or tripled conversion, which is an amazing feat from making such small changes and just continuing to iterate, making sure we are constantly improving what we show our collectors. So our first test was that, and then we had affinity recs, and we tried affinity recs with budget. But that test didn't do well, so we went back to affinity recs. Having that testing framework really makes decision-making smoother and quicker.

Jon Allured:

I'm thinking about hiccups. One of the hiccups I can think of was that I mixed up ascending and descending, and I was surfacing the lowest affinity recs rather than the highest affinity recs for some number of weeks. So I fixed that, and, to plug it again, we had already put the time in to migrate all these surfaces to a single source of truth. So finding this mistake and fixing it was much cheaper. I didn't have to go across all these different surfaces and all these API endpoints. There was one place where I made this mistake and one place to fix it.

Sarah Haq:

That's not my favorite hiccup. My favorite hiccup was when we brought the whole platform down. Again, "recommendations" means a whole family of algorithms, including machine learning algorithms we can use. When we talk about recommendations here, we're talking about the affinity recommendations and the new works rail. Originally, we thought, oh, we're going to be using these machine learning recommendations. Then we needed a fallback, because that wasn't producing recommendations for many users, so we were like, oh, we then need to use the follow logic. And then we needed more logic, and then we broke Artsy. I think that, for me, is my favorite hiccup.

Jon Allured:

Yeah, we have a proxy server called Metaphysics. The email campaign that went out hit this endpoint for every user, and it was just inundating the API. So this is another thing we've learned a lot about: how to make it all performant as well. I think of the website, the app, and the email as clients of this API endpoint, which ultimately queries this database for the recommendations. And it's a two-part query. First, it gets the user ID, artwork ID, score, whatever. Then it hits our main API and enhances that information with the full artwork information that would be required to show the image, show the title, link to the page, etc. One thought I had was that maybe we could discuss where we see this going, what else we can do. One thought I have here is that we're very coupled to this idea that every day we recalculate our recommendations. That's great for existing users, and it's great for the frequency of sending an email every week. But we have this idea that we would love to onboard new users, get some signals from them right away, and then immediately show them works that we think are really going to excite them and reflect their tastes. That's more like real-time recommendation calculation, and we just don't have any precedent for this yet. That's one that I think would be really sweet.
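The two-part query can be sketched as a lookup of scored IDs followed by a hydration step that attaches display data. In this sketch plain dictionaries stand in for the recommendations database and the main API, and all names are hypothetical. Fetching the artwork details in one batched call is also an assumption, chosen here because per-item requests multiply load quickly when an email campaign hits the endpoint for every user.

```python
def fetch_scored_ids(rec_store, user_id, limit=6):
    """Part one: (artwork_id, score) pairs for a user, best first."""
    rows = rec_store.get(user_id, [])
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


def fetch_artworks(catalog, artwork_ids):
    """Part two: stand-in for ONE batched call to the main API."""
    return {
        artwork_id: catalog[artwork_id]
        for artwork_id in artwork_ids
        if artwork_id in catalog
    }


def recommended_artworks(rec_store, catalog, user_id, limit=6):
    """Combine both parts, keeping the score order from part one."""
    scored = fetch_scored_ids(rec_store, user_id, limit)
    details = fetch_artworks(catalog, [artwork_id for artwork_id, _ in scored])
    return [
        {**details[artwork_id], "score": score}
        for artwork_id, score in scored
        if artwork_id in details  # skip works the main API no longer returns
    ]


rec_store = {"u1": [("w1", 2.0), ("w2", 9.0), ("w3", 5.0)]}
catalog = {
    "w1": {"title": "Untitled", "href": "/artwork/w1"},
    "w2": {"title": "Blue Study", "href": "/artwork/w2"},
    # "w3" is intentionally missing, e.g. unpublished since the daily job ran.
}
hydrated = recommended_artworks(rec_store, catalog, "u1")
```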

Sarah Haq:

Yeah, that's a bit trickier, though, because now that's going into the cold start problem. Usually the way a lot of companies solve this is with a series of questions, and by showcasing either the most popular artworks or artworks that are, let's say, on the advisory team's radar, just so that we can then do something with that. But we can't magically showcase the perfect, relevant artists or artworks for you. Usually people solve this with popularity models. Think of when you first go on to Netflix or YouTube or anything: they have to start with just generic content. But I think it's really important to showcase and put forward our best or most well-curated works. This is where we should definitely bring in our amazing, talented Artsy experts, so they can entice new users. For example, we have Trove; we could even show that. And then, once we've addressed this cold start problem, we'd have to be able to show very meaningful and relevant recommendations quickly. But something we haven't touched on, which I think is part of the victory of the single source of truth, is the backfill, because not all users give us signals. We have a lot of users, maybe 20% of users that come on the platform, who are not following and not engaging. That may be something in our user journey; maybe these are completely bounced users. But still, there's always potential to entice collectors or potential collectors, or these non-buyers, as we call them. So I think the backfill is another area that we can test; we can try lots of different backfills. Right now we're using one with a trending score. Maybe we could try another version. We could try Trove, we could try weighting. But that's another area. And I think once we test that and get more feedback on it, that's something that we can use in our onboarding as well. But I think the fact that we have backfill is amazing.

Jon Allured:

It's pretty awesome. In terms of the mechanics, the way this works is that it's possible for a user to be so new, or whatever, that they don't have enough information for us to calculate as many recs as we want to show. And so what we're talking about here is: the query goes in, and the array of recommendations comes back empty. What we do then is make a second API call to a curated list of artworks that our marketing friends make for us. That's updated weekly, and it's a combination of things. But as you say, there's maybe some room for experimentation and improvement there. That helps us always have at least six, because for the email campaign, they show six in that email. You may also have artist affinity where we just may not have inventory. It's certainly possible that you have affinity for a few artists, but what if we just don't have any published for-sale works by those artists? That can happen. And so in that case, again, backfill would come in, and you'd see that curated list of things that we think are really hot right now.
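The backfill step can be sketched as a small function that tops up a short or empty recommendation list from a curated list until it reaches six items. The episode describes the empty-result case; topping up a partially filled list and skipping duplicates are assumptions added here, and `with_backfill` is a hypothetical name.

```python
def with_backfill(recs, curated, minimum=6):
    """Return at least `minimum` artwork IDs when the curated list allows it.

    Personalized recommendations keep their order and come first; curated
    works fill the remaining slots without repeating an artwork.
    """
    filled = list(recs)
    if len(filled) >= minimum:
        return filled
    seen = set(filled)
    for artwork_id in curated:
        if len(filled) >= minimum:
            break
        if artwork_id not in seen:
            filled.append(artwork_id)
            seen.add(artwork_id)
    return filled


curated = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
new_user = with_backfill([], curated)             # no signals yet
low_inventory = with_backfill(["w1", "c2"], curated)  # affinity, little inventory
```

In a real client the curated list would come from the second API call described above; here it is passed in so the behavior is easy to check.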

Sarah Haq:

Yeah, absolutely, that's a very valid concern: we might just not have inventory, and then we can't show them relevant content. That's why, once affinity recs solves one area, it would be great to then start using machine learning algorithms, because machine learning algorithms could also backfill: they could predict artworks that might be relevant or similar. That's something we haven't tested in too much detail; they live on our homepage. But from my side, I think that definitely is the next area that we have to test more and incorporate more. And the way I see the onboarding flow going is: we have user onboarding, and we showcase some of our best artworks. If we can get even two signals, we can match them and make predictions. If we want to show them very relevant affinity recs, we have to just give them some time. We have to keep sending them these sort of seven-day follow-up emails, like, hey, follow this, follow that. We just have to keep enticing them and suggesting artists to follow, or showcasing our best artworks, which I think people will appreciate. Because art is beautiful, and if you've made it this far, you want to see beautiful artworks. I don't think it's annoying. I can't imagine anyone ever saying, oh, this gorgeous piece is hurting my eyes.

Jon Allured:

A fun thing about this project is that we're showing people things they want to see. We're doing a service.

Sarah Haq:

Something that I think we can maybe investigate more is our prices; maybe showing $100K artworks doesn't reflect our user base. But usually showing people art is not going to frustrate anyone. If anything, everyone is going to have a warm and fuzzy feeling inside. So I don't feel bad, as if we were bombarding people with god-awful content. We get to showcase art to collectors, and if people come to Artsy, they usually have some affinity towards art.

Jon Allured:

That's how they kind of got there in the first place. Speaking of warm and fuzzies, thank you so much for talking to me about this, Sarah. It's been great. It's been so fun working with you on this, and I think that we're going to probably continue to work on it in the future.

Sarah Haq:

Thank you so much for chatting and spearheading it, providing a single source of recommendation through Letsa. Yeah, super excited about next steps as well.

Jon Allured:

Awesome. Thanks for listening, and we'll talk to you next time.

Matt Dole:

Thanks for listening. You can follow the Artsy engineering team on Twitter at @ArtsyOpenSource, and you can find our blog at artsy.github.io. This episode was mixed and edited by Jesse Mikania, and our theme music is by Eve Essex, who you can find on all major streaming platforms. See you next time.