I wanted to ask a technical question, maybe a high-level one: why might some sites have bad search, and what are the bottlenecks that keep it from being updated for years? Was there something in the original development of the stack that hinders progressive updates to the feature, and how should one approach “Search” in that case? Or is it simply a management issue?
A few reasons. Really good search is hard to do, which means it’s expensive. Reddit probably doesn’t have the architecture in place to do quality search. Not that the architecture couldn’t be changed, but that costs money. Another reason is that search engines exist. Why replicate the intricate details of a search engine when it’s not your core business? Especially when everyone can already use search engines to search Reddit.
I would think there are plenty of “off the shelf” systems they could buy for this purpose, but they’re too cheap to do it. I mean, all the Reddit post and comment data is in a database to begin with.
There are “off the shelf” systems, for a sufficiently broad interpretation of “off the shelf.” But they are not cheap (they probably require a dedicated team just to configure and maintain them properly, and probably also significant re-architecting of your application’s data), and they are usually still quite shitty even after all that.
Search is just very, very hard. Much harder than even experienced devs who have not worked in the area appreciate.
Source: I am a dev on a major search engine. No, not that one, but one you have definitely used many times.
I know it’s cool to shit on Reddit (and I dislike them too!), but this really is a technical issue. Stuff being in a database doesn’t mean you can magically do good search without anything else. Off-the-shelf systems exist for off-the-shelf products; the problem with those solutions is that the more your project differs from what they target, the more it costs to integrate them. And since Reddit is pretty unique (especially once you account for its scale), it doesn’t make financial sense to build a product optimized for Reddit.
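To illustrate why “it’s in a database” isn’t enough on its own: a naive scan (roughly what an unindexed `LIKE '%word%'` query does) has to look at every row for every query, while search engines typically build an inverted index ahead of time that maps each term to the documents containing it. Below is a toy sketch assuming simple lowercase/whitespace tokenization and an in-memory dict; the function names are hypothetical. Real systems also have to shard, compress, and incrementally update that index at scale, which is where a lot of the cost comes from.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split only; real tokenizers also
    # handle punctuation, stemming, Unicode, etc.
    return text.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def scan_search(docs: dict[int, str], term: str) -> set[int]:
    # Brute force: re-tokenize and check every document on every query.
    return {doc_id for doc_id, text in docs.items() if term.lower() in tokenize(text)}


def index_search(index: dict[str, set[int]], term: str) -> set[int]:
    # One dictionary lookup, no matter how many documents don't match.
    return set(index.get(term.lower(), set()))


docs = {
    1: "table tennis tournament results",
    2: "best rackets for tennis beginners",
    3: "how to refinish a wooden table",
}
index = build_inverted_index(docs)
print(index_search(index, "tennis"))  # {1, 2}
print(scan_search(docs, "tennis"))    # {1, 2}
```

Both return the same answer; the difference is that the index pays the cost once at write time instead of on every query.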
I think companies just don’t prioritize internal search because search engines like Google work.
I don’t think it’s a technical hurdle. There are many libraries and tools available for a company to develop a decent internal search capability if they wanted to do so.
I can’t answer the technical question, as I don’t have enough experience with that. But I think sites like Reddit mostly don’t care about search. They probably think: “People can use Google if they want to search.”
For simplicity it probably just searches for posts containing all the words you typed, anywhere in the text and title of the post.
So a search for “table tennis” would match both “table tennis tournament” and “tennis star loses temper, flips table at Dennis restaurant”.
There are likely exclusions for common “filler” words, like “the,” “and,” “is,” and “a.”
Then the results get ordered by a score based on age, engagement, and post score.
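The naive approach described above could be sketched roughly like this. The stopword list, the `Post` fields, and the scoring formula (log-scaled votes decayed by age) are assumptions for illustration, not how Reddit actually ranks results. Note that it reproduces the “table tennis” problem: both posts match because each query word appears somewhere, in any order.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical stopword list; real systems use longer, language-specific lists.
STOPWORDS = {"the", "and", "is", "a"}


@dataclass
class Post:
    title: str
    body: str
    votes: int        # engagement / score
    age_hours: float  # time since posting


def terms(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def matches(post: Post, query: str) -> bool:
    # AND semantics: every query term must appear somewhere in the title or
    # body, in any order and at any distance from the others.
    return terms(query) <= terms(post.title + " " + post.body)


def score(post: Post) -> float:
    # Hypothetical ranking: log-scaled votes, decayed by age in days.
    return math.log10(max(post.votes, 1)) / (1 + post.age_hours / 24)


def naive_search(posts: list[Post], query: str) -> list[Post]:
    hits = [p for p in posts if matches(p, query)]
    return sorted(hits, key=score, reverse=True)


posts = [
    Post("Table tennis tournament", "Sign-ups open now", votes=120, age_hours=5),
    Post("Tennis star loses temper", "Flips table at Dennis restaurant", votes=900, age_hours=30),
    Post("Tennis elbow advice", "Any tips?", votes=40, age_hours=2),
]
for p in naive_search(posts, "the table tennis"):
    print(p.title)
# Table tennis tournament
# Tennis star loses temper
```

A phrase or proximity check (are “table” and “tennis” adjacent?) would filter out the second post, but every such refinement is extra work on top of this baseline.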
That’s what I’ve been thinking. But more often than not I just don’t get any results, while Google tends to find the exact posts with the same query. And I guess, like others have said, they probably just rely on that. But yeah, I find it interesting that a feature this important for Reddit hasn’t been fixed.
Google is way more advanced. It doesn’t just do a keyword search.
It does things like automatically looking for similar words, and looking at what other users clicked on after doing similar searches.
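As a rough illustration of the “similar words” part, here is query expansion with a hand-written synonym table. The table and the matching rule (OR within a synonym group, AND across query words) are assumptions for this sketch; a large engine presumably derives such relations from data rather than a fixed list.

```python
# Hypothetical synonym table: each key maps to words treated as equivalent.
SYNONYMS = {
    "laptop": {"notebook"},
    "cheap": {"inexpensive", "budget"},
    "fix": {"repair"},
}


def expand(query: str) -> list[set[str]]:
    """Turn each query word into a group of acceptable alternatives."""
    return [{w} | SYNONYMS.get(w, set()) for w in query.lower().split()]


def matches_expanded(text: str, query: str) -> bool:
    words = set(text.lower().split())
    # Every group needs at least one hit (AND across groups, OR within a group).
    return all(group & words for group in expand(query))


print(matches_expanded("how to repair a budget notebook", "fix cheap laptop"))  # True
print(matches_expanded("how to repair a budget notebook", "fix cheap phone"))   # False
```

The same mechanism also explains the frustration mentioned in the next comment: every expansion widens what counts as a match, so a very specific query can pull in loosely related results.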
It can actually be frustrating when your search is very specific and it includes results that have nothing to do with what you typed.
True, though it would seem Google tackles a more complex problem, while Reddit and similar sites deal with their own specific data, in a predefined format they already know.
Looking at what other users clicked on after doing similar searches.
Yeah, I feel Reddit just doesn’t account for these things, maybe because adding them as ranking weights requires way more work, if I’m thinking about it correctly.
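Folding click behaviour in as a weight could look roughly like this: blend the text-match score with a smoothed click-through rate logged for each query/result pair. The log structure, the smoothing prior, and the blend weight are all hypothetical; the point is that this needs click logging, per-query aggregation, and tuning on top of the search itself, which is part of why it’s “more work.”

```python
from collections import Counter

# Hypothetical logs keyed by (query, result_id).
impressions: Counter = Counter()
clicks: Counter = Counter()


def log_impression(query: str, result_id: str) -> None:
    impressions[(query, result_id)] += 1


def log_click(query: str, result_id: str) -> None:
    clicks[(query, result_id)] += 1


def smoothed_ctr(query: str, result_id: str, prior: float = 0.1, strength: float = 10.0) -> float:
    """Click-through rate pulled toward `prior` when data is sparse (assumed smoothing)."""
    key = (query, result_id)
    return (clicks[key] + prior * strength) / (impressions[key] + strength)


def rank(query: str, text_scores: dict[str, float], click_weight: float = 0.5) -> list[str]:
    # final = (1 - w) * text relevance + w * smoothed CTR  (hypothetical blend)
    def final(rid: str) -> float:
        return (1 - click_weight) * text_scores[rid] + click_weight * smoothed_ctr(query, rid)

    return sorted(text_scores, key=final, reverse=True)


# Two results with equal text relevance; users mostly click post_a.
for _ in range(50):
    log_impression("politics", "post_a")
    log_impression("politics", "post_b")
for _ in range(30):
    log_click("politics", "post_a")
log_click("politics", "post_b")

print(rank("politics", {"post_a": 0.6, "post_b": 0.6}))  # ['post_a', 'post_b']
```

The smoothing keeps a result with one impression and one click from outranking everything; choosing `prior`, `strength`, and `click_weight` is exactly the kind of tuning that takes ongoing effort.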
Search is a lot better if you are not logged in. For example, if you searched for ‘politics’ while not logged in, the top result would be a link to r/politics. But if you are logged in, it’s mostly garbage posts from anything but a political subreddit.