Graph problems are quite common. However, it’s rare to have access to a database offering graph semantics. There are graph databases, such as Ne

Graph components with DuckDB

submited by
Style Pass
2023-06-04 17:00:04

Graph problems are quite common. However, it’s rare to have access to a database offering graph semantics. There are graph databases, such as Neo4j and GraphX, but it’s difficult to justify setting one of those up. One could simply use networkx in Python. But that only works if the graph fits in memory.

From a practical angle, the fact is that people are querying data warehouses in SQL. There are many good reasons to write graph algorithms in SQL. And anyway, one may argue that graphs are a special case of the relational model.

I have a list of companies. Each company can have several admins. An admin may administrate several companies. Companies share a link because they might have at least one admin in common, and vice versa.

My ex-colleague had a few tens of thousands of rows sitting in a Snowflake table. Each row linking an admin to a company. It took us an hour to obtain a working solution in SQL. But it involved a recursive query with an uninspired stopping condition based on the recursion depth. Moreover, the query took a (painful) few minutes to run.

Let me illustrate with an example, which will also serve as a unit test. Let’s say there are customers $\{1, …, 8\}$ that have visited restaurants $\{A, …, G\}$. Some customers may have visited several restaurants, some none at all.

Leave a Comment