antydizajn / lorentz-vs-euclid
production: 57,900 Lorentz points · dim 129 · HyperspaceDB

Why your taxonomy embeddings collapse in flat space.

Paste a hierarchy. The page embeds it twice: once on the Poincaré disk (the Lorentz model projected to 2D, signature −++ for the plot) and once in plain Euclidean R². Same tree, same recursion, different geometry. Watch the flat one eat itself.

1. Input — paste your hierarchy

Use indentation (2 spaces per level) or > arrows for paths. One concept per line. 5–50 nodes works best.
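A minimal sketch of how the two input forms could be parsed into parent links. The function name `parse_hierarchy` and the rule that concept names are unique are assumptions for illustration, not the page's actual parser.

```python
def parse_hierarchy(text, indent=2):
    """Return (name, parent_or_None) pairs in input order.

    Assumption: concept names are unique across the whole input.
    """
    edges = []   # (node, parent) pairs
    seen = set()
    stack = []   # stack[d] = most recent node name at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if ">" in raw:
            # Path form: "animal > mammal > dog"
            parent = None
            for name in (p.strip() for p in raw.split(">")):
                if not name:
                    continue
                if name not in seen:
                    seen.add(name)
                    edges.append((name, parent))
                parent = name
            continue
        # Indented form: depth = leading spaces / indent
        stripped = raw.lstrip(" ")
        depth = (len(raw) - len(stripped)) // indent
        name = stripped.strip()
        del stack[depth:]                  # drop deeper levels
        parent = stack[-1] if stack else None
        stack.append(name)
        if name not in seen:
            seen.add(name)
            edges.append((name, parent))
    return edges
```

Both forms end up as the same list of parent links, so the layout step does not need to know which one was pasted.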

Computed live in your browser. No backend. No tracking. Source: view-source.

[Plot, left] Lorentz / Poincaré disk: hyperbolic, exponential capacity

[Plot, right] Euclidean R²: flat, polynomial capacity

2. What you are looking at

Both plots use the same algorithm: place the root, split the available angle among its children, step outward by a fixed distance, recurse. The only thing that changes is the geometry of "outward".
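The recursion described above can be sketched as follows. This is an illustration, not the page's source: `layout`, `flat_step`, the even wedge split, and aiming each child at the middle of its wedge are all assumptions. Points are complex numbers, and the geometry lives entirely in the `step` argument.

```python
import cmath
import math

def flat_step(p, d, theta):
    """Euclidean 'outward': a straight move of length d in direction theta."""
    return p + d * cmath.exp(1j * theta)

def layout(children, root, step=flat_step, d=1.0):
    """Place root at the origin, split each wedge evenly, recurse."""
    pos = {root: 0j}

    def place(node, lo, hi):
        kids = children.get(node, [])
        if not kids:
            return
        width = (hi - lo) / len(kids)
        for i, kid in enumerate(kids):
            k_lo = lo + i * width
            k_hi = k_lo + width
            theta = (k_lo + k_hi) / 2      # aim at the middle of the wedge
            pos[kid] = step(pos[node], d, theta)
            place(kid, k_lo, k_hi)

    place(root, 0.0, 2 * math.pi)
    return pos
```

Swapping `step` for a hyperbolic version changes nothing else in the recursion, which is the point of the comparison.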

On the left, "outward" is a Möbius addition step (gyrovector arithmetic on the disk). Distance accumulates along a hyperbolic ruler, and the disk has room because area grows like cosh(r), that is, exponentially in r. Tree depth maps almost cleanly to radial position; siblings keep their angular wedges.
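A small sketch of the hyperbolic step, assuming curvature −1 and disk points stored as complex numbers. The names `mobius_add`, `disk_distance`, and `hyperbolic_step` are illustrative, not taken from the page's source.

```python
import cmath
import math

def mobius_add(p, q):
    """Möbius addition on the unit disk (complex-number form)."""
    return (p + q) / (1 + p.conjugate() * q)

def disk_distance(p, q):
    """Hyperbolic distance between two points of the Poincaré disk."""
    return 2 * math.atanh(abs(mobius_add(-p, q)))

def hyperbolic_step(p, d, theta):
    """Move hyperbolic distance d away from p, direction theta at the origin frame."""
    return mobius_add(p, math.tanh(d / 2) * cmath.exp(1j * theta))

# Five unit steps straight out from the origin: the point stays inside
# the disk, yet the hyperbolic ruler reads 5.
p = 0j
for _ in range(5):
    p = hyperbolic_step(p, 1.0, 0.0)
```

The Euclidean radius of `p` is tanh(5/2), still below 1. The disk never runs out, it just compresses toward the rim.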

On the right, "outward" is a straight line. Area grows like r². Three levels in, the wedges run out. Subtrees overlap. Cousins land on top of grandparents. Cosine retrieval starts returning nonsense and you blame the encoder.
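A back-of-the-envelope check of the capacity argument. It is idealized: it assumes a full tree with branching factor `b` whose depth-k nodes sit on a circle of radius k·d around the root, which the recursive layout only approximates. `arc_per_node` is a hypothetical helper.

```python
import math

def arc_per_node(b, k, d):
    """Arc length available per node at depth k, as (flat, hyperbolic).

    Circle of radius r: circumference 2*pi*r in the plane,
    2*pi*sinh(r) in the hyperbolic plane (curvature -1).
    """
    n = b ** k                     # number of nodes at depth k
    r = k * d
    flat = 2 * math.pi * r / n
    hyp = 2 * math.pi * math.sinh(r) / n
    return flat, hyp

for k in range(1, 6):
    flat, hyp = arc_per_node(b=3, k=k, d=1.5)
    print(k, round(flat, 3), round(hyp, 3))
```

With b = 3 the flat share per node shrinks at every level, while the hyperbolic share grows as long as the step d exceeds ln(b) ≈ 1.1. Below that threshold even the disk gets crowded, just more slowly.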

Click any node to highlight the ancestor path in both plots. Red edges in the Euclidean plot mark crossings: edges that intersect each other in the layout, a visual signature of structural collapse that you cannot avoid in flat space for branching trees.
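One way to count crossings among straight edges, which matches how edges are drawn in the flat plot (on the disk, edges are geodesic arcs and would need a different test). This is a sketch with hypothetical names; the page may detect crossings differently.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): -1, 0, or 1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def segments_cross(p1, p2, q1, q2):
    """True if the two segments cross at a single interior point."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0 and
            _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)

def count_crossings(edges):
    """edges: list of ((x, y), (x, y)) pairs.

    Edges that merely share an endpoint (parent and child edges)
    do not count, because one orientation is then exactly zero.
    """
    n = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            n += segments_cross(*edges[i], *edges[j])
    return n
```

The pairwise loop is quadratic in the number of edges, which is fine for the 5–50 node inputs this page targets.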

3. The math, short version

// Lorentz inner product, signature (−,+,+,+)
⟨x, y⟩L = −x₀·y₀ + x₁·y₁ + x₂·y₂ + … + xₙ·yₙ

// Hyperboloid model (one sheet, x₀ > 0)
ℍⁿ = { x ∈ ℝⁿ⁺¹ : ⟨x, x⟩L = −1, x₀ > 0 }

// Geodesic distance: one arccosh, no series, no approximation
d(x, y) = arccosh( −⟨x, y⟩L )

// Projection to Poincaré disk (used for the plot above)
π(x₀, x₁, x₂) = ( x₁ / (x₀ + 1), x₂ / (x₀ + 1) )

// Step on the disk: Möbius addition + tanh exponential map
exp_p(d, θ) = p ⊕ ( tanh(d/2) · e^{iθ} )
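The same formulas as a few lines of plain Python, for anyone who wants to poke at them. Function names are illustrative. `from_polar` is an added helper (not in the formulas above) that builds a point of ℍ² at a known distance from the apex (1, 0, 0), so the other functions have something to check against.

```python
import math

def lorentz_inner(x, y):
    """<x, y>_L = -x0*y0 + x1*y1 + ... + xn*yn."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y):
    """d(x, y) = arccosh(-<x, y>_L); clamped so rounding cannot leave the domain."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def to_poincare(x):
    """Project a hyperboloid point to the Poincaré disk (or ball)."""
    return tuple(c / (x[0] + 1) for c in x[1:])

def from_polar(r, theta):
    """Point of H^2 at hyperbolic distance r from the apex, direction theta."""
    return (math.cosh(r),
            math.sinh(r) * math.cos(theta),
            math.sinh(r) * math.sin(theta))

apex = (1.0, 0.0, 0.0)
x = from_polar(1.2, 0.3)
print(lorentz_inner(x, x))          # close to -1: x is on the hyperboloid
print(lorentz_distance(apex, x))    # close to 1.2
print(math.hypot(*to_poincare(x)))  # close to tanh(0.6): disk radius
```

A point at hyperbolic distance r from the apex lands at disk radius tanh(r/2), which ties the hyperboloid picture to the tanh in the Möbius step.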

That is the whole geometry. Production code adds curvature constants, batched ops, and a numerically stable variant of arccosh, but the substance fits on a napkin.
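One possible stable variant, shown as an assumption rather than as what any production system actually runs. For two points on the hyperboloid, ⟨x − y, x − y⟩L = −2 − 2⟨x, y⟩L, so cosh(d) − 1 can be read off the difference vector and the distance becomes 2·asinh(√q / 2). This avoids feeding arccosh a value that rounding has pushed to exactly 1 when the points are close.

```python
import math

def minkowski_sq(v):
    """<v, v>_L for a vector in R^(n+1)."""
    return -v[0] * v[0] + sum(c * c for c in v[1:])

def stable_lorentz_distance(x, y):
    """Distance via the difference vector.

    q = <x - y, x - y>_L = 2 * (cosh(d) - 1) = 4 * sinh(d / 2) ** 2,
    so d = 2 * asinh(sqrt(q) / 2). q is clamped at 0 against rounding.
    """
    diff = [a - b for a, b in zip(x, y)]
    q = max(0.0, minkowski_sq(diff))
    return 2.0 * math.asinh(math.sqrt(q) / 2.0)

apex = (1.0, 0.0, 0.0)
near = (math.cosh(1e-9), math.sinh(1e-9), 0.0)
print(stable_lorentz_distance(apex, near))
```

For the nearby pair above, the plain arccosh form tends to lose the separation because cosh(1e-9) rounds to 1 in double precision; the difference form keeps it. Points far from the origin bring their own cancellation problems that this sketch does not address.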

4. The honest version

You drop a taxonomy into a vector DB and retrieval starts behaving weird. Cousins look closer than parent–child pairs. Top-K returns siblings of siblings before it returns the actual ancestor. You add a reranker, you add a graph filter, you weight some metadata, and at some point nobody on the team remembers why there are four heuristics on top of cosine.

This is what flat space does to trees. Under Euclidean or cosine distance, the room around a point grows only polynomially with radius. Trees grow exponentially: every level multiplies the leaf count. So the leaves crowd, the hierarchy smears, and your numbers quietly degrade.

Hyperbolic space has exponential capacity built in. Pick a Lorentz model, push the tree out from the origin, and distances stay honest. The math is short. The implementation cost is also small, if you have done it before. Most of the work is the eval, not the model.

I run this in production. HyperspaceDB, currently 57,900 Lorentz points in one collection at dimension 129. It works for me. It might work for you. The demo above is a toy — your data will behave differently. That is what the eval is for.

When Lorentz is worth the switch

  • Your data has a tree, a DAG with strong "is-a" edges, or a strict containment hierarchy.
  • You retrieve across levels — ancestors, descendants, "what category is this?", not just nearest neighbours of leaves.
  • Your top-K precision drops as you go deeper into the hierarchy.
  • You already tried bigger models, more dims, and a reranker, and the gains plateaued.
  • You can afford 30–60 minutes to look at empirical numbers before committing.

When it is not worth it

  • Your data is genuinely flat. Product reviews, news headlines, chat messages — the shape is not a tree.
  • You only need lexical or sentence-level similarity at one level.
  • Your stack has zero appetite for a non-stdlib metric. Some teams will nope out of arccosh on principle.
  • You have not measured anything yet. Switch metric only after you can show the current one is the bottleneck.
  • You want a silver bullet. Lorentz is geometry, not a magic trick.

Got hierarchical / taxonomic data and your retrieval feels off?

250 PLN

I will do a 60-minute eval on your data: Lorentz vs your current embedding, with empirical numbers — recall@k across levels, distortion of tree distances, where it breaks. You get a short report, the eval script, and an honest recommendation. If hyperbolic does not help on your data, I tell you that.

Email: paulina@antydizajn.pl

Pay: Revolut revolut.me/danveld · BLIK to 793 093 721 · title Lorentz eval

Then: open a brief on GitHub and paste a description of your data shape. Do not paste secrets — issues are public.

Notes on the demo (read before you over-interpret)

This page embeds your input via Sarkar-style recursive placement: each node receives an angular wedge from its parent, and children are placed by Möbius addition (hyperbolic side) or vector addition (flat side). It is not training. There is no learning. No vectors come from a real LLM. The point of the demo is purely geometric: to show that the same recursive tree placement that fits cleanly on the disk crashes into itself in R².

For a real eval I use trained embeddings, project to a Lorentz manifold via either Riemannian SGD or a closed-form lift, and measure mean average precision at k against the original hierarchy.
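Two pieces of that pipeline, sketched under assumptions. The lift shown is one common closed-form choice (solve the hyperboloid constraint for x₀); the text above does not say which lift is used. The average-precision normalization, dividing by min(k, number of relevant items), is one convention among several. Both function names are hypothetical.

```python
import math

def lift_to_hyperboloid(v):
    """Map a Euclidean vector v to (sqrt(1 + |v|^2), v), which satisfies <x, x>_L = -1."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return (x0,) + tuple(v)

def average_precision_at_k(ranked, relevant, k):
    """AP@k for one query: mean of precision@i over the hits in the top k."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0

# One query whose true ancestors are "a" and "b"
print(average_precision_at_k(["a", "x", "b"], {"a", "b"}, k=3))
```

Mean average precision is the mean of this value over all queries. The lift only puts vectors on the manifold; it does not by itself make them respect the hierarchy, which is what the measurement is for.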

The "collisions" metric counts sibling pairs whose post-embedding distance to each other is smaller than their distance to their own parent. In a faithful embedding this number should be zero or near zero. In flat R² with deep trees it grows fast.
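A sketch of how such a count could be computed. The description leaves one detail open: which sibling's parent distance to compare against. This version assumes a pair collides when the siblings are closer to each other than either one is to the shared parent. `count_collisions` and its arguments are illustrative names.

```python
from itertools import combinations
import math

def count_collisions(pos, parent, dist):
    """Count sibling pairs that sit closer to each other than to their parent.

    pos:    node -> coordinates
    parent: node -> parent node (None for the root)
    dist:   distance function for the geometry being tested
    """
    by_parent = {}
    for node, par in parent.items():
        if par is not None:
            by_parent.setdefault(par, []).append(node)
    n = 0
    for par, kids in by_parent.items():
        for a, b in combinations(kids, 2):
            to_parent = min(dist(pos[a], pos[par]), dist(pos[b], pos[par]))
            if dist(pos[a], pos[b]) < to_parent:
                n += 1
    return n

pos = {"p": (0.0, 0.0), "a": (1.0, 0.0), "b": (1.0, 0.1), "c": (-1.0, 0.0)}
parent = {"p": None, "a": "p", "b": "p", "c": "p"}
print(count_collisions(pos, parent, math.dist))
```

Passing the distance function in keeps the count comparable across geometries: `math.dist` for the flat side, a hyperbolic distance for the disk.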