SQL on Stefan Litsche

Sorting in Lakes

Fri, 02 Jul 2021 10:26:00 +0100

When sorting rows in SQL, we need to distinguish between duplicate values in otherwise distinct rows and duplicate values caused by duplicate rows.

The nice thing about the row_number SQL function is that it provides a unique number (per partition), which is often used as a means to de-duplicate a set. The rank function does not have this property, because duplicate values share the same rank.

SELECT a,
    dense_rank() OVER (ORDER BY a),
    rank() OVER (ORDER BY a),
    row_number() OVER (ORDER BY a)
FROM (VALUES (1),(1),(2)) AS t(a);

 a | dense_rank | rank | row_number
---+------------+------+------------
 1 |          1 |    1 |          1
 1 |          1 |    1 |          2
 2 |          2 |    3 |          3

The problem occurs if you use it the wrong way. Recently I spent some time debugging weird behavior in a report. The numbers changed between executions on an immutable data set, which is not what you expect.
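One way to end up with this behavior is to use row_number over a sort key that is not unique. Tied rows may be numbered in any order, and the engine is free to pick a different order on the next run. The sketch below makes the numbering deterministic by adding a unique column as a tiebreaker. It runs against SQLite through Python's sqlite3 module, and window functions need SQLite 3.25 or later. The `orders` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: alice has two orders with the same amount.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, customer, amount) VALUES (?, ?, ?)",
    [(1, "alice", 10), (2, "alice", 10), (3, "bob", 5), (4, "bob", 7)],
)

# With ORDER BY amount DESC alone, alice's two rows tie, so which one gets
# row_number 1 is up to the engine. The unique id breaks the tie, so the
# same row is kept on every execution.
query = """
SELECT id, customer, amount
FROM (
    SELECT id, customer, amount,
           row_number() OVER (PARTITION BY customer ORDER BY amount DESC, id) AS rn
    FROM orders
) AS ranked
WHERE rn = 1
ORDER BY customer
"""
latest = conn.execute(query).fetchall()
print(latest)  # [(1, 'alice', 10), (4, 'bob', 7)]
```

If the tied rows are true duplicates (identical in every column), any choice is equivalent. If they only share the sort value, the tiebreaker decides which row survives.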

Analyze Extreme Distributions in PostgreSQL

Thu, 30 Jul 2015 10:30:00 +0100

Recently my team and I observed a sporadic increase in the execution time of stored procedures in our PostgreSQL databases (see the graph above). Often, running ANALYZE on the referenced table solved the issue. In our case, fluctuations in the execution plan caused statement timeouts, which led to errors in our applications.

We wanted to understand this behavior better. Which circumstances prompted more frequent plan fluctuations? How exactly could we influence the system to be more reliable? To find answers, we tested how different configurations of PostgreSQL influenced the results of the query planner. This post shares the results of our tests.
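Why can plans fluctuate on an extreme distribution? The planner's statistics come from a random sample taken by ANALYZE. PostgreSQL's standard analyze routine samples 300 rows per unit of statistics target, so 30,000 rows at the default target of 100.

The following is a simplified Python simulation, not PostgreSQL's actual estimator, which uses most-common-value lists and histograms. It estimates the share of a rare value from samples of two sizes. The column with 100,000 rows, 0.1 % of them equal to 1, is hypothetical.

```python
import random


def estimate_fraction(values, target, sample_rows, rng):
    """Estimate the share of rows equal to `target` from a uniform random sample."""
    sample = rng.sample(values, min(sample_rows, len(values)))
    return sum(1 for v in sample if v == target) / len(sample)


# Hypothetical extreme distribution: only 100 of 100,000 rows (0.1 %) have value 1.
values = [0] * 99_900 + [1] * 100
rng = random.Random(2015)

# 300 rows corresponds to statistics target 1, 30,000 rows to the default of 100.
for sample_rows in (300, 30_000):
    estimates = [estimate_fraction(values, 1, sample_rows, rng) for _ in range(5)]
    print(sample_rows, [round(e, 5) for e in estimates])
```

With 300 rows, the expected number of hits is only 0.3. The estimates therefore typically jump between 0 and several times the true 0.1 %. With 30,000 rows they stay much closer to it. Estimates that change from one ANALYZE run to the next are one way the chosen plan can flip.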

NULL Values in the ORDER BY Clause

Thu, 21 Jul 2011 16:21:00 +0000

You want to sort a query by a column that contains NULL values, but you need reliable behavior as to whether the NULL values end up at the beginning or at the end. How can this be done in a standard-compliant, backend-independent way?

The easiest solution would be a guaranteed behavior. Unfortunately, there is none. The standard only requires that all NULL markers are treated the same way. The ISO standard SQL/92 says the following about this (quoted from C. Date, H. Darwen (1998) SQL - Der Standard, p. 278):
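A common portable workaround is to sort first by a flag that marks NULLs and then by the column itself. The sketch below runs it against SQLite via Python's sqlite3 module. SQLite itself puts NULLs first in ascending order, which the flag overrides. The table `t` and column `a` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t (a) VALUES (?)", [(2,), (None,), (1,), (None,), (3,)])

# NULLs last: non-NULL rows get flag 0, NULL rows flag 1, then sort by a.
nulls_last = [row[0] for row in conn.execute(
    "SELECT a FROM t ORDER BY CASE WHEN a IS NULL THEN 1 ELSE 0 END, a"
)]
# NULLs first: invert the flag.
nulls_first = [row[0] for row in conn.execute(
    "SELECT a FROM t ORDER BY CASE WHEN a IS NULL THEN 0 ELSE 1 END, a"
)]
print(nulls_last)   # [1, 2, 3, None, None]
print(nulls_first)  # [None, None, 1, 2, 3]
```

Because the flag is the primary sort key, the position of the NULLs no longer depends on where the backend places NULL relative to non-NULL values.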

Performance Strategies

Thu, 02 Jun 2011 12:20:00 +0000

I am currently working on a project that involves converting about 20 million records. The prescribed technology is PL/SQL. The well-known basic optimization measures, such as optimal access paths, indexing and reducing result sets, are already in use. What else can be done beyond that?

Avoiding I/O
Parallelization

Memory access is, as is well known, faster than I/O operations. Lookup lists kept in main memory, for example as arrays or hash maps, can therefore save considerable time.
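The pattern can be sketched outside PL/SQL. Load a lookup table into memory once, then resolve each converted row from that structure instead of issuing one query per row. The sketch below uses Python with an in-memory SQLite database, so it shows the pattern, not the speedup. The tables `country` and `legacy_customer` and their columns are hypothetical. In PL/SQL, the analogous structure would be an associative array (INDEX BY), filled once, for example with BULK COLLECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical lookup table and source table for the conversion.
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("DE", "Germany"), ("GB", "United Kingdom")])
conn.execute("CREATE TABLE legacy_customer (id INTEGER, country_code TEXT)")
conn.executemany("INSERT INTO legacy_customer VALUES (?, ?)",
                 [(1, "DE"), (2, "GB"), (3, "DE"), (4, "XX")])

# One query fills the in-memory hash map (code -> name).
country_by_code = dict(conn.execute("SELECT code, name FROM country"))

# Each source row is resolved from memory; unknown codes fall back to a default.
converted = [
    (cust_id, country_by_code.get(code, "unknown"))
    for cust_id, code in conn.execute(
        "SELECT id, country_code FROM legacy_customer ORDER BY id")
]
print(converted)
# [(1, 'Germany'), (2, 'United Kingdom'), (3, 'Germany'), (4, 'unknown')]
```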