37 private links
Many PostgreSQL users have a basic understanding of how Postgres nbtree indexes work internally (e.g., how the structure is maintained, high level details of how a new level is added to the tree, the role of VACUUM in garbage collection). A smaller number have some understanding of advanced topics (e.g., details of Lehman & Yao's B-Link technique, details of crash recovery). Even an experienced backend hacker could be forgiven for concluding that this is all well explored territory, leaving little room for improvement, since all the important components are already in place.
This view of things is based on a correct premise, and yet cannot explain why nbtree doesn't perform well with certain specific real world workloads. For example, there is an excessive amount of nbtree index bloat created by the industry standard TPC-C benchmark, despite the fact that TPC's transactions rarely update indexed columns, and therefore handily avoid so-called "write amplification". Certain pieces are missing.
Code enhancements (authored by the speaker) that will appear in PostgreSQL 12 will significantly improve matters for affected workloads. In some ways, this work is based on a return to decades old fundamentals. In other ways, it is based on practical experience, involving analyzing real-world index structures in an effort to learn where problems may lie.
This talk will cover:
• A review of the design of nbtree, especially its high level goals.
• The importance of thinking in terms of invariants — the rules underlying what belongs where in the index.
• Interesting ways in which nbtree exceeds what is truly required by the invariants, and how that can be exploited to improve performance.
• Possible future work aimed at reducing CPU cache misses while descending a B-Tree.
• The big picture — how all these techniques are complementary, and worth more than the sum of their parts.
create type infomask_bit_desc as (mask varbit, symbol text);
create or replace function infomask(msk int, which int) returns text
language plpgsql as $$
declare
r infomask_bit_desc;
str text = '';
append_bar bool = false;
begin
for r in select * from infomask_bits(which) loop
if (msk::bit(16) & r.mask)::int <> 0 then
if append_bar then
str = str || '|';
end if;
append_bar = true;
str = str || r.symbol;
end if;
end loop;
return str;
end;
$$ ;
create or replace function infomask_bits(which int)
returns setof infomask_bit_desc
language plpgsql as $$
begin
if which = 1 then
return query values
(x'8000'::varbit, 'MOVED_IN'),
(x'4000', 'MOVED_OFF'),
(x'2000', 'UPDATED'),
(x'1000', 'XMAX_IS_MULTI'),
(x'0800', 'XMAX_INVALID'),
(x'0400', 'XMAX_COMMITTED'),
(x'0200', 'XMIN_INVALID'),
(x'0100', 'XMIN_COMMITTED'),
(x'0080', 'XMAX_LOCK_ONLY'),
(x'0040', 'EXCL_LOCK'),
(x'0020', 'COMBOCID'),
(x'0010', 'XMAX_KEYSHR_LOCK'),
(x'0008', 'HASOID'),
(x'0004', 'HASEXTERNAL'),
(x'0002', 'HASVARWIDTH'),
(x'0001', 'HASNULL');
elsif which = 2 then
return query values
(x'2000'::varbit, 'UPDATE_KEY_REVOKED'),
(x'4000', 'HOT_UPDATED'),
(x'8000', 'HEAP_ONLY_TUPLE');
end if;
end;
$$;
One of the most important tasks of the query planner is to choose which relations to join, how to join them and in which order. How this works is however not widely known, even though the algorithm used is soon to be celebrating its 40th birthday.
This presentation will cover what join order selection is, how the algorithm works and how it's implemented in PostgreSQL, as well as look at what lies ahead in the future for join ordering in PostgreSQL. Hopefully it will demystify the query planner a little.