Commit 3cc0297e authored by Per Cederqvist

Document local-to-global. (Bug 144).

* doc/lyskomd.texi (local-to-global): Translated old Swedish text
that describes the reasoning behind the local-to-global structure,
and update it to match the current implementation.
parent 470ed77b
2006-08-01 Per Cederqvist <>
Document local-to-global. (Bug 144).
* doc/lyskomd.texi (local-to-global): Translated old Swedish text
that describes the reasoning behind the local-to-global structure,
and update it to match the current implementation.
2006-07-31 Per Cederqvist <>
Document the database process. (Bug 144, partially)
......@@ -2900,7 +2900,7 @@ executed, and which haven't been run. Try to get 100% coverage.
Run the configure script with @samp{--with-debug-calls} to compile in
support for debugging calls in the server. These calls are strictly for
making testing easier (or possible.) They are not official, and they may
change at any time.
change at any time. Not all debug calls are documented.
* memory-info:: Get information from malloc (1000)
......@@ -2964,110 +2964,113 @@ collector.
@section The local-to-global structure
The data structure that stores the mapping from local to global text
numbers is currently one of the more advanced structures used by
This section is not translated to English yet. See a comment in the
@file{lyskomd.texi} for the raw Swedish text.
@c FIXME: Translate this
numbers is as of this writing one of the more advanced structures used
by lyskomd.
@subsection Background
The way that text numbers are added has a number of properties:
- Texts are always appended at the end, never in the middle or at
- The local text numbers are consecutive, i.e. there are no
holes. Such holes can, however, appear (and do appear!) when texts are
The first thing one sees when analyzing the contents of a map is that
there are long stretches of nothing but zeroes, and long stretches with
no zeroes at all, or at least very few. This suggests that one should
have an adaptive data structure that adapts to the local conditions.
We therefore propose the following.
The map is stored in blocks (small arrays). There are two kinds of blocks:
1. Sparse blocks. A sparse block really consists of two blocks. One
block holds keys (Local_text_no) and the other block holds
data (Text_no). Within a block, binary search in the key block
is used to find the Local_text_no one is looking for.
2. Dense blocks. A dense block consists of a single block that contains
data (Text_no). It is known which local text number the first entry
corresponds to. There may be an occasional local number in a dense
block that does not exist -- in that case the data is 0.
The block size is fixed at e.g. 100 entries. (It seems that one
saves almost exactly the same number of bytes regardless of whether
one chooses block size 50 or 1000). A full dense block always contains
exactly 100 local text numbers. A full sparse block always contains
100 existing global text numbers. (A sparse block takes twice as
much space as a dense block, since a sparse block really
consists of both a key block and a value block).
To keep track of the blocks there is an array of block_info:
The way that texts are added to the local-to-global structure has a few
typedef struct block_info {
int first_free;
int zeroes;
@itemize @bullet
@item The local text numbers are always added at the tail. They are
never added in the middle or at the beginning.
@item The local text numbers are initially consecutive (there are no
holes). However, holes are often introduced after a while when texts
are removed (typically by the garb, but also manually). Since some
texts are protected against removal (for instance by being marked)
holes are common among the older texts.
@end itemize
Local_text_no start;
Local_text_no * key_block;
Text_no * value_block;
} L2g_block_info;
Before creating the current data structure, we carefully analyzed the
contents of the LysKOM database. We saw that the local text numbers
could be partitioned into ranges that had very different
characteristics. In some ranges, almost all local text numbers were
nonexistent; only a few scattered local text numbers existed. In
other ranges, the opposite was true: almost all local text numbers
existed, but a few scattered local text numbers no longer existed.
If key_block == NULL, it is a dense block.
We concluded that we needed an adaptive data structure that could deal
with these two kinds of ranges.
The first_free field shows where in the block more values can be
filled in. It is 100 for full blocks. For blocks that are not
full it points out the entry in value_block where the next value
should go. This applies e.g. to the last block, which is filled as
new articles are sent to the conference, or to blocks where texts
have been removed.
@subsection The chosen solution
The zeroes field is only used for dense blocks, and counts the number
of zeroes in the block. If zeroes becomes greater than 50% of the
block size, the block is converted into a sparse block. The zeroes
field is an optimization that probably makes merging of blocks
easier. It is possible that it is not needed.
After running some simulations on different data structures, we
decided to use the following structure. The mapping is stored in
several blocks. Each block maps a range of local text numbers to
global text numbers. The global text number 0 is used for
non-existing texts.
The start field contains the number of the first local text number in
There are two kinds of blocks:
The key_block field is a pointer to the block of Local_text_no, i.e.
the keys in the block. This field is NULL if this is a dense
@itemize @bullet
@item @dfn{Sparse blocks}. Sparse blocks are represented by two arrays.
The first array contains keys (@code{Local_text_no}) and the second
values (@code{Text_no}). A binary search in the first array is used
to find the index of the wanted key; the corresponding value is stored
at the same index in the second array.
@item @dfn{Dense blocks}. Dense blocks contain a single array of
values (@code{Text_no}). The key (@code{Local_text_no}) that
corresponds to the first entry is also stored in the block. The array
contains information about @code{block_size} local text numbers
starting at that local text number. If any of those local text
numbers no longer exists, a 0 is stored in the array.
@end itemize
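The two block kinds described above can be illustrated with a minimal lookup sketch. It is not taken from @file{local-to-global.c}: the struct @code{block_sketch}, the function @code{block_lookup}, and the integer widths of the two typedefs are assumptions made for illustration only.

```c
#include <stddef.h>

typedef unsigned long Local_text_no;  /* assumed width, for illustration */
typedef unsigned long Text_no;        /* assumed width, for illustration */

/* Hypothetical, simplified stand-in for a single block. */
struct block_sketch {
    int first_free;            /* number of entries in use */
    Local_text_no start;       /* first local text number covered */
    Local_text_no *key_block;  /* NULL for a dense block */
    Text_no *value_block;      /* global text numbers, 0 = nonexistent */
};

/* Return the global text number for lno, or 0 if it does not exist
   in this block. */
static Text_no
block_lookup(const struct block_sketch *b, Local_text_no lno)
{
    if (b->key_block == NULL) {
        /* Dense block: the index is simply the offset from start. */
        if (lno < b->start
            || lno - b->start >= (Local_text_no)b->first_free)
            return 0;
        return b->value_block[lno - b->start];
    }

    /* Sparse block: binary search among the sorted keys; the value
       lives at the same index in value_block. */
    int lo = 0;
    int hi = b->first_free;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (b->key_block[mid] < lno)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < b->first_free && b->key_block[lo] == lno)
        return b->value_block[lo];
    return 0;
}
```

A dense block answers a lookup with one array access, while a sparse block pays for a binary search but stores nothing for the missing local text numbers.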
The value_block field is a pointer to the block of Text_no, i.e.
the values in the block.
The block size is currently fixed at 250 entries (see @code{l2g_init}
in @file{src/server/local-to-global.c}). The current implementation
uses linear searches in a couple of places; this means that those
operations are optimal if the block size is chosen as the square root
of the number of existing local-to-global texts (or slightly higher,
to account for stray zeroes in dense blocks). The block size will
have less impact once a binary search is implemented (see bug 154).
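The square-root claim can be motivated with a rough cost model. The model is an assumption, not something stated in the source: suppose an operation first scans the list of blocks linearly and then scans linearly inside one block. With $n$ existing texts and block size $b$ there are about $n/b$ blocks, so:

```latex
\begin{align*}
  C(b) &\approx \frac{n}{b} + b
       && \text{(blocks scanned plus entries scanned)} \\
  \frac{dC}{db} &= -\frac{n}{b^{2}} + 1 = 0
       \quad\Longrightarrow\quad b = \sqrt{n} \\
  b = 250 &\quad\Longrightarrow\quad n = b^{2} = 62\,500
\end{align*}
```

Under this model a block size of 250 would be the best choice for a conference with roughly 62500 existing texts.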
The @code{struct l2g_block_info} stores a single block. This is what
it looks like in revision 5532:
In addition, one struct per conference is needed that keeps track of the array of
struct l2g_block_info @{
/* An index into key_block and value_block that indicates the
first free spot. This can also be thought of as the number of
entries in key_block and value_block that are actually in use. */
int first_free;
/* Number of entries in the block that contain the value 0. For
purposes of calculating this value, data past the end of the
block is counted as zeroes, so that this->first_free +
this->zeroes is always at least L2G_BLOCKSIZE. */
int zeroes;
/* First local text no in this block. Note: this does not
increase if the first entry of the block is removed. */
Local_text_no start;
/* NULL if this is a dense block, or a block of L2G_BLOCKSIZE
Local_text_nos if this is a sparse block. */
Local_text_no *key_block;
/* A block of L2G_BLOCKSIZE Text_nos. */
Text_no *value_block;
@end example
typedef struct local_to_global {
int num_blocks;
int block_size;
L2g_block_info * blocks;
} Local_to_global;
There is also a structure that keeps the array of all blocks:
typedef struct @{
int num_blocks;
Local_text_no first_unused;
struct l2g_block_info * blocks;
@} Local_to_global;
@end example
@end ignore
All users of @code{Local_to_global} should use the accessor functions
in @file{local-to-global.h}.
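To give a feel for what such an accessor might do internally, here is a minimal sketch of a linear scan that picks the block responsible for a given local text number. It is only a plausible example of the kind of linear search mentioned above: the struct @code{block_start_sketch} and the function @code{find_block} are hypothetical and do not appear in @file{local-to-global.h}. The sketch also assumes the blocks are ordered by their @code{start} field.

```c
#include <stddef.h>

typedef unsigned long Local_text_no;  /* assumed width, for illustration */

/* Hypothetical: only the field needed to pick a block. */
struct block_start_sketch {
    Local_text_no start;  /* first local text number in the block */
};

/* Return the index of the last block whose start is <= lno, or -1
   if lno lies before the first block.  Assumes blocks[] is sorted
   by ascending start. */
static int
find_block(const struct block_start_sketch *blocks, int num_blocks,
           Local_text_no lno)
{
    int found = -1;
    for (int i = 0; i < num_blocks; i++) {
        if (blocks[i].start > lno)
            break;      /* every later block starts even further on */
        found = i;
    }
    return found;
}
```

Because the array is assumed to be sorted, the loop could be replaced by a binary search over the @code{start} fields without changing the result.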
@node Coding conventions
@section Coding conventions