Commit e1d14ac1 authored by Niels Möller

*** empty log message ***

Rev: README:1.1
Rev: doc/HACKING:1.4
parent 90a2bcad
LSH - a free implementation of the Secure Shell protocols.
LSH IS WORK IN PROGRESS. DON'T EXPECT THE CURRENT VERSION TO WORK, AND
*DON'T* EXPECT IT TO PROVIDE ANY SECURITY WHATSOEVER.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation. See the file COPYING for details.
If you have downloaded a snapshot, you should be able to compile it
with
./configure
make
If you have checked out lsh from cvs, things are more complicated. You
need at least autoconf, automake, GNU-make, gcc, make, gperf and scsh
to get going. The Makefile.am file is created by running ./make_am. If
the compiler complains that it can't find a file foo.h.x,
try creating it with make foo.h.x or ./make_class <foo.h >foo.h.x, and
similarly for missing foo.c.x files.
For an introduction to the workings of LSH, see the file HACKING.
Several people have contributed to LSH; see the AUTHORS file for
details.
If you are interested in lsh, you may want to subscribe to the
psst-list. Subscription address is psst-request@net.lut.ac.uk.
Current snapshots of lsh can be found at
<URL: http://www.lysator.liu.se/~nisse/archive/>.
A Hacker's Guide to LSH A Hacker's Guide to LSH
This document contains some random notes, which I hope will make it This document contains some notes, which I hope will make it easier
easier for you to understand and hack lsh. It is divided into three for you to understand and hack lsh. It is divided into four main
main sections: Abstraction, memory allocation, and a source roadmap. sections: Abstraction, Object system, Memory allocation, and a Source
roadmap.
ABSTRACTION ABSTRACTION
...@@ -14,7 +15,7 @@ unsigned octets. The NUL character does *not* have any special status. ...@@ -14,7 +15,7 @@ unsigned octets. The NUL character does *not* have any special status.
Most of the functions in lsh are organized in terms of objects. An Most of the functions in lsh are organized in terms of objects. An
object type has a public interface: a struct containing attributes object type has a public interface: a struct containing attributes
that all instances of all implementations of the type must have, and that all instances of all implementations of the type must have, and
one or more method pointers. A method implementation is a C functions one or more method pointers. A method implementation is a C function
which takes an instance of a corresponding instance as its first which takes an instance of a corresponding instance as its first
argument (or in some cases, a pointer to a pointer to an instance). argument (or in some cases, a pointer to a pointer to an instance).
For many types, there is only one public attribute, which is a method For many types, there is only one public attribute, which is a method
...@@ -27,52 +28,69 @@ Extra data can be considered private, in OO-speak. ...@@ -27,52 +28,69 @@ Extra data can be considered private, in OO-speak.
Explicit casts are avoided as much as possible; instances that are Explicit casts are avoided as much as possible; instances that are
passed around are typed as pointers to the corresponding interface passed around are typed as pointers to the corresponding interface
struct, not as void *. Macros are used to make application of methods struct, not as void *. Macros are used to make application of methods
and closures more convenient. and closures more convenient. To cast a pointer to the right subclass
or superclass, the macro CAST is used, which provides optional runtime
type-checking. Catching pointer errors as early as possible makes
debugging easier.
An example might make this clearer. The definition of a write handler, An example might make this clearer. The definition of a write handler,
taken from abstract_io.h: taken from abstract_io.h:
/* May store a new handler into *w. */ /* CLASS:
struct abstract_write (class
{ (name abstract_write)
int (*write)(struct abstract_write **w, (vars
struct lsh_string *packet); (write method int "struct lsh_string *packet")))
}; */
#define A_WRITE(f, packet) ((f)->write(&(f), (packet))) #define A_WRITE(f, packet) ((f)->write((f), (packet)))
/* A processor that passes its result on to another processor */ /* A handler that passes packets on to another handler */
struct abstract_write_pipe /* CLASS:
{ (class
struct abstract_write super; (name abstract_write_pipe)
struct abstract_write *next; (super abstract_write)
}; (vars
(next object abstract_write)))
This is the interface structure common to all write handlers, and a */
generic subtype used for piping write handlers which are piped
together. One specific kind of write handler is the unpad handler, As you see, the definitions are not written directly in C, but in a
which removes padding from recieved packets, and sends them on. This specialized "class language" (described in the next section). Both
code is found in unpad.c, and it does not have any private data structure definitions and the functions needed for gc are generated
beyond the abstract_write_pipe structure above. The write method automatically and can be found in the corresponding .x file,
implementation of this type looks as follows: abstract_io.h.x.
static int do_unpad(struct abstract_write **w, abstract_write is the interface structure common to all write
handlers, and abstract_write_pipe is a generic subclass used for
piping write handlers together. One specific kind of write handler is
the unpad handler, which removes padding from received packets, and
sends them on. This object is implemented in unpad.c, and it does not
have any private data beyond the abstract_write_pipe structure above.
The write method implementation of this type looks as follows:
static int do_unpad(struct abstract_write *w,
struct lsh_string *packet) struct lsh_string *packet)
{ {
struct abstract_write_pipe *closure = (struct abstract_write_pipe *) *w; CAST(abstract_write_pipe, closure, w);
UINT8 padding_length; UINT8 padding_length;
UINT32 payload_length; UINT32 payload_length;
struct lsh_string *new; struct lsh_string *new;
if (packet->length < 1) if (packet->length < 1)
return 0; {
lsh_string_free(packet);
return LSH_FAIL | LSH_DIE;
}
padding_length = packet->data[0]; padding_length = packet->data[0];
if ( (padding_length < 4) if ( (padding_length < 4)
|| (padding_length >= packet->length) ) || (padding_length >= packet->length) )
return 0; {
lsh_string_free(packet);
return LSH_FAIL | LSH_DIE;
}
payload_length = packet->length - 1 - padding_length; payload_length = packet->length - 1 - padding_length;
...@@ -90,27 +108,169 @@ Note the last line; the function passes a newly created packet on to ...@@ -90,27 +108,169 @@ Note the last line; the function passes a newly created packet on to
the next handler in the pipe. the next handler in the pipe.
There's no central place where all important state is stored; I have There's no central place where all important state is stored; I have
tried to delegate details to the releant places. However, some things tried to delegate details to the relevant places. However, some things
that didn't fit anywhere else, and information that is needed by many that didn't fit anywhere else, and information that is needed by many
modules, is kept in the ssh_connection structure (in connection.[hc]). modules, is kept in the ssh_connection structure (in connection.[hc]).
Connection objects are also the point where packet are dispatched to Connection objects are also the point where packets are dispatched to
various handlers (key exchange, channels, debug, etc). Handlers are similar various packet handlers (key exchange, channels, debug, etc). packet
to the abstract write handlers described above, but they get one extra handlers are similar to the abstract write handlers described above,
argument: a pointer to the connection object. but they get one extra argument: a pointer to the connection object.
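The pattern described in this section (an interface struct holding a method pointer, a subclass that embeds the interface struct as its first member, and an invocation macro) can be sketched in plain C as follows. This is only an illustration: all demo_ names are hypothetical stand-ins rather than lsh code, the subclass cast is a bare C cast instead of the CAST macro, and ownership of the packet is ignored to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lsh_string. */
struct demo_string
{
  size_t length;
  const unsigned char *data;
};

/* Interface struct: a single public attribute, the write method. */
struct demo_write
{
  int (*write)(struct demo_write *self, struct demo_string *packet);
};

#define DEMO_WRITE(f, packet) ((f)->write((f), (packet)))

/* Subclass: the interface struct comes first, so a pointer to the
 * subclass can be passed around as a pointer to the interface. */
struct demo_write_pipe
{
  struct demo_write super;
  struct demo_write *next;
};

/* Another subclass, with private data: counts the octets it sees. */
struct demo_counter
{
  struct demo_write super;
  size_t octets;
};

static int do_count(struct demo_write *w, struct demo_string *packet)
{
  struct demo_counter *self = (struct demo_counter *) w;
  self->octets += packet->length;
  return 1;
}

/* Pipe handler: drops the first octet, passes the rest to the next handler. */
static int do_strip_first(struct demo_write *w, struct demo_string *packet)
{
  struct demo_write_pipe *self = (struct demo_write_pipe *) w;
  struct demo_string rest;

  if (packet->length < 1)
    return 0;

  rest.length = packet->length - 1;
  rest.data = packet->data + 1;
  return DEMO_WRITE(self->next, &rest);
}

/* Builds a two-stage pipe and pushes one packet through it. */
size_t demo_run(void)
{
  static const unsigned char raw[] = { 4, 'a', 'b', 'c' };
  struct demo_string packet;
  struct demo_counter counter;
  struct demo_write_pipe strip;
  int ok;

  packet.length = sizeof raw;
  packet.data = raw;
  counter.super.write = do_count;
  counter.octets = 0;
  strip.super.write = do_strip_first;
  strip.next = &counter.super;

  ok = DEMO_WRITE(&strip.super, &packet);
  assert(ok);
  return counter.octets;
}
```

Calling demo_run() sends a four-octet packet through the strip handler into the counter, which then reports the three remaining octets.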
OBJECT SYSTEM
The language used for defining classes is not a full-featured
OO-language. In fact, its primary purpose is not to help with object
orientation (I could do that fairly well with plain C before
introducing the class language), but to provide the information needed
for garbage collection. The syntax is scheme s-expressions, and the
files are preprocessed by make_class, a scheme shell script a few
hundred lines long.
Classes are written inside C comments. For each definition the source
file contains a line
/* CLASS:
followed by an s-expression. There are a few different kinds of
definitions. The most important is the class-expression, which looks
like
(class
(name NAME-OF-THE-CLASS)
(super THE-SUPERCLASS) ; Optional
(vars
INSTANCE-VARIABLES)
(methods
NONVIRTUAL-METHODS))
The last clause, (methods ...) is related to the meta-expression and
is currently used only by alist objects. It should be considered even
more experimental than the rest of the class language.
The most interesting parts are the vars-clauses. Each instance variable
or method is described as a list, (name type ...). (If you are
familiar with lisp expressions, types are represented as lists, and
the syntax is actually (name . type) ). A type is a list of a keyword
and optional arguments. For convenience, a type that is not a list is
expanded to a simple type, i.e. int is equivalent to (simple int).
Some examples of variable definitions and corresponding C declarations
are:
(foo simple int)
(foo . int)
int foo;
The type keyword "simple" means that the variable will be ignored by
the garbage collector.
(foo pointer (simple int))
(foo pointer int)
(foo simple "int *")
int *foo;
Pointer is a modifier that is usually overkill for simple types. The
syntax is (pointer type) or (pointer type length-field) where the
latter construction implies that the pointer points to an array, the
length of which is kept in an instance variable LENGTH-FIELD.
(foo string)
struct lsh_string *foo;
(foo bignum)
mpz_t foo;
The string (or bignum) will be deallocated automatically when the
object is garbage collected.
(foo object abstract_write)
struct abstract_write *foo;
The keyword object means a pointer to an object, and the gc will make
sure that it is deallocated when *all* pointers to it are gone.
(foo method void "int arg")
(foo pointer (function void "struct THIS_TYPE *self" "int arg"))
void (*foo)(struct THIS_TYPE *self, int arg);
The method keyword defines a method, implemented as an instance
variable holding a function pointer. Keeping the pointer in an
instance variable rather than in the class is flexible; it's easy to
override it in a subclass or even in a single object. So these methods
are as virtual as one can get.
It is possible to use the meta-feature to place method pointers in the
class struct rather than in every instance, and that would be preferable
for some classes. But currently, the alist classes are the only ones
that do not keep method pointers in each object.
Usually, there's an invocation macro for each method. For the above
method, one would use a macro such as
#define FOO(o, i) (((o)->foo)((o), (i)))
These macros are not generated automatically.
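To illustrate how a method kept in an instance variable can be overridden for a single object, here is a small sketch. The struct demo_object and the two implementations are made up for this example; only the shape of the FOO macro is taken from the text above.

```c
#include <assert.h>

/* Hypothetical object with one data variable and one method variable. */
struct demo_object
{
  int value;
  int (*foo)(struct demo_object *self, int arg);
};

/* Hand-written invocation macro, same shape as FOO above. */
#define FOO(o, i) (((o)->foo)((o), (i)))

static int foo_add(struct demo_object *self, int arg)
{
  return self->value + arg;
}

static int foo_mul(struct demo_object *self, int arg)
{
  return self->value * arg;
}

/* Creates two objects of the same "class" and overrides the method
 * in only one of them. */
int demo_override(void)
{
  struct demo_object a = { 10, foo_add };
  struct demo_object b = { 10, foo_add };

  b.foo = foo_mul;  /* override in this one instance only */

  assert(FOO(&a, 3) == 13);  /* a still uses the original method */
  return FOO(&b, 3);         /* b uses the override */
}
```

Note that a macro of this shape evaluates its first argument twice, so the object expression should not have side effects.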
(foo struct dss_public)
struct dss_public foo;
The struct keyword incorporates some other struct in the instance (note
that it is *not* a pointer). The difference from including a C structure
as a simple type, like (foo simple "struct dss_public"), is the gc
properties. Structures used with the struct keyword should be defined
by a struct-expression, and that definition determines not only the
contents of the structure, but also its gc properties.
(foo array (object abstract_write) LENGTH)
struct abstract_write *foo[LENGTH];
The array keyword defines an array of fixed size. If the content type
((object abstract_write) in the example above) needs any gc
processing, that processing will be applied to each element. In
principle, one could nest array and pointer constructions arbitrarily,
but the current implementation can't handle constructions requiring
nested loops to process the elements.
As a last resort, one can use the keyword special,
(foo special "struct strange *"
do_mark_strange do_free_strange)
struct strange *foo;
where the last two arguments are names of functions that should be
called by the garbage collector.
As noted above, structures for use with the struct keyword should be
defined by a struct-expression:
(struct
(name NAME-OF-STRUCT)
(vars
VARIABLES))
This defines a structure that can be included in other objects. Such a
structure is not an object in itself (i.e. it has no object header,
pointers to it cannot be handled by the gc, and (object SOME-STRUCT)
is not a valid type). The vars-clause is just like the corresponding
clause in a class-expression.
MEMORY ALLOCATION MEMORY ALLOCATION
As always when writing C programs, memory allocation is the most As always when writing C programs, memory allocation is the most
complicated and boring part of it. The objects in lsh can be complicated and error-prone part of it. The objects in lsh can be
classified by allocation strategy into three classes: classified by allocation strategy into three classes:
× Strings. These use a producer-consumer abstractions. Strings are × Strings. These use a producer-consumer abstraction. Strings are
allocated in various places, usually by reading a packet from some allocated in various places, usually by reading a packet from some
socket, or by calling ssh_format(). They are passed on to some socket, or by calling ssh_format(). They are passed on to some
consumer function, which has to deallocate the string when it is consumer function, which has to deallocate the string when it is
finished processing it usually by throwing it away, transforming it finished processing it, usually by throwing it away, transforming it
into a new string, or writing it to some socket. If you want to *both* into a new string, or writing it to some socket. If you want to *both*
pass a string to a consumer, and keep it for later reference, you have pass a string to a consumer, and keep it for later reference, you have
to copy it. to copy it.
...@@ -121,42 +281,30 @@ function that produced the string can not assume that it is alive or ...@@ -121,42 +281,30 @@ function that produced the string can not assume that it is alive or
intact after that it has been passed to a consumer. intact after that it has been passed to a consumer.
× Local objects, used in only one module, and with references from × Local objects, used in only one module, and with references from
only one place. Examples are the list nodes that io.c uses to link only one place. Examples are the queue nodes that write_buffer.c uses
file objects together. These are freed explicitly when they are no to link packets together. These are freed explicitly when they
longer needed. are no longer needed.
× Other objects and closures, which reference each other in some × Other objects and closures, which reference each other in some
complex fashion. Except places where it is *obvious* that an object complex fashion. Except places where it is *obvious* that an object
can be freed, these objects are currently not freed at all. This is a can be freed, these objects are not freed explicitly, but are handled
serious bug, but it may not be as disastrous as one may think. In real by the garbage collector. The gc overhead should be fairly small;
use, almost all allocated memory are strinsg, which *are* freed when almost all allocated memory are strings, which *are* freed explicitly
they are no longer used. The problem are things like pipes of write when they are no longer used. The objects handled by the gc are things
handlers, keyexchange state objects, etc, which are rather few. like pipes of write handlers, keyexchange state objects, etc, which
are relatively few.
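The producer-consumer rule for strings described above might look like this in practice. This is a sketch under assumptions: struct demo_string and the demo_ functions are hypothetical and only mimic the ownership convention; they are not the lsh string API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-allocated string, standing in for struct lsh_string. */
struct demo_string
{
  size_t length;
  unsigned char *data;
};

/* Producer: allocates a new string; the caller owns the result. */
static struct demo_string *demo_string_new(const void *data, size_t length)
{
  struct demo_string *s = malloc(sizeof *s);
  if (!s)
    abort();
  s->data = malloc(length ? length : 1);
  if (!s->data)
    abort();
  memcpy(s->data, data, length);
  s->length = length;
  return s;
}

static struct demo_string *demo_string_dup(const struct demo_string *s)
{
  return demo_string_new(s->data, s->length);
}

static void demo_string_free(struct demo_string *s)
{
  free(s->data);
  free(s);
}

/* Consumer: takes over the string and must deallocate it when done. */
static size_t consume_and_count(struct demo_string *s)
{
  size_t n = s->length;
  demo_string_free(s);
  return n;
}

/* To both pass a string on and keep it, copy it first. */
size_t demo_ownership(void)
{
  struct demo_string *packet = demo_string_new("hello", 5);
  struct demo_string *kept = demo_string_dup(packet);
  size_t n = consume_and_count(packet);

  packet = NULL;  /* the producer may not touch the string any more */

  n += kept->length;
  demo_string_free(kept);
  return n;
}
```

After consume_and_count() returns, the original pointer is dead as far as the producer is concerned; only the copy is still safe to use.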
Of course, this should be fixed. I don't think it is practical to
manually free all objects at the right time. Instead, one could use Objects are allocated using the NEW() macro. Objects that won't be
some of the following methods. needed anymore can be deleted explicitly by using the KILL() macro.
1. Reference counting (circular references still have to be broken For the gc to work properly, it is important that there be no bogus or
manually, but that's a lot easier than explicitly freeing objects uninitialized pointers. Pointers should either be NULL, or point at
at the right time). some valid data, and all bignums should be initialized. Note that this
rule applies to all objects, including those KILL()ed explicitly.
2. Some pool-based mechanism: Associate each allocated objects with
some connection, and free them all when the connection dies. One Currently, all memory is zeroed on allocation, which is overkill (and
could also have a limit on the amount of storage that can be also doesn't take care of initializing bignums). I'm considering
allocated for one connection, to avoid trivial denial of service extending the object system to initialize objects more intelligently.
attacks. If the a connection tries to allocate beyond that limit,
it is killed.
3. A simple mark&sweep gc. Should be fairly straight forward. Install
som runtime type info in the object structs, and do an occational
gc instead of sleeping in the main loop in io.c. Note that all
action is hooked into the callbacklists in io.c, so these lists can
serve well as the root set for the traversal.
For all these alternatives, note that the amount of data they must
handle is quite limited. There will likely be at most a few dozens of
objects for each connection that has to be considered.
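A simple mark&sweep collector of the kind discussed in this section could be sketched as below. This is not the code in gc.c: the object header layout, the per-class mark function and all demo_ names are assumptions made for the example. It does show why pointers must be either NULL or valid (the marker follows every object pointer it finds) and why circular references are not a problem.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_object;

/* Hypothetical runtime type info: instance size and a function that
 * marks every object the instance points to. */
struct demo_class
{
  size_t size;
  void (*mark_children)(struct demo_object *o,
                        void (*mark)(struct demo_object *o));
};

/* Hypothetical object header, placed first in every instance. */
struct demo_object
{
  struct demo_object *next;  /* chain of all allocated objects */
  const struct demo_class *isa;
  int marked;
};

static struct demo_object *all_objects = NULL;
static size_t live_count = 0;

/* Allocates a zeroed instance and links it into the global chain. */
static struct demo_object *demo_new(const struct demo_class *c)
{
  struct demo_object *o = calloc(1, c->size);
  if (!o)
    abort();
  o->isa = c;
  o->next = all_objects;
  all_objects = o;
  live_count++;
  return o;
}

/* Mark phase: follow pointers from o, stopping at NULL or marked objects. */
static void demo_mark(struct demo_object *o)
{
  if (!o || o->marked)
    return;
  o->marked = 1;
  if (o->isa->mark_children)
    o->isa->mark_children(o, demo_mark);
}

/* Sweep phase: free unmarked objects, clear the mark on the survivors. */
static void demo_sweep(void)
{
  struct demo_object **p = &all_objects;
  while (*p)
    {
      struct demo_object *o = *p;
      if (o->marked)
        {
          o->marked = 0;
          p = &o->next;
        }
      else
        {
          *p = o->next;
          free(o);
          live_count--;
        }
    }
}

/* Example class with one object-typed instance variable. */
struct demo_node
{
  struct demo_object header;
  struct demo_object *next_handler;
};

static void mark_node(struct demo_object *o,
                      void (*mark)(struct demo_object *o))
{
  mark(((struct demo_node *) o)->next_handler);
}

static const struct demo_class node_class =
{ sizeof(struct demo_node), mark_node };

/* a -> b is reachable from the root a; c <-> d is an unreachable cycle. */
size_t demo_gc(void)
{
  struct demo_node *a = (struct demo_node *) demo_new(&node_class);
  struct demo_node *b = (struct demo_node *) demo_new(&node_class);
  struct demo_node *c = (struct demo_node *) demo_new(&node_class);
  struct demo_node *d = (struct demo_node *) demo_new(&node_class);

  a->next_handler = &b->header;
  b->next_handler = NULL;
  c->next_handler = &d->header;
  d->next_handler = &c->header;

  demo_mark(&a->header);  /* a is the only root in this example */
  demo_sweep();
  return live_count;      /* objects that survived the collection */
}
```

Here the root set is a single object; in the design sketched in the text, the callback lists in io.c would play that role.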
ROADMAP ROADMAP
...@@ -174,6 +322,13 @@ atoms.in Textual names of the algorithms and services ...@@ -174,6 +322,13 @@ atoms.in Textual names of the algorithms and services
process_atoms bash script and the GNU gperf process_atoms bash script and the GNU gperf
program. program.
channel.[hc] Manages the channels of the ssh connection
protocol.
connection.[hc] Packet dispatch. Also manages global
information about the connection, such as
encryption and decryption state.
io.[hc] The io module. I believe that it is a good io.[hc] The io module. I believe that it is a good
thing to separate io from other processing. thing to separate io from other processing.
This module is the only one performing actual This module is the only one performing actual
...@@ -185,7 +340,7 @@ io.[hc] The io module. I believe that it is a good ...@@ -185,7 +340,7 @@ io.[hc] The io module. I believe that it is a good
read_{line|packet|data}.[hc] read_{line|packet|data}.[hc]
These are read handlers. They are hooked into These are read handlers. They are hooked into
the io-system, and called when there is input the io-system, and called when there is input
available at a socket. Complete packets (or available on a socket. Complete packets (or
lines) are passed on to some other handler for lines) are passed on to some other handler for
processing. processing.
...@@ -196,9 +351,16 @@ format.[hc] The function ssh_format is a varargs function ...@@ -196,9 +351,16 @@ format.[hc] The function ssh_format is a varargs function
number of other arguments. The supported number of other arguments. The supported
format specifiers are very different from the format specifiers are very different from the
stdio format functions, and works with stdio format functions, and works with
ssh datatypes. It allocates and returns a lsh datatypes. It allocates and returns a
string of the right size. string of the right size.
gc.c Simple mark&sweep garbage collector.
client.c Client specific processing.
server.c Server specific processing, including forking
of subprocesses.
lib/ Free implementations of hash functions and lib/ Free implementations of hash functions and
symmetric cryptographic algorithms. See the symmetric cryptographic algorithms. See the
file AUTHORS for credits. file AUTHORS for credits.
...@@ -209,6 +371,10 @@ crypto.[hc] lsh's interface to those algorithms. ...@@ -209,6 +371,10 @@ crypto.[hc] lsh's interface to those algorithms.
publickey_crypto.[hc] Public key cryptography objects. publickey_crypto.[hc] Public key cryptography objects.
keyexchange.[hc] Key exchange protocol. This file implements the
algorithm-independent parts of the ssh key
exchange protocol.
lsh.c Client main program. lsh.c Client main program.
lshd.c Server main program. lshd.c Server main program.
......