Update documentation for new implementation

This commit is contained in:
Matthias Beyer 2018-02-13 22:49:31 +01:00
parent cbb47cffcb
commit a5eaf0439c

@@ -3,27 +3,24 @@
This library crate contains functionality to generate _references_ within the
imag store.
It can be used to create references to other files on the filesystem (reachable
via a filesystem path). It differs from `libimagentrylink`/external linking as
A reference is a "pointer" to a file or directory on the filesystem and outside
the store.
It differs from `libimagentrylink`/external linking as
it is designed exclusively for filesystem references, not for URLs.
A reference can have several properties. For example, a reference can track the
content of a filesystem path by hashing it with a hashsum (SHA1), so
one can check whether the file was changed.
As files can get big (think of `debian.iso`), _partial hashing_ is supported
(think of "hash the first 2048 bytes of a file").
The library contains functionality to re-find a moved file automatically by
checking the content hash which was stored before.
Permission changes can be tracked as well.
A reference is created with a unique identifier, like a hash. How this hash is
calculated can be defined by the user of `libimagentryref`.
So this library helps to build something that resembles a _symlink_.
### Usage
Users have to implement the `UniqueRefPathGenerator` trait, which should
implement hashing for paths.
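As a rough illustration of what "hashing for paths" could look like, here is a minimal sketch. The trait `PathHashGenerator`, its method `unique_hash` and the type `PathStringHasher` are hypothetical stand-ins invented for this example; they are not the actual definition of `UniqueRefPathGenerator`. The standard library's `DefaultHasher` is used only to keep the sketch dependency-free, and a real implementation would probably want a stable hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical stand-in for a path-hashing trait.
/// This is NOT the real `UniqueRefPathGenerator` from `libimagentryref`.
pub trait PathHashGenerator {
    /// Produce an identifier for the file a reference points to.
    fn unique_hash(path: &Path) -> String;
}

/// Example generator that derives the identifier from the path itself.
pub struct PathStringHasher;

impl PathHashGenerator for PathStringHasher {
    fn unique_hash(path: &Path) -> String {
        // Assumption: `DefaultHasher` is good enough for a sketch. Its
        // algorithm is not guaranteed to stay the same across Rust
        // releases, so it should not be used for persisted identifiers.
        let mut hasher = DefaultHasher::new();
        path.hash(&mut hasher);
        format!("{:016x}", hasher.finish())
    }
}
```

Calling `PathStringHasher::unique_hash(Path::new("/some/file"))` yields the same 16-character hex string every time it is given the same path within one build.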
### Limits
Please understand that this is _not_ intended to be a version control system or
something like that.
This is _not_ intended to be a version control system or something like that.
We also can not use _real symlinks_ as we need imag-store-objects to be able to
link stuff.
@@ -31,39 +28,22 @@ link stuff.
This library offers functionality to refer to content outside of the store.
It can be used to refer to _nearly static stuff_ pretty easily - think of a
Maildir - you add new mails by fetching them, but you mostly do not remove mails
and if you do you end up with a "null pointer" in the store, which can then be
handled properly.
As this library supports custom hashes (you don't have to hash the full file,
you can also parse the file and hash only _some_ content) this is pretty
flexible.
For example, if you want to implement an imag module which tracks a certain kind
of files which constantly change, but whose first 5 lines never change
after the file is created, you can write a custom hasher that only uses the
first 5 lines for the hash.
### Internals
Internally, in the store, the file gets created under
`/ref/<hash of the path to the file to refer to>`.
If the content of the file is hashed, we can still re-find the file via the
content hash (which is stored in the header of the store entry).
The reference object can, after the path was re-found, be updated.
Maildir - you add new mails by fetching them, but you mostly do not remove
mails.
If mails get moved, they can be re-found via their hash, either because Maildir
objects hardly change or because the hash implementation which is used to refer
to them hashes only the `Message-Id`, which does not change.
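To make the `Message-Id` idea concrete, here is a minimal sketch of a hash that ignores everything in a mail except that header. The functions `message_id` and `message_id_hash` are hypothetical names made up for this example and are not part of `libimagentryref`. The header parsing is deliberately simplified (no folded headers), and `DefaultHasher` only stands in for a real hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Return the value of the first `Message-Id` header, if there is one.
/// Simplified on purpose: folded (multi-line) headers are not handled.
pub fn message_id(mail: &str) -> Option<&str> {
    for line in mail.lines() {
        if line.is_empty() {
            break; // an empty line ends the header section
        }
        if let Some((name, value)) = line.split_once(':') {
            // Header names are case-insensitive.
            if name.trim().eq_ignore_ascii_case("message-id") {
                return Some(value.trim());
            }
        }
    }
    None
}

/// Hash only the `Message-Id`, so the result does not depend on where the
/// mail is stored or on any other header or body content.
/// Assumption: `DefaultHasher` is a placeholder, not a stable hash.
pub fn message_id_hash(mail: &str) -> Option<String> {
    message_id(mail).map(|id| {
        let mut hasher = DefaultHasher::new();
        id.hash(&mut hasher);
        format!("{:016x}", hasher.finish())
    })
}
```

Two mails with the same `Message-Id` produce the same hash here even if their other headers or bodies differ, which is the property that makes re-finding a moved mail possible.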
### Long-term TODO
Things which have to be done here or are not yet properly tested:
Not implemented yet:
- [ ] Testing of different Hashers
- [ ] Testing of re-finding of objects, including:
- [ ] Can a moved file automatically be found by content hash?
- [ ] Does a store-reference get updated automatically if it was moved,
including links (as in `libimaglink`)?
- [ ] If the content of a file changes, does the content hash get updated
automatically?
("automatically" is a stretchable term here, as these things have to be triggered
by the user anyway)
- [ ] Re-finding of files via their hash.
This must be implemented with several things in mind:
* The user of the library should be able to provide a way how the
filesystem is searched. Basically a Functor which yields paths to
check based on the original path of the missing file.
This enables implementations which only search a certain subset
of paths, or do depth-first search rather than
breadth-first search.
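
One possible shape for such a user-provided search is sketched below. This is only an illustration of the TODO item above, not an existing API: `refind` and `same_directory` are hypothetical names. The search strategy is a closure that maps the original path to candidate paths, and the hash function is passed in as well, so the caller decides both where to look and how to compare.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: return the first candidate whose hash equals `wanted`.
///
/// `search` yields candidate paths based on the original path of the
/// missing file. The order of the candidates (depth-first, breadth-first,
/// a fixed subset, ...) is entirely up to the caller.
pub fn refind<S, I, H>(original: &Path, wanted: &str, search: S, hash: H) -> Option<PathBuf>
where
    S: FnOnce(&Path) -> I,
    I: IntoIterator<Item = PathBuf>,
    H: Fn(&Path) -> Option<String>,
{
    search(original)
        .into_iter()
        .find(|candidate| hash(candidate.as_path()).as_deref() == Some(wanted))
}

/// Example strategy: only look at the entries of the directory the file
/// used to live in. Unreadable directories yield no candidates.
pub fn same_directory(original: &Path) -> Vec<PathBuf> {
    let dir = match original.parent() {
        Some(dir) => dir,
        None => return Vec::new(),
    };
    match std::fs::read_dir(dir) {
        Ok(entries) => entries
            .filter_map(|entry| entry.ok())
            .map(|entry| entry.path())
            .collect(),
        Err(_) => Vec::new(),
    }
}
```

A caller could pass `same_directory` as `search` for a cheap first attempt and fall back to a wider strategy if that returns `None`.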