imag/doc/src/02000-store.md

316 lines
11 KiB
Markdown
Raw Normal View History

2016-01-08 17:28:31 +00:00
# The Store {#sec:thestore}
2016-07-08 16:21:05 +00:00
The store is where all the good things happen.
The store is basically just a directory on the filesystem imag manages and keeps
its state in.
One could say that the store is simply a databases, and it really is. We opted
to go for plain text, though, as we believe that plain text is the only sane way
to do such a thing.
A user should always be able to read her data without great effort and putting
everything in a _real_ database like sqlite or even postgresql would need a user
to install additional software just to read his own data. We don't want that.
Text is readable until the worlds end and we think it is therefor better to
store the data in plain text.
The following sections describe the store and the file format we use to store
data. One may skip the following sections, they are included for users who want
to dig into the store with their editors.
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
## File Format {#sec:thestore:fileformat}
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
The contents of the store are encoded in either UTF-8 or ASCII. Either way, a
normal text editor (like `vim` or the other one) will always be sufficient to
dog into the store and modify files. For simple viewing even a pager (like
`less`) is sufficient.
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
Each entry in the store consists of two parts:
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
1. Header
1. Content
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
The following section describe their purpose.
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
### Header Format {#sec:thestore:fileformat:header}
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
The header format is where imag stores its data. The header is an area at the
top of every file which is seperated from the content part by three dashes
(`---`). Between these three dashes there is structured data. imag uses `TOML`
as data format for this structured data, because it fits best and the available
`TOML` parser for the rust programming language is really good.
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
The header can contain any amount of data, but modules (see @sec:modules) are
restricted in their way of altering the data.
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
So normally there are several sections in the header. One section (`[imag]`) is
always present. It contains a `version` field, which tells imag which version
this file was created with (the version information is _also_ encoded in the
filename, just in case things change in the future). It also contains a `links`
field which is an Array of values. This `links` field is for linking (see
@sec:thestore:linking) to other entries in the store.
2016-01-19 14:58:22 +00:00
2016-07-08 16:21:05 +00:00
Other sections are named like the modules which created them. Every module is
allowed to store arbitrary data under its own section and a module may never
read other sections than its own. This is not enforced by imag itself, though.
2016-01-10 16:34:53 +00:00
### Content Format {#sec:thestore:fileformat:content}
2016-07-08 16:21:05 +00:00
The content is the part of the file where the user is free to enter any textual
content. The content may be rendered as Markdown or other markup format for the
users convenience. The store does never expect and specific markup and actually
the markup implementation is not inside the very code of imag.
Technically it would be possible that the content part of a file is used to
store binary data. We don't want this, though.
2016-01-10 16:34:53 +00:00
2016-01-12 12:07:04 +00:00
### Example {#sec:thestore:fileformat:example}
An example for a file in the store follows.
```text
2016-01-12 12:07:04 +00:00
---
[imag]
links = ["/home/user/more_kittens.mpeg"]
2017-02-04 11:17:48 +00:00
version = "0.3.0"
2016-01-19 15:10:31 +00:00
[note]
name = "foo"
2016-01-12 12:07:04 +00:00
---
This is an example text, written by the user.
```
2016-01-08 17:28:31 +00:00
## File organization {#sec:thestore:fileorganization}
2016-01-10 17:07:05 +00:00
The "Entries" are stored as files in the "Store", which is a directory the
2016-07-08 16:21:05 +00:00
user has access to. The store may exist in the users Home-directory or any
other directory the user has read-write-Access to.
Each module stores its data in an own subdirectory in the store. This is because
we like to keep things ordered and clean, not because it is technically
necessary.
We name the path to a file in the store "Store id" or "Storepath" and we often
refer to it by using the store location as root.
So if the store exists in `/home/user/store/`, a file with the storepath
2016-01-10 17:07:05 +00:00
`/example.file` is (on the filesystem) located at
`/home/user/store/example.file`.
2016-07-08 16:21:05 +00:00
A storepath contains predefined parts:
2016-01-19 13:34:58 +00:00
2016-07-08 16:21:05 +00:00
* The module name of the Module the Entry belongs to, as said above.
This part is always a directory.
* The version (semantic versioning applies) of the module storing the entry.
This part is a postfix to the filename.
2016-01-19 13:34:58 +00:00
The pattern for the storepath is
```
/<module name>/<optional sub-folders>/<file name>~<sem version>
```
2016-07-08 16:21:05 +00:00
So if a module named "example-module" with version "0.1.0" stores a file in the
Store, the storepath for a file with the name "example" is
"/example-module/example~0.1.0".
2016-01-10 17:07:05 +00:00
2016-07-08 16:21:05 +00:00
Any number of subdirectories may be used, so creating folder hierarchies is
possible and valid. A file "example" for a module "module" in version "0.1.0"
would be stored in sub-folders like this:
2016-01-19 13:34:58 +00:00
```
2016-07-08 16:21:05 +00:00
/module/some/sub/folder/example~0.1.0
2016-01-19 13:34:58 +00:00
```
2016-07-08 16:21:05 +00:00
For example, it is valid if these files exist at the same time:
2016-01-23 17:03:57 +00:00
* /foo/bar~0.2
* /foo/bar~1.3
2016-07-08 16:21:05 +00:00
It might not be sane, though.
2016-01-23 17:03:57 +00:00
2016-07-08 16:21:05 +00:00
To future-proof the system it is necessary to provide a way for modules to
differentiate in their versions on the store level. Thus if a module wants to
retrieve a file from the store it must at least accept files from it's current
advertised version. It may accept older files and it may transform them and
resubmit them in the newer version.
2016-01-23 17:03:57 +00:00
2016-01-19 13:35:13 +00:00
## Store path links {#sec:thestore:links}
2016-07-08 16:21:05 +00:00
Linking entries is version independent.
2016-01-19 13:35:13 +00:00
This means if an entry "a" from a module "A" gets written to the store, it may
2016-07-08 16:21:05 +00:00
link to an entry "b" from a module "B", which is in version "0.1.0" at the
moment. If the module "B" gets updated, it might update its entries in the store
as well. The link from the "a" should never get invalid in this case, though it
is not ensured by the core of imag itself.
2016-01-10 17:07:05 +00:00
2017-06-17 19:07:30 +00:00
## Backends {#sec:thestore:backends}
The store itself also has a backend. This backend is the "filesystem
abstraction" code.
Note: This is a very core thing. Casual users might want to skip this section.
### Problem {#sec:thestore:backends:problem}
First, we had a compiletime backend for the store.
This means that the actual filesystem operations were compiled into the stores
either as real filesystem operations (in a normal debug or release build) but as
a in-memory variant in the 'test' case.
So tests did not hit the filesystem when running.
This gave us us the possibility to run tests concurrently with multiple stores
that did not interfere with eachother.
This approach worked perfectly well until we started to test not the
store itself but crates that depend on the store implementation.
When running tests in a crate that depends on the store, the store
itself was compiled with the filesystem-hitting-backend.
This was problematic, as tests could not be implemented without hitting
the filesystem.
Hence we implemented this.
### Implementation {#sec:thestore:backends:implementation}
The filesystem is abstracted via a trait `FileAbstraction` which
contains the essential functions for working with the filesystem.
Two implementations are provided in the code:
* FSFileAbstraction
* InMemoryFileAbstraction
whereas the first actually works with the filesystem and the latter
works with an in-memory HashMap that is used as filesystem.
Further, the trait `FileAbstractionInstance` was introduced for
functions which are executed on actual instances of content from the
filesystem, which was previousely tied into the general abstraction
mechanism.
So, the `FileAbstraction` trait is for working with the filesystem, the
`FileAbstractionInstance` trait is for working with instances of content
from the filesystem (speak: actual Files).
In case of the `FSFileAbstractionInstance`, which is the implementation
of the `FileAbstractionInstance` for the actual filesystem-hitting code,
the underlying resource is managed like with the old code before.
The `InMemoryFileAbstractionInstance` implementation is corrosponding to
the `InMemoryFileAbstraction` implementation - for the in-memory
"filesystem".
The implementation of the `get_file_content()` function had to be
changed to return a `String` rather than a `&mut Read` because of
lifetime issues.
This change is store-internally and the API of the store itself was not
affected.
## The StdIo backend {#sec:thestore:backends:stdio}
Sidenote: The name is "StdIo" because its main purpose is Stdin/Stdio, but it
is abstracted over Read/Write actually, so it is also possible to use this
backend in other ways, too.
### Why? {#sec:thestore:backends:stdio:why}
This is a backend for the imag store which is created
from stdin, by piping contents into the store (via JSON or TOML) and piping the
store contents (as JSON or TOML) to stdout when the the backend is destructed.
This is one of some components which make command-chaining in imag possible.
With this, the application does not have to know whether the store actually
lives on the filesystem or just "in memory".
### Mappers {#sec:thestore:backends:stdio:mappers}
The backend contains a "Mapper" which defines how the contents get mapped into
the in-memory store representation: A JSON implementation or a TOML
implementation are possible.
The following section assumes a JSON mapper.
The backends themselves do not know "header" and "content" - they know only
blobs which live in pathes.
Indeed, this "backend" code does not serve "content" or "header" to the `Store`
implementation, but only a blob of bytes.
Anyways, the JSON-protocol for passing a store around _does_ know about content
and header (see @sec:thestore:backends:stdio:json for the JSON format).
So the mapper reads the JSON, parses it (thanks serde!) and translates it to
TOML, because TOML is the Store representation of a header.
But because the backend does not serve header and content, but only a blob,
this TOML is then translated (with the content of the respective file) to a
blob.
This is then made available to the store codebase.
This is complex and probably slow, we know.
To summarize what we do right now, lets have a look at the awesome ascii-art
below:
```
libimag*
|
v
IO Mapper FS Store FS Mapper IO
+--+-------------+---------+--------+---------+--------------+--+
| | | | | | | |
JSON -> TOML -> String -> Entry -> String -> TOML -> JSON
+ TOML
+ Content
```
This is what gets translated where for one imag call with a stdio store backend.
The rationale behind this implementation is that this is the best implementation
we can have in a relatively short amount of time.
### The JSON Mapper {#sec:thestore:backends:stdio:json}
The JSON mapper maps JSON which is read from a source into a HashMap which
represents the in-memory filesystem.
The strucure is as follows:
```json
{
"version": "0.3.0",
"store": {
"/example": {
"header": {
"imag": {
"version": "0.3.0",
},
},
"content": "hi there!",
},
},
}
```
### TODO {#sec:thestore:backends:todo}
Of course, the above is not optimal.
The TODO here is almost visible: Implement a proper backend where we do not need
to translate between types all the time.
The first goal would be to reduce the above figure to:
```
libimag*
|
v
IO Mapper Store Mapper IO
+--+------+--------+-------+--+
| | | | | |
JSON -> Entry -> JSON
+ TOML
+ Content
```
and the second step would be to abstract all the things away so the `libimag*`
crates handle the header without knowing whether it is JSON or TOML.