imag/doc/src/02000-store.md

9 KiB

The Store

The store is where all the good things happen. The store is basically just a directory on the filesystem imag manages and keeps its state in.

One could say that the store is simply a database, and it really is. We opted to go for plain text, though, as we believe that plain text is the only sane way to do such a thing, especially because the amount of data which is to be expected in this domain is in the lower Megabytes range and even if it is really much won't exceed the Gigabytes ever.

Having a storage format which is plain-text based is the superior approach, as text editors will always be there.

A user should always be able to read her data without great effort and putting everything in a real database like sqlite or even postgresql would need a user to install additional software just to read his own data. We don't want that. Text is readable until the worlds end and we think it is therefore better to store the data in plain text.

The following sections describe the store and the file format we use to store data. One may skip the following sections, they are included for users who want to dig into the store with their editors.

File Format

The contents of the store are encoded in UTF-8. A normal text editor (like vim or the other one) will always be sufficient to dig into the store and modify files. For simple viewing even a pager (like less) is sufficient.

Each entry in the store consists of two parts:

  1. Header
  2. Content

The following section describe their purpose.

Header Format

The header format is where imag stores its data. The header is an area at the top of every file which is seperated from the content part by three dashes (---). Between these three dashes there is structured data. imag uses TOML as data format for this structured data, because it fits best and the available TOML parser for the rust programming language is really good.

The header can contain any amount of data, but modules (see @sec:modules) are restricted in their way of altering the data.

So normally there are several sections in the header. One section ([imag]) is always present. It contains a version field, which tells imag which version this file was created with.

Other sections are named like the modules which created them. Every module is allowed to store arbitrary data under its own section and a module may never read other sections than its own.

These conventions are not enforced by imag itself, though.

Content Format

The content is the part of the file where the user is free to enter any textual content. The content may be rendered as Markdown or other markup format for the users convenience. The store does never expect and specific markup and actually the markup implementation is not inside the very core of imag.

Technically it would be possible that the content part of a file is used to store binary data. We don't want this, though, as it is contrary to the goals of imag.

Example

An example for a file in the store follows.


---
[imag]
version = "0.7.0"

[note]
name = "foo"

[link]
internal = ["some/other/imag/entry"]
---

This is an example text, written by the user.

File organization

The "Entries" are stored as files in the "Store", which is a directory the user has access to. The store may exist in the users Home-directory or any other directory the user has read-write-access to.

Each module stores its data in an own subdirectory in the store. This is because we like to keep things ordered and clean, not because it is technically necessary.

We name the path to a file in the store "Store id" or "Storepath" and we often refer to it by using the store location as root. So if the store exists in /home/user/store/, a file with the storepath /example.file is (on the filesystem) located at /home/user/store/example.file.

By convention, each libimagentry<name> and libimag<name> module stores its entries in in /<name>/.

So, the pattern for the storepath is

/<module name>/<optional sub-folders>/<file name>

Any number of subdirectories may be used, so creating folder hierarchies is possible and valid. A file "example" for a module "module" could be stored in sub-folders like this:

/module/some/sub/folder/example

Backends

The store itself also has a backend. This backend is the "filesystem abstraction" code.

Note: This is a very core thing. Casual users might want to skip this section.

Problem

First, we had a compiletime backend for the store. This means that the actual filesystem operations were compiled into the store either as real filesystem operations (in a normal debug or release build) but as a in-memory variant in the 'test' case. So tests did not hit the filesystem when running. This gave us us the possibility to run tests concurrently with multiple stores that did not interfere with each other.

This approach worked perfectly well until we started to test not the store itself but crates that depend on the store implementation. When running tests in a crate that depends on the store, the store itself was compiled with the filesystem-hitting-backend. This was problematic, as tests could not be implemented without hitting the filesystem and mess up other currently-running tests.

Hence we implemented store backends.

Implementation

The filesystem is abstracted via a trait FileAbstraction which contains the essential functions for working with the filesystem.

Two implementations are provided in the code:

  • FSFileAbstraction
  • InMemoryFileAbstraction

whereas the first actually works with the filesystem and the latter works with an in-memory HashMap that is used as filesystem.

Further, the trait FileAbstractionInstance was introduced for functions which are executed on actual instances of content from the filesystem, which was previousely tied into the general abstraction mechanism.

So, the FileAbstraction trait is for working with the filesystem, the FileAbstractionInstance trait is for working with instances of content from the filesystem (speak: actual Files).

In case of the FSFileAbstractionInstance, which is the implementation of the FileAbstractionInstance for the actual filesystem-hitting code, the underlying resource is managed like with the old code before. The InMemoryFileAbstractionInstance implementation is corrosponding to the InMemoryFileAbstraction implementation - for the in-memory "filesystem".

The StdIo backend

Sidenote: The name is "StdIo" because its main purpose is Stdin/Stdio, but it is abstracted over Read/Write actually, so it is also possible to use this backend in other ways, too.

Why?

This is a backend for the imag store which is created from stdin, by piping contents into the store (via JSON or TOML) and piping the store contents (as JSON or TOML) to stdout when the backend is destructed.

This is one of some components which make command-chaining in imag possible. With this, the application does not have to know whether the store actually lives on the filesystem or just "in memory".

Mappers

The backend contains a "Mapper" which defines how the contents get mapped into the in-memory store representation: A JSON implementation or a TOML implementation are possible.

The following section assumes a JSON mapper.

The mapper reads the JSON, parses it and translates it to a Entry. Then, the entry is made available to the store codebase. To summarize what we do right now, lets have a look at the awesome ascii-art below:

                    libimag*
                       |
                       v
 IO   Mapper         Store      Mapper  IO
+--+---------+----------------+--------+--+
|  |         |                |        |  |
      JSON   ->    Entry     ->  JSON

This is what gets translated where for one imag call with a stdio store backend.

The JSON Mapper

The JSON mapper maps JSON which is read from a source into a HashMap which represents the in-memory filesystem.

The strucure is as follows:

{
    "version": "0.7.0",
    "store": {
        "example": {
            "header": {
                "imag": {
                    "version": "0.7.0",
                },
            },
            "content": "hi there!",
        },
    },
}

TODO

If you look at the version history of this file you will see that this implementation has grown from something complex and probably slow to what we have today.

Still, there's one improvement we could make: abstract all the things away so the libimag* crates handle the header without knowing whether it is JSON or TOML. With this, we would not even have to translate JSON to TOML anymore. We should measure whether this would have actually any performance impact before implementing it.