Add info about horizontal scaling and debugging persistent queue (#270)

* persistent queue info

* fix format and fed queue content
phiresky 2023-09-28 11:15:59 +02:00 committed by GitHub
parent 46e7d377c5
commit 6b52d62151
2 changed files with 43 additions and 4 deletions

@@ -58,16 +58,36 @@ There are a few ways to work around this issue:
Lemmy_server can be horizontally scaled, with a few caveats.
Here's a quick example of how you could start 3 web servers, 3 federation servers and one scheduled task process:
```
lemmy_server --http-server=false --federate-activities=false # scheduled tasks
lemmy_server --http-server=true --federate-activities=false --disable-scheduled-tasks # http server 1
lemmy_server --http-server=true --federate-activities=false --disable-scheduled-tasks # http server 2
lemmy_server --http-server=true --federate-activities=false --disable-scheduled-tasks # http server 3
# federation server 1/3
lemmy_server --http-server=false --federate-activities=true --federate-process-index=1 --federate-process-count=3 --disable-scheduled-tasks
# federation server 2/3
lemmy_server --http-server=false --federate-activities=true --federate-process-index=2 --federate-process-count=3 --disable-scheduled-tasks
# federation server 3/3
lemmy_server --http-server=false --federate-activities=true --federate-process-index=3 --federate-process-count=3 --disable-scheduled-tasks
```
#### Scheduled tasks
-By default, a Lemmy_server process will always run background scheduled tasks, which are intended to be run only on one server. Launching multiple processes with the default configuration will result in multiple duplicated scheduled tasks all starting at the same moment and trying to do the same thing at once. At best, it will be a waste of resources, but it could potentially end up causing some weird glitches as well.
+By default, a Lemmy_server process will run background scheduled tasks, which must be run only on one server. Launching multiple processes with the default configuration will result in multiple duplicated scheduled tasks all starting at the same moment and trying to do the same thing at once.
-To solve this, Lemmy can be started with the `--disable-scheduled-tasks` flag on all but one instance. In general, there are two approaches:
+To solve this, Lemmy must be started with the `--disable-scheduled-tasks` flag on all but one instance. In general, there are two approaches:
1. Run all your load balanced Lemmy servers with the `--disable-scheduled-tasks` flag, and run one additional Lemmy server without this flag which is not in your load balancer and does not accept any HTTP traffic.
2. Run one load balanced Lemmy server without the flag, and all other load balanced servers with the flag.
-Option 1 might have a few tiny advantages (easier to isolate logs for the scheduled tasks, and expensive scheduled tasks won't compete with HTTP requests for system resources), but requires an extra process.
+#### Federation queue
The persistent federation queue (since 0.19) is split by federated domain and can be processed in equal-size parts run in separate processes. To split the queue across N processes numbered 1...N, pass `--federate-process-index=i --federate-process-count=N` to each process, where i is that process's own index. It is important that each index is given to exactly one process; otherwise you will get undefined behaviour (missing or duplicate federation, crashes).
Federation processes can be started and stopped at will. They will restart federation to each instance from the last transmitted activity regardless of downtime.
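As a minimal sketch of the index rule above, the following shell function prints one command line per federation process so that every index from 1 to N appears exactly once. The function name `federation_commands` is hypothetical; the flags are the ones used in the example earlier in this section. It only prints the commands, so how they are actually launched (systemd units, containers, etc.) is left open.

```shell
#!/bin/sh
# Hypothetical helper: print the lemmy_server invocation for each of N
# federation processes, using indices 1..N exactly once each.
federation_commands() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "lemmy_server --http-server=false --federate-activities=true --federate-process-index=$i --federate-process-count=$count --disable-scheduled-tasks"
    i=$((i + 1))
  done
}

# Print the three federation commands from the example above.
federation_commands 3
```

Generating the commands from a single count avoids the failure mode described above, where an index is skipped or assigned twice by hand.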
#### Rolling upgrades

@@ -68,4 +68,23 @@ Also ensure that the time is accurately set on your server. Activities are signe
### Other instances don't receive actions reliably
-Lemmy uses a queue to send out activities. The size of this queue is specified by the config value `federation.worker_count`. Very large instances might need to increase this value. Search the logs for "Activity queue stats", if it is consistently larger than the worker_count (default: 64), the count needs to be increased.
+Lemmy uses one queue per federated instance to send out activities. Search the logs for "Federation state" for summaries. Errors will also be logged.
For details, execute this SQL query:
```sql
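-- last_value is read from the sequence itself; currval() would fail in a session that has not called nextval()
select domain, (select last_value from sent_activity_id_seq) as latest_id, last_successful_id, fail_count, last_retry
from federation_queue_state
join instance on instance_id = instance.id order by last_successful_id asc;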
```
You will see a table like the following:
| domain | latest_id | last_successful_id | fail_count | last_retry |
| -------------------------- | --------- | ------------------ | ---------- | ----------------------------- |
| toad.work | 6837196 | 6832351 | 14 | 2023-07-12 21:42:22.642379+00 |
| lemmy.deltaa.xyz | 6837196 | 6837196 | 0 | 1970-01-01 00:00:00+00 |
| battleangels.net | 6837196 | 6837196 | 0 | 1970-01-01 00:00:00+00 |
| social.fbxl.net | 6837196 | 6837196 | 0 | 1970-01-01 00:00:00+00 |
| mastodon.coloradocrest.net | 6837196 | 6837196 | 0 | 1970-01-01 00:00:00+00 |
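This shows exactly which instances are up to date and which are behind: an instance is up to date when its `last_successful_id` equals `latest_id`, while a gap between the two or a non-zero `fail_count` indicates that delivery to that instance is lagging or failing.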
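To pick out only the lagging instances, the rows can be filtered by the gap between `latest_id` and `last_successful_id`. The sketch below assumes the query result is available as pipe-separated rows in the column order shown above (for example from `psql --no-align --tuples-only`); the function name `lagging_instances` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read rows of the form
#   domain|latest_id|last_successful_id|fail_count|last_retry
# from stdin and print "domain lag fail_count" for every instance
# whose last_successful_id is behind latest_id.
lagging_instances() {
  awk -F'|' '$2 - $3 > 0 { print $1, $2 - $3, $4 }'
}

# Demo using two rows from the table above.
printf '%s\n' \
  'toad.work|6837196|6832351|14|2023-07-12 21:42:22.642379+00' \
  'lemmy.deltaa.xyz|6837196|6837196|0|1970-01-01 00:00:00+00' \
  | lagging_instances
```

For the two sample rows this prints `toad.work 4845 14`: that instance is 4845 activities behind after 14 failed attempts, while `lemmy.deltaa.xyz` is up to date and is omitted.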