From 6b52d62151796234944c11cc3c62234be3f43be1 Mon Sep 17 00:00:00 2001
From: phiresky
Date: Thu, 28 Sep 2023 11:15:59 +0200
Subject: [PATCH] Add info about horizontal scaling and debugging persistent queue (#270)

* persistent queue info

* fix format and fed queue content
---
 src/administration/horizontal_scaling.md | 26 +++++++++++++++++++++---
 src/administration/troubleshooting.md    | 21 ++++++++++++++++++-
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/src/administration/horizontal_scaling.md b/src/administration/horizontal_scaling.md
index e8d43a9..199acba 100644
--- a/src/administration/horizontal_scaling.md
+++ b/src/administration/horizontal_scaling.md
@@ -58,16 +58,36 @@ There are a few ways to work around this issue:
 
 Lemmy_server can be horizontally scaled, with a few caveats.
 
+Here's a quick example of how you could start 3 web servers, 3 federation servers and one scheduled task process:
+
+```
+lemmy_server --http-server=false --federate-activities=false # scheduled tasks
+lemmy_server --http-server=true --federate-activities=false --disable-scheduled-tasks # http server 1
+lemmy_server --http-server=true --federate-activities=false --disable-scheduled-tasks # http server 2
+lemmy_server --http-server=true --federate-activities=false --disable-scheduled-tasks # http server 3
+
+# federation server 1/3
+lemmy_server --http-server=false --federate-activities=true --federate-process-index=1 --federate-process-count=3 --disable-scheduled-tasks
+# federation server 2/3
+lemmy_server --http-server=false --federate-activities=true --federate-process-index=2 --federate-process-count=3 --disable-scheduled-tasks
+# federation server 3/3
+lemmy_server --http-server=false --federate-activities=true --federate-process-index=3 --federate-process-count=3 --disable-scheduled-tasks
+```
+
 #### Scheduled tasks
 
-By default, a Lemmy_server process will always run background scheduled tasks, which are intended to be run only on one server. Launching multiple processes with the default configuration will result in multiple duplicated scheduled tasks all starting at the same moment and trying to do the same thing at once. At best, it will be a waste of resources, but it could potentially end up causing some weird glitches as well.
+By default, a Lemmy_server process will run background scheduled tasks, which must be run only on one server. Launching multiple processes with the default configuration will result in multiple duplicated scheduled tasks all starting at the same moment and trying to do the same thing at once.
 
-To solve this, Lemmy can be started with the `--disable-scheduled-tasks` flag on all but one instance. In general, there are two approaches:
+To solve this, Lemmy must be started with the `--disable-scheduled-tasks` flag on all but one instance. In general, there are two approaches:
 
 1. Run all your load balanced Lemmy servers with the `--disable-scheduled-tasks` flag, and run one additional Lemmy server without this flag which is not in your load balancer and does not accept any HTTP traffic.
 2. Run one load balanced Lemmy server without the flag, and all other load balanced servers with the flag.
 
-Option 1 might have a few tiny advantages (easier to isolate logs for the scheduled tasks, and expensive scheduled tasks won't compete with HTTP requests for system resources), but requires an extra process.
+#### Federation queue
+
+The persistent federation queue (since 0.19) is split by federated domain and can be processed in equal-size parts run in separate processes. To split the queue up into N processes numbered 1...N, use the arguments `--federate-process-index=i --federate-process-count=N` on each. It is important that each index is given to exactly one process, otherwise you will get undefined behaviour (missing or duplicated federation, crashes).
+
+Federation processes can be started and stopped at will. They will restart federation to each instance from the last transmitted activity regardless of downtime.
 
 #### Rolling upgrades
 
diff --git a/src/administration/troubleshooting.md b/src/administration/troubleshooting.md
index 793e20f..bfc0465 100644
--- a/src/administration/troubleshooting.md
+++ b/src/administration/troubleshooting.md
@@ -68,4 +68,23 @@ Also ensure that the time is accurately set on your server. Activities are signe
 
 ### Other instances don't receive actions reliably
 
-Lemmy uses a queue to send out activities. The size of this queue is specified by the config value `federation.worker_count`. Very large instances might need to increase this value. Search the logs for "Activity queue stats", if it is consistently larger than the worker_count (default: 64), the count needs to be increased.
+Lemmy uses one queue per federated instance to send out activities. Search the logs for "Federation state" for summaries. Errors will also be logged.
+
+For details, execute this SQL query:
+
+```sql
+select domain,currval('sent_activity_id_seq') as latest_id, last_successful_id,fail_count,last_retry from federation_queue_state
+join instance on instance_id = instance.id order by last_successful_id asc;
+```
+
+You will see a table like the following:
+
+| domain                     | latest_id | last_successful_id | fail_count | last_retry                    |
+| -------------------------- | --------- | ------------------ | ---------- | ----------------------------- |
+| toad.work                  | 6837196   | 6832351            | 14         | 2023-07-12 21:42:22.642379+00 |
+| lemmy.deltaa.xyz           | 6837196   | 6837196            | 0          | 1970-01-01 00:00:00+00        |
+| battleangels.net           | 6837196   | 6837196            | 0          | 1970-01-01 00:00:00+00        |
+| social.fbxl.net            | 6837196   | 6837196            | 0          | 1970-01-01 00:00:00+00        |
+| mastodon.coloradocrest.net | 6837196   | 6837196            | 0          | 1970-01-01 00:00:00+00        |
+
+This will show you exactly which instances are up to date or not.
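
Review note on the federation queue hunk: the per-process flags follow a regular pattern, so the N federation command lines could be generated by a loop instead of being written out by hand, which makes it harder to assign an index twice or skip one. The following is a minimal sketch, assuming a POSIX shell. `federation_commands` is a hypothetical helper name and not part of Lemmy; it only prints the command lines (using the flags shown in the patch) and does not start any process.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lemmy): print one lemmy_server command
# line per federation process, with indexes running from 1 to N so that
# each index is used exactly once.
federation_commands() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "lemmy_server --http-server=false --federate-activities=true --federate-process-index=$i --federate-process-count=$count --disable-scheduled-tasks"
    i=$((i + 1))
  done
}

# Print the commands for a three-way split of the federation queue.
federation_commands 3
```

Each printed line would then be handed to whatever process supervisor is in use; how the processes are actually launched is left open here.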