Add some more info about debugging federation issues (#328)

* Add some more info about debugging federation issues

* Apply suggestions from code review

Co-authored-by: Richard Schwab <mail@w.tf-w.tf>

* reorder

* prettier

---------

Co-authored-by: Richard Schwab <mail@w.tf-w.tf>
Nutomic 2024-11-08 15:54:34 +01:00 committed by GitHub
parent 5f45cb2a33
commit e2b190d5ff
GPG key ID: B5690EEEBB952194

@@ -64,7 +64,33 @@ curl -H "Accept: application/activity+json" https://your-instance.com/comment/12
Check that [federation is allowed on both instances](federation_getting_started.md).
Also ensure that the time is correct on your server. Activities are signed with a timestamp, and will be discarded if it is off by more than one hour.
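One rough way to check the clock is to compare the local time with the `Date` header returned by a well-maintained HTTPS server and see whether the offset is anywhere near the tolerance described above. The sketch below is not part of Lemmy: the helper names `abs_diff` and `check_clock` are made up for illustration, and it assumes `curl` and GNU `date` (for `date -d`) are available.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Lemmy.

# Print the absolute difference between two Unix timestamps.
abs_diff() {
  local diff=$(( $1 - $2 ))
  echo "${diff#-}"
}

# Compare the local clock with the Date header of a remote HTTPS server.
# Requires curl and GNU date (for `date -d`).
check_clock() {
  local url=$1 remote_date remote_ts
  remote_date=$(curl -sI "$url" | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "date" { print $2 }')
  # Bail out if the server sent no Date header.
  [ -n "$remote_date" ] || return 1
  remote_ts=$(date -d "$remote_date" +%s) || return 1
  echo "clock offset: $(abs_diff "$(date +%s)" "$remote_ts") seconds"
}

# Example: check_clock https://your-instance.com
```

An offset of a few seconds is harmless. A large offset usually means that time synchronisation (for example NTP) is not running on the server.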
It is possible that federation requests to `/inbox` are blocked by tools such as Cloudflare. The sending instance can find HTTP errors with the following steps:
- If you use a [separate container for outgoing federation](horizontal_scaling.md), you need to apply the following steps to that container only
- Set `RUST_LOG=lemmy_federate=trace` for Lemmy
- Reload the new configuration: `docker compose up -d`
- Search for messages containing the target instance domain: `docker compose logs -f --tail=100 lemmy | grep -F lemm.ee -C 10`
- You may also have to reset the fail count for the target instance (see below)
### Reset federation fail count for instance
If sending to a specific instance has been failing consistently, Lemmy slows down further attempts using exponential backoff. For testing, it can be useful to reset this so that Lemmy sends activities immediately. To do so, follow these steps:
- Stop Lemmy, or specifically the container for outgoing federation: `docker compose stop lemmy`
- Enter SQL command line: `sudo docker compose exec postgres psql -U lemmy`
- Reset failure count via SQL:
```sql
update federation_queue_state
set fail_count = 0
from instance
where instance.id = federation_queue_state.instance_id
and instance.domain = 'lemm.ee';
```
- Exit SQL command line with `\q`, then restart Lemmy: `docker compose start lemmy`
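The steps above can also be combined into a small script. This is only a sketch: the function names `reset_fail_count_sql` and `reset_fail_count` are hypothetical, the service names `lemmy` and `postgres` are taken from the commands above, and it passes the statement to `psql` with `-c` instead of using the interactive prompt. Depending on your setup, you may need to prefix the `docker` commands with `sudo`.

```shell
#!/usr/bin/env bash
# Sketch only: the function names are hypothetical; the commands and SQL
# are the ones from the manual steps.

# Print the SQL that resets the fail count for one instance domain.
# Rejects anything that does not look like a plain domain name, since the
# value is interpolated into the SQL statement.
reset_fail_count_sql() {
  local domain=$1
  if [[ ! $domain =~ ^[A-Za-z0-9.-]+$ ]]; then
    echo "invalid domain: $domain" >&2
    return 1
  fi
  cat <<SQL
update federation_queue_state
set fail_count = 0
from instance
where instance.id = federation_queue_state.instance_id
and instance.domain = '$domain';
SQL
}

# Stop Lemmy, run the update, and start Lemmy again.
reset_fail_count() {
  local sql
  sql=$(reset_fail_count_sql "$1") || return 1
  docker compose stop lemmy
  docker compose exec -T postgres psql -U lemmy -c "$sql"
  docker compose start lemmy
}

# Example: reset_fail_count lemm.ee
```

Keeping the SQL generation in its own function makes it possible to print and review the statement before anything is run against the database.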
### Other instances don't receive actions reliably
@@ -95,7 +121,9 @@ https://phiresky.github.io/lemmy-federation-state/site
### You don't receive actions reliably
Because of the Lemmy federation queue, remote Lemmy instances send ActivityPub sync actions to you serially. If your server processes them more slowly than the origin server sends them, your instance will appear in the "lagging behind" section when you view the [lemmy-federation-state](https://phiresky.github.io/lemmy-federation-state/site) page for the remote server.
This can be avoided by setting the config value `federation.concurrent_sends_per_instance` to a value greater than 1 on the sending instance.
Typically, processing an incoming action should take less than 100ms. If it takes longer, this may indicate problems with your database performance or your networking setup.