Federation tests OK but fails to work

Description

I have three Rocket.Chat servers within our company network, serving three different divisions of the business. There are three because each division requires different branding, integrations, etc.
We would like to implement federation so that we can communicate across the three servers.
I have set up federation using DNS, and the federation test button reports success on each server. However, searching for a user on a different server returns no results.

Each server uses a different domain name.
All servers use the same Active Directory for authentication, so anyone can log in to any server.
The servers are behind an nginx reverse proxy, which terminates SSL on port 443 (handling the certificates) and forwards to non-SSL ports (80 or 3000).

Has anyone had any success setting up federation in a similar environment and can offer some tips or suggestions as to where this is failing?

Thanks

Server Setup Information

  • Version of Rocket.Chat Server: 2.11
  • Operating System: Linux (Debian 9)
  • Deployment Method: SNAP
  • Number of Running Instances: 1
  • DB Replicaset Oplog:
  • NodeJS Version: 8.15.1
  • MongoDB Version: 3.6.14
  • Proxy: nginx
  • Firewalls involved: pfSense

Any additional Information

Just revisiting this again as I am getting pressure to have the federation functionality working.

Previously I managed to get one of my Rocket.Chat servers to pass the test with its details published in DNS, but I couldn’t get an almost identical server to pass the DNS test.

I constantly get the following in the log:

I20200304-11:26:53.154(0) server.js:204 Federation ➔ dns.error { Error: failed [503] Service Unavailable
    at Object.exports.makeErrorByStatus (packages/http.js:176:10)
    at Request._callback (packages/http.js:140:24)
    at Request.self.callback (/snap/rocketchat-server/1427/programs/server/npm/node_modules/meteor/http/node_modules/request/request.js:185:22)
    at emitTwo (events.js:126:13)
    at Request.emit (events.js:214:7)
    at Request.<anonymous> (/snap/rocketchat-server/1427/programs/server/npm/node_modules/meteor/http/node_modules/request/request.js:1161:10)
    at emitOne (events.js:116:13)
    at Request.emit (events.js:211:7)
    at IncomingMessage.<anonymous> (/snap/rocketchat-server/1427/programs/server/npm/node_modules/meteor/http/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:139:11)
    at process._tickDomainCallback (internal/process/next_tick.js:219:9)
  response:
    { statusCode: 503,
      content: 'Service Unavailable',
      headers:
        { date: 'Wed, 04 Mar 2020 11:26:53 GMT',
          'content-length': '19',
          'content-type': 'text/plain; charset=utf-8',
          connection: 'close' },
      data: null } }

I ran some external tests against the external DNS zone and discovered that the target in my SRV record was missing its trailing ‘.’, so the record was effectively published with the domain doubled:

_rocketchat._tcp.domain.com.	1H	IN	SRV	1 1 443 chat.domain.com.domain.com

Thinking that was my problem, I fixed it and tested again, but it still failed.
I then checked the same record for the server that had been testing OK, and it had the doubled-domain problem too.
I fixed that one, and now it fails the test as well! Reverting the change doesn’t help: it still fails.
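
For anyone wondering how a missing ‘.’ ends up doubling the domain: in a standard zone file, a name that does not end in a dot is treated as relative and has the zone origin appended. Below is a minimal Python sketch of that rule applied to the SRV data above. The helper names (expand_zone_name, parse_srv_rdata) are made up for illustration and are not part of Rocket.Chat or any DNS library, and the sketch assumes the zone origin is domain.com.

```python
def expand_zone_name(name: str, origin: str) -> str:
    """Return the fully qualified name as a zone file would interpret it."""
    # Normalise the origin so it always ends with a dot (assumption: the
    # caller passes the zone name, e.g. "domain.com" or "domain.com.").
    if not origin.endswith("."):
        origin += "."
    if name.endswith("."):
        return name  # already absolute, left untouched
    if name == "@":
        return origin  # "@" is shorthand for the origin itself
    return f"{name}.{origin}"  # relative name: the origin is appended


def parse_srv_rdata(rdata: str, origin: str) -> dict:
    """Split SRV data 'priority weight port target' and expand the target."""
    priority, weight, port, target = rdata.split()
    return {
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),
        "target": expand_zone_name(target, origin),
    }


# Target without the trailing dot: the origin gets appended.
print(parse_srv_rdata("1 1 443 chat.domain.com", "domain.com.")["target"])
# prints chat.domain.com.domain.com.

# Target with the trailing dot: used as written.
print(parse_srv_rdata("1 1 443 chat.domain.com.", "domain.com.")["target"])
# prints chat.domain.com.
```

This only illustrates the naming rule; it does not explain why the federation test started failing after the record was corrected.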

So something seems to be broken with the whole DNS verification process.
Could it be because my servers are behind a reverse HTTPS proxy?
Or is it because the Rocket.Chat installation has since been updated to the latest version?

Version	2.4.11
Apps Engine Version	1.11.2
Database Migration	170
Database Migration Date	March 4, 2020 12:23 AM

I think the testing/federation discovery service is down. The 503 error in the logs is also seen if you visit https://hub.rocket.chat/ (where you should be able to manually test a domain’s availability with https://hub.rocket.chat/api/v1/peers?search=DOMAIN), and my guess without poking at code is that’s the same service that the test button uses.
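
If you want to check this from a script rather than a browser, here is a rough Python sketch that requests the hub URL mentioned above and reports the HTTP status. The function names (build_hub_url, hub_status, describe) are my own, the interpretation of the status codes is an assumption rather than documented hub behaviour, and I have not confirmed that this is the endpoint the test button actually calls.

```python
import urllib.error
import urllib.parse
import urllib.request

# Endpoint taken from the post above; its response format is not assumed here.
HUB_URL = "https://hub.rocket.chat/api/v1/peers"


def build_hub_url(domain: str) -> str:
    """Build the manual lookup URL for a federation domain."""
    return f"{HUB_URL}?{urllib.parse.urlencode({'search': domain})}"


def hub_status(domain: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> int:
    """Return the HTTP status code of the hub lookup, or 0 if unreachable.

    `opener` is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    try:
        with opener(build_hub_url(domain), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Raised for 4xx/5xx responses, e.g. the 503 seen in the server log.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, timeout, etc.
        return 0


def describe(status: int) -> str:
    """Give a rough, assumed interpretation of the status code."""
    if status == 0:
        return "hub unreachable (network or DNS problem on this side?)"
    if status == 503:
        return "hub returned 503 Service Unavailable (service appears down)"
    if 200 <= status < 300:
        return "hub answered normally"
    return f"hub returned HTTP {status}"
```

Calling describe(hub_status("DOMAIN")) with your own federation domain in place of DOMAIN should show whether the 503 comes from the hub itself rather than from your DNS records or reverse proxy.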

PS I added a bug for it https://github.com/RocketChat/Rocket.Chat/issues/16945

@putt1ck thanks for your reply and I concur with your theory.
Thanks also for raising the bug - I was planning to look into the process for filing a bug.