The Portainer Edge Agent
For standard deployments, we used to assume that the Portainer instance and any environments shared the same network and could communicate seamlessly. If remote environments were on a different network (say, across the Internet) we could not manage them.
Then we changed the Edge agent architecture so only the environments need to access Portainer. There is now no need to expose the Portainer agents to the Internet.
Portainer now requires that only the
8000TCP ports are exposed. We used to serve the UI and the Portainer API from port
9000, but we extended the API to allow the remote agents to poll for instructions. Port
8000is a TLS tunnel server used to create a secure tunnel between the agent and the Portainer instance. More about that soon.
When you create an Edge Agent, you are first asked for a human-friendly endpoint name. You are then asked to confirm the FQDN:PORT of your Portainer instance. This is what agents will use to connect, so make sure it’s correct and that the DNS resolves.
During the creation process, an Edge ID is dynamically generated. This is a random UUID which is assigned to each environment. You can see it in the command syntax which is provided during the setup process.
The Edge ID and the join token are unique per environment. The join token is made up of the following encoded data:
- The Portainer instance API URL. This is how the Edge Agent knows how to ‘call home’ to your Portainer instance.
- The Portainer instance reverse tunnel server address. This is identical to the API URL but with the SSH tunnel server port (
8000is the default).
- The environment identifier key (endpoint ID).
- The Portainer instance reverse-tunnel server fingerprint (prevents MITM when creating a tunnel).
Use the command syntax to deploy an Edge Agent across your remote node or remote swarm cluster.
Agents poll the Portainer instance every 5 seconds by default (this is defined in Portainer settings).
The agent says to Portainer, “Hi, I'm an agent. My join token is
abc123. Do you need me right now?”. Portainer checks its database to ensure the Edge UUID and the join token match. If no UUID can be associated with the join token provided, Portainer will associate the UUID provided by the agent to the environment’s join token.
If the UUID/join token do not match, the connection is rejected. If the UUID/join token match, the Portainer instance responds with either: "No, I don’t need you. Please check in again in X seconds." (where X is the agent polling frequency), or "Yes, I do need you. Please connect using these tunnel credentials.”.
Once confirmation is received, the Edge Agent decrypts the credentials and opens a tunnel on port
8000to the Portainer instance. If a remote environment is a swarm cluster, every node will run an instance of the agent (and every instance will poll Portainer). The 'you are required' flag causes the first agent in the cluster to establish the tunnel. Once in place, Portainer can then query the agent where the tunnel is open. If the tunnel closes for any reason, the agent will immediately re-establish it.
Sometimes Portainer will ask the agent to establish a tunnel. This happens when an admin selects an Edge environment for interactive management via the Portainer UI or the API. Once selected, the 'you are required' flag triggers the connection process. If default settings are in use, it takes about 10 seconds for the agent to poll and establish a tunnel. That’s about 5 seconds wait time until polling then a few seconds for the tunnel to open. The admin is shown this message while this happens:
The agent keeps a record of when Portainer last communicated with it. After 5 minutes of inactivity, it sends a snapshot of the current config to Portainer for its records, closes the tunnel and revokes the credentials. When admins have an active session with an Edge environment, ‘keep alives’ are sent every minute (even if the admin is not performing a task) so they are not kicked out by mistake.
Thousands of endpoints polling Portainer every 5 seconds is a lot. That’s about 324b/second per agent, not per environment. If you don’t do a lot of environment admin, we suggest you go into Portainer settings and increase the polling frequency. Simply change it back when you need to do some admin so you are not kept waiting.
We load-tested Portainer with 15,000 actively connected environments with a polling frequency of 5 seconds. This generated 7Mbps of network traffic to the Portainer instance, and Portainer needed 4 CPUs to handle the encryption/tunnel load. This Edge Agent release is our first attempt at massive-scale centralized management. Our end goal is to reduce the network overhead associated with polling.