FreeDMR 2.0 Architecture Decisions

This file records architectural decisions, requirements, assumptions and open questions drawn out during design discussion. It is intended as source material for a later formal FreeDMR 2.0 design document.

Project Philosophy

FreeDMR is open-source, open, intentionally understandable and intentionally simple enough to encourage community implementation, experimentation and operation by radio amateurs.

HBLink proved that a DMR server could be written in an open, readable way without DMR being gatekept by commercial vendors. FreeDMR takes the next step: it proves that a DMR network can be built this way without central control. Before HBLink and FreeDMR, DMR server software and server-level network membership were typically closed, gatekept or dependent on personal/team approval. FreeDMR exists in part to lower that barrier and give radio amateurs choice and freedom to experiment with global-scale ROIP networking.

FreeDMR does not need to gatekeep all private experimentation. The project controls public listing: the process by which servers are shared with Pi-Star and other HBP hotspots as legitimate public access servers. A sysop can run a private server under their own DMR ID and arrange gatewaying with an existing sysop, who effectively vouches for that traffic. Public listing has additional requirements such as connectivity quality, sysop contactability and basic operational expectations.

The FreeDMR mesh design is influenced by the late Bob Bruninga's APRS ideas, Spanning Tree Protocol and related distributed-network approaches. The project also has a social purpose: bringing together communities and people connected to earlier amateur-radio networking work. FreeDMR is therefore both a technical system and a diplomacy project; design choices must respect operational autonomy, interoperability and trust between independent sysops.

FreeDMR is successful because it works in the amateur-radio sense: it is best effort, experimental, approachable and deployable on ordinary low-cost systems such as cheap VPS instances and Raspberry Pi-class hardware. It is not intended to be a safety-assured commercial system. FreeDMR 2.0 should improve quality, clarity and scalability without losing the ham-spirit/hacker-philosophy traits that made the network useful and welcoming.

Design implications:

Prefer clear, inspectable protocols over opaque mechanisms.
Keep the implementation understandable by competent sysops and contributors.
Keep the barrier to compatible implementations low where possible.
Preserve low-cost deployment and modest hardware requirements.
Avoid architectural choices that make FreeDMR dependent on heavyweight infrastructure for ordinary single-server operation.
Treat reliability as best-effort resilience appropriate to amateur radio, not as commercial safety assurance.
Preserve server autonomy and local policy.
Avoid unnecessary central control.
Distinguish private operation, vouched/gatewayed traffic and public listing.
Security should protect authenticity and network integrity without hiding amateur-radio traffic.

Protected Model

The protected asset is the FreeDMR operating model, not the old HBLink-derived object structure.

Preserve:

packet model and protocol behaviour
dial-a-TG semantics
TG/DMR-ID centric routing
loop control
source quench
mesh behaviour
practical RF/network tolerance learned from live servers and real RF links
"everything everywhere" principle, subject to documented exceptions

Replace or redesign where useful:

configured MASTER stanza as primary runtime identity
proxy-mediated client fan-out
global mutable BRIDGES structure as authoritative state
custom dashboard/reporting socket protocol
packet-path coupling to dashboard/API/report consumers

Layer Model

FreeDMR 2.0 should be described as layered:

Access layer: client/server access protocols such as HBP today and possible future non-trunk client protocols. Owns login/auth/options/keepalive, client sessions, slot state and RF-facing TG presentation.
Subscription layer: talkgroup conference membership. Owns direct TG subscriptions, dial-a-TG subscriptions, static/default/user-activated subscriptions, expiry and RF-visible TG to conference TG mapping.
Mesh layer: inter-server FBP/OBP/trunk-style behaviour. Owns loop control, source quench, hop/version handling and inter-server conference traffic.
Reporting layer: local dashboard, API observers, logs, global lastheard export and state snapshots. Reporting is observational and must not steer packet handling.

Reactor and Runtime Migration

Do not replace Twisted as part of the first FreeDMR 2.0 architecture work.

Decision:

Keep Twisted's single-threaded reactor as a safety boundary initially.
Extract and test the protocol/routing/subscription core behind deterministic interfaces.
Introduce explicit process/message boundaries only after the state model is clear.
Consider asyncio or another event loop only once Twisted has become a thin transport shell around tested core logic.

Rationale:

The current packet behaviour is subtle and validated through real RF/network deployment.
Replacing the event loop while also replacing the state model would mix too many sources of behavioural change.
Twisted's single-threaded reactor helps preserve current ordering assumptions while bridge/subscription and reporting boundaries are made explicit.
The first migration target is architectural clarity and scalability, not event loop novelty.

Identity Model

The configured master/listener is not the client identity.

FreeDMR 2.0 should move toward:

listener identity: UDP socket/service instance
client identity: DMR peer/client ID
subscription identity: client ID + slot + RF-visible TG + conference TG
mesh identity: server/peer/network ID

Server identity hierarchy:

FreeDMR server IDs are 4-digit DMR IDs.
Server sub-IDs are 5-digit IDs derived from the server ID space.
Each sysop/server identity may therefore cover up to 10 server sub-IDs (one additional digit) for backend components, larger deployments, failover or fault-tolerant layouts.
Identity verification should cover the base server ID and its authorized sub-IDs rather than requiring unrelated credentials for each sub-ID.
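
The hierarchy above can be illustrated with a small sketch. The document does not state how a sub-ID is derived, so the rule used here (the 4-digit server ID followed by one extra digit) is an assumption chosen only because it yields exactly 10 sub-IDs. The function names are hypothetical.

```python
def server_sub_ids(server_id: int) -> list[int]:
    """Return the 10 candidate 5-digit sub-IDs for a 4-digit server ID."""
    if not 1000 <= server_id <= 9999:
        raise ValueError("expected a 4-digit server ID")
    # ASSUMPTION: a sub-ID is the server ID followed by one extra digit.
    return [server_id * 10 + suffix for suffix in range(10)]


def base_server_id(dmr_id: int) -> int:
    """Map a server ID or sub-ID back to the base server ID it belongs to."""
    if 1000 <= dmr_id <= 9999:
        return dmr_id
    if 10000 <= dmr_id <= 99999:
        # ASSUMPTION: dropping the last digit recovers the base server ID.
        return dmr_id // 10
    raise ValueError("not a server ID or server sub-ID")
```

Under this assumed rule, one verified identity for server 2340 would cover 23400 through 23409, and a lookup of any of those sub-IDs resolves to the same base identity.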

A single master/listener UDP port should serve an arbitrary number of clients directly, replacing the proxy where possible.

Talkgroup Subscription Model

Conceptually, each TG is a conference bridge. Clients subscribe to conference TGs. FreeDMR does not primarily decide where to send user traffic; users choose the traffic they want to hear by subscription.

Subscriptions can be:

direct TG: RF-visible TG equals conference TG
dial-a-TG: RF-visible TG is currently TG9, conference TG is the selected TG
alias/rewrite: RF-visible TG may be any configured TG, conference TG is the FreeDMR network identity

Example:

TalkgroupSubscription(
    client_id=2345001,
    slot=2,
    conference_tg=4400,
    rf_tg=9,
    mode="dial",
    active=True,
)

The invariant is:

conference_tg = FreeDMR network/conference identity
rf_tg         = client-facing RF presentation identity

This makes arbitrary TG rewrites possible without making TG9 structurally special.

Bridge Table Replacement

The legacy BRIDGES dict should be replaced internally by subscription-oriented state and indexes. The "#" reflector naming convention does not need to be preserved internally; it can be a compatibility/export detail.

Recommended hot-path structures:

dict / set for O(1)-style local lookups
typing.NamedTuple keys for readable hash keys
dataclass(slots=True) records for mutable subscription/session state
heapq for expiry timers using lazy invalidation

Recommended indexes:

subscriptions_by_conference_tg[conference_tg] -> set[SubscriptionKey]
subscription_by_rf[(client_id, slot, rf_tg)] -> SubscriptionKey
subscriptions_by_client_slot[(client_id, slot)] -> set[SubscriptionKey]
expiry_heap -> (expires_at, generation, SubscriptionKey)

Packet handlers should not scan all subscriptions/bridges to find routing targets.
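
One possible shape for these indexes is sketched below, using only the standard library. The class and method names are hypothetical, and the policy that a new subscription replaces an existing one on the same client, slot and RF-visible TG is an assumption. The generation counter is what makes heap entries lazily invalid: a renewed or removed subscription leaves its old heap entry in place, and `expire` skips it.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class SubscriptionKey(NamedTuple):
    client_id: int
    slot: int
    rf_tg: int
    conference_tg: int


class SubscriptionIndex:
    def __init__(self) -> None:
        self.by_conference_tg = defaultdict(set)  # conference_tg -> {key}
        self.by_rf = {}                           # (client_id, slot, rf_tg) -> key
        self.by_client_slot = defaultdict(set)    # (client_id, slot) -> {key}
        self.generation = {}                      # key -> current generation
        self.expiry_heap = []                     # (expires_at, generation, key)
        self._next_generation = 0

    def add(self, key, expires_at=None):
        # ASSUMPTION: one subscription per (client, slot, RF-visible TG).
        old = self.by_rf.get((key.client_id, key.slot, key.rf_tg))
        if old is not None:
            self.remove(old)
        self._next_generation += 1
        self.generation[key] = self._next_generation
        self.by_conference_tg[key.conference_tg].add(key)
        self.by_rf[(key.client_id, key.slot, key.rf_tg)] = key
        self.by_client_slot[(key.client_id, key.slot)].add(key)
        if expires_at is not None:
            heapq.heappush(self.expiry_heap,
                           (expires_at, self._next_generation, key))

    def remove(self, key):
        if self.generation.pop(key, None) is None:
            return  # unknown key, nothing to do
        self.by_conference_tg[key.conference_tg].discard(key)
        self.by_rf.pop((key.client_id, key.slot, key.rf_tg), None)
        self.by_client_slot[(key.client_id, key.slot)].discard(key)

    def expire(self, now):
        """Remove and return subscriptions whose expiry time has passed."""
        expired = []
        while self.expiry_heap and self.expiry_heap[0][0] <= now:
            _, generation, key = heapq.heappop(self.expiry_heap)
            # Lazy invalidation: skip entries superseded by a later add/remove.
            if self.generation.get(key) == generation:
                self.remove(key)
                expired.append(key)
        return expired

    def targets(self, conference_tg):
        """Packet path: one dict lookup, no scan over all subscriptions."""
        return self.by_conference_tg.get(conference_tg, set())
```

Renewing a dial-a-TG subscription is then just a second `add` with a later `expires_at`; the stale heap entry costs one skipped pop when its time arrives.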

Packet Plane vs Control Plane

The packet plane is delay-sensitive.

Packet-plane rules:

local in-memory hot state only
no external database round trips
no blocking API/dashboard/report calls
no cross-process lock waits
no dependency on reporting consumers being connected

External stores may be used for:

config distribution
API/dashboard state
control-plane coordination
snapshots
global lastheard export
optional clustering/multi-process coordination

General performance principle:

Expensive processing should be considered for offload to separate processes because CPython execution is constrained by the GIL for CPU-bound Python code.
Offload is appropriate for reporting fanout, global export, dashboard aggregation, historical database writes, heavy analytics, expensive transcoding/codec experiments and non-critical maintenance jobs.
Offload boundaries must be asynchronous from the packet path. If an offload worker is slow or unavailable, packet handling must continue with local state.
Do not offload hot-path routing decisions if doing so would add inter-process, network or lock waits to every packet.

DMR Data Packet Policy

FreeDMR must maintain DMR data packet forwarding support.

Decision:

FreeDMR should forward supported DMR data packets according to the same conference/subscription and mesh principles as other traffic.
There must be no regression in existing data packet forwarding support.
FreeDMR core should not become an application-level DMR data processor.
GPS, SMS and similar application processing should be implemented by systems connected via FBP or another mesh/access-adjacent interface.
DATA_GATEWAY is understood as an earlier expression of this model: an FBP link that carries data-oriented traffic rather than ordinary voice traffic.
Existing SUB_MAP behaviour is intentional: data addressed to a DMR ID can be routed toward the last known HBP/client location for that DMR ID.
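
The last-known-location idea behind SUB_MAP can be sketched as a small in-memory map. This is not the existing SUB_MAP structure; the class name, the fields stored and the ageing rule (`max_age`) are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    client_id: int  # HBP client the subscriber was last heard through
    slot: int
    seen_at: float


class SubscriberLocationMap:
    def __init__(self, max_age: float = 3600.0) -> None:
        self._max_age = max_age  # ASSUMPTION: stale locations age out
        self._last_seen: dict[int, Location] = {}

    def learn(self, dmr_id: int, client_id: int, slot: int, now: float) -> None:
        """Record where traffic from this DMR ID was most recently seen."""
        self._last_seen[dmr_id] = Location(client_id, slot, now)

    def locate(self, dmr_id: int, now: float) -> Optional[Location]:
        """Return the last known location, or None if unknown or too old."""
        loc = self._last_seen.get(dmr_id)
        if loc is None or now - loc.seen_at > self._max_age:
            return None
        return loc
```

Both operations are single dict accesses, so maintaining the map is compatible with the packet-plane rules described earlier.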

Core FreeDMR may inspect/classify data packets only as needed for:

packet admission and protocol validation
routing/subscription decisions
loop control and source quench
reporting/logging
preserving packet bytes and metadata across FBP/HBP boundaries
maintaining the subscriber location map needed for data-client routing

Possible narrow exceptions:

dial-a-TG control via DMR SMS
DMR SMS alerts from a server to a sysop

Any such exceptions must be explicit control-plane features and must not turn FreeDMR core into a general GPS/SMS application processor.

Mesh Peer Authentication

FreeDMR should only accept mesh/FBP traffic from servers that can be validated as legitimate members of the network.

Core principle:

FreeDMR may sign/authenticate traffic and control messages, but should not encrypt amateur-radio traffic or mesh traffic by default.
Amateur radio is public in most jurisdictions and encryption is often not permitted. FreeDMR users may also carry IP backhaul over amateur radio links.
FreeDMR's security model is authenticity, integrity, membership validation and local policy enforcement, not secrecy.
This follows the existing FreeDMR principle, agreed historically by project maintainers, that the network has nothing to hide and should remain cleartext.

Identity/listing distinction:

Signed mesh identity should prove a server/sysop identity or a vouching relationship. It should not automatically imply public listing.
Public listing is a directory/discovery decision for clients and HBP hotspots.
A public access server may need stronger operational requirements than a private or gatewayed server.
Local sysops may still choose whether to carry/vouch for traffic from private servers, even when those servers are not publicly listed.
If an individual 7-digit DMR ID is used as a server identity, traffic may pass when a directly connected/listed sysop chooses to allow and gateway it.
The vouching sysop is accountable to their peers for traffic they forward. If that traffic harms the network, peers may choose to stop peering with the vouching server. This preserves a self-policing social mechanism without requiring central control for all private experimentation.

Analogue network bridges:

Analogue ROIP/network bridges commonly connect as if they are DMR clients via HBP.
FreeDMR permits this and is generally more permissive than many other DMR networks.
FreeDMR works with/supports the DVSwitch community on this. DVSwitch provides a common mechanism by which analogue networks can be bridged into DMR-style access.
These bridges are operationally sensitive: technical limitations can make them effectively listen-only, consuming CPU and bandwidth while adding little value if they do not contribute actual two-way user activity.
Analogue bridges are often implemented using audio mixing/conference style behaviour. This is a poor fit for DMR and similar digital modes, which enforce one audio source at a time and rely on stream, hang-time and contention behaviour rather than mixed audio.
This mismatch comes partly from analogue repeater heritage: analogue systems may maintain a continuous transmit carrier and mix notification sounds such as pips, CWID and courtesy tones into the output audio. Analogue systems also often have little or no strong source identity, whereas DMR traffic carries a DMR ID.
A common failure mode is that a feed from an analogue repeater keeps the DMR stream open between analogue overs, plays courtesy/notification tones and then carries the next analogue user in the same held stream. This can hold the TG open and prevent a digital station from breaking in until the analogue repeater times out and its carrier drops.
Analogue bridges should therefore be subject to local sysop policy, public listing expectations and peer accountability. Permitted does not mean automatically valuable or immune from peering/listing consequences.

Other digital network bridges:

Digital voice networks such as YSF and NXDN are generally a better technical match for DMR than analogue networks because they also use AMBE-family vocoder audio.
AMBE-to-AMBE interworking can be lossless at the codec level and avoids transcoding artifacts.
Transcoding from analogue or unlike codecs can degrade audio quality significantly and should be treated carefully.

Desired direction:

Add PKI-backed mesh peer admission to the Bridge Control (BCXX) mechanism.
A peer server presents public identity material signed by a FreeDMR network master key or trusted network CA.
The authenticated identity must bind at least:
- server ID
- authorized server sub-IDs
- public key
- validity period
- permitted protocol/features where useful
Runtime admission should bind the authenticated server identity to the observed transport endpoint, including IP address.
If the observed IP address changes, the FBP peer must perform a new key exchange/authentication step before its traffic is forwarded.
Network membership should be represented by a signed sysop/server key that is issued when the sysop/server joins the network and revoked when they leave or are compromised. Runtime endpoint/session bindings are renewed separately and do not require re-signing the long-lived membership key.
One successful verification of the signed identity should authorize the covered server ID and declared/authorized sub-IDs for that sysop, subject to local policy and endpoint/session binding.

Packet-plane rule:

Expensive signature/certificate validation happens during control-plane admission or re-admission, not for every DMR packet.
Per-packet mesh traffic should use a cached authenticated peer/session state check keyed by server ID and endpoint.

Initial conceptual flow:

FBP peer connects/sends keepalive
  -> BC auth exchange presents signed server identity/public key
  -> FreeDMR validates signature against trusted network key
  -> FreeDMR binds server_id + endpoint + protocol features to peer session
  -> DMR traffic is accepted only while that authenticated binding is valid

Security requirements:

Reject unauthenticated FBP traffic by default once this mode is enabled.
Reject traffic where server ID, key identity and source endpoint do not match the authenticated binding.
Expire authenticated bindings and require renewal.
Support soft renewal: when an authenticated binding reaches its renewal timestamp, schedule asynchronous re-authentication while allowing a bounded grace period so in-flight voice is not interrupted purely by renewal timing.
Hard-stop forwarding only for explicit authentication failure, revoked identity/key, endpoint mismatch outside policy, expired grace period, or policy requiring immediate re-authentication.
Log authentication failure reasons clearly without leaking private material.
Provide a controlled transition mode for existing networks while PKI is rolled out.

Open questions:

Whether to use X.509 certificates, raw Ed25519 public keys with signed metadata, or another compact identity format.
How network master keys/CAs are generated, rotated and revoked.
Whether peer authorization policy should live in config, MQTT/control-plane state, or a signed network membership list.
How to handle legitimate dynamic-IP servers without weakening endpoint binding.
What renewal and grace-period defaults best preserve voice continuity without weakening mesh admission.

Distributed Key Gossip Option

FreeDMR may also use a peer-to-peer signed-key dissemination mechanism over the Bridge Control (BCXX) out-of-band channel.

Concept:

Each server periodically advertises the signed server public keys/membership documents it knows to its direct FBP peers.
Peers validate the signatures and build a local table of legitimate server identities as knowledge propagates through the mesh.
Each server uses its local signed-key table and local policy to decide whether to route or reject packets that originated from a given source server, even when that source server is not directly connected.

Rationale:

FreeDMR is a peer network, not hub-and-spoke or master/slave.
Servers are autonomous and independently operated.
Direct FBP peers should not be blindly trusted to make correct routing decisions on behalf of the local server.
Open-source, human-readable code deliberately lowers the barrier to modification, so each server must be able to protect itself from incorrect or malicious upstream forwarding decisions.

Security requirements for key gossip:

Only signed membership documents are accepted; peers cannot create trust by merely repeating a key.
Membership documents need issuer, subject server ID, public key fingerprint, authorized sub-IDs, validity period, serial/version and signature.
Revocation data must propagate by the same or a stronger mechanism.
Each server must enforce local policy after validation. A valid signed key proves membership, not mandatory carriage.
Key gossip must be rate-limited and bounded so it cannot become a BCXX flood or memory-growth vector.
Received membership data must be replay-resistant enough to handle expiry, superseded serials and revoked keys.
The packet path must use cached key/policy state; signature validation and gossip processing are control-plane work.

This complements direct-peer endpoint authentication. Direct-peer auth proves the connected FBP peer is legitimate for this session; distributed signed-key knowledge lets the local server make autonomous decisions about traffic whose source server is elsewhere in the mesh.

Reporting Protocol Decision

FreeDMR 2.0 should define a structured reporting event protocol and use MQTT as the preferred external live reporting transport.

Rationale:

MQTT is already familiar in DMR network dashboard/reporting contexts.
BrandMeister uses MQTT, providing a useful precedent for dashboard consumers.
MQTT topics map naturally to server/client/subscription/call state.
Retained messages are useful for current state snapshots.
Last Will and Testament can represent server/reporting disconnects.
MQTT-over-WebSocket allows browser dashboards to subscribe directly when the broker supports it.

Constraints:

MQTT publishing must be asynchronous from the packet worker.
Packet routing must continue if the MQTT broker/dashboard is down.
Event generation must be state-change/summary oriented, not per DMR frame.
The event schema is the compatibility contract; internal Python objects are not.
Local live dashboard and central global lastheard remain separate paths.
Voice stability takes precedence over reporting completeness. If the system must choose between losing reporting events and delaying packet handling, it must drop or coalesce reporting events.

Implementation requirement:

packet path -> non-blocking local event queue -> MQTT publisher worker

The packet path must not call an MQTT broker synchronously. The local event queue should be bounded. On overflow, the publisher layer should drop or coalesce low-priority events and emit a later reporting-health event rather than blocking packet handling.
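
A bounded, non-blocking queue with coalescing could be sketched as follows. It assumes the packet path and the publisher worker run on the same reactor thread (a threaded worker would need locking around these calls). The class name, the `reporting.health` event type and the default limit are hypothetical.

```python
from collections import deque


class ReportingQueue:
    def __init__(self, health_topic: str, max_events: int = 1000) -> None:
        self._health_topic = health_topic
        self._max_events = max_events
        self._state = {}        # topic -> latest payload (coalesced)
        self._events = deque()  # transient (topic, payload) pairs
        self.dropped = 0

    def put_state(self, topic, payload) -> None:
        """Packet path: overwrite, so only the latest state per topic is kept."""
        self._state[topic] = payload

    def put_event(self, topic, payload) -> bool:
        """Packet path: never blocks; counts a drop when the queue is full."""
        if len(self._events) >= self._max_events:
            self.dropped += 1
            return False
        self._events.append((topic, payload))
        return True

    def drain(self) -> list:
        """Publisher worker: take everything as (topic, payload, retain)."""
        batch = [(t, p, True) for t, p in self._state.items()]
        batch += [(t, p, False) for t, p in self._events]
        if self.dropped:
            # Report the loss later instead of delaying packet handling.
            health = {"type": "reporting.health", "dropped": self.dropped}
            batch.append((self._health_topic, health, False))
        self._state.clear()
        self._events.clear()
        self.dropped = 0
        return batch
```

The state map is bounded by the number of distinct state topics rather than by event rate, which is why only the transient queue needs an explicit limit in this sketch.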

Suggested event priority:

retain/coalesce latest state: server/client/slot/subscription state
keep best effort: call start/end summaries
drop first under pressure: high-volume debug/warning/statistical updates

MQTT publishing should support reconnect with exponential backoff and should refresh retained state after reconnect so a dashboard can recover even if transient events were missed.
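
The reconnect schedule can be illustrated independently of any MQTT client library. The generator below is a sketch with assumed defaults (1 second base, 60 second cap, 10% jitter); it only produces delays, and the caller would sleep or schedule a reconnect attempt for each value and republish retained state once connected.

```python
import itertools
import random


def backoff_delays(base=1.0, cap=60.0, jitter=0.1, rng=random.random):
    """Yield reconnect delays in seconds: doubling, capped, with jitter."""
    delay = base
    while True:
        # Jitter adds up to `jitter` (a fraction) so that many servers
        # reconnecting to one broker do not retry in lockstep.
        yield min(cap, delay * (1.0 + jitter * rng()))
        delay = min(cap, delay * 2.0)


# First eight delays with jitter disabled, to show the doubling and the cap.
first = list(itertools.islice(backoff_delays(jitter=0.0), 8))
```

With jitter disabled, `first` is 1, 2, 4, 8, 16, 32, 60, 60 seconds.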

Suggested MQTT namespace:

freedmr/v2/{server_id}/state
freedmr/v2/{server_id}/client/{client_id}/state
freedmr/v2/{server_id}/client/{client_id}/slot/{slot}/activity
freedmr/v2/{server_id}/subscription/{subscription_id}/state
freedmr/v2/{server_id}/call/{stream_id}/start
freedmr/v2/{server_id}/call/{stream_id}/end
freedmr/v2/{server_id}/mesh/{peer_id}/state
freedmr/v2/{server_id}/event

Use retained messages for current state:

server state
client state
slot activity
subscription state
mesh peer state

Use non-retained messages for transient events:

call start/end
loop-control event
source-quench event
packet-rate/loss summary
warnings

Example event:

{
  "version": 2,
  "event_id": 1849281,
  "type": "call.started",
  "timestamp": 1710000000.123,
  "server_id": 234099,
  "client_id": 2345001,
  "slot": 2,
  "conference_tg": 4400,
  "rf_tg": 9,
  "source_id": 2351234,
  "stream_id": 16909060,
  "access": "hbp"
}
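
To show how the example event relates to the suggested namespace, the sketch below derives a topic and payload from it. The helper names are hypothetical and the compact, key-sorted JSON encoding is an assumption; the document fixes neither.

```python
import json


def call_topic(server_id: int, stream_id: int, phase: str) -> str:
    """Build a call topic from the suggested namespace."""
    if phase not in ("start", "end"):
        raise ValueError("phase must be 'start' or 'end'")
    return f"freedmr/v2/{server_id}/call/{stream_id}/{phase}"


def encode_call_started(event: dict) -> tuple[str, bytes, bool]:
    """Return (topic, payload, retain) for a call.started event."""
    topic = call_topic(event["server_id"], event["stream_id"], "start")
    payload = json.dumps(event, separators=(",", ":"),
                         sort_keys=True).encode("utf-8")
    retain = False  # call start/end are transient events, not retained state
    return topic, payload, retain


event = {
    "version": 2,
    "event_id": 1849281,
    "type": "call.started",
    "timestamp": 1710000000.123,
    "server_id": 234099,
    "client_id": 2345001,
    "slot": 2,
    "conference_tg": 4400,
    "rf_tg": 9,
    "source_id": 2351234,
    "stream_id": 16909060,
    "access": "hbp",
}
topic, payload, retain = encode_call_started(event)
```

Here `topic` is `freedmr/v2/234099/call/16909060/start`, and the payload round-trips through `json.loads` back to the original event.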

Dashboard delivery options:

preferred: dashboard subscribes to MQTT over WebSocket
alternative: local reporting sidecar translates MQTT to SSE/HTTP
control actions should use authenticated HTTP APIs unless a future UI needs bidirectional streaming

Local Dashboard vs Global Lastheard

Each FreeDMR server has its own local live dashboard. The global lastheard service is centrally hosted and non-real-time.

Local dashboard:

consumes local MQTT live state/events
displays current client/repeater traffic
must tolerate reconnects and missed transient events by reloading retained state topics

Global lastheard:

consumes call summaries or batched exports
should not depend on packet-plane or dashboard delivery
should tolerate central outage via spool/retry

Possible MQTT global feed:

Each server publishes local live dashboard topics to a local broker or local reporting service.
Prefer a separate exporter process for the curated global feed. The exporter subscribes to the same local real-time MQTT feed as the dashboard, filters and summarizes what is needed, then publishes to the network MQTT broker or writes to the global collector.
The exporter publishes only summary topics needed for the 30-day database, such as call end summaries, client/server presence, selected mesh health and selected subscription changes.
Raw packet events and high-volume live slot updates should not be exported to the global broker by default.
Central broker, global dashboard or exporter failure must not back up into local packet processing or local dashboard state.

Preferred flow:

FreeDMR core -> local MQTT feed -> local dashboard
                              -> global-exporter process -> network MQTT/collector

Core publishing invariant:

FreeDMR core emits each reporting event once to its configured local MQTT broker/publisher queue.
Fanout to dashboards, exporters, automation and global collectors is handled by the MQTT broker and separate subscriber processes.
Adding more reporting consumers must not increase FreeDMR packet-process work beyond the single local event emission.

Suggested global MQTT topics:

freedmr/v2/global/{server_id}/call/end
freedmr/v2/global/{server_id}/client/state
freedmr/v2/global/{server_id}/server/state
freedmr/v2/global/{server_id}/mesh/state

Reporting Event Types

Initial event families:

server.started
server.stopping
client.connected
client.disconnected
client.options_changed
subscription.activated
subscription.deactivated
subscription.expired
call.started
call.ended
call.lost
mesh.peer_up
mesh.peer_down
mesh.source_quench
loop.detected
packet.rate_limited

Open Questions

Which MQTT broker should be packaged by default: Mosquitto, EMQX, NATS MQTT compatibility, or another option?
Should MQTT be mandatory for FreeDMR 2.0 dashboards, or optional with an embedded/local fallback?
What authentication/authorization model should protect MQTT topics and dashboard control APIs?
What retained-topic expiry policy should be used to prevent stale state?
Should global lastheard consume MQTT directly or use a separate HTTP/queue exporter fed from reporting events?
Should FreeDMR expose a legacy BRIDGES compatibility view during migration?
