Demystifying Push Notifications: The Complete Guide
A comprehensive theoretical exploration of push notifications: how they work, the infrastructure behind them, why mobile devices handle them so well, the services involved, and the fascinating journey from simple alerts to the sophisticated notification systems we use today.
Every day, billions of push notifications travel across the planet. A message from a friend. A breaking news alert. A reminder about an abandoned shopping cart. A security code for two-factor authentication. These small rectangles of information have become so ubiquitous that we rarely stop to consider the remarkable engineering that makes them possible.
When you receive a notification on your phone, what actually happens? How does a server on the other side of the world know how to reach your specific device? Why do notifications arrive even when apps are closed? How do mobile operating systems manage to receive messages for hundreds of apps without draining the battery in minutes? What infrastructure exists between the moment a company decides to send you a notification and the moment your phone buzzes?
The answers to these questions reveal one of the most sophisticated pieces of infrastructure in modern computing—a global system of persistent connections, message queues, device registries, and carefully optimized protocols that work together to deliver billions of messages daily with remarkable reliability.
This article is a deep theoretical exploration of push notifications. We will trace their history from simple polling mechanisms to today’s sophisticated real-time systems. We will examine the architecture that makes them work, understand why mobile devices handle them so efficiently, compare the major services that power them, and look at where this technology is heading. The focus is on the mental models you need to truly understand how push notifications work.
The Evolution of Instant Communication
Before we can appreciate push notifications, we must understand what came before them and why the transition was necessary. The story of push notifications is really the story of the internet’s evolution from a pull-based to a push-based communication model.
The Early Internet: You Had to Ask
In the early days of the internet, all communication was pull-based. If you wanted to know if you had email, you opened your email client and it checked the server. If you wanted news, you visited a news website. If you wanted to know if your friend was online, you logged into a chat service. The user always initiated the communication.
This model worked reasonably well when people accessed the internet from desktop computers for defined periods. You would sit down, check your email, browse some websites, and disconnect. The internet was something you visited, not something that was always with you.
sequenceDiagram
participant U as User
participant C as Client App
participant S as Server
Note over U,S: Pull-Based Model (1990s)
U->>C: "Do I have new mail?"
C->>S: Check for new messages
S-->>C: Here are your messages
C-->>U: Display messages
Note over U,S: User must explicitly ask
Note over U,S: No updates between checks
Note over U,S: Server is passive
The Problem with Polling
As internet usage grew and applications became more interactive, developers needed ways to show users updates in real-time. The first solution was polling—having the client application repeatedly ask the server if anything new had happened.
Polling is conceptually simple. Every few seconds (or minutes), the client sends a request to the server: “Anything new?” The server responds with either new data or an indication that nothing has changed. If you set the polling interval short enough, updates appear nearly instantaneously.
But polling has fundamental problems that become severe at scale.
Wasted Resources
In most applications, most polling requests return nothing. If you check for new messages every 5 seconds but only receive a message every few hours, 99.9% of your requests are wasted. Each request consumes bandwidth, server processing, and battery power—all for no useful information.
Latency vs. Efficiency Tradeoff
If you poll every 30 seconds, updates can be delayed by up to 30 seconds. If you poll every second, updates are fast but you are making 3,600 requests per hour per user. There is no good middle ground: either updates are slow or resources are wasted.
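The tradeoff can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `polling_stats` and the choice of one message every three hours (standing in for "every few hours") are assumptions, not figures from any real service.

```python
def polling_stats(interval_s: float, seconds_between_messages: float) -> dict:
    """Estimate request volume and waste for fixed-interval polling."""
    requests_per_hour = 3600 / interval_s
    # Requests made during the gap between two real messages.
    requests_per_message = seconds_between_messages / interval_s
    # At most one of those requests returns data; the rest come back empty.
    wasted_fraction = 1 - 1 / requests_per_message
    return {
        "requests_per_hour": requests_per_hour,
        "worst_case_delay_s": interval_s,
        "wasted_fraction": wasted_fraction,
    }


# Assumed traffic pattern: one real message every three hours.
five_second = polling_stats(interval_s=5, seconds_between_messages=3 * 3600)
one_second = polling_stats(interval_s=1, seconds_between_messages=3 * 3600)
print(five_second)
print(one_second)
```

Shortening the interval lowers the worst-case delay but raises both the request count and the wasted fraction, which is the tension described above.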
Battery Drain
For mobile devices, polling is devastating. Each network request requires waking the radio, which consumes significant power. A phone polling multiple servers for multiple apps would drain its battery in hours.
flowchart TB
subgraph Polling["Polling Model Problems"]
direction TB
subgraph Waste["Resource Waste"]
R1[Request 1: No data]
R2[Request 2: No data]
R3[Request 3: No data]
R4[Request 4: No data]
R5[Request 5: NEW DATA!]
R6[Request 6: No data]
end
subgraph Battery["Battery Impact"]
Wake[Wake Radio] --> Send[Send Request]
Send --> Wait[Wait Response]
Wait --> Process[Process Response]
Process --> Sleep[Sleep Radio]
Sleep --> Wake
end
subgraph Scale["Scale Problem"]
U1[User 1: 720 req/hour]
U2[User 2: 720 req/hour]
U3[User 3: 720 req/hour]
UN[User N: 720 req/hour]
U1 & U2 & U3 & UN --> Server[Server Overwhelmed]
end
end
Long Polling: A Better Approach
Long polling emerged as an improvement. Instead of the server immediately responding “no new data,” it holds the connection open until new data arrives or a timeout occurs. When data arrives, it is sent immediately. When the timeout occurs, the client reconnects.
This dramatically reduces the number of requests while improving latency. New data arrives as soon as it exists, not at the next polling interval. Server load decreases because most connections sit idle waiting for data rather than processing empty requests.
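A minimal in-process sketch of the idea, assuming a thread-safe queue stands in for the server's source of new data. The names `long_poll` and `mailbox` are hypothetical, and a real server would hold an HTTP request open rather than block a function call.

```python
import queue
import threading


def long_poll(mailbox: queue.Queue, timeout_s: float) -> dict:
    """Hold the request until data arrives or the timeout passes."""
    try:
        return {"status": "data", "body": mailbox.get(timeout=timeout_s)}
    except queue.Empty:
        return {"status": "timeout", "body": None}


mailbox = queue.Queue()
# Simulate a message arriving 0.1 s after the client starts waiting.
threading.Timer(0.1, mailbox.put, args=["hello"]).start()

first = long_poll(mailbox, timeout_s=2.0)   # returns as soon as data exists
second = long_poll(mailbox, timeout_s=0.2)  # nothing arrives, so it times out
print(first)
print(second)
```

The first call returns well before its two-second timeout because the response is sent the moment data exists; the second call shows the timeout path, after which a client would reconnect.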
sequenceDiagram
participant C as Client
participant S as Server
Note over C,S: Long Polling Model
C->>S: Any new data? (Connection stays open)
Note over S: Server waits...
Note over S: ...and waits...
Note over S: Data arrives!
S-->>C: Here's your new data!
C->>S: Any new data? (New connection)
Note over S: Timeout after 30 seconds
S-->>C: No new data (timeout)
C->>S: Any new data? (Reconnect immediately)
Long polling was better, but still imperfect. Each user requires a dedicated server connection. Connections can drop and require reconnection. Mobile devices still need to maintain network activity. It was a bridge technology, not a final solution.
WebSockets: True Bidirectional Communication
WebSockets, standardized in 2011, provided true bidirectional communication. After an initial HTTP handshake, the connection upgrades to a persistent socket where both client and server can send messages at any time without the overhead of HTTP headers.
For web applications, WebSockets revolutionized real-time communication. Chat applications, collaborative tools, live dashboards—all became possible with efficient, low-latency bidirectional messaging.
But WebSockets still require the client to maintain an active connection. For a browser tab, this is fine. For a mobile app, it is problematic. An app cannot maintain a WebSocket connection when it is not running, and keeping connections alive in the background drains battery and consumes data.
flowchart LR
subgraph Evolution["Evolution of Real-Time Communication"]
direction LR
P[Polling<br/>1990s] --> LP[Long Polling<br/>2000s]
LP --> WS[WebSockets<br/>2011]
WS --> Push[Push Notifications<br/>2009+]
end
subgraph Characteristics["Key Characteristics"]
direction TB
PC[Polling:<br/>Simple but wasteful]
LPC[Long Polling:<br/>Better but still overhead]
WSC[WebSockets:<br/>Efficient but needs<br/>active connection]
PushC[Push:<br/>Works when app<br/>is closed]
end
The Mobile Revolution Changes Everything
The introduction of the iPhone in 2007 and the App Store in 2008 fundamentally changed computing. Suddenly, people carried internet-connected computers everywhere. Apps needed to notify users about events even when the app was not open.
But mobile devices faced unique constraints. Battery life is precious. Cellular data has costs. Memory and processing power are limited. Running background processes for every installed app would make phones unusable.
The solution required a paradigm shift: instead of each app maintaining its own connection to its server, the operating system would maintain a single connection to a centralized service, and that service would relay messages to the appropriate apps.
This was the birth of modern push notifications.
How Push Notifications Actually Work
Push notifications seem simple from the user’s perspective—a message appears on the screen. But behind that simple experience is a sophisticated system involving multiple parties, persistent connections, cryptographic tokens, and carefully designed protocols.
The Core Architecture
Understanding push notifications requires understanding the key players and their relationships.
The Application Server
This is the backend system that decides when to send notifications and what they should contain. It might be a social media platform deciding to notify you about a new message, an e-commerce site reminding you about an abandoned cart, or a news organization alerting you about breaking news.
The application server does not communicate directly with user devices. It cannot—there are too many devices, they are frequently offline, their network addresses change constantly, and establishing direct connections would be impractical.
The Push Service
The push service is the intermediary that makes everything work. Apple runs Apple Push Notification Service (APNs) for iOS devices. Google runs Firebase Cloud Messaging (FCM) for Android devices. Microsoft runs Windows Push Notification Services (WNS) for Windows devices. Mozilla and others provide push services for web browsers.
These services maintain persistent connections to billions of devices worldwide. They accept messages from application servers and route them to the correct devices. They handle the complexity of device connectivity, message queuing, and delivery confirmation.
The Client Application
This is the app installed on the user’s device. It registers with the push service to receive a unique token, shares that token with the application server, and handles incoming notifications when they arrive.
The Operating System
The operating system is the unsung hero. It maintains the persistent connection to the push service, receives incoming messages, routes them to the correct application, and displays notifications to the user. It does this while aggressively optimizing for battery life.
flowchart TB
subgraph YourBackend["Your Application Server"]
Logic[Business Logic]
Queue[Notification Queue]
Logic --> Queue
end
subgraph PushProvider["Push Service Provider"]
direction TB
API[Push API]
Router[Message Router]
Connections[Device Connections]
Storage[Message Storage]
API --> Router
Router --> Connections
Router --> Storage
end
subgraph Device["User's Device"]
direction TB
OS[Operating System]
PushClient[Push Client]
YourApp[Your App]
Notification[Notification Display]
PushClient --> OS
OS --> YourApp
OS --> Notification
end
Queue -->|HTTPS Request| API
Connections <-->|Persistent Connection| PushClient
style PushProvider fill:#e1f5fe
style Device fill:#e8f5e9
The Registration Flow
Before any notifications can be sent, devices must register with the push service. This registration process establishes the connection that will be used for future notifications.
When a user installs your app and opens it, the app asks the operating system for a push notification token. The operating system, which already has a connection to the push service, requests a unique token for this app on this device. The push service generates this token and returns it through the operating system to your app.
Your app then sends this token to your application server. Your server stores the token, associating it with the user’s account. Now, whenever you want to send a notification to this user, you can use this token to identify their device.
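On the server side, the storage step can be as simple as a mapping from user to tokens. The sketch below is an in-memory illustration under that assumption; `TokenRegistry` and its methods are hypothetical names, and a production system would persist this in a database.

```python
from collections import defaultdict


class TokenRegistry:
    """Maps each user ID to the push tokens of that user's app installs."""

    def __init__(self):
        self._tokens_by_user = defaultdict(set)

    def register(self, user_id: str, token: str) -> None:
        # A set makes re-registering the same token idempotent.
        self._tokens_by_user[user_id].add(token)

    def tokens_for(self, user_id: str) -> set:
        # Return a copy so callers cannot mutate the registry by accident.
        return set(self._tokens_by_user.get(user_id, ()))


registry = TokenRegistry()
registry.register("user-42", "abc123xyz")
registry.register("user-42", "abc123xyz")  # app re-sends its token on launch
registry.register("user-42", "def456uvw")  # same user, second device
print(registry.tokens_for("user-42"))
```

Storing a set of tokens per user, rather than a single token, reflects that one account is often signed in on several devices.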
sequenceDiagram
participant App as Your App
participant OS as Operating System
participant Push as Push Service
participant Server as Your Server
Note over App,Server: Device Registration Flow
App->>OS: Request push permission
OS-->>App: User grants permission
App->>OS: Request push token
OS->>Push: Generate token for this app/device
Push-->>OS: Here's the unique token
OS-->>App: Token: "abc123xyz..."
App->>Server: Register token for user
Server->>Server: Store: User ID → Token
Server-->>App: Registration confirmed
Note over Server: Server now knows how to<br/>reach this user's device
The Token: Your Device’s Address
The push token is a crucial piece of the puzzle. It is a unique identifier that tells the push service exactly which device should receive a message. But it is not a simple device ID—it is a carefully designed construct with important properties.
Uniqueness: Each token uniquely identifies a specific app on a specific device. If you install the same app on two phones, each gets a different token. If you install two different apps on the same phone, each gets a different token.
Opacity: The token’s internal structure is not meant to be parsed by your application. It is a blob of data that only the push service understands.
Expiration: Tokens can expire or become invalid. When a user uninstalls an app, reinstalls the operating system, or gets a new device, old tokens stop working. Your server must handle token invalidation gracefully.
Privacy: The token does not reveal the user’s phone number, email, or identity. It only allows sending notifications to that specific app installation.
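The expiration property has a practical consequence for the server: tokens reported as invalid should be dropped. The sketch below assumes a hypothetical `send` callable that returns "ok" or "invalid"; real push services report invalid tokens in their own response formats.

```python
def send_and_prune(tokens: set, send) -> set:
    """Send to each token and return the tokens that are still valid.

    `send` is a hypothetical callable returning "ok" or "invalid".
    """
    still_valid = set()
    for token in tokens:
        if send(token) == "invalid":
            continue  # drop it so we stop sending to a dead installation
        still_valid.add(token)
    return still_valid


# Stand-in for a push service that has invalidated one token.
def fake_send(token: str) -> str:
    return "invalid" if token == "def456" else "ok"


remaining = send_and_prune({"abc123", "def456", "ghi789"}, fake_send)
print(sorted(remaining))
```

Because the token is opaque, the server never inspects it; it only reacts to what the push service says about it.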
flowchart TB
subgraph TokenCharacteristics["Push Token Properties"]
direction TB
subgraph Identity["Unique Identity"]
D1[Device 1 + App A] --> T1[Token: abc123]
D1a[Device 1 + App B] --> T2[Token: def456]
D2[Device 2 + App A] --> T3[Token: ghi789]
end
subgraph Lifecycle["Token Lifecycle"]
Created[Token Created] --> Valid[Token Valid]
Valid --> Used[Token Used for Notifications]
Used --> Invalid[Token Invalidated]
Invalid --> R1[App Uninstalled]
Invalid --> R2[OS Reinstalled]
Invalid --> R3[Token Expired]
Invalid --> R4[New Device]
end
end
Sending a Notification
When your application server decides to send a notification, it constructs a message containing the content to display and the target token, then sends this to the push service via HTTPS.
The push service receives the message, validates your credentials (proving you are authorized to send notifications for this app), and looks up the token in its database. If the device is currently connected, the notification is delivered immediately through the persistent connection. If the device is offline, the message is queued for later delivery.
When the device comes online (or wakes up for periodic check-ins), it receives the queued notifications. The operating system delivers them to the appropriate apps and displays them to the user.
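The deliver-or-queue decision can be modeled in a few lines. This is a toy model, not how any real push service is implemented: `ToyPushService` and its methods are hypothetical, credentials are omitted, and real services apply their own limits to what they keep for offline devices.

```python
from collections import defaultdict, deque


class ToyPushService:
    def __init__(self):
        self.online = set()                # tokens with a live connection
        self.pending = defaultdict(deque)  # token -> queued messages
        self.delivered = []                # (token, message) delivery log

    def send(self, token: str, message: str) -> None:
        if token in self.online:
            self.delivered.append((token, message))
        else:
            self.pending[token].append(message)

    def device_connected(self, token: str) -> None:
        self.online.add(token)
        # Flush anything that accumulated while the device was away.
        while self.pending[token]:
            self.delivered.append((token, self.pending[token].popleft()))

    def device_disconnected(self, token: str) -> None:
        self.online.discard(token)


svc = ToyPushService()
svc.send("abc123", "New message!")    # device offline: queued
svc.device_connected("abc123")        # queued message is flushed
svc.send("abc123", "Second message")  # device online: delivered at once
print(svc.delivered)
```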
sequenceDiagram
participant Server as Your Server
participant Push as Push Service
participant Device as User's Device
participant OS as Operating System
participant App as Your App
Note over Server,App: Notification Delivery Flow
Server->>Push: Send notification<br/>Token: abc123<br/>Message: "New message!"
Push->>Push: Validate credentials
Push->>Push: Look up token abc123
alt Device Online
Push->>Device: Deliver via persistent connection
Device->>OS: Incoming notification
OS->>App: Wake app (optional)
OS->>OS: Display notification
else Device Offline
Push->>Push: Queue message
Note over Push: Wait for device...
Device->>Push: Device comes online
Push->>Device: Deliver queued messages
Device->>OS: Incoming notifications
OS->>OS: Display notifications
end
The Persistent Connection
The persistent connection between devices and push services is the heart of the system. Understanding how it works explains why push notifications are so efficient.
Mobile operating systems maintain a single, shared connection to the push service. This connection uses a protocol optimized for mobile networks: in practice a proprietary protocol over a long-lived, encrypted connection, similar in spirit to lightweight messaging protocols such as MQTT. The connection is designed to:
Minimize Data Usage: The connection sends tiny “heartbeat” packets periodically to stay alive, but these are measured in bytes, not kilobytes.
Survive Network Changes: When you switch from WiFi to cellular or move between cell towers, the connection can recover without full re-establishment.
Wake Efficiently: Mobile radios can remain in low-power states while the connection persists. The operating system and cellular modem coordinate to wake only when data actually arrives.
Multiplex Apps: A single connection serves all apps on the device. One hundred installed apps do not require one hundred connections—they share one.
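The multiplexing property is essentially a dispatch table inside the operating system's push client. The sketch below illustrates it under a strong assumption: frames are JSON objects carrying an `app_id` field. Real push protocols use their own binary framing, and `PushClient` is a hypothetical name.

```python
import json


class PushClient:
    """Toy OS-level push client: one inbound stream, many apps."""

    def __init__(self):
        self._handlers = {}

    def register_app(self, app_id: str, handler) -> None:
        self._handlers[app_id] = handler

    def on_frame(self, raw: bytes) -> bool:
        # Each frame names its target app; this framing is an assumption.
        frame = json.loads(raw)
        handler = self._handlers.get(frame["app_id"])
        if handler is None:
            return False  # app not installed or not registered
        handler(frame["payload"])
        return True


inbox = {"chat": [], "news": []}
client = PushClient()
client.register_app("chat", inbox["chat"].append)
client.register_app("news", inbox["news"].append)

# Both frames arrive over the same (simulated) connection.
client.on_frame(b'{"app_id": "chat", "payload": "hi"}')
client.on_frame(b'{"app_id": "news", "payload": "headline"}')
print(inbox)
```

Adding another app means adding one more entry to the table, not opening one more connection.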
flowchart TB
subgraph Device["Mobile Device"]
direction TB
App1[App 1]
App2[App 2]
App3[App 3]
AppN[App N]
OSLayer[Operating System Push Client]
App1 & App2 & App3 & AppN --> OSLayer
end
subgraph Connection["Single Shared Connection"]
Pipe[Persistent Connection<br/>Optimized Protocol<br/>Heartbeats: ~20 bytes/min]
end
subgraph PushService["Push Service"]
Gateway[Connection Gateway]
Router[Message Router]
Gateway --> Router
end
OSLayer <--> Pipe
Pipe <--> Gateway
Note1[Without shared connection:<br/>100 apps = 100 connections<br/>= Battery death]
Note2[With shared connection:<br/>100 apps = 1 connection<br/>= Efficient]
style Connection fill:#fff3e0
Why Mobile Devices Handle Push Notifications So Well
One of the most remarkable aspects of push notifications is how well mobile devices handle them. Your phone can receive notifications from hundreds of apps instantly while lasting all day on a single charge. This is not accidental—it is the result of deep integration between hardware, operating system, and cellular networks.
The Radio Power Challenge
Mobile radios are power-hungry. Transmitting and receiving data requires significant energy. But the relationship between radio activity and power consumption is not linear—it is stepped.
Mobile radios operate in several power states. In high-power states, they can transmit and receive at full speed. In low-power states, they consume minimal energy but cannot send or receive data. Transitioning between states takes time and energy.
The key insight is that it is often more efficient to batch small communications together than to send them individually. Each communication requires transitioning to a high-power state, so fewer transitions mean less power consumed.
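A rough energy model shows why batching helps. All three constants below are illustrative assumptions chosen for round numbers, not measurements of any particular modem: the model charges every wakeup a fixed stretch of high-power time and every message a short transfer time.

```python
HIGH_POWER_W = 0.5  # assumed draw while the radio is fully active
TAIL_S = 5.0        # assumed time the radio stays active after traffic
TRANSFER_S = 0.1    # assumed time to receive one small message


def energy_joules(wakeups: int, messages: int) -> float:
    """Energy = power x time; each wakeup pays the tail, each message the transfer."""
    active_s = wakeups * TAIL_S + messages * TRANSFER_S
    return HIGH_POWER_W * active_s


separate = energy_joules(wakeups=4, messages=4)
batched = energy_joules(wakeups=1, messages=4)
print(f"separate: {separate:.2f} J, batched: {batched:.2f} J")
```

Under these assumptions the cost is dominated by the number of wakeups rather than the number of messages, so delivering four messages in one wakeup costs far less than delivering them one at a time.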
stateDiagram-v2
[*] --> Idle: Device sleeping
Idle --> RRC_Idle: Network check needed
RRC_Idle --> RRC_Connected: Data to send/receive
RRC_Connected --> RRC_Idle: Inactivity timer (5-10s)
RRC_Idle --> Idle: Longer inactivity (seconds)
note right of Idle
Power: ~0.5mW
No data possible
end note
note right of RRC_Idle
Power: ~50mW
Can receive paging
end note
note right of RRC_Connected
Power: ~500mW
Full data transfer
end note
How Push Notifications Minimize Radio Usage
Push notification systems are designed specifically to minimize radio state transitions. Here is how:
Batching at the Network Level: Cellular networks aggregate multiple push notifications destined for the same device and deliver them together. Instead of five separate wakeups for five notifications, the device wakes once and receives all five.
Heartbeat Optimization: The persistent connection needs periodic heartbeats to stay alive, but these are timed carefully. Operating systems negotiate heartbeat intervals with push services and cellular networks to minimize wakeups while keeping connections alive.
Carrier Collaboration: Apple and Google work with cellular carriers to optimize push delivery. Carriers can hold notifications briefly to batch them with other traffic, reducing the number of times the radio must wake.
Priority Levels: Not all notifications are equally urgent. Push services support priority levels—high-priority notifications wake the device immediately, while lower-priority ones can be delayed until the device is already awake for other reasons.
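The interaction between priority and batching can be sketched as a small scheduler. This is a simplified model with hypothetical names (`DeliveryScheduler`, `submit`, `flush`); it assumes that a high-priority message forces a wakeup and that anything already held rides along with it.

```python
class DeliveryScheduler:
    """High priority triggers a wake; normal priority waits for the next wake."""

    def __init__(self):
        self.held = []
        self.wakeups = 0

    def submit(self, message: str, priority: str) -> list:
        if priority == "high":
            return self.flush(extra=message)
        self.held.append(message)
        return []

    def flush(self, extra=None) -> list:
        # One wake delivers everything held so far, plus the trigger message.
        batch = self.held + ([extra] if extra is not None else [])
        self.held = []
        if batch:
            self.wakeups += 1
        return batch


scheduler = DeliveryScheduler()
scheduler.submit("promo", "normal")
scheduler.submit("digest", "normal")
batch = scheduler.submit("2FA code", "high")
print(batch, scheduler.wakeups)
```

Three notifications reach the device with a single wakeup, and the urgent one is not delayed.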
flowchart TB
subgraph Inefficient["Without Optimization"]
direction LR
N1a[Notif 1] --> W1[Wake]
N2a[Notif 2] --> W2[Wake]
N3a[Notif 3] --> W3[Wake]
N4a[Notif 4] --> W4[Wake]
W1 & W2 & W3 & W4 --> Bad[4 Radio Wakeups<br/>High Battery Drain]
end
subgraph Efficient["With Batching"]
direction LR
N1b[Notif 1]
N2b[Notif 2]
N3b[Notif 3]
N4b[Notif 4]
N1b & N2b & N3b & N4b --> Batch[Batch Together]
Batch --> W5[Single Wake]
W5 --> Good[1 Radio Wakeup<br/>Low Battery Drain]
end
Operating System Integration
The operating system plays a crucial role in efficient notification handling. Both iOS and Android have deeply integrated push notification support at the system level.
Single Connection Manager: The operating system, not individual apps, manages the connection to push services. This ensures proper batching and prevents apps from establishing redundant connections.
Wake Lock Management: When a notification arrives, the OS wakes only the minimum necessary components. The screen might remain off while the notification is processed and queued for display.
Process Lifecycle: Apps do not need to run continuously to receive notifications. The OS maintains the connection and can briefly wake an app to process a notification before putting it back to sleep.
Notification Coalescing: Multiple notifications for the same app can be combined or managed together, reducing the visual and cognitive load on the user.
flowchart TB
subgraph OSIntegration["Operating System Push Integration"]
direction TB
subgraph Kernel["Kernel Level"]
Radio[Radio Driver]
Power[Power Manager]
Wake[Wake Lock System]
end
subgraph System["System Level"]
PushDaemon[Push Daemon]
NotifManager[Notification Manager]
AppManager[App Process Manager]
end
subgraph Apps["App Level"]
App1[App 1<br/>Not Running]
App2[App 2<br/>Not Running]
App3[App 3<br/>Not Running]
end
Radio --> PushDaemon
PushDaemon --> Power
Power --> Wake
PushDaemon --> NotifManager
NotifManager --> AppManager
AppManager --> App1 & App2 & App3
end
Note[Apps don't need to run<br/>OS handles everything]
The Doze Mode Innovation
Android introduced Doze mode to further optimize battery life. When the device is stationary and unused for extended periods, it enters increasingly aggressive power-saving states.
In Doze mode, network access is restricted to periodic “maintenance windows.” Notifications are batched and delivered during these windows rather than immediately. This can delay non-urgent notifications by minutes or even hours when the device is deeply idle.
High-priority notifications can bypass Doze mode, ensuring truly urgent messages still arrive immediately. But the default behavior prioritizes battery life over instant delivery.
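The effect of maintenance windows on delivery time can be approximated as follows. The sketch assumes evenly spaced windows for simplicity, which real Doze scheduling does not guarantee, and `delivery_time` is a hypothetical helper.

```python
import math


def delivery_time(arrival_s: float, priority: str, window_interval_s: float) -> float:
    """Return when a message arriving at `arrival_s` reaches a dozing device."""
    if priority == "high":
        return arrival_s  # bypasses the maintenance schedule
    # Normal priority waits for the next window boundary.
    return math.ceil(arrival_s / window_interval_s) * window_interval_s


FIFTEEN_MIN = 15 * 60  # assumed spacing between maintenance windows
normal = delivery_time(100, "normal", FIFTEEN_MIN)
high = delivery_time(100, "high", FIFTEEN_MIN)
print(f"normal priority delivered at t={normal}s, high priority at t={high}s")
```

A message that arrives just after a window closes waits almost the full interval, which is the price the default behavior pays for battery life.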
stateDiagram-v2
[*] --> Active: Device in use
Active --> Idle: Screen off, stationary
Idle --> LightDoze: 30+ minutes
LightDoze --> DeepDoze: Hours unused
note right of Active
All notifications immediate
end note
note right of LightDoze
Maintenance every 15 min
Notifications batched
end note
note right of DeepDoze
Maintenance every 1-2 hours
Only high-priority bypasses
end note
LightDoze --> Active: Device used
DeepDoze --> Active: Device used
iOS Background App Refresh
iOS takes a different approach with Background App Refresh. Instead of scheduled maintenance windows, iOS learns when users typically use each app and schedules background activity accordingly.
If you usually check a news app in the morning, iOS might refresh it in the early morning so content is ready when you open it. If you rarely use an app, iOS deprioritizes its background activity.
Push notifications interact with this system—a notification can trigger immediate background app execution, but the system limits how much work apps can do to protect battery life.
The Major Push Notification Services
Understanding the major push services—their architectures, capabilities, and constraints—is essential for anyone working with push notifications. While the concepts are similar, the implementations differ in important ways.
Apple Push Notification Service (APNs)
APNs was introduced with iOS 3.0 in 2009, making it one of the first modern push notification services. It remains the only way to send push notifications to iOS devices.
Architecture
APNs operates through Apple’s global data center network. Application servers connect to APNs via HTTP/2, which allows efficient multiplexing of many notifications over a single connection. Notifications are sent as JSON payloads with headers indicating the target device and delivery preferences.
Apple devices maintain persistent connections to APNs. On iOS, this connection is managed entirely by the operating system—apps cannot access it directly. The connection uses a binary protocol optimized for efficiency.
flowchart TB
subgraph YourInfra["Your Infrastructure"]
Server[Application Server]
end
subgraph Apple["Apple Infrastructure"]
direction TB
APNS_API[APNs HTTP/2 API]
subgraph Clusters["Data Center Clusters"]
DC1[US Data Center]
DC2[EU Data Center]
DC3[APAC Data Center]
end
APNS_API --> DC1 & DC2 & DC3
end
subgraph Devices["iOS Devices"]
iPhone1[iPhone User A]
iPhone2[iPhone User B]
iPad1[iPad User C]
end
Server -->|HTTP/2 + JWT Auth| APNS_API
DC1 & DC2 & DC3 <-->|Binary Protocol| iPhone1 & iPhone2 & iPad1
Authentication
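APNs supports two authentication methods. Token-based authentication uses JSON Web Tokens (JWT) signed with keys you download from Apple. Certificate-based authentication uses TLS client certificates. Token-based authentication is now preferred because the signing key does not expire the way certificates do, which must be renewed each year; the short-lived JWTs themselves are regenerated by your server on a regular schedule.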
Delivery Semantics
APNs provides best-effort delivery. If a device is offline, APNs queues one notification per app. When the device reconnects, it receives the most recent notification, not all queued notifications. This is intentional—old notifications are often no longer relevant.
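The keep-only-the-latest behavior described above can be modeled as a store keyed by device and app. This is a sketch of the semantics only, with hypothetical names; it says nothing about how APNs is built internally.

```python
class LatestOnlyStore:
    """Keeps at most one pending notification per (device, app) pair."""

    def __init__(self):
        self._pending = {}

    def store(self, device: str, app: str, notification: str) -> None:
        # A newer notification overwrites the older one for the same app.
        self._pending[(device, app)] = notification

    def drain(self, device: str) -> dict:
        """Return {app: notification} for a device that just reconnected."""
        keys = [key for key in self._pending if key[0] == device]
        return {app: self._pending.pop((dev, app)) for dev, app in keys}


store = LatestOnlyStore()
store.store("phone-1", "chat", "message 1")
store.store("phone-1", "chat", "message 2")  # replaces "message 1"
store.store("phone-1", "news", "headline")
on_reconnect = store.drain("phone-1")
print(on_reconnect)
```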
APNs also provides feedback about invalid tokens. When you send to a token that is no longer valid (the user uninstalled your app), APNs tells you so you can stop sending to that token.
Payload and Features
APNs payloads are JSON with a maximum size of 4KB. They can include alert text, badge numbers, sounds, and custom data for your app to process. Rich notifications can include images, videos, and interactive elements.
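A sketch of assembling such a payload and enforcing the size limit before sending. The `aps` dictionary with `alert`, `badge`, and `sound` follows the APNs payload format; the helper name `build_payload` and the custom key `chat_id` are illustrative assumptions.

```python
import json

MAX_PAYLOAD_BYTES = 4096  # the 4KB limit mentioned above


def build_payload(title: str, body: str, badge: int, custom: dict) -> bytes:
    payload = {
        "aps": {
            "alert": {"title": title, "body": body},
            "badge": badge,
            "sound": "default",
        },
        **custom,  # app-specific keys sit alongside "aps"
    }
    # Compact separators avoid spending bytes on whitespace.
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if len(encoded) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(encoded)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return encoded


encoded = build_payload("New message", "Alex: are you free?", 3, {"chat_id": "17"})
print(len(encoded), "bytes")
```

Checking the size locally is cheaper than discovering the problem from a rejected request; long message bodies are usually truncated or fetched by the app after the notification arrives.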
APNs supports:
- Silent notifications that wake your app without alerting the user
- Time-sensitive notifications that bypass focus modes
- Critical alerts for emergencies that override Do Not Disturb
- Location-based notifications triggered by geographic regions
Firebase Cloud Messaging (FCM)
FCM is Google’s push notification service, successor to Google Cloud Messaging (GCM). It handles Android devices and can also deliver to iOS (via APNs), web browsers, and other platforms.
Architecture
FCM’s architecture is more complex than APNs because it supports multiple platforms. Your server connects to FCM over HTTP through the HTTP v1 API; the older legacy HTTP and XMPP interfaces have been deprecated. FCM then routes messages appropriately: directly to Android devices, through APNs for iOS, or to browser push services for web.
flowchart TB
subgraph YourServer["Your Application Server"]
Logic[Business Logic]
end
subgraph FCM["Firebase Cloud Messaging"]
direction TB
API[FCM API<br/>HTTP v1]
Router[Platform Router]
AndroidGateway[Android Gateway]
APNsBridge[APNs Bridge]
WebPushBridge[Web Push Bridge]
API --> Router
Router --> AndroidGateway
Router --> APNsBridge
Router --> WebPushBridge
end
subgraph APNS["Apple APNs"]
AppleServers[Apple Servers]
end
subgraph Devices["End Devices"]
Android[Android Devices]
iOS[iOS Devices]
Browser[Web Browsers]
end
Logic -->|Send Notification| API
AndroidGateway --> Android
APNsBridge --> AppleServers --> iOS
WebPushBridge --> Browser
Topics and Conditions
FCM offers powerful targeting capabilities beyond individual tokens. You can subscribe devices to topics and send notifications to all subscribers without knowing their individual tokens. You can also combine topics with conditions: “Send to users subscribed to ‘news’ AND ‘sports’ but NOT ‘politics’.”
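Conceptually, a condition is a boolean predicate evaluated against each device's topic subscriptions. The sketch below expresses the example condition as a plain Python function over made-up subscription data; it does not use FCM's own condition syntax, and in practice the evaluation happens inside FCM, not on your server.

```python
def matches(subscriptions: set) -> bool:
    """'news' AND 'sports' but NOT 'politics'."""
    return (
        "news" in subscriptions
        and "sports" in subscriptions
        and "politics" not in subscriptions
    )


# Hypothetical subscription data, keyed by device token.
devices = {
    "token-a": {"news", "sports"},
    "token-b": {"news", "sports", "politics"},
    "token-c": {"news"},
}

targets = sorted(token for token, subs in devices.items() if matches(subs))
print(targets)
```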
Message Types
FCM distinguishes between notification messages and data messages. Notification messages are displayed automatically by the Android system when your app is in the background. Data messages are always delivered to your app’s code, giving you full control over presentation.
This distinction matters for app behavior. Notification messages are simpler but less flexible. Data messages require more work but offer complete control.
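The two message types differ mainly in which top-level block the request carries. The sketch below builds both shapes as plain dictionaries; the field names are modeled on FCM's HTTP v1 message format and should be checked against the current API reference, and the helper names are hypothetical.

```python
import json


def notification_message(token: str, title: str, body: str) -> dict:
    """Displayed by the system when the app is in the background."""
    return {"message": {"token": token, "notification": {"title": title, "body": body}}}


def data_message(token: str, fields: dict) -> dict:
    """Handed to the app's own code; values are sent as strings."""
    return {"message": {"token": token, "data": {k: str(v) for k, v in fields.items()}}}


note = notification_message("abc123", "New message", "Alex: are you free?")
data = data_message("abc123", {"chat_id": 17, "unread": 3})
print(json.dumps(note))
print(json.dumps(data))
```

With the data-only shape, nothing is shown unless the app's code decides to show it, which is where the extra control (and extra work) comes from.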
Priority Levels
FCM supports normal and high priority messages. High priority messages wake sleeping devices immediately. Normal priority messages may be delayed during Doze mode or battery saver states. Overusing high priority can lead to throttling.
Delivery Tracking
FCM provides analytics about message delivery through Firebase Console. You can see how many messages were sent, how many were delivered, and how many resulted in app opens. This visibility helps optimize notification strategies.
Web Push
Web push is a standards-based system that allows websites to send notifications to users even when the site is not open in a tab. It works across Chrome, Firefox, Edge, Safari, and other modern browsers.
Architecture
Unlike mobile push, web push is an open standard with multiple implementations. Each browser vendor operates its own push service, but they all implement the same Web Push protocol.
flowchart TB
subgraph YourServer["Your Application Server"]
Logic[Notification Logic]
VAPID[VAPID Keys]
end
subgraph PushServices["Push Services"]
direction TB
Google[Google Push Service<br/>Chrome]
Mozilla[Mozilla Push Service<br/>Firefox]
Microsoft[WNS<br/>Edge]
Apple[Apple Push Service<br/>Safari]
end
subgraph Browsers["User Browsers"]
Chrome[Chrome Browser]
Firefox[Firefox Browser]
Edge[Edge Browser]
Safari[Safari Browser]
end
Logic -->|Web Push Protocol| Google
Logic -->|Web Push Protocol| Mozilla
Logic -->|Web Push Protocol| Microsoft
Logic -->|Web Push Protocol| Apple
Google --> Chrome
Mozilla --> Firefox
Microsoft --> Edge
Apple --> Safari
VAPID Authentication
Web push uses Voluntary Application Server Identification (VAPID) for authentication. You generate a key pair and include your public key when users subscribe to push. When sending notifications, you sign them with your private key. This proves the notification came from you without requiring registration with each browser vendor.
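The signed part of a VAPID request is a small set of JWT claims. The sketch below builds only those claims: `aud` is the origin of the push service, `exp` is an expiry no more than 24 hours away, and `sub` is a contact address. Signing with the private key is omitted because it needs a cryptography library, and the endpoint URL and email address are made-up examples.

```python
import time
from urllib.parse import urlsplit


def vapid_claims(endpoint: str, contact: str, lifetime_s: int = 12 * 3600, now=None) -> dict:
    """Build the JWT claims for one push service; signing is left out."""
    if lifetime_s > 24 * 3600:
        raise ValueError("VAPID tokens may not be valid for more than 24 hours")
    now = int(time.time()) if now is None else now
    parts = urlsplit(endpoint)
    return {
        "aud": f"{parts.scheme}://{parts.netloc}",  # origin of the push service
        "exp": now + lifetime_s,                    # expiry as a Unix timestamp
        "sub": contact,                             # how the service can reach you
    }


claims = vapid_claims(
    "https://push.example.net/send/abc123",  # hypothetical subscription endpoint
    "mailto:ops@example.com",
    now=1_700_000_000,
)
print(claims)
```

Because `aud` is derived from each subscription's endpoint, a server that talks to several browser vendors ends up with one signed token per push service.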
Encryption
All web push payloads are encrypted end-to-end. When a user subscribes, you receive their public key. You encrypt the notification payload so only their browser can decrypt it. Even the push service cannot read the content.
Limitations
Web push has significant limitations compared to mobile push. Users must explicitly subscribe (no silent registration). Browsers must be running (at least in the background) to receive notifications. Payload sizes are limited to 4KB. And the user experience varies significantly across browsers and platforms.
Comparison Summary
flowchart TB
subgraph Comparison["Push Service Comparison"]
direction TB
subgraph APNs["Apple APNs"]
A1[iOS/macOS/watchOS only]
A2[HTTP/2 API]
A3[4KB payload]
A4[Certificate or JWT auth]
A5[Best-effort delivery]
A6[Rich notification support]
end
subgraph FCMBox["Firebase Cloud Messaging"]
F1[Android + iOS + Web]
F2[HTTP v1 API]
F3[4KB payload]
F4[OAuth 2.0 service account auth]
F5[Topics and conditions]
F6[Detailed analytics]
end
subgraph WebPush["Web Push"]
W1[All modern browsers]
W2[Standard Web Push Protocol]
W3[4KB payload]
W4[VAPID auth]
W5[End-to-end encryption]
W6[Requires browser running]
end
end
The Infrastructure Behind Push Services
Push notification services operate one of the largest real-time messaging systems on the planet. Understanding their internal architecture reveals the engineering challenges they solve.
Global Distribution
Push services must be globally distributed. A device in Tokyo should connect to nearby servers, not servers in California. Latency matters for real-time notifications, and forcing traffic to cross oceans would degrade performance.
Both Apple and Google operate data centers worldwide. Devices connect to the geographically nearest data center. These data centers synchronize to ensure messages reach their destination regardless of where they are sent or received.
flowchart TB
subgraph Global["Global Push Infrastructure"]
direction TB
subgraph NorthAmerica["North America"]
NA1[US-East DC]
NA2[US-West DC]
end
subgraph Europe["Europe"]
EU1[EU-West DC]
EU2[EU-Central DC]
end
subgraph Asia["Asia Pacific"]
AP1[Singapore DC]
AP2[Tokyo DC]
AP3[Sydney DC]
end
NA1 <--> NA2
EU1 <--> EU2
AP1 <--> AP2 <--> AP3
NA1 & NA2 <--> EU1 & EU2
EU1 & EU2 <--> AP1 & AP2 & AP3
end
Device1[Device in New York] --> NA1
Device2[Device in London] --> EU1
Device3[Device in Tokyo] --> AP2
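The idea of connecting to the nearest data center can be sketched with a great-circle distance calculation. The data center names and coordinates below are hypothetical. Real services typically steer devices with DNS or anycast routing rather than an explicit distance lookup, so treat this as an illustration of the goal rather than the mechanism.

```python
import math

# Hypothetical data centers as (latitude, longitude) pairs.
DATA_CENTERS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "tokyo": (35.7, 139.7),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_data_center(device_location):
    """Pick the data center with the smallest distance to the device."""
    return min(DATA_CENTERS,
               key=lambda name: haversine_km(device_location, DATA_CENTERS[name]))
```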
Connection Management
Managing billions of persistent connections is extraordinarily challenging. Each connection consumes memory and requires periodic maintenance. At the scale of APNs or FCM, even small inefficiencies multiply into massive resource requirements.
Push services use connection multiplexing, connection pooling, and custom protocols to minimize per-connection overhead. Connections are distributed across thousands of servers, with load balancers routing devices to appropriate servers based on capacity and geographic proximity.
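As a rough sketch of routing by capacity and proximity, the function below prefers gateways in the device's region and picks the least loaded one with spare room. The `Gateway` class and its fields are hypothetical. The internals of APNs and FCM are not public, so this shows only the general load-balancing idea.

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    region: str
    capacity: int          # maximum simultaneous connections
    connections: int = 0   # connections currently held

    @property
    def load(self) -> float:
        return self.connections / self.capacity


def assign_gateway(gateways, region):
    """Prefer the device's region; fall back to any gateway with spare capacity."""
    spare = [g for g in gateways if g.connections < g.capacity]
    candidates = [g for g in spare if g.region == region] or spare
    if not candidates:
        raise RuntimeError("no gateway has spare capacity")
    chosen = min(candidates, key=lambda g: g.load)
    chosen.connections += 1
    return chosen
```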
Token to Connection Mapping
When your server sends a notification with a token, the push service must determine which server currently holds the connection for that device. This requires a distributed mapping system that tracks:
- Which device corresponds to each token
- Which server each device is connected to
- Whether the device is currently online
This mapping must be updated in real-time as devices connect, disconnect, and move between servers. The system must be highly available—if the mapping service fails, no notifications can be delivered.
flowchart LR
subgraph TokenRouting["Token to Connection Routing"]
direction TB
subgraph Send["1. Send Request"]
Server[Your Server]
API[Push API]
Server -->|Token: abc123| API
end
subgraph Lookup["2. Token Lookup"]
TokenDB[(Token Database)]
API -->|Lookup abc123| TokenDB
TokenDB -->|Device: xyz<br/>Server: Gateway-7| API
end
subgraph Route["3. Route to Server"]
Gateway7[Gateway Server 7]
API -->|Forward notification| Gateway7
end
subgraph Deliver["4. Deliver"]
Device[Device xyz]
Gateway7 -->|Push via connection| Device
end
end
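The mapping described above can be modeled, in miniature, as a dictionary from token to routing record. This is a single-process sketch with hypothetical names. A real push service would need a replicated, highly available store, which is exactly the hard part.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Route:
    device_id: str
    gateway: Optional[str] = None  # None means the device is offline


class TokenRegistry:
    """In-memory stand-in for the distributed token-to-connection map."""

    def __init__(self):
        self._routes: Dict[str, Route] = {}

    def register(self, token: str, device_id: str) -> None:
        self._routes[token] = Route(device_id)

    def connected(self, token: str, gateway: str) -> None:
        self._routes[token].gateway = gateway

    def disconnected(self, token: str) -> None:
        self._routes[token].gateway = None

    def lookup(self, token: str) -> Optional[Route]:
        return self._routes.get(token)


registry = TokenRegistry()
registry.register("abc123", "xyz")
registry.connected("abc123", "gateway-7")
```

A lookup for `abc123` now answers both questions the push API needs: which device the token names and which gateway holds its connection.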
Message Queuing
Devices are frequently offline. They might be in airplane mode, in areas with no coverage, or simply turned off. Push services must queue messages for later delivery.
But queuing creates challenges. How long should messages be kept? What happens when a device is offline for weeks? Should all messages be delivered or just the most recent?
Different services make different choices. APNs keeps only the most recent notification per app—if you send five notifications while a device is offline, only the last one is delivered. FCM can keep multiple messages but has storage limits and time-to-live settings.
stateDiagram-v2
[*] --> Received: Notification arrives at push service
Received --> CheckOnline: Check device status
CheckOnline --> DeliverNow: Device online
CheckOnline --> Queue: Device offline
Queue --> Stored: Store in queue
Stored --> CheckTTL: Periodic check
CheckTTL --> StillValid: TTL not expired
CheckTTL --> Expired: TTL expired
Expired --> Discard: Remove from queue
StillValid --> DeviceOnline: Device comes online
DeviceOnline --> Deliver: Deliver queued messages
DeliverNow --> Deliver
Deliver --> [*]
Discard --> [*]
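The queuing behavior can be sketched as a small per-device queue. The version below assumes three policies: a time-to-live per message, a collapse key that lets a newer message replace an older one, and a cap on stored messages. The class names and the default cap are hypothetical and not taken from any particular service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    body: str
    expires_at: float                   # absolute time after which it is dropped
    collapse_key: Optional[str] = None  # newer messages replace older ones with the same key


class OfflineQueue:
    def __init__(self, max_messages: int = 100):
        self.max_messages = max_messages
        self._messages = []

    def enqueue(self, msg: Message) -> None:
        if msg.collapse_key is not None:
            # Keep only the latest message for each collapse key.
            self._messages = [m for m in self._messages
                              if m.collapse_key != msg.collapse_key]
        self._messages.append(msg)
        overflow = len(self._messages) - self.max_messages
        if overflow > 0:
            del self._messages[:overflow]  # drop the oldest entries

    def drain(self, now: float):
        """Called when the device reconnects: return unexpired messages, empty the queue."""
        live = [m for m in self._messages if m.expires_at > now]
        self._messages = []
        return live
```

With every message sharing one collapse key, this degenerates into the "only the most recent notification" behavior described for APNs.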
Rate Limiting and Fairness
Push services implement rate limiting to prevent abuse and ensure fairness. If one application tries to send millions of notifications per second, it could overwhelm the infrastructure and impact all users.
Rate limits operate at multiple levels:
- Per-application limits (how many notifications your app can send per minute)
- Per-device limits (how many notifications can be delivered to one device)
- Per-token limits (preventing spam to a single user)
Push services also implement feedback mechanisms. If you consistently send to invalid tokens or if users frequently dismiss your notifications, your sending privileges may be reduced.
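One common way to implement limits like these is a token bucket, sketched below. The push services do not publish their algorithms, so this is an illustrative technique and not their actual mechanism. The class name and parameters are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allow bursts up to `burst`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_s=1, burst=2)
```

Keeping one bucket per application, one per device, and one per token would give the three levels listed above.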
The Complete Notification Journey
Let us trace a notification from creation to display, understanding each step in the process.
sequenceDiagram
participant Backend as Your Backend
participant Queue as Message Queue
participant Worker as Push Worker
participant FCM as FCM/APNs
participant CDN as Push Infrastructure
participant Cell as Cell Tower
participant Device as User Device
participant OS as Device OS
participant App as Your App
participant UI as Notification UI
Note over Backend,UI: Complete Push Notification Journey
Backend->>Queue: Queue notification job
Queue->>Worker: Job ready
Worker->>Worker: Lookup user tokens
Worker->>FCM: Send notification request
FCM->>FCM: Authenticate sender
FCM->>FCM: Validate token
FCM->>FCM: Route to correct server
FCM->>CDN: Forward to device region
CDN->>Cell: Transmit to device
Cell->>Device: Radio wakeup + data
Device->>OS: Incoming push data
OS->>OS: Decrypt and validate
OS->>App: Deliver to app (optional)
OS->>UI: Display notification
Note over UI: User sees notification!
Step 1: Your Backend Decides to Notify
Something triggers a notification—a new message, a completed order, a security event. Your application logic determines who should be notified and what the notification should say.
For high-volume applications, notifications are typically queued rather than sent immediately. A message queue decouples the notification decision from the notification delivery, allowing your main application to continue without waiting for push delivery.
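The decoupling described here can be sketched with an in-process queue and a worker thread. A production system would more likely use a dedicated message broker, and the `notify` function and job fields are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
sent = []  # stand-in for real push delivery, so the example is observable


def worker():
    while True:
        job = jobs.get()
        try:
            if job is None:      # sentinel: stop the worker
                return
            sent.append(job)     # a real worker would call the push service here
        finally:
            jobs.task_done()


def notify(user_id, title, body):
    """Called by application logic; returns immediately without waiting for delivery."""
    jobs.put({"user_id": user_id, "title": title, "body": body})


thread = threading.Thread(target=worker, daemon=True)
thread.start()
notify("user-1", "Order shipped", "Your package is on its way")
jobs.put(None)
jobs.join()  # only so this example finishes deterministically
```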
Step 2: Looking Up Tokens
Your push worker retrieves the device tokens for the target user. A user might have multiple tokens—one for their phone, one for their tablet, one for their work phone. Each token must receive the notification.
Token management is critical. You must track:
- Which tokens belong to which users
- When tokens were last known to be valid
- Platform information (iOS vs. Android) to format messages appropriately
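The three things to track map naturally onto a small record type. This is a sketch with hypothetical field names, using a plain list where a real system would use a database table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceToken:
    user_id: str
    token: str
    platform: str         # "ios" or "android"
    last_valid_at: float  # Unix timestamp of the last successful send


def tokens_by_platform(tokens, user_id):
    """Group one user's tokens by platform so each can be formatted correctly."""
    grouped = defaultdict(list)
    for t in tokens:
        if t.user_id == user_id:
            grouped[t.platform].append(t.token)
    return dict(grouped)


store = [
    DeviceToken("alice", "tok-phone", "ios", 1_700_000_000),
    DeviceToken("alice", "tok-tablet", "android", 1_700_000_500),
    DeviceToken("bob", "tok-bob", "android", 1_700_000_900),
]
```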
Step 3: Sending to the Push Service
Your worker sends the notification to the appropriate push service. This involves:
- Authenticating with the push service
- Formatting the payload according to service requirements
- Setting delivery options (priority, expiration, collapse keys)
- Handling rate limits and retries
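To illustrate the formatting step, the sketch below builds request bodies for both services without sending anything. The header and field names reflect my understanding of the public APNs and FCM HTTP v1 formats and should be verified against the current documentation. The function names and example values are hypothetical.

```python
def apns_request(title, body, topic, priority=10, expiration=0, collapse_id=None):
    """Return (headers, payload) for an APNs alert notification."""
    headers = {
        "apns-push-type": "alert",
        "apns-topic": topic,                 # usually the app's bundle ID
        "apns-priority": str(priority),      # 10 = deliver immediately
        "apns-expiration": str(expiration),  # 0 = do not store if the device is offline
    }
    if collapse_id:
        headers["apns-collapse-id"] = collapse_id
    payload = {"aps": {"alert": {"title": title, "body": body}}}
    return headers, payload


def fcm_message(token, title, body, ttl_s=3600, collapse_key=None, high_priority=True):
    """Return the JSON body for an FCM HTTP v1 send request."""
    android = {"priority": "HIGH" if high_priority else "NORMAL", "ttl": f"{ttl_s}s"}
    if collapse_key:
        android["collapse_key"] = collapse_key
    return {
        "message": {
            "token": token,
            "notification": {"title": title, "body": body},
            "android": android,
        }
    }
```

Authentication and the HTTP call are left out on purpose. The point is that the same logical notification needs a different shape for each service.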
Step 4: Push Service Processing
The push service receives your request and:
- Validates your credentials
- Checks the token exists and is valid
- Determines which server holds the device connection
- Routes the message to that server
- If the device is offline, queues the message
Step 5: Delivery to Device
If the device is connected, the message travels:
- Through the push service’s internal network
- To the connection gateway server
- Over the persistent connection
- To the device’s cellular modem or WiFi radio
The cellular network coordinates with the device to wake it if necessary, using the radio resource control (RRC) state machine to minimize power consumption.
Step 6: Operating System Processing
The device’s operating system:
- Receives the encrypted payload
- Decrypts and validates it
- Determines which app should receive it
- Optionally wakes the app for processing
- Constructs the notification display
- Shows the notification to the user
Step 7: User Interaction
The user sees the notification and may:
- Tap to open the app
- Dismiss it
- Interact with action buttons
- Ignore it
Each interaction can trigger callbacks to your app, allowing you to track engagement and respond appropriately.
Push Notification Strategies and Patterns
Beyond the technical infrastructure, effective push notification usage requires thoughtful strategy. The same infrastructure that enables valuable notifications can also annoy users into disabling them entirely.
Permission and Trust
Users must grant permission for push notifications. This permission is precious—once denied, it is difficult to recover. Best practices include:
Delayed Permission Requests: Do not ask for permission immediately when users first open your app. Let them experience value first. Ask when they reach a point where notifications would clearly benefit them.
Explain the Value: Before requesting permission, explain what kinds of notifications you will send and why they matter. “Get instant alerts about messages from friends” is more compelling than a generic system permission dialog.
Respect the Decision: If users decline, do not repeatedly prompt them. They made a choice. Focus on delivering value through other means.
flowchart TB
subgraph BestPractices["Permission Request Best Practices"]
direction TB
Install[User Installs App] --> Wait[Let User Experience Value]
Wait --> Moment[Find Right Moment]
Moment --> Explain[Explain Notification Value]
Explain --> Ask[Request Permission]
Ask --> Granted[Permission Granted]
Ask --> Denied[Permission Denied]
Granted --> Deliver[Deliver Valuable Notifications]
Denied --> Respect[Respect Decision<br/>Don't Repeatedly Ask]
end
subgraph BadPractices["Common Mistakes"]
direction TB
B1[Ask immediately on install]
B2[No explanation of value]
B3[Repeatedly prompt after denial]
B4[Send irrelevant notifications]
end
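The permission guidance above boils down to a small decision rule. The sketch below is one way to encode it. The state fields and the two-session threshold are hypothetical and should be tuned to your own app.

```python
from dataclasses import dataclass


@dataclass
class PromptState:
    sessions_completed: int
    reached_value_moment: bool  # e.g. the user just sent their first message
    previously_denied: bool
    already_granted: bool


def should_request_permission(state: PromptState, min_sessions: int = 2) -> bool:
    """Ask only after the user has seen value, and never again after a denial."""
    if state.already_granted or state.previously_denied:
        return False
    return state.reached_value_moment and state.sessions_completed >= min_sessions
```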
Notification Frequency
More notifications are not better. Each notification costs user attention. Send too many and users disable them, or worse, uninstall your app.
Aggregate Where Possible: Instead of ten notifications for ten messages, send one notification summarizing them.
Time-Sensitive Only: Only interrupt users for truly time-sensitive information. A shipping notification when a package arrives is valuable. A reminder that you have not used the app in three days is not.
User Control: Let users customize what notifications they receive. Some want every update; others want only critical alerts.
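Aggregation can be as simple as collapsing pending messages into one summary before sending. The sketch below assumes each pending message is a `(sender, text)` pair, and the wording of the summary is only an example.

```python
def summarize(messages):
    """Collapse pending (sender, text) pairs into a single notification, or None."""
    if not messages:
        return None
    if len(messages) == 1:
        sender, text = messages[0]
        return {"title": sender, "body": text}
    senders = list(dict.fromkeys(sender for sender, _ in messages))  # unique, in order
    title = senders[0] if len(senders) == 1 else f"{len(senders)} conversations"
    return {"title": title, "body": f"{len(messages)} new messages"}
```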
Rich Notifications
Modern platforms support rich notifications with images, videos, action buttons, and interactive elements. Used well, these enhance the notification experience.
Images Add Context: A notification about a friend’s photo is better with a thumbnail of the photo.
Actions Save Steps: “Reply” and “Like” buttons on a message notification let users respond without opening the app.
Live Activities: On iOS, live activities can display ongoing events directly on the lock screen, updated in real-time.
flowchart TB
subgraph NotificationTypes["Notification Richness"]
direction LR
subgraph Basic["Basic Notification"]
BTitle[Title: New Message]
BBody[Body: John sent you a photo]
end
subgraph Rich["Rich Notification"]
RTitle[Title: New Message]
RBody[Body: John sent you a photo]
RImage[Image: Thumbnail preview]
RActions[Actions: Reply | Like]
end
subgraph Live["Live Activity"]
LTitle[Ongoing: Food Delivery]
LBody[Driver 5 min away]
LMap[Live Map]
LUpdate[Updates in real-time]
end
end
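On the wire, the difference between a basic and a rich notification is a few extra payload fields. The sketch below shows an APNs-style payload. `mutable-content` and `category` are standard keys inside `aps`, while the `image-url` key and the function name are hypothetical. The app itself must download the image and register the actions for the category.

```python
def rich_apns_payload(title, body, image_url, category):
    """Build an APNs payload that lets the app attach an image and show action buttons."""
    return {
        "aps": {
            "alert": {"title": title, "body": body},
            "mutable-content": 1,  # lets an app extension modify the notification first
            "category": category,  # selects the action buttons the app registered
        },
        "image-url": image_url,    # custom key (hypothetical name) read by the app
    }


payload = rich_apns_payload(
    "New Message", "John sent you a photo",
    "https://example.com/thumb.jpg", "MESSAGE",
)
```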
Silent Notifications
Silent notifications wake your app without alerting the user. They are useful for:
Background Updates: Sync data before users open the app so content is ready.
Token Refresh: Check if push tokens need updating.
Content Preparation: Download content referenced in upcoming notifications.
But silent notifications have limits. Operating systems restrict how often apps can be woken silently and how much work they can do. Abuse leads to throttling.
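A silent notification differs from a visible one mainly in what the payload leaves out. The sketch below reflects my understanding of the APNs and FCM formats. The function names and the `sync` key are hypothetical, and the headers should be checked against current documentation.

```python
def silent_apns_request(topic, extra=None):
    """Return (headers, payload) for an APNs background update with no alert."""
    headers = {
        "apns-push-type": "background",
        "apns-priority": "5",  # background pushes are sent at low priority
        "apns-topic": topic,
    }
    payload = {"aps": {"content-available": 1}}
    payload.update(extra or {})
    return headers, payload


def silent_fcm_message(token, data):
    """Return an FCM HTTP v1 data-only message; data values must be strings."""
    return {"message": {"token": token,
                        "data": {key: str(value) for key, value in data.items()}}}
```

Neither payload carries a title or body, which is what keeps the notification invisible to the user.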
Alternative Approaches and Complementary Technologies
Push notifications are not the only way to deliver real-time information. Understanding alternatives helps you choose the right approach for each use case.
WebSockets for Active Sessions
When users are actively using your application, WebSockets often make more sense than push notifications. WebSockets provide:
- Lower latency (no intermediate push service)
- Full bidirectional communication
- More data per message
- Direct connection to your servers
Push notifications complement WebSockets by reaching users when they are not actively using your app.
flowchart TB
subgraph Combined["Combined Approach"]
direction TB
subgraph Active["User Active in App"]
WS[WebSocket Connection]
WS --> Instant[Instant Updates]
WS --> Bidirectional[Bidirectional Chat]
WS --> Rich[Rich Data Sync]
end
subgraph Inactive["User Not in App"]
Push[Push Notification]
Push --> Alert[Alert User]
Push --> Summary[Summary Info]
Push --> DeepLink[Deep Link to Content]
end
Active -->|App Closed| Inactive
Inactive -->|App Opened| Active
end
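The combined approach comes down to a routing decision at send time. In the sketch below, `active_sessions` is a hypothetical set of user IDs with an open WebSocket, and the two send functions are passed in so the example stays independent of any particular library.

```python
from typing import Callable, Set


def dispatch(user_id: str, message: dict, active_sessions: Set[str],
             send_ws: Callable[[str, dict], None],
             send_push: Callable[[str, dict], None]) -> str:
    """Use the live connection when there is one; otherwise fall back to push."""
    if user_id in active_sessions:
        send_ws(user_id, message)
        return "websocket"
    send_push(user_id, message)
    return "push"


ws_log, push_log = [], []
active = {"alice"}
first = dispatch("alice", {"text": "hi"}, active,
                 lambda u, m: ws_log.append((u, m)), lambda u, m: push_log.append((u, m)))
second = dispatch("bob", {"text": "hi"}, active,
                  lambda u, m: ws_log.append((u, m)), lambda u, m: push_log.append((u, m)))
```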
Server-Sent Events (SSE)
Server-Sent Events provide one-way streaming from server to client. They are simpler than WebSockets but only support server-to-client communication. SSE is useful for live dashboards, news feeds, or any scenario where the server broadcasts to clients.
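Part of SSE's simplicity is its wire format: plain text lines over a long-lived HTTP response, with a blank line ending each event. The sketch below formats a single event, and the function name is hypothetical.

```python
def format_sse(data: str, event=None, event_id=None) -> str:
    """Serialize one event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line data is sent as several consecutive "data:" lines.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"
```

A server writes strings like these to a response whose content type is `text/event-stream`, and the browser's `EventSource` API parses them on the other end.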
SMS for Critical Communications
For truly critical notifications, such as password resets, two-factor authentication, and emergency alerts, SMS remains valuable. SMS reaches phones without internet connectivity, works on feature phones, and can reach nearly any mobile number.
The tradeoff is cost (SMS has per-message charges) and limited formatting (text only, character limits).
Email for Non-Urgent Updates
Email remains appropriate for notifications that are not time-sensitive. Weekly digests, monthly reports, and transactional confirmations work well as email. Users can read them when convenient without the interruption of push notifications.
The Future of Push Notifications
Push notification technology continues to evolve. Several trends are shaping its future.
Increased Intelligence
Operating systems are becoming smarter about notification delivery. iOS and Android both use machine learning to:
- Predict when users want to be interrupted
- Group and summarize notifications
- Suggest notification settings based on behavior
- Filter spam and low-value notifications
This intelligence reduces the burden on users while ensuring important notifications still arrive.
Richer Experiences
Notifications are becoming more capable. Live Activities on iOS show real-time updating content on the lock screen. Android’s ongoing notifications serve similar purposes. The line between notifications and widgets is blurring.
Future notifications might include:
- Interactive mini-apps within notifications
- Augmented reality overlays
- Voice-activated responses
- Integration with smart home devices
Privacy Enhancements
Privacy concerns are driving changes in notification infrastructure. Techniques gaining ground include:
- End-to-end encryption (already used in web push)
- Privacy-preserving token systems
- On-device notification filtering
- Reduced tracking in delivery systems
These changes protect user privacy while maintaining notification functionality.
Cross-Platform Unification
Managing notifications across phones, tablets, computers, watches, and other devices is challenging. Users want notifications to follow them intelligently—appearing on whichever device they are using, not all devices simultaneously.
Platforms are developing systems to coordinate notification delivery across device types, reducing duplicate alerts while ensuring users receive important messages.
flowchart TB
subgraph Future["Future Notification Trends"]
direction TB
subgraph Intelligence["AI/ML Intelligence"]
I1[Predict best delivery time]
I2[Smart summarization]
I3[Relevance filtering]
I4[Behavior-based settings]
end
subgraph Experience["Richer Experiences"]
E1[Interactive mini-apps]
E2[Real-time updating content]
E3[AR/VR notifications]
E4[Voice interaction]
end
subgraph Privacy["Privacy Enhancement"]
P1[End-to-end encryption]
P2[On-device processing]
P3[Reduced tracking]
P4[User-controlled data]
end
subgraph CrossPlatform["Cross-Platform"]
C1[Follow user between devices]
C2[Deduplicate intelligently]
C3[Context-aware delivery]
C4[Unified settings]
end
end
Common Challenges and Solutions
Working with push notifications involves navigating several common challenges.
Token Management
Tokens change. Users reinstall apps, get new phones, or revoke permissions. Your system must:
Detect Invalid Tokens: Push services report when tokens are no longer valid. Listen for these responses and remove dead tokens from your database.
Handle Token Refresh: When tokens change, apps receive new tokens. Ensure your app sends updated tokens to your server promptly.
Avoid Duplicate Sends: The same user might have multiple valid tokens (multiple devices). Decide whether to send to all devices or implement logic to choose.
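Detecting invalid tokens usually means inspecting the push service's response to each send. The sketch below assumes each result has already been reduced to a service name, HTTP status, and reason string. The specific codes reflect my understanding of APNs and FCM error responses and should be checked against current documentation.

```python
# (service, HTTP status, reason) combinations treated as "this token is dead".
DEAD_TOKEN_RESPONSES = {
    ("apns", 410, "Unregistered"),
    ("apns", 400, "BadDeviceToken"),  # may also mean a sandbox/production mismatch
    ("fcm", 404, "UNREGISTERED"),
}


def prune_dead_tokens(tokens: set, results) -> set:
    """Return the tokens that remain after removing those reported invalid."""
    dead = {
        r["token"] for r in results
        if (r["service"], r["status"], r["reason"]) in DEAD_TOKEN_RESPONSES
    }
    return tokens - dead


results = [
    {"token": "t1", "service": "apns", "status": 200, "reason": ""},
    {"token": "t2", "service": "apns", "status": 410, "reason": "Unregistered"},
    {"token": "t3", "service": "fcm", "status": 404, "reason": "UNREGISTERED"},
    {"token": "t4", "service": "fcm", "status": 503, "reason": "UNAVAILABLE"},
]
```

Temporary failures such as the 503 above are deliberately not treated as dead tokens. Those sends should be retried, not pruned.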
Delivery Reliability
Push notifications offer best-effort delivery, not guaranteed delivery. For critical communications:
Implement Fallbacks: If push fails, try email or SMS.
Track Delivery: Monitor delivery rates and investigate anomalies.
Handle Edge Cases: What happens if the notification arrives after it is relevant? Design your app to handle stale notifications gracefully.
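A fallback chain and a staleness check can both be sketched in a few lines. Keep in mind that a push "failure" here can only mean the push service rejected the request, since a successful request still does not prove the device displayed anything. The function names and channel callables are hypothetical.

```python
def deliver_with_fallback(message, channels):
    """Try each (name, send) pair in order; return the name of the first that succeeds."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except Exception:
            continue  # treat any error as a failed attempt and move on
    return None


def is_stale(sent_at: float, now: float, max_age_s: float) -> bool:
    """True when a notification arrived too late to still be relevant."""
    return now - sent_at > max_age_s
```

On the device, the app can call something like `is_stale` before acting on a notification, for example to skip a "driver arriving" alert that shows up an hour late.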
Rate Limiting
Push services impose rate limits. Exceeding them degrades delivery for all your notifications. Strategies include:
Batch Requests: Aggregate multiple tokens into single API calls where supported.
Queue and Throttle: Do not send all notifications simultaneously. Spread them over time.
Prioritize: When hitting limits, ensure high-priority notifications are sent first.
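Queueing, throttling, and prioritizing combine naturally in a priority queue that releases a fixed budget of notifications per tick. The class below is a sketch. The per-tick budget and the convention that lower numbers mean higher priority are assumptions.

```python
import heapq
import itertools


class PrioritySender:
    """Release at most `per_tick` notifications per tick, highest priority first."""

    def __init__(self, per_tick: int):
        self.per_tick = per_tick
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority: int, notification: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._order), notification))

    def tick(self):
        batch = []
        while self._heap and len(batch) < self.per_tick:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


sender = PrioritySender(per_tick=2)
sender.submit(2, "weekly digest")
sender.submit(0, "security code")
sender.submit(1, "new message")
sender.submit(0, "login alert")
```

Calling `tick()` on a timer spreads sends over time, and the security messages go out before the digest even though the digest was submitted first.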
Testing
Push notifications are hard to test. The chain from server to device involves many components outside your control. Consider:
Local Testing Tools: The Firebase console can send test messages to specific devices. Apple’s Xcode can simulate push notifications in the iOS Simulator.
Staging Environments: Maintain separate push credentials for development and production.
Real Device Testing: Simulators do not fully replicate push behavior. Test on real devices.
Conclusion
Push notifications are a remarkable piece of modern computing infrastructure. They represent a successful solution to a hard problem: how do you instantly reach billions of devices, distributed across the planet, many of which are offline or power-constrained, without draining batteries or overwhelming networks?
The answer involves persistent connections managed by operating systems, not apps. It requires global infrastructure operated by platform vendors. It demands careful protocol design that minimizes radio usage while maximizing delivery speed. And it depends on deep integration between software and hardware, from cellular modems to notification UI.
Understanding this infrastructure transforms how you think about push notifications. They are not simple messages sent to phones. They are messages that travel through a sophisticated system of queues, routers, persistent connections, and optimized protocols before finally appearing on a user’s screen.
For developers, this understanding leads to better implementations. You know why token management matters, why delivery is not guaranteed, and why rate limits exist. You understand the constraints you are working within and can design accordingly.
For architects, this understanding informs technology choices. You know when push notifications are the right solution and when alternatives like WebSockets or polling are more appropriate. You can evaluate push services based on their actual capabilities rather than marketing claims.
For everyone, this understanding demystifies a technology we use every day. The next time your phone buzzes with a notification, you will know the remarkable journey that message took to reach you—and appreciate the engineering that made it possible.
Push notifications exemplify a pattern we see throughout technology: simple experiences powered by complex systems. The user sees a message appear. Behind that simple moment is a global infrastructure of connections, protocols, and algorithms, all working together to deliver information instantly across any distance.
That is the magic of push notifications. And now you understand how the magic works.