Use SNS & SQS to build Pub/Sub System

2018.05.23

Recently we built a pub/sub system based on AWS’s SNS and SQS services. Here are some notes.

Originally, we had a pub/sub system based on Redis (using BLPOP to listen on a Redis list). It was really simple and mainly used for cross-app operations. We now need to enhance it to support more complex pub/sub logic, e.g. topic-based distribution. It also doesn’t support redelivery: if a subscriber fails to process a message, the message is dropped.

There were three obvious choices in my mind:

Kafka
AMQP-based systems (RabbitMQ, ActiveMQ …)
SNS + SQS

My requirements for this system are:

Support message persistence.
Support topic based message distribution.
Easy to manage.

The data volume won’t be very large, so performance and throughput won’t be critical concerns.

I chose SNS + SQS. My main concerns were on the operations side:

Kafka needs ZooKeeper to run as a cluster.
RabbitMQ needs extra configuration for HA, and the AMQP model is relatively complex to program against.

So my design is:

Applications publish messages to an SNS topic.
Set up multiple SQS queues that subscribe to the SNS topic.
Let different application processes consume different queues to carry out their own logic.
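
As a rough sketch of the two ends of this flow, assuming Python and a boto3-style SNS client passed in by the caller: publish_event and extract_event are hypothetical helper names, not something from our code base. One detail worth knowing is that, unless raw message delivery is enabled on the subscription, the SQS body is a JSON envelope and the published text sits in its "Message" field.

```python
import json


def publish_event(sns_client, topic_arn, event):
    """Publish a dict as a JSON string to an SNS topic.

    sns_client is assumed to be a boto3 SNS client (or anything with a
    compatible publish method); topic_arn is the ARN of the topic.
    """
    response = sns_client.publish(TopicArn=topic_arn, Message=json.dumps(event))
    return response["MessageId"]


def extract_event(sqs_body):
    """Unwrap an SNS notification that was delivered to an SQS queue.

    Assumes raw message delivery is NOT enabled, so the SQS body is a JSON
    envelope whose "Message" field holds the string that was published.
    """
    envelope = json.loads(sqs_body)
    return json.loads(envelope["Message"])
```

Each consumer would call extract_event on the body of every message it receives from its own queue.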

SQS and SNS are very simple, so there is not much to say. Just some notes:

SQS has two queue types: FIFO queues and standard queues. A FIFO queue preserves message order and provides exactly-once processing, but its throughput is limited (3,000 messages per second with batching). A standard queue provides at-least-once delivery, does not guarantee message order, and has nearly unlimited throughput. In my case I use a standard queue, since order is not very important.
SQS message size limit is 256KB.
Use goaws for local development. It has problems processing message attributes, but I only use the message body. Messages are stored only in RAM and are cleared on restart.
If messages fail to be delivered from SNS to SQS, you can set up the topic’s SQS failure feedback role to log to CloudWatch. In most cases the failure is caused by IAM permissions.
Messages in SQS can be retained for at most 14 days.
Once a message is received by a client, it is invisible to other clients for visibility_timeout_seconds (default 30s). This means that if your client fails to process the message and does not delete it, it will be redelivered after 30s.
SQS clients can use long polling to receive messages. Set receive_wait_time_seconds to reduce the number of API calls and thus the cost.
If your client fails to process a message because of a bug, the message will be redelivered in a loop. Set redrive_policy on the queue to limit the retry count, and set up a dead letter queue to store those messages. You can decide how to handle them later.
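
The visibility timeout, long polling and redelivery notes come together in the consumer loop. Here is a minimal sketch in Python, assuming a boto3-style SQS client is passed in. poll_once and handler are hypothetical names, and a real worker would call this in a loop. The key point is that a message is deleted only after the handler succeeds. A failed message is simply left alone, so it becomes visible again after the visibility timeout.

```python
def poll_once(sqs_client, queue_url, handler, wait_seconds=20):
    """Receive one batch with long polling and process it.

    sqs_client is assumed to be a boto3 SQS client (or compatible object).
    handler is a hypothetical callable that takes the message body and
    raises an exception on failure.
    Returns the number of messages that were processed and deleted.
    """
    response = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,        # SQS returns at most 10 messages per call
        WaitTimeSeconds=wait_seconds,  # long polling, at most 20 seconds
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Do not delete: the message reappears after the visibility
            # timeout, and moves to the dead letter queue once the redrive
            # policy's receive limit is exceeded.
            continue
        sqs_client.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

Passing the client in as a parameter also makes it easy to point the same code at goaws locally or at a fake client in tests.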

I set up SNS and SQS via Terraform, using the following resources:

aws_sns_topic
aws_sns_topic_subscription
aws_sqs_queue
aws_sqs_queue_policy (queue policy that allows SNS to send messages to SQS)
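
Two of these resources take JSON documents as strings: the policy of aws_sqs_queue_policy and the redrive_policy of aws_sqs_queue. To show their shape, here is a small Python sketch that builds both. The function names, the default retry count of 5 and any ARNs you pass in are illustrative assumptions, not our actual configuration.

```python
import json


def sns_to_sqs_policy(queue_arn, topic_arn):
    """Build a queue policy that lets one SNS topic send to one SQS queue."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "sns.amazonaws.com"},
            "Action": "sqs:SendMessage",
            "Resource": queue_arn,
            # Only accept messages that come from this specific topic.
            "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
        }],
    })


def redrive_policy(dead_letter_queue_arn, max_receive_count=5):
    """Build a redrive policy: after max_receive_count receives without a
    delete, SQS moves the message to the dead letter queue."""
    return json.dumps({
        "deadLetterTargetArn": dead_letter_queue_arn,
        "maxReceiveCount": max_receive_count,
    })
```

If the queue policy is missing or its condition does not match the topic ARN, SNS deliveries fail without any error on the publishing side, which is the permission problem the failure feedback role helps to surface.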

