
How can I send large messages with Kafka (over 15MB)?

java
performance-testing
kafka-configuration
data-management
by Nikita Barsukov · Dec 24, 2024
TLDR

Maximise Kafka's payload handling with these system-wide settings:

  • Broker:

    • message.max.bytes: Raise the broker's ~1MB default to 15728640 (≈15MB).
    • replica.fetch.max.bytes: Set this to at least the value above so followers can replicate the larger messages.
  • Producer:

    • max.request.size: Bump this up to 15728640 to enable larger payloads.

In application:

Broker (server.properties):
# Let's treat our messages to more space! ☕

message.max.bytes=15728640
replica.fetch.max.bytes=15728640

Producer (Java snippet):
// Time to lift those payload limits! 🏋️‍♀️

Properties props = new Properties();
props.put("bootstrap.servers", "kafka-broker:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
props.put("max.request.size", "15728640");
KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

Optimise Kafka by ensuring these values align with your data requirements. Encounter storage issues? Offload data to an external storage system or break it into manageable chunks.

Streamlined data management techniques

The right serialization matters

Choose a binary serialization format like Avro, Protobuf or Thrift over language-specific serialization for efficiency. They require schema management, but the benefits often outweigh the added complexity. For textual data, consider using efficient compression algorithms (like GZIP) for drastic size reduction.
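Enabling GZIP compression on the producer, for example, is a one-line config change (a sketch; compression is applied per batch before records leave the producer):

```
# producer config — compress batches before they hit the wire
compression.type=gzip
```

Compression trades CPU for network and storage savings, which usually pays off for large, repetitive textual payloads.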

Offload to external storage

For gigantic payloads, divert the core data to external storage systems such as Amazon S3 or HDFS. Kafka can then be used to share the references or URLs.
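A minimal sketch of this "claim check" pattern — the bucket name, topic, and the uploadToS3 stub are hypothetical placeholders for a real S3 client call:

```java
import java.util.UUID;

public class ClaimCheck {
    // Hypothetical stub — a real implementation would call the AWS S3 SDK here
    static String uploadToS3(byte[] payload) {
        String key = "kafka-payloads/" + UUID.randomUUID();
        // s3Client.putObject(...);  // actual upload of the payload goes here
        return "s3://my-bucket/" + key;
    }

    public static void main(String[] args) {
        byte[] largePayload = new byte[20 * 1024 * 1024]; // 20MB — too big to ship raw
        String reference = uploadToS3(largePayload);
        // Kafka now carries only a tiny pointer, e.g.:
        // producer.send(new ProducerRecord<>("large-events", reference));
        System.out.println(reference);
    }
}
```

Consumers receive the small reference message and fetch the heavy payload from storage themselves, keeping Kafka's brokers lean.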

Network setup fine-tuning

Large messages call for a review of your network buffers and timeouts. If transfers stall or throughput drops, consider raising socket.receive.buffer.bytes and socket.send.buffer.bytes on the brokers.

Keeping component settings in harmony

Ensure coherence in max.partition.fetch.bytes (consumer), max.request.size (producer), and message.max.bytes (broker). Harmony across all brokers, producers, and consumers results in a more fluent Kafka journey.
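On the consumer side, that alignment might look like this (the broker address and group id are placeholders):

```java
import java.util.Properties;

public class LargeMessageConsumerConfig {
    static Properties largeMessageConsumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092"); // placeholder address
        props.put("group.id", "large-message-consumers");
        // Must be >= the broker's message.max.bytes, or oversized records never arrive
        props.put("max.partition.fetch.bytes", "15728640");
        props.put("fetch.max.bytes", "15728640");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(largeMessageConsumerProps().getProperty("max.partition.fetch.bytes"));
    }
}
```

A consumer whose max.partition.fetch.bytes is smaller than the broker's message.max.bytes will quietly stall on partitions containing oversized records, which is exactly the mismatch this alignment prevents.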

Replication nuances & silent failures

Silent failures can occur when a message accepted by the leader exceeds replica.fetch.max.bytes on the followers, leaving replicas lagging behind. Avoid cluster inconsistencies by monitoring your logs and under-replicated partition metrics.

Testing & Validation

Thorough testing under all possible scenarios is a must. Utilise the performance testing tools that Kafka provides to simulate production loads.
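For instance, Kafka ships with kafka-producer-perf-test.sh, which can push synthetic ~15MB records through your configured pipeline (the topic name and broker address below are placeholders):

```shell
# Fire 100 records of ~15MB each at the cluster, throttled to 1 msg/s
bin/kafka-producer-perf-test.sh \
  --topic large-message-test \
  --num-records 100 \
  --record-size 15728640 \
  --throughput 1 \
  --producer-props bootstrap.servers=kafka-broker:9092 max.request.size=15728640
```

The tool reports throughput and latency percentiles, making it easy to compare the cluster's behaviour before and after raising the size limits.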

Evaluate the hit on performance

Handling large messages can impact Kafka cluster's throughput, latency, and stability. Resorting to a hybrid approach can help, where messages are directed through Kafka or alternate routes based on their size or importance.

Monitoring

Be aware of silent issues

Kafka won't always surface errors when large messages breach broker-side limits — replication, in particular, can fail quietly. To catch these silent failures, watch broker logs for errors related to replica.fetch.max.bytes and keep an eye on under-replicated partition metrics.

Managing timeouts and retries

Consider increasing request.timeout.ms and adjusting retry.backoff.ms in your producer configuration to avoid unnecessary retries caused by the longer processing time of very large messages.
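For instance (the values below are illustrative starting points, not recommendations):

```
# producer config — give large requests more time before retrying
request.timeout.ms=120000
retry.backoff.ms=500
```

The defaults (30000 and 100 respectively) are tuned for small messages; multi-megabyte requests simply need more headroom.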

Observability is key

Observability into your Kafka environment is essential for sniffing out potential issues early. Tools like Prometheus and Grafana, or Kafka's built-in JMX metrics, help you maintain system health.