Vertical Slice Benefit

2022 Oct 11 byDruDru

Recently, while demoing Message Aid to a colleague the demo god's blessed me with a teachable moment. While on the demo, what should have been an easy operation was failing. At the time I was completely unable to see why the application was failing. After the call, I quickly SSH'd into the server to inspect things. After a wee bit of journald and grep I was able to see exactly why it was failing. Thank you to solid logging.

That next week, I set out to implement centralized logging so that if this happened in the future I wouldn't be caught in a lurch. While implementing centralized logging and making it visible on a Grafana instance, I ran across an unexpected benefit of the vertical slice architecture.

The Idea

In a variety of languages and frameworks its common to see the log message annotated with the class name. In the case of Java or dotnet, I've grown accustom to my log messages always containing the fully qualified name of the class in the log message:

[10:00:02 MessageAid.SomeNamespace.Class INFO] Look! A log message

or in JSON (Serilog)

{
  "@l":"INFO",
  "@t":"2022-10-11T14:34:46.4198800Z",
  "@m":"Look! A log message",
  "@i":"1d4cc25c",
  "SourceContext": "MessageAid.SomeNamespace.Class"
}

While playing around in Grafana I had the obvious idea that I could monitor the health of my feature by tracking the log messages for the namespace that contained the vertical slice. For this example the feature is a message locker, which stores your messages in your own cloud account. All of the functionality is stored in a single directory/namespace, which allows me to scope the log messages I care about to a single namespace.

{app="ma", source_context=~"MessageAid.MessageLocker.+"}

loki query

However, that doesn't help me see the success or failure of the feature, for that I'm going to group it by the log level.

sum by (level) (
  count_over_time(
    {app="ma", source_context=~"MessageAid.MessageLocker.+"}[$__interval]
  )
)

loki query

Currently, there where only info messages, so this graphic is showing log volume by host (which in the future would be the log level)

With this query, I can now see the overall volume of logs, which gives me a sense of activity in the system. Even a spike in "good" messages is an interesting signal that I can keep my eye on. Being able to group them by level gives me a nice, log data only, overview of the system. If I see an increase in warning or error, I can zoom into those logs quickly to diagnose the issue.

In closing

Like most operations people, I like being able to see what my system is doing. I've previously been limited in what logging tools I could deploy and use based on either cost or scale. Right now Message Aid is still very small, so implementing a log aggregation system at this point was easy and pleasant.

© 2023 A Curious Mind, LLC. All rights reserved.
Message Aid and the Message Aid logo are trademarks and tradenames of A Curious Mind, LLC and may not be used or reproduced without consent.