Application Monitoring using Splunk

Across the enterprise, there is untapped value in the machine data generated by your business infrastructure and applications. All your IT applications, systems and technology infrastructure generate data every millisecond of every day. This machine data is one of the fastest growing and most complex areas of big data. It’s also one of the most valuable, containing a definitive record of all user transactions, customer behavior, sensor activity, machine behavior, security threats, fraudulent activity and more. Making use of this data, however, presents real challenges. Traditional data analysis, management and monitoring solutions are simply not engineered for this high volume, high velocity and highly diverse data.

Splunk Enterprise makes it easy for Cayan to collect, analyze and act upon the big data generated by our technology infrastructure, security systems and business applications, giving us the insights to drive operational performance and business results.

By monitoring and analyzing everything from merchants' transactions to security events and network activity, Splunk Enterprise helps Cayan gain valuable Operational Intelligence from its machine-generated data. Splunk collects data from virtually any source, including logs, sensors, network traffic, web servers, and custom applications. Having easy access to this data allows us to search, monitor and analyze our operational data and discover insights across multiple use cases like security, IT operations and application delivery, giving us valuable intelligence across our entire organization. With Splunk Enterprise, everyone from data or security analysts to product owners can gain insights to drive operational performance and business results. Whether we’re looking to troubleshoot IT, monitor our security posture or optimize our marketing campaigns, Splunk Enterprise helps us get there.

Realtime Transaction Monitoring and Alerting

Splunk alerts can be based on a variety of threshold and trend-based conditions, at any level of granularity. The search language goes beyond simple Boolean searches into fielded searches, statistical searches and subsearches. You can correlate on anything you want and alert on complex patterns such as abandoned shopping carts, brute force attacks and fraud scenarios. Alerts can also be embedded with machine data context, which helps reduce mean time to resolution (MTTR).

Splunk's realtime monitoring and alerting is a critical part of Cayan's Network Operations Center strategy. Splunk helps Cayan monitor its merchants' and partners' experiences, and alerts us whenever something doesn't look quite right. Using Splunk, we turn our searches into real-time alerts. Searches can be saved and scheduled for continual monitoring and can trigger alerts via email, which integrate into our PagerDuty and StatusPage.io systems.

Scheduling alerts is a great way to complete the investigation of a problem or security incident by proactively looking for similar occurrences in the future. Cayan has over 60 alerts that are constantly monitoring our production environment and payment gateway. This is a form of passive monitoring - Splunk is analyzing the telemetry generated by our merchants' transactions. This passive monitoring solution integrates with PagerDuty and StatusPage.io, and is the perfect complement to our active monitoring solution, using synthetic transactions.

Some examples of how Cayan uses Splunk alerts include:

  • Monitoring for unusual and unexpected responses from our payment partners, such as network communication failures
  • Monitoring merchants' transactions for unusual numbers of declines
  • Basic fraud detection, by monitoring the numbers of refunds and authorizations performed
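
The decline-monitoring idea in the list above can be illustrated outside Splunk as a simple threshold check. The following is a minimal sketch in Python, not Cayan's actual alert definition: the event fields (`merchant`, `status`), the 50% threshold and the minimum sample size are all assumptions made for the example.

```python
from collections import defaultdict


def merchants_over_decline_threshold(events, threshold=0.5, min_transactions=5):
    """Return {merchant: decline_rate} for merchants whose decline rate
    exceeds `threshold`.

    `events` is an iterable of dicts with the hypothetical keys
    "merchant" and "status" (not a real Cayan or Splunk schema).
    """
    totals = defaultdict(int)
    declines = defaultdict(int)
    for event in events:
        totals[event["merchant"]] += 1
        if event["status"] == "DECLINED":
            declines[event["merchant"]] += 1

    flagged = {}
    for merchant, total in totals.items():
        if total < min_transactions:
            continue  # too few transactions to judge this merchant
        rate = declines[merchant] / total
        if rate > threshold:
            flagged[merchant] = rate
    return flagged


# Made-up sample data: m-1 declines 2 of 10, m-2 declines 6 of 8.
sample = (
    [{"merchant": "m-1", "status": "APPROVED"}] * 8
    + [{"merchant": "m-1", "status": "DECLINED"}] * 2
    + [{"merchant": "m-2", "status": "DECLINED"}] * 6
    + [{"merchant": "m-2", "status": "APPROVED"}] * 2
)
alerts = merchants_over_decline_threshold(sample)
print(alerts)  # only m-2 is flagged, with a decline rate of 0.75
```

A scheduled search would run the equivalent check over a rolling time window and notify someone when the result is not empty. The minimum sample size keeps a merchant with one or two transactions from triggering a page.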

If a production incident happens, such as a payment partner experiencing an outage, Cayan can use Splunk to automatically update StatusPage.io with information about the outage and quickly identify the impacted merchants and partners, so that we can send them a notification and incident report.

Security and Network Monitoring

Network and Technical Operations: provide rapid incident response, real-time correlation and in-depth monitoring across data sources; conduct statistical analysis for advanced pattern detection and threat defense.

Infrastructure and Operations Management: proactively monitor across IT silos to ensure uptime; rapidly pinpoint and resolve problems; identify infrastructure service relationships, establish baselines and create analytics to report on SLAs or track SLAs of service providers.

Splunk Enterprise lets you correlate complex events from multiple data sources across your IT infrastructure so you can monitor more meaningful events. For example, you can track a series of related events as a single transaction to measure duration or status.

All of Cayan's application servers and networking appliances forward data to Splunk. Splunk makes it ridiculously easy to ingest data from a variety of sources. Cayan's C# application servers forward application telemetry using syslog4net, a syslog companion library for the popular log4net logging framework. Splunk's universal forwarders pick up the data and send it to our centralized logging servers. Our networking appliances (switches, routers, firewalls, ...) also send every network event to Splunk over the syslog protocol. By appending a simple Correlation ID to each message - a UUID - we're able to track every step in a transaction's lifecycle from the moment it reaches our perimeter network, across each hop in our service-oriented architecture, to when it gets routed to one of our payment partners, and then monitor everything on its way back out again.
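
To make the Correlation ID idea concrete, here is a small Python sketch of how events that share a UUID could be stitched back into one transaction with an ordered list of hops and a total duration. It is an illustration only: the event fields (`correlation_id`, `source`, `timestamp`) and the hop names are hypothetical, and in practice the grouping is done by a Splunk search rather than by application code.

```python
import uuid
from datetime import datetime


def new_correlation_id():
    """Generate a random UUID to stamp on every message of one transaction."""
    return str(uuid.uuid4())


def summarize_transactions(events):
    """Group events by correlation ID and report hops and duration.

    `events` is a list of dicts with the hypothetical keys
    "correlation_id", "source" and "timestamp" (a datetime).
    """
    grouped = {}
    for event in events:
        grouped.setdefault(event["correlation_id"], []).append(event)

    summary = {}
    for cid, group in grouped.items():
        # Events can arrive out of order, so sort by timestamp first.
        ordered = sorted(group, key=lambda e: e["timestamp"])
        elapsed = ordered[-1]["timestamp"] - ordered[0]["timestamp"]
        summary[cid] = {
            "hops": [e["source"] for e in ordered],
            "duration_seconds": elapsed.total_seconds(),
        }
    return summary


cid = new_correlation_id()
events = [
    {"correlation_id": cid, "source": "gateway",
     "timestamp": datetime(2016, 1, 1, 12, 0, 0, 250000)},
    {"correlation_id": cid, "source": "firewall",
     "timestamp": datetime(2016, 1, 1, 12, 0, 0)},
    {"correlation_id": cid, "source": "payment-partner",
     "timestamp": datetime(2016, 1, 1, 12, 0, 1, 500000)},
]
result = summarize_transactions(events)
print(result[cid])
```

Because every system writes the same ID, the grouping step needs no knowledge of which application or appliance produced a given line.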

Development, QA, Production Support, and Release Engineering

Accelerate and empower developers, QA and operations; provide end-to-end visibility across distributed applications and infrastructures; troubleshoot and isolate problems across application environments; measure service levels and application performance; gain insight on mobile app performance.

Splunk plays a pivotal part in Cayan's development, quality assurance, production support and release engineering activities. Applications have long lives - the time you spend developing and testing your app is just the first part of its life cycle. Splunk gives us insight into how our customers are using our products, and lets us know of any unexpected behaviors noticed in production that we might not have seen in testing. Splunk gives us visibility into our technical debt, application performance, and a whole lot more.

This sort of realtime insight into our customers' experiences helps our Release Engineering team make game-day decisions. Cayan typically releases updates to its gateway multiple times each day without any downtime, using a strategy called a Blue/Green Deployment. We'll typically start things off by selecting a small group of merchants to be "canaries" and roll out our changes to them first. As a final safeguard after all of our automated tests and user acceptance tests have happened, Splunk will tell us if this handful of merchants runs into any issues in production, and whether it's safe to roll the changes out or whether we need to roll things back. We would not be able to move at such a high velocity without the operational insight provided by Splunk.
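
The canary check described above boils down to comparing the canary merchants' error rate against everyone else's. The sketch below shows one way that comparison might look in Python. The function name, the counts passed in and the one-percentage-point tolerance are assumptions for illustration, not Cayan's actual release criteria.

```python
def canary_decision(canary_errors, canary_total,
                    baseline_errors, baseline_total, tolerance=0.01):
    """Return "roll out", "roll back" or "wait" for a canary release.

    The canary error rate is compared with the baseline error rate of
    merchants still on the old version; `tolerance` is a hypothetical
    allowance for normal noise.
    """
    if canary_total == 0:
        return "wait"  # no canary traffic yet, nothing to judge
    canary_rate = canary_errors / canary_total
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    if canary_rate > baseline_rate + tolerance:
        return "roll back"
    return "roll out"


# Made-up counts: the baseline fails 50 of 10,000 transactions (0.5%).
healthy = canary_decision(1, 200, 50, 10000)     # canary at 0.5%
unhealthy = canary_decision(12, 200, 50, 10000)  # canary at 6%
print(healthy, "/", unhealthy)
```

Comparing against a live baseline rather than a fixed number means a partner outage that affects all merchants does not get blamed on the new release.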

Business Analysts

Advanced features enable technical users to explore and interact with their machine data with a powerful user interface and Search Processing Language designed to search, correlate and visualize data. Business users can gain rapid insights using a simple drag-and-drop interface to analyze data without learning the search language. Pattern detection, instant pivot and an advanced field extractor make it easy for everyone in your organization, including nonspecialists, to turn machine data into powerful insights.

By creating reports and dashboards from our application data, Cayan is using Splunk as a mini data warehouse, and can provide up-to-the-minute details about our business to business analysts and executives. It's easy to see breakdowns between our different product lines, or to use Splunk to answer questions about how our merchants use our products, like "what % of all transactions use Apple Pay", and then come up with an A/B test to see if we can move that needle.
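
A question like the Apple Pay one is a share-of-total calculation over transaction events. As a rough Python illustration, assuming each transaction carries a hypothetical `payment_method` field and using made-up sample data:

```python
from collections import Counter


def payment_method_share(transactions):
    """Return {payment_method: percentage of all transactions}.

    `transactions` is an iterable of dicts with the hypothetical key
    "payment_method".
    """
    counts = Counter(t["payment_method"] for t in transactions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {method: 100.0 * n / total for method, n in counts.items()}


# Made-up sample: 3 Apple Pay transactions out of 20.
transactions = (
    [{"payment_method": "apple_pay"}] * 3
    + [{"payment_method": "card"}] * 17
)
shares = payment_method_share(transactions)
print(shares["apple_pay"])  # 15.0
```

Running the same calculation separately for the two groups of an A/B test shows whether the share actually moved.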