Machine Data to Readable Reports - System Monitoring, Alerting and Reporting - Ashley Fisher, Business Systems Analyst, University of the Sunshine Coast | ANZTLC15

Within a year, USC has enhanced various system administration tasks. From lengthy file and database interrogation, we have moved to a proactive, instant alerting process in which incidents are captured and actioned before staff and students are impacted. A number of commercial, open-source and in-house tools have been used to facilitate these improvements, and sights are now set on shifting to self-healing incidents. Delivered at Innovate and Educate: Teaching and Learning Conference by Blackboard, 24-27 August 2015 in Adelaide, Australia.
  • 1. Machine Data to Readable Reports: System Monitoring, Alerting and Reporting. Ashley Fisher, University of the Sunshine Coast, Queensland.
  • 2. Welcome. Ashley Fisher, Business Systems Analyst, University of the Sunshine Coast, Sunshine Coast, Queensland, Australia.
  • 3. System Health • Monitoring • Alerting • Reporting
  • 4. Microsoft Windows Ahead. While this presentation focuses on Microsoft Windows Server and associated technologies, the concepts and implementation of these systems are similar in other operating environments.
  • 5. Underlying Infrastructure • USC is Microsoft-centric • Servers are running on Windows Server 2008 R2 • Authentication through Active Directory • Currently running Microsoft SQL Server 2008
  • 6. Blackboard Infrastructure • 5 Environments • Total 12 Application Servers • 3 Dedicated Batch Servers • 4 SQL Clusters, 1 Standalone MSSQL Installation • 7 F5 BigIP Pools • 7 TB File Share Storage • Approx. 12,000 Successful Logins per Day
  • 7. Mediasite Infrastructure • 2 Environments • Total 12 Application Servers • 2 SQL Clusters • 8 F5 BigIP Pools • 9.5 TB File Share Storage • 380 Recorded Presentations per Week • Approximately 1,100 hours of content viewed per Day
  • 8. Monitoring Systems In Place • Nagios – Monitoring Server Availability • Zabbix (Pictured Left) – Monitoring Server Availability and Performance – Currently Proof of Concept • Splunk – Log Monitoring
  • 9. Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations. Splunk has a mission of making machine data accessible across an organization by identifying data patterns, providing metrics, diagnosing problems and providing intelligence for business operations. Splunk is a horizontal technology used for application management, security and compliance, as well as business and web analytics.
  • 10. Splunk Interface
  • 11. Blackboard Logging • 67 log files on each Blackboard host » A lot of information we can use, and are using. » A lot we’re potentially missing. • Daily rotation of important logs » Troubleshooting issues across multiple days is frustrating. • Logs archived weekly, on Monday morning » As above, except that we also need to unzip the archived logs to get access to the information they contain.
  • 12. Blackboard Database • The activity_accumulator table retains a transcript of user activity. • We can use table joins against it to track user login times, course access times, and individual content item interactions. • USC rotates its activity_accumulator table data into a backup database every 180 days.
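As a rough illustration of the kind of join slide 12 describes, the sketch below uses an in-memory SQLite database with a heavily simplified stand-in schema. The column names (`user_pk1`, `course_pk1`, `event_type`, `timestamp`), the `users` table layout, the event type values and the sample rows are all assumptions made for illustration. They are not taken from the presentation, and USC runs Microsoft SQL Server rather than SQLite.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (pk1 INTEGER PRIMARY KEY, user_id TEXT);
        CREATE TABLE activity_accumulator (
            pk1 INTEGER PRIMARY KEY,
            user_pk1 INTEGER,
            course_pk1 INTEGER,
            event_type TEXT,
            timestamp TEXT
        );
        -- Sample rows are invented for the demonstration.
        INSERT INTO users VALUES (1, 'student_a'), (2, 'student_b');
        INSERT INTO activity_accumulator VALUES
            (1, 1, NULL, 'LOGIN_ATTEMPT', '2015-08-24 08:55:00'),
            (2, 1, 101,  'COURSE_ACCESS', '2015-08-24 08:57:30'),
            (3, 2, NULL, 'LOGIN_ATTEMPT', '2015-08-24 09:10:00');
        """
    )
    return conn


def activity_for_user(conn, user_id):
    """Return (timestamp, event_type, course_pk1) rows for one user, oldest first."""
    query = """
        SELECT aa.timestamp, aa.event_type, aa.course_pk1
        FROM activity_accumulator AS aa
        JOIN users AS u ON u.pk1 = aa.user_pk1
        WHERE u.user_id = ?
        ORDER BY aa.timestamp
    """
    return conn.execute(query, (user_id,)).fetchall()
```

Calling `activity_for_user(build_demo_db(), "student_a")` returns that user's login and course access rows in time order, which is the shape of timeline an investigation into a contested submission would start from.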
  • 13. Student Contesting Late Submission Penalty. Students are penalised by a percentage of their received grade for late assignment submissions, and they do contest the penalty from time to time. • Traditional Method of Investigation – Database Query (activity_accumulator) – Individual Host Log Interrogation (Repeat) • Lots of Steps • Time Consuming • Room for Error or Misinterpretation
  • 14. Student Contesting Late Submission Penalty. Students are penalised by a percentage of their received grade for late assignment submissions, and they do contest the penalty from time to time. • Intermediate Method – Database Query (activity_accumulator) – Log Into Splunk – Search string: index="blackboard_prod" "_userpk1_" • Few Steps • Easy Training • Now Dashboarded
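The search string on slide 14 is simple enough to build by hand, but a small helper keeps the quoting consistent when the lookup is repeated for many students. The following is a minimal sketch: the function name `build_search` and its parameters are hypothetical, and the optional `earliest` argument assumes Splunk's standard `earliest=` time modifier is wanted, which the slide does not mention.

```python
def build_search(index, user_token, earliest=None):
    """Build a Splunk search string like the one shown on slide 14.

    `user_token` is whatever identifier the database query returned for
    the student; it is passed through as a quoted literal search term.
    """

    def quote(value):
        # Escape backslashes and double quotes so the term stays one literal.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    parts = ["index=" + quote(index), quote(user_token)]
    if earliest:
        # Optional time window, e.g. "-7d" for the last seven days.
        parts.append("earliest=" + earliest)
    return " ".join(parts)
```

For example, `build_search("blackboard_prod", "_userpk1_")` returns `index="blackboard_prod" "_userpk1_"`, matching the slide.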
  • 16. Zabbix is an enterprise open source monitoring solution for networks and applications (…) It is designed to monitor and track the status of various network services, servers, and other network hardware. • Simple checks can verify the availability and responsiveness of standard services such as SMTP or HTTP without installing any software on the monitored host. • A Zabbix agent can also be installed on UNIX and Windows hosts to monitor statistics such as CPU load, network utilization, disk space, etc. • As an alternative to installing an agent on hosts, Zabbix includes support for monitoring via SNMP, TCP and ICMP checks, as well as over IPMI, JMX, SSH, Telnet and using custom parameters (…)
  • 17. Zabbix Interface: Overview/Landing Page
  • 18. Our Zabbix Environment • Zabbix takes a very template-centred view of deployment. • The approach we’ve taken is to have ‘opt-in’ templates available for hosts. • CPU Load, Memory Use, Network Traffic/Bandwidth and HDD Space checks are in a template added to all hosts with an agent installed.
  • 19. Zabbix Templates • Example Template: ‘Core Infrastructure Connectivity’. When this template is applied to a host, the Zabbix agent on the host will ping those end-points locally. We can see if an individual host cannot connect to the time servers, domain controllers or our LDAP servers.
  • 20. Blackboard and Zabbix • We have multiple Blackboard-specific templates. One is in line with the last example, but it watches the availability and response times of external connectors, for example SafeAssign and Collaborate.
  • 21. Blackboard and Zabbix
  • 22. Blackboard and Zabbix • One very powerful tool we have is JMX monitoring, which pulls information about the Blackboard application itself.
  • 23. Zabbix Environment Mapping. Zabbix allows you to map relationships between nodes and show where problems lie and what they impact. For example, if there was a problem with file03, the line between bbdev01 and file03 would turn red and file03’s status would change from OK to Problem. This is an easy way to assess what a problem will impact.
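The impact assessment that the map on slide 23 makes visual can also be expressed as a walk over a dependency graph. The sketch below is not Zabbix configuration or its API; it only illustrates the idea. The host names `bbdev01` and `file03` come from the slide, while `sql01` and `lb01` in the usage note, and the function name `impacted_by`, are hypothetical.

```python
from collections import deque


def impacted_by(depends_on, problem_node):
    """Return every node that depends, directly or indirectly, on problem_node.

    `depends_on` maps each node to the nodes it relies on,
    e.g. {"bbdev01": ["file03"]} means bbdev01 needs file03.
    """
    # Invert the mapping: for each node, which nodes rely on it?
    dependents = {}
    for node, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(node)

    # Breadth-first walk outwards from the node that has the problem.
    impacted = set()
    queue = deque([problem_node])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, ()):
            if node not in impacted:
                impacted.add(node)
                queue.append(node)

    impacted.discard(problem_node)  # only relevant if the graph has a cycle
    return impacted
```

With `{"bbdev01": ["file03", "sql01"], "lb01": ["bbdev01"]}`, a problem on `file03` reports both `bbdev01` and `lb01` as impacted, mirroring the red line on the map spreading upstream.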
  • 24. Mediasite and Zabbix • Mediasite is really at the forefront of our monitoring through Zabbix. • In Nagios, we currently have 5 checks per recorder in production. • In Zabbix so far, I have 26 individual checks per recorder.
  • 26. The Platforms in Collaboration • The graph on this slide shows the available space on our production Blackboard file share for the incident. • Emergency maintenance was carried out on the 15th to increase the allocated disk space.
  • 27. The Platforms in Collaboration • An alert was set up in Splunk to let us know, in real time, when a student submits an assessment submission greater than 200 MB.
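The alert condition on slide 27 is a plain threshold test. The sketch below shows that logic outside Splunk, as a filter over already-parsed events. It is not the Splunk alert itself: the field names `user` and `size_bytes`, the function name, and the choice of 1 MB = 1024 × 1024 bytes are assumptions made for illustration.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def large_submissions(events, threshold_mb=200):
    """Return the submission events whose size exceeds the threshold.

    Each event is a dict with hypothetical keys "user" and "size_bytes".
    """
    limit = threshold_mb * MB
    return [event for event in events if event["size_bytes"] > limit]
```

A submission of exactly 200 MB does not trigger the filter; only strictly larger ones do, matching the "greater than 200 MB" wording on the slide.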
  • 28. Self-Healing? The above video is the only way that I could think of to present this particular part. In the video, I have the Zabbix monitoring platform on one side, and a camera feed of the remote Mediasite recorder on the other. As illustrated in the previous slide, there are a few checks deemed “self-healing”; this is one such scenario. In the event that the Mediasite scheduler service fails or stops, Zabbix picks it up, realises that something is not right, and sends a command to the recorder to shut the software down and force a restart of the recorder.
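The self-healing flow on slide 28 follows a check, then remediate pattern. The sketch below captures only that control flow. It does not reproduce how Zabbix detects the failure or sends the command, and the three callables (`is_running`, `stop_software`, `restart_recorder`) are hypothetical stand-ins for whatever the monitoring platform actually invokes.

```python
def self_heal(is_running, stop_software, restart_recorder):
    """Run one check-and-remediate cycle for a recorder.

    is_running:        callable returning True if the scheduler service is up
    stop_software:     callable that shuts the recorder software down
    restart_recorder:  callable that forces the recorder to restart
    """
    if is_running():
        return "ok"
    # The service has failed or stopped: shut the software down first,
    # then force the restart, in the order described on the slide.
    stop_software()
    restart_recorder()
    return "restarted"
```

Passing the actions in as callables keeps the decision logic separate from the remote commands, so the cycle can be exercised with fakes before it is pointed at a real recorder.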
  • 29. Questions?