Processing Steps
Using the Quartz Scheduler to monitor SMART-enabled hard drives makes it possible to automate the entire process from data generation and collection to reporting. Key advantages are minimal user intervention and the compilation of a consistent long-term data set. Quartz provides a powerful and highly flexible scheduling framework that can execute hundreds or thousands of computing tasks repeatedly. Once set up, a schedule will run until the scheduler is interrupted or shut down. This is especially advantageous for recurring and repetitive tasks such as monitoring hardware. When the data gathered by such tasks is not only evaluated for possible errors or faulty behavior but also stored on a long-term basis, it is possible to compile a record of 'normal operation' from which newly occurring anomalies stand out more clearly.
SMART data is generated by running smartctl with the appropriate options on the host machine (see step 1 below). If the host machine is accessible via SSH, WebFamulus can execute smartctl itself and capture the output. Alternatively, a script (Python, Perl, etc.) installed on the host machine and run by a utility such as cron can execute smartctl and forward the output to WebFamulus via email or file upload.
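A host-side collection script of the kind just described might look like the following minimal Python sketch. It assumes smartctl is on the PATH and that the script runs with sufficient privileges to query the drive. The function names, the device path, and the output directory are hypothetical, and forwarding the file to WebFamulus is left out because the interface for that is not described here.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_command(device):
    """Return the smartctl invocation that prints all SMART information."""
    return ["smartctl", "-a", device]


def collect_report(device, out_dir, run=subprocess.run):
    """Run smartctl for one device and save its output to a timestamped file.

    `run` is injectable so the function can be exercised without real hardware.
    """
    result = run(build_command(device), capture_output=True, text=True)
    # smartctl's exit status is a bit mask, and a non-zero value does not
    # necessarily mean the command failed, so the output is always stored
    # and the status is recorded alongside it.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    name = device.strip("/").replace("/", "_")
    path = Path(out_dir) / f"{name}-{stamp}.txt"
    path.write_text(f"# exit status: {result.returncode}\n{result.stdout}")
    return path
```

A cron entry could then call something like `collect_report("/dev/sda", "/var/spool/smart")` once a day; both arguments are placeholders.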
[Figure: SMART-Data Processing]
The Smartmontools daemon (smartd) can be configured to send out emails when a new error is detected in a logfile or when a SMART attribute is failing. However, error logs are only updated when self-tests are run and, hence, an error may only be detected after a test has been initiated. SMART-enabled devices usually can run two self-tests: short and long. A short test, which takes about two to three minutes, updates the SMART attributes, checks the electrical and mechanical parts, and performs spot checks on the disk surface. A long test--which scans the entire disk surface for errors--may take hours to complete, depending on the size of the disk. Regular testing of hard drives is therefore imperative in order to minimize the danger of data loss due to equipment failure.
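Whether an attribute is failing can be read from the attribute table that `smartctl -A` prints. The following is a minimal sketch, assuming the usual ten-column ATA layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The sample text is illustrative rather than output from a real drive, and the function names are hypothetical.

```python
import re

# One row of the attribute table: ten whitespace-separated columns, the last
# of which (RAW_VALUE) may itself contain spaces.
ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.+?)\s*$"
)


def parse_attributes(report):
    """Extract attribute rows from smartctl text output as dictionaries."""
    rows = []
    for line in report.splitlines():
        match = ROW.match(line)
        if match:
            row = match.groupdict()
            for key in ("id", "value", "worst", "thresh"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def failing(rows):
    """Names of attributes whose normalized value is at or below a non-zero threshold."""
    return [r["name"] for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


# Illustrative sample only, not captured from a real device.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   036   036   036    Pre-fail  Always   FAILING_NOW 1240
194 Temperature_Celsius     0x0022   112   098   000    Old_age   Always       -       38 (Min/Max 20/52)
"""
```

Calling `failing(parse_attributes(SAMPLE))` flags only the reallocated-sector attribute; the temperature row has a threshold of zero and is therefore never reported as failing by this rule.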
WebFamulus uses the Quartz scheduler and an extensive background database to generate, store, and analyze SMART data. All steps are automated and run on a daily, weekly, or longer schedule but, through a web interface, they can also be executed on demand. Furthermore, some jobs are executed as sequences. For example, before a daily report is issued, a collector job gathers all SMART reports that have not been processed and initiates a parsing job; then graphs and summary reports are compiled, and a mailer job sends the report to the stakeholders. (That is, steps 2 to 6 are executed as a sequence before step 7 is initiated. Outside such a sequence, step 2, for example, may run independently at regular intervals.)
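The idea of running jobs as a sequence can be sketched in a few lines of Python. This is only an illustration of the chaining pattern, not WebFamulus's implementation or the Quartz API; the job functions and the shape of the shared context dictionary are hypothetical stand-ins for the collector, parser, summary, and mailer jobs.

```python
def run_sequence(jobs, context):
    """Run jobs in order; each job receives and returns the shared context.

    An exception in any job stops the chain, so no report is mailed
    from incomplete data.
    """
    for job in jobs:
        context = job(context)
    return context


def collect(ctx):
    # Gather only the reports that have not been processed yet.
    ctx["pending"] = [r for r in ctx["reports"] if not r["processed"]]
    return ctx


def parse(ctx):
    # A real parser would extract attributes; here reports are just marked done.
    for report in ctx["pending"]:
        report["processed"] = True
    ctx["parsed"] = len(ctx["pending"])
    return ctx


def summarize(ctx):
    ctx["summary"] = f"{ctx['parsed']} new SMART report(s) processed"
    return ctx


def mail(ctx):
    # A real mailer job would send email; this one only queues the text.
    ctx["outbox"].append(ctx["summary"])
    return ctx
```

Because each job is an ordinary callable, the same `collect` function could also be scheduled on its own at regular intervals, independent of the reporting chain.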
The advantages of regular testing and reporting are obvious: they establish a baseline of normal operation against which errors and faulty behavior are more easily noticed. A rise in the average working temperature, for example, could be an indication of a fan not working properly. This may go unnoticed if only occasional spot checks are done. With automated SMART testing and long-term data storage, the chances for early detection--before a serious hardware failure--are much higher.
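One simple way to compare a new reading against stored history is a mean-and-deviation check. The sketch below is an assumption about how such a check could be written, not a description of how WebFamulus evaluates its data; the function name, the factor `k`, the minimum sample count, and the 0.5-degree floor on the spread are all illustrative choices.

```python
from statistics import mean, stdev


def is_anomalous(history, reading, k=3.0, min_samples=7):
    """Flag a reading that exceeds the historical mean by more than k standard deviations."""
    if len(history) < min_samples:
        return False  # not enough data yet to define normal operation
    baseline = mean(history)
    # Floor the spread so that a perfectly flat history does not turn a
    # one-degree change into an alarm.
    spread = max(stdev(history), 0.5)
    return reading > baseline + k * spread
```

With ten daily temperature readings between 36 and 38 degrees Celsius, a new reading of 45 would be flagged while a reading of 38 would not.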
Yet, even a small office may have tens of computers to monitor, and testing and evaluating can be a time-consuming task. Automating the process with a Quartz-based application like WebFamulus allows hundreds or thousands of hard drives to be monitored every day with minimal intervention beyond the initial set-up. And, having test results stored in a database makes it possible to document past performance and maintenance, which is very useful for system administrators when hard disk monitoring is part of a service agreement.
