ZettaLogs – Online Log Management and Analytics System
Today’s online applications consist of middleware components (such as databases and web servers) working together with application software. These components, as well as the operating systems they run on, continuously generate log records that reflect their status. Collecting and aggregating these records at a central point for real-time examination, analysis and rule-based alert generation is crucial for monitoring the health of such component-based systems. Receiving instant notifications of error conditions and examining the sequence of events at varying levels of detail to determine the root cause are both necessary to keep the system running smoothly and to resolve errors with the least impact on customers. ZettaLogs is designed as a cloud-based log management service that meets these objectives. Users register for the service through the web interface and request the logs they are interested in from the ZettaLogs service. The system retains records for a retrospective period and provides users with the infrastructure and interfaces needed to analyze their records in real time.
- Analysis, archiving and aggregation of application logs in a central place,
- Supervising the health of applications through inspection of application logs,
- Decomposition of text logs into fields and indexing for real-time search,
- Visualization of metrics extracted from logs,
- Defining alerts on data extracted from logs and sending real-time notifications to team members when a problem occurs,
- Analysis:
  - Users can define thresholds for alert creation,
  - Correlative analysis using multi-search feature,
  - Aggregation analysis on log fields,
- Near real-time processing of streaming logs,
- Multi-tenant, scalable and fault-tolerant design
- Correlation between logs: Complex Event Processing (CEP)
  - Knowledge Representation and Reasoning (KRR): rule-based knowledge representation
  - Production rule system: rule engine
  - Easy DSL for rule entry: user / system rules
  - Facts: received logs and user / system definitions; users can define second- and third-level facts on the fly
- Automatic anomaly detection
  - Feature extraction (feature vectors from system/user facts)
  - Automatic and unsupervised learning
  - Rule extraction from learned models
  - Effective detection using rule engine
- User guidance:
  - Interactive feature definition
  - Capability to modify automatically extracted rules
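
As an illustration only, the decomposition of text logs into fields, aggregation on a log field, and threshold-based alerting listed above could be sketched as follows in Python. The log line layout, the field names, and the functions `parse_line`, `count_by_field` and `check_threshold` are assumptions made for this example; they do not describe the actual ZettaLogs implementation or its interfaces.

```python
import re
from collections import Counter

# Hypothetical log line layout: "<timestamp> <LEVEL> <service> <message>".
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)"
)


def parse_line(line):
    """Decompose one text log line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def count_by_field(records, field):
    """Aggregate parsed records by the value of one field."""
    return Counter(record[field] for record in records)


def check_threshold(records, level, threshold):
    """Return an alert message if the number of records at `level` reaches `threshold`."""
    count = count_by_field(records, "level")[level]
    if count >= threshold:
        return f"ALERT: {count} {level} records (threshold {threshold})"
    return None


# Made-up sample input; the last line does not follow the assumed layout and is skipped.
sample_lines = [
    "2024-01-01T10:00:00Z INFO web request served in 12 ms",
    "2024-01-01T10:00:01Z ERROR db connection refused",
    "2024-01-01T10:00:02Z ERROR db connection refused",
    "not a structured line",
]
records = [r for r in map(parse_line, sample_lines) if r is not None]
alert = check_threshold(records, "ERROR", threshold=2)
print(alert)
```

In this sketch the threshold is a plain count over an in-memory list. A real-time service would more likely evaluate such rules over a sliding time window on the incoming stream, which is omitted here for brevity.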