50,000-foot overview
- Transport: Jabber protocol
- Language: Python
- Probing: set of Python modules + API
- Triggers: set of Python modules + API
- Storage: local to nodes
- Query: API to query data on the node
- Monitor: mockup in GTK-Python
The big picture
Jabber for transport
- Instant messaging: permanent connection
- Offers group communication and point to point
- Homogeneous naming mechanism
- Open specification (IETF RFC)
- Free server with possibility of industrial solution
- Lazy: evaluation in 03/2001
Jabber Implementation
- Reimplemented minimal client capabilities in Python
- XML-RPC on top of Jabber for direct and group queries
- Encryption through SSL
- Authentication is weak ATM
- Code in CVS: rhn/playpen/Jabber
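
As a rough illustration of "XML-RPC on top of Jabber", the sketch below serializes a call with the standard library and wraps it in an `<iq/>` stanza. The stanza layout (a `jabber:iq:rpc` query element), the JID, and the method name `db.last` are assumptions made for the example; the slides do not say how the payload is actually framed.

```python
import xmlrpc.client
from xml.sax.saxutils import quoteattr


def rpc_stanza(to_jid, method, params, stanza_id="rpc1"):
    """Wrap an XML-RPC call in a Jabber <iq/> stanza (layout is an assumption)."""
    payload = xmlrpc.client.dumps(tuple(params), methodname=method)
    # Drop the XML declaration: it cannot appear inside a stanza.
    if payload.startswith("<?xml"):
        payload = payload.split("?>", 1)[1].lstrip()
    return (
        "<iq type='set' to=%s id=%s>"
        "<query xmlns='jabber:iq:rpc'>%s</query></iq>"
        % (quoteattr(to_jid), quoteattr(stanza_id), payload)
    )


def parse_rpc_payload(payload):
    """Decode an XML-RPC body back into (params, method_name)."""
    return xmlrpc.client.loads(payload)


# Hypothetical point-to-point query for the last load5 sample of one node.
stanza = rpc_stanza("node1@example.org/probe", "db.last", ["base/rhn/load5"])
```

A group query would presumably send the same payload to a group address instead of a single JID.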
Extensible/Flexible infrastructure
- 5 key APIs:
  - Probes: data acquisition
  - DB: data storage and query
  - Monitoring: checks for conditions and triggers events
  - Scheduler: on each client add/remove modules
  - Modules: dynamically query/check/load signed Python code
- Transparent remote invocation through XML-RPC/Jabber
Probes
- Base Python class + derivation
- Sampling frequency can be modified
- Predefined: disk, load5, network use (and services ports)
- Probes are named e.g. "base/rhn/load5"
- Adding SNMP or custom samples should be easy (Python)
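
A minimal sketch of what "base Python class + derivation" could look like. The class names, the `sample()`/`collect()` methods, and the record layout are hypothetical; only the probe name "base/rhn/load5" and the adjustable sampling frequency come from the slides.

```python
import os
import time


class Probe:
    """Base class: subclasses set a name and override sample()."""
    name = "base/unnamed"

    def __init__(self, interval=60.0):
        self.interval = interval  # seconds between samples

    def set_interval(self, seconds):
        """Change the sampling frequency at runtime."""
        if seconds <= 0:
            raise ValueError("interval must be positive")
        self.interval = seconds

    def sample(self):
        raise NotImplementedError

    def collect(self, now=None):
        """Return one (name, timestamp, value) record."""
        ts = time.time() if now is None else now
        return (self.name, ts, self.sample())


class Load5Probe(Probe):
    name = "base/rhn/load5"

    def sample(self):
        # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
        return os.getloadavg()[1]
```

Under this layout, an SNMP or custom probe would be another subclass that overrides `sample()`.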
DB
- Only local storage ATM
- Uses the bsddb Python interface to db (possible switch to DB3)
- API: list, last, closest, averages, avgList
- Auto cleanup by averaging after a month
- rpmfind since January: 40 MB of data
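
The query API names above (list, last, closest, averages, avgList) can be illustrated with an in-memory stand-in. The signatures and semantics below are guesses, not the real bsddb-backed implementation; `SampleStore` and `add()` are hypothetical names.

```python
import bisect


class SampleStore:
    """In-memory stand-in for the per-node sample database."""

    def __init__(self):
        self._data = {}  # probe name -> sorted list of (timestamp, value)

    def add(self, probe, ts, value):
        bisect.insort(self._data.setdefault(probe, []), (ts, value))

    def list(self, probe, start, end):
        """Samples with start <= timestamp <= end."""
        return [s for s in self._data.get(probe, []) if start <= s[0] <= end]

    def last(self, probe):
        samples = self._data.get(probe)
        return samples[-1] if samples else None

    def closest(self, probe, ts):
        """Sample whose timestamp is nearest to ts."""
        samples = self._data.get(probe)
        if not samples:
            return None
        return min(samples, key=lambda s: abs(s[0] - ts))

    def averages(self, probe, start, end):
        """Mean value over [start, end], or None if there are no samples."""
        values = [v for _, v in self.list(probe, start, end)]
        return sum(values) / len(values) if values else None

    def avgList(self, probe, start, end, step):
        """One (window_start, mean) pair per non-empty window of `step` seconds."""
        out = []
        t = start
        while t < end:
            values = [v for ts, v in self._data.get(probe, []) if t <= ts < t + step]
            if values:
                out.append((t, sum(values) / len(values)))
            t += step
        return out
```

A cleanup pass like the monthly averaging could, under these assumptions, replace old raw samples with the output of `avgList` at a coarser step.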
Monitoring
- Modules query the database for abnormal conditions
- Basically an entry point called periodically
- Anything scriptable in Python can be tested
- Dynamically removable
- Dependencies on probes are checked automatically
- Generate events which are logged and dispatched to the group
- Monitors are named too e.g. "rhn/base/netdev"
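
A sketch of a monitor as described above: a periodic entry point that queries the database, declares the probes it depends on, and returns events. The `Monitor` base class, the `check()` and `missing_probes()` methods, the event dictionary, and the monitor name "rhn/base/highload" are all hypothetical; `LastOnlyDB` is a stub that only keeps the example self-contained.

```python
class Monitor:
    """Base class for a monitor: check() is the periodic entry point."""
    name = "rhn/base/unnamed"
    requires = ()  # probe names this monitor depends on

    def missing_probes(self, available):
        """Dependency check: required probes that are not available."""
        return [p for p in self.requires if p not in available]

    def check(self, db):
        """Return a list of event dicts; empty when nothing is abnormal."""
        raise NotImplementedError


class HighLoadMonitor(Monitor):
    name = "rhn/base/highload"  # hypothetical monitor name
    requires = ("base/rhn/load5",)

    def __init__(self, threshold=4.0):
        self.threshold = threshold

    def check(self, db):
        sample = db.last("base/rhn/load5")
        if sample is None:
            return []
        ts, value = sample
        if value > self.threshold:
            return [{"monitor": self.name, "time": ts, "value": value,
                     "message": "load5 above %.1f" % self.threshold}]
        return []


class LastOnlyDB:
    """Tiny stand-in database exposing only last()."""

    def __init__(self, samples):
        self._samples = samples

    def last(self, probe):
        return self._samples.get(probe)
```

The returned events would then be logged locally and dispatched to the group by whatever calls `check()`.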
Scheduler
- Keep a list of modules to call and the interval
- Also check for incoming data from Jabber
- Handle the XML-RPC entry points
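
The "list of modules to call and the interval" could be kept roughly as follows. This is only a sketch with hypothetical names (`Scheduler`, `add`, `remove`, `run_due`); polling Jabber for incoming data and dispatching XML-RPC entry points are left out.

```python
import time


class Scheduler:
    """Keeps named callables and the interval at which each should run."""

    def __init__(self):
        self._jobs = {}  # name -> [callable, interval, next_due]

    def add(self, name, func, interval, now=None):
        now = time.time() if now is None else now
        self._jobs[name] = [func, interval, now]

    def remove(self, name):
        self._jobs.pop(name, None)

    def run_due(self, now=None):
        """Call every job whose deadline has passed; return the names that ran."""
        now = time.time() if now is None else now
        ran = []
        for name, job in list(self._jobs.items()):
            func, interval, due = job
            if now >= due:
                func()
                job[2] = now + interval  # schedule the next call
                ran.append(name)
        return ran
```

A main loop would call `run_due()` repeatedly and check the Jabber connection for incoming requests between calls.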
Modules
- PGP/GPG signed Python modules
- Uses popen3 to load only verified modules
- Problem: how to build the web of trust (seed + delegation)
- Allows extending both probes and monitors
- Module servers can be installed and queried for a given module
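
Loading only verified modules might look like the sketch below. It uses `subprocess` rather than popen3 and assumes a detached signature file checked with `gpg --verify`; the function names and the pluggable `verify` argument are hypothetical. A successful check only says the signature matches a key in the local keyring, so the web-of-trust question remains open.

```python
import subprocess
import types


def gpg_verify(module_path, signature_path):
    """Return True if gpg accepts the detached signature (needs gpg on PATH)."""
    result = subprocess.run(
        ["gpg", "--verify", signature_path, module_path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def load_verified(name, module_path, signature_path, verify=gpg_verify):
    """Load a module from disk only after its signature has been verified."""
    if not verify(module_path, signature_path):
        raise PermissionError("signature check failed for %s" % module_path)
    with open(module_path, "r", encoding="utf-8") as fh:
        source = fh.read()
    module = types.ModuleType(name)
    # Execute the verified source in the new module's namespace.
    exec(compile(source, module_path, "exec"), module.__dict__)
    return module
```

Passing a different `verify` callable makes it possible to exercise the loader without a GPG keyring.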
Monitor station
- Mockup in GTK-Python
- Shows basic interfaces, used to check the framework
- "nearly" usable
- I suck at UI work !!!
Monitor station screenshot
TODO List
- Improve the network code for disconnect
- Build an RHNAPP front-end connected to Jabber
- Database for sampling? DB3? Alternate storage nodes
- Dynamic triggers: threshold on values and deltas
- Packaging is weak
- Testing, testing, testing