Building Aegis: an anomaly-based IDS from scratch

December 20, 2025
security, ml, python, project

Why I built this

Most signature-based IDS tools are reactive: they can only catch attacks they have already seen. I wanted to explore whether a purely statistical and ML-based approach could flag novel anomalies without relying on any known signatures.

The approach

Aegis uses a two-layer detection strategy:

  1. Statistical baseline — rolling entropy, inter-arrival times, byte rate
  2. ML classifier — trained on labelled PCAP data using scikit-learn

The idea is that the statistical layer catches obvious outliers cheaply, and the ML layer handles subtler patterns.
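To make the layering concrete, here is a minimal sketch of one way such a cascade could be wired up. It is not the actual Aegis code: the z-score rule, the 3.0 threshold, and the names zscore_flags and hybrid_detect are all assumptions for illustration, and the classifier is any callable that returns a truthy value for an anomaly.

```python
from statistics import mean, stdev

def zscore_flags(values, baseline, threshold=3.0):
    """Layer 1 (assumed rule): flag values far from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Degenerate baseline: anything different from the mean is an outlier.
        return [v != mu for v in values]
    return [abs(v - mu) / sigma > threshold for v in values]

def hybrid_detect(values, baseline, classifier, threshold=3.0):
    """Run the cheap statistical check first; ask the classifier only
    about samples the statistical layer did not already flag."""
    flags = zscore_flags(values, baseline, threshold)
    # `or` short-circuits, so classifier() is skipped for flagged samples.
    return [flag or bool(classifier(v)) for flag, v in zip(flags, values)]

if __name__ == "__main__":
    baseline = [100, 102, 98, 101, 99]   # hypothetical byte-rate history
    samples = [100, 150, 97]
    # Stand-in for a trained model: treats unusually low rates as anomalous.
    print(hybrid_detect(samples, baseline, lambda v: v < 98))
```

The short-circuit is the point of the design: the more expensive model only runs on traffic that looks statistically normal.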

Processing the data

The biggest challenge was feature extraction from raw .pcap files. I used Scapy to parse packets and NumPy for sliding-window computation:

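import numpy as np

def extract_features(packets, window=100):
    # Slide a fixed-size window over the packet list, one packet at a time.
    features = []
    # The +1 makes sure the last full window is included.
    for i in range(len(packets) - window + 1):
        window_pkts = packets[i:i + window]
        # compute_entropy is a helper defined elsewhere in the project.
        entropy = compute_entropy(window_pkts)
        # Mean bytes per packet within the window.
        byte_rate = sum(len(p) for p in window_pkts) / window
        features.append([entropy, byte_rate, ...])  # "..." = remaining features, elided here
    return np.array(features)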
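The snippet above calls compute_entropy without showing it. A plausible version, assuming it means Shannon entropy over the raw bytes of every packet in the window, might look like the sketch below. This is a guess at the helper rather than the real implementation; it also assumes each packet can be turned into bytes with bytes(pkt), which holds for plain bytes objects.

```python
import math
from collections import Counter

def compute_entropy(window_pkts):
    """Shannon entropy, in bits per byte, of all bytes in the window."""
    counts = Counter()
    for pkt in window_pkts:
        # Assumption: bytes(pkt) yields the packet's raw bytes.
        counts.update(bytes(pkt))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # empty window: define entropy as zero
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

if __name__ == "__main__":
    print(compute_entropy([b"aaaa"]))        # a single repeated byte value
    print(compute_entropy([b"ab", b"ab"]))   # two equally likely byte values
```

Values near 8 bits per byte suggest encrypted or compressed payloads, while low values suggest repetitive or plain-text traffic.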

Results

Tested against a 50 MB PCAP dataset with injected anomalies:

Method            Precision  Recall  F1
Statistical only  0.71       0.84    0.77
ML only           0.83       0.79    0.81
Hybrid (Aegis)    0.91       0.88    0.89
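As a quick sanity check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the last column can be recomputed from the first two. The small script below does that with the precision and recall figures reported above; the function name f1_score is just a label chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs taken from the results reported above.
results = {
    "Statistical only": (0.71, 0.84),
    "ML only": (0.83, 0.79),
    "Hybrid (Aegis)": (0.91, 0.88),
}

if __name__ == "__main__":
    for method, (p, r) in results.items():
        print(f"{method}: F1 = {f1_score(p, r):.2f}")
```

The recomputed values round to 0.77, 0.81 and 0.89, matching the reported F1 scores.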

What I'd do differently