EXPERIENCE: 5–10 Years
Core Responsibilities:
• Pipeline Management: Maintain high-throughput streaming pipelines that ingest logs from various sources (Firewalls, Cloud, Endpoints) into a central destination.
• Log Normalization: Write parsers to convert raw, messy logs into standard schemas (e.g., OCSF or ECS) for consistent querying (see the parser sketch after this list).
• Cost Optimization: Implement routing logic to send "high-value" data to the SIEM and "bulk" data to low-cost Object Storage (Data Lake); a routing sketch also follows this list.
• Data Preparation: Clean and structure data to enable AI/ML detection models and advanced analytics.
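To illustrate the normalization work, here is a minimal sketch of a parser that maps one syslog-style line onto a small ECS-like schema. The sample line, regex, and field mapping are illustrative assumptions, not a complete ECS implementation:

```python
import re
from datetime import datetime, timezone

# Illustrative raw line; real sources (firewalls, cloud, endpoints) vary widely.
RAW = "<134>Jan 12 08:15:04 fw01 drop: src=10.0.0.5 dst=8.8.8.8 proto=tcp"

# Rough syslog shape: <PRI>MMM dd HH:MM:SS host tag: message
SYSLOG_RE = re.compile(
    r"<(?P<pri>\d+)>(?P<ts>\w{3} +\d+ [\d:]+) (?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)"
)

def to_ecs_like(raw: str) -> dict:
    """Map one raw syslog line onto a small, ECS-like schema (illustrative subset)."""
    m = SYSLOG_RE.match(raw)
    if m is None:
        # Park unparseable lines instead of silently dropping them.
        return {"event": {"kind": "pipeline_error"}, "message": raw}
    kv = dict(p.split("=", 1) for p in m["msg"].split() if "=" in p)
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),  # real code derives this from m["ts"]
        "host": {"name": m["host"]},
        "event": {"action": m["tag"]},
        "source": {"ip": kv.get("src")},
        "destination": {"ip": kv.get("dst")},
        "network": {"transport": kv.get("proto")},
    }

print(to_ecs_like(RAW))
```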
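For the cost-optimization routing, a common pattern is a value-based router in front of the SIEM. The tiers and matching rules below are assumptions for illustration, not a prescribed policy:

```python
# Illustrative tiering: security-relevant actions go to the SIEM,
# everything else to cheap object storage. The rule set is an assumption.
HIGH_VALUE_ACTIONS = {"drop", "deny", "alert", "authentication_failure"}

def route(event: dict) -> str:
    """Return a destination label for a normalized event."""
    action = event.get("event", {}).get("action", "")
    if action in HIGH_VALUE_ACTIONS:
        return "siem"       # low-latency search, expensive per GB
    return "data_lake"      # object storage, cheap and queryable later

def dispatch(event: dict, sinks: dict) -> None:
    sinks[route(event)].append(event)  # stand-in for real SIEM / object-store writers

sinks = {"siem": [], "data_lake": []}
dispatch({"event": {"action": "drop"}}, sinks)
dispatch({"event": {"action": "dns_query"}}, sinks)
print(len(sinks["siem"]), len(sinks["data_lake"]))  # 1 1
```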
Must-Have Skills:
• Data Engineering: Proficiency in Python (for ETL) and SQL (for complex querying).
• Streaming Tech: Experience with Message Queues (e.g., Kafka, Pub/Sub) and stream processing concepts.
• Log Handling: Mastery of Regex and log parsing strategies for standard formats (Syslog, CEF, JSON); a CEF parsing sketch follows this list.
• Storage Architecture: Understanding of Data Lake principles (Parquet/Avro formats) vs. Data Warehouses; a Parquet sketch also follows.
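As an example of the Regex work involved, here is a sketch that parses a CEF line, including the extension field's key=value pairs, whose values may themselves contain spaces. The sample line and helper names are hypothetical:

```python
import re

# CEF header: CEF:Version|Vendor|Product|DeviceVersion|SignatureID|Name|Severity|Extension
CEF_RE = re.compile(
    r"CEF:(?P<version>\d+)\|(?P<vendor>[^|]*)\|(?P<product>[^|]*)\|"
    r"(?P<device_version>[^|]*)\|(?P<signature_id>[^|]*)\|(?P<name>[^|]*)\|"
    r"(?P<severity>[^|]*)\|(?P<extension>.*)"
)
# Extension values may contain spaces, so anchor each value to the next "key=" token.
EXT_RE = re.compile(r"(\w+)=(.*?)(?=\s\w+=|$)")

def parse_cef(line: str) -> dict:
    m = CEF_RE.match(line)
    if m is None:
        raise ValueError(f"not CEF: {line!r}")
    out = m.groupdict()
    out["extension"] = dict(EXT_RE.findall(m["extension"]))
    return out

sample = "CEF:0|Vendor|Firewall|1.0|100|blocked outbound|7|src=10.0.0.5 dst=8.8.8.8 msg=Policy deny"
print(parse_cef(sample))
```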
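On the storage side, Data Lake principles usually mean columnar Parquet on object storage. This sketch uses pyarrow with a local file path standing in for an object-store partition; the event fields are assumptions:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A small batch of normalized events; in production these come off the stream.
events = [
    {"ts": "2024-01-12T08:15:04Z", "host": "fw01", "action": "drop"},
    {"ts": "2024-01-12T08:15:05Z", "host": "fw01", "action": "allow"},
]

table = pa.Table.from_pylist(events)

# Columnar + compressed: cheap to store, fast to scan selectively. Real
# pipelines write to partitioned object-store prefixes (e.g., .../dt=2024-01-12/).
pq.write_table(table, "events.parquet", compression="zstd")

# Reading back only the needed columns is where the format pays off.
print(pq.read_table("events.parquet", columns=["action"]).to_pydict())
```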
Preferred / Nice to Have:
• Experience with Vector Databases for storing embeddings.
• Knowledge of Log Observability/Routing tools (pipeline middleware that filters and routes logs between sources and destinations).
• Familiarity with Big Data frameworks (e.g., Spark, Flink).