Skip to content

Logging Framework

This document provides an overview of the logging framework implemented in dsRAG.

Overview

The dsRAG logging framework provides detailed visibility into the library's operations, including document ingestion, querying, and chat interactions. It uses Python's standard logging module, making it easy to integrate with existing application logging.

Key Features

  • Standard Python Logging: Uses the built-in logging module
  • Hierarchical Loggers: Organized by component (dsrag.ingestion, dsrag.query, dsrag.chat)
  • Structured Data: Uses the extra parameter to include structured data with log messages
  • Operation IDs: Includes unique IDs to correlate logs from the same operation
  • Comprehensive Metrics: Tracks durations, counts, and other metrics
  • Configurable Levels: Different verbosity levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Logger Hierarchy

  • dsrag: Root logger
  • dsrag.ingestion: Document ingestion operations
  • dsrag.query: Knowledge base querying operations
  • dsrag.chat: Chat interactions
  • dsrag.dsparse: Document parsing and chunking operations
  • dsrag.dsparse.semantic_sectioning: Semantic sectioning operations
  • dsrag.dsparse.chunking: Document chunking operations

Log Levels

  • DEBUG: Detailed step timings, parameters, intermediate results
  • INFO: Major operation start/end points, key outcomes
  • WARNING: Potential issues that don't prevent operation
  • ERROR: Operation failures that might be recoverable
  • CRITICAL: Severe errors likely causing application failure

Configuring Logging

Basic Configuration

import logging

# Configure basic logging to see INFO level messages
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Use dsRAG normally
from dsrag.knowledge_base import KnowledgeBase
kb = KnowledgeBase("example_kb")

Detailed Configuration

import logging
import sys

# Set up a detailed formatter
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s - %(filename)s:%(lineno)d'
)

# Create a handler for console output
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(formatter)

# Configure the dsrag logger
logger = logging.getLogger("dsrag")
logger.setLevel(logging.DEBUG)

# Clear existing handlers and add our handler
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(console_handler)

JSON Logging

For structured logging suitable for log analysis systems:

import logging
import sys
from pythonjsonlogger import jsonlogger

# Create a JSON formatter
formatter = jsonlogger.JsonFormatter(
    '%(asctime)s %(name)s %(levelname)s %(message)s %(filename)s %(lineno)d'
)

# Create a handler for console output
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(formatter)

# Configure the dsrag logger
logger = logging.getLogger("dsrag")
logger.setLevel(logging.DEBUG)
logger.propagate = True

# Clear existing handlers and add our handler
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(console_handler)

Log Examples

Document Ingestion (INFO)

2025-04-03 12:34:56 - dsrag.ingestion - INFO - Starting document ingestion - {"kb_id": "kb123", "doc_id": "doc456", "file_path": "example.pdf"}
2025-04-03 12:35:01 - dsrag.ingestion - INFO - Document ingestion successful - {"kb_id": "kb123", "doc_id": "doc456", "total_duration_s": 5.1234}

Document Ingestion (DEBUG)

2025-04-03 12:34:56 - dsrag.ingestion - DEBUG - Parsing and Chunking complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "parse_chunk", "duration_s": 2.1234, "num_sections": 5, "num_chunks": 25}
2025-04-03 12:34:58 - dsrag.ingestion - DEBUG - AutoContext complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "auto_context", "duration_s": 1.5432, "model": "OpenAIChatAPI"}
2025-04-03 12:34:59 - dsrag.ingestion - DEBUG - Embedding complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "embedding", "duration_s": 0.8765, "num_embeddings": 25, "model": "OpenAIEmbedding"}

Query (INFO)

2025-04-03 12:40:12 - dsrag.query - INFO - Starting query - {"kb_id": "kb123", "query_id": "q789", "num_search_queries": 3}
2025-04-03 12:40:15 - dsrag.query - INFO - Query successful - {"kb_id": "kb123", "query_id": "q789", "total_duration_s": 3.2109, "num_final_segments": 4}

Chat (INFO)

2025-04-03 12:45:23 - dsrag.chat - INFO - Starting chat response - {"thread_id": "t123", "message_id": "m456", "kb_ids": ["kb123", "kb789"], "stream": false, "user_input_length": 182}
2025-04-03 12:45:30 - dsrag.chat - INFO - Chat response completed - {"thread_id": "t123", "message_id": "m456", "kb_ids": ["kb123", "kb789"], "total_duration_s": 7.6543, "response_length": 512, "num_queries": 2, "num_segments": 5, "num_citations": 3}

Document Parsing (DEBUG)

2025-04-03 12:34:10 - dsrag.dsparse - DEBUG - Starting VLM file parsing - {"kb_id": "kb123", "doc_id": "doc456", "file_path": "/path/to/document.pdf"}
2025-04-03 12:34:20 - dsrag.dsparse - DEBUG - VLM file parsing complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "vlm_parse", "duration_s": 10.3421, "num_elements": 35, "provider": "gemini", "model": "gemini-2.0-flash"}

Semantic Sectioning (DEBUG)

2025-04-03 12:34:20 - dsrag.dsparse.semantic_sectioning - DEBUG - Starting semantic sectioning - {"kb_id": "kb123", "doc_id": "doc456", "document_lines_count": 768, "llm_provider": "openai", "model": "gpt-4o-mini"}
2025-04-03 12:34:25 - dsrag.dsparse.semantic_sectioning - DEBUG - Sectioning iteration complete - {"kb_id": "kb123", "doc_id": "doc456", "iteration": 1, "start_line": 0, "end_line": 300, "sections_found": 3, "duration_s": 4.7891}
2025-04-03 12:34:30 - dsrag.dsparse.semantic_sectioning - DEBUG - Semantic sectioning complete - {"kb_id": "kb123", "doc_id": "doc456", "total_duration_s": 10.1234, "iterations_used": 3, "sections_count": 8}

More Information

For more examples and details, see examples/logging_example.py in the repository.