Logging Framework

This document provides an overview of the logging framework implemented in dsRAG.

Overview

The dsRAG logging framework provides detailed visibility into the library's operations, including document ingestion, querying, and chat interactions. It uses Python's standard logging module, making it easy to integrate with existing application logging.

Key Features

Standard Python Logging: Uses the built-in logging module
Hierarchical Loggers: Organized by component (dsrag.ingestion, dsrag.query, dsrag.chat)
Structured Data: Uses the extra parameter to include structured data with log messages
Operation IDs: Includes unique IDs to correlate logs from the same operation
Comprehensive Metrics: Tracks durations, counts, and other metrics
Configurable Levels: Different verbosity levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)

Logger Hierarchy

dsrag: Root logger
dsrag.ingestion: Document ingestion operations
dsrag.query: Knowledge base querying operations
dsrag.chat: Chat interactions
dsrag.dsparse: Document parsing and chunking operations
dsrag.dsparse.semantic_sectioning: Semantic sectioning operations
dsrag.dsparse.chunking: Document chunking operations

Log Levels

DEBUG: Detailed step timings, parameters, intermediate results
INFO: Major operation start/end points, key outcomes
WARNING: Potential issues that don't prevent operation
ERROR: Operation failures that might be recoverable
CRITICAL: Severe errors likely causing application failure

Configuring Logging

Basic Configuration

import logging

# Configure basic logging to see INFO level messages
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# Use dsRAG normally
from dsrag.knowledge_base import KnowledgeBase
kb = KnowledgeBase("example_kb")

Detailed Configuration

import logging
import sys

# Set up a detailed formatter
formatter = logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s - %(filename)s:%(lineno)d'
)

# Create a handler for console output
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(formatter)

# Configure the dsrag logger
logger = logging.getLogger("dsrag")
logger.setLevel(logging.DEBUG)

# Clear existing handlers and add our handler
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(console_handler)

JSON Logging

For structured logging suitable for log analysis systems:

import logging
import sys
from pythonjsonlogger import jsonlogger

# Create a JSON formatter
formatter = jsonlogger.JsonFormatter(
    '%(asctime)s %(name)s %(levelname)s %(message)s %(filename)s %(lineno)d'
)

# Create a handler for console output
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setLevel(logging.DEBUG)
console_handler.setFormatter(formatter)

# Configure the dsrag logger
logger = logging.getLogger("dsrag")
logger.setLevel(logging.DEBUG)
logger.propagate = True

# Clear existing handlers and add our handler
if logger.hasHandlers():
    logger.handlers.clear()
logger.addHandler(console_handler)

Log Examples

Document Ingestion (INFO)

2025-04-03 12:34:56 - dsrag.ingestion - INFO - Starting document ingestion - {"kb_id": "kb123", "doc_id": "doc456", "file_path": "example.pdf"}
2025-04-03 12:35:01 - dsrag.ingestion - INFO - Document ingestion successful - {"kb_id": "kb123", "doc_id": "doc456", "total_duration_s": 5.1234}

Document Ingestion (DEBUG)

2025-04-03 12:34:56 - dsrag.ingestion - DEBUG - Parsing and Chunking complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "parse_chunk", "duration_s": 2.1234, "num_sections": 5, "num_chunks": 25}
2025-04-03 12:34:58 - dsrag.ingestion - DEBUG - AutoContext complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "auto_context", "duration_s": 1.5432, "model": "OpenAIChatAPI"}
2025-04-03 12:34:59 - dsrag.ingestion - DEBUG - Embedding complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "embedding", "duration_s": 0.8765, "num_embeddings": 25, "model": "OpenAIEmbedding"}

Query (INFO)

2025-04-03 12:40:12 - dsrag.query - INFO - Starting query - {"kb_id": "kb123", "query_id": "q789", "num_search_queries": 3}
2025-04-03 12:40:15 - dsrag.query - INFO - Query successful - {"kb_id": "kb123", "query_id": "q789", "total_duration_s": 3.2109, "num_final_segments": 4}

Chat (INFO)

2025-04-03 12:45:23 - dsrag.chat - INFO - Starting chat response - {"thread_id": "t123", "message_id": "m456", "kb_ids": ["kb123", "kb789"], "stream": false, "user_input_length": 182}
2025-04-03 12:45:30 - dsrag.chat - INFO - Chat response completed - {"thread_id": "t123", "message_id": "m456", "kb_ids": ["kb123", "kb789"], "total_duration_s": 7.6543, "response_length": 512, "num_queries": 2, "num_segments": 5, "num_citations": 3}

Document Parsing (DEBUG)

2025-04-03 12:34:10 - dsrag.dsparse - DEBUG - Starting VLM file parsing - {"kb_id": "kb123", "doc_id": "doc456", "file_path": "/path/to/document.pdf"}
2025-04-03 12:34:20 - dsrag.dsparse - DEBUG - VLM file parsing complete - {"kb_id": "kb123", "doc_id": "doc456", "step": "vlm_parse", "duration_s": 10.3421, "num_elements": 35, "provider": "gemini", "model": "gemini-2.0-flash"}

Semantic Sectioning (DEBUG)

2025-04-03 12:34:20 - dsrag.dsparse.semantic_sectioning - DEBUG - Starting semantic sectioning - {"kb_id": "kb123", "doc_id": "doc456", "document_lines_count": 768, "llm_provider": "openai", "model": "gpt-4o-mini"}
2025-04-03 12:34:25 - dsrag.dsparse.semantic_sectioning - DEBUG - Sectioning iteration complete - {"kb_id": "kb123", "doc_id": "doc456", "iteration": 1, "start_line": 0, "end_line": 300, "sections_found": 3, "duration_s": 4.7891}
2025-04-03 12:34:30 - dsrag.dsparse.semantic_sectioning - DEBUG - Semantic sectioning complete - {"kb_id": "kb123", "doc_id": "doc456", "total_duration_s": 10.1234, "iterations_used": 3, "sections_count": 8}

More Information

For more examples and details, see examples/logging_example.py in the repository.