Running the Chatbot#

Once you have completed Installation, Configuration, and Document Ingestion, you can run the euclid_rag chatbot interface.

Local Deployment#

Standard Streamlit Launch#

For local development and testing:

cd python/euclid
streamlit run rag/app.py

This starts the Streamlit web interface, typically accessible at http://localhost:8501.

Custom Port and Host#

To run on a different port or host:

cd python/euclid
streamlit run rag/app.py --server.port 8080 --server.address 0.0.0.0

Production Deployment Options#

For production environments:

# With specific configuration
streamlit run rag/app.py --server.port 80 --server.address 0.0.0.0

# With custom config file
EUCLID_RAG_CONFIG_PATH=/path/to/prod_config.yaml streamlit run rag/app.py

Docker Deployment#

Container-based deployment with Docker Compose provides isolation and easier management.

Starting Services#

# Start all services
docker compose up --build

This launches:

  • Streamlit application - The main web interface

  • Ollama LLM server - Local language model service

  • Supporting services - Database, networking, etc.

Setting up the LLM Model#

After containers are running, pull the desired model:

# Pull the default model (Mistral)
docker exec -it euclid_rag-ollama-1 ollama pull mistral:latest

# Or pull alternative models
docker exec -it euclid_rag-ollama-1 ollama pull llama2:latest
docker exec -it euclid_rag-ollama-1 ollama pull codellama:latest

Available Models#

Check which models are already available locally:

# List locally installed (already pulled) models
docker exec -it euclid_rag-ollama-1 ollama list

# Check model details
docker exec -it euclid_rag-ollama-1 ollama show mistral:latest
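
If you prefer to script these steps from Python rather than run docker exec, the same operations are exposed through Ollama's HTTP API. The sketch below is a minimal example, assuming the ollama container publishes port 11434 on the host (check your docker-compose.yml):

import json
import requests

OLLAMA_URL = "http://localhost:11434"  # adjust if your port mapping differs

# List models that have already been pulled (equivalent to `ollama list`)
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])

# Pull a model (equivalent to `ollama pull mistral:latest`); progress arrives as JSON lines
with requests.post(f"{OLLAMA_URL}/api/pull", json={"name": "mistral:latest"}, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))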

Docker Benefits#

The Docker setup provides:

  • Isolated environment with all dependencies

  • Ollama LLM server running in a separate container

  • Streamlit application accessible via web browser

  • Automatic service orchestration with Docker Compose

  • Easy scaling and deployment management

Using the Interface#

Web Interface Features#

The Streamlit interface provides:

Chat Interface
  • Natural language input for questions

  • Real-time responses from the LLM

  • Conversation history and context

Document Querying
  • Search across multiple document types

  • Publications, DPDD, and other ingested content

  • Semantic search with vector similarity

Source Attribution
  • View source documents for each answer

  • Direct links to original content where available

  • Similarity scores from the vector search, plus document ranking scores

Interactive Elements
  • Configuration adjustments

  • Export conversation history

  • Document ingestion for updating knowledge bases

Query Examples#

Try these example queries to test your system (a scripted version follows the lists):

General Questions
  • “What is the purpose of the Euclid mission?”

  • “How are VIS data products structured?”

  • “Explain the LE1 data processing pipeline”

Technical Queries
  • “What file formats are used for SIM data products?”

  • “How is astrometric calibration performed?”

  • “What are the quality requirements for photometry?”

Document-Specific
  • “Find information about header keywords”

  • “Show me examples of DPDD data structures”

  • “What are the validation procedures?”
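
To run such queries in a batch instead of typing them into the web interface, you can call the programmatic entry points shown under Custom Integration below. This is a minimal sketch; whether create_euclid_router works without a callback handler is an assumption about the API:

from euclid.rag import chatbot

# Build the router once, then reuse it for every query
router = chatbot.create_euclid_router()  # callback handler omitted (assumed optional)

example_queries = [
    "What is the purpose of the Euclid mission?",
    "How are VIS data products structured?",
    "What file formats are used for SIM data products?",
]

for query in example_queries:
    response = router.invoke({"query": query})
    print(f"Q: {query}\nA: {response}\n")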

Configuration During Runtime#

Note: Configuration changes require rebuilding the container since the config file is embedded in the Docker image.

To change models:

# Pull new model (if not already available)
docker exec -it euclid_rag-ollama-1 ollama pull mistral:7b-instruct

# Update app_config.yaml
# Change: model: "granite3.2:latest"
# To:     model: "mistral:7b-instruct"

# Rebuild and restart the application
docker compose up -d --build euclid

Temperature and Behavior Tuning#

Adjust response characteristics in app_config.yaml:

llm:
  model: "granite3.2:latest"
  temperature: 0.2  # Adjust between 0.0-1.0
  base_url: "http://ollama:11434"
Temperature Effects:
  • 0.0 - Always chooses most likely response (deterministic)

  • 0.1 - Very consistent, minimal creativity

  • 0.3 - Balanced factual accuracy with slight variation

  • 0.7 - More creative, conversational responses

  • 1.0 - Highly creative, unpredictable responses
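
To get a feel for these settings outside the app, you can send the same prompt to the Ollama API at two temperatures and compare the answers. This sketch assumes Ollama is reachable on localhost:11434 and that the model named in app_config.yaml has been pulled:

import requests

OLLAMA_URL = "http://localhost:11434"
prompt = "In one sentence, what does the Euclid mission observe?"

for temperature in (0.0, 0.7):
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "granite3.2:latest",
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature},
        },
        timeout=120,
    ).json()
    print(f"temperature={temperature}: {resp.get('response', '').strip()}")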

Embedding Model Configuration#

Optimize embedding performance:

embeddings:
  class: "E5MpsEmbedder"  # Optimized for Apple Silicon
  model_name: "intfloat/e5-large-v2"  # Higher accuracy
  batch_size: 32  # Adjust based on available memory
Embedding Model Options:
  • "intfloat/e5-small-v2" - Fast, lower memory usage

  • "intfloat/e5-base-v2" - Balanced performance

  • "intfloat/e5-large-v2" - Best accuracy, higher memory usage

Environment Variables#

Override settings without modifying config files:

# Set custom model and temperature
export EUCLID_RAG_LLM_MODEL="mistral:latest"
export EUCLID_RAG_TEMPERATURE="0.3"

# Set custom vector store path
export EUCLID_RAG_VECTOR_STORE_PATH="/custom/path"

# Enable debug mode
export EUCLID_RAG_DEBUG=true

# Then run the application
streamlit run rag/app.py
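
For illustration only, the pattern below shows one common way such overrides are resolved, with the environment variable taking precedence over the YAML value; the actual lookup logic lives inside euclid_rag and may differ:

import os
import yaml

# Load the base configuration, then let environment variables win
with open("app_config.yaml") as fh:
    config = yaml.safe_load(fh)

model = os.environ.get("EUCLID_RAG_LLM_MODEL", config["llm"]["model"])
temperature = float(os.environ.get("EUCLID_RAG_TEMPERATURE", config["llm"]["temperature"]))
print(model, temperature)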

Streamlit Configuration#

Create a .streamlit/config.toml file for UI customization:

[theme]
primaryColor = "#1f77b4"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"

[server]
port = 8501
address = "localhost"

Performance Optimization#

Memory Management#

For large document collections:

  • Monitor memory usage during queries

  • Restart services periodically to clear cache

  • Adjust chunk sizes in configuration if needed

Response Time Optimization#

To improve query response times:

  • Use SSD storage for vector stores

  • Increase available RAM for embedding operations

  • Choose appropriate LLM models (smaller models = faster responses)

  • Optimize vector store configuration for your use case

Monitoring and Logging#

Application Logs#

View application logs:

# For local deployment
tail -f logs/euclid_rag.log

# For Docker deployment
docker compose logs -f euclid_rag
docker compose logs -f ollama

Performance Metrics#

Monitor key metrics (a small timing sketch follows the list):

  • Query response time

  • Memory usage

  • Vector store size

  • Document retrieval accuracy
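
A minimal way to capture the first two from Python is to time a query through the entry points described under Custom Integration below and read the process memory with psutil; treat the exact call signature as an assumption:

import time
import psutil
from euclid.rag import chatbot

router = chatbot.create_euclid_router()  # callback handler omitted (assumed optional)
process = psutil.Process()

start = time.perf_counter()
response = router.invoke({"query": "What is the purpose of the Euclid mission?"})
elapsed = time.perf_counter() - start

print(f"response time: {elapsed:.2f}s")
print(f"memory (RSS): {process.memory_info().rss / 1024**2:.0f} MiB")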

Health Checks#

Verify services are running correctly:

# Check Streamlit is accessible
curl http://localhost:8501/healthz

# Check Ollama service (Docker)
docker exec -it euclid_rag-ollama-1 ollama list
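
The same checks can be scripted, for example as a cron job or a CI step. The endpoints below match the curl command above and the public Ollama API; the host ports are assumptions based on the default setup:

import requests

def check(name: str, url: str) -> None:
    try:
        resp = requests.get(url, timeout=5)
        status = "OK" if resp.ok else f"HTTP {resp.status_code}"
    except requests.RequestException as exc:
        status = f"DOWN ({exc})"
    print(f"{name}: {status}")

check("Streamlit", "http://localhost:8501/healthz")
check("Ollama", "http://localhost:11434/api/tags")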

Troubleshooting Runtime Issues#

Common issues during runtime:

Slow Responses
  • Check available memory and CPU

  • Verify vector store isn’t corrupted

  • Consider using a smaller LLM model

Connection Errors
  • Ensure all services are running

  • Check firewall and port configurations

  • Verify Docker containers are healthy

Inaccurate Results
  • Review ingested document quality

  • Adjust similarity thresholds

  • Re-run ingestion if needed

For detailed troubleshooting, see Troubleshooting.

Advanced Usage#

Custom Integration#

Integrate euclid_rag into your own applications:

from euclid.rag import chatbot

# Configure retriever
retriever = chatbot.configure_retriever()

# Create router with a custom callback; `my_callback` is a placeholder
# for your own callback handler
router = chatbot.create_euclid_router(callback_handler=my_callback)

# Process queries programmatically
response = router.invoke({"query": "What is Euclid?"})

API Usage#

Use the underlying functions directly:

from euclid.rag.retrievers.generic_retrieval_tool import get_generic_retrieval_tool

# Get retrieval tool; `llm` and `retriever` are placeholders for the model
# and retriever objects from your configured pipeline
tool = get_generic_retrieval_tool(llm, retriever)

# Use for custom applications
result = tool.invoke("your question here")

Next Steps#