Running the Chatbot#

Once you have completed Installation, Configuration, and Document Ingestion, you can run the euclid_rag chatbot interface.

Local Deployment#

Standard Streamlit Launch#

For local development and testing:

cd python/euclid
streamlit run rag/app.py

This starts the Streamlit web interface, typically accessible at http://localhost:8501.

Custom Port and Host#

To run on a different port or host:

cd python/euclid
streamlit run rag/app.py --server.port 8080 --server.address 0.0.0.0

Production Deployment Options#

For production environments:

# With specific configuration
streamlit run rag/app.py --server.port 80 --server.address 0.0.0.0

# With custom config file
EUCLID_RAG_CONFIG_PATH=/path/to/prod_config.yaml streamlit run rag/app.py

Docker Deployment#

Container-based deployment with Docker Compose provides isolation and easier management.

Starting Services#

# Start all services
docker compose up --build

This launches:

  • Streamlit application - The main web interface

  • Ollama LLM server - Local language model service

  • Supporting services - Database, networking, etc.

Setting up the LLM Model#

After containers are running, pull the desired model:

# Pull the default model (Mistral)
docker exec -it euclid_rag-ollama-1 ollama pull mistral:latest

# Or pull alternative models
docker exec -it euclid_rag-ollama-1 ollama pull llama2:latest
docker exec -it euclid_rag-ollama-1 ollama pull codellama:latest

Available Models#

Check which models are already available locally:

# List locally installed (already pulled) models
docker exec -it euclid_rag-ollama-1 ollama list

# Check model details
docker exec -it euclid_rag-ollama-1 ollama show mistral:latest
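
If you prefer to script these steps from Python rather than run docker exec, the same operations are exposed through Ollama's HTTP API. The sketch below is a minimal example, assuming the ollama container publishes port 11434 on the host (check your docker-compose.yml):

import json
import requests

OLLAMA_URL = "http://localhost:11434"  # adjust if your port mapping differs

# List models that have already been pulled (equivalent to `ollama list`)
tags = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])

# Pull a model (equivalent to `ollama pull mistral:latest`); progress arrives as JSON lines
with requests.post(f"{OLLAMA_URL}/api/pull", json={"name": "mistral:latest"}, stream=True) as resp:
    for line in resp.iter_lines():
        if line:
            print(json.loads(line).get("status", ""))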

Docker Benefits#

The Docker setup provides:

  • Isolated environment with all dependencies

  • Ollama LLM server running in a separate container

  • Streamlit application accessible via web browser

  • Automatic service orchestration with Docker Compose

  • Easy scaling and deployment management

Using the Interface#

Web Interface Features#

The Streamlit interface provides:

Chat Interface
  • Natural language input for questions

  • Real-time responses from the LLM

  • Conversation history and context

Document Querying
  • Search across multiple document types

  • Publications, DPDD, and other ingested content

  • Semantic search with vector similarity

Source Attribution
  • View source documents for each answer

  • Direct links to original content where available

  • Similarity scores from the vector search, plus document ranking scores

Interactive Elements
  • Configuration adjustments

  • Export conversation history

  • Document ingestion for updating knowledge bases

Query Examples#

Try these example queries to test your system (a scripted version follows the lists):

General Questions
  • “What is the purpose of the Euclid mission?”

  • “How are VIS data products structured?”

  • “Explain the LE1 data processing pipeline”

Technical Queries
  • “What file formats are used for SIM data products?”

  • “How is astrometric calibration performed?”

  • “What are the quality requirements for photometry?”

Document-Specific
  • “Find information about header keywords”

  • “Show me examples of DPDD data structures”

  • “What are the validation procedures?”
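
To run such queries in a batch instead of typing them into the web interface, you can call the programmatic entry points shown under Custom Integration below. This is a minimal sketch; whether create_euclid_router works without a callback handler is an assumption about the API:

from euclid.rag import chatbot

# Build the router once, then reuse it for every query
router = chatbot.create_euclid_router()  # callback handler omitted (assumed optional)

example_queries = [
    "What is the purpose of the Euclid mission?",
    "How are VIS data products structured?",
    "What file formats are used for SIM data products?",
]

for query in example_queries:
    response = router.invoke({"query": query})
    print(f"Q: {query}\nA: {response}\n")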

Configuration During Runtime#

Note: Configuration changes require rebuilding the container since the config file is embedded in the Docker image.

To change models:

# Pull new model (if not already available)
docker exec -it euclid_rag-ollama-1 ollama pull mistral:7b-instruct

# Update app_config.yaml
# Change: model: "granite3.2:latest"
# To:     model: "mistral:7b-instruct"

# Rebuild and restart the application
docker compose up -d --build euclid

Temperature and Behavior Tuning#

Adjust response characteristics in app_config.yaml:

llm:
  model: "granite3.2:latest"
  temperature: 0.2  # Adjust between 0.0-1.0
  base_url: "http://ollama:11434"
Temperature Effects:
  • 0.0 - Always chooses most likely response (deterministic)

  • 0.1 - Very consistent, minimal creativity

  • 0.3 - Balanced factual accuracy with slight variation

  • 0.7 - More creative, conversational responses

  • 1.0 - Highly creative, unpredictable responses
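
To get a feel for these settings outside the app, you can send the same prompt to the Ollama API at two temperatures and compare the answers. This sketch assumes Ollama is reachable on localhost:11434 and that the model named in app_config.yaml has been pulled:

import requests

OLLAMA_URL = "http://localhost:11434"
prompt = "In one sentence, what does the Euclid mission observe?"

for temperature in (0.0, 0.7):
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "granite3.2:latest",
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature},
        },
        timeout=120,
    ).json()
    print(f"temperature={temperature}: {resp.get('response', '').strip()}")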

Embedding Model Configuration#

Optimize embedding performance:

embeddings:
  class: "E5MpsEmbedder"  # Optimized for Apple Silicon
  model_name: "intfloat/e5-large-v2"  # Higher accuracy
  batch_size: 32  # Adjust based on available memory
Embedding Model Options:
  • "intfloat/e5-small-v2" - Fast, lower memory usage

  • "intfloat/e5-base-v2" - Balanced performance

  • "intfloat/e5-large-v2" - Best accuracy, higher memory usage

Environment Variables#

Override settings without modifying config files:

# Set custom model and temperature
export EUCLID_RAG_LLM_MODEL="mistral:latest"
export EUCLID_RAG_TEMPERATURE="0.3"

# Set custom vector store path
export EUCLID_RAG_VECTOR_STORE_PATH="/custom/path"

# Enable debug mode
export EUCLID_RAG_DEBUG=true

# Then run the application
streamlit run rag/app.py
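
For illustration only, the pattern below shows one common way such overrides are resolved, with the environment variable taking precedence over the YAML value; the actual lookup logic lives inside euclid_rag and may differ:

import os
import yaml

# Load the base configuration, then let environment variables win
with open("app_config.yaml") as fh:
    config = yaml.safe_load(fh)

model = os.environ.get("EUCLID_RAG_LLM_MODEL", config["llm"]["model"])
temperature = float(os.environ.get("EUCLID_RAG_TEMPERATURE", config["llm"]["temperature"]))
print(model, temperature)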

Streamlit Configuration#

Create a .streamlit/config.toml file for UI customization:

[theme]
primaryColor = "#1f77b4"
backgroundColor = "#ffffff"
secondaryBackgroundColor = "#f0f2f6"
textColor = "#262730"

[server]
port = 8501
address = "localhost"

Performance Optimization#

Memory Management#

For large document collections:

  • Monitor memory usage during queries

  • Restart services periodically to clear cache

  • Adjust chunk sizes in configuration if needed

Response Time Optimization#

To improve query response times:

  • Use SSD storage for vector stores

  • Increase available RAM for embedding operations

  • Choose appropriate LLM models (smaller models = faster responses)

  • Optimize vector store configuration for your use case

Monitoring and Logging#

Application Logs#

View application logs:

# For local deployment
tail -f logs/euclid_rag.log

# For Docker deployment
docker compose logs -f euclid_rag
docker compose logs -f ollama

Performance Metrics#

Monitor key metrics (a small timing sketch follows the list):

  • Query response time

  • Memory usage

  • Vector store size

  • Document retrieval accuracy
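
A minimal way to capture the first two from Python is to time a query through the entry points described under Custom Integration below and read the process memory with psutil; treat the exact call signature as an assumption:

import time
import psutil
from euclid.rag import chatbot

router = chatbot.create_euclid_router()  # callback handler omitted (assumed optional)
process = psutil.Process()

start = time.perf_counter()
response = router.invoke({"query": "What is the purpose of the Euclid mission?"})
elapsed = time.perf_counter() - start

print(f"response time: {elapsed:.2f}s")
print(f"memory (RSS): {process.memory_info().rss / 1024**2:.0f} MiB")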

Health Checks#

Verify services are running correctly:

# Check Streamlit is accessible
curl http://localhost:8501/healthz

# Check Ollama service (Docker)
docker exec -it euclid_rag-ollama-1 ollama list
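
The same checks can be scripted, for example as a cron job or a CI step. The endpoints below match the curl command above and the public Ollama API; the host ports are assumptions based on the default setup:

import requests

def check(name: str, url: str) -> None:
    try:
        resp = requests.get(url, timeout=5)
        status = "OK" if resp.ok else f"HTTP {resp.status_code}"
    except requests.RequestException as exc:
        status = f"DOWN ({exc})"
    print(f"{name}: {status}")

check("Streamlit", "http://localhost:8501/healthz")
check("Ollama", "http://localhost:11434/api/tags")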

Troubleshooting Runtime Issues#

Common issues during runtime:

Slow Responses
  • Check available memory and CPU

  • Verify vector store isn’t corrupted

  • Consider using a smaller LLM model

Connection Errors
  • Ensure all services are running

  • Check firewall and port configurations

  • Verify Docker containers are healthy

Inaccurate Results
  • Review ingested document quality

  • Adjust similarity thresholds

  • Re-run ingestion if needed

For detailed troubleshooting, see Troubleshooting.

Advanced Usage#

Custom Integration#

Integrate euclid_rag into your own applications:

from euclid.rag import chatbot

# Configure retriever
retriever = chatbot.configure_retriever()

# Create router with a custom callback; `my_callback` is a placeholder
# for your own callback handler
router = chatbot.create_euclid_router(callback_handler=my_callback)

# Process queries programmatically
response = router.invoke({"query": "What is Euclid?"})

API Usage#

Use the underlying functions directly:

from euclid.rag.retrievers.generic_retrieval_tool import get_generic_retrieval_tool

# Get retrieval tool; `llm` and `retriever` are placeholders for the model
# and retriever objects from your configured pipeline
tool = get_generic_retrieval_tool(llm, retriever)

# Use for custom applications
result = tool.invoke("your question here")

Next Steps#