Configuration
Configure euclid_rag for your environment and document sources.
Overview
The system uses a YAML configuration file to define vector store settings, document sources, and ingestion parameters. The main configuration file is typically located at python/euclid/rag/app_config.yaml.
Main Configuration File
Basic Structure
vector_store:
  type: "faiss"
  redmine_index_dir: "redmine_vector_store"
  public_data_index_dir: "public_data_vector_store"
data:
  dpdd:
    config: "path/to/dpdd_ingest_config.yaml"
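Once this file has been parsed (for example with `yaml.safe_load`), the resulting dictionary can be checked for the keys shown above. The following is a minimal sketch, not euclid_rag's own validation: the helper name `validate_config` and the choice to treat all three `vector_store` keys as required are assumptions made for illustration.

```python
# Assumption: these are the keys shown in the basic structure; whether
# euclid_rag treats all of them as mandatory is not confirmed.
REQUIRED_VECTOR_STORE_KEYS = ("type", "redmine_index_dir", "public_data_index_dir")


def validate_config(config: dict) -> list:
    """Return a list of problems found in an already-parsed config dict."""
    vector_store = config.get("vector_store")
    if not isinstance(vector_store, dict):
        return ["Missing 'vector_store' section"]

    errors = []
    for key in REQUIRED_VECTOR_STORE_KEYS:
        if key not in vector_store:
            errors.append(f"Missing 'vector_store.{key}'")

    # The documentation lists "faiss" as the only supported backend.
    store_type = vector_store.get("type")
    if store_type is not None and store_type != "faiss":
        errors.append(f"Unsupported vector store type: {store_type!r}")
    return errors
```

An empty list means the dictionary has the expected shape; any returned strings can be printed before starting ingestion.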
Vector Store Settings
- type
Vector store backend. Currently only "faiss" is supported.
- {prefix}_index_dir
Path where the FAISS index files (index.faiss, index.pkl) will be stored, for example redmine_index_dir or public_data_index_dir. This can be:
Relative path: Within the repository (e.g., "vector_store")
Absolute path: Full system path (e.g., "/data/vector_stores/euclid")
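How a relative `{prefix}_index_dir` value is resolved depends on the application. As a minimal sketch, assuming relative values are interpreted against some base directory (the `base_dir` argument and both helper names below are hypothetical, not part of euclid_rag), the two path forms could be normalized like this:

```python
from pathlib import Path


def resolve_index_dir(value: str, base_dir: str) -> Path:
    """Return the directory an index setting points to.

    Absolute values are returned unchanged; relative values are joined
    onto base_dir (an assumption about how relative paths are handled).
    """
    path = Path(value)
    if path.is_absolute():
        return path
    return Path(base_dir) / path


def expected_index_files(index_dir: Path) -> list:
    """List the two FAISS files the documentation says are stored there."""
    return [index_dir / "index.faiss", index_dir / "index.pkl"]
```

Checking that both files exist under the resolved directory is a quick way to tell whether an index has already been built at that location.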
Example Configurations
Development Configuration
For local development:
llm:
  model: "granite3.2:latest"
  temperature: 0.1
  base_url: "http://localhost:11434"
vector_store:
  type: "faiss"
  redmine_index_dir: "./data/redmine_vector_store"
  public_data_index_dir: "./data/public_data_vector_store"
json_data:
  redmine_json_dir: "./data/redmine_exports/"
  chunk_size: 500  # Smaller for development
Production Configuration
For production deployment:
llm:
  model: "granite3.2:latest"
  temperature: 0
  base_url: "http://ollama:11434"
embeddings:
  class: "E5MpsEmbedder"
  model_name: "intfloat/e5-large-v2"  # More accurate
  batch_size: 32  # Larger batches
vector_store:
  type: "faiss"
  redmine_index_dir: "/data/euclid_rag/redmine_vector_store"
  public_data_index_dir: "/data/euclid_rag/public_data_vector_store"
json_data:
  redmine_json_dir: "/data/redmine_exports/"
  chunk_size: 800
High Performance Configuration
For faster responses:
llm:
  model: "mistral:7b"  # Smaller, faster model
  temperature: 0
  base_url: "http://ollama:11434"
embeddings:
  class: "E5MpsEmbedder"
  model_name: "intfloat/e5-small-v2"  # Faster embedding
  batch_size: 64
DPDD Ingestion Configuration
The DPDD (Data Product Description Document) ingestion requires a separate configuration file, typically dpdd_ingest_config.yaml:
# Base URLs for DPDD content
base_urls:
  - type: msp
    base_url: https://euclid.esac.esa.int/dr/q1/dpdd/
    version: dm10
# Topics to ingest
topics:
  - name: Purpose and Scope
    link: purpose.html
  - name: LE1 Data Products
    link: le1dpd/le1index.html
  - name: SIM Data Products
    link: simdpd/simindex.html
  - name: VIS Data Products
    link: visdpd/visindex.html
# Ingestion limits and options
topics_number_limit: 0  # 0 = no limit (ingest all topics)
scrape_all: true  # If true, ignores topics list and scrapes all sections
# Sections to skip during ingestion
banned_sections:
  names:
    - Header
    - Data Header
  full_links: []
Configuration Parameters
DPDD Parameters
- base_urls
List of base URLs to scrape DPDD content from.
- topics
Specific topics to ingest. Each topic has a name and link.
- topics_number_limit
Maximum number of topics to ingest. Set to 0 for no limit.
- scrape_all
If true, ignores the topics list and scrapes all available sections.
- banned_sections
Sections to skip during ingestion:
names: Section names to skip (e.g., "Header", "Data Header")
full_links: Complete URLs to skip
Note
Banned sections help prevent ingesting content that might confuse the LLM, such as repetitive headers or navigation elements.
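To make the parameter semantics concrete, here is a small sketch of how a script might apply them to an already-parsed DPDD config. It follows only the descriptions above and is not the actual ingestion code: the helper names are hypothetical, and it assumes each topic `link` is relative to `base_url` and that banned entries are matched exactly.

```python
from urllib.parse import urljoin


def limit_topics(topics: list, topics_number_limit: int) -> list:
    """Keep the first N topics; 0 means no limit, as documented."""
    if topics_number_limit == 0:
        return list(topics)
    return list(topics)[:topics_number_limit]


def topic_url(base_url: str, link: str) -> str:
    """Build a full URL, assuming the link is relative to base_url."""
    return urljoin(base_url, link)


def is_banned(name: str, url: str, banned_sections: dict) -> bool:
    """True if a section matches banned names or full links exactly."""
    names = banned_sections.get("names") or []
    full_links = banned_sections.get("full_links") or []
    return name in names or url in full_links
```

With the example file above, `limit_topics` returns all four topics because the limit is 0, and a section named "Header" is reported as banned. When `scrape_all` is true the topics list is ignored, so `limit_topics` would not be called at all.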
Custom Configuration Paths
Using Custom Config Files
You can specify custom configuration files when running ingestion:
# For publications
python python/euclid/rag/ingestion/ingest_publications.py -c /path/to/custom_config.yaml
# For Redmine (or other JSON sources)
python python/euclid/rag/ingestion/ingest_redmine.py -c /path/to/custom_config.yaml
# For DPDD
python python/euclid/rag/ingestion/ingest_dpdd.py --config /path/to/custom_config.yaml
Note: You can ingest multiple sources into the same vector store by using the same redmine_index_dir or public_data_index_dir in the main configuration file.
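The ingestion commands in this section receive the configuration path on the command line. As a rough sketch of that pattern (not the scripts' actual code), an `argparse` parser that accepts both the `-c` and `--config` spellings could look like the following. The `parse_args` helper is hypothetical, and using the Overview's path as the default is an assumption.

```python
import argparse

# Assumption: fall back to the main config location named in the Overview.
DEFAULT_CONFIG = "python/euclid/rag/app_config.yaml"


def parse_args(argv=None):
    """Parse the config path from the command line."""
    parser = argparse.ArgumentParser(description="Example ingestion entry point")
    parser.add_argument(
        "-c",
        "--config",
        default=DEFAULT_CONFIG,
        help="Path to the YAML configuration file",
    )
    return parser.parse_args(argv)
```

Calling `parse_args(["-c", "/path/to/custom_config.yaml"])` yields a namespace whose `config` attribute holds the custom path.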
Environment Variables
Some settings can be overridden with environment variables:
# Set custom vector store path
export EUCLID_RAG_VECTOR_STORE_PATH="/custom/path/to/vector_store"
# Set custom config file
export EUCLID_RAG_CONFIG_PATH="/path/to/config.yaml"
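For illustration, an override lookup for these two variables might be written as below. This is a sketch under assumptions: that a set, non-empty environment variable takes precedence over the file value, and that the default config path is the one named in the Overview. Both function names are hypothetical.

```python
import os

DEFAULT_CONFIG_PATH = "python/euclid/rag/app_config.yaml"


def resolve_config_path(environ=None) -> str:
    """Return EUCLID_RAG_CONFIG_PATH if set, else the default path."""
    env = os.environ if environ is None else environ
    return env.get("EUCLID_RAG_CONFIG_PATH") or DEFAULT_CONFIG_PATH


def resolve_vector_store_path(configured: str, environ=None) -> str:
    """Return EUCLID_RAG_VECTOR_STORE_PATH if set, else the configured value."""
    env = os.environ if environ is None else environ
    return env.get("EUCLID_RAG_VECTOR_STORE_PATH") or configured
```

Passing a plain dictionary as `environ` makes the lookup easy to test without touching the real process environment.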
Troubleshooting Configuration
Common Issues
- Permission Errors
Ensure the specified directories are writable by the user running the application.
- Path Not Found
Verify that relative paths resolve correctly from your current working directory.
- YAML Syntax Errors
Use a YAML validator to check your configuration files for syntax issues.
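The first two issues can be checked before running ingestion. The sketch below uses only the Python standard library; `check_index_dir` is a hypothetical helper, and it assumes the directory is expected to exist already rather than being created on demand.

```python
import os
from pathlib import Path


def check_index_dir(path: str) -> list:
    """Return human-readable problems found for a configured directory."""
    problems = []
    # Resolve against the current working directory, like a relative
    # path in the config would be when the application is launched.
    resolved = Path(path).expanduser().resolve()
    if not resolved.exists():
        problems.append(f"Path not found: {resolved} (cwd is {Path.cwd()})")
    elif not resolved.is_dir():
        problems.append(f"Not a directory: {resolved}")
    elif not os.access(resolved, os.W_OK):
        problems.append(f"Not writable: {resolved}")
    return problems
```

Printing the resolved path alongside the working directory usually makes a mistaken relative path obvious.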
Validation
Test your configuration by running:
python -c "
import yaml
with open('python/euclid/rag/app_config.yaml', 'r') as f:
    config = yaml.safe_load(f)
print('Configuration loaded successfully!')
print(f'Vector store type: {config[\"vector_store\"][\"type\"]}')
"
Next Steps
After configuring your system:
Document Ingestion - Ingest documents into your configured vector stores
Running the Chatbot - Run the chatbot with your configuration
Troubleshooting - Resolve common configuration issues