πΎ Disque S204
UtilisΓ©β¦
Disponibleβ¦
Pctβ¦
π§ MΓ©moire RAM
UtilisΓ©eβ¦
Disponibleβ¦
Pctβ¦
β‘ Charge CPU
1 minβ¦
5 minβ¦
15 minβ¦
π Quick Ask 24h
Callsβ¦
RAG usedβ¦
p50 latencyβ¦
π¦ Ollama LOCAL (ModΓ¨les Souverains S204)
Chargementβ¦
π Cascade Quick Ask v3.13
Chargementβ¦
π Sovereign API + Remote
Engineβ¦
Providers actifsβ¦
Port 4000β¦
Kaggle Remoteβ¦
URLβ¦
Models GPUβ¦
π‘ Si Kaggle DOWN : reset le tunnel cloudflared sur ton notebook Kaggle, puis update OLLAMA_REMOTE_URL dans /etc/weval/ollama-remote.env
π RAG SMART (Doctrines + Vault)
Wiki MD docsβ¦
Obsidian vaultβ¦
TF-IDF indexβ¦
Index Γ’geβ¦
ποΈ Compression de Contexte (Conversations longues)
Colle ici une conversation longue. Le brain va la rΓ©sumer en gardant: dΓ©cisions, faits techniques, actions, Γ©tats. Γconomie tokens 60-90%.
𧬠Composition WEVAL Brain v6
WEVAL BRAIN v6 = WEVIA Master + Ollama Sovereign + RAG SMART + Compression
ββ COUCHE 1 β WEVIA MASTER (orchestrator) ββββββββββββββββββββββββββββββ
β /api/wevia-master-api.php β 2820+ intents, multi-agent β
β Triggers: NL chat, qa, demande IA, status, etc. β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββ COUCHE 2 β QUICK ASK CASCADE (RAG-aware Q&A) ββββββββββββββββββββββββ
β /api/wevia-quick-ask.php v3.13 β 5+ providers cascade β
β - Cerebras qwen-3-235b (primary, 248ms) β
β - Mistral small-latest (fast, 500ms) β
β - Cohere command-r-plus (safety, 2s) β
β - NVIDIA NIM Llama 3.3 70b (cloud) β
β - GitHub Models Llama 3.1 8b (cloud) β
β - tinydolphin LOCAL (uncensored, 100% sovereign) β
β - llama3.2:1b LOCAL (uncensored, 100% sovereign) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββ COUCHE 3 β RAG SMART (knowledge retrieval) ββββββββββββββββββββββββββ
β - 200+ wiki doctrines + 500+ obsidian vault β
β - TF-IDF index 756 docs, cosine fallback β
β - Tier1/2/3 scoring + filename PREFIX +500 β
β - Cache 5min auto-invalidate sur mtime β
β - Citations validation (cited_sources/invalid) β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββ COUCHE 4 β COMPRESSION CONTEXTE (Γ©conomie tokens) βββββββββββββββββββ
β /api/weval-brain-v6-api.php?action=compress β
β - RΓ©sume conversations longues via Mistral (60-90% reduction) β
β - PrΓ©serve dΓ©cisions + faits techniques + actions + Γ©tats β
β - Γlimine politesses, redondances, debug, essais ratΓ©s β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββ COUCHE 5 β OLLAMA UNCENSORED (souverainetΓ© absolue) βββββββββββββββββ
β - tinydolphin:latest (637 MB, 1B params, Q4_0) β
β - llama3.2:1b (1.3 GB, 1.2B params, Q8_0) β
β - dolphin-llama3:8b (4.7 GB, 8B params, Q4_0) [slow CPU] β
β - all-minilm:latest (46 MB, 23M, embeddings) β
β - Endpoint: http://127.0.0.1:11434/v1/chat/completions β
β - 0β¬ cost, 0 cloud dependency, censorship-free β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
ββ COUCHE 6 β REMOTE GPU (vitesse + qualitΓ©) βββββββββββββββββββββββββββ
β Kaggle Cloudflare Tunnel β Ollama GPU T4 30h/sem GRATUIT β
β Setup: notebook Kaggle + cloudflared + URL β /etc/weval/ollama-remote.env
β β οΈ Actuellement DOWN β restart tunnel pour rΓ©activer β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ENDPOINTS:
π /api/weval-brain-v6-api.php?action=state β Γ©tat complet
π§ͺ /api/weval-brain-v6-api.php?action=test_model β test modΓ¨le
ποΈ /api/weval-brain-v6-api.php?action=compress β compression
π¬ /api/wevia-quick-ask.php β Q&A cascade
π /api/wevia-master-api.php β NL orchestrator
π /api/quick-ask-stats.php β analytics 24h
πΎ /api/disk-monitor.php β disk monitoring