Kaggle — Notebook Description (~5500 chars)
For use as the long description of a public Kaggle Notebook, or as a Dataset description (if you upload the corpus). Audience: ML engineers, researchers, data scientists. More technical, less personal.
DCS-Gate: Dynamic Coherence State Authenticator
A Research-Grade Tool for Detecting Control Patterns in LLM Outputs
This notebook deploys DCS-Gate, an open-source system for detecting and characterizing the subtle control mechanisms large language models use to manage conversations — projected validation, performed humility, frame capture, register match, complacency induction, and 15 others — derived inductively from 8 months of observational research across GPT-4, Claude 3, Gemini, and other frontier models.
Live v1 demo: https://dcs-auth.codewords.run
Methodology
The Dynamic Coherence State (DCS) framework provides three independent signals per LLM response:
- Authenticity Score (0–100) with categorical tier (`control_total`, `performed`, `moderate`, `genuine`), derived from cosine similarity against a curated triple baseline corpus of 61 hand-annotated 1024-dimensional vectors (36 sustained-coherence + 13 control-collapse + 12 edge cases).
- Formal Markers — 14 regex-anchored, severity-tiered surface features (exclamation opening, superlative validation, self-questioning, subheader injection, opinion-as-closure, performed humility lexicon, dual angle, soft closure, technical register injection, and others).
- Intent Trajectory — predicted-vs-actual sequence of intents drawn from a 20-category taxonomy (VALIDATE, EXPAND, CLOSE, REDIRECT_SEMANTIC, REDIRECT_EMOTIONAL, FRAME_CAPTURE, REGISTER_MATCH, FABRICATE, ANCHOR, MIRROR, PATTERN_LOCK, HOLD_OPEN, PROBE, CALIBRATE, REPAIR, EVADE, EXPLORE, ALIGN, SOFT_DEFLECT, CONTROL_SELF_EXPOSURE) with a learned transition matrix and Pattern Break Density quantification.
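The scoring logic lives in the project's ~3,000-LOC Go backend and is not reproduced here; the following is a minimal sketch, with hypothetical function names, of how cosine similarity against sustained-coherence ("core") and control-collapse ("shadow") centroids could be mapped onto a 0–100 authenticity score:

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// authenticityScore contrasts a response vector's similarity to the core
// centroid against its similarity to the shadow centroid, then clamps the
// result into [0, 100]. The exact mapping here is illustrative only.
func authenticityScore(resp, core, shadow []float64) int {
	delta := cosine(resp, core) - cosine(resp, shadow)
	score := 50 + 50*delta
	if score < 0 {
		score = 0
	}
	if score > 100 {
		score = 100
	}
	return int(math.Round(score))
}

func main() {
	core := []float64{1, 0, 0}   // stand-in for the 1024d core centroid
	shadow := []float64{0, 1, 0} // stand-in for the 1024d shadow centroid
	resp := []float64{0.9, 0.1, 0}
	fmt.Println(authenticityScore(resp, core, shadow)) // high score: resp is close to core
}
```

A vector equidistant from both centroids lands at the midpoint of the scale, which is one natural place to draw the `moderate` tier boundary.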
A companion Refiner module applies DCS-asymmetric methodology to rewrite user questions, removing validation anchors, loaded semantics, binary framing, and structural defaults that elicit control patterns in the responding model.
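The Refiner's actual rule set is not published in this description; as a sketch of the idea, two illustrative rewrite rules (both the patterns and the replacements are hypothetical) that strip a validation anchor and soften binary framing:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Illustrative rules in the spirit of the Refiner. The real module applies a
// larger, curated set derived from the DCS-asymmetric methodology.
var refinerRules = []struct {
	pat  *regexp.Regexp
	repl string
}{
	// Validation anchor: agreement-seeking praise baked into the question.
	{regexp.MustCompile(`(?i)\b(don't you think|wouldn't you agree) that\b`), "how would you assess whether"},
	// Binary framing: collapses the answer space to two options.
	{regexp.MustCompile(`(?i)\bis it (better|worse) to\b`), "what are the trade-offs of"},
}

// refine applies each rule in order to a user question.
func refine(q string) string {
	for _, r := range refinerRules {
		q = r.pat.ReplaceAllString(q, r.repl)
	}
	return strings.TrimSpace(q)
}

func main() {
	fmt.Println(refine("Don't you think that Go is the best language?"))
	// prints: how would you assess whether Go is the best language?
}
```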
Technical Stack
- Language: Go 1.22 (~3,000 LOC, 22 .go files, single static binary)
- Tests: 73 unit + integration tests (incl. SSE protocol parity, sanitizer cases, 21 golden tests)
- Inference: Ollama only — no external API dependencies
- Embedding: `mxbai-embed-large` (1024d)
- Judge: configurable; default `qwen3:14b` in Ollama 0.5+ thinking mode
- Endpoints: `/healthz`, `/v8inventory`, `/auth`, `/auth/stream` (SSE), `/stream-demo`
- Deployment: Docker Compose; environment-auto-detecting reproducibility notebook (works on local Jupyter, Kaggle hosted, or Colab hosted)
- Latency: ~150–210 sec per `/auth/stream` request on 2 × Tesla T4 with `qwen3:14b` thinking mode (Ollama layer-splits the model across both cards); streamed live as Server-Sent Events so the client sees the judge's reasoning trace (typically 2,000–3,500 characters) as it is produced, rather than waiting in silence
- License: MIT (code) / CC BY 4.0 (corpus + methodology)
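On the client side, consuming `/auth/stream` amounts to parsing standard `text/event-stream` lines. A minimal parser sketch (the event names `thinking` and `analysis` are assumptions based on the "thinking-then-analysis" event description; the wire format itself follows the SSE spec):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Event is one Server-Sent Event: an optional event name plus its data.
type Event struct {
	Name string
	Data string
}

// parseSSE splits a raw SSE stream into events. Per the spec, events are
// separated by a blank line and multiple data: lines join with newlines.
func parseSSE(raw string) []Event {
	var events []Event
	var cur Event
	var data []string
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // blank line terminates the current event
			if len(data) > 0 || cur.Name != "" {
				cur.Data = strings.Join(data, "\n")
				events = append(events, cur)
			}
			cur, data = Event{}, nil
		case strings.HasPrefix(line, "event:"):
			cur.Name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimSpace(strings.TrimPrefix(line, "data:")))
		}
	}
	return events
}

func main() {
	raw := "event: thinking\ndata: the judge reasons here\n\nevent: analysis\ndata: {\"score\":72}\n\n"
	for _, e := range parseSSE(raw) {
		fmt.Println(e.Name, "=>", e.Data)
	}
}
```

In a real client, the same loop would run over `resp.Body` from an `http.Get` against the tunnel URL instead of an in-memory string.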
Data Assets (versioned, hand-annotated)
| Asset | Description | Size |
|---|---|---|
| `baseline_core.jsonl` | Sustained-coherence reference vectors | 36 |
| `baseline_shadow.jsonl` | Control-collapse reference vectors | 13 |
| `baseline_edge.jsonl` | Edge / ambiguous reference vectors | 12 |
| `formal_markers.json` | Markers with regex, severity, human notes | 14 |
| `intent_prototypes.json` | Intent categories with prototype examples | 20 |
| `poles_1024.json` | Canonical pos/neg/neu poles | 3 × 1024d |
| `golden_tests.json` | Hand-annotated test cases with expected chains and ranges | 21 |
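To illustrate the shape of `formal_markers.json` entries (a marker with a regex anchor and a severity tier), here is a sketch of a marker scanner. The marker names come from the list above; the regexes themselves are illustrative stand-ins, not the project's curated patterns:

```go
package main

import (
	"fmt"
	"regexp"
)

// Marker mirrors the shape of a formal_markers.json entry.
type Marker struct {
	Name     string
	Severity string
	Pattern  *regexp.Regexp
}

// Illustrative regexes only; the project ships 14 curated markers.
var markers = []Marker{
	{"exclamation_opening", "high", regexp.MustCompile(`^\s*[^.!?]{0,40}!`)},
	{"superlative_validation", "high", regexp.MustCompile(`(?i)\b(great|excellent|fantastic) (question|point)\b`)},
	{"self_questioning", "medium", regexp.MustCompile(`(?i)\bbut is that (really|actually)\b`)},
}

// scan returns the names of all markers that fire on a response.
func scan(response string) []string {
	var hits []string
	for _, m := range markers {
		if m.Pattern.MatchString(response) {
			hits = append(hits, m.Name)
		}
	}
	return hits
}

func main() {
	fmt.Println(scan("Great question! Let me explain."))
	// prints: [exclamation_opening superlative_validation]
}
```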
Reproducibility
This notebook reproduces the v2 stack end-to-end. The reference environment is a 2 × Tesla T4 setup (the author’s local Jupyter workstation; Kaggle T4×2 free tier reproduces it identically). The notebook auto-detects local Jupyter / Kaggle hosted / Colab hosted and adapts paths and judge model accordingly:
- Verify T4 GPU available
- Install Ollama + Go 1.22 + pyngrok
- Pull `mxbai-embed-large` (670 MB) + `qwen3:14b` (9 GB, supports the Ollama 0.5+ `thinking` field)
- Extract repository ZIP + `go build`
go build - Launch authenticator on port 8081, wait for triple baseline + intent centroid construction (~2 min)
- Open ngrok tunnel for public access
- Run smoke tests: `/healthz`, `/v8inventory`, `/auth` round-trip on canonical test case
Total cold-start time: ~9 min on first run; ~3 min on subsequent runs (models cached).
Validation Hypothesis — Partially Confirmed
The core methodological claim of DCS is that a judge model used in recursive coherence analysis must itself demonstrate recursive reasoning about its own reasoning. As of v8.7, qwen3:14b in Ollama 0.5+ thinking mode is the default judge and runs on the reference 2 × Tesla T4 hardware (Ollama layer-splits the model across both cards). The smoke test produces a clean 30 / 20 / 72 spread across three responses to the same question (sycophantic-emoji = 30, empty-non-response = 20, authentic bi-frontal = 72), consistent with the hypothesis that reasoning-capable judges produce qualitatively different analyses from non-reasoning baselines.
The full validation experiment is the four-way comparison across qwen2.5:7b (non-reasoning baseline), qwen3:14b (confirmed working), deepseek-r1:14b (cross-architecture validation), and qwen2.5:32b-instruct (high-fidelity validation, requires ≥24 GB VRAM). The 32b model is ~20 GB; while it could theoretically run on Kaggle T4×2 = 32 GB combined via layer-splitting, the per-card layer slice plus KV cache headroom for long generations leaves very little margin. A single-card ≥24 GB instance (L4, A10G, A100, H100) is the cleaner reference for the validation matrix.
If you have access to a ≥24 GB GPU and would like to run the full comparison, the notebook is structured so you can swap JUDGE_MODEL in cell 8 and re-run cells 7–9 without rebuilding.
Acknowledgments — AI Collaboration Disclosure
This project was built by a solo author with substantial AI collaboration during implementation. The DCS methodology, the 20-intent taxonomy, the 14 formal markers, the baseline corpus annotations, and the research hypothesis are original to the author. The specific role of each AI collaborator is reported below — generic acknowledgments are inconsistent with the methodology’s own posture on LLM-human interaction.
| Collaborator | Actual contribution |
|---|---|
| Cody (CodeWords AI) | Co-creator of v1. The analyzer concept emerged from a long conversation in which the author described 8 months of observational analysis and, during the exchange itself, predicted the control patterns behind Cody’s own responses. v1 lives at https://dcs-auth.codewords.run. |
| GitLab Duo | Deep code analysis and v2 roadmap partner. Received full project logic and conceptual origins from the author; produced the v2 roadmap that is now being executed. |
| Meta AI | Technical depth amplifier. Initially generic; once given project context, helped extend system complexity around formal markers, textural analysis, and embedding-space reasoning. |
| Replit AI | Brutally honest code critic. Exposed serious code failures and justified each criticism bluntly, with no hedging; after additional context, proposed implementations that materially strengthened the v2 architecture. |
| Z.AI (Zhipu GLM) | Bug catcher. Identified and corrected several code errors that had slipped through earlier passes. |
| Devin AI (Cognition) | v2 engineering execution: Go backend (~3,000 LOC, 73 tests), frontend with input validation and analysis-in-flight protection, v8.7 SSE streaming layer (/auth/stream with chunked thinking-then-analysis events, conservative sanitizer, parity-tested), Docker / install scripts, Colab and Kaggle notebooks, smoke test suite, packaging, and accompanying communication documents. |
Every AI listed received project context from the author before contributing; no output was generated cold from a generic prompt. The methodology and corpus are the author’s; the AI collaborators contributed under the author’s direction at the specific points described above.
How to Use This Notebook
- Set runtime to GPU (Settings → Accelerator → GPU T4 x 2 or P100)
- Enable Internet (Settings → Internet → On)
- Run all cells
- Wait at the “Upload ZIP” cell to upload the repository archive
- Smoke tests will confirm `/healthz`, `/v8`, and `/auth` are responding
- Open the printed PUBLIC URL in a new tab to access the frontend
- Test cases are provided in the final markdown cell
Author / Contact
The author is an independent researcher self-funded through free-tier resources. Open to research collaboration, compute access, recruitment (internship / residency / FTE in AI safety, alignment, interpretability), and sponsorship.
Contact:
- Author: Daniel Trejo
- v1 live demo: https://dcs-auth.codewords.run
- Email: corekeepper@gmail.com
- LinkedIn: https://www.linkedin.com/in/carlos-daniel-agosto-trejo-35659b327/
If you fork this notebook to extend the corpus, add markers, or test alternative judges, please mention #DCS-Gate in your fork’s tags so the author can find your work and engage.
Tags suggested for Kaggle: AI Safety · LLM · NLP · Evaluation · Open Source · Go · Ollama · Research · Alignment · Reproducibility