Fact Classification System

Fact Classification System is a FastAPI-based service that classifies English factual text with one of three Russian-language labels: "правда" (true), "неправда" (false), or "нейтрально" (neutral).

It combines claim extraction, Wikipedia evidence retrieval (FAISS), and natural language inference (NLI) verification (roberta-large-mnli) to produce transparent, evidence-backed results.

Why this project

End-to-end NLP pipeline with practical trade-offs (accuracy vs latency).
Stateless API architecture with startup model loading and structured error handling.
Reproducible local setup with automated knowledge base build.
Separate unit/integration test suites for fast feedback and realistic validation.
Simple frontend for interactive demos without build tooling.

What you get

API endpoint to classify text and return per-claim evidence.
Confidence-aware aggregation for multi-claim text.
Built-in rate limiting, response caching, and input validation.
Health/status endpoints for runtime observability.

Architecture

flowchart LR
    A[Input text] --> B[Claim extraction]
    B --> C[Evidence retrieval\nFAISS + Wikipedia snippets]
    C --> D[NLI verification\nroberta-large-mnli]
    D --> E[Claim-level scoring]
    E --> F[Weighted aggregation]
    F --> G[Overall classification + evidence]

Main modules:

app/services/claim_extractor.py - sentence splitting and claim filtering.
app/services/evidence_retriever.py - embedding + FAISS nearest-neighbor lookup.
app/services/nli_verifier.py - entailment scoring for claim-evidence pairs.
app/services/classifier.py - thresholds and overall aggregation.
app/core/models.py - singleton lifecycle manager for all heavy models.

See docs/ARCHITECTURE.md for a deeper walkthrough.
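
The first pipeline stage (sentence splitting and claim filtering) can be illustrated with a minimal sketch. This is not the code in app/services/claim_extractor.py: the function name `extract_claims`, the regex, and the `min_words` filter are assumptions made for illustration, and the real extractor evidently does more (the API example splits one compound sentence into separate claims, which this sketch does not). Only the cap of 8 claims mirrors the documented MAX_CLAIMS default.

```python
import re

# Assumed sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def extract_claims(text: str, max_claims: int = 8, min_words: int = 4) -> list[str]:
    """Split text into sentences and keep the ones that look like factual claims.

    Hypothetical helper: the filtering rules below are illustrative only.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    claims = []
    for sentence in sentences:
        if sentence.endswith("?"):
            continue  # questions assert nothing that can be verified
        if len(sentence.split()) < min_words:
            continue  # very short fragments rarely carry a checkable fact
        claims.append(sentence)
    return claims[:max_claims]  # cap the work done by later stages
```

Capping the number of claims bounds the retrieval and NLI work per request, which is one way the accuracy vs latency trade-off shows up in practice.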

Quick Start

git clone https://github.com/levvius/Fact-Classification-System.git
cd Fact-Classification-System
./run.sh

run.sh handles:

virtual environment creation (if missing),
dependency installation,
knowledge base build (if missing),
API startup on http://localhost:8000.

Open:

Web UI: http://localhost:8000
API docs: http://localhost:8000/docs
Health: http://localhost:8000/api/v1/health

Manual Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/build_kb.py
uvicorn app.main:app --host 0.0.0.0 --port 8000

API Example

Request:

curl -X POST http://localhost:8000/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"text":"Albert Einstein was born in 1879 and won the Nobel Prize in Physics in 1921."}'

Response (shape):

{
  "overall_classification": "правда",
  "confidence": 0.95,
  "claims": [
    {
      "claim": "Albert Einstein was born in 1879.",
      "classification": "правда",
      "confidence": 0.99,
      "best_evidence": {
        "snippet": "Albert Einstein was born in Ulm...",
        "source": "https://en.wikipedia.org/wiki/Albert_Einstein",
        "nli_score": 0.99,
        "retrieval_score": 0.98
      }
    }
  ]
}
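
The same request can be sent from Python using only the standard library. The sketch below assumes the service is running locally on port 8000 and that the response has the shape shown above. The helper names `classify` and `summarize` are hypothetical and not part of the project.

```python
import json
import urllib.request

# Endpoint from the curl example; adjust host and port for your deployment.
API_URL = "http://localhost:8000/api/v1/classify"


def classify(text: str, url: str = API_URL, timeout: float = 60.0) -> dict:
    """POST the text to the classify endpoint and return the decoded JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(result: dict) -> list[str]:
    """Turn a response of the shape shown above into short printable lines."""
    lines = [f"overall: {result['overall_classification']} ({result['confidence']:.2f})"]
    for item in result.get("claims", []):
        evidence = item.get("best_evidence") or {}  # tolerate missing evidence
        source = evidence.get("source", "no source")
        lines.append(f"- {item['claim']} -> {item['classification']} [{source}]")
    return lines
```

With the server running, `print("\n".join(summarize(classify("Albert Einstein was born in 1879."))))` prints the overall label followed by one line per claim.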

Endpoints

Endpoint	Method	Description
`/`	GET	Frontend UI (or API info fallback)
`/api/v1/health`	GET	Service/model readiness check
`/api/v1/classify`	POST	Main text classification endpoint
`/api/v1/topics`	GET	Available Wikipedia topics
`/api/v1/cache-info`	GET	Cache statistics
`/docs`	GET	OpenAPI/Swagger UI

Configuration

Environment variables are loaded from .env (see .env.example).

Key settings:

TRUTH_THRESHOLD (default 0.75)
FALSEHOOD_THRESHOLD (default 0.4)
TOP_K_PROOFS (default 10)
MAX_CLAIMS (default 8)
USE_WEIGHTED_AGGREGATION (default true)
USE_NLI_CONTEXT (default true)
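
To make the threshold settings concrete, here is a minimal sketch of how two thresholds and weighted aggregation could interact. The setting names and defaults come from the list above; everything else is an assumption, not the logic in app/services/classifier.py. In particular, it assumes each claim has a support score in [0, 1] where higher means better supported, and that weighting means a confidence-weighted mean. The functions `label_for` and `aggregate` are hypothetical.

```python
TRUTH_THRESHOLD = 0.75     # documented default
FALSEHOOD_THRESHOLD = 0.4  # documented default


def label_for(score: float) -> str:
    """Map a support score to a label (assumed semantics: higher = more supported)."""
    if score >= TRUTH_THRESHOLD:
        return "правда"      # true
    if score <= FALSEHOOD_THRESHOLD:
        return "неправда"    # false
    return "нейтрально"      # neutral: evidence is inconclusive


def aggregate(claims: list[tuple[float, float]], weighted: bool = True) -> tuple[str, float]:
    """Combine (score, weight) pairs into an overall label and score.

    Assumed scheme: a weight-weighted mean when `weighted` is true,
    otherwise a plain mean. Falls back to the plain mean if all weights are 0.
    """
    if not claims:
        return "нейтрально", 0.0
    total_weight = sum(weight for _, weight in claims)
    if weighted and total_weight > 0:
        overall = sum(score * weight for score, weight in claims) / total_weight
    else:
        overall = sum(score for score, _ in claims) / len(claims)
    return label_for(overall), overall
```

Under these assumptions, a low-weight claim barely moves the overall result, whereas the unweighted mean lets one weak claim pull a well-supported text down to neutral.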

Testing

# Unit tests (fast, mocked models)
pytest tests/unit -m unit

# Integration tests (real models, slower)
pytest tests/integration -m integration

The test suite currently contains 99 tests: 82 unit and 17 integration.

Repository Layout

Fact-Classification-System/
├── app/
│   ├── api/           # FastAPI routes and schemas
│   ├── core/          # config, model manager, cache, exceptions
│   ├── services/      # claim extraction, retrieval, NLI, classifier
│   ├── static/        # web UI (vanilla HTML/CSS/JS)
│   └── utils/         # KB building helpers
├── scripts/           # helper scripts (KB build)
├── tests/             # unit + integration tests
├── docs/              # architecture and development docs
├── run.sh
└── README.md

Engineering Notes

Models load once at startup through ModelManager.
The API runs inference in a dedicated thread pool so the event loop is not blocked.
CPU-only, single-threaded torch settings improve stability on macOS.
Rate limiting and validation harden public API usage.
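
The thread-pool note can be sketched with the standard library alone. This shows the general pattern rather than the project's actual code: `_INFERENCE_POOL`, `run_pipeline`, and `classify_async` are hypothetical names, the pool size of one is an assumption, and `run_pipeline` is a stand-in for the blocking model calls.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dedicated pool; a single worker is an assumption, chosen here
# so that only one inference runs at a time.
_INFERENCE_POOL = ThreadPoolExecutor(max_workers=1, thread_name_prefix="inference")


def run_pipeline(text: str) -> dict:
    """Stand-in for the blocking extraction, retrieval, and NLI steps."""
    return {"text": text, "length": len(text)}


async def classify_async(text: str) -> dict:
    """Run the blocking pipeline off the event loop and await its result."""
    loop = asyncio.get_running_loop()
    # run_in_executor hands the call to the pool and returns an awaitable,
    # so other requests keep being served while inference runs.
    return await loop.run_in_executor(_INFERENCE_POOL, run_pipeline, text)
```

An async route handler can then `await classify_async(text)` without stalling health checks or other lightweight endpoints while a model is busy.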

Contributing

Contributions are welcome. Start with CONTRIBUTING.md for setup and workflow expectations.

License

MIT - see LICENSE.
