
Fact Classification System


Fact Classification System is a FastAPI-based service that classifies English factual text as "правда" (true), "неправда" (false), or "нейтрально" (neutral). The API returns these labels as the Russian strings shown, as in the API example below.

It combines claim extraction, Wikipedia evidence retrieval (FAISS), and NLI verification (roberta-large-mnli) to produce transparent, evidence-backed results.
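For each claim/evidence pair, roberta-large-mnli produces probabilities over three classes: contradiction, neutral, and entailment. The sketch below shows one plausible way such probabilities could map onto the service's labels. It is hypothetical: the function name `nli_to_label`, the lowercase class keys, and the 0.5 fallback threshold are assumptions, not the project's actual code.

```python
# Hypothetical sketch: map MNLI class probabilities for one (evidence, claim)
# pair onto the service's three labels. The label strings come from the README;
# the lowercase keys and the 0.5 threshold are illustrative assumptions.
from typing import Dict, Tuple

LABELS = {
    "entailment": "правда",       # evidence supports the claim
    "contradiction": "неправда",  # evidence refutes the claim
    "neutral": "нейтрально",      # evidence is unrelated or inconclusive
}


def nli_to_label(probs: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, confidence) for the most probable NLI class.

    Falls back to "нейтрально" when no class reaches the threshold.
    """
    top_class = max(probs, key=probs.get)
    top_prob = probs[top_class]
    if top_prob < threshold:
        return "нейтрально", top_prob
    return LABELS[top_class], top_prob
```

A fallback threshold keeps weak or split evidence from being reported as a confident true/false verdict.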

Why this project

What you get

Architecture

flowchart LR
    A[Input text] --> B[Claim extraction]
    B --> C[Evidence retrieval\nFAISS + Wikipedia snippets]
    C --> D[NLI verification\nroberta-large-mnli]
    D --> E[Claim-level scoring]
    E --> F[Weighted aggregation]
    F --> G[Overall classification + evidence]
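The "Weighted aggregation" step combines claim-level results into one overall verdict. The diagram does not specify the weighting, so the sketch below assumes a simple confidence-weighted vote: each claim adds its confidence to its label's total, and the overall confidence is the winning label's share of all weight. The function name `aggregate` and this scheme are assumptions and are not expected to reproduce the numbers in the example response.

```python
# Hypothetical sketch of "Weighted aggregation": a confidence-weighted vote.
# The weighting scheme is an assumption, not the project's documented method.
from collections import defaultdict
from typing import List, Tuple


def aggregate(claims: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Combine (label, confidence) pairs into an overall (label, confidence)."""
    totals = defaultdict(float)
    for label, confidence in claims:
        totals[label] += confidence  # each claim votes with its confidence
    total = sum(totals.values())
    if total == 0:
        # No claims (or only zero-confidence ones): nothing to decide on.
        return "нейтрально", 0.0
    winner = max(totals, key=totals.get)  # ties go to the first label seen
    return winner, totals[winner] / total
```

For example, two supported claims give `("правда", 1.0)`, while one supported claim at 0.6 and one refuted claim at 0.2 give "правда" with a share of about 0.75.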

Main modules:

See docs/ARCHITECTURE.md for a deeper walkthrough.

Quick Start

git clone https://github.com/levvius/Fact-Classification-System.git
cd Fact-Classification-System
./run.sh

run.sh handles:

Open:

Manual Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python scripts/build_kb.py
uvicorn app.main:app --host 0.0.0.0 --port 8000

API Example

Request:

curl -X POST http://localhost:8000/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"text":"Albert Einstein was born in 1879 and won the Nobel Prize in Physics in 1921."}'
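The same request can be sent from Python using only the standard library. The helper names `build_classify_request` and `classify` are illustrative; the base URL assumes the default host and port from the setup commands above.

```python
# Python equivalent of the curl request above (standard library only).
# Helper names are illustrative; the base URL assumes the default port 8000.
import json
import urllib.request


def build_classify_request(text: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for /api/v1/classify with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_classify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `classify("Albert Einstein was born in 1879.")["overall_classification"]` should return one of the three labels. The generous timeout allows for slow first requests while models load.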

Response (illustrative shape, abridged):

{
  "overall_classification": "правда",
  "confidence": 0.95,
  "claims": [
    {
      "claim": "Albert Einstein was born in 1879.",
      "classification": "правда",
      "confidence": 0.99,
      "best_evidence": {
        "snippet": "Albert Einstein was born in Ulm...",
        "source": "https://en.wikipedia.org/wiki/Albert_Einstein",
        "nli_score": 0.99,
        "retrieval_score": 0.98
      }
    }
  ]
}
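A client will usually walk the `claims` list to show each verdict alongside its best evidence. The hypothetical helper below flattens a response with the shape above into one summary line per claim. Field names follow the example response; treating a missing or null `best_evidence` as "no evidence" is an assumption.

```python
# Hypothetical helper: flatten a /api/v1/classify response into summary lines.
# Field names follow the example response; a null best_evidence is assumed possible.
from typing import Any, Dict, List


def summarize(response: Dict[str, Any]) -> List[str]:
    lines = [f"overall: {response['overall_classification']} ({response['confidence']:.2f})"]
    for item in response.get("claims", []):
        evidence = item.get("best_evidence") or {}
        source = evidence.get("source", "no evidence")
        lines.append(f"{item['classification']} ({item['confidence']:.2f}) {item['claim']} [{source}]")
    return lines
```

For the example response, the second line would read `правда (0.99) Albert Einstein was born in 1879. [https://en.wikipedia.org/wiki/Albert_Einstein]`.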

Endpoints

Endpoint            Method  Description
/                   GET     Frontend UI (or API info fallback)
/api/v1/health      GET     Service/model readiness check
/api/v1/classify    POST    Main text classification endpoint
/api/v1/topics      GET     Available Wikipedia topics
/api/v1/cache-info  GET     Cache statistics
/docs               GET     OpenAPI/Swagger UI
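Because the models are large, a script may want to wait for `/api/v1/health` before sending classification requests. The sketch below polls that endpoint until it answers HTTP 200 or a timeout expires. The helper names, the default timings, and treating HTTP 200 as "ready" are assumptions; the README only states that the endpoint reports service/model readiness.

```python
# Hypothetical readiness check: poll /api/v1/health until HTTP 200 or timeout.
# Treating 200 as "models ready" is an assumption about the endpoint's behavior.
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # server answered with an error code
        return exc.code
    except (urllib.error.URLError, OSError):  # connection refused, timeout, etc.
        return 0


def wait_until_ready(base_url: str = "http://localhost:8000", timeout: float = 300.0,
                     interval: float = 2.0, fetch: Callable[[str], int] = http_status) -> bool:
    """Return True once the health endpoint answers 200, False on timeout."""
    deadline = time.monotonic() + timeout
    url = f"{base_url}/api/v1/health"
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

The injectable `fetch` parameter keeps the polling loop testable without a running server.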

Configuration

Environment variables are loaded from .env (see .env.example).
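As an illustration of what loading from .env involves, the minimal parser below reads `KEY=VALUE` lines into the process environment without overriding variables that are already set. It is a sketch only: the project may rely on a library for this, and `load_dotenv_file` is a hypothetical name.

```python
# Minimal illustrative .env loader: KEY=VALUE lines -> os.environ.
# Hypothetical; the project may use a library instead of code like this.
import os
from typing import Dict


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    loaded: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")  # drop surrounding quotes
            loaded[key] = value
            os.environ.setdefault(key, value)  # an already-set variable wins
    return loaded
```

Letting real environment variables take precedence over .env is a common convention, because it allows deployment settings to override local defaults.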

Key settings:

Testing

# Unit tests (fast, mocked models)
pytest tests/unit -m unit

# Integration tests (real models, slower)
pytest tests/integration -m integration
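The `-m unit` and `-m integration` selections depend on tests being tagged with those markers. One common way to register them, so pytest does not warn about unknown markers, is a `pytest_configure` hook in `conftest.py`. Whether this project registers them there or in `pytest.ini`/`pyproject.toml` is an assumption.

```python
# conftest.py (sketch): register the markers used by `-m unit` / `-m integration`.
# Where the project actually registers its markers is an assumption.


def pytest_configure(config):
    config.addinivalue_line("markers", "unit: fast tests with mocked models")
    config.addinivalue_line("markers", "integration: slow tests that load real models")
```

A test then opts in with a decorator such as `@pytest.mark.unit` above its definition.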

The current suite contains 99 tests in total (82 unit and 17 integration).

Repository Layout

Fact-Classification-System/
├── app/
│   ├── api/           # FastAPI routes and schemas
│   ├── core/          # config, model manager, cache, exceptions
│   ├── services/      # claim extraction, retrieval, NLI, classifier
│   ├── static/        # web UI (vanilla HTML/CSS/JS)
│   └── utils/         # KB building helpers
├── scripts/           # helper scripts (KB build)
├── tests/             # unit + integration tests
├── docs/              # architecture and development docs
├── run.sh
└── README.md

Engineering Notes

Contributing

Contributions are welcome. Start with CONTRIBUTING.md for setup and workflow expectations.

License

MIT - see LICENSE.