Biblical Language Vocabulary Database

Structured vocabulary datasets for studying the Greek New Testament and Hebrew Old Testament, optimized for language learners.

2 corpora · 66 biblical books · 353,000+ total tokens · 13,700+ unique lemmas

Greek New Testament

27 books · 78,419 tokens · 5,349 unique lemmas

Complete vocabulary of the Greek NT with morphological parsing, Strong's numbers, and interlinear translations. Includes two computationally optimized reading orders for gradual vocabulary acquisition.

Explore Greek NT →

Hebrew Old Testament

39 books · 275,993 tokens · 8,399 unique lemmas

Complete vocabulary of the Hebrew OT with transliteration, parsing, Strong's numbers, and English glosses. Includes two optimized reading orders designed for progressive vocabulary building.

Explore Hebrew OT →

What Makes This Project Different

Most biblical language resources present vocabulary in frequency-sorted lists or in canonical book order. This project adds a data-driven approach: computationally optimized reading orders that minimize the vocabulary burden at each step.

Two greedy algorithms produce two strategies:

Strategy A — Minimal New Vocabulary
At each step, read the remaining book that introduces the fewest new words. Ideal for steady, incremental vocabulary growth.
Strategy B — Maximum Readability
At each step, read the book where you already know the highest percentage of the vocabulary. Ideal for reading comprehension.
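The two strategies can be sketched as a single greedy loop. This is a minimal illustration, not the project's actual code: the function name `greedy_order`, the toy `books` data, the choice to measure "known percentage" over unique lemmas rather than tokens, and the tie-breaking rule (first book in input order wins) are all assumptions made for the example.

```python
from typing import Dict, List, Set


def greedy_order(books: Dict[str, List[str]], strategy: str = "A") -> List[str]:
    """Return a reading order for `books` (book name -> list of lemmas).

    Strategy "A": pick the remaining book with the fewest unseen lemmas.
    Strategy "B": pick the remaining book with the highest share of
    already-known unique lemmas (assumption: share is counted per lemma).
    """
    vocab = {name: set(lemmas) for name, lemmas in books.items()}
    known: Set[str] = set()
    remaining = list(books)  # input order doubles as the tie-breaker
    order: List[str] = []

    while remaining:
        if strategy == "A":
            best = min(remaining, key=lambda b: len(vocab[b] - known))
        else:
            best = max(
                remaining,
                key=lambda b: len(vocab[b] & known) / max(len(vocab[b]), 1),
            )
        order.append(best)
        known |= vocab[best]      # everything in the chosen book is now known
        remaining.remove(best)

    return order


# Hypothetical toy data: three "books" with a handful of lemmas each.
books = {
    "S": ["a", "b", "c", "d"],
    "P": ["a", "x"],
    "Q": ["a", "b", "c", "d", "y", "z"],
}

order_a = greedy_order(books, "A")
order_b = greedy_order(books, "B")
```

On this toy data the two strategies diverge: Strategy A prefers short books with few new lemmas in absolute terms, while Strategy B prefers books whose vocabulary overlaps most with what has already been read, even if they add more new lemmas overall. Note that Strategy B's first pick is always a tie (nothing is known yet), so a real implementation needs some explicit rule for choosing the starting book.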

Every dataset is available as a downloadable CSV, ready for spreadsheets, flashcard apps (Anki, Quizlet), or computational analysis.

How to Use These Files

All files are in UTF-8 CSV format. Open them in Excel, Google Sheets, or LibreOffice, or import them into flashcard applications.
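For computational use, the files can also be read with any CSV library. The sketch below uses Python's standard `csv` module to pull two columns out of a vocabulary file and write them as tab-separated lines, a format most flashcard tools can import. The column names `lemma` and `gloss` and the in-memory sample data are assumptions for illustration; check the header row of the actual file and adjust accordingly.

```python
import csv
import io


def read_vocab(handle, lemma_col="lemma", gloss_col="gloss"):
    """Yield (lemma, gloss) pairs from an open CSV file handle.

    The column names are hypothetical defaults, not the project's schema.
    """
    for row in csv.DictReader(handle):
        yield row[lemma_col], row[gloss_col]


def to_tsv(pairs):
    """Render (front, back) pairs as tab-separated text, one card per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return out.getvalue()


# In-memory stand-in for a downloaded file, so the example runs as-is.
sample = io.StringIO("lemma,gloss\nλόγος,word\nθεός,God\n")
cards = to_tsv(read_vocab(sample))
```

With a real download, replace `sample` with `open(path, encoding="utf-8", newline="")`. If a file turns out to start with a byte-order mark, `encoding="utf-8-sig"` strips it so the first column name is read cleanly.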