About This Project

This project began as a personal learning tool for studying the Greek New Testament and the Hebrew Old Testament. It brings together an interest in data and structured workflows with a desire to read the Scriptures more carefully in their original languages.

How It Was Built

The raw data was extracted from interlinear Bible sources, producing per-word records with morphological parsing, Strong's concordance numbers, and English glosses. Each corpus was processed through an automated pipeline:

  1. Extraction — Parse source data into per-book CSV files
  2. Deduplication — Generate unique-lemma files (first occurrence of each Strong's number)
  3. Master compilation — Concatenate all books with globally renumbered ordinals
  4. Reading order analysis — Run two greedy optimization algorithms
  5. Optimized file generation — Produce per-book files in each optimized order
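
As a rough illustration of steps 2 and 3, the sketch below deduplicates rows by Strong's number and concatenates books with renumbered ordinals. It is not the project's actual pipeline code: the function names, the two-column sample data, and the in-memory CSV strings are all assumptions made for the example.

```python
import csv
import io


def unique_lemmas(rows):
    """Step 2 (sketch): keep the first row seen for each Strong's number."""
    seen = set()
    first_rows = []
    for row in rows:
        key = row["strongs_number"]
        if key not in seen:
            seen.add(key)
            first_rows.append(row)
    return first_rows


def compile_master(books):
    """Step 3 (sketch): concatenate books, renumbering ordinals from 1."""
    master = []
    for rows in books:
        for row in rows:
            # Copy the row so the per-book data keeps its own ordinals.
            master.append({**row, "ordinal": len(master) + 1})
    return master


# Hypothetical sample data standing in for two per-book CSV files.
book_a = list(csv.DictReader(io.StringIO(
    "strongs_number,ordinal\nG1722,1\nG746,2\nG1722,3\n")))
book_b = list(csv.DictReader(io.StringIO(
    "strongs_number,ordinal\nG3056,1\nG746,2\n")))

master = compile_master([book_a, book_b])
print([row["ordinal"] for row in master])                        # [1, 2, 3, 4, 5]
print([row["strongs_number"] for row in unique_lemmas(book_a)])  # ['G1722', 'G746']
```

Keeping each step as a small function over lists of row dictionaries means the same code can be run on a single book or on the whole corpus.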

Data Schema

Greek New Testament

Column | Description | Example
Greek Word | The Greek word form as it appears in the text | λόγος
english_translation | English gloss / interlinear translation | word
parsing_abbreviation | Morphological parsing code | N-NMS
strongs_number | Strong's Greek concordance number | G3056
chapter_ref | Book and chapter reference | John 1
Row Ordinal | Sequential position number | 1
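
The Greek columns listed above could be read into typed records along the following lines. This is a minimal sketch that assumes the CSV header uses exactly these column names in this order. The GreekWord class, the read_greek function, and the sample row (built from the Example values) are hypothetical and not part of the project.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class GreekWord:
    form: str
    gloss: str
    parsing: str
    strongs: str
    chapter_ref: str
    ordinal: int


# Assumed header and one row built from the Example values in the schema.
SAMPLE = (
    "Greek Word,english_translation,parsing_abbreviation,"
    "strongs_number,chapter_ref,Row Ordinal\n"
    "λόγος,word,N-NMS,G3056,John 1,1\n"
)


def read_greek(handle):
    """Yield one GreekWord per CSV row, converting the ordinal to int."""
    for row in csv.DictReader(handle):
        yield GreekWord(
            form=row["Greek Word"],
            gloss=row["english_translation"],
            parsing=row["parsing_abbreviation"],
            strongs=row["strongs_number"],
            chapter_ref=row["chapter_ref"],
            ordinal=int(row["Row Ordinal"]),
        )


words = list(read_greek(io.StringIO(SAMPLE)))
print(words[0].strongs, words[0].ordinal)  # G3056 1
```

For a real file, pass an open file object (opened with encoding="utf-8" and newline="") in place of the io.StringIO sample.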

Hebrew Old Testament

Column | Description | Example
hebrew_word | The Hebrew word form with cantillation | בראשִׁית
lemma | Strong's number (lexical form reference) | H7225
transliteration | Romanized pronunciation | bə·rē·šîṯ
english_translation | English gloss | In the beginning
parsing_abbreviation | Morphological parsing code | N-fs
strongs_number | Strong's Hebrew concordance number | H7225
book | Book name | Genesis
chapter | Chapter number | 1
ordinal | Sequential position number | 1

Reading Order Algorithms

Both strategies are greedy: each builds the sequence one book at a time, always taking the best available choice at that step. They share the same starting book, the one whose vocabulary set has the highest overlap with the most frequent words across the entire corpus.
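
One way to pick such a starting book is sketched below. The text does not say how many "most frequent words" are considered or how overlap is scored, so the top_n cutoff, the use of Strong's numbers as the vocabulary unit, and the raw intersection count are all assumptions. The function name and sample data are hypothetical.

```python
from collections import Counter


def pick_start(books, top_n=3):
    """books maps a book name to its list of Strong's numbers, one per word.

    Returns the book whose vocabulary set shares the most entries with the
    top_n most frequent Strong's numbers in the whole corpus.
    """
    freq = Counter(s for tokens in books.values() for s in tokens)
    top = {s for s, _ in freq.most_common(top_n)}
    # On a tie, max() returns the first such book in dictionary order.
    return max(books, key=lambda name: len(top & set(books[name])))


# Hypothetical three-book corpus.
books = {
    "A": ["G1", "G2", "G3"],
    "B": ["G1", "G2", "G2", "G4", "G4"],
    "C": ["G1", "G5"],
}
print(pick_start(books))  # B
```

In this sample the three most frequent numbers are G1, G2, and G4, and only book B contains all of them.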

Strategy A — Minimal New Vocabulary

At each step, pick the remaining book that introduces the fewest new lemmas. This keeps vocabulary growth as gradual as possible.
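
A minimal sketch of this selection rule, assuming each book is represented by its set of lemmas (here Strong's numbers) and the starting book is already chosen. The function name and sample data are hypothetical, and tie-breaking by dictionary order is an assumption.

```python
def order_min_new(vocab, start):
    """vocab maps a book name to its set of lemmas; returns a reading order."""
    known = set(vocab[start])
    order = [start]
    remaining = [book for book in vocab if book != start]
    while remaining:
        # Greedy choice: the book adding the fewest lemmas not yet known.
        nxt = min(remaining, key=lambda book: len(vocab[book] - known))
        order.append(nxt)
        known |= vocab[nxt]
        remaining.remove(nxt)
    return order


# Hypothetical corpus: B is long with 2 new lemmas, C is short with 1.
vocab = {
    "A": {"G1", "G2", "G3", "G4"},
    "B": {"G1", "G2", "G3", "G4", "G5", "G6"},
    "C": {"G1", "G7"},
}
print(order_min_new(vocab, "A"))  # ['A', 'C', 'B']
```

After reading A, book C introduces one new lemma and book B introduces two, so C comes first even though it is the shorter book.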

Strategy B — Maximum Readability

At each step, pick the remaining book with the highest percentage of vocabulary you already know. This maximizes reading comprehension at each step.
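
A minimal sketch of this selection rule follows. It assumes the percentage is measured over a book's unique lemmas rather than its running word count, which the description leaves open. The function names and sample data are hypothetical.

```python
def known_share(book_vocab, known):
    """Fraction of a book's unique lemmas that are already known."""
    if not book_vocab:
        return 0.0
    return len(book_vocab & known) / len(book_vocab)


def order_max_known(vocab, start):
    """vocab maps a book name to its set of lemmas; returns a reading order."""
    known = set(vocab[start])
    order = [start]
    remaining = [book for book in vocab if book != start]
    while remaining:
        # Greedy choice: the book with the largest already-known share.
        nxt = max(remaining, key=lambda book: known_share(vocab[book], known))
        order.append(nxt)
        known |= vocab[nxt]
        remaining.remove(nxt)
    return order


# Hypothetical corpus: B is 4/6 known after A, C is only 1/2 known.
vocab = {
    "A": {"G1", "G2", "G3", "G4"},
    "B": {"G1", "G2", "G3", "G4", "G5", "G6"},
    "C": {"G1", "G7"},
}
print(order_max_known(vocab, "A"))  # ['A', 'B', 'C']
```

Because the score is a ratio, a long book with a few new lemmas can outrank a short book that introduces fewer new lemmas in absolute terms.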

Inspiration

The Greek NT reading order was originally inspired by Greek for Life by Benjamin L. Merkle and Robert L. Plummer, which recommends reading the NT in a specific order to build vocabulary naturally.

The computational approach extends this idea by using actual corpus data to optimize the sequence algorithmically.

License & Citation

This project is licensed under CC BY 4.0. You are free to use, share, and adapt the data with attribution.

Source code: GitHub