This project began as a personal learning tool for studying the Greek New Testament and Hebrew Old Testament in their original languages. It combines an interest in data and structured workflows with a desire to read the Scriptures more carefully in the original languages.
The raw data was extracted from interlinear Bible sources, producing per-word records with morphological parsing, Strong's concordance numbers, and English glosses. Each corpus was processed through an automated pipeline. The Greek New Testament records use the following columns:
| Column | Description | Example |
|---|---|---|
| greek_word | The Greek word form as it appears in the text | λόγος |
| english_translation | English gloss / interlinear translation | word |
| parsing_abbreviation | Morphological parsing code | N-NMS |
| strongs_number | Strong's Greek concordance number | G3056 |
| chapter_ref | Book and chapter reference | John 1 |
| ordinal | Sequential position number | 1 |
The Hebrew Old Testament records use the following columns:

| Column | Description | Example |
|---|---|---|
| hebrew_word | The Hebrew word form with cantillation | בראשִׁית |
| lemma | Strong's number (lexical form reference) | H7225 |
| transliteration | Romanized pronunciation | bə·rē·šîṯ |
| english_translation | English gloss | In the beginning |
| parsing_abbreviation | Morphological parsing code | N-fs |
| strongs_number | Strong's Hebrew concordance number | H7225 |
| book | Book name | Genesis |
| chapter | Chapter number | 1 |
| ordinal | Sequential position number | 1 |
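A per-word record can be modeled directly from the table above. The following Python sketch is illustrative only: the class name, field types, and sample values (taken from the example column) are assumptions, not the project's actual data model.

```python
from typing import NamedTuple

class HebrewWord(NamedTuple):
    """One per-word record; field names follow the Hebrew table above."""
    hebrew_word: str           # word form with pointing
    lemma: str                 # Strong's number (lexical form reference)
    transliteration: str       # romanized pronunciation
    english_translation: str   # English gloss
    parsing_abbreviation: str  # morphological parsing code
    strongs_number: str        # Strong's Hebrew concordance number
    book: str
    chapter: int
    ordinal: int               # sequential position number

# First word of Genesis 1, using the example values from the table.
first = HebrewWord("בראשִׁית", "H7225", "bə·rē·šîṯ", "In the beginning",
                   "N-fs", "H7225", "Genesis", 1, 1)
```

Keeping records as an immutable tuple type makes downstream steps (grouping by book, counting lemmas) straightforward.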
Both strategies use a greedy algorithm and share the same starting book: the one whose vocabulary set overlaps most with the most frequent words across the entire corpus. From there they diverge:

- **Fewest new words:** at each step, pick the remaining book that introduces the fewest new lemmas. This keeps vocabulary growth as gradual as possible.
- **Highest coverage:** at each step, pick the remaining book where the highest percentage of its vocabulary is already known. This maximizes reading comprehension at each step.
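The two selection rules can be sketched as a single greedy loop. This is an illustrative Python sketch, not the project's actual implementation: the input format (book name mapped to its lemma set), the strategy names, and the choice of the top 100 most frequent lemmas for seeding the start are all assumptions.

```python
from collections import Counter

def greedy_order(book_vocab, strategy="fewest_new"):
    """Order books greedily by vocabulary overlap.

    book_vocab: dict mapping book name -> set of lemmas (assumed format).
    strategy: "fewest_new" minimizes new lemmas per step;
              "max_known" maximizes the fraction of already-known lemmas.
    """
    # Approximate "most frequent words" by the lemmas appearing in the
    # most books (a simplifying assumption; token counts would be better).
    freq = Counter(lemma for vocab in book_vocab.values() for lemma in vocab)
    top = {lemma for lemma, _ in freq.most_common(100)}

    # Shared starting book: highest overlap with the frequent-lemma set.
    remaining = dict(book_vocab)
    first = max(remaining, key=lambda b: len(remaining[b] & top))
    order, known = [first], set(remaining.pop(first))

    while remaining:
        if strategy == "fewest_new":
            # Fewest new words: minimize lemmas not yet seen.
            nxt = min(remaining, key=lambda b: len(remaining[b] - known))
        else:
            # Highest coverage: maximize the known fraction of the book.
            nxt = max(remaining,
                      key=lambda b: len(remaining[b] & known) / len(remaining[b]))
        order.append(nxt)
        known |= remaining.pop(nxt)
    return order
```

Both variants run in O(n²) set operations over the books, which is trivial for a corpus of 27 or 39 books.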
The Greek NT reading order was originally inspired by *Greek for Life* by Jonathan T. Pennington, which recommends reading the NT in a specific order to build vocabulary naturally.
The computational approach extends this idea by using actual corpus data to optimize the sequence algorithmically.
This project is licensed under CC BY 4.0. You are free to use, share, and adapt the data with attribution.
Source code: GitHub