Friday, January 26, 2018

A Great API for the Good Book

I'm working on a new project at shul that I think would greatly benefit from just the right spreadsheet. In my mind's eye, this ideal document contains the specific weekly Torah reading breakdown as well as how many verses are in each aliyah. Why do I need this? Unfortunately, the project isn't far enough along to share. Stay tuned.

I cringed at the thought of manually keying this information in, and imagined my only option was going to be parsing randomly structured PDF files. My hopes were immediately raised when I discovered that hebcal.com offers exactly the information I was seeking, in CSV form, no less!

Well, that was easy, I thought as I downloaded the CSV file for the Triennial cycle our shul uses. Then I manually examined the file and realized that this wasn't so simple after all. While the specification for the export files promises a verse count, the actual files are mostly missing this information. I can't complain to hebcal about this, as they're providing me with a massive amount of information free of charge.

In the easiest case, I realized I could derive verse count. For example:

27-Jan-2018,Beshalach,1,"Exodus 14:15 - 14:20"
27-Jan-2018,Beshalach,2,"Exodus 14:21 - 14:25"
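For rows like these two, the count works out to the end verse minus the start verse, plus one since both ends are read. A minimal sketch of that arithmetic, with a helper name invented purely for illustration:

```php
<?php
// Hypothetical helper (not part of hebcal or Sefaria): counts the verses in a
// reading that starts and ends in the same chapter.
// Both the start and end verses are read, hence the "+ 1".
function same_chapter_verse_count(int $start_verse, int $end_verse): int {
  return $end_verse - $start_verse + 1;
}

echo same_chapter_verse_count(15, 20), "\n"; // Exodus 14:15 - 14:20 => 6
echo same_chapter_verse_count(21, 25), "\n"; // Exodus 14:21 - 14:25 => 5
```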

If the reading starts and ends in the same chapter, all I need to do is a bit of subtraction to get the number of verses involved. However, when a reading crosses a chapter boundary, things aren't so simple. Consider this aliyah:

27-Jan-2018,Beshalach,3,"Exodus 14:26 - 15:21"

Without knowing how many verses are in Exodus, Chapter 14, I can't say how long this aliyah is. I'd normally solve this sort of problem by making use of a third-party API. That is, if I were after weather or stock information, I'd consult some resource on the web to get the answer. Out of other options, I Googled for "Torah API" and, surprisingly, there was a meaningful hit. Front and center was the API spec for sefaria.org. It looked to be exactly the information I was after. And like hebcal, the API was free to access and super easy to use.

For the record, here's how many verses there are in Exodus, Chapter 14:

$ curl -s https://www.sefaria.org/api/texts/Exodus.14 | jq '.text | length'
31
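With that number in hand, the chapter-crossing aliyah from earlier can be worked out by hand: verses 26 through 31 of chapter 14 make 6, and verses 1 through 21 of chapter 15 add 21 more, for 27 in total. Here is a rough sketch of the same calculation. The function name and the `$chapter_lengths` array are assumptions made for illustration; the array stands in for whatever the API reports, so the sketch runs without a network call.

```php
<?php
// Hypothetical helper: counts verses for a reading that may span chapters.
// $chapter_lengths maps chapter number => verse count, and must cover every
// chapter from the start chapter up to (but not including) the end chapter.
function spanning_verse_count(int $start_chapter, int $start_verse,
                              int $end_chapter, int $end_verse,
                              array $chapter_lengths): int {
  if ($start_chapter === $end_chapter) {
    return $end_verse - $start_verse + 1;
  }
  // Remainder of the first chapter, including the starting verse.
  $verses = $chapter_lengths[$start_chapter] - $start_verse + 1;
  // Any chapters read in full between the first and the last.
  for ($c = $start_chapter + 1; $c < $end_chapter; $c++) {
    $verses += $chapter_lengths[$c];
  }
  // Verses 1 through $end_verse of the final chapter.
  return $verses + $end_verse;
}

// Exodus 14:26 - 15:21, using the 31 verses reported for chapter 14.
echo spanning_verse_count(14, 26, 15, 21, [14 => 31]), "\n"; // 6 + 21 = 27
```

Only the chapters before the final one need a lookup, since the end verse number already says how many verses of the last chapter are read.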

With Sefaria's API at my disposal, I had the pieces in place to fill in the verse count in a hebcal CSV file. Below is a command-line PHP script that does this.

I share all of this to heap credit on sites like hebcal and Sefaria for being so open and willing to share. They're amazing resources and the Jewish Community is truly blessed to have them.

<?php
/*
 * A PHP file for reading in a data file from hebcal:
 *  https://www.hebcal.com/sedrot/
 * and adding extra info, like the verse count
 */

// $argv[0] is the script name, so a usable invocation has at least two entries.
if(count($argv) < 2 || !file_exists($argv[1])) {
  echo "Usage: {$argv[0]} input.csv\n";
  exit(1);
}

function lookup_verse_count($book, $chapter) {
  $book = str_replace(' ', '_', $book);
  $ch = curl_init("https://www.sefaria.org/api/texts/$book.$chapter");
  curl_setopt($ch, CURLOPT_TIMEOUT, 4);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  $info = curl_exec($ch);
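  if($info === false) {
    echo "Failed to lookup verse count for $book $chapter: " . curl_error($ch) . "\n";
    var_dump(curl_getinfo($ch));
    exit(1);
  }
  $details = json_decode($info, true);
  if(!is_array($details) || !isset($details['text']) || !is_array($details['text'])) {
    echo "Unexpected response for $book $chapter\n";
    exit(1);
  }
  // 'text' holds one entry per verse, so its length is the verse count
  return count($details['text']);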
}

$input_fd = fopen($argv[1], "r");
$output_fd = fopen("php://stdout", "w");

while($row = fgetcsv($input_fd)) {
  if(count($row) == 5 && $row[4] == '' && 
     preg_match('/([A-Z].*?) ([0-9]+):([0-9]+) - ([0-9]+):([0-9]+)/', $row[3], $matches)) {
    $book          = $matches[1];
    $start_chapter = $matches[2];
    $start_verse   = $matches[3];
    $end_chapter   = $matches[4];
    $end_verse     = $matches[5];

    if($start_chapter == $end_chapter) {
      $row[4] = $end_verse - $start_verse + 1;
    } else if($start_chapter < $end_chapter) {
      $verses = lookup_verse_count($book, $start_chapter) - $start_verse + 1;
      for($c = $start_chapter + 1; $c < $end_chapter; $c++) {
        $verses += lookup_verse_count($book, $c);
      }
      $row[4] = $verses + $end_verse;
    } else {
      echo "Don't know how to compute verse count\n";
      var_dump($matches);
      die();
    }

  }

  fputcsv($output_fd, $row);
}
?>
