Monday, February 16, 2015

Gotcha of the Day: Google Docs API Suddenly Starts Returning 401 Unauthorized

Some time back I created a tool that uses the Zend Gdata Spreadsheets API to pull data in from various spreadsheets and store it in a system database. The code has been rock solid. That is, until today, when attempts to access one particular Google doc resulted in this error:

  HTTP 401 Unauthorized

Yeah, that wasn't a heck of a lot to go on. Not only that, but nothing obvious had changed: the document in question was still shared with the user trying to access the doc, and the user credentials hadn't been updated. So why the lack of authorization?

My first thought was that it had something to do with the automatic migration Google promised for older docs. Perhaps this one particular document had been migrated, and in doing so, the spreadsheet key had been altered. If that was the case, the existing code may very well be expected to fail (as it looks up the spreadsheet by this unique ID).

At first I thought I was on the right track: the Google Doc in question did have an updated URL. But alas, putting in the new spreadsheet key made no difference. The system still got an unauthorized message when it tried to access the document.

Next up, I tried using a different set of user credentials. Again, no change.

Then I started picking apart the Zend API calls. I'm essentially executing this code:

    $service = Zend_Gdata_Spreadsheets::AUTH_SERVICE_NAME;
    $client = Zend_Gdata_ClientLogin::getHttpClient($user, $pass, $service);
    $spreadsheetService = new Zend_Gdata_Spreadsheets($client);

    $query = new Zend_Gdata_Spreadsheets_DocumentQuery();
    $query->setSpreadsheetKey($spreadsheetKey);
    $feed = $spreadsheetService->getWorksheetFeed($query);  // Failing here

I updated the above code to print out just a list of possible spreadsheets. The code was something along these lines:

    $service = Zend_Gdata_Spreadsheets::AUTH_SERVICE_NAME;
    $client = Zend_Gdata_ClientLogin::getHttpClient($user, $pass, $service);
    $spreadsheetService = new Zend_Gdata_Spreadsheets($client);
    $feed = $spreadsheetService->getSpreadsheetFeed();
    var_dump($feed);

I finally caught a break here and was pleased to see that the spreadsheet in question was indeed found in the list =Google was returning to me.

So this proves that the user can see the spreadsheet, yet he still can't access it.

After spending more time digging around the Zend code I finally took an altogether different tact.

The Google Spreadsheet API, at least accessing the list of worksheets, is actually quite simple. You can cobble together client to accomplish this using little more than curl. So that's what I did. I followed this guide and manually executed the URL requests to authorize my user and access the spreadsheet in question. Again, to my surprise, it worked.

So what was I doing manually with curl that the Zend API wasn't doing? Hmmm...looking closely I realized that the Zend request to pick up the spreadsheet was using http. the curl example was, you guessed it, using https. I opened up Zend/Gdata/App.php and scrolled down to around line 640 and added this line of code:

    public function performHttpRequest($method, $url, $headers = null,
        $body = null, $contentType = null, $remainingRedirects = null)
    {
        ...

        // Make sure the HTTP client object is 'clean' before making a request
        // In addition to standard headers to reset via resetParameters(),
        // also reset the Slug header
        $this->_httpClient->resetParameters();
        $this->_httpClient->setHeaders('Slug', null);

        $url = str_replace('http', 'https', $url);  // [ADDED]

        // Set the params for the new request to be performed
        $this->_httpClient->setHeaders($headers);
        $this->_httpClient->setUri($url);
        $this->_httpClient->setConfig(array('maxredirects' => 0));

That line of code blindly re-writes the URL from 'http' to 'https' -- and what do you know, that solved the problem!

I can't really blame this on Zend. I'm probably using version 1.x of their API, and they have version 2.3.

My guess is that it was the spreadsheet upgrade that ended up biting me. On the surface the change Google made was pretty tiny: you must now access Google Docs over HTTPs (versues having HTTP and HTTPs both work). That change, however, was enough to throw a monkey wrench into my code.

In hindsight, requiring docs be accessed over HTTPs makes perfect sense. Though I would have appreciated a clearer error message when I violated this requirement. With that said, it sure was nice to re-familiarize myself with the Google Docs API and how it can be accessed from the command line. Who knows, maybe I'll find a way to integrate this in with my other Linux hacking projects.

No comments:

Post a Comment