Preflight CORS check in PHP

I was reading up on CORS today; apparently my previous understanding of it was flawed.

I found a worthwhile article by Remy, and also found a problem in the PHP code it offered. This was server-side code shown to illustrate how to handle a CORS preflight request.

The “preflight” is an HTTP OPTIONS request that the user-agent makes in some cases, to check that the server is prepared to serve a cross-origin request from XMLHttpRequest. The preflight request carries the special Origin HTTP header.
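
For illustration, a preflight for a cross-origin GET might look like this (the hostnames are hypothetical):

OPTIONS /data/feed HTTP/1.1
Host: api.example.com
Origin: http://webapp.example.org
Access-Control-Request-Method: GET
Access-Control-Request-Headers: X-Requested-With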

His suggested code to handle the preflight was:

// respond to preflights
if ($_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
  // return only the headers and not the content
  // only allow CORS if we're doing a GET - i.e. no saving for now.
  if (isset($_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD']) &&
      $_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD'] == 'GET') {
    header('Access-Control-Allow-Origin: *');
    header('Access-Control-Allow-Headers: X-Requested-With');
  }
  exit;
}

But according to my reading of the CORS spec, the Access-Control-* headers should not be included in a response if the request does not include the Origin header.

See section 6.2 of the CORS doc.

The corrected code is something like this:

// respond to preflights
if ($_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
  // return only the headers and not the content
  // only allow CORS if we're doing a GET - i.e. no saving for now.
  if (isset($_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD']) &&
      $_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD'] == 'GET' &&
      isset($_SERVER['HTTP_ORIGIN']) &&
      is_approved($_SERVER['HTTP_ORIGIN'])) {
    header('Access-Control-Allow-Origin: *');
    header('Access-Control-Allow-Headers: X-Requested-With');
  }
  exit;
}

Implementing the is_approved() method is left as an exercise for the reader!
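
That said, a minimal sketch of is_approved() might be a simple whitelist check; the origins listed here are of course hypothetical:

function is_approved($origin) {
  // hypothetical whitelist of origins allowed to make CORS requests
  $whitelist = array('http://example.com', 'https://app.example.org');
  return in_array($origin, $whitelist, true);
}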

A more general approach is to do as this article on HTML5 security suggests: perform a lookup in a table keyed on the value passed in the Origin header. The lookup can be generalized so that the response carries different Access-Control-* headers for different origins, and for different resources. That might look like this:

// respond to preflights
if ($_SERVER['REQUEST_METHOD'] == 'OPTIONS') {
  // return only the headers and not the content
  // respond only to preflights from approved origins
  if (isset($_SERVER['HTTP_ACCESS_CONTROL_REQUEST_METHOD']) &&
      isset($_SERVER['HTTP_ORIGIN']) &&
      is_approved($_SERVER['HTTP_ORIGIN'])) {
    $allowedOrigin = $_SERVER['HTTP_ORIGIN'];
    $allowedHeaders = get_allowed_headers($allowedOrigin);
    header('Access-Control-Allow-Methods: GET, POST, OPTIONS'); //...
    header('Access-Control-Allow-Origin: ' . $allowedOrigin);
    header('Access-Control-Allow-Headers: ' . $allowedHeaders);
    header('Access-Control-Max-Age: 3600');
  }
  exit;
}
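
And a sketch of get_allowed_headers(), with a hypothetical per-origin table:

function get_allowed_headers($origin) {
  // hypothetical table: which request headers each origin may send
  $table = array(
    'http://example.com'      => 'X-Requested-With',
    'https://app.example.org' => 'X-Requested-With, Authorization',
  );
  return isset($table[$origin]) ? $table[$origin] : 'X-Requested-With';
}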


APIs Win Because They’re Easy

I went out to Jimmy John’s the other night to get a sandwich with a good buddy of mine. I ordered a standard ham-and-cheese sandwich, and my buddy got an Italian hoagie with banana peppers. They were quite tasty! I guess we were in the store, conducting the entire transaction, for less than 5 minutes. Pretty fast. And easy.

Easy wins. People will always choose the easier option. People are willing to pay more, or receive less, in order to get something easy. “Fast food” is cheap, but it isn’t the cheapest. Often it’s even less expensive to buy ingredients at the grocery, and prepare a meal.

We paid about $15 (USD) for 2 sandwiches. (We didn’t order drinks because soda is toxic sludge and doesn’t belong in the human diet.) Now, I could have gone to the grocery across the parking lot, purchased a pound of ham, a loaf of bread, and some lettuce and tomato for about $20, and had plenty of ingredients to make 5 sandwiches or more. Instead, we paid for the convenience, and we were happy to do it.

But fast food does not always deliver the best nutrition. It’s easy, and it might even be “tasty”, but sometimes it’s not good. (And yes, I know that the film “Super Size Me” was criticized for being less than 100% truthful, and less than scientific in its analysis.) In a free economy, for every potential purchase, the consumer decides whether the combination of ease, quality, and price of the thing on offer represents a good deal. Generally, “easy” is worth good money.

And that makes sense. People have things to do, and for some people, preparing food is not high on the priority list. Likewise, people will pay $50 to get the oil changed in their car, even though they could do it themselves with about $28 worth of supplies. People pay for convenience when maintaining their cars.

And the same is true when managing and operating information systems. People will often sacrifice quality in order to gain some ease. Or they may pay more for greater ease. That’s often the right choice.

The adoption of Windows Server in the enterprise, starting in 1996 or so, was a perfect example of that. People had so-called “Enterprise Grade” Unix-based options available. But Windows was simple to deploy, and easy to configure for basic file sharing, printer sharing, and database access. Who can argue with those choices? Sure, there were drawbacks, but even so, the preference for easier solutions continues.

As another example, these days, for interconnecting disparate information systems, REST APIs are much, much more popular than using SOAP. There are numerous indicators of this trend, but one of them is the Google Search Insights data:

(Can you guess which line in the above is “REST” and which represents the search volume on “SOAP”?) The popularity of REST may seem curious, to law-and-order IT professionals. SOAP provides a means to explicitly state the contract for interconnecting systems, in the WSDL document. And WSDL provides a way to rigorously specify the type of the data being sent and received, including the shape of the XML data, the XML namespaces, and so on. SOAP also provides a way to sign documents, as a way to provide non-repudiation. In the REST API model, none of that is available. You’d have to do it all yourself.

But apparently, people don’t want all that capability. Or better, they may want it, but they want ease of development more. When given the choice to opt for greater capability and more strictures (SOAP), versus a simpler, easier way to connect (REST APIs), people have been choosing the easy path, and the trend is strengthening.

APIs are winning because they’re easy.

How not to do APIs; and …My New Job

Having access to a quick way to get dictionary lookups of English words while using the computer has always been useful to me. I don’t like to pay the “garish dancing images” ad tax that comes with using commercial dictionary sites like merriam-webster.com; or worse, the page-load-takes-10-seconds tax. I respect Merriam-Webster and appreciate all they’ve done for the English language over the years, but that doesn’t mean I am willing to put up with visual torture today. No, in fact I do not want to download a free e-book, and I do not want to sign up for online courses at Capella University. I just want a definition.

Where’s the API?

I’m a closet hacker, so I figured there had to be an API out there that allowed me to build something that would do dictionary lookups from … whatever application I was currently using.

And looking back, this is an ongoing obsession of mine, apparently. In my job with Microsoft about 10 years ago, I spent a good deal of my time writing documents using Microsoft Word. I had the same itch then, and for whatever reason I was not satisfied with the built-in dictionary. So I built an Office Research Plugin, using the Office Research Services SDK (bet you never heard of that), that would screen-scrape the Merriam-Webster website and allow MS-Word to display definitions in a “smart pane”. The guts of how it worked were ugly, and brittle: every time M-W changed its page layout, the service would break. But it worked, and it displayed nicely in Word and other MS-Office programs.

The nice thing was that Microsoft published an interface that any research service could comply with, allowing the service to “plug in” to the Office UI. The idea behind Office Research Services was similar to the idea behind Java Portlets, or Sharepoint WebParts, or even Vista Gadgets, though it was realized differently than any of those. In fact the Office Research Services model was very weird, in that it was a SOAP service, yet the payload was an encoded XML string. It wasn’t even xsd:any. The Office team failed on the data model there.

It was also somewhat novel in that the client, in this case the MS-Office program, specified the interface over which the client and service would interact. Typically the service is the anchor point of development, and the developer of the service therefore has the prerogative to specify the service interface. For SOAP this specification was provided formally in WSDL; these days the way to connect to a REST service can be specified in WADL or Plain-old-Documentation. The Office Research Service was different in that the client specified the interface that research services needed to comply with. To me, this made perfect sense: rather than one service and many types of clients connecting to it, the Office Research Service model called for one client (Office) connecting to myriad different research services. It was logically sensible that the client got to specify the interface. But I had some discussions with people who just could not accept that. I also posted a technical description showing how to build a service that complied with the contract specified by Office.

The basic model worked well enough, once the surprise of the-client-wears-the-pants-around-here wore off. But the encoded XML string was a sort of an ugly part of the design. Also, like the Portlet model, it mixed the idea of a service with UI; the service would actually return UI formatting instructions. This made sense for the specific case of an Office service, but it was wrong in the general sense. Also, there was no directory of services, no way to discover services, to play with them or try them out. Registration was done by the service itself. There are some lessons here in “How not to do APIs”.

Fast Forward 9 Years

We’ve come a long way.  Or have we?  Getting back to the dictionary service, Wordnik has a nice dictionary API, and it’s free! For reasonable loads, anyway, or until the good people at Wordnik change their minds, I guess.

It seemed that it would be easy enough to use. The API is clear; they provide a developer portal with examples of how to use it, as well as a working try-it-out console where you can specify the URLs to tickle and see the return messages. All very good for development.

But… there are some glaring problems. The Wordnik service requires registration, at which point the developer receives an API key. That in itself is standard practice. But, confoundingly, I could not find a single document describing how to use this API key in a request, nor a single programming example showing how to use it. I couldn’t even find a statement explicitly stating that authentication was required. While the use of an API key is pretty standard, the way to pass the API key is not: Google does it one way, other services do it another. The simplest way to document the authentication model is to provide a short paragraph and show a few examples. Wordnik didn’t do this. Doesn’t do this. I learned how to use the Wordnik service by googling and finding hints from 2009 on someone else’s site. That should not be acceptable for any organization offering a programmable service.
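
For the record, what I eventually pieced together is that you pass the key as a query parameter, something like the request below. Treat this as a best guess assembled from third-party hints, not official documentation:

GET /v4/word.json/persiflage/definitions?limit=3&api_key=YOUR_API_KEY HTTP/1.1
Host: api.wordnik.com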

In the end I built the thing – it’s an emacs plugin if you must know, and it uses url.el and json.el to connect to Wordnik and display the definition of the word at point. But it took much longer, and involved much more frustration, than was necessary. That is not an experience an API provider should want any developer to have.


The benefits of interconnecting disparate piece-parts have long been tantalizingly obvious. You could point to v0.9 of the SOAP spec in 1999 as a key point in that particular history, but IBM made a good business of selling integration middleware (MQ, they called it) for at least a decade before that. Even so, only in the past few years have we seen the state of the art develop to enable this on an internet scale. We have cheap, easy-to-use server frameworks; elastic hosting infrastructure; XML and JSON data formats; agreement on pricing models; an explosion in mobile clients.

Technologies and business needs are aligning, and this opens up new opportunities for integration. To take advantage of these opportunities, companies need the right tools, architectural guidance, and solid add-on building blocks.

I’ve recently joined Apigee, the API Company, to help companies exploit those opportunities.  I’ll say more about my job in the future. Right now, I am very excited to be in this place.


PHP and JSON and Unicode, Oh my!

Today I had a bit of a puzzle.  The gplus widget that I wrote for WordPress was rendering Google+ activity with a ragged left edge.

[Image: the Google+ widget, with the ragged left edge shown for *some* activities]

As geeks know, HTML by default collapses whitespace, so for the ragged edge to appear there, a “hard whitespace” must have crept in somewhere along the line. For example, HTML has entities like &nbsp; that will do this. A quick look in the Chrome developer tool confirmed it.

I figured this was a simple fix:

$content = str_replace("&nbsp;"," ", $content);

Simple, right?

But that didn’t work.

Hmmm…. Next I added in some obvious variations on that theme:

$content = str_replace("&nbsp;"," ", $content);
$content = str_replace("\n"," ", $content);
$content = str_replace("\r"," ", $content);

That also did not work.

This data was simple json, coming from Google+, via their REST API. (But not directly. I built caching into the gplus widget so that it doesn’t hit the Google server every time it renders itself. It reads data from the cache file if it’s fresh, and only connects to Google+ if necessary). A call to json_decode() turns that json into a PHP object, and badda-boom. Right?

Turns out, no. The json had the byte sequence 0xC2 0xA0, which is the UTF-8 encoding of U+00A0, the non-breaking space, if you speak Unicode. Not sure how that got into the datastream; I certainly did not put it in there myself, explicitly. And when WordPress rendered the widget, that sequence somehow, somewhere got translated into &nbsp;, but only after the gplus widget code was finished.

I needed to replace the unicode sequence with a regular whitespace. This did the trick.

$content = str_replace("\xc2\xa0"," ", $content);
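
An equivalent, arguably clearer, fix is a regex with the /u modifier, which lets you name the codepoint instead of spelling out its UTF-8 bytes:

// replace U+00A0 (no-break space) with a regular space
$content = preg_replace('/\x{00A0}/u', ' ', $content);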

PHP, JSON, HTTP, HTML – these are all minor miracle technologies. Sometimes when gluing them all together you don’t get the results you expect. In those cases I’m glad to have a set of tools at my disposal, so I can match impedances.

Twitter and OAuth from C# and .NET

In some of my previous posts I discussed Twitter and OAuth. The reason I sniffed the protocol enough to describe it here was that I wanted to build an application that posted Tweets.  Actually, I wanted to go one better – I wanted an app that would post tweets and images, via Twitter’s /1/statuses/update_with_media.xml API call. This was for a screen-capture tool.  I wanted to be able to capture an image on the screen and “Tweet it” automatically, without opening a web browser, without “logging in”, without ever seeing Twitter’s little birdy logo.  In other words I wanted the screen capture application to connect programmatically with Twitter.

It shouldn’t be difficult to do, I figured, but when I started looking into it in August 2011, there were no libraries supporting Twitter’s update_with_media.xml, and the documentation from Twitter was skimpy.  This is just par for the course, I think. In this fast-paced social-media world, systems don’t wait for standards or documented protocols.  Systems get deployed, and the actual implementation is the documentation.  We’re in a hurry here!  Have at it, is the attitude, and good luck!  Don’t think I’m complaining – Facebook’s acquisition of Instagram provides one billion reasons why things should work this way.

With that as the backdrop I set about investigating. I found some existing code bases that were, how shall I put it? A little crufty.  There’s a project on Google’s code repository that held value, but I didn’t like the programming model. There are other options out there, but nothing definitive. So I started with the Google code and reworked it to provide a simpler, higher-level programming interface for the app.

What I wanted was a programming model something like this:

var request = (HttpWebRequest)WebRequest.Create(url);
var authzHeader = oauth.GenerateAuthzHeader(url, "POST");
request.Method = "POST";
request.Headers.Add("Authorization", authzHeader);
using (var response = (HttpWebResponse)request.GetResponse())
{
  if (response.StatusCode != HttpStatusCode.OK)
    MessageBox.Show("There's been a problem trying to tweet:", "!!!");
}

In other words – using OAuth adds one API call.  Just generate the header for me!  But what I found was nothing like that.

So I wrote OAuth.cs to solve that issue.  It implements the OAuth 1.0a protocol for Twitter for .NET applications, and allows desktop .NET applications to connect with Twitter very simply.
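
Setup looks something like the following sketch. The property names here illustrate my library’s indexer-style interface; don’t treat them as gospel for any other OAuth library:

// one-time setup: the four strings that OAuth 1.0a needs
var oauth = new OAuth.Manager();
oauth["consumer_key"]    = APP_CONSUMER_KEY;     // from Twitter's dev dashboard
oauth["consumer_secret"] = APP_CONSUMER_SECRET;
oauth["token"]           = USER_ACCESS_TOKEN;    // obtained via the approval dance
oauth["token_secret"]    = USER_ACCESS_TOKEN_SECRET;

// ...then, for each outbound request, one call generates the header:
var authzHeader = oauth.GenerateAuthzHeader(url, "POST");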

That code is available via a TweetIt app that I wrote to demonstrate the library.  It includes logic to run a web browser and programmatically do the necessary one-time copy/paste of the approval PIN.  The same OAuth code is used in that screen-capture tool.

OAuth is sort of involved, with all the requirements around sorting and encoding and signing and encoding again.  There are multiple steps to the protocol, lots of artifacts like temporary tokens, access tokens, authorization verifiers, and so on.  But a programmer’s interface to the protocol need not be so complicated.


Twitter’s use of OAuth, part 3

In the prior post, I described the message flows required for granting a client application write access to a user’s Twitter status. In this post, I’ll cover the additional use case, posting a status message.

Use Case 2: Approved Client posts a status

The output of a successful authorization of a client application is an oauth_token and an oauth_token_secret. Twitter does not “expire” these items and so the client can re-use them forever.

Typically, clients (applications) will store the oauth_token and oauth_token_secret in a settings file. A .NET application on Windows might put the file in %AppData%, in other words, \users\Someone\AppData\Roaming\Vendor\Appname\Settings.xml.

To post status updates to a user’s Twitter stream, the client application can simply POST to /1/statuses/update.xml or /1/statuses/update.json, including an Authorization header constructed as described previously. In fact there is just one request+response for this use case. This header is formed in exactly the same way as described in the first use-case, except that there are now different required parameters. For regular status updates, the client needs to construct the signature base using:

  • all the oauth_ parameters (except the “secret” ones). This will include the oauth_token obtained from the approval dance, but not the oauth_verifier, which is a one-time thing, or the oauth_callback which is used only when requesting client approval.
  • the complete status parameter, URL-encoded, which must be inserted into the base in observance of lexicographic sorting order.

Again, the client must normalize the parameters – sort by name, encode the values, and concatenate together with ampersands – and then sign the UTF-8 encoded payload. The added twist here is that the payload to be signed must include named query string parameters. In the case of a status update to Twitter, there is just one query string parameter, ‘status’. That parameter must be included in the signature base, with the value OAuth-percent-encoded as described previously. The result must be inserted into the base in observance of the lexicographic sorting requirement.

The resulting signature base looks something like this:

POST&http%3A%2F%2Fapi.twitter.com%2F1%2Fstatuses%2Fupdate.xml&
    oauth_consumer_key%3DFXJ0DIH50S7ZpXD5HXlalQ%26
    oauth_nonce%3D1j91hk2m%26oauth_signature_method%3DHMAC-SHA1%26
    oauth_timestamp%3D1336764858%26
    oauth_token%3D59152613-HBwngBPTBUIrl7qKJXvBKaM0Ko1FpA2n5Hfmdh4uc%26
    oauth_version%3D1.0%26
    status%3DTwas%2520brillig%252C%2520and%2520the...

Notice that the actual status string is double-URL-encoded: you URL-encode it once when appending it to the sorted list of parameters, and then that sorted list gets URL-encoded again before being concatenated with the method and URL, separated by ampersands. The result is the signature base you see here. The signature base is then UTF-8 encoded and signed, using the key described in the previous post. This is the kind of thing that is hard to discern in the scattered Twitter documentation. A simple example like this makes everything clearer, though not exactly clear.
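
To make that concrete, here is a sketch in C# of the signing arithmetic just described – not any particular library’s API, just the algorithm: normalize the parameters, build the base, HMAC-SHA1 with the concatenated secrets, base64 the result.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

static class OAuthSignature
{
  // OAuth percent-encoding: everything except unreserved chars becomes %XX
  static string Encode(string s)
  {
    const string unreserved =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    var sb = new StringBuilder();
    foreach (byte b in Encoding.UTF8.GetBytes(s))
      sb.Append(unreserved.IndexOf((char)b) >= 0
                ? ((char)b).ToString()
                : String.Format("%{0:X2}", b));
    return sb.ToString();
  }

  public static string Sign(string method, string url,
                            IDictionary<string,string> parms,
                            string consumerSecret, string tokenSecret)
  {
    // 1. normalize: encode names and values, sort, join with '&'
    var paramString = String.Join("&", parms
        .Select(kv => Encode(kv.Key) + "=" + Encode(kv.Value))
        .OrderBy(s => s, StringComparer.Ordinal));

    // 2. signature base: METHOD & encoded-URL & encoded-param-string
    var sigBase = method + "&" + Encode(url) + "&" + Encode(paramString);

    // 3. key: encoded consumer secret '&' encoded token secret
    var key = Encode(consumerSecret) + "&" + Encode(tokenSecret ?? "");

    using (var hmac = new HMACSHA1(Encoding.UTF8.GetBytes(key)))
      return Convert.ToBase64String(
          hmac.ComputeHash(Encoding.UTF8.GetBytes(sigBase)));
  }
}

For the status update, parms would carry the six oauth_ parameters plus the raw (un-encoded) status value; the double encoding visible in the signature base falls out of steps 1 and 2.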

The resulting message with the computed signature looks like this:

POST http://api.twitter.com/1/statuses/update.xml?status=Twas%20bri..  HTTP/1.1
Authorization: OAuth
    oauth_consumer_key="xxxxxxxxxx",
    oauth_nonce="1j91hk2m",
    oauth_signature="YyyTkVQyecxqeGN%2BlgdhsZNkkJw%3D",
    oauth_signature_method="HMAC-SHA1",
    oauth_timestamp="1336764858",
    oauth_token="59152613-PNjdWlAPngxOVa4dCWnLMIzZcUXKmwMoQpQWTkbvD",
    oauth_version="1.0"
Host: api.twitter.com

(That Authorization header is all on one line). The response message, if you are getting XML, looks like this:

<status>
  <created_at>Fri May 11 19:34:12 +0000 2012</created_at>
  <id>201032478964711427</id>
  <text>Twas brillig, and the slithy toves...</text>
  <source>Dino's TweetIt</source>
  <truncated>false</truncated>
  <favorited>false</favorited>
  <in_reply_to_status_id></in_reply_to_status_id>
  <in_reply_to_user_id></in_reply_to_user_id>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <retweet_count>0</retweet_count>
  <retweeted>false</retweeted>
  <user>
    <id>59152613</id>
    <name>dino chiesa</name>
    <screen_name>dpchiesa</screen_name>
    <location>Seattle</location>
    <description></description>
    <url>http://dinochiesa.net</url>
    <protected>false</protected>
    <followers_count>0</followers_count>
    <profile_background_color>C0DEED</profile_background_color>
    <profile_text_color>333333</profile_text_color>
    <profile_link_color>0084B4</profile_link_color>
    <profile_sidebar_fill_color>DDEEF6</profile_sidebar_fill_color>
    <profile_sidebar_border_color>C0DEED</profile_sidebar_border_color>
    <friends_count>13</friends_count>
    <created_at>Wed Jul 22 15:20:26 +0000 2009</created_at>
    <favourites_count>0</favourites_count>
    <utc_offset>-28800</utc_offset>
    <time_zone>Pacific Time (US &amp; Canada)</time_zone>
    <profile_background_tile>false</profile_background_tile>
    <profile_use_background_image>true</profile_use_background_image>
    <notifications>false</notifications>
    <geo_enabled>false</geo_enabled>
    <verified>false</verified>
    <following>false</following>
    <statuses_count>22</statuses_count>
    <lang>en</lang>
    <contributors_enabled>false</contributors_enabled>
    <follow_request_sent>false</follow_request_sent>
    <listed_count>0</listed_count>
    <show_all_inline_media>false</show_all_inline_media>
    <default_profile>true</default_profile>
    <default_profile_image>true</default_profile_image>
    <is_translator>false</is_translator>
  </user>
  <geo/>
  <coordinates/>
  <place/>
  <contributors/>
</status>

If the client uses the .json suffix on the update URL, then it would get a JSON-formatted message containing the same information.

There may arise a situation in which the user has already approved the client application, but the client does not have access to the oauth_token and oauth_token_secret, and is therefore unable to properly construct the signature for the Authorization header. This might happen when a user gets a new PC or device, or tries to use the client application from a different PC, for example. In that case the client should simply request authorization again, as in Use Case #1, described in the previous post. If this happens, the /oauth/authenticate endpoint can automatically redirect back to your callback without requiring any user interaction or even showing any Twitter UI to the user.

Twitter’s use of OAuth, part 2

Last time, I covered the basics around Twitter’s use of OAuth.  Now I will look at the protocol, or message flows. Remember, the basic need is to allow a user (resource owner) employing an application (what OAuth calls the client) to post a status message to Twitter (what OAuth calls the server).

There are two use cases: approve the app, and post a status message. Each use case has an associated message flow.

Use Case 1: Initial client (application) approval

The first time a user tries to employ a client to post a status update to Twitter, the user must explicitly grant access to allow the client to act on the user’s behalf with Twitter. The first step is to request a temporary token, what Twitter calls a “request token”, via a POST to https://api.twitter.com/oauth/request_token like this:

POST https://api.twitter.com/oauth/request_token HTTP/1.1
Authorization: OAuth oauth_callback="oob",
      oauth_consumer_key="XXXXXXXXXXXXXX",
      oauth_nonce="0cv1i19r",
      oauth_signature="mIggVeqh58bsJAMyLsTrgeNq0qo%3D",
      oauth_signature_method="HMAC-SHA1",
      oauth_timestamp="1336759491",
      oauth_version="1.0"
Host: api.twitter.com
Connection: Keep-Alive

In the above message, the Authorization header appears all on one line, with each oauth_ parameter separated from the next by a comma and a space. It is shown with broken lines only to allow easier reading. That header needs to be specially constructed. According to OAuth, the signature is an HMAC, using SHA1, over a normalized string representation of the relevant pieces of the payload. For this request, that normalized string must include:

  1. the HTTP Method
  2. a normalized URL
  3. a URL-encoded form of the oauth_ parameters (except the “consumer secret” – think of that as a key or password), sorted lexicographically and concatenated with ampersand.

Those three elements are concatenated together with ampersands. Think of it like this:

METHOD&URL&PARAMSTRING

Where METHOD is POST or GET (etc.), the URL is the base URL (any query parameters are not included here – they get folded into the parameter string instead), and the PARAMSTRING is a blob of stuff, which itself is the URL-encoded form of something like this:

oauth_callback=oob&oauth_consumer_key=xxxxxxxxxxxxxxx&
    oauth_nonce=0cv1i19r&oauth_signature_method=HMAC-SHA1&
    oauth_timestamp=1336759491&oauth_version=1.0

Once again, that is all one contiguous string – the newlines are there just for visual clarity.  Note the lack of double-quotes here. 

The final result, those three elements concatenated together (remember to url-encode the param string), is known as the signature base.

Now, about the values for those parameters:

  • The callback value of ‘oob’ should be used for desktop or mobile applications. A web app – let’s say a PHP-driven website that will post tweets on the user’s behalf – needs to specify a web URL for the callback instead. This is a web endpoint that Twitter will redirect to, after the user confirms their authorization decision (either denial or approval).
  • The timestamp is simply the number of seconds since January 1, 1970. You can get this from PHP’s time() function, for example. In .NET it’s a little trickier; see the snippet after this list.
  • The nonce needs to be a unique string per timestamp, which means you could probably use any monotonically increasing number, or even a random number. Some people use UUIDs, but that’s probably overkill.
  • The consumer key is something you get from the Twitter developer dashboard when you register your application.
  • The other pieces (signature method, version), for Twitter anyway, are fixed.
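
In C#, for example, the timestamp and nonce might be generated like this (a sketch; any scheme that satisfies the requirements above will do):

// seconds since Jan 1, 1970 UTC
var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
string timestamp = ((long)(DateTime.UtcNow - epoch).TotalSeconds).ToString();

// a nonce: any string unique per timestamp will do
string nonce = new Random().Next(0x10000000, 0x7fffffff).ToString("X8");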

The parameter string must then be URL-encoded according to a special approach defined just for OAuth. OAuth specifies percentage hex-encoding (%XX), and rather than defining which characters must be encoded, OAuth declares which characters need no encoding: alpha, digits, dash, dot, underscore, and twiddle (aka tilde). Everything else gets encoded.

The encoded result is then concatenated with the other two elements, and the resulting “signature base” is something like this.

POST&https%3A%2F%2Fapi.twitter.com%2Foauth%2Frequest_token&
    oauth_callback%3Doob%26oauth_consumer_key%3DXXXXXXXXXX%26
    oauth_nonce%3D0cv1i19r%26oauth_signature_method%3DHMAC-SHA1%26
    oauth_timestamp%3D1336759491%26oauth_version%3D1.0

The client app must then compute the HMAC-SHA1 of this signature base. The key is the concatenation of the URL-encoded consumer secret and the URL-encoded token secret, separated by an ampersand. In this first message there is no token secret yet, therefore the key is simply the consumer secret with an ampersand appended. Get the key bytes by UTF-8 encoding that string, sign the UTF-8 encoding of the signature base with that key, then base64-encode the resulting byte array to get the string value of oauth_signature. Whew! The client needs to embed this signature into the Authorization header of the HTTP request, as shown above.

If the signature and timestamp check out, the server responds with a temporary request token and secret, like this:

HTTP/1.1 200 OK
Date: Fri, 11 May 2012 18:04:46 GMT
Status: 200 OK
Pragma: no-cache
...
Last-Modified: Fri, 11 May 2012 18:04:46 GMT
Content-Length: 147

oauth_token=vXwB2yCZYFAPlr4RcUcjzfIVW6F0b8lPVsAbMe7x8e4
    &oauth_token_secret=T81nWXuM5tUHKm8Kz7Pin8x8k70i2aThfWVuWcyXi0
    &oauth_callback_confirmed=true

Here again, the message is shown on multiple lines, but in the response it is actually all on one line.  This response contains what OAuth calls a temporary token, and what Twitter has called a request token. With this, the client must direct the user to grant authorization.  For Twitter, the user does that by opening a web page pointing to https://api.twitter.com/oauth/authorize?oauth_token=xxxx, inserting the proper value of oauth_token.

This pops a web page that looks like so:

[Image: Twitter’s page asking the user to authorize the application]
When the user clicks the “approve” button, the web page submits the form to https://api.twitter.com/oauth/authorize, Twitter responds with a web page, formatted in HTML, containing an approval PIN. The normal flow is for the web browser to display that form.  The user can then copy that PIN from the browser, and paste it into the client application, to allow the client to complete the message exchange with Twitter required for approval.   If the client itself is controlling the web browser, as with a Windows Forms WebBrowser control, then the client app can extract the PIN programmatically without requiring any explicit cut-n-paste step by the user.

The client then sends another message to Twitter to complete the authorization. Here again, the Authorization header is formatted specially, as described above. In this case the header must include oauth_token and oauth_verifier set to the token and the PIN received in the previous exchange, but it need not include the oauth_callback. Once again, the client must prepare the signature base and sign it, and then embed the resulting signature into the Authorization header. It looks like this:

POST https://api.twitter.com/oauth/access_token HTTP/1.1
Authorization: OAuth
    oauth_consumer_key="xxxxxxxxx",
    oauth_nonce="1ovs611q",
    oauth_signature="24iNJYm4mD4FIyZCM8amT8GPkrg%3D",
    oauth_signature_method="HMAC-SHA1",
    oauth_timestamp="1336764022",
    oauth_token="RtYhup118f01CWCMgbyPvwoOcCHk0fXPeXqlPVRZzM",
    oauth_verifier="0167809",
    oauth_version="1.0"
Host: api.twitter.com

(In an actual message, that Authorization header will be all on one line). The server responds with a regular HTTP response, with message (payload) like this:

oauth_token=59152613-PNjdWlAPngxOVa4dCWnLMIzZcUXKmwMoQpQWTkbvD&
    oauth_token_secret=UV1oFX1wb7lehcH6p6bWbm3N1RcxNUVs3OEGVts4&
    user_id=59152613&
    screen_name=dpchiesa

(All on one line). The oauth_token and oauth_token_secret shown in this response look very similar to the data with the same names received in the prior response. But these are access tokens, not temporary tokens.  The client application now has a token and token secret that it can use for OAuth-protected transactions.

All this is necessary only the first time a user employs an application, before it has been authorized.

In the next post I’ll describe the message flows that use this token and secret to post a status update to Twitter.

Twitter’s use of OAuth, part 1

Sending messages to Twitter from any application is straightforward, if you follow the protocol.  The problem is, Twitter does not explicitly define the protocol in any one place.

If you go to the developer center on Twitter’s site, there is a section that ostensibly deals with how Twitter uses OAuth (which is defined in IETF RFC 5849), but Twitter doesn’t explain exactly how.  Instead, Twitter refers to external sources (one, two) for the definition of the general OAuth protocol, and it expends a bunch of effort explaining why apps cannot use Basic Auth.  Not helpful for an implementer. Twitter’s doc page then points developers to existing third-party OAuth libraries.  There is no documentation that I could find that explicitly describes or demonstrates what I would call the communications protocol: the message flows required to use OAuth to connect to Twitter, for the various scenarios – a web application, a “traditional” desktop app, a mobile app, and so on.

Beyond the communication protocol, there is no firm guidance from Twitter on user-interface models – what an application should look like or do – nor around the handling of the OAuth consumer key and consumer secret.

A developer might want more detailed information. For one thing, he may be developing his own Twitter-friendly library, for any number of reasons: licensing or other intellectual-property concerns, efficiency, control.  If you are developing your own application that uses Twitter, the upshot is: the implementation is the documentation. Twitter provides some reference material, but it is not complete; when you get down to it, you need to try and see. Develop, test, diagnose, see what works. Developers have to figure it out themselves.

I went through that process with a couple of applications based in C#, and here’s what I learned.  The basics:

  1. OAuth isn’t mysterious or mystical. It’s well-defined. Using the protocol with Twitter requires some exploration.
  2. OAuth defines 3 parties to the transaction: a server, a client, and a resource owner. The server is the manager of resources; in the case of Twitter, twitter is the server.  The resource owner is the user.  The client is the application that the user employs to connect to the server – for example, something that posts Tweets on behalf of a user. This can be a web application (another server), a desktop application, a mobile phone app, and so on.
  3. All OAuth 1.0 clients (and remember, client = application) have an associated “consumer key” and  “consumer secret”.  In later drafts of OAuth 1.0 these were together termed “client credentials”.  When a developer builds an app, he registers it with the OAuth provider – in this case, Twitter – and receives those two opaque strings. As part of the OAuth protocol, the application (aka client) sends the consumer key to Twitter at runtime, and signs each request using the consumer secret, in order to identify itself. If Twitter can verify the signature with the copy of the secret it holds in association with that consumer key, Twitter concludes the client to be authentic.
  4. When the client (application) identifies itself to Twitter, Twitter checks to see if the client is authorized to act on behalf of the user or “resource owner”.  If not, Twitter instructs the client to tell the user to approve the client for this interaction, via a web page interaction. In other words, Twitter tells the client, “Have the user visit this web page…” and when the client does so, Twitter asks the user, “Do you approve this app?”.  For this approval, the user (resource owner) must authenticate directly to Twitter with his username and password, and can then reply yes or no.  With an affirmative reply, Twitter marks the client (really, the application, identified by the consumer key it presents to Twitter) as authorized to act on behalf of this particular user.
  5. Now that the client is identified and authorized, it can send requests to Twitter to update the resources on behalf of the resource owner. In simple terms: the app can post tweets to the user’s status stream.
  6. If the user later wants to de-authorize the client application from acting on their behalf, they can visit Twitter.com to do so.

A Concern about Client Identity

The use of the consumer key, or client credential, as the basis for application identification means that the claimed identity of desktop applications cannot be trusted.  Here’s why: OAuth assumes that an application that holds these credentials is the registered application.  Yet most traditional desktop or mobile applications will store this information in the compiled binary, or in some external store, where it is accessible to other applications. That means the “client identity” can be easily forged, either mistakenly or on purpose: a developer just needs to grab those two items and embed them in a new app, and you’ve got a “stolen application identity” problem.  In short, anyone who uses a desktop application to connect to Twitter has the power to forge the identity of that application.  This is one of a set of concerns around Twitter’s use of OAuth that have been previously articulated.  I don’t have a good answer for this problem.

In my next post I’ll describe the OAuth 1.0a protocol flows.