Sunday, March 29, 2020

  • Harvested the carrots from the planter boxes, a few fistfuls of small knotted roots. In the future I think they're only worth planting in rows. We had bought these as seedlings from Sloat last year.
  • We still have two garlic plants growing from last summer, plus the one that appeared in the jasmine planter. None are quite large enough to harvest yet.
  • Brought the pots up from the storage unit. Transferred two tomatoes and one pepper into pots, and two tomatoes into the planter boxes.
  • Ordered two larger indoor pots from Cost Plus.
  • Tried to call Mom and Dad on the Portal, but no luck. Dad called later on the telephone. They are having internet problems, and their Portal doesn't seem to work. Two more days until retirement.
  • Took a walk with Emma past Kaiser, past Humanmade, and back home via Connecticut.
  • Played Catan on the iPad for probably three hours today. Emma worked on her crossword puzzle.
  • Drank wine with lunch and dinner, though Emma abstained. In future I will not drink if she's not drinking.

Saturday, March 28, 2020

  • Lunch of leftover Japanese takeout from Moshi Moshi last night. We have been appreciating food and drink more since the lockdown began.
  • Finished watching Charlie Parr KEXP concert with Emma, begun yesterday morning between work meetings. Too melancholy for her right now. I liked his short interview about homelessness towards the end.
  • Donated to the CDC Foundation and UCSF.
  • Finally set up recurring monthly donation to San Francisco-Marin Food Bank after intending to for months.
  • Ordered an extra large pizza from Goat Hill. Some kind of hoarding instinct. We will be eating this for a week.

Thursday, January 13, 2011

HTTP chunks and onreadystatechange

One of the features of HTTP 1.1 is "chunked transfer encoding". Rather than sending a Content-Length header followed by the entire document, it is possible to transmit the body as a series of chunks, each prefixed with its own length declaration. This lets you start sending the beginning of the document before you know how long it will be.

It also makes Comet "streaming" possible, letting you trickle down data without the overhead of a full HTTP request for each message. This depends on your browser telling you when new chunks arrive. As you might guess, this isn't supported by Internet Explorer. But all other major browsers that I've tried (Firefox, Chrome, Safari) will fire multiple XMLHttpRequest onreadystatechange events (readyState == 3) as additional parts of the document are received.

Here's MochiWeb's implementation of chunked transfer encoding, which is pretty straightforward:

%% @spec write_chunk(iodata()) -> ok
%% @doc Write a chunk of a HTTP chunked response. If Data is zero length,
%% then the chunked response will be finished.
write_chunk(Data) ->
    case Request:get(version) of
        Version when Version >= {1, 1} ->
            Length = iolist_size(Data),
            send([io_lib:format("~.16b\r\n", [Length]), Data, <<"\r\n">>]);
        _ ->
            send(Data)
    end.
For each chunk, you send the chunk's size in hexadecimal, followed by CRLF, followed by that number of bytes of data, followed by another CRLF. On the client, the web browser stitches the segments together, appending the data to responseText.
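The same framing can be sketched in a few lines of JavaScript. Note that encodeChunk is a hypothetical helper for illustration, not part of any library, and it assumes ASCII data so that string length equals byte length:

```javascript
// Frame one chunk of an HTTP/1.1 chunked response: hex size, CRLF,
// the data itself, CRLF. A zero-length chunk terminates the body.
// (Hypothetical helper; assumes ASCII so length == byte count.)
function encodeChunk(data) {
    return data.length.toString(16) + "\r\n" + data + "\r\n";
}
```

For example, encodeChunk("hello") yields "5\r\nhello\r\n", and encodeChunk("") yields the terminating "0\r\n\r\n".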

When designing Kanaloa's streaming protocol, I initially took it for granted that each chunk would have its own onreadystatechange event. This made parsing the chunks simple; in my case, I just sent down a valid JSON array in each chunk, kept track of how much responseText I'd already seen on the client, and called JSON.parse on the difference.
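That naive approach looked roughly like the following. The names here are illustrative rather than Kanaloa's actual API, and the sketch bakes in the assumption that turned out to be wrong: that each event delivers exactly one complete JSON array.

```javascript
// Sketch of the naive client-side reader: remember how much of
// responseText we've already consumed, and JSON.parse only the new part.
// (Illustrative names; assumes one whole JSON array per event.)
function makeChunkReader() {
    var seen = 0; // length of responseText already consumed
    return function (responseText) {
        var fresh = responseText.slice(seen);
        seen = responseText.length;
        return JSON.parse(fresh); // throws unless fresh is one valid JSON value
    };
}
```

On each readyState == 3 event you would call the returned function with xhr.responseText.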

The first thing I noticed was that sometimes single chunks would be split across multiple events. I theorized that this resulted from them being put into multiple TCP packets, and indeed limiting the chunk size to the typical TCP segment size seemed to fix this problem.

The next thing I noticed was that sometimes multiple chunks would be concatenated into the same event. This was also a problem: JSON.parse needs a single valid expression, and '["foo"]["bar"]' wasn't cutting it.

You can see both of these cases demonstrated here: the small chunks are often concatenated, and the large chunks are split.

I took a look at the TCP packets in Wireshark and was struck by two things. First, the small messages do in fact arrive as separate TCP packets, so in some cases the browser is stitching them together into the same event. They arrived at roughly equal intervals in the case I examined.

Secondly, in the cases where a chunk is split across multiple events, the event boundaries do correspond with the packet boundaries.

So I think we can conclude that the browser simply reads incoming packets into its responseText buffer and fires onreadystatechange for each. If your script is still running from the previous event, the new data is simply there the next time you read responseText, rather than the browser waiting to fire another event later.

This raises the question of whether we could construct a scenario where additional text gets appended to responseText without you being notified, or where it changes between multiple reads of that field by the same JavaScript thread. But I've just about had my fill of this topic for now : )

In the end I had to do what I'd been hoping to avoid from the start, and write my own logic to split the response, rather than trust the events to delineate them. The result may be the world's simplest and least featureful JSON parser, whose only job is to split a string into substrings that encode JSON arrays, which can in turn be properly deserialized. But it seems to work, and because it leaves unterminated arrays untouched, I can now also receive messages of arbitrary size that span multiple chunks.
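A minimal sketch of that splitting logic might look like this. The names are illustrative, not Kanaloa's actual code; it tracks bracket depth plus string and escape state so brackets inside string literals don't confuse it, and it leaves any trailing unterminated array as a remainder to be retried when more data arrives:

```javascript
// Split a buffer into complete top-level JSON-array substrings, leaving
// any unterminated trailing array untouched in `remainder`.
// (Illustrative sketch; assumes the stream is a sequence of JSON arrays.)
function splitJsonArrays(buffer) {
    var complete = [];
    var depth = 0, start = 0, inString = false, escaped = false;
    for (var i = 0; i < buffer.length; i++) {
        var c = buffer[i];
        if (inString) {
            if (escaped) escaped = false;        // char after a backslash
            else if (c === "\\") escaped = true; // start of an escape
            else if (c === '"') inString = false;
        } else if (c === '"') {
            inString = true;
        } else if (c === "[") {
            depth++;
        } else if (c === "]") {
            depth--;
            if (depth === 0) {
                complete.push(buffer.slice(start, i + 1));
                start = i + 1;
            }
        }
    }
    return { complete: complete, remainder: buffer.slice(start) };
}
```

So splitJsonArrays('["foo"]["bar"]') yields two complete arrays and an empty remainder, while splitJsonArrays('["foo"]["ba') yields one complete array and '["ba' left over for the next event.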