Iterating on Mobile Apps at Web Speed
This article covers a collection of techniques that allow me to build HTML5 applications with native capabilities, and iterate on my code using the edit-save-refresh cycle that is characteristic of Web development.
Motivation
I love building Web applications, and I'm addicted to the edit-save-refresh cycle.
At the same time, I miss many features that mobile app developers take for granted, but that will take years to get through the Web standards gauntlet, such as WakeLock and Push notifications. Security decisions that make perfect sense in the context of Web pages become really limiting for applications. For example, it is impossible to build an alarm clock, because the play() method on HTML5 media elements (<audio> and <video>) requires user interaction, at least on Android and iOS.
HTML5 application wrappers rose to respond to this challenge. A wrapper essentially bundles a browser component (usually known as a WebView), glue code that exposes native capabilities to the JavaScript inside the WebView, and the HTML, CSS and JS code that you supply. The first wrapper that I am aware of is Titanium.
Today's favorite wrapper, Apache Cordova, has made a lot of progress on the tooling front, but still uses the same bundling technique, which means that I have to build a native application, deploy it on my mobile device, and restart, whenever I want to make a change.
Outline and Code
The following sections will describe the tricks I used to restore the edit-save-refresh cycle inside a Cordova application container.
Some tricks in this article can be repurposed for other wrappers easily, while some tricks are specific to Cordova's implementation. The latter may still suggest approaches to solving similar problems in different wrappers.
I used the methods described here in the html5-app-container project, which is open-sourced on GitHub. The code samples in this article lack some of the details in the GitHub project, in the interest of brevity and readability. I consider the GitHub project code to be the production-quality version that you should use for your applications.
Chapter 1: The Naive Solution
In order to be able to use edit-save-refresh, I have to get the Cordova container to load my application code from a server. The naive solution would be to simply navigate to the server's URL from the application's main page.
This is essentially the application code shipped with the initial html5-app-container version. The key files are app/index.html and app/js/index.js.
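A minimal sketch of the naive approach follows. It is not the project's actual code: the server URL and the loadRemoteApp helper are hypothetical, and waiting for deviceready before navigating is an assumption.

```javascript
// Hypothetical URL of the development server; replace with your own.
var APP_URL = 'http://192.168.1.10:3000/';

// Navigates the WebView away from the bundled page to the remote application.
function loadRemoteApp(win, url) {
  win.location.href = url;
}

// Only wire things up when running inside a page (not, say, under Node.js).
if (typeof document !== 'undefined') {
  // Cordova fires deviceready on the document once the bridge is set up.
  document.addEventListener('deviceready', function () {
    loadRemoteApp(window, APP_URL);
  }, false);
}
```

Once the WebView has navigated, every Refresh reloads the code straight from the server.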
While the application gets the edit-save-refresh cycle, it loses all the native Cordova functionality. This is because the native functionality uses the Cordova bridge, which is a fancy name for platform-specific hacks that let JavaScript code communicate with the native Cordova code. The Cordova bridge is initialized by loading a platform-specific cordova.js file that is placed alongside the HTML5 application's files by the Cordova build chain.
At this stage, it is tempting to attempt to load cordova.js from the Web site. This could work on Android, where the HTML5 application is accessible from file:///android_asset/www/index.html, but is doomed to fail on iOS, because the application path has a GUID in it, e.g. file:///Users/pwnall/Library/Developer/CoreSimulator/Devices/CF0D2CC2-3E1B-49FB-BC59-AFE4FD4C65FD/data/Applications/75A1BEC5-6FBF-4E90-90B8-F4B8B22EF5B0/AppContainer.app/www/index.html.
Even if the complex path on iOS can be tamed, Chrome's remote inspector reveals that accessing a file:///android_asset resource from an http origin causes a security error.
Before moving on to the next attempt, it's worth noting that even the naive solution provides advantages over running the application from a Web browser. Cordova's WebView uses a security model that is better suited for applications, e.g. the play() method of HTML5 media elements can be called without user interaction.
Furthermore, the Crosswalk project offers a Cordova fork that uses a WebView built on Chromium, instead of the platform's native WebView. This approach removes a lot of variability coming from the wide range of WebView versions shipped by device firmware. I can't recommend it strongly enough! In fact, I wasted a few months trying to build pretty much the same thing, before this existed.
Last, both Cordova and Crosswalk support debug and release builds. The debug builds set up the WebViews so that they can be debugged remotely. On iOS, the Cordova WebView can be debugged by Safari's Remote Inspector. On Android, the Cordova WebView and the Crosswalk WebView can be debugged using Chrome's Remote Debugger.
In both remote debuggers, hitting the Refresh keyboard shortcut (Command+R on Mac OS X, Control+R on all other systems) reloads the WebView. This is an important piece of the edit-save-refresh cycle, as having to reach out for the phone and close/reopen the application would be much slower than hitting Refresh on my development machine.
Chapter 2: The Iframe
The <iframe> element has introduced many security issues in the Web platform, so it seems fitting that we'd try to use it to work around Cordova's restrictions.
A reasonable design would be to have the actual application running inside a host page's <iframe>. The <iframe> takes up the host page's entire display space. The host page is the default Cordova starting page, www/index.html, so it can load cordova.js and talk to the Cordova bridge.
The host page uses postMessage to communicate with the application running inside the <iframe>.
To keep the initialization process simple, the host page first waits until the Cordova bridge is initialized, and then loads the Web application in the <iframe>.
The postMessage receiver only accepts messages whose origin matches the Web application. This prevents hostile content, such as embedded ads, from obtaining native capabilities.
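The host page logic can be sketched roughly as follows. This is an illustration rather than the project's code: the application origin, the { id, code } message shape and the createReceiver helper are assumptions made for the example.

```javascript
// Hypothetical origin of the Web application; an assumption for this sketch.
var APP_ORIGIN = 'https://app.example.com';

// Builds a postMessage handler that evaluates code sent by the application
// and replies with the result. Messages from other origins are ignored.
function createReceiver(appOrigin) {
  return function onMessage(event) {
    if (event.origin !== appOrigin) {
      return;  // Hostile content, such as an embedded ad, gets no reply.
    }
    var reply;
    try {
      // Indirect eval runs the code in the host page's global scope.
      // The result must be something postMessage can clone (no functions).
      reply = { id: event.data.id, result: (0, eval)(event.data.code) };
    } catch (error) {
      reply = { id: event.data.id, error: String(error) };
    }
    event.source.postMessage(reply, event.origin);
  };
}

if (typeof document !== 'undefined') {
  // Load the application only after the Cordova bridge is initialized.
  document.addEventListener('deviceready', function () {
    window.addEventListener('message', createReceiver(APP_ORIGIN), false);
    var iframe = document.createElement('iframe');
    iframe.src = APP_ORIGIN + '/';
    iframe.style.cssText =
        'position:fixed;top:0;left:0;width:100%;height:100%;border:0';
    document.body.appendChild(iframe);
    window.iframe = iframe;  // Exposed on the window for remote debugging.
  }, false);
}
```

The id field lets the application match each reply to the request that produced it.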
The receiver shown above is simple by design. It essentially eval()s the received message and responds with a message that contains the evaluation result. Using eval means that I can iterate on all the interesting code by changing the Web application code.
The main drawback of this approach is that sending JavaScript strings to the host page is an unnatural technique for developing Web applications, and can get tedious as the code that interacts with native features becomes more complex. Rails (my Web framework of choice) does not have good support for putting together JavaScript strings, and this was a deal-breaker for me.
The second drawback is that debugging my application with Chrome or Safari's remote inspector is a bit more tedious. In order to see my page's elements, I have to peek into the host page's <iframe>. To play with the JavaScript objects, I have to access the <iframe>'s contentWindow. The deviceready handler code posted above assigns the <iframe> element to the iframe property on the host page's window, to make debugging slightly easier.
Last, I was worried that as I use data-intensive features, such as Cordova's File API, shuttling large strings around might become a performance bottleneck.
Fortunately, we can do better!
Chapter 3: The Iframe Grows Stronger
This section describes a trick that removes the need to postMessage strings to the host page in most cases, by giving the Web application inside the <iframe> direct access to Cordova's native objects. This makes the <iframe> approach more bearable.
In a nutshell, after the Web application loads inside the <iframe>, it posts a message to the host page, asking it to copy the references to Cordova's objects to the <iframe>'s window object. For generality, the host page creates an empty object, copies all the enumerable properties of its window into the newly created object, then assigns the object to a property on the <iframe>'s window.
Note that the host page still waits for the Cordova bridge to be fully initialized before navigating the <iframe> to the Web application. This guarantees that the host page's window contains Cordova's native objects by the time the postMessage receiver is invoked.
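The copying step might look like the sketch below. The cordovaHost property name and the export-native / native-exported message strings are hypothetical. The sketch also assumes that the host page is allowed to write to the cross-origin <iframe>'s window, which a regular browser would refuse.

```javascript
// Copies references to every enumerable property of the host page's window
// into a fresh object, then hands that object to the <iframe>'s window.
// "cordovaHost" is a hypothetical property name chosen for this sketch.
function exportHostObjects(hostWindow, frameWindow) {
  var exported = {};
  for (var name in hostWindow) {
    try {
      exported[name] = hostWindow[name];
    } catch (error) {
      // Some window properties throw when read; skip them.
    }
  }
  frameWindow.cordovaHost = exported;
  return exported;
}

// The application asks for the native objects after it finishes loading.
function createExportReceiver(hostWindow, appOrigin) {
  return function onMessage(event) {
    if (event.origin !== appOrigin || event.data !== 'export-native') {
      return;
    }
    exportHostObjects(hostWindow, event.source);
    event.source.postMessage('native-exported', event.origin);
  };
}
```

Inside the application, code would then reach the bridge through window.cordovaHost (for example window.cordovaHost.cordova) once the confirmation message arrives.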
The second major html5-app-container version implements the trick above, combined with the eval() receiver described in the previous section. The key files are app/index.html and app/js/index.js.
Unfortunately, this trick does not work in Crosswalk 10+, so I had to look for a better solution. However, it is worth knowing, as it still works in Cordova, and might come in handy when dealing with other HTML5 application wrappers.
Chapter 4: The Demise of the Iframe
A different approach to obtaining native capabilities is to attempt to initialize the Cordova bridge in the Web application.
In this case, we don't embed any code in the Cordova application. Instead, we add a config.xml whose <access> points to our Web application.
According to Cordova Bug 5988, the Cordova bridge can be used from a file:// origin, or from the top level page's origin. So, if we can obtain the platform-specific JavaScript in the Web application, we can execute it and gain access to Cordova's native features.
Fortunately, Cordova's JavaScript is well-structured, and collected in the platform-specific assets directory for each platform. For example, the Android JavaScript is all in platforms/android/assets/www and the iOS JavaScript is in platforms/ios/www.
The first file we need is cordova.js, the build product of the cordova-js project. As far as we're concerned, it implements Cordova's module system (cordova.define and cordova.require), and contains the platform-specific bootstrap code.
The second file we must extract is cordova_plugins.js, in the same directory. It defines the cordova/plugin_list module, which exports a list of all the plugins installed in the application.
The other files needed to set up the bridge are spread throughout the plugins/ directory. A reasonable method for obtaining them is recursively looking for .js files inside the plugins/ directory. In order to avoid including any test code, we can filter out the files that don't start with cordova.define.
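The collection step can be sketched as a small Node.js script. The project's own scripts are written in bash; Node.js is used here only to keep the examples in one language. The function names, the separate pluginsDir parameter and the ordering of the files are assumptions.

```javascript
var fs = require('fs');
var path = require('path');

// Recursively lists the .js files under a directory, in a stable order.
function findScripts(dir) {
  var found = [];
  fs.readdirSync(dir).sort().forEach(function (name) {
    var fullPath = path.join(dir, name);
    if (fs.statSync(fullPath).isDirectory()) {
      found = found.concat(findScripts(fullPath));
    } else if (path.extname(name) === '.js') {
      found.push(fullPath);
    }
  });
  return found;
}

// Builds one bundle: cordova.js, cordova_plugins.js, then the plugin modules.
// Plugin files that do not start with cordova.define (e.g. tests) are skipped.
function buildBundle(wwwDir, pluginsDir) {
  var parts = ['cordova.js', 'cordova_plugins.js'].map(function (name) {
    return fs.readFileSync(path.join(wwwDir, name), 'utf8');
  });
  findScripts(pluginsDir).forEach(function (file) {
    var source = fs.readFileSync(file, 'utf8');
    if (/^\s*cordova\.define/.test(source)) {
      parts.push(source);
    }
  });
  // The separator guards against files that lack a trailing semicolon.
  return parts.join('\n;\n');
}
```

A build script could then write the result out, for example with fs.writeFileSync('android.js', buildBundle(wwwDir, pluginsDir)).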
We can collect and concatenate all the files for each platform into one big bundle file, e.g. android.js for Android, and ios.js for iOS. In the Web application, we need to detect whether we're running under Cordova and, if so, load the appropriate bundle.
I use the Rails asset pipeline to serve the JavaScript files, as it does all the tricks needed to serve them with long-lived cache expiration dates. In order for this to work, I need to use asset pipeline-generated URLs to load the scripts.
The JavaScript bundles must not be executed when the Web application does not run under Cordova. For example, the Android bundle currently uses window.prompt() to talk to the native Cordova code. If this code runs outside Cordova, the user will be presented with cryptic dialog boxes. In order to avoid that, we must make sure that we run under Cordova before executing any Cordova JavaScript.
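A detection routine along these lines could look as follows. It is a sketch: the function name is hypothetical, and the two checks simply encode the Android and iOS markers described in this section, so they may not hold for other Cordova versions.

```javascript
// Returns 'android', 'ios', or null when not running under Cordova.
function detectCordovaPlatform(win) {
  // Android: the native side injects a _cordovaNative object.
  if (typeof win._cordovaNative !== 'undefined') {
    return 'android';
  }
  // iOS: the User-Agent ends with a numeric nonce in parentheses.
  if (/\(\d+\)$/.test(win.navigator.userAgent)) {
    return 'ios';
  }
  return null;
}
```

In a page, the call would be detectCordovaPlatform(window).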
The code above takes advantage of Cordova implementation details, and was derived from Cordova 3.6. On Android, Cordova defines a _cordovaNative object. On iOS, Cordova appends a numeric nonce wrapped in parentheses to the WebView's User-Agent property.
If the Web application detects that it runs under Cordova, a <script> tag pointing to the appropriate script is dynamically generated and added to the page.
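Adding the <script> tag can be sketched as below. The bundle URLs are placeholders. With the Rails asset pipeline they would be the fingerprinted URLs generated by the pipeline.

```javascript
// Hypothetical bundle URLs, one per platform.
var BUNDLE_URLS = { android: '/assets/android.js', ios: '/assets/ios.js' };

// Appends a <script> tag for the platform's bundle to the page's <head>.
// Returns the new element, or null when there is no bundle to load.
function loadCordovaBundle(doc, platform, urls) {
  var url = platform ? urls[platform] : null;
  if (!url) {
    return null;  // Not under Cordova, or no bundle for this platform.
  }
  var script = doc.createElement('script');
  script.src = url;
  doc.head.appendChild(script);
  return script;
}
```

Here platform is 'android', 'ios' or null, as produced by whatever Cordova detection the application uses.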
While implementing this approach, I ran into a Crosswalk crashing bug that brings down the entire Android application when a non-file:// page attempts to initialize the Cordova bridge. Fortunately, I was able to find and submit a fix.
The third major html5-app-container version uses the approach outlined in this section. The only application file is app/config.xml. The Cordova JavaScript specific to Android and iOS is collected by two bash scripts, script/bundle-android.sh and script/bundle-ios.sh.
This method addresses the main issues raised by the <iframe> approach. The Web application does not need to postMessage JavaScript strings to a host page, and the Chrome and Safari remote inspectors display the Web application's elements and interact directly with the application's JavaScript world.
Unfortunately, this method's requirement of hosting Cordova's platform-specific JavaScript bundles on the server introduces significant complexity in long-lived production applications. For example, if a security issue is reported in the Cordova version used by the application wrapper, a new version of the wrapper must be released. New bundles must be produced and deployed on the production server before the new wrapper can be released. Furthermore, the Web application must retain the bundles for all the wrapper versions that were ever released.
From a performance standpoint, this approach has the downside of having to load the platform-specific JavaScript over the network. The Android-specific JavaScript file is almost 300kb, and the iOS-specific JavaScript is slightly bigger. Furthermore, the file is loaded via a dynamically-added <script> tag, which eludes the pre-loader.
The Rails asset pipeline sweetens the performance blow quite a bit, as the file is 26kb after minifying and gzipping, and is served with a long-lived cache expiration date. Still, it's nice to know that the following refinement removes the performance issue completely, while being easier to maintain at the same time.
Chapter 5: The Return of the Iframe
In this section, we modify the approach presented above to remove the need for server-hosted JavaScript bundles. The high-level idea is rather straightforward, namely we must restructure our code so that the Web application receives the JavaScript bundle from the Cordova wrapper. However, we need a couple of tricks to bypass the security measures in Cordova's WebView.
First, the build scripts for the Cordova-based wrapper must be augmented to extract the JavaScript bundle, as described in the previous section.
This is mildly tricky, because the Cordova build scripts wipe the www directory inside the platform-specific platforms/ subdirectory at the beginning of a build. Fortunately, the script copies the platform-specific cordova.js from the platform_www directory inside the platform's platforms/ subdirectory, so we can stash our JavaScript bundle there.
Also, we minify the JavaScript bundle using UglifyJS 2. On the Android and iOS bundles, minification cut down about 300k of JavaScript to less than 100k. This matters because we'll store the bundle in sessionStorage, which is read synchronously from JavaScript.
Second, we must send the JavaScript bundle to the Web application, despite the fact that it cannot read file:// resources. We bypass this limitation by resurrecting the <iframe> technique discussed earlier.
The application wrapper includes a host page that loads a bootstrap page from our Web application in a hidden <iframe>.
The bootstrap page receives the JS bundle from the host page via postMessage and stores it in its sessionStorage. Note that the bootstrap page must be hosted by the Web application, because the sessionStorage store is per-origin. After the bundle is stored, the bootstrap page sends a message to the host page telling it to navigate to the Web application's main page. The main page URL is contained in the message so that the application wrapper only needs to hard-code one URL, namely the bootstrap page's URL.
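The bootstrap page's side of the exchange can be sketched as follows. The message types (ready, bundle, boot), the storage key and the main page URL are hypothetical. Like the design described at this point in the article, the sketch does not authenticate the sender.

```javascript
var BUNDLE_KEY = 'cordova-bundle';          // Hypothetical storage key.
var MAIN_URL = 'https://app.example.com/';  // Hypothetical main page URL.

// Stores the bundle sent by the host page, then asks the host page to
// navigate the WebView to the application's main page.
function createBootstrapReceiver(storage, mainUrl) {
  return function onMessage(event) {
    var data = event.data;
    if (!data || data.type !== 'bundle') {
      return;
    }
    storage.setItem(BUNDLE_KEY, data.code);
    // The host page lives on file://, so '*' is used as the target origin.
    // The reply carries nothing secret.
    event.source.postMessage({ type: 'boot', url: mainUrl }, '*');
  };
}

// Only runs when the page is embedded in a frame.
if (typeof window !== 'undefined' && window.parent !== window) {
  window.addEventListener('message',
      createBootstrapReceiver(window.sessionStorage, MAIN_URL), false);
  // Tell the host page that the bootstrap page is ready for the bundle.
  window.parent.postMessage({ type: 'ready' }, '*');
}
```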
The host page JavaScript below looks large, but it can be de-constructed into small straightforward parts. The getBundle function uses XMLHttpRequest to read the JavaScript bundle. The onMessage function implements the postMessage-based protocol used to communicate with the Web application's bootstrap page, and is similar to the code for the <iframe>-based approach presented above.
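A condensed sketch of the host page code follows. The real loader is in the GitHub project. Here the bootstrap origin, the bundle file name, the bootstrap page path and the ready / bundle / boot message types are all assumptions.

```javascript
var BOOTSTRAP_ORIGIN = 'https://app.example.com';  // Hypothetical origin.
var BUNDLE_PATH = 'cordova_bundle.min.js';         // Hypothetical file name.

// Reads the bundle that ships inside the wrapper. The status code is not
// checked, because file:// responses typically report a status of 0.
function getBundle(XhrClass, url, callback) {
  var xhr = new XhrClass();
  xhr.open('GET', url, true);
  xhr.onload = function () { callback(xhr.responseText); };
  xhr.send();
}

// Implements the host page's side of the postMessage protocol.
function createHostReceiver(win, bootstrapOrigin, loadBundle) {
  return function onMessage(event) {
    if (event.origin !== bootstrapOrigin) {
      return;
    }
    var data = event.data || {};
    if (data.type === 'ready') {
      loadBundle(function (code) {
        event.source.postMessage({ type: 'bundle', code: code }, event.origin);
      });
    } else if (data.type === 'boot') {
      win.location.href = data.url;  // Navigates the top-level window.
    }
  };
}

// Assumes the script is included at the end of the host page's <body>.
if (typeof document !== 'undefined') {
  window.addEventListener('message', createHostReceiver(
      window, BOOTSTRAP_ORIGIN, function (callback) {
        getBundle(XMLHttpRequest, BUNDLE_PATH, callback);
      }), false);
  var iframe = document.createElement('iframe');
  iframe.style.display = 'none';
  iframe.src = BOOTSTRAP_ORIGIN + '/bootstrap.html';  // Hypothetical path.
  document.body.appendChild(iframe);
}
```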
Note that after the JavaScript bundle transfer is complete, the host page's code navigates the top-level window to the Web application's main URL. So, after the bundle transfer, the Web application takes over the entire Cordova WebView, as opposed to being contained inside an <iframe>, so the drawbacks of the <iframe>-based solution do not apply.
This method also does not require the Web application to rely on Cordova implementation details to detect whether it runs inside a Cordova container or not. Instead, application code can simply probe sessionStorage for the Cordova JavaScript bundle.
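The probe can be as small as the sketch below, assuming the bundle was stored under a hypothetical cordova-bundle key. The evaluate callback is passed in so that the caller decides how the bundle gets executed.

```javascript
var BUNDLE_KEY = 'cordova-bundle';  // Hypothetical storage key.

// Runs the Cordova bundle if the wrapper stored one.
// Returns true under Cordova, false in a regular browser.
function bootCordova(storage, evaluate) {
  var bundle = storage.getItem(BUNDLE_KEY);
  if (bundle === null) {
    return false;
  }
  evaluate(bundle);
  return true;
}
```

In the main page, this could be called as bootCordova(window.sessionStorage, function (code) { (0, eval)(code); }), where the indirect eval runs the bundle in the global scope.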
The fourth major html5-app-container version uses the approach outlined in this section. The host page HTML is app/index.html and the loader code is in app/js/loader.js. The Cordova JavaScript bundles are collected by script/bundle-ios.sh, which (confusingly) builds the Cordova-based wrapper for both iOS and Android. script/bundle-android.sh builds an Android wrapper using the Crosswalk fork of Cordova.
Unfortunately, the code here has an XSS (cross-site scripting) vulnerability. If an attacker convinces a user to download a malicious HTML page and open the downloaded version, such that it gets the file:// origin, the malicious page can open our Web application's bootstrap page in an <iframe> and use the postMessage protocol to convince the bootstrap page to store some evil JavaScript in sessionStorage. If the user then visits our Web application from the same computer, the application will run the attacker's evil JavaScript.
Chapter 6: The Revenge of the Iframe
In order to fix the XSS vulnerability, we apply a well-known method for preventing CSRF (cross-site request forgery) vulnerabilities.
The Web application's bootstrap page generates a random token and places it in an attribute of a <div> element in its DOM tree. The application wrapper uses its privileges to extract the token from the <div> element, and sends the token together with the Cordova JavaScript. The bootstrap page does not write to sessionStorage if the token in the message doesn't match the value that it generated.
The main source of complexity in the updated bootstrap page code (shown below) is the token generation code. We first attempt to generate a cryptographically secure token, using the WebCrypto API. If that is not supported, we fall back to using Math.random().
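A sketch of the updated bootstrap page follows. The token length, the csrf-token element id, the data-token attribute, the storage key and the message types are assumptions made for this example.

```javascript
// Generates a random hex token. Prefers the WebCrypto random source and
// falls back to Math.random() when it is not available.
function generateToken(cryptoObj) {
  var bytes = new Uint8Array(16);
  var i;
  if (cryptoObj && typeof cryptoObj.getRandomValues === 'function') {
    cryptoObj.getRandomValues(bytes);
  } else {
    for (i = 0; i < bytes.length; i += 1) {
      bytes[i] = Math.floor(Math.random() * 256);
    }
  }
  var hex = '';
  for (i = 0; i < bytes.length; i += 1) {
    hex += (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
  }
  return hex;
}

// Stores the bundle only when the message carries the expected token.
function createTokenReceiver(storage, token, mainUrl) {
  return function onMessage(event) {
    var data = event.data;
    if (!data || data.type !== 'bundle') {
      return;
    }
    if (data.token === token) {
      storage.setItem('cordova-bundle', data.code);
    }
    // Reply either way, so the sender cannot tell whether the token matched.
    event.source.postMessage({ type: 'boot', url: mainUrl }, '*');
  };
}

// Assumes the script is included at the end of the bootstrap page's <body>.
if (typeof window !== 'undefined' && window.parent !== window) {
  var token = generateToken(window.crypto || window.msCrypto);
  var holder = document.createElement('div');
  holder.id = 'csrf-token';
  holder.setAttribute('data-token', token);
  document.body.appendChild(holder);
  window.addEventListener('message', createTokenReceiver(
      window.sessionStorage, token, 'https://app.example.com/'), false);
  window.parent.postMessage({ type: 'ready' }, '*');
}
```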
Note that when the code above receives an incorrect token, it follows through the postMessage protocol, but doesn't update sessionStorage. This reduces the amount of information that a potential attacker might obtain, at the cost of making it harder to debug the JavaScript bundle transmission process.
The host page's code, embedded in the application wrapper, is identical to the previous version, up to the getIframeToken function. The main source of complexity here is working around browser support for the contentDocument property. The onMessage function was modified to call out to getIframeToken and embed the return value in the message that it posts to the bootstrap page.
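The token extraction might be sketched as below. The csrf-token id and data-token attribute are hypothetical names. Reading a cross-origin <iframe>'s document relies on the wrapper's privileges, so this would not work in a regular browser.

```javascript
// Reads the token that the bootstrap page placed in its DOM tree.
// Returns null when the document or the token element is not available.
function getIframeToken(iframe) {
  // Not every WebView exposes contentDocument, so fall back to the
  // document of the <iframe>'s contentWindow.
  var doc = iframe.contentDocument ||
      (iframe.contentWindow && iframe.contentWindow.document);
  if (!doc) {
    return null;
  }
  var holder = doc.getElementById('csrf-token');
  return holder ? holder.getAttribute('data-token') : null;
}
```

The onMessage function would then post something like { type: 'bundle', code: code, token: getIframeToken(iframe) } to the bootstrap page, where the message shape is again an assumption.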
The security in this method can be proven by reasoning that an attacker who can extract the token from the bootstrap page can also extract a CSRF token embedded in a <form>, <input> or <meta> tag. Such an attacker can already execute requests on behalf of its victim.
Unfortunately, just like the contentWindow trick described in a previous section, the code above does not work in Crosswalk 10, because of this bug.
If you want to use Crosswalk versions impacted by the bug, the host page JavaScript must be modified to catch security exceptions and return a special CSRF token, such as 0000. The bootstrap page can accept the special token if it can verify that the JavaScript execution environment belongs to an impacted Crosswalk version.
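Both halves of this workaround can be sketched as follows. The function names are hypothetical. Detecting the Crosswalk version from a Crosswalk/<version> User-Agent component, and treating major version 10 as the impacted range, are assumptions that should be checked against the versions you ship. For brevity, the sketch treats any exception as a security exception.

```javascript
var FALLBACK_TOKEN = '0000';

// Host page: returns the token, or the special token when reading the
// <iframe>'s document throws.
function getTokenOrFallback(readToken) {
  try {
    return readToken();
  } catch (error) {
    return FALLBACK_TOKEN;
  }
}

// Bootstrap page: true when the User-Agent names an impacted Crosswalk.
function isImpactedCrosswalk(userAgent) {
  var match = /Crosswalk\/(\d+)\./.exec(userAgent);
  return match !== null && parseInt(match[1], 10) === 10;
}

// Bootstrap page: accepts the real token everywhere, and the special token
// only inside an impacted Crosswalk WebView.
function acceptsToken(received, expected, userAgent) {
  if (received === expected) {
    return true;
  }
  return received === FALLBACK_TOKEN && isImpactedCrosswalk(userAgent);
}
```

An application that never ships a wrapper based on an impacted Crosswalk version can drop the second check and compare the tokens directly.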
The host page changes above make the token extraction code more robust, and have no negative impact, from a security perspective. Therefore, they are included in the html5-app-container code.
The Web application page gets to decide whether it accepts the special CSRF token. Therefore the modification below can simply be skipped by an application that never releases a wrapper based on a Crosswalk version impacted by the bug above.
The fifth major html5-app-container version uses the approach outlined in this section. The new loader code is in app/js/loader.js.
The code presented above has a couple of performance issues, compared to a wrapper that loads the Web application directly into a WebView. Time-permitting, they may be the subject of future work.
First, loading the bootstrap page adds an HTTP round-trip to the application's startup time. This can be significant for applications that are meant to be used on the go. The round-trip can be eliminated from most starts by serving the bootstrap page with a far-in-the-future cache expiration time.
A high-performance solution would remove the bootstrap page completely. Instead, a Cordova plugin would serve the JavaScript bundle from a custom URL scheme. The Web application could attempt to load the bundle via an XMLHttpRequest, and would report a Cordova boot failure if the request fails. This approach would be expensive in terms of engineering time, as a Cordova plugin requires different native code for each supported platform, and must be maintained to track updates to the Cordova API.
Second, storing the (rather large) JavaScript bundle in sessionStorage is a source of performance issues. The sessionStorage API consists of blocking calls, so the WebView's JS thread is unresponsive while the mobile device reads or writes the JavaScript bundle into the sessionStorage backing store. Also, some WebView implementations store the sessionStorage contents in RAM, so the JavaScript bundle might use 200kb of RAM (the bundle is 100kb after minification, but is likely stored as a UTF-16 or UCS-2 string, at two bytes per character). So, we're possibly taking up 200kb of RAM with a string that's only used once during page load.
sessionStorage is used in these samples because of its simplicity and wide support. A performance-conscious Web application could use IndexedDB and fall back to the deprecated Filesystem API and WebSQL database API. Unfortunately, all fallbacks are necessary to achieve reasonable platform coverage. On a brighter note, the choice of bundle backing store is completely decoupled from the code inside the Cordova wrapper, so a Web application can deploy this performance improvement without having to change the code in html5-app-container.
Of course, the problem of storing the JavaScript bundle goes away if the bootstrap page is eliminated by implementing a Cordova plugin.
JavaScript Bundle Signatures?
A noteworthy alternative to the CSRF token extraction method described here is signing the JavaScript bundles. The signatures would be produced during the application wrapper build process, and verified by the code in the bootstrap page.
Unfortunately, this leaves secret key signing methods, like HMAC, out of the question, because the signature would be embedded on the bootstrap page. Verifying asymmetric (e.g., RSA) signatures is slow when done in JavaScript. Furthermore, signing would burden the build process with the extra complexity of handling a secret key.
Last but not least, relying on signatures would mean that the bootstrap page would accept any previously signed JavaScript bundle. A troll could use this to load a Cordova bundle into the site when used from a normal browser, which would cause JavaScript prompts to pop up when the user browses to our Web application.
A more serious attacker could load an old (but still signed) version of a bundle with known security vulnerabilities, and exploit the vulnerabilities against the site.
General Security Considerations
Unfortunately, all the designs above have the drawback that any XSS (cross-site scripting) vulnerability can be used to obtain access to native features.
At the same time, it is worth mentioning that native applications also suffer from security vulnerabilities, such as push notifications spam and hijacking and Android intent hijacking. Furthermore, a security patch in a Web application requires a server push to be deployed, which can be done in minutes or at most hours. A mobile application patch requires an app store / market submission, which often includes a human review, and can take between hours and days.
Conclusion
This article described the tricks I used in the html5-app-container project, which allows me to build Web applications with the native capabilities afforded by Cordova, and iterate on my code using the edit-save-refresh cycle that is typical of Web applications.